Automating Network Administration, Part One
by Luke A. Kanies
I'm a sysadmin; it's my job to make sure my company's servers are doing what they're supposed to be doing when they are supposed to be doing it, and it's my job to solve any problems that interfere with that.
This job has given me what I call my Big Red Button Dream(tm): I dream of a separate entrance to my own office, with all of the monitors, servers, workstations, and whatever I need to do my job. No one sees me enter, no one sees me leave, no one knows if I'm working or sleeping. But when anything anywhere on the network breaks, a Big Red Button on the wall starts flashing to indicate a problem. In order to solve this problem, whether it's a service outage or a new server to build and deploy, I must reach over and smack that Big Red Button. This solves the problem, and I can go back to doing whatever it is that I am or am not doing. No one knows that all I do is push a button to fix every problem out there; all they know is that with me on the job, the systems never get in the way of the work they are supposed to do. Of course, the next step in the dream is to delegate the actual smacking of the button to someone else, but that requires there be someone else in my sysadmin cell, and it all kind of breaks down then.
This qualifies as a dream and not a goal because it is clearly unattainable, but maintaining this dream as a guiding ideal does a lot to keep me on what I see as the right track in my job as a sysadmin. In this two-part series, I hope to talk about how I use planning and automation in my quest to achieve this ideal, and specifically, why I begin both at the earliest possible point: when a server is built. Because not everyone truly understands the fundamental importance of planning and automation, the first part of this series will go through an explanation of the benefits of dedicating your life as a sysadmin to planning and automating everything you do, and the second part will focus on planning and automating the server build process.
Although I see automation as being contingent on planning, it is only when automation is attempted without planning that it becomes obvious how important the planning is. Therefore, I will discuss automation first, and that discussion will hopefully enlighten us as to the importance of planning.
From my perspective, automation provides five main benefits:
Reducing the amount of time a given task requires.
Automating a task means that less time is required each time that task is performed, which leaves more time to devote to other tasks, such as automation.
Reducing the opportunity for error in a given task.
Most tasks have to be done in certain ways, and leaving it to humans to perform those tasks leaves the chance that those humans will perform the task incompletely or incorrectly, or will break something essentially unrelated. When a task is automated, a preferred way can be found and the task can then be performed that way every time, essentially eliminating the chance for error in that specific task, as long as the automation was thoroughly planned and tested.
Reducing turnaround time for a given task.
While leaving more time for other work, automation also means that most work gets done faster. This is important in many situations, particularly while firefighting (solving service outages), performing work on production systems during short maintenance windows, or satisfying short project timelines. It is often worth spending more total time automating a task before it is needed because of the reduced time it takes to actually perform a task -- if it takes you twenty hours of scripting to successfully automate a four-hour task, but as a result you are able to fit it entirely within your server's two-hour maintenance window, then it was well worth the effort. This is, again, usually not possible without thorough planning and testing, which is often a significant portion of the automation time.
Enhancing and perpetuating configuration consistency across multiple systems.
In addition to humans potentially introducing error when they work, they also introduce something possibly more nefarious: individuality. Because they often cause outages of some kind, errors are usually caught and fixed, but when multiple people perform the same task in different but equally correct ways, there is no outage to catch. The problem with this situation is that once multiple people have started to do the same thing in different ways, system consistency is sacrificed. Once a network lacks overall consistency, it is far more difficult to come in afterward and automate. This situation also often ends up in a catch-22: there is not enough consistency to allow automation, but the lack of automation causes consistency to deteriorate further. In addition to making networks harder to automate, a lack of consistency also makes networks significantly more difficult to administer in general, because all the exceptions have to be kept in mind when any work is performed.
Providing a limited kind of process documentation.
Last but not least, an automated task is a documented task. It might not be well-documented (although hopefully the code is well-commented, at the least), but even if the person who did the automation leaves the company, you can still go back and read the scripts. This is far superior to information leaving with an employee, and it also provides a starting point for other employees to begin learning the process involved.
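To make the consistency benefit in the list above a little more concrete, here is a minimal Python sketch of how one might spot "different but equally correct" configurations before they pile up. It is an illustration only: the function name `find_config_drift`, the host names, and the file contents are all hypothetical, and it assumes the config text has already been collected from each host by some other means.

```python
import hashlib
from collections import defaultdict


def find_config_drift(configs):
    """Group hosts by the hash of their config text.

    `configs` maps a host name to the contents of one config file.
    Returns (baseline_hosts, outliers), where the baseline is the
    largest group of hosts whose contents are byte-for-byte identical.
    """
    groups = defaultdict(list)
    for host, text in configs.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        groups[digest].append(host)

    # Largest group first; ties are broken by host name for stable output.
    ordered = sorted(groups.values(), key=lambda hosts: (-len(hosts), sorted(hosts)))
    baseline = sorted(ordered[0]) if ordered else []
    outliers = sorted(host for hosts in ordered[1:] for host in hosts)
    return baseline, outliers


if __name__ == "__main__":
    # Hypothetical hosts and file contents, for illustration only.
    sample = {
        "web1": "PermitRootLogin no\nPort 22\n",
        "web2": "PermitRootLogin no\nPort 22\n",
        "web3": "Port 22\nPermitRootLogin no\n",  # same settings, different order
    }
    baseline, outliers = find_config_drift(sample)
    print("baseline:", baseline)
    print("outliers:", outliers)
```

Note that `web3` is flagged even though its settings are equivalent. Nothing is broken on that host, so no outage would ever reveal the difference, yet any tool that expects the baseline layout would have to treat it as an exception.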
This is quite a lot, so you shouldn't need much more convincing. But in addition to these benefits, which I consider to be fundamental to automation and the main reasons for concentrating on it, automation allows you to package up a complex, senior-level task and delegate it to someone lower on the food chain. This provides the lower-level employee the opportunity to fully understand the task by reading and using the script, and it leaves the senior-level employee time for more important tasks, such as automation. Another great thing about automation is that it builds on itself; the more you automate the small tasks, the more you can build tools which automate the automation. This is obviously how my ideal of the Big Red Button happens: the button is the top of a very large pyramid of automation tools, set up so it can diagnose and solve any problem anywhere on the network.
As with all things, there are some caveats. Automation rewards in proportion to the complexity, repetition, or time consumption of a task, which means that sometimes automation ends up taking more time than it saves. Also, most of us are unfortunately hired into companies which already have computers, which often means that we walk into a situation with little or no consistency to start with; when this happens, we usually have to spend a significant amount of time bringing consistency to the network just to get to the point where we can start automating. This sometimes puts the benefits of automation far enough away that they seem not worth the effort. Lastly, all automation requires significant testing, because by the time you notice there's a problem with your automation tool, it's usually too late to cancel it, and that is too large a risk to take on a production system.
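The first caveat can be put into rough numbers. The sketch below is a back-of-the-envelope calculation, not a rule: `break_even_runs` is a hypothetical helper, and it counts only hours, so it ignores the turnaround, consistency, and documentation benefits discussed earlier.

```python
import math


def break_even_runs(build_hours, manual_hours, automated_hours):
    """Return how many runs are needed before automation pays for itself.

    Returns None when the automated run is no faster than the manual one,
    in which case the build time is never recovered through time savings alone.
    """
    saved_per_run = manual_hours - automated_hours
    if saved_per_run <= 0:
        return None
    return math.ceil(build_hours / saved_per_run)


if __name__ == "__main__":
    # Figures from the maintenance-window example: twenty hours of scripting
    # for a four-hour task. The two-hour automated run time is an assumption
    # (the upper bound that still fits the two-hour window).
    print(break_even_runs(build_hours=20, manual_hours=4, automated_hours=2))
```

With those numbers the function returns 10: on time alone, the script pays for itself only after ten runs. The maintenance-window example is a reminder that time saved is not the only measure, since fitting inside the window may justify the effort on the first run.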
In the end, though, hopefully you'll see that, in the big picture, automation almost always profits you more than it costs. If you start by only automating the tasks that you get the biggest return on and then work your way down, you will soon find that there are only a few mundane tasks left to automate. Automating those last few tasks does take more time than it specifically saves, but now with that final automation, you have basically automated all of the low-level tasks on your network, and suddenly the whole is greater than the sum of its parts: instead of having to think in concrete terms about each task on your network, your tools provide an abstraction layer between yourself and the work you must do. This abstraction layer provides you a means of changing the way you think about your work -- instead of the network defining how you work, your tools do. Hopefully you've developed your tools to work the way you want them to, but if you haven't, you can reorganize how those tools work without actually impacting the underlying work they do -- this is the real benefit of this abstraction layer that the tools provide.
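One way to decide where to start is to estimate the return on each candidate task and sort. The following Python sketch assumes you can guess three figures per task; the `Task` class, its field names, and the sample numbers are all hypothetical, and a one-year horizon is an arbitrary choice.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    build_hours: float          # estimated time to automate and test
    hours_saved_per_run: float  # manual time minus automated time
    runs_per_year: int

    def net_hours_first_year(self):
        """Hours saved in the first year, after subtracting the build cost."""
        return self.hours_saved_per_run * self.runs_per_year - self.build_hours


def rank_by_return(tasks):
    """Return tasks ordered from the largest first-year return to the smallest."""
    return sorted(tasks, key=lambda t: t.net_hours_first_year(), reverse=True)


if __name__ == "__main__":
    # Hypothetical estimates, for illustration only.
    candidates = [
        Task("rename host", build_hours=8, hours_saved_per_run=0.5, runs_per_year=4),
        Task("server build", build_hours=40, hours_saved_per_run=6, runs_per_year=20),
        Task("log rotation", build_hours=2, hours_saved_per_run=0.25, runs_per_year=52),
    ]
    for task in rank_by_return(candidates):
        print(f"{task.name}: {task.net_hours_first_year():+.1f} hours")
```

Tasks at the bottom of such a list may show a negative return on their own, which matches the point above: they cost more than they specifically save, and the payoff comes from completing the abstraction layer rather than from the task itself.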