When I defined what DevOps engineering is, I listed some of the things you should look out for when working in industry. The point I made on “people not understanding what you are trying to achieve” requires a greater discussion and possibly a more comprehensive solution

The fear people have of automation

It might be Skynet, or it might be that robots look scary, but automation is most definitely coming whether you like it or not

The biggest fear people have of automation, particularly the automation I build in DevOps engineering, is that it will make them redundant and they’ll be out of a job

You should be scared if your job can be completely automated. Why would a company hire you and deal with all sorts of HR hassles when they could just deploy an application that works all day and all night and never makes any mistakes?

No type of worker is immune to this. There is even an argument that code could write itself in the future, and then even software developers will be out of a job

Watch out for

What I’ve noticed in my career is that when people advocate for the manual or human-involved approach, it’s for one of these reasons:

They have a misunderstanding of how automation works
The fear is too overwhelming for them and they are desperate to hold onto their job

Don’t let them succeed in what they are advocating for. If they opt for a manual approach to tackling the problem, push back. If they present arguments along the lines of “routine tasks are just part of daily business”, push back even harder

The automation approach needs to win. It’s good for the business (reduces cost) and it’s good for efficiency (prevents mistakes and human burn-out)

Misunderstandings people have

These are the two most common misunderstandings people have regarding automation. They are both very wrong and just demonstrate a big ignorance on the part of the people that hold these invalid opinions

“The logic is too complicated”

People who have been performing a task manually for a while may argue that the task’s logic is too complicated. I’ve heard this argued multiple times, and each time, after analysing the task and working out its logic, I’ve built the automation and proven them wrong

It’s easy to determine whether software can perform a task that humans would otherwise do. Any task that follows a series of steps, relies on noticing patterns in output, or has predictable expected and unexpected results is well suited to being completely automated

Consider the following logic:

You notice an issue
You start typing some commands or you log in to a system and view some information
Based on the output of the command, you type some other commands or based on what you read you click a button
You keep repeating these steps or you follow a similar process until you feel you have fixed the issue
You verify whether the issue still exists. If it does, you repeat a similar process. If it does not, you remember what you did for next time

The logic above is very well suited for being described in code

Point 1 can be substituted with an event-driven alerting system that supports the concept of webhooks (GCP Cloud Monitoring and AWS CloudWatch support this concept) or some other type of triggering mechanism (this can even be email if it’s the only option). The cron scheduler is well suited for polling at a regular interval if you don’t opt for being event-driven

Points 2 to 5 can all be handled by any modern programming language. All the languages I use are capable of executing system commands, performing string manipulation and pattern matching. Furthermore, all major cloud providers offer very comprehensive APIs. It’s simple to integrate your automation into these cloud APIs and read all the information you require in order to make decisions
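As a rough illustration of points 2 to 5, here is a minimal Python sketch of that loop: run a diagnostic command, match patterns in its output, run a follow-up command, then verify. The patterns, the fix commands and the names choose_fix and remediate are hypothetical and only stand in for whatever a real task would need

```python
import re
import subprocess
from typing import Callable, List, Optional


def run(command: List[str]) -> str:
    """Run a system command and return its stdout and stderr as text."""
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout + result.stderr


def choose_fix(output: str) -> Optional[List[str]]:
    """Map patterns in the diagnostic output to a follow-up command.

    The rules and commands here are made up for illustration.
    """
    if re.search(r"disk usage:\s*(9\d|100)%", output):
        return ["echo", "cleaning temporary files"]
    if re.search(r"service .* is not running", output):
        return ["echo", "restarting service"]
    return None  # nothing recognised, so the issue looks resolved


def remediate(
    diagnose: List[str],
    runner: Callable[[List[str]], str] = run,
    max_attempts: int = 3,
) -> bool:
    """Diagnose, apply a fix, and verify again until resolved or out of attempts."""
    for _ in range(max_attempts):
        fix = choose_fix(runner(diagnose))
        if fix is None:
            return True
        runner(fix)
    return False  # still broken: time to hand over to a human
```

Passing the runner in as a parameter is a design choice of this sketch. It lets the decision logic be exercised against canned output before it is pointed at a real system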

“The app might fail or not perform the critical task correctly”

The best thing about computers is that they do exactly what you tell them to do and nothing more

One of the attributes of well-written software is that it caters for all the cases or scenarios it can be used in and that it has effective error handling for when it encounters an issue

When you are building automation, it’s vital that you analyse the problem and completely understand the task’s logic before you start. Even so, you probably won’t write the perfect automation on the first attempt, so it is equally vital that your automation has effective error handling

The error handling needs to have the following:

Log all exceptions and tie these back to precise timestamps
Keep a trace of everything the automation performed, including all input and output data and a stack trace
Alert the humans that the automation has possibly failed (alerts via email, SMS and mobile push notifications etc)
If you can, use the functional programming paradigm (no mutable state, compose functions to describe the computation) to help avoid possible problems
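The list above could be sketched in Python roughly as follows. This is only one possible shape, assuming each step of the automation is a function that takes an input and returns an output. The names run_step, trace and alert are hypothetical, and the alert callable stands in for whatever email, SMS or push notification service you use

```python
import logging
import traceback
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List

# Every log line carries a timestamp so failures can be tied to a precise time.
logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s", level=logging.INFO)
log = logging.getLogger("automation")


def run_step(
    name: str,
    step: Callable[[Any], Any],
    payload: Any,
    trace: List[Dict[str, Any]],
    alert: Callable[[str], None],
) -> Any:
    """Run one step, recording its input, output and any failure in the trace."""
    entry: Dict[str, Any] = {
        "step": name,
        "started": datetime.now(timezone.utc).isoformat(),
        "input": payload,
    }
    try:
        entry["output"] = step(payload)
        entry["ok"] = True
        return entry["output"]
    except Exception as exc:
        entry["ok"] = False
        entry["error"] = repr(exc)
        entry["stack"] = traceback.format_exc()
        log.exception("step %s failed", name)
        alert(f"automation step '{name}' failed: {exc}")
        raise  # re-raise so the caller stops instead of continuing blindly
    finally:
        trace.append(entry)  # the trace is kept whether the step worked or not
```

Each step receives its input and returns a new value rather than changing shared state, which keeps the trace an honest record of what went in and what came out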

Now it’s just a matter of going through the process of iterative refinement until the automation is where it needs to be

Personal examples of big-wins in automation

My entire career has centred on improving the efficiency of technical processes and automating routine tasks. I’d look at tasks that are typically done on a routine basis, analyse how they work (determine the logic) and ultimately write an application that performs the task automatically. This does typically remove the human from the process and this is a good thing! (they don’t lose their job, continue reading…)

Routine operating system patch management

I was able to automate a big part of what the operations team would otherwise routinely perform for operating system patch management

The logic I observed looked like this:

Log in to the AWS management console and remove a target-group (of which there were two) from the application load-balancer
Wait 1 or 2 minutes for all the previous HTTP requests to return responses
SSH into each of the GNU/Linux servers in the target-group
Run the commands to update the installed operating system packages
Restart the servers
Send in some “warm-up” HTTP requests
Check the target-group back into the application load-balancer
Repeat the exact process with the other target-group

My initial thoughts were that it should be very easy to automate and it turned out to be exactly that. I watched someone on the operations team perform this task once just to ensure I had the correct idea of exactly how this worked

AWS has very comprehensive APIs wrapped around all of their infrastructure resources and they ship very extensive client libraries (so integration is often very easy to do)

It’s also very easy to execute the apt-get commands and have them run in non-interactive mode. I could also treat specific packages more carefully and ensure that only minor versions of these packages are updated (major versions would require the approval of a human who I’d assume has done the appropriate testing beforehand)
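To make the minor-versus-major rule concrete, here is a small Python sketch. It is not the original implementation (that used BASH wrappers, described below) and the function names and package versions are made up. It assumes the installed and candidate versions have already been read, for example from the output of apt-cache policy, and that the leading number of the upstream version is the major version

```python
import re
from typing import Dict, List, Set, Tuple


def major_version(version: str) -> int:
    """Return the leading number of a Debian-style version, ignoring any epoch."""
    upstream = version.split(":", 1)[-1]  # "1:2.3.4-1" becomes "2.3.4-1"
    match = re.match(r"(\d+)", upstream)
    if match is None:
        raise ValueError(f"cannot find a major version in {version!r}")
    return int(match.group(1))


def plan_upgrades(
    versions: Dict[str, Tuple[str, str]], sensitive: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split packages into those safe to upgrade and those needing approval.

    versions maps package name to (installed, candidate).
    """
    auto: List[str] = []
    held: List[str] = []
    for package, (installed, candidate) in sorted(versions.items()):
        is_major = major_version(candidate) != major_version(installed)
        if package in sensitive and is_major:
            held.append(package)
        else:
            auto.append(package)
    return auto, held


def upgrade_command(packages: List[str]) -> List[str]:
    """Build a non-interactive apt-get command that only upgrades the given packages."""
    return ["env", "DEBIAN_FRONTEND=noninteractive",
            "apt-get", "-y", "install", "--only-upgrade", *packages]
```

The held list is what would be reported back to a human for approval, while the auto list can go straight into the command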

The application was split into two main components:

The PHP controller logic (used for more refined work like deregistering hosts from the load-balancer’s target-groups)
The BASH script wrappers around the GNU/Linux distribution’s package management (in this case, Apt)

The PHP controller was invoked by the cron scheduler (the task only needed to be performed weekly) and I defined a multidimensional associative array that contained the target-group names and which hosts were included in them

The PHP controller would then SSH into each host and execute the BASH script wrappers, which returned whether the package upgrades worked or required the attention of the operations team
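The original controller was written in PHP, but the overall flow can be sketched in Python for illustration. Everything here is an assumption: the target-group names and host addresses are invented, and the detach, patch_host, warm_up and attach callables stand in for the load-balancer API calls, the SSH session running the BASH wrapper, and the warm-up HTTP requests. Stopping as soon as a host reports a failure is also a choice made for this sketch

```python
from typing import Callable, Dict, List

# Hypothetical target-group names and host addresses, for illustration only.
TARGET_GROUPS: Dict[str, List[str]] = {
    "web-blue": ["10.0.1.10", "10.0.1.11"],
    "web-green": ["10.0.2.10", "10.0.2.11"],
}


def patch_groups(
    groups: Dict[str, List[str]],
    detach: Callable[[str], None],
    patch_host: Callable[[str], bool],
    warm_up: Callable[[str], None],
    attach: Callable[[str], None],
    drain: Callable[[], None] = lambda: None,
) -> Dict[str, bool]:
    """Patch one target-group at a time and return a per-host success flag."""
    results: Dict[str, bool] = {}
    for group, hosts in groups.items():
        detach(group)  # take the group out of the load-balancer
        drain()        # wait for in-flight HTTP requests to finish
        for host in hosts:
            results[host] = patch_host(host)  # upgrade packages and restart
        if not all(results[host] for host in hosts):
            break  # leave the group detached and stop so a human can look
        for host in hosts:
            warm_up(host)
        attach(group)  # put the group back before touching the next one
    return results
```

Because only one group is ever out of the load-balancer, the other keeps serving traffic, and a failed upgrade never reaches the second group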

I could have automated this process further, but even as it stood it dramatically improved the efficiency of the operations team, who now just waited for the results of the automation

What can be done or taught

If someone on the team could be negatively affected by the automation, there are things you can do to soften the blow

Running in parallel with the automation

The purpose of automation is not to put people out of a job, although no longer needing a human to perform a task can be an indirect result of deploying effective automation

It’s probably a good idea to have someone at the helm, but this does not mean you need a fully-fledged helpdesk. The people who would otherwise be performing the tasks manually can now focus on other tasks that have yet to be automated, and they can simply monitor the automation (checking alerts every so often or receiving push notifications)

As the automation becomes more effective, eventually no human will be required, but there will probably be a buffer between now and then, giving them time to follow the next point

Learning how to build automation and tooling

It doesn’t have to be doomsday. Learn how software works and build automation. You’d want to wrap your automation in web applications because you’ll find the web almost everywhere and the web makes it really easy to ship automation tooling