How to Fix What Plagues Cloud Native

Semaphore
Mar 19, 2019

R̵e̵d̵ ̵H̵a̵t̵ AWS developer advocate Michael Hausenblas is on the road 70 percent of his time, yet he managed to carve out some time to speak to us about what ails cloud native and what to do about it. Hint: aligning the Dev with the Ops inside organizations can go a long way in solving many deployment issues.

You’re a developer advocate and a Cloud Native ambassador. You’ve also authored a couple of books on Kubernetes and DevOps practices in general, and you hold a Ph.D. in computer science. How do you combine the knowledge-sharing and community-focus part of your activities with the practical/technical aspects of your job?

At the end of the day, it all boils down to communicating good practices. Given that in cloud native land we’re still in the early days, oftentimes this means establishing or capturing them. The most important part is to listen actively and carefully, and to try to understand the problems folks are facing. Is it something that comes from the fact that someone is using the technology in a way in which it wasn’t meant to be used? Maybe someone is trying to keep applying methods that used to be valid in an old-style monolithic set-up, but that don’t make sense with containerized microservices? Or maybe the technology itself has a developer experience (DX) issue, potentially even a bug.

Sometimes, and not very surprisingly given that we’re operating in a fast-moving space, it’s simply about raising awareness about tooling or good practices. While I do spend a certain amount of time (maybe one to two hours per day) on StackOverflow, Slack and the like to help folks one-on-one, I really try to focus on things that are scalable: a blog post, a video walkthrough or simply a well-documented demo up on GitHub. These are the things that typically benefit many people and turn out to be more efficient in the long run.

Nevertheless, direct interaction with end users is crucial, and most of that I get when I’m traveling, which is some 70 percent of my time, be it at meetups, at conferences or with customers on-site. Why is this crucial? Because it’s only while being exposed to specific real-world problems users are facing, and getting a chance to discuss the challenges and the underlying motivations pertaining to them, that I can provide useful feedback to the engineering and product management teams.

You’ve created a lot of advocacy pages for people interested in the modern ways of doing DevOps, and one of them seems to stand out (not only because of the intriguing URL). The “Boring is cool” website is basically a quick guide to how to start introducing best practices when it comes to DevOps inside organizations. You mention that the tech part is actually the “easy stuff” of the process, and what’s really difficult is preparing the people inside the organization for the shift.

What are the best incentives for developers, software ops professionals and the business itself? Is it all about speed and agility, or is it rather about a more balanced and happier “business as usual”?

First, let me say that I don’t believe in best practices, but only in good practices. Our field is way too young and fast moving for anyone to claim a single best way of doing things. However, it does make sense to document things that have worked well in specific situations with specific requirements and constraints.

Coming back to your question concerning incentives: I’m with Courtney Eckhardt, who made the case in her 2016 talk “How Should We Ops” that if the incentives of your developers and ops folks are not aligned, then all the tooling in the world won’t help you. In a nutshell, developers are incentivized to produce features (that is, to change things), and ops folks to keep stuff running (that is, to prevent change). If you don’t find a way to make both groups work in the same direction, the latest cloud native set of tools will not fix this. This starts with mutual awareness, understanding and appreciation of the respective roles. And while many ops folks have taught themselves how to program, I’d argue that only recently with the rise of Kubernetes have developers been exposed to and (to a certain degree) been forced to learn and deal with “traditional” ops topics. They have to take a look at some of the Kubernetes primitives, such as deployments and services, and that’s exactly how ops folks think about rolling out a new version of an app or load-balancing traffic.

On the same site, you mention that one of the basic requirements of shipping more often than not is to “have a CI/CD pipeline and know how to use it” as well as automating your CI/CD practices. We also see various industry reports and surveys proving that the wide adoption of CI/CD tools inside software-centric companies is still at an early stage. What might be the cause of this?

I think the issue is even deeper than that. Some of the traditional organizations I’ve come across are facing even more basic issues: e.g., where is the source code managed? Many of us use Git and managed services around it, but that doesn’t mean one should be surprised to find a bricks-and-mortar company amid a digital transformation campaign that is still keeping its source code on SharePoint.

That’s why, when I’m with customers, I always suggest getting the boring chores done first. That includes having all your source code as well as the infrastructure configuration in a DVCS such as Git, automating all the testing and integration infrastructure and having a secure and performant place for the artifacts that are supposed to be deployed.

I’m fully aware that this is not the exciting part; it’s like advising someone who is interested in driving a car at 220 mph to focus on building a proper, wide enough street first. It remains a fact of life, however, that driving fast (and safely) on a narrow and poorly maintained road is very dangerous, if not outright impossible.

A big part of your work is staying updated with the latest technologies and how they are influencing the everyday work of developers and DevOps professionals. One of the trends is going serverless, and the two most common questions it raises are, “When should we go serverless?” and, “Who will be responsible when services go down?”

What’s your take on the second question, and what are the industry trends for tackling this that you’ve observed so far?

There are many offerings that one can consider to be serverless, and Subbu Allamaraju has a great post on this topic. I will focus on Function-as-a-Service (FaaS), such as AWS Lambda or the like, in addressing your question. Let me start out by saying that many of the observations are nevertheless equally true for containerized microservices. So, in general, the question is: who’s wearing the pager? The person who wrote the code running in a function? Or if not, who else?

It boils down to the fact that in environments where the platform (FaaS or a Kubernetes cluster) is managed remotely by someone potentially outside of your organization, you will have to make a decision as to who’s responsible for running the code. One policy could be that the developers themselves are the ones on call and responsible for taking care of the application-level day-two ops. Other options, such as those advocated by Google’s Site Reliability Engineering (SRE) model, introduce additional, dedicated roles that support and coach developers in taking care of the operational issues.

The main point is that as the traditional administrator role (being responsible for infra-provisioning as well as operating your app) changes, organizations need to rethink the roles of developers. I’ve been promoting the term appops, originally coined by Bryan Liles, to suggest that there needs to be at least some operational awareness within developer teams, and that, in the more extreme cases, developers need to be actually responsible for the day-two operations themselves.

The good news is that serverless environments specifically provide developers with a lot of excellent tooling, from observability (metrics, tracing) to controlling releases and traffic. The bottom line is, however, that organizations adopting serverless (and containers alike) need to acknowledge this transition in terms of responsibilities and have to come up with clear guidance. Just renaming a role to something that contains the term DevOps won’t cut it in the long run.

As a developer advocate you learn a lot about developer experience (DX). What do you think are the crucial factors for a developer tool/framework that needs to succeed in a highly competitive environment? Is it all about speed, or UX, or the community around it, or all of these factors combined?

Yes, DX is super-important, and it’s also not easy to get it right. If you look at any given tool — let’s say a CLI such as “kubectl” — the people developing the tooling have to deal with two challenges: being aware of and applying state-of-the-art paradigms in the respective area (see, for example, this excellent good practice guide for CLIs), and their own biases and assumptions about where or how the tool is being used.

A basic yet real-world example demonstrates this: While Go is a great language for developing cross-platform tooling, there are certain things that are fundamentally different between, say, *nix systems and Windows, such as handling of environment variables. Now, if I’m a CLI tool developer and only ever use and test my tools in my dev environment, say macOS, I’ll likely not be able to feel the pain users on other platforms might be experiencing.
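To make the environment variable point concrete, here is a minimal Go sketch (an illustration added for this write-up, not code from the interview). It assumes the pain point is name casing and list separators: Windows treats variable names case-insensitively and separates PATH entries with `;`, while *nix systems are case-sensitive and use `:`. The helper `lookupEnvFold` is a hypothetical name, not part of any library.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
	"strings"
)

// lookupEnvFold finds a variable in a list of "KEY=value" entries while
// ignoring the case of the key. Hypothetical helper, not part of any library.
func lookupEnvFold(name string, environ []string) (string, bool) {
	for _, entry := range environ {
		key, value, found := strings.Cut(entry, "=")
		if found && strings.EqualFold(key, name) {
			return value, true
		}
	}
	return "", false
}

func main() {
	fmt.Println("running on:", runtime.GOOS)

	// Windows treats variable names case-insensitively; *nix does not,
	// so os.Getenv("path") is typically empty on Linux and macOS.
	fmt.Printf("Getenv(\"path\") is set: %t\n", os.Getenv("path") != "")

	// A case-insensitive lookup behaves the same on every platform.
	if value, ok := lookupEnvFold("path", os.Environ()); ok {
		// filepath.SplitList uses the platform's list separator
		// (':' on *nix, ';' on Windows) instead of a hard-coded one.
		for _, dir := range filepath.SplitList(value) {
			fmt.Println("  PATH entry:", dir)
		}
	}

	// os.UserHomeDir reads $HOME on *nix and %USERPROFILE% on Windows.
	if home, err := os.UserHomeDir(); err == nil {
		fmt.Println("home directory:", home)
	}
}
```

Running the same binary on macOS and on Windows would likely print different results for the plain `os.Getenv("path")` line, which is the kind of difference that stays invisible when a tool is only tested in one dev environment.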

In an open-source development context, this can go even further, potentially demotivating your contributors, or even preventing them from using your software. I had to learn this fact the hard way in one of my own projects, and this is something where technological diversity within the team goes a long way.

One of your projects is the cl0udn41v3 (Cloud Naive) Twitch stream, which looks at “all things infosec concerning *nix, containers, Kubernetes, and serverless computing” and was inspired by Bruce Schneier’s book “Click Here to Kill Everybody.”

Cloud native security is a fascinating, and at the same time scary, topic. While Schneier in his book calls out for governmental institutions to introduce more policies, you believe that “together we can tackle the threats the Internet+.” Could you elaborate a bit more? How much responsibility lies in the hands of open-source communities and cloud native vendors?

A lot of it is certainly awareness. I’m no Bruce Schneier, so I can only encourage folks to read the book, in which he makes great points. Given that our current systems (laptops, mobile phones, etc.) are already a challenge for most of us in terms of security, and now looking at the number of smart devices (from thermometers to personal assistants to refrigerators to cars) we increasingly use and depend on, it’s only getting worse.

Each of these devices is a networked computer, and it’s even less transparent for end users to assess what runs on it, and who has access to it, and the data it produces. So, education about this issue is the first step. Then, thanks to educated consumers demanding security, open-source communities can support the efforts to keep systems transparent, and cloud native vendors are incentivized to deliver the necessary features.

Article originally published on semaphoreci.com.