Mitch Wyle's Web Log: 2020 Week 6 Dev-Ops Scouting Update

Sunday, February 9, 2020

2020 Week 6 Dev-Ops Scouting Update

I have decided to post my dev-ops musings and scouting on this public Internet blog instead of internally at work, because nothing here is specific or proprietary to my employer and because a few friends have asked for my thoughts. I welcome all feedback, especially the negative kind (criticism).

TL;DR

Gene Kim hawks his new book The Unicorn Project on the Ship Happens podcast

Accenture's Markos Rendell wrote a summary of Team Topologies (slides here)

Rick Branson explains why we should never count bugs / incidents

Great talk on Terraform without the mess by James Nugent, one of the Terraform authors

Presentation on controlling costs for no-ops (Google Cloud Functions, AWS Lambda)

Here is a short deck about fun and infrequently used features of YAML

Netflix has open-sourced the riskquant Python library to help you quantify risks

ap4k has rebranded itself as dekorate and is in use by some teams where I work

I assume everyone knows that software and services always take on the design of the organization that ships them. Feature sets, menu hierarchies in the user experience, and microservice decomposition all align with the shape of the organization that produces them. This phenomenon is known as "Conway's Law." In his new book, Gene Kim explores other such phenomena and gives some prescriptive advice for creating a high-performing engineering team. Much of the book is fluff and common sense, but there are numerous counter-intuitive phenomena and some good, easy-to-implement practices.

#

On the topic of organizing a high-performing team, a pair of dev-ops authors, Matthew Skelton and Manuel Pais, recently wrote another interesting book, Team Topologies, on "cognitive load" and on org structures that optimize for your product's success measures. The slides from their October 2019 presentation are very approachable. Markos Rendell's summary of the book is dry and comprehensive. The book itself has some important concepts and should be skimmed by senior leaders in a position to design dev-ops organizations.

#

Teams will always "game the system" when their success is measured by bug counts or incidents. If the goal is fewer bugs and fewer incidents, they will go to great lengths to hide their bugs and incidents from tracking. If the goal is "yield" (the ratio of bugs we find before customers do to the number of customer-reported bugs), engineers will stuff the bug tracker with trumped-up, ticky-tack P6 bugs. Reactive support organizations will push every 30-second customer interaction through their ticketing system and then demand more headcount because of the flood of tickets.

Don't count bugs or incidents. Just don't! A count is the wrong single-number summary. Instead, consider measurements taken from the customer's point of view: service level indicators (SLIs). Severe, ship-stopping, or money-losing bugs and incidents should be aggregated and combined with a few other measurements into the single-number summary Management wants.
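To make "measure from the customer's point of view" concrete, here is a minimal Python sketch of two common SLIs, each computed as good events divided by total events. The Request record, the rule that only 5xx responses are failures, and the 300 ms latency threshold are my assumptions for illustration, not a standard.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Request:
    """One customer request as observed at the service edge (hypothetical record)."""
    status: int        # HTTP status code returned to the customer
    latency_ms: float  # end-to-end latency the customer experienced


def availability_sli(requests: Sequence[Request]) -> float:
    """Good events / total events, where 'good' means no server-side (5xx) error."""
    if not requests:
        return 1.0  # no traffic in the window: treated as meeting the target (a policy choice)
    good = sum(1 for r in requests if r.status < 500)
    return good / len(requests)


def latency_sli(requests: Sequence[Request], threshold_ms: float = 300.0) -> float:
    """Fraction of requests answered within threshold_ms (an assumed target)."""
    if not requests:
        return 1.0
    fast = sum(1 for r in requests if r.latency_ms <= threshold_ms)
    return fast / len(requests)


if __name__ == "__main__":
    window = [
        Request(200, 120.0),
        Request(200, 450.0),
        Request(503, 80.0),
        Request(404, 900.0),
        Request(200, 250.0),
    ]
    print(availability_sli(window))  # 0.8: four of five requests avoided a 5xx
    print(latency_sli(window))       # 0.6: three of five finished within 300 ms
```

A 404 counts as "good" here because the service behaved correctly. Deciding what belongs in the numerator is exactly the customer-facing judgment an SLI forces on you, and the result is harder to game than a ticket count.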

“What can be counted doesn’t always count, and not everything that counts can be counted.”

— William Bruce Cameron.

Terraform has evolved rapidly; James Nugent believes we should apply the "Composition Root" software pattern to factor our Terraform modules. In a recent talk he makes very strong and convincing arguments, especially in light of security and separation of concerns.
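The Composition Root idea comes from dependency injection: components receive their dependencies as inputs, and a single place near the entry point wires them together. Because the pattern is language-agnostic, here is a minimal Python analogy rather than Terraform code; every module and name in it is hypothetical. One way to map it onto Terraform: each function is a child module with input variables and outputs, and composition_root plays the root module that passes one module's outputs into another's inputs, instead of letting child modules look things up on their own.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class NetworkOutputs:
    vpc_id: str
    subnet_ids: Tuple[str, ...]


@dataclass(frozen=True)
class DatabaseOutputs:
    endpoint: str
    subnet_ids: Tuple[str, ...]


def network_module(name: str) -> NetworkOutputs:
    """Hypothetical child 'module': builds a network and exposes its outputs."""
    return NetworkOutputs(vpc_id=f"vpc-{name}", subnet_ids=(f"{name}-a", f"{name}-b"))


def database_module(name: str, subnet_ids: Tuple[str, ...]) -> DatabaseOutputs:
    """Hypothetical child 'module': receives the subnets it needs as inputs
    instead of discovering the network by itself."""
    return DatabaseOutputs(endpoint=f"{name}.db.internal", subnet_ids=subnet_ids)


def composition_root(env: str) -> Dict[str, str]:
    """The one place that knows how the modules fit together, like a root module."""
    net = network_module(env)
    db = database_module(f"{env}-orders", subnet_ids=net.subnet_ids)
    return {"vpc_id": net.vpc_id, "db_endpoint": db.endpoint}


if __name__ == "__main__":
    print(composition_root("staging"))
```

Because database_module never reaches out for its dependencies, it can be reused or tested with any subnets, and only the root needs broad knowledge of the whole system. That is one way to read the security and separation-of-concerns argument.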

#

The reason you are using serverless cloud functions is to drive operations cost toward zero (no-ops). You were wise enough to realize that over 80% of the total cost of your software during its lifetime is maintenance and operation. Now you are looking at customer experience and that hefty public cloud bill, with an eye to cutting both latency and cost. How can you further reduce your cloud function costs? This presentation explains where to look. Hint: don't use Java or another JVM language if you can avoid it.
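The arithmetic behind "where to look" is simple. Many function-as-a-service platforms bill per invocation plus per GB-second (memory allocated times execution time). Below is a minimal sketch assuming that model; the rates are placeholders, not any provider's current prices, and free tiers, duration rounding, CPU-specific charges, and network egress are ignored.

```python
def monthly_function_cost(
    invocations: int,
    avg_duration_ms: float,
    memory_mb: int,
    price_per_million_requests: float,
    price_per_gb_second: float,
) -> float:
    """Rough monthly cost under a 'requests + GB-seconds' pricing model."""
    # Compute is billed on memory allocated multiplied by time spent running.
    gb_seconds = invocations * (avg_duration_ms / 1000.0) * (memory_mb / 1024.0)
    request_cost = invocations / 1_000_000 * price_per_million_requests
    compute_cost = gb_seconds * price_per_gb_second
    return request_cost + compute_cost


if __name__ == "__main__":
    # Placeholder rates for illustration only; check your provider's price list.
    rates = dict(price_per_million_requests=0.20, price_per_gb_second=0.00001)
    # Hypothetical workloads: same traffic, different memory and duration profiles.
    heavy = monthly_function_cost(10_000_000, 1000.0, 1024, **rates)
    light = monthly_function_cost(10_000_000, 100.0, 256, **rates)
    print(f"heavy: ${heavy:.2f}, light: ${light:.2f}")
```

Because the compute term scales with memory times duration, a runtime that needs more memory and takes longer to start and run multiplies the bill; that is the intuition behind the JVM hint.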

A good craftsman understands the breadth, depth, and capabilities of the tools she uses. Learn and use more of YAML's interesting features by reading this fun deck.

I have frequently tried, without much success, to explain risk analysis and prioritization to people. Netflix has open-sourced "riskquant" (get it?), a Python library that helps risk practitioners do quantitative risk analysis. It's not just financial services institutions that need risk analysis; software and service failures carry hefty economic impact as well.
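One common way to quantify a single risk is to multiply its annual probability of occurring by the expected loss when it does occur, modeling that loss as a lognormal distribution fitted to a low/high estimate. The sketch below uses only the standard library; it is roughly the kind of model tools like riskquant are built around, but it is not riskquant's API or exact model, and treating low/high as the 5th and 95th percentiles is my assumption.

```python
import math
from statistics import NormalDist

# z-score of the 95th percentile; a central 90% interval of a normal spans mean ± Z90 std devs.
Z90 = NormalDist().inv_cdf(0.95)  # about 1.645


def expected_annual_loss(annual_probability: float, low_loss: float, high_loss: float) -> float:
    """Annual probability of the event times the mean of a lognormal loss whose
    5th and 95th percentiles are low_loss and high_loss (an assumed 90% interval)."""
    if not 0.0 <= annual_probability <= 1.0:
        raise ValueError("annual_probability must be between 0 and 1")
    if not 0.0 < low_loss <= high_loss:
        raise ValueError("need 0 < low_loss <= high_loss")
    log_low, log_high = math.log(low_loss), math.log(high_loss)
    mu = (log_low + log_high) / 2              # log-space midpoint = log of the median loss
    sigma = (log_high - log_low) / (2 * Z90)   # log-space spread implied by the interval
    mean_loss = math.exp(mu + sigma ** 2 / 2)  # mean of a lognormal distribution
    return annual_probability * mean_loss


if __name__ == "__main__":
    # Hypothetical risks: (name, annual probability, low loss $, high loss $).
    risks = [
        ("primary database outage", 0.3, 10_000, 500_000),
        ("TLS certificate expiry", 0.6, 1_000, 20_000),
    ]
    ranked = sorted(risks, key=lambda r: expected_annual_loss(*r[1:]), reverse=True)
    for name, p, low, high in ranked:
        print(f"{name}: ~${expected_annual_loss(p, low, high):,.0f} per year")
```

Ranking by expected annual loss gives a defensible prioritization order. The long right tail of the lognormal is why a rarer but expensive outage can outrank a more frequent nuisance.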

#

Dekorate generates your Kubernetes (k8s) configurations for you when you include its dependency on your Java classpath; you can customize the generated config by adding an annotation to your code or by setting an application property.
