Mitch Wyle's Web Log: devops


Saturday, September 28, 2024

Cursed Knowledge

This idea, to record things we learn that we wish we never knew, is great. The collection of cursed knowledge nuggets is similar to, but narrower than, the "WTF" collection in my bullet journals. Thanks, Cory.

Sunday, August 4, 2024

Charity Majors on Pragmatic Leadership

If you work in technology and care about leadership, you should subscribe to Charity's blog. She does not post often, but when she does, her ideas are always fantastic, even when you don't agree with her. Here is her take on "politics" in the workplace and on focusing on what matters most.

Monday, September 4, 2023

Measuring Developer Productivity

This analysis by Kent Beck is worth sharing.

Measuring productivity is a complex and nuanced problem, and there is no single solution that will work for everyone. The best approach to measuring productivity depends on the specific needs of the organization, such as the goals of the measurement, the resources available, and the culture of the organization. Any attempt to measure productivity must be done carefully, as it can have unintended consequences, such as creating a culture of fear or mistrust. It is important to remember that productivity is not the only measure of success, and that other factors, such as quality and customer satisfaction, are important.

Sunday, October 23, 2022

API security: modelling, signing, encrypting, validating, rate limiting, TLA, error handling, auditing

Here is a fantastic and comprehensive summary of API security. Integrating all of these capabilities can simplify your implementation.
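To make two of those capabilities, signing and validating, a little more concrete, here is a minimal Python sketch of HMAC-SHA256 request signing with a replay window. It is not taken from the linked summary; the canonical string format, the function names `sign_request` and `validate_request`, and the 300-second skew limit are all assumptions chosen for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_request(secret: bytes, method: str, path: str, body: bytes, timestamp: int) -> str:
    """Return a hex HMAC-SHA256 signature over a canonical request string."""
    # Canonical form (an assumption): method, path, timestamp, body hash, newline-separated.
    body_hash = hashlib.sha256(body).hexdigest()
    message = f"{method.upper()}\n{path}\n{timestamp}\n{body_hash}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def validate_request(secret: bytes, method: str, path: str, body: bytes, timestamp: int,
                     signature: str, now: Optional[int] = None,
                     max_skew_seconds: int = 300) -> bool:
    """Reject stale or future-dated requests, then check the signature."""
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > max_skew_seconds:
        return False  # outside the replay window
    expected = sign_request(secret, method, path, body, timestamp)
    # compare_digest avoids leaking, via timing, how much of a forged signature matched.
    return hmac.compare_digest(expected, signature)
```

Client and server must agree on exactly how the canonical string is built; any difference (header order, path encoding, body bytes) makes every signature fail, which is usually the hardest part to get right in practice.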

Sunday, August 21, 2022

Agile & Lean as simple as possible

I have always found the process of distilling important concepts down to clear, well-understood slogans or sound bites to be very difficult. Each audience of the idea and each person within the audience has a different context and a different meaning for the words or pictures you are trying to convey. However, I think my friend Michael has published a clever, unambiguous insight that most of us can understand.

The Agile Manifesto, and the whole agile fashion trend that followed it, tries to empower developers, who are really "the means of intellectual property production and delivery" of an enterprise, to execute efficiently and effectively. Michael has formulated a single question and corollary that can enable anyone to improve. To paraphrase Michael's single question:

Are you currently, with certainty, working on the single most-important thing you need to do to move your effort towards success?

If yes: good! Carry on!

If no: Is it (a) because you're distracted/impeded or (b) because you don't know what the most important thing is?

In case it's (a): removing the distraction or impediment is now the most important thing [you need to do]. In case it's (b): finding out about the most important thing is now the most important thing [you need to do]. Communication is key in both cases: knowing, finding, and involving the right people.

A similar analysis of this idea is in the book The One Thing.

Sunday, August 14, 2022

Why you should pay for your continuous integration system

The folks over at SymOps have published a pretty good, though highly opinionated, summary of the current state of Continuous Integration (CI) systems, including CI-as-a-Service. They come out very strongly in favor of buying a service and make compelling arguments. Their post is short and worth a read.

chat ops: yetibot, kubiya, AirBnB

https://yetibot.com/

https://kubiya.ai

AirBnB Writes Their Own:

Everyone associated with incident management hates all of the systems they are forced to use as part of the tracking workflow, with their most vehement, white-hot hatred reserved for Jira. So everyone tries desperately to avoid Jira any way they can, and many on-call folks write or integrate what they label "automation" to push an incident through its lifecycle in Jira for them. I suspect one secret of PagerDuty's and Slack's success is their lighter, friendlier, easier way of implementing an incident workflow, along with their built-in, friendly integrations with ServiceNow and Jira.

Service operators use about 15,000 free, open-source "bots," originally written for Internet Relay Chat (IRC) and now mostly ported to Slack, to facilitate communication, diagnosis, resolution, and incident-workflow data entry via chat. However, because very few developers can read existing code, and most treat each problem they encounter as a reason to write new, buggy code, more bots are written every day. AirBnB has published an account of their adventure writing a Slack chatbot that drives PagerDuty and Jira through a few tasks as a side effect of chatting about incidents. Their bot has no real automation, but it has a few Slack forms to facilitate Jira workflow data entry. It seems they have only just started developing it.

I know of two very powerful automation platforms based on Slack bots that are worth learning and using. Yetibot is a free, open-source shell environment with a rich set of commands, scripting statements, Unix pipes, etc. It has fantastic integration with Slack and Jira (among many other systems). If you are looking for a free, powerful chat bot, check out Yetibot. Kubiya is a commercial (non-free) chat bot that provides true automation, structured workflows, and extremely safe, powerful built-in modules and interfaces. Kubiya provides guided data entry and safe, restricted, self-service infrastructure automation workflows in Slack. The coolest part about Kubiya is that it hides many of the details of Terraform and complex policies, enabling a much more natural way to just "chat" with (type at) a system that guides and assists the human through relatively difficult tasks. For commercial enterprises with overworked service operators, Kubiya is definitely worth evaluating.
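Yetibot's Unix-style pipes are what make a chat bot feel like a shell. The sketch below is a hypothetical, stripped-down illustration of that idea in Python, not Yetibot's implementation or API: the command names in `COMMANDS` are invented, there is no Slack integration, and a `|` inside an argument is not escaped.

```python
from typing import Callable, Dict

# A hypothetical command receives the previous stage's output and its own argument text.
Command = Callable[[str, str], str]

COMMANDS: Dict[str, Command] = {
    "echo": lambda piped, args: args,
    "upper": lambda piped, args: piped.upper(),
    "grep": lambda piped, args: "\n".join(
        line for line in piped.splitlines() if args in line),
    "count": lambda piped, args: str(len(piped.splitlines())),
}


def run_pipeline(message: str) -> str:
    """Evaluate a chat message such as 'echo a b | upper' one stage at a time."""
    output = ""
    for stage in message.split("|"):
        name, _, args = stage.strip().partition(" ")
        command = COMMANDS.get(name)
        if command is None:
            return f"unknown command: {name}"
        output = command(output, args.strip())  # feed each stage's output to the next
    return output
```

For example, `run_pipeline("echo incident-42 | upper")` returns `"INCIDENT-42"`. Small, composable commands are what let a handful of primitives cover many operator tasks without writing a new bot for each one.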

Who should write your terraform?

". . .make decisions about where the work goes with your eyes open about what the risks, trade-offs, and systemic preferences of your business are."

Here is a fantastic explanation and history lesson about the evolution and nuances of DevOps, why it is highly contextual, and great guidelines for deciding where in your organization to put responsibility for "Infrastructure as Code" (Terraform).

Delivery Lead Time, aka code velocity

If you care about software development for operations (DevOps) and the DevOps Research and Assessment (DORA) guidelines, you know about the four DORA metrics. Following up on his excellent post about DORA metrics in 2022, published earlier this year, Logan Mortimer has written a deep analysis of Delivery Lead Time that's worth your time to read and understand, especially if you have read Accelerate and want to implement DORA metrics in your organization.
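DORA's lead time for changes is the time from a commit to that change running in production. Here is a minimal Python sketch, assuming you can already pair each change's commit timestamp with its production deploy timestamp (that data source is hypothetical here) and summarizing with the median, which is one common choice rather than necessarily Mortimer's method.

```python
from datetime import datetime
from statistics import median
from typing import Iterable, List, Tuple

# Each change is a (commit_time, production_deploy_time) pair; the source of these is assumed.
Change = Tuple[datetime, datetime]


def lead_times_hours(changes: Iterable[Change]) -> List[float]:
    """Delivery lead time per change, in hours: deploy time minus commit time."""
    return [(deployed - committed).total_seconds() / 3600 for committed, deployed in changes]


def median_lead_time_hours(changes: Iterable[Change]) -> float:
    """Median lead time; less distorted than the mean by a few changes that sat for weeks."""
    hours = lead_times_hours(changes)
    if not hours:
        raise ValueError("no changes to measure")
    return median(hours)
```

Changes that took 4, 24, and 2 hours give a median of 4 hours, while the mean would report 10; that gap is one reason to be careful which single number you publish.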

Sunday, July 31, 2022

. . . and libyears to go before I sleep

A link to this idea appeared in my feed. I have noticed that most people gravitate towards single-number summaries such as (shudder) the arithmetic mean, so the idea does have some merit. And the measurement is very easy to drop into any set of repositories.
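The libyear number is the sum, over all dependencies, of the time between the release of the version you use and the release of the newest version. A minimal Python sketch, assuming the release dates have already been fetched from a package registry (not shown) and using 365.25 days per year (my choice):

```python
from datetime import date
from typing import Dict, Tuple

DAYS_PER_YEAR = 365.25  # assumption: average Gregorian year length


def libyears(dependencies: Dict[str, Tuple[date, date]]) -> float:
    """Sum of (latest release date - in-use release date) across dependencies, in years."""
    total_days = 0
    for _name, (in_use_released, latest_released) in dependencies.items():
        # A dependency already on its newest release contributes zero.
        total_days += max((latest_released - in_use_released).days, 0)
    return total_days / DAYS_PER_YEAR
```

Running something like this per repository gives one trend line per codebase, which is exactly the kind of single-number summary people gravitate towards.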

Apologies to Robert Frost for the title.

The woods are lovely, dark and deep,
But I have promises to keep,
And miles to go before I sleep,
And miles to go before I sleep.

Complexity & Chaos, cause Crushing Complications

Do you remember when our production systems were less complicated and easier to debug? Production issues were easy to diagnose, and recovery was always faster. I remember writing an application in a single page of HTML that used server-side include directives in the web server. The entire application (order form, order acknowledgment receipt, order processing) is 200 lines of HTML, 4.6KB. Sigh. Pete Hodgeson at Honeycomb wrote an interesting analysis of the giant leaps backwards we have made in the last 25 years by piling so much unneeded complexity onto our systems. Someday I may create a conspiracy theory based on the cui bono (who profits) of this ridiculous inefficiency. I recommend browsing through Hodgeson's article, but not my HTML code or my article.

Friday, July 29, 2022

Backups are much worse than useless

In information technology (IT) and software engineering, operators and DevOps folk define their "backup" systems and policies. However, they infrequently or never state their data restoration service levels. The only measurement that an end-user cares about is: how quickly, and from which times in the past, can you successfully recover the state of my data? And, if no data restores are needed, how do you test that your recovery service levels are met or exceeded as my data volume grows?

Making many copies (backups) of data frequently and storing all of those copies (or differences) costs a significant amount of network, compute, and storage, especially for longer retention periods. Therefore, the IT or DevOps team should discover and deliver only the restoration requirements their end-users need, and for which they are willing to pay or be cost-allocated.

We are in the recovery business, not the "backup" business. I consider "backup" a terrible word because it normally means only "untested, expensive copy." Therefore, I personally never use that word and instead talk about our recovery capabilities and service levels. And I always try to implement an automated test system that verifies these data restoration service levels are actually met.
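As an illustration of that kind of automated test, here is a minimal Python sketch that makes a compressed copy, records a checksum manifest at copy time, restores into a scratch directory, and passes only if every file matches and the restore finished within a target time. The function names, the tar.gz format, and the `max_restore_seconds` target are assumptions; a real test would also check the recovery point (the age of the newest restorable copy) and run against production-sized data.

```python
import hashlib
import tarfile
import tempfile
import time
from pathlib import Path
from typing import Dict


def checksums(root: Path) -> Dict[str, str]:
    """Map each file's path, relative to root, to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def make_copy(source: Path, archive: Path) -> Dict[str, str]:
    """Write a gzipped tar of `source`; return the checksum manifest taken at copy time."""
    manifest = checksums(source)
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=source.name)
    return manifest


def verify_restore(archive: Path, manifest: Dict[str, str], name: str,
                   max_restore_seconds: float) -> bool:
    """Restore into a scratch directory; pass only if data matches and the time target is met."""
    with tempfile.TemporaryDirectory() as scratch:
        started = time.monotonic()
        with tarfile.open(archive, "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                tar.extractall(scratch, filter="data")  # safer extraction where supported
            else:
                tar.extractall(scratch)
        elapsed = time.monotonic() - started
        restored = Path(scratch) / name
        return elapsed <= max_restore_seconds and checksums(restored) == manifest
```

Comparing against the manifest recorded at copy time, rather than against the live data, keeps the test meaningful after the source has changed; the sketch ignores files that change while the copy is being made.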

Sunday, July 24, 2022

Alerts & alerting again

Dan Ravenstone published a refresher on alerting that's worth a quick listen or skim. Alerts are supposed to help us improve a service by detecting issues that affect our customers sooner, and they should also help us diagnose an issue quickly so that it can be mitigated, alleviating customer pain. Yet most alerts are false positives, paging us out of bed for a problem that does not exist. Other alerts carry no information about what is causing the issue, just a vague alarm that something is wrong. And, of course, because of the alerts we lack, our customers, rather than our alerting system, are most often the ones who tell us our service is not working.
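One common way to cut false positives and content-free pages is to page only on a sustained condition and to put diagnostic context in the page itself. The Python sketch below assumes hypothetical one-minute aggregates (`Window`) and an arbitrary 5% error-ratio threshold over five consecutive windows; it illustrates the idea and is not Ravenstone's recommendation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Window:
    """Hypothetical one-minute aggregate of request outcomes for a service."""
    requests: int
    errors: int
    top_failing_endpoint: str


def evaluate_alert(windows: List[Window], max_error_ratio: float = 0.05,
                   sustained_windows: int = 5) -> Optional[str]:
    """Return a page message only if every one of the last N windows breaches the threshold."""
    recent = windows[-sustained_windows:]
    if len(recent) < sustained_windows:
        return None  # too little data to tell a blip from an outage
    ratios = [w.errors / w.requests if w.requests else 0.0 for w in recent]
    if min(ratios) <= max_error_ratio:
        return None  # at least one healthy window: likely a transient spike
    worst = max(recent, key=lambda w: w.errors)
    # Tell the responder where to start looking, not just that something is wrong.
    return (f"Error ratio above {max_error_ratio:.0%} for {sustained_windows} consecutive "
            f"windows (latest {ratios[-1]:.1%}); most errors on {worst.top_failing_endpoint}")
```

The trade-off is detection delay: requiring five bad minutes means a real outage pages about five minutes later, which is usually a better deal than waking someone for every blip.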

Sunday, July 3, 2022

Systems Administration & Ops IT is not DevOps

I love rants like this one. All of the points the author makes are well-taken. Just as we re-labeled manual testing "software test engineering" or "quality assurance," so too are we now calling systems administration and data center operations "DevOps," as though the label itself will reap the benefits of true software development and maintenance for better operability.

https://leebriggs.co.uk/blog/2022/06/21/devops-is-a-failure

Sunday, June 26, 2022

InfoQ on DevOps Trends 2022

There is not much actionable information in this new InfoQ publication, and the odd 2022 fashion trend of confusing concepts such as observability with unrelated technologies (WASM, really?) continues to accelerate. This trend reminds me of tech venture capitalists' use of buzzwords in 1997. My favorite was "Java Compliant."

Wednesday, June 15, 2022

Grafana at KubeCon 2022

Here is a collection of KubeCon 2022 talks given by folks from Grafana Labs.

if you must: node.js in docker

Bret Fisher has updated his comprehensive review of node.js in Docker, covering base image selection, CI, CD, etc. I am not a fan of node, but it is very popular, so I, and everyone else, need to support it. And since k8s domination is sweeping the fashion trends around the world, we must all bow to our container masters and support these heavy frameworks as well. I disagree with Bret's analysis of security concerns for a container base distro. I firmly believe there is no attack surface like no attack surface, so I personally prefer the distroless images maintained by the evil search giant (ESG). Bret's analysis is a point-in-time snapshot of a moving target, so his choices need to be tracked and re-analyzed over time. In general, one should make a selection based on the long-term viability and projected maintenance support of one's components (get on a train and stay on it for a few years). But his analysis is fantastic and worth reading.

Cloud domination

I love reading rants like this one about how and why entire developer environments are moving off the desktop and into the cloud. As an old fart, I remember when we had telephone switches, answering machines, and call-routing devices in our homes and offices, and how quickly they moved up to the phone company's "central" office. I also remember X-Terminal hardware, where all environments, not just developer systems, were in the cloud, somewhat akin to Chromebooks today. Personally, I have always been comfortable developing on remote systems because I always used command-line terminals that accessed remote systems, and I rarely developed locally until IDEs conquered most of my terminal use for coding.

Monday, May 30, 2022

Heroes can help fix Hero Worship anti-pattern

I frequently rail against the all-too-common anti-pattern of "Hero Worship Culture" in DevOps. Heroes don't scale. Hero worship disincentivizes high-quality, reliable software and focuses on heroic efforts to repair bad code. My litany is long.

Here is an interesting idea, articulated through a basketball analogy that does not always work. The author gives prescriptive methods for the heroes themselves to systematize reactive support, incident management, and DevOps culture.

KubeCon EU summaries

(Horizontal Pod Autoscaling)

The EU version of KubeCon was last week; some good conference read-outs and summaries are starting to emerge. Here is one good summary. Please leave a comment with others you have found.
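Since horizontal pod autoscaling came up, its core rule, as documented by Kubernetes, is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no change when that ratio is within a tolerance (0.1 by default) of 1.0. The Python sketch below is a simplification: it ignores pod readiness, missing metrics, multiple metrics, and scale-down stabilization, and the `min_replicas`/`max_replicas` defaults are arbitrary.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA scaling rule: scale proportionally to metric/target, then clamp."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: avoid flapping
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, four pods averaging 90% CPU against a 60% target scale to six, while 63% against 60% stays at four because it is within the tolerance.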
