by Gene Kim, Kevin Behr, and George Spafford
Publisher: Information Technology Process Institute
ISBN: 0975568612
Number of Pages: 100
Date Published: June 15, 2005
VisibleOps is one of my favorite computer geek books of all time. This book is a no-nonsense, straightforward guide to running a highly successful IT department. But VisibleOps is not just some flavor-of-the-week self-help management book. The lessons and goals presented in VisibleOps are the culmination of years of observation and research by the authors, who noticed that successful organizations had IT departments that operated in very similar ways. This book is a distillation of those observations into a methodology that is easy for anyone in IT to grok. Loosely based on the ITIL framework, VisibleOps cuts straight to the chase with four basic steps.
The Four Steps of Visible Ops
- Phase 1. Stabilize the Patient
- Phase 2. Catch & Release and Find Fragile Artifacts
- Phase 3. Establish Repeatable Build Library
- Phase 4. Enable Continuous Improvement
Stabilize the Patient
In the first phase of VisibleOps, the goal is triage. Can you reduce the number and impact of outages? Some of the key ways to accomplish this goal are to implement and strengthen Change Management processes, only allow scheduled changes, and have a defined maintenance window.
Another huge benefit of the Change Management process that often gets overlooked is its ability to act as a communication tool and a way to publish a schedule of changes. With these processes in place, outage responders will have better visibility into:
- What changed?
- How to back out that change
Fragile Artifacts
The second phase is all about using a risk-based approach to identifying and cataloging critical systems. Some of the key indicators include:
- Systems with the highest Mean Time To Recovery (MTTR)
- Systems with low change success rates
- Systems with the highest downtime costs
But being able to understand and identify the cost of downtime requires understanding the business processes that each system supports. That is why this phase is based on the Configuration Management process and includes implementing a Configuration Management Database (CMDB). Once these processes are in place, you should see a reduction in variance, increased conformity in your systems, and easier detection of anomalies within the environment.
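To make the risk-based ranking a little more concrete, here is a minimal Python sketch. Everything in it is my own assumption for illustration: the `System` class, the system names and numbers, and especially the scoring rule (change failure rate times MTTR times hourly downtime cost). The handbook lists the indicators but does not prescribe this formula.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    mttr_hours: float              # Mean Time To Recovery
    change_success_rate: float     # fraction of changes that succeed, 0..1
    downtime_cost_per_hour: float  # business cost of an hour of downtime

    def risk_per_change(self) -> float:
        """Expected downtime cost of one change (an assumed scoring rule)."""
        failure_rate = 1.0 - self.change_success_rate
        return failure_rate * self.mttr_hours * self.downtime_cost_per_hour


def rank_fragile(systems: list[System]) -> list[System]:
    """Return the systems ordered from most to least fragile."""
    return sorted(systems, key=lambda s: s.risk_per_change(), reverse=True)


# Hypothetical inventory; names and numbers are invented for illustration.
inventory = [
    System("intranet-wiki", mttr_hours=1.0, change_success_rate=0.95,
           downtime_cost_per_hour=100),
    System("pos-gateway", mttr_hours=2.0, change_success_rate=0.90,
           downtime_cost_per_hour=25_000),
    System("order-db", mttr_hours=6.0, change_success_rate=0.70,
           downtime_cost_per_hour=10_000),
]

for system in rank_fragile(inventory):
    print(f"{system.name}: {system.risk_per_change():,.0f}")
```

With these made-up numbers, order-db lands at the top of the list even though pos-gateway has the higher hourly downtime cost, because its changes fail more often and take longer to recover from.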
Repeatable Build Library
In order to overcome the limitations imposed by the Fragile Artifacts, you must create a way to commoditize these systems. Phase three is all about implementing proper Build and Release Management processes to further reduce variance and increase your understanding of what your systems are actually doing. The thing that makes systems fragile in the first place is your lack of understanding about how those systems operate.
Once you are able to obtain that level of understanding, it is much easier to swap out interchangeable components than it is to improvise a resolution out of random troubleshooting steps when you can't really explain WHY those steps "fixed" the issue.
Continuous Improvement
You would think that phase four would be self-explanatory. It is anything but that. In terms of implementation, I have found that this can be the most difficult phase because it requires a major shift in the culture of most organizations. The VisibleOps Handbook provides some key indicators and metrics that can help track your progress on this journey. It does not, however, provide much advice on how to steer your Titanic to avoid icebergs along the way.
Reflection
The thing I love the most about the Visible Ops approach to ITIL, and to managing IT in general, is how corporeal it is. The word "visible" in the title obviously wasn't an accident; it is visible because the steps for implementation, the explanation of the methodology, really everything about it is so clearly evident that [almost] anybody should be able to thumb through this booklet and pick up some ideas that they can put to use right away and see results almost as fast.
Wow, actually got some traffic on my blog yesterday! Thanks to George and Gene for the RTs.

One thing I saw mentioned in The Phoenix Project was the phrase "technical debt", during chapter 15, page 164. In one of the phone conversations between Bill and Erik, they are discussing how to correctly implement step three of the Theory of Constraints (ToC): how to subordinate all processes to the capacity of the constraint.
"So, here's your homework," [Erik] says. "Figure out how to set the tempo of work according to Brent. Once you make the appropriate mapping of IT Operations work to work on the plant floor, it will be obvious. Call me when you've figured it out."
"Wait, wait," I say hurriedly before he hangs up. "I'll do the homework, but aren't we missing the entire point here? What caused all the unplanned work is Phoenix. Why are we focusing on Brent right now? Don't we need to address all the issues with Phoenix inside of Development, where all the unplanned work actually came from?"
"Now you sound just like Jimmy [John the CISO], complaining about things you can't actually control," he sighs. "Of course Phoenix is causing all the problems. You get what you design for. Chester, your peer in Development, is spending all his cycles on features, instead of stability, security, scalability, manageability, operability, continuity, and all those other beautiful 'ities.
"On the other end of the assembly line, Jimmy keeps trying to retrofit production controls after the toothpaste is out of the tube," he says, scoffing. "Hopeless! Futile! It'll never work! You need to design these things, what some call 'nonfunctional requirements,' into the product. But your problem is that the person who knows the most about where your technical debt is and how to actually build code that is designed for Operations is too busy. You know who that person is, don't you?"
I groan. "Brent."
Time Slicing
I ran into this exact scenario a few years ago. I had a superstar senior engineer on my team who had technical knowledge, intuition and dedication beyond anyone I've ever worked with before or since. The problem was that he was burnt out on fire-fighting, getting emails and phone calls at 3 am to fix some emergency issue that someone higher up the food chain had crammed into production at the last minute as an emergency change. The CIO I worked for at the time wanted to micromanage my team by telling me that everyone on it had to be in the on-call rotation. I replied that it was ridiculous to waste the time of the best engineer on the team answering emails and doing an emergency code push at 4 am when he should be spending his time designing and building a better deployment process, and that we should get a few entry-level Operations people to answer the phone and reply to email.
I explained it to my boss this way (in part thanks to the lessons I learned about ToC from "The Goal" and "Critical Chain"): the more different tasks you add to one person or one team, the more time you waste trying to stop your mind from thinking about the task you were just in the middle of, and the longer it takes you to focus on the new emergency sitting in front of you. I called this concept "Time Slicing". The more cycles your team spends time slicing instead of getting real work done or completing a task, the less efficient the workflow will be.
The Snowball
I tried to draw a comparison to Dave Ramsey's get-out-of-debt philosophy. The more different debts you have, the less progress you can make towards any of them if you are trying to focus on them all with equal attention (and every project we were trying to juggle was priority #1). Dave Ramsey's approach is to pay off the smallest debt first, regardless of the interest rate. This may seem counterintuitive to some people who look at the total amount of interest being paid out and say this doesn't make sense. You are wasting money on interest payments. But by focusing on the smallest debt, the timeline for completing that task decreases, thereby freeing up additional resources to focus on the next smallest debt. This approach builds momentum (like a snowball rolling downhill), and most of all improves morale by generating a sense of accomplishment. Eventually, things began to improve. We created a brand new Tech Ops team to handle the firefighting and reserved the Engineer team for larger project work. As momentum increases, so do throughput and project completion rates. The key is not to have so many competing tasks/projects "time slicing" your team to death.
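Here is a rough way to see the snowball effect in numbers. This is only a sketch under my own assumptions: one team that completes one day of work per day, three hypothetical projects sized in team-days, and no cost at all for switching between tasks. The function names are invented for this example.

```python
def smallest_first(sizes):
    """Finish each task completely before starting the next, smallest first."""
    finished, day = [], 0
    for size in sorted(sizes):
        day += size
        finished.append(day)
    return finished


def round_robin(sizes):
    """Spend one day on each unfinished task in turn (equal attention)."""
    remaining = list(sizes)
    finished, day = [], 0
    while any(r > 0 for r in remaining):
        for i, r in enumerate(remaining):
            if r > 0:
                day += 1
                remaining[i] = r - 1
                if remaining[i] == 0:
                    finished.append(day)
    return sorted(finished)


projects = [2, 5, 10]  # hypothetical task sizes in team-days
print("smallest first:", smallest_first(projects))  # [2, 7, 17]
print("round robin:   ", round_robin(projects))     # [4, 11, 17]
```

Both approaches finish the last project on day 17, but the snowball delivers its first result on day 2 instead of day 4 and its second on day 7 instead of day 11. Add any real cost for context switching and the equal-attention approach only gets worse.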
In The Phoenix Project, Bill tackles this problem by first getting Brent out of the first responder role and setting up a team of engineers to handle escalations. Only that team gets to interrupt what Brent is working on if they need help with an issue. Then Brent can only explain what to do and how to do it. Brent isn't allowed to fix the issue himself. This starts to enhance the culture of cooperation and cross training so that Brent doesn't continue to be a silo of knowledge.
One thing I've noticed is that technical people with that level of skill are usually very open about sharing their knowledge, but they are never given time to do so. The business side of the house is usually more interested in faster resolution times than in redundancy of operational knowledge. This is disappointing, because the more people who are intimately familiar with an issue, the better the opportunity for someone to find a creative solution for the long term.
Another great point in the book about increasing throughput is explained in chapter 19 during the "leadership off-site". From page 198 through 202, Bill, Erik and Steve (CEO) discuss implementing a project freeze in order to reduce the number of tasks that the critical resources need to focus on (effectively erasing all of the small debts and leaving only one huge debt for the time being). Steve is extremely reluctant to allow this, seeing it as a huge waste of company resources akin to "subsidized potato farmers paid not to grow crops" (or in my example, wasting money on interest payments). Erik, in typical Jonah cadence, draws on Steve's experience as a plant manager to relate due-date performance, WIP and inventory levels all back to taking on new orders and releasing work onto the plant floor. The result is obvious.
As common sense as this approach should be, many business executives, senior management and project managers simply do not grasp this concept. If all of your resources are allocated to projects, systems keep breaking down and require "all hands on deck" to get things running again, project due dates are not being met, and the quality of project deliverables is going down... in what world view does it make sense to add another project to the queue? Shouldn't it be time to pay down some technical debt first?
by Gene Kim, Kevin Behr, and George Spafford
Publisher: IT Revolution Press; 1st edition
ISBN: 0988262592
Number of Pages: 345
Date Published: 1/10/2013
Let me first say that it is incredible how many different lessons are packed into Gene Kim's latest book, “The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win.” It was a really quick read because the story line was so easy to follow. (Hey, where are the hidden cameras in my office anyway? I actually know all the characters in this story.) But I've spent a few extra days going back over the parts of the book where the real instruction is interleaved into the plot.
The Phoenix Project combines the teachings of several "management / improvement" genre books such as:
- Eliyahu M. Goldratt's "The Goal" which teaches the Theory of Constraints (ToC),
- Patrick Lencioni's "Five Dysfunctions of a Team",
- David J. Anderson's use of kanban boards to control the release of work and Work in Progress (WIP) for Development and IT Operations,
- Mike Rother's “Toyota Kata: Managing People for Improvement, Adaptiveness and Superior Results”,
- As well as TPM / LEAN strategies and many others...
- And last but not least, Kevin Behr, George Spafford and Gene Kim's previous research in Visible Ops Handbooks.
The narrative tells the story of Bill, who reluctantly gets promoted to VP of IT at Parts Unlimited, an automotive parts manufacturing and retail company, after his previous boss and boss's boss get canned due to poor performance. Bill inherits all of the typical problems that come with an overworked, dysfunctional IT team that is so busy fighting fires that they don't have time to look at the long view of their situation. And as if that weren't enough, the whole company is on a death march to deploy their latest homegrown system, "Phoenix", which was intended to reverse the company's poor performance and lack of profitability over the past several quarters.
The themes from the Visible Ops Handbook are sprinkled throughout the novel as Bill takes measures to implement a new Change Management process, identify critical systems and get his team out of the fire fighting business and into something that resembles productivity.
The Four Steps From Visible Ops:
- Stabilize the Patient
- Catch & Release and Find Fragile Artifacts
- Establish Repeatable Build Library
- Enable Continuous Improvement
Bill has some good ideas and starts to make some progress for his team, but it is obvious that there is more work piled on Bill's team than they have cycles to work on. Bill quickly realizes that one of his key resources, Brent, is the bottleneck that is holding up a lot of work from getting done as well as a silo of critical knowledge for managing and maintaining his systems. Using the techniques from Goldratt's ToC, Bill instinctively starts to put some processes in place to manage work getting assigned to Brent.
Things really start to get interesting for Bill when he meets a potential new board member named Erik Reid, who turns out to have worked with Parts Unlimited years ago to help solve a crisis at their manufacturing plant by implementing what sound remarkably like the same solutions recommended by Jonah in "The Goal". Erik befriends Bill and takes on the role of mentor with the same Socratic approach to helping Bill find a way to fix all the troubles he is having. Erik first introduces Bill to "The Three Ways" in Chapter 7, page 91.
- "The First Way helps us understand how to create fast flow of work as it moves from Development into IT Operations..."
- "The Second Way shows us how to shorten and amplify feedback loops..."
- "The Third Way shows us how to create a culture that simultaneously fosters experimentation, learning from failure, and understanding that repetition and practice are the prerequisites to mastery."
Erik's first assignment for Bill is to identify the four types of work that he manages. By the time Bill gets around to taking Erik seriously (and for a raving madman), it is already chapter 15, page 160, and the wheels are really starting to fall off at Parts Unlimited. There have been outages to the POS systems and a "small credit card breach". Bill gets the first assignment right, recognizing that the four categories of work are:
- Business Projects
- Internal IT Projects
- Implementing Changes
- Unplanned Work
Bill and Erik discuss other progress that Bill's team has made: simplifying the input of change requests into the change management process, using kanban boards in the CAB meetings, and completing the first step in ToC by identifying Brent as the constraint. Bill's next assignments are to figure out how to take needless work out of the system, in addition to controlling the WIP already in it, and to further define how to control the flow of work to Brent.
The Five Steps From The Theory of Constraints:
- Identify the constraint (the resource or policy that prevents the organization from obtaining more of the goal)
- Determine how to exploit the constraint (get the most capacity out of the constrained process)
- Subordinate all other processes to the above decision (align the whole system or organization to support the decision made above)
- Elevate the constraint (make other major changes needed to break the constraint)
- If, as a result of these steps, the constraint has moved, return to Step 1. Don't let inertia become the constraint.
Despite all of Bill's progress, he is quickly falling out of favor with his new boss, CEO Steve. After another major outage and some harsh words, the two finally make up and get the team back together to figure out how to actually fix the problems that the company has created for itself. Erik helps Bill convince Steve to institute a project freeze to free up resources to work on the critical Phoenix project, which they are banking on as the only hope to save the company from going under.
By Chapter 20, page 208, Erik and Bill are discussing WIP again and defining what a work center is... "every work center is made up of four things: the machine, the man, the method and the measures." Having personnel assigned to too many work centers is inefficient and is the crux of Brent being the constraint to so many projects. Erik introduces an interesting concept on page 213 showing that the wait time for any piece of work can be calculated by dividing the percentage that a resource is busy by the percentage that resource is idle. "When a resource is ninety-nine percent utilized, you have to wait ninety-nine times as long as if that resource is fifty percent utilized."
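Erik's rule of thumb is easy to check for yourself. Here is a minimal Python sketch; the function name `relative_wait` and the choice to pass utilization as a whole-number percentage are my own assumptions, not something from the book.

```python
def relative_wait(busy_pct: int) -> float:
    """Relative wait time: percent busy divided by percent idle."""
    if not 0 <= busy_pct < 100:
        raise ValueError("busy_pct must be between 0 and 99")
    idle_pct = 100 - busy_pct
    return busy_pct / idle_pct


for pct in (50, 80, 90, 95, 99):
    print(f"{pct}% busy -> {relative_wait(pct):g} units of wait")
```

At 50 percent busy the wait is 1 unit; at 99 percent it is 99 units, which matches the quote. The jump from 90 to 99 percent utilization is what really hurts: the wait goes from 9 units to 99.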
Another interesting character in the story is John, the CISO. John is a fanatic about security, but in typical Infosec fashion runs around barking out orders that he claims are mandated by some policy or regulation or auditor. In Chapter 22, Erik finally puts John in his place by showing that none of the controls that he has attempted to put in place have any impact on the business' ability to manage risk. After that, John disappears for several weeks and finally re-emerges as a changed man, ready to look at IT through the lens of the business' priorities.
Once Bill and John realize what the business' metrics for success are, they come to a hard realization that the Phoenix project will never fulfill the needs of the business in the project's current form. Bill puts together a proposal to build a SWAT team that will work in parallel with the Phoenix project but with permission to break all of the rules in order to help the business make its numbers. The new processes that emerge from this SWAT team are the fundamentals of #DevOps.
Like I mentioned earlier, the book is phenomenal, extremely well written and easy to follow. [Spoiler Alert] Of course everything works out by the end of this fictional tale, but most of the lessons really do work in real life... IF
Now here's where the story becomes somewhat of a fantasy apart from real life. In several of the situations I have been in or seen unfold in the past, senior management hasn't been willing to really look in the mirror and recognize that they don't really understand technology. And if they are having trouble understanding it then they are really going to have trouble managing it. Some of this stuff should be basic project management 101. It always amazes me how many project managers and senior management figures at some companies measure success by completing a project on time, and slightly over budget. But they don't analyze the success of the project in terms of actual return on effort to the business (maybe because they don't want to admit their pet project didn't achieve the objectives they claimed it would).
Not everyone in the IT world gets a personal Jonah or Erik to guide them through the murky waters of IT management and give credibility to the ideas and initiatives that you want to champion. If you find yourself with a team to manage, hopefully this book along with some of the other resources mentioned can strengthen your decision making abilities to help your team and your company to succeed in this crazy new world of #DevOps.
One other thing to note is that the artwork on the cover of the book is pretty cool!