Thursday, February 28, 2013

Technical Debt: The Snowball Approach

One phrase I noticed in The Phoenix Project is "technical debt," which comes up in chapter 15 (page 164). In one of their phone conversations, Bill and Erik discuss how to correctly implement step three of the Theory of Constraints (ToC): subordinating all processes to the capacity of the constraint.

"So, here's your homework," [Erik] says.  "Figure out how to set the tempo of work according to Brent.  Once you make the appropriate mapping of IT Operations work to work on the plant floor, it will be obvious.  Call me when you've figured it out."

"Wait, wait," I say hurriedly before he hangs up. "I'll do the homework, but aren't we missing the entire point here?  What caused all the unplanned work is Phoenix.  Why are we focusing on Brent right now?  Don't we need to address all the issues with Phoenix inside of Development, where all the unplanned work actually came from?"

"Now you sound just like Jimmy [John the CISO], complaining about things you can't actually control," he sighs.  "Of course Phoenix is causing all the problems.  You get what you design for.  Chester, your peer in Development, is spending all his cycles on features, instead of stability, security, scalability, manageability, operability, continuity, and all those other beautiful 'ities.

"On the other end of the assembly line, Jimmy keeps trying to retrofit production controls after the toothpaste is out of the tube," he says, scoffing.  "Hopeless! Futile! It'll never work!  You need to design these things, what some call 'nonfunctional requirements,' into the product.  But your problem is that the person who knows the most about where your technical debt is and how to actually build code that is designed for Operations is too busy.  You know who that person is, don't you?"

I groan. "Brent."



Time Slicing

I ran into this exact scenario a few years ago. I had a superstar senior engineer on my team whose technical knowledge, intuition, and dedication exceeded those of anyone I have worked with before or since. The problem was that he was burned out on firefighting: he got emails and phone calls at 3 am to fix some urgent issue that someone higher up the food chain had crammed into production at the last minute as an emergency change. The CIO I worked for at the time wanted to micromanage my team by insisting that everyone on it be in the on-call rotation. I replied that it was ridiculous to spend the best engineer's time answering emails and doing emergency code pushes at 4 am when he should be designing and building a better deployment process. I said we should hire a few entry-level Operations people to answer the phone and reply to email instead.

I explained it to my boss this way (thanks in part to the lessons about ToC I learned from "The Goal" and "Critical Chain"): the more different tasks you assign to one person or one team, the more time is wasted trying to stop thinking about the task you were just in the middle of, and the longer it takes to focus on the new emergency sitting in front of you. I called this concept "Time Slicing." The more cycles your team spends time slicing instead of doing real work and completing tasks, the less efficient the workflow becomes.
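The time-slicing argument can be made concrete with a toy simulation. This is a minimal Python sketch under assumptions that are mine, not the book's: one worker, tasks that each need a fixed amount of focused work, and a fixed `switch_cost` lost to refocusing every time the worker moves to a different task. The function `completion_times` and its parameters are hypothetical names for illustration.

```python
def completion_times(work, slice_len, switch_cost):
    """Toy model of one worker cycling through tasks in order.

    work        -- focused effort each task needs (positive integers)
    slice_len   -- units worked before moving on to the next open task;
                   None means finish each task before starting the next
    switch_cost -- units lost refocusing whenever the worker changes tasks
    Returns the finish time of each task, in the original task order.
    """
    if any(w <= 0 for w in work):
        raise ValueError("every task needs a positive amount of work")
    if slice_len is not None and slice_len <= 0:
        raise ValueError("slice_len must be positive or None")

    remaining = list(work)
    finish = [0] * len(work)
    clock = 0
    current = None  # index of the task the worker is currently focused on
    while any(r > 0 for r in remaining):
        for i, r in enumerate(remaining):
            if r == 0:
                continue  # this task is already finished
            if current is not None and current != i:
                clock += switch_cost  # time lost changing focus
            current = i
            step = r if slice_len is None else min(slice_len, r)
            clock += step
            remaining[i] -= step
            if remaining[i] == 0:
                finish[i] = clock
    return finish


print(completion_times([4, 4, 4], None, 1))  # [4, 9, 14]
print(completion_times([4, 4, 4], 1, 1))     # [19, 21, 23]
```

With three 4-unit tasks and a 1-unit switch cost, finishing one task at a time gives finish times of 4, 9, and 14. Slicing in 1-unit chunks pushes them to 19, 21, and 23, because the worker pays for 11 switches instead of 2. Every task finishes later, including the first one, which waits almost five times as long. Real context-switch costs are neither fixed nor this tidy, so treat the numbers as an illustration of the shape of the problem only.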


The Snowball

I tried to draw a comparison to Dave Ramsey's get-out-of-debt philosophy. The more debts you have, the less progress you can make on any of them if you try to give them all equal attention (and every project we were trying to juggle was priority #1). Dave Ramsey's approach is to pay off the smallest debt first, regardless of the interest rate. This may seem counterintuitive to people who look at the total interest being paid and conclude that you are wasting money on interest payments. But when you focus on the smallest debt, you finish it sooner, which frees up resources to put toward the next smallest debt. This approach builds momentum (like a snowball rolling downhill) and, most of all, improves morale by generating a sense of accomplishment.

When I applied the same thinking to my team, things eventually began to improve. We created a brand-new Tech Ops team to handle the firefighting and reserved the Engineering team for larger project work. As momentum increased, so did throughput and project completion rates. The key is to avoid having so many competing tasks and projects that they "time slice" your team to death.
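Ramsey's ordering rule is simple enough to state as a small simulation. This is a minimal Python sketch, not Ramsey's own worksheet, and it rests on assumptions I am adding: no interest is charged, amounts are whole dollars, and a fixed per-period `budget` covers every minimum payment. The `Debt` class, the `snowball` function, and the sample debts are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class Debt:
    name: str
    balance: int  # amount owed, in whole dollars (hypothetical figures)
    minimum: int  # required payment each period


def snowball(debts, budget):
    """Debt-snowball payoff schedule, ignoring interest (an assumption).

    Each period: pay every open debt its minimum, then pour whatever is
    left of `budget` into the smallest balance, rolling any overflow onto
    the next smallest. Returns (name, period_paid_off) in payoff order.
    """
    remaining = {d.name: d.balance for d in debts}
    minimums = {d.name: d.minimum for d in debts}
    if budget <= 0 or budget < sum(minimums.values()):
        raise ValueError("budget must be positive and cover every minimum")

    payoff = []
    period = 0
    while remaining:
        period += 1
        cash = budget
        # 1. Minimum payment on every open debt (never more than is owed).
        for name in remaining:
            pay = min(minimums[name], remaining[name])
            remaining[name] -= pay
            cash -= pay
        # 2. Extra cash goes to the smallest balance first, then the next.
        for name in sorted(remaining, key=remaining.get):
            if cash == 0:
                break
            pay = min(cash, remaining[name])
            remaining[name] -= pay
            cash -= pay
        # 3. Drop debts cleared this period. Their minimums become free
        #    cash next period, which is the "snowball" getting bigger.
        for name in [n for n, b in remaining.items() if b == 0]:
            payoff.append((name, period))
            del remaining[name]
    return payoff


debts = [Debt("Car", 5000, 200), Debt("Card", 800, 50), Debt("Medical", 300, 25)]
print(snowball(debts, 500))  # [('Medical', 2), ('Card', 4), ('Car', 13)]
```

Once Medical is cleared in period 2, its $25 minimum joins the extra cash aimed at Card. Once Card is cleared in period 4, the whole $500 goes to Car. Because the sketch ignores interest, it shows the momentum effect but not the extra interest cost the paragraph acknowledges. Adding a per-period interest rate to each balance would be the natural extension if you want to compare that trade-off.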

In The Phoenix Project, Bill tackles this problem by first taking Brent out of the first-responder role and setting up a team of engineers to handle escalations. Only that team is allowed to interrupt Brent's work, and only when they need help with an issue. Even then, Brent can only explain what to do and how to do it; he isn't allowed to fix the issue himself. This begins to build a culture of cooperation and cross-training, so that Brent no longer remains a silo of knowledge.

One thing I've noticed is that technical people with that level of skill are usually very open about sharing their knowledge, but they are never given the time to do so. The business side of the house is usually more interested in faster resolution times than in redundancy of operational knowledge. This is disappointing, because the more people who are intimately familiar with an issue, the better the chance that someone will find a creative long-term solution.

Another great point in the book about increasing throughput comes in chapter 19, during the "leadership off-site." On pages 198 through 202, Bill, Erik, and Steve (the CEO) discuss implementing a project freeze to reduce the number of tasks the critical resources need to focus on (effectively erasing all of the small debts and leaving only one huge debt for the time being). Steve is extremely reluctant to allow this, seeing it as a huge waste of company resources akin to "subsidized potato farmers paid not to grow crops" (or, in my example, wasting money on interest payments). Erik, in typical Jonah fashion (Jonah being the mentor in "The Goal"), draws on Steve's experience as a plant manager to relate due-date performance, WIP, and inventory levels back to taking on new orders and releasing work onto the plant floor. The conclusion is obvious.

Although this approach should be common sense, many business executives, senior managers, and project managers simply do not grasp the concept. Suppose all of your resources are allocated to projects, systems keep breaking down and require "all hands on deck" to get running again, project due dates are being missed, and the quality of project deliverables is going down. In what worldview does it make sense to add another project to the queue? Shouldn't it be time to pay down some technical debt first?
