INTRODUCTION: WORK AND FLOW
Do not squander time for that is the stuff life is made of.
—Benjamin Franklin
My first job out of college, as a build engineer, was to make builds visible. This meant tracking which version of what file went on which computer and in what environment. Three months into that job, I was working on a build—getting all the code from the source code repository, compiling it into executables, and then installing the resulting new functionality into a place where others (analysts, developers, testers, and other interested people) could see it. The build wasn’t compiling though, and I sat there troubleshooting the broken build, in the office, alone, at 2:00 a.m. Tired, I was making mistakes, so I went home. I seriously questioned my career choice. Technology work apparently meant working many late nights. After a nap, I returned to the office to track down code dependencies between various developers and eventually got the build working.
I’m not sure exactly how many hours I’ve spent tracking down dependencies during all my years of merging, building, and releasing software, but I’m convinced that it’s been way too many. If I had a dollar for every minute spent troubleshooting builds and broken environments, I would have a sweet little nest egg. Delayed work, whether it is measured in hours, days, weeks, or even months, carries with it a cost. Losing time due to avoidable problems is expensive and dispiriting. Life is short. Wasted time can never be regained.
In the sci-fi film In Time, time is literally money—people earn minutes, hours, and days to buy food, housing, transportation, and everything else imaginable. Street thugs kill people by stealing all their minutes. Wasted time is the kiss of death. In one memorable scene, Will Salas, played by Justin Timberlake, saves the life of the wealthy Henry Hamilton, played by Matt Bomer. When Will and Henry get to a safe place, Henry tells Will that he is 105 years old and tired of living. He asks twenty-eight-year-old Will what he would do with 100 years. Will quips back, “I sure as hell wouldn’t waste it.” Later, as Will sleeps, Henry gives Will his 100 years and leaves him a note, “Don’t waste my time,” before he runs off to time out by allowing his own clock to run down while sitting on the ledge of a tall bridge.1
A scrawled note from a dystopian sci-fi film embodies our reality. Time is life—use time wisely.
We workers are drowning in nonstop requests for our time. From developers to IT operations people, it’s overwhelming to keep up with the ever-increasing demand. In this regard, things haven’t changed much since my first job out of college, where, as a software configuration management lead at Boeing, I did builds and deployments on IBM mainframes at Hickam Air Force Base in Hawaii.
A line of people formed outside my cube wanting to know the status of a build. Did everything compile okay? When will the build be deployed to the quality assurance environment? Can I get one last change in? I wanted to say, “Pick a number. I’m working as fast as I can. Every one of your interruptions delays the build by another ten minutes.” The fact that developers and testers had come to me to ask for status updates was a symptom of a much larger problem that I didn’t recognize at the time.
My calendar was booked with meetings all day long. Rarely did I get a chance to work uninterrupted until the evening or the weekend. Four months into the new job, I pulled an all-nighter in the office, working as fast as possible to catch up on the mountain of work. When the program manager arrived at 6:30 the next morning, he thought I had just arrived early. He was not pleased to hear that I was headed home to take a nap. Sleep deprivation was another red flag that I didn’t give enough thought to at the time. Later on, after years spent working in technology, I recognized that relentless heroism—staying late night after night, wearing two hats, and consistently playing catch-up—is unsustainable. Quality doesn’t happen on four hours of sleep.
We overload ourselves and we overload our teams—this is the everyday reality within the information technology sector. And, because we get interrupted all the time, we stop work on one task and start work on a different task, from one project to the next, never focusing on one thing long enough to do it justice. This context switching kills our ability to settle into work and concentrate sufficiently. As a result, we are unhappy with the quality of our work despite our desire for it to be good.
The problem is that we are working with dysfunctional processes—companies haven’t adapted to keep up with demand in a healthy, sustainable way. Instead, we see the continued use of antiquated approaches meant to keep workers busy all the time. These processes are not working. This is the elephant in the office. If workers were able to get everything done right and on time, there wouldn’t be an issue. But that’s about as common as a black swan. The number of requests (the demand) and the amount of time people have to handle the requests (their capacity) are almost always unbalanced. This is why we need a pull system—in which people can focus on one thing long enough to finish it before starting something new—like kanban. Kanban is a visual pull system based on constraints that allow workers to pull work when they have availability instead of work being pushed onto them regardless of their current workload. Since demand and capacity are frequently unbalanced, and it’s almost impossible to get everything done on time, systems like kanban help people balance all their work demand.
We’ll get into kanban and where it fits into the process of making work visible a bit later, but for now, know that kanban is an approach to make work and problems visible and improve workflow efficiency. Kanban helps you get work done efficiently without burning the midnight oil night after night.
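To make the pull idea concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular kanban tool works: the class name, the function names, the task names, and the limit of two items in progress are all hypothetical. The point it shows is that work enters the "doing" column only when the worker has capacity, rather than being pushed in regardless of workload.

```python
from collections import deque


class KanbanColumn:
    """An in-progress column with a work-in-progress (WIP) limit.

    The limit is the constraint: it caps how much can be in flight at once.
    """

    def __init__(self, wip_limit):
        self.wip_limit = wip_limit
        self.in_progress = []

    def has_capacity(self):
        return len(self.in_progress) < self.wip_limit


def pull_next(backlog, column):
    """Pull one item from the backlog, but only if the column has capacity."""
    if backlog and column.has_capacity():
        item = backlog.popleft()
        column.in_progress.append(item)
        return item
    return None  # at the limit: finish something before starting more


def finish(column, item):
    """Completing an item frees capacity for the next pull."""
    column.in_progress.remove(item)


# Hypothetical requests waiting for attention.
backlog = deque(["fix build", "deploy to QA", "merge branch", "restore data"])
doing = KanbanColumn(wip_limit=2)  # assumed limit, chosen for illustration

# Three pull attempts: the third is refused because two items are in flight.
pulled = [pull_next(backlog, doing) for _ in range(3)]

# Finishing one item frees a slot, so the next pull succeeds.
finish(doing, "fix build")
after_finish = pull_next(backlog, doing)

print(pulled)
print(after_finish)
print(doing.in_progress)
```

The design choice worth noticing is that the refusal happens at the moment of starting work, not at the moment of requesting it. Requests can still pile up in the backlog, but they do not interrupt what is already in progress.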
In the 2000s, I worked at an image licensing company in Seattle owned by Bill Gates called Corbis. I managed the Build and Configuration Management team.
We had a decent reputation among the Engineering department until 2005, when our two preproduction, seven-server environments quadrupled into eight preprod, twenty-five-server environments. We had seventeen databases, each configured manually within the tightly coupled, highly dependent architecture. On top of that, the business asked us to develop two new major systems at the same time, and they wanted the ability to deploy either one before the other. The dependencies between the existing system and the two new systems ballooned. My job grew from building out and managing fourteen servers to building out and managing two hundred servers.
To deal with the changes, we created and maintained additional long-living branches in source control, which is the place where developers check in their code for safekeeping. It was a terrible solution, but it helped the teams avoid clobbering each other’s changes. Think of long-living branches as a place where code is stored in isolation, where it’s impossible to see the impact it might have on the code already released to production. It’s kind of like adopting an older cat and praying that he and your current, much older cat will embrace each other with open paws. With more than two hundred servers to configure and maintain, configuration management was elevated. It took, at best, two weeks to restore production data to preproduction environments. We scheduled “M Is for Merge” days every six weeks, which consumed many developers’ time.
Our reputation plunged. Developers complained that builds were taking too long. This, of course, offended me, and I set off to prove them wrong by collecting build-and-deploy time metrics.
Figure 1. Builds Don’t Take That Long
I pointed out that the big ball-of-mud architectural design disaster on our hands made deployment and maintenance of environments problematic. I pointed out that the manual smoke tests (tests to see if website functionality still works) delayed the time that developers and testers could see the latest changes and that lack of automated testing hurt our ability to quickly spot problems. Manual smoke tests were the norm. Both of these problems were dismissed fairly quickly as not real issues. The fact remained that developers and testers were unhappy. Business people were unhappy. And the boss was unhappy. It’s no fun being on the team that “doesn’t deliver.” The barriers between teams were stronger than the connections. This is the problem with a bad system.
My own experience with a bad system coincided with the CFO deciding to replace the enterprise resource planning (ERP) system with another ERP product called SAP. An ERP system is a management information system which integrates things like planning, purchasing, inventory, sales, marketing, finance, and HR. SAP is its own ERP system, created by SAP AG, the fourth largest software company in the world.
My boss asked me, “Hey, do you want to manage the SAP Basis team as part of managing the build and release team?” Like an idiot, I said yes. I don’t know how I could have possibly set myself up for more failure. I had zero experience with SAP, and adding SAP to my list of responsibilities spread me thin—to the point where I managed to be terrible at many different jobs. Multitasking is a good way to screw up progress, as I’m sure many of you reading this book know from experience.
At the time, I didn’t know that all these things were red flags of a bad system. All I saw was that my performance was less than exemplary and that I was an unhappy employee who had started to consider other options.
I updated my resume.
In 2006, we spent a good deal of time analyzing and comparing different tools to manage our source code. Our team chose Team Foundation Server (TFS). We were a Microsoft shop, after all, and I ended up installing, configuring, and maintaining TFS—while also learning SAP, interviewing new candidates weekly, and helping to implement a new sustainment process. This process made it possible for us to deliver improvements every two weeks instead of every six months.
A user interface (UI) developer named Dwayne Johnson recognized the value in delivering small changes frequently and began socializing the idea of making small improvements on a consistent schedule. Dwayne started the process by fixing UI bugs on a regular two-week cadence. At the time, it was just one more thing to support, but it was a very important one. These incremental and iterative improvements done on a regular cadence were our Agile alternative to traditional Waterfall development. These Agile methods wandered into our process, getting us thinking about a better approach to our work.
In April of ’06, a Scottish fellow from Microsoft appeared at Corbis. David Anderson visited us monthly to teach us how to apply the Theory of Constraints (TOC) to our work in exchange for permission to write a story about the Corbis Agile transformation. TOC is a way to identify the most important limiting factor (the constraint) that stands in the way of achieving a goal and then systematically improving that constraint until it is no longer the limiting factor. There was much excitement while reading his book Agile Management for Software Engineering: Applying the Theory of Constraints for Business Results as we thought we would do Feature Driven Development, a type of Agile development focused on cross-functional, collaborative, and time-boxed activities to build features. As Darren Davis writes in his blog “The Secret History of Kanban,” David’s methods “...eliminated explicit estimation from the process, and relied on data to provide a probabilistic means of determining when software was likely to be done.”2 David got us going on operations reviews and explained how important it is to measure progress (or lack thereof). Learning what to measure changed my world. Ranting didn’t work, but measuring cycle time (the time it takes to do work) and presenting that data to leadership did. I was able to influence leadership then and got buy-in to hire additional team members.
Sometimes the obvious gets lost in the crunch of the corporate world. We intuitively knew we had too many projects in flight, but it was hard to see until we measured the actual time that it took to get work done, at which point it became obvious the work spent more time in wait states than in work states. We spent time waiting for approval. Waiting for others to finish their part so we could start (or finish) our part. Waiting for uninterrupted time to focus on finishing the work. Waiting for the right time of day/week/month. And while we waited, we started something new, because, you know, with resource utilization as a goal, you have to stay busy all the time.
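A small worked example shows why measuring made the problem obvious. The sketch below, in Python, splits the elapsed time for one piece of work into work states and wait states, then computes the share spent actually working. The stage names and hours are invented for illustration and are not Corbis data, and the ratio used here (work time divided by total elapsed time) is one common way to express flow efficiency, assumed for this example.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    hours: float
    waiting: bool  # True for a wait state, False for a work state


def flow_efficiency(stages):
    """Return the fraction of total elapsed time spent in work states."""
    total = sum(s.hours for s in stages)
    work = sum(s.hours for s in stages if not s.waiting)
    return work / total if total else 0.0


# Hypothetical history of a single change request.
stages = [
    Stage("waiting for approval", 40, True),
    Stage("coding", 8, False),
    Stage("waiting for another team", 24, True),
    Stage("testing", 4, False),
    Stage("waiting for a release window", 4, True),
]

cycle_time_hours = sum(s.hours for s in stages)
wait_hours = sum(s.hours for s in stages if s.waiting)
efficiency = flow_efficiency(stages)

print(f"cycle time: {cycle_time_hours} hours")
print(f"waiting:    {wait_hours} hours")
print(f"flow efficiency: {efficiency:.0%}")
```

With these made-up numbers, twelve hours of hands-on work take eighty hours to deliver, so most of the cycle time is waiting. Working faster during the twelve hours barely moves the total; shrinking the sixty-eight hours of waiting does.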
As Kate Murphy writes in her article “No Time to Think,” “One of the biggest complaints in modern society is being overscheduled, overcommitted and overextended. Ask people at a social gathering how they are and the stock answer is ‘super busy,’ ‘crazy busy’ or ‘insanely busy.’ Nobody is just ‘fine’ anymore.”3 I see evidence of this every day. When there is a still moment for reflective thought—say, while waiting for a meeting to begin—out come people’s phones. Busyness can be an addiction for terminally wired ambitious people. But busyness does not equate to growth or improvement or value. Busyness often means just doing so many things at once that they all turn out crappy. Sometimes walking in the park and allowing ourselves time to think is the best way to seize the day. But horrors if an engineer sits idle for fifteen minutes simply thinking.
At Corbis, looking at the reasons why we worked on too many things at once was a revealing exercise. The CFO wanted to implement a new financial system. The SVP of Global Marketing wanted to blah, blah, blah. The VP of Media Services also wanted blah, blah, blah. The head of Sales wanted blah, blah, blah, blah. And they all wanted everything now. The resulting business priorities clashed all the way down the hierarchy, and that was just the business side of the house. On the engineering side, not only did we need to implement all the business requests, we also had our own internal improvements to make and maintenance work to do. Furthermore, we still had to be available to drop everything when production issues occurred—like it or not, production comes first. The clashing priorities became apparent while looking at the many long-standing branched code lines, but other than that, there was no clear visual of the impact of working on too many things at once. It’s hard to manage invisible work. With invisible work, we don’t notice the explicit reminders that our mental budget is already full. There is no time to simply think.
After eight years at Corbis, I was one of forty-two people let go during the September 2008 round of layoffs. At this point, I decided to try something different. I got a job with AT&T Mobile on their program management team. But the regression from using the Lean kanban approach I helped create at Corbis to using a Waterfall approach (a traditional software development method where work waits until all the parts of the previous stage are complete), with estimations based off of time reports, was too much of a throwback for me. In July 2010, I fired myself.
In January 2011, David Anderson offered me the opportunity to research, develop, and teach a new course for David J. Anderson & Associates called Kanban for IT Operations. At the time, Europe led the United States in kanban implementations, so my research in February began in England, Sweden, and Germany. In March, we ran the first beta workshop in Boston, where I also attended and spoke at DevOpsDays Boston 2011 at the Microsoft New England Research and Development Center.
Originally, I set off to write a reference for students to use during workshops while designing their kanban boards. Later, this piece grew into a time-saving reference for me as well. It became a place to capture not only everything I learned about applying Lean, kanban, and flow practices to my own work, but also selected equations, theories, and stats from thought leaders. For example, how to define Lean? For that, I prefer Niklas Modig and Pär Åhlström’s definition. In their fantastic book This Is Lean: Resolving the Efficiency Paradox, they define Lean as, “a strategy of flow efficiency with key principles of just-in-time and visual management.”4
So, what do we know? We know the demand for delivering business value to production, so that we can be competitive, is high. We know that many organizations are running deployment strategies that are slow and cumbersome. We also know that we are wired to do our best when we can clearly see what we are doing right as well as what we are doing wrong. This might seem obvious, but it’s consistently ignored.
The technology world shows no signs of slowing down. The pace at which we need to deliver new capabilities to win new customers and prevent existing customers from walking away (churn) seems like warp speed. Many companies today are in survival mode; they just can’t see it. This means that there is no better time than right now to elevate how we work. So, how do we level up our game?
The answer is straightforward and accessible. It doesn’t cost you tons of money, and it doesn’t take geniuses or specialists. All it takes is a shift from haphazardly saying yes to everything to deliberately saying yes to only the most important thing at that time. And to do it visually.
The solution is to design and use a workflow system that does the following five things:
What we will cover in this book:
The examples described throughout the pages of this book are all based on my own real-life experiences and on those of others who have stood as witnesses to time-theft scenarios. Some prefer to avoid publicizing the crimes committed within their companies, so for them, the names have been changed to protect both the innocent and the guilty. We will also be looking at systemic organizational issues that must be addressed in order for you to be successful. As W. Edwards Deming said, “A bad system will beat a good person every time.”5
This book is simultaneously an explanation, a how-to guide, and a business justification for using Lean, kanban, and flow methods to increase the speed and effectiveness of work.
Not everything in this book may apply to your specific situation. It has an IT bent to it, with several non-IT examples thrown in for good measure. Take what does apply to you and use the rest to gain insight into what people in other parts of your organization, or your competitors, might be dealing with. Each section in Part 2 includes exercises from my workshops, where we step through a series of activities designed to make work visible, improve workflow efficiency, and surface problems. They build upon each other, so it is best to read the sections in sequential order.
Explaining the concepts in this book to others should be a straightforward process. Getting buy-in to implement the suggested approaches may not be. Change is hard for humans. So, before we dive into workflow design, let’s investigate exactly what prevents you from getting your work done quickly in the first place. Once we scrutinize the crimes committed against your existing workload, we can proceed with the insight and awareness necessary to do something about it. Let’s get started.