Wiring the Winning Organisation with Gene Kim

Join Murray Robinson and Shane Gibson as they chat with Gene Kim, leader of the Enterprise Technology Leadership Conference and author of the Phoenix and Unicorn Projects.

Gene describes how DevOps research shows that excellent socio-technical leadership is critical to team success and failure. He explains how technical leaders can dramatically increase team effectiveness by improving the organisational architecture and wiring to support independence of action, rapid feedback and simplification while maintaining stability and security.

 

Shane’s Summary

One of the recurring things you’ve talked about all the way through is wiring. The structure of the organization. The socio technical architecture. System thinking. How do organizations go from worst to first? They don’t change their people. They don’t change the technology. They change the organizational wiring. 

So the more specialization you have, the more complex organization wiring you need. As soon as we have a shared resource, we introduce constraints. As [00:39:00] soon as we introduce constraints, we introduce wait times. They go through the roof and now we have a system failure.

And a lot of organizations don’t look at their flow of work because they don’t treat it as a system. They don’t look at the handoffs and they don’t look at the delays. They don’t bring any of that lean behavior into their teaming. 

And how do people become unstuck? How do we free up that architecture, that flow across the organization? Three things, slowification, simplification, and amplification.

So how do we slow things down? How do we remove the uncertainty and move it to the beginning rather than leaving it to test and production? What can we do? So, you know, a sprint planning exercise is about reducing uncertainty. It’s about communication. Research spikes are about taking the heavy gnarly stuff that we don’t know how to solve and doing a little bit of work up front to reduce the uncertainty.

This idea of simplification. Let’s map out the nodes and links. Let’s figure out where the constraints are and let’s unblock them. Moving the [00:40:00] data back into the product team. So how do you move the work to the person that actually starts it? 

Gene: And that is about software architecture. How we partition and group things to enable independence of action. 

Shane: So we want to decouple. We want to decouple the work as much as we decouple the systems. And we want to introduce independence of action. We want to be very clear about how we’re dividing that work and where those domain boundaries are.

We want to make sure that there is feedback. Up and down or middle out. And we want to have people focus on improving their work. How many leaders actually stop and observe the system? In a manufacturing plant, they would walk the floor. How many leaders walk the wiring of their organization rather than sit in their little offices and wait for the problems to bubble up?

So for me, it’s wiring, systems thinking, nodes and links. It’s this recurring pattern of looking at it holistically, end to end. And yes, people and technology are part of it, but actually the system is where we get the majority of the failures.

Podcast Transcript

Read along you will

Shane: Welcome to the No Nonsense Agile Podcast. I’m Shane Gibson. 

Murray: And I’m Murray Robinson. 

Gene: And I’m Gene Kim.

Murray: Hi Gene, thanks for coming on. 

Gene: I am delighted to be here. 

Murray: Why don’t we start off by getting you to introduce yourself to the audience?

Gene: Oh, for sure. 

So I’ve been studying high performing technology organizations for 25 years. That was a journey I started back when I was the CTO and technical founder of a company called Tripwire in the information security and compliance space. Our goal was to study these high performing organizations that simultaneously had the best project due date performance, the best operational reliability and stability, as well as the best security and compliance. 

By far the biggest surprise was how it took me into the middle of the DevOps movement. I got introduced to this community in 2010 and my area of passion has been studying how large complex organizations that have been around for decades or centuries are using those principles and practices [00:02:00] to win in the marketplace. So I got to work with Dr. Nicole Forsgren and Jez Humble on the State of DevOps research. And I spent the last four years working with Dr. Steven Spear trying to understand what is common between Agile, DevOps, Lean and safety culture. And that all went into a book called Wiring the Winning Organization that came out last November.

Murray: So did you start your career as an engineer? 

Gene: I got my graduate degree in computer science, focusing on networking and compiler design, and spent the better part of the 2000s up until 2016 thinking of myself as an infrastructure and operations person. So The Phoenix Project was written from the perspective of an ops leader. But around 2016, I really changed my mind. I consider myself a developer. And The Unicorn Project that came out in 2019 was essentially rewriting The Phoenix Project through the eyes of a developer. What does it look like when you’re stuck in an organization where you can’t get anything done?

You can take your 10 X developer, put them into the horrible Phoenix [00:03:00] project and suddenly the best developer can’t build, can’t test, can’t deploy, can’t get logs, can’t get license keys, can’t do anything without approvals, architecture review boards, information security, saying no. And just showing, how do we transform that organization where small teams can work independently and get amazing things done. And it has nothing to do with the skill of the developer. It’s really about the wiring that they exist within and that’s really the job of leaders. Create the wiring that liberates everyone’s full problem solving capabilities. That enables independence of action so that small teams can do what needs to be done. And spend their best efforts on the technical problem versus the communication, coordination, prioritization, deconfliction, cajoling, politicking, that’s often required to get things done. 

So much of what I discovered through the State of DevOps research shows how this is universal, independent of industry vertical, independent of the phase of value creation.

I think there are really two constructs that were this big aha moment for me. One [00:04:00] was the observation that there are really three layers at which we do work. Layer one is the work in front of us. So it could be the patient, the metal that needs to be transformed or the code or the binary that’s going to run in production. Layer two is about the technology that’s needed. So it could be the press. It could be the MRI machine. It could be the code editor, IDE. It could be the Kubernetes platform.

But layer three is the organizational wiring. It’s the social circuitry that dictates who gets to talk to who, when, about what, in what format.

When you take a look at any case study of worst to first, in general the only thing that changes is layer three, the management system. Whether it’s Etsy, the transformation of Facebook or the NUMMI joint venture in Fremont, California. So that was the worst performing General Motors automotive plant around the globe. And then it became the site of the NUMMI joint venture with Toyota. And so within one year they went from the worst performing plant in North America to one of the best performing [00:05:00] plants, on par with any in Japan. And so what’s remarkable is that it’s the same people, same floor space, same capital equipment, same technology. The only thing that changed was layer three. Layer three dictates whether you are high performing or low performing.

Murray: What does the research say about the success factors for more effective organizations? 

Gene: So one of the key findings in the State of DevOps research was that simplification, partitioning work to enable independence of action, was one of the top predictors of performance. To what extent can teams do what they need to do without a lot of coordination with people outside of their team? To what extent can they make large scale changes to their parts of the system without permission from anyone else? To what extent can they do their own testing on demand without the use of a scarce integration test environment which couples you to everyone else in the enterprise? To what extent can they do a deployment [00:06:00] without needing to coordinate with platforms they depend upon? That’s all about independence of action and contained blast radiuses. And that everyone has what they need, when they need it, from where they expect it, as opposed to everyone being stuck. So that’s a function of architecture and how teams are organized.

The second key finding is amplification. In the ideal, we have really energetic feedback loops where even weak signals of failure are amplified. They can be acted upon quickly to ideally detect, correct and prevent. It’s actually feedback that allows people to learn. One of my favorite examples is when developers develop things, put them into production, and all the feedback goes just to the infrastructure and operations people. It’s like, whoa, feedback is being funneled to the wrong people.

And then the third one is really about slowification. There has to be enough time to allow for planning and practice and experimentation and improvement. And there has to be a place where you can do your most consequential, dangerous work that is not the [00:07:00] production environment. Can you do it in planning or practice? Can you do tabletop exercises? Can you do simulations, rehearsals? And if you can’t do that, then that means all your most dangerous work is being done in production, when you can’t undo and redo. Which means that you can’t iterate and learn.

It’s really about slowifying, moving work in time, so it’s not done in performance, but done in planning and practice. There’s simplification, which is about partitioning work to enable independence of action. And there’s amplification. It’s all about signals. How do we get signals stronger and going to the place where they need to go?

Murray: Let’s go back to the basics. What is DevOps? 

Gene: Yeah. What is DevOps? It’s a set of architectural practices, technical practices and cultural norms that allow organizations to rapidly experiment and iterate. It allows organizations to deliver value quickly to the end user customers. And it’s a way to do that while preserving reliability, security, stability, and so forth. And so why do we do that? It’s [00:08:00] so that we can win in the marketplace. It’s so we can achieve organizational and mission goals.

Murray: Does DevOps mean that we’re going to integrate development and ops into one team?

Gene: Sometimes. 

I love the example of Facebook, where they said that in 2009 they were having all these incidents and outages. It was sort of like a downward spiral where code quality and service quality kept on getting worse. The number of sev one incidents kept on going up.

They had this rule that in infrastructure and operations, you can’t have your laptop open unless you have a live site incident going on in the meeting. And there was one meeting where all 15 Infrastructure engineers had their laptops open working on separate incidents. 

And that was a signal that whatever they were doing was not working. And the decision they had to make was that they were going to put dev managers and architects on pager rotation. And it was the first time that many of these people saw the downstream effects of what they were doing in the daily work. And suddenly within six, nine months the number [00:09:00] of sev one incidents went way down because feedback was now getting to the developers. 

Shane: So I’ve seen people treat DevOps as the combination of Dev and Ops. So you build it, you release it, you maintain it, it breaks, you fix it. If it doesn’t work then that feedback loop comes to you much quicker than going somewhere else. I’ve seen the whole SRE movement being attributed to DevOps. That idea of thinking about things that break early and then fixing the root cause.

I’ve seen it treated as agility, in terms of team design. This idea of how quickly can we move safely. How do we get feedback loops to the team. How do we allow them to experiment early? How do we bring the risk and the uncertainty forward? 

And, so, it seems to be a bunch of patterns that are all valuable that could be pushed into the DevOps bucket, but also could be product patterns or agile team design patterns. Is that how you see it? 

Gene: Yeah absolutely. I would say it’s all of it and a bunch of [00:10:00] principles. 

So let me give you an example about the data ops angle. One of my heroes is Michael Nygard. He wrote the book Release It! That’s a phenomenal book about designing more resilient services through things like the bulkhead pattern and the things that Netflix made famous. So he’s now the VP of data engineering at Nubank. And he was talking about data ops being the land that DevOps left behind.

Nubank, one of the largest banks in Latin America, is actually one of the largest payment platforms in Asia. And the problem that he was brought in to solve was that the batch jobs are taking longer and longer to run, to the point where they’re taking more than a day. The storage and compute costs to execute the batch jobs are spiraling out of control. And one of the principles that he’s using to take the organization forward is asking who should be responsible for it. So the whole notion of you build it, you run it. They’re moving data ownership back to the product teams. So they have to be responsible for the storage costs and the compute costs, and the SLAs in terms of how [00:11:00] quickly data has to get to the people who depend upon them. So the signals are being sent to where they need to go. It makes no sense to have a data team that’s the dumping ground of everyone’s terrible data that takes too long to run. That’s not fit for purpose. That’s not what people are looking for. Instead, you have to bring that responsibility to the service team, ideally a combination of dev and ops. And if you have a good platform team you don’t have to have a lot of infrastructure specialists there. They’re providing shared services that are exposed so that other people can take advantage of great infrastructure that’s tailor made for that organization.

I think that’s another marvelous example of how you wire an organization one way and you end up with these runaway problems of batch jobs taking too long. But if you wire the organization a different way, you end up with a self balancing pattern where the service teams know exactly what they need to do. They’re being held accountable and they get the feedback they need to achieve the goal.

Shane: Yeah, so DataOps is really [00:12:00] interesting. We had the whole data mesh thing come out, which was a great set of principles, but no execution plan, so now it’s being taken over by the vendors. But I like the idea of pushing the data work back to the product teams where it should be. So we shouldn’t treat data as exhaust where we take their crap and then we have to clean it up, and we don’t control it. The analogy I use is you’re a chef in a restaurant and somebody keeps delivering rotten tomatoes. You ain’t going to put up with that shit for long, but as data people, we do.

But one of the pushbacks is that to get the data skills into a product team is hard. But we did that with DevOps. We took the operation skills that were hard and we actually pushed those skills back into the dev team. And so it is possible. It was just that we haven’t focused on doing it yet.

 

Gene: Let’s zoom out just a little bit and talk about why these patterns emerge. And what I learned working with Dr. Steven Spear is that the more functional specialties you have, the more sophisticated your layer three wiring must [00:13:00] be. One of the best examples of organizational systems that make sure that no one has what they need when they need it is emergency departments in hospitals. Why is that? It’s because the number of functional specialties has grown. You have so many different functional specialties which need to be integrated towards a common purpose. And so, what was really surprising to me was that in the 1950s it wasn’t so difficult. Hospital systems were safer. It was easier to get what you needed. And the reason is that in typical hospitals in the 1950s, at layer one, you had basically doctors and nurses. At layer two, you had very little technology. You had basically x rays, which weren’t typically complex or fast moving. So you could get away with a pretty simple layer three wiring to run the hospital system. Fast forward 70 years to the current day, you have 20 different functional specialties, just among the clinicians. You have nurses, supply chain, pharmacy.

That’s at layer one. At layer two, look at how much technology we have. It’s not just x ray machines. It’s MRIs, CAT scanners, blood tracers, computer systems. [00:14:00] The electronic patient record systems. So imagine just how much more sophisticated the layer three organizational wiring must be to get people what they need when they need it.

And so the same thing has happened in our space. It used to be pretty simple. In the mainframe days, you had your single cross functional team that was simultaneously responsible for mostly building and running. You had the operators, of course. But now you have platform teams, you have container teams, you have security teams, you have data ops teams. The number of miracles that technology affords continues to grow. But that’s why we have to have more sophisticated organizational systems to enable developer productivity.

All I want to do as a developer is work on my feature. I don’t care about logging or the environment or Kubernetes deployment files or authentication or data masking or where my data goes or who I need to connect to.

And what’s marvelous is that it’s the emergence of these development platforms that allow other teams to do that for you. And you don’t have to open up a ticket. You don’t have to chase them for six weeks, you can get them on demand. [00:15:00] And I think that’s what creates these miracles that make us so productive.

Murray: We’ve just interviewed Arnold Straubach from Buurtzorg, and they have these cross functional clinical teams. A team of 10 manages themselves almost completely. Their management overheads are one half of 1%. So there is an industrialization scaling argument that comes out of Taylorism, which is causing the specialization and the silos and then the super complex systems, I reckon.

Gene: Yeah, well, what causes a specialization of skills? In the 1950s you didn’t have a lot of functional specialties. You didn’t have neurologists and endocrinologists and so forth. But on the other hand, by the time you learned that you had cancer, it was basically a death sentence. You had mortality rates of about 95%. And so what science has done is enable this ever deepening functional [00:16:00] knowledge of increasingly narrow domains. But the reward for that is that we went from 95 percent mortality rates for cancer down to, if you detect it early enough, a 90 percent plus chance of surviving. And so too in the worlds we live in.

A couple of holidays ago I spent an entire week learning about Java logging. I do all my programming in Clojure, which runs on the JVM, but I was never a Java programmer. So I never actually understood how it all worked. And one time it just became intolerable. I just couldn’t get logs to show up where I wanted them. And so I spent a week learning about Java logging. It wasn’t terribly rewarding because I didn’t actually care that much. I got it working. That was important to me, but did I really care about the history of Java logging? No.

If I could wave a magic wand, what would have happened? Someone on a different team would just specialize in Java logging and I could just copy a configuration file and put it into my code, and they would tell me where I could find my logs. And so this group has liberated me [00:17:00] from worrying about things that I actually don’t care about. Same for the CI/CD pipeline. Same for how to get things running in production. And so I love specialization of labor, so that they can do it and I don’t need to worry about it.

Murray: It’s great if it works well to support you, but what if there is specialization of labor and you have to sit there for the next two weeks waiting for them?

Gene: Absolutely. And so I’m being held responsible for the quality of my data and making sure that the batch jobs run. But I don’t know anything about it. And I have to open up a ticket with the data specialist and I have to wait 6 weeks, 8 weeks, it’s terrible. In fact, let me give you an even worse example. A friend of mine was telling me that, they’re a mobile telco provider, 20 million customers. And the number one initiative this year, one of the top five certainly, is to present a checkbox to all of our customers, so that they can opt into a five dollar a month service to get email or watch movies. The problem is that it requires work from 40 different teams across four [00:18:00] different customer channels. This requires CEO minus one level support, daily war room meetings, 28 million dollars, 12 months. And the bad part is that most people, when asked, give it a 20 percent chance of success. Why? Because it didn’t work the first two times they tried it. 

And so this is not a complex problem presenting a checkbox to those customers. This is purely a problem of layer three communication, coordination, and most importantly, architecture. Somehow we create an architecture where it is impossible for small teams to get anything done.
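As a rough illustration of why coordination cost climbs so quickly with the number of teams involved, here is a minimal Python sketch. It is not from the conversation. It assumes the worst case, where every team may need to talk to every other team, so it gives an upper bound rather than a measurement of any real organization. The function name `coordination_links` is hypothetical.

```python
def coordination_links(teams: int) -> int:
    """Upper bound on pairwise communication paths among `teams` groups.

    Assumption (not from the transcript): any team may need to
    coordinate with any other, so the count is n * (n - 1) / 2.
    """
    if teams < 0:
        raise ValueError("teams must be non-negative")
    return teams * (teams - 1) // 2


if __name__ == "__main__":
    # Compare a two-person couch move with the 40-team checkbox project.
    for n in (2, 5, 10, 40):
        print(f"{n} teams -> up to {coordination_links(n)} pairwise links")
```

Under that assumption, partitioning the work so that fewer teams are coupled reduces the number of possible links much faster than it reduces the number of teams.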

We want systems that allow people to work independently of each other. And sometimes when you have these complex dependencies, the best thing we can do is partition them so that teams regain independence of action and functional specialties are given in a way that can be self service and on demand without creating the queuing problems. Because queuing and prioritization and scheduling is so absolutely dangerous.

Shane: So you close that out with what I think is the key point and it’s around self service. So the [00:19:00] examples you used are where we’ve had specialization and then they’ve been codified as a machine that everybody else can use. And so the problem we know is humans take longer than a machine. Humans do it with variability. So we start creating these human bottlenecks because we’re trying to make the wiring of the organization act like a machine, but the humans are the cogs. Whereas if it’s a piece of technology, if your Java logging framework was set up and you knew that you pushed the config file to the machine and the machine will give you a standardized response and it can scale to one response or a million, you don’t need to care, then that’s a great specialization, but a specialization of the machine, not a human in the loop. And I think that’s where we’ve got it wrong: we’re now going to micro specialization of humans, but they can’t scale. We treat them like machines and they’re not.

Gene: What are the patterns and techniques to create wiring that works? One is modularization and platforms. Those are very similar, right, because we’re enabling independence of action [00:20:00] by making sure that work can be done in parallel. So what do you do when you have interdependent sequential steps, like an assembly line in the Toyota Production System, where you deliberately sequentialize work so that the steps are tightly coupled? You want to make sure that they’re not shared. Once you start sharing a critical resource, whether it’s a machine or a person, this is where you get into contention. And this is where wait time goes up exponentially.
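The effect of sharing a critical resource can be sketched with a textbook queueing formula. This is an illustration only, not something from the conversation. It assumes the simplest single-server model (M/M/1: random arrivals, one shared resource, random service times), in which the mean wait in the queue is the service time multiplied by utilization / (1 - utilization). The function name `mean_queue_wait` is hypothetical.

```python
def mean_queue_wait(utilization: float, service_time: float = 1.0) -> float:
    """Mean time a request waits in the queue before being served.

    Assumes an M/M/1 queue (an illustrative model, not from the transcript):
    wait = service_time * utilization / (1 - utilization).
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time * utilization / (1.0 - utilization)


if __name__ == "__main__":
    # Show how waiting grows as the shared resource gets busier.
    for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
        wait = mean_queue_wait(rho)
        print(f"utilization {rho:.0%}: mean wait is about {wait:.1f} service times")
```

In this simple model the wait grows in proportion to 1 / (1 - utilization), so it rises steeply as the shared machine or person approaches full load, which is the contention described above.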

Team of Teams is a great one. When they were trying to dismantle enemy terrorist networks, the value stream looked like U.S. Army Rangers, intelligence agencies, and then U.S. Navy SEALs. Those are all people analyzing things, trying to get data to where it needs to go to synchronize a bunch of efforts. And you do that not by reorganizing the entire U.S. Department of Defense, but by making sure that the interfaces between U.S. Army Rangers, intelligence agents, and Navy SEALs are well known so that you can go through the OODA loop quickly, seamlessly, predictably. When sighting to [00:21:00] terrorist capture takes months or quarters, that will never result in a capture, because people are trying to evade capture. My claim is you can do this for human systems just as you can for mechanical systems and electronic systems.

So how do you achieve that? It’s really through partitioning. I’m just rereading Eric Evans’s book Domain-Driven Design. This is all about creating hard modular boundaries in the design process as well as the operations process so that when bad things happen, they’re contained to that domain.

Shane: Whenever you talk about domain boundaries, you find a bunch of people arguing where the boundary is, right?

Gene: Yeah, so much. One of the things I’m proudest of in the book is understanding the notion of coupling and what it means to be decoupled.

Imagine two guys moving a couch. Let’s call them Steve and Gene. Steve and Gene are coupled through the couch. What affects one affects the other and vice versa. You might think two guys moving a couch is all brawn work. There’s no brain work needed. And yet, Steve and Gene have some problems they need to solve [00:22:00] like where exactly is the center of gravity? To get through a narrow doorway, around which axis should they rotate the couch? To get through a narrow winding set of stairs, who should go first and should they face forwards or backwards. They don’t need consultants. They don’t need focus groups. Just by picking up the couch, trial and error, experimentation, communication, coordination, we can have some confidence that Steve and Gene will figure out how to get the couch to where it needs to go. 

But there are all these things that leaders can do to make their job more difficult. So one way is you turn off all the lights. And so suddenly it will take longer. They may damage the couch, the room, or themselves. That’s not so good.

But there’s another way that leaders can make their job more difficult, which is by introducing a lot of background noise or putting an intermediary between Steve and Gene so they can’t talk directly with each other. Maybe they have to go through JIRA work tickets and that’s their only mode of communication. Suddenly it’s not so absurd to think that Steve and Gene will not successfully move the couch. And I think that’s what DevOps is. They got to a point around [00:23:00] the late 2000s where the only way to do a deployment was through tickets, and no matter how many fields you added, there was just not enough bandwidth in that communication channel to enable them to safely, quickly deploy code. And the answer is, you let Steve and Gene talk directly with each other and co create the solution. So really, it’s a metaphor for joint cognition, joint co creation. To what degree are Steve and Gene able to act as a coherent whole? And so in data ops, if a service team is acting as if their data is just exhaust, you’re not going to get the right integrity of data in downstream batch jobs and so forth.

 

Murray: A lot of things you’re talking about are the same things we talk about in the Agile community. I think DevOps is much more technically focused. I saw DevOps emerge out of the Agile community in 2010. I was at an Agile conference in 2010 when Jez Humble spoke about it for the first time. So what I want to know [00:24:00] is, what do you think is the relationship between the DevOps community and the Agile community?

Gene: I’ve heard people argue that DevOps is a subset of Agile. Some would say no Agile is a subset of DevOps. But I’m not sure if I care. 

There was an old bad way to do things, and what I’ve learned is that there are really three orthogonal axes of performance which completely describe all the ways that we can do things poorly or greatly.

One is about independence of action. Are people configured and divided in a way where they can get something done? There’s another axis, which is feedback. Either no one gets the feedback they need, or the feedback is being generated but it’s going to the wrong people. And the third dimension is around slowification. Are you allowing enough time for improvement of daily work?

So those three things, I think, completely explain whether a system will be well performing or poorly performing. And I don’t care what you call it: Agile, DevOps, Lean, the Toyota Production System, safety culture, [00:25:00] resilience engineering, the Westrum organizational model, Conway’s law, what Deming taught us, or technical debt. All of these things are trying to describe these concepts.

Murray: I think continuous improvement and humble leadership is absolutely core to all these movements. 

Gene: Yeah for sure. 

The larger the organization, the more the leader has to be the socio technical maestro. Dr. Westrum said there are five characteristics: high energy, high standards, great in the large system thinking, great in the small. Because they have to know when things are being misrepresented. But most importantly, they love walking the floor. Hanging out with developers, seeing what daily work looks like. That’s the highest fidelity signal a leader can have about the extent to which they are enabling people to do their work easily. And I don’t care where you draw the boundary. Is it dev? Is it ops? Is it DevOps? You need all of those five characteristics.

Murray: I see people complaining [00:26:00] about Zombie DevOps. Six months ago the DevOps team were the infrastructure team. They’re working in a highly hierarchical organization with very siloed communication and very slow feedback loops and very long delays. And if you want the DevOps team to do a firewall rule change for you or get you a virtual environment, it’s going to take three months.

We’ve been talking about the destruction of the agile brand name caused by corporations slapping the Agile word onto things. Is the same thing happening in DevOps? And if so, what can we do about it? 

Gene: Yeah. I think you could say the same thing is happening in DevOps. And the same thing happened in Lean. Dr. Steven Spear was talking about how the Lean community got overly focused on the tools. It was not so much about a community of scientists learning together, but it became about the Andon cord and the six S’s and the standard work, [00:27:00] really missing the bigger picture. Without a lot of energetic leadership that’s just the way it goes. Somehow leaders think you can buy it in a box and that abdicates their responsibility for the overall performance of the system.

I would say leaders are responsible for the system level goals, and they should never be a sucker for someone selling DevOps in a box, Agile in a box, Spotify model in a box.

You just hear all these heartbreaking stories of consultants coming in and saying we’re going to Spotify you and they leave the system worse off than when they came in. You can say it’s their fault, but no, it’s your fault. You are part of it. You are complicit as leader. You are responsible for being the technical maestro. 

The opposite of leadership enabling people to do their work easily and well is that they make everyone’s work harder, terrible, miserable. That actually shows up in the State of DevOps research. To what extent do people feel satisfaction in their work, feel connection to the mission? To what extent can [00:28:00] they recommend their organization as a great place to work to their colleagues and friends? That’s all in there.

Murray: Gallup polls have been doing this research on employee engagement for many decades. And they’ve said that only about 25 percent of people truly feel engaged, supported, and able to do their best in an organization because of the system, the leadership.

Gene: And yet we know that in these great organizations people have a deep connection to the work. They have satisfaction in the capabilities they give to the customers. Someone’s complaining about a button that’s in the wrong place and, hey, they deliver to their customer the next day, and they get the deep gratitude and excitement from the customer because they’re co-creating solutions. How great is that?

Murray: Yeah, it is great. 

There have been quite a lot of complaints about SAFe in the agile community. And I know, for example, the Team Topologies authors said that the SAFe people took their stuff and changed it all around, so it wasn’t what they said it was. And the same [00:29:00] with the Scrum people complaining a lot about SAFe institutionalizing Zombie Scrum. What do you think about the SAFe community? And what’s the relationship with DevOps? 

Gene: Yeah, I’m a big admirer of Dean Leffingwell, the original author of the SAFe framework. I took his training. And you can’t sit through a week of listening to Dean talk about problems that he solved and not have a tremendous amount of appreciation for his experience. 

One of the things that really surprised me was how many high performing technology leaders had chief architect in their title. They were head of product engineering and chief architect. Which makes a lot of sense, because that’s all about organizational wiring and the socio technical system.

But I’ve also seen organizations where it takes a year to ship a checkbox to the customer. And I find that very frustrating because that’s an architectural problem. 

Another behavior that I’ve seen is that people treat the PI [00:30:00] planning process as the only mechanism to do synchronization and coordination, which is absurd. 

And I’ve also seen PI planning sessions where it’s too big. People sometimes describe these meetings as a total waste of time because it’s about things they don’t care about. They need to shrink it. 

I attended one at Rally Software, which is where many of these SAFe practices were born, and the big room planning session is incredibly energizing. It’s about dependencies that we want to detect now rather than later to enable better execution. When it’s done wrong it’s about planning stuff that no one cares about. Or they view it as the only time you can talk to someone else, which is terrible. What you want is for teams to be able to identify who their dependencies are, maybe even set up ad hoc cross functional teams so they can work on a feature together, independent of the rest of the release train. And that’s invigorating and it guarantees a way to communicate and coordinate with each other. So I’ve seen SAFe done well, I’ve seen it done [00:31:00] poorly, and I put the blame again on leadership and on the people who have the certifications who are doing it wrong. 

Murray: I think the big consulting companies are just giving leaders what they want. If leaders want some new words and some new processes, then that’s what they will give them if it’s the easiest way to make some money.

Gene: I was a card carrying member of the Institute of Internal Auditors for six years and that was amazing because they define the International Professional Practices Framework, the book on how internal audits should be performed. It defined what an audit is, what the phases are, how you perform them and what the standards are. And most importantly, they had what they call the Quality Assurance Review process. Auditors for the auditors, to make sure that what you do is in accordance with what’s defined in the IPPF. And I think whether it’s Scaled Agile or McKinsey, there should be some QA framework where people can objectively judge whether they are [00:32:00] doing things in a way that is in accordance with what is prescribed.

 

Murray: What can we learn from military leadership? 

Gene: Oh my goodness, so much. General Stanley McChrystal and Lieutenant Commander Dave Silverman were coauthors of the book Team of Teams. And Dave told me that they were shocked by the lack of formal training that leaders have in the commercial domain. And it really was dazzling just how much formal leadership training people in the military have. Both non-commissioned officers, enlisted folks, as well as officers.

As an example, a friend of mine, he’s a lieutenant colonel in the U.S. Marines, and he’s going through graduate school to be an effective colonel. And you have to go through another big jump in training to become a flag officer, a general or admiral.

Some of it is technology specific or like about the domain, but so much of it is about how systems work, and what is it like to [00:33:00] be a leader of leaders. And how systems are wired when you are having to achieve a mission where you can expect there will be no communication. 

Team of Teams is so interesting because that was all about the special operations command, about how the different functional groups, Army, Navy, intelligence agencies, needed to modify the wiring to make sure that sighting could lead to capture. I loved how they described the notion of liaison officers. You wanted to have people within the special operations community embedded in embassies because they’re the closest to where the intelligence is.

And so what does that have to do with our work? I think if we look at the DevOps movement, it really came from having these people who could span boundaries. You had the best infrastructure and operations people hanging out with the best developers, hanging out with the best architects, hanging out with the best information security people. They created a team of [00:34:00] teams that achieved a larger mission more than just InfoSec, more than just Dev and Ops and so forth. And they described so well the characteristics of that person that you choose to work outside of your functional specialty. They have high levels of empathy, high levels of ability to walk in other people’s shoes. They’re going to be very good at communicating. They know what the larger mission is. They’re very good at making friends.

 

Shane: So for me, Phoenix Project was a seminal book because you told a story. And in that story were so many organizational patterns, so many technology patterns, so many personal behavioral patterns. What made you write that way?

Gene: So some of my favorite books are of that style. It’s called the business fable. The Five Dysfunctions of a Team by Patrick Lencioni is one of my favorites. I remember reading that on an airplane and I was just so stressed out. It was about a startup executive team that didn’t trust each other and got nothing done. And I was like, oh my gosh, there’s a book about me. [00:35:00] But I think the most famous of that genre is a book called The Goal by Dr. Eliyahu Goldratt, which is referenced in The Phoenix Project.

For those of you who don’t know, it’s a novel about a manufacturing plant manager who has to fix his cost and due date issues in 90 days. Otherwise they shut the plant down. And I think that book sold five or six million copies. It’s integrated into almost every major operations research and MBA curriculum and so forth. My fellow coauthors and I wanted to write The Goal, but for the IT context. Same number of pages, similar cast of characters, similar challenges that they had to face.

It’s been very pleasantly surprising that it is just as relevant now as it was 10 plus years ago. It’s just a fantastic way to communicate problems, where other people say, holy cow, this could be about me. 

 

Murray: What interesting things are emerging in the DevOps community? 

Gene: I’ve never had more fun than right now, just experimenting on how I can be a better developer aided [00:36:00] by AI, like GitHub Copilot, ChatGPT and Claude.

There are all these things that I dreamed about doing for a decade that I have suddenly started working on because now it’s within reach. I’ll give you an example. For almost 15 years whenever I’ve been watching YouTube videos listening to podcasts including yours, I take a screenshot whenever I hear something interesting because it’s my hope that someday I’ll go back and look at the timestamp, go to the transcript and figure out what was so interesting and write something about it.

I have over 1500 screenshots over the last decade, of which maybe a handful I’ve gone back to do something about. And yet I did it for a reason. So since January, I’ve been playing with using the ChatGPT API and more recently Claude and Google Gemini to give it screenshots. And I say, extract out what podcast it was, what episode, what the timestamp was. Pull the transcript and find out what was being talked about. It’s so [00:37:00] cool. 

So I say: in Clojure, write a function that takes a bitmap, marches down the left hand side of the screen until it finds the specific color red, then marches right, and then calculates what percentage complete the video is. I could have done that, but that would have taken me two days, at least. I haven’t written any code like that in 20 years, but I could do that in 30 minutes. So it’s just amazing how much we can do. There are projects that take days, weeks, that you can now do in hours.
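The bitmap scan Gene describes can be sketched in a few lines. This is only an illustration, written in Python rather than the Clojure he used, and several details are assumptions that are not in the conversation: the bitmap is a list of rows of (red, green, blue) tuples, the progress bar starts at the left edge and spans the full width of the screenshot, and the names `is_progress_red` and `percent_complete` and the colour tolerance are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def is_progress_red(pixel: Pixel, tolerance: int = 60) -> bool:
    """Return True if the pixel is close to pure red (an assumed target colour)."""
    r, g, b = pixel
    return r >= 255 - tolerance and g <= tolerance and b <= tolerance


def percent_complete(bitmap: Sequence[Sequence[Pixel]]) -> Optional[float]:
    """Estimate how far through a video a screenshot was taken.

    March down the left-hand column until a red pixel is found, then march
    right along that row while the pixels stay red. The length of the red run
    divided by the row width is taken as the fraction of the video played.
    Returns None if the left-hand column contains no red pixel.
    """
    for row in bitmap:
        if row and is_progress_red(row[0]):
            played = 0
            for pixel in row:
                if not is_progress_red(pixel):
                    break
                played += 1
            return 100.0 * played / len(row)
    return None


if __name__ == "__main__":
    GRAY, RED = (40, 40, 40), (255, 0, 0)
    # A tiny 4x10 "screenshot" whose third row holds a 3-pixel red bar.
    screenshot = [[GRAY] * 10, [GRAY] * 10, [RED] * 3 + [GRAY] * 7, [GRAY] * 10]
    print(percent_complete(screenshot))
```

A real screenshot would first need decoding into pixels with an image library, and the player’s actual bar colour and position would have to be checked rather than assumed.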

A friend of mine, Steve Yegge, told me that at his company, Sourcegraph, the CTO said they wanted to do an integration with Emacs. That would have been a summer intern project that would have taken eight weeks. He did it in two days solo. 

This is showing that the job of the technology leader is not getting any easier, because you have these new problems. A friend of mine did a pilot within his bank and they ultimately called the coding copilot a failure, because junior developers were able to commit more code, but the [00:38:00] problem is that overwhelmed the senior developers, because they had to review all of it, and most of it was not actually deemed fit for production.

And so that’s a wiring that just doesn’t work. Somehow the pilot as designed did not achieve its goals. And so they’re trying to rethink, all right, given the mix of junior and senior developers, who should be using it when, and how do we still ensure that they’re writing great code that can be reliably put into production. 

Murray: All right. 

Shane, I think we better go to summaries. What do you got? 

Shane: Excellent. One of the recurring things you’ve talked about all the way through is wiring. The structure of the organization. The socio technical architecture. System thinking. How do organizations go from worst to first? They don’t change their people. They don’t change the technology. They change the organizational wiring. 

So the more specialization you have, the more complex organization wiring you need. As soon as we have a shared resource, we introduce constraints. As [00:39:00] soon as we introduce constraints, we introduce wait times. They go through the roof and now we have a system failure.
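The point about wait times going through the roof can be made concrete with a rough sketch. It assumes the simplest textbook queueing model, a single shared resource with random arrivals and service times (M/M/1), where the expected wait in the queue, measured in units of the average task time, is utilization divided by one minus utilization. That is the same "percent busy over percent idle" rule of thumb used in The Phoenix Project. The function name `relative_wait` is hypothetical, and real organizations will not follow the model exactly.

```python
def relative_wait(utilization: float) -> float:
    """Expected queueing delay, in multiples of the average task time,
    for a single shared resource under the M/M/1 model (an assumption)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in the range [0, 1)")
    return utilization / (1.0 - utilization)


if __name__ == "__main__":
    # The busier the shared resource, the faster the wait grows.
    for busy in (0.50, 0.80, 0.90, 0.95, 0.99):
        print(f"{busy:.0%} busy -> wait of about {relative_wait(busy):.1f} task times")
```

In this model, going from 50 percent busy to 90 percent busy multiplies the wait roughly ninefold, and the last few percent before full utilization are where the delay climbs steepest.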

And a lot of organizations don’t look at their flow of work because they don’t treat it as a system. They don’t look at the handoffs and they don’t look at the delays. They don’t bring any of that lean behavior into their teaming. 

And how do people become unstuck? How do we free up that architecture, that flow across the organization? Three things, slowification, simplification, and amplification.

So how do we slow things down? How do we remove the uncertainty and move it to the beginning rather than test and production? What can we do? So, you know, a sprint planning exercise is about reducing uncertainty. It’s about communication. Research spikes are about taking the heavy gnarly stuff that we don’t know how to solve and doing a little bit of work up front to reduce the uncertainty.

This idea of simplification. Let’s map out the nodes and links. Let’s figure out where the constraints are and let’s unblock them. Moving the [00:40:00] data back into the product team. So how do you move the work to the person that actually starts it? 

Gene: And that is about software architecture. How we partition and group things to enable independence of action. 

Shane: So we want to decouple. We want to decouple the work as much as we decouple the systems. And we want to introduce independence of action. We want to be very clear about how we’re dividing that work and where those domain boundaries are.

We want to make sure that there is feedback. Up and down or middle out. And we want to have people focus on improving their work. How many leaders actually stop and observe the system? In a manufacturing plant, they would walk the floor. How many leaders walk the wiring of their organization rather than sit in their little offices and wait for the problems to bubble up? 

So for me, wiring, systems thinking, nodes and links. It’s this recurring pattern of looking at it holistically, [00:41:00] end to end. And yes, people and technology are part of it, but actually the system is where we get the majority of the failures.

Murray, what do you got?

Murray: Yeah. The key thing for me is leadership. So leaders should be working on the organization wiring, which is, to me, the system. It’s system thinking. In addition to that, the leaders determine the culture. All the problems we’ve seen with agile being turned into the opposite of itself all come back to the leaders. And good leaders can make an amazing difference to an organization. And we’ve got some really good examples of good leadership from the military, in Team of Teams and Turn the Ship Around and Mission Command and all that stuff. There’s a lot of really good ideas about empowerment, decentralization, rapid feedback, which are very helpful. So maybe organizations get what they deserve because that’s what the leaders set them up [00:42:00] for. 

Gene: So whenever you have an interface, you want to have the clearest signaling so that the person on the other side of the interface will react in a predictable way. And so when you have no control over the other side of the interface, or when that other side of the interface is a very long distance away on the org chart, what’s the best thing that you can do? You can actually put a trusted person on the other side who can help translate, who can help communicate more clearly.

And it reminds me of how the Apollo astronauts did it. So during an Apollo mission the only people who were allowed to talk to the astronauts in space were other astronauts. And in general, it was the people who trained them. It was a backup crew. And one other person. So in other words, these were not just any astronauts. These were really good astronauts. And the reason for that is that when the bandwidth between the crew in space and mission control in Houston is that tenuous and potentially absent, the best thing you can have is an avatar of the [00:43:00] astronaut on the other side of the radio because that will enable better communication than any non astronaut could. And so it’s just another example of what great socio technical maestros do, to ensure that the system behaves well as a whole despite very adverse conditions between two nodes.

 

Murray: If people want to learn more about what you’re talking about, what are the three books that they could read?

Gene: Yeah, if you like The Phoenix Project and you want to get more of a dev centric view of that, I would read The Unicorn Project. The Accelerate book documents the work that we did, the science behind DevOps and why it works. And if you want to open up the aperture to the real mechanisms at work, it’s Wiring the Winning Organization.

For anyone interested in AI, I would recommend Co-Intelligence by Dr. Ethan Mollick. He’s an economist. He’s not a computer science person. And it’s just this marvelous, practical explanation of what AI is, why you should care, and what it could mean to society and workforces. And a fun book I’ve read lately is Life in First [00:44:00] Person. It was about the making of Doom and Quake and all the folks around id Software. It was fabulous.

Murray: So how can people engage with you? I know you’ve run the DevOps conferences. Do you provide training, consulting services? 

Gene: Yeah, I run a conference called the Enterprise Technology Leadership Summit. We’re going into conference number three. We’re going into year 10. The goal is to highlight the most heroic technical leadership stories ever. And it’s just been so fun and rewarding.

You can reach me on Twitter, where I’m RealGeneKim, and on LinkedIn. So that’s probably the best way to reach me.

Murray: All right, that’s great. Hey, we really appreciate you coming on Gene.

Gene: Oh my gosh, Shane, Murray, congratulations on the podcast and the guests that you’ve had. This is such a treat. And thank you for all your thoughtful questions.

Murray: That was the No Nonsense Agile Podcast from Murray Robinson and Shane Gibson. If you’d like help to create high value digital [00:45:00] products and services, contact murray at evolve. co. That’s evolve with a zero. Thanks for listening.
