Blame. It’s the act of finding fault, and it’s an ugly word that conjures up all kinds of feelings and reactions. As humans, we do it subconsciously. Research has shown that we’re hard-wired to place blame as a way of dealing with discomfort when faced with a difficult situation; it’s an emotional reaction.
For those of us in Site Reliability roles, it’s our job to ensure that services run 24/7 safely, securely, and reliably. So when there is an incident, we naturally want to know what happened, and we immediately start searching for a “root cause”. We quickly set up a “postmortem” to review the chain of events and discuss the processes and technologies that may or may not be to blame for a disruption in service. At Salesforce, any service disruption could affect the business of a lot of customers… and it’s a big deal. Our customers’ trust is our #1 value, so there’s zero room for failure.
Now, let’s pause for a moment to think about your reaction as you read that last paragraph. You may have had an emotional reaction to “service disruption… it’s a big deal.” Maybe you nodded in agreement, let out a big breath, or felt a tingling in the pit of your stomach. If you’re a product developer or an operations engineer, you’ve probably participated in a postmortem, and right now you can feel that tension as you think about the last one you attended. You’re reliving the worry about being blamed and having to answer the big “WHY”.
It doesn’t have to be that way! We can change our behavior, we can change the way we work together, and we can stop assigning blame. It’s not easy and it’s not quick, but we can create a blameless culture if we change the language we use, the questions we ask, and the behaviors we exhibit.
Postmortem -> Retrospective
Let’s start with what we call the meeting: “postmortem”. It’s another heavy word; by definition, it’s “an examination of a dead body”. Wait, who said anything about a “dead body”? No one died, so let’s stop using this word! Let’s call them “retrospectives” instead. In reality, all we’re doing is looking back and having a discussion, so “retrospective” feels much less morbid (and less threatening), don’t you agree? At Salesforce, we use the word “retrospective” and have seen the difference it makes.
Behavior & Structure Matter
Next, consider how the incident retrospective is structured and how you, your teams, and your partner teams behave during incident discussions. We’re all in this together. It’s to everyone’s benefit to really understand how the incident happened, which parts of the system or which processes were disrupted, and what needs to be done to prevent future incidents.
This is where it gets real: we need to fight the urge to place blame on an individual or a team. Instead, focus on looking back at what happened and reviewing what led up to the incident. Most importantly, strive to learn instead of blame. An incident retrospective should feed into both service improvement and development improvement processes.
Why -> How
Words matter, so stop asking “WHY” and start asking “HOW”. If you’ve got children, you’ve been asked “why” hundreds of times… and it drives you nuts, right? Being asked “why” over and over again while sitting in a room with your peers drives you just as nuts, except now there’s the added pressure of feeling like your job might be on the line and worrying that you’ll be blamed for the incident (sorry, I just made your heart race a little, didn’t I?).
Asking “why” puts people on the spot and makes them feel as if they have to deflect blame and justify an action or a decision. Asking “how”, on the other hand, gives people the opportunity to discuss the situation objectively and to describe what happened rather than defend it.
The Takeaway
As humans, we are hard-wired to place blame (see Brené Brown’s TED talk for more on this), and reading this isn’t going to make millions of years of behavior and evolution just disappear. What I do hope is that reading this has given you an awareness you didn’t have before, so that the next time you’re in an incident retrospective, you can focus on solving problems rather than playing the blame game.
If you’re interested in learning more, check out my talk at DevOps Enterprise Summit or view the slides here.