Prevent, Detect, Recover
Prevent, Detect, Recover (PDR) is a thinking tool I use to help teams consider quality beyond testing. PDR stands for:
- Prevent: activities that can be performed to prevent incidents from happening in the first place.
- Detect: activities or approaches that allow the discovery of incidents before deployment and release. Testing is one such activity.
- Recover: activities or approaches that enable faster recovery.
PDR allows teams to consider quality across ALL delivery, not just in the testing phase. Let me explain more.
Quality != Testing
Most teams underestimate how much time, expertise, and complexity go into quality-related activities. When it comes to quality, common mistakes I see are:
- Teams only focus on testing as the quality activity.
- Teams rarely think about testing until after code is written.
- Teams believe testing is quick and easy.
As a result, teams often underestimate the cost of quality. They fail to consider how to maintain and recover systems post-deployment. Instead, they typically focus on testing at the expense of other approaches.
For example:
- Teams don't think about other aspects of quality such as performance, security, and scalability until related issues appear, by which point the system is in production and the fix costs more.
- Teams overlook the impact a poor CI/CD process has on product quality.
- Teams only think about automated alerts after deployment, when the pressure is often on to move on to the next feature.
- Teams either over-index on critical user journeys (CUJs) and create too many Service Level Objectives (SLOs), or they don't write any SLOs, mainly because their benefits are not fully understood.
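One benefit of an SLO that is easy to miss is the error budget it implies: the number of failures a journey can absorb before the objective is breached. The sketch below is a minimal illustration, not a prescribed implementation. The `Slo` class, the `error_budget_remaining` function, the "checkout" journey and the request counts are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """A hypothetical availability SLO for one critical user journey."""
    journey: str
    target: float  # e.g. 0.999 means 99.9% of requests should succeed


def error_budget_remaining(slo: Slo, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget left (negative if overspent)."""
    # The budget is the share of requests the SLO allows to fail.
    allowed_failures = (1.0 - slo.target) * total_requests
    if allowed_failures == 0:
        # A 100% target (or no traffic) leaves no room for failure.
        return 0.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / allowed_failures


# Illustrative numbers only: one million requests, 250 of which failed.
checkout = Slo(journey="checkout", target=0.999)
remaining = error_budget_remaining(checkout, total_requests=1_000_000, failed_requests=250)
print(f"{checkout.journey}: {remaining:.0%} of error budget remaining")
```

A number like this gives a team something concrete to discuss when trading off new features against reliability work.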
One of the biggest threats to consistency and predictability of releases is the lack of understanding of the scope of work due to complexity and 'unknown unknowns'. Conversations with a quality coach help uncover this work, allowing teams to make informed decisions on scope.
Discussing quality before any code is written gives teams an understanding of what is required to build a quality product. This benefits the team as they clearly understand 'what good looks like'.
Of course, knowing and doing are two different things.
In engineering, we constantly discuss tradeoffs. Quality, speed and cost are typical tradeoffs to be made. A discussion on PDR gives the information required to make informed decisions on these tradeoffs. Teams are in a better position to estimate their work accurately. In doing so, they can deliver more consistent and predictable releases.
Gilding the Lily
A senior manager or engineer will inevitably caution against 'gilding the lily' by over-engineering the solution. This is because modern engineering practices favour a minimum viable product (MVP). It's wise to do so. The benefit of agile is in delivering only the business value required.
But business value must be appropriately supported. After all, it's MVP not MVC (minimum viable code). Let's not under-engineer systems so that they require intensive post-production support and can't be adequately supported due to a lack of Recoverability.
The cost of Recoverability
Relying on Recoverability as the primary source of quality is an attractive proposition as it allows for increased speed of delivery. The downside is an increase in unplanned work as incidents increase. Unplanned work increases flow interruptions and reduces productivity and overall team happiness. Consistent and predictable releases can only come from understanding the full scope of work that includes quality tasks.
When to discuss PDR
PDR can be discussed at any time, but there are some occasions when discussing PDR is optimal.
PIRs & Retros
One of the most effective ways to get teams to think about PDR is in a team retrospective or during a PIR (Post Incident Review).
Typically, the motivation to solve problems is higher as the experience is still fresh in their minds. Having real problems to solve often gives teams the reason and the permission to experiment with new approaches.
PIRs and retros are the time to reflect on success and failure. PDR helps teams to systematically think through how to prevent, detect and recover from such an incident in the future.
Planning
Another valuable time to bring up PDR is in delivery planning. Injecting questions and discussions on PDR at this time has a multiplier effect on quality. That's because people have time to incorporate these ideas into their work. Leaving quality planning until just before the 'testing phase' has significant downsides.
PDR for senior management
PDR is most powerful at the team level, but it can also be a coaching tool when explaining concepts to principal engineers, Directors of Engineering (DoE) and senior management. Influencing senior engineering leaders helps secure the time and resources these quality tasks need.
Prevention Tasks
Implicit assumptions about scope, complexity and requirements are common causes of poor quality. Developing a shared understanding of the why and the what increases quality significantly. The following sessions/conversations can be facilitated, or the team can have them themselves.
- Quality Sliders (tradeoffs)
- Risk Storming Sessions
- Quality Attributes Discussions
- Example Mapping Sessions
- Story Splitting Conversations
- Vertical slicing
- Small batches
Detection Tasks
Testing is an important activity that should occur as early as possible, but other detection activities exist. Agreeing on them up front gives the team greater clarity on which ones will be performed.
- Test Driven Development & Pair Programming
- Static code analysis & code security tools
- Peer reviews
- Test Planning
- Levels of Testing (Unit, Integration, E2E, Test in Prod)
- Types of testing (performance, security, exploratory)
- Test data setup
- Test environments (staging, production)
- Feature Flags
- Test Environment Setup
- Test Automation Strategy
- Test Reporting
- Distributed tracing
- Testing in Prod
Recovery Tasks
Again, discussing and agreeing on recovery approaches means teams can be ready before release.
- Incident management
- Patching during an incident
- Testing during an incident (feature flags)
- Training new engineers on incident management
- Training new engineers on SLOs
- Automatic Detection (Alerts)
- Deploy and Release process
- Monitoring Discussions
- Automated Alerts
- CUJs & SLOs
- Logging Standards
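To make one of these recovery approaches concrete, here is a minimal sketch of a feature flag used as a kill switch, so a risky code path can be turned off during an incident without a redeploy. It is an illustration under stated assumptions, not a recommended flag system: the `FLAG_<NAME>` environment variable convention, the `flag_enabled` helper, the `new_pricing` flag and the pricing logic are all hypothetical.

```python
import os


def flag_enabled(name: str, default: bool = False) -> bool:
    """Read a flag from an environment variable (hypothetical FLAG_<NAME> convention)."""
    raw = os.environ.get(f"FLAG_{name.upper()}")
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "on", "yes"}


def checkout_total(prices: list[float]) -> float:
    """Use the new pricing path only while its flag is on; otherwise fall back."""
    if flag_enabled("new_pricing"):
        # New, riskier code path (an invented 10% discount for illustration).
        return round(sum(prices) * 0.9, 2)
    # Known-good path that the team can fall back to during an incident.
    return round(sum(prices), 2)
```

Because the flag defaults to off, unsetting `FLAG_NEW_PRICING` returns every request to the known-good path, which is the kind of recovery step worth agreeing on before release.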
Working with people outside of quality
Many tech people scratch their heads in puzzlement when a quality professional asks to be included early in a project or feature design. After all, isn't testing something that's done at the end? As a result, you may have your work cut out to convince people to 'let you in' to meetings. The best advice is to build relationships with key influencers. In my experience, these are product managers, delivery leads and principal engineers.
Also, don't assume people know anything about bug prevention and its benefits. Share articles & talks on bug prevention and shifting testing left. Talk about prevention over detection and its benefits in terms of cost of quality. Talk about better scoping leading to more accurate estimates that, in turn, offer greater predictability for releases.
A final word on Team Maturity
A common mistake quality coaches make is to assume teams have the same knowledge and depth of experience as they have, only to find that even the most essential quality activities are not being performed. As a quality coach, stay curious and ask about existing practices before jumping into advanced concepts.
Make a point of rolling out new practices slowly. Allow time for new concepts to be absorbed into the team psyche. Avoid change fatigue by keeping the level of change small. Be guided by the team, their motivation, and the amount on their backlog.
Do you have a similar approach? Let us know!