This is an abbreviated version of an article of the same name published in Medium’s biggest Scrum & Agile publication, “Serious Scrum”: “How to (not) get burnt” on Medium. The German translation on this blog refers to the full version and is therefore much longer than this summary.
If you’re new to Agile Estimation, here’s a primer
Agile teams want to avoid wasteful conversations about the subjective duration of tasks, so they estimate in Story Points: literally “the number of (abstract) points at which we estimate a (user) story”.
Teams agree on an abstract effort estimate for one small user story that has already been done in the past. From then on, all other user stories are estimated as multiples of that reference story or as sums of previously estimated stories. The well-established best practice starts with a (modified) Fibonacci sequence:
0 | 1 | 2 | 3 | 5 | 8 | 13 | 20 | 40 | 100 |
Agile Estimation uses these numbers to estimate effort relative to other tasks: “I think this feature is as much effort as those two features combined.” (That’s why it is also called relative estimation.)
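To make the primer concrete, here’s a minimal sketch (my own illustration, not from the original article) of what relative estimation boils down to: pick a reference story the team has already delivered, express new work as a rough multiple of it, and snap the result to the agreed scale. The reference value of 2 points and the helper names are invented for this example.

```python
# Hypothetical sketch of relative estimation; scale values as in the primer above.
SCALE = [0, 1, 2, 3, 5, 8, 13, 20, 40, 100]

def snap_to_scale(raw_points):
    """Return the scale value closest to a rough, gut-feel number of points."""
    return min(SCALE, key=lambda p: abs(p - raw_points))

# Assumed reference: a small story the team already delivered, agreed to be worth 2 points.
reference = 2

# "This feature feels like roughly three times the reference story."
print(snap_to_scale(3 * reference))  # -> 5, the closest value on the agreed scale
```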
Why it’s a bad idea to take on big features
The inherent uncertainty of big features makes it unlikely that they get finished within one Sprint. Let’s say we have a team with a rolling average velocity of 25 to 30 Story Points per Sprint. What do you think is more likely?
- for the team to finish 6 user stories worth 5 Story Points each, or
- for the team to finish 1 user story worth 20 Story Points and another worth 8?
Of course it’s option 1. The organizational effort of clarifying the uncertainties of the bigger stories, and the potential rework after time spent building the wrong thing, are the biggest hurdles preventing big features from getting finished within one Sprint. Because of this, teams commit to not taking on features above a certain size and to breaking them down first instead.
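To make this a bit more tangible, here’s a back-of-the-envelope sketch with made-up numbers: assume the chance that a story does not get finished within the Sprint grows roughly with its size, because bigger stories hide more unknowns that need clarification or rework. Both the risk function and its parameter are pure assumptions for illustration, not data from a real team.

```python
# Toy model; all numbers are invented for illustration only.
def p_unfinished(size, k=0.02):
    # Assumption: the risk of not finishing grows roughly linearly with story size.
    return min(1.0, k * size)

def expected_points(plan):
    # Expected Story Points actually completed in the Sprint.
    return sum(size * (1 - p_unfinished(size)) for size in plan)

def p_all_done(plan):
    # Probability that every committed item gets finished.
    result = 1.0
    for size in plan:
        result *= 1 - p_unfinished(size)
    return result

small_stories = [5] * 6  # six 5-pointers, 30 points in total
big_stories = [20, 8]    # one 20-pointer plus one 8-pointer, 28 points in total

for name, plan in [("six 5-pointers", small_stories), ("a 20 and an 8", big_stories)]:
    print(f"{name}: expected points done = {expected_points(plan):.1f}, "
          f"everything done in {p_all_done(plan):.0%} of Sprints")
```

Even with this crude model the small stories come out ahead (about 27 vs. about 19 expected Story Points), and it still ignores the time already sunk into a big story that doesn’t make it, so reality tends to be worse.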
Teams estimating on a limited scale and why this is a problem
Some teams realized that they had committed to items too big to handle within a Sprint. To improve, they decided to “cut” their scale at 13. The pitfall they fell for: limiting the scale doesn’t improve your breakdown. It only shifts the estimates relative to each other, which actually makes the result worse.
Here’s a metaphor for why limiting your scale is a terrible idea:
Say we measure the temperature of water. 100°C is boiling hot, but 60°C already hurts quite a bit. Now let’s change our scale so that 40°C is the maximum. Nice, right? We can’t get burnt anymore! Wrong! What used to be labelled 60°C is now labelled 24°C. We changed the labels, but we didn’t (and can’t) change the facts of nature: obviously we still get burnt.
When asked to size every item in the Product Backlog on a scale of 1–13 Story Points, teams will most likely adjust to the reality of the new scale: suddenly 13 is the absolute maximum, the boiling point. This in turn means that the lower-priority, less well refined items further down the backlog will all be estimated at this new maximum of 13.
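Here’s a tiny sketch of what that collapse looks like (the backlog values are invented for illustration): re-estimating a full-scale backlog on a scale capped at 13 squashes every big item onto the new maximum.

```python
# Hypothetical example: what a capped scale does to existing full-scale estimates.
CAPPED_SCALE = [0, 1, 2, 3, 5, 8, 13]

def re_estimate(old_points):
    # Snap an old estimate to the nearest value on the capped scale.
    return min(CAPPED_SCALE, key=lambda p: abs(p - old_points))

backlog = [2, 5, 8, 13, 20, 40, 100]      # invented full-scale estimates
print([re_estimate(p) for p in backlog])  # -> [2, 5, 8, 13, 13, 13, 13]
```

The 20, the 40 and the 100 now all look identical, even though one of them is five times the effort of another.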
The main disadvantage is that relations are less visible
The smaller scale has a much lower factor (a different relation) between the items on the scale:
- 0–100: this sequence has an average of 19.2 and a median of 6.5; the factor between the maximum and the median is 15.4.
- 0–13: this sequence has an average of 4.6 and a median of 3; the factor between the maximum and the median is 4.3.
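The figures above are easy to reproduce; only the two scales themselves are taken from this article, the rest is plain Python:

```python
from statistics import mean, median

full_scale = [0, 1, 2, 3, 5, 8, 13, 20, 40, 100]
cropped_scale = [0, 1, 2, 3, 5, 8, 13]

for name, scale in [("0-100", full_scale), ("0-13", cropped_scale)]:
    print(f"{name}: average = {mean(scale):.1f}, median = {median(scale)}, "
          f"max/median = {max(scale) / median(scale):.1f}")
# 0-100: average = 19.2, median = 6.5, max/median = 15.4
# 0-13: average = 4.6, median = 3, max/median = 4.3
```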
Backlog items estimated at the maximum of the scale very often represent later, larger, multi-Sprint efforts and should definitely be broken down as time goes by. In the early stages of a project, however, I have often seen a huge difference between the well-refined User Stories of the first Sprints and the items further down the Backlog. The factor between them was far greater than 4.3. Making that visible matters, because it shows how much uncertainty those items still entail.
Cropping the scale is outright harmful, since it decreases both the accuracy of the estimates and the clarity around the items.
The fix is simple: Give them the full scale again (and ideally a set of Planning Poker cards)
Part of the problem was that the aforementioned team did not have Planning Poker cards (or an Agile Coach acting as Chief Handicraft Officer who simply crafts some). They estimated by show of hands.
By now there are apps like Scrum Time Planning Poker, but back then I just wanted to get those teams some real, physical Planning Poker cards. Online I found several products that made the same mistake: scales ending at 21, sometimes lower. With those, teams run straight into the trap described above.
Fortunately, there are also positive examples: Agile Planning Poker Cards by Ulassa, with the full scale and a great design. (Not a sponsored link, I just like them.)
My team later (re-)estimated the backlog: the increased granularity of the full Fibonacci scale led to more realistic effort ratios between backlog items, which in turn made it easier to predict delivery times. Using the full scale finally sparked conversations about splitting user stories again. Proper agile estimation was, among other things, what the team needed to get things done again.
Further reading:
- https://www.tempo.io/blog/why-we-are-so-bad-at-time-estimation
- Best of Mike Cohn on Agile Estimation:
  - Should Story Points Be Assigned to a Bug Fixing Story? (I disagree with this one, but that’s a topic for another blog post.)
  - Why I Don’t Use Story Points for Sprint Planning
  - Story Points Estimate Effort Not Just Complexity