Establishing a Common Baseline for Story Points

A common criticism of story points is that the meaning of a story point will differ between teams. In this post I want to describe how we can establish a common definition of a story point across multiple teams within an organization.

The best way I’ve found to do this is to bring together a broad group of individuals representing various teams and have them estimate a dozen or so product backlog items (ideally, in my opinion, written as user stories). Not every estimator needs to understand every item, but most people should understand most items. The items being estimated do not need to be new; some could come from a recently finished project that many estimators remember or worked on. Some items could be artificial; perhaps the group is asked to estimate “a typical transaction activity report.” If that meant something to most estimators, it would be a good candidate item.

I’ve done this with 46 people in a large conference room: 44 estimators, plus me and a coach from my client who wanted to watch so he could moderate such a meeting the next time one was needed. The 44 estimators represented 22 teams, with two estimators per team in the meeting. If you’ve seen or used the Mountain Goat planning poker cards, you’ll have noticed that they feature a very large number in the middle (plus the number in a smaller font in the corners). We could have done something cute, like putting eight little goats on the 8 card. We put the very large number there deliberately, though: we wanted it to be visible across a potentially large conference room.

You can probably imagine how difficult it can be to reach consensus among 46 people playing planning poker. While it does not take proportionately longer to derive estimates, it does take quite a while with that many people. I think it took us about two hours to estimate twelve items.

When that meeting was over, each pair of estimators went back to their team with twelve estimates. Those estimates could then be used as the basis for estimating future work. As each team estimated new product backlog items, it would do so by comparing them to the initial twelve plus any estimates produced since (by that team or any other).
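The comparing itself is a human judgment, but the bookkeeping behind a shared baseline can be sketched. The Python below is a minimal illustration, not a tool from this post: `ReferenceCatalog`, `snap_to_card`, the card values, and every item size are hypothetical assumptions. The catalog holds the baseline items plus everything estimated since, and a new relative judgment ("about twice that one") is rounded to the nearest card.

```python
from dataclasses import dataclass, field

# ASSUMPTION: a modified-Fibonacci deck; the post itself only mentions
# Mountain Goat cards and the sequence 0, 1, 2, 3, 5, 8...
CARD_VALUES = (0, 1, 2, 3, 5, 8, 13, 20, 40, 100)


@dataclass
class ReferenceCatalog:
    """Hypothetical shared catalog: the baseline items plus every item estimated since."""

    items: dict[str, int] = field(default_factory=dict)

    def add(self, title: str, points: int) -> None:
        """Record an estimated item so any team can compare new work against it."""
        if points not in CARD_VALUES:
            raise ValueError(f"{points} is not a card value")
        self.items[title] = points

    def comparables(self, points: int) -> list[str]:
        """Titles already sized at `points`, for sanity-checking a new estimate."""
        return sorted(t for t, p in self.items.items() if p == points)


def snap_to_card(raw: float) -> int:
    """Round a relative judgment (e.g. 'about twice a 3') to the nearest card.

    On a tie, min() keeps the first (smaller) card.
    """
    return min(CARD_VALUES, key=lambda c: abs(c - raw))


catalog = ReferenceCatalog()
# Two of the twelve baseline items; the sizes are invented for illustration.
catalog.add("typical transaction activity report", 3)
catalog.add("medium complexity screen", 13)

# Later, a team judges a new item to be "about twice the report".
estimate = snap_to_card(2 * 3)  # 6 is closer to 5 than to 8
catalog.add("export activity report to CSV", estimate)
print(estimate)                 # 5
print(catalog.comparables(5))   # ['export activity report to CSV']
```

Keeping every new estimate in one shared catalog mirrors the idea that later items can be compared against anything estimated so far, not just the original twelve.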

I’ll blog next about when it may or may not be a good idea to establish such a common baseline.


15 Responses to “Establishing a Common Baseline for Story Points”

  1. Dan Ackerson says:

    Great idea, Mike! We have struggled quite a bit with how to reach consensus about story points across the organization. Up until now we’ve just left it at “different teams will have different velocities,” but this is clearly confusing to upper management (who immediately tend to suspect some teams of being “underachievers”). I’ll let you know how our company-wide planning game works out!

  2. That’s awesome, thank you. Could you also imagine a similar approach in consulting firms where the team members can change from one customer engagement to the next? That could really be helpful for having a shared understanding in the organization of roughly what, say, 100 story points looks like when thinking about a new project or release.

    It may also help to surface particularly bad environmental or political issues on certain projects: “Hey, why does it take 5 times as long to do a story point on this project?” (Although perhaps the reverse could also be true, with teams unfairly compared to one another, hmmm…)

  3. Mike Cohn says:

    Hi Abby–

    In a consulting firm I would absolutely want to strive to have a common baseline for story points. In that type of environment this is commonly achieved by using some artificial user stories, because they are more likely to be understood by a broad set of consultants. So you may have a story like, “Have SharePoint do such-and-such” (if that’s something you do commonly). Get a bunch of consultant developers to agree on its size, repeat for more stories, and then publish a set of agreed-upon stories.

    Additionally, in a consulting environment, I would strongly suggest you track data on changes in team size since that will happen often. Your question prompted me to move that up on my to-blog-about list and I just wrote about it tonight here: http://blog.mountaingoatsoftware.com/?p=50

  4. Mike Zwicker says:

    Bingo! You’ve touched on another one of our hot topics. We have lived with the pain long enough and have now decided to bite the bullet… We Need a Common Basis of Estimate! Sounds good, doesn’t it? But that’s as much (or as far) as we can agree on.

    On a small scale, we have tried the method described above but couldn’t come to a consensus. It seems our system, which includes many configurable COTS products and glue code, differs so much from product to product that nobody can agree on what counts as similar. Coming up with a common basis of estimate for each product is a start but doesn’t address the bigger issue.

    I have thought about suggesting we use ideal days while still following the 0, 1, 2, 3, 5, 8… numbering scheme.

    Think we have already stepped in a cow patty! Your thoughts?

  5. Mike Cohn says:

    Hi Mike–

    While I prefer story points to ideal days, ideal days can work as an estimating unit. For me, though, the real key is to emphasize relative estimating as much as possible. With ideal days some product backlog items will be estimated as an answer to the “How long will it take?” question. But I’d like even more product backlog items estimated as answers to “How much time does it take compared to this backlog item?” Relative estimating is a wonderful technique. It’s the only way to estimate with points but can be applied with ideal days as well.

    Another point that may still let you try story points: the items you estimate en masse do not need to be real, and not every item needs to be understood by each person. Some teams I’ve worked with just estimate some artificial but close-to-real user stories. They’ll estimate things like “create a typical new report” and agree that’s 2 points. Then they’ll estimate “add a medium complexity screen like this one [show some existing application]” and call that a 13. And so on. So none of the items is real, but all are realistic.

  6. Syed Rayhan says:

    Mike,
    This clarifies a confusion I had about relative estimation. I thought a team needed a relatively well-defined, complete product backlog in place and had to perform relative estimation on it before starting work. I was not sure how to perform relative estimation on a continuously growing backlog. Just to make sure I understand what you are saying here: a team or a company can agree on the relative sizing of a set of representative stories and then use them to estimate real stories as they come in. That solves our problem. For some of our clients, we do releases every sprint and do not have stories defined for future sprints well ahead of time. So we end up with only actual estimates (from sprint planning), and hence we have no team velocity. Now we will have one… :-)

    Thanks a lot as always.

    Syed Rayhan
    product: http://www.scrumpad.com
    company: http://www.code71.com

  7. Mike Cohn says:

    Hi Syed–

    Yes, your understanding is correct: you can create a baseline from these representative stories. It’s still better to use real ones when you can, and you will build up real ones iteration by iteration. Any new story can be compared to any of the old ones, so the original baseline becomes less important over time (as there are many more to compare to).

    -Mike

  8. Ian Suttle says:

    Inspiring post for getting everyone thinking the same way. This has been a common question amongst people in our org as we introduce story point estimating.

    I’ve recently been giving presentations about story points and their benefits to others in our company. I’ve written up a brief intro to get the basics down here: http://www.iansuttle.com/blog/post/Betting-on-Story-Points.aspx

    Thanks,
    Ian

  9. Kerry Kimbrough says:

    Sorry, I arrived late to the discussion while doing some research on agile estimation. But I’m troubled. Here, the premise is that relative estimates are superior, but I find that assumption very problematic. First, the exemplar may be missing. Second, your info on actual effort for the exemplar may be flawed (especially true in the common case of “Gee, I think I remember when I did something like this”). And finally, the comparability of the task at hand with the exemplar may be sketchy; it may seem kinda semi-sorta similar, but actually it’s a new situation in many ways. In my experience, this “things are always different” aspect is the rule, not the exception. What I see is that the actual pragmatics of sw dev usually make relative estimation unsound. Yet this business of story points introduces all kinds of additional problems (e.g. the topic of this thread) and unnecessary variables (e.g. velocity). Since estimating is really about imagining what will happen in the future, isn’t it better to work directly with the real world of real tasks and real hours before you? I would truly like to hear your responses, because estimation is a difficult problem, and I’m still looking for more knowledge.

  10. Mike Cohn says:

    Hi Kerry–

    The idea of relative estimating being better than absolute is not merely a premise. It has been shown by Lederer and Prasad in “A Causal Model for Software Cost Estimating Error” (1998) and Vicinanza et al. in “Software Effort Estimation: An Exploratory Study of Expert Performance” (1991).

    I don’t consider velocity an unnecessary variable. I call velocity “the great equalizer” because it lets a team successfully plan a project as long as its estimates are consistent (even if there is systematic error in those estimates). Being consistent is much easier to achieve than being accurate, and consistency is what allows teams to create plans.
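    A small worked example of why consistent estimates are enough for planning. The Python sketch assumes two hypothetical teams doing identical work, one of which consistently estimates everything at double the other's numbers; `velocity` and `sprints_needed` are illustrative helpers, and velocity is taken as a simple mean of past sprints.

```python
import math


def velocity(points_per_sprint: list[int]) -> float:
    """Observed velocity: mean story points completed per sprint."""
    return sum(points_per_sprint) / len(points_per_sprint)


def sprints_needed(remaining_points: int, vel: float) -> int:
    """Whole sprints needed to finish the remaining points at velocity `vel`."""
    return math.ceil(remaining_points / vel)


# Hypothetical team A sizes the work "correctly".
history_a = [20, 22, 18]
backlog_a = 120

# Hypothetical team B does the same work but consistently estimates everything
# at twice team A's numbers (a systematic error). Both its backlog total and its
# observed velocity double.
history_b = [2 * p for p in history_a]
backlog_b = 2 * backlog_a

print(sprints_needed(backlog_a, velocity(history_a)))  # 6
print(sprints_needed(backlog_b, velocity(history_b)))  # 6
```

    Because the doubling inflates both the backlog and the velocity, it cancels out of the forecast. The cancellation relies on the error being consistent; erratic estimates would not cancel this way.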

    I don’t think it’s the foundation of your argument above, but I want to be clear that I advocate story point estimating for user stories (features) and not for tasks. You mention tasks a couple of times.

    You conclude by mentioning tasks again and asking whether it wouldn’t be better to use them. But keep in mind that an agile team plans at two levels. With story points I am talking about plans that encompass the next 3-9 months. A typical team would spend a month of effort (up front) if it had to break all the work of the next 9 months into tasks and hour estimates. Release-level planning exists to enable prioritization and product decisions. At the start of each sprint, the team does plan at the task/hour level.

    You may want to take a look at the Agile Estimating and Planning book.

  11. Kerry Kimbrough says:

    I mentioned some of the many sources of variation that confront relative estimates. I agree relative estimating can be effective when the team can get these variations under statistical control. But how to do that? What I seem to hear most often is “well, it eventually works itself out”. But what I often see in practice is “not”. Because the team really isn’t thinking about it or doesn’t know how, the distribution of actual duration around “N story points” has a high variance and velocity is unstable.
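    One way to put a number on "velocity is unstable" is the coefficient of variation of points completed per sprint. This is a Python sketch under assumptions: the sprint histories are invented, and the statistic is a common spread measure chosen for illustration, not one proposed in this thread.

```python
from statistics import mean, stdev


def velocity_cv(history: list[int]) -> float:
    """Coefficient of variation of sprint velocity (sample stdev / mean).

    Higher values mean a wider spread of points completed per sprint,
    relative to the average.
    """
    return stdev(history) / mean(history)


# Two hypothetical teams, both averaging 20 points per sprint.
steady = [20, 22, 18, 21, 19]
erratic = [8, 35, 12, 30, 15]

print(round(velocity_cv(steady), 2))   # 0.08
print(round(velocity_cv(erratic), 2))  # 0.59
```

    Both teams average 20 points per sprint, so velocity alone hides the difference; the spread shows how much less a release forecast for the erratic team can be trusted.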

    I understand the 2 levels of estimation and how story pts (relative) are used for backlog and hrs are used for sprints. (Sorry, my previous post confused these.) I understand that story points are intended to quickly capture the intrinsic difficulty of a story (regardless of who does it when). Has anyone tried a direct quantitative estimate of difficulty? Sort of an agile version of function points? (Not that I’d suggest FP, which is problematic in many ways and anyhow not well-suited to a brief planning conversation).

  12. Sonali says:

    Hi Mike,
    Thanks for this interesting post. In one of your replies you mentioned that once we start iterating, we can compare against the user stories (US) implemented in the last iteration to come up with size estimates for the user stories in the current sprint. That means the baseline established at the beginning of the project can be replaced with new ones.
    But does that mean we establish and publish a new baseline every time it changes from the original, or is it left to the team to decide which one to compare with?

  13. Mike Cohn says:

    Hi Sonali–
    I want to be careful in saying the baseline is replaced, as you do above. It hasn’t been replaced; rather, we’ve added to the set of items we can compare against. Think about it this way: Suppose we are standing in a city park. I point to a tree and say “that tree is 1 unit away.” I point to a farther tree and say that tree is 3 away. Looking in the other direction, I point to a table and say that the table is also 3 away. Next you point to something you think is about as far as the table, so you call it 3 away as well. All 3-unit items should be equally distant from us, even though that last one was compared to the table rather than to the initially baselined 3 (the farther tree).

  14. Devi says:

    Hi,

    We have been working on a Scrum project for a year now, and ideal days seemed to be working. But our ideal-day estimates have turned into something closer to actuals, and we have no user story sizing data from which to build a meaningful trend. Some of my team members do not think story points would solve the problem, because they would again be specific to the team and we can’t be assured that our reference would remain unchanged. For example, if two sprints ago a user story was, say, 6 points, today it might be 3 because the team is more comfortable with that type of functionality and would find it relatively easy. Does story point sizing work only if we have built a reference at the start, with the product backlog?

  15. Mike Cohn says:

    Hi Devi-
    A story that would have been estimated as 6 two sprints ago should still be estimated as 6 today. With relative estimating, new items are estimated in comparison to all previously estimated stories. You build your reference as you go. Find an item you want to call a 2 (leaving room below for ones). Then find something you want to call a 5. Do this through full-team discussion. Then use a technique like Planning Poker to estimate the rest of your product backlog.
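    The steps in this reply (agree on a 2, agree on a 5, then use Planning Poker for the rest) can be sketched in Python. The deck, item titles, and estimator names are hypothetical, and the convention that the lowest and highest estimators explain their cards before a re-vote is the usual Planning Poker practice rather than something stated in this thread.

```python
from typing import Optional

# ASSUMPTION: a modified-Fibonacci deck; the reply itself only names 1, 2 and 5.
DECK = (0, 1, 2, 3, 5, 8, 13, 20, 40, 100)


def poker_round(votes: dict[str, int]) -> Optional[int]:
    """Return the agreed estimate if every card matches, else None (discuss, re-vote)."""
    for name, card in votes.items():
        if card not in DECK:
            raise ValueError(f"{name} played {card}, which is not in the deck")
    distinct = set(votes.values())
    return distinct.pop() if len(distinct) == 1 else None


def outliers(votes: dict[str, int]) -> tuple[list[str], list[str]]:
    """Names of the lowest and highest estimators, who usually explain first."""
    low, high = min(votes.values()), max(votes.values())
    lows = sorted(n for n, c in votes.items() if c == low)
    highs = sorted(n for n, c in votes.items() if c == high)
    return lows, highs


# Reference points agreed first through full-team discussion (hypothetical items);
# later votes are argued in terms of "bigger or smaller than these".
reference = {"add a simple search filter": 2, "add a medium-sized settings screen": 5}

first = {"Ana": 3, "Ben": 8, "Cy": 5}
print(poker_round(first))   # None
print(outliers(first))      # (['Ana'], ['Ben'])

second = {"Ana": 5, "Ben": 5, "Cy": 5}
print(poker_round(second))  # 5
```

    Starting from a 2 rather than a 1 leaves room below for smaller items, as the reply suggests; every estimate agreed this way then joins the reference set for later comparisons.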
