Posts Tagged ‘story points’

How Do Story Points Relate to Hours?

Sunday, February 8th, 2009

I’m often asked about the relationship between story points and hours. People who ask are usually looking for me to say something like “one story point = 8.3 hours.” Well, that just isn’t the case (especially since I made up 8.3 hours). Let’s see what the real relationship is between a story point and hours…

Suppose for some reason you have tracked how long every one-story-point story took to develop for a given team. If you graphed that data, it would look something like this:

Number of hours to develop various one-point stories

This shows that some stories took more time than others and some stories took less time, but overall the amount of time spent on your one-point stories takes on the shape of the familiar normal distribution.

Now suppose you had also tracked the amount of time spent on two-point user stories. Graphing that data as well, we would see something like this:

Number of hours to develop various one- and two-point stories

If the one-point stories are centered around a mean of x, ideally the two-point stories will be centered around a mean of 2x. This will never be exactly the case, of course, but a team that does a good job of estimating will be sufficiently close for reliable plans to be made from their estimates.

What these two figures show us is that the relationship between points and hours is a distribution. One point equals a distribution with a mean of x and some standard deviation. The same is true, of course, for two-point stories, and so on…

By the way, notice that I’ve drawn the distributions of one- and two-point stories with overlapping tails. It is entirely realistic that the biggest story a team estimated at one story point will take more time than the smallest story they estimated at two. After all, no team can estimate with perfect insight, especially at the story point level. So, while the tails of the one- and two-point distributions will overlap, it would be extraordinarily unlikely for the tails of, say, the one- and thirteen-point distributions to overlap.
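To put rough numbers on that overlap, here is a small Python sketch. It assumes, purely for illustration, that the hours for an n-point story are normally distributed with mean n·x and a standard deviation proportional to that mean; the values `x_hours = 10` and `cv = 0.3` (coefficient of variation) are made up, not measured. Under those assumptions, the chance that a one-point story takes longer than an independent two-point story comes out around 7%, while for one- versus thirteen-point stories it is about 0.1%.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_overlap(points_small: int, points_large: int,
                 x_hours: float = 10.0, cv: float = 0.3) -> float:
    """Probability that a story estimated at points_small takes MORE hours
    than an independent story estimated at points_large.

    Hypothetical assumptions, for illustration only:
      * hours for an n-point story ~ Normal(mean = n * x_hours, sd = cv * mean)
      * the two stories' durations are independent
    """
    mean_s, mean_l = points_small * x_hours, points_large * x_hours
    sd_s, sd_l = cv * mean_s, cv * mean_l
    # The difference (small - large) is itself normally distributed:
    diff_mean = mean_s - mean_l
    diff_sd = math.sqrt(sd_s ** 2 + sd_l ** 2)
    # P(small - large > 0)
    return 1.0 - normal_cdf((0.0 - diff_mean) / diff_sd)

for big in (2, 3, 5, 8, 13):
    print(f"1 vs {big:>2} points: {prob_overlap(1, big):.4f}")
```

Because the spread scales with the mean in this model, `x_hours` cancels out of the result; only `cv` (how noisy the team's estimates are) changes the answer.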

Is It a Good Idea to Establish a Common Baseline for Story Points?

Saturday, August 9th, 2008

In my previous post, I wrote about how to establish a common baseline for story points across relatively large teams (a few hundred developers). In this post I want to consider whether doing so is a good idea.

The need for a common baseline for story points usually arises from the reasonable desire to know how big the entire project is. To know that, we must know the size of the work to be done by each team. Unfortunately, along with this goal comes the ability to compare teams based on their velocities. Since many managers are constantly looking for ways to compare team and individual performance, it is not surprising that they begin to make such velocity comparisons. Almost all such comparisons disrupt the performance of the overall group or department.

A chart such as the one that follows can show a lot of interesting information.

Velocities before teams were told they would be compared

However, this chart can be very dangerous because of how teams will assume the data is being interpreted. Shown a chart like this, a common team response is to feel that they need to go faster than the other teams. That additional speed may come from working in a more focused manner (a good thing), but it may instead come from sacrificing quality, leaving important refactorings undone, or a variety of other not-so-good practices.

Some teams may respond to the pressure for their abstract measure of velocity to increase by gradually inflating the number of story points assigned to a story. This can happen in subtle and not particularly nefarious ways that can accumulate into large problems. Consider, for example, a team that is arguing over whether a particular story should be estimated at 5 or 8 points. If the team is under pressure (real or just perceived) to increase velocity they will be more likely to assign the 8. The next story the team considers is slightly larger. They compare it to the newly assigned 8 and decide to give it a 13. Without pressure to improve velocity, this same team may have given the first item a 5 and the second (slightly larger still) item an 8. In this one scenario the team has inflated their points from 5+8=13 to 8+13=21, or more than 50%. Story point inflation such as this tends to happen very quickly if it happens at all.
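The arithmetic of that scenario can be checked with a short sketch. The card deck below is a commonly used Fibonacci-like sequence (an assumption; decks differ), and `next_card_up` is a hypothetical helper that models a team resolving every close call upward under velocity pressure.

```python
# A common planning poker deck (assumed; decks vary between teams and vendors)
DECK = [1, 2, 3, 5, 8, 13, 20, 40, 100]

def next_card_up(points: int) -> int:
    """The card a pressured team picks when torn between `points`
    and the next value up (capped at the largest card)."""
    i = DECK.index(points)
    return DECK[min(i + 1, len(DECK) - 1)]

def inflation(original: list, inflated: list) -> float:
    """Relative increase in total points; 0.5 would mean 50% more points."""
    return sum(inflated) / sum(original) - 1.0

# Without pressure: the first story gets a 5, the slightly larger second story an 8.
unpressured = [5, 8]
# Under pressure: the 5-versus-8 argument resolves to 8, and the next story,
# compared against that new 8, gets the next card up: 13.
pressured = [next_card_up(5), next_card_up(8)]

print(pressured)                                   # [8, 13]
print(f"{inflation(unpressured, pressured):.0%}")  # 62%
```

Two small upward decisions turn 13 points into 21, an increase of about 62%, consistent with the "more than 50%" above.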

Consider what happened in the next few iterations for the four teams shown in the previous figure.

Four teams and their velocities

Not surprisingly, someone in the Project Management Office distributed the chart showing the similarities over the first three iterations. Two of the teams reacted by instantly inflating their story points. After seeing that, the yellow team followed suit. The green team is either extremely virtuous or they haven’t noticed the charts yet.

So, should you establish a common baseline? Yes, if there are advantages to doing so on your project. If you do, however, make sure you go out of your way to create safety around that baseline for the teams. Stress that it isn’t being done as a way to compare teams and that you (and your bosses) know that many factors influence velocity, not just “how good” a team is.

Establishing a Common Baseline for Story Points

Wednesday, August 6th, 2008

A common criticism of story points is that the meaning of a story point will differ between teams. In this post I want to describe how we can establish a common definition of a story point across multiple teams within an organization.

The best way I’ve found to do this is to bring together a broad group of individuals representing various teams and have them estimate a dozen or so product backlog items (ideally, in my opinion, in the form of user stories). Not every estimator needs to understand every item, but most people should understand most items. The items being estimated do not need to be new; some could be from a recently finished project that many estimators remember or worked on. Some items could be artificial; perhaps the group is asked to estimate “a typical transaction activity report.” If that meant something to most estimators, it would be a good candidate item.

I’ve done this with 46 people in a large conference room: 44 estimators plus me and a coach from my client who wanted to watch so that he could moderate such a meeting the next time one was needed. The 44 estimators represented 22 teams, with two estimators per team in the meeting. If you’ve seen or used the Mountain Goat planning poker cards, you’ll have noticed that they feature a very large number in the middle (plus the number in a smaller font in the corners). We could have done something cute like putting eight little goats on the eight card, but we put the very large number there deliberately: we wanted it to be visible across a potentially large conference room.

You can probably imagine how difficult it might be to reach consensus among 46 people playing planning poker. While it does not take proportionately longer to derive estimates, it does take quite a while with that many people. I think it took us about two hours to estimate twelve items.

But when that meeting was over, each pair of estimators went back to their teams with twelve estimates. Those estimates could then be used as the basis for estimating future work. As each team estimated new product backlog items they would do so by comparing them to the initial 12 plus any estimates that had been produced since (by them or any other team).
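Here is one way to picture that shared, growing set of reference estimates, as a minimal Python sketch. The `SharedBaseline` class, the ratio-based comparison, and the point values are all hypothetical; in a real meeting the comparison is a conversation, not a multiplication. Each new estimate is added to the catalog, mirroring how later estimates by any team become part of the baseline.

```python
# Planning poker deck (assumed; decks vary)
DECK = [1, 2, 3, 5, 8, 13, 20, 40, 100]

def nearest_card(value: float) -> int:
    """Snap a relative-size judgment to the closest card (ties go to the smaller card)."""
    return min(DECK, key=lambda card: abs(card - value))

class SharedBaseline:
    """Hypothetical shared catalog of estimated items that every team compares against."""

    def __init__(self, initial_estimates: dict):
        self.estimates = dict(initial_estimates)

    def estimate_by_comparison(self, new_item: str, reference_item: str, ratio: float) -> int:
        """Estimate new_item as roughly `ratio` times the size of an already
        estimated item, then add it to the catalog for later comparisons."""
        points = nearest_card(self.estimates[reference_item] * ratio)
        self.estimates[new_item] = points
        return points

baseline = SharedBaseline({
    "typical transaction activity report": 5,   # hypothetical reference values
    "change password": 2,
})
print(baseline.estimate_by_comparison(
    "monthly statement report", "typical transaction activity report", 1.5))  # 8
print(len(baseline.estimates))  # 3
```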

I’ll blog next about when it may or may not be a good idea to establish such a common baseline.

Should Companies Measure Productivity in Story Points / Ideal Days?

Wednesday, December 12th, 2007

Using story points or ideal days to measure productivity is a bad idea because it will lead the team to gradually inflate the meaning of a point. When trying to decide between calling something “two points” or “three points,” a team will clearly round up if it is being evaluated on productivity as measured by the number of story points (or ideal days) finished per iteration.

My view is that points can be the best tool for estimating and assessing progress that we’ve ever had, or they can be used as another weapon with which to hit the team. There are plenty of weapons with which you can hit your team. We don’t need to ruin points by using them that way as well.

Some teams have measured productivity with things like the number of backlog items delivered or the percentage of backlog items completed versus planned into a sprint. Teams will alter their behavior in response to those metrics as well, so they too can be gamed and can be misleading. These metrics can be useful, but only as part of a suite of metrics collected at the end of each iteration.

If we rethink the question of “how do we measure productivity,” we might get a better answer. Suppose you own a sandwich shop and want to measure the productivity of the sandwich maker in the back. If you measure him by how many sandwiches he makes, he responds by making as many sandwiches as he can, regardless of whether anyone ordered them! At the end of the day there will be 200 extra sandwiches to throw away. A better measure might be how quickly he makes any sandwich. So we’d measure the time from when the customer placed the order until the sandwich is put on a tray. Or, for a more complete metric, we might measure the time from when he receives an order until he is ready to receive the next order, as this captures any cleanup or restart time.

So, one measure we may want to include in our suite of metrics could be the responsiveness of the development organization. This would be measured in the same way as in the sandwich shop. Datestamp each product backlog item and track the time from when something enters the product backlog until it either (a) comes out of an iteration or (b) is delivered into the hands of customers. Choosing between (a) and (b) will largely be a matter of how often you ship software. Option (b) is a better measure of rapid delivery of customer value but is impractical in some cases. It would be a bit of a useless measure for the Microsoft Vista team, for example.
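A minimal Python sketch of that measurement, assuming each product backlog item records the date it entered the backlog and, when available, the date it came out of an iteration (option a) or reached customers (option b). The field names and dates are hypothetical, and reporting the median rather than the mean is a choice made here so that one long-stalled item does not dominate the figure.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import List, Optional

@dataclass
class BacklogItem:
    name: str
    entered_backlog: date                      # datestamp when the item was added
    finished_iteration: Optional[date] = None  # (a) came out of an iteration
    delivered: Optional[date] = None           # (b) in the hands of customers

def responsiveness_days(items: List[BacklogItem], to_customers: bool = False) -> Optional[float]:
    """Median days from entering the product backlog to the chosen end point.
    Items that have not reached that end point yet are skipped."""
    durations = []
    for item in items:
        end = item.delivered if to_customers else item.finished_iteration
        if end is not None:
            durations.append((end - item.entered_backlog).days)
    return median(durations) if durations else None

items = [
    BacklogItem("hotel photos", date(2007, 10, 1), date(2007, 10, 15), date(2007, 11, 1)),
    BacklogItem("search screen", date(2007, 10, 5), date(2007, 10, 29)),
]
print(responsiveness_days(items))                     # 19.0  (option a)
print(responsiveness_days(items, to_customers=True))  # 31    (option b)
```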

Why I Don’t Use Story Points for Sprint Planning

Thursday, November 8th, 2007

As described in Agile Estimating and Planning, I’m a huge fan of using story points for estimating the product backlog. However, I also recommend estimating the sprint backlog in hours rather than in points. Why this seeming contradiction?

I’ve previously blogged on the reasons why I recommend using different estimation units (points and hours) for the different backlogs. But I’m often asked a related question that I want to address here:

I’m curious why you aren’t using story points to do your sprint planning.  I thought that the point of measuring story point velocity was partly to determine how much we can take on (or commit to) in a sprint.  Do you only use story points for longer-term planning (e.g. release planning)?

I don’t use story points for sprint planning because story points are a useful long-term measure. They are not useful in the short term. It would be appropriate for a team to say, “We have an average velocity of 20 story points and we have 6 sprints left; therefore we will finish about 120 points in those six sprints.” It would be inappropriate for a team to say, “We have an average velocity of 20 story points so we will finish 20 points in the next sprint.” It doesn’t work that way.
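The long-term arithmetic is simple enough to show directly. The six sprint velocities below are invented for illustration; they average 20, so the six-sprint forecast is 120 points, yet only one of the individual sprints actually landed on 20.

```python
from statistics import mean

def forecast_points(past_velocities, sprints_remaining):
    """Long-term forecast: average velocity times the number of sprints left."""
    return mean(past_velocities) * sprints_remaining

# Hypothetical velocities for six completed sprints; they average 20
velocities = [18, 24, 15, 22, 21, 20]

print(forecast_points(velocities, 6))   # 120

# The same average says little about any single sprint:
avg = mean(velocities)
print([v - avg for v in velocities])    # [-2, 4, -5, 2, 1, 0]
```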

Suppose a basketball team is in the middle of their season. They’ve scored an average of 98 points per game through the 41 games thus far. It would be appropriate for them to say “We will probably average 98 points per game the rest of the season.” But they should not say before any one game, “Our average is 98 therefore we will score 98 tonight.”

This is why I say velocity is a useful long-term predictor but is not a useful short-term predictor.

Velocity will bounce around from sprint to sprint. That’s why I want teams to plan their sprints by looking at the product backlog, selecting the one most important thing they could do, breaking that product backlog item / user story into tasks and estimating the tasks, asking themselves if they can commit to delivering the product backlog item, and then repeating until they are full. No discussion of story points. No discussion of velocity. It’s just about commitment and we decide how much we can commit to by breaking product backlog items into tasks and estimating each. This is called commitment-driven sprint planning.
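The loop described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a planning tool: the `capacity_hours` check is a hypothetical stand-in for the team's own judgment about whether it can commit, and every story except the hotel-photos example (whose tasks appear in a later post on this page) is made up. Story points ride along on each item but play no part in the decision; they are only tallied afterward.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Story:
    name: str
    points: int              # product backlog estimate; not used to decide the commitment
    task_hours: List[float]  # tasks estimated in hours during sprint planning

def plan_sprint(backlog: List[Story], capacity_hours: float) -> Tuple[List[Story], float]:
    """Commitment-driven planning sketch; `backlog` is in priority order.
    The hours-versus-capacity test stands in for the team asking itself
    "can we commit to this?", which in practice is a judgment call."""
    committed, hours_used = [], 0.0
    for story in backlog:
        story_hours = sum(story.task_hours)
        if hours_used + story_hours > capacity_hours:
            break  # the team is full
        committed.append(story)
        hours_used += story_hours
    return committed, hours_used

backlog = [
    Story("see hotel photos", 5, [6, 4, 8, 5]),
    Story("search by price", 8, [12, 10, 9, 6]),
    Story("save favorite hotels", 3, [8, 6]),
    Story("compare two hotels", 5, [20, 10]),
]
committed, hours = plan_sprint(backlog, capacity_hours=100)
print([s.name for s in committed])     # ['see hotel photos', 'search by price', 'save favorite hotels']
print(hours)                           # 74.0
print(sum(s.points for s in committed))  # 16, tallied only after planning
```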

When a team finishes planning a sprint in this way, it is indeed likely that the number of story points they have unknowingly committed to will be close to their long-term average, though it will vary some. It will also be true that a team will commit to approximately the same number of hours from one sprint to the next. I use the term capacity for this number of hours because I reserve velocity for the amount of work planned or completed, expressed in the units used to estimate the product backlog (which I recommend be story points).

To Re-estimate or not; that is the question

Sunday, September 2nd, 2007

Should a team that is estimating in story points ever re-estimate? This is a question I’m commonly asked and would like to address here.

Most people have a natural feeling that re-estimating is somehow wrong, but they can’t quite say why. I encourage those individuals to stick to their hunches, and hopefully I can provide some of the reasoning that supports the natural inclination that most re-estimating is wrong. Philosophers talk about two types of knowledge. The first is a priori knowledge, which is knowledge before you experience something. Let’s call this knowledge-before-the-fact. This is the type of knowledge we have when we estimate something. Before developing the new search screen, I estimate it at about 8 story points because it seems to be about the same total effort as some other 8-point story. The other type of knowledge is what philosophers call a posteriori knowledge. This is knowledge after the fact.

When we estimate, it is important that we not mix knowledge-before-the-fact with knowledge-after-the-fact. Suppose you are looking at a Scrum product backlog that has just been estimated with none of the work started. Each of those estimates was given before-the-fact (a priori). Now suppose you are looking at the same project a few months later. You’ve got a list of completed work; some of the items on that list still show their original, before-the-fact estimates, but some have been re-estimated with after-the-fact estimates. The product backlog is similarly mixed: mostly the initial, before-the-fact estimates, but some estimates that have been revised after-the-fact because of what was learned by developing previous user stories off the backlog.

Having both before-the-fact and after-the-fact estimates on your product backlog and list of finished work can cause a lot of confusion for the project. When all estimates are given in before-the-fact numbers, we can reason about them and compare them. Suppose the team is estimating a new item and wants to say it’s equivalent to 20 story points because it’s similar to another item that has been estimated at 20 story points. That logic makes sense if the original item has not been re-estimated. If the old item was given an estimate of 10 before the fact and re-estimated to 20 after the fact, then it is harder to know if the new item should get a 10 or a 20. With the re-estimation having occurred, we’re in the position of saying “Before I start this one I think it’s a 20 because the other one felt like a 20 after I did it.” That’s weaker than “Before I do either of these they seem the same size.”

So, does this mean you should never re-estimate? Absolutely not. There are times when you want to re-estimate. First, re-estimating is useful when you completely blew it on the original estimate and can see that the mistake was a rare occurrence. (That is, if every estimate is systematically off by half, I wouldn’t re-estimate.) Second, you should re-estimate when there has been a change in relative size. For example, the team has discovered that learning AJAX will be about half as hard as they thought. We’d want to fix that because the new knowledge tells us that our relative estimates are off-kilter for the AJAX-heavy stories.
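For the first case, one rough way to separate a rare, badly blown estimate from a systematic error is to compare each finished item's hours per point with the team's typical value. This sketch assumes actual hours have been tracked per item, and the factor-of-two threshold is an arbitrary, hypothetical choice. Because points map to overlapping distributions of hours, any threshold should be generous, and a flag is a prompt for discussion rather than an automatic re-estimate. A change in relative size, the second case, would not show up this way.

```python
from statistics import median

def flag_blown_estimates(finished: dict, factor: float = 2.0) -> list:
    """finished maps item name -> (story_points, actual_hours).

    Flags items whose hours-per-point is more than `factor` times away from
    the team's median in either direction. A uniform (systematic) error moves
    every ratio and the median together, so on its own it flags nothing.
    """
    ratios = {name: hours / points for name, (points, hours) in finished.items()}
    typical = median(ratios.values())
    return sorted(name for name, r in ratios.items()
                  if r > typical * factor or r < typical / factor)

# Hypothetical tracked data
finished = {"search screen": (8, 70), "login": (3, 25), "report": (5, 45), "export": (2, 60)}
print(flag_blown_estimates(finished))   # ['export']

# Every estimate off by the same factor: nothing stands out
consistent = {k: v for k, v in finished.items() if k != "export"}
doubled = {name: (points, 2 * hours) for name, (points, hours) in consistent.items()}
print(flag_blown_estimates(doubled))    # []
```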

Sprint and release planning should be in different units

Sunday, January 14th, 2007

A common source of confusion on agile teams occurs when the sprint (“iteration”) backlog and the product backlog are both estimated in hours. To avoid this confusion I strongly recommend estimating these backlogs in different units.

In sprint planning the team should always talk of tasks and hours. Sprint planning typically covers a horizon of two to four weeks.

In release planning the team can choose between “ideal days” and “story points.” Regardless of which they choose, they still do sprint planning in hours.

I prefer story points for the product backlog items (for me, the product backlog typically holds “user stories”). What this means is that I may have a user story (“As a vacation planner, I can see photos of hotels so that I can choose the right hotel for my vacation.”) that is estimated in points, let’s say 5 points. That story stays on the product backlog (PB) until the product owner prioritizes it such that the team chooses to work on it in a sprint. Once selected, it is broken down into tasks and hours:

  • code the user interface, 6 hours
  • code new stored procedure, 4 hours
  • add photo maintenance page, 8 hours
  • write automated tests, 5 hours

and so on.

So both units are used, but at different times and when viewing items at different horizons.

Here’s a key reason I prefer points:

If I estimate the PB items (PBIs) in ideal days, then it is too easy to mistakenly think that the PBIs and the items on the sprint backlog (SB) are estimated in the same unit. After all, the SB is estimated in ideal hours and the PB is estimated in ideal days. So they’re the same unit (times 8 for the PB), right? This is a huge fallacy. On average, the teams I’ve coached spend 30 minutes breaking a product backlog item into tasks and estimating those tasks. So let’s not call that estimate “hours.” Let’s call it “hours I thought a lot about.” On the other hand, teams I coach spend 2-3 minutes on average estimating the PBIs. (These items don’t need the detailed thought upfront; we just want a rough estimate so we can decide priorities and a basic schedule.) So let’s call these “hours I pulled out of the air.” When the PB and the SB are estimated in days and hours, it is too tempting to multiply the number of days on the PB by eight, divide by the number of hours finished per sprint, and think that’s an estimate of how long the rest of the project will take. However, that’s bad math. It’s literally dividing apples by oranges: “hours I pulled out of the air” divided by “hours I thought a lot about.” The result will be meaningless. The problem goes away when teams use two-level planning (release and sprint) and track velocity in story points.
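One way to make that mistake hard to commit is to give the two kinds of estimates different types. In this Python sketch (the class names are hypothetical), `StoryPoints` can be divided by `StoryPoints`, which yields a number of sprints when the divisor is a velocity, but dividing story points by `TaskHours` raises a `TypeError`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StoryPoints:
    """Quick, relative size estimates for product backlog items."""
    value: float

    def __truediv__(self, other):
        # Remaining points / velocity (points per sprint) = sprints left: meaningful.
        if isinstance(other, StoryPoints):
            return self.value / other.value
        return NotImplemented  # anything else, including TaskHours, is refused

@dataclass(frozen=True)
class TaskHours:
    """Carefully considered hour estimates for sprint backlog tasks."""
    value: float

remaining = StoryPoints(120)
velocity = StoryPoints(20)        # points completed per sprint
print(remaining / velocity)       # 6.0

try:
    remaining / TaskHours(300)    # apples divided by oranges
except TypeError as err:
    print("refused:", err)
```

Since neither class defines the reflected `__rtruediv__`, Python has no fallback and reports the unsupported operand types instead of silently producing a meaningless number.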