Archive for August, 2008

Predicting Velocity When Team Membership Or Size Changes Frequently

Monday, August 11th, 2008

As a measure of the amount of work completed in an iteration, velocity works extremely well when teams are relatively stable. If the same people stay on a team, it is reasonable to assume that the amount of work they complete will be relatively constant from iteration to iteration. This allows us to plan using inferences such as “This team has an average velocity of 25 points per iteration over the last year and they have time for 8 iterations in this new project; therefore they will complete around 200 points in those 8 iterations.”
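The inference in that example is simple arithmetic, and a minimal sketch of it might look like the following. The function name and the five-iteration history are made up for illustration; only the 25-point average and the 8 iterations come from the example above.

```python
from statistics import mean


def forecast_points(past_velocities, iterations_remaining):
    """Project total points as average past velocity times iterations left."""
    return mean(past_velocities) * iterations_remaining


# Hypothetical history that averages 25 points per iteration.
history = [23, 27, 25, 24, 26]
print(forecast_points(history, 8))
```

This only holds as long as the team stays stable, which is the assumption the rest of this post relaxes.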

But what do we do when team membership or size changes frequently?

To answer this question most effectively, you should collect data on how teams of different sizes have performed over time in your organization. When I was a VP of Development at a couple of agile organizations, I used to collect data on velocity and team size changes in a simple spreadsheet similar to this:

Initial Team Size   New Team Size   Median of Last 5 Iterations   Iteration +1   Iteration +2   Iteration +3
6                   7               25                            20             24             28
6                   7               16                            16             15             19
8                   6               50                            40             40             42
7                   8               12                            10

The first column represented the size of the team before a change occurred. The next column represented the new size of the team (up or down). The third column was what I considered to be a reasonable “long term average” velocity for the team at its initial size. Because teams could change frequently (by a person or two), I settled on using the median value of the team’s last five iterations. The tradeoff in using a longer measure (the median of 15 iterations, perhaps) is that you’d have fewer observations. The next columns represent the actual velocities of teams over the next three iterations. Notice that for the last team, values are not shown for the last two iterations. This is usually because the team size changed again. If you have a significant number of teams, the rows in this type of spreadsheet will accumulate quite quickly.
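If you would rather keep this in code than in a spreadsheet, one row could be modeled roughly as follows. This is only a sketch: the `TeamSizeChange` class, its field names, and the sample velocity history are my assumptions for illustration, not something taken from the original spreadsheet.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import List


@dataclass
class TeamSizeChange:
    """One spreadsheet row (hypothetical field names)."""
    initial_size: int
    new_size: int
    baseline: float  # median velocity of the last five iterations
    # Velocities for Iteration +1..+3; shorter if the team size changed again.
    after: List[float] = field(default_factory=list)


def baseline_velocity(velocities, window=5):
    """Median of the most recent `window` iteration velocities."""
    return median(velocities[-window:])


# Made-up velocity history for a six-person team before a person was added.
recent = [22, 25, 31, 24, 25, 27]
row = TeamSizeChange(6, 7, baseline_velocity(recent), [20, 24, 28])
print(row)
```

Using the median rather than the mean keeps a single unusual iteration (the 31 above) from distorting the baseline.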

In some of the organizations where I used this approach we did not have a standardized definition of a story point (or were using ideal days, which were not as normalized as you might think). So all analysis was done on a percentage basis. What I wanted to know was “What is the average impact of adding a person to a seven-person team?” I would have loved the answer to be something like “Velocity goes up 15%.” Unfortunately, it wasn’t that straightforward because velocity often dipped for a couple of iterations before going up. By tracking data I found that usually by the third iteration a team had settled in on a new velocity, which is why my spreadsheet above only tracks through Iteration +3. By all means, track more and see what you find. (But keep in mind that the data will get sparse as team sizes change again.)

Another tab in my spreadsheet expressed all the data in percentage terms. The first two rows looked like this:

Initial Team Size   New Team Size   Iteration +1   Iteration +2   Iteration +3
6                   7               -20%           -4%            +12%
6                   7               0%             -6%            +19%

(Example: Iteration +1 in the first row is -20% based on the team dropping its velocity from 25 to 20.)
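The conversion behind that example can be sketched as below. The function name is an assumption of mine; the numbers are the first row of the velocity table (a baseline of 25 followed by 20, 24, and 28).

```python
def percent_change(baseline, velocity):
    """Change in velocity relative to the baseline, as a percentage."""
    return (velocity - baseline) * 100 / baseline


baseline = 25
after = [20, 24, 28]
print([round(percent_change(baseline, v)) for v in after])  # [-20, -4, 12]
```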

I then simply averaged these percentages for each team size change to get results like:

Initial Team Size   New Team Size   % Change in Iteration +1   % Change in Iteration +2   % Change in Iteration +3
6                   7               -10%                       -5%                        +15%
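As a rough sketch, the averaging step could be automated like this. The tuple layout and the `average_impact` name are assumptions for illustration. The sketch works from the raw velocities in the first table rather than from rounded percentages, and it simply skips iterations for which a team has no data.

```python
from collections import defaultdict
from statistics import mean

# (initial size, new size, baseline velocity, velocities after the change)
observations = [
    (6, 7, 25, [20, 24, 28]),
    (6, 7, 16, [16, 15, 19]),
    (8, 6, 50, [40, 40, 42]),
    (7, 8, 12, [10]),  # team size changed again after one iteration
]


def average_impact(observations, horizon=3):
    """Average % change per iteration offset, grouped by (initial, new) size."""
    grouped = defaultdict(lambda: [[] for _ in range(horizon)])
    for initial, new, baseline, after in observations:
        for offset, velocity in enumerate(after[:horizon]):
            pct = (velocity - baseline) * 100 / baseline
            grouped[(initial, new)][offset].append(pct)
    # None marks an iteration offset with no observations at all.
    return {
        change: [mean(values) if values else None for values in per_offset]
        for change, per_offset in grouped.items()
    }


impact = average_impact(observations)
print([round(p) for p in impact[(6, 7)]])  # [-10, -5, 15]
```

With only one or two observations per team size change, these averages are anecdotes rather than evidence; they become useful as the rows accumulate.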

This allowed me to answer all sorts of questions, including:

  • What will this team’s velocity be if we add two people?
  • How soon could we get this project if we added a person to each team?
  • If I want all those projects done by the end of the year, how many people would we need to add?
  • What would be the impact of not approving the new employees in the budget?
  • What would be the impact of a 15% layoff?
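To show how the averaged percentages can feed questions like these, here is a minimal sketch. The function names, the 30-point baseline, and the 400-point backlog are hypothetical. It also assumes velocity holds at the Iteration +3 level from then on, which is a simplification of my observation that teams had usually settled by the third iteration.

```python
def projected_velocities(baseline, pct_changes):
    """Apply each average % change to the team's baseline velocity."""
    return [baseline * (100 + pct) / 100 for pct in pct_changes]


def iterations_to_finish(backlog_points, baseline, pct_changes):
    """Count iterations needed, holding velocity at the last listed level."""
    velocities = projected_velocities(baseline, pct_changes)
    if velocities[-1] <= 0:
        raise ValueError("settled velocity must be positive")
    done, count = 0.0, 0
    while done < backlog_points:
        done += velocities[min(count, len(velocities) - 1)]
        count += 1
    return count


# Average impact of growing a six-person team to seven, from the table above.
six_to_seven = [-10, -5, 15]
print(projected_velocities(30, six_to_seven))       # [27.0, 28.5, 34.5]
print(iterations_to_finish(400, 30, [0]))           # unchanged team: 14
print(iterations_to_finish(400, 30, six_to_seven))  # with one more person: 12
```

Notice that the early dip matters: on a short enough backlog, adding a person may not pull the finish date in at all.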

There are of course many flaws with this approach. Adding Susan to the project is very different from “adding an unknown person” to the project. Still, if I have data on averages across the board in our organization, I can make assumptions about adding Susan specifically if I want to, although doing so carries more risk. Notice that the approach does not attempt to take into consideration who was added or even what skillset the person had. You could collect such data if you wanted. As obsessive as I am about collecting all sorts of data like this when I have access to it, I knew that collecting that type of data would have made this just hard enough that I wouldn’t have done it regularly.

I did collect a few other bits of data that I left out of the initial table (so as to have more horizontal room for the data of real interest). For example, I collected data such as iteration length and the name of the team’s ScrumMaster (the latter was in case I had questions a few weeks later).

The approach described here was just simple enough that I could get empirical evidence of the impact of team size changes. This was invaluable when discussing headcount changes with product owners and the CEO.

Is It a Good Idea to Establish a Common Baseline for Story Points?

Saturday, August 9th, 2008

In my previous post, I wrote about how to establish a common baseline for story points across relatively large teams (a few hundred developers). In this post I want to consider whether doing so is a good idea.

The need for a common baseline for story points usually arises from the reasonable desire to know how big the entire project is. To know that, we must know the size of the work to be done by each team. Unfortunately, along with this goal comes the ability to compare teams based on their velocities. Since many managers are constantly looking for ways to compare team and individual performance, it is not surprising that they begin to make such velocity comparisons. Almost all such comparisons are disruptive to the performance of the combined, overall group or department.

A chart such as the one that follows can show a lot of interesting information.

[Figure: Velocities before teams were told they would be compared]

However, this chart can be very dangerous because of how teams will assume the data is being interpreted. Shown a chart like this, a common team response will be to feel that they need to be faster than the other teams. Achieving this additional speed may come from working in a more focused manner (a good thing), but it may come instead from sacrificing quality, leaving important refactorings undone, or a variety of other not-so-good practices.

Some teams may respond to the pressure for their abstract measure of velocity to increase by gradually inflating the number of story points assigned to a story. This can happen in subtle and not particularly nefarious ways that can accumulate into large problems. Consider, for example, a team that is arguing over whether a particular story should be estimated at 5 or 8 points. If the team is under pressure (real or just perceived) to increase velocity, they will be more likely to assign the 8. The next story the team considers is slightly larger. They compare it to the newly assigned 8 and decide to give it a 13. Without pressure to improve velocity, this same team may have given the first item a 5 and the second, slightly larger item an 8. In this one scenario the team has inflated their points from 5+8=13 to 8+13=21, an increase of more than 60%. Story point inflation such as this tends to happen very quickly if it happens at all.
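The size of the inflation in that scenario is easy to check. The sketch below uses the two pairs of estimates from the example; the function name is just an illustrative choice.

```python
def inflation_percent(unpressured, pressured):
    """How much larger the pressured estimates are, as a percentage."""
    return (sum(pressured) - sum(unpressured)) * 100 / sum(unpressured)


unpressured = [5, 8]   # estimates without velocity pressure
pressured = [8, 13]    # the same two stories, estimated under pressure
print(round(inflation_percent(unpressured, pressured), 1))  # 61.5
```

A team whose real throughput is unchanged would show a velocity gain of this size purely from moving each estimate up one step on the scale.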

Consider what happened in the next few iterations for the four teams shown in the previous figure.

[Figure: Four teams and their velocities]

Not surprisingly, someone in the Project Management Office distributed the chart showing the similarities over the first three iterations. Two of the teams reacted by instantly inflating their story points. After seeing that, the yellow team followed suit. The green team is either extremely virtuous or they haven’t noticed the charts yet.

So, should you establish a common baseline? Yes, if there are advantages to doing so on your project. If you do, however, you need to make sure you go out of your way to create safety around that baseline for the teams. Stress that this isn’t being done as a way to compare teams and that you (and your bosses) know that there are many factors that influence velocity, not just “how good” a team is.

Establishing a Common Baseline for Story Points

Wednesday, August 6th, 2008

A common criticism of story points is that the meaning of a story point will differ between teams. In this post I want to describe how we can establish a common definition of a story point across multiple teams within an organization.

The best way I’ve found to do this is to bring together a broad group of individuals representing various teams and have them estimate a dozen or so product backlog items (ideally in the form of user stories, in my opinion). Not every estimator needs to understand every item, but most people should understand most items. The items being estimated do not need to be new items; some could be from a recently finished project that many estimators remember or worked on. Some items could be artificial; perhaps the team is asked to estimate “a typical transaction activity report.” If that meant something to most estimators, it would be a good candidate item.

I’ve done this with 46 people in a large conference room: 44 estimators plus me and a coach from my client who wanted to watch so he could moderate such a meeting the next time one was needed. The 44 estimators represented 22 teams; two estimators per team were in the meeting. If you’ve seen or used the Mountain Goat planning poker cards, you’ll have noticed that they feature a very large number in the middle (plus the number in a smaller font in the corners). We could have done something cute like put eight little goats on the eight card. We put the very large number there deliberately, though: We wanted it to be visible across a potentially large conference room.

You can probably imagine how difficult it might be to gain consensus among 46 people playing planning poker. While it will not take proportionately longer to derive estimates, it does take quite a while with that many people. I think it took us about two hours to estimate twelve items.

But when that meeting was over, each pair of estimators went back to their teams with twelve estimates. Those estimates could then be used as the basis for estimating future work. As each team estimated new product backlog items they would do so by comparing them to the initial 12 plus any estimates that had been produced since (by them or any other team).

I’ll blog next about when it may or may not be a good idea to establish such a common baseline.