Posts Tagged ‘metrics’

Using a One-Handed Clock to Convey Project Goals

Tuesday, October 6th, 2009

The “iron triangle” is a long-accepted way of talking about the four parameters of project success. In the iron triangle, scope, schedule and budget each takes its place along a side of the triangle. Quality is placed in the middle under the premise that we don’t mess with quality. We can, however, adjust the sides. Sometimes a product owner or key stakeholder is told, “Pick any two but I can adjust the third” by the project manager, ScrumMaster or coach. Sometimes the customer is told, “you can only lock down one of the sides.”

I’ve recently decided there’s a better way to convey the points we’ve been trying to make with the iron triangle–we use The One-Handed Clock of Project Goals.

To use the one-handed clock, position Scope, Schedule and Budget where twelve o’clock, four o’clock, and eight o’clock would be on a clock. It doesn’t matter which is positioned where but I put them as shown in the figure. Quality is again assumed to be fixed and is not needed on the clock.

The one-handed clock before its hand is added.

Next, ask the product owner or key stakeholder to point the one hand where it best indicates the project’s goals. The one hand can be aimed, for example, directly at Schedule. This would indicate that Schedule is the most important goal and Scope and Budget both take a back seat to it.

A one-handed clock.

Or perhaps the one hand is pointed between Scope and Schedule showing a mix of importance between them.

A one-handed clock with its hand pointed between Scope and Schedule.

To see how the One-Handed Clock of Project Goals works, take a moment to think about it. You can position the hand anywhere between any two of three goals but one goal is always left out. In the terms of the iron triangle, that would be the side left flexible.

There are a lot of things I like about this new way of visualizing the relative importance of Scope, Schedule and Budget. I’ll mention two here:

  • The One-Handed Clock allows stakeholders or product owners to convey a position more precisely than saying something like “I pick Schedule and Budget.” The ability to point the arrow precisely rather than only directly at an item is essential.
  • The One-Handed Clock is a useful visual metaphor that can be hung in a team room. The iron triangle doesn’t really work for that as it’s hard to convey which sides were selected other than by darkening them, which doesn’t show much.

Try this out and let me know what you think. The teams and product owners I’ve introduced it to so far have found it very helpful. I suspect you will as well. Also, let me know what else you use a one-handed clock for as it’s a useful visualization for any three competing factors.

How Do Story Points Relate to Hours?

Sunday, February 8th, 2009

I’m often asked about the relationship between story points and hours. People who ask are usually looking for me to say something like “one story point = 8.3 hours.” Well, that just isn’t the case (especially since I made up 8.3 hours). Let’s see what the real relationship is between a story point and hours…

Suppose for some reason you have tracked how long every one-story-point story took to develop for a given team. If you graphed that data you would have something that would look like this:

Number of hours to develop various one-point stories

This shows that some stories took more time than others and some stories took less time, but overall the amount of time spent on your one-point stories takes on the shape of the familiar normal distribution.

Now suppose you had also tracked the amount of time spent on two-point user stories. Graphing that data as well, we would see something like this:

Number of hours to develop various one- and two-point stories

If the one-point stories are centered around a mean of x, ideally the two-point stories will be centered around a mean of 2x. This will never be exactly the case, of course, but a team that does a good job of estimating will be sufficiently close for reliable plans to be made from their estimates.

What these two figures show us is that the relationship between points and hours is a distribution. One point equals a distribution with a mean of x and some standard deviation. The same is true, of course, for two-point stories, and so on…

By the way, notice that I’ve drawn the distributions of one- and two-point stories as having overlapping tails. It is entirely realistic that the biggest story a team put “one story point” on might turn out to take more time than the smallest story they put a two on. After all, no team can estimate with perfect insight, especially at the story point level. So, while the tails of the one- and two-point distributions will overlap, it would be extraordinarily unlikely that the tails of, say, the one- and thirteen-point distributions will overlap.
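To make “one point equals a distribution” a little more concrete, here is a minimal Python sketch. The hour figures are invented for illustration (just like the 8.3 hours above), and the `summarize` helper is a hypothetical name. It summarizes each group of stories by its mean and standard deviation, checks how close the two-point mean is to twice the one-point mean, and tests whether the tails overlap.

```python
from statistics import mean, stdev

# Hypothetical tracked hours per story; these numbers are made up.
one_point_hours = [5, 6, 7, 8, 8, 9, 10, 12]
two_point_hours = [11, 13, 15, 16, 17, 18, 20]

def summarize(hours):
    """Return (mean, sample standard deviation) for one point value."""
    return mean(hours), stdev(hours)

one_mean, one_sd = summarize(one_point_hours)
two_mean, two_sd = summarize(two_point_hours)

# Ideally the two-point mean is close to twice the one-point mean.
ratio = two_mean / one_mean

# The tails overlap when the slowest one-point story took longer
# than the fastest two-point story.
tails_overlap = max(one_point_hours) > min(two_point_hours)

print(f"one-point stories: mean {one_mean:.1f}h, sd {one_sd:.1f}h")
print(f"two-point stories: mean {two_mean:.1f}h, sd {two_sd:.1f}h")
print(f"ratio of means: {ratio:.2f}")
print(f"tails overlap: {tails_overlap}")
```

With this made-up data the ratio comes out a little under 2 and the tails overlap, which is the situation the figures describe.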

Predicting Velocity When Team Membership Or Size Changes Frequently

Monday, August 11th, 2008

As a measure of the amount of work completed in an iteration, velocity works extremely well when teams are relatively stable. If the same people stay on a team, it is reasonable to assume that the amount of work they complete will be relatively constant from iteration to iteration. This allows us to plan using inferences such as “This team has an average velocity of 25 points per iteration over the last year and they have time for 8 iterations in this new project; therefore they will complete around 200 points in those 8 iterations.”

But what do we do when team membership or size changes frequently?

To answer this question most effectively, you should collect data on how teams of different sizes have performed over time in your organization. When I was a VP of Development at a couple of agile organizations, I used to collect data on velocity and team size changes in a simple spreadsheet similar to this:

Initial Team Size | New Team Size | Median of Last 5 | Iteration +1 | Iteration +2 | Iteration +3
6                 | 7             | 25               | 20           | 24           | 28
6                 | 7             | 16               | 16           | 15           | 19
8                 | 6             | 50               | 40           | 40           | 42
7                 | 8             | 12               | 10           |              |

The first column represented the size of the team before a change occurred. The next column represented the new size of the team (up or down). The third column was what I considered to be a reasonable “long term average” velocity for the team at its initial size. Because teams could change frequently (by a person or two) I settled on using the median value of the team’s last five iterations. The tradeoff in using a longer measure (median of 15 iterations perhaps) is that you’d have fewer observations. The next columns represent the actual velocities of teams over the next three iterations. Notice that for the last team, values are not shown for the last two iterations. This is usually because the team size changed again. If you have a significant number of teams, the rows in this type of spreadsheet will accumulate quite quickly.

In some of the organizations where I used this approach we did not have a standardized definition of story point (or were using ideal days, which were not as normalized as you might think). So all analysis was done on a percentage basis. What I wanted to know was “What is the average impact of adding a person to a seven-person team?” I would have loved the answer to be something like “Velocity goes up 15%.” Unfortunately, it wasn’t that straightforward because velocity often dipped for a couple of iterations before going up. By tracking data I found that usually by the third iteration a team had settled in on a new velocity, which is why my spreadsheet above only tracks through Iteration +3. By all means, track more and see what you find. (But keep in mind that the data will get sparse as team sizes will change again.)

Another tab in my spreadsheet expressed all the data in percentage terms. The first two rows looked like this:

Initial Team Size | New Team Size | Iteration +1 | Iteration +2 | Iteration +3
6                 | 7             | -20%         | -4%          | +12%
6                 | 7             | 0%           | -6%          | +19%

(Example: Iteration +1 in the first row is -20% based on the team dropping its velocity from 25 to 20.)

I then simply averaged these percentages for each team size change to get results like:

Initial Team Size | New Team Size | % Change in Iteration +1 | % Change in Iteration +2 | % Change in Iteration +3
6                 | 7             | -10%                     | -5%                      | +15%
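The two spreadsheet steps (convert each velocity to a percentage change from the baseline median, then average by team size change) can be sketched in Python as follows. This is only an illustration of the arithmetic, not the spreadsheet I actually used; the function names are hypothetical, and `None` is my stand-in for iterations with no recorded velocity. The averages are taken over the unrounded percentages.

```python
from collections import defaultdict

# Rows from the first table: (initial size, new size, median of last 5,
# velocities for Iteration +1..+3). None marks a missing observation.
rows = [
    (6, 7, 25, [20, 24, 28]),
    (6, 7, 16, [16, 15, 19]),
    (8, 6, 50, [40, 40, 42]),
    (7, 8, 12, [10, None, None]),
]

def pct_change(baseline, velocity):
    """Percentage change of a velocity relative to the baseline median."""
    return (velocity - baseline) / baseline * 100

def average_changes(rows):
    """Average the per-iteration percentage changes for each size change."""
    grouped = defaultdict(lambda: [[], [], []])
    for initial, new, baseline, velocities in rows:
        for i, velocity in enumerate(velocities):
            if velocity is not None:  # skip iterations never observed
                grouped[(initial, new)][i].append(pct_change(baseline, velocity))
    return {
        key: [sum(vals) / len(vals) if vals else None for vals in per_iteration]
        for key, per_iteration in grouped.items()
    }

averages = average_changes(rows)
print([round(x) for x in averages[(6, 7)]])  # [-10, -5, 15]
```

For the two rows where a six-person team grew to seven, this reproduces the -10%, -5% and +15% averages in the table above.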

This allowed me to answer all sorts of questions, including:

  • What will this team’s velocity be if we add two people?
  • How soon could we get this project if we added a person to each team?
  • If I want all those projects done by the end of the year, how many people would we need to add?
  • What would be the impact of not approving the new employees in the budget?
  • What would be the impact of a 15% layoff?
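As a rough sketch of how such questions get answered, the averaged percentages can be applied to a team’s current baseline velocity. The example below assumes a hypothetical six-person team with a median velocity of 30 that is about to gain one person, and it reuses the -10%, -5%, +15% averages from the table above. The function name is made up for illustration.

```python
# Average % changes for a team going from 6 to 7 people (from the table above).
avg_change_6_to_7 = [-10, -5, 15]

def project_velocity(baseline, pct_changes):
    """Apply each iteration's average % change to the baseline velocity."""
    return [baseline * (1 + pct / 100) for pct in pct_changes]

# Hypothetical team: median velocity of 30 over its last five iterations.
projected = project_velocity(30, avg_change_6_to_7)
print([round(v, 1) for v in projected])  # [27.0, 28.5, 34.5]
```

The projection is only as good as the averages behind it, so a size change with few observed rows deserves a wide margin of error.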

There are of course many flaws with this approach. Adding Susan to the project is very different from “adding an unknown person” to the project. Still, if I have data on averages across the organization, I can make assumptions about adding Susan specifically if I want to, though there are many more risks in doing so. Notice that the approach does not attempt to take into consideration who was added or even what skillset the person had. You could collect such data if you wanted. As anal as I am about collecting all sorts of data like this when I have access to it, I knew that collecting that type of data would have made this just hard enough that I wouldn’t have done it regularly.

I did collect a few other bits of data that I left out of the initial table (so as to have more horizontal room for the data of real interest). For example, I collected data such as iteration length and the name of the team’s ScrumMaster (the latter was in case I had questions a few weeks later).

The approach described here was just simple enough that I could get empirical evidence of the impact of team size changes. This was invaluable when discussing headcount changes with product owners and the CEO.

Is It a Good Idea to Establish a Common Baseline for Story Points?

Saturday, August 9th, 2008

In my previous post, I wrote about how to establish a common baseline for story points across relatively large teams (a few hundred developers). In this post I want to consider whether doing so is a good idea.

The need for a common baseline for story points usually arises from the reasonable desire to know how big the entire project is. To know that, we must know the size of the work to be done by each team. Unfortunately, along with this goal comes the ability to compare teams based on their velocities. Since many managers are constantly looking for ways to compare team and individual performance, it is not surprising that they begin to make such velocity comparisons. Almost all such comparisons are disruptive to the performance of the combined, overall group or department.

A chart such as the one that follows can show a lot of interesting information.

Velocities before teams told they would be compared

However, this chart can be very dangerous because of how teams will assume the data is being interpreted. Shown a chart like this, a common team response will be to feel that they need to be faster than the other teams. Achieving this additional speed may come from working in a more focused manner (a good thing), but it may come instead from sacrificing quality, leaving important refactorings undone, or a variety of other not-so-good means.

Some teams may respond to the pressure for their abstract measure of velocity to increase by gradually inflating the number of story points assigned to a story. This can happen in subtle and not particularly nefarious ways that can accumulate into large problems. Consider, for example, a team that is arguing over whether a particular story should be estimated at 5 or 8 points. If the team is under pressure (real or just perceived) to increase velocity they will be more likely to assign the 8. The next story the team considers is slightly larger. They compare it to the newly assigned 8 and decide to give it a 13. Without pressure to improve velocity, this same team may have given the first item a 5 and the second (slightly larger still) item an 8. In this one scenario the team has inflated their points from 5+8=13 to 8+13=21, an increase of more than 60%. Story point inflation such as this tends to happen very quickly if it happens at all.

Consider what happened in the next few iterations for the four teams shown in the previous figure.

Four teams and their velocities

Not surprisingly, someone in the Project Management Office distributed the chart showing the similarities over the first three iterations. Two of the teams reacted by instantly inflating their story points. After seeing that, the yellow team followed suit. The green team is either extremely virtuous or they haven’t noticed the charts yet.

So, should you establish a common baseline? Yes, if there are advantages to doing so on your project. If you do, however, you need to make sure you go out of your way to create safety around that baseline for the teams. Stress that this isn’t being done as a way to compare teams and that you (and your bosses) know that there are many factors that influence velocity, not just “how good” a team is.

Improving On Traditional Release Burndown Charts

Wednesday, June 18th, 2008

I want to use this month’s blog posting to introduce a type of burndown (and burnup) chart that I find useful. I’ve been drawing this style of burndown chart for years and have coached many of my clients to do the same. Unfortunately, we’ve had to draw it either by hand or in tools like Visio and OmniGraffle because the agile tool vendors haven’t (to my knowledge) hit on this idea yet. I’m hopeful that some of them will see this posting, decide this is a good visualization, and incorporate it into their products.

The classic Scrum release burndown chart is good at showing whether a team will finish “on time” as can be seen in the following example burndown chart:


A traditional release burndown chart

A release burndown chart such as this one shows sprints on the horizontal axis and can show story points or ideal days on the vertical. It is updated once per sprint to show the team’s net progress that sprint. A team’s net progress is the amount of work they finished net of any changes in scope. So a team that completes 30 points of work but that has 10 points added to their product backlog will show net progress of 20.

But while a traditional release burndown chart excels at showing whether a team is on pace to finish on time, it is not very good at showing what will be included in that “on time” delivery. To see this, imagine two teams that each start with 200 story points of work. The first team finishes twenty points of work for each of ten sprints. The second team is incompetent and rather than completing twenty points each sprint they drop twenty points of scope each sprint. The two burndown charts will be identical–perfect lines descending over ten sprints from 200 to 0.
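The net-progress arithmetic and the two-team comparison can be checked with a few lines of Python. This is a sketch under my own assumptions: the `burndown` function and its `(completed, scope_change)` input format are hypothetical, with a positive scope change meaning work was added and a negative one meaning work was dropped.

```python
def burndown(start, sprints):
    """Return the remaining work after each sprint, beginning with `start`.

    `sprints` is a list of (completed, scope_change) pairs. scope_change is
    positive when work is added and negative when work is dropped.
    """
    remaining = [start]
    for completed, scope_change in sprints:
        net_progress = completed - scope_change
        remaining.append(remaining[-1] - net_progress)
    return remaining

# Team 1 finishes 20 points in each of ten sprints.
team_one = burndown(200, [(20, 0)] * 10)
# Team 2 finishes nothing and drops 20 points of scope each sprint.
team_two = burndown(200, [(0, -20)] * 10)

print(team_one)
print(team_one == team_two)  # True: the charts are indistinguishable

# Completing 30 points while 10 are added is net progress of 20
# (the starting size of 50 is an arbitrary illustration).
print(burndown(50, [(30, 10)]))  # [50, 30]
```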

At one level this is OK: the burndown chart shows whether a team will be finished (or by when they will be finished). The simplicity of the standard release burndown chart has much in its favor.

It isn’t hard, though, to extend a release burndown chart to also show what will be in the product by the final sprint. Look at the next figure, which is a hypothetical example of an eCommerce product.


A predictive release burndown chart

In this figure, you can see the burndown is tracked in the normal way through the end of the current sprint, the seventh. The company desires to release this product after the fourteenth two-week sprint. The right side of the burndown chart shows the team’s product backlog with the highest priority theme (“Returns”) at the top. This top block represents some “must-have” user stories related to returning purchased items. Below that is a theme for gift wrapping purchased items, followed by some “nice to have” aspects of returning items. At the bottom right is the Coupons theme.

Extending out from the team’s current position at the end of the seventh sprint are four lines. These lines represent the following:

  1. The team’s current position, drawn as a horizontal line from the current burndown position over to the product backlog. This tells us what is in the product so far. We can see that the mandatory return user stories and the gift wrap user stories are finished and that the team is partially into the nice-to-have return user stories.
  2. A black, dashed line showing the team’s most likely finish. This is the first of three trend lines meant to show the likely range of work the team might deliver. To draw a team’s most likely finish use the team’s long-term average velocity. You can define “long-term average velocity” in whatever way you want but my preference is to use the average velocity of the last 8-12 sprints. Pick the number of historical sprints that is most suitable for your team based on how long the sprints are and how long the team stays together.
  3. A pessimistic forecast of the amount of functionality that may be delivered. I recommend forecasting this based on a team’s worst-case but likely velocity. Calculate this by averaging the worst three or so velocities chosen among the same 8-12 iterations you looked back at to determine the team’s long-term average velocity.
  4. An optimistic forecast of the amount of functionality that may be delivered. Calculate this in the same way as in the pessimistic case but use the three (or so) best velocities of the team.
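The three trend lines can be computed with a short Python sketch like the one below. The velocities are hypothetical, the function names are my own, and I assume eight sprints of history and seven sprints remaining (the gap between the seventh and fourteenth sprints in the example).

```python
from statistics import mean

def forecast_velocities(velocities, extremes=3):
    """Return (pessimistic, most_likely, optimistic) velocities.

    most_likely is the average of all the historical velocities given;
    pessimistic and optimistic average the worst and best `extremes` of them.
    """
    ordered = sorted(velocities)
    pessimistic = mean(ordered[:extremes])
    most_likely = mean(velocities)
    optimistic = mean(ordered[-extremes:])
    return pessimistic, most_likely, optimistic

# Hypothetical velocities from the last eight sprints.
recent = [18, 22, 25, 19, 27, 24, 21, 28]
sprints_remaining = 7  # assumed: sprint 7 of 14 has just ended

low, likely, high = forecast_velocities(recent)
for label, velocity in [("pessimistic", low), ("most likely", likely), ("optimistic", high)]:
    # Points the team could burn down by the release at this velocity.
    print(f"{label}: {velocity * sprints_remaining:.0f} points")
```

Each result tells you how far down the product backlog on the right side of the chart the corresponding trend line should land.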

The figures in this blog are static images I’ve cut from a presentation. If you apply this technique for your team, the backlog items on the right should be clickable, allowing users to drill down into a product backlog theme to see specifically which items (typically user stories) make up the backlog.

By producing a single chart that shows both a team’s rate of progress (its burndown) and the product backlog, we have a single visualization that shows both when a team is likely to finish and what features will be in the product by that time. This makes it easier for product owners to make scope vs. schedule tradeoff decisions.

Check back in a few weeks when I’ll show an even more powerful technique for visualizing large product backlogs.

Should Companies Measure Productivity in Story Points / Ideal Days?

Wednesday, December 12th, 2007

Using story points or ideal days to measure productivity is a bad idea because it will lead the team to gradually inflate the meaning of a point–when trying to decide between calling something “two points” or “three points” it is clear they will round up if they are being evaluated on productivity as measured by the number of story points (or ideal days) finished per iteration.

My view is that points can be used as the best way to estimate and assess progress that we’ve ever had or they can be used as another weapon with which to hit the team. There are plenty of weapons with which you can hit your team. We don’t need to ruin points by using them that way as well.

Some teams have measured productivity with things like the number of backlog items delivered or the % of backlog items completed vs. planned into a sprint. Teams will alter their behavior in response to those as well, though, so they can be gamed and become misleading. These metrics can be useful but only as part of a suite of metrics collected at the end of each iteration.

If we rethink the question of “how do we measure productivity” we might get a better answer. Suppose you own a sandwich shop and want to measure the productivity of the sandwich maker in the back. If you measure him by the number of sandwiches he makes, he responds to the metric by making as many sandwiches as he can–regardless of whether anyone ordered them! At the end of the day there will be 200 extra sandwiches to throw away. A better measure might be how quickly he makes any sandwich. So you’d measure the time from when the customer places the order until the sandwich is put on a tray. Or, for a more complete metric, you might measure the time from when he receives an order until he is ready to receive the next order, as this captures any cleanup or restart time.

So, one measure we may want to include in our suite of metrics could be the responsiveness of the development organization. This would be measured in the same way as in the sandwich shop. Datestamp each product backlog item and track the time from when something enters the product backlog until it either (a) comes out of an iteration or (b) is delivered into the hands of customers. Choosing between (a) and (b) will largely be a matter of how often you ship software. Option (b) is a better measure of rapid delivery of customer value but is impractical in some cases. It would be a bit of a useless measure for the Microsoft Vista team, for example.
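A minimal Python sketch of this responsiveness measure follows. The backlog items and dates are invented, and “finished” stands for whichever of options (a) or (b) you decide to track.

```python
from datetime import date
from statistics import mean

# Hypothetical backlog items: (name, date added to backlog, date finished).
items = [
    ("Item A", date(2007, 9, 3), date(2007, 10, 12)),
    ("Item B", date(2007, 9, 17), date(2007, 11, 9)),
    ("Item C", date(2007, 10, 1), date(2007, 10, 26)),
]

def lead_time_days(added, finished):
    """Days from entering the product backlog until the item is finished."""
    return (finished - added).days

lead_times = [lead_time_days(added, finished) for _, added, finished in items]
print(lead_times)        # [39, 53, 25]
print(mean(lead_times))  # 39
```

Tracking the average (or median) of these lead times from iteration to iteration shows whether the organization is becoming more or less responsive.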