Measuring product team productivity

Aug 01, 2016

One of the hardest things to measure in software development is the productivity of development teams. How, from the outside, does one know that they are achieving everything that they can? What would we need to know in order to be able to answer that kind of question?

Once upon a time, people thought that programmer productivity could be measured by counting the lines of code (LOC) a programmer produced in a day, or a week. This seems laughable now, because we know better, but it was an approach used at many large and successful companies. And if we're honest, we have to admit that there is something tempting about it: you're paying people to write code, so why not measure how much of it they're writing? The alternative was not measuring performance at all. It's important to remember that measuring LOC was not obviously absurd to a large number of intelligent people at the time, partly because everyone else was doing it and partly because there were no widely-known alternatives.

Nowadays the objections to LOC as a performance metric are well-known: it does not capture the contributions of non-programmers, it's a bad measure of programmer effectiveness, it's very sensitive to the programming language being used, and it's easily gamed (splitting the same code across more lines produces a higher line count but no extra functionality). But, for me, the main objection is that it tells us nothing about how valuable the code being written is, or what effect the code has in the real world. If all we know about a team is that it collectively produced 2300 lines of code this month, then we really don't know very much about the value of the team's work.
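The gaming is trivial to demonstrate. Here's a toy sketch (the function and its purpose are invented for illustration): both definitions below do exactly the same thing, but the second scores several times higher on a lines-of-code metric.

```python
# One line that does the job: double every price and total them.
def total(prices):
    return sum(p * 2 for p in prices)

# The same behaviour, padded out for a higher line count.
# Several times the "productivity" by the LOC metric; zero extra value.
def total_padded(prices):
    result = 0
    for p in prices:
        doubled = p * 2
        result = result + doubled
    return result
```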

The real problem is that most of the metrics that have replaced line-counting have exactly the same flaw.

Consider story points. In most cases, the team is responsible for estimating the number of story points that they believe is appropriate for delivering a given user story. There’s some deliberate ambiguity about what this will entail - the team can consider technical complexity, novelty, clarity of requirements and other more nebulous factors (“anything that involves touching the single sign-on system takes twice as long as we normally think it will”) to come up with a number that represents the cost of building something. Progress can then be judged against these estimates: did we complete as many story points as we expected?

This just moves the problem: at the end of the month, we know that 73 story points have been delivered, instead of 2300 LOC. How is this any more useful? Story points are certainly easier to estimate than LOC, and story points might align more closely to real “stuff” delivered, but story points are just as vulnerable to gaming, and give no real measure of the effectiveness of the end product.

Estimation, planning poker, Fibonacci sequences, all of the ceremony that goes around the process of coming up with story points - these things invest the process with a sense of order and rationality that obscures the fact that what we’re measuring is just how much stuff was done. And we’re doing it using largely arbitrary numbers that approximate to how difficult we thought the thing was to do, not how valuable it is.

Like LOC, story points are used by many large and successful companies. Story point estimation is not obviously absurd to many intelligent people who use that practice. But it’s my view that it is absurd when seen from the perspective of the value being created.

Put simply, a team delivering a product should understand the value being created. They should know what's most important and useful about the product, and should optimise to create the most value. Sometimes that means doing less work (fewer story points!). For instance, maybe you have a business objective to reach 100,000 subscribers, and several candidate features that may help to achieve this. But maybe the best option isn't to deliver the 13-story-point feature that builds a complex referral system into your app; perhaps it's more cost-effective to just spend more money on advertising. A story point system would suggest that spending a day researching advertising costs and then deciding not to deliver any new features would be entirely unproductive, but the reality is very different.
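To make that trade-off concrete, here's a back-of-envelope comparison. Every number below is made up for illustration - the per-point cost, budgets and subscriber uplifts are all hypothetical assumptions, not data:

```python
# Hypothetical numbers: what does each new subscriber cost us?
STORY_POINT_COST = 1500        # assumed cost of one story point of dev time
referral_feature_points = 13   # the referral feature from the example above
referral_subscribers = 4000    # assumed subscriber uplift from the feature

ad_spend = 15000               # assumed advertising budget
ad_subscribers = 6000          # assumed subscriber uplift from advertising

referral_cost_per_sub = (referral_feature_points * STORY_POINT_COST) / referral_subscribers
ad_cost_per_sub = ad_spend / ad_subscribers

# 13 * 1500 / 4000 = 4.875 per subscriber via the feature,
# 15000 / 6000 = 2.50 per subscriber via advertising.
```

Under these (invented) assumptions, the day spent researching advertising costs is the most valuable work anyone on the team did that week - and it delivered zero story points.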

Focusing on value also helps to capture the input of non-programmers. Suppose a smart product owner or analyst simplifies a feature definition, so that it now takes 3 story points instead of 5. In the story point accounting, there is no visible improvement - all we record is that the work got smaller. In the value accounting, we have created the same amount of value for less time spent! This rewards team members for working smarter rather than harder. If we work from an impact map rather than a typical product roadmap, we are already a long way towards seeing our product in terms of the value it creates rather than the features it has or the user stories that went into building it.
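The same point as arithmetic, with a hypothetical value figure attached to the feature (the units don't matter; only the ratio does):

```python
# Hypothetical accounting: the analyst trims the feature from 5 points to 3
# without reducing the value it delivers.
feature_value = 50000            # assumed business value of the feature
points_before, points_after = 5, 3

value_per_point_before = feature_value / points_before   # 10000.0
value_per_point_after = feature_value / points_after     # ~16666.7
```

By a story-point count the team delivered less; by value per unit of effort, it delivered considerably more.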

It’s true that there are some activities which really should be measured in terms of throughput of work rather than the results produced. Homogeneous, repetitive tasks can be measured in this way. But when developing a digital product, most tasks are not alike and are not repetitive, and we should want to encourage smart and creative thinking about how to get results rather than a mentality of producing as much “stuff” as possible. Counting story points, in this sense, is no better than counting lines of code.

At Fluxus, we’re building a tool to help product teams work in a results-oriented way - we call it Focus. If you’re interested in trying this out, you can register for the Focus beta program.