robknight.org.uk

Drupal occupies a strange place in the web framework landscape. It's not a pure framework, like Ruby on Rails, Symfony or Zend Framework. Nor is it just a CMS or blogging product with the ability to host plugins, like WordPress. It's somewhere in between, giving the developer a fully functional CMS as a platform but providing many of the flexible basic services and abstractions that would be expected in a more generic "framework".

For this reason, talking about "architecture" in Drupal can be confusing. From one perspective, Drupal is the architecture. To consider this properly, let's unpack the metaphor:

When we talk about system architecture, we're talking about the bits of the project that aren't going to change, or will change only very slightly, in response to feedback and information gathered during the project itself. The architecture is the stuff that we can rely upon to remain true for a long time. In this sense, the metaphor with building architecture is very accurate: when planning a new house or office block, we begin by figuring out where the walls, foundations and windows will go, and how the heating, power and water will be provided - we don't worry about what colour to paint the walls, or whether to have carpet or wooden floors (in a Drupal project, this kind of thing is handled by the theme). And on a Drupal project, the main fundamental part is, well, Drupal. It defines how the user permissions system works, how content is stored in the database, the entire operation of the presentation layer, and with a few additional modules it can also define a lot about how integration with other systems works too.

So if Drupal gives us an architecture to work with already, do we need architects on Drupal projects? My answer is a tentative yes - there's clearly a need for architects to design the overall system, of which Drupal may be only a part. And the other parts of the system, which are built from much lower levels of abstraction (say, on top of a Java framework) will need architects to design them. But when working with Drupal itself, the job of the architect is different from these other cases. If an architect tries to plan how a Drupal site should operate from first principles, he will be wasting a lot of his time since many of these decisions have already been taken - and tested in the real world - by others in the community. The job of an architect on a Drupal project is not to design but to understand how to get Drupal to do exactly what is needed in the most efficient way possible.

An architect with little knowledge of Drupal may look at the requirements for his system and say "aha, we need a service for querying large amounts of information from an external data store". A more experienced Drupal architect would say "aha, we need to use Views, Schema and Table Wizard". The experienced architect knows that contact forms can be done easily via the Webform module and don't require a custom module. This seems obvious to experienced Drupal developers, but it really is quite a strange concept to developers and architects from other backgrounds who are used to building applications from extremely flexible basic components.

Consider an organisation that needs to have multiple contact forms, allowing users to submit different kinds of information on each. Using something like Zend Framework, this would involve writing code to generate each form using Zend_Form, enforce validation rules using Zend_Validate_* classes and ultimately send the submissions via Zend_Mail. Now, each of those components is well-architected and de-coupled, and each can be subclassed and replaced easily. Zend_Mail has various implementors for different mail transports and encodings - it's a very flexible library. But the amount of effort required to implement even a very simple user story with ZF is considerably greater than the effort required to do the same with Drupal, because Drupal provides pre-built architecture for this very common web pattern. In ZF, each form will require a separate form class which contains all of the form element definitions, labels, error messages and validation rules - and that's before we consider the possibility of administrative users adding extra form fields without any code changes. In Drupal, the entire set of user stories is handled by the Webform module, with little or no code to be written at all. Thus the Drupal architect's greatest asset is the knowledge of the patterns already implemented by others, rather than the ability to produce designs of how to re-implement this system from first principles (which would probably end up looking something like the Zend Framework example, and would take as long - or longer - to code).
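To make the contrast concrete, here is a rough sketch of the "form as data" idea, written in Python rather than PHP. It is not the API of Zend Framework or of any Drupal module: the names `CONTACT_FORM`, `EMAIL_RE` and `validate` are hypothetical and exist only for illustration. The point is that when a form is described as plain data, one generic routine can validate every form, and adding a field is an edit to the data rather than a new class.

```python
import re

# Deliberately simple email check, for illustration only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# A form described as data. In a real system this could live in a database
# and be edited by an administrator; here it is a hypothetical literal.
CONTACT_FORM = [
    {"name": "name", "label": "Your name", "required": True},
    {"name": "email", "label": "Email address", "required": True, "type": "email"},
    {"name": "message", "label": "Message", "required": True},
]


def validate(form_definition, submission):
    """Return a dict mapping field name to error message (empty if valid)."""
    errors = {}
    for field in form_definition:
        value = submission.get(field["name"], "").strip()
        if field.get("required") and not value:
            errors[field["name"]] = f"{field['label']} is required."
        elif value and field.get("type") == "email" and not EMAIL_RE.match(value):
            errors[field["name"]] = f"{field['label']} is not a valid email address."
    return errors
```

Adding an optional phone field would then mean appending one more dictionary to the definition, with no change to `validate`. A class-per-form design would need a code change for the same request.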

This example explains why Drupal is different. Other frameworks value - correctly, in some cases - flexibility over functionality. Adherence to the formal structures of object-oriented (or perhaps that should be called class-oriented) programming is prevalent here. This can make these web frameworks extremely flexible, with every component pluggable and replaceable, with intricate inheritance trees providing - in theory - code reusability. But in practice, Drupal's approach of providing the developer with a set of prefabricated components that satisfy 90% of the likely user stories is simply more productive. Reusability is achieved by having modules that implement large chunks of functionality, with the potential to inject new functionality or override behaviour via Drupal's aspect-oriented hook system. Whilst this might feel limiting to a design purist, it is an extremely pragmatic way of building high-functioning websites; quite simply, it's easier to tweak the pre-fabricated components than it is to build new ones. When properly understood, Drupal allows architects and developers to skip ahead in huge leaps, building on top of the components already provided, focusing all efforts on the new and unique parts of a project, where those efforts are truly needed. This does mean accepting a limited scope for the architect, but this limited scope can be traded off against faster delivery and more time for in-depth consideration of the newest - and, by definition, riskiest - parts of a project. Architects can still add value and get satisfaction from solving the truly difficult problems.
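For readers who haven't met a hook system before, the idea can be sketched in a few lines. This is a toy in Python, not Drupal's PHP and not its real API: `HookRegistry`, `implement`, `alter` and the `form_alter` hook name are all hypothetical. Modules register functions under a hook name, and the core code passes its data through every registered implementation, so behaviour can be changed without touching or subclassing the component that built the data.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class HookRegistry:
    """Toy hook dispatcher: an illustration, not Drupal's implementation."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def implement(self, hook: str) -> Callable:
        """Decorator registering a function as an implementation of `hook`."""
        def register(func: Callable[[dict], None]) -> Callable[[dict], None]:
            self._hooks[hook].append(func)
            return func
        return register

    def alter(self, hook: str, data: dict) -> dict:
        """Pass `data` through every implementation, in registration order."""
        for func in self._hooks[hook]:
            func(data)
        return data


hooks = HookRegistry()


# Two hypothetical "modules" that tweak a form they did not build.
@hooks.implement("form_alter")
def add_phone_field(form: dict) -> None:
    form["fields"].append("phone")


@hooks.implement("form_alter")
def rename_submit(form: dict) -> None:
    form["submit_label"] = "Send enquiry"


def build_contact_form() -> dict:
    """The prefabricated component: builds a form, then lets modules alter it."""
    form = {"fields": ["name", "email", "message"], "submit_label": "Submit"}
    return hooks.alter("form_alter", form)
```

Calling `build_contact_form()` returns the stock form with the extra field and the new button label applied. The prefabricated component stays as it is and the customisation lives in small, separate functions.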

So, my answer to my original question is that Drupal projects do need architects. But they need to understand that they're not designing the whole system from the ground up - they're designing only those parts that aren't already there, and their greatest asset when doing so is their understanding of how the existing parts work.

I haven't blogged or tweeted much about work lately, and the main reason for this is that I have been working on a project that has been under wraps until yesterday.

The project has now been announced: I've been working with Capgemini on the migration of Royal Mail to Drupal. It's a hugely exciting project - the site will ultimately be one of Europe's largest Drupal sites - and it's also great to be bringing Drupal into an enterprise setting with one of the world's biggest enterprise consulting firms. When I started developing with Drupal almost five years ago, it was to enable me to build richly interactive, community-oriented sites for small businesses, charitable and non-profit organisations. I never imagined that 5 years later I'd be doing the same thing for one of the UK's biggest and most recognisable brands.

For those outside the UK, it must be understood that Royal Mail is huge. It's one of the UK's largest employers, has a network of thousands of post offices, many thousands more post boxes and has a legal requirement to deliver to every single postal address in the country. I pass two post boxes just walking to the bus stop on my way into work in the morning. There's something exciting about working on a site for a business that plays such a large role in so many people's lives.

However, Royal Mail isn't the sole focus - or even the main focus - of the project. What Capgemini is building is a fully-fledged framework of components for delivery of enterprise-scale web projects, with Drupal as a central component. This framework is called Immediate, and working on making this a great product, suitable for many customers, is where the main focus of my efforts is going. This means making use of great Drupal technologies such as Features, Drush (& make files) and CTools exportables to build a packageable system. It also means integrating with a wide variety of third-party providers for e-commerce, identity and authentication, CRM and more.

For such large-scale sites, performance is critical and we have great support from David Strauss and Four Kitchens. Pressflow, memcached, APC and a highly-tuned MySQL server all go into the mix, and Zeus provides an excellent reverse proxy and load balancing solution. No doubt many interesting performance challenges await as the site goes fully live, by which point it will be one of the most heavily-trafficked Drupal sites in the world.

It's really great to see the good reception this project has had from the Drupal community, and hopefully this latest success will spur on even greater adoption of Drupal within the enterprise world.

Digital Economy Bill: It's the numbers, stupid

Posted on 28 March 2010 - 1:44pm

Since my previous post on the Digital Economy Bill, Cory Doctorow has written another post, this time accusing Lib Dem MPs of "stand[ing] back" and allowing the Digital Economy Bill to proceed to the "wash-up", the Parliamentary process by which bills that ran out of time before the dissolution of Parliament are nodded through. Now, I worship at Cory's altar as much as any other geek, but I think he's wrong on this.

As I said in my previous post, the government has the numbers to do what it likes so long as it retains the support of its own back-bench MPs. If they have the support of the Tories, government bills are virtually unassailable. This is how ID cards, the DNA database, 28-day-detention-without-charge and, for that matter, the Iraq war have been approved by Parliament. The Lib Dems voted against them, which is about as much as you can do when Labour outnumbers you 7:1 and, with Tory support, 10:1. (If you're wondering how the Lib Dems are that badly outnumbered despite getting 22% of the votes at the last election, you'll want to consider how the electoral system works). The Lib Dems aren't "standing back" as there's actually no real way of stopping the government once it has decided to do something.

Worse, by saying that the Lib Dems are supporting the Labour/Tory consensus, Cory is letting the real culprits off the hook: the massed ranks of Labour back-benchers, with whom true power and responsibility lie. The Lib Dems have repeatedly called for further scrutiny and debate on this issue: David Heath first called for the second reading of the Bill to be held urgently, and Don Foster has made it clear that the Lib Dems are against the web blocking provisions and against the use of disconnection as a punishment for file-sharing - through negotiation, Lib Dems have already ensured that further legislation will be required before anyone gets disconnected, and that this must follow at least a year of studies considering alternatives, and a full consultation process. Since that legislation will have to occur on the other side of the general election, after which the current government may have either lost office or be forced into power-sharing with the Lib Dems, there's a fair chance that disconnection will never happen.

Now, it's certainly possible to push further. The clauses relating to disconnection and web blocking could be dropped from the Bill before it is passed. But it is not in the power of the Lib Dems to make this happen, due to the aforementioned Parliamentary arithmetic. We can be pretty sure that the Lib Dems will be voting against the Bill when it comes up, but we can also be pretty sure that they'll lose due to Labour's back-benchers supporting the government. Tom Watson is an honourable exception, but he doesn't seem to have the support of many of his Labour colleagues.

There's one final roll of the dice, though. Because of the rapid speed at which the government is pushing the Bill through, it has to go forward as part of the "wash-up" process. This is normally reserved for uncontroversial legislation that simply ran out of time before the election, but Labour have rarely stopped to worry about procedural niceties. In the wash-up, the parties come to an agreement about what to allow through, then - as I understand it - hold a series of votes which are effectively formalities, nodding legislation through. There is a chance that Labour can be spooked into dropping the controversial clauses in order to get the Bill through wash-up, or risk losing the whole package. This can only happen if Labour are worried about their own support on the back benches, which means that this is where pressure should be directed. I honestly can't understand why Cory is putting the emphasis on the actions of the 60-odd Lib Dem MPs (many of whom have publicly said they'll vote against the Bill anyway!) when there are 400+ Labour MPs, many of whom are in very marginal constituencies and will have to take complaints from their constituents, particularly those raised in an organised manner, very seriously right now. And if we want to put real pressure on them in Labour/Lib Dem marginal seats, it might be worth mentioning that, actually, the Lib Dems are the [relatively] good guys in this, and Labour are the party which created this Bill and are forcing it through Parliament.

So far as I can tell, the Lib Dems are doing about as much as they can. It's not enough to stop the Bill, because there aren't enough Lib Dem MPs to do that. Follow the numbers and you can see where the battle over this bill is really being fought.

Digital Economy Bill: have I got this right?

Posted on 27 March 2010 - 11:05pm

As anyone with an awareness of politics or technology issues will know, the British government has recently been attempting to pass the Digital Economy Bill. This is a wide-ranging piece of legislation, covering issues from digital radio to copyright infringement on the internet and much more besides. As the legislation has evolved, it has acquired - apparently at the behest of the government in the form of Lord Mandelson - greater powers to punish those accused of copyright infringement, and this is where the current controversy lies.

Now, first of all it should be pointed out that there are differing views on copyright itself. At the extremes lie the views that copyright is essentially wrong, as it prevents the totally free flow of ideas, and correspondingly the view that copyright should be absolute, giving the owner of a copyrighted work considerable powers to enforce their control over those works. In the middle, most of us accept that copyright provides a useful incentive to people to create things, by granting them a temporary monopoly on their creations, enabling them to profit from the sale of licensed copies of their work, but also believe that this needs to be balanced by rights of 'fair use', allowing others to share, remix and discuss these works in freedom.

The government, it is fair to say, have shown themselves to be on the side of the rights-owners, those who wish to maintain or extend their powers to enforce control over copyrighted works. The Digital Economy Bill provides new powers for rights-owners to seek the blocking of websites which they accuse of facilitating the sharing of copyrighted works, and to seek "technical measures" against individuals they accuse of sharing copyrighted works.

At this point, your views on the matter may diverge based on how much you know about the technology. As a self-confessed geek, I have to admit to knowing quite a lot, which gives me grave doubts about the feasibility of blocking websites or employing "technical measures" against individuals. Importantly, it is often hard to block websites in isolation. Many sites exist on "shared hosting" accounts, which means that if one website on a particular server is being blocked, other - perfectly innocent - sites on the same server might get blocked too. This means that hosting providers - the people who operate the network infrastructure - have to be super-cautious about any threat of web blocking, because there is a risk that their customers may end up as collateral damage. This means that hosting providers often act on the mere threat of web blocking, simply taking down websites that are alleged to contain infringing content on the basis of nothing more than a solicitor's letter. By creating a further threat, of state-enforced web blocking, the power shifts further away from individuals running websites and towards those who wish to threaten them. The Liberal Democrats (full disclosure: I'm a paid-up member) and Conservatives jointly acted to specify a proper, legal process for this in an amendment to the Bill; whilst an improvement over the government's original proposals, it still leaves site operators under threat of site blocking. After a resolution at the party conference, Liberal Democrat Lords attempted to introduce a new, better amendment, but for reasons that I do not fully understand, this amendment was not adopted in the Lords. What we now have is better than the government's original proposals, but still not good enough.

On the second point, "technical measures" against individuals, the situation is even less clear. "Technical measures" relates to the use of "throttling" to limit the amount of data that can be transferred over an internet connection, effectively degrading the service to the point of unusability and, if that is not judged to have had an effect, enforced disconnection from the internet can follow. This is an even more serious problem than web blocking, because it is almost guaranteed to create collateral damage. If one member of a household is accused of sharing copyrighted works, the rest of the household can be made to suffer for it. There is a good case to believe that this is a violation of natural justice and it will, I imagine, end up being challenged on Human Rights grounds. The Open Rights Group (full disclosure: I'm a paid-up member) has done great work in campaigning against this part of the Bill (and others!), and that campaign is rapidly approaching its moment of truth: the Bill is due to be voted on on April 6th and, at present, "technical measures" are still very much part of the Bill.

Matters are further complicated by the Labour government's abuse of Parliamentary procedure. They are attempting to pass this legislation with the barest minimum of debate in the House of Commons; the Bill will pass without a committee or report stage and will be made law as part of the 'wash-up' process that exists to fast-track pending legislation once a general election has been called. Given that this is not an emergency bill, and that there still exists substantial disagreement over its contents, this can fairly be called abuse of the procedure.

However, it's important not to let the unusual circumstances obscure the Parliamentary reality: Labour have a considerable majority and the Conservatives are sympathetic to the Bill - it would have passed even if it had been debated for months. The Iraq War, ID cards, 28-days detention, Control Orders, the exemption of MPs' expenses from the Freedom of Information Act, the one-sided Extradition Act, the DNA database, the Legislative and Regulatory Reform Act, restriction of trial by jury and countless other smaller but no less pernicious pieces of government business have been passed by sheer weight of Labour's numbers, often with Conservative support or sympathy. The Liberal Democrats have voted against all of these, but this has never succeeded in preventing the legislation, on the simple basis that Liberal Democrat MPs comprise fewer than 10% of the total in the Commons (despite receiving 22% of the vote at the last General Election). In some cases, Lib Dem amendments have succeeded in taking some of the sharp edges off Labour's legislative flails, and when Labour's back benches have remembered their consciences it has been possible to defeat the government - on a grand total of six occasions in the last five years.

But does that mean that the Lib Dems are doing enough? Well, I'm not sure. Certainly the Lib Dems have taken the most sensible position of the three main parties, opposing web blocking and disconnection. At this point, I'm getting mixed messages about how effective that opposition has been; Don Foster MP says that for people to be forcibly disconnected from the internet, further legislation will be required in the next Parliament and that the Lib Dems will oppose it when it comes up; Jim Killock of ORG says that this isn't good enough, as it will likely be passed when it does. He is probably right, given that the legislation will take the form of "Statutory Instruments", which are normally approved with little scrutiny. This is a favourite trick of Labour's - pass the uncontroversial bits when everyone is paying attention, but give the Secretary of State (Mandelson, in this case) the power to add the worst bits back in via SIs when nobody's paying much attention. Jim wants to avoid things getting to that stage and seems to be hoping that moral pressure from the Lib Dems might persuade the government to drop the whole idea. However, given the previous record (enumerated above), I somehow doubt that this will work. A better hope is that Mandelson might not be around in a year's time and if the election is as close as the polls currently predict, the Lib Dems might end up holding the balance of power, a much stronger position from which to block the disconnection powers from coming into use.

In any case, pressure from the Lib Dems looks like our best hope right now and we should certainly be pursuing it, along with a vigorous campaign by ORG members and supporters. With the General Election bearing down on us, it might just be possible to spook enough Labour backbenchers into pressuring their own side into dropping the worst parts of the Digital Economy Bill before it passes into law. A cynic might remark that if two million people marching didn't stop the Iraq war, the few hundred of us outside Parliament a few days ago are unlikely to stop this Bill, but that's no reason to give up on campaigning.

My question is: have I got this right? Is there more to this than I've realised, and is there some nuance of Parliamentary procedure which actually makes the actions of the Lib Dems more important than I've realised? Is there more that we can do?

Spotify fails the OiNK test

Posted on 21 March 2010 - 8:15pm

"OiNK", for those who don't know, was a Bittorrent tracker, shut down following police raids in October 2007. OiNK's administrator was found not guilty of the slightly odd charges brought against him, but the site is gone and shall likely never return.

OiNK, however, was awesome. And it's not just me saying that, Trent Reznor said so too:

I'll admit I had an account there and frequented it quite often. At the end of the day, what made OiNK a great place was that it was like the world's greatest record store. Pretty much anything you could ever imagine, it was there, and it was there in the format you wanted. If OiNK cost anything, I would certainly have paid, but there isn't the equivalent of that in the retail space right now. iTunes kind of feels like Sam Goody to me. I don't feel cool when I go there. I'm tired of seeing John Mayer's face pop up. I feel like I'm being hustled when I visit there, and I don't think their product is that great. DRM, low bit rate, etc

The point is that OiNK was a great experience; the fact that the music was free was largely irrelevant. People were certainly happy to hand over money for the experience, as evidenced by the £180,000 or so that users of the site donated to fund the running costs. OiNK was a site for music fans, by music fans; what was missing was a direct means of paying the musicians for having created the stuff in the first place.

Apple's iTunes store wants us to pay per track, with DRM built in. For a whole bunch of reasons, this is bad. DRM is simply awful under almost any circumstances, but the pay-per-track approach means that it's hard to experiment with new music. Figuring out whether you want to pay for music might require listening to it a few times first, and iTunes doesn't support that.

But there's a rival model, one which - conceptually at least - resembles OiNK's all-you-can-eat access to music: Spotify. Spotify provides instant access to a large catalogue of music for free, with the only downside being that you are exposed to annoying advertising. Spotify pay the artists (or, more accurately, the record labels) for each time you listen to a track so, in theory, everyone should be happy. For those who can't stand the ads, or want access to higher-bitrate MP3s and the ability to download tracks to their mobile devices, Spotify provides a premium membership for the slightly-too-high figure of £9.99/month.

I like Spotify, I use it almost every day and I believe that it - or something like it - is the future of music distribution. But it's still nowhere near as enjoyable or useful as OiNK, a site cobbled together out of volunteered time, open source software and a healthy disregard for over-zealous copyright enforcement. Despite having plenty of venture capital behind it, it fails to create the sense of being the "world's greatest record store", at least in my opinion. So, why is this?

The first, obvious, answer is the ads. They're intrusive and annoying, and Spotify's incentive (to get people to upgrade to premium accounts) is to allow ads to be as annoying as possible. The software does a reasonable job of preventing users from muting the sound whilst the ads are playing, and ads are frequent enough that they can disrupt the listening experience. The only solution here is to pay up for a premium account - I'll return to this later.

Secondly, there are some lacunae in Spotify's music library. Obviously, there's no Beatles, no Rolling Stones, no Led Zeppelin, no Pink Floyd or any world-beating band from the era when record labels truly ruled the world. The back catalogues of these bands are too expensive and, in the case of some, it may not be possible to provide their music as individual tracks. However, it's hard to argue that depriving Spotify listeners of classic rock is really a problem - if you like Pink Floyd you've probably already bought their albums and, whilst it may prevent new listeners from discovering the band, it feels easier to simply accept that this music is in a different category, not part of the online music scene. What annoys me is not being able to find a decent version of Final Solution by Rocket From The Tombs, or pretty much any of Steve Albini's output, or the impossibility of tracking down that old b-side that you know exists somewhere, just not here. When something didn't exist on OiNK, you could request that other users scour their own music collections to see if they had a copy to share; Spotify presents no such option.

Spotify also does a poor job of recognising the existence of a community of Spotify users, at least within the confines of its software. It's five years since jwz pointed out that all software should be social, but Spotify doesn't seem to have noticed. There are no reviews, no comment threads, no user profiles, no simple means of sharing playlists, although the functionality to edit and share playlists with others does exist. Spotify pulls in some review and biographical information about albums and artists (from Allmusic, I think), but there's absolutely no user-generated content. Sometimes the biographies contain links to other artists, and if those artists are on Spotify then you can see their tracks and play them, but sometimes they just aren't on Spotify at all. There's a "related artists" feature which serves as a basic means of discovering new music - if you like the Pixies, you'll probably like the Smashing Pumpkins - but it's crude and obvious. It lacks any sense that anyone cares about this information, really wanting to impart that you have to listen to this because it's awesome. You're left alone to choose, to play what you already know that you like, unless it happens not to exist in the Spotiverse. Rather than spending an afternoon in the world's greatest record store, it's like spending an afternoon in a giant warehouse stacked high with alphabetically-sorted CDs of bands most people like.

I can't review Spotify without mentioning the diabolically awful "Home" page. Have they ever tested this on anyone? Seriously? It contains a list of "What's new?", which seems to be a random selection of stuff they've added recently. This can be admirably eclectic - right now I've got Lady Gaga, Cheryl Cole, "Jazz Acetate Collection" and something called "In Christ Alone", but I've no idea why they expect me to find this interesting. There is a "more" button, but this simply returns a different random selection of recent stuff, often containing one or more of the items that you started with. After five or so clicks, most of the "What's new" items will have appeared once already, so you're left with the choice of giving up or clicking endlessly in the hope of finding something interesting, never knowing if one more click might yield something as yet unseen. Beneath that is an "Artists you may like" suggestion box, which, to be fair, is reasonably accurate.

So, having said all of that, is it worth revisiting the option of paying for Spotify? After all, their business model depends on people buying the premium option - they make a loss on ad-funded users - and I want to see Spotify succeed. Ultimately, I'm not sure that £120/year (more than my broadband fee!) is sufficiently enticing for the service that they provide. I could buy an album a month for that, and at the end of the year I'd have 12 albums; with Spotify I'd have nothing. If I felt that Spotify might introduce me to something genuinely amazing, if I felt that there was a community of people on Spotify that I cared about, if I felt that my Spotify profile - my playlists, listening history, recommendations or reviews - had become part of my virtual personality, I might want to pay to ensure access to it. But there's almost no pain in walking away from Spotify, and consequently no reason to want to pay for it when there's a usable free option available. As annoying as the ads are, the benefit of not hearing them again does not feel like a good enough justification for paying.

Why software developers aren't just engineers

Posted on 16 January 2010 - 11:25am

Recently I wrote about Tom DeMarco's claim that software engineering is, in some sense, a 'dead' discipline. In many respects, I think he is right, especially in pointing out that a narrow focus on 'engineering' concerns is no longer appropriate. Engineering is about making something that carries out its stated purpose in an efficient and reliable manner, but it cannot tell us whether or not that purpose is a valuable one.

I think it's worth expanding on that insight a little further, particularly as it affects me in my job. I am, fundamentally, a coder by trade and my job still requires me to be able to write good code. Given that I've been writing code since primary school, I like to think that I'm now reasonably good at it. But when I was 18 and thinking ahead 10 years, I thought that being an ace coder was the only thing, professionally, that mattered, and it isn't.

Don't get me wrong, I still think that it matters a lot. Being able to write good code in one's chosen language(s) is a pre-requisite for anyone who takes him/herself seriously as a software developer. But the narrow conception of the software developer as a code monkey or savant hacker finding self-expression in the depths of a C++ program just doesn't seem to exist in reality.

Software development in 2010 is a much more multidisciplinary subject than the public perception of software development recognises. To be a good developer, you need to understand -

Psychology

If you want people to use your software, you need to understand how they relate to it. What makes software usable, intuitive, comfortable and reassuring for people, and what makes software confusing, irritating and unfriendly? What problem is a person trying to solve with your software and how can you make it easier for them?

Economics

Why should people buy or choose your software? What's the payoff? How to structure pricing and incentives is very important: obvious examples include mobile apps and subscription-based web2.0 services, but there are plenty more. Of course, economics goes a lot further than "how to sell your product" and a good grasp of economics will teach you much more: incentives, coordination problems, economies (and diseconomies) of scale and the benefits of specialisation and trade, for starters.

Business and organisations

There has never been such a thing as a self-contained "computer system", but it's now harder than ever to pretend that there is. Networks are everywhere, and software - your software - forms a small part of a chain that will include people, networks and other software. Where does it fit in and what problem does it solve? How does it co-exist with the rest of the system, human and machine?

Aesthetics

This is a somewhat controversial point, but I think that software needs to be beautiful to be truly successful. And I don't just mean that the code should be elegant (because often it isn't). A great work of art can be appreciated even by people who cannot comprehend how it was made, or by those who do not typically see beauty in such works. This doesn't mean that software needs to have eye candy, it just means that software needs to feel 'right' to the people using it in the way that a good painting or piece of music feels 'right' to the viewer or listener.

All of this can be summarised by saying that software must be useful. But there's a lot more that goes into making software useful than most people would commonly imagine, including most people whose job it is to make software. I once thought that it would be all about being able to optimise subroutines or create perfect data structures or choose the right design patterns, but software development now is all of those things and more.

This flies in the face of the stereotype of the software developer as someone who doesn't understand people and doesn't particularly want to. If such people really exist, then there's always going to be a place for the ultra-focused hacker who can code anything 10x faster and more reliably than his or her colleagues, but to progress from designing code to designing systems requires a different skillset. The challenge for my generation of software developers is to take what we know about code and computers and combine it with new knowledge and understanding of the world around us to make software that delivers benefits great enough to justify itself.

Footnote: In the title of this post I said that software developers are not "just" engineers. I do not mean this in the sense of "mere" engineers; I mean that engineering is one specialism within software development and not the only one. Software development is a team game and teams need good engineers as much as they need good architects and project managers.

Clouds and commodities

Posted on 3 January 2010 - 10:58pm

I've been following Sean Park's blog for about a year now, and have been finding his insights to be very interesting. His latest post, on Amazon Web Services, is typical in that he connects several different innovations to speculate about the future. What follows is my response to the ideas and issues he raises.

The latest development is Amazon's launch of 'spot' pricing for EC2 instances. In short, this means that the price for computing time can vary over time and customers can bid for that time. If their bid is equal to or higher than the current spot price, they get their time. If it's lower, they don't - their currently running instances are shut down until they increase their bid or the spot price falls below their current bid. The customer's bid is only a maximum that they would be willing to pay, so if the spot price is lower than the bid then the customer only pays the spot price. This should enable cost savings for AWS customers who don't care when their processing happens, possibly at the cost of those who need it now.
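The bidding rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this post, not Amazon's actual implementation: the names `SpotOutcome` and `settle_bid` are made up, the prices are hypothetical, and it assumes that nothing is charged for a period in which the instance is shut down.

```python
from dataclasses import dataclass


@dataclass
class SpotOutcome:
    running: bool      # whether the instance runs during this period
    price_paid: float  # what the customer is charged for the period


def settle_bid(max_bid: float, spot_price: float) -> SpotOutcome:
    """Apply the bid-versus-spot-price rule for a single pricing period."""
    if max_bid >= spot_price:
        # The bid is only a ceiling: the customer pays the spot price.
        return SpotOutcome(running=True, price_paid=spot_price)
    # Bid below the spot price: the instance is shut down.
    # Assumption: no charge is made while it is not running.
    return SpotOutcome(running=False, price_paid=0.0)


# Hypothetical spot prices over four periods, with a maximum bid of 0.05.
spot_prices = [0.03, 0.05, 0.07, 0.04]
outcomes = [settle_bid(0.05, price) for price in spot_prices]
total_cost = sum(outcome.price_paid for outcome in outcomes)
```

With these made-up numbers the instance runs in every period except the third, and the customer never pays more than the spot price even though the bid was higher in two of the periods.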

As Sean points out, this is only the first baby-step towards something that we might recognise as a market. The spot price is apparently arbitrarily determined by Amazon, presumably based on their own algorithms used to monitor the capacity available on EC2. I would argue that there is a competitive market in cloud computing capacity at present, but it is a considerably less fluid market than it will be in years to come. What I think Sean and others have in mind for the future is a fluid market in which prices shift constantly in response to supply and demand across many different cloud providers, with real-time decisions being made - probably automatically - to re-allocate computing tasks in response.

The current market in computing power is impeded by technological costs - it's not always easy to move a computing task from one cloud provider to another. The different cloud providers have sufficiently different products - in terms of SLAs, connectivity, bandwidth, processing power and other available resources - that it's not trivial to determine which is offering the best value. For example, some computing tasks depend on having low-latency network access, which makes geographical and network location important. This makes commodification non-trivial (and can mean that sometimes, "a compute cycle is a compute cycle is a compute cycle" is not true).

I suspect that the market in computing power in the future will not move towards the idea of a single commodity or even a small number of commodity products. There will be many products, differentiated by a range of technical factors. As cloud computing matures, the needs of customers will become more varied, and this will mean that some products may be unsuitable for certain needs. It's not hard to imagine how some customers may place a premium on security, others on network latency, others on available system RAM and others on energy efficiency - and, of course, on price. Although any of the available products could 'get the job done', there will be big differences in suitability.

So, that's my guess. There are, I suppose, some reasons why I might be wrong. Perhaps the infrastructure required to discover and access the most suitable computing resource will be too expensive. We might be better off with a simple market with less choice, where computing time is packaged in standard units: at present, Amazon offers 21 products - seven sizes of EC2 instance multiplied by three available locations. We might imagine that competitors will have to mirror this structure and will offer a similar geographical breakdown in order to compete on a like-for-like basis with EC2. Perhaps customer needs will be less varied than I imagine, and the majority of demand can be met by the supply of a small number of commodity products delivered with high efficiency.

But if I'm right, then we're going to have a very complex market. Each customer will have unique requirements and will need to be able to explore the available products based on their suitability according to multiple criteria. Provisioning across a wide variety of platforms will be a considerable technological challenge (though CohesiveFT seem to be doing good work here) and there is definitely scope for the development of a complex exchange (or multiple exchanges?) where bidding for computing time can take place, with appropriate agencies in place to handle provisioning once bids are accepted.

The ultimate aim should be to make the workings of this system entirely transparent to the people using it. Say I'm looking at pictures I've taken on an iPhone and I want to stitch these together with pictures taken at similar locations by other people to generate a 3D walkthrough of a particular locale - a fairly intensive, though technically feasible, task. For this I will need a certain amount of processing time and RAM, and a certain amount of bandwidth and data transfer. Assuming (with some magic involved, I admit) that my software can create an accurate estimate of my needs, it should be able to automatically go to the exchange and locate the cheapest provider that satisfies the requirements (a VRM system!) and automatically provision the required computing time. A short while later, I'm browsing a 3D landscape on my phone.
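As a rough sketch of what "locate the cheapest provider that satisfies the requirements" might look like, here is a small Python example. Everything in it is hypothetical: no such exchange exists in this form, the provider names and figures are invented, and the three criteria (RAM, bandwidth, latency) are just a sample of the factors mentioned earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Offer:
    provider: str
    price_per_hour: float
    ram_gb: int
    bandwidth_mbps: int
    latency_ms: int


@dataclass(frozen=True)
class Requirements:
    min_ram_gb: int
    min_bandwidth_mbps: int
    max_latency_ms: int


def satisfies(offer: Offer, req: Requirements) -> bool:
    """True if the offer meets every one of the stated requirements."""
    return (
        offer.ram_gb >= req.min_ram_gb
        and offer.bandwidth_mbps >= req.min_bandwidth_mbps
        and offer.latency_ms <= req.max_latency_ms
    )


def cheapest_suitable(offers: list[Offer], req: Requirements) -> Optional[Offer]:
    """Filter out unsuitable offers, then pick the lowest price (None if none fit)."""
    suitable = [offer for offer in offers if satisfies(offer, req)]
    return min(suitable, key=lambda offer: offer.price_per_hour, default=None)


# Invented offers on an imagined exchange.
offers = [
    Offer("provider-a", 0.10, ram_gb=4, bandwidth_mbps=100, latency_ms=80),
    Offer("provider-b", 0.08, ram_gb=2, bandwidth_mbps=100, latency_ms=40),
    Offer("provider-c", 0.12, ram_gb=8, bandwidth_mbps=200, latency_ms=30),
]
needs = Requirements(min_ram_gb=4, min_bandwidth_mbps=100, max_latency_ms=100)
best = cheapest_suitable(offers, needs)
```

Here the cheapest offer overall (provider-b) is rejected because it has too little RAM, so the choice falls to provider-a. The hard part in practice is the estimate of needs, which this sketch simply takes as given.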

The uses for corporate customers are probably where the real value lies, but when individual consumers have seamless access to computing power in this way then we will know that we have a fully functioning and fluid market.

What's the point of software engineering?

Posted on 11 December 2009 - 11:51pm

Tom DeMarco is one of the founding fathers of modern software engineering. His 1987 book Peopleware is a highly influential text in the field, and his ideas anticipated the Agile methodology which is now commonplace (common enough that it is now the dominant paradigm that new ideas must refine or rebel against). So when someone of his stature asks if software engineering is "An idea whose time has come and gone?", something interesting must be happening.

That's exactly what he asked in the August edition of IEEE Software magazine (PDF). DeMarco's article takes the form of a critical look back at some of his most influential ideas, in particular the importance of measurement in assessing the performance of a software project. He takes issue with his earlier statement that "You can’t control what you can’t measure", pointing out that some very successful projects have not been held back by a lack of precise measurement of their progress. This suggests that the focus on measurement and predictability that has been a basic assumption of software engineering may have been misguided.

Of course, Agile development has already moved us away from the idea of creating detailed delivery plans very early in the project lifecycle, but DeMarco's conclusions go much further. He gives the example of two projects, both of which cost $1m, but project A will eventually yield value of $1.1m and project B will yield value of $50m. Project A must be tightly controlled, its scope narrowly defined and delivery must proceed according to a strict schedule. Even minor budget overruns will cut deeply into the return on investment to be gained. Project B, however, has no real need for such tight control - if the budget doubles, it matters relatively little so long as the project is delivered.
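The arithmetic behind the two projects is easy to check. The sketch below uses the cost and value figures from the example above; the overrun percentages (0%, 10% and 100%) and the function name `net_return` are illustrative choices, not taken from DeMarco's article.

```python
def net_return(value: int, planned_cost: int, overrun_pct: int) -> int:
    """Value delivered minus actual cost, where overrun_pct=10 means 10% over budget."""
    actual_cost = planned_cost * (100 + overrun_pct) // 100
    return value - actual_cost


PLANNED_COST = 1_000_000
PROJECT_VALUES = {"A": 1_100_000, "B": 50_000_000}

# Net return for each project at 0%, 10% and 100% budget overrun.
results = {
    name: [net_return(value, PLANNED_COST, pct) for pct in (0, 10, 100)]
    for name, value in PROJECT_VALUES.items()
}

for name, returns in results.items():
    print(name, returns)
```

A 10% overrun wipes out project A's entire return and a doubled budget turns it into a $900,000 loss, while project B still returns $48m even if the budget doubles.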

DeMarco's epiphany is that if the project you're working on is subject to tight control, or if as a project manager you are having to enforce tight controls, this strongly suggests that the project is of only marginal value to begin with. If a small budget overrun will obliterate the profit generated from your project, maybe the project isn't such a good idea at all. And given the amount of attention that measurement and control generates within software engineering circles, it may be the case that most of the projects being worked on are of such little value that it is necessary to tightly control every aspect of their development, a difficult and costly task in itself.

DeMarco is not proposing anarchy in place of order, but is proposing that we - developers, project managers, analysts, CTOs and ultimately business decision makers - need to think differently about software projects. There remains vast unexploited potential for software to improve our lives and our businesses, and the challenge is to be bolder and more creative in identifying this, creating products that deliver sufficient return on investment to justify their existence even when allowing for the inevitable budget overrun. The current conservative culture of tight control is effectively a lie anyway, since budget estimates given at the start of a large project are unlikely to survive the collision with reality. It would, he argues, be better to admit this and to focus efforts on ensuring that the results make this worthwhile.

This struck a number of chords with me. It reminded me a little of my recent post Why winning is easier than finishing second in that DeMarco argues that if we find ourselves having to work very hard at something, it may be a sign that it's just not meant to happen. In my post I talked about the huge effort that goes into failed attempts to pitch for projects compared to the relatively small effort that goes into winning projects where the client already has a preference for you; this mirrors the point about how the least valuable projects are the ones that require the most work.

It also reminded me a lot of user-centered design, at least in the way that we apply it at PRWD. When designing a user-centric system, one of our main considerations is to ensure that what we're designing or developing delivers genuine value to users. If a feature of an application or website is unlikely to be used often, or is unlikely to add anything to the user experience, the cost of development is unlikely to be recouped. We prefer, wherever possible, to focus on projects where the return on investment is significant enough to justify doing things properly, rather than trying to squeeze schedules and budgets to make a marginal feature viable.

Ah, so that's why I hardly ever get comments...

Posted on 11 December 2009 - 9:55pm

If one person complains that my spam filter rejected their comment, I will mumble some vague apology and hope that they go away and stop bugging me about it. But when several people point out that my spam filter is repeatedly rejecting their perfectly normal comments, it's time to do something about it.

I had high hopes for Mollom, mostly because I really like its founder Dries Buytaert (who founded the Drupal project that powers this site and many others), but whilst it does an admirable job of keeping spam out, it does a less than admirable job of allowing genuine commenters in. I've now replaced it with TypePad AntiSpam, at least for a trial period of a week or two. So if you have been itching to comment but haven't been able to, now is the time!

Peer-to-peer, private and utility

Posted on 10 December 2009 - 10:01pm

This is going to be a slightly more speculative post than is usual. I'd like to explore three distinct models of delivery for services and goods. This is all pretty uninformed, as I don't really know anything about economics bar what a nasty Google Reader habit will leave you with. With that caveat in place, here goes...

It strikes me that many of the important goods we utilise as the building blocks of modern civilisation can, broadly, be managed in three different ways: as private concerns, as utilities, and as peer-to-peer goods. To sketch a rough example, electricity began with private power generation, progressed to utility provision by massive-scale entities (often state-owned or regulated) and may now be transitioning to small-scale and, in part, peer-to-peer generation using solar panels and wind farms (Nick Carr writes interestingly about this; I really must buy his book(s)). I am, of course, grossly over-simplifying, but I hope that I am catching some essential characteristic of the different approaches.

Money is another example. At times, money has been a private thing; access to it was limited largely to those who already had it. Money - as coins and notes - evolved as a representation of pre-existing private wealth. Over time, the system of goldsmiths issuing promissory notes in return for deposits of gold gave way to currency and fractional reserve banking, which eventually mutated into the utility model of mass banking we have today. Now, barely anyone deals with the local goldsmith or issues their own promissory notes and the stock of gold in the Bank of England or Fort Knox is a relatively small quantity of the wealth denominated in pounds or dollars. We interact with financial utilities, massive organisations that are intended, through their sheer size and scale, to deliver efficient supplies of capital to those who can make the best use of it. Of course, the rise of utility banking has not totally eclipsed what went before it, and this somewhat schizophrenic quality of banks remains part of the problem with them and explains why some want to split investment banking off from retail utility banking. Again, I have simplified here but I hope that I have captured some essential truth.

Computing power has actually undergone several transitions, but the major trend of recent years has been away from the idea of the private reserve of computing power - having your own servers and desktop PCs - to utility computing, where servers are provisioned dynamically from the cloud, and the computing power on the desktop is only really used to power the web browser. SETI@home provides a glimpse at what a future peer-to-peer computing network might look like, and the forward-looking vision of those who created the internet on peer-to-peer principles should smooth the way.

It's important to note here that I do regard this as being a general trend of private to utility to peer-to-peer, but it doesn't happen all at once and each subsequent stage will not completely displace the former. There are still plenty of, say, private electricity generators in operation, they're just not a very big slice of the pie any more.

Might marketing data also follow a similar trajectory? Since time immemorial, vendors have held 'marketing data' about their customers simply by virtue of personal relationships. Shopowners would know the needs of their customers. In the 20th century, with the increasing ubiquity of large corporations, private data-gathering operations took hold. "Market research" was invented and the efficient management of information about customers and their observed needs became an important part of business. This process is still continuing, as the decrease in technology costs and the increase in opportunities to capture data about customer behaviour have led to the creation of vast private data repositories in our largest corporations. Many of the most successful companies in business today claim that it is this deep understanding of customer needs that has driven their success.

Of course, there is a cost attached. To amass this data, store it, process it and analyse it is expensive. What's more, the benefit of gathering data increases as you gather more of it. Data about two million people's shopping habits is more than twice as valuable as data about one million people's. Once you reach the scale of Tesco (whose ClubCard loyalty scheme gives them personalised shopping history for millions of people; £1 in every £8 spent in British shops is spent there) the data is of huge importance. The investment required in gathering that amount of data is so immense that it represents a barrier to entry far bigger than the cost of acquiring premises or setting up supply lines, and control of the data is vital to the defence of a strong commercial position.

Data, as it is held now, is a classic example of a private good. It is hoarded at great cost by the few who guard it jealously, with the cost of acquiring data of their own sufficing to hold down the competition. There are, however, alternatives. The most obvious is Google, which holds a similarly huge amount of information about individuals, but has no desire to sell products to them directly. Google is happy enough to allow other retailers to piggyback on their database via advertising. Google is, essentially, a utility provider of data about customer buying intentions. For retailers, it's not as good as having your own data-mining operation, but it is dramatically cheaper. As a retailer, you only pay for what you use - on a per-click basis - and Google takes care of figuring out whether or not a person might want to buy your product. Google is not without competitors though: Facebook has a different kind of personal data about its users, and will also provide access to this data to others, for a price. For smaller businesses, this could be a boon as it erodes the advantage that mega-retailers have from their superior data. But, of course, the real winners are the utility data providers - Google and Facebook for now, though I suspect that credit card companies could get in on the act, and maybe even banks looking for a new business model could appear on the scene.

But what about peer-to-peer data? This is where VRM comes in. VRM offers the possibility that individuals will control their own personal data - their identity, their observable characteristics and attributes, and the data trail they leave from their interactions with others - and will be able to choose to share it with whomever they like. They can share it with other people, with small companies, large companies or nobody at all. It therefore represents something close to peer-to-peer exchange, in the sense that everyone 'creates' their own data and nobody has privileged access to it.

If you're interested in reading more of my thoughts on VRM, here are a few old posts of mine to get started with: This is my first post on VRM, when I was just figuring things out. VRM: Rent don't buy explores some of the concepts I mentioned in this post and What's a consumer? considers how a VRM world might change the relationship between 'consumers' and vendors.

It would be interesting to consider other goods which have undergone similar shifts. The written word, perhaps - from narrow, private creation by the few, to mass consumption (the printing press) to mass creation (blogs, wikis etc.). Is this a pattern that is far more fundamental than I realise?