As we move from “tens” to “hundreds” to “thousands” of nodes in a typical data centre we need new tools and practices. This hyperscale story – of hyper-dense racks with wimpy nodes – is the big shift in the physical world which matches the equally big shift to cloud computing in the virtualised world. Ubuntu’s popularity in the cloud comes in part from being leaner, faster, more agile. And MAAS – Metal as a Service – is bringing that agility back to the physical world for hyperscale deployments.

Servers used to aspire to being expensive. Powerful. Big. We gave them names like “Hercules” or “Atlas”. The bigger your business, or the bigger your data problem, the bigger the servers you bought. It was all about being beefy – with brands designed to impress, like POWER and Itanium.

Things are changing.

Today, server capacity can be bought as a commodity, based on the total cost of compute: the cost per teraflop, factoring in space, time, electricity. We can get more power by adding more nodes to our clusters, rather than buying beefier nodes. We can increase reliability by doubling up, so services keep running when individual nodes fail. Much as RAID changed the storage game, this scale-out philosophy, pioneered by Google, is changing the server landscape.

In this hyperscale era, each individual node is cheap, wimpy and, by historical standards for critical computing, unreliable. But together, they’re unstoppable. The horsepower now resides in the cluster, not the node. Likewise, the reliability of the infrastructure now depends on redundancy, rather than heroic performances from specific machines. There is, as they say, safety in numbers.

We don’t even give hyperscale nodes proper names any more – ask “node-0025904ce794”. Of course, you can still go big with the cluster name. I’m considering “Mark’s Magnificent Mountain of Metal” – significantly more impressive than “Mark’s Noisy Collection of Fans in the Garage”, which is what Claire will probably call it. And that’s not the knicker-throwing kind of fan, either.

The catch to this massive multiplication in node density, however, is in the cost of provisioning. Hyperscale won’t work economically if every server has to be provisioned, configured  and managed as if it were a Hercules or an Atlas. To reap the benefits, we need leaner provisioning processes. We need deployment tools to match the scale of the new physical reality.

That’s where Metal as a Service (MAAS) comes in. MAAS makes it easy to set up the hardware on which to deploy any service that needs to scale up and down dynamically – a cloud being just one example. It lets you provision your servers dynamically, just like cloud instances – only in this case, they’re whole physical nodes. “Add another node to the Hadoop cluster, and make sure it has at least 16GB RAM” is as easy as asking for it.

With a simple web interface, you can  add, commission, update and recycle your servers at will.  As your needs change, you can respond rapidly, by adding new nodes and dynamically re-deploying them between services. When the time comes, nodes can be retired for use outside the MAAS.

As we enter an era in which ATOM is as important in the data centre as XEON, an operating system like Ubuntu makes even more sense. Its freedom from licensing restrictions, together with the labour saving power of tools like MAAS, make it cost-effective, finally, to deploy and manage hundreds of nodes at a time

Here’s another way to look at it: Ubuntu is bringing cloud semantics to the bare metal world. What a great foundation for your IAAS.

Innovation and OpenStack: Lessons from HTTP

Thursday, September 8th, 2011

OpenStack is facing an important choice: does define a new set of API’s, one of many such efforts in cloud infrastructure, or does it build around the existing AWS API’s?  So far, OpenStack has had it both ways, with some new API work and also some AWS-based effort. I’m writing to make the case for a tighter definition of mission around the de facto standard infrastructure API’s of EC2, S3 and a few other elements of AWS.

What prompted this blog was my overhearing (or, seeing an email on a list) the statement that cloud infrastructure projects like OpenStack, Eucalyptus and others should “innovate at the level of the API and infrastructure concepts”. I’m of the view that any projects which try to do so will fail and are not worth spending your or my time on. They are going to be about as successful as projects that try to reinvent HTTP to make it better/faster/cleaner/whatever. Which is to say – not successful at all, because no new protocol with the same conceptual goals will match the ecosystem that exists today around HTTP. There will of course be protocol innovation, the last word is never written, but for the web, it’s a done deal. All the proprietary and ad-hoc things that preceded HTTP have died, and good riddance. Similarly, cloud infrastructure will converge around a standard API which will be imperfect but real. Innovation is all about how that API is implemented, not which API it is.

Nobody would say the web server market lacks innovation. There are many, many different companies and communities that make and market web server solutions. And each of those is innovating in some way – focusing on a different audience, or trying a different approach. Yet that entire market is constrained by a public standard: HTTP, which evolves far more slowly than the products that implement it.

There are also a huge number of things that wrap themselves around HTTP, from cache accelerators to 3G content compressors; the standardisation of that thin layer has created a massive ecosystem and driven fantastic innovation, even as many of the core concepts that drove HTTP’s initial design have eroded or softened. For example, HTTP was relentlessly stateless, but we’ve added cookies and cacheing to address issues caused by that (at the time radical) design constraint.

Today, cloud infrastructure is looking for its HTTP. I think that standard already exists in de facto form today at AWS, with EC2, S3 and some of the credential mechanisms being essentially the core primitives of cloud infrastructure management. There is enormous room for innovation in cloud infrastructure *implementations*, even within the constraints of that minimalist API. The hackers and funders and leaders and advocates of OpenStack, and any number of other cloud infrastructure projects both open source and proprietary, would be better off figuring out how to leverage that standardisation than trying to compete with it, simply because no other API is likely to gain the sort of ecosystem we see around AWS today.

It’s true that those API’s would better be defined in a clean, independent forum analogous to the W3C than inside the boiler-room of development at any single cloud provider, but that’s a secondary issue. And over time, it can be engineered to work that way.

More importantly for the moment, those who make an authentic effort to fit into the AWS protocol standard immediately gain access to chunks of the AWS gene pool, effectively gratis. From services like RightScale to tools like ElasticFox, your cloud is going to be more familiar, more effective and more potent if it can ease the barriers to porting from AWS. No two implementations will magically Just Work, but the rough edges and gotchas can get ironed out much more easily if there is a clear standard and reference implementations. So the cost of “porting” will always be lower between clouds that have commonality by design or heritage.

For OpenStack itself, until that standard is codified, I would describe the most successful mission statement as “to be the reference public cloud provider scale implementation of cloud infrastructure compatible with AWS core API’s”. That’s going to give all the public cloud providers who want to compete with Amazon the best result: they’ll be able to compete on service terms, while convincing early adopters that the move to their offering will be relatively painless. All it takes, really, is some humility and the wisdom to recognise the right place to innovate.

There will be many implementations of those core API’s. One or other will be the Apache, the “just start here” option. But it doesn’t matter so much which one that is, frankly. I think OpenStack has the best possible chance to be that, but only if they stick to this crisp mission and don’t allow themselves to be drawn into front-end differentiation for the sake of it. Should that happen, OpenStack will be vulnerable to another open source project which credibly aims to achieve the goals outlined here. Very vulnerable. Witness the ways in which Eucalyptus is rightly pointing out its superior AWS compatibility in comparison with OpenStack.

For the public cloud providers that hope to build on OpenStack, API differentiation is poison in a juicy steak. It looks tasty, but it’s going to cost you the race prematurely. There were lots of technical reasons why alternatives to Windows were *better*, they just failed to become de facto standards. As long as Amazon doesn’t package up AWS as an on-premise solution, it’s possible to establish a de facto standard around something else, but that something else (perhaps OpenStack) needs to be AWS-compatible in some meaningful way to get enough momentum to matter. That means there’s a window of opportunity to get this right, which is not going to stay open indefinitely. Either Amazon, or another open source project, could close that window on OpenStack’s fingers. And that would be a pity, since the community around OpenStack has tons of energy and goodwill. In order to succeed, it will need to channel that energy into innovation on the implementation, not on trying to redefine an existing standard.

Of course, all this would be much easier if there were a real HTTP-like standard defining those API’s. The web had the enormous advantage of being founded by Tim Berners-Lee, in an institution like CERN, with the vision to setup the W3C. In the case of today’s cloud infrastructure, there isn’t the same dynamic or set of motivations. Amazon’s position of vagueness on the AWS API’s is tactically perfect for them right now, and I would expect them to maintain that line while knowing full well there is no real proprietary claim in a public network API, and no real advantage to be had from claiming otherwise. What’s needed is simply to start codifying existing practice as a draft standard in a credible forum of experts, with a roadmap and the prospect of support from multiple vendors. I think that would be relatively easy to arrange, if we could get Rackspace, IBM and HP to sit down and commit to doing it. We already have HP and Rackspace at the OpenStack table, so the signs are encouraging.

A good standard would:

* be pragmatic about the fact that Amazon has already made a bunch of decisions we’ll live with for ever.
* have a commitment from folk like OpenStack and Eucalyptus to aim for compliance
* include a real automated functional test suite that becomes the interop benchmark of choice over time
* be open to participation by Amazon, though that will not likely come for some time
* be well documented and well managed, like HTTP and CSS and HTML
* not be run but the ITU or ISO

I’m quite willing to contribute resources to getting such a standard off the ground. Forget big consortiums or working groups or processes or lobbying forums, what’s needed are a few savvy folk who know AWS, Eucalyptus and OpenStack, together with a very few technical writers. Let me know if you’re interested.

Now, I started out by saying that I was writing to make the case for OpenStack to be focused on a particular area. It’s a bit cheeky for me to write anything of the sort, of course, because OpenStack is a well run project that has an excellent steering group, which recently held a poll of contributors to appoint some new members, none of which was me. I’ve every confidence in the leadership of the project, despite the tremendous pressure they are under to realise the hopes of so many diverse users and companies. I’m optimistic for the potential OpenStack has to accelerate cloud technology, and in Canonical we put a considerable amount of effort into making OpenStack deployment a smooth experience for Ubuntu users and Canonical customers. Ubuntu Cloud Infrastructure depends now on OpenStack. And I have a few old friends who are also leaders in the OpenStack community, so for all those reasons I thought it worth making this perspective public.