This is a series of posts on reasons to choose Ubuntu for your public or private cloud work & play.

We run an extensive program to identify issues and features that make a difference to cloud users. One result of that program is that we pioneered dynamic image customisation and wrote cloud-init. I’ll tell the story of cloud-init as an illustration of the focus the Ubuntu team has on making your devops experience fantastic on any given cloud.

 

Ever struggled to find the “right” image to use on your favourite cloud? Ever wondered how you can tell whether an image is safe to use, or what keyloggers or other nasties might be installed on it? We set out to solve that problem a few years ago, and the resulting code, cloud-init, is one of the more visible pieces of software Canonical has designed and built, and it is very widely adopted.

Traditionally, people used image snapshots to build a portfolio of useful base images. You’d start with a bare OS, add some software and configuration, then snapshot the filesystem. You could use those snapshots to power up fresh instances any time you needed more machines “like this one”. And that process works pretty amazingly well. There are hundreds of thousands, perhaps millions, of such image snapshots scattered around the clouds today. It’s fantastic. Images for every possible occasion! It’s a disaster. Images with every possible type of problem.

The core issue is that an image is a giant binary blob that is virtually impossible to audit. Since it’s a snapshot of a machine that was running, and to which anything might have been done, you would need to look in every nook and cranny to see if there is a potential problem. Can you afford to verify that every binary is unmodified? That every configuration file and every startup script is safe? No, you can’t. And for that reason, that whole catalogue of possibilities is also a catalogue of potential risk. If you wanted to gather useful data sneakily, all you’d have to do is put up an image that advertises itself as being good for a particular purpose and convince people to run it.

There are other issues, even if you create the images yourself. Each image slowly gets out of date with regard to security updates. When you fire it up, you need to apply all the updates published since the image was created if you want a secure machine. Eventually, you’ll want to re-snapshot to get a more up-to-date image. That requires administrative overhead and coordination, so most people don’t do it.

That’s why we created cloud-init. When your virtual machine boots, cloud-init runs very early. It looks for information you send to the cloud along with the instruction to start a new machine (the instance’s user data), and it customises your machine at boot time. When you combine cloud-init with the fresh Ubuntu images we publish (roughly every two weeks for regular updates, and whenever a security update is published), you have a very clean and elegant way to get fresh images that do whatever you want. You design your image as a script which customises the vanilla base image. Then you use cloud-init to run that script against a pristine, known-good standard image of Ubuntu. Et voilà! You now have purpose-designed images of your own on demand, always built on a fresh, secure, trusted base image.
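To make that concrete, here is a minimal sketch of such a script, passed as user data when the instance is launched; cloud-init picks it up and runs it on first boot. The nginx package and the marker line are purely illustrative placeholders, and cloud-init also accepts richer “cloud-config” data for the same job.

    #!/bin/sh
    # Passed as user data at launch; cloud-init executes it on the instance's first boot.

    # Bring the pristine base image fully up to date.
    apt-get update
    apt-get -y upgrade

    # Layer on whatever this machine is for (nginx is just a placeholder workload).
    apt-get -y install nginx

    # Leave a marker so anyone logging in can see how this node was built.
    echo "customised at first boot by cloud-init user data" >> /etc/motd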

Auditing your cloud infrastructure is now straightforward, because you have the DNA of that image in your script. This is devops thinking: turning repetitive manual processes (hacking and snapshotting) into code that can be shared, audited and improved. Your infrastructure DNA should live in a version control system that requires signed commits, so you know everything that has been done to get you where you are today. All of that is enabled by cloud-init. And if you want to go one level deeper, check out Juju, which provides you with off-the-shelf scripts to customise and optimise that base image for hundreds of common workloads.
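As a rough sketch of what that looks like in practice (the charm names and exact command syntax vary with the Juju release, so treat this as illustrative rather than definitive):

    # Stand up a Juju environment, then deploy charms, the "off-the-shelf scripts"
    # that customise the base Ubuntu image for a given workload.
    juju bootstrap
    juju deploy mysql
    juju deploy wordpress
    juju add-relation wordpress mysql   # wire the two services together
    juju expose wordpress               # open the front end to the outside world

Every node built this way starts from the same pristine image plus code you can read, which is exactly what makes the result auditable.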

Microsoft has built an impressive new entrant to the Infrastructure-as-a-Service market, and Ubuntu is there for customers who want to run workloads on Azure that are best suited to Linux. Windows Azure was built for the enterprise market, an audience which is increasingly comfortable with Ubuntu as a workhorse for scale-out workloads; in short, it’s a good fit for both of us, and it’s been interesting to do the work to bring Ubuntu to the platform.

Given that it’s normal for us to spin up 2,000-node Hadoop clusters with Juju, it will be very valuable to have a new enterprise-oriented cloud with which to evaluate performance, latency, reliability, scalability and many other key metrics for production deployment scenarios.

As IAAS grows in recognition as a standard part of the enterprise toolkit, it will be important to have a wide range of infrastructures that are addressable, with diverse strengths. In the case of Windows Azure, there is clearly a deep connection between Windows-based IT and the new IAAS. But I think Microsoft has set their sights on a bigger story, which is high-quality enterprise-oriented infrastructure that is generally useful. That’s why Ubuntu is important to them, and why it was worthwhile for us to work together despite our differences. Just as we need to ensure that customers can run Ubuntu and Windows together inside their data centre and on the LAN, we want to ensure that cloud workloads play nicely.

The team leading Azure has a sophisticated understanding of Ubuntu and Linux in general. They are taking a pragmatic approach that will raise eyebrows around the Redmond campus, but it is exactly what customers want to see. We have taken a similar view. I know there will be members of the free software community who will leap at the chance to berate Microsoft for its very existence, but it’s not very Ubuntu to do so: let’s argue our perspective, work towards our goals, be open to those who are open to us, and build great stuff. There is nothing proprietary in Ubuntu-for-Azure, and no about-turn from us on long-held values. This is us making sure our audience, and especially the enterprise audience, can benefit from the work our community and Canonical do, no matter where they want to do it.

Windows Azure IAAS is in beta. If you are using the cloud today, or interested in it, I highly recommend you try it out. There’s no better way to make yourself heard over there.

As we move from “tens” to “hundreds” to “thousands” of nodes in a typical data centre we need new tools and practices. This hyperscale story – of hyper-dense racks with wimpy nodes – is the big shift in the physical world which matches the equally big shift to cloud computing in the virtualised world. Ubuntu’s popularity in the cloud comes in part from being leaner, faster, more agile. And MAAS – Metal as a Service – is bringing that agility back to the physical world for hyperscale deployments.

Servers used to aspire to being expensive. Powerful. Big. We gave them names like “Hercules” or “Atlas”. The bigger your business, or the bigger your data problem, the bigger the servers you bought. It was all about being beefy – with brands designed to impress, like POWER and Itanium.

Things are changing.

Today, server capacity can be bought as a commodity, based on the total cost of compute: the cost per teraflop, factoring in space, time, electricity. We can get more power by adding more nodes to our clusters, rather than buying beefier nodes. We can increase reliability by doubling up, so services keep running when individual nodes fail. Much as RAID changed the storage game, this scale-out philosophy, pioneered by Google, is changing the server landscape.

In this hyperscale era, each individual node is cheap, wimpy and, by historical standards for critical computing, unreliable. But together, they’re unstoppable. The horsepower now resides in the cluster, not the node. Likewise, the reliability of the infrastructure now depends on redundancy, rather than heroic performances from specific machines. There is, as they say, safety in numbers.

We don’t even give hyperscale nodes proper names any more – ask “node-0025904ce794”. Of course, you can still go big with the cluster name. I’m considering “Mark’s Magnificent Mountain of Metal” – significantly more impressive than “Mark’s Noisy Collection of Fans in the Garage”, which is what Claire will probably call it. And that’s not the knicker-throwing kind of fan, either.

The catch to this massive multiplication in node density, however, is in the cost of provisioning. Hyperscale won’t work economically if every server has to be provisioned, configured and managed as if it were a Hercules or an Atlas. To reap the benefits, we need leaner provisioning processes. We need deployment tools to match the scale of the new physical reality.

That’s where Metal as a Service (MAAS) comes in. MAAS makes it easy to set up the hardware on which to deploy any service that needs to scale up and down dynamically – a cloud being just one example. It lets you provision your servers dynamically, just like cloud instances – only in this case, they’re whole physical nodes. “Add another node to the Hadoop cluster, and make sure it has at least 16GB RAM” is as easy as asking for it.
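As a hedged sketch, that request might look like the following when Juju is driving MAAS; the hadoop charm name and the constraint syntax are illustrative and vary between releases:

    # Ask for machines with at least 16GB of RAM for this service; MAAS picks
    # physical nodes that satisfy the constraint as units are added.
    juju deploy hadoop --constraints "mem=16G"
    juju add-unit hadoop    # another physical node meeting the same constraint joins the cluster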

With a simple web interface, you can add, commission, update and recycle your servers at will. As your needs change, you can respond rapidly by adding new nodes and dynamically re-deploying them between services. When the time comes, nodes can be retired for use outside the MAAS.
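The same lifecycle can be driven from the MAAS command-line client. The sketch below assumes a profile called admin, a placeholder URL, API key and system ID, and command names that differ a little between MAAS releases:

    # Authenticate the CLI against the MAAS region controller.
    maas login admin http://maas.example.com:5240/MAAS/ $API_KEY

    # List every machine MAAS has enlisted, along with its commissioning status.
    maas admin machines read

    # Allocate a free node to ourselves, deploy Ubuntu onto it, and release it
    # back into the pool when the workload moves on.
    maas admin machines allocate
    maas admin machine deploy $SYSTEM_ID
    maas admin machine release $SYSTEM_ID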

As we enter an era in which Atom is as important in the data centre as Xeon, an operating system like Ubuntu makes even more sense. Its freedom from licensing restrictions, together with the labour-saving power of tools like MAAS, makes it cost-effective, finally, to deploy and manage hundreds of nodes at a time.

Here’s another way to look at it: Ubuntu is bringing cloud semantics to the bare metal world. What a great foundation for your IAAS.

Building clouds for fun and profit

Monday, September 19th, 2011

So you’d like to spin up an internal cloud for Hadoop or general development, shift workloads from AWS to your own infrastructure, or prototype some new cloud services?

Call Canonical’s cloud infrastructure design and consulting team.

There are a few scenarios that we’re focused on at the moment, where we can offer standardised engagements:

  • Telcos building out cloud infrastructures for public cloud services. These are aiming for specific markets based on geography or network topology – they have existing customers, existing networks, and a competitive advantage in handling outsourced infrastructure for companies that are well connected to them, as well as a jurisdictional advantage over the global public cloud providers.
  • Cloud infrastructure prototypes at a division or department level. These are mostly folk who want the elasticity and dynamic provisioning of AWS in a private environment, often to work on products that will go public on Rackspace or AWS in due course, or to demonstrate and evaluate the benefits of this sort of architecture internally.
  • Cloud-style legacy deployments. These are folk building out HPC-type clusters running dedicated workloads that are horizontally scaled but not elastic. Big Hadoop deployments, or Condor deployments, fall into this category.

Cloud has become something of a unifying theme in many of our enterprise and server-oriented conversations in the past six months. While not everyone is necessarily ready to shift their workloads to a dynamic substrate like Ubuntu Cloud Infrastructure (powered by OpenStack), it seems that most large-scale IT deployments are embracing cloud-style design and service architectures, even when they are deploying on the metal. So we’ve put some work into tools which can be used in both cloud and large-scale-metal environments, for provisioning and coordination.

With 12.04 LTS on the horizon, OpenStack exploding into the wider consciousness of cloud-savvy admins, and projects like Ceph and CloudFoundry growing in stature and capability, it’s proving to be a very dynamic time for IT managers and architects. Much as the early days of the web presented a great deal of hype, complexity and options, only to settle down into a few key standard practices and platforms, cloud infrastructure today presents a wealth of options and a paucity of clarity: from NoSQL choices, through IAAS choices, to PAAS choices. Over the next couple of months I’ll outline how we think the cloud stack will shape up. Our goal is to make that “clean, crisp, obvious” deployment Just Work, bringing simplicity to the cloud much as we strive to bring it to the desktop.

For the moment, though, it’s necessary to roll up sleeves and get hands a little dirty, so the team I mentioned previously has been busy bringing some distilled wisdom to customers embarking on their cloud adventures in a hurry. Most of these engagements started out as custom consulting and contract efforts, but enough patterns have emerged that the team has identified a set of common practices and templates that accelerate the build-out for those typical scenarios, and has packaged them up as a range of standard cloud-building offerings.