Archive for the 'free software' Category

Good architectural layering, and Bzr 1.1

Wednesday, January 9th, 2008

I completely failed to blog the release of Bzr 1.0 last year, but it was an excellent milestone and by all accounts, very well received. Congratulations to the Bazaar community on their momentum! I believe that the freeze for 1.1 is in place now so it’s great to see that they are going to continue to deliver regular releases.

I’ve observed a surge in the number of contributors to Bazaar recently, which has resulted in a lot of small but useful branches with bugfixes for various corner cases, operating systems and integrations with other tools. One of the most interesting projects that’s getting more attention is BzrEclipse, integrating Bzr into the Eclipse IDE in a natural fashion.

I think open source projects go through an initial phase where they work best with a tight group of core contributors who get the basics laid out to the point where the tool or application is usable by a wider audience. Then, they need to make the transition from being “closely held” to being open to drive-by contributions from folks who just want to fix a small bug or add a small feature. That’s quite a difficult transition, because the social skills required to run the project are quite different in those two modes. It’s not only about having good social skills, but also about having good processes that support the flow of new, small contributions from new, unproven contributors into the code-base.

It seems that one of the key “best practices” that has emerged is the idea of plug-in architectures, which allow new developers to contribute an extension, plug-in or add-on to the codebase without having to learn too much about the guts of the project, or participate in too many heavyweight processes. I would generalize that and say that good design, with clearly thought-through and pragmatic layers, allows new contributors to make useful contributions to the code-base quickly because it presents useful abstractions early on.

Firefox really benefited from their decision to support cross-platform add-ons. I’m delighted to hear that OpenOffice is headed in the same direction.

Bazaar is very nicely architected. Not only is there a well-defined plug-in system, but there’s also a very useful and pragmatic layered architecture which keeps the various bits of complexity contained for those who really need to know. I’ve observed how different teams of contributors, or individuals, have introduced whole new on-disk formats with new performance characteristics, completely orthogonally to the rest of the code. So if you are interested in the performance of status and diff, you can delve into working tree state code without having to worry about long-term revision storage or branch history mappings.

Layering can also cause problems, when the layers are designed too early and don’t reflect the pragmatic reality of the code. For example, witness the “exchange of views” between the ZFS folks and the Linux filesystem community, who have very different opinions on the importance and benefits of layering.

Anyhow, kudos to the Bazaar guys for the imminent 1.1, and for adopting an architecture that makes it easier for contributors to get going.

I’m absolutely thrilled to see this chart of untriaged bugs in Inkscape since the project moved to Launchpad:

[Chart: Untriaged Inkscape bugs after move to LP]

As you can see, the Inkscape community has been busy triaging and closing bugs, radically reducing the “new and unknown” bug count and giving the developers a tighter, more focused idea of where the important issues are that need to be addressed.

A lot of my personal interest in free software is motivated by the idea that we can be more efficient if we collaborate better. If we want free software to be the norm for personal computing software, then we have to show, among other things, that the open, free software approach taps into the global talent pool in a healthier, more dynamic way than the old proprietary approach to building software does. We don’t have money on our side, but we do have the power of collaboration.

I put a lot of personal effort into Launchpad because I love the idea that it can help lead the way to better collaboration across the whole ecosystem of free software development. I look for the practices which the best-run projects follow, and encourage the Launchpad guys to make it easy for everyone to do those things. These improvements and efficiencies will help each project individually, but they also help every Linux distribution. This sort of picture gives me a sense of real accomplishment in that regard.

Bryce Harrington, who happens to work for Canonical and is a member of the Inkscape team, told me about this and blogged the experience. I’ve asked a few other Inkscape folks, and they seem genuinely thrilled at the result. I’m delighted. Thank you!

Is it possible to have training materials that are developed in partnership with the community, available under a CC license, AND make those same materials available through formal training providers? We’re trying to find out at Canonical with our Ubuntu Desktop Course.

Billy Cina @Canonical has been making steady progress towards the goal of having a full portfolio of training options available for commercial users of Ubuntu. Companies that want to ensure that their staff are rigorously trained, and individuals who want to present their Ubuntu credentials in a formal setting, need to have a certified and trusted framework for skills assurance.

Most of the work we are doing in this line is following the traditional model, where content is funded as a private investment, and the content is then licensed to authorized training providers who sell courses to their local markets. These courses are usually sold to companies that have adopted a platform or tool and want to ensure a consistent level of skills across the organization. Many companies are moving to Ubuntu for both desktop and server, so demand is hotting up for this capability. A system builder course and a system administrator course are now available from authorized training providers.

But we wanted also to try a different approach, that might be more accessible to the Ubuntu community and might also result in even higher quality materials. We think the key ingredients are:

  • Use of an open format (Docbook)
  • Content source available in a public Bazaar repository (here)
  • Licensing under open terms (CC-BY-NC-SA)
  • Working with the Ubuntu doc-team, who have a wealth of experience

The license is copyleft and non-commercial, so that it is usable by any person for their own education and edification with the requirement that commercial use will involve some contribution back to the core project.

It’s already a 400 page book which gives a great overview of the Ubuntu desktop experience, a very valuable resource for folks who are new to Linux and Ubuntu.

We are getting to the point where we can publish a “daily PDF” which will have the very latest version (“trunk”) compiled overnight. So anyone has free access to the very latest version, and of course anyone can bzr branch the content to make changes that suit them.

If you want to have a look at the latest content, try this:


bzr launchpad-login <your-lp-username>
bzr branch lp:ubuntu-desktop-course

The source is huge (712MB, lots of images in a large book), so grab a cup of tea, and when you get back you will have the latest version of the content, hot and well-brewed 🙂 This is a great set of materials if you are offering informal training. Corrections and additions would be most welcome, just push your branch up to Launchpad and request a merge of your changes.

It’s too early to say for certain, but there are very encouraging signs that the world’s standards bodies will vote in favour of a single unified ISO (“International Organization for Standardization”) document format standard. There is already one document format standard – ODF, and currently the ISO is considering a proposal to bless an alternative, Microsoft’s OpenXML, as another standard. In the latest developments, standards committees in South Africa and the United States have both said they will vote against a second standard and thereby issue a strong call for unity and a sensible, open, common standard for business documents in word processing, spreadsheets and presentations.

It’s very important that we build on those brave decisions and call on all of our national standards committees to support the idea of a single common standard for these critical documents.

The way the ISO works is interesting. There are about 150 member countries who can vote on any particular proposal. Usually, about 40 countries actually vote. In order to pass, a proposal needs to get a 75% “yes” vote. Countries can vote yes, no, or “abstain”. So normally, 10 “no” or “abstain” votes would be sufficient to send the proposal back for further consideration. In this case, however, Microsoft has been working very hard, and spending a lot of money, to convince many countries that don’t normally vote to support their proposed format.
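The arithmetic behind that “10 votes” figure can be sketched in a few lines of Python. This models only the simplified rule as described in this post (75% of votes cast, with abstentions counting the same as “no”), not the full ISO procedure. The function names are invented for illustration, and treating exactly 75% as a pass is an assumption, so read the results as “about a quarter of the votes cast”.

```python
import math


def yes_votes_needed(votes_cast: int, threshold: float = 0.75) -> int:
    """Smallest number of "yes" votes that reaches the threshold (assumed inclusive)."""
    return math.ceil(votes_cast * threshold)


def non_yes_votes_tolerated(votes_cast: int, threshold: float = 0.75) -> int:
    """Largest number of "no"/"abstain" votes a proposal can absorb and still pass."""
    return votes_cast - yes_votes_needed(votes_cast, threshold)


# Compare a typical turnout of about 40 with larger turnouts.
for votes_cast in (40, 60, 80):
    print(votes_cast, yes_votes_needed(votes_cast), non_yes_votes_tolerated(votes_cast))
```

With 40 countries voting the tipping point sits at around 10 non-“yes” votes; with 60 or 80 countries voting it moves to around 15 or 20. That is why bringing in countries that don’t normally vote changes the outcome so much.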

So there is something concrete you can do, right now, today, this week! Find out which body in your country is responsible for your national representation on ISO. In SA it is the South African Bureau of Standards (SABS) and in the US I believe it is ANSI. Your country will likely have such a body. There is a list of some of them here but it may not be complete so don’t stop if your country isn’t listed there!

Call them, or email them, and ask them which committee will be voting on the OpenXML proposal. Then prepare a comment for that committee. It is really important that your comment be professional and courteous. You are dealing with strong technical people who have a huge responsibility and take it seriously – they will not take you seriously if your comment is not well thought out, politely phrased and logically sound.

If you have a strong technical opinion, focus on a single primary technical issue that you think is a good reason to decline the proposal from Microsoft. There are some good arguments outlined here. Don’t just resend an existing submission – find a particular technical point which means a lot to you and express that carefully and succinctly for yourself. It can be brief – a single paragraph, or longer. There are some guidelines for “talking to standards bodies” here.

Here are the points I find particularly compelling, myself:

  1. This is not a vote “for or against Microsoft”.
    In fact, this is a vote for or against a unified standard. Microsoft is a member of the body that defines ODF (the existing ISO standard) but is hoping to avoid participating in that, in favor of getting their own work blessed as a standard. A vote of “no OpenXML” is a vote against multiple incompatible standards, and hence a vote in favour of unity.
    If the ISO vote is “no”, then there is every reason to expect that Microsoft will adopt ODF, and help to make that a better standard for everybody including themselves. If we send a firm message to Microsoft that the world wants a single, unified standard, and that ODF is the appropriate place for that standard to be set, then we will get a unified global standard that includes Microsoft.
    The reason this point is important is that many government officials recognise the essential position Microsoft holds in their operations and countries, and they will be afraid to vote in a way that could cost their country money. If they perceive that a vote “no” might make it impossible for them to work with Microsoft, they will vote yes. Of course Microsoft is telling them this, but the reality is that Microsoft will embrace a unified standard if the global standards organisation clearly says that’s a requirement.
  2. Open, consensus based document standards really WORK WELL – consider HTML
    We already have an extraordinary success in defining a document format openly, in the form of HTML. The W3 Consortium, which includes Microsoft and many other companies, defines HTML and CSS. While Microsoft initially resisted the idea, preferring to push Internet Explorer’s proprietary web extensions, it was ultimately forced to participate in W3C discussions.
    The result is a wonderfully rich document format, with many different implementations. Much of the richness of the web today comes directly from the fact that there is an open standard for web documents and web interactions. Look at a classy web page, and then look at a classy Word document, and ask yourself which is the more impressive format! Clearly, Word would be better with an open standard, not one defined by a single company.
  3. A SINGLE standard with many implementations is MUCH more valuable than multiple standards
    Imagine what would happen if there were multiple incompatible web document standards. You couldn’t go to any web site and just expect it to work; you would need to know which format they used. The fact that there is one web document standard – HTML – is the key driver of the efficiency of the web as a repository of information. The web is a clear example of why ODF is the preferred structure for a public standard.
    ODF, the existing standard, is defined openly by multiple companies, and Microsoft can participate there along with everyone else. They know they can – and they participate in other standards discussions in the same organisation.
    Microsoft will say that “multiple standards give customers choice”. But we know that it is far more valuable to have a single standard which evolves efficiently and quickly, like HTML. The network effects of document exchange mean that one standard will in any event emerge as dominant, and it is important to governments, businesses and consumers that it be a standard which ITSELF offers great choice in implementation. People don’t buy a standard, and they don’t use a standard document, they use a software or hardware tool. If the “standard” only has one set of tools from one vendor, then that “choice of standards” has effectively resulted in zero choice of provider for customers. Consider the richness of the GSM cellular world, with hundreds of solution providers following a single global standard, compared to the inefficiency of countries which allowed proprietary networks to be installed on public frequencies.
    ODF is already implemented by many different companies. This means that there are many different tools which people can choose to do different things with their ODF documents. Some of those tools are optimised for the web, others for storage, others for data analysis, and others for editing.
    In the case of OpenXML, there is not even one single complete implementation – because even Microsoft Office12 does not exactly implement OpenXML. There is also no other company with any tool to edit or manage OpenXML documents. Microsoft is trying to make it look like there is broad participation, but dig beneath the surface and it is all funded by one company. The ODF standard is a much healthier place to safeguard all of our data.

I’d like to thank the team at TSF for the work they put into briefing the South African standards committee. I hope that each of you who has read this far will pick up the phone and contact your own standards body to help them make a smart decision.

The USA, South Africa, China, and other countries will be voting “no”. Let’s not allow heavy lobbying to influence what should be a calm, rational, sensible and ultimately technical discussion. Standards are important, and best defined in transparent and open forums. Pick up the phone!

With projects like Gobuntu and gNewSense aiming to provide a platform that is zealous about free software, the obvious question is “where can I run it?”. And right now, as far as laptops go, there are no good answers. Pretty much any laptop you can buy today needs some sort of non-free bits to make the most of its hardware, putting you in the tricky position of having to choose between hardware usefulness and software freedom. And boy, do we know about that choice in Ubuntu!

There have been several threads about this, in comments on this blog and also on comments to Bug #1. Most of them have focused on free drivers but we should also be thinking about OpenBIOS (the new name for the LinuxBIOS project). An ideal solution would also use firmware that has a free software licence as well, but I personally would see OpenBIOS and free drivers as a good start.

Right now, software freedom isn’t a huge priority for most of the companies that make up components for the PC and laptop industry. If we want to get onto their radar screen, we need to show that it’s worth their while to think about it. To that end I’d like to build up a list of people who are interested in this idea, and would potentially buy a high-powered laptop if it were guaranteed to work completely with free software drivers and OpenBIOS.

So I’ve set up a mailing list over here:

Please go ahead and join that list if you think you would seriously consider buying a laptop that was powerful and designed specifically to be free-software friendly.

This is a totally moderated list – I’ll only allow messages through that specifically let people know about the possibility of acquiring a laptop that can pass the free software test. So it’s news-only, and ultra-low traffic. If we can get sufficient numbers of people to express interest in such a laptop then I will start hunting for an OEM to offer a solution for pre-order.

I’ve also started to sketch out the components and specifications for a laptop that would meet these requirements here:

It will take a lot of committed buyers to move from concept to execution but if we can pull it off it will have an excellent ripple effect in the PC hardware industry. Make yourself heard!

Gobuntu is… go

Tuesday, July 10th, 2007

Thanks to Colin and Evan’s efforts we now have daily images of a freedom-focused flavour of Ubuntu, “Gobuntu”. This is a call for developers who are interested in pushing the limits of content and code freedom – including firmware, content, and authoring infrastructure – to join the team and help identify places where we must separate out pieces that don’t belong in Gobuntu from the standard Ubuntu builds.

At the moment this primarily addresses hardware drivers but as the team grows we will be able to maintain a bigger delta between Ubuntu and Gobuntu. The goal is to provide a cleaner and easier to maintain base for projects like gNewSense. Bug reports are welcome, but patches and offers of help will get better results.

Thanks guys!

Update: a number of comments have asked what Gobuntu is. It is a flavour of Ubuntu (like Kubuntu or Xubuntu) that has basically the same desktop environment as Ubuntu (a GNOME desktop) but a very strict set of restrictions on the licences of code and content. This means that we try to strip out ANYTHING which is not modifiable and redistributable, including firmware, PDFs, video footage, sounds etc. We are trying to apply the FSF “rights” definition to everything in the platform. Gobuntu will not correctly enable much hardware today – but it exists as a banner for the cause of software freedom and as a reference of what IS possible with a totally rigorous approach. The goal is to make it a real point of pride to be able to run Gobuntu on a laptop or desktop or server, because it means that all of the stars have aligned to ensure that you have complete freedom to use that hardware with free software.

Joining: there is now a gobuntu-devel mailing list for folks interested in Gobuntu development.

Continuing my discussion of version control tools, I’ll focus today on the importance of the merge capability of the tool.

The “time to branch” is far less important than the “time to merge”. Why? Because merging is the act of collaboration – it’s when one developer sits down to integrate someone else’s work with their own. We must keep the cost of merging as low as possible if we want to encourage people to collaborate as much as possible. If a merge is awkward, or slow, or results in lots of conflicts, or breaks when people have renamed files and directories, then I’m likely to avoid merging early and merging often. And that just makes it even harder to merge later.

The beauty of distributed version control comes in the form of spontaneous team formation, as people with a common interest in a bug or feature start to work on it, bouncing that work between them by publishing branches and merging from one another. These teams form more easily when the cost of branching and merging is lowered, and taking this to the extreme suggests that it’s very worthwhile investing in the merge experience for developers.

In CVS and SVN, the “time to branch” is low, but merging itself is almost always a painful process. Worse still, merging a second time from another branch is even MORE painful, so the incentives for developers to merge regularly are exactly the wrong way around. For merge to be a smooth experience, the tools need to keep track of what has been merged before, so that you never end up redoing work that you’ve already done. Bzr and Git both handle this pretty well, remembering which revisions in someone else’s branch you have already integrated into yours, and making sure that you don’t need to bother to do it again.
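The idea of “remembering what has been merged” can be shown with a toy model in Python. This is a sketch under heavy simplification: real tools such as Bzr and Git record merge ancestry in a revision graph, whereas the hypothetical `Branch` class below just keeps a set of revision ids it has already seen.

```python
from typing import List, Set


class Branch:
    """Toy branch: an ordered list of revision ids plus a record of
    every revision already integrated into it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.revisions: List[str] = []
        self.seen: Set[str] = set()

    def commit(self, revision_id: str) -> None:
        self.revisions.append(revision_id)
        self.seen.add(revision_id)

    def merge_from(self, other: "Branch") -> List[str]:
        """Integrate only the revisions not seen before, and return them."""
        new = [r for r in other.revisions if r not in self.seen]
        for revision_id in new:
            self.revisions.append(revision_id)
            self.seen.add(revision_id)
        return new


mine, theirs = Branch("mine"), Branch("theirs")
theirs.commit("t1")
theirs.commit("t2")
print(mine.merge_from(theirs))  # ['t1', 't2']

theirs.commit("t3")
print(mine.merge_from(theirs))  # ['t3']
```

The second merge only has to deal with `t3`. A tool without that memory would present `t1` and `t2` again, conflicts and all, which is the repeated pain described above.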

When we encourage people to “do their own thing” with version control, we must also match that independence with tools to facilitate collaboration.

Now, what makes for a great merge experience?

Here are a couple of points:

  1. Speed of the merge, or time it will take to figure out what’s changed, and do a sane job of applying those changes to your working tree. Git is the undisputed champion of merge speed. Anything less than a minute is fine.
  2. Handling of renames, especially renamed directories. If you merge from someone who has modified a file, and you have renamed (and possibly modified) the same file, then you want their change to be applied to the file in your working tree under the name YOU have given it. It is particularly important, I think, to handle directory renames as a first class operation, because this gives you complete freedom to reshape the tree without worrying about messing up other people’s merges. Bzr does this perfectly – even if you have subsequently created a file with the same name that the modified file USED to have, it will correctly apply the change to the file you moved to the new name.
  3. Quality of merge algorithm. This is the hardest thing to “benchmark” because it can be hugely subjective. Some merge algorithms take advantage of annotation data, for example, to minimise the number of conflicts generated during a merge. This is a highly subjective thing but in my experience Bzr is fantastic in merge quality, with very few cases of “stupid” conflicts even when branches are being bounced around between ad-hoc squads of developers. I don’t have enough experience of merging with tools like Darcs which have unusual characteristics and potentially higher-quality merges (albeit with lots of opportunity for unexpected outcomes).
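The rename case in point 2 is easier to see in code. One way to get that behaviour is to key every change on a stable file identifier instead of a path. The Python sketch below is a toy model of that idea, not Bazaar's implementation; the ids, the `Tree` layout and `apply_change` are all invented for illustration.

```python
from typing import Dict

# A tree maps a stable file id to that file's current path and content.
Tree = Dict[str, Dict[str, str]]


def apply_change(tree: Tree, file_id: str, new_content: str) -> str:
    """Apply someone else's edit to the file with this id, whatever it is
    called locally, and return the local path that was updated."""
    entry = tree[file_id]
    entry["content"] = new_content
    return entry["path"]


my_tree: Tree = {"id-readme": {"path": "README", "content": "v1"}}

# I rename README ...
my_tree["id-readme"]["path"] = "docs/intro.txt"
# ... and later create an unrelated file under the old name.
my_tree["id-new"] = {"path": "README", "content": "something else"}

# Their edit was made to the file they still call README (id "id-readme").
updated_path = apply_change(my_tree, "id-readme", "v2")
print(updated_path)  # docs/intro.txt
```

Because the lookup never goes through the path, the new `README` is left alone and the change lands in the file I moved.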

I like the fact that the Bazaar developers made merging a first-class operation from the start. Rather than saying “we have a few shell scripts that will help you with that”, they focused on techniques to reduce the time that developers spend fixing up merges. A clean merge that takes 10 seconds longer to do saves me a huge amount of time compared to a dirty (conflict-ridden, or rename-busted) merge that happened a few seconds faster.

Linus is also a very strong advocate of merge quality. For projects which really want as much participation as possible, merge quality is a key part of the developer experience. You want ANYBODY to feel empowered to publish their contribution, and you want ANYBODY to be willing to pull those changes into their branches with confidence that (a) nothing will break and (b) they can revert the merge quickly, with a single command.

No negotiations with Microsoft in progress

Saturday, June 16th, 2007

There’s a rumour circulating that Ubuntu is in discussions with Microsoft aimed at an agreement along the lines they have concluded recently with Linspire, Xandros, Novell etc. Unfortunately, some speculation in the media (thoroughly and elegantly debunked in the blogosphere but not before the damage was done) posited that “Ubuntu might be next”.

For the record, let me state my position, and I think this is also roughly the position of Canonical and the Ubuntu Community Council though I haven’t caucused with the CC on this specifically.

We have declined to discuss any agreement with Microsoft under the threat of unspecified patent infringements.

Allegations of “infringement of unspecified patents” carry no weight whatsoever. We don’t think they have any legal merit, and they are no incentive for us to work with Microsoft on any of the wonderful things we could do together. A promise by Microsoft not to sue for infringement of unspecified patents has no value at all and is not worth paying for. It does not protect users from the real risk of a patent suit from a pure-IP-holder (Microsoft itself is regularly found to violate such patents and regularly settles such suits). People who pay protection money for that promise are likely living in a false sense of security.

I welcome Microsoft’s stated commitment to interoperability between Linux and the Windows world – and believe Ubuntu will benefit fully from any investment made in that regard by Microsoft and its new partners, as that code will no doubt be free software and will no doubt be included in Ubuntu.

With regard to open standards on document formats, I have no confidence in Microsoft’s OpenXML specification to deliver a vibrant, competitive and healthy market of multiple implementations. I don’t believe that the specifications are good enough, nor that Microsoft will hold itself to the specification when it does not suit the company to do so. There is currently one implementation of the specification, and as far as I’m aware, Microsoft hasn’t even certified that their own Office12 completely implements OpenXML, or that OpenXML completely defines Office12’s behavior. The Open Document Format (ODF) specification is a much better, much cleaner and widely implemented specification that is already a global standard. I would invite Microsoft to participate in the OASIS Open Document Format working group, and to ensure that the existing import and export filters for Office12 to Open Document Format are improved and available as a standard option. Microsoft is already, I think, a member of OASIS. This would be a far more constructive open standard approach than OpenXML, which is merely a vague codification of current practice by one vendor.

In the past, we have surprised people with announcements of collaboration with companies like Sun, that have at one time or another been hostile to free software. I do believe that companies change their position, as they get new leadership and new management. And we should engage with companies that are committed to the values we hold dear, and disengage if they change their position again. While Sun has yet to fully deliver on its commitments to free software licensing for Java, I believe that commitment is still in place at the top.

I have no objections to working with Microsoft in ways that further the cause of free software, and I don’t rule out any collaboration with them, in the event that they adopt a position of constructive engagement with the free software community. It’s not useful to characterize any company as “intrinsically evil for all time”. But I don’t believe that the intent of the current round of agreements is supportive of free software, and in fact I don’t think it’s particularly in Microsoft’s interests to pursue this agenda either. In time, perhaps, they will come to see things that way too.

My goal is to carry free software forward as far as I can, and then to help others take the baton to carry it further. At Canonical, we believe that we can be successful and also make a huge contribution to that goal. In the Ubuntu community, we believe that the freedom in free software is what’s powerful, not the openness of the code. Our role is not to be the ideologues-in-chief of the movement; our role is to deliver the benefits of that freedom to the widest possible audience. We recognize the value in “good now to get perfect later” (today we require free apps, tomorrow free drivers too, and someday free firmware to be part of the default Ubuntu configuration), but we always act in support of the goals of the free software community as we perceive them. All the deals announced so far strike me as “trinkets in exchange for air kisses”. Mua mua. No thanks.

One of the tough choices VCS designers make is “what do we REALLY care about”. If you can eliminate some use cases, you can make the tool better for the other use cases. So, for example, the Git guys choose not to care too much about annotate. By design, annotate is slow on Git, because by letting go of that they get it to be super-fast in the use cases they care about. And that’s a very reasonable position to take.

My focus today is lossiness, and I’m making the case for starting out a project using tools which are lossless, rather than tools which discard useful information in the name of achieving performance that’s only necessary for the very largest projects.

It’s a bit like saying “shoot your pictures in RAW format, because you can always convert to JPEG and downscale resolution for Flickr, but you can’t always get your top-quality images back from a low-res JPEG”.

When you choose a starting VCS, know that you are not making your final choice of tools. Projects that started with CVS have moved to SVN and then to Bitkeeper and then to something else. Converting is often a painful process, sometimes so painful that people opt to throw away history rather than try and convert properly. We’ll see new generations of tools over the next decade, and the capability of machines and the network will change, so of course your optimal choice of tools will change accordingly.

Initially, projects do best if they choose a tool which makes it as easy as possible to migrate to another tool. Migrating is a little bit like converting from JPEG to PNG, or PNG to GIF. Or PNG to JPEG2000. You really want to be in the situation where your current format has as much of the detail as possible, so that your conversion can be as clean and as comprehensive as possible. Of course, that comes at a price, typically in performance. If you shoot in RAW, you get fewer frames on a memory stick. So you have to ask yourself “will this bite me?”. And it turns out that, for 99% of photographers, you can get SO MANY photos on a 1GB memory stick, even in RAW mode, that the slower performance is worth trading for the higher quality. The only professional photographers I know who shoot in JPEG are the guys who shoot 3-4000 pictures in an event, and publish them instantly to the web, with no emphasis on image quality because they are not the sort of pics anyone will blow up as a poster.

What’s the coding equivalent?

Well, you are starting a free software project. You will have somewhere between 50 and 500 files in your project initially, and it will take a while before you have more than 5,000 files. During that time, you need performance to be good enough. And you want to make sure that, if you need to migrate, you have captured as much of your history in detail as you can, so that your conversion can be as easy, and as rich and complete, as possible.

I’ve watched people try to convert CVS to SVN, and it’s a nightmare, because CVS never recorded details that SVN needs, such as which file-specific changes are a consistent set. It’s all interpolation, guesswork, voodoo and ultimately painful work that results often enough in people capitulating, throwing history away and just doing a fresh start in SVN. What a shame.

The Bazaar guys, I think, thought about this a lot. It’s another reason the perfect rename tracking is so important. You can convert a Bazaar tree to Git trivially, whenever you want to, if you need to scale past 10,000 files up to 100,000 files with blazing performance. In the process, you’ll lose the renaming information. But going the other way is not so simple, because Git never recorded that information in the first place. You need interpolation and an unfortunate goat under a full moon, and even then there’s no guarantee. You chose a lossy tool, you lost the renaming data as you used it, you can’t get that data back.
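For a feel of what that interpolation looks like, here is a minimal sketch that tries to recover renames after the fact by comparing the content of files that disappeared with files that appeared. Everything in it is an assumption for illustration: the `guess_renames` name, the representation of a tree as a dict of path to content, and the 0.6 similarity threshold. It is not the algorithm any specific tool uses, only the general shape of similarity-based guessing.

```python
from difflib import SequenceMatcher


def guess_renames(old_tree, new_tree, threshold=0.6):
    """Guess renames between two snapshots (dicts mapping path -> text content).

    A path that vanished is paired with the most similar path that appeared,
    provided the similarity ratio reaches `threshold`. Returns {old_path: new_path}.
    """
    deleted = {p: c for p, c in old_tree.items() if p not in new_tree}
    added = {p: c for p, c in new_tree.items() if p not in old_tree}
    renames = {}
    for old_path, old_content in sorted(deleted.items()):
        best_path, best_score = None, threshold
        for new_path, new_content in sorted(added.items()):
            if new_path in renames.values():
                continue  # already claimed by an earlier guess
            score = SequenceMatcher(None, old_content, new_content).ratio()
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames[old_path] = best_path
    return renames
```

If a file was renamed and heavily edited in the same step, its similarity can fall below any sensible threshold and the rename is simply not found; if two files are near-identical, the guess may pair the wrong ones. A tool that recorded the rename when it happened has neither problem.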

Now, performance is important, but “good enough performance” is the threshold we should aim for in order to get as much out of other use cases as possible. If my tool is lossless, and still gives me a “status” in less than a heartbeat, which Bazaar does up to about 7,000 files, then I have perfectly adequate performance and perfectly lossless recording. If my project grows to the point where Bazaar’s performance is not good enough, I can convert to any of the other systems and lose ONLY the data that I choose to lose in my selection of a new tool. And perhaps, by then, Git has gained perfect renaming support, so I can get perfect renaming AND blazing performance. But I made the smart choice by starting in RAW mode.

Now, there are projects out there for which the optimisations and tradeoffs made for Git are necessary. If you want to see what those tradeoffs are, watch Linus describe Git here. But the projects which immediately need to make those tradeoffs are quite unusual – they are not multiplatform, they need extraordinary performance from the beginning, and they are willing to lose renaming data and have slow annotate in order to achieve that. X, OpenSolaris, the Linux kernel… those are hardly representative of the typical free software project.

Those projects, though, are also the folks who’ve spoken loudest about version control, because they have the scale and resources to do detailed assessments. But we should recognise that their findings are filtered through the unique lenses of their own constraints, and not let that perspective colour the decision for a project that does not operate under those constraints.

What’s good enough performance? Well, I like to think in terms of “heartbeat time”. If the major operations which I have to do regularly (several times in an hour) take less than a heartbeat, then I don’t ever feel like I’m waiting. Things which happen 3-5 times in a day can take a bit longer, up to a minute, and those fit with regular work breaks that I would take anyhow to clear my head for the next phase of work, or rest my aching fingers.

In summary – I think new and smaller (<10,000 files) projects should care more about correctness, completeness and experience in their choice of VCS tools. Performance is important, but perfectly adequate if it takes less than a heartbeat to do the things you do regularly while working on your code. Until you really have to lose them, don’t discard the ability to work across multiple platforms (lots of free software projects have more users on Windows than on Linux), don’t discard perfect renames, don’t opt for “lossy over lossless” just because another project, which might be awesomely cool but has totally different requirements from yours, did so.

Further thoughts on version control

Monday, June 11th, 2007

I’ve had quite a lot of positive email feedback on my posting on renaming as the killer app of distributed version control. So I thought it would be interesting to delve into this subject in more detail. I’ll blog over the next couple of months, starting tomorrow, about the things I think we need from this set of tools – whether they be Git, Darcs, Mercurial, Monotone or Bazaar.

First, to clear something up: Ubuntu selected Bazaar based on our assessment of what’s needed to build a great VCS for the free software community. Because of our work with Ubuntu, we know that what is important is the full spectrum of projects, not just the kernel, or X, or OpenOffice. It’s small and large projects, Linux and Windows projects, C and Python projects, Perl and Scheme projects… the best tools for us are the ones that work well across a broad range of projects, even if those are not the ones that are optimal for a particular project (in the way that Git works brilliantly for the kernel, because its optimisations suit that use case well: it’s a single-platform, single-workflow, super-optimised approach).

I’ve reviewed our choice of Bazaar in Ubuntu a couple of times, when projects like OpenSolaris and X made other choices, and in each case I’ve been satisfied that it’s still the best project for our needs. But we’re not tied to it; we could move to a different one. Canonical has no commercial interest in Bazaar (it’s ALL GPL software) and no cunning secret plans to launch a proprietary VCS based on it. We integrated Bazaar into Launchpad because Bazaar was our preferred VCS, but Bazaar could just as well be integrated into SourceForge and Collab since it’s free code.

So, what I’m articulating here is a set of values and principles – the things we find important and the rationale for our decisions – rather than a ra-ra for a particular tool. Bazaar itself doesn’t meet all of my requirements, but right now it’s the closest tool for the full spectrum of work we do.

Tomorrow, I’ll start with some commentary on why “lossless” tools are a better starting point than lossy tools, for projects that have that luxury.