Merging is the key to software developer collaboration
Tuesday, June 19th, 2007
Continuing my discussion of version control tools, I’ll focus today on the importance of the merge capability of the tool.
The “time to branch” is far less important than the “time to merge”. Why? Because merging is the act of collaboration – it’s when one developer sits down to integrate someone else’s work with their own. We must keep the cost of merging as low as possible if we want to encourage people to collaborate as much as possible. If a merge is awkward, or slow, or results in lots of conflicts, or breaks when people have renamed files and directories, then I’m likely to avoid merging early and merging often. And that just makes it even harder to merge later.
The beauty of distributed version control comes in the form of spontaneous team formation, as people with a common interest in a bug or feature start to work on it, bouncing that work between them by publishing branches and merging from one another. These teams form more easily when the cost of branching and merging is lowered, and taking this to the extreme suggests that it’s very worthwhile investing in the merge experience for developers.
In CVS and SVN, the “time to branch” is low, but merging itself is almost always a painful process. Worse, merging a second time from another branch is even more painful than the first, so the incentives for developers to merge regularly are exactly the wrong way around. For merge to be a smooth experience, the tools need to keep track of what has been merged before, so that you never end up redoing work that you’ve already done. Bzr and Git both handle this pretty well, remembering which revisions in someone else’s branch you have already integrated into yours, and making sure that you don’t need to bother to do it again.
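To make the idea of “remembering what has been merged” concrete, here is a minimal sketch in Python. It is not how Bzr or Git are implemented; it simply assumes that history is a graph where each revision lists its parents, and that a merge revision records both sides as parents. The revision names and the helper functions are made up for illustration.

```python
def ancestors(graph, tip):
    """Return every revision reachable from tip, including tip itself."""
    seen = set()
    stack = [tip]
    while stack:
        rev = stack.pop()
        if rev in seen:
            continue
        seen.add(rev)
        stack.extend(graph.get(rev, ()))
    return seen


def revisions_to_merge(graph, mine, theirs):
    """Revisions in their history that are not yet part of mine."""
    return ancestors(graph, theirs) - ancestors(graph, mine)


# Hypothetical history: each revision maps to its list of parents.
graph = {
    "base": [],
    "a1": ["base"],        # my work
    "b1": ["base"],        # their work
    "m1": ["a1", "b1"],    # my first merge of their branch
    "b2": ["b1"],          # they carry on after my merge
}

# Before the first merge, b1 is new to me.
first = revisions_to_merge(graph, "a1", "b1")

# Merging again later only brings in b2, because m1 already records b1.
second = revisions_to_merge(graph, "m1", "b2")
```

Because the merge revision `m1` keeps `b1` as a parent, the second merge only has to consider `b2`. A tool that forgets this has to treat `b1` as new all over again, which is where the repeated conflicts come from.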
When we encourage people to “do their own thing” with version control, we must also match that independence with tools to facilitate collaboration.
Now, what makes for a great merge experience?
Here are a few points:
- Speed of the merge, or time it will take to figure out what’s changed, and do a sane job of applying those changes to your working tree. Git is the undisputed champion of merge speed. Anything less than a minute is fine.
- Handling of renames, especially renamed directories. If you merge from someone who has modified a file, and you have renamed (and possibly modified) the same file, then you want their change to be applied to the file in your working tree under the name YOU have given it. It is particularly important, I think, to handle directory renames as a first class operation, because this gives you complete freedom to reshape the tree without worrying about messing up other people’s merges. Bzr does this perfectly – even if you have subsequently created a file with the same name that the modified file USED to have, it will correctly apply the change to the file you moved to the new name.
- Quality of merge algorithm. This is the hardest thing to “benchmark” because it can be hugely subjective. Some merge algorithms take advantage of annotation data, for example, to minimise the number of conflicts generated during a merge. In my experience Bzr is fantastic in merge quality, with very few cases of “stupid” conflicts even when branches are being bounced around between ad-hoc squads of developers. I don’t have enough experience of merging with tools like Darcs, which have unusual characteristics and potentially higher-quality merges (albeit with lots of opportunity for unexpected outcomes).
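The rename case in the second point is easier to see with a small example. The following is a rough Python sketch, not Bzr’s actual code: it assumes every file carries a stable identifier that is independent of its path, so an incoming change can be matched by identifier rather than by name. The identifiers, paths and the `apply_change` helper are all hypothetical.

```python
def apply_change(tree, file_id, new_content):
    """Apply an incoming content change to whichever path file_id has now."""
    path, _old_content = tree[file_id]
    updated = dict(tree)
    updated[file_id] = (path, new_content)
    return updated


# My tree: file_id -> (path, content).
# I renamed the original README into docs/, then created a brand new README.
my_tree = {
    "id-1": ("docs/README", "hello"),
    "id-2": ("README", "new file"),
}

# Their change was made to id-1 while it was still called README.
merged = apply_change(my_tree, "id-1", "hello world")

# View the result by path to see where the change landed.
by_path = {path: content for path, content in merged.values()}
```

Matching by path alone would have applied their change to my new `README`. Matching by identifier puts it in `docs/README`, under the name I gave the file, and leaves the new file untouched.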
I like the fact that the Bazaar developers made merging a first-class operation from the start. Rather than saying “we have a few shell scripts that will help you with that”, they focused on techniques to reduce the time that developers spend fixing up merges. A clean merge that takes 10 seconds longer to do saves me a huge amount of time compared to a dirty (conflict-ridden, or rename-busted) merge that happened a few seconds faster.
Linus is also a very strong advocate of merge quality. For projects which really want as much participation as possible, merge quality is a key part of the developer experience. You want ANYBODY to feel empowered to publish their contribution, and you want ANYBODY to be willing to pull those changes into their branches with confidence that (a) nothing will break and (b) they can revert the merge quickly, with a single command.