Renaming is the killer app of distributed version control
June 6th, 2007 · in Free software · 34 CommentsThe number one thing I want from a distributed version control system is robust renaming. Why is that? Because without a rigorous approach to renaming that guarantees perfect results, I’m nervous to merge from someone I don’t know. And merging from “people you don’t know” is the real thing that distributed version control gives which you cannot get from centralized systems like CVS and Subversion.
Distributed version control is all about empowering your community, and the people who might join your community. You want newcomers to get stuck in and make the changes they think make sense. It’s the difference between having blessed editors for an encyclopedia (in the source code sense we call them “committers”) and the wiki approach, which welcomes new contributors who might just have a very small fix or suggestion. And perhaps more importantly, who might be willing to spend time on cleaning up and reshaping the layout of your wiki so that it’s more accessible and understandable for other hackers.
The key is to lower the barrier to entry. You don’t want to have to dump a whole lot of rules to new contributors like “never rename directories a, b and c because you will break other people and we will be upset”. You want those new contributors to have complete freedom, and then you want to be able to merge, review changes, and commit if you like them. If merging from someone might drop you into a nightmare of renaming fixups, you will be resistant to it, and your community will not be as widely empowered.
So, try this in your favorite distributed VCS:
- Make two branches of your favorite upstream. In Bzr, you can find some projects to branch in the project cloud.
- In one branch, pretend to be a new contributor, cleaning up the build system. Rearrange some directories to make better sense (and almost every large free software project can benefit from this, there’s a LOT of cruft that’s crept in over the years… the bigger the project, the bigger the need).
- Now, in the second branch, merge from the branch where you did that renaming. Some systems will fail, but most will actually handle this easy case cleanly.
- Go back to the first branch. Add a bunch of useful files to the repo in the directories you renamed. Or make a third branch, and the files to the directories there.
- Now, merge in from that branch.
- Keep playing with this. Sooner or later, if you are not using a system like Bzr which treats renames as a first class operation… Oops.
Now, this is not a contrived example, it’s actually a perfect study of what we HOPE will happen as distributed version control is more widely adopted. If I look at the biggest free software projects, the thing they all have in common is crufty tree structures (directory layouts) and build systems. This is partly a result of never having had tools which really supported renaming, in a way which Would Not Break. And this is one of the major reasons why it takes 8 hours to build something like OpenOffice, and why so few people have the stomach to step up and contribute to a project like that.
The exact details of what it takes to break the renaming support of many DVCS’s vary from implementation to implementation. But by far the most robust of them is Bzr at the moment, which is why we make such heavy use of it at Ubuntu. Many of the other systems have just waved past the renaming problem, saying it’s “not essential” and that heuristics and guesstimates are sufficient. I disagree. And I think the more projects really start to play with these tools, the more they will appreciate renaming is the critical feature that needs to Just Work. I’ll gladly accept the extra 0.3 seconds it takes Bzr to give me a tree status in my 5,100 file project, for the security of knowing I never ever have to spend long periods of time sorting out a merge by hand when stuff got renamed. It still comes back in less than a second. Which is plenty fast enough for me. Even though I know it will get faster, that extra performance is not nearly as important to me as the overall time saved by the robustness of the tool in the face of a constant barrage of improvements by new contributors.