The number one thing I want from a distributed version control system is robust renaming. Why is that? Because without a rigorous approach to renaming that guarantees perfect results, I’m nervous to merge from someone I don’t know. And merging from “people you don’t know” is the real thing that distributed version control gives which you cannot get from centralized systems like CVS and Subversion.

Distributed version control is all about empowering your community, and the people who might join your community. You want newcomers to get stuck in and make the changes they think make sense. It’s the difference between having blessed editors for an encyclopedia (in the source code sense we call them “committers”) and the wiki approach, which welcomes new contributors who might just have a very small fix or suggestion. And perhaps more importantly, who might be willing to spend time on cleaning up and reshaping the layout of your wiki so that it’s more accessible and understandable for other hackers.

The key is to lower the barrier to entry. You don’t want to have to dump a whole lot of rules on new contributors like “never rename directories a, b and c because you will break other people and we will be upset”. You want those new contributors to have complete freedom, and then you want to be able to merge, review changes, and commit if you like them. If merging from someone might drop you into a nightmare of renaming fixups, you will be resistant to it, and your community will not be as widely empowered.

So, try this in your favorite distributed VCS:

  1. Make two branches of your favorite upstream. In Bzr, you can find some projects to branch in the project cloud.
  2. In one branch, pretend to be a new contributor, cleaning up the build system. Rearrange some directories to make better sense (and almost every large free software project can benefit from this, there’s a LOT of cruft that’s crept in over the years… the bigger the project, the bigger the need).
  3. Now, in the second branch, merge from the branch where you did that renaming. Some systems will fail, but most will actually handle this easy case cleanly.
  4. Go back to the first branch. Add a bunch of useful files to the repo in the directories you renamed. Or make a third branch, and add the files to the directories there.
  5. Now, merge in from that branch.
  6. Keep playing with this. Sooner or later, if you are not using a system like Bzr which treats renames as a first class operation… Oops.
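The experiment above can be modelled in a few lines. This is a toy sketch, not how Bzr or any other tool is implemented: it assumes a tree is a map from a stable id to a (parent id, name) pair, and compares a three-way merge keyed on those ids with one keyed on bare path strings and no rename detection at all. Every name here (`merge_by_id`, `merge_by_path`, the ids `d1`, `f1`, `f2`) is made up for illustration.

```python
ROOT = "root"  # hypothetical marker for the top of the tree


def paths(tree):
    """Resolve every id in an id-keyed tree to its full path."""
    def full(entry_id):
        parent, name = tree[entry_id]
        return name if parent == ROOT else full(parent) + "/" + name
    return {entry_id: full(entry_id) for entry_id in tree}


def merge_by_id(base, ours, theirs):
    """Three-way merge keyed on ids: keep whichever side changed an entry."""
    merged = {}
    for entry_id in set(ours) | set(theirs):
        b, o, t = base.get(entry_id), ours.get(entry_id), theirs.get(entry_id)
        if o == t or t == b:
            chosen = o  # both sides agree, or only our side changed it
        elif o == b:
            chosen = t  # only their side changed it
        else:
            raise ValueError(f"conflicting changes to {entry_id}")
        if chosen is not None:
            merged[entry_id] = chosen
    return merged


def merge_by_path(base, ours, theirs):
    """Three-way merge keyed on file paths, with no rename detection."""
    added = (ours - base) | (theirs - base)
    removed = (base - ours) | (base - theirs)
    return (base - removed) | added


# Id-keyed trees: the newcomer renames src to lib, someone else adds src/new.c.
base = {"d1": (ROOT, "src"), "f1": ("d1", "main.c")}
ours = {"d1": (ROOT, "lib"), "f1": ("d1", "main.c")}
theirs = {"d1": (ROOT, "src"), "f1": ("d1", "main.c"), "f2": ("d1", "new.c")}
id_result = sorted(paths(merge_by_id(base, ours, theirs)).values())

# The same history seen only as file paths.
path_result = sorted(merge_by_path(
    {"src/main.c"}, {"lib/main.c"}, {"src/main.c", "src/new.c"}))

print(id_result)    # ['lib', 'lib/main.c', 'lib/new.c']
print(path_result)  # ['lib/main.c', 'src/new.c']
```

In the id-keyed merge the new file follows the directory to its new name. In the path-keyed merge the old directory reappears holding only the new file, which is the kind of fixup the steps above are meant to expose.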

Now, this is not a contrived example, it’s actually a perfect study of what we HOPE will happen as distributed version control is more widely adopted. If I look at the biggest free software projects, the thing they all have in common is crufty tree structures (directory layouts) and build systems. This is partly a result of never having had tools which really supported renaming, in a way which Would Not Break. And this is one of the major reasons why it takes 8 hours to build something like OpenOffice, and why so few people have the stomach to step up and contribute to a project like that.

The exact details of what it takes to break the renaming support of many DVCS’s vary from implementation to implementation. But by far the most robust of them is Bzr at the moment, which is why we make such heavy use of it at Ubuntu. Many of the other systems have just waved past the renaming problem, saying it’s “not essential” and that heuristics and guesstimates are sufficient. I disagree. And I think the more projects really start to play with these tools, the more they will appreciate renaming is the critical feature that needs to Just Work. I’ll gladly accept the extra 0.3 seconds it takes Bzr to give me a tree status in my 5,100 file project, for the security of knowing I never ever have to spend long periods of time sorting out a merge by hand when stuff got renamed. It still comes back in less than a second. Which is plenty fast enough for me. Even though I know it will get faster, that extra performance is not nearly as important to me as the overall time saved by the robustness of the tool in the face of a constant barrage of improvements by new contributors.

40 Responses to “Renaming is the killer app of distributed version control”

  1. martin Says:

    are you sure darcs doesn’t do renames and moves as good as bzr?
    Ok, darcs might be slow for very large projects. But it’s got very good patch merging and cherry picking. But maybe you are just comparing bzr with some versioned “file system” with added merge tools…

    Mark Shuttleworth says:

    Yes, I think Darcs is pioneering some very interesting ideas. I don’t know if it’s as rigorous as Bzr on the renaming front.

  2. Zeno Davatz Says:

    Why do you not just use GIT? The Kernel uses GIT, we use GIT, it is the best SCM you can get these days. Just watch Linus tell the people at Google: http://www.youtube.com/watch?v=4XpnKHJAok8

    And you will understand.

    Mark Shuttleworth says:

    GIT is an excellent system that does a very specific job. It does not track renames as a first class operation, but does a reasonable job of guessing in simple cases. However, if you want to do real reshaping of your tree, that guessing can hit limits quickly. Git is very good in projects like the kernel which don’t rename stuff and need the speed, and also are UNIX-specific (Git is quite difficult to get to work on Windows, as I understand it).

  3. jonner Says:

    Would you care to explain what issues other DVCS’s have with regard to renaming? e.g. git, mercurial, etc?

  4. Kyle Cordes Says:

    Linus Torvalds explains distributed source control…

    On several occasions over the last year, I’ve pointed out that distributed source control tools are dramatically better than centralized tools. It’s quite hard for me to explain why. This is probably because of sloppy and incomplete thinkin…

  5. Ian Bicking Says:

    Well, to be accurate about the wiki model, that would be one with a single centralized repository where everyone has commit access. It’s hard to imagine this working for software, but the benefits for content could translate to code as well. It’s an environment of continuous integration, and one where social conflicts cannot be easily deferred — because you are working on a single set of data, you cannot go in different directions and then attempt to merge later. This way differences don’t accumulate, and it becomes clear quite quickly what your investment in a community will mean.

    Continuous integration isn’t the only way to achieve these goals, but the lack of technical prowess in wiki implementations is also something that defines the way their communities work, and branching is something missing from every wiki software I’ve seen.

  6. Maciej Bliziński Says:

    Mark,

    I did what you suggested. Instead of taking real projects, I’ve written a simple test case in the form of a bash script. It simulates two developers, one renaming a directory and the second adding a file to it. I’ve tested it with Bazaar, Git and Subversion. You were right, only Bazaar handled it in the way we would expect an SCM to behave. Git didn’t move the new file to the new directory, so the merge ended up with two separate directories. Subversion was worse, discarding the added file altogether. Details and the scripts themselves are available on my blog:

    http://automatthias.wordpress.com/2007/06/07/directory-renaming-in-scm/

    Regards,
    Maciej Bliziński

  7. Richard Heycock Says:

    Just to clarify that darcs treats renames as part of a patch which is a first class object. I’ve been using darcs for a while now without any problems with respect to renames, whether they be files or directories. From the darcs manual :

    A patch describes a change to the tree. It could be either a primitive patch (such as a file add/remove, a directory rename, or a hunk replacement within a file), or a composite patch describing many such changes.

  8. Chui Tey Says:

    Keep taking this to the n-th degree and you’ll soon find yourself making functions and classes first class objects that are tracked by version control. Perhaps this will come full circle, leading back to Lisp or Smalltalk style images.

  9. tecosystems » links for 2007-06-07 Says:

    […] Mark Shuttleworth » Blog Archive » Renaming is the killer app of distributed version control Mark points to renaming as the killer app; hadn’t heard that one before (tags: renaming DSCM Subversion CVS Bzr Bazaar) […]

  10. Noah Tye Says:

    What is wrong with mercurial’s renaming?

    http://hgbook.red-bean.com/hgbookch5.html#x9-1010005.4

    Mark Shuttleworth says:

    Nothing “wrong”, just not implemented as rigorously as Bzr. The Hg team, like the Git team, have just decided that renaming is not important enough that it needs to be tracked explicitly. So when you rename a file using Hg, it is internally represented as a delete and an add. Later, when you merge across branches where this happens, Hg does a (very good) job of guessing what to do. But that guessing process, while it handles obvious cases well, is likely to break down as directories, and subdirectories, get moved around. It’s easy to show it break, but it’s not my aim here to actually demonstrate a failing in any other project. And any individual use case can be fixed with better guessing – the problem is that on big projects, over time, you will see the use cases get increasingly baroque.

    Here’s an example of what you see when you rename a directory in Hg:

    mark@peregrine:/tmp/vctest/linux-source-2.6.20-2.6.20$ hg mv ipc foo
    copying ipc/Makefile to foo/Makefile
    copying ipc/compat.c to foo/compat.c
    copying ipc/compat_mq.c to foo/compat_mq.c
    copying ipc/mqueue.c to foo/mqueue.c
    copying ipc/msg.c to foo/msg.c
    copying ipc/msgutil.c to foo/msgutil.c
    copying ipc/sem.c to foo/sem.c
    copying ipc/shm.c to foo/shm.c
    copying ipc/util.c to foo/util.c
    copying ipc/util.h to foo/util.h
    removing ipc/Makefile
    removing ipc/compat.c
    removing ipc/compat_mq.c
    removing ipc/mqueue.c
    removing ipc/msg.c
    removing ipc/msgutil.c
    removing ipc/sem.c
    removing ipc/shm.c
    removing ipc/util.c
    removing ipc/util.h

    By contrast, here’s the rename in Bzr:

    mark@peregrine:/tmp/vctest/linux-source-2.6.20-2.6.20$ bzr mv foo ipc
    foo => ipc
    mark@peregrine:/tmp/vctest/linux-source-2.6.20-2.6.20$ bzr st
    renamed:
    foo => ipc

    So, to be clear, I don’t think there’s anything wrong with Hg, it’s a great tool that is best for certain use cases, but I also don’t think it handles renames in a way that is healthy for projects which want to do real surgery on the shape of their tree. And on this import of the Linux kernel tree, on my laptop, trunk-Bzr does “status” in 1.4 seconds, while Hg 0.9.3 (not the latest trunk which is probably faster) does it in 1.3 seconds. Granted, commit is much faster with Hg currently, but the Bzr team have not done any optimisation on commit, they are focused on status because that’s what everyone does all the time. And 0.1 seconds is not worth the lossiness of guesstimated renames. This is 23,000 files.
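The difference between the two transcripts above can be sketched as data. This is an illustrative model only, assuming a changeset is a plain list of operations; it does not reproduce the storage format of either Hg or Bzr, and the function names are hypothetical.

```python
def per_file_rename(files, old, new):
    """Model a directory rename as one copy and one remove per file under it."""
    moved = sorted(f for f in files if f.startswith(old + "/"))
    copies = [("copy", f, new + f[len(old):]) for f in moved]
    removes = [("remove", f) for f in moved]
    return copies + removes


def directory_rename(old, new):
    """Model the same rename as a single record against the directory itself."""
    return [("rename", old, new)]


# A small stand-in for the kernel tree used in the transcripts.
files = ["ipc/Makefile", "ipc/msg.c", "ipc/sem.c", "kernel/fork.c"]

per_file = per_file_rename(files, "ipc", "foo")
single = directory_rename("ipc", "foo")

for op in per_file:
    print(op)
print(single)
```

The per-file list grows with the number of files under the directory, while the directory-level record stays a single entry however large the directory is. A file added to `ipc` on another branch has no entry in the per-file list, so a merge has to infer where it belongs.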

  11. Kieran Says:

    Check out this good tech talk about Mercurial,

    http://video.google.com/videoplay?docid=-7724296011317502612

    It does seem very nice, although I have not tried GIT or Bzr.

    The comments in this blog post offer some comparison

    http://blog.mwolson.org/tech/why_i_dislike_subversion.html

  12. Piers Says:

    In honour of Mark’s favourite word (as evidenced by this post) I would humbly like to suggest the new name of the “post Gutsy” Ubuntu be:

    Ubuntu 8.04: The Crufty Cow!

    Oh wait… “C” comes before “G”…oh, and no one actually knows what crufty means… 😉

    Love your work Mark. (K)ubuntu rocks… and is cruft-free ‘n’ all.

  13. tonfa Says:

    Wow, your reply about mercurial was a bit FUDish. Mercurial does a remove and an add, but it stores the pointer to the last version of the previous file during the add. So there is no information lost.

    In bzr or hg, you have two things: the fileobject (doesn’t change across rename) and the filename (changes across rename). Both applications keep this information, but the difference is in the implementation. The primary storage of a fileobject in hg depends on the filename (with a pointer to the previous fileobject if the previous fileobject doesn’t have the same filename, that is the couple (filename, revision)). In bzr the fileobject is stored directly on disk.
    All of this was done in hg for the sake of locality: if you want to store the fileobject, you will use something like a hash derived from the filename, which breaks locality.
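A rough model of the storage scheme described here, assuming each filename owns a list of revisions and the first revision after a rename carries a pointer back to a (filename, revision) pair. The `store` layout and the `history` helper are invented for illustration and are not Mercurial's actual on-disk format.

```python
def history(store, name):
    """Walk a file's revisions newest first, following the copy pointer
    across renames."""
    entries = []
    rev = len(store[name]) - 1
    while name is not None:
        content, copied_from = store[name][rev]
        entries.append((name, rev, content))
        if rev > 0:
            rev -= 1                 # older revision under the same filename
        elif copied_from is not None:
            name, rev = copied_from  # cross the rename to the previous filename
        else:
            name = None              # reached the file's very first revision
    return entries


# Each revision is (content, copied_from); copied_from is (filename, revision)
# or None. Here b starts life as a copy of a at revision 1.
store = {
    "a": [("v1", None), ("v2", None)],
    "b": [("v2", ("a", 1)), ("v3", None)],
}

trail = history(store, "b")
print(trail)
```

Because the pointer names both the old filename and the revision, the walk recovers the full history of `b` back through `a`, which is the sense in which no information is lost.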

    tonfa, who is disappointed by your comment given that Matt explained it to you in the sprint in London.

    (btw what you could actually use for your pro-bzr marketing is the fact that bzr tracks directories and hg does not, so hg uses heuristics to see if a directory was renamed. Anyway the heuristic is just slower than tracking directories, it shouldn’t give wrong results)

  14. DSCM Mercurial or Bazaar? « Abstract Simplicity Says:

    […] DSCM Mercurial or Bazaar? I just recently switched over to using Mercurial for all my home scm needs. I figured that the Mercurial team had solved the rename problem by now. However, I just noticed Mark Shuttleworth’s entry on the topic. It seems Bazaar could still be in the running as the DSCM of choice. However with Sun choosing it for OpenSolaris and now OpenJDK it kinda becomes inevitable for Java developers to go with Mercurial. After all, Mercurial has Eclipse and Netbeans plugins. But then again, I love to restructure my tree… Then again I don’t merge with anyone right now ;). […]

  15. cate Says:

    I understand that renaming is important, but I don’t think it should be “a first class feature”. People rely on files and project structure. If renaming is done without a previous discussion, people are lost, bug reports are difficult to track and google will help less. IMHO the important files (which are modified regularly) should not be renamed (or only rarely, and with some coordination). So it is not a priority to have good patch handling on renamed files.

  16. John Meinel Says:

    One place that I’m aware of Mercurial’s rename support doing different things is if you have 2 people rename the same file in 2 different branches.

    Specifically:
    % hg init branch1
    % cd branch1
    % echo a > a
    % hg add a
    % hg commit -m a
    % cd ..
    % hg clone branch1 branch2
    1 files updated, 0 files merged, 0 files removed, 0 files unresolved
    % cd branch1
    % hg mv a b
    % hg commit -m "move a=>b"
    % cd ../branch2
    % hg mv a c
    % hg commit -m "move a=>c"
    % cd ../branch1
    % hg pull -u ../branch2
    pulling from ../branch2
    searching for changes
    adding changesets
    adding manifests
    adding file changes
    added 1 changesets with 1 changes to 1 files (+1 heads)
    not updating, since new heads added
    (run 'hg heads' to see heads, 'hg merge' to merge)
    % hg merge
    1 files updated, 0 files merged, 0 files removed, 0 files unresolved
    (branch merge, don’t forget to commit)
    % hg status
    M c
    % ls
    b c

    There are a few things I personally notice here.

    1) The file ‘c’ is marked as modified, not added. I really don’t know why.
    2) The file ‘c’ isn’t marked as conflicting with the rename of ‘a=>b’. So now you have 2 copies of ‘a’ floating around. This isn’t a huge thing, but it does mean that if two people are trying to clean things up, and do so slightly differently, when they merge back together, they may get unexpected results.

    Also, because of copy semantics, if someone modifies ‘a’ in a third branch, and you merge that into branch1, it will update ‘b’ and ‘c’. Which is logical from a “copy+delete” point of view. But it may not fit what people were thinking when they typed “hg mv a b”.
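The copy semantics described above can be illustrated with a toy model. It assumes each file remembers the name it was copied from, and that an incoming edit to the original name is applied to every surviving descendant. `apply_upstream_edit` and the `origin` map are hypothetical names, and this is not Mercurial's merge code.

```python
def apply_upstream_edit(tree, origin, edited_path, new_content):
    """Apply an edit made to edited_path elsewhere to every local file that
    is that path or was recorded as a copy of it. Returns the paths touched."""
    targets = sorted(
        path for path in tree
        if path == edited_path or origin.get(path) == edited_path
    )
    for path in targets:
        tree[path] = new_content
    return targets


# State of branch1 after the merge in the transcript: a is gone, b and c remain.
tree = {"b": "a\n", "c": "a\n"}
origin = {"b": "a", "c": "a"}  # both were recorded as coming from a

touched = apply_upstream_edit(tree, origin, "a", "a, edited in a third branch\n")
print(touched)  # ['b', 'c']
```

Under this model one upstream edit lands in two files, which is consistent with a "copy+delete" view of the history even though each developer typed a single `mv`.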

  17. Brandon Casey Says:

    Would bzr be able to handle splitting a file into two different files? How about merging?

    1) Source repo contains file.
    2) I split the file into two files in my repo.
    3) File is modified in source repo.
    4) I retrieve changes from source repo.
    Will changes be correctly applied to relevant sections in
    the two new files?

    How about the reverse, where two files become one?
    Will changes to the two files be correctly applied to the
    relevant sections of the merged file?

    -brandon

  18. Version Control: The Future is Adaptive « Agile Teams, Open Software, Passionate Users Says:

    […] use Bazaar this way, it is still more powerful than Subversion and CVS thanks to features such as true rename tracking and intelligent […]

  19. Thomas Arendsen Hein Says:

    Hi John,

    your merge example is discussed in http://www.selenic.com/mercurial/bts/issue455

    Regarding the M instead of A, yes, this looks wrong at a first glance. I’ll discuss it with the others.

  20. ButterFly Says:

    Mercurial in general does not help with:
    * cleanups: this file/folder duplication error caused by renames is just the opposite of “help”
    * clear outputs: running a command does not tell what happened, you have to run another
    * clear command line: compared to darcs there are so many commands
    and options nobody can understand

    A tool should help, not bother and not surprise.

  21. ButterFly Says:

    There seems to be a serious flaw in the original post: “But by far the most robust of them is Bzr at the moment …” is not true at all.

    Darcs has no weak point in renaming, and GIT even tells you where parts of a file came from, which nobody else can do.

  22. ThurnerRupert Says:

    thanks a lot mark – matt extended mercurial’s implementation now to warn about such renames!

  23. Tzvetan Mikov Says:

    This is a very strong point for renaming, but it is not necessarily a universal one.

    Here is one example of the issue: one developer renaming a directory in his branch, and another adding a file to the original directory in his branch. What happens at the merge ?
    – Bazaar renames the directory and puts the new file in the _renamed_ directory.
    – Git renames the directory with its files, but keeps the old directory too and adds the new file there.

    Bazaar’s behavior certainly is better for C. However it is not universally better.

    For example in Java you cannot rename a file without changing its contents. So, moving a file to a directory different from where its author put it will almost certainly break the build.

    The bottom line is, both behaviors can seem valid or broken, depending on the case. Neither is perfect. At the very abstract level file renames are _not_ a first-class operation. This is especially apparent in a language like Java.

    Content movement is the first class operation. Things like moving functions, etc. The question is how one can handle that and whether the current strategy has a path for improvement. It could be argued that once you commit yourself to explicitly tracking file renames, you are giving up a slew of opportunities for handling the more general cases.

    One thing is for certain, a 100% ideal solution is impossible. It would have to be aware of the target programming language _and_ the build environment.
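The Java case can be made concrete with a small check. It assumes the usual convention that a source file's directory under a source root mirrors its `package` declaration. The helper names and the `src` root are hypothetical, and the regular expression is a simplification rather than a real Java parser.

```python
import re
from pathlib import PurePosixPath

# Matches a line such as "package org.example.util;"
PACKAGE_RE = re.compile(r"^\s*package\s+([\w.]+)\s*;", re.MULTILINE)


def declared_package(source):
    """Return the package named in the source text, or '' if there is none."""
    match = PACKAGE_RE.search(source)
    return match.group(1) if match else ""


def package_for_path(path, source_root="src"):
    """Return the package implied by the file's directory under source_root."""
    relative = PurePosixPath(path).relative_to(source_root)
    return ".".join(relative.parent.parts)


def move_breaks_package(source, new_path, source_root="src"):
    """True if the file's contents no longer agree with its new location."""
    return declared_package(source) != package_for_path(new_path, source_root)


source = "package org.example.util;\n\npublic class Foo {}\n"

in_place = move_breaks_package(source, "src/org/example/util/Foo.java")
after_move = move_breaks_package(source, "src/org/example/core/Foo.java")
print(in_place, after_move)  # False True
```

A merge that relocates a file purely by tracking the rename would leave `after_move` true until someone also edits the package line, which is the sense in which the move is not a contents-free operation here.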

  24. Bryan O'Sullivan Says:

    Now now, Mark, no misleading assertions if you please. Mercurial tracks renaming information perfectly. It implements it in a different way than Bazaar, but in fact the technique that it uses is more general than Bazaar’s approach.

    As you know, Bazaar requires a file to have a unique identity. If I rename A to B, and you rename A to C, the only possible outcome when we merge is a conflict that results in either B or C, because Bazaar requires there to be a single file afterwards. However, Mercurial reports the possible conflict to you *and* is perfectly okay if you decide that the appropriate response is to keep either one, *or* to preserve *both* B and C.

    In the book, I mention a bug (455) with this handling which has subsequently been fixed. I’ve not had a chance to update the book with the current behaviour, but I want to note the fact that it’s fixed here.
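The two outcomes for the "A to B here, A to C there" case can be put side by side in a toy model. The first function assumes a file has exactly one identity and therefore one name after a merge. The second assumes renames are recorded as copies, so both names can survive. Both functions and the `RenameConflict` class are hypothetical illustrations, not code from Bazaar or Mercurial.

```python
class RenameConflict(Exception):
    """Raised when both sides gave the same file a different new name."""


def merge_single_identity(base, ours, theirs):
    """Three-way merge of one file's name when the file has a single identity."""
    if ours == theirs:
        return ours
    if ours == base:
        return theirs
    if theirs == base:
        return ours
    raise RenameConflict(f"{base} renamed to both {ours} and {theirs}")


def merge_copy_records(ours, theirs):
    """Merge when each side just lists the names that descend from the base."""
    return ours | theirs


try:
    outcome = merge_single_identity("A", "B", "C")
except RenameConflict as exc:
    outcome = f"conflict: {exc}"

both = sorted(merge_copy_records({"B"}, {"C"}))

print(outcome)  # conflict: A renamed to both B and C
print(both)     # ['B', 'C']
```

Neither function decides which result is right. The single-identity model forces a choice at merge time, while the copy model defers it and leaves two files in the tree.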

    Mark Shuttleworth says:

    Now now Bos, don’t cry FUD when all you’re getting is constructive criticism! I like Hg, but I think it’s important to version directories like files, because I think this gives the result most people actually expect. I wouldn’t call renaming support “perfect” unless it clearly included support for renaming directories. The principle of least surprise is important, and I think Bazaar best reflects that. Giving everyone an A, B, and C option when most people really expect A is cute but ultimately makes the tool harder to use. As for your example, in neither case did either developer ADD a file, in both cases they renamed the same file, so it seems odd to think that ending up with TWO files is an expected result. I don’t think a limitation of a tool should be sold as a feature 😉

  25. brendan Says:

    By the way, in addition to noting that Mercurial 0.9.4 addresses some of the interesting points raised by John Meinel above, I’d like to mention that Mercurial has an inotify extension that reduces the time for status to basically instantaneous (vs the 1.3 seconds mentioned above). Since status is implied by many other commands (commit, diff, etc), this tends to be a very noticeable improvement.

  26. João Marcus Says:

    I happen to prefer Bazaar. However, the multiple nasty “UnicodeDecodeError” messages I get when adding files with non-ascii characters in their names are a major blocker. I simply cannot use Bazaar. Pleeeeeeeeeease fix it!

  27. Thomas Says:

    Mark, I know this may sound a bit biased because I’m working on a related project, but I for myself can state that the rename support in monotone (http://monotone.ca) should perfectly fit your needs. monotone may be a bit slower in certain areas (initial pull, log) which are worked on heavily, but if you just want the feeling that a system actually _cares_ about your data, then it should be the right thing for you. If speed is an issue, you maybe want to look into mercurial or even git. But as one of the project founders of monotone recently stated, “the battle is over” and the most important thing is that DVCS have won _at all_. IMHO everything distributed is better than subversion or even CVS =)

  28. Contributing to projects on Launchpad.net with Bazaar | Muffin Research Labs by Stuart Colville Says:

    […] and renaming of files with ease. (See more about that in Mark Shuttleworth’s article: Renaming is the killer app of distributed version control) this is really important, as you want to have the flexibility to re-structure a project as you […]

  29. Directory renaming in SCM « Maciej Bliziński Says:

    […] Shuttleworth has written an interesting thing: that file and directory renaming is one of the most important operations to be handled with an […]

  30. Hendy Irawan Says:

    Mark I’ve just tried Bazaar on Launchpad and the slowness esp. for pushing (SFTP) is killing me at this time.

    Robert Collins has stated that this is a known problem (expensive roundtrips) and Bazaar developers are working on a better algorithm for 0.92… I really do hope it gets better.

    I haven’t tried the smart server, but I really hope you implement it _at least_ on Launchpad since it’s your flagship Bazaar hosting.

    I think I’m going back to SVN… 🙁

    Mark Shuttleworth says: Right now the performance focus has been on local performance. I think the latest Bzr code is faster than almost anything other than Git for initial commit, and very close to the others on status, incremental commit etc. Some local things like annotate are faster in Bzr than anything else, which is cool, I use that a lot. The 1.0 release should stack up well for local performance. In the next cycle, for 2.0, I believe the high performance smart server (HPSS) branch will land, and then Bzr should be great over a slow link too. That’s not quite as important to me because I tend to work offline then push to the network in the background while I’m doing other online stuff, but I know it’s quite important to other folks and the bzr guys want to make it rock.

  31. Hendy Irawan Says:

    Come on Bazaar, speed up! I’m behind you… 🙂

  32. Mark Shuttleworth on renaming and merging — Version Control Blog — branching and merging, Ideas, SCM features and concepts, Bazaar/bzr, Says:

    […] “Renaming is the killer app of distributed version control”; […]

  33. code, code.back, code.back2… - A better way with Revison Control (svn/git/bzr/hg tutorials & comparisons) « ☠ I could not think of a blog title ☠ Says:

    […] improvements in their changelogs. I haven’t found any videos on Bazaar but there have been three, shuttleworth, posts recently on bazaar as a lossless […]

  34. On choosing Bazaar for DVCS « Handwaving Says:

    […] performance because my projects are small. The feature that broke the tie between the other two is renaming. Bazaar treats renaming as a primitive operation, whereas Mercurial treats it as a copy and delete. […]

  35. wolf++ » Bazaar TuneWiz Launchpad Says:

    […] is not the P2P type repository, it’s that branching and merging are first class citizens (See Renaming is the killer app of distributed version control ).   The three big players I’ve been watching are Git, Hg (Mercurial), and Bazaar.  All of […]

  36. Versioning e renaming » JUG Brescia Says:

    […] Girovagando in rete alla ricerca di informazioni ed articoli in proposito, mi sono imbattuto in queste considerazioni relative al supporto che i diversi sistemi di versioning offrono al renaming dei file. […]

  37. DVCS Try Out. Bazaar, Mercurial and Git | Moomalade Says:

    […] the only one of the three to support directories are first class object and handle rename in a very robust way, which is great. By any standard it’s much slower than Mercurial and Git, but whether this is […]

  38. Why is Git so Fast? - News ums Netz Says:

    […] maybe Git’s shortcut for handling renames is faster than doing them more correctly like Bazaar […]

  39. Bazaar or Git? Moving away from Subversion. « Kash Farooq’s blog Says:

    […] going to try Git. I downloaded binaries, documentation and guides. But then I read this article: Renaming is the killer app of distributed version control. And I agree, renaming is really, really important. We rename files all the time. We […]

  40. DSCM: Mercurial or Bazaar? « steshaw Says:

    […] I figured that the Mercurial team had solved the rename problem by now. However, I just noticed Mark Shuttleworth’s entry on the topic. It seems Bazaar could still be in the running as the DSCM of choice. However with Sun […]