One on the chin

Sunday, August 27th, 2006

Luis Villa is absolutely right in his castigation of our X update on Wednesday this week. As a team we made a series of errors, and the result was a desktop that was broken for thousands of users, for several hours. It has been a severe lesson in QA, something Luis knows plenty about.

An incident report is being compiled by the team and we will publish that for our broader community and users as soon as it is complete. My apologies to those who have been affected. I know that a blue screen of death is the very last thing anybody ever wants to see on Linux desktops and that any downtime caused by mistakes on our part, even measured in minutes, is unacceptable.

In addition to the incident report, we are also putting into production a long-discussed mechanism for widespread testing of non-essential updates (support for new hardware, for example) by users who want advance access to that code, or those who are part of our more sophisticated user community. We know now that no amount of internal testing will find certain issues, even issues which could have a widespread footprint and obvious failures, and the only way to get certainty on the potential impact of a change is to put it out to a wider, but controlled, audience.

If there is a silver lining to the error, it is that it happened during the one week in six months when we have the core distribution development team together in one place. This gave us the opportunity not just to analyse and fix the issue, and to talk about the sequence of events that led to the problem, but also to discuss the processes we must improve to further reduce the likelihood of a repeat. The team is now more aware than ever of the responsibility we assume given the extraordinary rate of adoption of Ubuntu.

My goal is for the team to grow and learn from this experience without becoming paralyzed on future updates. We can’t afford to take risks with our user’s trust, but I balance that with the need to continue to improve the desktop. We WANT to certify new hardware and keep Dapper usable for as much of its five year lifespan as possible. That said, Edgy is the right place to make exciting changes, not Dapper!

Luis, well said.

36 comments:

  1. Mark Shuttleworth: One on the chin « Alberto Milone’s Blog says: (permalink)
    August 27th, 2006 at 1:51 pm

    [...] read more [...]

  2. kilps says: (permalink)
    August 27th, 2006 at 7:20 pm

    Hi Mark – thanks for the openness. While I wasn’t affected, as I have only just switched to Ubuntu pretty much full-time, this kind of post makes me more confident that those behind Ubuntu are the right people.

  3. Luis Villa’s Blog » on blogging in the corporate open source context says: (permalink)
    August 27th, 2006 at 7:43 pm

    [...] Since then, Tollef and Mark have each blogged about the problem, about 48-72 hours (depending on how you want to count) after it happened. So a key issue (lack of frank, open commentary on the problem from significant, relevant community members) is resolved in this specific case. Each of the posts are detailed, relevant, honest communications that go a long way towards reassuring me as an Ubuntu user that Canonical takes the problem very seriously and is going to make sure it never happens again. [I will return to the substance of the QA problem involved at some other time; it seems Mark, Tollef and others have a solid grasp on the key issues, so it isn’t pressing.] [...]

  4. Christian Rh. says: (permalink)
    August 27th, 2006 at 8:24 pm

    Hi, first I have to say sorry for my English; I’m from Austria.
    I had a blue screen of death, but the community (German native speakers) helped me, so this was not such a big problem for me.
    But you’re right, this should never happen again.
    Bad news is discussed more than good news!

    But everybody on the worldwide Linux team (not only the Ubuntu developers) does a good job!
    This is something we should never forget.

    So long – stay free!
    Thank you

  5. Ioannus de Verani says: (permalink)
    August 27th, 2006 at 8:27 pm

    A very nice post.

    Thanks,
    Ioannus de Verani

  6. Harold J. Johnson says: (permalink)
    August 27th, 2006 at 9:30 pm

    Is this why I had some trouble reinstalling the distro on my PowerBook G3? I had been thoroughly enjoying Ubuntu (and particularly, Xubuntu) on my PowerBook G3 until sometime last week, when I tried to reinstall Ubuntu (Xubuntu, actually) on my PowerBook’s new hard drive. Without going into too much detail, I first installed Ubuntu server, which appeared to work fine, then I tried upgrading to Xubuntu, which installs a GUI (among other niceties). For a few days, my ‘Book would not boot into X…

    Well, perhaps this was the problem, perhaps not. I suspect it had to do with something else, since my system is a particularly tough one to boot and install. Also, I don’t recall making any further upgrades: it just up and booted into X one day, and now I’m back in business. But thank you for this explanation, anyway. You’ve got a great Linux distro, and I’ve been using it for about a year and a half now, perhaps longer.

  7. dave says: (permalink)
    August 27th, 2006 at 10:26 pm

    You don’t very often see people taking responsibility for stuff like this, and it’s nice when it does happen. Ubuntu has a great community and the people behind the project do a fantastic job.

  8. meneame.net says: (permalink)
    August 28th, 2006 at 1:38 am

    Mark Shuttleworth talks about the failed Ubuntu update…

    On his personal blog, Shuttleworth offers an explanation to everyone who saw the blue screen of death in Ubuntu…

  9. Jack Imsdahl says: (permalink)
    August 28th, 2006 at 2:05 am

    Stuff does indeed happen. Y’all seem to have handled it very well and to be willing to do what it takes to make sure the likelihood of something like this recurring is very, very low.

    The problem did not affect me. I guess I was lucky. But the way it was handled gives me more confidence, not less, in the distribution and the people behind it. I work with Windows, have Windows machines here and probably always will. But for ALL my personal work, I am using Ubuntu. You have won me over. And I think my wife is dumping SuSE for Dapper, as well.

    I’ll be sticking with you all for a good while, I think. You conduct yourselves admirably and the product is the best for me.

    Jack

  10. Step says: (permalink)
    August 28th, 2006 at 2:06 am

    As a new Ubuntu user, let me say Thanks for the open communication on this. Soon after the problem was reported, there were helpful tips up all over the ‘net. The community response was great, and it’s good to see how your team handles a major issue. I’m glad to see it taken so seriously.

    Keep doing the awesome work, and we’ll see more and more people using Ubuntu. It’s really exciting to see everything Ubuntu is accomplishing!

  11. Shuttleworth Responds :: ultramookie says: (permalink)
    August 28th, 2006 at 2:18 am

    [...] Mark Shuttleworth has responded to the criticism of Ubuntu QA which let a pretty nasty xorg bug slip through this week. The xorg update broke X and an update to fix the problem was not released for some 17 hours. As a team we made a series of errors, and the result was a desktop that was broken for thousands of users, for several hours. It has been a severe lesson in QA. [...]

  12. SC says: (permalink)
    August 28th, 2006 at 3:11 am

    A useful feature in synaptic/adept would be one which indicates the length of time updates/fixes have been released:

    Arnold the Advanced user searches for ubuntu fixes that have been out for over a week and applies only those fixes.

    Terry the Timid user’s default setting is to only apply fixes/patches available for over 48 hours.

  13. Philip Fletcher says: (permalink)
    August 28th, 2006 at 3:38 pm

    Hi Mark

    As one of the affected users, I have no issues with Ubuntu or the team. This is the first time this has happened. I pay no support fees for your software, yet you still provide first class support in a timely manner. Do not beat yourselves up too much over this.

    I would like to make one suggestion (there’s always one:) – my first reaction on X failing to load was: “How do I reverse that last update?”. A mechanism to do just this would relieve a little of the responsibility you place upon yourselves.

    apt-get unupdate anyone?

    Thank you once again for a most excellent Linux distribution.

    All the best

    Philip

  14. James Ryan says: (permalink)
    August 28th, 2006 at 3:54 pm

    I just installed Ubuntu for the first time last night, and if the occasional “screen of death” is the only price I have to pay for such a, frankly, amazing distribution, then so be it.

  15. Jeffrey Rollin says: (permalink)
    August 28th, 2006 at 4:53 pm

    I suppose I had better confess that I don’t currently use Ubuntu, but all the same, this apology, if not the mistake which prompted it, is good PR for you and the Linux community, and I thank you for it.

  16. Rick Barnich says: (permalink)
    August 28th, 2006 at 5:22 pm

    Users tend to accept an occasional glitch if the originator is forthcoming in recognizing the problem, fixing it and accepting responsibility for it. Over the years, I’ve found that users are forgiving if you ‘step up to the plate’ on the issue. You are to be commended on your quick response to this problem.

  17. Tom Moitie says: (permalink)
    August 28th, 2006 at 6:12 pm

    I see there is a LugRadio reference in the title :)

  18. Pieter Kubben says: (permalink)
    August 28th, 2006 at 6:53 pm

    Well, I also had this problem, but let’s take it from the positive side: I learned even more from the shell, how to use apt-get and was glad to solve the problem. Of course, this would have been much harder if I only had one computer with one OS, fortunately this was not the case.

    However, I still think that Ubuntu is by far the best Linux distro ever – I tried a lot, always went back to Windows until I bought a Mac one year ago. For now, Mac remains my nr 1 choice because of better hardware support, something Linux will still lack for a couple of years, I’m afraid. That said, I am glad to add that Linux is now my second OS, ahead of Windows, and to be even more accurate, I can say that several tasks are best done in Linux, in my opinion. So I am waiting for my next pc, which will be a MacBook Pro. On this, Ubuntu can be installed, and for those tasks where I prefer Linux, it can be used.

    I hope you will be able to continue Ubuntu development. Perhaps one day, when hardware support has improved and Spotlight has been adapted into GNOME ;-), I will make a complete switch.

    Go on! Still many dragons to fight…

    Pieter Kubben, MD
    resident in neurosurgery
    Maastricht University Hospital
    The Netherlands

  19. woody Dolphin, from Colorado USA says: (permalink)
    August 28th, 2006 at 7:59 pm

    Like I tell my daughter, “Lick your wounds, re-group, and come back stronger.” Now that there’s a major blunder under your belt, it’s imperative that you come back stronger. This is what champs are made of.

    Also consider pre-releasing (by a couple of days or so) to a team that’s set up to run a series of tests. By all means include some lay people; they will really do some out-of-the-ballpark things, giving you some real-life feedback.

  20. Problems with Ubuntu Upgrade « dreaming spires says: (permalink)
    August 28th, 2006 at 10:49 pm

    [...] Ubuntu founder Mark Shuttleworth has some interesting comments here. Here’s a brief sample: If there is a silver lining to the error, it is that it happened during the one week in six months when we have the core distribution development team together in one place. This gave us the opportunity not just to analyse and fix the issue, and to talk about the sequence of events that led to the problem, but also to discuss the processes we must improve to further reduce the likelihood of a repeat. The team is now more aware than ever of the responsibility we assume given extraordinary rate of adoption of Ubuntu. [...]

  21. jonobacon@home » Transparency in process says: (permalink)
    August 31st, 2006 at 12:50 am

    [...] As I finish up my few remaining days at OpenAdvantage, a few people have mailed me with comments and thoughts about the recent update debacle with Ubuntu. Personally, I have not wanted to blog about it as I have not had a huge amount to bring to the discussion, but Mark’s post brings up some issues I do want to talk about. Now, I must stress here that I am not privy to any internal strategy at Canonical about this issue, I haven’t even started working there yet, and my blog is most certainly not a platform for me to advertise Canonical strategy, not that they would ever ask for it to be. Every word you read on this blog now and in the future are my words, so do read them as my words. [...]

  22. Fabio Leitao says: (permalink)
    September 2nd, 2006 at 4:33 am

    I was one of the first affected by the BSOD.

    Congrats on the prompt response upon the serious mistake. But, on the other hand, tsk tsk tsk.

    ;-D

    Don’t worry, I still love Ubuntu and have been using it heavily on desktops, laptops and servers at home and work, alongside XP and 2K3 Win boxes (it depends on the software; real DWG CAD, for example, is only on Windows so far).

    I’ve been an on-and-off Linux user/admin since 1994 and have never had so much pleasure (instead of the usual geek pain) as I have experienced with Ubuntu this last year or so.

    I am seriously willing to take part in such a group of a special breed of users for advance code testing of such non-essential updates (especially on my home machines).

  23. Sveinn Valfells says: (permalink)
    September 3rd, 2006 at 11:25 am

    Hi,

    SNAFUs are inevitable in most areas of human activity; for a relatively young organization, you dealt with this one pretty well.

    Don’t take it too hard on yourselves.

    Sveinn
    (Running ubuntu since November, 2005)

  24. Orlando says: (permalink)
    September 4th, 2006 at 2:31 am

    I first used SuSE, then Slackware, and now I have 4 Linux machines at home; 3 run Ubuntu and 1 still runs Slackware.
    Every week/month I convince someone to move to Linux. [Now I tell them to use Ubuntu.]

    This past week, during the fatal X crash, I had convinced my sister to use Ubuntu on her old laptop. [The only one she has.]
    I was on vacation at her house and was telling her what to do when she sees the update manager icon. And bingo, X did not work, but then I showed her how fast there was a fix; in our case it was less than 5 minutes!!!!

    She is still using Ubuntu.
    I was impressed with the speed with which I found a solution and the Ubuntu IRC channel.
    Great job, guys.

  25. Paul Morley says: (permalink)
    September 4th, 2006 at 3:36 pm

    Thank you for the updates. I must have hit it lucky, as I had no problems with my downloading; it took three hours, but who cares. The system is now running great, doing all the things I want it to do.
    Best Wishes, Paul

  26. John L. Peatfield says: (permalink)
    September 7th, 2006 at 1:03 pm

    As a new user of Ubuntu, it has been a bumpy ride and this was another such bump. I loaded Kubuntu on another hard drive, booted and was ready for action in just 20 minutes. I gained access to my former primary hard drive and started moving all my data to CDs and DVDs. This was because I had not yet acquired any knowledge of the Kubuntu Linux command line and did not know there was a problem or a fix for it.

    I limped along, not wanting to fill out my new install since I was going to tear it down again and go back to my significantly larger primary hard drive. So I decided to take the time to investigate, firsthand, the Kubuntu Linux operating system. This ultimately helped in full data recovery! All my e-mail and even my Internet bookmarks! They’re so hard to rebuild, but that was not necessary since I was able to recover them. I also now know which system folders to include in a backup.

    So how did I fare in this issue, in my eyes? Well, very well! I am more knowledgeable about Kubuntu/Ubuntu, I have all my data, and I didn’t really have any downtime to speak of. That’s a really good “New User” experience. One that will never happen as a Microsoft Windows user. I know! – I can’t tell you how many times I reloaded Windows 3.10 from 14 floppies in the same amount of time I have been using Kubuntu.

    The Internet upgrade from 5.10 to 6.06 went bad too. I corrected the sources.list file per the instructions on your web page and it failed. I saw the Linux KDE desktop do something I have only seen in Windows. It went numb! I knew I would not come back to a desktop environment the next time I booted. So I stayed up and backed up. I restarted and only came to a command line with the complaint that X could not start KDE.

    I had also made an install disk of 6.06, so, having all my data safe and not really knowing how to fix the problem, I just re-installed Kubuntu 6.06.

    Each failure of the Kubuntu Operating System has left me in a better place than the one I was in before the failure. In other words, my user experience was better after the issue than before. That has never been my experience in the Microsoft Windows experience.

    So how do you find fault with this experience? Yes, it has been frustrating, but I am enjoying my best user experience thanks to the release of the experimental X update. So, in what some may think to be a twisted way, thanks for messing me up! My Kubuntu experience is richer for it.

    Please feel free to use this as an endorsement of Ubuntu.

    John L. Peatfield U.S.A.

  27. Jim says: (permalink)
    September 8th, 2006 at 8:18 pm

    Please advertise on TV in America; we need Ubuntu to replace the garbage OS that’s dominant here.

  28. Phil Stone says: (permalink)
    September 10th, 2006 at 9:37 pm

    You’re still releasing junk into the biosphere…

    With today’s updates I find Amarok is now 1.4.2 and
    sorely broken, no flak, collections gone, who knows what
    else.

    You gotta stop this… You say 6.06 is Long Term Support, then support it, stop changing it.

    Don’t add new versions into released editions of Kubuntu. Only security patches, maybe bug fixes, but not when some package has a rewrite. You’re using a million desktops as your final test.

    You gotta have a standard ability to install older, known working,
    versions, without the need for humans at the server to uninstall a bad upgrade. Adept doesn’t let me back off 1.4.2 and go back,
    to, say 1.3.X…

    Phil

    [MarkShuttleworth:  I think you have enabled dapper-backports on your system, which says you *want* to get newer versions of stuff from Edgy, compiled for Dapper, installed on your system.]

  29. Joe Bong says: (permalink)
    September 11th, 2006 at 6:56 pm

    @Phil:

    “Don’t add new versions into released editions of Kubuntu”

    If you don’t want new versions of software, then by all means don’t upgrade, the choice is yours. All software has the potential for bugs, holding back new versions from everyone so they have to compile from source to get to the next version is not my idea of linux for human beings.

  30. Steve Weed says: (permalink)
    September 12th, 2006 at 7:38 pm

    The guy above posted on Sep 10 about AmaroK 1.4.2, but did he know that on Kubuntu.org on the 6th of September AmaroK 1.4.3 was made available for Kubuntu?

    http://kubuntu.org/announcements/amarok-1.4.3.php

    People need to chill, read, and chill some more, because chilling is good.. :) There’s also ubuntuforums.org to vent frustrations rather than cluttering a nice blog with such tripe.

  31. David MacIntosh says: (permalink)
    September 14th, 2006 at 9:29 am

    Phil:

    I’m not sure that static software in release editions is the answer, but stable software in release editions certainly is. I say don’t add new versions into release editions unless the only realistically perceivable changes are pleasing to 99.99% of users… I say bug test like your life depends on it and don’t release updates that force users to change the way they’re used to doing things.

    Mark Shuttleworth:

    Releasing poorly tested updates for a ‘stable’ release is inexcusable.

    Pragmatic Linux enthusiasts [like myself] will hesitate to recommend and use an OS in any professional capacity if it is undependable. These ‘mishaps’ scare off new users and push away the fringe of the current user base.

    In the hierarchy of development demands, a policy to deliver and maintain flawless internal stability to users who require it must trump other concerns, even in virtually all security situations. For example, MS is notoriously slow in patching vulnerabilities, but to the average user that’s not half as bad as faulty updates. If you promise stability and deliver less, people get incredibly pissed, especially when there is nobody to blame but the source, and especially when that source is already struggling to build a reputation in the areas of usability, trust, and practicality.

    Faulty updates are a death sentence for Ubuntu.

  32. Caroline Ford says: (permalink)
    September 22nd, 2006 at 3:55 pm

    I’m concerned that this happened again on September 14 and what this says about our QA procedures (or possible lack of).

    http://ubuntuforums.org/showthread.php?t=257459 refers to the nVidia breakage caused by USN-346-1

    The comments on Malone (https://launchpad.net/distros/ubuntu/+source/linux-source-2.6.15/+bug/60433) cause concern as it looks like no-one checked the patch before it went out.

    As broken security patches cause more problems than the exploits they are designed to fix (and this has happened TWICE in a month) we seem to have a problem.

    I don’t expect developers to release 100% perfect patches (they are as human as the rest of us) but we need to have another pair of eyes looking over patches like this. Kernel patches signed off by two members of the core development team? Patches being installed successfully on our supported architectures before being released in the wild?

    The current setup appears bugged in any case..

  33. John Doe says: (permalink)
    September 25th, 2006 at 1:44 am

    “Faulty updates are a death sentence for Ubuntu.”

    And yet Windows 98 had blue screens of death often, (without an update available that I could ever see that fixed this problem) and people accepted it for how many years? I didn’t see a death sentence there, just more sardines lining up to buy an operating system that crashed often.

  34. Kick Bill » Blog Archive » El Bug#1 says: (permalink)
    September 25th, 2006 at 5:01 am

    [...] All of the above is well and good, but it carries an implicit risk: confusing “ease of use” with “the same as Windows”. By this I mean that, in the quest to make things more user friendly, one could eventually get too close to the Windows way of doing things and fall into practices that erode the difference in quality that characterizes Linux distros (we had a dangerous brush with this in the blue screen that came up when updating to Dapper, a topic that Mark Shuttleworth himself addressed on his blog). [...]

  35. Paul Cubbage says: (permalink)
    October 6th, 2006 at 7:19 pm

    Dissonance:

    “We can’t afford to take risks with our user’s trust, but I balance that with the need to continue to improve the desktop.”

    How can you “balance that”? Does that mean you will “take risks with our user’s trust” when some real cool feature pops up (or some Canonical need drives it)?

    I’m not trying to wordsmith you. I appreciate your frankness. Frankness has its own risks, as everything you say has to be taken at face value. I’ve been bitten on that one myself.

    I like ubuntu as a distro and have installed it in several places.

  36. Why Not Just Reinstall? « dreaming spires says: (permalink)
    October 30th, 2006 at 4:53 pm

    [...] This result actually echoes your DistroWatch maintainer’s experience – during the upgrade procedure of two machines last week, one went without any major trouble, while the other required several hours of fiddling with dpkg and performing manual resolution of dependencies before the box was made to boot into Edgy. It is hard to pinpoint the cause of the problems at this stage, but they indicate continuing quality control problems at Ubuntu, despite an earlier promise to set up mechanisms to prevent any future update disasters. Nevertheless, once installed, Edgy appears to be a highly usable release, perhaps not as “edgy” as we were led to believe at the start of its development process, but still fairly up-to-date and certainly beautifully crafted. Just remember to download an installation CD in case your upgrade experience turns sour and you have to re-install. [...]