Now that Skype has recovered from the serious worldwide outage on December 22nd, CIO Lars Rabbe has blogged about what went wrong behind the scenes. The problem began when a group of servers that handle Skype's offline instant messaging became overloaded, in turn causing some Skype clients to receive delayed responses and crash.
The bug only affected users of version 5.0.0.152 on Windows -- unfortunately, almost 50% of Skype's users happened to be running that version. Included among those were more than a quarter of Skype's supernodes. As those supernodes failed, an increased load was placed on the remaining supernodes. The increased load, coupled with a flood of users attempting to reconnect, eventually caused Skype's network to collapse under the weight of traffic that was 100 times greater than normal.
To prevent future outages, Skype is looking closely at its process for testing and deploying new versions, and Rabbe also promised increased investment to boost capacity and reliability.