On the German show Bits und so, two of our regulars are remote: Leo in Wiesbaden, Germany, and Alex in Helsinki, Finland. The studio is in Munich, Germany. For the past couple of years, they’ve joined the conversation via Skype, as so many podcasters do. The reliability and quality of this service have been flaky, and even worse, nobody at Skype seems to care. Apart from terrible UI updates, the underlying codecs and protocols don’t seem to get any attention in a shipping client. A new feature we tried using for a while was paid multi-user video chat, which always resulted in inexplicable audio quality degradation. We had to switch back to one person per Skype machine.
I’ve been evaluating alternatives such as iChat (remember that?), FaceTime, SIP software phones and ISDN hybrid codecs for a long time, but despite the pain, none came even close to the relative ease of use, reliability and quality of Skype. And Skype is horrible. Go figure.
For the past three months, we’ve been evaluating Mumble 1.2.4 with Opus support and won’t be looking back at Skype anytime soon. Read more details on Opus at the Auphonic Blog.
Mumble appears to have been developed, much like TeamSpeak, with gaming in mind. The idea is to start the client, connect to a server and let the connection run in the background while you play your game.
Mumble’s server (named “Murmur”) can be run locally, provided you have sufficient upstream bandwidth. In most cases, you will want to run the server in the cloud. We tried both cheap virtualized boxes and a dedicated high-end machine and didn’t really see any differences. CPU load should be minimal, and as long as you’re talking about a handful of participants, the traffic will only take up a small percentage of a 100 Mbit/s connection.
Our server is now located somewhat centrally between all participants, both geographically and in terms of network topology, in Falkenstein, Germany.
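To put a rough number on "a handful of participants", here is a minimal sketch. It assumes that Murmur forwards each participant's stream to every other participant (as described further down), that every stream runs at 70 kbit/s, and it ignores packet overhead. The function names are made up for illustration.

```python
def server_upstream_kbits(participants: int, stream_kbits: float) -> float:
    """Outgoing server traffic if every stream is forwarded to all other participants."""
    return participants * (participants - 1) * stream_kbits


def link_share_percent(used_kbits: float, link_mbits: float = 100.0) -> float:
    """Share of the server's link that this traffic occupies, in percent."""
    return used_kbits / (link_mbits * 1000.0) * 100.0


if __name__ == "__main__":
    # Assumption: 5 participants, each sending 70 kbit/s of audio payload.
    used = server_upstream_kbits(participants=5, stream_kbits=70.0)
    print(f"{used:.0f} kbit/s outgoing, {link_share_percent(used):.2f} % of the link")
```

Under these assumptions, five participants come to about 1.4 Mbit/s of outgoing traffic, which is a small fraction of a 100 Mbit/s link.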
By having an open channel, participants can dial into that channel whenever they want – the host doesn’t need to take action when somebody loses their connection and needs to reconnect.
The defaults in the Mac client are a little odd. We found that by disabling all sound processing, transmitting continuously and setting “Audio per packet” to the minimum value of 10 ms, we achieve our goal of a high quality, low latency connection that will not cut out when multiple people are talking at once. This of course requires all ends to minimize crosstalk from headphones to microphones, or all echo will be transmitted as well.
There’s also a bunch of notification sounds that need to be disabled on both ends.
Mumble also comes as an iPhone app (no Opus support in the App Store yet, need to build it yourself from source) which enables high quality voice calls even on a 3G connection. Again, weird default settings that need some massaging. Turn on continuous transmission and disable any sound processing.
SILK, Skype’s standard audio codec, is specified at a maximum sampling frequency of 24 kHz. This means that frequencies above 12 kHz will be cut off, which gives you the typical Skype sound you’re used to. The maximum bitrate for SILK is 40 kbit/s.
But we really don’t need to be restricted by this codec or by very low bitrates. Even on the somewhat archaic DSL line in Finland, we have 1 Mbit/s upstream.
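The 12 kHz cutoff follows directly from the sampling theorem: a signal sampled at a given rate can only carry frequencies up to half that rate. A tiny sketch of that arithmetic (the function names are purely illustrative):

```python
def nyquist_khz(sampling_rate_khz: float) -> float:
    """Highest frequency that a given sampling rate can represent."""
    return sampling_rate_khz / 2.0


def min_sampling_rate_khz(highest_frequency_khz: float) -> float:
    """Lowest sampling rate needed to represent a given frequency."""
    return highest_frequency_khz * 2.0


if __name__ == "__main__":
    # SILK's maximum sampling frequency of 24 kHz.
    print(f"24 kHz sampling carries audio up to {nyquist_khz(24.0):.0f} kHz")
```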
With Opus over Mumble we gain the ability to set the audio bitrate manually. We found that anything over 70 kbit/s sounds really great, and 40 kbit/s is still much better than Skype on a good day. Also, the signal contains frequencies up to 16 kHz, implying there may be a low-pass filter or a 32 kHz sampling frequency somewhere in the signal path. Unlike Skype, there’s no level compression or hard limiting built in, which means that you should be extra careful not to clip the input and that you need to process the signal yourself, either in post or on a hardware audio processor.
Opus has a default codec latency of 22.5 ms, configurable down to 5 ms, although I’m not sure which value is used in the current implementation. Either way, it’s lower than the 25 ms of SILK.
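Small packets mean more packets per second, and each one carries its own headers. The following rough sketch estimates what a 70 kbit/s stream at 10 ms per packet costs on the wire. It only counts the IPv4 (20 bytes) and UDP (8 bytes) headers; Mumble's own packet framing and encryption overhead are not included, so the `extra_bytes` parameter is a placeholder assumption, not a measured value.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
UDP_HEADER_BYTES = 8


def payload_bytes_per_packet(bitrate_kbits: float, packet_ms: float) -> float:
    """Audio payload per packet: kbit/s times ms gives bits, divided by 8 for bytes."""
    return bitrate_kbits * packet_ms / 8.0


def wire_kbits(bitrate_kbits: float, packet_ms: float, extra_bytes: float = 0.0) -> float:
    """Estimated bitrate on the wire, including IPv4 and UDP headers.

    extra_bytes is a hypothetical per-packet allowance for Mumble's own overhead.
    """
    packets_per_second = 1000.0 / packet_ms
    packet_bytes = (payload_bytes_per_packet(bitrate_kbits, packet_ms)
                    + IPV4_HEADER_BYTES + UDP_HEADER_BYTES + extra_bytes)
    return packet_bytes * 8.0 * packets_per_second / 1000.0


if __name__ == "__main__":
    total = wire_kbits(bitrate_kbits=70.0, packet_ms=10.0)
    share = total / 1000.0 * 100.0  # share of a 1 Mbit/s upstream
    print(f"{total:.1f} kbit/s on the wire, {share:.1f} % of 1 Mbit/s upstream")
```

Even with the header overhead of 10 ms packets, the estimate stays below a tenth of the 1 Mbit/s upstream available in Finland.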
Syncing with Skype Video
So you want to look at each other or want to broadcast the webcam image as well? Just run a Skype video session in parallel, with mics and Skype audio muted. This will reveal the higher latency Skype has over the very same connection. The result is a very visible and audible ~100 ms delay of the video. Usually, that might be a deal breaker, but in our setup this is actually really great: our little HDMI cameras that record the images from the studio introduce a similar visual delay and lag behind the audio. All we need to do is delay the audio by 80 to 100 ms and all sources will be in sync again.
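If you apply the delay in software rather than on a mixing board, it comes down to prepending silence. A minimal sketch, assuming a 48 kHz sample rate and audio held as a plain list of samples (both are assumptions for illustration, not a description of our actual setup):

```python
from __future__ import annotations


def delay_in_samples(delay_ms: float, sample_rate_hz: int = 48000) -> int:
    """Convert a delay in milliseconds into a whole number of samples."""
    return round(delay_ms * sample_rate_hz / 1000)


def delay_audio(samples: list[float], delay_ms: float,
                sample_rate_hz: int = 48000) -> list[float]:
    """Delay audio by prepending the matching amount of silence."""
    silence = [0.0] * delay_in_samples(delay_ms, sample_rate_hz)
    return silence + samples


if __name__ == "__main__":
    for ms in (80, 100):
        print(f"{ms} ms at 48 kHz = {delay_in_samples(ms)} samples")
```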
Optimizing UDP Latency
With increased manual control over the protocol, we can try to optimize latency even further. Murmur does not mix the audio on the server to provide each participant with their own mix; it only forwards the individual streams to all participants. So you may look into running the server locally, to minimize latency from Murmur to your local instance of Mumble. Local latency on the loopback device should be well under 0.1 ms. This would provide you with a minimal latency “truth” on your mixing board.
In our case though, this theoretical advantage does not help in the real world.
Comparing ping times from Helsinki to Falkenstein (avg 90 ms) with ping times from Helsinki to Munich (avg 160 ms), together with a quick traceroute, shows that the Finnish ISP has better peering to Germany over DE-CIX to Falkenstein and a much slower route, via Sweden, to the network of Deutsche Telekom, where my VDSL line is hooked up.
|Location||Ping to server in Falkenstein||Ping to server in Munich|
|Munich||25 ms||0.1 ms|
|Wiesbaden||25 ms||25 ms|
|Helsinki||90 ms||160 ms|
Adding the 25 ms from the studio in Munich to the server in Falkenstein gives us a better latency to Helsinki (25 ms + 90 ms = 115 ms) than connecting Helsinki directly to a server in Munich (160 ms + 0.1 ms = 160.1 ms).
After 10 episodes (combined runtime: ~30 hours) we’re quite confident in using Mumble as a vastly superior alternative to Skype for our purposes. We enjoy much livelier conversations without anyone cutting out, reduced latency and increased audio quality. Skype is demoted to providing us with a moving image.
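The same comparison can be written down as a small calculation, which makes it easy to plug in your own ping times. This is only a sketch: the numbers are the average ping times measured above, summed per hop the same way as in the text, and the dictionary layout is made up for illustration.

```python
# Average ping times in ms, taken from the measurements above.
PING_MS = {
    ("Munich", "Falkenstein"): 25.0,
    ("Helsinki", "Falkenstein"): 90.0,
    ("Helsinki", "Munich"): 160.0,
    ("Munich", "Munich"): 0.1,  # loopback to a server running in the studio
}


def path_latency_ms(hops: list) -> float:
    """Sum the ping times of all hops along a path."""
    return sum(PING_MS[hop] for hop in hops)


# Helsinki to the studio with the server in Falkenstein.
via_falkenstein = path_latency_ms([("Helsinki", "Falkenstein"), ("Munich", "Falkenstein")])
# Helsinki to the studio with the server running locally in Munich.
via_munich = path_latency_ms([("Helsinki", "Munich"), ("Munich", "Munich")])

if __name__ == "__main__":
    print(f"Server in Falkenstein: {via_falkenstein:.1f} ms")
    print(f"Server in Munich:      {via_munich:.1f} ms")
```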
The increased work to set this up may not be for everyone, but for professional, regular shows this little effort will pay off in a huge way.
As we’ve seen in an ever increasing number of examples, relying on third party services will at some point come back to bite you. Whenever Skype goes belly up or turns super evil, we’re covered, at least for the audio part. The irony in that is of course that Skype is heavily involved in the development of Opus, but so far no Skype version supporting it has been published.
Update 2013-01-22: Read more on how to set up Mumble for optimized latency and audio quality.