


Microsoft announced Satin, a new audio codec that leverages AI techniques to outperform Skype's Silk codec over ultra-low bandwidth and highly constrained network conditions.

While high connectivity is widely available today, 3G and 4G cellular networks often limit the quality of conversation, with over 50% packet loss and sporadic loss of coverage, explain Microsoft's Jigar Dani and Sriram Srinivasan. After all these years, it turns out that utilization of available bitrate is every bit as important today as it was in the dial-up world. Additionally, reducing the bandwidth required by an audio conversation has the benefit of increasing the bandwidth available for other concurrent tasks by the same users or by other people sharing the same network connection.

According to Dani and Srinivasan, Satin can cover frequencies up to 16kHz, which is dubbed "super wide band" and doubles the usual 8kHz bandwidth used for human speech sampling, at just 6kbps. As a comparison, Silk could only provide a 4kHz bandwidth at the same 6kbps bitrate.

This improvement is made possible by a two-fold approach. On the encoding end, only a sparse representation of the audio signal is processed, which is made possible by more advanced models of speech production and psychoacoustics. On the decoding end, Satin uses deep neural networks (DNNs) to estimate the high band parameters from the received low band parameters and a minimal amount of side information sent over the wire.

Microsoft hasn't released many additional details about its approach to bandwidth extension through the use of a DNN, but a similar technique is used by Google's WaveNet text-to-speech synthesizer. In WaveNet's case, a DNN is trained to upsample to 24kHz an audio signal filtered through an 8kHz codec. The signal reconstructed using this DNN is only slightly lower in quality than a 16kHz-filtered audio signal.

Microsoft's approach with Satin, whatever it looks like in detail, poses a computational challenge, explain Microsoft engineers, since both the encoding and the decoding steps are computationally intensive. To solve this, the team focused on both algorithmic optimizations and techniques like loop vectorization beyond what the compiler could achieve on its own.
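To put the 6kbps figure quoted for Satin in perspective, the short Python sketch below works out how many bits such a codec has to spend per audio frame. It is back-of-the-envelope arithmetic, not a description of Satin's actual framing: the 20 ms frame duration is an assumption (a common choice in speech codecs, but not stated by Microsoft), the sample rate is simply the Nyquist minimum for a 16kHz audio bandwidth, and the function names are hypothetical.

```python
def bits_per_frame(bitrate_bps: float, frame_ms: float) -> float:
    """Bits available for one frame at a given bitrate."""
    return bitrate_bps * frame_ms / 1000.0


def min_sample_rate_hz(audio_bandwidth_hz: float) -> float:
    """Nyquist minimum sample rate needed to represent a given audio bandwidth."""
    return 2.0 * audio_bandwidth_hz


BITRATE_BPS = 6_000          # bitrate quoted in the article
AUDIO_BANDWIDTH_HZ = 16_000  # "super wide band" bandwidth quoted in the article
FRAME_MS = 20.0              # ASSUMPTION: frame length is not given in the article

budget_bits = bits_per_frame(BITRATE_BPS, FRAME_MS)
budget_bytes = budget_bits / 8

sample_rate_hz = min_sample_rate_hz(AUDIO_BANDWIDTH_HZ)
samples_per_frame = sample_rate_hz * FRAME_MS / 1000.0
bits_per_sample = budget_bits / samples_per_frame

# Uncompressed 16-bit mono PCM at the same sample rate, for comparison.
pcm_bitrate_bps = sample_rate_hz * 16
compression_ratio = pcm_bitrate_bps / BITRATE_BPS

print(f"{budget_bits:.0f} bits ({budget_bytes:.0f} bytes) per {FRAME_MS:.0f} ms frame")
print(f"{samples_per_frame:.0f} samples per frame, {bits_per_sample:.4f} bits per sample")
print(f"about {compression_ratio:.0f}x smaller than 16-bit PCM")
```

Under these assumptions the codec has well under one bit per sample to work with, which illustrates why the high band is estimated at the receiver rather than transmitted in full.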
