The wonderful world of voice codecs

Written by Daniel Noworatzky | Jan 23, 2019 3:48:00 PM

Demystifying codecs, part 2

Different VoIP codecs digitize sound differently. Knowing which codec to use for a given application will allow you to better optimize your telephony implementations.

In this article, we examine the most commonly used codecs for VoIP, including the G.711, G.729, G.722, G.726, iLBC and Opus codecs, and identify the situations for which each one is best suited.

Codec Types

Simply put, codecs are standards used to digitize audio. As an introduction, check out our recent article, Demystifying Codecs, Part 1: Digitizing the Human Voice, which describes the fundamental principles of voice digitization that are essential for understanding how codecs work.

While well over a hundred voice codecs from multiple suppliers are available today, those most commonly used for VoIP have been standardized by the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) and the Internet Engineering Task Force (IETF).

The ITU-T has developed the G series of codecs. Originally introduced in the early 70s when digitization was first implemented in telephony, they have been continually updated and advanced to be used effectively in VoIP applications. They are the most commonly supported codecs in VoIP devices today.

The IETF has introduced two newcomers into the codec world: Opus and iLBC, both of which have been developed specifically for use over the internet and are quickly gaining ground among VoIP vendors.

Codec Glossary of Terms

Let’s start by reviewing some basic terminology. Key terms not included here can be found in the article, Demystifying Codecs, Part 1: Digitizing the Human Voice.

Bitrate – The amount of bandwidth used to transmit the encoded voice of a particular codec. Bitrates do not include packet headers: the IPv4 header alone is typically 20 bytes per packet, and the UDP and RTP headers that normally carry VoIP add a further 8 and 12 bytes.
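Because the codec bitrate excludes packet headers, a call consumes more bandwidth on the wire than the codec figure suggests. The sketch below estimates that figure under stated assumptions: 20 ms of audio per packet and 40 bytes of IPv4, UDP and RTP headers (20 + 8 + 12), with layer 2 framing ignored. The function name and its defaults are illustrative, not part of any standard.

```python
def wire_bitrate_kbps(codec_kbps, packet_ms=20, header_bytes=40):
    """Estimate one-way bandwidth in kb/s, including packet headers.

    header_bytes=40 assumes IPv4 (20) + UDP (8) + RTP (12); layer 2
    framing is not counted.
    """
    payload_bytes = codec_kbps * packet_ms / 8   # kb/s * ms = bits, / 8 = bytes
    packets_per_second = 1000 / packet_ms
    return (payload_bytes + header_bytes) * 8 * packets_per_second / 1000


print(wire_bitrate_kbps(64))  # a 64 kb/s codec such as G.711
print(wire_bitrate_kbps(8))   # an 8 kb/s codec such as G.729
```

Under these assumptions, the headers add the same 16 kb/s to every call, so their relative cost is far higher for a low-bitrate codec than for a 64 kb/s one.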

Lossless audio compression – Compression of digitized voice that allows data to be represented with fewer bits, but without losing any information. These compression algorithms take advantage of statistical redundancies in data to rebuild digitized voice during decompression.

Lossy audio compression – Compression of digitized voice with some loss of information. This is not always discernable to the human ear and thus is not necessarily bad. The voice quality depends on how well the compression algorithm and other codec parameters compensate for the lost information.
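The difference between the two kinds of compression can be illustrated with a small sketch. It uses a synthetic tone as a stand-in for digitized voice, the general-purpose zlib compressor for the lossless case, and simple truncation to 8 bits for the lossy case. None of these is what a real voice codec does; they are assumptions chosen only to make the contrast visible.

```python
import math
import zlib

# One second of a synthetic 440 Hz tone as 16-bit samples at 8 kHz
# (a stand-in for digitized voice, purely illustrative).
samples = [int(10000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(8000)]
raw = b"".join(s.to_bytes(2, "little", signed=True) for s in samples)

# Lossless: zlib (a general-purpose compressor, not a voice codec)
# exploits redundancy, and decompression restores every bit.
packed = zlib.compress(raw)
assert zlib.decompress(packed) == raw
print(len(raw), len(packed))

# Lossy: keep only the top 8 bits of each sample. The discarded bits
# cannot be recovered, but the reconstruction stays close to the original.
coarse = [s >> 8 for s in samples]
restored = [c << 8 for c in coarse]
max_error = max(abs(a - b) for a, b in zip(samples, restored))
print(max_error)
```

The lossless path returns the original bytes exactly, while the lossy path returns samples that differ from the originals by a small, bounded error.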

Narrowband codecs – These are codecs that typically digitize sound frequencies between 300 and 3400 Hz, the range that carries most of the information needed to understand human speech. This limitation of frequencies causes voice to acquire its characteristic “telephone-like” tone. Music, on the other hand, spans a much wider range of frequencies, which narrowband codecs truncate. This is why music sounds distorted when heard over the telephone.

Wideband codecs – These are codecs that digitize a greater frequency range of the audio spectrum, resulting in a clearer and more natural-sounding voice. The frequency range covered is typically between 50 and 7000 Hz.

Fullband codecs – These are codecs that use the full frequency range available to the human ear, between 50 and 20000 Hz. These codecs can be used for all sound, including music.
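The three bands relate to sampling rates through the Nyquist criterion: a signal must be sampled at no less than twice its highest frequency. The sketch below applies that rule to the upper limits quoted in the glossary. The sampling rates listed next to them (8, 16 and 48 kHz) are commonly used values and are stated here as assumptions rather than taken from this article.

```python
# Upper frequency limits (Hz) taken from the glossary entries above.
BANDS = {"narrowband": 3400, "wideband": 7000, "fullband": 20000}

# Sampling rates commonly paired with each band (typical values, assumed here).
TYPICAL_RATES = {"narrowband": 8000, "wideband": 16000, "fullband": 48000}


def min_sample_rate(max_freq_hz):
    """Nyquist criterion: sample at least twice the highest frequency."""
    return 2 * max_freq_hz


for band, max_freq in BANDS.items():
    print(band, min_sample_rate(max_freq), TYPICAL_RATES[band])
```

In each case the typical rate sits somewhat above the theoretical minimum, which leaves room for practical filtering.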

Mean Opinion Score (MOS) – A measure of the quality of the sound of a particular codec measured from 1 to 5, where 1 is bad and 5 is excellent. MOS can be subjective and may vary for a particular codec depending on who is scoring. As such, these values should be taken with a grain of salt.

Voice payload size – The typical size of a single voice packet of a particular codec. This can be expressed in either bits/bytes or in milliseconds of digitized voice.
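The two ways of expressing payload size are linked by the codec bitrate. As a minimal sketch (the helper names are hypothetical), the conversion in each direction looks like this:

```python
def payload_bytes(codec_kbps, packet_ms):
    """Bytes of encoded voice carried in one packet of packet_ms milliseconds."""
    return codec_kbps * packet_ms / 8   # kb/s * ms = bits, / 8 = bytes


def payload_ms(codec_kbps, size_bytes):
    """Milliseconds of voice represented by a payload of size_bytes."""
    return size_bytes * 8 / codec_kbps


print(payload_bytes(64, 20))   # 20 ms of voice at 64 kb/s
print(payload_ms(64, 160))     # the same payload expressed in milliseconds
```

For a 64 kb/s codec, 20 ms of voice and 160 bytes describe the same payload.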

Voice Activity Detection (VAD) – VAD is a technique in which the absence of speech in a voice conversation is detected. During these silent portions of a conversation, sampling and digitization of sound are paused, which reduces the average bandwidth and CPU resources required.
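A very simple way to picture VAD is an energy threshold: frames whose level falls below a chosen value are treated as silence. The sketch below assumes 16-bit samples and a hypothetical threshold of -40 dB relative to full scale. Production detectors are considerably more sophisticated, so this is an illustration of the idea only.

```python
import math


def frame_level_db(frame):
    """RMS level of a frame of 16-bit samples, in dB relative to full scale."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(max(rms, 1) / 32768)   # clamp to avoid log10(0)


def is_speech(frame, threshold_db=-40.0):
    """Treat a frame as speech when its level exceeds the threshold (assumed value)."""
    return frame_level_db(frame) > threshold_db


silence = [0] * 160
tone = [int(10000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(160)]
print(is_speech(silence), is_speech(tone))
```

Only frames classified as speech would then need to be encoded and sent.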

Companding – This is a method of signal processing that is used to mitigate the detrimental effects of sending a signal through a channel with a limited dynamic range. Codecs such as G.711 employ it to represent both quiet and loud sounds acceptably with only a few bits per sample: the signal is compressed before encoding and expanded again after decoding.
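The idea behind companding can be seen in the continuous μ-law formula, sketched below for samples normalized to the range -1 to 1. This is the textbook curve with μ = 255. G.711 itself works with a piecewise-linear approximation and 8-bit codewords, which this sketch does not reproduce.

```python
import math

MU = 255  # value used by the μ-law variant of G.711


def mu_law_compress(x):
    """Map a sample in [-1, 1] to [-1, 1], giving quiet levels more of the range."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


quiet = 0.01
compressed = mu_law_compress(quiet)
print(compressed, mu_law_expand(compressed))
```

A quiet sample at 1% of full scale lands at more than 20% of the compressed range, so it survives coarse quantization far better than it would on a linear scale.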

Codec Descriptions

Below we examine the most commonly used codecs for VoIP, their features, attributes, and the applications for which they are most commonly used.

G.711

This is one of the most mature and widely used audio codecs for VoIP, supported by the vast majority of VoIP devices. Its parameters are based on analog telephony quality, traditionally known as “toll quality” voice, and it was originally used in the digitization of voice for ISDN circuits. It is a good choice for networks where traditional telephony quality is acceptable and there is enough bandwidth in the internal network infrastructure to support the expected number of 64 kb/s voice conversations. For WANs, where bandwidth is more expensive, codecs with lower bitrates are preferable. Although enhancements are available for the G.711 codec, they are not often supported by VoIP devices.

There are two slightly different versions, known as μ-law and A-law after the algorithm each uses to perform companding. The μ-law algorithm is used in the United States, Canada and Japan, while A-law is used everywhere else in the world. This is useful to know when troubleshooting compatibility issues with systems from other countries.

Pros: Supported by many VoIP equipment vendors, simple to employ, uses very few CPU resources, mature and dependable

Cons: Large bitrate for low bandwidth links, limited to toll-quality voice, enhancements not generally supported by vendors

G.729

G.729 is an ultra-low bitrate codec that delivers a drastic reduction in bitrate and payload size for a small decrease in voice quality. Compared with G.711, it has a large bit depth, and employs a lossy compression algorithm. The result is a codec that is excellent for use in limited bandwidth networks such as WANs. Like G.711, it is one of the most widely used VoIP codecs in the industry. Also like G.711, enhancements are not generally supported.

Pros: Many VoIP equipment vendors support it, very low bit rate, tiny payload size, relatively good MOS

Cons: Slightly lower quality than traditional analog telephony, lossy compression used, limited to less-than-toll-quality voice with little support for enhancements, inadequate for Music on Hold (MoH)

G.722

As a wideband codec, G.722 delivers more natural-sounding voice than G.711 and G.729, at bitrates similar to or even lower than that of G.711. As network bandwidths continue to increase, wideband codecs like this one can vastly improve the telephony experience and are increasingly being adopted into VoIP systems. Even though it uses lossy compression, this is counterbalanced by the high sampling rate and enhanced compression algorithms used. Like G.711, it is not recommended for limited bandwidth links like WANs.

Pros: High-quality sound, excellent MOS, large audio frequency range, high sampling rate and bit-depth, excellent for Music on Hold

Cons: Not as extensively supported as other G series codecs, unsuitable for WANs

G.726

G.726 is similar to G.711 in its application, quality and attributes, providing somewhat lower quality at half the bitrate. It is primarily used for the Digital Enhanced Cordless Telecommunications (DECT) wireless telephony standard for cordless phones. It is flexible in that it supports various bitrates and qualities of voice, depending on the requirements of the application.

Pros: Near toll-quality voice at half the bitrate of G.711, ideal for wireless DECT applications, flexible in bitrate and quality settings

Cons: Supported primarily by DECT devices, generally not configurable since it is locked in by DECT manufacturers
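The flexibility in bitrate follows from how many bits G.726 spends on each sample. As a rough sketch, assuming the 8 kHz narrowband sampling rate and the 2, 3, 4 and 5 bits per sample defined for the codec, the available bitrates work out as follows:

```python
SAMPLE_RATE_HZ = 8000  # narrowband sampling rate (assumed here)

# Bits per sample -> bitrate in kb/s.
g726_rates = {bits: bits * SAMPLE_RATE_HZ // 1000 for bits in (2, 3, 4, 5)}

for bits, kbps in g726_rates.items():
    print(bits, "bits/sample ->", kbps, "kb/s")
```

The 4-bit setting gives 32 kb/s, which is the “half the bitrate of G.711” figure referred to above.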

Internet Low Bitrate Codec (iLBC)

iLBC offers excellent toll-quality voice for a very low bitrate. Moreover, it employs something called “graceful speech quality degradation” to deal with lost frames, which can occur in connection with lost or delayed IP packets. This method “fills in the gaps” using a predictive algorithm, which makes the codec remarkably forgiving in adverse network environments. The CPU load required is similar to that of G.729. iLBC is supported by many innovative VoIP device vendors and was designed for use over the internet. It is also included in WebRTC, which allows VoIP to function using a web page application or browser as an end device.

Pros: Low bitrate, good quality, relatively low CPU usage, allows interoperability with web voice applications

Cons: Not as widely supported by more traditional VoIP equipment vendors (although this is changing), no wideband option available
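The general idea of filling in gaps left by lost packets can be illustrated with a deliberately naive sketch: when a frame is missing, play a faded copy of the previous one instead of silence. This is not iLBC's actual algorithm, which is considerably more refined. The function name, the use of None to mark a lost frame, and the fade factor are all assumptions made for illustration.

```python
def conceal_losses(frames, frame_len, fade=0.5):
    """Replace lost frames (None) with a faded copy of the previous output frame."""
    output = []
    previous = [0] * frame_len          # silence until a first frame arrives
    for frame in frames:
        if frame is None:               # lost or late packet
            frame = [int(s * fade) for s in previous]
        output.append(frame)
        previous = frame                # consecutive losses fade progressively
    return output


received = [[100, -100], None, None, [50, 50]]
print(conceal_losses(received, frame_len=2))
```

Because each concealed frame is derived from the one before it, a run of consecutive losses fades out gradually rather than cutting abruptly to silence.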

Opus

Opus is one of the most versatile audio codecs available today. It can be configured with ultra-low bitrates for networks with the most limited bandwidths, and also includes fullband audio with CD-quality sound. While most VoIP devices don’t support the full range of the codec’s capabilities, they often offer many of its available features.

To fully appreciate the wide range of features this codec supports, take a look at the audio examples on the Opus website. You can adjust parameters and listen to the resulting sound for both voice and music.

Pros: Extreme flexibility, wide range of options, useful for almost all VoIP applications

Cons: Not as widely supported by more traditional VoIP equipment vendors, but this is changing

Conclusion

Between this article and its precursor on digitizing voice, we hope we have succeeded in shedding some light on different VoIP codecs and the applications they are best suited for. In the future, codecs will continue to evolve and offer even more features and flexibility, resulting in better performance in adverse network environments, while at the same time reducing resource requirements of end devices.