LPC10 Speech Coding

Problem Description

Some applications need to transmit speech over a medium with limited bandwidth. When the limitation is severe, the bit rate of the coded speech must be as low as possible, often sacrificing quality while remaining intelligible.

Past Solutions

Various voice coding schemes exist to fit many different applications. The schemes vary by bit rate and perceived quality. Some voice coding standards are surveyed in Realtime speech and voice transmission on the Internet. LPC10 is not mentioned there but remains one of the ultra-low bit rate standards. Many variations of LPC voice coding exist. Some notable variations change the bit rate based on the type of speech.

Presented Solution

This project attempts to build a near-real-time LPC10 encoder with a bit rate of about 2.4 kbps to solve the problem. A few alterations were made in an attempt to be efficient and perhaps improve speech quality.
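The target bit rate follows directly from the frame rate and the frame size. The short sketch below (in Python rather than the project's Matlab, purely for illustration) does the arithmetic with the 40 frames per second and 56 bits per frame described under Changes from the Past; the function and variable names are hypothetical.

```python
# Frame parameters taken from the "Changes from the Past" section.
FRAMES_PER_SECOND = 40
BITS_PER_FRAME = 56


def bit_rate_bps(frames_per_second, bits_per_frame):
    """Return the coded bit rate in bits per second."""
    return frames_per_second * bits_per_frame


project_rate = bit_rate_bps(FRAMES_PER_SECOND, BITS_PER_FRAME)
bytes_per_frame = BITS_PER_FRAME // 8  # 56 bits pack into whole bytes

print(project_rate, "bps,", bytes_per_frame, "bytes per frame")
```

With these numbers the rate works out to 2240 bps, a little under the nominal 2.4 kbps, and each frame occupies exactly 7 bytes.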

Since I could not find one concrete standard for LPC10 coding, I made a few alterations to what was explained in class, as well as to the "standard" outlined in Introduction to Data Compression by Khalid Sayood. LPC Vocoder GUI ver. 1.0 gave a lot of insight into how data should be windowed and overlapped, as well as into many of the computations involved in coding voice. It did not include any help with automatic pitch detection, voiced/unvoiced/silence detection, "real time" voice acquisition, quantizing data for network transmission, or the network transmission itself.

Quantizing the data for transmission, transmitting the data (and the reverse process), and pitch detection have not been implemented. Please see the code and other sections for more details on implementation.

Initially wavrecord was used to record each window. This was a very bad idea, since each start and stop of the recording introduces a lot of distortion or delay, which made the aggregated signal choppy. To reduce the distortion and get closer to real time, I chose to use an audiorecorder, so that audio can be recorded while I process and only needs to be paused when copying new samples from the buffer. This still causes some choppiness, but it is much less noticeable. Making the application outside of Matlab would allow non-stop recording for the duration of use.

[h,o] = Speak(5); is the syntax to run the program. 5 is the number of seconds to record the voice. h is the synthesized signal and o is the original signal.

Simulation Results

The LPC10 coded speech is intelligible but sounds very robotic. It will likely sound worse after quantizing and unquantizing the data (instead of using the data directly as the program currently does). Changing the pitch used for synthesis from the hard-coded value of 100 Hz to the pitch detected for each window should improve the quality.
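Pitch detection is not implemented in the project, so the following is only a sketch of one common approach, not the project's method: pick the lag with the largest autocorrelation inside a plausible pitch range. It is written in Python rather than Matlab for illustration. The 8 kHz sample rate, the 50 to 400 Hz search range, and all names are assumptions.

```python
import math


def estimate_pitch_hz(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate pitch as sample_rate / lag, using the lag in the
    [f_min, f_max] range that maximizes the raw autocorrelation."""
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation over the samples that overlap.
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


SAMPLE_RATE = 8000  # assumed; the report does not state the sample rate
# Synthetic 25 ms test frame: a pure 100 Hz tone.
test_frame = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(200)]
pitch = estimate_pitch_hz(test_frame, SAMPLE_RATE)
print(pitch)
```

Real speech is much less clean than a pure tone, so a practical detector would likely need pre-filtering and some smoothing of the estimates across frames.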

Below is an example graph taken from a test run of the program. The speech synthesized from the LPC10 parameters gives a pretty good match to the actual speech. Note the flat-line silence when I was not talking. Even though I detect whether a window is voiced or unvoiced, feeding random noise into the filter when the window is unvoiced is not implemented, but it should not be hard.
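As a rough idea of what the missing unvoiced path could look like, the sketch below (Python rather than Matlab, for illustration only) builds one frame of excitation per frame type and runs it through an all-pole synthesis filter. The frame type labels, the function names, the sign convention of the coefficients, and the pitch period of 80 samples (100 Hz at an assumed 8 kHz sample rate) are all assumptions.

```python
import random


def make_excitation(frame_type, length, pitch_period=80, rng=None):
    """Build one frame of excitation for the LPC synthesis filter."""
    rng = rng or random.Random(0)
    if frame_type == "voiced":
        # Impulse train, one pulse per pitch period.
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]
    if frame_type == "unvoiced":
        # White Gaussian noise.
        return [rng.gauss(0.0, 1.0) for _ in range(length)]
    return [0.0] * length  # silence


def synthesize(excitation, lpc, gain=1.0):
    """All-pole filter: y[n] = gain * e[n] - sum_k lpc[k] * y[n - 1 - k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = gain * e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


# Usage: one unvoiced frame through a toy one-pole filter.
rng = random.Random(0)
noise_frame = synthesize(make_excitation("unvoiced", 200, rng=rng), [-0.9], gain=0.1)
```

This sketch restarts the impulse train and the filter state at every frame; a real synthesizer would carry both across frame boundaries to avoid clicks.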

Voice Examples

Changes from the Past

I used 40 frames per second (each 25 ms long), and instead of representing each frame with 62 or 54 bits, I chose 56 bits because it is a whole multiple of eight (7 bytes). I chose to overlap my windows by 50%. Silence is detected when the energy falls below a threshold, which was found through experimentation in my lab environment. The voiced/unvoiced decision is based solely on zero crossings; this threshold was also found via experimentation. My quantization differs from the "standard" by allocating an extra bit to the two most significant LPC coefficients as well as one more bit to the 9th coefficient.
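The silence and voiced/unvoiced decisions described above can be sketched as follows, in Python rather than Matlab for illustration. The function names, the frame labels, and the threshold values are hypothetical; the report only says the thresholds were tuned experimentally.

```python
def frame_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossings(frame):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def classify_frame(frame, energy_threshold, zc_threshold):
    """Label a frame as 'silence', 'unvoiced', or 'voiced'."""
    if frame_energy(frame) < energy_threshold:
        return "silence"
    if zero_crossings(frame) > zc_threshold:
        # Noise-like sounds change sign far more often than voiced speech.
        return "unvoiced"
    return "voiced"


# Placeholder thresholds; real values depend on the microphone and room.
ENERGY_THRESHOLD = 0.01
ZC_THRESHOLD = 50

print(classify_frame([0.0] * 200, ENERGY_THRESHOLD, ZC_THRESHOLD))
```

Checking energy first means a quiet frame is never passed to the zero-crossing test, where low-level background noise could otherwise be mistaken for unvoiced speech.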

Appendix

Source Code