Residual Excited Linear Predictive Coding

Santosh F Bankar; Prof. N.R Kolhare

doi:10.17577/IJERTV4IS050982

Volume 04, Issue 05 (May 2015)


Open Access
Paper ID : IJERTV4IS050982
Published (First Online): 26-05-2015
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Mr. Santosh F. Bankar

Electronics & Telecommunication department Government Engineering College Aurangabad, India

Prof. N. R Kolhare

Electronics & Telecommunication department Government Engineering College Aurangabad, India

Abstract – In this paper we present a low bit rate voice coding technique called residual-excited linear prediction (RELP) coding. It uses the Levinson-Durbin recursive algorithm with a 10th-order predictor, which provides accurate estimates of the speech parameters and is relatively efficient to compute. In the RELP system, the vocal tract is modeled by the LPC technique, and the LPC residual signal is used as the excitation signal.

The transmission rate is reduced to 9.6 kbit/s, and the synthetic speech at this rate is quite good. As the transmission rate is lowered, the synthetic speech quality degrades very gradually. Since no pitch extraction is required, the coder is robust in any operating environment. Speech signals of male and female speakers were coded, and the results showed that the coding technique gives good speech quality with low complexity.

Keywords – Residual excited linear predictive coding.

INTRODUCTION

Speech coding has been, and still is, a major issue in the area of digital speech processing. Speech coding is the act of transforming the speech signal into a more compact form, which can then be transmitted or stored using considerably less memory. Speech compression is required in long-distance communication, high-quality speech storage, and message encryption. For example, in digital cellular technology, compression allows many users to share the available system. Another example where speech compression is needed is digital voice storage: for a fixed amount of available memory, compression makes it possible to store longer messages.

Speech coding is a lossy type of coding, which means that the output signal does not sound exactly like the input; the input and output signals can be distinguished from each other. Several speech coding techniques exist, such as LPC, waveform coding, and sub-band coding. The speech signals that need to be coded are wideband signals with frequencies ranging from 0 to 8 kHz.

The LPC approach is an analysis-synthesis method of coding in which the excitation (pitch) and the vocal tract modeling are treated separately. In the LPC approach, the vocal tract is modeled by a time-invariant, all-pole recursive digital filter over a short time segment (typically 10 to 30 ms). The time-variant character of speech is handled by a succession of such filters with different parameters. The excitation is modeled either as a series of pitch pulses (voiced) or as white noise (unvoiced).
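As a rough illustration of the source-filter model described above, the following Python sketch drives an all-pole recursive filter with either a pulse train (voiced) or white noise (unvoiced) over one short segment. The sampling rate, frame length, pitch period, and the two filter coefficients are illustrative assumptions, not values taken from this paper.

```python
import random


def all_pole_filter(excitation, a):
    """All-pole synthesis: s(n) = u(n) + sum_{i=1..M} a[i-1] * s(n-i)."""
    s = []
    for n, u in enumerate(excitation):
        acc = u
        for i, a_i in enumerate(a, start=1):
            if n - i >= 0:
                acc += a_i * s[n - i]
        s.append(acc)
    return s


def pulse_train(length, period):
    """Voiced excitation: one unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def white_noise(length, seed=0):
    """Unvoiced excitation: seeded Gaussian white noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


# All constants below are assumptions made for this sketch.
FS = 16000                          # sampling rate in Hz (covers a 0 to 8 kHz band)
FRAME_MS = 20                       # segment length, inside the 10 to 30 ms range
FRAME_LEN = FS * FRAME_MS // 1000   # 320 samples
PITCH_PERIOD = 160                  # samples between pitch pulses (100 Hz pitch)
A_DEMO = [1.3, -0.8]                # hypothetical stable 2nd-order coefficients

voiced_frame = all_pole_filter(pulse_train(FRAME_LEN, PITCH_PERIOD), A_DEMO)
unvoiced_frame = all_pole_filter(white_noise(FRAME_LEN), A_DEMO)
```

A real coder would use a different coefficient set for every segment; a single fixed set is used here only to keep the sketch short.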

There are two types of linear prediction coding techniques: pitch-excited and residual-excited. The major difference between these two types lies in how the excitation signal for the synthesizing filter is characterized. In pitch-excited LPC coding, the vocal tract, glottal flow, and radiation are represented by the prediction coefficients. Those coefficients are transmitted together with the information regarding the excitation of the speech, that is, the fundamental frequency or pitch F0, the voiced/unvoiced (V/UV) decision, and a gain A extracted from either the residual signal or the speech input.

In residual-excited LPC coding, the vocal tract is characterized in the same way as in the pitch-excited case. However, instead of the excitation features (F0, V/UV, and A) being extracted and transmitted, the residual signal (or prediction error signal) is encoded and transmitted.
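The role of the residual can be sketched in a few lines of Python. The analysis side passes the speech through the inverse (prediction error) filter to obtain the residual, and the synthesis side feeds that residual back through the all-pole filter. The coefficients and the test signal are hypothetical, and the residual is left unquantized here, which is an assumption of this sketch: an actual RELP coder encodes the residual before transmission, so its reconstruction is only approximate.

```python
import math


def lpc_residual(x, a):
    """Analysis: e(n) = x(n) - sum_{i=1..M} a[i-1] * x(n-i)."""
    residual = []
    for n in range(len(x)):
        predicted = sum(a_i * x[n - i]
                        for i, a_i in enumerate(a, start=1) if n - i >= 0)
        residual.append(x[n] - predicted)
    return residual


def lpc_synthesize(residual, a):
    """Synthesis: y(n) = e(n) + sum_{i=1..M} a[i-1] * y(n-i)."""
    y = []
    for n, e in enumerate(residual):
        predicted = sum(a_i * y[n - i]
                        for i, a_i in enumerate(a, start=1) if n - i >= 0)
        y.append(e + predicted)
    return y


# Hypothetical coefficients and test signal, chosen only for illustration.
a_demo = [0.9, -0.2]
speech = [math.sin(0.3 * n) for n in range(50)]

residual = lpc_residual(speech, a_demo)
rebuilt = lpc_synthesize(residual, a_demo)
max_abs_diff = max(abs(s - r) for s, r in zip(speech, rebuilt))
```

With an unquantized residual the synthesis filter undoes the analysis filter up to rounding error, which is why no pitch or voicing decision is needed at the receiver.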
HUMAN SPEECH PRODUCTION

Speech coding algorithms can be made more efficient by removing irrelevant information from speech signals. In order to design a speech coding algorithm, it is necessary to understand the production of human speech, its properties, and the human perception of speech signals, so that the redundancies and the irrelevant parts of these signals can be identified.

A speech signal is produced in three stages: first, air flows outwards from the lungs; then the air flow is modified at the larynx; and finally, further constriction of the airflow occurs by varying the shape of the vocal tract. A simplified diagram of the vocal tract is shown below.

Figure 1. Vocal tract

For voiced sounds, the vocal cords vibrate, and the rate at which they vibrate determines the pitch of the voice. For certain fricative and plosive (unvoiced) sounds, the vocal cords do not vibrate but remain constantly open. The shape of the vocal tract determines the sound: as one speaks, the vocal tract changes its shape, producing different sounds.
RELP CODER

Figure 2. RELP analysis (Tx)

Predictive coding is a model-based approach. It assumes that an estimate $\hat{x}(n)$ of a signal sample $x(n)$ can be predicted from past sample values using a model $P$, such that

$$\hat{x}(n) = P\{x(n-1),\, x(n-2),\, \ldots\}$$

Depending on the type of the predictor $P$, there is an estimation method that can be used to find the parameters or coefficients of the model such that the mean square error is minimized. Typically, the error measure is the expectation of the energy of the prediction error signal $e(n) = x(n) - \hat{x}(n)$, given by

$$\operatorname{E}\{e^2(n)\} = \operatorname{E}\{[x(n) - \hat{x}(n)]^2\}$$

where $\operatorname{E}\{\cdot\}$ denotes expectation. If the model is simply a linearly weighted combination of the previous $M$ sample values, given by

$$\hat{x}(n) = \sum_{i=1}^{M} a_i\, x(n-i)$$

then this formulation leads to the technique known as linear prediction (LP). The filter coefficients are estimated from the input so that the energy of the prediction error is minimized, using the Levinson-Durbin algorithm. This is usually performed in frames, and the filter is updated after each analysis frame (also called a segment).
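The frame-wise estimation described above can be sketched as follows. The sketch splits a signal into analysis frames and, for a first-order predictor only, picks the coefficient that minimizes the prediction error energy of each frame (the ratio of the lag-1 to the lag-0 autocorrelation, with samples outside the frame taken as zero). The frame length, the test signal, and the helper names are assumptions for illustration; the coder in this paper uses a 10th-order predictor.

```python
import math


def split_frames(x, frame_len):
    """Split x into consecutive non-overlapping frames, dropping a short tail."""
    return [x[s:s + frame_len]
            for s in range(0, len(x) - frame_len + 1, frame_len)]


def prediction_error_energy(frame, a):
    """Sum of e(n)^2 with e(n) = x(n) - sum_i a[i-1] * x(n-i).

    Samples outside the frame are treated as zero, and the sum runs until
    the predictor memory has emptied.
    """
    n_samples, order = len(frame), len(a)

    def sample(n):
        return frame[n] if 0 <= n < n_samples else 0.0

    energy = 0.0
    for n in range(n_samples + order):
        predicted = sum(a[i - 1] * sample(n - i) for i in range(1, order + 1))
        energy += (sample(n) - predicted) ** 2
    return energy


def first_order_coefficient(frame):
    """Energy-minimizing coefficient for M = 1: a_1 = r(1) / r(0)."""
    r0 = sum(v * v for v in frame)
    r1 = sum(frame[n] * frame[n - 1] for n in range(1, len(frame)))
    return r1 / r0


# Hypothetical test signal and frame length.
signal = [math.sin(0.2 * n) for n in range(64)]
frames = split_frames(signal, 16)

for index, frame in enumerate(frames):
    a1 = first_order_coefficient(frame)
    print(index, round(a1, 4), round(prediction_error_energy(frame, [a1]), 4))
```

Each frame gets its own coefficient, which is how a succession of filters tracks the time-varying character of speech.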
LEVINSON DURBIN ALGORITHM

An efficient algorithm, known as the Levinson-Durbin algorithm, is used to estimate the linear prediction coefficients from a given speech waveform.

Assume that the present sample of the speech is predicted from the past $M$ samples, such that

$$\hat{x}(n) = \sum_{i=1}^{M} a_i\, x(n-i)$$

where $\hat{x}(n)$ is the prediction of $x(n)$, $x(n-i)$ is the sample $i$ steps in the past, and $\{a_i\}$ are called the linear prediction coefficients. The error between the actual sample and the predicted one can be expressed as

$$e(n) = x(n) - \hat{x}(n) = x(n) - \sum_{i=1}^{M} a_i\, x(n-i)$$

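The sum of the squared error to be minimized is expressed as

$$E = \sum_{n} e^2(n) = \sum_{n} \left[ x(n) - \sum_{i=1}^{M} a_i\, x(n-i) \right]^2$$

Setting the derivative of $E$ with respect to each coefficient $a_j$ to zero (using the chain rule), one obtains

$$\frac{\partial E}{\partial a_j} = -2 \sum_{n} \left[ x(n) - \sum_{i=1}^{M} a_i\, x(n-i) \right] x(n-j) = 0, \qquad j = 1, 2, \ldots, M$$

This results in $M$ equations in $M$ unknowns:

$$\sum_{i=1}^{M} a_i \sum_{n} x(n-i)\, x(n-j) = \sum_{n} x(n)\, x(n-j), \qquad j = 1, 2, \ldots, M$$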

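If there are $N$ samples in the sequence, indexed from 0 to $N-1$, such that $\{x(n)\} = \{x(0), x(1), x(2), \ldots, x(N-2), x(N-1)\}$, these $M$ equations can be approximately expressed as the matrix equation

$$\begin{bmatrix} r(0) & r(1) & \cdots & r(M-1) \\ r(1) & r(0) & \cdots & r(M-2) \\ \vdots & \vdots & \ddots & \vdots \\ r(M-1) & r(M-2) & \cdots & r(0) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_M \end{bmatrix} = \begin{bmatrix} r(1) \\ r(2) \\ \vdots \\ r(M) \end{bmatrix}$$

where

$$r(k) = \sum_{n=k}^{N-1} x(n)\, x(n-k)$$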

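The sum of squared errors of the $M$-th order prediction (or simply the $M$-th order prediction error) can be rewritten as

$$E_M = \sum_{n} e^2(n) = \sum_{n} e(n) \left[ x(n) - \sum_{i=1}^{M} a_i\, x(n-i) \right]$$

where $e(n)$ is the prediction error. This can be rewritten as

$$E_M = \sum_{n} e(n)\, x(n) - \sum_{i=1}^{M} a_i \sum_{n} e(n)\, x(n-i)$$

Because the derivative of the squared error with respect to each coefficient was set to zero, $\sum_{n} e(n)\, x(n-i) = 0$ for $i = 1, \ldots, M$, so the second summation is zero. Thus, with $r(k) = \sum_{n} x(n)\, x(n-k)$, the final expression of the prediction error becomes

$$E_M = \sum_{n} e(n)\, x(n) = r(0) - \sum_{i=1}^{M} a_i\, r(i)$$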

We now want to develop a recursive method to solve the Toeplitz matrix equation.

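Initial values: in the standard form of the recursion, the zeroth-order prediction error is the energy of the signal,

$$E_0 = r(0)$$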
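The recursion itself can be sketched in Python as below. This is a minimal sketch of the standard Levinson-Durbin recursion for the Toeplitz system, written for the convention in which the present sample is predicted as a weighted sum of the past M samples; it is not necessarily the exact implementation used by the authors. The function names, the toy autocorrelation values, and the synthetic frame are assumptions for illustration; only the order of 10 comes from the abstract.

```python
import math
import random


def autocorrelation(x, max_lag):
    """r(k) = sum_{n=k}^{N-1} x(n) * x(n-k) for k = 0..max_lag."""
    n_samples = len(x)
    return [sum(x[n] * x[n - k] for n in range(k, n_samples))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve sum_i a_i * r(|i-j|) = r(j), j = 1..order, recursively.

    Returns ([a_1, ..., a_order], final prediction error E_order).
    """
    a = [0.0] * (order + 1)   # a[0] is unused so that a[i] matches a_i
    err = r[0]                # initial value: E_0 = r(0)
    for m in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("prediction error must stay positive")
        # Reflection coefficient k_m from the order m-1 solution.
        k = (r[m] - sum(a[i] * r[m - i] for i in range(1, m))) / err
        prev = a[:]
        a[m] = k
        for i in range(1, m):
            a[i] = prev[i] - k * prev[m - i]
        err *= 1.0 - k * k    # E_m = (1 - k_m^2) * E_{m-1}
    return a[1:], err


# Toy example with made-up autocorrelation values.
a_toy, err_toy = levinson_durbin([1.0, 0.5, 0.2], 2)

# 10th-order analysis of one synthetic frame (hypothetical test signal).
rng = random.Random(0)
frame = [math.sin(0.3 * n) + 0.1 * rng.gauss(0.0, 1.0) for n in range(240)]
ORDER = 10
r_frame = autocorrelation(frame, ORDER)
a_frame, err_frame = levinson_durbin(r_frame, ORDER)
```

The recursion needs on the order of M squared operations, instead of the M cubed of a general linear solver, because it exploits the Toeplitz structure of the autocorrelation matrix.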
RELP ANALYSIS MEAN SQUARE ERROR:

Figure 2. Original and ½ rate constructed signal

For ½ rate: MSE = 0.1891, PSNR = 7.2336 dB

Figure 3. Original and ¼ rate constructed signal

For ¼ rate: MSE = 0.1898, PSNR = 7.2163 dB

The MSE is given by

$$\mathrm{MSE} = \frac{1}{N} \sum_{n=0}^{N-1} \mathrm{err}^2(n)$$

where err, the error signal, is the difference between the original signal and the reconstructed signal, and $N$ is the number of samples.

Peak Signal-to-Noise Ratio:

$$\mathrm{PSNR} = 10 \log_{10} \left\{ \frac{\max(A)}{\mathrm{MSE}} \right\}$$

where $A$ denotes the samples of the original signal.

The peak signal-to-noise ratio compares the level of a desired signal to the level of background noise.
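The two quality measures can be computed with a short Python sketch. The PSNR function follows the formula as printed in this paper, with max(A) in the numerator; note that many references define PSNR with the squared peak value instead, so the choice here is an assumption tied to the text. The function names and the four-sample test signals are hypothetical.

```python
import math


def mse(original, reconstructed):
    """MSE = (1/N) * sum of err(n)^2, with err = original - reconstructed."""
    if len(original) != len(reconstructed):
        raise ValueError("signals must have the same length")
    n_samples = len(original)
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / n_samples


def psnr_db(original, reconstructed):
    """PSNR = 10 * log10(max(A) / MSE), following the formula in the text."""
    return 10.0 * math.log10(max(original) / mse(original, reconstructed))


# Hypothetical signals: the reconstruction has half the original amplitude.
original = [0.0, 1.0, 0.0, -1.0]
reconstructed = [0.0, 0.5, 0.0, -0.5]

demo_mse = mse(original, reconstructed)       # (0.25 + 0.25) / 4 = 0.125
demo_psnr = psnr_db(original, reconstructed)  # 10 * log10(1 / 0.125)
```

For a signal normalized to a peak of 1, the printed formula and the squared-peak definition give the same value.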
CONCLUSION

We carried out speech compression at ½ rate and ¼ rate using RELP. The quality of the compressed signal obtained was nearly the same in both cases, and the MSE and PSNR were also nearly the same. We therefore conclude that the speech quality produced remains very good even at ¼ rate compression.
