- Open Access
- Authors : Santosh F Bankar, Prof. N.R Kolhare
- Paper ID : IJERTV4IS050982
- Volume & Issue : Volume 04, Issue 05 (May 2015)
- DOI : http://dx.doi.org/10.17577/IJERTV4IS050982
- Published (First Online): 26-05-2015
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Residual Excited Linear Predictive Coding
Mr. Santosh F. Bankar
Electronics & Telecommunication Department, Government Engineering College, Aurangabad, India
Prof. N. R. Kolhare
Electronics & Telecommunication Department, Government Engineering College, Aurangabad, India
Abstract – In this paper we present a low bit rate voice coding technique called residual-excited linear prediction (RELP) coding. It uses a 10th-order linear predictor whose coefficients are computed with the Levinson-Durbin recursive algorithm, which provides accurate estimates of the speech parameters and is computationally efficient. In the RELP system, vocal tract modeling is done by the LPC technique, and the LPC residual signal is used as the excitation signal.
The transmission rate is reduced to 9.6 kbit/s, and the synthetic speech at this rate is quite good. As the transmission rate is lowered, the synthetic speech quality degrades very gradually. Since no pitch extraction is required, the coder is robust in any operating environment. Speech signals from male and female speakers were coded, and the results showed that the coding technique gives good speech quality with low complexity.
Keywords – Residual excited linear predictive coding.
INTRODUCTION
Speech coding has been and still is a major issue in the area of digital speech processing. Speech coding is the act of transforming the speech signal into a more compact form, which can then be transmitted at a lower bit rate or stored in considerably less memory. Speech compression is required in long distance communication, high quality speech storage, and message encryption. For example, in digital cellular technology many users need to share the available system bandwidth. Another example where speech compression is needed is digital voice storage: for a fixed amount of available memory, compression makes it possible to store longer messages.
Speech coding is a lossy type of coding, which means that the output signal does not sound exactly like the input; the input and the output signals can be distinguished from each other. Several techniques of speech coding exist, such as LPC, waveform coding, and sub-band coding. The speech signals that need to be coded are wideband signals with frequencies ranging from 0 to 8 kHz.
The LPC approach is an analysis-synthesis method of coding in which the excitation (pitch) and the vocal tract modeling are treated separately. In the LPC approach the vocal tract is modeled by a time-invariant, all-pole recursive digital filter over a short time segment (typically 10 to 30 ms). The time-varying character of speech is handled by a succession of such filters with different parameters. The excitation is modeled either as a series of pitch pulses (voiced) or as white noise (unvoiced).
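The excitation model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2nd-order coefficients, the 320-sample frame (20 ms at an assumed 16 kHz sampling rate), and the pitch period of 160 samples are hypothetical values chosen for the example, not parameters taken from this paper.

```python
import random

def synthesize(coeffs, excitation, gain=1.0):
    """All-pole synthesis filter: s(n) = gain * u(n) + sum_i a_i * s(n - i)."""
    out = []
    for n, u in enumerate(excitation):
        s = gain * u
        for i, a in enumerate(coeffs, start=1):
            if n - i >= 0:
                s += a * out[n - i]
        out.append(s)
    return out

def pulse_train(length, period):
    """Voiced excitation: one unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]

def white_noise(length, seed=0):
    """Unvoiced excitation: zero-mean Gaussian noise (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

# Hypothetical stable 2nd-order vocal tract filter (a single resonance).
coeffs = [1.3, -0.8]
frame_len = 320  # assumed: 20 ms at 16 kHz

voiced = synthesize(coeffs, pulse_train(frame_len, period=160))
unvoiced = synthesize(coeffs, white_noise(frame_len))
```

The same filter produces a periodic, buzz-like output when driven by the pulse train and a noise-like output when driven by white noise; a real coder would switch filters and excitation type from frame to frame.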
There are two types of linear prediction coding techniques: pitch-excited and residual-excited. The major difference between these two types lies in how the excitation signal for the synthesizing filter is characterized. In pitch-excited LPC coding the vocal tract, glottal flow, and radiation are represented by the prediction coefficients. Those coefficients are transmitted together with the information regarding the excitation of speech, that is, the fundamental frequency or pitch $F_0$, the voiced/unvoiced (V/UV) decision, and a gain $A$ extracted from either the residual signal or the speech input.
In residual-excited LPC coding the vocal tract is characterized in the same way as in the pitch-excited case. However, instead of the excitation features ($F_0$, V/UV, and $A$) being extracted and transmitted, the residual signal (the prediction error signal) is encoded and transmitted.
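The role of the residual as excitation can be illustrated with a small round trip: the analysis (inverse) filter removes the predictable part of the signal, and the synthesis filter driven by that residual restores it. The sketch below assumes an unquantized residual and uses hypothetical sample values and coefficients; an actual RELP coder would encode the residual at a reduced rate, so its reconstruction would only be approximate.

```python
def lpc_residual(signal, coeffs):
    """Analysis filter: e(n) = x(n) - sum_i a_i * x(n - i)."""
    residual = []
    for n, x in enumerate(signal):
        pred = sum(a * signal[n - i]
                   for i, a in enumerate(coeffs, start=1) if n - i >= 0)
        residual.append(x - pred)
    return residual

def lpc_reconstruct(residual, coeffs):
    """Synthesis filter driven by the residual: x(n) = e(n) + sum_i a_i * x(n - i)."""
    out = []
    for n, e in enumerate(residual):
        pred = sum(a * out[n - i]
                   for i, a in enumerate(coeffs, start=1) if n - i >= 0)
        out.append(e + pred)
    return out

# Hypothetical short signal and 2nd-order predictor, for illustration only.
signal = [0.0, 0.5, 0.9, 0.7, 0.1, -0.4, -0.8, -0.6]
coeffs = [0.9, -0.2]

residual = lpc_residual(signal, coeffs)
rebuilt = lpc_reconstruct(residual, coeffs)
max_diff = max(abs(a - b) for a, b in zip(signal, rebuilt))
```

With the residual passed through unchanged, `max_diff` is at the level of floating-point rounding, which is why no pitch or voicing decision is needed at the receiver.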
HUMAN SPEECH PRODUCTION
Speech coding algorithms can be made more efficient by removing irrelevant information from speech signals. In order to design a speech coding algorithm, it is necessary to know about the production of human speech, its properties, and the human perception of speech signals, so that the redundancies and the irrelevant parts of these signals can be identified.
A speech signal is produced in three stages: first of all, air flows outwards from the lungs; then the air flow is modified at the larynx; and finally further constriction of the airflow occurs by varying the shape of the vocal tract. The simplified diagram of vocal tract is shown below.
Figure 1. Vocal tract
For voiced sounds, the vocal cords vibrate. The rate at which the vocal cords vibrate determines the pitch of the voice. For certain fricative and plosive (unvoiced) sounds, the vocal cords do not vibrate but remain constantly open. The shape of the vocal tract determines the sound. As one speaks, the vocal tract changes its shape, producing different sounds.
RELP CODER
Figure 2. RELP analysis (Tx)
Predictive coding is a model-based approach. An estimate $\hat{x}(n)$ of a signal sample value $x(n)$ is predicted from past sample values using a model $P$ such that
$$\hat{x}(n) = P\big(x(n-1),\, x(n-2),\, \ldots\big)$$
Depending on the type of the predictor $P$, an estimation method can be used to find the parameters or coefficients of the model such that the mean square error is minimized. Typically, the error measure is the expected energy of the prediction error signal, given by
$$E = \mathbb{E}\{e^2(n)\}, \qquad e(n) = x(n) - \hat{x}(n)$$
where $\mathbb{E}\{\cdot\}$ denotes expectation. If the model is simply a linear weighted combination of the previous $M$ sample values, given by
$$\hat{x}(n) = \sum_{i=1}^{M} a_i\, x(n-i)$$
this formulation leads to the technique known as linear prediction (LP). The coefficients of the filter are estimated from the input so that the energy of the prediction error is minimized, using the Levinson-Durbin algorithm. This is usually performed frame by frame: the filter is updated after each analysis frame (also called a segment).
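As a small illustration of frame-wise linear prediction, the sketch below splits a signal into frames and compares the prediction error energy with and without a predictor. The test signal (a pure sinusoid), the 160-sample frame length, and the helper names are assumptions made for this example. A sinusoid is used because it is predicted exactly by the two-tap predictor $a_1 = 2\cos\omega$, $a_2 = -1$, so the only remaining error comes from the first two samples of each frame, which have no history inside the frame.

```python
import math

def split_frames(signal, frame_len):
    """Cut the signal into consecutive non-overlapping frames (a partial last frame is dropped)."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, frame_len)]

def prediction_error_energy(frame, coeffs):
    """Sum of e(n)^2 with e(n) = x(n) - sum_i a_i * x(n - i); samples before the frame count as 0."""
    energy = 0.0
    for n, x in enumerate(frame):
        pred = sum(a * frame[n - i]
                   for i, a in enumerate(coeffs, start=1) if n - i >= 0)
        energy += (x - pred) ** 2
    return energy

omega = 2 * math.pi * 0.05               # hypothetical normalized frequency
signal = [math.sin(omega * n) for n in range(480)]
frames = split_frames(signal, 160)       # assumed frame length

coeffs = [2 * math.cos(omega), -1.0]     # exact predictor for a sinusoid
predicted = [prediction_error_energy(f, coeffs) for f in frames]
unpredicted = [prediction_error_energy(f, []) for f in frames]
```

In every frame the entry of `predicted` is far smaller than the corresponding entry of `unpredicted`, which is the energy reduction that makes the residual cheaper to encode than the signal itself.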
LEVINSON DURBIN ALGORITHM
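An efficient algorithm known as the Levinson-Durbin algorithm is used to estimate the linear prediction coefficients from a given speech waveform.
Assume that the present sample of the speech is predicted by the past $M$ samples of the speech such that
$$\hat{x}(n) = a_1 x(n-1) + a_2 x(n-2) + \cdots + a_M x(n-M) = \sum_{i=1}^{M} a_i\, x(n-i)$$
where $\hat{x}(n)$ is the prediction of $x(n)$, $x(n-i)$ is the $i$-th previous sample, and $\{a_i\}$ are called the linear prediction coefficients. The error between the actual sample and the predicted one can be expressed as
$$e(n) = x(n) - \hat{x}(n) = x(n) - \sum_{i=1}^{M} a_i\, x(n-i)$$
The sum of the squared error to be minimized is expressed as
$$E = \sum_{n} e^2(n) = \sum_{n} \Big[ x(n) - \sum_{j=1}^{M} a_j\, x(n-j) \Big]^2$$
By setting to zero the derivative of $E$ with respect to $a_i$ (using the chain rule), one obtains
$$\frac{\partial E}{\partial a_i} = -2 \sum_{n} \Big[ x(n) - \sum_{j=1}^{M} a_j\, x(n-j) \Big] x(n-i) = -2 \sum_{n} e(n)\, x(n-i) = 0, \qquad i = 1, 2, \ldots, M$$
This results in $M$ equations in $M$ unknowns:
$$\sum_{j=1}^{M} a_j \sum_{n} x(n-j)\, x(n-i) = \sum_{n} x(n)\, x(n-i), \qquad i = 1, 2, \ldots, M$$
If there are $N$ samples in the sequence, indexed from $0$ to $N-1$ such that $\{x(n)\} = \{x(0), x(1), x(2), \ldots, x(N-2), x(N-1)\}$, this can be approximately expressed as the matrix equation
$$\begin{bmatrix} r(0) & r(1) & \cdots & r(M-1) \\ r(1) & r(0) & \cdots & r(M-2) \\ \vdots & \vdots & \ddots & \vdots \\ r(M-1) & r(M-2) & \cdots & r(0) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_M \end{bmatrix} = \begin{bmatrix} r(1) \\ r(2) \\ \vdots \\ r(M) \end{bmatrix}$$
where
$$r(k) = \sum_{n=k}^{N-1} x(n)\, x(n-k)$$
The sum of squared errors of the $M$-th order prediction (or simply the $M$-th order prediction error) can be rewritten as
$$E_M = \sum_{n} e^2(n) = \sum_{n} e(n) \Big[ x(n) - \sum_{i=1}^{M} a_i\, x(n-i) \Big]$$
This equation can be rewritten as
$$E_M = \sum_{n} e(n)\, x(n) - \sum_{i=1}^{M} a_i \sum_{n} e(n)\, x(n-i)$$
Because the derivative condition above gives $\sum_n e(n)\, x(n-i) = 0$, the second summation is zero. Thus, the final expression of the prediction error becomes
$$E_M = \sum_{n} e(n)\, x(n) = \sum_{n} x^2(n) - \sum_{i=1}^{M} a_i \sum_{n} x(n)\, x(n-i) = r(0) - \sum_{i=1}^{M} a_i\, r(i)$$
We now want to develop a recursive method to solve this Toeplitz matrix equation.
Initial values:
$$E_0 = r(0)$$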
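The recursion itself can be sketched as follows. This is the textbook form of the Levinson-Durbin recursion for the autocorrelation normal equations, written for the convention in which the prediction is a plus-signed weighted sum of past samples; it is not necessarily identical in notation to the authors' implementation, and the function names are chosen for this example. Starting from a zeroth-order error equal to $r(0)$, each step $m$ computes a reflection coefficient $k_m$, updates the coefficients, and scales the error by $(1 - k_m^2)$.

```python
def autocorrelation(x, max_lag):
    """r(k) = sum_{n=k}^{N-1} x(n) * x(n - k) for k = 0 .. max_lag."""
    n_samples = len(x)
    return [sum(x[n] * x[n - k] for n in range(k, n_samples))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve sum_j a_j * r(|i - j|) = r(i), i = 1 .. order, recursively.

    Returns (a, error) where a[0] is a_1, a[1] is a_2, ..., and
    error is the prediction error of the final order.
    """
    a = []
    error = r[0]                      # zeroth-order error: E_0 = r(0)
    for m in range(1, order + 1):
        if error <= 0.0:
            raise ValueError("non-positive prediction error; check the autocorrelation")
        # Reflection coefficient k_m from the order-(m-1) solution.
        acc = r[m] - sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        k = acc / error
        # a_i(m) = a_i(m-1) - k_m * a_{m-i}(m-1), and a_m(m) = k_m.
        a = [a[i] - k * a[m - 2 - i] for i in range(m - 1)] + [k]
        error *= (1.0 - k * k)        # E_m = (1 - k_m^2) * E_{m-1}
    return a, error

# Autocorrelation of a first-order process, r(k) = 0.5 ** k (illustrative values).
r = [1.0, 0.5, 0.25]
coeffs, err = levinson_durbin(r, 2)
```

For the coder described in this paper the call would use `order=10` on the autocorrelation of each analysis frame, for example `levinson_durbin(autocorrelation(frame, 10), 10)`. The recursion needs on the order of $M^2$ operations, compared with roughly $M^3$ for general matrix inversion.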
RELP ANALYSIS MEAN SQUARE ERROR:
Figure 2. Original and ½ rate reconstructed signal
For ½ rate: MSE = 0.1891, PSNR = 7.2336
Figure 3. Original and ¼ rate reconstructed signal
For ¼ rate: MSE = 0.1898, PSNR = 7.2163
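The MSE is given by
$$\mathrm{MSE} = \frac{1}{N} \sum_{n=0}^{N-1} \mathrm{err}^2(n)$$
where err, the error signal, is the difference between the original signal and the reconstructed signal, and $N$ is the number of samples.
Peak signal-to-noise ratio:
$$\mathrm{PSNR} = 10 \log_{10} \left\{ \frac{\max(A)}{\mathrm{MSE}} \right\}$$
where $A$ denotes the samples of the original signal.
The peak signal-to-noise ratio compares the peak level of the desired signal to the level of the noise, which here is the reconstruction error.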
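The two measures can be computed as in the following sketch. It follows the PSNR formula exactly as written in this section, with $\max(A)$ in the numerator; many references square the peak value instead, so results are only comparable across papers when the original signal is normalized to a peak of 1. The sample values are hypothetical and are not the data behind the reported figures.

```python
import math

def mse(original, reconstructed):
    """Mean square error: (1/N) * sum of squared sample differences."""
    if len(original) != len(reconstructed):
        raise ValueError("signals must have the same length")
    n = len(original)
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / n

def psnr(original, reconstructed):
    """10 * log10(max(A) / MSE), following the formula as given in the text."""
    return 10.0 * math.log10(max(original) / mse(original, reconstructed))

# Hypothetical samples: every reconstructed value is off by 0.1.
original = [0.0, 0.5, 1.0, 0.5]
reconstructed = [0.1, 0.4, 0.9, 0.6]

error_power = mse(original, reconstructed)
quality_db = psnr(original, reconstructed)
```

Because the result is ten times a base-10 logarithm of a ratio, the PSNR value is expressed in decibels.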
CONCLUSION
We carried out speech compression at ½ rate and ¼ rate with the RELP coder. The quality of the compressed signal was nearly the same in both cases, and so were the MSE (0.1891 versus 0.1898) and the PSNR (7.2336 versus 7.2163). We therefore conclude that the speech quality remains very good even at ¼ rate.