- Open Access
- Authors : Dr. B. Kirubagari, T. Akilan
- Paper ID : IJERTCONV3IS16035
- Volume & Issue : TITCON – 2015 (Volume 3 – Issue 16)
- Published (First Online): 30-07-2018
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Lossy Coding of Speech Signals using Subband Coding
Dr. B. Kirubagari
Department of Computer Science and Engineering, Annamalai University
T. Akilan
Department of Computer Science and Engineering, Annamalai University
Abstract - Speech coding is a methodology for representing a digitized speech signal using as few bits as possible while maintaining quality. In ubiquitous environments, analysis and encryption of speech play a critical part in various acoustic-based coding systems. In this work, a new speech coding technique using subband coding is proposed for reducing the memory occupied by speech signals. The amplitude values of the input are extracted after pre-processing, decomposition, and windowing, and the values are transformed into the frequency domain by applying the discrete cosine transform (DCT). The initial 20 coefficients, which hold the maximum content of the speech features, are separated and coded using subband coding. To reconstruct the speech signal, the signal is transformed back into the time domain by applying the inverse discrete cosine transform (IDCT). The experiments are conducted using a speech signal sampled at 8 kHz with 16 bits per sample. The signal-to-noise ratio (SNR) demonstrates the effectiveness of the model used.
Keywords: DCT, SNR, Subband, Quantization, Windowing.
INTRODUCTION
Speech coding is the art of creating a minimally redundant representation of the speech signal that can be efficiently transmitted or stored in digital media, and of decoding the signal with the best possible perceptual quality [1]. Today, speech coders have become essential components in telecommunications and in the multimedia infrastructure. Like many other signals, however, a sampled speech signal contains a great deal of information that is either redundant (nonzero mutual information between successive samples in the signal) or perceptually irrelevant (information that is not perceived by human listeners). Most telecommunications coders are lossy, implying that the synthesized speech is perceptually similar to the original but may be physically dissimilar.
A speech decoder receives coded frames and synthesizes reconstructed speech. Standards typically dictate the input-output relationships of both coder and decoder. Speech coders differ fundamentally in bit rate (measured in bits per sample or bits per second), complexity (measured in operations per second), delay (measured in milliseconds between recording and playback), and perceptual quality of the synthesized speech. Narrowband (NB) coding refers to coding of speech signals whose bandwidth is less than 4 kHz (8 kHz sampling rate), while wideband (WB) coding refers to coding of 7-kHz-bandwidth signals (14-16 kHz sampling rate). Subband coders are broadly utilized for high-quality audio coding. The advantage of subband coding is that each band can be coded differently and that the coding error in each band can be controlled in relation to human perceptual characteristics.
Figure 1: The Uncoded Speech Signal
PROPOSED METHODOLOGY
Speech compression
Speech compression may vary the amount of compression applied to the data according to the sampling rate utilized. This gives different levels of system complexity and compressed speech quality. The recorded waveform can be compressed and transmitted with or without loss. The digital audio data is handled through mixing, filtering, and equalization, and the speech signal is fed into an encoder that uses fewer bits than the original audio data bit rate [2]. This reduces the transmission bandwidth of digital audio streams and also reduces the storage size of audio files. Compression can be classified into lossy and lossless: lossy compression is transparent to human perception, while lossless compression achieves a compression factor of about 6 to 1. The uncoded speech signal is shown in figure 1, and figure 2 shows the block diagram of speech coding using subband coding.
Figure 2: Block diagram of Speech coding using Subband Coding (Original Speech Signal → Decomposition → Windowing → DCT → Subband Codec / Quantization → IDCT → Decompressed Speech Signal)
Subband Codec
The procedure of breaking the input speech signal into subsignals using band-pass filters and coding each signal independently is called subband coding. To keep the number of samples to be coded at a minimum, the sampling rate for the signal in each band is reduced by decimation. Since the band-pass filters are not ideal, there is some overlap between neighbouring bands, and aliasing occurs during decimation. Ignoring the distortion or noise due to compression, quadrature mirror filter (QMF) banks permit the aliasing that arises from filtering and subsampling at the encoder to be cancelled at the decoder. The codecs used in each band can be PCM, ADPCM, or even an analysis-by-synthesis method. The advantage of subband coding is that each band can be coded differently and that the coding error in each band can be controlled in relation to human perceptual characteristics.
Transform coding methods were initially applied to still images but were later explored for speech [3,4]. The essential principle is that a block of speech samples is operated on by a discrete unitary transform, and the resulting transform coefficients are quantized and coded for transmission to the recipient. Low bit rates and good performance can be achieved because more bits can be allotted to the perceptually important coefficients. For well-designed transforms, many coefficients need not be coded at all, but are simply discarded, and acceptable performance is still attained. The distinction between transform and filter bank methods is somewhat blurred, and the choice between a filter bank implementation and a transform technique may simply be a design choice. The subband encoder and decoder are shown in figures 3 and 4.
Figure 3: Subband Encoder
Figure 4: Subband Decoder
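To make the analysis/synthesis structure of figures 3 and 4 concrete, the following minimal sketch (our illustration, not the paper's implementation) splits a signal into two decimated subbands with the simplest two-tap (Haar) QMF pair and reconstructs it exactly; a practical coder would use longer Johnston-type QMF filters, more bands, and per-band quantization before synthesis.

```python
import numpy as np

def qmf_analysis(x):
    """Split a signal into two decimated subbands (Haar QMF pair)."""
    x = x[: len(x) // 2 * 2]                   # even length for 2:1 decimation
    low = (x[0::2] + x[1::2]) / np.sqrt(2)     # lowpass filter + downsample
    high = (x[0::2] - x[1::2]) / np.sqrt(2)    # highpass filter + downsample
    return low, high

def qmf_synthesis(low, high):
    """Recombine the two subbands; aliasing cancels exactly for this pair."""
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / np.sqrt(2)        # upsample + synthesis filtering
    x[1::2] = (low - high) / np.sqrt(2)
    return x

x = np.random.randn(16)
low, high = qmf_analysis(x)                    # each band at half the sample rate
assert np.allclose(qmf_synthesis(low, high), x)  # perfect reconstruction
```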
Decomposition
Wavelets decompose a signal into different resolutions or frequency bands. Signal compression is based on the concept that a small number of approximation coefficients, together with some of the detail coefficients, can represent the signal components accurately. Speech is first analysed to locate the voiced and unvoiced parts of the signal [5]. Decomposition of the voiced part into periodic and aperiodic components is then accomplished by first identifying the frequency regions of harmonic and noise components in the spectral domain. The signal corresponding to the noise regions is used as a first approximation to the aperiodic component.
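The paper does not specify a wavelet family or thresholding rule, so the sketch below is illustrative only: it assumes the PyWavelets package, with 'db4' and a hard threshold chosen arbitrarily, to show how a three-level decomposition with detail-coefficient thresholding might look.

```python
import numpy as np
import pywt  # PyWavelets

x = np.random.randn(1024)                     # stand-in for a speech frame
coeffs = pywt.wavedec(x, 'db4', level=3)      # [cA3, cD3, cD2, cD1]

# Keep all approximation coefficients; hard-threshold the detail bands
# ('db4' and the threshold rule are assumptions, not the paper's values).
thresh = 0.5 * np.max(np.abs(coeffs[-1]))
coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode='hard') for c in coeffs[1:]]

x_hat = pywt.waverec(coeffs, 'db4')           # approximate reconstruction
```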
Windowing
Windows are applied to raw speech frames in order to reduce the effect of spectral leakage. For most phonemes the properties of the speech signal remain invariant for a short period of time (5-100 ms); hence, over a short window of time, traditional signal processing methods can be applied relatively successfully. A large portion of speech processing is in fact carried out this way, by taking short (possibly overlapping) windows and processing them [6]. The short windowed segment of signal is called a frame. A long signal (of speech, for instance, or an ideal impulse response) is multiplied by a window function of finite length, giving a finite-length, (normally) weighted version of the original signal. In speech processing the shape of the window function is not that crucial, but usually some smooth window such as the Hanning, Hamming, or triangular window is used. In this proposed work we use the Hamming window, which is designed to minimize the maximum (nearest) side lobe, giving it a height of about one-fifth that of the Hanning window's. The window is near zero at the edges and rises gradually to 1 in the middle; when it is used, the edges of the signal are de-emphasised and edge effects are reduced. It is important to use a Hamming (or the similar Hann) window in some kinds of analysis, especially frequency-domain methods. The strategy of using overlapping short-time signals and forming the reconstruction by summing partially overlapping frames is called the overlap-add method.
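A minimal sketch of this framing, Hamming windowing, and overlap-add reconstruction follows; the frame length and hop size are assumed values for illustration, not parameters stated in the paper.

```python
import numpy as np

def overlap_add(x, frame_len=256, hop=128):
    """Hamming-window 50%-overlapped frames, then rebuild the signal by
    overlap-add, normalising by the summed window envelope."""
    w = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    y = np.zeros(len(x))
    wsum = np.zeros(len(x))
    for i in range(n_frames):
        s = i * hop
        frame = x[s:s + frame_len] * w        # analysis: frame + window
        # ... per-frame processing (DCT, coding, ...) would go here ...
        y[s:s + frame_len] += frame           # synthesis: overlap-add
        wsum[s:s + frame_len] += w
    nz = wsum > 1e-8
    y[nz] /= wsum[nz]                         # undo the window weighting
    return y

x = np.random.randn(8000)                     # one second of signal at 8 kHz
assert np.allclose(overlap_add(x)[256:-256], x[256:-256])
```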
DCT/IDCT
The input speech signal is divided into smaller frames and arranged in matrix form, and the DCT operation is performed on the matrix. The resulting elements are sorted in their matrix form to find the components and their indices [7,8]. Here 90 values of the speech signal are taken and fed into further processing to obtain good perception. The elements are arranged in descending order; after this arrangement, the higher values are chosen and a threshold value is obtained. The coefficients below the threshold are discarded, reducing the size of the signal, which results in compression.
For a frame $x(n)$, $n = 0, 1, \ldots, N-1$, the DCT coefficients are
$$X(k) = w(k) \sum_{n=0}^{N-1} x(n) \cos\left[\frac{\pi (2n+1) k}{2N}\right], \qquad k = 0, 1, \ldots, N-1,$$
where $w(0) = \sqrt{1/N}$ and $w(k) = \sqrt{2/N}$ for $k \geq 1$. The original form of the data is obtained back through the reconstruction process: the IDCT operation is performed on the retained coefficients, and in this manner the signal is reconstructed.
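As a sketch of the coefficient-selection step described above, the snippet below keeps the 90 largest-magnitude coefficients (the count the text indicates), with SciPy's orthonormal DCT-II standing in for whatever implementation the authors used.

```python
import numpy as np
from scipy.fft import dct, idct

def dct_compress(frame, keep=90):
    """Zero all but the `keep` largest-magnitude DCT coefficients
    (the thresholding step described above), then invert."""
    X = dct(frame, norm='ortho')              # DCT-II of the frame
    order = np.argsort(np.abs(X))[::-1]       # indices sorted by magnitude
    X[order[keep:]] = 0.0                     # discard below-threshold terms
    return idct(X, norm='ortho')              # IDCT reconstruction

frame = np.random.randn(256)
approx = dct_compress(frame, keep=90)         # compressed, reconstructed frame
```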
Quantization
The sampled analog signal must be converted from a voltage value to a binary number that the computer can read. The conversion from an infinitely precise amplitude to a binary number is called quantization. During quantization, the A/D converter uses a finite number of evenly spaced values to represent the analog signal; the number of distinct values is determined by the number of bits used for the conversion. Typically, the converter chooses the digital value that is closest to the actual sampled value. A device or algorithmic function that performs quantization is known as a quantizer. The round-off error introduced by quantization is referred to as quantization error. In analog-to-digital conversion, the difference between the actual analog value and the quantized digital value is called quantization error or quantization distortion [9]. This error is due to either rounding or truncation. The error signal is sometimes modelled as an additional random signal called quantization noise on account of its stochastic behaviour. Quantization is involved to some degree in nearly all digital signal processing, as representing a signal in digital form ordinarily involves rounding; it also forms the core of essentially all lossy compression algorithms. The first 90 values of the speech signal, which hold the maximum content of the speech features, can be taken.
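The following is a generic uniform (mid-tread) quantizer, given only as an illustration of the rounding described above, not as the paper's specific codec.

```python
import numpy as np

def uniform_quantize(x, n_bits=16):
    """Uniform mid-tread quantizer over [-1, 1): round each sample to the
    nearest of 2**n_bits evenly spaced levels; return codes and values."""
    levels = 2 ** n_bits
    step = 2.0 / levels                       # quantization step size
    codes = np.clip(np.round(x / step), -levels // 2, levels // 2 - 1)
    return codes.astype(np.int64), codes * step

x = np.random.uniform(-1, 1, 1000)
codes, x_hat = uniform_quantize(x, n_bits=8)
err = x - x_hat              # quantization error, |err| <= step/2 away from clip
```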
EXPERIMENTAL RESULTS AND DISCUSSIONS
Screenshots
Figure 5: Noisy Speech Signal
The noisy signals are taken from various environments such as airport, babble, car, exhibition, restaurant, station, street, and train. These noisy signals are then coded and compressed using DCT with the subband coding technique. Figure 5 shows the noisy speech signal.
Figure 6: Clean Signal
The best-sounding speech signal is called the clean signal; it carries the original information of the user's voice (figure 6). The noise types listed above are added to the clean signal to obtain the noisy speech signal.
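The paper does not state its mixing procedure, but noisy inputs at a target SNR (the 0/5/10/15 dB conditions used later) are conventionally formed as in this sketch.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio is `snr_db` dB,
    then add it to the clean signal."""
    noise = noise[: len(clean)]
    gain = np.sqrt(np.mean(clean ** 2) /
                   (np.mean(noise ** 2) * 10 ** (snr_db / 10.0)))
    return clean + gain * noise

clean = np.random.randn(8000)               # stand-in for a clean utterance
noise = np.random.randn(8000)               # stand-in for airport/babble/... noise
noisy_0db = mix_at_snr(clean, noise, 0.0)   # 0 dB input, as in Table 1
```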
Figure 7: Applying DCT for Noisy Speech Signal
By applying the DCT, the elements are sorted in their matrix form to find the components and their indices. Once this arrangement is done, a threshold value is decided, and the coefficients below the threshold are discarded, reducing the size of the signal, which results in compression. Applying DCT to the noisy speech signal is shown in Figure 7.
Figure 8: Reconstructed Noisy Signal
The thresholded coefficients are then converted back into the original form through the reconstruction process [figure 8]. In this process the reconstruction regains the original frequency content, which matches the original signal with roughly 85% accuracy.
Figure 9 (a): Applying Low Pass Filter, (b): Applying High Pass Filter
With the low-pass filter, the speech is muted and only the low frequencies in the wave file can be heard. With the high-pass filter, the speech signal is barely audible, and only the high frequencies spoken in the speech signal can be heard. The low-pass waveform displays only the low frequencies [figure 9 (a)], while the high-pass waveform displays only the high frequencies of the sound wave [figure 9 (b)]. The obtained compressed speech signal is shown in figure 10.
Figure 10: Compressed Speech Signal
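A sketch of the band-splitting step behind figure 9, using SciPy Butterworth filters; the 2 kHz cutoff and the filter order are assumptions for illustration, not values given in the paper.

```python
import numpy as np
from scipy.signal import butter, lfilter

fs = 8000                                     # sampling rate used in the paper
fc = 2000                                     # assumed cutoff, for illustration

b_lo, a_lo = butter(4, fc / (fs / 2), btype='low')    # 4th-order low-pass
b_hi, a_hi = butter(4, fc / (fs / 2), btype='high')   # 4th-order high-pass

x = np.random.randn(fs)                       # stand-in for a speech signal
low_band = lfilter(b_lo, a_lo, x)             # figure 9(a): low frequencies only
high_band = lfilter(b_hi, a_hi, x)            # figure 9(b): high frequencies only
```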
Database
Experiments are performed on the noisy speech database NOIZEUS, which contains noisy signals recorded in different environments: airport, babble, street, restaurant, exhibition, train, car, and station. Noise is added to the original clean signals, the result is processed, and finally the SNR value is computed to compare the level of the clean signal with the level of the noisy signal [10].
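The paper does not spell out its SNR formula; the standard definition, comparing clean-signal power with the power of the residual error, is sketched below.

```python
import numpy as np

def snr_db(clean, processed):
    """SNR in dB: clean-signal power over the power of the residual error."""
    noise = clean - processed
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))
```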
Table 1: SNR values for 0 dB speech signal

| 0 dB | Airport | Babble | Exhibition | Restaurant | Street | Station | Train | Car |
|---|---|---|---|---|---|---|---|---|
| SNR values (dB) | 2.2829 | 2.3382 | 0.8531 | 2.3078 | 0.8162 | 1.7363 | 0.7987 | 1.3133 |
Figure 11: SNR values for 0 dB Speech signals
When SNR is measured for the 0 dB noisy speech signals, better results are obtained for airport, babble, and restaurant noise.
Table 2: SNR values for 5 dB speech signal

| 5 dB | Airport | Babble | Exhibition | Restaurant | Street | Station | Train | Car |
|---|---|---|---|---|---|---|---|---|
| SNR values (dB) | 4.7607 | 4.6134 | 1.964 | 4.3242 | 3.3806 | 3.1754 | 2.0885 | 4.0706 |
Figure 12: SNR values for 5 dB Speech signals
When SNR is measured for the 5 dB noisy speech signals, better results are obtained for airport, babble, restaurant, and car noise.
Table 3: SNR values for 10 dB speech signal

| 10 dB | Airport | Babble | Exhibition | Restaurant | Street | Station | Train | Car |
|---|---|---|---|---|---|---|---|---|
| SNR values (dB) | 8.8565 | 8.602 | 5.4182 | 9.0188 | 9.119 | 7.7203 | 4.445 | 5.9361 |
Figure 13: SNR values for 10 dB Speech signals
When SNR is measured for the 10 dB noisy speech signals, better results are obtained for airport, babble, restaurant, and street noise.
Table 4: SNR values for 15 dB speech signal

| 15 dB | Airport | Babble | Exhibition | Restaurant | Street | Station | Train | Car |
|---|---|---|---|---|---|---|---|---|
| SNR values (dB) | 15.1591 | 12.8307 | 9.5267 | 13.754 | 9.7573 | 12.0957 | 8.262 | 12.2973 |
Figure 14: SNR values for 15 dB Speech signals
For the 15 dB noisy speech signal, airport noise gives the best result; the remaining noise types yield lower quality compared with the airport noisy speech signal.
CONCLUSION
Speech coding is an emerging research area, and speech compression is a standard approach for designing and compressing audio and speech signals transmitted to the recipient end. This work focused on developing an efficient speech coding technique using subband coding, with DCT-based compression used to produce better results. The experiments were conducted on the NOIZEUS database, and the speech signal was reconstructed from the coded features. We played back the reconstructed speech signal after processing the noisy speech; the subband coding technique worked and produced efficient results under these harsh conditions. Only a few listeners (the more experienced) could understand each word of the corrupted utterance, whereas on hearing the reconstructed speech it was evident that clear speech can be achieved by applying the subband coding technique.
REFERENCES
November 2013.
[4]. Yang-Jeng Chen and Robert C. Maher, "Sub-Band Coding of Audio Using Recursively Indexed Quantization," pp. 1-4.
[5]. Sheetal D. Gunjal and Rajeshree D. Raut, "Advance Source Coding Techniques for Audio/Speech Signal: A Survey," Int. J. Computer Technology & Applications, vol. 3, no. 4, pp. 1335-1342, August 2012.
[6]. Sangita Roy, Dola B. Gupta, Sheli Sinha Chaudhuri, and P. K. Banerjee, "Studies and Implementation of Subband Coder and Decoder of Speech Signal Using Rayleigh Distribution," Emerging Trends in Computing and Communication, Springer India, pp. 11-25, 2014.
[7]. Sorin Dusan, James L. Flanagan, Amod Karve, and Mridul Balaraman, "Speech Compression by Polynomial Approximation," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 2, pp. 387-395, February 2007.
[8]. Chandra R. Murthy, Ethan R. Duni, and Bhaskar D. Rao, "High-Rate Vector Quantization for Noisy Channels With Applications to Wideband Speech Spectrum Compression," IEEE Transactions on Signal Processing, vol. 59, no. 11, pp. 5390-5403, November 2011.
[9]. Serajul Haque, Roberto Togneri, and Anthony Zaknich, "An Auditory Motivated Asymmetric Compression Technique for Speech Recognition," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 2111-2124, September 2011.
[10]. Y. Hu and P. Loizou, "Subjective comparison and evaluation of speech enhancement algorithms," Speech Communication, vol. 49, no. 7, pp. 588-601, 2007.