Speech Compression Using Discrete Wavelet Transform and Discrete Cosine Transform

Smita Vatsa; Dr. O. P. Sahu

doi:10.17577/IJERTV1IS5270

Volume 01, Issue 05 (July 2012)

Published (First Online): 02-08-2012
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Smita Vatsa, M. Tech (ECE) Student, Department of Electronics and Communication Engineering, NIT Kurukshetra, India

Dr. O. P. Sahu, Professor, Department of Electronics and Communication Engineering, NIT Kurukshetra, India

Abstract

The aim of this paper is to explain and implement transform-based speech compression techniques. Transform coding compresses a signal by removing the redundancies present in it. Speech compression (coding) is a technique for transforming a speech signal into a compact format so that it can be transmitted with reduced bandwidth and stored in reduced space. The objective of speech compression is to enhance transmission and storage capacity. In this paper, discrete wavelet transform (DWT) and discrete cosine transform (DCT) based speech compression techniques are implemented with run length encoding, Huffman encoding, and run length encoding followed by Huffman encoding. The reconstructed speech signals obtained from these implementations are compared on the basis of compression factor (CF), signal to noise ratio (SNR), peak signal to noise ratio (PSNR), normalized root mean square error (NRMSE) and retained signal energy (RSE). The results show that the DWT with run length encoding followed by Huffman encoding gives higher compression than the other combinations, with intelligible reconstructed speech.

Introduction

The objective of speech is communication, whether face to face or cell phone to cell phone. The large amount of data involved is a major issue for transmission and storage. Speech compression is the technology of converting human speech into an efficiently encoded representation that can later be decoded to produce a close approximation of the original signal. Its major objective is to represent speech with as few bits as possible while keeping an acceptable level of quality. Speech coding means representing the speech signal efficiently for transmission and storage in such a way that it can be reproduced with the desired level of quality. The main approaches to speech compression in use today are waveform coding, transform coding and parametric coding. Waveform coding attempts to reproduce the input signal waveform at the output. In transform coding, the signal is first transformed into the frequency domain, and afterwards only the dominant spectral features of the signal are retained. In parametric coding, signals are represented through a small set of parameters that describe them accurately. Parametric coders attempt to produce a signal that sounds like the original speech, whether or not the time waveform resembles the original.

A signal can be compressed by removing the redundancy between neighbouring samples. In this paper the compression technique is implemented in two steps. In the first step, a transform is applied to the speech signal to obtain a new set of data with smaller values and more repetition. The second step is the coding (compression) step, which represents the data set in its minimal form using encoding techniques such as run length encoding, Huffman encoding, and run length encoding followed by Huffman encoding. The performance measures compression factor (CF), signal to noise ratio (SNR), peak signal to noise ratio (PSNR), normalized root mean square error (NRMSE) and retained signal energy (RSE) are computed for the reconstructed speech obtained from the DWT and DCT based speech compression techniques, and a comparative analysis of the transform-based systems is carried out for the three encoding methods. The paper is organized as follows: Section 2 discusses the transform techniques, Section 3 explains the implementation of speech compression based on the different transform techniques with several encoding methods, Section 4 presents simulation results obtained from MATLAB 7.12.0, and Section 5 gives concluding remarks.
Overview of transform techniques

In this paper DWT and DCT transforms are exploited for speech compression.
2.1 Discrete wavelet transform

A wavelet is a waveform of effectively limited duration that has an average value of zero. The fundamental idea behind wavelets is to analyse according to scale [4]. Wavelets are functions that satisfy certain mathematical requirements and are used in representing data or other functions. Wavelet functions are localized in space. In contrast, Fourier sine and cosine functions are non-local and are active for all time t. This localization feature, along with wavelets' localization of frequency, makes many functions and operators sparse when transformed into the wavelet domain. This sparseness in turn results in a number of useful applications such as data compression, detecting features in images and de-noising signals. The discrete wavelet transform (DWT) involves choosing scales and positions based on powers of two, the so-called dyadic scales and positions. The mother wavelet is rescaled, or dilated, by powers of two and translated by integers. Specifically, a function $f(t) \in L^2(\mathbb{R})$ (the space of square integrable functions) can be represented as [9]

$$f(t) = \sum_{j=1}^{L} \sum_{k} d(j,k)\,\psi(2^{-j}t - k) + \sum_{k} a(L,k)\,\phi(2^{-L}t - k)$$

The function $\psi(t)$ is known as the mother wavelet, while $\phi(t)$ is known as the scaling function. The set of functions

$$\left\{\, 2^{-L/2}\,\phi(2^{-L}t - k);\; 2^{-j/2}\,\psi(2^{-j}t - k) \;\middle|\; j \le L,\; j, k, L \in \mathbb{Z} \,\right\}$$

where $\mathbb{Z}$ is the set of integers, is an orthonormal basis for $L^2(\mathbb{R})$.

The numbers a(L,k) are known as the approximation coefficients at scale L, while d(j,k) are known as the detail coefficients at scale j.

The approximation and detail coefficients can be expressed as:

$$a(L,k) = \frac{1}{\sqrt{2^{L}}} \int f(t)\,\phi(2^{-L}t - k)\,dt$$

$$d(j,k) = \frac{1}{\sqrt{2^{j}}} \int f(t)\,\psi(2^{-j}t - k)\,dt$$

Wavelets with a high number of vanishing moments lead to a more compact signal representation and are hence useful in coding applications [9].

2.2 Discrete cosine transform

The DCT can be used for speech compression because of the high correlation between adjacent coefficients. The energy compaction property of the DCT is good: a sequence can often be reconstructed very accurately from only a few DCT coefficients, which is very useful for applications requiring data reduction [7].

The discrete cosine transform of a 1-D sequence x(n) of length N is

$$X(m) = \left(\frac{2}{N}\right)^{1/2} c_m \sum_{n=0}^{N-1} x(n) \cos\left[\frac{(2n+1)\,m\,\pi}{2N}\right]$$

where m = 0, 1, ..., N-1.

The inverse discrete cosine transform is defined as

$$x(n) = \left(\frac{2}{N}\right)^{1/2} \sum_{m=0}^{N-1} c_m X(m) \cos\left[\frac{(2n+1)\,m\,\pi}{2N}\right]$$

In both equations $c_m$ is defined as

$$c_m = \begin{cases} (1/2)^{1/2} & \text{for } m = 0 \\ 1 & \text{for } m \neq 0 \end{cases}$$

METHODOLOGY FOR COMPRESSION OF SPEECH SIGNAL

In this paper we implement speech compression techniques based on transform methods, i.e. the DCT and the DWT. When the wavelet transform is applied to a speech signal, the original signal can be represented in terms of a wavelet expansion (using coefficients in a linear combination of wavelet functions). Similarly, with the DCT the speech can be represented in terms of DCT coefficients. Thus data operations can be performed using just the corresponding DWT and DCT coefficients. Transform techniques and thresholding do not actually compress a signal; they simply provide information about the signal, which allows the data to be compressed by standard encoding techniques.

Speech compression is achieved by treating small coefficients as insignificant data and discarding them, and then applying a quantization and encoding scheme to the remaining coefficients. The speech compression algorithm is performed in the following steps:
1. Transform technique on the speech signal
2. Thresholding of the transformed coefficients
3. Quantization
4. Encoding
By following the above four steps we obtain a compressed speech signal. To reconstruct the speech signal, the inverse of the above processes is performed, i.e. decoding, de-quantization and inverse transform. Four different speech signals are used as the database. The signals are in .wav format, sampled at 8 kHz.
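As an illustration of the first three steps and their inverses, the following is a minimal sketch for the DCT branch, written directly from the equations in Section 2.2. It is not the authors' MATLAB implementation: the function names, the threshold `thr` and the quantizer step size `step` are assumptions made for the example, since the paper does not state the values it used.

```python
import math

def dct(x):
    """Forward DCT of a 1-D sequence, following the equation in Section 2.2."""
    N = len(x)
    scale = math.sqrt(2.0 / N)
    out = []
    for m in range(N):
        c_m = math.sqrt(0.5) if m == 0 else 1.0
        s = sum(x[n] * math.cos((2 * n + 1) * m * math.pi / (2 * N))
                for n in range(N))
        out.append(scale * c_m * s)
    return out

def idct(X):
    """Inverse DCT, following the inverse equation in Section 2.2."""
    N = len(X)
    scale = math.sqrt(2.0 / N)
    out = []
    for n in range(N):
        s = 0.0
        for m in range(N):
            c_m = math.sqrt(0.5) if m == 0 else 1.0
            s += c_m * X[m] * math.cos((2 * n + 1) * m * math.pi / (2 * N))
        out.append(scale * s)
    return out

def hard_threshold(coeffs, thr):
    """Step 2: discard (zero) coefficients whose magnitude is below thr (assumed value)."""
    return [c if abs(c) >= thr else 0.0 for c in coeffs]

def quantize(coeffs, step):
    """Step 3: uniform quantization to integer indices (step size is an assumption)."""
    return [int(round(c / step)) for c in coeffs]

def dequantize(indices, step):
    """Inverse of step 3: map integer indices back to coefficient values."""
    return [q * step for q in indices]

# One short frame processed end to end (the encoding step is omitted here).
frame = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
indices = quantize(hard_threshold(dct(frame), thr=0.1), step=0.05)
reconstructed = idct(dequantize(indices, step=0.05))
```

The direct double loop costs O(N^2) per frame and is only meant to mirror the formulas; a practical implementation would use a fast transform routine.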

Explanation for compression algorithm:
symbols are arranged in ascending order along with their probabilities. The main computational step in encoding data from this source using a Huffman code is to create a dictionary that associates each data symbol with a codeword [1].
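The dictionary construction described above can be sketched as follows. This is a generic Huffman construction using Python's `heapq`, not the routine used in the paper; it works from symbol counts rather than probabilities (which gives the same code lengths), and the function names are hypothetical.

```python
import heapq
from collections import Counter

def huffman_dictionary(symbols):
    """Return a dict mapping each distinct symbol to a prefix-free bit string."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, partial code table).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, low = heapq.heappop(heap)
        w1, _, high = heapq.heappop(heap)
        # Merge the two least probable groups, prefixing one bit to each.
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]

def huffman_encode(symbols, code):
    """Concatenate the codewords of the symbols into one bit string."""
    return "".join(code[s] for s in symbols)

def huffman_decode(bits, code):
    """Decode a bit string produced by huffman_encode with the same dictionary."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out
```

Because thresholding leaves many zero-valued quantized coefficients, the zero symbol typically receives the shortest codeword, which is where most of the saving comes from.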

Run length encoding followed by Huffman encoding: Our third approach is to perform run length encoding on the truncated and quantized coefficients, and to apply Huffman encoding to the symbols (data values) and their counts separately. To reconstruct the speech signal, the inverse of the above processes is performed, i.e. decoding, de-quantization and inverse transform.
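A minimal sketch of the run length stage of this third approach is shown below. It returns the data values and their counts as two separate lists, which are the two streams that would each be passed to a Huffman coder. The function names are illustrative and not taken from the paper.

```python
def run_length_encode(seq):
    """Collapse runs of equal symbols into parallel lists (values, counts)."""
    values, counts = [], []
    for s in seq:
        if values and values[-1] == s:
            counts[-1] += 1      # extend the current run
        else:
            values.append(s)     # start a new run
            counts.append(1)
    return values, counts

def run_length_decode(values, counts):
    """Expand (values, counts) back into the original sequence."""
    out = []
    for v, c in zip(values, counts):
        out.extend([v] * c)
    return out

# Quantized coefficients after thresholding usually contain long runs of zeros.
quantized = [3, 0, 0, 0, 0, -1, 0, 0, 2, 2]
values, counts = run_length_encode(quantized)
```

Run length encoding only pays off when runs are long: a sequence with no repeated neighbours doubles in size, since every value also carries a count of 1.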
Performance Measures

The performance of the speech compression techniques based on the two transform methods is measured for the three encoding approaches. Here performance is measured in terms of CF, SNR, PSNR, NRMSE and RSE.
1. Compression factor:

  $$\mathrm{CF} = \frac{\text{length of original signal}}{\text{length of compressed signal}}$$

  For the compressed signal, all the values needed to completely represent the signal have to be taken into account.
2. Signal to noise ratio:

  $$\mathrm{SNR} = 10 \log_{10} \frac{\sigma_x^2}{\sigma_e^2}$$

  where $\sigma_x^2$ is the mean square of the speech signal and $\sigma_e^2$ is the mean square difference between the original and reconstructed speech signal [3].
3. Peak signal to noise ratio:

  $$\mathrm{PSNR} = 10 \log_{10} \frac{N X^2}{\| x - x' \|^2}$$

  where N is the length of the reconstructed signal, X is the maximum absolute value of the signal x (so that $X^2$ is its maximum absolute square value) and $\| x - x' \|^2$ is the energy of the difference between the original and reconstructed signal.
4. Normalized root mean square error:

  $$\mathrm{NRMSE} = \sqrt{\frac{\sum_n \left( x(n) - x'(n) \right)^2}{\sum_n \left( x(n) - \mu_x(n) \right)^2}}$$

  where x(n) is the speech signal, x'(n) is the reconstructed speech signal and $\mu_x(n)$ is the mean of the speech signal.
5. Retained signal energy:

  $$\mathrm{RSE} = \frac{\| x'(n) \|^2}{\| x(n) \|^2} \times 100$$

  where $\| x'(n) \|^2$ and $\| x(n) \|^2$ represent the energy of the reconstructed and original speech signal. RSE indicates the amount of energy retained in the compressed signal as a percentage of the energy of the original signal.

SIMULATION RESULTS

Experiments are conducted on four different sentences. Results are obtained using the DCT and the DWT (db10), and in both cases three different encoding techniques are used. The performance measures obtained in all cases are compared. From Table I, the average compression factor obtained from the DCT is 2.05 with run length encoding, 5.56 with Huffman encoding and 5.95 with run length encoding followed by Huffman encoding. For the DWT the average compression factor is 14.6938 with run length encoding, 11.432 with Huffman encoding and 29.11 with run length encoding followed by Huffman encoding. The results show that an average compression factor of 29.11 can be achieved when speech compression is implemented using the DWT with run length encoding followed by Huffman encoding. Since the coefficients obtained from the DCT contain only very short runs, the average compression factor is lowest, 2.05, for the DCT with run length encoding. The other performance measures, SNR, PSNR, NRMSE and RSE, depend on the transform technique but not on the encoding technique.
TABLE I

COMPARISON OF PERFORMANCE BASED ON DIFFERENT TRANSFORM AND DIFFERENT ENCODING METHODS

Transform Technique	Speech signal	CF			SNR (dB)			PSNR (dB)
		RLE	Huff	RLE+Huff	RLE	Huff	RLE+Huff	RLE	Huff	RLE+Huff
DWT	S4.wav	12.9249	11.0380	25.7960	1.7525	1.7525	1.7525	20.2607	20.2607	20.2607
	S6.wav	12.3991	10.7182	24.5242	2.1581	2.1581	2.1581	23.0731	23.0731	23.0731
	SP10.wav	12.9249	11.6080	28.3600	1.2334	1.2334	1.2334	13.9402	13.9402	13.9402
	SP15.wav	20.5263	12.362	37.7662	1.3594	1.3594	1.3594	8.3044	8.3044	8.3044
DCT	S4.wav	2.3361	6.1646	7.0613	7.8735	7.8735	7.8735	32.1771	32.1771	32.1771
	S6.wav	2.0868	5.5213	5.7854	8.0589	8.0589	8.0589	33.6532	33.6532	33.6532
	SP10.wav	1.6095	4.6519	4.6111	8.1561	8.1561	8.1561	27.6174	27.6174	27.6174
	SP15.wav	2.1783	5.8974	6.3454	7.8168	7.8168	7.8168	27.2869	27.2869	27.2869

Transform Technique	Speech signal	RSE (%)			NRMSE
		RLE	Huff	RLE+Huff	RLE	Huff	RLE+Huff
DWT	S4.wav	65.9925	65.9925	65.9925	0.5847	0.5847	0.5847
	S6.wav	73.1896	73.1896	73.1896	0.5204	0.5204	0.5204
	SP10.wav	50.3752	50.3752	50.3752	0.7060	0.7060	0.7060
	SP15.wav	48.5701	48.5701	48.5701	0.7180	0.7180	0.7180
DCT	S4.wav	98.2396	98.2396	98.2396	0.1483	0.1483	0.1483
	S6.wav	98.4133	98.4133	98.4133	0.1539	0.1539	0.1539
	SP10.wav	98.1177	98.1177	98.1177	0.1462	0.1462	0.1462
	SP15.wav	98.5682	98.5682	98.5682	0.1611	0.1611	0.1611
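The quality columns of Table I follow from the definitions in the Performance Measures section. The following is a minimal sketch of those five measures in plain Python, assuming the original and reconstructed signals are equal-length lists of samples. The function names are illustrative, and taking X as the largest absolute sample value of the original signal is an interpretation of the PSNR definition.

```python
import math

def compression_factor(original_length, compressed_length):
    """CF: length of original signal divided by length of compressed signal."""
    return original_length / compressed_length

def snr_db(x, x_rec):
    """SNR in dB: mean square of x over mean square of the error (undefined if error is zero)."""
    n = len(x)
    sig = sum(a * a for a in x) / n
    err = sum((a - b) ** 2 for a, b in zip(x, x_rec)) / n
    return 10.0 * math.log10(sig / err)

def psnr_db(x, x_rec):
    """PSNR in dB: N * X^2 over the energy of the error, with X = max |x(n)| (assumed)."""
    n = len(x_rec)
    peak = max(abs(a) for a in x)
    err_energy = sum((a - b) ** 2 for a, b in zip(x, x_rec))
    return 10.0 * math.log10(n * peak ** 2 / err_energy)

def nrmse(x, x_rec):
    """NRMSE: error energy normalized by the energy of x about its mean."""
    mu = sum(x) / len(x)
    num = sum((a - b) ** 2 for a, b in zip(x, x_rec))
    den = sum((a - mu) ** 2 for a in x)
    return math.sqrt(num / den)

def rse_percent(x, x_rec):
    """RSE: energy of the reconstructed signal as a percentage of the original energy."""
    return 100.0 * sum(b * b for b in x_rec) / sum(a * a for a in x)

# Toy signals, not taken from the paper's database.
x = [1.0, -1.0, 1.0, -1.0]
x_rec = [0.5, -0.5, 0.5, -0.5]
print(snr_db(x, x_rec), psnr_db(x, x_rec), nrmse(x, x_rec), rse_percent(x, x_rec))
```

None of these four quality measures takes the encoder as an input, which is consistent with the identical values across the RLE, Huff and RLE+Huff columns: the encoders are lossless, so the reconstruction is fixed by the transform, thresholding and quantization alone.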

CONCLUSION

From the experiments we conclude that, in a transform-based speech compression system, using the DWT with run length encoding followed by Huffman encoding represents the speech signal with the fewest data values while the reconstructed speech remains intelligible. It is therefore an effective technique for speech compression.
REFERENCES

[1] G. Rajesh, A. Kumar and K. Ranjeet, "Speech compression using different transform techniques," IEEE, 2011.

[2] Michel Misiti, Yves Misiti, Georges Oppenheim, Jean-Michel Poggi, "Wavelet Toolbox for use with MATLAB."

[3] Najih A. M., Ramli A. R., Syed A. R., "Comparing speech compression using wavelets with other speech compression technique," IEEE, 2003.

[4] Amara Graps, "An introduction to wavelet"

[5] Jalal Karam and Raed Saad, "The Effect of Different Compression Schemes on Speech Signals," WASET, 2006.

[6] Varun Setia, Vinod Kumar, "Coding of DWT Coefficients using Run-length coding and Huffman Coding for the purpose of Color Image Compression," International Journal of Computer and Communication Engineering 6, 2012.

[7] www.mathworks.com/help/toolbox/signal/ref/dct.html

[8] W. Kinsner and A. Langi, "Speech and Image Signal Compression with Wavelets," IEEE Wescanex Conference Proceedings, IEEE, New York, NY, 1993, pp. 368-375.

[9] Rao N., ELEC 4801 Thesis Project.
