Melody Extraction from Polyphonic Music Signal using STFT and Fanchirp Transform

DOI: 10.17577/IJERTV4IS060928


Sridevi S. H.
Department of E&TC, DYPSOEA
Ambi, Talegaon, Pune, India

Prof. S. R. Gulhane
Department of E&TC, DYPCOE
Ambi, Talegaon, Pune, India

Abstract: Music is an art form whose medium is sound. It has various attributes such as rhythm, melody, and timbre. The term melody is a musicological concept based on the judgment of human listeners. Melody extraction from polyphonic music is a difficult task in music information retrieval. In the melody identification stage, the main job is to find the vocal melody. In polyphonic music, two or more notes can sound simultaneously, whether played by different instruments or by a single instrument capable of playing more than one note at a time. The aim of melody extraction is to produce a sequence of frequency values corresponding to the pitch of the dominant melody present in a given musical recording. In this paper the melody is extracted from a polyphonic music signal using the Short-Time Fourier Transform and the Fan Chirp Transform.

Keywords: MIR, STFT, Fan Chirp Transform (FCht), melody, multipitch, polyphonic, FFT

1. INTRODUCTION

The development of the field of music information retrieval (MIR) has created a need for indexing systems that automatically extract semantic descriptions from music signals. Such a description would typically include melodic, tonal, timbral, and rhythmic information. So far, the scientific community has mostly focused on the extraction of melodic and tonal information (multipitch estimation, melody transcription, chord and tonality recognition) and, to a lesser extent, on the estimation of the main rhythmic structure. Most of the time the concept of melody is associated with a sequence of pitched notes. One definition is "a combination of a pitch series and a rhythm having a clearly defined shape" (Solomon, 1996); Grove Music Online (2002) defines melody as "pitched sounds arranged in musical time in accordance with given cultural conventions and constraints". Multiple fundamental frequency (f0) estimation is one of the most important problems in music signal analysis and constitutes a fundamental step in several applications such as melody extraction. In this paper an effort is made to extract the melody from a polyphonic music signal using the Fan Chirp Transform.

2. RELATED WORK ON MELODY EXTRACTION

Pitch detection algorithms (PDAs) in audio signal processing, especially in speech processing, have been an active topic of research since the late twentieth century. A comprehensive review of the early approaches to pitch detection in speech signals is provided in (Hess, 1983), and a comparative evaluation of pitch detection algorithms for speech signals is provided in (Rabiner, Cheng, Rosenberg, & McGonegal, 1976). A more recent review of approaches to pitch detection in speech and music signals is provided in (Hess, 2004). The general consensus is that pitch detection or tracking for monophonic signals (speech or music) is practically a solved problem, and most state-of-the-art approaches yield high-quality, acceptable solutions (Hess, 2004; Klapuri, 2004). The problem of melody extraction from polyphony differs from monophonic speech pitch detection in two major aspects:

1. Multiple sound sources (pitched and unpitched) are usually present simultaneously.

2. The target source (here, the singing voice) has a larger pitch range, more dynamic variation, and more expressive content than normal speech.

Table 1. Principal melody transcription algorithms.

System         Front end               No. of pitches   Voicing
Dressler [6]   |STFT| + sines          5                Melody + local threshold
Marolt [23]    |STFT| + sines          > 2              Melody grouping
Goto [14]      Hier. |STFT| + sines    > 2              Continuous
Poliner [27]   |STFT|                  1                Global threshold

The second column, Front end, concerns the initial signal processing applied to the input audio to reveal its pitch content. The most popular technique is to take the magnitude of the short-time Fourier transform (STFT), i.e., the Fourier transform of successive windowed snippets of the original waveform, denoted |STFT| in the table and commonly visualized as a spectrogram. |STFT| is invariant to relative or absolute time or phase shifts in the harmonics because the STFT phase is discarded. Since the frequency resolution of the STFT improves with temporal window length, these systems tend to use long windows, from 46 ms for Dressler to 128 ms for Poliner. Goto uses a hierarchy of STFTs to achieve a multiresolution Fourier analysis, downsampling his original 16 kHz audio through four factor-of-2 stages to obtain a 512 ms window at his lowest sampling rate of 1 kHz.
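As a concrete illustration of such a front end, the following minimal Python sketch (written for this discussion, not taken from any of the cited systems) computes |STFT| magnitudes and a Goto-style hierarchy of analyses over successively downsampled copies of the signal; the window length, hop size, and use of scipy are assumptions of the sketch.

```python
import numpy as np
from scipy import signal

def stft_magnitude(x, fs, nperseg=512):
    """|STFT| front end: magnitudes of windowed FFTs, phase discarded."""
    f, t, Z = signal.stft(x, fs=fs, window="hann",
                          nperseg=nperseg, noverlap=3 * nperseg // 4)
    return f, t, np.abs(Z)

def hierarchical_stft(x, fs=16000, stages=4, nperseg=512):
    """Multiresolution analysis in the spirit of Goto's front end: low-pass
    filter and downsample by 2 at each stage, so a fixed window of nperseg
    samples covers twice the duration at every new stage (512 samples at
    1 kHz correspond to a 512 ms window)."""
    levels = [stft_magnitude(x, fs, nperseg)]
    for _ in range(stages):
        x = signal.decimate(x, 2)   # anti-aliased factor-of-2 downsampling
        fs //= 2
        levels.append(stft_magnitude(x, fs, nperseg))
    return levels
```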

The final column, Voicing, considers how the systems distinguish between intervals where the melody is present and those where it is silent (gaps between melodies). Goto reports his best pitch estimate at every frame and does not admit gaps. Poliner's basic pitch extraction engine is also continuous, but it is then gated by a separate melody detector; a simple global energy threshold over an appropriate frequency range was reported to work as well as a more complex scheme based on a trained classifier. As discussed above, the selection of notes or fragments in Dressler naturally leads to gaps where no suitable element is selected; Dressler augments this with a local threshold to discount low-energy notes.
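The sketch below illustrates the kind of global energy threshold mentioned above; the band limits (200 Hz to 2 kHz) and the -40 dB threshold are purely illustrative assumptions, not values reported for Poliner's system.

```python
import numpy as np

def voicing_by_global_threshold(mag_stft, freqs, fmin=200.0, fmax=2000.0,
                                rel_threshold_db=-40.0):
    """Per-frame voiced/unvoiced decision from the energy in a melody band,
    gated by a single global threshold expressed in dB relative to the
    loudest frame. mag_stft has shape (n_freqs, n_frames)."""
    band = (freqs >= fmin) & (freqs <= fmax)
    frame_energy = np.sum(mag_stft[band, :] ** 2, axis=0)
    energy_db = 10.0 * np.log10(frame_energy + 1e-12)
    return energy_db > (energy_db.max() + rel_threshold_db)  # True where voiced
```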

3. PROPOSED METHODOLOGY

3.1 Terminology of melody extraction

Melody extraction is also referred to as:

• Audio melody extraction.

• Predominant melody extraction/estimation.

• Predominant fundamental frequency (f0) estimation.

The aim is to obtain a sequence of frequency values representing the pitch of the dominant melodic line.

The melody line tends to have the most predominant harmonic structure in the middle and high frequency regions. The F0 of this most predominant harmonic structure, i.e., the most predominant F0 corresponding to the melody line, is estimated within an intentionally limited frequency range of the input sound mixture.

[Block diagram with blocks: polyphonic music signal, F0 estimation, voicing detection, and melody extraction.]

Fig. 3.1.1 Block diagram of the proposed method.

The fundamental limitation of all the methods mentioned above is that they work under the assumption that tonal energy manifests in the short-time spectrum as a distinct peak, allowing simple detection. Such an assumption hardly holds for instruments with free intonation [2]. Music is a non-stationary signal by nature. The STFT is the standard method for time-frequency analysis, and this representation holds well under the assumption that the signal is stationary within the analysis frame. Among the chirp-based transforms, the Fan Chirp Transform is better suited since it provides time-frequency localization in a fan geometry. The FCht can be considered as a time warping followed by a Fourier transform, which leads to an efficient implementation using the FFT.
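To make this time-warping interpretation concrete, a minimal Python sketch is given below. It follows the general idea of [2], resampling the frame according to the warping function phi_alpha(t) = (1 + 0.5*alpha*t)*t and then taking an FFT, but the discretisation details are simplifications and do not reproduce the reference implementation.

```python
import numpy as np

def fan_chirp_transform(frame, fs, alpha, nfft=None):
    """Sketch of the FCht as a time warping followed by an FFT.
    frame : windowed analysis frame, assumed centred at t = 0
    alpha : chirp rate in 1/s (alpha = 0 reduces to the ordinary DFT)"""
    n = len(frame)
    nfft = nfft or n
    t = (np.arange(n) - n // 2) / float(fs)   # original time axis
    phi = (1.0 + 0.5 * alpha * t) * t         # warping function phi_alpha(t)
    dphi = 1.0 + alpha * t                    # its derivative (must stay positive)
    # Resample the frame on a uniform grid of the warped time variable tau = phi(t)
    tau = np.linspace(phi[0], phi[-1], n)
    warped = np.interp(tau, phi, frame) / np.sqrt(np.interp(tau, phi, dphi))
    return np.fft.rfft(warped, nfft)          # FFT of the time-warped frame
```

In practice the transform is evaluated for a set of chirp rates and, for each frame, the rate that yields the most salient harmonic structure is retained.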

3.2 Pitch extraction

The perceptual counterpart of fundamental frequency is pitch, a subjective quality often described as highness or lowness. In the context of melody identification, the problem is to decide which candidate pitches belong to the melody, and to detect whether the melody is present or not at each frame. In this paper the FCht is used for the analysis of the pitch content of the polyphonic music signal. It reveals hidden spectral peaks related to non-stationary high-frequency partials. Non-salient pitch candidates are filtered out to minimize the creation of contours belonging to accompaniment instruments or noise. The remaining problem is to choose the correct contours, i.e., those which belong to the vocal melody.
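A possible (simplified) realisation of this candidate filtering and contour formation step is sketched below; the salience threshold and the maximum allowed pitch jump between consecutive frames are illustrative values chosen for this sketch, not parameters of the proposed system.

```python
import numpy as np

def form_pitch_contours(candidates, saliences,
                        min_rel_salience=0.4, max_jump_cents=80.0):
    """Greedy grouping of per-frame pitch candidates into contours.
    candidates : list (one entry per frame) of arrays of candidate f0s in Hz
    saliences  : matching list of arrays of salience values
    Candidates weaker than min_rel_salience times the frame maximum are
    dropped; a surviving candidate extends an open contour if it lies within
    max_jump_cents of that contour's last pitch."""
    closed, active = [], []                       # contours as lists of (frame, f0)
    for i, (f0s, sal) in enumerate(zip(candidates, saliences)):
        f0s = np.atleast_1d(f0s).astype(float)
        sal = np.atleast_1d(sal).astype(float)
        if f0s.size:
            f0s = f0s[sal >= min_rel_salience * sal.max()]  # drop non-salient candidates
        used = np.zeros(f0s.size, dtype=bool)
        still_open = []
        for contour in active:
            last_f0 = contour[-1][1]
            if f0s.size:
                cents = 1200.0 * np.abs(np.log2(f0s / last_f0))
                j = int(np.argmin(cents))
                if not used[j] and cents[j] <= max_jump_cents:
                    contour.append((i, float(f0s[j])))
                    used[j] = True
                    still_open.append(contour)
                    continue
            closed.append(contour)                # no continuation: contour ends
        for j in np.where(~used)[0]:              # unused candidates start new contours
            still_open.append([(i, float(f0s[j]))])
        active = still_open
    return closed + active
```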

3.3 Short-Time Fourier Transform (STFT)

Music is not a stationary signal, i.e., its properties vary with time. Thus a single representation based on all the samples of a polyphonic music signal is, for the most part, not meaningful. Instead, we use a time-dependent Fourier transform (TDFT or STFT), which is recomputed periodically as the properties of the polyphonic music signal change over time. The Short-Time Fourier Transform performs FFT analysis on short windows in time, and the resulting coefficients represent the contents of the audio signal as time-frequency information. The window used in the STFT controls the trade-off between frequency resolution and side-lobe suppression (i.e., how sharp a spectral peak is versus how high the side lobes are).
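This trade-off can be quantified numerically. The sketch below (an illustration added here, not part of the proposed method) measures the main-lobe width and the highest side-lobe level of an analysis window from the magnitude of its zero-padded FFT.

```python
import numpy as np

def window_tradeoff(win, nfft=8192):
    """Return (main-lobe width in DFT bins of the window length,
    highest side-lobe level in dB) for an analysis window."""
    n = len(win)
    db = 20.0 * np.log10(np.abs(np.fft.rfft(win, nfft)) / np.sum(win) + 1e-12)
    # the first local minimum marks the edge of the main lobe
    minima = np.where((db[1:-1] < db[:-2]) & (db[1:-1] < db[2:]))[0] + 1
    edge = minima[0]
    return 2.0 * edge * n / nfft, db[edge:].max()

print(window_tradeoff(np.ones(1024)))     # rectangular: ~2-bin lobe, ~-13 dB side lobes
print(window_tradeoff(np.hanning(1024)))  # Hann: ~4-bin lobe, ~-31 dB side lobes
```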

3.4 Fan Chirp Transform (FCht)

The Fan Chirp Transform provides an enhanced representation of harmonically related linear chirp signals. It can be considered as a time warping followed by a Fourier transform [2]. In this paper the FCht is applied to the analysis of the pitch content of the polyphonic music signal. An F0gram is computed by collecting harmonically related peaks of the FCht, and the number of valid f0 values in each frame is calculated. Considering a masking function given by the valid pitches, correct estimates near the frame boundaries are obtained. The f0 analysis parameters are chosen as follows: the minimum fundamental frequency is 80 Hz, the number of octaves is 4, and the number of f0 values per octave is 192. The three most salient F0gram peaks in each frame are selected as the pitch candidates used to form the pitch contours that are considered as the main melody.
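To illustrate how one column of such an F0gram can be populated with these parameters, the following simplified sketch performs a plain harmonic summation over a log-spaced f0 grid (80 Hz, 4 octaves, 192 values per octave) and keeps the 3 most salient candidates per frame. The 1/h harmonic weighting and the number of harmonics are assumptions of this sketch; the gathered log-spectrum actually used in [2] is more elaborate.

```python
import numpy as np

def f0gram_frame(mag_spectrum, freqs, n_harmonics=10,
                 f0_min=80.0, octaves=4, f0s_per_octave=192, n_peaks=3):
    """Salience of every candidate f0 for one frame, obtained by summing the
    spectral magnitude at its harmonics (simplified stand-in for the F0gram).
    Returns the log-spaced f0 grid, the salience curve, and the n_peaks most
    salient f0 candidates of the frame."""
    f0_grid = f0_min * 2.0 ** (np.arange(octaves * f0s_per_octave)
                               / float(f0s_per_octave))
    salience = np.zeros_like(f0_grid)
    for h in range(1, n_harmonics + 1):
        # magnitude spectrum interpolated at the h-th harmonic of each candidate
        salience += np.interp(h * f0_grid, freqs, mag_spectrum,
                              left=0.0, right=0.0) / h
    top = np.argsort(salience)[-n_peaks:][::-1]   # pitch candidates for this frame
    return f0_grid, salience, f0_grid[top]
```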

4. RESULTS AND DISCUSSION

Melody extraction has become an increasingly active research area. In this paper a novel way of extracting the melody from a polyphonic music signal is described. The technique is based on the STFT and the FCht. The FCht provides salient information about non-stationary signals such as music. The technique builds on a pitch salience representation called the F0gram. The results obtained are shown in Figs. 4.1, 4.2, and 4.3. Grouping the F0gram peaks into contours involves determining where a contour starts and where it ends, which necessarily leaves some time intervals without a melody estimate. This is avoided when isolated F0gram peaks are taken as the main melody estimates, since every melody-labelled frame then has a pitch estimate. Therefore, this performance measure can be considered a best possible reference.

Fig. 4.1 Spectrogram of the polyphonic music signal considered.

Fig. 4.2 Melodic content visualisation.

Fig. 4.3 Colour representation of the melodic content visualisation.

5. CONCLUSION

At an abstract level, the benefits of common, standardized evaluation are clearly shown by this effort and analysis. In this paper a system for automatically extracting the main melody of a polyphonic piece of music from its audio signal is described. Melody extraction has many applications: it can be used in query by humming, music de-soloing, music retrieval, music classification, and so on.

6. ACKNOWLEDGEMENT

I take this opportunity to express my deep heartfelt gratitude to all those who have helped me in the successful completion of this paper. First and foremost, I would like to express my sincere gratitude towards my guide, Prof. S. R. Gulhane, for providing excellent guidance and encouragement. Without his valuable guidance, this work would never have been successful. I would also like to express my sincere gratitude to the Head of the Department of Electronics & Communication Engineering, Prof. Santosh G. Bari, for his guidance and inspiration. I would like to thank our Principal, Dr. V. N. Nitnaware, for providing all the facilities and a proper environment to work in on the college campus.

7. REFERENCES

1. Justin Salamon, Emilia Gómez, Daniel P. W. Ellis, and Gaël Richard, "Melody extraction from polyphonic music signals," IEEE Signal Processing Magazine, March 2014.

2. Pablo Cancela, Ernesto López, and Martín Rocamora, "Fan chirp transform for music representation," in Proc. of the 13th Int. Conference on Digital Audio Effects (DAFx-10), Graz, Austria, September 6-10, 2010.

3. Olivier Gillet and Gaël Richard, "Transcription and separation of drum signals from polyphonic music," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 3, March 2008.

4. G. Poliner, D. Ellis, A. Ehmann, E. Gómez, S. Streich, and B. Ong, "Melody transcription from music audio: approaches and evaluation," IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 4, pp. 1247-1256, May 2007.

5. Gaël Richard, "Melody extraction from polyphonic music signals," International Workshop on Acoustic Signal Enhancement (IWAENC 2014), September 11, 2014.

6. Jean-Louis Durrieu, Gaël Richard, Bertrand David, and Cédric Févotte, "Source/filter model for unsupervised main melody extraction from polyphonic audio signals," IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 3, March 2010.
