A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic Features of Speech Recognition In Noisy Environment
Hajer Rahali, Zied Hajaiej, Noureddine Ellouze
Pages - 25 - 37     |    Revised - 31-03-2014     |    Published - 30-04-2014
Volume - 8   Issue - 2    |    Publication Date - April 2014
KEYWORDS
Gammachirp Filter, Wavelet Packet, MFCC, Impulsive Noise.
ABSTRACT
Modern automatic speech recognition (ASR) systems typically use a bank of linear filters as the first step in the frequency analysis of speech. The cochlea, however, which performs frequency analysis in the human auditory system, is known to have a compressive, non-linear frequency response that depends on the input stimulus level. This paper presents a new method that uses the gammachirp auditory filter within a continuous wavelet analysis. The essential characteristic of this model is that it performs a wavelet packet decomposition over frequency bands that approximate the critical bands of the ear, unlike existing models based on the short-time Fourier transform (STFT). Prosodic features such as pitch, formant frequency, jitter and shimmer are extracted from the fundamental frequency contour and appended to baseline spectral features, namely Mel Frequency Cepstral Coefficients (MFCC), Gammachirp Filterbank Cepstral Coefficients (GFCC) and Gammachirp Wavelet Frequency Cepstral Coefficients (GWFCC). The results show that the gammachirp wavelet performs comparably to MFCC and GFCC, and the experiments show that this architecture gives the best performance. The paper implements the gammachirp wavelet (GW) and examines its application to a specific speech example. Implications for noise-robust speech analysis are also discussed using the AURORA databases.
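The abstract does not give the authors' implementation, so the following is only a minimal pure-Python sketch, under stated assumptions, of the GFCC-style path: a bank of gammachirp filters (impulse response g(t) = t^(n-1) exp(-2πb·ERB(f_r)t) cos(2πf_r t + c ln t + φ), after Irino and Patterson, 1997) with centre frequencies spaced on the ERB-rate scale, log band energies, a DCT-II, and two prosodic values (local jitter and shimmer) appended to the cepstra. The parameter values (n = 4, b = 1.019, the illustrative chirp term c = -1.0), the 40 ms truncation, the frequency range and all function names are assumptions for illustration. The wavelet packet (GWFCC) variant, pitch and formant extraction are not shown; the period and amplitude sequences are assumed to come from a separate pitch tracker.

```python
import math


def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)."""
    return 24.7 + 0.108 * f_hz


def erb_space(f_low, f_high, num):
    """num >= 2 centre frequencies equally spaced on the ERB-rate scale."""
    def to_rate(f):
        return 21.4 * math.log10(1.0 + 0.00437 * f)

    def from_rate(r):
        return (10.0 ** (r / 21.4) - 1.0) / 0.00437

    lo, hi = to_rate(f_low), to_rate(f_high)
    return [from_rate(lo + i * (hi - lo) / (num - 1)) for i in range(num)]


def gammachirp_ir(fr, fs, n=4, b=1.019, c=-1.0, phase=0.0, duration=0.04):
    """Truncated, peak-normalised gammachirp impulse response.

    g(t) = t^(n-1) exp(-2 pi b ERB(fr) t) cos(2 pi fr t + c ln t + phase), t > 0.
    c = 0 gives a gammatone; c = -1.0 is an illustrative value, not a fitted one.
    """
    ir = []
    for k in range(1, round(duration * fs) + 1):  # k starts at 1 so ln(t) is defined
        t = k / fs
        envelope = t ** (n - 1) * math.exp(-2.0 * math.pi * b * erb(fr) * t)
        ir.append(envelope * math.cos(2.0 * math.pi * fr * t + c * math.log(t) + phase))
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]


def convolve(x, h):
    """Full linear convolution (output length len(x) + len(h) - 1)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def gfcc_like(frame, fs, num_filters=12, num_ceps=8, f_low=100.0, f_high=None):
    """Log gammachirp filterbank energies followed by a DCT-II (GFCC-style)."""
    if f_high is None:
        f_high = 0.45 * fs
    log_energies = []
    for fr in erb_space(f_low, f_high, num_filters):
        y = convolve(frame, gammachirp_ir(fr, fs))
        log_energies.append(math.log(sum(v * v for v in y) + 1e-12))
    m_total = len(log_energies)
    return [
        sum(e * math.cos(math.pi * k * (m + 0.5) / m_total)
            for m, e in enumerate(log_energies))
        for k in range(num_ceps)
    ]


def local_jitter(periods):
    """Mean absolute difference of consecutive periods divided by the mean period."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))


def local_shimmer(amplitudes):
    """Mean absolute difference of consecutive peak amplitudes divided by the mean amplitude."""
    diffs = [abs(a - b) for a, b in zip(amplitudes, amplitudes[1:])]
    return (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))


if __name__ == "__main__":
    fs = 8000
    frame = [math.sin(2.0 * math.pi * 500.0 * k / fs) for k in range(256)]
    # Hypothetical pitch periods (s) and peak amplitudes for a voiced segment.
    periods = [0.0100, 0.0102, 0.0099, 0.0101]
    amplitudes = [0.80, 0.78, 0.82, 0.79]
    features = gfcc_like(frame, fs) + [local_jitter(periods), local_shimmer(amplitudes)]
    print(len(features))  # 8 cepstral values + 2 prosodic values -> 10
```

The loops are kept in plain Python only so the sketch has no dependencies; a practical front end would use vectorised convolution (for example `scipy.signal.fftconvolve`). This sketch also uses the passive gammachirp and does not model the level-dependent, compressive behaviour of the cochlea described above.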
Miss Hajer Rahali
National Engineering School of Tunis (ENIT) Laboratory of Systems and Signal Processing (LSTS) BP 37, Le Belvédère, 1002 Tunis - Tunisia
hajer.rahali@enit.rnu.tn
Mr. Zied Hajaiej
National Engineering School of Tunis (ENIT) Laboratory of Systems and Signal Processing (LSTS) BP 37, Le Belvédère, 1002 Tunis - Tunisia
Dr. Noureddine Ellouze
National Engineering School of Tunis (ENIT) Laboratory of Systems and Signal Processing (LSTS) BP 37, Le Belvédère, 1002 Tunis - Tunisia