Feature Matching Techniques for Speaker Recognition


  • Maharaja Surajmal Institute of Technology, Department of ECE, New Delhi, India


Speaker recognition is a branch of biometric authentication that deals with the automatic identification of an individual using inherent characteristics of that individual. The last stage of such a system is the classification of the feature templates generated during the preceding stage, feature extraction. This classification stage, also known as feature matching, provides the final decision about the speaker under observation. Choosing an appropriate feature matching technique is therefore essential for an accurate result. Numerous feature matching techniques are available for this purpose. The present work analyzes the feature matching techniques used in the final step of a speaker recognition system. These techniques can be categorized as statistical, soft-computing, and hybrid techniques. Statistical techniques include Vector Quantization (VQ), Gaussian Mixture Model (GMM), and Hidden Markov Model (HMM), while soft-computing techniques include Artificial Neural Network (ANN), Support Vector Machine (SVM), and fuzzy logic. Hybrid techniques combine methods from both categories.
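As an illustration of what feature matching means in practice, the following is a minimal sketch of VQ-style matching, not the method evaluated in this work. It assumes each enrolled speaker is represented by a codebook of centroid vectors and that the test utterance is assigned to the speaker whose codebook yields the lowest average distortion. The function names, the two-dimensional toy vectors, and the speaker labels are hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def average_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest codeword."""
    return sum(min(dist(f, c) for c in codebook) for f in features) / len(features)


def match_speaker(features, codebooks):
    """Return the enrolled speaker whose codebook gives the lowest distortion."""
    return min(codebooks, key=lambda name: average_distortion(features, codebooks[name]))


# Hypothetical toy codebooks; a real system would train them on extracted features.
codebooks = {
    "speaker_a": [(0.0, 0.0), (1.0, 1.0)],
    "speaker_b": [(5.0, 5.0), (6.0, 6.0)],
}

# Hypothetical feature vectors from a test utterance.
test_features = [(0.1, -0.1), (0.9, 1.2), (0.2, 0.1)]

best = match_speaker(test_features, codebooks)
print(best)
```

The same decision rule carries over to the other statistical techniques in spirit: the distortion score would be replaced by a model likelihood, and the speaker with the best score is selected.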


Artificial Neural Network (ANN), Feature Matching, Speaker Recognition, Gaussian Mixture Model (GMM), Support Vector Machine (SVM), Vector Quantization (VQ)

Subject Discipline

Engineering and ECE




