Department of Electrical and Electronics Engineering

Permanent URI for this collection: http://localhost:4000/handle/123456789/1925

Search Results

Now showing 1 - 4 of 4

Multiresolution Features Based Polynomial Kernel Discriminant Analysis for Speaker Recognition
(IEEE, 2009) Ajmera, Pawan K.
This paper describes a polynomial kernel subspace approach to speaker recognition. An auditory-motivated wavelet packet transform is used to derive the speaker features. The nonlinear mapping between the input space and the feature space is performed implicitly using the kernel trick, which increases the discrimination capability of the pattern classifier. Using Mel-scale and Bark-scale based wavelet packet trees for feature extraction incorporates human auditory perception and enhances classification performance. Experimental results show that the proposed kernel-based technique is computationally efficient and performs well with less training data.
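The kernel trick mentioned in the abstract above can be illustrated with a small sketch. This is not the paper's implementation: the degree (2), the offset `c`, and the function names `poly_kernel` and `explicit_map` are assumptions chosen for illustration. It only shows that a polynomial kernel evaluated in the input space matches an inner product in an explicit higher-dimensional feature space.

```python
import math
from itertools import combinations

def poly_kernel(x, y, c=1.0):
    """Degree-2 polynomial kernel k(x, y) = (x . y + c)^2 (assumed form)."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** 2

def explicit_map(x, c=1.0):
    """Explicit degree-2 feature map whose inner product equals poly_kernel."""
    squares = [a * a for a in x]
    cross = [math.sqrt(2.0) * x[i] * x[j]
             for i, j in combinations(range(len(x)), 2)]
    linear = [math.sqrt(2.0 * c) * a for a in x]
    return squares + cross + linear + [c]

# Toy vectors standing in for speaker feature vectors (hypothetical values).
x = [1.0, 2.0, 3.0]
y = [0.5, -1.0, 2.0]

# Implicit evaluation: one dot product in the 3-dimensional input space.
k_implicit = poly_kernel(x, y)

# Explicit evaluation: dot product in the 10-dimensional feature space.
k_explicit = sum(a * b for a, b in zip(explicit_map(x), explicit_map(y)))
```

Both values agree, but the implicit form never builds the higher-dimensional vectors, which is why kernel methods stay tractable as the polynomial degree grows.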
Speaker Recognition Using Auditory Features and Polynomial Classifier
(International Journal of Computer Applications, 2010-02) Ajmera, Pawan K.
This paper presents a speaker recognition method that uses auditory features and a polynomial classifier. Auditory features based on an auditory periphery model extract significant speaker characteristics. A polynomial classifier is used to accomplish the speaker recognition task. It has several advantages over conventional classifiers, such as computational scalability with the number of speakers, discriminative training that allows it to use out-of-class data, and a statistical interpretation of scoring that allows it to be combined with HMM and GMM. This approach achieves substantial performance improvement in a speaker identification task compared with the state of the art across a wide range of signal-to-noise conditions.
Robust feature extraction from spectrum estimated using bispectrum for speaker recognition
(Springer, 2012-06) Ajmera, Pawan K.
Extraction of robust features from noisy speech signals is one of the challenging problems in speaker recognition. Because the bispectrum and all higher-order spectra of a Gaussian process are identically zero, the bispectrum removes additive white Gaussian noise while preserving the magnitude and phase information of the original signal. The spectrum of the original signal can be recovered from its noisy version using this property. Robust Mel Frequency Cepstral Coefficients (MFCC) are extracted from the estimated spectral magnitude and are denoted Bispectral-MFCC (BMFCC). The effectiveness of BMFCC has been tested on the TIMIT and SGGS databases in noisy environments. The proposed BMFCC features yield speaker recognition rates of 95.30 %, 97.26 % and 94.22 % on the TIMIT, SGGS and SGGS2 databases, respectively, at 20 dB SNR, whereas the corresponding values at 0 dB SNR are 45.84 %, 50.79 % and 44.98 %. The experimental results show the superiority of the proposed technique over conventional methods for all databases.
Text-independent speaker identification using Radon and discrete cosine transforms based features from speech spectrogram
(Elsevier, 2011-11) Ajmera, Pawan K.
This paper presents a new feature extraction technique for speaker recognition using the Radon transform (RT) and the discrete cosine transform (DCT). The spectrogram is compact, efficient in representation and carries information about acoustic features in the form of a pattern. In the proposed method, speaker-specific features are extracted by applying image processing techniques to the pattern available in the spectrogram. The Radon transform is used to derive effective acoustic features from the speech spectrogram: it adds up the pixel values in the given image along a straight line in a particular direction and at a specific displacement. The proposed technique computes Radon projections for seven orientations and captures the acoustic characteristics of the spectrogram. Applying the DCT to the Radon projections yields a low-dimensional feature vector. The technique is computationally efficient, text-independent, robust to session variations and insensitive to additive noise. The performance of the proposed algorithm has been evaluated using the Texas Instruments and Massachusetts Institute of Technology (TIMIT) database and our own Shri Guru Gobind Singhji (SGGS) database. The recognition rate of the proposed algorithm is 96.69% on the TIMIT database (630 speakers) and 98.41% on the SGGS database (151 speakers). These results highlight the superiority of the proposed method over some of the existing algorithms.
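The pipeline described in the abstract above (Radon projections at seven orientations, followed by a DCT) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the seven angles, so evenly spaced angles over 180 degrees are assumed; the nearest-bin projection, the unnormalized DCT-II, the number of retained coefficients (`n_coeffs=4`), the toy 5x5 "spectrogram", and all function names are hypothetical.

```python
import math

def radon_projection(image, theta_deg):
    """Sum pixel values along parallel lines for one orientation.

    Nearest-bin approximation: each pixel is added to the bin of its
    signed displacement from the image centre.
    """
    rows, cols = len(image), len(image[0])
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    half = int(math.ceil(math.hypot(rows, cols) / 2.0))
    proj = [0.0] * (2 * half + 1)
    t = math.radians(theta_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    for r in range(rows):
        for c in range(cols):
            s = (c - cx) * cos_t + (r - cy) * sin_t  # displacement of this pixel
            proj[int(math.floor(s + 0.5)) + half] += image[r][c]
    return proj

def dct2(signal):
    """Unnormalized DCT-II of a 1-D sequence."""
    n = len(signal)
    return [sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                for i, v in enumerate(signal))
            for k in range(n)]

def radon_dct_features(image, n_angles=7, n_coeffs=4):
    """Concatenate the first n_coeffs DCT coefficients of each projection."""
    features = []
    for a in range(n_angles):
        theta = a * 180.0 / n_angles  # assumed: evenly spaced orientations
        features.extend(dct2(radon_projection(image, theta))[:n_coeffs])
    return features

# Toy 5x5 "spectrogram" with hypothetical pixel values.
image = [[float((r * 5 + c) % 7) for c in range(5)] for r in range(5)]
features = radon_dct_features(image)
```

With these settings the feature vector has 7 x 4 = 28 entries, which illustrates how keeping only the low-order DCT coefficients of each projection gives a compact representation of the spectrogram pattern.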
