Keywords:
Register Transfer Level-Software Defined Radio
Speech/Music classification
Real-time system
Multimedia signal processing
EFFICIENT IMPLEMENTATION
SPEECH/MUSIC CLASSIFIER
MUSIC
DISCRIMINATION
FEATURES
TRACKING
Abstract:
This work introduces a real-time system that automates the selection of radio stations according to the listener's preference (speech or music) by analyzing incoming audio with a machine-learning-based speech/music classifier (SMC). Radio-frequency data from different Frequency Modulated (FM) stations are read directly into MATLAB using the Communications Toolbox Support Package for Register Transfer Level-Software Defined Radio (RTL-SDR). The work is divided into two phases. First, the effectiveness of different state-of-the-art features for designing a speech/music classifier is studied on new speech/music corpora developed for Indian radio stations. Box plots and Receiver Operating Characteristic (ROC) plots were used to study feature importance. The findings from these experiments were then used to select the optimum feature sets for efficient operation of the real-time model. The models were tested on both offline and online (real-time) data. Concatenating the best-performing feature vectors, Mel-Frequency Cepstral Coefficients (MFCC) and the variance of spectral roll-off, gave the best classification accuracies of 93.06% and 83.91% on offline and real-time data, respectively, using a Gaussian Mixture Model (GMM) classifier. We also studied the effectiveness of recently proposed Empirical Mode Decomposition (EMD)-based statistical and Hilbert spectrum-based features on standard datasets (Slaney [SS], GTZAN, and MUSAN) and on the newly created BITM dataset. We achieved overall accuracies of 94.91%, 93.20%, 91.72%, and 95.40% on the SS, GTZAN, MUSAN, and BITM datasets, respectively. However, concatenating several features increased the latency of the algorithm. This work also proposes a new speech/music corpus based on Hindi-language recordings from Indian radio stations.
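As an illustration of one of the features named above, the following is a minimal sketch of how the variance of spectral roll-off over a clip might be computed. It is written in Python rather than MATLAB and is not the authors' implementation. The 85% roll-off fraction, the 64-sample non-overlapping frames, the naive DFT, and all function names are assumptions chosen for brevity.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the one-sided DFT of a real frame (naive O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_rolloff(frame, sample_rate, fraction=0.85):
    """Frequency in Hz below which `fraction` of the total spectral magnitude lies.

    The 0.85 default is an assumed, commonly used value, not taken from the paper.
    """
    mags = magnitude_spectrum(frame)
    threshold = fraction * sum(mags)
    cumulative = 0.0
    for k, m in enumerate(mags):
        cumulative += m
        if cumulative >= threshold:
            return k * sample_rate / len(frame)
    return sample_rate / 2.0


def rolloff_variance(signal, sample_rate, frame_len=64):
    """Population variance of the per-frame roll-off over non-overlapping frames."""
    rolloffs = [
        spectral_rolloff(signal[i:i + frame_len], sample_rate)
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]
    mean = sum(rolloffs) / len(rolloffs)
    return sum((r - mean) ** 2 for r in rolloffs) / len(rolloffs)
```

In a pipeline like the one described, a scalar of this kind would be appended to the MFCC vector of each clip before the classifier is trained. A practical implementation would use an FFT and windowed, overlapping frames.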