Skip to content

prerna250/Audio-classification

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

Language-classification

Language Classification Using CNN

Language classification is the technique of classifying languages based on many factors, suchas their historical origin, ancestor languages, linguistic characteristics, or grammatical structures. Convolutional Neural Networks (CNN) is a specific deep neural network design that is frequently used for computer vision, object recognition, and picture classification tasks. A CNN uses a series of convolutional layers to apply a set of filters to the input image in order to extract different features. One or more fully connected layers subsequently carry out the classification task using the extracted features on the output of the convolutional layers. During the training process, the filters in the convolutional layers are learned.

PROBLEM STATEMENT

A dataset of audio clips recorded in different languages is given, the objective is to develop a model which can accurately classify the language spoken in each clip. The model that is created should be able to handle all the different accents and dialects within every language and should be robust to background noise and variations in recording quality. The dataset mayalso contain a variable number of speakers and speaking styles, and the model that is created should be able to generalize well to unseen speakers and speaking styles.

BASIC CONCEPTS

  1. SPECTROGRAM A spectrogram is a graphic depiction of how effectively a signal develops over time at various frequencies. The spectrogram matrix can be further evaluated to obtain relevant classification features. Mel Frequency Cepstral Coefficients (MFCCs), a set of features that capture the spectral attributes of an audio signal and Spectral Contrast, which captures the variations inenergy between adjacent frequency bands, are two common feature extraction techniques.

  2. LIBROSA LIBRARY Librosa is a popular Python package for audio analysis and signal processing. It includes tools for loading and saving audio files in various formats, computing various audio features such as Mel Frequency Cepstral Coefficients (MFCCs) and Chroma features, and generating visualizations such as waveforms and spectrograms.

  3. TENSORFLOW TensorFlow is a dynamic tool for developing and training machine learning models that hasbeen adopted in a variety of applications such as image and speech recognition, natural language processing, and recommendation systems.

  4. CNN Convolutional Neural Networks (CNNs) are neural networks developed using deep learning that are typically used for image and video recognition but may also be utilized for other sorts of data such as audio signals. CNNs are intended to learn features from input data, such as images or audio signals, and then utilize those characteristics to produce predictions or classifications.

About

Language Classification Using CNN

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published