IndoNewStream

Speech recognition is the ability of a machine or program to identify words and phrases in spoken language and convert them to a machine-readable format. This post compares a convolutional neural network (CNN) with a basic neural network (NN) for that task. The datasets were collected from the internet. After noise removal, the cleaned audio data is fed to both the CNN and the basic NN and trained through their layers. Finally, each trained model is checked for training accuracy and validation accuracy, and then run on test data to check the accuracy and efficiency of the model.

Introduction:

Speech recognition plays a major role in many areas, such as smartphones, TVs, voice call routing, voice dialing, keyword search, and simple data entry.
Automatic speech recognition (ASR) can be used to enter data in data sheets, with a microphone used to dictate each word. At present the computer takes a long time to enter the dictated data into a particular cell. This problem can be reduced by training deep learning algorithms on more audio data.
Audio data collected from the internet is converted into signals using feature extraction.
The CNN is fed with these signals. All of the information is fed to the first layer of the CNN, and the second layer is fed with some extra features.
A CNN cannot recognize the whole signal at once, so the signal is divided into patches. Each feature is lined up with a patch of the signal. The corresponding pixels are multiplied one by one, the products are added up, and the sum is divided by the total number of pixels. This is repeated until all pixels have been considered.
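
The patch-matching step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the signal is treated as a small 2D grid of values, and the names `match_feature`, `convolve`, `signal`, and `feature` are made up for this example, not taken from the trained model.

```python
def match_feature(patch, feature):
    """Multiply matching values, add the products, divide by the value count."""
    total = 0.0
    count = 0
    for patch_row, feature_row in zip(patch, feature):
        for p, f in zip(patch_row, feature_row):
            total += p * f
            count += 1
    return total / count


def convolve(signal, feature):
    """Line the feature up with every patch of the grid and score each one."""
    fh, fw = len(feature), len(feature[0])
    filtered = []
    for r in range(len(signal) - fh + 1):
        row = []
        for c in range(len(signal[0]) - fw + 1):
            # Cut out the patch whose top-left corner is at (r, c).
            patch = [line[c:c + fw] for line in signal[r:r + fh]]
            row.append(match_feature(patch, feature))
        filtered.append(row)
    return filtered


# Hypothetical 3x3 input and 2x2 feature, chosen only for illustration.
signal = [
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
]
feature = [
    [1, -1],
    [-1, 1],
]

filtered = convolve(signal, feature)
print(filtered)  # [[1.0, -0.5], [-0.5, 1.0]]
```

A score of 1.0 means the patch matches the feature exactly; lower scores mean a weaker match.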

Algorithms used:

Convolutional neural network (CNN) and basic neural network.
The CNN has four layers:
Convolutional layer: This layer applies filters to the input, so we get filtered signals. Applying a bunch of features creates a stack of filtered images, which is the output of the convolutional layer.
Pooling layer: The signal stack is compressed. A window is moved across each filtered signal and only the maximum value in each window is kept, which reduces its size.
Rectified linear unit: In this layer, every negative pixel value is replaced with zero, so the stack no longer contains negative values.
Fully connected layer: The output of the previous layers is given as input to this layer.
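
The pooling and rectified linear unit steps listed above can be sketched in plain Python as follows. The 2x2 window, the stride of 2, and the sample values are assumptions chosen for illustration; the post does not state which settings were used.

```python
def max_pool(grid, window=2, stride=2):
    """Keep only the maximum value from each window of the grid."""
    pooled = []
    for r in range(0, len(grid), stride):
        row = []
        for c in range(0, len(grid[0]), stride):
            values = [v for line in grid[r:r + window] for v in line[c:c + window]]
            row.append(max(values))
        pooled.append(row)
    return pooled


def relu(grid):
    """Replace every negative value with zero."""
    return [[max(0.0, v) for v in row] for row in grid]


# Hypothetical 4x4 filtered signal, used only to show the two steps.
filtered = [
    [1.0, -0.5, 0.2, -0.1],
    [-0.5, 1.0, -0.3, 0.4],
    [-0.3, -0.2, 0.8, -0.6],
    [-0.7, -0.1, -0.4, 0.9],
]

pooled = max_pool(filtered)
rectified = relu(pooled)
print(pooled)     # [[1.0, 0.4], [-0.1, 0.9]]
print(rectified)  # [[1.0, 0.4], [0.0, 0.9]]
```

Pooling shrinks the 4x4 grid to 2x2, and the rectified linear unit then removes the one remaining negative value. In a full network, the rectified values would be flattened and passed to the fully connected layer.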
