Emotion Recognition From Speech

Python
Machine Learning
Research
Neural Networks
Artificial Intelligence
Voice Recognition
Robotics
Algorithms
Problem Solving

This project classifies the emotion a person is experiencing based on their speech and its intonation. It was motivated by the need for emotion recognition in human-robot interaction. Facial recognition is often used to detect the emotion a person might be experiencing, but there are situations where a machine such as a robot has no access to a visual data stream. In those cases, the robot could instead use a person's speech and speech intonation to infer their emotion or mood, and then use that knowledge to tailor its responses during human-robot interaction.

The pipeline uses an MFCC (Mel-frequency cepstral coefficient) conversion to turn each audio sample into an image-like representation. That representation is then run through two convolutional neural networks, a ResNet and a custom-built convolutional network, to compare and contrast their accuracy at emotion detection. The SAVEE, RAVDESS, and TESS datasets supplied the training data, and the categorized emotions were Calm, Happy, Sad, Angry, Fearful, Surprise, and Disgust. The ResNet model achieved a validation accuracy of 83.62%, and the custom convolutional model achieved a validation accuracy of 82.12%.
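
To make the audio-to-image step concrete, here is a minimal sketch of MFCC extraction, assuming librosa as the feature-extraction library (the project description does not name one); the function name, coefficient count, and frame cap are illustrative choices, not the project's actual parameters:

```python
# Sketch: convert an audio clip into a fixed-size MFCC "image".
# Assumes librosa; n_mfcc and max_frames are illustrative, not the
# values used in this project.
import librosa
import numpy as np

def audio_to_mfcc_image(path, n_mfcc=40, max_frames=174):
    """Load an audio file and return an (n_mfcc, max_frames) MFCC array."""
    y, sr = librosa.load(path, sr=22050)                     # mono waveform
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    # Pad or truncate along the time axis so every clip has the same shape,
    # which lets the arrays be stacked and fed to a CNN like images.
    if mfcc.shape[1] < max_frames:
        mfcc = np.pad(mfcc, ((0, 0), (0, max_frames - mfcc.shape[1])))
    else:
        mfcc = mfcc[:, :max_frames]
    return mfcc
```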

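And for the classification step, a minimal sketch of a custom convolutional classifier over those MFCC images, assuming Keras; the layer sizes and dropout rate are assumptions for illustration, not the project's actual architecture:

```python
# Sketch: a small custom CNN that maps an MFCC image to one of the
# seven emotion classes. Layer widths are illustrative assumptions.
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 7  # Calm, Happy, Sad, Angry, Fearful, Surprise, Disgust

def build_cnn(input_shape=(40, 174, 1)):
    return keras.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

model = build_cnn()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

The same MFCC inputs could be fed to a ResNet for the comparison described above, with the final dense layer replaced to output the seven classes.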

Artifacts

Project Presentation: A presentation detailing the project, including the problem statement, purpose, datasets, results, conclusions, and future steps. (Download)
Raw Code: The raw code for the project, without any of the datasets or outputs. (Download)