Persian Speech Emotion Recognition (SER)

🔥 Overview

Recognizing emotion in speech is important for natural human-computer interaction, but it remains difficult in low-resource languages such as Persian. This project introduces a multimodal Speech Emotion Recognition (SER) system that combines acoustic features with textual features derived from speech transcripts to improve emotion classification accuracy.

🎯 Key Contributions

Multimodal Fusion: Combines acoustic and textual features for improved recognition.
Whisper ASR: Transcribes speech to text, from which linguistic features are extracted.
Modified Differential Evolution (MDE): Selects a reduced feature subset, lowering dimensionality while improving accuracy.
Self-Attention Mechanism: Enhances the fusion of extracted features.
Deep Learning-based Classification: Leverages CNN and Transformer-based architectures.
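
As an illustration of the Whisper ASR step listed above, here is a minimal sketch of Persian transcription. It assumes the `openai-whisper` package; the model size (`"small"`) and the helper names `load_whisper` and `transcribe_persian` are hypothetical and not taken from this repository.

```python
def load_whisper(size="small"):
    """Load a Whisper checkpoint (assumes `pip install openai-whisper`).

    The import is kept local so the rest of the module can be used
    without the package installed. The model size is an assumption.
    """
    import whisper
    return whisper.load_model(size)


def transcribe_persian(audio_path, model):
    """Return the Persian transcript of one audio file.

    `model` is any object exposing Whisper's `transcribe(path, language=...)`
    method, which returns a dict containing a "text" entry.
    """
    result = model.transcribe(audio_path, language="fa")  # "fa" = Persian
    return result["text"].strip()
```

A typical call would look like `transcribe_persian("utterance.wav", load_whisper())`, where the file name is a placeholder. Passing the model in as an argument lets it be loaded once and reused across the whole corpus.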

📌 Dataset

We evaluate our model on ShEMO (the Sharif Emotional Speech Database), a well-established Persian speech emotion corpus.

📊 Model Architecture

🔹 Feature Extraction

Acoustic Features: MFCCs, spectral descriptors, and other low-level descriptors (LLDs)
Text Features: Speech-to-text conversion using Whisper ASR, followed by tokenization and embedding
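
To make the idea of frame-level acoustic descriptors concrete, the sketch below computes three simple LLDs (RMS energy, zero-crossing rate, spectral centroid) with plain NumPy and summarizes them per utterance. It is only an illustration: the function names, the 25 ms / 10 ms framing at 16 kHz, and the choice of descriptors are assumptions, and these hand-rolled features stand in for the MFCC and toolkit-based features the project actually uses.

```python
import numpy as np


def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]


def low_level_descriptors(signal, sr, frame_len=400, hop_len=160):
    """Return an (n_frames, 3) array: RMS energy, zero-crossing rate,
    spectral centroid in Hz."""
    frames = frame_signal(signal, frame_len, hop_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Fraction of adjacent sample pairs whose sign differs.
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    # Magnitude spectrum of each Hann-windowed frame.
    mag = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    centroid = (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-10)
    return np.stack([rms, zcr, centroid], axis=1)


def utterance_vector(llds):
    """Summarize frame-level LLDs with mean and standard deviation."""
    return np.concatenate([llds.mean(axis=0), llds.std(axis=0)])


# Synthetic 1-second, 440 Hz tone in place of a real recording.
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)

llds = low_level_descriptors(tone, sr)   # shape (98, 3)
features = utterance_vector(llds)        # shape (6,)
```

Applying statistics such as mean and standard deviation over frames turns a variable-length utterance into a fixed-length vector, which is the form a feature selection step typically operates on.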

🔹 Feature Selection

Modified Differential Evolution (MDE) searches for a feature subset that reduces dimensionality while improving classification accuracy.
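
For readers unfamiliar with evolutionary feature selection, the sketch below shows the general idea using classic differential evolution (DE/rand/1/bin) over real-valued vectors that are thresholded at 0.5 into a feature mask. This is not the MDE variant used in this project, whose modifications are not described here. The fitness function (nearest-centroid accuracy minus a size penalty), the hyperparameters, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def nearest_centroid_accuracy(X_train, y_train, X_val, y_val):
    """Validation accuracy of a nearest-class-centroid classifier."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_val[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(classes[dists.argmin(axis=1)] == y_val))


def fitness(vector, data, penalty=0.1):
    """Accuracy on the selected features minus a penalty on subset size."""
    mask = vector > 0.5
    if not mask.any():          # an empty subset is never acceptable
        return -np.inf
    X_train, y_train, X_val, y_val = data
    acc = nearest_centroid_accuracy(X_train[:, mask], y_train, X_val[:, mask], y_val)
    return acc - penalty * mask.mean()


def de_feature_selection(data, n_features, pop_size=20, generations=30,
                         F=0.5, CR=0.9, seed=0):
    """Classic DE/rand/1/bin; returns (boolean feature mask, best fitness)."""
    rng = np.random.default_rng(seed)
    pop = rng.random((pop_size, n_features))
    scores = np.array([fitness(ind, data) for ind in pop])
    for _ in range(generations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + F * (b - c), 0.0, 1.0)       # mutation
            cross = rng.random(n_features) < CR               # binomial crossover
            cross[rng.integers(n_features)] = True            # keep >= 1 mutant gene
            trial = np.where(cross, mutant, pop[i])
            trial_score = fitness(trial, data)
            if trial_score >= scores[i]:                      # greedy selection
                pop[i], scores[i] = trial, trial_score
    best = scores.argmax()
    return pop[best] > 0.5, float(scores[best])


# Synthetic data: 10 features, only the first two carry class information.
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 10))
X[:, 0] += 3.0 * y
X[:, 1] -= 3.0 * y
data = (X[:150], y[:150], X[150:], y[150:])

mask, score = de_feature_selection(data, n_features=10)
```

The size penalty is what pushes the search toward smaller subsets; without it, any mask that contains the informative features scores roughly the same regardless of how many noise features it also keeps.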

🔹 Classification Model

CNN-1D for acoustic feature extraction.
CNN-2D for textual feature processing.
Self-Attention Mechanism to fuse the acoustic and textual representations.
Final Emotion Classification on the fused representation via a deep learning-based model.
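
The fusion step above can be pictured with a small NumPy sketch of scaled dot-product self-attention over two "tokens", one per modality. It assumes the CNN branches have already produced fixed-size embeddings of equal dimension. The random matrices stand in for learned weights, and the embedding size, class count, and function names are illustrative assumptions rather than this project's actual architecture.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, Wq, Wk, Wv):
    """Scaled dot-product self-attention.

    tokens: (n_tokens, d). Returns (attended tokens, attention weights).
    """
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)
    return weights @ V, weights


def fuse_and_classify(acoustic_vec, text_vec, params):
    """Attend across the two modality embeddings, pool, and classify."""
    tokens = np.stack([acoustic_vec, text_vec])            # (2, d)
    attended, weights = self_attention(
        tokens, params["Wq"], params["Wk"], params["Wv"])
    fused = attended.mean(axis=0)                          # (d,)
    probs = softmax(fused @ params["W_out"] + params["b_out"])
    return probs, weights


rng = np.random.default_rng(0)
d, n_classes = 8, 6    # illustrative sizes, not taken from the project
params = {
    "Wq": rng.normal(size=(d, d)),
    "Wk": rng.normal(size=(d, d)),
    "Wv": rng.normal(size=(d, d)),
    "W_out": rng.normal(size=(d, n_classes)),
    "b_out": np.zeros(n_classes),
}

acoustic_vec = rng.normal(size=d)   # stand-in for the CNN-1D branch output
text_vec = rng.normal(size=d)       # stand-in for the CNN-2D branch output
probs, weights = fuse_and_classify(acoustic_vec, text_vec, params)
```

Each row of `weights` shows how much one modality attends to itself versus the other, which is what lets the fused representation lean on text when the audio is ambiguous and vice versa.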

📈 Results

Model	Accuracy
Baseline Acoustic Model	74.5%
Baseline Text Model	76.8%
Proposed Multimodal Model	82.3%

🛠️ Requirements

Python 3.7+
PyTorch / TensorFlow
Hugging Face Transformers
Whisper ASR

📬 Contact

For questions or collaborations, reach out via esmaeilimobina98@gmail.com or open an issue on GitHub!

🎯 Keywords: Speech Processing, Emotion Recognition, Differential Evolution, Multimodal Learning

