International Journal on Science and Technology
E-ISSN: 2229-7677
Impact Factor: 9.88
Deep Hybrid CNN-LSTM Architecture with Mel-MFCC-Chroma Feature Fusion and Attention Mechanism for Enhanced Egyptian Arabic Speech Emotion Recognition
| Author(s) | Mr. Sajad Muhil Abd |
|---|---|
| Country | Iraq |
| Abstract | Speech Emotion Recognition (SER) for Arabic dialects remains challenging due to small datasets and the peculiarities of dialectal prosody. This paper presents a deep hybrid CNN-LSTM architecture with an attention mechanism for recognizing emotions in Egyptian Arabic speech. On the EYASE dataset, the model classifies four emotions (Angry, Happy, Neutral, Sad) with 93.64% accuracy, 26.84 percentage points higher than the previous state of the art. The model combines 2D convolutional layers for spatial feature extraction with bidirectional LSTM networks for temporal modeling, enhanced by an attention mechanism. Input features fuse Mel spectrograms (128 bands), MFCCs (40 coefficients), and Chroma features (12 pitch classes). Extensive data augmentation, including pitch shifting, time stretching, and SpecAugment, enables strong results despite the limited training data. The model generalizes well to both unseen speakers (84.0% accuracy) and new utterances (87.0%). This work advances Egyptian Arabic SER and offers insights for building robust systems for low-resource languages. In addition, an extensive ablation study confirms the contribution of each architectural component and feature modality to overall performance. The proposed method remains stable across varying recording conditions and speaker variability, evidencing its real-world applicability. This study lays the groundwork for further developments in cross-dialect Arabic SER and multilingual emotion-aware systems. |
| Keywords | Speech Emotion Recognition, Egyptian Arabic, CNN-LSTM, Attention Mechanism, Deep Learning, Low-Resource Languages |
| Field | Computer > Artificial Intelligence / Simulation / Virtual Reality |
| Published In | Volume 17, Issue 1, January-March 2026 |
| Published On | 2026-01-16 |
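The abstract mentions an attention mechanism applied over the bidirectional LSTM outputs, but the page gives no implementation details. As a minimal, dependency-free sketch (assuming a generic softmax-weighted attention pooling, with the function name `attention_pool` and a per-time-step score vector as hypothetical inputs), the pooling step could look like:

```python
import math

def attention_pool(hidden_states, scores):
    """Softmax-weighted pooling over LSTM time steps.

    hidden_states: list of T feature vectors (each a list of floats),
                   e.g. bidirectional LSTM outputs per time step.
    scores: list of T raw attention scores, one per time step.
    Returns (context, weights): the attention-weighted sum of the
    hidden states and the normalized attention weights.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax over time steps
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return context, weights
```

With equal scores the pooling reduces to mean pooling; unequal scores let the model emphasize emotionally salient frames before classification. The actual paper may compute the scores with a learned projection, which is omitted here.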
A CrossRef DOI is assigned to each research paper published in our journal. The IJSAT DOI prefix is 10.71097/IJSAT.
All research papers published on this website are licensed under the Creative Commons Attribution-ShareAlike 4.0 International License, and all rights belong to their respective authors/researchers.