International Journal on Science and Technology

E-ISSN: 2229-7677     Impact Factor: 9.88

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal

Deep Hybrid CNN-LSTM Architecture with Mel-MFCC-Chroma Feature Fusion and Attention Mechanism for Enhanced Egyptian Arabic Speech Emotion Recognition

Author(s) Mr. Sajad Muhil Abd
Country Iraq
Abstract Speech Emotion Recognition (SER) for Arabic dialects remains challenging because of small datasets and distinctive prosodic features. This paper presents a deep hybrid CNN-LSTM architecture with an attention mechanism for recognizing emotions in Egyptian Arabic speech. On the EYASE dataset, the model classifies four emotions (Angry, Happy, Neutral, Sad) with 93.64% accuracy, 26.84 percentage points higher than the state of the art. The model combines 2D convolutional layers for spatial feature extraction with bidirectional LSTM networks for temporal modeling, augmented with an attention mechanism. The input is a hybrid feature set of Mel spectrograms (128 bands), MFCCs (40 coefficients), and Chroma features (12 pitch classes). Extensive data augmentation (pitch shifting, time stretching, and SpecAugment) supports strong results despite the small training set. The model generalizes well to both unseen speakers (84.0%) and new utterances (87.0%). The work advances Egyptian Arabic SER and offers insight into building robust systems for low-resource languages.
In addition, extensive ablation studies confirm the contribution of each architectural component and feature modality to overall performance. The proposed method remains stable across recording conditions and speaker variability, which supports its real-world relevance. This study lays a foundation for further work on cross-dialect Arabic SER and multilingual emotion-aware systems.
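The feature fusion and attention step described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the author's implementation. It assumes the three feature streams are concatenated frame by frame (128 + 40 + 12 = 180 values per frame) and that attention is a softmax-weighted average over time steps. The abstract does not state how the streams are fused or how attention is scored, so both choices are assumptions. The CNN and BiLSTM stages are omitted, random numbers stand in for real features, and the names `fuse_features`, `attention_pool`, and `query` are hypothetical.

```python
import math
import random

# Feature sizes taken from the abstract.
N_MEL, N_MFCC, N_CHROMA = 128, 40, 12


def fuse_features(mel, mfcc, chroma):
    """Concatenate per-frame vectors from the three streams (assumed fusion)."""
    if not (len(mel) == len(mfcc) == len(chroma)):
        raise ValueError("all streams must have the same number of frames")
    return [m + c + h for m, c, h in zip(mel, mfcc, chroma)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Score each frame by its dot product with `query`, then return the
    softmax-weighted average of the frames and the weights themselves."""
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    context = [
        sum(w * frame[i] for w, frame in zip(weights, frames))
        for i in range(dim)
    ]
    return context, weights


def random_frames(n_frames, dim, rng):
    """Placeholder features; real ones would come from an audio front end."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_frames)]


rng = random.Random(0)
n_frames = 50  # hypothetical utterance length in frames
mel = random_frames(n_frames, N_MEL, rng)
mfcc = random_frames(n_frames, N_MFCC, rng)
chroma = random_frames(n_frames, N_CHROMA, rng)

fused = fuse_features(mel, mfcc, chroma)
# In a trained model the query vector would be learned; here it is random.
query = [rng.gauss(0.0, 0.1) for _ in range(len(fused[0]))]
context, weights = attention_pool(fused, query)

print(len(fused), len(fused[0]))  # 50 180
```

In the architecture the abstract describes, attention would operate on the BiLSTM outputs rather than on the raw fused frames; the pooling arithmetic is the same, only the input vectors differ.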
Keywords Speech Emotion Recognition, Egyptian Arabic, CNN-LSTM, Attention Mechanism, Deep Learning, Low-Resource Languages
Field Computer > Artificial Intelligence / Simulation / Virtual Reality
Published In Volume 17, Issue 1, January-March 2026
Published On 2026-01-16
