Transfer Learning for Automatic Speech Recognition in Low-Resource Languages: A Case Study of Palestinian Arabic

Supervisor Name

Hamed Abdelhaq

Supervisor Email

hamed@najah.edu

University

An-Najah National University

Research Field

Data Science

Bio

Hamed Abdelhaq is an Assistant Professor of Computer Science at An-Najah National University, Palestine. He received his PhD in 2016 from Heidelberg University in Germany, where his research focused on spatio-temporal data analysis, social media mining, and event detection. His doctoral thesis, supervised by Prof. Dr. Michael Gertz, explored methods for mining spatio-temporal patterns from social media streams to support real-time identification of localized real-world events. Hamed earned both his BSc (2005) and MSc (2007) in Computer Science from the University of Jordan. He then served for about three years as a lecturer in the Computer Science Department at An-Najah National University before pursuing his PhD under a DAAD scholarship. His current research interests include the application of Large Language Models (LLMs) and generative AI across various domains, with a particular emphasis on intelligent systems for healthcare. Hamed has also worked as a senior data analyst at moovel group GmbH, a company that provides a wide range of mobility services, where his main role was to build recommendation systems that improve mobility. In addition, he worked remotely as a part-time data mining consultant at SocialDice, USA, where the main goal was to build a smart resume ranking system.

Description

State-of-the-art automatic speech recognition (ASR) systems achieve strong performance in high-resource languages; however, they remain underdeveloped for many Arabic dialects, including Palestinian Arabic. The lack of curated, high-quality speech datasets and the limited number of deployment-oriented studies hinder the advancement of practical, locally deployable speech technologies for applications in education, legal proceedings, and privacy-sensitive environments. This research aims to develop a carefully curated Palestinian Arabic speech dataset and to systematically investigate methods for adapting and optimizing modern ASR models to enable efficient, real-time, and privacy-aware local deployment. The outcomes of this research are expected to enable impactful applications such as: (1) local transcription of voice messages in healthcare and other privacy-sensitive environments, allowing hospitals to preserve patient privacy; (2) live transcription tools for courts and legal environments; (3) a unified Palestinian university audio hub containing searchable lectures and recordings; and (4) foundations for multimodal educational archives supporting Palestinian students and researchers.