Using Vision-Language Models (VLMs) for Arabic Handwritten Text Recognition
Supervisor Name
Hamed Abdelhaq
Supervisor Email
hamed@najah.edu
University
An-Najah National University
Research field
Data Science
Bio
Hamed Abdelhaq is an Assistant Professor of Computer Science at An-Najah National University, Palestine. He received his PhD in 2016 from Heidelberg University in Germany, where his research focused on spatio-temporal data analysis, social media mining, and event detection. His doctoral thesis, supervised by Prof. Dr. Michael Gertz, explored methods for mining spatio-temporal patterns from social media streams to support real-time identification of localized real-world events. Hamed earned both his BSc (2005) and MSc (2007) degrees in Computer Science from the University of Jordan. He then served for about three years as a lecturer in the Computer Science Department at An-Najah National University before pursuing his PhD under a DAAD scholarship. His current research interests include the application of Large Language Models (LLMs) and generative AI across various domains, with a particular emphasis on intelligent systems for healthcare. Hamed worked as a senior data analyst at moovel group GmbH, a provider of a wide range of mobility services, where his main role was to build recommendation systems that improve mobility. In addition, he worked remotely as a part-time data mining consultant at SocialDice, USA, with the main goal of building a smart resume ranking system.
This project explores the use of Vision-Language Models (VLMs) for recognizing Arabic handwritten text. By jointly leveraging visual and linguistic features, the model aims to accurately transcribe handwritten Arabic words and sentences into digital text. The project involves fine-tuning pre-trained VLMs on Arabic handwriting datasets, evaluating their performance, and comparing them with traditional OCR and deep learning approaches to demonstrate the advantages of multimodal learning for complex scripts like Arabic.
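The evaluation and comparison step could be sketched as follows. The project description does not name its metrics, so this example assumes character error rate (CER) and word error rate (WER), which are commonly used for handwriting recognition. The function names `edit_distance`, `cer`, and `wer` are illustrative and not part of any project code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Assumed example: the model drops one letter from a four-letter word.
print(cer("كتاب", "كتب"))
```

Applying the same functions to the output of each system (a fine-tuned VLM, a traditional OCR engine, another deep learning model) on a shared test set would give directly comparable scores. Because Arabic diacritics are encoded as separate Unicode code points, the reference and predicted text would need a consistent normalization policy before scoring.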
