Using Vision-Language Models (VLMs) for Arabic Handwritten Text Recognition

Supervisor Name

Hamed Abdelhaq

Supervisor Email

hamed@najah.edu

University

An-Najah National University

Research field

Data Science

Bio

Hamed Abdelhaq is an Assistant Professor of Computer Science at An-Najah National University, Palestine. He received his PhD in 2016 from Heidelberg University in Germany, where his research focused on spatio-temporal data analysis, social media mining, and event detection. His doctoral thesis, supervised by Prof. Dr. Michael Gertz, explored methods for mining spatio-temporal patterns from social media streams to support the real-time identification of localized real-world events. Hamed earned both his BSc (2005) and MSc (2007) degrees in Computer Science from the University of Jordan. He then served for about three years as a lecturer in the Computer Science Department at An-Najah National University before pursuing his PhD under a DAAD scholarship. His current research interests include the application of Large Language Models (LLMs) and generative AI across various domains, with a particular emphasis on intelligent systems for healthcare. Hamed also worked as a senior data analyst at moovel group GmbH, which provides a wide range of mobility services, where his main role was to build recommendation systems that improve mobility. In addition, he worked remotely as a part-time data mining consultant at SocialDice, USA, with the main goal of building a smart resume ranking system.

This project explores the adoption of Vision-Language Models (VLMs) for recognizing Arabic handwritten text. By jointly leveraging visual and linguistic features, these models aim to accurately transcribe handwritten Arabic words and sentences into digital text. The project involves fine-tuning pre-trained VLMs on Arabic handwriting datasets, evaluating their performance, and comparing them with traditional OCR and deep learning approaches to demonstrate the advantages of multimodal learning for complex scripts like Arabic.
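
The evaluation step could be sketched as follows. The project description does not name a metric, so this assumes character error rate (CER), a common choice for handwriting recognition: the edit distance between the reference transcription and the model output, divided by the reference length. The function names edit_distance and cer are illustrative, and normalizing both strings to Unicode NFC before comparison is also an assumption, not a stated project requirement.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    # Assumption: compare NFC-normalized text so that equivalent encodings
    # of the same Arabic character are not counted as errors.
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One missing letter out of four reference characters.
    print(cer("كتاب", "كتب"))
```

The same function can score a fine-tuned VLM and a traditional OCR baseline on identical test images, which keeps the comparison between approaches on a single scale. Whether diacritics should be stripped before scoring is a separate design decision that this sketch leaves open.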