Faisal Shafait

Title: Urdu OCR: A journey from hand-crafting to deep learning

Urdu is the national language of Pakistan and one of the prominent languages of the Indian subcontinent. Its script belongs to the Nabataean family of scripts and shares several attributes with other family members such as Arabic and Persian. Urdu has posed major challenges to the OCR community because individual letters join diagonally and seamlessly to form ligatures. In this talk, I will present the efforts made over the last two decades to recognize printed Urdu text using traditional computer vision and machine learning approaches. I will then demonstrate how deep learning architectures based on long short-term memory (LSTM) have solved this long-standing problem, making it possible to build practical Urdu OCR systems.


Faisal Shafait is currently the Director of the Deep Learning Laboratory at the National Center of Artificial Intelligence, Islamabad, Pakistan, as well as a Professor at the School of Electrical Engineering and Computer Science, National University of Sciences and Technology (NUST), Islamabad, Pakistan. Previously, he was an Assistant Research Professor at The University of Western Australia in Perth, Australia; a Senior Researcher at the German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany; and a Visiting Researcher at Google Inc., Mountain View, California. He received his PhD in computer engineering with the highest distinction from TU Kaiserslautern, Germany, in 2008. His research interests include machine learning and pattern recognition, with a special emphasis on applications in document image analysis. He has co-authored over 150 publications in international peer-reviewed conferences and journals in this area. He is the Founding President of the Pakistan Pattern Recognition Society, which is IAPR’s official chapter in Pakistan. He recently received the IAPR Young Scientist Award, making him the only Pakistani and Muslim scientist to receive this prestigious award, which is given to the most outstanding young scientist worldwide in the field of pattern recognition and document analysis.