Advancing Pashto OCR: Introducing PsOCR and Benchmarking Large Multimodal Models

Optical Character Recognition (OCR) is a cornerstone of digitization, enabling machines to convert scanned documents and images into editable, searchable text. While OCR technology has matured for high-resource languages, low-resource languages like Pashto, written in a cursive Perso-Arabic script with…

