Optical Character Recognition (OCR) is a cornerstone of digitization, enabling machines to convert scanned documents and images into editable, searchable text. While OCR technology has matured for widely spoken languages,...
Research on Several Key Technologies of NLP for Low Resource Pashto Language
December 31, 2023
Online Social Networks (OSNs) have revolutionized communication but also brought challenges like hate speech, cyberbullying, and offensive content. While Natural Language Processing (NLP) helps detect such abuse, most research focuses...
Detecting Offensive Language in Pashto: A Breakthrough in NLP for Low-Resource Languages
October 18, 2023
In the digital age, social media platforms are flooded with offensive content, posing challenges for maintaining a healthy online environment. While significant progress has been made in detecting toxic language...
Correction of Whitespace and Word Segmentation in Noisy Pashto Text using CRF
August 14, 2023
Pashto, a low-resource language spoken by millions, presents unique challenges in NLP, particularly in word segmentation. Unlike English, where whitespace reliably marks word boundaries, Pashto uses whitespace inconsistently, leading to...
NLPashto: NLP Toolkit for Low-resource Pashto Language
June 15, 2023
Natural Language Processing (NLP) has revolutionized communication and technology, but many low-resource languages, like Pashto, remain underserved. Pashto, spoken by over 50 million people, lacks essential NLP tools and resources...
