Assessing and Analyzing Tesseract Based Nepali Script OCR
Keywords: Optical Character Recognition, Nepali Script, Tesseract, Nepali Font, Character Recognition
Sudan Prajapati - Department of Computer Science, Deerwalk Institute of Technology, Kathmandu, Nepal
Aman Maharjan - Central Department of Computer Science and Information Technology, Tribhuvan University, Kirtipur, Nepal
Shashidhar Ram Joshi - IOE, Pulchowk Campus
Bikash Balami - Central Department of Computer Science and Information Technology, Tribhuvan University, Kirtipur, Nepal
Published Date: 2019-04-03
ABSTRACT
Character recognition is commonly referred to as Optical Character Recognition (OCR)
because it deals with the recognition of optically processed characters. With the advent of
digital optical scanners, many paper-based books, textbooks, magazines, articles, and
documents are being converted into electronic versions that can be manipulated by a computer.
OCR is an instance of off-line character recognition, in which the system recognizes the
fixed, static shape of a character. This paper focuses on character recognition of printed
text in Nepali script and analyzes the efficiency of a Nepali OCR system based on the
Tesseract engine. The benchmark for this investigation and analysis is a dataset created
from 69 different fonts, with 2,484 samples of Nepali consonants. An overall accuracy of
96% was obtained in the training phase and 69% in the testing phase.
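
As a rough illustration of the figures above, the following minimal Python sketch shows one way a per-character recognition accuracy could be computed, and checks the dataset arithmetic. The abstract does not state how accuracy was measured, so the exact-match metric, the function name `recognition_accuracy`, and the sample labels are assumptions made for illustration only. The value of 36 samples per font follows from 2,484 / 69 only if samples are spread evenly across fonts, which the abstract does not confirm.

```python
# Hypothetical sketch: exact-match accuracy over isolated character samples.
# The metric and all names here are illustrative assumptions, not the
# authors' published evaluation procedure.

FONT_COUNT = 69        # number of fonts reported in the abstract
SAMPLE_COUNT = 2484    # number of consonant samples reported in the abstract

# Assumes an even split of samples across fonts (not stated in the abstract).
samples_per_font = SAMPLE_COUNT // FONT_COUNT


def recognition_accuracy(expected, predicted):
    """Return the fraction of samples whose predicted label equals the expected one."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        return 0.0
    correct = sum(1 for e, p in zip(expected, predicted) if e == p)
    return correct / len(expected)


# Made-up example: four consonant samples, one misrecognized (घ read as ध).
expected_labels = ["क", "ख", "ग", "घ"]
predicted_labels = ["क", "ख", "ग", "ध"]
example_accuracy = recognition_accuracy(expected_labels, predicted_labels)
```

In practice the predicted labels would come from running the OCR engine on each sample image; that step is omitted here because the paper's Tesseract configuration is not given in the abstract.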