DEERWALK JOURNAL OF COMPUTER SCIENCE & INFORMATION TECHNOLOGY

Assessing and Analyzing Tesseract Based Nepali Script OCR

Keywords: Optical Character Recognition, Nepali Script, Tesseract, Nepali Font, Character Recognition
Sudan Prajapati - Department of Computer Science, Deerwalk Institute of Technology, Kathmandu, Nepal
Aman Maharjan - Central Department of Computer Science and Information Technology, Tribhuvan University, Kirtipur, Nepal
Shashidhar Ram Joshi - IOE, Pulchowk Campus
Bikash Balami - Central Department of Computer Science and Information Technology, Tribhuvan University, Kirtipur, Nepal
Published Date: 2019-04-03

ABSTRACT
Character recognition is commonly referred to as Optical Character Recognition (OCR) because it deals with the recognition of optically processed characters. With the advent of digital optical scanners, many paper-based books, textbooks, magazines, articles, and documents are being converted into electronic versions that can be manipulated by a computer. OCR is an instance of off-line character recognition, in which the system recognizes the fixed, static shape of a character. This paper focuses on the recognition of printed text in Nepali script and analyzes the efficiency of a Nepali OCR system based on the Tesseract engine. As the benchmark for this investigation, a dataset of Nepali consonants was created in 69 different fonts, totaling 2,484 samples (36 consonants per font). An overall accuracy of 96% was obtained in the training phase and 69% in the testing phase.
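The benchmark described above is a grid of fonts by consonants, scored by how many samples the OCR engine reads back correctly. The abstract does not say exactly how accuracy was computed, so the following is only a minimal sketch under two assumptions: (1) the consonant set is the conventional 36-letter Nepali list including the conjuncts क्ष, त्र and ज्ञ (consistent with 2,484 / 69 = 36), and (2) accuracy is the fraction of samples whose recognized text exactly matches the ground-truth consonant. The names `build_samples`, `accuracy`, `recognize`, and the font names are hypothetical; in a real setup `recognize` would render the glyph in the given font and pass the image to Tesseract (for example through a wrapper such as pytesseract with the `nep` language data), which is not shown here.

```python
from typing import Callable, Iterable

# Assumption: the conventional 36 consonants of Nepali (Devanagari) script,
# including the conjuncts क्ष, त्र and ज्ञ. The paper's exact list may differ.
CONSONANTS = [
    "क", "ख", "ग", "घ", "ङ",
    "च", "छ", "ज", "झ", "ञ",
    "ट", "ठ", "ड", "ढ", "ण",
    "त", "थ", "द", "ध", "न",
    "प", "फ", "ब", "भ", "म",
    "य", "र", "ल", "व",
    "श", "ष", "स", "ह",
    "क्ष", "त्र", "ज्ञ",
]


def build_samples(fonts: Iterable[str]) -> list[tuple[str, str]]:
    """Pair every font with every consonant: one (font, consonant) sample each."""
    return [(font, ch) for font in fonts for ch in CONSONANTS]


def accuracy(samples: list[tuple[str, str]],
             recognize: Callable[[str, str], str]) -> float:
    """Fraction of samples whose recognized text equals the ground truth.

    `recognize(font, consonant)` is a hypothetical stand-in for rendering the
    consonant in `font` and running the image through an OCR engine.
    """
    if not samples:
        raise ValueError("no samples to score")
    correct = sum(1 for font, ch in samples if recognize(font, ch).strip() == ch)
    return correct / len(samples)


# Usage with 69 hypothetical font names.
fonts = [f"font_{i:02d}" for i in range(69)]
samples = build_samples(fonts)
print(len(samples))  # 2484

# Toy recognizer (hypothetical): misreads every glyph in the first 10 fonts.
bad_fonts = set(fonts[:10])


def toy_recognize(font: str, ch: str) -> str:
    return "?" if font in bad_fonts else ch


print(round(accuracy(samples, toy_recognize), 3))  # 0.855
```

Scoring whole samples by exact match keeps the metric simple and matches the one-glyph-per-sample layout of a consonant dataset; for running text, a character error rate based on edit distance would usually be more informative.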

