Ajou University repository

Multi-lingual optical character recognition system using the reinforcement learning of character segmenter
  • Park, Jaewoo ;
  • Lee, Eunji ;
  • Kim, Yoonsik ;
  • Kang, Isaac ;
  • Koo, Hyung Il ;
  • Cho, Nam Ik
Citations (SCOPUS): 33

DC Field | Value | Language
dc.contributor.author | Park, Jaewoo | -
dc.contributor.author | Lee, Eunji | -
dc.contributor.author | Kim, Yoonsik | -
dc.contributor.author | Kang, Isaac | -
dc.contributor.author | Koo, Hyung Il | -
dc.contributor.author | Cho, Nam Ik | -
dc.date.issued | 2020-01-01 | -
dc.identifier.issn | 2169-3536 | -
dc.identifier.uri | https://dspace.ajou.ac.kr/dev/handle/2018.oak/31682 | -
dc.description.abstract | In this article, we present a new multi-lingual Optical Character Recognition (OCR) system for scanned documents. In the case of Latin characters, current open source systems such as Tesseract provide very high accuracy. However, the accuracy for multi-lingual documents that include Asian characters is usually lower than that for Latin-only documents. For example, when a document mixes English, Chinese, and/or Korean characters, the OCR accuracy is lower than for English-only documents because the character/text properties of Chinese and Korean are quite different from those of Latin-type characters. To tackle these problems, we propose a new framework using three neural blocks (a segmenter, a switcher, and multiple recognizers) and the reinforcement learning of the segmenter: the segmenter partitions a given word image into multiple character images, the switcher assigns a recognizer to each sub-image, and the recognizers perform the recognition of the assigned sub-images. The training of the recognizers and the switcher can be considered traditional image classification tasks, and we train them with a supervised learning method. However, the supervised learning of the segmenter has two critical drawbacks: its objective function is sub-optimal and its training requires a large amount of annotation effort. Thus, by adopting the REINFORCE algorithm, we train the segmenter so as to optimize the overall performance, i.e., we minimize the edit distance of the final recognition results. Experimental results show that the proposed method significantly improves the performance for multi-lingual scripts and large character set languages without using character boundary labels. | -
dc.description.sponsorship | This work was supported in part by the Ministry of Science and ICT (MSIT), South Korea, through the Information Technology Research Center (ITRC) Support Program supervised by the Institute for Information and Communications Technology Planning and Evaluation (IITP) under Grant IITP-2020-2020-0-01461. | -
dc.language.iso | eng | -
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | -
dc.subject.mesh | Character boundaries | -
dc.subject.mesh | Character images | -
dc.subject.mesh | Large character set | -
dc.subject.mesh | Objective functions | -
dc.subject.mesh | Open source system | -
dc.subject.mesh | Optical character recognition (OCR) | -
dc.subject.mesh | Optical character recognition system | -
dc.subject.mesh | Supervised learning methods | -
dc.title | Multi-lingual optical character recognition system using the reinforcement learning of character segmenter | -
dc.type | Article | -
dc.citation.endPage | 174448 | -
dc.citation.startPage | 174437 | -
dc.citation.title | IEEE Access | -
dc.citation.volume | 8 | -
dc.identifier.bibliographicCitation | IEEE Access, Vol.8, pp.174437-174448 | -
dc.identifier.doi | 10.1109/access.2020.3025769 | -
dc.identifier.scopusid | 2-s2.0-85096537095 | -
dc.identifier.url | http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639 | -
dc.subject.keyword | Deep learning | -
dc.subject.keyword | Document analysis | -
dc.subject.keyword | Optical character recognition | -
dc.description.isoa | true | -
dc.subject.subarea | Computer Science (all) | -
dc.subject.subarea | Materials Science (all) | -
dc.subject.subarea | Engineering (all) | -
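
The abstract describes training the segmenter with REINFORCE so that the edit distance of the final recognition result is minimized. The following is a minimal sketch of that idea only, not the authors' implementation, and it rests on several assumptions that are not in the record: the segmenter is reduced to independent Bernoulli cut decisions with one logit per candidate boundary, the word image is a string of column labels, `toy_recognize` is a hypothetical stand-in for the switcher and recognizers, and the moving-average baseline, learning rate, and step count are illustrative choices.

```python
import math
import random


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_cuts(logits, rng):
    """Sample one binary cut decision per candidate boundary."""
    return [1 if rng.random() < sigmoid(z) else 0 for z in logits]


def toy_recognize(columns: str, cuts) -> str:
    """Hypothetical stand-in for the switcher and recognizers (assumption).

    A segment is read correctly only if all of its columns carry the same
    label; a segment that straddles two characters yields '?'.
    """
    segments, start = [], 0
    for i, cut in enumerate(cuts, start=1):
        if cut:
            segments.append(columns[start:i])
            start = i
    segments.append(columns[start:])
    return "".join(s[0] if len(set(s)) == 1 else "?" for s in segments)


def reinforce_step(logits, columns, target, baseline, rng, lr=0.5):
    """One REINFORCE update; the reward is the negative edit distance."""
    cuts = sample_cuts(logits, rng)
    reward = -edit_distance(toy_recognize(columns, cuts), target)
    advantage = reward - baseline
    # For a Bernoulli policy, d log p(cut) / d logit = cut - sigmoid(logit).
    new_logits = [z + lr * advantage * (c - sigmoid(z))
                  for z, c in zip(logits, cuts)]
    return new_logits, reward


def train(columns, target, steps=2000, seed=0):
    """Run REINFORCE with a moving-average baseline (illustrative choice)."""
    rng = random.Random(seed)
    logits = [0.0] * (len(columns) - 1)
    baseline = 0.0
    for _ in range(steps):
        logits, reward = reinforce_step(logits, columns, target, baseline, rng)
        baseline = 0.9 * baseline + 0.1 * reward
    return logits


if __name__ == "__main__":
    learned = train("aabbbcc", "abc")
    greedy_cuts = [1 if z > 0 else 0 for z in learned]
    print(greedy_cuts, toy_recognize("aabbbcc", greedy_cuts))
```

Only the target string is used as supervision here, which mirrors the abstract's point that no character boundary labels are needed; the sampled cuts are scored purely by how close the recognized text is to the target.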

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

KOO, HYUNG IL (구형일)
Department of Electrical and Computer Engineering

File Download

  • There are no files associated with this item.