AUTHORS: Ponlawat Khamlae, Kingkarn Sookhanaphibarn and Worawat Choensawat

ABSTRACT: This paper presents a methodology for automated data entry of salary payslips from document images. Two problems make the task challenging: 1) payslip layouts vary from one company to another, and 2) different payslips use different terms for the same meaning. The proposed methodology centers on essential preprocessing, using image processing and regular expression settings, before optical character recognition (OCR). A post-processing step for number validation, which checks the recognized amounts against the financial formula, must also be considered.
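
The regular-expression matching and formula-based number validation mentioned in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the label synonyms in FIELD_PATTERNS, the helper names extract_amounts and is_consistent, and the financial formula itself (earnings minus deductions equals net pay) do not come from the abstract, which names neither the terms nor the formula.

```python
import re
from decimal import Decimal

# Hypothetical label synonyms: different wording, same meaning.
# The paper's actual term lists are not given in the abstract.
FIELD_PATTERNS = {
    "earnings": r"(?:total\s+(?:earnings|income)|gross\s+(?:pay|salary))",
    "deductions": r"(?:total\s+deductions?|deductions?\s+total)",
    "net": r"net\s+(?:pay|salary|income)",
}

# An amount such as 30,000.00 or 30000: digits, optional thousands commas,
# optional decimal part.
AMOUNT = r"(\d[\d,]*(?:\.\d+)?)"


def extract_amounts(ocr_text):
    """Return the first amount found after each known label in the OCR text."""
    found = {}
    for field, label in FIELD_PATTERNS.items():
        match = re.search(label + r"\s*:?\s*" + AMOUNT, ocr_text, flags=re.IGNORECASE)
        if match:
            found[field] = Decimal(match.group(1).replace(",", ""))
    return found


def is_consistent(amounts):
    """Check the assumed formula: earnings - deductions == net pay."""
    required = ("earnings", "deductions", "net")
    if any(field not in amounts for field in required):
        return False
    return amounts["earnings"] - amounts["deductions"] == amounts["net"]


sample = "Gross Salary: 30,000.00\nTotal Deductions 1,500.00\nNet Pay: 28,500.00"
amounts = extract_amounts(sample)
print(amounts, is_consistent(amounts))
```

Decimal is used instead of float so that the equality check on currency amounts is exact. A payslip that fails the check would be a candidate for manual review or re-recognition.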

Keywords: -

LINK: https://ieeexplore.ieee.org/document/9754842
