Applied System Innovation, Vol. 2, Issue 4 (2019)
ARTICLE
TITLE

Business Process Automation: A Workflow Incorporating Optical Character Recognition and Approximate String and Pattern Matching for Solving Practical Industry Problems

Coenrad de Jager and Marinda Nel    

Abstract

Companies increasingly rely on artificial intelligence and machine learning to enhance and automate existing business processes. While optical character recognition (OCR) technologies can digitize image data, the digitized text must still be validated and enhanced before it meets the data quality standards required for use. This paper develops an automated workflow that follows image digitization and produces a dictionary containing the desired information. The workflow is a three-step process applied after the OCR output has been generated, and each step increases the accuracy of the key-value matches between field names and field values. The first step takes the raw OCR output and identifies field names using exact string matching and field values using regular expressions from an externally maintained file. The second step introduces index pairing, which matches field values to field names based on the locations of the name and the value on the document. The third step adds approximate string matching, which further increases accuracy. The F-measure for key-value pair matches is 60.18% after the first step, 80.61% once index pairing is introduced, and 90.06% after approximate string matching is introduced. The results show that accurate, usable data can be obtained automatically from images by applying a workflow after OCR.
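The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The field names, the regular expressions, the "label: value" line layout, the 0.8 similarity threshold, and the use of `difflib.SequenceMatcher` for approximate matching are all assumptions made for illustration. Index pairing is simplified here to "take the value from the same line as the field name", and the F-measure is assumed to be the standard harmonic mean of precision and recall.

```python
import re
from difflib import SequenceMatcher

# Hypothetical field definitions. The paper reads these from an externally
# maintained file; they are hard-coded here to keep the sketch self-contained.
FIELD_PATTERNS = {
    "Invoice Number": r"[A-Z]{3}-\d{4}",
    "Date": r"\d{4}-\d{2}-\d{2}",
    "Total": r"\d+\.\d{2}",
    "Tax": r"\d+\.\d{2}",
}

# Made-up OCR output; the first label contains a typical misread ("l" for "I").
OCR_LINES = [
    "lnvoice Number: INV-0042",
    "Date: 2019-10-01",
    "Total: 115.00",
    "Tax: 15.00",
]


def locate_names(lines, threshold):
    """Map each recognised field name to the index of the line it appears on.

    A threshold of 1.0 accepts only identical strings (exact matching);
    a lower threshold accepts approximate matches.
    """
    found = {}
    for idx, line in enumerate(lines):
        label = line.split(":", 1)[0].strip()

        def score(name):
            return SequenceMatcher(None, label, name).ratio()

        best = max(FIELD_PATTERNS, key=score)
        if score(best) >= threshold and best not in found:
            found[best] = idx
    return found


def find_value(name, lines, idx, use_index_pairing):
    """Find a value for a field using its regular expression."""
    pattern = re.compile(FIELD_PATTERNS[name])
    if use_index_pairing:
        # Simplified index pairing: search only the text after the label
        # on the line where the field name was found.
        text = lines[idx].split(":", 1)[-1]
    else:
        # No location information: take the first match anywhere.
        text = "\n".join(lines)
    match = pattern.search(text)
    return match.group(0) if match else None


def extract(lines, use_index_pairing=False, name_threshold=1.0):
    """Build the dictionary of field name -> value."""
    result = {}
    for name, idx in locate_names(lines, name_threshold).items():
        value = find_value(name, lines, idx, use_index_pairing)
        if value is not None:
            result[name] = value
    return result


def f_measure(predicted, gold):
    """Harmonic mean of precision and recall over key-value pairs."""
    correct = sum(1 for k, v in predicted.items() if gold.get(k) == v)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


GOLD = {
    "Invoice Number": "INV-0042",
    "Date": "2019-10-01",
    "Total": "115.00",
    "Tax": "15.00",
}

step1 = extract(OCR_LINES)                                  # exact names + regex
step2 = extract(OCR_LINES, use_index_pairing=True)          # + index pairing
step3 = extract(OCR_LINES, use_index_pairing=True,
                name_threshold=0.8)                         # + approximate names

for label, result in [("step 1", step1), ("step 2", step2), ("step 3", step3)]:
    print(label, result, round(f_measure(result, GOLD), 4))
```

On this toy input, the first step assigns the "Total" amount to "Tax" because both fields share a pattern and the first match in the text wins. Pairing by line corrects that value, and approximate matching then recovers the misread "lnvoice Number" label. The scores printed for this input are specific to the made-up data and are unrelated to the percentages reported in the abstract.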

 Similar articles

       
 
Timotej Jagric and Aljaž Herman
This paper presents a broad study on the application of the BERT (Bidirectional Encoder Representations from Transformers) model for multiclass text classification, specifically focusing on categorizing business descriptions into 1 of 13 distinct industr...
Journal: Information

 
Abdulghafour Mohammad and Job Mathew Kollamana    
One of the main obstacles in software development projects is requirement volatility (RV), which is defined as uncertainty or changes in software requirements during the development process. Therefore, this research tries to understand the underlying fac...
Journal: Informatics

 
Martin Krajcovic, Gabriela Gabajová, Martin Gašo and Marek Schickerle
The Demand-Driven Material Resource Planning (DDMRP) method is one of the newer methods of inventory management in an enterprise. Its creation was initiated by a change in the business environment and the characteristics of today's supply chains. DDMRP b...
Journal: Applied Sciences

 
Chen Li, Yinxu Lu, Yong Bian, Jie Tian and Mu Yuan    
The quality and safety of agricultural products involve a variety of risk factors, a large amount of risk information data, and multiple circulation and disposal processes, making it difficult to accurately trace the source of risks. To achieve precise t...
Journal: Applied Sciences

 
José Brás, Ruben Pereira and Sérgio Moro    
Robotic process automation and intelligent process automation have gained a foothold in the automation of business processes, using blocks of software (bots). These agents interact with systems through interfaces, replacing human intervention with the ai...
Journal: Information