Text extraction tool to convert PDF documents into machine-readable text format
Quickcomm initially processed telecom PDF invoices manually. With this project, the company set out to automate the pre-processing of PDF data for upload into Quickcomm's databases.
Application requirements
Quickcomm initially processed telecom PDF invoices manually, because they contained information not available in the normal data feeds from telecom vendors. With this project, the company set out to automate the pre-processing of PDF data for upload into Quickcomm's databases. The goal was to reduce labor expenses and to increase the accuracy and speed of processing. The company was looking for a flexible tool to map and transform the text contents of PDF documents. Previously used products were unreliable, could not process some of the required PDF documents, and were very inflexible.
Customer benefits
Several teams in the accounting department work together to process and load data from invoices that arrive in PDF format. Other teams pay the invoices, analyze the results and provide reporting to clients. With the pdtxt/pdtotxt component, data from PDF documents is uploaded into the databases easily and efficiently. Moreover, the teams can now process PDFs from countries around the world in their original languages. The extracted data feeds further processes, e.g. paying invoices, financial audits and reporting. As a result, Quickcomm benefits from reduced labor expenses, more accurate data and fast turnaround.
Implementation
Quickcomm began using the shell tool pdtxt, a part of the PDF Extract Tool from PDF Tools, in combination with pdtotxt to convert PDF documents into machine-readable text and apply the necessary transformations.
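To illustrate the kind of mapping step that could follow text extraction, here is a minimal Python sketch. It does not show how pdtxt or pdtotxt are invoked and does not reflect Quickcomm's actual rules. The field names, the patterns and the sample text are all hypothetical, and the sketch assumes the invoice text has already been extracted into a plain string.

```python
import re
from decimal import Decimal

# Hypothetical field patterns for illustration only; real invoices
# would need vendor-specific mapping rules.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s+No\.?\s*:\s*(\S+)"),
    "account": re.compile(r"Account\s*:\s*(\S+)"),
    "total": re.compile(r"Total\s+Due\s*:\s*([\d.,]+)"),
}


def parse_invoice_text(text):
    """Map already-extracted invoice text to a record ready for upload."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            record[name] = match.group(1)
    # Transform the amount into a numeric type (assumes "," as the
    # thousands separator and "." as the decimal separator).
    if "total" in record:
        record["total"] = Decimal(record["total"].replace(",", ""))
    return record


# Hypothetical extracted text, standing in for the converter's output.
sample = """Invoice No.: INV-1001
Account: 555-0100
Total Due: 1,234.50
"""

print(parse_invoice_text(sample))
```

Keeping the patterns in a single table means a new vendor layout can be supported by adding entries instead of changing the parsing logic, which is one way to get the flexibility described in the requirements.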