Making sense of receipt data using JavaScript fetch() and the OCR Space API
As part of a software development curriculum, students at a local software training institute pick a project after three weeks of vanilla JavaScript. The requirements: it must manipulate the DOM and make requests to an API.
The receipt reader was born out of boredom and amusement at the pile of receipts accumulated over a year. Yes, there are existing solutions, but none of them can be hosted locally, so let's make one.
Step 1: Build the interface with HTML and some client-side JavaScript
The first part of this project is a simple HTML form with an input whose “type” attribute is set to “file” (here the input's id is “filename”).
When the form is submitted, JavaScript reads the selected file:
let photo = document.getElementById("filename").files[0];
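Next, read the photo's raw data with a FileReader object. Its readAsDataURL() method encodes the file as a base64 data URL (data:<type>;base64,<data>), which is the string format the OCR API accepts.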
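if (photo) {
  const reader = new FileReader();
  // When reading finishes, reader.result holds the "data:<type>;base64,<data>" string
  reader.onloadend = () => uploadFile(reader.result);
  // Encode the file as a base64 data URL
  reader.readAsDataURL(photo);
}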
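If you prefer promises or async/await to the onloadend callback, the same kind of data URL can be built from the File object directly. This is an alternative sketch, not the project's code: blobToDataURL is a hypothetical helper name, and it assumes an environment with Blob.prototype.arrayBuffer() and btoa() (modern browsers, or Node 18+ for testing).

```javascript
// Hypothetical helper: promise-based alternative to FileReader.readAsDataURL().
async function blobToDataURL(blob) {
  const bytes = new Uint8Array(await blob.arrayBuffer());
  // btoa() expects a "binary string" with one character per byte
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  // Generic fallback type when the Blob has none (a choice made for this sketch)
  const type = blob.type || "application/octet-stream";
  return `data:${type};base64,${btoa(binary)}`;
}
```

Inside an async submit handler, this would let you write uploadFile(await blobToDataURL(photo)) instead of wiring up a callback.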
With the contents of the image in hand, we can move on to making sense of it.
Step 2: Process the receipt image
For test data, we use camera photos of receipts, such as the one shown below.
Attempt 1: Using Google Cloud’s OCR APIs
Google Cloud offers a vast library of APIs, including Google Cloud Vision and Google Cloud Document AI, both suitable for the task at hand, and new accounts get a generous $300 in free credits.
For this task, Google provides a follow-along tutorial that involves the following steps:
- Set up a Google Cloud account and install the Google Cloud CLI locally
- Create a project
- Enable the Document AI / Cloud Vision API for the project
- Create a service account
- Set up a processor
Honestly, at this point creating this much infrastructure seems like overkill for a first iteration. Change of plans.
Attempt 2 (final): Re-strategize and use OCR Space
A wise man changes their mind — Unknown
Knowing the desired input (an image as a base64 string) and output (JSON), we can try again with less setup. Enter the free OCR API from OCR Space, which graciously provides a free endpoint with no credit card required.
To test the endpoint, we can run the Postman collection provided by OCR Space. Once the endpoint returns a valid response, we can write the JavaScript equivalent; Postman's code snippet feature provides a handy shortcut for this.
First, create a Headers object and add your API key:
// Replace "mykey" with your own OCR Space API key
const myHeaders = new Headers();
myHeaders.append("apikey", "mykey");
Next, create a FormData object and set the language of the text in the image, along with any other parameters.
const formdata = new FormData();
formdata.append("language", "eng");
Because camera photos of receipts are grainy and their contents are laid out in rows and columns, it is prudent to have the API scale the images up and parse them as tabular data. For the full list of parameters that can be added to the form data, check out the description of the POST parameters in the OCR Space API documentation. Note also that several OCR engines are available. A closer look at the API documentation suggests that OCR Engine 3 works best for this task, as it offers extra language support, so let us set that up as well:
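// blobFile: the base64 data URL string handed to uploadFile() (reader.result)
formdata.append("base64Image", blobFile);
formdata.append("scale", "true");    // let the API upscale low-resolution images
formdata.append("isTable", "true");  // return the text line by line, as on the receipt
formdata.append("OCREngine", "3");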
Now to make the magic happen.
fetch("https://api.ocr.space/parse/image", {
  method: "POST",
  headers: myHeaders,
  body: formdata, // the multipart Content-Type header is set automatically
  redirect: "follow"
})
  .then(response => response.json()) // the API responds with JSON
  .then(result => console.log(result))
  .catch(error => console.error("error", error));
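Since this request is the same for every receipt, it can be wrapped in a reusable async function. This is a sketch under assumptions, not the project's code: ocrReceipt is a hypothetical name, the fetchImpl parameter exists only so the function can be tested without hitting the network, and the response fields it reads (IsErroredOnProcessing, ErrorMessage, ParsedResults[].ParsedText) follow the OCR Space documentation, so check them against the response you saw in Postman.

```javascript
// Hypothetical wrapper around the OCR Space request shown above.
// Assumes global fetch and FormData (modern browsers, Node 18+).
async function ocrReceipt(dataUrl, apiKey, fetchImpl = fetch) {
  const formdata = new FormData();
  formdata.append("base64Image", dataUrl); // "data:<type>;base64,<data>" string
  formdata.append("language", "eng");
  formdata.append("scale", "true");
  formdata.append("isTable", "true");
  formdata.append("OCREngine", "3");

  const response = await fetchImpl("https://api.ocr.space/parse/image", {
    method: "POST",
    headers: { apikey: apiKey },
    body: formdata,
  });
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}`);
  }

  const result = await response.json();
  if (result.IsErroredOnProcessing) {
    // ErrorMessage may be a single string or an array of strings
    throw new Error([].concat(result.ErrorMessage ?? "OCR failed").join("; "));
  }
  // One ParsedResults entry per page; join their text into a single string
  return (result.ParsedResults ?? []).map((page) => page.ParsedText).join("\n");
}
```

Checking IsErroredOnProcessing matters because a failed OCR job can still come back as a normal HTTP response, so response.ok alone does not tell you the text was extracted.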
In the spirit of a quick first iteration, we only need four pieces of information from the returned JSON: the name of the store, the date of the purchase, the total amount and the Value Added Tax (VAT). Together with the image URL, this information then needs to be kept in a data store.
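The OCR output is plain text, so the store name, date, total and VAT still have to be picked out of it. The sketch below uses simple regular-expression heuristics. extractReceiptFields is a hypothetical helper, and its assumptions (store name on the first line, amounts written with a "." decimal separator and two decimals, dates such as 12/03/2023 or 2023-03-12, lines labelled TOTAL and VAT) will not hold for every receipt, so treat it as a starting point to tune against your own data.

```javascript
// Hypothetical heuristic parser for the ParsedText returned by the OCR API.
function extractReceiptFields(text) {
  // Non-empty, trimmed lines of the OCR output
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  // Assumption: the store name is printed on the first line
  const store = lines.length > 0 ? lines[0] : null;

  // Last amount like 1,160.00 or 160.00 on the first line matching `label`
  const amountOn = (label) => {
    const line = lines.find((l) => label.test(l));
    if (!line) return null;
    const amounts = line.match(/\d[\d,]*\.\d{2}/g);
    if (!amounts) return null;
    return Number(amounts[amounts.length - 1].replace(/,/g, ""));
  };

  // ISO yyyy-mm-dd, or day/month/year style with /, . or - separators
  const date = text.match(/\b(?:\d{4}-\d{2}-\d{2}|\d{1,2}[\/.-]\d{1,2}[\/.-]\d{2,4})\b/);

  return {
    store,
    date: date ? date[0] : null,
    total: amountOn(/\btotal\b/i),
    vat: amountOn(/\bvat\b/i),
  };
}
```

The \b word boundaries keep a SUBTOTAL line from being mistaken for the TOTAL line.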
In this case, I used a local db.json file as my “database” and mocked up an API that runs CRUD operations against it using JSON Server.
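JSON Server exposes each top-level key of db.json as a REST resource. Assuming db.json contains a "receipts" array and the server runs on its default port 3000 (started with something like npx json-server --watch db.json; flags vary between JSON Server versions), the CRUD calls could look like the sketch below. The resource name, the helper names and the injectable fetchImpl parameter are all assumptions made for this illustration.

```javascript
// Assumptions: db.json looks like { "receipts": [] } and JSON Server listens
// on http://localhost:3000. "receipts" and the helper names are hypothetical.
const RECEIPTS_URL = "http://localhost:3000/receipts";

// Throw on non-2xx responses so callers can surface an error
function ensureOk(response, action) {
  if (!response.ok) {
    throw new Error(`${action} failed with HTTP ${response.status}`);
  }
  return response;
}

// CREATE: JSON Server assigns an id and returns the stored record
async function saveReceipt(receipt, fetchImpl = fetch) {
  const response = await fetchImpl(RECEIPTS_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(receipt),
  });
  return ensureOk(response, "Save").json();
}

// READ: all stored receipts, e.g. to (re)build the table
async function listReceipts(fetchImpl = fetch) {
  const response = await fetchImpl(RECEIPTS_URL);
  return ensureOk(response, "List").json();
}

// DELETE: remove one receipt by its id
async function deleteReceipt(id, fetchImpl = fetch) {
  const response = await fetchImpl(`${RECEIPTS_URL}/${id}`, { method: "DELETE" });
  ensureOk(response, "Delete");
}
```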
Step 3: Render the received information
To beautify the initial form, you can style it with Tailwind CSS using the Mamba component library. We also need a table that displays the parsed receipts and lets us delete an existing receipt.
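A sketch of the table rendering, assuming a tbody element to append rows to and receipt objects with store, date, total, vat and id fields (assumed names). Both function names are hypothetical. The formatting helper is plain JavaScript, while renderReceiptRow needs a browser DOM; it uses textContent rather than innerHTML so that OCR output is never interpreted as HTML.

```javascript
// Turn one receipt into the text of its table cells (pure, no DOM needed)
function formatReceiptCells(receipt) {
  const money = (value) => (typeof value === "number" ? value.toFixed(2) : "-");
  return [receipt.store ?? "-", receipt.date ?? "-", money(receipt.total), money(receipt.vat)];
}

// Browser-only: append one row, with a Delete button, to a <tbody>.
// onDelete(id) could, for example, send a DELETE request to the JSON Server API.
function renderReceiptRow(tbody, receipt, onDelete) {
  const row = document.createElement("tr");
  for (const value of formatReceiptCells(receipt)) {
    const cell = document.createElement("td");
    cell.textContent = value;
    row.appendChild(cell);
  }
  const actionCell = document.createElement("td");
  const button = document.createElement("button");
  button.textContent = "Delete";
  button.addEventListener("click", async () => {
    await onDelete(receipt.id);
    row.remove(); // only reached if onDelete did not throw
  });
  actionCell.appendChild(button);
  row.appendChild(actionCell);
  tbody.appendChild(row);
}
```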
The end result still looks like a work in progress, but it works.
You can find the code for this project on GitHub, with updates as I continue to explore the Google Cloud Document AI API.