Optical Character Recognition (“OCR”) in an Ember & Electron App

A good friend and I recently started a digital recipe book called Itamae. With an interest in turning Itamae into a desktop application and integrating Optical Character Recognition, we decided to use Ember and Electron for the build. Because it was our first time using these technologies, we thought it would be helpful to keep track of the experience to help others get started. The following is an ongoing reflection…

Itamae shell from day 1:

[Image: Itamae, an Ember and Electron recipe book with OCR]

What the… Ember, Electron, and OCR?

Ember

Coming from a Ruby and Rails background, Ember offers a sense of comfort that has been lacking in the raw JavaScript and jQuery applications I’ve worked on recently. Creating a new project, generating models, writing an adapter for the filesystem, testing with Mocha and Chai, crafting templates, etc. is sooo much easier to reason through when you have a foundation to build upon.

But what really is Ember? Per the documentation, it is “A Framework for Creating Ambitious Web Applications”. Okay, I can get on board with that. From my understanding, the creators of the framework have made many decisions up front that allow developers to get going FAST. The trade-off is that you give up some of the flexibility you get from other JavaScript frameworks such as Angular.

I view the choice as ordering from a fancy menu that provides an appetizer and dessert with limited substitutions versus grazing at an all-you-can-eat buffet. Sometimes I want the ability to keep going back for more Crab Rangoons. But most of the time I know I want Panang Curry and don’t really care to try everything else made by the chef.

Electron

Electron is badass. Previously called Atom Shell, it is a JavaScript framework for building cross-platform desktop apps that the folks at GitHub open sourced. Companies and products using Electron for their desktop applications include Microsoft, Facebook, Slack, Docker, the Atom editor, Visual Studio Code, WordPress, and many others.

It is perfect for Itamae because I want to upload, edit, and delete recipes from the filesystem on my laptop. Electron provides the ability to do so without the constant need for an internet connection to process data. I can also give users easy access to their recipes by letting them keep the app in their dock or menu bar and by integrating many of the key commands they’re already familiar with.
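To make the filesystem point concrete, here is a minimal sketch of saving, loading, and deleting a recipe with Node's built-in `fs` module, which is available in Electron's main process. This is not Itamae's actual adapter: the one-JSON-file-per-recipe layout and the function names (`recipePath`, `saveRecipe`, `loadRecipe`, `deleteRecipe`) are assumptions made up for illustration.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical layout: one JSON file per recipe, named after its id.
function recipePath(dir, id) {
  return path.join(dir, `${id}.json`);
}

// Create the folder if needed, then write (or overwrite) the recipe.
function saveRecipe(dir, recipe) {
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(recipePath(dir, recipe.id), JSON.stringify(recipe, null, 2));
}

// Read a recipe back from disk and parse it.
function loadRecipe(dir, id) {
  return JSON.parse(fs.readFileSync(recipePath(dir, id), 'utf8'));
}

// Remove the recipe file; throws if it does not exist.
function deleteRecipe(dir, id) {
  fs.unlinkSync(recipePath(dir, id));
}
```

Everything here works offline, which is the whole appeal. A real app would probably want the asynchronous `fs` variants so the UI never blocks on disk access.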

Similar to publishing iOS and Android applications built in React Native, I hope to eventually distribute Itamae on the Mac App Store so others can easily download it. Below is a talk at EmpireJS by the brilliant Steve Kinney on Electron.

Optical Character Recognition

From the start, it was important to me to explore OCR technologies and integrate them into Itamae. A problem I’ve faced when cooking is remembering where all the classic recipes are located. This frustration in searching through old cookbooks and half-torn printouts is why I want the user to be able to take a photo of a recipe and easily convert the image to text directly in the application. Think about snapping a photo of something you see in a magazine or on a menu, and easily adding the information to a new or existing recipe.

The three main types of text recognition are OCR for typewritten text, intelligent character recognition (ICR) using machine learning, and intelligent word recognition (IWR) for handwritten notes. It is important to note that none of these technologies is bulletproof, and each often requires some cleanup of the image (cropping to the text, adjusting contrast) before the results are accurate.
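As an example of the kind of cleanup mentioned above, here is a rough sketch of one common contrast fix: turning every pixel pure black or pure white before handing the image to an OCR engine. The `binarize` helper and its default threshold of 128 are my own illustration, not part of any OCR library. It works on a flat RGBA array like the `data` property of a canvas `ImageData` object.

```javascript
// Hypothetical helper: convert RGBA pixels to pure black/white in place.
// `pixels` is a flat RGBA array such as ImageData.data (Uint8ClampedArray).
function binarize(pixels, threshold = 128) {
  for (let i = 0; i < pixels.length; i += 4) {
    // Approximate perceived brightness using Rec. 601 luma weights.
    const luma = 0.299 * pixels[i] + 0.587 * pixels[i + 1] + 0.114 * pixels[i + 2];
    const value = luma >= threshold ? 255 : 0;
    pixels[i] = value;     // red
    pixels[i + 1] = value; // green
    pixels[i + 2] = value; // blue
    // Alpha (pixels[i + 3]) is left untouched.
  }
  return pixels;
}

// In a browser or Electron renderer, usage might look like:
//   const ctx = canvas.getContext('2d');
//   const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
//   binarize(imageData.data);
//   ctx.putImageData(imageData, 0, 0);
```

A fixed threshold is the simplest possible choice and will struggle with unevenly lit photos, so treat it as a starting point rather than a recommendation.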

For our project, we decided to integrate a JavaScript OCR library built by Kevin Kwok called Ocrad.js. Integrating this library allowed us to leverage past OCR experience while letting the user take a screenshot, snap a picture from a webcam, upload an image directly, or draw on a tablet and have that text converted.

Creating an Electron-OCR Module

After deciding to use Ocrad.js for the application, we decided it would be more beneficial to put development of Itamae on pause and focus on understanding the OCR technology better. We felt the base functionality for the app should be uploading an image directly, so the user could snap a photo and easily transfer that information to the recipe book. So that others don’t have to reinvent the wheel in their own Electron apps, we decided to build an electron-ocr npm module that is easy to integrate.

Ocrad requires the image, whether photographed or drawn, to be supplied as an HTML5 canvas element, a Context2D instance, or an instance of ImageData. Placing the canvas alongside an existing recipe felt like the most logical choice, so the user could see the raw image, crop and adjust the contrast for better conversion, and edit the converted text side by side.

The electron-ocr library was built in an empty Electron shell so we could exercise the functionality apart from other dependencies. The goal is for it to be usable in any project without other developers having to tweak major configurations. Although the Ocrad.js API is simple once configured, its documentation and the other resources online are not the best.

A technical challenge we faced when integrating the API with the canvas was grabbing the correct image data. For hours, OCRAD kept returning “-” with little other feedback. An open pull request discussed others experiencing a similar bug and pointed to the need to empty the canvas before passing data to be converted. Hours later, we found that the problem was not a non-empty canvas; rather, the dimensions of the image data being grabbed had to match the image being passed through exactly. If they did not, the library would return gibberish, an empty string, or nothing.
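The fix described above can be sketched roughly as follows: size the canvas to the image, draw the image, and read back exactly that region. This is a simplified illustration, not the code that ships in electron-ocr. `recognizeImage` and its `ocr` parameter are hypothetical names; in an Electron renderer you would pass in the global `OCRAD` function that Ocrad.js provides.

```javascript
// Sketch: make the canvas and the grabbed pixel region match the image
// exactly before running OCR. `ocr` is injected so it can be swapped out;
// in the renderer it would be the global OCRAD function from Ocrad.js.
function recognizeImage(canvas, image, ocr) {
  // Prefer the intrinsic size; fall back to width/height for other sources.
  const width = image.naturalWidth || image.width;
  const height = image.naturalHeight || image.height;

  // Resizing a canvas also clears it, so no leftover pixels remain.
  canvas.width = width;
  canvas.height = height;

  const context = canvas.getContext('2d');
  context.drawImage(image, 0, 0, width, height);

  // Grab exactly the area the image occupies, no more and no less.
  const imageData = context.getImageData(0, 0, width, height);
  return ocr(imageData);
}

// In a renderer process, usage might look like:
//   const img = new Image();
//   img.onload = () => {
//     const text = recognizeImage(document.querySelector('canvas'), img, OCRAD);
//     console.log(text);
//   };
//   img.src = 'recipe.png'; // hypothetical file name
```

Passing the OCR function in as an argument is purely a convenience for testing the sizing logic without a real canvas; calling `OCRAD` directly would work just as well.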

Screen Shot 2016-02-24 at 7.54.29 PM

Above is a screengrab from the final steps of fine-tuning our integration. The library will dynamically adjust to the picture size for upload functionality, but we hard-coded the dimensions above to get a basic working conversion. The end conversion was dead on for the image passed in. Needless to say, we were ecstatic to see the words come through.

With a working integration of OCRAD, we refactored the library to the most essential code that any Electron project could integrate. Users have the choice of whether they want to grab data from a canvas element, a Context2D instance, or an instance of ImageData. The library is tested using Mocha and Chai and supported with Travis CI. The goal for future iterations of the library is to add additional methods that allow the user to pass and receive data with less configuration (cropping, adjusting contrast, input type) of the environment.

If you’re looking to add optical character recognition to an Electron desktop application, then npm i electron-ocr. Please let me know your thoughts on the library, and if you experience bugs or have an idea for future versions, please open a pull request on the GitHub repo. Now, back to building Itamae.


Back to Itamae

We are still working hard at getting version 0.1.0 of Itamae ready for beta testing. Thanks for reading about our journey in Ember, Electron, and Optical Character Recognition. Message me on Twitter if you have any thoughts on this post.

Check back soon for updates!
