Optical Character Recognition (“OCR”) in an Ember & Electron App

A good friend and I recently started a digital recipe book called Itamae. With an interest in turning Itamae into a desktop application and integrating Optical Character Recognition, we decided to use Ember and Electron for the build. Because it was our first time using these technologies, we thought it would be helpful to keep track of the experience to help others get started. The following is an ongoing reflection…

Itamae shell from day 1:

[Image: Itamae, an Ember and Electron OCR recipe book]

What the… Ember, Electron, and OCR?

Ember

Coming from a Ruby and Rails background, Ember offers a sense of comfort that has been lacking in the raw JavaScript and jQuery applications I’ve worked on recently. Creating a new project, generating models, writing an adapter for the filesystem, testing with Mocha and Chai, crafting templates, etc. is sooo much easier to reason through when you have a foundation to build upon.

But what really is Ember? Per the documentation, “A Framework for Creating Ambitious Web Applications”. Okay, I can get on board with that. From my understanding, the creators of the framework have made many decisions up front that allow developers to get going FAST. With these decisions comes a loss of the flexibility you get from other JavaScript frameworks such as Angular.

I compare the choice to ordering from a fancy menu that provides an appetizer and dessert with limited substitutions versus grazing at an all-you-can-eat buffet. Sometimes I want the ability to keep going back for more Crab Rangoons. But most of the time I know I want Panang Curry and don’t really care to try everything else made by the Chef.

Electron

Electron is badass. Previously called Atom Shell, it is a JavaScript framework for building cross-platform desktop apps that the folks at GitHub open sourced. Companies and projects using Electron for their desktop applications include Microsoft, Facebook, Slack, Docker, Atom, Visual Studio Code, WordPress, and many others.

It is perfect for Itamae because I want to upload, edit, and delete recipes from the filesystem on my laptop. Electron provides the ability to do so without the constant need for an internet connection to process data. I can also provide users with easy access to their recipes by letting them keep the app in their dock or menu bar and by integrating many of the key commands they’re already familiar with.

Similar to publishing iOS and Android applications built in React Native, I hope to eventually distribute Itamae on the Mac App Store so others can easily download it. Below is a talk at EmpireJS by the brilliant Steve Kinney on Electron.

Optical Character Recognition

From the start, it was important to me to explore OCR technologies and integrate them into Itamae. A problem I’ve faced when cooking is remembering where all the classic recipes are located. This frustration in searching through old cookbooks and half-torn printouts is why I want the user to be able to take a photo of a recipe and easily convert the image to text directly in the application. Think about snapping a photo of something you see in a magazine or on a menu, and easily adding the information to a new or existing recipe.

The three main types of text recognition are OCR for typewritten text, intelligent character recognition (ICR) using machine learning, and intelligent word recognition (IWR) for handwritten notes. It is important to note that none of these technologies is bullet-proof, and each one often requires some form of image cleanup (cropping to the text, adjusting contrast) before results are accurate.

For our project, we decided to integrate a JavaScript OCR library built by Kevin Kwok called Ocrad.js. Integrating this library let us build on past OCR experience while giving the user the ability to take a screenshot, capture a picture from a webcam, upload an image, or draw directly on a tablet and have that text converted.

Creating an Electron-OCR Module

After deciding to use Ocrad.js for the application, we decided it would be more beneficial to put development of Itamae on pause and focus on understanding the OCR technology better. We felt the base functionality for the app should be uploading an image directly, so the user could snap a photo and easily transfer that information to the recipe book. So others don’t have to reinvent the wheel for their own Electron apps, we decided to build an electron-ocr npm module that is easy to integrate.

Ocrad requires the image, photo or drawn, to be on an HTML5 canvas element, a Context2D instance, or an instance of ImageData. Integrating the canvas parallel to an existing recipe felt like the most logical choice so the user could see the raw image, crop and adjust the contrast for better conversion, and edit the converted text side by side.

The electron-ocr library was built in an empty Electron shell to exercise the functionality apart from other dependencies. The goal is for usage to be agnostic, without other developers having to tweak major configurations. Although the Ocrad.js API is simple once configured, its documentation and other resources online are not the best.

A technical challenge we faced when integrating the API with the canvas was grabbing the correct image data. For hours, OCRAD kept returning “-” with little other feedback. An open pull request discussed others experiencing a similar bug, referring to the need to empty the canvas before passing data to be converted. Hours later, we found that the problem was not a non-empty canvas but rather that the dimensions of the image data being grabbed had to match the image being passed in exactly. If not, the library would return gibberish, an empty string, or nothing.
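One way to avoid that mismatch is to size the canvas to the image before reading any pixels. The following is a minimal sketch, not the electron-ocr source: `imageDataForOcr` is a hypothetical helper name, and it assumes `canvas` is an HTML5 canvas and `image` is an already loaded image that exposes `naturalWidth` and `naturalHeight`.

```javascript
// Hypothetical helper (not part of Ocrad.js or electron-ocr):
// size the canvas to the image, draw it, and read back exactly that region.
function imageDataForOcr(canvas, image) {
  // Fall back to width/height for sources without natural* properties.
  var width = image.naturalWidth || image.width;
  var height = image.naturalHeight || image.height;

  // Resizing a canvas also clears it, so no stale pixels remain.
  canvas.width = width;
  canvas.height = height;

  var context = canvas.getContext('2d');
  context.drawImage(image, 0, 0, width, height);

  // Grab image data whose dimensions match the image exactly.
  return context.getImageData(0, 0, width, height);
}

// In an Electron renderer with ocrad.js loaded, usage might look like:
//   var text = OCRAD(imageDataForOcr(canvas, image));
```

Because the canvas is resized on every call, the same helper should work for uploads of any size without hard-coded dimensions.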

[Screenshot: Screen Shot 2016-02-24 at 7.54.29 PM]

Above is a screengrab from the final steps of fine-tuning our integration. The library will dynamically adjust to the picture size for upload functionality, but we hard-coded the dimensions above to get a basic working conversion. The end conversion was dead on for the image passed in. Needless to say, we were ecstatic to see the words come through.

With a working integration of OCRAD, we refactored the library to the most essential code that any Electron project could integrate. Users have the choice of whether they want to grab data from a canvas element, a Context2D instance, or an instance of ImageData. The library is tested using Mocha and Chai and supported with Travis CI. The goal for future iterations of the library is to add additional methods that allow the user to pass and receive data with less configuration (cropping, adjusting contrast, input type) of the environment.

If you’re looking to add optical character recognition to an Electron desktop application, then npm i electron-ocr. Please let me know your thoughts on the library and, if you experience bugs or have an idea for future versions, please open a pull request on the GitHub repo. Now, back to building Itamae.


Back to Itamae

We are still working hard at getting version 0.1.0 of Itamae ready for beta testing. Thanks for reading about our journey in Ember, Electron, and Optical Character Recognition. Message me on Twitter if you have any thoughts on this post.

Check back soon for updates!


Sorting Algorithms – Bubble, Merge, Insertion

What are some of the balances and trade offs between different sorting algorithms?

Everything we’re dealing with is sorting arrays. Consider Speed vs. Space when deciding what algorithm to use.

Notes:

  • Insertion: good for small data sets and sorting in the browser, stable (items with equal keys keep their original relative order), doesn’t take much space
  • Bubble: looks similar to insertion sort, but even if the array is nearly sorted we keep passing over every item (slow)
  • Merge: recursively split the array, sort the halves, then merge them (divide & conquer); can struggle with a lot of data in the browser (memory constraint)

Pros:

  • Insertion: low resources needed, fast for nearly-sorted data
  • Bubble: easy to implement, cool name
  • Merge: fast, stable

Cons:

  • Insertion: slow when data is reverse order
  • Bubble: slow (quadratic time on anything but nearly-sorted data), even though it is stable
  • Merge: needs resources for temp space and arrays from recursive calls
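
The trade-offs above are easier to see in code. Here is a minimal sketch of all three algorithms, written for illustration rather than performance. The function names are my own, each one returns a sorted copy instead of sorting in place, and the early exit in the bubble sort is an optional optimization that the textbook version leaves out.

```javascript
// Insertion sort: shift larger items right, then drop the current item in place.
function insertionSort(items) {
  var result = items.slice(); // copy so the input is left untouched
  for (var i = 1; i < result.length; i++) {
    var current = result[i];
    var j = i - 1;
    while (j >= 0 && result[j] > current) {
      result[j + 1] = result[j];
      j--;
    }
    result[j + 1] = current;
  }
  return result;
}

// Bubble sort: swap adjacent out-of-order pairs; stop once a pass makes no swaps.
function bubbleSort(items) {
  var result = items.slice();
  var swapped = true;
  for (var end = result.length - 1; end > 0 && swapped; end--) {
    swapped = false;
    for (var i = 0; i < end; i++) {
      if (result[i] > result[i + 1]) {
        var tmp = result[i];
        result[i] = result[i + 1];
        result[i + 1] = tmp;
        swapped = true;
      }
    }
  }
  return result;
}

// Merge sort: split in half, sort each half recursively, then merge.
function mergeSort(items) {
  if (items.length <= 1) { return items.slice(); }
  var middle = Math.floor(items.length / 2);
  var left = mergeSort(items.slice(0, middle));
  var right = mergeSort(items.slice(middle));
  var merged = [];
  var l = 0;
  var r = 0;
  while (l < left.length && r < right.length) {
    // "<=" takes from the left half on ties, which keeps the sort stable.
    if (left[l] <= right[r]) {
      merged.push(left[l++]);
    } else {
      merged.push(right[r++]);
    }
  }
  // The temporary arrays here are the extra memory cost noted above.
  return merged.concat(left.slice(l), right.slice(r));
}
```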

 

jQuery Fundamentals

I recently worked through Bocoup’s jQuery tutorial. I highly recommend taking a look, even if you’ve been working with the library for years. Every time I revisit the fundamentals of a topic my depth of understanding greatly improves. Here are a few notes from each section:

 

JavaScript Fundamentals

  • Everything in JavaScript is an object, except the primitives: strings, booleans, numbers, undefined, and null.
  • “this” refers to the object on which the function was called, so its value depends on how the function is invoked
  • .call and .apply let you invoke a function with an explicit “this” value; .call takes the arguments one by one, .apply takes them as an array
  • Array literal notation ( var myArray = [ 'a', 'b', 'c' ]; ) is better than invoking “new Array”
  • Most values in JS are truthy; the falsy values are false, undefined, null, NaN, 0, and '' (the empty string)
  • Ternary operator is a cleaner way to write a simple if/else statement: var propertyName = ( dim === 'width' ) ? 'clientWidth' : 'clientHeight';
  • Naming: “_foo” names with an underscore are typically private, “Dogs” names with an uppercase first letter are usually constructors, and “$items” names starting with a dollar sign usually hold jQuery objects
  • The space on both sides of the colon when setting properties (firstName : 'Aaron') seems weird to me.
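
A few of these notes are easier to remember with something to run. This is a small sketch in plain JavaScript (no jQuery needed); the `dog` and `cat` objects are made up for illustration.

```javascript
// "this" depends on how a function is called, not where it is defined.
var dog = {
  name: 'Rex',
  describe: function (greeting, punctuation) {
    return greeting + ', ' + this.name + punctuation;
  }
};
var cat = { name: 'Tom' };

var asMethod = dog.describe('Hello', '!');              // this is dog
var viaCall = dog.describe.call(cat, 'Hello', '!');     // this is cat, arguments listed
var viaApply = dog.describe.apply(cat, ['Hello', '?']); // this is cat, arguments in an array

// filter(Boolean) keeps only truthy values, so every falsy value drops out.
// Note that ' ' (a space), '0', [] and {} are all truthy.
var candidates = [false, undefined, null, NaN, 0, '', ' ', '0', [], {}];
var truthy = candidates.filter(Boolean);

// Ternary operator in place of a simple if/else.
var dim = 'width';
var propertyName = (dim === 'width') ? 'clientWidth' : 'clientHeight';
```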

jQuery Basics

  • Interesting to look at the source code: https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.js
  • Test whether a selection contains anything by using: if ( $( '#nonexistent' ).length ) { // }
  • Passing an HTML string to $() creates a new element in memory, but it won’t display until it’s placed on the page
  • Getters retrieve info and are typically limited to the first element of a selection
  • Setters operate on all elements in a selection

Traversing & Manipulating the DOM

  • .end() allows you to get back to the original selection, but chains that rely on it get hard to read and are usually better refactored
  • .clone() copies an element but the copy must be placed. It is important to change or remove the copied element’s id attribute
  • .remove(), .detach(), and .replaceWith() allow you to remove elements from the document

Events & Event Delegation

  • Common event methods include: .click(), .keydown(), .keypress(), .keyup(), .mouseover(), .mouseout(), .mouseenter(), .mouseleave(), .scroll(), .focus(), .blur(), and .resize().
  • These can all be bound with .on(), which takes a handler function and can bind multiple events at once: .on( 'focus blur', function() { console.log( 'Hello' ); } )
  • .trigger() will trigger bound event handlers
  • .off() will remove any event handlers that were bound to an event
  • Namespacing events allows you to target specific handlers
  • Event object properties include: type, which, target, pageX, and pageY
  • Event bubbling allows you to use event delegation. This provides performance gains and consistency with handlers executing as planned.
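
To convince myself why bubbling makes delegation possible, I find a toy model helpful. The sketch below involves no DOM and no jQuery, and it is not how jQuery implements anything; `makeNode`, `delegate`, and `dispatch` are names I made up. In real jQuery the equivalent binding would be $( 'ul' ).on( 'click', 'li', handler ).

```javascript
// Toy model of event bubbling and delegation; no DOM or jQuery involved.
function makeNode(tagName, parent) {
  return { tagName: tagName, parent: parent || null, handlers: [] };
}

// Delegation: one handler on an ancestor, filtered by the target's tag name.
function delegate(ancestor, tagName, handler) {
  ancestor.handlers.push(function (event) {
    if (event.target.tagName === tagName) {
      handler(event);
    }
  });
}

// Bubbling: run handlers on the target, then on each ancestor in turn.
function dispatch(target, type) {
  var event = { type: type, target: target };
  for (var node = target; node; node = node.parent) {
    node.handlers.forEach(function (fn) { fn(event); });
  }
}

var list = makeNode('ul');
var clicked = [];
delegate(list, 'li', function (event) { clicked.push(event.target); });

// Items added after the handler was bound are still covered,
// because the event bubbles up to the list either way.
var lateItem = makeNode('li', list);
dispatch(lateItem, 'click');
```

One handler on the list covers every current and future item, which is where the performance and consistency gains come from.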

Animating with jQuery

  • Common effects include: .show(), .hide(), .fadeIn(), .fadeOut(), .slideDown(), .slideUp(), and .slideToggle()
  • Use callback functions at the end of animation methods to specify what should happen next
  • .animate() allows custom effects by animating CSS properties
  • .stop() and .delay() are helpful when managing animations

AJAX & Deferreds

  • $.ajax() also accepts a URL followed by an optional configuration object, which is handy for applying the same configuration across several URLs
  • Use JSON.stringify() and JSON.parse() to create and parse a JSON string outside of jQuery
  • .then() and .always() allow you to attach callbacks on requests
  • Specify 'jsonp' as the dataType to get around cross-domain XHR restrictions when calling APIs
  • $.Deferred() allows you to manage asynchronous operations
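
The JSON round trip is the one piece here that runs without jQuery or a server, so here is a quick sketch of it. The `recipe` object is made-up sample data.

```javascript
// Sample data, invented for this example.
var recipe = { title: 'Panang Curry', servings: 4, tags: ['thai', 'curry'] };

// Object to JSON string, e.g. for a request body.
var json = JSON.stringify(recipe);

// JSON string back to a brand-new object, e.g. from a response.
var copy = JSON.parse(json);

// The copy has the same contents but is a different object.
var sameContents = copy.title === recipe.title && copy.tags.length === 2;
var sameObject = copy === recipe;
```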

Functions, Variables, and Objects in JavaScript

I recently read a few chapters from Speaking JavaScript. The digital book is a great resource that I plan to reference as I progress in JS development. Below are a few thoughts from chapters covering the nature of the language and its functions, variables, and objects.

Nature of JS

Chapter 3 was a gentle introduction to the basic differences found in JavaScript. I especially liked “the elegant parts help you work around the quirks… they allow you to implement block scoping, modules, and inheritance APIs—all within the language.”

Functions

Chapter 15 opened my eyes to the many uses of functions in JS. Functions have three main roles: normal ( id('hello') ), constructor ( new Date() ), and method ( obj.method() ). They can be created with a function declaration or a function expression (var id = function…). Functions can be called with any number of arguments, independent of how many parameters were defined. Extra arguments are still reachable through the array-like “arguments” object, while any parameter that wasn’t passed is simply undefined.

You can check if a parameter is missing by comparing it to undefined (or testing whether it is falsy), or by checking for a minimum number of arguments. To alter the value of a variable from inside a function, the variable must be wrapped (array, hash, etc.), because primitives are passed by value. It is important to understand param structure when using functions and methods. Named parameters via object literals can clarify function and method use.
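
Here is a short sketch of those parameter-handling ideas. All of the function names are made up for illustration; none of them come from the book.

```javascript
// Extra arguments show up in "arguments"; missing parameters are undefined.
function describeCall(first, second) {
  return {
    received: arguments.length,
    secondMissing: second === undefined
  };
}

// Default an optional parameter by comparing against undefined.
// (A falsy check like "factor || 2" would wrongly replace a factor of 0.)
function scale(amount, factor) {
  if (factor === undefined) { factor = 2; }
  return amount * factor;
}

// Enforce a minimum number of arguments.
function requireTwo(a, b) {
  if (arguments.length < 2) {
    throw new Error('Need at least 2 arguments');
  }
  return [a, b];
}

// Primitives are passed by value, so wrap the value in an object
// when a function needs to change it for the caller.
function increment(counter) {
  counter.value++;
}

// Named parameters via an object literal make call sites self-describing.
function addRecipe(options) {
  var servings = options.servings === undefined ? 1 : options.servings;
  return options.title + ' (serves ' + servings + ')';
}
```

A call like addRecipe({ title: 'Ramen', servings: 2 }) reads more clearly than addRecipe('Ramen', 2), and the order of the options no longer matters.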

Variables

Chapter 16 discussed variables in regard to scopes, environments, and closures. Scope is where you can access the variable (local vs. global, inner vs. outer). Variables in JS are function scoped, meaning only functions introduce a new scope. It is important to declare a variable with var; otherwise assigning to it creates a global. Similar to other languages, it’s best to avoid global variables (and the global object) when possible.

The data structure that stores variable names and values is called an environment. Related, a closure is a function plus the scope in which it was created. A closure keeps that environment alive even after the function that created it has finished executing. Because of closures, a variable might live longer than anticipated, creating unexpected results when using loops. This is often experienced when attaching event handlers in a loop over DOM elements.
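
The loop problem is worth seeing once. In this sketch (function names are mine) the callbacks stand in for event handlers created inside a loop.

```javascript
// Every callback closes over the same function-scoped "i",
// so they all see its final value once the loop has finished.
function makeBroken() {
  var callbacks = [];
  for (var i = 0; i < 3; i++) {
    callbacks.push(function () { return i; });
  }
  return callbacks;
}

// An immediately invoked function creates a new scope per iteration,
// giving each callback its own copy of the current value.
function makeFixed() {
  var callbacks = [];
  for (var i = 0; i < 3; i++) {
    (function (index) {
      callbacks.push(function () { return index; });
    }(i));
  }
  return callbacks;
}

var broken = makeBroken().map(function (fn) { return fn(); }); // [3, 3, 3]
var fixed = makeFixed().map(function (fn) { return fn(); });   // [0, 1, 2]
```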

Objects

Chapter 17 on objects and inheritance was extremely valuable. This chapter helped relate my understanding of object and class structure in other languages to JavaScript. It’s important to remember that all functions are also objects. Object oriented programming in JS can be split into two levels of “difficulty”, basic (single objects and prototype chains) and advanced (constructors as factories and subclassing).

It is important to note that you can create objects without the need of a class. Also, all objects are like hashes (maps of keys to values) but also involve inheritance and other added layers of abstractions. The dot operator (ex. aaron.age) allows you to get/set/delete properties and call methods on objects. The bracket operator “[]” allows reference to properties through expressions ( [“person” + “name”] ).

Interestingly, every object can have another object as its prototype (creating a chain). This allows the first object to inherit properties from its prototype object. Setting a property only affects the first object, whereas getting a property looks at all objects in the chain. You can protect an object by preventing extensions, and by sealing or freezing its properties. I think of this similarly to setting permissions on a file.
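
A small sketch of the get/set asymmetry and of object protection, using made-up recipe objects:

```javascript
var base = { kind: 'recipe', servings: 2 };
var curry = Object.create(base);       // base is curry's prototype

// Getting walks up the chain: curry has no own "servings" yet.
var inherited = curry.servings;        // comes from base
var ownBefore = curry.hasOwnProperty('servings');

// Setting only touches the first object in the chain.
curry.servings = 4;
var ownAfter = curry.hasOwnProperty('servings');
var baseServings = base.servings;      // base is unchanged

// Three levels of protection, from weakest to strongest.
var noNewProps = Object.preventExtensions({ title: 'Ramen' });
var sealed = Object.seal({ title: 'Ramen' });
var frozen = Object.freeze({ title: 'Ramen' });

var checks = {
  noNewPropsExtensible: Object.isExtensible(noNewProps),
  sealedIsSealed: Object.isSealed(sealed),
  sealedIsFrozen: Object.isFrozen(sealed),
  frozenIsFrozen: Object.isFrozen(frozen)
};
```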

A constructor function allows you to create multiple objects with similar properties and is invoked using the “new” operator. JS constructors are dynamic: they can return the direct instance created by “new” or any other object you choose to return instead. Separately, using an object as a map can cause problems, because inherited properties such as toString look like keys that were never set. It’s better to use a library like StringMap.js when arbitrary keys exist.
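
Here is a sketch of both points. The `Recipe` constructor is invented for illustration, and the prototype-less object at the end is just one built-in workaround for the map problem; it is not the StringMap.js API.

```javascript
// A constructor sets up instance data; shared methods go on the prototype.
function Recipe(title) {
  this.title = title;
}
Recipe.prototype.describe = function () {
  return 'Recipe: ' + this.title;
};

var ramen = new Recipe('Ramen');
var udon = new Recipe('Udon');

// Pitfall: a plain object used as a map inherits keys from Object.prototype.
var counts = {};
var plainHasToString = 'toString' in counts;    // true, although never set

// One workaround (an assumption on my part, not StringMap.js):
// an object created without any prototype has no inherited keys.
var safeCounts = Object.create(null);
safeCounts.ramen = 1;
var safeHasToString = 'toString' in safeCounts; // false
var safeHasRamen = 'ramen' in safeCounts;       // true
```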