Handwriting Recognition and Retrieval

A page from George Washington's letters.

A large quantity of information of historical and scientific interest remains locked up in archives of handwritten papers. While such collections are gradually being made available in the form of scanned images, creating transcripts that can be automatically searched is prohibitively expensive in most cases. At the same time, historical document collections present special challenges that make standard handwriting recognition technology ineffective.

I have been working with researchers at the UMass Amherst Center for Intelligent Information Retrieval to develop better ways to recognize the text in handwritten historical documents. Using digitized copies of George Washington's letters obtained from the Library of Congress, we have developed approaches to handwriting recognition that attempt to identify entire words at once. By employing recent developments in machine learning, we have been able to achieve high word recognition rates. These rates are high enough to support retrieval of letters containing specific words and phrases with high average precision.
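For readers unfamiliar with the retrieval measure mentioned above, the following is a minimal sketch of how average precision is commonly computed for a single query. It is an illustration only, not code from this project: the function names and the toy ranking are hypothetical, and it assumes every relevant letter appears somewhere in the ranked list.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance: list of booleans in ranked order, True where the
    retrieved letter really contains the query word (hypothetical input).
    Assumes all relevant letters appear somewhere in the ranking.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            # Precision measured at the rank of each relevant letter.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the per-query average precision over several queries."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Toy example: relevant letters are returned at ranks 1 and 3.
toy_ranking = [True, False, True, False]
toy_score = average_precision(toy_ranking)
```

In the toy ranking, precision is 1/1 at the first relevant letter and 2/3 at the second, so the average precision is their mean, 5/6. A recognizer that places letters containing the query word near the top of the ranking scores close to 1.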

Boosted Decision Trees for Word Recognition in Handwritten Document Retrieval, N. Howe, T. Rath, & R. Manmatha. ACM SIGIR Conference on Research and Development in Information Retrieval, August 2005.
