{"id":1104,"date":"2020-07-08T11:23:33","date_gmt":"2020-07-08T11:23:33","guid":{"rendered":"https:\/\/blogs-dev.lib.uconn.edu\/news\/?p=1104"},"modified":"2020-07-08T11:56:38","modified_gmt":"2020-07-08T11:56:38","slug":"connecticut-digital-archive-to-expand-19th-century-handwritten-text-recognition","status":"publish","type":"post","link":"https:\/\/blogs-dev.lib.uconn.edu\/news\/connecticut-digital-archive-to-expand-19th-century-handwritten-text-recognition\/","title":{"rendered":"Connecticut Digital Archive to Expand 19th Century Handwritten Text Recognition"},"content":{"rendered":"\n<p>19<sup>th<\/sup> century handwritten documents are essential for researchers but remain largely inaccessible even after digitization because their text cannot be searched. The Connecticut Digital Archive, a project of the UConn Library, is working to change that with a Catalyst Fund grant recently awarded by <a href=\"https:\/\/lyrasisnow.org\/press-release-lyrasis-announces-the-2020-catalyst-fund-recipients-and-their-projects\/\">LYRASIS<\/a>.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"alignright size-large is-resized\"><a href=\"https:\/\/blogs.lib.uconn.edu\/news\/files\/CT-Soldiers-Orphans-Home-Diary_pg8.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/blogs.lib.uconn.edu\/news\/files\/CT-Soldiers-Orphans-Home-Diary_pg8.jpg\" alt=\"\" class=\"wp-image-1548\" width=\"335\" height=\"222\"\/><\/a><figcaption>Documents like this one from the CT Soldiers&#8217; Orphans&#8217; Home cannot be read by OCR. The CT Soldiers&#8217; Orphans&#8217; Home provided housing, schooling, and religious training to two hundred or more orphans of Connecticut men who lost their lives in the Civil War. <a href=\"http:\/\/hdl.handle.net\/11134\/20002:860139206\">Image<\/a> from October 29, 1866, provided by the UConn Library Archives &amp; Special Collections through the CT Digital Archive.
<\/figcaption><\/figure><\/div>\n\n\n\n<p class=\"ticss-d5dc0282\">Archives and special collections from across Connecticut fill the <a href=\"https:\/\/collections.ctdigitalarchive.org\/\">Connecticut Digital Archive<\/a> (CTDA), providing online access to a treasure trove of historic materials. However, even after digitization, the irregular handwriting in many of the manuscripts leaves the historical information in these documents inaccessible to Optical Character Recognition (OCR), a text-conversion method that has been used for more than 20 years to make documents discoverable. To address this, historians and computer scientists have worked to apply machine learning to handwritten text recognition (HTR) through a relatively small number of projects with varied techniques and varied success.<\/p>\n\n\n\n<p>In the summer of 2019, the Library, in partnership with <a href=\"https:\/\/greenhousestudios.uconn.edu\/\">Greenhouse Studios<\/a>, the <a href=\"https:\/\/www.masshist.org\/\">Massachusetts Historical Society<\/a>, and the <a href=\"https:\/\/www.engr.uconn.edu\/\">UConn School of Engineering<\/a>, created a set of over 16,000 images of 22 different characters from the John Quincy Adams Papers. These characters were used to train a neural network, a set of algorithms modeled loosely on the human brain and designed to recognize patterns in those images. The neural network takes these handwritten characters, known as training examples, and learns from them. As the number of examples increases, the network becomes more accurate at identifying individual letters and words. 
The pilot project over the summer produced promising results, with an accuracy rate of more than 86% when testing on all 22 characters and more than 96% when testing on four of the characters.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"alignleft size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/blogs.lib.uconn.edu\/news\/files\/LYRASIS-Grant-Student.jpeg\" alt=\"Student Matthew Mulhall working in the Greenhouse Studios on developing a neural network to identify handwritten characters.\" class=\"wp-image-1541\" width=\"244\" height=\"334\"\/><figcaption>Student Matthew Mulhall working in the <a href=\"https:\/\/greenhousestudios.uconn.edu\/\">Greenhouse Studios<\/a> on developing a neural network to identify handwritten characters.<\/figcaption><\/figure><\/div>\n\n\n\n<p>\u201cHistorical manuscripts are essential for humanities research and these funds will help scholars engage with unique and distinctive collections in a way they couldn\u2019t before,\u201d noted Greg Colati, Assistant University Librarian for University Archives, Special Collections &amp; Digital Curation for the UConn Library.<\/p>\n\n\n\n<p>The grant funds from LYRASIS will allow the Library and the <a href=\"https:\/\/www.cse.uconn.edu\/\">Computer Science &amp; Engineering Department in the School of Engineering<\/a> to extend this work to additional volumes of handwritten documents in the John Quincy Adams Papers. The goal is to expand the datasets, adjust the neural networks, and release the updated version to the public for free.<\/p>\n\n\n\n<p><a href=\"https:\/\/www.lyrasis.org\/Pages\/Main.aspx\">LYRASIS<\/a> is a non-profit organization whose mission is to support enduring access to the world\u2019s shared academic, scientific and cultural heritage through leadership in open technologies, content services, digital solutions and collaboration with archives, libraries, museums and knowledge communities worldwide. 
The grant is part of its Catalyst Fund, which provides support for new ideas and innovative projects that explore, test, refine and collaborate on innovations with community-wide impact.<\/p>\n\n\n\n<p>The <a href=\"https:\/\/collections.ctdigitalarchive.org\/\">CTDA<\/a> is a service of the UConn Library that preserves and makes available digital assets related to Connecticut and created by Connecticut-based, not-for-profit educational, cultural, and historical institutions, including libraries, archives, galleries, and museums.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>19th century handwritten documents are essential for researchers but remain largely inaccessible even after digitization because their text cannot be searched. The Connecticut Digital Archive, a project of the UConn Library, is working to change that with a Catalyst &hellip; <a href=\"https:\/\/blogs-dev.lib.uconn.edu\/news\/connecticut-digital-archive-to-expand-19th-century-handwritten-text-recognition\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><!-- AddThis Share Buttons generic via filter on get_the_excerpt 
--><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs-dev.lib.uconn.edu\/news\/wp-json\/wp\/v2\/posts\/1104"}],"collection":[{"href":"https:\/\/blogs-dev.lib.uconn.edu\/news\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs-dev.lib.uconn.edu\/news\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs-dev.lib.uconn.edu\/news\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs-dev.lib.uconn.edu\/news\/wp-json\/wp\/v2\/comments?post=1104"}],"version-history":[{"count":3,"href":"https:\/\/blogs-dev.lib.uconn.edu\/news\/wp-json\/wp\/v2\/posts\/1104\/revisions"}],"predecessor-version":[{"id":1108,"href":"https:\/\/blogs-dev.lib.uconn.edu\/news\/wp-json\/wp\/v2\/posts\/1104\/revisions\/1108"}],"wp:attachment":[{"href":"https:\/\/blogs-dev.lib.uconn.edu\/news\/wp-json\/wp\/v2\/media?parent=1104"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs-dev.lib.uconn.edu\/news\/wp-json\/wp\/v2\/categories?post=1104"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs-dev.lib.uconn.edu\/news\/wp-json\/wp\/v2\/tags?post=1104"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}