{"id":5452,"date":"2015-04-08T14:22:40","date_gmt":"2015-04-08T14:22:40","guid":{"rendered":"https:\/\/blogs-dev.lib.uconn.edu\/archives\/?p=5452"},"modified":"2015-04-08T14:22:40","modified_gmt":"2015-04-08T14:22:40","slug":"replicating-the-human-voice-gender-automation-and-created-beings","status":"publish","type":"post","link":"https:\/\/blogs-dev.lib.uconn.edu\/archives\/2015\/04\/08\/replicating-the-human-voice-gender-automation-and-created-beings\/","title":{"rendered":"Replicating the Human Voice: Gender, Automation and Created Beings"},"content":{"rendered":"<p><em>\u201cHello, I\u2019m here.\u201d<\/em> Throaty, warm, and incredibly human, the first lines spoken by Samantha, the incorporeal female operating system in Spike Jonze\u2019s 2013 film <em>Her<\/em>, are a far cry from the mechanical voice recognition technologies that we are used to. In addition to its thought-provoking philosophical predictions about the near future, Jonze\u2019s film also homes in on the ideal of voice-replication technology: an artificially intelligent system so natural and intuitive that we can fall in love with it.<\/p>\n<p>Replicating the intricacies of human characteristics and behaviors in non-sentient technologies is far from a new curiosity. 
Throughout history, creators worked to imbue automata with the ability to pen poems, perform acrobatics, play musical instruments, and bat their eyelashes, and these self-operating <a href=\"https:\/\/blogs-dev.lib.uconn.edu\/archives\/files\/2015\/04\/HumanVocalTract.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-medium wp-image-5454\" src=\"https:\/\/blogs-dev.lib.uconn.edu\/archives\/files\/2015\/04\/HumanVocalTract-300x279.jpg\" alt=\"HumanVocalTract\" width=\"300\" height=\"279\" srcset=\"https:\/\/blogs-dev.lib.uconn.edu\/archives\/files\/2015\/04\/HumanVocalTract-300x279.jpg 300w, https:\/\/blogs-dev.lib.uconn.edu\/archives\/files\/2015\/04\/HumanVocalTract-1024x953.jpg 1024w, https:\/\/blogs-dev.lib.uconn.edu\/archives\/files\/2015\/04\/HumanVocalTract-322x300.jpg 322w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a>machines were often admired and judged for their ability to replicate human behaviors. In perhaps the most dramatic fictional interaction with an automaton that blurs the lines separating human from machine, protagonist Nathaniel in E.T.A. Hoffmann\u2019s short story \u201cThe Sandman\u201d falls in love with the automaton Olympia, whom he mistakes for a human female.<\/p>\n<p>If imitating human characteristics is one of the driving forces in the creation of artificial beings, it is also one of the greatest challenges. Moving away from automata and toward operating systems and other forms of artificial intelligence, the reproduction of human speech is one of the greatest hurdles to clear if we are to produce systems like Jonze\u2019s fictional Samantha. The 1981 premiere issue of <em>High Technology<\/em> affirms that the simulation of human speech has historically been one of the most elusive replication technologies. 
One article, \u201cTalking Machines Aim For Versatility,\u201d discusses various methods used to replicate speech and reflects upon the value of machines capable of producing and understanding human speech.<\/p>\n<p>At the time, most machine voices (like the voice on the operator line) were actual recordings of human speech or relied on early word synthesis technologies that tended to sound flat and robotic. Higher-end technologies were prohibitively expensive; the Master Specialties <a href=\"https:\/\/blogs-dev.lib.uconn.edu\/archives\/files\/2015\/04\/HighTechnology.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright size-medium wp-image-5453\" src=\"https:\/\/blogs-dev.lib.uconn.edu\/archives\/files\/2015\/04\/HighTechnology-239x300.jpg\" alt=\"HighTechnology\" width=\"239\" height=\"300\" srcset=\"https:\/\/blogs-dev.lib.uconn.edu\/archives\/files\/2015\/04\/HighTechnology-239x300.jpg 239w, https:\/\/blogs-dev.lib.uconn.edu\/archives\/files\/2015\/04\/HighTechnology-817x1024.jpg 817w, https:\/\/blogs-dev.lib.uconn.edu\/archives\/files\/2015\/04\/HighTechnology.jpg 1832w\" sizes=\"(max-width: 239px) 100vw, 239px\" \/><\/a>model 1650 synthesizer, for example, cost $550 for a one-word vocabulary, and each additional word cost $50. Technologies have evolved vastly since the era of rudimentary \u201ctalking machines,\u201d but the article anticipates speech synthesis and compression technologies that would streamline the process of stringing phonemes (the smallest units of speech) into complete sentences, thereby maximizing speech output; these concepts have influenced current speech synthesis techniques. While there are various approaches to \u201cbuilding\u201d voices, the most common modern technique is concatenative synthesis. 
A voice actor is recorded reading passages of text, random sentences, and words in a variety of cadences; a text-to-speech engine then recombines these recorded segments to form new words and sentences. The technique vastly expands the range and naturalness of intelligent assistants like Apple\u2019s Siri.<\/p>\n<p>The <em>High Technology<\/em> article reflects that because machines would be able to communicate in a form that is natural to humans, they would be better equipped to fulfill \u201cthe role of mankind\u2019s servants, advisors, and playthings.\u201d Over thirty years later, this goal is still extraordinarily relevant, especially as artificially intelligent systems become more integrated into consumer products like phones, tablets, cars, and security systems. Furthermore, advancements in voice recognition and synthesis technologies have benefited individuals with impairments who require speech-generating devices to communicate verbally. Many challenges remain, of course, including the ability of these systems to understand different human accents and dialects. Creating believable artificial speech, however, still harks back to the greatest challenge in the evolution of these technologies: authenticity. Humans register subtle changes in tone and inflection when we communicate with each other, and these subtleties are currently difficult to replicate in artificially intelligent systems, which explains why we can easily discern a human voice from that of a machine. Jonze\u2019s film suggests that clearing this authenticity hurdle is essential to our ability to truly connect with our technology. 
Far from the utilitarian \u201cservants, advisors, and playthings\u201d suggested in the <em>High Technology<\/em> article, intuitive and human-like operating systems could alter our emotional relationship with machines to the extent that they become our confidants and romantic partners.<\/p>\n<p><em style=\"color: #373737\">Intern Giorgina Paiella is an undergraduate student majoring in English and minoring in philosophy and women\u2019s, gender, and sexuality studies. In her new blog series, \u201cMan, Woman, Machine: Gender, Automation, and Created Beings,\u201d she explores treatments of created and automated beings in historical texts and archival materials from Archives and Special Collections.<\/em><\/p>\n<!-- AddThis Advanced Settings generic via filter on the_content --><!-- AddThis Share Buttons generic via filter on the_content -->","protected":false},"excerpt":{"rendered":"<p>\u201cHello, I\u2019m here.\u201d Throaty, warm, and incredibly human, the first lines spoken by Samantha, the incorporeal female operating system in Spike Jonze\u2019s 2013 film Her, are a far cry from the mechanical voice recognition technologies that we are used to. 
&hellip; <a href=\"https:\/\/blogs-dev.lib.uconn.edu\/archives\/2015\/04\/08\/replicating-the-human-voice-gender-automation-and-created-beings\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><!-- AddThis Advanced Settings generic via filter on get_the_excerpt --><!-- AddThis Share Buttons generic via filter on get_the_excerpt --><\/p>\n","protected":false},"author":48,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[251,9],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs-dev.lib.uconn.edu\/archives\/wp-json\/wp\/v2\/posts\/5452"}],"collection":[{"href":"https:\/\/blogs-dev.lib.uconn.edu\/archives\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs-dev.lib.uconn.edu\/archives\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs-dev.lib.uconn.edu\/archives\/wp-json\/wp\/v2\/users\/48"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs-dev.lib.uconn.edu\/archives\/wp-json\/wp\/v2\/comments?post=5452"}],"version-history":[{"count":5,"href":"https:\/\/blogs-dev.lib.uconn.edu\/archives\/wp-json\/wp\/v2\/posts\/5452\/revisions"}],"predecessor-version":[{"id":5461,"href":"https:\/\/blogs-dev.lib.uconn.edu\/archives\/wp-json\/wp\/v2\/posts\/5452\/revisions\/5461"}],"wp:attachment":[{"href":"https:\/\/blogs-dev.lib.uconn.edu\/archives\/wp-json\/wp\/v2\/media?parent=5452"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs-dev.lib.uconn.edu\/archives\/wp-json\/wp\/v2\/categories?post=5452"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs-dev.lib.uconn.edu\/archives\/wp-json\/wp\/v2\/tags?post=5452"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}