PlainTalk

PlainTalk is the collective name for several speech synthesis (MacinTalk) and speech recognition technologies developed by Apple Inc.

In 1990, Apple invested heavily in speech recognition technology, hiring many researchers in the field. The result was "PlainTalk", released with the AV models in the Macintosh Quadra series in 1993. It became a standard system component in System 7.1.2 and has since shipped on all PowerPC Macintoshes and some 68k models.

Software

Speech synthesis

Technology

Apple's text-to-speech uses diphones. Compared to other methods of synthesizing speech, this approach is not very resource-intensive, but it limits how natural the synthesized speech can sound. American English and Spanish versions have been available, but since the advent of Mac OS X, Apple has shipped only American English voices, relying on third-party suppliers such as Acapela Group to supply voices for other languages. In OS X 10.7, Apple licensed many third-party voices and made them available for download within the Speech control panel.

An application programming interface known as the Speech Manager enables third-party developers to use speech synthesis in their applications. There are various control sequences that can be used to fine-tune the intonation and rhythm. The volume, pitch and rate of the speech can be configured as well.

Input to the synthesizer can be controlled explicitly using a special phoneme alphabet.
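As a rough illustration of the control sequences and phoneme input described above, the following Python sketch builds a text string with embedded speech commands. The article does not give the command syntax. The double-bracket form and the command names used here (rate, pbas, volm, inpt) are taken from Apple's Speech Synthesis Programming Guide and should be treated as assumptions, and the helper names embed, build_utterance and phoneme_span are hypothetical. The sketch only assembles the string. It does not call the Speech Manager.

```python
from typing import Optional, Union

Number = Union[int, float]


def embed(command: str, value: object) -> str:
    """Format one embedded speech command, e.g. [[rate 180]] (assumed syntax)."""
    return f"[[{command} {value}]]"


def build_utterance(
    text: str,
    rate: Optional[Number] = None,    # speaking rate (assumed command: rate)
    pitch: Optional[Number] = None,   # baseline pitch (assumed command: pbas)
    volume: Optional[Number] = None,  # volume (assumed command: volm)
) -> str:
    """Prefix the text with commands for whichever settings were given."""
    prefix = ""
    if rate is not None:
        prefix += embed("rate", rate)
    if pitch is not None:
        prefix += embed("pbas", pitch)
    if volume is not None:
        prefix += embed("volm", volume)
    return f"{prefix} {text}" if prefix else text


def phoneme_span(phonemes: str) -> str:
    """Wrap a phoneme string so it is read in phoneme mode, then switch back to text."""
    return embed("inpt", "PHON") + phonemes + embed("inpt", "TEXT")


if __name__ == "__main__":
    print(build_utterance("Hello, I am Macintosh.", rate=180, pitch=40, volume=0.8))
```

The phoneme string passed to phoneme_span is left to the caller, because the phoneme alphabet itself is not reproduced in this article.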

Original MacinTalk

The initial Macintosh text-to-speech engine, MacinTalk (named by Denise Chandler), was used by Apple in the 1984 introduction of the Macintosh, in which the computer announced itself to the world (and poked fun at the weight of an IBM computer). While it was incorporated into the Macintosh's operating system, it was not officially supported by Apple (though programming information was made available through an Apple Technical Note). MacinTalk was developed by Joseph Katz and Mark Barton, who later founded SoftVoice, Inc., which currently markets TTS engines for Windows, Linux and embedded platforms.

MacinTalk 2

Eventually, Apple released a supported speech synthesis system, called MacinTalk 2. It ran on any Macintosh running System Software 6.0.7 or later, and it remained the recommended version for slower machines even after the release of MacinTalk 3 and Pro.

MacinTalk 3, Pro

MacinTalk 3 introduced a wide variety of voices. Apart from the standard adult voices "Ralph", "Fred" and "Kathy", and children's voices like "Princess" and "Junior", it included various novelty voices, such as "Whisper", "Zarvox" (a robot voice with melodic background sounds, accompanied by a similar voice called "Trinoids"), "Cellos" (a voice that sang its text to an Edvard Grieg tune, alongside similarly singing voices like "Good News", "Bad News" and "Pipe Organ"), "Albert" (a hoarse-sounding voice), "Bells", "Boing", "Bubbles", and others.

Each of these voices came with its own example text, which was spoken when the user pressed the "Test" button in the Speech control panel. Some would just say their name, language and the version of MacinTalk they were introduced with. Others would say funny things, like "I sure like being inside this fancy computer", "I have a frog in my throat... No, I mean a real frog!", or "The light you see at the end of the tunnel is the headlamp of a fast approaching train". These voices as well as their test texts are still in Mac OS X today.

With the increase in computing power that the AV Macs and PowerPC-based Macintoshes provided, Apple could afford to increase the quality of the synthesis. MacinTalk 3 required a 33 MHz 68030 processor, and MacinTalk Pro required a 68040 or better and at least 1 MB of RAM. Each synthesizer supported a different set of voices.

Text-to-speech in Mac OS X

Text-to-speech has been a part of every Mac OS X version. In Mac OS X v10.3, a significantly enhanced version of the Victoria voice was added as Vicki (Victoria was not removed). Vicki was almost 20 times larger than Victoria because it used higher-quality diphone samples.

A new, much more natural-sounding voice called "Alex" was added to the Mac text-to-speech roster with the release of Mac OS X 10.5 Leopard.[1]

With Mac OS X 10.7 Lion, voices became available in additional U.S. English and other English accents, as well as 21 other languages.[2]

The "Speak selected text when key is pressed" feature allows selected text from any application to be read aloud via a key combination. From Mac OS X 10.1 to Mac OS X 10.6, the feature copied the selected text to the clipboard and read it from there. From Mac OS X 10.7 to Mac OS X 10.10, a new implementation required software developers to adopt a speech synthesis API in their applications.[3][4] This prevented the clipboard from being overwritten, but it also meant that in applications that did not use the API the feature did not function as expected, reading the title bar rather than the selected text.[5][6]

Speech recognition

Apple hired many speech recognition researchers in 1990. After about a year, they demonstrated a technology codenamed Casper. It was released as part of the PlainTalk package in 1993. Although available for all PowerPC Macintoshes and AV 68k machines (it was one of the few applications that made use of the DSP in the Centris 660AV and Quadra 840AV), it was not part of the default system install prior to Mac OS X, requiring the user to perform a custom OS installation to get speech recognition capabilities.

In Mac OS X 10.7 Lion and earlier, Apple's speech recognition was oriented toward voice commands only and was not intended for dictation. It can be configured to listen for commands when a hot key is pressed, after being addressed with an activation phrase such as "Computer" or "Macintosh", or continuously without any prompt. A graphical status monitor, often in the form of an animated character, provides visual and textual feedback about listening status, available commands and actions. It can also respond to the user using speech synthesis.

Early versions of the speech recognition provided full access to the menus. This support was later removed, since it required too many resources and made recognition less reliable, only to be re-added in Mac OS X 10.3 as a "universal access technology" called spoken user interface.

The user can launch items located in a special folder, called "Speakable Items", simply by speaking their name (while the system is in listening mode). Apple shipped a number of AppleScripts in this folder, but aliases, documents and folders can be opened in the same way.
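The lookup behind Speakable Items can be pictured with a small Python sketch that matches an already-recognized phrase against the names of items in a folder. This is only an illustration under stated assumptions. The function name find_speakable_item and the matching rules (ignore case, extra spaces and the file extension) are hypothetical and are not Apple's implementation, which relies on the system's speech recognition engine rather than string comparison.

```python
from pathlib import Path
from typing import Optional


def normalize(name: str) -> str:
    """Lower-case a name and collapse runs of whitespace to single spaces."""
    return " ".join(name.lower().split())


def find_speakable_item(folder: Path, phrase: str) -> Optional[Path]:
    """Return the item in `folder` whose name matches the spoken phrase, if any.

    Scripts, aliases, documents and subfolders are all candidates, mirroring
    the article's note that any of these can be opened by name.
    """
    wanted = normalize(phrase)
    for item in sorted(folder.iterdir()):
        # Compare against the name without its extension (assumed rule).
        if normalize(item.stem) == wanted:
            return item
    return None


if __name__ == "__main__":
    import tempfile

    # Build a throwaway folder that stands in for "Speakable Items".
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "Open Browser.scpt").write_text("")
        (root / "Documents").mkdir()
        print(find_speakable_item(root, "open browser"))
```

A real launcher would hand the matched path to the system to open. That step is omitted here because it depends on the platform.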

Additional functionality is provided by individual applications. An application programming interface lets programs define and modify an available vocabulary. For example, the Finder provides a vocabulary for manipulating files and windows.

In OS X 10.8 Mountain Lion, Apple introduced "Dictation",[7] intended for general text entry. Originally, it required audio data to be sent to Apple servers for processing. In OS X 10.9 Mavericks, Apple added the option to download support for dictation without an Internet connection. As of OS X 10.9.3, eight languages (19 dialects) are supported.

In radio

The MacinTalk speech synthesis can be heard in a few radio programmes:

In music

The MacinTalk speech synthesis can be heard in a few songs:

In film

In television

In video games

Hardware

Apple produced two microphones under the name "Apple PlainTalk Microphone". The first was included with Macintosh LC and early Performa models and was circular in appearance. It was designed to sit in a holder attached to the side of a CRT display and to be lifted out and held near the mouth when talking. The second model was introduced alongside the AV models in the Macintosh Quadra series in 1993 but was also sold separately. It was designed to be positioned on top of the screen and to be sensitive to sound from the front. Both models had a longer-than-standard connector, the tip of which supplied the microphone with extra power.

References

  1. "Accessibility - OS X". Apple. Retrieved 2016-04-27.
  2. "Archived copy". Archived from the original on September 24, 2011. Retrieved July 23, 2011.
  3. "Introduction to Speech Synthesis Programming Guide". Developer.apple.com. 2006-09-05. Retrieved 2016-04-27.
  4. "Speech Synthesis in OS X". Developer.apple.com. 2006-09-05. Retrieved 2016-04-27.
  5. "[Solved] Text to speech only reads the document title (View topic) • Apache OpenOffice Community Forum". Forum.openoffice.org. Retrieved 2016-04-27.
  6. "scottmartin/speak-selected-text-sublime: A plugin to use the Mac's text to speech from Sublime Text 2". GitHub.com. Retrieved 2016-04-27.
  7. "Use your voice to enter text on your Mac - Apple Support". Support.apple.com. 2016-04-05. Retrieved 2016-04-27.
  8. "Chris Morris - Blue Jam - Steve Lamacq Sting". YouTube. BBC Radio 1. Retrieved 30 November 2014.
  9. "Marilyn Manson - Antichrist Superstar Official Music Video". Antichrist Superstar Official Music Video. NME.com. Retrieved 15 August 2011.
  10. Steve "Capone" Prokopy (2008-06-24). "Andrew Stanton Gives Up the Goods on WALL-E and JOHN CARTER to Capone!". Ain't It Cool News. Retrieved 2008-11-22.
This article is issued from Wikipedia - version of the 9/3/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.