The Web Speech API Specification from the W3C describes APIs for both Speech Recognition and Speech Synthesis. Tutorial #37 Web Speech Recognition described the first of these. This tutorial shows how you can use Speech Synthesis to make a web page talk.
There are many potential applications for this feature, from mobile web applications that can give spoken directions like a SatNav system, to user feedback in web-based games.
NOTE: As of March 2014, only the Google Chrome browser (version 33 and higher) fully supports Speech Synthesis, with partial support available in Safari on iOS 7.
NOTE: Some of the options, such as rate, volume and pitch, do not work with all the voices. The default voice on Mac OS X is one example.
Speech Synthesis has been available on Windows and Mac OS X as a system feature for some time. On Mac OS X, for example, you can choose among several voices in the Dictation & Speech preference pane, and then have text spoken by selecting it, right-clicking and choosing the Speech item in the menu.
Speech Synthesis in the Browser makes use of this system service and, in the case of Google Chrome, also provides additional voices that appear to be coded within the browser software itself.
On Mac OS X 10.9.2 (Mavericks) with Google Chrome 33, the list of voices is shown below:
- Voice 0 Google US English
- Voice 1 Google UK English Male
- Voice 2 Google UK English Female
- Voice 3 Google Español
- Voice 4 Google Français
- Voice 5 Google Italiano
- Voice 6 Google Deutsch
- Voice 7 Google 日本人
- Voice 8 Google 한국의
- Voice 9 Google 中国的
- Voice 10 Alex
- Voice 11 Agnes
- Voice 12 Albert
- Voice 13 Bad News
- Voice 14 Bahh
- Voice 15 Bells
- Voice 16 Boing
- Voice 17 Bruce
- Voice 18 Bubbles
- Voice 19 Cellos
- Voice 20 Deranged
- Voice 21 Fred
- Voice 22 Good News
- Voice 23 Hysterical
- Voice 24 Junior
- Voice 25 Kathy
- Voice 26 Pipe Organ
- Voice 27 Princess
- Voice 28 Ralph
- Voice 29 Trinoids
- Voice 30 Vicki
- Voice 31 Victoria
- Voice 32 Whisper
- Voice 33 Zarvox
Voices 0 through 9 are provided by Google and would appear to be generated within the browser itself.
The other voices (10 through 33) are generated by, in my case, the built-in Speech Synthesis capability of Mac OS X.
NOTE that not all voices support pitch, rate and volume.
Understanding the Code
speechSynthesis is a property of the window object. The demo code prefixes it with window, but because window is the global object in a web page, the prefix is optional.
You should check whether the user's browser supports the API by testing for the presence of speechSynthesis before you call it.
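As a minimal sketch of that check, the helper below tests for both speechSynthesis and the SpeechSynthesisUtterance constructor. The name supportsSpeechSynthesis and its globalObject parameter are illustrative choices, not part of the API. Passing the global object in as an argument simply makes the check easy to try outside a browser.

```javascript
// Returns true when the given global object exposes the Speech Synthesis API.
// globalObject is normally window; the parameter is an illustrative choice.
function supportsSpeechSynthesis(globalObject) {
  return Boolean(globalObject) &&
    'speechSynthesis' in globalObject &&
    typeof globalObject.SpeechSynthesisUtterance === 'function';
}

// In a browser, report the result. Outside a browser this branch is skipped.
if (typeof window !== 'undefined') {
  if (supportsSpeechSynthesis(window)) {
    console.log('Speech Synthesis is supported');
  } else {
    console.log('Speech Synthesis is not supported in this browser');
  }
}
```

If the check fails, a page would typically hide or disable its speech controls rather than call the API.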
A single instance of speech is called a SpeechSynthesisUtterance. You create a SpeechSynthesisUtterance object and specify its attributes before passing it to a call of window.speechSynthesis.speak().
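The create-then-speak sequence described above might look like the following sketch. The function name say, the globalObject parameter and the element id speak-button are hypothetical and used only for illustration. Speech is triggered from a button click here because some browsers may refuse to speak before the user has interacted with the page.

```javascript
// Simplest form: build the utterance from its text, then pass it to speak().
// globalObject is normally window; it is a parameter only so that the
// function can be tried with a stand-in object outside a browser.
function say(text, globalObject) {
  const utterance = new globalObject.SpeechSynthesisUtterance(text);
  globalObject.speechSynthesis.speak(utterance);
  return utterance;
}

// Browser-only wiring. "speak-button" is a hypothetical element id.
if (typeof document !== 'undefined') {
  const button = document.getElementById('speak-button');
  if (button) {
    button.addEventListener('click', () => say('Hello World', window));
  }
}
```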
The demo page contains five demos, each wrapped in an event handler for its associated button:
- The simplest syntax - two lines of code
- The alternate syntax that exposes the attributes
- Fetching the list of voices that are available on your system
- Specifying a Voice
- Applying options to the voice
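The last four items in this list can be sketched roughly as follows. This is not the demo page's code: describeVoices, buildUtterance and the options object are hypothetical names introduced here. The property names (text, lang, voice, volume, rate, pitch) and getVoices() come from the Web Speech API, and the value ranges in the comments are those given in the specification. As noted earlier, some voices may ignore rate, pitch and volume.

```javascript
// List the available voices as "Voice <index> <name>" strings.
// synth is normally window.speechSynthesis.
function describeVoices(synth) {
  return synth.getVoices().map((voice, index) => `Voice ${index} ${voice.name}`);
}

// Attribute-style syntax: create an empty utterance, then set its properties.
// Every field of options is optional; unset fields keep the browser defaults.
function buildUtterance(text, options, globalObject) {
  const utterance = new globalObject.SpeechSynthesisUtterance();
  utterance.text = text;
  if (options.lang !== undefined) utterance.lang = options.lang;
  if (options.voiceName !== undefined) {
    // Look the voice up by name; keep the default voice if it is not found.
    const voice = globalObject.speechSynthesis
      .getVoices()
      .find((candidate) => candidate.name === options.voiceName);
    if (voice) utterance.voice = voice;
  }
  if (options.volume !== undefined) utterance.volume = options.volume; // 0 to 1
  if (options.rate !== undefined) utterance.rate = options.rate;       // 0.1 to 10
  if (options.pitch !== undefined) utterance.pitch = options.pitch;    // 0 to 2
  return utterance;
}

// Browser-only usage. In some browsers getVoices() returns an empty list
// until the voiceschanged event has fired, so the list is printed from there.
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  window.speechSynthesis.onvoiceschanged = () => {
    console.log(describeVoices(window.speechSynthesis).join('\n'));
  };
}
```

An utterance built this way is spoken in the usual manner, for example with window.speechSynthesis.speak(buildUtterance('Hello', { voiceName: 'Alex', rate: 1.2 }, window)).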
In addition to the features shown in the demos, the API also provides event handlers and methods that can be used while an utterance is being spoken, for example to react when speech starts or ends, or to pause, resume or cancel it.
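As a rough illustration of those event handlers and methods, the sketch below attaches onstart, onend and onerror callbacks to an utterance and toggles between pause() and resume(). The helper names speakWithEvents and togglePause, and the shape of the handlers object, are assumptions made for this example. The underlying properties and methods (onstart, onend, onerror, speaking, paused, pause(), resume()) are defined by the Web Speech API.

```javascript
// Speak some text and attach optional callbacks for the utterance's events.
// handlers may contain onStart, onEnd and onError functions.
function speakWithEvents(text, handlers, globalObject) {
  const utterance = new globalObject.SpeechSynthesisUtterance(text);
  utterance.onstart = handlers.onStart || null;
  utterance.onend = handlers.onEnd || null;
  utterance.onerror = handlers.onError || null;
  globalObject.speechSynthesis.speak(utterance);
  return utterance;
}

// Pause speech that is in progress, or resume speech that is paused.
// Returns a short label describing what was done.
function togglePause(synth) {
  if (synth.paused) {
    synth.resume();
    return 'resumed';
  }
  if (synth.speaking) {
    synth.pause();
    return 'paused';
  }
  return 'idle';
}
```

In a page, speakWithEvents('Hello', { onEnd: () => console.log('done') }, window) would log a message once the speech finishes, and window.speechSynthesis.cancel() would stop it and clear any queued utterances.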