An Introduction to Speech-Processing Systems

Published: Jun 19, 2007

Speech technology has come a long way since the late 1700s, when Wolfgang von Kempelen built his "Speaking Machine": today, text-to-speech technology helps millions of blind people. Learn the basics of text-to-speech technology and find out what the future possibilities are.


 

Researchers at IBM have made remarkable developments in speech processing. The IBM project samples the voices of actual speakers in order to recreate natural-sounding words. It uses a building-block model in which software transforms words and letters into a series of phonemes. English contains about 40 unique phonemes; the software holds a collection of recorded samples of each phoneme and uses them to reconstruct new words. Adding pitch, timing, and loudness, together known as prosody, allows a complete sentence to be reconstructed.

With a database of 10,000 recorded samples of each English phoneme, IBM covers all possible words in the English language. Recorded words are matched to their written forms with speech-to-text technology and added to the database.
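
The building-block model described above can be sketched in a few lines of Python. This is only a minimal illustration, not IBM's implementation: the tiny pronunciation dictionary `LEXICON`, the `SAMPLES` table (placeholder strings standing in for recorded audio), and the functions `to_phonemes` and `synthesize` are all assumptions made up for this example.

```python
# Hypothetical pronunciation dictionary: word -> phoneme sequence
# (ARPAbet-style symbols, chosen for illustration only).
LEXICON = {
    "speech": ["S", "P", "IY", "CH"],
    "peach": ["P", "IY", "CH"],
    "cheap": ["CH", "IY", "P"],
}

# Hypothetical unit database: one placeholder "recording" per phoneme.
# A real system would store many audio samples per phoneme.
SAMPLES = {p: f"<{p}>" for p in ["S", "P", "IY", "CH"]}


def to_phonemes(text):
    """Transform the words of a text into one flat phoneme sequence."""
    phonemes = []
    for word in text.lower().split():
        if word not in LEXICON:
            raise KeyError(f"no pronunciation for {word!r}")
        phonemes.extend(LEXICON[word])
    return phonemes


def synthesize(text, pitch=1.0, duration=1.0, loudness=1.0):
    """Look up a sample for each phoneme and attach prosody settings."""
    return [
        {"sample": SAMPLES[p], "pitch": pitch,
         "duration": duration, "loudness": loudness}
        for p in to_phonemes(text)
    ]


units = synthesize("cheap peach")
print("".join(u["sample"] for u in units))  # <CH><IY><P><P><IY><CH>
```

The same four phoneme samples are enough to build all three words in the toy dictionary, which is the idea behind reconstructing new words from a fixed collection of recordings.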

Speech processing has a big impact on blind and visually impaired people, of whom 40% of those of working age are employed. With 5% to 6.8% of the US population blind or visually impaired, English speech-processing software could reach more than 18 million people.
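
As a rough check on that figure, the quoted percentages can be multiplied by a population count. The population value used here, roughly 301 million (about the US population in 2007), is an assumption and does not come from the article; the function name `affected` is likewise just illustrative.

```python
# Assumed value: approximate US population in 2007 (not from the article).
US_POPULATION = 301_000_000


def affected(share_percent, population=US_POPULATION):
    """Number of people corresponding to a percentage share."""
    return round(population * share_percent / 100)


low = affected(5.0)   # lower bound of the quoted range
high = affected(6.8)  # upper bound of the quoted range
print(f"{low:,} to {high:,}")  # 15,050,000 to 20,468,000
```

Under that assumption, the figure of over 18 million falls inside the resulting range.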

Sources: Scientific American and American Foundation for the Blind


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
