Speech synthesis

From Sustainable linguistics

Speech synthesis, or text-to-speech (TTS), is automatically generated speech that mimics the human voice. It can serve as an assistive tool for people who are blind, deafened, or have a speech impairment, where either understanding written text or producing audible speech is necessary or preferred, but it is also used in a variety of contexts open to most people with access to technology. For example, voice assistants such as Siri (Apple's voice assistant) can make technology easier to use, and many customer services today use an AI or a speech bot to handle phone calls. Speech synthesis can also be used in machine translation to vocalize the target language, in the same way that speech recognition can be used to recognize words and sentences in the source language.

Synthetic speech has advanced considerably in recent years, and modern equipment and software can, to an extent, mimic emotions, tone, and individual differences between speakers.[1]

Availability

Speech synthesis relies heavily on source materials. Creating working tools for minority languages, which have few speakers, few researchers, and little available material, requires considerable effort and resources. Favoring individual accents, dialects, or standardized forms may also affect the availability of the technology.

Future challenges

Companies developing TTS are actively trying to create more natural-sounding speech. The interaction between humans and machines, and how flexibly a machine can produce language, is also constantly developing. The current goal is to make communication between humans and computers (or smartphones, or any other technology) as effortless and natural as possible.[1]