Speech recognition

From Sustainable linguistics

Automatic speech recognition (ASR) is the process of automatically recognizing speech, converting it to text and possibly translating it, as well as the development of systems that do this. The reverse of this process is speech synthesis. Ideally, ASR not only recognizes utterances and single words but also interprets their meaning. This enables the technology to act on spoken commands, which further helps communication between humans and machines. The ability to interpret meaning also improves the accuracy of speech recognition, as context becomes an additional source of information.
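How context can act as an additional source of information may be illustrated with a small sketch. It is not how any particular ASR system works: the two candidate transcriptions, all scores, and the names `candidates`, `pair_scores`, `context_score` and `best_transcription` are invented for illustration. The idea is only that two phrases which sound alike can be told apart when the plausibility of their word sequences is taken into account.

```python
# Hypothetical acoustic scores (higher is better) for two transcriptions
# that sound alike. The numbers are invented for illustration only.
candidates = {
    "recognize speech": -4.1,
    "wreck a nice beach": -4.0,
}

# Hypothetical context scores: how plausible each adjacent word pair is.
# Pairs that are not listed get a low default score.
pair_scores = {
    ("recognize", "speech"): -1.0,
    ("wreck", "a"): -3.0,
    ("a", "nice"): -1.5,
    ("nice", "beach"): -2.0,
}
DEFAULT_PAIR_SCORE = -6.0


def context_score(text):
    """Sum the plausibility scores of all adjacent word pairs in the text."""
    words = text.split()
    return sum(
        pair_scores.get(pair, DEFAULT_PAIR_SCORE)
        for pair in zip(words, words[1:])
    )


def best_transcription(candidates, use_context):
    """Return the candidate with the highest total score."""
    def total(text):
        score = candidates[text]  # acoustic evidence alone
        if use_context:
            score += context_score(text)  # add evidence from context
        return score

    return max(candidates, key=total)


print(best_transcription(candidates, use_context=False))
print(best_transcription(candidates, use_context=True))
```

With these invented numbers, the sound alone slightly favours "wreck a nice beach", while adding the context score makes "recognize speech" the preferred transcription. Real systems rely on far larger statistical or neural language models, but the principle of combining the two sources of evidence is the same.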

ASR can be used in many ways, for example to recognize speech from recordings or live events and so make it accessible to people with hearing difficulties. A well-known example of a virtual assistant based on ASR is Siri (Apple), which recognizes and interprets the user's questions and commands and acts accordingly. Some police body cameras also use this technology to make the recorded material easier to handle.[1]

Challenges

Different accents, background noise and natural conversations that may involve multiple speakers still pose challenges that are being constantly studied. Multilingualism, including the ability to handle several languages spoken in one context, and the development of technologies for minority languages also need further work. To become more efficient, the technology also has to learn to process more complex and natural language.

Future of speech recognition

People working on the development of ASR describe the future of the technology as more multilingual, more available, more adaptable and free of bias. Communication between humans and machines should become seamless and human-like: the technology would not only process simple meanings of language but also recognize different styles and respond to them.[1] All these factors add to the usability and availability of the technology for speakers of different languages and varieties, helping to maintain linguistic diversity rather than reduce it.