Definition of voice recognition

[Image caption: Thanks to the IoT (Internet of Things) and voice recognition, it is now possible to design smart homes.]

Speech recognition is the processing of speech by a system. The concept refers to an area of artificial intelligence and to the service or application that facilitates oral communication between a human being and a machine.

Before moving forward, it is important to mention that recognition is the act and result of recognizing: determining an identity; inspecting something to learn its properties or nature; or carrying out an analysis to find information. Voice, on the other hand, is the sound produced when the vocal cords vibrate.

Through speech recognition, a system can identify, authenticate and process what a person says. This technology, which has advanced considerably in the last decade, has multiple applications.

History of speech recognition

The history of speech recognition dates back to the 1950s. A Bell Laboratories system known as AUDREY (a name derived from the expression "automatic digit recognizer") is often cited as a pioneer: it could recognize spoken digits (0 to 9) with an accuracy of over 90%. That accuracy, however, was tied to the developer's voice; when another person spoke, the system did not reach the same level.

From then on, advances in voice recognition were gradual and steady. In 1962, IBM unveiled a machine called Shoebox that could understand sixteen words. Also in the 1960s, scientists from the Soviet Union devised an algorithm capable of recognizing about two hundred terms.

The evolution continued with Harpy, a system developed at Carnegie Mellon University with funding from the US Department of Defense, whose vocabulary exceeded a thousand words and which could even handle complete sentences. Tangora from IBM in the 1980s, and Dragon Dictate and Dragon NaturallySpeaking from Dragon Systems in the 1990s, carried speech recognition further. The technology then expanded enormously and became more accessible to the general public in the 21st century.

Google, Amazon, Apple and Microsoft are some of the companies that currently include voice recognition in numerous programs, applications and devices.

[Image caption: Correction of speech recognition errors is achieved through system training.]

Its applications

Voice recognition technology has multiple uses. Some of these applications are so commonplace today that the user is almost unaware of the artificial intelligence involved in operations with smart devices.

Virtual assistants such as Alexa, Siri, Google Assistant, Bixby and Cortana, for example, work through voice recognition. Users can dictate instructions to them so that they provide answers or perform certain actions. Thus, anyone who speaks to a smartphone, smart TV or other device with voice recognition capabilities to search for information or give a command is taking advantage of this speech technology.

Speech and dictation software also makes it possible to convert speech to text (STT) and text to speech (TTS). Automatic transcription relies on natural language processing (NLP) to produce adequate results.

Voice search is also used with chatbots. These virtual agents are designed to respond based on the words or phrases they register.
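As a rough illustration of answering "based on the words or phrases" a chatbot registers, the following minimal sketch matches keywords in an already transcribed utterance. The `RESPONSES` table, the `FALLBACK` message and the `reply` function are hypothetical names invented for this example; a real chatbot would use far more sophisticated language processing.

```python
import re

# Hypothetical keyword-to-answer table, invented for illustration only.
RESPONSES = {
    "hours": "We are open from 9 am to 6 pm.",
    "price": "Our basic plan costs 10 dollars per month.",
    "human": "I will transfer you to a human agent.",
}
FALLBACK = "Sorry, I did not understand. Could you rephrase that?"


def reply(transcript: str) -> str:
    """Return the answer for the first known keyword found in the transcript."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", transcript.lower())
    for word in words:
        if word in RESPONSES:
            return RESPONSES[word]
    # No keyword matched, so ask the user to try again.
    return FALLBACK


print(reply("What are your opening hours?"))
print(reply("Tell me a joke"))
```

The sketch assumes the speech has already been converted to text; the recognition step itself is outside its scope.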

It is also worth mentioning that hands-free communication through speech was a major advance in vehicles. Cars with voice control can act on certain spoken commands from the driver.

Voice authentication, on the other hand, has become a very important security mechanism. In this case, biometrics are used to grant access to a system, to information or even to a place. The system must recognize the voice in question to lift the restrictions; otherwise, it does not grant permission.
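The access decision described above can be sketched as a comparison between a stored "voiceprint" and a new attempt. This is only a simplified illustration under stated assumptions: the three-number vectors stand in for voice features that a real system would extract from audio, and the `0.9` threshold, along with the names `cosine_similarity` and `is_authorized`, are hypothetical choices made for this example.

```python
import math


def cosine_similarity(a, b):
    """Measure how closely two feature vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_authorized(enrolled, attempt, threshold=0.9):
    """Grant access only if the attempt is similar enough to the enrolled voice."""
    return cosine_similarity(enrolled, attempt) >= threshold


# Made-up feature vectors; real voiceprints have many more dimensions.
enrolled_voiceprint = [0.9, 0.1, 0.4]
same_speaker = [0.88, 0.12, 0.41]
other_speaker = [0.1, 0.9, 0.2]

print(is_authorized(enrolled_voiceprint, same_speaker))
print(is_authorized(enrolled_voiceprint, other_speaker))
```

Raising the threshold makes the system stricter (fewer impostors accepted, more legitimate users rejected), while lowering it has the opposite effect.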

[Image caption: Future trends in voice recognition are associated with the advancement of artificial intelligence.]

Voice recognition training

Training voice recognition systems is essential to increase their accuracy and effectiveness. This training consists of feeding the system large amounts of data so that it learns to function correctly and improves over time.

The goal is for the software to refine its pattern recognition. To that end, neural networks are used, a method that aims to process information in a way similar to the human brain. Such a network consists of interconnected nodes that send and receive signals.
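To make the idea of interconnected nodes more concrete, here is a minimal sketch of signals passing through a tiny two-layer network. All weights, biases and input values are made up for illustration, and the `sigmoid` and `layer` helpers are names chosen for this example; real speech models are vastly larger and learn their weights from data.

```python
import math


def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """Each node sums its weighted inputs, adds a bias and emits a signal."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


# Made-up parameters: 3 inputs -> 2 hidden nodes -> 1 output node.
hidden_weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
hidden_biases = [0.0, 0.1]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]

features = [0.2, 0.7, 0.1]  # stand-in for features extracted from audio
hidden = layer(features, hidden_weights, hidden_biases)
output = layer(hidden, output_weights, output_biases)
print(output)
```

Each node receives the signals emitted by the previous layer and sends its own signal onward, which is the sense in which the nodes are interconnected.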

Voice recognition training is based on machine learning: certain basic parameters are configured and the computer is programmed to learn independently to recognize patterns through successive processing layers. In the specific case of voice recognition, deep learning is used, through which the system performs tasks in a manner analogous to people thanks to the aforementioned neural networks.

As voice recognition technology records and processes data, it learns to work better. The data enables the system to distinguish between words, taking into account sound, pronunciation, context and other variables.
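The way a system improves as it processes data can be sketched with a perceptron, one of the simplest learning algorithms. This is a deliberate simplification and not the deep learning used in actual speech recognition: the two-number samples are invented stand-ins for sound features, and the labels 0 and 1 represent two hypothetical words.

```python
def predict(weights, bias, features):
    """Output 1 if the weighted sum is positive, otherwise 0."""
    total = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if total > 0 else 0


def accuracy(weights, bias, samples):
    """Fraction of samples that are classified correctly."""
    hits = sum(predict(weights, bias, f) == label for f, label in samples)
    return hits / len(samples)


def train(samples, epochs=100, learning_rate=0.1):
    """Adjust the weights a little every time a sample is misclassified."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in samples:
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:  # every sample is now recognized correctly
            break
    return weights, bias


# Invented training data: (features, label) pairs for two hypothetical words.
samples = [
    ([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.3, 0.3], 0),
    ([0.8, 0.9], 1), ([0.9, 0.7], 1), ([0.7, 0.8], 1),
]

print("before training:", accuracy([0.0, 0.0], 0.0, samples))
weights, bias = train(samples)
print("after training:", accuracy(weights, bias, samples))
```

Because the two groups of invented samples are clearly separated, the perceptron ends up classifying all of them correctly; real speech data is far noisier and requires much larger models and datasets.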