How Experts Are Working to Improve Inclusivity of Voice Recognition Software

While the first consumer speech recognition software was launched back in the early ’90s, the technology has progressed significantly since then, offering users improved capabilities and specialised applications.

Even so, voice recognition software still faces a number of issues that limit its use and, subsequently, its inclusivity.

That is, because of the way it’s developed and trained, the software generally recognises only the specific voices and accents it has encountered. Furthermore, the subject matter that the software is able to work with depends largely on what it’s been intentionally exposed to.

Therefore, the more complex and specialised the subject matter, the less effective the technology is – unless it’s been specifically designed for the subject in question. This may include things like specific medical or legal jargon, for instance.

Thus, many companies and developers in the field have been working to improve the capabilities of voice recognition software, broadening the use of the technology.

In this article, we’re going to explore the current limitations of voice recognition software, the ways in which technological improvements can increase inclusivity, and how industry leaders are working to achieve this and to improve the technology overall.

 

Current Limitations of Voice Recognition Software 

 

The primary purpose of voice recognition software is either to turn dictation into different types of transcripts or to understand and carry out spoken commands.

With regard to the former, this may be to create documentation directly from speech, saving the time that would previously have been spent painstakingly listening to audio and transcribing it manually.

This technology is being used by professionals to create things like legal documents, film subtitles and, more recently, medical scripts.

In terms of understanding spoken commands, voice recognition software is increasingly being used as a component of smart technology, giving users a way to issue commands by voice. This may include things like turning lights on and off within a home or locking and unlocking your front door.

So, if these are the things that voice recognition technology is supposed to be able to do, in what ways is it limited and how do these limitations affect inclusivity?

As with any kind of software, there are several challenges faced by the speech recognition industry, including things like the length of time required to create voice templates and samples; the high rate of errors in longer samples; the need to train users; and the potential ramifications of software downtime.

The main challenge faced by speech recognition software, however, is that its capabilities are dependent on the information it’s provided with, meaning that it can only learn from the voice samples developers feed it. So, in the most basic sense, if the voice of a user is different from the samples the software has been trained with, it’ll have difficulty recognising the speech.

Most frequently, these so-called differences include things like heavy accents that are different to those used in the samples, as well as speech impediments.

This is because the samples used to train the software tend to use speech that is “clean”. This means that it’s clear, free from errors and spoken by someone without a speech impediment.

The use of “clean” speech isn’t problematic in itself, since it’s a fairly accurate representation of most of the speech the software will be exposed to.

However, the increasingly popular opinion is that it shouldn’t be the only type of speech that speech recognition software is trained on.

By providing programs with voice samples in different accents and with speech impediments like stutters, for instance, developers can help the software detect this kind of speech more accurately in the future.
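To make this idea a little more concrete, here’s a minimal sketch in Python of how a developer might check whether a recogniser performs worse for some groups of speakers than for others. It uses word error rate, a common measure of transcription accuracy: the number of word-level edits needed to turn the software’s output into the correct transcript, divided by the length of the correct transcript. The function names and the shape of the input data are our own assumptions for illustration and aren’t taken from any of the products discussed in this article.

```python
from collections import defaultdict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the output
                curr[j - 1] + 1,     # extra word in the output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def error_rate_by_group(samples):
    """Average word error rate per speaker group.

    `samples` is an iterable of (group, reference, hypothesis) tuples,
    where `group` is a hypothetical label such as an accent.
    """
    rates = defaultdict(list)
    for group, reference, hypothesis in samples:
        rates[group].append(word_error_rate(reference, hypothesis))
    return {group: sum(r) / len(r) for group, r in rates.items()}
```

Passing in transcripts tagged by accent returns one average per accent. A noticeably higher figure for one group is a hint that the group may be under-represented in the training samples.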

Let’s have a look at a few startups that are combatting these limitations directly.

 

Startups Challenging the Limitations Faced By Voice Recognition Software

 

While there are significant challenges being faced in the voice recognition industry, there are also a number of promising startups that are using technology to alleviate the effects of these limitations and improve voice recognition software overall.

By focusing on the main limitations faced in the industry, such as training samples that lack diversity in terms of accents, we’re going to highlight five startups that are working specifically to alleviate them.

 

1. Intron Health

 

 

Intron Health, a Nigeria-based clinical speech recognition startup in the healthcare sector, has identified significant limitations in the technology that stem directly from existing software not being trained on sufficiently diverse voice samples.

They aim to tackle an issue faced by many patients and practitioners with African accents: their speech is often not accurately recognised by existing software. The reason for this is that these programs are not provided with voice samples spoken in African accents, making it difficult, if not impossible, for the software to accurately pick up what’s being said.

Intron Health is aiming to combat this problem with an algorithm that has been trained on more than 3.5 million audio clips from about 18,000 people. The contributors of the voice samples come from 29 different countries and speak with 288 different accents, creating a diverse sample pool representing more African accents than ever before.

Tobi Olatunji, the CEO and founder of Intron Health, was inspired by his own experience in the medical industry. It became clear to him that his Nigerian accent was, essentially, excluding him from using available voice recognition software that had the potential not only to make his life easier but also to improve the care he could provide to his patients.

Hence, Intron Health was born as an alternative to voice recognition software that excludes users with imperfect speech or with African accents that aren’t included in voice samples. The software is able to recognise and understand African accents specifically, and it can also be integrated into existing Electronic Medical Record (EMR) systems.

 

 

2. Deepgram

 

 

Founded by Scott Stephenson in the US in 2015, Deepgram is an advanced speech recognition company that allows speech to be converted to text, as well as text to be converted to speech.

By means of AI technology, Deepgram provides fast and accurate voice recognition services that can be employed by various platforms and businesses, from healthcare professionals to leaders in the entertainment industry.

Deepgram’s AI technology allows the software to go beyond merely converting speech to text and vice versa. It also has language understanding capabilities, allowing for content analysis too.

 

3. Soapbox Labs

 

 

Soapbox Labs is geared towards improving the recognition of children’s speech in the world of education.

Founded in Dublin, Ireland, in 2013, the company has developed speech technology that teachers can use as a tool to improve children’s learning within busy classroom scenarios.

By developing AI software that has the ability to detect children’s voices and speech across a range of different accents and types of speech, Soapbox Labs is increasing inclusivity in the industry of voice recognition and creating a safe way for this technology to be used with children.

 

4. SoundHound AI Inc.

 

 

SoundHound AI Inc. uses AI to develop speech recognition software with capabilities in natural language understanding, sound recognition and search technologies.

A US-based startup, SoundHound aims to integrate voice, visuals and touch to achieve a multi-modal interface powered by artificial intelligence. The software can be used in a broad range of industries, from restaurants to contact centres.

 

5. Hume

 

 

Hume has endeavoured to conquer the ultimate challenge in AI – emotion. This startup boasts AI software that is able to respond empathically in order to foster a sense of care for human wellbeing.

Their software uses AI to detect emotion, tone of voice, word emphasis and more, with the aim of getting more out of voice recognition technology than is possible today. Businesses can make use of Hume’s software to improve processes in terms of time and accuracy.

 

Moving Forward in the World of Voice Recognition Software

 

By means of AI technology, the possibilities for voice recognition software are expansive, creating a world in which voices can be accurately detected and analysed for the sake of efficiency and time-saving in professional business settings, education, healthcare and more.