A team of researchers working with The Washington Post found that, among users in the US, Alexa understands certain non-American accents and dialects poorly, and that this weakness follows a pattern.
The team had more than 100 people from nearly 20 US cities dictate thousands of voice commands to Alexa. The exercise showed that Amazon’s Alexa-based voice-activated speaker was 30 percent less likely to comprehend commands issued by people with non-American accents.
The Washington Post also reported that people with Spanish as their first language were understood 6 percent less often than people who grew up around California or Washington and spoke English as a first language.
Amazon officials also admitted to The Washington Post that grasping non-American accents poses a major challenge, both in keeping current Amazon Echo users satisfied and in expanding sales of the company’s devices worldwide.
Rachael Tatman, a Kaggle data scientist with expertise in speech recognition, told The Washington Post that this was evidence of bias in the training provided to voice recognition systems.
“These systems are going to work best for white, highly educated, upper-middle-class Americans, probably from the West Coast, because that’s the group that’s had access to the technology from the very beginning,” she said.
That said, the problem is not new, and Amazon is well aware of the effort needed to train Alexa for regional accents and dialects.
Last year, Factor Daily reported how the assistant was specifically trained for the Indian market. To train Alexa for India, Amazon began with a finite set of words called the training data. Once the assistant had learned these words, Amazon trained it further on a much larger, effectively unbounded set of test data drawn from a mix of human interactions and sentences and phrases from the internet.
Despite these efforts, Alexa still had difficulties comprehending street names and providing traffic predictions, reported Shonali Muthalaly of The Hindu.
Training a device for voice recognition requires a large volume of recorded speech and the corresponding verbatim transcriptions. A speech-recognition system learns by matching one to the other. Then, when it hears new speech, it tries to guess which words were spoken. But the system’s ability to comprehend diverse accents will improve only as larger, more culturally and linguistically diverse teams start training the devices.
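To make the idea of matching speech to transcriptions concrete, here is a minimal sketch of word error rate, a metric commonly used to score how closely a recognizer's guess matches the true transcription. It is not the method used in The Washington Post study or by Amazon; the function names, group labels, and sample commands are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / max(len(ref), 1)


def error_rate_by_group(samples):
    """Average word error rate per speaker group.

    `samples` is an iterable of (group, reference, hypothesis) tuples.
    """
    rates = defaultdict(list)
    for group, reference, hypothesis in samples:
        rates[group].append(word_error_rate(reference, hypothesis))
    return {group: sum(values) / len(values) for group, values in rates.items()}


# Made-up examples: what was said versus what the recognizer guessed.
samples = [
    ("group_a", "play some music", "play some music"),
    ("group_b", "turn on the lights", "turn on lights"),
]
print(error_rate_by_group(samples))
```

Comparing the per-group averages on a large, representative set of recordings is one way a gap between accents could be measured; the two samples here are far too few to show anything real.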