Mannheimer Morgen, a high-circulation newspaper in the metropolitan Rhine-Neckar region, published an article featuring Aristech’s digital voices on 05.03.14.
The article, which appeared in the newspaper's IT section, expresses admiration for the quality and scope of the digital voice developed by the Heidelberg-based company and used for the traffic hotline of the biggest German public radio station.
The whole article reads as follows:
The human computer voice
If you call the SWR traffic hotline, you’re talking to Alex. He’s not an employee but a piece of software, developed by the Heidelberg-based company Aristech.
Written by LARA FEDER
His vocabulary keeps growing. By now, Alex knows most towns in Southern Germany, but also many from other parts of the country, effortlessly pronouncing names such as Müllerwik, Itzehoe or Zurow. His territory has recently expanded, too. He no longer works only for the SWR hotline, which is based in the south of the country, but also for the NDR (the northern German broadcasting company).
The voice of Alex was born in an office in Rohrbach, a district of Heidelberg. Here, Alexander Edler, namesake of the digital voice, recorded more than 3,000 sentences. Words from these sentences can be extracted and recombined to create new sentences. “The more sentences are available, the better and more accurate the voice becomes,” says Michael Mende, founder and CEO of Aristech.
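The recombination idea described above can be illustrated with a toy sketch. It is not Aristech's implementation: the recordings are stand-in text strings rather than audio, the function names `build_unit_index` and `assemble` are hypothetical, and a real system would also have to choose between several recorded variants of a word and smooth the joins.

```python
from collections import defaultdict


def build_unit_index(recordings):
    """Map each word to every place it was recorded: (sentence number, word position)."""
    index = defaultdict(list)
    for sent_no, sentence in enumerate(recordings):
        for pos, word in enumerate(sentence.lower().split()):
            index[word.strip(".,!?")].append((sent_no, pos))
    return index


def assemble(target, index):
    """Pick one recorded unit per target word; return None if any word was never recorded."""
    plan = []
    for word in target.lower().split():
        word = word.strip(".,!?")
        if word not in index:
            return None  # the voice cannot say this sentence yet
        plan.append((word, index[word][0]))  # simplification: take the first recording
    return plan


# Hypothetical recordings, standing in for the speaker's studio sentences.
recordings = [
    "Traffic jam on the A5 near Heidelberg.",
    "Road works near Mannheim on the A6.",
]
index = build_unit_index(recordings)

# A new sentence built entirely from words cut out of the two recordings.
plan = assemble("Traffic jam near Mannheim", index)

# A town that was never recorded cannot be spoken until more sentences are added.
missing = assemble("Traffic jam near Itzehoe", index)
```

Here `plan` lists, for each word of the new sentence, which recording it would be cut from, while `missing` is `None`. That is the sense in which more recorded sentences make the voice better: every additional recording widens the set of sentences that can be assembled.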
Alex isn’t limited to traffic jargon. Recently Mr Mende has taught him to pronounce grape varieties properly. Even human speakers sometimes struggle with names such as “Chardonnay”, “Merlot” or “Bordeaux”, but for software that relies solely on arithmetic instructions, and for which every deviation from the norm has to be specified, the challenge is far greater. With a click of the mouse, Mr Mende increases the length of the final syllable of Bordeaux. “Bordooo,” Alex’s voice can be heard saying, drawing out the final O. Mr Mende isn’t quite happy. Another click of the mouse and finally Alex produces a well-pronounced “Bordo”. “He sounds human, doesn’t he?” asks Mr Mende. “Many callers don’t even realise that they’re talking to a computer. It’s the little things that make a difference.” If there’s a wrong-way driver, Alex speaks faster. “In these cases, he sounds scared,” says Mende’s daughter Carolin, who has been working as Aristech’s project manager since 2010. According to her, the German market for computational linguistics (the science behind voices such as Alex) is really exciting.
Other countries recognised the potential of speech software early on. In Germany, however, users were mainly restricted to people with special needs, such as the visually impaired. Carolin Mende says that the reluctance is based on prejudices against machines and computers as well as a low tolerance of errors. For speech software, German is one of the most difficult languages. Conjugations and declensions lead to a very high number of variations for a single word. Where English often strings together separate words, German tends to form compounds. “Way of living” thus becomes “Lebensart” in German. But German speech technology is picking up the pace. “Steve Jobs played a key role with the development of Siri,” says Carolin.
Individual voices have long been a part of a company’s corporate identity. To increase recognition value, the same voice should be heard throughout the company. With just a single human speaker and no speech software, this is hardly feasible.
This kind of technology can also help at work. Doctors can, for example, enter patient data such as pulse, blood pressure or body temperature using only their voice. An app then extracts all relevant information from the spoken text and adds it to a database. Text-to-speech and speech recognition can also increasingly be found in cars: Alex can read text messages aloud and the driver can dictate a reply, removing the danger of texting while driving.
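The extraction step in the medical example can be pictured with a small sketch. It assumes the speech recogniser has already turned the doctor's words into an English text transcript. The phrasing patterns, the field names and the function `extract_vitals` are made up for illustration, and the article does not say how the actual app works. Writing the result to a database is left out.

```python
import re

# Hypothetical phrasings a doctor might dictate; a real app would need many more.
PULSE = re.compile(r"pulse\s+(?:is\s+|of\s+)?(\d{2,3})")
BLOOD_PRESSURE = re.compile(
    r"blood pressure\s+(?:is\s+|of\s+)?(\d{2,3})\s*(?:over|/)\s*(\d{2,3})"
)
TEMPERATURE = re.compile(r"temperature\s+(?:is\s+|of\s+)?(\d{2}(?:\.\d)?)")


def extract_vitals(transcript):
    """Pull pulse, blood pressure and temperature out of a dictated sentence."""
    text = transcript.lower()
    record = {}

    match = PULSE.search(text)
    if match:
        record["pulse_bpm"] = int(match.group(1))

    match = BLOOD_PRESSURE.search(text)
    if match:
        # Stored as (systolic, diastolic).
        record["blood_pressure"] = (int(match.group(1)), int(match.group(2)))

    match = TEMPERATURE.search(text)
    if match:
        record["temperature_c"] = float(match.group(1))

    return record


transcript = "Patient is stable, pulse 72, blood pressure 120 over 80, temperature 37.2 degrees."
record = extract_vitals(transcript)
```

Values that are not mentioned are simply absent from `record`, so a later step could prompt the doctor for anything still missing before the record is saved.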
By now, Alex only seldom has problems with his pronunciation. The main problem is the people who use the service. “Some people who would normally speak with a strong dialect try to use the standard language for man-machine communication. Unfortunately, this is even more incomprehensible. It would be easier if they just talked the way they always do,” says Michael Mende, laughing.
Michael Mende read German studies at university and initially wanted to be a writer, but then he found his passion in computational linguistics. For ten years, his corporate group has been providing software for speech recognition and text-to-speech. Aristech was founded two years ago and employs ten people, mainly computational linguists but also physicists and classical linguists. Customers include the German broadcasting companies SWR, WDR and NDR, which use Alex for their traffic hotlines, as well as the big software company SAP (which uses speech technology for its medical and retail apps) and Unitymedia (customer hotline).