
Speech synthesized dialect

Kerry Skyring, Vienna
October 2, 2014

Computers, smartphones, satnavs, lifts… they all speak to us. And sometimes we speak to them. But how well are we communicating? Austrian scientists are finding a niche for dialect in speech synthesis.

https://p.dw.com/p/1DOpl
Speech Synthesis Research in Austria
Image: The Austrian Academy of Telecommunications

Most synthesized speech is based on a standard or commonly spoken form of the language.

Now scientists in Vienna have developed a speech synthesiser that delivers dialects along with an on-screen avatar, whose lip and facial movements aid comprehension.

Several years ago they synthesised the local Viennese dialect, known as Wienerisch. Now they are setting out to capture a range of regional and local dialects and incorporate them into a new-generation synthesiser.

Known as Adaptive Audio-Visual Dialect Speech Synthesis, the system transforms text into acoustic and visual signals.

The research and development is being conducted at Vienna's Telecommunications Research Center and the Austrian Acoustics Research Institute.

"First of all we have to record authentic dialect speakers, speakers who have grown up, who have been raised in this dialect, and from that we just look into the phonetics, into the phonology of the dialect," Dr Sylvia Moosmüller from the Acoustics Research Institute told DW, describing her recording forays into Austria's dialect-speaking provinces.

"A phoneme inventory usually has a set of 40 phonemes, approximately, but a phonetic set is much larger because we have to capture specific phonetic items, for example diphthongs, things like that," she explains.

Phoneme is the term linguists use to describe distinctive sounds in a language.

Acoustic researcher Sylvia Moosmüller records an Austrian dialect. Image: The Austrian Academy of Science/Konstantin Ulitsch

After collecting a set of about 700 phonetically balanced sentences, based on analysis of the dialect, the researchers then have the raw material for synthesising that dialect.
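How a "phonetically balanced" set might be assembled can be sketched in a few lines of Python. The article does not describe the researchers' actual selection procedure, so this is only an illustration under assumptions: a greedy pass that keeps picking the sentence contributing the most phonetic units that are still poorly covered. The corpus, the symbols and the function name are all hypothetical.

```python
from collections import Counter


def select_balanced(sentences, target_size):
    """Greedily choose sentence IDs so that phonetic units are covered evenly.

    `sentences` maps a sentence ID to its list of phonetic symbols.
    This is an illustrative heuristic, not the researchers' method.
    """
    covered = Counter()          # how many chosen sentences contain each unit
    chosen = []
    remaining = dict(sentences)

    while remaining and len(chosen) < target_size:
        # A unit seen rarely so far is worth more than one seen often.
        def gain(item):
            _, units = item
            return sum(1.0 / (1 + covered[u]) for u in set(units))

        best_id, best_units = max(remaining.items(), key=gain)
        chosen.append(best_id)
        covered.update(set(best_units))
        del remaining[best_id]

    return chosen, covered


# Toy corpus with made-up symbols standing in for phonetic transcriptions.
corpus = {
    "s1": ["a", "b", "a"],
    "s2": ["a", "b"],
    "s3": ["c", "d", "e"],
    "s4": ["a", "e"],
}
chosen, covered = select_balanced(corpus, 2)
print(chosen)
```

With a real corpus the same loop would simply run until the target size, about 700 sentences in the Austrian project, is reached.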

Dialect or accent - what's the difference?

"Most of the interest so far has been in developing the system to understand and produce good quality accents," says UK-based Professor David Crystal, author of numerous books on dialects and accents, including The Cambridge Encyclopedia of the English Language.

He adds that if you can't understand the pronunciation "it's not much good worrying about the dialect at all."

Professor Crystal, who's not involved in the Austrian research, says synthesising dialects is an obvious next step in the development of artificial speech.

But what separates accent from dialect?

"A dialect is the use of a distinctive vocabulary and grammar that relates to the identity of a particular group," the language expert explains, adding that an accent is the particular mode of pronunciation which that group uses.

A word in your ear

If a speech synthesiser is to be judged by how closely it resembles the natural human voice, the Austrian researchers reasoned, it needs to speak a variety of dialects as well as accents.

"And now we can transform this speaker into an East Tyrol dialect, to a different dialect, without having any data for this speaker in this dialect," says Michael Pucher, senior researcher at the Telecommunications Research Center, as he demonstrates the dialect speech synthesiser.

Pucher and his colleagues have developed a model that is able to generate signal trajectories from recorded signals and use that same model for the visual data.

"So the visual data for us are several points in the lower part of the face - and then we record these points and then we re-synthesise in basically the same way as we synthesise the speech," he says.

As the recording plays, the speech shifts from standard German through several stages until it is pure dialect. At the same time the animated face on the screen, the avatar, moves its lower half to simulate the speech.
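One simple way to picture such a gradual shift is linear interpolation between two sets of speech parameters. The article does not say how the researchers' system produces the intermediate stages, so the sketch below is an assumption made for illustration. The parameter values are invented, and the function names are hypothetical.

```python
def interpolate(standard, dialect, weight):
    """Blend two equal-length parameter vectors.

    weight = 0.0 gives the standard variety, 1.0 gives pure dialect.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie between 0 and 1")
    if len(standard) != len(dialect):
        raise ValueError("parameter vectors must have the same length")
    return [(1 - weight) * s + weight * d for s, d in zip(standard, dialect)]


def stages(standard, dialect, steps):
    """Return `steps` evenly spaced blends, from standard to pure dialect."""
    if steps < 2:
        raise ValueError("need at least two stages")
    return [interpolate(standard, dialect, i / (steps - 1)) for i in range(steps)]


# Invented numbers standing in for acoustic parameters of one sound.
standard_params = [100.0, 200.0]
dialect_params = [120.0, 160.0]

for stage in stages(standard_params, dialect_params, 5):
    print(stage)
```

The same blending could in principle be applied to the tracked facial points, since the article notes that the visual data is synthesised in basically the same way as the speech.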

Synthesised Dialects - who needs them?

For the visually impaired, the ability to put text into a computer and have it read back to them in their own language, dialect and accent is a big improvement over current speech synthesisers.

The Austrian Institute for the Blind is already using the system, as are Austria Post and the City of Vienna's website.

There is interest from Scotland in this technology, says Michael Pucher.

And for the speech impaired, think of physicist and author Stephen Hawking, who for decades has used a speech synthesiser to communicate.

British physicist Stephen Hawking is one of the best-known users of speech synthesis. Image: AP

But why is this research important when most people who speak a regional or country-specific dialect of a language also understand and speak the standard version of that language?

Dr Moosmüller says "think of the problems of learning a foreign language and then, when actually visiting the country, still not being able to understand the locals."

She believes computer-based language learning will become more effective by helping students "get acquainted with local dialects and get acquainted with the phonological processes involved with that and so really get the habit of understanding."

Audio-visual dialect speech synthesis can be applied to any language with dialect variants, which means most languages, as long as the basic raw data is available.

However, Professor David Crystal speculates that the drive to synthesise German-language dialects is linked to their cultural significance.

"Nobody in one part of the German-speaking world wishes to be associated, in terms of identity, with another part of the German-speaking world," he says, while acknowledging that the dialects, whether Swiss, Austrian or German, are to a large extent mutually intelligible.

"We are talking identity now, not intelligibility," he says, suggesting there is a strong driving force behind the need to capture these identity differences and use them in synthesised speech.

"Dialect means political power and economic power and the identity that goes with that," Professor Crystal concludes.