The future voice of speech-driven interfaces
One of the by-products of a market-driven digital economy over recent years has been the almost exclusive use of English as the means of communication between man and machine. To ensure the future multilingualism of man-machine communications, the IST programme-funded SpeeCon project has focused on building speech recognition databases. These will assist the development of speech-driven interfaces (SDIs) that can be activated in a wide range of European and other languages.
SpeeCon recognised that, to develop the market for SDIs successfully, two essential technical obstacles had to be removed. First, because of linguistic diversity, the words used to command interfaces have to be made available in many languages. Secondly, the interfaces have to work satisfactorily under the acoustic conditions in which consumer devices are actually used (with background noise, or with different types of microphone, e.g. using a mobile phone in hands-free mode). This required a pool of expertise in speech processing to boost progress in the field, to create affordable, user-friendly, multilingual interfaces for future consumer electronics devices, and to ensure a larger European share of this global market.
With a large multinational consortium including Siemens (the project coordinator), Nokia and IBM, amongst many others, SpeeCon arose from a strong base. "The collection of speech data is very costly: there are around 600 speakers required to create each language database," says Herbert Tropf, project coordinator at Siemens. "They need to be recruited and recorded, and the resulting data then needs to be transcribed and validated."
In the end, 24 separate language databases were collected, made possible by the wide range of SpeeCon partners. The languages included French, Spanish, Mandarin and Hebrew, some of them in more than one variety, e.g. Austrian and Swiss dialects of German.
"The important addition to previous work is the wide range of data collected, so within each database there will usually be anywhere from four to six dialects and a range of age groups (e.g. 30 per cent of the data collected were from under-15-year-olds)," says Tropf. Each speaker had to repeat several hundred words that were a mixture of application-specific data (e.g. "load", "play"), general phrases (like date and time) and other phonetically rich words.
Everyone's talking: market analysis
The project also focused on the market for voice-driven interfaces. The analysis looked at six market segments: mobile phones, information kiosks, audio/video devices, automotive devices, toys, and Personal Digital Assistants (PDAs). Although all segments have grown rapidly, SpeeCon identified the greatest growth in cars and mobile phones. The research also identified the need for speech recognition technology to handle a variety of different environments, and to cope with differences between sexes, dialects and age groups across the globe.
Other results revealed that SDIs will be one of the key future features of the huge consumer electronics industry. They are considered easier to use, especially for non-technical users. And, says Tropf: "It is widely accepted that SDIs will make everyday life safer in many ways: they offer car drivers the ability to operate radios and navigation devices without taking their hands off the steering wheel."
One of the greatest challenges for the team was adaptation. This refers to the ability to use the recorded voice data to command interfaces in environments with different acoustic qualities. An interface that recognised in-car speech, for instance, would not typically recognise the speech associated with open or office environments. However, using the SpeeCon data, the project team developed algorithms that coupled the raw speech data with environmental data, so that the data can be used in new acoustic situations.
This massively broadens the range of applications in which the speech data can be used. For example, it sets researchers on the path to developing new algorithms for dynamic speech recognition whatever the acoustic conditions, which is ideal for controlling mobile devices, one of the areas where the market analysis reveals enormous growth potential.
Real-world applications
Three SpeeCon consortium partners developed demonstrator applications, operating with different languages and in different environmental conditions, to illustrate that the collected data could work in real-life applications. Philips demonstrated a voice-driven CD player and mobile phone that can be used in a car or other environment. The user is able to operate the main functions of the CD player and some functions of the telephone by voice. Sony, on the other hand, took a voice-driven toy, the AIBO (an artificial-intelligence pet dog), and demonstrated command recognition in Spanish and in Polish using data from the SpeeCon-created language databases. And IBM has just announced the full-scale launch of its speech-driven in-car navigation system in partnership with Honda.
SMEs have benefited from SpeeCon by being given commercial access to the SpeeCon speech databases via the European Language Resources Association (ELRA). This allows SMEs to play an active role in the market for speech-driven interfaces for consumer applications, stimulating the market with innovative ideas and products.
SpeeCon's work is a significant landmark for speech recognition research: it has collated a massive amount of data to enable SDIs for consumer devices across the EU, and has demonstrated the possibility of speech recognition across a wide range of acoustic environments. The SpeeCon team also envisages other spin-offs from the research: products that will enable speaker identification, and multilingual speech understanding and translation systems.
Contact:
Herbert Tropf
Siemens AG
CT IC 5 SP
Otto-Hahn-Ring 6
D 81739 Munich
Germany
Tel: +49-89-63644195
Email: herbert.tropf@siemens.com
Source: Based on information from SpeeCon
Published by the IST Results service, which gives you online news and analysis on the emerging results from Information Society Technologies research. The service reports on prototype products and services ready for commercialisation, as well as work in progress and interim results with significant potential for exploitation.
Countries
Germany