Speech recognition technology, specially designed for highly variable speech (such as that spoken by persons with dysarthria)
OLP Speech Recognizer (GRIFOS) is a speaker-dependent, small-vocabulary, automatic speech recognition system. Training data can be gathered during sessions, supplementing existing databases of material from similar client cases. In early sessions, GRIFOS serves to set appropriate thresholds for the acceptability of a client's speech productions and to analyse these productions quantitatively in syllable and word context during speech therapy. In later stages, it primarily serves to evaluate client productions in continuous speech, with training material taken during the sessions. For clients with severe articulation problems (e.g., dysarthria) unlikely to be fully resolved by therapy, GRIFOS will help stabilise production by providing feedback to increase production consistency rather than intelligibility per se. Together, these aspects will help to increase (or even provide) adequate communication abilities and independence for the users of the system. GRIFOS is a key feature of the OLP system, providing an innovative therapy tool not included in other commercial systems. Speech recognition will attract a number of potential users, from patients to therapists, who are interested in automatic, real-time feedback.
This software, known as STAPTk, provides real-time visual feedback on speech articulation by means of a kinematic map. A window is set up with target areas corresponding to particular sounds, e.g. 'ah', 's'. As the user speaks, a sprite moves around the window: the closer the acoustics are to a target sound, the closer the sprite moves to that target. The underlying technology is neural nets trained to map from acoustics to an x,y position. The maps may be trained on any data, in particular data recorded by a single speaker. STAPTk is used by speech therapists to train clients to improve their articulation. In the OLP project it is linked to the OLPy software, which provides other tools and facilities for therapists. STAPTk is of potential use outside the project for a range of applications which involve practising speech, e.g. second language learning. STAPTk is the only software package of this kind that we know of. STAPTk is at the prototype stage.
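The trained networks themselves are not described here, so the following is only an illustrative stand-in for the behaviour described above, not the STAPTk method. It assumes that some model has already produced a non-negative similarity score for each target sound, and places the sprite at the score-weighted average of the target positions, so that a higher score for a sound pulls the sprite towards that sound's target. The function name `sprite_position`, the score dictionary and the target coordinates are all hypothetical.

```python
def sprite_position(scores, targets):
    """Place the sprite at the score-weighted average of the target positions.

    scores:  dict mapping a sound label to a non-negative similarity score
             (hypothetical output of some acoustic model)
    targets: dict mapping the same labels to (x, y) window positions
    """
    total = sum(scores.values())
    if total == 0:
        raise ValueError("at least one score must be positive")
    # Each target attracts the sprite in proportion to its score.
    x = sum(scores[s] * targets[s][0] for s in scores) / total
    y = sum(scores[s] * targets[s][1] for s in scores) / total
    return x, y
```

With two targets, a production scored entirely as one sound lands on that target, and mixed scores land in between, closer to the better-matching sound.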
The Lemon and Lime Library provides a wealth of material for use with clients with articulation disorders, including an articulation screening test, words systematically selected according to linguistic and phonetic criteria, fun sentences and tongue twisters. The database was initially produced for use with the OLP program but has been exploited as a useful resource in its own right. The large amount of material can be stored on a CD database and accessed quickly by filling in the required phonetic criteria on a colourful and simple screen. This is quicker and easier than looking words up in a book and photocopying them for speech practice. The same screen allows you to design worksheets with words, pictures and instructions for each individual, so that practice material is patient-specific and looks professionally produced at the same time. This versatile resource will be invaluable to speech and language therapists, teachers and students beginning to get to grips with phonetics! A contract has been signed with Speechmark Ltd, a commercial company that produces resources for speech and language therapists and teachers. The Lemon and Lime Library is therefore already in press and will be commercially available in the 2006 Speechmark catalogue.
Library of Greek and Swedish phonemes, words and sentences to be used for speech therapy. The aim of speech therapy is to teach new speech patterns, increase the intelligibility of speech, establish automaticity and transfer skills to untrained situations. The OLP word-list (library) was needed to adequately sample the full range of expected expressions and to allow for automatic evaluation support and therapy design for the client groups. When constructing the library, two aspects were considered. The material had to be large enough to represent the language as fully as possible. On the other hand, the spoken material should not be too long, especially in the case of clients with speech difficulties. Therefore the same word was used for different phonemes and for different positions in order to reduce the number of words. The Greek and Swedish libraries were constructed in such a way as to cover all possible deviations. This means that the libraries are language-dependent, systematically constructed and based on the phonetics and phonology of the Greek and Swedish languages respectively. The selection criteria for the construction of the Greek and Swedish libraries were that all vowels should occur in mono- and polysyllabic words and that all consonants should occur in all possible positions (initial, medial, final), followed by both rounded and unrounded vowels (if applicable). At least three examples of each condition were included.
Content of the Greek library to be used for speech therapy:
- Isolated speech sounds (26 consonants, 5 vowels)
- CV, CCV and VC, VCC syllables (128)
- 2-syllable words with the target phoneme in initial position (129)
- 3-syllable words with the target phoneme in initial position (128)
- 2-syllable words with the target phoneme in medial position (142)
- 3-syllable words with the target phoneme in medial position (119)
- 2-syllable words with the target phoneme in final position (35)
- 3-syllable words with the target phoneme in final position (35)
- Clusters in initial position (92)
- Phrases (91)
- Sentences (50)

Content of the Swedish library to be used for speech therapy:
- Isolated speech sounds (18 consonants, 22 long and short vowels)
- CV, CCV and VC, VCC syllables (339)
- Rate drills repeating the same consonant (18)
- Rate drills repeating alternating consonant (12)
- Monosyllabic words and clusters in initial and final position (947)
- Polysyllabic words: 2 syllables and 3-4 syllables (247, 257)
- Phrases and short sentences (109)
Remote data exchange and synchronisation utility integrated with the local OLP data management system & database
Using the remote exchange component, therapists can monitor client practice through the Internet, using an advanced system to assign exercises and examine the results. During regular sessions, the therapist can focus on the initial acquisition of correct articulation, and then assign additional intensive practice as homework. Thus the number of clients served by one therapist can be increased without sacrificing control or effectiveness of therapy. As regards the marketing strategy, the product will be offered in combination with paid training and support services. The pricing scheme will be based on user licenses, with a lower cost per customer when more licenses are ordered. The OLP access-to-service provision points will be located in the four countries of the members of the consortium (Denmark, France, U.K., and Greece), and the European market will be segmented into regions for better coverage, though the main aim is to support business partnerships and synergies that go beyond the conventional sales “geopolitics”. After the first year of OLP business operations, the product will also be available to local IT specialists and consultancies, who will sell and install the software at their customers' sites as part of a business partnership scheme that will be promoted. Intensive promotional activity is also planned.
Complete OLP speech therapy support system (software) including interfaces, data management, administration, speech recognition technology, phonetic mapping technology
The main OLP result is a software system for speech therapy support in cases of articulatory disorders. Using this software, the therapist can bring computers to bear on speech therapy in three ways:
- OLP includes "phonetic maps" on which target speech sounds are displayed. Clients' speech productions are plotted on the phonetic map in real time, along with the targets, for direct comparison. Thus, the display shows the degree of articulatory success and contributes to the gradual attainment of correct articulation. Maps can be constructed and customized to serve the needs of each particular client.
- OLP automatically recognises spoken words and evaluates each production in comparison to a target word set by the therapist. The classification into "right" and "wrong" leads to a gradually increasing success rate. The target word can be redefined to facilitate generalisation and automatisation of speech production.
- OLP helps control phonation by visually displaying the loudness and pitch of the client's voice. Exercises of increasing difficulty lead gradually to voice control.
OLP technologies are supplemented by a filing and scheduling system. The speech therapist keeps a client record and defines an exercise schedule for each client. The speech therapist selects and designs the practice exercises that best suit the individual needs of both therapist and client. Each exercise contains the necessary steps of data collection, configuration, and repeated practice. OLP is fully flexible and adjustable, and the library of predefined exercises that comes with the software is automatically extended with the therapist's modifications and new exercises. Children practise their speech using graphical screens, as in video games. They receive feedback on their performance through amusing animations. For adults, exercises without graphics are provided, in which progress is indicated numerically.
The speech therapist monitors each client's progress by studying the graphical results of training. In this way, weaknesses are easily identified and use of the technologies can be modified to serve the therapy objectives more effectively and efficiently. In OLP, the therapist can monitor client practice through the Internet, using an advanced system to assign exercises and examine the results. During regular sessions, the therapist can focus on the initial acquisition of correct articulation, and then assign additional intensive practice as homework. Thus the number of clients served by one therapist can be increased without sacrificing control or effectiveness of therapy.
It is usual for speech recordings to contain occasional short periods of silence, background noise or non-speech sound. If such recordings are to be used for training automatic speech recognition (ASR) systems such as OPTACIA phonetic maps, it is necessary to exclude these unwanted non-speech acoustic artefacts by identifying precisely where they occur within the recording. The technique for locating the start and end points of speech and/or non-speech sounds is known as endpointing or segmenting. It is also important to associate the target speech sounds with items in the ASR application’s vocabulary. This process, known as labelling, is often combined with endpointing. Typically the speech sound’s start and end points are specified (in some unit of time) along with the identifying symbol, e.g. 0.20 0.50 a (indicating that the speech sound, the vowel a, started 0.2 seconds into the signal and ended 0.3 seconds later, i.e. at 0.5 seconds). Manual endpointing of speech data can be a tedious process. To automate it, the endpointing algorithm assumes that speech sounds contain more energy than background noise when the speaker is near the microphone (e.g. if the microphone is head-mounted). It computes the average background noise energy level at the start of the recording, before any speech is encountered. This energy threshold value is then used to distinguish between speech and non-speech. The application also supports segment separation and separate labelling of single consonant-vowel (CV) clusters. This technology, which has been tested extensively and found to be robust under normal conditions, is currently implemented as part of the STAPTk software, but the source code can easily be modified to make it a stand-alone application, which would be highly useful for endpointing and labelling speech signals featuring isolated words and sub-word units (phones).
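The energy-threshold idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the STAPTk source code: the frame length, the number of leading frames treated as noise (`noise_frames`), and the multiplier applied to the noise level (`factor`) are all hypothetical parameters chosen for the example, as are the function names.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def endpoint(samples, rate, frame_len=160, noise_frames=10, factor=4.0):
    """Return (start_s, end_s) pairs for runs of frames above the threshold.

    Assumes the first `noise_frames` frames contain background noise only;
    the threshold is the average noise energy times `factor` (an assumption).
    """
    energies = frame_energies(samples, frame_len)
    noise = sum(energies[:noise_frames]) / noise_frames
    threshold = noise * factor
    segments, start = [], None
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i  # first frame of a speech segment
        elif energy <= threshold and start is not None:
            segments.append((start * frame_len / rate, i * frame_len / rate))
            start = None
    if start is not None:  # speech runs to the end of the recording
        end = len(energies) * frame_len / rate
        segments.append((start * frame_len / rate, end))
    return segments


def label_line(start_s, end_s, symbol):
    """Format one label in the 'start end symbol' style, times in seconds."""
    return f"{start_s:.2f} {end_s:.2f} {symbol}"
```

For a recording in which a loud vowel occupies the interval from 0.2 s to 0.5 s and the rest is low-level noise, `endpoint` returns one segment for that interval, and `label_line(0.2, 0.5, "a")` gives `0.20 0.50 a`, matching the label format quoted in the text. A fixed multiple of the initial noise level is the simplest possible decision rule; a practical tool would probably also smooth the energies and discard very short segments.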