Some readers would tell you that the best texts or books are those that provide the most immersive experience. The MUSE project is taking this idea to a whole other level by developing a translation system capable of converting texts into 3D virtual worlds.
As exciting as they can be, the facts discussed in, let's say, history books are not always easy to grasp, especially for young children, and reading through hundreds of pages of dense text can be wearisome, to say the least. Of course, the tremendous success of history-based video game franchises shows that the interest is there, but it also demonstrates that interactivity can sometimes be a much better way to convey stories in an appealing and memorable manner.
To this day, however, the path from book to video game is long and expensive. But what if, in the near future, computers could understand a text and automatically convert it into characters, situations, actions and objects depicted in a 3D virtual world, turning passive readers into active participants in a story? This is the exciting promise of the EU-funded MUSE (Machine Understanding for interactive StorytElling) project, which aims to bring texts to life by developing an innovative text-to-virtual-world translation system.
Over the past two-and-a-half years, the team has been evaluating the technology in two scenarios: children's stories and patient education materials. The project website notably features a demonstration video of the conversion of a patient manual into a video game in which readers can walk around a hospital, familiarise themselves with the admission process and better understand the treatments they will receive.
This is the first major step towards the commercialisation of such a groundbreaking technology. MUSE could have a tremendous impact on sectors such as the video game industry, which could take advantage of its natural language processing methods to simplify development, or schools, which could use it to make their teaching programmes more effective and engaging.
Prof. Dr Marie-Francine Moens from KU Leuven, who coordinates the project, sheds light on the way the technology will operate, as well as the team’s plans for bringing it closer to market.
How did you come up with the MUSE concept?
For a number of years, I had the idea that people or students in a learning environment, when accessing information, should benefit from a more lively experience. Hence the idea of automatically turning text into actions and situations taking place in a virtual world. In such a world, the user could eventually become part of the story. For instance, instead of reading or studying a rather dry historical text, the student could become one of the actors in a scene in which Napoleon was signing a treaty. Such an environment would stimulate the understanding of the text and the memorisation of its content. The MUSE project does not go that far, but it lays the foundations for such a technology.
How would the conversion work exactly?
The idea is to translate actions, actors and objects recognised in a text into visuals. We have developed advanced natural language processing components for the semantic processing of the texts. They include the recognition of semantic roles in sentences (i.e. who does what, where, when and how), spatial relations between objects (where an object or person is located) and the chronology of events.
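To make the idea of semantic roles concrete, the deliberately naive sketch below shows the kind of "who does what, where, when" frame such components produce for a simple sentence. It is not the MUSE pipeline (which uses statistical models trained on annotated corpora); the regex pattern and the `extract_frame` helper are invented purely for illustration.

```python
import re

# Toy pattern for simple "subject verb object [in place] [in year]"
# sentences; a real semantic role labeller is far more general.
PATTERN = re.compile(
    r"(?P<who>\w+) (?P<does_what>\w+) (?P<what>.+?)"
    r"(?: in (?P<where>[A-Za-z ]+))?(?: in (?P<when>\d{4}))?\.?$"
)

def extract_frame(sentence):
    """Return the semantic-role frame of a toy sentence, or None."""
    match = PATTERN.match(sentence)
    return match.groupdict() if match else None

frame = extract_frame("Napoleon signed the treaty in Tilsit in 1807.")
print(frame)
# {'who': 'Napoleon', 'does_what': 'signed', 'what': 'the treaty',
#  'where': 'Tilsit', 'when': '1807'}
```

A downstream visualisation component could then map such a frame onto a character, an animation and a scene in the virtual world.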
Here we follow standard linguistic semantic annotations, as they have been well studied in the past and come with annotated datasets for training our recognition algorithms. Because these recognitions are often uncertain, and because understanding natural language utterances often requires background information that is left implicit in the text, we have developed a Bayesian network framework to find the most probable interpretation of a sentence in light of the evidence obtained from the text itself and from background knowledge.
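The following toy example illustrates the general Bayesian idea behind this kind of disambiguation: weigh each candidate reading by a prior from background knowledge and by how well it explains the words observed in the text, then pick the most probable one. The readings, priors and likelihoods are invented for the example; MUSE's actual networks are, of course, far richer.

```python
# Hypothetical sketch of Bayesian disambiguation: P(reading | evidence)
# is proportional to P(reading) * P(evidence | reading).

def most_probable_reading(priors, likelihoods, evidence):
    """Return the reading maximising the posterior, plus the posterior.

    priors:      P(reading), from background knowledge
    likelihoods: P(context word | reading)
    evidence:    context words observed in the text
    """
    scores = {}
    for reading, prior in priors.items():
        score = prior
        for word in evidence:
            # Small default probability smooths over unseen words
            score *= likelihoods[reading].get(word, 0.01)
        scores[reading] = score
    total = sum(scores.values())
    posterior = {r: s / total for r, s in scores.items()}
    return max(posterior, key=posterior.get), posterior

# Ambiguity: does "ward" denote a hospital room or a person under care?
priors = {"hospital_room": 0.6, "person": 0.4}
likelihoods = {
    "hospital_room": {"nurse": 0.3, "bed": 0.2},
    "person": {"nurse": 0.05, "bed": 0.05},
}
reading, posterior = most_probable_reading(
    priors, likelihoods, evidence=["nurse", "bed"])
print(reading)  # hospital_room
```

Seeing "nurse" and "bed" nearby in the text shifts the posterior strongly towards the hospital-room reading, which is the interpretation the visualisation would then render.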
Which core markets are you targeting with this technology?
We primarily target the game industry and publishers who provide e-learning tools. When creating virtual worlds, these industries currently rely heavily on hand-crafted knowledge. Automatically translating natural language utterances into instructions in a graphical world would remove this bottleneck.
In the case of children’s stories, what are the benefits of your technology?
The MUSE tool can help children as they learn to read, with visuals adapted to each child's reading level. It can help children learn to make inferences while reading and ultimately understand a text better. In addition, it could serve as an assistive tool for text understanding, memorisation and cross-linking (for instance, when studying a text about science or biology). Right now, we are evaluating the use of the visualisations with children.
The project will be ending soon. Are you happy with the results so far?
Overall, I am happy with the outcomes. I am especially pleased with the results of our natural language understanding research. We have managed to advance the state of the art in this challenging field, and several publications are in the pipeline. Natural language understanding is very important for a huge number of applications, but it still needs fundamental research, especially with regard to the automatic acquisition of perceptual background or world knowledge.
MUSE has given us valuable insights, which will translate into follow-up research. Since the goal of MUSE is to map natural language onto representation standards of the computer graphics world, a partner with expertise in the programming languages designed to steer visualisations in computer graphics could have strengthened the consortium, but such expertise was hard to find.
When do you expect the MUSE technology to be commercialised? Have you been in touch with potential partners already?
The language technologies will be used in a Belgian spin-off named SmartSpoken, which is currently being set up. There are already talks with the Belgian gaming company Fishing Cactus.
Do you have any follow-up plans after the end of the project?
Yes. A very interesting new field of research in natural language understanding is multimodal representation learning (based on neural network technology, vector models, probabilistic graphical models, etc.), in which textual and visual data together help acquire and capture background and world knowledge. This technology benefits both natural language understanding and computer vision.
We have applied for several research projects on this topic at both national and European level, and we hope that they will be granted support.
For further information, please visit: