Periodic Reporting for period 2 - BITEXT (Building conversational chatbots faster using NLP and machine learning)
Reporting period: 2019-02-01 to 2019-10-31
- Customer care improvement
- Purchase process simplification
- Personalized service
- Resource saving
- User experience improvements
- Improved customer intelligence
1. System specification and design
The System Specification deliverable provides a formal description of the developments to be carried out in the project: the main functional and non-functional elements that will be part of the final system, described at a high level.
The high-level requirements from the System Specification have been detailed at a lower level in the System Design, where specific decisions have been made.
2. Lexical resources
Lexical-morphological dictionaries have been developed for every project language (English, French, Spanish, Italian, German, Dutch, Portuguese, Swedish, Danish).
3. Syntactic resources
Analysis grammars for English, French, Spanish, Italian, German, Dutch, Portuguese, Swedish and Danish have been completed.
Generation grammars for the same languages have also been completed.
The analysis software that applies these syntactic resources is in its final version. Testing and fine-tuning of all grammars have been completed.
4. Semantic resources
An ontology has been developed for every vertical (Home, Media, E-commerce) in English and Spanish. The ontologies define the sets of words that are relevant, in terms of their meaning and role, to the specific use case of a vertical. The ontologies for the remaining languages in the Home vertical have also been completed.
Frame definition files for the same vertical and language combinations have been completed; these files formalize the potential meanings of the sentences.
The NLU software is in its final version. Testing and fine-tuning of the semantic resources and frame files have been completed.
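As a purely illustrative sketch (the actual Bitext frame format is not detailed in this report), a frame can be thought of as an intent plus the slots it accepts, with each slot drawing its values from an ontology word set. All identifiers and values below are hypothetical:

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Frame:
    # Hypothetical frame: an intent name plus slots filled from ontology word sets.
    intent: str
    slots: Dict[str, List[str]]

# Example frame for a made-up E-commerce intent (not taken from the Bitext resources).
ORDER_PRODUCT = Frame(
    intent="order_product",
    slots={
        "product": ["shoes", "t-shirt", "laptop"],
        "quantity": ["one", "two", "three"],
    },
)

def fill_slots(frame: Frame, tokens: List[str]) -> Dict[str, str]:
    """Naive slot filling: match tokens against each slot's word set."""
    filled = {}
    for slot, words in frame.slots.items():
        for token in tokens:
            if token.lower() in words:
                filled[slot] = token
                break
    return filled

if __name__ == "__main__":
    print(fill_slots(ORDER_PRODUCT, "I want two shoes".split()))
    # -> {'product': 'shoes', 'quantity': 'two'}

In the real system the ontologies and frame files cover whole verticals and languages; the point of the sketch is only the relationship between ontology word sets, frames and slots.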
5. Integrations with bots
Endpoints have been developed in the BITEXT API to provide the NLU analysis needed by chatbots. Agents in Dialogflow and Rasa have been deployed. Final versions are available.
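The report does not document the API itself, but as a rough sketch a chatbot back end could call such an NLU endpoint over HTTP and route on the returned intent. The URL, payload fields and response schema below are placeholders, not the actual BITEXT API:

import json
import urllib.request

def analyze(text: str, lang: str = "en") -> dict:
    """Send one user utterance to a hypothetical NLU endpoint and return its analysis."""
    payload = json.dumps({"text": text, "language": lang}).encode("utf-8")
    req = urllib.request.Request(
        "https://api.example.com/nlu/analyze",  # placeholder URL, not the real BITEXT endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def route(analysis: dict) -> str:
    """Pick a bot handler from the detected intent (placeholder routing logic)."""
    return f"handler_for_{analysis.get('intent', 'fallback')}"

A Dialogflow or Rasa agent would typically consume such an analysis through its own fulfillment or pipeline mechanism rather than through this simplified routing function.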
6. Testing
Annotated training and testing corpora have been created. NLU agents for the different combinations of vertical and language have been tested on both platforms and in three versions: a standard version, a version using Query Simplification, and a version using Variants Generation. Tests were performed in two phases, preliminary and final; between the two phases the whole system was refined and fine-tuned, and test results were gathered. The conclusion is that using either Query Simplification or Variants Generation improves on the results obtained with the standard version, and that Variants Generation also reduces the effort required to produce training and testing data sets.
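To illustrate why Variants Generation reduces data-production effort, the toy sketch below expands a seed utterance template into a set of tagged examples by combining slot values. The template, intent label and generation strategy are hypothetical and far simpler than the real system:

from itertools import product

TEMPLATE = "I want to {verb} {article} {product}"
SLOTS = {
    "verb": ["buy", "order", "purchase"],
    "article": ["a", "one"],
    "product": ["t-shirt", "pair of shoes"],
}

def generate_variants(template: str, slots: dict) -> list:
    """Expand every combination of slot values into a tagged training example."""
    keys = list(slots)
    examples = []
    for values in product(*(slots[k] for k in keys)):
        filling = dict(zip(keys, values))
        examples.append({
            "text": template.format(**filling),
            "intent": "order_product",
            "entities": {"product": filling["product"]},
        })
    return examples

print(len(generate_variants(TEMPLATE, SLOTS)))  # 3 * 2 * 2 = 12 generated variants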
7. Communication, IPR & Commercialization
Attendance at various chatbot-related fairs and conferences, together with publications on social networks, increased Bitext's presence in the market. The Bitext web site has been renewed with a focus on STD & NLG. For IPR, the Copyright and Trade Secret measures already taken will be continued. The activities carried out throughout the project led us to a more detailed definition of the initial product. We have defined a business innovation plan (objectives, targets and strategy) with an estimation of costs and revenues. The commercial activities for the coming months will be accompanied by a communication plan.
The project has also revealed a clear opportunity for conversational businesses in Europe, such as e-commerce or customer support. The scarcity of training and testing data is blocking the development of assistant technologies. If all languages in Europe are going to be spoken by chatbots, a different technology paradigm is needed, and we think artificial data is the answer. If AI doesn’t speak European languages, many European citizens will be excluded from technical progress (namely, all those who don’t speak English with a “good enough” accent).
There is currently no automated way of generating such data; instead, it must be produced manually, usually by crowdsourcing examples of typical user utterances and then manually tagging them with the appropriate intents, entities and slots. By contrast, Bitext has developed a process that automatically generates the tagged training data the bot needs, without this manual work. The system has been designed as an economically competitive solution, enabling faster training (from months to weeks) with higher accuracies (from 70% to 90%).
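For context, a tagged training example pairs an utterance with its intent and marks its entities inline, for instance using the "[value](entity)" annotation style of open-source platforms such as Rasa. The examples below are invented purely to show the shape of automatically generated data of this kind:

# Illustrative only: what automatically generated, tagged training data might look like.
# Each example carries an intent label and inline entity annotations in the
# "[value](entity)" style used by open-source NLU platforms such as Rasa.
examples = [
    {"intent": "track_order",
     "text": "where is my [last](order_reference) order"},
    {"intent": "cancel_order",
     "text": "cancel order [12345](order_number)"},
    {"intent": "order_product",
     "text": "I want to buy a [t-shirt](product) in [blue](color)"},
]

# Producing thousands of such examples automatically, instead of crowdsourcing
# and hand-tagging them, is what removes the manual step described above.
for ex in examples:
    print(ex["intent"], "->", ex["text"])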
In our view, the challenge is so critical, and after this project the solution so feasible, that we want to encourage EU authorities to promote the creation of a think tank to work on the idea of a “European Alexa”, so that all European citizens and languages are safely included in this technology trend and we can shop or use self-service tools in our own languages.