Periodic Reporting for period 2 - capito (Making information understandable for everyone)
Reporting period: 2023-03-01 to 2024-01-31
capito was founded as a role model for combining social impact and entrepreneurship. We specialize in language simplification and accessible information. To date, we have delivered more than 7,000 easy-to-read and plain-language projects in the German-speaking countries. We have built up 16 offices from Zürich to Berlin and trained thousands of people to create accessible, easy-to-understand information.
The problem we tackle is severe: 53% of all adults in the world struggle to understand public and private information because the language level of official information is simply too high.
Hence, with this project, we deliver AI-based automated text simplification. For the first time, we can scale the production of understandable information to a very large number of users.
capito holds a globally leading body of data in the field of text simplification. This allows us to train large language models to the level of adequacy needed to create genuinely barrier-free information.
Over the past 22 years, we have created a catalog of 90 criteria, a strict quality standard, and a TÜV-certified process. These enabled us to simplify thousands of pages to the easy-to-understand language levels A1 (very easy language), A2 (easy-to-read, industry standard) and B1 (plain language) according to the Common European Framework of Reference for Languages (CEFR).
With this data as a first batch, we trained large language models, specifically mBERT transformers, to simplify any information. We also created a large set of rule-based fallbacks because artificial intelligence cannot always get it right.
To create real impact, we designed a highly scalable, language-independent infrastructure to deliver our services as Software as a Service as well as an on-premises solution.
A REST API and apps developed in-house, such as browser add-ons and an MS Word integration, are our delivery channels to editing professionals of any kind. They can choose between two use cases:
Assistance: In this use case, we give real-time feedback to any active writer. The system detects language barriers according to the capito criteria and in many cases immediately suggests easier-to-understand words and phrases. For names and technical terms, which cannot be simplified but must be explained, we have integrated a dedicated service.
Simplifier: In this use case, we simplify text of any given language level to the easy-to-understand language levels A1, A2 and B1. The solution can be combined with the Assistance use case to simplify existing text very accurately in two steps.
Both solutions can be iteratively fine-tuned for customer domains or domains of public interest. In this way, we can make information understandable on such important topics as finance, the climate crisis and democracy.
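The Assistance use case described above can be illustrated with a minimal sketch. It is not capito's implementation: the word list, the sentence-length threshold and all names (EASIER_WORDS, MAX_WORDS_PER_SENTENCE, Barrier, find_barriers) are hypothetical and merely stand in for the capito criteria.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical mini-lexicon (hard word -> easier suggestion); not capito data.
EASIER_WORDS = {
    "utilize": "use",
    "commence": "start",
    "remuneration": "pay",
}

# Illustrative threshold only; not one of the capito criteria.
MAX_WORDS_PER_SENTENCE = 15


@dataclass
class Barrier:
    kind: str                  # "hard_word" or "long_sentence"
    text: str                  # the word or sentence that was flagged
    suggestion: Optional[str]  # easier alternative, if one is known


def find_barriers(text: str) -> List[Barrier]:
    """Flag overly long sentences and words that have an easier alternative."""
    barriers = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z']+", sentence)
        if len(words) > MAX_WORDS_PER_SENTENCE:
            barriers.append(Barrier("long_sentence", sentence, None))
        for word in words:
            easier = EASIER_WORDS.get(word.lower())
            if easier:
                barriers.append(Barrier("hard_word", word, easier))
    return barriers
```

Calling find_barriers on a draft returns the flagged spans together with suggestions, which an editor integration could then show to the writer as they type.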
Having established the sector for German-speaking countries and being the first to master SaaS for text simplification with this EIC project, we are already in phase 2 and are transferring the solution to English, French and Spanish. To this end, we have already created a language-independent infrastructure, gathered and created large training data sets for those languages, and are currently working on our English and French prototypes. Spanish will follow.
We set up a Kubernetes-based, independent, auto-scalable infrastructure that can also be deployed on the premises of public institutions and private companies. A database of lemmas, including all sorts of understandability, semantic, syntactic and morphological information, helps us to back up any deep-learning-based solution.
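To make the idea of such a lemma database more concrete, the following sketch shows one possible record layout and a lookup that returns synonyms at or below a target CEFR level. The actual schema is not described in this report, so every field name, the LemmaEntry class, the easier_synonyms helper and the CEFR levels assigned to the example words are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]


@dataclass
class LemmaEntry:
    # All field names are hypothetical; the real schema is not published here.
    lemma: str
    language: str
    part_of_speech: str                                   # syntactic information
    cefr_level: str                                       # understandability
    synonyms: List[str] = field(default_factory=list)     # semantic information
    inflections: List[str] = field(default_factory=list)  # morphological information


def easier_synonyms(db: Dict[str, LemmaEntry], lemma: str, target_level: str) -> List[str]:
    """Return synonyms of `lemma` whose CEFR level is at or below `target_level`."""
    entry = db.get(lemma)
    if entry is None:
        return []
    limit = CEFR_ORDER.index(target_level)
    return [
        synonym
        for synonym in entry.synonyms
        if synonym in db and CEFR_ORDER.index(db[synonym].cefr_level) <= limit
    ]


# Example data with illustrative (not authoritative) CEFR levels.
EXAMPLE_DB = {
    "remuneration": LemmaEntry("remuneration", "en", "noun", "C1",
                               synonyms=["pay", "compensation"],
                               inflections=["remunerations"]),
    "pay": LemmaEntry("pay", "en", "noun", "A2"),
    "compensation": LemmaEntry("compensation", "en", "noun", "B2"),
}
```

With this example data, easier_synonyms(EXAMPLE_DB, "remuneration", "B1") keeps only "pay". A lookup of this kind does not depend on a model prediction, which is one way such a database could back up a deep-learning-based component.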
For data creation, we transferred our automated labeling solution to all project languages. We are also constantly expanding our rule-based fallbacks, that is, computational-linguistic checks, in English, French and Spanish, so that we can find any mistake our large language models may have made and label third-party data as well as our own output.
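A rule-based check of model output, as described above, might look like the following sketch. The two rules shown (numbers from the source must survive, sentences must stay short) and the names check_simplification and MAX_WORDS are assumptions chosen for illustration, not capito's actual checks.

```python
import re
from typing import List

# Illustrative limit only; not one of the capito criteria.
MAX_WORDS = 12

NUMBER_PATTERN = r"\d+(?:[.,]\d+)?"


def check_simplification(source: str, output: str) -> List[str]:
    """Return a list of issues found in a simplified text; empty means it passed."""
    issues = []

    # Rule 1: every number in the source should still appear in the output.
    missing = sorted(set(re.findall(NUMBER_PATTERN, source))
                     - set(re.findall(NUMBER_PATTERN, output)))
    if missing:
        issues.append("missing numbers: " + ", ".join(missing))

    # Rule 2: simplified sentences should stay below the word limit.
    for sentence in re.split(r"(?<=[.!?])\s+", output.strip()):
        if len(sentence.split()) > MAX_WORDS:
            issues.append("sentence too long: " + sentence)

    return issues
```

An empty result means the output passed both rules. A non-empty result could trigger a fallback or serve as a label, which is also how the same checks could be applied to third-party data.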
For our large language models, we started to create sophisticated training strategies to optimize their ability to simplify complex information. The next step will be to train these transformers in English, French and Spanish.
We created the capito bot, a browser add-on that helps to simplify any information. It integrates with four web browsers (Firefox, Edge, Chrome and Safari) and with various editors such as plain text fields, TinyMCE, CKEditor and ProseMirror. Hence capito can now deliver text simplification in numerous browser-based content management systems such as WordPress, Magnolia, TYPO3, Drupal, Joomla and WooCommerce, and on large social media platforms such as LinkedIn or Facebook. Currently, we are working on the integration into MS Word.
Public agencies, banks, ministries, public and private data centers, museums, large industrial companies, insurers, and several media outlets are either already research partners or on their way to becoming pilot customers. Several successful public presentations of the project helped create this early traction.