CORDIS - EU research results

Unbabel: Scalable, affordable and seamless content globalization using distributed crowd-post editing

Periodic Reporting for period 1 - Unbabel (Unbabel: Scalable, affordable and seamless content globalization using distributed crowd-post editing)

Reporting period: 2015-03-01 to 2015-06-30

Unbabel has developed an innovative translation pipeline that combines Statistical Machine Translation (MT) and distributed crowd post-editing. Unbabel’s approach has a strong emphasis on improving the current state-of-the-art technology to enable fast, scalable and cost-effective human quality translations.
The approach is based on two key insights:
1. A large percentage of mistakes in MT stem from small errors that are easy for humans to fix.
2. In order to produce consistent quality, we need to have multiple people working in sequence, each correcting the previous person’s work.

The use of post-editing and of several computer-aided translation tools embedded directly in the translation UI significantly reduces the human effort required to produce a quality translation, and thus greatly reduces the cost of translation. Furthermore, by using statistical MT engines, Unbabel generates post-edited data that is then used to update the MT engine. This leads to higher-quality MT output that requires less post-editing effort from the human editors, creating a virtuous cycle. Classical post-editing has been shown to increase translation speed by 20%-40% while also increasing translation quality. Adding parallelism to the translation process greatly increases its scalability and, to the best of our knowledge, this is the first time it has been done.
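The sequential post-editing idea described above can be illustrated with a minimal sketch. It is not Unbabel's implementation: the function names, the toy MT engine and the toy editors are all hypothetical stand-ins, and real editors would of course be people working in a translation UI.

```python
from typing import Callable, List, Tuple

# An editor takes the source text and the current draft and returns a corrected draft.
Editor = Callable[[str, str], str]


def translate_with_post_editing(
    source: str,
    machine_translate: Callable[[str], str],
    editors: List[Editor],
) -> Tuple[str, List[str]]:
    """Run MT once, then let each editor correct the previous editor's output."""
    draft = machine_translate(source)
    history = [draft]
    for edit in editors:
        draft = edit(source, draft)
        history.append(draft)
    return draft, history


# Hypothetical toy stand-ins: an "MT engine" that makes two small errors,
# and two editors who each fix one of them.
def toy_mt(source: str) -> str:
    return "helo wrld"


def fix_first(source: str, draft: str) -> str:
    return draft.replace("helo", "hello")


def fix_second(source: str, draft: str) -> str:
    return draft.replace("wrld", "world")


final, history = translate_with_post_editing("olá mundo", toy_mt, [fix_first, fix_second])

# The (source, final) pair is the kind of post-edited data that could be
# fed back to update the MT engine in the cycle described above.
training_pair = ("olá mundo", final)
```

Keeping the full history of drafts, rather than only the final text, is one possible design choice: it would make it possible to see which errors each editing pass removed.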

Unbabel’s business strategy builds on existing cloud services and web application platforms, such as Zendesk, MailChimp or WordPress. With this approach, Unbabel is able to address huge user communities at once, instead of trying to reach a huge number of individual users spread around the world. To this end, Unbabel provides an API that allows Unbabel services to be integrated with online applications already used by companies, making translation a transparent part of the process. This increases the ability of companies, especially SMEs, to compete in international markets, effectively levelling the language playing field.

Two objectives have been identified as crucial for the next step before the start of the SME Instrument Phase 1 project:
1. To accelerate growth by integrating the Unbabel translation services with the most promising web/cloud platforms addressing users such as SMEs, which are the main target for our services; and
2. To accelerate the growth and raise the standard of the editor community in order to improve the quality and availability of our services. A critical issue in this context is the quality of the user interfaces. To attract editors, the interfaces must be attractive, and to make the post-editing process more efficient, the user interface itself must allow for efficient use.

Unbabel performed two studies to support the team in pursuing these goals. The first study is a market survey aimed at identifying the most promising cloud/web services as channels for company growth. The second study is aimed at finding the best solution for the post-editing user interface. These studies were carried out in the course of the project and their results were integrated into a broader feasibility study evaluating the company’s business case and planning the next steps in business development.

Market Survey

The main objective of the Market Survey was to identify the most promising cloud services as main channels for Unbabel’s translation services. To this end, Unbabel analysed the market potential of each channel and the associated risk and effort of integrating the translation platform with that channel. Finally, Unbabel performed a trade-off, also considering strategic aspects, and decided on the channels to integrate first.

Before this study, the market potential, the user segmentation and the characteristics of the translation process required to meet the demand of the various integrations were not yet well understood. Through an extensive market study of interesting integrations, Unbabel has learned much about the market it intends to enter. By first listing possible integrations by their strategic and financial potential, we came up with a short list of 50 potential integrations in the areas of E-commerce, CMS, Sales/CRM, Social Media, Marketing, Customer support, Websites, E-mail, Self Publishing, Collaboration Software, File sharing, Sharing Economy and Subtitling. After a first filtering based on data availability and market importance, 30 services were considered for further analysis. For these 30 potential integrations we conducted market analyses to look into the attractiveness and dynamics of the respective integration. In particular, we considered user data and international users, estimated the approximate volume of words to be translated, and estimated how many of those customers we would be able to acquire, to get an idea of the size of the market we can capture.
Our qualitative assessment of the 30 integrations focused mostly on discovering the product’s growth potential, ease of finding customers, type of customers, pricing structure, availability of recurring dynamic content, presence of competing offers and worldwide presence. We categorized the integrations on these aspects. After an elimination process, a list of 13 potential integrations remained. These were prioritized on prior customer requests, enterprise tools, direct competitors that are already clients of Unbabel (e.g. Zendesk, MailChimp) and presence of competition. If any competing translation service was available, the integration was deprioritized in a ranking that was initially based on revenue per month and the other prioritization topics (e.g. prior customer requests), with the weight determined by the strength of the competition as assessed by Unbabel. After ranking these on total revenue and average revenue per user, we discussed the strategic intention of each of them. We also took into account the ease of building each integration. As a preliminary result, we selected Google Drive and Facebook. These are the two integrations that scored very well overall and that we believe make the most sense to develop.
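A ranking of this kind could be sketched as follows. This is only an illustration under stated assumptions: the report does not give the scoring formula, so the multiplicative form, the field names, the bonus per customer request and all numbers below are hypothetical placeholders, not data from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Integration:
    name: str
    monthly_revenue: float   # estimated revenue per month (placeholder units)
    customer_requests: int   # number of prior customer requests
    competition: float       # 0.0 = no competing service, 1.0 = very strong competition
    ease_of_build: float     # 0.0 = very hard to build, 1.0 = very easy


def priority_score(item: Integration, request_bonus: float = 0.05) -> float:
    """Revenue-based score, raised by customer requests, lowered by competition and build effort."""
    base = item.monthly_revenue * (1 + request_bonus * item.customer_requests)
    return base * (1 - item.competition) * item.ease_of_build


def rank(candidates: List[Integration]) -> List[Integration]:
    """Return candidates ordered from highest to lowest priority."""
    return sorted(candidates, key=priority_score, reverse=True)


# Hypothetical candidates with made-up numbers, for illustration only.
candidates = [
    Integration("platform_b", monthly_revenue=200, customer_requests=0, competition=0.5, ease_of_build=1.0),
    Integration("platform_c", monthly_revenue=80, customer_requests=0, competition=0.0, ease_of_build=0.5),
    Integration("platform_a", monthly_revenue=100, customer_requests=4, competition=0.0, ease_of_build=1.0),
]

ranked_names = [item.name for item in rank(candidates)]
```

In this toy example the candidate with the highest raw revenue does not come first, because the competition weight halves its score. The strategic discussion described in the text would still happen outside any such formula.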

The Unbabel integration for Google Drive will enable enterprise users to easily request the translation of whole files from within their workspace, thus overcoming language barriers. This tool will enable international managers to easily translate marketing material, manuals, customer-facing presentations, etc. Unbabel for Google Drive will be an enterprise-grade solution that will allow the operations team to manage the translation budget used by the employees. The tool will also be used to translate documents used internally in multinational companies, reducing internal communication difficulties caused by language. Finally, some companies will use it to translate content into other languages to reach new clients with their products. Unbabel has already received several requests from customers to create a cloud storage tool/add-on that would work just like any other Google Drive add-on. Some current customers already upload files into a specific “Unbabel translation” folder on a cloud storage service, which they share with Unbabel so that the files are translated quickly and without much additional effort and placed back in the folder. This led us to believe that an add-on in Google Drive would have considerable value for our customers. For Google Drive, the platform for which the add-on will be built, the integration can only be beneficial, as a useful add-on could drive consumers to the platform.

The second integration is directed towards the global social networking service Facebook. Increasingly, companies use Facebook to target and communicate with consumers, by creating Facebook pages around their brands. The most global and international brands with large marketing teams maintain multiple such pages, each in one language.
A user would find a localization page under the settings of their Facebook account. There the user has the option to request professional translation of selected posts from Unbabel. In order to do so, the user has to create an Unbabel account. If the user already has an Unbabel account, it is just a sign-in. After that, whenever the user goes to the localization page under Facebook settings, they can opt to translate all or several posts into a requested language. When a post is translated, the user has the option on the localization page to post the translated message directly or to select several to be posted manually. Besides placing orders, users can see their current Unbabel account balance and the progress of their translation requests.

UX Study
Unbabel’s approach has a strong emphasis on improving the current state-of-the-art machine translation technology to enable fast, scalable and cost-effective human quality translations. Most efforts on computer aided translation (CAT) tools have focused on web interfaces, and as far as we know there are no good solutions for post-editing on cell phones, where the size of the device is a major restriction. It is Unbabel’s strong conviction that, to achieve the scale and speed at which it aims, editors have to be able to work on their cell phones. This will dramatically increase the number of editors that are available at any given time to perform a task. Moreover, notifying editors becomes much easier once we move to the mobile world. With this in mind, Unbabel has released a mobile app to editors. Something that became clear from the start was that text editing on a mobile device is a cumbersome and error-prone task.
It is Unbabel’s goal to innovate the mobile translation experience by bringing together interactive MT, namely the selection of translation options, and an efficient mobile interface. This will change the paradigm from text editing to option selection, making it much easier for an editor to correct a given translation. The goal is to make translation on mobile devices (smartphones, tablets and laptops) a viable option.
The focus of the UX study was to find an efficient mobile user interface that could at the same time increase the speed and quality of the produced translations in this context.
The first step of this study was to develop a set of wireframes with different interaction patterns. These wireframes were a joint effort between Unbabel’s design team and Natural Language Processing team. The wireframes were showcased to a panel and their major features were debated. Technical, UI and UX aspects were taken into account to decide which implementations to take forward. Three versions were selected after this stage, and we invited 5 users from our editor community to come to our office. Each of these users was asked to perform 10 tasks with each of the interfaces. After that, users were asked to rate their experience from 1 to 5 regarding different criteria following the System Usability Scale. Finally, the editors were interviewed and asked about their overall experience with the app: what was unexpected, what got them stuck, and what they thought when confronted with certain tasks.
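The report does not say how the ratings were aggregated. For reference, the standard System Usability Scale scoring for its ten statements rated 1 to 5 can be sketched as below; whether the study used exactly this questionnaire and scoring is an assumption, and the function name is illustrative.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Standard SUS scoring: ten ratings from 1 to 5 mapped to a 0-100 score.

    Odd-numbered statements are positively worded and contribute (rating - 1);
    even-numbered statements are negatively worded and contribute (5 - rating).
    The sum of contributions (0-40) is multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")

    total = 0
    for position, rating in enumerate(responses, start=1):
        if position % 2 == 1:
            total += rating - 1
        else:
            total += 5 - rating
    return total * 2.5


# Example: a neutral rating of 3 on every statement gives the midpoint of the scale.
neutral = sus_score([3] * 10)
```

With several users per interface, the per-user scores would typically be averaged for each interface before comparing them.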
We implemented the changes suggested by the results of this round and invited 10 iOS users from our community to install our beta app and to do 10 tasks on each interface. We then asked them to fill in a questionnaire about their experience.
The first and most important conclusion from this study is that an interface combining option selection and an efficient editing mode does indeed improve both the speed and the quality of the resulting translations. This clearly shows that the UX study was a success and that the suggested novel mobile interfaces are the way to go for Unbabel. Overall, the users were very satisfied with all interfaces: they agree that the learning curve is flat, that the system is easy to use and that they would use the system frequently. The study provides insightful information. Users are in fact able to produce higher-quality translations faster on a mobile device, provided they are given the right solution and a set of meaningful options to choose from.
Both activities had very clear results that point us to next steps to accelerate the company's growth. During the first activity, two channels, Google Drive and Facebook, were identified that should be integrated with the Unbabel API. The second activity resulted in the selection of two main interfaces to be distributed and disseminated in the editor community with promising features improving translation speed and quality. Unbabel will now start the integration with Google Drive and Facebook and focus on these channels in customer acquisition. Unbabel will complete the implementations of interfaces and start to promote their use in the editor community.
Besides the very concrete results, the studies also point to other potential for improvement and to areas that deserve further investigation. These results are:
· Unbabel should focus more on client relationship management, proposing holistic solutions to customers that, so far, use Unbabel services only for a subset of their own services. With this approach it is much easier and more cost-efficient to tailor solutions to the specific needs of customers, improving customer satisfaction and making services even more affordable to end users.
· Concerning the integration of new channels, Unbabel should rely more on the user community to extend the technical platform through which the translation services are available. There is a huge number of platforms available and Unbabel will not be able to address them all in the near future. This has already influenced the decision to go for the platforms with the greatest user bases. Instead of providing integrations for smaller platforms in the future, Unbabel will rather support clients in developing their own integrations, whether proprietary or, even better, open source. The positive experience with building the editor community is an inspiration to also foster a community for the Unbabel API. To this end, Unbabel must improve the API to make it more user-friendly and, even more importantly, must stimulate and support the growth of the community.
· In this context, a complete automation of the process is of growing importance.
· The complete pipeline includes machine quality assessment to decide if a result needs further editing or a human consistency check. A lot of attention is paid to those aspects in the research community. Unbabel will exploit and improve those results by adopting them in practice.
· As mentioned, there is still room for improvement on the mobile editor interfaces. Unbabel will study analytics data from the editor community to optimize the interfaces even further to get the maximum of speed and quality out of these interfaces.
· Even more important, however, is the integration of the MT engine with the editor perspective. Until now, these aspects were handled separately by the research community and by us. However, from the use editors make of available translation options, we can learn how to provide options that help editors to reach excellent human quality translation quickly, instead of providing a translation that is “the best” from a merely theoretical perspective.
· Finally, automatic error detection mechanisms should be implemented to alert editors to potential improvements in their work. One of the unexpected results of the UX study is that users often do not correct minor errors if these are not pointed out to them. A simple indicator pointing to a potential error (similar to those found in modern word processors such as MS Word) therefore has the potential to significantly improve translation quality.
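The machine quality assessment step mentioned in the list above, which decides whether a result needs further editing or a human consistency check, could be sketched as a simple threshold rule. The score range, the threshold values and the route names are assumptions made for illustration; the report does not specify them.

```python
def route_translation(
    quality_score: float,
    edit_threshold: float = 0.6,
    check_threshold: float = 0.9,
) -> str:
    """Choose the next pipeline step from an estimated quality score in [0, 1].

    Thresholds are hypothetical: below edit_threshold the draft goes back for
    further post-editing, below check_threshold it gets a human consistency
    check, and otherwise it is delivered.
    """
    if not 0.0 <= quality_score <= 1.0:
        raise ValueError("quality_score must be between 0 and 1")
    if quality_score < edit_threshold:
        return "post_edit"
    if quality_score < check_threshold:
        return "consistency_check"
    return "deliver"


# Example routing decisions for three hypothetical quality estimates.
decisions = [route_translation(score) for score in (0.3, 0.75, 0.95)]
```

In practice the thresholds would need to be tuned against human judgements of translation quality rather than fixed in advance.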
UX Study Interface D
UX Study Interface F
UX Study Interface C