Data Science for Tapoi

Periodic Reporting for period 1 - DataSci4Tapoi (Data Science for Tapoi)

Reporting period: 2017-09-01 to 2018-08-31

In today’s economy, a major source of competitive advantage for businesses is their ability to know their customers, i.e. to understand their tastes and interests, in order to provide them with products and services that better match their needs.
Such knowledge is clearly important for many B2C companies, since it would help them avoid missed sales opportunities and improve customer loyalty. It would also benefit customers, whose needs would be better satisfied.
At present, companies typically try to get to know their customers by monitoring their purchase history, and then recommend products and services by comparing that history with the histories of customers with similar tastes (recommender systems). However, besides frequently yielding only vague customer profiles, this approach is often limited to market segments with frequent online interactions between customers and firms, e.g. e-commerce stores. As a result, it can hardly be applied to other sectors, such as banking and insurance, where such interactions (and the resulting data) are scarce or even absent.
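
The purchase-history comparison described above can be illustrated with a minimal sketch of user-based collaborative filtering. It is a generic textbook technique, not the method of any specific vendor or of Tapoi; the customer names, items, and the helper functions `cosine` and `recommend` are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse purchase-count dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(target, histories, top_n=2):
    """Suggest items bought by similar customers but not by the target."""
    scores = {}
    for other, items in histories.items():
        if other == target:
            continue
        sim = cosine(histories[target], items)
        if sim <= 0:
            continue  # no shared purchases: this customer tells us nothing
        for item, qty in items.items():
            if item not in histories[target]:
                scores[item] = scores.get(item, 0.0) + sim * qty
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

# Hypothetical purchase histories (item -> quantity bought).
histories = {
    "alice": {"running shoes": 1, "water bottle": 2},
    "bob": {"running shoes": 1, "water bottle": 1, "fitness tracker": 1},
    "carol": {"novel": 3, "bookmark": 1},
}
suggestions = recommend("alice", histories)
```

In this toy data, `carol` shares no purchases with anyone, so `recommend("carol", histories)` returns an empty list. This is the data-scarcity limitation noted above, in miniature.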
The objective of the “DataSci4Tapoi” project is to enhance a software product called “Tapoi” so that it can, on the one hand, provide firms with a more in-depth profile of each customer and, on the other, extend such profiling to market segments (most notably, financial services) that, so far, have not been able to know their customers’ needs in detail.
The main idea is to use activity on social media a) to understand which topics (for instance, music, sport, art, etc.) interest a customer the most and b) to detect the occurrence of so-called “Life Events”, i.e. events such as a birth or a wedding that usually trigger significant changes in purchase patterns.
A key enabling factor for Tapoi, which also makes it compliant with the GDPR, is social login, i.e. the practice of subscribing to an online service through an existing account on a social network (Facebook, Twitter, but also Google). In this way, each user is explicitly asked beforehand for her consent to grant access to her social activities. These activities are the data source from which Tapoi builds a profile and detects the occurrence of a Life Event.
Concerning the first aspect of the project, i.e. the building of detailed user profiles, a model (with some dedicated Python libraries) had already been implemented at the beginning of “DataSci4Tapoi”. However, it suffered from a few drawbacks:
- the definition of the categories (music, sport, arts, etc.), which is instrumental for the creation of the user profile, was complex and feasible only for a limited number of cases (basically corresponding to the typical sections of a newspaper, such as politics and economics);
- the model was able to process only social activities in the Italian language;
- when applied to some benchmark accounts on social networks, the model’s performance was rather poor.

At the end of the DataSci4Tapoi project, the scenario has changed: in particular, a new algorithm based on the Wikipedia ontology has been developed. This model has allowed for the following improvements:
- any category can in principle be defined and the definition process is much easier and faster than it used to be;
- at present, the algorithm supports both Italian and English languages;
- performance on more than 150 benchmark accounts in Italian and English has substantially improved.
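
As a purely illustrative sketch of category-based profiling (not the published Tapoi algorithm, whose details are given in the arXiv paper cited in this report), the following assumes a hypothetical lookup table `ENTITY_CATEGORIES` standing in for a Wikipedia-derived ontology and a hypothetical helper `build_profile`. It matches entity names in a user's posts by simple substring search and reports each category's share of the matches.

```python
from collections import Counter

# Hypothetical stand-in for a Wikipedia-derived ontology: entity -> categories.
# A real system would resolve entities and categories from Wikipedia itself.
ENTITY_CATEGORIES = {
    "juventus": ["sport"],
    "roland garros": ["sport"],
    "verdi": ["music"],
    "la scala": ["music", "art"],
    "caravaggio": ["art"],
}

def build_profile(posts):
    """Return a dict mapping category -> share of matched mentions (sums to 1)."""
    counts = Counter()
    for post in posts:
        text = post.lower()
        for entity, categories in ENTITY_CATEGORIES.items():
            if entity in text:  # naive substring match, for illustration only
                for category in categories:
                    counts[category] += 1
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing recognised: no profile can be built
    return {category: n / total for category, n in counts.items()}

# Hypothetical social-media posts from one user.
posts = [
    "Great match by Juventus tonight!",
    "Verdi at La Scala was unforgettable",
    "Watching Roland Garros all weekend",
]
profile = build_profile(posts)
```

In this sketch, defining a new category only requires adding entries to the lookup table, which loosely mirrors the first improvement listed above. Supporting another language would require entity names in that language.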

A mathematical description of the model and an analysis of its results on said accounts have been published on arXiv (arXiv:1804.02245) and presented at the international WWW Conference hosted in Lyon (France) from April 23 to April 27 (

Concerning the second aspect of Tapoi, i.e. the detection of Life Events, no meaningful step in this direction had been taken when the project started: no algorithm, not even a tentative one, existed, and no dataset had been collected for training and testing purposes. At present, such a dataset exists and a preliminary model is being tested.
As far as the profiling activity is concerned, the plan is to add more structure to the algorithm by including subcategories (for instance, besides determining whether a user is keen on sport, assessing whether she is more interested in football, tennis, skiing and so on) and to further develop the Python libraries in order to handle more languages and to track changes in interests over time.

As regards the detection of Life Events, the plan is to continue working on the algorithm (only a preliminary version of it exists right now) and to enlarge the current dataset in order to better assess the software’s performance and, consequently, to correct and refine the algorithm.

Once the project is completed, the resulting software will give firms a deeper understanding of their customers’ preferences and interests and, consequently, should yield better customer experience and satisfaction.

Overall, the DataSci4Tapoi project has considerably strengthened the market position of the beneficiary (U-Hopper srl), paving the way for the official market launch of the Tapoi service suite. In parallel, it has allowed the Innovation Associate (Dr. C. Torrero) to acquire new knowledge and reposition himself on the job market, strengthening his profile as a data scientist and enabling his transition from academia to industry.