SemLib to Market

Final Report Summary - STOM (SemLib to Market)

Executive Summary:
StoM started with the goal of bringing to market the results of a previous EU funded project, SemLib. This business-oriented vision was the main driver of StoM from the very beginning. Now that the project is over, we consider its planned outcomes, the PunditBrain annotation platform and the EventPlace event management system, mature enough to enter the market: from this viewpoint, StoM has certainly met its initial intentions.

The project planning placed strong emphasis on the analysis of the Business Exploitation of the two products, which was conducted in workpackage WP1 and extended over the whole duration of the project.

The SemLib project, on whose legacy StoM is based, produced two valuable software products, the Annotation System and the Semantic Recommender: while they were quite complete and advanced from a functional viewpoint, their implementation was basically at prototype level. Work was needed to make these systems scalable and robust, so that they could become a valid infrastructure on which the StoM outcomes could be built. This work was done mainly in workpackage WP2, which released the so-called “Core Components” for the project’s software products. Work on WP2 started at the very beginning of the project and lasted 18 months.

At month 12, when the software deliverables of WP2 were mature enough, work on the actual StoM outcomes started. Two workpackages, WP3 and WP4, were dedicated to PunditBrain and EventPlace respectively. Carrying out the final phase of WP2 in parallel with the initial periods of WP3 and WP4 was meant to put into practice the principles of “Agile” software development methodologies, in contrast to a traditional “Waterfall” approach in which development advances in sequential steps. Here we decided instead to organize the work in frequent iterations, which allowed a better integration between the Core Components and the final products.

Work in WP3 and WP4 was organized by always keeping in mind the possible commercial exploitation of the two outcomes. The results of the Business Exploitation workpackage, e.g. the D1.1 (Platforms value proposition & functional requirements) and D1.2 (PunditBrain & EventPlace Business Plans) deliverables, were a significant input to this work. At the same time, we wanted to build the products around their most valuable features, those that their final users could perceive as useful and appealing. A strong focus was therefore placed on the analysis of the user experience. The fact that the two products were released during StoM, albeit as demos, allowed the teams to analyse their usage in real-life situations and to gather valuable feedback from these early adopters.

Dissemination, carried out throughout the whole project in workpackage WP6, was also extremely important in StoM. Besides more “traditional” activities, like building and maintaining the web site (http://www.stom-project.eu) and using social networks to promote the project’s activities, a lot of attention was dedicated to presenting StoM’s products at conferences and events. This was especially useful to involve early adopters, who started to use the project outcomes in their various beta versions.

Finally, from the management viewpoint (workpackage WP5), StoM was also a smooth project to carry out. Collaboration amongst partners was frequent and effective. Only in the second part of the project was a reorganization needed, to redistribute activities on WP1 and WP6 from the company Techin to the partner Innova. Since the two companies were already collaborating on these workpackages, the change had no negative effect on either the results or the schedule of the project.
Project Context and Objectives:
As anticipated above, the main driver of StoM was the analysis of the possible business exploitation scenarios for the two products. From this viewpoint the work in the project showed very encouraging signs: in particular, the business planning confirmed that both PunditBrain and EventPlace have a large potential market, a differentiated offering and significant potential profits, and that a scalable business model can be designed for them.

Given the very different nature of the two products, the analysis was carried out in parallel, considering the possible markets for annotation systems on the one hand and for event management services on the other.

For the former, it was first of all decided to officially call the product Pundit and not PunditBrain, as originally planned. Since the SemLib project, Pundit has always been the name of the web annotation client: in recent years, thanks to continuous marketing actions, also performed during StoM, Pundit has attracted limited but significant interest as an annotation system. It has also been used in both EU funded initiatives and commercial projects for Humanities research centres, in Italy and in Europe. All this supported the idea of valorising “Pundit” as a trademark, since it makes sense to capitalize on an existing brand rather than launch a new name.

Pundit is a web tool that allows users to create semantic annotations on web pages and fragments of text through an easy-to-use interface. Moreover, it offers a comprehensive service to manage the whole process of “applying digital marginalia” to web documents. Users can also share annotations with friends and colleagues and reuse them to develop new and interesting visualisations. References to public datasets (e.g. DBpedia) can also be inserted in annotations, integrating web documents into a rich knowledge graph by pulling from (but also enriching) the Web of Data. Pundit has been released as Open Source (under AGPL 3.0) to foster use and customisation in different domains and settings.

The business analysis conducted in StoM showed that the Pundit Annotator introduces a truly innovative product into the market. The key to success for Pundit Annotator (as for most Web and mobile applications) is attracting as many Internet users as possible. Therefore, all Internet users make up the total available market for Pundit. This market is huge, still growing and not saturated. Focusing on Europe, where there are about 604 million Internet users, the target market for Pundit can be estimated at 5% of 604 million, i.e. about 30 million potential users.
In terms of segmentation, the following main classes of target users have been identified: academic researchers, professionals (journalists, lawyers, doctors) and students/teachers.

The analysis of the competitive arena has been limited to 4 main applications: Evernote, OneNote (Microsoft), Genius and Hypothesis. Among them:
- Evernote is the market leader, with about 150 million users worldwide;
- OneNote has the benefit of being part of the Microsoft Office Suite and is thus potentially used by millions of users as well. Moreover, the new Microsoft web browser Edge, launched in November 2015, integrates an annotation tool that allows users to create notes on the fly while browsing the web.
- Genius is a new entrant that has recently received a huge investment, demonstrating the high interest in the area, but it still has a limited number of users.
- Hypothesis is leading the Annotate All Knowledge Consortium, which Pundit also recently joined, and which aims to create an open, interoperable annotation layer over web content. The project has received good financial support, but at present it does not support semantic annotation and its annotation management features are still very basic.

The unique selling propositions of Pundit are strictly related to the semantic capabilities and social features developed in StoM, which enable: (i) richer, higher-quality annotations; (ii) attaching knowledge and data instead of natural text; (iii) letting the “machine” understand the annotations and thus potentially automate and drive tasks such as suggestions, sharing, finding, etc.

The business model of Pundit foresees multiple entry levels, in order to properly target the identified user groups. In general, the “vision” identified is based on a two-sided business model: a Business-to-Consumer (B2C) approach and a more specialized Business-to-Business (B2B) one.

For B2C a “classic” freemium approach can be considered, with basic services available for free and premium functionalities that customers can use through the payment of a fee (flat, like a monthly fee, or through a pay-as-you-grow model).
In the B2B case, Pundit can be customized into tailor-made, vertical solutions developed for companies and cultural institutions. In practice, this is the model through which Net7 has sold Pundit until now. Another, more promising option is to work on specific variants of Pundit (which we call “forks”, as in software development) to be implemented as add-ons to existing products in markets where advanced document and information management is a crucial requirement (e.g. the Legal or Healthcare sectors). These business opportunities can be exploited by creating partnerships with software vendors active in those fields.

Starting from these considerations, the B2C approach has been defined to revolve around the following three versions of Pundit:
- Pundit Annotator. It simply allows users to highlight and comment on parts of text in any web page with a few clicks.
- Pundit Annotator Pro is the semantic-powered and fully fledged web annotation client. It is a tool intended for researchers and professionals who – besides using highlights and comments – want to create semantic annotations using, as subjects or objects of a “triple”, part of text in a page or the whole web page itself.
- Pundit Annotator Premium extends the previous version with wider support for predefined vocabularies, allowing a much more detailed and accurate annotation process. It integrates the Ann-O-Matic tool, based on SpazioDati’s Dandelion Entity Extraction API, to automatically identify and annotate the web page with concepts taken from DBpedia/Wikipedia. Finally, besides DBpedia, users can annotate using other public web resources, like Europeana and Geonames, as targets.

The revenue model foresees two main types of licence: Single User (for all three levels, each with a distinct monthly fee) and Corporate (where an organisation can buy a package of many single-user premium licences for its employees/associates, together with some ad-hoc training/support services). In particular, Pundit will mainly leverage the deal flow and financing opportunities from corporate licence clients to drive global growth; in this way, it will be possible to implement a sustainable scale-up. The financial plan foresees a cumulative turnover of about 500.000 euros over the next 3 years. We consider this level of sales a conservative but achievable target, corresponding to approximately 20 corporate clients and 6500 single-user licences. This is certainly much less than our most developed competitor (Evernote), but this analysis does not currently include the support of external investors and/or other additional funding (e.g. European/National public funds).

EventPlace on the other hand is a software-as-a-service (SaaS) platform targeting customers in the event industry, in order to help them to better organize digital content around events (e.g. conferences, festivals), promote the event in more efficient ways and reach attendees and exhibitors. EventPlace can help event organizers to manage their digital image and promote their brand by:
- taking advantage of the content they have earned or are producing themselves (catalogues, photos, documents, flyers, announcements, videos, social media, blog posts) to easily create cross-platform stories and multimedia narratives
- gaining insights into how their events are perceived on social media and putting the positive social references at the forefront in order to manage their online reputation
- engaging with event attendees and the interested audience that is following the event online.

Specifically, the tool will enable event managers to effectively and timely:
- showcase everything that happened during an event, opening it to a bigger audience
- provide exhibitors with an additional service to promote their stand and engage with their target audience (in the case of conferences, meetings, fairs and exhibitions)
- turn the event website into a more informative, media-rich and interactive medium suitable for all stakeholders, ensuring that visitors stay on the branded website and do not bounce to social media channels, where their attention can easily be captured by other content.
- manage the archive of past event sessions
- recap the event days with a rich media experience that is gripping thanks to its mix of keynote videos, presentations and social media posts.

The event industry is huge and, in order to quantify it, we mainly referred to the MICE (Meetings, Incentives, Conventions and Exhibitions) industry, which is a growing market worldwide (for example, in 2014 there were almost 29 million MICE trips in Europe alone, with Germany, the UK, France, Italy and Spain representing 70% of the European business travel market). In addition, we could also consider the market for entertainment festivals (music, hobbies, sports, etc.). The world of festivals and consumer events has evolved well beyond pie judging at the county fair, pitching a lawn chair at a concert, and ogling hot rods at a car show. Today, people come together through targeted, niche events to celebrate more unique interests, hobbies and passions, from the newest anime coming out of Japan to electronic dance music and craft-brewed pilsners, stouts and wheat beers.

EventPlace mainly targets the following two classes of users: Event Attendees, which include individuals, employees of corporations and associations, and Event Organizers, which include a quite broad set of intermediaries, such as professional event/congress organizers, exhibition/event management companies, business travel and MICE tour operators.

The Event Organisers/Intermediaries are the actual target customers for EventPlace (i.e. those who will pay for the EventPlace solution). The competitive arena is characterized by many potential competitors, but no clear market leader. Specifically, the Unique Selling Propositions for EventPlace are:
- Event directory (private and public) with social media integration, to aggregate social media content related to events;
- Archival of social media streams related to events, in order to provide analytics on those data (a feature also offered by competitors);
- Upload of custom content (most platforms do not provide this feature and only aggregate from social media);
- Moderation and curation of content;
- Embedding of aggregated and uploaded content under a private sub-domain;
- Full engagement for both event visitors and event organizers (browsing and searching for events, registration, ticketing, networking, push notifications);
- Deep search for event content;
- Ability to deliver vertical specializations (e.g. ConferencePlace, FestivalPlace, SportEventPlace).

The business model of EventPlace has a unique value proposition (improving the event attendee experience, expanding customer engagement and improving the event ROI), while it proposes different revenue models. EventPlace in fact offers a unique (one-stop) solution that can be easily adapted to the needs of different classes of potential customers (small, medium, large). Therefore, we devised 2 main types of licence:
- Event Licences. In this model the licence for EventPlace is limited to the duration of an event. This model is particularly suitable for companies that do not manage many events during a year. To target events of different sizes, different licence prices are planned.
- Annual Licences. In this model the licence for EventPlace lasts 1 year and the customer can use the platform at any time. This model is particularly suitable for companies that organise many events per year. In this case, the different prices are related to the type of support service IN2 will provide to the customer.

For both types of licence, an additional revenue stream can be created by offering a dedicated person to support the event organizer in managing the EventPlace platform during the event.
Similarly to Pundit, the technical development of EventPlace is largely complete, and the current version has already been tested in large events with potential early adopters. Given the strong “local component” in organising events (i.e. working with local suppliers), IN2 sees the company’s channels and partnerships as the main go-to-market strategy. In particular, IN2 plans to engage with local promotion agencies and event professionals to promote the EventPlace solution and create partnerships. This idea is also reflected in the country-based market approach devised by IN2: the UK market will be targeted first, followed by Germany and then France/Benelux, these being the most promising markets in Europe.

This will drive the economic sustainability of the EventPlace solution in its early phases. Specifically, the plan for the first year is to secure 8 annual licences and to support at least 8 other events. The focus on annual licences will support a sustainable scale-up of the EventPlace solution while creating awareness of EventPlace. The financial plan foresees a cumulative turnover of about 1.500.000 euros over the next 3 years. In the long term (i.e. after 4-5 years), the actual take-up of event licences is expected, with a more global market approach (e.g. targeting the US and Asian markets).

Finally, it is worth mentioning the management side of StoM, which was the subject of the WP5 workpackage. The project ran very smoothly, with just two events worth mentioning. The first was the transfer of work and budget, corresponding to 9 person/months, from SpazioDati to Net7, which made the latter solely responsible for the implementation of the server side components of Pundit. Given the small amount of budget involved, it was agreed with the then Project Officer (Ms. Joanna Sowińska) not to produce a formal amendment. Then, in the second part of the project, Techin’s remaining activities and responsibilities for the “Business Exploitation” (WP1) and “Dissemination” (WP6) workpackages were transferred to Innova. This did not produce any negative effect on the project, since the two companies were already collaborating on these tasks and Innova could easily take over Techin’s role. The budget transferred amounted to 22,5 person/months: together with StoM’s Project Officer Dr. Roumen Borissov, it was agreed to produce a formal amendment letter, which was finally approved in December 2015.

Project Results:
Given the project’s focus on business exploitation, StoM paid great attention to its foreground. The goal of the project from the very start was to produce results that could enter the market and therefore be appealing to potential customers from a functional viewpoint; at the same time, it was essential to guarantee that these products were also scalable and robust from a technical perspective.

The Consortium Agreement, signed at the beginning of the project and updated in February 2015 (month 10 of the project), cited as the main outcomes of the project:
- PunditBrain server side components, also known in the project as the Annotation Server. It was refactored in WP2, starting from the prototype produced in the SemLib project. Ownership is assigned entirely to Net7, while SpazioDati has been granted perpetual use rights at fair and reasonable conditions.
- Recommender: a Software-as-a-Service solution developed in WP2, jointly owned by SpazioDati and IN2, while Net7 has been granted perpetual use rights at fair and reasonable conditions.
- PunditBrain client side components: a set of tools, including the various versions of the Pundit Annotator plus the Dashboard web app. Development was carried out in WP3; ownership is again assigned to Net7, with SpazioDati holding perpetual use rights at fair and reasonable conditions.
- EventPlace: completely developed in WP4, its ownership is assigned to IN2.

A detailed description of each of these products follows. First, however, it is worth describing how work was organised in the project.
StoM fully respected the original project time plan. At the same time, in order to achieve the ambitious goal of producing outcomes ready to be marketed, the team realised that more work, in terms of person/months (PM), was needed. This was already anticipated in the first reporting phase, which showed a minor increase in PM spent in the various WPs. This trend is much more evident in this final report and affected all WPs, not only those devoted to business exploitation and technical activities. For example, Management required more time because of the amendment process, while Dissemination at events was fully exploited to present the goals, and later the intermediate versions, of the products to selected audiences.
It is essential to point out that this increase in PM spent did not increase the overall personnel costs of the project: StoM partners decided to assign more staff to the project while always respecting the overall limit of the project budget.

PUNDITBRAIN SERVER SIDE COMPONENTS/ANNOTATION SERVER
The Annotation Server provides a RESTful interface that every client (including the Pundit annotation app, the Pundit dashboard and other future generic annotation apps) uses both for storing and retrieving annotation data. From an implementation viewpoint it consists of a Java Enterprise Edition server-side application that can be deployed on a standard servlet container like Apache Tomcat. Its internal architecture is organized in multiple layers:
- The API Layer is the front-end for client-side applications and it has been implemented using the Apache Jersey framework.
- The Permission Layer verifies that every request comes from authorised users only. It uses the OAuth 2.0 protocol for user authentication.
- The Query & Persistence Layer is the true heart of the Annotation Server, since it manages data retrieval and storage.
- The Data Layer consists of the repositories where annotation data are stored. For convenience, we refer herein to the whole of the annotation information as the Knowledge Base.
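The layering above can be illustrated with a minimal sketch. The real Annotation Server is a Java EE application built on Apache Jersey; the Python below only mirrors the order in which a request crosses the layers, and every class name, token value and return format in it is a hypothetical stand-in, not the server's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Data Layer stand-in: annotation id -> list of (s, p, o) triples."""
    annotations: dict = field(default_factory=dict)


class QueryPersistenceLayer:
    """Query & Persistence Layer: the only code that touches the Data Layer."""

    def __init__(self, kb):
        self.kb = kb

    def save(self, ann_id, triples):
        self.kb.annotations[ann_id] = list(triples)

    def load(self, ann_id):
        return self.kb.annotations.get(ann_id, [])


class PermissionLayer:
    """Permission Layer: rejects requests without a known OAuth 2.0 token."""

    def __init__(self, valid_tokens):
        self.valid_tokens = set(valid_tokens)

    def check(self, token):
        if token not in self.valid_tokens:
            raise PermissionError("invalid or missing OAuth 2.0 token")


class ApiLayer:
    """API Layer: entry point for client apps (a REST resource in the real system)."""

    def __init__(self, permissions, persistence):
        self.permissions = permissions
        self.persistence = persistence

    def handle(self, method, token, ann_id, triples=None):
        self.permissions.check(token)  # every request crosses the Permission Layer first
        if method == "PUT":
            self.persistence.save(ann_id, triples or [])
            return {"status": 201, "id": ann_id}
        if method == "GET":
            return {"status": 200, "triples": self.persistence.load(ann_id)}
        return {"status": 405}  # method not supported in this sketch


api = ApiLayer(PermissionLayer({"tok-123"}), QueryPersistenceLayer(KnowledgeBase()))
api.handle("PUT", "tok-123", "ann1", [("frag1", "dc:subject", "dbpedia:Dante_Alighieri")])
print(api.handle("GET", "tok-123", "ann1")["status"])  # 200
```

Keeping persistence behind its own layer means the Data Layer (here a dict, in StoM a triple store plus Solr) can change without touching the API or permission code.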

The previous version of the Annotation server, the one resulting from the SemLib project, divided data amongst a relational Data Base and a Triple Store, with unnecessary redundancies and replications. In specific projects a text search server, namely Apache Solr, was also used: in those cases, the Solr indexes were updated with a batch logic, through scripts that periodically extracted the whole knowledge base (from the DBMS and the Triple Store) and accordingly rebuilt the Solr indexes from scratch.

This new version uses a totally different approach. First of all, the knowledge base now resides in the Triple Store only (specifically Ontotext GraphDB Standard Edition, accessed through the Sesame framework). The Solr search server is also present and is updated in near real time, since its features (faceted search in particular) are heavily used in the Pundit web dashboard. The indexing process has been implemented asynchronously, using Apache Camel to orchestrate the components involved and the ActiveMQ Message Oriented Middleware (MOM) for asynchronous data communication. Camel made it possible to define the interaction amongst components through standard Enterprise Integration Patterns, and its rich set of ready-made modules (including those for interacting with Solr) simplified development.

The DBMS, MariaDB, a MySQL-compatible fork, is now used only for a minor feature of the Pundit client (the collection of personal data items that a user can store during the annotation process).
In the data model, annotations consist of sets of RDF triples, which can have as subject a text fragment, an image or a web page, and as object typically an entity from a vocabulary/ontology (e.g. a person, a place, etc.) or a literal. In the original version of the Annotation Server (the outcome of the SemLib project), the data model was loosely based on the Open Annotation ontology. While at the beginning of StoM compliance with this standard was considered a primary goal, in the course of the project we realized that a new W3C recommendation, Web Annotation (http://www.w3.org/TR/annotation-model/), was emerging, using Open Annotation as its starting point: we therefore decided to fully support this new standard.
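To give a concrete idea of the target format, the sketch below builds a minimal annotation in the JSON-LD serialization of the W3C Web Annotation Data Model, tagging a text fragment of a page with a DBpedia entity. The `@context`, `TextQuoteSelector`, `SpecificResource` and the `identifying` motivation come from that specification; the page, annotation and entity URIs are illustrative, and the report does not describe Pundit's exact serialization (whose bodies are sets of triples), so this shows only the general shape.

```python
import json


def make_web_annotation(ann_id, page_url, exact_text, entity_uri, creator_uri):
    """Build a minimal W3C Web Annotation (JSON-LD) that identifies a text
    fragment of a web page with an entity from a vocabulary."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": ann_id,
        "type": "Annotation",
        "motivation": "identifying",
        "creator": creator_uri,
        "body": {
            "type": "SpecificResource",
            "source": entity_uri,      # e.g. a DBpedia resource
            "purpose": "identifying",
        },
        "target": {
            "source": page_url,        # the annotated web page
            "selector": {"type": "TextQuoteSelector", "exact": exact_text},
        },
    }


ann = make_web_annotation(
    "http://example.org/anno/1",                  # hypothetical annotation URI
    "http://example.org/inferno.html",            # hypothetical page
    "Dante Alighieri",
    "http://dbpedia.org/resource/Dante_Alighieri",
    "http://purl.org/pundit/as/user/46772ad3",    # user URI pattern from this report
)
print(json.dumps(ann, indent=2))
```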

Despite the compatibility with a large part of Open Annotation, the support of Web Annotation forced a reorganization of the data model: this led to the decision of abandoning backward compatibility with projects based on previous versions of the Annotation Server (and on old versions of Pundit).
The data model makes heavy use of named graphs, an implementation choice not explicitly covered by the aforementioned standards. On the one hand, they allow related triples to be encapsulated, for a better organization of the different information regarding a single annotation (e.g. the body versus the metadata header); on the other, they are particularly useful for managing annotation changes, deletes and updates in particular, since a whole set of triples can be removed with a single command by dropping and recreating its named graph. Named graphs can also be extremely valuable for implementing access control features, since the most interesting semantic web techniques on this topic (e.g. Shi3ld) explicitly make use of them.
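The drop-and-recreate update pattern described above can be sketched with a toy in-memory quad store, plus the equivalent SPARQL 1.1 Update text (`DROP SILENT GRAPH` followed by `INSERT DATA { GRAPH ... }`). The graph URIs and the split into one body graph and one metadata graph per annotation are hypothetical choices made for this illustration; the report does not specify Pundit's actual graph naming scheme.

```python
from collections import defaultdict


class NamedGraphStore:
    """Toy quad store: named-graph URI -> set of (subject, predicate, object)."""

    def __init__(self):
        self.graphs = defaultdict(set)

    def create(self, graph, triples):
        self.graphs[graph] = set(triples)

    def drop(self, graph):
        # One operation removes every triple stored in that named graph.
        self.graphs.pop(graph, None)

    def replace(self, graph, triples):
        # An annotation update is modelled as drop + recreate of its graph.
        self.drop(graph)
        self.create(graph, triples)


def sparql_replace(graph, triples):
    """Equivalent SPARQL 1.1 Update; terms must already be N-Triples encoded."""
    body = " .\n    ".join(" ".join(t) for t in triples)
    return (
        f"DROP SILENT GRAPH <{graph}> ;\n"
        f"INSERT DATA {{ GRAPH <{graph}> {{\n    {body} .\n}} }}"
    )


# Hypothetical graph URIs: one for the annotation body, one for its metadata.
BODY = "http://example.org/annotation/42/body"
META = "http://example.org/annotation/42/meta"

store = NamedGraphStore()
store.create(BODY, {("<http://example.org/frag1>", "<http://purl.org/dc/terms/subject>",
                     "<http://dbpedia.org/resource/Florence>")})
store.create(META, {("<http://example.org/annotation/42>", "<http://purl.org/dc/terms/creator>",
                     "<http://purl.org/pundit/as/user/46772ad3>")})

new_body = [("<http://example.org/frag1>", "<http://purl.org/dc/terms/subject>",
             "<http://dbpedia.org/resource/Dante_Alighieri>")]
store.replace(BODY, new_body)  # the metadata graph is left untouched
print(sparql_replace(BODY, new_body))
```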

User management policies were completely refactored in the new Annotation Server. In the previous version, authentication was based only on the OpenID protocol: the system simply verified whether a valid OpenID token was associated with the request, without any real authorization/validation of the user’s identity. This strategy was of course not acceptable for the evolution of the system: for Pundit, in fact, it is necessary to identify the kind of license associated with every user (e.g. to differentiate users with basic, free access from those with a premium account). Moreover, OpenID has become an obsolete protocol, since all public authentication systems (e.g. Google, Facebook, LinkedIn, Twitter) are now based on OAuth 2.0. The authentication system was therefore completely refactored by implementing an OAuth 2.0 provider in the Pundit system (client side), developed as a PHP Symfony application that manages users’ data (both personal information and license details) in the relational database. Users can register and authenticate to Pundit through the Symfony app by:
- explicitly entering their personal data in a form; in this case, authentication always requires the user to enter her username and password
- using an external authentication service: this first implementation supports Google and Facebook, again through OAuth 2.0.

Once the user is authenticated, the Symfony app calls a specific API of the Annotation Server, allowing the latter to create the “user session” and keep all the information needed for the annotation process: at present this consists of a set of personal data of the current user, including name, surname, email and, especially, the user ID, which is used to create the URI that uniquely identifies her in the Pundit Knowledge Base (e.g. http://purl.org/pundit/as/user/46772ad3). The OAuth token generated by the Symfony webapp is used as the session key; session data also include administrative information about the user, e.g. her kind of license.
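The session mechanism can be sketched as a registry keyed by the OAuth access token, with the user URI derived from the user ID. Only the URI base comes from the example above; the class names, field set and license values are hypothetical, and a random string from `secrets.token_urlsafe` stands in for the token that the Symfony app would actually issue.

```python
import secrets
from dataclasses import dataclass

USER_URI_BASE = "http://purl.org/pundit/as/user/"  # base taken from the example URI


@dataclass
class UserSession:
    user_id: str
    name: str
    surname: str
    email: str
    license: str  # e.g. "basic" or "premium" (hypothetical values)

    @property
    def user_uri(self):
        # The user ID becomes the URI identifying the user in the Knowledge Base.
        return USER_URI_BASE + self.user_id


class SessionRegistry:
    """Hypothetical stand-in for the Annotation Server's session API:
    the OAuth token issued by the Symfony app is the session key."""

    def __init__(self):
        self._sessions = {}

    def open(self, oauth_token, session):
        self._sessions[oauth_token] = session

    def lookup(self, oauth_token):
        return self._sessions.get(oauth_token)  # None if no such session


registry = SessionRegistry()
token = secrets.token_urlsafe(32)  # stands in for the token minted by the Symfony app
registry.open(token, UserSession("46772ad3", "Ada", "Lovelace", "ada@example.org", "premium"))
session = registry.lookup(token)
print(session.user_uri)  # http://purl.org/pundit/as/user/46772ad3
```

Because the license travels with the session, later requests can check premium features with a single lookup instead of querying the relational database each time.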

As said before, Apache Camel has been introduced in the Annotation Server architecture to simplify all processing that can be performed on an asynchronous basis. It has been decided to try this new paradigm to manage the indexing of annotation data on Apache Solr. Each time annotation data is changed, a new request for updating the Solr index is issued. A Camel service generates a message with the information that must be updated and “routes” it on an ActiveMQ queue. Another Camel component listens on the queue, reads the message, adapts the data for Solr and routes the request to the specific Camel module that interacts with the Search Server.
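The producer/consumer flow just described can be sketched with Python's standard `queue` and `threading` modules. A `queue.Queue` stands in for the ActiveMQ queue and a dict for the Solr index; the message layout, the adapter function and the Solr field names are hypothetical, and none of this uses Camel's actual API.

```python
import queue
import threading

index_queue = queue.Queue()  # stands in for the ActiveMQ queue
solr_index = {}              # stands in for the Solr core: doc id -> document


def on_annotation_changed(ann_id, page_url, entity_labels):
    """Producer: publish an 'update the index' message instead of indexing inline."""
    index_queue.put({"id": ann_id, "page": page_url, "entities": entity_labels})


def to_solr_doc(msg):
    """Adapter: reshape the message into a flat Solr-style document."""
    return {"id": msg["id"], "page_url_s": msg["page"], "entity_ss": sorted(msg["entities"])}


def indexer():
    """Consumer: listens on the queue and pushes documents to the search index."""
    while True:
        msg = index_queue.get()
        if msg is None:  # sentinel used to stop the worker
            index_queue.task_done()
            break
        doc = to_solr_doc(msg)
        solr_index[doc["id"]] = doc
        index_queue.task_done()


worker = threading.Thread(target=indexer, daemon=True)
worker.start()
on_annotation_changed("ann1", "http://example.org/page", ["Florence", "Dante Alighieri"])
index_queue.put(None)
worker.join()
print(solr_index["ann1"]["entity_ss"])  # ['Dante Alighieri', 'Florence']
```

The writer returns as soon as the message is queued, so annotation saves are not slowed down by indexing, at the cost of the index being only near real time.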

Solr is another “stable” addition in this new version of the Annotation Server. There are several reasons for this choice: on the one hand, an “external” Solr server was very often used in past annotation-based projects carried out by Net7, where more advanced, full-text search features were needed. Moreover, the Pundit dashboard relies heavily on faceted search, a very powerful Solr feature that is not adequately supported in triple stores. The availability of a text search engine might also pave the way for future features of the Annotation Server, such as the automatic crawling and indexing of an annotated web page (and not only of the “simple” annotations, as happens now).
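As an illustration of what faceted search means for the dashboard, the sketch below builds the query string for a Solr `/select` request with faceting enabled (`q`, `rows`, `facet`, `facet.field` and `wt` are standard Solr parameters) and computes locally the kind of per-value counts Solr returns. The field names and documents are hypothetical; in production the counting is of course done by Solr itself.

```python
from collections import Counter
from urllib.parse import urlencode


def facet_query(text, facet_fields, rows=10):
    """Query string for a Solr /select request with faceting enabled."""
    params = [("q", text), ("rows", rows), ("facet", "true"), ("wt", "json")]
    params += [("facet.field", f) for f in facet_fields]  # one entry per facet
    return urlencode(params)


def facet_counts(docs, field):
    """Local illustration of one facet.field: value -> number of docs containing it."""
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        counts.update(set(values))  # a document is counted once per value
    return dict(counts)


docs = [  # hypothetical indexed annotations
    {"id": "ann1", "entity_ss": ["Dante Alighieri", "Florence"], "author_s": "u1"},
    {"id": "ann2", "entity_ss": ["Dante Alighieri"], "author_s": "u2"},
    {"id": "ann3", "entity_ss": ["Florence", "Florence"], "author_s": "u1"},
]
print(facet_query("Dante", ["entity_ss", "author_s"]))
print(facet_counts(docs, "entity_ss"))
```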

Finally, the Annotation Server codebase makes extensive use of automated tests, both for validating the source code during the build and for checking performance under simulated load (stress tests). Unit tests are defined with the JUnit framework, while integration tests rely on the Cargo plugin to run an instance of the Annotation Server deployed inside a servlet container (Tomcat). Stress tests have been defined with the JMeter tool and showed adequate performance, with average times for reading and writing annotations on the order of hundreds of milliseconds (e.g. 298 msec to write one annotation set with multiple triples). Even under more extreme test conditions, the system responds acceptably, for example retrieving a complex set of 300 annotations in around 8300 msec.
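A back-of-the-envelope reading of the bulk retrieval figure, assuming the time is spread evenly across the annotations returned, gives the amortized cost per annotation:

```latex
\bar{t}_{\text{read}} \approx \frac{8300\ \text{ms}}{300\ \text{annotations}} \approx 27.7\ \text{ms per annotation}
```

This is only an average derived from the single figure reported above; it is not directly comparable with the 298 msec write time, which measures a different operation.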

RECOMMENDER SYSTEM
The Recommender is a general-purpose recommender system that suggests items associated with similar users. It is delivered as software-as-a-service running on a cloud computing platform. Recommendations are provided via a web service interface implemented using a REST architecture, with a web user interface available to ease configuration. The results obtained in the project are very encouraging, and SpazioDati therefore plans to add this service to its SaaS offering under the name Dandelion Prediction API.

The algorithms underlying the recommender are based on state-of-the-art methods for collaborative filtering, in particular matrix factorization models. Collaborative filtering strategies rely solely on similarities between users and items. Their major appeal is that, unlike the alternative content-based strategy, they do not require additional contextual information about users and events. The content-based approach creates a profile for each user or event to characterize its nature: in addition to demographic information, a user profile can include answers to surveys designed to collect user data that reflect the application’s domain. Profiling the domain becomes impractical in the setting of the StoM project, which aims to serve broad audiences: from scholars to corporate researchers to PhD students, data journalists and editors. In other words, with a content-based approach it would be unnecessarily complicated to offer a domain-independent, general-purpose recommender system.
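To show what a matrix factorization model for collaborative filtering looks like, here is a minimal sketch that learns user and item factor vectors by stochastic gradient descent on the observed ratings (the classic regularized SGD formulation). The report does not say which variant or hyperparameters the Dandelion engines use; the rank, learning rate, regularization, epoch count and toy data below are arbitrary assumptions.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn user factors P and item factors Q so that dot(P[u], Q[i]) approximates
    each observed rating r_ui, using regularized stochastic gradient descent."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted affinity of user u for item i (also for pairs never observed)."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


# Toy data as (user, item, rating): users 0-1 like items 0-1, users 2-3 like items 2-3.
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 5), (1, 1, 4), (1, 2, 1),
           (2, 2, 5), (2, 3, 4), (3, 2, 5)]
P, Q = factorize(ratings, n_users=4, n_items=4)
print(round(predict(P, Q, 3, 3), 2))  # score for a pair never rated
```

No information about what the items are, or who the users are, enters the model: only the matrix of observed interactions, which is what makes the approach domain independent.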

Given its collaborative approach, the Recommender System can operate without modelling knowledge about what makes a particular item more or less recommendable. The same holds on the user side: there is no need to model needs, tastes or desires. This makes the approach resilient to changes in user preferences over time. Moreover, since it is centered on actions, it is often easier to derive explanations of particular user behaviors and more general trends.

The Service is designed around the notion of data containers, to which the owner of a container can attach different recommendation engines. Each recommendation engine can operate on the same data while satisfying a different recommendation need. The container abstracts storage and indexing concerns and offers each engine normalised access to the data. Data is stored in a PostgreSQL backend; in particular, we use the Amazon RDS cloud service for reliability and scalability and to simplify administration and resource management. It could, however, be replaced by similar services, since the architecture does not depend on it.

Because the same prediction engine implementation is often reused for different data containers that need similar recommendations, Dandelion offers engine templates that owners can instantiate, parameterise and train. There are currently three engine templates, which answer slightly different questions, but the design allows more templates to be added for other types of recommendation needs.

The engine templates currently available require the recommendation problem to be modelled in terms of three types of entities:
- Users: the subjects for which the system emits recommendations.
- Items: the objects which the user prefers or has an affinity with.
- Actions: the interactions that a User performs on an Item.
Actions can be implicit (e.g. product purchases, page views, mouse clicks) or explicit ratings (stars, likes/dislikes); the choice of engine determines which type of action is used.
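The container abstraction and the three entity types can be summarised in a small data model. This is a hypothetical sketch: the class names, the action kinds (`rates`, `views`, `buys`) and the template identifiers are illustrative, not the service's actual schema. It shows the two properties the text stresses: data can be appended at any time, and each engine sees a normalised view containing only the action types its template consumes.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from engine template to the action kinds it consumes.
TEMPLATE_INPUTS = {
    "product-similarity": {"rates"},
    "product-ranking": {"rates"},
    "product-ranking-views": {"views", "buys"},
}

@dataclass(frozen=True)
class Action:
    user: str
    item: str
    kind: str            # "rates", "views", "buys", ...
    value: float = 1.0   # star rating for explicit feedback, 1.0 for implicit events

@dataclass
class Container:
    owner: str
    actions: list = field(default_factory=list)
    engines: dict = field(default_factory=dict)   # engine name -> template name

    def add(self, *actions):
        """Data can be appended at any time, whatever the state of the engines."""
        self.actions.extend(actions)

    def attach(self, engine_name, template):
        """Instantiate an engine from one of the known templates."""
        if template not in TEMPLATE_INPUTS:
            raise ValueError(f"unknown template: {template}")
        self.engines[engine_name] = template

    def data_for(self, engine_name):
        """Normalised view: only the actions the engine's template understands."""
        wanted = TEMPLATE_INPUTS[self.engines[engine_name]]
        return [a for a in self.actions if a.kind in wanted]

    @property
    def users(self):
        return {a.user for a in self.actions}

    @property
    def items(self):
        return {a.item for a in self.actions}
```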

The main components of the system are:
- REST Endpoint: it exposes the API features to client applications. It handles HTTP requests, JSON serialisation and deserialisation, and the orchestration of the other components. It is implemented using the Django REST Framework (version 2.4.5).
- WebUI: it allows for basic operations on containers to assign, parameterise, train and deploy different engine instances.
- Authentication/Authorisation proxy: calls to the Dandelion Prediction Service are allowed or blocked by this component. This is an external component of the architecture, abstracting the details of authenticating owners and allowing each owner to operate only on its corresponding containers. The actual authentication and authorisation are performed by a specialised module within the companion Dandelion Ecosystem of SpazioDati. Although authorisation needs are kept simple in an all-or-nothing approach, more fine-grained control is straightforward to implement and would only concern this module. Moreover, billing strategies could also be implemented here without impacting the rest of the design.
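A minimal sketch of the all-or-nothing rule described above: a caller is either allowed on a container it owns or blocked. The class name, the in-memory dictionaries standing in for the Dandelion authentication module and the HTTP-like status codes are all assumptions; the `usage` counter only illustrates where a billing hook could sit without touching the rest of the design.

```python
from collections import Counter

class AuthProxy:
    """All-or-nothing gate in front of the prediction service (hypothetical sketch).

    token_owner: API token -> owner id (stands in for the external authentication module)
    container_owner: container id -> owner id
    forward: callable(container_id, request) that reaches the REST endpoint
    """

    def __init__(self, token_owner, container_owner, forward):
        self.token_owner = token_owner
        self.container_owner = container_owner
        self.forward = forward
        self.usage = Counter()   # per-owner count of allowed calls: a natural billing hook

    def handle(self, token, container_id, request):
        owner = self.token_owner.get(token)
        if owner is None:
            return 401, "unknown token"                    # not authenticated
        if self.container_owner.get(container_id) != owner:
            return 403, "container not owned by caller"    # authenticated, not authorised
        self.usage[owner] += 1
        return 200, self.forward(container_id, request)
```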

A central role in the Recommender System is played by the Container Manager. This core component supports container administration: besides creating and destroying containers and adding data to them, it manages the lifecycle of the engines associated with each container and abstracts data access. Containers can receive data at any time after being instantiated, independently of the state of their associated engines. In this way, training can be repeated as many times as needed to reflect the availability of more data.

Engines are associated with a container by instantiating engine templates; they are the entities that implement the different recommendation algorithms. Since similar recommendation questions can be posed to different datasets, the system provides engine templates that can be reused and parameterised. We offer three templates:
- product similarity: returns a list of similar products based on global product ratings
- product ranking: given a user, it returns a list of product recommendations based on ratings
- product ranking (views): given a user, it returns a list of product recommendations based on page views.

Engines go through several states. Once an engine is instantiated, it enters the BUILDING state while the necessary data structures are allocated. The end of this preparation phase is signaled by a change to the BUILT state or, if an error occurs, to the CREATED state. Training is done on demand, using whatever data is in the container at that moment; it is signaled by a change to the TRAINING state and, on success, the engine becomes TRAINED. Once trained, the engine can be deployed and used to emit recommendations. For client applications that control the lifecycle of an engine through the APIs, deployment happens automatically after a successful training.
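The lifecycle described above can be read as a small state machine. The following sketch encodes it under stated assumptions: the state names come from the text, while the `Engine` class, the `build`/`train` callables, the `deployed` flag (the text does not name a deployed state) and the choice to restore the previous state when training fails are hypothetical.

```python
from enum import Enum

class EngineState(Enum):
    CREATED = "CREATED"
    BUILDING = "BUILDING"
    BUILT = "BUILT"
    TRAINING = "TRAINING"
    TRAINED = "TRAINED"

class Engine:
    def __init__(self, template, build, train, auto_deploy=True):
        self.template = template
        self.model = None
        self.deployed = False
        self.auto_deploy = auto_deploy          # API clients get deployment right after training
        self._train = train
        self.state = EngineState.BUILDING       # instantiation: allocate data structures
        try:
            build()
            self.state = EngineState.BUILT
        except Exception:
            self.state = EngineState.CREATED    # preparation failed, as described in the text

    def train(self, container_data):
        # Training can be repeated, so a TRAINED engine may be retrained on newer data.
        if self.state not in (EngineState.BUILT, EngineState.TRAINED):
            raise RuntimeError(f"cannot train an engine in state {self.state.name}")
        previous = self.state
        self.state = EngineState.TRAINING
        try:
            self.model = self._train(list(container_data))  # snapshot of the data present now
        except Exception:
            self.state = previous               # assumption: a failed run leaves the engine as it was
            raise
        self.state = EngineState.TRAINED
        if self.auto_deploy:
            self.deploy()

    def deploy(self):
        if self.state is not EngineState.TRAINED:
            raise RuntimeError("only a trained engine can be deployed")
        self.deployed = True
```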

The engines offered in the current release were selected in order to cover three common situations for which recommendations are needed:
- little or no information about the past behavior of the target user, while information about other users' behaviors is available (recommendations are then made according to product similarity)
- explicit feedback (ratings) from target user and from other users
- implicit feedback (page views) from target user and from other users.

A specific engine has been implemented for each of these situations. In detail:
- Product similarity by ratings. This engine type takes an item i and looks through events of the type User-rates-Item to produce a list of items that are considered to be similar to i. It is suited to situations in which there is not a lot of information about the target user. The cold start problem is somewhat minimised by not relying on any information about the target user and instead by working with positive feedback from other users on items that show co-occurrence with the item for which the similarity is computed.
- Product recommendation by ratings. This engine is used in cases for which there are enough instances of explicit ratings from users. It takes a target user and produces a list of recommended items through user-user similarity based on events of the type User-rates-Item.
- ‘E-commerce’ recommendation by views or buys. When explicit feedback is not available, indirect signals can be used to predict preferences. This engine takes page views or item purchases as hints of user interest. It takes a user and produces a list of recommended items through user-user similarity based on events of the type User-views-Item or User-buys-Item.
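The engines above rely on two kinds of similarity, item-item and user-user, which can be sketched with plain neighbourhood methods. This is only an illustration of the two ideas as described in the list, not the Dandelion implementation (which the report describes as based on matrix factorization); the function names, the cosine similarity measure, the positive-rating threshold and the neighbourhood size are assumptions.

```python
from collections import defaultdict
from math import sqrt

def similar_items(ratings, target, threshold=4, top_n=5):
    """Item-to-item similarity from co-occurring positive ratings.

    ratings: iterable of (user, item, rating); ratings >= threshold count as positive feedback.
    Similarity = cosine between the sets of users who rated each item positively,
    so nothing about the user asking for recommendations is needed.
    """
    liked_by = defaultdict(set)
    for user, item, rating in ratings:
        if rating >= threshold:
            liked_by[item].add(user)
    fans = liked_by.get(target, set())
    scores = {}
    for item, users in liked_by.items():
        overlap = len(fans & users)
        if item != target and overlap:
            scores[item] = overlap / sqrt(len(fans) * len(users))
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

def _cosine(a, b):
    """Cosine similarity of two sparse {item: value} vectors."""
    dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend_for_user(events, target, k=20, top_n=5, implicit=False):
    """User-user collaborative filtering.

    events: iterable of (user, item, value): a star rating, or 1.0 for a view or purchase.
    Explicit mode predicts a rating as the similarity-weighted average of the neighbours'
    ratings; implicit mode ranks items by the summed similarity of the neighbours who used them.
    """
    by_user = defaultdict(dict)
    for user, item, value in events:
        by_user[user][item] = value
    mine = by_user.get(target, {})
    neighbours = sorted(((_cosine(mine, other), u) for u, other in by_user.items() if u != target),
                        reverse=True)[:k]
    num, den = defaultdict(float), defaultdict(float)
    for sim, u in neighbours:
        if sim <= 0:
            continue
        for item, value in by_user[u].items():
            if item not in mine:                 # only recommend items the target has not seen
                num[item] += sim * value
                den[item] += sim
    scores = {item: num[item] if implicit else num[item] / den[item] for item in num}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

With implicit events every value is 1.0, so a weighted average would score all candidates equally; summing neighbour similarities instead is what makes the views/buys case rankable in this sketch.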

Particular attention was devoted in the project to evaluating the performance of the Dandelion Prediction API. As part of our evaluation we computed metrics commonly used to assess the predictive accuracy of recommender systems, in particular root mean squared error (RMSE) and precision (in the information retrieval sense). While these metrics give a general idea of how much the recommended items differ from the user's true preferences, they do not take into account the position of correct items in the list of recommendations. If users only look at the items at the top of the recommended list, they do not care about the accuracy of the items ranked lower. This is a typical user task, known in the literature as the “Find Good Items” task. To incorporate it in our evaluation, we computed the half-life utility metric.
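The two accuracy metrics can be sketched as follows. How exactly the project computed precision is not stated; this version assumes the common MovieLens protocol of ranking each user's held-out items by predicted score and counting an item in the top k as good when its true rating reaches a threshold, which matches the two knobs (list length and star threshold) discussed below. `rmse` and `precision_at_k` are hypothetical names.

```python
from math import sqrt

def rmse(pairs):
    """Root mean squared error over (predicted, actual) rating pairs."""
    pairs = list(pairs)
    return sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

def precision_at_k(test, predict, k=10, threshold=5):
    """Mean precision@k over users.

    test: dict user -> {item: true rating} (held-out ratings).
    predict: callable (user, item) -> score used to rank each user's test items.
    A recommended item counts as good if its true rating is >= threshold.
    """
    per_user = []
    for user, truth in test.items():
        top = sorted(truth, key=lambda item: predict(user, item), reverse=True)[:k]
        if top:
            per_user.append(sum(truth[item] >= threshold for item in top) / len(top))
    return sum(per_user) / len(per_user)
```

With `k=10, threshold=5` this corresponds to the strictest setting reported below, and with `k=1, threshold=3` to the most lenient one.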

For the evaluation, we used the datasets publicly released by GroupLens research lab, obtained from user interaction in the MovieLens recommender system:
- MovieLens 100K Dataset with 100,000 ratings from 1,000 users on 1,700 movies.
- MovieLens 1M Dataset with 1 million ratings from 6,000 users on 4,000 movies.
- MovieLens 10M Dataset with 10 million ratings and 100,000 tag applications applied to 10,000 movies by 72,000 users.

Although there are larger public datasets (such as the one released for the well-known competition organised by Netflix between 2006 and 2009), the first two MovieLens datasets are currently by far the most used in the field. We evaluated the Dandelion Prediction API on the MovieLens 1M dataset: like the data expected on the EventPlace platform, it has many more users than items.

The purpose of the evaluation was twofold: (1) to find a suitable value for the regularization parameter of the recommender system and (2) to measure the performance of the system. We found that high values of the regularization parameter led to recommendations that are not supported by the available users' preferences; we argue that such recommendations are not necessarily bad, but rather novel. Conversely, small values of the regularization parameter led to recommendations that users would most likely dislike. Among the values tested, the intermediate value of 0.05 produced the best recommendations.

The performance of the recommender system was measured with three metrics: RMSE, precision and the half-life utility metric. The RMSE computed over ten validation folds is 0.62. This value is considerably lower than those reported in the literature: for example, the winners of the Netflix Prize competition reported an RMSE of 0.8567 (measured on the Netflix dataset, not on MovieLens). We attribute the difference to the fact that, unlike the winners of the competition, we did not use a fresh test set for validation, so it is possible that our model overfits the data.
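The ten-fold figure rests on a generic k-fold loop like the one below. It is a sketch: `kfold_rmse` and `fit_global_mean` are hypothetical names, and the mean-rating model is only a placeholder that keeps the block self-contained; any fit function returning a `predict(user, item)` callable, such as a matrix factorization model, can be plugged in.

```python
import random
from math import sqrt

def kfold_rmse(ratings, fit, k=10, seed=0):
    """Average RMSE over k rotating validation folds.

    ratings: list of (user, item, rating); fit(train) must return a predict(user, item) callable.
    """
    data = list(ratings)
    random.Random(seed).shuffle(data)
    folds = [data[f::k] for f in range(k)]          # k disjoint, near-equal folds
    scores = []
    for f, held_out in enumerate(folds):
        if not held_out:                            # fewer ratings than folds
            continue
        train = [row for g, fold in enumerate(folds) if g != f for row in fold]
        predict = fit(train)
        sq = [(predict(u, i) - r) ** 2 for u, i, r in held_out]
        scores.append(sqrt(sum(sq) / len(sq)))
    return sum(scores) / len(scores)

def fit_global_mean(train):
    """Placeholder model: always predicts the training mean (swap in the real recommender)."""
    mu = sum(r for _, _, r in train) / len(train)
    return lambda user, item: mu
```

To tune a parameter such as the regularization weight, one would call `kfold_rmse` once per candidate value with a `fit` that closes over that value. Because the same folds then drive both the choice of the parameter and the reported score, that score tends to be optimistic, which is the caveat raised above about the lack of a fresh test set.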

We estimated that the precision of the recommendations ranges between 0.7350 and 0.9942, provided that all users for whom recommendations were produced are present in the test dataset. The lowest value is obtained when a user considers the first 10 recommended items and a recommendation is considered good only if the movie is rated 5 stars. The highest value is obtained when a user considers only the first item and a good recommendation is any movie rated 3, 4 or 5 stars.

Finally, the performance of the recommender system on the half-life utility metric varied with the half-life parameter, which defines how many items of a recommended list a user is expected to look at:
- 0.2208 if 50% of the users will not watch more than 2 items
- 0.5497 if 50% of the users will not watch more than 5 items
- 0.7552 if 50% of the users will not watch more than 10 items.
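The half-life utility metric, as commonly defined in the collaborative filtering literature (Breese, Heckerman and Kadie), discounts each position of the list exponentially, with the half-life being the rank a user views with 50% probability. The sketch below assumes a neutral rating of 3 stars and normalises by the utility of the ideal ordering; the function name and input layout are hypothetical.

```python
def half_life_utility(ranked_lists, alpha, neutral=3.0):
    """Half-life utility, normalised to [0, 1].

    ranked_lists: one list per user with her true ratings of the recommended items,
                  in the order the recommender ranked them.
    alpha: half-life, i.e. the 1-based rank that a user views with 50% probability.
    neutral: ratings at or below this value bring no utility.
    """
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")

    def utility(ratings):
        # position j (0-based) is discounted by 2 ** (j / (alpha - 1))
        return sum(max(r - neutral, 0.0) / 2 ** (j / (alpha - 1)) for j, r in enumerate(ratings))

    achieved = sum(utility(r) for r in ranked_lists)
    best = sum(utility(sorted(r, reverse=True)) for r in ranked_lists)  # ideal ordering
    return achieved / best if best else 0.0
```

A larger `alpha` discounts lower positions less, so good items ranked a little too low cost less, which is in line with the higher scores obtained for longer half-lives.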

PUNDIT(BRAIN) CLIENT SIDE COMPONENTS
The Pundit client side components comprise all services that can be used to apply and manage annotations. On the one hand there are the client tools proper, the Pundit Annotators; on the other, the Dashboard, the web app where users can find, in a single place, all the annotations they have applied to web pages while browsing the Internet.

We implemented three versions of the Pundit Annotator, the “basic”, the “Pro” and the “Premium”; the first two are already freely available for public use, while Pundit Premium, designed for professional use, is currently released privately to selected users only. Users can download the Pundit Annotator and the Pundit Annotator Pro from http://thepund.it/annotator and http://thepund.it/annotatorpro. All three versions are Google Chrome extensions, developed in JavaScript using the AngularJS framework.

The Pundit Dashboard (https://thepund.it/app/) is the central point where all annotations performed by users throughout the web can be accessed and reviewed. The application was designed in the WP2 workpackage and developed in WP3. The analysis of the initial design made it clear that the annotation process must be approached as a whole, considering not merely the application of digital marginalia on web pages but also (if not primarily) the subsequent exploitation of this knowledge by users.

The WP3 workpackage is where most of the work on the Pundit Client Side Components took place. Activities revolved around five main tasks, each covering a specific aspect of the implementation of Pundit:
- Accounting and billing services
- Integration of annotation vocabularies
- Customer Relationship services
- The Pundit web portal and Dashboard
- The Pundit Client evolution.
Development was mainly managed by Net7, with specific contributions from SpazioDati, in particular for the integration of annotation vocabularies, which exploits the company's API offering. A brief description of each task follows.

Accounting and billing services
In the present offering, the Pundit Annotator Premium (PAM) is the only version designed for paying customers, so requests from PAM must be accepted only from users with an active subscription. For this reason, the Annotation Server logic was slightly modified to extend its authentication and authorization workflow.

Subscription to the service has been implemented in the Pundit Dashboard. Several payment systems were evaluated, including PayPal, Stripe, AcceptOn and Chargebee; in the end it was decided to integrate PayPal (in particular, its Express Checkout service), given its wide adoption as a payment method. A web back-end was implemented with the Symfony framework to manage users' subscriptions (Contracts). Recurring payments are managed directly through PayPal: an automated service checks PayPal for subscriptions that have been cancelled and updates the user status accordingly.

Integration of annotation vocabularies
One of the key aspects of Pundit, always stressed as its distinctive added value, is the ability to create, through annotations, semantic links to the Linked Data cloud. This makes it possible to identify concepts in a rigorous and, above all, machine-readable fashion: the context of the annotated document can be easily identified, without the ambiguity due to polysemy or homonymy that is typical of natural language. It also opens the door to inference, i.e. automatically deducing other data that relate to the annotated document and complement the annotations originally made by the user. This is a significant plus, because it enriches the available information by exploiting knowledge already specified elsewhere.

It is of course fundamental to choose the best possible vocabularies and data repositories to support users in this linking process. Open Data sources are the obvious candidates, and Wikipedia is the most useful resource to integrate, for several reasons. First, given its general nature, it is useful in every knowledge domain (and therefore in every annotation scenario). Moreover, its semantic counterpart DBpedia has a central role in the Linked Data cloud, being the de facto hub that interconnects a huge variety of datasets: further data can be easily obtained by starting from a Wikipedia/DBpedia entity and following its RDF links/predicates towards external data repositories.
Wikipedia is therefore the most obvious vocabulary to support the annotation process. To simplify its integration, Pundit uses a specific service developed by SpazioDati called Wikisearch.

Wikisearch is a semantic search API that helps users find specific Wikipedia pages. It is designed to work even when users don't remember the exact title of a page or have only a vague recollection of the topics it relates to.
For example, imagine a user who has heard about the “Game of Thrones” TV series and wants to know the book series on which it is based. With Wikisearch, she can search for “screenplay game of thrones” and obtain the Wikipedia page titled “A Song of Ice and Fire”, even without any clue about the name of the books.

Wikisearch helps users search over Wikipedia concepts by filtering out disambiguation pages and returning content that is semantically relevant to annotation tasks. It is especially useful in auto-complete scenarios, for example during web page annotation, where users examine only the first few search results before making a decision.
At the moment, Wikisearch provides access to Wikipedia content in 5 languages: Italian, English, German, Portuguese and French (a beta version over the Spanish Wikipedia has also been released).

Wikisearch gives users significant help in identifying Wikipedia concepts during annotation. Moreover, the service also transparently returns the types of the entities, which can be exploited to facilitate the selection, filtering and reuse of annotated data.
Pundit Pro and Pundit Premium use Wikisearch to support users in the semantic annotation process, in which a triple is built using the web page or one of its fragments as object or subject. This is a manual task, which can be tedious for users and error prone, for example if the wrong Wikipedia entity is selected by mistake. For this reason, an automated service named Ann-O-Matic has been introduced in Pundit Premium.

Ann-O-Matic is based on SpazioDati's DataTXT service, also part of the Dandelion API offering. DataTXT is a named entity extraction and linking service, especially effective in recognizing concepts in short fragments of text, something that many similar services do not handle well. It works on English, French, German, Italian and Portuguese texts, while support for Spanish is currently in beta. DataTXT uses Wikipedia as a controlled vocabulary for the identification of concepts: each Wikipedia page is seen as a concept and the links between pages are used to measure how closely concepts are related, so that the whole of Wikipedia can be seen as a graph of topics. Given a text, the service first identifies the entities that might correspond to Wikipedia pages/concepts, then filters out those that are not “close enough” to the others in the graph of topics. This approach proved very effective for identifying concepts, including resolving interpretation problems due to polysemy or synonymy and, more generally, to the ambiguity of natural language.
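The candidate-then-filter idea can be illustrated with a toy version. This is explicitly not DataTXT's algorithm, which the report does not detail: here relatedness is the Jaccard overlap between the link sets of two pages, each candidate is scored by its average best match with the other mentions, and candidates below a threshold are dropped. The function names, the threshold and the assumption that candidate lists are ordered by prior commonness are all hypothetical.

```python
def jaccard(a, b):
    """Overlap of two link sets, used here as a simple relatedness measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def link_entities(candidates, links, min_coherence=0.1):
    """Pick one Wikipedia page per mention, dropping mentions whose best page is unrelated to the rest.

    candidates: dict mention -> candidate page titles, most common sense first.
    links: dict page title -> set of pages it links to (the "graph of topics").
    """
    chosen = {}
    for mention, options in candidates.items():
        context = [opts for m, opts in candidates.items() if m != mention and opts]
        if not context:
            if options:
                chosen[mention] = options[0]      # nothing to compare with: keep the most common sense
            continue
        scored = []
        for cand in options:
            # relatedness to each other mention = best match among that mention's candidates
            per_mention = [max(jaccard(links.get(cand, set()), links.get(o, set())) for o in opts)
                           for opts in context]
            scored.append((sum(per_mention) / len(per_mention), cand))
        if scored:
            score, best = max(scored)
            if score >= min_coherence:            # otherwise not "close enough": filtered out
                chosen[mention] = best
    return chosen
```

In a text mentioning "jaguar" and "engine", the car-related senses reinforce each other, while an unrelated mention with no links to the rest is discarded.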

By using this API, Pundit Premium users can receive “suggestions” on concepts to annotate. With Ann-O-Matic, possible annotations are automatically identified and presented to the user for approval or rejection. The identified concepts are not immediately transformed into annotations: it is the user who reviews the suggestions and decides which to keep and which to discard.

Customer Relationship services
The implementation of this task in StoM had two goals: on the one hand, to provide Pundit customers with a set of services to improve their general user experience; on the other, to gather all possible feedback from users, either through explicit communication or transparently, by monitoring their usage of the system. Rather than a traditional CRM, therefore, Pundit needed a Customer Service system, to support existing users rather than to manage new commercial prospects.

Since plenty of ready-made services exist, the task started by assessing the solutions on the market, in order to identify the most suitable one to integrate in Pundit. At first, the option of using a single system to manage all possible CRM features was considered; the services assessed for this purpose included Highrise, Fat Free CRM and Pipedrive. Although the assessment was not exhaustive and considered only a small number of options, it made clear that Pundit really needed a mix of solutions, not readily available in a single service. In particular, the areas to cover in Pundit for Customer Service are:
- Collecting explicit feedback from users, directly from the Pundit Annotator (this is especially useful to track bugs, support requests or simple informal communications)
- Monitoring their usage of Pundit’s features
- Activating a mailing-list to regularly inform users with news about Pundit.

For the first point, it was decided to integrate Codebase (https://www.codebasehq.com/), the project management cloud service that Net7 uses in its projects, which includes a task manager/bug-tracker system. All users' requests can be automatically transformed into “tickets”, which can later be reviewed and assigned to the right member of the Pundit team. This workflow was very easy to implement and is very effective, since it is based on the same tool that the Pundit team uses to manage its daily development work and operations.

Requests are sent directly from the Pundit Annotator through a form: this produces an email, sent “server side” from a Pundit back-end service to a dedicated Codebase address, which turns the message into an incoming ticket. Incoming tickets can be reviewed from the Codebase interface and accepted or rejected by an operator; accepting one generates a new ticket in the project. This solution, albeit a bit rudimentary, proved quite useful and effective in supporting the launch of the several beta versions of the Pundit Annotator released in the past months, since the Pundit team can very easily receive notifications from users annotating with the tool “in the wild”. In particular, it proved very useful to correct problems with specific sites in which the “injection” of the client-side JavaScript code that makes Pundit work caused incompatibilities with the structure of the page: while it is of course unlikely that Pundit can work seamlessly with every web page on the Internet, this way the development team was able to fix a great number of bugs.
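The form-to-ticket path boils down to composing a plain e-mail on the server and sending it to the tracker's inbound address. A minimal sketch using Python's standard `email` package follows; the inbound address, sender, subject format and field names are hypothetical, and delivery (for instance with `smtplib.SMTP(...).send_message(msg)`) is left out so that the example runs offline.

```python
from email.message import EmailMessage

CODEBASE_INBOX = "pundit-tickets@example.org"   # hypothetical inbound address of the tracker

def feedback_to_email(user_email, category, text, page_url, sender="pundit-feedback@example.org"):
    """Turn a feedback-form submission into an e-mail for the ticketing inbox."""
    first_line = text.strip().splitlines()[0] if text.strip() else "(empty message)"
    msg = EmailMessage()
    msg["From"] = sender                  # sent server side, not from the user's mailbox
    msg["To"] = CODEBASE_INBOX
    msg["Reply-To"] = user_email          # lets the team answer the user directly
    msg["Subject"] = f"[Pundit {category}] {first_line[:80]}"
    # The page URL matters: many reports concern incompatibilities with specific sites.
    msg.set_content(f"Reporter: {user_email}\nPage: {page_url}\n\n{text}")
    return msg
```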

To monitor the usage of Pundit's features, we decided to integrate the Mixpanel service (https://mixpanel.com). While explicit feedback from users is extremely important, there is no guarantee that useful information is always communicated to the Pundit team. At the same time, we wanted to build the tool around its most useful features, in order to avoid implementing unnecessary functionalities. Software engineering studies have in fact shown that only a small percentage of the features of live software systems is used often. With this lesson in mind, it became fundamental to identify the Pundit features that produce value for users, and to make the tool evolve around them.

Mixpanel was the perfect tool to achieve this goal. It is an analytics cloud service, initially designed to monitor events in mobile applications, which provides an SDK to track the actions performed by users in an application, letting developers decide which events to monitor. Data is collected and presented in a very functional and easy-to-use dashboard, in which it is possible to filter or drill down into the information. The integration of Mixpanel proved very useful, both to assess how new functionalities are received and to better understand users: how and how often they use Pundit, which actions they perform most frequently (and therefore where they perceive the real value of Pundit), and also where they are located. It was in fact a surprise to discover that, thanks also to the dissemination work done over the years, Pundit has an international audience.

Finally, a Pundit mailing list has been activated. As usual, the candidate services were assessed first, with MailUp and Mailchimp as the two options. In the end Mailchimp was chosen, mainly because its free plan made it possible to start the service immediately. Until now it has been used primarily for direct mailing in promotional campaigns; a general informative Pundit mailing list, open to the public, has been recently activated.

The Pundit web portal and Dashboard
A significant redesign of the Pundit brand was the starting point for the development of the project's web applications. The Pundit web site (http://thepund.it) existed before the project started, and all development carried out on it in recent years has been done outside StoM. Within the project, we instead invested in creating two web applications, the Dashboard and the Administration interface. The Dashboard is the central point where users manage their annotations: they can review notebooks, search and filter annotations and export them in open formats. The Administration interface, on the other hand, is a back-end application that provides a view of users, their usage of the tool and the active contracts for Pundit Annotator Premium.

The User Interface for the Dashboard was initially designed in the WP2 workpackage. It revolves around the User's Notebooks, a central concept in Pundit: all annotations done with Pundit are collected in Notebooks, which can therefore be seen as thematic containers of the work performed by users on the web. Notebooks are private by default, but they can also be made public, in which case all the annotations they contain are visible on the respective web pages to all Pundit users. More sophisticated access control logic has been designed in StoM and will be implemented in the future.

The Notebook view in the Dashboard presents users with the full list of their Notebooks, together with a summary of the information contained in each of them: in particular, the number of highlights, comments and advanced (that is to say, semantic) annotations is shown at a glance.
By clicking on a Notebook, the Dashboard presents a list of all its annotations. Faceted filters are available to select specific annotations using different search parameters. The title of the annotated web page is also visible, and users can open the page directly in a new window with a click.
Annotations can be exported in OpenDocument Text (.odt) format, and a JSON-LD export is in the works.

Finally, from the Dashboard, users can edit their profile, upgrade their license to use Pundit Premium, manage their subscription to the Pundit mailing-list and send feedback to the development team.

An Administration web application was also developed to give Pundit administrators an overview of registered users and their characteristics. Since most of the detailed information about system usage can be obtained through the Mixpanel reports, this Admin application is quite basic, but it has nonetheless proved very effective. It provides two main views, one on Users and the other on Contracts.
The Users view lists all registered Pundit users, giving quick feedback about their usage of the system. For each user it shows: personal details (email, first and last name); the type of contract (Free or Premium); the subscription to the Pundit mailing list; a quick overview of her usage of the tool, e.g. how many private and public notebooks she has and the total number of annotations she made, aggregated by type (highlights, comments, semantic tagging); and the date of her last login.
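The per-user figures in the Users view are simple aggregations over notebooks and annotations. A sketch, assuming hypothetical `Notebook` and `Annotation` records and an in-memory set of Premium users in place of whatever storage the actual application uses:

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Set

@dataclass
class Notebook:
    owner: str
    public: bool

@dataclass
class Annotation:
    owner: str
    kind: str   # "highlight", "comment" or "semantic"

def user_summary(email: str, notebooks: Iterable[Notebook], annotations: Iterable[Annotation],
                 premium_users: Set[str], last_login: Dict[str, datetime]) -> dict:
    """One row of the (hypothetical) Users view."""
    mine = [n for n in notebooks if n.owner == email]
    return {
        "email": email,
        "contract": "Premium" if email in premium_users else "Free",
        "public_notebooks": sum(n.public for n in mine),
        "private_notebooks": sum(not n.public for n in mine),
        "annotations_by_type": dict(Counter(a.kind for a in annotations if a.owner == email)),
        "last_login": last_login.get(email),
    }
```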

Finally, the Contracts view shows the users who paid a fee to purchase the Pundit Premium service. For each contract it simply shows the state (Active/Closed), the start date and, if the contract is closed, its end date. Since the Premium model uses recurring payments, active contracts have no explicit end date: if a user cancels the subscription, her state in Pundit is updated overnight, when the process that interacts with PayPal notices the cancellation and closes the contract by setting the end date.
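The nightly reconciliation can be sketched as below. Fetching cancelled subscriptions from PayPal is deliberately left out: `cancelled_ids` stands for whatever the PayPal integration returns, and the `Contract` fields and the function name are hypothetical. The state is derived from the end date, matching the rule that only closed contracts carry one.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Set

@dataclass
class Contract:
    user: str
    subscription_id: str          # identifier of the recurring payment at the provider (hypothetical)
    start: date
    end: Optional[date] = None    # recurring contracts have no end date while active

    @property
    def state(self) -> str:
        return "Closed" if self.end is not None else "Active"

def close_cancelled(contracts: Iterable[Contract], cancelled_ids: Set[str], today: date) -> List[str]:
    """Nightly job: close every active contract whose subscription was cancelled."""
    closed = []
    for contract in contracts:
        if contract.end is None and contract.subscription_id in cancelled_ids:
            contract.end = today          # setting the end date is what closes the contract
            closed.append(contract.user)
    return closed
```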

The Pundit client evolution
A significant amount of work has been dedicated to refactoring the SemLib version of Pundit into its final form. The changes concerned both the user interface and the features, especially the implementation of the three versions of Pundit mentioned above.

A very important role in the project was played by “User Interaction Design”: through the application of User Centered Design principles, the Pundit look & feel and functionalities were significantly redefined. This is evident when comparing side by side the Pundit user interface at the beginning of StoM (the actual SemLib outcome) with the new interface available now.

The analysis was based on user stories and visual mock-ups, which helped convey to the team how the Pundit user interface should be developed.
As mentioned, from a marketing viewpoint the Pundit tool was verticalized into three versions: Pundit Annotator, Pundit Annotator Pro and Pundit Annotator Premium. New functionalities were also introduced, including the possibility to simply highlight or comment on a text fragment. Although they rely on the same underlying semantic model as more complex annotations, these features offer a quick and effective entry point for people who want to start using the Pundit family of tools. Here the goal was to make the interface as simple and intuitive as possible: when the user selects a text fragment, a popover asks whether she wants to comment on it or highlight it. If she selects Highlight, the text fragment is given a coloured background and a new highlight icon is added on the right of the web page. If she selects Comment, a text area is shown in which she can write her comment.

When an annotation is created, whatever its type (highlight, comment or semantic annotation), it is shown in the sidebar on the right, together with all the other annotations on the current page.

An experiment was also carried out to integrate the Recommender in Pundit. From a software design viewpoint, the implementation was somewhat tricky because of the diversity of the possible annotations and of the pages to work on. Moreover, since there are currently not many users in the system, the available data is very sparse and hard to analyse to produce meaningful recommendations. The logic of the recommendation feature was based on the following assumptions:
- provide recommendations by grouping annotations by notebook. This assumes that the selected annotations share a similar subject and lowers the risk of presenting out-of-focus and uninteresting results
- order annotations by relevance according to their date, giving more weight to the most recent ones
- associate DBpedia/Wikipedia entities with each annotation. For semantic annotations, which already link to DBpedia entities, this comes for free. For the others, in particular highlights and comments, entities can be extracted from the annotated text fragments using the DataTXT API from SpazioDati.
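The three assumptions above can be combined into per-notebook entity profiles, which could then serve as input to the recommender. In this sketch the exponential recency decay and its 30-day half-life are assumptions (the text only says that recent annotations weigh more), entity extraction is assumed to have happened already, and the optional `to_class` mapping illustrates the idea, discussed below, of collapsing entities into their semantic classes to reduce sparseness. The function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Optional

def notebook_profiles(annotations: Iterable[dict], now: datetime, half_life_days: float = 30.0,
                      to_class: Optional[Dict[str, str]] = None) -> Dict[str, Dict[str, float]]:
    """Build one weighted entity profile per notebook.

    annotations: dicts with 'notebook', 'created' (datetime) and 'entities' (DBpedia URIs).
    Each annotation contributes weight 0.5 ** (age / half_life_days), so recent ones count more.
    to_class: optional entity -> semantic class mapping; unmapped entities are kept as they are.
    """
    profiles = defaultdict(lambda: defaultdict(float))
    for ann in annotations:
        age_days = (now - ann["created"]).total_seconds() / 86400
        weight = 0.5 ** (age_days / half_life_days)
        for entity in ann["entities"]:
            key = to_class.get(entity, entity) if to_class else entity
            profiles[ann["notebook"]][key] += weight
    return {nb: dict(weights) for nb, weights in profiles.items()}
```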

The logic described above seemed quite sound but did not prove very successful. The StoM Recommender is in fact essentially based on a collaborative filtering approach, which cannot be effective when data is scarce and very sparse, as in our case. It might be useful to repeat this experiment in the future, once the number of Pundit users has risen significantly. To reduce data sparseness, it could help to consider only the semantic classes of the Wikipedia entities rather than the entities themselves, even though this may increase the risk of presenting out-of-focus results.

For this reason, the experiment was abandoned and another feature (Social Functionalities) was implemented instead. It mimics the functionalities of Social Media to provide feedback on other people's annotations contained in public notebooks (and therefore visible to all Pundit users): users can vote on (like/dislike) and comment on other people's annotations found in a web page. The Social Features have been implemented in an experimental version of the Pundit Annotator (Lab), which is currently in beta.

EVENTPLACE
EventPlace is one of the main outcomes of the StoM project. The EventPlace system builds on the results of StoM and on IN2's media management platform, MyMeedia (https://mymeedia.com). MyMeedia is a flexible platform providing a range of generic components that can be customised for many applications; it emerged from previous EC-funded projects of IN2. EventPlace is conceived as a vertical application of MyMeedia, which adds to and extends the functionality previously available. The main achievements of EventPlace made possible by StoM were:
- Integration of automatic semantic annotation and entity extraction
- Semantic content recommendations as a means to give users additional exploration options and let them discover new content
- Accounting and billing components
- Better backend management and scalability
- Improved user interaction and content publishing means.

The development of EventPlace took place in the WP4 workpackage and was devoted to improving both the backend and the frontend of the system.

On the backend, the work delivered Accounting and Billing services, User On-Boarding Automation, Semantic Annotation, Instance Management (for better flexibility and scalability) and Semantic Recommendations and Discovery. Frontend development improved the general user interface of the system, the usability of the solution and the whole process of publishing content to the platform.
Each of these achievements is described below.

EventPlace Back-End: Accounting and Billing Services
Several third-party solutions are available to Internet-based services that need to integrate payment and billing into their systems. The options considered were direct integration of credit card processing through IN2’s bank, PayPal, and other payment providers.

IN2 discarded the first option primarily for security reasons and PayPal for usability concerns, and decided to integrate Stripe, a payment system that challenges PayPal with the simplicity of its APIs. Stripe was just being introduced to the UK market, and IN2 was one of its early adopters in Europe. Stripe now operates almost worldwide and offers a range of extensions.
Stripe services are activated in EventPlace at the following touch points:
- It can optionally be switched on during the signup process via a startup parameter. In that case it automatically triggers an e-mail based on the time elapsed since signup (e.g. to cater for a free trial period)
- It can be triggered manually from the administration screen
- It can be triggered manually by the user with a click on their Account page.
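The time-based trigger in the first touch point can be sketched as a check run periodically over accounts. This is a minimal sketch, not IN2’s code and not a Stripe API call: the 30-day `TRIAL_PERIOD`, the `Account` fields and the `billing_enabled` flag (standing in for the startup parameter) are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed trial length; the report does not state EventPlace's actual value.
TRIAL_PERIOD = timedelta(days=30)


@dataclass
class Account:
    email: str
    signed_up_at: datetime
    has_payment_method: bool = False
    reminder_sent: bool = False


def billing_email_due(account: Account, now: datetime, billing_enabled: bool) -> bool:
    """Return True when the automatic 'add your card' e-mail should be sent.

    billing_enabled stands in for the startup parameter that switches billing
    on at signup; the manual triggers (admin screen, Account page) would not
    go through this check at all.
    """
    if not billing_enabled or account.has_payment_method or account.reminder_sent:
        return False
    return now - account.signed_up_at >= TRIAL_PERIOD


acct = Account("organiser@example.org", signed_up_at=datetime(2016, 1, 1))
still_in_trial = billing_email_due(acct, datetime(2016, 1, 15), billing_enabled=True)
trial_over = billing_email_due(acct, datetime(2016, 2, 1), billing_enabled=True)
```

After the e-mail is sent, a real implementation would set `reminder_sent` so that repeated runs of the check stay idempotent.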

The screen for entering credit card details is then rendered as an overlay on top of the user account page.
In addition to monthly payments, Stripe offers the option of yearly payments. For enterprise accounts, IN2 plans to use the regular quote- and invoice-based process.

EventPlace Back-End: User On-Boarding Automation
Once a new user has signed up, a number of processes need to be put in place to ensure that users familiarise themselves with the application and continue to engage with it on a regular basis. These processes of “getting the users on board” are generally referred to as customer on-boarding and are especially important when offering a free trial period, where the goal is to convert registered users into paying customers at the end of the trial.

For SaaS products like EventPlace, customer on-boarding is best done with software tools that let providers understand how users interact with the application and automate the processes designed to engage new users. This is often called marketing automation. Essentially, marketing automation sends e-mails to users based on the actions they perform, for example when they visit a specific section for the first time or abandon a specific task.

For this purpose we tested the Customer.io and Mautic services. Customer.io is a cloud service with a range of features mainly for targeted messaging. Mautic (https://www.mautic.org/) is open source software available both as a cloud service and as a self-hosted application. After comparing the two solutions, we integrated Mautic because of its greater flexibility in capturing the history of interactions, its support for attaching notes to users and its gamification elements. We opted for the self-hosted version and configured the service to run on https://mautic.in-two.com. With Mautic we can define campaigns graphically and attach triggers and actions to them.
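The event-driven idea behind marketing automation (and behind campaigns with triggers and actions) can be sketched as a small rule engine. This does not use Mautic’s API; the `Rule` and `Automation` classes, the event names and the e-mail template names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Rule:
    """One campaign step: when `event` occurs and `condition` holds, queue `template`."""
    event: str
    template: str
    condition: Callable[[dict], bool] = lambda user: True


@dataclass
class Automation:
    rules: List[Rule]
    outbox: List[Tuple[str, str]] = field(default_factory=list)

    def track(self, user: dict, event: str) -> None:
        """Record a user action in their history and queue every matching e-mail."""
        user.setdefault("history", []).append(event)
        for rule in self.rules:
            if rule.event == event and rule.condition(user):
                self.outbox.append((user["email"], rule.template))


# Hypothetical campaign: welcome on the first Backstage visit,
# help e-mail only the first time a task is abandoned.
automation = Automation(rules=[
    Rule("first_visit:backstage", "welcome_backstage"),
    Rule(
        "task_abandoned:create_collection",
        "collection_help",
        condition=lambda u: u["history"].count("task_abandoned:create_collection") == 1,
    ),
])

user = {"email": "organiser@example.org"}
automation.track(user, "first_visit:backstage")
automation.track(user, "task_abandoned:create_collection")
automation.track(user, "task_abandoned:create_collection")  # second time: no e-mail
```

Keeping the full action history on the user is what makes conditions such as “only the first time” possible, which mirrors the value the report sees in Mautic’s interaction history.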

EventPlace Back-End: Semantic Annotation
As part of EventPlace we also extract tags from Instagram and Twitter posts using the automatic semantic annotation and entity extraction service developed within StoM by integrating SpazioDati’s Dandelion API, thereby building up domain knowledge on the topics of events. We integrated this component into the feed import process.

EventPlace imports feeds of social media posts (e.g. through the available social media APIs) and harmonises them for easy ingestion into the service. In StoM we extended this feed import process with tag extraction in the following ways:
- Hashtag extraction
- Post cleansing based on stop words
- Semantic annotation and entity extraction using the StoM components.
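The first two extensions can be sketched with regular expressions. This is a minimal illustration, not EventPlace’s code: `STOP_WORDS` is a tiny hypothetical sample, and the cleansed text is what would then be passed to the StoM semantic annotation components (not shown).

```python
import re

# Tiny illustrative stop-word list; a real one would be larger and per-language.
STOP_WORDS = {"the", "a", "an", "and", "of", "at", "in", "on", "to", "is", "rt"}

HASHTAG = re.compile(r"#(\w+)")
URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")


def extract_hashtags(text: str) -> list:
    """Return hashtags in order of first appearance, lower-cased, without duplicates."""
    tags = []
    for tag in HASHTAG.findall(text):
        tag = tag.lower()
        if tag not in tags:
            tags.append(tag)
    return tags


def cleanse(text: str) -> str:
    """Drop URLs, mentions, '#' signs and stop words before semantic annotation."""
    text = MENTION.sub(" ", URL.sub(" ", text)).replace("#", " ")
    words = re.findall(r"\w+", text.lower())
    return " ".join(w for w in words if w not in STOP_WORDS)


post = "RT @example: Keynote on #SemanticWeb at the #StoM event https://t.co/x #semanticweb"
tags = extract_hashtags(post)
clean = cleanse(post)
```

Hashtag words are kept in the cleansed text (only the `#` is removed) because they often carry the most useful signal for entity extraction.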

The semantic annotation workflow follows these steps: first, the feeds are imported on a regular basis using the APIs of the social networks. For each post, hashtag extraction and automatic semantic annotation are performed, after which the new items are indexed and stored in the database. Feed import and the subsequent processing run asynchronously in the background, and results are made available on the frontend only when processing is complete. This way we can also scale the available infrastructure under high load (so far 12K to 15K posts per day have been processed on average, although this depends on the number and scale of the events EventPlace is handling).
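The asynchronous, background nature of this workflow can be sketched with a work queue and a pool of worker threads: the import step only enqueues posts and returns, and an item becomes visible to the frontend (here, the `index` dict) only once its processing is complete. `annotate` is a hypothetical placeholder for the call to the StoM annotation components, and the thread pool stands in for whatever worker infrastructure is actually scaled out.

```python
import queue
import threading


def annotate(post: dict) -> dict:
    """Hypothetical placeholder for the StoM semantic annotation call."""
    return {**post, "entities": []}


def hashtags(text: str) -> list:
    """Naive hashtag extraction: words starting with '#', lower-cased."""
    return [w[1:].lower() for w in text.split() if w.startswith("#") and len(w) > 1]


class FeedPipeline:
    """Process imported posts off the request path and publish each one
    to the frontend-visible index only when its processing is complete."""

    def __init__(self, workers: int = 2):
        self.inbox = queue.Queue()
        self.index = {}  # what the frontend can see
        self._lock = threading.Lock()
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, posts) -> None:
        """Called by the periodic feed import; returns immediately."""
        for post in posts:
            self.inbox.put(post)

    def _run(self) -> None:
        while True:
            post = self.inbox.get()
            try:
                item = annotate({**post, "hashtags": hashtags(post["text"])})
                with self._lock:
                    self.index[item["id"]] = item  # publish only when fully processed
            except Exception:
                pass  # a real pipeline would log and retry; here the post is skipped
            finally:
                self.inbox.task_done()

    def drain(self) -> None:
        """Block until every submitted post has been processed."""
        self.inbox.join()


pipeline = FeedPipeline(workers=4)
pipeline.submit({"id": i, "text": f"post {i} #StoM"} for i in range(100))
pipeline.drain()
```

Scaling under load then means adding workers, or in a real deployment adding machines that consume from a shared queue; the frontend is unaffected because it only ever reads completed items.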

EventPlace Back-End: Instance Management
One major objective of this work was to ensure that EventPlace is both flexible and scalable. Flexibility is essential to be able to cover the different needs of event organisers both on functionality and interfaces. Scalability is required for being able to handle large events with many participants and large surges of concurrent access to the service.

We performed a complete redesign of the initial EventPlace backend and moved more functionality into reusable components. One of the first changes was to replace the original Jackrabbit XML store with a combination of persistent storage in a cluster of PostgreSQL databases and SolrCloud for distributed search, indexing and retrieval. In the current version, all searches are served by SolrCloud, while all changes to items (e.g. saves and edits) are written to persistent storage first and then synced with SolrCloud. This gives very quick response times when displaying results on the frontend: around 70 milliseconds on average per Solr instance.
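The “persist first, then sync the index; read from the index” pattern can be sketched in a few lines. This is a minimal sketch under stated assumptions: `sqlite3` stands in for the PostgreSQL cluster and a tiny in-memory inverted index stands in for SolrCloud; `ItemStore` and its methods are hypothetical.

```python
import sqlite3


class ItemStore:
    """Write path: persist to the database first, then sync the search index.
    Read path: searches are answered from the index only.

    Assumptions for the sketch: sqlite3 stands in for the PostgreSQL cluster
    and a tiny in-memory inverted index stands in for SolrCloud.
    """

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE items (id TEXT PRIMARY KEY, title TEXT)")
        self.index = {}  # term -> set of item ids

    def save(self, item_id: str, title: str) -> None:
        # The connection context manager commits (or rolls back and re-raises),
        # so if the write fails the indexing step is never reached.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO items (id, title) VALUES (?, ?)",
                (item_id, title),
            )
        self._reindex(item_id, title)

    def _reindex(self, item_id: str, title: str) -> None:
        for ids in self.index.values():
            ids.discard(item_id)  # drop terms from the previous version
        for term in title.lower().split():
            self.index.setdefault(term, set()).add(item_id)

    def search(self, term: str) -> set:
        return set(self.index.get(term.lower(), set()))


store = ItemStore()
store.save("p1", "Opening keynote")
store.save("p1", "Closing keynote")  # an edit: database updated first, then the index
```

Writing to the database first means the index can always be rebuilt from persistent storage if it falls behind; the cost is a short window in which a saved change is not yet searchable.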

To monitor the performance of the backend we installed and configured the monitoring service DataDog (https://www.datadoghq.com/). DataDog provides specific integrations with cloud services and also collects metrics from the running machines and infrastructure. This lets us monitor, for example, memory usage, hard disk utilisation and CPU resources, and draw related charts. It also lets us trigger notifications and events when threshold values are reached or exceeded, so that we can react quickly and keep the system in a healthy state at all times.
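Threshold-based alerting of this kind reduces to comparing each collected metric with a configured limit. The sketch below is generic and does not use DataDog’s API; the metric names and limits are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Threshold:
    metric: str
    limit: float


def check(metrics: Dict[str, float], thresholds: List[Threshold]) -> List[str]:
    """Return one alert message per metric that has reached or exceeded its limit."""
    alerts = []
    for t in thresholds:
        value = metrics.get(t.metric)
        if value is not None and value >= t.limit:
            alerts.append(f"{t.metric}={value} reached threshold {t.limit}")
    return alerts


# Hypothetical limits, expressed as fractions of capacity.
THRESHOLDS = [
    Threshold("memory_used_ratio", 0.90),
    Threshold("disk_used_ratio", 0.85),
    Threshold("cpu_used_ratio", 0.95),
]

alerts = check(
    {"memory_used_ratio": 0.93, "disk_used_ratio": 0.5, "cpu_used_ratio": 0.2},
    THRESHOLDS,
)
```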

Besides system optimisation and monitoring, we also addressed scalability by making it easy to deploy many instances of the application simultaneously. A key aspect here is making application requests stateless, i.e. designing the application so that fulfilling a request does not require knowledge of previous state. For example, when a web page is loaded, the underlying assets (e.g. images, videos, stylesheets) may be served independently by different application instances. EventPlace builds its requests on top of HTTP, which is itself a stateless protocol. However, parts of the system do require knowledge of previous state, e.g. whether a user is logged in. So while it is straightforward to perform stateless requests for users who are not logged in (e.g. on public websites), it becomes much more difficult when authentication and user roles need to be enforced. To enforce these user roles and rights, we implemented in EventPlace a session management mechanism based on a cookie stored in the client browser. Because the session information travels with each request, EventPlace is now fully stateless on the server side.
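The principle of a client-side session can be sketched as a signed cookie: the session data travels with every request, and an HMAC computed with a secret shared by all instances lets any instance verify that the client has not tampered with it. Play’s own session cookie is also signed, but this sketch is not Play’s implementation; `SECRET_KEY` and the `payload.signature` encoding are assumptions.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Assumption: the same secret is configured on every application instance.
SECRET_KEY = b"replace-with-a-long-random-secret"


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def encode_session(data: dict) -> str:
    """Serialise session data and append an HMAC so clients cannot alter it.
    The data is signed, not encrypted: do not store secrets in it."""
    payload = base64.urlsafe_b64encode(json.dumps(data, sort_keys=True).encode()).decode()
    return f"{payload}.{_sign(payload)}"


def decode_session(cookie: str) -> Optional[dict]:
    """Return the session dict if the signature is valid, otherwise None."""
    payload, sep, sig = cookie.rpartition(".")
    if not sep or not hmac.compare_digest(sig.encode(), _sign(payload).encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(payload.encode()))


cookie = encode_session({"user": "alice", "role": "organiser"})
```

Because any instance holding `SECRET_KEY` can validate the cookie, a load balancer can route a logged-in user’s requests to any instance. A production version would also put an expiry timestamp inside the signed data.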

To deploy many instances simultaneously, ideally a whole instance should be up and running with as few actions (commands) as possible and integrate seamlessly into the current operating environment. Applications, however, depend on external libraries, e.g. for accessing system resources, and these libraries must be available at build time for the application to start successfully. Backend ecosystems such as Java’s use dependency management configuration files and dedicated public repositories to fetch these libraries, whereas dependency management for frontend code, notably JavaScript, is much less mature. We have developed a workflow to streamline frontend JavaScript dependencies using current state-of-the-art tools:
- npm, a package manager for the JavaScript runtime environment of Node.js
- Bower, a package manager for web components (external HTML, CSS, JavaScript, fonts, images) and
- Gulp as a build system (e.g. for concatenating and minifying stylesheets and scripts).

In this way frontend dependencies are resolved automatically and included in the application at build time, which ensures that an EventPlace instance starts with all required components in place.

Another important aspect of scalability is the ability to place a load balancer in front of all application instances to decide which one handles each request. This is usually done by the web server. The current state-of-the-art web servers are the Apache HTTP Server and Nginx, each with its own advantages. We evaluated both and finally adopted Nginx, mainly due to its capabilities as a load balancer, web cache and application reverse proxy.
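Once requests are stateless, the balancing decision itself can be as simple as round-robin, which is also Nginx’s default method for distributing requests across an upstream group. The sketch below only illustrates that policy (with instances marked as down being skipped); it is not Nginx configuration, and the instance addresses are hypothetical.

```python
import itertools
from typing import Iterable


class RoundRobinBalancer:
    """Hand each request to the next healthy instance in turn."""

    def __init__(self, instances: Iterable[str]):
        self.instances = list(instances)
        self.down = set()
        self._cycle = itertools.cycle(self.instances)

    def mark_down(self, instance: str) -> None:
        self.down.add(instance)

    def mark_up(self, instance: str) -> None:
        self.down.discard(instance)

    def pick(self) -> str:
        # Try each instance at most once per call before giving up.
        for _ in range(len(self.instances)):
            candidate = next(self._cycle)
            if candidate not in self.down:
                return candidate
        raise RuntimeError("no healthy instance available")


lb = RoundRobinBalancer(["app1:9000", "app2:9000", "app3:9000"])
first_round = [lb.pick() for _ in range(4)]
lb.mark_down("app2:9000")
after_failure = [lb.pick() for _ in range(2)]
```

This policy only works because no instance holds session state: any request, including one from a logged-in user, can go to whichever instance comes next.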

The EventPlace system backend is developed in Java using the Play Framework (https://www.playframework.com/). Play follows a Model-View-Controller (MVC) approach for implementing web applications. Essentially in this approach data models are independent of views and interfaces but are manipulated by controllers (methods and functions). Controllers thus represent the application logic and are the main building blocks of the system.

The MVC approach is particularly well suited to building flexible interfaces and customising them very quickly. We implemented an intelligent extension of Play’s templating functionality that renders custom templates when available, injects custom CSS into existing templates when available, and otherwise falls back to a variety of pre-designed templates.
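The fallback order described above can be sketched as a lookup over candidate template files. This is not Play’s templating API (Play templates are compiled); it only illustrates the resolution logic, and the `custom/<event>/` and `predefined/` directory layout and file names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def resolve_template(
    root: Path, event: str, view: str, style: Optional[str] = None
) -> Tuple[Path, Optional[Path]]:
    """Return (template, custom_css) for an event's view.

    Order: the event's own template, then the chosen pre-designed style,
    then the generic pre-designed template. Custom CSS is attached whenever
    the event provides one. The directory layout is hypothetical.
    """
    candidates = [root / "custom" / event / f"{view}.html"]
    if style:
        candidates.append(root / "predefined" / f"{view}-{style}.html")
    candidates.append(root / "predefined" / f"{view}.html")
    template = next((p for p in candidates if p.exists()), None)
    if template is None:
        raise FileNotFoundError(f"no template for view {view!r}")
    css = root / "custom" / event / "custom.css"
    return template, (css if css.exists() else None)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ("predefined/collection.html", "predefined/collection-gallery.html",
                "custom/eventA/collection.html", "custom/eventB/custom.css"):
        (root / rel).parent.mkdir(parents=True, exist_ok=True)
        (root / rel).write_text("")
    results = [
        (t.relative_to(root).as_posix(), css.relative_to(root).as_posix() if css else None)
        for t, css in (
            resolve_template(root, "eventA", "collection"),
            resolve_template(root, "eventB", "collection", style="gallery"),
            resolve_template(root, "eventC", "collection"),
        )
    ]
```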

EventPlace Back-End: Semantic Recommendations and Discovery
With EventPlace we provide tangible tools for content discovery based on the semantic recommendation and annotation developed in StoM. Event organizers can use this to maximise audience engagement and create compelling personalised user experiences for participants. EventPlace makes available several content exploration features to allow participants to engage with content produced by events, to remain informed and maximise the value of their participation:
- Discover who is talking (Actors) on what (Tags) in which medium (Source) and what format (Format) using powerful filters. These filters can be combined together and are calculated automatically for every collection and sub-collection.
- Discover related stories, where they exist, based on the items they share
- Discover related posts based on content similarity. If no related posts exist, the system falls back to the latest posts.
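The facet filters and the related-posts fallback can be sketched over an in-memory list of posts. In EventPlace the facet counts come from SolrCloud faceting; here a `Counter` plays that role. Tag overlap (Jaccard similarity) is an assumed stand-in for the content similarity actually used, and the `Post` fields are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Set


@dataclass
class Post:
    id: str
    actor: str
    source: str
    format: str
    published: datetime
    tags: Set[str] = field(default_factory=set)


def facet_counts(posts: List[Post]) -> Dict[str, Counter]:
    """Count values per facet (Actors, Tags, Source, Format) for one collection."""
    counts = {name: Counter(getattr(p, name) for p in posts) for name in ("actor", "source", "format")}
    counts["tags"] = Counter(t for p in posts for t in p.tags)
    return counts


def related_posts(target: Post, posts: List[Post], k: int = 3) -> List[Post]:
    """Rank other posts by tag overlap (Jaccard); if none overlap, return the latest posts."""
    others = [p for p in posts if p.id != target.id]
    scored = []
    for p in others:
        union = target.tags | p.tags
        score = len(target.tags & p.tags) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, p.published, p))
    if scored:
        scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
        return [p for _, _, p in scored[:k]]
    return sorted(others, key=lambda p: p.published, reverse=True)[:k]


posts = [
    Post("1", "@alice", "Twitter", "text", datetime(2016, 5, 1, 9), {"keynote", "ai"}),
    Post("2", "@bob", "Instagram", "image", datetime(2016, 5, 1, 10), {"keynote"}),
    Post("3", "@alice", "Twitter", "text", datetime(2016, 5, 1, 11), {"lunch"}),
    Post("4", "@carol", "YouTube", "video", datetime(2016, 5, 1, 12)),
]
facets = facet_counts(posts)
related_to_first = [p.id for p in related_posts(posts[0], posts)]
related_to_last = [p.id for p in related_posts(posts[3], posts)]
```

Post 4 has no tags, so nothing overlaps with it and the sketch falls back to the most recent other posts, mirroring the fallback described in the list above.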

With EventPlace, event organisers let their participants chart out their personal way through the event’s digital content, whenever they need and on all devices they use, making sure that they can find and explore content that is interesting to them.

Technically, we take advantage of the semantic annotation provided by SpazioDati’s DataTXT offering and of the faceting capabilities of SolrCloud, the index engine we use. We also installed and configured a local instance of the StoM semantic recommendation engine, where we store explicit user actions and events that are then used to calculate recommendations of related posts.

EventPlace Front-End: Technical developments for improving the user interface and usability
Functional user interfaces are paramount for retaining customers and improving user acceptance. EventPlace has two main interfaces:
- Backstage: the administration frontend, where event organizers can set-up their feeds, upload multimedia content, create and publish collections and stories
- Stage: the public user frontend where participants interact with the published collections and stories.
Each of these interfaces has dedicated functionality and is optimised for performing the required tasks. The overall design follows a card-based approach. The Backstage also includes a vertical menu on the left side of the page where all actions are grouped together to support the user workflow. Items are represented as cards, as are collections and stories where possible.

The main Backstage actions are edit, delete and publish items. Social media items cannot be edited, only deleted. We designed these elements to minimise the number of actions required and, following Fitts’s Law, the pointer movement needed to perform them. For the same reason, we re-order the list of available collections so that the most recently selected ones always appear at the top.

Publishing an item only involves selecting the collections it will be published to. Importantly, EventPlace allows a single item to be published to multiple collections. This breaks away from the hierarchical folders familiar from desktop applications and allows graphs of connected items to be built: an event can, for example, include one photo in many collections and stories. The collections to which an item is published are marked visually (with a background colour) and include an indicator (+ or - sign) for adding the item to, or removing it from, the collection. We decided to limit the collections shown to the three most recently accessed, since we found that this fits the user workflow better and does not clutter the view and interface.
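Publishing to multiple collections and the shortlist of the three most recently accessed collections can be sketched together: memberships form a many-to-many mapping from items to collections, and each +/- toggle moves the collection to the top of a most-recently-used list. The `PublishPanel` class and all names in it are hypothetical.

```python
from collections import OrderedDict
from typing import Dict, List, Set


class PublishPanel:
    """Many-to-many item/collection memberships plus a most-recently-used
    shortlist of collections (hypothetical model of the Backstage publish panel)."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self.memberships: Dict[str, Set[str]] = {}  # item id -> collections it is published in
        self._recent = OrderedDict()  # collection -> None, oldest first

    def toggle(self, item: str, collection: str) -> None:
        """'+' publishes the item to the collection, '-' removes it;
        either way the collection becomes the most recently used."""
        cols = self.memberships.setdefault(item, set())
        if collection in cols:
            cols.remove(collection)
        else:
            cols.add(collection)
        self._recent.pop(collection, None)
        self._recent[collection] = None

    def shortlist(self) -> List[str]:
        """Most recent collections first, capped at `limit`."""
        return list(self._recent)[::-1][: self.limit]


panel = PublishPanel()
panel.toggle("photo-1", "Keynotes")
panel.toggle("photo-1", "Day 1")
panel.toggle("photo-1", "Speakers")
panel.toggle("photo-2", "Sponsors")
panel.toggle("photo-1", "Keynotes")  # '-': removes photo-1 from Keynotes
```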

The interactive components of the Backstage have been implemented using AngularJS, a JavaScript framework that follows a similar MVC (Model-View-Controller) architectural approach to what we use in the EventPlace backend. Using AngularJS we were able to separate the actual data from the views and reuse templates for rendering frontend components.

The design of event Stages (i.e. public user frontends) also follows a card-based approach, with emphasis on presenting diverse information consistently. Since items range from pure text (e.g. Twitter) to purely visual content (e.g. Flickr and Instagram), we decided to provide several predefined templates for information presentation. In all views where collections are shown, the first card contains the meta-information of the collection (name, event, search and filters) to give the user context for what is being shown. Users can follow collections either by subscribing to a newsletter through the “Follow” button on the information card, or by subscribing to the RSS feed of the collection (the symbol at the bottom right of the information card).

Individual posts are shown as close to full width as possible and include options for sharing to social media and via e-mail.
All views in both Backstage and Stage are fully responsive, i.e. designed to work seamlessly on all devices: desktops, tablets and mobile phones. This is especially important since a large percentage of event participants are active only on mobile phones and tablets. Furthermore, because EventPlace is a web-based application, it works on all mobile phones and tablets regardless of vendor (e.g. Apple, Google, Samsung) and operating system (e.g. iOS, Android). This allows a wider reach among event participants and is far more cost-effective for events than developing native applications for every vendor and operating system.

EventPlace Front-End: Publishing
A significant improvement was also made to the publishing process, to give event organizers maximum flexibility in presenting their content. EventPlace now provides a variety of publishing templates from which event organizers can choose to best fit their needs and the type of content they have available. The designs take into account different types of content (text only, visual only, mixed text and visuals) and arrange them in the following styles:
- Feed layout
- Grid layout -- variable height, where the background colour corresponds to the item source e.g. blue background for Twitter items, red background for YouTube ones
- Grid layout -- variable width
- Gallery
- Slideshow.

Potential Impact:
The main impacts of StoM must be measured in terms of the expected economic results of commercialising the project outcomes. This aspect has been analysed in full in the Business Plans for Pundit and EventPlace, produced in the WP1 workpackage.

The context for each product has been described separately in Deliverable D1.2. For each product, the analysis provides:
- A description of the Business Idea
- A description of the Target Market and Segmentation
- An analysis of the Business Model and the Selling Strategy
- An overview of each company’s Business Organisation and Resources
- An estimate of Financial impacts.

As far as Pundit is concerned, the business plan considers a revenue model based on both single user and corporate licenses, adopting a freemium model across the three versions of the tool. In this model the basic Pundit Annotator is available for free, while the Pro and Premium versions are paid.

Single user licenses allow for an upgrade path in which a user starts with the basic solution and moves up to the higher levels according to his/her specific needs.
Corporate licenses allow Universities, Professional Associations and Companies to offer the product to their employees/members.

The corporate licenses include a set of single user Premium licenses and some supporting services, such as training in effective use of the system and customisation for the specific needs of the corporate entity (e.g. developing ad-hoc plug-ins and domain-specific ontologies).
A pricing hypothesis has also been presented, using the prices of other SaaS solutions as a reference. In particular:
- Pundit Annotator, Single user license: 0€/month
- Pundit Annotator Pro, Single user license: 2€/month with a free trial period of 30 days.
- Pundit Annotator Premium, Single user license: 10€/month
- Pundit Annotator Premium + Supporting Services (Package), Corporate license (< 30 users): 5.000€/year
- Pundit Annotator Premium + Supporting Services (Package), Corporate license (> 30 users): To be agreed with the customer.

We identified two main approaches to the market:
- Direct sales of licenses (in particular Single User Licenses) through the Web site.
- Engage customers via company’s channels and partnerships (in particular for Corporate Licenses).

Both approaches (but in particular the former) require a strong online/offline marketing activity. Therefore, we devised a commercialisation framework based on the following main steps:
- The first step of the strategy will be to create attention in a Total Available Market through a series of activities called “Build audience and Awareness”.
- The second step will be to acquire users in available markets (identified by monitoring the activities done in the “Awareness” stage) through a series of activities called “Engagement”.
- The third step will be to take action and monetize users/customers in a Target Available Market (identified at this point by monitoring the activities done in the “Engagement” stage) through a series of activities called “Conversion and Retention”.

The proposed framework could deliver concrete benefits (i.e. relevant sales) in the medium/long term. Therefore, to realise the value of the Pundit Annotator solution in the short term, Net7 should engage with a series of stakeholders, such as target customers and relevant strategic partners. The aim is to sustain the early phases of business development economically through corporate license sales and agreements while direct sales take off.
In addition, access to private financing, such as equity and investment rounds, will also be considered.

With the software product established (i.e. available to users), in the first year Pundit will mainly leverage the deal-flow and financing opportunities from corporate license clients to drive global growth. In this way we will implement a sustainable scale-up, and cumulative sales over the first three years would reach about 500.000 euros.

This level of sales represents a conservative but achievable result for the next three years and corresponds to approximately 20 corporate clients and 6.500 single user licenses.

This is certainly much less than our most established competitor (Evernote), but this analysis does not currently include support from external investors and/or other funding (e.g. European/National public funds).

Costs, on the other hand, have been estimated at up to 85.000€ for the first year, growing in the second and third years together with revenues, and remaining within 430.000€ over the three years. Although this is a conservative analysis, it is worth pointing out that the cumulative turnover from Pundit over 3 years is expected to be roughly double Net7’s budget in the StoM project.

EventPlace will be commercialised by IN2 as a Software as a Service (SaaS) platform. It mainly targets the following two classes of users:
- Event Attendees, which include individuals, employees of corporations and associations;
- Event Organizers, which include a quite broad set of intermediaries, such as professional event/congress organizers, exhibition/event management companies, business travel and MICE tour operators.

Of the two groups, the Event Organisers/Intermediaries are the actual target customers for EventPlace (i.e. those who will pay for the EventPlace solution). In particular, EventPlace will target exhibition/event management companies and professional congress/event organizers which, with their planning teams (either internal or outsourced), deal with event creation, curation and follow-up. These classes of customer can offer the EventPlace solution as part of their services for organizing and managing an event. Depending on the size of the company/team, they may organize and manage multiple events yearly: generally, small companies/teams deal with a few small/medium events per year, whilst big companies/teams deal with several medium/big events per year. This distinction should be reflected in the EventPlace offering.

In addition, EventPlace can directly target convention centers (i.e. organizations that manage or own convention centers), which commonly offer physical spaces for organizing events (usually many events during the year, almost one per week). Recently, however, they have been extending their offering with additional services that either support the event organizer or, in some cases, directly organize the event. EventPlace could therefore be offered as a novel supporting service.

The needs of all of these customers are more or less the same: effectively monitoring and aggregating social media during events and creating compelling follow-ups afterwards. Distinctions between these customers mainly concern the number and size of the events they manage in a year.

To address the different expectations of target customers regarding the number and size of their events, EventPlace will follow two main revenue models:
- Event Licenses. In this model the EventPlace license is limited to the duration of an event. This model is particularly suited to companies that do not manage many events during a year. To target events of different sizes, different license prices are planned.
Small Event (<250): 400 €
Medium Event (<1000): 600 €
Large Event (>1000): 800 €

- Annual Licenses. In this model the EventPlace license lasts one year and the customer can use the platform at any time. This model is particularly suited to companies that organise many events per year. In this case, the different prices reflect the type of supporting service IN2 will give the customer. In the “Pro” version IN2 will give minimal support to set up the platform and train the customer to use it, while in the “Premium” version IN2 will offer constant support and opportunities to customise the platform to the specific needs of the customer. Finally, in the “Enterprise” version IN2 will create a specific vertical of EventPlace, with ad-hoc developments (e.g. graphic user interface) and supporting services.
EventPlace Pro: 10.000 €
EventPlace Premium: 15.000 €
EventPlace Enterprise: To be agreed with the customer.
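The license prices above map directly to a small pricing function. It is a sketch under two assumptions not stated in the plan: the event size thresholds count participants, and an event with exactly 1000 participants falls in the Large tier.

```python
from typing import Optional, Tuple

# Prices (EUR) from the business plan. Reading the thresholds as participant
# counts, and putting exactly 1000 participants in "Large", are assumptions.
EVENT_TIERS = ((250, "Small", 400), (1000, "Medium", 600))
LARGE_EVENT = ("Large", 800)
ANNUAL = {"Pro": 10_000, "Premium": 15_000, "Enterprise": None}  # None: agreed with the customer


def event_license(participants: int) -> Tuple[str, int]:
    """Return (tier, price in EUR) for a single-event license."""
    if participants < 0:
        raise ValueError("participants must be non-negative")
    for upper, tier, price in EVENT_TIERS:
        if participants < upper:
            return tier, price
    return LARGE_EVENT


def annual_license(edition: str) -> Optional[int]:
    """Annual price in EUR, or None when it is negotiated case by case."""
    return ANNUAL[edition]
```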

For both revenue models, an additional revenue stream can be created by offering a dedicated person to support the event organizer in managing the EventPlace platform during the event. Event organizers may not have the capabilities and/or the time to deal with the platform during the event, so IN2 could reinforce the event organizer’s team with an additional person who knows EventPlace and can animate and curate the social activities during the event.

This additional service is already part of the Premium/Enterprise offering and could be an add-on for all the other solutions.

As for Pundit, we identified two main approaches to the market for EventPlace (the two approaches are not mutually exclusive and one can feed the other):
- Direct sales of licenses (in particular Event Licenses) through the Web site;
- Engage customers via company’s channels and partnerships (in particular for Annual Licenses).

From a marketing perspective, both approaches require a strong online/offline marketing activity. Therefore, we devised a commercialisation framework based on the following main steps:
- The first step of the strategy will be to create Attention in a Total Available Market through a series of activities called “Build audience and Awareness”.
- The second step will be to acquire users in a Served Available Market (identified by monitoring the activities done in the “Awareness” stage) through a series of activities called “Engagement”.
- The third step will be to take action and monetize users/customers in a Target Available Market (identified at this point by monitoring the activities done in the “Engagement” stage) through a series of activities called “Conversion and Retention”.

Having already demonstrated the EventPlace solution to potential early adopters, in the first year EventPlace will mainly leverage the deal-flow and financing opportunities from converting engaged early adopters into clients (annual licenses), and will then create local partnerships (starting from the UK) for selling event licenses. Specifically, the plan for the first year is to secure 8 annual licenses and support at least 8 further events.
The focus on annual licenses will support a sustainable scale-up of the EventPlace solution and at the same time create awareness of EventPlace. Cumulative sales over the first three years would reach about 1.500.000 euros, mainly from annual licenses.

Costs, on the other hand, will grow with the number of customers and should remain within 900.000 € over the three years. In this scenario, it is worth highlighting that over 3 years the Net Cash Flow is expected to be more than double IN2’s budget in the StoM project.
In the long term (i.e. after 4-5 years), event licenses are expected to take off with a more global market approach (e.g. targeting the US and Asian markets).

DISSEMINATION
The dissemination activities have been consistent with the originally planned tasks of the project proposal: the emphasis has been put on the dissemination of project information on the Internet and the participation in events.
In the initial phase of the project, priority was given to promoting the website and using the StoM hashtag (#stom_eu) on Twitter and LinkedIn. The first StoM press release was issued in month three and widely distributed to the project target groups to explain the basic aims of the project.

Later on, StoM technologies were presented during various events, with the partners aiming at maximizing the visibility of the project.
Specific Skype calls for managing dissemination were arranged throughout the project. At the start it was agreed that the Partners would also promote StoM and its activities via their websites and social media channels, for example by:
- including a link to the StoM website on the company website,
- publishing press releases on their websites,
- promoting StoM news via their websites, social media channels and mailing lists,
- promoting the project also during events aimed at presenting other activities,
- using the project logo on every publication related to StoM.

As the logo provides the first insight into the project identity, the Consortium aimed for a clear design that would let people identify the key area of activity. The StoM project logo therefore represents the concept of semantic technology and ontologies through a simple example of a semantic graph.
The logo was used on all project-related presentations, publications, reports, deliverables and dissemination material. Together with the website background image, it forms a coherent project brand identity.

The project website (http://www.stom-project.eu) was the crucial instrument for communicating the project’s results and spreading information about project activities.

The website was launched at the very start of the project and has been regularly updated ever since: both the Project Coordinator (Net7) and the appointees for Dissemination have access to the WordPress platform through which the content and structure of the website are managed. The ‘News’ section was regularly updated, first by Techin and then by INNOVA, based on contributions from the Partners. The project website will remain accessible for two years after the end of the project.

Participation in events helped promote the project to targeted audiences. Events were selected according to their expected type of audience. Events gathering industry and academia were the priority in the first stage of the project, while in the second period the aim was to disseminate more information on the platforms and their functionalities. In line with the development of Pundit and EventPlace, dissemination focused increasingly on promoting the two solutions to their target audiences.

Dissemination was tailored to the needs of separate target groups, taking into account functionalities of each platform and needs of each target group. Simultaneously, dissemination of general information on StoM continued.

Furthermore, dissemination activities were reviewed, revised and updated in line with the development of other deliverables (e.g. the Business Plan) and of the platforms, and in light of evaluations of the effectiveness of the dissemination activities already performed.

Events and online dissemination were reported through a dedicated form made available to Partners via Google Docs. The responses are collected into an Excel table listing all events, which gives a clear overview of the dissemination activities carried out by each partner. To ensure that the form was kept up to date, a reminder was sent every two months, first by Techin and then by the INNOVA team.
The full list of Dissemination activities and events has been inserted in the Participant Portal.

Dissemination also included the creation of videos to showcase the features of Pundit. In fact, one of the main difficulties the StoM team encountered during the project was explaining in a few words, as in an “elevator pitch”, what a semantic annotation is and what its benefits are. It is essential to show everybody the value of Pundit immediately, and video is undoubtedly the most immediate and effective medium for doing so.
It is no coincidence that two videos were produced at the very beginning of the project to show the potential of the tool as it stood at the end of the SemLib project. Now that, thanks to StoM, the product has reached stability and functional maturity, it made sense to use videos again to introduce Pundit to the public.

Seven Pundit videos have been produced. A series was created rather than a single release, not only to showcase the main features of the system but also to demonstrate its possible use in real-life scenarios. The videos therefore also include use cases, which show how the system can be applied to real-life situations, and tutorials, which give step-by-step guides to setting up and using the tool.

Finally, at the end of the project two press releases were produced, one for each platform. They highlight the current state of development and the potential of Pundit and EventPlace, also taking into account the advantages they offer compared with what already exists on the market.

List of Websites:
StoM web site: http://www.stom-project.eu/

Project coordinator and contact point for Pundit: Luca De Santis, email: desantis@netseven.it
Contact point for EventPlace: George Ioannidis, email: gi@in-two.com
Contact point for the Recommender: Michele Barbera, email: barbera@spaziodati.eu
Contact point for StoM Business Planning: Alessio Gugliotta, email: a.gugliotta@innova-eu.net
Contact point for StoM Dissemination: Cristina Fregonese, email: c.fregonese@innova-eu.net

Final Pundit press release: http://www.stom-project.eu/wp-content/uploads/2016/05/STOM-Press-Release_PUNDIT_1.0-1.pdf
Final EventPlace press release: http://www.stom-project.eu/wp-content/uploads/2016/05/STOM-Press-Release_EVENTPLACE_1.0.pdf
