CORDIS - EU research results

Social Sentiment analysis financial IndeXes

Periodic Reporting for period 3 - SSIX (Social Sentiment analysis financial IndeXes)

Reporting period: 2017-03-01 to 2018-02-28

Social Sentiment Analysis Financial IndeXes (SSIX) was a ‘Big Data and Open Data Innovation and Take-Up’ action under the Horizon 2020 programme. The objective of this action was to improve the ability of European SMEs to develop innovative, multilingual data products and services by converting large data volumes into semantically interoperable data assets and knowledge. SSIX supported this objective by creating the SSIX Platform, a collection of versatile components which can be used to create data-driven analytics from large volumes of multilingual text content. Multilinguality is one of the principal benefits which SSIX delivers, as non-English support is underserved in the market. The project's aim was to extract relevant and significant signals from increasingly influential social media platforms such as Twitter, Facebook and StockTwits, as well as from traditional news feeds and blogs. Essentially, SSIX is a platform producing sentiment metrics which can be used to help make better, more informed decisions. For the finance (investing) domain, previous research has already shown that analysing social media content can produce a predictive signal on future stock market direction. Similarly, the SSIX platform's sentiment metrics can be used as a leading indicator, helping to generate alpha while also decreasing risk.
This section describes the activities and work progress for the entire project duration:
WP1 was responsible for project management, reporting and risk management, and for running the three controlling boards. It produced 12 reports.

WP2 was responsible for defining the requirements and use cases to be supported by the SSIX platform. The outcomes are reported in D2.1 & D2.2.

WP3 produced the techniques for collecting, storing and filtering the source content used for the SSIX platform's analysis. Efforts towards these tasks are explained in detail in the following deliverables: D3.4 provides a technical description of the architecture developed for data ingestion. D3.5 describes the procedures implemented to perform sampling. D3.6 outlines the API endpoints. D3.7 contains the technical documentation illustrating how to interact with the Streaming APIs. D3.8 documents the final architecture, which was the outcome of three years of continuous improvement and fine-tuning. WP3 also delivered three Data Management Plans.

WP4 was responsible for the NLP services and analysis pipeline to be used in the SSIX platform. The definition of the NLP service and analysis architecture to be used in the platform is reported in D4.1 and D4.2. The multilingual language resource acquisition and management task delivered two Language Resources catalogues (D4.3, D4.4). The NLP Service and Analysis Implementation tasks produced three reports (D4.5, D4.6, D4.7).

WP5 was responsible for the SSIX Platform Deployment, Validation and Evaluation. D5.1 - SSIX Process Definition Specification outlines the main software process flows and related business processes that were to be interlinked within the SSIX platform architecture. The architecture design of the SSIX platform is outlined in D5.2. D5.3 - SSIX Technical Validation Plan of selected Business Cases focuses on developing test cases and baselines to validate the implemented platform against the requirements from WP2. The development of the SSIX platform prototype is covered in D5.4 and D5.5, which describe the overall SSIX architecture, highlight its different layers and explain the operation of the different work packages within the SSIX platform. D5.6 - SSIX API Definitions contains the technical documentation of the SSIX Platform API. The technology integration and testing principles adopted for the SSIX project are described in D5.7 and D5.8. D5.9 - SSIX Platform Assessment Report reports on the testing activities performed on the SSIX platform.

WP6 was responsible for Technology Transfer and Dissemination in the SSIX project. Four editions of the Project Website, Wiki, LinkedIn and Training Materials report (D6.2, D6.3, D6.4, D6.5) were delivered. Technology Transfer (Training and Dissemination Plan) in the SSIX project was reported in D6.6, D6.7, D6.8 and D6.9.

WP7 was responsible for Exploitation and Commercialisation of the SSIX project. D7.1 is a market assessment for industry and end-user partners. D7.2 covers the API Commercial Toolkit and Services Trials. Commercialisation goals and guidelines are documented in D7.3, which outlines the commercialisation plan to be pursued by the consortium members. D7.4 - Exploitation Strategy and Go To Market Plan outlines the comprehensive strategy to bring the outcomes of the SSIX project to the market. The Pilots in Industry report (D7.5) describes the industrial pilots performed.
WP2
Our findings show that while there are tools available to handle sentiment analysis of content extracted from social networks, they mostly report sentiment as positive/negative or positive/neutral/negative. Apart from it being difficult to audit how these ‘labels’ were assigned, such coarse granularity is of no use for quantitative modelling of financial data. What makes SSIX unique is that it lays the foundations for generating ‘technical sentiment data’ parameters, or ‘X-scores’, which lie within a continuous numeric interval (-1;1). These have the potential to be adopted by the quantitative investment industry and to reach the same level of utility as commonly used parameters such as P/E ratio, RSI or MACD.
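The difference between a coarse label and a continuous score can be illustrated with a minimal Python sketch. It is not the SSIX X-score method, which this summary does not specify: the `ScoredMessage` type, the per-message `weight`, the weighted-mean aggregation and the `threshold` value are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ScoredMessage:
    score: float   # hypothetical per-message sentiment, assumed to lie in [-1, 1]
    weight: float  # hypothetical non-negative weight, e.g. author reach


def aggregate_sentiment(messages):
    """Weighted mean of per-message scores; returns 0.0 if total weight is zero.

    Because every score lies in [-1, 1] and weights are non-negative,
    the result also lies in [-1, 1].
    """
    total_weight = sum(m.weight for m in messages)
    if total_weight == 0:
        return 0.0
    return sum(m.score * m.weight for m in messages) / total_weight


def to_label(score, threshold=0.1):
    """Collapse a continuous score into a three-way label (loses magnitude)."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    batch = [ScoredMessage(0.8, 1.0), ScoredMessage(-0.2, 1.0)]
    value = aggregate_sentiment(batch)
    print(value, to_label(value))
```

The continuous value keeps the magnitude that a quantitative model can use directly, whereas `to_label` maps both a mildly and a strongly positive batch to the same string.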

WP3
Performance tests highlighted stability issues caused by the high volumes of parallel data when listening to the most discussed financial markets; this bottleneck was overcome by scaling hardware resources. Shifting our experiments to cloud technologies provided by the Google Cloud Platform reduced the time needed for data storage and extraction and helped the parallel computing processes to scale. A stratified sampling technique was developed to extract content from large historical data sets. This technique was adopted for creating the data sample used in the production of the platform's custom classifiers.
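As a rough illustration of stratified sampling in general, the Python sketch below draws the same fraction from each stratum so that the sample preserves the proportions of the full data set. The procedure actually implemented in SSIX is described in D3.5 and is not reproduced here; the function name, the `key` function, the choice of language as the stratum and the rule of keeping at least one record per stratum are assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Sample `fraction` of the records from each stratum defined by `key`.

    Every non-empty stratum contributes at least one record (an assumption
    made here so that small strata are not lost).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible

    # Group records by stratum label.
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)

    # Draw without replacement from each stratum, in a stable label order.
    sample = []
    for label in sorted(strata):
        group = strata[label]
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


if __name__ == "__main__":
    # Hypothetical records: (language, text) pairs.
    data = [("en", f"msg {i}") for i in range(80)]
    data += [("de", f"msg {i}") for i in range(20)]
    picked = stratified_sample(data, key=lambda r: r[0], fraction=0.1)
    print(len(picked))
```

Stratifying by a field such as language or source keeps minority groups represented, which a simple random sample over a skewed data set may fail to do.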

WP4
By the first year, we concluded that the available multilingual domain-specific and sentiment lexica may not provide the features expected for the opinion mining needs of this project. In parallel, several Big Data analysis infrastructures were analysed for their suitability as the foundation for the pipeline architecture. Years 2 and 3 saw development of SSIX custom sentiment classifiers for financial microblogs, the Brexit referendum and the 2017 German elections. Benchmarking tests for the financial microblogs classifier show promising results against current state-of-the-art (SOTA) services. Efforts have also gone into a custom machine translation service and an aspect-based sentiment analysis (ABSA) classifier.

WP5
The design anticipated the need for scalability and efficiency. The major system components were designed to operate independently of each other, so they can be distributed or centralised depending on the deployment scenario and the load on the system. Areas of potential innovation include testing new classification models, building a system for statistical calculations and NLP classification using massively parallel computing, and researching new visualisations to aid end users in the decision-making process.
Figure 1: SSIX Platform Overview