efficient Syntactic Analysis for Large-scale Sentiment Analysis

Project information

SALSA

Grant agreement ID: 101100615

DOI

10.3030/101100615

Closed project

EC signature date: 4 October 2022

Start date: 1 February 2023

End date: 31 July 2024

Funded under

European Research Council (ERC)

Total cost

No data

EU contribution

€ 150 000.00

Coordinated by

UNIVERSIDADE DA CORUNA
Spain

Periodic Reporting for period 1 - SALSA (efficient Syntactic Analysis for Large-scale Sentiment Analysis)

Reporting period: 2023-02-01 to 2024-07-31

One of the key aspects of any successful business is knowing how customers feel about its brand and products. Sentiment analysis (or opinion mining) tools can therefore be of great help to companies. However, current state-of-the-art sentiment analysis solutions, both commercial and academic, have important drawbacks: low accuracy, slow response times (around 100-1000 ms), high computational cost and/or high price (around €500/month on average). These drawbacks restrict such solutions to established big brands, social listening agencies or consulting firms offering social listening services.

The aim of SALSA was to democratize the transformation of internet and social media data into knowledge for decision makers, making large-scale sentiment analysis technology viable for small entities without massive computational power. SALSA explored the potential of the powerful models and algorithms developed within ERC Starting Grant FASTPARSE to create the first AI-based syntax-guided sentiment analysis engine that is: a) accurate, because it uses syntactic information to infer the opinions contained in each sentence from its structure and the relationships between its words, rather than shallow methods that consider words in isolation; and b) cost-effective, because it employs fast parsers that have a throughput on the order of 1000 sentences per second on consumer-grade hardware and that can work without time- and memory-hungry large language models.

SALSA followed an open-source software business model in which we explored several sources of revenue, most of them based on service-level agreements. This will contribute significantly to the competitiveness of the EU tech sector, reducing its dependency on the oligopoly of technology giants (mostly American and Chinese) that currently hold a dominant position in language technologies, largely thanks to their enormous computational resources.

The goals of SALSA were achieved: the project successfully leveraged fast syntactic analysis algorithms to create a sentiment analysis solution that processes text much faster than existing systems while offering state-of-the-art accuracy. A demo system was developed and shown to potential partners and client companies in a series of interviews and demos to calibrate the market and define the business model. The participating entities received it very positively, showing high interest in the proposed system and a preference for a SaaS model with revenue sharing and flexible payment options. This indicates that SALSA has a great opportunity to develop into a successful business offering sentiment analysis services.

From the scientific research standpoint, we have produced a wide range of scientific results, spanning two main aspects: (1) improving fast syntactic parsing algorithms that can efficiently obtain the internal structure of sentences, achieving greater accuracy and robustness while preserving speed, and (2) using them to power fast, accurate and explainable sentiment analysis. Among the scientific results, we here highlight some of the most relevant ones:

- Improvements to the accuracy of syntactic parsing for the two best-known and most widely used syntax representations, i.e. dependency and constituency parsing, with developments using deep learning technologies such as hierarchical pointer networks and sequence-to-sequence models.

- Development of new methods for both dependency and constituency parsing that are fully incremental, inspired by how humans understand linguistic input from left to right, and that save resources by using simpler architectures that only look at the words read so far.

- New methods for dependency parsing as sequence labeling (currently, the fastest known paradigm for parsing): we developed new, compact ways of encoding trees that improve this approach providing extra accuracy while keeping high speed.

- Development of a fast, accurate and explainable sentiment analysis system combining parsing as sequence labeling with syntactic rules to determine whether the opinion expressed in a text, in general or about specific aspects, is positive or negative.

- Studies on how this approach to sentiment analysis can be adapted to different domains via automatic acquisition of sentiment dictionaries.
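The idea behind parsing as sequence labeling, mentioned in the list above, can be illustrated with a minimal sketch. It assumes a simple relative-offset encoding in which each word receives one label made of the distance to its head plus the dependency relation. This is a basic, well-known style of encoding chosen for illustration only; it is not the compact encodings developed in the project, and the function names `encode` and `decode` are hypothetical.

```python
from typing import List, Tuple


def encode(heads: List[int], rels: List[str]) -> List[str]:
    """Turn a dependency tree into one label per word.

    heads[i] is the 1-based position of the head of word i+1 (0 = root).
    Each label is "<signed offset to head>|<relation>", or "ROOT|<relation>".
    Illustrative encoding only (an assumption, not the project's encodings).
    """
    labels = []
    for i, (head, rel) in enumerate(zip(heads, rels), start=1):
        offset = "ROOT" if head == 0 else f"{head - i:+d}"
        labels.append(f"{offset}|{rel}")
    return labels


def decode(labels: List[str]) -> Tuple[List[int], List[str]]:
    """Recover head positions and relations from the per-word labels."""
    heads, rels = [], []
    for i, label in enumerate(labels, start=1):
        offset, rel = label.split("|", 1)
        heads.append(0 if offset == "ROOT" else i + int(offset))
        rels.append(rel)
    return heads, rels


# "The service was not good", with "good" as the root of the tree.
words = ["The", "service", "was", "not", "good"]
heads = [2, 5, 5, 5, 0]
rels = ["det", "nsubj", "cop", "advmod", "root"]

labels = encode(heads, rels)
print(list(zip(words, labels)))
```

Once a tree is written as one label per word, any standard sequence tagger can predict the labels in a single pass over the sentence, which is what makes this paradigm fast; `decode` then rebuilds the tree from the predicted labels.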

On the other hand, the goals of this Proof-of-Concept Grant go beyond pure research to practical applicability in markets. In this respect, we have successfully performed various activities with the goal of turning the aforementioned sentiment analysis system into a viable product that can be launched to market. In particular, we have:

- Conducted a market analysis to assess the market potential of our solution.

- Validated the value proposition by interviewing target users, as well as demoing our models to them in live workshops. Models were tested iteratively with potential users as they were built and improved, so we could make them more useful for the potential end users.

- Established contact with possible partners that could integrate our solution and help us enter the market, who also participated in the demos.

In all these activities, we observed a very positive reception from both potential industry end users and potential partners. Their feedback helped us establish the proposal's product-market fit, create a business model canvas and sketch a road to market, leading us to conclude that there is great potential to achieve market penetration.

First of all, we have substantially advanced the state of the art in syntactic parsing on several fronts. In terms of raw accuracy, we set new state-of-the-art results on several parsing benchmarks: English and Chinese dependency parsing for non-contextualized BERT-based embeddings, as well as discontinuous English constituency parsing. In terms of speed, we created new sequence labeling encodings for dependency parsing that are bounded, providing an excellent speed-accuracy tradeoff. Finally, in terms of understanding, we have created the first fully incremental parsing models since the deep learning evolution, which further our understanding of how to create efficient models that mimic human processing.

In terms of applying parsing to sentiment analysis, which is the core goal of the project, we have advanced the state of the art by combining our syntactic parsers with rules that compute sentiment from sentence structure, achieving a system that matches the accuracy of existing syntax rule-based systems while being much faster. The equivalent accuracy was an unexpected (positive) development: such systems typically exhibit a speed-accuracy tradeoff in which optimizing speed means sacrificing some accuracy, but this was not the case here, which makes our proposal especially competitive.
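To make the notion of computing sentiment from sentence structure more concrete, here is a minimal sketch of syntax-guided scoring over a dependency tree. Everything in it is an assumption for illustration: the toy lexicon, the negator and intensifier lists, and the two rules (a negator flips the polarity of the subtree of the word it depends on, an intensifier scales it). The project's actual rules and dictionaries are not described in this report.

```python
from typing import Dict, List

# Toy resources, hypothetical and for illustration only.
LEXICON: Dict[str, float] = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
NEGATORS = {"not", "never", "no"}
INTENSIFIERS: Dict[str, float] = {"very": 1.5, "really": 1.5}


def sentiment(words: List[str], heads: List[int]) -> float:
    """Score a sentence bottom-up over its dependency tree.

    heads[i] is the 1-based position of the head of word i+1 (0 = root).
    The input is assumed to be a well-formed tree (no cycles).
    """
    children: Dict[int, List[int]] = {i: [] for i in range(len(words) + 1)}
    for i, head in enumerate(heads, start=1):
        children[head].append(i)

    def score(node: int) -> float:
        total = LEXICON.get(words[node - 1].lower(), 0.0)
        negated, factor = False, 1.0
        for child in children[node]:
            child_word = words[child - 1].lower()
            if child_word in NEGATORS:
                negated = True  # rule: a negator flips its head's subtree
            elif child_word in INTENSIFIERS:
                factor *= INTENSIFIERS[child_word]  # rule: scale the subtree
            else:
                total += score(child)  # ordinary dependents add their score
        total *= factor
        return -total if negated else total

    return sum(score(root) for root in children[0])


heads = [2, 5, 5, 5, 0]  # same tree shape for both sentences, root = "good"
print(sentiment(["The", "service", "was", "not", "good"], heads))  # -1.0
print(sentiment(["The", "food", "was", "very", "good"], heads))    # 1.5
```

Because each rule fires at a specific node of the tree, the final score can be traced back to the words and rules that produced it, which is one way such a system can be explainable. A bag-of-words method that scores words in isolation would rate "not good" as positive.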

The key needs for the future, now that the project has finished, lie in continuing the process of bringing this result to market by increasing the TRL and commercializing the system. The feedback gathered from potential end users and partners will help in moving from demos to actual pilot programs in the real contexts demanded by the users, and the market and business model analyses carried out offer potential roads to commercialization. One of the requests made by the partners interviewed during the project was the integration of SALSA into their solutions through an API. This integration was considered achievable with a modest development effort.
