Periodic Reporting for period 1 - SALSA (efficient Syntactic Analysis for Large-scale Sentiment Analysis)
Reporting period: 2023-02-01 to 2024-07-31
The aim of SALSA was to democratize the analysis and transformation of internet/social data into knowledge for decision makers, making large-scale sentiment analysis technology viable for small entities without massive computational power. SALSA explored the potential of the models and algorithms developed within ERC Starting Grant FASTPARSE to create the first AI-based syntax-guided sentiment analysis engine that is: a) accurate, because it uses syntactic information to infer the opinions contained in each sentence from its structure and the relationships between its words, rather than relying on shallow methods that consider words in isolation; and b) cost-effective, because it employs fast parsers that have a throughput on the order of 1000 sentences per second on consumer-grade hardware, and that can work without time- and memory-hungry large language models.
SALSA followed an open-source software business model in which we explored several sources of revenue, most of them based on service-level agreements. This will contribute substantially to the competitiveness of the EU tech sector, reducing its dependency on the oligopoly of technological giants (mostly American and Chinese) that currently hold a dominant position in language technologies, largely thanks to their enormous computational resources.
The goals of SALSA were achieved: SALSA successfully leveraged fast syntactic analysis algorithms to create a sentiment analysis solution that processes text much faster than existing systems while offering state-of-the-art accuracy. A demo system was developed and shown to potential partners and client companies in a series of interviews and demos to gauge the market and define the business model. The reception by the participating entities was highly positive: they showed strong interest in the proposed system and a preference for a SaaS model including revenue sharing and flexible payment options. This indicates that SALSA has a strong opportunity to develop into a successful business offering sentiment analysis services.
- Improvements to accuracy of syntactic parsing using the two most well-known and used syntax representations, i.e. dependency and constituency parsing, with developments using deep learning technologies such as hierarchical pointer networks and sequence-to-sequence models.
- Development of new methods for both dependency and constituency parsing that are fully incremental, inspired by how humans understand linguistic input from left to right, and that save resources by using simpler architectures that only look at the words read so far.
- New methods for dependency parsing as sequence labeling (currently the fastest known paradigm for parsing): we developed new, compact ways of encoding trees that improve this approach, providing extra accuracy while maintaining high speed.
- Development of a fast, accurate and explainable sentiment analysis system combining parsing as sequence labeling with syntactic rules to determine whether the opinion expressed in a text, in general or about specific aspects, is positive or negative.
- Studies on how this approach to sentiment analysis can be adapted to different domains via automatic acquisition of sentiment dictionaries.
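To make the idea of parsing as sequence labeling more concrete, the following minimal Python sketch encodes a dependency tree as one label per word and decodes it back. It assumes a simple relative-head-offset encoding, which is a basic scheme chosen here for illustration and is not necessarily one of the compact encodings developed in the project. The function names `encode` and `decode`, the label format and the example sentence are all hypothetical.

```python
from typing import List, Tuple


def encode(heads: List[int], rels: List[str]) -> List[str]:
    """Turn a dependency tree into one label per word.

    heads[i] is the 1-based position of the head of word i+1 (0 = root).
    Each label stores the signed distance to the head plus the relation,
    so a tagger that predicts one label per word implicitly predicts a tree.
    This is an illustrative encoding, not the project's own.
    """
    labels = []
    for position, (head, rel) in enumerate(zip(heads, rels), start=1):
        if head == 0:
            labels.append(f"ROOT@{rel}")
        else:
            labels.append(f"{head - position:+d}@{rel}")
    return labels


def decode(labels: List[str]) -> Tuple[List[int], List[str]]:
    """Recover head positions and relations from the per-word labels."""
    heads, rels = [], []
    for position, label in enumerate(labels, start=1):
        offset, rel = label.split("@", 1)
        heads.append(0 if offset == "ROOT" else position + int(offset))
        rels.append(rel)
    return heads, rels


# Hypothetical analysis of "The film was not good"
heads = [2, 5, 5, 5, 0]
rels = ["det", "nsubj", "cop", "advmod", "root"]
labels = encode(heads, rels)
print(labels)
```

Because every word receives exactly one label, parsing reduces to a standard tagging problem, which is what makes this paradigm fast. A practical system would also need to handle ill-formed predictions (for example cycles or multiple roots) when decoding.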
On the other hand, the goals of this Proof-of-Concept Grant go beyond pure research to practical applicability in markets. In this respect, we have successfully performed various activities with the goal of turning the aforementioned sentiment analysis system into a viable product that can be launched to market. In particular, we have:
- Conducted a market analysis to assess the market potential of our solution.
- Validated the value proposition by interviewing target users, as well as demoing our models to them in live workshops. Models were tested iteratively with potential users as they were built and improved, so we could make them more useful for the potential end users.
- Established contact with possible partners that could integrate our solution and help us enter the market, who also participated in the demos.
In all these activities, we observed a very positive reception from both potential industry end users and potential partners. Their feedback helped us establish the proposal's product-market fit, create a business model canvas and sketch a road to market, and led us to conclude that there is strong potential for market penetration.
In terms of applying parsing to sentiment analysis, which is the core goal of the project, we have advanced the state of the art by combining our syntactic parsers with rules that compute sentiment from sentence structure, obtaining a system whose accuracy is equivalent to that of existing syntax rule-based systems while being much faster. The equivalent accuracy was an unexpected (positive) development: such systems typically exhibit a speed-accuracy tradeoff in which optimizing speed means sacrificing some accuracy, but this was not the case here, which makes our proposal especially competitive.
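As a rough illustration of how rules can compute sentiment from sentence structure, the sketch below propagates polarity scores up a dependency tree, letting negators and intensifiers modify the word they attach to. It is a toy example under stated assumptions: the lexicon, the rule set, the `Token` type and the `sentence_sentiment` function are hypothetical and do not reproduce the actual SALSA rules. In particular, negation here simply flips the score of the whole subtree of its head, which is a simplification.

```python
from dataclasses import dataclass
from typing import Dict, List

# Toy resources, invented for this example.
LEXICON: Dict[str, float] = {"good": 1.0, "bad": -1.0, "great": 2.0, "awful": -2.0}
INTENSIFIERS: Dict[str, float] = {"very": 1.5, "slightly": 0.5}
NEGATORS = {"not", "never", "no"}


@dataclass
class Token:
    form: str
    head: int  # 1-based position of the head word; 0 marks the root


def sentence_sentiment(tokens: List[Token]) -> float:
    """Score a parsed sentence by propagating polarity up the tree."""
    children: Dict[int, List[int]] = {i: [] for i in range(len(tokens) + 1)}
    for position, token in enumerate(tokens, start=1):
        children[token.head].append(position)

    def score(position: int) -> float:
        word = tokens[position - 1].form.lower()
        total = LEXICON.get(word, 0.0)
        scale, negated = 1.0, False
        for child in children[position]:
            child_word = tokens[child - 1].form.lower()
            if child_word in NEGATORS:
                negated = True                      # flip this subtree's polarity
            elif child_word in INTENSIFIERS:
                scale *= INTENSIFIERS[child_word]   # strengthen or weaken it
            else:
                total += score(child)               # add the dependent's sentiment
        total *= scale
        return -total if negated else total

    return sum(score(root) for root in children[0])


# Hypothetical parse of "The film was not very good"
parsed = [Token("The", 2), Token("film", 6), Token("was", 6),
          Token("not", 6), Token("very", 6), Token("good", 0)]
print(sentence_sentiment(parsed))
```

Because each decision is tied to a specific word and dependency, the resulting score can be traced back to the rules that produced it, which is the kind of explainability a syntax-guided approach offers over methods that treat words in isolation.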
The key needs for the future after the end of the project lie in continuing to bring this result to market by increasing the TRL and commercializing the system. The feedback gathered from potential end users and partners will help us evolve from demos to actual pilot programs in the real contexts demanded by the users, and the market and business model analyses performed offer potential roads to commercialization. One of the requests made by the partners interviewed during the project was the integration of SALSA into their solutions through an API. This integration was considered achievable with a modest development effort.