Mining and analysing opinions across languages
The 'Automated analysis of opinions in a multilingual context' (CROSSLINGMIND) project was established to develop a cross-lingual opinion mining (CLOM) system by applying machine translation knowledge to adapt an existing monolingual opinion analysis system. The monolingual opinion mining pipeline was adapted to the type of content generated by social media users and to languages of high interest to potential industrial partners. Related work resulted in the publication of 'Selection of correction candidates for the normalization of Spanish user-generated content' in the journal Natural Language Engineering. An opinion mining pipeline was also built for Portuguese.
At the same time, CROSSLINGMIND conducted an in-depth study of the state of the art in CLOM, with particular emphasis on cross-lingual algorithms and applications used in natural language processing. The project also generated a Catalan lexicon of sentiment words (the LIWC lexicon) via triangulation from similar lexicons in other languages.
CROSSLINGMIND developed an aspect-based CLOM system, which analyses opinions at the level of the individual aspects of the entities about which they are expressed. The system was trained on data from the Seventh Framework Programme (FP7) OPENER project; these data are annotated at the opinionated unit level, in several languages, in the hotel review domain. The various activities and deliverables have been presented at conferences and developed into tutorials, courses and special journal issues.
The project's system showed that aspect-based CLOM is achievable and can return competitive classification results. It has been designed for use in real-life settings and has the potential to be part of a technology transfer framework. This development is expected to help improve the competitiveness of European companies, organisations and institutions that rely on public opinion to drive change and growth. It should also help break down the language barriers that hinder communication and fragment exploitable data and the digital market. Eventually, the general public should also be able to use it.
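The summary does not describe how the Catalan lexicon was triangulated, so the following is only a minimal sketch of the general idea, under stated assumptions: a Catalan word inherits a sentiment category when its translations in at least two pivot languages carry the same category in their own lexicons. The toy dictionaries, the category labels ('posemo', 'negemo') and the function names triangulate and build_lexicon are all hypothetical and are not taken from the project.

```python
from collections import Counter

# Toy pivot lexicons (hypothetical entries): word -> sentiment category.
ES_LEXICON = {"feliz": "posemo", "triste": "negemo", "odio": "negemo"}
EN_LEXICON = {"happy": "posemo", "sad": "negemo", "hate": "negemo"}

# Toy bilingual dictionaries (hypothetical): Catalan word -> translations.
CA_TO_ES = {"feliç": ["feliz"], "trist": ["triste"], "odi": ["odio"], "taula": ["mesa"]}
CA_TO_EN = {"feliç": ["happy"], "trist": ["sad"], "odi": ["hate"], "taula": ["table"]}


def triangulate(word, pivots, min_votes=2):
    """Return the category on which at least `min_votes` pivot languages agree."""
    votes = Counter()
    for translations, lexicon in pivots:
        # Each pivot language casts at most one vote per category.
        categories = {lexicon[t] for t in translations.get(word, []) if t in lexicon}
        votes.update(categories)
    if not votes:
        return None
    category, count = votes.most_common(1)[0]
    return category if count >= min_votes else None


def build_lexicon(vocabulary, pivots, min_votes=2):
    """Keep only the words for which the pivot languages agree on a category."""
    lexicon = {}
    for word in vocabulary:
        category = triangulate(word, pivots, min_votes)
        if category is not None:
            lexicon[word] = category
    return lexicon


PIVOTS = [(CA_TO_ES, ES_LEXICON), (CA_TO_EN, EN_LEXICON)]
ca_lexicon = build_lexicon(CA_TO_ES.keys(), PIVOTS)
print(ca_lexicon)
```

Requiring agreement between pivot languages trades coverage for precision: a word such as 'taula', whose translations appear in neither pivot lexicon, is simply left out of the result.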
Keywords
Multilingual context, cross-lingual opinion mining, opinion mining, machine translation, opinion analysis