Cross-Lingual Embeddings for Less-Represented Languages in European News Media

Deliverables

Final context-dependent and dynamic embeddings technology (T1.2)

Context-aware cross-lingual embeddings which will enable improved understanding of short texts such as user comments in the context of an emerging comment thread and the news story being commented on (report and source code) (T1.2).

Initial cross-lingual and multilingual embeddings technology (T1.1)

Initial embeddings and transformations between a selection of all targeted languages (Estonian, Finnish, Swedish, Latvian, Lithuanian, Croatian, Slovene, English, Russian) (report and source code) (T1.1)

Initial cross-lingual semantic enrichment technology (T2.1)

Initial approach to named entity (NE) extraction and disambiguation and event detection, covering multiple domains and languages (report and source code) (T2.1).

Datasets, benchmarks and evaluation metrics for cross-lingual content analysis (T4.4)

Gathering and preprocessing training and testing data (Estonian, Latvian, Lithuanian, Russian, Croatian, Finnish and English) provided by the media partners (report and dataset) (T4.4).

Initial deep network architecture (T1.3)

Deep neural networks will be adapted to morphologically rich languages by using character-level inputs and additional information on morphology (suffixes, prefixes, separately trained POS tags) (report and source code) (T1.3).

Interim report on ethics and responsible science and journalism (T6.5)

Interim report on ethics and responsible science and journalism, with analysis of news production and new tool development (T6.5).

Final evaluation report on cross-lingual user generated content filtering and analysis technology (T3.4)

Producing datasets for evaluation and development of algorithms (T3.4).

Final dynamic multilingual news generation technology (T5.2)

Development of a novel method for automatically organising news articles to be maximally informative to the assumed reader (report and source code) (T5.2).

Final cross-lingual news viewpoints identification technology (T4.3)

Development of methods for detecting viewpoints and sentiments based on media sources (report and source code) (T4.3).

Final real-time multilingual news linking technology (T4.1)

Development of tools for linking news stories across languages based on their topics and contents (report and source code) (T4.1).

Final evaluation report on cross-lingual content analysis technology (T4.4)

All tools developed in WP4 will be evaluated using the produced datasets and manual user evaluation (T4.4).

Final report on ethics and responsible science and journalism (T6.5).

Final report on ethics and responsible science and journalism (T6.5).

Initial interpretability and visualisation technology (T1.4)

Initial approaches to explanation of deep learning models by adaptation of perturbation-based explanation methods based on coalitional game theory to text classification, and initial development of visual tools for explaining the classification process (report and source code) (T1.4).

Final technology for multilingual and self-explainable news generation (T5.1)

Based on the analysis of newsrooms (WP6), the NLG technology will be adapted for the requirements of news generation. The task will develop mechanisms for (i) determining what is interesting or important in the given data and deciding what to report, and for (ii) rendering that information in an accurate manner (iii) in multiple languages (report and source code) (T5.1).

Final evaluation report on cross-lingual embedding technology (T1.5)

Report on evaluation of the cross-lingual and multilingual embeddings on public datasets and challenges (T1.5).

Initial context-dependent and dynamic embeddings technology (T1.2)

Context-aware cross-lingual embeddings which will enable improved understanding of short texts such as user comments in the context of an emerging comment thread and the news story being commented on (report and source code) (T1.2).

Report on user needs and challenges for news media industry (T6.1).

Initial report on identification and analysis of the needs of different stakeholders in the news media industry. We will arrange a workshop to identify in detail the challenges that are specific to the operations of different media partners and prepare specifications documentation (T6.1).

Recommendations on avoiding gender and other biases (T6.4)

The means to avoid and detect gender and other biases in news media content creation will be developed in T6.4. This deliverable will propose recommendations for avoiding gender bias (T6.4).

Final interpretability and visualisation technology (T1.4)

Adaptation of the three most popular perturbation-based explanation methods based on coalitional game theory (IME, LIME and SHAP) to be suitable for text classification, and development of visualisation techniques where different explanatory lexical units in the source texts (words, n-grams, sentences) are visualized (report and source code) (T1.4).

Initial cross-lingual context and opinion analysis technology (T3.1)

Report on initial developed technology for a range of user comment analyses, including topic modelling, conversation structure and context modelling, sentiment, stance and opinion detection and effect and information spread measurement (report and source code) (T3.1).

Final report on gender bias in content creation (T6.4)

Final report on gender bias in content creation (T6.4).

Reusable EMBEDDIA components available through the ClowdFlows web interface (T7.4)

Developed tools and procedures will be incorporated as widgets to make them available beyond the media context and assure reusability and repeatability of experiments (report and source code) (T7.4).

Initial multilingual news linking technology (T4.1)

Development of initial tools for linking news stories across languages based on their topics and contents (report and source code) (T4.1).

Initial keyword extraction techniques (T2.2)

Initial keyword extraction by application of statistical approaches (based on heuristics), machine learning approaches, as well as graph-based approaches (report and source code) (T2.2).

Final cross-lingual news summarisation and visualisation technology (T4.2)

Development of textual and visual language-independent multi-document news summarisation (report and source code) (T4.2).

Initial dynamic news generation technology (T5.2)

Development of a novel method for automatically organising news articles, considering the domain of the article, effects of time and news repetition (report and source code) (T5.2).

Refined analysis of news media partners’ needs and challenges (T6.1).

Refined report of news media partners’ needs and challenges and their analysis with regard to the state of the art in NLP for news media (T6.1).

Final cross-lingual and multilingual embeddings technology (T1.1)

Embeddings and transformations between all targeted languages, including Estonian, Finnish, Swedish, Latvian, Lithuanian, Croatian, Slovene, as well as English and Russian (report and source code) (T1.1).

Report generator from multilingual comments (T3.3)

Report on developed and implemented methods for generating human-readable reports in multiple languages from the outputs of the methods developed in T3.1 and T3.2 (report and source code) (T3.3).

Datasets, benchmarks and evaluation metrics for cross-lingual user generated content filtering and analysis (T3.4)

Evaluation and development of algorithms requires relevant, annotated, and multilingual datasets (report and dataset) (T3.4).

Final evaluation report on advanced cross-lingual NLP technology (T2.4)

Final report on existing evaluation datasets and benchmarks for NER, NEL and event detection (for instance, ACE, Meantime and TAC KBP’s Entity Discovery and Linking tasks) (report and dataset) (T2.4).

Final deep network architecture (T1.3)

Deep neural networks will be adapted to morphologically rich languages by using character-level inputs and additional information on morphology (suffixes, prefixes, separately trained POS tags) (report and source code) (T1.3).

Multilingual language generation approach (T2.3)

Incorporating hybrid techniques in the architecture, to take advantage of the robustness of machine learning techniques and transparency of rule-based techniques. Adaptation of the context-aware word-embeddings developed in T1.2 to improve fluency and variability in the generated texts (report and source code) (T2.3).

Final multilingual keyword extraction techniques (T2.2)

Application and further development of statistical approaches (based on heuristics), machine learning approaches, as well as graph-based approaches (report and source code) (T2.2).

Initial news generation technology (T5.1)

Based on the analysis of newsrooms (WP6), the NLG technology will be adapted for the requirements of news generation. The task will develop mechanisms for (i) determining what is interesting or important in the given data and deciding what to report, and for (ii) rendering that information in an accurate manner (iii) in multiple languages (report and source code) (T5.1).

Final report on EMBEDDIA Assistant platform evaluation (T6.3)

Final report on EMBEDDIA Assistant platform evaluation by media partners (T6.3).

Platform requirements documentation and platform design (T6.2)

The EMBEDDIA Toolkit will incorporate different tools and resources developed in WP1–WP5, and the EMBEDDIA Media Assistant platform will be built on top of it. The platform will be built as a series of base microservices, functional microservices and task-oriented APIs. This deliverable will report on platform requirements and platform design (T6.2).

Final cross-lingual comment filtering technology (T3.2)

Final report on developed tools for automatic flagging or filtering of user comments, specifically targeted at the use cases defined by end user partners in WP6, e.g., detection of hate speech and political trolling, attempts to elicit extreme reactions and influence others’ opinions (report and source code) (T3.2).

Initial cross-lingual news viewpoints identification technology (T4.3)

Initial approaches for detecting viewpoints and sentiments based on media sources (report and source code) (T4.3).

Final cross-lingual semantic enrichment technology (T2.1)

Generalization of approaches to multiple domains and languages and to large-scale corpora, and integration of cross-lingual embeddings (report and source code) (T2.1).

Creative multilingual technology for news and headline generation (T5.3)

We will make the generated texts more varied and colourful by generating creative expressions, especially in headlines (report and source code) (T5.3).

Final cross-lingual context and opinion analysis technology (T3.1)

Final report on developed technology for a range of user comment analyses, including topic modelling, conversation structure and context modelling, sentiment, stance and opinion detection and effect and information spread measurement (report and source code) (T3.1).

Datasets, benchmarks and evaluation metrics for advanced cross-lingual NLP technology (T2.4)

Report on existing evaluation datasets and benchmarks for NER, NEL and event detection (for instance, ACE, Meantime and TAC KBP’s Entity Discovery and Linking tasks) (report and dataset) (T2.4).

Initial cross-lingual comment filtering technology (T3.2)

Report on developed tools for automatic flagging or filtering of user comments, specifically targeted at the use cases defined by end user partners in WP6, e.g., detection of hate speech and political trolling, attempts to elicit extreme reactions and influence others’ opinions (report and source code) (T3.2).

Datasets, benchmarks and evaluation metrics for multilingual text generation (T5.4)

Texts (news stories) and structured datasets from which news can be generated will be collected from news partners (report and datasets), and a methodology for evaluation will be defined (T5.4).

Selected EMBEDDIA components in ClowdFlows (T7.4)

Initial selection of tools and procedures incorporated as widgets in the web-based platform ClowdFlows to make them available beyond the media context and assure reusability and repeatability of experiments (report and source code) (T7.4).

Initial cross-lingual news summarisation and visualisation technology (T4.2)

Development of textual and visual language-independent multi-document news summarisation (report and source code) (T4.2).

Final evaluation report on multilingual text generation technology (T5.4)

Final evaluation report on multilingual text generation technology (T5.4).

Datasets, benchmarks and evaluation metrics for cross-lingual word embeddings (T1.5)

A repository of training and evaluation data, stored in a dedicated GitHub repository (report and datasets) (T1.5).

Final EMBEDDIA Media Assistant platform, packaged in docker container (T6.2)

Final EMBEDDIA Media Assistant platform incorporating different tools and resources, packaged in a docker container (report and source code) (T6.2).

Project website and social media accounts (T7.1)

A project website, which will function both as a project dissemination tool and as a point of access to the technical outcomes produced by the project, and social media accounts/pages on relevant social networks will be created (T7.1).


To BAN or Not to BAN: Bayesian Attention Networks for Reliable Hate Speech Detection

Authors: Kristian Miok, Blaž Škrlj, Daniela Zaharie, Marko Robnik-Šikonja
Published in: Cognitive Computation, 2021, ISSN 1866-9956
Publisher: Springer Verlag
DOI: 10.1007/s12559-021-09826-9

Cross-lingual alignments of ELMo contextual embeddings

Authors: Ulčar, Matej; Robnik-Šikonja, Marko
Published in: Neural Computing and Applications, Issue 3, 2022, ISSN 0941-0643
Publisher: Springer Verlag
DOI: 10.1007/s00521-022-07164-x

NeSyChair: Automatic Conference Scheduling Combining Neuro-Symbolic Representations and Constrained Clustering

Authors: Škvorc, Tadej; Lavrač, Nada; Robnik-Šikonja, Marko
Published in: IEEE Access, Issue 10, 2022, ISSN 2169-3536
Publisher: Institute of Electrical and Electronics Engineers Inc.
DOI: 10.1109/ACCESS.2022.3144932

autoBOT: evolving neuro-symbolic representations for explainable low resource text classification

Authors: Blaž Škrlj, Matej Martinc, Nada Lavrač, Senja Pollak
Published in: Machine Learning, 2021, ISSN 0885-6125
Publisher: Kluwer Academic Publishers
DOI: 10.1007/s10994-021-05968-x

MICE: Mining Idioms with Contextual Embeddings

Authors: Škvorc, Tadej; Gantar, Polona; Robnik-Šikonja, Marko
Published in: Knowledge-Based Systems, Issue 237, 2022, ISSN 0950-7051
Publisher: Elsevier BV
DOI: 10.1016/j.knosys.2021.107606

Zero-Shot Learning for Cross-Lingual News Sentiment Classification

Authors: Andraž Pelicon, Marko Pranjić, Dragana Miljković, Blaž Škrlj, Senja Pollak
Published in: Applied Sciences, Issue 10/17, 2020, Page(s) 5993, ISSN 2076-3417
Publisher: MDPI
DOI: 10.3390/app10175993

Supervised and Unsupervised Neural Approaches to Text Readability

Authors: Matej Martinc; Senja Pollak; Marko Robnik-Šikonja
Published in: Computational Linguistics, Issue 47.1, 2021, Page(s) 141-179, ISSN 0891-2017
Publisher: MIT Press
DOI: 10.1162/coli_a_00398

Nazaj v prihodnost: avtomatizacija in preobrazba novinarske epistemologije

Authors: Igor Vobič, Marko Robnik Šikonja, Monika Kalin Golob
Published in: Javnost - The Public, Issue 26/sup1, 2019, Page(s) S41-S61, ISSN 1318-3222
Publisher: European Institute for Communication and Culture
DOI: 10.1080/13183222.2019.1696600

What makes a reporter human? A Research Agenda for Augmented Journalism

Authors: Lindén, Carl-Gustav
Published in: Questions de communication, 2020, ISSN 2259-8901
Publisher: Presses universitaires de Lorraine
DOI: 10.4000/questionsdecommunication.23301

Cross-lingual Transfer of Sentiment Classifiers

Authors: Robnik-Šikonja, Marko; Reba, Kristjan; Mozetič, Igor
Published in: Slovenščina 2.0, Issue 9(1), 2021, Page(s) 1-25, ISSN 2335-2736
Publisher: Ljubljana University Press, Faculty of Arts
DOI: 10.4312/slo2.0.2021.1.1-25

Completability vs (In)completeness

Authors: Eleni Gregoromichelaki, Gregory James Mills, Christine Howes, Arash Eshghi, Stergios Chatzikyriakidis, Matthew Purver, Ruth Kempson, Ronnie Cann, Patrick G. T. Healey
Published in: Acta Linguistica Hafniensia, Issue 52/2, 2020, Page(s) 260-284, ISSN 0374-0463
Publisher: Nordisk Sprog- og Kulturforlag
DOI: 10.1080/03740463.2020.1795549

TNT-KID: Transformer-based neural tagger for keyword identification

Authors: Matej Martinc, Blaž Škrlj, Senja Pollak
Published in: Natural Language Engineering, 2021, Page(s) 1-40, ISSN 1351-3249
Publisher: Cambridge University Press
DOI: 10.1017/s1351324921000127

Investigating cross-lingual training for offensive language detection

Authors: Andraž Pelicon, Ravi Shekhar, Blaž Škrlj, Matthew Purver, Senja Pollak
Published in: PeerJ Computer Science, Issue 7, 2021, Page(s) e559, ISSN 2376-5992
Publisher: PeerJ Publishing
DOI: 10.7717/peerj-cs.559

Journalistic Passion as Commodity: A Managerial Perspective

Authors: Carl-Gustav Lindén; Katja Lehtisaari; Mikko Grönlund; Mikko Villi
Published in: Journalism Studies, Issue 22(12), 2021, Page(s) 1701-1719, ISSN 1461-670X
Publisher: Routledge
DOI: 10.1080/1461670x.2021.1911672

Re-Representing Metaphor: Modeling Metaphor Perception Using Dynamically Contextual Distributional Semantics

Authors: Stephen McGregor, Kat Agres, Karolina Rataj, Matthew Purver, Geraint Wiggins
Published in: Frontiers in Psychology, Issue 10, 2019, ISSN 1664-1078
Publisher: Frontiers Research Foundation
DOI: 10.3389/fpsyg.2019.00765

Towards Robust Text Classification with Semantics-Aware Recurrent Neural Architecture

Authors: Blaž Škrlj, Jan Kralj, Nada Lavrač, Senja Pollak
Published in: Machine Learning and Knowledge Extraction, Issue 1/2, 2019, Page(s) 575-589, ISSN 2504-4990
Publisher: MDPI AG
DOI: 10.3390/make1020034

Predicting Slovene Text Complexity Using Readability Measures

Authors: Tadej Škvorc, Simon Krek, Senja Pollak, Špela Arhar Holdt, Marko Robnik-Šikonja
Published in: Contributions to Contemporary History, 2019, ISSN 2463-7807
Publisher: OJS/PKP

Combining n-grams and deep convolutional features for language variety classification

Authors: Matej Martinc, Senja Pollak
Published in: Natural Language Engineering, Issue 25/5, 2019, Page(s) 607-632, ISSN 1351-3249
Publisher: Cambridge University Press
DOI: 10.1017/S1351324919000299


Authors: Andraž Repar, Vid Podpečan, Anže Vavpetič, Nada Lavrač, Senja Pollak
Published in: Terminology, Issue 25/1, 2019, Page(s) 93-120, ISSN 0929-9971
Publisher: John Benjamins Publishing Company
DOI: 10.1075/term.00029.rep

Reproduction, replication, analysis and adaptation of a term alignment approach

Authors: Andraž Repar, Matej Martinc, Senja Pollak
Published in: Language Resources and Evaluation, 2019, ISSN 1574-020X
Publisher: Springer Verlag
DOI: 10.1007/s10579-019-09477-1

‘Our task is to demystify fears’: Analysing newsroom management of automation in journalism

Authors: Marko Milosavljević, Igor Vobič
Published in: Journalism, 2019, Page(s) 146488491986159, ISSN 1464-8849
Publisher: SAGE Publications
DOI: 10.1177/1464884919861598

Methods and visualization tools for the analysis of medical, political and scientific concepts in Genealogies of Knowledge

Authors: Saturnino Luz, Shane Sheehan
Published in: Palgrave Communications, Issue 6/1, 2020, ISSN 2055-1045
Publisher: Humanities and Social Sciences Communications
DOI: 10.1057/s41599-020-0423-6

Exploring the Relations Between Net Benefits of IT Projects and CIOs’ Perception of Quality of Software Development Disciplines

Authors: Damjan Vavpotič, Marko Robnik-Šikonja, Tomaž Hovelja
Published in: Business & Information Systems Engineering, 2019, ISSN 2363-7005
Publisher: Springer Gabler
DOI: 10.1007/s12599-019-00612-4

Data Journalism as a Service: Digital Native Data Journalism Expertise and Product Development

Authors: Ester Appelgren, Carl-Gustav Lindén
Published in: Media and Communication, Issue 8/2, 2020, Page(s) 62, ISSN 2183-2439
Publisher: Cogitatio
DOI: 10.17645/mac.v8i2.2757

How Furiously Can Colorless Green Ideas Sleep? Sentence Acceptability in Context

Authors: Jey Han Lau, Carlos Armendariz, Shalom Lappin, Matthew Purver, Chang Shu
Published in: Transactions of the Association for Computational Linguistics, Issue 8, 2020, Page(s) 296-310, ISSN 2307-387X
Publisher: The MIT Press
DOI: 10.1162/tacl_a_00315

Computational generation of slogans

Authors: Khalid Alnajjar, Hannu Toivonen
Published in: Natural Language Engineering, 2020, Page(s) 1-33, ISSN 1351-3249
Publisher: Cambridge University Press
DOI: 10.1017/S1351324920000236

In the Name of the Right to be Forgotten: New Legal and Policy Issues and Practices regarding Unpublishing Requests in Slovenian Online News Media

Authors: Marko Milosavljević, Melita Poler, Rok Čeferin
Published in: Digital Journalism, 2020, Page(s) 1-17, ISSN 2167-0811
Publisher: Taylor & Francis
DOI: 10.1080/21670811.2020.1747942

(Mis)Information Operations: An Integrated Perspective

Authors: Cinelli, Matteo; Conti, Mauro; Finos, Livio; Grisolia, Francesco; Kralj Novak, Petra; Peruzzi, Antonio; Tesconi, Maurizio; Zollo, Fabia; Quattrociocchi, Walter
Published in: Journal of Information Warfare, Issue 18(3), 2020, ISSN 1445-3312
Publisher: Mt. Eliza : Teamlink Australia

A Multilingual Study of Multi-Sentence Compression using Word Vertex-Labeled Graphs and Integer Linear Programming

Authors: Linhares Pontes, Elvys; Huet, Stéphane; Torres Moreno, Juan Manuel; Gouveia da Silva, Thiago; Carneiro Linhares, Andréa
Published in: Computación y Sistemas, Issue 24(2), 2020, ISSN 1405-5546
Publisher: Centro de Investigacion en Computacion (CIC) del Instituto Politecnico Nacional (IPN)

Automated Journalism as a Source of and a Diagnostic Device for Bias in Reporting

Authors: Leo Leppänen, Hanna Tuulonen, Stefanie Sirén-Heikel
Published in: Media and Communication, Issue 8/3, 2020, Page(s) 39, ISSN 2183-2439
Publisher: Cogitatio
DOI: 10.17645/mac.v8i3.3022

tax2vec: Constructing Interpretable Features from Taxonomies for Short Text Classification

Authors: Blaž Škrlj, Matej Martinc, Jan Kralj, Nada Lavrač, Senja Pollak
Published in: Computer Speech & Language, Issue 65, 2021, Page(s) 101104, ISSN 0885-2308
Publisher: Academic Press
DOI: 10.1016/j.csl.2020.101104

Knowledge Graph informed Fake News Classification via Heterogeneous Representation Ensembles

Authors: Koloski, Boshko; Stepišnik-Perdih, Timen; Robnik-Šikonja, Marko; Pollak, Senja; Škrlj, Blaž
Published in: Neurocomputing journal, 2022, ISSN 0925-2312
Publisher: Elsevier BV
DOI: 10.1016/j.neucom.2022.01.096

Cross-lingual transfer of abstractive summarizer to less-resource language

Authors: Aleš Žagar, Marko Robnik-Šikonja
Published in: Journal of Intelligent Information Systems, 2021, ISSN 0925-9902
Publisher: Kluwer Academic Publishers
DOI: 10.1007/s10844-021-00663-8

Bisociative Literature-Based Discovery: Lessons Learned and New Word Embedding Approach

Authors: Nada Lavrač, Matej Martinc, Senja Pollak, Maruša Pompe Novak, Bojan Cestnik
Published in: New Generation Computing, Issue 38/4, 2020, Page(s) 773-800, ISSN 0288-3635
Publisher: Springer Verlag
DOI: 10.1007/s00354-020-00108-w

Propositionalization and embeddings: two sides of the same coin

Authors: Nada Lavrač; Blaž Škrlj; Marko Robnik-Šikonja
Published in: Machine Learning, Issue 109, 2020, ISSN 0885-6125
Publisher: Kluwer Academic Publishers
DOI: 10.1007/s10994-020-05890-8

Automating News Comment Moderation with Limited Resources: Benchmarking in Croatian and Estonian

Authors: Shekhar, Ravi; Pranjić, Marko; Pollak, Senja; Pelicon, Andraž; Purver, Matthew
Published in: Journal for Language Technology and Computational Linguistics, Issue 2, 2020, Page(s) 49-79, ISSN 2190-6858
Publisher: German Society for Computational Linguistics and Language Technology (GSCL)
DOI: 10.5281/zenodo.4032371

Enhancing deep neural networks with morphological information

Authors: Klemen, Matej; Krsnik, Luka; Robnik-Šikonja, Marko
Published in: Natural Language Engineering, 2022, ISSN 1351-3249
Publisher: Cambridge University Press
DOI: 10.1017/S1351324922000080

Slovene and Croatian word embeddings in terms of gender occupational analogies

Authors: Matej Ulčar, Anka Supej, Marko Robnik-Šikonja, Senja Pollak
Published in: Slovenščina 2.0: empirical, applied and interdisciplinary research, Issue 9/1, 2021, Page(s) 26-59, ISSN 2335-2736
Publisher: Ljubljana University Press, Faculty of Arts
DOI: 10.4312/slo2.0.2021.1.26-59

MELHISSA: A Multilingual Entity Linking Architecture for Historical Press Articles

Authors: Linhares Pontes, Elvys; Cabrera-Diego, Luis Adrian; Moreno, Jose G.; Boros, Emanuela; Hamdi, Ahmed; Doucet, Antoine; Sidere, Nicolas; Coustaty, Mickael
Published in: International Journal on Digital Libraries, 2021, ISSN 1432-1300
Publisher: Springer
DOI: 10.1007/s00799-021-00319-6

Recycling a genre for news automation

Authors: Lauri Haapanen, Leo Leppänen
Published in: AILA Review, Issue 33, 2020, Page(s) 67-85, ISSN 1461-0213
Publisher: John Benjamins Publishing Company
DOI: 10.1075/aila.00030.haa

Incremental Composition in Distributional Semantics

Authors: Matthew Purver, Mehrnoosh Sadrzadeh, Ruth Kempson, Gijs Wijnholds, Julian Hough
Published in: Journal of Logic, Language and Information, Issue 30/2, 2021, Page(s) 379-406, ISSN 0925-8531
Publisher: Kluwer Academic Publishers
DOI: 10.1007/s10849-021-09337-8

Kratt: Developing an Automatic Subject Indexing Tool for the National Library of Estonia

Authors: Asula, Marit; Makke, Jane; Freienthal, Linda; Kuulmets, Hele-Andra; Sirel, Raul
Published in: Cataloging & Classification Quarterly, Issue 59:8, 2021, Page(s) 775-793, ISSN 0163-9374
Publisher: Haworth Press Inc.
DOI: 10.1080/01639374.2021.1998283

SNoRe: Scalable Unsupervised Learning of Symbolic Node Representations

Authors: Sebastian Mežnar, Nada Lavrač, Blaž Škrlj
Published in: IEEE Access, Issue 8, 2020, Page(s) 212568-212588, ISSN 2169-3536
Publisher: Institute of Electrical and Electronics Engineers Inc.
DOI: 10.1109/access.2020.3039541

Token-Level Multilingual Epidemic Dataset for Event Extraction

Authors: Stephen Mutuvi, Emanuela Boros, Antoine Doucet, Gaël Lejeune, Adam Jatowt, Moses Odeo
Published in: Linking Theory and Practice of Digital Libraries - 25th International Conference on Theory and Practice of Digital Libraries, TPDL 2021, Virtual Event, September 13–17, 2021, Proceedings, Issue 12866, 2021, Page(s) 55-59, ISBN 978-3-030-86323-4
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-86324-1_6

Entity Linking for Historical Documents: Challenges and Solutions

Authors: Elvys Linhares Pontes, Luis Adrián Cabrera-Diego, Jose G. Moreno, Emanuela Boros, Ahmed Hamdi, Nicolas Sidère, Mickaël Coustaty, Antoine Doucet
Published in: Digital Libraries at Times of Massive Societal Transition - 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020, Kyoto, Japan, November 30 – December 1, 2020, Proceedings, Issue 12504, 2020, Page(s) 215-231, ISBN 978-3-030-64451-2
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-64452-9_19

Prioritization of COVID-19-Related Literature via Unsupervised Keyphrase Extraction and Document Representation Learning

Authors: Blaž Škrlj, Marko Jukič, Nika Eržen, Senja Pollak, Nada Lavrač
Published in: Discovery Science - 24th International Conference, DS 2021, Halifax, NS, Canada, October 11–13, 2021, Proceedings, Issue 12986, 2021, Page(s) 204-217, ISBN 978-3-030-88941-8
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-88942-5_16

Identification of COVID-19 Related Fake News via Neural Stacking

Authors: Boshko Koloski, Timen Stepišnik-Perdih, Senja Pollak, Blaž Škrlj
Published in: Combating Online Hostile Posts in Regional Languages during Emergency Situation - First International Workshop, CONSTRAINT 2021, Collocated with AAAI 2021, Virtual Event, February 8, 2021, Revised Selected Papers, Issue 1402, 2021, Page(s) 177-188, ISBN 978-3-030-73695-8
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-73696-5_17

FinEst BERT and CroSloEngual BERT - Less Is More in Multilingual Models

Authors: Matej Ulčar, Marko Robnik-Šikonja
Published in: Text, Speech, and Dialogue - 23rd International Conference, TSD 2020, Brno, Czech Republic, September 8–11, 2020, Proceedings, Issue 12284, 2020, Page(s) 104-111, ISBN 978-3-030-58322-4
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-58323-1_11

RaKUn: Rank-based Keyword Extraction via Unsupervised Learning and Meta Vertex Aggregation

Authors: Blaž Škrlj, Andraž Repar, Senja Pollak
Published in: Statistical Language and Speech Processing - 7th International Conference, SLSP 2019, Ljubljana, Slovenia, October 14–16, 2019, Proceedings, Issue 11816, 2019, Page(s) 311-323, ISBN 978-3-030-31371-5
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-31372-2_26

Language Comparison via Network Topology

Authors: Blaž Škrlj, Senja Pollak
Published in: Statistical Language and Speech Processing - 7th International Conference, SLSP 2019, Ljubljana, Slovenia, October 14–16, 2019, Proceedings, Issue 11816, 2019, Page(s) 112-123, ISBN 978-3-030-31371-5
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-31372-2_10

Prediction Uncertainty Estimation for Hate Speech Classification

Authors: Kristian Miok, Dong Nguyen-Doan, Blaž Škrlj, Daniela Zaharie, Marko Robnik-Šikonja
Published in: Statistical Language and Speech Processing - 7th International Conference, SLSP 2019, Ljubljana, Slovenia, October 14–16, 2019, Proceedings, Issue 11816, 2019, Page(s) 286-298, ISBN 978-3-030-31371-5
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-31372-2_24

Symbolic Graph Embedding Using Frequent Pattern Mining

Authors: Blaž Škrlj, Nada Lavrač, Jan Kralj
Published in: Discovery Science - 22nd International Conference, DS 2019, Split, Croatia, October 28–30, 2019, Proceedings, Issue 11828, 2019, Page(s) 261-275, ISBN 978-3-030-33777-3
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-33778-0_21

EMBEDDIA Tools, Datasets and Challenges: Resources and Hackathon Contributions

Authors: Pollak, Senja; Robnik-Šikonja, Marko; Purver, Matthew; Boggia, Michele; Shekhar, Ravi; Pranjić, Marko; Salmela, Salla; Krustok, Ivar; Paju, Tarmo; Linden, Carl-Gustav; Leppänen, Leo; Zosa, Elaine; Ulčar, Matej; Freienthal, Linda; Traat, Silver; Cabrera-Diego, Luis Adrián; Martinc, Matej; Lavrač, Nada; Škrlj, Blaž; Žnidaršič, Martin; Pelicon, Andraž; Koloski, Boshko; Podpečan, Vid; Kra
Published in: Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730464

EMBEDDIA hackathon report: Automatic sentiment and viewpoint analysis of Slovenian news corpus on the topic of LGBTIQ+

Authors: Martinc, Matej; Perger, Nina; Pelicon, Andraž; Ulčar, Matej; Vezovnik, Andreja; Pollak, Senja
Published in: Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730336

Exploring Neural Language Models via Analysis of Local and Global Self-Attention Spaces

Authors: Škrlj, Blaž; Sheehan, Shane; Eržen, Nika; Robnik-Šikonja, Marko; Luz, Saturnino; Pollak, Senja
Published in: Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730396

Grammatical Profiling for Semantic Change Detection

Authors: Giulianelli, Mario; Kutuzov, Andrey; Pivovarova, Lidia
Published in: Proceedings of the 25th Conference on Computational Natural Language Learning (CoNLL 2021), 2021, Page(s) 423-434
Publisher: ACL

Cross-lingual Transfer of Twitter Sentiment Models Using a Common Vector Space

Autores: Robnik-Šikonja, Marko; Reba, Kristijan; Mozetič, Igor
Publicado en: In Proceedings of the Conference on Language Technologies and Digital Humanities, JTDH2020, 2020, Página(s) 87-92
Editor: Institute of Contemporary History
DOI: 10.5281/zenodo.4059725

When a Computer Cracks a Joke: Automated Generation of Humorous Headlines

Autores: Alnajjar, Khalid; Hämäläinen, Mika
Publicado en: In the Proceedings of the 12th International Conference on Computational Creativity (ICCC21), 2021, ISBN 978-989-54160-3-5
Editor: Association for Computational Creativity

Knowledge graph aware text classification

Autores: Petrželková, Nela; Škrlj, Blaž; Lavrač, Nada
Publicado en: In Proceedings of the 23rd International Multiconference – IS2020, 2020
Editor: Jožef Stefan Institute
DOI: 10.5281/zenodo.4072961

Relation Classification via Relation Validation

Autores: Moreno, Jose G.; Doucet, Antoine; Grau, Brigitte
Publicado en: Proceedings of the 6th Workshop on Semantic Deep Learning (SemDeep-6), 2021
Editor: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730492

Simple ways to improve NER in every language using markup

Autores: Cabrera-Diego, Luis Adrián; Moreno, Jose G.; Doucet, Antoine
Publicado en: In Proceedings of ECIR 2021, 2021
Editor: CEUR Workshops
DOI: 10.5281/zenodo.4680998

A bilingual approach to specialised adjectives through word embeddings in the karstology domain

Autores: Grčić Simeunović, Larisa; Martinc, Matej; Vintar, Špela
Publicado en: In Proceedings of TOTH 2020, 2020
Editor: Université Savoie Mont Blanc
DOI: 10.5281/zenodo.6435390

Using contextual and cross-lingual word embeddings to improve variety in template-based NLG for automated journalism

Autores: Rämö, Miia; Leppänen, Leo
Publicado en: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Editor: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730334

Know your Neighbors: Efficient Author Profiling via Follower Tweets

Autores: Koloski, Boško; Pollak, Senja; Škrlj, Blaž
Publicado en: Notebook for PAN at CLEF 2020, 2020
DOI: 10.5281/zenodo.4059641

Corpus KAS 2.0: Cleaner and with New Datasets

Autores: Žagar, Aleš; Kavaš, Matic; Robnik-Šikonja, Marko
Publicado en: In Proceedings of the 24th International Multiconference – IS2021 (Slovenian Conference on Artificial Intelligence), 2021
Editor: Jožef Stefan Institute

Atténuer les erreurs de numérisation dans la reconnaissance d'entités nommées pour les documents historiques

Autores: Emanuela Boros; Ahmed Hamdi; Elvys Linhares Pontes; Luis Adrián Cabrera-Diego; Jose G. Moreno; Nicolas Sidère; Antoine Doucet
Publicado en: Edición 29, 2021
Editor: l’Association Francophone de Recherche d’Information et Applications ARIA
DOI: 10.5281/zenodo.4734435

Automated Hate Speech Target Identification

Autores: Pelicon, Andraž; Škrlj, Blaž; Kralj Novak, Petra
Publicado en: In Proceedings of the 24th International Multiconference – IS2021 (Slovenian Conference on Artificial Intelligence), 2021
Editor: Jožef Stefan Institute

Slav-NER: the 3rd Cross-lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic languages

Autores: Piskorski et al
Publicado en: In Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing in conjunction to EACL2021, 2021
Editor: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730512

Bayesian Methods for Semi-supervised Text Annotation

Autores: Miok, Kristian; Pirš, Gregor; Robnik-Šikonja, Marko
Publicado en: In Proceedings of the 14th Linguistic Annotation Workshop Co-located with COLING 2020, Edición 2, 2020
Editor: Association for Computational Linguistics

Training dataset and dictionary sizes matter in BERT models: the case of Baltic languages

Autores: Ulčar, Matej; Robnik-Šikonja, Marko
Publicado en: In the Proceedings of the 10th International Conference on Analysis of Images, Social Networks and Texts (AIST 2021), 2021
Editor: Springer

Preliminary experimentation with combinations and extensions of forward-looking sentence detection wordlists

Autores: Štihec, Jan; Pollak, Senja; Žnidaršič, Martin
Publicado en: In Proceedings of the 3rd financial narrative processing workshop, 2021
Editor: Association for Computational Linguistics

Bayesian BERT for Trustful Hate Speech Detection

Autores: Miok, Kristian; Škrlj, Blaž; Zaharie, Daniela; Robnik-Šikonja, Marko
Publicado en: ICML 2020 Workshop on Uncertainty & Robustness in Deep Learning, 2021
Editor: ICML UDL

Underreporting of errors in NLG output, and what to do about it

Autores: van Miltenburg, Emiel; Clinciu, Miruna; Dušek, Ondrej; Gkatzia, Dimitra; Inglis, Stephanie; Leppänen, Leo; Mahamood, Saad; Manning, Emma; Schoch, Stephanie; Thomson, Craig; Wen, Luou
Publicado en: In the Proceedings of the 14th International Conference on Natural Language Generation, 2021
Editor: Association for Computational Linguistics

Primerjava slovenskih besednih vektorskih vložitev z vidika spola na analogijah poklicev

Autores: Supej, Anka; Ulčar, Matej; Robnik-Šikonja, Marko; Pollak, Senja
Publicado en: In the Proceedings of the Conference on Language Technologies and Digital Humanities (JTDH 2021), 2021, Página(s) 93-100
Editor: Slovensko društvo za jezikovne tehnologije

Simple discovery of COVID IS WAR Metaphors Using Word Embeddings

Autores: Brglez, Mojca; Pollak, Senja; Vintar, Špela
Publicado en: In Proceedings of the 24th International Multiconference – IS2021 (SiKDD), 2021
Editor: Jožef Stefan Institute

COVID-19 v slovenskih spletnih medijih: analiza s pomočjo računalniške obdelave jezika

Autores: Pollak, Senja; Martinc, Matej; Pelicon, Andraž; Ulčar, Matej; Vezovnik, Andreja
Publicado en: Pandemična družba: slovensko sociološko srečanje, 2021
Editor: Slovenska sociološka družba

Visual Topic Modelling for NewsImage Task at MediaEval 2021

Autores: Pivovarova, Lidia; Zosa, Elaine
Publicado en: MediaEval 2021 Multimedia Benchmark Workshop: Working Notes Proceedings of the MediaEval 2021 Workshop, 2021
Editor: MediaEval Multimedia Benchmark
DOI: 10.5281/zenodo.6384719

TeMoTopic: Temporal Mosaic Visualisation of Topic Distribution, Keywords, and Context

Autores: Sheehan, Shane; Luz, Saturnino; Masoodian, Masood
Publicado en: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Editor: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730388

Robust Named Entity Recognition and Linking on Historical Multilingual Documents

Autores: Boros, Emanuela; Linhares Pontes, Elvys; Cabrera-Diego, Luis Adrián; Hamdi, Ahmed; Moreno, Jose G.; Sidère, Nicolas; Doucet, Antoine
Publicado en: Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum (CLEF-HIPE 2020), 2020
DOI: 10.5281/zenodo.4059652

Impact Analysis of Document Digitization on Event Extraction

Autores: Nguyen, Nhu Khoa; Boroş, Emanuela; Lejeune, Gaël; Doucet, Antoine
Publicado en: In 4th Workshop on Natural Language for Artificial Intelligence (NL4AI 2020) co-located with the 19th International Conference of the Italian Association for Artificial Intelligence (AI* IA 2020), 2020
Editor: CEUR Workshop Proceedings
DOI: 10.5281/zenodo.4680744

Using a Frustratingly Easy Domain and Tagset Adaptation for Creating Slavic Named Entity Recognition Systems

Autores: Cabrera-Diego, Luis Adrián; Moreno, Jose G.; Doucet, Antoine
Publicado en: In Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing in conjunction to EACL2021, 2021
Editor: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730478

Topic modelling discourse dynamics in historical newspapers

Autores: Marjanen, Jani; Zosa, Elaine; Hengchen, Simon; Pivovarova, Lidia; Tolonen, Mikko
Publicado en: In Post-Proceedings of the DHN2020 Conference: the 5th conference on Digital Humanities in the Nordic Countries, 2021
Editor: CEUR Workshop Proceedings

BERT meets Shapley: Extending SHAP Explanations to Transformer-based Classifiers

Autores: Kokalj, Enja; Škrlj, Blaž; Lavrač, Nada; Pollak, Senja; Robnik-Šikonja, Marko
Publicado en: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Editor: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730384

Multilingual Epidemic Event Extraction

Autores: Mutuvi, Stephen; Boros, Emanuela; Doucet, Antoine; Lejeune, Gaël; Jatowt, Adam; Odeo, Moses
Publicado en: In the Proceedings of ICADL 2021, 2021, ISBN 978-3-030-91669-8
Editor: Springer
DOI: 10.1007/978-3-030-91669-5_12

Transformer-based Methods for Recognizing Ultra Fine-grained Entities (RUFES)

Autores: Boroş, Emanuela; Doucet, Antoine
Publicado en: In Proceedings of the Thirteenth Text Analysis Conference (TAC 2020), 2021
Editor: NIST USA
DOI: 10.5281/zenodo.4681008

SloBERTa: Slovene monolingual large pretrained masked language model

Autores: Ulčar, Matej; Robnik-Šikonja, Marko
Publicado en: In Proceedings of the 24th International Multiconference – IS2021 (SiKDD), 2021
Editor: Jožef Stefan Institute

Alleviating Digitization Errors in Named Entity Recognition for Historical Documents

Autores: Emanuela Boros, Ahmed Hamdi, Elvys Linhares Pontes, Luis Adrián Cabrera-Diego, Jose G. Moreno, Nicolas Sidere, Antoine Doucet
Publicado en: Proceedings of the 24th Conference on Computational Natural Language Learning, 2020, Página(s) 431-441
Editor: Association for Computational Linguistics
DOI: 10.18653/v1/2020.conll-1.35

Not All Comments Are Equal: Insights into Comment Moderation from a Topic-aware Model

Autores: Zosa, Elaine; Shekhar, Ravi; Karan, Mladen; Purver, Matthew
Publicado en: In the Proceedings of the Conference on Recent Advances in Natural Language Processing (RANLP 2021), 2021
Editor: ACL

TLR at the NTCIR-15 FinNum-2 Task: Improving Text Classifiers for Numeral Attachment in Financial Social Data

Autores: Moreno, Jose G.; Boros, Emanuela; Doucet, Antoine
Publicado en: In Proceedings of the 15th NTCIR Conference on Evaluation of Information Access Technologies, Edición 2, 2020
Editor: Association for Computing Machinery
DOI: 10.5281/zenodo.4680695

Multilingual Detection of Fake News Spreaders via Sparse Matrix Factorization

Autores: Koloski, Boško; Pollak, Senja; Škrlj, Blaž
Publicado en: Notebook for PAN at CLEF 2020, 2020
DOI: 10.5281/zenodo.4059635

Unsupervised Approach to Cross-Lingual User Comments Summarization

Autores: Žagar, Aleš; Robnik-Šikonja, Marko
Publicado en: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Editor: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730327

Semantic Reasoning from Model-Agnostic Explanations

Autores: Stepišnik-Perdih, Timen; Lavrač, Nada; Škrlj, Blaž
Publicado en: In the Proceedings of the 2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI), 2021, ISBN 978-1-7281-8053-3
Editor: IEEE
DOI: 10.1109/sami50585.2021.9378668

Linking Named Entities across Languages using Multilingual Word Embeddings

Autores: Elvys Linhares Pontes, Jose G. Moreno, Antoine Doucet
Publicado en: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 2020, Página(s) 329-332, ISBN 9781450375856
Editor: ACM
DOI: 10.1145/3383583.3398597

Discovery Team at SemEval-2020 Task 1: Context-sensitive Embeddings not Always Better Than Static for Semantic Change Detection

Autores: Martinc, Matej; Montariol, Syrielle; Zosa, Elaine; Pivovarova, Lidia
Publicado en: In Proceedings of the Fourteenth Workshop on Semantic Evaluation (SemEval 2020), 2020, Página(s) 67-73
Editor: International Committee for Computational Linguistics
DOI: 10.5281/zenodo.4681022

Creative Language Generation in a Society of Engagement and Reflection

Autores: Wright, George A.; Purver, Matthew
Publicado en: In Proceedings of the Eleventh International Conference on Computational Creativity (ICCC2020), 2020
Editor: Association for Computational Creativity (ACC)
DOI: 10.5281/zenodo.4680484

A Review of Cross-Domain Text-to-SQL Models

Autores: Gan, Yujian; Purver, Matthew; Woodward, John
Publicado en: In the Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: Student Research Workshop, 2020, Página(s) 108-115
Editor: Association for Computational Linguistics
DOI: 10.5281/zenodo.4699229

Event Detection with Entity Markers

Autores: Boros, Emanuela; Moreno, Jose G.; Doucet, Antoine
Publicado en: In the Proceedings of the 43rd European Conference on Information Retrieval (ECIR 2021), 2021
Editor: Springer

Parsing Text in a Workspace for Language Generation

Autores: Wright, George A.; Purver, Matthew
Publicado en: In the Proceedings of the 2021 Society for Text & Discourse Annual Conference, 2021
Editor: EasyChair

Zero-shot cross-lingual content filtering: offensive language and hate speech detection

Autores: Pelicon, Andraž; Shekhar, Ravi; Martinc, Matej; Škrlj, Blaž; Pollak, Senja; Purver, Matthew
Publicado en: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Editor: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730308

Intérêt des modèles de caractères pour la détection d’événements

Autores: Boros, Emanuela; Besançon, Romaric; Ferret, Olivier; Grau, Brigitte
Publicado en: In Proceedings of TALN 2021, 2021
Editor: HAL-LIST

Embeddia at SemEval-2019 Task 6: Detecting hate with neural network and transfer learning approaches

Autores: Andraž Pelicon, Matej Martinc, and Petra Kralj Novak
Publicado en: Proceedings of The 13th International Workshop on Semantic Evaluation (SemEval), 2019
Editor: SemEval

Generating Data using Monte Carlo Dropout

Autores: Kristian Miok, Dong Nguyen-Doan, Daniela Zaharie, and Marko Robnik-Šikonja
Publicado en: IEEE 15th International Conference on Intelligent Computer Communication and Processing (ICCP 2019), 2019
Editor: IEEE

Detecting Depression with Word-Level Multimodal Fusion

Autores: Morteza Rohanian, Julian Hough, Matthew Purver
Publicado en: Interspeech 2019, 2019, Página(s) 1443-1447
Editor: ISCA
DOI: 10.21437/interspeech.2019-2283

Clustering Ideological Terms in Historical Newspaper Data with Diachronic Word Embeddings

Autores: Jani Marjanen, Lidia Pivovarova, Elaine Zosa, and Jussi Kurunmäki
Publicado en: Proceedings of the 5th International Workshop on Computational History, 2019
Editor: Aachen : R. Piskac c/o Redaktion Sun SITE, Informatik V, RWTH Aachen

Karst exploration: Extracting terms and definitions from karst

Autores: Senja Pollak, Andraž Repar, Matej Martinc, and Vid Podpečan
Publicado en: Proceedings of the 6th biennial conference on electronic lexicography, eLex 2019, 2019
Editor: Presses Universitaires de Louvain

Who is hot and who is not? Profiling celebs on Twitter

Autores: Martinc, Matej; Škrlj, Blaž; Pollak, Senja
Publicado en: Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum, Edición 6, 2019
Editor: Aachen : R. Piskac c/o Redaktion Sun SITE, Informatik V, RWTH Aachen

Fake or Not: Distinguishing Between Bots, Males and Females

Autores: Martinc, Matej; Škrlj, Blaž; Pollak, Senja
Publicado en: Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum, Edición 2, 2019
Editor: Aachen : R. Piskac c/o Redaktion Sun SITE, Informatik V, RWTH Aachen

Pooled LSTM for Dutch cross-genre gender classification

Autores: Matej Martinc, Senja Pollak
Publicado en: Proceedings of the Shared Task on Cross-Genre Gender Detection in Dutch at the Computational Linguistics in the Netherlands (CLIN 2019) conference, 2019
Editor: Aachen : R. Piskac c/o Redaktion Sun SITE, Informatik V, RWTH Aachen

Methods for Generating Colourful and Factual Multilingual News Headlines

Autores: Alnajjar, Khalid; Leppänen, Leo; Toivonen, Hannu
Publicado en: In Proceedings of the 10th International Conference on Computational Creativity (ICCC 2019), Edición 1, 2019, Página(s) 258-265, ISBN 978-989-54160-1-1
Editor: Association for Computational Creativity (ACC)

TLR at BSNLP2019: A Multilingual Named Entity Recognition System

Autores: Jose G. Moreno, Elvys Linhares Pontes, Mickael Coustaty, Antoine Doucet
Publicado en: Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing, 2019, Página(s) 83-88
Editor: Association for Computational Linguistics
DOI: 10.18653/v1/w19-3711

Clustering Ideological Terms in Historical Newspaper Data with Diachronic Word Embeddings

Autores: Jani Marjanen; Lidia Pivovarova; Elaine Zosa; Jussi Kurunmäki
Publicado en: HistoInformatics 2019: International Workshop on Computational History 2019, 2019
DOI: 10.5281/zenodo.3689467

A Corpus Study on Questions, Responses and Misunderstanding Signals in Conversations with Alzheimer's Patients

Autores: Shamila Nasreen; Matthew Purver; Julian Hough
Publicado en: Proceedings of the 23rd Workshop on the Semantics and Pragmatics of Dialogue, Edición 13, 2019
DOI: 10.5281/zenodo.3689456

Word Clustering for Historical Newspapers Analysis

Autores: Pivovarova, Lidia; Marjanen, Jani; Zosa, Elaine
Publicado en: Proceedings of the Workshop on Language Technology for Digital Historical Archives in conjunction with RANLP-2019, 2019, Página(s) 3-10
Editor: INCOMA Ltd.
DOI: 10.5281/zenodo.3402940

TeMoCo: A Visualization Tool for Temporal Analysis of Multi-party Dialogues in Clinical Settings

Autores: Shane Sheehan, Pierre Albert, Saturnino Luz, Masood Masoodian
Publicado en: 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS), 2019, Página(s) 690-695, ISBN 978-1-7281-2286-1
Editor: IEEE
DOI: 10.1109/CBMS.2019.00140

Gender, language, and society: word embeddings as a reflection of social inequalities in linguistic corpora

Autores: Supej, Anka; Plahuta, Marko; Purver, Matthew; Mathioudakis, Michael; Pollak, Senja
Publicado en: In Znanost in družbe prihodnosti, Slovensko sociološko srečanje [Annual meeting of the Slovenian Sociological Association: Science and future societies], 2019
Editor: Slovensko sociološko društvo
DOI: 10.5281/zenodo.3894466

No Time Like the Present: Methods for Generating Colourful and Factual Multilingual News Headlines

Autores: Alnajjar, Khalid; Leppänen, Leo; Toivonen, Hannu
Publicado en: Proceedings of the 10th International Conference on Computational Creativity (ICCC2019), 2019
Editor: Association for Computational Creativity

Multiple Imputation for Biomedical Data using Monte Carlo Dropout Autoencoders

Autores: Kristian Miok, Dong Nguyen-Doan, Marko Robnik-Šikonja, Daniela Zaharie
Publicado en: 2019 E-Health and Bioengineering Conference (EHB), 2019, Página(s) 1-4, ISBN 978-1-7281-2603-6
Editor: IEEE
DOI: 10.1109/EHB47216.2019.8969940

High Quality ELMo Embeddings for Seven Less-Resourced Languages

Autores: Ulčar, Matej; Robnik-Šikonja, Marko
Publicado en: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 2020, Página(s) 4731–4738
Editor: The European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3894535

Leveraging Contextual Embeddings for Detecting Diachronic Semantic Shift

Autores: Martinc, Matej; Kralj Novak, Petra; Pollak, Senja
Publicado en: Proceedings of the 12th Language Resources and Evaluation Conference (LREC2020), 2020, Página(s) 4811-4819
Editor: The European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3894557

Multilingual Culture-Independent Word Analogy Datasets

Autores: Ulčar, Matej; Vaik, Kristiina; Lindström, Jessica; Dailidėnaitė, Milda; Robnik-Šikonja, Marko
Publicado en: Proceedings of the 12th Language Resources and Evaluation Conference (LREC2020), Edición 1, 2020, Página(s) 4074-4080
Editor: The European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3894553

Dataset for Temporal Analysis of English-French Cognates

Autores: Frossard, Esteban; Coustaty, Mickael; Doucet, Antoine; Jatowt, Adam; Hengchen, Simon
Publicado en: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 2020, Página(s) 855-859
Editor: The European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3693651

A Dataset for Multi-lingual Epidemiological Event Extraction

Autores: Mutuvi, Stephen; Doucet, Antoine; Lejeune, Gael; Odeo, Moses
Publicado en: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 2020, Página(s) 4139–4144
Editor: The European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3709626

CoSimLex: A Resource for Evaluating Graded Word Similarity in Context

Autores: Carlos Santos Armendariz; Matthew Purver; Matej Ulčar; Senja Pollak; Nikola Ljubešič; Marko Robnik-Šikonja; Mark Granroth-Wilding; Kristiina Vaik
Publicado en: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 2020, Página(s) 5878–5886
Editor: The European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3894565

Text Visualization for the Support of Lexicography-Based Scholarly Work

Autores: Sheehan, Shane; Luz, Saturnino
Publicado en: Proceedings of the 6th biennial conference on electronic lexicography, eLex 2019, 2019, Página(s) 694-725
Editor: Lexical Computing CZ s.r.o., Brno, Czech Republic
DOI: 10.5281/zenodo.3894619

Mining semantic relations from comparable corpora through intersections of word embeddings

Autores: Vintar, Špela; Grčić Simeunović, Larisa; Martinc, Matej; Pollak, Senja; Stepišnik, Uroš
Publicado en: Proceedings of the LREC 2020 13th Workshop on Building and Using Comparable Corpora, 2020, Página(s) 29-34
Editor: European Language Resources Association
DOI: 10.5281/zenodo.3894635

Interaction Patterns in Conversations with Alzheimer's Patients

Autores: Nasreen, Shamila; Purver, Matthew; Hough, Julian
Publicado en: Poster presentation at the 7th International Conference on Statistical Language and Speech Processing. Ljubljana, Slovenia, 2019
Editor: Springer
DOI: 10.5281/zenodo.3894637

Multilingual Dynamic Topic Model

Autores: Elaine Zosa, Mark Granroth-Wilding
Publicado en: Proceedings - Natural Language Processing in a Deep Learning World, 2019, Página(s) 1388-1396, ISBN 978-954-452-056-4
Editor: Incoma Ltd., Shoumen, Bulgaria
DOI: 10.26615/978-954-452-056-4_159

The NetViz terminology visualization tool and the use cases in karstology domain modeling

Autores: Pollak, Senja; Podpečan, Vid; Miljkovic, Dragana; Stepišnik, Uroš; Vintar, Špela
Publicado en: Proceedings of the 6th International Workshop on Computational Terminology (COMPUTERM 2020), 2020, Página(s) 55-61
Editor: European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3894686

Communities of related terms in Karst terminology co-occurrence network

Autores: Miljkovic, Dragana; Kralj, Jan; Stepišnik, Uroš; Pollak, Senja
Publicado en: Proceedings of the 6th biennial conference on electronic lexicography, eLex 2019, 2019, Página(s) 357-373
Editor: Lexical Computing CZ s.r.o., Brno, Czech Republic
DOI: 10.5281/zenodo.3894684

A Comparison of Unsupervised Methods for Ad hoc Cross-Lingual Document Retrieval

Autores: Zosa, Elaine; Granroth-Wilding, Mark; Pivovarova, Lidia
Publicado en: Proceedings of the Cross-Language Search and Summarization of Text and Speech Workshop, 2020, Página(s) 32-37
Editor: European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3898384

Capturing Evolution in Word Usage: Just Add More Clusters?

Autores: Matej Martinc, Syrielle Montariol, Elaine Zosa, Lidia Pivovarova
Publicado en: Companion Proceedings of the Web Conference 2020, 2020, Página(s) 343-349, ISBN 9781450370240
Editor: ACM
DOI: 10.1145/3366424.3382186

Evaluating the Robustness of Embedding-based Topic Models to OCR Noise

Autores: Zosa, Elaine; Mutuvi, Stephen; Granroth-Wilding, Mark; Doucet, Antoine
Publicado en: In the Proceedings of the 23rd International Conference on Asia-Pacific Digital Libraries (ICADL 2021), 2021
Editor: Springer
DOI: 10.1007/978-3-030-91669-5_30

Evaluating Natural Language Descriptions Generated in a Workspace-Based Architecture

Autores: Wright, George A.; Purver, Matthew
Publicado en: In the Proceedings of the 12th International Conference on Computational Creativity, ICCC2021, 2021
Editor: Association for Computational Creativity

Multi-label classification of COVID-19-related articles with an autoML approach

Autores: Tavchioski, Ilija; Koloski, Boshko; Škrlj, Blaž; Pollak, Senja
Publicado en: In Proceedings of the BioCreative VII Challenge Evaluation Workshop, 2021, Página(s) 295-299, ISBN 978-0-578-32368-8
Editor: BioCreative

L3i_LBPAM at the FinSim-2 task: Learning Financial Semantic Similarities with Siamese Transformers

Autores: Nhu Khoa Nguyen; Emanuela Boros; Gaël Lejeune; Antoine Doucet; Thierry Delahaut
Publicado en: Edición 30, 2021
Editor: IW3C2
DOI: 10.5281/zenodo.4734321

CTLR@WiC-TSV: Target Sense Verification using Marked Inputs and Pre-trained Models

Autores: Moreno, Jose G.; Linhares Pontes, Elvys; Dias, Gaël
Publicado en: In 6th Workshop on Semantic Deep Learning (SemDeep-6) associated to 29th International Joint Conference on Artificial Intelligence and 17th Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI 2020), Edición 2, 2021
Editor: International Joint Conferences on Artificial Intelligence
DOI: 10.5281/zenodo.4680720

Exploratory analysis of news sentiment using subgroup discovery

Autores: Valmarska, Anita; Cabrera-Diego, Luis Adrián; Linhares Pontes, Elvys; Pollak, Senja
Publicado en: In Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing in conjunction to EACL2021, 2021
Editor: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730472

COVID-19 Therapy Target Discovery with Context-Aware Literature Mining

Autores: Martinc, Matej; Škrlj, Blaž; Pirkmajer, Sergej; Lavrač, Nada; Cestnik, Bojan; Marzidovšek, Martin; Pollak, Senja
Publicado en: In Proceedings of the 23rd International Conference on Discovery Science (DS 2020), 2020, Página(s) 109-123
Editor: Springer International Publishing
DOI: 10.5281/zenodo.4306020

A Baseline Document Planning Method for Automated Journalism

Autores: Leppänen, Leo; Toivonen, Hannu
Publicado en: In the Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), 2021
Editor: Association for Computational Linguistics

Multilingual Epidemiological Text Classification: A Comparative Study

Autores: Mutuvi, Stephen; Boros, Emanuela; Doucet, Antoine; Lejeune, Gaël; Jatowt, Adam; Odeo, Moses
Publicado en: Proceedings of the 28th International Conference on Computational Linguistics, Edición 44, 2020
Editor: International Committee on Computational Linguistics
DOI: 10.5281/zenodo.4476039

Multi-Modal Fusion with Gating Using Audio, Lexical and Disfluency Features for Alzheimer’s Dementia Recognition from Spontaneous Speech

Autores: Morteza Rohanian, Julian Hough, Matthew Purver
Publicado en: Interspeech 2020, 2020, Página(s) 2187-2191
Editor: ISCA
DOI: 10.21437/interspeech.2020-2721

Temporal Mental Health Dynamics on Social Media

Autores: Tom Tabak, Matthew Purver
Publicado en: Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020, 2020
Editor: Association for Computational Linguistics
DOI: 10.18653/v1/2020.nlpcovid19-2.7

Extending Neural Keyword Extraction with TF-IDF tagset matching

Autores: Koloski, Boshko; Pollak, Senja; Škrlj, Blaž; Martinc, Matej
Publicado en: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Editor: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730354

The Importance of Character-Level Information in an Event Detection Model

Autores: Boros, Emanuela; Besançon, Romaric; Ferret, Olivier; Grau, Brigitte
Publicado en: In Proceedings of NLDB 2021, 2021
Editor: Springer

Benchmarks for Unsupervised Discourse Change Detection

Autores: Duong, Quan; Pivovarova, Lidia; Zosa, Elaine
Publicado en: In the Proceedings of the Histoinformatics workshop 2021, 2021
Editor: CEUR

Three-part diachronic semantic change dataset for Russian

Autores: Andrey Kutuzov, Lidia Pivovarova
Publicado en: Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change 2021, 2021, Página(s) 7-13
Editor: Association for Computational Linguistics
DOI: 10.18653/v1/2021.lchange-1.2

SemEval-2020 Task 3: Graded Word Similarity in Context

Autores: Armendariz, Carlos Santos; Purver, Matthew; Pollak, Senja; Ljubešić, Nikola; Ulčar, Matej; Robnik-Šikonja, Marko; Vulić, Ivan; Pilehvar, Mohammed Taher
Publicado en: In Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval 2020), 2020, Página(s) 36-49
Editor: International Committee for Computational Linguistics
DOI: 10.5281/zenodo.4309679

Hybrid Tagger – An Industry-driven Solution for Extreme Multi-label Text Classification

Autores: Vaik, Kristiina; Asula, Marit; Sirel, Raul
Publicado en: In Proceedings of the LREC2020 Industry Track, 2020, Página(s) 26-30
Editor: The European Language Resources Association (ELRA)

A Baseline Document Planning Method for Automated Journalism

Autores: Leppänen, Leo; Toivonen, Hannu
Publicado en: In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021), 2021
Editor: Linköping University Electronic Press, Sweden

TeMoCo-Doc - A visualization for supporting temporal and contextual analysis of dialogues and associated documents

Autores: Shane Sheehan, Saturnino Luz, Pierre Albert, Masood Masoodian
Publicado en: Proceedings of the International Conference on Advanced Visual Interfaces, 2020, Página(s) 1-3, ISBN 9781450375351
Editor: ACM
DOI: 10.1145/3399715.3399956

Named Entity Recognition Architecture Combining Contextual and Global Features

Autores: Tran, Thi Hong Hanh; Doucet, Antoine; Sidere, Nicolas; Moreno, Jose G.; Pollak, Senja
Publicado en: In the Proceedings of the 23rd International Conference on Asia-Pacific Digital Libraries (ICADL 2021), 2021
Editor: Springer
DOI: 10.1007/978-3-030-91669-5_21

Aligning Estonian and Russian news industry keywords with the help of subtitle translations and an environmental thesaurus

Autores: Repar, Andraž; Shumakov, Andrej
Publicado en: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Editor: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730392

Zaznavanje sentimenta v novicah z globokimi nevronskimi mrežami

Autores: Arhar Holdt, Špela; Pollak, Senja; Robnik-Šikonja, Marko; Krek, Simon
Publicado en: In Proceedings of the Conference on Language Technologies and Digital Humanities, JTDH2020, 2020, Página(s) 10-15
Editor: Institute of Contemporary History
DOI: 10.5281/zenodo.4059729

Étude comparative de méthodes de classification multilingue appliquées à l'épidémiologie

Autores: Stephen Mutuvi; Emanuela Boros; Antoine Doucet; Gaël Lejeune; Adam Jatowt; Moses Odeo
Publicado en: Edición 29, 2021
Editor: l’Association Francophone de Recherche d’Information et Applications ARIA
DOI: 10.5281/zenodo.4734472

Word-embedding based bilingual terminology alignment

Autores: Repar, Andraž; Martinc, Matej; Ulčar, Matej; Pollak, Senja
Publicado en: In Proceedings of eLex 2021 (eLex2021), 2021
Editor: Brno: Lexical Computing CZ, s.r.o.

Investigating the Semantic Wave in Tutorial Dialogues: An Annotation Scheme and Corpus Study on Analogy Components

Autores: Del-Bosque-Trevino, Jorge; Hough, Julian; Purver, Matthew
Publicado en: In Proceedings of the 24th SemDial Workshop on the Semantics and Pragmatics of Dialogue (SemDial), 2020

Interesting cross-border news discovery using cross-lingual article linking and document similarity

Autores: Koloski, Boshko; Zosa, Elaine; Stepišnik-Perdih, Timen; Škrlj, Blaž; Paju, Tarmo; Pollak, Senja
Publicado en: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Editor: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730369

An evaluation of BERT and Doc2Vec model on the IPTC Subject Codes prediction dataset

Autores: Pranjić, Marko; Robnik-Šikonja, Marko; Pollak, Senja
Publicado en: In Proceedings of the 24th International Multiconference – IS2021 (SiKDD), 2021
Editor: Jožef Stefan Institute

Evaluation of related news recommendations using document similarity methods

Autores: Pranjić, Marko; Podpečan, Vid; Robnik-Šikonja, Marko; Pollak, Senja
Publicado en: In Proceedings of the Conference on Language Technologies and Digital Humanities, JTDH2020, 2020, Página(s) 81-86
Editor: Institute of Contemporary History
DOI: 10.5281/zenodo.4059710

Dimenzija spola v slovenskih vektorskih vložitvah besed: primerjava modelov prek analogij poklicev

Autores: Supej, Anka; Ulčar, Matej; Robnik-Šikonja, Marko; Pollak, Senja
Publicado en: In Proceedings of the Joint Conference on Digital Libraries (JCDL 2020), 2020, Página(s) 93-100
Editor: Institute of Contemporary History
DOI: 10.5281/zenodo.4059700

Mitigating Gender Bias in Word Embeddings using Explicit Gender Free Corpus

Autores: Hargrave, David
Publicado en: Masters thesis, School of Electronic Engineering and Computer Science, Queen Mary University of London, 2021
Editor: Queen Mary University of London

Silicon Valley och makten över medierna [Silicon Valley and the power over media]

Autores: Carl-Gustav Linden
Publicado en: Edición 1, 2020
Editor: Nordicom
DOI: 10.48335/9789188855350

Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation

Autores: Toivonen, Hannu; Boggia, Michele
Publicado en: 2021, ISBN 978-1-954085-13-8
Editor: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730375

