Cross-Lingual Embeddings for Less-Represented Languages in European News Media

Deliverables

Final context-dependent and dynamic embeddings technology (T1.2)

Context-aware cross-lingual embeddings which will enable improved understanding of short texts such as user comments in the context of an emerging comment thread and the news story being commented on (report and source code) (T1.2).

Initial cross-lingual and multilingual embeddings technology (T1.1)

Initial embeddings and transformations between a selection of all targeted languages (Estonian, Finnish, Swedish, Latvian, Lithuanian, Croatian, Slovene, English, Russian) (report and source code) (T1.1)

Initial cross-lingual semantic enrichment technology (T2.1)

Initial approach to named entity (NE) extraction and disambiguation and event detection, covering multiple domains and languages (report and source code) (T2.1).

Datasets, benchmarks and evaluation metrics for cross-lingual content analysis (T4.4)

Gathering and preprocessing training and testing data (Estonian, Latvian, Lithuanian, Russian, Croatian, Finnish and English) provided by the media partners (report and dataset) (T4.4).

Initial deep network architecture (T1.3)

Deep neural networks will be adapted to morphologically rich languages by using character-level inputs and additional information on morphology (suffixes, prefixes, separately trained POS tags) (report and source code) (T1.3).

Interim report on ethics and responsible science and journalism (T6.5)

Interim report on ethics and responsible science and journalism, with analysis of news production and new tool development (T6.5).

Final evaluation report on cross-lingual user generated content filtering and analysis technology (T3.4)

Producing datasets for evaluation and development of algorithms (T3.4).

Final dynamic multilingual news generation technology (T5.2)

Development of a novel method for automatically organising news articles to be maximally informative to the assumed reader (report and source code) (T5.2).

Final cross-lingual news viewpoints identification technology (T4.3)

Development of methods for detecting viewpoints and sentiments based on media sources (report and source code) (T4.3).

Final real-time multilingual news linking technology (T4.1)

Development of tools for linking news stories across languages based on their topics and contents (report and source code) (T4.1).

Final evaluation report on cross-lingual content analysis technology (T4.4)

All tools developed in WP4 will be evaluated using the produced datasets and manual user evaluation (T4.4).

Final report on ethics and responsible science and journalism (T6.5)

Final report on ethics and responsible science and journalism (T6.5).

Initial interpretability and visualisation technology (T1.4)

Initial approaches to explaining deep learning models by adapting perturbation-based explanation methods based on coalitional game theory to text classification, and initial development of visual tools for explaining the classification process (report and source code) (T1.4).

Final technology for multilingual and self-explainable news generation (T5.1)

Based on the analysis of newsrooms (WP6), the NLG technology will be adapted for the requirements of news generation. The task will develop mechanisms for (i) determining what is interesting or important in the given data and deciding what to report, and for (ii) rendering that information in an accurate manner (iii) in multiple languages (report and source code) (T5.1).

Final evaluation report on cross-lingual embedding technology (T1.5)

Report on evaluation of the cross-lingual and multilingual embeddings on public datasets and challenges (T1.5).

Initial context-dependent and dynamic embeddings technology (T1.2)

Context-aware cross-lingual embeddings which will enable improved understanding of short texts such as user comments in the context of an emerging comment thread and the news story being commented on (report and source code) (T1.2).

Report on user needs and challenges for news media industry (T6.1)

Initial report on identification and analysis of the needs of different stakeholders in the news media industry. We will arrange a workshop to identify in detail the challenges that are specific to the operations of different media partners and prepare specifications documentation (T6.1).

Recommendations on avoiding gender and other biases (T6.4)

The means to avoid and detect gender and other biases in news media content creation will be developed in T6.4. This deliverable will propose recommendations for avoiding gender bias (T6.4).

Final interpretability and visualisation technology (T1.4)

Adaptation of the three most popular perturbation-based explanation methods based on coalitional game theory (IME, LIME and SHAP) to make them suitable for text classification, and development of visualisation techniques in which different explanatory lexical units in the source texts (words, n-grams, sentences) are visualized (report and source code) (T1.4).

Initial cross-lingual context and opinion analysis technology (T3.1)

Report on initial developed technology for a range of user comment analyses, including topic modelling, conversation structure and context modelling, sentiment, stance and opinion detection and effect and information spread measurement (report and source code) (T3.1).

Final report on gender bias in content creation (T6.4)

Final report on gender bias in content creation (T6.4).

Reusable EMBEDDIA components available through the ClowdFlows web interface (T7.4)

Developed tools and procedures will be incorporated as widgets to make them available beyond the media context and to assure reusability and repeatability of experiments (report and source code) (T7.4).

Initial multilingual news linking technology (T4.1)

Development of initial tools for linking news stories across languages based on their topics and contents (report and source code) (T4.1).

Initial keyword extraction techniques (T2.2)

Initial keyword extraction by application of statistical approaches (based on heuristics), machine learning approaches, as well as graph-based approaches (report and source code) (T2.2).

Final cross-lingual news summarisation and visualisation technology (T4.2)

Development of textual and visual language-independent multi-document news summarisation (report and source code) (T4.2).

Initial dynamic news generation technology (T5.2)

Development of a novel method for automatically organising news articles, considering the domain of the article, effects of time and news repetition (report and source code) (T5.2).

Refined analysis of news media partners’ needs and challenges (T6.1)

Refined report of news media partners’ needs and challenges and their analysis with regard to the state of the art in NLP for news media (T6.1).

Final cross-lingual and multilingual embeddings technology (T1.1)

Embeddings and transformations between all targeted languages, including Estonian, Finnish, Swedish, Latvian, Lithuanian, Croatian and Slovene, as well as English and Russian (report and source code) (T1.1).

Report generator from multilingual comments (T3.3)

Report on developed and implemented methods for generating human-readable reports in multiple languages from the outputs of the methods developed in T3.1 and T3.2 (report and source code) (T3.3).

Datasets, benchmarks and evaluation metrics for cross-lingual user generated content filtering and analysis (T3.4)

Evaluation and development of algorithms requires relevant, annotated, and multilingual datasets (report and dataset) (T3.4).

Final evaluation report on advanced cross-lingual NLP technology (T2.4)

Final report on existing evaluation datasets and benchmarks for NER, NEL and event detection (for instance, ACE, Meantime and TAC KBP’s Entity Discovery and Linking tasks) (report and dataset) (T2.4).

Final deep network architecture (T1.3)

Deep neural networks will be adapted to morphologically rich languages by using character-level inputs and additional information on morphology (suffixes, prefixes, separately trained POS tags) (report and source code) (T1.3).

Multilingual language generation approach (T2.3)

Incorporating hybrid techniques in the architecture, to take advantage of the robustness of machine learning techniques and transparency of rule-based techniques. Adaptation of the context-aware word-embeddings developed in T1.2 to improve fluency and variability in the generated texts (report and source code) (T2.3).

Final multilingual keyword extraction techniques (T2.2)

Application and further development of statistical approaches (based on heuristics), machine learning approaches, as well as graph-based approaches (report and source code) (T2.2).

Initial news generation technology (T5.1)

Based on the analysis of newsrooms (WP6), the NLG technology will be adapted for the requirements of news generation. The task will develop mechanisms for (i) determining what is interesting or important in the given data and deciding what to report, and for (ii) rendering that information in an accurate manner (iii) in multiple languages (report and source code) (T5.1).

Final report on EMBEDDIA Assistant platform evaluation (T6.3)

Final report on EMBEDDIA Assistant platform evaluation by media partners (T6.3).

Platform requirements documentation and platform design (T6.2)

The EMBEDDIA Toolkit will incorporate different tools and resources developed in WP1–WP5, and the EMBEDDIA Media Assistant platform will be built on top of it. The platform will be built as a series of base microservices, functional microservices and task-oriented APIs. This deliverable will report on platform requirements and platform design (T6.2).

Final cross-lingual comment filtering technology (T3.2)

Final report on developed tools for automatic flagging or filtering of user comments, specifically targeted at the use cases defined by end user partners in WP6, e.g., detection of hate speech and political trolling, attempts to elicit extreme reactions and influence others’ opinions (report and source code) (T3.2).

Initial cross-lingual news viewpoints identification technology (T4.3)

Initial approaches for detecting viewpoints and sentiments based on media sources (report and source code) (T4.3).

Final cross-lingual semantic enrichment technology (T2.1)

Generalization of approaches to multiple domains and languages and to large-scale corpora, and integration of cross-lingual embeddings (report and source code) (T2.1).

Creative multilingual technology for news and headline generation (T5.3)

We will make the generated texts more varied and colourful by generating creative expressions, especially in headlines (report and source code) (T5.3).

Final cross-lingual context and opinion analysis technology (T3.1)

Final report on developed technology for a range of user comment analyses, including topic modelling, conversation structure and context modelling, sentiment, stance and opinion detection and effect and information spread measurement (report and source code) (T3.1).

Datasets, benchmarks and evaluation metrics for advanced cross-lingual NLP technology (T2.4)

Report on existing evaluation datasets and benchmarks for NER, NEL and event detection (for instance, ACE, Meantime and TAC KBP’s Entity Discovery and Linking tasks) (report and dataset) (T2.4).

Initial cross-lingual comment filtering technology (T3.2)

Report on developed tools for automatic flagging or filtering of user comments, specifically targeted at the use cases defined by end user partners in WP6, e.g., detection of hate speech and political trolling, attempts to elicit extreme reactions and influence others’ opinions (report and source code) (T3.2).

Datasets, benchmarks and evaluation metrics for multilingual text generation (T5.4)

Texts (news stories) and structured datasets from which news can be generated will be collected from news partners (report and datasets), and a methodology for evaluation will be defined (T5.4).

Selected EMBEDDIA components in ClowdFlows (T7.4)

Initial selection of tools and procedures incorporated as widgets in the web-based platform ClowdFlows to make them available beyond the media context and assure reusability and repeatability of experiments (report and source code) (T7.4).

Initial cross-lingual news summarisation and visualisation technology (T4.2)

Development of textual and visual language-independent multi-document news summarisation (report and source code) (T4.2).

Final evaluation report on multilingual text generation technology (T5.4)

Final evaluation report on multilingual text generation technology (T5.4).

Datasets, benchmarks and evaluation metrics for cross-lingual word embeddings (T1.5)

Training and evaluation data, stored in a dedicated GitHub repository (report and datasets) (T1.5).

Final EMBEDDIA Media Assistant platform, packaged in docker container (T6.2)

Final EMBEDDIA Media Assistant platform incorporating different tools and resources, packaged in a Docker container (report and source code) (T6.2).

Project website and social media accounts (T7.1)

A project website, which will function both as a project dissemination tool and as a point of access to the technical outcomes produced by the project, and social media accounts/pages on relevant social networks will be created (T7.1).

Publications

To BAN or Not to BAN: Bayesian Attention Networks for Reliable Hate Speech Detection

Author(s): Kristian Miok, Blaž Škrlj, Daniela Zaharie, Marko Robnik-Šikonja
Published in: Cognitive Computation, 2021, ISSN 1866-9956
Publisher: Springer Verlag
DOI: 10.1007/s12559-021-09826-9

Cross-lingual alignments of ELMo contextual embeddings

Author(s): Ulčar, Matej; Robnik-Šikonja, Marko
Published in: Neural Computing and Applications, Issue 3, 2022, ISSN 0941-0643
Publisher: Springer Verlag
DOI: 10.1007/s00521-022-07164-x

NeSyChair: Automatic Conference Scheduling Combining Neuro-Symbolic Representations and Constrained Clustering

Author(s): Škvorc, Tadej; Lavrač, Nada; Robnik-Šikonja, Marko
Published in: IEEE Access, Issue 10, 2022, ISSN 2169-3536
Publisher: Institute of Electrical and Electronics Engineers Inc.
DOI: 10.1109/ACCESS.2022.3144932

autoBOT: evolving neuro-symbolic representations for explainable low resource text classification

Author(s): Blaž Škrlj, Matej Martinc, Nada Lavrač, Senja Pollak
Published in: Machine Learning, 2021, ISSN 0885-6125
Publisher: Kluwer Academic Publishers
DOI: 10.1007/s10994-021-05968-x

MICE: Mining Idioms with Contextual Embeddings

Author(s): Škvorc, Tadej; Gantar, Polona; Robnik-Šikonja, Marko
Published in: Knowledge-Based Systems, Issue 237, 2022, ISSN 0950-7051
Publisher: Elsevier BV
DOI: 10.1016/j.knosys.2021.107606

Zero-Shot Learning for Cross-Lingual News Sentiment Classification

Author(s): Andraž Pelicon, Marko Pranjić, Dragana Miljković, Blaž Škrlj, Senja Pollak
Published in: Applied Sciences, Issue 10/17, 2020, Page(s) 5993, ISSN 2076-3417
Publisher: MDPI
DOI: 10.3390/app10175993

Supervised and Unsupervised Neural Approaches to Text Readability

Author(s): Matej Martinc; Senja Pollak; Marko Robnik-Šikonja
Published in: Computational Linguistics, Issue 47.1, 2021, Page(s) 141-179, ISSN 0891-2017
Publisher: MIT Press
DOI: 10.1162/coli_a_00398

Nazaj v prihodnost: avtomatizacija in preobrazba novinarske epistemologije

Author(s): Igor Vobič, Marko Robnik-Šikonja, Monika Kalin Golob
Published in: Javnost - The Public, Issue 26/sup1, 2019, Page(s) S41-S61, ISSN 1318-3222
Publisher: European Institute for Communication and Culture
DOI: 10.1080/13183222.2019.1696600

What makes a reporter human? A Research Agenda for Augmented Journalism

Author(s): Lindén, Carl-Gustav
Published in: Questions de communication, 2020, ISSN 2259-8901
Publisher: Presses universitaires de Lorraine
DOI: 10.4000/questionsdecommunication.23301

Cross-lingual Transfer of Sentiment Classifiers

Author(s): Robnik-Šikonja, Marko; Reba, Kristjan; Mozetič, Igor
Published in: Slovenščina 2.0, Issue 9(1), 2021, Page(s) 1-25, ISSN 2335-2736
Publisher: Ljubljana University Press, Faculty of Arts
DOI: 10.4312/slo2.0.2021.1.1-25

Completability vs (In)completeness

Author(s): Eleni Gregoromichelaki, Gregory James Mills, Christine Howes, Arash Eshghi, Stergios Chatzikyriakidis, Matthew Purver, Ruth Kempson, Ronnie Cann, Patrick G. T. Healey
Published in: Acta Linguistica Hafniensia, Issue 52/2, 2020, Page(s) 260-284, ISSN 0374-0463
Publisher: Nordisk Sprog- og Kulturforlag
DOI: 10.1080/03740463.2020.1795549

TNT-KID: Transformer-based neural tagger for keyword identification

Author(s): Matej Martinc, Blaž Škrlj, Senja Pollak
Published in: Natural Language Engineering, 2021, Page(s) 1-40, ISSN 1351-3249
Publisher: Cambridge University Press
DOI: 10.1017/s1351324921000127

Investigating cross-lingual training for offensive language detection

Author(s): Andraž Pelicon, Ravi Shekhar, Blaž Škrlj, Matthew Purver, Senja Pollak
Published in: PeerJ Computer Science, Issue 7, 2021, Page(s) e559, ISSN 2376-5992
Publisher: PeerJ Publishing
DOI: 10.7717/peerj-cs.559

Journalistic Passion as Commodity: A Managerial Perspective

Author(s): Carl-Gustav Lindén; Katja Lehtisaari; Mikko Grönlund; Mikko Villi
Published in: Journalism Studies, Issue 22(12), 2021, Page(s) 1701-1719, ISSN 1461-670X
Publisher: Routledge
DOI: 10.1080/1461670x.2021.1911672

Re-Representing Metaphor: Modeling Metaphor Perception Using Dynamically Contextual Distributional Semantics

Author(s): Stephen McGregor, Kat Agres, Karolina Rataj, Matthew Purver, Geraint Wiggins
Published in: Frontiers in Psychology, Issue 10, 2019, ISSN 1664-1078
Publisher: Frontiers Research Foundation
DOI: 10.3389/fpsyg.2019.00765

Towards Robust Text Classification with Semantics-Aware Recurrent Neural Architecture

Author(s): Blaž Škrlj, Jan Kralj, Nada Lavrač, Senja Pollak
Published in: Machine Learning and Knowledge Extraction, Issue 1/2, 2019, Page(s) 575-589, ISSN 2504-4990
Publisher: MDPI AG
DOI: 10.3390/make1020034

Predicting Slovene Text Complexity Using Readability Measures

Author(s): Tadej Škvorc, Simon Krek, Senja Pollak, Špela Arhar Holdt, Marko Robnik-Šikonja
Published in: Contributions to Contemporary History, 2019, ISSN 2463-7807
Publisher: OJS/PKP

Combining n-grams and deep convolutional features for language variety classification

Author(s): Matej Martinc, Senja Pollak
Published in: Natural Language Engineering, Issue 25/5, 2019, Page(s) 607-632, ISSN 1351-3249
Publisher: Cambridge University Press
DOI: 10.1017/S1351324919000299

TermEnsembler

Author(s): Andraž Repar, Vid Podpečan, Anže Vavpetič, Nada Lavrač, Senja Pollak
Published in: Terminology, Issue 25/1, 2019, Page(s) 93-120, ISSN 0929-9971
Publisher: John Benjamins Publishing Company
DOI: 10.1075/term.00029.rep

Reproduction, replication, analysis and adaptation of a term alignment approach

Author(s): Andraž Repar, Matej Martinc, Senja Pollak
Published in: Language Resources and Evaluation, 2019, ISSN 1574-020X
Publisher: Springer Verlag
DOI: 10.1007/s10579-019-09477-1

‘Our task is to demystify fears’: Analysing newsroom management of automation in journalism

Author(s): Marko Milosavljević, Igor Vobič
Published in: Journalism, 2019, Page(s) 146488491986159, ISSN 1464-8849
Publisher: SAGE Publications
DOI: 10.1177/1464884919861598

Methods and visualization tools for the analysis of medical, political and scientific concepts in Genealogies of Knowledge

Author(s): Saturnino Luz, Shane Sheehan
Published in: Palgrave Communications, Issue 6/1, 2020, ISSN 2055-1045
Publisher: Humanities and Social Sciences Communications
DOI: 10.1057/s41599-020-0423-6

Exploring the Relations Between Net Benefits of IT Projects and CIOs’ Perception of Quality of Software Development Disciplines

Author(s): Damjan Vavpotič, Marko Robnik-Šikonja, Tomaž Hovelja
Published in: Business & Information Systems Engineering, 2019, ISSN 2363-7005
Publisher: Springer Gabler
DOI: 10.1007/s12599-019-00612-4

Data Journalism as a Service: Digital Native Data Journalism Expertise and Product Development

Author(s): Ester Appelgren, Carl-Gustav Lindén
Published in: Media and Communication, Issue 8/2, 2020, Page(s) 62, ISSN 2183-2439
Publisher: Cogitatio
DOI: 10.17645/mac.v8i2.2757

How Furiously Can Colorless Green Ideas Sleep? Sentence Acceptability in Context

Author(s): Jey Han Lau, Carlos Armendariz, Shalom Lappin, Matthew Purver, Chang Shu
Published in: Transactions of the Association for Computational Linguistics, Issue 8, 2020, Page(s) 296-310, ISSN 2307-387X
Publisher: The MIT Press
DOI: 10.1162/tacl_a_00315

Computational generation of slogans

Author(s): Khalid Alnajjar, Hannu Toivonen
Published in: Natural Language Engineering, 2020, Page(s) 1-33, ISSN 1351-3249
Publisher: Cambridge University Press
DOI: 10.1017/S1351324920000236

In the Name of the Right to be Forgotten: New Legal and Policy Issues and Practices regarding Unpublishing Requests in Slovenian Online News Media

Author(s): Marko Milosavljević, Melita Poler, Rok Čeferin
Published in: Digital Journalism, 2020, Page(s) 1-17, ISSN 2167-0811
Publisher: Taylor & Francis
DOI: 10.1080/21670811.2020.1747942

(Mis)Information Operations: An Integrated Perspective

Author(s): Cinelli, Matteo; Conti, Mauro; Finos, Livio; Grisolia, Francesco; Kralj Novak, Petra; Peruzzi, Antonio; Tesconi, Maurizio; Zollo, Fabia; Quattrociocchi, Walter
Published in: Journal of Information Warfare, Issue 18(3), 2020, ISSN 1445-3312
Publisher: Mt. Eliza : Teamlink Australia

A Multilingual Study of Multi-Sentence Compression using Word Vertex-Labeled Graphs and Integer Linear Programming

Author(s): Linhares Pontes, Elvys; Huet, Stéphane; Torres Moreno, Juan Manuel; Gouveia da Silva, Thiago; Carneiro Linhares, Andréa
Published in: Computación y Sistemas, Issue 24(2), 2020, ISSN 1405-5546
Publisher: Centro de Investigacion en Computacion (CIC) del Instituto Politecnico Nacional (IPN)

Automated Journalism as a Source of and a Diagnostic Device for Bias in Reporting

Author(s): Leo Leppänen, Hanna Tuulonen, Stefanie Sirén-Heikel
Published in: Media and Communication, Issue 8/3, 2020, Page(s) 39, ISSN 2183-2439
Publisher: Cogitatio
DOI: 10.17645/mac.v8i3.3022

tax2vec: Constructing Interpretable Features from Taxonomies for Short Text Classification

Author(s): Blaž Škrlj, Matej Martinc, Jan Kralj, Nada Lavrač, Senja Pollak
Published in: Computer Speech & Language, Issue 65, 2021, Page(s) 101104, ISSN 0885-2308
Publisher: Academic Press
DOI: 10.1016/j.csl.2020.101104

Knowledge Graph informed Fake News Classification via Heterogeneous Representation Ensembles

Author(s): Koloski, Boshko; Stepišnik-Perdih, Timen; Robnik-Šikonja, Marko; Pollak, Senja; Škrlj, Blaž
Published in: Neurocomputing, 2022, ISSN 0925-2312
Publisher: Elsevier BV
DOI: 10.1016/j.neucom.2022.01.096

Cross-lingual transfer of abstractive summarizer to less-resource language

Author(s): Aleš Žagar, Marko Robnik-Šikonja
Published in: Journal of Intelligent Information Systems, 2021, ISSN 0925-9902
Publisher: Kluwer Academic Publishers
DOI: 10.1007/s10844-021-00663-8

Bisociative Literature-Based Discovery: Lessons Learned and New Word Embedding Approach

Author(s): Nada Lavrač, Matej Martinc, Senja Pollak, Maruša Pompe Novak, Bojan Cestnik
Published in: New Generation Computing, Issue 38/4, 2020, Page(s) 773-800, ISSN 0288-3635
Publisher: Springer Verlag
DOI: 10.1007/s00354-020-00108-w

Propositionalization and embeddings: two sides of the same coin

Author(s): Nada Lavrač; Blaž Škrlj; Marko Robnik-Šikonja
Published in: Machine Learning, Issue 109, 2020, ISSN 0885-6125
Publisher: Kluwer Academic Publishers
DOI: 10.1007/s10994-020-05890-8

Automating News Comment Moderation with Limited Resources: Benchmarking in Croatian and Estonian

Author(s): Shekhar, Ravi; Pranjić, Marko; Pollak, Senja; Pelicon, Andraž; Purver, Matthew
Published in: Journal for Language Technology and Computational Linguistics, Issue 2, 2020, Page(s) 49-79, ISSN 2190-6858
Publisher: German Society for Computational Linguistics and Language Technology (GSCL)
DOI: 10.5281/zenodo.4032371

Enhancing deep neural networks with morphological information

Author(s): Klemen, Matej; Krsnik, Luka; Robnik-Šikonja, Marko
Published in: Natural Language Engineering, 2022, ISSN 1351-3249
Publisher: Cambridge University Press
DOI: 10.1017/S1351324922000080

Slovene and Croatian word embeddings in terms of gender occupational analogies

Author(s): Matej Ulčar, Anka Supej, Marko Robnik-Šikonja, Senja Pollak
Published in: Slovenščina 2.0: empirical, applied and interdisciplinary research, Issue 9/1, 2021, Page(s) 26-59, ISSN 2335-2736
Publisher: Ljubljana University Press, Faculty of Arts
DOI: 10.4312/slo2.0.2021.1.26-59

MELHISSA: A Multilingual Entity Linking Architecture for Historical Press Articles

Author(s): Linhares Pontes, Elvys; Cabrera-Diego, Luis Adrian; Moreno, Jose G.; Boros, Emanuela; Hamdi, Ahmed; Doucet, Antoine; Sidere, Nicolas; Coustaty, Mickael
Published in: International Journal on Digital Libraries, 2021, ISSN 1432-1300
Publisher: Springer
DOI: 10.1007/s00799-021-00319-6

Recycling a genre for news automation

Author(s): Lauri Haapanen, Leo Leppänen
Published in: AILA Review, Issue 33, 2020, Page(s) 67-85, ISSN 1461-0213
Publisher: John Benjamins Publishing Company
DOI: 10.1075/aila.00030.haa

Incremental Composition in Distributional Semantics

Author(s): Matthew Purver, Mehrnoosh Sadrzadeh, Ruth Kempson, Gijs Wijnholds, Julian Hough
Published in: Journal of Logic, Language and Information, Issue 30/2, 2021, Page(s) 379-406, ISSN 0925-8531
Publisher: Kluwer Academic Publishers
DOI: 10.1007/s10849-021-09337-8

Kratt: Developing an Automatic Subject Indexing Tool for the National Library of Estonia

Author(s): Asula, Marit; Makke, Jane; Freienthal, Linda; Kuulmets, Hele-Andra; Sirel, Raul
Published in: Cataloging & Classification Quarterly, Issue 59:8, 2021, Page(s) 775-793, ISSN 0163-9374
Publisher: Haworth Press Inc.
DOI: 10.1080/01639374.2021.1998283

SNoRe: Scalable Unsupervised Learning of Symbolic Node Representations

Author(s): Sebastian Mežnar, Nada Lavrač, Blaž Škrlj
Published in: IEEE Access, Issue 8, 2020, Page(s) 212568-212588, ISSN 2169-3536
Publisher: Institute of Electrical and Electronics Engineers Inc.
DOI: 10.1109/access.2020.3039541

Token-Level Multilingual Epidemic Dataset for Event Extraction

Author(s): Stephen Mutuvi, Emanuela Boros, Antoine Doucet, Gaël Lejeune, Adam Jatowt, Moses Odeo
Published in: Linking Theory and Practice of Digital Libraries - 25th International Conference on Theory and Practice of Digital Libraries, TPDL 2021, Virtual Event, September 13–17, 2021, Proceedings, Issue 12866, 2021, Page(s) 55-59, ISBN 978-3-030-86323-4
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-86324-1_6

Entity Linking for Historical Documents: Challenges and Solutions

Author(s): Elvys Linhares Pontes, Luis Adrián Cabrera-Diego, Jose G. Moreno, Emanuela Boros, Ahmed Hamdi, Nicolas Sidère, Mickaël Coustaty, Antoine Doucet
Published in: Digital Libraries at Times of Massive Societal Transition - 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020, Kyoto, Japan, November 30 – December 1, 2020, Proceedings, Issue 12504, 2020, Page(s) 215-231, ISBN 978-3-030-64451-2
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-64452-9_19

Prioritization of COVID-19-Related Literature via Unsupervised Keyphrase Extraction and Document Representation Learning

Author(s): Blaž Škrlj, Marko Jukič, Nika Eržen, Senja Pollak, Nada Lavrač
Published in: Discovery Science - 24th International Conference, DS 2021, Halifax, NS, Canada, October 11–13, 2021, Proceedings, Issue 12986, 2021, Page(s) 204-217, ISBN 978-3-030-88941-8
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-88942-5_16

Identification of COVID-19 Related Fake News via Neural Stacking

Author(s): Boshko Koloski, Timen Stepišnik-Perdih, Senja Pollak, Blaž Škrlj
Published in: Combating Online Hostile Posts in Regional Languages during Emergency Situation - First International Workshop, CONSTRAINT 2021, Collocated with AAAI 2021, Virtual Event, February 8, 2021, Revised Selected Papers, Issue 1402, 2021, Page(s) 177-188, ISBN 978-3-030-73695-8
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-73696-5_17

FinEst BERT and CroSloEngual BERT - Less Is More in Multilingual Models

Author(s): Matej Ulčar, Marko Robnik-Šikonja
Published in: Text, Speech, and Dialogue - 23rd International Conference, TSD 2020, Brno, Czech Republic, September 8–11, 2020, Proceedings, Issue 12284, 2020, Page(s) 104-111, ISBN 978-3-030-58322-4
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-58323-1_11

RaKUn: Rank-based Keyword Extraction via Unsupervised Learning and Meta Vertex Aggregation

Author(s): Blaž Škrlj, Andraž Repar, Senja Pollak
Published in: Statistical Language and Speech Processing - 7th International Conference, SLSP 2019, Ljubljana, Slovenia, October 14–16, 2019, Proceedings, Issue 11816, 2019, Page(s) 311-323, ISBN 978-3-030-31371-5
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-31372-2_26

Language Comparison via Network Topology

Author(s): Blaž Škrlj, Senja Pollak
Published in: Statistical Language and Speech Processing - 7th International Conference, SLSP 2019, Ljubljana, Slovenia, October 14–16, 2019, Proceedings, Issue 11816, 2019, Page(s) 112-123, ISBN 978-3-030-31371-5
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-31372-2_10

Prediction Uncertainty Estimation for Hate Speech Classification

Author(s): Kristian Miok, Dong Nguyen-Doan, Blaž Škrlj, Daniela Zaharie, Marko Robnik-Šikonja
Published in: Statistical Language and Speech Processing - 7th International Conference, SLSP 2019, Ljubljana, Slovenia, October 14–16, 2019, Proceedings, Issue 11816, 2019, Page(s) 286-298, ISBN 978-3-030-31371-5
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-31372-2_24

Symbolic Graph Embedding Using Frequent Pattern Mining

Author(s): Blaž Škrlj, Nada Lavrač, Jan Kralj
Published in: Discovery Science - 22nd International Conference, DS 2019, Split, Croatia, October 28–30, 2019, Proceedings, Issue 11828, 2019, Page(s) 261-275, ISBN 978-3-030-33777-3
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-33778-0_21

EMBEDDIA Tools, Datasets and Challenges: Resources and Hackathon Contributions

Author(s): Pollak, Senja; Robnik-Šikonja, Marko; Purver, Matthew; Boggia, Michele; Shekhar, Ravi; Pranjić, Marko; Salmela, Salla; Krustok, Ivar; Paju, Tarmo; Linden, Carl-Gustav; Leppänen, Leo; Zosa, Elaine; Ulčar, Matej; Freienthal, Linda; Traat, Silver; Cabrera-Diego, Luis Adrián; Martinc, Matej; Lavrač, Nada; Škrlj, Blaž; Žnidaršič, Martin; Pelicon, Andraž; Koloski, Boshko; Podpečan, Vid; Kra
Published in: Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730464

EMBEDDIA hackathon report: Automatic sentiment and viewpoint analysis of Slovenian news corpus on the topic of LGBTIQ+

Author(s): Martinc, Matej; Perger, Nina; Pelicon, Andraž; Ulčar, Matej; Vezovnik, Andreja; Pollak, Senja
Published in: Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730336

Exploring Neural Language Models via Analysis of Local and Global Self-Attention Spaces

Author(s): Škrlj, Blaž; Sheehan, Shane; Eržen, Nika; Robnik-Šikonja, Marko; Luz, Saturnino; Pollak, Senja
Published in: Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730396

Grammatical Profiling for Semantic Change Detection

Author(s): Giulianelli, Mario; Kutuzov, Andrey; Pivovarova, Lidia
Published in: In the Proceedings of the 25th Conference on Computational Natural Language Learning (CoNLL 2021), 2021, Page(s) 423-434
Publisher: ACL

Cross-lingual Transfer of Twitter Sentiment Models Using a Common Vector Space

Author(s): Robnik-Šikonja, Marko; Reba, Kristijan; Mozetič, Igor
Published in: In Proceedings of the Conference on Language Technologies and Digital Humanities, JTDH2020, 2020, Page(s) 87-92
Publisher: Institute of Contemporary History
DOI: 10.5281/zenodo.4059725

When a Computer Cracks a Joke: Automated Generation of Humorous Headlines

Author(s): Alnajjar, Khalid; Hämäläinen, Mika
Published in: In the Proceedings of the 12th International Conference on Computational Creativity (ICCC21), 2021, ISBN 978-989-54160-3-5
Publisher: Association for Computational Creativity

Knowledge graph aware text classification

Author(s): Petrželková, Nela; Škrlj, Blaž; Lavrač, Nada
Published in: In Proceedings of the 23rd International Multiconference – IS2020, 2020
Publisher: Jožef Stefan Institute
DOI: 10.5281/zenodo.4072961

Relation Classification via Relation Validation

Author(s): Moreno, Jose G.; Doucet, Antoine; Grau, Brigitte
Published in: Proceedings of the 6th Workshop on Semantic Deep Learning (SemDeep-6), 2021
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730492

Simple ways to improve NER in every language using markup

Author(s): Cabrera-Diego, Luis Adrián; Moreno, Jose G.; Doucet, Antoine
Published in: In Proceedings of ECIR 2021, 2021
Publisher: CEUR Workshops
DOI: 10.5281/zenodo.4680998

A bilingual approach to specialised adjectives through word embeddings in the karstology domain

Author(s): Grčić Simeunović, Larisa; Martinc, Matej; Vintar, Špela
Published in: In Proceedings of TOTH 2020, 2020
Publisher: Université Savoie Mont Blanc
DOI: 10.5281/zenodo.6435390

Using contextual and cross-lingual word embeddings to improve variety in template-based NLG for automated journalism

Author(s): Rämö, Miia; Leppänen, Leo
Published in: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730334

Know your Neighbors: Efficient Author Profiling via Follower Tweets

Author(s): Koloski, Boško; Pollak, Senja; Škrlj, Blaž
Published in: Notebook for PAN at CLEF 2020, 2020
Publisher: CEUR-WS.org
DOI: 10.5281/zenodo.4059641

Corpus KAS 2.0: Cleaner and with New Datasets

Author(s): Žagar, Aleš; Kavaš, Matic; Robnik-Šikonja, Marko
Published in: In Proceedings of the 24th International Multiconference – IS2021 (Slovenian Conference on Artificial Intelligence), 2021
Publisher: Jožef Stefan Institute

Atténuer les erreurs de numérisation dans la reconnaissance d'entités nommées pour les documents historiques

Author(s): Emanuela Boros; Ahmed Hamdi; Elvys Linhares Pontes; Luis Adrián Cabrera-Diego; Jose G. Moreno; Nicolas Sidère; Antoine Doucet
Published in: Issue 29, 2021
Publisher: l’Association Francophone de Recherche d’Information et Applications ARIA
DOI: 10.5281/zenodo.4734435

Automated Hate Speech Target Identification

Author(s): Pelicon, Andraž; Škrlj, Blaž; Kralj Novak, Petra
Published in: In Proceedings of the 24th International Multiconference – IS2021 (Slovenian Conference on Artificial Intelligence), 2021
Publisher: Jožef Stefan Institute

Slav-NER: the 3rd Cross-lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic languages

Author(s): Piskorski et al
Published in: In Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing in conjunction to EACL2021, 2021
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730512

Bayesian Methods for Semi-supervised Text Annotation

Author(s): Miok, Kristian; Pirš, Gregor; Robnik-Šikonja, Marko
Published in: In Proceedings of the 14th Linguistic Annotation Workshop Co-located with COLING 2020, Issue 2, 2020
Publisher: Association for Computational Linguistics

Training dataset and dictionary sizes matter in BERT models: the case of Baltic languages

Author(s): Ulčar, Matej; Robnik-Šikonja, Marko
Published in: In the Proceedings of the 10th International Conference on Analysis of Images, Social Networks and Texts (AIST 2021), 2021
Publisher: Springer

Preliminary experimentation with combinations and extensions of forward-looking sentence detection wordlists

Author(s): Štihec, Jan; Pollak, Senja; Žnidaršič, Martin
Published in: In Proceedings of the 3rd financial narrative processing workshop, 2021
Publisher: Association for Computational Linguistics

Bayesian BERT for Trustful Hate Speech Detection

Author(s): Miok, Kristian; Škrlj, Blaž; Zaharie, Daniela; Robnik-Šikonja, Marko
Published in: ICML 2020 Workshop on Uncertainty & Robustness in Deep Learning, 2021
Publisher: ICML UDL

Underreporting of errors in NLG output, and what to do about it

Author(s): van Miltenburg, Emiel; Clinciu, Miruna; Dušek, Ondrej; Gkatzia, Dimitra; Inglis, Stephanie; Leppänen, Leo; Mahamood, Saad; Manning, Emma; Schoch, Stephanie; Thomson, Craig; Wen, Luou
Published in: In the Proceedings of the 14th International Conference on Natural Language Generation, 2021
Publisher: Association for Computational Linguistics

Primerjava slovenskih besednih vektorskih vložitev z vidika spola na analogijah poklicev

Author(s): Supej, Anka; Ulčar, Matej; Robnik-Šikonja, Marko; Pollak, Senja
Published in: In the Proceedings of the Conference on Language Technologies and Digital Humanities (JTDH 2021), 2021, Page(s) 93-100
Publisher: Slovensko društvo za jezikovne tehnologije

Simple discovery of COVID IS WAR Metaphors Using Word Embeddings

Author(s): Brglez, Mojca; Pollak, Senja; Vintar, Špela
Published in: In Proceedings of the 24th International Multiconference – IS2021 (SiKDD), 2021
Publisher: Jožef Stefan Institute

COVID-19 v slovenskih spletnih medijih: analiza s pomočjo računalniške obdelave jezika

Author(s): Pollak, Senja; Martinc, Matej; Pelicon, Andraž; Ulčar, Matej; Vezovnik, Andreja
Published in: Pandemična družba: slovensko sociološko srečanje, 2021
Publisher: Slovenska sociološka družba

Visual Topic Modelling for NewsImage Task at MediaEval 2021

Author(s): Pivovarova, Lidia; Zosa, Elaine
Published in: MediaEval 2021 Multimedia Benchmark Workshop: Working Notes Proceedings of the MediaEval 2021 Workshop, 2021
Publisher: MediaEval Multimedia Benchmark
DOI: 10.5281/zenodo.6384719

TeMoTopic: Temporal Mosaic Visualisation of Topic Distribution, Keywords, and Context

Author(s): Sheehan, Shane; Luz, Saturnino; Masoodian, Masood
Published in: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730388

Robust Named Entity Recognition and Linking on Historical Multilingual Documents

Author(s): Boros, Emanuela; Linhares Pontes, Elvys; Cabrera-Diego, Luis Adrián; Hamdi, Ahmed; Moreno, Jose G.; Sidère, Nicolas; Doucet, Antoine
Published in: Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum (CLEF-HIPE 2020), 2020
Publisher: http://ceur-ws.org/
DOI: 10.5281/zenodo.4059652

Impact Analysis of Document Digitization on Event Extraction

Author(s): Nguyen, Nhu Khoa; Boroş, Emanuela; Lejeune, Gaël; Doucet, Antoine
Published in: In 4th Workshop on Natural Language for Artificial Intelligence (NL4AI 2020) co-located with the 19th International Conference of the Italian Association for Artificial Intelligence (AI* IA 2020), 2020
Publisher: CEUR Workshop Proceedings
DOI: 10.5281/zenodo.4680744

Using a Frustratingly Easy Domain and Tagset Adaptation for Creating Slavic Named Entity Recognition Systems

Author(s): Cabrera-Diego, Luis Adrián; Moreno, Jose G.; Doucet, Antoine
Published in: In Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing in conjunction to EACL2021, 2021
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730478

Topic modelling discourse dynamics in historical newspapers

Author(s): Marjanen, Jani; Zosa, Elaine; Hengchen, Simon; Pivovarova, Lidia; Tolonen, Mikko
Published in: In Post-Proceedings of the DHN2020 Conference: the 5th conference on Digital Humanities in the Nordic Countries, 2021
Publisher: CEUR Workshop Proceedings (CEUR-WS.org)

BERT meets Shapley: Extending SHAP Explanations to Transformer-based Classifiers

Author(s): Kokalj, Enja; Škrlj, Blaž; Lavrač, Nada; Pollak, Senja; Robnik-Šikonja, Marko
Published in: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730384

Multilingual Epidemic Event Extraction

Author(s): Mutuvi, Stephen; Boros, Emanuela; Doucet, Antoine; Lejeune, Gaël; Jatowt, Adam; Odeo, Moses
Published in: In the Proceedings of ICADL 2021, 2021, ISBN 978-3-030-91669-8
Publisher: Springer
DOI: 10.1007/978-3-030-91669-5_12

Transformer-based Methods for Recognizing Ultra Fine-grained Entities (RUFES)

Author(s): Boroş, Emanuela; Doucet, Antoine
Published in: In Proceedings of the Thirteenth Text Analysis Conference (TAC 2020), 2021
Publisher: NIST USA
DOI: 10.5281/zenodo.4681008

SloBERTa: Slovene monolingual large pretrained masked language model

Author(s): Ulčar, Matej; Robnik-Šikonja, Marko
Published in: In Proceedings of the 24th International Multiconference – IS2021 (SiKDD), 2021
Publisher: Jožef Stefan Institute

Alleviating Digitization Errors in Named Entity Recognition for Historical Documents

Author(s): Emanuela Boros, Ahmed Hamdi, Elvys Linhares Pontes, Luis Adrián Cabrera-Diego, Jose G. Moreno, Nicolas Sidere, Antoine Doucet
Published in: Proceedings of the 24th Conference on Computational Natural Language Learning, 2020, Page(s) 431-441
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/2020.conll-1.35

Not All Comments Are Equal: Insights into Comment Moderation from a Topic-aware Model

Author(s): Zosa, Elaine; Shekhar, Ravi; Karan, Mladen; Purver, Matthew
Published in: In the Proceedings of the Conference on Recent Advances in Natural Language Processing (RANLP 2021), 2021
Publisher: ACL

TLR at the NTCIR-15 FinNum-2 Task: Improving Text Classifiers for Numeral Attachment in Financial Social Data

Author(s): Moreno, Jose G.; Boros, Emanuela; Doucet, Antoine
Published in: In Proceedings of the 15th NTCIR Conference on Evaluation of Information Access Technologies, Issue 2, 2020
Publisher: Association for Computing Machinery
DOI: 10.5281/zenodo.4680695

Multilingual Detection of Fake News Spreaders via Sparse Matrix Factorization

Author(s): Koloski, Boško; Pollak, Senja; Škrlj, Blaž
Published in: Notebook for PAN at CLEF 2020, 2020
Publisher: http://ceur-ws.org/
DOI: 10.5281/zenodo.4059635

Unsupervised Approach to Cross-Lingual User Comments Summarization

Author(s): Žagar, Aleš; Robnik-Šikonja, Marko
Published in: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730327

Semantic Reasoning from Model-Agnostic Explanations

Author(s): Stepišnik-Perdih, Timen; Lavrač, Nada; Škrlj, Blaž
Published in: In the Proceedings of the 2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI), 2021, ISBN 978-1-7281-8053-3
Publisher: IEEE
DOI: 10.1109/sami50585.2021.9378668

Linking Named Entities across Languages using Multilingual Word Embeddings

Author(s): Elvys Linhares Pontes, Jose G. Moreno, Antoine Doucet
Published in: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 2020, Page(s) 329-332, ISBN 9781450375856
Publisher: ACM
DOI: 10.1145/3383583.3398597

Discovery Team at SemEval-2020 Task 1: Context-sensitive Embeddings not Always Better Than Static for Semantic Change Detection

Author(s): Martinc, Matej; Montariol, Syrielle; Zosa, Elaine; Pivovarova, Lidia
Published in: In Proceedings of the Fourteenth Workshop on Semantic Evaluation (SemEval 2020), 2020, Page(s) 67-73
Publisher: International Committee for Computational Linguistics
DOI: 10.5281/zenodo.4681022

Creative Language Generation in a Society of Engagement and Reflection

Author(s): Wright, George A.; Purver, Matthew
Published in: In Proceedings of the Eleventh International Conference on Computational Creativity (ICCC2020), 2020
Publisher: Association for Computational Creativity (ACC)
DOI: 10.5281/zenodo.4680484

A Review of Cross-Domain Text-to-SQL Models

Author(s): Gan, Yujian; Purver, Matthew; Woodward, John
Published in: In the Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: Student Research Workshop, 2020, Page(s) 108-115
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4699229

Event Detection with Entity Markers

Author(s): Boros, Emanuela; Moreno, Jose G.; Doucet, Antoine
Published in: In the Proceedings of the 43rd European Conference on Information Retrieval (ECIR 2021), 2021
Publisher: Springer

Parsing Text in a Workspace for Language Generation

Author(s): Wright, George A.; Purver, Matthew
Published in: In the Proceedings of the 2021 Society for Text & Discourse Annual Conference, 2021
Publisher: Easychair

Zero-shot cross-lingual content filtering: offensive language and hate speech detection

Author(s): Pelicon, Andraž; Shekhar, Ravi; Martinc, Matej; Škrlj, Blaž; Pollak, Senja; Purver, Matthew
Published in: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730308

Intérêt des modèles de caractères pour la détection d’événements

Author(s): Boros, Emanuela; Besançon, Romaric; Ferret, Olivier; Grau, Brigitte
Published in: In Proceedings of TALN 2021, 2021
Publisher: HAL-LIST

Embeddia at SemEval-2019 Task 6: Detecting hate with neural network and transfer learning approaches

Author(s): Andraž Pelicon, Matej Martinc, and Petra Kralj Novak
Published in: Proceedings of The 13th International Workshop on Semantic Evaluation (SemEval), 2019
Publisher: SemEval

Generating Data using Monte Carlo Dropout

Author(s): Kristian Miok, Dong Nguyen-Doan, Daniela Zaharie, and Marko Robnik-Šikonja
Published in: IEEE 15th International Conference on Intelligent Computer Communication and Processing (ICCP 2019), 2019
Publisher: IEEE

Detecting Depression with Word-Level Multimodal Fusion

Author(s): Morteza Rohanian, Julian Hough, Matthew Purver
Published in: Interspeech 2019, 2019, Page(s) 1443-1447
Publisher: ISCA
DOI: 10.21437/interspeech.2019-2283

Clustering Ideological Terms in Historical Newspaper Data with Diachronic Word Embeddings

Author(s): Jani Marjanen, Lidia Pivovarova, Elaine Zosa, and Jussi Kurunmäki
Published in: Proceedings of the 5th International Workshop on Computational History, 2019
Publisher: Aachen : R. Piskac c/o Redaktion Sun SITE, Informatik V, RWTH Aachen

Karst exploration: Extracting terms and definitions from karst

Author(s): Senja Pollak, Andraž Repar, Matej Martinc, and Vid Podpečan
Published in: Proceedings of the 6th biennial conference on electronic lexicography, eLex 2019, 2019
Publisher: Presses Universitaires de Louvain

Who is hot and who is not? Profiling celebs on Twitter

Author(s): Martinc, Matej; Škrlj, Blaž; Pollak, Senja
Published in: Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum, Issue 6, 2019
Publisher: Aachen : R. Piskac c/o Redaktion Sun SITE, Informatik V, RWTH Aachen

Fake or Not: Distinguishing Between Bots, Males and Females

Author(s): Martinc, Matej; Škrlj, Blaž; Pollak, Senja
Published in: Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum, Issue 2, 2019
Publisher: Aachen : R. Piskac c/o Redaktion Sun SITE, Informatik V, RWTH Aachen

Pooled LSTM for Dutch cross-genre gender classification

Author(s): Matej Martinc, Senja Pollak
Published in: Proceedings of the Shared Task on Cross-Genre Gender Detection in Dutch at Computational Linguistics in the Netherlands (CLIN 2019) conference, 2019
Publisher: Aachen : R. Piskac c/o Redaktion Sun SITE, Informatik V, RWTH Aachen

Methods for Generating Colourful and Factual Multilingual News Headlines

Author(s): Alnajjar, Khalid; Leppänen, Leo; Toivonen, Hannu
Published in: In Proceedings of the 10th International Conference on Computational Creativity (ICCC 2019), Issue 1, 2019, Page(s) 258-265, ISBN 978-989-54160-1-1
Publisher: Association for Computational Creativity (ACC)

TLR at BSNLP2019: A Multilingual Named Entity Recognition System

Author(s): Jose G. Moreno, Elvys Linhares Pontes, Mickael Coustaty, Antoine Doucet
Published in: Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing, 2019, Page(s) 83-88
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/w19-3711

Clustering Ideological Terms in Historical Newspaper Data with Diachronic Word Embeddings

Author(s): Jani Marjanen; Lidia Pivovarova; Elaine Zosa; Jussi Kurunmäki
Published in: HistoInformatics 2019: International Workshop on Computational History 2019, 2019
Publisher: CEUR-WS.org
DOI: 10.5281/zenodo.3689467

A Corpus Study on Questions, Responses and Misunderstanding Signals in Conversations with Alzheimer's Patients

Author(s): Shamila Nasreen; Matthew Purver; Julian Hough
Published in: Proceedings of the 23rd Workshop on the Semantics and Pragmatics of Dialogue, Issue 13, 2019
Publisher: SEMDIAL
DOI: 10.5281/zenodo.3689456

Word Clustering for Historical Newspapers Analysis

Author(s): Pivovarova, Lidia; Marjanen, Jani; Zosa, Elaine
Published in: Proceedings of the Workshop on Language Technology for Digital Historical Archives in conjunction with RANLP-2019, 2019, Page(s) 3-10
Publisher: INCOMA Ltd.
DOI: 10.5281/zenodo.3402940

TeMoCo: A Visualization Tool for Temporal Analysis of Multi-party Dialogues in Clinical Settings

Author(s): Shane Sheehan, Pierre Albert, Saturnino Luz, Masood Masoodian
Published in: 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS), 2019, Page(s) 690-695, ISBN 978-1-7281-2286-1
Publisher: IEEE
DOI: 10.1109/CBMS.2019.00140

Gender, language, and society: word embeddings as a reflection of social inequalities in linguistic corpora

Author(s): Supej, Anka; Plahuta, Marko; Purver, Matthew; Mathioudakis, Michael; Pollak, Senja
Published in: In Znanost in družbe prihodnosti, Slovensko sociološko srečanje [Annual meeting of the Slovenian Sociological Association: Science and future societies], 2019
Publisher: Slovensko sociološko društvo
DOI: 10.5281/zenodo.3894466

No Time Like the Present: Methods for Generating Colourful and Factual Multilingual News Headlines

Author(s): Alnajjar, Khalid; Leppänen, Leo; Toivonen, Hannu
Published in: Proceedings of the 10th International Conference on Computational Creativity (ICCC2019), 2019
Publisher: Association for Computational Creativity

Multiple Imputation for Biomedical Data using Monte Carlo Dropout Autoencoders

Author(s): Kristian Miok, Dong Nguyen-Doan, Marko Robnik-Šikonja, Daniela Zaharie
Published in: 2019 E-Health and Bioengineering Conference (EHB), 2019, Page(s) 1-4, ISBN 978-1-7281-2603-6
Publisher: IEEE
DOI: 10.1109/EHB47216.2019.8969940

High Quality ELMo Embeddings for Seven Less-Resourced Languages

Author(s): Ulčar, Matej; Robnik-Šikonja, Marko
Published in: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 2020, Page(s) 4731–4738
Publisher: The European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3894535

Leveraging Contextual Embeddings for Detecting Diachronic Semantic Shift

Author(s): Martinc, Matej; Kralj Novak, Petra; Pollak, Senja
Published in: Proceedings of the 12th Language Resources and Evaluation Conference (LREC2020), 2020, Page(s) 4811‑4819
Publisher: The European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3894557

Multilingual Culture-Independent Word Analogy Datasets

Author(s): Ulčar, Matej; Vaik, Kristiina; Lindström, Jessica; Dailidėnaitė, Milda; Robnik-Šikonja, Marko
Published in: Proceedings of the 12th Language Resources and Evaluation Conference (LREC2020), Issue 1, 2020, Page(s) 4074‑4080
Publisher: The European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3894553

Dataset for Temporal Analysis of English-French Cognates

Author(s): Frossard, Esteban; Coustaty, Mickael; Doucet, Antoine; Jatowt, Adam; Hengchen, Simon
Published in: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 2020, Page(s) 855-859
Publisher: The European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3693651

A Dataset for Multi-lingual Epidemiological Event Extraction

Author(s): Mutuvi, Stephen; Doucet, Antoine; Lejeune, Gael; Odeo, Moses
Published in: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 2020, Page(s) 4139–4144
Publisher: The European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3709626

CoSimLex: A Resource for Evaluating Graded Word Similarity in Context

Author(s): Carlos Santos Armendariz; Matthew Purver; Matej Ulčar; Senja Pollak; Nikola Ljubešič; Marko Robnik-Šikonja; Mark Granroth-Wilding; Kristiina Vaik
Published in: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 2020, Page(s) 5878–5886
Publisher: The European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3894565

Text Visualization for the Support of Lexicography-Based Scholarly Work

Author(s): Sheehan, Shane; Luz, Saturnino
Published in: Proceedings of the 6th biennial conference on electronic lexicography, eLex 2019, 2019, Page(s) 694-725
Publisher: Lexical Computing CZ s.r.o., Brno, Czech Republic
DOI: 10.5281/zenodo.3894619

Mining semantic relations from comparable corpora through intersections of word embeddings

Author(s): Vintar, Špela; Grčić Simeunović, Larisa; Martinc, Matej; Pollak, Senja; Stepišnik, Uroš
Published in: Proceedings of the LREC 2020 13th Workshop on Building and Using Comparable Corpora, 2020, Page(s) 29-34
Publisher: European Language Resources Association
DOI: 10.5281/zenodo.3894635

Interaction Patterns in Conversations with Alzheimer's Patients

Author(s): Nasreen, Shamila; Purver, Matthew; Hough, Julian
Published in: Poster presentation at the 7th International Conference on Statistical Language and Speech Processing. Ljubljana, Slovenia, 2019
Publisher: Springer
DOI: 10.5281/zenodo.3894637

Multilingual Dynamic Topic Model

Author(s): Elaine Zosa, Mark Granroth-Wilding
Published in: Proceedings - Natural Language Processing in a Deep Learning World, 2019, Page(s) 1388-1396, ISBN 978-954-452-056-4
Publisher: Incoma Ltd., Shoumen, Bulgaria
DOI: 10.26615/978-954-452-056-4_159

The NetViz terminology visualization tool and the use cases in karstology domain modeling

Author(s): Pollak, Senja; Podpečan, Vid; Miljkovic, Dragana; Stepišnik, Uroš; Vintar, Špela
Published in: Proceedings of the 6th International Workshop on Computational Terminology (COMPUTERM 2020), 2020, Page(s) 55-61
Publisher: European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3894686

Communities of related terms in Karst terminology co-occurrence network

Author(s): Miljkovic, Dragana; Kralj, Jan; Stepišnik, Uroš; Pollak, Senja
Published in: Proceedings of the 6th biennial conference on electronic lexicography, eLex 2019, 2019, Page(s) 357-373
Publisher: Lexical Computing CZ s.r.o., Brno, Czech Republic
DOI: 10.5281/zenodo.3894684

A Comparison of Unsupervised Methods for Ad hoc Cross-Lingual Document Retrieval

Author(s): Zosa, Elaine; Granroth-Wilding, Mark; Pivovarova, Lidia
Published in: Proceedings of the Cross-Language Search and Summarization of Text and Speech Workshop, 2020, Page(s) 32-37
Publisher: European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3898384

Capturing Evolution in Word Usage: Just Add More Clusters?

Author(s): Matej Martinc, Syrielle Montariol, Elaine Zosa, Lidia Pivovarova
Published in: Companion Proceedings of the Web Conference 2020, 2020, Page(s) 343-349, ISBN 9781450370240
Publisher: ACM
DOI: 10.1145/3366424.3382186

Evaluating the Robustness of Embedding-based Topic Models to OCR Noise

Author(s): Zosa, Elaine; Mutuvi, Stephen; Granroth-Wilding, Mark; Doucet, Antoine
Published in: In the Proceedings of the 23rd International Conference on Asia-Pacific Digital Libraries (ICADL 2021), 2021
Publisher: Springer
DOI: 10.1007/978-3-030-91669-5_30

Evaluating Natural Language Descriptions Generated in a Workspace-Based Architecture

Author(s): Wright, George A.; Purver, Matthew
Published in: In the Proceedings of the 12th International Conference on Computational Creativity, ICCC2021, 2021
Publisher: Association for Computational Creativity

Multi-label classification of COVID-19-related articles with an autoML approach

Author(s): Tavchioski, Ilija; Koloski, Boshko; Škrlj, Blaž; Pollak, Senja
Published in: In Proceedings of the BioCreative VII Challenge Evaluation Workshop, 2021, Page(s) 295-299, ISBN 978-0-578-32368-8
Publisher: Biocreative

L3i_LBPAM at the FinSim-2 task: Learning Financial Semantic Similarities with Siamese Transformers

Author(s): Nhu Khoa Nguyen; Emanuela Boros; Gaël Lejeune; Antoine Doucet; Thierry Delahaut
Published in: Issue 30, 2021
Publisher: IW3C2
DOI: 10.5281/zenodo.4734321

CTLR@WiC-TSV: Target Sense Verification using Marked Inputs and Pre-trained Models

Author(s): Moreno, Jose G.; Linhares Pontes, Elvys; Dias, Gaël
Published in: In 6th Workshop on Semantic Deep Learning (SemDeep-6) associated to 29th International Joint Conference on Artificial Intelligence and 17th Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI 2020), Issue 2, 2021
Publisher: International Joint Conferences on Artificial Intelligence
DOI: 10.5281/zenodo.4680720

Exploratory analysis of news sentiment using subgroup discovery

Author(s): Valmarska, Anita; Cabrera-Diego, Luis Adrián; Linhares Pontes, Elvys; Pollak, Senja
Published in: In Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing in conjunction to EACL2021, 2021
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730472

COVID-19 Therapy Target Discovery with Context-Aware Literature Mining

Author(s): Martinc, Matej; Škrlj, Blaž; Pirkmajer, Sergej; Lavrač, Nada; Cestnik, Bojan; Marzidovšek, Martin; Pollak, Senja
Published in: In Proceedings of the 23rd International Conference on Discovery Science (DS 2020), 2020, Page(s) 109-123
Publisher: Springer International Publishing
DOI: 10.5281/zenodo.4306020

A Baseline Document Planning Method for Automated Journalism

Author(s): Leppänen, Leo; Toivonen, Hannu
Published in: In the Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), 2021
Publisher: Association for Computational Linguistics

Multilingual Epidemiological Text Classification: A Comparative Study

Author(s): Mutuvi, Stephen; Boros, Emanuela; Doucet, Antoine; Lejeune, Gaël; Jatowt, Adam; Odeo, Moses
Published in: Proceedings of the 28th International Conference on Computational Linguistics, Issue 44, 2020
Publisher: International Committee on Computational Linguistics
DOI: 10.5281/zenodo.4476039

Multi-Modal Fusion with Gating Using Audio, Lexical and Disfluency Features for Alzheimer’s Dementia Recognition from Spontaneous Speech

Author(s): Morteza Rohanian, Julian Hough, Matthew Purver
Published in: Interspeech 2020, 2020, Page(s) 2187-2191
Publisher: ISCA
DOI: 10.21437/interspeech.2020-2721

Temporal Mental Health Dynamics on Social Media

Author(s): Tom Tabak, Matthew Purver
Published in: Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020, 2020
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/2020.nlpcovid19-2.7

Extending Neural Keyword Extraction with TF-IDF tagset matching

Author(s): Koloski, Boshko; Pollak, Senja; Škrlj, Blaž; Martinc, Matej
Published in: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730354

The Importance of Character-Level Information in an Event Detection Model

Author(s): Boros, Emanuela; Besançon, Romaric; Ferret, Olivier; Grau, Brigitte
Published in: In Proceedings of NLDB 2021, 2021
Publisher: Springer

Benchmarks for Unsupervised Discourse Change Detection

Author(s): Duong, Quan; Pivovarova, Lidia; Zosa, Elaine
Published in: In the Proceedings of the Histoinformatics workshop 2021, 2021
Publisher: CEUR

Three-part diachronic semantic change dataset for Russian

Author(s): Andrey Kutuzov, Lidia Pivovarova
Published in: Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change 2021, 2021, Page(s) 7-13
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/2021.lchange-1.2

SemEval2020 Task 3: Graded Word Similarity in Context

Author(s): Armendariz, Carlos Santos; Purver, Matthew; Pollak, Senja; Ljubešić, Nikola; Ulčar, Matej; Robnik-Šikonja, Marko; Vulić, Ivan; Pilehvar, Mohammed Taher
Published in: In Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval 2020), 2020, Page(s) 36-49
Publisher: International Committee for Computational Linguistics
DOI: 10.5281/zenodo.4309679

Hybrid Tagger – An Industry-driven Solution for Extreme Multi-label Text Classification

Author(s): Vaik, Kristiina; Asula, Marit; Sirel, Raul
Published in: In Proceedings of the LREC2020 Industry Track, 2020, Page(s) 26-30
Publisher: The European Language Resources Association (ELRA)

A Baseline Document Planning Method for Automated Journalism

Author(s): Leppänen, Leo; Toivonen, Hannu
Published in: In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021), 2021
Publisher: Linköping University Electronic Press, Sweden

TeMoCo-Doc - A visualization for supporting temporal and contextual analysis of dialogues and associated documents

Author(s): Shane Sheehan, Saturnino Luz, Pierre Albert, Masood Masoodian
Published in: Proceedings of the International Conference on Advanced Visual Interfaces, 2020, Page(s) 1-3, ISBN 9781450375351
Publisher: ACM
DOI: 10.1145/3399715.3399956

Named Entity Recognition Architecture Combining Contextual and Global Features

Author(s): Tran Thi Hong, Hanh; Doucet, Antoine; Sidere, Nicolas; Moreno, Jose G.; Pollak, Senja
Published in: In the Proceedings of the 23rd International Conference on Asia-Pacific Digital Libraries (ICADL 2021), 2021
Publisher: Springer
DOI: 10.1007/978-3-030-91669-5_21

Aligning Estonian and Russian news industry keywords with the help of subtitle translations and an environmental thesaurus

Author(s): Repar, Andraž; Shumakov, Andrej
Published in: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730392

Zaznavanje sentimenta v novicah z globokimi nevronskimi mrežami

Author(s): Arhar Holdt, Špela; Pollak, Senja; Robnik-Šikonja, Marko; Krek, Simon
Published in: In Proceedings of the Conference on Language Technologies and Digital Humanities, JTDH2020, 2020, Page(s) 10-15
Publisher: Institute of Contemporary History
DOI: 10.5281/zenodo.4059729

Étude comparative de méthodes de classification multilingue appliquées à l'épidémiologie

Author(s): Stephen Mutuvi; Emanuela Boros; Antoine Doucet; Gaël Lejeune; Adam Jatowt; Moses Odeo
Published in: Issue 29, 2021
Publisher: l’Association Francophone de Recherche d’Information et Applications ARIA
DOI: 10.5281/zenodo.4734472

Word-embedding based bilingual terminology alignment

Author(s): Repar, Andraž; Martinc, Matej; Ulčar, Matej; Pollak, Senja
Published in: In Proceedings of eLex 2021 (eLex2021), 2021
Publisher: Brno: Lexical Computing CZ, s.r.o.

Investigating the Semantic Wave in Tutorial Dialogues: An Annotation Scheme and Corpus Study on Analogy Components

Author(s): Del-Bosque-Trevino, Jorge; Hough, Julian; Purver, Matthew
Published in: In Proceedings of the 24th SemDial Workshop on the Semantics and Pragmatics of Dialogue (SemDial), 2020
Publisher: SEMDIAL

Interesting cross-border news discovery using cross-lingual article linking and document similarity

Author(s): Koloski, Boshko; Zosa, Elaine; Stepišnik-Perdih, Timen; Škrlj, Blaž; Paju, Tarmo; Pollak, Senja
Published in: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730369

An evaluation of BERT and Doc2Vec model on the IPTC Subject Codes prediction dataset

Author(s): Pranjić, Marko; Robnik-Šikonja, Marko; Pollak, Senja
Published in: In Proceedings of the 24th International Multiconference – IS2021 (SiKDD), 2021
Publisher: Jožef Stefan Institute

Evaluation of related news recommendations using document similarity methods

Author(s): Pranjić, Marko; Podpečan, Vid; Robnik-Šikonja, Marko; Pollak, Senja
Published in: In Proceedings of the Conference on Language Technologies and Digital Humanities, JTDH2020, 2020, Page(s) 81-86
Publisher: Institute of Contemporary History
DOI: 10.5281/zenodo.4059710

Dimenzija spola v slovenskih vektorskih vložitvah besed: primerjava modelov prek analogij poklicev [The gender dimension in Slovene word embeddings: a comparison of models via occupation analogies]

Author(s): Supej, Anka; Ulčar, Matej; Robnik-Šikonja, Marko; Pollak, Senja
Published in: In Proceedings of the Joint Conference on Digital Libraries (JCDL 2020), 2020, Page(s) 93-100
Publisher: Institute of Contemporary History
DOI: 10.5281/zenodo.4059700

Mitigating Gender Bias in Word Embeddings using Explicit Gender Free Corpus

Author(s): Hargrave, David
Published in: Master's thesis, School of Electronic Engineering and Computer Science, Queen Mary University of London, 2021
Publisher: Queen Mary University of London

Silicon Valley och makten över medierna [Silicon Valley and the power over media]

Author(s): Carl-Gustav Linden
Published in: Issue 1, 2020
Publisher: Nordicom
DOI: 10.48335/9789188855350

Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation

Author(s): Toivonen, Hannu; Boggia, Michele
Published in: 2021, ISBN 978-1-954085-13-8
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730375
