Cross-Lingual Embeddings for Less-Represented Languages in European News Media

Deliverables

Initial cross-lingual and multilingual embeddings technology (T1.1)

Initial embeddings and transformations between a selection of all targeted languages (Estonian, Finnish, Swedish, Latvian, Lithuanian, Croatian, Slovene, English, Russian) (report and source code) (T1.1)

Initial cross-lingual semantic enrichment technology (T2.1)

Initial approach to named entity (NE) extraction and disambiguation and event detection, covering multiple domains and languages (report and source code) (T2.1).

Datasets, benchmarks and evaluation metrics for cross-lingual content analysis (T4.4)

Gathering and preprocessing training and testing data (Estonian, Latvian, Lithuanian, Russian, Croatian, Finnish and English) provided by the media partners (report and dataset) (T4.4).

Initial deep network architecture (T1.3)

Deep neural networks will be adapted to morphologically rich languages by using character-level inputs and additional information on morphology (suffixes, prefixes, separately trained POS tags) (report and source code) (T1.3).

Interim report on ethics and responsible science and journalism (T6.5)

Interim report on ethics and responsible science and journalism, with analysis of news production and new tool development (T6.5).

Initial interpretability and visualisation technology (T1.4)

Initial approaches to explaining deep learning models by adapting perturbation-based explanation methods, grounded in coalitional game theory, to text classification, and initial development of visual tools for explaining the classification process (report and source code) (T1.4).
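As a rough illustration of the idea behind perturbation-based explanation with coalitional game theory, the sketch below computes exact Shapley values for the words of a sentence by removing subsets of words and observing how a classifier's score changes. The classifier `toy_model`, its word weights and the example sentence are hypothetical stand-ins, not the project's models or code. Exact enumeration is only feasible for short inputs, so practical methods would presumably rely on sampling.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy classifier standing in for a trained deep model.
# The word weights are invented for illustration only.
NEGATIVE_WORDS = {"awful": 0.6, "boring": 0.3}


def toy_model(words):
    """Return a 'negative sentiment' score in [0, 1] for a list of words."""
    return min(1.0, sum(NEGATIVE_WORDS.get(w, 0.0) for w in words))


def shapley_values(words, model):
    """Exact Shapley value of each word position.

    Each word is a 'player'; a coalition is the subset of words kept in
    the input, and all other words are perturbed by removal.
    """
    n = len(words)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [words[j] for j in coalition]
                with_i = [words[j] for j in sorted(coalition + (i,))]
                values[i] += weight * (model(with_i) - model(without_i))
    return values


sentence = ["the", "film", "was", "awful", "and", "boring"]
contributions = shapley_values(sentence, toy_model)
for word, value in zip(sentence, contributions):
    print(f"{word:>8}: {value:+.3f}")
```

The contributions sum to the difference between the score of the full sentence and the score of the empty input, which is the efficiency property that makes Shapley values attractive for explanations.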

Initial context-dependent and dynamic embeddings technology (T1.2)

Context-aware cross-lingual embeddings which will enable improved understanding of short texts such as user comments in the context of an emerging comment thread and the news story being commented on (report and source code) (T1.2).

Report on user needs and challenges for news media industry (T6.1)

Initial report on the identification and analysis of the needs of different stakeholders in the news media industry. We will arrange a workshop to identify in detail the challenges specific to the operations of the different media partners and prepare specification documentation (T6.1).

Recommendations on avoiding gender and other biases (T6.4)

Means to detect and avoid gender and other biases in news media content creation will be developed in T6.4. This deliverable will propose recommendations for avoiding gender bias (T6.4).

Initial cross-lingual context and opinion analysis technology (T3.1)

Report on initial developed technology for a range of user comment analyses, including topic modelling, conversation structure and context modelling, sentiment, stance and opinion detection and effect and information spread measurement (report and source code) (T3.1).

Initial multilingual news linking technology (T4.1)

Development of initial tools for linking news stories across languages based on their topics and contents (report and source code) (T4.1).

Initial keyword extraction techniques (T2.2)

Initial keyword extraction by application of statistical approaches (based on heuristics), machine learning approaches, as well as graph-based approaches (report and source code) (T2.2).
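To make the graph-based family of approaches concrete, here is a minimal TextRank-style sketch: it links words that co-occur within a small window and ranks them with PageRank. This is a generic illustration, not the method delivered in T2.2; the stopword list, window size, damping factor and function names are all assumptions chosen for brevity.

```python
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not a project resource).
STOPWORDS = {"the", "a", "of", "and", "in", "to", "for", "is", "on"}


def cooccurrence_graph(tokens, window=2):
    """Undirected graph linking tokens at most `window` positions apart."""
    graph = defaultdict(set)
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                graph[left].add(right)
                graph[right].add(left)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on an undirected graph."""
    nodes = list(graph)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        rank = {
            node: (1 - damping) / len(nodes)
            + damping * sum(rank[nb] / len(graph[nb]) for nb in graph[node])
            for node in nodes
        }
    return rank


def extract_keywords(text, top_k=3):
    """Return the `top_k` highest-ranked words of `text`."""
    # Tokens with attached punctuation are dropped by isalpha(); a real
    # pipeline would use a proper tokenizer and lemmatizer.
    tokens = [t for t in text.lower().split() if t.isalpha() and t not in STOPWORDS]
    ranks = pagerank(cooccurrence_graph(tokens))
    return sorted(ranks, key=ranks.get, reverse=True)[:top_k]


text = "news linking news summarisation news generation news keywords"
keywords = extract_keywords(text)
print(keywords)
```

Words that co-occur with many different neighbours accumulate rank, so no training data is needed, which is one reason graph-based methods are a common choice for less-resourced languages.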

Initial dynamic news generation technology (T5.2)

Development of a novel method for automatically organising news articles, considering the domain of the article, effects of time and news repetition (report and source code) (T5.2).

Refined analysis of news media partners’ needs and challenges (T6.1)

Refined report of news media partners’ needs and challenges and their analysis with regard to the state of the art in NLP for news media (T6.1).

Datasets, benchmarks and evaluation metrics for cross-lingual user generated content filtering and analysis (T3.4)

Evaluation and development of algorithms requires relevant, annotated, and multilingual datasets (report and dataset) (T3.4).

Multilingual language generation approach (T2.3)

Incorporating hybrid techniques in the architecture, to take advantage of the robustness of machine learning techniques and transparency of rule-based techniques. Adaptation of the context-aware word-embeddings developed in T1.2 to improve fluency and variability in the generated texts (report and source code) (T2.3).

Initial news generation technology (T5.1)

Based on the analysis of newsrooms (WP6), the NLG technology will be adapted for the requirements of news generation. The task will develop mechanisms for (i) determining what is interesting or important in the given data and deciding what to report, and for (ii) rendering that information in an accurate manner (iii) in multiple languages (report and source code) (T5.1).

Platform requirements documentation and platform design (T6.2)

The EMBEDDIA Toolkit will incorporate different tools and resources developed in WP1–WP5, and the EMBEDDIA Media Assistant platform will be built on top of it. The platform will be built as a series of base microservices, functional microservices and task-oriented APIs. This deliverable will report on platform requirements and platform design (T6.2).

Initial cross-lingual news viewpoints identification technology (T4.3)

Initial approaches for detecting viewpoints and sentiments based on media sources (report and source code) (T4.3).

Datasets, benchmarks and evaluation metrics for advanced cross-lingual NLP technology (T2.4)

Report on existing evaluation datasets and benchmarks for NER, NEL and event detection (for instance, ACE, Meantime and TAC KBP’s Entity Discovery and Linking tasks) (report and dataset) (T2.4).

Initial cross-lingual comment filtering technology (T3.2)

Report on developed tools for automatic flagging or filtering of user comments, specifically targeted at the use cases defined by end user partners in WP6, e.g., detection of hate speech and political trolling, attempts to elicit extreme reactions and influence others’ opinions (report and source code) (T3.2).

Datasets, benchmarks and evaluation metrics for multilingual text generation (T5.4)

Texts (news stories) and structured datasets from which news can be generated will be collected from news partners, and a methodology for evaluation will be defined (report and datasets) (T5.4).

Initial cross-lingual news summarisation and visualisation technology (T4.2)

Development of textual and visual language-independent multi-document news summarisation (report and source code) (T4.2).

Datasets, benchmarks and evaluation metrics for cross-lingual word embeddings (T1.5)

A repository of training and evaluation data, stored in a dedicated GitHub repository (report and datasets) (T1.5).

Project website and social media accounts (T7.1)

A project website, which will function both as a project dissemination tool and as a means of access to the technical outcomes produced by the project, and social media accounts/pages on relevant social networks will be created (T7.1).

Publications

To BAN or Not to BAN: Bayesian Attention Networks for Reliable Hate Speech Detection

Author(s): Kristian Miok, Blaž Škrlj, Daniela Zaharie, Marko Robnik-Šikonja
Published in: Cognitive Computation, 2021, ISSN 1866-9956
DOI: 10.1007/s12559-021-09826-9

Zero-Shot Learning for Cross-Lingual News Sentiment Classification

Author(s): Andraž Pelicon, Marko Pranjić, Dragana Miljković, Blaž Škrlj, Senja Pollak
Published in: Applied Sciences, Issue 10/17, 2020, Page(s) 5993, ISSN 2076-3417
DOI: 10.3390/app10175993

Nazaj v prihodnost: avtomatizacija in preobrazba novinarske epistemologije

Author(s): Igor Vobič, Marko Robnik Šikonja, Monika Kalin Golob
Published in: Javnost - The Public, Issue 26/sup1, 2019, Page(s) S41-S61, ISSN 1318-3222
DOI: 10.1080/13183222.2019.1696600

Completability vs (In)completeness

Author(s): Eleni Gregoromichelaki, Gregory James Mills, Christine Howes, Arash Eshghi, Stergios Chatzikyriakidis, Matthew Purver, Ruth Kempson, Ronnie Cann, Patrick G. T. Healey
Published in: Acta Linguistica Hafniensia, Issue 52/2, 2020, Page(s) 260-284, ISSN 0374-0463
DOI: 10.1080/03740463.2020.1795549

TNT-KID: Transformer-based neural tagger for keyword identification

Author(s): Matej Martinc, Blaž Škrlj, Senja Pollak
Published in: Natural Language Engineering, 2021, Page(s) 1-40, ISSN 1351-3249
DOI: 10.1017/s1351324921000127

Investigating cross-lingual training for offensive language detection

Author(s): Andraž Pelicon, Ravi Shekhar, Blaž Škrlj, Matthew Purver, Senja Pollak
Published in: PeerJ Computer Science, Issue 7, 2021, Page(s) e559, ISSN 2376-5992
DOI: 10.7717/peerj-cs.559

Re-Representing Metaphor: Modeling Metaphor Perception Using Dynamically Contextual Distributional Semantics

Author(s): Stephen McGregor, Kat Agres, Karolina Rataj, Matthew Purver, Geraint Wiggins
Published in: Frontiers in Psychology, Issue 10, 2019, ISSN 1664-1078
DOI: 10.3389/fpsyg.2019.00765

Towards Robust Text Classification with Semantics-Aware Recurrent Neural Architecture

Author(s): Blaž Škrlj, Jan Kralj, Nada Lavrač, Senja Pollak
Published in: Machine Learning and Knowledge Extraction, Issue 1/2, 2019, Page(s) 575-589, ISSN 2504-4990
DOI: 10.3390/make1020034

Predicting Slovene Text Complexity Using Readability Measures

Author(s): Tadej Škvorc, Simon Krek, Senja Pollak, Špela Arhar Holdt, Marko Robnik-Šikonja
Published in: In Contributions to Contemporary History, 2019, ISSN 2463-7807

Combining n-grams and deep convolutional features for language variety classification

Author(s): Matej Martinc, Senja Pollak
Published in: Natural Language Engineering, Issue 25/5, 2019, Page(s) 607-632, ISSN 1351-3249
DOI: 10.1017/S1351324919000299

TermEnsembler

Author(s): Andraž Repar, Vid Podpečan, Anže Vavpetič, Nada Lavrač, Senja Pollak
Published in: Terminology, Issue 25/1, 2019, Page(s) 93-120, ISSN 0929-9971
DOI: 10.1075/term.00029.rep

Reproduction, replication, analysis and adaptation of a term alignment approach

Author(s): Andraž Repar, Matej Martinc, Senja Pollak
Published in: Language Resources and Evaluation, 2019, ISSN 1574-020X
DOI: 10.1007/s10579-019-09477-1

‘Our task is to demystify fears’: Analysing newsroom management of automation in journalism

Author(s): Marko Milosavljević, Igor Vobič
Published in: Journalism, 2019, Page(s) 146488491986159, ISSN 1464-8849
DOI: 10.1177/1464884919861598

Methods and visualization tools for the analysis of medical, political and scientific concepts in Genealogies of Knowledge

Author(s): Saturnino Luz, Shane Sheehan
Published in: Palgrave Communications, Issue 6/1, 2020, ISSN 2055-1045
DOI: 10.1057/s41599-020-0423-6

Exploring the Relations Between Net Benefits of IT Projects and CIOs’ Perception of Quality of Software Development Disciplines

Author(s): Damjan Vavpotič, Marko Robnik-Šikonja, Tomaž Hovelja
Published in: Business & Information Systems Engineering, 2019, ISSN 2363-7005
DOI: 10.1007/s12599-019-00612-4

Data Journalism as a Service: Digital Native Data Journalism Expertise and Product Development

Author(s): Ester Appelgren, Carl-Gustav Lindén
Published in: Media and Communication, Issue 8/2, 2020, Page(s) 62, ISSN 2183-2439
DOI: 10.17645/mac.v8i2.2757

How Furiously Can Colorless Green Ideas Sleep? Sentence Acceptability in Context

Author(s): Jey Han Lau, Carlos Armendariz, Shalom Lappin, Matthew Purver, Chang Shu
Published in: Transactions of the Association for Computational Linguistics, Issue 8, 2020, Page(s) 296-310, ISSN 2307-387X
DOI: 10.1162/tacl_a_00315

Compressive approaches for cross-language multi-document summarization

Author(s): Elvys Linhares Pontes, Stéphane Huet, Juan-Manuel Torres-Moreno, Andréa Carneiro Linhares
Published in: Data & Knowledge Engineering, Issue 125, 2020, Page(s) 101763, ISSN 0169-023X
DOI: 10.1016/j.datak.2019.101763

Computational generation of slogans

Author(s): Khalid Alnajjar, Hannu Toivonen
Published in: Natural Language Engineering, 2020, Page(s) 1-33, ISSN 1351-3249
DOI: 10.1017/S1351324920000236

In the Name of the Right to be Forgotten: New Legal and Policy Issues and Practices regarding Unpublishing Requests in Slovenian Online News Media

Author(s): Marko Milosavljević, Melita Poler, Rok Čeferin
Published in: Digital Journalism, 2020, Page(s) 1-17, ISSN 2167-0811
DOI: 10.1080/21670811.2020.1747942

(Mis)Information Operations: An Integrated Perspective

Author(s): Cinelli, Matteo; Conti, Mauro; Finos, Livio; Grisolia, Francesco; Kralj Novak, Petra; Peruzzi, Antonio; Tesconi, Maurizio; Zollo, Fabia; Quattrociocchi, Walter
Published in: Journal of Information Warfare, Issue 18(3), 2020, ISSN 1445-3312

A Multilingual Study of Multi-Sentence Compression using Word Vertex-Labeled Graphs and Integer Linear Programming

Author(s): Linhares Pontes, Elvys; Huet, Stéphane; Torres Moreno, Juan Manuel; Gouveia da Silva, Thiago; Carneiro Linhares, Andréa
Published in: Computación y Sistemas, Issue 24(2), 2020, ISSN 1405-5546

Automated Journalism as a Source of and a Diagnostic Device for Bias in Reporting

Author(s): Leo Leppänen, Hanna Tuulonen, Stefanie Sirén-Heikel
Published in: Media and Communication, Issue 8/3, 2020, Page(s) 39, ISSN 2183-2439
DOI: 10.17645/mac.v8i3.3022

tax2vec: Constructing Interpretable Features from Taxonomies for Short Text Classification

Author(s): Blaž Škrlj, Matej Martinc, Jan Kralj, Nada Lavrač, Senja Pollak
Published in: Computer Speech & Language, Issue 65, 2021, Page(s) 101104, ISSN 0885-2308
DOI: 10.1016/j.csl.2020.101104

Cross-lingual transfer of abstractive summarizer to less-resource language

Author(s): Aleš Žagar, Marko Robnik-Šikonja
Published in: Journal of Intelligent Information Systems, 2021, ISSN 0925-9902
DOI: 10.1007/s10844-021-00663-8

Bisociative Literature-Based Discovery: Lessons Learned and New Word Embedding Approach

Author(s): Nada Lavrač, Matej Martinc, Senja Pollak, Maruša Pompe Novak, Bojan Cestnik
Published in: New Generation Computing, Issue 38/4, 2020, Page(s) 773-800, ISSN 0288-3635
DOI: 10.1007/s00354-020-00108-w

Automating News Comment Moderation with Limited Resources: Benchmarking in Croatian and Estonian

Author(s): Shekhar, Ravi; Pranjić, Marko; Pollak, Senja; Pelicon, Andraž; Purver, Matthew
Published in: Journal for Language Technology and Computational Linguistics, Issue 2, 2020, Page(s) 49-79, ISSN 2190-6858
DOI: 10.5281/zenodo.4032371

Slovene and Croatian word embeddings in terms of gender occupational analogies

Author(s): Matej Ulčar, Anka Supej, Marko Robnik-Šikonja, Senja Pollak
Published in: Slovenščina 2.0: empirical, applied and interdisciplinary research, Issue 9/1, 2021, Page(s) 26-59, ISSN 2335-2736
DOI: 10.4312/slo2.0.2021.1.26-59

Recycling a genre for news automation

Author(s): Lauri Haapanen, Leo Leppänen
Published in: AILA Review, Issue 33, 2020, Page(s) 67-85, ISSN 1461-0213
DOI: 10.1075/aila.00030.haa

Incremental Composition in Distributional Semantics

Author(s): Matthew Purver, Mehrnoosh Sadrzadeh, Ruth Kempson, Gijs Wijnholds, Julian Hough
Published in: Journal of Logic, Language and Information, Issue 30/2, 2021, Page(s) 379-406, ISSN 0925-8531
DOI: 10.1007/s10849-021-09337-8

SNoRe: Scalable Unsupervised Learning of Symbolic Node Representations

Author(s): Sebastian Meznar, Nada Lavrac, Blaz Skrlj
Published in: IEEE Access, Issue 8, 2020, Page(s) 212568-212588, ISSN 2169-3536
DOI: 10.1109/access.2020.3039541

Token-Level Multilingual Epidemic Dataset for Event Extraction

Author(s): Stephen Mutuvi, Emanuela Boros, Antoine Doucet, Gaël Lejeune, Adam Jatowt, Moses Odeo
Published in: Linking Theory and Practice of Digital Libraries - 25th International Conference on Theory and Practice of Digital Libraries, TPDL 2021, Virtual Event, September 13–17, 2021, Proceedings, Issue 12866, 2021, Page(s) 55-59
DOI: 10.1007/978-3-030-86324-1_6

Entity Linking for Historical Documents: Challenges and Solutions

Author(s): Elvys Linhares Pontes, Luis Adrián Cabrera-Diego, Jose G. Moreno, Emanuela Boros, Ahmed Hamdi, Nicolas Sidère, Mickaël Coustaty, Antoine Doucet
Published in: Digital Libraries at Times of Massive Societal Transition - 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020, Kyoto, Japan, November 30 – December 1, 2020, Proceedings, Issue 12504, 2020, Page(s) 215-231
DOI: 10.1007/978-3-030-64452-9_19

Identification of COVID-19 Related Fake News via Neural Stacking

Author(s): Boshko Koloski, Timen Stepišnik-Perdih, Senja Pollak, Blaž Škrlj
Published in: Combating Online Hostile Posts in Regional Languages during Emergency Situation - First International Workshop, CONSTRAINT 2021, Collocated with AAAI 2021, Virtual Event, February 8, 2021, Revised Selected Papers, Issue 1402, 2021, Page(s) 177-188
DOI: 10.1007/978-3-030-73696-5_17

FinEst BERT and CroSloEngual BERT - Less Is More in Multilingual Models

Author(s): Matej Ulčar, Marko Robnik-Šikonja
Published in: Text, Speech, and Dialogue - 23rd International Conference, TSD 2020, Brno, Czech Republic, September 8–11, 2020, Proceedings, Issue 12284, 2020, Page(s) 104-111
DOI: 10.1007/978-3-030-58323-1_11

RaKUn: Rank-based Keyword Extraction via Unsupervised Learning and Meta Vertex Aggregation

Author(s): Blaž Škrlj, Andraž Repar, Senja Pollak
Published in: Statistical Language and Speech Processing - 7th International Conference, SLSP 2019, Ljubljana, Slovenia, October 14–16, 2019, Proceedings, Issue 11816, 2019, Page(s) 311-323
DOI: 10.1007/978-3-030-31372-2_26

Language Comparison via Network Topology

Author(s): Blaž Škrlj, Senja Pollak
Published in: Statistical Language and Speech Processing - 7th International Conference, SLSP 2019, Ljubljana, Slovenia, October 14–16, 2019, Proceedings, Issue 11816, 2019, Page(s) 112-123
DOI: 10.1007/978-3-030-31372-2_10

Prediction Uncertainty Estimation for Hate Speech Classification

Author(s): Kristian Miok, Dong Nguyen-Doan, Blaž Škrlj, Daniela Zaharie, Marko Robnik-Šikonja
Published in: Statistical Language and Speech Processing - 7th International Conference, SLSP 2019, Ljubljana, Slovenia, October 14–16, 2019, Proceedings, Issue 11816, 2019, Page(s) 286-298
DOI: 10.1007/978-3-030-31372-2_24

Symbolic Graph Embedding Using Frequent Pattern Mining

Author(s): Blaž Škrlj, Nada Lavrač, Jan Kralj
Published in: Discovery Science - 22nd International Conference, DS 2019, Split, Croatia, October 28–30, 2019, Proceedings, Issue 11828, 2019, Page(s) 261-275
DOI: 10.1007/978-3-030-33778-0_21

Cross-lingual Transfer of Twitter Sentiment Models Using a Common Vector Space

Author(s): Robnik-Šikonja, Marko; Reba, Kristijan; Mozetič, Igor
Published in: In Proceedings of the Conference on Language Technologies and Digital Humanities, JTDH2020, 2020, Page(s) 87-92
DOI: 10.5281/zenodo.4059725

Know your Neighbors: Efficient Author Profiling via Follower Tweets

Author(s): Koloski, Boško; Pollak, Senja; Škrlj, Blaž
Published in: Notebook for PAN at CLEF 2020, 2020
DOI: 10.5281/zenodo.4059641

Corpus KAS 2.0: Cleaner and with New Datasets

Author(s): Žagar, Aleš; Kavaš, Matic; Robnik-Šikonja, Marko
Published in: In Proceedings of the 24th International Multiconference – IS2021 (Slovenian Conference on Artificial Intelligence), 2021

Automated Hate Speech Target Identification

Author(s): Pelicon, Andraž; Škrlj, Blaž; Kralj Novak, Petra
Published in: In Proceedings of the 24th International Multiconference – IS2021 (Slovenian Conference on Artificial Intelligence), 2021

Bayesian Methods for Semi-supervised Text Annotation

Author(s): Miok, Kristian; Pirs, Gregor; Robnik-Sikonja, Marko
Published in: In Proceedings of the 14th Linguistic Annotation Workshop Co-located with COLING 2020, Issue 2, 2020

Underreporting of errors in NLG output, and what to do about it

Author(s): van Miltenburg, Emiel; Clinciu, Miruna; Dušek, Ondrej; Gkatzia, Dimitra; Inglis, Stephanie; Leppänen, Leo; Mahamood, Saad; Manning, Emma; Schoch, Stephanie; Thomson, Craig; Wen, Luou
Published in: In the Proceedings of the 14th International Conference on Natural Language Generation, 2021

Simple discovery of COVID IS WAR Metaphors Using Word Embeddings

Author(s): Brglez, Mojca; Pollak, Senja; Vintar, Špela
Published in: In Proceedings of the 24th International Multiconference – IS2021 (SiKDD), 2021

Robust Named Entity Recognition and Linking on Historical Multilingual Documents

Author(s): Boros, Emanuela; Linhares Pontes, Elvys; Cabrera-Diego, Luis Adrián; Hamdi, Ahmed; Moreno, Jose G.; Sidère, Nicolas; Doucet, Antoine
Published in: Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum (CLEF-HIPE 2020), 2020
DOI: 10.5281/zenodo.4059652

SloBERTa: Slovene monolingual large pretrained masked language model

Author(s): Ulčar, Matej; Robnik-Šikonja, Marko
Published in: In Proceedings of the 24th International Multiconference – IS2021 (SiKDD), 2021

Linking Named Entities across Languages using Multilingual Word Embeddings

Author(s): Elvys Linhares Pontes, Jose G. Moreno, Antoine Doucet
Published in: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 2020, Page(s) 329-332
DOI: 10.1145/3383583.3398597

Event Detection with Entity Markers

Author(s): Boros, Emanuela; Moreno, Jose G.; Doucet, Antoine
Published in: In the Proceedings of the 43rd European Conference on Information Retrieval (ECIR 2021), 2021

Intérêt des modèles de caractères pour la détection d’événements

Author(s): Boros, Emanuela; Besançon, Romaric; Ferret, Olivier; Grau, Brigitte
Published in: In Proceedings of TALN 2021, 2021

Embeddia at SemEval-2019 Task 6: Detecting hate with neural network and transfer learning approaches

Author(s): Andraž Pelicon, Matej Martinc, and Petra Kralj Novak
Published in: Proceedings of The 13th International Workshop on Semantic Evaluation (SemEval), 2019

Generating Data using Monte Carlo Dropout

Author(s): Kristian Miok, Dong Nguyen-Doan, Daniela Zaharie, and Marko Robnik-Šikonja
Published in: IEEE 15th International Conference on Intelligent Computer Communication and Processing (ICCP 2019), 2019

Detecting Depression with Word-Level Multimodal Fusion

Author(s): Morteza Rohanian, Julian Hough, Matthew Purver
Published in: Interspeech 2019, 2019, Page(s) 1443-1447
DOI: 10.21437/interspeech.2019-2283

Word Clustering for Historical Newspapers Analysis

Author(s): Lidia Pivovarova, Elaine Zosa, and Jussi Kurunmäki
Published in: Proceedings of the Workshop on Language Technology for Digital Historical Archives, 2019

Clustering Ideological Terms in Historical Newspaper Data with Diachronic Word Embeddings

Author(s): Jani Marjanen, Lidia Pivovarova, Elaine Zosa, and Jussi Kurunmäki
Published in: Proceedings of the 5th International Workshop on Computational History, 2019

Karst exploration: Extracting terms and definitions from karst

Author(s): Senja Pollak, Andraž Repar, Matej Martinc, and Vid Podpečan
Published in: Proceedings of the 6th biennial conference on electronic lexicography, eLex 2019, 2019

Who is hot and who is not? Profiling celebs on Twitter

Author(s): Martinc, Matej; Škrlj, Blaž; Pollak, Senja
Published in: Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum, Issue 6, 2019

Fake or Not: Distinguishing Between Bots, Males and Females

Author(s): Martinc, Matej; Škrlj, Blaž; Pollak, Senja
Published in: Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum, Issue 2, 2019

Pooled LSTM for Dutch cross-genre gender classification

Author(s): Matej Martinc, Senja Pollak
Published in: Proceedings of the Shared Task on Cross-Genre Gender Detection in Dutch at Computational Linguistics in the Netherlands (CLIN 2019) conference, 2019

Methods for Generating Colourful and Factual Multilingual News Headlines

Author(s): Alnajjar, Khalid; Leppänen, Leo; Toivonen, Hannu
Published in: In Proceedings of the 10th International Conference on Computational Creativity (ICCC 2019), Issue 1, 2019, Page(s) 258-265

TLR at BSNLP2019: A Multilingual Named Entity Recognition System

Author(s): Jose G. Moreno, Elvys Linhares Pontes, Mickael Coustaty, Antoine Doucet
Published in: Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing, 2019, Page(s) 83-88
DOI: 10.18653/v1/w19-3711

Generating Data using Monte Carlo Dropout

Author(s): Miok, Kristian; Nguyen-Doan, Dong; Zaharie, Daniela; Robnik-Šikonja, Marko
Published in: Issue 1, 2019
DOI: 10.5281/zenodo.3559060

Clustering Ideological Terms in Historical Newspaper Data with Diachronic Word Embeddings

Author(s): Jani Marjanen; Lidia Pivovarova; Elaine Zosa; Jussi Kurunmäki
Published in: HistoInformatics 2019: International Workshop on Computational History 2019, 2019
DOI: 10.5281/zenodo.3689467

A Corpus Study on Questions, Responses and Misunderstanding Signals in Conversations with Alzheimer's Patients

Author(s): Shamila Nasreen; Matthew Purver; Julian Hough
Published in: Proceedings of the 23rd Workshop on the Semantics and Pragmatics of Dialogue, Issue 13, 2019
DOI: 10.5281/zenodo.3689456

Word Clustering for Historical Newspapers Analysis

Author(s): Pivovarova, Lidia; Marjanen, Jani; Zosa, Elaine
Published in: Proceedings of the Workshop on Language Technology for Digital Historical Archives in conjunction with RANLP-2019, 2019, Page(s) 3-10
DOI: 10.5281/zenodo.3402940

TeMoCo: A Visualization Tool for Temporal Analysis of Multi-party Dialogues in Clinical Settings

Author(s): Shane Sheehan, Pierre Albert, Saturnino Luz, Masood Masoodian
Published in: 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS), 2019, Page(s) 690-695
DOI: 10.1109/CBMS.2019.00140

Gender, language, and society: word embeddings as a reflection of social inequalities in linguistic corpora

Author(s): Supej, Anka; Plahuta, Marko; Purver, Matthew; Mathioudakis, Michael; Pollak, Senja
Published in: In Znanost in družbe prihodnosti, Slovensko sociološko srečanje [Annual meeting of the Slovenian Sociological Association: Science and future societies], 2019
DOI: 10.5281/zenodo.3894466

No Time Like the Present: Methods for Generating Colourful and Factual Multilingual News Headlines

Author(s): Alnajjar, Khalid; Leppänen, Leo; Toivonen, Hannu
Published in: Proceedings of the 10th International Conference on Computational Creativity (ICCC2019), 2019

Multiple Imputation for Biomedical Data using Monte Carlo Dropout Autoencoders

Author(s): Kristian Miok, Dong Nguyen-Doan, Marko Robnik-Sikonja, Daniela Zaharie
Published in: 2019 E-Health and Bioengineering Conference (EHB), 2019, Page(s) 1-4
DOI: 10.1109/EHB47216.2019.8969940

High Quality ELMo Embeddings for Seven Less-Resourced Languages

Author(s): Ulčar, Matej; Robnik-Šikonja Marko
Published in: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 2020, Page(s) 4731–4738
DOI: 10.5281/zenodo.3894535

Leveraging Contextual Embeddings for Detecting Diachronic Semantic Shift

Author(s): Martinc, Matej; Kralj Novak, Petra; Pollak, Senja
Published in: Proceedings of the 12th Language Resources and Evaluation Conference (LREC2020), 2020, Page(s) 4811‑4819
DOI: 10.5281/zenodo.3894557

Multilingual Culture-Independent Word Analogy Datasets

Author(s): Ulčar, Matej; Vaik, Kristiina; Lindström, Jessica; Dailidėnaitė, Milda; Robnik-Šikonja, Marko
Published in: Proceedings of the 12th Language Resources and Evaluation Conference (LREC2020), Issue 1, 2020, Page(s) 4074‑4080
DOI: 10.5281/zenodo.3894553

Dataset for Temporal Analysis of English-French Cognates

Author(s): Frossard, Esteban; Coustaty, Mickael; Doucet, Antoine; Jatowt, Adam; Hengchen, Simon
Published in: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 2020, Page(s) 855-859
DOI: 10.5281/zenodo.3693651

A Dataset for Multi-lingual Epidemiological Event Extraction

Author(s): Mutuvi, Stephen; Doucet, Antoine; Lejeune, Gael; Odeo, Moses
Published in: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 2020, Page(s) 4139–4144
DOI: 10.5281/zenodo.3709626

CoSimLex: A Resource for Evaluating Graded Word Similarity in Context

Author(s): Carlos Santos Armendariz; Matthew Purver; Matej Ulčar; Senja Pollak; Nikola Ljubešič; Marko Robnik-Šikonja; Mark Granroth-Wilding; Kristiina Vaik
Published in: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 2020, Page(s) 5878–5886
DOI: 10.5281/zenodo.3894565

Text Visualization for the Support of Lexicography-Based Scholarly Work

Author(s): Sheehan, Shane; Luz, Saturnino
Published in: Proceedings of the 6th biennial conference on electronic lexicography, eLex 2019, 2019, Page(s) 694-725
DOI: 10.5281/zenodo.3894619

Mining semantic relations from comparable corpora through intersections of word embeddings.

Author(s): Vintar, Špela; Grčič Simeunovič, Larisa; Martinc, Matej; Pollak, Senja; Stepišnik, Uroš
Published in: Proceedings of the LREC 2020 13th Workshop on Building and Using Comparable Corpora, 2020, Page(s) 29-34
DOI: 10.5281/zenodo.3894635

Interaction Patterns in Conversations with Alzheimer's Patients

Author(s): Nasreen, Shamila; Purver, Matthew; Hough, Julian
Published in: Poster presentation at the 7th International Conference on Statistical Language and Speech Processing. Ljubljana, Slovenia, 2019
DOI: 10.5281/zenodo.3894637

Multilingual Dynamic Topic Model

Author(s): Elaine Zosa, Mark Granroth-Wilding
Published in: Proceedings - Natural Language Processing in a Deep Learning World, 2019, Page(s) 1388-1396
DOI: 10.26615/978-954-452-056-4_159

The NetViz terminology visualization tool and the use cases in karstology domain modeling

Author(s): Pollak, Senja; Podpečan, Vid; Miljkovic, Dragana; Stepišnik, Uroš; Vintar, Špela
Published in: Proceedings of the 6th International Workshop on Computational Terminology (COMPUTERM 2020), 2020, Page(s) 55-61
DOI: 10.5281/zenodo.3894686

Communities of related terms in Karst terminology co-occurrence network

Author(s): Miljkovic, Dragana; Kralj, Jan; Stepišnik, Uroš; Pollak, Senja
Published in: Proceedings of the 6th biennial conference on electronic lexicography, eLex 2019, 2019, Page(s) 357-373
DOI: 10.5281/zenodo.3894684

A Comparison of Unsupervised Methods for Ad hoc Cross-Lingual Document Retrieval

Author(s): Zosa, Elaine; Granroth-Wilding, Mark; Pivovarova, Lidia
Published in: Proceedings of the Cross-Language Search and Summarization of Text and Speech Workshop, 2020, Page(s) 32-37
DOI: 10.5281/zenodo.3898384

Capturing Evolution in Word Usage: Just Add More Clusters?

Author(s): Matej Martinc, Syrielle Montariol, Elaine Zosa, Lidia Pivovarova
Published in: Companion Proceedings of the Web Conference 2020, 2020, Page(s) 343-349
DOI: 10.1145/3366424.3382186

A Baseline Document Planning Method for Automated Journalism

Author(s): Leppänen, Leo; Toivonen, Hannu
Published in: In the Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), 2021

Benchmarks for Unsupervised Discourse Change Detection

Author(s): Duong, Quan; Pivovarova, Lidia; Zosa, Elaine
Published in: In the Proceedings of the Histoinformatics workshop 2021, 2021

Hybrid Tagger – An Industry-driven Solution for Extreme Multi-label Text Classification

Author(s): Vaik, Kristiina; Asula, Marit; Sirel, Raul
Published in: In Proceedings of the LREC2020 Industry Track, 2020, Page(s) 26-30

TeMoCo-Doc - A visualization for supporting temporal and contextual analysis of dialogues and associated documents

Author(s): Shane Sheehan, Saturnino Luz, Pierre Albert, Masood Masoodian
Published in: Proceedings of the International Conference on Advanced Visual Interfaces, 2020, Page(s) 1-3
DOI: 10.1145/3399715.3399956

Zaznavanje sentimenta v novicah z globokimi nevronskimi mrežami

Author(s): Arhar Holdt, Špela; Pollak, Senja; Robnik-Šikonja, Marko; Krek, Simon
Published in: In Proceedings of the Conference on Language Technologies and Digital Humanities, JTDH2020, 2020, Page(s) 10-15
DOI: 10.5281/zenodo.4059729

Word-embedding based bilingual terminology alignment

Author(s): Repar, Andraž; Martinc, Matej; Ulčar, Matej; Pollak, Senja
Published in: In Proceedings of eLex 2021 (eLex2021), 2021

Investigating the Semantic Wave in Tutorial Dialogues: An Annotation Scheme and Corpus Study on Analogy Components

Author(s): Del-Bosque-Trevino, Jorge, Hough, Julian, and Purver, Matthew
Published in: In Proceedings of the 24th SemDial Workshop on the Semantics and Pragmatics of Dialogue (SemDial), 2020

An evaluation of BERT and Doc2Vec model on the IPTC Subject Codes prediction dataset

Author(s): Pranjić, Marko; Robnik-Šikonja, Marko; Pollak, Senja
Published in: In Proceedings of the 24th International Multiconference – IS2021 (SiKDD), 2021

Evaluation of related news recommendations using document similarity methods

Author(s): Pranjić, Marko; Podpečan, Vid; Robnik-Šikonja, Marko; Pollak, Senja
Published in: In Proceedings of the Conference on Language Technologies and Digital Humanities, JTDH2020, 2020, Page(s) 81-86
DOI: 10.5281/zenodo.4059710

Dimenzija spola v slovenskih vektorskih vložitvah besed: primerjava modelov prek analogij poklicev

Author(s): Supej, Anka; Ulčar, Matej; Robnik-Šikonja, Marko; Pollak, Senja
Published in: In Proceedings of the Joint Conference on Digital Libraries (JCDL 2020), 2020, Page(s) 93-100
DOI: 10.5281/zenodo.4059700

Cross-lingual embeddings for hate speech detection in comments

Author(s): Marinšek, Rok
Published in: 2019
DOI: 10.5281/zenodo.3894645

Cross-lingual approach to abstractive summarization

Author(s): Žagar, Aleš
Published in: MSc Thesis, 2020
DOI: 10.5281/zenodo.3967214

Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation

Author(s): Toivonen, Hannu; Boggia, Michele
Published in: 2021
DOI: 10.5281/zenodo.4730375