Cross-Lingual Embeddings for Less-Represented Languages in European News Media

CORDIS provides links to the public deliverables and publications of HORIZON projects.

Links to the deliverables and publications of FP7 projects, as well as links to certain specific result types such as datasets and software, are retrieved dynamically from OpenAIRE.

Deliverables

Final context-dependent and dynamic embeddings technology (T1.2)

Context-aware cross-lingual embeddings which will enable improved understanding of short texts such as user comments in the context of an emerging comment thread and the news story being commented on (report and source code) (T1.2).

Initial cross-lingual and multilingual embeddings technology (T1.1)

Initial embeddings and transformations between a selection of all targeted languages (Estonian, Finnish, Swedish, Latvian, Lithuanian, Croatian, Slovene, English, Russian) (report and source code) (T1.1)

Initial cross-lingual semantic enrichment technology (T2.1)

Initial approach to named entity (NE) extraction and disambiguation and event detection, covering multiple domains and languages (report and source code) (T2.1).

Datasets, benchmarks and evaluation metrics for cross-lingual content analysis (T4.4)

Gathering and preprocessing training and testing data (Estonian, Latvian, Lithuanian, Russian, Croatian, Finnish and English) provided by the media partners (report and dataset) (T4.4).

Initial deep network architecture (T1.3)

Deep neural networks will be adapted to morphologically rich languages by using character-level inputs and additional information on morphology (suffixes, prefixes, separately trained POS tags) (report and source code) (T1.3).

Interim report on ethics and responsible science and journalism (T6.5)

Interim report on ethics and responsible science and journalism, with analysis of news production and new tool development (T6.5).

Final evaluation report on cross-lingual user generated content filtering and analysis technology (T3.4)

Producing datasets for evaluation and development of algorithms (T3.4).

Final dynamic multilingual news generation technology (T5.2)

Development of a novel method for automatically organising news articles to be maximally informative to the assumed reader (report and source code) (T5.2).

Final cross-lingual news viewpoints identification technology (T4.3)

Development of methods for detecting viewpoints and sentiments based on media sources (report and source code) (T4.3).

Final real-time multilingual news linking technology (T4.1)

Development of tools for linking news stories across languages based on their topics and contents (report and source code) (T4.1).

Final evaluation report on cross-lingual content analysis technology (T4.4)

All tools developed in WP4 will be evaluated using the produced datasets and manual user evaluation (T4.4).

Final report on ethics and responsible science and journalism (T6.5).

Final report on ethics and responsible science and journalism (T6.5).

Initial interpretability and visualisation technology (T1.4)

Initial approaches to explaining deep learning models by adapting perturbation-based explanation methods based on coalitional game theory to text classification, and initial development of visual tools for explaining the classification process (report and source code) (T1.4).

Final technology for multilingual and self-explainable news generation (T5.1)

Based on the analysis of newsrooms (WP6), the NLG technology will be adapted for the requirements of news generation. The task will develop mechanisms for (i) determining what is interesting or important in the given data and deciding what to report, and for (ii) rendering that information in an accurate manner (iii) in multiple languages (report and source code) (T5.1).

Final evaluation report on cross-lingual embedding technology (T1.5)

Report on evaluation of the cross-lingual and multilingual embeddings on public datasets and challenges (T1.5).

Initial context-dependent and dynamic embeddings technology (T1.2)

Context-aware cross-lingual embeddings which will enable improved understanding of short texts such as user comments in the context of an emerging comment thread and the news story being commented on (report and source code) (T1.2).

Report on user needs and challenges for news media industry (T6.1).

Initial report on identification and analysis of the needs of different stakeholders in the news media industry. We will arrange a workshop to identify in detail the challenges that are specific to the operations of different media partners and prepare specification documentation (T6.1).

Recommendations on avoiding gender and other biases (T6.4)

The means to avoid and detect gender and other biases in news media content creation will be developed in T6.4. This deliverable will propose recommendations for avoiding gender bias (T6.4).

Final interpretability and visualisation technology (T1.4)

Adaptation of the three most popular perturbation-based explanation methods based on coalitional game theory (IME, LIME and SHAP) to be suitable for text classification, and development of visualisation techniques where different explanatory lexical units in the source texts (words, n-grams, sentences) are visualized (report and source code) (T1.4).

Initial cross-lingual context and opinion analysis technology (T3.1)

Report on initial developed technology for a range of user comment analyses, including topic modelling, conversation structure and context modelling, sentiment, stance and opinion detection and effect and information spread measurement (report and source code) (T3.1).

Final report on gender bias in content creation (T6.4)

Final report on gender bias in content creation (T6.4).

Reusable EMBEDDIA components available through the ClowdFlows web interface (T7.4)

Developed tools and procedures will be incorporated as widgets to make them available beyond the media context and assure reusability and repeatability of experiments (report and source code) (T7.4).

Initial multilingual news linking technology (T4.1)

Development of initial tools for linking news stories across languages based on their topics and contents (report and source code) (T4.1).

Initial keyword extraction techniques (T2.2)

Initial keyword extraction by application of statistical approaches (based on heuristics), machine learning approaches, as well as graph-based approaches (report and source code) (T2.2).

Final cross-lingual news summarisation and visualisation technology (T4.2)

Development of textual and visual language-independent multi-document news summarisation (report and source code) (T4.2).

Initial dynamic news generation technology (T5.2)

Development of a novel method for automatically organising news articles, considering the domain of the article, effects of time and news repetition (report and source code) (T5.2).

Refined analysis of news media partners’ needs and challenges (T6.1).

Refined report of news media partners’ needs and challenges and their analysis with regard to the state of the art in NLP for news media (T6.1).

Final cross-lingual and multilingual embeddings technology (T1.1)

Embeddings and transformations between all targeted languages, including Estonian, Finnish, Swedish, Latvian, Lithuanian, Croatian and Slovene, as well as English and Russian (report and source code) (T1.1).

Report generator from multilingual comments (T3.3)

Report on developed and implemented methods for generating human-readable reports in multiple languages from the outputs of the methods developed in T3.1 and T3.2 (report and source code) (T3.3).

Datasets, benchmarks and evaluation metrics for cross-lingual user generated content filtering and analysis (T3.4)

Evaluation and development of algorithms requires relevant, annotated, and multilingual datasets (report and dataset) (T3.4).

Final evaluation report on advanced cross-lingual NLP technology (T2.4)

Final report on existing evaluation datasets and benchmarks for NER, NEL and event detection (for instance, ACE, Meantime and TAC KBP’s Entity Discovery and Linking tasks) (report and dataset) (T2.4).

Final deep network architecture (T1.3)

Deep neural networks will be adapted to morphologically rich languages by using character-level inputs and additional information on morphology (suffixes, prefixes, separately trained POS tags) (report and source code) (T1.3).

Multilingual language generation approach (T2.3)

Incorporating hybrid techniques in the architecture, to take advantage of the robustness of machine learning techniques and transparency of rule-based techniques. Adaptation of the context-aware word-embeddings developed in T1.2 to improve fluency and variability in the generated texts (report and source code) (T2.3).

Final multilingual keyword extraction techniques (T2.2)

Application and further development of statistical approaches (based on heuristics), machine learning approaches, as well as graph-based approaches (report and source code) (T2.2).

Initial news generation technology (T5.1)

Based on the analysis of newsrooms (WP6), the NLG technology will be adapted for the requirements of news generation. The task will develop mechanisms for (i) determining what is interesting or important in the given data and deciding what to report, and for (ii) rendering that information in an accurate manner (iii) in multiple languages (report and source code) (T5.1).

Final report on EMBEDDIA Assistant platform evaluation (T6.3)

Final report on EMBEDDIA Assistant platform evaluation by media partners (T6.3).

Platform requirements documentation and platform design (T6.2)

The EMBEDDIA Toolkit will incorporate different tools and resources developed in WP1–WP5, and the EMBEDDIA Media Assistant platform will be built on top of it. The platform will be built as a series of base microservices, functional microservices and task-oriented APIs. This deliverable will report on platform requirements and platform design (T6.2).

Final cross-lingual comment filtering technology (T3.2)

Final report on developed tools for automatic flagging or filtering of user comments, specifically targeted at the use cases defined by end user partners in WP6, e.g., detection of hate speech and political trolling, attempts to elicit extreme reactions and influence others’ opinions (report and source code) (T3.2).

Initial cross-lingual news viewpoints identification technology (T4.3)

Initial approaches for detecting viewpoints and sentiments based on media sources (report and source code) (T4.3).

Final cross-lingual semantic enrichment technology (T2.1)

Generalization of approaches to multiple domains and languages, large-scale corpora, and integrating cross-lingual embeddings (report and source code) (T2.1).

Creative multilingual technology for news and headline generation (T5.3)

We will make the generated texts more varied and colourful by generating creative expressions, especially in headlines (report and source code) (T5.3).

Final cross-lingual context and opinion analysis technology (T3.1)

Final report on developed technology for a range of user comment analyses, including topic modelling, conversation structure and context modelling, sentiment, stance and opinion detection and effect and information spread measurement (report and source code) (T3.1).

Datasets, benchmarks and evaluation metrics for advanced cross-lingual NLP technology (T2.4)

Report on existing evaluation datasets and benchmarks for NER, NEL and event detection (for instance, ACE, Meantime and TAC KBP’s Entity Discovery and Linking tasks) (report and dataset) (T2.4).

Initial cross-lingual comment filtering technology (T3.2)

Report on developed tools for automatic flagging or filtering of user comments, specifically targeted at the use cases defined by end user partners in WP6, e.g., detection of hate speech and political trolling, attempts to elicit extreme reactions and influence others’ opinions (report and source code) (T3.2).

Datasets, benchmarks and evaluation metrics for multilingual text generation (T5.4)

Texts (news stories) and structured datasets from which news can be generated will be collected from news partners, and a methodology for evaluation will be defined (report and datasets) (T5.4).

Selected EMBEDDIA components in ClowdFlows (T7.4)

Initial selection of tools and procedures incorporated as widgets in the web-based platform ClowdFlows to make them available beyond the media context and assure reusability and repeatability of experiments (report and source code) (T7.4).

Initial cross-lingual news summarisation and visualisation technology (T4.2)

Development of textual and visual language-independent multi-document news summarisation (report and source code) (T4.2).

Final evaluation report on multilingual text generation technology (T5.4)

Final evaluation report on multilingual text generation technology (T5.4).

Datasets, benchmarks and evaluation metrics for cross-lingual word embeddings (T1.5)

A repository of training and evaluation data, stored in a dedicated GitHub repository (report and datasets) (T1.5).

Final EMBEDDIA Media Assistant platform, packaged in docker container (T6.2)

Final EMBEDDIA Media Assistant platform incorporating different tools and resources, packaged in a Docker container (report and source code) (T6.2).

Project website and social media accounts (T7.1)

A project website, which will function both as a project dissemination tool and as a point of access to the technical outcomes produced by the project, and social media accounts/pages on relevant social networks will be created (T7.1).

Publications

To BAN or Not to BAN: Bayesian Attention Networks for Reliable Hate Speech Detection

Authors: Kristian Miok, Blaž Škrlj, Daniela Zaharie, Marko Robnik-Šikonja
Published in: Cognitive Computation, 2021, ISSN 1866-9956
Publisher: Springer Verlag
DOI: 10.1007/s12559-021-09826-9

Cross-lingual alignments of ELMo contextual embeddings

Authors: Ulčar, Matej; Robnik-Šikonja, Marko
Published in: Neural Computing and Applications, Issue 3, 2022, ISSN 0941-0643
Publisher: Springer Verlag
DOI: 10.1007/s00521-022-07164-x

NeSyChair: Automatic Conference Scheduling Combining Neuro-Symbolic Representations and Constrained Clustering

Authors: Škvorc, Tadej; Lavrač, Nada; Robnik-Šikonja, Marko
Published in: IEEE Access, Issue 10, 2022, ISSN 2169-3536
Publisher: Institute of Electrical and Electronics Engineers Inc.
DOI: 10.1109/ACCESS.2022.3144932

autoBOT: evolving neuro-symbolic representations for explainable low resource text classification

Authors: Blaž Škrlj, Matej Martinc, Nada Lavrač, Senja Pollak
Published in: Machine Learning, 2021, ISSN 0885-6125
Publisher: Kluwer Academic Publishers
DOI: 10.1007/s10994-021-05968-x

MICE: Mining Idioms with Contextual Embeddings

Authors: Škvorc, Tadej; Gantar, Polona; Robnik-Šikonja, Marko
Published in: Knowledge-Based Systems, Issue 237, 2022, ISSN 0950-7051
Publisher: Elsevier BV
DOI: 10.1016/j.knosys.2021.107606

Zero-Shot Learning for Cross-Lingual News Sentiment Classification

Authors: Andraž Pelicon, Marko Pranjić, Dragana Miljković, Blaž Škrlj, Senja Pollak
Published in: Applied Sciences, Issue 10/17, 2020, Page(s) 5993, ISSN 2076-3417
Publisher: MDPI
DOI: 10.3390/app10175993

Supervised and Unsupervised Neural Approaches to Text Readability

Authors: Matej Martinc; Senja Pollak; Marko Robnik-Šikonja
Published in: Computational Linguistics, Issue 47.1, 2021, Page(s) 141-179, ISSN 0891-2017
Publisher: MIT Press
DOI: 10.1162/coli_a_00398

Nazaj v prihodnost: avtomatizacija in preobrazba novinarske epistemologije

Authors: Igor Vobič, Marko Robnik Šikonja, Monika Kalin Golob
Published in: Javnost - The Public, Issue 26/sup1, 2019, Page(s) S41-S61, ISSN 1318-3222
Publisher: European Institute for Communication and Culture
DOI: 10.1080/13183222.2019.1696600

What makes a reporter human? A Research Agenda for Augmented Journalism

Authors: Lindén, Carl-Gustav
Published in: Questions de communication, 2020, ISSN 2259-8901
Publisher: Presses universitaires de Lorraine
DOI: 10.4000/questionsdecommunication.23301

Cross-lingual Transfer of Sentiment Classifiers

Authors: Robnik-Šikonja, Marko; Reba, Kristjan; Mozetič, Igor
Published in: Slovenščina 2.0, Issue 9(1), 2021, Page(s) 1-25, ISSN 2335-2736
Publisher: Ljubljana University Press, Faculty of Arts
DOI: 10.4312/slo2.0.2021.1.1-25

Completability vs (In)completeness

Authors: Eleni Gregoromichelaki, Gregory James Mills, Christine Howes, Arash Eshghi, Stergios Chatzikyriakidis, Matthew Purver, Ruth Kempson, Ronnie Cann, Patrick G. T. Healey
Published in: Acta Linguistica Hafniensia, Issue 52/2, 2020, Page(s) 260-284, ISSN 0374-0463
Publisher: Nordisk Sprog- og Kulturforlag
DOI: 10.1080/03740463.2020.1795549

TNT-KID: Transformer-based neural tagger for keyword identification

Authors: Matej Martinc, Blaž Škrlj, Senja Pollak
Published in: Natural Language Engineering, 2021, Page(s) 1-40, ISSN 1351-3249
Publisher: Cambridge University Press
DOI: 10.1017/s1351324921000127

Investigating cross-lingual training for offensive language detection

Authors: Andraž Pelicon, Ravi Shekhar, Blaž Škrlj, Matthew Purver, Senja Pollak
Published in: PeerJ Computer Science, Issue 7, 2021, Page(s) e559, ISSN 2376-5992
Publisher: PeerJ Publishing
DOI: 10.7717/peerj-cs.559

Journalistic Passion as Commodity : A Managerial Perspective

Authors: Carl-Gustav Lindén; Katja Lehtisaari; Mikko Grönlund; Mikko Villi
Published in: Journalism Studies, Issue 22(12), 2021, Page(s) 1701-1719, ISSN 1461-670X
Publisher: Routledge
DOI: 10.1080/1461670x.2021.1911672

Re-Representing Metaphor: Modeling Metaphor Perception Using Dynamically Contextual Distributional Semantics

Authors: Stephen McGregor, Kat Agres, Karolina Rataj, Matthew Purver, Geraint Wiggins
Published in: Frontiers in Psychology, Issue 10, 2019, ISSN 1664-1078
Publisher: Frontiers Research Foundation
DOI: 10.3389/fpsyg.2019.00765

Towards Robust Text Classification with Semantics-Aware Recurrent Neural Architecture

Authors: Blaž Škrlj, Jan Kralj, Nada Lavrač, Senja Pollak
Published in: Machine Learning and Knowledge Extraction, Issue 1/2, 2019, Page(s) 575-589, ISSN 2504-4990
Publisher: MDPI AG
DOI: 10.3390/make1020034

Predicting Slovene Text Complexity Using Readability Measures

Authors: Tadej Škvorc, Simon Krek, Senja Pollak, Špela Arhar Holdt, Marko Robnik-Šikonja
Published in: In Contributions to Contemporary History, 2019, ISSN 2463-7807
Publisher: OJS/PKP

Combining n-grams and deep convolutional features for language variety classification

Authors: Matej Martinc, Senja Pollak
Published in: Natural Language Engineering, Issue 25/5, 2019, Page(s) 607-632, ISSN 1351-3249
Publisher: Cambridge University Press
DOI: 10.1017/S1351324919000299

TermEnsembler

Authors: Andraž Repar, Vid Podpečan, Anže Vavpetič, Nada Lavrač, Senja Pollak
Published in: Terminology, Issue 25/1, 2019, Page(s) 93-120, ISSN 0929-9971
Publisher: John Benjamins Publishing Company
DOI: 10.1075/term.00029.rep

Reproduction, replication, analysis and adaptation of a term alignment approach

Authors: Andraž Repar, Matej Martinc, Senja Pollak
Published in: Language Resources and Evaluation, 2019, ISSN 1574-020X
Publisher: Springer Verlag
DOI: 10.1007/s10579-019-09477-1

‘Our task is to demystify fears’: Analysing newsroom management of automation in journalism

Authors: Marko Milosavljević, Igor Vobič
Published in: Journalism, 2019, Page(s) 146488491986159, ISSN 1464-8849
Publisher: SAGE Publications
DOI: 10.1177/1464884919861598

Methods and visualization tools for the analysis of medical, political and scientific concepts in Genealogies of Knowledge

Authors: Saturnino Luz, Shane Sheehan
Published in: Palgrave Communications, Issue 6/1, 2020, ISSN 2055-1045
Publisher: Humanities and Social Sciences Communications
DOI: 10.1057/s41599-020-0423-6

Exploring the Relations Between Net Benefits of IT Projects and CIOs’ Perception of Quality of Software Development Disciplines

Authors: Damjan Vavpotič, Marko Robnik-Šikonja, Tomaž Hovelja
Published in: Business & Information Systems Engineering, 2019, ISSN 2363-7005
Publisher: Springer Gabler
DOI: 10.1007/s12599-019-00612-4

Data Journalism as a Service: Digital Native Data Journalism Expertise and Product Development

Authors: Ester Appelgren, Carl-Gustav Lindén
Published in: Media and Communication, Issue 8/2, 2020, Page(s) 62, ISSN 2183-2439
Publisher: Cogitatio
DOI: 10.17645/mac.v8i2.2757

How Furiously Can Colorless Green Ideas Sleep? Sentence Acceptability in Context

Authors: Jey Han Lau, Carlos Armendariz, Shalom Lappin, Matthew Purver, Chang Shu
Published in: Transactions of the Association for Computational Linguistics, Issue 8, 2020, Page(s) 296-310, ISSN 2307-387X
Publisher: The MIT Press
DOI: 10.1162/tacl_a_00315

Computational generation of slogans

Authors: Khalid Alnajjar, Hannu Toivonen
Published in: Natural Language Engineering, 2020, Page(s) 1-33, ISSN 1351-3249
Publisher: Cambridge University Press
DOI: 10.1017/S1351324920000236

In the Name of the Right to be Forgotten: New Legal and Policy Issues and Practices regarding Unpublishing Requests in Slovenian Online News Media

Authors: Marko Milosavljević, Melita Poler, Rok Čeferin
Published in: Digital Journalism, 2020, Page(s) 1-17, ISSN 2167-0811
Publisher: Taylor & Francis
DOI: 10.1080/21670811.2020.1747942

(Mis)Information Operations: An Integrated Perspective

Authors: Cinelli, Matteo; Conti, Mauro; Finos, Livio; Grisolia, Francesco; Kralj Novak, Petra; Peruzzi, Antonio; Tesconi, Maurizio; Zollo, Fabia; Quattrociocchi, Walter
Published in: Journal of Information Warfare, Issue 18(3), 2020, ISSN 1445-3312
Publisher: Mt. Eliza : Teamlink Australia

A Multilingual Study of Multi-Sentence Compression using Word Vertex-Labeled Graphs and Integer Linear Programming

Authors: Linhares Pontes, Elvys; Huet, Stéphane; Torres Moreno, Juan Manuel; Gouveia da Silva, Thiago; Carneiro Linhares, Andréa
Published in: Computación y Sistemas, Issue 24(2), 2020, ISSN 1405-5546
Publisher: Centro de Investigacion en Computacion (CIC) del Instituto Politecnico Nacional (IPN)

Automated Journalism as a Source of and a Diagnostic Device for Bias in Reporting

Authors: Leo Leppänen, Hanna Tuulonen, Stefanie Sirén-Heikel
Published in: Media and Communication, Issue 8/3, 2020, Page(s) 39, ISSN 2183-2439
Publisher: Cogitatio
DOI: 10.17645/mac.v8i3.3022

tax2vec: Constructing Interpretable Features from Taxonomies for Short Text Classification

Authors: Blaž Škrlj, Matej Martinc, Jan Kralj, Nada Lavrač, Senja Pollak
Published in: Computer Speech & Language, Issue 65, 2021, Page(s) 101104, ISSN 0885-2308
Publisher: Academic Press
DOI: 10.1016/j.csl.2020.101104

Knowledge Graph informed Fake News Classification via Heterogeneous Representation Ensembles

Authors: Koloski, Boshko; Stepišnik-Perdih, Timen; Robnik-Šikonja, Marko; Pollak, Senja; Škrlj, Blaž
Published in: Neurocomputing journal, 2022, ISSN 0925-2312
Publisher: Elsevier BV
DOI: 10.1016/j.neucom.2022.01.096

Cross-lingual transfer of abstractive summarizer to less-resource language

Authors: Aleš Žagar, Marko Robnik-Šikonja
Published in: Journal of Intelligent Information Systems, 2021, ISSN 0925-9902
Publisher: Kluwer Academic Publishers
DOI: 10.1007/s10844-021-00663-8

Bisociative Literature-Based Discovery: Lessons Learned and New Word Embedding Approach

Authors: Nada Lavrač, Matej Martinc, Senja Pollak, Maruša Pompe Novak, Bojan Cestnik
Published in: New Generation Computing, Issue 38/4, 2020, Page(s) 773-800, ISSN 0288-3635
Publisher: Springer Verlag
DOI: 10.1007/s00354-020-00108-w

Propositionalization and embeddings: two sides of the same coin

Authors: Nada Lavrač; Blaž Škrlj; Marko Robnik-Šikonja
Published in: Machine Learning, Issue 109, 2020, ISSN 0885-6125
Publisher: Kluwer Academic Publishers
DOI: 10.1007/s10994-020-05890-8

Automating News Comment Moderation with Limited Resources: Benchmarking in Croatian and Estonian

Authors: Shekhar, Ravi; Pranjić, Marko; Pollak, Senja; Pelicon, Andraž; Purver, Matthew
Published in: Journal for Language Technology and Computational Linguistics, Issue 2, 2020, Page(s) 49-79, ISSN 2190-6858
Publisher: German Society for Computational Linguistics and Language Technology (GSCL)
DOI: 10.5281/zenodo.4032371

Enhancing deep neural networks with morphological information

Authors: Klemen, Matej; Krsnik, Luka; Robnik-Šikonja, Marko
Published in: Natural Language Engineering, 2022, ISSN 1351-3249
Publisher: Cambridge University Press
DOI: 10.1017/S1351324922000080

Slovene and Croatian word embeddings in terms of gender occupational analogies

Authors: Matej Ulčar, Anka Supej, Marko Robnik-Šikonja, Senja Pollak
Published in: Slovenščina 2.0: empirical, applied and interdisciplinary research, Issue 9/1, 2021, Page(s) 26-59, ISSN 2335-2736
Publisher: Ljubljana University Press, Faculty of Arts
DOI: 10.4312/slo2.0.2021.1.26-59

MELHISSA: A Multilingual Entity Linking Architecture for Historical Press Articles

Authors: Linhares Pontes, Elvys; Cabrera-Diego, Luis Adrian; Moreno, Jose G.; Boros, Emanuela; Hamdi, Ahmed; Doucet, Antoine; Sidere, Nicolas; Coustaty, Mickael
Published in: International Journal on Digital Libraries, 2021, ISSN 1432-1300
Publisher: Springer
DOI: 10.1007/s00799-021-00319-6

Recycling a genre for news automation

Authors: Lauri Haapanen, Leo Leppänen
Published in: AILA Review, Issue 33, 2020, Page(s) 67-85, ISSN 1461-0213
Publisher: John Benjamins Publishing Company
DOI: 10.1075/aila.00030.haa

Incremental Composition in Distributional Semantics

Authors: Matthew Purver, Mehrnoosh Sadrzadeh, Ruth Kempson, Gijs Wijnholds, Julian Hough
Published in: Journal of Logic, Language and Information, Issue 30/2, 2021, Page(s) 379-406, ISSN 0925-8531
Publisher: Kluwer Academic Publishers
DOI: 10.1007/s10849-021-09337-8

Kratt: Developing an Automatic Subject Indexing Tool for the National Library of Estonia

Authors: Asula, Marit; Makke, Jane; Freienthal, Linda; Kuulmets, Hele-Andra; Sirel, Raul
Published in: Cataloging & Classification Quarterly, Issue 59:8, 2021, Page(s) 775-793, ISSN 0163-9374
Publisher: Haworth Press Inc.
DOI: 10.1080/01639374.2021.1998283

SNoRe: Scalable Unsupervised Learning of Symbolic Node Representations

Authors: Sebastian Mežnar, Nada Lavrač, Blaž Škrlj
Published in: IEEE Access, Issue 8, 2020, Page(s) 212568-212588, ISSN 2169-3536
Publisher: Institute of Electrical and Electronics Engineers Inc.
DOI: 10.1109/access.2020.3039541

Token-Level Multilingual Epidemic Dataset for Event Extraction

Authors: Stephen Mutuvi, Emanuela Boros, Antoine Doucet, Gaël Lejeune, Adam Jatowt, Moses Odeo
Published in: Linking Theory and Practice of Digital Libraries - 25th International Conference on Theory and Practice of Digital Libraries, TPDL 2021, Virtual Event, September 13–17, 2021, Proceedings, Issue 12866, 2021, Page(s) 55-59, ISBN 978-3-030-86323-4
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-86324-1_6

Entity Linking for Historical Documents: Challenges and Solutions

Authors: Elvys Linhares Pontes, Luis Adrián Cabrera-Diego, Jose G. Moreno, Emanuela Boros, Ahmed Hamdi, Nicolas Sidère, Mickaël Coustaty, Antoine Doucet
Published in: Digital Libraries at Times of Massive Societal Transition - 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020, Kyoto, Japan, November 30 – December 1, 2020, Proceedings, Issue 12504, 2020, Page(s) 215-231, ISBN 978-3-030-64451-2
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-64452-9_19

Prioritization of COVID-19-Related Literature via Unsupervised Keyphrase Extraction and Document Representation Learning

Authors: Blaž Škrlj, Marko Jukič, Nika Eržen, Senja Pollak, Nada Lavrač
Published in: Discovery Science - 24th International Conference, DS 2021, Halifax, NS, Canada, October 11–13, 2021, Proceedings, Issue 12986, 2021, Page(s) 204-217, ISBN 978-3-030-88941-8
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-88942-5_16

Identification of COVID-19 Related Fake News via Neural Stacking

Authors: Boshko Koloski, Timen Stepišnik-Perdih, Senja Pollak, Blaž Škrlj
Published in: Combating Online Hostile Posts in Regional Languages during Emergency Situation - First International Workshop, CONSTRAINT 2021, Collocated with AAAI 2021, Virtual Event, February 8, 2021, Revised Selected Papers, Issue 1402, 2021, Page(s) 177-188, ISBN 978-3-030-73695-8
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-73696-5_17

FinEst BERT and CroSloEngual BERT - Less Is More in Multilingual Models

Authors: Matej Ulčar, Marko Robnik-Šikonja
Published in: Text, Speech, and Dialogue - 23rd International Conference, TSD 2020, Brno, Czech Republic, September 8–11, 2020, Proceedings, Issue 12284, 2020, Page(s) 104-111, ISBN 978-3-030-58322-4
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-58323-1_11

RaKUn: Rank-based Keyword Extraction via Unsupervised Learning and Meta Vertex Aggregation

Authors: Blaž Škrlj, Andraž Repar, Senja Pollak
Published in: Statistical Language and Speech Processing - 7th International Conference, SLSP 2019, Ljubljana, Slovenia, October 14–16, 2019, Proceedings, Issue 11816, 2019, Page(s) 311-323, ISBN 978-3-030-31371-5
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-31372-2_26

Language Comparison via Network Topology

Authors: Blaž Škrlj, Senja Pollak
Published in: Statistical Language and Speech Processing - 7th International Conference, SLSP 2019, Ljubljana, Slovenia, October 14–16, 2019, Proceedings, Issue 11816, 2019, Page(s) 112-123, ISBN 978-3-030-31371-5
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-31372-2_10

Prediction Uncertainty Estimation for Hate Speech Classification

Authors: Kristian Miok, Dong Nguyen-Doan, Blaž Škrlj, Daniela Zaharie, Marko Robnik-Šikonja
Published in: Statistical Language and Speech Processing - 7th International Conference, SLSP 2019, Ljubljana, Slovenia, October 14–16, 2019, Proceedings, Issue 11816, 2019, Page(s) 286-298, ISBN 978-3-030-31371-5
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-31372-2_24

Symbolic Graph Embedding Using Frequent Pattern Mining

Authors: Blaž Škrlj, Nada Lavrač, Jan Kralj
Published in: Discovery Science - 22nd International Conference, DS 2019, Split, Croatia, October 28–30, 2019, Proceedings, Issue 11828, 2019, Page(s) 261-275, ISBN 978-3-030-33777-3
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-33778-0_21

EMBEDDIA Tools, Datasets and Challenges: Resources and Hackathon Contributions

Authors: Pollak, Senja; Robnik-Šikonja, Marko; Purver, Matthew; Boggia, Michele; Shekhar, Ravi; Pranjić, Marko; Salmela, Salla; Krustok, Ivar; Paju, Tarmo; Linden, Carl-Gustav; Leppänen, Leo; Zosa, Elaine; Ulčar, Matej; Freienthal, Linda; Traat, Silver; Cabrera-Diego, Luis Adrián; Martinc, Matej; Lavrač, Nada; Škrlj, Blaž; Žnidaršič, Martin; Pelicon, Andraž; Koloski, Boshko; Podpečan, Vid; Kra
Published in: Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730464

EMBEDDIA hackathon report: Automatic sentiment and viewpoint analysis of Slovenian news corpus on the topic of LGBTIQ+

Authors: Martinc, Matej; Perger, Nina; Pelicon, Andraž; Ulčar, Matej; Vezovnik, Andreja; Pollak, Senja
Published in: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730336

Exploring Neural Language Models via Analysis of Local and Global Self-Attention Spaces

Authors: Škrlj, Blaž; Sheehan, Shane; Eržen, Nika; Robnik-Šikonja, Marko; Luz, Saturnino; Pollak, Senja
Published in: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730396

Grammatical Profiling for Semantic Change Detection

Authors: Giulianelli, Mario; Kutuzov, Andrey; Pivovarova, Lidia
Published in: In the Proceedings of the 25th Conference on Computational Natural Language Learning (CoNLL 2021), 2021, Page(s) 423-434
Publisher: ACL

Cross-lingual Transfer of Twitter Sentiment Models Using a Common Vector Space

Auteurs: Robnik-Šikonja, Marko; Reba, Kristijan; Mozetič, Igor
Publié dans: In Proceedings of the Conference on Language Technologies and Digital Humanities, JTDH2020, 2020, Page(s) 87-92
Éditeur: Institute of Contemporary History
DOI: 10.5281/zenodo.4059725

When a Computer Cracks a Joke: Automated Generation of Humorous Headlines

Auteurs: Alnajjar, Khalid; Hämäläinen, Mika
Publié dans: In the Proceedings of the 12th International Conference on Computational Creativity (ICCC21), 2021, ISBN 978-989-54160-3-5
Éditeur: Association for Computational Creativity

Knowledge graph aware text classification

Auteurs: Petrželková, Nela; Škrlj, Blaž; Lavrač, Nada
Publié dans: In Proceedings of the 23rd International Multiconference – IS2020, 2020
Éditeur: Jožef Stefan Institute
DOI: 10.5281/zenodo.4072961

Relation Classification via Relation Validation

Auteurs: Moreno, Jose G.; Doucet, Antoine; Grau, Brigitte
Publié dans: Proceedings of the 6th Workshop on Semantic Deep Learning (SemDeep-6), 2021
Éditeur: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730492

Simple ways to improve NER in every language using markup

Auteurs: Cabrera-Diego, Luis Adrián; Moreno, Jose G.; Doucet, Antoine
Publié dans: In Proceedings of ECIR 2021, 2021
Éditeur: CEUR Workshops
DOI: 10.5281/zenodo.4680998

A bilingual approach to specialised adjectives through word embeddings in the karstology domain

Auteurs: Grčić Simeunović, Larisa; Martinc, Matej; Vintar, Špela
Publié dans: In Proceedings of TOTH 2020, 2020
Éditeur: Université Savoie Mont Blanc
DOI: 10.5281/zenodo.6435390

Using contextual and cross-lingual word embeddings to improve variety in template-based NLG for automated journalism

Auteurs: Rämö, Miia; Leppänen, Leo
Publié dans: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Éditeur: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730334

Know your Neighbors: Efficient Author Profiling via Follower Tweets

Auteurs: Koloski, Boško; Pollak, Senja; Škrlj, Blaž
Publié dans: Notebook for PAN at CLEF 2020, 2020
Éditeur: CEUR-WS.org
DOI: 10.5281/zenodo.4059641

Corpus KAS 2.0: Cleaner and with New Datasets

Auteurs: Žagar, Aleš; Kavaš, Matic; Robnik-Šikonja, Marko
Publié dans: In Proceedings of the 24th International Multiconference – IS2021 (Slovenian Conference on Artificial Intelligence), 2021
Éditeur: Jožef Stefan Institute

Atténuer les erreurs de numérisation dans la reconnaissance d'entités nommées pour les documents historiques

Auteurs: Emanuela Boros; Ahmed Hamdi; Elvys Linhares Pontes; Luis Adrián Cabrera-Diego; Jose G. Moreno; Nicolas Sidère; Antoine Doucet
Publié dans: Numéro 29, 2021
Éditeur: l’Association Francophone de Recherche d’Information et Applications ARIA
DOI: 10.5281/zenodo.4734435

Automated Hate Speech Target Identification

Auteurs: Pelicon, Andraž; Škrlj, Blaž; Kralj Novak, Petra
Publié dans: In Proceedings of the 24th International Multiconference – IS2021 (Slovenian Conference on Artificial Intelligence), 2021
Éditeur: Jožef Stefan Institute

Slav-NER: the 3rd Cross-lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic languages

Auteurs: Piskorski et al
Publié dans: In Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing in conjunction to EACL2021, 2021
Éditeur: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730512

Bayesian Methods for Semi-supervised Text Annotation

Auteurs: Miok, Kristian; Pirš, Gregor; Robnik-Šikonja, Marko
Publié dans: In Proceedings of the 14th Linguistic Annotation Workshop Co-located with COLING 2020, Numéro 2, 2020
Éditeur: Association for Computational Linguistics

Training dataset and dictionary sizes matter in BERT models: the case of Baltic languages

Auteurs: Ulčar, Matej; Robnik-Šikonja, Marko
Publié dans: In the Proceedings of the 10th International Conference on Analysis of Images, Social Networks and Texts (AIST 2021), 2021
Éditeur: Springer

Preliminary experimentation with combinations and extensions of forward-looking sentence detection wordlists

Auteurs: Štihec, Jan; Pollak, Senja; Žnidaršič, Martin
Publié dans: In Proceedings of the 3rd financial narrative processing workshop, 2021
Éditeur: Association for Computational Linguistics

Bayesian BERT for Trustful Hate Speech Detection

Auteurs: Miok, Kristian; Škrlj, Blaž; Zaharie, Daniela; Robnik-Šikonja, Marko
Publié dans: ICML 2020 Workshop on Uncertainty & Robustness in Deep Learning, 2021
Éditeur: ICML UDL

Underreporting of errors in NLG output, and what to do about it

Auteurs: van Miltenburg, Emiel; Clinciu, Miruna; Dušek, Ondrej; Gkatzia, Dimitra; Inglis, Stephanie; Leppänen, Leo; Mahamood, Saad; Manning, Emma; Schoch, Stephanie; Thomson, Craig; Wen, Luou
Publié dans: In the Proceedings of the 14th International Conference on Natural Language Generation, 2021
Éditeur: Association for Computational Linguistics

Primerjava slovenskih besednih vektorskih vložitev z vidika spola na analogijah poklicev

Auteurs: Supej, Anka; Ulčar, Matej; Robnik-Šikonja, Marko; Pollak, Senja
Publié dans: In the Proceedings of the Conference on Language Technologies and Digital Humanities (JTDH 2021), 2021, Page(s) 93-100
Éditeur: Slovensko društvo za jezikovne tehnologije

Simple discovery of COVID IS WAR Metaphors Using Word Embeddings

Auteurs: Brglez, Mojca; Pollak, Senja; Vintar, Špela
Publié dans: In Proceedings of the 24th International Multiconference – IS2021 (SiKDD), 2021
Éditeur: Jožef Stefan Institute

COVID-19 v slovenskih spletnih medijih: analiza s pomočjo računalniške obdelave jezika

Auteurs: Pollak, Senja; Martinc, Matej; Pelicon, Andraž; Ulčar, Matej; Vezovnik, Andreja
Publié dans: Pandemična družba: slovensko sociološko srečanje, 2021
Éditeur: Slovenska sociološka družba

Visual Topic Modelling for NewsImage Task at MediaEval 2021

Auteurs: Pivovarova, Lidia; Zosa, Elaine
Publié dans: MediaEval 2021 Multimedia Benchmark Workshop: Working Notes Proceedings of the MediaEval 2021 Workshop, 2021
Éditeur: MediaEval Multimedia Benchmark
DOI: 10.5281/zenodo.6384719

TeMoTopic: Temporal Mosaic Visualisation of Topic Distribution, Keywords, and Context

Auteurs: Sheehan, Shane; Luz, Saturnino; Masoodian, Masood
Publié dans: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Éditeur: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730388

Robust Named Entity Recognition and Linking on Historical Multilingual Documents

Auteurs: Boros, Emanuela; Linhares Pontes, Elvys; Cabrera-Diego, Luis Adrián; Hamdi, Ahmed; Moreno, Jose G.; Sidère, Nicolas; Doucet, Antoine
Publié dans: Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum (CLEF-HIPE 2020), 2020
Éditeur: http://ceur-ws.org/
DOI: 10.5281/zenodo.4059652

Impact Analysis of Document Digitization on Event Extraction

Auteurs: Nguyen, Nhu Khoa; Boroş, Emanuela; Lejeune, Gaël; Doucet, Antoine
Publié dans: In 4th Workshop on Natural Language for Artificial Intelligence (NL4AI 2020) co-located with the 19th International Conference of the Italian Association for Artificial Intelligence (AI*IA 2020), 2020
Éditeur: CEUR Workshop Proceedings
DOI: 10.5281/zenodo.4680744

Using a Frustratingly Easy Domain and Tagset Adaptation for Creating Slavic Named Entity Recognition Systems

Auteurs: Cabrera-Diego, Luis Adrián; Moreno, Jose G.; Doucet, Antoine
Publié dans: In Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing in conjunction to EACL2021, 2021
Éditeur: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730478

Topic modelling discourse dynamics in historical newspapers

Auteurs: Marjanen, Jani; Zosa, Elaine; Hengchen, Simon; Pivovarova, Lidia; Tolonen, Mikko
Publié dans: In Post-Proceedings of the DHN2020 Conference: the 5th conference on Digital Humanities in the Nordic Countries, 2021
Éditeur: CEUR Workshop Proceedings (CEUR-WS.org)

BERT meets Shapley: Extending SHAP Explanations to Transformer-based Classifiers

Auteurs: Kokalj, Enja; Škrlj, Blaž; Lavrač, Nada; Pollak, Senja; Robnik-Šikonja, Marko
Publié dans: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Éditeur: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730384

Multilingual Epidemic Event Extraction

Auteurs: Mutuvi, Stephen; Boros, Emanuela; Doucet, Antoine; Lejeune, Gaël; Jatowt, Adam; Odeo, Moses
Publié dans: In the Proceedings of ICADL 2021, 2021, ISBN 978-3-030-91669-8
Éditeur: Springer
DOI: 10.1007/978-3-030-91669-5_12

Transformer-based Methods for Recognizing Ultra Fine-grained Entities (RUFES)

Auteurs: Boroş, Emanuela; Doucet, Antoine
Publié dans: In Proceedings of the Thirteenth Text Analysis Conference (TAC 2020), 2021
Éditeur: NIST USA
DOI: 10.5281/zenodo.4681008

SloBERTa: Slovene monolingual large pretrained masked language model

Auteurs: Ulčar, Matej; Robnik-Šikonja, Marko
Publié dans: In Proceedings of the 24th International Multiconference – IS2021 (SiKDD), 2021
Éditeur: Jožef Stefan Institute

Alleviating Digitization Errors in Named Entity Recognition for Historical Documents

Auteurs: Emanuela Boros, Ahmed Hamdi, Elvys Linhares Pontes, Luis Adrián Cabrera-Diego, Jose G. Moreno, Nicolas Sidere, Antoine Doucet
Publié dans: Proceedings of the 24th Conference on Computational Natural Language Learning, 2020, Page(s) 431-441
Éditeur: Association for Computational Linguistics
DOI: 10.18653/v1/2020.conll-1.35

Not All Comments Are Equal: Insights into Comment Moderation from a Topic-aware Model

Auteurs: Zosa, Elaine; Shekhar, Ravi; Karan, Mladen; Purver, Matthew
Publié dans: In the Proceedings of the Conference on Recent Advances in Natural Language Processing (RANLP 2021), 2021
Éditeur: ACL

TLR at the NTCIR-15 FinNum-2 Task: Improving Text Classifiers for Numeral Attachment in Financial Social Data

Auteurs: Moreno, Jose G.; Boros, Emanuela; Doucet, Antoine
Publié dans: In Proceedings of the 15th NTCIR Conference on Evaluation of Information Access Technologies, Numéro 2, 2020
Éditeur: Association for Computing Machinery
DOI: 10.5281/zenodo.4680695

Multilingual Detection of Fake News Spreaders via Sparse Matrix Factorization

Auteurs: Koloski, Boško; Pollak, Senja; Škrlj, Blaž
Publié dans: Notebook for PAN at CLEF 2020, 2020
Éditeur: http://ceur-ws.org/
DOI: 10.5281/zenodo.4059635

Unsupervised Approach to Cross-Lingual User Comments Summarization

Auteurs: Žagar, Aleš; Robnik-Šikonja, Marko
Publié dans: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Éditeur: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730327

Semantic Reasoning from Model-Agnostic Explanations

Auteurs: Stepišnik-Perdih, Timen; Lavrač, Nada; Škrlj, Blaž
Publié dans: In the Proceedings of the 2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI), 2021, ISBN 978-1-7281-8053-3
Éditeur: IEEE
DOI: 10.1109/sami50585.2021.9378668

Linking Named Entities across Languages using Multilingual Word Embeddings

Auteurs: Elvys Linhares Pontes, Jose G. Moreno, Antoine Doucet
Publié dans: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 2020, Page(s) 329-332, ISBN 9781450375856
Éditeur: ACM
DOI: 10.1145/3383583.3398597

Discovery Team at SemEval-2020 Task 1: Context-sensitive Embeddings not Always Better Than Static for Semantic Change Detection

Auteurs: Martinc, Matej; Montariol, Syrielle; Zosa, Elaine; Pivovarova, Lidia
Publié dans: In Proceedings of the Fourteenth Workshop on Semantic Evaluation (SemEval 2020), 2020, Page(s) 67-73
Éditeur: International Committee for Computational Linguistics
DOI: 10.5281/zenodo.4681022

Creative Language Generation in a Society of Engagement and Reflection

Auteurs: Wright, George A.; Purver, Matthew
Publié dans: In Proceedings of the Eleventh International Conference on Computational Creativity (ICCC2020), 2020
Éditeur: Association for Computational Creativity (ACC)
DOI: 10.5281/zenodo.4680484

A Review of Cross-Domain Text-to-SQL Models

Auteurs: Gan, Yujian; Purver, Matthew; Woodward, John
Publié dans: In the Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: Student Research Workshop, 2020, Page(s) 108-115
Éditeur: Association for Computational Linguistics
DOI: 10.5281/zenodo.4699229

Event Detection with Entity Markers

Auteurs: Boros, Emanuela; Moreno, Jose G.; Doucet, Antoine
Publié dans: In the Proceedings of the 43rd European Conference on Information Retrieval (ECIR 2021), 2021
Éditeur: Springer

Parsing Text in a Workspace for Language Generation

Auteurs: Wright, George A.; Purver, Matthew
Publié dans: In the Proceedings of the 2021 Society for Text & Discourse Annual Conference, 2021
Éditeur: Easychair

Zero-shot cross-lingual content filtering: offensive language and hate speech detection

Auteurs: Pelicon, Andraž; Shekhar, Ravi; Martinc, Matej; Škrlj, Blaž; Pollak, Senja; Purver, Matthew
Publié dans: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Éditeur: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730308

Intérêt des modèles de caractères pour la détection d’événements

Auteurs: Boros, Emanuela; Besançon, Romaric; Ferret, Olivier; Grau, Brigitte
Publié dans: In Proceedings of TALN 2021, 2021
Éditeur: HAL-LIST

Embeddia at SemEval-2019 Task 6: Detecting hate with neural network and transfer learning approaches

Auteurs: Andraž Pelicon, Matej Martinc, and Petra Kralj Novak
Publié dans: Proceedings of The 13th International Workshop on Semantic Evaluation (SemEval), 2019
Éditeur: SemEval

Generating Data using Monte Carlo Dropout

Auteurs: Kristian Miok, Dong Nguyen-Doan, Daniela Zaharie, and Marko Robnik-Šikonja
Publié dans: IEEE 15th International Conference on Intelligent Computer Communication and Processing (ICCP 2019), 2019
Éditeur: IEEE

Detecting Depression with Word-Level Multimodal Fusion

Auteurs: Morteza Rohanian, Julian Hough, Matthew Purver
Publié dans: Interspeech 2019, 2019, Page(s) 1443-1447
Éditeur: ISCA
DOI: 10.21437/interspeech.2019-2283

Clustering Ideological Terms in Historical Newspaper Data with Diachronic Word Embeddings

Auteurs: Jani Marjanen, Lidia Pivovarova, Elaine Zosa, and Jussi Kurunmäki
Publié dans: Proceedings of the 5th International Workshop on Computational History, 2019
Éditeur: Aachen : R. Piskac c/o Redaktion Sun SITE, Informatik V, RWTH Aachen

Karst exploration: Extracting terms and definitions from karst

Auteurs: Senja Pollak, Andraž Repar, Matej Martinc, and Vid Podpečan
Publié dans: Proceedings of the 6th biennial conference on electronic lexicography, eLex 2019, 2019
Éditeur: Presses Universitaires de Louvain

Who is hot and who is not? Profiling celebs on Twitter

Auteurs: Martinc, Matej; Škrlj, Blaž; Pollak, Senja
Publié dans: Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum, Numéro 6, 2019
Éditeur: Aachen : R. Piskac c/o Redaktion Sun SITE, Informatik V, RWTH Aachen

Fake or Not: Distinguishing Between Bots, Males and Females

Auteurs: Martinc, Matej; Škrlj, Blaž; Pollak, Senja
Publié dans: Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum, Numéro 2, 2019
Éditeur: Aachen : R. Piskac c/o Redaktion Sun SITE, Informatik V, RWTH Aachen

Pooled LSTM for Dutch cross-genre gender classification

Auteurs: Matej Martinc, Senja Pollak
Publié dans: Proceedings of the Shared Task on Cross-Genre Gender Detection in Dutch at the Computational Linguistics in the Netherlands (CLIN 2019) conference, 2019
Éditeur: Aachen : R. Piskac c/o Redaktion Sun SITE, Informatik V, RWTH Aachen

Methods for Generating Colourful and Factual Multilingual News Headlines

Auteurs: Alnajjar, Khalid; Leppänen, Leo; Toivonen, Hannu
Publié dans: In Proceedings of the 10th International Conference on Computational Creativity (ICCC 2019), Numéro 1, 2019, Page(s) 258-265, ISBN 978-989-54160-1-1
Éditeur: Association for Computational Creativity (ACC)

TLR at BSNLP2019: A Multilingual Named Entity Recognition System

Auteurs: Jose G. Moreno, Elvys Linhares Pontes, Mickael Coustaty, Antoine Doucet
Publié dans: Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing, 2019, Page(s) 83-88
Éditeur: Association for Computational Linguistics
DOI: 10.18653/v1/w19-3711

Clustering Ideological Terms in Historical Newspaper Data with Diachronic Word Embeddings

Auteurs: Jani Marjanen; Lidia Pivovarova; Elaine Zosa; Jussi Kurunmäki
Publié dans: HistoInformatics 2019: International Workshop on Computational History 2019, 2019
Éditeur: CEUR-WS.org
DOI: 10.5281/zenodo.3689467

A Corpus Study on Questions, Responses and Misunderstanding Signals in Conversations with Alzheimer's Patients

Auteurs: Shamila Nasreen; Matthew Purver; Julian Hough
Publié dans: Proceedings of the 23rd Workshop on the Semantics and Pragmatics of Dialogue, Numéro 13, 2019
Éditeur: SEMDIAL
DOI: 10.5281/zenodo.3689456

Word Clustering for Historical Newspapers Analysis

Auteurs: Pivovarova, Lidia; Marjanen, Jani; Zosa, Elaine
Publié dans: Proceedings of the Workshop on Language Technology for Digital Historical Archives in conjunction with RANLP-2019, 2019, Page(s) 3-10
Éditeur: INCOMA Ltd.
DOI: 10.5281/zenodo.3402940

TeMoCo: A Visualization Tool for Temporal Analysis of Multi-party Dialogues in Clinical Settings

Auteurs: Shane Sheehan, Pierre Albert, Saturnino Luz, Masood Masoodian
Publié dans: 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS), 2019, Page(s) 690-695, ISBN 978-1-7281-2286-1
Éditeur: IEEE
DOI: 10.1109/CBMS.2019.00140

Gender, language, and society: word embeddings as a reflection of social inequalities in linguistic corpora

Auteurs: Supej, Anka; Plahuta, Marko; Purver, Matthew; Mathioudakis, Michael; Pollak, Senja
Publié dans: In Znanost in družbe prihodnosti, Slovensko sociološko srečanje [Annual meeting of the Slovenian Sociological Association: Science and future societies], 2019
Éditeur: Slovensko sociološko društvo
DOI: 10.5281/zenodo.3894466

No Time Like the Present: Methods for Generating Colourful and Factual Multilingual News Headlines

Auteurs: Alnajjar, Khalid; Leppänen, Leo; Toivonen, Hannu
Publié dans: Proceedings of the 10th International Conference on Computational Creativity (ICCC2019), 2019
Éditeur: Association for Computational Creativity

Multiple Imputation for Biomedical Data using Monte Carlo Dropout Autoencoders

Auteurs: Kristian Miok, Dong Nguyen-Doan, Marko Robnik-Šikonja, Daniela Zaharie
Publié dans: 2019 E-Health and Bioengineering Conference (EHB), 2019, Page(s) 1-4, ISBN 978-1-7281-2603-6
Éditeur: IEEE
DOI: 10.1109/EHB47216.2019.8969940

High Quality ELMo Embeddings for Seven Less-Resourced Languages

Auteurs: Ulčar, Matej; Robnik-Šikonja, Marko
Publié dans: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 2020, Page(s) 4731–4738
Éditeur: The European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3894535

Leveraging Contextual Embeddings for Detecting Diachronic Semantic Shift

Auteurs: Martinc, Matej; Kralj Novak, Petra; Pollak, Senja
Publié dans: Proceedings of the 12th Language Resources and Evaluation Conference (LREC2020), 2020, Page(s) 4811‑4819
Éditeur: The European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3894557

Multilingual Culture-Independent Word Analogy Datasets

Auteurs: Ulčar, Matej; Vaik, Kristiina; Lindström, Jessica; Dailidėnaitė, Milda; Robnik-Šikonja, Marko
Publié dans: Proceedings of the 12th Language Resources and Evaluation Conference (LREC2020), Numéro 1, 2020, Page(s) 4074‑4080
Éditeur: The European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3894553

Dataset for Temporal Analysis of English-French Cognates

Auteurs: Frossard, Esteban; Coustaty, Mickael; Doucet, Antoine; Jatowt, Adam; Hengchen, Simon
Publié dans: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 2020, Page(s) 855-859
Éditeur: The European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3693651

A Dataset for Multi-lingual Epidemiological Event Extraction

Auteurs: Mutuvi, Stephen; Doucet, Antoine; Lejeune, Gael; Odeo, Moses
Publié dans: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 2020, Page(s) 4139–4144
Éditeur: The European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3709626

CoSimLex: A Resource for Evaluating Graded Word Similarity in Context

Auteurs: Carlos Santos Armendariz; Matthew Purver; Matej Ulčar; Senja Pollak; Nikola Ljubešič; Marko Robnik-Šikonja; Mark Granroth-Wilding; Kristiina Vaik
Publié dans: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 2020, Page(s) 5878–5886
Éditeur: The European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3894565

Text Visualization for the Support of Lexicography-Based Scholarly Work

Auteurs: Sheehan, Shane; Luz, Saturnino
Publié dans: Proceedings of the 6th biennial conference on electronic lexicography, eLex 2019, 2019, Page(s) 694-725
Éditeur: Lexical Computing CZ s.r.o., Brno, Czech Republic
DOI: 10.5281/zenodo.3894619

Mining semantic relations from comparable corpora through intersections of word embeddings

Auteurs: Vintar, Špela; Grčić Simeunović, Larisa; Martinc, Matej; Pollak, Senja; Stepišnik, Uroš
Publié dans: Proceedings of the LREC 2020 13th Workshop on Building and Using Comparable Corpora, 2020, Page(s) 29-34
Éditeur: European Language Resources Association
DOI: 10.5281/zenodo.3894635

Interaction Patterns in Conversations with Alzheimer's Patients

Auteurs: Nasreen, Shamila; Purver, Matthew; Hough, Julian
Publié dans: Poster presentation at the 7th International Conference on Statistical Language and Speech Processing. Ljubljana, Slovenia, 2019
Éditeur: Springer
DOI: 10.5281/zenodo.3894637

Multilingual Dynamic Topic Model

Auteurs: Elaine Zosa, Mark Granroth-Wilding
Publié dans: Proceedings - Natural Language Processing in a Deep Learning World, 2019, Page(s) 1388-1396, ISBN 978-954-452-056-4
Éditeur: Incoma Ltd., Shoumen, Bulgaria
DOI: 10.26615/978-954-452-056-4_159

The NetViz terminology visualization tool and the use cases in karstology domain modeling

Auteurs: Pollak, Senja; Podpečan, Vid; Miljkovic, Dragana; Stepišnik, Uroš; Vintar, Špela
Publié dans: Proceedings of the 6th International Workshop on Computational Terminology (COMPUTERM 2020), 2020, Page(s) 55-61
Éditeur: European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3894686

Communities of related terms in Karst terminology co-occurrence network

Auteurs: Miljkovic, Dragana; Kralj, Jan; Stepišnik, Uroš; Pollak, Senja
Publié dans: Proceedings of the 6th biennial conference on electronic lexicography, eLex 2019, 2019, Page(s) 357-373
Éditeur: Lexical Computing CZ s.r.o., Brno, Czech Republic
DOI: 10.5281/zenodo.3894684

A Comparison of Unsupervised Methods for Ad hoc Cross-Lingual Document Retrieval

Auteurs: Zosa, Elaine; Granroth-Wilding, Mark; Pivovarova, Lidia
Publié dans: Proceedings of the Cross-Language Search and Summarization of Text and Speech Workshop, 2020, Page(s) 32-37
Éditeur: European Language Resources Association (ELRA)
DOI: 10.5281/zenodo.3898384

Capturing Evolution in Word Usage: Just Add More Clusters?

Auteurs: Matej Martinc, Syrielle Montariol, Elaine Zosa, Lidia Pivovarova
Publié dans: Companion Proceedings of the Web Conference 2020, 2020, Page(s) 343-349, ISBN 9781450370240
Éditeur: ACM
DOI: 10.1145/3366424.3382186

Evaluating the Robustness of Embedding-based Topic Models to OCR Noise

Auteurs: Zosa, Elaine; Mutuvi, Stephen; Granroth-Wilding, Mark; Doucet, Antoine
Publié dans: In the Proceedings of the 23rd International Conference on Asia-Pacific Digital Libraries (ICADL 2021), 2021
Éditeur: Springer
DOI: 10.1007/978-3-030-91669-5_30

Evaluating Natural Language Descriptions Generated in a Workspace-Based Architecture

Auteurs: Wright, George A.; Purver, Matthew
Publié dans: In the Proceedings of the 12th International Conference on Computational Creativity, ICCC2021, 2021
Éditeur: Association for Computational Creativity

Multi-label classification of COVID-19-related articles with an autoML approach

Auteurs: Tavchioski, Ilija; Koloski, Boshko; Škrlj, Blaž; Pollak, Senja
Publié dans: In Proceedings of the BioCreative VII Challenge Evaluation Workshop, 2021, Page(s) 295-299, ISBN 978-0-578-32368-8
Éditeur: Biocreative

L3i_LBPAM at the FinSim-2 task: Learning Financial Semantic Similarities with Siamese Transformers

Auteurs: Nhu Khoa Nguyen; Emanuela Boros; Gaël Lejeune; Antoine Doucet; Thierry Delahaut
Publié dans: Numéro 30, 2021
Éditeur: IW3C2
DOI: 10.5281/zenodo.4734321

CTLR@WiC-TSV: Target Sense Verification using Marked Inputs and Pre-trained Models

Auteurs: Moreno, Jose G.; Linhares Pontes, Elvys; Dias, Gaël
Publié dans: In 6th Workshop on Semantic Deep Learning (SemDeep-6) associated to 29th International Joint Conference on Artificial Intelligence and 17th Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI 2020), Numéro 2, 2021
Éditeur: International Joint Conferences on Artificial Intelligence
DOI: 10.5281/zenodo.4680720

Exploratory analysis of news sentiment using subgroup discovery

Auteurs: Valmarska, Anita; Cabrera-Diego, Luis Adrián; Linhares Pontes, Elvys; Pollak, Senja
Publié dans: In Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing in conjunction to EACL2021, 2021
Éditeur: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730472

COVID-19 Therapy Target Discovery with Context-Aware Literature Mining

Auteurs: Martinc, Matej; Škrlj, Blaž; Pirkmajer, Sergej; Lavrač, Nada; Cestnik, Bojan; Marzidovšek, Martin; Pollak, Senja
Publié dans: In Proceedings of the 23rd International Conference on Discovery Science (DS 2020), 2020, Page(s) 109-123
Éditeur: Springer International Publishing
DOI: 10.5281/zenodo.4306020

A Baseline Document Planning Method for Automated Journalism

Auteurs: Leppänen, Leo; Toivonen, Hannu
Publié dans: In the Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), 2021
Éditeur: Association for Computational Linguistics

Multilingual Epidemiological Text Classification: A Comparative Study

Auteurs: Mutuvi, Stephen; Boros, Emanuela; Doucet, Antoine; Lejeune, Gaël; Jatowt, Adam; Odeo, Moses
Publié dans: Proceedings of the 28th International Conference on Computational Linguistics, Numéro 44, 2020
Éditeur: International Committee on Computational Linguistics
DOI: 10.5281/zenodo.4476039

Multi-Modal Fusion with Gating Using Audio, Lexical and Disfluency Features for Alzheimer’s Dementia Recognition from Spontaneous Speech

Auteurs: Morteza Rohanian, Julian Hough, Matthew Purver
Publié dans: Interspeech 2020, 2020, Page(s) 2187-2191
Éditeur: ISCA
DOI: 10.21437/interspeech.2020-2721

Temporal Mental Health Dynamics on Social Media

Auteurs: Tom Tabak, Matthew Purver
Publié dans: Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020, 2020
Éditeur: Association for Computational Linguistics
DOI: 10.18653/v1/2020.nlpcovid19-2.7

Extending Neural Keyword Extraction with TF-IDF tagset matching

Auteurs: Koloski, Boshko; Pollak, Senja; Škrlj, Blaž; Martinc, Matej
Publié dans: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Éditeur: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730354

The Importance of Character-Level Information in an Event Detection Model

Auteurs: Boros, Emanuela; Besançon, Romaric; Ferret, Olivier; Grau, Brigitte
Publié dans: In Proceedings of NLDB 2021, 2021
Éditeur: Springer

Benchmarks for Unsupervised Discourse Change Detection

Auteurs: Duong, Quan; Pivovarova, Lidia; Zosa, Elaine
Publié dans: In the Proceedings of the Histoinformatics workshop 2021, 2021
Éditeur: CEUR

Three-part diachronic semantic change dataset for Russian

Auteurs: Andrey Kutuzov, Lidia Pivovarova
Publié dans: Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change 2021, 2021, Page(s) 7-13
Éditeur: Association for Computational Linguistics
DOI: 10.18653/v1/2021.lchange-1.2

SemEval2020 Task 3: Graded Word Similarity in Context

Auteurs: Armendariz, Carlos Santos; Purver, Matthew; Pollak, Senja; Ljubešić, Nikola; Ulčar, Matej; Robnik-Šikonja, Marko; Vulić, Ivan; Mohammed Taher Pilehvar
Publié dans: In Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval 2020), 2020, Page(s) 36-49
Éditeur: International Committee for Computational Linguistics
DOI: 10.5281/zenodo.4309679

Hybrid Tagger – An Industry-driven Solution for Extreme Multi-label Text Classification

Auteurs: Vaik, Kristiina; Asula, Marit; Sirel, Raul
Publié dans: In Proceedings of the LREC2020 Industry Track, 2020, Page(s) 26-30
Éditeur: The European Language Resources Association (ELRA)

A Baseline Document Planning Method for Automated Journalism

Auteurs: Leppänen, Leo; Toivonen, Hannu
Publié dans: In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021), 2021
Éditeur: Linköping University Electronic Press, Sweden

TeMoCo-Doc - A visualization for supporting temporal and contextual analysis of dialogues and associated documents

Auteurs: Shane Sheehan, Saturnino Luz, Pierre Albert, Masood Masoodian
Publié dans: Proceedings of the International Conference on Advanced Visual Interfaces, 2020, Page(s) 1-3, ISBN 9781450375351
Éditeur: ACM
DOI: 10.1145/3399715.3399956

Named Entity Recognition Architecture Combining Contextual and Global Features

Auteurs: Tran Thi Hong, Hanh; Doucet, Antoine; Sidere, Nicolas; Moreno, Jose G.; Pollak, Senja
Publié dans: In the Proceedings of the 23rd International Conference on Asia-Pacific Digital Libraries (ICADL 2021), 2021
Éditeur: Springer
DOI: 10.1007/978-3-030-91669-5_21

Aligning Estonian and Russian news industry keywords with the help of subtitle translations and an environmental thesaurus

Auteurs: Repar, Andraž; Shumakov, Andrej
Publié dans: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Éditeur: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730392

Zaznavanje sentimenta v novicah z globokimi nevronskimi mrežami

Auteurs: Arhar Holdt, Špela; Pollak, Senja; Robnik-Šikonja, Marko; Krek, Simon
Publié dans: In Proceedings of the Conference on Language Technologies and Digital Humanities, JTDH2020, 2020, Page(s) 10-15
Éditeur: Institute of Contemporary History
DOI: 10.5281/zenodo.4059729

Étude comparative de méthodes de classification multilingue appliquées à l'épidémiologie

Auteurs: Stephen Mutuvi; Emanuela Boros; Antoine Doucet; Gaël Lejeune; Adam Jatowt; Moses Odeo
Publié dans: Numéro 29, 2021
Éditeur: l’Association Francophone de Recherche d’Information et Applications ARIA
DOI: 10.5281/zenodo.4734472

Word-embedding based bilingual terminology alignment

Auteurs: Repar, Andraž; Martinc, Matej; Ulčar, Matej; Pollak, Senja
Publié dans: In Proceedings of eLex 2021 (eLex2021), 2021
Éditeur: Brno: Lexical Computing CZ, s.r.o.

Investigating the Semantic Wave in Tutorial Dialogues: An Annotation Scheme and Corpus Study on Analogy Components

Auteurs: Del-Bosque-Trevino, Jorge; Hough, Julian; Purver, Matthew
Publié dans: In Proceedings of the 24th SemDial Workshop on the Semantics and Pragmatics of Dialogue (SemDial), 2020
Éditeur: SEMDIAL

Interesting cross-border news discovery using cross-lingual article linking and document similarity

Authors: Koloski, Boshko; Zosa, Elaine; Stepišnik-Perdih, Timen; Škrlj, Blaž; Paju, Tarmo; Pollak, Senja
Published in: In the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL2021), 2021
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730369

An evaluation of BERT and Doc2Vec model on the IPTC Subject Codes prediction dataset

Authors: Pranjić, Marko; Robnik-Šikonja, Marko; Pollak, Senja
Published in: In Proceedings of the 24th International Multiconference – IS2021 (SiKDD), 2021
Publisher: Jožef Stefan Institute

Evaluation of related news recommendations using document similarity methods

Authors: Pranjić, Marko; Podpečan, Vid; Robnik-Šikonja, Marko; Pollak, Senja
Published in: In Proceedings of the Conference on Language Technologies and Digital Humanities, JTDH2020, 2020, Page(s) 81-86
Publisher: Institute of Contemporary History
DOI: 10.5281/zenodo.4059710

Dimenzija spola v slovenskih vektorskih vložitvah besed: primerjava modelov prek analogij poklicev

Authors: Supej, Anka; Ulčar, Matej; Robnik-Šikonja, Marko; Pollak, Senja
Published in: In Proceedings of the Conference on Language Technologies and Digital Humanities, JTDH2020, 2020, Page(s) 93-100
Publisher: Institute of Contemporary History
DOI: 10.5281/zenodo.4059700

Mitigating Gender Bias in Word Embeddings using Explicit Gender Free Corpus

Authors: Hargrave, David
Published in: Masters thesis, School of Electronic Engineering and Computer Science, Queen Mary University of London, 2021
Publisher: Queen Mary University of London

Silicon Valley och makten över medierna [Silicon Valley and the power over media]

Authors: Carl-Gustav Linden
Published in: Issue 1, 2020
Publisher: Nordicom
DOI: 10.48335/9789188855350

Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation

Authors: Toivonen, Hannu; Boggia, Michele
Published in: 2021, ISBN 978-1-954085-13-8
Publisher: Association for Computational Linguistics
DOI: 10.5281/zenodo.4730375
