Found in Translation – Natural Language Understanding with Cross-Lingual Grounding

Publications

TaPaCo: A Corpus of Sentential Paraphrases for 73 Languages

Author(s): Yves Scherrer
Published in: Proceedings of The 12th Language Resources and Evaluation Conference, 2020, Page(s) 6868-6873, ISBN 979-10-95546-34-4
Publisher: European Language Resources Association (ELRA)

HeLju@VarDial 2020: Social Media Variety Geolocation with BERT Models

Author(s): Yves Scherrer, Nikola Ljubešić
Published in: Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, 2020, Page(s) 202-211, ISBN 978-1-952148-47-7
Publisher: International Committee on Computational Linguistics (ICCL)

Paraphrase Generation and Evaluation on Colloquial-Style Sentences

Author(s): Eetu Ilari Sjöblom, Mathias Creutz, Yves Scherrer
Published in: Proceedings of the 12th Language Resources and Evaluation Conference, 2020, Page(s) 1814-1822, ISBN 979-10-95546-34-4
Publisher: European Language Resources Association (ELRA)

OpusFilter: A Configurable Parallel Corpus Filtering Toolbox

Author(s): Mikko Aulamo, Sami Virpioja, Jörg Tiedemann
Published in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2020, Page(s) 150-156
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/2020.acl-demos.20

MULTISEM at SemEval-2020 Task 3: Fine-tuning BERT for Lexical Meaning

Author(s): Aina Garí Soler, Marianna Apidianaki
Published in: Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020, Page(s) 158-165
Publisher: International Committee for Computational Linguistics

Effects of Language Relatedness for Cross-lingual Transfer Learning in Character-Based Language Models

Author(s): Mittul Singh, Peter Smit, Sami Virpioja, Mikko Kurimo
Published in: Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), 2020, Page(s) 41-45, ISBN 979-10-95546-35-1
Publisher: European Language Resources Association (ELRA)

Morfessor EM+Prune: Improved Subword Segmentation with Expectation Maximization and Pruning

Author(s): Stig-Arne Grönroos, Sami Virpioja, Mikko Kurimo
Published in: Proceedings of The 12th Language Resources and Evaluation Conference, 2020, Page(s) 3944-3953, ISBN 979-10-95546-34-4
Publisher: European Language Resources Association (ELRA)

FinChat: Corpus and Evaluation Setup for Finnish Chat Conversations on Everyday Topics

Author(s): Katri Leino, Juho Leinonen, Mittul Singh, Sami Virpioja, Mikko Kurimo
Published in: Interspeech 2020, 2020, Page(s) 429-433
Publisher: ISCA
DOI: 10.21437/interspeech.2020-2511

OPUS-MT -- Building open translation services for the World

Author(s): Jörg Tiedemann, Santhosh Thottingal
Published in: Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, 2020, Page(s) 479-480, ISBN 978-989-33-0589-8
Publisher: European Association for Machine Translation

Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation

Author(s): Alessandro Raganato, Yves Scherrer, Jörg Tiedemann
Published in: Findings of the Association for Computational Linguistics : EMNLP 2020, 2020, Page(s) 556-568, ISBN 978-1-952148-90-3
Publisher: The Association for Computational Linguistics

The University of Helsinki and Aalto University submissions to the WMT 2020 news and low-resource translation tasks

Author(s): Yves Scherrer, Stig-Arne Grönroos, Sami Virpioja
Published in: Proceedings of the Fifth Conference on Machine Translation, 2020, Page(s) 1129-1138, ISBN 978-1-948087-81-0
Publisher: The Association for Computational Linguistics

Controlling the Imprint of Passivization and Negation in Contextualized Representations

Author(s): Hande Celikkanat, Sami Virpioja, Jörg Tiedemann, Marianna Apidianaki
Published in: Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, 2020, Page(s) 136-148, ISBN 978-1-952148-86-6
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/2020.blackboxnlp-1.13

BERT Knows Punta Cana is not just beautiful, it’s gorgeous: Ranking Scalar Adjectives with Contextualised Representations

Author(s): Aina Garí Soler, Marianna Apidianaki
Published in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020, Page(s) 7371-7385
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/2020.emnlp-main.598

The MUCOW word sense disambiguation test suite at WMT 2020

Author(s): Yves Scherrer, Alessandro Raganato, Jörg Tiedemann
Published in: Proceedings of the Fifth Conference on Machine Translation, 2020, Page(s) 365-370, ISBN 978-1-948087-81-0
Publisher: The Association for Computational Linguistics

Wikipedia Entities as Rendezvous across Languages: Grounding Multilingual Language Models by Predicting Wikipedia Hyperlinks

Author(s): Iacer Calixto, Alessandro Raganato, Tommaso Pasini
Published in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021, Page(s) 3651-3661
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/2021.naacl-main.286

The University of Helsinki Submission to the IWSLT2020 Offline SpeechTranslation Task

Author(s): Raúl Vázquez, Mikko Aulamo, Umut Sulubacak, Jörg Tiedemann
Published in: Proceedings of the 17th International Conference on Spoken Language Translation, 2020, Page(s) 95-102, ISBN 978-1-952148-07-1
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/2020.iwslt-1.10

Emerging Language Spaces Learned From Massively Multilingual Corpora

Author(s): Jörg Tiedemann
Published in: Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference (DHN 2018), 2018, Page(s) 188-197
Publisher: CEUR Workshop Proceedings

An Evaluation of Language-Agnostic Inner-Attention-Based Representations in Machine Translation

Author(s): Alessandro Raganato, Raúl Vázquez, Mathias Creutz, Jörg Tiedemann
Published in: Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), 2019, Page(s) 27-32, ISBN 978-1-950737-35-2
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/w19-4304

The University of Helsinki Submissions to the WMT19 Similar Language Translation Task

Author(s): Yves Scherrer, Raúl Vázquez, Sami Virpioja
Published in: Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2), 2019, Page(s) 236-244, ISBN 978-1-950737-27-7
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/w19-5432

The University of Helsinki Submissions to the WMT19 News Translation Task

Author(s): Aarne Talman, Umut Sulubacak, Raúl Vázquez, Yves Scherrer, Sami Virpioja, Alessandro Raganato, Arvi Hurskainen, Jörg Tiedemann
Published in: Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1), 2019, Page(s) 412-423, ISBN 978-1-950737-27-7
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/w19-5347

The University of Helsinki Submission to the WMT19 Parallel Corpus Filtering Task

Author(s): Raúl Vázquez, Umut Sulubacak, Jörg Tiedemann
Published in: Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2), 2019, Page(s) 294-300, ISBN 978-1-950737-27-7
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/w19-5441

Multilingual NMT with a Language-Independent Attention Bridge

Author(s): Raúl Vázquez, Alessandro Raganato, Jörg Tiedemann, Mathias Creutz
Published in: Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), 2019, Page(s) 33-39, ISBN 978-1-950737-35-2
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/w19-4305

SUM-QE: a BERT-based Summary Quality Estimation Model

Author(s): Stratos Xenouleas, Prodromos Malakasiotis, Marianna Apidianaki, Ion Androutsopoulos
Published in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, Page(s) 6004-6010
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/d19-1618

An Analysis of Encoder Representations in Transformer-Based Machine Translation

Author(s): Alessandro Raganato, Jörg Tiedemann
Published in: Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 2018, Page(s) 287-297, ISBN 978-1-948087-71-1
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/w18-5431

The University of Helsinki submissions to the WMT18 news task

Author(s): Alessandro Raganato, Yves Scherrer, Tommi Nieminen, Arvi Hurskainen, Jörg Tiedemann
Published in: Proceedings of the Third Conference on Machine Translation: Shared Task Papers, 2018, Page(s) 488-495, ISBN 978-1-948087-81-0
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/w18-6425

The MuCoW test suite at WMT 2019: Automatically harvested multilingual contrastive word sense disambiguation test sets for machine translation

Author(s): Alessandro Raganato, Yves Scherrer, Jörg Tiedemann
Published in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL): Student Research Workshop, 2019, Page(s) 470-480, ISBN 978-1-950737-27-7
Publisher: The Association for Computational Linguistics

Analysing concatenation approaches to document-level NMT in two different domains

Author(s): Yves Scherrer, Jörg Tiedemann, Sharid Loáiciga
Published in: Proceedings of the Fourth Workshop on Discourse in Machine Translation (DiscoMT 2019), 2019, Page(s) 51-61, ISBN 978-1-950737-74-1
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/d19-6506

A Report on the Third

Author(s): Marcos Zampieri, Shervin Malmasi, Yves Scherrer, Tanja Samardžić, Francis Tyers, Miikka Silfverberg, Natalia Klyueva, Tung-Le Pan, Chu-Ren Huang, Radu Tudor Ionescu, Andrei M. Butnaru, Tommi Jauhiainen
Published in: Proceedings of the Sixth Workshop on, 2019, Page(s) 1-16, ISBN 978-1-950737-11-6
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/w19-1401

Measuring Semantic Abstraction of Multilingual

Author(s): Jörg Tiedemann, Yves Scherrer
Published in: Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for, 2019, Page(s) 35-42, ISBN 978-1-950737-05-5
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/w19-2005

The WMT’18 Morpheval test suites for English-Czech, English-German, English-Finnish and Turkish-English

Author(s): Franck Burlot, Yves Scherrer, Vinit Ravishankar, Ondřej Bojar, Stig-Arne Grönroos, Maarit Koponen, Tommi Nieminen, François Yvon
Published in: Proceedings of the Third Conference on Machine Translation: Shared Task Papers, 2018, Page(s) 546-560, ISBN 978-1-948087-81-0
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/w18-6433

Predicting Prosodic Prominence from Text with Pre-trained Contextualized Word Representations

Author(s): Aarne Talman, Antti Suni, Hande Celikkanat, Sofoklis Kakouros, Jörg Tiedemann, Martti Vainio
Published in: 22nd Nordic Conference on Computational Linguistics (NoDaLiDa) : Proceedings of the Conference, 2019, Page(s) 281–290, ISBN 978-91-7929-995-8
Publisher: Linköping University Electronic Press

A Computational Model for the Linguistic Notion of Morphological Paradigm

Author(s): Miikka Silfverberg, Ling Liu, Mans Hulden
Published in: Proceedings of the 27th International Conference on Computational Linguistics, 2018, Page(s) 1615-1626, ISBN 978-1-948087-50-6
Publisher: The Association for Computational Linguistics

Initial Experiments in Data-Driven Morphological Analysis for Finnish

Author(s): Miikka Silfverberg, Mans Hulden
Published in: Proceedings of the Fourth International Workshop on Computational Linguistics of Uralic Languages, 2018, Page(s) 100-107
Publisher: The Association for Computational Linguistics

Sub-label dependencies for neural morphological tagging--the joint submission of University of Colorado and University of Helsinki for VarDial 2018

Author(s): Miikka Silfverberg, Senka Drobac
Published in: Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), 2018, Page(s) 37-45, ISBN 978-1-948087-55-1
Publisher: The Association for Computational Linguistics

The CoNLL–SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection

Author(s): Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Geraldine Walther, Ekaterina Vylomova, Arya D. McCarthy, Katharina Kann, Sebastian Mielke, Garrett Nicolai, Miikka Silfverberg, David Yarowsky, Jason Eisner, Mans Hulden
Published in: Proceedings of the CoNLL SIGMORPHON 2018 Shared Task : Universal Morphological Reinflection, 2018, Page(s) 1-27, ISBN 978-1-948087-83-4
Publisher: The Association for Computational Linguistics

OpenSubtitles2018: Statistical Rescoring of Sentence Alignments in Large, Noisy Parallel Corpora

Author(s): Pierre Lison, Jörg Tiedemann, Milen Kouylekov
Published in: Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018), 2018, Page(s) 1742-1748, ISBN 979-10-95546-00-9
Publisher: European Language Resources Association (ELRA)

The OPUS Resource Repository: An Open Package for Creating Parallel Corpora and Machine Translation Services

Author(s): Mikko Aulamo, Jörg Tiedemann
Published in: 22nd Nordic Conference on Computational Linguistics (NoDaLiDa) : Proceedings of the Conference, 2019, ISBN 978-91-7929-995-8
Publisher: Linköping University Electronic Press

Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign

Author(s): Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Ahmed Ali, Suwon Shon, James Glass, Yves Scherrer, Tanja Samardžić, Nikola Ljubešić, Jörg Tiedemann, Chris van der Lee, Stefan Grondelaers, Nelleke Oostdijk, Dirk Speelman, Antal van den Bosch, Ritesh Kumar, Bornini Lahiri, Mayank Jain
Published in: Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, 2018, Page(s) 1-17, ISBN 978-1-948087-55-1
Publisher: The Association for Computational Linguistics

The University of Helsinki submissions to the IWSLT 2018 low-resource translation task

Author(s): Yves Scherrer
Published in: Proceedings of the 15th International Workshop on Spoken Language Translation, 2018, Page(s) 83-88
Publisher: International Workshop on Spoken Language Translation - Brugge, Belgium

Ensembles of Neural Morphological Inflection Models

Author(s): Ilmari Kylliäinen, Miikka Silfverberg
Published in: Proceedings of the 22nd Nordic Conference on Computational Linguistics, 2019, Page(s) 304–309
Publisher: Linköping University Electronic Press

Sound Analogies with Phoneme Embeddings

Author(s): Miikka P. Silfverberg, Lingshuang Mao, Mans Hulden
Published in: Proceedings of the Society for Computation in Linguistics (SCiL) 2018, 2018, Page(s) 136–144
Publisher: Society for Computation in Linguistics (SCiL)
DOI: 10.7275/r5nz85vd

Data-Driven Morphological Analysis for Uralic Languages

Author(s): Miikka Silfverberg, Francis Tyers
Published in: Proceedings of the Fifth International Workshop on Computational Linguistics for Uralic Languages, 2019, Page(s) 1-14
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/w19-0301

Weird Inflects but OK: Making Sense of Morphological Generation Errors

Author(s): Kyle Gorman, Arya D. McCarthy, Ryan Cotterell, Ekaterina Vylomova, Miikka Silfverberg, Magdalena Markowska
Published in: Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), 2019, Page(s) 140-151
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/k19-1014

The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection

Author(s): Arya D. McCarthy, Ekaterina Vylomova, Shijie Wu, Chaitanya Malaviya, Lawrence Wolf-Sonkin, Garrett Nicolai, Christo Kirov, Miikka Silfverberg, Sebastian J. Mielke, Jeffrey Heinz, Ryan Cotterell, Mans Hulden
Published in: Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology, 2019, Page(s) 229-244
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/w19-4226

The Helsinki submission to the AmericasNLP shared task

Author(s): Raúl Vázquez, Yves Scherrer, Sami Virpioja, Jörg Tiedemann
Published in: Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, 2021, Page(s) 255-264
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/2021.americasnlp-1.29

Recent Trends in Word Sense Disambiguation: A Survey

Author(s): Michele Bevilacqua, Tommaso Pasini, Alessandro Raganato, Roberto Navigli
Published in: Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021, Page(s) 4330-4338, ISBN 978-0-9992411-9-6
Publisher: International Joint Conferences on Artificial Intelligence Organization
DOI: 10.24963/ijcai.2021/593

On the differences between BERT and MT encoder spaces and how to address them in translation tasks

Author(s): Raúl Vázquez, Hande Celikkanat, Mathias Creutz, Jörg Tiedemann
Published in: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop, 2021, Page(s) 337-347
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/2021.acl-srw.35

XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization

Author(s): Alessandro Raganato, Tommaso Pasini, Jose Camacho-Collados, Mohammad Taher Pilehvar
Published in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020, Page(s) 7193-7206
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/2020.emnlp-main.584

OpusTools and Parallel Corpus Diagnostics

Author(s): Mikko Aulamo, Umut Sulubacak, Sami Virpioja, Jörg Tiedemann
Published in: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 2020, Page(s) 3782-3789, ISBN 979-10-95546-34-4
Publisher: European Language Resources Association (ELRA)

The Tatoeba Translation Challenge - Realistic Data Sets for Low Resource and Multilingual MT

Author(s): Jörg Tiedemann
Published in: Proceedings of the Fifth Conference on Machine Translation, 2020, Page(s) 1174-1182, ISBN 978-1-948087-81-0
Publisher: The Association for Computational Linguistics

An Evaluation Benchmark for Testing the Word Sense Disambiguation Capabilities of Machine Translation Systems

Author(s): Alessandro Raganato, Yves Scherrer, Jörg Tiedemann
Published in: Proceedings of The 12th Language Resources and Evaluation Conference, 2020, Page(s) 3668-3675, ISBN 979-10-95546-34-4
Publisher: European Language Resources Association (ELRA)

A Systematic Study of Inner-Attention-Based Sentence Representations in Multilingual Neural Machine Translation

Author(s): Raúl Vázquez, Alessandro Raganato, Mathias Creutz, Jörg Tiedemann
Published in: Computational Linguistics, 46/2, 2020, Page(s) 387-424, ISSN 0891-2017
Publisher: MIT Press
DOI: 10.1162/coli_a_00377

Are Multilingual Neural Machine Translation Models Better at Capturing Linguistic Features?

Author(s): David Mareček, Hande Celikkanat, Miikka Silfverberg, Vinit Ravishankar, Jörg Tiedemann
Published in: Prague Bulletin of Mathematical Linguistics, 115/1, 2020, Page(s) 143-162, ISSN 1804-0462
Publisher: Institute of Formal and Applied Linguistics, Charles University
DOI: 10.14712/00326585.009

Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation

Author(s): Stig-Arne Grönroos, Sami Virpioja, Mikko Kurimo
Published in: Machine Translation, 34, 2021, Page(s) 251-286, ISSN 0922-6567
Publisher: Kluwer Academic Publishers
DOI: 10.1007/s10590-020-09253-x

What Do Language Representations Really Represent?

Author(s): Johannes Bjerva, Robert Östling, Maria Han Veiga, Jörg Tiedemann, Isabelle Augenstein
Published in: Computational Linguistics, 45/2, 2019, Page(s) 381-389, ISSN 0891-2017
Publisher: MIT Press
DOI: 10.1162/coli_a_00351

Neural morphosyntactic tagging for Rusyn

Author(s): Yves Scherrer, Achim Rabus
Published in: Natural Language Engineering, 25/5, 2019, Page(s) 633-650, ISSN 1351-3249
Publisher: Cambridge University Press
DOI: 10.1017/s1351324919000287

Digitising Swiss German: how to process and study a polycentric spoken language

Author(s): Yves Scherrer, Tanja Samardžić, Elvira Glaser
Published in: Language Resources and Evaluation, 53/4, 2019, Page(s) 735-769, ISSN 1574-020X
Publisher: Springer Verlag
DOI: 10.1007/s10579-019-09457-5

Sentence Embeddings in NLI with Iterative Refinement Encoders

Author(s): Aarne Johannes Talman, Anssi Yli-Jyrä, Jörg Tiedemann
Published in: Natural Language Engineering, 2019, Page(s) 467-482, ISSN 1351-3249
Publisher: Cambridge University Press

A Finnish news corpus for named entity recognition

Author(s): Teemu Ruokolainen, Pekka Kauppinen, Miikka Silfverberg, Krister Lindén
Published in: Language Resources and Evaluation, 54/1, 2020, Page(s) 247-272, ISSN 1574-020X
Publisher: Springer Verlag
DOI: 10.1007/s10579-019-09471-7

Advances in subword-based HMM-DNN speech recognition across languages

Author(s): Peter Smit, Sami Virpioja, Mikko Kurimo
Published in: Computer Speech & Language, 66, 2021, Page(s) 101158, ISSN 0885-2308
Publisher: Academic Press
DOI: 10.1016/j.csl.2020.101158