Real time network, text, and speaker analytics for combating organized crime

Deliverables

Preliminary report on network analysis

Initial report and system on network analysis (NA).

Initial speech/text/video technologies

A set of software and an associated report for rapid deployment of speech, NLP and video technologies for early integration and system testing.

Description of the integration toolkit, guidelines, plan

Preliminary ROXANNE integration platform developed for the 1st field test. Technical description of the tools to be developed to ease the integration of the different components, together with a set of guidelines.

Technical specifications and detailed architecture

Technical specifications and detailed architecture report: Technical specifications and the design architecture of the platform and integration framework given the requirements.

The project's communications plan update M18

Updated version at M18 of the project's communications plan.

The project's dissemination and exploitation plan

Interim version at M5, with updated versions at M18 and M36. The M36 version will be prepared for those partners exploiting the project results and for stakeholders using the results after EU funding ends.

Overview and analysis of lawfully intercepted data

Overview and analysis of lawfully intercepted and publicly available data: Report providing initial overview of investigation and public data available for ROXANNE (+legal framework).

Risk Assessment

Risk assessment of the whole project.

Initial report on compliance with ethical principles

Initial report (checklist brochure) on compliance with ethical/societal/fundamental/privacy principles: Summary of initial results of T3.1-T3.4. Security advisory board introduced, update M36 in D3.4.

The project's communications plan

Interim version of this plan at M4, with updated versions at M18 and M36.

Development of a decision-making-mechanism

Development of a decision-making-mechanism for ensuring compliance: Report on how to set up and operate the mechanism, with a list of questions that can be used.

Training manual Volume I

Development of the online dynamic manual for the integrated solution, accessible to all end-users (updated at M19 and M29).

Creation of the project's identity and website

Creation of the project’s identity, website and online accounts to communicate, inform, create dialogue and promote use of the project results.

Publications

German News Article Classification : A Multichannel CNN Approach

Author(s): Parida, Shantipriya; Motlicek, Petr; Dash, Satya Ranjan
Published in: Proceeding 2nd International Conference on Emerging Trends and Advances in Electrical Engineering and Renewable Energy (ETAEERE-2020), 2020
Publisher: Springer

Development of ABC Systems for the 2021 Edition of NIST Speaker Recognition Evaluation

Author(s): Jahangir Alam, Radek Beneš, Marián Beszédeš, Lukáš Burget, Mohamed Dahmane, Abderrahim Fathan, Hamed Ghodrati, Ondřej Glembek, Woo Hyun Kang, Pavel Matějka, Ladislav Mošner, Oldřich Plchot, Johan Rohdin, Anna Silnova, Themos Stafylakis
Published in: Proc. The Speaker and Language Recognition Workshop (Odyssey 2022), 2022, Page(s) 346-353
Publisher: Odyssey 2022
DOI: 10.21437/odyssey.2022-48

Open Machine Translation for Low Resource South American Languages (AmericasNLP 2021 Shared Task Contribution)

Author(s): Shantipriya Parida, Subhadarshi Panda, Amulya Dash, Esau Villatoro-Tello, A. Seza Doğruöz, Rosa M. Ortega-Mendoza, Amadeo Hernández, Yashvardhan Sharma, Petr Motlicek
Published in: Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, 2021
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/2021.americasnlp-1.24

ODIANLP’s Participation in WAT2020

Author(s): Shantipriya Parida, Petr Motlicek, Amulya Ratna Dash, Satya Ranjan Dash, Debasish Kumar Mallick, Satya Prakash Biswal, Priyanka Pattnaik, Biranchi Narayan Nayak, Ondřej Bojar
Published in: Proceedings of the 7th Workshop on Asian Translation, 2020, Page(s) 103–108
Publisher: Association for Computational Linguistics

BUT Text-Dependent Speaker Verification System for SdSV Challenge 2020

Author(s): Alicia Lozano-Diez, Anna Silnova, Bhargav Pulugundla, Johan Rohdin, Karel Veselý, Lukáš Burget, Oldřich Plchot, Ondřej Glembek, Ondřej Novotný, Pavel Matějka
Published in: Interspeech 2020, 2020, Page(s) 761-765
Publisher: ISCA
DOI: 10.21437/interspeech.2020-2882

Idiap NMT System for WAT 2019 Multi-Modal Translation Task

Author(s): Shantipriya Parida, Petr Motlíček, Ondřej Bojar
Published in: Proceedings of the 6th Workshop on Asian Translation, 2019, Page(s) 175-180
Publisher: Association for Computational Linguistics

Analysis of the BUT Diarization System for VoxConverse Challenge

Author(s): Federico Landini; Ondrej Glembek; Pavel Matejka; Johan Rohdin; Lukas Burget; Mireia Diez; Anna Silnova
Published in: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7, 2021, ISBN 978-1-7281-7605-5
Publisher: IEEE
DOI: 10.1109/icassp39728.2021.9414315

Analysis of X-Vectors for Low-Resource Speech Recognition

Author(s): Martin Karafiat, Karel Vesely, Jan “Honza” Cernocky, Jan Profant, Jiri Nytra, Miroslav Hlavacek, and Tomas Pavlicek
Published in: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021
Publisher: IEEE
DOI: 10.1109/icassp39728.2021.9414725

Detection of Similar Languages and Dialects Using Deep Supervised Autoencoders

Author(s): Shantipriya Parida; Esau Villatoro-Tello; Sajit Kumar; Mael Fabien; Petr Motlicek
Published in: Proceedings of the 17th International Conference on Natural Language Processing, 2020
Publisher: ICON2020

Utilizing VOiCES Dataset for Multichannel Speaker Verification with Beamforming

Author(s): Ladislav Mošner, Oldřich Plchot, Johan Rohdin, Jan Černocký
Published in: Odyssey 2020 The Speaker and Language Recognition Workshop, 2020, Page(s) 187-193
Publisher: ISCA
DOI: 10.21437/odyssey.2020-27

Analysing the Noise Model Error for Realistic Noisy Label Data

Author(s): Hedderich, Michael A.; Zhu, Dawei; Klakow, Dietrich
Published in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35 No. 9: AAAI-21 Technical Tracks 9, 2021, Page(s) 7675-7684, ISSN 2374-3468
Publisher: AAAI Press
DOI: 10.48550/arxiv.2101.09763

Analysis of ABC Submission to NIST SRE 2019 CMN and VAST Challenge

Author(s): Jahangir Alam, Gilles Boulianne, Lukas Burget, Mohamed Dahmane, Mireia Diez Sánchez, Alicia Lozano-Diez, Ondrej Glembek, Pierre-Luc St-Charles, Marc Lalonde, Pavel Matejka, Petr Mizera, Joao Monteiro, Ladislav Mosner, Cedric Noiseux, Ondřej Novotný, Oldrich Plchot, Johan Rohdin, Anna Silnova, Josef Slavicek, Themos Stafylakis, Shuai Wang, Hossein Zeinali
Published in: Odyssey 2020 The Speaker and Language Recognition Workshop, 2020, Page(s) 289-295
Publisher: ISCA
DOI: 10.21437/odyssey.2020-41

Probabilistic Embeddings for Speaker Diarization

Author(s): Anna Silnova, Niko Brummer, Johan Rohdin, Themos Stafylakis, Lukas Burget
Published in: Odyssey 2020 The Speaker and Language Recognition Workshop, 2020, Page(s) 24-31
Publisher: ISCA
DOI: 10.21437/odyssey.2020-4

SoChainDB: A Database for Storing and Retrieving Blockchain-Powered Social Network Data

Author(s): Nguyen, Hoang H. and Bozhkov, Dmytro and Ahmadi, Zahra and Nguyen, Nhat-Minh and Doan, Thanh-Nam
Published in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '22, 2022, Page(s) 3036-3045, ISBN 978-1-4503-8732-3
Publisher: 2022 Association for Computing Machinery
DOI: 10.1145/3477495.3531735

NLPHut’s Participation at WAT2021

Author(s): Shantipriya Parida, Subhadarshi Panda, Ketan Kotwal, Amulya Ratna Dash, Satya Ranjan Dash, Yashvardhan Sharma, Petr Motlicek, Ondřej Bojar
Published in: Proceedings of the 8th Workshop on Asian Translation (WAT2021), 2021, Page(s) 146–154
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/2021.wat-1.16

Multimodal Neural Machine Translation System for English to Bengali

Author(s): Shantipriya Parida, Subhadarshi Panda, Satya Prakash Biswal, Ketan Kotwal, Arghyadeep Sen, Satya Ranjan Dash, Petr Motlicek
Published in: Proceedings of the First Workshop on Multimodal Machine Translation for Low Resource Languages (MMTLRL 2021), in conjunction with RANLP-2021, 2021, Page(s) 31-39
Publisher: INCOMA Ltd.
DOI: 10.26615/978-954-452-073-1_006

Analyzing speaker verification embedding extractors and back-ends under language and channel mismatch

Author(s): Anna Silnova, Themos Stafylakis, Ladislav Mosner, Oldrich Plchot, Johan Rohdin, Pavel Matejka, Lukas Burget, Ondrej Glembek, Niko Brummer
Published in: The Speaker and Language Recognition Workshop (Odyssey 2022), 2022, Page(s) 9-16
Publisher: Odyssey
DOI: 10.21437/odyssey.2022-2

OdiEnCorp 2.0: Odia-English Parallel Corpus for Machine Translation

Author(s): Shantipriya Parida, Satya Ranjan Dash, Ondřej Bojar, Petr Motlicek, Priyanka Pattnaik, Debasish Kumar Mallick
Published in: Proceedings of the WILDRE5 – 5th Workshop on Indian Language Data: Resources and Evaluation, 2020
Publisher: European Language Resources Association (ELRA)

Speaker Recognition on Mono-Channel Telephony Recordings

Author(s): Yosef Solewicz, Noa Cohen, Johan Rohdin, Srikanth Madikeri, Jan “Honza” Černocký
Published in: Proc. The Speaker and Language Recognition Workshop (Odyssey 2022), 2022, Page(s) 193-199
Publisher: Odyssey 2022
DOI: 10.21437/odyssey.2022-27

On Node Embedding of Uncertain Networks

Author(s): Hoang H. Nguyen, Sergej Zerr, Tuan-Anh Hoang
Published in: 2020 IEEE International Conference on Big Data (Big Data), 2020, Page(s) 5792-5794, ISBN 978-1-7281-6251-5
Publisher: IEEE
DOI: 10.1109/bigdata50022.2020.9378022

Idiap Submission to Swiss-German Language Detection Shared Task

Author(s): Shantipriya Parida, Esaú Villatoro-Tello, Sajit Kumar, Petr Motlicek, Qingran Zhan
Published in: Proceedings of the 5th Swiss Text Analytics Conference (SwissText) & 16th Conference on Natural Language Processing (KONVENS), 2020
Publisher: Creative Commons Licence

Idiap & UAM participation at GermEval 2020: Classification and Regression of Cognitive and Motivational Style from Text

Author(s): Esaú Villatoro-Tello, Shantipriya Parida, Sajit Kumar, Petr Motlicek, Qingran Zhan
Published in: Proceedings of GermEval Task 1 (“Classification and Regression of Cognitive and Motivational Style from Text”), 2020
Publisher: Creative Commons Licence

ROXANNE Research Platform: Automate criminal investigations

Author(s): Maël Fabien, Shantipriya Parida, Petr Motlicek, Dawei Zhu, Aravind Krishnan, Hoang H. Nguyen
Published in: Proc. Interspeech 2021, INTERSPEECH 2021: Show & Tell Contribution, 2021, Page(s) 962-964
Publisher: 2021 ISCA

Transfer Learning and Distant Supervision for Multilingual Transformer Models: A Study on African Languages

Author(s): Michael A. Hedderich, David Adelani, Dawei Zhu, Jesujoba Alabi, Udia Markus, Dietrich Klakow
Published in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020, Page(s) 2580-2591
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/2020.emnlp-main.204

Idiap and UAM Participation at MEX-A3T Evaluation Campaign

Author(s): Esaú Villatoro-Tello, Gabriela Ramírez-de-la-Rosa, Sajit Kumar, Shantipriya Parida, Petr Motlicek
Published in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020), 2020, Page(s) 252-257
Publisher: Creative Commons License Attribution 4.0 International (CC BY 4.0)

Distant Supervision and Noisy Label Learning for Low Resource Named Entity Recognition: A Study on Hausa and Yorùbá

Author(s): Adelani, David Ifeoluwa; Hedderich, Michael A.; Zhu, Dawei; Berg, Esther van den; Klakow, Dietrich
Published in: ICLR 2020 Workshop, 3, 2020
Publisher: ICLR

Incremental Semi-Supervised Learning for Multi-Genre Speech Recognition

Author(s): Banriskhem Khonglah, Srikanth Madikeri, Subhadeep Dey, Herve Bourlard, Petr Motlicek, Jayadev Billa
Published in: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, Page(s) 7419-7423, ISBN 978-1-5090-6631-5
Publisher: IEEE
DOI: 10.1109/icassp40776.2020.9054309

Analysis of the BUT Diarization System for VoxConverse Challenge

Author(s): Landini, Federico; Glembek, Ondřej; Matějka, Pavel; Rohdin, Johan; Burget, Lukáš; Diez, Mireia; Silnova, Anna
Published in: ICASSP 2021, 3, 2021
Publisher: ICASSP

BertAA: BERT fine-tuning for Authorship Attribution

Author(s): Fabien, Mael; Villatoro-Tello, Esaú; Motlicek, Petr; Parida, Shantipriya
Published in: Proceedings of the 17th International Conference on Natural Language Processing, 2020
Publisher: ICON2020

Graph2Speak: Improving Speaker Identification using Network Knowledge in Criminal Conversational Data

Author(s): Fabien, Mael; Sarfjoo, Seyyed Saeed; Motlicek, Petr; Madikeri, Srikanth
Published in: ICASSP 2021, 1, 2021
Publisher: ICASSP 2021

On the Impact of Dataset Size: A Twitter Classification Case Study

Author(s): Nguyen, Thi Huyen; Nguyen, Hoang H.; Ahmadi, Zahra; Hoang, Tuan-Anh; Doan, Thanh-Nam
Published in: WI-IAT '21: IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 2022, Page(s) 210–217, ISBN 978-1-4503-9115-3
Publisher: 2021 Association for Computing Machinery
DOI: 10.1145/3486622.3493960

Speaker Embeddings by Modeling Channel-Wise Correlations

Author(s): Themos Stafylakis, Johan Rohdin, Lukáš Burget
Published in: Proc. Interspeech 2021, 2021, Page(s) 501-505
Publisher: Interspeech 2021
DOI: 10.21437/interspeech.2021-1442

Open-Set Speaker Identification pipeline in live criminal investigations

Author(s): Maël Fabien, Petr Motlicek
Published in: 2021 ISCA Symposium on Security and Privacy in Speech Communication, 2021, Page(s) 21-24
Publisher: 2021 ISCA
DOI: 10.21437/spsc.2021-5

Overview of the 6th Workshop on Asian Translation

Author(s): Toshiaki Nakazawa, Nobushige Doi, Shohei Higashiyama, Chenchen Ding, Raj Dabre, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, Shantipriya Parida, Ondřej Bojar, Sadao Kurohashi
Published in: Proceedings of the 6th Workshop on Asian Translation, 2019, Page(s) 1-35
Publisher: Association for Computational Linguistics
DOI: 10.18653/v1/d19-5201

ROXSD: a Simulated Dataset of Communication in Organized Crime

Author(s): Kvetoslav Maly, Gerhard Backfried, Francesco Calderoni, Jan “Honza” Černocký, Erinc Dikici, Maël Fabien, Jan Hořínek, Joshua Hughes, Miroslav Janošík, Marek Kovac, Petr Motlicek, Hoang H. Nguyen, Shantipriya Parida, Johan Rohdin, Miroslav Skácel, Sergej Zerr, Dietrich Klakow, Dawei Zhu, Aravind Krishnan
Published in: Proc. 2021 ISCA Symposium on Security and Privacy in Speech Communication, 2021, Page(s) 32-36
Publisher: 2021 ISCA
DOI: 10.21437/spsc.2021-7

Graph2Speak: Improving Speaker Identification using Network Knowledge in Criminal Conversational Data

Author(s): Maël Fabien; Seyyed Saeed Sarfjoo; Petr Motlicek; Srikanth Madikeri
Published in: 2021 ISCA Symposium on Security and Privacy in Speech Communication, 1, 2021, Page(s) 10-13
Publisher: ISCA
DOI: 10.21437/spsc.2021-3

SparCAssist: A Model Risk Assessment Assistant Based on Sparse Generated Counterfactuals

Author(s): Zhang, Zijian and Setty, Vinay and Anand, Avishek
Published in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '22, 2022, Page(s) 3219–3223, ISBN 978-1-4503-8732-3
Publisher: Association for Computing Machinery
DOI: 10.1145/3477495.3531677

Speech Activity Detection Based on Multilingual Speech Recognition System

Author(s): Seyyed Saeed Sarfjoo; Srikanth Madikeri; Petr Motlicek
Published in: Proc. Interspeech 2021, 4, 2021, Page(s) 4369-4373
Publisher: Interspeech 2021
DOI: 10.21437/interspeech.2021-1058

Inferring Highly-dense Representations for Clustering Broadcast Media Content

Author(s): Esaú Villatoro-Tello, Shantipriya Parida, Petr Motlicek, Ondřej Bojar
Published in: Prague Bulletin of Mathematical Linguistics, 115/1, 2020, Page(s) 31-50, ISSN 1804-0462
Publisher: Creative Commons CC BY-NC-ND
DOI: 10.14712/00326585.004

Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: Theory, implementation and analysis on standard tasks

Author(s): Federico Landini; Ján Profant; Mireia Diez; Lukas Burget
Published in: Computer Speech & Language, Vol. 71, 2021, ISSN 0885-2308
Publisher: Academic Press
DOI: 10.48550/arxiv.2012.14952

Robust link prediction in criminal networks: A case study of the Sicilian Mafia

Author(s): Francesco Calderoni, Salvatore Catanese, Pasquale De Meo, Annamaria Ficara, Giacomo Fiumara
Published in: Expert Systems with Applications, 161, 2020, Page(s) 113666, ISSN 0957-4174
Publisher: Pergamon Press Ltd.
DOI: 10.1016/j.eswa.2020.113666

Establishing phone-pair co-usage by comparing mobility patterns

Author(s): Wauter Bosma, Sander Dalm, Erwin van Eijk, Rachid el Harchaoui, Edwin Rijgersberg, Hannah Tereza Tops, Alle Veenstra, Rolf Ypma
Published in: Science & Justice, 60/2, 2020, Page(s) 180-190, ISSN 1355-0306
Publisher: Forensic Science Society
DOI: 10.1016/j.scijus.2019.10.005

Experimental Evaluation of Scale, and Patterns of Systematic Inconsistencies in Google Trends Data

Author(s): Philipp Behnen, Rene Kessler, Felix Kruse, Jorge Marx Gómez, Jan Schoenmakers, Sergej Zerr
Published in: ECML PKDD 2020 Workshops - Workshops of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2020): SoGood 2020, PDFL 2020, MLCS 2020, NFMCP 2020, DINA 2020, EDML 2020, XKDD 2020 and INRA 2020, Ghent, Belgium, September 14–18, 2020, Proceedings, 1323, 2020, Page(s) 374-384, ISBN 978-3-030-65964-6
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-65965-3_25

BertOdia: BERT pre-training for low resource Odia language

Author(s): Shantipriya Parida, Satya Prakash Biswal, Biranchi Narayan Nayak, Maël Fabien, Esaú Villatoro-Tello, Petr Motlicek & Satya Ranjan Dash
Published in: Dehuri, S., Prasad Mishra, B.S., Mallick, P.K., Cho, SB. (eds) Biologically Inspired Techniques in Many Criteria Decision Making., Smart Innovation, Systems and Technologies, vol 271, 2022, ISBN 978-981-16-8739-6
Publisher: Springer
DOI: 10.1007/978-981-16-8739-6_32

Analysing the Noise Model Error for Realistic Noisy Label Data

Author(s): Hedderich, Michael A.; Zhu, Dawei; Klakow, Dietrich
Published in: 1, 2021
Publisher: AAAI 2021

BUT System Description for The Third DIHARD Speech Diarization Challenge

Author(s): Federico Landini, Alicia Lozano-Diez, Lukas Burget, Mireia Diez, Anna Silnova, Katerina Zmolikova, Ondrej Glembek, Pavel Matejka, Themos Stafylakis, Niko Brümmer
Published in: Proceedings available at Dihard Challenge Github, 2021
Publisher: Dihard Challenge Github

Speech Activity Detection Based on Multilingual Speech Recognition System

Author(s): Sarfjoo, Seyyed Saeed; Madikeri, Srikanth; Motlicek, Petr
Published in: 1, 2021
Publisher: Interspeech 2021