Understanding Europe’s Fashion Data Universe


The classification algorithm and its evaluation on fashion time series

As a result of task 5.3, this deliverable will consist of implemented algorithms that will be integrated within the data integration infrastructure developed within WP2 (T 2.3).

Time Series Operators for MonetDB

This deliverable will report the extended support for time series data processing in MonetDB, including integration with the software provided by D4.1. Corresponding software will be made available through the MonetDB open-source repository.

A set of aggregation algorithms and their experimental evaluation

As a result of task 3.2, this deliverable will consist of implemented algorithms that will be integrated within the data integration infrastructure developed within WP2 and will feed into WP5, WP6, and WP7.

Data integration solution

A MonetDB data integration solution for modeling and storing i) the different available datasets from all partners, ii) the taxonomy, iii) the extracted named entities and links. The deliverable will also include the extension of MonetDB with JSON support to include the management of semi-structured data. The proposed solution will be used in WP4, WP5, and WP6.

A set of crowdsourcing interfaces

This deliverable will consist of a set of Human Intelligence Task designs, experimentally validated for object recognition in images, validation of named entity extraction, and image labeling. These designs will be available for tasks in WP5.

Named Entity Recognition and Linking methods

As a result of task 1.1, this deliverable will consist of implemented algorithms for entity extraction from textual documents and linking to the ontology defined in WP1. The result will feed into WP4, WP5, and WP6.

Report on text joins

This report will describe our methodology for learning text joins and a robust entity recognizer for the fashion domain.

Surveys design and crowdsourcing tasks

The tangible result of task 3.3 will be models generated by means of crowdsourcing which will be used to address our use-cases in WP5 and WP6.

Communication plan

This deliverable will contain a plan of communications relevant for the project dissemination and community building activities including planned activities to support standardisation and interoperability. Outcomes of these efforts will be reported in Periodic Activity Reports produced in WP7.

Project factsheet

A brief project Fact Sheet suitable for Web publishing will be published within one month from the start of the project. The Fact Sheet will outline the project's rationale and objectives, specify its technical baseline and intended target groups and application domains, and detail intermediate and final outputs. The Fact Sheet can be used by the Commission for its own dissemination and awareness activities throughout the project lifecycle, and may be published on EC and EC sponsored Web sites. The factsheet has to be maintained and updated until the end of the project; this will be documented in the regular reporting.

Relation Extraction with Stacked Deep Learning

This report integrates relation extraction and stacked deep learning for selected relations of the Zalando FDWH. We will investigate how much of the training should be executed in the database and how much may be shipped to a less expensive GPU-based architecture.

Software Requirements: SSM library for time series modelling and trend prediction

The most modern State Space Model (SSM) algorithms for time series analysis and probabilistic inference will be summarised in this deliverable and will be used as a basis for future software developments in the project. The output will be available for project internal and public use.

Showcase specification and dissemination summary

This deliverable will present the produced promotion and dissemination material, demonstration workflows, and the fully functional data integration infrastructure, ready to be demonstrated to the public and including screencasts (as indicated in T7.2). We will grant the Commission the right to use the Showcase for its own dissemination and awareness activities (including Web based and electronic publications) after the completion of the project. The Showcase will feature a meaningful subset (software, data, etc.) of the functionality characterizing the project demonstrator(s) arrived at, along with relevant copyright notices and contact information, and suitable installation aids and run-time interfaces. We will also report on project activities undertaken to support standardisation of project results and collaboration with other projects and relevant initiatives, as well as the results of outreach through press, social media, and open-source communities using demos, use cases, and benchmark results realized during the project. As planned in T7.1, we will report on our contribution to the Big Data Value PPP activities.

Showcase specification

This deliverable will contain a specification of the FashionBrain data integration infrastructure including design of promotion material and requirements for software needed to run it.

Survey document of existing datasets and data integration solutions (M6)

This deliverable will consist of an overview of existing state-of-the-art solutions for data integration including infrastructures, algorithms, and datasets covering both academic research as well as industry solutions. This will be the result of Task 1.1.

Demo on text joins

This demo presents fully functional and documented text joins for the example of the Zalando FDWH. Given a fashion data warehouse, we will demo executing text joins for common (and often idiosyncratic) fashion entities, such as brands or products.

Early Demo on textual image search

This deliverable consists of a preliminary image search prototype based on textual entities. This is the basis for D 6.5, which will extend the textual component by NLP and multi-linguality.

Early Demo on Fashion Trend Prediction

This early demo will show how it will be possible to detect fashion trends (style) on fashion time series over time.

Demo on Relation Extraction with Stacked Deep Learning

This demo integrates methods for stacked deep learning on typical crowd-based workflows for trend detection and brand monitoring.

Demo on Fashion Trend Prediction

This demo will show how a particular fashion trend (style) is detected on fashion time series over time. The prediction will be implemented as an operator in MonetDB.

Scalable Crowdsourced Social Media Annotation

This deliverable consists of a publicly available website with data visualization functionalities. We demonstrate that we analysed hundreds of fashion blogs and Instagram profiles and that we are able to continuously update the profiles with recently published images.

Demo on textual image search

This deliverable consists of an image search prototype system which uses all of the data collected, allows users to search by images, collects user feedback, and is able to periodically improve its results based on this interaction data. It extends the textual component of D 6.3 by NLP and multi-linguality, primarily targeting German, English, French, and Italian.

Product Taxonomy Linking

This deliverable extends D5.1 with a demo that integrates the linking of products to social media posts; that is, we recognise products across different social media channels.

Project Web site

Setting up the public, general audience targeted project Web site. The site will provide project overviews and highlights; up-to-date information on intermediate and final project results, including public reports and publications as well as synthesis reports drawn from selected confidential material in non-proprietary formats (e.g. PDF); project events, including e.g. user group meetings, conferences and workshops; contact details, etc. The first point of access of the project's Web site will describe the goals of the project in simple, jargon-free language. The Web site will be maintained and updated until the end of the project. All open source components published will be extensively documented by means of textual documents and screencasts of professional quality illustrating how to download, install and operate the components in question. Documentation manuals and screencasts will be specifically identified as project deliverables and prominently published on the project's Web site.


Analysing Errors of Open Information Extraction Systems

Author(s): Schneider, Rudolf; Oberhauser, Tom; Klatt, Tobias; Gers, Felix A.; Löser, Alexander
Published in: Conference on Empirical Methods on Natural Language Processing Workshop Proceedings, Issue 3, 2017, Page(s) 8

FashionBrain Project: A Vision for Understanding Europe's Fashion Data Universe

Author(s): Checco, Alessandro; Demartini, Gianluca; Loeser, Alexander; Arous, Ines; Khayati, Mourad; Dantone, Matthias; Koopmanschap, Richard; Stalinov, Svetlin; Kersten, Martin; Zhang, Ying
Published in: 'Machine learning meets fashion' workshop at KDD 2017, Issue 2, 2017

IDEL: In-Database Entity Linking with Neural Embeddings

Author(s): Kilias, Torsten; Löser, Alexander; Gers, Felix A.; Koopmanschap, Richard; Zhang, Ying; Kersten, Martin
Published in: IEEE BigComp2019, Issue 1, 2018, Page(s) 12

Let's Agree to Disagree: Fixing Agreement Measures for Crowdsourcing

Author(s): Alessandro Checco, Kevin Roitero, Eddy Maddalena, Stefano Mizzaro and Gianluca Demartini
Published in: 2017

Contextual String Embeddings for Sequence Labeling

Author(s): Alan Akbik, Duncan Blythe and Roland Vollgraf
Published in: 27th International Conference on Computational Linguistics, COLING 2018, 2018

Smart-MD - Neural Paragraph Retrieval of Medical Topics

Author(s): Rudolf Schneider, Sebastian Arnold, Tom Oberhauser, Tobias Klatt, Thomas Steffek, Alexander Löser
Published in: Companion of The Web Conference 2018 - WWW '18, 2018, Page(s) 203-206
DOI: 10.1145/3184558.3186979

RelVis: Benchmarking OpenIE Systems

ZAP: An Open-Source Multilingual Annotation Projection Framework

Author(s): Alan Akbik and Roland Vollgraf
Published in: 11th Language Resources and Evaluation Conference, LREC 2018, 2018

Love at First Sight: MonetDB/TensorFlow

Author(s): Ying Zhang, Richard Koopmanschap, Martin Kersten
Published in: 2018 IEEE 34th International Conference on Data Engineering (ICDE), 2018, Page(s) 1672-1672
DOI: 10.1109/icde.2018.00208

FEIDEGGER: A Multi-modal Corpus of Fashion Images and Descriptions in German

Author(s): Leonidas Lefakis, Alan Akbik, Roland Vollgraf
Published in: 2018

All That Glitters is Gold - An Attack Scheme on Gold Questions in Crowdsourcing

Author(s): Alessandro Checco, Jo Bates and Gianluca Demartini
Published in: The sixth AAAI Conference on Human Computation and Crowdsourcing, 2018

The Projector: An Interactive Annotation Projection Visualization Tool

Author(s): Alan Akbik, Roland Vollgraf
Published in: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2017, Page(s) 43-48
DOI: 10.18653/v1/D17-2008

In-Database Machine Learning with MonetDB/TensorFlow

Author(s): Torsten Kilias, Alexander Löser, Felix A. Gers, Richard Koopmanschap, Ying Zhang, Martin Kersten, Mark Raasveldt, Pedro Holanda, Hannes Mühleisen and Stefan Manegold

Investigating Stability and Reliability of Crowdsourcing Output

Author(s): Rehab K. Qarout, Alessandro Checco, Kalina Bontcheva
Published in: CrowdBias 2018, 2018

All Those Wasted Hours - On Task Abandonment in Crowdsourcing

Author(s): Lei Han, Kevin Roitero, Ujwal Gadiraju, Cristina Sarasua, Alessandro Checco, Eddy Maddalena, Gianluca Demartini
Published in: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining - WSDM '19, 2019, Page(s) 321-329
DOI: 10.1145/3289600.3291035

How Does BERT Answer Questions? - A Layer-Wise Analysis of Transformer Representations

Author(s): Betty van Aken, Benjamin Winter, Alexander Löser, Felix A. Gers
Published in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management - CIKM '19, 2019, Page(s) 1823-1832
DOI: 10.1145/3357384.3358028

RecovDB: Accurate and Efficient Missing Blocks Recovery for Large Time Series

Author(s): Ines Arous, Mourad Khayati, Philippe Cudre-Mauroux, Ying Zhang, Martin Kersten, Svetlin Stalinlov
Published in: 2019 IEEE 35th International Conference on Data Engineering (ICDE), 2019, Page(s) 1976-1979
DOI: 10.1109/icde.2019.00218

Multilingual Sequence Labeling With One Model

Author(s): Alan Akbik, Tanja Bergmann and Roland Vollgraf
Published in: NLDL 2019, 2019

FLAIR: An Easy-to-Use Framework for State-of-the-Art NLP

Author(s): Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter and Roland Vollgraf
Published in: NAACL-HLT 2019, 2019

Platform-related Factors in Repeatability and Reproducibility of Crowdsourcing Tasks

Author(s): Rehab Qarout, Alessandro Checco, Gianluca Demartini and Kalina Bontcheva
Published in: 2019

OpenCrowd: A Human-AI Collaborative Approach for Finding Social Influencers via Open-Ended Answers Aggregation

Author(s): Ines Arous, Jie Yang, Mourad Khayati, Philippe Cudré-Mauroux
Published in: Proceedings of The Web Conference 2020, 2020, Page(s) 1851-1862
DOI: 10.1145/3366423.3380254

Pooled Contextualized Embeddings for Named Entity Recognition

Author(s): Alan Akbik, Tanja Bergmann and Roland Vollgraf
Published in: NAACL-HLT 2019, 2019

Implicit Bias in Crowdsourced Knowledge Graphs

Author(s): Gianluca Demartini
Published in: Companion Proceedings of The 2019 World Wide Web Conference on - WWW '19, 2019, Page(s) 624-630
DOI: 10.1145/3308560.3317307

Challenges for Toxic Comment Classification: An In-Depth Error Analysis

Author(s): Betty van Aken, Julian Risch, Ralf Krestel, Alexander Löser
Published in: Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), 2018, Page(s) 33-42
DOI: 10.18653/v1/w18-5105

The Evolution of Power and Standard Wikidata Editors: Comparing Editing Behavior over Time to Predict Lifespan and Volume of Edits

Author(s): Cristina Sarasua, Alessandro Checco, Gianluca Demartini, Djellel Difallah, Michael Feldman, Lydia Pintscher
Published in: Computer Supported Cooperative Work (CSCW), Issue 28/5, 2019, Page(s) 843-882, ISSN 0925-9724
DOI: 10.1007/s10606-018-9344-y

SECTOR: A Neural Model for Coherent Topic Segmentation and Classification

Author(s): Arnold, Sebastian; Schneider, Rudolf; Cudré-Mauroux, Philippe; Gers, Felix A.; Löser, Alexander
Published in: Transactions of the Association for Computational Linguistics, Issue 2, 2019, ISSN 2307-387X

Mind the gap

Author(s): Mourad Khayati, Alberto Lerner, Zakhar Tymchenko, Philippe Cudré-Mauroux
Published in: Proceedings of the VLDB Endowment, Issue 13/5, 2020, Page(s) 768-782, ISSN 2150-8097
DOI: 10.14778/3377369.3377383

The Impact of Task Abandonment in Crowdsourcing

Author(s): Lei Han, Kevin Roitero, Ujwal Gadiraju, Cristina Sarasua, Alessandro Checco, Eddy Maddalena, Gianluca Demartini
Published in: IEEE Transactions on Knowledge and Data Engineering, 2019, Page(s) 1-1, ISSN 1041-4347
DOI: 10.1109/tkde.2019.2948168

Adversarial Attacks on Crowdsourcing Quality Control

Author(s): Alessandro Checco, Jo Bates, Gianluca Demartini
Published in: Journal of Artificial Intelligence Research, Issue 67, 2020, Page(s) 375-408, ISSN 1076-9757
DOI: 10.1613/jair.1.11332

Deadline-Aware Fair Scheduling for Multi-Tenant Crowd-Powered Systems

Author(s): Djellel Difallah, Alessandro Checco, Gianluca Demartini, Philippe Cudré-Mauroux
Published in: ACM Transactions on Social Computing, Issue 2/1, 2019, Page(s) 1-29, ISSN 2469-7818
DOI: 10.1145/3301003

Scalable recovery of missing blocks in time series with high and low cross-correlations

Author(s): Mourad Khayati, Philippe Cudré-Mauroux, Michael H. Böhlen
Published in: Knowledge and Information Systems, 2019, ISSN 0219-1377
DOI: 10.1007/s10115-019-01421-7

Crowd-Labeling Fashion Reviews with Quality Control

Author(s): Chernushenko, I.; Gers, F.A.; Löser, A.; Checco, A.
Published in: arXiv, Issue 2, 2018