Periodic Reporting for period 1 - DISCOVER (Discovering and Analyzing Visual Structures)
Reporting period: 2023-06-01 to 2025-11-30
The concept we will develop is that of visual structures. Their key features will be their interpretability, in terms of correspondences, deformations, or properties of the observed images, and their ability to incorporate prior knowledge about the data and expert feedback. We will explore two complementary approaches to formally define and identify visual structures: one based on analyzing correspondences, the other on learning interpretable image models.
We will develop visual structures in two domains: historical documents and Earth imagery. For example, from temporal series of multispectral Earth images, we will identify types of moving objects, areas with different types of vegetation or constructions, and model the evolution of their characteristics, which may correspond to changes in their activity or life cycle. Ultimately, experts will still be needed to select relevant visual structures and perform analysis, which requires working closely with them to design relevant features in our algorithms and interfaces adapted for interaction.
The work led to the following publications.
[1] Learning Co-segmentation by Segment Swapping for Retrieval and Discovery X. Shen, A. Efros, A. Joulin, M. Aubry CVPR 2022 workshops
[2] Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans R. Loiseau, E. Vincent, M. Aubry, L. Landrieu CVPR 2024
[3] The Learnable Typewriter: A Generative Approach to Text Line Analysis Y. Siglidis, N. Gonthier, J. Gaubil, T. Monnier, M. Aubry ICDAR 2024, IAPR best paper award
[4] Diffusion Models as Data Mining Tools I. Siglidis, A. Holynski, A. Efros, M. Aubry, S. Ginosar ECCV 2024
[5] Pixel-wise Agricultural Image Time Series Classification: Comparisons and a Deformable Prototype-based Approach E. Vincent, J. Ponce, M. Aubry IGARSS 2025
[6] Historical Astronomical Diagrams Decomposition in Geometric Primitives S. Kalleli, S. Trigg, S. Albouy, M. Husson, M. Aubry ICDAR 2024
[7] Historical Printed Ornaments: Dataset and Tasks S. K. Chaki *, S. Baltaci *, E. Vincent, R. Emonet, F. Vial-Bonacci, C. Bahier-Porte, M. Aubry, T. Fournel ICDAR 2024
[8] An Interpretable Deep Learning Approach for Morphological Script Type Analysis M. Vlachou Efstathiou, I. Siglidis, D. Stutzmann, M. Aubry IWCP 2024
[9] General Detection-based Text Line Recognition R. Baena, S. Kalleli, M. Aubry NeurIPS 2024
[10] Satellite Image Time Series Semantic Change Detection: Novel Architecture and Analysis of Domain Shift E. Vincent, J. Ponce, M. Aubry arXiv 2024
[11] Detecting Looted Archaeological Sites from Satellite Image Time Series E. Vincent, M. Saroufim, J. Chemla, Y. Ubelmann, P. Marquis, J. Ponce, M. Aubry CVPR EarthVision workshop 2025, Best Student Paper Award
[12] CoDeX: Combining Domain Expertise for Spatial Generalization in Satellite Image Analysis A. Kuriyal, E. Vincent, M. Aubry, L. Landrieu CVPR EarthVision workshop 2025
[13] Segmenting France Across Four Centuries M. Lopez-Rauhut, H. Zhou, M. Aubry, L. Landrieu ICDAR 2025
1. We see the learnable typewriter [3], recognized by a best paper award, as a major achievement. It was one of the flagship applications targeted by the proposal, and its success among paleographers has exceeded our expectations.
2. The motivation for our work on text-line recognition [9] was to better identify relevant text regions to model with visual structures. While it does not directly deal with visual structures, it led us to completely revisit approaches to text line recognition, which was not targeted in the proposal, and unexpectedly allowed us to improve the state of the art on Chinese text recognition and win ICDAR competitions on cipher recognition.
3. The use of 3D data for Earth images [2] was not planned, and neither was our success with it. While this paper is a proof of concept, it shows impressively high-quality results, in particular on forest areas, opening up applications to biomass estimation that were not anticipated in the proposal but that we are now exploring.
4. We demonstrated that diffusion models can mine discriminative visual elements [4] on a much wider diversity of datasets than any previous work. This contribution was completely unforeseen at the time of the proposal, which was written before the development of diffusion models.
Moreover, we recently released AIKON, a web platform targeted toward historians: https://aikon-platform.github.io/