Periodic Reporting for period 1 - DISCOVER (Discovering and Analyzing Visual Structures)
Reporting period: 2023-06-01 to 2025-11-30
The concept we will develop is that of visual structures. Their key features will be their interpretability, in terms of correspondences, deformations, or properties of the observed images, and their ability to incorporate prior knowledge about the data and expert feedback. We will explore two complementary approaches to formally define and identify visual structures: one based on analyzing correspondences, the other on learning interpretable image models.
We will develop visual structures in two domains: historical documents and Earth imagery. For example, from temporal series of multispectral Earth images, we will identify types of moving objects, areas with different types of vegetation or constructions, and model the evolution of their characteristics, which may correspond to changes in their activity or life cycle. Ultimately, experts will still be needed to select relevant visual structures and perform analysis, which requires working closely with them to design relevant features in our algorithms and interfaces adapted for interaction.
The work led to the following publications.
[1] Learning Co-segmentation by Segment Swapping for Retrieval and Discovery X. Shen, A. Efros, A. Joulin, M. Aubry CVPR 2022 workshops
[2] Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans R. Loiseau, E. Vincent, M. Aubry, L. Landrieu CVPR 2024
[3] The Learnable Typewriter: A Generative Approach to Text Line Analysis Y. Siglidis, N. Gonthier, J. Gaubil, T. Monnier, M. Aubry ICDAR 2024, IAPR best paper award
[4] Diffusion Models as Data Mining Tools I. Siglidis, A. Holynski, A. Efros, M. Aubry, S. Ginosar ECCV 2024
[5] Pixel-wise Agricultural Image Time Series Classification: Comparisons and a Deformable Prototype-based Approach E. Vincent, J. Ponce, M. Aubry IGARSS 2025
[6] Historical Astronomical Diagrams Decomposition in Geometric Primitives S. Kalleli, S. Trigg, S. Albouy, M. Husson, M. Aubry ICDAR 2024
[7] Historical Printed Ornaments: Dataset and Tasks S. K. Chaki *, S. Baltaci *, E. Vincent, R. Emonet, F. Vial-Bonacci, C. Bahier-Porte, M. Aubry, T. Fournel ICDAR 2024
[8] An Interpretable Deep Learning Approach for Morphological Script Type Analysis M. Vlachou Efstathiou, I. Siglidis, D. Stutzmann, M. Aubry IWCP 2024
[9] General Detection-based Text Line Recognition R. Baena, S. Kalleli, M. Aubry NeurIPS 2024
[10] Satellite Image Time Series Semantic Change Detection: Novel Architecture and Analysis of Domain Shift E. Vincent, J. Ponce, M. Aubry arXiv 2024
[11] Detecting Looted Archaeological Sites from Satellite Image Time Series E. Vincent, M. Saroufim, J. Chemla, Y. Ubelmann, P. Marquis, J. Ponce, M. Aubry CVPR EarthVision workshop 2025, Best Student Paper Award
[12] CoDeX: Combining Domain Expertise for Spatial Generalization in Satellite Image Analysis A. Kuriyal, E. Vincent, M. Aubry, L. Landrieu CVPR EarthVision workshop 2025
[13] Segmenting France Across Four Centuries M. Lopez-Rauhut, H. Zhou, M. Aubry, L. Landrieu ICDAR 2025
1. We see the learnable typewriter [3], recognized by a best paper award, as a major achievement. It was one of the flagship applications targeted by the proposal, and its success among paleographers has exceeded our expectations.
2. The motivation for our work on text-line recognition [9] was to better identify relevant text regions to model with visual structures. While it does not directly deal with visual structures, it led us to completely revisit approaches to text line recognition, which was not targeted in the proposal, and unexpectedly allowed us to improve the state of the art on Chinese text recognition and win ICDAR competitions on cipher recognition.
3. The use of 3D data for Earth images [2] was not planned, and neither was our success with it. While this paper is a proof of concept, it shows impressively high-quality results, in particular on forest areas, opening up applications to biomass estimation that were not anticipated in the proposal but that we are now exploring.
4. We demonstrated that diffusion models can mine discriminative visual elements [4] on a much wider diversity of datasets than any previous work. This contribution was completely unforeseen at the time of the proposal, which was written before the development of diffusion models.
Moreover, we recently released AIKON, a web platform targeted toward historians: https://aikon-platform.github.io/