Periodic Reporting for period 1 - GRAPPA (Gestalts Relate Aesthetic Preferences to Perceptual Analysis)
Reporting period: 2022-10-01 to 2025-03-31
The first step in our approach was to develop high-quality datasets of images of paintings and natural scenes with reliable aesthetic ratings by humans (WP1). We started this work by reviewing the available datasets: we conducted an extensive literature search and compiled a database of existing datasets, along with their properties, which will be shared with the research community. Because we concluded that most available datasets were still limited (either in size or in quality), especially for images of artworks, we invested significant time and effort in collecting aesthetic ratings of images of artworks (and, in some cases, natural scenes) through large-scale online studies. So far, we have conducted 9 online studies, of which 5 are key ingredients for our model development and 4 focus on specific aspects that will help us refine our model in future iterations.
The first and largest dataset is LAPIS (Leuven Art Personalized Image Set), consisting of 11,723 carefully selected and curated images of paintings across 26 styles and 7 genres, rated on a visual analogue scale (from highly unaesthetic to highly aesthetic) by a total of 552 raters (an average of 24 raters per image). Next, we worked on two large datasets focused on two central aspects of perceptual organization: symmetry and composition. In the symmetry study, 900 participants indicated image regions containing symmetry for 200 artworks and 200 natural scenes, with 50 raters per image. In the composition study, 1,364 participants rated 160 natural scenes and 160 artworks on composition, spatial layout, order, complexity, pleasure, and interest. In addition, we have developed targeted datasets to test specific models.
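For illustration, the sketch below shows how per-rater scores of this kind can be aggregated into per-image aesthetic ratings. It is a minimal Python example, not the released LAPIS tooling; the file name and column names (image_id, rater_id, rating, style, genre) are assumptions.

import pandas as pd

# Hypothetical long-format ratings file: one row per (image, rater) pair.
ratings = pd.read_csv("lapis_ratings.csv")

# Aggregate per-rater visual-analogue-scale scores into per-image statistics.
per_image = (
    ratings.groupby("image_id")
           .agg(mean_rating=("rating", "mean"),
                n_raters=("rater_id", "nunique"))
           .reset_index()
)

# Attach image-level metadata so ratings can be analysed per style and genre.
meta = ratings.drop_duplicates("image_id")[["image_id", "style", "genre"]]
per_image = per_image.merge(meta, on="image_id")
print(per_image.head())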
Regarding model development (WP4) and testing (WP5), we have made major technological advances by developing and testing models that predict aesthetic preferences for images based on perceptual image properties at all levels, as well as on person characteristics. In addition to the model training in the symmetry studies (WP1), we implemented different machine learning models that focus on aesthetic attributes, to better understand which image attributes contribute to aesthetic judgements of these images. As part of WP4, we developed a multi-task convolutional neural network for image aesthetic assessment and investigated the Gestalt principle of closure in convolutional neural networks. We also investigated the impact of local and global data augmentations on artistic image aesthetic assessment: whereas traditional image data augmentations alter composition, we developed BackFlip, a local image transformation that introduces variation in the data without affecting the composition and aesthetic qualities of an image. Furthermore, we investigated how well large vision-language models can classify key attributes of paintings, including their art style, author, and period. We also developed an Image Aesthetic Assessment (IAA) model that works for both Generic IAA (GIAA) and Personal IAA (PIAA) and allows us to explain their differences. Finally, we are working on a new tokenization approach for vision transformers that preserves composition, high resolution, aspect ratio, and multiscale information in image aesthetic assessment.
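For illustration, the following is a minimal PyTorch sketch of a multi-task set-up of the kind described above: a shared convolutional backbone with one head regressing an overall aesthetic score and one head predicting attribute ratings. The backbone, head sizes, and attribute count are assumptions for the sake of the example, not the project's actual architecture.

import torch
import torch.nn as nn
from torchvision import models

class MultiTaskIAA(nn.Module):
    """Shared CNN backbone with an aesthetic-score head and an attribute head."""
    def __init__(self, n_attributes: int = 8):
        super().__init__()
        backbone = models.resnet18(weights=None)   # placeholder backbone
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()                # expose pooled features
        self.backbone = backbone
        self.score_head = nn.Linear(feat_dim, 1)            # overall aesthetic score
        self.attr_head = nn.Linear(feat_dim, n_attributes)  # attribute ratings

    def forward(self, x):
        feats = self.backbone(x)
        return self.score_head(feats).squeeze(-1), self.attr_head(feats)

model = MultiTaskIAA()
score, attrs = model(torch.randn(2, 3, 224, 224))
# Training would combine a regression loss on the score with a loss on the attributes.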
For WP2, we conducted an extensive pilot study with several kinds of images and Gestalt-level properties, involving experts in vision science or in art and aesthetics, and we are developing an annotation tool that will allow future participants to indicate regions of interest with specific Gestalt-level annotations. For WP3, we carried out experiments with well-controlled stimuli allowing parametric variation of specific Gestalt-level image characteristics and are now focusing more specifically on their relation to individual differences in aesthetics. For WP6, we finished and published one study and are currently exploring other approaches to generative AI models, based on our extensive review paper. For WP7, we are analyzing the data of several experiments to investigate how specific image characteristics affect perceived order and complexity, as well as the resulting aesthetic value. For WP8, we are conducting two experiments on how eye movements capture the composition of paintings and of professional, high-quality photographs.
Second, the novel data augmentation method developed in the BackFlip paper clearly goes beyond the state of the art in data augmentation, because it is the only technique that preserves the composition of images.
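For illustration, the sketch below shows a local, composition-preserving augmentation in the spirit of this description: a few small patches are transformed in place, so the global layout of the image, and hence its composition, is left untouched. This is not the published BackFlip algorithm, only a hypothetical example of the general idea.

import numpy as np

def local_patch_flip(img: np.ndarray, n_patches: int = 4, patch: int = 32, seed=None) -> np.ndarray:
    """Flip a few small patches in place; the global layout stays unchanged."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    h, w = img.shape[:2]
    for _ in range(n_patches):
        y = int(rng.integers(0, max(1, h - patch)))
        x = int(rng.integers(0, max(1, w - patch)))
        block = out[y:y + patch, x:x + patch]
        out[y:y + patch, x:x + patch] = block[:, ::-1].copy()  # horizontal flip of the patch only
    return out

augmented = local_patch_flip(np.random.rand(224, 224, 3), seed=0)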
Third, the new tokenization approach for vision transformers and the new model integrating GIAA and PIAA clearly go beyond the state of the art in the domains of vision transformers and image aesthetic assessment, respectively.