CORDIS - EU research results

Who is In, and Who is Not? Determining the Gaia Survey Selection Function

Periodic Reporting for period 3 - GaiaUnlimited (Who is In, and Who is Not? Determining the Gaia Survey Selection Function)

Reporting period: 2023-09-01 to 2024-09-30

The European Space Agency's Gaia mission is the most successful European space mission as measured by the rate of publications that use its data. The astrometric, photometric, and radial velocity catalogues provided through the third Gaia data release in June 2022 are the standard in fundamental astronomical data, with much more and much richer data appearing in future Gaia data releases. However, the full transformative potential of the Gaia mission was not being realized because the astronomical community lacked a detailed description of the survey selection function: the probability that an astronomical object with certain properties enters the Gaia catalogue (or not). Without the selection function it is impossible to obtain insights into fundamental physics and astrophysics from modelling the population properties of objects based on a set of catalogue entries. An explanation of selection functions for non-experts is provided here: https://gaia-unlimited.org/what-is-a-selection-function/
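
To make the definition concrete, here is a minimal toy sketch in Python. It does not use the GaiaUnlimited tools or real Gaia data: the logistic form of the selection probability, the limiting magnitude, and all function names are assumptions chosen purely for illustration. It shows that a catalogue drawn through a selection function undercounts the true population, and that weighting each catalogued source by the inverse of its selection probability recovers the true count where that probability is not close to zero.

```python
import math
import random


def selection_probability(mag, mag_limit=20.0, width=0.3):
    """Hypothetical selection function: probability that a source of apparent
    magnitude `mag` enters the catalogue (logistic falloff near `mag_limit`)."""
    return 1.0 / (1.0 + math.exp((mag - mag_limit) / width))


def simulate_catalogue(n_true=20000, seed=1):
    """Draw a toy population, uniform in magnitude between 17 and 21, and keep
    each source with the probability given by the selection function."""
    rng = random.Random(seed)
    true_mags = [rng.uniform(17.0, 21.0) for _ in range(n_true)]
    catalogue = [m for m in true_mags if rng.random() < selection_probability(m)]
    return true_mags, catalogue


def corrected_count(catalogue):
    """Estimate the true population size by weighting each catalogued source
    by 1 / S(mag). Only meaningful where S is not close to zero."""
    return sum(1.0 / selection_probability(m) for m in catalogue)


if __name__ == "__main__":
    true_mags, catalogue = simulate_catalogue()
    print("true sources:       ", len(true_mags))
    print("catalogued sources: ", len(catalogue))
    print("corrected estimate: ", round(corrected_count(catalogue)))
```

Inverse weighting is only one simple way to account for selection, and it becomes noisy where the selection probability is small. Forward modelling the catalogue with the selection function included in the likelihood is generally more robust.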

The GaiaUnlimited project was dedicated to researching, developing, and implementing the Gaia survey selection function, as well as selection functions for Gaia combined with other surveys. The basic mathematics and the use of selection functions in statistical data analysis were explained in a peer-reviewed paper. This was the basis for the selection function tools that were provided to the community through the project's GitHub pages (https://github.com/gaia-unlimited). The scientific community is now using these tools to enhance the analyses of Gaia (and other) data, as demonstrated by well over 100 citations to the GaiaUnlimited papers. The effort to produce the selection function for the next Gaia data release will continue within the Gaia Data Processing and Analysis Consortium, building on the knowledge and expertise developed in GaiaUnlimited.

GaiaUnlimited enhanced the expertise in understanding in detail how the interpretation of large amounts of collected data is affected by selection biases, and how this can be addressed through a proper description of the way the data was collected. This benefits any data science application, and some of the researchers funded through this proposal have opted for a future career in European companies whose business has a large data science component.

An overview paper on the selection function was published, which describes the motivation for accounting for survey selection functions and explains the basic mathematical principles.
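
As a point of orientation, the principle is often written as follows (a sketch in notation assumed here, which may differ from that of the paper). Let $\lambda(q \mid \theta)$ be the model rate density of objects with observables $q$ given parameters $\theta$, and let $S(q)$ be the selection function, the probability that an object with observables $q$ enters the catalogue. If the catalogue is treated as a Poisson process with rate $S\lambda$, the log-likelihood of the $N$ catalogued objects is:

```latex
\ln \mathcal{L}(\theta) = \sum_{i=1}^{N} \ln\!\left[ S(q_i)\,\lambda(q_i \mid \theta) \right] - \int S(q)\,\lambda(q \mid \theta)\,\mathrm{d}q
```

The integral is the expected number of catalogued objects. Setting $S = 1$ there when the survey is in fact incomplete is what biases the inferred population parameters.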

A novel approach to the overall survey selection function was developed, based on an empirical indicator of Gaia's completeness in a given field on the sky.

The Gaia spectroscopic (radial velocity spectrograph) and general subset selection functions were derived.
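
The idea behind a subset selection function can be sketched with simple counting. The example below is an illustrative assumption, not the project's code: it bins the parent catalogue, counts how many sources in each bin pass the subset cuts, and estimates the per-bin probability as the posterior mean of a binomial rate under a flat prior, (k + 1) / (n + 2). The function name and the bin labels are hypothetical.

```python
from collections import defaultdict


def subset_selection_function(parent_bins, in_subset):
    """Estimate, per bin, the probability that a catalogue source is in the subset.

    parent_bins : iterable of hashable bin labels, one per parent-catalogue
                  source (e.g. a (sky pixel, magnitude bin) tuple; hypothetical).
    in_subset   : iterable of booleans, True where the source passes the cuts.

    Returns {bin: (n, k, p)} with n parent sources, k subset sources, and
    p = (k + 1) / (n + 2), the posterior mean of a binomial rate (flat prior).
    """
    counts = defaultdict(lambda: [0, 0])
    for label, flag in zip(parent_bins, in_subset):
        counts[label][0] += 1
        counts[label][1] += int(bool(flag))
    return {label: (n, k, (k + 1) / (n + 2)) for label, (n, k) in counts.items()}


# Toy usage: five parent sources in two bins, two of which are in the subset.
example_bins = [("pix0", 12), ("pix0", 12), ("pix0", 12), ("pix1", 15), ("pix1", 15)]
example_flags = [True, True, False, False, False]
example_result = subset_selection_function(example_bins, example_flags)
# ("pix0", 12): n=3, k=2, p=3/5; ("pix1", 15): n=2, k=0, p=1/4
```

The flat prior keeps the estimate away from exactly 0 or 1 in sparsely populated bins, which matters when the result is later used as a probability in a likelihood.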

A worked example was created of the construction of the selection function for the combination of Gaia and the APOGEE spectroscopic survey. This joint sample was used to study the density profiles of stellar populations of similar chemical composition in the Milky Way's disk.

A worked example was created of the construction of the selection function for a combination of Gaia and the AllWISE photometric survey. The joint sample was used to study the overall structure of the Milky Way's disk, revealing for the first time hints of spiral arm structure in an older stellar population.

A worked example was created of the construction of the selection function for a specialized subset of the Gaia data (metal-poor Red Giant Branch stars). This sample was used to study the spatial distribution of the Aurora population, one of the oldest components of the Milky Way.

The selection function for unresolved binaries selected on the basis of the RUWE data quality indicator was derived.

Delivery of the GaiaUnlimited selection function tools (https://github.com/gaia-unlimited/gaiaunlimited). This includes the code to reproduce the four complex selection function examples above.

Delivery of the documentation corresponding to the tools (https://gaiaunlimited.readthedocs.io/en/latest/).

Three community workshops were organized. Feedback received during the first and second community workshops was incorporated into the GaiaUnlimited tools.

Peer-reviewed papers describing the results (https://gaia-unlimited.org/papers/).
- A paper with a clear didactic explanation of the need for selection functions and how to mathematically formulate the problem of accounting for selection effects. The paper includes a reproducible demonstration for the example of the white dwarf colour-luminosity function.

- Publication of open source tools and documentation for incorporating the Gaia selection function into scientific analyses.

The following were included in the GaiaUnlimited tools available for use by scientists:

- Novel approach to determining the Gaia survey selection function, based on an empirical indicator of catalogue completeness.

- Modified statistical modelling approach that better accounts for the Gaia scanning law and crowded field effects.

- A generic method for constructing the selection function of subsets of the Gaia catalogue, selected entirely based on criteria applied to catalogue fields.

- A hierarchical version (varying levels of resolution on the sky) of the selection function of subsets of the Gaia catalogue.

- A worked example of the selection function for the combination of Gaia and a large spectroscopic survey.

- A worked example of the selection function for a combination of Gaia and a large photometric survey.

- A worked example of the selection function for a specialized subset of the Gaia data.

- Selection function for unresolved binaries selected on the basis of data quality indicators.

Impacts:

- Enhanced the quality and reproducibility of the scientific exploitation of data from a flagship European space mission.
- Increased the number of scientific publications accounting for selection effects in the analysis.
- Enhanced the collaboration and broadened the expertise of the partners involved.
- The partner institutes benefited from having Gaia and survey selection function experts in house, expertise which was turned into higher quality scientific exploitation of astronomical data and more publications (based on data from European space missions and surveys).
- The European scientific community gained a competitive advantage in the exploitation of space mission data with respect to their peers worldwide through the expertise and the tools built within GaiaUnlimited. The expertise was transferred to the community through the workshops that GaiaUnlimited organized, and through presentations at relevant scientific conferences.
- The expertise on the Gaia selection function was transferred to the Gaia Data Processing and Analysis Consortium so that the selection function will become a standard data product in future Gaia data releases.
- The expertise gained in this project can be transferred to other surveys. Examples of European projects that stand to benefit are the 4MOST and WEAVE spectroscopic surveys and the Euclid and Plato space missions.
- The scientific community world-wide benefits from the higher quality scientific analyses that are possible with the public availability of the Gaia survey selection function.
Illustration of the effect of sample selection on a simulated spherical halo population
Illustration of the selection function for Red Clump stars.
Empirical survey completeness at G=21 as a function of sky position
Illustration of accounting for the selection function in the white dwarf colour-luminosity distribution