NEMESIS is designed to evolve in three main phases; this report covers Phase I (kick-off), which took place during the first year of the project. Beyond science, Phase I included setting up the team and all related infrastructure and channels that enable close collaboration among the team members. It also involved establishing the major channels to communicate the project activities and disseminate its results, and to reach a wider network of collaborators who would help advance the NEMESIS objectives and extend their reach to the wider community. To this end, Audard & Dionatos obtained approval for a proposal to form an ISSI team, bringing together experts from diverse disciplines in astrophysics, astro-informatics and machine learning. The major objectives of the team are to help define (i) the best machine learning methods and (ii) the most descriptive datasets for a new YSO classification.
As an important pillar of the project, data compilation was initiated immediately. In a first stage, catalogued data for nearby star-forming regions were retrieved. For young stellar objects, infrared wavelengths are particularly important; therefore, data from space-borne infrared facilities (e.g. Herschel, Spitzer, AKARI, WISE) were given priority. Nonetheless, data spanning the entire electromagnetic spectrum, from either space-based (e.g. Hubble, XMM-Newton, Chandra) or ground-based facilities (e.g. ALMA, APEX, JCMT, 2MASS), can provide important information on the evolution of YSOs and were therefore also retrieved. Part of the data compilation is based on reduced, published data retrieved from the literature and/or databases, while data of specific interest are being freshly reduced by the NEMESIS team.
Aiming to accelerate the production of early results, we prioritised the data compilation for a single star-forming region: Orion. The selection was based both on the number of young stellar sources, Orion being the largest nearby star-forming region, and on the amount of available data, since Orion is one of the best-studied star-forming regions. The Orion data compilation allowed us to test a number of different machine learning methods on the actual data and evaluate their performance.
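The kind of side-by-side test described above can be illustrated with a minimal sketch. The report does not state which methods were tested or which features were used, so everything here is an assumption made for illustration: the two classifiers (nearest centroid and k-nearest neighbours) are simple stand-ins, the two-dimensional feature space and class names are hypothetical, and the sample is synthetic rather than Orion data.

```python
import random
from math import dist


def make_synthetic_sample(n_per_class=200, seed=0):
    """Draw two made-up classes in a 2-D feature space (NOT real Orion data)."""
    rng = random.Random(seed)
    # Hypothetical class centres, chosen only so the classes are separable.
    centres = {"class_a": (0.5, 0.3), "class_b": (1.8, 1.2)}
    data = []
    for label, (cx, cy) in centres.items():
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)), label))
    rng.shuffle(data)
    return data


def nearest_centroid(train):
    """Fit: compute the mean position per class; predict: closest mean."""
    sums = {}
    for (x, y), label in train:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    centroids = {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}
    return lambda p: min(centroids, key=lambda lab: dist(p, centroids[lab]))


def k_nearest_neighbours(train, k=5):
    """Predict by majority vote among the k closest training points."""
    def predict(p):
        nearest = sorted(train, key=lambda item: dist(p, item[0]))[:k]
        votes = [label for _, label in nearest]
        return max(sorted(set(votes)), key=votes.count)
    return predict


def accuracy(predict, test):
    """Fraction of held-out points whose predicted label is correct."""
    return sum(predict(p) == label for p, label in test) / len(test)


def compare_methods(data, train_fraction=0.7):
    """Train every method on the same split and score it on the same hold-out."""
    split = int(len(data) * train_fraction)
    train, test = data[:split], data[split:]
    methods = {"nearest_centroid": nearest_centroid, "knn": k_nearest_neighbours}
    return {name: accuracy(fit(train), test) for name, fit in methods.items()}


scores = compare_methods(make_synthetic_sample())
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

The point of the sketch is the shared train/test split: each method sees the same training sources and is scored on the same held-out sources, so the resulting numbers are directly comparable.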