Making proteomics data publicly accessible

An EU-funded initiative has provided a common framework and the infrastructure needed for mass spectrometry-based proteomics resources to cooperate.

Before the project started in 2011, the Proteomics Identifications (PRIDE) database and Peptide Atlas had the broadest target audience among existing proteomics repositories. PRIDE stores data as initially analysed by researchers, whereas data in Peptide Atlas are reprocessed with a focus on low protein false discovery rates. The EU-funded project 'International data exchange and data representation standards for proteomics' (PROTEOMEXCHANGE) was initiated to coordinate data deposition and dissemination to such major repositories. The project consortium originally formed in 2006 with the aim of developing infrastructure that enables researchers to submit data in a consistent, harmonised format and to access data that were already public.

With financial support from the Seventh Framework Programme (FP7), PROTEOMEXCHANGE partners established a standard framework for proteomics data submission and dissemination to PRIDE, Peptide Atlas and other repositories. The framework also delivers different 'views' of the deposited data, including the raw data and the processed results, all linked by a universally shared identifier. Individual resources can join PROTEOMEXCHANGE by following the membership agreement and implementing the data submission and metadata requirements.

Authors can cite the assigned accession number for data sets used in their publications. In addition, a Digital Object Identifier (DOI) is provided for data sets containing results in open standard formats. This way, data sets become publishable in their own right and can be tracked when reported in the scientific literature.

Since the framework's introduction in 2012, the number of submissions and downloads through PROTEOMEXCHANGE has steadily increased. The upturn is indicative of an ongoing shift in mentality in the field: in the last few years, data sharing has slowly become common among proteomics researchers. As the repositories have improved, researchers have become keener to submit their data. In 2014, approximately four data set submissions were received daily. The total number of data sets submitted exceeded 1 600, and the volume of downloads reached 150 terabytes. By providing the means of accessing data across the individual repositories, the PROTEOMEXCHANGE framework is expected to maximise their benefit to the scientific community.