Data generated by biomedical research are vast in volume, diverse in content and not available in standardised formats. A rational approach to reorganising subsets of these data into groups, so that data within the same group share common features, is essential to the successful mining of knowledge. Clustering is only a starting point in the case of high-dimensional data, where each dimension represents a distinct attribute (variable). Pattern recognition techniques that work well at lower dimensions often perform poorly as the dimensionality of the analysed data increases.

EU-funded scientists addressed the challenges of clustering high-dimensional data by focusing on low-dimensional structures that can approximate the given data. Within the PRINHDD (Pattern recognition in high dimensional data) project, they developed new methods of data analysis for species diversity and disease studies, among others.

Nearest neighbour methods were proposed for drawing inferences about spatial patterns. Two frequently studied spatial patterns among different species, or among groups defined by characteristics such as sex or livelihood status, are segregation and association. The researchers also tested patterns of reflexivity and cross-species correspondence.

Two distance-based segregation indices were used to evaluate the results of disease clustering among subjects from a homogeneous and an inhomogeneous population. The researchers investigated the sensitivity of the size of these tests to the underlying background pattern, the level of clustering and the number of clusters.

In addition, a new method promises to make more of the morphometric information contained in labelled cortical distance mapping data. Pooling and censoring the distances of grey matter voxels from the cerebral cortex surface can reveal differences in the planum temporale between schizophrenia and bipolar disorder patients.

PRINHDD research has been showcased in 11 papers published in prestigious peer-reviewed journals. During international conferences, the project team had the chance to communicate the results to scientists working in the field of high-dimensional data analysis and pattern recognition. Continuation of the PRINHDD project's work on classification and clustering through new collaborations is expected to extend both the statistical methods and their applications.
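
As a rough illustration of the nearest neighbour idea behind segregation and association, the sketch below counts, for each labelled point, the class of its nearest neighbour. It is not the project's actual method: the function name `nearest_neighbour_table`, the brute-force search and the toy coordinates are all assumptions made for this example. A table dominated by same-class pairs hints at segregation, while one dominated by mixed-class pairs hints at association; a real analysis would follow this with a formal test.

```python
import math
from collections import Counter


def nearest_neighbour_table(points, labels):
    """Count (point class, nearest-neighbour class) pairs by brute force.

    Hypothetical helper for illustration only; ties are broken in favour
    of the point that appears first in the input.
    """
    counts = Counter()
    for i, p in enumerate(points):
        best_j, best_d = None, math.inf
        for j, q in enumerate(points):
            if i == j:
                continue  # a point is not its own neighbour
            d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
            if d < best_d:
                best_j, best_d = j, d
        counts[(labels[i], labels[best_j])] += 1
    return counts


# Toy data, invented for this example: two well-separated groups.
segregated_points = [(0.0, 0.0), (0.1, 0.2), (0.4, 0.1),
                     (5.0, 5.0), (5.2, 5.1), (5.1, 4.7)]
segregated_labels = ["A", "A", "A", "B", "B", "B"]
segregated = nearest_neighbour_table(segregated_points, segregated_labels)

# Toy data, invented for this example: classes alternate along a line.
mixed_points = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (3.5, 0.0)]
mixed_labels = ["A", "B", "A", "B"]
mixed = nearest_neighbour_table(mixed_points, mixed_labels)

for name, table in (("segregated", segregated), ("mixed", mixed)):
    same = sum(n for (a, b), n in table.items() if a == b)
    print(name, "same-class neighbour pairs:", same, "of", sum(table.values()))
```

The quadratic search keeps the sketch dependency-free; for larger data sets a spatial index would be the more usual choice.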
Clustering, biomedical, data analysis, high-dimensional data, pattern recognition, PRINHDD