Pattern Recognition in High Dimensional Data

Final Report Summary - PRINHDD (Pattern Recognition in High Dimensional Data)

This project consisted of two phases. In the outgoing phase, Dr. Ceyhan was a research fellow at the Statistical and Applied Mathematical Sciences Institute (SAMSI), located at Research Triangle Park, North Carolina, USA, from August 1, 2013 to May 23, 2014. He attended the “Low-dimensional Structure in High-dimensional Systems (LDHD) Program” at SAMSI for a year. In the returning (or incoming) phase, he resumed his position as an Associate Professor at the Department of Mathematics, Koç University (KU), located in Istanbul, Turkey. In this phase, KU provided a pleasant and active research environment for Dr. Ceyhan, who is in the process of submitting the results and continuing the research started during his visit to SAMSI. He taught three classes (in the returning phase) and attended various conferences and seminars (in both phases) in line with the main objectives outlined below.
This project has five main objectives:
(I) To conduct research towards the research objectives listed in the proposal (Part B1), which are:
(R1) Designing a novel graph-based recognition method,
(R2) Designing new methods where dimension reduction and pattern recognition are performed simultaneously,
(R3) Stochastic subspace search, and
(R4) Designing new hybrid recognition methods.
(II) To acquire knowledge about current topics in high dimensional data analysis (especially pattern recognition) while visiting SAMSI,
(III) To realize the training objectives listed in the proposal (Part B2),
(IV) To start new collaborations with colleagues located outside the EU, and
(V) To disseminate the acquired expertise in Turkey and other European countries upon returning to the host organization (KU) in Istanbul.
The main objectives (II)-(IV) are mostly relevant to the outgoing phase, whereas (I) is relevant to both phases, and (V) is mostly relevant to the incoming phase. In the incoming phase, Dr. Ceyhan focused on disseminating the knowledge acquired and on conducting research on the previous and newly adopted research topics. In particular, he conducted research in the areas of “classification and clustering”, “statistical inference for low-dimensional structures”, “medical image analysis”, “optimal obstacle placement”, and “modeling infectious diseases”.
One of the goals of Dr. Ceyhan’s visit to SAMSI was to bring him up to date with the cutting-edge methods in high dimensional data analysis and pattern recognition, and to promote the exchange of ideas in order to initiate new research with the regular members and visitors of SAMSI. Among the topics of the LDHD Program, Dr. Ceyhan’s project was most relevant to “classification and clustering” and “statistical inference for low-dimensional structures”. He attended nine major workshops as well as several lectures and seminars, which were very informative and useful, and he participated in working groups, discussions, and graduate courses at SAMSI. Furthermore, Dr. Ceyhan started new research collaborations with colleagues from the USA, China and Canada.
The general themes of Dr. Ceyhan’s proposed project were (i) to acquire the cutting-edge tools for pattern recognition of high dimensional data by attending the LDHD Program at SAMSI and (ii) to develop new methods to address the challenges of high dimensional recognition. Theme (i) pertains to the outgoing phase, while theme (ii) pertains to both phases. The project consisted of several research and training objectives and tasks listed in his proposal (Parts B1 and B2, respectively). In both phases, Dr. Ceyhan worked along these objectives, which are mostly related to recognition of high dimensional data and dimension reduction, and concentrated on the research objectives (R1) and (R4). He made significant progress on research objective (R1) by investigating the theoretical foundations of the newly developed method in collaboration with some of his colleagues. For example, his joint work with researchers from the USA and China yielded an article which is to be revised and resubmitted. Moreover, his graph-based method, which was designed in the returning phase, yielded a draft which is almost ready for submission; the suggested method is robust to nonstandard data and to imbalance in the class sizes, and works well in higher dimensions and in data condensing. For objective (R4), the hybrid methods he is introducing use one classifier (e.g. the graph-based classifier) in part of the data support and some other classifier(s), such as the k-NN classifier, in the remaining parts.
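The hybrid idea described above can be illustrated with a minimal sketch. The report does not specify the graph-based classifier, so the code below substitutes a simple stand-in that is an assumption of this example: each training point gets a ball whose radius is its distance to the nearest point of another class, and a query covered by balls of exactly one class takes that class. Everywhere else the sketch falls back to a k-NN vote. All function names and the toy data are hypothetical and are not taken from the project.

```python
import math
from collections import Counter

def fit_balls(X, y):
    """Stand-in for a graph-based classifier (an assumption, not the project's method).
    Each training point gets a ball with radius equal to its distance to the
    nearest training point of a different class. Needs at least two classes."""
    balls = []
    for xi, yi in zip(X, y):
        r = min(math.dist(xi, xj) for xj, yj in zip(X, y) if yj != yi)
        balls.append((xi, r, yi))
    return balls

def knn_predict(X, y, q, k=3):
    """Plain k-NN majority vote over the k closest training points."""
    nearest = sorted(zip(X, y), key=lambda p: math.dist(p[0], q))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def hybrid_predict(X, y, balls, q, k=3):
    """Use the ball rule where exactly one class covers q, otherwise k-NN.
    Returns (predicted label, name of the rule that was used)."""
    covering = {label for c, r, label in balls if math.dist(c, q) < r}
    if len(covering) == 1:
        return covering.pop(), "ball"
    return knn_predict(X, y, q, k), "knn"

# Toy two-class data in R^2 (hypothetical).
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
y = [0, 0, 0, 1, 1, 1]
balls = fit_balls(X, y)

for q in [(-1.0, -1.0), (7.0, 7.0), (3.5, 3.5), (20.0, 20.0)]:
    print(q, hybrid_predict(X, y, balls, q))
```

In this toy setting, queries that only one class's balls reach are decided by the ball rule, while queries in the overlap between the classes or far from all training points are decided by k-NN. That split of the data support between two classifiers is the point of the sketch.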
As for the clustering aspect of pattern recognition, he has conducted extensive research on clustering of data in lower dimensions (especially in R^2), and the results are published in four journal articles (articles (A1), (A2), (A4) and (A11) in the “Dissemination Activities” Section). In particular, article (A2) is on the clustering of (possibly infectious) diseases and is published in the prestigious journal “Statistics in Medicine”. Articles (A1) and (A11) are on spatial clustering of points from multiple classes in R^2. Articles B(3) and C(3) introduce new clustering patterns, namely “reflexivity” and “species correspondence”, respectively, and propose methods to test for and detect these multi-class patterns. Among these journals, SERRA, Environmental and Ecological Statistics and SORT are high quality journals. His graph-based research resulted in five articles (see articles A(3), A(10), B(1), C(2), and C(4) in the Dissemination Activities). These articles are mostly on the theory of the graph-based clustering he has introduced. Among these journals, TEST, Statistics, REVSTAT and Statistical Methodology are reputed journals. Furthermore, he has worked on high dimensional imaging data for inferential purposes, and the results are likewise published in five articles (articles (A5)-(A7), A(9) and C(1) in the Dissemination Activities). Among these, article (A5) introduces a novel method of censoring the localized morphometric measures of brain tissues and is published in the promising journal “Frontiers in Neurology”, and C(1) proposes a method for testing for and detecting morphometric variability in brain tissues per diagnosis. The other articles are on applications of his statistical methods to imaging data.
Although he has initiated work on the research objectives (R2) and (R3), these efforts have not yet yielded any publishable results. The project goals and implementation underwent mild modifications. His clustering methods perform very well in low(er) dimensions, and he has devised various inferential tools for analyzing high dimensional (imaging) data sets. These are highly relevant to the main purpose of the project. Furthermore, he has started working at the other end of pattern recognition, namely supervised learning or classification using graph theoretical invariants (one article is almost ready for submission and another one is in preparation).
This project supported Dr. Ceyhan’s career by enabling him to: (1) take a leading role in international research groups (e.g. he participates in three working groups of the Global Young Academy, GYA (http://globalyoungacademy.net)); (2) gain more experience in journal publication and peer reviewing (he published 11 articles during the project); (3) increase the number and scope of his international research networks and collaborations; and (4) build a solid theoretical and methodological foundation for forming an interdisciplinary and collaborative research group (currently consisting of four PhD students) at the return institution. Thus, the project helped Dr. Ceyhan immensely to realize his full potential as a European researcher and to strengthen his independent research position in a European university.
Dr. Ceyhan attended 13 international conferences, during which he had a chance to meet and communicate with many researchers, with some of whom he started new research collaborations. In the outgoing phase, he gave three presentations (two at a conference, one at a university), since in that phase the main emphasis was on training. In the incoming phase, however, one of the main emphases was the dissemination of the research outputs (resulting from both phases). Hence Dr. Ceyhan presented his work at five international conferences during his return period and plans to attend four more (at three of which he will present his research, including one at which he is an invited session presenter). He also gave lectures on topics relevant to this project in order to train graduate students in the field. Along this line, he held weekly graduate seminars throughout the returning phase. This helped him and his students substantially to progress in their research activities.
Dr. Ceyhan is active in the statistics community that organizes conferences in Europe bringing together researchers who work in the broad areas of probability and statistics (including high dimensional data analysis and recognition). For example, he is a board member of the International Association for Statistical Computing, European Regional Section (IASC-ERS), which organizes the biennial CompStat meetings. Furthermore, he will organize an Invited Paper Meeting (IPM) on graph-based classifiers at the IC-SMHD-2016 conference. Ultimately, Dr. Ceyhan's goal is to excel in his career and to contribute to European competitiveness in mathematical sciences, and the IOF provided invaluable support in attaining this goal.