Final Report Summary - NMR-SBA (AUTOMATED NMR STRUCTURE-BASED ASSIGNMENTS)

Protein structure determination is essential for understanding a protein's function and for developing new drugs that target it. One of the major experimental techniques for determining protein structure in solution is Nuclear Magnetic Resonance (NMR) spectroscopy. Combating major diseases such as cancer, cardiovascular diseases and infectious diseases requires NMR analysis of several large proteins, which in turn requires a suite of tools for processing NMR spectra. One of the bottlenecks in the analysis of NMR spectra is the assignment problem. This problem becomes easier when a template protein that is homologous to the target protein is available. The corresponding problem is called structure-based assignment.

The goal of this project was to develop automated NMR structure-based assignment software for analyzing NMR data. Reaching this goal required several milestones, which can be summarized as follows:
1- The development of algorithms to tackle proteins of various sizes,
2- The implementation of these algorithms, along with tests to determine the best parameters to employ in them,
3- The incorporation of new types of NMR data into the software in order to increase its usability,
4- The extraction of more information from the existing data sources in order to enhance the performance of the algorithms,
5- The testing of these algorithms on synthetic and real data, including large proteins,
6- The release of the source code of the algorithms and the data on the internet.

The main results of the project are three algorithms, with corresponding software, that tackle the assignment of proteins in the presence of a template. These algorithms are based on an existing framework called Nuclear Vector Replacement (NVR), developed in Prof. Bruce Donald's laboratory at Duke University. The newly developed programs are called NVR-BIP, NVR-TS and NVR-ACO.
The first of these approaches, NVR-BIP, employs an optimization technique called binary integer programming (BIP) to solve exactly the minimization problem, subject to binary constraints, that lies at the core of the assignment problem. This approach significantly improved the assignment accuracy of the NVR tool on new proteins. However, it was unable to solve the assignments for large proteins because of the enormous number of variables and constraints that arise in those cases. For these cases we developed two approximate (metaheuristic-based) algorithms, one based on tabu search (TS) and the other on ant colony optimization (ACO). These approaches successfully solved the assignments for large proteins.

We also incorporated new sources of data into the algorithm and extracted more information from the data already used by the software. The new data types include residual dipolar couplings (RDCs) for various bond vectors and 4D-NOESY data. In addition, the NOE upper bounds and NOE thresholds are now extracted automatically from the data, which automates the algorithm. Previous incarnations of the algorithm needed these parameters to be set manually, which made it difficult to run the software on novel proteins.

The data on which we tested our software came from public databases as well as from collaborators, including Dr. Yang from the National University of Singapore and Dr. Donald from Duke University. When a particular data type was not available, we used synthetic data instead.

Towards the end of the project our group won a bilateral grant with the research group of Dr. Guittet from CNRS to apply our software to the data obtained in Dr. Guittet's laboratory. The CNRS group is involved in a decade-long, multimillion-euro project that brings together high-profile biologists, chemists and physico-chemists to collectively explore new therapeutic avenues.
In collaboration with the group of Dr. Guittet, our software will contribute to the study of large proteins for the development of new antibiotics against antibiotic-resistant Gram-negative bacteria.

The project website is at http://compbio.sehir.edu.tr. The source code has been deposited on GitHub and is available at https://github.com/msapaydin/NVR. The contact person for the project is Dr. Mehmet Serkan Apaydın (email: firstname.lastname@example.org).
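
For readers unfamiliar with the formulation, the binary-constrained minimization that NVR-BIP solves exactly can be illustrated on a toy instance. The sketch below is not taken from the NVR code base. It assumes a hypothetical cost matrix (mismatch between each peak and each template residue), a hypothetical inter-residue distance matrix, and a single NOE-style upper bound, and it replaces the BIP solver with plain enumeration, which is only feasible for very small inputs. All names and numbers are illustrative assumptions.

```python
from itertools import permutations


def solve_assignment(cost, noe_pairs=(), distance=None, max_distance=None):
    """Exhaustively find the minimum-cost one-to-one peak-to-residue assignment.

    cost[i][j]   : hypothetical mismatch score for assigning peak i to residue j
    noe_pairs    : pairs of peaks (i, k) assumed to be close in space
    distance     : hypothetical inter-residue distances in the template
    max_distance : upper bound on the distance between residues assigned
                   to the two peaks of an NOE pair
    """
    n = len(cost)
    best_total, best_assignment = float("inf"), None
    # Each permutation is one feasible setting of the binary variables:
    # perm[i] == j means x[i][j] = 1, with exactly one 1 per row and column.
    for perm in permutations(range(n)):
        if distance is not None and max_distance is not None:
            # Discard assignments that violate any NOE-style distance bound.
            if any(distance[perm[i]][perm[k]] > max_distance
                   for i, k in noe_pairs):
                continue
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_assignment = total, perm
    return best_assignment, best_total


# Toy data (entirely made up for illustration).
cost = [[1.0, 4.0, 5.0],
        [3.0, 0.5, 6.0],
        [4.0, 5.0, 0.2]]
distance = [[0.0, 3.0, 9.0],
            [3.0, 0.0, 4.0],
            [9.0, 4.0, 0.0]]

unconstrained = solve_assignment(cost)
constrained = solve_assignment(cost, noe_pairs=[(0, 2)],
                               distance=distance, max_distance=5.0)
print(unconstrained)
print(constrained)
```

The number of candidate assignments grows factorially with the number of peaks, which gives an informal sense of why an exact approach becomes impractical for large proteins and why approximate searches such as tabu search or ant colony optimization become attractive.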