CORDIS - EU research results

Understanding the Language of Life: Identifying and Characterizing the Language Units in Protein Sequences

Project description

Understanding protein sequences

Proteins are fundamental to life. They can be represented as text through their amino acid sequences. Although the "language of life" is still not fully understood, natural language processing has enabled major advances in the study of proteins. The LifeLU project, funded by the European Research Council, aims to advance language processing research and explore new frontiers in understanding the language of life. To that end, it will devise innovative methods to identify the language units within the language of life and examine their characteristics and variability across species. It also aims to develop pioneering techniques to identify and analyse the functions of these language units. The LifeLU team will work towards deciphering the language of life, with the ultimate goal of developing innovative approaches to the prevention, diagnosis and treatment of diseases.

Objective

Proteins play a key role in biological processes that govern and maintain life. Although they are three-dimensional entities, they can be represented in textual form as sequences of amino acids that largely determine their structures and functions. By analogy with natural (human) languages, we can consider proteins as written in a language, which we refer to in this proposal as the "language of life". Natural languages can be read and understood by humans. However, we cannot yet understand the language of life. We do not even know what the vocabulary is, i.e. what the basic language units are (analogous to words in human languages). Textual representation of proteins has enabled the application of natural language processing (NLP) techniques to the study of proteins, and breakthrough results have been achieved in various downstream tasks such as protein structure prediction. However, these efforts remain only at the "processing level" of the language of life. The main goal of this project is to go beyond the level of language processing and open new research horizons for understanding the language of life. Using my expertise in NLP and bioinformatics, I will pursue the following objectives: (i) develop innovative methods to determine the language units (i.e. the vocabulary) of the language of life; (ii) identify the characteristics of this language as well as its variability among species; (iii) develop novel methods to identify and characterize the functions of the language units. This research will lay the foundation for a new field of research, molecular language understanding, which aims to develop methods for understanding the messages encoded in molecular sequences. The ultimate goal of this project is to decipher the language of life, which will lead to groundbreaking consequences for understanding life and health, and will shed light on the development of novel prevention, diagnosis, and treatment strategies for diseases.

Funding scheme

HORIZON-ERC - HORIZON ERC Grants

Host institution

BOGAZICI UNIVERSITESI
Net EU contribution
€ 1 982 800,00
Address
BEBEK
34342 Istanbul
Türkiye


Region
İstanbul > İstanbul > İstanbul
Activity type
Higher or Secondary Education Establishments
Total cost
€ 1 982 800,00

Beneficiaries (1)