Periodic Reporting for period 1 - PGEN (Automated evaluation and correction of generation bias in immune receptor repertoires)
Reporting period: 2019-03-01 to 2020-11-30
ERC Recognize led to algorithmic advances in modelling VDJ recombination and the subsequent selection of T and B cell receptors, providing useful tools to analyze and compare immune repertoires across time, individuals, and tissues. The goal of PGEN was to build a proof of concept based on the ERC Recognize algorithm and to develop general-purpose software that physicians and biologists without bioinformatics training can use to analyze immune repertoire data, giving them statistics that help them make informed decisions.
The outcomes of PGEN are two main software packages. The first, SOS, is a web-based interface where users with no coding skills can compute the generation and post-selection probabilities of their sequences, as well as generate batches of synthetic sequences. The second, pygor3, is a more advanced Python package and suite of command-line tools (installable with a single command through the Python Package Index) for evaluating the generation probability of large datasets, learning new models for new datasets and species, and plotting and analyzing the results.
Pygor3 provides a Python interface that executes IGoR V(D)J recombination runs and encapsulates their inputs and outputs in a single sqlite3 database file. This file contains the input sequences, alignments, model parameters, conditional probabilities of the model's Bayesian network, best scenarios, and generation probabilities. Pygor3 also has command-line utilities to import and export IGoR-generated files in the AIRR standard format. It comes with a set of tutorials and examples that guide the user from relatively simple tasks, such as plotting existing models and evaluating these models on their own sequences, to learning a whole new model for a new species, with a tool that automatically downloads the species' genomic data from the IMGT database.
Pygor3 also ships with pre-existing models for the alpha and beta chains of human and mouse T-cell receptors and for the light and heavy chains of human and mouse B-cell receptors. SOS can be used on a mobile phone as an app, a feature requested by laboratory users who need to quickly look up the generation probability of a single sequence.
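The single-file database design described above can be illustrated with a minimal sketch using Python's standard sqlite3 module. This is not the pygor3 schema or API: the table names, column names, and helper functions below are hypothetical, and the sequences and probability values are placeholders rather than real results. The sketch only shows why keeping sequences and their generation probabilities in one file makes queries such as "which sequences are least likely to be generated" straightforward.

```python
import sqlite3


def build_demo_db(path=":memory:"):
    """Create a toy single-file repertoire database (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE sequences (
            seq_index   INTEGER PRIMARY KEY,
            nt_sequence TEXT NOT NULL
        );
        CREATE TABLE pgen (
            seq_index     INTEGER PRIMARY KEY REFERENCES sequences(seq_index),
            pgen_estimate REAL NOT NULL
        );
        """
    )
    return conn


def add_sequence(conn, seq_index, nt_sequence, pgen_estimate):
    """Store one sequence together with its generation probability."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO sequences VALUES (?, ?)", (seq_index, nt_sequence)
        )
        conn.execute(
            "INSERT INTO pgen VALUES (?, ?)", (seq_index, pgen_estimate)
        )


def rarest_sequences(conn, limit=10):
    """Return (index, sequence, pgen) rows ordered from least to most probable."""
    cur = conn.execute(
        """
        SELECT s.seq_index, s.nt_sequence, p.pgen_estimate
        FROM sequences AS s
        JOIN pgen AS p ON p.seq_index = s.seq_index
        ORDER BY p.pgen_estimate ASC
        LIMIT ?
        """,
        (limit,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    # Placeholder sequences and probabilities, for illustration only.
    add_sequence(conn, 0, "TGTGCCAGC", 1e-9)
    add_sequence(conn, 1, "TGTGCCTGG", 3e-12)
    for row in rarest_sequences(conn, limit=5):
        print(row)
```

Passing a file path instead of ":memory:" to build_demo_db would write everything to one portable file, which is the property the report attributes to the pygor3 database.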