
Harnessing the power of bioinformatic analysis to improve genetic selection for fertility in dairy cows

Periodic Reporting for period 1 - Bioinformatics4Breeding (Harnessing the power of bioinformatic analysis to improve genetic selection for fertility in dairy cows)

Reporting period: 2016-05-01 to 2018-04-30

The future of a sustainable dairy industry depends on increasing production efficiency per cow. Economic analyses show that the profitability of dairy enterprises increases with greater longevity and fertility; however, disease and the need to adapt to different environmental conditions substantially reduce the average lifespan of cattle. Addressing these interlinked problems requires genetic selection for more fertile and robust, and therefore longer-lasting, cows.
The first objective was for the fellow to receive intensive training in diverse bioinformatic methodologies (ranging from genome alignment to the detection of evolutionarily conserved elements), enabling two main comparative genomics projects to be carried out: i) enrichment of conserved non-coding elements and transcription factor binding sites in ruminant genomes, and ii) whole-genome resequencing of two phylogenetically distinct cattle breeds that are both highly adapted to harsh environments.
The aim of the first project was to explore the potential role of conserved non-coding elements in ruminant evolution, gene regulation, and the formation of phenotypic traits. Transcription factor binding sites were compared across the genome sequences of several mammalian species, and several ruminant-specific motifs were found to be enriched in the regulatory domains of genes involved in reproduction.
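The report does not state which statistical test underlies this motif enrichment. As a hedged illustration only, a common choice for this kind of question is a one-sided hypergeometric test: among all genes with a regulatory domain, are genes whose domain carries a given motif over-represented among reproduction genes? The function names and all counts below are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """One-sided enrichment p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total genes in the (hypothetical) background
    K: genes annotated to the category of interest, e.g. reproduction
    n: genes whose regulatory domain carries the motif
    k: genes that are both in the category and carry the motif
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probabilities of observing k, k+1, ..., upper overlapping genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / total


def fold_enrichment(k: int, N: int, K: int, n: int) -> float:
    """Observed overlap fraction divided by the background fraction."""
    return (k / n) / (K / N)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    N, K, n, k = 20000, 400, 1500, 60
    print(f"fold enrichment = {fold_enrichment(k, N, K, n):.2f}")
    print(f"p = {hypergeom_sf(k, N, K, n):.3g}")
```

Exact integer arithmetic via `math.comb` avoids numerical underflow in the tail sum; with many motifs tested, the resulting p-values would also need a multiple-testing correction.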
In the second project, the genomes of two cattle breeds that are highly adapted to a harsh (cold) environment and resistant to various pathogens were compared with each other and with the genomes of other breeds. Whole genomes of the native Yakut and Kholmogory breeds were resequenced and contrasted with those of the Holstein and Hanwoo breeds to identify signatures of selection in genomic regions potentially responsible for adaptation to harsh climates and for disease resistance. Detecting genomic regions under selection pinpointed potential causative SNPs that might later be used as markers to select for adaptation to harsh environments. Such markers could help adapt other European breeds to “fast-acting” climate change and improve their disease resistance, leading to longer-lasting cows.
Objective 1: Bioinformatics training

During the first year, the fellow learned both the Bash and Python programming languages and was trained in bioinformatic tools ranging from genome alignment to the detection of evolutionarily conserved elements. The Marie Curie fellowship allowed her to acquire new skills and become confident with a diverse range of bioinformatic tools; she also had the opportunity to analyse large data sets and perform various analyses, such as alignment of next-generation sequencing (NGS) data, SNP detection and annotation.

Objective 2: Data preparation

The fellow prepared databases of phenotypic and genomic data from different cattle populations for downstream analyses such as the detection of conserved elements, multiple genome alignment, and SNP detection. She became confident in accessing data from different repositories (NCBI, UCSC, Ensembl) and in converting data formats to meet specific software requirements.
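Converting between repository conventions is a typical part of this step. A minimal sketch, assuming BED intervals downloaded from UCSC (0-based, half-open coordinates, chromosome names such as `chr1`) that must be passed to a tool expecting Ensembl-style names (`1`) and 1-based, closed coordinates as in GFF or VCF. The helper names are hypothetical; the `chrM` to `MT` rule reflects the usual mitochondrial naming difference, while unplaced scaffolds generally need an assembly-specific mapping table instead.

```python
def ucsc_to_ensembl_chrom(name: str) -> str:
    """Map a UCSC-style chromosome name to Ensembl style (chr1 -> 1, chrM -> MT)."""
    if name == "chrM":
        return "MT"
    return name[3:] if name.startswith("chr") else name


def bed_to_one_based(chrom: str, start: int, end: int) -> tuple[str, int, int]:
    """BED [start, end) is 0-based, half-open; [start + 1, end] 1-based, closed covers the same bases."""
    return ucsc_to_ensembl_chrom(chrom), start + 1, end


def convert_bed_line(line: str) -> str:
    """Convert one tab-separated BED line, keeping any extra columns unchanged."""
    chrom, start, end, *rest = line.rstrip("\n").split("\t")
    new_chrom, new_start, new_end = bed_to_one_based(chrom, int(start), int(end))
    return "\t".join([new_chrom, str(new_start), str(new_end), *rest])


if __name__ == "__main__":
    # Hypothetical record for a conserved element on cattle chromosome 1.
    print(convert_bed_line("chr1\t0\t100\tCNE_1"))
```

Only the start coordinate shifts: the end stays the same because a half-open 0-based end and a closed 1-based end point at the same last base.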

Objective 3: Data Analysis & Comparative genomics

i) enrichment for conserved non-coding elements and transcription factor binding sites in ruminant genomes.

A total of 28 mammalian species, with genomes obtained from public repositories, were selected and divided into three clades: ruminants (10 species), cetartiodactyls (5 additional species), and other mammals (13 additional species). Clade-specific conserved non-coding elements (CNEs) were defined from the multiple genome alignment. A scan for transcription factor binding sites (TFBSs) predicted over 900 million TFBSs across the cattle genome. Their potential involvement in gene regulation was analysed by focusing on TFBSs that lie in clade-specific gene regulatory domains (regions near genes) and overlap CNEs. Comparing these CNE-overlapping TFBSs in regulatory domains among the three clades highlighted 56, 15, and 5 unique motifs in mammals, cetartiodactyls, and ruminants, respectively. Gene Ontology (GO) analysis of clade-specific regulatory domains confirmed the ancestral nature of the mammalian CNEs and pinpointed a few GO terms enriched only in the ruminant clade. Our results demonstrate that gene regulation in the ruminant clade has changed relative to other mammals owing to changes in the TFBS pattern, including TFBSs formed within clade-specific CNEs. Expression data from additional individuals and species are needed to better understand the potential role of clade-specific CNEs and TFBSs in gene regulation and to reveal the specific traits affected.
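The overlap filter and the clade comparison described above can be sketched as follows. This is an illustrative, brute-force version under stated assumptions: intervals are half-open `(chrom, start, end)` tuples, each TFBS prediction carries a motif name, and all function names, motif names and coordinates are hypothetical. At the scale reported here (over 900 million predicted sites), a dedicated interval tool such as `bedtools intersect` or an interval tree would be used instead of nested loops.

```python
Interval = tuple[str, int, int]          # (chrom, start, end), half-open
Site = tuple[str, int, int, str]         # (chrom, start, end, motif)


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Half-open intervals [a_start, a_end) and [b_start, b_end) share at least one base."""
    return a_start < b_end and b_start < a_end


def motifs_in_cnes(tfbs: list[Site], cnes: list[Interval], domains: list[Interval]) -> set[str]:
    """Motifs with at least one site that overlaps a CNE and lies in a regulatory domain."""
    found = set()
    for chrom, s, e, motif in tfbs:
        in_cne = any(c == chrom and overlaps(s, e, cs, ce) for c, cs, ce in cnes)
        in_domain = any(c == chrom and overlaps(s, e, ds, de) for c, ds, de in domains)
        if in_cne and in_domain:
            found.add(motif)
    return found


def clade_unique(motif_sets: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each clade, keep the motifs that no other clade has."""
    unique = {}
    for clade, motifs in motif_sets.items():
        others = set().union(*(m for c, m in motif_sets.items() if c != clade))
        unique[clade] = motifs - others
    return unique


if __name__ == "__main__":
    # Hypothetical toy data.
    tfbs = [("chr1", 100, 110, "MOTIF_A"), ("chr1", 500, 510, "MOTIF_B")]
    cnes = [("chr1", 95, 120), ("chr1", 505, 520)]
    domains = [("chr1", 0, 300)]
    print(motifs_in_cnes(tfbs, cnes, domains))
```

Running `motifs_in_cnes` once per clade (with that clade's CNEs and regulatory domains) and passing the resulting sets to `clade_unique` gives per-clade counts of unique motifs of the kind reported above.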

ii) whole-genome resequencing of two phylogenetically distinct cattle breeds highly adapted to harsh environments.

A total of 20 individuals from two phylogenetically distinct cattle breeds adapted to cold environments (Yakut and Kholmogory) were resequenced. More than 25 million variants were identified in these genomes. Population history analysis revealed that the ancestral population split into two around 100 years ago, with effective population sizes of about 3000 and 8000 in Yakut and Kholmogory, respectively. To identify genomic regions under selection, two tools that use different models to detect signatures of selection (hapFLK and DCMS) were applied; combining them is expected to increase the power to detect regions under selection. Putative candidate genes and causative variants in the selected regions might further be used to predict phenotypic traits potentially associated with adaptation to harsh environments and disease resistance.
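DCMS (de-correlated composite of multiple signals) combines several selection statistics into one score per locus. As we understand its published formulation, each statistic is converted to an empirical p-value p (its fractional rank), transformed as log(1 - p) - log(p), and down-weighted by the sum of its absolute correlations with all statistics (including itself, r = 1), so that redundant statistics do not dominate. The sketch below follows that description in plain Python; the use of Pearson correlation on the raw statistics, the rank convention and the function names are assumptions, and hapFLK itself is not reimplemented here.

```python
import math


def empirical_p(values: list[float]) -> list[float]:
    """Fractional-rank p-values: the largest value gets the smallest p.

    Dividing by (n + 1) keeps every p strictly inside (0, 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k], reverse=True)
    p = [0.0] * n
    for rank, k in enumerate(order, start=1):
        p[k] = rank / (n + 1)
    return p


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dcms(stats: list[list[float]]) -> list[float]:
    """stats[i][l] is the value of statistic i at locus l; returns one score per locus."""
    m, n_loci = len(stats), len(stats[0])
    pvals = [empirical_p(s) for s in stats]
    # weight_i = 1 / sum_j |r_ij|, with r_ii = 1
    weights = []
    for i in range(m):
        total = sum(1.0 if i == j else abs(pearson(stats[i], stats[j])) for j in range(m))
        weights.append(1.0 / total)
    scores = []
    for l in range(n_loci):
        scores.append(sum(
            weights[i] * (math.log(1 - pvals[i][l]) - math.log(pvals[i][l]))
            for i in range(m)
        ))
    return scores


if __name__ == "__main__":
    # Two hypothetical statistics over four loci; locus 3 is extreme in both.
    print(dcms([[1.0, 2.0, 3.0, 10.0], [0.5, 1.0, 4.0, 9.0]]))
```

Because two perfectly correlated statistics each receive weight 1/2, duplicating a statistic leaves the composite score unchanged, which is the behaviour the de-correlation weighting is meant to provide.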

Objective 4: Dissemination and public engagement

The researcher attended the 2017 ISAG conference, giving both an oral and a poster presentation: Transcription factor binding sites enrichment in ruminant and cetartiodactyl specific conserved non-coding elements. Laura Buggiotti*, Marta Farrè, and Denis Larkin, Royal Veterinary College, London, United Kingdom. Proceedings of the 36th International Conference on Animal Genetics, Dublin, Ireland (2017). The genomes sequenced as part of this project will be shared with the 1000 Bull Genomes Project to ensure their use by the wider international community in searches for QTLs, QTNs, and Mendelian trait variants. The raw data will also be submitted to the NCBI and EBI nucleotide archives for easy access.
The fellow learned state-of-the-art techniques in genetic association analysis and comparative molecular genomics and became confident in using a diverse range of bioinformatic tools. She acquired the knowledge to carry out independently the two main projects of this Marie Curie action. The results of one project have been presented at an international conference as both an oral and a poster presentation, and a manuscript is in preparation for submission; the other project requires some further analysis before it can be prepared for publication. Sharing the data with the international community means that the resources built during this fellowship could benefit multiple projects in different countries, including in the EU.