Collaborative Oncological Gene-environment Study

Final Report Summary - COGS (Collaborative Oncological Gene-environment Study)

Executive Summary:
The COGS project was initiated with the aim of identifying genetic determinants of breast, ovarian and prostate cancer. Furthermore, we wanted to identify the lifestyle factors that influence the risk of these cancers and to explore possible interactions between inherited and lifestyle factors. We also wanted to identify the genetic influence on tumour type and on the prognosis of the diseases.
The information generated could be used to stratify individuals according to their risk of cancer, thus enabling more efficient prevention and screening. Finally, we investigated the challenges of using risk-based strategies in prevention and screening, and the ethical, legal and social implications these new approaches will have.
COGS generated the largest data set ever seen in cancer research, including 239,832 individuals from 167 research groups from all over the world. We created our own custom-made genotyping array, iCOGS, including 211,155 single nucleotide polymorphisms (SNPs). This was done through a lengthy and rigorous procedure in which we used a combination of existing knowledge and previously genotyped datasets.
In all, 104 papers have been published based on COGS data. The main finding is the identification of new susceptibility markers for risk of breast, ovarian and prostate cancer. Before COGS started there were 73 markers for risk of these cancers (Table 1). Through COGS an additional 154 SNPs have been identified which means that COGS has contributed 68% of the currently known SNPs that influence the risk of breast, ovarian and prostate cancer.
Table 1. Number of established SNPs pre- and post the iCOGS analyses.

                  Pre-iCOGS   iCOGS, phase I   iCOGS, phase II   TOTAL
Breast cancer     27          45               47                119
Prostate cancer   42          26               22                90
Ovarian cancer    4           8                6                 18
TOTAL             73          79               75                227

The major implication of the COGS results is that we are far better placed today than 5 years ago to predict who will later in life be diagnosed with any of these three major types of cancer. We have shown that this knowledge could be used for risk stratification, which in turn could improve our ability to prevent cancer and/or detect the disease at an early stage.

Project Context and Objectives:
The overarching goal of COGS was to identify individuals with an increased risk of breast, ovarian and prostate cancer. Furthermore, we wanted to evaluate the effect of inherited genetic variation on tumour characteristics and clinical outcome. We wanted to do this by quantifying the role of genetic and environmental/lifestyle risk factors in the largest data set ever generated. In all, we included 239,832 individuals from 167 research groups in COGS. We have generated detailed knowledge of the architecture of genetic alterations and their interactions with environmental/lifestyle factors, which will result in much more accurate individual risk prediction and improved intervention strategies than ever before. In COGS we also investigated the challenges of using risk-based strategies in prevention and screening, and the ethical, legal and social implications these new approaches will have.
The overall objectives of COGS were:
1. To determine the important common genetic variants that underlie breast, ovarian and prostate cancer risk, and to estimate their effects on risk, individually and in combination.
2. To assess interaction between genetic loci and known or suspected environmental/lifestyle risk factors, i.e. to examine whether environmental/lifestyle risk factors modify genetic susceptibility to breast, ovarian and prostate cancer.
3. To assess whether the association between genetic factors, environmental/lifestyle risk factors and cancer risk is stronger for certain tumour subtypes, and whether these factors affect clinical outcome.
4. To develop comprehensive risk models including genetic and environmental/lifestyle factors for these cancers, to allow the prediction of breast, ovarian and prostate cancer among individuals in the population at large.
5. To investigate the efficacy and cost-effectiveness of using these risk models in prevention strategies, and the associated organizational, ethical, legal and social implications.

Project Results:
Below follows a short summary of how the project was organised and of the work performed, followed by the main results.
The goals of COGS are to determine the individual risk of breast, ovarian and prostate cancer, to assess interaction between genetic loci and known lifestyle risk factors, to estimate whether the association between genetic factors, environmental/lifestyle risk factors and cancer risk is stronger for certain tumour subtypes, and to establish whether there are genetic polymorphisms that affect clinical outcome.
As a second step, within COGS comprehensive risk models, including genetic and environmental/lifestyle factors for these cancers will be generated. This will allow the prediction of breast, ovarian and prostate cancer among individuals in the population at large. Included in COGS is the investigation of the efficacy and cost-effectiveness of using these risk models in prevention strategies, and the associated organisational, ethical, legal and social implications.
Central to COGS were four large well-established consortia. The main focus of these consortia has been to identify individuals with an increased risk of being diagnosed with breast, ovarian or prostate cancer.
COGS was organized in seven work packages (WPs). WP1 was the management WP, and WP2 was responsible for the statistical analyses to identify the genetic loci incorporated into the tailor-made genotyping chip generated within the consortium. WP2 was also responsible for constructing a central database for each cancer to hold individual phenotype and SNP data, including genome-wide data from genetic association studies.
The responsibility of WP3 was to gather all samples for the second genotyping step, to isolate, quantify and normalise DNA, and to distribute samples to the genotyping centres that performed the actual analyses. Three centres carried out the genotyping. The first year was spent on quantifying and organising the DNA received from groups all over the world. The responsibility of WP3 ended in year 3, when genotyping was completed.
The responsibility of WP4 was to prepare for fine mapping of the genetic regions associated with any of the three cancers in search of the causal variants behind the associations. On average, approximately 500 SNPs have been selected in the vicinity of each of the 55 confirmed loci in which the causative mutation for the relevant cancer must lie.
The responsibility of WP5 was to gather lifestyle and clinical data for all the study participants. Questionnaires from consortia members were collected and posted on the COGS website. A data dictionary of established and suspected lifestyle/environmental risk factors for each cancer site, including variable definitions and coding, was established and placed on the COGS website. A database with harmonized epidemiologic data for each of the different cancers and for BRCA1/2 mutation carriers was created. The overall goal of WP5 was to investigate gene-environment interactions and to assess whether the effects of established or suspected lifestyle/environmental risk factors for breast, ovarian and prostate cancer differ in subgroups classified according to genetic susceptibility.
The aim of WP6 was to examine the effects of genetic alterations on the risk of specific tumour subtypes and to study whether inherited genetic alterations are associated with the risk of dying from the disease. WP6 provided a codebook including standardization of histopathological and immunohistochemistry data from the groups included in COGS. For groups that have access to tumour material there should be an agreement on the tissue microarray scoring system. Finally, a database for incorporating tumour characteristics was established.
The objectives of WP7 were to use the results of genetic association studies, gene-environment interaction and individual risk prediction models to evaluate the potential for stratifying the population according to individual risk of breast, ovarian and prostate cancer. Secondly, the potential to reduce incidence and mortality from these cancers by risk stratification and targeting of population-based screening and prevention programs, including cost-effectiveness analysis, was evaluated. Lastly, the responsibility of WP7 was to identify key organizational, ethical, legal and social issues that would arise from such targeted screening programs and to make appropriate policy recommendations. In short, WP7 prepared for what should come after COGS.

During the first part of the project, samples were collected from study groups. Three genotyping centers were selected based on their capacity and infrastructure to meet the demands of high-throughput genotyping: Cambridge University (prostate cancer), Genome Quebec (ovarian cancer), and the Spanish National Cancer Centre (CNIO; breast cancer). Genotyping started in September 2010 and ended in April 2011. In parallel, samples from the three genotyping centers were reorganized and prepared, and excess sample material was returned to the original groups.
The total number of samples genotyped was 239,832. The total number of groups contributing samples was 167. As can be seen from the table below the largest number of samples was seen for breast cancer, followed by prostate and ovarian cancer. The number of BRCA1/2 carriers was 24,275.

Samples                              No. of samples   No. of groups
Breast cancer cases and controls     116,065          52
Prostate cancer cases and controls   51,576           43
Ovarian cancer cases and controls    47,916           27
BRCA1/2 carriers                     24,275           45
Total within COGS                    239,832          167

The iCOGS is a custom SNP genotyping array that was specifically designed to evaluate genetic variants for association with the risk of breast, ovarian and prostate cancer. The project was initiated to follow up potential associations found through previous genome-wide association studies (GWAS) in these cancers. iCOGS is an Illumina Custom Infinium array comprising 211,155 SNPs. The first chips were available in September 2010.

The specific objectives of iCOGS were:
1. to replicate potential associations arising through GWAS in these diseases and obtain precise estimates of genotype-specific risk.
2. to evaluate potential loci for subsets of disease (e.g. ER-negative breast cancer).
3. to evaluate variants for associations with disease survival.
4. to conduct dense genotyping of SNPs across known associated regions, to facilitate fine-mapping.
5. to evaluate variants selected as functional candidates, including rare variants in candidate genes (e.g. CHEK2), for association with disease risk.
6. to genotype SNPs associated with related quantitative traits (e.g. age at menarche), to determine whether such SNPs are associated with cancer risk.

The selection of SNPs was made by allocating a proportion of the SNPs to each consortium, in the proportions: breast cancer 10; ovarian cancer 10; prostate cancer 10; BRCA1/2 carriers 7; “common SNPs” 3. Each consortium provided a ranked list of SNPs to be put on iCOGS. SNPs were selected from these lists according to these weightings until the maximum number of SNPs was reached (the maximum number of beadtypes for an Infinium array is 240k, with ambiguous C/G and A/T SNPs requiring two beadtypes and other variants one). Where the same SNP was present on two lists, this counted towards both.
The selection of SNPs within each list was decided by the analysis group for that consortium. Each of the disease-specific groups included primarily SNPs based on the combined analysis of GWAS, SNPs based on fine-mapping of known susceptibility regions, and functional candidates.
The breast cancer list was based on the combined analysis of 9 breast cancer GWAS. It includes subset analyses based on early onset disease (<40 years and <50 years), and ER-negative disease. There is a component of SNPs based on two-SNP interactions (conditional analyses). It also includes SNPs from additional GWAS in ER-negative and triple-negative breast cancer, and SNPs from an African American GWAS.

Number of SNPs put on iCOGS for breast, ovarian and prostate cancer, BRCA1/2 carriers and “common SNPs”
Breast Ovarian Prostate Carriers Common
Priority allocation 59,000 58,975 59,000 41,300 17,700
Total SNPs 70,862 59,402 71,556 54,034 24,282
Fine-mapping SNPs* 8,978 14,281 537 3,762
GWAS+ 61,240 57,811 52,227
Candidate+ 2,337 1,287 1,486
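The 10:10:10:7:3 weighting and the two-beadtype rule described above can be illustrated with a short sketch. The total of 236,000 priority slots is an assumption inferred from the priority allocation row (it reproduces 59,000, 41,300 and 17,700; the ovarian figure in the table, 58,975, is slightly lower). The function names and the example alleles are hypothetical and are not taken from the COGS design documents.

```python
# Allocation weights per SNP list, as stated in the text.
WEIGHTS = {"breast": 10, "ovarian": 10, "prostate": 10, "carriers": 7, "common": 3}

# ASSUMPTION: total number of priority slots, inferred from the table row.
PRIORITY_SLOTS = 236_000


def priority_allocation(weights, slots):
    """Split `slots` between the lists in proportion to their weights."""
    total_weight = sum(weights.values())
    return {name: slots * w // total_weight for name, w in weights.items()}


def beadtypes_needed(allele_1, allele_2):
    """Ambiguous A/T and C/G SNPs need two beadtypes; other SNPs need one."""
    return 2 if {allele_1, allele_2} in ({"A", "T"}, {"C", "G"}) else 1


allocation = priority_allocation(WEIGHTS, PRIORITY_SLOTS)
print(allocation)
# {'breast': 59000, 'ovarian': 59000, 'prostate': 59000, 'carriers': 41300, 'common': 17700}

# Beadtype cost of a small, made-up set of SNPs (alleles are illustrative only).
example_snps = [("A", "G"), ("C", "G"), ("A", "T"), ("C", "T")]
print(sum(beadtypes_needed(a, b) for a, b in example_snps))  # 6
```

Because ambiguous SNPs cost two beadtypes, the number of SNPs that fit on the array is smaller than the beadtype limit, which is why the selection is counted in beadtypes rather than SNPs.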

The prostate cancer list was based primarily on the combined analysis of four GWAS. It includes subset analyses for aggressive disease and early onset disease (<55 years). Additional GWAS analyses were based on Gleason score and PSA level. There was a component for two-SNP interactions.
The ovarian cancer list included SNPs selected on the basis of the combined analysis of two GWAS. The carrier list was selected primarily from SNPs showing evidence of association in two GWAS, one in BRCA1 carriers and one in BRCA2 carriers.
The “common SNP” list included SNPs from the following categories:
• Published SNPs showing association with any cancer type, at “genome-wide” significance (P < 10⁻⁷).
• SNPs showing evidence of association from GWAS in other specific cancer types: endometrium, melanoma, lung and testis.
• SNPs across four regions of general interest: 8q24, TERT, CDKN2/B and ESR1.
• SNPs known or suspected to be associated with quantitative traits of relevance to cancer:
    • Age at menarche, age at menopause
    • Height, weight/BMI, waist-hip ratio
    • Postmenopausal serum oestradiol level, SHBG level, testosterone level, other related hormones (FSH, DHEAS, progesterone, free testosterone, IGF, Androglu, androstenedione)
    • Telomere length
    • Mammographic breast density
    • Endometriosis
    • Finger-length
    • Male pattern baldness
    • Bone density
    • Smoking addiction
    • Type 2 diabetes
    • Pigmented naevus count
• SNPs associated with allelic imbalance.
• Tagging SNPs for DNA repair genes.
• Rare variants in known or suspected susceptibility genes, including BRCA1, BRCA2, CHEK2, BRIP1, PALB2 and RAD51C.
• A set of ancestry informative markers.
• Markers from the Mitochondrial and Y genomes.
Fine-mapping SNPs were based on tagging of susceptibility loci for breast, ovarian and prostate cancer known to the consortia at the end of April 2010. This included (in addition to the four “common” regions above) 18 loci for breast cancer, 27 regions for prostate cancer and 6 regions for ovarian cancer. In each case, SNPs were selected using the latest (March 2010) release of the 1000 Genomes Project, together with HapMap version 3. The aim was to include all SNPs that are correlated with the known best hit (r² > 0.1), together with a set of SNPs tagging all the remaining SNPs.
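The first part of this selection rule, keeping every SNP whose correlation with the best hit exceeds r² = 0.1, can be sketched as follows. This is only an illustration: the report does not describe the software used, the genotype vectors and SNP names are invented, and r² is computed here as the squared Pearson correlation of allele counts, which is an assumption about the exact correlation measure. The second step (tagging the remaining SNPs) is not shown.

```python
from statistics import mean

R2_THRESHOLD = 0.1  # threshold quoted in the text


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 allele counts)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as uncorrelated
    return sxy ** 2 / (sxx * syy)


def correlated_with_lead(genotypes, lead_snp, threshold=R2_THRESHOLD):
    """Names of SNPs whose r-squared with the lead SNP exceeds the threshold."""
    lead = genotypes[lead_snp]
    return sorted(
        name
        for name, calls in genotypes.items()
        if name != lead_snp and r_squared(lead, calls) > threshold
    )


# HYPOTHETICAL genotypes for eight individuals (not COGS data).
genotypes = {
    "lead":  [0, 0, 1, 1, 1, 1, 2, 2],
    "snp_a": [0, 0, 0, 1, 1, 2, 2, 2],  # r^2 = 2/3 with the lead SNP
    "snp_b": [1, 1, 0, 2, 2, 0, 1, 1],  # r^2 = 0
    "snp_c": [0, 0, 0, 0, 0, 0, 0, 0],  # monomorphic
    "snp_d": [1, 0, 1, 2, 0, 1, 1, 2],  # r^2 = 0.25
}

print(correlated_with_lead(genotypes, "lead"))  # ['snp_a', 'snp_d']
```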

The main results of COGS were published simultaneously in thirteen papers in April 2013. Nature created a website, since one issue of Nature Genetics was devoted to the COGS results. The webpage contained the five Nature Genetics papers, a list of the additional 8 papers, five primers and two commentaries. For detailed information on the COGS results, please use the website.
A very short summary of the knowledge the COGS project has brought is given by the number of SNPs significantly related to the risk of breast, ovarian and prostate cancer before and after the COGS project. At the time of publication of the iCOGS experiment, 73 susceptibility loci for breast, ovarian or prostate cancer had been identified. The thirteen April 2013 publications brought another 79 SNPs; please see the table below. Through subsequent analyses, including imputation and fine-mapping, an additional 75 loci were identified, bringing the total to 227 (papers on the 75 loci detected in COGS phase II are currently being prepared). This means that the COGS project has identified 68% of the currently known susceptibility SNPs.
Number of established SNPs pre- and post the iCOGS analyses.

                  Pre-iCOGS   iCOGS, phase I   iCOGS, phase II   TOTAL
Breast cancer     27          45               47                119
Prostate cancer   42          26               22                90
Ovarian cancer    4           8                6                 18
Total             73          79               75                227
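The totals in the table and the 68% figure can be checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary layout and variable names are assumptions, while the counts are those given in the table above.

```python
# Counts of established susceptibility SNPs, copied from the table above.
# Order within each tuple: (pre-iCOGS, iCOGS phase I, iCOGS phase II).
snp_counts = {
    "breast": (27, 45, 47),
    "prostate": (42, 26, 22),
    "ovarian": (4, 8, 6),
}

pre_icogs = sum(c[0] for c in snp_counts.values())
phase_1 = sum(c[1] for c in snp_counts.values())
phase_2 = sum(c[2] for c in snp_counts.values())
total = pre_icogs + phase_1 + phase_2

# Share of all currently known SNPs that were identified through COGS.
cogs_share = (phase_1 + phase_2) / total

print(pre_icogs, phase_1, phase_2, total)  # 73 79 75 227
print(f"{cogs_share:.0%}")                 # 68%
```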

For breast cancer, the main paper reported 41 new susceptibility SNPs [Michailidou 2013]. Given the large number of samples included, it was also possible to estimate the number of SNPs that may influence the risk of breast cancer but did not reach statistical significance: at least 1000 more SNPs have the potential to influence the risk of breast cancer. Several breast cancer loci were shown to have effects that are specific, or largely specific, to breast cancer subtypes defined by estrogen receptor status [Garcia-Closas 2013, Warren 2013, Purrington 2013]. Other breast cancer susceptibility SNPs were reported in separate papers [Siddiq 2013]. The genetic make-up of Asians and Europeans was contrasted in two papers [Hein 2013, Zheng 2013].
Genetic determinants of breast cancer prognosis were identified and CHEK2*1100delC heterozygosity was found to be associated with early death, breast cancer specific death and increased risk of second cancers [Weischer 2013].
The TERT-locus SNPs and telomere length have been associated with a number of different cancers. Approximately 480 SNPs at the TERT locus were added to the iCOGS chip and analysed in 103,991 breast and 39,774 ovarian cancer cases and controls, and in 11,705 BRCA1 mutation carriers. The associations clustered in three independent peaks [Bojesen, 2013]. One peak was associated with longer telomeres, lower risk of estrogen receptor positive breast cancer and lower risk of breast cancer in BRCA1 mutation carriers. The second peak was associated with longer telomeres and higher risk of low-malignant ovarian cancer. In the third peak, 3 SNPs increased the risk of ER-negative breast cancer and the risk of breast and ovarian cancer in BRCA1 mutation carriers. It should be underlined that these associations would not have been identified had it not been for the size of the COGS project and the fact that the same SNP array was used for patients with both breast and ovarian cancer. Later, a study was published showing no association between a GWAS scan for mean telomere length and risk of hormone-related cancers [Pooley, 2013].
An alternative way of looking at the inheritance of breast cancer, and at what part of the familial risk is explained by different alterations, is shown in the figure below. Approximately 30% of all breast cancers are assumed to be familial. We estimate that we have now explained ~50% of the heritability of breast cancer. Of the explained 50%, approximately half comes from the 119 SNPs and the remainder from intermediate- and high-penetrance alterations. As can be seen, the COGS contribution is quite dramatic.

The proportion of the familial risk explained by pre-COGS established SNPs (n=27), Phase I iCOGS SNPs (n=45), Phase II iCOGS SNPs (n=47), and intermediate- and high-penetrance alterations (TP53, PTEN, LKB1, CHEK2, ATM, PALB2, BRIP1, XRCC2, BRCA1/2).

Main results for ovarian cancer revealed 8 SNPs in the first published phase of iCOGS [Pharoah 2013, Permuth-Wey 2013]. Through a genome-wide association study four susceptibility loci for epithelial ovarian cancer were identified [Pharoah 2013]. Another two suggestive loci reached near genome-wide significance. Data from two US and UK genome-wide association studies were pooled and 24,551 SNPs were selected for inclusion on the iCOGS custom genotyping array. The iCOGS array was used in 18,174 individuals with epithelial ovarian cancer (cases) and 26,134 controls from 43 studies from the Ovarian Cancer Association Consortium. We validated the two loci at 3q25 and 17q21 that were previously found to have associations close to genome-wide significance and identified three loci newly associated with risk: two loci associated with all epithelial ovarian cancer subtypes at 8q21 (rs11782652, P = 5.5 × 10⁻⁹) and 10p12 (rs1243180, P = 1.8 × 10⁻⁸) and another locus specific to the serous subtype at 17q12 (rs757210, P = 8.1 × 10⁻¹⁰).
For the main prostate cancer results, the iCOGS array was used in 25,074 prostate cancer cases and 24,272 controls [Eeles 2013]. Twenty-three new prostate cancer susceptibility loci were identified at genome-wide significance (P < 5 × 10⁻⁸). Adding the established pre-COGS SNPs and the SNPs from COGS phase II gives 90 susceptibility loci, explaining ∼30% of the familial risk for prostate cancer. On the basis of the combined risks conferred by the new and previously known risk loci, the top 1% of the risk distribution has a 4.7-fold higher risk than the average of the population being profiled. These results will facilitate population risk stratification for clinical studies [Eeles 2014].
It was also convincingly demonstrated that some of the breast and ovarian cancer loci influenced the risk of breast and ovarian cancer in BRCA1 and BRCA2 carriers. SNP profiling in carriers can discriminate carriers at higher or lower cancer risk, thus aiding the counselling process [Couch 2013, Gaudet 2013]. For example, based on the joint distribution of the known BRCA1 breast cancer risk-modifying loci, we estimated that the breast cancer lifetime risks for the 5% of BRCA1 carriers at lowest risk are 28%-50% compared to 81%-100% for the 5% at highest risk. Similarly, based on the known ovarian cancer risk-modifying loci, the 5% of BRCA1 carriers at lowest risk have an estimated lifetime risk of developing ovarian cancer of 28% or lower, whereas the 5% at highest risk will have a risk of 63% or higher. We also identified an ovarian cancer risk locus on 4q32.3 that appeared to be specific to BRCA1 carriers [Couch 2013] and a breast cancer locus on 6p24 that appeared to be specific to BRCA2 carriers [Gaudet 2013].
We completed a comprehensive analysis of SNP x SNP interactions for breast cancer; these analyses demonstrated no strong evidence for departure from a multiplicative model, either for established susceptibility loci or for SNPs with weaker evidence for association [Nickels 2013]. These results have important implications for risk prediction using SNPs.
We were able to use the iCOGS data to generate polygenic risk scores (PRS) to classify individual risk. Please see figure below. Women in the highest 1% of the PRS had a three-fold increased risk of developing breast cancer compared with women in the middle quintile (odds ratio (OR)=3·36 (95%CI: 2·95-3·83)). The ORs for ER-positive and ER-negative disease were 3·73 (95%CI: 3·24-4·30) and 2·80 (95%CI: 2·26-3·46), respectively. Lifetime risk of breast cancer for women in the lowest and highest quintiles of the PRS were 5% and 16% for a woman without family history, and 8·6% and 24·4% for a woman with a first-degree family history of breast cancer [paper in manuscript].

Polygenic risk score (PRS) using 77 breast cancer susceptibility SNPs, stratifying women according to their risk of breast cancer.
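Under a multiplicative model with no SNP x SNP interaction, a polygenic risk score is commonly built as the sum of per-allele log odds ratios weighted by the number of risk alleles carried. The report does not give the exact formula used, so the sketch below shows this standard log-additive construction as an assumption. The three SNP names and their odds ratios are hypothetical and are not COGS estimates.

```python
import math

# HYPOTHETICAL per-allele odds ratios for three SNPs (not COGS estimates).
PER_ALLELE_OR = {"snp_1": 1.10, "snp_2": 1.25, "snp_3": 0.90}


def polygenic_risk_score(genotype):
    """Sum of log odds ratios weighted by risk-allele count (0, 1 or 2)."""
    return sum(math.log(PER_ALLELE_OR[snp]) * count for snp, count in genotype.items())


def relative_odds(genotype, reference):
    """Odds of disease for `genotype` relative to `reference` under the multiplicative model."""
    return math.exp(polygenic_risk_score(genotype) - polygenic_risk_score(reference))


# Example: two risk alleles at snp_1 and one at snp_2, versus no risk alleles.
woman = {"snp_1": 2, "snp_2": 1, "snp_3": 0}
reference = {"snp_1": 0, "snp_2": 0, "snp_3": 0}

print(round(relative_odds(woman, reference), 4))  # 1.5125 (= 1.10 * 1.10 * 1.25)
```

Ranking individuals by this score and cutting the ranking at percentiles is one way to obtain risk strata such as the highest 1% or the middle quintile.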

Similar analyses for prostate cancer likewise showed little evidence for departure from a multiplicative model [manuscript in preparation]. Based on the PRS, the estimated increased risk to the top 1% of the population was 30.6 (95% CI 16.4-57.3) fold compared with the bottom 1% of the population, and 4.2 (95% CI 3.2-5.5) fold compared with the average population risk. The estimated absolute risk of prostate cancer by age 85 was 76.7% for a man with a family history of prostate cancer and in the top 1% of the risk distribution, compared with 4.6% for a man in the bottom 1%. The absolute risk for a man in the top 1% of the risk distribution without family history of prostate cancer was 43.8% and 1.9% for a man in the lowest 1%. Genetic risk profiling using SNPs could be useful in defining men at high risk for the disease for targeted prevention and screening programs.

The iCOGS chip included more than 26,750 SNPs specifically chosen for fine-scale mapping. They were selected to represent all the known common variants in Europeans, identified via the 1000 Genomes Project, in 18 loci for breast cancer, 27 for prostate, six for ovarian and a further four implicated in numerous cancers. These SNPs and loci are listed on the COGS website.
Once genotyping and related quality control was completed, SNP datasets from these loci were distributed to teams of researchers who had submitted 'concepts' to analyse specific loci. Each component consortium had in place its own strategy for managing this. At the beginning of this phase of work, no clear consensus on the best analysis strategy existed and so it was sensible for different teams to explore different avenues according to their skills. Consequently, a number of different analyses and manuscript styles have emerged from this data. Some teams have analysed individual loci in great detail and others are generating papers presenting the statistical mapping analysis of a single locus across multiple phenotypes. Often these have backed up this work with functional analysis of the strongest candidate causal variants. Other teams have carried out broader analyses of multiple loci in a single consortium.

Several novel themes have emerged from these various analyses:
• Few, if any, of the causal variants underpinning GWAS associations are in the coding regions of genes. Thus, rather than altering the amino acid sequences of their target genes, a picture is emerging that most causal variants affect the regulatory regions of their target gene. Such regulatory regions may be physically distant from the targets with other, non-target genes in between. Some of these DNA regulatory regions are recognizable in bioinformatic scans (such as the ENCODE database); others are not, and so would have been missed without evidence from the COGS project.
• The majority of the 55 loci, analysed in sufficient detail to date, contain multiple independent functional variants - the variant underlying the originally detected GWAS hit plus others that can be detected at less stringent levels of significance, once the first high-stringency association has been confirmed. That is, we have uncovered significant evidence for further, independently associated common variants within these known cancer loci. These additional functional variants will help increase the proportion of genetic variance explained by our original GWAS findings.
• We have detected associations of specific functional SNPs across multiple phenotypes and multiple consortia - having identical SNP genotype data across different consortia has been one of the great advantages of the COGS study design.

Below are listed the key findings from the first loci we have studied in detail:
• 11q13: We identified three functional variants in regulatory enhancers and silencers of the CCND1 gene. At all three we have found evidence that the breast cancer risk alleles act to reduce levels of cyclin D1 protein. This indicates that CCND1 is a tumour suppressor for estrogen receptor positive breast cancer, contrary to the prevailing hypothesis that it is an oncogene [French et al. 2013].
• 2q35: We have found that the breast cancer risk (G) allele of SNP rs4442974, situated in an IGFBP5 enhancer region, acts to reduce expression of its target gene by affecting physical DNA looping interactions over a 350 kb interval. Thus insulin-like growth factor binding protein-5 (the IGFBP5 gene product) is a likely tumour suppressor of oestrogen receptor positive breast cancer [Manuscript submitted].
• 5q11: Here we have identified three functional variants in regulatory DNA elements. We have found evidence that all three breast cancer risk alleles act to reduce expression of the target gene, MAP3K1. These findings support existing reports, from tumour DNA studies, that mitogen-activated protein kinase kinase kinase 1 (the MAP3K1 gene product) is a tumour suppressor for breast cancer. [Manuscript in preparation]
• 10q26: We have found three functional variants in regulatory regions within an FGFR2 gene intron, and have circumstantial evidence that the risk alleles of these may act to increase gene expression. Thus fibroblast growth factor receptor-2 (the FGFR2 gene product) is likely to be oncogenic for estrogen receptor positive breast cancer. [Meyer et al. 2013].
• 5p15: Studies across different cancer consortia have found at least four different functional variants, all regulating the TERT gene (encoding subunits of the telomerase enzyme). The minor alleles of SNPs rs10069690 and rs2242652 increase the risks of estrogen receptor negative breast cancer and invasive ovarian cancer but decrease the risk of prostate cancer. Various lines of evidence indicate that these minor alleles act to reduce TERT expression and that they do not exert their actions through alterations in telomere length (previously the leading hypothesis). Other TERT functional variants have strong effects on telomere length but only marginal effects on cancer risks [Bojesen et al. 2013, Kote-Jarai et al. 2013].
• 19p13: This locus has been identified independently by the breast, ovarian and BRCA carrier consortia, as associated with risks of both oestrogen receptor negative breast cancer and invasive ovarian cancer. Members of all three consortia are working together to map this locus and a joint collaborative paper is in preparation.
• 6q25: This locus, encompassing ESR1 (encoding the estrogen receptor), is being worked on jointly by members of the breast, ovarian and BRCA carrier consortia. A collaborative paper describing the mapping of multiple phenotypes is in preparation.
• Our experience in analysing the COGS mapping data led to our being commissioned by the American Journal of Human Genetics to write a review article [Edwards et al. 2013]. This paper was among the 10 top cited papers in the journal in 2012/13.
Findings from COGS have thus directly refuted several hypotheses regarding the lack of utility of GWAS, the technique underpinning all COGS studies. It has for example been argued that GWAS hits are likely to be false positives since they are not ‘in genes’ and that multiple different functional variants would be required to confirm a locus’ direct involvement in causing a phenotype.
An additional previous criticism of GWAS is that it left unexplained much of the known heritability of most phenotypes (the so-called “missing heritability” problem). A consistent finding of our early mapping projects has been that a single GWAS tag SNP can identify multiple independent causative variants within a given locus. Each new SNP explains a further proportion of the “missing heritability”, thus going some way to addressing this issue. Another benefit of this phenomenon is that, ultimately, predictive models of cancer risk that include all these newly discovered causative variants will perform better than had originally been anticipated.

In COGS we also investigated gene-environment interactions in cancer risk. Furthermore, we assessed whether established lifestyle/environmental risk factors for breast, ovarian and prostate cancer influenced the risk of specific cancer subtypes.
In order to study environmental factors, study questionnaires from each participating group of COGS were collected and a data dictionary was created. Detailed checks were performed for all epidemiologic data transferred to the three central databases. The breast cancer database for extended epidemiologic risk factor data has been continually expanded through inclusion of data from new study groups. In total, over 60 different studies are included in the database. All epidemiological data submitted were quality checked as previously reported. The data have been used in various analyses of gene-environment interaction with confirmed susceptibility loci for breast cancer. The BRCA1 and BRCA2 mutation carriers risk factor database now includes data from 29 studies, synchronized into 150 variables (7,977 BRCA1 and 4,917 BRCA2 mutation carriers; 12,894 BRCA1/2 mutation carriers in total). The data underwent an extensive quality assurance procedure. The ovarian cancer risk factor database comprises over 47 studies. In addition to the core 79 variables, 50 additional detailed epidemiological variables have undergone quality assurance procedures conducted by different COGS partners. In prostate cancer, the risk database comprises 12 studies that submitted epidemiological risk factor data according to the data dictionary. Quality checks have been completed. The data dictionary is posted on the PRACTICAL website.
Below is an example of the amount of data collected on lifestyle factors within COGS. As can be seen, a vast number of individuals have contributed detailed information on a variety of factors that influence the risk of cancer. This is merely an example; the same type of list exists for ovarian and prostate cancer.
Number of cases and controls with detailed lifestyle/environmental variables for breast cancer from 65 studies in COGS

Characteristic, N cases (all), N controls (all)
Age <54 years 52582 53754
Age >=54 years 55592 55078
Age at menarche 66962 60609
Age at menopause 32668 25923
Menopausal status 71349 67752
Ever parous 73053 71626
Number of births 72546 70483
Age at first birth 51405 48571
Age at last birth 28016 26526
Ever breastfed 50362 42253
Height 61555 47588
BMI (prediagnosis), postmenopausal women 14973 11573
BMI (prediagnosis), premenopausal women 20887 18158
BMI at interview/questionnaire 66518 67478
Ever oral contraceptives (OC) 48015 43347
Duration of OC use 35020 32377
Current menopausal hormone therapy (MHT) use 43078 36428
Duration of MHT use 34881 32555
Current combined estrogen-progesterone therapy 20607 21765
Current estrogen monotherapy 20754 23450
Family history in a first degree relative 75104 52884
Cumulative lifetime gms/day alcohol 12242 15871
Gms/week alcohol in last year before reference date 24074 23629
Ever smoked 45825 44969
Pack-years of smoking 37243 38902
Recent physical activity in hrs/week 11910 14576

In the initial gene-environment study, 23 known SNPs and 19 established risk factors for breast cancer (age at menarche, parity, age at first birth, breastfeeding, BMI, height, use of menopausal hormone therapy, use of oral contraceptives, smoking, alcohol consumption, and physical activity) were analyzed in 34,793 invasive breast cancers and 41,099 unaffected controls. It was shown that the effects of alcohol intake and number of children on breast cancer risk were modified by some common genetic variants [Nickels et al. 2013].
For ovarian cancer, 6 established susceptibility SNPs and 6 established environmental risk factors for ovarian cancer (endometriosis, first degree family history of ovarian cancer, oral contraceptive use, parity, tubal ligation, age at diagnosis) were assessed. Data from 5,566 epithelial ovarian cases and 7,374 controls from 14 case-control studies in COGS were pooled, and analyses stratified by each risk factor, with tests for heterogeneity, were conducted. There was no statistical evidence of interaction on risk of ovarian cancer among all histological subtypes [Pearce et al. 2013].
For ovarian cancer, six new low-penetrance susceptibility loci have recently been identified through the iCOGS array [Pharoah et al. 2013, Bojesen et al. 2013, Permuth-Wey et al. 2013]. G×E interaction analyses between these 6 newly identified susceptibility loci and environmental risk factors (endometriosis, first degree family history of ovarian cancer, oral contraceptive use, parity, tubal ligation, age at diagnosis, cigarette smoking, education, alcohol consumption, BMI, age at menarche and breast feeding) are currently being performed.
For BRCA1/2 mutation carriers, a total of 21 susceptibility loci in BRCA1 mutation carriers and 26 susceptibility loci in BRCA2 mutation carriers have been identified through iCOGS as modifiers of breast cancer risk. Using these 47 loci, gene-environment analyses have been conducted for age at menarche, body height, body weight and oral contraceptives (N=5,329 BRCA1, N=3,576 BRCA2). No significant interactions have been found, but detailed analyses are still ongoing.
For prostate cancer, a gene-environment analysis was conducted using 60 established and the newly identified COGS susceptibility loci. Only one SNP (rs1859962 on chromosome 17) showed a nominally significant negative interaction with height in the case-only analysis, and this was not significant after correction for multiple testing. The gene-environment analysis was also extended to explore multiplicative and additive interaction between height and overall genetic susceptibility using a polygenic risk score. Results have yet to be published.
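As an illustration of the case-only approach mentioned above, the following minimal Python sketch tests for multiplicative interaction in a 2x2 table of cases and then applies a Bonferroni correction. It is not the COGS analysis pipeline. The counts are hypothetical, the exposure is assumed to be dichotomised (for example, height above or below the median), the number of tests is assumed to equal the number of SNPs, and the case-only estimate is only valid if genotype and exposure are independent in the source population.

```python
import math
from statistics import NormalDist

def case_only_interaction(a, b, c, d):
    """Case-only interaction odds ratio from a 2x2 table of cases:
        a = carriers, exposed      b = carriers, unexposed
        c = non-carriers, exposed  d = non-carriers, unexposed
    Returns (odds ratio, two-sided p-value) using Woolf's standard error
    for the log odds ratio."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    p_value = 2 * (1 - NormalDist().cdf(abs(log_or / se)))
    return math.exp(log_or), p_value

# Hypothetical counts of cases (not COGS data).
odds_ratio, p = case_only_interaction(a=400, b=500, c=1050, d=1050)
n_tests = 60  # assumed: one test per SNP; Bonferroni is one simple correction
print(f"interaction OR = {odds_ratio:.2f}, p = {p:.4f}, "
      f"Bonferroni-adjusted p = {min(1.0, p * n_tests):.3f}")
```

An odds ratio below 1 corresponds to a negative interaction. With these hypothetical counts the test is nominally significant but does not survive the correction, which mirrors the pattern described in the text.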

An additional aim of COGS was to examine the effects of genetic alterations on risk of subtype-specific tumours, defined by a combination of histopathology, immunohistochemistry, gene expression and/or comparative genomic hybridization data. Furthermore, we have examined whether genetic alterations affect outcome. Consortia within COGS on breast, ovarian and prostate cancer generated information on tumour subtypes by histopathological and immunohistochemical characteristics, including data from tissue microarrays (TMAs). A database for histopathological and immunohistochemical information was created, and existing histopathological and immunohistochemistry data were harmonized with tumour phenotype and disease outcome within the different participating consortia.

Currently available data in the database on all genotyped cases, and on cases specifically genotyped with the iCOGS chip, are shown in the table below for those with two key tumour characteristics.

Breast Ovary BRCA1/2 breast BRCA1/2 ovary Prostate

All cases
Morphology 113,752 25,25 13,102 2,654 57,543
All cases genotyped with iCOGS SNP array
Morphology 55,585 25,25 7,312 1,356 38,13

New genotype data from the iCOGS chip have been added to the database. A large number of novel cancer susceptibility SNPs has been identified. Moreover, analyses of these data by cancer subtype (primarily receptor status in breast cancer) have identified a number of SNPs associated with specific cancer subtypes, and the first manuscripts on these findings have been published in Nature Genetics [Garcia-Closas 2013, Pharoah 2013, Eeles 2013].
Data available from tissue microarrays have been included and coded in the database. For breast cancer, we evaluated ER, PR, HER2, HER2 SISH, CK5/6 and EGFR to define the most relevant molecular breast cancer subtypes using the available TMAs. We expanded this set of molecular markers with Ki67, TP53, E-CAD, BCL2, and Annexin A1. Scoring of the TMAs has been done either by study pathologists or using centralised scoring with the Ariol/SlidePath systems; scoring for ER, PR, HER2, CK5/6, EGFR, TP53 and Annexin A1 has been completed, and scoring for Ki67 and BCL2 is almost complete. In this project we have evaluated molecular subtypes beyond those currently known, which will add to the understanding of the associations between genotype and molecular subtype. These additional data have also been added to the central database.
One manuscript has been written, and more manuscripts based on iCOGS genotype data are to be submitted in the coming year (manuscripts using TP53 and Annexin A1 TMA scores have been drafted); future manuscripts will include genotype data from the newly designed OncoArray. For ovarian cancer studies, immunohistochemistry has been performed in the OTTA initiative, which is directly linked to the OCAC genotype data derived from COGS. The following (experimental) markers have been scored: ER, PR, DKK1, HNF1B, MDM2, P16, P53, TFF3, VIM, ARID1A, WT1, IMP3, NASP, FOLR1.
The final responsibility of COGS was to evaluate the potential for risk-stratified prevention in the population, to explore the key organisational, ethical, legal and social hurdles and to make appropriate policy recommendations.

During the COGS project, relevant multidisciplinary stakeholders were invited to three International Workshops. Approximately 40 international experts attended the workshops. Workshop participants included oncologists, breast cancer screening program managers, clinical geneticists, ethicists, health service policy makers, public health specialists and public representatives, as well as scientists and clinicians closely involved as researchers in the wider COGS program. The workshop participants contributed to focused group discussions on the following themes:
1. Modelling and evaluation of stratified screening
2. Issues surrounding service delivery of stratified screening
3. Relevant ethical, legal and social issues
4. Additional professional training needed if stratified screening were to be implemented.
For this task, steering group members produced preliminary documents on the effectiveness of risk-stratified screening and on its organisational, ethical, legal and social issues, which were shared with invited workshop participants. Findings from the workshops are detailed in three consecutive workshop reports. COGS staff have also collaborated with workshop participants on prioritised areas to produce manuscripts for peer-reviewed publication [Pashayan 2013, Hall 2013, Burton 2013, Chowdhury 2013, Dent 2013].
In his classic paper, the epidemiologist Geoffrey Rose highlighted two approaches to disease prevention: the individual and the population approaches. The individual approach focuses on identifying individuals at high risk and providing some individual protection, which might involve controlling the level of exposure to a causal agent or an intervention, such as prophylactic treatment or surveillance for early disease. The population approach focuses on identifying the underlying causes of disease (for example, high dietary intake of fat or salt) and providing a generalized intervention that shifts the whole distribution of risk at the population level. In both approaches, there is acknowledgment of the potential for harm or at least inconvenience for individuals, as well as the possibility of benefit. In the high-risk approach, the benefit-to-harm ratio for individuals is more favourable, albeit at the cost of identifying these individuals in the first place and the potential for long-lasting medicalization or stigmatization.
The potential for risk stratification using personal and medical information including genetic testing requires a refinement of these original concepts of disease prevention and suggests a third way that synthesizes elements of the two approaches. We argue that stratified prevention could be conceptualized as an enhancement of Rose's high-risk approach. Essentially, it uses a prior assessment of risk, applied to the whole population, followed by the assignment of individuals to a risk stratum and the tailoring of the interventions offered to each group. In so doing, it aims to optimize the benefit-harm ratio and the cost-effectiveness of the public health program.
The question then arises of whether current understanding of the genetic susceptibility for hormone-related cancers (breast and prostate) can provide sufficiently good discrimination between risk groups so that the clinical usefulness gained by the stratification of prevention justifies the complexity that will be added to prevention programs. In the key findings, it was reported that, using “the current set of loci and assuming that all loci combine multiplicatively”, there is potential for risk stratification. For breast cancer, the risk was estimated to be approximately 2.3-fold and 3-fold higher for individuals in the top 5% and 1% of the population, respectively, relative to the population average [Michailidou 2013]. For prostate cancer, it was estimated that the top 1% of men in the highest risk stratum would have a 4.7-fold greater risk relative to the population average [Eeles 2013].
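To make the multiplicative assumption concrete, the minimal Python sketch below combines per-allele odds ratios on the log scale and converts the resulting spread of risk into a fold-increase at a given percentile. It is a sketch under stated assumptions, not the method used in the cited papers: the loci are hypothetical, loci are assumed independent and in Hardy-Weinberg equilibrium, and the combined risk is approximated by a log-normal distribution. The fold-increase is taken at the threshold of the top stratum, which is one of several possible definitions.

```python
import math
from statistics import NormalDist

def log_risk_variance(loci):
    """Variance of log relative risk for (odds ratio, allele frequency) pairs,
    assuming independent loci, Hardy-Weinberg equilibrium and a multiplicative
    (log-additive) per-allele effect."""
    return sum(2 * p * (1 - p) * math.log(odds_ratio) ** 2
               for odds_ratio, p in loci)

def relative_risk_at_percentile(sigma, top_fraction):
    """Risk at the threshold of the top `top_fraction` of the population,
    relative to the population average, for log-normally distributed risk
    with log-scale standard deviation `sigma`."""
    z = NormalDist().inv_cdf(1 - top_fraction)
    return math.exp(sigma * z - sigma ** 2 / 2)

# Hypothetical loci: (per-allele odds ratio, risk-allele frequency).
example_loci = [(1.10, 0.30), (1.08, 0.45), (1.25, 0.10), (1.05, 0.50)]
sigma = math.sqrt(log_risk_variance(example_loci))
for fraction in (0.05, 0.01):
    fold = relative_risk_at_percentile(sigma, fraction)
    print(f"top {fraction:.0%}: {fold:.2f}-fold the population average")
```

Adding further loci increases the variance of log risk and therefore widens the gap between the highest-risk stratum and the population average.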
Using the data, the 10-year absolute risk of being diagnosed with breast or prostate cancer was estimated, taking into account age and polygenic risk profile, using all known susceptibility variants, including new variants identified in the COGS program (a total of 67 susceptibility variants for breast cancer and 72 variants for prostate cancer). The number of individuals eligible for screening and the number of cases potentially detectable by screening were estimated in a population undergoing screening on the basis of age alone in comparison to a population undergoing personalized screening. For breast cancer, using the current UK National Health Service (NHS) breast cancer screening program as a comparator, it was found that, compared with existing age-based screening (ages 47–73 years), stratified screening of women in a wider age range (ages 35–79 years) at the same 10-year absolute risk (2.5%) would be expected to result in 24% fewer women being eligible for screening while potentially detecting 3% fewer cases through screening. Similarly, with prostate cancer, in a hypothetical screening strategy comparing risk-stratified screening for men aged 45–79 years with screening of men from age 55 (10-year absolute risk of prostate cancer of 2%), 19% fewer men would be eligible for screening at a cost of 4% fewer cases potentially detected by screening.
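The comparison between age-based and risk-stratified eligibility can be sketched as follows. This is an illustrative Python example only: the age bands, population sizes, average risks and the spread of polygenic risk are all hypothetical, absolute risk is approximated as the age-specific average multiplied by a log-normally distributed relative risk with mean 1, and none of the numbers are COGS estimates.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def eligible_and_case_fractions(avg_risk, threshold, sigma):
    """For one age band, return (fraction of people, fraction of cases) whose
    10-year absolute risk is at or above `threshold`.

    Assumes absolute risk = avg_risk * relative risk, where log relative risk
    is normal with standard deviation `sigma` and the mean relative risk is 1."""
    cut = (math.log(threshold / avg_risk) + sigma ** 2 / 2) / sigma
    people = 1 - STD_NORMAL.cdf(cut)
    # Among cases the log-risk distribution is shifted upwards by sigma**2,
    # which is a shift of sigma on the standardised scale.
    cases = 1 - STD_NORMAL.cdf(cut - sigma)
    return people, cases

# Hypothetical age bands: (label, number of people, average 10-year risk).
age_bands = [("35-44", 1000, 0.010), ("45-54", 1000, 0.020),
             ("55-64", 1000, 0.030), ("65-74", 1000, 0.035)]
threshold, sigma = 0.025, 0.6  # illustrative values only

screened = detectable = 0.0
for _, n, avg_risk in age_bands:
    people, cases = eligible_and_case_fractions(avg_risk, threshold, sigma)
    screened += n * people
    detectable += n * avg_risk * cases
print(f"eligible: {screened:.0f}, expected cases among eligible: {detectable:.1f}")
```

Running the same loop with everyone in a chosen age range counted as eligible gives the age-based comparator, so the two strategies can be compared on numbers screened and cases potentially detectable.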
In addition to the evaluation of potential usefulness and cost-effectiveness, the implementation of risk-stratified screening will require attention to a wide range of organizational, ethical, legal and social issues. If DNA is to be sampled as part of a risk stratification process, then, like any other clinical intervention, consent will be required. DNA sampling and analysis will not in itself change the nature of the consent that is sought, unless it is for research or other purposes outside clinical care. Nevertheless, DNA sampling and analysis may cause concern to patients and the public. Policies for the implementation of risk-stratified screening will need to set out clearly the uses that will be made of the data generated. Participants in testing will need to be informed of whether the data generated can or will be used for other purposes, such as research; the possibility of generating incidental findings and how these will be managed; whether information will be relevant for family members and, if so, whether, how and by whom it will be shared; whether the data will be stored and, if so, with what safeguards; and who might have access to stored data, including the individual, family members, employers, insurance companies, criminal justice agencies and researchers.
The components of risk assessment are likely to include genetic susceptibility variants, other biomarkers, a data set of personal, clinical and family history information, reproductive information and environmental or lifestyle factors. The precise components of this information and the methods for collection must be studied to determine the most cost-effective approaches. Policymakers must decide whether information collection will be a one-time occurrence or take into account changing circumstances (such as family history), being updated over time.
Preliminary risk stratification will add new complexities to the prevention program. First, appropriate systems for inviting and recalling people for risk assessment and screening need to be in place. Second, there should be a standard protocol for taking consent, performing genetic sampling and using a standardized risk assessment tool to integrate genetic data from an individual with environmental, lifestyle and hormonal data. Third, the level of risk of cancer will dictate the care pathway followed, with different pathways being followed for each risk stratum. Before implementation of a stratified screening program using genetic information, some health professionals will require new competencies to explain the new system, undertake assessment and communicate results.

Potential Impact:
The most practical application of COGS is that the results enable the estimation of an individual's risk of disease. We have previously shown theoretically that, under the polygenic model, the predictive value of genetic testing can be substantial. For example, for breast cancer, approximately 50% of the disease will occur in the 12% of the population at highest risk, implying that preventive programmes targeted at those at highest risk would have a substantial population impact. One of the aims of COGS was to evaluate the public health implications of the risk models we develop.
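The relationship between the spread of polygenic risk and the share of disease arising in the highest-risk group can be illustrated with a short calculation. The Python sketch below assumes that risk is log-normally distributed in the population. The standard deviation of 1.2 on the log scale is an assumed value chosen for illustration, not a figure taken from this report; with it, roughly half of all cases fall in the top 12% of the population.

```python
from statistics import NormalDist

def share_of_cases_in_top(population_fraction, sigma):
    """Share of all cases arising in the highest-risk `population_fraction`,
    assuming log-normally distributed risk with log-scale SD `sigma`."""
    std = NormalDist()
    z = std.inv_cdf(1 - population_fraction)
    # Cases are distributed like the population but shifted up by sigma
    # on the standardised log-risk scale.
    return 1 - std.cdf(z - sigma)

# sigma = 1.2 is an assumption made for illustration only.
share = share_of_cases_in_top(0.12, 1.2)
print(f"share of cases in the top 12% of the population: {share:.2f}")
```

With sigma set to 0 (no variation in risk) the function returns the population fraction itself, which is the expected result when screening cannot be targeted.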
COGS will identify individuals with a testable polygenic profile that are at substantially increased risk of disease and determine whether specific lifestyles have a greater impact in this genetic subgroup. As a consequence, genetic counsellors will be able to provide more specific risk information to the counselee. Moreover, such evidence may increase general awareness of current knowledge on tumour aetiology, systematically informing a motivated group within the general population that seeks advice to lower their cancer risk.
The efficiency of mammography screening has been challenged, particularly over the last 5 years. It has been questioned whether it really lowers mortality, and it has been emphasized that numerous women will be diagnosed with breast cancer that, if left undetected, would never have caused any clinical problems.
The surprising fact is not that a report on the pros and cons of mammography screening is challenged, but that we still discuss mammography screening as if it will never change. Normally, when something is not working properly, we try to find ways to improve it. Most areas within oncology have developed quite dramatically over the years. Within breast cancer, therapeutic advances in radio-, chemo- and hormonal therapy have decreased recurrence rates and increased cause-specific survival. Over the last decade, antibodies have introduced the concept of targeted and individualised therapy. An oncologist today would never rely simply on an X-ray, but would use CT, MRI, PET, etc. for proper diagnostics and follow-up.
All medical interventions, diagnostic or therapeutic, should do more good than harm. In addition, they have to be cost effective and acceptable to society, patients and health-care providers. If we agree that early detection of a disease is generally better than postponing diagnosis and therapy, and that this leads to a decrease in breast cancer mortality (estimated to be 20%) but also to a significant rate of overdiagnosis (estimated at 11%), the following questions arise:
• How to make screening more efficient?
• Is it possible to increase sensitivity and specificity of the tests, to decrease the number of false positive and negative test results?
• How do we identify the fatal cancers?
• Could we find means to target those women who are most likely to benefit from screening and avoid spending time and money on those in whom we merely induce anxiety?

Questions like these are seldom heard in the screening debate.
Mammography is an imaging technique that depends on identifying a contrast between a malignant tumour and the surrounding normal breast tissue. Mammography screening has been used in more or less the same way over the past 40 years and very little has been done to increase efficiency. Age of entry into the programme and screening intervals are discussed, but most programmes assume that the risk of breast cancer is solely dependent on age, that is, that a woman will benefit equally from screening as long as she is within a certain age range. However, there are many other factors that determine a woman's risk of breast cancer. It therefore seems an intuitive next step to move from age-based to risk-based screening.
There are many genetic and non-genetic markers for breast cancer risk. Risk-based screening is already routine for women who have a strong family history and/or carry a BRCA1 or BRCA2 mutation; the principle could be extended to utilise other markers. Forty-one common genetic variants for breast cancer were recently described within the COGS project [Michailidou 2013] and there are at least 50 markers not yet published. Collectively these markers identify 1% of women with a risk more than three times the average. Many more such variants are likely to be identified in the coming years. Several breast cancer risk models, including BOADICEA, Tyrer-Cuzick and Gail, already exist and are utilised widely in genetic counselling and prevention trials; some would argue that the current models have low predictive power, but combining the effects of genetic markers, lifestyle risk factors and mammographic density should lead to tests with useful discriminatory power.
A more nuanced risk-based screening programme would involve women at higher risk being screened from a younger age, or more intensively, or with additional modalities (e.g. MRI), while women at lower risk would be screened less often or not at all. The obvious advantage is that resources are targeted on women with the highest likelihood of benefit while at the same time reducing screening interventions for women at lower risk, thus producing more benefit at a lower cost [Pashayan 2011]. Moreover, we also know that both genetic and non-genetic risk factors can, to some extent, predict the risk of specific tumour subtypes as recently shown [Garcia-Closas, 2013]. This information might enable the rate of overdiagnosis to be reduced as well.
While individualised screening is already accepted in the context of women with a family history, there are clearly many challenges to be faced before this model could become a reality in national screening programmes. Public tolerability is crucial but also probably the least problematic. For decades, perfectly healthy individuals have been ‘screened’ for factors that influence the risk of cardiovascular disorders, and measurements of blood lipids and blood pressure are fully accepted. The organisational aspects are challenging. If genetic and other risk assessments were extended to all women, what form of counselling would be available? Most countries lack the infrastructure needed to handle this massive information exchange. The professional acceptability of a change to an individualised screening programme is difficult to predict, although again the acceptance of individualised intervention for cardiovascular disease provides a model. On top of all this there are ethical, legal and social issues to be taken into consideration.
We have also not yet touched on what to offer women in the highest risk group. Future prediction models will identify many more women with a lifetime risk of more than 30% – classified as ‘high risk’ according to NICE [National Institute for Health and Clinical Excellence, 2006]. Intensified screening will surely be too passive an intervention for this group. In the context of a family history, such women would generally be offered MRI, prophylactic surgery and/or risk-reducing medications – would this also be extended to the population at large?
Finally, COGS will impact the scientific community for decades to come. The resources generated, genetic and phenotypic, the databases created and the samples analyzed will benefit researchers and thereby today's and future patients. So far 104 scientific papers have emanated from COGS and there will be at least as many to come over the coming years.

Lastly, what lessons could be learnt from the COGS success? A few key factors made it possible. These factors each contributed to the large number of papers published and the knowledge generated.
• Success. COGS was built on previous successes. The four consortia that made up the core of COGS, the Breast Cancer Association Consortium (BCAC), the Ovarian Cancer Association Consortium (OCAC), the Consortium of Investigators of Modifiers of BRCA1/2 (CIMBA) and the PRostate cancer AssoCiation group To Investigate Cancer Associated aLterations in the genome (PRACTICAL), were all established before COGS. Collaboration was ongoing and a couple of key papers had been published, but sufficient funding was lacking for the next scientific leap.
• Competence. The partners of COGS came from different backgrounds (geneticists, epidemiologists, statisticians, molecular biologists, oncologists, behavioral scientists) and managed to find a language that most understood sufficiently well to solve emerging challenges. Competence is, however, not enough. Competent people who deliver are the prerequisite.
• Altruism. All scientists want to have their name on the best papers in the most prestigious journals. Unfortunately there is no simple algorithm for solving this problem in a paper with more than two authors. A solution is to adopt a give-and-take approach that makes collaborators feel that they gain from the collaboration.
• Transparency. In a project with ≈ 170 research groups and probably more than 1,000 scientists involved, there is a high risk of controversy over who is publishing what using which resources. We adopted a “no-surprise” policy, which meant that groups were free to do whatever they wanted with their own material but that they should disclose it at the regular meetings.
• Communication. See to it that regular meetings are organized, that telephone conferences are scheduled and that scientists are encouraged to visit each other from time to time.
• Bureaucracy. See to it that EC reporting and similar administrative duties do not interfere with science. That is, the coordinator should have sufficient administrative backup.
• Work hard.

List of Websites: