A Whole Genome Checkup

In April 2012, I attended the Genomes, Environments, and Traits (GET) 2012 conference in Boston. It was a great meeting, full of exciting speakers and opportunities to talk to people from many different fields. I had already enrolled in the Personal Genome Project and was waiting for my exome results from 23andMe. The meeting organizers took advantage of the presence of over 100 PGP volunteers to carry out blood draws for DNA sequencing and the establishment of cell lines.

I knew that there was a long waiting list for complete genome sequencing. Fewer than 100 PGP volunteers had been sequenced by April 2012, while more than 1,000 enrolled volunteers were waiting. I had already donated my 23andMe SNP data, and I donated my data from the 23andMe exome pilot when I obtained it in the summer of 2012. Both datasets are available on my page on the PGP site.

The organizers of the PGP have developed a priority system to decide who will be sequenced first. I recently learned that PGP volunteers who had attended GET 2012 and donated blood there were given priority as active volunteers, and apparently donating my exome contributed to giving me priority as well. In the fall of 2012, I learned that my genome had been fully sequenced. I consented to make the data public immediately, and became PGP89.

Whole genome sequence data are filtered at the PGP to identify variants of high and moderate impact. Variants that change the amino acid sequence of the encoded protein are identified and evaluated; these include nonsynonymous substitutions, frameshifts, and stop-gain and stop-loss variants. It is easy to search a genome for these kinds of variants on the PGP website. The screenshot below shows the results of a search for “Msh” in my genome, revealing the variants MSH6-G39E and MSH2-G322D, both heterozygous.

[Screenshot: PGP search results for “Msh” in the PGP89 genome]
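The PGP's search runs on its website, and its internals are not described here. As a rough illustration of the same idea, the Python sketch below assumes the variants have already been exported as simple records; the `Variant` fields and the effect labels are hypothetical, not the PGP's actual data format. It keeps only protein-altering variants and matches the query against the start of the gene symbol, ignoring case.

```python
from dataclasses import dataclass

# Hypothetical effect labels for protein-altering changes; the PGP's own
# annotation vocabulary may differ.
PROTEIN_ALTERING = {"nonsynonymous", "frameshift", "stop_gain", "stop_loss"}


@dataclass(frozen=True)
class Variant:
    gene: str      # gene symbol, e.g. "MSH6"
    change: str    # amino acid change, e.g. "G39E"
    effect: str    # one of the (hypothetical) effect labels
    zygosity: str  # "het" or "hom"

    def label(self) -> str:
        return f"{self.gene}-{self.change}"


def search(variants, query):
    """Return protein-altering variants whose gene symbol starts with
    `query`, ignoring case (so "Msh" matches MSH2 and MSH6)."""
    q = query.upper()
    return [
        v for v in variants
        if v.effect in PROTEIN_ALTERING and v.gene.upper().startswith(q)
    ]


genome = [
    Variant("MSH6", "G39E", "nonsynonymous", "het"),
    Variant("MSH2", "G322D", "nonsynonymous", "het"),
    Variant("MSH3", "P100P", "synonymous", "het"),      # made-up entry; filtered out
    Variant("APC", "V1822D", "nonsynonymous", "het"),   # different gene; no match
]

for v in search(genome, "Msh"):
    print(v.label(), v.zygosity)
```

Matching on the start of the symbol is a design choice for the sketch; a substring match would also work for this query.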

I spent quite a bit of time going through my results, comparing my genome to other publicly available genomes, and learning to work with the raw files. This exploration frequently sent me to the biomedical literature and taxed the limits of my text-editing software as I manipulated the raw data files. I am sure that something will eventually emerge from all of this thrashing, but for this post, there is a more topical way to introduce the results of sequencing my genome.

Immediately prior to the GET 2013 conference in late April, the American College of Medical Genetics and Genomics released a set of recommendations for the reporting of incidental findings in clinical exome and whole genome sequencing. These recommendations were mentioned frequently during the meeting, and have generated a great deal of discussion.

What is an incidental finding? If a patient is given a chest X-ray to look for evidence of a lung condition, the radiologist can’t help seeing the heart as well. If there is something wrong with the patient’s heart, the radiologist has made an incidental finding. Whenever there is an incidental finding, physicians must decide whether it requires treatment or is of no consequence. With the falling cost of exome and whole genome sequencing, genomic techniques are being used as diagnostic procedures, and an exome or whole genome sequence offers the opportunity for a vast array of incidental findings.

Which incidental genomic findings should be reported? Our knowledge of the connection between genetic variation and disease is incomplete, so there is not a clear answer. There are some genetic variants that pose a well-documented risk to human health, for example, specific alleles of the BRCA1 gene that confer a greatly elevated risk of breast and ovarian cancer. Most people would agree that finding a known pathogenic allele of BRCA1 should be reported to a patient. On the other hand, most people carrying risk alleles of HFE never develop hemochromatosis, a disease that typically manifests in the fifth decade or later. Even if we were to develop a short list of genes for which variants should be reported, some variants will be clearly harmless (synonymous substitutions, for example) while others will be known to be pathogenic from prior studies. There will also be rare alleles that have never been identified in clinical cases, but which we might suspect to be pathogenic. No matter how we draw up the gene list or define the kinds of alleles to report, there will always be ambiguities.

I am something of a data junkie, obviously, so I’d like to know everything. However, there are costs to delivering a complete genome analysis to members of the general public in a clinical setting. First, there is the cost of evaluating all of the variants. Any individual human genome will have many variants that are rare, variants whose frequency is unknown, or variants that have never been reported. In the case of my exome results described in the prior post, there are 1,583 rare variants, 9,842 known variants whose allele frequency is unknown, and 4,722 variants that have never been reported. It isn’t reasonable to ask a set of clinicians to give me a full report on all of this, especially because for the majority of the variants, the best answer would be that we don’t know whether there are any medical consequences.
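The three bins above amount to a simple triage rule. The Python sketch below assumes each variant is looked up in some reference database, represented here by a hypothetical `db_frequency` mapping from variant IDs to an allele frequency (or `None` when the variant is known but its frequency is not). It uses the 1% rare-variant cutoff from the exome analysis and totals the quoted counts to show the scale of the review.

```python
RARE_CUTOFF = 0.01  # 1% allele frequency, as in the exome analysis

_MISSING = object()  # sentinel: variant absent from the database


def triage(variant_id, db_frequency):
    """Place a variant in one of the review bins.

    db_frequency maps known variant IDs to an allele frequency, or to None
    when the variant has been reported but its frequency is unknown.
    """
    freq = db_frequency.get(variant_id, _MISSING)
    if freq is _MISSING:
        return "never reported"
    if freq is None:
        return "known, frequency unknown"
    if freq < RARE_CUTOFF:
        return "rare"
    return "common"


# Counts from the exome results quoted above.
counts = {"rare": 1583, "known, frequency unknown": 9842, "never reported": 4722}
print(sum(counts.values()))  # 16147 variants across the three bins
```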

Adding to the problem, the average patient does not have a working knowledge of genetics and is not accustomed to dealing with probabilistic answers from a physician. There is the prospect of many unnecessary, costly, and risky diagnostic procedures and treatments for incidental findings that might be of no real consequence.

The ACMG struggled with the problem for a year before releasing its recommendations for the reporting of incidental findings in April 2013. The recommendations list 57 gene entries associated with 24 conditions (56 distinct genes, since RET is listed for two conditions). They recommend that known pathogenic, and in some cases expected pathogenic, alleles of these genes be reported as incidental findings to patients who have had exome or whole genome analysis for some other condition.
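Stripped to its logic, the recommendation filters on two things: whether the gene is on the list, and how the variant has been classified. The Python sketch below assumes the classification has already been done by some curation process. Which genes also get their expected-pathogenic variants reported is passed in by the caller, and the gene sets in the example are placeholders, not the ACMG's actual assignments.

```python
from enum import Enum


class Classification(Enum):
    KNOWN_PATHOGENIC = "KP"
    EXPECTED_PATHOGENIC = "EP"
    UNCERTAIN = "VUS"
    BENIGN = "benign"


def should_report(gene, classification, acmg_genes, ep_genes):
    """Return True if a variant should be returned as an incidental finding.

    acmg_genes: genes on the recommended list.
    ep_genes: subset for which expected-pathogenic variants are also
    reported (an input here; the real assignment comes from the ACMG table).
    """
    if gene not in acmg_genes:
        return False
    if classification is Classification.KNOWN_PATHOGENIC:
        return True
    if classification is Classification.EXPECTED_PATHOGENIC:
        return gene in ep_genes
    return False  # uncertain or benign variants are not reported


# Placeholder inputs for illustration only.
acmg = {"BRCA1", "MSH6", "RYR2"}
ep = {"BRCA1"}
print(should_report("BRCA1", Classification.EXPECTED_PATHOGENIC, acmg, ep))  # True
print(should_report("MSH6", Classification.EXPECTED_PATHOGENIC, acmg, ep))   # False
print(should_report("HFE", Classification.KNOWN_PATHOGENIC, acmg, ep))       # False
```

The hard part, deciding what counts as known or expected pathogenic, sits outside this function, which is exactly where the grey areas discussed above arise.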

I thought that it might be a useful exercise to apply the ACMG recommendations to my own genome. I suspected that I might have some variants that would fall into a grey area where it would be challenging to decide whether to report them to myself as incidental findings. This is not entirely a fair experiment, as many of the diseases on the list appear in childhood, and I am a healthy adult well past the stage where these diseases would appear.

Here is the list of the 24 diseases adapted from the ACMG recommendations. The table gives the OMIM entries for each disorder and the PubMed IDs (PMIDs) of the corresponding GeneReviews summaries.

Hereditary Conditions Recommended for Reporting (adapted from ACMG Recommendations)

Phenotype | OMIM disorder entries | GeneReviews PMID | Age of onset
1. Hereditary Breast and Ovarian Cancer | 604370, 612555 | 20301425 | Adult
2. Li-Fraumeni Syndrome | 151623 | 20301488 | Child/adult
3. Peutz-Jeghers Syndrome | 175200 | 20301443 | Child/adult
4. Lynch Syndrome | 120435 | 20301390 | Adult
5. Familial adenomatous polyposis | 175100 | 20301519 | Child
6. MYH-Associated Polyposis; Adenomas, multiple colorectal, FAP type 2; Colorectal adenomatous polyposis, autosomal recessive, with pilomatricomas | 608456, 132600 | 23035301 | Adult
7. Von Hippel Lindau syndrome | 193300 | 20301636 | Child/adult
8. Multiple Endocrine Neoplasia Type 1 | 131100 | 20301710 | Child/adult
9. Multiple Endocrine Neoplasia Type 2 | 171400, 162300 | 20301434 | Child/adult
10. Familial Medullary Thyroid Cancer (FMTC) | 155240 | 20301434 | Child/adult
11. PTEN Hamartoma Tumor Syndrome | 153480 | 20301661 | Child
12. Retinoblastoma | 180200 | 20301625 | Child
13. Hereditary Paraganglioma-Pheochromocytoma Syndrome | 168000 (PGL1), 601650 (PGL2), 605373 (PGL3), 115310 (PGL4) | 20301715 | Child/adult
14. Tuberous Sclerosis Complex | 191100, 613254 | 20301399 | Child
15. WT1-related Wilms tumor | 194070 | 20301471 | Child
16. Neurofibromatosis type 2 | 101100 | 20301380 | Child/adult
17. Ehlers-Danlos syndrome (EDS), vascular type | 130050 | 20301667 | Child/adult
18. Marfan Syndrome, Loeys-Dietz Syndromes, and Familial Thoracic Aortic Aneurysms and Dissections | 154700, 609192, 608967, 610168, 610380, 613795, 611788 | 20301510, 20301312, 20301299 | Child/adult
19. Hypertrophic cardiomyopathy, Dilated cardiomyopathy | 115197, 192600, 601494, 613690, 115196, 608751, 612098, 600858, 301500, 608758, 115200 | 20301725 | Child/adult
20. Catecholaminergic polymorphic ventricular tachycardia | 604772 | not listed | not listed
21. Arrhythmogenic right ventricular cardiomyopathy | 609040, 604400, 610476, 607450, 610193 | 20301310 | Child/adult
22. Romano-Ward Long QT Syndromes Types 1, 2, and 3, Brugada Syndrome | 192500, 613688, 603830, 601144 | 20301308 | Child/adult
23. Familial hypercholesterolemia | 143890, 603776 | No GeneReviews entry | Child
24. Malignant hyperthermia susceptibility | 145600 | 20301325 | Child/adult

The narrow format of this blog requires me to split the wide table from the ACMG report. Here is the rest of it, showing the genes that should be examined for pathogenic variants (57 entries; RET appears under both conditions 9 and 10, so there are 56 distinct genes). The last column lists the variants in each gene found in my genome. These include only variants expected to have high or moderate impact on gene function, including nonsynonymous amino acid substitutions, nonsense mutations (stop gains), and indels in coding regions. Where I have a variant, I give its allele frequency from the estimates provided at the PGP site as of May 2013; a question mark means the PGP site reported no frequency.

Phenotype | OMIM disorder entries | GeneReviews PMID | Gene | PGP89 variants (allele freq.)
1 | 604370, 612555 | 20301425 | BRCA1 | BRCA1-S1634G (0.292); BRCA1-K1183R (0.302); BRCA1-E1038G (0.265); BRCA1-P871L (0.555)
1 | | | BRCA2 | none
2 | 151623 | 20301488 | TP53 | TP53-P72R (0.550)
3 | 175200 | 20301443 | STK11 | none
4 | 120435 | 20301390 | MLH1 | none
4 | | | MSH2 | MSH2-G322D (0.016)
4 | | | MSH6 | MSH6-G39E (?)
4 | | | PMS2 | PMS2-K541E (0.904)
5 | 175100 | 20301519 | APC | APC-V1822D (0.887)
6 | 608456, 132600 | 23035301 | MUTYH | MUTYH-V8M (0.028)
7 | 193300 | 20301636 | VHL | none
8 | 131100 | 20301710 | MEN1 | MEN1-T546A (0.791)
9 | 171400, 162300 | 20301434 | RET | none
10 | 155240 | 20301434 | RET | none
10 | | | NTRK1 | none
11 | 153480 | 20301661 | PTEN | none
12 | 180200 | 20301625 | RB1 | none
13 | 168000 (PGL1) | 20301715 | SDHD | none
13 | 601650 (PGL2) | | SDHAF2 | none
13 | 605373 (PGL3) | | SDHC | none
13 | 115310 (PGL4) | | SDHB | none
14 | 191100, 613254 | 20301399 | TSC1 | TSC1-S1043Del (?); TSC1-M322T (0.150)
14 | | | TSC2 | none
15 | 194070 | 20301471 | WT1 | none
16 | 101100 | 20301380 | NF2 | none
17 | 130050 | 20301667 | COL3A1 | COL3A1-A698T (0.181); COL3A1-H1353Q (0.990)
18 | 154700, 609192, 608967, 610168, 610380, 613795, 611788 | 20301510, 20301312, 20301299 | FBN1 | FBN1-C472Y (1.000)
18 | | | TGFBR1 | none
18 | | | TGFBR2 | none
18 | | | SMAD3 | none
18 | | | ACTA2 | none
18 | | | MYLK | none
18 | | | MYH11 | MYH11-A1241T (0.223)
19 | 115197, 192600, 601494, 613690, 115196, 608751, 612098, 600858, 301500, 608758, 115200 | 20301725 | MYBPC3 | none
19 | | | MYH7 | none
19 | | | TNNT2 | TNNT2-K260R (0.088)
19 | | | TNNI3 | none
19 | | | TPM1 | none
19 | | | MYL3 | none
19 | | | ACTC1 | none
19 | | | PRKAG2 | none
19 | | | GLA | none
19 | | | MYL2 | none
19 | | | LMNA | none
20 | 604772 | not listed | RYR2 | RYR2-G1885E (0.024)
21 | 609040, 604400, 610476, 607450, 610193 | 20301310 | PKP2 | none
21 | | | DSP | none
21 | | | DSC2 | none
21 | | | TMEM43 | TMEM43-K168N (0.335); TMEM43-M179T (0.487)
21 | | | DSG2 | none
22 | 192500, 613688, 603830, 601144 | 20301308 | KCNQ1 | none
22 | | | KCNH2 | KCNH2-R1047L (?); KCNH2-K897T (0.098)
22 | | | SCN5A | SCN5A-S524Y (0.008)
23 | 143890, 603776 | No GeneReviews entry | LDLR | LDLR-A391T (0.102)
23 | | | APOB | APOB-S4338N (0.725); APOB-I2313V (0.964); APOB-Y1422C (0.994)
23 | | | PCSK9 | PCSK9-V474I (0.859)
24 | 145600 | 20301325 | RYR1 | none
24 | | | CACNA1S | CACNA1S-L458H (?)

In the 23andMe exome analysis described in my last post, common variants were assumed to be benign, using a cutoff of 1% allele frequency. There aren’t many variants here, so I have set the cutoff at 5% instead. This gives a modest set of eight variants to evaluate, shown in the table below, including variants whose frequency is not reported on the PGP summary site. As you will see below, one of these is a common variant that can be eliminated.

Phenotype | OMIM disorder entries | GeneReviews PMID | Gene | PGP89 variants (allele freq.)
4 | 120435 | 20301390 | MSH2 | MSH2-G322D (0.016)
4 | | | MSH6 | MSH6-G39E (?)
6 | 608456, 132600 | 23035301 | MUTYH | MUTYH-V8M (0.028)
14 | 191100, 613254 | 20301399 | TSC1 | TSC1-S1043Del (?)
20 | 604772 | not listed | RYR2 | RYR2-G1885E (0.024)
22 | 192500, 613688, 603830, 601144 | 20301308 | KCNH2 | KCNH2-R1047L (?)
22 | | | SCN5A | SCN5A-S524Y (0.008)
24 | 145600 | 20301325 | CACNA1S | CACNA1S-L458H (?)
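The frequency filter is easy to reproduce. This Python sketch transcribes the PGP89 variants and frequencies from the full gene table (None stands for a “?” entry), keeps anything below 5% or of unknown frequency, and recovers the same eight variants as the shortlist table. Treating common variants as benign is the same simplifying assumption used in the exome analysis, not a rule that always holds.

```python
# PGP89 variants in the ACMG genes with PGP allele-frequency estimates
# (May 2013); None marks a frequency the PGP site did not report.
PGP89_VARIANTS = [
    ("BRCA1-S1634G", 0.292), ("BRCA1-K1183R", 0.302),
    ("BRCA1-E1038G", 0.265), ("BRCA1-P871L", 0.555),
    ("TP53-P72R", 0.550),
    ("MSH2-G322D", 0.016), ("MSH6-G39E", None), ("PMS2-K541E", 0.904),
    ("APC-V1822D", 0.887),
    ("MUTYH-V8M", 0.028),
    ("MEN1-T546A", 0.791),
    ("TSC1-S1043Del", None), ("TSC1-M322T", 0.150),
    ("COL3A1-A698T", 0.181), ("COL3A1-H1353Q", 0.990),
    ("FBN1-C472Y", 1.000), ("MYH11-A1241T", 0.223),
    ("TNNT2-K260R", 0.088),
    ("RYR2-G1885E", 0.024),
    ("TMEM43-K168N", 0.335), ("TMEM43-M179T", 0.487),
    ("KCNH2-R1047L", None), ("KCNH2-K897T", 0.098), ("SCN5A-S524Y", 0.008),
    ("LDLR-A391T", 0.102), ("APOB-S4338N", 0.725),
    ("APOB-I2313V", 0.964), ("APOB-Y1422C", 0.994), ("PCSK9-V474I", 0.859),
    ("CACNA1S-L458H", None),
]

CUTOFF = 0.05  # 5% allele frequency


def needs_review(freq, cutoff=CUTOFF):
    """Keep rare variants and those of unknown frequency; treat the rest
    as common and presumptively benign."""
    return freq is None or freq < cutoff


shortlist = [name for name, freq in PGP89_VARIANTS if needs_review(freq)]
print(len(shortlist))  # 8
for name in shortlist:
    print(name)
```

Keeping the unknown-frequency variants is deliberate: a missing estimate is not evidence that a variant is common, which is why CACNA1S-L458H has to be eliminated by hand below.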

4. Lynch Syndrome. The MSH2-G322D allele is evaluated at the PGP site as likely benign. They cite clinical studies that show that it appears in the controls as well as in cancer patients. I wouldn’t report this to myself as an incidental finding.

4. Lynch Syndrome. The MSH6-G39E allele is associated with an increased risk of colon cancer in men in one of the studies cited at the PGP site (2). Among cancer risk alleles, this is a relatively modest one. I would report this one to myself as an incidental finding, suggesting that I follow standard recommendations for monitoring for colon cancer.

6. MYH-Associated Polyposis; Adenomas, multiple colorectal, FAP type 2; Colorectal adenomatous polyposis, autosomal recessive, with pilomatricomas. The MUTYH-V8M allele is a rare variant that does not appear to be associated with colorectal adenomatous polyposis; over 70% of cancers of this type are associated with the two alleles MUTYH-Y165C and MUTYH-G382D (3). I would not report this to myself as an incidental finding.

14. Tuberous Sclerosis Complex. The TSC1-S1043Del allele is a rare, poorly characterized indel. Tuberous Sclerosis Complex results from dominant mutations and manifests in childhood; since I am a healthy adult with no signs of the disease, this variant is of no apparent significance, and I would not report it to myself as an incidental finding.

20. Catecholaminergic polymorphic ventricular tachycardia. There is evidence that the RYR2-G1885E allele is pathogenic in compound heterozygotes who also carry the RYR2-G1886S allele. My other allele of RYR2 is normal, so I am not at risk for this recessive condition, and I would not report this to myself as an incidental finding. I also had an EKG as part of a medical evaluation several years ago, and my cardiac function is normal.

22. Romano-Ward Long QT Syndromes Types 1, 2, and 3, Brugada Syndrome. The rare KCNH2-R1047L variant is not known to be associated with Long QT Syndrome, which is inherited in an autosomal dominant manner and typically causes cardiac events in the teens and early 20s. Because my heart function is normal, this variant is not reportable as an incidental finding; moreover, my health is evidence that KCNH2-R1047L is a benign allele, at least given the rest of my genetic background.

22. Romano-Ward Long QT Syndromes Types 1, 2, and 3, Brugada Syndrome. The rare SCN5A-S524Y variant is not known to be associated with Long QT Syndrome, which is inherited in an autosomal dominant manner and typically causes cardiac events in the teens and early 20s. Because my heart function is normal, this variant is not reportable as an incidental finding; moreover, my health is evidence that SCN5A-S524Y is a benign allele, at least given the rest of my genetic background.

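24. Malignant hyperthermia susceptibility. The CACNA1S-L458H variant is common (allele frequency over 20%) and known to be benign. This is the common variant that can be eliminated, and I would not report it as an incidental finding.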
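The RYR2 argument is a statement about the two copies of the gene, so it can be sketched as a check on phased genotypes. This Python sketch assumes we know which variants sit on which copy (haplotype), which an unphased variant list does not tell us; the function name `recessive_risk` is illustrative, and treating G1885E and G1886S as a jointly pathogenic pair follows the evidence described in the RYR2 entry.

```python
def recessive_risk(haplotype_a, haplotype_b, pathogenic):
    """Return True only if both copies of the gene carry a variant from
    `pathogenic` (assumes phased data: one set of variants per copy)."""
    return bool(haplotype_a & pathogenic) and bool(haplotype_b & pathogenic)


# Variants treated as pathogenic when found on opposite copies.
pathogenic_pair = {"RYR2-G1885E", "RYR2-G1886S"}

# PGP89: G1885E on one copy, the other copy normal.
print(recessive_risk({"RYR2-G1885E"}, set(), pathogenic_pair))            # False
# A compound heterozygote with G1885E and G1886S on opposite copies.
print(recessive_risk({"RYR2-G1885E"}, {"RYR2-G1886S"}, pathogenic_pair))  # True
```

Note that both variants on the same copy, with the other copy normal, also returns False, which is the usual cis/trans distinction for recessive conditions.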

Summary and Recommendations. I carry what may be a risk allele for colon cancer (MSH6-G39E). I should follow normal recommendations for the detection of colon cancer, as well as standard recommendations for prevention (fruit, vegetables, and fiber in the diet, easy on the grilled meats). I have two rare variants in genes for which other variants cause dominant cardiac abnormalities. My cardiac health is evidence that KCNH2-R1047L and SCN5A-S524Y are benign polymorphisms.
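Several of the calls above (TSC1, KCNH2, SCN5A) rest on the same rule of thumb: for a dominant condition that usually appears by a certain age, being unaffected well past that age argues against a highly penetrant allele. A minimal Python sketch of that rule follows; the function name and the ages in the example are illustrative assumptions, and it does not model reduced penetrance, modifier genes, or recessive conditions.

```python
def argues_against_penetrant(inheritance, typical_onset_max_age,
                             carrier_age, affected):
    """Rule of thumb: an unaffected carrier of a variant for a dominant
    condition, older than the usual age of onset, suggests the variant is
    not highly penetrant. Returns False whenever the rule does not apply."""
    return (
        inheritance == "dominant"
        and not affected
        and carrier_age > typical_onset_max_age
    )


# Illustrative numbers: Long QT events typically occur by the early 20s
# (25 used as an assumed upper bound) and a hypothetical 50-year-old carrier.
print(argues_against_penetrant("dominant", 25, 50, affected=False))   # True
print(argues_against_penetrant("recessive", 25, 50, affected=False))  # False
```

A True result is weak evidence, not proof; it is the same "at least given the rest of my genetic background" caveat expressed as code.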

References

1. Green RC, et al. (2013) ACMG Recommendations for Reporting of Incidental Findings in Clinical Exome and Genome Sequencing. Presented at the ACMG Annual Clinical Genetics Meeting, March 22, 2013.

2. Curtin K, Samowitz WS, Wolff RK, Caan BJ, Ulrich CM, Potter JD, Slattery ML. (2009) MSH6 G39E polymorphism and CpG island methylator phenotype in colon cancer. Mol Carcinog. 48: 989-94.

3. Forsbring M, Vik ES, Dalhus B, Karlsen TH, Bergquist A, Schrumpf E, Bjørås M, Boberg KM, Alseth I. (2009) Catalytically impaired hMYH and NEIL1 mutant proteins identified in patients with primary sclerosing cholangitis and cholangiocarcinoma. Carcinogenesis. 30: 1147-54.
