A Whole Genome Checkup

In April 2012, I attended the Genomes, Environments, and Traits 2012 conference in Boston. It was a great meeting, full of exciting speakers and opportunities to talk to people from many different fields. I had already enrolled in the Personal Genome Project and was waiting for my exome results from 23andMe. The meeting organizers took advantage of the presence of over 100 PGP volunteers to carry out blood draws for DNA sequencing and the establishment of cell lines.

I knew that there was a long waiting list for complete genome sequencing. Fewer than 100 PGP volunteers had been sequenced by April 2012, while more than 1,000 people were enrolled and waiting. I had already donated my 23andMe SNP data, and I donated my data from the 23andMe exome pilot when I obtained it in the summer of 2012. Both of these datasets are available on my page on the PGP site.

The organizers of the PGP have developed a priority system to decide who will be sequenced first. I recently learned that PGP volunteers who had attended GET 2012 and donated blood there were given priority as active volunteers, and apparently donating my exome contributed to giving me priority as well. In the fall of 2012, I learned that my genome had been fully sequenced. I consented to make the data public immediately, and became PGP89.

Whole genome sequence data are filtered at the PGP to identify variants of high and moderate impact. Variants that change the amino acid sequence of the encoded protein are identified and evaluated. These include nonsynonymous substitutions, frameshifts, and stop gain and stop loss variants. It is easy to search a genome for these kinds of variants using the website. The screenshot below shows the results of a search for “Msh” in my genome, revealing the variants MSH6-G39E and MSH2-G322D, both present in heterozygous form.

PGP_MSH

I spent quite a bit of time going through my results, comparing my genome to other publicly available genomes, and learning to work with the raw files. This exploration frequently sent me to the biomedical literature and taxed the limits of my text-editing software as I manipulated the raw data files. I am sure that something will eventually emerge from all of this thrashing, but for this post, there is a more topical way to introduce the results of sequencing my genome.

Immediately prior to the GET 2013 conference in late April, the American College of Medical Genetics and Genomics released a set of recommendations for the reporting of incidental findings in clinical exome and whole genome sequencing. These recommendations were mentioned frequently during the meeting, and have generated a great deal of discussion.

What is an incidental finding? If a patient is given an X-ray to look for evidence of a lung condition, the radiologist can’t help seeing the heart as well. If there is something wrong with the patient’s heart, the radiologist has made an incidental finding. Whenever there is an incidental finding, physicians must decide if it is something that requires treatment or whether it is of no consequence. With the falling cost of exome and whole-genome sequencing, genomic techniques are being used as diagnostic procedures. Obtaining an exome or whole genome sequence offers the opportunity for a vast array of incidental findings.

Which incidental genomic findings should be reported? Our knowledge of the connection between genetic variation and disease is incomplete, so there is not a clear answer. There are some genetic variants that pose a well-documented risk to human health, for example, specific alleles of the BRCA1 gene that confer a greatly elevated risk of breast and ovarian cancer. Most people would agree that finding a known pathogenic allele of BRCA1 should be reported to a patient. On the other hand, most people carrying risk alleles of HFE never develop hemochromatosis, a disease that typically manifests in the fifth decade or later. Even if we were to develop a short list of genes for which variants should be reported, some variants will be clearly harmless (synonymous substitutions, for example) while others will be known to be pathogenic from prior studies. There will also be rare alleles that have never been identified in clinical cases, but which we might suspect to be pathogenic. No matter how we draw up the gene list or define the kinds of alleles to report, there will always be ambiguities.

I am something of a data junkie, obviously, so I’d like to know everything. However, there are costs to delivering a complete genome analysis to members of the general public in a clinical setting. First, there is the cost of evaluating all of the variants. Any individual human genome will have many variants that are rare, variants whose frequency is unknown, or variants that have never been reported. In the case of my exome results described in the prior post, there are 1,583 rare variants, 9,842 known variants whose allele frequency is unknown, and 4,722 variants that have never been reported. It isn’t reasonable to ask a set of clinicians to give me a full report on all of this, especially because for the majority of the variants, the best answer would be that we don’t know whether there are any medical consequences.

Adding to the problem, the average patient does not have a working knowledge of genetics and is not accustomed to dealing with probabilistic answers from a physician. There is the prospect of many unnecessary, costly, and risky diagnostic procedures and treatments for incidental findings that might be of no real consequence.

The ACMG struggled with the problem for a year before releasing a set of recommendations for the reporting of incidental findings in April 2013. The recommendations detail 57 genes associated with 24 conditions. They recommend that known pathogenic and in some cases expected pathogenic alleles of these genes be reported as incidental findings to patients who have had exome or full genome analysis for some other condition.

I thought that it might be a useful exercise to apply the ACMG recommendations to my own genome. I suspected that I might have some variants that would fall into a grey area where it would be challenging to decide whether to report them to myself as incidental findings. This is not entirely a fair experiment, as many of the diseases on the list appear in childhood, and I am a healthy adult well past the stage where these diseases would appear.

Here is the list of the 24 diseases adapted from the ACMG recommendations. Links in the table below take you to useful summaries at OMIM and PubMed.

Hereditary Conditions Recommended for Reporting (adapted from ACMG Recommendations)

| Phenotype | OMIM Disorder | PMID GeneReview Entry | Age of Onset |
| --- | --- | --- | --- |
| 1. Hereditary Breast and Ovarian Cancer | 604370, 612555 | 20301425 | Adult |
| 2. Li-Fraumeni Syndrome | 151623 | 20301488 | Child/adult |
| 3. Peutz-Jeghers Syndrome | 175200 | 20301443 | Child/adult |
| 4. Lynch Syndrome | 120435 | 20301390 | Adult |
| 5. Familial adenomatous polyposis | 175100 | 20301519 | Child |
| 6. MYH-Associated Polyposis; Adenomas, multiple colorectal, FAP type 2; Colorectal adenomatous polyposis, autosomal recessive, with pilomatricomas | 608456, 132600 | 23035301 | Adult |
| 7. Von Hippel Lindau syndrome | 193300 | 20301636 | Child/adult |
| 8. Multiple Endocrine Neoplasia Type 1 | 131100 | 20301710 | Child/adult |
| 9. Multiple Endocrine Neoplasia Type 2 | 171400, 162300 | 20301434 | Child/adult |
| 10. Familial Medullary Thyroid Cancer (FMTC) | 155240 | 20301434 | Child/adult |
| 11. PTEN Hamartoma Tumor Syndrome | 153480 | 20301661 | Child |
| 12. Retinoblastoma | 180200 | 20301625 | Child |
| 13. Hereditary Paraganglioma-Pheochromocytoma Syndrome | 168000 (PGL1), 601650 (PGL2), 605373 (PGL3), 115310 (PGL4) | 20301715 | Child/adult |
| 14. Tuberous Sclerosis Complex | 191100, 613254 | 20301399 | Child |
| 15. WT1-related Wilms tumor | 194070 | 20301471 | Child |
| 16. Neurofibromatosis type 2 | 101100 | 20301380 | Child/adult |
| 17. EDS – vascular type | 130050 | 20301667 | Child/adult |
| 18. Marfan Syndrome, Loeys-Dietz Syndromes, and Familial Thoracic Aortic Aneurysms and Dissections | 154700, 609192, 608967, 610168, 610380, 613795, 611788 | 20301510, 20301312, 20301299 | Child/adult |
| 19. Hypertrophic cardiomyopathy, Dilated cardiomyopathy | 115197, 192600, 601494, 613690, 115196, 608751, 612098, 600858, 301500, 608758, 115200 | 20301725 | Child/adult |
| 20. Catecholaminergic polymorphic ventricular tachycardia | 604772 | | |
| 21. Arrhythmogenic right ventricular cardiomyopathy | 609040, 604400, 610476, 607450, 610193 | 20301310 | Child/adult |
| 22. Romano-Ward Long QT Syndromes Types 1, 2, and 3, Brugada Syndrome | 192500, 613688, 603830, 601144 | 20301308 | Child/adult |
| 23. Familial hypercholesterolemia | 143890, 603776 | No GeneReviews entry | Child |
| 24. Malignant hyperthermia susceptibility | 145600 | 20301325 | Child/adult |

The narrow format of this blog requires me to split the wide table from the ACMG report. Here is the rest of it, showing the 57 genes that should be examined for pathogenic variants. The last column of the table shows variants for each gene found in my genome. These variants include only those expected to have high or moderate impact on gene function, including nonsynonymous amino acid substitutions, nonsense mutations (stop gains), and indels in the coding regions. In those cases where I have a variant, I give the allele frequency from the estimates provided at the PGP site as of May 2013.

| Phenotype | OMIM Disorder | PMID GeneReview Entry | Gene w/ OMIM link | PGP89 Variants (Freq.) |
| --- | --- | --- | --- | --- |
| 1 | 604370, 612555 | 20301425 | BRCA1 | BRCA1-S1634G (0.292); BRCA1-K1183R (0.302); BRCA1-E1038G (0.265); BRCA1-P871L (0.555) |
| | | | BRCA2 | none |
| 2 | 151623 | 20301488 | TP53 | TP53-P72R (0.550) |
| 3 | 175200 | 20301443 | STK11 | none |
| 4 | 120435 | 20301390 | MLH1 | none |
| | | | MSH2 | MSH2-G322D (0.016) |
| | | | MSH6 | MSH6-G39E (?) |
| | | | PMS2 | PMS2-K541E (0.904) |
| 5 | 175100 | 20301519 | APC | APC-V1822D (0.887) |
| 6 | 608456, 132600 | 23035301 | MUTYH | MUTYH-V8M (0.028) |
| 7 | 193300 | 20301636 | VHL | none |
| 8 | 131100 | 20301710 | MEN1 | MEN1-T546A (0.791) |
| 9 | 171400, 162300 | 20301434 | RET | none |
| 10 | 155240 | 20301434 | RET | none |
| | | | NTRK1 | none |
| 11 | 153480 | 20301661 | PTEN | none |
| 12 | 180200 | 20301625 | RB1 | none |
| 13 | 168000 (PGL1) | 20301715 | SDHD | none |
| | 601650 (PGL2) | | SDHAF2 | none |
| | 605373 (PGL3) | | SDHC | none |
| | 115310 (PGL4) | | SDHB | none |
| 14 | 191100, 613254 | 20301399 | TSC1 | TSC1-S1043Del (?); TSC1-M322T (0.150) |
| | | | TSC2 | none |
| 15 | 194070 | 20301471 | WT1 | none |
| 16 | 101100 | 20301380 | NF2 | none |
| 17 | 130050 | 20301667 | COL3A1 | COL3A1-A698T (0.181); COL3A1-H1353Q (0.990) |
| 18 | 154700, 609192, 608967, 610168, 610380, 613795, 611788 | 20301510, 20301312, 20301299 | FBN1 | FBN1-C472Y (1.000) |
| | | | TGFBR1 | none |
| | | | TGFBR2 | none |
| | | | SMAD3 | none |
| | | | ACTA2 | none |
| | | | MYLK | none |
| | | | MYH11 | MYH11-A1241T (0.223) |
| 19 | 115197, 192600, 601494, 613690, 115196, 608751, 612098, 600858, 301500, 608758, 115200 | 20301725 | MYBPC3 | none |
| | | | MYH7 | none |
| | | | TNNT2 | TNNT2-K260R (0.088) |
| | | | TNNI3 | none |
| | | | TPM1 | none |
| | | | MYL3 | none |
| | | | ACTC1 | none |
| | | | PRKAG2 | none |
| | | | GLA | none |
| | | | MYL2 | none |
| | | | LMNA | none |
| 20 | 604772 | | RYR2 | RYR2-G1885E (0.024) |
| 21 | 609040, 604400, 610476, 607450, 610193 | 20301310 | PKP2 | none |
| | | | DSP | none |
| | | | DSC2 | none |
| | | | TMEM43 | TMEM43-K168N (0.335); TMEM43-M179T (0.487) |
| | | | DSG2 | none |
| 22 | 192500, 613688, 603830, 601144 | 20301308 | KCNQ1 | none |
| | | | KCNH2 | KCNH2-R1047L (?); KCNH2-K897T (0.098) |
| | | | SCN5A | SCN5A-S524Y (0.008) |
| 23 | 143890, 603776 | No GeneReviews entry | LDLR | LDLR-A391T (0.102) |
| | | | APOB | APOB-S4338N (0.725); APOB-I2313V (0.964); APOB-Y1422C (0.994) |
| | | | PCSK9 | PCSK9-V474I (0.859) |
| 24 | 145600 | 20301325 | RYR1 | none |
| | | | CACNA1S | CACNA1S-L458H (?) |

In the exome analysis provided by 23andMe described in my last post, common variants are assumed to be benign. There aren’t that many variants here, so I have set the cutoff at an allele frequency of 5% rather than at 1% as in my exome analysis. This gives a modest set of eight variants to be evaluated, shown in the table below. This includes variants whose frequency is not reported on the PGP summary site. As you will see below, one of these is a common variant that can be eliminated.

| Phenotype | OMIM Disorder | PMID GeneReview Entry | Gene w/ OMIM link | PGP89 Variants (Freq.) |
| --- | --- | --- | --- | --- |
| 4 | 120435 | 20301390 | MSH2 | MSH2-G322D (0.016) |
| | | | MSH6 | MSH6-G39E (?) |
| 6 | 608456, 132600 | 23035301 | MUTYH | MUTYH-V8M (0.028) |
| 14 | 191100, 613254 | 20301399 | TSC1 | TSC1-S1043Del (?) |
| 20 | 604772 | | RYR2 | RYR2-G1885E (0.024) |
| 22 | 192500, 613688, 603830, 601144 | 20301308 | KCNH2 | KCNH2-R1047L (?) |
| | | | SCN5A | SCN5A-S524Y (0.008) |
| 24 | 145600 | 20301325 | CACNA1S | CACNA1S-L458H (?) |
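For readers who want to reproduce the shortlist, here is a minimal Python sketch of the 5% cutoff. The variant names and frequencies are copied from the last column of the 57-gene table; the names `pgp89_variants` and `needs_review` are made up for this illustration, and `None` stands in for a frequency shown as "(?)". It is only a sketch of the filtering idea, not the PGP's own pipeline.

```python
# Variants and allele frequencies copied from the 57-gene table in this post.
# None marks a frequency that the table shows as "(?)".
pgp89_variants = {
    "BRCA1-S1634G": 0.292, "BRCA1-K1183R": 0.302, "BRCA1-E1038G": 0.265,
    "BRCA1-P871L": 0.555, "TP53-P72R": 0.550, "MSH2-G322D": 0.016,
    "MSH6-G39E": None, "PMS2-K541E": 0.904, "APC-V1822D": 0.887,
    "MUTYH-V8M": 0.028, "MEN1-T546A": 0.791, "TSC1-S1043Del": None,
    "TSC1-M322T": 0.150, "COL3A1-A698T": 0.181, "COL3A1-H1353Q": 0.990,
    "FBN1-C472Y": 1.000, "MYH11-A1241T": 0.223, "TNNT2-K260R": 0.088,
    "RYR2-G1885E": 0.024, "TMEM43-K168N": 0.335, "TMEM43-M179T": 0.487,
    "KCNH2-R1047L": None, "KCNH2-K897T": 0.098, "SCN5A-S524Y": 0.008,
    "LDLR-A391T": 0.102, "APOB-S4338N": 0.725, "APOB-I2313V": 0.964,
    "APOB-Y1422C": 0.994, "PCSK9-V474I": 0.859, "CACNA1S-L458H": None,
}


def needs_review(variants, cutoff=0.05):
    """Return variants that are below the frequency cutoff or have no frequency."""
    return [name for name, freq in variants.items()
            if freq is None or freq < cutoff]


shortlist = needs_review(pgp89_variants)
print(len(shortlist))  # 8
for name in shortlist:
    print(name)
```

Tightening the cutoff to 1%, as in the exome analysis, would leave only SCN5A-S524Y plus the four variants of unknown frequency.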

4. Lynch Syndrome. The MSH2-G322D allele is evaluated at the PGP site as likely benign. They cite clinical studies that show that it appears in the controls as well as in cancer patients. I wouldn’t report this to myself as an incidental finding.

4. Lynch Syndrome. The MSH6-G39E allele is associated with an increased risk of colon cancer in men in one of the studies cited at the PGP site (2). Among cancer risk alleles, this is a relatively modest one. I would report this one to myself as an incidental finding, suggesting that I follow standard recommendations for monitoring for colon cancer.

6. MYH-Associated Polyposis; Adenomas, multiple colorectal, FAP type 2; Colorectal adenomatous polyposis, autosomal recessive, with pilomatricomas. The MUTYH-V8M allele is a rare variant that does not appear to be associated with colorectal adenomatous polyposis; over 70% of cancers of this type are associated with the two alleles MUTYH-Y165C and MUTYH-G382D (3). I would not report this to myself as an incidental finding.

14. Tuberous Sclerosis Complex. The TSC1-S1043Del allele is a rare, poorly characterized indel. Because Tuberous Sclerosis Complex results from dominant mutations and the disease manifests in childhood, this variant is of no significance and I would not report it to myself as an incidental finding.

20. Catecholaminergic polymorphic ventricular tachycardia. There is evidence that the RYR2-G1885E allele is pathogenic in compound heterozygotes that also carry the RYR2-G1886S allele. My other allele of RYR2 is normal, so I am not at risk for this recessive condition. I would not report this to myself as an incidental finding. I had an EKG as part of a medical evaluation several years ago and my cardiac function is normal.

22. Romano-Ward Long QT Syndromes Types 1, 2, and 3, Brugada Syndrome. The rare KCNH2-R1047L variant is not known to be associated with Long QT Syndrome, which shows an autosomal dominant form of inheritance, typically resulting in cardiac events in the teens and early 20s. As my heart function is normal, not only is this not reportable as an incidental finding, my health is evidence that KCNH2-R1047L is a benign allele, at least given the rest of my genetic background.

22. Romano-Ward Long QT Syndromes Types 1, 2, and 3, Brugada Syndrome. The rare SCN5A-S524Y variant is not known to be associated with Long QT Syndrome, which shows an autosomal dominant form of inheritance, typically resulting in cardiac events in the teens and early 20s. As my heart function is normal, not only is this not reportable as an incidental finding, my health is evidence that SCN5A-S524Y is a benign allele, at least given the rest of my genetic background.

24. Malignant hyperthermia susceptibility. The CACNA1S-L458H variant is common (allele frequency over 20%), and known to be benign.

Summary and Recommendations. I carry what may be a risk allele for colon cancer (MSH6-G39E). I should follow normal recommendations for the detection of colon cancer, as well as standard recommendations for prevention (fruit, vegetables, and fiber in the diet, easy on the grilled meats). I have two rare variants in genes for which other variants cause dominant cardiac abnormalities. My cardiac health is evidence that KCNH2-R1047L and SCN5A-S524Y are benign polymorphisms.

References

1. RC Green et al. (2013) ACMG Recommendations for Reporting of Incidental Findings in Clinical Exome and Genome Sequencing. ACMG Annual Clinical Genetics Meeting 3/22/2013

2. Curtin K, Samowitz WS, Wolff RK, Caan BJ, Ulrich CM, Potter JD, Slattery ML. (2009) MSH6 G39E polymorphism and CpG island methylator phenotype in colon cancer. Mol Carcinog. 48: 989-94.

3. Forsbring M, Vik ES, Dalhus B, Karlsen TH, Bergquist A, Schrumpf E, Bjørås M, Boberg KM, Alseth I. (2009) Catalytically impaired hMYH and NEIL1 mutant proteins identified in patients with primary sclerosing cholangitis and cholangiocarcinoma. Carcinogenesis. 30: 1147-54.

How I Learned to Stop Worrying and Love My Exome

In the spring of 2012, 23andMe offered customers the opportunity to participate in a pilot program to have their exome sequenced. The exome is the minor fraction – around 2% – of the human genome that encodes proteins. Exome sequencing is a transitional technology that has an excellent chance of finding variants responsible for inherited conditions at a significantly lower cost than whole-genome sequencing. As the cost of whole-genome sequencing falls, exome sequencing will probably disappear, but in the spring of 2012, it looked like an interesting offer.

In exome sequencing, the protein-coding portions of the human genome are selected through hybridization to an array of synthetic oligonucleotides representing all known exons, then sequenced using next-gen sequencing. 23andMe offered customers raw data without any interpretation. We were cautioned not to expect any user-friendly guides like the ones provided for SNP surveys. The price for this limited-time offer was $1,000.

My prior experience with having my genome analyzed had been outstanding. I found out about a medically actionable condition (predisposition to hemochromatosis), learned about my Neandertal ancestry, and gained a wealth of raw data to check as I read the biomedical literature. My experience helped me to add personal interest to my teaching and public outreach activity. I wasn’t sure that I had the skills to analyze the raw data from my exome sequence, but I thought that once I had my exome sequence, I would be highly motivated to learn how to analyze it. I had invested far more than $1,000 in other aspects of my education, and those investments had always paid off. I signed up.

The sample kit arrived quickly, but it seemed like forever before the results were ready (it was actually only four months). I followed the instructions for the elaborate download process designed to protect my genomic privacy. Despite having been promised no analysis whatsoever, 23andMe provided a limited analysis of the results in an accompanying PDF file. You can download my exome analysis here.

The first figure from the results is shown below.

reads_to_variants

Part A of the figure shows how many bases were called as a result of the exome sequencing. Some of the sequence data fails a quality filter, is duplicate data, or is off target, but after all that, there were nearly 3 billion bases of on-target exome sequence. This gives some idea of the extent to which coverage of my exome is overlapping, because there are 3 billion base pairs in the (haploid) human genome. Taking into account that I carry two genomes, one from each parent, there is about 25x coverage of the 2% of my genome that encodes proteins.
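The coverage estimate can be checked with a few lines of Python. This is a back-of-the-envelope sketch that follows the accounting in the paragraph above (dividing by both parental copies of the exome); the constant and function names are invented for the example, and "nearly 3 billion" is rounded up to exactly 3 billion.

```python
HAPLOID_GENOME_BP = 3_000_000_000  # about 3 billion base pairs, as stated above
EXOME_FRACTION = 0.02              # about 2% of the genome encodes proteins
ON_TARGET_BASES = 3_000_000_000    # "nearly 3 billion" on-target bases, rounded up


def diploid_exome_coverage(on_target_bases, genome_bp, exome_fraction):
    """Average reads per exome base, counting both parental copies of the exome."""
    diploid_exome_bp = 2 * genome_bp * exome_fraction
    return on_target_bases / diploid_exome_bp


coverage = diploid_exome_coverage(ON_TARGET_BASES, HAPLOID_GENOME_BP, EXOME_FRACTION)
print(f"{coverage:.0f}x")  # 25x
```

The denominator works out to about 120 million bases, which is the same figure that appears in Part B of the figure.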

Part B of the figure shows that the vast majority of my exome matches the reference sequence. Almost all of the over 120 million base calls are the same as the reference human genome assembly. The tiny sliver of red on top of the yellow bar represents my variants.

Part C of the figure breaks down my variants into two classes, SNPs and indels. The Single Nucleotide Polymorphisms (SNPs) are like those from my original 23andMe analysis, but here they cover every variant discovered in my exome, including variants not previously described in the analysis of human genomes. The standard analysis of 600,000 SNPs using 23andMe’s SNP chip only detects variants built into the chip; these were all described prior to the design of the chip. My exome sequence includes all variants, including previously unknown or even “private” variants confined to me alone. Indels (insertions or deletions) are sites of variation where parts of my genome differ from the reference assembly by the insertion or deletion of one or more bases.

There are many variants in my exome. Leaving out the small fraction that doesn’t pass quality checks, I have about 100,000 SNPs that differ from the reference sequence and almost 10,000 indels. The 100,000 SNPs seen in my exome are only a small fraction of the 6,000,000 SNPs that differ between any two people, because there are more SNPs per unit of DNA in the noncoding parts of the genome than in the exome. Variation in coding sequences can potentially result in changes to protein sequence, so much of the variation that arises by mutation in coding sequences is removed over time from the population by selection. Most variation in noncoding sequences is neutral (neither selected for nor selected against).

The exome analysis provided by 23andMe characterizes the variants in my exome by their impact on gene function, as shown in the graph below.

characterized_variants

In this graph, high impact variants are frameshift mutations, splice site variants, loss or gain of stop codons, and loss of start codons. In a frameshift mutation, there is an insertion or deletion of a number of bases that is not a multiple of three. Because bases are read three at a time during translation, insertion or deletion of one or two bases will cause translation downstream of the variant to take place in a different reading frame, resulting in a radically altered protein sequence that is likely to have a premature stop codon. Splice site variants may interfere with the correct splicing of RNA (removal of introns), which could cause catastrophic changes to the sequence of the encoded protein. Stop codon gain (a nonsense mutation) will cause a truncated protein product missing all amino acids downstream of the position of the new stop codon, while stop codon loss will cause a protein to have an extension of new amino acids following what is normally the end of the protein. Loss of the start codon might eliminate the protein entirely, or result in a protein with a gain or loss of amino acids at the beginning of the protein, resulting from using another start codon.

Variants of moderate impact include nonsynonymous substitutions and codon insertions and deletions. A nonsynonymous substitution will change a single amino acid in a protein sequence to a different amino acid. Codon insertion or deletion will add or remove one or more amino acids without altering the sequence of the protein downstream of the variant.
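The difference between a frameshift and a codon insertion or deletion comes down to whether the length change is a multiple of three. Here is a minimal sketch of that rule; the function name `classify_indel` and the example alleles are hypothetical, and a real annotation tool would also check where the indel falls relative to exon boundaries.

```python
def classify_indel(ref, alt):
    """Classify a coding indel by whether it preserves the reading frame."""
    length_change = abs(len(alt) - len(ref))
    if length_change == 0:
        raise ValueError("not an indel: ref and alt have the same length")
    if length_change % 3 == 0:
        # Whole codons are added or removed; downstream sequence is unchanged.
        return "codon insertion/deletion (moderate impact)"
    # The reading frame shifts, scrambling everything downstream.
    return "frameshift (high impact)"


print(classify_indel("A", "AT"))    # one base inserted
print(classify_indel("ATCA", "A"))  # three bases deleted
```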

Variants of low impact include synonymous substitutions, synonymous stops, and start gains. The genetic code is degenerate, which means that there are multiple codons that encode the same amino acid. Most amino acids are encoded by multiple synonymous codons. Substitution of a synonymous codon would not be expected to have any impact on gene or protein function. Similarly, there are three stop codons. Substitution of one stop codon for another would not be expected to make any difference. In a start gain, there is a new start codon upstream of the original one, but the original start codon can still be used to encode a normal protein.
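To make the degeneracy of the genetic code concrete, here is a small Python sketch that builds the standard codon table and labels a single-codon substitution. The function name and the example codons are chosen for illustration only (they are not the actual nucleotide changes behind any variant in this post), and start codon gains and losses are left out because they depend on position in the transcript rather than on the codon alone.

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Label a codon change using the categories described in the text."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous stop" if ref_aa == "*" else "synonymous"
    if alt_aa == "*":
        return "stop gain"
    if ref_aa == "*":
        return "stop loss"
    return "nonsynonymous"


print(classify_substitution("CTT", "CTC"))  # leucine to leucine: synonymous
print(classify_substitution("GGC", "GAC"))  # glycine to aspartate: nonsynonymous
print(classify_substitution("TGG", "TGA"))  # tryptophan to stop: stop gain
print(classify_substitution("TAA", "TAG"))  # stop to stop: synonymous stop
```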

Variants of unknown impact are those outside of the coding region. Some noncoding sequence (introns, 5′ UTR, 3′ UTR, and adjacent sequence that is not transcribed) is sequenced in exome sequencing. It is difficult to predict whether any of these variants would affect gene function.

Most of my variants (80%) have unknown impact and are likely neutral. Variants of low impact are not expected to influence gene function and can be ignored. Only variants of moderate or high impact are likely to have an effect on gene function, so we should examine these more closely.

Another way of analyzing variation is to determine how rare the variants are among the many human genomes that have been sequenced. The 1000 Genomes Project is an NIH-sponsored program that has cataloged variants in several thousand human genomes, allowing us to distinguish between relatively common (presumably benign) variants and rare variants that have either arisen recently or been subject to negative selection because they are deleterious.

The graph below shows the frequency of my variants as seen among the people sampled for the 1000 Genomes Project.

rare_variants

Most of my variants (95,660) are common, with allele frequencies of 5% or higher. About 3% of my variants have allele frequencies from 1% – 5%, while 1.4% of my variants have allele frequencies below 1%. About 8.5% of my variants are unknown, meaning that while they are already present in public databases, their allele frequency has not been calculated. About 4% of my variants are novel, meaning that they are not found in public databases.

Evaluating both the likely impact of each variant and the allele frequency of each variant allows the analysts to filter all of my exome variation to find variants of high or moderate impact that are novel, unknown, or rare, as shown in the decision tree below.

variant_filter

Of my variants predicted to have high or moderate impact, 249 are rare (allele frequency less than 1%), 1107 have unknown allele frequencies, and 669 are novel. That is a lot of variation to consider, so the last filter is to check the variants against a list of 592 genes associated with inherited disorders. The last check against the gene list produces 15 uncommon variants of high or moderate impact in the gene list associated with inherited disorders. There are 1761 uncommon variants of high or moderate impact not associated with genes on the list of inherited disorders.
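The three-step filter (impact, then frequency, then gene list) can be sketched in Python as follows. This is an illustration under stated assumptions, not 23andMe's code: the `Variant` fields, the label "low frequency" for the 1% to 5% bin, and the two-gene `DISEASE_GENES` set are all made up for the example, since the real 592-gene list is not given in the report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    gene: str
    impact: str                 # "high", "moderate", "low", or "unknown"
    in_database: bool           # False means the variant is novel
    frequency: Optional[float]  # None when no allele frequency is available


def frequency_class(v):
    """Bin a variant the way the frequency graph does."""
    if not v.in_database:
        return "novel"
    if v.frequency is None:
        return "unknown"
    if v.frequency < 0.01:
        return "rare"
    if v.frequency < 0.05:
        return "low frequency"
    return "common"


def passes_filter(v, disease_genes):
    """Keep high/moderate impact variants that are rare, unknown, or novel
    and that fall in a gene on the inherited-disorder list."""
    return (v.impact in {"high", "moderate"}
            and frequency_class(v) in {"rare", "unknown", "novel"}
            and v.gene in disease_genes)


DISEASE_GENES = {"MLH1", "TTN"}  # hypothetical stand-in for the 592-gene list

variants = [
    Variant("MLH1", "moderate", True, 0.0004),  # frequency from the table in this post
    Variant("TTN", "moderate", True, 0.0024),   # frequency from the table in this post
    Variant("MLH1", "low", True, 0.0004),       # made-up: low impact is dropped
    Variant("TTN", "moderate", True, 0.30),     # made-up: common is dropped
    Variant("GENE_X", "high", False, None),     # made-up: gene not on the list
]

reportable = [v for v in variants if passes_filter(v, DISEASE_GENES)]
print(len(reportable))  # 2
```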

My exome report presents the 15 variants in a series of tables, which I have condensed into the single table below.

All 15 variants in the table below are nonsynonymous coding variants, which means that they change the amino acid sequence of the protein encoded by the gene.

As you would expect from the rarity of the variants, I am not homozygous for any of them; in each case, I am heterozygous for the variant in question. For two of the genes (NEB and TTN), I am heterozygous for two variants, but we can’t tell the phase from the sequencing technique used. This means that I might have one copy of NEB that lacks both rare variants with a second copy that carries both, or alternatively, each of my two copies of NEB could carry a different rare variant. The same is true for TTN.

Not all amino acid substitutions are equivalent. Some amino acid substitutions are conservative, meaning that they replace an amino acid with a chemically similar amino acid that is unlikely to alter the function of the protein. Other amino acid substitutions are nonconservative, substituting an amino acid with one that is dissimilar. For the technically minded reader, I should say that I used the BLOSUM62 substitution matrix (1) to determine whether a change is conservative or nonconservative. Not all conservative substitutions are harmless, and not all nonconservative substitutions affect protein function. Each amino acid position in a particular protein has its own properties.
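Here is one way the BLOSUM62 classification could be reproduced in Python. The scores are an excerpt of the BLOSUM62 matrix covering only the amino acid pairs among the 15 variants in this post. The rule used here (a positive score counts as conservative, zero or negative as nonconservative) is an assumption: the exact threshold is not stated above, but this rule happens to match every classification in the variant table.

```python
# Excerpt of BLOSUM62 scores for the amino acid pairs seen in the 15 variants.
# The matrix is symmetric, so each pair is stored once as an unordered set.
BLOSUM62_EXCERPT = {
    frozenset("TM"): -1, frozenset("EQ"): 2, frozenset("DN"): 1,
    frozenset("EK"): 1, frozenset("RW"): -3, frozenset("RC"): -3,
    frozenset("VM"): 1, frozenset("GD"): -1, frozenset("VI"): 3,
    frozenset("GS"): 0, frozenset("GR"): -2, frozenset("VF"): -1,
    frozenset("PS"): -1,
}


def is_conservative(aa_change):
    """Classify a change written like 'T122M' (assumed rule: score > 0)."""
    ref, alt = aa_change[0], aa_change[-1]
    return BLOSUM62_EXCERPT[frozenset((ref, alt))] > 0


changes = ["T122M", "E2588Q", "D184N", "E334K", "R2977W", "R977C", "V134M",
           "G108D", "V181I", "I287V", "R523W", "G1012S", "G185R", "V2777F",
           "P13977S"]

for change in changes:
    label = "conservative" if is_conservative(change) else "nonconservative"
    print(f"{change}: {label}")
```

Note that G1012S scores exactly zero, so it is the one case where a different threshold would change the label.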

Finally, it is possible to assess the likely outcome if each variant resulted in a complete loss of protein function, because we have the information from the OMIM entry. This will tell us whether heterozygotes (carriers) of a defective allele have an altered phenotype. This is perhaps the most important assessment here. I carry variants of moderate impact in three genes that have a semidominant impact on disease (MLH1, MSH2, and PTCH1). It turns out that there is no direct evidence that any of the particular variants that I carry are pathogenic, as I detail below.

Uncommon variants of high or moderate impact in disease genes for Paul Szauter (23andMe Exome Pilot)

| Symbol | Name | OMIM Link [1] | dbSNP ID [2] | AA change (conservative?) [3] | 1K Genomes Frequency [4] | Effect on phenotype [5] |
| --- | --- | --- | --- | --- | --- | --- |
| BCKDHA | branched chain keto acid dehydrogenase E1, alpha polypeptide | 608348 | rs34442879 | T122M (nonconservative) | 0.00560 | recessive |
| CDH23 | cadherin-related 23 | 605516 | rs41281338 | E2588Q (conservative) | 0.00670 | recessive |
| EVC | Ellis van Creveld syndrome | 604831 | rs41269549 | D184N (conservative) | 0.00090 | recessive |
| GLE1 | GLE1 RNA export mediator homolog (yeast) | 603371 | rs138310419 | E334K (conservative) | 0.00460 | recessive |
| HSPG2 | heparan sulfate proteoglycan 2 | 142461 | rs114851469 | R2977W (nonconservative) | 0.00340 | recessive |
| ITGB4 | integrin, beta 4 | 147557 | rs145976111 | R977C (nonconservative) | 0.00140 | recessive |
| MLH1 | mutL homolog 1, colon cancer, nonpolyposis type 2 (E. coli) | 120436 | rs35831931 | V134M (conservative) | 0.00040 | semidominant |
| MSH2 | mutS homolog 2, colon cancer, nonpolyposis type 1 (E. coli) | 609309 | rs4987188 | G108D (nonconservative) | 0.00910 | semidominant |
| NEB | nebulin | 161650 | rs149881695 | V181I (conservative) | 0.00100 | recessive |
| NEB | nebulin | 161650 | N/A | I287V (conservative) | 0.00480 | recessive |
| PLG | plasminogen | 173350 | rs4252129 | R523W (nonconservative) | 0.00320 | recessive |
| PTCH1 | patched 1 | 601309 | rs113663584 | G1012S (nonconservative) | 0.00090 | semidominant |
| SP110 | SP110 nuclear body protein | 604457 | rs149485401 | G185R (nonconservative) | 0.00690 | recessive |
| TTN | titin | 188840 | rs33917087 | V2777F (nonconservative) | 0.00830 | recessive |
| TTN | titin | 188840 | rs55980498 | P13977S (nonconservative) | 0.00240 | recessive |

[1] Link to the gene page in OMIM. Links at the top of that page direct you to entries on inherited diseases.

[2] Link to the SNP entry at dbSNP. Note that there is no entry for NEB-I287V, a previously unknown variant.

[3] Amino acid substitutions are classified as conservative or nonconservative using the BLOSUM62 substitution matrix (1).

[4] Frequency of the variant from the 1000 Genomes Project, cited by 23andMe as of 08/26/2011.

[5] From the OMIM gene and disease entries. This is the mode of inheritance of known pathogenic alleles, and does not imply that any of the variants shown are pathogenic.

Before going through the variants, given that I am an optimist, I have to make the glass-half-full argument first. My results showed that I did not carry variants of high or moderate impact in 579 of the 592 genes, so I got a 97.8% on my exome exam as graded by 23andMe. That’s good news. I would like to be able to get the gene list from 23andMe eventually.

The phenotypes produced by pathogenic variants in the 13 genes for which I carry variants of moderate impact are terrible, which is why they are on the list of genes responsible for inherited disease. Please bear in mind as you read these grim summaries that my health is excellent, and I am not affected by any of these conditions.

BCKDHA: This gene encodes an enzyme necessary for amino acid metabolism, specifically, the degradation of products of isoleucine, leucine, and valine catabolism. Individuals homozygous for defects in this gene have maple syrup urine disease (MSUD), which despite its funny name is a devastating childhood illness that results in physical and mental retardation if untreated. Because the disease has a devastating impact and can be treated through dietary modification, it is on the list of diseases for which newborns are screened through the analysis of blood chemistry. The disease phenotype is completely recessive, so even if this allele results in a complete loss of enzyme function, it should have no impact on the health of carriers like me.

CDH23: Mutations in this gene are associated with autosomal recessive deafness, specifically with Usher syndrome. This inherited condition is completely recessive. My variant allele is a conservative amino acid substitution not likely to lead to a loss of gene function.

EVC: Mutations in this gene are associated with recessive skeletal dysplasia with short limbs, polydactyly, and dental abnormalities. Mutations are completely recessive. My variant allele is a conservative amino acid substitution not likely to lead to a loss of gene function.

GLE1: Mutations in this gene are associated with lethal congenital contracture syndrome, with most cases associated with death around the time of birth. Mutations are completely recessive. My variant allele is a conservative amino acid substitution not likely to lead to a loss of gene function.

HSPG2: Mutations in this gene are associated with dyssegmental dysplasia, a recessive lethal form of neonatal dwarfism. The three specific variants known to cause the disease are an 89 bp duplication that causes a frameshift and two different stop codon gain mutations. Other alleles are associated with Schwartz-Jampel syndrome, a milder dwarfism syndrome that is completely recessive. One of the variants associated with Schwartz-Jampel syndrome is an amino acid substitution, C1532Y. It is not clear whether individuals homozygous for the allele that I carry, R2977W, would be affected.

ITGB4: Mutations in this gene are associated with epidermolysis bullosa, a recessive skin blistering disorder. One individual homozygous for G931D had a history of blistering and hair loss beginning in childhood when examined at age 68. One patient with a lethal form of the disease was homozygous for C61Y. Nonlethal cases include homozygotes or compound heterozygotes for R252C, C562R, and R1281W. No cases involving the allele that I carry, R977C, have been observed, so it is not clear whether individuals homozygous for this allele would be affected.

MLH1: This mutation was an immediate source of alarm to me when I obtained my results. The MLH1 gene encodes a DNA repair enzyme. Mutations in this gene are associated with dominant hereditary predisposition to colon cancer. Heterozygotes (carriers) of some alleles are at increased risk of colon cancer, while rare homozygotes or compound heterozygotes for two loss of function alleles develop colon cancer or other tumors early in life. There is no direct evidence that the V134M allele that I carry (a conservative substitution) is associated with cancer predisposition, although apart from the amino acid substitution that it causes, it is classified as a mutation in an exonic splicing enhancer that may cause the mutant exon to be skipped (2).

MSH2: Like the MLH1 variant, this mutation was an immediate source of concern, because MSH2 also encodes a DNA repair enzyme. Mutations in MSH2 are associated with dominant hereditary predisposition to colon cancer. Heterozygotes (carriers) of some alleles are at increased risk of colon cancer, while rare homozygotes or compound heterozygotes for two loss of function alleles develop colon cancer or other tumors early in life. There is no direct evidence that the G108D allele that I carry (a nonconservative substitution) is associated with cancer predisposition, although apart from the amino acid substitution that it causes, it is classified as a mutation in an exonic splicing enhancer that may cause the mutant exon to be skipped (2). Taking the results for MLH1 and MSH2 together, I might be more concerned had I obtained these results in my 20s. I am 58. I had a colonoscopy at 55 as part of routine medical care, and learned that everything is fine. In this case, my medical history suggests that the variant MLH1 and MSH2 alleles that I carry are not harmful.

NEB: Mutations in NEB are associated with nemaline myopathy, a recessive disorder characterized by hypotonia (low muscle tone), generally evident at birth. My two rare alleles (V181I and I287V) are both conservative substitutions, while alleles associated with the disease are typically frameshifts, nonsense mutations, or the loss of splice sites. My normal phenotype with respect to muscle tone shows that I am not, nor will I be, affected by these variants.

PLG: Mutations that inactivate plasminogen are associated with ligneous (“wood-like”) conjunctivitis of the eye, and similar lesions of other mucous membranes. The nonconservative substitution that I carry has not been associated with recessive plasminogen deficiency. In any case, carriers of known pathogenic alleles are not affected.

PTCH1: Mutations in PTCH1 are associated with holoprosencephaly, a recessive developmental abnormality of the forebrain that causes mental retardation and craniofacial abnormalities. Mutations in PTCH1 are also associated with susceptibility to cutaneous basal cell carcinoma (skin cancer caused by sun exposure). The G1012S variant (a nonconservative substitution) that I carry is not known to be pathogenic.

SP110: Mutations in SP110 are associated with recessive hepatic venoocclusive disease with immunodeficiency and recessive susceptibility to tuberculosis. The G185R allele that I carry is not known to be pathogenic.

TTN: The TTN gene encodes titin, a giant muscle protein that is an essential component of striated muscle. Defects in TTN are associated with recessive cardiomyopathy, limb-girdle muscular dystrophy, and other muscle defects. The two nonconservative substitutions that I carry (V2777F and P13977S) are not known to be pathogenic. Because of the size of this protein and because it is a structural protein rather than an enzyme, a huge number of variants are known, most not associated with any phenotype.

Getting my exome sequenced was an excellent learning experience for me. First, I got a fairly clean bill of genomic health (I have learned to stop worrying and love my variants that are not known to be pathogenic). Second, I now have a more personal sense of how vast the genome is, how little we really know, and how many genetic variants are still novel in the average genome. Finally, I donated my exome results to the Personal Genome Project, which will be the subject of my next entry in the not-too-distant future.

I might have wished for a bit more information from 23andMe, but they actually delivered more than they promised. I’d like to have their list of 592 disease genes, and I’d like to have the list of genes that correspond to my 1761 rare variants of high or moderate impact that don’t match their 592 disease genes. Finally, I look forward to the day when 23andMe opens up exome sequencing to its customers again, and refines its analysis pipeline based on their experience with the pilot.

References

1. Henikoff, S, and JG Henikoff (1992) Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89: 10915-10919.

2. Doss, CGP and R Sethumadhavan (2009) Investigation on the role of nsSNPs in HNPCC genes – a bioinformatics approach. J Biomed Sci. 16: 42.

I’m From the Future

I recently had my first visit to a doctor since getting my results from 23andMe. I have only been in New Mexico for six months, and hadn’t yet selected a primary care physician from the list provided by my employer, the University of New Mexico. The health insurance plan has a website with all of the available primary care physicians. There are a lot of them. As a geneticist, I am familiar with the concept of a screen, used to identify interesting genetic variants in an experimental organism. Here was my screen: I needed an internist rather than someone specializing in family medicine or general practice. Although I knew that genetics, let alone genomics, is given very little coverage in medical school, I still wanted someone who had graduated relatively recently from a good medical school with a strong record of externally funded research.

While going through lists of physicians, I had in mind much of the reading that I have done in the area of personal genomics over the last several months, and some of the presentations from the GET 2012 conference in April. Several participants in the Personal Genome Project have brought in very detailed genetic information to their doctors, and recounted their experience. I think that it is fair to say that there is a consensus that most physicians are not prepared to deal well with patients with genomic data. They are not specifically trained to deal with data of this kind. There has also been a fairly high-level reaction from physicians against personal genomics. One of the warning shots came in 2008, when An Unwelcome Side Effect of Direct-to-Consumer Personal Genome Testing by Amy McGuire and Wylie Burke appeared in the Journal of the American Medical Association (1).

McGuire and Burke remind us that physicians are accustomed to talking to patients about health information from the Internet and other media. My father, who was a physician, talked about the “Reader’s Digest” effect in medical practice. In the 1960s and 1970s, every month there was a cluster of patients asking to be checked for some specific malady, typically one that had just been covered by a popular article in Reader’s Digest. Of course, this was before the proliferation of cable TV channels and the limitless supply of medical advice, some of it terrible, to be found on the Internet. McGuire and Burke point out that physicians have limited time with patients and are poorly compensated for preventative care. I understand that and sympathize, but if we are ever going to make any progress on containing the costs of health care, patients and physicians are going to have to become partners in preventative care. If the system is not set up to facilitate that, we will have to change the system.

McGuire and Burke write “The clinical value, if any, of most direct-to-consumer personal genome tests remains unproven.” They take particular aim at the long list of risk estimates for specific diseases that is provided by direct-to-consumer genetic testing firms like 23andMe. Here is what the top of my “Elevated Disease Risk” page looks like at 23andMe:

I am only showing the results that place me at increased risk. There is a list of conditions for which I have a reduced risk, and a list for which I have a typical risk (shown later). When you look at my elevated risk list, you would have to conclude that I am a really lucky guy with a great genome. Clicking the links takes you to more detailed information. I discover that my top risk, for Venous Thromboembolism, results from my blood type (I’m AB+). Venous Thromboembolism has a heritability of 55%, which means that 45% of your risk is due to non-genetic factors. There is some advice on prevention. Don’t smoke (always good advice; I have never smoked). Keep your weight in check (good advice for many reasons). Get up and move, because sitting still for long periods of time places you at risk for Deep Vein Thrombosis (“economy class syndrome”). I wish I could get my employer to spring for First Class when I fly, but they won’t.

The other elevated risks are marginally elevated risks for rare conditions. I can see the point that McGuire and Burke are making here. There is no way that I am bringing this list in to a physician. There is nothing here that is medically actionable. I already know that the best general preventative health information that you can currently give to anyone would fit on an index card: Don’t Smoke, Maintain an Ideal Weight, Eat a Balanced Diet, Get Regular Exercise, Reduce Stress.

McGuire and Burke raise the problem of the burden on the health care system of increased testing due to requests by patients. Tests may produce ambiguous, incidental, or false-positive results. This might cause another round of more expensive, invasive, and dangerous tests in the pursuit of nothing. I understand this argument. It is therefore my responsibility as a patient (albeit an overeducated one) to avoid this downward cycle by asking only about genomic findings that are clinically relevant and medically actionable. I am in an excellent position to do that.

The current consensus is that the average person, at this time, will get one medically significant result from having their genome analyzed. Some people get zero (that’s actually good news), some people get two. I got one: increased risk of hemochromatosis. There is a wonderful discussion area on 23andMe where people write about their findings and report new information from the medical literature. I quickly found some highly informative discussion threads on hemochromatosis. I found many people with my genotype (HFE – H63D/C282Y). They described their experiences with their physicians, and gave advice on which blood tests to get done: serum ferritin, serum iron, total iron binding capacity, transferrin saturation. They told me to go see a doctor, and asked me to let them know how things turned out.

My screening of the physician list turned up some promising candidates. I decided to start with Dr. Patricia Morrow, who finished medical school at the University of Texas at San Antonio in 1986. She was accepting new patients, but those appointments were on days that conflicted with my teaching schedule, so we had to set a date after classes were over, a month from when I first called. But that was fine with me, no emergency.

In the month leading up to my appointment, I had a look at my other results (Decreased Risk and Typical Risk) from 23andMe, shown in part below:

Wow, forget my elevated risk for Ulcerative Colitis at 1.1%. Look at my risk for Obesity (63.4%) or Type 2 Diabetes (18.7%). I click through on the Diabetes link and find out that Type 2 Diabetes has a heritability of only 26%. That’s actually good news; 74% of the risk is up to me. This is where it gets familiar. Maintain an Ideal Weight. Eat a Balanced Diet. What about my 50.2% risk of Coronary Heart Disease? This one is a little foggy, with the heritability estimated at 39-56%. What can I do to avoid this? Maintain an Ideal Weight. Eat a Balanced Diet. Don’t Smoke. Exercise Regularly.

So I decided to make my visit to the doctor as productive as possible. I printed out the summary page on Hemochromatosis. I had already modified my diet. I cut down my consumption of red meat, with its high availability of heme iron, to perhaps once or twice a week, a pretty big change for me. I knew the blood tests that I wanted, but because my last physical was in 2009, I decided that I needed the usual reports on triglycerides, HDL, and LDL. The high incidence of Type 2 Diabetes bothered me, so I decided to get a fasting glucose level as well.

Finally, the big day came. I skipped breakfast and lunch to be ready for a fasting glucose level test at 2:00 pm. Because my wife and I are down to one car since we moved to New Mexico, and it was a nice day, I walked the three miles to the doctor’s office.

Dr. Morrow’s office is in a wonderful old building not far from Old Town in Albuquerque. I checked in, filled out the insurance and medical history forms, and presented the summary page on hemochromatosis for my file. A nurse checked my blood pressure (118/80, not bad at all) and asked a few questions before Dr. Morrow came in.

She knew that I was a new patient and that I was there to get a primary care physician to allow me to navigate the health care system here if need be. She also knew that I had come to be evaluated for hemochromatosis. She opened my folder, took out the report from 23andMe, and asked, “What is this?”

“It’s a report from a personal genomics company. I had my genome analyzed, and I found that I’m at risk for hemochromatosis. I’d like to get some specific blood tests,” I said. “I’m from the future,” I continued, “In ten years, most of your patients will be coming in here with reports like this.”

Dr. Morrow took it really well, and we had a great conversation. At one point I told her that she should get CME credits for talking to me, but she knew which activities got her points and which didn’t. She got my medical history, which is unremarkable, and we talked about a few specifics. My family history is uncommonly free of major heritable illnesses. There is no family history of hemochromatosis. I have moderate ocular hypertension, monitored by my eye doctor but never progressing to the point at which it required medication (my mother and my brother have been treated for this). I have Seasonal Affective Disorder, managed by diet and light therapy in our parrot room. Managed is an inadequate word for this, of course. Mitigated would be better. Medical language attempts to be polite, which is why we call it Seasonal Affective Disorder, rather than Having Your Mind Dragged Off to Hell by Demons During the Winter.

“So,” she asked, “What are you going to die of, then?”

“Not boredom,” I immediately replied. We got back onto hemochromatosis. Dr. Morrow pointed out that I didn’t have any symptoms, and asked me what my chances were of developing them. I told her that most people with my genotype never develop any symptoms, but I was likely to have high levels of iron. She asked whether we knew what caused iron to deposit in tissues, and of course I said no. Studies in mice show that there are several genes, some of whose identity is unknown, that affect the level of serum iron in mice lacking a functional HFE gene (2, 3, 4). In humans, there is a wide range in iron levels in people with the same HFE genotype, and there is genetic variation in genes that may modify the development of hemochromatosis. In both humans and mice, it seems likely that there are unknown genes that affect the level of iron accumulation in tissues. I said that I thought that the medical literature at the present time was mostly written backwards: most of the studies are of people with symptoms of hemochromatosis, who are then genotyped. As far as I know, we don’t have any prospective studies that follow a group of people with a genotype like mine to see how many develop symptoms.

We got down to figuring out which blood tests to order. At this point, Dr. Morrow was ticking boxes off on a form: ferritin, serum iron, total iron binding capacity, transferrin saturation. Glucose and lipids. That should do it.

“I need a code,” she said.

All I could think of was the genetic code. Sixty-four triplets of A, T, C, or G. Good for encoding twenty amino acids with some degeneracy, plus start and stop signals. I decided not to say anything about the genetic code, as we had already had a fairly extensive conversation about genetics. So I just asked what sort of code she needed.

“A reason that we are ordering these tests,” she said. “I know. Fatigue.”

“Yes,” I said, “I’m very tired, and I hurt all over.” Actually, I was fine. The walk to her office had lifted my spirits.

She mentioned that one of the consequences of genome analysis might be unnecessary testing. I told her that I was confident that the testing was necessary, and that in the long run we were going to save the health care system a lot of money by engaging in preventative care. If my iron levels were dangerously high, I would be bled on a regular basis to get back to normal levels and prevent the progression of the disease.

We were still talking when the nurse came back in to remind her that she had a patient waiting. We had talked for over half an hour.

About a week later, my results were in. My serum ferritin was 351, at the high end of the normal range of 30 – 400 for males. I have corresponded with people who have my HFE genotype with ferritin levels of 1000; they were bled regularly until they reached 500. I knew right away that I did not have to start a program of leeching. My other iron-related numbers were a bit above normal, but not dangerously so: serum iron, 184 (normal 65-176); total iron-binding capacity 297 (normal 240-450); transferrin saturation 62% (normal 20-50%); hematocrit 51.4 (usually up to 45). Dr. Morrow told me that we were not going to have to start a course of phlebotomy. If she was surprised that a genetic test had predicted my blood chemistry in the absence of a family history, she didn’t show it on the phone.
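As a quick check on the arithmetic, transferrin saturation is conventionally calculated as serum iron divided by total iron-binding capacity, so the 62% figure should follow from the other two numbers. Here is a minimal sketch in Python using the values and reference ranges quoted above. The function and dictionary names are made up for illustration, and units are left out because they are not stated here.

```python
# Lab values and reference ranges as quoted in the post.
# Names are illustrative; units are omitted because the post does not give them.
RESULTS = {
    "serum_ferritin": (351, (30, 400)),
    "serum_iron": (184, (65, 176)),
    "tibc": (297, (240, 450)),
}

def transferrin_saturation(serum_iron, tibc):
    """Percent saturation, using the conventional serum iron / TIBC ratio."""
    return 100.0 * serum_iron / tibc

def flag(value, low, high):
    """Classify a value against its reference range."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"

saturation = transferrin_saturation(RESULTS["serum_iron"][0], RESULTS["tibc"][0])
flags = {name: flag(value, *ref) for name, (value, ref) in RESULTS.items()}
flags["transferrin_saturation"] = flag(saturation, 20, 50)

print(round(saturation))  # 62
print(flags)
```

The computed saturation rounds to the reported 62%, and the flags agree with the description above: ferritin and iron-binding capacity within range, serum iron and transferrin saturation somewhat high.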

Everything else was fine, as it always has been. My fasting glucose was 94, so I am not pre-diabetic. My triglycerides were 68, HDL was 74, LDL was 121. The last is a bit high, but I haven’t been working out in a gym since I relocated. I am going to have to get back to that.

My experience with Dr. Morrow was terrific. She was really interested in learning about how personal genomics was going to impact the practice of medicine. I think that she is glad to have a patient who is going to be an educated partner in preventative care.

So, it’s working out well for me, even though we need to change the system. There is no box to check to tell the insurance company that I need a test as part of a program of preventative care based on the results of having my genome analyzed. We are a long way from the system of personalized medicine that I have been hearing about in seminars for the last ten years.

As I was writing this, I went back into the literature to see if we know anything about genes that interact with HFE. Hemochromatosis is a disease that has incomplete penetrance, which is a fancy way of saying that not everyone who is HFE – H63D/C282Y like me develops symptoms. Do we know anything yet about genetic variation that modifies the risk of developing hemochromatosis given a particular genotype at HFE?

There are some promising studies in mice. The TMPRSS6 gene encodes a transmembrane serine protease that, in both humans and mice, has the opposite effect of the HFE gene on serum levels of hepcidin, a peptide hormone that inhibits dietary iron uptake. HFE-/HFE- mice resemble HFE – C282Y/C282Y humans in that they have high serum levels of iron and accumulate iron in the liver. When HFE-/HFE- mice are heterozygous for a loss-of-function mutation in TMPRSS6, they have reduced iron overload. HFE-/HFE- mice that are homozygous for a loss-of-function mutation in TMPRSS6 actually have an iron deficiency (5). This also suggests a therapy: “Furthermore, these results suggest that natural genetic variation in the human ortholog TMPRSS6 might modify the clinical penetrance of HFE-associated hereditary hemochromatosis, raising the possibility that pharmacologic inhibition of TMPRSS6 could attenuate iron loading in this disorder.”

It is known that there is genetic variation in human populations for TMPRSS6 (6). People who are homozygous for a loss-of-function allele of TMPRSS6 have an inherited disease called Iron-Refractory Iron Deficiency Anemia (IRIDA). In this condition, patients have anemia with no evidence of reduced dietary iron. They fail to respond to oral iron therapy. They respond somewhat to intramuscular iron injection (not a pleasant prospect). This condition is the opposite of hemochromatosis. People with IRIDA may have adequate levels of iron in their diet, but they don’t take it up, and their blood levels of iron are far below normal.

The studies in mice show us that these two genes with opposite effects interact. Let us simplify the situation by considering only normal (HFE+ and TMPRSS6+) and loss-of-function alleles (HFE- and TMPRSS6-). Here is a summary table to show the interaction.

HFE    TMPRSS6    Phenotype
+/+    +/+        Normal
+/+    -/-        IRIDA (anemia)
-/-    +/+        Hemochromatosis
-/-    +/-        Hemochromatosis, but more normal
+/-    -/-        IRIDA (anemia), but more normal?

It is also possible to imagine that heterozygosity for the common alleles of HFE (HFE+/HFE-) might somewhat protect a person homozygous for a loss-of-function allele (TMPRSS6-/TMPRSS6-) from the symptoms of IRIDA.
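For readers who prefer code to tables, the interaction can be written as a simple lookup. This is only a sketch in Python of the simplified two-allele model above, not a clinical predictor. The function names are invented for illustration, and any genotype combination not listed in the table is reported as not covered rather than guessed.

```python
# Genotypes are written as in the table: "+" is a normal allele,
# "-" is a loss-of-function allele. Phenotype strings are copied from the table.
PHENOTYPES = {
    ("+/+", "+/+"): "Normal",
    ("+/+", "-/-"): "IRIDA (anemia)",
    ("-/-", "+/+"): "Hemochromatosis",
    ("-/-", "+/-"): "Hemochromatosis, but more normal",
    ("+/-", "-/-"): "IRIDA (anemia), but more normal?",
}

def normalize(genotype):
    """Treat "-/+" and "+/-" as the same heterozygous genotype."""
    alleles = sorted(genotype.split("/"))  # "+" sorts before "-"
    return "/".join(alleles)

def predicted_phenotype(hfe, tmprss6):
    """Look up the table row for a pair of genotypes (hypothetical helper)."""
    key = (normalize(hfe), normalize(tmprss6))
    return PHENOTYPES.get(key, "not covered by the table")

print(predicted_phenotype("-/-", "-/+"))  # Hemochromatosis, but more normal
print(predicted_phenotype("+/-", "+/+"))  # not covered by the table
```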

So, is there any evidence that I am heterozygous for a protective allele of TMPRSS6? While my 23andMe results show my genotype with respect to SNPs in TMPRSS6, none of these SNPs is associated with loss-of-function alleles. The variant alleles of the HFE gene that I carry, HFE – H63D and HFE – C282Y, are identified in the 23andMe test by specific oligonucleotides that detect these variants, which are common in human populations. The discovery of variation in the TMPRSS6 gene is relatively recent (6), and the 23andMe test does not include results that will predict my genotype at TMPRSS6.

In February, I became part of a pilot program at 23andMe to have my exome sequenced. Most of the human genome does not encode proteins. The small portion of the human genome that does encode proteins is called the “exome.” The exome is only about 2% of the genome. In exome sequencing, genomic DNA is melted to single strands and hybridized to a vast array of oligonucleotides designed with knowledge of the sequence of the human genome. Strands of DNA from the genome being analyzed hybridize to the collection of oligonucleotides that selects for coding sequences, and are then sequenced. The rest of the genome is not. Exome sequencing is the poor man’s way of having the most highly informative parts of the genome sequenced while we wait for the cost of whole-genome sequencing to drop further.

I am still waiting for my exome sequence from 23andMe. They promised a four-month wait, which would give me my results in the middle of June, not long from now. I may very well be heterozygous for a loss-of-function allele of TMPRSS6, or have some other kind of genetic variation that protects me from hemochromatosis despite my HFE – H63D/C282Y genotype.

There is independent evidence to suggest that I carry a protective allele of TMPRSS6 or some other modifier gene. My mother has been borderline anemic all of her life. She tells me that she was given oral iron therapy as a child, and even intramuscular iron injections to treat her anemia. Her current physician has been campaigning to get her to add more red meat and other iron sources to her diet. This does not appear to be having any effect. It may be that my mother has IRIDA, modified by her genotype at HFE. Although she has not had her genome analyzed, she must be at least heterozygous for one of the two HFE alleles that I carry, HFE – H63D or HFE – C282Y, because I had to inherit one of my two alleles from her.

We’ll see when I get my exome results. I will post the findings when I get them.


References

1. Amy L. McGuire and Wylie Burke (2008). An Unwelcome Side Effect of Direct-to-Consumer Personal Genome Testing: Raiding the Medical Commons. JAMA. 300(22):2669-2671.

2. Joanne E. Levy, Lynne K. Montross and Nancy C. Andrews (2000). Genes that modify the hemochromatosis phenotype in mice. J Clin Invest. 105(9):1209–1216.

3. Mounia Bensaid, Séverine Fruchon, Christine Mazères, Seiamak Bahram, Marie-paule Roth, Hélène Coppin (2003). Multigenic control of hepatic iron loading in a murine model of hemochromatosis. Gastroenterology 126: 1400-1408.

4. Gaël Nicolas, Nancy C. Andrews, Axel Kahn, and Sophie Vaulont (2004). Hepcidin, a candidate modifier of the hemochromatosis phenotype in mice. Blood 103: 2841-2843.

5. Karin E. Finberg, Rebecca L. Whittlesey, and Nancy C. Andrews (2011). Tmprss6 is a genetic modifier of the Hfe-hemochromatosis phenotype in mice. Blood 117: 4590-4599.

6. Karin E Finberg, Matthew M Heeney, Dean R Campagna, Yeşim Aydınok, Howard A Pearson, Kip R Hartman, Mary M Mayo, Stewart M Samuel, John J Strouse, Kyriacos Markianos, Nancy C Andrews & Mark D Fleming (2008). Mutations in TMPRSS6 cause iron-refractory iron deficiency anemia (IRIDA). Nature Genetics 40: 569-571.

My Deep Ancestry

I can only recite a small bit of my ancestry. My father was born in 1909 in Hungary. His father died in 1914, forty years before I was born. His mother, who remarried, had additional children, his half-sibs. One of these, my uncle, was killed in World War II. I met my father’s mother a few times. She only spoke Hungarian and I only spoke English, so our contact was just some smiles, hugs, and laughs, no family stories.

My mother is Dutch, born in Holland. Her mother also came to the United States and lived in the city where I grew up, Youngstown, Ohio. My mother still lives in the house where I grew up, and at 89 is still going strong. She was an only child, but her mother had a brother, my great uncle, whom I met on a visit to Holland in 1975. My great aunt and uncle were wonderful hosts, but we never got much into family history. As a child in a small nuclear family (father, mother, sister, brother, and grandmother), I never had much personal experience with extended families. I can’t rattle off my extended family tree, and the language of kinship – second cousins once removed, and so on – eludes me like the rules of some complex organized sport that I don’t follow.

I knew that my results from 23andMe were going to provide me with some insight into my deep ancestry. I knew that I was going to get two pieces of ancestry information right away: my mitochondrial haplotype and my Y chromosome haplotype. Each was going to give me a look at part of my ancestry going back thousands of years.

Mitochondria are subcellular organelles, found in the cells of all eukaryotic organisms (organisms with a nucleus). Plants, animals, protozoans, and fungi all have mitochondria. Mitochondria are the powerhouses of the cell, carrying out biochemical reactions that generate ATP, a chemical that provides energy for thousands of other reactions. Mitochondria have their own DNA, which encodes some of the proteins found in mitochondria. Mitochondria resemble bacteria in some ways, and are thought to be derived from endosymbionts – bacteria that a proto-eukaryote invited into its cells for mutual benefit billions of years ago.

Mitochondria and their DNA have the interesting property of being inherited exclusively from our mothers. Egg cells are much bigger than sperm cells and are packed with mitochondria. Sperm cells drop off a set of chromosomes during fertilization, but no mitochondria.

This means that my mitochondrial genome came from my mother, who got it from her mother, and so on back to my maternal great-great-great-grandmother and beyond. I have two parents, four grandparents, eight great-grandparents, and so on back through the generations to a very large number of ancestors. My nuclear genome has bits and pieces of my recent ancestors, but as you go back many generations, some of my ancestors are no longer directly represented there. Not so my mitochondrial DNA, which is an unbroken maternal line back through thousands of generations.
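The doubling is easy to put numbers on. Here is a minimal sketch in Python; the function names are just for illustration. It counts slots in the family tree, which are not necessarily distinct people, since the same ancestor can appear in more than one slot once you go far enough back.

```python
def ancestors_in_generation(n):
    """Number of ancestor slots n generations back (1 = parents)."""
    return 2 ** n

def maternal_line_share(n):
    """Fraction of generation-n slots that lie on the unbroken maternal line."""
    return 1 / ancestors_in_generation(n)

for n in (1, 2, 3, 10, 20):
    print(n, ancestors_in_generation(n), maternal_line_share(n))
```

Only one slot in each generation sits on the maternal line, so mitochondrial DNA follows a vanishingly small fraction of the tree, but it follows that fraction without interruption.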

This would not be very informative about my ancestry, except that occasionally, mutations occur in mitochondrial DNA. Mutations in the coding sequences used to make mitochondrial proteins are usually very bad news and are eliminated through selection. There are some noncoding regions in the mitochondrial genome; mutations in these regions have no effect and are selectively neutral. Every time such a mutation occurs, it marks a new maternal lineage, branching off from the old lineage and continuing until another branch arises by mutation.

The rate at which these mutations occur is known, so we can say approximately when each new mitochondrial lineage arose. My results from 23andMe show my mitochondrial genome to be a type called H1. 23andMe provides a map of the frequency of mitochondrial lineage H1 in various populations around the world, shown below.

Clicking the history tab on this page at 23andMe tells me that haplogroup H1 originated about 13,000 years ago, not long after the end of the Ice Age. The people of Europe had been driven by ice sheets into southern France, Italy, and the Iberian Peninsula. The H1 haplotype likely arose in a woman living on the Iberian Peninsula. As the Ice Age ended, some of the descendants of this woman journeyed north all the way to Scandinavia, while others crossed into northern Africa. The blue on the map shows that H1 reaches a frequency of around 40% in Norway, far from its origins in Iberia, probably due to a founder effect. If a relatively small number of people founded the population of distant Norway, and H1 happened to be common among them, the haplotype would be overrepresented there compared to Spain. My H1 haplotype is not a big surprise given my Dutch maternal ancestry, as H1 is common in Holland.

Men get extra information about their ancestry from 23andMe because of their Y chromosomes. The Y chromosome is one of the two sex chromosomes. Men are XY, women are XX. Normal human eggs have a single X chromosome as they await fertilization, while normal human sperm will either have an X chromosome or a Y chromosome. If the sperm fertilizing the egg carries an X chromosome, the zygote is XX and will be a girl, while if the sperm fertilizing the egg carries a Y chromosome, the zygote is XY and will be a boy.
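The same cross can be written out in a few lines. This is a small sketch in Python with illustrative variable names, covering only the normal case described above.

```python
from itertools import product

# Gametes from each parent, as described above.
egg_gametes = ["X"]         # every normal egg carries an X
sperm_gametes = ["X", "Y"]  # a normal sperm carries either an X or a Y

# Combine every egg with every sperm to list the possible zygotes.
zygotes = ["".join(pair) for pair in product(egg_gametes, sperm_gametes)]

# Only zygotes that received the father's Y are male.
sons = [z for z in zygotes if "Y" in z]

print(zygotes)  # ['XX', 'XY']
print(sons)     # ['XY']
```

Because the egg never contributes a Y, every Y chromosome in the list of zygotes came from the father.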

This means that only men transmit the Y chromosome, and only to their sons. My Y chromosome traces back through my father, my father’s father, and so on back thousands of generations. Just like mitochondrial DNA, the DNA of the Y chromosome is subject to variation. Each time a new variant arises, it marks the beginning of a new paternal lineage. The rate at which these variants occur is known, so we can trace the origin of my Y chromosome haplotype to a specific time and place, just like for mitochondrial DNA. The 23andMe display for my Y chromosome is shown below.

My Y chromosome haplotype is E1b1b1a2*, a subgroup of E1b1b1a2 (also called E-V13) that arose in a population that moved from eastern Africa into northeastern Africa about 14,000 years ago, during the final days of the Ice Age. 23andMe reports that it is common among men in southern Europe, especially Greeks, Bulgarians, and Albanians. About 10% of Hungarian men carry this Y chromosome haplotype.

The Hungarian language is an odd one, related linguistically to Finnish, which reflects the migration of the Finno-Ugric people from the area near the Finnish-Russian border to what is currently Hungary in the 9th and 10th centuries. A search of the web for origins of the Hungarian people reveals a colorful history of repeated clashes with the neighboring kingdoms, especially during the 10th century. The reports of the geographic distribution of the E1b1b1a2 Y chromosome suggest that it did not come from the Finno-Ugric people. My Y chromosome likely came into Hungary from the outside.

In my search for the origins of the E1b1b1a2 Y chromosome, I found an authoritative account on Dienekes’ Anthropology Blog. Dienekes Pontikos presents the following conclusion:

The age and distribution of E-V13 chromosomes suggest that expansions of the Greek world in the Bronze and later ages were the major causes of its diffusion. Who was the E-V13 patriarch in Greece? He was perhaps one of the legendary figures of Greek mythology some of whom are said to have come from abroad. For whatever reason, his progeny grew, and were around to participate in the expansion of the Mycenaean world and the subsequent Greek colonization.

I have heard from a number of people who were forced to reevaluate their identities after getting results from 23andMe. My Y chromosome haplotype was the first result that made me question my view of my own identity. It is no longer as simple as being of “European” descent; now my paternal lineage traces from a movement of people in Africa to heroes of the Bronze Age in Greece, described much later in Homer’s epic poem The Iliad.

My mitochondrial and Y chromosome results pin down exactly two of my thousands of ancestors. What about all of the others? For this, we turn to my autosomal DNA, also analyzed by 23andMe. This is most of the 3 billion base pairs that make up my genome, and there is much to learn. Half is from my mother, and half from my father. Going back to my grandparents, about one quarter of my genome should be from each of them, on average, but here is where it gets messy.

The chromosome sets that end up in sperm or egg cells are the product of an elaborate cell division process called meiosis. During meiosis, homologous chromosomes replicate, pair, and then segregate from each other. The segregation process doesn’t work correctly unless the chromosomes are held together prior to segregation. Part of what holds chromosome pairs together is the process of meiotic recombination, in which chromosomes that are partly of maternal origin and partly of paternal origin are created. Because recombination takes place after chromosomes have replicated, chromosomes segregating into the sperm or egg might be entirely maternal, entirely paternal, or composite chromosomes made up of both maternal and paternal segments.

Each chromosome pair also segregates independently of all of the other pairs. This means that I carry only an average of 25% of each of my grandparents’ genomes. Tracing back through the generations, each of my great-grandparents is represented by an average of 12.5% of my genome, my great-great-grandparents by an average of 6.25% of my genome, and so on. The casino-like mechanism of sexual reproduction means that segments from some of my more remote ancestors are entirely absent from my genome, with the exception of my mitochondrial genome and my Y chromosome.
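For readers who like to see the arithmetic, here is a minimal Python sketch of the halving described above. The function names are my own, and the sketch assumes that every ancestor in a generation is a different person, which stops being true once you go back far enough. These are averages only; the actual share from any one ancestor varies because of recombination and independent segregation.

```python
def expected_share(generations_back: int) -> float:
    """Average fraction of the autosomal genome inherited from one ancestor
    who lived the given number of generations ago (1 = parent)."""
    return 0.5 ** generations_back


def ancestors_in_generation(generations_back: int) -> int:
    """Number of ancestors in that generation, assuming all are distinct people."""
    return 2 ** generations_back


labels = ["parent", "grandparent", "great-grandparent", "great-great-grandparent"]
for n, label in enumerate(labels, start=1):
    share = expected_share(n)
    count = ancestors_in_generation(n)
    print(f"{label}: {count} ancestors, {share:.2%} of my genome each, on average")
```

The number of ancestors doubles each generation while the average share from each one halves, which is why distant ancestors can drop out of the genome entirely.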

Can I learn anything about my ancestry by looking at my autosomal genome? It is best to start by asking what we might learn by surveying autosomal genetic variation across a large sample of people over the entire geographic range of our species. This has been done in increasing detail in recent years, and reveals a clear story of human history that supports independent evidence from anthropology and archaeology.

Imagine a population of individuals living in a particular area for many generations. There will be a certain level of genetic variation among these people. If a small group of people leaves this population to start a new population somewhere else, they will by chance leave some of their genetic variation behind. If a small group from the new population moves on, they will by chance leave some of their newly reduced genetic variation behind, further reducing their population’s genetic variation.

If the spread of people to new areas is rapid relative to the rate at which new genetic variation arises, which it is, we should be able to trace humanity to its original geographic location. This is easily done, and it is clear that human beings originated in Africa. Around 100,000 years ago, a small group of humans moved out of Africa to the Middle East, and spread from there throughout Europe, Asia, and eventually North and South America.

When humans first arrived in the Middle East and Europe, these adventurous people encountered an existing population of Neandertals. Neandertals are a distinct species that diverged from the human lineage about 600,000 years ago. Homo neanderthalensis is well known from the fossil record. In contrast to some of the stereotypes about “cavemen,” we know that Neandertals used tools and weapons, cared for members of their population who were injured or disabled (often for decades), and buried their dead ceremonially with flowers and other objects.

It has long been of interest what happened when humans first encountered Neandertals. There are two broad hypotheses: displacement and admixture. Under the displacement hypothesis, humans outcompeted, drove off, or killed the Neandertals, driving them to extinction about 30,000 years ago. Under the admixture hypothesis, humans took a liking to their neighbors and interbred with them, preserving a bit of the Neandertal lineage among humans after Neandertals disappeared.

While it is easy to analyze fresh DNA that has been collected properly from people, analyzing DNA from fossils is not an easy task. While DNA is fairly stable, over thousands of years it breaks down into small fragments, and some of the bases undergo chemical changes. Nevertheless, some determined researchers have pushed this technique to the very limits. In 1997, Svante Pääbo and colleagues at the Max Planck Institute produced the first DNA sequences of Neandertal mitochondrial DNA (1). Mitochondrial DNA is easier to recover than nuclear DNA because there are many copies of the mitochondrial genome per cell.

The answer was clear: there is no trace of the mitochondrial DNA of Neandertals among modern humans. It appeared that our species entirely displaced the Neandertals.

Techniques for sequencing DNA, including ancient DNA, have advanced rapidly. In 2010, Svante Pääbo and colleagues announced the results of sequencing genomic DNA from Neandertals (2). DNA recovered from the bones of three individuals was sequenced, producing data that reveal the sequence of most of the Neandertal genome. The sequence is, of course, very similar to the sequence of human DNA.

At each position of known SNP variation in humans, Svante Pääbo and colleagues asked whether the Neandertal sequence more closely resembles the sequence of human populations in Africa (where Neandertals never lived), Europe, or Asia. Upon making close to 100,000 such comparisons, Pääbo and colleagues made a stunning finding: the genome of Neandertals was more closely related to Europeans and Asians than it was to Africans. The average non-African appears to contain genomic sequences from Neandertals making up about 2.5% of their genome. Some people have more, some people have less. In contrast to the results of the work on Neandertal mitochondria, these results support the admixture hypothesis. The first people out of Africa encountered Neandertals in the Middle East and mated with them. Some segments of the Neandertal genome were advantageous, and have been maintained by positive selection 30,000 years after Neandertals became extinct.
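One way to picture this kind of comparison is as a simple tally. The Python sketch below is a toy illustration, not the analysis pipeline used in the paper: the three ten-base sequences are made up, and the function names are hypothetical. It looks only at sites where an African and a non-African genome differ, and counts which of the two the Neandertal base matches. If there had been no interbreeding, we would expect the two counts to be roughly equal; a consistent excess of matches to the non-African genome is the kind of signal that points to admixture.

```python
def sharing_counts(african: str, non_african: str, neandertal: str) -> tuple[int, int]:
    """Count informative sites where the Neandertal base matches the African
    genome and where it matches the non-African genome."""
    matches_african = 0
    matches_non_african = 0
    for afr, non_afr, nea in zip(african, non_african, neandertal):
        if afr == non_afr:
            continue  # the two humans agree, so this site is uninformative
        if nea == afr:
            matches_african += 1
        elif nea == non_afr:
            matches_non_african += 1
    return matches_african, matches_non_african


def sharing_excess(matches_african: int, matches_non_african: int) -> float:
    """Normalized excess of non-African matches; 0 means no difference."""
    total = matches_african + matches_non_african
    if total == 0:
        return 0.0
    return (matches_non_african - matches_african) / total


# Made-up sequences, for illustration only.
african = "AGCTTACGGA"
non_african = "ATCTCACGTA"
neandertal = "ATCTTACGTA"

counts = sharing_counts(african, non_african, neandertal)
excess = sharing_excess(*counts)
print(counts, excess)
```

With only ten sites the result means nothing, of course. The real analysis drew on many thousands of informative sites, which is what makes a small but consistent excess convincing.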

I remember when these results first hit the science news. I talked to everyone that I could about it, including people not trained in science. It is a beautiful piece of work. For many scientists, there is nothing quite as much fun as finding out that something that is widely known by everyone is just plain, flat out wrong. Imagine that, people walking around today with caveman DNA. I took delight in it in an abstract kind of way.

After I got the email that my 23andMe results were ready, I moved to the ancestry portion of the site. Would I like to find out if I carried any Neandertal DNA? Sure, I thought, and clicked the link, revealing the display shown below.

I am 2.9% Neandertal, in the 92nd percentile among 23andMe users. When I first saw this, I stared at the screen for a while. I was a little bit shocked. I looked up a comparison of humans and Neandertals, showing two complete skeletons side by side. The Neandertal is described as “robust.” They were shorter, stockier, and barrel-chested. I started to identify with the wrong skeleton. I imagined how I look in a crowd of people. Shorter, stockier. Big shoulders. I ran my finger over my eyebrows and forehead to reassure myself. No brow ridges, high forehead. Human.

This was the first result from 23andMe that I discussed with other people. I found out that I’m mixed race, I’d say. They would usually pause, aware that for some this is a delicate subject. I’m 97.1% human race and 2.9% Neandertal, I’d say. Sometimes we would move on to the jokes to break the tension. I went for a long walk yesterday, I’d say. Boy, are my knuckles sore!

Some of my friends on Facebook consoled me. Your Neandertal ancestors were big-brained, gentle people, they reminded me. We know from the genomic sequence of Neandertals that they carried the same version of FOXP2 that modern humans do, a gene required for language that differs at two amino acid positions in our closest living relative, the chimpanzee. I began to look at many of the depictions of Neandertals in popular culture as insensitive. My Neandertal ancestors were not brutish, stupid ape-men, I thought. After a couple of weeks, I began to embrace my ancestry, boasting to others that I was probably more of a Neandertal than they were. One of my female colleagues who had heard of my ancestry told me that she tested as 3% Neandertal, and I felt disappointed, ordinary.

My colleague, Dr. Maggie Werner-Washburne, a consummate scientist, reacted well to the news.

“2.9%?” she said. She held up a hand to count on her fingers. “Let’s see: 50%, 25%, 12.5%, 6.25%,” then, touching her pinky, “3%. It hasn’t been that long for you, has it?”

The fast estimate that one of my great-great-great grandparents was a Neandertal would be a reasonable guess except that we know that Neandertals died out 30,000 years ago. This means that the parts of the Neandertal genome that persist today must have been under positive selection. Neandertals had been living in Europe for a long time when modern humans arrived, and were well adapted to the conditions there. Some of the Neandertal segments retained by the descendants of hybrids, like myself, are associated with the immune system, and were better for conditions in Europe than the African alleles my early human ancestors carried.

I recently registered for the Personal Genome Project, for which I volunteered to have my entire genome sequenced and made public. I was invited to the Genomes, Environments and Traits conference (GET2012) as part of the educational aspect of the project. I looked up the conference schedule and saw the keynote speaker: Svante Pääbo. I was hooked.

The GET 2012 conference was held on April 25. There were far too many wonderful things to write about in this post, so I will stick to Svante Pääbo’s talk. He held us spellbound with his unique combination of a low-key manner, wonderful data, and a few gentle jokes. He recounted the work on mitochondria, and how he made a public statement that we would never have the nuclear genome of Neandertals. Never make statements like that, he advised us.

He presented the arguments about displacement vs. admixture, showing that while the results from mitochondrial DNA favored total displacement, the results from the analysis of nuclear DNA clearly supported limited admixture. He presented the results from the analysis of a single bone fragment from a cave in Siberia that revealed another type of archaic human, now called a Denisovan, that is distinct from Neandertals and humans (3). Denisovan DNA makes up as much as 6% of the genome of some Melanesians. There is emerging genomic evidence that there may have been admixture in Africa with another type of archaic human for which there is no fossil record. The story is changing rapidly.

Pääbo closes with experiments with laboratory mice that have been genetically altered to make their FOXP2 gene match that of humans. It is a change of only three amino acids out of 714. The mice are run through a battery of tests to see if there is anything different about them. Amazingly, their vocalizations are altered. The audience is stunned. It is not exactly that they talk, but “medium spiny neurons have increased dendrite lengths and increased synaptic plasticity.”

Then, it’s time for questions. There are some good scientific questions until finally, someone asks the question that provides the perfect closer. The questioner points out that Neandertals were in western Europe for 100,000 years, but didn’t spread much. Humans moved out of Africa and relatively quickly spread everywhere: Europe, including the British Isles, Asia, and Australia. Why didn’t Neandertals spread?

Svante Pääbo reflected for a moment, then pointed out that most of the places where Neandertals never lived required them to do something that they didn’t like to do. They didn’t like to cross water if they couldn’t see land on the other side. He pointed out that many humans in boats must have died on the open ocean before the first humans reached Australia. So one of the differences between humans and Neandertals is that humans are crazy. They set out on ocean voyages without a clear idea of where they will end up.

I looked around at the auditorium, filled with participants in the Personal Genome Project who have made their genomes public, not knowing exactly what will happen. The rest of the crowd is a collection of forward-looking scientists, engineers, and venture capitalists. It occurs to me that I am looking at a group of people who all exhibit the most unique human characteristic: the willingness to set out on a voyage whose final destination cannot be clearly seen. Crazy. Human.


Hard Times for My Ancestors Have Marked My Genome

In this post, I present one small part of the tale of how the tribulations of my ancestors have left their mark upon my genome. I’m not sure who is reading this blog. I don’t know the average level of understanding of genetics among my readers. I have recently participated in the online discussions at 23andMe, where it is clear that many subscribers are struggling with the basics. This post will therefore contain a bit more background information than the average genomics blog. Some of the more technical information and citations are included in the footnotes.

Inherited genetic variation makes us different from each other. We all have the same genes, but every gene comes in different “flavors,” called alleles. Different alleles of a gene differ in their DNA sequence. Under some circumstances, alleles that eliminate the function of a gene can confer a selective advantage, meaning that people carrying that allele are likely to have more offspring. Changes in the frequency of a particular allele in a population (the “allele frequency”) over time can alert researchers to interesting problems in biology and medicine.

Most of the single nucleotide polymorphisms (SNPs) that are typed by 23andMe are “neutral.” They are sequence variants in parts of the genome that do not encode proteins. It doesn’t matter which base is present at that position, so natural selection does not change the frequency of a particular variant of this type directly. A small fraction of the SNPs typed by 23andMe are diagnostic for a variant allele of a gene that changes the sequence of the protein encoded by that gene. These variants have been discovered through research on people affected by an inherited disorder. A probe specific for the disease allele has been incorporated into the tests done by 23andMe. In my last post, I discussed three such variant alleles that I carry.

The variant allele that I carry for the PEX1 gene, PEX1-G843D, has an allele frequency of 0.001-0.002 (1). This means that if we look at all of the PEX1 alleles in a population, 0.1-0.2% of the alleles are the PEX1-G843D variant. The PEX1-G843D allele is a bad thing. While carriers like myself are unaffected, homozygotes (PEX1-G843D/PEX1-G843D) usually die before they reach one year of age. Why doesn’t natural selection “get rid of” this nasty allele?

The frequency of PEX1-G843D is at most 0.002. Assume for simplicity that there are no other variant alleles of PEX1, so the frequency of the wild-type (normal) allele is 0.998. If people mate at random without regard to their PEX1 genotype, we can calculate the frequency of the three possible genotypes as PEX1/PEX1 = 99.6%, PEX1/PEX1-G843D = 0.4%, and PEX1-G843D/PEX1-G843D = 0.0004% or 4/1,000,000 (2). PEX1-G843D is subject to negative selection, but this no longer changes the allele frequency of PEX1-G843D very much. There are about 1000 times as many heterozygotes (PEX1/PEX1-G843D) as there are homozygotes (PEX1-G843D/PEX1-G843D), so the allele frequency can’t be driven much lower by selection alone.
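To see how slowly selection works on a rare recessive allele, here is a minimal Python sketch. It rests on simplifying assumptions that are mine, not measurements: every homozygote dies before reproducing, carriers are completely unaffected, mating is random, and there are no new mutations or chance fluctuations. The function name is hypothetical.

```python
def next_frequency(q: float) -> float:
    """Variant allele frequency one generation later, when every homozygote
    dies before reproducing and carriers are unaffected (random mating)."""
    p = 1.0 - q
    surviving_normal = p * p        # PEX1/PEX1
    surviving_carriers = 2 * p * q  # PEX1/PEX1-G843D
    # Half of the alleles carried by the surviving carriers are the variant.
    return (surviving_carriers / 2) / (surviving_normal + surviving_carriers)


q = 0.002
for generation in range(100):
    q = next_frequency(q)

print(f"allele frequency after 100 generations: {q:.6f}")
```

Under these assumptions the update simplifies to q / (1 + q), so halving a starting frequency of 0.002 takes about 500 generations. Nearly all copies of the allele sit in healthy carriers, where selection cannot see them.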

What about hemochromatosis? I am a “compound heterozygote” for two different variant alleles of HFE: HFE-H63D and HFE-C282Y. HFE-H63D has an allele frequency of 0.108 (10.8%) in a large diverse population sample from the Exome Sequencing Project and an allele frequency of 0.179 (17.9%) in a sample confined to Europeans (3). It is hardly surprising that I carry one HFE-H63D allele given my European ancestry. HFE-C282Y has an allele frequency of 0.047 in the sample from the Exome Sequencing Project and an allele frequency of 0.042 in a sample confined to Europeans (4). The protein encoded by the HFE-H63D allele has greatly reduced function, while the protein encoded by the HFE-C282Y allele is almost completely nonfunctional.

We know from studies of patients with hemochromatosis that the vast majority are HFE-C282Y/HFE-C282Y. A small fraction of hemochromatosis patients are HFE-C282Y/HFE-H63D like me. The rest have other variant alleles of HFE, or variant alleles of one of four other genes that predispose to hemochromatosis (5). The high frequency of variant HFE alleles raises an interesting question. Why might being a carrier for an inherited disorder be a good thing?

There are plenty of examples of disease genes that confer an advantage on carriers. Among people of European descent, the second most common inherited disorder (after hemochromatosis) is cystic fibrosis, resulting from defects in the CFTR gene. It is the most frequent inherited disorder leading to childhood deaths. The cumulative allele frequency for all variant alleles (there are many) is around 0.03 – 0.05 in people of European descent (6). Taking the low number (0.03) gives us the following genotype frequencies: CFTR/CFTR = 94%, CFTR/CFTR-variant = 5.8%, and CFTR-variant/CFTR-variant = 0.09% or 9/10,000 (7).

Almost 1/1000 children born to parents of European descent are afflicted with cystic fibrosis, in contrast to 4/1,000,000 for PEX1 variants (Zellweger Syndrome). About one person out of twenty of European descent is a carrier of a variant allele of CFTR, while only one person in 250 is a carrier of PEX1-G843D. People who are heterozygous for a variant allele of CFTR are healthy, but have additional genetic advantages: they are resistant to cholera and typhoid fever. Although these diseases are present outside of Europe, carriers of cystic fibrosis have salty sweat, and the loss of salt in hot climates may outweigh the advantages of disease resistance.
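The comparison between the two genes can be reproduced with a short Python sketch. It assumes random mating and treats all variant alleles of a gene as a single allele with the frequencies quoted above (0.002 for PEX1-G843D, 0.03 for CFTR variants). The function name is my own.

```python
def genotype_frequencies(q: float) -> tuple[float, float, float]:
    """Return the expected frequencies of (normal, carrier, affected)
    genotypes for a variant allele of frequency q under random mating."""
    p = 1.0 - q
    return p * p, 2 * p * q, q * q


for gene, q in [("PEX1-G843D", 0.002), ("CFTR variants", 0.03)]:
    normal, carrier, affected = genotype_frequencies(q)
    print(
        f"{gene}: carriers about 1 in {1 / carrier:.0f}, "
        f"affected about 1 in {1 / affected:,.0f}"
    )
```

A fifteen-fold difference in allele frequency becomes a difference of more than two hundred-fold in the number of affected children, because the affected fraction depends on the square of the allele frequency.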

There are other examples of disease resistance conferred to carriers of genetic disorders. Three well-known inherited diseases confer resistance to malaria: sickle-cell anemia (HBB), thalassemia (HBB), and favism or G6PD deficiency (G6PD). In a sample of 114 chromosomes from sub-Saharan Africa, the frequency of the sickle-cell allele of HBB was 11.4%, but it is not found in European populations. In some African populations, the allele frequency of a common variant of G6PD conferring malaria resistance is 20%. Other loss-of-function alleles of G6PD are common in Mediterranean or South Asian populations (8, 9).

Why might loss of function of the HFE gene, which leads to hemochromatosis, have a selective advantage? There are two interesting ideas about this. The first idea is that reduced function of HFE was a useful adaptation to the neolithic diet (10). When our ancestors switched from being hunter-gatherers to the practice of agriculture, the amount of red meat (a great source of iron) in people’s diets fell. The switch to a grain-based diet meant that careful biological regulation of the amount of iron taken in from the diet was no longer optimal. People with a defect in the signaling mechanism controlling iron uptake (of which the HFE gene product is a part) would experience iron overload, but might have an advantage during times of iron starvation, because it takes longer to deplete their body’s supply of iron.

While this allowed the allele frequency for variant alleles of HFE to rise, it might not fully account for the high frequency of HFE-H63D among people of European descent. There is a really great story here, first proposed by Sharon Moalem in a scientific paper (11) and popularized in his book Survival of the Sickest (12). Dr. Moalem has proposed that variant alleles of HFE rose to their current high frequencies because they confer resistance to bubonic plague, also known as the Black Death.

The Black Death is a bacterial infection caused by Yersinia pestis. People are infected with the bacterium when they are bitten by infected fleas, which are carried throughout the population by rats. When the plague bacterium enters the bloodstream, it is attacked by white blood cells called macrophages, which travel to the lymph nodes to carry out the destruction of the invaders. For most bacterial infections, this is usually a good strategy. However, the plague bacterium is often able to survive as a passenger in the macrophage, permitting the bacterium to attack the lymph nodes, causing one of the ghastly symptoms of bubonic plague: lymph nodes that swell to the size of an egg, sometimes bursting through the skin.

The plague bacterium needs nutrients to thrive, and one nutrient in particular is usually in short supply: iron. While people with reduced HFE function generally experience iron overload, this does not affect all cells in the body equally. Macrophages from people carrying variant alleles of HFE are deficient in iron. The iron-poor environment in the macrophage among HFE-deficient people allows the macrophage to gain the upper hand against the bacterium. The Black Death of the 14th century killed 30-50% of the population in a number of European countries. If variant HFE alleles conferred an advantage against this devastation, the allele frequency could have risen sharply among the survivors. Although the Black Death is no longer part of the landscape in Europe, there is not much selection against variant HFE alleles. People with hemochromatosis do not usually display symptoms until they are past the age where most people have already had children. Because hemochromatosis rarely interferes with reproduction, there is little selection against it.
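To get a feel for how a single epidemic could shift an allele frequency, here is a minimal Python sketch. Every number in it is hypothetical and chosen only for illustration: a starting variant frequency of 5%, 60% survival for people with two normal alleles, and 75% survival for anyone carrying a variant allele. Nobody knows the real survival rates by HFE genotype during the Black Death. The function name is my own, and random mating is assumed.

```python
def frequency_after_epidemic(
    q: float,
    survival_normal: float,
    survival_carrier: float,
    survival_homozygote: float,
) -> float:
    """Variant allele frequency among the survivors of one epidemic,
    starting from random-mating genotype proportions."""
    p = 1.0 - q
    normal = p * p * survival_normal
    carrier = 2 * p * q * survival_carrier
    homozygote = q * q * survival_homozygote
    survivors = normal + carrier + homozygote
    # Carriers hold one variant allele out of two; homozygotes hold two.
    return (carrier / 2 + homozygote) / survivors


# Hypothetical values, for illustration only.
before = 0.05
after = frequency_after_epidemic(before, 0.60, 0.75, 0.75)
print(f"before: {before:.3f}, after: {after:.3f}")
```

With these made-up numbers the frequency rises from 5% to about 6% in one epidemic. Plague returned to Europe many times over several centuries, so modest gains like this could have compounded.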

In my case, it is possible that there was selection for increased body stores of iron (and hence reduced HFE function) in my immediate ancestry. Both of my parents survived starvation during World War II. My mother, who is Dutch, lived in Holland through the entire war, and endured the Hongerwinter (“hunger winter”) of 1944. My father, who was Hungarian, spent the last 18 months of the war in a Soviet POW camp under harsh conditions. Both of my parents survived conditions in which there was widespread death from malnutrition, or from causes in which malnutrition was a factor.

I don’t wish to make light of my parents’ experience, or of the experience of my more remote ancestors who lived through the Black Death. Yet we have the genetic hand that we were dealt, and it is up to each of us to play it well. I intend to be evaluated for iron overload by a physician, and have already changed my diet, sharply reducing my intake of red meat. Humor is also an important aspect of health. In that spirit, I made this slide for one of my recent talks.

In my next post, I will go more deeply into my ancestry, as revealed by genetic testing.


Footnotes

1. The allele frequency of the PEX1-G843D allele is taken from the OMIM entry for PEX1.

2. The frequency of the three PEX1 genotypes is calculated given an allele frequency of 0.002 for PEX1-G843D and 0.998 for the normal PEX1 allele as follows:
PEX1/PEX1 = 0.998 * 0.998 = 0.996 or 99.6%
PEX1/PEX1-G843D = 2 * 0.998 * 0.002 = 0.003992 = 0.4%
PEX1-G843D/PEX1-G843D = 0.002 * 0.002 = 0.000004 or 4/1,000,000

3. The allele frequency for HFE-H63D is taken from the dbSNP entry for rs1799945.

4. The allele frequency for HFE-C282Y is taken from the dbSNP entry for rs1800562.

5. Please see the OMIM entry for HFE and hemochromatosis and for four other genes causing hemochromatosis (HJV, HAMP, TFR2, and SLC40A1).

6. Please see the OMIM entry for CFTR.

7. The frequency of the three CFTR genotypes is calculated given an allele frequency of 0.03 for all CFTR-variant alleles and 0.97 for the normal CFTR allele as follows:
CFTR/CFTR = 0.97 * 0.97 = 0.9409
CFTR/CFTR-variant = 2 * 0.97 * 0.03 = 0.0582
CFTR-variant/CFTR-variant = 0.03 * 0.03 = 0.0009 or 9/10,000

8. Please see the OMIM entry for HBB for more information about sickle-cell anemia and thalassemia. 

9. Please see the OMIM entry for G6PD for more information about favism.

10. Christopher Naugler (2008) Hemochromatosis: A Neolithic adaptation to cereal grain diets. Medical Hypotheses 70: 691-692.

11. S. Moalem, M.E. Percy, T.P.A. Kruck, and R.R. Gelbart (2002) Epidemic pathogenic selection: an explanation for hereditary hemochromatosis? Medical Hypotheses 59: 325-329.

12. Sharon Moalem with Jonathan Price (2008) Survival of the Sickest, Harper Perennial.

Getting My Genome Done

I am a geneticist with a varied career that has included research and teaching at a variety of academic institutions. In 2000, I shut down my research lab and took a job in bioinformatics, just as the human and mouse genome sequences were being completed. In November of 2011, I moved to a new position at the University of New Mexico, funded in part by the National Human Genome Research Institute. My new position involves teaching and public outreach. Recent progress in human genomics has been spectacular, and it is a great story to tell. I thought that knowledge about my own genome would motivate my learning about human genetics and would also personalize my presentations, so I decided to “get my genome done.”

There are several ways of getting a look at your own genome. Of the direct-to-consumer companies, I liked 23andMe. At the 2011 SACNAS National Conference in October, I heard a talk by Dr. Joanna Mountain, Senior Director of Research at 23andMe. Dr. Mountain talked us through 23andMe’s website as seen by a user. The 23andMe website offers information on inherited health conditions and ancestry based on a person’s genome. I liked their user interface. They present results in language accessible to people without an extensive background in science. Users are only a few clicks away from full technical data, including complete raw data that can be uploaded to third-party sites for further analysis.

I signed up for 23andMe using their website, and soon received a kit in the mail for sample collection. They recover DNA from saliva using a very clever method. Following the illustrated instructions, I spit into a plastic tube equipped with a funnel. When my saliva reached the fill line, I flipped a cap into place that dumped a solution into the saliva sample. I capped the tube and inverted it a few times. As I did this, I saw the familiar sight of DNA coming out of solution in an ethanol precipitation. I have isolated plenty of DNA in my days as a researcher, but this was the first time that it was my own. I packed the tube in the postpaid return mailer, dropped it off at the Post Office, and waited.

After a few weeks, I got an email from 23andMe that my results were ready. I had purchased their only offering at the time, a survey of my genotype using the Illumina OmniExpress Plus Genotyping BeadChip. This technology allows genotyping of a human DNA sample at about one million genomic sites. The sites that are genotyped are Single Nucleotide Polymorphisms (SNPs) that have been identified as sites of variation in survey sequencing of human populations. The 23andMe chip includes some SNPs that are the sites of mutation in well-studied genetic disorders. For example, the chip tests for 31 different sequence variants of the CFTR gene associated with Cystic Fibrosis.

Although I am healthy and free from any known genetic disease, I looked at my Carrier Status. The screenshot below shows part of the 23andMe report.

There are 44 genetic disorders listed on this page. The disorders are listed in alphabetical order. If you have a variant allele for any of them, it sorts to the top of the page. I had two: Zellweger Syndrome Spectrum, and variants associated with hemochromatosis, a disorder in which excess iron is taken in from the diet.

The Zellweger Syndrome Spectrum gene tested for is PEX1, a gene required for the normal formation of peroxisomes. Peroxisomes are membrane-bound vesicles inside of cells that are required for the catabolism of fatty acids and other compounds. Fortunately for me, I am a carrier, which means that I am heterozygous. I have one working copy of PEX1 and one bad copy. There are no health consequences for carriers. People homozygous for the allele of PEX1 that I carry generally die before they are one year old. This is why this gene is listed on the Carrier Status page; no one homozygous for the mutant PEX1 allele G843D has a computer, a credit card, and a 23andMe account.

My Hemochromatosis report is more complex. I have two different variant alleles of the HFE gene, one of which slightly predisposes to hemochromatosis, while the other causes a considerable increase in risk of the disease. Here is a screenshot of the page that appears when you click on the Hemochromatosis link.

There is a link to a technical report. The technical report is very detailed. Part of it is shown below.

The good news is that my risk for developing hemochromatosis is quite low. Nevertheless, I decided to modify my diet and to ask for some specialized tests the next time I visit a doctor. I will discuss this in greater detail in another post.

There are also discussion forums at 23andMe. I participated in these for a couple of weeks before launching this blog. Not everyone has training in genetics, even among 23andMe subscribers, so I will take this opportunity to explain the language used in the technical report.

Most genes encode proteins. A protein (polypeptide) is a chain of amino acids; there are twenty primary amino acids that make up the set that can be encoded by the 64 three-base codons of the genetic code. There are single letter codes for each of the twenty primary amino acids. The HFE gene encodes a protein 348 amino acids long. The H63D allele changes the 63rd amino acid from histidine (H) to aspartic acid (D). The C282Y allele changes the 282nd amino acid from cysteine (C) to tyrosine (Y).

The C282Y allele results in a significant loss of function of the HFE protein. The cysteine residue at that position is highly conserved, meaning that when you look at the HFE gene in other organisms, there is usually a cysteine at that position. This is a highly significant risk allele. From the OMIM entry:

“In patients with hemochromatosis, Feder et al. (1996) identified an 845G-A transition in the HFE gene (which they referred to as HLA-H or ‘cDNA 24’), resulting in a cys282-to-tyr (C282Y) substitution. This missense mutation occurs in a highly conserved residue involved in the intramolecular disulfide bridging of MHC class I proteins, and could therefore disrupt the structure and function of this protein. Using an allele-specific oligonucleotide-ligation assay on their group of 178 patients, they detected the C282Y mutation in 85% of all HFE chromosomes. In contrast, only 10 of the 310 control chromosomes (3.2%) carried the mutation, a carrier frequency of 10/155 = 6.4%. One hundred forty-eight of 178 HH patients were homozygous for this mutation, 9 were heterozygous, and 21 carried only the normal allele. These numbers were extremely discrepant from Hardy-Weinberg equilibrium. The findings corroborated heterogeneity among the hemochromatosis patients, with 83% of cases related to C282Y homozygosity.”

In other words, looking at this from the perspective of a physician, most people who receive a clinical diagnosis of hemochromatosis are homozygous for the C282Y allele of HFE.
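The arithmetic in the Feder et al. quotation can be rechecked with a short Python sketch. It takes the genotype counts given in the quotation (148 homozygous, 9 heterozygous, 21 with only the normal allele), computes the C282Y allele frequency, and compares the observed counts with what Hardy-Weinberg equilibrium would predict for a sample of that size. The function name `hardy_weinberg_expected` is my own, and this is a simple textbook calculation, not the statistical test the authors used.

```python
def hardy_weinberg_expected(n_hom_variant, n_het, n_hom_normal):
    """Return the variant allele frequency and the genotype counts expected
    under Hardy-Weinberg equilibrium for the same number of people."""
    n_people = n_hom_variant + n_het + n_hom_normal
    # Each person carries two chromosomes, so alleles are counted out of 2N.
    q = (2 * n_hom_variant + n_het) / (2 * n_people)
    p = 1 - q
    expected = (q * q * n_people, 2 * p * q * n_people, p * p * n_people)
    return q, expected


# C282Y genotype counts among the 178 patients in the quotation.
observed = (148, 9, 21)
q, expected = hardy_weinberg_expected(*observed)

print(f"C282Y allele frequency in patients: {q:.1%}")
labels = ("homozygous C282Y", "heterozygous", "normal allele only")
for label, obs, exp in zip(labels, observed, expected):
    print(f"{label}: observed {obs}, expected {exp:.1f}")

# Controls: 10 of 310 chromosomes carried C282Y. The quoted carrier
# frequency assumes each of those 10 chromosomes is in a different person.
control_allele_frequency = 10 / 310
control_carrier_frequency = 10 / 155
print(f"control allele frequency: {control_allele_frequency:.1%}")
print(f"control carrier frequency: {control_carrier_frequency:.1%}")
```

With an allele frequency near 86%, equilibrium would predict roughly 44 heterozygotes among 178 people, yet only 9 were observed. That gap is what the quotation means by "extremely discrepant": the patient group is a mixture of C282Y homozygotes and people whose disease has some other cause.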

In contrast, also from the OMIM entry, the H63D allele of HFE confers a minor risk of hemochromatosis. Here is one part of the OMIM entry:

“Jouanolle et al. (1996) commented on the significance of the C282Y mutation on the basis of a group of 65 unrelated affected individuals who had been under study in France for more than 10 years and identified by stringent criteria. Homozygosity for the C282Y mutation was found in 59 of 65 patients (90.8%); 3 of the patients were compound heterozygotes for the C282Y mutation and the H63D mutation (613609.0002); 1 was homozygous for the H63D mutation; and 2 were heterozygous for H63D. These results corresponded to an allelic frequency of 93.1% for the C282Y and 5.4% for the H63D mutations, respectively. Of note, the C282Y mutation was never observed in the family-based controls, whereas it was present in 5.8% of the general Breton population. This corresponds to a theoretical frequency of about 1 per 1,000 for the disease, which is slightly lower than generally estimated. In contrast, the H63D allelic frequency was nearly the same in both control groups (15% and 16.5% in the family-based and general population controls, respectively). While the experience of Jouanolle et al. (1996) appeared to indicate a close relationship of C282Y to hemochromatosis, the implication of the H63D variant was not clear.”

So, while the H63D allele of HFE appears to alter the function of HFE, in the study quoted above it is no less frequent among people lacking a diagnosis of hemochromatosis than among those who are diagnosed with hemochromatosis. People have two alleles, so “having” the H63D allele in this case usually means also having a normal allele. I should also point out that there are other genes, different from HFE, that predispose to hemochromatosis.
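The allele frequencies in the Jouanolle et al. quotation follow directly from the genotype counts it lists. Here is a minimal Python sketch of that count. The genotype labels are my own shorthand, and I am assuming that "heterozygous for H63D" means one H63D allele and one normal allele.

```python
from collections import Counter

# Genotypes of the 65 patients, as (allele 1, allele 2, number of patients).
# The "normal" label for the second allele of H63D heterozygotes is an
# assumption about what the quotation means.
patient_genotypes = [
    ("C282Y", "C282Y", 59),   # homozygous for C282Y
    ("C282Y", "H63D", 3),     # compound heterozygotes
    ("H63D", "H63D", 1),      # homozygous for H63D
    ("H63D", "normal", 2),    # heterozygous for H63D
]

allele_counts = Counter()
for allele_1, allele_2, n_patients in patient_genotypes:
    allele_counts[allele_1] += n_patients
    allele_counts[allele_2] += n_patients

# Two chromosomes per patient.
n_chromosomes = 2 * sum(n for _, _, n in patient_genotypes)
allele_frequency = {
    allele: count / n_chromosomes for allele, count in allele_counts.items()
}

for allele, frequency in allele_frequency.items():
    print(f"{allele}: {allele_counts[allele]} of {n_chromosomes} = {frequency:.1%}")
```

Counting this way gives 121 of 130 patient chromosomes for C282Y and 7 of 130 for H63D, which matches the 93.1% and 5.4% in the quotation. The H63D figure among patients is well below the 15% to 16.5% reported for the two control groups.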

The Zellweger Syndrome Spectrum (PEX1) allele that I carry occurs at a frequency of around 0.2%. For hemochromatosis, among the 4,552 chromosomes sampled by the publicly funded Exome Sequencing Project, the HFE-H63D allele occurs at a frequency of about 10.8%, while the HFE-C282Y allele occurs at a frequency of about 0.2%. Why is there such a wide range in the frequency of disease-causing alleles? I will cover that in my next post.