Cancer genomes

Two reports in the journal Nature describe the determination of the entire DNA sequence of a lung cancer tumor cell and of a melanoma tumor cell, respectively.  Both reports originated from the Wellcome Trust Sanger Institute in the United Kingdom and its institutional collaborators in other parts of the world.

 

It has been known for many years that tumor cells carry mutations when compared to the DNA of a patient’s normal cells.  However, by determining the sequence of the entirety of the DNA, referred to as the genome, researchers can identify mutations without the bias built into previous approaches.  For example, cancer is a disease of abnormal cell proliferation, which led scientists to focus on genes that encode proteins that regulate cell proliferation.  Indeed, many genes of this type are mutated in cancer, which explains the abnormal cell proliferation, at least in part.  However, cancer has other characteristics besides abnormal cell proliferation, for example, metastasis and immune system evasion.  Thus, by sequencing the entire genome of a cancer cell, mutations underlying these other characteristics may be discovered where previous approaches failed.  This type of approach is referred to as “discovery-based”, as opposed to a hypothesis-based approach, where a specific idea, hopefully well-grounded in science, guides the collection of data.

 

The technology for reading the entire sequence of the A, T, G, and C bases that make up DNA has evolved over the last decade, in particular allowing the sequence to be obtained far more rapidly.  The authors of the above reports used a technology marketed by Illumina.

 

In addition to the improved technology, knowledge of a reference human genome sequence makes assembling a new genome more efficient: statistical algorithms have been developed to match each newly obtained stretch of sequence to its likely position in the previously obtained reference sequence.
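The idea of placing a newly read stretch of sequence onto a reference can be illustrated with a deliberately naive sketch.  It is not the method used in the reports; real aligners use indexed, statistical approaches.  The function names, the toy reference, and the toy read below are all made up for illustration.

```python
def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, reference, max_mismatches=1):
    """Return (position, mismatches) for the best placement of `read` on
    `reference`, or None if no placement has <= max_mismatches differences.

    Brute-force scan of every offset; ties go to the leftmost position.
    """
    best = None
    for pos in range(len(reference) - len(read) + 1):
        mm = hamming(read, reference[pos:pos + len(read)])
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (pos, mm)
    return best


reference = "ACGTTGCAAGGCTTACGATC"  # hypothetical toy reference sequence
read = "AGGCTAAC"                   # hypothetical read with one base changed

print(map_read(read, reference))    # placement and number of differing bases
```

A base that differs between a confidently placed read and the reference is a candidate mutation; in the cancer studies, the tumor sequence is also compared with the same patient's normal DNA.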

 

In sequencing two different types of cancer cells, the researchers learned several things.  First, the number of mutations in the cancer genome, compared to the normal genome, was far higher than expected.  The melanoma had over 32,000 mutations and the lung cancer cell had over 22,000 mutations.  In both cases, about 99% of these mutations were in sections of the genome that are considered to be inert, either intergenic regions or stretches within genes that are not used to code for a protein, and presumably none, or at best only a few, of these mutations could play a role in tumor development.  Even so, the sheer number of mutations in these cells indicates that, over the course of a lifetime, the cells that make up the body undergo an unexpectedly large number of DNA base changes.
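As a rough back-of-the-envelope check, the figures above imply that only a few hundred mutations per tumor lie outside the presumed-inert regions.  The sketch below treats "about 99%" as exactly 99% and uses the rounded lower bounds quoted here, so the results are order-of-magnitude estimates, not counts from the reports.

```python
# Lower bounds quoted in the text; exact counts are in the original reports.
total_mutations = {"melanoma": 32_000, "lung cancer": 22_000}
NONCODING_PERCENT = 99  # "about 99%", treated as exact for this estimate


def estimate_coding(total, noncoding_percent=NONCODING_PERCENT):
    """Estimate how many mutations fall outside the presumed-inert regions."""
    return total * (100 - noncoding_percent) // 100


for tumor, total in total_mutations.items():
    print(f"{tumor}: roughly {estimate_coding(total)} of {total} mutations")
```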

 

Only about 1% of these mutations occurred in regions of the DNA that code for protein.  While it is conceivable that mutations in non-coding regions could stimulate cancer, almost all mutations known to be involved in cancer development occur in protein-coding regions.  However, data presented in these reports, although preliminary, indicate that even most of the new mutations in the protein-coding regions are irrelevant.  Many more complete cancer genomes will be needed to verify this result, but if most mutations in protein-coding DNA are indeed not involved in tumor development, then the mutations that actually drive cancer development are very rare.  This could be hopeful for the treatment of cancer, because it would suggest that most of the targets for treatment have already been identified.

 

A somewhat more esoteric value of the work is its verification of the way in which mutations accumulate in lung cancer and in melanoma.  In the lung cancer cells, the types of mutation are consistent with smoking; in the melanoma cells, the types of mutation are consistent with ultraviolet radiation, i.e., sun exposure.  How can the researchers make these types of determinations?  Here is an example: The DNA base C, or cytosine, occurs in the genome in two chemical forms, methylated and unmethylated.  That is, cytosine can have an additional methyl group bonded to its ring structure.  Methylated cytosine facilitates the shutdown of gene expression; gene expression generally requires that the cytosines in the region of DNA being transcribed into RNA be unmethylated.  The cell regulates cytosine methylation by methylating, or not methylating, the cytosines that occur next to G’s, or guanines.  Thus, the symbol CpG is used to refer to C’s that are candidates for methylation, with the “p” indicating the phosphate linkage between the DNA bases.  Furthermore, the frequency of C methylation is much higher for CpG’s that are isolated in the genome than for CpG’s that occur in clusters.  Tobacco carcinogens have been shown in laboratory experiments to mutate methylated C’s very efficiently to the base T, or thymine.  In the lung cancer cell, most of the C’s that were mutated to T’s occurred in isolated CpG’s, consistent with the idea that methylated C’s, in the body, are in fact highly vulnerable to tobacco carcinogens.
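The kind of bookkeeping behind such a conclusion can be sketched as follows: for each mutated C, check whether it sits in a CpG and whether that CpG is isolated or part of a cluster.  This is only an illustration, not the analysis from the reports.  The function names, the toy sequence, and the 10-base window used to define "clustered" are assumptions, and only one DNA strand is considered.

```python
def cpg_positions(seq):
    """Return 0-based indices of every C that is immediately followed by a G."""
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


def classify_c_to_t(seq, mutated_positions, window=10):
    """Label each mutated C as 'non-CpG', 'isolated CpG' or 'clustered CpG'.

    A CpG is called 'clustered' here if another CpG starts within `window`
    bases of it; this threshold is an illustrative assumption only.
    """
    cpgs = cpg_positions(seq)
    labels = {}
    for pos in mutated_positions:
        if seq[pos] != "C":
            raise ValueError(f"position {pos} is not a C in the reference")
        if pos not in cpgs:
            labels[pos] = "non-CpG"
        elif any(other != pos and abs(other - pos) <= window for other in cpgs):
            labels[pos] = "clustered CpG"
        else:
            labels[pos] = "isolated CpG"
    return labels


# Hypothetical toy sequence: one lone CpG, a run of three CpGs, and a C before an A.
seq = "AACG" + "T" * 16 + "CGCGCG" + "TTTT" + "CATT"

# Hypothetical positions of C-to-T mutations in that sequence.
print(classify_c_to_t(seq, [2, 22, 30]))
```

Tallying these labels across all C-to-T mutations in a genome shows whether the changes are concentrated in isolated CpG's, which is the pattern described above for the lung cancer cell.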

 

What was not learned, and what are possible next steps?  First, as indicated above, relatively little new was discovered regarding gene mutations that are relevant to cancer development.  Second, it is apparent that other DNA alterations that play a role in cancer development, besides single-base mutations, are identified only inefficiently by the technology employed.  For example, cancer development often involves chromosomal breaks that abnormally juxtapose pieces of genes, producing what are referred to as cancer fusion genes, and these were not identified as efficiently as the single-base mutations.  Furthermore, the cancer cells used in the analyses were immortalized cell lines, meaning that they have been propagated outside the body in artificial culture.  Based on previous information, it seems unlikely that there will be significant differences between what could be discovered using cancer cells taken directly from a patient and cells cultured in the laboratory, but this possibility cannot be ruled out and will be a concern for many scientists until it is addressed.

 

As with all important basic research reports, the most important next step is repeating the work.  For example, determining the entire DNA sequence of another 20 or so cancer types could confirm the preliminary indication that very few mutations are relevant to cancer development.  This confirmation in turn could have a big impact on the greater medical community’s expectations for personalized treatment of patient tumors.  For example, sequencing the entire tumor genome of every patient would be far more complicated than rapidly sequencing only a subset of the genome, which would suffice if that subset captured the mutations involved in cancer development for the tumors of almost all patients.

 

Melanoma genome

Lung cancer genome

Cytosine methylation

Illumina Corporation

Technical info regarding Illumina sequencing technology


Filed under: Cancer | December 31, 2009 1:30 pm
