An Online Introduction

to Advanced Biology


Terms and Concepts




Chemistry:  From DNA Code to the Proteins that Make Traits







DNA is the molecule that carries genetic information, which turns out to be in the form of codes that can be translated into protein sequences, but of course it's much more complicated than that.  As mentioned in the previous section, much of the DNA that organisms carry is not coding sequence at all, but apparently serves other, largely unknown functions.  The coding sequences, the genes, are not all separate, but are strung together with non-coding stretches on chromosomes.  Attaching many genes on the same chromosome makes it possible for living cells to reproduce and get the entirety of genes, a copy each of all the DNA bits, into each of the two daughter cells by moving a manageable number of pieces.

In organisms that reproduce sexually, chromosomes are "mixed" (recombination) during sex - in order for all gene-based traits to be affected, each chromosome source must have two of each gene (when versions are different, they're called alleles), carried on "matching" chromosomes, called homologous (from the Greek for "agreeing") chromosomes.  Each member of a pair has a 50 - 50 chance of being passed along to each offspring, producing offspring with a very different chromosome complement than the parent and that all-important-for-evolution variety within the group.  Homologous chromosomes are found in most eukaryotes, so, technically, when we discuss gene expression, the reading (transcription) and conversion-to-protein (translation), it is to be understood that this can happen twice for most genes being expressed in a cell, once for each copy.

In preparation for cell reproduction, a copy of the existing DNA must be made, a process called replication, which will be covered in detail a bit later.  Once copies have been made, they have to be distributed properly to the new cells - this is when it matters how many bits of DNA there are.  If every one of the hundreds to tens of thousands of genes were copied and separated individually, the task would be extremely complicated, which is probably why many genes are stuck together on chromosomes.  The fewer separate molecules there are, the easier it will be to distribute copies properly.  However, if the number of molecules is quite low, then sexual recombination will produce very limited variety.

Chromosome number is the typical number a particular species of organism carries in a typical cell - in humans the number is 46, made up of 23 pairs of chromosomes.  Because of homologous pairs, most chromosome numbers are even numbers.  High numbers present difficulties during cell division - more copies to keep track of while sets are separated out.  Numbers in some species may be low, as few as 4, which would produce less variation in offspring.  Since evolution depends upon variation, one might think that organisms with low numbers might live in very stable microenvironments, or have high natural adaptability.  However, in monoecious organisms that can self-fertilize, low numbers allow a quasi-asexual mode, since a decent proportion of offspring would get the same chromosome mix as the parent.  Variation plus copying ability would be very useful in unstable or challenging environments.  Particular species have not only particular chromosome numbers;  each number is made up of a unique combination of chromosome types - long, short, fat, thin, connecting in different spots, etc. - that altogether is called a karyotype.
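The link between chromosome number and variety can be put into rough numbers.  The sketch below is only an illustration, assuming that each homologous pair sorts independently and that no crossing-over occurs (both simplifications);  the function names are made up for this example.

```python
def gamete_combinations(chromosome_number: int) -> int:
    """Distinct chromosome sets one parent can pass on.

    Each homologous pair contributes one of its two members,
    so there are 2 choices per pair (crossing-over is ignored).
    """
    pairs = chromosome_number // 2
    return 2 ** pairs


def selfing_same_mix_probability(chromosome_number: int) -> float:
    """Chance that a self-fertilized offspring gets the parent's exact mix.

    For each pair, the offspring must receive one homolog from the egg and
    the other from the sperm, which happens half of the time.
    """
    pairs = chromosome_number // 2
    return 0.5 ** pairs


for number in (4, 8, 46):
    print(number, gamete_combinations(number), selfing_same_mix_probability(number))
```

Under these assumptions, a chromosome number of 4 gives only 4 possible sets per parent and a 1-in-4 chance of a self-fertilized offspring matching the parent, while 46 gives more than 8 million possible sets.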

A chromosome is more than just DNA.  To fit very long molecules into very small places (it is sometimes estimated that the DNA in any given human cell reaches 2 meters in length!), the molecules are tightly wound up and packaged around spool-like proteins called histones.  There are several levels of DNA packaging, from small individual histones which DNA loops around, to complexes that are like tight spool clusters.  This tightly-bundled mass of DNA and histones is called chromatin.  Any kind of processing, whether it involves replication or gene expression, must involve unwrapping at least some of the packaging to get at the actual DNA.
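The 2-meter estimate can be checked with simple arithmetic.  This is a back-of-the-envelope sketch, not a measurement:  the genome size and the spacing between base pairs are commonly quoted approximate values, entered here as assumptions.

```python
# Approximate values, used here as assumptions for a rough estimate.
BASE_PAIRS_PER_HAPLOID_GENOME = 3.2e9  # one set of human chromosomes
RISE_PER_BASE_PAIR_M = 0.34e-9         # spacing between base pairs, in meters


def dna_length_m(base_pairs: float) -> float:
    """Stretched-out length of a DNA double helix with this many base pairs."""
    return base_pairs * RISE_PER_BASE_PAIR_M


# A typical body cell carries two sets (the homologous pairs).
diploid_length = dna_length_m(2 * BASE_PAIRS_PER_HAPLOID_GENOME)
print(f"{diploid_length:.2f} m")
```

With these inputs the total comes out a little over 2 meters, all of it packed into a nucleus only a few thousandths of a millimeter across.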

Lengths of DNA at this level are spooled in histone complexes called nucleosomes, which are arranged in spirals that are themselves arranged in loops.  Not all areas of a chromosome are wrapped equally tightly - some parts are easier to open for processing than others, and mutations that move DNA around on a chromosome (called transposition) can produce an effect merely by making a gene easier or harder to get at.  One current hypothesis on how human brains and chimpanzee brains can be produced by largely the same genes but seem to function at different levels of complexity is that the genes involved are expressed to different extents in the two species - transposition could have had a role in this.  Position effect is a term used to explain why genes in some spots on a chromosome seem to be harder to access - and transposition will sometimes change a gene's accessibility.  Recent work has suggested that these sorts of mutational changes strongly affect the expression of genes and may be a powerful driver of evolution - in these cases, one doesn't need a random allele change to prove useful so much as a change in protein availability to be useful, and that seems an event that might happen more often.  Additionally, evidence is accumulating that methylation of histones is an important part of altering when, how, and whether genes are expressed.






Bacterial and archaeal chromosomes tend to be different from chromosomes found in eukaryotes.  First, most prokaryotes have only a single chromosome, rather than the multiple (usually paired homologous) chromosomes of eukaryotes.  Second, that chromosome is usually circular, rather than the two-ended structure found in eukaryotes.  Prokaryote chromosomes, like those of eukaryotes, are also packaged with proteins, but because there is no isolation within a cell nucleus (there is a special zone, the nucleoid, around the chromosome), the processing of DNA through RNA and protein production involves molecules in direct contact with the chromosomes, making protein as the intermediate RNA "rolls" down the gene.  Because of this processing, it is common for genes that code for enzymes associated with metabolic pathways to be on the chromosome in a sequence that matches the pathway sequence.  If enzymes work in a particular sequence, the genes will be in that sequence as well.  In eukaryotes, that's much more rarely seen.

Prokaryotes are also capable of producing plasmids, small DNA loops that carry one or a few genes.  These can be often-used genes or genes particularly important in a given environment.  Perhaps the most important aspect of plasmids is that they can be passed to other cells, allowing the sharing of adaptive alleles and a hint of the genetic mixing that sexual reproducers are capable of.  Plasmids can be shared among unrelated organisms, sometimes even passed to eukaryotes - this lateral gene transfer is almost certainly a critical feature in the evolution of many groups.  Artificial plasmids are used in genetic engineering to get bacteria to make eukaryote proteins, such as human insulin.






Because of the way eukaryote chromosomes work, they have a couple of structures that are not needed in prokaryotes.  The ends of these non-looped chromosomes have a type of molecular cap, called a telomere, that keeps the ends from interacting and keeps repair molecules from seeing them as broken ends, but which can gradually erode as chromosomes are copied during successive cell divisions.  This erosion can lead to cell types having limited division capacities, possibly meaning that as we age, it becomes harder and harder to produce new cells to replace damaged ones.  Much research into the mechanisms of aging is aimed at perhaps keeping our telomeres from wearing away.  An enzyme, telomerase, is key to repairing telomeres with each replication, but it seems to get less active in many aging organisms.
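The idea of a division limit set by telomere erosion can be sketched as a simple countdown.  All of the numbers below are invented for illustration (they are not measured values), and the function name and parameters are hypothetical;  real telomere dynamics are far less regular.

```python
from typing import Optional


def divisions_until_limit(telomere_length: int,
                          loss_per_division: int,
                          critical_length: int,
                          telomerase_repair: int = 0) -> Optional[int]:
    """Count the divisions a cell line can make before its telomeres
    would drop below a critical length.

    Returns None when repair keeps pace with loss, so no limit is reached.
    """
    net_loss = loss_per_division - telomerase_repair
    if net_loss <= 0:
        return None
    divisions = 0
    while telomere_length - net_loss >= critical_length:
        telomere_length -= net_loss
        divisions += 1
    return divisions


# Illustrative, made-up numbers (in bases).
print(divisions_until_limit(10_000, 100, 5_000))                          # no repair
print(divisions_until_limit(10_000, 100, 5_000, telomerase_repair=100))   # full repair
```

In this toy model, a cell with no telomerase activity runs out of divisions, while a cell whose telomerase fully replaces each loss never hits the limit.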

However, not all cells have a telomere erosion problem.  Another area of research is trying to discover why cancer cells don't seem to run into division limits, apparently because they activate telomerase - in this case, it would be useful to find a way to make their telomerase stop working and let the telomeres erode.

When eukaryotes make chromosome copies in preparation for a cell's division, the copies stay attached to each other while the spindle, a system of microtubules, moves the doubled chromosomes into a central position, where they will separate and each copy will be pulled to opposite ends of the dividing cell.  The centromere, an area of highly compressed DNA and proteins, acts as both the binding point between the duplicates and the attachment site for the spindle.






When a cell needs to make a particular protein, it needs to use the code stored in DNA for that protein - its gene.  But the gene on its chromosome is typically in a wrapped-up state, so the area of the gene needs to be uncoiled to allow access to the DNA.  Various enzymes are put to work, probably directed by gene-specific activators, to remodel the local chromatin and allow the gene to be copied over to Messenger RNA (mRNA), in the process called transcription.  

Once the DNA is accessible, an enzyme called RNA polymerase binds to a stretch of DNA "upstream" of the gene.  This stretch is called a promoter, and there are a couple of different types that are usually within 50 bases of the start point of the gene.  Other types of genetic "neighborhood" elements include initiation factors that complex to get transcription started, outlying boundary elements (also called insulators) and stimulating enhancers.

Promoters do a few jobs.  They are critical to getting the molecules together that will do the actual transcription.  They mark the starting point of transcription and orient molecules so transcription goes in the right direction.  They also respond to activators and repressors that a cell might produce to control the process.

Genes stretch from a START codon to a STOP codon on one of the two DNA strands.  mRNA is made between these two codons.  The entire length from beginning to end is not necessarily all going to be used in the final protein, however.  Some pieces, called introns, get removed along the way, leaving exons to be integrated into the final product (sometimes the terms are counter-intuitive).  The processing of these pieces is often done by a type of enzyme made from RNA rather than protein - these are ribozymes.

The ability to splice pieces of genes together allows the production of different proteins from the same overall gene sequence, or combination of pieces from two different places.  This partly allows production of made-on-demand specialized proteins such as antibodies.
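Transcription and intron removal, as described above, can be sketched with short strings.  The sequence here is invented for illustration, and the exon positions are supplied by hand;  a real cell recognizes intron boundaries through signals in the RNA itself.

```python
def transcribe(coding_strand: str) -> str:
    """The mRNA matches the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def splice(pre_mrna: str, exons: list) -> str:
    """Join the exon slices, given as (start, end) pairs, in order.

    Everything outside the listed slices is treated as intron and dropped.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# A made-up gene: two exons separated by one intron.
exon1, intron, exon2 = "ATGGCC", "GTAAGTTTCAG", "GGATAA"
gene = exon1 + intron + exon2
exon_positions = [(0, len(exon1)), (len(exon1) + len(intron), len(gene))]

pre_mrna = transcribe(gene)
mrna = splice(pre_mrna, exon_positions)
print(pre_mrna)
print(mrna)
```

Passing a different list of exon positions to `splice` produces a different mRNA from the same gene, which is the basic idea behind getting several proteins from one overall sequence.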






Once the messenger RNA is constructed from the gene strand, it needs to be processed through a cell structure called a ribosome, where the mRNA sequence will be translated into an amino acid sequence for the coded protein.  A ribosome is a molecular complex of various proteins and ribosomal RNAs (rRNA) and is found in both prokaryotes and eukaryotes.  In prokaryotes, ribosomes are in close proximity to chromosomes;  in eukaryotes, the chromosomes are in the cell nucleus, but the ribosomes are outside the nucleus, in the cytoplasm.  Recent research suggests that mRNAs may also sometimes have localization signals that cause them to be directed to those places in the cell where the coded proteins will be used.

This description is going to be simplified;  for instance, it does not include the processing that often happens between chromosome and ribosome that modifies the original sequence.

Messenger RNA feeds through the ribosome where, one codon at a time, it cross-links to molecules of transfer RNA (tRNA).  Each tRNA molecule has what is called an anticodon on one end and an amino acid on the other - there are as many tRNAs as there are codons, although the same amino acids can be on a few different tRNAs (this, remember, is part of genetic redundancy).  The code can be translated using a chart based upon the mRNA sequence - each codon is linked (literally, on the tRNA) to either a start ("initiator," but these may code for amino acids once the sequence has started), a stop ("terminator"), or an amino acid.  Notice how a single base change in a codon won't necessarily change the amino acid being coded for.  Recent studies have indicated that redundant codons may be expressed at very different rates, and how fast proteins get made can have a huge effect on the chemistry using those proteins.
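Reading an mRNA one codon at a time can be sketched in a few lines.  This simplified example uses the standard codon chart with one-letter amino acid abbreviations, starts at the first AUG it finds, and stops at the first stop codon;  the function name and the sample sequence are made up for illustration.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
# Amino acids use one-letter abbreviations; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG (start) up to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is made
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGUUUGGAUAAGC"))

# Redundancy: count how many different codons stand for leucine (L).
print(sum(1 for amino_acid in CODON_TABLE.values() if amino_acid == "L"))
```

The second printout shows the redundancy mentioned above:  several different codons lead to the same amino acid.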

Once the protein begins to emerge from the ribosome, chaperonins help it to fold properly.  The protein may not be immediately active - it may be inhibited, which will be covered in the next section, or sometimes prosthetic groups will need to be attached later for it to work.  These are often non-protein molecules containing minerals.  Many cellular proteins have several domains, or zones within the molecule that have some sort of activity.  Early after production, a domain may be linked to cellular transportation machinery and moved to where it will be active.




Virtually any change in an organism's genetic material can be called a mutation.  This can be small, a single change in a single base, called a point mutation (because it happens at a particular point along the sequence), or as large as losses or additions of entire sets of chromosomes.

There are a few different types of point mutations.  A substitution is a switch from one base to another.  In non-coding regions, it is assumed that these have little effect, but some non-coding regions are better conserved (meaning that there are few changes across a broad spectrum of different organisms) than even gene sequences.  No one knows why evolution has preserved these stretches, but it seems that whatever they're doing, it must be important right down to the details of the base sequence.  Bases can also be added to the sequence, an insertion, or lost, a deletion.

Point mutations in gene sequences can vary greatly in effect.  A substitution that changes a codon to a codon for the same amino acid (one of the types of genetic redundancy), sometimes called a silent mutation, or for an amino acid that has similar properties, or for an amino acid in a non-critical part of the coded protein, can do essentially nothing (but see note at the end of this section).  Some substitutions can switch amino acids and seriously harm the protein's function.  More rarely, a change might improve function or change it to something new and useful;  the odds aren't good, but these are such common mutations that improvements are almost inevitable.  Occasionally, a codon might change to a stop codon, which of course terminates the sequence of the protein.  Insertions and deletions, unless they involve a multiple of three bases, shift the "reading frame" of the codons, changing every codon from there on (unless there is an intron to reset them), which can have major effects.
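The different outcomes of a point mutation can be sorted out with the standard codon chart.  The sketch below is a simplified illustration:  the function names and the short sequences are invented, and the labels (silent, missense, nonsense) are the usual textbook terms for same amino acid, changed amino acid, and new stop codon.

```python
from itertools import product

# Standard genetic code in U, C, A, G order; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Label the effect of swapping one base (position 0, 1, or 2) in a codon."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "silent"
    if after == "*":
        return "nonsense"
    return "missense"


def codons(mrna: str) -> list:
    """Split a sequence into complete three-base codons, reading from the start."""
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]


print(classify_substitution("GAG", 2, "A"))  # GAG -> GAA
print(classify_substitution("GAG", 1, "U"))  # GAG -> GUG
print(classify_substitution("UAU", 2, "A"))  # UAU -> UAA

# Frameshift: deleting a single base changes every codon after it.
original = "AUGGAGUAUCCG"
shifted = original[:3] + original[4:]  # delete the fourth base
print(codons(original))
print(codons(shifted))
```

The last two printouts show the frameshift:  after the deletion, none of the downstream codons match the originals.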

Comparisons of point mutations along non-coding DNA stretches (where they presumably have no effect) are used as molecular clocks - once a family tree "splits," each branch will accumulate its own unique point mutations at what is assumed to be a predictable rate.  The more different point mutations, the longer since the original split.
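The molecular clock reasoning can be written as a small calculation.  This is a bare-bones sketch:  the two sequences and the mutation rate are invented for illustration, and it assumes a constant rate and that no site has changed more than once, which real analyses have to correct for.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def years_since_split(seq_a: str, seq_b: str, rate_per_site_per_year: float) -> float:
    """Estimate the time since two lineages split.

    Both branches collect their own mutations, so differences between them
    build up at about twice the rate of a single lineage.
    """
    divergence = count_differences(seq_a, seq_b) / len(seq_a)
    return divergence / (2 * rate_per_site_per_year)


# Made-up aligned sequences and a hypothetical rate, for illustration only.
species_a = "ACGTACGTACGTACGTACGT"
species_b = "ACGTACGAACGTACGTACCT"
assumed_rate = 1e-9  # substitutions per site per year (an assumption)

print(count_differences(species_a, species_b))
print(years_since_split(species_a, species_b, assumed_rate))
```

The estimate is only as good as the assumed rate, which is why clocks are usually calibrated against splits with known dates.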

Chromosomal mutations result from breaks in chromosomes that reattach in spots that are different from the original break.  These breaks can be spontaneous (DNA is a long complex molecule, and these can just break on their own), and there are repair enzymes to "grab" broken ends and reattach them.  Chemicals and certain types of radiation can cause breaks as well;  when multiple breaks occur, the repair enzymes can link the wrong ends together and lose pieces.  It's unlikely that persistent breaks will happen in important gene sequences (remember, most of DNA is not genes and any given cell will use a limited number of genes), but loose pieces can't be distributed properly when cells divide.  Damage detectors may keep the cell from dividing again, or daughter cells can wind up with whole segments of chromosomes (and the genes carried on them) missing or as extra duplicates.  This is why chemicals or radiation that damage DNA affect dividing cells (like epithelium, some types of connective tissue, or cancer cells) more powerfully than non-dividing cells (like nerves or muscles).

Chromosomal mutations include swaps between homologous chromosomes during meiosis. These allow linked alleles to unlink, and if the pieces are of unequal size, one swapping partner winds up with extra genes (a type of duplication, discussed below) and the other loses genes (a type of deletion, discussed below).  Extra genes can be passed to offspring, producing the second type of genetic redundancy.  One copy of the extra genes continues to do its original function, while the other gene mutates and produces a changed protein, a new ability without losing the ability of the original gene.  This may explain why evolutionary lineages tend to have more DNA, more genes, and a wider variety of proteins.  A common criticism of evolutionary theory is that there are no known ways to increase genetic information, but this type of mutation definitely does so and is fairly common.

Other chromosomal mutations:  translocations, where pieces are traded between non-homologous chromosomes;  deletions, where parts get lost;  insertions, extra pieces, which can come during meiosis but also can be genes from other chromosomes, or viruses, or prokaryotes added, as well as increased bits such as codons repeated over and over;  inversions, where a bit of chromosome winds up the wrong way around;  non-disjunction, when entire chromosomes wind up extra or missing during division;  polyploidy, with an entire extra set of chromosomes.  Fusion happens when separate chromosomes become one - this happened in human ancestors, who at one point had a chromosome number of 48 (there are remnants of the extra telomeres and centromeres in our fused chromosome).  Not all of these mutations show up in all groups of organisms - animals, for instance, rarely survive polyploidy.

NOTE on silent mutations:  although a substitution may change a codon for one amino acid to one that also codes for that amino acid, there is evidence that the change in base sequence can affect how the mRNA gets spliced or read;  it may slow down or speed up translation, and therefore the rate of gene expression, or even affect the way that the protein ultimately folds up.  These silent mutations may not always be as neutral as was thought.





Terms and Concepts

In the order they were covered.


  Gene Expression  
Chromosome Number  
Position Effect
Prokaryote Chromosomes  
Eukaryote Chromosome Structures  
Messenger RNA  
RNA Polymerase  
Codons, START & STOP  
Ribosomal RNA (rRNA)  
Transfer RNA (tRNA)  
Codon Codes  
Prosthetic Groups 
Mutation Types 
Point Mutations 
Genetic Redundancy (Codon) 
Silent Mutations 
Chromosomal Mutations 
DNA Repair 
Mutation & Chemicals 
Mutation & Radiation 
Mutation & Cell Division 
Genetic Redundancy (Chromosome) 
Deletions (Chromosome) 
Insertions (Chromosome) 
Silent Mutation Effects 





Online Introduction to Biology (Advanced)

Copyright 2003 - 2012, Michael McDarby.

Reproduction and/or dissemination without permission is prohibited.  Linking to these pages is fine.

