Sunday, April 12, 2009

Genome Maps

A genome map is a linear representation of
genomic landmarks (genes and markers). It refers
either to a chromosome (cytogenetic map)
or to a stretch of DNA. A map provides knowledge
of the position of a particular genomic
landmark and its relation to others. Unraveling
the human genome in some respects resembles
the mapping of new continents five hundred
years ago.
A genetic map expresses the positions of genes
relative to each other without a physical anchor
on the chromosome. This is the type of map that
was first used by A. H. Sturtevant in 1913 when
working with T. H. Morgan (who started studying
Drosophila in 1910). Here, the distance between
markers is determined by the frequency
of recombination during meiosis, which is in
turn determined by the relative distance between
the loci (see p. 116). A physical map provides
knowledge of the exact position of a gene
or marker. Its distance to another locus on the
same chromosome is expressed as a number of
base pairs (bp), a physical equivalent. A variety
of methods are employed to arrive at a physical
map.
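
As a rough illustration of how recombination frequencies yield a genetic map, the sketch below converts offspring counts into percent recombination and orders three loci from their pairwise distances. The function names and the counts are hypothetical and are not taken from the text; the ordering rule assumes that the two loci farthest apart are the outer ones.

```python
def map_units(recombinants: int, total: int) -> float:
    """Percent recombination between two loci, used as the map distance."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need total > 0 and 0 <= recombinants <= total")
    return 100.0 * recombinants / total


def order_three_loci(distances):
    """Order three loci from their pairwise map distances.

    `distances` maps a pair of locus names to its distance. The pair that
    is farthest apart forms the two ends; the remaining locus lies between.
    """
    outer = max(distances, key=distances.get)
    loci = {locus for pair in distances for locus in pair}
    (middle,) = loci - set(outer)
    return (outer[0], middle, outer[1])


# Hypothetical offspring counts (recombinants, total) for three locus pairs.
counts = {("A", "B"): (30, 1000), ("B", "C"): (50, 1000), ("A", "C"): (78, 1000)}
distances = {pair: map_units(r, n) for pair, (r, n) in counts.items()}
print(distances[("A", "B")])        # 3.0
print(order_three_loci(distances))  # ('A', 'B', 'C')
```

In this made-up data set the directly measured A to C distance (7.8) is slightly smaller than the sum of the two shorter intervals (8.0), which is what one would expect when double crossovers between the outer loci go undetected.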

Physical and genetic gene maps

The physical map gives the position of a gene
locus and its distance from other genes on the
same chromosome in absolute values, expressed
in base pairs and related to given positions
along the chromosome. The genetic map
gives the relative position of gene loci according
to the frequency of recombination, expressed as
recombination units or centimorgans (cM). One
centimorgan corresponds to a recombination
frequency of 1%. Since recombination occurs almost
twice as often in oocytes as in spermatocytes,
the genetic map in females is about 40%
longer than in males. Each DNA marker locus has an official
designation with a defined abbreviation:
the letter D (for DNA), the number of the
chromosome, an S for single-copy DNA, and
the number of the marker, e.g.,
D1S77.
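
The designation scheme described above (D, chromosome, S, marker number) is regular enough to be taken apart mechanically. The following is a minimal sketch; the names `MarkerName` and `parse_marker` are hypothetical, and accepting X and Y as chromosome labels is an assumption that goes beyond the single example given in the text.

```python
import re
from typing import NamedTuple


class MarkerName(NamedTuple):
    chromosome: str  # "1" to "22", or "X"/"Y" (assumption, see lead-in)
    number: int      # running number of the marker


# D + chromosome + S + marker number, e.g. D1S77
_PATTERN = re.compile(r"D(\d{1,2}|X|Y)S(\d+)")


def parse_marker(name: str) -> MarkerName:
    """Split a designation such as 'D1S77' into chromosome and marker number."""
    match = _PATTERN.fullmatch(name.strip().upper())
    if match is None:
        raise ValueError(f"not a D...S... designation: {name!r}")
    return MarkerName(match.group(1), int(match.group(2)))


print(parse_marker("D1S77"))  # MarkerName(chromosome='1', number=77)
```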

STS mapping from a clone library

STS mapping plays a major role in genome mapping.
An STS (sequence-tagged site) is a short
stretch (60–1000 bp) of a unique DNA nucleotide
sequence. An STS has a specific location
and can be analyzed by PCR (see p. 66). The relevant
information, i.e., the sequence of the
oligonucleotide primers used for the PCR
and other data, can be stored electronically
and does not depend on biological specimens.
One can start with a clone library containing
DNA fragments in unknown order (1). Each end
of the chromosomal fragment is characterized
by a pattern of restriction sites (see p. 64). The
DNA fragments are ordered by determining
which ends overlap, then assembling them as a
contiguous array of overlapping fragments into
a clone contig (2). These are linearly arranged.
This establishes a map that shows the location
and the physical distance of the landmarks,
here A, B, C, etc. (3). Sequence-tagged sites
(STSs) are generated from the two ends of the
overlapping clones. This involves sequencing
100–300 bp of DNA (4).
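
The ordering step (find which ends overlap, then chain the clones into a contig) can be sketched in a few lines. This is a toy model under stated assumptions, not a laboratory protocol: each clone is written as a string in which every letter stands for one landmark (A, B, C, etc.), landmarks are unique, and the clones form a single linear contig. The function names are hypothetical.

```python
def end_overlap(left: str, right: str) -> int:
    """Length of the longest end of `left` that matches the start of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def build_contig(clones):
    """Chain clones by overlapping ends; return (clone order, landmark map)."""
    remaining = list(clones)
    # The leftmost clone is one whose start overlaps the end of no other clone.
    # (Raises StopIteration if no such clone exists, e.g. for circular data.)
    start = next(
        c for i, c in enumerate(remaining)
        if all(end_overlap(o, c) == 0 for j, o in enumerate(remaining) if j != i)
    )
    remaining.remove(start)
    order, contig = [start], start
    while remaining:
        # Extend with the clone that overlaps the current right end the most.
        best = max(remaining, key=lambda c: end_overlap(contig, c))
        k = end_overlap(contig, best)
        if k == 0:
            raise ValueError("clones do not join into a single contig")
        remaining.remove(best)
        order.append(best)
        contig += best[k:]
    return order, contig


# Clone library in unknown order; each letter is one landmark.
order, contig = build_contig(["CDE", "ABC", "EF", "BCD"])
print(order)   # ['ABC', 'BCD', 'CDE', 'EF']
print(contig)  # ABCDEF
```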

EST mapping

ESTs (expressed sequence tags) are short DNA
sequences obtained from cDNA clones (complementary
DNA, see p. 58). Each EST represents
part of a gene. Their location is determined
by hybridizing an assembly of different
cDNAs (1) to genomic DNA (2). Thus, the locations
of defined sequences of expressed genes
can be determined (3). These can be mapped to
a location on a chromosome to establish an EST
map.
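
In silico, the hybridization step can be imitated by searching for the EST sequence in a genomic sequence on both strands. The sketch below uses exact string matching as a stand-in for hybridization, which is a strong simplification: it assumes the EST lies within a single exon and contains no sequencing errors. The sequences and the function names are made up for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate_est(genome: str, est: str):
    """Return sorted (start, strand) pairs for exact matches of `est`.

    `start` is the 0-based position on the forward strand of `genome`.
    """
    hits = []
    for strand, probe in (("+", est), ("-", reverse_complement(est))):
        start = genome.find(probe)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(probe, start + 1)
    return sorted(hits)


genome = "TTGACCGTAAGCTTGGCATCGA"     # hypothetical genomic fragment
print(locate_est(genome, "CGTAAG"))  # [(5, '+')]
print(locate_est(genome, "TGCCAA"))  # [(12, '-')]
```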

Approach to Genome Analysis

The approach to genome analysis encompasses
several goals. Of primary interest are the number,
type, and distribution of genes. Knowing all
genes and their positions and structures in a
eukaryotic genome will provide the basis for
understanding their function. The size of a
genome needs to be taken into account for a
systematic study.
Two basic approaches to sequencing a genome
can be distinguished: clone-by-clone sequencing
and the so-called shotgun approach. In the
former, individual DNA clones of known relation
to each other are isolated, arranged in their
proper alignment, and sequenced. The shotgun
approach breaks the genome into millions of
fragments of unknown relation. The individual
DNA clones, for which prior knowledge of their
precise origin is lacking, are sequenced. Subsequently,
they are aligned by high-capacity
computers. The two approaches complement
each other.
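
The computational alignment step of the shotgun approach can be illustrated with a greedy overlap merge: repeatedly join the two fragments that share the longest overlap. This is only a toy sketch under simplifying assumptions (error-free reads, a single strand, no repeated sequences); real assemblers are far more elaborate. The names and the example reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Longest suffix of `a` equal to a prefix of `b`, if at least `min_len`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Merge reads pairwise by best overlap until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # the remaining pieces cannot be joined
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for n, r in enumerate(reads) if n not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three overlapping fragments of the made-up sequence ATGGCGTACGTTAGC.
print(greedy_assemble(["GTACGTT", "ATGGCGTA", "CGTTAGC"]))
# ['ATGGCGTACGTTAGC']
```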

Sizes of genomes and cloning vectors

The sizes of genomes of different organisms
vary considerably. In general, genome size reflects
the complexity of the organism. A mammalian
genome (human and mouse are known
best) contains 3 × 10^9 base pairs (bp) or 3000
Mb. If each nucleotide pair were represented by
a 1-mm-wide letter, the text would be more
than 3000 km long or take up more than ten
sets of the Encyclopaedia Britannica or 750
megabytes of computer capacity. Thus, finding
all genes, mapping their position, and determining
their structure and function is an
enormous task (see Human Genome Project).
By comparison, the genomes of important model
organisms such as Drosophila, the nematode C.
elegans, yeast, and bacteria are much smaller.
The genomes of some important plants such as
maize, rice, and wheat are even larger (5000–
17000 Mb) than mammalian genomes.
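
The size comparisons for a mammalian genome can be checked with simple arithmetic. The sketch below is an illustration only: the two-bits-per-base encoding is an assumption (it is one encoding that reproduces the storage figure quoted above, since four bases fit into two bits), and a megabyte is taken as 10^6 bytes.

```python
GENOME_BP = 3 * 10**9  # mammalian genome size from the text, in base pairs


def size_in_mb(bp: int) -> float:
    """Genome size in megabases (1 Mb = 10^6 bp)."""
    return bp / 1_000_000


def text_length_km(bp: int, letter_width_mm: float = 1.0) -> float:
    """Length of the printed sequence if each base pair is one letter."""
    return bp * letter_width_mm / 1_000_000  # 1 km = 10^6 mm


def storage_megabytes(bp: int, bits_per_base: int = 2) -> float:
    """Storage needed at `bits_per_base` bits per base (assumed encoding)."""
    return bp * bits_per_base / 8 / 1_000_000


print(size_in_mb(GENOME_BP))         # 3000.0
print(text_length_km(GENOME_BP))     # 3000.0
print(storage_megabytes(GENOME_BP))  # 750.0
```
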
Since the size of DNA fragments that can be isolated
and multiplied in cloning vectors for
analysis is relatively small, a huge cloning
capacity is necessary for analysis of a large
genome. Yeast artificial chromosomes (YAC)
can accommodate about 1.4 Mb, bacterial artificial
chromosomes (BAC) about 0.5 Mb, whereas
bacteriophages and cosmids
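
To give a feeling for the cloning capacity involved, the sketch below estimates how many clones are needed for the YAC and BAC capacities quoted above. It is a back-of-the-envelope calculation: the `coverage` parameter is a hypothetical addition, and the default of 1 gives only the theoretical minimum with perfectly abutting inserts. A real library needs several-fold redundancy because inserts overlap at random.

```python
import math

GENOME_MB = 3000  # mammalian genome size from the text, in Mb

# Approximate insert capacities quoted in the text, in Mb.
VECTOR_CAPACITY_MB = {"YAC": 1.4, "BAC": 0.5}


def clones_needed(genome_mb: float, insert_mb: float, coverage: float = 1.0) -> int:
    """Number of clones whose total insert length is `coverage` genomes."""
    if insert_mb <= 0 or coverage <= 0:
        raise ValueError("insert size and coverage must be positive")
    return math.ceil(genome_mb * coverage / insert_mb)


for vector, capacity in VECTOR_CAPACITY_MB.items():
    print(vector, clones_needed(GENOME_MB, capacity))
# YAC 2143
# BAC 6000
```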

Range of resolution within the genome

The resolution ranges from a whole chromosome
or part of a chromosome isolated from a
somatic hybrid cell line (1) to the sequence of
the nucleotide pairs (5) and cloned DNA fragments
(2). Each fragment is characterized by
distinct landmarks (restriction sites or
sequence-tagged sites, STS, see Genome Maps).
They are aligned according to their contiguous
linear orientation in a contig (3), which can be
mapped (4). The individual clones can be
sequenced (sequence map, 5). This approach is
called a clone-by-clone approach in contrast to
“shotgun sequencing.”