CHROMOSOMES and GENES
M.Tevfik Dorak, MD, PhD
Please update your bookmark: http://www.dorak.info/genetics/notes03.html
CHROMOSOMES
Descriptive and comparative statistics
* Number of chromosome pairs:
humans 23; gorilla 24; cattle 30; dog 39; mouse 20; goldfish 47; tobacco plants
24; peas 7; Drosophila 4; Parascaris (a nematode roundworm) 1; S.cerevisiae 16;
Arabidopsis thaliana 5; hermit crab (Eupagurus) 127; some types of fern
>250. In Muntiacus muntjac (a small SE Asian deer), the number of
chromosomes differs between subspecies: the Chinese subspecies has a haploid
number of 23 (like humans) but the Assam subspecies has only 3 pairs of
chromosomes. In C.elegans (a nematode), the sexes differ in their chromosome
numbers: the male has a single sex chromosome (X,O) whereas the hermaphrodite
has two (X,X), giving a total of 11 chromosomes in males and 12 in
hermaphrodites. Note that most plants do not have sex chromosomes.
* Chromosomes differ
in their sizes. The smallest human chromosome is chromosome 21 (50 Mb) and the
largest one is chromosome 1 (263 Mb). This is one reason why Down’s syndrome
(trisomy 21) is the most common trisomy among live births: it is the most
tolerable form of a poorly tolerated condition (trisomies are the most common
chromosome abnormalities in spontaneous abortions). See Human
Chromosome Maps.
Definitions (see Glossary)
The word ‘chromosome’ means colored body. This
naming is due to the capacity of chromosomes to take up histological stains
more effectively than other cell structures. Chromosomes are usually (in the
interphase) dispersed throughout the nucleus but become compacted during
metaphase of cell division. This is the state in which chromosomes are usually
depicted. At this stage, they have also been replicated into sister chromatids
(the arms of the X shape). This is different from the pair of homologous
chromosomes, which represents the chromosomes inherited from the father and the
mother. The point where the two sister chromatids join is called the centromere,
and the ends of chromosomes are called telomeres. Telomeres have important
functions such as
preventing end-to-end fusion of chromosomes, assisting with chromosome pairing
in meiosis, and ensuring complete replication of chromosome extremities. The
staining pattern of each chromosome is unique and helps to identify individual
chromosomes (along with the size). The densely stained bands (with Giemsa) are
called G-bands, which correspond to AT-rich segments of DNA. Lightly stained
bands are R-bands that are GC-rich and transcriptionally more active.
Haploid (n) number is the number of
chromosomes in gametes (23 in humans); diploid (2n) number is the
number of chromosomes in somatic cells (46 in humans).
Chromosomes vary in shape; they may be
metacentric (the arms are equal in size), submetacentric if the centromere is
off center, or acrocentric if the centromere is close to the end.
Extra-chromosomal (cytoplasmic) DNA is still
called "nucleic acid". Mitochondrial DNA (in mammals) is inherited
only through the maternal lineage (see mtDNA).
Physical distance (kbp, Mbp) is the number of
base pairs between two loci, whereas genetic distance (cM) is derived from the
recombination fraction between two loci (1 cM corresponds to about 1%
recombination). As a rule of thumb, 1 Mbp corresponds to 1 cM but this varies
hugely depending on the part of the genome. The human genomic average is 0.89
cM per 1 Mbp.
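The conversion between physical and genetic distance can be sketched as below. This is only a rough illustration: it applies the genome-wide average quoted above uniformly, and the function names are made up for this example. Real recombination rates differ greatly between regions.

```python
# Genome-wide human average quoted in the text; local rates vary hugely.
AVG_CM_PER_MBP = 0.89

def approx_genetic_distance_cm(physical_bp, cm_per_mbp=AVG_CM_PER_MBP):
    """Rough genetic distance (cM) for a physical distance given in base pairs."""
    return physical_bp / 1_000_000 * cm_per_mbp

def approx_recombination_fraction(cm):
    """For short distances, 1 cM corresponds to about 1% recombination."""
    return cm / 100

# Two hypothetical loci lying 2 Mbp apart.
distance_cm = approx_genetic_distance_cm(2_000_000)
print(f"{distance_cm:.2f} cM, recombination fraction about "
      f"{approx_recombination_fraction(distance_cm):.4f}")
```

The linear relation between cM and recombination fraction holds only for closely linked loci; over longer distances a map function would be needed.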
Chromosomal aberrations may be structural or
numerical (discussed in Clinical
Genetics).
Cell division: mitosis (in somatic cells) and
meiosis (in germ cells)
* Key points about meiosis: it halves the number of
chromosomes per cell and it gives rise to new gene combinations (via
crossing-over within the chromosomes and chromosomal re-assortment). In
mitosis, two genetically identical daughter cells are formed (as in asexual
reproduction).
Mendel's first principle, segregation, is the direct result of the
separation of homologous chromosomes during anaphase I of meiosis. Mendel's
second principle, independent assortment,
occurs because each pair of homologous chromosomes lines up at the metaphase
plate in meiosis I independently of all other pairs of homologous chromosomes.
This results in a brand new mixture of chromosomes of paternal and maternal
origin, each of which may have undergone rearrangement.
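The scale of independent assortment alone can be worked out with a short calculation. The sketch below assumes each homologous pair orients independently with two possible outcomes and ignores crossing-over, which adds further variation. The function name is illustrative.

```python
def assortment_combinations(haploid_number):
    """Number of distinct maternal/paternal chromosome sets a gamete can receive
    from independent assortment alone (crossing-over ignored)."""
    return 2 ** haploid_number

# Haploid numbers taken from the list of chromosome pairs earlier in these notes.
for species, n in [("Drosophila", 4), ("peas", 7), ("humans", 23)]:
    print(f"{species}: {assortment_combinations(n):,} combinations")
```

For humans this gives 2^23, or about 8.4 million combinations per parent, before recombination is taken into account.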
Sex chromosomes X and Y
are the 23rd pair in humans. There are two Xs in females but only a single X
in males, whereas the autosomal chromosomes are present in duplicate in both
sexes. The presence of a single autosome (a monosomy) is invariably an
embryonic lethal event but monosomy for the X chromosome is viable because of
dosage compensation, which assures equality of expression of most X-linked
genes in females and males. In mammals, the dosage compensation system involves
silencing of most of the genes on one X chromosome; it is called X chromosome
inactivation (Lyonisation). Divergent sex chromosome pairs are thought to have
evolved from homologous autosomes. During evolution, the Y chromosome has
retained little coding capacity, leaving the male with reduced gene dosage for
many functions encoded by the X chromosome.
Human sex
chromosomes have homologous regions at the tips of their short and long
arms. These are called pseudoautosomal
regions (PAR). PAR-1 is at the tip of the short arms, and PAR-2 is at the
tip of the long arms. PAR-1 spans about 2.6 Mbp at the tips of Xp and Yp.
The smaller Xq/Yq
pseudoautosomal region (PAR-2) is 320 kb. It is believed that this region was
duplicated onto the Y chromosome (from X) during primate evolution as a
terminal interchromosomal rearrangement. X-linked
pseudoautosomal Hodgkin’s disease has a susceptibility locus within PAR-1,
probably MIC2
encoding CD99.
The blood group Xg(a),
which behaves like an X-linked dominant trait, is also encoded within PAR-1.
Polymorphism at the Xg locus and the Yg locus shows similar allele frequencies.
This could be due to chance, to selection, or to recombination between the X
and Y chromosomes (Burgoyne PS, 1982). The genes within PAR on X chromosome are
not subject to inactivation by Lyonisation. This escape from inactivation
results in an equal dosage of expressed sequences between the X and Y
chromosomes. Despite their morphological dissimilarity, human sex chromosomes
also pair in male meiosis, and a single obligatory recombination event takes place
in the short arm pseudoautosomal region (PAR-1). The crossover point is at
variable locations but mostly in the terminal third of the Xp/Yp pairing
segment. Recombination at male meiosis in the terminal regions of Xp and Yp is
up to 20-fold higher than between the same regions of the X chromosomes during
female meiosis. The overall recombination fraction per unit of physical
distance within PAR is 3- to 70-fold greater than the genome-average rate (Lien
S, 2000). Thus, in this region linkage disequilibrium (LD) exists only in short (~3 kb) fragments (May
CA, 2002). The consequences of the obligatory recombination within PAR-1
are that genes show only partial sex linkage and are passed equally to XX and
XY offspring by male carriers. Another consequence is that a mutation
favourable in males but disadvantageous in females will increase in frequency
on the Y chromosomes, while remaining rare on the X chromosomes, only if the
recombination rate is smaller than the fitness advantage of the mutation. The
high recombination activity of the pseudoautosomal region at male meiosis
sometimes results in unequal crossover, which can generate various sex-reversal
syndromes (such as XX male
syndrome and maybe XY female type
gonadal dysgenesis). Interleukin-9 receptor (IL9R) gene
is located at Xq28 and Yq12 and was the first gene to be mapped to the PAR-2.
For reviews on PAR, see Rappold
GA, 1993 and Meller
& Kuroda, 2002.
Chromosome abnormalities
Chromosome abnormalities may be numerical (aneuploidy: monosomy or
trisomy) or structural: deletion, inversion (pericentric or paracentric),
translocation, duplication, isochromosome, ring chromosome etc. In general,
detection of a structural anomaly in a child should trigger chromosome analysis
of parents to rule out a carrier state but numerical anomalies are presumed to
be due to sporadic cell division errors. Maternal age effect is seen in
trisomies due to nondisjunction (whereas paternal age effect is more relevant
in conditions due to de novo point mutations and structural rearrangements; see Chandley,
1991 and Ballesta,
1999 for reviews of parental origins of de novo mutations, and Grimm,
1994 for an example).
Risk of having an offspring with a chromosomal anomaly for a parent with a
balanced (centric fusion type) Robertsonian translocation depends on
which parent is the carrier: if the mother has it, the risk is 8%; it is 4% when
the father has it. Disorders caused by chromosomal deletions are clinically
more severe than those caused by duplications. For monosomy X, a high
proportion of cases show loss of the paternal sex chromosome. It seems likely
that the error could arise at the pronuclear stage after sperm entry into the
egg, rather than at meiosis in the male (Chandley,
1991).
GENES
Descriptive and comparative statistics
* All of the DNA in
one cell measures about 1.7 m in length
* Estimated number
of structural genes: humans 30,000; mouse 30,000 (all will be sequenced by
2005); Drosophila 13,600 (complete sequencing has been finished, see Science
March 24, 2000). The yeast S.cerevisiae has 6,000, the bacterium E. coli has
4,377, and the nematode (roundworm) C.elegans has 19,000 genes.
* The number of
genes on each chromosome shows a rough correlation with the physical size of
the chromosomes. Chromosome
1 has the highest number (2776 known & 30 unknown genes); chromosome
21 has 367 known & 20 unknown genes. Male-specific Y-chromosome
has only 322 genes.
* Total genome size:
3,000 Mbp in humans; 100 Mbp in C.elegans; 12.05 Mbp in S.cerevisiae; 4.64 Mbp
in E.coli; 1.83 Mbp in another bacterium, Haemophilus influenzae (the first
free-living organism to have its genome fully sequenced); 1.045 Mbp in the
parasitic bacterium Chlamydia trachomatis; 130-140 Mbp in A. thaliana (see
Genome Sizes).
* Mycoplasma
genitalium has the smallest known genome capable of independent replication. It
has 517 genes.
* The largest gene
identified so far is the dystrophin gene (responsible for Duchenne’s
muscular dystrophy (DMD)). It is 2.4 Mbp long, has 80 coding regions and yet
encodes only a 3,700 amino acid-long protein. Its sheer size is one reason why
it has a very high mutation rate (see below). In comparison, the insulin gene
is 1.43 kb with three coding regions and the final product is 51 amino acids
long.
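The figures in this list can be compared with some simple arithmetic. The sketch below uses only numbers quoted above. The helper names are illustrative, and `product_fraction` is a crude measure that assumes three nucleotides per amino acid of the final product and ignores everything else (signal peptides, untranslated regions and so on).

```python
# Figures quoted in the list above: (estimated genes, genome size in Mbp).
GENOMES = {
    "human": (30_000, 3_000),
    "C. elegans": (19_000, 100),
    "S. cerevisiae": (6_000, 12.05),
    "E. coli": (4_377, 4.64),
}

def gene_density(genes, genome_mbp):
    """Average number of genes per Mbp of genome."""
    return genes / genome_mbp

def product_fraction(amino_acids, gene_length_bp):
    """Share of a gene's length represented in its final protein product."""
    return amino_acids * 3 / gene_length_bp

for name, (genes, mbp) in GENOMES.items():
    print(f"{name}: about {gene_density(genes, mbp):.0f} genes per Mbp")

print(f"dystrophin: {product_fraction(3_700, 2_400_000):.2%} of the gene")
print(f"insulin: {product_fraction(51, 1_430):.2%} of the gene")
```

The comparison shows how much more sparsely genes are spread in the human genome than in the smaller genomes, and how small a share of the dystrophin gene ends up in the protein.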
Definitions (see Glossary)
Dominant, recessive,
co-dominant, incomplete dominant.
Transcription,
translation [central dogma of genetics; semi-conservative replication].
Intron, exon:
introns end with the dinucleotide ApG [3' splice site / acceptor] and start
with the dinucleotide GpT [5' splice site / donor].
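The splice-site rule just described can be checked on a candidate intron sequence. This is a minimal sketch: the function name and the example sequences are made up, and real splice-site recognition depends on much more than the two terminal dinucleotides.

```python
def follows_gt_ag_rule(intron):
    """True if the sequence starts with GT (donor) and ends with AG (acceptor).
    RNA input is accepted by treating U as T."""
    seq = intron.upper().replace("U", "T")
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")

# Hypothetical intron sequences, for illustration only.
print(follows_gt_ag_rule("GTAAGTCCTTAG"))
print(follows_gt_ag_rule("ATAAGTCCTTAC"))
```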
Untranslated regions
(UTRs): These are the regions flanking the translated part of a gene. They are
transcribed (represented on mRNA) but not translated (do not exist in the
peptide product). The 5' UTR is usually the initial part of exon 1 and 3' UTR
is the latter half of the last exon.
Beadle
and Tatum’s original 1941 hypothesis predicting one gene - one enzyme had
to be revised first as one gene - one polypeptide; and finally one gene -
multiple polypeptides. This is because alternative splicing can create multiple
polypeptide products with differing activities. Other mechanisms that create
more than one product from a single gene include overlapping genes and
bidirectional genes (examples).
Wobble hypothesis -
degeneracy / redundancy of the genetic code [for example, arginine, leucine and
serine are each encoded by six different triplets].
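The degeneracy of the code can be counted directly from the standard codon table. The sketch below builds the table in the conventional T, C, A, G order, written with DNA letters (T in place of U); `*` marks a stop codon. The variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (TTT, TTC, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons assigned to each amino acid (one-letter codes).
degeneracy = Counter(CODON_TABLE.values())

for aa in "RLSMW*":
    print(aa, degeneracy[aa])
```

Arginine (R), leucine (L) and serine (S) each come out at six codons, methionine (M) and tryptophan (W) at one, and the stop signal at three.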
Mutations,
imprinting, penetrance (most patients with ankylosing spondylitis carry
HLA-B27, but only a small minority of HLA-B27-positive individuals develop the
disease, which is an example of low penetrance), mosaicism (Lyonisation, X
inactivation), methylation and epistasis are important concepts in
understanding classic and nonclassic genetic phenomena.
Trinucleotide
repeats and genetic anticipation (see Clinical Genetics).
Gene expression
a) Ubiquitous: Housekeeping genes, most
metabolic enzymes, ribosomal proteins, actin, tubulin, HLA class I and beta-2
microglobulin.
b) Tissue-specific: Myoglobin, gamma-globulin,
TCR, HLA class II, growth hormone and other hormones.
DNA is a
negatively charged acidic molecule because of the phosphate groups. Each
of the purine or pyrimidine bases is a nitrogenous base.
Terminology: ApT (phosphodiester -
covalent bond) on the same strand vs AT (hydrogen bond) the base
pair on different strands
Start codon: AUG
codes for methionine. It does not necessarily mean that each mature polypeptide
starts with Met because most of the time it is removed by post-translational
modification. Rarely, GUG (which otherwise encodes valine) serves as the start
codon.
Stop codons: UAA,
UAG or UGA do not code for an amino acid. Redundancy (or degeneracy) of the
genetic code also applies to the stop signal: UAA, UAG and UGA all code for
stop. Any mutation creating one of these triplets is called a
chain-terminating or nonsense mutation. Other types of mutations are silent
(synonymous) mutations
[the new triplet still codes for the same amino acid due to the redundancy of
the genetic code], and non-synonymous ones: missense mutations and frameshift
mutations (insertion or deletion of a number of nucleotides that is not a
multiple of three).
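The distinction between silent, missense and nonsense mutations can be illustrated by classifying a single-base substitution within one codon. This is a sketch only: the function name and its arguments are made up, the standard codon table is written with DNA letters, and frameshifts are not handled because they involve insertions or deletions instead of substitutions.

```python
from itertools import product

# Standard genetic code in T, C, A, G order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_substitution(codon, position, new_base):
    """Classify replacing the base at `position` (0, 1 or 2) of `codon`."""
    codon = codon.upper().replace("U", "T")
    new_base = new_base.upper().replace("U", "T")
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "silent"
    if after == "*":
        return "nonsense"
    if before == "*":
        return "stop lost"  # a stop codon changed into a sense codon
    return "missense"

print(classify_substitution("CTT", 2, "C"))  # Leu codon, third base changed
print(classify_substitution("TGG", 2, "A"))  # Trp codon changed to TGA
print(classify_substitution("GGG", 0, "A"))  # Gly codon changed to an Arg codon
```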
At the end of the transcription, the resultant
mRNA contains leader sequence, coding region and a trailer sequence.
CpG dinucleotide islands are often located at the 5'
end of genes. A common point mutation is a transition-type substitution between the
two pyrimidines, C and T (35-50% of all point mutations). Cytosine, when linked
to a guanine (CpG), is often methylated. 5-methyl-cytosine (C) is unstable and
when deaminated yields thymine (T). CpG is therefore replaced by TpG. Such a
mutation is shown as (C245T) or (245C>T), the number showing the position of
the nucleotide change relative to the transcription initiation site (Human Gene Mutation
Nomenclature, Mutations
in Molecular
Cell Biology).
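The CpG to TpG change described above can be mimicked on a sequence string. The sketch below is illustrative only: the function names and the example sequence are hypothetical, positions are 0-based indexes into that string, and methylation status is not modelled.

```python
def cpg_positions(seq):
    """0-based indexes of every C that is followed by G on the same strand."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]

def deaminate_cpg(seq, index):
    """Return the sequence after a C to T transition at a CpG site (CpG becomes TpG)."""
    seq = seq.upper()
    if seq[index:index + 2] != "CG":
        raise ValueError("index does not point at the C of a CpG dinucleotide")
    return seq[:index] + "T" + seq[index + 1:]

example = "ATCGGCGT"  # made-up sequence
sites = cpg_positions(example)
print(sites)
print(deaminate_cpg(example, sites[0]))
```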
The number of hydrogen bonds between G and C is
three, but between A and T, it is two. High GC content makes the DNA more
stable and gives it a higher melting point (Tm).
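The link between base composition and stability can be put into numbers. The sketch below counts hydrogen bonds from the rule stated above. It also estimates Tm with the Wallace rule (2 °C per A/T pair plus 4 °C per G/C pair), which is an assumption brought in for illustration and not something these notes specify; it is only a rough guide for short oligonucleotides.

```python
def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def hydrogen_bonds(seq):
    """Hydrogen bonds in the duplex formed by `seq` and its complement:
    three per G/C pair, two per A/T pair."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 3 * gc + 2 * at

def wallace_tm(seq):
    """Rough melting temperature in degrees C (Wallace rule, short oligos only)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 2 * at + 4 * gc

# Two made-up 12-mers with different GC content.
for oligo in ("ATATTAATTATA", "GCGGCCGCGGCC"):
    print(oligo, gc_content(oligo), hydrogen_bonds(oligo), wallace_tm(oligo))
```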
Steps involved in DNA replication:
1. Identification of the origin of replication
(not well-characterized in mammalian cells),
2. Unwinding of double stranded DNA to provide a
single-stranded DNA template by helicase and topoisomerase,
3. Formation of the replication fork,
4. Initiation of DNA synthesis and elongation
(by primase and DNA polymerase) during which single-strand binding proteins
prevent premature reannealing of DNA,
5. Ligation of the newly synthesized DNA
segments by ligase.
DNA synthesis proceeds in the 5' to 3' direction. The mRNA is likewise read
from its 5' end to its 3' end, which corresponds to the N-terminal to
C-terminal direction of the resulting protein.
Translation of the mRNA is initiated by the interaction between eukaryotic
initiation factor 4F (eIF4F; mRNA cap-binding protein) and the 7-methylguanosine
(m7G)
cap at the 5' end of the mRNA (Translation
Initiation Book Chapter). Termination of translation is achieved by
recognition of stop codons by termination/release factor (Translation
Regulation Book Chapter).
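The 5' to 3' reading of an mRNA and its termination at a stop codon can be sketched as follows. This is a deliberate simplification: it starts at the first AUG it finds and ignores cap binding, initiation factors and sequence context. The function name and the example mRNA are made up.

```python
from itertools import product

# Standard genetic code in T, C, A, G order (DNA letters); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Translate from the first AUG, reading 5' to 3', until a stop codon.
    The returned one-letter protein string runs from N-terminus to C-terminus."""
    seq = mrna.upper().replace("U", "T")
    start = seq.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical mRNA: short leader, AUG, two codons, a stop codon, short trailer.
print(translate("GGAUGGCUUGGUAAGG"))
```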
Germline Gene
Mutation Rates
Mutation rate is expressed as the number of new
mutations per locus per generation; it is estimated as the incidence of new,
sporadic cases of an autosomal dominant or X-linked disease that is fully
penetrant such as achondroplasia.
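For a fully penetrant autosomal dominant condition, this direct estimate is commonly computed as the number of sporadic cases divided by twice the number of births, since each birth carries two copies of the locus. The sketch below assumes that formula, and the case counts in it are invented purely for illustration.

```python
def direct_mutation_rate(sporadic_cases, births):
    """Direct estimate of the mutation rate per locus per generation for a fully
    penetrant autosomal dominant trait: each birth contributes two alleles."""
    if births <= 0:
        raise ValueError("births must be positive")
    return sporadic_cases / (2 * births)

# Hypothetical survey: 7 sporadic cases among 250,000 births.
rate = direct_mutation_rate(7, 250_000)
print(f"{rate:.1e} mutations per locus per generation")
```

An X-linked condition would need a different denominator, because males and females carry different numbers of X chromosomes.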
The new mutation rate ranges between 10^-4 and 10^-7 with a
median of 10^-6. The factors influencing the mutation rate are gene
size, mutational mechanism and the presence of hotspots (methylated CpG
dinucleotides).
Being very large, the Duchenne’s
muscular dystrophy (DMD) and neurofibromatosis
genes have very high mutation rates (see for example Grimm,
1994). The very high mutation rate in achondroplasia
is due to a hotspot causing the G380R (Gly380Arg) mutation (nucleotide 1138G>A) in fibroblast
growth factor receptor-3. Other diseases with a high germline (de novo)
mutation rate are: Rett syndrome; congenital
adrenal hyperplasia; Rubinstein-Taybi
syndrome; Marfan
syndrome.
Mammalian DNA polymerase ε is
capable of proofreading newly synthesized DNA. Both DNA polymerase ε and β
(corresponding to E.coli DNA polymerase II) can repair DNA. Mechanisms of DNA
repair:
1. Mismatch repair: copying errors (single base
mismatching or two to five base unpaired loops) can be corrected by strand
cutting, exonuclease digestion and replacement. Mutations of human mismatch repair
genes MSH2, PMS1 and PMS2 are
related to hereditary nonpolyposis colon cancer (HNPCC).
2. Base excision-repair: Spontaneous or induced
point mutations can be corrected by base removal and replacement.
3. Nucleotide excision-repair: An approximately
30-nucleotide oligomer can be removed and replaced (cut-and-patch repair).
4. Double-strand break repair: Ionizing
radiation, chemotherapy and oxidative free radicals are responsible for these
breaks which can be repaired by unwinding, alignment and ligation.
Some aspect of the DNA repair mechanisms is deficient in the following inherited
diseases: Xeroderma pigmentosum, ataxia-telangiectasia, Fanconi anemia, Bloom
syndrome, and Cockayne syndrome.
See also DNA Repair Mechanisms
in the Molecular
Biology Web Book.
The most frequent single base alteration is
deamination of cytosine to uracil. Unless the uracil is removed by repair, this
results in a C to T point mutation as well as a G to A point mutation on the
other strand. Cytosine, when
linked to guanine (CpG), is often methylated. 5-methyl-cytosine is unstable and
when deaminated yields thymine directly (a C to T
mutation). When this strand replicates, at the residue corresponding to this T,
there is now an adenine instead of guanine (a G to A mutation). In general,
transition type substitutions (between C and T, or G and A) are more common
than transversion type substitutions (between a purine 'A/G' and a pyrimidine
'T/C'). A C to T transition within the MSH2 gene
causing HNPCC is an example of this type of mutation with clinical relevance.
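The transition/transversion distinction reduces to whether both bases belong to the same chemical class. A minimal sketch, with an illustrative function name:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_type(ref, alt):
    """Return 'transition' (purine to purine, or pyrimidine to pyrimidine)
    or 'transversion' (purine to pyrimidine, or the reverse)."""
    ref = ref.upper().replace("U", "T")
    alt = alt.upper().replace("U", "T")
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError("need two different bases out of A, C, G, T")
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"

for ref, alt in [("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]:
    print(f"{ref}>{alt}: {substitution_type(ref, alt)}")
```

Of the twelve possible substitutions, only four are transitions, which makes their higher observed frequency all the more striking.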
See also Mitochondrial DNA.
Last updated on 13 January 2007