



M.Tevfik Dorak, MD, PhD





Descriptive and comparative statistics

* Number of chromosome pairs: humans 23; gorilla 24; cattle 30; dog 39; mouse 20; goldfish 47; tobacco plants 24; peas 7; Drosophila 4; Parascaris (a nematode roundworm) 1; S.cerevisiae 16; Arabidopsis thaliana 5; hermit crab (Eupagurus) 127; some types of fern >250. In Muntiacus muntjac (a small SE Asian deer), the number of chromosomes differs between subspecies: the Chinese subspecies has a haploid number of 23 (like humans) but the Assam subspecies has only 3 pairs of chromosomes. In C.elegans (a nematode), the sexes differ in their chromosome numbers: the male has a single sex chromosome (X,O) and the hermaphrodite has two (X,X), resulting in a total of 11 chromosomes in males and 12 in hermaphrodites. Note that most plants do not have sex chromosomes.

* Chromosomes differ in size. The smallest human chromosome is chromosome 21 (50 Mb) and the largest is chromosome 1 (263 Mb). The small size of chromosome 21 is one reason why Down’s syndrome (trisomy 21) is the most common trisomy among live births: it is the most tolerable form of a poorly tolerated condition (trisomies are the most common chromosome abnormalities in spontaneous abortions). See Human Chromosome Maps.


Definitions (see Glossary)

The word ‘chromosome’ means colored body. The name reflects the capacity of chromosomes to take up histological stains more effectively than other cell structures. Chromosomes are usually (in the interphase) dispersed throughout the nucleus but become compacted during metaphase of cell division. This is the state in which chromosomes are usually depicted. At this stage, they have also been replicated into sister chromatids (the arms of the X shape). Sister chromatids are different from the pair of homologous chromosomes, which represents the chromosomes inherited from the father and the mother. The point where the two sister chromatids join is called the centromere, and the ends of chromosomes are called telomeres. Telomeres have important functions such as preventing end-to-end fusion of chromosomes, assisting with chromosome pairing in meiosis, and ensuring complete replication of chromosome extremities. The staining pattern of each chromosome is unique and helps to identify individual chromosomes (along with the size). The densely stained bands (with Giemsa) are called G-bands, which correspond to AT-rich segments of DNA. Lightly stained bands are R-bands that are GC-rich and transcriptionally more active. Another distinction concerns packaging: chromosomes are packaged into transcriptionally silent heterochromatin and transcriptionally active euchromatin.

Haploid (n) number is the number of chromosomes in gametes (23 in humans); diploid (2n) number is the number of chromosomes in somatic cells (46 in humans).

Chromosomes vary in shape; they may be metacentric (the arms are equal in size), submetacentric (the centromere is off center), or acrocentric (the centromere is close to one end).

Extra-chromosomal (cytoplasmic) DNA is still called "nucleic acid". Mitochondrial DNA (in mammals) is inherited only through the maternal lineage (see mtDNA).

Physical distance (kbp, Mbp) is the number of base pairs between two loci, whereas genetic distance (cM) is based on the recombination fraction between two loci. Generally 1 Mbp corresponds to 1 cM but this varies hugely depending on the part of the genome. The human genomic average is 0.89 cM per 1 Mbp.
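As a rough illustration of the relationship between physical and genetic distance, the sketch below converts a physical distance to an approximate genetic distance using the genome-average rate quoted above (0.89 cM per Mbp), and then converts centimorgans to an expected recombination fraction with Haldane's map function. The function names are hypothetical, Haldane's function is one of several possible map functions (it assumes no crossover interference), and the genome-average rate is only an assumption for any particular region.

```python
import math

# Genome-average rate quoted in the text; local rates vary hugely.
AVG_CM_PER_MBP = 0.89


def physical_to_genetic(mbp, cm_per_mbp=AVG_CM_PER_MBP):
    """Approximate genetic distance (cM) for a physical distance (Mbp)."""
    return mbp * cm_per_mbp


def haldane_recombination_fraction(cm):
    """Haldane's map function: r = 0.5 * (1 - exp(-2d)), with d in Morgans."""
    d = cm / 100.0  # centimorgans to Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * d))


if __name__ == "__main__":
    for mbp in (1, 10, 50):
        cm = physical_to_genetic(mbp)
        r = haldane_recombination_fraction(cm)
        print(f"{mbp} Mbp ~ {cm:.2f} cM, recombination fraction ~ {r:.4f}")
```

For short distances the recombination fraction is nearly equal to the map distance in Morgans; for long distances it levels off at 0.5, which corresponds to unlinked loci.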

Chromosomal aberrations may be structural or numerical (discussed in Clinical Genetics).

Cell division: mitosis (in somatic cells) and meiosis (in germ cells)

* Key points about meiosis: it halves the number of chromosomes per cell and it gives rise to new gene combinations (via crossing-over within the chromosomes and chromosomal re-assortment). In mitosis, two genetically identical daughter cells are formed (as in asexual reproduction).

Mendel's first principle, segregation, is the direct result of the separation of homologous chromosomes during anaphase I of meiosis. Mendel's second principle, independent assortment, occurs because each pair of homologous chromosomes lines up at the metaphase plate in meiosis I independently of all other pairs of homologous chromosomes. This results in a brand new mixture of chromosomes of paternal and maternal origin, each of which may have undergone rearrangement.
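To give a sense of scale for independent assortment, the following minimal sketch counts the chromosome combinations possible in a gamete when each homologous pair contributes either its paternal or its maternal member. It deliberately ignores crossing-over, which increases the diversity much further; the function name is hypothetical and the pair counts are taken from the list of chromosome numbers at the top of this page.

```python
def assortment_combinations(n_pairs):
    """Number of distinct paternal/maternal chromosome combinations in a
    gamete from independent assortment alone (crossing-over ignored)."""
    return 2 ** n_pairs


if __name__ == "__main__":
    for species, pairs in (("Drosophila", 4), ("peas", 7), ("humans", 23)):
        print(f"{species}: {assortment_combinations(pairs):,} combinations")
```

For humans this gives 2^23 = 8,388,608 combinations per gamete before recombination is taken into account.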


Sex chromosomes X and Y are the 23rd pair in humans. There are two Xs in females but only a single X in males, whereas the autosomal chromosomes are present in duplicate in both sexes. The presence of a single autosome (a monosomy) is invariably an embryonic lethal event but monosomy for the X chromosome is viable because of dosage compensation, which assures equality of expression of most X-linked genes in females and males. In mammals, the dosage compensation system involves silencing of most of the genes on one X chromosome; it is called X chromosome inactivation (Lyonisation). Divergent sex chromosome pairs are thought to have evolved from homologous autosomes. During evolution, the Y chromosome has retained little coding capacity, leaving the male with reduced gene dosage for many functions encoded by the X chromosome.


Human sex chromosomes have homologous regions at the tips of their short and long arms. These are called pseudoautosomal regions (PAR). PAR-1 is at the tip of the short arms, and PAR-2 is at the tip of the long arms. PAR-1 consists of about a quarter of Xp and almost all of Yp (2.6 Mbp). The smaller Xq/Yq pseudoautosomal region (PAR-2) is 320 kb. It is believed that this region was duplicated onto the Y chromosome (from X) during primate evolution as a terminal interchromosomal rearrangement. X-linked pseudoautosomal Hodgkin’s disease has a susceptibility locus within PAR-1, probably MIC2 encoding CD99. The blood group Xg(a), which behaves like an X-linked dominant trait, is also encoded within PAR-1. Polymorphism at the Xg locus and the Yg locus shows similar allele frequencies. This could be due to chance, to selection, or to recombination between the X and Y chromosomes (Burgoyne PS, 1982). The genes within PAR on the X chromosome are not subject to inactivation by Lyonisation. This escape from inactivation results in an equal dosage of expressed sequences between the X and Y chromosomes. Despite their morphological dissimilarity, human sex chromosomes also pair in male meiosis, and a single obligatory recombination event takes place in the short arm pseudoautosomal region (PAR-1). The crossover point is at variable locations but mostly in the terminal third of the Xp/Yp pairing segment. Recombination at male meiosis in the terminal regions of Xp and Yp is up to 20-fold higher than between the same regions of the X chromosomes during female meiosis. The overall recombination fraction per unit of physical distance within PAR is 3- to 70-fold greater than the genome-average rate (Lien S, 2000). Thus, in this region LD exists only in short (~3 kb) fragments (May CA, 2002). One consequence of the obligatory recombination within PAR-1 is that genes show only partial sex linkage and are passed equally to XX and XY offspring by male carriers.
Another consequence is that a mutation favourable in males but disadvantageous in females will increase in frequency on the Y chromosomes, while remaining rare on the X chromosomes, only if the recombination rate is smaller than the fitness advantage of the mutation. The high recombination activity of the pseudoautosomal region at male meiosis sometimes results in unequal crossover, which can generate various sex-reversal syndromes (such as XX male syndrome and maybe XY female type gonadal dysgenesis). The interleukin-9 receptor (IL9R) gene is located at Xq28 and Yq12 and was the first gene to be mapped to PAR-2. For reviews on PAR, see Rappold GA, 1993 and Meller & Kuroda, 2002.


Chromosome abnormalities

Chromosome abnormalities may be numerical (aneuploidy: monosomy or trisomy) or structural: deletion, inversion (pericentric or paracentric), translocation, duplication, isochromosome, ring chromosome etc. In general, detection of a structural anomaly in a child should trigger chromosome analysis of the parents to rule out a carrier state, but numerical anomalies are presumed to be due to sporadic cell division errors. A maternal age effect is seen in trisomies due to nondisjunction (whereas a paternal age effect is more relevant in conditions due to de novo point mutations and structural rearrangements; see Chandley, 1991 and Ballesta, 1999 for reviews of parental origins of de novo mutations, and Grimm, 1994 for an example). The risk of having offspring with a chromosomal anomaly for a parent with a balanced (pericentric fusion type) translocation (Robertsonian) depends on which parent is the carrier: if the mother has it, the risk is 8%; it is 4% when the father has it. Disorders caused by chromosomal deletions are clinically more severe than those caused by duplications. For monosomy X, a high proportion of cases show loss of the paternal sex chromosome. It seems likely that the error could arise at the pronuclear stage after sperm entry into the egg, rather than at meiosis in the male (Chandley, 1991).



Descriptive and comparative statistics

* All of the DNA in one cell measures about 1.7 m

* Estimated number of structural genes: humans 30,000; mouse 30,000 (all will be sequenced by 2005); Drosophila 13,600 (complete sequencing has been finished, see Science March 24, 2000). The yeast S.cerevisiae has 6,000, the bacterium E. coli has 4,377, and the nematode (roundworm) C.elegans has 19,000 genes.

* The number of genes on each chromosome shows a rough correlation with the physical size of the chromosomes. Chromosome 1 has the highest number (2776 known & 30 unknown genes); chromosome 21 has 367 known & 20 unknown genes. Male-specific Y-chromosome has only 322 genes.

* Total genome size: 3,000 Mbp in humans; 100 Mbp in C.elegans; 12.05 Mbp in S.cerevisiae; 4.64 Mbp in E.coli; 1.83 Mbp in another bacterium, Haemophilus influenzae (the first free-living organism to have its genome fully sequenced); 1.045 Mbp in the parasitic bacterium Chlamydia trachomatis; 130-140 Mbp in A. thaliana (see Genome Sizes).

* Mycoplasma genitalium has the smallest known genome capable of independent replication. It has 517 genes.

* The largest gene identified so far is the dystrophin gene (responsible for Duchenne’s muscular dystrophy (DMD)). It is 2.4 Mbp long, has 80 coding regions, and encodes only a 3,700 amino acid-long protein. Its large size is one reason why it has a very high mutation rate (see below). In comparison, the insulin gene is 1.43 kb with three coding regions and the final product is 51 amino acids long.
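The contrast between gene size and protein size can be made concrete with a little arithmetic. The sketch below estimates what fraction of each gene is needed to encode its product, assuming three nucleotides per amino acid and using only the figures quoted above. The function name is hypothetical, and because the insulin figure uses the 51 amino acid final product rather than the longer precursor, its true coding fraction is somewhat higher than this estimate.

```python
def coding_fraction(gene_length_bp, protein_length_aa):
    """Fraction of a gene occupied by protein-coding nucleotides,
    assuming 3 nucleotides per amino acid (stop codon ignored)."""
    return (3 * protein_length_aa) / gene_length_bp


# Figures quoted in the text
dystrophin = coding_fraction(2_400_000, 3_700)  # 2.4 Mbp gene, 3,700 aa protein
insulin = coding_fraction(1_430, 51)            # 1.43 kb gene, 51 aa final product

print(f"dystrophin: {dystrophin:.2%} of the gene is coding")
print(f"insulin:    {insulin:.2%} of the gene is coding")
```

On these numbers less than half a percent of the dystrophin gene is coding sequence, compared with roughly a tenth of the insulin gene.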

Definitions  (see Glossary)

Dominant, recessive, co-dominant, incomplete dominant.

Transcription, translation [central dogma of genetics; semi-conservative replication].

Intron, exon: introns end with the dinucleotide ApG [3' splice site / acceptor] and start with the dinucleotide GpT [5' splice site / donor].
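The splice-site rule above can be expressed as a simple check on an intron's DNA sequence. This is a minimal sketch with a hypothetical function name and made-up toy sequences; real splice-site recognition depends on longer consensus sequences and other signals, so passing this check is necessary but far from sufficient.

```python
def follows_gt_ag_rule(intron):
    """True if a DNA intron sequence starts with GT (5' donor site)
    and ends with AG (3' acceptor site)."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Hypothetical toy sequences, not real introns
print(follows_gt_ag_rule("GTAAGTCCTTTTCTCAG"))  # starts with GT, ends with AG
print(follows_gt_ag_rule("CTAAGTCCTTTTCTCAG"))  # donor site altered (GT -> CT)
```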

Untranslated regions (UTRs): These are the regions flanking translated part of a gene. They are transcribed (represented on mRNA) but not translated (do not exist in the peptide product). The 5' UTR is usually the initial part of exon 1 and 3' UTR is the latter half of the last exon.

Beadle and Tatum’s original 1941 hypothesis predicting one gene - one enzyme had to be revised first as one gene - one polypeptide; and finally one gene - multiple polypeptides. This is because alternative splicing can create multiple polypeptide products with differing activities. Other mechanisms that create more than one product from a single gene include overlapping genes and bidirectional genes (examples).

Wobble hypothesis - degeneracy / redundancy of the genetic code [for example, arginine, leucine and serine are each encoded by six different triplets].
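The degeneracy of the code can be verified by counting codons per amino acid. The sketch below builds the standard genetic code from the conventional compact ordering (bases in the order T, C, A, G, written as DNA codons) and tallies how many triplets encode each residue; the variable names are illustrative only, and `*` stands for a stop codon.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the conventional TCAG ordering (DNA codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons per amino acid (one-letter codes; '*' = stop)
codons_per_residue = Counter(CODON_TABLE.values())

for residue, count in sorted(codons_per_residue.items(), key=lambda kv: -kv[1]):
    print(residue, count)
```

Arginine (R), leucine (L) and serine (S) each have six codons, while methionine (M) and tryptophan (W) have only one each.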

Mutations, imprinting, penetrance (most patients with ankylosing spondylitis carry HLA-B27, but only a small percentage of HLA-B27 carriers in the population develop the disease, which is an example of low penetrance), mosaicism (Lyonisation, X inactivation), methylation, and epistasis are important concepts in understanding classic and nonclassic genetic phenomena.

Trinucleotide repeats and genetic anticipation (see Clinical Genetics).


Gene expression

a) Ubiquitous: Housekeeping genes, most metabolic enzymes, ribosomal proteins, actin, tubulin, HLA class I and beta-2 microglobulin.

b) Tissue-specific: Myoglobin, gamma-globulin, TCR, HLA class II, growth hormone and other hormones.

DNA is a negatively charged acidic molecule because of the phosphate groups. Each of the purine or pyrimidine bases is a nitrogenous base.

Terminology: ApT denotes adjacent nucleotides on the same strand (joined by a covalent phosphodiester bond), whereas AT denotes a base pair between the two strands (joined by hydrogen bonds).

Start codon: AUG codes for methionine. This does not necessarily mean that every mature polypeptide starts with Met, because most of the time it is removed by post-translational modification. Rarely the start codon is GUG, which normally encodes valine but is read as methionine when it initiates translation.

Stop codons: UAA, UAG and UGA do not code for an amino acid. Redundancy (or degeneracy) of the genetic code also applies to the stop signal, since all three triplets code for stop. Any mutation creating one of the stop codons is called a chain-terminating or nonsense mutation. Other types of mutations are silent (synonymous) mutations [the new triplet still codes for the same amino acid due to the redundancy of the genetic code], and non-synonymous ones: missense mutations and frameshift mutations (insertion or deletion of one or more nucleotides).
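The mutation classes named above can be illustrated with a small classifier. This is a minimal sketch, assuming the standard genetic code and single-codon substitutions; the function names are hypothetical, the "stop-loss" label for a mutated stop codon is not covered in the text, and the frameshift helper reflects the fact that only insertions or deletions whose length is not a multiple of three shift the reading frame.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon):
    """Translate one codon; accepts RNA (U) or DNA (T) spelling."""
    return CODON_TABLE[codon.upper().replace("U", "T")]


def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change as silent, nonsense, missense or stop-loss."""
    ref, alt = translate_codon(ref_codon), translate_codon(alt_codon)
    if ref == alt:
        return "silent"
    if alt == "*":
        return "nonsense"
    if ref == "*":
        return "stop-loss"  # label not used in the text
    return "missense"


def is_frameshift(indel_length):
    """An insertion or deletion shifts the frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


print(classify_substitution("CGA", "CGG"))  # Arg -> Arg
print(classify_substitution("CGA", "CAA"))  # Arg -> Gln
print(classify_substitution("CGA", "UGA"))  # Arg -> stop
```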

At the end of the transcription, the resultant mRNA contains leader sequence, coding region and a trailer sequence.

CpG dinucleotide islands are often located at 5' of genes. A common point mutation is a transition-type substitution between the two pyrimidines, C and T (35-50% of all point mutations). Cytosine, when linked to a guanine (CpG), is often methylated. 5-methyl-cytosine (C) is unstable and when deaminated yields thymine (T). CpG is therefore replaced by TpG. Such a mutation is shown as (C245T) or (245C>T), the number showing the position of the nucleotide change relative to the transcription initiation site (Human Gene Mutation Nomenclature, Mutations in Molecular Cell Biology).

The number of hydrogen bonds between G and C is three, but between A and T, it is two. High GC content makes the DNA more stable and gives it a higher melting point (Tm).
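The effect of base composition on stability can be sketched numerically. The code below computes GC content, counts Watson-Crick hydrogen bonds (three per G-C pair, two per A-T pair), and estimates the melting temperature with the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius. The Wallace rule is a rough approximation intended for short oligonucleotides only and is an assumption brought in for illustration; the function names and example sequences are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def hydrogen_bonds(seq):
    """Hydrogen bonds in the duplex formed by seq and its complement:
    3 per G-C pair, 2 per A-T pair."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 3 * gc + 2 * at


def wallace_tm(seq):
    """Wallace rule estimate of Tm (deg C) for short oligonucleotides."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 2 * at + 4 * gc


for oligo in ("ATATATATATAT", "ATGCATGCATGC", "GCGCGCGCGCGC"):
    print(oligo, f"GC={gc_content(oligo):.0%}",
          f"H-bonds={hydrogen_bonds(oligo)}", f"Tm~{wallace_tm(oligo)} C")
```

For oligonucleotides of equal length, the estimated Tm rises with GC content, in line with the extra hydrogen bond in each G-C pair.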

More on Gene Expression


DNA replication

Steps involved in DNA replication:

1. Identification of the origin of replication (not well-characterized in mammalian cells),

2. Unwinding of double stranded DNA to provide a single-stranded DNA template by helicase and topoisomerase,

3. Formation of the replication fork,

4. Initiation of DNA synthesis and elongation (by primase and DNA polymerase) during which single-strand binding proteins prevent premature reannealing of DNA,

5. Ligation of the newly synthesized DNA segments by ligase.

DNA replication proceeds from the 5' end to the 3' end, corresponding to the N-terminal to C-terminal direction of the subsequent protein. Translation of the mRNA is initiated by the interaction between eukaryotic initiation factor 4F (eIF4F; mRNA cap-binding protein) and the 7-methylguanosine (m7G) cap at the 5' end of the mRNA (Translation Initiation Book Chapter). Termination of translation is achieved by recognition of stop codons by termination/release factors (Translation Regulation Book Chapter).


Germline Gene Mutation Rates

Mutation rate is expressed as the number of new mutations per locus per generation; it is estimated from the incidence of new, sporadic cases of a fully penetrant autosomal dominant or X-linked disease such as achondroplasia. The new mutation rate ranges from 10^-4 to 10^-7 with a median of 10^-6. The factors influencing the mutation rate are gene size, mutational mechanism, and the presence of hotspots (methylated CpG dinucleotides). Because they are very large, the Duchenne’s muscular dystrophy (DMD) and neurofibromatosis genes have very high mutation rates (see for example Grimm, 1994). The very high mutation rate in achondroplasia is due to a hotspot causing the G380R (Gly380Arg) mutation (nucleotide 1138G>A) in fibroblast growth factor receptor-3. Other diseases with a high germline (de novo) mutation rate are: Rett syndrome; congenital adrenal hyperplasia; Rubinstein-Taybi syndrome; Marfan syndrome.
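The direct estimate described above can be written as a one-line calculation. For a fully penetrant autosomal dominant disease, each birth carries two copies of the locus, so the rate per locus per generation is the number of sporadic cases divided by twice the number of births. The sketch below uses a hypothetical function name and made-up counts purely for illustration; it does not cover the X-linked case, which needs a different denominator.

```python
def dominant_mutation_rate(sporadic_cases, births):
    """Direct estimate of the mutation rate per locus per generation for a
    fully penetrant autosomal dominant disease: cases / (2 * births).
    The factor 2 reflects the two copies of the locus in each newborn."""
    if births <= 0:
        raise ValueError("births must be positive")
    return sporadic_cases / (2 * births)


# Hypothetical counts, for illustration only
rate = dominant_mutation_rate(sporadic_cases=7, births=250_000)
print(f"estimated mutation rate: {rate:.1e} per locus per generation")
```

With these illustrative counts the estimate is 1.4 x 10^-5, which falls within the range quoted above.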


DNA repair

Mammalian DNA polymerase ε is capable of proofreading newly synthesized DNA. Both DNA polymerase ε and β (corresponding to E.coli DNA polymerase II) can repair DNA. Mechanisms of DNA repair:

1. Mismatch repair: copying errors (single base mismatching or two to five base unpaired loops) can be corrected by strand cutting, exonuclease digestion and replacement. Mutations of human mismatch repair genes MSH2, PMS1 and PMS2 are related to hereditary nonpolyposis colon cancer (HNPCC).

2. Base excision-repair: Spontaneous or induced point mutations can be corrected by base removal and replacement.

3. Nucleotide excision-repair: An approximately 30-nucleotide oligomer can be removed and replaced (cut-and-patch repair).

4. Double-strand break repair: Ionizing radiation, chemotherapy and oxidative free radicals are responsible for these breaks which can be repaired by unwinding, alignment and ligation.

Some aspect of the DNA repair mechanisms is deficient in the following inherited diseases: Xeroderma pigmentosum, ataxia-telangiectasia, Fanconi anemia, Bloom syndrome, and Cockayne syndrome.

See also DNA Repair Mechanisms in the Molecular Biology Web Book.


The most frequent single base alteration is deamination of cytosine to uracil. If the uracil is not removed before replication, this results in a C to T point mutation as well as a G to A point mutation on the other strand. Cytosine, when linked to guanine (CpG), is often methylated. 5-methyl-cytosine is unstable and when deaminated yields thymine directly (a C to T mutation). When this strand replicates, the residue opposite this T is an adenine instead of a guanine (G to A mutation). In general, transition type substitutions (between C and T, or G and A) are more common than transversion type substitutions (between a purine 'A/G' and a pyrimidine 'T/C'). A C to T transition within the MSH2 gene causing HNPCC is an example of this type of mutation with clinical relevance. See also Mitochondrial DNA.
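The distinction between the two substitution types can be stated compactly in code. In this minimal sketch (hypothetical function name), a substitution is a transition when both bases are purines or both are pyrimidines, and a transversion otherwise.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref, alt):
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in PURINES | PYRIMIDINES or alt not in PURINES | PYRIMIDINES:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical: not a substitution")
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    return "transversion"


# Enumerate all 12 possible single-base substitutions
pairs = [(r, a) for r in "ACGT" for a in "ACGT" if r != a]
n_transitions = sum(substitution_type(r, a) == "transition" for r, a in pairs)
n_transversions = len(pairs) - n_transitions
print(n_transitions, "transitions,", n_transversions, "transversions")
```

Only 4 of the 12 possible substitutions are transitions, so their higher observed frequency is not what equal rates would predict; it reflects mechanisms such as the deamination described above.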


20 Things You Didn’t Know About... DNA (Discovery Magazine 2011)


Gene Transcription in Molecular Biology Web Book


Molecular Structure of Genes and Chromosomes in Molecular Cell Biology


Wellcome Trust Human Genome: Chromosome Browser & Zoom in Your Genome


Genes & Chromosomes Lecture Notes    Chromosome Abnormalities    DNA From the Beginning


Chromosomal Abnormalities Tutorial: (1)  (2)    Chromosome Analysis by FISH


NCBI: Entrez-GENE   Map Viewer 




Last updated on 26 July 2013

