POSSIBLE MISUNDERSTANDINGS AND
MISCONCEPTS IN GENETICS
M.Tevfik Dorak, MD, PhD
Please update your
bookmark: http://www.dorak.info/genetics/misund.html
See also Misconceptions
About Evolution: Berkeley,
TalkOrigins and Wikipedia
* The OMIM
(Online Mendelian Inheritance in Man) database lists non-Mendelian disorders as
well. Likewise, NCBI-dbSNP
records all kinds of base changes, not just SNPs.
* You would be excused if you
thought that SNPs account for all kinds of variation in the genome, or that no
variation in the genome was known before the term SNP was first used. SNPs may
be the most common type of variation, but there are many other kinds, most
importantly an extensive array of structural variation (see
Feuk, 2006)
and copy number variation (see Redon,
2006).
* Mutation involves any change in
the hereditary material, from a point mutation to a chromosomal loss. To have a
functional consequence, a mutation does not have to be in the coding region. An
intronic mutation may well result in a non-functional gene (like the splice-site
mutation in CYP21A2).
* Even 1M SNP chips may not be able
to cover the whole genome. One reason is that genes that need to be selectively
amplified first (because of a duplicated copy or a pseudogene) will not be
represented at all. Highly polymorphic regions (such as HLA genes) are not
represented either, because the lack of conserved regions flanking the variants
makes primer design difficult.
* Many authors use the term mutation
for any rare allele (<1%) and the term polymorphism for any common allele
(>1%). This is one definition of mutation and polymorphism. The other is
that a ‘mutation’ is any variation in the gene that causes an obvious change in
phenotype, whereas polymorphisms cause no obvious phenotypic change.
It is best to be aware of both definitions while following the
recommendations of the Human
Genome Variation Society and using 'sequence variant', 'alteration' or 'allelic
variant' for any genomic change
regardless of its frequency or phenotypic effects.
* Chromosomes do not usually have the shape
in which they are most frequently illustrated. Those figures depict a
metacentric 'replicated' chromosome (two sister chromatids joined at the centromere).
* Expressivity and penetrance are
very different concepts. Expressivity is the variation in the expression of a
trait or a disease (phenotypic heterogeneity). Gaucher disease and
neurofibromatosis are examples of variable expressivity. Penetrance refers to
the frequency with which a genotype is expressed, regardless of the severity of the phenotype.
Low-penetrance genotypes are expressed in only a small proportion of the
individuals bearing them (as in acute intermittent porphyria). This expression
may still vary in its clinical severity (this is expressivity) (see a review
on penetrance and expressivity by Zlotogora J, 2003 in Genetics in Medicine).
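The distinction can be made concrete with a small Python sketch. The cohort below is hypothetical (ten carriers of the same genotype, with `None` marking an unaffected carrier) and the function names are illustrative only: penetrance is a single proportion, while expressivity is the spread of severity among those who are affected.

```python
from collections import Counter

def penetrance(carrier_phenotypes):
    """Proportion of carriers of a genotype who express the phenotype at all.

    Each entry is one carrier: None = unaffected, otherwise a severity label
    (a hypothetical encoding used only for this illustration).
    """
    affected = [p for p in carrier_phenotypes if p is not None]
    return len(affected) / len(carrier_phenotypes)

def expressivity(carrier_phenotypes):
    """Distribution of severity among the affected carriers only."""
    return Counter(p for p in carrier_phenotypes if p is not None)

# Hypothetical cohort: 10 carriers of the same genotype
carriers = [None, None, "mild", None, None, "severe", None, None, None, None]
print(penetrance(carriers))    # 0.2
print(expressivity(carriers))
```

A low-penetrance genotype gives a small first number; variable expressivity shows up as more than one severity class in the second.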
* If different alleles of a locus
cause the same disease, this is called 'allelic heterogeneity', but if they
cause different diseases/phenotypes, this is also called 'allelic heterogeneity'.
The intended meaning of this term should therefore always be made clear.
* Genetic heterogeneity and locus
heterogeneity are used interchangeably in practice, but the distinction deserves attention.
'Locus heterogeneity' is used only for the involvement of different loci, each of which
can individually cause a disease/phenotype (as in early-onset Alzheimer
disease, where any of three different genes may cause the same phenotype). 'Genetic
heterogeneity' may also be used to mean a combined effect of different loci in
the development of a (complex) disease (as in diabetes, where multiple loci are
'simultaneously' involved in the development of the disease).
* There is a difference between a
trait being influenced by genes and trait variation being influenced by genetic
variation. See
Terwilliger & Weiss, 2003.
* Syntenic means two different
things in different contexts. Syntenic genes are described as "genes
thought to reside on the same chromosome" in the Dictionary of Genetics by
King & Stansfield (see for example a Lecture
Note on Linkage & Recombination), whereas syntenic maps are described as "genetic loci
that lie in the same order on the same chromosome in different species" in the
Dictionary of Biological Terms by E Lawrence (see for example, NCBI Human-Mouse conserved synteny maps
or Human chromosome 22 and
syntenic Mouse chromosome 16 maps). Both are correct, but this dual usage may cause
misunderstandings. Lee Silver explains both in the same entry in the Encyclopedia
of Genetics: synteny describes two or more genes or loci that have been
mapped to the same linkage group. Conserved synteny refers to the situation
where two linked loci in one species have homologs that are also linked in
another species.
* At the genomic level, it is often
quoted that humans and chimpanzees show 99.8% similarity in certain coding regions. Remember that:
1. A single nucleotide difference (out of 3 billion) may cause lethal diseases [sickle
cell disease, hereditary hemochromatosis, etc.].
2. Just one gene of the Y chromosome, SRY, is responsible for almost all of the
difference between a male and a female.
3. On the other hand, at the sequence level any two humans differ from
each other by 0.1%. This corresponds to 3 million nucleotide differences. It is
not the number of differences but their nature and location that
matter. And most importantly:
4. Any conclusion drawn only from linear DNA sequence comparison ignores the
effects of epigenetic differences and of posttranscriptional and posttranslational
events.
(See a discussion in Scientific American; Marks J: What It Means To Be 98% Chimpanzee. University of California Press, 2002; Diamond J: The Third Chimpanzee. Perennial, 1994; What Makes us Humans? in Human Molecular Genetics; and Science Magazine Breakthrough of the Year 2005: Evolution in Action.)
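The arithmetic behind point 3 can be checked with a few lines of Python. The 3 billion genome size and the 0.1% figure come from the text; the helper names are only for illustration.

```python
GENOME_SIZE = 3_000_000_000  # approximate haploid human genome size (from the text)

def differences_from_percent(percent_difference, genome_size=GENOME_SIZE):
    """Number of nucleotide differences implied by a percentage difference."""
    return percent_difference / 100 * genome_size

def percent_from_differences(n_differences, genome_size=GENOME_SIZE):
    """Percentage difference implied by a count of nucleotide differences."""
    return 100 * n_differences / genome_size

print(differences_from_percent(0.1))  # two humans: about 3 million differences
print(percent_from_differences(1))    # a single disease-causing substitution
```

The second call shows why percentages hide what matters: one substitution is about 0.00000003% of the genome, yet a single substitution is enough to cause sickle cell disease.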
* Humans are not descendants of the living apes. Humans and chimpanzees share a
common ancestor that lived about 5-7 million years ago and is now extinct. This is similar to
saying that modern humans are not descendants of Neanderthals but shared with them a common
ancestor that lived about half a million years ago.
* Different species of humans have been recognized (H.
erectus, H. habilis, etc.). This does not mean that those ‘species’ that lived
at the same time could not interbreed. This operational classification is
based on structural differences and not on genetic isolation.
* Mendel's experiments were on characters determined
by single genes. Multiple genes and environmental factors that interact with
one another determine most characters. Quantitative genetics deals with such
characters or complex diseases. See also ‘Some apparent
exceptions to Mendelian rules’.
* A quantitative trait locus (QTL) is involved in the expression
of a continuous character, like weight or height, rather than a countable one.
* Dominance and recessiveness are features of
phenotypes, not of genes. It is common to call genes dominant or recessive,
but strictly speaking, this is wrong.
* Dominance models in genetic epidemiology refer to
associations with carrying at least one copy of an allele (heterozygotes grouped with
homozygotes), and ‘dominance’ here is very different from ‘dominance’ as used
in classical genetics.
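A minimal Python sketch of how genotypes are commonly coded under the dominant, recessive and additive (per-allele) models in association analyses. The model names and the 0/1/2 coding follow common practice and are assumptions here, not any specific package's API; the point is that the 'dominant' model is only a grouping of genotypes (carriers vs non-carriers), not a statement about phenotypic dominance.

```python
def code_genotype(risk_allele_count, model):
    """Code a genotype (0, 1 or 2 copies of the allele) for a given model.

    'dominant'  : carriers (1 or 2 copies) vs non-carriers (marker positivity)
    'recessive' : homozygotes (2 copies) vs everyone else
    'additive'  : number of copies (per-allele coding)
    """
    if risk_allele_count not in (0, 1, 2):
        raise ValueError("expected 0, 1 or 2 copies")
    if model == "dominant":
        return int(risk_allele_count >= 1)
    if model == "recessive":
        return int(risk_allele_count == 2)
    if model == "additive":
        return risk_allele_count
    raise ValueError(f"unknown model: {model}")

for model in ("dominant", "recessive", "additive"):
    print(model, [code_genotype(g, model) for g in (0, 1, 2)])
```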
* DNA and RNA are traditionally called nucleic acids. The
so-called 'nucleic' acids include extra-nuclear (cytoplasmic) DNA and tRNA/rRNA
too. In other words, mitochondrial DNA is also a nucleic acid but it is not in
the nucleus. Similarly, viral DNA or RNA are nucleic acids but not enclosed in
a nucleus.
* DNA is deoxyribonucleic acid composed of four nitrogenous bases (A, T, G, C),
deoxyribose and acidic phosphate groups. It is the phosphate group that
makes it acidic.
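* The start codon AUG codes for methionine (the alternative start codon GUG, which codes
for valine elsewhere in a coding sequence, is also read as methionine when it initiates translation).
This does not mean that every mature polypeptide starts with Met, because the initiator
methionine is frequently removed by post-translational processing.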
* Genes are said to be transcribed 5' to 3'. This is
the direction on the coding strand, but it is actually the non-coding (template)
strand that is transcribed, and it is read 3' to 5'. The resulting
mRNA is made 5' to 3'.
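The strand directions can be illustrated with a short Python sketch using a hypothetical nine-base sequence. Reading the template strand in its written 3' to 5' order and pairing each base gives the mRNA 5' to 3', which matches the coding strand with U in place of T.

```python
# Base pairing from a DNA template base to the RNA base incorporated opposite it
COMPLEMENT_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5):
    """Build mRNA (written 5'->3') from a DNA template strand written 3'->5'.

    RNA polymerase reads the template 3'->5' and synthesizes RNA 5'->3',
    so pairing the template bases in their written order gives the mRNA.
    """
    return "".join(COMPLEMENT_RNA[base] for base in template_3_to_5)

coding_5_to_3 = "ATGGCCTTT"    # hypothetical coding (sense) strand, 5'->3'
template_3_to_5 = "TACCGGAAA"  # its complementary template strand, 3'->5'

mrna = transcribe(template_3_to_5)
print(mrna)                                     # AUGGCCUUU
print(mrna == coding_5_to_3.replace("T", "U"))  # True
```

Because the mRNA has the coding-strand sequence, an antisense molecule has to be complementary to the mRNA (and so to the coding strand), not identical to it.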
* Antisense treatment is targeted against the mRNA
(which is always a sense strand), not against the sense (coding) strand of
DNA.
* Beadle’s one gene-one enzyme/protein concept is
essentially correct but no longer strictly valid. Alternative splicing,
overlapping genes, posttranslational modifications and other mechanisms create
more than one protein product from a given sequence in the genome. This is
similar to the fact that the same gene may cause more than one distinct clinical
syndrome (due to different mutations) (for more on this, see Clinical Genetics). One reason the
estimated number of genes in the human genome was initially so high (around 100
thousand) was a strict interpretation of the one gene-one protein concept.
* The number 30-35 thousand for functional genes in
human genome is only for genes identified structurally by conventional
criteria. It does not include the non-conventional genes such as small RNAs,
transcribed/processed pseudogenes, alternatively spliced versions and some of
the overlapping genes. The total number of proteins encoded by the human genome is many
times the number of structurally recognized genes.
* Confusion may arise when two different numbers are
quoted for the number of genes in a genome. One has to be specific about what
is meant: the total number of genes (loci) and the number of protein-expressing
genes in a genome are different. Similarly, the number of polymorphic markers
(millions) far exceeds the number of genes (30-35 thousand
in the human genome).
* Pseudogenes can be transcribed (examples are CYP21A1P
and DRB4-null).
Although not always translated into a protein product, a pseudogene can be
transcribed into an RNA product, and this can be involved in the regulation of gene
expression (for an example, see Hirotsune
et al, 2003). This is similar to the transcribed but untranslated parts of a
conventional gene. Pseudogenes and processed
pseudogenes should also be distinguished (for more, see http://pseudogene.org).
* DNA is not the blueprint for life. It can
be said that DNA contains the biochemical instructions a living organism
will need. Think about temperature-dependent sex determination in many reptiles, or the penetrance issue
for a cancer susceptibility gene. Epigenetic variation creates differences
between individuals bearing the same DNA sequence (see the chimpanzee vs.
human sequence similarity discussion above).
* Leader sequence and signal sequence are used
interchangeably as if they were the same thing. Although this is common
practice, strictly speaking, the leader sequence is transcribed but not
translated, and it leads the mRNA to the ribosomes; the signal sequence is
translated and helps the protein reach its final destination (which may be
outside the cell), and it is usually cleaved off in the process.
* Another unfortunate common practice is using
allele/gene/antigen/phenotype/marker frequency interchangeably. For a brief
discussion of their exact definition, see: Statistical
Analysis of HLA and Disease Associations. One thing that is more than just
careless usage is the use of ‘carrier / carriage frequency’ instead of marker
frequency. Carrier frequency has a specific meaning in genetics (as in carrier
frequency for thalassaemia trait) and should not be used to describe the
proportion of tested subjects positive for a marker (heterozygous and
homozygous combined) when marker frequency is the more appropriate term (for
reference, see Svejgaard
& Ryder, 1994).
* A huge and unforgivable mistake is to compare allele
(gene) frequencies (corresponding to multiplicative model) with marker (allele
positivity) frequencies (corresponding to dominant model) in association
studies (and usually to find a very significant association!). This is not as
rare as one might think (or hope).
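A minimal Python sketch of why the two frequencies must not be mixed. The genotype counts are hypothetical (100 people in Hardy-Weinberg proportions for an allele frequency of 0.2): the same group gives an allele frequency of 0.2 but a marker frequency of 0.36, so comparing one group's allele frequency with another group's marker frequency would suggest a difference where none exists.

```python
def allele_frequency(n_hom, n_het, n_neg):
    """Allele (gene) frequency: allele copies / all chromosomes (2 per person)."""
    n = n_hom + n_het + n_neg
    return (2 * n_hom + n_het) / (2 * n)

def marker_frequency(n_hom, n_het, n_neg):
    """Marker (allele positivity) frequency: people with >= 1 copy / all people."""
    n = n_hom + n_het + n_neg
    return (n_hom + n_het) / n

# Hypothetical genotype counts in one group of 100 people
hom, het, neg = 4, 32, 64
print(allele_frequency(hom, het, neg))  # 0.2
print(marker_frequency(hom, het, neg))  # 0.36
```

Under Hardy-Weinberg proportions the two are linked by marker frequency = 1 - (1 - p)^2, here 1 - 0.8^2 = 0.36.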
*
Polymorphism has several definitions. A polymorphic locus was originally
defined as a locus in which the least common allele occurs with a frequency of
at least 1%. A more appropriate definition has been suggested by Elston: a
locus in which the most common allele occurs with a frequency of at most 99% (Elston,
2000). The original definition cannot accommodate the HLA loci, which
have >100 alleles (although these are the cumulative product of multiple polymorphic
sites within each gene), because more than 100 alleles cannot all have a frequency
of at least 1%. Elston's definition allows for any number of alleles.
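The two definitions can be compared with a short Python sketch. The thresholds are the ones given above; the 150-allele locus with equal allele frequencies is a hypothetical example, not real HLA data.

```python
def polymorphic_original(freqs, threshold=0.01):
    """Original definition: the least common allele has frequency >= 1%."""
    return min(freqs) >= threshold

def polymorphic_elston(freqs, threshold=0.99):
    """Elston's definition: the most common allele has frequency <= 99%."""
    return max(freqs) <= threshold

# Hypothetical locus with 150 equally frequent alleles (each about 0.67%)
many_alleles = [1 / 150] * 150
print(polymorphic_original(many_alleles))  # False
print(polymorphic_elston(many_alleles))    # True
```

For an ordinary biallelic locus (say 0.6/0.4) both definitions agree; they diverge only when a locus has many rare alleles.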
*
A functional polymorphism may either increase or decrease gene expression or
gene function. In other words, not all functional polymorphisms decrease gene
expression/function.
*
A genetic association may be a chance finding, and the opposite is also true:
the lack of an association may be a chance finding. However, no reviewer will tell you that
a failure to replicate an association might have been due to chance (bad luck)!
Population stratification or any other bias may also work either way, but these are usually
suspected, and have to be ruled out, only when an association is found.
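To see how easily a real association can be missed, here is an approximate power calculation in Python for a two-sided two-proportion z-test (normal approximation, alpha = 0.05). The marker frequencies (30% in cases vs 20% in controls) and the sample sizes are hypothetical. With 100 subjects per group the power comes out at roughly 37%, so a true association of this size would fail to reach significance in most such studies.

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def power_two_proportions(p1, p2, n_per_group, z_alpha=1.959964):
    """Approximate power of a two-sided two-proportion z-test (alpha = 0.05).

    Normal approximation with equal group sizes; the negligible contribution
    of the opposite tail is ignored.
    """
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    return normal_cdf((abs(p1 - p2) - z_alpha * se_null) / se_alt)

# Hypothetical true difference: 30% marker frequency in cases vs 20% in controls
for n in (100, 300, 1000):
    print(n, round(power_two_proportions(0.30, 0.20, n), 2))
```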
* In PCR, dNTPs are used, but in the final product the
nucleotides are present as dNMPs. Two phosphates (as pyrophosphate) are cleaved off to provide the energy that
drives the reaction (DNA replication).
* Meiosis is said to create four daughter cells out of
one. This does not apply to an oocyte, which gives rise to a single daughter
cell (ovum) and two polar bodies. In addition, the oocyte completes its second
meiotic division, which halves its remaining DNA content, only after fertilization.
* It is quite common to call the developing human
offspring an embryo or a fetus regardless of the period of intrauterine life. The
correct terminology for the offspring is: [fertilization] zygote - conceptus -
embryo (after implantation) - fetus (after organogenesis is complete) [birth]. (The plural of conceptus is either concepti (Latin)
or conceptuses (English).)
* 'Murine' refers to the rodent family Muridae, which
includes both rats and mice. By common practice, however, the term is used
almost exclusively for mice. If you mean mice, say mice.
* Evolution does not plan ahead or decide in advance what action to take.
It proceeds by natural selection, in which individuals with the most adaptive
characteristics in a given environment are selected (favored). Over many
generations, the proportion of individuals bearing the adaptive characteristics
increases, potentially until all individuals have them.
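A minimal Python sketch of this process, assuming the simplest textbook case: a haploid model in which carriers of an advantageous variant have relative fitness 1 + s (hypothetical s = 0.1, starting frequency 1%). No foresight is involved; the frequency rises each generation only because carriers leave more offspring, passing 50% after about 50 generations and 99% after about 100.

```python
def next_frequency(p, s):
    """One generation of selection on a haploid variant with relative fitness 1 + s.

    Mean fitness is p * (1 + s) + (1 - p) = 1 + p * s.
    """
    return p * (1 + s) / (1 + p * s)

def trajectory(p0, s, generations):
    """Variant frequency in each generation, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs

traj = trajectory(p0=0.01, s=0.1, generations=200)
for g in (0, 50, 100, 200):
    print(g, round(traj[g], 4))
```

In this deterministic model the frequency approaches 1 without quite reaching it; in real, finite populations chance (genetic drift) also plays a part.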
* Genetic fitness is the overall ability to leave
surviving offspring who themselves will be able to reproduce successfully. It
has no correlation with physical fitness. Also, reduced fitness does not
necessarily involve death; many people live long lives but are infertile.
* Heritability may be high in a given population for
a given character/disease, but this does not mean that environment plays no
role in the expression of that phenotype. If the same estimate is made in
a different population, it may not be as high.
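Heritability is a ratio of variances, which is why it belongs to a population rather than to a trait. A minimal Python sketch with hypothetical variance components, assuming the simple partition V_P = V_G + V_E (no gene-environment interaction or covariance): the genetic variance is identical in both populations, but the second has more environmental variance, so its heritability is lower. Even the 0.8 estimate leaves a fifth of the phenotypic variance to the environment.

```python
def broad_heritability(var_genetic, var_environmental):
    """H^2 = V_G / V_P, assuming V_P = V_G + V_E (no G x E or covariance terms)."""
    return var_genetic / (var_genetic + var_environmental)

# Hypothetical variance components for the same trait in two populations
print(broad_heritability(var_genetic=8.0, var_environmental=2.0))  # 0.8
print(broad_heritability(var_genetic=8.0, var_environmental=8.0))  # 0.5
```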
* Ethical issues aside, eugenics is a fallacy. Even if
all individuals expressing a recessive disease were eliminated from the
population (most of whom are not fertile anyway), there would still be about
100 times more asymptomatic carriers of the gene for the same disease. Unless
based on a very comprehensive genetic screening program, a eugenics program
will never achieve its aim.
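The figures behind this argument can be sketched in Python under Hardy-Weinberg assumptions. Carriers (2pq) outnumber affected homozygotes (q^2) by 2p/q, which is about 100 when the disease allele frequency q is about 0.02 (disease frequency 1 in 2,500, a hypothetical value chosen to match the text). The second function uses the standard result for complete selection against recessive homozygotes, q_t = q_0 / (1 + t q_0): even if no affected individual ever reproduced, halving q from 0.02 to 0.01 would take 50 generations.

```python
def carriers_per_affected(q):
    """Ratio of heterozygous carriers (2pq) to affected homozygotes (q^2) under HWE."""
    p = 1 - q
    return (2 * p * q) / (q * q)

def generations_to_reach(q0, q_target):
    """Generations of complete selection against affected homozygotes needed to
    lower the allele frequency from q0 to q_target, using q_t = q0 / (1 + t * q0)."""
    return 1 / q_target - 1 / q0

q = 0.02  # hypothetical disease allele frequency (disease frequency 1 in 2,500)
print(round(carriers_per_affected(q)))   # 98
print(generations_to_reach(0.02, 0.01))  # 50.0
```

The carrier-to-affected ratio grows as the allele gets rarer, which is exactly when selection against affected individuals becomes slowest.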
* It is now possible to type thousands of
polymorphisms of the genome in a single assay using microarray technology.
Every time a new susceptibility marker is announced, there is a lot of talk
about the possibility of insurance companies using it. If we were all
screened for all of our genes, every one of us would have a few recessive lethal genes
and a lot of susceptibility markers for complex diseases. No one would ever be
clear of all susceptibility genes. The fallacy lies in assuming that carrying the nucleotide
sequence of a ‘bad’ gene means it will do harm.
* Susceptibility and predisposition are usually used
interchangeably, but most genetic epidemiologists have begun to use these two words
for different things. Predisposing genes are those high-penetrance genes
that are necessary and sufficient to cause a disease and susceptibility genes
are low-penetrance genes that are neither necessary nor sufficient to cause a
disease. Susceptibility genes contribute to disease development in a
multifactorial setting but the disease can occur in
their absence (Greenberg,
1993; Greenberg
& Doneshka, 1996).
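* Nick translation is not a translation event
in the classical sense. It is DNA synthesis by a polymerase (DNA polymerase I) that starts at a nick,
removes the nucleotides ahead of it and replaces them, so that the nick moves along the DNA.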
* Gene expression is the process that converts a
gene's coded information into the structures operating in the cell.
Expressed genes include those that are transcribed and translated all the way
to proteins, and those that are transcribed into RNA but not translated into
protein (e.g., transfer and ribosomal RNAs). XIST
and H19
genes are transcribed but not translated to a protein product (Joubel,
1996; Milligan,
2002). Thus, gene expression should not be described as conversion of
genomic information to protein sequences.
* Nucleotide level changes
are only one of many phenomena that affect gene expression. Cis- and
trans-acting modifiers, epistatic interactions, gene-environment interactions,
parent-of-origin, sex-specific imprinting and other epigenetic effects,
post-translational modifications and many others are involved in gene
expression. These are some of the reasons for the lack of strict
genotype-phenotype correlations for many genes.
* Homology, analogy and
paralogy are related but different concepts and cannot be used interchangeably.
Homology and analogy concern different species (the former implying a common ancestor)
and paralogy concerns the same species (see glossary).
Most importantly, most of the time there is no such thing as 'sequence
homology'; what is meant is 'sequence similarity'
(see Similarity, Homology, Divergence and Convergence in the
NCBI Online Book: Sequence - Evolution - Function).
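Sequence similarity is a measurable quantity, whereas homology is an inference about common ancestry that is either true or not. A minimal Python sketch of one common similarity measure, percent identity over the ungapped columns of two already-aligned sequences (the sequences are hypothetical, and other tools treat gaps differently):

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap.

    This is one common convention; alignment tools differ in how gaps count.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in pairs)
    return 100 * matches / len(pairs)

print(round(percent_identity("ATGC-TAGGA", "ATGCATAGCA"), 1))  # 88.9
```

A result such as 88.9% identity is a statement of similarity; calling it '88.9% homology' mixes up the measurement with the evolutionary conclusion.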
* Mitochondrial Eve theory does not say that at some point in history there was only one female member of
our species. The difference between gene and individual genealogies is
important in interpreting the findings that led to this theory. Traced back in time, the lineages of all
copies of a gene eventually coalesce into a single common ancestor. The mitochondrial Eve was
not necessarily the only female alive at that time, or the first female to have
that particular type. Many other mtDNA lineages may have gone extinct before or
after she lived.
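The point can be illustrated with a small Python simulation under simplifying assumptions: a constant population of a hypothetical 50 women per generation, each woman's mother drawn at random from the previous generation, and no selection. The maternal lineages of all present-day women are traced backwards until they merge into one. The population never shrinks to a single woman; the other women of the 'Eve' generation simply have no unbroken female line to the present.

```python
import random

def generations_to_mtdna_mrca(n_females, seed=1):
    """Trace the maternal lineages of n_females present-day women backwards
    until they all merge into one ancestor (Wright-Fisher-style model)."""
    rng = random.Random(seed)
    lineages = set(range(n_females))  # indices of women carrying distinct lineages
    generations = 0
    while len(lineages) > 1:
        # Each lineage picks its mother at random among the n_females women of
        # the previous generation; lineages that pick the same mother coalesce.
        lineages = {rng.randrange(n_females) for _ in lineages}
        generations += 1
    return generations

g = generations_to_mtdna_mrca(50)
print(f"All 50 lineages coalesce after {g} generations; "
      f"50 women were alive in every one of those generations.")
```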
* Linkage, association and linkage disequilibrium (LD)
are all different concepts. The HLA-A locus and hereditary hemochromatosis show
linkage in families. The HLA-A3 allele shows an association at the population
level because of LD between the C282Y mutation of HFE and HLA-A3. Linkage does
not imply LD and vice versa. The delta (D) value may be zero for linked loci (linkage occurring only in families
and no LD at the population level, which would happen
if the disease mutation occurred on multiple chromosomes independently), and the
delta value may be different from zero for unlinked loci. Linkage stems from the absence of
recombination within a family (over a limited number of meioses), whereas LD is a
reflection of the proportion of recombination events over many generations in the
population.
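A minimal Python sketch of the delta (D) value and of how recombination erodes it in a population. The haplotype and allele frequencies are hypothetical, and the decay formula D_t = D_0 (1 - θ)^t is the standard expectation under random mating with recombination fraction θ. Tight linkage lets LD persist for many generations, while for unlinked loci (θ = 0.5) D halves every generation, so a non-zero D between unlinked loci must have a recent or ongoing cause.

```python
def ld_delta(p_ab, p_a, p_b):
    """LD coefficient D = p(AB) - p(A) * p(B); zero means linkage equilibrium."""
    return p_ab - p_a * p_b

def delta_after(d0, theta, generations):
    """Expected D after a number of generations of random mating,
    with recombination fraction theta between the two loci."""
    return d0 * (1 - theta) ** generations

d0 = ld_delta(p_ab=0.30, p_a=0.40, p_b=0.50)  # hypothetical frequencies
print(round(d0, 3))                                            # 0.1
print(round(delta_after(d0, theta=0.01, generations=100), 4))  # tightly linked
print(round(delta_after(d0, theta=0.5, generations=10), 6))    # unlinked
```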
*
Model-based linkage analysis is based on a likelihood ratio, the logarithm of
which is called a LOD score. This is not
the logarithm of the odds for linkage but the logarithm of the likelihood ratio
for a particular value of the recombination fraction vs. free recombination,
i.e., θ (theta) = 0.5 (Elston,
1998; Olson,
1999).
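The likelihood ratio can be written down directly for the simplest case, phase-known meioses scored as recombinant or non-recombinant. A minimal Python sketch with hypothetical data (1 recombinant among 10 informative meioses): the LOD is 0 at θ = 0.5 by construction and peaks near the observed recombination fraction of 0.1, at about 1.6.

```python
from math import log10

def lod_score(recombinants, nonrecombinants, theta):
    """LOD for phase-known meioses: log10 of L(theta) / L(theta = 0.5).

    Restricted here to 0 < theta <= 0.5 for simplicity.
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    n = recombinants + nonrecombinants
    likelihood = theta ** recombinants * (1 - theta) ** nonrecombinants
    return log10(likelihood / 0.5 ** n)

# Hypothetical data: 1 recombinant among 10 informative, phase-known meioses
for theta in (0.01, 0.1, 0.2, 0.3, 0.5):
    print(theta, round(lod_score(1, 9, theta), 2))
```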
*
Linkage disequilibrium (LD) and Hardy-Weinberg equilibrium (HWE) are also different
things and have little in common. LD is the absence of linkage equilibrium;
it is quantified by a delta value, and an associated P value shows
the significance of the disequilibrium. In HWE tests, a significant P value also means disequilibrium,
which is a worrying thing when the population sample is supposed to be in HWE
(like the control group of an association study).
* A major assumption of HWE is random mating, yet there
is no random mating in any human population. How many tall women do you know
who are married to short men? The popular program Haploview uses P = 0.001 as
the threshold for Hardy-Weinberg equilibrium violation, probably in recognition
of the unrealistic assumptions of HWE.
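A minimal Python sketch of the classic chi-square (1 df) test for Hardy-Weinberg proportions, applied with the P = 0.001 threshold mentioned above. The genotype counts are hypothetical, and this simple test is not necessarily the one Haploview itself implements.

```python
from math import erfc, sqrt

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg proportions.

    Returns (chi-square statistic, P value) for a biallelic marker with
    homozygote counts n_aa, n_bb and heterozygote count n_ab.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    if p in (0.0, 1.0):
        raise ValueError("monomorphic marker: HWE test is not informative")
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df
    return chi2, p_value

THRESHOLD = 0.001  # the Haploview threshold quoted in the text
for counts in [(50, 40, 10), (60, 20, 20)]:  # hypothetical genotype counts
    chi2, p_value = hwe_chi_square(*counts)
    print(counts, round(chi2, 2), "violation" if p_value < THRESHOLD else "no violation")
```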
* The definition of a haplotype does not require two
defined markers on the same chromosome. For the HLA system, for example,
HLA-B44-DR4 is a haplotype flanked by the genes encoding B44 and DR4, but it is
also possible to talk about DR4 haplotypes. This would mean a chromosomal segment of
undefined length carrying the DR4 gene and extending either way.
* Cancer is a genetic disease, but this concerns
somatic cells. Cancer due to germ-line DNA mutations (inherited cancer) accounts for much
less than 10% of all cancers. Genes are usually the target, not the origin, of
the cancer process. Most mutations found in cancer cells (somatic) cannot be
detected in surrounding healthy cells (germ-line).
*
Anti-oncogenes are not genes that antagonize the effects of oncogenes. They
are genes with anti-oncogenic effects (tumor suppressor genes). Similarly, carcinogens (or
cancerogens) are not necessarily cancer-causing genes; they are more frequently
chemicals or other environmental factors (e.g., viruses) that promote cancer
development.
*
Knudson's original two-hit hypothesis (Knudson,
1975), which suggested that loss of heterozygosity or homozygous deletion are the two
hits required for the loss of tumor suppressor gene activity, has now been
extended to include transcriptional silencing by DNA methylation of promoters,
which can disable tumor suppressor genes (reviews by Yu & Shen, 2002; Balmain
et al, 2003 and Paige,
2003).
*
X-linked diseases that occur almost exclusively in males (eg, hemophilia, Wiskott-Aldrich syndrome)
are the recessive ones. Some X-linked dominant diseases
are not seen in males because affected male fetuses are lost during development (see Clinical Genetics).
*
Some X chromosome genes escape inactivation (Shapiro,
1979). This creates a situation that is the opposite of what is achieved by
Lyonisation (gene dosage compensation): X chromosome genes that escape
inactivation are represented twice as much in females (eg, MIC2X/CD99;
STS;
XG).
Such genes are either in the pseudoautosomal region of the X and Y chromosomes or
have a homologous gene on the Y chromosome (eg, ZFX/ZFY).
*
Not all genes on the X chromosome behave like sex-chromosome-linked genes. Those
within the pseudoautosomal regions (PAR) exist in two copies in both sexes (see
pseudoautosomal region in Genes and Chromosomes).
*
Does anyone have an idea about the
following (or would like to do research on these)?
1. Why is it that the primary sex ratio at fertilization
may be as high as 165:100 (see for example: Tricomi,
1960; Shettles,
1964; Serr
& Ismajovich, 1963; Lee
& Takano, 1970; McMillen,
1979; Kellokumpu-Lehtinen
& Pelliniemi, 1984; Vatten, 2004; C3 Newsletter 13/2)
but falls to 106:100 at birth in humans
(and similarly in most mammals), yet nobody thinks about the
reasons for or implications of this? A continuation of this process (elimination of
excess males) is the increased morbidity and mortality of male infants and
children (the well-known male disadvantage (Stevenson, 2000)
or fragile male (Kraemer, 2000),
which has evolutionary explanations (Trivers
& Willard, 1973; Wells,
2000; Dorak,
2002)). Could the sex chromosomes be involved? [One reason must be the
male-specific lethality associated with X-linked dominant diseases, but this is
not frequent enough to explain the huge loss.]
2. Why does everybody acknowledge that the carrier
frequency for HFE-C282Y (causing hereditary
hemochromatosis) and CYP21A2
mutations (causing congenital adrenal hyperplasia) is so high (more than 1 in
50 people carry them in some populations), yet not much is done about the
implications for public health?
[There was another
question here:
*
Why is it that everybody knows HLA-identical sibling frequency for a leukemic
child is >25% instead of the Mendelian expectation of 25% but nobody wonders
about the genetic basis for this?
The
answer is as follows: This is due to increased parental HLA sharing in leukemic
families (Werner-Favre,
1979; MacSween,
1980; Nordlander,
1983; von
Vliedner, 1983; Carpentier,
1987). The 25% expected frequency applies to situations where the parents are
heterozygous and do not share any alleles.]
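Taking the figures in question 1 at face value (165:100 at fertilization, 106:100 at birth), the size of the male loss can be sketched in Python. The lower bound assumes that no female conceptuses are lost at all; the female loss rate in the second function is a hypothetical parameter. Even this lower bound means that about 36% of male conceptuses would be lost before birth.

```python
def min_male_loss(conception_ratio, birth_ratio):
    """Minimum fraction of male conceptuses lost between conception and birth,
    assuming (as a lower bound) that no female conceptuses are lost.
    Ratios are males per 100 females."""
    return 1 - birth_ratio / conception_ratio

def male_loss(conception_ratio, birth_ratio, female_loss):
    """Fraction of male conceptuses lost when a (hypothetical) fraction
    `female_loss` of female conceptuses is also lost."""
    return 1 - (1 - female_loss) * birth_ratio / conception_ratio

print(round(min_male_loss(165, 106), 3))  # 0.358
print(round(male_loss(165, 106, female_loss=0.2), 3))
```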
Last updated on 5 April 2008