Linkage is said to occur when alleles at two different loci are passed together to a subsequent generation in a proportion different to that expected by chance alone; linkage analysis is therefore a technique applied to family pedigrees rather than to cases and population controls. If two loci are completely linked (which usually means that they are very close together on the same chromosome), their alleles will always be passed together via the gametes to the offspring. Thus no new combinations of these two alleles will be found to have occurred during meiosis in the transmitted chromosome. Where the two loci are completely unlinked, the alleles assort independently in meiosis and 50 per cent of the offspring will have chromosomes with combinations not found in the parental strain.
In practice, the proportion of new (recombinant) combinations will be between 0 and 50 per cent depending on the recombination fraction (the number of gametes with a recombination that separates two particular parental alleles divided by the total number of gametes).
Clearly, the closer the two gene loci (or a gene and a marker locus), the less likely it is for a recombination event to separate them. By convention, a 1 per cent recombination frequency implies a distance of 1 centimorgan (cM), at least for distances of less than 10 cM, and this equates (depending on the region of the genome) to some 1 million base pairs. By detecting linkage with polymorphic markers whose location is known, it is possible to use this technique to infer the approximate location of a disease susceptibility gene.
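The definition above can be made concrete with a minimal sketch. The function name and the example counts are hypothetical and chosen only for illustration; the sketch assumes that recombinant and non-recombinant gametes can be counted directly (that is, phase is known).

```python
def recombination_fraction(recombinant_gametes: int, total_gametes: int) -> float:
    """Recombinant gametes divided by all gametes scored (illustrative helper)."""
    if total_gametes <= 0:
        raise ValueError("total_gametes must be positive")
    if not 0 <= recombinant_gametes <= total_gametes:
        raise ValueError("recombinant_gametes must lie between 0 and total_gametes")
    return recombinant_gametes / total_gametes


# Hypothetical counts: 10 recombinant gametes among 100 scored.
theta = recombination_fraction(10, 100)
print(f"recombination fraction = {theta:.2f} ({theta:.0%})")
```

For two loci, values near 0 suggest tight linkage, while values near 0.5 are what independent assortment would produce.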
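The rule of thumb above can be sketched as follows. The function names are hypothetical, and the figure of 1 million base pairs per cM is the rough genome-wide average quoted in the text, not a constant; the sketch refuses recombination fractions above 10 per cent because the simple 1 per cent to 1 cM conversion is only stated to hold for short distances.

```python
def approx_map_distance_cm(theta: float) -> float:
    """Convert a recombination fraction to cM using the 1 % = 1 cM convention."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if theta > 0.10:
        # Assumption: beyond about 10 cM the linear convention is not applied.
        raise ValueError("linear approximation only used below about 10 cM")
    return theta * 100.0


def approx_physical_distance_bp(distance_cm: float, bp_per_cm: int = 1_000_000) -> int:
    """Rough physical distance; bp_per_cm varies with the region of the genome."""
    return round(distance_cm * bp_per_cm)


cm = approx_map_distance_cm(0.05)  # hypothetical recombination fraction of 5 %
print(f"{cm:.1f} cM, roughly {approx_physical_distance_bp(cm):,} base pairs")
```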
In principle, genetic linkage may be sought between a particular marker variant and phenotype or throughout the entire genome, the latter requiring no assumptions to be made about gene function. However, the sheer scale of this endeavour must be appreciated. Typically, a genome scan might utilize some 300–400 polymorphic markers to localize linkage (with a given confidence interval, say 95 per cent) to an area spanning approximately 10–20 cM (typically around 10–20 million base pairs). This genomic region may contain more than 200 genes, 60 000 common variants and many other rarer variants in intronic, exonic and regulatory sequences (McCarthy, 2002). It is clear from this that finer mapping techniques must then be used if there is to be any realistic prospect of cloning a specific gene.
The statistical significance of a linkage signal is denoted by the logarithm of the odds (LOD) score, which is the base-10 logarithm of the ratio of the likelihood of the observed data if the two loci are linked to the likelihood of the same data under independent assortment. Correct interpretation of linkage signals relies critically on understanding the precise meaning of a LOD score to avoid inadvertent type 1 error. Whilst a LOD score of 3 for the effect of a single genotype on a single phenotype denotes a nominal p value of approximately 0.0001, when repeated for multiple markers in a typical genome linkage scan, this equates to a true p value of around 0.09 and thus fails to reach classically defined statistical significance.
It has been recommended that LOD score thresholds of 3.3–3.8 be adopted to define definite linkage, giving a less than 5 per cent probability of type 1 error over a whole genome scan comprising 300–400 marker loci, whilst linkage signals with LOD scores between 1.9 and 3.3 are best considered ‘suggestive’ and worthy of further investigation (Lander and Kruglyak, 1995).
Limitations of traditional linkage analysis include its reliance on assumptions about disease transmission (genetic architecture) and the fact that linkage signals may be hard to detect when the gene in question contributes only marginally to the phenotype. Special relationships such as discordant sibling pairs have been used to circumvent these problems to some extent. However, even with the advent of more complex study designs and elaborate computational methods (including models robust to assumptions about mode of inheritance, such as non-parametric linkage analysis), it is still not possible to model the involvement of more than two disease loci. This is especially problematic in human obesity, where disease susceptibility may depend more on particular configurations of more than one variant (as with calpain 10 in type 2 diabetes; Altshuler et al., 2000) than on the presence or absence of a single variant.
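As a rough illustration of how a two-point LOD score and its nominal p value relate, consider the sketch below. It assumes the simplest textbook case: fully informative, phase-known meioses, so that the likelihood is a binomial in the recombination fraction. The function names are hypothetical, and the p value uses the usual large-sample, one-sided chi-square approximation with one degree of freedom, which is an assumption rather than an exact result.

```python
import math


def lod_score(recombinants: int, total: int, theta: float) -> float:
    """log10 of L(data | theta) / L(data | theta = 0.5) for phase-known meioses."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    linked = theta ** recombinants * (1.0 - theta) ** (total - recombinants)
    unlinked = 0.5 ** total
    if linked == 0.0:
        # Data are impossible under this theta (e.g. recombinants seen at theta = 0).
        return float("-inf")
    return math.log10(linked / unlinked)


def nominal_p_value(lod: float) -> float:
    """Approximate pointwise p value: one-sided chi-square test, 1 degree of freedom."""
    if lod <= 0.0:
        return 0.5
    chi_square = 2.0 * math.log(10.0) * lod
    return 0.5 * math.erfc(math.sqrt(chi_square / 2.0))


# Hypothetical family data: 2 recombinants in 20 informative meioses,
# evaluated at the maximum-likelihood estimate theta = 2/20.
print(f"LOD = {lod_score(2, 20, 0.1):.2f}")
print(f"nominal p for LOD 3 = {nominal_p_value(3.0):.1e}")
```

Under these assumptions a LOD of 3 corresponds to a pointwise p value of about 0.0001, in line with the figure quoted above; the genome-wide figure requires a correction for multiple testing that this sketch does not attempt.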
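The thresholds just described can be expressed as a simple classification. The function name and the category labels are hypothetical, and the default cut-offs (3.3 for definite, 1.9 for suggestive) take the lower end of the 3.3–3.8 range quoted above; an appropriate threshold depends on the study design.

```python
def classify_lod(lod: float, definite: float = 3.3, suggestive: float = 1.9) -> str:
    """Label a genome-scan LOD score using the thresholds quoted in the text."""
    if suggestive > definite:
        raise ValueError("suggestive threshold must not exceed definite threshold")
    if lod >= definite:
        return "definite"
    if lod >= suggestive:
        return "suggestive"
    return "below suggestive"


# Hypothetical LOD scores from a genome scan.
for score in (1.2, 2.5, 3.0, 3.6):
    print(f"LOD {score:.1f}: {classify_lod(score)}")
```

Note that a LOD score of 3.0, the traditional single-test threshold, falls in the ‘suggestive’ band under these genome-wide criteria.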
At the time of writing, human genome scans in various ethnic populations have uncovered obesity loci on chromosomes 2 (containing the POMC gene), 5, 10 (confirmed in multiple ethnic populations), 11 and 20 (Clement et al., 2002).
To date, and probably for all the reasons discussed above, none of these linkage signals has led to the identification of a specific polymorphism that is associated with obesity in the general population.