Basic Concepts in Genetics - Technion

Tutorial #7 by Maayan Fishelson

The Given Problem
Input: a pedigree plus phenotype information about some of the people. These people are called typed.
[Figure: an example pedigree marking a founder, a leaf, and typed individuals.]
Output: the probability of the observed data, given some probability model for the transmission of alleles.
Q: What is the probability of the observed data composed of?
A: There are three types of probability functions: founder probabilities, penetrance probabilities, and transmission probabilities.

Founder Probabilities: One Locus
Founders are individuals whose parents are not in the pedigree. We need to assign probabilities to their genotypes. This is done by assuming Hardy-Weinberg equilibrium.
[Figure: founder 1 with genotype d/h, where d is the mutant allele and h is the normal allele.]
Suppose the gene frequency of d is 0.05; then:
$\Pr(d/h) = 2 \cdot 0.05 \cdot 0.95 = 0.095$
Genotypes of different founders are treated as independent:
[Figure: founders 1 (d/h) and 2 (h/h).]
$\Pr(d/h, h/h) = \Pr(d/h) \cdot \Pr(h/h) = (2 \cdot 0.05 \cdot 0.95) \cdot (0.95)^2 \approx 0.0857$

Founder Probabilities: Multiple Loci
According to linkage equilibrium, the probability of the multi-locus genotype of founder $k$ is:
$\Pr(x_k) = \Pr(x_{k1}) \cdots \Pr(x_{kn})$
Example: [Figure: founder 1 with genotype d/h at the disease locus and 1/2 at a second locus.]
$\Pr(d/h, 1/2) = \Pr(d/h) \cdot \Pr(1/2) = 4 \cdot \Pr(d)\Pr(h) \cdot \Pr(1)\Pr(2)$
The first equality follows from linkage equilibrium, the second from Hardy-Weinberg equilibrium.
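A minimal Python sketch of the two founder rules above: Hardy-Weinberg equilibrium at a single locus, and a product over loci under linkage equilibrium. The helper names are illustrative, and the allele frequencies for the marker alleles 1 and 2 are hypothetical values chosen only so the example runs; the slide does not give them.

```python
from math import prod

def hwe_genotype_prob(a1, a2, freq):
    """Unordered genotype a1/a2 under Hardy-Weinberg equilibrium."""
    if a1 == a2:
        return freq[a1] ** 2
    return 2 * freq[a1] * freq[a2]

def founder_prob(genotypes, freqs):
    """Multi-locus founder genotype under linkage equilibrium:
    the product of the single-locus Hardy-Weinberg probabilities."""
    return prod(hwe_genotype_prob(a1, a2, f) for (a1, a2), f in zip(genotypes, freqs))

disease_freq = {"d": 0.05, "h": 0.95}   # gene frequency of d from the slide
marker_freq = {"1": 0.3, "2": 0.7}      # hypothetical marker allele frequencies

p_dh = hwe_genotype_prob("d", "h", disease_freq)   # 2 * 0.05 * 0.95
p_hh = hwe_genotype_prob("h", "h", disease_freq)   # 0.95 ** 2
p_two_founders = p_dh * p_hh                       # founders are independent
p_dh_12 = founder_prob([("d", "h"), ("1", "2")], [disease_freq, marker_freq])
print(p_dh, p_two_founders, p_dh_12)
```

Note that `p_dh_12` equals 4 · Pr(d) · Pr(h) · Pr(1) · Pr(2), matching the two-step derivation on the slide.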

Penetrance Probabilities
Penetrance: the probability of the phenotype, given the genotype.
E.g., a dominant disease with complete penetrance:
$\Pr(\text{affected} \mid d/d) = 1.0$
$\Pr(\text{affected} \mid d/h) = 1.0$
$\Pr(\text{affected} \mid h/h) = 0$
E.g., a recessive disease with incomplete penetrance:
$\Pr(\text{affected} \mid d/d) = 0.7$
Penetrance can also be, for example, sex-dependent, age-dependent, or environment-dependent.

Transmission Probabilities
Transmission probability: the probability of a child having a certain genotype given the parents' genotypes, $\Pr(x_c \mid x_m, x_f)$.
If we split the ordered genotype $x_c$ into the maternal allele $x_c^m$ and the paternal allele $x_c^f$, we get:
$\Pr(x_c \mid x_m, x_f) = \Pr(x_c^m \mid x_m) \cdot \Pr(x_c^f \mid x_f)$
The inheritance from each parent is independent.
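The two penetrance models on the penetrance slide can be written as lookup tables. This is only a sketch: for the recessive case the slide gives just Pr(affected | d/d) = 0.7, so the values 0 for d/h and h/h below are an assumption, and the table and function names are hypothetical.

```python
def penetrance(table, genotype, affected):
    """Pr(phenotype | genotype). The genotype is unordered, so ("h", "d") means d/h."""
    key = "/".join(sorted(genotype))
    p = table[key]
    return p if affected else 1.0 - p

# Dominant disease, complete penetrance (values from the slide)
DOMINANT_COMPLETE = {"d/d": 1.0, "d/h": 1.0, "h/h": 0.0}

# Recessive disease, incomplete penetrance: only d/d = 0.7 is given on the slide;
# d/h and h/h are ASSUMED here to be never affected.
RECESSIVE_INCOMPLETE = {"d/d": 0.7, "d/h": 0.0, "h/h": 0.0}

print(penetrance(DOMINANT_COMPLETE, ("h", "d"), affected=True))
print(penetrance(RECESSIVE_INCOMPLETE, ("d", "d"), affected=False))
```

A sex- or age-dependent model would replace the single table with one table per sex or age class; that extension is not shown here.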

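Transmission Probabilities: One Locus
The transmission follows Mendel's first law.
[Figure: a father with genotype d/h, a mother with genotype h/h, and their child with genotype d/h.]
$\Pr(X_c = d/h \mid X_m = h/h, X_f = d/h) = \Pr(X_c^m = h \mid X_m = h/h) \cdot \Pr(X_c^f = d \mid X_f = d/h) = 1 \cdot \tfrac{1}{2} = \tfrac{1}{2}$
We also need to add the inheritance probability of the other phase (d from the mother and h from the father), but we can see that it is zero, since the mother carries no d allele!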
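A short Python sketch of the one-locus calculation, assuming ordered genotypes written as (maternal allele, paternal allele) tuples; the function names are hypothetical. It computes the two phases separately and adds them, reproducing the 1/2 above and the zero contribution of the other phase.

```python
def allele_transmission(allele, parent):
    # Mendel's first law: each of the parent's two alleles is transmitted with probability 1/2
    return parent.count(allele) / 2

def ordered_transmission(maternal, paternal, mother, father):
    # Pr(x_c | x_m, x_f) = Pr(x_c^m | x_m) * Pr(x_c^f | x_f): the parents transmit independently
    return allele_transmission(maternal, mother) * allele_transmission(paternal, father)

def unordered_transmission(a, b, mother, father):
    """Pr(child has unordered genotype a/b), summed over both phases."""
    p = ordered_transmission(a, b, mother, father)
    if a != b:
        p += ordered_transmission(b, a, mother, father)   # the other phase
    return p

mother, father = ("h", "h"), ("d", "h")
print(ordered_transmission("h", "d", mother, father))    # h from mother, d from father: 0.5
print(ordered_transmission("d", "h", mother, father))    # other phase: 0.0
print(unordered_transmission("d", "h", mother, father))  # 0.5
```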

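Transmission Probabilities: One Locus
Different children are independent given the genotypes of their parents.
[Figure: parents 1 and 2 with children 3, 4, and 5.]
$\Pr(X_3 = d/h, X_4 = h/h, X_5 = d/h \mid X_1 = d/h, X_2 = h/h) = (1 \cdot \tfrac{1}{2}) \cdot (1 \cdot \tfrac{1}{2}) \cdot (1 \cdot \tfrac{1}{2}) = \tfrac{1}{8}$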
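Because the children are conditionally independent given their parents, the sibship probability is a product of per-child transmission probabilities. This sketch assumes individual 1 is the father and individual 2 the mother, and writes each child's genotype in ordered (maternal, paternal) form; the function names are illustrative.

```python
from math import prod

def allele_transmission(allele, parent):
    # Mendel's first law: each parental allele is passed on with probability 1/2
    return parent.count(allele) / 2

def child_prob(child, mother, father):
    # ordered child genotype: (allele from mother, allele from father)
    return allele_transmission(child[0], mother) * allele_transmission(child[1], father)

def sibship_prob(children, mother, father):
    # children are independent given the parents' genotypes, so the probabilities multiply
    return prod(child_prob(c, mother, father) for c in children)

mother, father = ("h", "h"), ("d", "h")       # X_2 = h/h, X_1 = d/h
children = [("h", "d"), ("h", "h"), ("h", "d")]  # X_3 = d/h, X_4 = h/h, X_5 = d/h
print(sibship_prob(children, mother, father))  # 0.125
```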

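Transmission Probabilities: Multiple Loci
Let's look at paternal inheritance, for example. We generate all possible recombination sequences $(s_1, s_2, \ldots, s_n)$, where $s_l = 1$ or $s_l = -1$ ($2^n$ sequences for $n$ loci). Each sequence determines a selection of paternal alleles $p_1, p_2, \ldots, p_n$, where
$$p_l = \begin{cases} x_{fl}^{M} & \text{if } s_1 \cdots s_l = 1, \\ x_{fl}^{F} & \text{if } s_1 \cdots s_l = -1, \end{cases}$$
and therefore its probability of inheritance is
$$\frac{1}{2}\,[p_1 = x_{k1}^{f}] \prod_{l=2}^{n} [p_l = x_{kl}^{f}] \cdot \begin{cases} 1 - \theta_l & \text{if } s_l = 1, \\ \theta_l & \text{if } s_l = -1. \end{cases}$$
Here $x_{fl}^{M}$ and $x_{fl}^{F}$ are the alleles at locus $l$ that the father $f$ inherited from his mother and from his father, $x_{kl}^{f}$ is the paternal allele of child $k$ at locus $l$, $\theta_l$ is the recombination fraction between loci $l-1$ and $l$, and $[\cdot]$ equals 1 if the condition holds and 0 otherwise.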

We need to sum the probabilities of all $2^n$ recombination sequences.

Calculating the Likelihood of Family Data: Summary
The likelihood of the data is the probability of the observed data (the known phenotypes), given certain values for the unknown recombination fractions. Each person $i$ has an ordered multi-locus genotype $g_i = (g_{i1}, g_{i2}, \ldots, g_{in})$ and a multi-locus phenotype $x_i$. For a pedigree with $m$ people:
$$L = P(x) = \sum_{g} P(x, g) = \sum_{g} P(x \mid g)\, P(g),$$
where $x = (x_1, \ldots, x_m)$ and $g = (g_1, \ldots, g_m)$.
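Returning to the multi-locus transmission probability: the sketch below enumerates all $2^n$ recombination sequences for paternal inheritance and adds up their probabilities, following the per-sequence formula term by term. It uses 0-based locus indices, so `thetas[j]` is the recombination fraction between loci j-1 and j; the haplotype encoding and the function name are assumptions made for illustration.

```python
from itertools import product

def paternal_transmission_prob(father_M, father_F, child_paternal, thetas):
    """Pr(child's paternal alleles | father's two haplotypes), summed over all 2**n
    recombination sequences (s_1, ..., s_n) with each s in {+1, -1}.

    father_M[j] / father_F[j]: allele at locus j the father got from his mother / father.
    child_paternal[j]: the child's paternal allele at locus j.
    thetas[j]: recombination fraction between loci j-1 and j (thetas[0] is ignored).
    """
    n = len(child_paternal)
    total = 0.0
    for s in product((1, -1), repeat=n):
        prob = 0.5          # either haplotype can be chosen at the first locus
        running = 1         # running product s_1 * ... * s_j
        for j in range(n):
            running *= s[j]
            p_j = father_M[j] if running == 1 else father_F[j]
            if p_j != child_paternal[j]:
                prob = 0.0  # indicator [p_j = child's paternal allele] is zero
                break
            if j > 0:       # s_j = -1 means a recombination between loci j-1 and j
                prob *= (1 - thetas[j]) if s[j] == 1 else thetas[j]
        total += prob
    return total

# Father with haplotypes d-1 (from his mother) and h-2 (from his father), theta = 0.1
print(paternal_transmission_prob(("d", "1"), ("h", "2"), ("d", "2"), [0.0, 0.1]))
```

For a doubly heterozygous father, the two non-recombinant haplotypes each get (1 - θ)/2 and the two recombinant ones θ/2, so the four probabilities sum to 1.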

Computational Problem
$$L = \sum_{g} P(x \mid g)\, P(g)$$
Evaluating this requires a multiple sum over all possible genotype combinations for all members of the pedigree.
Solution: the Elston-Stewart algorithm provides a means of evaluating the multiple sum in a streamlined fashion for simple pedigrees. This results in a much more efficient computation than summing over every genotype combination directly.

Simple Pedigree
No consanguineous marriages, i.e., marriages of blood-related individuals (no loops in the pedigree). There is one pair of founders from which the whole pedigree is generated.

Peeling Order
Assume that the individuals in the pedigree are ordered such that parents precede their children. Then the pedigree likelihood can be represented as:
$$L(\theta) = \sum_{g_1} P(x_1 \mid g_1)\, P(g_1 \mid \theta) \cdots \sum_{g_m} P(x_m \mid g_m)\, P(g_m \mid \theta),$$
where $\theta$ denotes the recombination fractions and $P(g_i \mid \theta)$ is $P(g_i)$ if $i$ is a founder, or $P(g_i \mid g_{m_i}, g_{f_i})$ otherwise, with $g_{m_i}$ and $g_{f_i}$ the genotypes of $i$'s parents.
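To make the cost of the direct approach concrete, the sketch below evaluates $L = \sum_g P(x \mid g) P(g)$ by enumerating every genotype vector, multiplying the per-person factors of the formula above. It assumes one locus with alleles d and h, ordered genotypes, the dominant complete-penetrance model from the penetrance slide, and a hypothetical `likelihood` helper; with 4 ordered genotypes per person the loop visits $4^m$ terms, which is the blow-up Elston-Stewart avoids.

```python
from itertools import product

FREQ = {"d": 0.05, "h": 0.95}
GENOTYPES = list(product(FREQ, repeat=2))   # ordered (maternal, paternal) genotypes

def founder_prob(g):
    # Hardy-Weinberg for an ordered genotype: the two alleles are drawn independently
    return FREQ[g[0]] * FREQ[g[1]]

def transmission_prob(gc, gm, gf):
    # P(g_c | g_m, g_f): independent Mendelian transmission from each parent
    return (gm.count(gc[0]) / 2) * (gf.count(gc[1]) / 2)

def penetrance(affected, g):
    # ASSUMED model: dominant disease, complete penetrance; None means untyped
    if affected is None:
        return 1.0
    p = 1.0 if "d" in g else 0.0
    return p if affected else 1.0 - p

def likelihood(parents, phenotypes):
    """Direct evaluation of sum_g P(x | g) P(g).
    parents[i] is None for a founder or (mother, father) indices, parents listed
    before their children; phenotypes[i] is True, False, or None."""
    total = 0.0
    for g in product(GENOTYPES, repeat=len(parents)):
        term = 1.0
        for i, par in enumerate(parents):
            if par is None:
                term *= founder_prob(g[i])
            else:
                term *= transmission_prob(g[i], g[par[0]], g[par[1]])
            term *= penetrance(phenotypes[i], g[i])
        total += term
    return total

# Trio: 0 = mother (unaffected), 1 = father (affected), 2 = their affected child
print(likelihood([None, None, (0, 1)], [False, True, True]))
```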

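In this way, we first sum over all possible genotypes of the children and only then over the possible genotypes of the parents.

An Example for Peeling Order
[Figure: a pedigree of seven individuals: founders 1 and 2 with children 3 and 4; individual 4 and founder 5 with children 6 and 7.]
$h(g_i) = P(x_i \mid g_i)\, P(g_i)$
$h(g_m, g_f, g_c) = P(x_c \mid g_c)\, P(g_c \mid g_m, g_f)$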

$$L = \sum_{g_1} \sum_{g_2} \cdots \sum_{g_7} h(g_1)\, h(g_2)\, h(g_1, g_2, g_3)\, h(g_1, g_2, g_4)\, h(g_5)\, h(g_4, g_5, g_6)\, h(g_4, g_5, g_7)$$
According to the Elston-Stewart algorithm:
$$L = \sum_{g_1} h(g_1) \sum_{g_2} h(g_2) \sum_{g_3} h(g_1, g_2, g_3) \sum_{g_4} h(g_1, g_2, g_4) \sum_{g_5} h(g_5) \sum_{g_6} h(g_4, g_5, g_6) \sum_{g_7} h(g_4, g_5, g_7)$$

Elston-Stewart Peeling Order
As can be seen, this peeling order clips off branches (sibships) of the pedigree, one after the other, in a bottom-up order.
[Figure: the seven-person example pedigree.]

Elston-Stewart Computational Complexity
The computational complexity of the algorithm is linear in the number of people but exponential in the number of loci.
[Figure: locus 1 variables and locus 2 variables for individuals a, b, and c: allele variables Ga, Gb, Gc (indexed [l,m] and [l,p] for the maternal and paternal allele at locus l), phenotype variables Pa, Pb, Pc, and selector variables Sc for c. The selectors are linked across loci by P(Sc[2,p] | Sc[1,p]), which depends on the recombination fraction between the two loci.]
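The sketch below applies the peeling order to the seven-person example pedigree and checks it against the direct sum over all $4^7$ genotype vectors. The phenotypes, the one-locus d/h model, and the dominant complete-penetrance assumption are hypothetical; only the pedigree structure and the definitions of $h$ come from the slides. Each inner sum is stored as a table indexed by the genotypes it still depends on, which is how the sibships are clipped off bottom-up.

```python
from itertools import product

FREQ = {"d": 0.05, "h": 0.95}
G = list(product(FREQ, repeat=2))   # ordered genotypes (maternal, paternal)
# Hypothetical phenotypes for individuals 1..7 (True = affected, None = untyped)
PHEN = {1: None, 2: True, 3: False, 4: True, 5: False, 6: True, 7: None}

def penetrance(affected, g):
    # ASSUMED model: dominant disease with complete penetrance
    if affected is None:
        return 1.0
    p = 1.0 if "d" in g else 0.0
    return p if affected else 1.0 - p

def h_founder(i, g):
    # h(g_i) = P(x_i | g_i) P(g_i), with P(g_i) from Hardy-Weinberg
    return penetrance(PHEN[i], g) * FREQ[g[0]] * FREQ[g[1]]

def h_child(c, gm, gf, gc):
    # h(g_m, g_f, g_c) = P(x_c | g_c) P(g_c | g_m, g_f), Mendelian transmission
    return penetrance(PHEN[c], gc) * (gm.count(gc[0]) / 2) * (gf.count(gc[1]) / 2)

def brute_force():
    total = 0.0
    for g1, g2, g3, g4, g5, g6, g7 in product(G, repeat=7):
        total += (h_founder(1, g1) * h_founder(2, g2) * h_child(3, g1, g2, g3)
                  * h_child(4, g1, g2, g4) * h_founder(5, g5)
                  * h_child(6, g4, g5, g6) * h_child(7, g4, g5, g7))
    return total

def elston_stewart():
    # Innermost sums first: clip off sibship {6, 7}, leaving a function of (g4, g5)
    sib67 = {(g4, g5): sum(h_child(6, g4, g5, g6) for g6 in G)
                       * sum(h_child(7, g4, g5, g7) for g7 in G)
             for g4 in G for g5 in G}
    # Sum out founder 5, leaving a function of g4 only
    below4 = {g4: sum(h_founder(5, g5) * sib67[(g4, g5)] for g5 in G) for g4 in G}
    # Clip off sibship {3, 4}, leaving a function of (g1, g2)
    sib34 = {(g1, g2): sum(h_child(3, g1, g2, g3) for g3 in G)
                       * sum(h_child(4, g1, g2, g4) * below4[g4] for g4 in G)
             for g1 in G for g2 in G}
    # Finally sum over founders 1 and 2
    return sum(h_founder(1, g1) * h_founder(2, g2) * sib34[(g1, g2)]
               for g1 in G for g2 in G)

print(elston_stewart(), brute_force())
```

Each table has at most 16 entries and each entry needs a handful of 4-term sums, so the work grows with the number of sibships rather than as $4^m$.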

Variation on the Elston-Stewart Algorithm in Fastlink
The pedigree traversal order in Fastlink is a modification of the Elston-Stewart algorithm. Assume there are no multiple marriages.
Nuclear family graph:
Vertices: each nuclear family is a vertex.
Edges: if some individual is a child in nuclear family x and a parent in nuclear family y, then x and y are connected by an edge x-y, which is called a down edge with respect to x and an up edge with respect to y.

Traversal Order
One individual A is chosen to be the proband. For each genotype g, the probability that A has genotype g is computed, conditioned on the known phenotypes of the rest of the pedigree and the assumed recombination fractions. The first family visited is a family containing the proband, preferably one in which the proband is a child.
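A minimal sketch of building the nuclear family graph defined above, assuming each nuclear family is given as a name, a pair of parents, and a tuple of children. The `Family` representation and the small three-family pedigree with its individual IDs are hypothetical.

```python
from collections import namedtuple

# A nuclear family: two parents and their children (representation is an assumption)
Family = namedtuple("Family", ["name", "parents", "children"])

def nuclear_family_graph(families):
    """Edge x-y whenever some individual is a child in x and a parent in y.
    It is recorded as a down edge of x and an up edge of y."""
    up = {f.name: [] for f in families}
    down = {f.name: [] for f in families}
    for x in families:
        for y in families:
            if x is not y and set(x.children) & set(y.parents):
                down[x.name].append(y.name)
                up[y.name].append(x.name)
    return up, down

families = [
    Family("F1", parents=(1, 2), children=(3, 4)),
    Family("F2", parents=(4, 5), children=(6, 7)),
    Family("F3", parents=(3, 8), children=(9,)),
]
up, down = nuclear_family_graph(families)
print(down["F1"], up["F2"], up["F3"])   # ['F2', 'F3'] ['F1'] ['F1']
```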

Visit(w) {
    While w has an unvisited neighbor x reachable via an up edge:
        Visit(x);
    While w has an unvisited neighbor y reachable via a down edge:
        Visit(y);
    Update w;
}

Traversal Order: Updates
If nuclear family w is reached via a down edge from z, the parent in w that nuclear families w and z share is updated. If nuclear family w is reached via an up edge from z, then the child that w and z share is updated.

Example 1
[Figure: an example pedigree and the corresponding nuclear family graph.]

Example 2
[Figure: an example pedigree and the corresponding nuclear family graph.]
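The Visit procedure and the update rule can be sketched in Python as follows. It only reports the order in which individuals are updated; it does not perform the probability updates themselves, which the slides do not spell out. The family representation, the example pedigree, and the function names are assumptions, and the shared-individual lookup relies on the no-multiple-marriages, no-loops assumption.

```python
from collections import namedtuple

Family = namedtuple("Family", ["name", "parents", "children"])

def traversal_updates(families, proband):
    """Return (family, individual, role) tuples in the order the updates happen."""
    fam = {f.name: f for f in families}
    up = {f.name: [] for f in families}
    down = {f.name: [] for f in families}
    for x in families:
        for y in families:
            if x is not y and set(x.children) & set(y.parents):
                down[x.name].append(y.name)   # down edge w.r.t. x
                up[y.name].append(x.name)     # up edge w.r.t. y

    def members(name):
        return set(fam[name].parents) | set(fam[name].children)

    visited, updates = set(), []

    def visit(w, via=None, z=None):
        visited.add(w)
        for x in up[w]:                 # unvisited neighbors via up edges first
            if x not in visited:
                visit(x, "up", w)
        for y in down[w]:               # then unvisited neighbors via down edges
            if y not in visited:
                visit(y, "down", w)
        if z is None:                   # starting family: finish at the proband
            updates.append((w, proband, "proband"))
        else:
            (shared,) = members(w) & members(z)
            # via a down edge from z: shared individual is a parent in w;
            # via an up edge from z: shared individual is a child in w
            updates.append((w, shared, "parent" if via == "down" else "child"))

    # Prefer a family in which the proband is a child, else one where they are a parent
    start = next((f.name for f in families if proband in f.children), None)
    if start is None:
        start = next(f.name for f in families if proband in f.parents)
    visit(start)
    return updates

families = [
    Family("F1", parents=(1, 2), children=(3, 4)),
    Family("F2", parents=(4, 5), children=(6, 7)),
    Family("F3", parents=(3, 8), children=(9,)),
]
print(traversal_updates(families, proband=6))
```

Starting from F2 (where proband 6 is a child), the traversal goes up to F1, then down to F3, so F3 is folded into individual 3, F1 into individual 4, and F2 finally into the proband.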
