Directed Acyclic Graphs - Texas A&M University


Directed Acyclic Graphs
David A. Bessler, Texas A&M University
November 20, 2002, Universidad Internacional del Ecuador, Quito, Ecuador 1

Outline: Introduction; Causal Forks; Inverted Causal Forks; D-separation; Markov Property; The Adjustment Problem; Policy Modeling; PC Algorithm 2

Outline Continued: Example: Traffic Fatalities; Correlation and Partial Correlation; Forecasting Traffic Fatalities; More Examples: US Money, Prices and Income; World Stock Markets; Conclusion 3

Motivation Oftentimes we are uncertain about which variables are causal in a modeling effort. Theory may tell us what our fundamental causal variables are in a controlled system; however, it is common that our data may not be collected in a controlled

environment. In fact we are rarely involved with the collection of our data. 4 Observational Data In the case where no experimental control is present in the generation of our data, such data are said to be observational (non-experimental) and usually secondary, not collected explicitly for our purpose but rather for some other primary purpose. 5 Use of Theory Theory is a good potential source of information about direction of causal flow. However, theory usually invokes the ceteris paribus condition to achieve results. Data are usually observational (non-experimental) and

thus the ceteris paribus condition may not hold. We may never know whether it holds, because unknown variables may be operating on our system (see Malinvaud's econometric text). 6 Experimental Methods If we do not know the "true" system, but have an approximate idea that one or more variables operate on that system, then experimental methods can yield appropriate results. Experimental methods work because they use randomization, the random assignment of subjects to alternative treatments, to account for any additional variation associated with the unknown variables on the system. 7

Directed Graphs Can Be Used To Represent Causation Directed graphs help us assign causal flows to a set of observational data. The problem under study and theory suggest that certain variables ought to be related, even if we do not know exactly how; i.e., we don't know the "true" system. 8 Causal Models Are Well Represented By Directed Graphs One reason for studying causal models, represented here as X --> Y, is to predict the consequences of changing the effect variable (Y) by changing the cause variable (X). The possibility of manipulating Y by way of manipulating X is at the heart of causation.

Hausman (1998, page 7) writes: Causation seems connected to intervention and manipulation: One can use causes to wiggle their effects. 9 We Need More Than Algebra To Represent Cause Linear algebra is symmetric with respect to the equal sign. We can re-write y = a + bx as x = -a/b + (1/b)y. Either form is legitimate for representing the information conveyed by the equation. A preferred representation of causation would be the sentence x --> y, or the words: if you change x by one unit you will change y by b units, ceteris paribus. The algebraic statement suggests a symmetry that does not hold for causal statements. 10 Arrows Carry the Information An arrow placed with its base at X and head at Y indicates X causes Y: X --> Y. By the words X causes Y we mean that one can change the values of Y by changing the values of X. Arrows indicate a productive or genetic relationship between X and Y. Causal statements are asymmetric: x --> y is not consistent with y --> x. 11 Problems with Predictive Definitions of Cause Definitions of the word cause that focus on prediction alone, without distinguishing between intervention (first) and subsequent realization, may mistakenly label as causal variables that are associated only through an omitted variable. Prediction is one attribute of the word cause. We must be careful not to make it the only attribute (more or less a summary of Bunge 1959).

12 Granger-type Causality For example, Granger-type causality (Granger 1980) focuses solely on prediction, without considering intervention. If we can predict Y better by using past values of X than by not using past values of X, then X Granger-causes Y. The consequence of such a focus is to open oneself up to the frustration of unrealized expectations by attempting policy on the wrong set of variables. 13 Graph A graph is an ordered triple <V, M, E>. V is a non-empty set of vertices (variables). M is a non-empty set of marks (symbols attached to the end of undirected edges).

E is a set of ordered pairs. Each member of E is called an edge. 14 Vertices are variables; Edges are lines Vertices connected by an edge are said to be adjacent. If we have a set of vertices {A, B, C, D}, the undirected graph contains only undirected edges (e.g., A --- B). A directed graph contains only directed edges (e.g., C --> D). 15 Directed Acyclic Graphs (DAGs) A directed acyclic graph is a directed graph that contains no directed cyclic paths. An acyclic graph has no path that leads away from a variable only to return to that same variable.
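The "no directed cyclic paths" condition can be checked mechanically. The following is a minimal sketch, assuming a graph is given as a list of (tail, head) pairs; the function name `is_acyclic` and that edge-list representation are illustrative choices, not part of the original slides. It uses Kahn's algorithm, one of several standard ways to test for a directed cycle.

```python
from collections import deque

def is_acyclic(edges):
    """Return True if the directed graph given as (tail, head) pairs has no directed cycle."""
    nodes = {v for edge in edges for v in edge}
    children = {v: [] for v in nodes}
    indegree = {v: 0 for v in nodes}
    for tail, head in edges:
        children[tail].append(head)
        indegree[head] += 1
    # Kahn's algorithm: repeatedly remove vertices that have no incoming edge.
    queue = deque(v for v in nodes if indegree[v] == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    # Any vertex that could not be removed sits on, or downstream of, a directed cycle.
    return removed == len(nodes)

cyclic = is_acyclic([("A", "B"), ("B", "C"), ("C", "A")])   # a path that returns to A
collider = is_acyclic([("A", "B"), ("C", "B")])             # arrows converging on B
print(cyclic, collider)
```

A graph whose arrows lead from A to B to C and back to A fails the check, while a graph with two arrows converging on one vertex passes it.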

The path A --> B --> C --> A is labeled cyclic, as here we move from A to B, but then return to A by way of C. 16 Graphs and Probabilities of Variables Directed acyclic graphs are pictures (illustrations) for representing conditional independence as given by the recursive decomposition:

$\Pr(v_1, v_2, \dots, v_{n-1}, v_n) = \prod_{i=1}^{n} \Pr(v_i \mid pa_i)$

where Pr is the probability of vertices (variables) v1, v2, v3, ... vn and pai the realization of some subset of the variables that precede (come before in a causal sense) vi in order (v1, v2, v3, ... vn), and the symbol $\prod$ represents the product operation, with index of operation denoted below (start) and above (finish) the symbol. Think of pai as the parents of variable i. 17 D-Separation Let X, Y and Z be three disjoint subsets of variables in a directed acyclic graph G, and let p be any path between a vertex [variable] in X and a vertex [variable] in Y, where by 'path' we mean any succession of edges, regardless of their directions. Z is said to block p if there is a vertex w on p satisfying one of the following:
(i) w has converging arrows along p, and neither w nor any of its descendants are in Z, or
(ii) w does not have converging arrows along p, and w is in Z.

Furthermore, Z is said to d-separate X from Y on graph G, written (X ⊥ Y | Z)_G, if and only if Z blocks every path from a vertex [variable] in X to a vertex [variable] in Y. 18 Graphs and D-Separation Geiger, Verma and Pearl (1990) show that there is a one-to-one correspondence between the set of conditional independencies, X ⊥ Y | Z, implied by the above factorization and the set of triples, X, Y, Z, that satisfy the d-separation criterion in graph G. If G is a directed acyclic graph with vertex set V, if A and B are in V and if H is also in V, then G linearly implies the correlation between A and B conditional on H is zero if and only if A and B are d-separated given H. 19 Colliders (Inverted Fork) Consider three variables (vertices): A, B and C. A variable is a collider if arrows converge on it: A --> B <-- C. The vertex B is a collider; A and C are d-separated, given the null set. Intuitively, think of two trains, one starting at A, the other at C. Both move toward B. Unconditionally, they will crash at B. However, if we condition on B (if we build a switch station at B with side tracks), we open up the flow from A to C. Conditioning on B makes A and C d-connected (directionally connected). 20 Conditioning on Children (of colliders) Opens Up Information Flows Too! Amend the graph given above to include variable D, as a child of B, such that: A --> B <-- C and B --> D. If we condition on D rather than B, we, as well, open up the flow between A and C (Pearl, 2000, p. 17). This illustrates the (i) component of the definition given above. 21 Common Causes (causal fork) Say we have three vertices K, L and M, described by the following graph: K <-- L --> M. Here L is a common cause of K and M. The unconditional association (correlation) between K and M will be non-zero, as they have a common cause L. However, if we condition on L (know the value of L), the association between K and M disappears (Pearl, 2000, p. 17). Conditioning on common causes blocks the flow of information between effects. 22 Causal chains Finally, if our causal path is a chain (causal chain), condition (ii) in the above definition again applies. If D causes E and E causes F, we have the representational flow: D --> E --> F. The unconditional association (correlation) between D and F will be non-zero, but the association (correlation) between D and F conditional on E will be zero. (For those in the audience familiar with Box and Jenkins time series methods, this is a property they exploited in testing for AR models.) 23 Example of an Inverted Causal Fork In the example we study below we take data from Peltzman (Journal of Political Economy, 1976). This is a study of Traffic Fatalities in the U.S. over the period 1947-1972. Roh, Bessler and Gilbert (1997) find the following (not a surprise): Speed(t) --> Traffic Fatalities(t) <-- Alcohol Consumption(t) 24 What Should We Expect Based On The Previous Directed Graph? Here year to year changes in speed and year to year changes in alcohol consumption are direct causes of year to year changes in traffic fatalities. The graph is an inverted fork. So, we should expect to see that Speed and Alcohol Consumption are not related in unconditional tests of association. However, if we condition on Traffic Fatalities, we should see a nonzero measure of association between Speed and Alcohol Consumption. 25

OLS Regressions On An Inverted Fork (use OLS to measure association)
Regression #1:
Speed(t) = .01 - .01 * Alcohol Consumption(t)
Standard errors: (.002) (.053)
Estimated standard errors of the coefficients are in ( ). Based on this regression we would say Speed(t) and Alcohol Consumption(t) are not related (note: |-.01/.053| < 2.0). 26

OLS Regressions On An Inverted Fork: Now We Condition on the Effect (traffic fatalities)
Regression #2:
Speed(t) = .01 - .11 * Alcohol Consumption(t) + .15 * Traffic Fatalities(t)
Standard errors: (.002) (.051) (.046)
Here conditioning on the common effect makes the two causes dependent (note: |-.11/.051| > 2.0). 27

Example of a Causal Chain In another example, consider the relationship among GDP, Poverty and Malnutrition. Based on World Bank data for 80 less developed countries, we find: GDP --> Poverty --> Malnutrition. We expect, from the directed graph theory given above, that Malnutrition and GDP will be related in unconditional tests. However, if we condition on Poverty they should be unrelated. Let's see! 28

Regressions with Causal Chains
Regression #1 (for i = 1, ..., 80 countries):
Malnutrition(i) = 24.18 - .003 * GDP(i)
Standard errors: (1.91) (.0006)
Note the t-ratio of -.003/.0006 = -5.38 suggests that GDP is an important variable in moving levels of malnutrition. 29

Regressions with Causal Chains, continued.
Regression #2 (for i = 1, ..., 80 countries):
Malnutrition(i) = 7.52 - .0013 * GDP(i) + .289 * Poverty(i)
Standard errors: (2.09) (.0007) (.055)
Note the t-ratio of -.0013/.0007 = -1.78 suggests (if we are 5% ers) that GDP is not informative with respect to malnutrition if we have information about a country's poverty levels. 30
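Both regression patterns above (the inverted fork and the causal chain) can be mimicked with synthetic data. The sketch below is illustrative only: the variables are simulated standard-normal draws, not the Peltzman or World Bank data, and the helper names `corr` and `partial_corr` are assumptions introduced here. It uses the first-order partial correlation formula in place of OLS, which measures the same conditional association.

```python
import math
import random

def corr(x, y):
    """Sample correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation between x and y after conditioning on z (first-order formula)."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

random.seed(0)  # fixed seed so the run is repeatable
n = 20000

# Inverted fork: cause_1 --> effect <-- cause_2 (synthetic data).
cause_1 = [random.gauss(0, 1) for _ in range(n)]
cause_2 = [random.gauss(0, 1) for _ in range(n)]
effect = [a + c + random.gauss(0, 1) for a, c in zip(cause_1, cause_2)]

# Causal chain: first --> middle --> last (synthetic data).
first = [random.gauss(0, 1) for _ in range(n)]
middle = [d + random.gauss(0, 1) for d in first]
last = [e + random.gauss(0, 1) for e in middle]

print("fork, unconditional:", round(corr(cause_1, cause_2), 3))
print("fork, given effect: ", round(partial_corr(cause_1, cause_2, effect), 3))
print("chain, unconditional:", round(corr(first, last), 3))
print("chain, given middle: ", round(partial_corr(first, last, middle), 3))
```

With these settings the two causes are close to uncorrelated until we condition on their common effect, while the two ends of the chain are clearly correlated until we condition on the middle variable.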

Markov Property Key to understanding these ideas is that d-separation allows us to write the probability of our variables X, Y, and Z in terms of the product of the conditional probabilities on each variable (X, Y, or Z), where the conditioning factors are the immediate parents of each variable. We do not have to condition on grandparents, great grandparents, aunts, uncles or children. (It is helpful and valid to refer to genealogical analogies when thinking about conditioning information.) 31 Some probabilities The following directed graphs have these associated probability factorizations:
A --> B <-- C ; Pr(A,B,C) = Pr(A)Pr(C)Pr(B|C,A)
D --> E --> F ; Pr(D,E,F) = Pr(D)Pr(E|D)Pr(F|E)
G --> H --> I <-- J ; Pr(G,H,I,J) = Pr(G)Pr(J)Pr(H|G)Pr(I|J,H)
P, Q (no edge) ; Pr(P,Q) = Pr(P)Pr(Q)
Here Pr(.) refers to the probability of the variable(s) in parentheses. 32 Adjustment Problem (from Pearl 2000) What must I measure if I want to know how X affects Y?

[Figure: Original Causal Graph Illustrating the Adjustment Problem, on the vertices Z1 to Z11, X and Y]

33 D-Separation is Key to Solving the Adjustment Problem Ask the question: can I get back to Y via the ancestors of X without running into converging arrows? Yes! I can take several paths from X to Y through X's ancestors:
X, Z3, Z1, Z4, Z7, Y
X, Z6, Z4, Z7, Y
X, Z6, Z4, Z2, Z5, Z9, Y
X, Z6, Z4, Z2, Z7, Y
I have to condition on variables to block the path back to Y from X. There are several possibilities: it looks like Z7 and Z9 are two. Below we give six steps for solving the adjustment problem. 34

Step 1. Z7 and Z9 should be non-descendants of X. [Figure: causal graph on Z1 to Z11, X and Y] Z11 will not work as it is a child of X. 35

Step 2. Delete all non-ancestors of {X, Y and Z}. [Figure: causal graph on Z1 to Z11, X and Y] Here Z is the set of candidate blocking variables, Z = {Z7, Z9}. 36

Step 3. Delete all arcs emanating from X. [Figure: causal graph on Z1 to Z11, X and Y] Here we will remove the X --> Z11 edge, as Z11 is a child of X. 37

Step 4. Connect any two parents sharing a common child. [Figure: causal graph on Z1 to Z11, X and Y] Here we will use dotted lines to connect parents with a common child. 38

Step 5. Strip arrow-heads from all edges. [Figure: undirected graph on Z1 to Z11, X and Y] 39

Step 6. Delete lines into and out of Z7 and Z9. [Figure: undirected graph on Z1 to Z11, X and Y, annotated "We cannot get from X to Y"] Here we delete all lines into and out of the variables that we wish to condition on, Z7 and Z9. 40

Test: if X is disconnected from Y in the remaining graph, then Z7 and Z9 are sufficient measurements to condition on. By disconnected we mean that we cannot get from X to Y via the remaining lines. Z7 and Z9 pass the test. So we can perform an OLS regression of Y on X, Z7 and Z9 to find an unbiased estimate of the effect of X on Y. 41

Another candidate: Let's Try Z4 all by Itself. If we try just Z4 as a sole candidate variable to condition on, our last figure will be amended as follows: [Figure: undirected graph on Z1 to Z11, X and Y, annotated "Clearly Z4 will not work"] 42

Why does Z4 fail our test in the previous slide? Because Z4 opens up the path between Z1 and Z2. Remember our speed, alcohol and traffic fatalities example, slides 24-27. If we run an OLS regression of Y on X and Z4 we would find biased estimates of the coefficient associated with X. X will be correlated with errors in Y. We say there is a backdoor path between X and Y that will give us biased parameter estimates of the effect of X on Y. 43
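The six steps and the final test can be sketched in code. Because the edges of the Z1 to Z11 figure are not recoverable from the text, the graph below is a small hypothetical one that reuses a few of the slide's vertex names; it is not the graph from the slides. The function names and the parent-dictionary representation are also assumptions made for this illustration.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen

def descendants(parents, x):
    """Return all proper descendants of x."""
    seen, stack = set(), [x]
    while stack:
        v = stack.pop()
        for child, ps in parents.items():
            if v in ps and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def passes_test(parents, x, y, z):
    """Apply the six steps; True means conditioning on z disconnects x from y."""
    z = set(z)
    if z & descendants(parents, x):              # Step 1: z must be non-descendants of x
        return False
    keep = ancestors(parents, {x, y} | z)        # Step 2: keep only ancestors of x, y, z
    pruned = {v: {p for p in parents.get(v, ()) if p != x}
              for v in keep}                     # Step 3: delete arcs emanating from x
    links = {v: set() for v in keep}
    for child, ps in pruned.items():
        for p in ps:                             # Step 5: keep each edge without direction
            links[p].add(child)
            links[child].add(p)
        for a, b in combinations(sorted(ps), 2): # Step 4: connect parents of a common child
            links[a].add(b)
            links[b].add(a)
    seen, stack = {x}, [x]                       # Step 6 and test: search while skipping z
    while stack:
        v = stack.pop()
        for w in links[v] - z:
            if w == y:
                return False
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return True

# Hypothetical graph (child -> set of parents), NOT the graph in the slides.
parents = {"Z3": {"Z1"}, "Z4": {"Z1", "Z2"}, "Z5": {"Z2"},
           "X": {"Z3", "Z4"}, "Y": {"X", "Z4", "Z5"}, "Z11": {"X"}}

print(passes_test(parents, "X", "Y", {"Z4"}))        # Z4 alone links its parents Z1 and Z2
print(passes_test(parents, "X", "Y", {"Z1", "Z4"}))
print(passes_test(parents, "X", "Y", {"Z11"}))       # Z11 is a child of X
```

In this hypothetical graph, conditioning on Z4 alone fails for the same reason given above: Z4 is a common child of Z1 and Z2, so Step 4 joins its parents and a route from X to Y survives.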

Policy and Directed Graphs Consider the following simple graph: X --> Y <-- U. We observe X in an uncontrolled setting and we are interested in manipulating Y by setting the value of X. The inference task we have as economists is to move from a sample obtained from a distribution associated with passive observation to conclusions about the distribution that would obtain if a particular policy is imposed. Policy is thus asking questions about counterfactuals. What values will Y take on if we force X to take on a value of 1? (Here we write Xf=1 to denote that X is forced to have a value of 1.) 44 Simple Example of Policy with Exogenous Variable X Consider the table which is based on the graph given above, where Y = X + U.

Passive Observations | Forced or Policy Induced
X  U  Y  |  Xf  U_Xf=1  Y_Xf=1
1  0  1  |  1   0       1
1  1  2  |  1   1       2
1  2  3  |  1   2       3
2  0  2  |  1   0       1
2  1  3  |  1   1       2
2  2  4  |  1   2       3

Notice how Y behaves when we force a value on X (X=1). 45 What were we supposed to see in the previous table? In the above table, Y, under passive observation when X=1, has the same distribution as Y_Xf=1. Look back at the table. Here X is a parent of Y and there is no backdoor path from X

to Y (say via U). A policy analyst may conclude that knowing how X and Y are related in this uncontrolled (passive) setting is sufficient for predicting how they will behave in policy settings. 46 A case where we must be careful [Graph: X <-- V --> U, with X --> Y <-- U] Here we have a variable V that causes both X and U. Will knowledge of how X and Y behave in passive settings be sufficient for predicting how they will behave in a policy setting? 47 Consider the following: Let Y = X + U; X = V and U = V.

Passive Observations | Forced or Policy Induced
V  X  U  Y  |  V  Xf=1  U_Xf=1  Y_Xf=1
1  1  1  2  |  1  1     1       2
1  1  1  2  |  1  1     1       2
1  1  1  2  |  1  1     1       2
2  2  2  4  |  2  1     2       3
2  2  2  4  |  2  1     2       3
2  2  2  4  |  2  1     2       3

Notice here that in the unforced setting our distribution on Y is 2,2,2,4,4,4, and whenever we observe X=1 we observe Y=2. However, when we force X=1 our distribution on Y is sometimes 2, and sometimes 3, but never 4. Under the policy setting on X we cannot ignore V. We have to have it in our model, else we will have policy results which are not well predicted through knowledge of X and Y (passively observed). 48
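The contrast between passive observation and forcing X can be reproduced in a few lines. This is a minimal sketch of the structural equations stated above (Y = X + U, X = V, U = V); the function names `passive` and `forced` and the six-element population are illustrative assumptions matching the table.

```python
def passive(v):
    """Passive observation: X and U both follow V, and Y = X + U."""
    x = v
    u = v
    return x, u, x + u

def forced(v, x_forced):
    """Policy setting: X is set by fiat, but U still follows V."""
    u = v
    return x_forced, u, x_forced + u

population = [1, 1, 1, 2, 2, 2]  # values of V, as in the table

passive_rows = [passive(v) for v in population]
passive_y = [y for _, _, y in passive_rows]
passive_y_when_x_is_1 = [y for x, _, y in passive_rows if x == 1]
forced_y = [forced(v, 1)[2] for v in population]

print(passive_y)              # [2, 2, 2, 4, 4, 4]
print(passive_y_when_x_is_1)  # [2, 2, 2]
print(forced_y)               # [2, 2, 2, 3, 3, 3]
```

Observing X=1 tells us V=1, and hence U=1, whereas forcing X=1 leaves V and U free to take either value. That difference is why the passive relation between X and Y mispredicts the policy outcome when V is left out.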

Results on Traffic Fatalities We find the following relationship among traffic fatalities, alcohol consumption and income. [Graph on alcohol consumption(t), income(t) and traffic fatalities(t)] This graph, along with our work above on policy analysis, suggests that it is not enough to understand the alcohol-traffic fatalities link; we must as well understand how income levels contribute to the problem. 49 Consider the Following Two Regressions
Regression #1 (ignore the income connection):
TF(t) = .015 + .608 AC(t) ; R2 = .38
Standard errors: (.009) (.200)

Regression #2 (include income with alcohol consumption):
TF(t) = -.005 + .338 AC(t) + 1.055 IN(t) ; R2 = .68
Standard errors: (.008) (.160) (.276) 50
Policy Implications From the last slide we conclude that if we direct national policy towards reducing alcohol consumption we will see a decrease in traffic fatalities. However, that decrease is likely to follow regression #2 rather than regression #1. In fact, below we will see that speed is an even more prominent mover (cause) of traffic fatalities. (We didn't consider speed in the regression given above because, as we will see, speed is exogenous relative to alcohol consumption and income levels.) 51

PC Algorithm Here we present one algorithm which can be used to build directed graphs. The algorithm starts systematically from a complete undirected graph and removes edges (lines) between vertices based on correlation or partial correlation between vertices. Spirtes, Glymour and Scheines (1993) have incorporated the notion of d-separation into an algorithm (PC Algorithm) for building directed acyclic graphs, using the notion of sepset (defined below). 52 A Complete Undirected Graph (gets us started) One forms a complete undirected graph G on the vertex set V. The complete undirected graph shows an undirected edge between every pair of variables of the system (every pair of variables in V).

Edges between variables are removed sequentially based on zero correlation or partial correlation (conditional correlation). [Figure: X, Y and Z joined pairwise by undirected edges] Here X, Y, and Z are connected with lines having no arrows. 53 Remove Edges Using Correlation or Conditional Correlation Each edge is subjected to tests that the correlation between its endpoints is zero: H0: ρ(X, Y) = 0? If a correlation is judged to be not different from zero, we remove the edge between the two end points of the corresponding edge. Edges surviving these unconditional correlation tests are then subjected to conditional correlation tests:

H0: ρ(X, Y | Z) = 0? If these conditional correlations equal zero, remove the edge X, Y. 54 Fisher's Z Fisher's z statistic can be applied to test for significance from zero:

$z(\rho(i,j \mid k), n) = \tfrac{1}{2}\,(n - |k| - 3)^{1/2} \, \ln\left\{ |1 + \rho(i,j \mid k)| \times |1 - \rho(i,j \mid k)|^{-1} \right\}$

n is the number of observations used to estimate the correlations, ρ(i,j|k) is the population correlation between series i and j conditional on series k (removing the influence of series k on each i and j), and |k| is the number of variables in k (that we condition on). 55 Sepset The set of conditioning variable(s) on a removed edge between two variables is called the sepset of the variables whose edge has been removed (for vanishing zero order conditioning information the sepset is the empty set). [Figure: graph on x, y and z] If we remove the edge between x and y through an unconditional correlation test, ρ(x,y) = 0, then the sepset (x,y) is {}. If we remove this edge by conditioning on z, ρ(x,y|z) = 0, then the sepset (x,y) is {z}. 56 Edge Direction Edges are directed by considering triples X --- Y --- Z, such that X and Y are adjacent as are Y and Z, but X and Z are not adjacent. Direct the edges between triples X --- Y --- Z as X --> Y <-- Z if Y is not in the sepset of X and Z.

If X --> Y, Y and Z are adjacent, X and Z are not adjacent, and there is no arrowhead at Y, then orient Y --- Z as Y --> Z. If there is a directed path from X to Y, and an edge between Y and Z, then direct (Y --- Z) as: Y --> Z. 57 Assumptions for PC Algorithm to Work Causal Sufficiency: There are no omitted variables that cause two of my included variables. Markov Condition: We can write probabilities of variables by conditioning just on each variable's parents (we discussed this above). Faithfulness Condition: If we see zero correlation between two variables, the reason we see it is because there is no edge between these variables and not cancellation of structural parameters. 58 Causal Sufficiency We want to measure the effect of y on z (write this as β_y) and we have x, y and z in our study, but we leave another variable, w, out of the study. The world is generated by the graph: [Figure: graph on w, x, y and z] 59 Causal Sufficiency Continued If we fail to include w in our sample we will end up with the following graph (let by and bx represent our measured effects based on x, y and z): [Figure: graph on x, y and z with w omitted, labelled with E(bx) and E(by)] The key to causal sufficiency is that we don't have to have every variable that causes z in our study. But we do need all variables that cause two or more variables in our study. (Here E(by) is the expected value of our measure of the effect of y on z.) 60 Faithfulness Here we assume that if we measure the correlation between two variables, say x and y, as zero, it is zero because there is no edge between x and y in the true model. It is not zero because of cancellation of deeper parameters. 61 Faithfulness Continued Say we have the following true model: x --> y (β_xy), x --> z (β_xz), y --> z (β_yz). The true parameters connecting these variables are given by the betas (β_xy, β_xz, β_yz). 62 Faithfulness Continued If it so happens that in the real world β_xz = -β_xy β_yz, then the correlation between x and z will equal zero. PC algorithm will remove the edge between x and z, even though the true model has such an edge. 63 Example: Traffic Fatalities Variables and Data taken from Peltzman Journal of Political Economy 1977 Eight variables for the U.S. 1947 - 1974 data: number of traffic fatalities; number of young drivers (15

(air bags etc.); mileage driven; income; alcohol consumption; speed; and trend. All variables are expressed in logarithms. 64 Starting point: a complete undirected graph [Figure: complete undirected graph on fatalities, youth, speed, trend, alcohol, cost, income and mileage] 65 Correlation matrix (lower triangular elements)

      TF    SP    MI    YO    IN    AL    CO    TN
TF   1.00
SP    .47  1.00
MI    .51   .34  1.00
YO    .19   .03  -.45  1.00
IN    .77   .20   .57  -.01  1.00
AL    .61  -.05   .15   .28   .43  1.00
CO    .23   .31   .73  -.72   .14  -.16  1.00
TN    .00   .00  -.29   .71  -.08   .25  -.42  1.00

66 Each element of the correlation matrix is tested: H0: ρ(i,j) = 0. Each one of these elements can be tested against zero (is each correlation element significantly different from zero?) using Fisher's z formula. Here at the zero order correlation put |k| = 0. 67 Correlation ρ(x,y)

Correlation (x,y) is represented by ρ(x,y):

$\rho(x,y) = \frac{\mathrm{cov}(x,y)}{\mathrm{sd}(x)\,\mathrm{sd}(y)} = \frac{\sum_{t=1}^{T} (x_t - \bar{x})(y_t - \bar{y})/T}{\left[\sum_{t=1}^{T} (x_t - \bar{x})^2/T\right]^{.5} \left[\sum_{t=1}^{T} (y_t - \bar{y})^2/T\right]^{.5}}$

We can test these against the null hypothesis that ρ(x,y) = 0, using Fisher's z (see above). 68 Correlation Matrix (red and italic elements are not different from zero at a 20% significance level)

      TF    SP    MI    YO    IN    AL    CO    TN
TF   1.00
SP    .47  1.00
MI    .51   .34  1.00
YO    .19   .03  -.45  1.00
IN    .77   .20   .57  -.01  1.00
AL    .61  -.05   .15   .28   .43  1.00
CO    .23   .31   .73  -.72   .14  -.16  1.00
TN    .00   .00  -.29   .71  -.08   .25  -.42  1.00

We use a 20% significance level based on the recommendation of Spirtes et al. 1993 (their Monte Carlo work). 69 Undirected Graph Following Zero Order Conditioning

[Figure: undirected graph on fatalities, youth, speed, trend, alcohol, cost, income and mileage] 70 Calculating Conditional Correlations To do this go to the original variance covariance matrix and take the corresponding elements from it to form a 3x3 sub-matrix; here we just take the relevant elements from our original 8x8 matrix (go back and look at the original matrix):

      TF    MI    IN
TF   1.00   .51   .77
MI    .51  1.00   .57
IN    .77   .57  1.00

Now we have to invert this new matrix and scale it just like we did for the correlation matrix (divide variances and covariances by the standard deviations). But first let's make sure we know how to get the elements of this matrix. 71 Original Correlation Matrix (take out the elements you want to focus on, here TF, MI, and IN)

      TF    SP    MI    YO    IN    AL    CO    TN
TF   1.00   .47   .51   .19   .77   .61   .23   .00
SP    .47  1.00   .34   .03   .20  -.05   .31   .00
MI    .51   .34  1.00  -.45   .57   .15   .73  -.29
YO    .19   .03  -.45  1.00  -.01   .28  -.72   .71
IN    .77   .20   .57  -.01  1.00   .43   .14  -.08
AL    .61  -.05   .15   .28   .43  1.00  -.16   .25
CO    .23   .31   .73  -.72   .14  -.16  1.00  -.42
TN    .00   .00  -.29   .71  -.08   .25  -.42  1.00

Here we have taken the corresponding elements of our original correlation matrix for the variables TF, MI, and IN. Notice that the red elements in this matrix (this page) are the same as the matrix on the previous page. 72 Conditional Correlations on the variables TF, MI and IN The scaled inverse of this matrix will give us the negatives of the partial correlations between these three variables (Whittaker, Graphical Models in Applied Multivariate Analysis, Wiley 1990).

      TF    MI    IN
TF   1.00   .14   .63
MI    .14  1.00   .32
IN    .63   .32  1.00

Understand what each element of this matrix is: .14 is the correlation between TF and MI given IN. The number .63 is the correlation between TF and IN given MI, etc. Now each of these can be tested for significance from zero, just as we did for zero order partial correlations. 73 Undirected graph after first order conditioning [Figure: undirected graph on fatalities, youth, speed, trend, alcohol, cost, income and mileage] 74 Direct the Remaining Edges Recall from above: Direct edges between triples X --- Y --- Z as X --> Y <-- Z if Y is not in the sepset of X and Z. So look at the triple: [speed] --- [Traffic Fatalities] --- [Alcohol]. Why did we remove the edge between speed and alcohol? At zero order conditioning we found the corr(speed, alcohol) was -.05 and had a p-value of .84. So we did not have to condition on traffic fatalities. 75
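The collider rule recalled above can be sketched as a small function. This is an illustration under stated assumptions, not TETRAD's implementation: the skeleton is a dictionary of neighbour sets, the sepsets are keyed by unordered pairs, and the example skeleton contains only the one triple discussed here (the other edges of the traffic fatalities graph are omitted because the text does not list them).

```python
from itertools import combinations

def orient_colliders(adjacent, sepset):
    """Return directed edges (tail, head) implied by the collider rule.

    adjacent: dict mapping each vertex to the set of its neighbours (undirected skeleton).
    sepset:   dict mapping frozenset({x, z}) to the set on which the x, z edge was removed.
    """
    arrows = set()
    for y, neighbours in adjacent.items():
        for x, z in combinations(sorted(neighbours), 2):
            if z in adjacent[x]:
                continue  # x and z are adjacent, so the rule does not apply to this triple
            if y not in sepset.get(frozenset((x, z)), set()):
                arrows.add((x, y))  # x --> y
                arrows.add((z, y))  # z --> y
    return arrows

# Only the triple from the text; the remaining edges of the skeleton are omitted.
skeleton = {"speed": {"fatalities"},
            "alcohol": {"fatalities"},
            "fatalities": {"speed", "alcohol"}}
# The speed, alcohol edge was removed at zero order, so its sepset is empty.
sepsets = {frozenset(("speed", "alcohol")): set()}

print(sorted(orient_colliders(skeleton, sepsets)))
```

Because traffic fatalities is not in the (empty) sepset of speed and alcohol, both edges are pointed into fatalities. Had the edge been removed only after conditioning on fatalities, the function would leave the triple undirected.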

Direction of Two Edges Traffic Fatalities is not in the sepset of (Alcohol, speed) so we direct the triple as: [speed] --> [Traffic Fatalities] <-- [Alcohol]. Other edges are directed using the same rules as those given above. 76 Final Directed Graph [Figure: final directed graph on Traffic Fatalities, Youth, Speed, Trend, Alcohol, Cost, Income and Mileage] PC algorithm cannot direct the income --- alcohol edge, so we show it in red. 77 TETRAD II A group of researchers at Carnegie Mellon University has written the software TETRAD II that applies the ideas discussed above. All TETRAD II requires is the correlation matrix between a set of at least three variables and N, the number of observations used to estimate these correlations. See my Texas A&M University, Department of Agricultural Economics web site for TETRAD II. 78 An Application Money and Prices: U.S. Data 1869-1914 (A Study with Directed Graphs) 79 There are two explanations for long-term sustained movement in the nominal price level in nineteenth-century U.S. data: Real Forces Rostow (1978) and Lewis (1978) argue that real, not monetary, forces (changing costs are represented by the price of wheat) explain major periods of inflation and deflation in the United States (U.S.) and the United Kingdom (U.K.) from 1797 to 1914. Monetary Forces Bordo and Schwartz (1981) assert that monetary forces were the primary cause of major price swings in both countries in the late 19th century. 80 Required Reading: Baum's (1901) Wonderful Wizard of Oz. 81 Rostow and Lewis Evidence

Plots of wheat prices and the price level over nineteenth century data. The long downward trend in wheat prices seems to lead the long downward trend in the price level. 82 Bordo and Schwartz's Evidence Regression analysis of contemporaneous prices on money and wheat prices. Money is represented by the ratio of M2 to output.

Their evidence supports a monetarist position. 83 Bessler's 1984 Footnote on the Evidence Bessler (1984) contributes the obvious to Bordo and Schwartz's story: correlation does not imply causation. A lagged relation (a Granger-type causal relation) between money(t-1) and prices(t) appears to support Bordo and Schwartz. 84

Fast-Forward to 2000 What advances have we seen in analytical tools since the mid-1980s that may shed additional light on the Bordo-Schwartz versus Rostow-Lewis debate? Directed Acyclic Graphs 85 Question Addressed Can we find a causal relation behind the contemporaneous correlations uncovered by Bordo and Schwartz? 86 Data Studied U.S. Data 1869 - 1914

National income (Y) is net national production. Money Stock (M2). General Price Level (P): Implicit Price Deflator. Above taken from Monetary Trends in the U.S. and U.K. (Friedman and Schwartz). Wheat Price (PW) is found in Historical Statistics of the U.S. 87 Complete Undirected Graph on Innovations of the error correction model [Figure: complete undirected graph on m2, y, p and pw] Remove edges based on vanishing correlation or partial correlation. 88 TETRAD II uses the following input program
/covariance
44
y1 y2 y3 y4
1.00
.41 1.00
.54 -.05 1.00
.30 .18 .38 1.00
This is all we need with TETRAD II to study the causal relationships among the four variables

89 TETRAD II Output
TETRAD II - Version 1.2 for DOS
by Peter Spirtes, Richard Scheines, Christopher Meek, and Clark Glymour
Copyright (C) 1994 by Lawrence Erlbaum Associates
Output file: bordo3.out
Data file: four
Parameters:
Sample Size: 44
Continuous Data
90 TETRAD II output continued
Covariance Matrix
y1 y2 y3 y4
1.0000
0.4100 1.0000
0.5400 -0.0500 1.0000
0.3000 0.1800 0.3800 1.0000
Significance: 0.2000
Settime: Unbounded
------------------------------------------------------
91 TETRAD II output continued
List of vanishing (partial) correlations that made TETRAD remove adjacencies.
Corr. : Sample (Partial) Correlation
Prob. : Probability that the absolute value of the sample (partial) correlation exceeds the observed value, on the assumption of zero (partial) correlation in the population, assuming a multinormal distribution.

Edge        (Partial)
Removed     Correlation          Corr.     Prob.
--------    -----------          -----     -----
y2 -- y3    rho(y2 y3)          -0.0500    0.7487
y2 -- y4    rho(y2 y4 . y1)      0.0655    0.6782
y1 -- y4    rho(y1 y4 . y3)      0.1218    0.4396
92 TETRAD II output continued
--------------------------------------------------
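The removed-edge entries in this output can be checked by hand from the four-variable correlation matrix and the sample size of 44. The sketch below applies Fisher's z as stated earlier in these slides together with the standard first-order partial correlation formula; the function names are illustrative, and treating the statistic as standard normal with a two-sided p-value is an assumption about how the reported "Prob." column was computed.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def fisher_z(r, n, k=0):
    """Fisher's z for a (partial) correlation r, n observations, k conditioning variables."""
    return 0.5 * math.sqrt(n - k - 3) * math.log(abs(1 + r) / abs(1 - r))

def two_sided_p(z):
    """Two-sided p-value, assuming z is standard normal under the null."""
    return math.erfc(abs(z) / math.sqrt(2))

n = 44
# Correlations from the TETRAD II input, keyed by variable pair.
r = {("y1", "y2"): .41, ("y1", "y3"): .54, ("y2", "y3"): -.05,
     ("y1", "y4"): .30, ("y2", "y4"): .18, ("y3", "y4"): .38}

p_y2_y3 = two_sided_p(fisher_z(r[("y2", "y3")], n))

r_y2_y4_given_y1 = partial_corr(r[("y2", "y4")], r[("y1", "y2")], r[("y1", "y4")])
p_y2_y4_given_y1 = two_sided_p(fisher_z(r_y2_y4_given_y1, n, k=1))

r_y1_y4_given_y3 = partial_corr(r[("y1", "y4")], r[("y1", "y3")], r[("y3", "y4")])
p_y1_y4_given_y3 = two_sided_p(fisher_z(r_y1_y4_given_y3, n, k=1))

print(round(p_y2_y3, 4))
print(round(r_y2_y4_given_y1, 4), round(p_y2_y4_given_y1, 4))
print(round(r_y1_y4_given_y3, 4), round(p_y1_y4_given_y3, 4))
```

The computed partial correlations agree with the reported 0.0655 and 0.1218 to four decimals, and the p-values agree with the reported column to about three decimals; the small remaining gap is consistent with the input correlations having been rounded to two decimals.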

The Pattern (under the assumption of causal sufficiency):
y2 --> y1
y3 --> y1
y3 --- y4
y1 is money supply
y2 is income
y3 is price level
y4 is wheat price
93 Correlation and P-values on removed edges
y --- pw: this edge is removed as the corr(y, pw | m2) = .07; p = .68
y --- p: this edge is removed as the corr(y, p) = -.05; p = .75
m2 --- pw: this edge is removed as the corr(m2, pw | p) = .12; p = .44
94 Final Directed Acyclic Graph [Figure: final graph on p, m2, y and pw] Money (M2) is endogenous in contemporaneous time; a result that does not agree with Bordo and Schwartz. What result from slide 94 allowed us to direct the edges on p, m2 and y as an inverted fork? 95 Stock Market Integration Are stock market movements from various countries independent? If the answer is no, where is information created? Look at indexes from nine countries: Australia, Japan, Hong Kong, Germany, Switzerland, France, United Kingdom, United States, Canada. First we removed any lagged relationship between the nine series. We next study their daily co-movements. 96 Stock Market Integration (standard clock day begins at 180 degrees west of Greenwich, England)

[Figure: directed graph on Australia, Japan, Germany, Hong Kong, Switzerland, France, United Kingdom, United States and Canada] 97 Stock Market Integration (the day begins at 60 degrees west of Greenwich, England) [Figure: directed graph on the same nine markets] 98 Implications of stock market results Canada is an information sink. Japan is an information sink in Asia. Hong Kong is the prime source of information movement from Asia to Europe. Yesterday's close in the U.S. moves Asia today through Hong Kong and Australia. 99

The Literature on Such Causal Structures has been Advanced in the Last Decade Under the Label of Artificial Intelligence
Pearl, Biometrika, 1995
Pearl, Causality, Cambridge Press, 2000
Spirtes, Glymour and Scheines, Causation, Prediction and Search, Springer-Verlag, 1993
Glymour and Cooper, Computation, Causation and Discovery, MIT Press, 1999. 100
