
CAUSAL REPRESENTATION LEARNING


Updated: Aug 6, 2021

Authors: Rimpa Poria, Abhinav Kale

Towards Causal Representation Learning describes how artificially intelligent systems can learn causal representations, and how the absence of such representations in today's machine learning algorithms and models gives rise to serious challenges.
 

Causal representation learning

A central problem at the intersection of AI and causality is therefore causal representation learning: the discovery of high-level causal variables from low-level observations. The paper also delineates some implications of causality for machine learning and proposes key research areas at the intersection of the two communities.


Consider an example:


Figure. 1


Let’s look at the causal relations between different elements while observing the girl on the horse trying to jump over a barrier. We can clearly see that the girl, the horse, and the motion of their bodies are in unison. The girl is pulling on the horse’s collar with her hands in order to make the jump. As humans, we also naturally think about cases such as: what would happen if the horse’s legs hit the barrier? What if the collar around the horse’s neck slipped out of the girl’s hands? These are counterfactuals, and it is natural for us to think this way. We have observed things around us since childhood, learned from nature, and considered every other possibility associated with an event. This is the basic intuitive nature of a human being.

Causal Models

Causal models are mathematical models representing causal relationships within an individual system or population. They facilitate inferences about causal relationships from statistical data. They can teach us a good deal about the epistemology of causation, and about the relationship between causation and probability.


Causal Diagrams Include Causal Loop Diagrams and Directed Acyclic Graphs:

1. Causal Loop Diagram

Causal loop diagrams (also known as systems thinking diagrams) are used to display the behavior of cause and effect from a system’s standpoint. The diagram consists of a set of nodes and edges: nodes represent the variables, and edges are the links that represent a connection or relation between two variables.

Figure. 2

Explanation

Step 1: A new home-delivery-focused burger joint opens up in the neighbourhood. At first, demand is low, but the quality of the burgers is excellent, and so are the delivery times.

Step 2: After a while, the burger joint gets noticed and is featured on a local online food blog. As a result, demand for the burgers rises sharply.

Step 3: But the owners are reluctant to purchase more delivery capacity (delivery vehicles and personnel) or higher burger production capacity (additional burger ovens).

Step 4: That results in longer delivery times and a larger percentage of undercooked burgers, which in turn lowers the number of returning customers.

Step 5: As demand falls back, the pressure for additional investment in both delivery and production capacity disappears, and the owners are happy that they held off on the additional investment. The feedback structure behind this story is sketched in code below.
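The sketch below is one possible way to write this loop down explicitly as a signed directed graph using networkx; the node names and edge signs are my own reading of the steps above, not part of the original diagram.

import networkx as nx

# One possible encoding of the burger-delivery causal loop diagram.
# sign="+" means the two variables move in the same direction,
# sign="-" means they move in opposite directions.
loop = nx.DiGraph()
loop.add_edge("demand", "delivery time", sign="+")
loop.add_edge("demand", "undercooked burgers", sign="+")
loop.add_edge("delivery time", "returning customers", sign="-")
loop.add_edge("undercooked burgers", "returning customers", sign="-")
loop.add_edge("returning customers", "demand", sign="+")
loop.add_edge("demand", "investment pressure", sign="+")
loop.add_edge("investment pressure", "capacity", sign="+")
loop.add_edge("capacity", "delivery time", sign="-")

# Print every feedback loop (cycle) in the diagram.
for cycle in nx.simple_cycles(loop):
    print(" -> ".join(cycle))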

2. Directed Acyclic Graph

A directed acyclic graph is a directed graph that contains no cycles.
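As a quick illustration (my own, not part of the original article), we can build a small directed graph with networkx and check whether it contains a cycle:

import networkx as nx

# A small directed graph: a -> b, b -> c, a -> c. There is no way to
# follow the arrows and return to where you started, so it is a DAG.
g = nx.DiGraph([("a", "b"), ("b", "c"), ("a", "c")])
print(nx.is_directed_acyclic_graph(g))  # True

# Adding the edge c -> a creates the cycle a -> b -> c -> a.
g.add_edge("c", "a")
print(nx.is_directed_acyclic_graph(g))  # False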

Step 1: Consider a gas grill, used to cook meat. We can describe the operation of the grill using the following variables:

Figure. 3


Step 2: For Gas connected, let 1 mean connected and 0 mean not connected; for Gas knob, Gas level, and Flame, use 0 for off, 1 for low, 2 for medium, and 3 for high.

Step 3: For Igniter, use 1 if pressed and 0 if not; for Meat on, use 0 for no and 1 for yes; and for Meat cooked, use 0 for raw, 1 for rare, 2 for medium, and 3 for perfectly cooked.

Step 4: Thus, for example, Gas knob = 1 means that the gas knob is set to low; Igniter = 1 means that the igniter is pressed; and so on.

Then the equations might be:

  • Gas level = Gas connected × Gas knob

  • Flame = Gas level × Igniter

  • Meat cooked = Flame × Meat on

Step 5: For the meat to be cooked, let Gas connected = 1, Gas knob = 3 (high), Igniter = 1, and Meat on = 1.

Then the equations give:

  • Gas level = Gas connected × Gas knob = 1 × 3 = 3

  • Flame = Gas level × Igniter = 3 × 1 = 3

  • Meat cooked = Flame × Meat on = 3 × 1 = 3
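To make the arithmetic concrete, here is a minimal Python sketch of these structural equations (the function and argument names are my own, not part of the original example):

def grill(gas_connected, gas_knob, igniter, meat_on):
    """Structural equations for the gas grill example."""
    gas_level = gas_connected * gas_knob   # 0 if the gas is not connected, otherwise the knob setting
    flame = gas_level * igniter            # no flame unless the igniter is pressed
    meat_cooked = flame * meat_on          # 0 (raw) up to 3 (perfectly cooked)
    return meat_cooked

# Gas connected = 1, Gas knob = 3 (high), Igniter = 1, Meat on = 1
print(grill(gas_connected=1, gas_knob=3, igniter=1, meat_on=1))  # 3: perfectly cooked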

Here the meat comes out perfectly cooked (Meat cooked = 3) under these settings.

Let us now turn to a classical example of a causal system: the sprinkler. It is a system of five variables which indicate the conditions on a certain day:

  • Season : Indicates which season it is

  • Rain : Indicates whether it is raining

  • Sprinkler : Indicates whether our sprinkler is on

  • Wet : Indicates whether the ground is wet

  • Slippery : Indicates whether the ground is slippery

We know that when it rains, the ground will become wet; however, making the ground wet doesn't cause it to rain. This is exactly the kind of direct relationship that could be described by a function. In the absence of this actual function, we are left with a set of variables and directed relationships between them. A natural way to represent this structure is a directed graph, specifically a Directed Acyclic Graph. We require the graph to be acyclic to prevent "causal loops".

We can create a causal graphical model of this system by specifying the nodes and edges of this graph:


from causalgraphicalmodels import CausalGraphicalModel

sprinkler = CausalGraphicalModel(
    nodes=["season", "rain", "sprinkler", "wet", "slippery"],
    edges=[
        ("season", "rain"),
        ("season", "sprinkler"),
        ("rain", "wet"),
        ("sprinkler", "wet"),
        ("wet", "slippery")
    ]
)

# draw() returns a graphviz `dot` object, which Jupyter can render
sprinkler.draw()

Figure. 4


This is a Probabilistic Graphical Model description of the system. Describing a system in this way implies that the joint probability distribution over all variables can be factored in the following way:

P(X) = ∏ᵢ P(Xᵢ ∣ PA(Xᵢ))

where PA(Xᵢ) is the set of parents of the variable Xᵢ with respect to the graph. We can also read off the joint probability distribution implied by our causal graphical model, as sketched below.
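Continuing with the sprinkler model defined above, and assuming the get_distribution() helper used in the blog post referenced at the end of this article, the factorization can be printed directly:

# Prints the factorization implied by the graph, something like
# P(season) P(rain|season) P(sprinkler|season) P(wet|rain,sprinkler) P(slippery|wet)
print(sprinkler.get_distribution())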


This factorization of the joint probability distribution implies certain conditional independence relationships between variables. For example, if we know whether or not the ground is wet, then whether or not it is slippery is independent of the season. In the language of probabilistic graphical models, two variables are conditionally independent given other variables if they are d-separated.
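That particular claim can be checked directly against the sprinkler model, assuming the is_d_separated() helper used in the referenced blog post:

# Is "slippery" independent of "season" once we know whether the ground is wet?
print(sprinkler.is_d_separated("slippery", "season", {"wet"}))  # expected: True

# Without conditioning on "wet", the path season -> rain -> wet -> slippery is open.
print(sprinkler.is_d_separated("slippery", "season", set()))    # expected: False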

We are not going to go into a full proof of d-separation, but to get some intuition about how it is calculated, consider the skeleton of our DAG (the graph with the same nodes and edges, but no notion of "direction"). Two variables can only be related if there are paths between them, so we can limit our attention to the paths between variables. If there is only a single edge between the variables, they cannot be conditionally independent.


For paths of three nodes, there are three possible situations: a fork, a chain, and a collider, shown below:


Figure. 5

Figure. 6

The fork and the chain imply the same independence relationships: X1 and X3 are not independent unless we condition on X2, at which point they become conditionally independent. (Although I should note they imply very different causal structures: in a chain X1 has causal influence on X3, but in a fork there is no causal influence.)

Figure. 7

For the collider, X1 and X3 are independent, unless X2 or any of its descendants is in the set we condition on. This is sometimes called Berkson's Paradox. For paths longer than three nodes, it turns out we can use the previous results to decide whether two nodes are d-separated by examining every consecutive three-node structure along the path: a path is blocked if any of its consecutive three-node structures is blocked, and unblocked only if all of them are unblocked. Consider the following path between X1 and X5:

Figure. 8

If we condition on nothing, X1 and X5 are d-separated because the collider (X1, X2, X3) leaves the path blocked. However, if we condition on X2 or any of its descendants, the path becomes unblocked, because the rest of the path is made up of the fork (X3, X4, X5) and the chain (X2, X3, X4). If we condition on both X2 and X3, the path becomes blocked again, because the chain (X2, X3, X4) is blocked. We can check this with the following code:


Figure. 9
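Figure 9 shows the original check. Below is a sketch of what that code might look like; the edge list is my own reconstruction of the path described above (x1 → x2 ← x3 ← x4 → x5), and it again assumes the is_d_separated() helper:

from causalgraphicalmodels import CausalGraphicalModel

# The path between x1 and x5 described above:
# collider at x2 (x1 -> x2 <- x3), chain x4 -> x3 -> x2, fork x3 <- x4 -> x5
path = CausalGraphicalModel(
    nodes=["x1", "x2", "x3", "x4", "x5"],
    edges=[
        ("x1", "x2"),
        ("x3", "x2"),
        ("x4", "x3"),
        ("x4", "x5"),
    ]
)

print(path.is_d_separated("x1", "x5", set()))         # blocked at the collider x2
print(path.is_d_separated("x1", "x5", {"x2"}))        # conditioning on x2 opens the path
print(path.is_d_separated("x1", "x5", {"x2", "x3"}))  # conditioning on x3 blocks the chain again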

We can read off all independence relationships implied by the graph in the sprinkler system using:


Figure. 10
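Again, a sketch of what that might look like, assuming the get_all_independence_relationships() helper from the referenced post and the sprinkler model defined earlier:

# Each entry is a pair of variables and a conditioning set under which
# the graph implies they are independent.
for x, y, zs in sprinkler.get_all_independence_relationships():
    print(x, "is independent of", y, "given", zs)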


At this point it is worth emphasizing that causal graphical models are non-parametric: they do not make any assumptions about the functional form of the relationships between variables, only that those relationships exist. Because of this, the only testable assumptions these models make are the conditional independence relationships between the variables. Unfortunately, testing conditional independence in the general case is impossible. Combined with the fact that there are many possible DAGs for even a modest number of variables, discovering causal structure from observational data alone is very difficult.

There are still some interesting approaches to identifying causal structure, but for these notes it is best to think of the main use of causal graphical models as a way of explicitly encoding prior knowledge about the structure of a system, and of using this structure, combined with observational data, to make predictions about the effects of causal interventions.

GitHub:

  • https://github.com/Abhinav2194/-Causal-Representation-Learning

References:

  • https://en.wikipedia.org/wiki/Causal_model

  • http://www.degeneratestate.org/posts/2018/Jul/10/causal-inference-with-python-part-2-causal-graphical-models/





