Causal Discovery & Analysis
The next revolution in data science is causal analysis, a quantum leap beyond correlation and regression.
For ages, the scientific community has struggled to find causal effects in datasets. What they wanted was the answer to one simple question – WHY? This could be in the areas of consumer behavior, social policies or understanding the effectiveness of various medicines and treatments.
Causal analysis goes further than current methods like correlation or regression. Correlational analysis allows you to identify relationships with some accuracy, but it cannot tell you anything about cause and effect.
Causal analysis involves causal discovery and causal inference. It allows you to understand which variables impact your target outcomes. It also lets you control those variables independently or jointly and see how the value of the target variable changes. This lets you build a clear, ROI-based action plan from the causal structure and the optimized target variable.
Backed by machine learning, our solution is more efficient, flexible and powerful than traditional techniques.
Our platform is less laborious than confirmatory Structural Equation Modeling (SEM) or Bayesian inference.
- Analyze a large number of variables in minutes, automatically identifying the key ones causing a target variable
- Our goodness-of-fit measures indicate how well the model fits the data, which supports robust and actionable insights through simulation and prediction
- Lets researchers spend more time synthesizing what the data means for their clients
The output of Inguo’s causal analysis software is a directed acyclic graph (DAG).
- Shows the target variable and all key drivers that cause it, each to a different degree
- An arrow from one node to another indicates the direction of the causal relationship
- Each pathway has a weight, representing the strength of the relationship between variables (higher numbers indicate a stronger relationship)
- By combining these strengths for a path from any node to the target node, the tool sorts variables in order of strength and displays the top ones in the Total Causal Effect list
- RMSEA (Root Mean Square Error of Approximation) indicates how well the model fits the data (lower values indicate a better fit)
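The ranking described in the list above can be illustrated with a small sketch. The page does not say exactly how path strengths are combined, so this example assumes the usual convention for linear models: multiply the weights along each directed path and sum over all paths to the target. The graph, variable names and weights are hypothetical and are not output from the Inguo software.

```python
from functools import lru_cache

# Hypothetical weighted DAG: edges[source][destination] = pathway weight.
edges = {
    "price": {"satisfaction": -0.4, "purchase": -0.2},
    "service": {"satisfaction": 0.6},
    "satisfaction": {"purchase": 0.5},
    "purchase": {},
}
TARGET = "purchase"


@lru_cache(maxsize=None)
def total_effect(node: str) -> float:
    """Sum, over every directed path node -> TARGET, of the product of weights.

    Assumption: effects combine as in a linear model (multiply along a path,
    add across paths). A node with no path to TARGET gets 0.
    """
    if node == TARGET:
        return 1.0
    return sum(w * total_effect(child) for child, w in edges[node].items())


# Rank drivers by the absolute size of their total effect on the target.
ranking = sorted(
    ((name, total_effect(name)) for name in edges if name != TARGET),
    key=lambda item: abs(item[1]),
    reverse=True,
)

for name, effect in ranking:
    print(f"{name}: {effect:+.2f}")
```

In this toy graph, "price" reaches the target through two paths (directly and via "satisfaction"), so its total effect combines both contributions. Memoizing `total_effect` keeps the computation fast because the graph has no cycles.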
Causal – A New Way of Thinking
Judea Pearl, an acclaimed UCLA professor, first began thinking about causal theory in the late 1970s. He discussed a new approach that was different from traditional mathematics.
He believed that we must first acknowledge that there can be up to three potential relations between two variables, not just one, as traditional mathematics assumes.
For example, when there are only two variables A and B, there can be three possible relations:
- A causes B
- B causes A
- There is no causal relation between A and B.
However, when there are three variables, A, B and C, there are 3 pairs of variables (AB, BC and CA). For each pair there are 3 possible causal relationships, meaning there are 3³ = 27 different combinations to consider. As we can see, just by adding a single variable, the number of combinations increases exponentially.
If we follow the same logic, 4 variables give 6 pairs and 3⁶ = 729 combinations. For 6 variables the count is about 14 million, for 8 variables about 23 trillion, and for 14 variables roughly 2.6 × 10⁴³ (a 44-digit number).
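The arithmetic above can be checked with a few lines of Python. This is a minimal sketch of the counting rule described in the text (three possible relations per pair of variables); the function names are illustrative only.

```python
from math import comb, floor, log10


def causal_combinations(n_variables: int) -> int:
    """Count combinations when each pair of variables has 3 possible relations."""
    n_pairs = comb(n_variables, 2)  # number of unordered pairs of variables
    return 3 ** n_pairs


def order_of_magnitude(n_variables: int) -> tuple[float, int]:
    """Return (mantissa, exponent) so that the count is about mantissa * 10**exponent.

    Works on logarithms, so it stays cheap even when the count has
    thousands of digits.
    """
    log_value = comb(n_variables, 2) * log10(3)
    exponent = floor(log_value)
    mantissa = 10 ** (log_value - exponent)
    return mantissa, exponent


for n in (2, 3, 4, 6, 8, 14):
    mantissa, exponent = order_of_magnitude(n)
    print(f"{n} variables: about {mantissa:.1f}e+{exponent} combinations")
```

Note that this count treats every pair independently, exactly as the text does, so it is an upper bound that also includes combinations containing cycles.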
This is precisely why causal analysis isn’t performed in many quantitative studies. The combinatorial explosion is too much for a human researcher, who has to hypothesize each candidate model from their understanding of the industry and test them one by one. Beyond a certain number of variables, even domain experts cannot hypothesize the causal relations reliably and accurately without human bias.
The New Method for Causal Analysis
Based on a relatively new algorithm, Linear, Non-Gaussian, Acyclic Causal Models (LiNGAM), originally proposed by Prof. Shimizu of Shiga University, Japan.
Automatically creates the Directed Acyclic Graph (DAG) with the highest Mixed Information Criterion (MIC) score, using only the observable variables.
Uses the A* search algorithm to narrow down the huge number of candidate DAGs quickly and accurately.
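To give a feel for why a linear, non-Gaussian model can tell "A causes B" from "B causes A", here is a toy two-variable sketch. It is not the Inguo implementation and not the full LiNGAM algorithm: it only illustrates the underlying idea that, with non-Gaussian noise, the regression residual is independent of the regressor in the true causal direction only. The data is synthetic, and the dependence score (correlation between squared values) is a crude proxy chosen for this illustration.

```python
import random
from statistics import mean


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def residual_dependence(cause, effect):
    """Regress effect on cause, then score how dependent the residual is on cause.

    The residual is always uncorrelated with the regressor, so the score
    compares squared values instead (an assumed, simplistic dependence proxy).
    """
    mc, me = mean(cause), mean(effect)
    slope = sum((c - mc) * (e - me) for c, e in zip(cause, effect)) / sum(
        (c - mc) ** 2 for c in cause
    )
    residual = [(e - me) - slope * (c - mc) for c, e in zip(cause, effect)]
    cause_sq = [(c - mc) ** 2 for c in cause]
    residual_sq = [r ** 2 for r in residual]
    return abs(correlation(cause_sq, residual_sq))


# Synthetic data where A really causes B, with uniform (non-Gaussian) noise.
rng = random.Random(0)
a = [rng.uniform(-1, 1) for _ in range(5000)]
b = [x + rng.uniform(-1, 1) for x in a]

score_a_to_b = residual_dependence(a, b)  # close to 0 in the true direction
score_b_to_a = residual_dependence(b, a)  # clearly larger in the wrong direction
direction = "A causes B" if score_a_to_b < score_b_to_a else "B causes A"
print(direction, round(score_a_to_b, 3), round(score_b_to_a, 3))
```

If the noise were Gaussian, both scores would be close to zero and the direction could not be identified this way, which is why the non-Gaussian assumption matters.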
Capabilities of the Inguo Software
Does not need any domain knowledge or experimental setup.
Will minimize the methodological bias commonly found in correlation analysis.
Uses observable data such as individual level sales data, call center data and survey data.
Can process 200 variables (which allow about 5e+9494 possible combinations of causal relationships) and 20,000 records within 24 hours.
Will highlight the root cause of findings in the data.