Causal Inference for Business Decisions
"Correlation does not imply causation"—but business decisions require causal understanding. Here's how to move beyond correlation to actionable insights.
Why Causal Inference?
Standard analytics tells you what happened. Causal inference tells you:
- What would happen if we did X?
- What caused Y to change?
- Should we scale this intervention?
The Fundamental Problem
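We can never observe the same unit under both treatment and control at the same time. Instead we must infer the counterfactual: what would have happened to the treated units without the intervention. Each method below is a strategy for estimating that counterfactual from units we did observe.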
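To make the problem concrete, here is a minimal simulation sketch. Everything in it is hypothetical: the effect size of 2.0, the baseline distribution, and the opt-in rule are assumptions chosen only to show how selection can distort a naive comparison of treated and untreated units.

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical units, each with two potential outcomes:
# y0 (without the intervention) and y1 (with it). True effect is 2.0.
units = []
for _ in range(10_000):
    baseline = random.gauss(10, 2)
    y0 = baseline
    y1 = baseline + 2.0
    # Assumed selection rule: high-baseline units opt in more often.
    treated = random.random() < (0.8 if baseline > 10 else 0.2)
    units.append((y0, y1, treated))

# Only a simulation can see both potential outcomes for every unit.
true_ate = mean(y1 - y0 for y0, y1, _ in units)

# Real data reveals y1 for treated units and y0 for controls only.
observed_treated = mean(y1 for _, y1, t in units if t)
observed_control = mean(y0 for y0, _, t in units if not t)
naive_diff = observed_treated - observed_control

print(f"True average effect: {true_ate:.2f}")
print(f"Naive difference:    {naive_diff:.2f}")
```

Because units with higher baselines are more likely to be treated in this setup, the naive difference overstates the true effect.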
Key Methods
1. Difference-in-Differences
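Compare the change in the outcome for the treatment group, before and after the intervention, with the change for a control group over the same period. The estimate is only credible if both groups would have followed parallel trends without the intervention: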
import pandas as pd
import statsmodels.formula.api as smf
# df is assumed to hold one row per unit and period, with numeric 0/1
# columns 'treatment' and 'post_period' and a numeric 'outcome'
# DiD regression model
model = smf.ols(
    'outcome ~ treatment * post_period',
    data=df
).fit()
# The treatment effect is the interaction coefficient
did_effect = model.params['treatment:post_period']
print(f"Treatment effect: {did_effect:.2f}")
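With one binary treatment indicator, one binary period indicator, and no other covariates, the interaction coefficient equals the difference between the two groups' before/after changes. The sketch below computes that by hand on a tiny made-up dataset; the rows and the helper `cell_mean` are illustrative, not part of any library.

```python
from statistics import mean

# Hypothetical observations: (treatment, post_period, outcome)
rows = [
    (0, 0, 10.0), (0, 0, 12.0),   # control, before
    (0, 1, 13.0), (0, 1, 15.0),   # control, after
    (1, 0, 20.0), (1, 0, 22.0),   # treated, before
    (1, 1, 28.0), (1, 1, 30.0),   # treated, after
]

def cell_mean(treatment, post_period):
    """Average outcome for one group in one period."""
    return mean(y for t, p, y in rows if t == treatment and p == post_period)

# Change over time within each group
control_change = cell_mean(0, 1) - cell_mean(0, 0)
treated_change = cell_mean(1, 1) - cell_mean(1, 0)

# Difference-in-differences: the treated change net of the control change
did_by_hand = treated_change - control_change
print(f"Control change: {control_change}, treated change: {treated_change}")
print(f"DiD estimate: {did_by_hand}")
```

The regression form is still preferable in practice because it provides standard errors and extends to additional covariates.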
2. Propensity Score Matching
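Match each treated unit to a control unit with a similar estimated probability of receiving treatment, then compare outcomes within the matched pairs: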
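import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors
# X_covariates: 2-D array of pre-treatment covariates; treatment: 0/1 labels
treatment = np.asarray(treatment)
# Estimate propensity scores
ps_model = LogisticRegression()
ps_model.fit(X_covariates, treatment)
propensity_scores = ps_model.predict_proba(X_covariates)[:, 1]
# Match each treated unit to its nearest control on the propensity score
control_idx = np.flatnonzero(treatment == 0)
nn = NearestNeighbors(n_neighbors=1)
nn.fit(propensity_scores[control_idx].reshape(-1, 1))
# kneighbors returns (distances, indices); indices point into control_idx
distances, indices = nn.kneighbors(
    propensity_scores[treatment == 1].reshape(-1, 1)
)
matched_controls = control_idx[indices.ravel()]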
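Matching is only the first half of the job; the effect estimate comes from comparing outcomes within the matched pairs. The following is a minimal sketch of that second step using plain Python and made-up numbers. The scores, outcomes, and the helper `nearest_control` are hypothetical, and matching here is done with replacement, which is one of several reasonable choices.

```python
from statistics import mean

# Hypothetical units: (propensity_score, treated, outcome)
units = [
    (0.80, 1, 15.0),
    (0.60, 1, 12.0),
    (0.78, 0, 11.0),
    (0.55, 0, 10.0),
    (0.20, 0, 6.0),
]

treated = [u for u in units if u[1] == 1]
controls = [u for u in units if u[1] == 0]

def nearest_control(score):
    """Control unit whose propensity score is closest to `score`."""
    return min(controls, key=lambda c: abs(c[0] - score))

# Outcome gap between each treated unit and its matched control
pair_differences = [y - nearest_control(ps)[2] for ps, _, y in treated]

# Average treatment effect on the treated (ATT)
att = mean(pair_differences)
print(f"Pair differences: {pair_differences}")
print(f"ATT estimate: {att}")
```

Note that the control with score 0.20 is never used: it has no comparable treated unit, which is exactly the kind of observation matching is meant to set aside.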
3. Regression Discontinuity
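Exploit a sharp cutoff in treatment assignment. Units just above and just below the threshold should be comparable, so a jump in the outcome at the cutoff can be attributed to treatment: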
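# Treatment assigned at score threshold (threshold is the known cutoff value)
df['treated'] = (df['score'] >= threshold).astype(int)
# Center the running variable so the 'treated' coefficient is the jump at the cutoff
df['score_centered'] = df['score'] - threshold
# Local linear regression around threshold
bandwidth = 5  # illustrative value; choose based on the data
near_threshold = (
    (df['score'] >= threshold - bandwidth)
    & (df['score'] <= threshold + bandwidth)
)
model = smf.ols(
    'outcome ~ treated * score_centered',
    data=df[near_threshold]
).fit()
# The estimated discontinuity is the coefficient on 'treated'
rd_effect = model.params['treated']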
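The interacted model above amounts to fitting a separate straight line on each side of the cutoff and measuring the gap between them at the threshold. Here is a small standard-library sketch of that idea. The data are synthetic (a slope of 0.5 on both sides and a built-in jump of 3.0), and `fit_line` is a hypothetical helper written for this example.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

threshold = 50  # hypothetical cutoff

# Synthetic (score, outcome) pairs with a jump of 3.0 at the cutoff
data = [(s, 0.5 * s + (3.0 if s >= threshold else 0.0)) for s in range(45, 56)]

# Center scores on the threshold so each intercept is the fitted value at the cutoff
left = [(s - threshold, y) for s, y in data if s < threshold]
right = [(s - threshold, y) for s, y in data if s >= threshold]

left_intercept, _ = fit_line(*zip(*left))
right_intercept, _ = fit_line(*zip(*right))

# The discontinuity is the gap between the two fitted lines at the cutoff
rd_effect = right_intercept - left_intercept
print(f"Estimated jump at threshold: {rd_effect:.2f}")
```

With real, noisy data the estimate is sensitive to the bandwidth, so it is worth re-running the fit over a few bandwidths to see whether the jump is stable.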
4. Instrumental Variables
Use an instrument: a variable that shifts treatment, affects the outcome only through treatment, and is unrelated to unobserved confounders:
from linearmodels.iv import IV2SLS
# Two-stage least squares; 'controls' stands in for the exogenous covariate column(s) in df
model = IV2SLS.from_formula(
    'outcome ~ 1 + controls + [treatment ~ instrument]',
    data=df
).fit()
print(model.summary)
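With a single instrument and no controls, two-stage least squares reduces to a ratio of covariances: the instrument's covariance with the outcome divided by its covariance with the treatment. The sketch below illustrates this on constructed data. The true effect of 3, the confounder, and the `covariance` helper are all assumptions made for the example; in real data the confounder would be unobserved.

```python
from statistics import mean

def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b)) / (len(a) - 1)

# Constructed data: the confounder drives both treatment and outcome,
# while the instrument is balanced against it by design.
instrument = [0, 0, 0, 0, 1, 1, 1, 1]
confounder = [1, -1, 1, -1, 1, -1, 1, -1]
treatment = [2 * z + u for z, u in zip(instrument, confounder)]
outcome = [3 * x + 4 * u for x, u in zip(treatment, confounder)]  # true effect: 3

# Ordinary regression slope, which absorbs the confounder's influence
ols_estimate = covariance(treatment, outcome) / covariance(treatment, treatment)

# IV (Wald) estimator: uses only the variation in treatment driven by the instrument
iv_estimate = covariance(instrument, outcome) / covariance(instrument, treatment)

print(f"OLS estimate: {ols_estimate:.2f}")
print(f"IV estimate:  {iv_estimate:.2f}")
```

The instrument recovers the built-in effect here only because it was constructed to be uncorrelated with the confounder; with real data that condition has to be argued from domain knowledge, since it cannot be checked directly.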
Practical Applications
- Marketing: True lift from campaigns
- Operations: Process change impact
- HR: Training program effectiveness
- Pricing: Price elasticity estimation
Common Pitfalls
- Confusing prediction with causation
- Ignoring selection bias
- Assuming parallel trends without checking pre-intervention data
- Over-controlling (adjusting for mediators blocks part of the effect being measured)
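As an illustration of the parallel-trends pitfall, one informal check is to see whether the gap between the groups was stable before the intervention. The sketch below fits a slope to that gap across pre-intervention periods. The group means and the helper `gap_slope` are hypothetical, and a flat pre-period gap makes the assumption more plausible without proving it holds afterwards.

```python
from statistics import mean

def gap_slope(periods, treated_means, control_means):
    """Least-squares slope of the treated-minus-control gap over time."""
    gaps = [t - c for t, c in zip(treated_means, control_means)]
    p_bar, g_bar = mean(periods), mean(gaps)
    sxy = sum((p - p_bar) * (g - g_bar) for p, g in zip(periods, gaps))
    sxx = sum((p - p_bar) ** 2 for p in periods)
    return sxy / sxx

# Hypothetical pre-intervention period averages for each group
pre_periods = [1, 2, 3, 4]
treated_means = [20.0, 21.0, 22.0, 23.0]
parallel_controls = [10.0, 11.0, 12.0, 13.0]   # same trend as treated
diverging_controls = [10.0, 10.5, 11.0, 11.5]  # slower trend than treated

parallel_slope = gap_slope(pre_periods, treated_means, parallel_controls)
diverging_slope = gap_slope(pre_periods, treated_means, diverging_controls)

print(f"Gap slope with parallel controls:  {parallel_slope:.2f}")
print(f"Gap slope with diverging controls: {diverging_slope:.2f}")
```

A slope near zero is consistent with parallel trends; a clearly nonzero slope suggests a difference-in-differences estimate would pick up the pre-existing divergence along with any treatment effect.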
Causal thinking should inform any analysis that aims to guide action.