Causal Inference for Business Decisions

2025.04.15 ANALYSIS

"Correlation does not imply causation"—but business decisions require causal understanding. Here's how to move beyond correlation to actionable insights.

Why Causal Inference?

Standard analytics describes what happened. Causal inference aims to answer questions like:

What would happen if we did X?
What caused Y to change?
Should we scale this intervention?

The Fundamental Problem

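We can never observe the same unit both with and without the intervention at the same time. The counterfactual (what would have happened to the treated units without the intervention) must therefore be estimated from other units or other time periods, and each method below does so under its own assumptions.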
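A small simulation makes the problem concrete. This is a sketch under made-up assumptions: every hypothetical unit gets both potential outcomes, `y0` without the intervention and `y1` with it, and the true effect is 1.0 for everyone. The estimator, however, only sees the one outcome each unit would actually reveal. A plain difference in means recovers the effect when treatment is randomized, and it overstates the effect when units with higher baselines opt in.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical potential outcomes: y0 without the intervention, y1 with it.
# The true effect is 1.0 for every unit (an assumption of this simulation).
baseline = rng.normal(10.0, 2.0, size=n)
y0 = baseline + rng.normal(0.0, 1.0, size=n)
y1 = y0 + 1.0
true_ate = np.mean(y1 - y0)  # only computable because this is simulated

def naive_difference(treated):
    # Real data reveal y1 for treated units and y0 for controls, never both
    observed = np.where(treated, y1, y0)
    return observed[treated].mean() - observed[~treated].mean()

# Random assignment: treatment is independent of the potential outcomes
randomized = rng.random(n) < 0.5
# Self-selection: units with a high baseline opt in more often
self_selected = rng.random(n) < 1 / (1 + np.exp(-(baseline - 10.0)))

print(f"True ATE:               {true_ate:.2f}")
print(f"Randomized estimate:    {naive_difference(randomized):.2f}")
print(f"Self-selected estimate: {naive_difference(self_selected):.2f}")
```

Under self-selection the naive gap mixes the treatment effect with the baseline difference between those who opted in and those who did not, which is exactly the selection bias the methods below try to remove.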

Key Methods

1. Difference-in-Differences

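Compare how outcomes change from before to after the intervention in the treatment group with the change over the same period in a control group. The difference between those two changes estimates the treatment effect, provided both groups would have followed parallel trends without the intervention: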

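import pandas as pd
import statsmodels.formula.api as smf

# df: one row per unit and period, with columns
#   outcome, treatment (1 = treated group), post_period (1 = after intervention)
# Use 0/1 integers so the interaction term is named 'treatment:post_period'
model = smf.ols(
    'outcome ~ treatment * post_period',
    data=df
).fit()

# The treatment effect is the interaction coefficient
did_effect = model.params['treatment:post_period']
print(f"Treatment effect: {did_effect:.2f}")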
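With two groups and two periods, the interaction coefficient is exactly the difference between the groups' before/after changes in mean outcome. The sketch below checks this on simulated data. The panel layout, effect sizes, and noise are illustrative assumptions, and it uses `numpy.linalg.lstsq` in place of statsmodels only to keep dependencies small.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical panel: 200 units, each observed once before and once after
n_units = 200
units = pd.DataFrame({
    "unit": np.arange(n_units),
    "treatment": (np.arange(n_units) < n_units // 2).astype(int),
})
df = units.merge(pd.DataFrame({"post_period": [0, 1]}), how="cross")

# Simulated outcome: group gap of 2, common trend of 1.5, true effect of 3
df["outcome"] = (
    5.0
    + 2.0 * df["treatment"]
    + 1.5 * df["post_period"]
    + 3.0 * df["treatment"] * df["post_period"]
    + rng.normal(0.0, 1.0, size=len(df))
)

# 2x2 difference-in-differences from the four group means
means = df.groupby(["treatment", "post_period"])["outcome"].mean()
did_means = (means[(1, 1)] - means[(1, 0)]) - (means[(0, 1)] - means[(0, 0)])

# Same quantity from OLS with an interaction term
X = np.column_stack([
    np.ones(len(df)),
    df["treatment"],
    df["post_period"],
    df["treatment"] * df["post_period"],
])
coef, *_ = np.linalg.lstsq(X, df["outcome"].to_numpy(), rcond=None)
did_reg = coef[3]

print(f"DiD from means:      {did_means:.3f}")
print(f"DiD from regression: {did_reg:.3f}")
```

The two numbers agree because the regression with both main effects and their interaction fits each of the four group means exactly; the regression form is still useful because it extends naturally to extra controls and to standard errors.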

2. Propensity Score Matching

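Estimate each unit's probability of treatment given its observed covariates (the propensity score), then pair each treated unit with the control unit whose score is closest. This adjusts only for confounders you have measured: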

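import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# X_covariates: (n, k) array of pre-treatment covariates
# treatment: (n,) array of 0/1 indicators; y: (n,) outcome array (assumed)
treatment = np.asarray(treatment)

# Estimate propensity scores, P(treatment = 1 | covariates)
ps_model = LogisticRegression(max_iter=1000)
ps_model.fit(X_covariates, treatment)
propensity_scores = ps_model.predict_proba(X_covariates)[:, 1]

# Match each treated unit to its nearest control on the score (with replacement)
treated_idx = np.where(treatment == 1)[0]
control_idx = np.where(treatment == 0)[0]
nn = NearestNeighbors(n_neighbors=1)
nn.fit(propensity_scores[control_idx].reshape(-1, 1))
distances, indices = nn.kneighbors(
    propensity_scores[treated_idx].reshape(-1, 1)
)
# kneighbors returns positions within the control subset; map back to rows
matched_control_idx = control_idx[indices.ravel()]

# Average treatment effect on the treated (ATT)
att = np.mean(y[treated_idx] - y[matched_control_idx])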
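To see matching remove confounding, here is a minimal end-to-end run on simulated data. It assumes a single hypothetical confounder `x` that raises both the odds of treatment and the outcome, with a true treatment effect of 2.0. The naive treated-versus-control gap absorbs the effect of `x`; matching each treated unit to its nearest control on the estimated score brings the estimate back toward 2.0.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
n = 4000

# Hypothetical data: confounder x raises both treatment odds and the outcome
x = rng.normal(size=n)
treatment = rng.binomial(1, 1 / (1 + np.exp(-1.5 * x)))
y = 2.0 * treatment + 3.0 * x + rng.normal(size=n)  # true effect = 2.0

# Propensity scores from a logistic model of treatment on x
ps_model = LogisticRegression(max_iter=1000)
ps_model.fit(x.reshape(-1, 1), treatment)
ps = ps_model.predict_proba(x.reshape(-1, 1))[:, 1]

# 1-nearest-neighbor matching on the score, with replacement
treated_idx = np.where(treatment == 1)[0]
control_idx = np.where(treatment == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(ps[control_idx].reshape(-1, 1))
_, indices = nn.kneighbors(ps[treated_idx].reshape(-1, 1))
matched = control_idx[indices.ravel()]

naive = y[treatment == 1].mean() - y[treatment == 0].mean()
att = np.mean(y[treated_idx] - y[matched])
print(f"Naive difference: {naive:.2f}")
print(f"Matched ATT:      {att:.2f}")
```

Because controls can be reused, this estimates the average treatment effect on the treated (ATT). It works here only because the single confounder is observed and included in the propensity model.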

3. Regression Discontinuity

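When treatment switches on at a sharp cutoff in a score, units just above and just below the threshold are otherwise similar, so the jump in outcomes at the cutoff estimates the treatment effect for units near the threshold: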

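import statsmodels.formula.api as smf

# threshold: the score cutoff at which treatment switches on
# Use 0/1 integers so the coefficient is named 'treated'
df['treated'] = (df['score'] >= threshold).astype(int)
df['score_centered'] = df['score'] - threshold

# Local linear regression within a bandwidth around the threshold
bandwidth = 5
near_threshold = df['score'].between(threshold - bandwidth,
                                     threshold + bandwidth)

model = smf.ols(
    'outcome ~ treated * score_centered',
    data=df[near_threshold]
).fit()

# The jump in outcome at the cutoff is the coefficient on 'treated'
rd_effect = model.params['treated']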
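Because the estimate depends on the bandwidth, it is worth checking how it moves as the window changes. The sketch below simulates a hypothetical sharp cutoff at a score of 50 with a true jump of 4.0 and fits the same local linear specification (separate slopes on each side of the cutoff) with `numpy.linalg.lstsq` at several bandwidths.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
threshold = 50.0  # hypothetical cutoff

# Hypothetical data: outcome rises smoothly with score, plus a jump of 4.0
score = rng.uniform(0, 100, size=n)
treated = (score >= threshold).astype(float)
outcome = 0.1 * score + 4.0 * treated + rng.normal(0.0, 1.0, size=n)

def rd_estimate(bandwidth):
    """Local linear RD estimate with separate slopes on each side."""
    centered = score - threshold
    window = np.abs(centered) <= bandwidth
    X = np.column_stack([
        np.ones(window.sum()),
        treated[window],
        centered[window],
        treated[window] * centered[window],
    ])
    coef, *_ = np.linalg.lstsq(X, outcome[window], rcond=None)
    return coef[1]  # jump in the fitted line at the cutoff

for bw in (2, 5, 10):
    print(f"bandwidth={bw:>2}: estimated jump = {rd_estimate(bw):.2f}")
```

Narrower windows lean less on the linear functional form but use fewer observations, so their estimates are noisier. Estimates that swing widely across reasonable bandwidths are a warning sign.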

4. Instrumental Variables

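Use an instrument that shifts the treatment but affects the outcome only through the treatment (the exclusion restriction) and is unrelated to unobserved confounders: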

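from linearmodels.iv import IV2SLS

# Two-stage least squares: 'treatment' is instrumented by 'instrument'.
# 'controls' is a placeholder for the exogenous control columns in df;
# list them explicitly in the formula (e.g. 'x1 + x2').
model = IV2SLS.from_formula(
    'outcome ~ 1 + controls + [treatment ~ instrument]',
    data=df
).fit()

print(model.summary)
iv_effect = model.params['treatment']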
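The two stages can be written out by hand to show what the estimator does. This sketch uses simulated data and assumes an unobserved confounder `u` and an instrument `z` that moves the treatment `d` but enters the outcome only through `d`. All coefficients are made up; the true effect is 1.5.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

# Hypothetical data: unobserved confounder u drives treatment and outcome
u = rng.normal(size=n)
z = rng.normal(size=n)                      # instrument: shifts treatment only
d = 0.8 * z + u + rng.normal(size=n)        # treatment
y = 1.5 * d + 2.0 * u + rng.normal(size=n)  # outcome; true effect = 1.5

def ols(X, target):
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

ones = np.ones(n)

# Naive OLS of y on d is biased because u is omitted
beta_ols = ols(np.column_stack([ones, d]), y)[1]

# Stage 1: predict the treatment from the instrument
Z = np.column_stack([ones, z])
d_hat = Z @ ols(Z, d)
# Stage 2: regress the outcome on the predicted treatment
beta_2sls = ols(np.column_stack([ones, d_hat]), y)[1]

print(f"OLS estimate:  {beta_ols:.2f}")
print(f"2SLS estimate: {beta_2sls:.2f}")
```

Running the second stage by hand gives the right point estimate but not the right standard errors, because it ignores that `d_hat` was itself estimated; a dedicated routine such as `IV2SLS` handles inference correctly.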

Practical Applications

Marketing: True lift from campaigns
Operations: Process change impact
HR: Training program effectiveness
Pricing: Price elasticity estimation

Common Pitfalls

Confusing prediction with causation
Ignoring selection bias
Assuming parallel trends without checking pre-intervention trends
Over-controlling (blocking mediators)
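Over-controlling is easy to demonstrate with a hypothetical training program (all numbers invented for illustration). Training is randomized and raises a mediator, `skill`, which in turn raises `productivity`. Regressing productivity on training alone recovers the total effect; adding `skill` as a control blocks the very channel through which training works and pushes the training coefficient toward zero.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000

# Hypothetical causal chain: training -> skill -> productivity
training = rng.binomial(1, 0.5, size=n).astype(float)  # randomized
skill = 2.0 * training + rng.normal(size=n)            # mediator
productivity = 1.5 * skill + rng.normal(size=n)        # total effect = 3.0

def coef_on_first(*regressors):
    # OLS of productivity on an intercept plus the given regressors;
    # returns the coefficient on the first regressor
    X = np.column_stack([np.ones(n), *regressors])
    coef, *_ = np.linalg.lstsq(X, productivity, rcond=None)
    return coef[1]

total_effect = coef_on_first(training)            # close to 2.0 * 1.5 = 3.0
over_controlled = coef_on_first(training, skill)  # close to 0: mediator blocked

print(f"Training effect, no controls:      {total_effect:.2f}")
print(f"Training effect, controlling skill: {over_controlled:.2f}")
```

Controlling for `skill` here does not remove bias; it changes the question from "what does training do?" to "what does training do beyond raising skill?", which is rarely the decision being made.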

Causal thinking should inform any analysis that aims to guide action.