Anomaly Detection in Business Data

Anomalies signal problems—or opportunities. Detecting unusual patterns in business data helps catch fraud, identify system issues, and surface unexpected trends before they become critical.

Types of Anomalies

  • Point anomalies: Individual values far outside the normal range
  • Contextual anomalies: Values that are normal in general but anomalous in their context (e.g., time of day or season)
  • Collective anomalies: Groups of values that are individually normal but jointly break expected behavior (all three are illustrated in the sketch below)

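A small invented example helps tell the three apart (nothing here is real data):

# Invented daily order counts, Mon-Sun over two weeks
orders = [120, 118, 125, 122, 119, 40, 38,    # week 1: weekdays high, weekend low
          121, 900, 117, 123, 120, 39, 121]   # week 2

# Point anomaly: 900 is extreme no matter when it occurs.
# Contextual anomaly: the final 121 lands on a Sunday, where ~40 is
# typical; the value itself is normal, its context is not.
# Collective anomaly: fourteen straight days of exactly 120 would be
# suspect even though each individual value looks fine.
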
Statistical Methods

import numpy as np
from scipy import stats

def z_score_anomalies(data, threshold=3):
    """Detect anomalies using z-scores."""
    z_scores = np.abs(stats.zscore(data))
    return z_scores > threshold

def iqr_anomalies(data, k=1.5):
    """Detect anomalies using the interquartile range (Tukey fences)."""
    data = np.asarray(data)
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    lower = q1 - k * iqr  # k=1.5 is the conventional Tukey fence
    upper = q3 + k * iqr
    return (data < lower) | (data > upper)

def modified_zscore(data, threshold=3.5):
    """MAD-based z-score; more robust to existing outliers."""
    data = np.asarray(data)
    median = np.median(data)
    mad = np.median(np.abs(data - median))
    if mad == 0:  # constant data (or >50% identical values): nothing to flag
        return np.zeros(data.shape, dtype=bool)
    # 0.6745 scales the MAD to match the standard deviation for normal data
    modified_z = 0.6745 * (data - median) / mad
    return np.abs(modified_z) > threshold
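
For a quick comparison, all three detectors can run on the same synthetic series (the seed and values below are arbitrary):

# Illustrative comparison on synthetic data
rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=5, size=200)
data[42] = 120  # inject a single point anomaly

for detect in (z_score_anomalies, iqr_anomalies, modified_zscore):
    print(detect.__name__, np.flatnonzero(detect(data)))
# Index 42 should be flagged by all three; the methods may disagree
# on borderline points in the tails.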

Machine Learning Approaches

from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

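# X is assumed to be a numeric feature matrix of shape
# (n_samples, n_features), prepared during earlier feature engineering.
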
# Isolation Forest
iso_forest = IsolationForest(
    contamination=0.01,  # Expected anomaly ratio
    random_state=42
)
predictions = iso_forest.fit_predict(X)
anomalies = predictions == -1

# Local Outlier Factor
lof = LocalOutlierFactor(
    n_neighbors=20,
    contamination=0.01
)
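# Note: with the default novelty=False, LOF only scores the data it was
# fit on; set novelty=True and use predict() to score new samples.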
predictions = lof.fit_predict(X)
anomalies = predictions == -1

# Autoencoder for complex patterns
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense

def build_autoencoder(input_dim):
    input_layer = Input(shape=(input_dim,))
    encoded = Dense(32, activation='relu')(input_layer)
    encoded = Dense(16, activation='relu')(encoded)
    decoded = Dense(32, activation='relu')(encoded)
    decoded = Dense(input_dim, activation='linear')(decoded)
    
    autoencoder = Model(input_layer, decoded)
    autoencoder.compile(optimizer='adam', loss='mse')
    return autoencoder

# Train the autoencoder to reconstruct X (scale features first, e.g. to
# [0, 1]); anomalies are the points it reconstructs worst
autoencoder = build_autoencoder(X.shape[1])
autoencoder.fit(X, X, epochs=50, batch_size=256, verbose=0)  # illustrative settings

reconstruction_error = np.mean((X - autoencoder.predict(X))**2, axis=1)
threshold = np.percentile(reconstruction_error, 99)  # flag the worst 1%
anomalies = reconstruction_error > threshold

Time Series Anomalies

from prophet import Prophet

def detect_time_series_anomalies(df, interval_width=0.99):
    """Use Prophet for time series anomaly detection.

    Expects Prophet's standard input: a dataframe with columns
    'ds' (timestamps) and 'y' (values).
    """
    model = Prophet(interval_width=interval_width)
    model.fit(df)

    forecast = model.predict(df[['ds']])

    # Anomalies fall outside the model's uncertainty interval
    df['anomaly'] = (
        (df['y'].values < forecast['yhat_lower'].values) |
        (df['y'].values > forecast['yhat_upper'].values)
    )

    return df
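
A minimal usage sketch with placeholder data (any dataframe with 'ds' and 'y' columns works):

import pandas as pd

# Placeholder series: 60 days of noisy but stable values
df = pd.DataFrame({
    'ds': pd.date_range('2024-01-01', periods=60, freq='D'),
    'y': np.random.default_rng(1).normal(100, 10, 60),
})
flagged = detect_time_series_anomalies(df)
print(flagged[flagged['anomaly']])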

Practical Considerations

  1. False positive cost: Too many alerts cause alert fatigue, and real issues get buried
  2. False negative cost: Missed anomalies can mean undetected fraud or outages
  3. Seasonality: Account for expected patterns (weekly cycles, holidays) before flagging deviations
  4. Drift: Retrain models as distributions change; one drift check is sketched below
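
One way to check for drift (an illustrative approach, not the only one) is a two-sample test comparing recent data against the training window:

from scipy.stats import ks_2samp

def needs_retraining(train_sample, recent_sample, alpha=0.01):
    """Flag drift via a two-sample Kolmogorov-Smirnov test."""
    statistic, p_value = ks_2samp(train_sample, recent_sample)
    return p_value < alpha  # low p-value: distributions likely differ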

Alerting Strategy

  • Severity levels for different anomaly types
  • Cooldown periods to prevent alert storms (see the sketch after this list)
  • Context-rich notifications
  • Clear escalation paths
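
As a sketch of the cooldown idea (the class name and window are hypothetical), repeated alerts for the same key can be suppressed until a window elapses:

import time

class CooldownAlerter:
    """Suppress repeat alerts for the same key within a cooldown window."""

    def __init__(self, cooldown_seconds=900):
        self.cooldown = cooldown_seconds
        self.last_sent = {}  # alert key -> timestamp of last alert sent

    def should_alert(self, key):
        now = time.time()
        if now - self.last_sent.get(key, 0) >= self.cooldown:
            self.last_sent[key] = now
            return True
        return False  # still cooling down; suppress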

The goal isn't to catch every anomaly—it's to surface actionable insights.
