Analysing Fraud Patterns: A Data-Driven Approach

Fraud detection in modern fintech requires sophisticated data analysis. This article explores how to use SQL, Python, and data visualization tools to identify fraud patterns, build detection models, and create actionable insights for fraud prevention teams.

The Data-Driven Fraud Detection Framework

Effective fraud detection requires a systematic approach to data analysis (a rough pipeline sketch in code follows the list):

  1. Data Collection: Gathering transaction, account, and behavioural data
  2. Feature Engineering: Creating meaningful indicators from raw data
  3. Pattern Recognition: Identifying suspicious patterns and anomalies
  4. Model Development: Building detection models
  5. Visualization: Creating dashboards and reports
  6. Action: Taking preventive or investigative action
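
To make the framework concrete, here is a rough Python sketch of how the steps chain together. The helper functions it calls (detect_anomalies, analyse_transaction_network) are developed later in this article; treat this as an outline, not a finished implementation:

import pandas as pd

def run_fraud_pipeline(connection, query):
    """Illustrative pipeline tying the six framework steps together."""
    # 1. Data collection
    transactions = pd.read_sql(query, connection)
    # 2-4. Feature engineering, pattern recognition, and model scoring
    anomalies = detect_anomalies(transactions)                      # defined below
    suspicious, graph = analyse_transaction_network(transactions)   # defined below
    # 5. Visualization: dashboards are built from anomalies and the graph
    # 6. Action: hand the combined high-risk set to investigators
    return set(anomalies['account_id']) | suspicious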

SQL for Fraud Pattern Analysis

SQL is essential for querying transaction databases and identifying patterns. Here are key techniques:

Transaction Velocity Analysis

Identifying rapid transactions:

-- Accounts with many transactions in a short time window
-- (MySQL syntax: TIMESTAMPDIFF and alias references in HAVING vary by dialect)
SELECT 
    account_id,
    COUNT(*) as transaction_count,
    SUM(amount) as total_amount,
    MIN(transaction_time) as first_transaction,
    MAX(transaction_time) as last_transaction,
    TIMESTAMPDIFF(HOUR, MIN(transaction_time), MAX(transaction_time)) as hours_span
FROM transactions
WHERE transaction_date = CURRENT_DATE
GROUP BY account_id
HAVING transaction_count > 20 
   AND hours_span < 24
ORDER BY transaction_count DESC;

Use Cases:

  • Detecting fan-out patterns (one account rapidly dispersing funds to many)
  • Identifying money mule activity
  • Finding account takeover indicators

Structuring Detection

Structuring means splitting payments so that each stays just below a reporting threshold. Finding transactions just below the $10,000 mark:

SELECT 
    account_id,
    COUNT(*) as transaction_count,
    AVG(amount) as avg_amount,
    MAX(amount) as max_amount
FROM transactions
WHERE amount BETWEEN 9000 AND 9999  -- Just below $10k threshold
GROUP BY account_id
HAVING transaction_count >= 5
ORDER BY transaction_count DESC;

Cross-Account Pattern Analysis

Identifying related accounts:

-- Pairs of accounts repeatedly sharing counterparties on the same day
WITH account_relationships AS (
    SELECT 
        t1.account_id as account_a,
        t2.account_id as account_b,
        COUNT(*) as shared_transactions
    FROM transactions t1
    JOIN transactions t2 
        ON t1.counterparty = t2.counterparty
        AND t1.account_id < t2.account_id
    WHERE t1.transaction_date = t2.transaction_date
    GROUP BY t1.account_id, t2.account_id
    HAVING shared_transactions >= 3
)
SELECT * FROM account_relationships
ORDER BY shared_transactions DESC;

Python for Advanced Analysis

Python enables sophisticated fraud analysis beyond SQL capabilities:

Feature Engineering

Creating fraud indicators:

import pandas as pd
import numpy as np

def create_fraud_features(df):
    """
    Create features for fraud detection
    """
    features = df.sort_values(['account_id', 'timestamp']).copy()

    # Transaction velocity: rolling count of transactions in the trailing hour
    features['transactions_last_hour'] = (
        features.groupby('account_id')
                .rolling('1h', on='timestamp')['amount']
                .count()
                .reset_index(level=0, drop=True)
    )

    # Amount patterns: deviation from the account's own mean, in standard deviations
    features['amount_deviation'] = features.groupby('account_id')['amount'].transform(
        lambda x: (x - x.mean()) / x.std()
    )

    # Time-based features
    features['hour'] = features['timestamp'].dt.hour
    features['day_of_week'] = features['timestamp'].dt.dayofweek
    features['is_weekend'] = features['day_of_week'].isin([5, 6])

    # Geographic features: True when the location differs from the previous one
    features['location_change'] = features.groupby('account_id')['location'].transform(
        lambda x: x.ne(x.shift(1)) & x.shift(1).notna()
    )

    return features

Pattern Detection with Machine Learning

Anomaly detection:

from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

def detect_anomalies(transaction_data):
    """
    Use Isolation Forest for anomaly detection
    """
    # Prepare features
    features = create_fraud_features(transaction_data)
    feature_columns = ['amount', 'transactions_last_hour', 
                      'amount_deviation', 'is_weekend']
    X = features[feature_columns].fillna(0)
    
    # Scale features
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    
    # Train isolation forest
    iso_forest = IsolationForest(contamination=0.01, random_state=42)
    anomalies = iso_forest.fit_predict(X_scaled)
    
    # Add predictions (lower anomaly_score means more anomalous)
    features['anomaly_score'] = iso_forest.score_samples(X_scaled)
    features['is_anomaly'] = anomalies == -1
    
    return features[features['is_anomaly']]

Network Analysis

Identifying money mule networks:

import networkx as nx

def analyse_transaction_network(transactions):
    """
    Build and analyse transaction network
    """
    # Create graph
    G = nx.DiGraph()
    
    # Add edges (transactions)
    for _, row in transactions.iterrows():
        G.add_edge(row['from_account'], row['to_account'], 
                  weight=row['amount'], time=row['timestamp'])
    
    # Find suspicious patterns

    # High in-degree (receiving from many)
    in_degree = dict(G.in_degree())
    high_in_degree = [acc for acc, deg in in_degree.items() if deg > 10]

    # High out-degree (sending to many)
    out_degree = dict(G.out_degree())
    high_out_degree = [acc for acc, deg in out_degree.items() if deg > 10]

    # Centrality: accounts sitting on many payment paths between others
    betweenness = nx.betweenness_centrality(G)
    high_betweenness = [acc for acc, cent in betweenness.items() 
                        if cent > 0.1]

    suspicious_accounts = set(high_in_degree + high_out_degree + high_betweenness)

    return suspicious_accounts, G

Data Visualization for Fraud Insights

Effective visualization helps fraud analysts quickly identify patterns:

Transaction Flow Visualization

Using network graphs (a minimal plotting sketch follows this list):

  • Show connections between accounts
  • Highlight suspicious clusters
  • Visualise fund flows
  • Identify central nodes (potential money mules)
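
As a minimal sketch, the graph and suspicious-account set produced by analyse_transaction_network (defined above) can be drawn with networkx and matplotlib; the layout and styling choices here are illustrative:

import networkx as nx
import matplotlib.pyplot as plt

def plot_transaction_network(G, suspicious_accounts):
    """Draw the transaction graph, highlighting suspicious accounts in red."""
    pos = nx.spring_layout(G, seed=42)  # force-directed layout
    colours = ['red' if node in suspicious_accounts else 'lightblue'
               for node in G.nodes()]
    nx.draw(G, pos, node_color=colours, node_size=50,
            arrows=True, with_labels=False)
    plt.title('Transaction network (suspicious accounts in red)')
    plt.show()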

Time-Series Analysis

Transaction patterns over time (a spike-detection sketch follows this list):

  • Identify unusual spikes
  • Detect seasonal patterns
  • Compare to historical baselines
  • Highlight anomalies
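
A minimal pandas sketch of the baseline comparison, assuming a datetime timestamp column: resample daily volumes and flag days far above a 30-day rolling baseline (the window and threshold values are assumptions, not recommendations):

def flag_volume_spikes(transactions, z_threshold=3.0):
    """Flag days whose transaction count deviates sharply from a rolling baseline."""
    daily = transactions.set_index('timestamp').resample('D').size()
    baseline = daily.rolling(30, min_periods=7).mean()
    spread = daily.rolling(30, min_periods=7).std()
    z_scores = (daily - baseline) / spread
    return daily[z_scores > z_threshold]  # days with unusual spikes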

Geographic Analysis

Location-based patterns (an impossible-travel check is sketched after this list):

  • Map transaction origins and destinations
  • Identify impossible travel (same-day transactions in distant locations)
  • Highlight geographic clusters
  • Detect cross-border patterns
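
The impossible-travel check can be computed directly: derive the implied speed between consecutive transactions on the same account and flag anything faster than air travel. A sketch, assuming hypothetical latitude and longitude columns:

import numpy as np

def flag_impossible_travel(df, max_speed_kmh=900):
    """Flag transactions implying faster-than-aircraft travel between locations."""
    df = df.sort_values(['account_id', 'timestamp']).copy()
    grouped = df.groupby('account_id')
    lat1 = np.radians(grouped['latitude'].shift())
    lat2 = np.radians(df['latitude'])
    dlon = np.radians(df['longitude']) - np.radians(grouped['longitude'].shift())
    # Haversine great-circle distance in kilometres
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2)
    distance_km = 6371 * 2 * np.arcsin(np.sqrt(a))
    hours = grouped['timestamp'].diff().dt.total_seconds() / 3600
    return df[distance_km / hours > max_speed_kmh]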

Dashboard Creation

Key metrics to track (the first two are sketched in code after this list):

  • Fraud detection rate
  • False positive rate
  • Average investigation time
  • Top fraud patterns
  • Risk score distribution
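
As a hedged sketch, the first two metrics can be computed from labelled investigation outcomes, assuming hypothetical boolean flagged and confirmed_fraud columns:

def detection_metrics(cases):
    """Compute detection rate and false positive rate from labelled cases."""
    flagged = cases['flagged']
    fraud = cases['confirmed_fraud']
    detection_rate = (flagged & fraud).sum() / fraud.sum()          # recall
    false_positive_rate = (flagged & ~fraud).sum() / (~fraud).sum()
    return detection_rate, false_positive_rate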

Real-World Application: Case Study

Scenario: Detecting Money Mule Network

Step 1: Data Collection

  • Gather 30 days of transaction data
  • Include account metadata
  • Collect device and location data

Step 2: SQL Analysis

-- Find accounts receiving from many sources
SELECT 
    to_account,
    COUNT(DISTINCT from_account) as unique_senders,
    SUM(amount) as total_received,
    AVG(amount) as avg_amount
FROM transactions
WHERE transaction_date >= DATE_SUB(CURRENT_DATE, INTERVAL 30 DAY)
GROUP BY to_account
HAVING unique_senders > 5
ORDER BY unique_senders DESC;

Step 3: Python Analysis

# Load data
transactions = pd.read_sql(query, connection)

# Network analysis
suspicious_accounts, graph = analyse_transaction_network(transactions)

# Anomaly detection (detect_anomalies builds features internally)
anomalies = detect_anomalies(transactions)

# Combine results
high_risk_accounts = suspicious_accounts | set(anomalies['account_id'])

Step 4: Visualization

  • Create network graph showing connections
  • Generate time-series of transaction volumes
  • Map geographic distribution
  • Build risk score dashboard

Step 5: Action

  • Flag high-risk accounts for investigation
  • Generate suspicious activity reports
  • Implement additional monitoring
  • Update detection rules

Best Practices

Data Quality

  • Ensure data completeness
  • Handle missing values appropriately
  • Validate data accuracy
  • Maintain data lineage

Performance Optimization

  • Index database columns used in queries
  • Use appropriate data types
  • Optimise query performance
  • Cache frequently used results

Model Maintenance

  • Regularly retrain models (a sliding-window sketch follows this list)
  • Monitor model performance
  • Update features as patterns evolve
  • A/B test new approaches
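
A small illustration of sliding-window retraining, reusing the detect_anomalies function from earlier (the 90-day window is an assumption, not a recommendation):

import pandas as pd

def retrain_on_recent_window(transactions, days=90):
    """Refit the anomaly model on only the most recent window of data."""
    cutoff = transactions['timestamp'].max() - pd.Timedelta(days=days)
    recent = transactions[transactions['timestamp'] >= cutoff]
    return detect_anomalies(recent)  # refits IsolationForest on fresh data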

Documentation

  • Document analysis methodologies
  • Maintain query libraries
  • Create reusable functions
  • Share insights across team

Tools and Technologies

SQL Databases

  • PostgreSQL
  • MySQL
  • SQL Server
  • BigQuery

Python Libraries

  • pandas: Data manipulation
  • numpy: Numerical computing
  • scikit-learn: Machine learning
  • networkx: Network analysis
  • matplotlib/seaborn: Visualization

Visualization Tools

  • Splunk: Log analysis and visualization
  • Wazuh: Security monitoring
  • Tableau: Business intelligence
  • Custom dashboards

Conclusion

Data-driven fraud detection requires combining SQL for efficient data querying, Python for advanced analysis, and visualization tools for insights. The key is to think systematically about fraud patterns and build detection capabilities that evolve with new threats.

Ultimately, the goal is a repeatable, systematic analysis process that turns raw transaction data into timely, actionable protection for customers.