Analysis Tools Documentation

Analysis tools for processing federated learning experiment results.

analysis/
├── metrics.py      # Pure functions: list[float] -> value
├── io.py           # Read CSV/JSON, unit conversion
├── single.py       # Per-experiment: runs/expN/ -> results.csv
├── multi.py        # Cross-experiment: pivot/ranking tables
├── landscape.py    # Loss surface visualization (1D/2D/3D)
├── datasets.py     # Dataset statistics visualization
├── inference.py    # Model evaluation (zero-shot = special case)
├── cli.py          # Unified entry point
└── tables/         # Output directory for multi.py

Data Flow

Training (main.py):
  server.train() -> server.csv, client_*.csv per run
  main.py -> timing.json

Per-experiment (single.py):
  runs/exp19/{0,1,2}/results/server.csv -> runs/exp19/results.csv

Cross-experiment (multi.py):
  runs/{exp1,exp2,exp3}/results.csv -> analysis/tables/*.csv

Loss landscape (landscape.py):
  runs/exp19/0/models/server_best.pt -> runs/exp19/landscape/run_0_server_{1d,2d,3d}.png

Per-Experiment Analysis (single.py)

Aggregates results across all runs of a single experiment.

Usage

# Analyze a single experiment
python -m analysis single --experiment runs/exp19

# Or run the module directly
python -m analysis.single --experiment runs/exp19

Output

Produces runs/exp19/results.csv with one row per metric:

metric,avg_min,std_min,avg_max,std_max
global_avg_test_loss,0.3406,0.0,0.3406,0.0
efficiency,5.3805,0.0,5.3805,0.0
communication,0.2842,0.0,0.2842,0.0
...
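
As a rough illustration of how the four columns could be derived, the sketch below assumes that avg_min/std_min are the mean and standard deviation of each run's minimum value, and avg_max/std_max the same for each run's maximum. The function name summarize_metric and the use of the population standard deviation are assumptions, not taken from single.py.

```python
from statistics import mean, pstdev


def summarize_metric(per_run_values):
    """Return (avg_min, std_min, avg_max, std_max) for one metric.

    per_run_values holds one list of per-round values for each run.
    """
    mins = [min(values) for values in per_run_values]
    maxs = [max(values) for values in per_run_values]
    # Assumption: population std, which is 0.0 when there is a single run.
    return mean(mins), pstdev(mins), mean(maxs), pstdev(maxs)


# Two hypothetical runs of the same experiment.
runs = [[0.5, 0.4, 0.3], [0.6, 0.5, 0.5]]
avg_min, std_min, avg_max, std_max = summarize_metric(runs)
print(f"global_avg_test_loss,{avg_min:.4f},{std_min:.4f},{avg_max:.4f},{std_max:.4f}")
```

With a single run the two std columns collapse to 0.0, which would be consistent with the sample rows above.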

Metrics

Metric                            Description
global_avg_test_loss              Average test loss across clients
global_avg_train_loss             Average train loss across clients
efficiency                        Time per training round
communication                     Total bandwidth per round
uplink                            Data sent from clients to server
downlink                          Data sent from server to clients
last_improvement_round            Last round with loss improvement
longest_improvement_streak        Max consecutive improvements
most_frequent_improvement_streak  Most common streak length
oscillation_count                 Number of loss direction changes
improvement_ratio                 Fraction of rounds with improvement (0-1)
improvement_magnitude             Average loss reduction per improvement
time_per_experiment               Wall-clock time per run (from timing.json)
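
The convergence metrics can be sketched as pure functions over a list of per-round losses. This is only one plausible reading: it assumes an "improvement" means the loss is lower than in the previous round (not lower than the best so far), and the function names are illustrative rather than the ones exported by metrics.py.

```python
def improvement_flags(losses):
    """One flag per round transition: True when the loss went down."""
    return [curr < prev for prev, curr in zip(losses, losses[1:])]


def improvement_ratio(losses):
    """Fraction of round transitions that improved the loss (0-1)."""
    flags = improvement_flags(losses)
    return sum(flags) / len(flags) if flags else 0.0


def longest_improvement_streak(losses):
    """Largest number of consecutive improving transitions."""
    longest = current = 0
    for improved in improvement_flags(losses):
        current = current + 1 if improved else 0
        longest = max(longest, current)
    return longest


def oscillation_count(losses):
    """Number of times the loss switches between falling and rising."""
    diffs = [curr - prev for prev, curr in zip(losses, losses[1:]) if curr != prev]
    return sum(1 for d1, d2 in zip(diffs, diffs[1:]) if (d1 < 0) != (d2 < 0))


losses = [0.5, 0.4, 0.35, 0.36, 0.30]
print(improvement_ratio(losses))           # 0.75
print(longest_improvement_streak(losses))  # 2
print(oscillation_count(losses))           # 2
```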

Cross-Experiment Analysis (multi.py)

Compares results across multiple experiments. Reads results.csv produced by single.py.

Usage

# Compare experiments on a metric
python -m analysis multi --runs-dir runs --metric loss

# Pivot by strategy instead of model
python -m analysis multi --runs-dir runs --metric loss --pivot strategy

# Batch all metrics
python -m analysis multi --runs-dir runs --metric all

# Filter specific experiments
python -m analysis multi --runs-dir runs --metric loss --models DLinear --strategies FedAvg

# Excel-driven batch runs
python -m analysis multi --excels scripts/setting1.xlsx --metric all

Command Line Arguments

Argument            Short  Type    Default            Description
--runs-dir          -r     str     "runs"             Directory containing experiment folders
--output-dir        -o     str     "analysis/tables"  Output directory for generated tables
--std-multiplier    -s     float   1000               Factor to multiply standard deviation for visibility
--decimal-places    -d     int     3                  Number of decimal places to display
--agg-mode          -a     choice  "min"              Aggregation mode: min, max, mean, last, median
--time-unit         -t     choice  "s"                Time unit: s (seconds), ms, m (minutes), h (hours)
--size-unit         -z     choice  "mb"               Size unit: b (bytes), kb, mb, gb, tb
--metric            -m     choice  "loss"             Metric to analyze (or "all" for batch processing)
--pivot             -p     choice  "model"            Pivot mode: model or strategy
--no-ranking               flag    False              Disable ranking table generation
--higher-is-better         flag    False              Higher metric values are better (default: lower)
--verbose           -v     flag    False              Enable debug logging
--excels            -e     list    None               One or more Excel files providing batch queries
--models                   list    None               Filter to specific models (e.g. Linear DLinear)
--strategies               list    None               Filter to specific strategies (e.g. FedAvg FedProx)
--datasets                 list    None               Filter to specific datasets (e.g. SolarEnergy)
--output-lens              list    None               Filter to specific output lengths (e.g. 24 48 96)
--experiments              list    None               Process specific experiments (e.g. exp76 exp77)

Pivot Modes

Model Pivot (default):
  • Groups tables by model
  • Columns represent different strategies
  • Useful for comparing strategies within each model

Strategy Pivot:
  • Groups tables by strategy
  • Columns represent different models
  • Useful for comparing models within each strategy
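
The two pivot modes amount to swapping which field names the table and which field names the columns. The sketch below shows the idea with plain dictionaries; the record fields (model, strategy, dataset, output_len, loss) and the choice of (dataset, output_len) as the row key are assumptions made for illustration, not the layout multi.py necessarily uses.

```python
from collections import defaultdict


def pivot(records, by="model"):
    """Group records into {pivot_value: {row_key: {column: value}}}."""
    if by not in ("model", "strategy"):
        raise ValueError(f"unknown pivot mode: {by}")
    # The field that is not pivoted on becomes the column header.
    column_field = "strategy" if by == "model" else "model"
    tables = defaultdict(lambda: defaultdict(dict))
    for rec in records:
        row_key = (rec["dataset"], rec["output_len"])
        tables[rec[by]][row_key][rec[column_field]] = rec["loss"]
    return {name: dict(rows) for name, rows in tables.items()}


# Hypothetical records, one per experiment.
records = [
    {"model": "DLinear", "strategy": "FedAvg", "dataset": "SolarEnergy", "output_len": 24, "loss": 0.30},
    {"model": "DLinear", "strategy": "FedProx", "dataset": "SolarEnergy", "output_len": 24, "loss": 0.28},
    {"model": "Linear", "strategy": "FedAvg", "dataset": "SolarEnergy", "output_len": 24, "loss": 0.35},
]

print(pivot(records, by="model")["DLinear"])    # columns are strategies
print(pivot(records, by="strategy")["FedAvg"])  # columns are models
```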

Output Files

Tables are saved to analysis/tables/ organized by pivot value:
  • {pivot_value}_{metric}.csv: Main analysis table with mean+/-std format
  • {pivot_value}_{metric}_ranking.csv: Ranking table (if enabled)

Examples:
  • DLinear_loss.csv: Loss analysis for DLinear model
  • FedAvg_communication.csv: Communication cost for FedAvg strategy
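
For orientation, here is a minimal sketch of how a table cell and a file name might be assembled from the options described earlier. The helper names and the exact cell layout are assumptions; only the {pivot_value}_{metric}.csv naming pattern and the --std-multiplier / --decimal-places defaults come from this document.

```python
def format_cell(mean, std, std_multiplier=1000, decimal_places=3):
    """Render one cell as mean+/-std, scaling std so small values stay visible."""
    scaled_std = std * std_multiplier
    return f"{mean:.{decimal_places}f}+/-{scaled_std:.{decimal_places}f}"


def table_filename(pivot_value, metric, ranking=False):
    """Build the output file name for a main table or its ranking table."""
    suffix = "_ranking" if ranking else ""
    return f"{pivot_value}_{metric}{suffix}.csv"


print(format_cell(0.3406, 0.00012))                    # 0.341+/-0.120
print(table_filename("DLinear", "loss"))               # DLinear_loss.csv
print(table_filename("DLinear", "loss", ranking=True)) # DLinear_loss_ranking.csv
```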

Rankings

Ranking tables display performance comparisons:
  1. Primary Sort: Mean metric value (lower is better by default)
  2. Tiebreaker: Standard deviation (lower is better when means are equal)
  3. Best Column: Shows rank 1 strategy/model for each configuration
  4. Average Rank Row: Mean ranking across all configurations
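
The ranking rules can be sketched as a sort on (mean, std) per configuration followed by a column-wise average. The function names and the input shape (a dict of column name to (mean, std)) are illustrative assumptions. Exact ties on both mean and std are broken by insertion order here, which may differ from multi.py.

```python
def rank_row(cells, higher_is_better=False):
    """Rank the columns of one configuration; rank 1 is best.

    cells maps a column name to a (mean, std) pair.
    """
    def sort_key(item):
        _, (mean, std) = item
        primary = -mean if higher_is_better else mean
        return (primary, std)  # std only matters when means are equal

    ordered = sorted(cells.items(), key=sort_key)
    return {name: rank for rank, (name, _) in enumerate(ordered, start=1)}


def average_ranks(rank_rows):
    """Mean rank of each column across all configurations."""
    columns = rank_rows[0].keys()
    return {col: sum(row[col] for row in rank_rows) / len(rank_rows) for col in columns}


# Hypothetical (mean, std) cells for three output lengths.
table = {
    24: {"FedAvg": (0.30, 0.02), "FedProx": (0.30, 0.01)},
    48: {"FedAvg": (0.40, 0.01), "FedProx": (0.45, 0.01)},
    96: {"FedAvg": (0.50, 0.00), "FedProx": (0.55, 0.00)},
}
ranks = {config: rank_row(cells) for config, cells in table.items()}
best = {config: min(row, key=row.get) for config, row in ranks.items()}
print(ranks[24])  # FedProx wins the tie on mean thanks to its lower std
print(best)
print(average_ranks(list(ranks.values())))
```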

Excel Batch Processing

Provide Excel files via --excels. Each row must include --project= and --name= columns. An optional script column is echoed when an experiment is missing.


Loss Landscape (landscape.py)

Visualizes the loss surface around trained model weights by perturbing along random directions.

Usage

# Best run (lowest loss) - server model
python -m analysis landscape --experiment runs/exp19

# Specific run
python -m analysis landscape --experiment runs/exp19 --run 0

# All runs
python -m analysis landscape --experiment runs/exp19 --all

# Specific client (personalized FL)
python -m analysis landscape --experiment runs/exp19 --client 0

# All clients (personalized FL)
python -m analysis landscape --experiment runs/exp19 --all-clients

Two FL Modes

Traditional FL (save_local_model=False):
  • One global server model
  • Generates: run_{idx}_server_{1d,2d,3d}.png

Personalized FL (save_local_model=True):
  • Per-client models
  • Generates: run_{idx}_client_{NNN}_{1d,2d,3d}.png
  • Use --client N for a specific client, --all-clients for all

Plot Types

Type  Description
1D    Loss along a single random direction (line plot)
2D    Loss along two random directions (contour plot)
3D    Loss surface along two directions (3D surface)
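
To make the 1D case concrete, the sketch below evaluates a loss at weights + alpha * direction for a range of alpha values. It uses a toy quadratic loss and plain lists in place of a real model, and the helper names are hypothetical; landscape.py may normalize directions or sample alpha differently. The 2D and 3D plots follow the same idea with a second direction and a grid of (alpha, beta) pairs.

```python
import math
import random


def random_unit_direction(dim, rng):
    """Draw a Gaussian direction and scale it to unit length."""
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(d * d for d in direction))
    return [d / norm for d in direction]


def scan_1d(loss_fn, weights, direction, alphas):
    """Loss at weights + alpha * direction for each alpha."""
    return [
        loss_fn([w + alpha * d for w, d in zip(weights, direction)])
        for alpha in alphas
    ]


def quadratic_loss(weights):
    """Toy stand-in for a model loss; its minimum is at the origin."""
    return sum(w * w for w in weights)


rng = random.Random(0)
weights = [0.0, 0.0, 0.0]  # pretend these are the trained weights
direction = random_unit_direction(len(weights), rng)
alphas = [-1.0, -0.5, 0.0, 0.5, 1.0]
losses = scan_1d(quadratic_loss, weights, direction, alphas)
print(losses)  # lowest value at alpha = 0, rising symmetrically on both sides
```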

Model Selection

  • Default: server_best.pt from the run with lowest min loss (from results.csv)
  • --run N: use run N's server_best.pt
  • --client N: use client_N_best.pt (personalized FL)
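
The selection rules above can be sketched as follows. The function names are hypothetical, the per-run minimum losses are passed in directly rather than read from results.csv, and placing client checkpoints in the same models/ directory as server_best.pt is an assumption.

```python
def select_best_run(min_loss_by_run):
    """Return the run index whose minimum loss is lowest."""
    return min(min_loss_by_run, key=min_loss_by_run.get)


def checkpoint_path(experiment_dir, run, client=None):
    """Path of the checkpoint to visualize for a run (server or one client)."""
    name = "server_best.pt" if client is None else f"client_{client}_best.pt"
    return f"{experiment_dir}/{run}/models/{name}"


# Hypothetical minimum losses for runs 0, 1 and 2.
min_losses = {0: 0.35, 1: 0.31, 2: 0.34}
best_run = select_best_run(min_losses)
print(checkpoint_path("runs/exp19", best_run))        # default: best run, server model
print(checkpoint_path("runs/exp19", 0))               # --run 0
print(checkpoint_path("runs/exp19", 0, client=0))     # --run 0 --client 0
```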

Output

Saves to runs/expN/landscape/ (experiment-level directory, survives compact):

runs/exp19/landscape/
├── run_0_server_1d.png
├── run_0_server_2d.png
├── run_0_server_3d.png
├── run_0_client_000_1d.png   # personalized FL only
├── run_0_client_000_2d.png
├── run_0_client_000_3d.png
└── ...

Inference Evaluation (inference.py)

Evaluates trained models on a target dataset, which may be the training dataset or a different one.

Usage

# Regular inference (same dataset as training)
python -m analysis inference --experiments exp1 --target-dataset SolarEnergy

# Zero-shot evaluation (different dataset)
python -m analysis inference --experiments exp1 --target-dataset ETDatasetHour

# Batch evaluate multiple experiments
python -m analysis inference --experiments exp1 exp2 exp3 --target-dataset ETDatasetHour

# Denormalized metrics only
python -m analysis inference --experiments exp1 --target-dataset SolarEnergy --norm-mode denorm

Arguments

Argument          Short  Type    Default               Description
--runs-dir        -r     str     "runs"                Directory containing experiment folders
--output-dir      -o     str     "analysis/inference"  Output directory for inference results
--experiments            list    Required              Experiments to evaluate
--target-dataset         str     Required              Target dataset for evaluation
--norm-mode              choice  "both"                Evaluation mode: norm, denorm, or both
--verbose         -v     flag    False                 Enable debug logging

Zero-Shot vs Regular

  • Regular inference: target dataset = training dataset (measures final performance)
  • Zero-shot inference: target dataset != training dataset (measures generalization)

Both use the same pipeline; the distinction is purely whether the datasets match.

Output

Saved to analysis/inference/ as {experiment}_inference_{target_dataset}.csv.
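
Since the regular/zero-shot distinction depends only on whether the dataset names match, it can be expressed in a couple of lines. The helper names below are illustrative and not part of inference.py; the file name pattern is the one given above.

```python
def inference_kind(training_dataset, target_dataset):
    """Label an evaluation as regular or zero-shot by comparing dataset names."""
    return "regular" if training_dataset == target_dataset else "zero-shot"


def inference_output_name(experiment, target_dataset):
    """File name of the result CSV for one experiment and target dataset."""
    return f"{experiment}_inference_{target_dataset}.csv"


print(inference_kind("SolarEnergy", "SolarEnergy"))    # regular
print(inference_kind("SolarEnergy", "ETDatasetHour"))  # zero-shot
print(inference_output_name("exp1", "ETDatasetHour"))  # exp1_inference_ETDatasetHour.csv
```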


Unified CLI

All analysis tools are accessible via python -m analysis:

python -m analysis single --experiment runs/exp19
python -m analysis multi --runs-dir runs --metric loss
python -m analysis landscape --experiment runs/exp19
python -m analysis inference --experiments exp1 --target-dataset ETDatasetHour

Internal Modules

metrics.py

Pure functions for computing metrics from raw value lists. No I/O, fully testable.

from analysis.metrics import compute_per_run_agg, last_improvement_round, improvement_streaks

val = compute_per_run_agg([0.5, 0.4, 0.3], mode="min")  # 0.3
last_round = last_improvement_round([0.5, 0.4, 0.35, 0.36])  # 2 (avoid shadowing the built-in round)
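
For readers who want a feel for what such pure functions look like, here is a minimal sketch that reproduces the two results above. The _sketch names mark these as stand-ins, not the real implementations in metrics.py. The sketch assumes rounds are 0-indexed and that an improvement means a lower value than the previous round.

```python
from statistics import mean, median

# One reducer per aggregation mode listed for --agg-mode.
_AGGREGATORS = {
    "min": min,
    "max": max,
    "mean": mean,
    "median": median,
    "last": lambda values: values[-1],
}


def aggregate_sketch(values, mode="min"):
    """Collapse one run's per-round values into a single number."""
    if mode not in _AGGREGATORS:
        raise ValueError(f"unknown aggregation mode: {mode}")
    if not values:
        raise ValueError("values must not be empty")
    return _AGGREGATORS[mode](values)


def last_improvement_round_sketch(values):
    """0-based index of the last round whose value dropped, or None."""
    last = None
    for index in range(1, len(values)):
        if values[index] < values[index - 1]:
            last = index
    return last


print(aggregate_sketch([0.5, 0.4, 0.3], mode="min"))            # 0.3
print(last_improvement_round_sketch([0.5, 0.4, 0.35, 0.36]))    # 2
```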

io.py

Shared I/O helpers: file reading, unit conversion, numeric parsing.

from analysis.io import read_csv, convert_time, parse_numeric_list

data = read_csv("runs/exp19/0/results/server.csv")
secs = convert_time(1500, "ms", "s")  # 1.5
vals = parse_numeric_list(["0.5", "N/A", "0.3"])  # [0.5, 0.3]
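
The two helpers shown above could be implemented along the following lines. This is a sketch under assumptions: the _sketch functions are stand-ins, not the code in io.py, and they are written only to match the example results (1.5 and [0.5, 0.3]).

```python
# Seconds per unit, using the unit codes accepted by --time-unit.
_SECONDS_PER_UNIT = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def convert_time_sketch(value, from_unit, to_unit):
    """Convert a duration between ms, s, m and h by going through seconds."""
    return value * _SECONDS_PER_UNIT[from_unit] / _SECONDS_PER_UNIT[to_unit]


def parse_numeric_list_sketch(raw_values):
    """Convert strings to floats, silently dropping entries that do not parse."""
    parsed = []
    for raw in raw_values:
        try:
            parsed.append(float(raw))
        except (TypeError, ValueError):
            continue  # e.g. "N/A" or None
    return parsed


print(convert_time_sketch(1500, "ms", "s"))
print(parse_numeric_list_sketch(["0.5", "N/A", "0.3"]))
```

Note that float() also accepts strings such as "nan" and "inf", so a stricter parser would need an explicit check if those should be dropped too.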