Evaluation

When you have ground truth — typically because you generated the data yourself with generate_synthetic_cdans — you can score the recovered graph with standard structure-recovery metrics.

Quick example

from cdans import CDANs, evaluate_graph
from cdans.utils import generate_synthetic_cdans

dataset = generate_synthetic_cdans(
    n_vars=4, n_samples=800, tau_max=1, n_changing=1, seed=7,
)
result = CDANs(tau_max=1, ci_test="kci").fit(dataset.data)

metrics = evaluate_graph(result.graph, dataset)
print(metrics.summary())

Output:

Structure recovery metrics
============================================================
  Lagged edges               TP=  9 FP=  0 FN=  0  P=1.00 R=1.00 F1=1.00 FDR=0.00
  Contemp skeleton           TP=  2 FP=  0 FN=  0  P=1.00 R=1.00 F1=1.00 FDR=0.00
  Contemp directed           TP=  2 FP=  0 FN=  0  P=1.00 R=1.00 F1=1.00 FDR=0.00
  Changing modules           TP=  1 FP=  0 FN=  0  P=1.00 R=1.00 F1=1.00 FDR=0.00
  ----------------------------------------------------------
  Total (directed)           TP= 12 FP=  0 FN=  0  P=1.00 R=1.00 F1=1.00 FDR=0.00
  Total (skeleton)           TP= 12 FP=  0 FN=  0  P=1.00 R=1.00 F1=1.00 FDR=0.00

  SHD (lagged):    0
  SHD (contemp):   0
  SHD (total):     0

This is a clean recovery on a small, easy DGP. For larger problems where the algorithm doesn't reach 100%, see the experiments page for realistic 6-variable benchmark numbers.

What's reported

The metrics are computed separately for four edge categories:

  • Lagged edges: strict (src, dst, lag) match; lagged edges are always directed.
  • Contemp skeleton: adjacency only, i.e. undirected pairs (i, j); direction is ignored.
  • Contemp directed: strict directed-edge match; an undirected predicted edge counts as neither TP nor FP against a directed truth edge.
  • Changing modules: set comparison on the variable indices receiving C → X_i.
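
The set comparison used for changing modules is plain set arithmetic; a quick illustration with made-up indices (not output from the library):

```python
# Hypothetical sets of variable indices flagged as changing modules.
pred_changing = {2, 4}   # algorithm marked X_2 and X_4
true_changing = {2}      # only X_2 truly receives C -> X_2

tp = len(pred_changing & true_changing)  # correctly flagged
fp = len(pred_changing - true_changing)  # falsely flagged
fn = len(true_changing - pred_changing)  # missed

print(tp, fp, fn)  # 1 1 0
```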

For each category, a GraphMetrics object exposes:

  • tp: true positives (predicted ∩ truth)
  • fp: false positives (predicted − truth)
  • fn: false negatives (truth − predicted)
  • precision: TP / (TP + FP)
  • recall: TP / (TP + FN)
  • tpr: true positive rate (alias for recall)
  • fdr: false discovery rate = 1 − precision
  • f1: harmonic mean of precision and recall
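
The definitions above can be sketched as a small self-contained class. This is an illustration of the arithmetic only, not the actual GraphMetrics implementation:

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Toy stand-in for a per-category metrics object."""
    tp: int
    fp: int
    fn: int

    @property
    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    @property
    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    tpr = recall  # true positive rate is just an alias for recall

    @property
    def fdr(self) -> float:
        return 1.0 - self.precision

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


c = Counts(tp=9, fp=1, fn=3)
print(c.precision, c.recall)  # 0.9 0.75
```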

Structural Hamming Distance (SHD)

Two SHD numbers are reported:

  • shd_lagged — symmetric difference of the lagged-edge sets; each missing or extra edge counts as one unit.
  • shd_contemp — PDAG-aware: for each unordered pair (i, j), the edge state is one of {no edge, i→j, j→i, undirected}, and one SHD unit is added per state mismatch. A reversed direction, an undirected-vs-directed disagreement, or a missing edge each cost one unit.

shd_total = shd_lagged + shd_contemp. Changing-module disagreements are not folded into SHD — they're reported separately, since they're a binary attribute per variable rather than an edge.
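
The PDAG-aware rule boils down to comparing per-pair edge states. A minimal sketch, assuming graphs are represented as mappings from an unordered pair to a state (this is not the library's internal representation):

```python
def contemp_shd(pred, truth):
    """One SHD unit per unordered pair whose edge state differs.

    States: absent key = no edge, tuple (i, j) = directed i -> j,
    the string "undirected" = adjacent but unoriented.
    """
    pairs = set(pred) | set(truth)
    return sum(pred.get(p) != truth.get(p) for p in pairs)


truth = {frozenset({0, 1}): (0, 1), frozenset({2, 3}): "undirected"}
pred = {frozenset({0, 1}): (1, 0)}  # 0-1 reversed, 2-3 missing entirely

print(contemp_shd(pred, truth))  # reversed (1 unit) + missing (1 unit) = 2
```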

Without a SyntheticDataset

If you have ground truth from elsewhere (e.g. a real dataset with known structure), build a TimeSeriesGraph directly:

from cdans import TimeSeriesGraph, evaluate_graph

truth = TimeSeriesGraph(n_vars=5, tau_max=2)
truth.add_lagged_edge(0, 1, lag=1)
truth.add_lagged_edge(2, 3, lag=2)
truth.orient_contemp(0, 4)
truth.mark_changing(2)

metrics = evaluate_graph(predicted_graph, truth)

Aggregate (full) TPR / FDR

For papers and benchmarking, you usually want a single overall TPR / FDR / F1 number per fit instead of one per category. Two aggregates are computed automatically:

metrics.total.tpr           # overall true-positive rate
metrics.total.fdr           # overall false-discovery rate
metrics.total.f1            # overall F1
metrics.total.precision     # overall precision

total pools the TP / FP / FN counts across lagged edges, contemporaneous directed edges, and changing modules — each edge or marked module counts once. This is the standard "full TPR / FDR" causal-discovery papers report.
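
The pooling is a sum of raw counts before any rate is computed, not an average of per-category rates. With hypothetical per-category (TP, FP, FN) counts:

```python
# Hypothetical (TP, FP, FN) counts for the three pooled categories.
counts = {
    "lagged":           (9, 1, 0),
    "contemp_directed": (2, 0, 1),
    "changing_modules": (1, 0, 0),
}

tp = sum(c[0] for c in counts.values())  # 12
fp = sum(c[1] for c in counts.values())  # 1
fn = sum(c[2] for c in counts.values())  # 1

tpr = tp / (tp + fn)
fdr = fp / (tp + fp)
print(f"TPR={tpr:.3f} FDR={fdr:.3f}")  # TPR=0.923 FDR=0.077
```

Pooling counts first means a category with many edges (usually the lagged one) dominates the aggregate, which is the intended behavior for a single headline number.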

A more lenient skeleton-level companion is also available:

metrics.total_skeleton.tpr  # treats wrong-direction contemp edges as TPs

total_skeleton uses the contemp adjacency instead of directed edges. The difference between total and total_skeleton tells you how much of the algorithm's error comes from direction mistakes versus finding the wrong adjacencies.
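
To see why the two aggregates can differ, consider a single contemporaneous edge the algorithm found but failed to orient (a hand-built toy pair, not library API):

```python
# Truth: 0 -> 1. Prediction: 0 - 1 (adjacent, but left undirected).
truth_directed = {(0, 1)}
pred_directed = set()                  # no directed prediction to credit
truth_skeleton = {frozenset({0, 1})}
pred_skeleton = {frozenset({0, 1})}    # adjacency was recovered

directed_recall = len(pred_directed & truth_directed) / len(truth_directed)
skeleton_recall = len(pred_skeleton & truth_skeleton) / len(truth_skeleton)
print(directed_recall, skeleton_recall)  # 0.0 1.0
```

Here the skeleton metric gives full credit while the directed metric gives none: the entire gap is an orientation mistake, not a missing adjacency.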

Just the SHD

If you only want a single scalar:

from cdans import shd
total = shd(predicted_graph, dataset)

For the full API reference (signatures, defaults, every attribute), see the evaluation API page.