Step 3 — Skeleton refinement and changing-module detection¶
The methodological heart of CDANs. Section 3.3 of the paper.
The key idea¶
Standard PC-style algorithms test X_i ⊥ X_j | S over many candidate
conditioning sets S drawn from (X_i, X_j)'s contemporaneous neighbors.
For an n-variable problem, S can grow up to size n - 2, which is both
expensive and statistically poor (high-dimensional CI tests need huge sample
sizes).
CDANs replaces this with a tightly-bounded conditioning set:
The lagged parents of
X_iandX_j(from Step 1), plus at mostmax_extra_condsof their contemporaneous neighbors.
The size is O(|lagged_parents|) + max_extra_conds — typically much smaller
than n for sparse graphs — and it leverages the structural information
already discovered in Step 1.
Skeleton refinement (the contemp pruning loop)¶
For each undirected pair (i, j) from the partial graph:
- Test
X_i ⊥ X_j | lagged_parents(i) ∪ lagged_parents(j). Ifp > alpha, drop the edge with empty contemporaneous witness set. - Otherwise, try adding 1, 2, …,
max_extra_condscontemporaneous neighbors as additional conditioning. The first subset that givesp > alphadrops the edge, with that subset recorded as the witness set. - If every test rejects independence, keep the edge.
The witness sets are stashed on the graph for Step 4 to use in v-structure detection.
Changing-module confirmation¶
Step 2 marked every variable as a candidate changing module. Step 3 confirms or rejects each one with a single CI test:
If the test passes (high p-value, fail to reject independence), X_i's
mechanism does not depend on the surrogate after we account for its own past;
unmark it.
This conditioning is the same trick as the skeleton refinement: the lagged
parents already explain a lot of X_i's structure, so the surrogate has to
add information beyond what the past explains in order to count as a
changing-module signal. Without this conditioning, almost any time-series
variable would look like a changing module simply because its values drift
with t.
Common failure mode: cascade from Step 1¶
The changing-module test depends on the lagged parents being correct. If
Step 1 misses an important lagged edge — particularly an
autoregressive X_i[t-1] → X_i[t] for a changing module — the conditioning
fails to remove the variable's own past variation, and the C-dependence test
can incorrectly accept independence.
Practical mitigation:
- Use a tighter
pc_alphain Step 1 (default 0.2; try 0.1). - Use KCI rather than Fisher-Z for the CI test if the time-varying mechanism is nonlinear.
- Increase the sample size — KCI's power scales noticeably with
n.
See the example in the project's six-variable demo for a worked case where this cascade played out and how it was diagnosed.
API¶
refine_skeleton ¶
refine_skeleton(
graph: TimeSeriesGraph,
data: ndarray,
*,
surrogate: ndarray | None = None,
ci_test: str | CITest = "fisherz",
alpha: float = 0.05,
max_extra_conds: int = 2,
verbose: bool = False,
) -> TimeSeriesGraph
Prune contemporaneous edges and confirm changing modules.
Two passes:
-
Contemporaneous skeleton refinement. For each undirected pair
(i, j)already in the graph, run CI tests with conditioning sets drawn from the lagged parents ofiandj, plus up tomax_extra_condsof their contemporaneous neighbors. Drop the edge if any test gives a p-value abovealpha. Witness sets are stored on the graph for use by Step 4 (v-structure detection). -
Changing-modules confirmation. For each variable
iinitially marked as changing in Step 2, testX_i ⊥ C | lagged_parents(i). If the test passes (independence not rejected),iis unmarked.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
graph
|
TimeSeriesGraph
|
Graph from Step 2 (lagged edges + fully-connected contemporaneous skeleton + all variables marked changing). |
required |
data
|
ndarray
|
Time series, shape |
required |
surrogate
|
ndarray | None
|
Surrogate variable |
None
|
ci_test
|
str | CITest
|
CI test name or instance. |
'fisherz'
|
alpha
|
float
|
Significance level. |
0.05
|
max_extra_conds
|
int
|
Maximum number of contemporaneous neighbors to include in the conditioning set in addition to the lagged parents. Bigger values give stronger tests but at quadratic cost. |
2
|
verbose
|
bool
|
Print per-edge progress. |
False
|
Returns:
| Type | Description |
|---|---|
TimeSeriesGraph
|
The same graph with edges pruned and changing modules confirmed.
A |