Utilities
Synthetic data generator
generate_synthetic_cdans
generate_synthetic_cdans(
n_vars: int = 4,
n_samples: int = 500,
tau_max: int = 2,
n_changing: int = 2,
autocorr: float = 0.4,
contemp_strength: float = 0.5,
lagged_strength: float = 0.4,
noise_std: float = 0.5,
nonstationary_amplitude: float = 0.6,
seed: int | None = 42,
) -> SyntheticDataset
Generate a synthetic nonstationary, autocorrelated time series.
The data-generating process is, for each variable i and time t:

    X_i[t] = a_ii(t) * X_i[t-1]                  # autoregressive term
           + sum_{(j,lag) in lagged_pa(i)}       # lagged parents
                 b_{ij,lag}(t) * X_j[t-lag]
           + sum_{j in contemp_pa(i)}            # contemporaneous parents
                 c_{ij}(t) * X_j[t]
           + eps_i[t]                            # additive noise

For variables in changing_modules the coefficients a, b, c are
smoothly varying functions of t (sinusoidal). For other variables the
coefficients are constants.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n_vars` | `int` | Number of observed variables. | `4` |
| `n_samples` | `int` | Length of the generated time series. | `500` |
| `tau_max` | `int` | Maximum lag for the random lagged-parent structure. | `2` |
| `n_changing` | `int` | Number of variables whose mechanism is nonstationary. | `2` |
| `autocorr` | `float` | Magnitude of the autoregressive coefficient. | `0.4` |
| `contemp_strength` | `float` | Magnitude of contemporaneous coefficients. | `0.5` |
| `lagged_strength` | `float` | Magnitude of lagged coefficients. | `0.4` |
| `noise_std` | `float` | Standard deviation of the additive noise term. | `0.5` |
| `nonstationary_amplitude` | `float` | Amplitude of coefficient drift for changing modules. The effective coefficient at time … | `0.6` |
| `seed` | `int \| None` | RNG seed for reproducibility. | `42` |
Returns:
| Type | Description |
|---|---|
| `SyntheticDataset` | The data, ground-truth graph, and changing-module indices. |
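
The data-generating process described for `generate_synthetic_cdans` can be illustrated with a short NumPy simulation. This is a minimal sketch, not the library's implementation: the name `simulate_sketch`, the fixed three-variable graph, and the multiplicative sinusoidal drift `1 + amplitude * sin(2*pi*t / n_samples)` are all assumptions chosen for illustration. The coefficient values mirror the documented defaults.

```python
import numpy as np


def simulate_sketch(n_samples=500, seed=42):
    """Simulate a small instance of the documented process (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_vars, tau_max = 3, 2
    autocorr, lagged_strength, contemp_strength = 0.4, 0.4, 0.5
    noise_std, amplitude = 0.5, 0.6

    # Hypothetical ground-truth structure (not produced by the library).
    lagged_pa = {0: [], 1: [(0, 1)], 2: [(1, 2)]}  # i -> [(parent j, lag)]
    contemp_pa = {0: [], 1: [], 2: [0]}            # i -> [parent j], j < i
    changing = {2}                                 # variables with drifting coefficients

    # Assumed drift: one full sinusoidal period over the series.
    t_grid = np.arange(n_samples)
    drift = 1.0 + amplitude * np.sin(2 * np.pi * t_grid / n_samples)

    X = np.zeros((n_samples, n_vars))  # the first tau_max rows stay at zero
    for t in range(tau_max, n_samples):
        # Variables are visited in index order, which is a valid causal order
        # here because every contemporaneous parent has a smaller index.
        for i in range(n_vars):
            scale = drift[t] if i in changing else 1.0
            value = scale * autocorr * X[t - 1, i]                # autoregressive term
            for j, lag in lagged_pa[i]:
                value += scale * lagged_strength * X[t - lag, j]  # lagged parents
            for j in contemp_pa[i]:
                value += scale * contemp_strength * X[t, j]       # contemporaneous parents
            X[t, i] = value + rng.normal(0.0, noise_std)          # additive noise
    return X, changing
```

With these values the largest effective autoregressive coefficient is 0.4 * 1.6 = 0.64, so the simulated series stays stable.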
SyntheticDataset (dataclass)
SyntheticDataset(
data: ndarray,
lagged_edges: set[tuple[int, int, int]],
contemporaneous_edges: set[tuple[int, int]],
changing_modules: set[int],
metadata: dict = dict(),
)
Container for a synthetic dataset and its ground-truth structure.
Attributes:
| Name | Type | Description |
|---|---|---|
| `data` | `ndarray` | Observations, shape … |
| `lagged_edges` | `set[tuple[int, int, int]]` | Set of ground-truth lagged edges. Each tuple … |
| `contemporaneous_edges` | `set[tuple[int, int]]` | Ground-truth contemporaneous DAG. Each tuple … |
| `changing_modules` | `set[int]` | Indices of variables whose generating mechanism is nonstationary (time-varying coefficients). |
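
Because the ground truth is stored as plain Python sets, a recovered graph can be scored with set operations. The sketch below is illustrative: `SyntheticDatasetSketch` is a hypothetical stand-in that mirrors the documented fields, `edge_precision_recall` is not part of the library, and the edge tuples are treated as opaque values since their element order is not spelled out here. The `dict()` default from the signature is written as `field(default_factory=dict)`, because dataclasses reject a mutable default instance.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class SyntheticDatasetSketch:
    """Hypothetical stand-in mirroring the documented SyntheticDataset fields."""
    data: np.ndarray
    lagged_edges: set[tuple[int, int, int]]
    contemporaneous_edges: set[tuple[int, int]]
    changing_modules: set[int]
    metadata: dict = field(default_factory=dict)


def edge_precision_recall(predicted: set, truth: set) -> tuple[float, float]:
    """Precision and recall of a predicted edge set against the ground truth."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall


# Made-up ground truth and prediction, purely for illustration.
truth = SyntheticDatasetSketch(
    data=np.zeros((10, 3)),
    lagged_edges={(0, 1, 1), (1, 2, 2)},
    contemporaneous_edges={(0, 2)},
    changing_modules={2},
)
predicted_contemp = {(0, 2), (1, 2)}
precision, recall = edge_precision_recall(
    predicted_contemp, truth.contemporaneous_edges
)
```

Here one of the two predicted edges is correct and the single true edge is found, so `precision` is 0.5 and `recall` is 1.0.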
Lagged design matrices
lagged_design_matrix
lagged_design_matrix(
data: ndarray, tau_max: int
) -> tuple[np.ndarray, np.ndarray, list[tuple[int, int]]]
Build a lagged design matrix from a time series.
For data of shape `(T, n)` and `tau_max = k`, returns:

- `Y` of shape `(T - k, n)`: the "current" values `X[t]` for `t = k, k+1, ..., T-1`.
- `X_lagged` of shape `(T - k, n * k)`: the lagged values, columns ordered as `X_0[t-1], X_1[t-1], ..., X_{n-1}[t-1], X_0[t-2], ...`.
- `column_index`: list of `(variable, lag)` tuples describing each column of `X_lagged`.

Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `ndarray` | Time series, shape `(T, n)`. | required |
| `tau_max` | `int` | Maximum lag to include. | required |
Returns:
| Type | Description |
|---|---|
| `tuple` | `(Y, X_lagged, column_index)`, as described above. |
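
The documented shapes and column ordering of `lagged_design_matrix` can be reproduced with a few NumPy slices. The following is a minimal sketch written from the description alone, under the name `lagged_design_matrix_sketch`; the input validation is an assumption, and the library's own function may differ in details such as dtype handling.

```python
import numpy as np


def lagged_design_matrix_sketch(data, tau_max):
    """Build (Y, X_lagged, column_index) following the documented layout."""
    data = np.asarray(data, dtype=float)
    T, n = data.shape
    if not 1 <= tau_max < T:
        raise ValueError("tau_max must satisfy 1 <= tau_max < T")

    # "Current" values X[t] for t = tau_max, ..., T-1.
    Y = data[tau_max:]

    # For each lag, rows t = tau_max..T-1 map to data[t - lag].
    blocks = [data[tau_max - lag : T - lag] for lag in range(1, tau_max + 1)]
    X_lagged = np.hstack(blocks)

    # Columns: all variables at lag 1, then all variables at lag 2, and so on.
    column_index = [
        (var, lag) for lag in range(1, tau_max + 1) for var in range(n)
    ]
    return Y, X_lagged, column_index


# Toy series with T = 5 time steps and n = 2 variables.
toy = np.arange(10).reshape(5, 2)
Y, X_lagged, column_index = lagged_design_matrix_sketch(toy, tau_max=2)
```

For the toy series, the first row of `X_lagged` corresponds to `t = 2` and holds `X[1]` followed by `X[0]`.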
column_for
Return the column index in a lagged design matrix for (var, lag).
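
Given the column ordering documented for `X_lagged` (all variables at lag 1, then all variables at lag 2, and so on), the lookup reduces to simple arithmetic. The signature of `column_for` is not shown here, so the sketch below uses a hypothetical helper `column_for_sketch` that takes the number of variables as an explicit argument; that argument is an assumption.

```python
def column_for_sketch(var: int, lag: int, n_vars: int) -> int:
    """Column of (var, lag) when columns are grouped by lag, then by variable."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    if not 0 <= var < n_vars:
        raise ValueError("var must lie in [0, n_vars)")
    # Each lag contributes a block of n_vars columns; lag 1 is the first block.
    return (lag - 1) * n_vars + var


# Cross-check against an explicit (variable, lag) column listing.
n_vars, tau_max = 3, 2
column_index = [
    (v, lag) for lag in range(1, tau_max + 1) for v in range(n_vars)
]
col = column_for_sketch(var=1, lag=2, n_vars=n_vars)
```

In this example `col` is 4, and `column_index[col]` is `(1, 2)`, matching the requested pair.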