Modules
This section documents the modules used in this project.
DataMaster Module
This module handles the masterfile and collections metadata.
- class spacebench.datamaster.DataMaster[source]
Class for managing the masterfile and collections metadata.
- Parameters:
masterfile (pd.DataFrame) – A dataframe with metadata about available datasets.
collections (pd.DataFrame) – A dataframe with information about the collections where the datasets are generated from.
Examples
>>> from spacebench.datamaster import DataMaster
>>> dm = DataMaster()
>>> print(dm)
Available datasets (total: 11):
healthd_dmgrcs_mortality_disc
cdcsvi_limteng_hburdic_cont
climate_relhum_wfsmoke_cont
climate_wfsmoke_minrty_disc
healthd_hhinco_mortality_cont
...
county_educatn_election_cont
county_phyactiv_lifexpcy_cont
county_dmgrcs_election_disc
cdcsvi_nohsdp_poverty_cont
cdcsvi_nohsdp_poverty_disc
- list_envs(binary: Optional[bool] = None, continuous: Optional[bool] = None) list[str] [source]
Returns a list of names of available datasets.
- Parameters:
binary (bool, optional) – If True, only binary datasets are returned.
continuous (bool, optional) – If True, only continuous datasets are returned.
- Returns:
Names of all available datasets.
- Return type:
list[str]
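For illustration, a minimal sketch of filtering the catalogue with these flags (no output shown, since the exact listing depends on the installed data):

>>> from spacebench.datamaster import DataMaster
>>> dm = DataMaster()
>>> binary_envs = dm.list_envs(binary=True)          # only binary datasets
>>> continuous_envs = dm.list_envs(continuous=True)  # only continuous datasets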
Environment Module
This module handles the benchmark environments and datasets. It defines the SpaceEnv and SpaceDataset classes.
- class spacebench.env.SpaceDataset(treatment: ndarray, covariates: ndarray, outcome: ndarray, edges: list[tuple[int, int]], treatment_values: ndarray, smoothness_of_missing: Optional[float] = None, confounding_of_missing: Optional[float] = None, counterfactuals: Optional[ndarray] = None, coordinates: Optional[ndarray] = None)[source]
Class for storing a spatial causal inference benchmark dataset.
- adjacency_matrix(sparse: bool = False) numpy.ndarray | scipy.sparse.csr_matrix [source]
Returns the adjacency matrix of the graph.
- Parameters:
sparse (bool, optional (default is False)) – If True, returns a sparse matrix of type csr_matrix. If False, returns a dense matrix.
- Returns:
Adjacency matrix where entry (i, j) is 1 if there is an edge between node i and node j.
- Return type:
np.ndarray | scipy.sparse.csr_matrix
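As a sketch of how the adjacency matrix relates to the edge list, the following builds a tiny SpaceDataset by hand using only the constructor arguments documented above; the toy arrays are illustrative, not real benchmark data.

>>> import numpy as np
>>> from spacebench.env import SpaceDataset
>>> dataset = SpaceDataset(
...     treatment=np.array([0, 1, 0]),
...     covariates=np.zeros((3, 2)),
...     outcome=np.array([1.0, 2.0, 1.5]),
...     edges=[(0, 1), (1, 2)],          # undirected edges of the spatial graph
...     treatment_values=np.array([0, 1]),
... )
>>> A = dataset.adjacency_matrix()                    # dense np.ndarray
>>> A_sparse = dataset.adjacency_matrix(sparse=True)  # scipy csr_matrix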
- class spacebench.env.SpaceEnv(name: str, dir: Optional[str] = None)[source]
Class for a SpaCE environment.
It holds the data and metadata used to generate datasets by masking a covariate, which then acts as a missing confounder.
- api
Dataverse API object.
- Type:
DataverseAPI
- config
Dictionary with the configuration of the dataset.
- Type:
dict
- counfound_score_dict
Dictionary with the confounding scores of the covariates.
- Type:
dict
- datamaster
DataMaster object.
- Type:
DataMaster
- dir
Directory where the dataset is stored.
- Type:
str
- graph
Graph of the dataset.
- Type:
networkx.Graph
- metadata
Dictionary with the metadata of the dataset.
- Type:
dict
- name
Name of the dataset.
- Type:
str
- smoothness_score_dict
Dictionary with the smoothness scores of the covariates.
- Type:
dict
- synthetic_data
Synthetic data of the dataset.
- Type:
pd.DataFrame
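A minimal sketch of loading an environment and inspecting some of the attributes listed above; the dataset name is taken from the DataMaster example, and download/caching behavior depends on your setup.

>>> from spacebench.env import SpaceEnv
>>> env = SpaceEnv("healthd_dmgrcs_mortality_disc")
>>> env.name                          # name of the dataset
>>> sorted(env.counfound_score_dict)  # covariates with confounding scores
>>> env.graph.number_of_nodes()       # spatial graph as a networkx.Graph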
- _check_scores(c: str, min_confounding: float, max_confounding: float, min_smoothness: float, max_smoothness: float) bool [source]
Check whether the given covariate's smoothness and confounding scores are within the given ranges.
- Parameters:
c (str) – Covariate to check.
min_confounding (float) – Minimum confounding score.
max_confounding (float) – Maximum confounding score.
min_smoothness (float) – Minimum smoothness score.
max_smoothness (float) – Maximum smoothness score.
- Returns:
True if scores are within range, False otherwise.
- Return type:
bool
- make(missing: Optional[str] = None, min_confounding: float = 0.0, max_confounding: float = 1.0, min_smoothness: float = 0.0, max_smoothness: float = 1.0) SpaceDataset [source]
Generates a SpaceDataset by masking a covariate.
- Parameters:
missing (str, optional (Default is None)) – Name of the covariate to be masked. If not specified, a covariate is selected at random from those that satisfy the smoothness and confounding requirements for masking.
min_confounding (float, optional (Default is 0.0)) – Minimum confounding score for the covariate to be masked.
max_confounding (float, optional (Default is 1.0)) – Maximum confounding score for the covariate to be masked.
min_smoothness (float, optional (Default is 0.0)) – Minimum smoothness score for the covariate to be masked.
max_smoothness (float, optional (Default is 1.0)) – Maximum smoothness score for the covariate to be masked.
- Returns:
A SpaceDataset.
- Return type:
SpaceDataset
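As a hedged usage sketch (continuing with the env object loaded above), make either picks the masked covariate at random or restricts the choice through the score bounds:

>>> dataset = env.make()                                          # random eligible covariate
>>> dataset = env.make(min_confounding=0.5, min_smoothness=0.5)   # stronger, smoother confounder
>>> dataset = env.make(missing="covariate_name")                  # hypothetical covariate name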
- make_all(min_confounding: float = 0.0, max_confounding: float = 1.0, min_smoothness: float = 0.0, max_smoothness: float = 1.0)[source]
Generates all possible SpaceDatasets by masking each eligible covariate.
- Parameters:
min_confounding (float, optional (Default is 0.0)) – Minimum confounding score for the covariate to be masked.
max_confounding (float, optional (Default is 1.0)) – Maximum confounding score for the covariate to be masked.
min_smoothness (float, optional (Default is 0.0)) – Minimum smoothness score for the covariate to be masked.
max_smoothness (float, optional (Default is 1.0)) – Maximum smoothness score for the covariate to be masked.
- Returns:
A generator of SpaceDatasets.
- Return type:
Generator[SpaceDataset]
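Continuing the same sketch, the generator returned by make_all can be iterated directly:

>>> for dataset in env.make_all(min_confounding=0.25):
...     print(dataset.treatment.shape, dataset.covariates.shape)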
- make_unmasked() SpaceDataset [source]
Generates a SpaceDataset with all covariates observed (no missing confounding).
- Returns:
A SpaceDataset with all covariates observed.
- Return type:
SpaceDataset
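Continuing the sketch, the unmasked dataset can serve as a reference with no hidden confounder:

>>> full_dataset = env.make_unmasked()  # every covariate observed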
Evaluation Module
This module handles the evaluation of causal inference methods.
- class spacebench.eval.DatasetEvaluator(dataset: SpaceDataset)[source]
Class for evaluating the performance of a causal inference method on a specific SpaceDataset.
- class spacebench.eval.EnvEvaluator(env: SpaceEnv)[source]
Class for evaluating the performance of a causal inference method on a specific SpaceEnv.
- add(dataset: SpaceDataset, ate: Optional[ndarray] = None, att: Optional[ndarray] = None, counterfactuals: Optional[ndarray] = None, erf: Optional[ndarray] = None) None [source]
Add a dataset and its associated estimates to the buffer.
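A sketch of collecting estimates across an environment with EnvEvaluator; only the constructor and add signatures documented in this section are used, and the ATE value is a placeholder rather than a real estimate.

>>> import numpy as np
>>> from spacebench.env import SpaceEnv
>>> from spacebench.eval import EnvEvaluator
>>> env = SpaceEnv("healthd_dmgrcs_mortality_disc")
>>> evaluator = EnvEvaluator(env)
>>> for dataset in env.make_all():
...     ate_estimate = np.array([0.0])   # placeholder; use your method's estimate here
...     evaluator.add(dataset, ate=ate_estimate)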