Modules
This section documents the modules used in this project.
DataMaster Module
This module handles the masterfile and collections metadata.
- class spacebench.datamaster.DataMaster[source]
Class for managing the masterfile and collections metadata.
- Parameters:
masterfile (pd.DataFrame) – A dataframe with metadata about available datasets.
collections (pd.DataFrame) – A dataframe with information about the collections where the datasets are generated from.
Examples
>>> from spacebench.datamaster import DataMaster
>>> dm = DataMaster()
>>> print(dm)
Available datasets (total: 11):
healthd_dmgrcs_mortality_disc
cdcsvi_limteng_hburdic_cont
climate_relhum_wfsmoke_cont
climate_wfsmoke_minrty_disc
healthd_hhinco_mortality_cont
...
county_educatn_election_cont
county_phyactiv_lifexpcy_cont
county_dmgrcs_election_disc
cdcsvi_nohsdp_poverty_cont
cdcsvi_nohsdp_poverty_disc
- list_envs(binary: Optional[bool] = None, continuous: Optional[bool] = None) list[str] [source]
Returns a list of names of available datasets.
- Parameters:
binary (bool, optional) – If True, only binary datasets are returned.
continuous (bool, optional) – If True, only continuous datasets are returned.
- Returns:
Names of all available datasets.
- Return type:
list[str]
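For illustration, a minimal sketch of filtering the catalogue with these flags (no output shown, since the exact listing depends on the installed data):

>>> from spacebench.datamaster import DataMaster
>>> dm = DataMaster()
>>> binary_envs = dm.list_envs(binary=True)          # only binary datasets
>>> continuous_envs = dm.list_envs(continuous=True)  # only continuous datasets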
Environment Module
This module handles the benchmark environments and datasets. It defines the SpaceEnv and SpaceDataset classes.
- class spacebench.env.SpaceDataset(treatment: ndarray, covariates: ndarray, outcome: ndarray, edges: list[tuple[int, int]], treatment_values: ndarray, smoothness_of_missing: Optional[float] = None, confounding_of_missing: Optional[float] = None, counterfactuals: Optional[ndarray] = None, coordinates: Optional[ndarray] = None)[source]
Class for storing a spatial causal inference benchmark dataset.
- adjacency_matrix(sparse: bool = False) numpy.ndarray | scipy.sparse.csr_matrix [source]
Returns the adjacency matrix of the graph.
- Parameters:
sparse (bool, optional (default is False)) – If True, returns a sparse matrix of type csr_matrix. If False, returns a dense matrix.
- Returns:
Adjacency matrix where entry (i, j) is 1 if there is an edge between node i and node j.
- Return type:
np.ndarray | scipy.sparse.csr_matrix
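As a sketch of how the adjacency matrix relates to the edge list, the following builds a tiny SpaceDataset by hand using only the constructor arguments documented above; the toy arrays are illustrative, not real benchmark data.

>>> import numpy as np
>>> from spacebench.env import SpaceDataset
>>> dataset = SpaceDataset(
...     treatment=np.array([0, 1, 0]),
...     covariates=np.zeros((3, 2)),
...     outcome=np.array([1.0, 2.0, 1.5]),
...     edges=[(0, 1), (1, 2)],          # undirected edges of the spatial graph
...     treatment_values=np.array([0, 1]),
... )
>>> A = dataset.adjacency_matrix()                    # dense np.ndarray
>>> A_sparse = dataset.adjacency_matrix(sparse=True)  # scipy csr_matrix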
- class spacebench.env.SpaceEnv(name: str, dir: Optional[str] = None)[source]
Class for a SpaCE environment.
It holds the data and metadata used to generate datasets by masking a covariate, which then acts as a missing confounder.
- api
Dataverse API object.
- Type:
DataverseAPI
- config
Dictionary with the configuration of the dataset.
- Type:
dict
- counfound_score_dict
Dictionary with the confounding scores of the covariates.
- Type:
dict
- datamaster
DataMaster object.
- Type:
DataMaster
- dir
Directory where the dataset is stored.
- Type:
str
- graph
Graph of the dataset.
- Type:
networkx.Graph
- metadata
Dictionary with the metadata of the dataset.
- Type:
dict
- name
Name of the dataset.
- Type:
str
- smoothness_score_dict
Dictionary with the smoothness scores of the covariates.
- Type:
dict
- synthetic_data
Synthetic data of the dataset.
- Type:
pd.DataFrame
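A minimal sketch of loading an environment and inspecting some of the attributes listed above; the dataset name is taken from the DataMaster example, and download/caching behavior depends on your setup.

>>> from spacebench.env import SpaceEnv
>>> env = SpaceEnv("healthd_dmgrcs_mortality_disc")
>>> env.name                          # name of the dataset
>>> sorted(env.counfound_score_dict)  # covariates with confounding scores
>>> env.graph.number_of_nodes()       # spatial graph as a networkx.Graph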
- _check_scores(c: str, min_confounding: float, max_confounding: float, min_smoothness: float, max_smoothness: float) bool [source]
Check whether the given covariate's smoothness and confounding scores are within the given ranges.
- Parameters:
c (str) – Covariate to check.
min_confounding (float) – Minimum confounding score.
max_confounding (float) – Maximum confounding score.
min_smoothness (float) – Minimum smoothness score.
max_smoothness (float) – Maximum smoothness score.
- Returns:
True if scores are within range, False otherwise.
- Return type:
bool
- make(missing: Optional[str] = None, min_confounding: float = 0.0, max_confounding: float = 1.0, min_smoothness: float = 0.0, max_smoothness: float = 1.0) SpaceDataset [source]
Generates a SpaceDataset by masking a covariate.
- Parameters:
missing (str, optional (Default is None)) – Name of the covariate to be masked. If not specified, a covariate is selected at random from those that satisfy the smoothness and confounding requirements for masking.
min_confounding (float, optional (Default is 0.0)) – Minimum confounding score for the covariate to be masked.
max_confounding (float, optional (Default is 1.0)) – Maximum confounding score for the covariate to be masked.
min_smoothness (float, optional (Default is 0.0)) – Minimum smoothness score for the covariate to be masked.
max_smoothness (float, optional (Default is 1.0)) – Maximum smoothness score for the covariate to be masked.
- Returns:
A SpaceDataset.
- Return type:
SpaceDataset
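As a hedged usage sketch (continuing with the env object loaded above), make either picks the masked covariate at random or restricts the choice through the score bounds:

>>> dataset = env.make()                                          # random eligible covariate
>>> dataset = env.make(min_confounding=0.5, min_smoothness=0.5)   # stronger, smoother confounder
>>> dataset = env.make(missing="covariate_name")                  # hypothetical covariate name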
- make_all(min_confounding: float = 0.0, max_confounding: float = 1.0, min_smoothness: float = 0.0, max_smoothness: float = 1.0)[source]
Generates all possible SpaceDatasets by masking each eligible covariate.
- Parameters:
min_confounding (float, optional (Default is 0.0)) – Minimum confounding score for the covariate to be masked.
max_confounding (float, optional (Default is 1.0)) – Maximum confounding score for the covariate to be masked.
min_smoothness (float, optional (Default is 0.0)) – Minimum smoothness score for the covariate to be masked.
max_smoothness (float, optional (Default is 1.0)) – Maximum smoothness score for the covariate to be masked.
- Returns:
A generator of SpaceDatasets.
- Return type:
Generator[SpaceDataset]
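Continuing the same sketch, the generator returned by make_all can be iterated directly:

>>> for dataset in env.make_all(min_confounding=0.25):
...     print(dataset.treatment.shape, dataset.covariates.shape)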
- make_unmasked() SpaceDataset [source]
Generates a SpaceDataset with all covariates observed (no missing confounding).
- Returns:
A SpaceDataset with all covariates observed.
- Return type:
SpaceDataset
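Continuing the sketch, the unmasked dataset can serve as a reference with no hidden confounder:

>>> full_dataset = env.make_unmasked()  # every covariate observed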
Evaluation Module
This module handles the evaluation of causal inference methods.
- class spacebench.eval.DatasetEvaluator(dataset: SpaceDataset)[source]
Class for evaluating the performance of a causal inference method on a specific SpaceDataset.
- class spacebench.eval.EnvEvaluator(env: SpaceEnv)[source]
Class for evaluating the performance of a causal inference method on a specific SpaceEnv.
- add(dataset: SpaceDataset, ate: Optional[ndarray] = None, att: Optional[ndarray] = None, counterfactuals: Optional[ndarray] = None, erf: Optional[ndarray] = None) None [source]
Add a dataset and its associated estimates to the buffer.
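A sketch of collecting estimates across an environment with EnvEvaluator; only the constructor and add signatures documented in this section are used, and the ATE value is a placeholder rather than a real estimate.

>>> import numpy as np
>>> from spacebench.env import SpaceEnv
>>> from spacebench.eval import EnvEvaluator
>>> env = SpaceEnv("healthd_dmgrcs_mortality_disc")
>>> evaluator = EnvEvaluator(env)
>>> for dataset in env.make_all():
...     ate_estimate = np.array([0.0])   # placeholder; use your method's estimate here
...     evaluator.add(dataset, ate=ate_estimate)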