Data integration

Loading a model

Warning

Currently, the model requires the data it was trained on for proper loading. We are working on removing this in the next version.

This is the recommended way to load a model:

import multiDGD
import anndata as ad

data = ad.read_h5ad("path/to/adata.h5ad")

model = multiDGD.load(
    data=data,
    save_dir="path/to/save_dir",
    model_name="name-of-saved-model",
)

Integrating new data

In general, this is how new data can be integrated into an existing model loaded as shown above. It is important to note that the features (genes, peaks) have to match those the model was trained on.

# remember that this data needs to be prepared in the same way as the training data
new_data = ad.read_h5ad("path/to/new_data.h5ad")

model.predict_new(new_data, n_epochs=20, include_correction_error=True, indices_of_new_distribution=None, external=False)

This method is described in the predict_new API section. A detailed example is given in the following notebook:

human bonemarrow example integration notebook

It also includes a demonstration of how to integrate a single modality.

If the data comes from a different dataset with different covariates, indices_of_new_distribution should be a list of indices of samples coming from new covariate classes (e.g. new batches). This currently only supports learning one new component. include_correction_error determines whether the integration should take into account the covariates during inference. When integrating all new batches, it might be beneficial to set this to False. We recommend to set n_epochs to a higher number (e.g. 100), especially for larger datasets.

Integrating a single modality / Predicting missing modalities

Note

See section above for tutorial.

In order to integrate a single modality, the same method as above can be used. Just make sure that the number and order of features of the modality to be integrated matches the one the model was trained on.

Integrating novel covariates

Note

Documentation coming soon.