multiDGD.functions

multiDGD.functions.count_parameters(model)

Counts the number of trainable parameters in a model.
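For a PyTorch model, a count like this is commonly written as a sum over the parameters that require gradients. The sketch below shows that common pattern. It is an assumption about how the count is computed, not the library's confirmed implementation, and the name `count_trainable_parameters` is hypothetical.

```python
def count_trainable_parameters(model):
    """Return the total number of elements in parameters that require gradients.

    Hypothetical stand-in for illustration only. It works with any object that
    exposes a PyTorch-style ``parameters()`` iterator, e.g. a ``torch.nn.Module``.
    """
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```

For example, `torch.nn.Linear(10, 5)` has 50 weights and 5 biases, so this sketch would report 55 for it.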

multiDGD.functions.sc_feature_selection(data, modalities, feature_selection)

Takes a MuData or AnnData object and a feature selection mode, and returns the data restricted to the selected features.

‘modalities’ must be provided if the data is an AnnData object whose obs does not contain a column called ‘modality’.

feature_selection can be:

- a list of floats, where each float is the percentage of features to keep for the corresponding modality
- an integer, which is used as the variance threshold for all modalities
- a float, in which case the same percentage of features is kept for all modalities
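The three accepted forms of feature_selection can be told apart by type. The sketch below only classifies the argument and selects no features. The helper `describe_feature_selection` and the modality names `rna` and `atac` are hypothetical, and the library may dispatch differently.

```python
from numbers import Integral, Real


def describe_feature_selection(feature_selection, modalities):
    """Map each modality to a (mode, value) pair.

    Hypothetical helper: it interprets the argument as documented above
    but does not touch any data.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(feature_selection, bool):
        raise TypeError("feature_selection must be a list, an int, or a float")
    # A list gives one percentage per modality.
    if isinstance(feature_selection, (list, tuple)):
        if len(feature_selection) != len(modalities):
            raise ValueError("expected one value per modality")
        return {
            name: ("percentage", float(value))
            for name, value in zip(modalities, feature_selection)
        }
    # An integer is a variance threshold shared by all modalities.
    # This check must come before Real, because integers are also Real.
    if isinstance(feature_selection, Integral):
        return {name: ("variance_threshold", int(feature_selection)) for name in modalities}
    # A float is a percentage shared by all modalities.
    if isinstance(feature_selection, Real):
        return {name: ("percentage", float(feature_selection)) for name in modalities}
    raise TypeError("feature_selection must be a list, an int, or a float")


per_modality = describe_feature_selection([0.2, 0.5], ["rna", "atac"])
shared_threshold = describe_feature_selection(3, ["rna", "atac"])
shared_percentage = describe_feature_selection(0.1, ["rna", "atac"])
```

Checking for integers before floats matters here, since an integer and a float select different modes.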

multiDGD.functions.set_random_seed(seed)

Sets a random seed for reproducibility (covers numpy, random, and torch).
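Seeding the three generators named above usually looks like the following sketch. The function name `seed_everything` is hypothetical, and which seeding calls the library actually makes is an assumption. The optional imports only let the example run where NumPy or PyTorch is not installed.

```python
import random


def seed_everything(seed):
    """Seed the standard library, NumPy, and PyTorch generators where available."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is not installed, so there is nothing to seed.
    try:
        import torch

        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch is not installed, so there is nothing to seed.


# Re-seeding with the same value reproduces the same random draws.
seed_everything(0)
first_draw = [random.random() for _ in range(3)]
seed_everything(0)
second_draw = [random.random() for _ in range(3)]
```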

multiDGD.functions.setup_data(data: MuData | AnnData, modality_key: str = None, observable_key: str = None, layer: str = None, covariate_keys: List[str] = None, train_fraction: float = 0.8, include_test: bool = True, reference=None) -> MuData | AnnData

Prepares the data for the model. The input can be either an AnnData or a MuData object.

Arguments

data: anndata or mudata object

modality_key: str, optional

If the object is not a mudata object, this key will be used to define the modalities of the data.

observable_key: str, optional

Key of the ‘observable’ factor that will be used to define the number of GMM components in the prior distribution. If None, the model initialization will ask for a definition of the number of components. The default is None.

layer: str, optional

Layer of the data to use. If None, use X. The default is None.

covariate_keys: list, optional

List of keys of the ‘nuisance’ factors that should be excluded from the biological representation, e.g. batch, donor, or disease state. The default is None.

train_fraction: float, optional

Fraction of the data that will be used for training. The default is 0.8.

include_test: bool, optional

If True, the test set will be included in the data object; otherwise, the data is split only into train and validation sets. The default is True. To integrate a new data set, set train_fraction to 1.0.

Returns

Anndata or mudata object with the data prepared for the model.
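To make the interaction between train_fraction and include_test concrete, the sketch below splits observation indices in plain Python. The helper `split_indices` is hypothetical. The even division of the held-out remainder between validation and test is an assumption made for illustration, since the text above does not state how the remainder is divided.

```python
import random


def split_indices(n_obs, train_fraction=0.8, include_test=True, seed=0):
    """Return shuffled index lists for each split.

    Assumption (not confirmed by the documentation): when include_test is
    True, the held-out remainder is divided evenly between validation and test.
    """
    if not 0.0 < train_fraction <= 1.0:
        raise ValueError("train_fraction must be in (0, 1]")
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)
    n_train = round(n_obs * train_fraction)
    train, held_out = indices[:n_train], indices[n_train:]
    if include_test:
        n_validation = len(held_out) // 2
        return {
            "train": train,
            "validation": held_out[:n_validation],
            "test": held_out[n_validation:],
        }
    return {"train": train, "validation": held_out}


three_way = split_indices(100, train_fraction=0.8, include_test=True)
two_way = split_indices(100, train_fraction=0.8, include_test=False)
all_train = split_indices(100, train_fraction=1.0)
```

With train_fraction set to 1.0, every observation lands in the training split, which matches the setting recommended above for integrating a new data set.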