multiDGD.functions
- multiDGD.functions.count_parameters(model)
Counts the number of trainable parameters in a model.
- multiDGD.functions.sc_feature_selection(data, modalities, feature_selection)
Takes a MuData or AnnData object and a feature selection mode and returns the data with the selected features.
‘modalities’ must be provided if the data is an AnnData object whose obs does not contain a column called ‘modality’.
feature_selection can be:
  - a list of floats, where each float is the percentage of features to keep for the corresponding modality
  - an integer, which represents the variance threshold for all modalities
  - a float, where the same percentage of features is kept for all modalities
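The three accepted forms of feature_selection can be pictured with the following sketch, which operates on plain per-feature variances rather than on a MuData or AnnData object. It is not the library's code and rests on several assumptions: features are ranked by variance, floats are fractions between 0 and 1, and the integer threshold is inclusive. The names `keep_top_fraction` and `select_features` are hypothetical.

```python
from typing import Dict, List, Sequence, Union

Selection = Union[int, float, Sequence[float]]


def keep_top_fraction(variances: Sequence[float], fraction: float) -> List[int]:
    """Return sorted indices of the highest-variance features (assumed ranking)."""
    n_keep = max(1, round(len(variances) * fraction))
    ranked = sorted(range(len(variances)), key=lambda i: variances[i], reverse=True)
    return sorted(ranked[:n_keep])


def select_features(
    variances_by_modality: Dict[str, Sequence[float]],
    feature_selection: Selection,
) -> Dict[str, List[int]]:
    """Dispatch on the type of `feature_selection` (hypothetical helper)."""
    selected = {}
    for position, (name, variances) in enumerate(variances_by_modality.items()):
        if isinstance(feature_selection, int):
            # Integer: one variance threshold shared by all modalities.
            selected[name] = [
                i for i, v in enumerate(variances) if v >= feature_selection
            ]
        elif isinstance(feature_selection, float):
            # Float: the same fraction of features for every modality.
            selected[name] = keep_top_fraction(variances, feature_selection)
        else:
            # List of floats: one fraction per modality, matched by position.
            selected[name] = keep_top_fraction(variances, feature_selection[position])
    return selected


variances = {"rna": [0.1, 2.0, 0.5, 3.0], "atac": [1.0, 0.2]}
print(select_features(variances, 0.5))
print(select_features(variances, 1))
print(select_features(variances, [0.25, 1.0]))
```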
- multiDGD.functions.set_random_seed(seed)
Setting a random seed for reproducibility (including numpy, random, and torch)
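A function of this kind typically seeds each random number generator in turn. The sketch below shows one way this could look. It is an assumption about the general pattern, not the multiDGD source, and the name `seed_everything` is hypothetical. numpy and torch are imported lazily so the snippet also runs where they are not installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed Python's `random`, and numpy and torch when they are available."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # numpy not installed; nothing to seed
    try:
        import torch

        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # torch not installed; nothing to seed


seed_everything(0)
first = random.random()
seed_everything(0)
second = random.random()
print(first == second)  # True: the same seed reproduces the same draw
```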
- multiDGD.functions.setup_data(data: MuData | AnnData, modality_key: str = None, observable_key: str = None, layer: str = None, covariate_keys: List[str] = None, train_fraction: float = 0.8, include_test: bool = True, reference=None) MuData | AnnData
Prepares the data for the model. The input can be either an AnnData or a MuData object.
Arguments
- data: anndata or mudata object
- modality_key: str, optional
If the object is not a mudata object, this key will be used to define the modalities of the data.
- observable_key: str, optional
Key of the ‘observable’ factor that will be used to define the number of GMM components in the prior distribution. If None, the model initialization will ask for a definition of the number of components. The default is None.
- layer: str, optional
Layer of the data to use. If None, use X. The default is None.
- covariate_keys: list, optional
List of keys of the ‘nuisance’ factors that should be excluded from the biological representation, e.g. batch, donor, disease state, … The default is None.
- train_fraction: float, optional
Fraction of the data that will be used for training. The default is 0.8.
- include_test: bool, optional
If True, the test set will be included in the data object; otherwise, the split will only be train and validation. The default is True. For integrating a new data set, set the train_fraction to 1.0.
Returns
Anndata or mudata object with the data prepared for the model.
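How train_fraction and include_test interact can be sketched as follows. This is not how setup_data is implemented. In particular, the even division of the held-out samples between validation and test is an assumption, and `assign_splits` and its `seed` parameter are hypothetical.

```python
import random
from typing import List


def assign_splits(
    n_samples: int,
    train_fraction: float = 0.8,
    include_test: bool = True,
    seed: int = 0,
) -> List[str]:
    """Return one split label per sample (hypothetical helper)."""
    n_train = int(round(n_samples * train_fraction))
    n_rest = n_samples - n_train
    if include_test:
        # Assumption: held-out samples are halved into validation and test.
        n_val = n_rest // 2
        n_test = n_rest - n_val
    else:
        n_val, n_test = n_rest, 0
    labels = ["train"] * n_train + ["validation"] * n_val + ["test"] * n_test
    random.Random(seed).shuffle(labels)  # randomize which samples get which label
    return labels


labels = assign_splits(100)
print(labels.count("train"), labels.count("validation"), labels.count("test"))
```

With train_fraction set to 1.0, every sample is labeled as training data, which matches the recommendation above for integrating a new data set.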