Model customization

multiDGD hyperparameters

Hyperparameters for multiDGD are provided as a dictionary. The package ships with a default set of hyperparameters, and any of them can be overridden by passing a dictionary to the model at initialization (see below).

import multiDGD

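# `data` is the dataset you have already loaded and prepared for multiDGD

custom_parameters = {
    # custom hyperparameters, e.g. "latent_dimension": 10
}

# initialize the model; entries in custom_parameters override the defaults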
model = multiDGD.DGD(
    data=data,
    parameter_dictionary=custom_parameters,
)
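The override behaviour described above can be pictured as a plain dictionary merge. The following is a minimal sketch under that assumption, not the package's actual code: `DEFAULTS` is a hypothetical excerpt built from a few rows of the table below, and `merge_parameters` is an illustrative helper that does not exist in multiDGD. Rejecting unknown keys is an extra safeguard of this sketch, added to catch typos.

```python
# Hypothetical excerpt of the defaults (values copied from the table below);
# the package's real default dictionary contains more entries.
DEFAULTS = {
    "latent_dimension": 20,
    "n_components": 1,
    "batch_size": 128,
    "learning_rates": [1e-4, 1e-2, 1e-2],
}


def merge_parameters(defaults, custom=None):
    """Return a new dict in which entries of `custom` override `defaults`."""
    custom = custom or {}
    # Assumption of this sketch: unknown keys are treated as typos.
    unknown = set(custom) - set(defaults)
    if unknown:
        raise KeyError(f"unknown hyperparameters: {sorted(unknown)}")
    # Later entries win, so custom values replace the defaults.
    return {**defaults, **custom}


custom_parameters = {"latent_dimension": 10, "n_components": 4}
parameters = merge_parameters(DEFAULTS, custom_parameters)

print(parameters["latent_dimension"])  # overridden by the custom value
print(parameters["batch_size"])        # untouched default
```

Only the keys present in `custom_parameters` change; everything else keeps its default, and `DEFAULTS` itself is left unmodified.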

The following table lists the hyperparameters that can be customized, as well as their default values:

Argument	Type	Default	Description
`latent_dimension`	int	`20`	Dimensionality of the latent space.
`n_components`	int	`1`	Number of components in the mixture model.
`n_hidden`	int	`2`	Number of hidden layers in the shared decoder (\(\theta_{h}\)).
`n_hidden_modality`	int	`3`	Number of hidden layers in the modality-specific decoder (\(\theta_{h_m}\)).
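`n_units`	int	`100`	Number of units in each hidden layer, except the last one, whose width is \(\max\{100, \sqrt{|\text{features}|}\}\), where \(|\text{features}|\) is the number of features. For example, 40,000 features give a last layer of 200 units (with the default `decoder_width` of 1).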
`value_init`	str or array	`'zero'`	Initialization of the weights: either `'zero'` or an array of initial values.
`softball_scale`	float	`2`	Scale parameter of the Softball prior (see manuscript). It determines the scale of the sphere of the (mollified uniform) prior over component means.
`softball_hardness`	float	`5`	Hardness parameter of the Softball prior (see manuscript). It determines how sharp (non-smooth) the transition from probability 1 to 0 is.
`sd_sd`	float	`1`	Standard deviation of the Gaussian prior over the negative log covariance. It has little effect in practice and can usually be left at 1. The mean of this prior is determined by the number of components and the softball scale.
`softball_scale_corr`	float	`2`	Same as `softball_scale`, but for the covariate models.
`softball_hardness_corr`	float	`5`	Same as `softball_hardness`, but for the covariate models.
`sd_sd_corr`	float	`1`	Same as `sd_sd`, but for the covariate models.
`dirichlet_a`	float	`1`	Concentration parameter of the Dirichlet prior over the mixture weights. Higher values mean stronger enforcement of equal component probabilities.
`batch_size`	int	`128`	Batch size for training.
`learning_rates`	list	`[1e-4, 1e-2, 1e-2]`	Learning rates for the three sets of parameters, in this order: decoder, representation, GMM.
`betas`	list	`[0.5, 0.7]`	Beta coefficients (\(\beta_1\), \(\beta_2\)) for the Adam optimizer.
`weight_decay`	float	`1e-4`	Weight decay for the Adam optimizer.
`decoder_width`	int	`1`	Factor by which the number of units in all hidden layers is multiplied (a way to bypass the last-layer width rule).
`log_wandb`	list	`['username', 'projectname']`	wandb username and project name to log to, given as a list of two strings.
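The effect of `dirichlet_a` described in the table can be checked numerically. The sketch below assumes the prior is a symmetric Dirichlet in which every component shares the concentration `dirichlet_a`; it uses only the Python standard library and is not taken from the multiDGD code base. The function name `symmetric_dirichlet_logpdf` and the two example weight vectors are illustrative.

```python
import math


def symmetric_dirichlet_logpdf(weights, a):
    """Log density of a symmetric Dirichlet(a, ..., a) at `weights`."""
    k = len(weights)
    if abs(sum(weights) - 1.0) > 1e-9 or min(weights) <= 0:
        raise ValueError("weights must be positive and sum to 1")
    # Log of the normalizing constant Gamma(k * a) / Gamma(a)^k.
    log_norm = math.lgamma(k * a) - k * math.lgamma(a)
    return log_norm + (a - 1.0) * sum(math.log(w) for w in weights)


# Two candidate sets of mixture weights for four components.
equal = [0.25, 0.25, 0.25, 0.25]
skewed = [0.85, 0.05, 0.05, 0.05]

gaps = {}
for a in (1.0, 5.0):
    # Positive gap: the prior prefers equal weights over skewed ones.
    gaps[a] = symmetric_dirichlet_logpdf(equal, a) - symmetric_dirichlet_logpdf(skewed, a)
    print(f"dirichlet_a={a}: log-density gap (equal - skewed) = {gaps[a]:.2f}")
```

With a concentration of 1 the prior is flat, so both weight vectors receive the same density and the gap is zero. With a concentration of 5 the gap is positive, meaning the prior favours equal mixture weights.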