Configuration class

class Config(polydfe_spectra_config: str = None, polydfe_init_file: str = None, polydfe_init_file_id: int = 1, sfs_neut: Spectra | Spectrum = None, sfs_sel: Spectra | Spectrum = None, intervals_del: Tuple[float, float, int] = (-100000000.0, -1e-05, 1000), intervals_ben: Tuple[float, float, int] = (1e-05, 10000.0, 1000), intervals_h: Tuple[float, float, int] = (0.0, 1.0, 21), h_callback: Callable[[ndarray], ndarray] = None, integration_mode: Literal['midpoint', 'quad'] = 'midpoint', linearized: bool = True, model: Parametrization | str = 'GammaExpParametrization', seed: int = 0, x0: Dict[str, Dict[str, float]] = {}, bounds: Dict[str, Tuple[float, float]] = {}, scales: Dict[str, Literal['lin', 'log', 'symlog']] = {}, loss_type: Literal['likelihood', 'L2'] = 'likelihood', opts_mle: dict = {}, method_mle: str = 'L-BFGS-B', n_runs: int = 10, fixed_params: Dict[str, Dict[str, float]] = None, shared_params: List[SharedParams] = [], covariates: List[Covariate] = [], do_bootstrap: bool = True, n_bootstraps: int = 100, n_bootstrap_retries: int = 2, parallelize: bool = True, **kwargs)

Bases: object

Configuration class to be used for BaseInference and JointInference.

__init__(polydfe_spectra_config: str = None, polydfe_init_file: str = None, polydfe_init_file_id: int = 1, sfs_neut: Spectra | Spectrum = None, sfs_sel: Spectra | Spectrum = None, intervals_del: Tuple[float, float, int] = (-100000000.0, -1e-05, 1000), intervals_ben: Tuple[float, float, int] = (1e-05, 10000.0, 1000), intervals_h: Tuple[float, float, int] = (0.0, 1.0, 21), h_callback: Callable[[ndarray], ndarray] = None, integration_mode: Literal['midpoint', 'quad'] = 'midpoint', linearized: bool = True, model: Parametrization | str = 'GammaExpParametrization', seed: int = 0, x0: Dict[str, Dict[str, float]] = {}, bounds: Dict[str, Tuple[float, float]] = {}, scales: Dict[str, Literal['lin', 'log', 'symlog']] = {}, loss_type: Literal['likelihood', 'L2'] = 'likelihood', opts_mle: dict = {}, method_mle: str = 'L-BFGS-B', n_runs: int = 10, fixed_params: Dict[str, Dict[str, float]] = None, shared_params: List[SharedParams] = [], covariates: List[Covariate] = [], do_bootstrap: bool = True, n_bootstraps: int = 100, n_bootstrap_retries: int = 2, parallelize: bool = True, **kwargs)

Create config object.

Parameters:
  • polydfe_spectra_config (str) – Path to polyDFE SFS config file.

  • polydfe_init_file (str) – Path to polyDFE init file.

  • polydfe_init_file_id (int) – ID of polyDFE init file.

  • sfs_neut (Spectra | Spectrum) – Neutral SFS. Note that we require monomorphic counts to be specified in order to infer the mutation rate.

  • sfs_sel (Spectra | Spectrum) – Selected SFS. Note that we require monomorphic counts to be specified in order to infer the mutation rate.

  • intervals_del (Tuple[float, float, int]) – (start, stop, n_interval) for deleterious population-scaled selection coefficients. The intervals will be log10-spaced. Decreasing the number of intervals to 100 provides nearly identical results while increasing speed, especially when precomputing across dominance coefficients.

  • intervals_ben (Tuple[float, float, int]) – Same as intervals_del but for positive selection coefficients. Decreasing the number of intervals to 100 provides nearly identical results while increasing speed, especially when precomputing across dominance coefficients.

  • intervals_h (Tuple[float, float, int]) – (start, stop, n_interval) for the dominance coefficients, which are linearly spaced. This is only used when inferring dominance coefficients. Values of h between grid points are interpolated linearly.

  • h_callback (Callable[[ndarray], ndarray]) – A function mapping the scalar parameter h and the array of selection coefficients S to dominance coefficients of the same shape as S, allowing models in which h depends on S. The default is lambda h, S: np.full_like(S, h), which applies the same h to every selection coefficient. Expected allele counts for a given dominance value are obtained by linear interpolation between the values precomputed on the intervals_h grid. The inferred parameter is still named h, even if transformed by h_callback, and its bounds, scales, and initial values can be set via bounds, scales, and x0. The fitness of heterozygotes and mutant homozygotes is defined as 1 + 2hs and 1 + 2s, respectively.

  • integration_mode (Literal['midpoint', 'quad']) – Integration mode used when computing the expected SFS under semidominance. 'quad' is not recommended.

  • linearized (bool) – Whether to discretize and cache the linearized integral mapping the DFE to the SFS, or to call scipy.integrate.quad on every evaluation. Setting this to False is not recommended.

  • model (Parametrization | str) – Parametrization of the DFE.

  • seed (int) – Seed for the random number generator. Use None for no seed.

  • x0 (Dict[str, Dict[str, float]]) – Dictionary of initial values in the form {type: {param: value}}

  • bounds (Dict[str, Tuple[float, float]]) – Bounds for the optimization in the form {param: (lower, upper)}

  • scales (Dict[str, Literal['lin', 'log', 'symlog']]) – Scales for the optimization in the form {param: scale}

  • loss_type (Literal['likelihood', 'L2']) – Loss function to use.

  • opts_mle (dict) – Options for the optimization.

  • method_mle (str) – Method to use for optimization. See scipy.optimize.minimize for available methods.

  • n_runs (int) – Number of independent optimization runs out of which the best one is chosen. The first run will use the initial values if specified. Consider increasing this number if the optimization does not produce good results.

  • fixed_params (Dict[str, Dict[str, float]]) – Fixed parameters for the optimization.

  • shared_params (List[SharedParams]) – Shared parameters for the optimization.

  • covariates (List[Covariate]) – Covariates for the optimization.

  • do_bootstrap (bool) – Whether to do bootstrapping automatically.

  • n_bootstraps (int) – Number of bootstraps.

  • n_bootstrap_retries (int) – Number of optimization runs for each bootstrap sample; the most likely run is kept. Previously, this parameter set the number of retries per bootstrap sample after failed runs; it now sets the total number of runs per bootstrap sample.

  • parallelize (bool) – Whether to parallelize the optimization.

  • kwargs – Additional keyword arguments, which are ignored.
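The discretization and dominance options above can be illustrated with a small NumPy sketch. It is not fastdfe's internal code: the helpers make_grids, interpolate_over_h and fitness, the toy "precomputed" table, and the custom callback h_decay are hypothetical, and treating n_interval as the number of grid points (rather than bins) is an assumption. The sketch builds log10-spaced S grids from intervals_del and intervals_ben, a linear h grid from intervals_h, maps h to per-S dominance values through an h_callback, and linearly interpolates precomputed values between neighboring h grid points.

```python
import numpy as np


def make_grids(intervals_del=(-1e8, -1e-5, 1000),
               intervals_ben=(1e-5, 1e4, 1000),
               intervals_h=(0.0, 1.0, 21)):
    """Hypothetical helper: build S and h grids from (start, stop, n) tuples.

    Assumes n is the number of grid points.
    """
    start, stop, n = intervals_del
    # Deleterious S values are negative, so log10-space their magnitudes.
    s_del = -np.logspace(np.log10(-start), np.log10(-stop), n)
    start, stop, n = intervals_ben
    s_ben = np.logspace(np.log10(start), np.log10(stop), n)
    start, stop, n = intervals_h
    # Dominance coefficients are linearly spaced.
    h_grid = np.linspace(start, stop, n)
    return s_del, s_ben, h_grid


def interpolate_over_h(table, h_grid, h_values):
    """Interpolate table (shape (len(h_grid), n_S)) at one h value per S column."""
    h_values = np.asarray(h_values, dtype=float)
    # Index of the grid point at or below each h, kept inside the grid.
    i = np.clip(np.searchsorted(h_grid, h_values, side="right") - 1, 0, len(h_grid) - 2)
    w = (h_values - h_grid[i]) / (h_grid[i + 1] - h_grid[i])
    cols = np.arange(table.shape[1])
    return (1 - w) * table[i, cols] + w * table[i + 1, cols]


def fitness(s, h):
    """Heterozygote and mutant-homozygote fitness: 1 + 2hs and 1 + 2s."""
    return 1 + 2 * h * s, 1 + 2 * s


# Documented default: the same h for every selection coefficient.
default_h_callback = lambda h, S: np.full_like(S, h)
# Hypothetical alternative: dominance shrinks as |S| grows.
h_decay = lambda h, S: h * np.exp(-np.abs(S) / 100.0)

s_del, s_ben, h_grid = make_grids()
S = np.concatenate([s_del, s_ben])

# Toy table that is linear in h (2h + 1), so linear interpolation is exact.
table = 2.0 * h_grid[:, None] + np.ones((1, S.size))

values_const = interpolate_over_h(table, h_grid, default_h_callback(0.3, S))
values_decay = interpolate_over_h(table, h_grid, h_decay(0.3, S))
```

With the default callback every column is read at h = 0.3, whereas with h_decay strongly selected sites fall back toward h = 0, which is the kind of S-dependent dominance the h_callback option is meant to allow.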

update(**kwargs)

Update config with given data.

Parameters:

kwargs – Data to update.

Return type:

Config

Returns:

Updated config.

parse_polydfe_init_file(file: str, id: int = 1, type='all')

Parse polyDFE init file. This will define the initial parameters and which ones will be held fixed during the optimization.

Parameters:
  • file (str) – Path to the init file.

  • id (int) – ID of the init file.

  • type – Type of parameters to parse.

create_polydfe_init_file(file: str, n: int, type: str = 'all')

Create an init file for polyDFE.

Parameters:
  • file (str) – Path to the init file to be created.

  • n (int) – SFS sample size.

  • type (str) – Type to use for the init file.

parse_polydfe_sfs_config(file: str)

Parse the frequency spectra and mutational target sites from a polyDFE configuration file.

Parameters:

file (str) – Path to the polyDFE config file.

create_polydfe_sfs_config(file: str)

Create an SFS config file for polyDFE.

Parameters:

file (str) – Path to the sfs config file to be created.

to_dict()

Represent config as dictionary.

Return type:

dict

Returns:

Dictionary representation of config.

to_json()

Create JSON representation of object.

Return type:

str

Returns:

JSON string

to_yaml()

Create YAML representation of object.

Return type:

str

Returns:

YAML string

to_file(file: str)

Save object to file.

Parameters:

file (str) – Path to file.

static from_dict(data: dict)

Load config from dictionary.

Return type:

Config

Returns:

Config object.

static from_json(data: str)

Load config from a JSON string.

Parameters:

data (str) – JSON string.

Return type:

Config

Returns:

Config object.

static from_yaml(data: str)

Load config from a YAML string.

Parameters:

data (str) – YAML string.

Return type:

Config

Returns:

Config object.
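The update(), to_dict()/from_dict() and to_json()/from_json() methods form round trips. The sketch below illustrates that pattern with the standard json module using a hypothetical dict-backed MiniConfig class; it is not fastdfe's implementation, and YAML is left out because it would require a third-party parser such as PyYAML.

```python
import json
from typing import Any, Dict


class MiniConfig:
    """Hypothetical dict-backed stand-in for Config; not fastdfe's implementation."""

    def __init__(self, **kwargs: Any) -> None:
        self.data: Dict[str, Any] = dict(kwargs)

    def update(self, **kwargs: Any) -> "MiniConfig":
        # Overwrite the given keys and return self so calls can be chained.
        self.data.update(kwargs)
        return self

    def to_dict(self) -> Dict[str, Any]:
        # Return a shallow copy so callers cannot mutate the config by accident.
        return dict(self.data)

    def to_json(self) -> str:
        return json.dumps(self.to_dict(), indent=2)

    @staticmethod
    def from_dict(data: Dict[str, Any]) -> "MiniConfig":
        return MiniConfig(**data)

    @staticmethod
    def from_json(data: str) -> "MiniConfig":
        return MiniConfig.from_dict(json.loads(data))


config = MiniConfig(n_runs=10, intervals_h=(0.0, 1.0, 21)).update(n_runs=20)
restored = MiniConfig.from_json(config.to_json())
```

One consequence applies to any JSON round trip: JSON has no tuple type, so tuple-valued options such as intervals_h come back as lists.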

classmethod from_file(file: str, cache: bool = True)

Load object from file.

Parameters:
  • file (str) – Path to file, possibly gzipped or a URL.

  • cache (bool) – Whether to use the cache if available.

Return type:

Config

Returns:

Config object.
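from_file accepts a local path, a gzipped file, or a URL. A minimal standard-library sketch of that kind of input handling follows; read_text is a hypothetical helper, detecting gzip by its magic bytes (0x1f 0x8b) is an assumption about how such files might be recognized, and the cache option is not modeled.

```python
import gzip
import json
import os
import tempfile
import urllib.request


def read_text(file: str) -> str:
    """Hypothetical helper: return the text of a local path or URL, gunzipping if needed."""
    if file.startswith(("http://", "https://")):
        with urllib.request.urlopen(file) as response:
            raw = response.read()
    else:
        with open(file, "rb") as fh:
            raw = fh.read()
    # Gzip streams begin with the two magic bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")


# Demo: write the same JSON config plain and gzipped, then read both back.
with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "config.json")
    gz_path = os.path.join(tmp, "config.json.gz")
    with open(plain_path, "w", encoding="utf-8") as fh:
        json.dump({"n_runs": 10}, fh)
    with gzip.open(gz_path, "wt", encoding="utf-8") as fh:
        json.dump({"n_runs": 10}, fh)
    plain_loaded = json.loads(read_text(plain_path))
    gz_loaded = json.loads(read_text(gz_path))
```

Sniffing the content rather than the .gz suffix keeps the helper working for URLs and for files whose names do not reveal their compression.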

get_polydfe_model()

Get the model name in polyDFE that corresponds to the configured DFE parametrization.

Return type:

str

Returns:

polyDFE model name.