DFE Inference

Contents

DFE Inference#

class BaseInference(sfs_neut: Spectra | Spectrum, sfs_sel: Spectra | Spectrum, intervals_del: Tuple[float, float, int] = (-100000000.0, -1e-05, 1000), intervals_ben: Tuple[float, float, int] = (1e-05, 10000.0, 1000), intervals_h: Tuple[float, float, int] = (0.0, 1.0, 21), h_callback: Callable[[ndarray], ndarray] = None, integration_mode: Literal['midpoint', 'quad'] = 'midpoint', linearized: bool = True, model: Parametrization | str = 'GammaExpParametrization', seed: int = 0, x0: Dict[str, Dict[str, float]] = {}, bounds: Dict[str, Tuple[float, float]] = {}, scales: Dict[str, Literal['lin', 'log', 'symlog']] = {}, loss_type: Literal['likelihood', 'L2'] = 'likelihood', opts_mle: dict = {}, method_mle: str = 'L-BFGS-B', n_runs: int = 10, fixed_params: Dict[str, Dict[str, float]] = None, do_bootstrap: bool = True, n_bootstraps: int = 100, n_bootstrap_retries: int = 2, parallelize: bool = True, folded: bool = None, discretization: Discretization = None, optimization: Optimization = None, locked: bool = False, **kwargs)[source]#

Bases: AbstractInference

Base inference class for inferring the SFS given one neutral and one selected SFS. Note that BaseInference is by default seeded.

Example usage:

import fastdfe as fd

sfs_neut = fd.Spectrum([177130, 997, 441, 228, 156, 117, 114, 83, 105, 109, 652])
sfs_sel = fd.Spectrum([797939, 1329, 499, 265, 162, 104, 117, 90, 94, 119, 794])

# create inference object
inf = fd.BaseInference(
    sfs_neut=sfs_neut,
    sfs_sel=sfs_sel,
    n_runs=10,
    n_bootstraps=100,
    do_bootstrap=True
)

# run inference
inf.run()

# plot discretized DFE
inf.plot_discretized()

# plot SFS comparison
inf.plot_sfs_comparison()
__init__(sfs_neut: Spectra | Spectrum, sfs_sel: Spectra | Spectrum, intervals_del: Tuple[float, float, int] = (-100000000.0, -1e-05, 1000), intervals_ben: Tuple[float, float, int] = (1e-05, 10000.0, 1000), intervals_h: Tuple[float, float, int] = (0.0, 1.0, 21), h_callback: Callable[[ndarray], ndarray] = None, integration_mode: Literal['midpoint', 'quad'] = 'midpoint', linearized: bool = True, model: Parametrization | str = 'GammaExpParametrization', seed: int = 0, x0: Dict[str, Dict[str, float]] = {}, bounds: Dict[str, Tuple[float, float]] = {}, scales: Dict[str, Literal['lin', 'log', 'symlog']] = {}, loss_type: Literal['likelihood', 'L2'] = 'likelihood', opts_mle: dict = {}, method_mle: str = 'L-BFGS-B', n_runs: int = 10, fixed_params: Dict[str, Dict[str, float]] = None, do_bootstrap: bool = True, n_bootstraps: int = 100, n_bootstrap_retries: int = 2, parallelize: bool = True, folded: bool = None, discretization: Discretization = None, optimization: Optimization = None, locked: bool = False, **kwargs)[source]#

Create BaseInference instance.

Parameters:
  • sfs_neut (Spectra | Spectrum) – The neutral SFS. Note that we require monomorphic counts to be specified in order to infer the mutation rate. If a Spectra object with more than one SFS is provided, the all attribute will be used.

  • sfs_sel (Spectra | Spectrum) – The selected SFS. Note that we require monomorphic counts to be specified in order to infer the mutation rate. If a Spectra object with more than one SFS is provided, the all attribute will be used.

  • intervals_del (Tuple[float, float, int]) – (start, stop, n_interval) for deleterious population-scaled selection coefficients. The intervals will be log10-spaced. Decreasing the number of intervals to 100 provides nearly identical results while increasing speed, especially when precomputing across dominance coefficients.

  • intervals_ben (Tuple[float, float, int]) – Same as intervals_del but for positive selection coefficients. Decreasing the number of intervals to 100 provides nearly identical results while increasing speed, especially when precomputing across dominance coefficients.

  • intervals_h (Tuple[float, float, int]) – (start, stop, n_interval) for dominance coefficients which are linearly spaced. This is only used when inferring dominance coefficients. Values of h between the edges will be interpolated linearly.

  • h_callback (Callable[[ndarray], ndarray]) – A function mapping the scalar parameter h and the array of selection coefficients S to dominance coefficients of the same shape, allowing models where h depends on S. The default is lambda h, S: np.full_like(S, h), keeping h constant. Expected allele counts for a given dominance value are obtained by linear interpolation between precomputed values in intervals_h. The inferred parameter is still named h, even if transformed by h_callback, and its bounds, scales, and initial values can be set via bounds, scales, and x0. The fitness of heterozygotes and mutation homozygotes is defined as 1 + 2hs and 1 + 2s, respectively.

  • integration_mode (Literal['midpoint', 'quad']) – Integration mode when computing expected SFS under semidominance. quad is not recommended.

  • linearized (bool) – Whether to discretize and cache the linearized integral mapping DFE to SFS or use scipy.integrate.quad in each call. False not recommended.

  • model (Parametrization | str) – Instance of DFEParametrization which parametrized the DFE

  • seed (int) – Seed for the random number generator. Use None for no seed.

  • x0 (Dict[str, Dict[str, float]]) – Dictionary of initial values in the form {'all': {param: value}}

  • bounds (Dict[str, Tuple[float, float]]) – Bounds for the optimization in the form {param: (lower, upper)}

  • scales (Dict[str, Literal['lin', 'log', 'symlog']]) – Scales for the optimization in the form {param: scale}

  • loss_type (Literal['likelihood', 'L2']) – Type of loss function to use for optimization.

  • opts_mle (dict) – Options for the optimization.

  • method_mle (str) – Method to use for optimization. See scipy.optimize.minimize for available methods.

  • n_runs (int) – Number of independent optimization runs out of which the best one is chosen. The first run will use the initial values if specified. Consider increasing this number if the optimization does not produce good results.

  • fixed_params (Dict[str, Dict[str, float]]) – Parameters kept constant during optimization, given as {'all': {param: value}}. By default, the ancestral misidentification rate eps is fixed to zero, h to 0.5, and the DFE parameters are fixed so that only the deleterious component of the DFE is estimated.

  • do_bootstrap (bool) – Whether to do bootstrapping.

  • n_bootstraps (int) – Number of bootstraps.

  • n_bootstrap_retries (int) – Number of optimization runs for each bootstrap sample. This parameter previously defined the number of retries per bootstrap sample when subsequent runs failed, but now it defines the total number of runs per bootstrap sample, taking the most likely one.

  • pbar – Whether to show a progress bar.

  • parallelize (bool) – Whether to parallelize computations.

  • folded (bool) – Whether the SFS are folded. If not specified, the SFS will be folded if both of the given SFS appear to be folded.

  • discretization (Discretization) – Discretization instance. Mainly intended for internal use.

  • optimization (Optimization) – Optimization instance. Mainly intended for internal use.

  • locked (bool) – Whether inferences can be run from the class itself. Used internally.

  • kwargs – Additional keyword arguments which are ignored.

sfs_neut: Spectrum#

Neutral SFS

sfs_sel: Spectrum#

Selected SFS

model: Parametrization#

The DFE parametrization

folded: bool#

Whether the SFS are folded

n: int#

Sample size

fixed_params: Dict[str, Dict[str, float]]#

Fixed parameters

discretization: Discretization#

Discretization instance

theta: float#

Estimate of theta from neutral SFS

params_mle: Optional[Dict[str, float]]#

MLE estimates of the initial optimization

sfs_mle: Optional[Spectrum]#

Modelled MLE SFS

likelihood: Optional[float]#

Likelihood of the MLE, this value may be updated after bootstrapping

likelihoods: Optional[List[float]]#

Likelihoods of the different ML runs, controlled by n_runs

n_runs: int#

Number of MLE runs to perform

runs: Optional[DataFrame] = None#

Dataframe holding information on all optimization runs

result: Optional['scipy.optimize.OptimizeResult']#

Numerical optimization result

do_bootstrap: bool#

Whether to do bootstrapping

n_bootstraps: int#

Number of bootstraps

n_bootstrap_retries: int#

Number of optimization runs for each bootstrap sample

parallelize: bool#

Whether to parallelize computations

scales: Dict[str, Literal['lin', 'log', 'symlog']]#

Parameter scales

bounds: Dict[str, Tuple[float, float]]#

Parameter bounds

optimization: Optimization#

Optimization instance

x0: Dict[str, Dict[str, float]]#

Initial values

bootstraps: Optional[pd.DataFrame]#

Bootstrapped MLE parameter estimates

bootstrap_results: Optional[List['scipy.optimize.OptimizeResult']]#

Bootstrap optimization results

L2_residual: Optional[float]#

L2 norm of differential between modelled and observed SFS

seed: int | None#

Random number generator seed

rng: np.random.Generator#

Random generator instance

execution_time: float#

Total execution time in seconds

locked: bool#

Whether inferences can be run from the class itself

get_fixed_param_names()[source]#

Get the names of the fixed parameters.

Return type:

List[str]

check_fixed_params_exist()[source]#

Check that the fixed parameters are valid.

raise_if_locked()[source]#

Raise an error if this object is locked.

Raises:

Exception

run_if_required(*args, **kwargs)[source]#

Run if not run yet.

Parameters:
  • args – Arguments.

  • kwargs – Keyword arguments.

Return type:

Optional[Spectrum]

Returns:

DFE parametrization and modelled SFS.

run(do_bootstrap: bool = None, pbar: bool = None, **kwargs)[source]#

Perform the DFE inference.

Parameters:
  • pbar (bool) – Whether to show a progress bar.

  • do_bootstrap (bool) – Whether to perform bootstrapping.

  • kwargs – Additional keyword arguments which are ignored.

Return type:

Spectrum

Returns:

Modelled SFS.

get_counts()[source]#

Get callback functions for modelling SFS counts from given parameters.

Return type:

dict

Returns:

Callback functions for modelling SFS counts for each type.

evaluate_likelihood(params: Dict[str, Dict[str, float]])[source]#

Evaluate likelihood function at given parameters.

Parameters:

params (Dict[str, Dict[str, float]]) – Parameters indexed by type and parameter name.

Return type:

float

Returns:

The likelihood.

get_residual(k: int)[source]#

Residual of the modelled and observed SFS.

Parameters:

k (int) – Order of the norm

Return type:

float

Returns:

L2 residual

bootstrap(n_samples: int = None, parallelize: bool = None, update_likelihood: bool = False, n_retries: int = None, pbar: bool = True, pbar_title: str = 'Bootstrapping')[source]#

Perform the parametric bootstrap.

Parameters:
  • n_samples (int) – Number of bootstrap samples. Defaults to n_bootstraps.

  • parallelize (bool) – Whether to parallelize the bootstrap. Defaults to parallelize.

  • update_likelihood (bool) – Whether to update the likelihood to be the mean of the bootstrap samples and the original likelihood. This is not statistically valid but can stabilize maximum likelihood estimate subject optimization noise.

  • n_retries (int) – Number of optimization runs for each bootstrap sample. Defaults to n_bootstrap_retries.

  • pbar (bool) – Whether to show a progress bar.

  • pbar_title (str) – Title for the progress bar.

Return type:

DataFrame

Returns:

Dataframe with bootstrap results.

get_scales_linear()[source]#

Get linear scales for all parameters. We do this for the bootstraps as x0 should be close to MLE.

Return type:

Dict[str, Literal['lin']]

Returns:

Dictionary of scales.

get_bounds_linear()[source]#

Get linear bounds for all parameters. We do this for the bootstraps as x0 should be close to MLE.

Return type:

Dict[str, Tuple[float, float]]

Returns:

Dictionary of bounds.

plot_continuous(file: str = None, show: bool = True, intervals: ndarray = None, confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', scale_density: bool = False, scale: Literal['lin', 'log', 'symlog'] = 'lin', title: str = 'DFE', kwargs_legend: dict = {'prop': {'size': 8}}, ax: plt.Axes = None)[source]#

Plot continuous DFE. The special constants np.inf and -np.inf are also valid interval bounds. By default, the PDF is plotted as is. Due to the logarithmic scale on the x-axis, we may get a wrong intuition on how the mass is distributed, however. To get a better intuition, we can optionally scale the density by the x-axis interval size using scale_density = True. This has the disadvantage that the density now changes for x, so that even a constant density will look warped.

Parameters:
  • title (str) – Plot title.

  • bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use for confidence intervals.

  • ci_level (float) – Confidence level for confidence intervals.

  • confidence_intervals (bool) – Whether to plot confidence intervals.

  • file (str) – File to save plot to.

  • show (bool) – Whether to show plot.

  • scale (Literal['lin', 'log', 'symlog']) – y-scale

  • scale_density (bool) – Whether to scale the density by the x-axis interval size

  • intervals (ndarray) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.

  • kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_bucket_sizes(file: str = None, show: bool = True, title: str = 'bucket sizes', ax: plt.Axes = None)[source]#

Plot mass in each bucket for the MLE DFE. This can be used to check if the interval bounds and spacing are chosen appropriately.

Parameters:
  • file (str) – File to save plot to.

  • show (bool) – Whether to show plot.

  • title (str) – Plot title.

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_interval_density(file: str = None, show: bool = True, intervals: Sequence = array([-inf, -100., -10., -1., 0., 1., inf]), interval_labels: List[str] = None, color: str = 'C0', ax: plt.Axes = None)[source]#

Plot density of the discretization intervals chosen. Note that although this plot looks similar, this is not the DFE!

Parameters:
  • color (str) – Color of the bars.

  • file (str) – File to save plot to.

  • show (bool) – Whether to show plot.

  • interval_labels (List[str]) – Labels for the intervals.

  • intervals (Sequence) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_sfs_comparison(sfs_types: List[Literal['modelled', 'observed', 'selected', 'neutral']] = ['observed', 'modelled'], labels: List[str] = None, colors: List[str] = None, file: str = None, show: bool = True, ax: plt.Axes = None, title: str = 'SFS comparison', use_subplots: bool = False, show_monomorphic: bool = False, kwargs_legend: dict = {'prop': {'size': 8}})[source]#

Plot SFS comparison.

Parameters:
  • file (str) – File to save plot to.

  • labels (List[str]) – Labels for the SFS.

  • colors (List[str]) – Colors for the SFS. Only for Python visualization backend.

  • sfs_types (List[Literal['modelled', 'observed', 'selected', 'neutral']]) – Types of SFS to plot.

  • show (bool) – Whether to show plot.

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend and if use_subplots is False.

  • title (str) – Plot title

  • use_subplots (bool) – Whether to use subplots. Only for Python visualization backend.

  • show_monomorphic (bool) – Whether to show monomorphic counts

  • kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_observed_sfs(labels: List[str] = None, file: str = None, show: bool = True, ax: plt.Axes = None)[source]#

Plot neutral and selected SFS.

Parameters:
  • file (str) – File to save plot to.

  • labels (List[str]) – Labels for the SFS.

  • show (bool) – Whether to show plot.

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_all(show: bool = True)[source]#

Plot a bunch of plots.

Parameters:

show (bool) – Whether to show plot.

get_stats_discretized(ci_level: float = 0.05, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean')[source]#

Get statistics for the discretized DFE.

Parameters:
  • ci_level (float) – Confidence interval level.

  • intervals (ndarray) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.

  • bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use.

  • point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.

Return type:

Tuple[ndarray, ndarray, ndarray, ndarray, ndarray]

Returns:

Center values, errors around center, and confidence intervals.

plot_inferred_parameters(confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', file: str = None, show: bool = True, title: str = 'parameter estimates', scale: Literal['lin', 'log', 'symlog'] = 'log', ax: plt.Axes = None, kwargs_legend: dict = {'loc': 'upper right', 'prop': {'size': 8}}, **kwargs)[source]#

Visualize the inferred parameters and their confidence intervals.

Parameters:
  • scale (Literal['lin', 'log', 'symlog']) – y-scale of the plot.

  • title (str) – Plot title.

  • confidence_intervals (bool) – Whether to show confidence intervals.

  • bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use.

  • ci_level (float) – Confidence level for the confidence intervals.

  • file (str) – File to save plot to.

  • show (bool) – Whether to show plot.

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

  • kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_inferred_parameters_boxplot(file: str = None, show: bool = True, title: str = 'parameter estimates', **kwargs)[source]#

Visualize the inferred parameters and their confidence intervals.

Parameters:
  • title (str) – Plot title.

  • file (str) – File to save plot to.

  • show (bool) – Whether to show plot.

Return type:

plt.Axes

Returns:

Axes object

Raises:

ValueError – If no inference objects are given or no bootstraps are found.

plot_likelihoods(file: str = None, show: bool = True, title: str = 'likelihoods', scale: Literal['lin', 'log'] = 'lin', ax: plt.Axes = None, ylabel: str = 'lnl', **kwargs)[source]#

Visualize the likelihoods of the optimization runs using a scatter plot.

Parameters:
  • scale (Literal['lin', 'log']) – y-scale of the plot.

  • title (str) – Plot title.

  • file (str) – File to save plot to.

  • show (bool) – Whether to show plot.

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

  • ylabel (str) – Label for the y-axis.

Return type:

plt.Axes

Returns:

Axes object

lrt(ll_simple: float, ll_complex: float, df: int, m: int = 0)[source]#

Perform a likelihood ratio test (LRT) with correction for boundary constraints as described in Molenberghs et Verbeke (2007) (https://doi.org/10.2307/1403361).

Parameters:
  • ll_simple (float) – Log-likelihood of the simple model.

  • ll_complex (float) – Log-likelihood of the complex model.

  • df (int) – Total number of parameters being tested.

  • m (int) – Number of parameters at the boundary.

Return type:

float

Returns:

p-value

compare_nested(complex: BaseInference)[source]#

Perform likelihood ratio test with given more complex model. The given model’s fixed parameters need to be a proper subset of this model’s fixed parameters. Low p-values indicate that the more complex model provides a significantly better fit.

Parameters:

complex (BaseInference) – More complex model.

Return type:

float | None

Returns:

p-value or None if the models are not nested.

Raises:

ValueError – If the inference objects are not nested.

compare_nested_models(do_bootstrap: bool = False)[source]#

Compare the various nested versions of the specified model using likelihood ratio tests. We check for inclusion of ancestral misidentification and beneficial mutations.

Parameters:

do_bootstrap (bool) – Whether to perform bootstrapping.

Return type:

Tuple[ndarray, Dict[str, BaseInference]]

Returns:

Matrix of p-values, dict of base inference objects

plot_nested_models(file: str = None, show: bool = True, remove_empty: bool = False, transpose: bool = False, cmap: str = None, title: str = 'nested model comparison', ax: plt.Axes = None, do_bootstrap: bool = False)[source]#

Plot the p-values of nested likelihoods.

Parameters:
  • file (str) – File to save plot to.

  • show (bool) – Whether to show plot.

  • remove_empty (bool) – Whether to remove empty rows and columns.

  • transpose (bool) – Whether to transpose the matrix.

  • cmap (str) – Colormap to use.

  • title (str) – Plot title.

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

  • do_bootstrap (bool) – Whether to perform bootstrapping.

Return type:

plt.Axes

Returns:

Axes object

plot_discretized(file: str = None, show: bool = True, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean', title: str = 'discretized DFE', ax: plt.Axes = None, kwargs_legend: dict = {'prop': {'size': 8}})#

Plot discretized DFE.

Parameters:
  • file (str) – File to save the plot to

  • show (bool) – Whether to show the plot

  • intervals (ndarray) – Array of interval boundaries over (-inf, inf) yielding intervals.shape[0] - 1 bars.

  • confidence_intervals (bool) – Whether to plot confidence intervals

  • ci_level (float) – Confidence interval level

  • bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap

  • point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.

  • title (str) – Title of the plot

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

  • kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes

get_dfe()[source]#

Return a frozen DFE object containing the inferred parameters and (if available) the bootstrap samples.

Return type:

DFE

Returns:

DFE object

get_spectra()[source]#

Return a Spectra object containing the neutral, selected and modelled SFS.

Return type:

Spectra

Returns:

Spectra object with types neutral, selected, and modelled.

get_alpha(params: dict = None)[source]#

Get alpha, the proportion of beneficial non-synonymous substitutions.

Parameters:

params (dict) – DFE Parameters to use for calculation.

Return type:

float

property alpha: float#

Cache alpha, the proportion of beneficial non-synonymous substitutions.

get_bootstrap_params()[source]#

Get the parameters to be included in the bootstraps.

Return type:

Dict[str, float]

Returns:

Parameters to be included in the bootstraps.

get_bootstrap_param_names()[source]#

Get the parameters to be included in the bootstraps.

Return type:

List[str]

Returns:

Parameters to be included in the bootstraps.

get_optimized_param_names()[source]#

Get the parameters names for the parameters that are optimized.

Return type:

List[str]

Returns:

List of parameter names.

get_n_optimized()[source]#

Get the number of parameters that are optimized.

Return type:

int

Returns:

Number of parameters that are optimized.

create_config()[source]#

Create a config object from the inference object.

Return type:

Config

Returns:

Config object.

classmethod from_config(config: Config)[source]#

Load from config object.

Parameters:

config (Config) – Config object.

Return type:

Self

Returns:

Inference object.

classmethod from_config_file(file: str, cache: bool = True)[source]#

Load from config file.

Parameters:
  • file (str) – Config file path, possibly URL.

  • cache (bool) – Whether to use the cache if available.

Return type:

Self

Returns:

Inference object.

get_summary()[source]#

Get summary.

Return type:

InferenceResult

Returns:

Inference results.

get_x0()[source]#

Get initial values.

Return type:

Dict[str, Dict[str, float]]

Returns:

Initial values.

property param_names: List[str]#

Parameter names.

Returns:

List of parameter names.

to_json()[source]#

Serialize object.

Return type:

str

Returns:

JSON string

classmethod from_file(file: str, classes=None)#

Load object from file.

Parameters:
  • classes – Classes to be used for unserialization.

  • file (str) – File to load from

Return type:

Self

classmethod from_json(json: str, classes=None)#

Unserialize object.

Parameters:
  • classes – Classes to be used for unserialization

  • json (str) – JSON string

Return type:

Self

get_discretized(intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean')#

Get discretized DFE.

Parameters:
  • intervals (ndarray) – Array of interval boundaries over (-inf, inf) yielding intervals.shape[0] - 1 bins.

  • confidence_intervals (bool) – Whether to return confidence intervals

  • ci_level (float) – Confidence interval level

  • bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap

  • point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.

Return type:

Tuple[ndarray, Optional[ndarray]]

Returns:

Array of values and array of deviations

to_file(file: str)#

Save object to file.

Parameters:

file (str) – File to save to

class JointInference(sfs_neut: Spectra, sfs_sel: Spectra, include_divergence: bool = None, intervals_del: Tuple[float, float, int] = (-100000000.0, -1e-05, 1000), intervals_ben: Tuple[float, float, int] = (1e-05, 10000.0, 1000), intervals_h: Tuple[float, float, int] = (0.0, 1.0, 21), h_callback: Callable[[ndarray], ndarray] = None, integration_mode: Literal['midpoint', 'quad'] = 'midpoint', linearized: bool = True, model: Parametrization | str = 'GammaExpParametrization', seed: int = 0, x0: Dict[str, Dict[str, float]] = {}, bounds: Dict[str, Tuple[float, float]] = {}, scales: Dict[str, Literal['lin', 'log', 'symlog']] = {}, loss_type: Literal['likelihood', 'L2'] = 'likelihood', opts_mle: dict = {}, method_mle: str = 'L-BFGS-B', n_runs: int = 10, fixed_params: Dict[str, Dict[str, float]] = None, shared_params: List[SharedParams] = [], covariates: List[Covariate] = [], do_bootstrap: bool = True, n_bootstraps: int = 100, n_bootstrap_retries: int = 2, parallelize: bool = True, folded: bool = None, discretization: Discretization = None, optimization: Optimization = None, locked: bool = False, **kwargs)[source]#

Bases: BaseInference

Enabling the sharing of parameters among several Inference objects.

Example usage:

import fastdfe as fd

# neutral SFS for two types
sfs_neut = fd.Spectra(dict(
    pendula=[177130, 997, 441, 228, 156, 117, 114, 83, 105, 109, 652],
    pubescens=[172528, 3612, 1359, 790, 584, 427, 325, 234, 166, 76, 31]
))

# selected SFS for two types
sfs_sel = fd.Spectra(dict(
    pendula=[797939, 1329, 499, 265, 162, 104, 117, 90, 94, 119, 794],
    pubescens=[791106, 5326, 1741, 1005, 756, 546, 416, 294, 177, 104, 41]
))

# create inference object
inf = fd.JointInference(
    sfs_neut=sfs_neut,
    sfs_sel=sfs_sel,
    do_bootstrap=True
)

# run inference
inf.run()
__init__(sfs_neut: Spectra, sfs_sel: Spectra, include_divergence: bool = None, intervals_del: Tuple[float, float, int] = (-100000000.0, -1e-05, 1000), intervals_ben: Tuple[float, float, int] = (1e-05, 10000.0, 1000), intervals_h: Tuple[float, float, int] = (0.0, 1.0, 21), h_callback: Callable[[ndarray], ndarray] = None, integration_mode: Literal['midpoint', 'quad'] = 'midpoint', linearized: bool = True, model: Parametrization | str = 'GammaExpParametrization', seed: int = 0, x0: Dict[str, Dict[str, float]] = {}, bounds: Dict[str, Tuple[float, float]] = {}, scales: Dict[str, Literal['lin', 'log', 'symlog']] = {}, loss_type: Literal['likelihood', 'L2'] = 'likelihood', opts_mle: dict = {}, method_mle: str = 'L-BFGS-B', n_runs: int = 10, fixed_params: Dict[str, Dict[str, float]] = None, shared_params: List[SharedParams] = [], covariates: List[Covariate] = [], do_bootstrap: bool = True, n_bootstraps: int = 100, n_bootstrap_retries: int = 2, parallelize: bool = True, folded: bool = None, discretization: Discretization = None, optimization: Optimization = None, locked: bool = False, **kwargs)[source]#

Create instance.

Parameters:
  • sfs_neut (Spectra) – Neutral SFS. Note that we require monomorphic counts to be specified in order to infer the mutation rate.

  • sfs_sel (Spectra) – Selected SFS. Note that we require monomorphic counts to be specified in order to infer the mutation rate.

  • include_divergence (bool) – Whether to include divergence in the likelihood

  • intervals_del (Tuple[float, float, int]) – (start, stop, n_interval) for deleterious population-scaled selection coefficients. The intervals will be log10-spaced. Decreasing the number of intervals to 100 provides nearly identical results while increasing speed, especially when precomputing across dominance coefficients.

  • intervals_ben (Tuple[float, float, int]) – Same as intervals_del but for positive selection coefficients. Decreasing the number of intervals to 100 provides nearly identical results while increasing speed, especially when precomputing across dominance coefficients.

  • intervals_h (Tuple[float, float, int]) – (start, stop, n_interval) for dominance coefficients which are linearly spaced. This is only used when inferring dominance coefficients. Values of h between the edges will be interpolated linearly.

  • h_callback (Callable[[ndarray], ndarray]) – A function mapping the scalar parameter h and the array of selection coefficients S to dominance coefficients of the same shape, allowing models where h depends on S. The default is lambda h, S: np.full_like(S, h), keeping h constant. Expected allele counts for a given dominance value are obtained by linear interpolation between precomputed values in intervals_h. The inferred parameter is still named h, even if transformed by h_callback, and its bounds, scales, and initial values can be set via bounds, scales, and x0. The fitness of heterozygotes and mutation homozygotes is defined as 1 + 2hs and 1 + 2s, respectively.

  • integration_mode (Literal['midpoint', 'quad']) – Integration mode when computing expected SFS under semidominance. quad is not recommended.

  • linearized (bool) – Whether to discretize and cache the linearized integral mapping DFE to SFS or use scipy.integrate.quad in each call. False not recommended.

  • model (Parametrization | str) – DFE parametrization

  • seed (int) – Random seed. Use None for no seed.

  • x0 (Dict[str, Dict[str, float]]) – Dictionary of initial values in the form {type: {param: value}}

  • bounds (Dict[str, Tuple[float, float]]) – Bounds for the optimization in the form {param: (lower, upper)}

  • scales (Dict[str, Literal['lin', 'log', 'symlog']]) – Scales for the optimization in the form {param: scale}

  • loss_type (Literal['likelihood', 'L2']) – Loss type

  • opts_mle (dict) – Options for the optimization

  • method_mle (str) – Method to use for optimization. See scipy.optimize.minimize for available methods.

  • n_runs (int) – Number of independent optimization runs out of which the best one is chosen. The first run will use the initial values if specified. Consider increasing this number if the optimization does not produce good results.

  • fixed_params (Dict[str, Dict[str, float]]) – Parameters kept constant during optimization, given as {type: {param: value}}. By default, the ancestral misidentification rate eps is fixed to zero, h to 0.5, and the DFE parameters are fixed so that only the deleterious component of the DFE is estimated.

  • shared_params (List[SharedParams]) – List of shared parameters

  • do_bootstrap (bool) – Whether to perform bootstrapping

  • n_bootstraps (int) – Number of bootstraps

  • n_bootstrap_retries (int) – Number of optimization runs for each bootstrap sample. This parameter previously defined the number of retries per bootstrap sample when subsequent runs failed, but now it defines the total number of runs per bootstrap sample, taking the most likely one.

  • parallelize (bool) – Whether to parallelize computations

  • folded (bool) – Whether the SFS are folded. If not specified, the SFS will be folded if all of the given SFS appear to be folded.

  • discretization (Discretization) – Discretization instance. Mainly intended for internal use.

  • optimization (Optimization) – Optimization instance. Mainly intended for internal use.

  • locked (bool) – Whether inferences can be run from the class itself. Used internally.

  • kwargs – Additional keyword arguments which are ignored.

types: List[str]#

SFS types

params_mle_raw: Optional[Dict[str, Dict[str, float]]]#

original MLE parameters before adding covariates and unpacked shared

shared_params#

Shared parameters with expanded ‘all’ type

covariates: Dict[str, Covariate]#

Covariates indexed by parameter name

fixed_params: Dict[str, Dict[str, float]]#

fixed parameters with expanded ‘all’ type

discretization: Discretization#

Discretization instance

marginal_inferences: Dict[str, BaseInference]#

Dictionary of marginal inferences indexed by type

scales: Dict[str, Literal['lin', 'log', 'symlog']]#

parameter scales

bounds: Dict[str, Tuple[float, float]]#

parameter bounds

optimization: Optimization#

Joint optimization instance

joint_inferences: Dict[str, BaseInference]#

Joint inference object indexed by type

check_shared_params()[source]#

Check if the shared parameters were specified correctly.

check_no_shared_params_fixed()[source]#

Check that no shared parameters are fixed and raise an error otherwise.

get_shared_param_names()[source]#

Get the names of the shared parameters.

Return type:

List[str]

Returns:

Names of the shared parameters.

run(**kwargs)[source]#

Run inference.

Parameters:

kwargs – Additional keyword arguments

Return type:

Spectrum

Returns:

Modelled SFS.

marginals_without_all()[source]#

Get marginal inference without ‘all’ type.

Return type:

Dict[str, BaseInference]

Returns:

Dict of marginal inferences indexed by type.

joint_inference_makes_sense()[source]#

Check if joint inference makes sense.

Return type:

bool

Returns:

Whether joint inference makes sense.

get_counts()[source]#

Get callback functions for modelling SFS counts from given parameters.

Return type:

dict

Returns:

Dict of callback functions indexed by type.

run_joint_without_covariates(do_bootstrap: bool = True)[source]#

Run joint inference without covariates. Note that the result of this function is cached.

Return type:

JointInference

Returns:

Joint inference instance devoid of covariates.

create_config()[source]#

Create a config object from the inference object.

Return type:

Config

Returns:

Config object.

perform_lrt_covariates(do_bootstrap: bool = False)[source]#

Perform likelihood ratio test against joint inference without covariates. In the simple model we share parameters across types. Low p-values indicate that including covariates significantly improves the model fit.

To access the JointInference object without covariates, you can call run_joint_without_covariates(), which is cached.

Parameters:

do_bootstrap (bool) – Whether to bootstrap.

Return type:

float

Returns:

Likelihood ratio test p-value.

bootstrap(n_samples: int = None, parallelize: bool = None, update_likelihood: bool = False, n_retries: int = None, pbar: bool = True, pbar_title: str = 'Bootstrapping joint inference')[source]#

Perform the parametric bootstrap both for the marginal and joint inferences.

Parameters:
  • n_samples (int) – Number of bootstrap samples. Defaults to n_bootstraps.

  • parallelize (bool) – Whether to parallelize the bootstrap. Defaults to parallelize.

  • update_likelihood (bool) – Whether to update the likelihood to be the mean of the bootstrap samples and the original likelihood. This is not statistically valid but can stabilize maximum likelihood estimate subject optimization noise.

  • n_retries (int) – Number of optimization runs for each bootstrap sample. Defaults to n_bootstrap_retries.

  • pbar (bool) – Whether to show a progress bar.

  • pbar_title (str) – Title for the progress bar.

Return type:

Optional[DataFrame]

Returns:

DataFrame with bootstrap samples

bootstrap_if_required()[source]#

Bootstrap if not done yet.

get_x0()[source]#

Get initial values for joint inference.

Return type:

Dict[str, Dict[str, float]]

Returns:

Dictionary of initial values indexed by type

perform_lrt_shared(do_bootstrap: bool = False)[source]#

Compare likelihood of joint inference with product of marginal likelihoods. This provides information about the goodness of fit achieved by the parameter sharing. Low p-values indicate that parameter sharing is not justified, i.e., that the marginal inferences provide a significantly better fit to the data. Note that it is more difficult to properly optimize the joint likelihood, which makes this test conservative, i.e., the reported p-value may be larger than what it should be.

Parameters:

do_bootstrap (bool) – Whether to perform bootstrapping.

Return type:

float

Returns:

p-value

get_inferences(types: List[str] = None, labels: List[str] = None, show_marginals: bool = True)[source]#

Get all inference objects as dictionary.

Parameters:
  • types (List[str]) – Types to include

  • labels (List[str]) – Labels for types

  • show_marginals (bool) – Whether to also show marginal inferences

Return type:

Dict[str, BaseInference]

Returns:

Dictionary of base inference objects indexed by type and joint vs marginal subtypes

plot_discretized(file: str = None, show: bool = True, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), show_marginals: bool = True, confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'original', title: str = 'discretized DFE comparison', labels: List[str] = None, kwargs_legend: dict = {'prop': {'size': 8}}, ax: plt.Axes = None)[source]#

Plot discretized DFE comparing the different types.

Parameters:
  • labels (List[str]) – Labels for types

  • title (str) – Title of plot

  • intervals (ndarray) – Array of interval boundaries over (-inf, inf) yielding intervals.shape[0] - 1 bars.

  • show_marginals (bool) – Whether to also show marginal inferences

  • bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap

  • point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.

  • ci_level (float) – Confidence level

  • confidence_intervals (bool) – Whether to plot confidence intervals

  • file (str) – File to save plot to

  • show (bool) – Whether to show plot

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

  • kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_sfs_comparison(sfs_types: List[Literal['modelled', 'observed', 'selected', 'neutral']] = ['observed', 'modelled'], types: List[str] = None, labels: List[str] = None, colors: List[str] = None, file: str = None, show: bool = True, ax: plt.Axes = None, title: str = 'SFS comparison', use_subplots: bool = False, show_monomorphic: bool = False, kwargs_legend: dict = {'prop': {'size': 8}})[source]#

Plot SFS comparison.

Parameters:
  • types (List[str]) – Types to plot

  • file (str) – File to save plot to

  • labels (List[str]) – Labels for types.

  • colors (List[str]) – Colors for types. Only for Python visualization backend.

  • sfs_types (List[Literal['modelled', 'observed', 'selected', 'neutral']]) – Types of SFS to plot

  • show (bool) – Whether to show plot

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend and if use_subplots is False.

  • title (str) – Plot title

  • use_subplots (bool) – Whether to use subplots. Only for Python visualization backend.

  • show_monomorphic (bool) – Whether to show monomorphic counts

  • kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_continuous(file: str = None, show: bool = True, intervals: ndarray = None, confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', title: str = 'DFE comparison', labels: List[str] = None, scale_density: bool = False, scale: Literal['lin', 'log', 'symlog'] = 'lin', kwargs_legend: dict = {'prop': {'size': 8}}, ax: plt.Axes = None)[source]#

Plot discretized DFE. The special constants np.inf and -np.inf are also valid interval bounds. By default, the PDF is plotted as is. Due to the logarithmic scale on the x-axis, we may get a wrong intuition on how the mass is distributed, however. To get a better intuition, we can optionally scale the density by the x-axis interval size using scale_density = True. This has the disadvantage that the density now changes for x, so that even a constant density will look warped.

Parameters:
  • scale_density (bool) – Whether to scale the density by the x-axis interval size

  • scale (Literal['lin', 'log', 'symlog']) – y-scale

  • labels (List[str]) – Labels for types

  • title (str) – Title of plot

  • bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap

  • ci_level (float) – Confidence level

  • confidence_intervals (bool) – Whether to plot confidence intervals

  • file (str) – File to save plot to

  • show (bool) – Whether to show plot

  • intervals (ndarray) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.

  • kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_inferred_parameters(file: str = None, confidence_intervals: bool = True, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', ci_level: float = 0.05, show: bool = True, title: str = 'inferred parameters', labels: List[str] = None, ax: plt.Axes = None, scale: Literal['lin', 'log', 'symlog'] = 'log', kwargs_legend: dict = {'loc': 'upper right', 'prop': {'size': 8}}, **kwargs: List[str])[source]#

Plot discretized DFE comparing the different types.

Parameters:
  • labels (List[str]) – Labels for types

  • title (str) – Title of plot

  • bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap

  • ci_level (float) – Confidence level

  • confidence_intervals (bool) – Whether to plot confidence intervals

  • file (str) – File to save plot to

  • show (bool) – Whether to show plot

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

  • scale (Literal['lin', 'log', 'symlog']) – y-scale

  • kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_inferred_parameters_boxplot(file: str = None, show: bool = True, title: str = 'inferred parameters', labels: List[str] = None, **kwargs: List[str])[source]#

Plot discretized DFE comparing the different types.

Parameters:
  • labels (List[str]) – Labels for types

  • title (str) – Title of plot

  • file (str) – File to save plot to

  • show (bool) – Whether to show plot

Return type:

plt.Axes

Returns:

Axes object

Raises:

ValueError – If no inference objects are given or no bootstraps are found.

plot_covariate(index: int = 0, file: str = None, show: bool = True, title: str = None, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'original', show_types: bool = True, ci_level: float = 0.05, xlabel: str = 'cov', ylabel: str = None, ax: plt.Axes = None)[source]#

Plot the covariate given by the index.

Parameters:
  • index (int) – The index of the covariate.

  • file (str) – File to save plot to.

  • show (bool) – Whether to show plot.

  • title (str) – Plot title.

  • bootstrap_type (Literal['percentile', 'bca']) – Bootstrap type.

  • point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.

  • show_types (bool) – Whether to show types on second x-axis.

  • ci_level (float) – Confidence level.

  • xlabel (str) – X-axis label.

  • ylabel (str) – Y-axis label, defaults to the covariate parameter name.

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object.

get_discretized(intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean')[source]#

Get discretized DFEs.

Parameters:
  • intervals (ndarray) – Array of interval boundaries over (-inf, inf) yielding intervals.shape[0] - 1 bins.

  • confidence_intervals (bool) – Whether to return confidence intervals

  • ci_level (float) – Confidence interval level

  • bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap

  • point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.

Return type:

Dict[str, Tuple[ndarray, ndarray]]

Returns:

Dictionary of array of values and array of errors indexed by inference type

get_bootstrap_params()[source]#

Get bootstrap parameters.

Return type:

Dict[str, float]

Returns:

Bootstrap parameters

to_json()[source]#

Serialize object. Note that the deserialized inference objects no longer share the same optimization instance among other things.

Return type:

str

Returns:

JSON string

property alpha: float | None#

The is no single alpha for the joint inference. Please refer to the self.joint_inferences[t].alpha.

Returns:

None

get_residual(k: int)[source]#

Residual of joint inference. We calculate the residual over the jointly inferred SFS for all types.

Parameters:

k (int) – Order of the norm

Return type:

float

Returns:

L2 residual

check_fixed_params_exist()#

Check that the fixed parameters are valid.

compare_nested(complex: BaseInference)#

Perform likelihood ratio test with given more complex model. The given model’s fixed parameters need to be a proper subset of this model’s fixed parameters. Low p-values indicate that the more complex model provides a significantly better fit.

Parameters:

complex (BaseInference) – More complex model.

Return type:

float | None

Returns:

p-value or None if the models are not nested.

Raises:

ValueError – If the inference objects are not nested.

compare_nested_models(do_bootstrap: bool = False)#

Compare the various nested versions of the specified model using likelihood ratio tests. We check for inclusion of ancestral misidentification and beneficial mutations.

Parameters:

do_bootstrap (bool) – Whether to perform bootstrapping.

Return type:

Tuple[ndarray, Dict[str, BaseInference]]

Returns:

Matrix of p-values, dict of base inference objects

evaluate_likelihood(params: Dict[str, Dict[str, float]])#

Evaluate likelihood function at given parameters.

Parameters:

params (Dict[str, Dict[str, float]]) – Parameters indexed by type and parameter name.

Return type:

float

Returns:

The likelihood.

classmethod from_config(config: Config)#

Load from config object.

Parameters:

config (Config) – Config object.

Return type:

Self

Returns:

Inference object.

classmethod from_config_file(file: str, cache: bool = True)#

Load from config file.

Parameters:
  • file (str) – Config file path, possibly URL.

  • cache (bool) – Whether to use the cache if available.

Return type:

Self

Returns:

Inference object.

classmethod from_file(file: str, classes=None)#

Load object from file.

Parameters:
  • classes – Classes to be used for unserialization.

  • file (str) – File to load from

Return type:

Self

classmethod from_json(json: str, classes=None)#

Unserialize object.

Parameters:
  • classes – Classes to be used for unserialization

  • json (str) – JSON string

Return type:

Self

get_alpha(params: dict = None)[source]#

Get alpha, the proportion of beneficial non-synonymous substitutions.

Parameters:

params (dict) – DFE parameters to use for calculation of format {type: {param: value}}.

Return type:

float

get_bootstrap_param_names()#

Get the parameters to be included in the bootstraps.

Return type:

List[str]

Returns:

Parameters to be included in the bootstraps.

get_bounds_linear()#

Get linear bounds for all parameters. We do this for the bootstraps as x0 should be close to MLE.

Return type:

Dict[str, Tuple[float, float]]

Returns:

Dictionary of bounds.

get_dfe()#

Return a frozen DFE object containing the inferred parameters and (if available) the bootstrap samples.

Return type:

DFE

Returns:

DFE object

get_fixed_param_names()#

Get the names of the fixed parameters.

Return type:

List[str]

get_n_optimized()#

Get the number of parameters that are optimized.

Return type:

int

Returns:

Number of parameters that are optimized.

get_optimized_param_names()#

Get the parameters names for the parameters that are optimized.

Return type:

List[str]

Returns:

List of parameter names.

get_scales_linear()#

Get linear scales for all parameters. We do this for the bootstraps as x0 should be close to MLE.

Return type:

Dict[str, Literal['lin']]

Returns:

Dictionary of scales.

get_spectra()#

Return a Spectra object containing the neutral, selected and modelled SFS.

Return type:

Spectra

Returns:

Spectra object with types neutral, selected, and modelled.

get_stats_discretized(ci_level: float = 0.05, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean')#

Get statistics for the discretized DFE.

Parameters:
  • ci_level (float) – Confidence interval level.

  • intervals (ndarray) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.

  • bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use.

  • point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.

Return type:

Tuple[ndarray, ndarray, ndarray, ndarray, ndarray]

Returns:

Center values, errors around center, and confidence intervals.

get_summary()#

Get summary.

Return type:

InferenceResult

Returns:

Inference results.

lrt(ll_simple: float, ll_complex: float, df: int, m: int = 0)#

Perform a likelihood ratio test (LRT) with correction for boundary constraints as described in Molenberghs et Verbeke (2007) (https://doi.org/10.2307/1403361).

Parameters:
  • ll_simple (float) – Log-likelihood of the simple model.

  • ll_complex (float) – Log-likelihood of the complex model.

  • df (int) – Total number of parameters being tested.

  • m (int) – Number of parameters at the boundary.

Return type:

float

Returns:

p-value

property param_names: List[str]#

Parameter names.

Returns:

List of parameter names.

plot_all(show: bool = True)#

Plot a bunch of plots.

Parameters:

show (bool) – Whether to show plot.

plot_bucket_sizes(file: str = None, show: bool = True, title: str = 'bucket sizes', ax: plt.Axes = None)#

Plot mass in each bucket for the MLE DFE. This can be used to check if the interval bounds and spacing are chosen appropriately.

Parameters:
  • file (str) – File to save plot to.

  • show (bool) – Whether to show plot.

  • title (str) – Plot title.

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_interval_density(file: str = None, show: bool = True, intervals: Sequence = array([-inf, -100., -10., -1., 0., 1., inf]), interval_labels: List[str] = None, color: str = 'C0', ax: plt.Axes = None)#

Plot density of the discretization intervals chosen. Note that although this plot looks similar, this is not the DFE!

Parameters:
  • color (str) – Color of the bars.

  • file (str) – File to save plot to.

  • show (bool) – Whether to show plot.

  • interval_labels (List[str]) – Labels for the intervals.

  • intervals (Sequence) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_likelihoods(file: str = None, show: bool = True, title: str = 'likelihoods', scale: Literal['lin', 'log'] = 'lin', ax: plt.Axes = None, ylabel: str = 'lnl', **kwargs)#

Visualize the likelihoods of the optimization runs using a scatter plot.

Parameters:
  • scale (Literal['lin', 'log']) – y-scale of the plot.

  • title (str) – Plot title.

  • file (str) – File to save plot to.

  • show (bool) – Whether to show plot.

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

  • ylabel (str) – Label for the y-axis.

Return type:

plt.Axes

Returns:

Axes object

plot_nested_models(file: str = None, show: bool = True, remove_empty: bool = False, transpose: bool = False, cmap: str = None, title: str = 'nested model comparison', ax: plt.Axes = None, do_bootstrap: bool = False)#

Plot the p-values of nested likelihoods.

Parameters:
  • file (str) – File to save plot to.

  • show (bool) – Whether to show plot.

  • remove_empty (bool) – Whether to remove empty rows and columns.

  • transpose (bool) – Whether to transpose the matrix.

  • cmap (str) – Colormap to use.

  • title (str) – Plot title.

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

  • do_bootstrap (bool) – Whether to perform bootstrapping.

Return type:

plt.Axes

Returns:

Axes object

plot_observed_sfs(labels: List[str] = None, file: str = None, show: bool = True, ax: plt.Axes = None)#

Plot neutral and selected SFS.

Parameters:
  • file (str) – File to save plot to.

  • labels (List[str]) – Labels for the SFS.

  • show (bool) – Whether to show plot.

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

raise_if_locked()#

Raise an error if this object is locked.

Raises:

Exception

run_if_required(*args, **kwargs)#

Run if not run yet.

Parameters:
  • args – Arguments.

  • kwargs – Keyword arguments.

Return type:

Optional[Spectrum]

Returns:

DFE parametrization and modelled SFS.

runs: Optional[pd.DataFrame] = None#

Dataframe holding information on all optimization runs

to_file(file: str)#

Save object to file.

Parameters:

file (str) – File to save to

sfs_neut: Spectrum#

Neutral SFS

sfs_sel: Spectrum#

Selected SFS

model: Parametrization#

The DFE parametrization

folded: bool#

Whether the SFS are folded

n: int#

Sample size

theta: float#

Estimate of theta from neutral SFS

params_mle: Optional[Dict[str, float]]#

MLE estimates of the initial optimization

sfs_mle: Optional[Spectrum]#

Modelled MLE SFS

likelihood: Optional[float]#

Likelihood of the MLE, this value may be updated after bootstrapping

likelihoods: Optional[List[float]]#

Likelihoods of the different ML runs, controlled by n_runs

n_runs: int#

Number of MLE runs to perform

result: Optional['scipy.optimize.OptimizeResult']#

Numerical optimization result

do_bootstrap: bool#

Whether to do bootstrapping

n_bootstraps: int#

Number of bootstraps

n_bootstrap_retries: int#

Number of optimization runs for each bootstrap sample

parallelize: bool#

Whether to parallelize computations

x0: Dict[str, Dict[str, float]]#

Initial values

bootstraps: Optional[pd.DataFrame]#

Bootstrapped MLE parameter estimates

bootstrap_results: Optional[List['scipy.optimize.OptimizeResult']]#

Bootstrap optimization results

L2_residual: Optional[float]#

L2 norm of differential between modelled and observed SFS

seed: int | None#

Random number generator seed

rng: np.random.Generator#

Random generator instance

execution_time: float#

Total execution time in seconds

locked: bool#

Whether inferences can be run from the class itself

class Inference[source]#

Bases: object

Static utility methods for inference objects.

static plot_discretized(inferences: List[AbstractInference], intervals: Sequence = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean', file: str = None, show: bool = True, title: str = 'discretized DFEs', labels: Sequence = None, ax: plt.Axes = None, kwargs_legend: dict = {'prop': {'size': 8}}, **kwargs)[source]#

Visualize several discretized DFEs given by the list of inference objects.

Parameters:
  • inferences (List[AbstractInference]) – List of inference objects.

  • intervals (Sequence) – Intervals over (-inf, inf) to use for discretization.

  • confidence_intervals (bool) – Whether to plot confidence intervals.

  • ci_level (float) – Confidence level for confidence intervals.

  • bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use for confidence intervals.

  • point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.

  • file (str) – Path to file to save the plot to.

  • show (bool) – Whether to show the plot.

  • title (str) – Title of the plot.

  • labels (Sequence) – Labels for the DFEs.

  • kwargs – Additional arguments for the plot.

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

  • kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes of the plot.

static plot_continuous(inferences: List[AbstractInference], intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', file: str = None, show: bool = True, title: str = 'continuous DFEs', labels: Sequence = None, scale: Literal['lin', 'log', 'symlog'] = 'lin', scale_density: bool = False, ax: plt.Axes = None, kwargs_legend: dict = {'prop': {'size': 8}}, **kwargs)[source]#

Visualize several DFEs given by the list of inference objects. By default, the PDF is plotted as is. Due to the logarithmic scale on the x-axis, we may get a wrong intuition on how the mass is distributed, however. To get a better intuition, we can optionally scale the density by the x-axis interval size using scale_density = True. This has the disadvantage that the density now changes for x, so that even a constant density will look warped.

Parameters:
  • inferences (List[AbstractInference]) – List of inference objects.

  • intervals (ndarray) – Intervals to use for discretization.

  • confidence_intervals (bool) – Whether to plot confidence intervals.

  • ci_level (float) – Confidence level for confidence intervals.

  • bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use for confidence intervals.

  • file (str) – Path to file to save the plot to.

  • show (bool) – Whether to show the plot.

  • title (str) – Title of the plot.

  • labels (Sequence) – Labels for the DFEs.

  • scale (Literal['lin', 'log', 'symlog']) – y-scale of the plot.

  • scale_density (bool) – Whether to scale the density by the x-axis interval size.

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

  • kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

  • kwargs – Additional arguments for the plot.

Return type:

plt.Axes

Returns:

Axes of the plot.

static plot_inferred_parameters(inferences: List[AbstractInference], labels: Sequence, confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean', file: str = None, show: bool = True, title: str = 'parameter estimates', scale: Literal['lin', 'log', 'symlog'] = 'log', ax: plt.Axes = None, kwargs_legend: dict = {'loc': 'upper right', 'prop': {'size': 8}}, **kwargs)[source]#

Visualize several discretized DFEs given by the list of inference objects. Note that the DFE parametrization needs to be the same for all inference objects.

Parameters:
  • inferences (List[AbstractInference]) – List of inference objects.

  • labels (Sequence) – Unique labels for the DFEs.

  • scale (Literal['lin', 'log', 'symlog']) – y-scale of the plot.

  • confidence_intervals (bool) – Whether to plot confidence intervals.

  • ci_level (float) – Confidence level for confidence intervals.

  • bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use for confidence intervals.

  • point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.

  • file (str) – Path to file to save the plot to.

  • show (bool) – Whether to show the plot.

  • title (str) – Title of the plot.

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

  • kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

  • kwargs – Additional arguments which are ignored.

Return type:

plt.Axes

Returns:

Axes of the plot.

Raises:

ValueError – If no inference objects are given.

static plot_inferred_parameters_boxplot(inferences: List[AbstractInference], labels: Sequence, file: str = None, show: bool = True, title: str = 'parameter estimates', **kwargs)[source]#

Visualize several discretized DFEs given by the list of inference objects. Note that the DFE parametrization needs to be the same for all inference objects.

Parameters:
  • inferences (List[AbstractInference]) – List of inference objects.

  • labels (Sequence) – Unique labels for the DFEs.

  • file (str) – Path to file to save the plot to.

  • show (bool) – Whether to show the plot.

  • title (str) – Title of the plot.

  • kwargs – Additional arguments for the plot.

Return type:

plt.Axes

Returns:

Axes of the plot.

Raises:

ValueError – If no inference objects are given or no bootstraps are found.

static get_errors_params_mle(ci_level: float, confidence_intervals: bool, inferences: List[AbstractInference], labels: Sequence, param_names: Sequence, bootstrap_type: Literal['percentile', 'bca'], point_estimate: Literal['original', 'mean', 'median'] = 'mean')[source]#

Get errors and values for MLE params of inferences.

Parameters:
  • ci_level (float) – Confidence level for confidence intervals.

  • confidence_intervals (bool) – Whether to compute confidence intervals.

  • inferences (List[AbstractInference]) – List of inference objects.

  • labels (Sequence) – Labels for the inferences.

  • param_names (Sequence) – Names of the parameters to get errors and values for.

  • bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use.

  • point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.

Return type:

Tuple[Dict[str, Optional[Tuple[ndarray, ndarray]]], Dict[str, ndarray]]

Returns:

dictionary of errors and dictionary of center values indexed by labels.

static get_discretized(inferences: List[AbstractInference], labels: Sequence, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean')[source]#

Get values and errors of discretized DFE.

Parameters:
  • inferences (List[AbstractInference]) – List of inference objects.

  • labels (Sequence) – Labels for the DFEs.

  • bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use

  • ci_level (float) – Confidence interval level

  • confidence_intervals (bool) – Whether to compute confidence intervals

  • intervals (ndarray) – Array of interval boundaries over (-inf, inf) yielding intervals.shape[0] - 1 bars.

  • point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.

Return type:

Tuple[Dict[str, ndarray], Dict[str, Optional[ndarray]]]

Returns:

Dictionary of values and dictionary of errors indexed by labels.

static get_stats_discretized(params: dict, bootstraps: DataFrame, model: Parametrization | str, ci_level: float = 0.05, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean')[source]#

Compute errors and confidence interval for a discretized DFE.

Parameters:
  • params (dict) – Parameters of the model

  • bootstraps (DataFrame) – Bootstrapped samples

  • model (Parametrization | str) – DFE parametrization

  • ci_level (float) – Confidence interval level

  • intervals (ndarray) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.

  • bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap

  • point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.

Return type:

Tuple[ndarray, ndarray, ndarray]

Returns:

Center values, errors around center, and confidence intervals.

static compute_histogram(model: Parametrization | str, params: dict, intervals: ndarray)[source]#

Discretize the DFE given a DFE parametrization and its parameter values.

Parameters:
  • model (Parametrization | str) – DFE parametrization

  • params (dict) – Parameters of the model

  • intervals (ndarray) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.

Return type:

ndarray

Returns:

Discretized DFE

class InferenceResult(inference: BaseInference)[source]#

Bases: Serializable

Inference result.

__init__(inference: BaseInference)[source]#

Initialize object.

Parameters:

inference (BaseInference) – Inference object.

classmethod from_file(file: str, classes=None)#

Load object from file.

Parameters:
  • classes – Classes to be used for unserialization.

  • file (str) – File to load from

Return type:

Self

classmethod from_json(json: str, classes=None)#

Unserialize object.

Parameters:
  • classes – Classes to be used for unserialization

  • json (str) – JSON string

Return type:

Self

to_file(file: str)#

Save object to file.

Parameters:

file (str) – File to save to

to_json()#

Serialize object.

Return type:

str

Returns:

JSON string

config: Config#

Configuration of the inference

dfe: DFE#

Inferred DFE

spectra: Spectra#

Spectra

bootstraps: DataFrame#

Bootstrap samples

runs: DataFrame#

Optimization runs

class SharedParams(params: List[str] | Literal['all'] = 'all', types: List[str] | Literal['all'] = 'all')[source]#

Bases: object

Class specifying the sharing of params among types. all means all available types or params.

Example usage:

import fastdfe as fd

# neutral SFS for two types
sfs_neut = fd.Spectra(dict(
    pendula=[177130, 997, 441, 228, 156, 117, 114, 83, 105, 109, 652],
    pubescens=[172528, 3612, 1359, 790, 584, 427, 325, 234, 166, 76, 31]
))

# selected SFS for two types
sfs_sel = fd.Spectra(dict(
    pendula=[797939, 1329, 499, 265, 162, 104, 117, 90, 94, 119, 794],
    pubescens=[791106, 5326, 1741, 1005, 756, 546, 416, 294, 177, 104, 41]
))

# create inference object
inf = fd.JointInference(
    sfs_neut=sfs_neut,
    sfs_sel=sfs_sel,
    shared_params=[fd.SharedParams(types=["pendula", "pubescens"], params=["eps", "S_d"])],
    do_bootstrap=True
)

# run inference
inf.run()
params: Union[List[str], Literal['all']] = 'all'#

The params to share

types: Union[List[str], Literal['all']] = 'all'#

The types to share

__init__(params: List[str] | Literal['all'] = 'all', types: List[str] | Literal['all'] = 'all')#
class Covariate(param: str, values: Dict[str, float], callback: Callable | None = None, bounds: tuple = (0.0001, 10000.0), x0: float = 0, bounds_scale: Literal['lin', 'log', 'symlog'] = 'symlog')[source]#

Bases: object

Class defining a covariate which induces a relationship with one or many parameters. The relationship is defined by a callback function which modifies the parameters. The default callback introduces a linear relationship.

Below an example of introducing linear covariates for S_d, the for mean strength of deleterious mutations (cf. GammaExpParametrization). Each of the three types is associated with one covariate. This we pass to JointInference together with the stratified spectra:

import fastdfe as fd

cov = fd.Covariate(
    param='S_d',
    values=dict(type1=5, type2=3, type3=1)
)
param: str#

The parameter to modify

values: Dict[str, float]#

The values of the covariate for each type

callback: Optional[Callable] = None#

The callback function to modify the parameters

bounds: tuple = (0.0001, 10000.0)#

The bounds of the covariate parameter to be estimated

x0: float = 0#

The initial value of the covariate

bounds_scale: Literal['lin', 'log', 'symlog'] = 'symlog'#

The scale of the bounds. See scale_value() for details

apply(covariate: float, type: str, params: Dict[str, float])[source]#

Apply the custom or default callback to modify the given parameters.

Parameters:
  • covariate (float) – The value of the covariate.

  • type (str) – The type of the relationship.

  • params (Dict[str, float]) – The input parameters.

Return type:

Dict[str, float]

Returns:

Modified parameters.

apply_default(covariate: float, type: str, params: Dict[str, float])[source]#

Modify the given parameters introducing a linear relationship with the given covariate.

Parameters:
  • covariate (float) – The value of the covariate.

  • type (str) – The type of the relationship.

  • params (Dict[str, float]) – The input parameters.

Return type:

Dict[str, float]

Returns:

Modified parameters.

__init__(param: str, values: Dict[str, float], callback: Callable | None = None, bounds: tuple = (0.0001, 10000.0), x0: float = 0, bounds_scale: Literal['lin', 'log', 'symlog'] = 'symlog')#