DFE Inference

DFE Inference#

Classes for inferring the distribution of fitness effects (DFE) from site-frequency spectra, spanning single-type inference, joint inference across multiple types, and the shared-parameter and covariate machinery that links them.

Classes:

`BaseInference`	Base inference class for inferring the SFS given one neutral and one selected SFS.
`JointInference`	Enabling the sharing of parameters among several Inference objects.
`Inference`	Static utility methods for inference objects.
`InferenceResult`	Inference result.
`SharedParams`	Class specifying the sharing of params among types.
`Covariate`	Class defining a covariate which induces a relationship with one or many parameters.

class BaseInference(sfs_neut: Spectra | Spectrum, sfs_sel: Spectra | Spectrum, intervals_del: Tuple[float, float, int] = (-100000000.0, -1e-05, 1000), intervals_ben: Tuple[float, float, int] = (1e-05, 10000.0, 1000), intervals_h: Tuple[float, float, int] = (0.0, 1.0, 21), h_callback: Callable[[ndarray], ndarray] = None, integration_mode: Literal['midpoint', 'quad'] = 'midpoint', linearized: bool = True, model: Parametrization | str = 'GammaExpParametrization', seed: int = 0, x0: Dict[str, Dict[str, float]] = {}, bounds: Dict[str, Tuple[float, float]] = {}, scales: Dict[str, Literal['lin', 'log', 'symlog']] = {}, loss_type: Literal['likelihood', 'L2'] = 'likelihood', opts_mle: dict = {}, method_mle: str = 'L-BFGS-B', n_runs: int = 10, fixed_params: Dict[str, Dict[str, float]] = None, do_bootstrap: bool = True, n_bootstraps: int = 100, n_bootstrap_retries: int = 2, parallelize: bool = True, folded: bool = None, discretization: Discretization = None, optimization: Optimization = None, locked: bool = False, include_divergence: bool = True, n_sites_div_neut: float = None, n_sites_div_sel: float = None, **kwargs)[source]#

Bases: AbstractInference

Base inference class for inferring the SFS given one neutral and one selected SFS. Note that BaseInference is by default seeded.

Example usage:

import fastdfe as fd

sfs_neut = fd.Spectrum([177130, 997, 441, 228, 156, 117, 114, 83, 105, 109, 652])
sfs_sel = fd.Spectrum([797939, 1329, 499, 265, 162, 104, 117, 90, 94, 119, 794])

# create inference object
inf = fd.BaseInference(
    sfs_neut=sfs_neut,
    sfs_sel=sfs_sel,
    n_runs=10,
    n_bootstraps=100,
    do_bootstrap=True
)

# run inference
inf.run()

# plot discretized DFE
inf.plot_discretized()

# plot SFS comparison
inf.plot_sfs_comparison()

Methods:

`__init__`	Create BaseInference instance.
`get_fixed_param_names`	Get the names of the fixed parameters.
`check_fixed_params_exist`	Check that the fixed parameters are valid.
`raise_if_locked`	Raise an error if this object is locked.
`run_if_required`	Run if not run yet.
`run`	Perform the DFE inference.
`get_counts`	Get callback functions for modelling SFS counts from given parameters.
`evaluate_likelihood`	Evaluate likelihood function at given parameters.
`get_residual`	Residual of the modelled and observed SFS.
`bootstrap`	Perform the parametric bootstrap.
`get_scales_linear`	Get linear scales for all parameters.
`get_bounds_linear`	Get linear bounds for all parameters.
`plot_continuous`	Plot continuous DFE.
`plot_bucket_sizes`	Plot mass in each bucket for the MLE DFE.
`plot_interval_density`	Plot density of the discretization intervals chosen.
`plot_sfs_comparison`	Plot SFS comparison.
`plot_observed_sfs`	Plot neutral and selected SFS.
`plot_all`	Plot a bunch of plots.
`get_stats_discretized`	Get statistics for the discretized DFE.
`plot_inferred_parameters`	Visualize the inferred parameters and their confidence intervals.
`plot_inferred_parameters_boxplot`	Visualize the inferred parameters and their confidence intervals.
`plot_likelihoods`	Visualize the likelihoods of the optimization runs using a scatter plot.
`lrt`	Perform a likelihood ratio test (LRT) with correction for boundary constraints as described in Molenberghs et Verbeke (2007) (https://doi.org/10.2307/1403361).
`compare_nested`	Perform likelihood ratio test with given more complex model.
`compare_nested_models`	Compare the various nested versions of the specified model using likelihood ratio tests.
`plot_nested_models`	Plot the p-values of nested likelihoods.
`plot_discretized`	Plot discretized DFE.
`get_dfe`	Return a frozen DFE object containing the inferred parameters and (if available) the bootstrap samples.
`get_spectra`	Return a Spectra object containing the neutral, selected and modelled SFS.
`get_alpha`	Get alpha, the proportion of beneficial non-synonymous substitutions.
`get_omega`	Get omega, the rate of non-synonymous over synonymous substitutions (dN/dS).
`get_omega_a`	Get omega_a, the rate of adaptive non-synonymous over synonymous substitutions (adaptive dN/dS), which equals `alpha * omega`.
`get_bootstrap_params`	Get the parameters to be included in the bootstraps.
`get_bootstrap_param_names`	Get the parameters to be included in the bootstraps.
`get_optimized_param_names`	Get the parameters names for the parameters that are optimized.
`get_n_optimized`	Get the number of parameters that are optimized.
`create_config`	Create a config object from the inference object.
`from_config`	Load from config object.
`from_config_file`	Load from config file.
`get_summary`	Get summary.
`get_x0`	Get initial values.
`from_file`	Load object from file.
`from_json`	Unserialize object.
`get_discretized`	Get discretized DFE.
`to_file`	Save object to file.
`to_json`	Serialize object.

__init__(sfs_neut: Spectra | Spectrum, sfs_sel: Spectra | Spectrum, intervals_del: Tuple[float, float, int] = (-100000000.0, -1e-05, 1000), intervals_ben: Tuple[float, float, int] = (1e-05, 10000.0, 1000), intervals_h: Tuple[float, float, int] = (0.0, 1.0, 21), h_callback: Callable[[ndarray], ndarray] = None, integration_mode: Literal['midpoint', 'quad'] = 'midpoint', linearized: bool = True, model: Parametrization | str = 'GammaExpParametrization', seed: int = 0, x0: Dict[str, Dict[str, float]] = {}, bounds: Dict[str, Tuple[float, float]] = {}, scales: Dict[str, Literal['lin', 'log', 'symlog']] = {}, loss_type: Literal['likelihood', 'L2'] = 'likelihood', opts_mle: dict = {}, method_mle: str = 'L-BFGS-B', n_runs: int = 10, fixed_params: Dict[str, Dict[str, float]] = None, do_bootstrap: bool = True, n_bootstraps: int = 100, n_bootstrap_retries: int = 2, parallelize: bool = True, folded: bool = None, discretization: Discretization = None, optimization: Optimization = None, locked: bool = False, include_divergence: bool = True, n_sites_div_neut: float = None, n_sites_div_sel: float = None, **kwargs)[source]#

Create BaseInference instance.

Parameters:

sfs_neut (Spectra | Spectrum) – The neutral SFS. Note that we require monomorphic counts to be specified in order to infer the mutation rate. If a Spectra object with more than one SFS is provided, the all attribute will be used.
sfs_sel (Spectra | Spectrum) – The selected SFS. Note that we require monomorphic counts to be specified in order to infer the mutation rate. If a Spectra object with more than one SFS is provided, the all attribute will be used.
intervals_del (Tuple[float, float, int]) – (start, stop, n_interval) for deleterious population-scaled selection coefficients. The intervals will be log10-spaced. Decreasing the number of intervals to 100 provides nearly identical results while increasing speed, especially when precomputing across dominance coefficients.
intervals_ben (Tuple[float, float, int]) – Same as intervals_del but for positive selection coefficients. Decreasing the number of intervals to 100 provides nearly identical results while increasing speed, especially when precomputing across dominance coefficients.
intervals_h (Tuple[float, float, int]) – (start, stop, n_interval) for dominance coefficients which are linearly spaced. This is only used when inferring dominance coefficients. Values of h between the edges will be interpolated linearly.
h_callback (Callable[[ndarray], ndarray]) – A function mapping the scalar parameter h and the array of selection coefficients S to dominance coefficients of the same shape, allowing models where h depends on S. The default is lambda h, S: np.full_like(S, h), keeping h constant. Expected allele counts for a given dominance value are obtained by linear interpolation between precomputed values in intervals_h. The inferred parameter is still named h, even if transformed by h_callback, and its bounds, scales, and initial values can be set via bounds, scales, and x0. The fitness of heterozygotes and mutation homozygotes is defined as 1 + 2hs and 1 + 2s, respectively.
integration_mode (Literal['midpoint', 'quad']) – Integration mode when computing expected SFS under semidominance. quad is not recommended.
linearized (bool) – Whether to discretize and cache the linearized integral mapping DFE to SFS or use scipy.integrate.quad in each call. False not recommended.
model (Parametrization | str) – Instance of DFEParametrization which parametrized the DFE
seed (int) – Seed for the random number generator. Use None for no seed.
x0 (Dict[str, Dict[str, float]]) – Dictionary of initial values in the form {'all': {param: value}}
bounds (Dict[str, Tuple[float, float]]) – Bounds for the optimization in the form {param: (lower, upper)}
scales (Dict[str, Literal['lin', 'log', 'symlog']]) – Scales for the optimization in the form {param: scale}
loss_type (Literal['likelihood', 'L2']) – Type of loss function to use for optimization.
opts_mle (dict) – Options for the optimization.
method_mle (str) – Method to use for optimization. See scipy.optimize.minimize for available methods.
n_runs (int) – Number of independent optimization runs out of which the best one is chosen. The first run will use the initial values if specified. Consider increasing this number if the optimization does not produce good results.
fixed_params (Dict[str, Dict[str, float]]) – Parameters kept constant during optimization, given as {'all': {param: value}}. By default, the ancestral misidentification rate eps is fixed to zero, h to 0.5, and the DFE parameters are fixed so that only the deleterious component of the DFE is estimated.
do_bootstrap (bool) – Whether to do bootstrapping.
n_bootstraps (int) – Number of bootstraps.
n_bootstrap_retries (int) – Number of optimization runs for each bootstrap sample. This parameter previously defined the number of retries per bootstrap sample when subsequent runs failed, but now it defines the total number of runs per bootstrap sample, taking the most likely one.
pbar – Whether to show a progress bar.
parallelize (bool) – Whether to parallelize computations.
folded (bool) – Whether the SFS are folded. If not specified, the SFS will be folded if both of the given SFS appear to be folded.
discretization (Discretization) – Discretization instance. Mainly intended for internal use.
optimization (Optimization) – Optimization instance. Mainly intended for internal use.
locked (bool) – Whether inferences can be run from the class itself. Used internally.
include_divergence (bool) – Whether to include divergence counts (fixed differences to an outgroup) in the likelihood, like polydfe. This only takes effect when both spectra carry a divergence target size, i.e. when n_sites_div_neut and n_sites_div_sel are given; otherwise inference is based on polymorphism alone. The last entry of each SFS is the divergence count. When active, the divergence scale (the unknown outgroup branch length) is derived in closed form from the neutral divergence, and \(\alpha\) can be estimated from the observed divergence (see get_alpha()). Defaults to True. Note that, unlike polydfe, we do not model misattributed polymorphism (polymorphic sites that appear fixed in a finite sample); we tried this but it did not improve agreement with polydfe or recovery of the true \(\alpha\) in SLiM simulations.
n_sites_div_neut (float) – Number of mutational target sites over which neutral divergence was counted. polyDFE allows this to differ from the polymorphism target size. Specifying it (together with n_sites_div_sel) opts the neutral divergence into the likelihood.
n_sites_div_sel (float) – Number of mutational target sites over which selected divergence was counted. Specifying it (together with n_sites_div_neut) opts the selected divergence into the likelihood.
kwargs – Additional keyword arguments which are ignored.

sfs_neut: Spectrum#: Neutral SFS

sfs_sel: Spectrum#: Selected SFS

n_sites_div_neut: Optional[float] = None#: Number of mutational target sites over which neutral divergence was counted. Owned by the inference (not the spectra); specifying both this and n_sites_div_sel opts divergence into the likelihood.

n_sites_div_sel: Optional[float] = None#: Number of mutational target sites over which selected divergence was counted.

include_divergence: bool = True#: Whether divergence counts were requested

model: Parametrization#: The DFE parametrization

folded: bool#: Whether the SFS are folded

n: int#: Sample size

fixed_params: Dict[str, Dict[str, float]]#: Fixed parameters

discretization: Discretization#: Discretization instance

theta: float#: Estimate of theta from neutral SFS

params_mle: Optional[Dict[str, float]]#: MLE estimates of the initial optimization

sfs_mle: Optional[Spectrum]#: Modelled MLE SFS

likelihood: Optional[float]#: Likelihood of the MLE, this value may be updated after bootstrapping

likelihoods: Optional[List[float]]#: Likelihoods of the different ML runs, controlled by n_runs

n_runs: int#: Number of MLE runs to perform

runs: Optional[DataFrame] = None#: Dataframe holding information on all optimization runs

result: Optional['scipy.optimize.OptimizeResult']#: Numerical optimization result

do_bootstrap: bool#: Whether to do bootstrapping

n_bootstraps: int#: Number of bootstraps

n_bootstrap_retries: int#: Number of optimization runs for each bootstrap sample

parallelize: bool#: Whether to parallelize computations

scales: Dict[str, Literal['lin', 'log', 'symlog']]#: Parameter scales

bounds: Dict[str, Tuple[float, float]]#: Parameter bounds

optimization: Optimization#: Optimization instance

x0: Dict[str, Dict[str, float]]#: Initial values

bootstraps: Optional[pd.DataFrame]#: Bootstrapped MLE parameter estimates

bootstrap_results: Optional[List['scipy.optimize.OptimizeResult']]#: Bootstrap optimization results

L2_residual: Optional[float]#: L2 norm of differential between modelled and observed SFS

seed: int | None#: Random number generator seed

rng: np.random.Generator#: Random generator instance

execution_time: float#: Total execution time in seconds

locked: bool#: Whether inferences can be run from the class itself

get_fixed_param_names()[source]#

Get the names of the fixed parameters.

Return type:: List[str]

check_fixed_params_exist()[source]#: Check that the fixed parameters are valid.

raise_if_locked()[source]#

Raise an error if this object is locked.

Raises:: Exception –

run_if_required(*args, **kwargs)[source]#

Run if not run yet.

Parameters:

args – Arguments.
kwargs – Keyword arguments.

Return type:

Optional[Spectrum]

Returns:

DFE parametrization and modelled SFS.

run(do_bootstrap: bool = None, pbar: bool = None, **kwargs)[source]#

Perform the DFE inference.

Parameters:

pbar (bool) – Whether to show a progress bar.
do_bootstrap (bool) – Whether to perform bootstrapping.
kwargs – Additional keyword arguments which are ignored.

Return type:

Spectrum

Returns:

Modelled SFS.

get_counts()[source]#

Get callback functions for modelling SFS counts from given parameters.

Return type:: dict
Returns:: Callback functions for modelling SFS counts for each type.

evaluate_likelihood(params: Dict[str, Dict[str, float]])[source]#

Evaluate likelihood function at given parameters.

Parameters:: params (Dict[str, Dict[str, float]]) – Parameters indexed by type and parameter name.
Return type:: float
Returns:: The likelihood.

get_residual(k: int)[source]#

Residual of the modelled and observed SFS.

Parameters:: k (int) – Order of the norm
Return type:: float
Returns:: L2 residual

bootstrap(n_samples: int = None, parallelize: bool = None, update_likelihood: bool = False, n_retries: int = None, pbar: bool = True, pbar_title: str = 'Bootstrapping')[source]#

Perform the parametric bootstrap.

Parameters:

n_samples (int) – Number of bootstrap samples. Defaults to n_bootstraps.
parallelize (bool) – Whether to parallelize the bootstrap. Defaults to parallelize.
update_likelihood (bool) – Whether to update the likelihood to be the mean of the bootstrap samples and the original likelihood. This is not statistically valid but can stabilize maximum likelihood estimate subject optimization noise.
n_retries (int) – Number of optimization runs for each bootstrap sample. Defaults to n_bootstrap_retries.
pbar (bool) – Whether to show a progress bar.
pbar_title (str) – Title for the progress bar.

Return type:

DataFrame

Returns:

Dataframe with bootstrap results.

get_scales_linear()[source]#

Get linear scales for all parameters. We do this for the bootstraps as x0 should be close to MLE.

Return type:: Dict[str, Literal['lin']]
Returns:: Dictionary of scales.

get_bounds_linear()[source]#

Get linear bounds for all parameters. We do this for the bootstraps as x0 should be close to MLE.

Return type:: Dict[str, Tuple[float, float]]
Returns:: Dictionary of bounds.

plot_continuous(file: str = None, show: bool = True, intervals: ndarray = None, confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', scale_density: bool = False, scale: Literal['lin', 'log', 'symlog'] = 'lin', title: str = 'DFE', kwargs_legend: dict = {'prop': {'size': 8}}, ax: plt.Axes = None)[source]#

Plot continuous DFE. The special constants np.inf and -np.inf are also valid interval bounds. By default, the PDF is plotted as is. Due to the logarithmic scale on the x-axis, we may get a wrong intuition on how the mass is distributed, however. To get a better intuition, we can optionally scale the density by the x-axis interval size using scale_density = True. This has the disadvantage that the density now changes for x, so that even a constant density will look warped.

Parameters:

title (str) – Plot title.
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use for confidence intervals.
ci_level (float) – Confidence level for confidence intervals.
confidence_intervals (bool) – Whether to plot confidence intervals.
file (str) – File to save plot to.
show (bool) – Whether to show plot.
scale (Literal['lin', 'log', 'symlog']) – y-scale
scale_density (bool) – Whether to scale the density by the x-axis interval size
intervals (ndarray) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_bucket_sizes(file: str = None, show: bool = True, title: str = 'bucket sizes', ax: plt.Axes = None)[source]#

Plot mass in each bucket for the MLE DFE. This can be used to check if the interval bounds and spacing are chosen appropriately.

Parameters:

file (str) – File to save plot to.
show (bool) – Whether to show plot.
title (str) – Plot title.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_interval_density(file: str = None, show: bool = True, intervals: Sequence = array([-inf, -100., -10., -1., 0., 1., inf]), interval_labels: List[str] = None, color: str = 'C0', ax: plt.Axes = None)[source]#

Plot density of the discretization intervals chosen. Note that although this plot looks similar, this is not the DFE!

Parameters:

color (str) – Color of the bars.
file (str) – File to save plot to.
show (bool) – Whether to show plot.
interval_labels (List[str]) – Labels for the intervals.
intervals (Sequence) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_sfs_comparison(sfs_types: List[Literal['modelled', 'observed', 'selected', 'neutral']] = ['observed', 'modelled'], labels: List[str] = None, colors: List[str] = None, file: str = None, show: bool = True, ax: plt.Axes = None, title: str = 'SFS comparison', use_subplots: bool = False, show_monomorphic: bool = False, kwargs_legend: dict = {'prop': {'size': 8}})[source]#

Plot SFS comparison.

Parameters:

file (str) – File to save plot to.
labels (List[str]) – Labels for the SFS.
colors (List[str]) – Colors for the SFS. Only for Python visualization backend.
sfs_types (List[Literal['modelled', 'observed', 'selected', 'neutral']]) – Types of SFS to plot.
show (bool) – Whether to show plot.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend and if use_subplots is False.
title (str) – Plot title
use_subplots (bool) – Whether to use subplots. Only for Python visualization backend.
show_monomorphic (bool) – Whether to show monomorphic counts
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_observed_sfs(labels: List[str] = None, file: str = None, show: bool = True, ax: plt.Axes = None)[source]#

Plot neutral and selected SFS.

Parameters:

file (str) – File to save plot to.
labels (List[str]) – Labels for the SFS.
show (bool) – Whether to show plot.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_all(show: bool = True)[source]#

Plot a bunch of plots.

Parameters:: show (bool) – Whether to show plot.

get_stats_discretized(ci_level: float = 0.05, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean')[source]#

Get statistics for the discretized DFE.

Parameters:

ci_level (float) – Confidence interval level.
intervals (ndarray) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use.
point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.

Return type:

Tuple[ndarray, ndarray, ndarray, ndarray, ndarray]

Returns:

Center values, errors around center, and confidence intervals.

plot_inferred_parameters(confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', file: str = None, show: bool = True, title: str = 'parameter estimates', scale: Literal['lin', 'log', 'symlog'] = 'log', ax: plt.Axes = None, kwargs_legend: dict = {'loc': 'upper right', 'prop': {'size': 8}}, **kwargs)[source]#

Visualize the inferred parameters and their confidence intervals.

Parameters:

scale (Literal['lin', 'log', 'symlog']) – y-scale of the plot.
title (str) – Plot title.
confidence_intervals (bool) – Whether to show confidence intervals.
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use.
ci_level (float) – Confidence level for the confidence intervals.
file (str) – File to save plot to.
show (bool) – Whether to show plot.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_inferred_parameters_boxplot(file: str = None, show: bool = True, title: str = 'parameter estimates', **kwargs)[source]#

Visualize the inferred parameters and their confidence intervals.

Parameters:

title (str) – Plot title.
file (str) – File to save plot to.
show (bool) – Whether to show plot.

Return type:

plt.Axes

Returns:

Axes object

Raises:

ValueError – If no inference objects are given or no bootstraps are found.

plot_likelihoods(file: str = None, show: bool = True, title: str = 'likelihoods', scale: Literal['lin', 'log'] = 'lin', ax: plt.Axes = None, ylabel: str = 'lnl', **kwargs)[source]#

Visualize the likelihoods of the optimization runs using a scatter plot.

Parameters:

scale (Literal['lin', 'log']) – y-scale of the plot.
title (str) – Plot title.
file (str) – File to save plot to.
show (bool) – Whether to show plot.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
ylabel (str) – Label for the y-axis.

Return type:

plt.Axes

Returns:

Axes object

lrt(ll_simple: float, ll_complex: float, df: int, m: int = 0)[source]#

Perform a likelihood ratio test (LRT) with correction for boundary constraints as described in Molenberghs et Verbeke (2007) (https://doi.org/10.2307/1403361).

Parameters:

ll_simple (float) – Log-likelihood of the simple model.
ll_complex (float) – Log-likelihood of the complex model.
df (int) – Total number of parameters being tested.
m (int) – Number of parameters at the boundary.

Return type:

float

Returns:

p-value

compare_nested(complex: BaseInference)[source]#

Perform likelihood ratio test with given more complex model. The given model’s fixed parameters need to be a proper subset of this model’s fixed parameters. Low p-values indicate that the more complex model provides a significantly better fit.

Parameters:: complex (BaseInference) – More complex model.
Return type:: float | None
Returns:: p-value or None if the models are not nested.
Raises:: ValueError – If the inference objects are not nested.

compare_nested_models(do_bootstrap: bool = False)[source]#

Compare the various nested versions of the specified model using likelihood ratio tests. We check for inclusion of ancestral misidentification and beneficial mutations.

Parameters:: do_bootstrap (bool) – Whether to perform bootstrapping.
Return type:: Tuple[ndarray, Dict[str, BaseInference]]
Returns:: Matrix of p-values, dict of base inference objects

plot_nested_models(file: str = None, show: bool = True, remove_empty: bool = False, transpose: bool = False, cmap: str = None, title: str = 'nested model comparison', ax: plt.Axes = None, do_bootstrap: bool = False)[source]#

Plot the p-values of nested likelihoods.

Parameters:

file (str) – File to save plot to.
show (bool) – Whether to show plot.
remove_empty (bool) – Whether to remove empty rows and columns.
transpose (bool) – Whether to transpose the matrix.
cmap (str) – Colormap to use.
title (str) – Plot title.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
do_bootstrap (bool) – Whether to perform bootstrapping.

Return type:

plt.Axes

Returns:

Axes object

plot_discretized(file: str = None, show: bool = True, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean', title: str = 'discretized DFE', ax: plt.Axes = None, kwargs_legend: dict = {'prop': {'size': 8}})#

Plot discretized DFE.

Parameters:

file (str) – File to save the plot to
show (bool) – Whether to show the plot
intervals (ndarray) – Array of interval boundaries over (-inf, inf) yielding intervals.shape[0] - 1 bars.
confidence_intervals (bool) – Whether to plot confidence intervals
ci_level (float) – Confidence interval level
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap
point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.
title (str) – Title of the plot
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes

get_dfe()[source]#

Return a frozen DFE object containing the inferred parameters and (if available) the bootstrap samples.

Return type:: DFE
Returns:: DFE object

get_spectra()[source]#

Return a Spectra object containing the neutral, selected and modelled SFS.

Return type:: Spectra
Returns:: Spectra object with types neutral, selected, and modelled.

get_alpha(params: dict = None, use_divergence: bool = None)[source]#

Get alpha, the proportion of beneficial non-synonymous substitutions.

Parameters:

params (dict) – DFE Parameters to use for calculation.
use_divergence (bool) – Whether to estimate alpha from observed divergence counts using polyDFE’s McDonald-Kreitman style estimator (see get_alpha_divergence()), or from the DFE alone (see get_alpha()). When None (the default), this follows the inference mode (whether divergence counts are used in the likelihood). Passing False explicitly always yields the polymorphism-only estimate, which is useful for comparison.

Return type:

float

get_omega(params: dict = None, use_divergence: bool = None)[source]#

Get omega, the rate of non-synonymous over synonymous substitutions (dN/dS).

Parameters:

params (dict) – DFE Parameters to use for calculation.
use_divergence (bool) – Whether to estimate omega from observed divergence counts using polyDFE’s McDonald-Kreitman style estimator (see get_omega_divergence()), or from the DFE alone (see get_omega()). When None (the default), this follows the inference mode (whether divergence counts are used in the likelihood). Passing False explicitly always yields the polymorphism-only estimate, which is useful for comparison.

Return type:

float

get_omega_a(params: dict = None, use_divergence: bool = None)[source]#

Get omega_a, the rate of adaptive non-synonymous over synonymous substitutions (adaptive dN/dS), which equals alpha * omega.

Parameters:

params (dict) – DFE Parameters to use for calculation.
use_divergence (bool) – Whether to estimate omega_a from observed divergence counts using polyDFE’s McDonald-Kreitman style estimator (see get_omega_a_divergence()), or from the DFE alone (see get_omega_a()). When None (the default), this follows the inference mode (whether divergence counts are used in the likelihood). Passing False explicitly always yields the polymorphism-only estimate, which is useful for comparison.

Return type:

float

property alpha: float#: Cache alpha, the proportion of beneficial non-synonymous substitutions.

property omega: float#: Cache omega, the rate of non-synonymous over synonymous substitutions (dN/dS).

property omega_a: float#: Cache omega_a, the rate of adaptive non-synonymous over synonymous substitutions (adaptive dN/dS).

get_bootstrap_params()[source]#

Get the parameters to be included in the bootstraps.

Return type:: Dict[str, float]
Returns:: Parameters to be included in the bootstraps.

get_bootstrap_param_names()[source]#

Get the parameters to be included in the bootstraps.

Return type:: List[str]
Returns:: Parameters to be included in the bootstraps.

get_optimized_param_names()[source]#

Get the parameters names for the parameters that are optimized.

Return type:: List[str]
Returns:: List of parameter names.

get_n_optimized()[source]#

Get the number of parameters that are optimized.

Return type:: int
Returns:: Number of parameters that are optimized.

create_config()[source]#

Create a config object from the inference object.

Return type:: Config
Returns:: Config object.

classmethod from_config(config: Config)[source]#

Load from config object.

Parameters:: config (Config) – Config object.
Return type:: Self
Returns:: Inference object.

classmethod from_config_file(file: str, cache: bool = True)[source]#

Load from config file.

Parameters:

file (str) – Config file path, possibly URL.
cache (bool) – Whether to use the cache if available.

Return type:

Self

Returns:

Inference object.

get_summary()[source]#

Get summary.

Return type:: InferenceResult
Returns:: Inference results.

get_x0()[source]#

Get initial values.

Return type:: Dict[str, Dict[str, float]]
Returns:: Initial values.

property param_names: List[str]#

Parameter names.

Returns:: List of parameter names.

classmethod from_file(file: str, classes=None)#

Load object from file.

Parameters:

classes – Classes to be used for unserialization.
file (str) – File to load from

Return type:

Self

classmethod from_json(json: str, classes=None)#

Unserialize object.

Parameters:

classes – Classes to be used for unserialization
json (str) – JSON string

Return type:

Self

get_discretized(intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean')#

Get discretized DFE.

Parameters:

intervals (ndarray) – Array of interval boundaries over (-inf, inf) yielding intervals.shape[0] - 1 bins.
confidence_intervals (bool) – Whether to return confidence intervals
ci_level (float) – Confidence interval level
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap
point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.

Return type:

Tuple[ndarray, Optional[ndarray]]

Returns:

Array of values and array of deviations

to_file(file: str)#

Save object to file.

Parameters:: file (str) – File to save to

to_json()[source]#

Serialize object.

Return type:: str
Returns:: JSON string

class JointInference(sfs_neut: Spectra, sfs_sel: Spectra, include_divergence: bool = True, n_sites_div_neut: Dict[str, float] = None, n_sites_div_sel: Dict[str, float] = None, intervals_del: Tuple[float, float, int] = (-100000000.0, -1e-05, 1000), intervals_ben: Tuple[float, float, int] = (1e-05, 10000.0, 1000), intervals_h: Tuple[float, float, int] = (0.0, 1.0, 21), h_callback: Callable[[ndarray], ndarray] = None, integration_mode: Literal['midpoint', 'quad'] = 'midpoint', linearized: bool = True, model: Parametrization | str = 'GammaExpParametrization', seed: int = 0, x0: Dict[str, Dict[str, float]] = {}, bounds: Dict[str, Tuple[float, float]] = {}, scales: Dict[str, Literal['lin', 'log', 'symlog']] = {}, loss_type: Literal['likelihood', 'L2'] = 'likelihood', opts_mle: dict = {}, method_mle: str = 'L-BFGS-B', n_runs: int = 10, fixed_params: Dict[str, Dict[str, float]] = None, shared_params: List[SharedParams] = [], covariates: List[Covariate] = [], do_bootstrap: bool = True, n_bootstraps: int = 100, n_bootstrap_retries: int = 2, parallelize: bool = True, folded: bool = None, discretization: Discretization = None, optimization: Optimization = None, locked: bool = False, **kwargs)[source]#

Bases: BaseInference

Enabling the sharing of parameters among several Inference objects.

Example usage:

import fastdfe as fd

# neutral SFS for two types
sfs_neut = fd.Spectra(dict(
    pendula=[177130, 997, 441, 228, 156, 117, 114, 83, 105, 109, 652],
    pubescens=[172528, 3612, 1359, 790, 584, 427, 325, 234, 166, 76, 31]
))

# selected SFS for two types
sfs_sel = fd.Spectra(dict(
    pendula=[797939, 1329, 499, 265, 162, 104, 117, 90, 94, 119, 794],
    pubescens=[791106, 5326, 1741, 1005, 756, 546, 416, 294, 177, 104, 41]
))

# create inference object
inf = fd.JointInference(
    sfs_neut=sfs_neut,
    sfs_sel=sfs_sel,
    do_bootstrap=True
)

# run inference
inf.run()

Methods:

`__init__`	Create instance.
`check_shared_params`	Check if the shared parameters were specified correctly.
`check_no_shared_params_fixed`	Check that no shared parameters are fixed and raise an error otherwise.
`get_shared_param_names`	Get the names of the shared parameters.
`from_config`	Load from config object.
`run`	Run inference.
`marginals_without_all`	Get marginal inference without 'all' type.
`joint_inference_makes_sense`	Check if joint inference makes sense.
`get_counts`	Get callback functions for modelling SFS counts from given parameters.
`run_joint_without_covariates`	Run joint inference without covariates.
`create_config`	Create a config object from the inference object.
`perform_lrt_covariates`	Perform likelihood ratio test against joint inference without covariates.
`bootstrap`	Perform the parametric bootstrap both for the marginal and joint inferences.
`bootstrap_if_required`	Bootstrap if not done yet.
`get_x0`	Get initial values for joint inference.
`perform_lrt_shared`	Compare likelihood of joint inference with product of marginal likelihoods.
`get_inferences`	Get all inference objects as dictionary.
`plot_discretized`	Plot discretized DFE comparing the different types.
`plot_sfs_comparison`	Plot SFS comparison.
`plot_continuous`	Plot discretized DFE.
`plot_inferred_parameters`	Plot discretized DFE comparing the different types.
`plot_inferred_parameters_boxplot`	Plot discretized DFE comparing the different types.
`plot_covariate`	Plot the covariate given by the index.
`get_discretized`	Get discretized DFEs.
`get_bootstrap_params`	Get bootstrap parameters.
`to_json`	Serialize object.
`get_residual`	Residual of joint inference.
`get_alpha`	Get alpha, the proportion of beneficial non-synonymous substitutions, for each type.
`get_omega`	Get omega, the rate of non-synonymous over synonymous substitutions (dN/dS), for each type.
`check_fixed_params_exist`	Check that the fixed parameters are valid.
`compare_nested`	Perform likelihood ratio test with given more complex model.
`compare_nested_models`	Compare the various nested versions of the specified model using likelihood ratio tests.
`evaluate_likelihood`	Evaluate likelihood function at given parameters.
`from_config_file`	Load from config file.
`from_file`	Load object from file.
`from_json`	Unserialize object.
`get_bootstrap_param_names`	Get the parameters to be included in the bootstraps.
`get_bounds_linear`	Get linear bounds for all parameters.
`get_dfe`	Return a frozen DFE object containing the inferred parameters and (if available) the bootstrap samples.
`get_fixed_param_names`	Get the names of the fixed parameters.
`get_n_optimized`	Get the number of parameters that are optimized.
`get_omega_a`	Get omega_a, the rate of adaptive non-synonymous over synonymous substitutions (adaptive dN/dS), for each type.
`get_optimized_param_names`	Get the parameters names for the parameters that are optimized.
`get_scales_linear`	Get linear scales for all parameters.
`get_spectra`	Return a Spectra object containing the neutral, selected and modelled SFS.
`get_stats_discretized`	Get statistics for the discretized DFE.
`get_summary`	Get summary.
`lrt`	Perform a likelihood ratio test (LRT) with correction for boundary constraints as described in Molenberghs et Verbeke (2007) (https://doi.org/10.2307/1403361).
`plot_all`	Plot a bunch of plots.
`plot_bucket_sizes`	Plot mass in each bucket for the MLE DFE.
`plot_interval_density`	Plot density of the discretization intervals chosen.
`plot_likelihoods`	Visualize the likelihoods of the optimization runs using a scatter plot.
`plot_nested_models`	Plot the p-values of nested likelihoods.
`plot_observed_sfs`	Plot neutral and selected SFS.
`raise_if_locked`	Raise an error if this object is locked.
`run_if_required`	Run if not run yet.
`to_file`	Save object to file.

__init__(sfs_neut: Spectra, sfs_sel: Spectra, include_divergence: bool = True, n_sites_div_neut: Dict[str, float] = None, n_sites_div_sel: Dict[str, float] = None, intervals_del: Tuple[float, float, int] = (-100000000.0, -1e-05, 1000), intervals_ben: Tuple[float, float, int] = (1e-05, 10000.0, 1000), intervals_h: Tuple[float, float, int] = (0.0, 1.0, 21), h_callback: Callable[[ndarray], ndarray] = None, integration_mode: Literal['midpoint', 'quad'] = 'midpoint', linearized: bool = True, model: Parametrization | str = 'GammaExpParametrization', seed: int = 0, x0: Dict[str, Dict[str, float]] = {}, bounds: Dict[str, Tuple[float, float]] = {}, scales: Dict[str, Literal['lin', 'log', 'symlog']] = {}, loss_type: Literal['likelihood', 'L2'] = 'likelihood', opts_mle: dict = {}, method_mle: str = 'L-BFGS-B', n_runs: int = 10, fixed_params: Dict[str, Dict[str, float]] = None, shared_params: List[SharedParams] = [], covariates: List[Covariate] = [], do_bootstrap: bool = True, n_bootstraps: int = 100, n_bootstrap_retries: int = 2, parallelize: bool = True, folded: bool = None, discretization: Discretization = None, optimization: Optimization = None, locked: bool = False, **kwargs)[source]#

Create instance.

Parameters:

sfs_neut (Spectra) – Neutral SFS. Note that we require monomorphic counts to be specified in order to infer the mutation rate.
sfs_sel (Spectra) – Selected SFS. Note that we require monomorphic counts to be specified in order to infer the mutation rate.
include_divergence (bool) – Whether to include divergence counts in the likelihood. Only takes effect when the divergence target sizes are given via n_sites_div_neut and n_sites_div_sel; see BaseInference for details.
n_sites_div_neut (Dict[str, float]) – Number of mutational target sites over which neutral divergence was counted, keyed by type. Specifying it (with n_sites_div_sel) opts neutral divergence into the likelihood.
n_sites_div_sel (Dict[str, float]) – Number of mutational target sites over which selected divergence was counted, keyed by type. Specifying it (with n_sites_div_neut) opts selected divergence into the likelihood.
intervals_del (Tuple[float, float, int]) – (start, stop, n_interval) for deleterious population-scaled selection coefficients. The intervals will be log10-spaced. Decreasing the number of intervals to 100 provides nearly identical results while increasing speed, especially when precomputing across dominance coefficients.
intervals_ben (Tuple[float, float, int]) – Same as intervals_del but for positive selection coefficients. Decreasing the number of intervals to 100 provides nearly identical results while increasing speed, especially when precomputing across dominance coefficients.
intervals_h (Tuple[float, float, int]) – (start, stop, n_interval) for dominance coefficients which are linearly spaced. This is only used when inferring dominance coefficients. Values of h between the edges will be interpolated linearly.
h_callback (Callable[[ndarray], ndarray]) – A function mapping the scalar parameter h and the array of selection coefficients S to dominance coefficients of the same shape, allowing models where h depends on S. The default is lambda h, S: np.full_like(S, h), keeping h constant. Expected allele counts for a given dominance value are obtained by linear interpolation between precomputed values in intervals_h. The inferred parameter is still named h, even if transformed by h_callback, and its bounds, scales, and initial values can be set via bounds, scales, and x0. The fitness of heterozygotes and mutation homozygotes is defined as 1 + 2hs and 1 + 2s, respectively.
integration_mode (Literal['midpoint', 'quad']) – Integration mode when computing expected SFS under semidominance. quad is not recommended.
linearized (bool) – Whether to discretize and cache the linearized integral mapping DFE to SFS or use scipy.integrate.quad in each call. False not recommended.
model (Parametrization | str) – DFE parametrization
seed (int) – Random seed. Use None for no seed.
x0 (Dict[str, Dict[str, float]]) – Dictionary of initial values in the form {type: {param: value}}
bounds (Dict[str, Tuple[float, float]]) – Bounds for the optimization in the form {param: (lower, upper)}
scales (Dict[str, Literal['lin', 'log', 'symlog']]) – Scales for the optimization in the form {param: scale}
loss_type (Literal['likelihood', 'L2']) – Loss type
opts_mle (dict) – Options for the optimization
method_mle (str) – Method to use for optimization. See scipy.optimize.minimize for available methods.
n_runs (int) – Number of independent optimization runs out of which the best one is chosen. The first run will use the initial values if specified. Consider increasing this number if the optimization does not produce good results.
fixed_params (Dict[str, Dict[str, float]]) – Parameters kept constant during optimization, given as {type: {param: value}}. By default, the ancestral misidentification rate eps is fixed to zero, h to 0.5, and the DFE parameters are fixed so that only the deleterious component of the DFE is estimated.
shared_params (List[SharedParams]) – List of shared parameters
do_bootstrap (bool) – Whether to perform bootstrapping
n_bootstraps (int) – Number of bootstraps
n_bootstrap_retries (int) – Number of optimization runs for each bootstrap sample. This parameter previously defined the number of retries per bootstrap sample when subsequent runs failed, but now it defines the total number of runs per bootstrap sample, taking the most likely one.
parallelize (bool) – Whether to parallelize computations
folded (bool) – Whether the SFS are folded. If not specified, the SFS will be folded if all of the given SFS appear to be folded.
discretization (Discretization) – Discretization instance. Mainly intended for internal use.
optimization (Optimization) – Optimization instance. Mainly intended for internal use.
locked (bool) – Whether inferences can be run from the class itself. Used internally.
kwargs – Additional keyword arguments which are ignored.

types: List[str]#: SFS types

params_mle_raw: Optional[Dict[str, Dict[str, float]]]#: original MLE parameters before adding covariates and unpacked shared

shared_params#: Shared parameters with expanded ‘all’ type

covariates: Dict[str, Covariate]#: Covariates indexed by parameter name

fixed_params: Dict[str, Dict[str, float]]#: fixed parameters with expanded ‘all’ type

discretization: Discretization#: Discretization instance

marginal_inferences: Dict[str, BaseInference]#: Dictionary of marginal inferences indexed by type

include_divergence: bool = True#: Whether divergence counts were requested

scales: Dict[str, Literal['lin', 'log', 'symlog']]#: parameter scales

bounds: Dict[str, Tuple[float, float]]#: parameter bounds

optimization: Optimization#: Joint optimization instance

joint_inferences: Dict[str, BaseInference]#: Joint inference object indexed by type

check_shared_params()[source]#: Check if the shared parameters were specified correctly.

check_no_shared_params_fixed()[source]#: Check that no shared parameters are fixed and raise an error otherwise.

get_shared_param_names()[source]#

Get the names of the shared parameters.

Return type:: List[str]
Returns:: Names of the shared parameters.

classmethod from_config(config: Config)[source]#

Load from config object. Unlike BaseInference.from_config(), the per-type divergence target sizes are passed through unchanged (JointInference expects them keyed by type).

Parameters:: config (Config) – Config object.
Return type:: JointInference
Returns:: JointInference object.

run(**kwargs)[source]#

Run inference.

Parameters:: kwargs – Additional keyword arguments
Return type:: Spectrum
Returns:: Modelled SFS.

marginals_without_all()[source]#

Get marginal inference without ‘all’ type.

Return type:: Dict[str, BaseInference]
Returns:: Dict of marginal inferences indexed by type.

joint_inference_makes_sense()[source]#

Check if joint inference makes sense.

Return type:: bool
Returns:: Whether joint inference makes sense.

get_counts()[source]#

Get callback functions for modelling SFS counts from given parameters.

Return type:: dict
Returns:: Dict of callback functions indexed by type.

run_joint_without_covariates(do_bootstrap: bool = True)[source]#

Run joint inference without covariates. Note that the result of this function is cached.

Return type:: JointInference
Returns:: Joint inference instance devoid of covariates.

create_config()[source]#

Create a config object from the inference object.

Return type:: Config
Returns:: Config object.

perform_lrt_covariates(do_bootstrap: bool = False)[source]#

Perform likelihood ratio test against joint inference without covariates. In the simple model we share parameters across types. Low p-values indicate that including covariates significantly improves the model fit.

To access the JointInference object without covariates, you can call run_joint_without_covariates(), which is cached.

Parameters:: do_bootstrap (bool) – Whether to bootstrap.
Return type:: float
Returns:: Likelihood ratio test p-value.

bootstrap(n_samples: int = None, parallelize: bool = None, update_likelihood: bool = False, n_retries: int = None, pbar: bool = True, pbar_title: str = 'Bootstrapping joint inference')[source]#

Perform the parametric bootstrap both for the marginal and joint inferences.

Parameters:

n_samples (int) – Number of bootstrap samples. Defaults to n_bootstraps.
parallelize (bool) – Whether to parallelize the bootstrap. Defaults to parallelize.
update_likelihood (bool) – Whether to update the likelihood to be the mean of the bootstrap samples and the original likelihood. This is not statistically valid but can stabilize maximum likelihood estimate subject optimization noise.
n_retries (int) – Number of optimization runs for each bootstrap sample. Defaults to n_bootstrap_retries.
pbar (bool) – Whether to show a progress bar.
pbar_title (str) – Title for the progress bar.

Return type:

Optional[DataFrame]

Returns:

DataFrame with bootstrap samples

bootstrap_if_required()[source]#: Bootstrap if not done yet.

get_x0()[source]#

Get initial values for joint inference.

Return type:: Dict[str, Dict[str, float]]
Returns:: Dictionary of initial values indexed by type

perform_lrt_shared(do_bootstrap: bool = False)[source]#

Compare likelihood of joint inference with product of marginal likelihoods. This provides information about the goodness of fit achieved by the parameter sharing. Low p-values indicate that parameter sharing is not justified, i.e., that the marginal inferences provide a significantly better fit to the data. Note that it is more difficult to properly optimize the joint likelihood, which makes this test conservative, i.e., the reported p-value may be larger than what it should be.

Parameters:: do_bootstrap (bool) – Whether to perform bootstrapping.
Return type:: float
Returns:: p-value

get_inferences(types: List[str] = None, labels: List[str] = None, show_marginals: bool = True)[source]#

Get all inference objects as dictionary.

Parameters:

types (List[str]) – Types to include
labels (List[str]) – Labels for types
show_marginals (bool) – Whether to also show marginal inferences

Return type:

Dict[str, BaseInference]

Returns:

Dictionary of base inference objects indexed by type and joint vs marginal subtypes

plot_discretized(file: str = None, show: bool = True, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), show_marginals: bool = True, confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'original', title: str = 'discretized DFE comparison', labels: List[str] = None, kwargs_legend: dict = {'prop': {'size': 8}}, ax: plt.Axes = None)[source]#

Plot discretized DFE comparing the different types.

Parameters:

labels (List[str]) – Labels for types
title (str) – Title of plot
intervals (ndarray) – Array of interval boundaries over (-inf, inf) yielding intervals.shape[0] - 1 bars.
show_marginals (bool) – Whether to also show marginal inferences
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap
point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.
ci_level (float) – Confidence level
confidence_intervals (bool) – Whether to plot confidence intervals
file (str) – File to save plot to
show (bool) – Whether to show plot
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_sfs_comparison(sfs_types: List[Literal['modelled', 'observed', 'selected', 'neutral']] = ['observed', 'modelled'], types: List[str] = None, labels: List[str] = None, colors: List[str] = None, file: str = None, show: bool = True, ax: plt.Axes = None, title: str = 'SFS comparison', use_subplots: bool = False, show_monomorphic: bool = False, kwargs_legend: dict = {'prop': {'size': 8}})[source]#

Plot SFS comparison.

Parameters:

types (List[str]) – Types to plot
file (str) – File to save plot to
labels (List[str]) – Labels for types.
colors (List[str]) – Colors for types. Only for Python visualization backend.
sfs_types (List[Literal['modelled', 'observed', 'selected', 'neutral']]) – Types of SFS to plot
show (bool) – Whether to show plot
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend and if use_subplots is False.
title (str) – Plot title
use_subplots (bool) – Whether to use subplots. Only for Python visualization backend.
show_monomorphic (bool) – Whether to show monomorphic counts
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_continuous(file: str = None, show: bool = True, intervals: ndarray = None, confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', title: str = 'DFE comparison', labels: List[str] = None, scale_density: bool = False, scale: Literal['lin', 'log', 'symlog'] = 'lin', kwargs_legend: dict = {'prop': {'size': 8}}, ax: plt.Axes = None)[source]#

Plot discretized DFE. The special constants np.inf and -np.inf are also valid interval bounds. By default, the PDF is plotted as is. Due to the logarithmic scale on the x-axis, we may get a wrong intuition on how the mass is distributed, however. To get a better intuition, we can optionally scale the density by the x-axis interval size using scale_density = True. This has the disadvantage that the density now changes for x, so that even a constant density will look warped.

Parameters:

scale_density (bool) – Whether to scale the density by the x-axis interval size
scale (Literal['lin', 'log', 'symlog']) – y-scale
labels (List[str]) – Labels for types
title (str) – Title of plot
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap
ci_level (float) – Confidence level
confidence_intervals (bool) – Whether to plot confidence intervals
file (str) – File to save plot to
show (bool) – Whether to show plot
intervals (ndarray) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_inferred_parameters(file: str = None, confidence_intervals: bool = True, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', ci_level: float = 0.05, show: bool = True, title: str = 'inferred parameters', labels: List[str] = None, ax: plt.Axes = None, scale: Literal['lin', 'log', 'symlog'] = 'log', kwargs_legend: dict = {'loc': 'upper right', 'prop': {'size': 8}}, **kwargs: List[str])[source]#

Plot discretized DFE comparing the different types.

Parameters:

labels (List[str]) – Labels for types
title (str) – Title of plot
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap
ci_level (float) – Confidence level
confidence_intervals (bool) – Whether to plot confidence intervals
file (str) – File to save plot to
show (bool) – Whether to show plot
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
scale (Literal['lin', 'log', 'symlog']) – y-scale
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_inferred_parameters_boxplot(file: str = None, show: bool = True, title: str = 'inferred parameters', labels: List[str] = None, **kwargs: List[str])[source]#

Plot discretized DFE comparing the different types.

Parameters:

labels (List[str]) – Labels for types
title (str) – Title of plot
file (str) – File to save plot to
show (bool) – Whether to show plot

Return type:

plt.Axes

Returns:

Axes object

Raises:

ValueError – If no inference objects are given or no bootstraps are found.

plot_covariate(index: int = 0, file: str = None, show: bool = True, title: str = None, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'original', show_types: bool = True, ci_level: float = 0.05, xlabel: str = 'cov', ylabel: str = None, ax: plt.Axes = None)[source]#

Plot the covariate given by the index.

Parameters:

index (int) – The index of the covariate.
file (str) – File to save plot to.
show (bool) – Whether to show plot.
title (str) – Plot title.
bootstrap_type (Literal['percentile', 'bca']) – Bootstrap type.
point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.
show_types (bool) – Whether to show types on second x-axis.
ci_level (float) – Confidence level.
xlabel (str) – X-axis label.
ylabel (str) – Y-axis label, defaults to the covariate parameter name.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object.

Get discretized DFEs.

Parameters:

intervals (ndarray) – Array of interval boundaries over (-inf, inf) yielding intervals.shape[0] - 1 bins.
confidence_intervals (bool) – Whether to return confidence intervals
ci_level (float) – Confidence interval level
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap
point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.

Return type:

Dict[str, Tuple[ndarray, ndarray]]

Returns:

Dictionary of array of values and array of errors indexed by inference type

get_bootstrap_params()[source]#

Get bootstrap parameters.

Return type:: Dict[str, float]
Returns:: Bootstrap parameters

to_json()[source]#

Serialize object. Note that the deserialized inference objects no longer share the same optimization instance among other things.

Return type:: str
Returns:: JSON string

property alpha: float | None#

The is no single alpha for the joint inference. Please refer to the self.joint_inferences[t].alpha.

Returns:: None

property omega: float | None#

The is no single omega for the joint inference. Please refer to the self.joint_inferences[t].omega.

Returns:: None

property omega_a: float | None#

The is no single omega_a for the joint inference. Please refer to the self.joint_inferences[t].omega_a.

Returns:: None

get_residual(k: int)[source]#

Residual of joint inference. We calculate the residual over the jointly inferred SFS for all types.

Parameters:: k (int) – Order of the norm
Return type:: float
Returns:: L2 residual

get_alpha(params: dict = None, use_divergence: bool = None)[source]#

Get alpha, the proportion of beneficial non-synonymous substitutions, for each type.

Parameters:

params (dict) – DFE parameters to use for calculation of format {type: {param: value}}.
use_divergence (bool) – Whether to estimate alpha from observed divergence counts using polyDFE’s McDonald-Kreitman style estimator, or from the DFE alone. When None (the default), this follows the inference mode. See BaseInference.get_alpha().

Return type:

float

get_omega(params: dict = None, use_divergence: bool = None)[source]#

Get omega, the rate of non-synonymous over synonymous substitutions (dN/dS), for each type.

Parameters:

params (dict) – DFE parameters to use for calculation of format {type: {param: value}}.
use_divergence (bool) – Whether to estimate omega from observed divergence counts using polyDFE’s McDonald-Kreitman style estimator, or from the DFE alone. When None (the default), this follows the inference mode. See BaseInference.get_omega().

Return type:

Dict[str, float]

check_fixed_params_exist()#: Check that the fixed parameters are valid.

compare_nested(complex: BaseInference)#

Parameters:: complex (BaseInference) – More complex model.
Return type:: float | None
Returns:: p-value or None if the models are not nested.
Raises:: ValueError – If the inference objects are not nested.

compare_nested_models(do_bootstrap: bool = False)#

Compare the various nested versions of the specified model using likelihood ratio tests. We check for inclusion of ancestral misidentification and beneficial mutations.

Parameters:: do_bootstrap (bool) – Whether to perform bootstrapping.
Return type:: Tuple[ndarray, Dict[str, BaseInference]]
Returns:: Matrix of p-values, dict of base inference objects

evaluate_likelihood(params: Dict[str, Dict[str, float]])#

Evaluate likelihood function at given parameters.

Parameters:: params (Dict[str, Dict[str, float]]) – Parameters indexed by type and parameter name.
Return type:: float
Returns:: The likelihood.

classmethod from_config_file(file: str, cache: bool = True)#

Load from config file.

Parameters:

file (str) – Config file path, possibly URL.
cache (bool) – Whether to use the cache if available.

Return type:

Self

Returns:

Inference object.

classmethod from_file(file: str, classes=None)#

Load object from file.

Parameters:

classes – Classes to be used for unserialization.
file (str) – File to load from

Return type:

Self

classmethod from_json(json: str, classes=None)#

Unserialize object.

Parameters:

classes – Classes to be used for unserialization
json (str) – JSON string

Return type:

Self

get_bootstrap_param_names()#

Get the parameters to be included in the bootstraps.

Return type:: List[str]
Returns:: Parameters to be included in the bootstraps.

get_bounds_linear()#

Get linear bounds for all parameters. We do this for the bootstraps as x0 should be close to MLE.

Return type:: Dict[str, Tuple[float, float]]
Returns:: Dictionary of bounds.

get_dfe()#

Return a frozen DFE object containing the inferred parameters and (if available) the bootstrap samples.

Return type:: DFE
Returns:: DFE object

get_fixed_param_names()#

Get the names of the fixed parameters.

Return type:: List[str]

get_n_optimized()#

Get the number of parameters that are optimized.

Return type:: int
Returns:: Number of parameters that are optimized.

get_omega_a(params: dict = None, use_divergence: bool = None)[source]#

Get omega_a, the rate of adaptive non-synonymous over synonymous substitutions (adaptive dN/dS), for each type.

Parameters:

params (dict) – DFE parameters to use for calculation of format {type: {param: value}}.
use_divergence (bool) – Whether to estimate omega_a from observed divergence counts using polyDFE’s McDonald-Kreitman style estimator, or from the DFE alone. When None (the default), this follows the inference mode. See BaseInference.get_omega_a().

Return type:

Dict[str, float]

get_optimized_param_names()#

Get the parameters names for the parameters that are optimized.

Return type:: List[str]
Returns:: List of parameter names.

get_scales_linear()#

Get linear scales for all parameters. We do this for the bootstraps as x0 should be close to MLE.

Return type:: Dict[str, Literal['lin']]
Returns:: Dictionary of scales.

get_spectra()#

Return a Spectra object containing the neutral, selected and modelled SFS.

Return type:: Spectra
Returns:: Spectra object with types neutral, selected, and modelled.

Get statistics for the discretized DFE.

Parameters:

ci_level (float) – Confidence interval level.
intervals (ndarray) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use.
point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.

Return type:

Tuple[ndarray, ndarray, ndarray, ndarray, ndarray]

Returns:

Center values, errors around center, and confidence intervals.

get_summary()#

Get summary.

Return type:: InferenceResult
Returns:: Inference results.

lrt(ll_simple: float, ll_complex: float, df: int, m: int = 0)#

Perform a likelihood ratio test (LRT) with correction for boundary constraints as described in Molenberghs et Verbeke (2007) (https://doi.org/10.2307/1403361).

Parameters:

ll_simple (float) – Log-likelihood of the simple model.
ll_complex (float) – Log-likelihood of the complex model.
df (int) – Total number of parameters being tested.
m (int) – Number of parameters at the boundary.

Return type:

float

Returns:

p-value

n_sites_div_neut: Optional[float] = None#: Number of mutational target sites over which neutral divergence was counted. Owned by the inference (not the spectra); specifying both this and n_sites_div_sel opts divergence into the likelihood.

n_sites_div_sel: Optional[float] = None#: Number of mutational target sites over which selected divergence was counted.

property param_names: List[str]#

Parameter names.

Returns:: List of parameter names.

plot_all(show: bool = True)#

Plot a bunch of plots.

Parameters:: show (bool) – Whether to show plot.

plot_bucket_sizes(file: str = None, show: bool = True, title: str = 'bucket sizes', ax: plt.Axes = None)#

Plot mass in each bucket for the MLE DFE. This can be used to check if the interval bounds and spacing are chosen appropriately.

Parameters:

file (str) – File to save plot to.
show (bool) – Whether to show plot.
title (str) – Plot title.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

Plot density of the discretization intervals chosen. Note that although this plot looks similar, this is not the DFE!

Parameters:

color (str) – Color of the bars.
file (str) – File to save plot to.
show (bool) – Whether to show plot.
interval_labels (List[str]) – Labels for the intervals.
intervals (Sequence) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_likelihoods(file: str = None, show: bool = True, title: str = 'likelihoods', scale: Literal['lin', 'log'] = 'lin', ax: plt.Axes = None, ylabel: str = 'lnl', **kwargs)#

Visualize the likelihoods of the optimization runs using a scatter plot.

Parameters:

scale (Literal['lin', 'log']) – y-scale of the plot.
title (str) – Plot title.
file (str) – File to save plot to.
show (bool) – Whether to show plot.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
ylabel (str) – Label for the y-axis.

Return type:

plt.Axes

Returns:

Axes object

Plot the p-values of nested likelihoods.

Parameters:

file (str) – File to save plot to.
show (bool) – Whether to show plot.
remove_empty (bool) – Whether to remove empty rows and columns.
transpose (bool) – Whether to transpose the matrix.
cmap (str) – Colormap to use.
title (str) – Plot title.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
do_bootstrap (bool) – Whether to perform bootstrapping.

Return type:

plt.Axes

Returns:

Axes object

plot_observed_sfs(labels: List[str] = None, file: str = None, show: bool = True, ax: plt.Axes = None)#

Plot neutral and selected SFS.

Parameters:

file (str) – File to save plot to.
labels (List[str]) – Labels for the SFS.
show (bool) – Whether to show plot.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

raise_if_locked()#

Raise an error if this object is locked.

Raises:: Exception –

run_if_required(*args, **kwargs)#

Run if not run yet.

Parameters:

args – Arguments.
kwargs – Keyword arguments.

Return type:

Optional[Spectrum]

Returns:

DFE parametrization and modelled SFS.

runs: Optional[pd.DataFrame] = None#: Dataframe holding information on all optimization runs

to_file(file: str)#

Save object to file.

Parameters:: file (str) – File to save to

sfs_neut: Spectrum#: Neutral SFS

sfs_sel: Spectrum#: Selected SFS

model: Parametrization#: The DFE parametrization

folded: bool#: Whether the SFS are folded

n: int#: Sample size

theta: float#: Estimate of theta from neutral SFS

params_mle: Optional[Dict[str, float]]#: MLE estimates of the initial optimization

sfs_mle: Optional[Spectrum]#: Modelled MLE SFS

likelihood: Optional[float]#: Likelihood of the MLE, this value may be updated after bootstrapping

likelihoods: Optional[List[float]]#: Likelihoods of the different ML runs, controlled by n_runs

n_runs: int#: Number of MLE runs to perform

result: Optional['scipy.optimize.OptimizeResult']#: Numerical optimization result

do_bootstrap: bool#: Whether to do bootstrapping

n_bootstraps: int#: Number of bootstraps

n_bootstrap_retries: int#: Number of optimization runs for each bootstrap sample

parallelize: bool#: Whether to parallelize computations

x0: Dict[str, Dict[str, float]]#: Initial values

bootstraps: Optional[pd.DataFrame]#: Bootstrapped MLE parameter estimates

bootstrap_results: Optional[List['scipy.optimize.OptimizeResult']]#: Bootstrap optimization results

L2_residual: Optional[float]#: L2 norm of differential between modelled and observed SFS

seed: int | None#: Random number generator seed

rng: np.random.Generator#: Random generator instance

execution_time: float#: Total execution time in seconds

locked: bool#: Whether inferences can be run from the class itself

class Inference[source]#

Bases: object

Static utility methods for inference objects.

Methods:

`plot_discretized`	Visualize several discretized DFEs given by the list of inference objects.
`plot_continuous`	Visualize several DFEs given by the list of inference objects.
`plot_inferred_parameters`	Visualize several discretized DFEs given by the list of inference objects.
`plot_inferred_parameters_boxplot`	Visualize several discretized DFEs given by the list of inference objects.
`get_errors_params_mle`	Get errors and values for MLE params of inferences.
`get_discretized`	Get values and errors of discretized DFE.
`get_stats_discretized`	Compute errors and confidence interval for a discretized DFE.
`compute_histogram`	Discretize the DFE given a DFE parametrization and its parameter values.

static plot_discretized(inferences: List[AbstractInference], intervals: Sequence = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean', file: str = None, show: bool = True, title: str = 'discretized DFEs', labels: Sequence = None, ax: plt.Axes = None, kwargs_legend: dict = {'prop': {'size': 8}}, **kwargs)[source]#

Visualize several discretized DFEs given by the list of inference objects.

Parameters:

inferences (List[AbstractInference]) – List of inference objects.
intervals (Sequence) – Intervals over (-inf, inf) to use for discretization.
confidence_intervals (bool) – Whether to plot confidence intervals.
ci_level (float) – Confidence level for confidence intervals.
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use for confidence intervals.
point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.
file (str) – Path to file to save the plot to.
show (bool) – Whether to show the plot.
title (str) – Title of the plot.
labels (Sequence) – Labels for the DFEs.
kwargs – Additional arguments for the plot.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes of the plot.

static plot_continuous(inferences: List[AbstractInference], intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', file: str = None, show: bool = True, title: str = 'continuous DFEs', labels: Sequence = None, scale: Literal['lin', 'log', 'symlog'] = 'lin', scale_density: bool = False, ax: plt.Axes = None, kwargs_legend: dict = {'prop': {'size': 8}}, **kwargs)[source]#

Visualize several DFEs given by the list of inference objects. By default, the PDF is plotted as is. Due to the logarithmic scale on the x-axis, we may get a wrong intuition on how the mass is distributed, however. To get a better intuition, we can optionally scale the density by the x-axis interval size using scale_density = True. This has the disadvantage that the density now changes for x, so that even a constant density will look warped.

Parameters:

inferences (List[AbstractInference]) – List of inference objects.
intervals (ndarray) – Intervals to use for discretization.
confidence_intervals (bool) – Whether to plot confidence intervals.
ci_level (float) – Confidence level for confidence intervals.
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use for confidence intervals.
file (str) – Path to file to save the plot to.
show (bool) – Whether to show the plot.
title (str) – Title of the plot.
labels (Sequence) – Labels for the DFEs.
scale (Literal['lin', 'log', 'symlog']) – y-scale of the plot.
scale_density (bool) – Whether to scale the density by the x-axis interval size.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.
kwargs – Additional arguments for the plot.

Return type:

plt.Axes

Returns:

Axes of the plot.

static plot_inferred_parameters(inferences: List[AbstractInference], labels: Sequence, confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean', file: str = None, show: bool = True, title: str = 'parameter estimates', scale: Literal['lin', 'log', 'symlog'] = 'log', ax: plt.Axes = None, kwargs_legend: dict = {'loc': 'upper right', 'prop': {'size': 8}}, **kwargs)[source]#

Visualize several discretized DFEs given by the list of inference objects. Note that the DFE parametrization needs to be the same for all inference objects.

Parameters:

inferences (List[AbstractInference]) – List of inference objects.
labels (Sequence) – Unique labels for the DFEs.
scale (Literal['lin', 'log', 'symlog']) – y-scale of the plot.
confidence_intervals (bool) – Whether to plot confidence intervals.
ci_level (float) – Confidence level for confidence intervals.
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use for confidence intervals.
point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.
file (str) – Path to file to save the plot to.
show (bool) – Whether to show the plot.
title (str) – Title of the plot.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.
kwargs – Additional arguments which are ignored.

Return type:

plt.Axes

Returns:

Axes of the plot.

Raises:

ValueError – If no inference objects are given.

static plot_inferred_parameters_boxplot(inferences: List[AbstractInference], labels: Sequence, file: str = None, show: bool = True, title: str = 'parameter estimates', **kwargs)[source]#

Visualize several discretized DFEs given by the list of inference objects. Note that the DFE parametrization needs to be the same for all inference objects.

Parameters:

inferences (List[AbstractInference]) – List of inference objects.
labels (Sequence) – Unique labels for the DFEs.
file (str) – Path to file to save the plot to.
show (bool) – Whether to show the plot.
title (str) – Title of the plot.
kwargs – Additional arguments for the plot.

Return type:

plt.Axes

Returns:

Axes of the plot.

Raises:

ValueError – If no inference objects are given or no bootstraps are found.

static get_errors_params_mle(ci_level: float, confidence_intervals: bool, inferences: List[AbstractInference], labels: Sequence, param_names: Sequence, bootstrap_type: Literal['percentile', 'bca'], point_estimate: Literal['original', 'mean', 'median'] = 'mean')[source]#

Get errors and values for MLE params of inferences.

Parameters:

ci_level (float) – Confidence level for confidence intervals.
confidence_intervals (bool) – Whether to compute confidence intervals.
inferences (List[AbstractInference]) – List of inference objects.
labels (Sequence) – Labels for the inferences.
param_names (Sequence) – Names of the parameters to get errors and values for.
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use.
point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.

Return type:

Tuple[Dict[str, Optional[Tuple[ndarray, ndarray]]], Dict[str, ndarray]]

Returns:

dictionary of errors and dictionary of center values indexed by labels.

static get_discretized(inferences: List[AbstractInference], labels: Sequence, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean')[source]#

Get values and errors of discretized DFE.

Parameters:

inferences (List[AbstractInference]) – List of inference objects.
labels (Sequence) – Labels for the DFEs.
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use
ci_level (float) – Confidence interval level
confidence_intervals (bool) – Whether to compute confidence intervals
intervals (ndarray) – Array of interval boundaries over (-inf, inf) yielding intervals.shape[0] - 1 bars.
point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.

Return type:

Tuple[Dict[str, ndarray], Dict[str, Optional[ndarray]]]

Returns:

Dictionary of values and dictionary of errors indexed by labels.

static get_stats_discretized(params: dict, bootstraps: DataFrame, model: Parametrization | str, ci_level: float = 0.05, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean')[source]#

Compute errors and confidence interval for a discretized DFE.

Parameters:

params (dict) – Parameters of the model
bootstraps (DataFrame) – Bootstrapped samples
model (Parametrization | str) – DFE parametrization
ci_level (float) – Confidence interval level
intervals (ndarray) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap
point_estimate (Literal['original', 'mean', 'median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.

Return type:

Tuple[ndarray, ndarray, ndarray]

Returns:

Center values, errors around center, and confidence intervals.

static compute_histogram(model: Parametrization | str, params: dict, intervals: ndarray)[source]#

Discretize the DFE given a DFE parametrization and its parameter values.

Parameters:

model (Parametrization | str) – DFE parametrization
params (dict) – Parameters of the model
intervals (ndarray) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.

Return type:

ndarray

Returns:

Discretized DFE

class InferenceResult(inference: BaseInference)[source]#

Bases: Serializable

Inference result.

Methods:

`from_file`	Load object from file.
`from_json`	Unserialize object.
`to_file`	Save object to file.
`to_json`	Serialize object.
`__init__`	Initialize object.

classmethod from_file(file: str, classes=None)#

Load object from file.

Parameters:

classes – Classes to be used for unserialization.
file (str) – File to load from

Return type:

Self

classmethod from_json(json: str, classes=None)#

Unserialize object.

Parameters:

classes – Classes to be used for unserialization
json (str) – JSON string

Return type:

Self

to_file(file: str)#

Save object to file.

Parameters:: file (str) – File to save to

to_json()#

Serialize object.

Return type:: str
Returns:: JSON string

__init__(inference: BaseInference)[source]#

Initialize object.

Parameters:: inference (BaseInference) – Inference object.

config: Config#: Configuration of the inference

dfe: DFE#: Inferred DFE

spectra: Spectra#: Spectra

bootstraps: DataFrame#: Bootstrap samples

runs: DataFrame#: Optimization runs

class SharedParams(params: List[str] | Literal['all'] = 'all', types: List[str] | Literal['all'] = 'all')[source]#

Bases: object

Class specifying the sharing of params among types. all means all available types or params.

Example usage:

import fastdfe as fd

# neutral SFS for two types
sfs_neut = fd.Spectra(dict(
    pendula=[177130, 997, 441, 228, 156, 117, 114, 83, 105, 109, 652],
    pubescens=[172528, 3612, 1359, 790, 584, 427, 325, 234, 166, 76, 31]
))

# selected SFS for two types
sfs_sel = fd.Spectra(dict(
    pendula=[797939, 1329, 499, 265, 162, 104, 117, 90, 94, 119, 794],
    pubescens=[791106, 5326, 1741, 1005, 756, 546, 416, 294, 177, 104, 41]
))

# create inference object
inf = fd.JointInference(
    sfs_neut=sfs_neut,
    sfs_sel=sfs_sel,
    shared_params=[fd.SharedParams(types=["pendula", "pubescens"], params=["eps", "S_d"])],
    do_bootstrap=True
)

# run inference
inf.run()

Methods:

__init__

params: Union[List[str], Literal['all']] = 'all'#: The params to share

types: Union[List[str], Literal['all']] = 'all'#: The types to share

__init__(params: List[str] | Literal['all'] = 'all', types: List[str] | Literal['all'] = 'all')#

class Covariate(param: str, values: Dict[str, float], callback: Callable | None = None, bounds: tuple = (0.0001, 10000.0), x0: float = 0, bounds_scale: Literal['lin', 'log', 'symlog'] = 'symlog')[source]#

Bases: object

Class defining a covariate which induces a relationship with one or many parameters. The relationship is defined by a callback function which modifies the parameters. The default callback introduces a linear relationship.

Below an example of introducing linear covariates for S_d, the for mean strength of deleterious mutations (cf. GammaExpParametrization). Each of the three types is associated with one covariate. This we pass to JointInference together with the stratified spectra:

import fastdfe as fd

cov = fd.Covariate(
    param='S_d',
    values=dict(type1=5, type2=3, type3=1)
)

Methods:

`apply`	Apply the custom or default callback to modify the given parameters.
`apply_default`	Modify the given parameters introducing a linear relationship with the given covariate.
`__init__`

param: str#: The parameter to modify

values: Dict[str, float]#: The values of the covariate for each type

callback: Optional[Callable] = None#: The callback function to modify the parameters

bounds: tuple = (0.0001, 10000.0)#: The bounds of the covariate parameter to be estimated

x0: float = 0#: The initial value of the covariate

bounds_scale: Literal['lin', 'log', 'symlog'] = 'symlog'#: The scale of the bounds. See scale_value() for details

apply(covariate: float, type: str, params: Dict[str, float])[source]#

Apply the custom or default callback to modify the given parameters.

Parameters:

covariate (float) – The value of the covariate.
type (str) – The type of the relationship.
params (Dict[str, float]) – The input parameters.

Return type:

Dict[str, float]

Returns:

Modified parameters.

apply_default(covariate: float, type: str, params: Dict[str, float])[source]#

Modify the given parameters introducing a linear relationship with the given covariate.

Parameters:

covariate (float) – The value of the covariate.
type (str) – The type of the relationship.
params (Dict[str, float]) – The input parameters.

Return type:

Dict[str, float]

Returns:

Modified parameters.

__init__(param: str, values: Dict[str, float], callback: Callable | None = None, bounds: tuple = (0.0001, 10000.0), x0: float = 0, bounds_scale: Literal['lin', 'log', 'symlog'] = 'symlog')#

DFE Inference

Contents

DFE Inference#