Inference

Inference#

class BaseInference(sfs_neut: Spectra | Spectrum, sfs_sel: Spectra | Spectrum, intervals_del: Tuple[float, float, int] = (-100000000.0, -1e-05, 1000), intervals_ben: Tuple[float, float, int] = (1e-05, 10000.0, 1000), integration_mode: Literal['midpoint', 'quad'] = 'midpoint', linearized: bool = True, model: Parametrization | str = 'GammaExpParametrization', seed: int = 0, x0: Dict[str, Dict[str, float]] = {}, bounds: Dict[str, Tuple[float, float]] = {}, scales: Dict[str, Literal['lin', 'log', 'symlog']] = {}, loss_type: Literal['likelihood', 'L2'] = 'likelihood', opts_mle: dict = {}, n_runs: int = 10, fixed_params: Dict[str, Dict[str, float]] = {}, do_bootstrap: bool = False, n_bootstraps: int = 100, n_bootstrap_retries: int = 2, parallelize: bool = True, folded: bool = None, discretization: Discretization = None, optimization: Optimization = None, locked: bool = False, **kwargs)[source]#

Bases: AbstractInference

Base inference class for inferring the SFS given one neutral and one selected SFS. Note that BaseInference is by default seeded.

Example usage:

import fastdfe as fd

sfs_neut = fd.Spectrum([177130, 997, 441, 228, 156, 117, 114, 83, 105, 109, 652])
sfs_sel = fd.Spectrum([797939, 1329, 499, 265, 162, 104, 117, 90, 94, 119, 794])

# create inference object
inf = fd.BaseInference(
    sfs_neut=sfs_neut,
    sfs_sel=sfs_sel,
    n_runs=10,
    n_bootstraps=100,
    do_bootstrap=True
)

# run inference
inf.run()

# plot discretized DFE
inf.plot_discretized()

# plot SFS comparison
inf.plot_sfs_comparison()

__init__(sfs_neut: Spectra | Spectrum, sfs_sel: Spectra | Spectrum, intervals_del: Tuple[float, float, int] = (-100000000.0, -1e-05, 1000), intervals_ben: Tuple[float, float, int] = (1e-05, 10000.0, 1000), integration_mode: Literal['midpoint', 'quad'] = 'midpoint', linearized: bool = True, model: Parametrization | str = 'GammaExpParametrization', seed: int = 0, x0: Dict[str, Dict[str, float]] = {}, bounds: Dict[str, Tuple[float, float]] = {}, scales: Dict[str, Literal['lin', 'log', 'symlog']] = {}, loss_type: Literal['likelihood', 'L2'] = 'likelihood', opts_mle: dict = {}, n_runs: int = 10, fixed_params: Dict[str, Dict[str, float]] = {}, do_bootstrap: bool = False, n_bootstraps: int = 100, n_bootstrap_retries: int = 2, parallelize: bool = True, folded: bool = None, discretization: Discretization = None, optimization: Optimization = None, locked: bool = False, **kwargs)[source]#

Create BaseInference instance.

Parameters:

sfs_neut (Spectra | Spectrum) – The neutral SFS. Note that we require monomorphic counts to be specified in order to infer the mutation rate. If a Spectra object with more than one SFS is provided, the all attribute will be used.
sfs_sel (Spectra | Spectrum) – The selected SFS. Note that we require monomorphic counts to be specified in order to infer the mutation rate. If a Spectra object with more than one SFS is provided, the all attribute will be used.
intervals_del (Tuple[float, float, int]) – (start, stop, n_interval) for deleterious population-scaled selection coefficients. The intervals will be log10-spaced.
intervals_ben (Tuple[float, float, int]) – Same as intervals_del but for positive selection coefficients.
integration_mode (Literal['midpoint', 'quad']) – Integration mode for the DFE, quad not recommended
linearized (bool) – Whether to use the linearized DFE, False not recommended
model (Parametrization | str) – Instance of DFEParametrization which parametrized the DFE
seed (int) – Seed for the random number generator. Use None for no seed.
x0 (Dict[str, Dict[str, float]]) – Dictionary of initial values in the form {'all': {param: value}}
bounds (Dict[str, Tuple[float, float]]) – Bounds for the optimization in the form {param: (lower, upper)}
scales (Dict[str, Literal['lin', 'log', 'symlog']]) – Scales for the optimization in the form ``{param: scale}`
loss_type (Literal['likelihood', 'L2']) – Type of loss function to use for optimization.
opts_mle (dict) – Options for the optimization.
n_runs (int) – Number of independent optimization runs out of which the best one is chosen. The first run will use the initial values if specified. Consider increasing this number if the optimization does not produce good results.
fixed_params (Dict[str, Dict[str, float]]) – Parameters held fixed during optimization, i.e., {'all': {param: value}}
do_bootstrap (bool) – Whether to do bootstrapping.
n_bootstraps (int) – Number of bootstraps.
n_bootstrap_retries (int) – Number of retries for bootstraps that did not terminate normally.
parallelize (bool) – Whether to parallelize computations.
folded (bool) – Whether the SFS are folded. If not specified, the SFS will be folded if both of the given SFS appear to be folded.
discretization (Discretization) – Discretization instance. Mainly intended for internal use.
optimization (Optimization) – Optimization instance. Mainly intended for internal use.
locked (bool) – Whether to lock the instance.
kwargs – Additional keyword arguments which are ignored.

sfs_neut: Spectrum#: Neutral SFS

sfs_sel: Spectrum#: Selected SFS

model: Parametrization#: The DFE parametrization

folded: bool#: Whether the SFS are folded

n: int#: Sample size

discretization: Discretization#: Discretization instance

theta: float#: Estimate of theta from neutral SFS

params_mle: Optional[Dict[str, float]]#: MLE estimates of the initial optimization

sfs_mle: Optional[Spectrum]#: Modelled MLE SFS

likelihood: Optional[float]#: Likelihood of the MLE, this value may be updated after bootstrapping

likelihoods: Optional[List[float]]#: Likelihoods of the different ML runs, controlled by n_runs

n_runs: int#: Number of MLE runs to perform

result: Optional['scipy.optimize.OptimizeResult']#: Numerical optimization result

do_bootstrap: bool#: Whether to do bootstrapping

n_bootstraps: int#: Number of bootstraps

n_bootstrap_retries: int#: Number of retries for bootstraps that did not terminate normally

parallelize: bool#: Whether to parallelize computations

fixed_params: Dict[str, Dict[str, float]]#: Fixed parameters

scales: Dict[str, Literal['lin', 'log', 'symlog']]#: parameter scales

bounds: Dict[str, Tuple[float, float]]#: parameter bounds

optimization: Optimization#: Optimization instance

x0: Dict[str, Dict[str, float]]#: Initial values

bootstraps: Optional[pd.DataFrame]#: Bootstrapped MLE parameter estimates

bootstrap_results: Optional[List['scipy.optimize.OptimizeResult']]#: Bootstrap optimization results

L2_residual: Optional[float]#: L2 norm of differential between modelled and observed SFS

seed: int | None#: Random number generator seed

rng: np.random.Generator#: Random generator instance

execution_time: float#: Total execution time in seconds

locked: bool#: Whether inferences can be run from the class itself

get_fixed_param_names()[source]#

Get the names of the fixed parameters.

Return type:: List[str]

check_fixed_params_exist()[source]#: Check that the fixed parameters are valid.

raise_if_locked()[source]#

Raise an error if this object is locked.

Raises:: Exception –

run_if_required(*args, **kwargs)[source]#

Run if not run yet.

Parameters:

args – Arguments.
kwargs – Keyword arguments.

Return type:

Optional[Spectrum]

Returns:

DFE parametrization and modelled SFS.

run(do_bootstrap: bool = None, pbar: bool = None, **kwargs)[source]#

Perform the DFE inference.

Parameters:

pbar (bool) – Whether to show a progress bar.
do_bootstrap (bool) – Whether to perform bootstrapping.
kwargs – Additional keyword arguments which are ignored.

Return type:

Spectrum

Returns:

Modelled SFS.

get_counts()[source]#

Get callback functions for modelling SFS counts from given parameters.

Return type:: dict
Returns:: Callback functions for modelling SFS counts for each type.

evaluate_likelihood(params: Dict[str, Dict[str, float]])[source]#

Evaluate likelihood function at given parameters.

Parameters:: params (Dict[str, Dict[str, float]]) – Parameters indexed by type and parameter name.
Return type:: float
Returns:: The likelihood.

get_residual(k: int)[source]#

Residual of the modelled and observed SFS.

Parameters:: k (int) – Order of the norm
Return type:: float
Returns:: L2 residual

bootstrap(n_samples: int = None, parallelize: bool = None, update_likelihood: bool = True, n_retries: int = None, pbar: bool = True)[source]#

Perform the parametric bootstrap.

Parameters:

n_samples (int) – Number of bootstrap samples. Defaults to n_bootstraps.
parallelize (bool) – Whether to parallelize the bootstrap. Defaults to parallelize.
update_likelihood (bool) – Whether to update the likelihood to be the mean of the bootstrap samples.
n_retries (int) – Number of retries for bootstrap samples that did not terminate normally. Defaults to n_bootstrap_retries.
pbar (bool) – Whether to show a progress bar.

Return type:

DataFrame

Returns:

Dataframe with bootstrap results.

get_scales_linear()[source]#

Get linear scales for all parameters. We do this for the bootstraps as x0 should be close to MLE.

Return type:: Dict[str, Literal['lin']]
Returns:: Dictionary of scales.

get_bounds_linear()[source]#

Get linear bounds for all parameters. We do this for the bootstraps as x0 should be close to MLE.

Return type:: Dict[str, Tuple[float, float]]
Returns:: Dictionary of bounds.

plot_continuous(file: str = None, show: bool = True, intervals: ndarray = None, confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', scale_density: bool = False, scale: Literal['lin', 'log', 'symlog'] = 'lin', title: str = 'DFE', kwargs_legend: dict = {'prop': {'size': 8}}, ax: plt.Axes = None)[source]#

Plot continuous DFE. The special constants np.inf and -np.inf are also valid interval bounds. By default, the PDF is plotted as is. Due to the logarithmic scale on the x-axis, we may get a wrong intuition on how the mass is distributed, however. To get a better intuition, we can optionally scale the density by the x-axis interval size using scale_density = True. This has the disadvantage that the density now changes for x, so that even a constant density will look warped.

Parameters:

title (str) – Plot title.
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use for confidence intervals.
ci_level (float) – Confidence level for confidence intervals.
confidence_intervals (bool) – Whether to plot confidence intervals.
file (str) – File to save plot to.
show (bool) – Whether to show plot.
scale (Literal['lin', 'log', 'symlog']) – y-scale
scale_density (bool) – Whether to scale the density by the x-axis interval size
intervals (ndarray) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_bucket_sizes(file: str = None, show: bool = True, title: str = 'bucket sizes', ax: plt.Axes = None)[source]#

Plot mass in each bucket for the MLE DFE. This can be used to check if the interval bounds and spacing are chosen appropriately.

Parameters:

file (str) – File to save plot to.
show (bool) – Whether to show plot.
title (str) – Plot title.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_interval_density(file: str = None, show: bool = True, intervals: Sequence = array([-inf, -100., -10., -1., 0., 1., inf]), interval_labels: List[str] = None, color: str = 'C0', ax: plt.Axes = None)[source]#

Plot density of the discretization intervals chosen. Note that although this plot looks similar, this is not the DFE!

Parameters:

color (str) – Color of the bars.
file (str) – File to save plot to.
show (bool) – Whether to show plot.
interval_labels (List[str]) – Labels for the intervals.
intervals (Sequence) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_sfs_comparison(sfs_types: List[Literal['modelled', 'observed', 'selected', 'neutral']] = ['observed', 'modelled'], labels: List[str] = None, colors: List[str] = None, file: str = None, show: bool = True, ax: plt.Axes = None, title: str = 'SFS comparison', use_subplots: bool = False, show_monomorphic: bool = False, kwargs_legend: dict = {'prop': {'size': 8}})[source]#

Plot SFS comparison.

Parameters:

file (str) – File to save plot to.
labels (List[str]) – Labels for the SFS.
colors (List[str]) – Colors for the SFS. Only for Python visualization backend.
sfs_types (List[Literal['modelled', 'observed', 'selected', 'neutral']]) – Types of SFS to plot.
show (bool) – Whether to show plot.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend and if use_subplots is False.
title (str) – Plot title
use_subplots (bool) – Whether to use subplots. Only for Python visualization backend.
show_monomorphic (bool) – Whether to show monomorphic counts
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_observed_sfs(labels: List[str] = None, file: str = None, show: bool = True, ax: plt.Axes = None)[source]#

Plot neutral and selected SFS.

Parameters:

file (str) – File to save plot to.
labels (List[str]) – Labels for the SFS.
show (bool) – Whether to show plot.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_all(show: bool = True)[source]#

Plot a bunch of plots.

Parameters:: show (bool) – Whether to show plot.

get_errors_discretized_dfe(ci_level: float = 0.05, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), bootstrap_type: Literal['percentile', 'bca'] = 'percentile')[source]#

Get the discretized DFE errors.

Parameters:

ci_level (float) – Confidence interval level.
intervals (ndarray) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use.

Return type:

ndarray

Returns:

Arrays of errors, confidence intervals, bootstraps, means and values

plot_inferred_parameters(confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', file: str = None, show: bool = True, title: str = 'parameter estimates', scale: Literal['lin', 'log', 'symlog'] = 'log', ax: plt.Axes = None, kwargs_legend: dict = {'loc': 'upper right', 'prop': {'size': 8}}, **kwargs)[source]#

Visualize the inferred parameters and their confidence intervals.

Parameters:

scale (Literal['lin', 'log', 'symlog']) – y-scale of the plot.
title (str) – Plot title.
confidence_intervals (bool) – Whether to show confidence intervals.
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use.
ci_level (float) – Confidence level for the confidence intervals.
file (str) – File to save plot to.
show (bool) – Whether to show plot.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_inferred_parameters_boxplot(file: str = None, show: bool = True, title: str = 'parameter estimates', **kwargs)[source]#

Visualize the inferred parameters and their confidence intervals.

Parameters:

title (str) – Plot title.
file (str) – File to save plot to.
show (bool) – Whether to show plot.

Return type:

plt.Axes

Returns:

Axes object

Raises:

ValueError – If no inference objects are given or no bootstraps are found.

plot_likelihoods(file: str = None, show: bool = True, title: str = 'likelihoods', scale: Literal['lin', 'log'] = 'lin', ax: plt.Axes = None, ylabel: str = 'lnl', **kwargs)[source]#

Visualize the likelihoods of the optimization runs using a scatter plot.

Parameters:

scale (Literal['lin', 'log']) – y-scale of the plot.
title (str) – Plot title.
file (str) – File to save plot to.
show (bool) – Whether to show plot.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
ylabel (str) – Label for the y-axis.

Return type:

plt.Axes

Returns:

Axes object

lrt(ll_simple: float, ll_complex: float, df: int = 1)[source]#

Perform the likelihood ratio test (LRT).

Note

We do not adjust the test for parameters that are on the boundary of the parameter space.

Parameters:

ll_simple (float) – Log-likelihood of the simple model.
ll_complex (float) – Log-likelihood of the complex model.
df (int) – Degrees of freedom.

Return type:

float

Returns:

p-value

compare_nested(complex: BaseInference)[source]#

Perform likelihood ratio test with given more complex model. The given model’s fixed parameters need to be a proper subset of this model’s fixed parameters.

Parameters:: complex (BaseInference) – More complex model.
Return type:: float | None
Returns:: p-value

compare_nested_models(do_bootstrap: bool = True)[source]#

Compare the various nested versions of the specified model using likelihood ratio tests.

Parameters:: do_bootstrap (bool) – Whether to perform bootstrapping. This is recommended to get more accurate p-values.
Return type:: (ndarray, Dict[str, BaseInference])
Returns:: Matrix of p-values, dict of base inference objects

plot_nested_models(file: str = None, show: bool = True, remove_empty: bool = False, transpose: bool = False, cmap: str = None, title: str = 'nested model comparison', ax: plt.Axes = None, do_bootstrap: bool = True)[source]#

Plot the p-values of nested likelihoods.

Parameters:

file (str) – File to save plot to.
show (bool) – Whether to show plot.
remove_empty (bool) – Whether to remove empty rows and columns.
transpose (bool) – Whether to transpose the matrix.
cmap (str) – Colormap to use.
title (str) – Plot title.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
do_bootstrap (bool) – Whether to perform bootstrapping. This is recommended to get more accurate p-values.

Return type:

plt.Axes

Returns:

Axes object

plot_discretized(file: str = None, show: bool = True, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', title: str = 'discretized DFE', ax: plt.Axes = None, kwargs_legend: dict = {'prop': {'size': 8}})#

Plot discretized DFE.

Parameters:

title (str) – Title of the plot
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap
ci_level (float) – Confidence interval level
confidence_intervals (bool) – Whether to plot confidence intervals
file (str) – File to save the plot to
show (bool) – Whether to show the plot
intervals (ndarray) – Array of interval boundaries over (-inf, inf) yielding intervals.shape[0] - 1 bars.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes

get_alpha(params: dict = None)[source]#

Get alpha, the proportion of beneficial non-synonymous substitutions.

Parameters:: params (dict) – Parameters to use for calculation.
Return type:: float

property alpha: float#: Cache alpha, the proportion of beneficial non-synonymous substitutions.

get_bootstrap_params()[source]#

Get the parameters to be included in the bootstraps.

Return type:: Dict[str, float]
Returns:: Parameters to be included in the bootstraps.

get_bootstrap_param_names()[source]#

Get the parameters to be included in the bootstraps.

Return type:: List[str]
Returns:: Parameters to be included in the bootstraps.

get_optimized_param_names()[source]#

Get the parameters names for the parameters that are optimized.

Return type:: List[str]
Returns:: List of parameter names.

get_n_optimized()[source]#

Get the number of parameters that are optimized.

Return type:: int
Returns:: Number of parameters that are optimized.

create_config()[source]#

Create a config object from the inference object.

Return type:: Config
Returns:: Config object.

classmethod from_config(config: Config)[source]#

Load from config object.

Parameters:: config (Config) – Config object.
Return type:: Self
Returns:: Inference object.

classmethod from_config_file(file: str, cache: bool = True)[source]#

Load from config file.

Parameters:

file (str) – Config file path, possibly URL.
cache (bool) – Whether to use the cache if available.

Return type:

Self

Returns:

Inference object.

get_summary()[source]#

Get summary.

Return type:: InferenceResults
Returns:: Inference results.

get_x0()[source]#

Get initial values.

Return type:: Dict[str, Dict[str, float]]
Returns:: Initial values.

property param_names: List[str]#

Parameter names.

Returns:: List of parameter names.

to_json()[source]#

Serialize object.

Return type:: str
Returns:: JSON string

classmethod from_file(file: str, classes=None)#

Load object from file.

Parameters:

classes – Classes to be used for unserialization
file (str) – File to load from

Return type:

Self

classmethod from_json(json: str, classes=None)#

Unserialize object.

Parameters:

classes – Classes to be used for unserialization
json (str) – JSON string

Return type:

Self

get_cis_params_mle(bootstrap_type: Literal['percentile', 'bca'] = 'percentile', ci_level: float = 0.05, param_names: list[str] | None = None)#

Get confidence intervals for the MLE parameters.

Parameters:

bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap
ci_level (float) – Confidence interval level
param_names (Optional[list[str]]) – Names of the parameters to return confidence intervals for

Return type:

Dict[str, Tuple[float, float]]

Returns:

Confidence intervals for the parameters

get_discretized(intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile')#

Get discretized DFE.

Parameters:

bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap
ci_level (float) – Confidence interval level
confidence_intervals (bool) – Whether to return confidence intervals
intervals (ndarray) – Array of interval boundaries over (-inf, inf) yielding intervals.shape[0] - 1 bins.

Return type:

(ndarray, ndarray)

Returns:

Array of values and array of deviations

to_file(file: str)#

Save object to file.

Parameters:: file (str) – File to save to

class JointInference(sfs_neut: Spectra, sfs_sel: Spectra, include_divergence: bool = None, intervals_del: Tuple[float, float, int] = (-100000000.0, -1e-05, 1000), intervals_ben: Tuple[float, float, int] = (1e-05, 10000.0, 1000), integration_mode: Literal['midpoint', 'quad'] = 'midpoint', linearized: bool = True, model: Parametrization | str = 'GammaExpParametrization', seed: int = 0, x0: Dict[str, Dict[str, float]] = {}, bounds: Dict[str, Tuple[float, float]] = {}, scales: Dict[str, Literal['lin', 'log', 'symlog']] = {}, loss_type: Literal['likelihood', 'L2'] = 'likelihood', opts_mle: dict = {}, n_runs: int = 10, fixed_params: Dict[str, Dict[str, float]] = {}, shared_params: List[SharedParams] = [], covariates: List[Covariate] = [], do_bootstrap: bool = False, n_bootstraps: int = 100, n_bootstrap_retries: int = 2, parallelize: bool = True, folded: bool = None, **kwargs)[source]#

Bases: BaseInference

Enabling the sharing of parameters among several Inference objects.

Example usage:

import fastdfe as fd

# neutral SFS for two types
sfs_neut = fd.Spectra(dict(
    pendula=[177130, 997, 441, 228, 156, 117, 114, 83, 105, 109, 652],
    pubescens=[172528, 3612, 1359, 790, 584, 427, 325, 234, 166, 76, 31]
))

# selected SFS for two types
sfs_sel = fd.Spectra(dict(
    pendula=[797939, 1329, 499, 265, 162, 104, 117, 90, 94, 119, 794],
    pubescens=[791106, 5326, 1741, 1005, 756, 546, 416, 294, 177, 104, 41]
))

# create inference object
inf = fd.JointInference(
    sfs_neut=sfs_neut,
    sfs_sel=sfs_sel,
    fixed_params=dict(eps=0), # fix eps to 0
    do_bootstrap=True
)

# run inference
inf.run()

__init__(sfs_neut: Spectra, sfs_sel: Spectra, include_divergence: bool = None, intervals_del: Tuple[float, float, int] = (-100000000.0, -1e-05, 1000), intervals_ben: Tuple[float, float, int] = (1e-05, 10000.0, 1000), integration_mode: Literal['midpoint', 'quad'] = 'midpoint', linearized: bool = True, model: Parametrization | str = 'GammaExpParametrization', seed: int = 0, x0: Dict[str, Dict[str, float]] = {}, bounds: Dict[str, Tuple[float, float]] = {}, scales: Dict[str, Literal['lin', 'log', 'symlog']] = {}, loss_type: Literal['likelihood', 'L2'] = 'likelihood', opts_mle: dict = {}, n_runs: int = 10, fixed_params: Dict[str, Dict[str, float]] = {}, shared_params: List[SharedParams] = [], covariates: List[Covariate] = [], do_bootstrap: bool = False, n_bootstraps: int = 100, n_bootstrap_retries: int = 2, parallelize: bool = True, folded: bool = None, **kwargs)[source]#

Create instance.

Parameters:

sfs_neut (Spectra) – Neutral SFS. Note that we require monomorphic counts to be specified in order to infer the mutation rate.
sfs_sel (Spectra) – Selected SFS. Note that we require monomorphic counts to be specified in order to infer the mutation rate.
include_divergence (bool) – Whether to include divergence in the likelihood
intervals_del (Tuple[float, float, int]) – (start, stop, n_interval) for deleterious population-scaled selection coefficients. The intervals will be log10-spaced.
intervals_ben (Tuple[float, float, int]) – Same as intervals_del but for positive selection coefficients.
integration_mode (Literal['midpoint', 'quad']) – Integration mode, quad not recommended
linearized (bool) – Whether to use the linearized model, False not recommended
model (Parametrization | str) – DFE parametrization
seed (int) – Random seed. Use None for no seed.
x0 (Dict[str, Dict[str, float]]) – Dictionary of initial values in the form {type: {param: value}}
bounds (Dict[str, Tuple[float, float]]) – Bounds for the optimization in the form {param: (lower, upper)}
scales (Dict[str, Literal['lin', 'log', 'symlog']]) – Scales for the optimization in the form {param: scale}
loss_type (Literal['likelihood', 'L2']) – Loss type
opts_mle (dict) – Options for the optimization
n_runs (int) – Number of independent optimization runs out of which the best one is chosen. The first run will use the initial values if specified. Consider increasing this number if the optimization does not produce good results.
fixed_params (Dict[str, Dict[str, float]]) – Dictionary of fixed parameters in the form {type: {param: value}}
shared_params (List[SharedParams]) – List of shared parameters
do_bootstrap (bool) – Whether to perform bootstrapping
n_bootstraps (int) – Number of bootstraps
n_bootstrap_retries (int) – Number of retries for bootstraps that did not terminate normally.
parallelize (bool) – Whether to parallelize computations
folded (bool) – Whether the SFS are folded. If not specified, the SFS will be folded if all of the given SFS appear to be folded.
kwargs – Additional keyword arguments which are ignored.

types: List[str]#: SFS types

params_mle_raw: Optional[Dict[str, Dict[str, float]]]#: original MLE parameters before adding covariates and unpacked shared

shared_params#: Shared parameters with expanded ‘all’ type

covariates: Dict[str, Covariate]#: Covariates indexed by parameter name

fixed_params: Dict[str, Dict[str, float]]#: fixed parameters with expanded ‘all’ type

marginal_inferences: Dict[str, BaseInference]#: Dictionary of marginal inferences indexed by type

scales: Dict[str, Literal['lin', 'log', 'symlog']]#: parameter scales

bounds: Dict[str, Tuple[float, float]]#: parameter bounds

optimization: Optimization#: Joint optimization instance

joint_inferences: Dict[str, BaseInference]#: Joint inference object indexed by type

check_shared_params()[source]#: Check if the shared parameters were specified correctly.

check_no_shared_params_fixed()[source]#: Check that no shared parameters are fixed and raise an error otherwise.

get_shared_param_names()[source]#

Get the names of the shared parameters.

Return type:: List[str]
Returns:: Names of the shared parameters.

run(**kwargs)[source]#

Run inference.

Parameters:: kwargs – Additional keyword arguments
Return type:: Spectrum
Returns:: Modelled SFS.

marginals_without_all()[source]#

Get marginal inference without ‘all’ type.

Return type:: Dict[str, BaseInference]
Returns:: Dict of marginal inferences indexed by type.

joint_inference_makes_sense()[source]#

Check if joint inference makes sense.

Return type:: bool
Returns:: Whether joint inference makes sense.

get_counts()[source]#

Get callback functions for modelling SFS counts from given parameters.

Return type:: dict
Returns:: Dict of callback functions indexed by type.

run_joint_without_covariates(do_bootstrap: bool = True)[source]#

Run joint inference without covariates. Note that the result of this function is cached.

Return type:: JointInference
Returns:: Joint inference instance devoid of covariates.

create_config()[source]#

Create a config object from the inference object.

Return type:: Config
Returns:: Config object.

perform_lrt_covariates(do_bootstrap: bool = True)[source]#

Perform likelihood ratio test against joint inference without covariates. In the simple model we share parameters across types. Low p-values indicate that the covariates provide a significant improvement in the fit.

To access the JointInference object without covariates, you can call run_joint_without_covariates(), which is cached.

Parameters:: do_bootstrap (bool) – Whether to bootstrap. This improves the accuracy of the p-value. Note that if bootstrapping was performed previously without updating the likelihood, this won’t have any effect.
Return type:: float
Returns:: Likelihood ratio test p-value.

bootstrap(n_samples: int = None, parallelize: bool = None, n_retries: int = None, update_likelihood: bool = True, **kwargs)[source]#

Perform the parametric bootstrap both for the marginal and joint inferences.

Parameters:

n_samples (int) – Number of bootstrap samples. Defaults to n_bootstraps.
parallelize (bool) – Whether to parallelize computations. Defaults to parallelize.
n_retries (int) – Number of retries for bootstraps that did not terminate normally. Defaults to n_bootstrap_retries.
update_likelihood (bool) – Whether to update the likelihood

Return type:

Optional[DataFrame]

Returns:

DataFrame with bootstrap samples

bootstrap_if_required()[source]#: Bootstrap if not done yet.

get_x0()[source]#

Get initial values for joint inference.

Return type:: Dict[str, Dict[str, float]]
Returns:: Dictionary of initial values indexed by type

perform_lrt_shared(do_bootstrap: bool = True)[source]#

Compare likelihood of joint inference with product of marginal likelihoods. This provides information about the goodness of fit achieved by the parameter sharing. Low p-values indicate that parameter sharing is not justified, i.e., that the marginal inferences provide a better fit to the data. Note that it is more difficult to properly optimize the joint likelihood, which makes this test conservative, i.e., the reported p-value might be larger than what it really is.

Parameters:: do_bootstrap (bool) – Whether to perform bootstrapping. This improves the accuracy of the p-value. Note that if bootstrapping was performed previously without updating the likelihood, this won’t have any effect.
Return type:: float
Returns:: p-value

get_inferences(types: List[str] = None, labels: List[str] = None, show_marginals: bool = True)[source]#

Get all inference objects as dictionary.

Parameters:

types (List[str]) – Types to include
labels (List[str]) – Labels for types
show_marginals (bool) – Whether to also show marginal inferences

Return type:

Dict[str, BaseInference]

Returns:

Dictionary of base inference objects indexed by type and joint vs marginal subtypes

plot_discretized(file: str = None, show: bool = True, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), show_marginals: bool = True, confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', title: str = 'discretized DFE comparison', labels: List[str] = None, kwargs_legend: dict = {'prop': {'size': 8}}, ax: plt.Axes = None)[source]#

Plot discretized DFE comparing the different types.

Parameters:

labels (List[str]) – Labels for types
title (str) – Title of plot
intervals (ndarray) – Array of interval boundaries over (-inf, inf) yielding intervals.shape[0] - 1 bars.
show_marginals (bool) – Whether to also show marginal inferences
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap
ci_level (float) – Confidence level
confidence_intervals (bool) – Whether to plot confidence intervals
file (str) – File to save plot to
show (bool) – Whether to show plot
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_sfs_comparison(sfs_types: List[Literal['modelled', 'observed', 'selected', 'neutral']] = ['observed', 'modelled'], types: List[str] = None, labels: List[str] = None, colors: List[str] = None, file: str = None, show: bool = True, ax: plt.Axes = None, title: str = 'SFS comparison', use_subplots: bool = False, show_monomorphic: bool = False, kwargs_legend: dict = {'prop': {'size': 8}})[source]#

Plot SFS comparison.

Parameters:

types (List[str]) – Types to plot
file (str) – File to save plot to
labels (List[str]) – Labels for types.
colors (List[str]) – Colors for types. Only for Python visualization backend.
sfs_types (List[Literal['modelled', 'observed', 'selected', 'neutral']]) – Types of SFS to plot
show (bool) – Whether to show plot
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend and if use_subplots is False.
title (str) – Plot title
use_subplots (bool) – Whether to use subplots. Only for Python visualization backend.
show_monomorphic (bool) – Whether to show monomorphic counts
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_continuous(file: str = None, show: bool = True, intervals: ndarray = None, confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', title: str = 'DFE comparison', labels: List[str] = None, scale_density: bool = False, scale: Literal['lin', 'log', 'symlog'] = 'lin', kwargs_legend: dict = {'prop': {'size': 8}}, ax: plt.Axes = None)[source]#

Plot discretized DFE. The special constants np.inf and -np.inf are also valid interval bounds. By default, the PDF is plotted as is. Due to the logarithmic scale on the x-axis, we may get a wrong intuition on how the mass is distributed, however. To get a better intuition, we can optionally scale the density by the x-axis interval size using scale_density = True. This has the disadvantage that the density now changes for x, so that even a constant density will look warped.

Parameters:

scale_density (bool) – Whether to scale the density by the x-axis interval size
scale (Literal['lin', 'log', 'symlog']) – y-scale
labels (List[str]) – Labels for types
title (str) – Title of plot
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap
ci_level (float) – Confidence level
confidence_intervals (bool) – Whether to plot confidence intervals
file (str) – File to save plot to
show (bool) – Whether to show plot
intervals (ndarray) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_inferred_parameters(file: str = None, confidence_intervals: bool = True, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', ci_level: float = 0.05, show: bool = True, title: str = 'inferred parameters', labels: List[str] = None, ax: plt.Axes = None, scale: Literal['lin', 'log', 'symlog'] = 'log', kwargs_legend: dict = {'loc': 'upper right', 'prop': {'size': 8}}, **kwargs: List[str])[source]#

Plot discretized DFE comparing the different types.

Parameters:

labels (List[str]) – Labels for types
title (str) – Title of plot
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap
ci_level (float) – Confidence level
confidence_intervals (bool) – Whether to plot confidence intervals
file (str) – File to save plot to
show (bool) – Whether to show plot
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
scale (Literal['lin', 'log', 'symlog']) – y-scale
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_inferred_parameters_boxplot(file: str = None, show: bool = True, title: str = 'inferred parameters', labels: List[str] = None, **kwargs: List[str])[source]#

Plot discretized DFE comparing the different types.

Parameters:

labels (List[str]) – Labels for types
title (str) – Title of plot
file (str) – File to save plot to
show (bool) – Whether to show plot

Return type:

plt.Axes

Returns:

Axes object

Raises:

ValueError – If no inference objects are given or no bootstraps are found.

plot_covariate(index: int = 0, file: str = None, show: bool = True, title: str = None, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', show_types: bool = True, ci_level: float = 0.05, xlabel: str = 'cov', ylabel: str = None, ax: plt.Axes = None)[source]#

Plot the covariate given by the index.

Parameters:

index (int) – The index of the covariate.
file (str) – File to save plot to.
show (bool) – Whether to show plot.
title (str) – Plot title.
bootstrap_type (Literal['percentile', 'bca']) – Bootstrap type.
show_types (bool) – Whether to show types on second x-axis.
ci_level (float) – Confidence level.
xlabel (str) – X-axis label.
ylabel (str) – Y-axis label, defaults to the covariate parameter name.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object.

get_cis_params_mle(bootstrap_type: Literal['percentile', 'bca'] = 'percentile', ci_level: float = 0.05, param_names: list[str] | None = None)[source]#

Get confidence intervals for the parameters.

Returns:: Confidence intervals for the parameters

Get discretized DFEs.

Parameters:

bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap
ci_level (float) – Confidence interval level
confidence_intervals (bool) – Whether to return confidence intervals
intervals (ndarray) – Array of interval boundaries over (-inf, inf) yielding intervals.shape[0] - 1 bins.

Return type:

Dict[str, Tuple[ndarray, ndarray]]

Returns:

Dictionary of array of values and array of errors indexed by inference type

get_bootstrap_params()[source]#

Get bootstrap parameters.

Return type:: Dict[str, float]
Returns:: Bootstrap parameters

to_json()[source]#

Serialize object. Note that the deserialized inference objects no longer share the same optimization instance among other things.

Return type:: str
Returns:: JSON string

property alpha: float | None#

The is no single alpha for the joint inference. Please refer to the self.joint_inferences[t].alpha.

Returns:: None

get_residual(k: int)[source]#

Residual of joint inference. We calculate the residual over the jointly inferred SFS for all types.

Parameters:: k (int) – Order of the norm
Return type:: float
Returns:: L2 residual

check_fixed_params_exist()#: Check that the fixed parameters are valid.

compare_nested(complex: BaseInference)#

Perform likelihood ratio test with given more complex model. The given model’s fixed parameters need to be a proper subset of this model’s fixed parameters.

Parameters:: complex (BaseInference) – More complex model.
Return type:: float | None
Returns:: p-value

compare_nested_models(do_bootstrap: bool = True)#

Compare the various nested versions of the specified model using likelihood ratio tests.

Parameters:: do_bootstrap (bool) – Whether to perform bootstrapping. This is recommended to get more accurate p-values.
Return type:: (ndarray, Dict[str, BaseInference])
Returns:: Matrix of p-values, dict of base inference objects

evaluate_likelihood(params: Dict[str, Dict[str, float]])#

Evaluate likelihood function at given parameters.

Parameters:: params (Dict[str, Dict[str, float]]) – Parameters indexed by type and parameter name.
Return type:: float
Returns:: The likelihood.

classmethod from_config(config: Config)#

Load from config object.

Parameters:: config (Config) – Config object.
Return type:: Self
Returns:: Inference object.

classmethod from_config_file(file: str, cache: bool = True)#

Load from config file.

Parameters:

file (str) – Config file path, possibly URL.
cache (bool) – Whether to use the cache if available.

Return type:

Self

Returns:

Inference object.

classmethod from_file(file: str, classes=None)#

Load object from file.

Parameters:

classes – Classes to be used for unserialization
file (str) – File to load from

Return type:

Self

classmethod from_json(json: str, classes=None)#

Unserialize object.

Parameters:

classes – Classes to be used for unserialization
json (str) – JSON string

Return type:

Self

get_alpha(params: dict = None)#

Get alpha, the proportion of beneficial non-synonymous substitutions.

Parameters:: params (dict) – Parameters to use for calculation.
Return type:: float

get_bootstrap_param_names()#

Get the parameters to be included in the bootstraps.

Return type:: List[str]
Returns:: Parameters to be included in the bootstraps.

get_bounds_linear()#

Get linear bounds for all parameters. We do this for the bootstraps as x0 should be close to MLE.

Return type:: Dict[str, Tuple[float, float]]
Returns:: Dictionary of bounds.

get_errors_discretized_dfe(ci_level: float = 0.05, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), bootstrap_type: Literal['percentile', 'bca'] = 'percentile')#

Get the discretized DFE errors.

Parameters:

ci_level (float) – Confidence interval level.
intervals (ndarray) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use.

Return type:

ndarray

Returns:

Arrays of errors, confidence intervals, bootstraps, means and values

get_fixed_param_names()#

Get the names of the fixed parameters.

Return type:: List[str]

get_n_optimized()#

Get the number of parameters that are optimized.

Return type:: int
Returns:: Number of parameters that are optimized.

get_optimized_param_names()#

Get the parameters names for the parameters that are optimized.

Return type:: List[str]
Returns:: List of parameter names.

get_scales_linear()#

Get linear scales for all parameters. We do this for the bootstraps as x0 should be close to MLE.

Return type:: Dict[str, Literal['lin']]
Returns:: Dictionary of scales.

get_summary()#

Get summary.

Return type:: InferenceResults
Returns:: Inference results.

lrt(ll_simple: float, ll_complex: float, df: int = 1)#

Perform the likelihood ratio test (LRT).

Note

We do not adjust the test for parameters that are on the boundary of the parameter space.

Parameters:

ll_simple (float) – Log-likelihood of the simple model.
ll_complex (float) – Log-likelihood of the complex model.
df (int) – Degrees of freedom.

Return type:

float

Returns:

p-value

property param_names: List[str]#

Parameter names.

Returns:: List of parameter names.

plot_all(show: bool = True)#

Plot a bunch of plots.

Parameters:: show (bool) – Whether to show plot.

plot_bucket_sizes(file: str = None, show: bool = True, title: str = 'bucket sizes', ax: plt.Axes = None)#

Plot mass in each bucket for the MLE DFE. This can be used to check if the interval bounds and spacing are chosen appropriately.

Parameters:

file (str) – File to save plot to.
show (bool) – Whether to show plot.
title (str) – Plot title.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

Plot density of the discretization intervals chosen. Note that although this plot looks similar, this is not the DFE!

Parameters:

color (str) – Color of the bars.
file (str) – File to save plot to.
show (bool) – Whether to show plot.
interval_labels (List[str]) – Labels for the intervals.
intervals (Sequence) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

plot_likelihoods(file: str = None, show: bool = True, title: str = 'likelihoods', scale: Literal['lin', 'log'] = 'lin', ax: plt.Axes = None, ylabel: str = 'lnl', **kwargs)#

Visualize the likelihoods of the optimization runs using a scatter plot.

Parameters:

scale (Literal['lin', 'log']) – y-scale of the plot.
title (str) – Plot title.
file (str) – File to save plot to.
show (bool) – Whether to show plot.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
ylabel (str) – Label for the y-axis.

Return type:

plt.Axes

Returns:

Axes object

Plot the p-values of nested likelihoods.

Parameters:

file (str) – File to save plot to.
show (bool) – Whether to show plot.
remove_empty (bool) – Whether to remove empty rows and columns.
transpose (bool) – Whether to transpose the matrix.
cmap (str) – Colormap to use.
title (str) – Plot title.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
do_bootstrap (bool) – Whether to perform bootstrapping. This is recommended to get more accurate p-values.

Return type:

plt.Axes

Returns:

Axes object

plot_observed_sfs(labels: List[str] = None, file: str = None, show: bool = True, ax: plt.Axes = None)#

Plot neutral and selected SFS.

Parameters:

file (str) – File to save plot to.
labels (List[str]) – Labels for the SFS.
show (bool) – Whether to show plot.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes object

raise_if_locked()#

Raise an error if this object is locked.

Raises:: Exception –

run_if_required(*args, **kwargs)#

Run if not run yet.

Parameters:

args – Arguments.
kwargs – Keyword arguments.

Return type:

Optional[Spectrum]

Returns:

DFE parametrization and modelled SFS.

to_file(file: str)#

Save object to file.

Parameters:: file (str) – File to save to

sfs_neut: Spectrum#: Neutral SFS

sfs_sel: Spectrum#: Selected SFS

model: Parametrization#: The DFE parametrization

folded: bool#: Whether the SFS are folded

n: int#: Sample size

discretization: Discretization#: Discretization instance

theta: float#: Estimate of theta from neutral SFS

params_mle: Optional[Dict[str, float]]#: MLE estimates of the initial optimization

sfs_mle: Optional[Spectrum]#: Modelled MLE SFS

likelihood: Optional[float]#: Likelihood of the MLE, this value may be updated after bootstrapping

likelihoods: Optional[List[float]]#: Likelihoods of the different ML runs, controlled by n_runs

n_runs: int#: Number of MLE runs to perform

result: Optional['scipy.optimize.OptimizeResult']#: Numerical optimization result

do_bootstrap: bool#: Whether to do bootstrapping

n_bootstraps: int#: Number of bootstraps

n_bootstrap_retries: int#: Number of retries for bootstraps that did not terminate normally

parallelize: bool#: Whether to parallelize computations

x0: Dict[str, Dict[str, float]]#: Initial values

bootstraps: Optional[pd.DataFrame]#: Bootstrapped MLE parameter estimates

bootstrap_results: Optional[List['scipy.optimize.OptimizeResult']]#: Bootstrap optimization results

L2_residual: Optional[float]#: L2 norm of differential between modelled and observed SFS

seed: int | None#: Random number generator seed

rng: np.random.Generator#: Random generator instance

execution_time: float#: Total execution time in seconds

locked: bool#: Whether inferences can be run from the class itself

class Inference[source]#

Bases: object

Static utility methods for inference objects.

static plot_discretized(inferences: List[AbstractInference], intervals: Sequence = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', file: str = None, show: bool = True, title: str = 'discretized DFEs', labels: Sequence = None, ax: plt.Axes = None, kwargs_legend: dict = {'prop': {'size': 8}}, **kwargs)[source]#

Visualize several discretized DFEs given by the list of inference objects.

Parameters:

inferences (List[AbstractInference]) – List of inference objects.
intervals (Sequence) – Intervals over (-inf, inf) to use for discretization.
confidence_intervals (bool) – Whether to plot confidence intervals.
ci_level (float) – Confidence level for confidence intervals.
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use for confidence intervals.
file (str) – Path to file to save the plot to.
show (bool) – Whether to show the plot.
title (str) – Title of the plot.
labels (Sequence) – Labels for the DFEs.
kwargs – Additional arguments for the plot.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes of the plot.

static plot_continuous(inferences: List[AbstractInference], intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', file: str = None, show: bool = True, title: str = 'continuous DFEs', labels: Sequence = None, scale: Literal['lin', 'log', 'symlog'] = 'lin', scale_density: bool = False, ax: plt.Axes = None, kwargs_legend: dict = {'prop': {'size': 8}}, **kwargs)[source]#

Visualize several DFEs given by the list of inference objects. By default, the PDF is plotted as is. Due to the logarithmic scale on the x-axis, we may get a wrong intuition on how the mass is distributed, however. To get a better intuition, we can optionally scale the density by the x-axis interval size using scale_density = True. This has the disadvantage that the density now changes for x, so that even a constant density will look warped.

Parameters:

inferences (List[AbstractInference]) – List of inference objects.
intervals (ndarray) – Intervals to use for discretization.
confidence_intervals (bool) – Whether to plot confidence intervals.
ci_level (float) – Confidence level for confidence intervals.
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use for confidence intervals.
file (str) – Path to file to save the plot to.
show (bool) – Whether to show the plot.
title (str) – Title of the plot.
labels (Sequence) – Labels for the DFEs.
scale (Literal['lin', 'log', 'symlog']) – y-scale of the plot.
scale_density (bool) – Whether to scale the density by the x-axis interval size.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.
kwargs – Additional arguments for the plot.

Return type:

plt.Axes

Returns:

Axes of the plot.

static plot_inferred_parameters(inferences: List[AbstractInference], labels: Sequence, confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', file: str = None, show: bool = True, title: str = 'parameter estimates', scale: Literal['lin', 'log', 'symlog'] = 'log', ax: plt.Axes = None, kwargs_legend: dict = {'loc': 'upper right', 'prop': {'size': 8}}, **kwargs)[source]#

Visualize several discretized DFEs given by the list of inference objects. Note that the DFE parametrization needs to be the same for all inference objects.

Parameters:

inferences (List[AbstractInference]) – List of inference objects.
labels (Sequence) – Unique labels for the DFEs.
scale (Literal['lin', 'log', 'symlog']) – y-scale of the plot.
confidence_intervals (bool) – Whether to plot confidence intervals.
ci_level (float) – Confidence level for confidence intervals.
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use for confidence intervals.
file (str) – Path to file to save the plot to.
show (bool) – Whether to show the plot.
title (str) – Title of the plot.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.
kwargs – Additional arguments which are ignored.

Return type:

plt.Axes

Returns:

Axes of the plot.

Raises:

ValueError – If no inference objects are given.

static plot_inferred_parameters_boxplot(inferences: List[AbstractInference], labels: Sequence, file: str = None, show: bool = True, title: str = 'parameter estimates', **kwargs)[source]#

Visualize several discretized DFEs given by the list of inference objects. Note that the DFE parametrization needs to be the same for all inference objects.

Parameters:

inferences (List[AbstractInference]) – List of inference objects.
labels (Sequence) – Unique labels for the DFEs.
file (str) – Path to file to save the plot to.
show (bool) – Whether to show the plot.
title (str) – Title of the plot.
kwargs – Additional arguments for the plot.

Return type:

plt.Axes

Returns:

Axes of the plot.

Raises:

ValueError – If no inference objects are given or no bootstraps are found.

static get_cis_params_mle(bootstrap_type: Literal['percentile', 'bca'], ci_level: float, inferences: List[AbstractInference], labels: Sequence, param_names: Sequence)[source]#

Get confidence intervals for the MLE parameters.

Parameters:

bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use for confidence intervals.
ci_level (float) – Confidence level for confidence intervals.
inferences (List[AbstractInference]) – List of inference objects.
labels (Sequence) – Labels for the DFEs.
param_names (Sequence) – Names of the parameters.

Return type:

dict[str, Optional[Dict[str, Tuple[float, float]]]]

Returns:

Dictionary of confidence intervals indexed by labels.

static get_errors_params_mle(bootstrap_type: Literal['percentile', 'bca'], ci_level: float, confidence_intervals: bool, inferences: List[AbstractInference], labels: Sequence, param_names: Sequence)[source]#

Get errors and values for MLE params of inferences.

Parameters:

bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use.
ci_level (float) – Confidence level for confidence intervals.
confidence_intervals (bool) – Whether to compute confidence intervals.
inferences (List[AbstractInference]) – List of inference objects.
labels (Sequence) – Labels for the inferences.
param_names (Sequence) – Names of the parameters to get errors and values for.

Return type:

(Dict[str, Optional[Tuple[ndarray, ndarray]]], Dict[str, ndarray])

Returns:

dictionary of errors and dictionary of values

static get_discretized(inferences: List[AbstractInference], labels: Sequence, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile')[source]#

Get values and errors of discretized DFE.

Parameters:

inferences (List[AbstractInference]) – List of inference objects.
labels (Sequence) – Labels for the DFEs.
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap to use
ci_level (float) – Confidence interval level
confidence_intervals (bool) – Whether to compute confidence intervals
intervals (ndarray) – Array of interval boundaries over (-inf, inf) yielding intervals.shape[0] - 1 bars.

Return type:

(ndarray, Optional[ndarray])

Returns:

Array of values and array of errors

static get_stats_discretized(params: dict, bootstraps: DataFrame, model: Parametrization | str, ci_level: float = 0.05, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), bootstrap_type: Literal['percentile', 'bca'] = 'percentile')[source]#

Compute errors and confidence interval for a discretized DFE.

Parameters:

params (dict) – Parameters of the model
bootstraps (DataFrame) – Bootstrapped samples
model (Parametrization | str) – DFE parametrization
ci_level (float) – Confidence interval level
intervals (ndarray) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.
bootstrap_type (Literal['percentile', 'bca']) – Type of bootstrap

Return type:

(ndarray, ndarray, ndarray, ndarray, ndarray)

Returns:

Arrays of errors, confidence intervals, bootstraps, means and values

static compute_histogram(model: Parametrization | str, params: dict, intervals: ndarray)[source]#

Discretize the DFE given a DFE parametrization and its parameter values.

Parameters:

model (Parametrization | str) – DFE parametrization
params (dict) – Parameters of the model
intervals (ndarray) – Array of interval boundaries yielding intervals.shape[0] - 1 bins.

Return type:

ndarray

Returns:

Discretized DFE

class SharedParams(params: List[str] | Literal['all'] = 'all', types: List[str] | Literal['all'] = 'all')[source]#

Bases: object

Class specifying the sharing of params among types. all means all available types or params.

Example usage:

import fastdfe as fd

# neutral SFS for two types
sfs_neut = fd.Spectra(dict(
    pendula=[177130, 997, 441, 228, 156, 117, 114, 83, 105, 109, 652],
    pubescens=[172528, 3612, 1359, 790, 584, 427, 325, 234, 166, 76, 31]
))

# selected SFS for two types
sfs_sel = fd.Spectra(dict(
    pendula=[797939, 1329, 499, 265, 162, 104, 117, 90, 94, 119, 794],
    pubescens=[791106, 5326, 1741, 1005, 756, 546, 416, 294, 177, 104, 41]
))

# create inference object
inf = fd.JointInference(
    sfs_neut=sfs_neut,
    sfs_sel=sfs_sel,
    shared_params=[fd.SharedParams(types=["pendula", "pubescens"], params=["eps", "S_d"])],
    do_bootstrap=True
)

# run inference
inf.run()

params: Union[List[str], Literal['all']] = 'all'#: The params to share

types: Union[List[str], Literal['all']] = 'all'#: The types to share

__init__(params: List[str] | Literal['all'] = 'all', types: List[str] | Literal['all'] = 'all')#

class Covariate(param: str, values: Dict[str, float], callback: Callable | None = None, bounds: tuple = (0.0001, 10000.0), x0: float = 0, bounds_scale: Literal['lin', 'log', 'symlog'] = 'symlog')[source]#

Bases: object

Class defining a covariate which induces a relationship with one or many parameters. The relationship is defined by a callback function which modifies the parameters. The default callback introduces a linear relationship.

Below an example of introducing linear covariates for S_d, the for mean strength of deleterious mutations (cf. GammaExpParametrization). Each of the three types is associated with one covariate. This we pass to JointInference together with the stratified spectra:

import fastdfe as fd

cov = fd.Covariate(
    param='S_d',
    values=dict(type1=5, type2=3, type3=1)
)

param: str#: The parameter to modify

values: Dict[str, float]#: The values of the covariate for each type

callback: Optional[Callable] = None#: The callback function to modify the parameters

bounds: tuple = (0.0001, 10000.0)#: The bounds of the covariate parameter to be estimated

x0: float = 0#: The initial value of the covariate

bounds_scale: Literal['lin', 'log', 'symlog'] = 'symlog'#: The scale of the bounds. See scale_value() for details

apply(covariate: float, type: str, params: Dict[str, float])[source]#

Apply the custom or default callback to modify the given parameters.

Parameters:

covariate (float) – The value of the covariate.
type (str) – The type of the relationship.
params (Dict[str, float]) – The input parameters.

Return type:

Dict[str, float]

Returns:

Modified parameters.

apply_default(covariate: float, type: str, params: Dict[str, float])[source]#

Modify the given parameters introducing a linear relationship with the given covariate.

Parameters:

covariate (float) – The value of the covariate.
type (str) – The type of the relationship.
params (Dict[str, float]) – The input parameters.

Return type:

Dict[str, float]

Returns:

Modified parameters.

__init__(param: str, values: Dict[str, float], callback: Callable | None = None, bounds: tuple = (0.0001, 10000.0), x0: float = 0, bounds_scale: Literal['lin', 'log', 'symlog'] = 'symlog')#

Inference

Contents

Inference#