DFE Inference#
- class BaseInference(sfs_neut: Spectra | Spectrum, sfs_sel: Spectra | Spectrum, intervals_del: Tuple[float, float, int] = (-100000000.0, -1e-05, 1000), intervals_ben: Tuple[float, float, int] = (1e-05, 10000.0, 1000), intervals_h: Tuple[float, float, int] = (0.0, 1.0, 21), h_callback: Callable[[ndarray], ndarray] = None, integration_mode: Literal['midpoint', 'quad'] = 'midpoint', linearized: bool = True, model: Parametrization | str = 'GammaExpParametrization', seed: int = 0, x0: Dict[str, Dict[str, float]] = {}, bounds: Dict[str, Tuple[float, float]] = {}, scales: Dict[str, Literal['lin', 'log', 'symlog']] = {}, loss_type: Literal['likelihood', 'L2'] = 'likelihood', opts_mle: dict = {}, method_mle: str = 'L-BFGS-B', n_runs: int = 10, fixed_params: Dict[str, Dict[str, float]] = None, do_bootstrap: bool = True, n_bootstraps: int = 100, n_bootstrap_retries: int = 2, parallelize: bool = True, folded: bool = None, discretization: Discretization = None, optimization: Optimization = None, locked: bool = False, **kwargs)[source]#
Bases:
AbstractInferenceBase inference class for inferring the SFS given one neutral and one selected SFS. Note that BaseInference is by default seeded.
Example usage:
import fastdfe as fd sfs_neut = fd.Spectrum([177130, 997, 441, 228, 156, 117, 114, 83, 105, 109, 652]) sfs_sel = fd.Spectrum([797939, 1329, 499, 265, 162, 104, 117, 90, 94, 119, 794]) # create inference object inf = fd.BaseInference( sfs_neut=sfs_neut, sfs_sel=sfs_sel, n_runs=10, n_bootstraps=100, do_bootstrap=True ) # run inference inf.run() # plot discretized DFE inf.plot_discretized() # plot SFS comparison inf.plot_sfs_comparison()
- __init__(sfs_neut: Spectra | Spectrum, sfs_sel: Spectra | Spectrum, intervals_del: Tuple[float, float, int] = (-100000000.0, -1e-05, 1000), intervals_ben: Tuple[float, float, int] = (1e-05, 10000.0, 1000), intervals_h: Tuple[float, float, int] = (0.0, 1.0, 21), h_callback: Callable[[ndarray], ndarray] = None, integration_mode: Literal['midpoint', 'quad'] = 'midpoint', linearized: bool = True, model: Parametrization | str = 'GammaExpParametrization', seed: int = 0, x0: Dict[str, Dict[str, float]] = {}, bounds: Dict[str, Tuple[float, float]] = {}, scales: Dict[str, Literal['lin', 'log', 'symlog']] = {}, loss_type: Literal['likelihood', 'L2'] = 'likelihood', opts_mle: dict = {}, method_mle: str = 'L-BFGS-B', n_runs: int = 10, fixed_params: Dict[str, Dict[str, float]] = None, do_bootstrap: bool = True, n_bootstraps: int = 100, n_bootstrap_retries: int = 2, parallelize: bool = True, folded: bool = None, discretization: Discretization = None, optimization: Optimization = None, locked: bool = False, **kwargs)[source]#
Create BaseInference instance.
- Parameters:
sfs_neut (
Spectra|Spectrum) – The neutral SFS. Note that we require monomorphic counts to be specified in order to infer the mutation rate. If aSpectraobject with more than one SFS is provided, theallattribute will be used.sfs_sel (
Spectra|Spectrum) – The selected SFS. Note that we require monomorphic counts to be specified in order to infer the mutation rate. If aSpectraobject with more than one SFS is provided, theallattribute will be used.intervals_del (
Tuple[float,float,int]) –(start, stop, n_interval)for deleterious population-scaled selection coefficients. The intervals will be log10-spaced. Decreasing the number of intervals to100provides nearly identical results while increasing speed, especially when precomputing across dominance coefficients.intervals_ben (
Tuple[float,float,int]) – Same asintervals_delbut for positive selection coefficients. Decreasing the number of intervals to100provides nearly identical results while increasing speed, especially when precomputing across dominance coefficients.intervals_h (
Tuple[float,float,int]) –(start, stop, n_interval)for dominance coefficients which are linearly spaced. This is only used when inferring dominance coefficients. Values ofhbetween the edges will be interpolated linearly.h_callback (
Callable[[ndarray],ndarray]) – A function mapping the scalar parameterhand the array of selection coefficientsSto dominance coefficients of the same shape, allowing models wherehdepends onS. The default islambda h, S: np.full_like(S, h), keepinghconstant. Expected allele counts for a given dominance value are obtained by linear interpolation between precomputed values inintervals_h. The inferred parameter is still namedh, even if transformed byh_callback, and its bounds, scales, and initial values can be set viabounds,scales, andx0. The fitness of heterozygotes and mutation homozygotes is defined as1 + 2hsand1 + 2s, respectively.integration_mode (
Literal['midpoint','quad']) – Integration mode when computing expected SFS under semidominance.quadis not recommended.linearized (
bool) – Whether to discretize and cache the linearized integral mapping DFE to SFS or usescipy.integrate.quadin each call.Falsenot recommended.model (
Parametrization|str) – Instance of DFEParametrization which parametrized the DFEseed (
int) – Seed for the random number generator. UseNonefor no seed.x0 (
Dict[str,Dict[str,float]]) – Dictionary of initial values in the form{'all': {param: value}}bounds (
Dict[str,Tuple[float,float]]) – Bounds for the optimization in the form{param: (lower, upper)}scales (
Dict[str,Literal['lin','log','symlog']]) – Scales for the optimization in the form{param: scale}loss_type (
Literal['likelihood','L2']) – Type of loss function to use for optimization.opts_mle (
dict) – Options for the optimization.method_mle (
str) – Method to use for optimization. Seescipy.optimize.minimizefor available methods.n_runs (
int) – Number of independent optimization runs out of which the best one is chosen. The first run will use the initial values if specified. Consider increasing this number if the optimization does not produce good results.fixed_params (
Dict[str,Dict[str,float]]) – Parameters kept constant during optimization, given as{'all': {param: value}}. By default, the ancestral misidentification rateepsis fixed to zero,hto 0.5, and the DFE parameters are fixed so that only the deleterious component of the DFE is estimated.do_bootstrap (
bool) – Whether to do bootstrapping.n_bootstraps (
int) – Number of bootstraps.n_bootstrap_retries (
int) – Number of optimization runs for each bootstrap sample. This parameter previously defined the number of retries per bootstrap sample when subsequent runs failed, but now it defines the total number of runs per bootstrap sample, taking the most likely one.pbar – Whether to show a progress bar.
parallelize (
bool) – Whether to parallelize computations.folded (
bool) – Whether the SFS are folded. If not specified, the SFS will be folded if both of the given SFS appear to be folded.discretization (
Discretization) – Discretization instance. Mainly intended for internal use.optimization (
Optimization) – Optimization instance. Mainly intended for internal use.locked (
bool) – Whether inferences can be run from the class itself. Used internally.kwargs – Additional keyword arguments which are ignored.
- sfs_neut: Spectrum#
Neutral SFS
- sfs_sel: Spectrum#
Selected SFS
- model: Parametrization#
The DFE parametrization
- folded: bool#
Whether the SFS are folded
- n: int#
Sample size
- fixed_params: Dict[str, Dict[str, float]]#
Fixed parameters
- discretization: Discretization#
Discretization instance
- theta: float#
Estimate of theta from neutral SFS
- params_mle: Optional[Dict[str, float]]#
MLE estimates of the initial optimization
- sfs_mle: Optional[Spectrum]#
Modelled MLE SFS
- likelihood: Optional[float]#
Likelihood of the MLE, this value may be updated after bootstrapping
- likelihoods: Optional[List[float]]#
Likelihoods of the different ML runs, controlled by
n_runs
- n_runs: int#
Number of MLE runs to perform
-
runs:
Optional[DataFrame] = None# Dataframe holding information on all optimization runs
- result: Optional['scipy.optimize.OptimizeResult']#
Numerical optimization result
- do_bootstrap: bool#
Whether to do bootstrapping
- n_bootstraps: int#
Number of bootstraps
- n_bootstrap_retries: int#
Number of optimization runs for each bootstrap sample
- parallelize: bool#
Whether to parallelize computations
- scales: Dict[str, Literal['lin', 'log', 'symlog']]#
Parameter scales
- bounds: Dict[str, Tuple[float, float]]#
Parameter bounds
- optimization: Optimization#
Optimization instance
- x0: Dict[str, Dict[str, float]]#
Initial values
- bootstraps: Optional[pd.DataFrame]#
Bootstrapped MLE parameter estimates
- bootstrap_results: Optional[List['scipy.optimize.OptimizeResult']]#
Bootstrap optimization results
- L2_residual: Optional[float]#
L2 norm of differential between modelled and observed SFS
- seed: int | None#
Random number generator seed
- rng: np.random.Generator#
Random generator instance
- execution_time: float#
Total execution time in seconds
- locked: bool#
Whether inferences can be run from the class itself
- run_if_required(*args, **kwargs)[source]#
Run if not run yet.
- Parameters:
args – Arguments.
kwargs – Keyword arguments.
- Return type:
Optional[Spectrum]- Returns:
DFE parametrization and modelled SFS.
- run(do_bootstrap: bool = None, pbar: bool = None, **kwargs)[source]#
Perform the DFE inference.
- Parameters:
pbar (
bool) – Whether to show a progress bar.do_bootstrap (
bool) – Whether to perform bootstrapping.kwargs – Additional keyword arguments which are ignored.
- Return type:
- Returns:
Modelled SFS.
- get_counts()[source]#
Get callback functions for modelling SFS counts from given parameters.
- Return type:
dict- Returns:
Callback functions for modelling SFS counts for each type.
- evaluate_likelihood(params: Dict[str, Dict[str, float]])[source]#
Evaluate likelihood function at given parameters.
- Parameters:
params (
Dict[str,Dict[str,float]]) – Parameters indexed by type and parameter name.- Return type:
float- Returns:
The likelihood.
- get_residual(k: int)[source]#
Residual of the modelled and observed SFS.
- Parameters:
k (
int) – Order of the norm- Return type:
float- Returns:
L2 residual
- bootstrap(n_samples: int = None, parallelize: bool = None, update_likelihood: bool = False, n_retries: int = None, pbar: bool = True, pbar_title: str = 'Bootstrapping')[source]#
Perform the parametric bootstrap.
- Parameters:
n_samples (
int) – Number of bootstrap samples. Defaults ton_bootstraps.parallelize (
bool) – Whether to parallelize the bootstrap. Defaults toparallelize.update_likelihood (
bool) – Whether to update the likelihood to be the mean of the bootstrap samples and the original likelihood. This is not statistically valid but can stabilize maximum likelihood estimate subject optimization noise.n_retries (
int) – Number of optimization runs for each bootstrap sample. Defaults ton_bootstrap_retries.pbar (
bool) – Whether to show a progress bar.pbar_title (
str) – Title for the progress bar.
- Return type:
DataFrame- Returns:
Dataframe with bootstrap results.
- get_scales_linear()[source]#
Get linear scales for all parameters. We do this for the bootstraps as x0 should be close to MLE.
- Return type:
Dict[str,Literal['lin']]- Returns:
Dictionary of scales.
- get_bounds_linear()[source]#
Get linear bounds for all parameters. We do this for the bootstraps as x0 should be close to MLE.
- Return type:
Dict[str,Tuple[float,float]]- Returns:
Dictionary of bounds.
- plot_continuous(file: str = None, show: bool = True, intervals: ndarray = None, confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', scale_density: bool = False, scale: Literal['lin', 'log', 'symlog'] = 'lin', title: str = 'DFE', kwargs_legend: dict = {'prop': {'size': 8}}, ax: plt.Axes = None)[source]#
Plot continuous DFE. The special constants np.inf and -np.inf are also valid interval bounds. By default, the PDF is plotted as is. Due to the logarithmic scale on the x-axis, we may get a wrong intuition on how the mass is distributed, however. To get a better intuition, we can optionally scale the density by the x-axis interval size using
scale_density = True. This has the disadvantage that the density now changes for x, so that even a constant density will look warped.- Parameters:
title (
str) – Plot title.bootstrap_type (
Literal['percentile','bca']) – Type of bootstrap to use for confidence intervals.ci_level (
float) – Confidence level for confidence intervals.confidence_intervals (
bool) – Whether to plot confidence intervals.file (
str) – File to save plot to.show (
bool) – Whether to show plot.scale (
Literal['lin','log','symlog']) – y-scalescale_density (
bool) – Whether to scale the density by the x-axis interval sizeintervals (
ndarray) – Array of interval boundaries yieldingintervals.shape[0] - 1bins.kwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_bucket_sizes(file: str = None, show: bool = True, title: str = 'bucket sizes', ax: plt.Axes = None)[source]#
Plot mass in each bucket for the MLE DFE. This can be used to check if the interval bounds and spacing are chosen appropriately.
- Parameters:
file (
str) – File to save plot to.show (
bool) – Whether to show plot.title (
str) – Plot title.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_interval_density(file: str = None, show: bool = True, intervals: Sequence = array([-inf, -100., -10., -1., 0., 1., inf]), interval_labels: List[str] = None, color: str = 'C0', ax: plt.Axes = None)[source]#
Plot density of the discretization intervals chosen. Note that although this plot looks similar, this is not the DFE!
- Parameters:
color (
str) – Color of the bars.file (
str) – File to save plot to.show (
bool) – Whether to show plot.interval_labels (
List[str]) – Labels for the intervals.intervals (
Sequence) – Array of interval boundaries yieldingintervals.shape[0] - 1bins.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_sfs_comparison(sfs_types: List[Literal['modelled', 'observed', 'selected', 'neutral']] = ['observed', 'modelled'], labels: List[str] = None, colors: List[str] = None, file: str = None, show: bool = True, ax: plt.Axes = None, title: str = 'SFS comparison', use_subplots: bool = False, show_monomorphic: bool = False, kwargs_legend: dict = {'prop': {'size': 8}})[source]#
Plot SFS comparison.
- Parameters:
file (
str) – File to save plot to.labels (
List[str]) – Labels for the SFS.colors (
List[str]) – Colors for the SFS. Only for Python visualization backend.sfs_types (
List[Literal['modelled','observed','selected','neutral']]) – Types of SFS to plot.show (
bool) – Whether to show plot.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend and if
use_subplotsisFalse.title (
str) – Plot titleuse_subplots (
bool) – Whether to use subplots. Only for Python visualization backend.show_monomorphic (
bool) – Whether to show monomorphic countskwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_observed_sfs(labels: List[str] = None, file: str = None, show: bool = True, ax: plt.Axes = None)[source]#
Plot neutral and selected SFS.
- Parameters:
file (
str) – File to save plot to.labels (
List[str]) – Labels for the SFS.show (
bool) – Whether to show plot.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_all(show: bool = True)[source]#
Plot a bunch of plots.
- Parameters:
show (
bool) – Whether to show plot.
- get_stats_discretized(ci_level: float = 0.05, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean')[source]#
Get statistics for the discretized DFE.
- Parameters:
ci_level (
float) – Confidence interval level.intervals (
ndarray) – Array of interval boundaries yieldingintervals.shape[0] - 1bins.bootstrap_type (
Literal['percentile','bca']) – Type of bootstrap to use.point_estimate (
Literal['original','mean','median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.
- Return type:
Tuple[ndarray,ndarray,ndarray,ndarray,ndarray]- Returns:
Center values, errors around center, and confidence intervals.
- plot_inferred_parameters(confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', file: str = None, show: bool = True, title: str = 'parameter estimates', scale: Literal['lin', 'log', 'symlog'] = 'log', ax: plt.Axes = None, kwargs_legend: dict = {'loc': 'upper right', 'prop': {'size': 8}}, **kwargs)[source]#
Visualize the inferred parameters and their confidence intervals.
- Parameters:
scale (
Literal['lin','log','symlog']) – y-scale of the plot.title (
str) – Plot title.confidence_intervals (
bool) – Whether to show confidence intervals.bootstrap_type (
Literal['percentile','bca']) – Type of bootstrap to use.ci_level (
float) – Confidence level for the confidence intervals.file (
str) – File to save plot to.show (
bool) – Whether to show plot.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_inferred_parameters_boxplot(file: str = None, show: bool = True, title: str = 'parameter estimates', **kwargs)[source]#
Visualize the inferred parameters and their confidence intervals.
- Parameters:
title (
str) – Plot title.file (
str) – File to save plot to.show (
bool) – Whether to show plot.
- Return type:
plt.Axes
- Returns:
Axes object
- Raises:
ValueError – If no inference objects are given or no bootstraps are found.
- plot_likelihoods(file: str = None, show: bool = True, title: str = 'likelihoods', scale: Literal['lin', 'log'] = 'lin', ax: plt.Axes = None, ylabel: str = 'lnl', **kwargs)[source]#
Visualize the likelihoods of the optimization runs using a scatter plot.
- Parameters:
scale (
Literal['lin','log']) – y-scale of the plot.title (
str) – Plot title.file (
str) – File to save plot to.show (
bool) – Whether to show plot.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
ylabel (
str) – Label for the y-axis.
- Return type:
plt.Axes
- Returns:
Axes object
- lrt(ll_simple: float, ll_complex: float, df: int, m: int = 0)[source]#
Perform a likelihood ratio test (LRT) with correction for boundary constraints as described in Molenberghs et Verbeke (2007) (https://doi.org/10.2307/1403361).
- Parameters:
ll_simple (
float) – Log-likelihood of the simple model.ll_complex (
float) – Log-likelihood of the complex model.df (
int) – Total number of parameters being tested.m (
int) – Number of parameters at the boundary.
- Return type:
float- Returns:
p-value
- compare_nested(complex: BaseInference)[source]#
Perform likelihood ratio test with given more complex model. The given model’s fixed parameters need to be a proper subset of this model’s fixed parameters. Low p-values indicate that the more complex model provides a significantly better fit.
- Parameters:
complex (
BaseInference) – More complex model.- Return type:
float|None- Returns:
p-value or None if the models are not nested.
- Raises:
ValueError – If the inference objects are not nested.
- compare_nested_models(do_bootstrap: bool = False)[source]#
Compare the various nested versions of the specified model using likelihood ratio tests. We check for inclusion of ancestral misidentification and beneficial mutations.
- Parameters:
do_bootstrap (
bool) – Whether to perform bootstrapping.- Return type:
Tuple[ndarray,Dict[str,BaseInference]]- Returns:
Matrix of p-values, dict of base inference objects
- plot_nested_models(file: str = None, show: bool = True, remove_empty: bool = False, transpose: bool = False, cmap: str = None, title: str = 'nested model comparison', ax: plt.Axes = None, do_bootstrap: bool = False)[source]#
Plot the p-values of nested likelihoods.
- Parameters:
file (
str) – File to save plot to.show (
bool) – Whether to show plot.remove_empty (
bool) – Whether to remove empty rows and columns.transpose (
bool) – Whether to transpose the matrix.cmap (
str) – Colormap to use.title (
str) – Plot title.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
do_bootstrap (
bool) – Whether to perform bootstrapping.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_discretized(file: str = None, show: bool = True, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean', title: str = 'discretized DFE', ax: plt.Axes = None, kwargs_legend: dict = {'prop': {'size': 8}})#
Plot discretized DFE.
- Parameters:
file (
str) – File to save the plot toshow (
bool) – Whether to show the plotintervals (
ndarray) – Array of interval boundaries over(-inf, inf)yieldingintervals.shape[0] - 1bars.confidence_intervals (
bool) – Whether to plot confidence intervalsci_level (
float) – Confidence interval levelbootstrap_type (
Literal['percentile','bca']) – Type of bootstrappoint_estimate (
Literal['original','mean','median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.title (
str) – Title of the plotax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes
- get_dfe()[source]#
Return a frozen DFE object containing the inferred parameters and (if available) the bootstrap samples.
- Return type:
- Returns:
DFE object
- get_spectra()[source]#
Return a Spectra object containing the neutral, selected and modelled SFS.
- Return type:
- Returns:
Spectra object with types
neutral,selected, andmodelled.
- get_alpha(params: dict = None)[source]#
Get alpha, the proportion of beneficial non-synonymous substitutions.
- Parameters:
params (
dict) – DFE Parameters to use for calculation.- Return type:
float
- property alpha: float#
Cache alpha, the proportion of beneficial non-synonymous substitutions.
- get_bootstrap_params()[source]#
Get the parameters to be included in the bootstraps.
- Return type:
Dict[str,float]- Returns:
Parameters to be included in the bootstraps.
- get_bootstrap_param_names()[source]#
Get the parameters to be included in the bootstraps.
- Return type:
List[str]- Returns:
Parameters to be included in the bootstraps.
- get_optimized_param_names()[source]#
Get the parameters names for the parameters that are optimized.
- Return type:
List[str]- Returns:
List of parameter names.
- get_n_optimized()[source]#
Get the number of parameters that are optimized.
- Return type:
int- Returns:
Number of parameters that are optimized.
- create_config()[source]#
Create a config object from the inference object.
- Return type:
- Returns:
Config object.
- classmethod from_config(config: Config)[source]#
Load from config object.
- Parameters:
config (
Config) – Config object.- Return type:
Self- Returns:
Inference object.
- classmethod from_config_file(file: str, cache: bool = True)[source]#
Load from config file.
- Parameters:
file (
str) – Config file path, possibly URL.cache (
bool) – Whether to use the cache if available.
- Return type:
Self- Returns:
Inference object.
- get_x0()[source]#
Get initial values.
- Return type:
Dict[str,Dict[str,float]]- Returns:
Initial values.
- property param_names: List[str]#
Parameter names.
- Returns:
List of parameter names.
- classmethod from_file(file: str, classes=None)#
Load object from file.
- Parameters:
classes – Classes to be used for unserialization.
file (
str) – File to load from
- Return type:
Self
- classmethod from_json(json: str, classes=None)#
Unserialize object.
- Parameters:
classes – Classes to be used for unserialization
json (
str) – JSON string
- Return type:
Self
- get_discretized(intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean')#
Get discretized DFE.
- Parameters:
intervals (
ndarray) – Array of interval boundaries over(-inf, inf)yieldingintervals.shape[0] - 1bins.confidence_intervals (
bool) – Whether to return confidence intervalsci_level (
float) – Confidence interval levelbootstrap_type (
Literal['percentile','bca']) – Type of bootstrappoint_estimate (
Literal['original','mean','median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.
- Return type:
Tuple[ndarray,Optional[ndarray]]- Returns:
Array of values and array of deviations
- to_file(file: str)#
Save object to file.
- Parameters:
file (
str) – File to save to
- class JointInference(sfs_neut: Spectra, sfs_sel: Spectra, include_divergence: bool = None, intervals_del: Tuple[float, float, int] = (-100000000.0, -1e-05, 1000), intervals_ben: Tuple[float, float, int] = (1e-05, 10000.0, 1000), intervals_h: Tuple[float, float, int] = (0.0, 1.0, 21), h_callback: Callable[[ndarray], ndarray] = None, integration_mode: Literal['midpoint', 'quad'] = 'midpoint', linearized: bool = True, model: Parametrization | str = 'GammaExpParametrization', seed: int = 0, x0: Dict[str, Dict[str, float]] = {}, bounds: Dict[str, Tuple[float, float]] = {}, scales: Dict[str, Literal['lin', 'log', 'symlog']] = {}, loss_type: Literal['likelihood', 'L2'] = 'likelihood', opts_mle: dict = {}, method_mle: str = 'L-BFGS-B', n_runs: int = 10, fixed_params: Dict[str, Dict[str, float]] = None, shared_params: List[SharedParams] = [], covariates: List[Covariate] = [], do_bootstrap: bool = True, n_bootstraps: int = 100, n_bootstrap_retries: int = 2, parallelize: bool = True, folded: bool = None, discretization: Discretization = None, optimization: Optimization = None, locked: bool = False, **kwargs)[source]#
Bases:
BaseInferenceEnabling the sharing of parameters among several Inference objects.
Example usage:
import fastdfe as fd # neutral SFS for two types sfs_neut = fd.Spectra(dict( pendula=[177130, 997, 441, 228, 156, 117, 114, 83, 105, 109, 652], pubescens=[172528, 3612, 1359, 790, 584, 427, 325, 234, 166, 76, 31] )) # selected SFS for two types sfs_sel = fd.Spectra(dict( pendula=[797939, 1329, 499, 265, 162, 104, 117, 90, 94, 119, 794], pubescens=[791106, 5326, 1741, 1005, 756, 546, 416, 294, 177, 104, 41] )) # create inference object inf = fd.JointInference( sfs_neut=sfs_neut, sfs_sel=sfs_sel, do_bootstrap=True ) # run inference inf.run()
- __init__(sfs_neut: Spectra, sfs_sel: Spectra, include_divergence: bool = None, intervals_del: Tuple[float, float, int] = (-100000000.0, -1e-05, 1000), intervals_ben: Tuple[float, float, int] = (1e-05, 10000.0, 1000), intervals_h: Tuple[float, float, int] = (0.0, 1.0, 21), h_callback: Callable[[ndarray], ndarray] = None, integration_mode: Literal['midpoint', 'quad'] = 'midpoint', linearized: bool = True, model: Parametrization | str = 'GammaExpParametrization', seed: int = 0, x0: Dict[str, Dict[str, float]] = {}, bounds: Dict[str, Tuple[float, float]] = {}, scales: Dict[str, Literal['lin', 'log', 'symlog']] = {}, loss_type: Literal['likelihood', 'L2'] = 'likelihood', opts_mle: dict = {}, method_mle: str = 'L-BFGS-B', n_runs: int = 10, fixed_params: Dict[str, Dict[str, float]] = None, shared_params: List[SharedParams] = [], covariates: List[Covariate] = [], do_bootstrap: bool = True, n_bootstraps: int = 100, n_bootstrap_retries: int = 2, parallelize: bool = True, folded: bool = None, discretization: Discretization = None, optimization: Optimization = None, locked: bool = False, **kwargs)[source]#
Create instance.
- Parameters:
sfs_neut (
Spectra) – Neutral SFS. Note that we require monomorphic counts to be specified in order to infer the mutation rate.sfs_sel (
Spectra) – Selected SFS. Note that we require monomorphic counts to be specified in order to infer the mutation rate.include_divergence (
bool) – Whether to include divergence in the likelihoodintervals_del (
Tuple[float,float,int]) –(start, stop, n_interval)for deleterious population-scaled selection coefficients. The intervals will be log10-spaced. Decreasing the number of intervals to100provides nearly identical results while increasing speed, especially when precomputing across dominance coefficients.intervals_ben (
Tuple[float,float,int]) – Same asintervals_delbut for positive selection coefficients. Decreasing the number of intervals to100provides nearly identical results while increasing speed, especially when precomputing across dominance coefficients.intervals_h (
Tuple[float,float,int]) –(start, stop, n_interval)for dominance coefficients which are linearly spaced. This is only used when inferring dominance coefficients. Values ofhbetween the edges will be interpolated linearly.h_callback (
Callable[[ndarray],ndarray]) – A function mapping the scalar parameterhand the array of selection coefficientsSto dominance coefficients of the same shape, allowing models wherehdepends onS. The default islambda h, S: np.full_like(S, h), keepinghconstant. Expected allele counts for a given dominance value are obtained by linear interpolation between precomputed values inintervals_h. The inferred parameter is still namedh, even if transformed byh_callback, and its bounds, scales, and initial values can be set viabounds,scales, andx0. The fitness of heterozygotes and mutation homozygotes is defined as1 + 2hsand1 + 2s, respectively.integration_mode (
Literal['midpoint','quad']) – Integration mode when computing expected SFS under semidominance.quadis not recommended.linearized (
bool) – Whether to discretize and cache the linearized integral mapping DFE to SFS or usescipy.integrate.quadin each call.Falsenot recommended.model (
Parametrization|str) – DFE parametrizationseed (
int) – Random seed. UseNonefor no seed.x0 (
Dict[str,Dict[str,float]]) – Dictionary of initial values in the form{type: {param: value}}bounds (
Dict[str,Tuple[float,float]]) – Bounds for the optimization in the form{param: (lower, upper)}scales (
Dict[str,Literal['lin','log','symlog']]) – Scales for the optimization in the form{param: scale}loss_type (
Literal['likelihood','L2']) – Loss typeopts_mle (
dict) – Options for the optimizationmethod_mle (
str) – Method to use for optimization. Seescipy.optimize.minimizefor available methods.n_runs (
int) – Number of independent optimization runs out of which the best one is chosen. The first run will use the initial values if specified. Consider increasing this number if the optimization does not produce good results.fixed_params (
Dict[str,Dict[str,float]]) – Parameters kept constant during optimization, given as{type: {param: value}}. By default, the ancestral misidentification rateepsis fixed to zero,hto 0.5, and the DFE parameters are fixed so that only the deleterious component of the DFE is estimated.shared_params (
List[SharedParams]) – List of shared parametersdo_bootstrap (
bool) – Whether to perform bootstrappingn_bootstraps (
int) – Number of bootstrapsn_bootstrap_retries (
int) – Number of optimization runs for each bootstrap sample. This parameter previously defined the number of retries per bootstrap sample when subsequent runs failed, but now it defines the total number of runs per bootstrap sample, taking the most likely one.parallelize (
bool) – Whether to parallelize computationsfolded (
bool) – Whether the SFS are folded. If not specified, the SFS will be folded if all of the given SFS appear to be folded.discretization (
Discretization) – Discretization instance. Mainly intended for internal use.optimization (
Optimization) – Optimization instance. Mainly intended for internal use.locked (
bool) – Whether inferences can be run from the class itself. Used internally.kwargs – Additional keyword arguments which are ignored.
- types: List[str]#
SFS types
- params_mle_raw: Optional[Dict[str, Dict[str, float]]]#
original MLE parameters before adding covariates and unpacked shared
Shared parameters with expanded ‘all’ type
- covariates: Dict[str, Covariate]#
Covariates indexed by parameter name
- fixed_params: Dict[str, Dict[str, float]]#
fixed parameters with expanded ‘all’ type
- discretization: Discretization#
Discretization instance
- marginal_inferences: Dict[str, BaseInference]#
Dictionary of marginal inferences indexed by type
- scales: Dict[str, Literal['lin', 'log', 'symlog']]#
parameter scales
- bounds: Dict[str, Tuple[float, float]]#
parameter bounds
- optimization: Optimization#
Joint optimization instance
- joint_inferences: Dict[str, BaseInference]#
Joint inference object indexed by type
Check if the shared parameters were specified correctly.
Check that no shared parameters are fixed and raise an error otherwise.
Get the names of the shared parameters.
- Return type:
List[str]- Returns:
Names of the shared parameters.
- run(**kwargs)[source]#
Run inference.
- Parameters:
kwargs – Additional keyword arguments
- Return type:
- Returns:
Modelled SFS.
- marginals_without_all()[source]#
Get marginal inference without ‘all’ type.
- Return type:
Dict[str,BaseInference]- Returns:
Dict of marginal inferences indexed by type.
- joint_inference_makes_sense()[source]#
Check if joint inference makes sense.
- Return type:
bool- Returns:
Whether joint inference makes sense.
- get_counts()[source]#
Get callback functions for modelling SFS counts from given parameters.
- Return type:
dict- Returns:
Dict of callback functions indexed by type.
- run_joint_without_covariates(do_bootstrap: bool = True)[source]#
Run joint inference without covariates. Note that the result of this function is cached.
- Return type:
- Returns:
Joint inference instance devoid of covariates.
- create_config()[source]#
Create a config object from the inference object.
- Return type:
- Returns:
Config object.
- perform_lrt_covariates(do_bootstrap: bool = False)[source]#
Perform likelihood ratio test against joint inference without covariates. In the simple model we share parameters across types. Low p-values indicate that including covariates significantly improves the model fit.
To access the JointInference object without covariates, you can call
run_joint_without_covariates(), which is cached.- Parameters:
do_bootstrap (
bool) – Whether to bootstrap.- Return type:
float- Returns:
Likelihood ratio test p-value.
- bootstrap(n_samples: int = None, parallelize: bool = None, update_likelihood: bool = False, n_retries: int = None, pbar: bool = True, pbar_title: str = 'Bootstrapping joint inference')[source]#
Perform the parametric bootstrap both for the marginal and joint inferences.
- Parameters:
n_samples (
int) – Number of bootstrap samples. Defaults ton_bootstraps.parallelize (
bool) – Whether to parallelize the bootstrap. Defaults toparallelize.update_likelihood (
bool) – Whether to update the likelihood to be the mean of the bootstrap samples and the original likelihood. This is not statistically valid but can stabilize maximum likelihood estimate subject optimization noise.n_retries (
int) – Number of optimization runs for each bootstrap sample. Defaults ton_bootstrap_retries.pbar (
bool) – Whether to show a progress bar.pbar_title (
str) – Title for the progress bar.
- Return type:
Optional[DataFrame]- Returns:
DataFrame with bootstrap samples
- get_x0()[source]#
Get initial values for joint inference.
- Return type:
Dict[str,Dict[str,float]]- Returns:
Dictionary of initial values indexed by type
Compare likelihood of joint inference with product of marginal likelihoods. This provides information about the goodness of fit achieved by the parameter sharing. Low p-values indicate that parameter sharing is not justified, i.e., that the marginal inferences provide a significantly better fit to the data. Note that it is more difficult to properly optimize the joint likelihood, which makes this test conservative, i.e., the reported p-value may be larger than what it should be.
- Parameters:
do_bootstrap (
bool) – Whether to perform bootstrapping.- Return type:
float- Returns:
p-value
- get_inferences(types: List[str] = None, labels: List[str] = None, show_marginals: bool = True)[source]#
Get all inference objects as dictionary.
- Parameters:
types (
List[str]) – Types to includelabels (
List[str]) – Labels for typesshow_marginals (
bool) – Whether to also show marginal inferences
- Return type:
Dict[str,BaseInference]- Returns:
Dictionary of base inference objects indexed by type and joint vs marginal subtypes
- plot_discretized(file: str = None, show: bool = True, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), show_marginals: bool = True, confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'original', title: str = 'discretized DFE comparison', labels: List[str] = None, kwargs_legend: dict = {'prop': {'size': 8}}, ax: plt.Axes = None)[source]#
Plot discretized DFE comparing the different types.
- Parameters:
labels (
List[str]) – Labels for typestitle (
str) – Title of plotintervals (
ndarray) – Array of interval boundaries over(-inf, inf)yieldingintervals.shape[0] - 1bars.show_marginals (
bool) – Whether to also show marginal inferencesbootstrap_type (
Literal['percentile','bca']) – Type of bootstrappoint_estimate (
Literal['original','mean','median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.ci_level (
float) – Confidence levelconfidence_intervals (
bool) – Whether to plot confidence intervalsfile (
str) – File to save plot toshow (
bool) – Whether to show plotax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_sfs_comparison(sfs_types: List[Literal['modelled', 'observed', 'selected', 'neutral']] = ['observed', 'modelled'], types: List[str] = None, labels: List[str] = None, colors: List[str] = None, file: str = None, show: bool = True, ax: plt.Axes = None, title: str = 'SFS comparison', use_subplots: bool = False, show_monomorphic: bool = False, kwargs_legend: dict = {'prop': {'size': 8}})[source]#
Plot SFS comparison.
- Parameters:
types (
List[str]) – Types to plotfile (
str) – File to save plot tolabels (
List[str]) – Labels for types.colors (
List[str]) – Colors for types. Only for Python visualization backend.sfs_types (
List[Literal['modelled','observed','selected','neutral']]) – Types of SFS to plotshow (
bool) – Whether to show plotax (plt.Axes) – Axes to plot on. Only for Python visualization backend and if
use_subplotsisFalse.title (
str) – Plot titleuse_subplots (
bool) – Whether to use subplots. Only for Python visualization backend.show_monomorphic (
bool) – Whether to show monomorphic countskwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_continuous(file: str = None, show: bool = True, intervals: ndarray = None, confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', title: str = 'DFE comparison', labels: List[str] = None, scale_density: bool = False, scale: Literal['lin', 'log', 'symlog'] = 'lin', kwargs_legend: dict = {'prop': {'size': 8}}, ax: plt.Axes = None)[source]#
Plot discretized DFE. The special constants
np.infand-np.infare also valid interval bounds. By default, the PDF is plotted as is. Due to the logarithmic scale on the x-axis, we may get a wrong intuition on how the mass is distributed, however. To get a better intuition, we can optionally scale the density by the x-axis interval size usingscale_density = True. This has the disadvantage that the density now changes for x, so that even a constant density will look warped.- Parameters:
scale_density (
bool) – Whether to scale the density by the x-axis interval sizescale (
Literal['lin','log','symlog']) – y-scalelabels (
List[str]) – Labels for typestitle (
str) – Title of plotbootstrap_type (
Literal['percentile','bca']) – Type of bootstrapci_level (
float) – Confidence levelconfidence_intervals (
bool) – Whether to plot confidence intervalsfile (
str) – File to save plot toshow (
bool) – Whether to show plotintervals (
ndarray) – Array of interval boundaries yieldingintervals.shape[0] - 1bins.kwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_inferred_parameters(file: str = None, confidence_intervals: bool = True, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', ci_level: float = 0.05, show: bool = True, title: str = 'inferred parameters', labels: List[str] = None, ax: plt.Axes = None, scale: Literal['lin', 'log', 'symlog'] = 'log', kwargs_legend: dict = {'loc': 'upper right', 'prop': {'size': 8}}, **kwargs: List[str])[source]#
Plot discretized DFE comparing the different types.
- Parameters:
labels (
List[str]) – Labels for typestitle (
str) – Title of plotbootstrap_type (
Literal['percentile','bca']) – Type of bootstrapci_level (
float) – Confidence levelconfidence_intervals (
bool) – Whether to plot confidence intervalsfile (
str) – File to save plot toshow (
bool) – Whether to show plotax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
scale (
Literal['lin','log','symlog']) – y-scalekwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_inferred_parameters_boxplot(file: str = None, show: bool = True, title: str = 'inferred parameters', labels: List[str] = None, **kwargs: List[str])[source]#
Plot discretized DFE comparing the different types.
- Parameters:
labels (
List[str]) – Labels for typestitle (
str) – Title of plotfile (
str) – File to save plot toshow (
bool) – Whether to show plot
- Return type:
plt.Axes
- Returns:
Axes object
- Raises:
ValueError – If no inference objects are given or no bootstraps are found.
- plot_covariate(index: int = 0, file: str = None, show: bool = True, title: str = None, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'original', show_types: bool = True, ci_level: float = 0.05, xlabel: str = 'cov', ylabel: str = None, ax: plt.Axes = None)[source]#
Plot the covariate given by the index.
- Parameters:
index (
int) – The index of the covariate.file (
str) – File to save plot to.show (
bool) – Whether to show plot.title (
str) – Plot title.bootstrap_type (
Literal['percentile','bca']) – Bootstrap type.point_estimate (
Literal['original','mean','median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.show_types (
bool) – Whether to show types on second x-axis.ci_level (
float) – Confidence level.xlabel (
str) – X-axis label.ylabel (
str) – Y-axis label, defaults to the covariate parameter name.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object.
- get_discretized(intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean')[source]#
Get discretized DFEs.
- Parameters:
intervals (
ndarray) – Array of interval boundaries over(-inf, inf)yieldingintervals.shape[0] - 1bins.confidence_intervals (
bool) – Whether to return confidence intervalsci_level (
float) – Confidence interval levelbootstrap_type (
Literal['percentile','bca']) – Type of bootstrappoint_estimate (
Literal['original','mean','median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.
- Return type:
Dict[str,Tuple[ndarray,ndarray]]- Returns:
Dictionary of array of values and array of errors indexed by inference type
- get_bootstrap_params()[source]#
Get bootstrap parameters.
- Return type:
Dict[str,float]- Returns:
Bootstrap parameters
- to_json()[source]#
Serialize object. Note that the deserialized inference objects no longer share the same optimization instance among other things.
- Return type:
str- Returns:
JSON string
- property alpha: float | None#
The is no single alpha for the joint inference. Please refer to the
self.joint_inferences[t].alpha.- Returns:
None
- get_residual(k: int)[source]#
Residual of joint inference. We calculate the residual over the jointly inferred SFS for all types.
- Parameters:
k (
int) – Order of the norm- Return type:
float- Returns:
L2 residual
- check_fixed_params_exist()#
Check that the fixed parameters are valid.
- compare_nested(complex: BaseInference)#
Perform likelihood ratio test with given more complex model. The given model’s fixed parameters need to be a proper subset of this model’s fixed parameters. Low p-values indicate that the more complex model provides a significantly better fit.
- Parameters:
complex (
BaseInference) – More complex model.- Return type:
float|None- Returns:
p-value or None if the models are not nested.
- Raises:
ValueError – If the inference objects are not nested.
- compare_nested_models(do_bootstrap: bool = False)#
Compare the various nested versions of the specified model using likelihood ratio tests. We check for inclusion of ancestral misidentification and beneficial mutations.
- Parameters:
do_bootstrap (
bool) – Whether to perform bootstrapping.- Return type:
Tuple[ndarray,Dict[str,BaseInference]]- Returns:
Matrix of p-values, dict of base inference objects
- evaluate_likelihood(params: Dict[str, Dict[str, float]])#
Evaluate likelihood function at given parameters.
- Parameters:
params (
Dict[str,Dict[str,float]]) – Parameters indexed by type and parameter name.- Return type:
float- Returns:
The likelihood.
- classmethod from_config(config: Config)#
Load from config object.
- Parameters:
config (
Config) – Config object.- Return type:
Self- Returns:
Inference object.
- classmethod from_config_file(file: str, cache: bool = True)#
Load from config file.
- Parameters:
file (
str) – Config file path, possibly URL.cache (
bool) – Whether to use the cache if available.
- Return type:
Self- Returns:
Inference object.
- classmethod from_file(file: str, classes=None)#
Load object from file.
- Parameters:
classes – Classes to be used for unserialization.
file (
str) – File to load from
- Return type:
Self
- classmethod from_json(json: str, classes=None)#
Unserialize object.
- Parameters:
classes – Classes to be used for unserialization
json (
str) – JSON string
- Return type:
Self
- get_alpha(params: dict = None)[source]#
Get alpha, the proportion of beneficial non-synonymous substitutions.
- Parameters:
params (
dict) – DFE parameters to use for calculation of format {type: {param: value}}.- Return type:
float
- get_bootstrap_param_names()#
Get the parameters to be included in the bootstraps.
- Return type:
List[str]- Returns:
Parameters to be included in the bootstraps.
- get_bounds_linear()#
Get linear bounds for all parameters. We do this for the bootstraps as x0 should be close to MLE.
- Return type:
Dict[str,Tuple[float,float]]- Returns:
Dictionary of bounds.
- get_dfe()#
Return a frozen DFE object containing the inferred parameters and (if available) the bootstrap samples.
- Return type:
- Returns:
DFE object
- get_fixed_param_names()#
Get the names of the fixed parameters.
- Return type:
List[str]
- get_n_optimized()#
Get the number of parameters that are optimized.
- Return type:
int- Returns:
Number of parameters that are optimized.
- get_optimized_param_names()#
Get the parameters names for the parameters that are optimized.
- Return type:
List[str]- Returns:
List of parameter names.
- get_scales_linear()#
Get linear scales for all parameters. We do this for the bootstraps as x0 should be close to MLE.
- Return type:
Dict[str,Literal['lin']]- Returns:
Dictionary of scales.
- get_spectra()#
Return a Spectra object containing the neutral, selected and modelled SFS.
- Return type:
- Returns:
Spectra object with types
neutral,selected, andmodelled.
- get_stats_discretized(ci_level: float = 0.05, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean')#
Get statistics for the discretized DFE.
- Parameters:
ci_level (
float) – Confidence interval level.intervals (
ndarray) – Array of interval boundaries yieldingintervals.shape[0] - 1bins.bootstrap_type (
Literal['percentile','bca']) – Type of bootstrap to use.point_estimate (
Literal['original','mean','median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.
- Return type:
Tuple[ndarray,ndarray,ndarray,ndarray,ndarray]- Returns:
Center values, errors around center, and confidence intervals.
- get_summary()#
Get summary.
- Return type:
- Returns:
Inference results.
- lrt(ll_simple: float, ll_complex: float, df: int, m: int = 0)#
Perform a likelihood ratio test (LRT) with correction for boundary constraints as described in Molenberghs et Verbeke (2007) (https://doi.org/10.2307/1403361).
- Parameters:
ll_simple (
float) – Log-likelihood of the simple model.ll_complex (
float) – Log-likelihood of the complex model.df (
int) – Total number of parameters being tested.m (
int) – Number of parameters at the boundary.
- Return type:
float- Returns:
p-value
- property param_names: List[str]#
Parameter names.
- Returns:
List of parameter names.
- plot_all(show: bool = True)#
Plot a bunch of plots.
- Parameters:
show (
bool) – Whether to show plot.
- plot_bucket_sizes(file: str = None, show: bool = True, title: str = 'bucket sizes', ax: plt.Axes = None)#
Plot mass in each bucket for the MLE DFE. This can be used to check if the interval bounds and spacing are chosen appropriately.
- Parameters:
file (
str) – File to save plot to.show (
bool) – Whether to show plot.title (
str) – Plot title.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_interval_density(file: str = None, show: bool = True, intervals: Sequence = array([-inf, -100., -10., -1., 0., 1., inf]), interval_labels: List[str] = None, color: str = 'C0', ax: plt.Axes = None)#
Plot density of the discretization intervals chosen. Note that although this plot looks similar, this is not the DFE!
- Parameters:
color (
str) – Color of the bars.file (
str) – File to save plot to.show (
bool) – Whether to show plot.interval_labels (
List[str]) – Labels for the intervals.intervals (
Sequence) – Array of interval boundaries yieldingintervals.shape[0] - 1bins.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_likelihoods(file: str = None, show: bool = True, title: str = 'likelihoods', scale: Literal['lin', 'log'] = 'lin', ax: plt.Axes = None, ylabel: str = 'lnl', **kwargs)#
Visualize the likelihoods of the optimization runs using a scatter plot.
- Parameters:
scale (
Literal['lin','log']) – y-scale of the plot.title (
str) – Plot title.file (
str) – File to save plot to.show (
bool) – Whether to show plot.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
ylabel (
str) – Label for the y-axis.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_nested_models(file: str = None, show: bool = True, remove_empty: bool = False, transpose: bool = False, cmap: str = None, title: str = 'nested model comparison', ax: plt.Axes = None, do_bootstrap: bool = False)#
Plot the p-values of nested likelihoods.
- Parameters:
file (
str) – File to save plot to.show (
bool) – Whether to show plot.remove_empty (
bool) – Whether to remove empty rows and columns.transpose (
bool) – Whether to transpose the matrix.cmap (
str) – Colormap to use.title (
str) – Plot title.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
do_bootstrap (
bool) – Whether to perform bootstrapping.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_observed_sfs(labels: List[str] = None, file: str = None, show: bool = True, ax: plt.Axes = None)#
Plot neutral and selected SFS.
- Parameters:
file (
str) – File to save plot to.labels (
List[str]) – Labels for the SFS.show (
bool) – Whether to show plot.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- raise_if_locked()#
Raise an error if this object is locked.
- Raises:
Exception –
- run_if_required(*args, **kwargs)#
Run if not run yet.
- Parameters:
args – Arguments.
kwargs – Keyword arguments.
- Return type:
Optional[Spectrum]- Returns:
DFE parametrization and modelled SFS.
- runs: Optional[pd.DataFrame] = None#
Dataframe holding information on all optimization runs
- to_file(file: str)#
Save object to file.
- Parameters:
file (
str) – File to save to
- sfs_neut: Spectrum#
Neutral SFS
- sfs_sel: Spectrum#
Selected SFS
- model: Parametrization#
The DFE parametrization
- folded: bool#
Whether the SFS are folded
- n: int#
Sample size
- theta: float#
Estimate of theta from neutral SFS
- params_mle: Optional[Dict[str, float]]#
MLE estimates of the initial optimization
- sfs_mle: Optional[Spectrum]#
Modelled MLE SFS
- likelihood: Optional[float]#
Likelihood of the MLE, this value may be updated after bootstrapping
- likelihoods: Optional[List[float]]#
Likelihoods of the different ML runs, controlled by
n_runs
- n_runs: int#
Number of MLE runs to perform
- result: Optional['scipy.optimize.OptimizeResult']#
Numerical optimization result
- do_bootstrap: bool#
Whether to do bootstrapping
- n_bootstraps: int#
Number of bootstraps
- n_bootstrap_retries: int#
Number of optimization runs for each bootstrap sample
- parallelize: bool#
Whether to parallelize computations
- x0: Dict[str, Dict[str, float]]#
Initial values
- bootstraps: Optional[pd.DataFrame]#
Bootstrapped MLE parameter estimates
- bootstrap_results: Optional[List['scipy.optimize.OptimizeResult']]#
Bootstrap optimization results
- L2_residual: Optional[float]#
L2 norm of differential between modelled and observed SFS
- seed: int | None#
Random number generator seed
- rng: np.random.Generator#
Random generator instance
- execution_time: float#
Total execution time in seconds
- locked: bool#
Whether inferences can be run from the class itself
- class Inference[source]#
Bases:
objectStatic utility methods for inference objects.
- static plot_discretized(inferences: List[AbstractInference], intervals: Sequence = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean', file: str = None, show: bool = True, title: str = 'discretized DFEs', labels: Sequence = None, ax: plt.Axes = None, kwargs_legend: dict = {'prop': {'size': 8}}, **kwargs)[source]#
Visualize several discretized DFEs given by the list of inference objects.
- Parameters:
inferences (
List[AbstractInference]) – List of inference objects.intervals (
Sequence) – Intervals over(-inf, inf)to use for discretization.confidence_intervals (
bool) – Whether to plot confidence intervals.ci_level (
float) – Confidence level for confidence intervals.bootstrap_type (
Literal['percentile','bca']) – Type of bootstrap to use for confidence intervals.point_estimate (
Literal['original','mean','median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.file (
str) – Path to file to save the plot to.show (
bool) – Whether to show the plot.title (
str) – Title of the plot.labels (
Sequence) – Labels for the DFEs.kwargs – Additional arguments for the plot.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes of the plot.
- static plot_continuous(inferences: List[AbstractInference], intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', file: str = None, show: bool = True, title: str = 'continuous DFEs', labels: Sequence = None, scale: Literal['lin', 'log', 'symlog'] = 'lin', scale_density: bool = False, ax: plt.Axes = None, kwargs_legend: dict = {'prop': {'size': 8}}, **kwargs)[source]#
Visualize several DFEs given by the list of inference objects. By default, the PDF is plotted as is. Due to the logarithmic scale on the x-axis, we may get a wrong intuition on how the mass is distributed, however. To get a better intuition, we can optionally scale the density by the x-axis interval size using
scale_density = True. This has the disadvantage that the density now changes for x, so that even a constant density will look warped.- Parameters:
inferences (
List[AbstractInference]) – List of inference objects.intervals (
ndarray) – Intervals to use for discretization.confidence_intervals (
bool) – Whether to plot confidence intervals.ci_level (
float) – Confidence level for confidence intervals.bootstrap_type (
Literal['percentile','bca']) – Type of bootstrap to use for confidence intervals.file (
str) – Path to file to save the plot to.show (
bool) – Whether to show the plot.title (
str) – Title of the plot.labels (
Sequence) – Labels for the DFEs.scale (
Literal['lin','log','symlog']) – y-scale of the plot.scale_density (
bool) – Whether to scale the density by the x-axis interval size.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.kwargs – Additional arguments for the plot.
- Return type:
plt.Axes
- Returns:
Axes of the plot.
- static plot_inferred_parameters(inferences: List[AbstractInference], labels: Sequence, confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean', file: str = None, show: bool = True, title: str = 'parameter estimates', scale: Literal['lin', 'log', 'symlog'] = 'log', ax: plt.Axes = None, kwargs_legend: dict = {'loc': 'upper right', 'prop': {'size': 8}}, **kwargs)[source]#
Visualize several discretized DFEs given by the list of inference objects. Note that the DFE parametrization needs to be the same for all inference objects.
- Parameters:
inferences (
List[AbstractInference]) – List of inference objects.labels (
Sequence) – Unique labels for the DFEs.scale (
Literal['lin','log','symlog']) – y-scale of the plot.confidence_intervals (
bool) – Whether to plot confidence intervals.ci_level (
float) – Confidence level for confidence intervals.bootstrap_type (
Literal['percentile','bca']) – Type of bootstrap to use for confidence intervals.point_estimate (
Literal['original','mean','median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.file (
str) – Path to file to save the plot to.show (
bool) – Whether to show the plot.title (
str) – Title of the plot.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.kwargs – Additional arguments which are ignored.
- Return type:
plt.Axes
- Returns:
Axes of the plot.
- Raises:
ValueError – If no inference objects are given.
- static plot_inferred_parameters_boxplot(inferences: List[AbstractInference], labels: Sequence, file: str = None, show: bool = True, title: str = 'parameter estimates', **kwargs)[source]#
Visualize several discretized DFEs given by the list of inference objects. Note that the DFE parametrization needs to be the same for all inference objects.
- Parameters:
inferences (
List[AbstractInference]) – List of inference objects.labels (
Sequence) – Unique labels for the DFEs.file (
str) – Path to file to save the plot to.show (
bool) – Whether to show the plot.title (
str) – Title of the plot.kwargs – Additional arguments for the plot.
- Return type:
plt.Axes
- Returns:
Axes of the plot.
- Raises:
ValueError – If no inference objects are given or no bootstraps are found.
- static get_errors_params_mle(ci_level: float, confidence_intervals: bool, inferences: List[AbstractInference], labels: Sequence, param_names: Sequence, bootstrap_type: Literal['percentile', 'bca'], point_estimate: Literal['original', 'mean', 'median'] = 'mean')[source]#
Get errors and values for MLE params of inferences.
- Parameters:
ci_level (
float) – Confidence level for confidence intervals.confidence_intervals (
bool) – Whether to compute confidence intervals.inferences (
List[AbstractInference]) – List of inference objects.labels (
Sequence) – Labels for the inferences.param_names (
Sequence) – Names of the parameters to get errors and values for.bootstrap_type (
Literal['percentile','bca']) – Type of bootstrap to use.point_estimate (
Literal['original','mean','median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.
- Return type:
Tuple[Dict[str,Optional[Tuple[ndarray,ndarray]]],Dict[str,ndarray]]- Returns:
dictionary of errors and dictionary of center values indexed by labels.
- static get_discretized(inferences: List[AbstractInference], labels: Sequence, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean')[source]#
Get values and errors of discretized DFE.
- Parameters:
inferences (
List[AbstractInference]) – List of inference objects.labels (
Sequence) – Labels for the DFEs.bootstrap_type (
Literal['percentile','bca']) – Type of bootstrap to useci_level (
float) – Confidence interval levelconfidence_intervals (
bool) – Whether to compute confidence intervalsintervals (
ndarray) – Array of interval boundaries over(-inf, inf)yieldingintervals.shape[0] - 1bars.point_estimate (
Literal['original','mean','median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.
- Return type:
Tuple[Dict[str,ndarray],Dict[str,Optional[ndarray]]]- Returns:
Dictionary of values and dictionary of errors indexed by labels.
- static get_stats_discretized(params: dict, bootstraps: DataFrame, model: Parametrization | str, ci_level: float = 0.05, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), bootstrap_type: Literal['percentile', 'bca'] = 'percentile', point_estimate: Literal['original', 'mean', 'median'] = 'mean')[source]#
Compute errors and confidence interval for a discretized DFE.
- Parameters:
params (
dict) – Parameters of the modelbootstraps (
DataFrame) – Bootstrapped samplesmodel (
Parametrization|str) – DFE parametrizationci_level (
float) – Confidence interval levelintervals (
ndarray) – Array of interval boundaries yieldingintervals.shape[0] - 1bins.bootstrap_type (
Literal['percentile','bca']) – Type of bootstrappoint_estimate (
Literal['original','mean','median']) – Whether to use ‘original’ MLE values, ‘mean’ or ‘median’ of bootstraps as point estimate.
- Return type:
Tuple[ndarray,ndarray,ndarray]- Returns:
Center values, errors around center, and confidence intervals.
- static compute_histogram(model: Parametrization | str, params: dict, intervals: ndarray)[source]#
Discretize the DFE given a DFE parametrization and its parameter values.
- Parameters:
model (
Parametrization|str) – DFE parametrizationparams (
dict) – Parameters of the modelintervals (
ndarray) – Array of interval boundaries yieldingintervals.shape[0] - 1bins.
- Return type:
ndarray- Returns:
Discretized DFE
- class InferenceResult(inference: BaseInference)[source]#
Bases:
SerializableInference result.
- __init__(inference: BaseInference)[source]#
Initialize object.
- Parameters:
inference (
BaseInference) – Inference object.
- classmethod from_file(file: str, classes=None)#
Load object from file.
- Parameters:
classes – Classes to be used for unserialization.
file (
str) – File to load from
- Return type:
Self
- classmethod from_json(json: str, classes=None)#
Unserialize object.
- Parameters:
classes – Classes to be used for unserialization
json (
str) – JSON string
- Return type:
Self
- to_file(file: str)#
Save object to file.
- Parameters:
file (
str) – File to save to
- to_json()#
Serialize object.
- Return type:
str- Returns:
JSON string
-
bootstraps:
DataFrame# Bootstrap samples
-
runs:
DataFrame# Optimization runs
Bases:
objectClass specifying the sharing of params among types.
allmeans all available types or params.Example usage:
import fastdfe as fd # neutral SFS for two types sfs_neut = fd.Spectra(dict( pendula=[177130, 997, 441, 228, 156, 117, 114, 83, 105, 109, 652], pubescens=[172528, 3612, 1359, 790, 584, 427, 325, 234, 166, 76, 31] )) # selected SFS for two types sfs_sel = fd.Spectra(dict( pendula=[797939, 1329, 499, 265, 162, 104, 117, 90, 94, 119, 794], pubescens=[791106, 5326, 1741, 1005, 756, 546, 416, 294, 177, 104, 41] )) # create inference object inf = fd.JointInference( sfs_neut=sfs_neut, sfs_sel=sfs_sel, shared_params=[fd.SharedParams(types=["pendula", "pubescens"], params=["eps", "S_d"])], do_bootstrap=True ) # run inference inf.run()
The params to share
The types to share
- class Covariate(param: str, values: Dict[str, float], callback: Callable | None = None, bounds: tuple = (0.0001, 10000.0), x0: float = 0, bounds_scale: Literal['lin', 'log', 'symlog'] = 'symlog')[source]#
Bases:
objectClass defining a covariate which induces a relationship with one or many parameters. The relationship is defined by a callback function which modifies the parameters. The default callback introduces a linear relationship.
Below an example of introducing linear covariates for
S_d, the for mean strength of deleterious mutations (cf.GammaExpParametrization). Each of the three types is associated with one covariate. This we pass toJointInferencetogether with the stratified spectra:import fastdfe as fd cov = fd.Covariate( param='S_d', values=dict(type1=5, type2=3, type3=1) )
-
param:
str# The parameter to modify
-
values:
Dict[str,float]# The values of the covariate for each type
-
callback:
Optional[Callable] = None# The callback function to modify the parameters
-
bounds:
tuple= (0.0001, 10000.0)# The bounds of the covariate parameter to be estimated
-
x0:
float= 0# The initial value of the covariate
-
bounds_scale:
Literal['lin','log','symlog'] = 'symlog'# The scale of the bounds. See
scale_value()for details
- apply(covariate: float, type: str, params: Dict[str, float])[source]#
Apply the custom or default callback to modify the given parameters.
- Parameters:
covariate (
float) – The value of the covariate.type (
str) – The type of the relationship.params (
Dict[str,float]) – The input parameters.
- Return type:
Dict[str,float]- Returns:
Modified parameters.
- apply_default(covariate: float, type: str, params: Dict[str, float])[source]#
Modify the given parameters introducing a linear relationship with the given covariate.
- Parameters:
covariate (
float) – The value of the covariate.type (
str) – The type of the relationship.params (
Dict[str,float]) – The input parameters.
- Return type:
Dict[str,float]- Returns:
Modified parameters.
- __init__(param: str, values: Dict[str, float], callback: Callable | None = None, bounds: tuple = (0.0001, 10000.0), x0: float = 0, bounds_scale: Literal['lin', 'log', 'symlog'] = 'symlog')#
-
param: