Inference#
- class BaseInference(sfs_neut: Spectra | Spectrum, sfs_sel: Spectra | Spectrum, intervals_del: Tuple[float, float, int] = (-100000000.0, -1e-05, 1000), intervals_ben: Tuple[float, float, int] = (1e-05, 10000.0, 1000), integration_mode: Literal['midpoint', 'quad'] = 'midpoint', linearized: bool = True, model: Parametrization | str = 'GammaExpParametrization', seed: int = 0, x0: Dict[str, Dict[str, float]] = {}, bounds: Dict[str, Tuple[float, float]] = {}, scales: Dict[str, Literal['lin', 'log', 'symlog']] = {}, loss_type: Literal['likelihood', 'L2'] = 'likelihood', opts_mle: dict = {}, n_runs: int = 10, fixed_params: Dict[str, Dict[str, float]] = {}, do_bootstrap: bool = False, n_bootstraps: int = 100, n_bootstrap_retries: int = 2, parallelize: bool = True, folded: bool = None, discretization: Discretization = None, optimization: Optimization = None, locked: bool = False, **kwargs)[source]#
Bases:
AbstractInferenceBase inference class for inferring the SFS given one neutral and one selected SFS. Note that BaseInference is by default seeded.
Example usage:
import fastdfe as fd sfs_neut = fd.Spectrum([177130, 997, 441, 228, 156, 117, 114, 83, 105, 109, 652]) sfs_sel = fd.Spectrum([797939, 1329, 499, 265, 162, 104, 117, 90, 94, 119, 794]) # create inference object inf = fd.BaseInference( sfs_neut=sfs_neut, sfs_sel=sfs_sel, n_runs=10, n_bootstraps=100, do_bootstrap=True ) # run inference inf.run() # plot discretized DFE inf.plot_discretized() # plot SFS comparison inf.plot_sfs_comparison()
- __init__(sfs_neut: Spectra | Spectrum, sfs_sel: Spectra | Spectrum, intervals_del: Tuple[float, float, int] = (-100000000.0, -1e-05, 1000), intervals_ben: Tuple[float, float, int] = (1e-05, 10000.0, 1000), integration_mode: Literal['midpoint', 'quad'] = 'midpoint', linearized: bool = True, model: Parametrization | str = 'GammaExpParametrization', seed: int = 0, x0: Dict[str, Dict[str, float]] = {}, bounds: Dict[str, Tuple[float, float]] = {}, scales: Dict[str, Literal['lin', 'log', 'symlog']] = {}, loss_type: Literal['likelihood', 'L2'] = 'likelihood', opts_mle: dict = {}, n_runs: int = 10, fixed_params: Dict[str, Dict[str, float]] = {}, do_bootstrap: bool = False, n_bootstraps: int = 100, n_bootstrap_retries: int = 2, parallelize: bool = True, folded: bool = None, discretization: Discretization = None, optimization: Optimization = None, locked: bool = False, **kwargs)[source]#
Create BaseInference instance.
- Parameters:
sfs_neut (
Spectra|Spectrum) – The neutral SFS. Note that we require monomorphic counts to be specified in order to infer the mutation rate. If aSpectraobject with more than one SFS is provided, theallattribute will be used.sfs_sel (
Spectra|Spectrum) – The selected SFS. Note that we require monomorphic counts to be specified in order to infer the mutation rate. If aSpectraobject with more than one SFS is provided, theallattribute will be used.intervals_del (
Tuple[float,float,int]) –(start, stop, n_interval)for deleterious population-scaled selection coefficients. The intervals will be log10-spaced.intervals_ben (
Tuple[float,float,int]) – Same asintervals_delbut for positive selection coefficients.integration_mode (
Literal['midpoint','quad']) – Integration mode for the DFE,quadnot recommendedlinearized (
bool) – Whether to use the linearized DFE,Falsenot recommendedmodel (
Parametrization|str) – Instance of DFEParametrization which parametrized the DFEseed (
int) – Seed for the random number generator. UseNonefor no seed.x0 (
Dict[str,Dict[str,float]]) – Dictionary of initial values in the form{'all': {param: value}}bounds (
Dict[str,Tuple[float,float]]) – Bounds for the optimization in the form{param: (lower, upper)}scales (
Dict[str,Literal['lin','log','symlog']]) – Scales for the optimization in the form ``{param: scale}`loss_type (
Literal['likelihood','L2']) – Type of loss function to use for optimization.opts_mle (
dict) – Options for the optimization.n_runs (
int) – Number of independent optimization runs out of which the best one is chosen. The first run will use the initial values if specified. Consider increasing this number if the optimization does not produce good results.fixed_params (
Dict[str,Dict[str,float]]) – Parameters held fixed during optimization, i.e.,{'all': {param: value}}do_bootstrap (
bool) – Whether to do bootstrapping.n_bootstraps (
int) – Number of bootstraps.n_bootstrap_retries (
int) – Number of retries for bootstraps that did not terminate normally.parallelize (
bool) – Whether to parallelize computations.folded (
bool) – Whether the SFS are folded. If not specified, the SFS will be folded if both of the given SFS appear to be folded.discretization (
Discretization) – Discretization instance. Mainly intended for internal use.optimization (
Optimization) – Optimization instance. Mainly intended for internal use.locked (
bool) – Whether to lock the instance.kwargs – Additional keyword arguments which are ignored.
- sfs_neut: Spectrum#
Neutral SFS
- sfs_sel: Spectrum#
Selected SFS
- model: Parametrization#
The DFE parametrization
- folded: bool#
Whether the SFS are folded
- n: int#
Sample size
- discretization: Discretization#
Discretization instance
- theta: float#
Estimate of theta from neutral SFS
- params_mle: Optional[Dict[str, float]]#
MLE estimates of the initial optimization
- sfs_mle: Optional[Spectrum]#
Modelled MLE SFS
- likelihood: Optional[float]#
Likelihood of the MLE, this value may be updated after bootstrapping
- likelihoods: Optional[List[float]]#
Likelihoods of the different ML runs, controlled by
n_runs
- n_runs: int#
Number of MLE runs to perform
- result: Optional['scipy.optimize.OptimizeResult']#
Numerical optimization result
- do_bootstrap: bool#
Whether to do bootstrapping
- n_bootstraps: int#
Number of bootstraps
- n_bootstrap_retries: int#
Number of retries for bootstraps that did not terminate normally
- parallelize: bool#
Whether to parallelize computations
- fixed_params: Dict[str, Dict[str, float]]#
Fixed parameters
- scales: Dict[str, Literal['lin', 'log', 'symlog']]#
parameter scales
- bounds: Dict[str, Tuple[float, float]]#
parameter bounds
- optimization: Optimization#
Optimization instance
- x0: Dict[str, Dict[str, float]]#
Initial values
- bootstraps: Optional[pd.DataFrame]#
Bootstrapped MLE parameter estimates
- bootstrap_results: Optional[List['scipy.optimize.OptimizeResult']]#
Bootstrap optimization results
- L2_residual: Optional[float]#
L2 norm of differential between modelled and observed SFS
- seed: int | None#
Random number generator seed
- rng: np.random.Generator#
Random generator instance
- execution_time: float#
Total execution time in seconds
- locked: bool#
Whether inferences can be run from the class itself
- run_if_required(*args, **kwargs)[source]#
Run if not run yet.
- Parameters:
args – Arguments.
kwargs – Keyword arguments.
- Return type:
Optional[Spectrum]- Returns:
DFE parametrization and modelled SFS.
- run(do_bootstrap: bool = None, pbar: bool = None, **kwargs)[source]#
Perform the DFE inference.
- Parameters:
pbar (
bool) – Whether to show a progress bar.do_bootstrap (
bool) – Whether to perform bootstrapping.kwargs – Additional keyword arguments which are ignored.
- Return type:
- Returns:
Modelled SFS.
- get_counts()[source]#
Get callback functions for modelling SFS counts from given parameters.
- Return type:
dict- Returns:
Callback functions for modelling SFS counts for each type.
- evaluate_likelihood(params: Dict[str, Dict[str, float]])[source]#
Evaluate likelihood function at given parameters.
- Parameters:
params (
Dict[str,Dict[str,float]]) – Parameters indexed by type and parameter name.- Return type:
float- Returns:
The likelihood.
- get_residual(k: int)[source]#
Residual of the modelled and observed SFS.
- Parameters:
k (
int) – Order of the norm- Return type:
float- Returns:
L2 residual
- bootstrap(n_samples: int = None, parallelize: bool = None, update_likelihood: bool = True, n_retries: int = None, pbar: bool = True)[source]#
Perform the parametric bootstrap.
- Parameters:
n_samples (
int) – Number of bootstrap samples. Defaults ton_bootstraps.parallelize (
bool) – Whether to parallelize the bootstrap. Defaults toparallelize.update_likelihood (
bool) – Whether to update the likelihood to be the mean of the bootstrap samples.n_retries (
int) – Number of retries for bootstrap samples that did not terminate normally. Defaults ton_bootstrap_retries.pbar (
bool) – Whether to show a progress bar.
- Return type:
DataFrame- Returns:
Dataframe with bootstrap results.
- get_scales_linear()[source]#
Get linear scales for all parameters. We do this for the bootstraps as x0 should be close to MLE.
- Return type:
Dict[str,Literal['lin']]- Returns:
Dictionary of scales.
- get_bounds_linear()[source]#
Get linear bounds for all parameters. We do this for the bootstraps as x0 should be close to MLE.
- Return type:
Dict[str,Tuple[float,float]]- Returns:
Dictionary of bounds.
- plot_continuous(file: str = None, show: bool = True, intervals: ndarray = None, confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', scale_density: bool = False, scale: Literal['lin', 'log', 'symlog'] = 'lin', title: str = 'DFE', kwargs_legend: dict = {'prop': {'size': 8}}, ax: plt.Axes = None)[source]#
Plot continuous DFE. The special constants np.inf and -np.inf are also valid interval bounds. By default, the PDF is plotted as is. Due to the logarithmic scale on the x-axis, we may get a wrong intuition on how the mass is distributed, however. To get a better intuition, we can optionally scale the density by the x-axis interval size using
scale_density = True. This has the disadvantage that the density now changes for x, so that even a constant density will look warped.- Parameters:
title (
str) – Plot title.bootstrap_type (
Literal['percentile','bca']) – Type of bootstrap to use for confidence intervals.ci_level (
float) – Confidence level for confidence intervals.confidence_intervals (
bool) – Whether to plot confidence intervals.file (
str) – File to save plot to.show (
bool) – Whether to show plot.scale (
Literal['lin','log','symlog']) – y-scalescale_density (
bool) – Whether to scale the density by the x-axis interval sizeintervals (
ndarray) – Array of interval boundaries yieldingintervals.shape[0] - 1bins.kwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_bucket_sizes(file: str = None, show: bool = True, title: str = 'bucket sizes', ax: plt.Axes = None)[source]#
Plot mass in each bucket for the MLE DFE. This can be used to check if the interval bounds and spacing are chosen appropriately.
- Parameters:
file (
str) – File to save plot to.show (
bool) – Whether to show plot.title (
str) – Plot title.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_interval_density(file: str = None, show: bool = True, intervals: Sequence = array([-inf, -100., -10., -1., 0., 1., inf]), interval_labels: List[str] = None, color: str = 'C0', ax: plt.Axes = None)[source]#
Plot density of the discretization intervals chosen. Note that although this plot looks similar, this is not the DFE!
- Parameters:
color (
str) – Color of the bars.file (
str) – File to save plot to.show (
bool) – Whether to show plot.interval_labels (
List[str]) – Labels for the intervals.intervals (
Sequence) – Array of interval boundaries yieldingintervals.shape[0] - 1bins.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_sfs_comparison(sfs_types: List[Literal['modelled', 'observed', 'selected', 'neutral']] = ['observed', 'modelled'], labels: List[str] = None, colors: List[str] = None, file: str = None, show: bool = True, ax: plt.Axes = None, title: str = 'SFS comparison', use_subplots: bool = False, show_monomorphic: bool = False, kwargs_legend: dict = {'prop': {'size': 8}})[source]#
Plot SFS comparison.
- Parameters:
file (
str) – File to save plot to.labels (
List[str]) – Labels for the SFS.colors (
List[str]) – Colors for the SFS. Only for Python visualization backend.sfs_types (
List[Literal['modelled','observed','selected','neutral']]) – Types of SFS to plot.show (
bool) – Whether to show plot.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend and if
use_subplotsisFalse.title (
str) – Plot titleuse_subplots (
bool) – Whether to use subplots. Only for Python visualization backend.show_monomorphic (
bool) – Whether to show monomorphic countskwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_observed_sfs(labels: List[str] = None, file: str = None, show: bool = True, ax: plt.Axes = None)[source]#
Plot neutral and selected SFS.
- Parameters:
file (
str) – File to save plot to.labels (
List[str]) – Labels for the SFS.show (
bool) – Whether to show plot.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_all(show: bool = True)[source]#
Plot a bunch of plots.
- Parameters:
show (
bool) – Whether to show plot.
- get_errors_discretized_dfe(ci_level: float = 0.05, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), bootstrap_type: Literal['percentile', 'bca'] = 'percentile')[source]#
Get the discretized DFE errors.
- Parameters:
ci_level (
float) – Confidence interval level.intervals (
ndarray) – Array of interval boundaries yieldingintervals.shape[0] - 1bins.bootstrap_type (
Literal['percentile','bca']) – Type of bootstrap to use.
- Return type:
ndarray- Returns:
Arrays of errors, confidence intervals, bootstraps, means and values
- plot_inferred_parameters(confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', file: str = None, show: bool = True, title: str = 'parameter estimates', scale: Literal['lin', 'log', 'symlog'] = 'log', ax: plt.Axes = None, kwargs_legend: dict = {'loc': 'upper right', 'prop': {'size': 8}}, **kwargs)[source]#
Visualize the inferred parameters and their confidence intervals.
- Parameters:
scale (
Literal['lin','log','symlog']) – y-scale of the plot.title (
str) – Plot title.confidence_intervals (
bool) – Whether to show confidence intervals.bootstrap_type (
Literal['percentile','bca']) – Type of bootstrap to use.ci_level (
float) – Confidence level for the confidence intervals.file (
str) – File to save plot to.show (
bool) – Whether to show plot.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_inferred_parameters_boxplot(file: str = None, show: bool = True, title: str = 'parameter estimates', **kwargs)[source]#
Visualize the inferred parameters and their confidence intervals.
- Parameters:
title (
str) – Plot title.file (
str) – File to save plot to.show (
bool) – Whether to show plot.
- Return type:
plt.Axes
- Returns:
Axes object
- Raises:
ValueError – If no inference objects are given or no bootstraps are found.
- plot_likelihoods(file: str = None, show: bool = True, title: str = 'likelihoods', scale: Literal['lin', 'log'] = 'lin', ax: plt.Axes = None, ylabel: str = 'lnl', **kwargs)[source]#
Visualize the likelihoods of the optimization runs using a scatter plot.
- Parameters:
scale (
Literal['lin','log']) – y-scale of the plot.title (
str) – Plot title.file (
str) – File to save plot to.show (
bool) – Whether to show plot.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
ylabel (
str) – Label for the y-axis.
- Return type:
plt.Axes
- Returns:
Axes object
- lrt(ll_simple: float, ll_complex: float, df: int = 1)[source]#
Perform the likelihood ratio test (LRT).
Note
We do not adjust the test for parameters that are on the boundary of the parameter space.
- Parameters:
ll_simple (
float) – Log-likelihood of the simple model.ll_complex (
float) – Log-likelihood of the complex model.df (
int) – Degrees of freedom.
- Return type:
float- Returns:
p-value
- compare_nested(complex: BaseInference)[source]#
Perform likelihood ratio test with given more complex model. The given model’s fixed parameters need to be a proper subset of this model’s fixed parameters.
- Parameters:
complex (
BaseInference) – More complex model.- Return type:
float|None- Returns:
p-value
- compare_nested_models(do_bootstrap: bool = True)[source]#
Compare the various nested versions of the specified model using likelihood ratio tests.
- Parameters:
do_bootstrap (
bool) – Whether to perform bootstrapping. This is recommended to get more accurate p-values.- Return type:
(
ndarray,Dict[str, BaseInference])- Returns:
Matrix of p-values, dict of base inference objects
- plot_nested_models(file: str = None, show: bool = True, remove_empty: bool = False, transpose: bool = False, cmap: str = None, title: str = 'nested model comparison', ax: plt.Axes = None, do_bootstrap: bool = True)[source]#
Plot the p-values of nested likelihoods.
- Parameters:
file (
str) – File to save plot to.show (
bool) – Whether to show plot.remove_empty (
bool) – Whether to remove empty rows and columns.transpose (
bool) – Whether to transpose the matrix.cmap (
str) – Colormap to use.title (
str) – Plot title.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
do_bootstrap (
bool) – Whether to perform bootstrapping. This is recommended to get more accurate p-values.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_discretized(file: str = None, show: bool = True, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', title: str = 'discretized DFE', ax: plt.Axes = None, kwargs_legend: dict = {'prop': {'size': 8}})#
Plot discretized DFE.
- Parameters:
title (
str) – Title of the plotbootstrap_type (
Literal['percentile','bca']) – Type of bootstrapci_level (
float) – Confidence interval levelconfidence_intervals (
bool) – Whether to plot confidence intervalsfile (
str) – File to save the plot toshow (
bool) – Whether to show the plotintervals (
ndarray) – Array of interval boundaries over(-inf, inf)yieldingintervals.shape[0] - 1bars.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes
- get_alpha(params: dict = None)[source]#
Get alpha, the proportion of beneficial non-synonymous substitutions.
- Parameters:
params (
dict) – Parameters to use for calculation.- Return type:
float
- property alpha: float#
Cache alpha, the proportion of beneficial non-synonymous substitutions.
- get_bootstrap_params()[source]#
Get the parameters to be included in the bootstraps.
- Return type:
Dict[str,float]- Returns:
Parameters to be included in the bootstraps.
- get_bootstrap_param_names()[source]#
Get the parameters to be included in the bootstraps.
- Return type:
List[str]- Returns:
Parameters to be included in the bootstraps.
- get_optimized_param_names()[source]#
Get the parameters names for the parameters that are optimized.
- Return type:
List[str]- Returns:
List of parameter names.
- get_n_optimized()[source]#
Get the number of parameters that are optimized.
- Return type:
int- Returns:
Number of parameters that are optimized.
- create_config()[source]#
Create a config object from the inference object.
- Return type:
- Returns:
Config object.
- classmethod from_config(config: Config)[source]#
Load from config object.
- Parameters:
config (
Config) – Config object.- Return type:
Self- Returns:
Inference object.
- classmethod from_config_file(file: str, cache: bool = True)[source]#
Load from config file.
- Parameters:
file (
str) – Config file path, possibly URL.cache (
bool) – Whether to use the cache if available.
- Return type:
Self- Returns:
Inference object.
- get_x0()[source]#
Get initial values.
- Return type:
Dict[str,Dict[str,float]]- Returns:
Initial values.
- property param_names: List[str]#
Parameter names.
- Returns:
List of parameter names.
- classmethod from_file(file: str, classes=None)#
Load object from file.
- Parameters:
classes – Classes to be used for unserialization
file (
str) – File to load from
- Return type:
Self
- classmethod from_json(json: str, classes=None)#
Unserialize object.
- Parameters:
classes – Classes to be used for unserialization
json (
str) – JSON string
- Return type:
Self
- get_cis_params_mle(bootstrap_type: Literal['percentile', 'bca'] = 'percentile', ci_level: float = 0.05, param_names: list[str] | None = None)#
Get confidence intervals for the MLE parameters.
- Parameters:
bootstrap_type (
Literal['percentile','bca']) – Type of bootstrapci_level (
float) – Confidence interval levelparam_names (
Optional[list[str]]) – Names of the parameters to return confidence intervals for
- Return type:
Dict[str,Tuple[float,float]]- Returns:
Confidence intervals for the parameters
- get_discretized(intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile')#
Get discretized DFE.
- Parameters:
bootstrap_type (
Literal['percentile','bca']) – Type of bootstrapci_level (
float) – Confidence interval levelconfidence_intervals (
bool) – Whether to return confidence intervalsintervals (
ndarray) – Array of interval boundaries over(-inf, inf)yieldingintervals.shape[0] - 1bins.
- Return type:
(
ndarray,ndarray)- Returns:
Array of values and array of deviations
- to_file(file: str)#
Save object to file.
- Parameters:
file (
str) – File to save to
- class JointInference(sfs_neut: Spectra, sfs_sel: Spectra, include_divergence: bool = None, intervals_del: Tuple[float, float, int] = (-100000000.0, -1e-05, 1000), intervals_ben: Tuple[float, float, int] = (1e-05, 10000.0, 1000), integration_mode: Literal['midpoint', 'quad'] = 'midpoint', linearized: bool = True, model: Parametrization | str = 'GammaExpParametrization', seed: int = 0, x0: Dict[str, Dict[str, float]] = {}, bounds: Dict[str, Tuple[float, float]] = {}, scales: Dict[str, Literal['lin', 'log', 'symlog']] = {}, loss_type: Literal['likelihood', 'L2'] = 'likelihood', opts_mle: dict = {}, n_runs: int = 10, fixed_params: Dict[str, Dict[str, float]] = {}, shared_params: List[SharedParams] = [], covariates: List[Covariate] = [], do_bootstrap: bool = False, n_bootstraps: int = 100, n_bootstrap_retries: int = 2, parallelize: bool = True, folded: bool = None, **kwargs)[source]#
Bases:
BaseInferenceEnabling the sharing of parameters among several Inference objects.
Example usage:
import fastdfe as fd # neutral SFS for two types sfs_neut = fd.Spectra(dict( pendula=[177130, 997, 441, 228, 156, 117, 114, 83, 105, 109, 652], pubescens=[172528, 3612, 1359, 790, 584, 427, 325, 234, 166, 76, 31] )) # selected SFS for two types sfs_sel = fd.Spectra(dict( pendula=[797939, 1329, 499, 265, 162, 104, 117, 90, 94, 119, 794], pubescens=[791106, 5326, 1741, 1005, 756, 546, 416, 294, 177, 104, 41] )) # create inference object inf = fd.JointInference( sfs_neut=sfs_neut, sfs_sel=sfs_sel, fixed_params=dict(eps=0), # fix eps to 0 do_bootstrap=True ) # run inference inf.run()
- __init__(sfs_neut: Spectra, sfs_sel: Spectra, include_divergence: bool = None, intervals_del: Tuple[float, float, int] = (-100000000.0, -1e-05, 1000), intervals_ben: Tuple[float, float, int] = (1e-05, 10000.0, 1000), integration_mode: Literal['midpoint', 'quad'] = 'midpoint', linearized: bool = True, model: Parametrization | str = 'GammaExpParametrization', seed: int = 0, x0: Dict[str, Dict[str, float]] = {}, bounds: Dict[str, Tuple[float, float]] = {}, scales: Dict[str, Literal['lin', 'log', 'symlog']] = {}, loss_type: Literal['likelihood', 'L2'] = 'likelihood', opts_mle: dict = {}, n_runs: int = 10, fixed_params: Dict[str, Dict[str, float]] = {}, shared_params: List[SharedParams] = [], covariates: List[Covariate] = [], do_bootstrap: bool = False, n_bootstraps: int = 100, n_bootstrap_retries: int = 2, parallelize: bool = True, folded: bool = None, **kwargs)[source]#
Create instance.
- Parameters:
sfs_neut (
Spectra) – Neutral SFS. Note that we require monomorphic counts to be specified in order to infer the mutation rate.sfs_sel (
Spectra) – Selected SFS. Note that we require monomorphic counts to be specified in order to infer the mutation rate.include_divergence (
bool) – Whether to include divergence in the likelihoodintervals_del (
Tuple[float,float,int]) –(start, stop, n_interval)for deleterious population-scaled selection coefficients. The intervals will be log10-spaced.intervals_ben (
Tuple[float,float,int]) – Same asintervals_delbut for positive selection coefficients.integration_mode (
Literal['midpoint','quad']) – Integration mode,quadnot recommendedlinearized (
bool) – Whether to use the linearized model,Falsenot recommendedmodel (
Parametrization|str) – DFE parametrizationseed (
int) – Random seed. UseNonefor no seed.x0 (
Dict[str,Dict[str,float]]) – Dictionary of initial values in the form{type: {param: value}}bounds (
Dict[str,Tuple[float,float]]) – Bounds for the optimization in the form {param: (lower, upper)}scales (
Dict[str,Literal['lin','log','symlog']]) – Scales for the optimization in the form {param: scale}loss_type (
Literal['likelihood','L2']) – Loss typeopts_mle (
dict) – Options for the optimizationn_runs (
int) – Number of independent optimization runs out of which the best one is chosen. The first run will use the initial values if specified. Consider increasing this number if the optimization does not produce good results.fixed_params (
Dict[str,Dict[str,float]]) – Dictionary of fixed parameters in the form{type: {param: value}}shared_params (
List[SharedParams]) – List of shared parametersdo_bootstrap (
bool) – Whether to perform bootstrappingn_bootstraps (
int) – Number of bootstrapsn_bootstrap_retries (
int) – Number of retries for bootstraps that did not terminate normally.parallelize (
bool) – Whether to parallelize computationsfolded (
bool) – Whether the SFS are folded. If not specified, the SFS will be folded if all of the given SFS appear to be folded.kwargs – Additional keyword arguments which are ignored.
- types: List[str]#
SFS types
- params_mle_raw: Optional[Dict[str, Dict[str, float]]]#
original MLE parameters before adding covariates and unpacked shared
Shared parameters with expanded ‘all’ type
- covariates: Dict[str, Covariate]#
Covariates indexed by parameter name
- fixed_params: Dict[str, Dict[str, float]]#
fixed parameters with expanded ‘all’ type
- marginal_inferences: Dict[str, BaseInference]#
Dictionary of marginal inferences indexed by type
- scales: Dict[str, Literal['lin', 'log', 'symlog']]#
parameter scales
- bounds: Dict[str, Tuple[float, float]]#
parameter bounds
- optimization: Optimization#
Joint optimization instance
- joint_inferences: Dict[str, BaseInference]#
Joint inference object indexed by type
Check if the shared parameters were specified correctly.
Check that no shared parameters are fixed and raise an error otherwise.
Get the names of the shared parameters.
- Return type:
List[str]- Returns:
Names of the shared parameters.
- run(**kwargs)[source]#
Run inference.
- Parameters:
kwargs – Additional keyword arguments
- Return type:
- Returns:
Modelled SFS.
- marginals_without_all()[source]#
Get marginal inference without ‘all’ type.
- Return type:
Dict[str,BaseInference]- Returns:
Dict of marginal inferences indexed by type.
- joint_inference_makes_sense()[source]#
Check if joint inference makes sense.
- Return type:
bool- Returns:
Whether joint inference makes sense.
- get_counts()[source]#
Get callback functions for modelling SFS counts from given parameters.
- Return type:
dict- Returns:
Dict of callback functions indexed by type.
- run_joint_without_covariates(do_bootstrap: bool = True)[source]#
Run joint inference without covariates. Note that the result of this function is cached.
- Return type:
- Returns:
Joint inference instance devoid of covariates.
- create_config()[source]#
Create a config object from the inference object.
- Return type:
- Returns:
Config object.
- perform_lrt_covariates(do_bootstrap: bool = True)[source]#
Perform likelihood ratio test against joint inference without covariates. In the simple model we share parameters across types. Low p-values indicate that the covariates provide a significant improvement in the fit.
To access the JointInference object without covariates, you can call
run_joint_without_covariates(), which is cached.- Parameters:
do_bootstrap (
bool) – Whether to bootstrap. This improves the accuracy of the p-value. Note that if bootstrapping was performed previously without updating the likelihood, this won’t have any effect.- Return type:
float- Returns:
Likelihood ratio test p-value.
- bootstrap(n_samples: int = None, parallelize: bool = None, n_retries: int = None, update_likelihood: bool = True, **kwargs)[source]#
Perform the parametric bootstrap both for the marginal and joint inferences.
- Parameters:
n_samples (
int) – Number of bootstrap samples. Defaults ton_bootstraps.parallelize (
bool) – Whether to parallelize computations. Defaults toparallelize.n_retries (
int) – Number of retries for bootstraps that did not terminate normally. Defaults ton_bootstrap_retries.update_likelihood (
bool) – Whether to update the likelihood
- Return type:
Optional[DataFrame]- Returns:
DataFrame with bootstrap samples
- get_x0()[source]#
Get initial values for joint inference.
- Return type:
Dict[str,Dict[str,float]]- Returns:
Dictionary of initial values indexed by type
Compare likelihood of joint inference with product of marginal likelihoods. This provides information about the goodness of fit achieved by the parameter sharing. Low p-values indicate that parameter sharing is not justified, i.e., that the marginal inferences provide a better fit to the data. Note that it is more difficult to properly optimize the joint likelihood, which makes this test conservative, i.e., the reported p-value might be larger than what it really is.
- Parameters:
do_bootstrap (
bool) – Whether to perform bootstrapping. This improves the accuracy of the p-value. Note that if bootstrapping was performed previously without updating the likelihood, this won’t have any effect.- Return type:
float- Returns:
p-value
- get_inferences(types: List[str] = None, labels: List[str] = None, show_marginals: bool = True)[source]#
Get all inference objects as dictionary.
- Parameters:
types (
List[str]) – Types to includelabels (
List[str]) – Labels for typesshow_marginals (
bool) – Whether to also show marginal inferences
- Return type:
Dict[str,BaseInference]- Returns:
Dictionary of base inference objects indexed by type and joint vs marginal subtypes
- plot_discretized(file: str = None, show: bool = True, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), show_marginals: bool = True, confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', title: str = 'discretized DFE comparison', labels: List[str] = None, kwargs_legend: dict = {'prop': {'size': 8}}, ax: plt.Axes = None)[source]#
Plot discretized DFE comparing the different types.
- Parameters:
labels (
List[str]) – Labels for typestitle (
str) – Title of plotintervals (
ndarray) – Array of interval boundaries over(-inf, inf)yieldingintervals.shape[0] - 1bars.show_marginals (
bool) – Whether to also show marginal inferencesbootstrap_type (
Literal['percentile','bca']) – Type of bootstrapci_level (
float) – Confidence levelconfidence_intervals (
bool) – Whether to plot confidence intervalsfile (
str) – File to save plot toshow (
bool) – Whether to show plotax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_sfs_comparison(sfs_types: List[Literal['modelled', 'observed', 'selected', 'neutral']] = ['observed', 'modelled'], types: List[str] = None, labels: List[str] = None, colors: List[str] = None, file: str = None, show: bool = True, ax: plt.Axes = None, title: str = 'SFS comparison', use_subplots: bool = False, show_monomorphic: bool = False, kwargs_legend: dict = {'prop': {'size': 8}})[source]#
Plot SFS comparison.
- Parameters:
types (
List[str]) – Types to plotfile (
str) – File to save plot tolabels (
List[str]) – Labels for types.colors (
List[str]) – Colors for types. Only for Python visualization backend.sfs_types (
List[Literal['modelled','observed','selected','neutral']]) – Types of SFS to plotshow (
bool) – Whether to show plotax (plt.Axes) – Axes to plot on. Only for Python visualization backend and if
use_subplotsisFalse.title (
str) – Plot titleuse_subplots (
bool) – Whether to use subplots. Only for Python visualization backend.show_monomorphic (
bool) – Whether to show monomorphic countskwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_continuous(file: str = None, show: bool = True, intervals: ndarray = None, confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', title: str = 'DFE comparison', labels: List[str] = None, scale_density: bool = False, scale: Literal['lin', 'log', 'symlog'] = 'lin', kwargs_legend: dict = {'prop': {'size': 8}}, ax: plt.Axes = None)[source]#
Plot discretized DFE. The special constants
np.infand-np.infare also valid interval bounds. By default, the PDF is plotted as is. Due to the logarithmic scale on the x-axis, we may get a wrong intuition on how the mass is distributed, however. To get a better intuition, we can optionally scale the density by the x-axis interval size usingscale_density = True. This has the disadvantage that the density now changes for x, so that even a constant density will look warped.- Parameters:
scale_density (
bool) – Whether to scale the density by the x-axis interval sizescale (
Literal['lin','log','symlog']) – y-scalelabels (
List[str]) – Labels for typestitle (
str) – Title of plotbootstrap_type (
Literal['percentile','bca']) – Type of bootstrapci_level (
float) – Confidence levelconfidence_intervals (
bool) – Whether to plot confidence intervalsfile (
str) – File to save plot toshow (
bool) – Whether to show plotintervals (
ndarray) – Array of interval boundaries yieldingintervals.shape[0] - 1bins.kwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_inferred_parameters(file: str = None, confidence_intervals: bool = True, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', ci_level: float = 0.05, show: bool = True, title: str = 'inferred parameters', labels: List[str] = None, ax: plt.Axes = None, scale: Literal['lin', 'log', 'symlog'] = 'log', kwargs_legend: dict = {'loc': 'upper right', 'prop': {'size': 8}}, **kwargs: List[str])[source]#
Plot discretized DFE comparing the different types.
- Parameters:
labels (
List[str]) – Labels for typestitle (
str) – Title of plotbootstrap_type (
Literal['percentile','bca']) – Type of bootstrapci_level (
float) – Confidence levelconfidence_intervals (
bool) – Whether to plot confidence intervalsfile (
str) – File to save plot toshow (
bool) – Whether to show plotax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
scale (
Literal['lin','log','symlog']) – y-scalekwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_inferred_parameters_boxplot(file: str = None, show: bool = True, title: str = 'inferred parameters', labels: List[str] = None, **kwargs: List[str])[source]#
Plot discretized DFE comparing the different types.
- Parameters:
labels (
List[str]) – Labels for typestitle (
str) – Title of plotfile (
str) – File to save plot toshow (
bool) – Whether to show plot
- Return type:
plt.Axes
- Returns:
Axes object
- Raises:
ValueError – If no inference objects are given or no bootstraps are found.
- plot_covariate(index: int = 0, file: str = None, show: bool = True, title: str = None, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', show_types: bool = True, ci_level: float = 0.05, xlabel: str = 'cov', ylabel: str = None, ax: plt.Axes = None)[source]#
Plot the covariate given by the index.
- Parameters:
index (
int) – The index of the covariate.file (
str) – File to save plot to.show (
bool) – Whether to show plot.title (
str) – Plot title.bootstrap_type (
Literal['percentile','bca']) – Bootstrap type.show_types (
bool) – Whether to show types on second x-axis.ci_level (
float) – Confidence level.xlabel (
str) – X-axis label.ylabel (
str) – Y-axis label, defaults to the covariate parameter name.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object.
- get_cis_params_mle(bootstrap_type: Literal['percentile', 'bca'] = 'percentile', ci_level: float = 0.05, param_names: list[str] | None = None)[source]#
Get confidence intervals for the parameters.
- Returns:
Confidence intervals for the parameters
- get_discretized(intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile')[source]#
Get discretized DFEs.
- Parameters:
bootstrap_type (
Literal['percentile','bca']) – Type of bootstrapci_level (
float) – Confidence interval levelconfidence_intervals (
bool) – Whether to return confidence intervalsintervals (
ndarray) – Array of interval boundaries over(-inf, inf)yieldingintervals.shape[0] - 1bins.
- Return type:
Dict[str,Tuple[ndarray,ndarray]]- Returns:
Dictionary of array of values and array of errors indexed by inference type
- get_bootstrap_params()[source]#
Get bootstrap parameters.
- Return type:
Dict[str,float]- Returns:
Bootstrap parameters
- to_json()[source]#
Serialize object. Note that the deserialized inference objects no longer share the same optimization instance among other things.
- Return type:
str- Returns:
JSON string
- property alpha: float | None#
The is no single alpha for the joint inference. Please refer to the
self.joint_inferences[t].alpha.- Returns:
None
- get_residual(k: int)[source]#
Residual of joint inference. We calculate the residual over the jointly inferred SFS for all types.
- Parameters:
k (
int) – Order of the norm- Return type:
float- Returns:
L2 residual
- check_fixed_params_exist()#
Check that the fixed parameters are valid.
- compare_nested(complex: BaseInference)#
Perform likelihood ratio test with given more complex model. The given model’s fixed parameters need to be a proper subset of this model’s fixed parameters.
- Parameters:
complex (
BaseInference) – More complex model.- Return type:
float|None- Returns:
p-value
- compare_nested_models(do_bootstrap: bool = True)#
Compare the various nested versions of the specified model using likelihood ratio tests.
- Parameters:
do_bootstrap (
bool) – Whether to perform bootstrapping. This is recommended to get more accurate p-values.- Return type:
(
ndarray,Dict[str, BaseInference])- Returns:
Matrix of p-values, dict of base inference objects
- evaluate_likelihood(params: Dict[str, Dict[str, float]])#
Evaluate likelihood function at given parameters.
- Parameters:
params (
Dict[str,Dict[str,float]]) – Parameters indexed by type and parameter name.- Return type:
float- Returns:
The likelihood.
- classmethod from_config(config: Config)#
Load from config object.
- Parameters:
config (
Config) – Config object.- Return type:
Self- Returns:
Inference object.
- classmethod from_config_file(file: str, cache: bool = True)#
Load from config file.
- Parameters:
file (
str) – Config file path, possibly URL.cache (
bool) – Whether to use the cache if available.
- Return type:
Self- Returns:
Inference object.
- classmethod from_file(file: str, classes=None)#
Load object from file.
- Parameters:
classes – Classes to be used for unserialization
file (
str) – File to load from
- Return type:
Self
- classmethod from_json(json: str, classes=None)#
Unserialize object.
- Parameters:
classes – Classes to be used for unserialization
json (
str) – JSON string
- Return type:
Self
- get_alpha(params: dict = None)#
Get alpha, the proportion of beneficial non-synonymous substitutions.
- Parameters:
params (
dict) – Parameters to use for calculation.- Return type:
float
- get_bootstrap_param_names()#
Get the parameters to be included in the bootstraps.
- Return type:
List[str]- Returns:
Parameters to be included in the bootstraps.
- get_bounds_linear()#
Get linear bounds for all parameters. We do this for the bootstraps as x0 should be close to MLE.
- Return type:
Dict[str,Tuple[float,float]]- Returns:
Dictionary of bounds.
- get_errors_discretized_dfe(ci_level: float = 0.05, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), bootstrap_type: Literal['percentile', 'bca'] = 'percentile')#
Get the discretized DFE errors.
- Parameters:
ci_level (
float) – Confidence interval level.intervals (
ndarray) – Array of interval boundaries yieldingintervals.shape[0] - 1bins.bootstrap_type (
Literal['percentile','bca']) – Type of bootstrap to use.
- Return type:
ndarray- Returns:
Arrays of errors, confidence intervals, bootstraps, means and values
- get_fixed_param_names()#
Get the names of the fixed parameters.
- Return type:
List[str]
- get_n_optimized()#
Get the number of parameters that are optimized.
- Return type:
int- Returns:
Number of parameters that are optimized.
- get_optimized_param_names()#
Get the parameters names for the parameters that are optimized.
- Return type:
List[str]- Returns:
List of parameter names.
- get_scales_linear()#
Get linear scales for all parameters. We do this for the bootstraps as x0 should be close to MLE.
- Return type:
Dict[str,Literal['lin']]- Returns:
Dictionary of scales.
- get_summary()#
Get summary.
- Return type:
InferenceResults- Returns:
Inference results.
- lrt(ll_simple: float, ll_complex: float, df: int = 1)#
Perform the likelihood ratio test (LRT).
Note
We do not adjust the test for parameters that are on the boundary of the parameter space.
- Parameters:
ll_simple (
float) – Log-likelihood of the simple model.ll_complex (
float) – Log-likelihood of the complex model.df (
int) – Degrees of freedom.
- Return type:
float- Returns:
p-value
- property param_names: List[str]#
Parameter names.
- Returns:
List of parameter names.
- plot_all(show: bool = True)#
Plot a bunch of plots.
- Parameters:
show (
bool) – Whether to show plot.
- plot_bucket_sizes(file: str = None, show: bool = True, title: str = 'bucket sizes', ax: plt.Axes = None)#
Plot mass in each bucket for the MLE DFE. This can be used to check if the interval bounds and spacing are chosen appropriately.
- Parameters:
file (
str) – File to save plot to.show (
bool) – Whether to show plot.title (
str) – Plot title.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_interval_density(file: str = None, show: bool = True, intervals: Sequence = array([-inf, -100., -10., -1., 0., 1., inf]), interval_labels: List[str] = None, color: str = 'C0', ax: plt.Axes = None)#
Plot density of the discretization intervals chosen. Note that although this plot looks similar, this is not the DFE!
- Parameters:
color (
str) – Color of the bars.file (
str) – File to save plot to.show (
bool) – Whether to show plot.interval_labels (
List[str]) – Labels for the intervals.intervals (
Sequence) – Array of interval boundaries yieldingintervals.shape[0] - 1bins.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_likelihoods(file: str = None, show: bool = True, title: str = 'likelihoods', scale: Literal['lin', 'log'] = 'lin', ax: plt.Axes = None, ylabel: str = 'lnl', **kwargs)#
Visualize the likelihoods of the optimization runs using a scatter plot.
- Parameters:
scale (
Literal['lin','log']) – y-scale of the plot.title (
str) – Plot title.file (
str) – File to save plot to.show (
bool) – Whether to show plot.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
ylabel (
str) – Label for the y-axis.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_nested_models(file: str = None, show: bool = True, remove_empty: bool = False, transpose: bool = False, cmap: str = None, title: str = 'nested model comparison', ax: plt.Axes = None, do_bootstrap: bool = True)#
Plot the p-values of nested likelihoods.
- Parameters:
file (
str) – File to save plot to.show (
bool) – Whether to show plot.remove_empty (
bool) – Whether to remove empty rows and columns.transpose (
bool) – Whether to transpose the matrix.cmap (
str) – Colormap to use.title (
str) – Plot title.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
do_bootstrap (
bool) – Whether to perform bootstrapping. This is recommended to get more accurate p-values.
- Return type:
plt.Axes
- Returns:
Axes object
- plot_observed_sfs(labels: List[str] = None, file: str = None, show: bool = True, ax: plt.Axes = None)#
Plot neutral and selected SFS.
- Parameters:
file (
str) – File to save plot to.labels (
List[str]) – Labels for the SFS.show (
bool) – Whether to show plot.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes object
- raise_if_locked()#
Raise an error if this object is locked.
- Raises:
Exception –
- run_if_required(*args, **kwargs)#
Run if not run yet.
- Parameters:
args – Arguments.
kwargs – Keyword arguments.
- Return type:
Optional[Spectrum]- Returns:
DFE parametrization and modelled SFS.
- to_file(file: str)#
Save object to file.
- Parameters:
file (
str) – File to save to
- sfs_neut: Spectrum#
Neutral SFS
- sfs_sel: Spectrum#
Selected SFS
- model: Parametrization#
The DFE parametrization
- folded: bool#
Whether the SFS are folded
- n: int#
Sample size
- discretization: Discretization#
Discretization instance
- theta: float#
Estimate of theta from neutral SFS
- params_mle: Optional[Dict[str, float]]#
MLE estimates of the initial optimization
- sfs_mle: Optional[Spectrum]#
Modelled MLE SFS
- likelihood: Optional[float]#
Likelihood of the MLE, this value may be updated after bootstrapping
- likelihoods: Optional[List[float]]#
Likelihoods of the different ML runs, controlled by
n_runs
- n_runs: int#
Number of MLE runs to perform
- result: Optional['scipy.optimize.OptimizeResult']#
Numerical optimization result
- do_bootstrap: bool#
Whether to do bootstrapping
- n_bootstraps: int#
Number of bootstraps
- n_bootstrap_retries: int#
Number of retries for bootstraps that did not terminate normally
- parallelize: bool#
Whether to parallelize computations
- x0: Dict[str, Dict[str, float]]#
Initial values
- bootstraps: Optional[pd.DataFrame]#
Bootstrapped MLE parameter estimates
- bootstrap_results: Optional[List['scipy.optimize.OptimizeResult']]#
Bootstrap optimization results
- L2_residual: Optional[float]#
L2 norm of differential between modelled and observed SFS
- seed: int | None#
Random number generator seed
- rng: np.random.Generator#
Random generator instance
- execution_time: float#
Total execution time in seconds
- locked: bool#
Whether inferences can be run from the class itself
- class Inference[source]#
Bases:
objectStatic utility methods for inference objects.
- static plot_discretized(inferences: List[AbstractInference], intervals: Sequence = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', file: str = None, show: bool = True, title: str = 'discretized DFEs', labels: Sequence = None, ax: plt.Axes = None, kwargs_legend: dict = {'prop': {'size': 8}}, **kwargs)[source]#
Visualize several discretized DFEs given by the list of inference objects.
- Parameters:
inferences (
List[AbstractInference]) – List of inference objects.intervals (
Sequence) – Intervals over(-inf, inf)to use for discretization.confidence_intervals (
bool) – Whether to plot confidence intervals.ci_level (
float) – Confidence level for confidence intervals.bootstrap_type (
Literal['percentile','bca']) – Type of bootstrap to use for confidence intervals.file (
str) – Path to file to save the plot to.show (
bool) – Whether to show the plot.title (
str) – Title of the plot.labels (
Sequence) – Labels for the DFEs.kwargs – Additional arguments for the plot.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes of the plot.
- static plot_continuous(inferences: List[AbstractInference], intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', file: str = None, show: bool = True, title: str = 'continuous DFEs', labels: Sequence = None, scale: Literal['lin', 'log', 'symlog'] = 'lin', scale_density: bool = False, ax: plt.Axes = None, kwargs_legend: dict = {'prop': {'size': 8}}, **kwargs)[source]#
Visualize several DFEs given by the list of inference objects. By default, the PDF is plotted as is. Due to the logarithmic scale on the x-axis, we may get a wrong intuition on how the mass is distributed, however. To get a better intuition, we can optionally scale the density by the x-axis interval size using
scale_density = True. This has the disadvantage that the density now changes for x, so that even a constant density will look warped.- Parameters:
inferences (
List[AbstractInference]) – List of inference objects.intervals (
ndarray) – Intervals to use for discretization.confidence_intervals (
bool) – Whether to plot confidence intervals.ci_level (
float) – Confidence level for confidence intervals.bootstrap_type (
Literal['percentile','bca']) – Type of bootstrap to use for confidence intervals.file (
str) – Path to file to save the plot to.show (
bool) – Whether to show the plot.title (
str) – Title of the plot.labels (
Sequence) – Labels for the DFEs.scale (
Literal['lin','log','symlog']) – y-scale of the plot.scale_density (
bool) – Whether to scale the density by the x-axis interval size.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.kwargs – Additional arguments for the plot.
- Return type:
plt.Axes
- Returns:
Axes of the plot.
- static plot_inferred_parameters(inferences: List[AbstractInference], labels: Sequence, confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile', file: str = None, show: bool = True, title: str = 'parameter estimates', scale: Literal['lin', 'log', 'symlog'] = 'log', ax: plt.Axes = None, kwargs_legend: dict = {'loc': 'upper right', 'prop': {'size': 8}}, **kwargs)[source]#
Visualize several discretized DFEs given by the list of inference objects. Note that the DFE parametrization needs to be the same for all inference objects.
- Parameters:
inferences (
List[AbstractInference]) – List of inference objects.labels (
Sequence) – Unique labels for the DFEs.scale (
Literal['lin','log','symlog']) – y-scale of the plot.confidence_intervals (
bool) – Whether to plot confidence intervals.ci_level (
float) – Confidence level for confidence intervals.bootstrap_type (
Literal['percentile','bca']) – Type of bootstrap to use for confidence intervals.file (
str) – Path to file to save the plot to.show (
bool) – Whether to show the plot.title (
str) – Title of the plot.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
kwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.kwargs – Additional arguments which are ignored.
- Return type:
plt.Axes
- Returns:
Axes of the plot.
- Raises:
ValueError – If no inference objects are given.
- static plot_inferred_parameters_boxplot(inferences: List[AbstractInference], labels: Sequence, file: str = None, show: bool = True, title: str = 'parameter estimates', **kwargs)[source]#
Visualize several discretized DFEs given by the list of inference objects. Note that the DFE parametrization needs to be the same for all inference objects.
- Parameters:
inferences (
List[AbstractInference]) – List of inference objects.labels (
Sequence) – Unique labels for the DFEs.file (
str) – Path to file to save the plot to.show (
bool) – Whether to show the plot.title (
str) – Title of the plot.kwargs – Additional arguments for the plot.
- Return type:
plt.Axes
- Returns:
Axes of the plot.
- Raises:
ValueError – If no inference objects are given or no bootstraps are found.
- static get_cis_params_mle(bootstrap_type: Literal['percentile', 'bca'], ci_level: float, inferences: List[AbstractInference], labels: Sequence, param_names: Sequence)[source]#
Get confidence intervals for the MLE parameters.
- Parameters:
bootstrap_type (
Literal['percentile','bca']) – Type of bootstrap to use for confidence intervals.ci_level (
float) – Confidence level for confidence intervals.inferences (
List[AbstractInference]) – List of inference objects.labels (
Sequence) – Labels for the DFEs.param_names (
Sequence) – Names of the parameters.
- Return type:
dict[str,Optional[Dict[str,Tuple[float,float]]]]- Returns:
Dictionary of confidence intervals indexed by labels.
- static get_errors_params_mle(bootstrap_type: Literal['percentile', 'bca'], ci_level: float, confidence_intervals: bool, inferences: List[AbstractInference], labels: Sequence, param_names: Sequence)[source]#
Get errors and values for MLE params of inferences.
- Parameters:
bootstrap_type (
Literal['percentile','bca']) – Type of bootstrap to use.ci_level (
float) – Confidence level for confidence intervals.confidence_intervals (
bool) – Whether to compute confidence intervals.inferences (
List[AbstractInference]) – List of inference objects.labels (
Sequence) – Labels for the inferences.param_names (
Sequence) – Names of the parameters to get errors and values for.
- Return type:
(
Dict[str,Optional[Tuple[ndarray,ndarray]]],Dict[str,ndarray])- Returns:
dictionary of errors and dictionary of values
- static get_discretized(inferences: List[AbstractInference], labels: Sequence, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), confidence_intervals: bool = True, ci_level: float = 0.05, bootstrap_type: Literal['percentile', 'bca'] = 'percentile')[source]#
Get values and errors of discretized DFE.
- Parameters:
inferences (
List[AbstractInference]) – List of inference objects.labels (
Sequence) – Labels for the DFEs.bootstrap_type (
Literal['percentile','bca']) – Type of bootstrap to useci_level (
float) – Confidence interval levelconfidence_intervals (
bool) – Whether to compute confidence intervalsintervals (
ndarray) – Array of interval boundaries over(-inf, inf)yieldingintervals.shape[0] - 1bars.
- Return type:
(
ndarray,Optional[ndarray])- Returns:
Array of values and array of errors
- static get_stats_discretized(params: dict, bootstraps: DataFrame, model: Parametrization | str, ci_level: float = 0.05, intervals: ndarray = array([-inf, -100., -10., -1., 0., 1., inf]), bootstrap_type: Literal['percentile', 'bca'] = 'percentile')[source]#
Compute errors and confidence interval for a discretized DFE.
- Parameters:
params (
dict) – Parameters of the modelbootstraps (
DataFrame) – Bootstrapped samplesmodel (
Parametrization|str) – DFE parametrizationci_level (
float) – Confidence interval levelintervals (
ndarray) – Array of interval boundaries yieldingintervals.shape[0] - 1bins.bootstrap_type (
Literal['percentile','bca']) – Type of bootstrap
- Return type:
(
ndarray,ndarray,ndarray,ndarray,ndarray)- Returns:
Arrays of errors, confidence intervals, bootstraps, means and values
- static compute_histogram(model: Parametrization | str, params: dict, intervals: ndarray)[source]#
Discretize the DFE given a DFE parametrization and its parameter values.
- Parameters:
model (
Parametrization|str) – DFE parametrizationparams (
dict) – Parameters of the modelintervals (
ndarray) – Array of interval boundaries yieldingintervals.shape[0] - 1bins.
- Return type:
ndarray- Returns:
Discretized DFE
Bases:
objectClass specifying the sharing of params among types.
allmeans all available types or params.Example usage:
import fastdfe as fd # neutral SFS for two types sfs_neut = fd.Spectra(dict( pendula=[177130, 997, 441, 228, 156, 117, 114, 83, 105, 109, 652], pubescens=[172528, 3612, 1359, 790, 584, 427, 325, 234, 166, 76, 31] )) # selected SFS for two types sfs_sel = fd.Spectra(dict( pendula=[797939, 1329, 499, 265, 162, 104, 117, 90, 94, 119, 794], pubescens=[791106, 5326, 1741, 1005, 756, 546, 416, 294, 177, 104, 41] )) # create inference object inf = fd.JointInference( sfs_neut=sfs_neut, sfs_sel=sfs_sel, shared_params=[fd.SharedParams(types=["pendula", "pubescens"], params=["eps", "S_d"])], do_bootstrap=True ) # run inference inf.run()
The params to share
The types to share
- class Covariate(param: str, values: Dict[str, float], callback: Callable | None = None, bounds: tuple = (0.0001, 10000.0), x0: float = 0, bounds_scale: Literal['lin', 'log', 'symlog'] = 'symlog')[source]#
Bases:
objectClass defining a covariate which induces a relationship with one or many parameters. The relationship is defined by a callback function which modifies the parameters. The default callback introduces a linear relationship.
Below an example of introducing linear covariates for
S_d, the for mean strength of deleterious mutations (cf.GammaExpParametrization). Each of the three types is associated with one covariate. This we pass toJointInferencetogether with the stratified spectra:import fastdfe as fd cov = fd.Covariate( param='S_d', values=dict(type1=5, type2=3, type3=1) )
-
param:
str# The parameter to modify
-
values:
Dict[str,float]# The values of the covariate for each type
-
callback:
Optional[Callable] = None# The callback function to modify the parameters
-
bounds:
tuple= (0.0001, 10000.0)# The bounds of the covariate parameter to be estimated
-
x0:
float= 0# The initial value of the covariate
-
bounds_scale:
Literal['lin','log','symlog'] = 'symlog'# The scale of the bounds. See
scale_value()for details
- apply(covariate: float, type: str, params: Dict[str, float])[source]#
Apply the custom or default callback to modify the given parameters.
- Parameters:
covariate (
float) – The value of the covariate.type (
str) – The type of the relationship.params (
Dict[str,float]) – The input parameters.
- Return type:
Dict[str,float]- Returns:
Modified parameters.
- apply_default(covariate: float, type: str, params: Dict[str, float])[source]#
Modify the given parameters introducing a linear relationship with the given covariate.
- Parameters:
covariate (
float) – The value of the covariate.type (
str) – The type of the relationship.params (
Dict[str,float]) – The input parameters.
- Return type:
Dict[str,float]- Returns:
Modified parameters.
- __init__(param: str, values: Dict[str, float], callback: Callable | None = None, bounds: tuple = (0.0001, 10000.0), x0: float = 0, bounds_scale: Literal['lin', 'log', 'symlog'] = 'symlog')#
-
param: