Quickstart

The easiest way to get started is to use the BaseInference class, which infers the distribution of fitness effects (DFE) from a single pair of site-frequency spectra (SFS), one neutral and one selected. In this example we create Spectrum objects holding the SFS counts and pass them to BaseInference. Note that the counts must include the monomorphic sites: the first and last entries give the number of mono-allelic sites at which the ancestral and the derived allele is fixed, respectively. By default, only the deleterious part of the DFE is inferred (cf. fixed_params).

library(fastdfe)
fd <- load_fastdfe()

# configure inference
inf <- fd$BaseInference(
  sfs_neut = fd$Spectrum(c(177130, 997, 441, 228, 156, 117, 114, 83, 105, 109, 0)),
  sfs_sel = fd$Spectrum(c(797939, 1329, 499, 265, 162, 104, 117, 90, 94, 119, 0)),
  do_bootstrap = FALSE
)

# run inference
sfs_models <- fd$BaseInference$run(inf)

INFO:BaseInference: No divergence counts provided, inferring from polymorphism only.
INFO:Discretization: Precomputing semidominant DFE-SFS transformation using midpoint integration.
Discretization>Precomputing: 100%|██████████| 9/9 [00:00<00:00, 17.35it/s]
INFO:Optimization: Optimizing 2 parameters: [all.b, all.S_d].
BaseInference>Performing inference: 100%|██████████| 10/10 [00:00<00:00, 72.29it/s]
INFO:BaseInference: Inference results: {all.S_d: -3.389e+04 ± 3.9, all.b: 0.1305 ± 1.6e-06, all.p_b: 0 ± 0, all.S_b: 1 ± 0, all.eps: 0 ± 0, all.h: 0.5 ± 0, likelihood: -35.44 ± 4.1e-09} (best_run ± std_across_runs)
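
The layout of the count vectors passed to Spectrum above can be checked with a few lines of base R. This is only a sketch of the usual SFS convention (entry i holds the sites with i - 1 derived copies); the variable names are illustrative and nothing here calls fastdfe.

```r
# Counts copied from the neutral spectrum used above.
counts <- c(177130, 997, 441, 228, 156, 117, 114, 83, 105, 109, 0)

# A spectrum with k entries describes a sample of n = k - 1 gene copies:
# entry i (1-based) counts the sites carrying i - 1 copies of the derived allele.
n <- length(counts) - 1

n_ancestral_mono <- counts[1]               # 0 derived copies
n_derived_mono   <- counts[length(counts)]  # n derived copies
polymorphic      <- counts[2:n]             # 1 .. n - 1 derived copies

n_sites       <- sum(counts)
n_polymorphic <- sum(polymorphic)

cat("sample size:", n, "\n")
cat("total sites:", n_sites, "\n")
cat("polymorphic sites:", n_polymorphic, "\n")
cat("fraction polymorphic:", n_polymorphic / n_sites, "\n")
```

Most sites are monomorphic, which is why the first entry dominates the vector.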

fastdfe uses maximum likelihood estimation (MLE) to find the DFE. By default, 10 local optimization runs are carried out to make sure a reasonably good global optimum has been found. The DFE also needs to be parametrized; GammaExpParametrization is used by default. We also report the standard deviation across optimization runs to give an idea of the reliability of the estimates. In this case, the standard deviations are low, indicating that the estimates are stable.
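
The idea of running several local optimizations and comparing them can be illustrated on a toy problem. The sketch below fits a gamma distribution to simulated data from 10 random starting points with base R's optim and reports the best run together with the standard deviation across runs. It is not fastdfe's optimizer or likelihood; the data, the starting ranges, and all names are assumptions made for illustration.

```r
set.seed(1)

# Toy data (assumption): 500 draws from a gamma distribution (shape 0.5, scale 4).
x <- rgamma(500, shape = 0.5, scale = 4)

# Negative log-likelihood, parametrized on the log scale so that
# shape and scale stay positive during unconstrained optimization.
nll <- function(log_par) {
  -sum(dgamma(x, shape = exp(log_par[1]), scale = exp(log_par[2]), log = TRUE))
}

# 10 local optimization runs (Nelder-Mead) from random starting points.
n_runs <- 10
runs <- lapply(seq_len(n_runs), function(i) {
  optim(par = runif(2, min = -2, max = 2), fn = nll)
})

# Collect estimates (back on the natural scale) and log-likelihoods.
estimates <- t(sapply(runs, function(r) exp(r$par)))
colnames(estimates) <- c("shape", "scale")
loglik <- -sapply(runs, function(r) r$value)

# Best run and spread of the estimates across runs.
best <- which.max(loglik)
best_estimate <- estimates[best, ]
std_across_runs <- apply(estimates, 2, sd)

print(best_estimate)
print(std_across_runs)
```

A small spread across runs suggests that the runs converged to the same optimum; a large spread would hint at several local optima or a flat likelihood surface.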

We can now plot the inferred DFE in discretized form (cf. plot_discretized()).

p <- fd$BaseInference$plot_discretized(inf)

../../_images/10f5a148b075ede4feba845aca0a8a4f9f0b65772a7424edaf2e98028e3aa567.png
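
To get a feeling for what a discretized DFE contains, the bin masses can be approximated by hand from the point estimates in the log output above. The sketch below assumes that the deleterious part of GammaExpParametrization is a gamma distribution with mean |S_d| and shape b, and it uses bin boundaries at |S| = 1, 10 and 100 chosen for illustration; both are assumptions, so the numbers need not match the plot exactly.

```r
# Point estimates copied from the log output above
# (p_b = 0, so all probability mass is deleterious).
S_d <- -3.389e+04  # mean of the deleterious part
b   <- 0.1305      # gamma shape parameter

# ASSUMPTION: illustrative bin boundaries on |S|, not read from fastdfe.
breaks <- c(0, 1, 10, 100, Inf)

# ASSUMPTION: gamma distribution with mean |S_d| and shape b,
# i.e. scale = |S_d| / b.
cdf  <- pgamma(breaks, shape = b, scale = abs(S_d) / b)
mass <- diff(cdf)
names(mass) <- c("[-1, 0)", "[-10, -1)", "[-100, -10)", "(-Inf, -100)")

# Print from most to least deleterious.
print(rev(mass))
```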

We can also plot a comparison of the modelled and observed selected SFS (cf. plot_sfs_comparison()).

p <- fd$BaseInference$plot_sfs_comparison(inf)

../../_images/5239675be58fde45cef210c503f9d0eba930a56c622744fcdd1fa22abe9db9d6.png

Bootstrapping

To quantify uncertainty we can perform parametric bootstrapping (cf. bootstrap()).

bs <- inf$bootstrap()

# redo the plotting
p <- fd$BaseInference$plot_discretized(inf)

BaseInference>Bootstrapping (2 runs each): 100%|██████████| 100/100 [00:02<00:00, 46.82it/s]
INFO:BaseInference: Bootstrap summary: {all.S_d: -5.482e+04 ± 3.9e+04, all.b: 0.1331 ± 0.022, all.p_b: 0 ± 0, all.S_b: 1 ± 0, all.eps: 0 ± 0, all.h: 0.5 ± 0, likelihood: -42.9 ± 5.9, i_best_run: 0.47 ± 0.5, likelihoods_std: 0.003011 ± 0.022} (mean ± std)

../../_images/f88fe589bd54713f79f29702d9c29e63ec65a38b1f2ff739e91fba06e5a1f880.png

By default, we perform 2 optimization runs per bootstrap sample and keep the best result (cf. n_bootstrap_retries). The standard deviation across runs is computed for each bootstrap sample, and the average of these standard deviations over all samples is reported to summarize how stable the optimization is across bootstrap samples. In this case this average is small, indicating reliable bootstrap estimates.
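
The resampling step behind a bootstrap can be sketched in base R. The example below redraws the polymorphic counts of the selected spectrum from Poisson distributions and tracks a simple summary statistic instead of refitting the DFE. Using the observed counts as Poisson means and the share of singletons as the statistic are assumptions made for illustration; this is not fastdfe's bootstrap implementation.

```r
set.seed(42)

# Polymorphic entries of the selected spectrum used above
# (monomorphic first and last entries dropped).
sel_poly <- c(1329, 499, 265, 162, 104, 117, 90, 94, 119)

# Summary statistic standing in for a DFE parameter: share of singletons.
singleton_share <- function(counts) counts[1] / sum(counts)

# ASSUMPTION: each entry is resampled independently from a Poisson
# distribution whose mean is the observed count.
n_samples <- 100
boot <- replicate(n_samples, singleton_share(rpois(length(sel_poly), sel_poly)))

cat("observed:", singleton_share(sel_poly), "\n")
cat("bootstrap mean:", mean(boot), " sd:", sd(boot), "\n")

# Percentile interval across bootstrap samples.
print(quantile(boot, c(0.025, 0.975)))
```

In a full bootstrap, each resampled spectrum would be refitted, and the spread of the refitted parameters would provide the uncertainty estimate.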
