Spectrum class#

class Spectrum(data: Sequence[float])[source]#

Bases: Iterable

Class for holding and manipulating a site-frequency spectrum.

__init__(data: Sequence[float])[source]#

Initialize spectrum.

Parameters:

data (Sequence[float]) – SFS counts

property n: int#

The sample size.

Returns:

Sample size

property n_sites: float#

The total number of sites.

Returns:

Total number of sites

property n_div: float#

Number of divergence counts.

Returns:

Number of divergence counts

property has_div: bool#

Whether n_div was specified.

Returns:

Whether n_div was specified

property n_monomorphic: float#

Number of monomorphic sites.

Returns:

Number of monomorphic sites

property polymorphic: ndarray#

Get the polymorphic counts.

Returns:

Polymorphic counts

property n_polymorphic: ndarray#

Get the total number of polymorphic sites, i.e. the sum of the polymorphic counts.

Returns:

Total number of polymorphic sites
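The properties above can be read as views onto a single array of counts. The sketch below assumes the usual layout in which entry i holds the number of sites with i derived alleles, so the first and last entries are the monomorphic classes and the last entry doubles as the divergence count. That layout and the helper name `describe_sfs` are assumptions made for illustration; they are not taken from the library source.

```python
def describe_sfs(data):
    """Summarise an SFS given as a list of n + 1 counts (assumed layout)."""
    n = len(data) - 1                      # sample size
    polymorphic = list(data[1:-1])         # classes 1 .. n-1
    return {
        "n": n,
        "n_sites": sum(data),              # all sites, monomorphic included
        "n_div": data[-1],                 # assumed: fixed derived sites
        "n_monomorphic": data[0] + data[-1],
        "polymorphic": polymorphic,
        "n_polymorphic": sum(polymorphic),
    }


summary = describe_sfs([1000, 12, 6, 4, 3, 20])
print(summary)
```

Under these assumptions, `n_sites` always equals `n_monomorphic + n_polymorphic`.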

to_list()[source]#

Convert to list.

Return type:

list

Returns:

SFS counts

to_spectra()[source]#

Convert to Spectra object.

Return type:

Spectra

Returns:

Spectra object

to_file(file: str)[source]#

Save object to file.

Parameters:

file (str) – File name

static from_file(file: str)[source]#

Load object from file.

Parameters:

file (str) – File name

Return type:

Spectrum

Returns:

Spectrum object

to_numpy()[source]#

Convert to array.

Return type:

ndarray

Returns:

SFS counts

property theta: float#

Calculate site-wise population mutation rate using Watterson’s estimator. Note that theta is given per site, i.e. Watterson’s estimator is divided by the total number of sites (n_sites).

property Theta: float#

Calculate genome-wide population mutation rate using Watterson’s estimator.

Note

Property Theta is not normalized by the total number of sites, unlike theta.
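To make the difference between the two properties concrete, here is a minimal sketch of Watterson's estimator in plain Python. It assumes the SFS is a list of n + 1 counts whose first and last entries are monomorphic; the function name `watterson` is hypothetical and the code is not the library's implementation.

```python
def watterson(data):
    """Return (Theta, theta): Watterson's estimator in total and per site."""
    n = len(data) - 1
    segregating = sum(data[1:-1])                 # number of polymorphic sites S
    a_n = sum(1.0 / i for i in range(1, n))       # harmonic number a_n = sum 1/i
    big_theta = segregating / a_n                 # not normalized by n_sites
    return big_theta, big_theta / sum(data)       # per-site value divides by n_sites


Theta, theta = watterson([1000, 12, 6, 4, 3, 20])
print(Theta, theta)
```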

fold()[source]#

Fold the site-frequency spectrum.

Return type:

Spectrum

Returns:

Folded spectrum
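Folding merges each frequency class i with its complement n - i, which is what one does when the ancestral allele is unknown. The sketch below keeps the array length and leaves the upper half at zero; whether `fold()` uses exactly this convention is an assumption, and `fold_sfs` is a hypothetical helper.

```python
def fold_sfs(data):
    """Fold an SFS by adding class i and class n - i into the lower class."""
    n = len(data) - 1
    folded = [0.0] * (n + 1)
    for i, count in enumerate(data):
        folded[min(i, n - i)] += count    # the middle class (even n) is added once
    return folded


folded = fold_sfs([100, 10, 5, 3, 2])
print(folded)
```

Under this convention a spectrum is folded exactly when every class above n / 2 is empty.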

misidentify(epsilon: float)[source]#

Introduce ancestral misidentification at rate epsilon. Note that monomorphic counts won’t be affected.

Parameters:

epsilon (float) – Misidentification rate (0 <= epsilon <= 1)

Return type:

Spectrum

Returns:

Spectrum with misidentification applied

Raises:

ValueError – If epsilon is not between 0 and 1
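One way to picture ancestral misidentification is as a mixture: a fraction epsilon of the sites in class i is reported in class n - i instead. The following sketch applies that mixture to the polymorphic classes only and leaves the monomorphic entries alone, as the description above states. The formula and the name `misidentify_sfs` are illustrative assumptions rather than the library's code.

```python
def misidentify_sfs(data, epsilon):
    """Mix each polymorphic class i with class n - i at rate epsilon."""
    if not 0 <= epsilon <= 1:
        raise ValueError("epsilon must be between 0 and 1")
    poly = list(data[1:-1])
    mixed = [
        (1 - epsilon) * own + epsilon * mirrored
        for own, mirrored in zip(poly, reversed(poly))
    ]
    return [data[0], *mixed, data[-1]]    # monomorphic counts are untouched


shifted = misidentify_sfs([100, 10, 5, 3, 2], epsilon=0.1)
print(shifted)
```

The total number of polymorphic sites is unchanged by this operation, since sites only move between mirrored classes.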

subsample(n: int, mode: Literal['random', 'probabilistic'] = 'probabilistic', seed: int | Generator = None)[source]#

Subsample spectrum to a given sample size.

Warning

In ‘random’ mode, the SFS counts are cast to integers before subsampling, so the results are only sensible if the counts are integers or large enough to be approximated well by integers. The ‘probabilistic’ mode does not have this limitation.

Parameters:
  • n (int) – Sample size

  • mode (Literal['random', 'probabilistic']) – Subsampling mode. Either ‘random’ or ‘probabilistic’.

  • seed (int | Generator) – Random state or seed. Only for ‘random’ mode.

Return type:

Spectrum

Returns:

Subsampled spectrum
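A deterministic subsampling scheme that works with non-integer counts is the hypergeometric projection: a site with i derived alleles among n lands in class j of a sample of size m with probability C(i, j) C(n - i, m - j) / C(n, m). The sketch below implements that projection. That the ‘probabilistic’ mode corresponds to it is an assumption, and `project_sfs` is a hypothetical name.

```python
from math import comb


def project_sfs(data, m):
    """Project an SFS of sample size n down to sample size m (expected counts)."""
    n = len(data) - 1
    if not 0 < m <= n:
        raise ValueError("m must satisfy 0 < m <= n")
    total = comb(n, m)
    out = [0.0] * (m + 1)
    for i, count in enumerate(data):
        for j in range(m + 1):
            # comb returns 0 when the lower index exceeds the upper one
            out[j] += count * comb(i, j) * comb(n - i, m - j) / total
    return out


projected = project_sfs([100, 10, 5, 3, 2], m=2)
print(projected)
```

Because the hypergeometric probabilities sum to one for every class, the projection preserves the total number of sites.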

resample(seed: int | Generator = None)[source]#

Resample SFS assuming independent Poisson counts.

Parameters:

seed (int | Generator) – Random state or seed

Return type:

Spectrum

Returns:

Resampled spectrum.
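Resampling under independent Poisson counts means replacing every entry c by a draw from a Poisson distribution with mean c. The sketch below uses only the standard library and Knuth's multiplication method, which is adequate for modest means (a few hundred at most); the helper names are hypothetical and the library may well draw its samples differently.

```python
import math
import random


def poisson_draw(mean, rng):
    """Draw one Poisson variate with Knuth's method (small means only)."""
    threshold = math.exp(-mean)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return k
        k += 1


def resample_sfs(data, seed=None):
    """Replace each count by an independent Poisson draw with that mean."""
    rng = random.Random(seed)
    return [poisson_draw(count, rng) for count in data]


first = resample_sfs([200, 12, 6, 4, 3, 20], seed=42)
second = resample_sfs([200, 12, 6, 4, 3, 20], seed=42)
print(first)
```

Passing the same seed reproduces the same resampled spectrum, which is useful for repeatable bootstraps.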

is_folded()[source]#

Check if the site-frequency spectrum is folded.

Return type:

bool

Returns:

True if folded, False otherwise

normalize()[source]#

Normalize SFS so that all non-monomorphic counts add up to 1.

Return type:

Spectrum

Returns:

Normalized spectrum
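As a small illustration, normalization divides each polymorphic count by the sum of all polymorphic counts. What happens to the two monomorphic entries is not stated above; the sketch leaves them unchanged, which is an assumption, and `normalize_sfs` is a hypothetical helper.

```python
def normalize_sfs(data):
    """Rescale the polymorphic classes so that they sum to 1."""
    total = sum(data[1:-1])
    if total == 0:
        raise ValueError("spectrum has no polymorphic sites")
    scaled = [count / total for count in data[1:-1]]
    return [data[0], *scaled, data[-1]]   # assumed: monomorphic entries kept as-is


normalized = normalize_sfs([100, 10, 5, 3, 2])
print(normalized)
```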

copy()[source]#

Copy the spectrum.

Return type:

Spectrum

Returns:

Copy of the spectrum

static from_polymorphic(data: Sequence)[source]#

Create Spectrum from polymorphic counts only.

Parameters:

data (Sequence) – Polymorphic counts

Return type:

Spectrum

Returns:

Spectrum

static from_list(data: Sequence)[source]#

Create Spectrum from list.

Parameters:

data (Sequence) – SFS counts

Return type:

Spectrum

Returns:

Spectrum

static from_polydfe(polymorphic: Sequence, n_sites: float, n_div: float)[source]#

Create Spectrum from polyDFE specification, which treats the number of mutational target sites and the divergence counts separately.

Parameters:
  • polymorphic (Sequence) – Polymorphic counts

  • n_sites (float) – Total number of sites

  • n_div (float) – Number of divergence counts

Return type:

Spectrum

Returns:

Spectrum

plot(show: bool = True, file: str = None, title: str = None, log_scale: bool = False, show_monomorphic: bool = False, kwargs_legend: dict = {'prop': {'size': 8}}, ax: plt.Axes = None)[source]#

Plot spectrum.

Parameters:
  • show (bool) – Whether to show plot.

  • file (str) – File to save plot to.

  • title (str) – Title of plot.

  • log_scale (bool) – Whether to use log scale on y-axis.

  • show_monomorphic (bool) – Whether to show monomorphic counts.

  • kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes

static standard_kingman(n: int, n_monomorphic: int = 0)[source]#

Get standard Kingman SFS.

Parameters:
  • n (int) – Sample size

  • n_monomorphic (int) – Number of monomorphic sites.

Return type:

Spectrum

Returns:

Standard Kingman SFS
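Under the standard Kingman coalescent the expected number of sites in frequency class i is proportional to 1 / i. A minimal sketch of such a spectrum follows; placing `n_monomorphic` in the first entry and leaving the last entry at zero are assumptions about the layout, and `kingman_sfs` is a hypothetical name.

```python
def kingman_sfs(n, n_monomorphic=0):
    """Expected neutral SFS shape for sample size n: class i has weight 1 / i."""
    polymorphic = [1.0 / i for i in range(1, n)]
    return [float(n_monomorphic), *polymorphic, 0.0]


kingman = kingman_sfs(5)
print(kingman)
```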

static get_neutral(theta: float, n_sites: float, n: int, r: Sequence[float] = None)[source]#

Obtain a standard neutral SFS for a given theta and number of sites.

Parameters:
  • theta (float) – Population mutation rate

  • n_sites (float) – Number of total sites

  • n (int) – Number of frequency classes

  • r (Sequence[float]) – Nuisance parameters that account for demography. An array of length n-1 whose elements are multiplied element-wise with the polymorphic counts of the Kingman SFS. By default, no demography effects are considered which is equivalent to r = [1] * (n-1). Note that non-default values of r will also affect estimates of the population mutation rate.

Return type:

Spectrum

Returns:

Neutral SFS
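The parameters above combine as follows in one plausible construction: polymorphic class i receives theta * n_sites * r_i / i sites, and the remaining sites are assigned to the ancestral monomorphic class so that the total equals n_sites. This is a sketch under those assumptions, not the library's implementation, and `neutral_sfs` is a hypothetical name.

```python
def neutral_sfs(theta, n_sites, n, r=None):
    """Build a neutral SFS with per-site mutation rate theta over n_sites sites."""
    r = [1.0] * (n - 1) if r is None else list(r)
    if len(r) != n - 1:
        raise ValueError("r must have length n - 1")
    polymorphic = [theta * n_sites * r_i / i for i, r_i in enumerate(r, start=1)]
    # assumed: leftover sites are monomorphic for the ancestral allele
    return [n_sites - sum(polymorphic), *polymorphic, 0.0]


neutral = neutral_sfs(theta=0.01, n_sites=10_000, n=5)
print(neutral)
```

With the default r, applying Watterson's estimator per site to the result gives back the input theta; with other values of r it generally does not, which matches the note on r above.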

scale_theta(theta: float)[source]#

Scale the spectrum to a different theta value.

Parameters:

theta (float) – New theta value

Return type:

Spectrum

Returns:

Scaled spectrum