Spectrum class#
- class Spectrum(data: Sequence[float])[source]#
Bases: Iterable
Class for holding and manipulating a site-frequency spectrum.
- __init__(data: Sequence[float])[source]#
Initialize spectrum.
- Parameters:
data (Sequence[float]) – SFS counts
- property n: int#
The sample size.
- Returns:
Sample size
- property n_sites: float#
The total number of sites.
- Returns:
Total number of sites
- property n_div: float#
Number of divergence counts.
- Returns:
Number of divergence counts
- property has_div: bool#
Whether n_div was specified.
- Returns:
Whether n_div was specified
- property n_monomorphic: float#
Number of monomorphic sites.
- Returns:
Number of monomorphic sites
- property polymorphic: ndarray#
Get the polymorphic counts.
- Returns:
Polymorphic counts
- property n_polymorphic: ndarray#
Get the total number of polymorphic sites, i.e. the sum of the polymorphic counts.
- Returns:
Total number of polymorphic sites
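The properties above can be related to the raw counts with a small sketch. It assumes a layout that this section does not state explicitly: `data` has n + 1 entries, where entry 0 holds the monomorphic ancestral sites, entries 1 to n - 1 the polymorphic classes, and entry n the fixed derived (divergence) sites. The helper `describe` is hypothetical and not part of the class.

```python
from typing import Dict, Sequence


def describe(data: Sequence[float]) -> Dict[str, object]:
    """Summarize an SFS under the ASSUMED layout [ancestral, 1..n-1, fixed derived]."""
    n = len(data) - 1               # sample size
    polymorphic = list(data[1:-1])  # frequency classes 1 .. n-1
    return {
        "n": n,
        "n_sites": sum(data),                 # all sites, monomorphic included
        "n_div": data[-1],                    # fixed derived sites
        "n_monomorphic": data[0] + data[-1],  # both monomorphic classes
        "polymorphic": polymorphic,
        "n_polymorphic": sum(polymorphic),
    }


summary = describe([1000, 6, 3, 2, 4])
```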
- static from_file(file: str)[source]#
Load object from file.
- Parameters:
file (str) – File name
- Return type:
Spectrum
- Returns:
Spectrum object
- property theta: float#
Calculate site-wise population mutation rate using Watterson’s estimator. Note that theta is given per site, i.e. Watterson’s estimator is divided by the total number of sites (n_sites).
- property Theta: float#
Calculate genome-wide population mutation rate using Watterson’s estimator.
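As a rough illustration of how the two quantities relate, the sketch below computes Watterson’s estimator from raw counts. It assumes `data` has n + 1 entries with the monomorphic classes at both ends; the function name `watterson` is hypothetical and this is not the library’s own code.

```python
from typing import Sequence, Tuple


def watterson(data: Sequence[float]) -> Tuple[float, float]:
    """Return (theta per site, genome-wide Theta) for an SFS of length n + 1."""
    n = len(data) - 1                       # sample size (assumed layout)
    s = sum(data[1:-1])                     # number of segregating sites
    a_n = sum(1 / i for i in range(1, n))   # harmonic number H_{n-1}
    big_theta = s / a_n                     # genome-wide estimate
    return big_theta / sum(data), big_theta  # per-site value divides by n_sites


theta, big_theta = watterson([1000, 6, 3, 2, 0])
```

Here 11 segregating sites with n = 4 give a_n = 11/6, so the genome-wide estimate is about 6 and the per-site value is that figure divided by the 1011 sites.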
- misidentify(epsilon: float)[source]#
Introduce ancestral misidentification at rate epsilon. Note that monomorphic counts won’t be affected.
- Parameters:
epsilon (float) – Misidentification rate (0 <= epsilon <= 1)
- Return type:
Spectrum
- Returns:
Spectrum with misidentification applied
- Raises:
ValueError – If epsilon is not between 0 and 1
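One common way to model ancestral misidentification is to move a fraction epsilon of each polymorphic class i to its mirror class n - i. The sketch below follows that model on a plain list; whether the method uses exactly this formula is an assumption, and the free function shown here is hypothetical.

```python
from typing import List, Sequence


def misidentify(data: Sequence[float], epsilon: float) -> List[float]:
    """Mix each polymorphic class i with its mirror class n - i (assumed model)."""
    if not 0 <= epsilon <= 1:
        raise ValueError("epsilon must be between 0 and 1")
    n = len(data) - 1
    out = list(data)  # monomorphic entries 0 and n are left untouched
    for i in range(1, n):
        # a fraction epsilon of class n - i is wrongly polarized into class i
        out[i] = (1 - epsilon) * data[i] + epsilon * data[n - i]
    return out


flipped = misidentify([1000, 6, 3, 2, 4], epsilon=1.0)
```

With epsilon = 1 every polymorphic site is mispolarized, so the polymorphic part of the spectrum is simply reversed; the total number of polymorphic sites is preserved for any epsilon.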
- subsample(n: int, mode: Literal['random', 'probabilistic'] = 'probabilistic', seed: int | Generator = None)[source]#
Subsample spectrum to a given sample size.
Warning
If using the ‘random’ mode, the SFS counts are cast to integers before subsampling, so this only provides sensible results if the counts are integers or large enough to be well approximated by integers. The ‘probabilistic’ mode does not have this limitation.
- Parameters:
n (int) – Sample size
mode (Literal['random', 'probabilistic']) – Subsampling mode. Either ‘random’ or ‘probabilistic’.
seed (int | Generator) – Random state or seed. Only for ‘random’ mode.
- Return type:
Spectrum
- Returns:
Subsampled spectrum
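A deterministic way to subsample is the standard hypergeometric projection, which spreads each count over the smaller spectrum according to the probability of drawing j derived alleles among m. The sketch below shows that technique on a plain list; that the ‘probabilistic’ mode works exactly like this is an assumption, and `project` is a hypothetical name.

```python
from math import comb
from typing import List, Sequence


def project(data: Sequence[float], m: int) -> List[float]:
    """Project an SFS of sample size n down to sample size m (hypergeometric)."""
    n = len(data) - 1
    if not 0 < m <= n:
        raise ValueError("m must satisfy 0 < m <= n")
    out = [0.0] * (m + 1)
    for i, count in enumerate(data):
        for j in range(m + 1):
            # P(j derived alleles among m draws | i derived alleles among n)
            out[j] += count * comb(i, j) * comb(n - i, m - j) / comb(n, m)
    return out


small = project([0, 0, 6, 0, 0], m=2)
```

Because the projection works with expected values, it accepts non-integer counts, which matches the warning above about the ‘random’ mode.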
- resample(seed: int | Generator = None)[source]#
Resample SFS assuming independent Poisson counts.
- Parameters:
seed (int | Generator) – Random state or seed
- Return type:
Spectrum
- Returns:
Resampled spectrum
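Resampling under independent Poisson counts can be sketched with NumPy by treating every entry as the mean of its own Poisson draw. Whether the method also resamples the monomorphic entries is not stated here, so drawing all entries is an assumption; the free function `resample` is hypothetical.

```python
import numpy as np


def resample(data, seed=None):
    """Draw each SFS entry from a Poisson distribution with the observed mean."""
    # default_rng accepts None, an int seed, or an existing Generator
    rng = np.random.default_rng(seed)
    return rng.poisson(lam=np.asarray(data, dtype=float))


first = resample([1000, 6, 3, 2, 4], seed=0)
second = resample([1000, 6, 3, 2, 4], seed=0)
```

Passing the same integer seed reproduces the same draw, which is useful for bootstrapping with a fixed random state.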
- is_folded()[source]#
Check if the site-frequency spectrum is folded.
- Return type:
bool
- Returns:
True if folded, False otherwise
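A folded spectrum only records minor-allele frequencies, so every class above n/2 is empty. The check below is a minimal sketch based on that property; it is an assumption that the method applies the same criterion, and an unfolded spectrum that happens to have no high-frequency counts would also pass.

```python
from typing import Sequence


def is_folded(data: Sequence[float]) -> bool:
    """Return True if all classes above n // 2 are zero (assumed criterion)."""
    n = len(data) - 1
    return all(x == 0 for x in data[n // 2 + 1:])


folded = is_folded([1000, 8, 3, 0, 0])
unfolded = is_folded([1000, 6, 3, 2, 4])
```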
- normalize()[source]#
Normalize SFS so that all non-monomorphic counts add up to 1.
- Return type:
Spectrum
- Returns:
Normalized spectrum
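The normalization of the polymorphic part can be sketched as follows. How the method treats the monomorphic entries is not specified in this section, so this hypothetical helper returns only the polymorphic fractions.

```python
from typing import List, Sequence


def normalize_polymorphic(data: Sequence[float]) -> List[float]:
    """Scale the polymorphic classes (entries 1 .. n-1) so that they sum to 1."""
    polymorphic = data[1:-1]
    total = sum(polymorphic)
    if total == 0:
        raise ValueError("spectrum has no polymorphic sites")
    return [x / total for x in polymorphic]


fractions = normalize_polymorphic([1000, 6, 3, 1, 4])
```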
- static from_polymorphic(data: Sequence)[source]#
Create Spectrum from polymorphic counts only.
- Parameters:
data (Sequence) – Polymorphic counts
- Return type:
Spectrum
- Returns:
Spectrum
- static from_list(data: Sequence)[source]#
Create Spectrum from list.
- Parameters:
data (Sequence) – SFS counts
- Return type:
Spectrum
- Returns:
Spectrum
- static from_polydfe(polymorphic: Sequence, n_sites: float, n_div: float)[source]#
Create a Spectrum from a polyDFE specification, which treats the number of mutational target sites and the divergence counts separately.
- Parameters:
polymorphic (Sequence) – Polymorphic counts
n_sites (float) – Total number of sites
n_div (float) – Number of divergence counts
- Return type:
Spectrum
- Returns:
Spectrum
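The three arguments can be combined into a single count vector along the following lines. The sketch assumes that the monomorphic ancestral entry is whatever remains of `n_sites` after removing polymorphic and divergence counts; this bookkeeping and the helper name are assumptions, not taken from the library.

```python
from typing import List, Sequence


def counts_from_polydfe(polymorphic: Sequence[float],
                        n_sites: float,
                        n_div: float) -> List[float]:
    """Assemble [ancestral, polymorphic..., divergence] (assumed layout)."""
    ancestral = n_sites - sum(polymorphic) - n_div  # remaining monomorphic sites
    return [ancestral, *polymorphic, n_div]


counts = counts_from_polydfe([6, 3, 2], n_sites=1000, n_div=4)
```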
- plot(show: bool = True, file: str = None, title: str = None, log_scale: bool = False, show_monomorphic: bool = False, kwargs_legend: dict = {'prop': {'size': 8}}, ax: plt.Axes = None)[source]#
Plot spectrum.
- Parameters:
show (bool) – Whether to show plot.
file (str) – File to save plot to.
title (str) – Title of plot.
log_scale (bool) – Whether to use log scale on y-axis.
show_monomorphic (bool) – Whether to show monomorphic counts.
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.
ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes
- static standard_kingman(n: int, n_monomorphic: int = 0)[source]#
Get standard Kingman SFS.
- Parameters:
n (int) – Sample size
n_monomorphic (int) – Number of monomorphic sites.
- Return type:
Spectrum
- Returns:
Standard Kingman SFS
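Under the standard Kingman coalescent, the expected count in frequency class i is proportional to 1/i. A minimal sketch of such a spectrum is given below; placing `n_monomorphic` in the first entry and leaving the last entry at zero is an assumption about the layout, and the free function is hypothetical.

```python
from typing import List


def standard_kingman(n: int, n_monomorphic: float = 0) -> List[float]:
    """Return [n_monomorphic, 1, 1/2, ..., 1/(n-1), 0] for sample size n."""
    polymorphic = [1 / i for i in range(1, n)]  # expected shape, theta = 1
    return [n_monomorphic] + polymorphic + [0]


kingman = standard_kingman(4)
```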
- static get_neutral(theta: float, n_sites: float, n: int, r: Sequence[float] = None)[source]#
Obtain a standard neutral SFS for a given theta and number of sites.
- Parameters:
theta (float) – Population mutation rate
n_sites (float) – Total number of sites
n (int) – Number of frequency classes
r (Sequence[float]) – Nuisance parameters that account for demography. An array of length n-1 whose elements are multiplied element-wise with the polymorphic counts of the Kingman SFS. By default, no demographic effects are considered, which is equivalent to r = [1] * (n-1). Note that non-default values of r will also affect estimates of the population mutation rate.
- Return type:
Spectrum
- Returns:
Neutral SFS
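The description of `r` can be made concrete with a sketch in which class i receives theta * n_sites * r[i-1] / i sites and the remaining sites are monomorphic. This is an illustration under an assumed layout, with a hypothetical function name, rather than the library’s implementation.

```python
from typing import List, Optional, Sequence


def neutral_sfs(theta: float, n_sites: float, n: int,
                r: Optional[Sequence[float]] = None) -> List[float]:
    """Kingman-shaped SFS scaled by theta * n_sites and nuisance parameters r."""
    r = [1.0] * (n - 1) if r is None else list(r)
    if len(r) != n - 1:
        raise ValueError("r must have length n - 1")
    # class i gets theta * n_sites * r_i / i sites
    polymorphic = [theta * n_sites * r_i / i for i, r_i in enumerate(r, start=1)]
    # all remaining sites are treated as monomorphic ancestral (assumption)
    return [n_sites - sum(polymorphic)] + polymorphic + [0.0]


sfs = neutral_sfs(theta=0.001, n_sites=1_000_000, n=5)
sfs_scaled = neutral_sfs(theta=0.001, n_sites=1_000_000, n=5, r=[2.0] * 4)
```

With the default `r`, applying Watterson’s estimator to the result recovers the input theta. Doubling every element of `r` doubles the number of segregating sites and therefore the estimate, which illustrates the note about non-default values of `r`.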