Skip to content

Calculate significant S* score thresholds

Input

To determine the significance of S* scores, users should use data from simulations. Users need to provide data from a simulated dataset (e.g. gravel_asn_scale_60k.simulated.data) for building a generalized additive model to predict the expected S* score under a demographic model without introgression. This file can be created with the command sstar quantile or the sstar pipeline sstar.snake in sstar-analysis. In the simulation, S* is calculated for a given number of SNPs per window and a given recombination rate. For each value of number of SNPs (or a grid of values in case both number of SNPs and recombination rate are used), the simulation should be of succifient scale (e.g. simulating 20,000 windows), of the same window size used for calculating S* in real data (e.g. 50kbp), and with the same number of individuals used in real data.

An example for the simulated dataset (using the software ms) is below:

S*_score SNP_number quantile log(local_recomb_rate)
0 10 0.8 -7.7500000013365

The meaning of each column:

  • The S*_score column is the S* score at a given quantile (column 3), in a simulated dataset for a given number of SNPs (column 2) and recombination rate (column 4).
  • The SNP_number column is the number of SNPs defined to occur within all windows of a simulated dataset for which quantiles and S* scores are obtained.
  • The quantile column is the quantile at which the S* score in column 1 is observed in the simulated dataset
  • The log(local_recomb_rate) column is the logarithm of the local recombination rate of a simulated dataset.

Users also need to provide the file containing S* scores estimated from real data (e.g. test.score.exp.results). Users can use a BED file (e.g. hum.windows.50k.10k.recomb.map) to specify local recombination rates across the genome. If a window is not found in the recombination map, while it is in the output file from sstar score, then this window will be ignored when calculating significant thresholds.

Users can estimate significant S* score thresholds giving a quantile 0.99 with the following command:

sstar threshold --score test.score.exp.results --sim-data gravel_asn_scale_60k.simulated.data --recomb-map hum.windows.50k.10k.recomb.map --quantile 0.99 --output test.threshold.results

The expected result above can be found in test.threshold.exp.results

Output

An example for the output is below:

chrom start end sample S*_score expected_S*_score local_recomb_rate quantile significant
21 9480000 9530000 NA06984 48493 53263.269232 1.29162 0.99 False
21 9650000 9700000 NA06984 55639 53754.835563 1.29162 0.99 True

The meanings of the first to fifth columns are the same as those in the output from sstar score. The meanings of the remaining columns:

  • The expected_S*_score column is the expected S* score calculated under the null model (i.e. without introgression).
  • The local_recomb_rate column is the local recombination rate in the current window.
  • The quantile column is the quantile used to determing whether the S* score is significant.
  • The significant column is an indicator indicating whether the S* score is significant (i.e. S*_score > expected_S*_score).

Settings

In the absence of a recombination map for the tested population, a uniform recombination rate can be used. Users may add the argument --recomb-rate instead of a recombination map to estimate thresholds.