Skip to content

SAI

sai is a Python package for Statistics for Adaptive Introgression. It detects candidate regions of adaptive introgression from population genomic datasets. Currently, it supports:

  • The average difference of squence divergence (\(D_D\) statistic) proposed by Huang et al. (2025).
  • The \(D^+\) and \(D_{anc}\) statistics proposed by Fang et al. (2024).
  • The distance fraction (\(d_f\) statistic) proposed by Pfeifer and Kapan (2019).
  • The number of uniquely shared sites (\(U\) statistic) and the quantile of the derived allele frequencies in such sites (\(Q\) statistic) proposed by Racimo et al. (2017).
  • The dynamic estimator of the proportion of introgression (\(f_d\) statistic) proposed by Martin et al. (2015).

sai does not require phased data, and supports an arbitrary number of source/donor populations (i.e., populations assumed to provide introgressed material) and arbitrary ploidy.

Requirements

sai works on Linux operating systems and tested with the following:

  • matplotlib=3.9.1
  • natsort=8.4.0
  • numpy=1.26.4
  • pandas=2.2.1
  • pysam=0.23.0
  • python=3.9.19
  • pytest=8.1.1
  • pytest-cov=6.0.0
  • scikit-allel=1.3.7
  • scipy=1.12.0

Installation

Users can first install mamba, and then install sai using the following commands:

git clone https://github.com/xin-huang/sai
cd sai
mamba env create -f build-env.yaml
mamba activate sai
pip install .

Users can also install sai from PYPI:

pip install sai-pg

Help

To get help information, users can use:

sai -h

This will display information for two commands:

Command Description
score Run the score command based on specified parameters
outlier Detect and output outlier rows based on quantile thresholds

If you need further help, such as such as reporting a bug or suggesting a feature, please open an issue.