Configuration
Essential Setup (main.yaml)
Most users only need to modify these core settings:
Species and Populations (Human Example)
# Main configuration for species and data paths
testing_mode: true  # Set to false for production analysis
# Species identification
species: "Human"
# Populations to analyze
populations:
  - YRI
  - CHS
# Data location and file naming
data:
  # Start with a small subset for testing
  chromosomes:
    - 21
    - 22
  # Tell selscape how your VCF files are named
  vcf_files:
    base_path: "resources/data"
    file_prefix: ""
    file_suffix: ".vcf.gz"
    chr_name: "{ppl}.chr{i}"  # {ppl} = population, {i} = chromosome
  # Path to your sample metadata file
  metadata: "resources/data/1KG_metadata.txt"
# Reference genome
reference:
  species_code: "hg38"
  human_code: "hg38"
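To make the naming settings concrete, the sketch below shows one way the `vcf_files` fields could combine into a file path. The join order (`base_path`, then `file_prefix` plus the expanded `chr_name` plus `file_suffix`) and the helper name `vcf_path` are assumptions for illustration, not a description of selscape's internals.

```python
import posixpath

# Values copied from the example main.yaml settings above.
vcf_files = {
    "base_path": "resources/data",
    "file_prefix": "",
    "file_suffix": ".vcf.gz",
    "chr_name": "{ppl}.chr{i}",
}


def vcf_path(cfg, population, chromosome):
    """Build an expected VCF path (hypothetical helper; the join order is assumed)."""
    # Fill the {ppl} and {i} placeholders in the naming pattern.
    stem = cfg["chr_name"].format(ppl=population, i=chromosome)
    filename = cfg["file_prefix"] + stem + cfg["file_suffix"]
    return posixpath.join(cfg["base_path"], filename)


# One path per population/chromosome combination in the example config.
paths = [vcf_path(vcf_files, pop, chrom) for pop in ("YRI", "CHS") for chrom in (21, 22)]
for p in paths:
    print(p)
```

Under these assumptions, the example configuration would expect files such as `resources/data/YRI.chr21.vcf.gz`. If your files are named differently, adjust `file_prefix`, `file_suffix`, and `chr_name` until the generated names match what is on disk.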
Method Parameters
**Betascan (Balancing Selection)**

```yaml
core_frequencies:
  - 0.15  # Most sensitive for detecting balancing selection
frequency_filters:
  min_af: 0.05  # Remove rare variants
  max_af: 0.95  # Remove nearly fixed variants
folding:
  use_folded: true  # Use if you don't have ancestral states
```

**Selscan (Positive Selection)**

```yaml
data_type:
  unphased: true  # Set to false if you have phased data
statistics:
  within_population: [ihs, nsl]
  cross_population: [xpehh, xpnsl]
frequency_filters:
  maf_thresholds: [0.05]  # Minor allele frequency cutoff
```
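As a rough illustration of what the frequency settings mean, the sketch below applies the `min_af`/`max_af` window and a minor allele frequency cutoff to a few made-up allele frequencies. The function names and the inclusive bounds are assumptions; the underlying tools may treat boundary values differently.

```python
def passes_af_window(af, min_af=0.05, max_af=0.95):
    """True if an allele frequency lies inside [min_af, max_af] (inclusive bounds assumed)."""
    return min_af <= af <= max_af


def minor_allele_frequency(af):
    """Fold a frequency so that it refers to the rarer of the two alleles."""
    return min(af, 1.0 - af)


def passes_maf(af, threshold=0.05):
    """True if the minor allele frequency reaches the cutoff (>= is assumed)."""
    return minor_allele_frequency(af) >= threshold


# Made-up allele frequencies, for illustration only.
frequencies = [0.01, 0.05, 0.30, 0.97]
kept = [af for af in frequencies if passes_af_window(af)]
print(kept)
```

A window of `min_af: 0.05` to `max_af: 0.95` is symmetric around 0.5, so it amounts to requiring a minor allele frequency of at least 0.05.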
Advanced Configuration
main.yaml Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| testing_mode | boolean | Use example data for testing | false |
| species | string | Species identifier for outputs | "Human" |
| populations | list | Population codes to analyze | ["YRI", "CHS"] |
| data.chromosomes | list | Chromosomes to include | [21, 22] |
| data.vcf_files.base_path | string | VCF file directory | "resources/data" |
| data.vcf_files.chr_name | string | Chromosome naming pattern | "{ppl}.chr{i}" |
| reference.species_code | string | Reference genome build | "hg38" |
| data.metadata | string | Sample metadata file path | "resources/data/1KG_metadata.txt" |
| data.vcf_files.file_prefix | string | VCF filename prefix | "" |
| data.vcf_files.file_suffix | string | VCF filename suffix | ".vcf.gz" |
| data_sources | dict | Data source configurations | see config |
| reference.annotation_reference | string | Reference genome for annotation databases | "hg38" |
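
The dotted names in the table refer to nested keys in `main.yaml`; for example, `data.vcf_files.base_path` is the `base_path` key under `vcf_files` under `data`. A minimal sketch of that lookup, using a plain dictionary in place of the loaded file and a hypothetical helper `get_param`:

```python
from functools import reduce

# A plain dict standing in for the parsed main.yaml (values from the example settings).
config = {
    "testing_mode": True,
    "species": "Human",
    "populations": ["YRI", "CHS"],
    "data": {
        "chromosomes": [21, 22],
        "vcf_files": {"base_path": "resources/data", "file_suffix": ".vcf.gz"},
        "metadata": "resources/data/1KG_metadata.txt",
    },
    "reference": {"species_code": "hg38"},
}


def get_param(cfg, dotted_name):
    """Resolve a dotted parameter name against nested dicts (hypothetical helper)."""
    # Walk one level deeper for each dot-separated key.
    return reduce(lambda node, key: node[key], dotted_name.split("."), cfg)


print(get_param(config, "data.vcf_files.base_path"))
print(get_param(config, "reference.species_code"))
```

Inside a Snakemake workflow, the same values would typically be read from the `config` dictionary, for example `config["data"]["vcf_files"]["base_path"]`.
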
betascan.yaml Parameters

| Parameter | Type | Description | Default |
|---|---|---|---|
| core_frequencies | list | Core allele frequencies for B1 | [0.15] |
| frequency_filters.min_af | float | Minimum allele frequency | 0.05 |
| frequency_filters.max_af | float | Maximum allele frequency | 0.95 |
| folding.use_folded | boolean | Use folded SFS | true |
| thresholds.top_percent | float | Top % for candidates | 0.0005 |
| quality_control.hwe_pvalue | float | Hardy-Weinberg p-value threshold | 0.001 |
| quality_control.mask_repeats | boolean | Exclude repetitive regions | true |

selscan.yaml Parameters

| Parameter | Type | Description | Default |
|---|---|---|---|
| data_type.unphased | boolean | Handle unphased data | true |
| statistics.within_population | list | Within-population statistics | [ihs, nsl] |
| statistics.cross_population | list | Cross-population statistics | [xpehh, xpnsl] |
| frequency_filters.maf_thresholds | list | MAF cutoffs | [0.05] |

dadi-cli.yaml Parameters

| Parameter | Type | Description | Default |
|---|---|---|---|
| flags.ratio | float | Ratio of nonsynonymous to synonymous mutation rates | 2.31 |
| flags.optimizations | integer | Number of optimization runs | 50 |
| grid_size.inference | string | Grid size for inference | "300 400 500" |
| gamma_pts | integer | Gamma integration points | 2000 |
| grid_size.cache | string | Grid size for caching | "800 1000 1200" |
| flags.bootstrap_replicates | integer | Bootstrap replicates | 100 |
| flags.chunk_size | integer | Chunk size for bootstrap | 1000000 |

annovar.yaml Parameters

| Parameter | Type | Description | Default |
|---|---|---|---|
| species.{species}.genome_build | string | Reference genome build | "hg38" |
| settings.threads | integer | Annotation threads | 8 |

See the Snakemake configuration documentation for additional information.