Overview of the reference files

The reference files, panels of normals, and design files used by the pipeline are listed below. They are defined in config/config.yaml and stored in the ref_data/ folder of the project.

| Config entry / rule | Sub-entry | File | Description |
|---|---|---|---|
| reference | fasta | GRCh38.fasta | Reference genome in FASTA format |
| | fai | GRCh38.fasta.fai | Index file for the reference genome |
| | trf | GRCh38.trf.bed | Human tandem repeats (used by Severus and PBSV) |
| | design_bed | expected_coverage_annotated.bed | BED file with panel target regions |
| | sv_databases | gnomad.v4.1.sv.sites.no_cnv.PASS.vcf.gz, sv_normal.vcf.gz | Population SV VCF files queried directly by SVDB. This entry is a list; add new population databases here. |
| cnvkit_batch | normal_reference | cnvkit.PoN.cnn | Panel of normals for CNVkit |
| deepsomatic_t_only | pon | snv_normal.vcf.gz | Panel of normals for DeepSomatic (used when use_deepsomatic: true) |
| severus_t_only | pon | PoN_1000G_hg38.tsv.gz | Panel of normals for Severus (1000 Genomes project) |
| | vntr | GRCh38.trf.bed | Tandem repeat BED file passed to Severus via --vntr |
| bcftools_filter_include_region | panel | expected_coverage_annotated.bed | Regions to include in variant calling |
| general_report | | config/report.yaml | General report configuration file |
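
Before starting a run, it can be useful to confirm that the files listed above are actually present. The following is a minimal sketch, not part of the pipeline: the helper name `missing_reference_files` is hypothetical, and it assumes every file sits directly in ref_data/ with no subfolders, which may not match your setup.

```python
from pathlib import Path

# File names copied from the list above.
# Assumption: all of them live directly under ref_data/ (no subfolders).
EXPECTED_FILES = [
    "GRCh38.fasta",
    "GRCh38.fasta.fai",
    "GRCh38.trf.bed",
    "expected_coverage_annotated.bed",
    "gnomad.v4.1.sv.sites.no_cnv.PASS.vcf.gz",
    "sv_normal.vcf.gz",
    "cnvkit.PoN.cnn",
    "snv_normal.vcf.gz",
    "PoN_1000G_hg38.tsv.gz",
]


def missing_reference_files(ref_dir="ref_data", expected=EXPECTED_FILES):
    """Return the expected file names that are not regular files in ref_dir.

    Hypothetical helper for illustration; the order of `expected` is preserved.
    """
    root = Path(ref_dir)
    return [name for name in expected if not (root / name).is_file()]
```

Calling `missing_reference_files()` from the project root returns an empty list when everything is in place; otherwise it returns the names that still need to be downloaded or generated. config/report.yaml is left out because it is stored under config/ rather than ref_data/.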