Scripts for the Publication:
Ullrich KK, Tautz D
Data sources
Genome mapping files for Mus musculus domesticus GER, Mus musculus domesticus FRA, Mus musculus domesticus IRA, Mus musculus musculus AFG, Mus musculus castaneus CAS and Mus spretus SPRE were obtained from http://wwwuser.gwdg.de/~evolbio/evolgen/wildmouse/m_m_domesticus/genomes_bam/, http://wwwuser.gwdg.de/~evolbio/evolgen/wildmouse/m_m_musculus/genomes_bam/, http://wwwuser.gwdg.de/~evolbio/evolgen/wildmouse/m_m_castaneus/genomes_bam/, http://wwwuser.gwdg.de/~evolbio/evolgen/wildmouse/m_spretus/genomes_bam/.
For mapping details, see the original publication (Harr et al. 2016): http://www.nature.com/article-assets/npg/sdata/2016/sdata201675/extref/sdata201675-s7.docx.
Get masking regions for individual samples and natural populations
To mask genomic regions with low coverage in the natural populations, we used the genomic mapping BAM files and considered only the stable chromosomes of the reference GRCm38 (mm10) http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/mouse/.
The BAM files were processed with 'genomeCoverageBed' to obtain site-specific genome coverage, and the resulting per-sample coverage files were combined with 'unionBedGraphs'. For each population, the combined (summed) coverage was then filtered to retain only regions with a coverage below 5; these regions serve as the masking regions.
genomeCoverageBed example for the population Mus musculus musculus AFG:
#example for population Mmm_AFG:
#
#used BAM files:
#
#AFG1_396.bam
#AFG2_413.bam
#AFG3_416.bam
#AFG4_424.bam
#AFG5_435.bam
#AFG6_444.bam
REFERENCE=mm10.fasta
for file in *.bam; do genomeCoverageBed -ibam "$file" -bga -g "$REFERENCE" > "$file.bga"; done
unionBedGraphs example for the population Mus musculus musculus AFG:
INPUT1=AFG1_396.bam.bga
INPUT2=AFG2_413.bam.bga
INPUT3=AFG3_416.bam.bga
INPUT4=AFG4_424.bam.bga
INPUT5=AFG5_435.bam.bga
INPUT6=AFG6_444.bam.bga
OUTPUT=Mmm_AFG.combined.bga
unionBedGraphs -i "$INPUT1" "$INPUT2" "$INPUT3" "$INPUT4" "$INPUT5" "$INPUT6" | awk -v OFS='\t' 'BEGIN {sum=0} {for (i=4; i<=NF; i++) sum+=$i; print $1,$2,$3,sum; sum=0}' > "$OUTPUT"
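The column-summing step can be checked without any BAM files. The following is a minimal sketch on a made-up three-sample input in the unionBedGraphs layout (chromosome, start, end, then one coverage column per sample); the function name `sum_coverage` and the toy values are illustrative assumptions, not part of the original scripts.

```shell
# Sum all per-sample coverage columns (field 4 onwards) into a single value.
# sum_coverage is a hypothetical helper name used only for this illustration.
sum_coverage() {
    awk -v OFS='\t' '{ sum = 0; for (i = 4; i <= NF; i++) sum += $i; print $1, $2, $3, sum }'
}

# Toy input (made up): two intervals, three samples.
toy_union=$(printf 'chr1\t0\t10\t1\t0\t2\nchr1\t10\t25\t3\t4\t5\n')

# Apply the summing step and show the combined coverage per interval.
combined=$(printf '%s\n' "$toy_union" | sum_coverage)
printf '%s\n' "$combined"
```

Each output line keeps the interval coordinates and replaces the per-sample columns with their sum, which is the format the masking step expects.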
Masking region example for the population Mus musculus musculus AFG:
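INPUT=Mmm_AFG.combined.bga
OUTPUT=Mmm_AFG.combined.bga.stcov5
#keep intervals with a summed coverage below 5
awk '{if($4<5) print $0}' $INPUT > $INPUT".stcov5"
#merge the retained low-coverage intervals
bedtools merge -i $INPUT".stcov5" > $INPUT".stcov5.merge"
#write merged intervals with a constant 4 in column four; this overwrites the intermediate .stcov5 file, which has the same name as $OUTPUT
awk -v OFS='\t' '{print $1,$2,$3,4}' $INPUT".stcov5.merge" > $OUTPUT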
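To see what the masking step produces, the sketch below runs the same logic on a made-up combined coverage file. It is an illustration only: instead of `bedtools merge` it uses a small awk stand-in that joins overlapping or directly adjacent intervals on the same chromosome in position-sorted input, and it omits the constant fourth column. The function name `low_cov_regions` and the toy values are assumptions.

```shell
# Keep intervals whose summed coverage is below a threshold and join
# overlapping or directly adjacent ones. This is an awk stand-in for
# "filter + bedtools merge", assuming position-sorted 4-column input.
# low_cov_regions is a hypothetical helper name; $1 is the threshold.
low_cov_regions() {
    awk -v OFS='\t' -v thr="$1" '
        $4 + 0 < thr + 0 {
            if ($1 == chrom && $2 + 0 <= end + 0) {
                # Interval touches or overlaps the open region: extend it.
                if ($3 + 0 > end + 0) end = $3
            } else {
                # New region: flush the previous one, then start over.
                if (chrom != "") print chrom, start, end
                chrom = $1; start = $2; end = $3
            }
        }
        END { if (chrom != "") print chrom, start, end }
    '
}

# Toy combined coverage (made up): chrom, start, end, summed coverage.
toy_combined=$(printf 'chr1\t0\t10\t3\nchr1\t10\t25\t12\nchr1\t25\t30\t0\nchr1\t30\t40\t4\nchr2\t0\t5\t1\n')

mask=$(printf '%s\n' "$toy_combined" | low_cov_regions 5)
printf '%s\n' "$mask"
```

The interval with a summed coverage of 12 is dropped, and the two adjacent low-coverage intervals on chr1 (25-30 and 30-40) are reported as one region.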
used software:
- bedtools v2.24.0 http://bedtools.readthedocs.io/en/latest/
- awk
SNP and INDEL calling
used software:
K80 distance calculation
Get population specific SNPs
used software:
Calculate population specific Consensus sequence
used software:
Calculate K80 distance between populations
Dxy distance calculation
Calculate Dxy distance between populations
used software:
Calculate Dxy distance between individuals and populations
used software: