# Scripts for the Publication: Ullrich KK, Tautz D

## Data sources

Genome mapping files for _Mus musculus domesticus GER_, _Mus musculus domesticus FRA_, _Mus musculus domesticus IRA_, _Mus musculus musculus AFG_, _Mus musculus castaneus CAS_ and _Mus spretus SPRE_ were obtained from:

+ <http://wwwuser.gwdg.de/~evolbio/evolgen/wildmouse/m_m_domesticus/genomes_bam/>
+ <http://wwwuser.gwdg.de/~evolbio/evolgen/wildmouse/m_m_musculus/genomes_bam/>
+ <http://wwwuser.gwdg.de/~evolbio/evolgen/wildmouse/m_m_castaneus/genomes_bam/>
+ <http://wwwuser.gwdg.de/~evolbio/evolgen/wildmouse/m_spretus/genomes_bam/>

For mapping details, see the original publication ([Harr et al. 2016](http://www.nature.com/articles/sdata201675)) and its supplementary document <http://www.nature.com/article-assets/npg/sdata/2016/sdata201675/extref/sdata201675-s7.docx>.

## Get masking regions for individual samples and natural populations

To mask genomic regions that showed low coverage in the natural populations, based on the genomic mapping BAM files, we only considered the stable chromosomes from the reference GRCm38 _mm10_ <http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/mouse/>. The BAM files were processed with 'genomeCoverageBed' to obtain site-specific genome coverage, and the per-sample coverage files were then united with 'unionBedGraphs'. The combined per-population coverage was filtered to retain only regions with a summed coverage smaller than 5; these regions were used as the masking regions.

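
The sum-and-threshold logic described above can be illustrated on toy data before running it on real BAM files. This is a minimal sketch only: the file names `toy.union.bga`, `toy.combined.bga` and `toy.mask.bed` and the coverage values are hypothetical, and the input stands in for the kind of table 'unionBedGraphs' produces (chromosome, start, end, then one coverage column per sample).

```shell
# Toy union bedGraph with three samples (hypothetical values, not real data):
# columns are chrom, start, end, cov_sample1, cov_sample2, cov_sample3
printf 'chr1\t0\t100\t0\t1\t2\nchr1\t100\t200\t3\t4\t5\nchr1\t200\t300\t1\t1\t1\n' > toy.union.bga

# Sum the per-sample coverage columns (4..NF) for each interval
awk -v OFS='\t' '{sum=0; for (i=4; i<=NF; i++) sum+=$i; print $1,$2,$3,sum}' toy.union.bga > toy.combined.bga

# Keep only intervals whose summed coverage is smaller than 5
awk '$4<5' toy.combined.bga > toy.mask.bed

cat toy.mask.bed
```

With these toy values the summed coverages are 3, 12 and 3, so the first and third intervals end up in the mask while the middle one is kept.
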

genomeCoverageBed example for the population _Mus musculus musculus AFG_:

```
#example for population Mmm_AFG:
#
#used BAM files:
#
#AFG1_396.bam
#AFG2_413.bam
#AFG3_416.bam
#AFG4_424.bam
#AFG5_435.bam
#AFG6_444.bam

REFERENCE=mm10.fasta
#write one bedGraph (.bga) coverage file per BAM file, including zero-coverage regions
for file in *.bam; do genomeCoverageBed -ibam "$file" -bga -g "$REFERENCE" > "$file.bga"; done
```

unionBedGraphs example for the population _Mus musculus musculus AFG_:

```
INPUT1=AFG1_396.bam.bga
INPUT2=AFG2_413.bam.bga
INPUT3=AFG3_416.bam.bga
INPUT4=AFG4_424.bam.bga
INPUT5=AFG5_435.bam.bga
INPUT6=AFG6_444.bam.bga
OUTPUT=Mmm_AFG.combined.bga
#unite the per-sample coverage files and sum the coverage columns (4..NF) per interval
unionBedGraphs -i $INPUT1 $INPUT2 $INPUT3 $INPUT4 $INPUT5 $INPUT6 | awk -v OFS='\t' '{sum=0; for (i=4; i<=NF; i++) sum+=$i; print $1,$2,$3,sum}' > $OUTPUT
```

Masking region example for the population _Mus musculus musculus AFG_:

```
INPUT=Mmm_AFG.combined.bga
OUTPUT=Mmm_AFG.combined.bga.stcov5
#keep intervals with a combined coverage smaller than 5
awk '{if($4<5) print $0}' $INPUT > $INPUT".stcov5"
#merge adjacent low-coverage intervals
bedtools merge -i $INPUT".stcov5" > $INPUT".stcov5.merge"
#append a constant fourth column (4) to the merged regions
awk -v OFS='\t' '{print $1,$2,$3,4}' $INPUT".stcov5.merge" > $OUTPUT
```

_used software:_

+ bedtools v2.24.0 <http://bedtools.readthedocs.io/en/latest/>
+ awk

## SNP and INDEL calling

_used software:_

## K80 distance calculation

### Get population specific SNPs

_used software:_

### Calculate population specific Consensus sequence

_used software:_

### Calculate K80 distance between populations

## Dxy distance calculation

### Calculate Dxy distance between populations

_used software:_

### Calculate Dxy distance between individuals and populations

_used software:_