# Scripts for the Publication:

Ullrich KK, Tautz D

## Data sources

Genome mapping (BAM) files for _Mus musculus domesticus GER_, _Mus musculus domesticus FRA_, _Mus musculus domesticus IRA_, _Mus musculus musculus AFG_, _Mus musculus castaneus CAS_ and _Mus spretus SPRE_ were obtained from:

+ _Mus musculus domesticus_ (GER, FRA, IRA): <http://wwwuser.gwdg.de/~evolbio/evolgen/wildmouse/m_m_domesticus/genomes_bam/>
+ _Mus musculus musculus_ (AFG): <http://wwwuser.gwdg.de/~evolbio/evolgen/wildmouse/m_m_musculus/genomes_bam/>
+ _Mus musculus castaneus_ (CAS): <http://wwwuser.gwdg.de/~evolbio/evolgen/wildmouse/m_m_castaneus/genomes_bam/>
+ _Mus spretus_ (SPRE): <http://wwwuser.gwdg.de/~evolbio/evolgen/wildmouse/m_spretus/genomes_bam/>

For mapping details, see the original publication ([Harr et al. 2016](http://www.nature.com/articles/sdata201675)) and its supplementary file <http://www.nature.com/article-assets/npg/sdata/2016/sdata201675/extref/sdata201675-s7.docx>.

## Get masking regions for individual samples and natural populations

To mask genomic regions that showed low coverage in the natural populations (based on the genome mapping BAM files), we only considered the stable chromosomes of the reference assembly GRCm38 (_mm10_) <http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/mouse/>.
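
The commands below do not show how the restriction to stable chromosomes was applied. As a minimal sketch, assuming a hypothetical keep-list file (here called `stable_chromosomes.txt`, one chromosome name per line, spelled exactly as in column 1 of the coverage files), a BED or bedGraph file could be restricted to those chromosomes with awk. The function name `keep_chromosomes` is likewise hypothetical:

```shell
# Hypothetical helper (not part of the original scripts): print only those lines
# of a BED/bedGraph file whose chromosome (column 1) appears in a keep-list file.
# Usage: keep_chromosomes stable_chromosomes.txt input.bga > filtered.bga
keep_chromosomes() {
    local keep_list="$1"
    local bed_file="$2"
    # First file (NR == FNR): remember each chromosome name as an array key.
    # Second file: print a line only if its chromosome is a remembered key.
    awk 'NR == FNR { keep[$1]; next } $1 in keep' "$keep_list" "$bed_file"
}
```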

The BAM files were processed with `genomeCoverageBed` to obtain per-site genome coverage, and the per-sample coverage tracks of each population were then combined with `unionBedGraphs`. The per-sample coverages were summed for each interval, and the combined coverage of each population was filtered to retain only regions with a coverage below 5; these regions were used as the masking regions.

genomeCoverageBed example for the population _Mus musculus musculus AFG_:
```
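#example for population Mmm_AFG:
#
#used BAM files (genomeCoverageBed -ibam expects position-sorted BAM files):
#
#AFG1_396.bam
#AFG2_413.bam
#AFG3_416.bam
#AFG4_424.bam
#AFG5_435.bam
#AFG6_444.bam

REFERENCE=mm10.fasta

# -bga: report coverage in BedGraph format, including zero-coverage intervals.
# -g is only required with -i input; with -ibam, chromosome sizes come from the BAM header.
for file in *.bam; do
    genomeCoverageBed -ibam "$file" -bga -g "$REFERENCE" > "$file.bga"
done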
```

unionBedGraphs example for the population _Mus musculus musculus AFG_:
```
INPUT1=AFG1_396.bam.bga
INPUT2=AFG2_413.bam.bga
INPUT3=AFG3_416.bam.bga
INPUT4=AFG4_424.bam.bga
INPUT5=AFG5_435.bam.bga
INPUT6=AFG6_444.bam.bga

OUTPUT=Mmm_AFG.combined.bga

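# unionBedGraphs writes one coverage column per input file (columns 4..NF);
# awk sums them into a single combined coverage per interval.
unionBedGraphs -i "$INPUT1" "$INPUT2" "$INPUT3" "$INPUT4" "$INPUT5" "$INPUT6" | awk -v OFS='\t' '{sum=0; for (i=4; i<=NF; i++) sum+=$i; print $1,$2,$3,sum}' > "$OUTPUT"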
```
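
To illustrate the summation step, the sketch below performs it with awk inside a hypothetical function, `sum_coverage_columns`, and runs it on a made-up row in `unionBedGraphs` layout (chromosome, start, end, then one coverage column per BAM file). The row is toy input, not data from the AFG samples:

```shell
# Hypothetical wrapper: for each unionBedGraphs row, add up the per-sample
# coverage columns (4..NF) and print chrom, start, end and the combined coverage.
sum_coverage_columns() {
    awk -v OFS='\t' '{
        sum = 0
        for (i = 4; i <= NF; i++) sum += $i
        print $1, $2, $3, sum
    }'
}

# Toy input (not real data): one interval with coverages from six samples.
# Prints chr1, 100, 200 and 4, separated by tabs.
printf 'chr1\t100\t200\t0\t1\t2\t0\t1\t0\n' | sum_coverage_columns
```

Because the loop runs over columns 4 to NF, the same command works for populations with any number of samples.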

Masking regions example for the population _Mus musculus musculus AFG_:
```
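INPUT=Mmm_AFG.combined.bga
OUTPUT=Mmm_AFG.combined.bga.stcov5
# keep intervals with a combined coverage below 5
awk '{if ($4 < 5) print $0}' "$INPUT" > "$INPUT.stcov5"
# merge overlapping and bookended low-coverage intervals (input must be sorted)
bedtools merge -i "$INPUT.stcov5" > "$INPUT.stcov5.merge"
# add a constant fourth column (value 4); $OUTPUT has the same name as the first
# intermediate file, which is overwritten here
awk -v OFS='\t' '{print $1,$2,$3,4}' "$INPUT.stcov5.merge" > "$OUTPUT"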
```
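
The threshold-and-merge step can also be sketched in plain awk, for example to check results without bedtools. The hypothetical `low_coverage_regions` function is a stand-in for illustration, not the pipeline itself (which uses `bedtools merge`). It assumes intervals grouped by chromosome and sorted by start, as in `unionBedGraphs` output, uses the threshold of 5 from the text as its default, and, unlike the last awk line above, does not append the constant fourth column:

```shell
# Hypothetical awk stand-in for: awk '$4 < 5' | bedtools merge
# Keeps intervals whose combined coverage (column 4) is below the threshold and
# merges overlapping or bookended intervals on the same chromosome.
# Assumes intervals grouped by chromosome and sorted by start.
# Usage (hypothetical output name): low_coverage_regions 5 < Mmm_AFG.combined.bga > low_coverage.bed
low_coverage_regions() {
    local threshold="${1:-5}"
    awk -v OFS='\t' -v max="$threshold" '
        $4 + 0 < max + 0 {
            s = $2 + 0; e = $3 + 0
            if (have && $1 == chrom && s <= cur_end) {
                # overlaps or touches the current region: extend it
                if (e > cur_end) cur_end = e
                next
            }
            # otherwise report the previous region and start a new one
            if (have) print chrom, cur_start, cur_end
            chrom = $1; cur_start = s; cur_end = e; have = 1
        }
        END { if (have) print chrom, cur_start, cur_end }
    '
}
```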
_used software:_
+ bedtools v2.24.0 <http://bedtools.readthedocs.io/en/latest/>
+ awk

## SNP and INDEL calling

_used software:_

## K80 distance calculation

### Get population specific SNPs

_used software:_

### Calculate population specific Consensus sequence

_used software:_

### Calculate K80 distance between populations
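
For reference, the Kimura (1980) two-parameter (K80) distance between two aligned sequences is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of compared sites that differ by a transition or a transversion, respectively. The awk function below is a minimal sketch of that formula, not the authors' script. The function name `k80_distance` is hypothetical, and skipping sites where either sequence is not A, C, G or T (for example masked `N` sites) is an assumption of this sketch:

```shell
# Hypothetical sketch (not the authors' script): K80 distance between two
# aligned sequences passed as arguments, e.g. k80_distance ACGT ACGA
# Sites where either sequence is not A, C, G or T (e.g. masked N) are skipped.
k80_distance() {
    awk -v x="$1" -v y="$2" 'BEGIN {
        if (length(x) != length(y)) {
            print "k80_distance: sequences differ in length" > "/dev/stderr"
            exit 1
        }
        purine["A"] = 1; purine["G"] = 1
        n = 0; ts = 0; tv = 0
        for (i = 1; i <= length(x); i++) {
            a = toupper(substr(x, i, 1)); b = toupper(substr(y, i, 1))
            if (a !~ /^[ACGT]$/ || b !~ /^[ACGT]$/) continue
            n++
            if (a == b) continue
            # transition: A<->G (both purines) or C<->T (both pyrimidines)
            if ((a in purine) == (b in purine)) ts++
            else tv++
        }
        if (n == 0) { print "k80_distance: no comparable sites" > "/dev/stderr"; exit 1 }
        P = ts / n; Q = tv / n
        if (1 - 2 * P - Q <= 0 || 1 - 2 * Q <= 0) {
            print "k80_distance: distance undefined (too divergent)" > "/dev/stderr"
            exit 1
        }
        d = -0.5 * log(1 - 2 * P - Q) - 0.25 * log(1 - 2 * Q)
        printf "%.6f\n", d + 0  # "+ 0" avoids printing -0.000000 for identical sequences
    }'
}
```

For example, `k80_distance AAAAAAAAAA AAAAAAAAGC` has one transition and one transversion in ten sites (P = Q = 0.1) and prints 0.234123.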

## Dxy distance calculation

### Calculate Dxy distance between populations
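
One common definition of Dxy (Nei) is the average number of nucleotide differences per site between sequences sampled from two populations: Dxy = (1 / (n_x n_y L)) Σ_i Σ_j d_ij, where d_ij counts the differences between sequence i of population X and sequence j of population Y over L compared sites. The awk function below is a minimal sketch of that definition for small aligned sequences, not the authors' implementation. The function name `dxy`, the comma-separated argument convention, and the requirement of equal-length sequences without missing data are assumptions of this sketch:

```shell
# Hypothetical sketch (not the authors' script): Dxy as the mean number of
# differences per site over all pairs of sequences (one from X, one from Y).
# Arguments: comma-separated aligned sequences of population X, then of Y.
# Assumes equal-length sequences without missing data.
dxy() {
    awk -v xs="$1" -v ys="$2" 'BEGIN {
        nx = split(xs, x, ","); ny = split(ys, y, ",")
        L = length(x[1]); diffs = 0
        if (nx == 0 || ny == 0 || L == 0) { print "dxy: empty input" > "/dev/stderr"; exit 1 }
        for (i = 1; i <= nx; i++) {
            for (j = 1; j <= ny; j++) {
                if (length(x[i]) != L || length(y[j]) != L) {
                    print "dxy: sequences differ in length" > "/dev/stderr"
                    exit 1
                }
                # count sites where sequence i of X and sequence j of Y differ
                for (k = 1; k <= L; k++)
                    if (substr(x[i], k, 1) != substr(y[j], k, 1)) diffs++
            }
        }
        printf "%.6f\n", diffs / (nx * ny * L)
    }'
}
```

For example, `dxy AAAA,AAAT AAAA,TTTT` compares four sequence pairs with 0, 4, 1 and 3 differences over 4 sites and prints 0.500000.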

_used software:_

### Calculate Dxy distance between individuals and populations

_used software:_