Commit 9df63214 authored by Kristian Ullrich

added 'get_dK80.r' description

parent 47b16e01
@@ -23,9 +23,9 @@ The per population combined coverage was further processed to only retain region
genomeCoverageBed example for the population _Mus musculus musculus AFG_:
```
#example for population Mmm_AFG:
#
#used BAM files:
#
#AFG1_396.bam
#AFG2_413.bam
#AFG3_416.bam
@@ -92,9 +92,9 @@ For SNP and INDEL calling the BAM files were processed with 'samtools mpileup' a
samtools mpileup | bcftools call example for chromosome 1 for the population _Mus musculus musculus AFG_:
```
#example for population Mmm_AFG:
#
#used BAM files:
#
#AFG1_396.bam
#AFG2_413.bam
#AFG3_416.bam
@@ -168,9 +168,9 @@ To get population specific CONSENSUS VCF files the VCF file produced with 'bcfto
vcftools example for chromosome 1 for the population _Mus musculus musculus AFG_:
```
#example for population Mmm_AFG:
#
#VCF IDs:
#
#396
#413
#416
@@ -202,9 +202,9 @@ NOTE: For each population all analyzed chromosomes were merged into one file.
vcfparser.py mvcf2consensus example for the population _Mus musculus musculus AFG_:
```
#example for population Mmm_AFG:
#
#VCF IDs:
#
#396
#413
#416
@@ -227,7 +227,6 @@ NOTE: For each population all analyzed chromosomes were merged into one file.
vcfparser.py vcf2fasta example for the population _Mus musculus musculus AFG_:
```
#example for population Mmm_AFG:
#
REFERENCE=mm10.fasta
INPUT=Mmm_AFG.mpileup.q0Q10.chr1.bcfcall.mv.remIndels.recode.consensus.vcf
@@ -237,11 +236,9 @@ MASKFILE=Mmm_AFG.combined.bga.stcov5
python vcfparser.py vcf2fasta -ivcf $INPUT -o $OUTPUT -R $REFERENCE -samples Mmm_AFG.mv -chr chr1 -ibga $MASKFILE -cov2N 4
```
All pseudo-genome FASTA files can be obtained from:
http://wwwuser.gwdg.de/~evolbio/evolgen/wildmouse/introgression/mpileup_pop_mv/fasta/consensus/
_used software:_
@@ -250,6 +247,58 @@ _used software:_
### Calculate dK80 distance between populations using the CONSENSUS pseudo-genome files
To calculate the dK80 distance between populations, quartets ([X],[Y],[Z],[O]) are used. Within each population quartet, dK80 is calculated for trios (e.g. [X],[Z],[O]) on non-overlapping sequence windows ($w$) throughout the investigated genome. For a given trio and window ($w_{i}$), dK80 is defined as:
\begin{equation}
dK80_{XZO_{w_{i}}} = d_{XO_{w_{i}}} - d_{XZ_{w_{i}}}
\end{equation}
where $d_{XO_{w_{i}}}$ and $d_{XZ_{w_{i}}}$ are defined as the average Kimura's 2-parameter sequence distance ([Kimura 1980](https://www.ncbi.nlm.nih.gov/pubmed/7463489)) between the corresponding two populations, calculated with the function 'dist.dna' of the R package 'ape' ([Paradis et al. 2004](https://academic.oup.com/bioinformatics/article/20/2/289/204981/APE-Analyses-of-Phylogenetics-and-Evolution-in-R)) using the model 'K80'. Prior to the calculation of dK80, all sites with missing data within the specified window ($w_{i}$) and the specified populations were removed across the whole quartet with the 'Biostrings' R package ([Pages et al. 2009](https://bioconductor.org/packages/release/bioc/html/Biostrings.html)).
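The definition above can be illustrated with a minimal pure-Python sketch. It is not the 'get_dK80.r' script and does not call 'ape' or 'Biostrings'; it only applies the standard K80 formula $d = -\frac{1}{2}\ln\left((1-2P-Q)\sqrt{1-2Q}\right)$, where $P$ and $Q$ are the proportions of transitions and transversions. The function names (`drop_missing_sites`, `k80_distance`, `dk80`) are hypothetical, and each population is assumed to be represented by a single consensus sequence per window.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def drop_missing_sites(seqs):
    """Keep only alignment columns where every sequence has A, C, G or T.

    Assumption: anything else (N, gaps, ambiguity codes) counts as missing.
    """
    upper = [s.upper() for s in seqs]
    columns = [col for col in zip(*upper) if all(base in VALID for base in col)]
    if not columns:
        return ["" for _ in upper]
    return ["".join(bases) for bases in zip(*columns)]


def k80_distance(a, b):
    """Kimura 2-parameter distance between two aligned, gap-free sequences."""
    n = len(a)
    if n == 0:
        return float("nan")
    transitions = 0
    transversions = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        # Same chemical class (purine/purine or pyrimidine/pyrimidine) is a transition.
        if (x in PURINES) == (y in PURINES):
            transitions += 1
        else:
            transversions += 1
    p = transitions / n
    q = transversions / n
    # math.log raises ValueError for saturated windows (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def dk80(x, z, o):
    """dK80 for the trio (X, Z, O) on one window: d_XO - d_XZ."""
    return k80_distance(x, o) - k80_distance(x, z)


# Toy quartet window (X, Y, Z, O); sites missing in any member are removed first.
x, y, z, o = drop_missing_sites(["AAAAAAAAAAN", "AAAAAAAAAAA", "AAAAAAAAAAA", "AAAAAAAAAGA"])
print(dk80(x, z, o))
```

In this toy window the last column is dropped for the whole quartet because [X] has an `N` there, so all four sequences stay aligned before the trio distances are taken.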
example for chromosome 1 for the dK80 calculation for the quartet [X]: _Mus musculus domesticus FRA_; [Y]: _Mus musculus domesticus GER_; [Z]: _Mus musculus domesticus IRA_; [O]: _Mus musculus musculus AFG_:
```
#example for the quartet [X]: FRA; [Y]: GER; [Z]: IRA; [O]: AFG
#change the bottom part of the script 'get_dK80.r' for each chromosome and quartet
#here you can find the example for chromosome 1
popX <- "Mmd_FRA"
popY <- "Mmd_GER"
popZ <- "Mmd_IRA"
popO <- "Mmm_AFG"
TMP_DIR <- "/tmp"
popX.pos <- 2
popY.pos <- 3
popZ.pos <- 4
popO.pos <- 5
SEQ_FILE <- "http://wwwuser.gwdg.de/~evolbio/evolgen/wildmouse/introgression/mpileup_pop_mv/fasta/consensus/CAS_FRA_GER_IRA_AFG_SPRE.mpileup.q0Q10.chr1.bcfcall.mv.remIndels.recode.refmajorsample.ref.consensus.fasta"
chr <- "chr1"
OUT_FILE <- paste0(TMP_DIR,"/",popX,"_",popY,".",popZ,".",popO,".",chr,".tsv")
WSIZE <- 25000
WJUMP <- 25000
DISTMODEL <- "K80"
```
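To make the window parameters concrete, the following Python sketch shows how `WSIZE` and `WJUMP` translate into window coordinates and how the `OUT_FILE` name from the settings above is assembled. It is an illustration only: the helper names (`window_bounds`, `out_file`) are hypothetical, and dropping a trailing partial window is an assumption, since the handling in 'get_dK80.r' is not shown here.

```python
def window_bounds(seq_len, wsize, wjump):
    """Return 0-based, half-open (start, end) windows along a sequence.

    With wjump == wsize the windows are non-overlapping.
    Assumption: a trailing window shorter than wsize is dropped.
    """
    bounds = []
    start = 0
    while start + wsize <= seq_len:
        bounds.append((start, start + wsize))
        start += wjump
    return bounds


def out_file(tmp_dir, pop_x, pop_y, pop_z, pop_o, chrom):
    """Mirror the paste0() call that builds OUT_FILE in the R settings."""
    return f"{tmp_dir}/{pop_x}_{pop_y}.{pop_z}.{pop_o}.{chrom}.tsv"


WSIZE = 25000
WJUMP = 25000

# A hypothetical 110 kbp sequence yields four full 25 kbp windows.
print(window_bounds(110000, WSIZE, WJUMP))
print(out_file("/tmp", "Mmd_FRA", "Mmd_GER", "Mmd_IRA", "Mmm_AFG", "chr1"))
```

Setting `WJUMP` smaller than `WSIZE` would produce overlapping sliding windows; the values used here (both 25000) give the non-overlapping 25 kbp windows described above.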
All dK80 files can be obtained from:
http://wwwuser.gwdg.de/~evolbio/evolgen/wildmouse/introgression/mpileup_pop_mv/browser_tracks/dK80/25kbp_sw/
NOTE: For each quartet comparison all analyzed chromosomes were merged into one file.
_used software:_
+ R version 3.4.1 (2017-06-30)
+ R package ape_4.1
+ R package Biostrings_2.40.2
+ R package S4Vectors_0.10.3
+ R package XVector_0.12.1
+ R package IRanges_2.6.1
+ R package BiocGenerics_0.18.0
## Simulation
To simulate genomes, first we estimated the number of pair-wise segregating sites with the CONSENSUS pseudo-genome files adding the reference mm10. Subsequently we used