Gitlab Community Edition Instance

Skip to content
Snippets Groups Projects
Commit f543a201 authored by Kristian Ullrich's avatar Kristian Ullrich
Browse files

changed general setup

parent d3e7253d
No related branches found
No related tags found
No related merge requests found
# Scripts for the Publication:
Ullrich KK, Tautz D
Introgression patterns between house mouse subspecies and species reveal genomic windows of frequent exchange
Ullrich KK, Linnenbrink M, Tautz D
## Data sources
......@@ -10,9 +12,9 @@ For mapping details please look into the original publication ([Harr et al. 2016
## Get masking regions for individual samples and natural populations
For masking genomic regions in natural populations which showed low coverage based on the genomic mapping BAM files we only considered the stable chromosomes from the reference GRCm38 _mm10_ <http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/mouse/>.
For masking genomic regions in natural populations which showed low coverage based on the genomic mapping BAM files we only considered the stable chromosomes (chr1-chr19,chrX,chrY,chrM) from the reference GRCm38 _mm10_ <http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/mouse/>.
The BAM files were processed with 'genomeCoverageBed' to obtain site specific genome coverage and further united with 'unionBedGraphs'.
The BAM files were processed with 'genomeCoverageBed' from the bedtools software suite ([Quinlan and Hall 2010](https://academic.oup.com/bioinformatics/article/26/6/841/244688/BEDTools-a-flexible-suite-of-utilities-for)) to obtain site specific genome coverage and further united with 'unionBedGraphs'.
##### Population specific masking
......@@ -31,7 +33,7 @@ genomeCoverageBed example for the population _Mus musculus musculus AFG_:
#AFG5_435.bam
#AFG6_444.bam
$REFERENCE=mm10.fasta
REFERENCE=mm10.fasta
for file in *.bam; do genomeCoverageBed -ibam $file -bga -g $REFFERENCE > $file".bga";done
```
......@@ -60,6 +62,10 @@ bedtools merge -i $INPUT".stcov5" > $INPUT".stcov5.merge"
awk -v OFS='\t' '{print $1,$2,$3,4}' $INPUT".stcov5.merge" > $OUTPUT
```
United masked files generated can be obtained from:
http://wwwuser.gwdg.de/~evolbio/evolgen/wildmouse/introgression/masking/
##### Individual specific masking
For individuals regions with a coverage smaller than 5 were used as masking regions.
......@@ -76,15 +82,14 @@ awk -v OFS='\t' '{print $1,$2,$3,4}' $INPUT".stcov5.merge" > $OUTPUT
_used software:_
+ bedtools v2.24.0 <http://bedtools.readthedocs.io/en/latest/>
+ bedtools v2.26.0 <http://bedtools.readthedocs.io/en/latest/>
+ awk
## SNP and INDEL calling
For SNP and INDEL calling the BAM files were processed with 'samtools mpileup' and 'bcftools call' with relaxed quality options to retain information in CNV regions.
Due to the amount of data first mpileup files were generated for each BAM file and merged with 'bcftools merge'. After merging mpileup files, 'bcftools call' was used to generate the final VCF file.
For SNP and INDEL calling the BAM files were processed with 'samtools mpileup' and 'bcftools call' ([Li et al. 2011](https://academic.oup.com/bioinformatics/article/25/16/2078/204688/The-Sequence-Alignment-Map-format-and-SAMtools)) with relaxed quality options to retain information in CNV regions. Each chromosome was analyzed seperately.
samtools mpileup | bcftools call example for the population _Mus musculus musculus AFG_:
samtools mpileup | bcftools call example for chromosome 1 for the population _Mus musculus musculus AFG_:
```
#example for population Mmm_AFG:
#
......@@ -97,9 +102,26 @@ samtools mpileup | bcftools call example for the population _Mus musculus muscul
#AFG5_435.bam
#AFG6_444.bam
#generating indiviual mpileup files for each BAM file
REFERENCE=mm10.fasta
$REFERENCE=mm10.fasta
PLOIDY=ploidy.txt
#chrX 1 171031299 M 1
#chrY 1 91744698 M 1
#chrY 1 91744698 F 0
#chrM 1 16299 F 1
#chrM 1 16299 M 1
#* * * M 2
#* * * F 2
SAMPLES=AFG_samples.txt
#396 M
#413 M
#416 M
#424 M
#435 F
#444 M
OUTPUT=mpileup.q0Q10.chr1.Mmm_AFG.bcfcall.mv.vcf
INPUT1=AFG1_396.bam
INPUT2=AFG2_413.bam
......@@ -108,63 +130,42 @@ INPUT4=AFG4_424.bam
INPUT5=AFG5_435.bam
INPUT6=AFG6_444.bam
samtools mpileup -q 0 -Q 10 -A -d 99999 -t DP,AD,ADF,ADR -vf $REFERENCE -o $INPUT1".mpileup.q0Q10.vcf" $INPUT1
samtools mpileup -q 0 -Q 10 -A -d 99999 -t DP,AD,ADF,ADR -vf $REFERENCE -o $INPUT2".mpileup.q0Q10.vcf" $INPUT2
samtools mpileup -q 0 -Q 10 -A -d 99999 -t DP,AD,ADF,ADR -vf $REFERENCE -o $INPUT3".mpileup.q0Q10.vcf" $INPUT3
samtools mpileup -q 0 -Q 10 -A -d 99999 -t DP,AD,ADF,ADR -vf $REFERENCE -o $INPUT4".mpileup.q0Q10.vcf" $INPUT4
samtools mpileup -q 0 -Q 10 -A -d 99999 -t DP,AD,ADF,ADR -vf $REFERENCE -o $INPUT5".mpileup.q0Q10.vcf" $INPUT5
samtools mpileup -q 0 -Q 10 -A -d 99999 -t DP,AD,ADF,ADR -vf $REFERENCE -o $INPUT6".mpileup.q0Q10.vcf" $INPUT6
bgzip $INPUT1".mpileup.q0Q10.vcf"
bgzip $INPUT2".mpileup.q0Q10.vcf"
bgzip $INPUT3".mpileup.q0Q10.vcf"
bgzip $INPUT4".mpileup.q0Q10.vcf"
bgzip $INPUT5".mpileup.q0Q10.vcf"
bgzip $INPUT6".mpileup.q0Q10.vcf"
tabix $INPUT1".mpileup.q0Q10.vcf.gz"
tabix $INPUT2".mpileup.q0Q10.vcf.gz"
tabix $INPUT3".mpileup.q0Q10.vcf.gz"
tabix $INPUT4".mpileup.q0Q10.vcf.gz"
tabix $INPUT5".mpileup.q0Q10.vcf.gz"
tabix $INPUT6".mpileup.q0Q10.vcf.gz"
BAMLIST=Mmm_AFG.list
#merge individual mpileup files
echo $INPUT1 >> $BAMLIST
echo $INPUT2 >> $BAMLIST
echo $INPUT3 >> $BAMLIST
echo $INPUT4 >> $BAMLIST
echo $INPUT5 >> $BAMLIST
echo $INPUT6 >> $BAMLIST
MPILEUPLIST=AFG.mpileup.list
#example for chromosome 1
echo $INPUT1".mpileup.q0Q10.vcf.gz" >> $MPILEUPLIST
echo $INPUT2".mpileup.q0Q10.vcf.gz" >> $MPILEUPLIST
echo $INPUT3".mpileup.q0Q10.vcf.gz" >> $MPILEUPLIST
echo $INPUT4".mpileup.q0Q10.vcf.gz" >> $MPILEUPLIST
echo $INPUT5".mpileup.q0Q10.vcf.gz" >> $MPILEUPLIST
echo $INPUT6".mpileup.q0Q10.vcf.gz" >> $MPILEUPLIST
MPILEUPOUTPUT=Mmm_AFG.mpileup.q0Q10.vcf.gz
bcftools merge -m all -O z -o $MPILEUPOUTPUT -l $MPILEUPLIST
#call SNP and INDEL
OUTPUT=Mmm_AFG.mpileup.q0Q10.bcfcall.mv.vcf.gz
bcftools call -O z -f GQ -m -v -o $OUTPUT $MPILEUPOUTPUT
samtools mpileup -q 0 -Q 10 -A -d 99999 -t DP,AD,ADF,ADR -r chr1 -uf $REFERENCE -b $BAMLIST | bcftools call -O v -f GQ -m -v --ploidy-file $PLOIDY -S $SAMPLES > $OUTPUT
bgzip $OUTPUT
tabix $OUTPUT".gz"
```
_used software:_
+ samtools 1.3.1-36-g613501f (using htslib 1.3.1-59-g0f2a88a)
+ bcftools 1.3.1-39-gd797e86 (using htslib 1.3.1-59-g0f2a88a)
+ samtools 1.3.1 (using htslib 1.3.1)
+ bcftools 1.3.1 (using htslib 1.3.1)
+ bgzip v1.3
+ tabix v1.3
All mpileup generated for each population can be obtained from:
http://wwwuser.gwdg.de/~evolbio/evolgen/wildmouse/introgression/mpileup_pop_mv/vcf/mpileup/
NOTE: For each population all analyzed chromosomes were merged into one file.
## K80 distance calculation
### Get population specific SNPs
To get population specific CONSENSUS VCF files the VCF file produced with 'bcftools call' was first re-coded into population specific VCF files with 'vcftools'. Further the population specific VCF file containing multiple individuals was parsed with 'vcfparser.py mvcf2consensus' to obtain a CONSENSUS VCF file for each population. This CONSENSUS VCF files were used to generate pseudo-genomes files per natural population with 'vcfparser.py vcf2fasta' using also the masking regions (see "Get masking regions for individual samples and natural populations").
To get population specific CONSENSUS VCF files the VCF file produced with 'bcftools call' was first re-coded into population specific VCF files with 'vcftools' ([Danecek et al. 2011](https://academic.oup.com/bioinformatics/article/27/15/2156/402296/The-variant-call-format-and-VCFtools)). Further the population specific VCF file containing multiple individuals was parsed with 'vcfparser.py mvcf2consensus' to obtain a CONSENSUS VCF file for each population. This CONSENSUS VCF files were used to generate pseudo-genomes files per natural population with 'vcfparser.py vcf2fasta' using also the masking regions (see "Get masking regions for individual samples and natural populations").
vcftools example for the population _Mus musculus musculus AFG_:
vcftools example for chromosome 1 for the population _Mus musculus musculus AFG_:
```
#example for population Mmm_AFG:
#
......@@ -186,12 +187,18 @@ echo "424" >> $POPIDS
echo "435" >> $POPIDS
echo "444" >> $POPIDS
GZVCF=Mmm_AFG.mpileup.q0Q10.bcfcall.mv.vcf.gz
OUTPUT=Mmm_AFG.mpileup.q0Q10.bcfcall.mv.remIndels
GZVCF=mpileup.q0Q10.chr1.Mmm_AFG.bcfcall.mv.vcf.gz
OUTPUT=Mmm_AFG.mpileup.q0Q10.chr1.bcfcall.mv.remIndels
vcftools --gzvcf $GZVCF --remove-indels --recode --recode-INFO-all --non-ref-ac-any 1 --keep $POPIDS --out $OUTPUT
```
All re-coded vcf files can be obtained from:
http://wwwuser.gwdg.de/~evolbio/evolgen/wildmouse/introgression/mpileup_pop_mv/vcf/recode/
NOTE: For each population all analyzed chromosomes were merged into one file.
vcfparser.py mvcf2consensus example for the population _Mus musculus musculus AFG_:
```
#example for population Mmm_AFG:
......@@ -205,86 +212,44 @@ vcfparser.py mvcf2consensus example for the population _Mus musculus musculus AF
#435
#444
INPUT=Mmm_AFG.mpileup.q0Q10.bcfcall.mv.remIndels.recode.vcf
OUTPUT=Mmm_AFG.mpileup.q0Q10.bcfcall.mv.remIndels.recode.consensus
INPUT=Mmm_AFG.mpileup.q0Q10.chr1.bcfcall.mv.remIndels.recode.vcf
OUTPUT=Mmm_AFG.mpileup.q0Q10.chr1.bcfcall.mv.remIndels.recode.consensus
python vcfparser.py mvcf2consensus -ivcf $INPUT -o $OUTPUT -cdp 11 -chr chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chrX,chrY -samples 396,413,416,424,435,444 -id Mmm_AFG.mv
python vcfparser.py mvcf2consensus -ivcf $INPUT -o $OUTPUT -cdp 11 -chr chr1 -samples 396,413,416,424,435,444 -id Mmm_AFG.mv
```
All CONSENSUS vcf files can be obtained from:
http://wwwuser.gwdg.de/~evolbio/evolgen/wildmouse/introgression/mpileup_pop_mv/vcf/consensus/
NOTE: For each population all analyzed chromosomes were merged into one file.
vcfparser.py vcf2fasta example for the population _Mus musculus musculus AFG_:
```
#example for population Mmm_AFG:
#
REFERENCE=mm10.fasta
INPUT=Mmm_AFG.mpileup.q0Q10.bcfcall.mv.remIndels.recode.consensus.vcf
OUTPUT=Mmm_AFG.mpileup.q0Q10.chr1.bcfcall.mv.remIndels.recode.consensus.chr
INPUT=Mmm_AFG.mpileup.q0Q10.chr1.bcfcall.mv.remIndels.recode.consensus.vcf
OUTPUT=Mmm_AFG.mpileup.q0Q10.chr1.bcfcall.mv.remIndels.recode.consensus.chr1
MASKFILE=Mmm_AFG.combined.bga.stcov5
python vcfparser.py vcf2fasta -ivcf $INPUT -o $OUTPUT -R $REFERENCE -samples Mmm_AFG.mv -chr chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chrX,chrY -ibga $MASKFILE -cov2N 4
python vcfparser.py vcf2fasta -ivcf $INPUT -o $OUTPUT -R $REFERENCE -samples Mmm_AFG.mv -chr chr1 -ibga $MASKFILE -cov2N 4
```
_used software:_
+ vcftools v0.1.15
+ vcfparser.py <https://gitlab.gwdg.de/evolgen/introgression/blob/master/scripts/vcfparser.py>
### Calculate K80 distance between populations using the CONSENSUS pseudo-genome files
## Nucleotide diversity calculations
All pseudo-genome FASTA files can be obtained from:
### Get individual specific SNPs
To get individual specific VCF files the VCF file produced with 'bcftools call' was first re-coded for each individual with 'vcftools'. Further the individual specific VCF files were used to generate pseudo-genomes files per individual with 'vcfparser.py vcf2fasta' using individual masking regions (see "Get masking regions for individual samples and natural populations").
vcftools example for individual 396 of the population _Mus musculus musculus AFG_:
```
#example for individual 396 of the population Mmm_AFG:
#
#VCF IDs:
#
#396
GZVCF=Mmm_AFG.mpileup.q0Q10.bcfcall.mv.vcf.gz
OUTPUT=Mmm_AFG1.396.mpileup.q0Q10.bcfcall.mv.remIndels
vcftools --gzvcf $GZVCF --remove-indels --recode --recode-INFO-all --non-ref-ac-any 1 --indv 396 --out $OUTPUT
```
vcfparser.py vcf2fasta example for individual 396 of the population _Mus musculus musculus AFG_:
```
#example for individual 396 of the population Mmm_AFG:
#
REFERENCE=mm10.fasta
INPUT=Mmm_AFG1.396.mpileup.q0Q10.bcfcall.mv.remIndels.recode.consensus.vcf
OUTPUT=Mmm_AFG1.396.mpileup.q0Q10.chr1.bcfcall.mv.remIndels.recode.consensus.chr
MASKFILE=Mmm_AFG.combined.bga.stcov5
python vcfparser.py vcf2fasta -ivcf $INPUT -o $OUTPUT -R $REFERENCE -samples Mmm_AFG.mv -chr chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chrX,chrY -ibga $MASKFILE -cov2N 4
```
_used software:_
+ vcftools v0.1.15
+ vcfparser.py <https://gitlab.gwdg.de/evolgen/introgression/blob/master/scripts/vcfparser.py>
### Calculate nucleotide diversity within each population
_used software:_
+ variscan v2.0.3
## Dxy distance calculation
### Calculate Dxy distance between populations
_used software:_
### Calculate Dxy distance between individuals and populations
_used software:_
### Calculate dK80 distance between populations using the CONSENSUS pseudo-genome files
## Simulation
To simulate genomes, first we estimated the number of pair-wise segregating sites with the CONSENSUS pseudo-genome files adding the reference mm10. Subsequently we used
#!/usr/bin/python
# -*- coding: UTF-8 -*-
'''
author: Kristian K Ullrich
date: July 2017
email: ullrich@evolbio.mpg.de
License: MIT
The MIT License (MIT)
Copyright (c) 2017 Kristian K Ullrich
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
This software relies on the following python packages:
sys
os
argparse
numpy https://pypi.python.org/pypi/numpy
Bio https://github.com/biopython/DIST
'''
import sys
import os
import argparse
from Bio import SeqIO
import numpy as np
from Bio.SeqUtils import GC
def sliding_window_steps_generator(sw,j,start,end):
if end<=start:
print 'end must be smaller than start'
return
start_seq = np.arange(start,end,j)
end_seq = np.arange(start+sw-1,end,j)
end_seq = np.append(end_seq,np.repeat(end,len(start_seq)-len(end_seq)))
mid_seq = (start_seq+end_seq)/2
return([start_seq,end_seq])
#main
def main():
parser = argparse.ArgumentParser(prog='gccontent', usage='%(prog)s [options] [<arguments>...]',
description='script to extract GC content from nucleotide fasta file')
parser.add_argument('-i', help='specify FASTA input file')
parser.add_argument('-o', help='specify input file')
parser.add_argument('-w', help='specify sliding window size [default:25000]', default = '25000', type = int)
parser.add_argument('-j', help='specify sliding window jump [default:25000]', default = '25000', type = int)
args = parser.parse_args()
if args.i is None:
parser.print_help()
sys.exit('\nPlease specify FASTA input file')
args.i = os.path.abspath(args.i)
if args.o is None:
parser.print_help()
sys.exit('\nPlease specify output file')
args.o = os.path.abspath(args.o)
print args
fasta = SeqIO.parse(args.i,'fasta')
with open(args.o,'w') as outhandle:
for r in fasta:
tmp_windows_start, tmp_windows_end = sliding_window_steps_generator(args.w, args.j, 1, len(r.seq))
for s,e in zip(tmp_windows_start, tmp_windows_end):
tmp_N_count = r.seq[s:e].count('N')
tmp_N_percent = float(tmp_N_count) / float(len(r.seq[s:e]))
if tmp_N_percent==1.0:
tmp_GC_percent = 0.0
if tmp_N_percent!=1.0:
tmp_G_count = r.seq[s:e].count('G')
tmp_C_count = r.seq[s:e].count('C')
tmp_A_count = r.seq[s:e].count('A')
tmp_T_count = r.seq[s:e].count('T')
tmp_GC_percent = (float(tmp_G_count) + float(tmp_C_count)) / (float(len(r.seq[s:e])) - float(tmp_N_count))
outhandle.write('%s\t%i\t%i\t%f\t%f\n' % (r.id,s,e,tmp_N_percent,tmp_GC_percent))
if __name__ == '__main__':
main()
#author: Kristian K Ullrich
#date: July 2017
#email: ullrich@evolbio.mpg.de
#chr1 - IRA_AFG
#[X]: Mmd_FRA
#[Y]: Mmd_GER
#[Z]: Mmd_IRA
#[O]: Mmm_AFG
popX <- "Mmd_FRA"
popY <- "Mmd_GER"
popZ <- "Mmd_IRA"
popO <- "Mmm_AFG"
TMP_DIR <- "/tmp"
popX.pos <- 2
popY.pos <- 3
popZ.pos <- 4
popO.pos <- 5
SEQ_FILE <- "http://wwwuser.gwdg.de/~evolbio/evolgen/wildmouse/introgression/mpileup_pop_mv/fasta/consensus/CAS_FRA_GER_IRA_AFG_SPRE.mpileup.q0Q10.chr1.bcfcall.mv.remIndels.recode.refmajorsample.ref.consensus.fasta"
chr <- "chr1"
OUT_FILE <- paste0(TMP_DIR,"/",popX,"_",popY,".",popZ,".",popO,".",chr,".tsv")
WSIZE <- 25000
WJUMP <- 25000
DISTMODEL <- "K80"
#####
library(Biostrings)
library(ape)
source("/data/codeprojects/r_scripts/sliding_window_steps_generator.r")
options(scipen=22)
calc_delta_consensus_four<-function(seq,popX.pos,popY.pos,popZ.pos,popO.pos,TMP,distMODEL="K80"){
MISSING<-NA
TREELENGTH<-NA
TREE<-NA
TREETOPOLOGY<-NA
dK80.XZO<-NA
dK80.YZO<-NA
OUT<-list(MISSING,dK80.XZO,dK80.YZO,TREELENGTH,TREE,TREETOPOLOGY)
names(OUT)<-c("MISSING","dK80.XZO","dK80.YZO","TREELENGTH","TREE","TREETOPOLOGY")
tmp.seq<-seq[c(popX.pos,popY.pos,popZ.pos,popO.pos)]
names(tmp.seq)<-c("X","Y","Z","O")
tmp.seq.XX<-gregexpr("X",tmp.seq[1])
tmp.seq.XY<-gregexpr("X",tmp.seq[2])
tmp.seq.XZ<-gregexpr("X",tmp.seq[3])
tmp.seq.XO<-gregexpr("X",tmp.seq[4])
tmp.seq.NX<-gregexpr("N",tmp.seq[1])
tmp.seq.NY<-gregexpr("N",tmp.seq[2])
tmp.seq.NZ<-gregexpr("N",tmp.seq[3])
tmp.seq.NO<-gregexpr("N",tmp.seq[4])
remove.pos<-unique(c(tmp.seq.XX[[1]],tmp.seq.XY[[1]],tmp.seq.XZ[[1]],tmp.seq.XO[[1]],tmp.seq.NX[[1]],tmp.seq.NY[[1]],tmp.seq.NZ[[1]],tmp.seq.NO[[1]]))
remove.pos<-remove.pos[remove.pos!="-1"]
if(length(remove.pos)!=0){
tmp.seq.rem<-BStringSet(apply(as.matrix(tmp.seq)[,-remove.pos,drop=FALSE],1,function(x) paste(x,collapse="")))
}
if(length(remove.pos)==0){
tmp.seq.rem<-tmp.seq
}
OUT$MISSING<-length(remove.pos)
writeXStringSet(tmp.seq.rem,file=paste0(TMP,"/tmp.seq.rem.fasta"))
tmp.dna<-read.dna(paste0(TMP,"/tmp.seq.rem.fasta"),format="fasta")
if(length(tmp.dna)==0){
return(OUT)
}
if(all(is.na(base.freq(tmp.dna)))){
return(OUT)
}
tmp.dist.dna.MODEL<-dist.dna(tmp.dna,as.matrix=T,model=distMODEL)
if(any(is.na(tmp.dist.dna.MODEL))){
return(OUT)
}
dXY<-tmp.dist.dna.MODEL[1,2]
dXZ<-tmp.dist.dna.MODEL[1,3]
dXO<-tmp.dist.dna.MODEL[1,4]
dYZ<-tmp.dist.dna.MODEL[2,3]
dYO<-tmp.dist.dna.MODEL[2,4]
dZO<-tmp.dist.dna.MODEL[3,4]
dK80.XZO<-dXO-dXZ
dK80.YZO<-dYO-dYZ
OUT$dK80.XZO<-dK80.XZO
OUT$dK80.YZO<-dK80.YZO
tmp.dna.nj<-midpoint(nj(tmp.dist.dna.MODEL))
OUT$TREELENGTH<-sum(tmp.dna.nj$edge.length)
OUT$TREE<-write.tree(tmp.dna.nj)
tmp.dna.nj$edge.length<-NULL
OUT$TREETOPOLOGY<-write.tree(tmp.dna.nj)
return(OUT)
}
#####MAIN#####
sink(OUT_FILE)
cat("#[X]: ")
cat(popX)
cat("\n")
cat("#[Y]: ")
cat(popY)
cat("\n")
cat("#[Z]: ")
cat(popZ)
cat("\n")
cat("#[O]: ")
cat(popO)
cat("\n")
cat("CHR\tSTART\tEND\t")
cat("MISSING\tdK80.XZO\tdK80.YZO\tTREELENGTH\tTREE\tTREETOPOLOGY\n")
tmp.chr.seq<-readBStringSet(SEQ_FILE)
tmp.sw<-sliding_window_steps_generator(window=WSIZE,jump=WJUMP,start.by=1,end.by=unique(width(tmp.chr.seq)))
pb<-txtProgressBar(min=1,max=dim(tmp.sw)[2],style=3)
for(i in 1:dim(tmp.sw)[2]){
tmp.out<-subseq(tmp.chr.seq,tmp.sw[1,i],tmp.sw[2,i])
fourout<-calc_delta_consensus_four(tmp.out,popX.pos,popY.pos,popZ.pos,popO.pos,TMP_DIR,DISTMODEL)
cat(chr)
cat("\t")
cat(tmp.sw[1,i])
cat("\t")
cat(tmp.sw[2,i])
cat("\t")
cat(unlist(fourout),sep="\t")
cat("\n")
}
close(pb)
sink(NULL)
#####
quit()
......@@ -2,14 +2,14 @@
# -*- coding: UTF-8 -*-
'''
Author: Krisian Ullrich
date: Dezember 2016
author: Kristian K Ullrich
date: July 2017
email: ullrich@evolbio.mpg.de
License: MIT
The MIT License (MIT)
Copyright (c) 2016 Kristian Ullrich
Copyright (c) 2017 Kristian K Ullrich
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
......
#author: Kristian K Ullrich
#date: July 2017
#email: ullrich@evolbio.mpg.de
sliding_window_steps_generator = function(window=10000,jump=1000,start.by=1,end.by=20000){
if(end.by<=start.by){
stop("end.by <= start.by")
}
start.seq = seq(start.by, end.by, by = jump)
end.seq = seq(start.by + window - 1, end.by, by = jump)
end.seq = c(end.seq, rep(end.by, length(start.seq)-length(end.seq)))
start.end.matrix = rbind(start.seq, end.seq)
start.end.matrix = rbind(start.end.matrix, apply(start.end.matrix, 2, function(x) {mean( c( x[1] - 1, x[2]) )} ) )
start.end.matrix[2,]<-as.numeric(sprintf("%.0f",start.end.matrix[2,]))
row.names(start.end.matrix) = c("start", "end", "mid")
return(start.end.matrix)
}
This diff is collapsed.
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment