Commit c5744409 authored by Kristian Ullrich

added individual masking

parent f6f16f61
@@ -50,6 +50,17 @@ get masking region example for the population _Mus musculus musculus AFG_:
```
INPUT=Mmm_AFG.combined.bga
OUTPUT=Mmm_AFG.combined.bga.stcov5
awk '{if($4<5) print $0}' $INPUT > $INPUT".stcov5"
bedtools merge -i $INPUT".stcov5" > $INPUT".stcov5.merge"
awk -v OFS='\t' '{print $1,$2,$3,4}' $INPUT".stcov5.merge" > $OUTPUT
```
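To make the three masking steps easier to follow, here is a minimal sketch on a made-up four-line bedGraph. It is an illustration only: the toy coverage values and the temporary file names are assumptions, and the second `awk` call is a simple stand-in for `bedtools merge` (it joins overlapping or directly adjacent intervals on sorted input) so that the sketch runs without bedtools installed.

```shell
# Toy bedGraph (chrom, start, end, coverage); the values are made up for illustration.
DEMODIR=$(mktemp -d)
INPUT="$DEMODIR/toy.bga"
printf 'chr1\t0\t10\t2\nchr1\t10\t20\t4\nchr1\t20\t30\t7\nchr1\t30\t40\t1\n' > "$INPUT"

# Step 1: keep intervals with coverage below 5 (same filter as in the example above).
awk '$4 < 5' "$INPUT" > "$INPUT.stcov5"

# Step 2: stand-in for 'bedtools merge' (NOT the real tool): join intervals that
# overlap or touch on the same chromosome; input must be sorted by chrom and start.
awk -v OFS='\t' '
  NR == 1 { c = $1; s = $2; e = $3; next }
  $1 == c && $2 + 0 <= e + 0 { if ($3 + 0 > e + 0) e = $3; next }
  { print c, s, e; c = $1; s = $2; e = $3 }
  END { if (NR > 0) print c, s, e }
' "$INPUT.stcov5" > "$INPUT.stcov5.merge"

# Step 3: write a constant value of 4 into column 4, as in the example above.
MASK=$(awk -v OFS='\t' '{print $1, $2, $3, 4}' "$INPUT.stcov5.merge")
printf '%s\n' "$MASK"
```

In this toy case the two adjacent low-coverage intervals on chr1 (0-10 and 10-20) collapse into one masking region, while the well-covered interval 20-30 is left out of the mask.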
get masking region example for individual 396 of the population _Mus musculus musculus AFG_:
```
INPUT=AFG1_396.bam.bga
OUTPUT=AFG1_396.bam.bga.stcov5
awk '{if($4<5) print $0}' $INPUT > $INPUT".stcov5"
bedtools merge -i $INPUT".stcov5" > $INPUT".stcov5.merge"
awk -v OFS='\t' '{print $1,$2,$3,4}' $INPUT".stcov5.merge" > $OUTPUT
@@ -121,13 +132,13 @@ echo $INPUT4".mpileup.q0Q10.vcf.gz" >> $MPILEUPLIST
echo $INPUT5".mpileup.q0Q10.vcf.gz" >> $MPILEUPLIST
echo $INPUT6".mpileup.q0Q10.vcf.gz" >> $MPILEUPLIST
MPILEUPOUTPUT=AFG.mpileup.q0Q10.vcf.gz
MPILEUPOUTPUT=Mmm_AFG.mpileup.q0Q10.vcf.gz
bcftools merge -m all -O z -o $MPILEUPOUTPUT -l $MPILEUPLIST
#call SNP and INDEL
OUTPUT=AFG.mpileup.q0Q10.bcfcall.mv.vcf.gz
OUTPUT=Mmm_AFG.mpileup.q0Q10.bcfcall.mv.vcf.gz
bcftools call -O z -f GQ -m -v -o $OUTPUT $MPILEUPOUTPUT
```
@@ -143,13 +154,118 @@ _used software:_
### Get population specific SNPs
To get population-specific CONSENSUS VCF files, the VCF file produced with 'bcftools call' was first re-coded into population-specific VCF files with 'vcftools'. Each population-specific VCF file, which contains multiple individuals, was then parsed with 'vcfparser.py mvcf2consensus' to obtain one CONSENSUS VCF file per population. These CONSENSUS VCF files were used, together with the masking regions, to generate pseudo-genome files per natural population with 'vcfparser.py vcf2fasta' (see "Get masking regions for individual samples and natural populations").
vcftools example for the population _Mus musculus musculus AFG_:
```
#example for population Mmm_AFG:
#
#VCF IDs:
#
#396
#413
#416
#424
#435
#444
POPIDS=AFG.vcf.ids
echo "396" >> $POPIDS
echo "413" >> $POPIDS
echo "416" >> $POPIDS
echo "424" >> $POPIDS
echo "435" >> $POPIDS
echo "444" >> $POPIDS
GZVCF=Mmm_AFG.mpileup.q0Q10.bcfcall.mv.vcf.gz
OUTPUT=Mmm_AFG.mpileup.q0Q10.bcfcall.mv.remIndels
vcftools --gzvcf $GZVCF --remove-indels --recode --recode-INFO-all --non-ref-ac-any 1 --keep $POPIDS --out $OUTPUT
```
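Because the ID file above is built with `>>`, running the commands a second time appends the six IDs again. As a small alternative sketch (not the author's original commands), the same file can be written in one step with `printf` and `>`, which truncates the file first:

```shell
POPIDS=AFG.vcf.ids
# '>' overwrites the file, so re-running does not append duplicate IDs.
printf '%s\n' 396 413 416 424 435 444 > "$POPIDS"
cat "$POPIDS"
```

The resulting file holds one sample ID per line, which is the layout the `--keep` option of vcftools reads.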
vcfparser.py mvcf2consensus example for the population _Mus musculus musculus AFG_:
```
#example for population Mmm_AFG:
#
#VCF IDs:
#
#396
#413
#416
#424
#435
#444
INPUT=Mmm_AFG.mpileup.q0Q10.bcfcall.mv.remIndels.recode.vcf
OUTPUT=Mmm_AFG.mpileup.q0Q10.bcfcall.mv.remIndels.recode.consensus
python vcfparser.py mvcf2consensus -ivcf $INPUT -o $OUTPUT -cdp 11 -chr chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chrX,chrY -samples 396,413,416,424,435,444 -id Mmm_AFG.mv
```
vcfparser.py vcf2fasta example for the population _Mus musculus musculus AFG_:
```
#example for population Mmm_AFG:
#
REFERENCE=mm10.fasta
INPUT=Mmm_AFG.mpileup.q0Q10.bcfcall.mv.remIndels.recode.consensus.vcf
OUTPUT=Mmm_AFG.mpileup.q0Q10.chr1.bcfcall.mv.remIndels.recode.consensus.chr
MASKFILE=Mmm_AFG.combined.bga.stcov5
python vcfparser.py vcf2fasta -ivcf $INPUT -o $OUTPUT -R $REFERENCE -samples Mmm_AFG.mv -chr chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chrX,chrY -ibga $MASKFILE -cov2N 4
```
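The long `-chr` argument used above is easy to mistype. As a convenience sketch, the same comma-separated list can be generated in the shell; the variable name `CHRS` is an assumption introduced here and is not part of the original commands.

```shell
# Build "chr1,chr2,...,chr19,chrX,chrY" instead of typing every name.
# printf repeats its format for each argument, leaving one trailing comma.
CHRS=$(printf 'chr%s,' $(seq 1 19) X Y)
CHRS=${CHRS%,}   # strip the trailing comma
echo "$CHRS"
```

The variable could then be passed as `-chr $CHRS` in the `vcfparser.py` calls.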
_used software:_
+ vcftools v0.1.15
+ vcfparser.py <https://gitlab.gwdg.de/evolgen/introgression/blob/master/scripts/vcfparser.py>
### Calculate K80 distance between populations using the CONSENSUS pseudo-genome files
## Nucleotide diversity calculations
### Get individual specific SNPs
To get individual-specific VCF files, the VCF file produced with 'bcftools call' was first re-coded for each individual with 'vcftools'. The individual-specific VCF files were then used, together with the individual masking regions, to generate pseudo-genome files per individual with 'vcfparser.py vcf2fasta' (see "Get masking regions for individual samples and natural populations").
vcftools example for individual 396 of the population _Mus musculus musculus AFG_:
```
#example for individual 396 of the population Mmm_AFG:
#
#VCF IDs:
#
#396
GZVCF=Mmm_AFG.mpileup.q0Q10.bcfcall.mv.vcf.gz
OUTPUT=Mmm_AFG1.396.mpileup.q0Q10.bcfcall.mv.remIndels
vcftools --gzvcf $GZVCF --remove-indels --recode --recode-INFO-all --non-ref-ac-any 1 --indv 396 --out $OUTPUT
```
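The example above handles a single individual. A hedged sketch of how the same call might be repeated for all six AFG samples is shown below as a dry run: it only writes the command lines to a file and does not execute vcftools. The output prefix `Mmm_AFG1.<ID>` is an assumption extrapolated from the 396 example, and `vcftools.commands.txt` is a hypothetical file name.

```shell
GZVCF=Mmm_AFG.mpileup.q0Q10.bcfcall.mv.vcf.gz
CMDFILE=vcftools.commands.txt   # hypothetical file collecting the commands

for ID in 396 413 416 424 435 444; do
  # Assumption: every individual follows the naming pattern of the 396 example.
  OUTPUT="Mmm_AFG1.$ID.mpileup.q0Q10.bcfcall.mv.remIndels"
  # Dry run: print the command instead of executing it.
  echo vcftools --gzvcf "$GZVCF" --remove-indels --recode --recode-INFO-all \
    --non-ref-ac-any 1 --indv "$ID" --out "$OUTPUT"
done > "$CMDFILE"

cat "$CMDFILE"
```

After checking the printed lines, they could be run with `sh vcftools.commands.txt` on a machine where vcftools is available.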
vcfparser.py vcf2fasta example for individual 396 of the population _Mus musculus musculus AFG_:
```
#example for individual 396 of the population Mmm_AFG:
#
REFERENCE=mm10.fasta
#input is the recoded VCF written by the vcftools step above (no consensus step is run for individuals)
INPUT=Mmm_AFG1.396.mpileup.q0Q10.bcfcall.mv.remIndels.recode.vcf
OUTPUT=Mmm_AFG1.396.mpileup.q0Q10.chr1.bcfcall.mv.remIndels.recode.consensus.chr
#individual masking regions (see masking example for individual 396)
MASKFILE=AFG1_396.bam.bga.stcov5
python vcfparser.py vcf2fasta -ivcf $INPUT -o $OUTPUT -R $REFERENCE -samples 396 -chr chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chrX,chrY -ibga $MASKFILE -cov2N 4
```
_used software:_
### Calculate population specific Consensus sequence
+ vcftools v0.1.15
+ vcfparser.py <https://gitlab.gwdg.de/evolgen/introgression/blob/master/scripts/vcfparser.py>
### Calculate nucleotide diversity within each population
_used software:_
### Calculate K80 distance between populations
+ variscan v2.0.3
## Dxy distance calculation
@@ -161,3 +277,6 @@ _used software:_
_used software:_
## Simulation