From raw reads to variants
Anna Johansson
Uppsala, February 2019

Talk Overview
  Concepts
    Reference genome
    Variants
    Paired-end data
  NGS Workflow
    Quality control & Trimming
    Alignment
    Local realignment
    PCR duplicates & removal
    Base Quality Score Recalibration

Genome Reference Consortium
  A mosaic nucleic acid sequence
    ...GTGCGTAGACTGCTAGATCGAAGA...
  What changes between versions?
    First version: 150,000 gaps
    HG19: 250 gaps

Variants
  A position where the sample sequence does not agree with the reference genome sequence
    Reference: ...GTGCGTAGACTGCTAGATCGAAGA...
    Sample:    ...GTGCGTAGACTGATAGATCGAAGA...

Variants
  Population based variant projects
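
The Reference/Sample pair on the Variants slide differs at a single base. As a minimal sketch of what "a position where the sample does not agree with the reference" means, the two snippets can be compared position by position. The function name `find_mismatches` is made up for this illustration, positions are offsets within the snippet (not genome coordinates), and real variant callers work from many aligned reads rather than one sample string.

```python
def find_mismatches(reference, sample):
    """Return (position, ref_base, sample_base) for every differing position.

    Positions are 1-based offsets within the snippet, not genome coordinates.
    """
    if len(reference) != len(sample):
        raise ValueError("position-wise comparison needs equal-length sequences")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# The two snippets shown on the Variants slide.
reference = "GTGCGTAGACTGCTAGATCGAAGA"
sample = "GTGCGTAGACTGATAGATCGAAGA"

variants = find_mismatches(reference, sample)
print(variants)  # [(13, 'C', 'A')]
```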

Paired-end sequencing

Paired-end data
  Illumina sequencing: https://www.youtube.com/watch?v=fCd6B5HRaZ8

Paired-end data
  The forward and reverse reads are stored in two fastq files.
  The order of pairs and naming is identical, except the designation of forward and reverse.

  ID_R1_001.fastq
    @HISEQ:100:C3MG8ACXX:5:1101:1160:2197 1:N:0:ATCACG
    CAGTTGCGATGAGAGCGTTGAGAAGTATAATAGG
    AGTTAAACTGAGTAACAGGATAAGAAATAGTGAG
    ATATGGAAACGTTGTGGTCTGAAAGAAGATGT
    +
    [email protected]
    JIHGIIJJJJIJIJIJJJJIIJJJJJIIEIHHIJ
    HGHHHHHDFFFEDDDDDCDDDCDDDDDDDCDC
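
Because the two fastq files list the pairs in the same order, they can be read in lockstep. The following is a rough sketch of that idea, assuming unwrapped four-line records and Illumina-style names in which the part before the first space is shared by both mates. The helper names and the two in-memory example files are hypothetical, not taken from the slides.

```python
import io
from itertools import zip_longest


def read_fastq(handle):
    """Yield (name, sequence, quality) from a FASTQ stream with 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (a blank line also stops reading)
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality  # drop the leading '@'


def read_pairs(r1_handle, r2_handle):
    """Yield (forward, reverse) records and check that the read names agree."""
    for forward, reverse in zip_longest(read_fastq(r1_handle), read_fastq(r2_handle)):
        if forward is None or reverse is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        # Compare only the part before the space; the rest holds the 1/2 designation.
        if forward[0].split(" ")[0] != reverse[0].split(" ")[0]:
            raise ValueError(f"read names differ: {forward[0]} vs {reverse[0]}")
        yield forward, reverse


# Hypothetical two-read example files, held in memory for illustration.
R1_TEXT = "@read1 1:N:0:ATCACG\nACGT\n+\nIIII\n@read2 1:N:0:ATCACG\nGGCA\n+\nIIII\n"
R2_TEXT = "@read1 2:N:0:ATCACG\nTTGA\n+\nIIII\n@read2 2:N:0:ATCACG\nCCAT\n+\nIIII\n"

pairs = list(read_pairs(io.StringIO(R1_TEXT), io.StringIO(R2_TEXT)))
for forward, reverse in pairs:
    print(forward[0], "<->", reverse[0])
```

With real data the two handles would come from `open()` on the R1 and R2 files; a name mismatch usually means one file was filtered or sorted without the other.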
  (labels on the fastq record: Read name, Sequence)

Read groups
  Link information about sample id, library prep, flowcell and sequencing run to the fastq file.
  Good for error tracking!
  Detailed description in the tutorial or at
  https://gatkforums.broadinstitute.org/gatk/discussion/6472/read-groups

  RGID = Read group identifier, usually derived from the combination of the sample id and run id
  RGLB = Library prep identifier
  RGPL = Platform (for us ILLUMINA)
  RGPU = Run identifier, usually the barcode of the flowcell
  RGSM = Sample name

Convert to Bam
  A Bam file is a binary representation of the Sam file

NGS workflow
  QC and trimming
  Alignment
  Local Realignment
  Duplicate removal
  BQSR
  Variant calling

Local realignment
  Problem: Reads are mapped one read at a time; this sometimes leads to single variants being split into multiple variants
  Solution: Realign such a region taking all reads into account

Local realignment
  module load GATK
  Genome Analysis ToolKit
    RealignerTargetCreator
    IndelRealigner

Local realignment, still needed?
  HaplotypeCaller (HC)
  Mutect2

  Don't add unique information
  Optical duplicates

Base Quality Score Recalibration
  module load GATK
  Identifies and corrects systematic (non-random) technical errors made by the sequencer when estimating the quality score of each base call
  Corrects for over- or underestimation of quality scores
  Helps fight false positive variant calls
  Rescues false negative variant calls
  Some errors can be due to the physics or chemistry of the sequencing reaction, some to manufacturing flaws in the equipment
  Errors are identified over several covariates, mainly related to sequence context, position in read or machine cycle
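
The idea of over- or underestimated quality scores can be illustrated by comparing the quality a sequencer reports with the mismatch rate actually observed in a covariate bin. This is only a simplified sketch, not the model GATK uses: the bin key (reported quality, machine cycle), the +1/+2 smoothing and the example counts are assumptions chosen for illustration, and it presumes that the counted mismatches are sequencing errors rather than true variants.

```python
import math
from collections import defaultdict


def phred_to_error_probability(quality):
    """Convert a Phred quality score to the error probability it claims."""
    return 10 ** (-quality / 10)


def empirical_quality(mismatches, observations):
    """Phred-scaled observed error rate; +1/+2 smoothing (an assumption) avoids log(0)."""
    rate = (mismatches + 1) / (observations + 2)
    return -10 * math.log10(rate)


def recalibration_table(observations):
    """Map (reported_quality, cycle) bins to their empirical quality.

    `observations` is an iterable of (reported_quality, cycle, is_mismatch).
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [bases seen, mismatches]
    for reported_quality, cycle, is_mismatch in observations:
        bin_counts = counts[(reported_quality, cycle)]
        bin_counts[0] += 1
        bin_counts[1] += int(is_mismatch)
    return {key: empirical_quality(mm, seen) for key, (seen, mm) in counts.items()}


# Hypothetical bin: 998 bases reported as Q30 at cycle 1, 9 of them mismatching.
example = [(30, 1, True)] * 9 + [(30, 1, False)] * 989
table = recalibration_table(example)
for (reported, cycle), observed in table.items():
    print(f"reported Q{reported} at cycle {cycle}: empirical Q{observed:.1f}")
```

In this made-up bin the sequencer claims an error rate of 1 in 1000 (Q30) while the data support roughly 1 in 100 (about Q20), so the reported scores are overestimated and would be lowered by recalibration.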

Filtering
  module load GATK

  #CHROM  POS    ID         REF  ALT  QUAL  FILTER  INFO                     FORMAT
  20      14370  rs6054257  G    A    29    PASS    NS=3;DP=14;AF=0.5;DB;H2  GT:GQ:DP:HQ

  VariantFiltration
    --filterExpression "QUAL < 30.0" --filterName QUAL_filter
    --filterExpression "QUAL / DP < 10.0" --filterName QUALDP_filter

Annotation
  module load annovar / snpEff / vep

  #CHROM  POS    ID         REF  ALT  QUAL
  20      14370  rs6054257  G    A    29

  Gene-based
    Non-synonymous/synonymous
  Region-based
    CpG-islands
    Conserved regions
    Predicted transcription factor binding sites
  Filter-based
    dbSNP
    1000G
    COSMIC

USE THE SAME REFERENCE!
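
The two expressions on the Filtering slide can be mimicked in a few lines to show what ends up in the FILTER column. This is a sketch under two assumptions: a record is flagged with the filter name when its expression evaluates to true, and the QUAL filter is meant to flag records with QUAL below 30. The functions `parse_info` and `apply_filters` are hypothetical helpers, not part of GATK.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flags without a value map to True."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value if value else True
    return fields


def apply_filters(vcf_line):
    """Return the FILTER value a tab-separated VCF record would get."""
    columns = vcf_line.rstrip("\n").split("\t")
    qual = float(columns[5])
    depth = float(parse_info(columns[7])["DP"])

    failed = []
    if qual < 30.0:  # assumed direction of the QUAL filter
        failed.append("QUAL_filter")
    if qual / depth < 10.0:
        failed.append("QUALDP_filter")
    # VCF lists several failed filters separated by semicolons.
    return ";".join(failed) if failed else "PASS"


# The example record from the slide.
record = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2\tGT:GQ:DP:HQ"
print(apply_filters(record))  # QUAL_filter;QUALDP_filter
```

For the slide's record, QUAL is 29 and QUAL / DP is 29 / 14, about 2.1, so both filters apply under these assumptions.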

File naming conventions
  Use informative file names
  Create a new output file in each process
  Include a description of the process in the output file name

  QC and trimming     Sample.trimmed.fastq
  Alignment           Sample.bam
  Local Realignment   Sample.realigned.bam
  Duplicate removal   Sample.dedup.bam
  BQSR                Sample.bqsr.bam
  Variant calling     Sample.vcf

Indices
  Most large files we work with need an index
  Different index for different file types
  bwa index creates one set of index files for the reference, which it needs for performing alignment
  Other programs, like samtools, produce other types of index for .fasta and .bam files, needed by other programs

Flowchart of lab
  Index reference
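
The file naming convention from the slides can be captured in a small lookup so that every step writes a new, descriptively named file. This is just one possible sketch; the dictionary and the function name `output_name` are made up for illustration, while the suffixes are the ones listed on the slide.

```python
# Suffix appended to the sample name after each workflow step (from the slide).
STEP_SUFFIXES = {
    "QC and trimming": "trimmed.fastq",
    "Alignment": "bam",
    "Local Realignment": "realigned.bam",
    "Duplicate removal": "dedup.bam",
    "BQSR": "bqsr.bam",
    "Variant calling": "vcf",
}


def output_name(sample, step):
    """Build the output file name for one workflow step of one sample."""
    try:
        suffix = STEP_SUFFIXES[step]
    except KeyError:
        raise ValueError(f"unknown workflow step: {step}") from None
    return f"{sample}.{suffix}"


names = [output_name("Sample", step) for step in STEP_SUFFIXES]
for name in names:
    print(name)
```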