Poster Presentation 28th Lorne Cancer Conference 2016

Improving the accuracy of somatic variant detection in whole exome sequencing data by applying optimised filtering schemes (#270)

Paul Wang 1,2, Wendy T Parker 1,3,4, David T Yeung 2,5,6, Andreas W Schreiber 3,4,7, Susan Branford 1,2,8
  1. School of Pharmacy and Medical Science, University of South Australia, Adelaide, SA, Australia
  2. Department of Genetics and Molecular Pathology, Centre for Cancer Biology, Adelaide, SA, Australia
  3. Centre for Cancer Biology, Adelaide, SA, Australia
  4. ACRF SA Cancer Genomics Facility, Centre for Cancer Biology, Adelaide, SA, Australia
  5. Department of Haematology, SA Pathology, Adelaide, SA, Australia
  6. School of Medicine, University of Adelaide, Adelaide, SA, Australia
  7. School of Biological Sciences, University of Adelaide, Adelaide, SA, Australia
  8. School of Molecular and Biomedical Science, University of Adelaide, Adelaide, SA, Australia

Aim

Detection of somatic variants in next-generation sequencing (NGS) data is important for cancer research. Many dedicated somatic variant calling algorithms are available, but comparisons between these callers have shown significant discrepancies in variant detection, so extensive and expensive validation of variants may be required to exclude false positives. Confidence in variant detection can be improved by using multiple callers, but this requires significantly longer processing. We aimed to improve the accuracy of somatic variant calling while limiting processing time by applying optimised filters to variant calling algorithms.

Method

Whole exome sequencing data from 10 matched tumour/normal samples from chronic myeloid leukaemia patients were analysed using 7 published somatic variant callers. The individual components of each caller (pre-processing read filters, statistical model, and post-processing site filters) were assessed for their effectiveness. Optimised filter sets were then applied to single-caller results to improve the confidence of variant calling.
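As an illustration of how calls from several callers can be compared, the following minimal Python sketch counts how many callers report each variant. The caller names, the variant key (chromosome, position, reference allele, alternate allele) and the example calls are hypothetical placeholders, not data or tools from this study.

```python
from collections import Counter


def caller_support(calls_by_caller):
    """Count how many callers reported each variant.

    calls_by_caller maps a caller name to a collection of variant keys.
    Each key here is a (chrom, pos, ref, alt) tuple (an assumed format).
    """
    support = Counter()
    for variants in calls_by_caller.values():
        # set() ensures a caller contributes at most once per variant
        support.update(set(variants))
    return support


# Hypothetical calls from three illustrative callers
calls = {
    "caller_A": {("chr9", 100, "A", "T"), ("chr22", 200, "G", "C")},
    "caller_B": {("chr9", 100, "A", "T")},
    "caller_C": {("chr9", 100, "A", "T"), ("chr1", 300, "C", "G")},
}

support = caller_support(calls)
for variant, n_callers in sorted(support.items()):
    print(variant, n_callers)
```

Variants supported by many callers can then be separated from those reported by only a few.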

Results

A total of 39936 variants were detected in the 10 samples, but only 443 variants were called by 6 or 7 callers (high-confidence (HC) variants, >95% validation rate), and the vast majority (39069; 98%) were called by 3 or fewer callers (low-confidence (LC) variants, <1% validation rate). Applying our filtering method at a low-stringency setting removed most of the LC variants (2378 remaining) while retaining most of the HC variants (431). At a high-stringency setting, only 81 (0.2%) LC variants remained, while 409 (92%) HC variants were retained.
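The proportions quoted above can be reproduced from the reported counts. The short Python sketch below does this arithmetic; the function and variable names, and the "intermediate" label for variants called by 4 or 5 callers, are illustrative assumptions rather than terms from the study.

```python
# Counts reported in the Results
TOTAL_VARIANTS = 39936
HC_TOTAL = 443      # called by 6 or 7 callers
LC_TOTAL = 39069    # called by 3 or fewer callers

# Variants remaining after filtering, by stringency setting
remaining = {
    "low": {"HC": 431, "LC": 2378},
    "high": {"HC": 409, "LC": 81},
}


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def tier(n_callers):
    """Assign a confidence tier from the number of supporting callers."""
    if n_callers >= 6:
        return "HC"
    if n_callers <= 3:
        return "LC"
    return "intermediate"  # 4 or 5 callers; label is an assumption


print(f"LC share of all variants: {pct(LC_TOTAL, TOTAL_VARIANTS):.1f}%")
for setting, counts in remaining.items():
    hc = pct(counts["HC"], HC_TOTAL)
    lc = pct(counts["LC"], LC_TOTAL)
    print(f"{setting} stringency: HC retained {hc:.1f}%, LC remaining {lc:.1f}%")
```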

Conclusion

Through systematic analysis and optimisation of filters, we have demonstrated significantly improved accuracy of single-caller somatic variant detection, as well as improved overall consensus between callers. Applying appropriate filters to a limited number of callers will reduce the need for extensive validation and long data processing times in cancer research projects involving NGS data.