
Phil Stafford
VP, Bioinformatics
The roots of bioinformatics stretch back centuries: statistical testing has been applied to biology since the first systematic taxonomic studies in the early 18th century. How these methods apply to precision medicine and cancer therapy is still evolving. The first Sanger sequencing output in 1977 revealed that raw sequence data is of little use without other genomes to compare it against. The genome of the lab yeast Saccharomyces cerevisiae was released in 1996, but thousands of sequencing experiments were performed before bioinformaticians were confident they had a fully assembled genome. Once that was available, scientists could compare other organisms to S. cerevisiae, which gave rise to comparative molecular genomics. Human genome sequencing forced the NIH to hire scientists specifically trained in the fields of computer science, mathematics, and genetics.
Health improvements from the Human Genome Project were slow to arrive because of the time and expense of designing drug trials around gene mutations. Methotrexate, 5-FU, cisplatin, and Taxol were given to cancer patients, but it took years before the gene mutations and fusions that predict chemotherapy outcomes were found. The FDA defined classifications for Level 1, 2, and 3 molecular biomarkers. This added a much-needed layer of precision to therapeutics and created opportunities for diagnostic companies to obtain companion diagnostic (CDx) approval alongside a blockbuster drug.
Behind every drug recommendation is a network of high-throughput next-generation sequencers, bioinformatics pipelines, and high-performance compute clusters, plus bioinformatics scientists, geneticists, pathologists, and evidence teams, all working together to ensure the most advanced diagnosis possible.
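To make "bioinformatics pipeline" concrete, here is a minimal sketch of one such stage: aligning a patient's sequencing reads and calling variants. It assumes bwa, samtools, and GATK are installed; the file names are hypothetical, and exact flags vary by tool version.

    import subprocess
    from pathlib import Path

    def run(cmd: str) -> None:
        """Run one pipeline step, failing loudly if a tool errors."""
        print("running:", cmd)
        subprocess.run(cmd, shell=True, check=True)

    def call_variants(ref: Path, fq1: Path, fq2: Path, out_dir: Path) -> Path:
        """Align paired-end reads and call variants for one sample."""
        out_dir.mkdir(parents=True, exist_ok=True)
        bam = out_dir / "sample.sorted.bam"
        vcf = out_dir / "sample.vcf.gz"
        # 1. Align reads to the reference and coordinate-sort the output.
        run(f"bwa mem {ref} {fq1} {fq2} | samtools sort -o {bam} -")
        run(f"samtools index {bam}")
        # 2. Call variants with GATK HaplotypeCaller.
        run(f"gatk HaplotypeCaller -R {ref} -I {bam} -O {vcf}")
        return vcf

Production pipelines chain dozens of such stages inside workflow managers such as Nextflow or Snakemake, with quality-control gates at every step.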
Today, there are drugs that target tumors through complex molecular pathways spanning dozens or hundreds of genes. For example, we can predict responses to immunotherapy by linking a patient's HLA expression to their personal neoepitopes. Bioinformatics can also aid in selecting adjuvant treatments by leveraging genomic data from a tissue biopsy and/or a blood sample.
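As an illustration of that HLA/neoepitope linkage, the sketch below ranks candidate mutant peptides against a patient's HLA alleles. The scoring function is a deterministic stand-in so the example runs end to end (a real pipeline would call a trained predictor such as NetMHCpan), and the peptides, alleles, and 500 nM threshold are illustrative assumptions.

    from dataclasses import dataclass

    @dataclass
    class Candidate:
        peptide: str        # mutant 9-mer derived from a somatic variant
        allele: str         # one of the patient's HLA class I alleles
        affinity_nm: float  # predicted binding affinity (lower = stronger)

    def toy_affinity(peptide: str, allele: str) -> float:
        """Deterministic stand-in for a trained MHC-binding predictor."""
        return float(sum(ord(c) for c in peptide + allele) % 400) + 50.0

    def rank_neoepitopes(peptides, alleles, threshold_nm=500.0):
        """Keep peptide/allele pairs predicted to bind below the threshold."""
        hits = [Candidate(p, a, toy_affinity(p, a))
                for p in peptides for a in alleles]
        return sorted((c for c in hits if c.affinity_nm <= threshold_nm),
                      key=lambda c: c.affinity_nm)

    patient_alleles = ["HLA-A*02:01", "HLA-B*07:02"]  # from HLA typing
    mutant_peptides = ["KLDETVAIV", "AMNDPFLYV"]      # from somatic variants
    for c in rank_neoepitopes(mutant_peptides, patient_alleles):
        print(c.peptide, c.allele, round(c.affinity_nm, 1))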
A growing challenge is data management. Storage and security are critical, but even more important is the opportunity to learn from extremely large patient cohorts. Today, we can examine 50,000–100,000 patients simultaneously, reducing the risk of one-off results that cannot be repeated. That ability comes at a cost: the first genome at NIH could fit on one standard CD, while exabyte storage is now common. One exabyte (10^18 bytes) is roughly every word ever spoken by every human who has ever existed, and just enough to contain the full exomic and transcriptomic sequences for 100,000 patients (the short calculation at the end of this piece makes the per-patient arithmetic concrete).
The value of this data lies in the ability to conduct synthetic clinical trials across tens of thousands of patients, to train artificial intelligence algorithms that learn and discover new complex biomarkers, and to better predict drug dosage, side effects, and outcomes. As our molecular data grows in size, AI is no longer a luxury. We will depend on AI to learn our diseases, learn our mutations, and guide drug development. Our job is to support learning, training, and patient care with these advanced tools. And that is exactly what we are doing at Caris and with our POA collaborators.
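Closing with the storage arithmetic referenced above: the only inputs below are the exabyte definition and the 100,000-patient cohort size from the text; the items the budget must cover are illustrative assumptions.

    EXABYTE = 10**18            # bytes
    COHORT = 100_000            # patients, per the cohort size above

    budget = EXABYTE // COHORT  # per-patient storage budget
    print(f"{budget:,} bytes per patient")  # 10,000,000,000,000 bytes = 10 TB

    # That 10 TB per patient must cover raw reads, alignments, variant
    # calls, QC output, and backups for both exome and transcriptome,
    # often across multiple samples and timepoints.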