Cancer Type-Specific Benchmarks
We selected cancer type-specific benchmarks to evaluate how well methods could identify driver mutations relevant to a specific cancer type as opposed to others. Benchmarks that contained mutations from only a single cancer type, or in only a single gene that is a pervasive pan-cancer driver such as TP53, were not used. We chose two benchmarks based on data independent of the TCGA: cell viability of MCF10A cells and oncogenic mutations annotated by OncoKB on the MSK-IMPACT gene panel, both described below. Mutations were scored using the corresponding cancer type models of CHASMplus, CanDrA, and CHASM, along with two high-performing methods that are not cancer type-specific (ParsSNP and REVEL).
Cell Viability of MCF10A Cells
Gordon Mills and colleagues (Ng et al., 2018) used MCF10A cells, a breast epithelial cell line, to assess the effect of 698 missense mutations on cell viability. To assess each method’s ability to distinguish breast cancer-specific driver mutations, mutations that increased cell viability and occurred in known breast cancer driver genes were labeled as the positive class. Mutations that did not increase cell viability, or that increased cell viability but were not found in breast cancer-specific genes, were labeled as the negative class. The latter likely represent pan-cancer drivers.
Breast cancer-specific genes were identified from the Cancer Gene Census (CGC, COSMIC v79; genes marked as relevant to “breast” cancer and to somatic missense mutations) (Forbes et al., 2017) or from The Cancer Genome Atlas (TCGA) (Cancer Genome Atlas Network, 2012b). Although Mills and colleagues (Ng et al., 2018) also evaluated a pro-B cell line (Ba/F3), an appropriate cancer type-specific model was not available for all methods.
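The labeling rule for the MCF10A benchmark can be sketched as follows. This is a minimal illustration and not the authors' code: the `ViabilityCall` record, the `label_mcf10a` function, and the placeholder gene names are all hypothetical, and the breast cancer-specific gene set is assumed to have been assembled beforehand from the CGC and TCGA sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViabilityCall:
    """One assayed missense mutation (hypothetical record layout)."""
    gene: str
    protein_change: str
    increased_viability: bool


def label_mcf10a(call: ViabilityCall, breast_genes: set) -> int:
    """Return 1 (positive class) only if the mutation increased cell
    viability AND lies in a breast cancer-specific gene; otherwise 0."""
    return int(call.increased_viability and call.gene in breast_genes)


# Placeholder inputs for illustration only; not real assay results.
breast_genes = {"GENE_A"}
calls = [
    ViabilityCall("GENE_A", "p.X1Y", True),   # viability up, breast gene
    ViabilityCall("GENE_B", "p.X2Y", True),   # viability up, other gene
    ViabilityCall("GENE_A", "p.X3Y", False),  # no increase in viability
]
labels = [label_mcf10a(c, breast_genes) for c in calls]
```

Under this rule, the second call is labeled negative even though it increased viability, which matches the text's treatment of likely pan-cancer drivers as negatives.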
MSK-IMPACT Gene Panel Mutations and OncoKB
We obtained all missense mutations from targeted sequencing with the MSK-IMPACT gene panel of 414 cancer-related genes, originating from approximately 10,000 patients’ tumors (Zehir et al., 2017). Mutations annotated as ‘Oncogenic’ or ‘Likely Oncogenic’ in OncoKB (downloaded 4/3/2017) (Chakravarty et al., 2017) were selected (n=1,194). We chose four cancer types for which CanDrA v1.0 and CHASM had cancer type-specific models and which were sequenced by the TCGA: breast invasive ductal carcinoma (BRCA), glioblastoma multiforme (GBM), colon adenocarcinoma (COAD), and high-grade serous ovarian cancer (OV). For each cancer type, mutations were labeled as the ‘positive’ class if they occurred both in a cancer driver gene implicated in that cancer type and in a tumor of the same type. The remaining mutations were labeled as the ‘negative’ class. Cancer-specific driver genes were identified based on evidence from the CGC or TCGA. Specifically, for the CGC, we required that a gene be marked as “breast” for BRCA, as “glioma” or “glioblastoma” for GBM, as “colon” or “colorectal” for COAD, and as “ovarian” for OV, with the exception that clear cell ovarian, a different subtype, was excluded. Cancer-specific driver genes from the TCGA were defined based on the published marker papers for the respective cancer types: BRCA (Cancer Genome Atlas Network, 2012b), GBM (Brennan et al., 2013), COAD (Cancer Genome Atlas Network, 2012a), and OV (Cancer Genome Atlas Research Network, 2011).
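The per-cancer-type labeling described above can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' implementation. The keyword sets come from the text; the function names, the gene names, and the assumption that each gene's CGC tumor-type annotation has already been reduced to a set of lowercase terms are hypothetical. The clear cell ovarian exclusion is not handled here.

```python
# CGC tumor-type keywords required for each cancer type (from the text).
CGC_TERMS = {
    "BRCA": {"breast"},
    "GBM": {"glioma", "glioblastoma"},
    "COAD": {"colon", "colorectal"},
    "OV": {"ovarian"},
}


def cgc_supports(annotation_terms: set, cancer_type: str) -> bool:
    """True if any required keyword appears in a gene's CGC terms.
    Assumes the annotation is already tokenized into lowercase terms."""
    return bool(CGC_TERMS[cancer_type] & annotation_terms)


def label_mutation(gene: str, tumor_type: str, cancer_type: str,
                   driver_genes: dict) -> int:
    """Return 1 only if the mutation was seen in a tumor of `cancer_type`
    AND its gene is a driver gene for that same cancer type; otherwise 0."""
    in_driver_gene = gene in driver_genes.get(cancer_type, set())
    return int(tumor_type == cancer_type and in_driver_gene)


# Placeholder driver-gene sets for illustration only.
driver_genes = {"BRCA": {"GENE_A"}, "GBM": {"GENE_B"}}
example = [
    label_mutation("GENE_A", "BRCA", "BRCA", driver_genes),  # both match
    label_mutation("GENE_A", "GBM", "BRCA", driver_genes),   # wrong tumor
    label_mutation("GENE_B", "BRCA", "BRCA", driver_genes),  # wrong gene
]
```

Because labels are assigned separately for each cancer type, the same oncogenic mutation can be positive in one benchmark and negative in another.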
We selected five benchmarks to evaluate the performance of CHASMplus as a pan-cancer driver predictor. CHASMplus had the highest area under the Receiver Operating Characteristic curve (auROC) of the 12 compared methods on each of the pan-cancer benchmarks (p<0.05, DeLong test, Table S2, Figures S2B–S2F).
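For readers unfamiliar with the metric, auROC equals the probability that a randomly chosen positive-class mutation receives a higher score than a randomly chosen negative-class mutation, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The function name `auroc` is an assumption, the quadratic-time loop is chosen for clarity rather than speed, and the DeLong significance test is not implemented.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positive and
    negative scores (equivalent to the normalized Mann-Whitney U statistic)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("auROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy scores for illustration only; three of four pairs are ordered correctly.
toy_auc = auroc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.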
We examined TCGA driver mutation prioritization at exome-wide scale through a combined literature/heuristic evaluation. We first obtained a set of curated likely driver genes from the Cancer Gene Census (CGC, COSMIC v79) (Forbes et al., 2017). Only CGC genes that were labeled as somatic and marked as relevant for missense mutations were included. We labeled all recurrent missense mutations (n>1) in the CGC genes as the positive class, and the remaining mutations as the negative class.
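The recurrence heuristic can be sketched as follows, under the assumption that "recurrent (n>1)" means the same missense change was observed in more than one sample. The tuple layout, the function name `label_recurrent`, and the placeholder gene names are hypothetical.

```python
from collections import Counter


def label_recurrent(mutations, cgc_genes):
    """Map each (gene, protein_change) to 1 if it occurs in more than one
    sample AND the gene is in the CGC set; otherwise 0.
    `mutations` is an iterable of (sample_id, gene, protein_change)."""
    # De-duplicate so a mutation is counted at most once per sample.
    unique_calls = set(mutations)
    counts = Counter((gene, change) for _, gene, change in unique_calls)
    return {
        key: int(key[0] in cgc_genes and n > 1)
        for key, n in counts.items()
    }


# Placeholder data for illustration only.
mutations = [
    ("S1", "GENE_A", "p.A1B"),
    ("S2", "GENE_A", "p.A1B"),  # recurrent in a CGC gene: positive
    ("S3", "GENE_A", "p.C2D"),  # seen once: negative
    ("S1", "GENE_Z", "p.E3F"),
    ("S2", "GENE_Z", "p.E3F"),  # recurrent, but gene not in CGC: negative
]
recurrence_labels = label_recurrent(mutations, cgc_genes={"GENE_A"})
```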