Unsurprisingly, the type of cancer also seemed to have an effect on the reliability of the eight methods. Kidney cancer had the best reliability measures of all tumor types included in this study. This could be due to heterogeneity in the data, where a specific tumor type may have a different number of molecular subtypes than other types of tumors. It is possible that for the four different tumor types that were tested in our study, a variable number of molecular subtypes were represented. A PCA using all available RNA-seq data indicated that at the genome-wide level, kidney cancer had a similar degree of heterogeneity to the other tumor types, since the spread between patient samples was comparable for all types of cancer (Supplemental Fig. 1). However, at the gene-specific level, results from the k-means method indicated that kidney cancer had greater variability in gene expression between the two clusters identified by k-means. Specifically, when the mean expression for each cluster was calculated, the fold-change between these two means was largest for clusters that captured 50% of all patients (Fig. 4). For other tumor types, the largest fold-change values in mean cluster expression occurred for more skewed percentages, where clusters represented less than 25% of all patients. It therefore seemed likely that for the kidney cancer dataset, separation into more dominant pairs of clusters occurred with genes that showed a more extreme degree of differential expression compared to the other tumor types included in this study. For survival analysis, when the expression data separate more definitively into groups with and without a patient survival event, it would be easier to detect such a gene, independent of the method applied.
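The per-gene quantities discussed above (fold-change between the two cluster means, and the share of patients in each cluster) can be illustrated with a minimal sketch. This is not the study's code: the function names `two_means_1d` and `cluster_summary` are hypothetical, the initialisation at the minimum and maximum values is an assumption, and expression values are assumed to be strictly positive so that the log ratio is defined.

```python
import math


def two_means_1d(values, n_iter=100):
    """Lloyd's algorithm with k=2 on one-dimensional data.

    Assumption: centers start at the minimum and maximum value,
    which keeps both clusters non-empty for 1-D input.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("need at least two distinct expression values")
    centers = [lo, hi]
    labels = []
    for _ in range(n_iter):
        # Assign each sample to the nearest center (0 = low, 1 = high).
        labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        # Recompute each center as the mean of its members.
        new_centers = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(sum(members) / len(members))
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def cluster_summary(expr):
    """Return (log2 fold-change of cluster means, fraction in smaller cluster).

    Assumption: expression values are strictly positive.
    """
    labels, (low_mean, high_mean) = two_means_1d(expr)
    log2_fc = math.log2(high_mean / low_mean)
    frac_high = labels.count(1) / len(labels)
    return log2_fc, min(frac_high, 1.0 - frac_high)


# Toy data: a balanced split and a skewed split.
balanced = cluster_summary([1, 1, 1, 1, 8, 8, 8, 8])
skewed = cluster_summary([2, 2, 2, 2, 2, 2, 2, 32])
```

In this toy input, the first gene separates patients into two equally sized clusters, while the second places a single patient in the high-expression cluster, mirroring the balanced versus skewed cases described in the text.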
Fig. 4 Investigating the degree of variability in the different tumor types tested in this study. Using the results from k-means clustering, the variability in expression was assessed by investigating the log-fold change of average expression between the two clusters, and the percentage of patient samples greater than a threshold.
Accuracy assessment based on tumor type-specific positive controls demonstrated that Cox regression outperformed other survival analysis methods
Table 1 Ranking of the eight methods based on accuracy. AUC values based on each method’s ability to identify a set of positive controls that were tumor type-specific and derived from the literature, over a range of thresholds.
Method Head & Neck Kidney Ovarian Prostate Mean Rank Based
The accuracy of the eight methods was assessed based on their ability to identify a set of tumor type-specific gene signatures which served as a surrogate for positive controls. ROC curves for each cancer dataset demonstrated highly variable performance (Fig. 5) and the area under the curve (AUC) values in Table 1 ranged from poor to good performance (0.479 to 0.78). On average, the Cox regression method had the highest AUC value across all four cancer datasets (Table 1). For ovarian cancer, the Cox regression method had the highest AUC value, and the second-highest AUC value for the remaining three cancer types. Although the Cox regression had the highest average AUC value, the k-means, C-index, and the D-index also had similar performance in accuracy. Notably, the D-index had the highest AUC values for two cancer datasets (kidney and prostate cancers). In fact, the gains observed in AUC value with one method over another were generally quite small between Cox regression, k-means, C-index, and D-index (Table 1).
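As a reminder of what an AUC value measures here, the sketch below computes it from a per-gene score and a positive-control flag. It uses the standard rank-based identity: the area under the ROC curve equals the probability that a randomly chosen positive control scores higher than a randomly chosen non-control gene, with ties counted as one half. The function name `auc` and the toy scores are hypothetical, and higher scores are assumed to mean stronger evidence of a survival association; this is not the study's implementation.

```python
def auc(scores, is_positive):
    """Area under the ROC curve via pairwise comparison.

    scores      : one number per gene (assumed: higher = more significant)
    is_positive : one bool per gene (True = literature positive control)
    """
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0   # positive control ranked above the negative
            elif sp == sn:
                wins += 0.5   # ties contribute one half
    return wins / (len(pos) * len(neg))


# Toy example: four genes, two of which are positive controls.
example_auc = auc([0.9, 0.8, 0.7, 0.6], [True, False, True, False])
```

An AUC near 0.5 corresponds to a ranking no better than chance, which is why values close to 0.479 are read as poor performance.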
Indeed, an ANOVA on the AUC values showed that the type of method had a significant effect on the values (P-value = 7.47 × 10⁻⁵), and a post-hoc Tukey’s HSD test confirmed that the k-means, C-index, D-index and Cox regression methods had significantly higher AUC values when compared individually to most other methods (4 out of 7 pairwise comparisons were statistically significant for each method, adjusted P-value < 0.05, Supplemental Table 1). In contrast, the KaplanScan, median-split, and 25th–75th percentile split did not show a significant difference in AUC values when compared with one another. Similarly, Cox regression and k-means themselves were not significantly different from one another in terms of their AUC, and the same was observed for C-index compared with D-index. This could have been because these methods generally yielded very similar