gRNA growth and decay analysis
We used a parametric method in which the cell population with damaged gene $i$ grows as $N_i(t) = N_i(0)\,e^{(a_0 + d_i)t}$, where $a_0$ is the growth rate of unmodified cells and $d_i$ is the change in growth rate due to the gene deletion. The aliquot extracted at each time point is roughly the same size and represents only a fraction of the entire population, so the observed sgRNA counts $n_i$ do not correspond to $N_i$ directly. The correspondence is only relative. If we define $c_i \equiv n_i / \sum_k n_k$ as the compositional fraction of sgRNA species $i$, then $c_i = N_i / \sum_k N_k$. Substituting the growth law gives

$$\frac{c_i(t)}{c_i(0)} = e^{(a_0 + d_i)t}\,\frac{\sum_k N_k(0)}{\sum_k N_k(t)},$$

so the exponential can only be determined up to a multiplicative constant:

$$e^{-d_i t} = A\,\frac{c_i(0)}{c_i(t)}, \qquad A = e^{a_0 t}\,\frac{\sum_k N_k(0)}{\sum_k N_k(t)}.$$

The constant $A$ is determined from the assumption that a gene deletion typically does not affect the growth rate. Mathematically, $e^{-d_i t} \approx 1$ for a typical gene. We define the statistic that measures the effect of gene deletion as $x_i \equiv e^{-d_i t}$ and calculate it for every gene $i$ from

$$x_i = A\,\frac{c_i(0)}{c_i(t)}.$$
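The Python sketch below shows how $x_i$ can be calculated from raw counts for a single replicate. It illustrates the method under stated assumptions and is not the authors' code. Each index is treated as one sgRNA species standing in for gene $i$. A pseudocount of 0.5 (our choice) guards against zero counts. $A$ is fixed by setting the median of $x_i$ across genes to 1, which is one possible way to encode "a gene deletion typically does not affect the growth rate". The function name `deletion_effect` is hypothetical.

```python
import numpy as np


def deletion_effect(counts_t0, counts_t, pseudocount=0.5):
    """Estimate x_i = e^{-d_i t} = A * c_i(0) / c_i(t) for one replicate.

    Assumptions (illustrative, not from the original method): a pseudocount
    avoids division by zero, and A is chosen so that the median x_i equals 1.
    """
    n0 = np.asarray(counts_t0, dtype=float) + pseudocount
    nt = np.asarray(counts_t, dtype=float) + pseudocount
    c0 = n0 / n0.sum()           # compositional fraction c_i(0)
    ct = nt / nt.sum()           # compositional fraction c_i(t)
    ratio = c0 / ct              # equals e^{-d_i t} / A
    A = 1.0 / np.median(ratio)   # a typical gene is assumed unaffected (x_i ~ 1)
    return A * ratio


# Toy replicate: gene 2 is depleted over time, so its x_i is well above 1.
x = deletion_effect([100, 100, 100, 100, 100], [200, 210, 20, 190, 205])
```

Only the ratio $c_i(0)/c_i(t)$ enters the calculation, so the sequencing depth of each aliquot cancels out. This is why the compositional fractions are used rather than the raw counts.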

Since we were interested in genes essential for growth, we performed a single-tailed test for $x_i$. We collected the three values of $x_i$, one from each biological replicate, into a vector $\mathbf{x}_i$. A statistically significant effect would have all three values large ($> 1$) and consistent. If $\mathbf{x}_i$ denoted the position of a point in three-dimensional space, we would be interested in points that lie close to the body diagonal. The test statistic $s_i$ involves the scalar product $(\mathbf{x}_i, \mathbf{n})$, where $\mathbf{n} = (1, 1, 1)/\sqrt{3}$ is the unit vector in the direction of the body diagonal and $(\cdot, \cdot)$ denotes the scalar product. We estimated a q-value (false discovery rate) for each gene as follows: the number of s-statistics not smaller than $s_i$ expected in the null model, divided by the observed number of s-statistics not smaller than $s_i$ in the data. The null model was simulated numerically by permuting gene labels in $\mathbf{x}_i$ independently for each experimental replicate. This permutation was repeated $10^3$ times.
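The permutation estimate of the q-value can be sketched as follows. The sketch does not assume a particular definition of $s_i$. `s_stat` is a hypothetical stand-in that rewards a large projection $(\mathbf{x}_i, \mathbf{n})$ onto the body diagonal and penalizes the distance from it, and any other statistic can be substituted. `permutation_qvalues` and its parameters are likewise illustrative names, not part of the original method.

```python
import numpy as np

N_DIAG = np.ones(3) / np.sqrt(3.0)  # unit vector n along the body diagonal


def s_stat(X):
    """Hypothetical stand-in for s_i: projection onto the diagonal minus
    the distance of each point from the diagonal."""
    proj = X @ N_DIAG                      # scalar product (x_i, n)
    perp = X - np.outer(proj, N_DIAG)      # component orthogonal to n
    return proj - np.linalg.norm(perp, axis=1)


def permutation_qvalues(X, n_perm=1000, seed=0):
    """q_i = (expected null count of s >= s_i) / (observed count of s >= s_i).

    X has one row per gene and one column per replicate; the null permutes
    gene labels independently within each column, n_perm times.
    """
    rng = np.random.default_rng(seed)
    s_obs = s_stat(X)
    null = np.empty((n_perm, X.shape[0]))
    for p in range(n_perm):
        Xp = np.column_stack([rng.permutation(X[:, r]) for r in range(X.shape[1])])
        null[p] = s_stat(Xp)
    null_sorted = np.sort(null.ravel())
    obs_sorted = np.sort(s_obs)
    # Count of values >= s_i is the total minus the count strictly below s_i.
    expected = (null_sorted.size - np.searchsorted(null_sorted, s_obs, side="left")) / n_perm
    observed = obs_sorted.size - np.searchsorted(obs_sorted, s_obs, side="left")
    return expected / observed


# Toy data: 200 genes with x_i near 1 in all replicates, plus one strong hit.
rng = np.random.default_rng(1)
X = rng.lognormal(mean=0.0, sigma=0.2, size=(200, 3))
X[0] = [20.0, 20.0, 20.0]
q = permutation_qvalues(X)
```

Permuting each replicate column independently destroys the agreement between replicates while keeping each replicate's distribution of $x_i$. As a result, only genes with consistently large values across all three replicates receive small q-values.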

STRING Interactome Network Analysis

The results from the CRISPR 3D experiment were integrated with the RNA-seq results using a network approach. We identified likely CRISPR-essential genes by filtering for genes with a false-discovery-rate-corrected p value of less than 0.5, which yielded 94 genes. We chose a relaxed filter here for two reasons: the subsequent filtering steps would help eliminate false positives, and our network analysis method would help amplify weak signals. These genes were filtered further in two ways. First, we included only genes that were expressed in the RNA-seq data (57 genes). Second, we restricted this set to genes with enriched expression in stem cells, defined as a log fold change > 2 in the RNA-seq data (10 genes). These results were used to seed the network neighborhood exploration. We used the STRING mouse interactome (Szklarczyk et al., 2015) as our background network, including only high-confidence interactions (edge weight > 700). The STRING interactome contains known and predicted functional protein-protein interactions assembled from a variety of sources, including genomic context predictions, high-throughput lab experiments, and co-expression databases. Interaction confidence is a weighted combination of all lines of evidence, with higher-quality experiments contributing more. The high-confidence STRING interactome contains 13,863 genes and 411,296 edges. Because not all genes are found in the interactome, our seed gene sets were filtered further when integrated with the network. This resulted in 39 CRISPR-essential, RNA-expressed seed genes and 5 CRISPR-essential, RNA-differentially-expressed seed genes. After integrating the seed genes with the background interactome, we used a network propagation algorithm to explore the network neighborhood around these seed genes.
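The seed selection and propagation steps can be sketched in Python as follows. All function names are hypothetical. Edges are treated as unweighted once they pass the score threshold (an assumption). The propagation uses the update $F \leftarrow \alpha W' F + (1-\alpha) Y$ with the symmetrically normalized adjacency $W' = D^{-1/2} W D^{-1/2}$, the form commonly used for this kind of propagation. The value $\alpha = 0.5$ is an arbitrary illustrative choice, not a parameter reported here.

```python
import numpy as np


def build_adjacency(edges, min_score=700):
    """Binary adjacency matrix from (gene_a, gene_b, score) tuples,
    keeping only edges whose score exceeds min_score."""
    kept = [(a, b) for a, b, s in edges if s > min_score]
    genes = sorted({g for edge in kept for g in edge})
    index = {g: i for i, g in enumerate(genes)}
    W = np.zeros((len(genes), len(genes)))
    for a, b in kept:
        W[index[a], index[b]] = W[index[b], index[a]] = 1.0
    return W, index


def select_seeds(fdr, expressed, log_fc, fdr_cut=0.5, fc_cut=2.0):
    """CRISPR-essential genes (FDR < fdr_cut) that are expressed, and the
    subset of those with log fold change > fc_cut."""
    essential = {g for g, q in fdr.items() if q < fdr_cut}
    expressed_set = essential & set(expressed)
    enriched = {g for g in expressed_set if log_fc.get(g, 0.0) > fc_cut}
    return expressed_set, enriched


def propagate(W, seed_idx, alpha=0.5, tol=1e-9, max_iter=1000):
    """Iterate F <- alpha * W' F + (1 - alpha) * Y to a fixed point,
    where W' = D^-1/2 W D^-1/2 and Y marks the seed nodes."""
    deg = W.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    W_norm = inv_sqrt[:, None] * W * inv_sqrt[None, :]
    Y = np.zeros(W.shape[0])
    Y[list(seed_idx)] = 1.0
    F = Y.copy()
    for _ in range(max_iter):
        # Spread heat to neighbors, scale by alpha (loss), re-inject at seeds.
        F_next = alpha * (W_norm @ F) + (1.0 - alpha) * Y
        if np.max(np.abs(F_next - F)) < tol:
            return F_next
        F = F_next
    return F


edges = [("A", "B", 900), ("B", "C", 850), ("C", "D", 720), ("D", "E", 400)]
W, index = build_adjacency(edges)  # the D-E edge is dropped (score <= 700)
fdr = {"A": 0.1, "C": 0.3, "E": 0.2}
expressed_seeds, enriched_seeds = select_seeds(
    fdr, expressed=["A", "C", "E"], log_fc={"A": 3.0, "C": 1.0})
seeds = [index[g] for g in sorted(expressed_seeds) if g in index]  # E is not in the network
heat = propagate(W, seeds)
```

In the toy path A–B–C–D with seeds A and C, node B (between two seeds) ends up hotter than node D (adjacent to only one seed). This matches the behavior expected of heat diffusing from the seed set.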
Network propagation is a powerful method for amplifying weak signals because genes related to the same phenotype tend to interact. We implemented the network propagation method developed in Vanunu et al. (2010). It simulates how heat would diffuse, with loss, through the network along its edges, starting from an initially hot set of 'seed' nodes. At each step, one unit of heat is added to the seed nodes and then spread to the neighboring nodes. A constant fraction of heat is then removed from each node, so that the total heat in the system does not grow without bound. After a number of iterations, the heat on the nodes converges to a stable value. This final heat vector is a proxy for how close each node is to the seed set. For example, a node lying between two initially hot nodes would have a very high final heat value, whereas a node far from all of the seed nodes would have a very low final heat value. This process is described by the following, as in Vanunu et al. (2010):