For manual annotations, we performed an agreement analysis on a subset of the training sentence set. Four clinical researchers [MK, KK, JP, THB] annotated a total of 100 sentences. The Kappa score was 0.89 (see supplementary material 2). Based on this strong inter-annotator agreement, the remaining 208 sentences in the training set were annotated by a single clinician [KK]. The 100-note test set was annotated by the research nurse [MF]. The objective of annotation was to identify whether a note contained a positive or negative mention of a bone scan.
The NLP pipeline first pre-processed each clinical note: the note was split into individual sentences; capitalization, numbers, and punctuation were removed; and words shorter than three letters were excluded, except the word “no” and the abbreviation “NM” (Nuclear Medicine). Through this process, a note corresponded to a list of sentences and a sentence to a list of words. “Bone scan” was the only target key term.
The rule-based method applied a set of syntax rules to predict whether a sentence contained information related to a bone scan. The model used the ConText algorithm developed by Chapman et al. ConText is derived from the NegEx algorithm, which identifies negative findings in free text. Using regular expressions, it determines whether information in clinical reports is mentioned as negated, hypothetical, historical, or experienced by someone other than the patient. For this study, if bone scan information was negated, hypothetical, or historical, we concluded that the patient did not receive a bone scan for this note. In addition, if no modifier could be applied to the sentence, then by default we classified the sentence as negated. We used 90% of the training dataset to build the rules manually and the remaining 10% to validate the model. The rules were developed through this iterative process.
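A minimal sketch of this trigger-based classification idea is shown below. This is not the published ConText implementation; the trigger phrases and category names are assumptions chosen for demonstration, and it includes the study's default rule of classifying an unmodified mention as negated.

```python
import re

# Hypothetical trigger lexicon (a real ConText lexicon is far larger).
TRIGGERS = {
    "negated": [r"\bno\b", r"\bdenies\b", r"\bnegative for\b"],
    "hypothetical": [r"\bif\b", r"\bshould\b", r"\bconsider\b"],
    "historical": [r"\bhistory of\b", r"\bprevious\b", r"\bprior\b"],
}

def classify_sentence(sentence: str, target: str = "bone scan") -> str:
    s = sentence.lower()
    if target not in s:
        return "no_mention"
    for label, patterns in TRIGGERS.items():
        if any(re.search(p, s) for p in patterns):
            return label
    # Default rule from the study: no modifier found -> treat as negated.
    return "negated"
```

Under the study's decision rule, any sentence labeled negated, hypothetical, or historical would count as the patient not having received a bone scan.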
2.9. Convolutional neural network method
After notes were pre-processed, we used the word2vec method implemented in Gensim to form word embeddings. Word2vec is a technique that creates, for each word in the corpus, a vector representing the word's semantic context. If similar words share common contexts in the corpus, they are assumed to have similar vectors. Word2vec is a self-supervised machine learning method that trains a two-layer neural network to form the word embeddings. It offers two architectures (skip-gram and Continuous Bag of Words (CBOW)) and two training algorithms (hierarchical softmax and negative sampling). We chose to generate vectors with a dimension of
300. We tried multiple configurations (described in supplementary material 3) and found that, for our dataset, the best configuration combined the CBOW architecture with the hierarchical softmax algorithm. We also tried different window widths (i.e., the maximum distance between the current and predicted word within a sentence) and chose a window width of 5. From the word embeddings, we created a two-dimensional matrix for each sentence in which each row corresponded to a word in the sentence and each column to a dimension of the vector. Using this matrix, we applied the convolutional neural network (CNN) method to classify sentences. CNN methods require input matrices of uniform size. The maximum sentence length in the notes was 361 words, so sentences shorter than 361 words were padded with zeros. Each sentence therefore corresponded to a matrix of 300 × 361.
The model architecture was implemented with the TensorFlow library and was trained on the training dataset. We tuned the model using the strategy described by Zhang and Wallace. We used