In recent years, various sectors of the healthcare industry have seen a significant increase in the use of Artificial Intelligence (AI). This is especially true for cancer registries, where AI is being used to automate casefinding, data abstraction, and follow-up tasks in real time. With the refinement of AI and natural language processing (NLP), these historically manual registry processes can be streamlined to significantly increase efficiency and productivity. In this blog post, we'll explore which types of AI are best suited for cancer registries and their unique needs.
One of the key technological advancements in cancer reporting is the use of knowledge-based systems, also known as traditional AI. These systems have been in use for many years and have proven effective in solving complex problems in automation. For example, E-Path reads over 20 million pathology reports annually in the US alone to determine reportability. Its knowledge base contains a vast number of cancer-related expressions and inference rules that help identify reportable cancers. The system can accurately identify reportable cancers in fractions of a second, achieving sensitivity of 99% and specificity of 98%. This level of efficiency and accuracy would be impossible to achieve manually due to the ever-increasing volume of pathology reports received by cancer registries.
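For readers less familiar with these two metrics, here is a minimal sketch of how sensitivity and specificity are computed from a confusion matrix. The counts are hypothetical, chosen only to reproduce the 99% and 98% figures; they are not E-Path validation data.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of truly reportable cases that the system flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of non-reportable cases that the system correctly leaves out."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, for illustration only.
reportable_found, reportable_missed = 990, 10
nonreportable_cleared, nonreportable_flagged = 9800, 200

print(f"sensitivity: {sensitivity(reportable_found, reportable_missed):.0%}")
print(f"specificity: {specificity(nonreportable_cleared, nonreportable_flagged):.0%}")
```

In a casefinding setting, a missed reportable case (a false negative) is usually the costlier error, which is why sensitivity tends to be the first number registries look at.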
Knowledge-based systems rely heavily on the availability of models, which can take the form of published rules or be obtained through knowledge acquisition sessions with human experts. An example of published rules is the SEER reportability rules, which have grown to over 40 different casefinding configurations in the US. These rules are highly detailed and can be verified, debugged, and edited as needed. Because the rules are written in plain English, they are easy to interpret while still capturing the complexity of cancer cases. The rules are supported by morphology and topography codes, such as the ICD-O-3 codes, which play a crucial role in accurately identifying and classifying different types of cancers. When coding changes occur, knowledge-based systems can be updated directly, so the system remains up to date and reliable.
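To make the idea of an editable, verifiable rule concrete, here is a minimal sketch of a knowledge-based reportability check. Everything in it is an assumption for illustration: the three-entry knowledge base, the function names, and the single rule (ICD-O-3 behavior codes /2 and /3 are treated as reportable). It is not the SEER rule set, which has many exceptions, and it is not how E-Path is implemented. It also ignores negation, so a phrase like "no evidence of adenocarcinoma" would still match.

```python
from dataclasses import dataclass

# Hypothetical, tiny knowledge base: expression -> ICD-O-3 morphology code.
EXPRESSIONS = {
    "ductal carcinoma in situ": "8500/2",
    "adenocarcinoma": "8140/3",
    "lipoma": "8850/0",
}

# Simplified illustrative rule: behavior /2 (in situ) and /3 (malignant)
# are treated as reportable. Real reportability rules are far more detailed.
REPORTABLE_BEHAVIORS = {"2", "3"}


@dataclass
class Finding:
    expression: str
    morphology: str
    reportable: bool
    reason: str  # human-readable explanation of why the rule fired


def evaluate(report_text: str) -> list[Finding]:
    """Return one finding per knowledge-base expression found in the text."""
    text = report_text.lower()
    findings = []
    for expression, morphology in EXPRESSIONS.items():
        if expression in text:
            behavior = morphology.split("/")[1]
            reportable = behavior in REPORTABLE_BEHAVIORS
            verdict = "reportable" if reportable else "not reportable"
            reason = f"'{expression}' maps to {morphology}; behavior /{behavior} is {verdict}"
            findings.append(Finding(expression, morphology, reportable, reason))
    return findings


def is_reportable(report_text: str) -> bool:
    """A report is reportable if any finding in it is reportable."""
    return any(f.reportable for f in evaluate(report_text))


for finding in evaluate("Invasive adenocarcinoma of the colon. Incidental lipoma."):
    print(finding.reason)
```

Because each decision carries its own `reason`, a registrar can trace a result back to the exact expression and code that produced it, and a coding change becomes an edit to the knowledge base rather than a retraining exercise.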
The knowledge-based approach, exemplified by systems like E-Path, achieves high levels of accuracy and efficiency in identifying reportable cancers. Machine learning can also contribute to cancer reporting, particularly by identifying report types and de-identifying medical reports.
Reports that come into a cancer registry can vary greatly, from pathology reports to genomics reports to physicians' notes. Being able to correctly identify the type of report is crucial because it allows for the application of specific processing pipelines and knowledge sources for each report type. For example, a system should be able to recognize whether a report is a pathology report or a genomics report before it knows how to process it. Routing each report to the right pipeline improves both the efficiency and the accuracy of report analysis.
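The classify-then-route pattern described above can be sketched as follows. To keep the example self-contained, a simple keyword count stands in for a trained machine learning classifier; the keyword lists, report types, and handler functions are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical cue words per report type. A production system would use a
# trained classifier here; keyword counting is only a stand-in.
KEYWORDS = {
    "pathology": ("specimen", "gross description", "microscopic", "margins"),
    "genomics": ("variant", "mutation", "sequencing", "allele"),
}


def classify_report(text: str) -> str:
    """Return the report type with the most keyword hits, or 'unknown'."""
    lowered = text.lower()
    scores = {
        report_type: sum(lowered.count(word) for word in words)
        for report_type, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


# Placeholder pipelines; each would apply its own knowledge sources.
def process_pathology(text: str) -> str:
    return "sent to pathology pipeline"


def process_genomics(text: str) -> str:
    return "sent to genomics pipeline"


def process_unknown(text: str) -> str:
    return "held for manual review"


PIPELINES: Dict[str, Callable[[str], str]] = {
    "pathology": process_pathology,
    "genomics": process_genomics,
}


def route(text: str) -> str:
    """Classify the report, then hand it to the matching pipeline."""
    handler = PIPELINES.get(classify_report(text), process_unknown)
    return handler(text)


print(route("Specimen received in formalin. Microscopic exam shows clear margins."))
```

The point of the design is the separation: the classifier only decides which pipeline applies, so it can be swapped for a better model without touching the pipelines themselves.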
Many reports used for research purposes need to be de-identified to protect patient privacy. Named entity recognition, a technique used in machine learning, can help identify and remove any personal identifying information from a report. This ensures that the reports can still be used for research without compromising patient confidentiality. It's a quick and efficient way to process large volumes of medical reports while maintaining privacy.
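As a rough illustration of the redaction step, the sketch below replaces a few structured identifiers with placeholder tags. Note the substitution: it uses regular expressions rather than named entity recognition, so it only catches identifiers with a predictable shape (dates, phone numbers, record numbers). Names and addresses are exactly where a trained NER model is needed. The patterns and tag names are assumptions, not a complete de-identification scheme.

```python
import re

# Hypothetical patterns for identifiers with a predictable format.
# Free-text identifiers such as patient names are NOT handled here.
PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")),
    ("PHONE", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
    ("MRN", re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE)),
]


def redact(text: str) -> str:
    """Replace every match of each pattern with a bracketed tag."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact("MRN: 123456 seen on 03/14/2023, call 555-123-4567."))
# prints: [MRN] seen on [DATE], call [PHONE].
```

Replacing identifiers with typed tags, rather than deleting them, keeps the sentence structure intact for downstream research use.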
However, there are limitations to the use of machine learning in cancer registries. One major challenge is the availability of training data. Cancer reporting requires robust annotated training data that accurately represents the problem space. In some cases, such as cancer reportability, that data may not exist because coding standards change frequently. It takes time to acquire reports coded under new standards, so the training data lags behind the standards themselves. This makes it difficult to train machine learning models effectively in a timely manner.
Another limitation is the lack of explanation for the system's output. While some tasks, such as voice recognition and report type classification, don't require an explanation, cancer registry work often calls for an explanation for the results. Understanding why a certain result was derived is crucial for gaining trust in the system's accuracy and for validation purposes. Machine learning models can be seen as black boxes, lacking the ability to provide an automated explanation. In cases where the explanation of results is vital, a knowledge-based approach may be preferred.
Perhaps the most promising approach for cancer registries is a hybrid model that combines the strengths of both knowledge bases and machine learning, as in the E-Path software suite. By using both approaches, we can target different segments of the problem with the method best suited to each. This combination optimizes both accuracy and efficiency.
Recently, we've also seen interest in large language models, like ChatGPT. These novel models have the potential to bring significant advantages to the table. However, incorporating them into existing systems is not a quick task. A considerable amount of groundwork still must be done to assess their utility within a cancer monitoring context and to establish efficient implementation methods.
While it's clear that the combination of techniques in hybrid AI models will pave the way for innovation in the cancer registry space, the journey has only just begun. The precision and efficiency of these models will improve over time, further reducing false positives and negatives. As we continue to delve into this exciting territory, the hope is that these improvements will result in earlier cancer detection, personalized treatment strategies, and ultimately, better patient outcomes.