In a previous blog post we disclosed the results of a 181-case study, with sensitivity and specificity measured against culture as the ‘gold standard’. Dr. Clark also presented those results in a poster at the November 2012 International Union Conference in Kuala Lumpur. The challenge in using culture as the reference standard is that it is not 100% accurate. To validate the precision of our detection algorithms scientifically, we developed an internal testing protocol based on the evaluation of ‘panel slides’.
Dr. Hiroyuki Yamada of the Research Institute of Japan contributed to our performance protocol by preparing a series of quality-assurance panel slides. These slides were made from polyacrylamide-based artificial sputum (PBAS) mixed with cultured TB cells, and they closely resemble actual sputum. Fifty (50) slides were prepared in total: ten (10) slides each at P+, P++, and P+++, and twenty (20) scanty (1-9 AFB) slides. The panel slides allowed us to test several important technology processes:
- Determination of algorithmic precision. Because panel slides are prepared in a controlled setting, the bacillary load of each slide is known. A panel slide prepared as P++, for example, should be detected by TBDx™ as a positive case and graded P++.
- For the first time, we were able to assess the performance of multiple image acquisition patterns and varying numbers of captured images. The acquisition patterns were a circle, a rectangle, and a square, and we collected 100, 200, and 300 fields of view (FOVs) for each pattern. Data from this portion of the protocol showed how performance varies with pattern and number of FOVs, which is important for the final product configuration.
- We also tested TBDx™ performance with three different algorithms. These were developed to span a range of sensitivity and specificity, so that users can choose the trade-off they prefer. This part of the protocol determined the sensitivity range of the three algorithms against known positive cases.
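The tests listed above form an evaluation grid (acquisition pattern × number of FOVs × algorithm), with every call scored against the grade each panel slide was prepared at. The following is a minimal Python sketch of how such a grid could be tabulated, under our own assumptions: the `Result` record, its field names, the algorithm labels, and the `"negative"` call label are hypothetical and do not reflect TBDx™ output formats. Here "sensitivity" is the fraction of panel slides called positive, and "grade agreement" is the fraction whose called grade exactly matches the prepared grade.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One detector call on one panel slide (hypothetical record layout)."""
    slide_id: str
    known_grade: str   # grade the slide was prepared at: "scanty", "P+", "P++", "P+++"
    pattern: str       # acquisition pattern: "circle", "rectangle", "square"
    n_fov: int         # number of fields of view acquired: 100, 200 or 300
    algorithm: str     # hypothetical label for one of the three algorithms
    called_grade: str  # grade reported by the detector, or "negative"


def summarize(results):
    """Group calls by (pattern, n_fov, algorithm) and score each configuration."""
    groups = defaultdict(list)
    for r in results:
        groups[(r.pattern, r.n_fov, r.algorithm)].append(r)

    summary = {}
    for key, rows in groups.items():
        n = len(rows)
        # Every panel slide is a known positive, so any non-negative call is a detection
        detected = sum(r.called_grade != "negative" for r in rows)
        # Stricter check: the called grade must equal the prepared grade
        grade_match = sum(r.called_grade == r.known_grade for r in rows)
        summary[key] = {
            "n": n,
            "sensitivity": detected / n,
            "grade_agreement": grade_match / n,
        }
    return summary
```

Keeping the three factors as separate grouping keys means a weak pattern or FOV count shows up as its own row instead of being averaged away across configurations.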
We encountered one significant physical difference between the panel slides and normal stained sputum slides. The viscosity of the artificial sputum produced smears of uneven thickness, a condition we have also encountered when evaluating ZN-stained sputum slides. The thicker smears produced more out-of-focus objects than would be expected on an Auramine-stained slide, and the viscosity caused bacilli to be suspended at varying depths within the smear, again a condition we have seen with ZN-stained slides. To address this, we employed a multi-image process in which images of the same FOV were captured at several focal depths. These images were then merged into one unified image, which TBDx™ analyzed.
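Merging several focal planes into one image is generally known as focus stacking. The actual TBDx™ fusion method is not described here, so the following is only a generic sketch of one common approach, with function names of our own choosing: for each pixel, keep the value from the focal plane that is locally sharpest, using the absolute Laplacian summed over a small window as the sharpness measure. Images are plain lists of grayscale values so that the example has no dependencies.

```python
def sharpness_map(img, radius=1):
    """Local sharpness: |discrete Laplacian| summed over a (2r+1)x(2r+1) window."""
    h, w = len(img), len(img[0])

    def px(y, x):
        # Replicate edge pixels so the Laplacian is defined at the borders
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    lap = [
        [abs(4 * px(y, x) - px(y - 1, x) - px(y + 1, x) - px(y, x - 1) - px(y, x + 1))
         for x in range(w)]
        for y in range(h)
    ]
    # Summing over a window makes the measure less sensitive to single-pixel noise
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = sum(
                lap[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            )
    return out


def fuse_focal_stack(stack, radius=1):
    """Fuse same-FOV images taken at different focal depths into one image."""
    if not stack:
        raise ValueError("focal stack is empty")
    h, w = len(stack[0]), len(stack[0][0])
    if any(len(p) != h or any(len(row) != w for row in p) for p in stack):
        raise ValueError("all focal planes must have the same shape")

    maps = [sharpness_map(p, radius) for p in stack]
    fused = []
    for y in range(h):
        row = []
        for x in range(w):
            # Pick the sharpest plane at this pixel; ties go to the earliest plane
            best = max(range(len(stack)), key=lambda k: maps[k][y][x])
            row.append(stack[best][y][x])
        fused.append(row)
    return fused
```

A bacillus that is in focus only in one plane produces a strong local Laplacian response there, so its pixels are taken from that plane, while featureless background falls back to the first plane.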
The results of the protocol evaluation were documented in an internal paper entitled Automated Computer-Vision Detection of Mycobacteria Tuberculosis using TBDx Multi-Fusion Algorithms on Load Calculated Panel Slides, written by our software engineering colleague Ajay Divekar and distributed to various stakeholders in the international TB community. The paper presents a schematic of the multi-fusion algorithm approach along with sensitivity results for two different algorithms; sensitivity ranged from 86% to 90%. Because we did not prepare normal (negative) panel slides, we cannot report specificity for this panel. We can, however, report the most recent specificity figures for these algorithms, from tests on Auramine-stained images acquired in November 2010 and September 2012. These range from 72.8% to 86.6% and are noted in the paper.
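Why specificity cannot come from this panel follows directly from the definitions. Sensitivity is TP / (TP + FN), computed over reference-positive cases; specificity is TN / (TN + FP), computed over reference-negative cases. A panel containing only positive slides leaves the specificity denominator at zero. A minimal Python sketch (the function names are our own):

```python
from typing import Optional


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of reference-positive cases that the detector called positive."""
    if tp + fn == 0:
        raise ValueError("no reference-positive cases")
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> Optional[float]:
    """Fraction of reference-negative cases called negative; None if there are none."""
    if tn + fp == 0:
        # An all-positive panel has no negatives, so specificity is undefined
        return None
    return tn / (tn + fp)
```

With hypothetical counts, a 50-slide all-positive panel with 40 slides called positive gives `sensitivity(40, 10) == 0.8`, while `specificity(0, 0)` returns `None` because there is nothing to measure it against.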
Our FOV-count tests on the panel slides also yielded the following observations:
- No more than 200 FOVs were needed to grade a case as P++ or P+++.
- Whenever the first 100 FOVs contained at least five (5) detected TB bacilli, acquiring the additional 200 FOVs always classified the case as P+.
- Conversely, whenever the first 100 FOVs contained fewer than five (5) detected TB bacilli, the additional 200 FOVs never classified the case as P+; it always remained scanty.
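Taken together, these observations suggest a simple staged-acquisition rule: make a provisional call after the first 100 FOVs. The sketch below is hypothetical and not a TBDx™ feature. The function name and the `threshold` parameter are our own, the default of five bacilli comes from this panel only, and because the panel contained no negative slides the rule says nothing about cases with zero detected bacilli.

```python
def early_call(bacilli_in_first_100_fov: int, threshold: int = 5) -> str:
    """Hypothetical provisional grade after the first 100 FOVs.

    The default threshold reflects the panel-slide observations only.
    """
    if bacilli_in_first_100_fov < 0:
        raise ValueError("bacilli count cannot be negative")
    if bacilli_in_first_100_fov >= threshold:
        # On the panel, these cases were classified at least P+ after 300 FOVs
        return "P+ or higher"
    if bacilli_in_first_100_fov >= 1:
        # On the panel, these cases always remained scanty after 300 FOVs
        return "scanty"
    # No negative slides in the panel, so a zero count is not covered by the observations
    return "undetermined"
```

For example, `early_call(7)` returns `"P+ or higher"` and `early_call(3)` returns `"scanty"`. Whether such a rule holds on patient slides would need to be confirmed on a larger, mixed positive and negative set.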
We intend to acquire 100 or more negative panel slides and repeat the evaluation protocol, so that specificity can be measured directly on the same kind of controlled panel.
We expect to receive 65 slides from Johns Hopkins University (JHU) after the first of the year, at which point we will run a similar test. We expect these slides to include both positive and negative patient cases. The data will be returned to JHU for scoring, and we will post the results as soon as the evaluation is complete.