Principles for evaluating the clinical implementation of novel digital healthcare devices

Article information

J Korean Med Assoc. 2018;61(12):765-775
Publication date (electronic): 2018 December 17
doi: https://doi.org/10.5124/jkma.2018.61.12.765
1Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
2Department of Radiology, Seoul St. Mary's Hospital, The Catholic University of Korea College of Medicine, Seoul, Korea
3Withsim Clinic, Seongnam, Korea
4Department of Radiology, Kyung Hee University Hospital at Gangdong, Seoul, Korea
5Department of Radiology and Center for Imaging Science, Samsung Medical Center, Seoul, Korea
6Department of Radiology, SMG-SNU Boramae Medical Center, Seoul National University College of Medicine, Seoul, Korea
7Department of Radiology, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea
8Department of Radiology, Kyung Hee University Hospital, Kyung Hee University College of Medicine, Seoul, Korea
Corresponding author: Joo Hyeong Oh. E-mail: ohjh6108@gmail.com
Received 2018 November 22; Accepted 2018 December 03.

Abstract

With growing interest in novel digital healthcare devices, such as artificial intelligence (AI) software for medical diagnosis and prediction, and their potential impacts on healthcare, discussions have taken place regarding the regulatory approval, coverage, and clinical implementation of these devices. Despite their potential, ‘digital exceptionalism’ (i.e., skipping the rigorous clinical validation of such digital tools) is creating significant concerns for patients and healthcare stakeholders. This white paper presents the positions of the Korean Society of Radiology, a leader in medical imaging and digital medicine, on the clinical validation, regulatory approval, coverage decisions, and clinical implementation of novel digital healthcare devices, especially AI software for medical diagnosis and prediction, and explains the scientific principles underlying those positions. Mere regulatory approval by the Food and Drug Administration of Korea, the United States, or other countries should be distinguished from coverage decisions and widespread clinical implementation, as regulatory approval only indicates that a digital tool is allowed for use in patients, not that the device is beneficial or recommended for patient care. Coverage or widespread clinical adoption of AI software tools should require a thorough clinical validation of safety, high accuracy proven by robust external validation, documented benefits for patient outcomes, and cost-effectiveness. The Korean Society of Radiology puts patients first when considering novel digital healthcare tools, and as an impartial professional organization that follows scientific principles and evidence, strives to provide correct information to the public, make reasonable policy suggestions, and build collaborative partnerships with industry and government for the good of our patients.
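
In practice, "high accuracy proven by robust external validation" means that diagnostic performance must be recomputed on data collected independently of the data used to develop the algorithm. The following Python sketch is illustrative only and is not part of the white paper; the data are simulated and all names are hypothetical. It shows one common way such an external validation of a binary diagnostic algorithm is reported: sensitivity and specificity at a fixed operating threshold, together with the area under the receiver operating characteristic curve (AUC) and a percentile-bootstrap confidence interval.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(seed=0)

# Simulated external test set: ground-truth labels (1 = disease present) and
# the algorithm's predicted probabilities. In a real study these would come
# from institutions that did NOT contribute data to algorithm development.
y_true = rng.integers(0, 2, size=500)
y_prob = np.clip(0.35 + 0.4 * y_true + rng.normal(0.0, 0.2, size=500), 0.0, 1.0)

def sensitivity_specificity(y_true, y_prob, threshold=0.5):
    """Sensitivity and specificity at a fixed operating threshold."""
    y_pred = (y_prob >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    return tp / (tp + fn), tn / (tn + fp)

def bootstrap_auc_ci(y_true, y_prob, n_boot=2000, alpha=0.05):
    """Percentile-bootstrap confidence interval for the AUC."""
    aucs = []
    n = len(y_true)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample cases with replacement
        if len(np.unique(y_true[idx])) < 2:   # AUC needs both classes present
            continue
        aucs.append(roc_auc_score(y_true[idx], y_prob[idx]))
    return np.quantile(aucs, [alpha / 2, 1 - alpha / 2])

sens, spec = sensitivity_specificity(y_true, y_prob)
auc = roc_auc_score(y_true, y_prob)
auc_lo, auc_hi = bootstrap_auc_ci(y_true, y_prob)
print(f"Sensitivity {sens:.3f}, specificity {spec:.3f}")
print(f"AUC {auc:.3f} (95% CI {auc_lo:.3f} to {auc_hi:.3f})")
```

In a full validation study, these figures would be reported separately for each external institution and acquisition setting, in line with the checklist in Table 1.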

Figure 1.

Hierarchy of artificial intelligence-related terms.

Figure 2.

Brief schematic summary of the processes for evaluating a novel health technology used by the Health Insurance Review and Assessment Service (HIRA) and the National Evidence-based Healthcare Collaborating Agency (NECA).

Table 1.

A checklist for robust clinical validation of the performance of a machine-learning algorithm

Characteristics of the dataset used for clinical validation
Is it representative of the target patients in real-world practice for which the algorithm will be used?
Was it obtained from institutions other than those that provided the data for algorithm development?
Was it derived from multiple institutions?
Was it captured with scanners different from those used to create the data for algorithm development (e.g., a computed tomography scanner from a different vendor)?a)
Was it obtained using acquisition parameters different from those used to create the data for algorithm development (e.g., a different radiation dose setting or image reconstruction method for computed tomography)?a)
Was it collected prospectively?

The more of these questions receive a “Yes” answer, the more generalizable the algorithm's performance is. a) Applicable to imaging data.
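
As an illustration of how this checklist might be applied (the code below is not part of the white paper; the class and field names are hypothetical), the Table 1 items can be recorded as a simple structure so that the degree of external validation achieved by a study is documented explicitly, assuming a binary Yes/No answer per item:

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class ExternalValidationChecklist:
    """Yes/No answers to the Table 1 questions about the validation dataset.

    The two scanner/acquisition items apply only to imaging data (footnote a)
    and may be left as None for non-imaging algorithms.
    """
    representative_of_target_patients: bool
    from_institutions_other_than_development: bool
    from_multiple_institutions: bool
    collected_prospectively: bool
    different_scanners: Optional[bool] = None               # imaging data only
    different_acquisition_parameters: Optional[bool] = None  # imaging data only

    def summary(self) -> str:
        # Count only the items that were actually answered (not None).
        answered = [getattr(self, f.name) for f in fields(self)
                    if getattr(self, f.name) is not None]
        yes = sum(1 for v in answered if v)
        # Per Table 1: the more "Yes" answers, the more generalizable the
        # reported performance of the algorithm is likely to be.
        return f"{yes}/{len(answered)} checklist items answered 'Yes'"

# A hypothetical external validation study of an imaging AI algorithm.
study = ExternalValidationChecklist(
    representative_of_target_patients=True,
    from_institutions_other_than_development=True,
    from_multiple_institutions=False,
    collected_prospectively=False,
    different_scanners=True,
    different_acquisition_parameters=False,
)
print(study.summary())  # -> 3/6 checklist items answered 'Yes'
```

The tally is only a rough indicator of generalizability: the scanner- and acquisition-related items apply only to imaging data, and a prospective, multi-institutional design remains the strongest evidence that reported performance will hold in real-world practice.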