Artificial intelligence (AI) in medicine, current applications and future role with special emphasis on its potential and promise in pathology: present and future impact, obstacles including costs and acceptance among pathologists, practical and philosophical considerations. A comprehensive review

Zubair Ahmad, Shabina Rahim, Maha Zubair, Jamshid Abdul-Ghafar


Received 2020 Nov 9; Accepted 2021 Mar 4; Collection date 2021.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Abstract

The role of artificial intelligence (AI), defined as the ability of computers to perform tasks that normally require human intelligence, is constantly expanding. Medicine was slow to embrace AI; however, its role in medicine is now growing rapidly and promises to revolutionize patient care in the coming years. In addition, AI has the potential to democratize high-level medical care and make it accessible to all parts of the world.

Among the specialties of medicine, some, like radiology, were relatively quick to adopt AI, whereas others, especially pathology (and surgical pathology in particular), are only just beginning to utilize it. AI promises to play a major role in the accurate diagnosis, prognosis and treatment of cancers. In this paper, the general principles of AI are defined first, followed by a detailed discussion of its current role in medicine. In the second half of this comprehensive review, the current and future role of AI in surgical pathology is discussed in detail, including an account of the practical difficulties involved and pathologists' fear of being replaced by computer algorithms. A number of recent studies which demonstrate the usefulness of AI in the practice of surgical pathology are highlighted.

AI has the potential to transform the practice of surgical pathology by ensuring rapid and accurate results and enabling pathologists to focus on higher-level diagnostic and consultative tasks, such as integrating molecular, morphologic and clinical information to make accurate diagnoses in difficult cases and determine prognosis objectively, thereby contributing to personalized care.

Keywords: Artificial intelligence, Medicine, Pathology

Introduction

The role of artificial intelligence (AI) in the field of medicine is constantly expanding. AI promises to revolutionize patient care in the coming years by optimizing personalized medicine and tailoring it to individual patients. Medicine embraced AI slowly. Some specialties, such as radiology, were quick to adopt it; others, like pathology, are only now beginning to utilize AI in clinical practice. In this article, we start by describing the general principles of AI. We then analyze its current role in medicine in general by looking at examples of its applications in specialties such as radiology and oncology. In the second half of this review, we discuss the current and expected future role of AI in surgical pathology; the practical, financial and regulatory difficulties involved; the future of the microscope; and pathologists' fear of being replaced by computer algorithms. We also discuss a number of recent studies which highlight the usefulness of AI in the practice of surgical pathology.

General principles and definitions

AI can be defined as the ability of computers to perform tasks that normally require human intelligence. It refers to the development and programming of computers and devices with human-like capabilities that can perform diverse complex functions such as driving an autonomous car, diagnosing a medical condition, detecting fraud and money laundering, offering legal advice and proving mathematical theorems.

Machine learning (ML)

Machine learning (ML), a subfield of AI, is defined as a computational system based on a set of algorithms that attempts to analyze vast and diverse data using multiple layers of analysis. There are a number of ways in which a computer can be programmed to make intelligent judgments, and it is essential to use the right algorithms for specific purposes. ML is one of the commonest AI techniques used for processing big data. It is a self-adaptive system that provides increasingly better analysis and pattern recognition with experience and newly added data. These techniques have evolved hand in hand with the digital era, which has brought about an explosion of data in all forms from all parts of the world. Enormous amounts of data, known simply as big data, are easily and readily accessible and can be shared through applications like cloud computing.

ML applies statistical methods to learn automatically from data and experience without explicit instructions. It has seen an explosion of interest in recent years. One technique in particular, known as deep learning, has produced groundbreaking results in many important problems, including image classification and speech recognition.
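To make this concrete, the following is a minimal, illustrative sketch of supervised machine learning, in which a model learns patterns from labeled data rather than from explicit rules. The dataset and model choice are assumptions made for illustration only and are not drawn from any study cited in this review.

```python
# Minimal supervised-learning sketch (illustrative only): a model
# "learns" from labeled examples without explicitly programmed rules.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)  # tabular features, binary labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)  # patterns are learned from the data itself

print("Held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

The key point of the sketch is that performance improves with more (and better) labeled data, which is why the text stresses the role of big data in ML.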

The recent growing interest in, and efforts to incorporate, AI into any number of industries are mainly due to the rise of deep learning. Also known as deep neural learning, deep neural networks or convolutional neural networks (CNNs), it is a type of machine learning algorithm that uses multiple layers of processing, for example multiple layers of image analysis, to extract higher-level features from the data. It is a function of AI inspired by the neurons of the human brain: it imitates the functioning of the human brain in processing data and creating patterns which can be used in decision making, and it is capable of learning from data without supervision. Deep convolutional neural networks are the most used ML techniques in the biomedical world. These artificial neural networks are interconnected and follow mathematical models. Their field of application is vast and allows the management of ‘big data’ in genomics and molecular biology. They are most commonly applied to the analysis of visual images.
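As an illustration of the layered structure described above, here is a minimal, hypothetical CNN sketch in PyTorch. The architecture, input size and number of classes are assumptions for demonstration and do not come from any study discussed in this review.

```python
# Illustrative sketch of a convolutional neural network: stacked
# convolutional layers extract progressively higher-level image features.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # early layers: edges, textures
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # deeper layers: higher-level patterns
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)              # hierarchical, nonlinear feature extraction
        return self.classifier(x.flatten(1))

# e.g. a batch of four 224x224 RGB image patches -> class scores
logits = TinyCNN()(torch.randn(4, 3, 224, 224))
print(logits.shape)  # torch.Size([4, 2])
```

Each convolution-and-pooling stage transforms the data nonlinearly, which is the "hierarchical design" and "nonlinear approach" referred to in the following paragraphs.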

Through deep learning, AI recognizes patterns using various forms of neural networks based on the availability of big data repositories. The process is inexpensive and the computational processing power is easily accessible. The neural networks can acquire data rapidly via image scanners, digital cameras, remote sensors, electronic appliances or the Internet of Things (IoT). AI has the ability to learn from both unstructured and unlabeled data. It uses a hierarchical level of artificial neural networks to carry out the process of ML. Artificial neural networks used in AI are built, as mentioned above, like the human brain with neuron nodes connected together like a web. This is in contrast with traditional computer programs which build analysis with data in a linear way. The hierarchical design of deep learning systems enables machines to process data via a nonlinear approach [ 1 ].

Big data is usually unstructured and so vast that it could take humans years, even decades, to understand it, process it and obtain relevant information from it. The unaided human brain cannot extract meaning from such vast data sets. Thus, it is necessary to use computers to identify patterns and associations and to make inferences and predictions from the data. Companies in all spheres and fields have now realized the incredible potential of unraveling this wealth of information and are increasingly adopting AI systems. Exponential growth in computing power, data storage and sensing technology is producing a new world in which incredible amounts of data can be captured and analyzed.

Owing to the breakthrough of deep learning, AI developed rapidly in the 2010s. Utilizing highly advanced computer processing power and software technology, AI is expected to radically change our lives, industries and society as a whole. It has now entered the era of full-scale practical dissemination.

AI in medicine, the present and the future

Advances in ML algorithms are enabling AI systems to replicate many medical tasks that currently require human expertise, at levels of accuracy similar to or greater than those achieved by human experts. In medicine, deep learning applications are increasingly being trained with large amounts of annotated data, freeing medical specialists to focus on more productive tasks and projects. The potential of AI in medicine is limitless, and it can serve as a great boon to improving health care delivery in clinical practice.

According to Goldenberg et al., computer-based decision support systems based on ML have the potential to revolutionize medicine by performing complex tasks that are currently assigned to specialists. ML systems can increase diagnostic accuracy, increase throughput, better streamline clinical workflow, decrease human resource costs and improve treatment choices [2].

However, effective use of AI in medicine requires synergistic transdisciplinary competencies. Recent promising biomedical and biomarker discoveries notwithstanding, individually tailored care is still far from reality, and novel therapies which emerge from preclinical studies are very rarely translated into evaluation of their diagnostic and therapeutic potential. The discrepancy between experimental data on new anticancer molecules and their actual use in diagnosis and therapy is due to a number of factors, including biological differences between human disease and animal models, inconsistencies in experimental methodologies, misinterpretation of experimental results and lack of validation of such data by pathologists with long-term experience in animal cancer models. For example, personalized care in oncology requires the synergistic combination of several disciplines, such as nuclear medicine, radiology and surgical pathology, which represent complementary approaches to diagnosis, prognosis and evaluation of therapeutic response. A structured collaboration model between these disciplines can accelerate the achievement of a medical paradigm which takes into consideration the uniqueness of every human being. A recent study examined fifteen papers focusing on early and accurate diagnosis of breast, lung and prostate cancer and lymphoma using innovative AI applications. One example was the application of a deep neural network to discriminating malignant breast cancer lesions in mammographic images. The authors of this study were confident that the data they examined provide the scientific rationale for further investigations in translational medicine based on the combination of surgical pathology, radiology and nuclear medicine [3].

ML is pioneering a new paradigm for scientific research. Traditionally, research has used the classic ‘hypothesis testing’ approach, in which processing of data leads to explanatory mechanisms, which then suggest further experiments that in turn lead to new findings. As a result of rapid technological advances, however, many experiments now collect vast amounts of information and can be considered ‘hypothesis generating’. Investigations in genomics and other -omics are cases in point. The advent of image digitization has led to experiments which generate gigabytes or terabytes of data. Fortunately, advances in deep learning allow the derivation of important qualitative and quantitative information from images, putting visual observation on the same playing field as molecular analysis. Since deep learning is also very useful for -omic analysis, it allows the amalgamation and interpretation of image-based data together with -omic information, allowing this combined data to provide new and more accurate knowledge. In this new hypothesis generating paradigm, we hunt for meaning in a huge data set instead of proceeding one logical step at a time from observation to better explanations. Thus, deep learning has turned the scientific process on its head [4].

AI has already become a major element of the health care landscape, providing value in many fields of medicine, for example in the assessment of skin lesions, the evaluation of fundus retinography for detection of diabetic retinopathy, and radiologic diagnosis such as the interpretation of chest radiographs. These examples highlight the value of AI in aiding clinicians to improve quality, safety, diagnosis and the democratization of care. For example, AI has enabled radiologists to read, at their own institution, imaging studies from anywhere in the world, bringing expert care to parts of the world where it is not otherwise available. In using AI for better medical care, the investment of time and manpower needed to validate model data sets is a major hurdle. Because deep learning AI identifies patterns, the data used to train an AI model must be validated by medical specialists. Additional financial resources and considerable time are required to perfect AI models so that they can be deployed with confidence to assist in medical practice.

The question of legal responsibility will need to be resolved before AI can become commonplace in medicine, especially in imaging specialties such as radiology and pathology. Who will be held accountable for an action resulting from an AI-based decision? Who will be responsible for an error made through the use of an AI program? It cannot be emphasized enough that legal responsibility is a significant consideration in the medical field when implementing new technologies or procedures or when developing new drugs. The use of AI in medicine is new and no legal precedents are available. No matter how much more accurate AI tools are compared to their human counterparts, the possibility that data might be misinterpreted through false positive and false negative findings cannot be excluded. The issue of legal responsibility for AI-based decisions in medicine is further complicated and obscured by the lack of clarity regarding major issues such as the processing of sensitive personal information, data collection, consent, transparency and storage. The human element will remain an important factor in incorporating AI into wide practice in hospitals. In pathology, even if the process is completely automated with a routine digital imaging system and database management, human agreement will likely be essential [5].

Although health care was slow to adopt AI, the pace of implementation is now accelerating at an impressive rate. In 2014, spending on acquisitions of AI startups in health care was about 600 million dollars; by 2021, it is anticipated to reach 6.6 billion dollars. One reason health care is ripe for AI is ‘big data’: the health care industry has rich data sets which are ideal for AI [6]. Personalized care is the major objective of both basic and translational cancer research. Building an intelligent automated entity to evaluate, diagnose and treat patients in research settings is arguably the easiest part of designing an end-to-end medical AI system. There is much hype and hope surrounding emerging AI applications in medicine, but the brittleness of these systems, the importance of defining the correct frameworks for their application, and the need to ensure rigorous quality control, including human supervision to avoid driving patients on ‘autopilot’ towards unexpected, unwanted and unhealthy outcomes, are essential factors that need to be acknowledged. Since modern machine learning algorithms perform complex mathematical transformations on the input data, errors made by computational systems will require extra vigilance to detect and interpret [7].

AI will in all probability transform clinical practice over the next decade. As AI technologies evolve at a fast pace and machine learning models are updated with additional information, regulatory approval is essential. Recently, the Food and Drug Administration (FDA) announced a pilot certification approach that inspects both AI developers and the product itself [8]. Such steps can ensure public trust in novel medical AI applications. Even if an AI system is designed only to advise physicians or health care personnel rather than to carry out actual diagnosis and treatment tasks, it may still result in unintended harmful consequences. A recent study showed that over-reliance on decision support systems resulted in an increased false negative rate in radiology diagnoses compared with the scenario in which the computer-aided diagnostic system was unavailable to the same group of radiologists; this is termed confirmatory bias. In addition, inexperienced medical practitioners may overreact to excessive warning messages; this is termed alert fatigue. Thus, AI developers need to address these challenges even if their systems only play an advisory role.

Trust in medical technology is closely related to its anticipated utility. If a perception emerges that a new technology is harmful and has untoward consequences, then the barriers to its acceptance will become next to insurmountable. This issue becomes even more complicated if the technology is complex and the general public, and even the domain expert, cannot fully evaluate its efficacy and potential hazards; this is termed the ‘frame problem’. The frame problem can cause medical errors that will draw the attention of the public and lead to lawsuits against companies or organizations using, deploying or developing medical AI applications. The ‘black box’ nature (lack of full disclosure and information about the technology) of modern machine learning algorithms may further exacerbate the issue. High profile examples of harmful or inadequate performance will result in extra scrutiny of the whole field and may hinder the development of more robust AI systems. It must be kept in mind that data-driven AI algorithms are not immune from the ‘garbage in, garbage out’ rule. ML algorithms are designed to identify the hidden patterns of the data and generate output projections based on what they have seen in the past. As many input data sets contain artifacts or biases, the models learnt from such data carry those biases and can potentially amplify them, with harmful consequences for patients. Thus, optimized machine learning models in medicine can be confounded by their training data, may not reflect an objective clinical assessment, and can lead to partiality and mistakes. Great attention therefore needs to be paid to data quality and provenance in order to foster ‘patient trust’ in AI systems and to avoid unethical practices, even if only due to negligence. However, this is an expensive task. In other words, since modern machine learning algorithms make complex mathematical changes to the input data, biases and errors made by computational systems will require extra vigilance to detect and interpret [9].

Let us now examine examples of the application of AI in various subspecialties of medicine, including gastroenterology, ophthalmology, dermatology, surgery, radiology and oncology. The promise of AI in health care is the delivery of improved quality and safety of care and the potential to democratize expertise.

Nakagawa et al. developed a deep learning-based AI system for the assessment of superficial esophageal squamous cell carcinoma (SCC). Depth of tumor invasion is a critical factor affecting the choice of treatment; however, assessment of cancer depth is subjective, and interobserver variability is common. The authors obtained 8660 non-magnified endoscopic (non-ME) and 5678 ME images as a training data set, and 405 non-ME images from 155 patients as a validation set. The system showed sensitivity and specificity of 90.1 and 95.8% respectively, and its performance in diagnosing depth of invasion in superficial SCC was comparable to that of experienced endoscopists [10].

Similarly, Horie et al. developed a deep learning CNN and tested its ability to diagnose esophageal SCC and adenocarcinoma. They retrospectively collected 8428 training images of known esophageal carcinoma from 384 patients at their hospital and then prepared 1118 test images from 47 patients with esophageal cancer and 50 patients without cancer to evaluate diagnostic accuracy. The CNN took just 27 s to analyze the test images with a sensitivity of 98%. It detected all cancers less than 10 mm in size. The authors were confident that more training would lead to even greater diagnostic accuracy thus facilitating early diagnosis with consequent better prognosis for patients with esophageal cancer [ 11 ].

Hirasawa et al. developed a CNN to detect gastric cancer in endoscopic images. They trained their CNN-based diagnostic system using 13,584 endoscopic images of gastric cancer. To evaluate diagnostic accuracy, an independent test set of 2296 images collected from 69 consecutive patients with 77 gastric cancer lesions was applied to the constructed CNN. The CNN correctly diagnosed 71 of 77 cancer lesions, an overall sensitivity of 92.2%. Seventy of the 71 lesions (98.6%) with a diameter of 6 mm or more, as well as all invasive cancers, were correctly detected. All missed lesions were superficially depressed, differentiated-type intramucosal cancers that are difficult to distinguish from gastritis even for experienced endoscopists. Thus, the system developed by Hirasawa et al. was able to process numerous stored endoscopic images in a very short time with clinically relevant diagnostic ability and could be applied to daily clinical practice to reduce the burden on endoscopists [12].
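For clarity, sensitivity here is simply the fraction of true cancer lesions the system detected. The snippet below reproduces the arithmetic from the counts reported above; it is purely illustrative.

```python
# Sensitivity from the counts reported by Hirasawa et al. (illustrative arithmetic).
detected_lesions = 71   # cancer lesions correctly diagnosed by the CNN
total_lesions = 77      # all cancer lesions in the independent test set

sensitivity = detected_lesions / total_lesions
print(f"Overall sensitivity: {sensitivity:.1%}")  # 92.2%
```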

Esteva et al. developed a deep CNN to discriminate between the most common skin cancers, including malignant melanoma. They tested their algorithm against 21 board-certified dermatologists in evaluating biopsy-proven clinical images and demonstrated performance at least equivalent, if not superior, to the dermatologists'. The authors suggested that mobile devices such as smartphones could be deployed with similar algorithms, permitting potentially low-cost universal access to vital diagnostic care anywhere in the world [13]. Digital pathology is becoming the new standard of care in dermatology and personalized medicine. This collaboration between dermatopathology and dermatology (personalized dermatology) is aimed at therapy tailored to the specific needs of each patient [14].

In another recent study, Gulshan et al. applied a deep CNN to a test set of more than 128,000 retinal fundus images from adult patients with diabetes to identify referable diabetic retinopathy. The algorithm they developed demonstrated very high sensitivity and specificity for detecting referable diabetic retinopathy and macular edema. This study suggested that AI will be used not to replace physicians but rather to perform simple, cost-effective and widely available examinations and analyses which could help identify at-risk patients requiring referral for specialty care, while reassuring other patients that retinopathy is not present [15].

In a recent study, Dong et al. used machine learning algorithms to predict the disease course of Crohn's disease in Chinese patients. Crohn's disease is complex, and its course is difficult to predict. Their proposed machine learning model accurately predicted the risk of surgical intervention in their patients, and the authors were confident that it could be used to design treatment strategies tailored to individual Crohn's disease patients [16].

Even in surgery, the same critical questions are now being asked: how will the digital revolution and AI change surgical practice, and how will they translate into the practice of surgery? A recent study by Wall and Krummel suggests that AI may impact surgery in the near future in three main areas: enhancement of training modalities, cognitive enhancement of the surgeon, and procedural automation. The authors admit that there have been unanticipated missteps in the use of these technologies but have little doubt about their adoption in surgical practice. They agree that the promise of big data, AI and automation in surgery is high and believe that surgeons in the near future will need to become “digital surgeons”: prepared to adopt smarter training modalities, to supervise the learning of machines that can enhance cognitive function, and ultimately to oversee autonomous surgery without allowing their operating skills to decay [17].

AI in radiology, already a success story

In radiology, AI is improving accuracy in diagnostic imaging. Because patient images can be acquired directly in digital form for central archival and soft copy review, radiology practice has readily incorporated AI into clinical work. An established digital imaging infrastructure has allowed seamless embedding of AI into the radiology workflow, and large amounts of data can be translated and transmitted within minutes. For patient images generated by different imaging modalities, such as X-ray, magnetic resonance imaging (MRI), computed tomography (CT), ultrasound and mammography, deep learning can be automated to accurately pinpoint areas of interest and suggest diagnoses. Image recognition using deep learning CNNs has dramatically improved and is being increasingly applied to diagnostic imaging [5].

Radiology converted to digital images more than 25 years ago and is therefore well positioned to deploy AI for diagnostics. Several studies have shown considerable success in the use of AI for evaluating a variety of scan types, including mammography for breast lesions, CT scans for lung nodules and infections, and MRI images for brain tumors. By converting to digital images, radiology eliminated film, chemicals, developers and film storage, and solved problems related to the loss of films and their transport to where they are needed, for example intensive care units (ICUs), operating rooms (ORs) and emergency departments. There was inherent value within these images for greater learning using computers to improve the quality, safety and efficiency of radiologists. However, the role of the radiologist remains crucial. For example, chest X-ray films with atypical features would still need to be reviewed by radiologists to ensure that artifacts or unusual clinical contexts were adequately captured. An AI system will need to be continually calibrated by human feedback.

Lee et al. developed a deep learning-based computer-aided diagnosis (CAD) system for the diagnosis of cervical lymph node metastases on CT in thyroid cancer. The authors collected 995 axial CT images (647 benign and 348 malignant) from 202 patients who underwent CT for surgical planning. Their system was able to classify cervical lymph node metastases with a high degree of accuracy, and the authors believe it may be useful in the clinical setting for this purpose [18].

A 2019 radiology-based study tested the ability of AI to automatically identify lung cancer nodules (1 mm and 5 mm thickness) on CT in patients with stage T1 lung cancer, and compared the results with manual screening of the same nodules by radiologists. It needs to be understood that screening for lung cancer nodules by CT entails a huge workload. The AI recognition technology used in the study learned, by neural network methods, from 5000 cases of stage T1 lung cancer with 1 mm and 5 mm thickness nodules. Following this learning, chest CTs from 500 stage T1 lung cancer patients with 1 mm and 5 mm thickness nodules were tested with AI, and the results were compared with those of manual reading by radiologists. The detection rates were similar, and no significant differences were noted. Thus, automatic learning of early lung cancer chest CT images by AI showed high specificity and sensitivity for early lung cancer identification and may prove invaluable in the near future in assisting doctors in the early diagnosis of small lung cancer nodules [19].

Value of a synergistic approach between disciplines using AI for cancer prognostication and therapy

AI, through the use of convolutional networks, is also expected to play an important role in the prediction of cancer outcome. Survival is the most important outcome for cancer patients, who need it to plan for themselves and their families; determining cancer progress is thus crucial to controlling suffering and death due to cancer. As the histologic diagnosis of cancer is an essential initial step in determining the line of therapy, pathologists bear a great responsibility in the diagnosis of cancer. Histologic grading and staging systems, based on the size or extent of invasion of the primary tumor, involvement of regional lymph nodes, and presence or absence of distant metastases, are currently used to predict the biologic behavior of cancer. In addition to histology, several other tools, including genomic markers, gene expression and epigenetic modifications, are also used to predict outcome in cancer patients [20].

Since personalized medicine is the main objective of cancer research, it is essential to form an alliance between imaging diagnostics (radiology and nuclear medicine) and surgical pathology, since a structured collaboration model between these disciplines can speed up the achievement of a paradigm for personalized medicine. A recent study presented automatic glioma grade identification from MRI images using a deep convolutional neural network. Glioma images were collected from government hospitals, and the AI system categorized the tumors into four grades: low grade glioma, oligodendroglioma, anaplastic glioma and glioblastoma multiforme. The results showed reasonably good performance, with high classification accuracies [21]. Another recent study, by Mobadersany et al., utilized digital pathology images and genomic markers to predict the overall survival of patients with brain tumors. The authors examined the ability of AI to predict overall survival in diffuse gliomas, in which histologic grading and genomic classification have independent prognostic power. Until the recent past, histologic diagnosis and grading were used, but the recent emergence of molecular subtyping has resolved the uncertainty related to lineage, and criteria for grading gliomas need to be redefined in the context of molecular subtyping. Improving the accuracy and objectivity of glioma grading will directly impact patient care by identifying patients with aggressive disease who require more aggressive therapeutic regimens and sparing those with less aggressive disease from unnecessary treatments. Currently, all grade III and IV diffuse gliomas are typically treated very aggressively with radiation and concomitant chemotherapy. In the above mentioned study, the AI software learned about survival from histologic images and created a unified framework which interpreted histology and genomic biomarkers for predicting time-to-event outcomes. The ability of the AI to predict patient outcomes was found to be more accurate than that of surgical pathologists. This study provides insights into applications of deep learning in medicine and the integration of histology and genomic data, and provides methods for dealing with factors such as intratumoral heterogeneity [22]. Similarly, Muneer et al. used AI techniques for glioma grade identification, and their results were excellent, with greater than 90% accuracy [23]. The clinical success of immunotherapy is driving the need to develop new prognostic and predictive assays for patient selection and stratification, which can be achieved by a combination of computational pathology and AI. A recent study critically assessed various computational approaches which can help in the development of a standardized methodology for the assessment of immuno-oncology biomarkers such as PD-L1. The authors discussed how integrated bioinformatics allows the amalgamation of complex morphological phenotypes with AI. They provided an outline of ML and AI tools which can be applied in immuno-oncology, for example pattern recognition in large and complex data sets and deep learning approaches for survival analysis, and they were hopeful that combinations of surgical pathology and computational analysis will improve patient stratification in immuno-oncology. The authors are convinced that future clinical demands will be best met by dedicated research at the interface of surgical pathology and bioinformatics, supported by professional societies and by incorporating data sciences and digital image analysis into the professional education of pathologists [24].
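The time-to-event modeling underlying such prognostic frameworks can be illustrated with a standard Cox proportional hazards fit. The sketch below, using the lifelines library, is a hedged stand-in: the feature columns are invented placeholders for the image-derived and genomic features a deep network might supply, and this is not the actual model of Mobadersany et al.

```python
# Minimal survival-analysis sketch (illustrative only): relating
# hypothetical histology- and genomics-derived features to survival.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "histology_feature": [0.2, 0.8, 0.5, 0.9, 0.1, 0.7, 0.4, 0.6],  # invented
    "genomic_marker":    [1,   0,   1,   0,   1,   0,   0,   1  ],  # invented
    "months_followup":   [60,  14,  38,  9,   72,  21,  45,  30 ],  # time to event/censoring
    "event_observed":    [0,   1,   1,   0,   0,   1,   0,   1  ],  # 1 = death observed
})

cph = CoxPHFitter()
cph.fit(df, duration_col="months_followup", event_col="event_observed")
cph.print_summary()  # hazard ratios indicate each feature's prognostic weight
```

In the deep learning setting described above, the hand-picked columns are replaced by features learned directly from histology images and genomic data, but the time-to-event objective is analogous.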

AI in oncology

In oncology, new AI platforms could in future assist in making therapeutic decisions for cancer patients [25]. In 2016, the results of a double-blind validation study were presented at the San Antonio Breast Cancer Symposium which demonstrated strong concordance between treatment recommendations made by a panel of oncologists and by Watson for Oncology (WFO), an AI platform developed by IBM Corporation (Armonk, NY) in collaboration with Memorial Sloan Kettering Cancer Center. The WFO computing system has the ability to extract and assess large amounts of structured and unstructured data from medical records; it then uses natural language processing and machine learning to present cancer treatment options. In this study, the authors compared the concordance between WFO and the multidisciplinary tumor board of the institution (a group of 12 to 15 oncologists). The degree of concordance was analyzed in 638 patients with breast cancer who had been treated at the hospital, and the time it took each method to issue recommendations was also analyzed. The study showed that 90% of WFO’s recommendations were concordant with those of the tumor board. Nearly 80% of the recommendations were concordant in patients with non-metastatic breast cancer. However, concordance was only 45% in patients with metastatic disease, 68% in patients with triple negative breast cancer, and 35% in patients with HER2/neu negative breast cancer. The authors noted that patients with triple negative breast cancer have fewer treatment options than those with HER2/neu negative breast cancer; more complicated cancers lead to more divergent opinions regarding treatment. The authors found that it took an average of 20 minutes to manually capture and analyze the data and generate recommendations, decreasing to approximately 12 minutes as the oncologists gained more familiarity with the cases. WFO, on the other hand, took only 40 seconds to capture and analyze the data and make recommendations. According to the authors of this study, WFO can provide treatment recommendations not only for patients with breast cancer but also for those with lung and colorectal cancer. The lead author’s institution in India recently adopted the WFO system to support oncologists in making quality, evidence-based treatment decisions for cancer patients. The study cautioned, however, that though AI is a helpful step toward personalized medicine, it can only complement the physician’s work, not replace it, because when dealing with humans, many factors are present which cannot be addressed by a machine: the context and preferences of each patient, the patient-physician relationship, human touch and empathy [26].

Liu et al. in 2018 published a feasibility study using WFO to assess its ability to make treatment recommendations for Chinese patients with lung cancer. In the authors’ own words, “WFO is an outstanding representative AI in the medical field, and it can provide to cancer patients prompt treatment recommendations comparable with ones made by expert oncologists”. WFO is increasingly being used in China. The authors selected all lung cancer patients who were hospitalized and received antitumor treatment for the first time at their hospital, used WFO to make treatment recommendations for these patients, and then compared the recommendations to those made (or treatment regimens administered) by their expert multidisciplinary team. Almost 66% of the recommendations made by WFO were consistent with those of the oncology experts. They concluded that though WFO recommendations were consistent with the experts’ in the majority of cases, they were still inconsistent in a significantly high proportion of cases, and thus WFO could not currently replace oncologists. They asserted that WFO can improve the efficiency of clinical work by assisting doctors, but that it needs to learn the regional and ethnic characteristics of patients to improve its assistive ability [27].

AI in surgical pathology: its potential and promise, difficulties and obstacles, current applications and future directions

More than a decade ago, a number of articles described in detail the steps by which AI could be applied to routine tissue-based diagnosis using ‘virtual slides’, in other words the presentation of microscopic images as a whole in a digital matrix. These steps include measurement of individual image quality and its correction if unsatisfactory, development of a pixel-based diagnostic algorithm, its application to diagnosis and classification, and feedback to order additional information. Examples include virtual immunohistochemical slides, automated image classification, detection of relevant image information, and supervision by pathologists. These early studies hoped that pathologists would no longer be primary “water carriers” but would work as supervisors at a “higher level”, with AI allowing them more time to concentrate on difficult cases for the benefit of their patients. Even in 2009, virtual slides were already in use for teaching and continuing education, and the first attempts to introduce them into routine work had begun. At that time, the implementation of a completely connected AI-supported system was in its infancy [28].

Advances in the quality of whole-slide images have set the stage for the clinical use of digital images in surgical pathology. Along with advances in computer image analysis, this raises the possibility of computer-assisted diagnostics in pathology to improve histopathologic interpretation and clinical care.

Pathology was late to adopt digital imaging and computer-assisted diagnostic technologies, partly because of practical and financial obstacles. It needs to be understood that, unlike in radiology, many practical benefits cannot be achieved with pathology digitization. A surgical pathology workflow that includes digital pathology will not reduce or remove the need to produce and ultimately store glass slides. Instead of any reductions, digital pathology will require additional workflows, personnel, equipment and, importantly, data storage (digital pathology images are estimated to be at least ten times larger than radiology images), all on top of an already financially and operationally stressed health care system. Digital pathology will definitely bring some advantages, especially in areas such as rapid teleconsultation with experts, quality and safety. Thus, proof of definite clinical value will be essential for widespread adoption of digital pathology. Given that digital pathology is likely to be costlier, AI in pathology will need to demonstrate improved efficiency, quality and safety [4, 6].

A major challenge to the deployment of digital pathology was recently addressed. In April 2019, Philips received FDA clearance for a Pathology Solution to be used for primary pathology diagnostics. This device is used for scanning glass pathology slides and for reviewing these slides on computer monitors. This Pathology Solution has already been established as a predicate device that could pave the way for a host of other FDA approved whole-slide scanners for primary diagnostics to become available in the coming years. Thus, the expanding role of AI in health care, the reduced costs of digital data and the availability of usable digital images are now in alignment for digital pathology to succeed [ 6 ].

Telepathology is defined as the practice of routine pathology using telecommunication links to enable the electronic transmission of digital pathology images. It can be used for remotely rendering primary diagnoses, second opinion consultations, quality assurance, education and research. Until recently, the use of telepathology in clinical patient care was limited mostly to large academic institutions. In addition to prohibitive costs, legal and regulatory issues, technological problems and a lack of universal standards, resistance from pathologists has, most importantly, slowed its widespread use. The adoption of telepathology is likely to expand in order to meet the increased demand for subspecialist consultation and, as technology advances, to improve diagnostic accuracy and workflow [29].

There are five main categories of digital (tele)pathology: static, dynamic, robotic, whole slide imaging (WSI) and hybrid methods. Telepathology systems from any of these categories can be used and have been found to provide timely and accurate diagnoses comparable to conventional microscopy. It is important that these systems meet clinical needs and are validated for their intended use. The decision to purchase a particular system will depend on the clinical application, the specific needs and budget of the laboratory, and the personal preferences of the telepathologists involved [30].

In addition, the recent development of tissue clearing technology introduces the possibility of 3D pathology, which allows the 3D context of tissue to be captured and would contribute to increased accuracy of automatic pathological diagnosis by machine learning [31].

Automated analysis of histological slides is now possible through WSI scanners, which can acquire and store slides in the form of digital images. This scanning, combined with deep learning algorithms, allows recognition of lesions through automatic identification of regions of interest previously validated by the pathologist. These computer-aided diagnostic techniques have already been tested in breast pathology and dermatopathology [32].

A 2016 study tested techniques for the preprocessing of free-text breast cancer pathology reports with the aim of facilitating the extraction of information relevant to cancer morphology, grading and staging. These techniques included using freely available software to classify the reports as semi-structured or unstructured based on their general layout, and using an open source language engineering framework to predict which parts of the report text contained information relevant to the cancer. The results showed that it was possible to predict the layout of reports, and that the accuracy of predicting which segments of a report contained certain relevant information was sensitive to the report layout and the type of information sought [33].
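A hedged sketch of this kind of report-layout classification is shown below, using TF-IDF features and a linear classifier in scikit-learn. The example reports, labels and pipeline are invented for illustration and are not the specific software or language engineering framework the cited study used.

```python
# Illustrative sketch only: classifying pathology report text by layout
# (semi-structured vs. unstructured). The reports and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reports = [
    "SPECIMEN: breast, left. DIAGNOSIS: invasive ductal carcinoma. GRADE: 2.",
    "Sections show an infiltrating tumor composed of ducts and nests of cells.",
    "SITE: colon. HISTOLOGY: adenocarcinoma. STAGE: pT3 N1.",
    "The biopsy demonstrates fragments of malignant epithelium with necrosis.",
]
labels = ["semi-structured", "unstructured", "semi-structured", "unstructured"]

# TF-IDF word/bigram features feed a simple linear classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(reports, labels)

print(clf.predict(["DIAGNOSIS: squamous cell carcinoma. MARGINS: clear."]))
```

Predicting the layout first, as the study did, lets downstream extraction rules be tuned to each report style.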

The digital revolution is transforming the practice of diagnostic surgical pathology by integrating image analysis and machine learning into routine work. Thus, a clear need is being felt for a robust, evidence-based framework in which to develop these new tools in a collaborative manner that meets regulatory approval. A number of regulatory steps have been approved and implemented by the FDA in the United States, and by the NCRI Cellular Molecular Pathology (CM-Path) initiative and the British In Vitro Diagnostic Association (BIVDA) in the United Kingdom. These bodies have set out a road map to help academia, industry and clinicians develop new software tools to the point of approved clinical use. A 2019 study compared two commonly used CNNs in surgical pathology, because it is often assumed that the quality and format of the training images, as well as their number, impact the accuracy of diagnosis in different convolutional networks. The authors photographed 30 hematoxylin and eosin (H&E) stained slides containing normal tissue or carcinoma from breast, colon and prostate, generating 3000 partially overlapping images (1000 per tissue type). They found that the two convolutional networks were similar in their accuracy and that large numbers of unique H&E stained slides were not required for training optimal ML models in diagnostic surgical pathology. The authors reinforced the need for an evidence-based approach in which different ML models are compared in order to establish best practices for histopathological ML that ensure an accurate diagnosis [34, 35]. The added value of quantitative AI in pathology includes the confirmation of equivocal findings noted by a pathologist, increased sensitivity of feature detection and improved efficiency.

Role of AI in surgical pathology in poor countries

Many authorities believe there is no denying that quantitative AI is part of the future of pathology. However, the significant costs involved may currently be prohibitive for poor, developing countries, where a focus on telepathology alone as a possible solution will be inadequate. Incorporating AI into telepathology can provide temporary solutions until requisite financing schemes are implemented. AI is especially applicable to surgical pathology because diagnosis depends on pattern recognition, a quality well suited to digital applications that depend on machine learning. AI has the potential to create online data repositories that can be used for the diagnosis of pathological specimens (e.g. breast cancer) worldwide, greatly reducing the burden on human and infrastructural resources. A recent study showed that machine learning algorithms achieved potentially faster and more accurate diagnoses than 11 pathologists in a simulated setting. It is obvious that incorporation of AI into telepathology will demand high resources and entail heavy costs. These issues could be addressed directly if value is clearly demonstrated, leading governments and third-party payers to invest in reimbursement strategies for its use in pathology. The recognition of AI in reimbursement strategies that reward value-based care would provide important incentives to develop and implement validated algorithms. In developed western countries, the private sector is showing interest in investing in AI and other new technologies in healthcare. AI incorporation in pathology should be explored as a bold and creative idea in developing countries to motivate investment by the private sector. The global efforts currently under way towards developing sustainable pathology and laboratory medicine services in low and middle income countries should include AI in pathology, especially surgical pathology [29, 30, 34, 36].

However, other authorities disagree, at least in the context of surgical pathology. Some believe that although AI will play some role in diagnosis in the future, it could actually detract attention from the proven, basic investments necessary to provide access to pathology and laboratory medicine services in low and middle income countries (LMICs). They argue that the use of AI in surgical pathology is still in its infancy and its ability to generate accurate diagnoses is yet to be proven. They believe that AI in telepathology will only be implemented in LMICs if it has first been successfully implemented in high income countries (HICs), where its role in day-to-day patient care currently remains unclear. These authors further argue that although telepathology could be used within integrated, tiered laboratory networks in LMICs, with slides prepared and scanned at lower levels and transferred to higher levels within the network for interpretation and consultation, it relies on access to technology that is neither affordable nor practical in these countries. Most LMICs lack even the capacity to generate the slides needed as a prerequisite for telepathology and do not have access to such integrated networks. Even if it were affordable and possible to develop histopathology services, implement telepathology systems and transmit images taken in LMICs to pathologists in HICs as a temporary solution, there are insufficient numbers of pathologists in HICs to interpret images for large numbers of patients even in their own countries [37].

Will AI replace microscopes and pathologists?

Notwithstanding the many difficulties and obstacles, it is now widely accepted that the use of AI will transform clinical practice over the next decade, and some authors believe that an early impact will likely be the integration of image analysis and ML into routine surgical pathology. With a digital revolution transforming the reporting practice of diagnostic surgical pathology, image analysis software tools have proliferated worldwide. This has ignited a hot debate among pathologists as to whether, with the increasing availability and refinement of image analysis software, surgical pathologists will ultimately be replaced by computer algorithms. In other words, will AI algorithms and computer programs replace pathologists, and what therefore is the future of surgical pathologists? It is already widely accepted that AI algorithms “will be incredibly useful in medical research, diagnosis (and) complex treatment planning”. Currently, there appear to be many hurdles to replacing human microscopists with computer algorithms. On the practical side, as discussed previously, there are significant financial barriers and costs to incorporating slide scanners and computers into the pathology workflow, although presumably hospitals would undertake these steps if it were proved that computer algorithms improve diagnostic accuracy or increase the efficiency of pathologists [5, 38].

An especially important and interesting question is: will computer algorithms surpass humans in diagnostic ability? Some authorities are skeptical and believe that, notwithstanding the success of AI in radiology and cardiology, it is at present difficult to envision how AI can be integrated effectively into routine pathology practice. Pathology departments generate high resolution microscopy images which, unlike those in radiology and cardiology, do not correspond to standardized digital imaging formats and workflows. Images in pathology require a manual process of tissue biopsy, specimen preparation and staining before digitization. Currently, the development of state-of-the-art computer vision algorithms requires millions of training images. Although WSI, which involves scanning the whole tissue on glass slides and digitizing the images, is useful and allows many pathology slides to be analyzed efficiently within a relatively short period of time, the approach nevertheless suffers from complications associated with acceptance, speed and the ability to digitize all types of tissues, as well as issues of data resolution, storage and regulation. More importantly, establishing a whole slide image database of millions of images is currently not practical. Although there are research projects experimenting with digitized pathologic images, there is currently no standardized digital pathologic imaging workflow. Another problem is the size of the data: it is currently impossible to feed whole slide images directly into algorithms, because each one contains about 10 GB of data. However, newer studies have demonstrated a promising approach to circumvent this problem by dividing whole slide images into smaller patches and training an algorithm to classify these patches into different categories. Once this is done, statistical summaries of the patch diagnoses are fed into a machine learning algorithm to classify the entire image into a single diagnosis. Recent studies have shown that algorithms developed in this way are able to distinguish subtypes of non-small cell carcinoma of the lung with an accuracy similar to that of expert pulmonary pathologists. In breast cancer, combining the predictions of human pathologists and algorithms led to an 85% decrease in human error in detecting metastatic breast cancer [5, 39–41].
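The patch-based approach just described can be sketched in a few lines. Everything below is a hedged illustration: the patch size, the stand-in patch classifier and the aggregation rule are assumptions for demonstration, not the pipeline of any cited study.

```python
# Illustrative sketch of patch-based whole-slide image (WSI) classification:
# split a huge slide image into small tiles, classify each tile, then
# aggregate tile predictions into one slide-level diagnosis.
import numpy as np

PATCH = 256  # pixels per tile side (hypothetical choice)

def split_into_patches(slide: np.ndarray) -> list:
    """Tile a (H, W, 3) slide image into non-overlapping PATCH x PATCH tiles."""
    h, w, _ = slide.shape
    return [slide[i:i + PATCH, j:j + PATCH]
            for i in range(0, h - PATCH + 1, PATCH)
            for j in range(0, w - PATCH + 1, PATCH)]

def classify_patch(patch: np.ndarray) -> int:
    """Stand-in for a trained CNN; returns 1 (tumor) or 0 (benign)."""
    return int(patch.mean() < 100)  # placeholder rule, not a real model

def slide_diagnosis(slide: np.ndarray, threshold: float = 0.05) -> str:
    """Aggregate patch predictions into a single slide-level call."""
    preds = [classify_patch(p) for p in split_into_patches(slide)]
    tumor_fraction = float(np.mean(preds))  # statistical summary of patch results
    return "suspicious for malignancy" if tumor_fraction > threshold else "benign"

# e.g. a small synthetic "slide" (real WSIs are orders of magnitude larger)
slide = np.random.randint(0, 255, (1024, 1024, 3), dtype=np.uint8)
print(slide_diagnosis(slide))
```

The design choice is the point: no single model ever sees the 10 GB image; each tile is small enough to process, and the slide-level answer is reconstructed from tile-level statistics.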

In 2017, the FDA approved the first WSI system, which may encourage the pathology community to begin standardizing and using digitization on a larger scale, thereby streamlining the exchange of information. The multidimensional nature of radiology and cardiology images allows them to be viewed on a 2, 3 or 4 dimensional plane, which provides rich information via AI pattern recognition. Pathology images, on the other hand, are digitized as 2 dimensional images of the sample, which do not provide as much information as a 3 or 4 dimensional representation would. Currently, in the absence of a standardized pathologic imaging workflow, the exchange and translation of information to other information systems, physicians and health systems is very difficult. Large centralized archives of digitized pathologic images are not accessible in the way other medical imaging archives are, creating additional obstacles to the successful integration of AI into pathology practice. The development of such large centralized image archives or databases will require immense capital investment, which may not be feasible for smaller hospitals. Thus, even assuming that high speed AI algorithms can be developed to accurately detect and diagnose digitized pathology images, the gain in productivity from automation may be low in the face of the immense financial costs involved. One viable solution to this problem may be pathologic image management in a cloud computing environment, provided adequate security and privacy safeguards are ensured and the speed of image transfer over the internet is sufficient. If these are successful and a standardized digital imaging infrastructure in pathology is established, AI will become a powerful asset to pathologists who seek to better bridge the gap between research and patient care. In the near future, it is likely that technological advances like highly efficient automated whole slide scanner systems, innovative AI platforms, and pathologist-friendly image annotation and analysis systems will become increasingly prominent in the daily professional lives of pathologists [5].

Many authors now predict that computers will become increasingly integrated into the pathology workflow and will be especially useful where they can improve accuracy on questions that are difficult for pathologists. It is predicted that computer programs will be able to count mitotic figures or quantitatively grade immunohistochemical stains more accurately than pathologists, and could identify regions of interest in cytopathology slides, thus reducing the time a pathologist needs to spend on screening. Some authors also predict that, over time, as computers gain more and more discriminatory ability, they will reduce the time it takes pathologists to render diagnoses and in the process reduce the demand for pathologists as microscopists, potentially enabling pathologists to focus their cognitive resources on higher-level diagnostic and consultative tasks, such as integrating molecular, morphologic and clinical information to assist in treatment and clinical management decisions for individual patients; in other words, on personalized care. It is predicted that digital pathology, WSI and AI will be technologies synergistic with human cognition: the question of “human versus computer” is already being refined to “human versus human with computer”. It is believed that AI will enable pathologists to focus more on higher-level cognitive tasks by performing the repetitive, detailed tasks which require accuracy and speed and which humans find mind-numbing and consequently error prone. Pathologic diagnosis is considered a well-thought-out cognitive opinion, benefiting from the pathologists’ training and experience and subject to their biases. It is argued that the professional value of pathologists comes from their ability to give the most appropriate (even if not the most perfect) opinion in the clinical context, and that human pathologists constantly recalibrate their diagnoses based on even small but significant bits of clinical and patient-specific information provided through physician notes, pathology reports, and verbal or written communications with clinicians. Many authors believe that a person working in partnership with an information resource is better than that same person unassisted; in other words, they favor “human versus human with computer” rather than “human versus computer”. They believe that a sunny era of AI assistance in pathology is on the horizon and do not believe in dark clouds of AI competition replacing pathologists. At the same time, they are realistic enough to recognize that eventually even the cognitive lead of human pathologists will narrow as new and better AI products emerge. Currently, the whole framework of AI, digital pathology and WSI depends on financial factors and remains undefined. It may, ironically, depend on human ability to overcome financial, technological and regulatory obstacles [42, 43].

Education of pathologists will be the greatest challenge and will require the longest time. AI methods will need to be integrated into all pathology training programs, and future generations of pathologists will need to be comfortable using digital images and other data in combination with computer algorithms in their daily practice. Optimistically, 5 to 10 years will be required to build such a workforce even in developed countries, and only if the process begins now. Many believe that AI may be just what pathology has been waiting for. While still requiring evaluation within a normal surgical pathology workflow, deep learning has the opportunity to assist pathologists by improving the efficiency of their work, standardizing quality and providing better prognostic information. As with immunohistochemistry and molecular diagnostics, there is little risk of pathologists being replaced. Although their workflow will likely change, their contribution to patient care will continue to be critically important. The diagnostic process is too complicated and diverse to be trusted to hard-wired algorithms alone. It is hoped that AI and human pathologists will be natural cooperators, not natural competitors. Thus, it appears AI will not replace the microscopist or take over pathology anytime soon. The hypothesis that intuition and creativity combined with the raw computing power of AI heralds an age in which well designed and executed AI algorithms will solve complex problems and replace the microscopist is not borne out. The microscope will probably be around for a long time. However, AI will likely play an increasingly important role in diagnostic microscopy. In the words of Granter et al., “winter may be coming but hopefully it will be gentle and mild.” [ 6 , 38 , 42 ].

Current applications of AI in pathology

Let us now examine various examples of the application of AI in practical surgical pathology, especially in cancer, and its impact on patient care.

As far back as 2007, Wild et al. used AI to predict the risk of progression to muscle invasion in non-muscle invasive bladder cancers, integrating urinary bladder cancer arrays with artificial neural networks for this purpose. Although the recurrence rate of non-invasive bladder cancer is high, the majority of tumors are indolent and can be managed by endoscopic means alone. On the other hand, the prognosis of muscle invasive bladder cancer is poor, and radical treatment is required if cure is to be obtained. The authors developed a predictive panel of 11 genes to identify tumor progression. They found that the combination of genes, analyzed using artificial neural networks, was able to significantly stratify the risk of tumor progression, a risk which is very difficult to assess by clinicopathological means [ 44 ].

In lung cancer, computational analysis of histological images using AI is increasingly being applied to improve diagnostic accuracy. Tumor phenotype usually reflects the overall effect of molecular alterations on the behavior of cancer cells and provides a practical visual reading of the aggressiveness of the tumor. However, in some cases, human evaluation of histological images is subjective and lacks reproducibility. AI is now being used in routine clinical practice for the optimization of histological and cytological classification, prognostic prediction and genomic profiling of patients with lung cancer. However, several challenges still need to be addressed for successful utilization of AI in the accurate diagnosis and prognostication of lung cancer [ 45 ].

Histopathological assessment of lung cancer slides is usually accurate in reaching a correct diagnosis; the prediction of prognosis, however, is much less accurate. In a 2016 study, Yu et al. used fully automated microscopic pathology image features to predict the prognosis of non-small cell lung cancer. They obtained 2186 H&E stained whole slide images of lung adenocarcinoma and squamous cell carcinoma patients from The Cancer Genome Atlas (TCGA) and an additional 294 histological images from the Stanford Tissue Array (STA) database. From these images, they extracted 9879 quantitative image features and selected the top features using regularized machine learning methods. They were able to distinguish short term survivors from long term survivors in patients with stage I adenocarcinoma and squamous cell carcinoma and showed that automatically derived image features can predict the prognosis of lung cancer patients accurately. Their findings were statistically significant, and the authors are confident that their methods are extensible to histopathology images of cancers from other organs [ 39 ].
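
To make the idea of “regularized machine learning on quantitative image features” concrete, the following is a minimal sketch, not the authors’ actual pipeline: it assumes a precomputed feature matrix (here filled with random numbers as a stand-in) and uses an L1-penalized logistic regression, whose sparsity performs feature selection, to separate short term from long term survivors.

```python
# Sketch of regularized feature selection for survival classification, in the
# spirit of (but not identical to) Yu et al. Features and labels are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(294, 500))    # hypothetical patients x image features
y = rng.integers(0, 2, size=294)   # 1 = long term survivor (hypothetical label)

# The L1 penalty drives most coefficients to zero, i.e. it selects features.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)
print("CV AUC:", cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())

model.fit(X, y)
coef = model.named_steps["logisticregression"].coef_.ravel()
print("features retained:", int(np.count_nonzero(coef)))
```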

In a 2018 study, Kumar et al. built an automated informatics methodology capable of identifying statistically significant associations between clinical findings of non-small cell lung cancer (NSCLC) recorded in the unstructured text of patient pathology reports and clinically actionable genetic mutations identified by next generation sequencing (NGS) in NSCLC (EGFR, KRAS, BRAF and PIK3CA). Their findings were statistically significant ( p -value < 0.05) and showed associations with mutations in specific genes that were consistent with the published literature. Like Yu et al., Kumar et al. were confident that their approach is extensible to other cancers and provides first steps toward understanding the role of genetic mutations in the development and treatment of different types of cancer [ 40 ].
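
The statistical core of such an analysis is a simple association test between a text-derived finding and a mutation. The sketch below is a hedged illustration, not Kumar et al.’s code; the contingency counts are invented, and only the form of the test (a 2x2 exact test yielding a p-value) reflects the study described above.

```python
# Sketch: does a finding extracted from pathology report text co-occur with an
# EGFR mutation more often than chance? Counts below are hypothetical.
from scipy.stats import fisher_exact

# 2x2 table: rows = finding mentioned / not mentioned,
# columns = EGFR-mutant / EGFR-wild-type.
table = [[30, 10],
         [25, 60]]
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("association is statistically significant at the 0.05 level")
```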

Considerable work utilizing AI has already been done in breast cancer pathology. In breast cancer, the most common malignancy in women worldwide, earlier diagnosis and better adjuvant therapy have substantially improved patient outcome in recent decades. Although pathological diagnosis has proved to be instrumental in guiding breast cancer treatment, new challenges have emerged as increased understanding of breast cancer in the last few years has revealed its complex nature. As patient demand for personalized breast cancer therapy grows, there is an urgent need for more precise biomarker assessment and more accurate histologic diagnosis to make better therapy decisions. The digitization of pathology data has opened the door to faster, more reproducible and more precise diagnosis through computerized image analysis. Software to assist diagnostic breast pathology through image processing techniques has been around for years, but recent breakthroughs in AI promise to fundamentally change the way breast cancer is detected and treated in the near future [ 46 ].

Nodal metastasis of breast cancer, or of any other cancer for that matter, influences therapy decisions. Identification of tumor cells in lymph nodes can be laborious and error prone, especially for small tumor foci. Steiner et al. developed a deep learning algorithm for the detection of breast cancer metastases in lymph nodes. In this study, six pathologists reviewed 70 digitized slides from lymph node sections both unassisted and assisted by the algorithm. In the assisted mode, the deep learning algorithm was used to identify and outline regions with a high likelihood of containing tumor. Algorithm assisted pathologists demonstrated higher accuracy than either the algorithm or the pathologists alone. In particular, algorithm assistance significantly increased the sensitivity of detection for micrometastases (91% vs 83%, p -value = 0.02). Average review time per image was also significantly shorter with assistance than without, for both micrometastases and negative images ( p -value = 0.002 and 0.018 respectively). The pathologists were asked to provide a numeric score for the difficulty of each image classification; on the basis of this score, pathologists considered the review of micrometastases to be significantly easier when interpreted with assistance ( p -value = 0.0005). This study demonstrates the potential of a deep learning algorithm to improve pathologist accuracy and efficiency in a digital pathology workflow [ 47 ].
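
The “assisted mode” described above amounts to turning a patch-level tumor-probability map into outlined regions of interest. The following is a loose sketch of that overlay step, assuming a synthetic probability map and an arbitrary 0.8 threshold; neither the data nor the threshold comes from the study.

```python
# Sketch: outline high-likelihood patches on a probability map for assisted review.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
prob_map = rng.random((64, 64))        # stand-in for per-patch tumor probabilities
prob_map[20:30, 35:50] += 0.6          # synthetic "metastasis" hotspot
prob_map = prob_map.clip(0, 1)

mask = (prob_map > 0.8).astype(float)  # flag high-likelihood patches

fig, ax = plt.subplots()
ax.imshow(prob_map, cmap="gray")
ax.contour(mask, levels=[0.5], colors="red")  # outline flagged regions
ax.set_title("Patches flagged for pathologist review")
plt.savefig("assist_overlay.png")
```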

In a 2017 study, Yala et al. developed a machine learning model to extract pertinent tumor characteristics from breast pathology reports, which enabled them to create a large database. Their system was trained to extract 20 separate categories of information, using two data sets of 6295 and 10,841 manually annotated reports. It is important to note that extracting information manually from electronic medical records is a time-consuming and expensive process. The authors tested the accuracy of their model on 500 reports that did not overlap with the training set. The model achieved an accuracy of 90% for correctly parsing all carcinoma and atypia categories for a given patient, and the average accuracy for individual categories was 97%. Using this classifier, they created a database of 91,505 breast pathology reports from which information was extracted. They also developed a user-friendly interface to the database that allows physicians to easily identify patients with target characteristics. The authors believe that their model has the potential to reduce the effort required for analyzing large amounts of data from medical records and to minimize the cost and time required to extract scientific information from these data [ 48 ].
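
A minimal sketch of this kind of report parser follows. It is not Yala et al.’s system: it handles a single invented category (whether a report mentions invasive carcinoma) with a bag-of-words classifier, and the training texts and labels are made up for illustration.

```python
# Sketch: flag one category from free-text pathology reports.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reports = [
    "invasive ductal carcinoma, grade 2, margins negative",
    "atypical ductal hyperplasia, no carcinoma identified",
    "invasive lobular carcinoma with lymphovascular invasion",
    "benign breast tissue, fibrocystic changes",
]
labels = [1, 0, 1, 0]  # 1 = invasive carcinoma mentioned (hypothetical labels)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(reports, labels)

# Expected output on this toy corpus: [1]
print(clf.predict(["core biopsy: invasive carcinoma, ER positive"]))
```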

In a 2018 study, Bychkov et al. developed and trained a deep learning network to predict the outcome of colorectal cancer based on images of tumor tissue samples. They evaluated a set of digitized H&E stained tumor tissue microarray (TMA) samples from 420 colorectal cancer patients with known clinicopathological and outcome data. Their results showed that deep learning based outcome prediction, with only small tissue areas as input, outperformed visual histological assessment performed by human experts in stratifying patients into low risk and high risk categories. They suggested that state-of-the-art deep learning techniques can extract more prognostic information from the tissue morphology of colorectal cancer than an experienced human observer [ 49 ].

Yoshida et al., in a 2018 study, evaluated the classification accuracy of the newly developed e-Pathologist image analysis software on gastric biopsies. They obtained and stained 3062 consecutive gastric biopsy specimens and digitized the slides. Two experienced gastrointestinal pathologists evaluated each slide for histological diagnosis. The authors compared the three-tier (positive for carcinoma or suspicion of carcinoma; caution for adenoma or suspicion of a neoplastic lesion; or negative for a neoplastic lesion) and two-tier (negative or positive) classification results of the human pathologists with those of the e-Pathologist. The overall concordance rate was 55.6%. For negative specimens, the concordance rate was 90.6%, but for positive biopsy specimens it was less than 50%. For the two-tier classification, sensitivity and specificity were 89.5% and 50.7%, respectively. The authors concluded that although there are limitations to the application of automated histopathological classification of gastric biopsy specimens in the clinical setting, the results show promise for the future [ 50 ].
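
The two-tier figures above are ordinary confusion-matrix arithmetic. The sketch below shows that arithmetic; the cell counts are back-calculated approximations chosen to roughly reproduce the reported sensitivity and specificity over 3062 biopsies, since the study’s actual counts are not given in this review.

```python
# Sketch: sensitivity, specificity and concordance from a 2x2 confusion matrix.
def binary_metrics(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    concordance = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, concordance

# Approximate, illustrative counts (tp+fn+tn+fp = 3062):
sens, spec, conc = binary_metrics(tp=170, fn=20, tn=1456, fp=1416)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} agreement={conc:.1%}")
# -> sensitivity=89.5% specificity=50.7% agreement=53.1%
```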

The shortage of well-annotated pathology image data for training deep neural networks is currently a major issue because of the high costs involved. To overcome this, transfer learning techniques are generally used to reinforce the capacity of deep neural networks. In order to further boost the performance of state-of-the-art deep neural networks and overcome the insufficiency of well annotated data, Qu et al. presented a novel stepwise fine tuning-based deep learning scheme for gastric pathology image classification. Their proposed scheme proved capable of making the deep neural network imitate the pathologist's professional observations and of acquiring pathology-related knowledge in advance, at very limited extra cost in data annotation. They conducted their experiments with both well-annotated gastric pathology data and the proposed target-correlative intermediate data on several state-of-the-art deep neural networks. Their results demonstrated the feasibility and superiority of the proposed scheme for boosting classification performance [ 51 ].
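
To illustrate the general idea of stepwise fine-tuning (in spirit only; this is not Qu et al.’s exact scheme), the sketch below first trains a new classification head on a frozen ImageNet-pretrained backbone, then unfreezes everything and continues at a lower learning rate. The two-class setup and the training loops (elided as comments) are assumptions.

```python
# Sketch: stepwise fine-tuning of a pretrained CNN for pathology image classification.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 2)  # e.g. benign vs malignant (assumed)

def set_backbone_trainable(trainable: bool):
    for name, p in model.named_parameters():
        if not name.startswith("fc."):
            p.requires_grad = trainable

# Step 1: freeze the backbone, train only the new head.
set_backbone_trainable(False)
opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
# ... run a few epochs over the annotated pathology training loader here ...

# Step 2: unfreeze and fine-tune end to end at a lower learning rate.
# (An intermediate step on target-correlative data would slot in between.)
set_backbone_trainable(True)
opt = torch.optim.Adam(model.parameters(), lr=1e-5)
# ... continue training ...
```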

A recent study by Liu et al. evaluated the application and clinical implementation of a state-of-the-art deep-learning-based AI algorithm (Lymph Node Assistant, or LYNA) for the detection of metastatic breast cancer in sentinel lymph node biopsies. They obtained whole slide images of H&E stained lymph nodes from 399 patients (the publicly available CAMELYON 16 challenge data set). LYNA was developed using 270 slides and evaluated on the remaining 129 slides. The findings were compared with 108 slides (86 blocks) from 20 patients obtained from an independent laboratory using a different scanner, to measure reproducibility. LYNA achieved a slide-level area under the receiver operating characteristic curve (AUC) of 99% and a tumor-level sensitivity of 91%. It was not affected by common histological artifacts such as poor staining, air bubbles or overfixation. The AI algorithm exhaustively evaluated every tissue patch on a slide and achieved higher tumor-level sensitivity than, and comparable slide-level performance to, pathologists. This study once again showed that AI algorithms may improve the pathologist's productivity and reduce the number of false negatives associated with morphologic detection of tumor cells. The authors provide a framework to aid practicing pathologists in assessing AI algorithms for adoption into their workflow (akin to how a pathologist assesses immunohistochemistry results) [ 52 ].
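
A slide-level AUC of the kind reported for LYNA is typically computed by aggregating patch-level tumor probabilities into one score per slide. The sketch below illustrates that evaluation pattern with synthetic data and simple max-pooling; it does not reproduce LYNA’s internals.

```python
# Sketch: patch probabilities -> slide score (max-pooling) -> slide-level AUC.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
labels = rng.integers(0, 2, size=129)  # 1 = slide contains metastasis (synthetic)

slide_scores = []
for has_tumor in labels:
    patch_probs = rng.beta(2, 8, size=1000)       # mostly low probabilities
    if has_tumor:
        patch_probs[:5] = rng.beta(8, 2, size=5)  # a few high-scoring patches
    slide_scores.append(patch_probs.max())        # max over patches = slide score

print(f"slide-level AUC: {roc_auc_score(labels, slide_scores):.3f}")
```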

An international contest was recently held to have machines detect sentinel lymph node metastases of breast cancer. It was termed the CAMELYON 16 grand challenge (Cancer Metastases in Lymph Nodes Challenge), and different teams submitted deep learning algorithms for this purpose. The findings of the AI algorithms were compared with those of a panel of 11 pathologists with varying degrees of expertise in breast pathology. Although the top algorithms performed better than the 11 pathologists when the latter were under time constraints, they did not perform differently from the pathologists when the latter had unlimited time. The fact that the algorithms detected nodal micrometastases at the same rate as or better than pathologists was nonetheless exciting, while the fact that pathologists did as well or better than the algorithms without time constraints is equally significant. The CAMELYON 16 challenge highlights a significant opportunity for AI in pathology, namely assisting pathologists with screening for lesions [ 53 ].

In surgery for laryngeal carcinoma, preservation of adjacent healthy tissue is very important; accurate and rapid intraoperative histology of laryngeal tissue is therefore critical to achieving optimal surgical outcomes. Zhang et al. used deep-learning based Stimulated Raman Scattering (SRS) microscopy to provide accurate automated diagnosis of laryngeal squamous cell carcinoma on fresh surgical specimens without fixation, sectioning, staining or processing. The authors demonstrated near perfect concordance between SRS and standard histology evaluated by pathologists, and their deep learning based SRS classified 33 surgical specimens with 100% accuracy. The authors contended that SRS histology integrated with deep learning algorithms has the potential to provide rapid intraoperative diagnosis which could help in the surgical management of laryngeal carcinoma [ 54 ].

Yadav recently showed how AI is optimizing the detection and management of prostate cancer through the integration of machine learning-based identification of Gleason scores from pathology slides with genomics, imaging (especially MRI) and biomarkers [ 55 ].

It is predicted that the laboratory of the future will be more highly automated and dominated by robotics and will be more connected to take advantage of the benefits of AI and the Internet of Things [ 56 ].

AI, the theory and development of computer systems which can perform tasks that normally require human intelligence, is slowly becoming part of everyday modern life. Health care was slow to embrace AI, but the pace of implementation has now picked up. Computer-based decision support systems built on machine learning can perform complex tasks which are currently assigned to specialists, and this has the potential to revolutionize medicine by increasing diagnostic accuracy, improving clinical workflow, decreasing human resource costs and improving therapeutics. Growing interest in AI and machine learning in diverse industries, including health care, is mainly due to the rise of deep learning, a process through which AI recognizes patterns using various forms of neural networks which resemble the human brain and which depend in turn on the availability of big data repositories. The promise of AI in health care is to deliver improved quality and safety of care and to democratize expertise through mobile devices such as smart phones, which can be deployed with algorithms and potentially be accessible universally at low cost anywhere in the world, delivering vital diagnostic care. Health care is ripe for AI because it possesses the rich, large data sets that computers require in order to learn. AI is thus fast becoming a major element of the health care landscape, and AI algorithms will in the coming years play an important role in predicting cancer outcome and assisting in therapeutic decisions for cancer patients. In fields like radiology, which has already embraced digital operations, an AI revolution is already in progress. Deep neural networks will be able to provide a synergistic combination of disciplines such as radiology, nuclear medicine and surgical pathology, which will hopefully allow the achievement of a medical paradigm that recognizes every human being as unique. Although pathology, and surgical pathology in particular, was late to adopt AI, mainly due to practical and financial obstacles, and will require resources for additional workflows, personnel, equipment and data storage, the time is now ripe for AI to succeed in surgical pathology, given the rapid development of new and better AI technology at lower cost (reduced costs of digital data and availability of digital images). The studies cited above demonstrate the increasingly effective role of AI in surgical pathology: by increasing the speed and accuracy of diagnosis and by improving prognostication, the use of AI is translating into better patient care. In the near future, AI will not replace pathologists; rather, by performing routine repetitive tasks quickly and accurately, it will allow pathologists to devote time to more complex cognitive tasks and enable them to play a much greater and more effective role in cancer prognosis and therapeutics. Pathologists therefore need to embrace AI and derive benefit from it by training themselves. This will require considerable time, as AI methods will need to be integrated into pathology training programs and pathologists will need to become comfortable using digital images and data with computer algorithms in their daily practice. With regulatory control also being established by government agencies in countries such as the US and UK, the reliability of AI and the trust of the general public will increase substantially and help in the better care of patients, which is the prime purpose of all efforts to improve medical technology.
In this context, synergistic collaboration between fields such as oncology, radiodiagnostics and surgical pathology will play a major role. Financial barriers will need to be overcome, especially for developing countries, so that they too can benefit from improvements in the application of AI in medicine and pathology.

Acknowledgements

Not applicable.

Abbreviations

AI: Artificial Intelligence
ML: Machine Learning
CNN: Convolutional neural networks
IoT: Internet of Things
SCC: Squamous Cell Carcinoma
ICU: Intensive care unit
OR: Operating room
CAD: Computer aided diagnosis
MRI: Magnetic Resonance Imaging
CT: Computed Tomography
WFO: Watson for Oncology
WSI: Whole slide imaging
FDA: Food and Drug Administration
CM-Path: Cellular Molecular Pathology
BIVDA: British In Vitro Diagnostic Association
H&E: Hematoxylin and Eosin
LMIC: Low and middle income country
HIC: High income country
TCGA: The Cancer Genome Atlas
STA: Stanford Tissue Array
NSCLC: Non-small cell lung cancer
NGS: Next generation sequencing
TMA: Tissue microarray
LYNA: Lymph Node Assistant
SRS: Stimulated Raman Scattering

Authors’ contributions

ZA, SR and MZ did the literature review and drafted the manuscript; JAG handled correspondence, reviewed and edited the drafted manuscript as per journal policy, and submitted the article. All authors participated in the design of the study. All authors read and approved the final manuscript.

Funding

No financial support was provided for this study.

Availability of data and materials

Data and materials of this work are available from the corresponding author on reasonable request.

Declarations

Ethics approval and consent to participate

This is a review article and no procedures were performed.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no conflict of interest.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Zubair Ahmad, Email: [email protected].

Shabina Rahim, Email: [email protected].

Maha Zubair, Email: [email protected].

Jamshid Abdul-Ghafar, Email: [email protected].

References

1. Farnell DA, Huntsman D, Bashashati A. The coming 15 years in gynaecological pathology: digitisation, artificial intelligence, and new technologies. Histopathology. 2020;76(1):171–177. doi:10.1111/his.13991.
2. Goldenberg SL, Nir G, Salcudean SE. A new era: artificial intelligence and machine learning in prostate cancer. Nat Rev Urol. 2019;16(7):391–403. doi:10.1038/s41585-019-0193-3.
3. Bonanno E, Toschi N, Bombonati A, Muto P, Schillaci O. Imaging diagnostic and pathology in the management of oncological patients. Contrast Media Mol Imaging. 2019;2019:2513680. doi:10.1155/2019/2513680.
4. Cohen S, Furie MB. Artificial intelligence and pathobiology join forces in the American Journal of Pathology. Am J Pathol. 2019;189(1):4–5. doi:10.1016/j.ajpath.2018.11.002.
5. Wong STC. Is pathology prepared for the adoption of artificial intelligence? Cancer Cytopathol. 2018;126(6):373–375. doi:10.1002/cncy.21994.
6. Golden JA. Deep learning algorithms for detection of lymph node metastases from breast cancer: helping artificial intelligence be seen. JAMA. 2017;318(22):2184–2186. doi:10.1001/jama.2017.14580.
7. Ash JS, Berg M, Coiera E. Some unintended consequences of information technology in health care: the nature of patient care information system-related errors. J Am Med Inform Assoc. 2004;11(2):104–112. doi:10.1197/jamia.M1471.
8. U.S. Food and Drug Administration. Digital Health Software Precertification (Pre-Cert) Program. 2018. https://www.fda.gov/medical-devices/digital-health-center-excellence/digital-health-software-precertification-pre-cert-program. Accessed 2 Aug 2018.
9. Yu KH, Kohane IS. Framing the challenges of artificial intelligence in medicine. BMJ Qual Saf. 2019;28(3):238–241. doi:10.1136/bmjqs-2018-008551.
10. Nakagawa K, Ishihara R, Aoyama K, Ohmori M, Nakahira H, et al. Classification for invasion depth of esophageal squamous cell carcinoma using a deep neural network compared with experienced endoscopists. Gastrointest Endosc. 2019;90(3):407–414. doi:10.1016/j.gie.2019.04.245.
11. Horie Y, Yoshio T, Aoyama K, Yoshimizu S, Horiuchi Y, et al. Diagnostic outcomes of esophageal cancer by artificial intelligence using convolutional neural networks. Gastrointest Endosc. 2019;89(1):25–32. doi:10.1016/j.gie.2018.07.037.
12. Hirasawa T, Aoyama K, Tanimoto T, Ishihara S, Shichijo S, et al. Application of artificial intelligence using a convolutional neural network for detecting gastric cancer in endoscopic images. Gastric Cancer. 2018;21(4):653–660. doi:10.1007/s10120-018-0793-2.
13. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115–118. doi:10.1038/nature21056.
14. Pavlova O, Gilliet M, Hohl D. Digital pathology for the dermatologist. Rev Med Suisse. 2020;16(688):618–621.
15. Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316(22):2402–2410. doi:10.1001/jama.2016.17216.
16. Dong Y, Xu L, Fan Y, Xiang P, Gao X, et al. A novel surgical predictive model for Chinese Crohn's disease patients. Medicine (Baltimore). 2019;98(46):e17510. doi:10.1097/MD.0000000000017510.
17. Wall J, Krummel T. The digital surgeon: how big data, automation, and artificial intelligence will change surgical practice. J Pediatr Surg. 2020;55S:47–50. doi:10.1016/j.jpedsurg.2019.09.008.
18. Lee JH, Ha EJ, Kim JH. Application of deep learning to the diagnosis of cervical lymph node metastasis from thyroid cancer with CT. Eur Radiol. 2019;29(10):5452–5457. doi:10.1007/s00330-019-06098-8.
19. Liu X, Zhou H, Hu Z, Jin Q, Wang J, Ye B. Clinical application of artificial intelligence recognition technology in the diagnosis of stage T1 lung cancer. Zhongguo Fei Ai Za Zhi. 2019;22(5):319–323. doi:10.3779/j.issn.1009-3419.2019.05.09.
20. Kudo Y. Predicting cancer outcome: artificial intelligence vs. pathologists. Oral Dis. 2019;25(3):643–645. doi:10.1111/odi.12954.
21. Nakamoto T, Takahashi W, Haga A, Takahashi S, Kiryu S, Nawa K, Ohta T, Ozaki S, Nozawa Y, Tanaka S, Mukasa A, Nakagawa K. Prediction of malignant glioma grades using contrast-enhanced T1-weighted and T2-weighted magnetic resonance images based on a radiomic analysis. Sci Rep. 2019;9(1):19411. doi:10.1038/s41598-019-55922-0.
22. Mobadersany P, Yousefi S, Amgad M, Gutman DA, Barnholtz-Sloan JS, et al. Predicting cancer outcomes from histology and genomics using convolutional networks. Proc Natl Acad Sci U S A. 2018;115(13):E2970–E2979. doi:10.1073/pnas.1717139115.
23. Ahammed Muneer KV, Rajendran VR, K PJ. Glioma tumor grade identification using artificial intelligent techniques. J Med Syst. 2019;43(5):113. doi:10.1007/s10916-019-1228-2.
24. Koelzer VH, Sirinukunwattana K, Rittscher J, Mertz KD. Precision immunoprofiling by image analysis and artificial intelligence. Virchows Arch. 2019;474(4):511–522. doi:10.1007/s00428-018-2485-z.
25. Printz C. Artificial intelligence platform for oncology could assist in treatment decisions. Cancer. 2017;123(6):905. doi:10.1002/cncr.30655.
26. Somashekhar SP, Sepúlveda MJ, Puglielli S, Norden AD, et al. Watson for Oncology and breast cancer treatment recommendations: agreement with an expert multidisciplinary tumor board. Ann Oncol. 2018;29(2):418–423. doi:10.1093/annonc/mdx781.
27. Liu C, Liu X, Wu F, Xie M, Feng Y, et al. Using artificial intelligence (Watson for Oncology) for treatment recommendations amongst Chinese patients with lung cancer: feasibility study. J Med Internet Res. 2018;20(9):e11087. doi:10.2196/11087.
28. Kayser K, Gortler J, Bogovac M, Bogovac A, Goldmann T, et al. AI (artificial intelligence) in histopathology – from image analysis to automated diagnosis. Folia Histochem Cytobiol. 2009;47(3):355–361. doi:10.2478/v10042-009-0087-y.
29. Farahani N, Pantanowitz L. Overview of telepathology. Surg Pathol Clin. 2015;8(2):223–231. doi:10.1016/j.path.2015.02.018.
30. Dietz RL, Hartman DJ, Zheng L, Wiley C, Pantanowitz L. Review of the use of telepathology for intraoperative consultation. Expert Rev Med Devices. 2018;15(12):883–890. doi:10.1080/17434440.2018.1549987.
31. Susaki EA. Recent developments in automated diagnosis of pathological images and three-dimensional histopathology. Brain Nerve. 2019;71(7):723–732. doi:10.11477/mf.1416201344.
32. Zemouri R, Devalland C, Valmary-Degano S, Zerhouni N. Neural network: a future in pathology? Ann Pathol. 2019;39(2):119–129. doi:10.1016/j.annpat.2019.01.004.
33. Napolitano G, Marshall A, Hamilton P, Gavin AT. Machine learning classification of surgical pathology reports and chunk recognition for information extraction noise reduction. Artif Intell Med. 2016;70:77–83. doi:10.1016/j.artmed.2016.06.001.
34. Colling R, Pitman H, Oien K, Rajpoot N, Macklin P, et al. Artificial intelligence in digital pathology: a roadmap to routine use in clinical practice. J Pathol. 2019;249(2):143–150. doi:10.1002/path.5310.
35. Jones AD, Graff JP, Darrow M, Borowsky A, Olson KA, et al. Impact of pre-analytical variables on deep learning accuracy in histopathology. Histopathology. 2019;75(1):39–53. doi:10.1111/his.13844.
36. Saravanan C, Schumacher V, Brown D, Dunstan R, Galarneau JR, et al. Meeting report: tissue-based image analysis. Toxicol Pathol. 2017;45(7):983–1003. doi:10.1177/0192623317737468.
37. Nabi J. Artificial intelligence can augment global pathology initiatives. Lancet. 2018;392(10162):2351–2352. doi:10.1016/S0140-6736(18)32209-8.
38. Granter SR, Beck AH, Papke DJ Jr. Straw men, deep learning, and the future of the human microscopist: response to "Artificial intelligence and the pathologist: future frenemies?". Arch Pathol Lab Med. 2017;141(5):624. doi:10.5858/arpa.2017-0023-ED.
39. Yu KH, Zhang C, Berry GJ, Altman RB, Re C, et al. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat Commun. 2016;7:12474. doi:10.1038/ncomms12474.
40. Kumar N, Tafe LJ, Higgins JH, Peterson JD, de Abreu FB, et al. Identifying associations between somatic mutations and clinicopathologic findings in lung cancer pathology reports. Methods Inf Med. 2018;57(1):63–73. doi:10.3414/ME17-01-0039.
41. Ehteshami Bejnordi B, Veta M, Johannes van Diest P, van Ginneken B, Karssemeijer N, et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA. 2017;318(22):2199–2210. doi:10.1001/jama.2017.14585.
42. Granter SR, Beck AH, Papke DJ Jr. AlphaGo, deep learning, and the future of the human microscopist. Arch Pathol Lab Med. 2017;141(5):619–621. doi:10.5858/arpa.2016-0471-ED.
43. Sharma G, Carter A. Artificial intelligence and the pathologist: future frenemies? Arch Pathol Lab Med. 2017;141(5):622–623. doi:10.5858/arpa.2016-0593-ED.
44. Wild PJ, Catto JW, Abbod MF, Linkens DA, Herr A, et al. Artificial intelligence and bladder cancer arrays. Verh Dtsch Ges Pathol. 2007;91:308–319.
45. Heeke S, Delingette H, Fanjat Y, Long-Mira E, Lassalle S, et al. The age of artificial intelligence in lung cancer pathology: between hope, gloom and perspectives. Ann Pathol. 2019;39(2):130–136. doi:10.1016/j.annpat.2019.01.003.
46. Robertson S, Azizpour H, Smith K, Hartman J. Digital image analysis in breast pathology – from image processing techniques to artificial intelligence. Transl Res. 2018;194:19–35. doi:10.1016/j.trsl.2017.10.010.
47. Steiner DF, MacDonald R, Liu Y, Truszkowski P, Hipp JD, et al. Impact of deep learning assistance on the histopathologic review of lymph nodes for metastatic breast cancer. Am J Surg Pathol. 2018;42(12):1636–1646. doi:10.1097/PAS.0000000000001151.
48. Yala A, Barzilay R, Salama L, Griffin M, Sollender G, et al. Using machine learning to parse breast pathology reports. Breast Cancer Res Treat. 2017;161(2):203–211. doi:10.1007/s10549-016-4035-1.
49. Bychkov D, Linder N, Turkki R, Nordling S, Kovanen PE, et al. Deep learning based tissue analysis predicts outcome in colorectal cancer. Sci Rep. 2018;8(1):3395. doi:10.1038/s41598-018-21758-3.
50. Yoshida H, Shimazu T, Kiyuna T, Marugame A, Yamashita Y, et al. Automated histological classification of whole-slide images of gastric biopsy specimens. Gastric Cancer. 2018;21(2):249–257. doi:10.1007/s10120-017-0731-8.
51. Qu J, Hiruta N, Terai K, Nosato H, Murakawa M, et al. Gastric pathology image classification using stepwise fine-tuning for deep neural networks. J Healthc Eng. 2018;2018:8961781. doi:10.1155/2018/8961781.
52. Liu Y, Kohlberger T, Norouzi M, Dahl GE, Smith JL, et al. Artificial intelligence-based breast cancer nodal metastasis detection: insights into the black box for pathologists. Arch Pathol Lab Med. 2019;143(7):859–868. doi:10.5858/arpa.2018-0147-OA.
53. Litjens G, Bandi P, Bejnordi BE, Geessink O, Balkenhol M, et al. 1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset. Gigascience. 2018;7(6). doi:10.1093/gigascience/giy065.
54. Zhang L, Wu Y, Zheng B, Su L, Chen Y, et al. Rapid histology of laryngeal squamous cell carcinoma with deep-learning based stimulated Raman scattering microscopy. Theranostics. 2019;9(9):2541–2554. doi:10.7150/thno.32655.
55. Yadav KK. How AI is optimizing the detection and management of prostate cancer. IEEE Pulse. 2018;9(5):19. doi:10.1109/MPUL.2018.2866354.
56. Kricka LJ. History of disruptions in laboratory medicine: what have we learned from predictions? Clin Chem Lab Med. 2019;57(3):308–311. doi:10.1515/cclm-2018-0518.

  • Research article
  • Open access
  • Published: 10 April 2021

The role of artificial intelligence in healthcare: a structured literature review

  • Silvana Secinaro 1,
  • Davide Calandra 1,
  • Aurelio Secinaro 2,
  • Vivek Muthurangu 3 &
  • Paolo Biancone 1

BMC Medical Informatics and Decision Making volume 21, Article number: 125 (2021)


Abstract

Background/Introduction

Artificial intelligence (AI) in the healthcare sector is receiving attention from researchers and health professionals. Few previous studies have investigated this topic from a multi-disciplinary perspective, including accounting, business and management, decision sciences and health professions.

Methods

The structured literature review with its reliable and replicable research protocol allowed the researchers to extract 288 peer-reviewed papers from Scopus. The authors used qualitative and quantitative variables to analyse authors, journals, keywords, and collaboration networks among researchers. Additionally, the paper benefited from the Bibliometrix R software package.

Results

The investigation showed that the literature in this field is emerging. It focuses on health services management, predictive medicine, patient data and diagnostics, and clinical decision-making. The United States, China, and the United Kingdom contributed the highest number of studies. Keyword analysis revealed that AI can support physicians in making a diagnosis, predicting the spread of diseases and customising treatment paths.

Conclusions

The literature reveals several AI applications for health services and a stream of research that has not fully been covered. For instance, AI projects require skills and data quality awareness for data-intensive analysis and knowledge-based management. Insights can help researchers and health professionals understand and address future research on AI in the healthcare field.

Background/Introduction

Artificial intelligence (AI) generally applies to computational technologies that emulate mechanisms assisted by human intelligence, such as thought, deep learning, adaptation, engagement, and sensory understanding [ 1 , 2 ]. Some devices can execute a role that typically involves human interpretation and decision-making [ 3 , 4 ]. These techniques have an interdisciplinary approach and can be applied to different fields, such as medicine and health. AI has been involved in medicine since as early as the 1950s, when physicians made the first attempts to improve their diagnoses using computer-aided programs [ 5 , 6 ]. Interest and advances in medical AI applications have surged in recent years due to the substantially enhanced computing power of modern computers and the vast amount of digital data available for collection and utilisation [ 7 ]. AI is gradually changing medical practice. There are several AI applications in medicine that can be used in a variety of medical fields, such as clinical, diagnostic, rehabilitative, surgical, and predictive practices. Another critical area of medicine where AI is making an impact is clinical decision-making and disease diagnosis. AI technologies can ingest, analyse, and report large volumes of data across different modalities to detect disease and guide clinical decisions [ 3 , 8 ]. AI applications can deal with the vast amount of data produced in medicine and find new information that would otherwise remain hidden in the mass of medical big data [ 9 , 10 , 11 ]. These technologies can also identify new drugs for health services management and patient care treatments [ 5 , 6 ].

Enthusiasm for the application of AI is visible through a search of the primary research databases. However, as Meskò et al. [ 7 ] find, the technology will potentially reduce care costs and repetitive operations by focusing the medical profession on critical thinking and clinical creativity. As Cho et al. and Doyle et al. [ 8 , 9 ] add, the AI perspective is exciting; however, new studies will be needed to establish the efficacy and applications of AI in the medical field [ 10 ].

Our paper will also concentrate on AI strategies for healthcare from the accounting, business, and management perspectives. The authors used the structured literature review (SLR) method for its reliable and replicable research protocol [ 11 ] and selected bibliometric variables as sources of investigation. Bibliometric usage enables the recognition of the main quantitative variables of the study stream [ 12 ]. This method facilitates the detection of the required details of a particular research subject, including field authors, number of publications, keywords for interaction between variables (policies, properties and governance) and country data [ 13 ]. It also allows the application of the science mapping technique [ 14 ]. Our paper adopted the Bibliometrix R package and the biblioshiny web interface as tools of analysis [ 14 ].

The investigation offers the following insights for future researchers and practitioners:

Bibliometric information on 288 peer-reviewed English papers from the Scopus collection.

Identification of leading journals in this field, such as Journal of Medical Systems, Studies in Health Technology and Informatics, IEEE Journal of Biomedical and Health Informatics, and Decision Support Systems.

Qualitative and quantitative information on authors’ Lotka’s law, h-index, g-index, m-index, keyword, and citation data.

Research on specific countries to assess AI in the delivery and effectiveness of healthcare, quotes, and networks within each region.

A topic dendrogram study that identifies five research clusters: health services management, predictive medicine, patient data, diagnostics, and finally, clinical decision-making.

An in-depth discussion that develops theoretical and practical implications for future studies.

The paper is organised as follows. Section  2 lists the main bibliometric articles in this field. Section  3 elaborates on the methodology. Section  4 presents the findings of the bibliometric analysis. Section  5 discusses the main elements of AI in healthcare based on the study results. Section  6 concludes the article with future implications for research.

Related works and originality

As suggested by Zupic and Čater [ 15 ], a research stream can be evaluated with bibliometric methods that can introduce objectivity and mitigate researcher bias. For this reason, bibliometric methods are attracting increasing interest among researchers as a reliable and impersonal research analytical approach [ 16 , 17 ]. Recently, bibliometrics has been an essential method for analysing and predicting research trends [ 18 ]. Table  1 lists other research that has used a similar approach in the research stream investigated.

The scientific articles reported show substantial differences in keywords and research topics that have been previously studied. The bibliometric analysis of Huang et al. [ 19 ] describes rehabilitative medicine using virtual reality technology. According to the authors, the primary goal of rehabilitation is to enhance and restore functional ability and quality of life for patients with physical impairments or disabilities. In recent years, many healthcare disciplines have been privileged to access various technologies that provide tools for both research and clinical intervention.

Hao et al. [ 20 ] focus on text mining in medical research. As reported, text mining reveals new, previously unknown information by using a computer to automatically extract information from different text resources. Text mining methods can be regarded as an extension of data mining to text data. Text mining is playing an increasingly significant role in processing medical information. Similarly, the studies by dos Santos et al. [ 21 ] focus on applying data mining and machine learning (ML) techniques to public health problems. As stated in this research, public health may be defined as the art and science of preventing diseases, promoting health, and prolonging life. Using data mining and ML techniques, it is possible to discover new information that otherwise would be hidden. These two studies are related to another topic: medical big data. According to Liao et al. [ 22 ], big data is a typical “buzzword” in the business and research community, referring to a great mass of digital data collected from various sources. In the medical field, we can obtain a vast amount of data (i.e., medical big data). Data mining and ML techniques can help deal with this information and provide helpful insights for physicians and patients. More recently, Choudhury et al. [ 23 ] provide a systematic review on the use of ML to improve the care of elderly patients, demonstrating eligible studies primarily in psychological disorders and eye diseases.

Tran et al. [ 2 ] focus on the global evolution of AI research in medicine. Their bibliometric analysis highlights trends and topics related to AI applications and techniques. As stated in Connelly et al.’s [ 24 ] study, robot-assisted surgeries have rapidly increased in recent years. Their bibliometric analysis demonstrates how robotic-assisted surgery has gained acceptance in different medical fields, such as urological, colorectal, cardiothoracic, orthopaedic, maxillofacial and neurosurgery applications. Additionally, the bibliometric analysis of Guo et al. [ 25 ] provides an in-depth study of AI publications through December 2019. The paper focuses on tangible AI health applications, giving researchers an idea of how algorithms can help doctors and nurses. A new stream of research related to AI is also emerging. In this sense, Choudhury and Asan’s [ 26 ] scientific contribution provides a systematic review of the AI literature to identify health risks for patients. They report on 53 studies involving technology for clinical alerts, clinical reports, and drug safety. Considering the considerable interest within this research stream, this analysis differs from the current literature for several reasons. It aims to provide in-depth discussion, considering mainly the business, management, and accounting fields and not dealing only with medical and health profession publications.

Additionally, our analysis aims to provide a bibliometric analysis of variables such as authors, countries, citations and keywords to guide future research perspectives for researchers and practitioners, as similar analyses have done for several publications in other research streams [ 15 , 16 , 27 ]. In doing so, we use a different database, Scopus, that is typically adopted in social sciences fields. Finally, our analysis will propose and discuss a dominant framework of variables in this field, and our analysis will not be limited to AI application descriptions.

Methodology

This paper evaluated AI in healthcare research streams using the SLR method [ 11 ]. As suggested by Massaro et al. [ 11 ], an SLR enables the study of the scientific corpus of a research field, including the scientific rigour, reliability and replicability of operations carried out by researchers. As suggested by many scholars, the methodology allows qualitative and quantitative variables to highlight the best authors, journals and keywords and combine a systematic literature review and bibliometric analysis [ 27 , 28 , 29 , 30 ]. Despite its widespread use in business and management [ 16 , 31 ], the SLR is also used in the health sector based on the same philosophy through which it was originally conceived [ 32 , 33 ]. A methodological analysis of previously published articles reveals that the most frequently used steps are as follows [ 28 , 31 , 34 ]:

defining research questions;

writing the research protocol;

defining the research sample to be analysed;

developing codes for analysis; and

critically analysing, discussing, and identifying a future research agenda.

Considering the above premises, the authors believe that an SLR is the best method because it combines scientific validity, replicability of the research protocol and connection between multiple inputs.

As stated by the methodological paper, the first step is research question identification. For this purpose, we benefit from the analysis of Zupic and Čater [ 15 ], who provide several research questions for future researchers to link the study of authors, journals, keywords and citations. Therefore, RQ1 is “What are the most prominent authors, journal keywords and citations in the field of the research study?” Additionally, as suggested by Haleem et al. [ 35 ], new technologies, including AI, are changing the medical field in unexpected timeframes, requiring studies in multiple areas. Therefore, RQ2 is “How does artificial intelligence relate to healthcare, and what is the focus of the literature?” Then, as discussed by Massaro et al. [ 36 ], RQ3 is “What are the research applications of artificial intelligence for healthcare?”.

The first research question aims to define the qualitative and quantitative variables of the knowledge flow under investigation. The second research question seeks to determine the state of the art and applications of AI in healthcare. Finally, the third research question aims to help researchers identify practical and theoretical implications and future research ideas in this field.

The second fundamental step of the SLR is writing the research protocol [ 11 ]. Table  2 indicates the currently known literature elements, uniquely identifying the research focus, motivations and research strategy adopted and the results providing a link with the following points. Additionally, to strengthen the analysis, our investigation benefits from the PRISMA statement methodological article [ 37 ]. Although the SLR is a validated method for systematic reviews and meta-analyses, we believe that the workflow provided may benefit the replicability of the results [ 37 , 38 , 39 , 40 ]. Figure  1 summarises the researchers’ research steps, indicating that there are no results that can be referred to as a meta-analysis.

Figure 1. PRISMA workflow. Source: Authors’ elaboration on Liberati et al. [ 37 ].

The third step is to specify the search strategy and search database. Our analysis is based on the search string “Artificial Intelligence” OR “AI” AND “Healthcare” with a focus on “Business, Management, and Accounting”, “Decision Sciences”, and “Health professions”. As suggested by [ 11 , 41 ] and motivated by [ 42 ], keywords can be selected through a top-down approach by identifying a large search field and then focusing on particular sub-topics. The paper uses data retrieved from the Scopus database, a multi-disciplinary database which allowed the researchers to identify critical articles for scientific analysis [ 43 ]. Additionally, Scopus was selected based on the limitations noted by Guo et al. [ 25 ], who suggest that “future studies will apply other databases, such as Scopus, to explore more potential papers”. The research focuses on articles and reviews published in peer-reviewed journals for their scientific relevance [ 11 , 16 , 17 , 29 ] and does not include the grey literature, conference proceedings or books/book chapters. Articles written in any language other than English were excluded [ 2 ]. For transparency and replicability, the analysis was conducted on 11 January 2021. Using this research strategy, the authors retrieved 288 articles. To strengthen the study's reliability, we publicly provide the full bibliometric extract in the Zenodo repository [ 44 , 45 ].
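
For readers who work in Python rather than R, the screening step can be sketched on a Scopus CSV export along the following lines. This is a hedged illustration, not the study’s actual workflow (which used the Scopus interface and the Bibliometrix R package); the file name is hypothetical, and the column names are the standard Scopus export headers, assumed here.

```python
# Sketch: replicate the document-type and language filters on a Scopus export.
import pandas as pd

df = pd.read_csv("scopus_export.csv")  # hypothetical export file

df = df[df["Document Type"].isin(["Article", "Review"])]   # peer-reviewed only
df = df[df["Language of Original Document"] == "English"]  # exclude non-English
print(f"records retained: {len(df)}")                      # the study reports 288
```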

The fourth research phase is defining the code framework that initiates the analysis of the variables. The study will identify the following:

descriptive information of the research area;

source analysis [ 16 ];

author and citation analysis [ 28 ];

keywords and network analysis [ 14 ]; and

geographic distribution of the papers [ 14 ].

The final research phase is the article’s discussion and conclusion, where implications and future research trends will be identified.

At the research team level, the information is analysed with the statistical software R-Studio and the Bibliometrix package [ 15 ], which allows scientific analysis of the results obtained through the multi-disciplinary database.

The analysis of bibliometric results starts with a description of the main bibliometric statistics with the aim of answering RQ1, What are the most prominent authors, journal keywords and citations in the field of the research study?, and RQ2, How does artificial intelligence relate to healthcare, and what is the focus of the literature? Therefore, the following elements were thoroughly analysed: (1) type of document; (2) annual scientific production; (3) scientific sources; (4) source growth; (5) number of articles per author; (6) author’s dominance ranking; (7) author’s h-index, g-index, and m-index; (8) author’s productivity; (9) author’s keywords; (10) topic dendrogram; (11) a factorial map of the document with the highest contributions; (12) article citations; (13) country production; (14) country citations; (15) country collaboration map; and (16) country collaboration network.

Main information

Table 3 shows the information on the 288 peer-reviewed articles published between 1992 and January 2021 extracted from the Scopus database. The number of keywords is 946 from 136 sources, and the number of Keywords Plus, referring to keywords that frequently appear in an article's title, was 2329. The analysis period covered 28 years and 1 month of scientific production and showed an annual growth rate of 5.12%. However, the most significant increase in published articles occurred in the past three years (please see Fig. 2). On average, each article was written by three authors (3.56). Finally, the collaboration index (CI), calculated as the total number of authors of multi-authored articles divided by the total number of multi-authored articles, was 3.97 [ 46 ].
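
The collaboration index defined above is straightforward to compute from a per-paper author count. The toy sketch below (with an invented author-count column) shows the arithmetic; it is not the study’s code.

```python
# Sketch: collaboration index = authors of multi-authored papers / number of
# multi-authored papers, per the definition above.
import pandas as pd

df = pd.DataFrame({"n_authors": [1, 3, 4, 2, 5, 1, 6]})  # toy data
multi = df[df["n_authors"] > 1]
ci = multi["n_authors"].sum() / len(multi)
print(f"collaboration index = {ci:.2f}")  # the study reports CI = 3.97
```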

Figure 2. Annual scientific production. Source: Authors’ elaboration.

Table 4 shows the top 20 sources related to the topic. The Journal of Medical Systems is the most relevant source, with twenty-one published articles. This journal's main issues are the foundations, functionality, interfaces, implementation, impacts, and evaluation of medical technologies. Another relevant source is Studies in Health Technology and Informatics, with eleven articles. This journal aims to extend scientific knowledge related to biomedical technologies and medical informatics research. Both journals deal with cloud computing, machine learning, and AI as a disruptive healthcare paradigm based on recent publications. The IEEE Journal of Biomedical and Health Informatics investigates technologies in health care, life sciences, and biomedicine applications from a broad perspective. The next journal, Decision Support Systems, aims to analyse how these technologies support decision-making from a multi-disciplinary view, considering business and management. Therefore, the analysis of the journals revealed that we are dealing with an interdisciplinary research field. This conclusion is confirmed, for example, by the presence of purely medical journals, journals dedicated to the technological growth of healthcare, and journals with a long-term perspective such as Futures.

The distribution frequency of the articles (Fig. 3) indicates the journals dealing with the topic and related issues. Between 2008 and 2012, a significant growth in the number of publications on the subject is noticeable. The graph shows the results of a Loess regression, which takes the quantity and publication time of the journal under analysis as variables. This method allows the fitted function to assume an unbounded distribution; that is, the curve can take values below zero if the data are close to zero. It contributes to a better visual result and highlights discontinuities in the publication periods [ 47 ].

Figure 3. Source growth. Source: Authors’ elaboration.

Finally, Fig.  4 provides an analytical perspective on factor analysis for the most cited papers. As indicated in the literature [ 48 , 49 ], using factor analysis to discover the most cited papers allows for a better understanding of the scientific world’s intellectual structure. For example, our research makes it possible to consider certain publications that effectively analyse subject specialisation. For instance, Santosh’s [ 50 ] article addresses the new paradigm of AI with ML algorithms for data analysis and decision support in the COVID-19 period, setting a benchmark in terms of citations by researchers. Moving on to the application, an article by Shickel et al. [ 51 ] begins with the belief that the healthcare world currently has much health and administrative data. In this context, AI and deep learning will support medical and administrative staff in extracting data, predicting outcomes, and learning medical representations. Finally, in the same line of research, Baig et al. [ 52 ], with a focus on wearable patient monitoring systems (WPMs), conclude that AI and deep learning may be landmarks for continuous patient monitoring and support for healthcare delivery.

Fig. 4 Factorial map of the most cited documents.

This section identifies the most cited authors of articles on AI in healthcare. It also reports the authors' keywords, dominance factor (DF) ranking, h-index, productivity, and total number of citations. Table  5 identifies the authors and their publications in the top 20 rankings. As the table shows, Bushko R.G. has the highest number of publications: four papers. He is the editor-in-chief of Future of Health Technology, a scientific journal that aims to develop a clear vision of the future of health technology. Several authors then follow with three papers each. For instance, Liu C. is a researcher active in ML and computer vision, and Sharma A., from Emory University in Atlanta, USA, is a researcher with a clear focus on imaging and translational informatics. Some other authors have two publications each. While some authors have published as primary authors, most have published as co-authors. Hence, in the next section, we measure the contributory power of each author by investigating the DF ranking.

Authors’ dominance ranking

The dominance factor (DF) is a ratio measuring the fraction of multi-authored articles in which an author acts as the first author [ 53 ]. Several bibliometric studies use the DF in their analyses [ 46 , 54 ]. The DF ranking quantifies an author's dominance in producing articles. The DF is calculated by dividing the number of an author's multi-authored papers as first author (Nmf) by the author's total number of multi-authored papers (Nmt). Single-authored articles are omitted because their value is constant at 1. This formulation could distort the results in fields where authors are listed in alphabetical order by surname [ 55 ].

The mathematical equation for the DF is shown as:

$$DF = \frac{N_{mf}}{N_{mt}}$$
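The computation is straightforward; the minimal Python sketch below illustrates it on a hypothetical paper list (the co-author names are invented, and only the two most dominant authors named below are reused for the example):

```python
# Hedged sketch of the dominance factor DF = Nmf / Nmt.
# Each entry is (first_author, list_of_all_authors); all entries are invented.
papers = [
    ("Fox J.", ["Fox J.", "Author A."]),
    ("Fox J.", ["Fox J.", "Author B.", "Author C."]),
    ("Longoni C.", ["Longoni C.", "Author D."]),
    ("Author D.", ["Author D.", "Longoni C."]),
]

def dominance_factor(author, papers):
    # Keep only multi-authored papers in which the author appears (Nmt).
    multi = [(first, auths) for first, auths in papers
             if author in auths and len(auths) > 1]
    if not multi:
        return None  # single-authored-only records are omitted (DF would be 1)
    nmf = sum(1 for first, _ in multi if first == author)  # first-authored count
    return nmf / len(multi)

print(dominance_factor("Fox J.", papers))      # 1.0
print(dominance_factor("Longoni C.", papers))  # 0.5
```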

Table  6 lists the top 20 DF rankings. The data in the table show a low number of articles per author, whether first-authored or multi-authored. The results demonstrate that we are dealing with an emerging topic in the literature. Additionally, as shown in the table, Fox J. and Longoni C. are the most dominant authors in the field.

Authors’ impact

Table  7 shows the impact of authors in terms of the h-index [ 56 ] (i.e., the productivity and citation impact of a researcher), the g-index [ 57 ] (i.e., the distribution of citations received by a researcher's publications), the m-index [ 58 ] (i.e., the h-index value per year), total citations, total papers and years of scientific publication. The h-index was introduced in the literature as a metric for the objective comparison of scientific output, depending on the number of publications and their impact [ 59 ]. The results show that the 20 most relevant authors have an h-index between 1 and 2. For practical interpretation of the data, the authors considered figures published by the London School of Economics [ 60 ], which report values around 7.6 for economics publications by professors and researchers who have been active for several years. The much lower values found here therefore reflect the youth of the research area, which is attracting young researchers and professors. At the same time, new indicators have emerged over the years to complement the logic of the h-index. For example, the g-index captures an author's citation impact while recognising that a few articles can generate most of the citations; the m-index shows the h-index normalised by the length of the publication career.

Considering also the total number of citations, the number of papers published, and each author's first year of publication, the analysis confirms that this is an expanding research stream.
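For concreteness, the following minimal Python sketch (with an invented citation list rather than data from this study) implements the three metrics exactly as defined above:

```python
# Hedged reference implementations of the h-, g- and m-index.

def h_index(citations):
    # Largest h such that at least h papers have >= h citations each.
    cits = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(cits, start=1) if c >= rank)

def g_index(citations):
    # Largest g such that the g most cited papers together have >= g**2 citations.
    cits = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(cits, start=1):
        total += c
        if total >= rank ** 2:
            g = rank
    return g

def m_index(citations, years_active):
    # The h-index normalised by the length of the publishing career.
    return h_index(citations) / years_active

cites = [10, 8, 5, 4, 3, 0]          # invented citation counts
print(h_index(cites))                 # 4
print(g_index(cites))                 # 5
print(round(m_index(cites, 5), 2))    # 0.8
```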

Authors’ productivity

Figure  5 shows Lotka's law. This mathematical formulation originated in 1926 to describe the publication frequency of authors in a specific research field [ 61 ]. In practice, the law states that the number of authors making n contributions in a given period is approximately 1/n² of the number of authors making a single contribution [ 14 , 61 ].

Fig. 5 Lotka's law.

The mathematical relationship is expressed as the following inverse power law:

$$y_x = \frac{C}{x^n}$$

where y_x is the number of authors producing x articles in a given research field, and C and n are constants that can be estimated from the data (in Lotka's original formulation, n ≈ 2).
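As a hedged illustration, C and n can be estimated by ordinary least squares on the log-transformed law, log y_x = log C − n log x; the author-productivity counts below are invented for demonstration.

```python
# Minimal sketch: fitting Lotka's law y_x = C / x**n on a log-log scale.
import numpy as np

x = np.array([1, 2, 3, 4, 5])          # number of articles per author
y = np.array([700, 190, 80, 45, 28])   # number of authors with x articles (invented)

slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
n, C = -slope, np.exp(intercept)
print(f"n = {n:.2f}, C = {C:.1f}")     # Lotka's original law corresponds to n close to 2
```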

The results in the figure are consistent with Lotka's law, with most authors contributing only one or two publications in the field. In addition, the figure shows the percentage of authors: approximately 70% of the authors had published only one research article, and only about 20% had published two. This analysis, too, leads us to state that we are dealing with a young and growing research field.

Authors’ keywords

This section provides information on the relationship between the keywords artificial intelligence and healthcare . This analysis is essential to determine the research trend, identify gaps in the discussion on AI in healthcare, and identify the fields that can be interesting as research areas [ 42 , 62 ].

Table  8 highlights the total number of occurrences of the top 20 authors' keywords. The ranking is led by the keywords healthcare, artificial intelligence, and clinical decision support system. Keyword analysis confirms the scientific area of reference. In particular, we adopt the definition: “Artificial intelligence is the theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages” [ 2 , 63 ]. Panch et al. [ 4 ] find that these technologies can be used in different business and management areas. After the first keyword, the analysis reveals AI applications and related research such as machine learning and deep learning.

Additionally, data mining and big data are a step forward in implementing promising AI applications. Applied to healthcare, these technologies yield applications that help and support doctors and medical researchers in decision-making. This link between AI and decision-making explains why the keyword clinical decision support system appears in seventh position. AI techniques can unlock clinically relevant information hidden in the massive amount of data, which can assist clinical decision-making [ 64 ]. The subsequent keywords reveal further elements related to decision-making and support systems.

The TreeMap below (Fig.  6 ) highlights the combination of possible keywords representing AI and healthcare.

Fig. 6 Keywords treemap.

The topic dendrogram in Fig.  7 represents the hierarchical order and the relationship between the keywords generated by hierarchical clustering [ 42 ]. The cut in the figure and the vertical lines facilitate an investigation and interpretation of the different clusters. As stated by Andrews [ 48 ], the figure is not intended to find the perfect level of associations between clusters. However, it aims to estimate the approximate number of clusters to facilitate further discussion.

Fig. 7 Topic dendrogram.
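As a hedged sketch of how such a dendrogram can be produced, the snippet below applies SciPy's hierarchical clustering to a small, invented keyword co-occurrence matrix; the keywords, counts, and the similarity-to-distance conversion are illustrative assumptions, not the study's procedure.

```python
# Minimal sketch: a topic dendrogram from a keyword co-occurrence matrix.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

keywords = ["machine learning", "deep learning", "decision support",
            "information system", "internet of things"]
cooc = np.array([[0, 9, 4, 2, 1],      # invented symmetric co-occurrence counts
                 [9, 0, 3, 1, 1],
                 [4, 3, 0, 7, 2],
                 [2, 1, 7, 0, 3],
                 [1, 1, 2, 3, 0]])

# Turn similarity (co-occurrence) into a distance matrix with a zero diagonal.
dist = cooc.max() - cooc
np.fill_diagonal(dist, 0)

tree = linkage(squareform(dist), method="average")  # agglomerative clustering
dendrogram(tree, labels=keywords)
plt.tight_layout()
plt.show()
```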

The research stream of AI in healthcare is divided into two main strands. The blue strand focuses on medical information systems and the internet. Some of these papers relate to technologies adopted by healthcare organisations, such as the Internet of Things, showing that healthcare organisations use AI to support health services management and data analysis. AI applications are also used to improve diagnostic and therapeutic accuracy and the overall clinical treatment process [ 2 ]. The second block, the red one, contains three clusters highlighting separate aspects of the topic. The first concerns AI and ML predictive algorithms: through AI applications, a predictive approach can ensure that patients are better monitored, and it gives doctors and medical researchers a better understanding of risk. In the second cluster, the most frequent words are decisions, information system, and support system, meaning that AI applications can support doctors and medical researchers in decision-making; information coming from AI technologies can be used to tackle difficult problems and to enable a more straightforward and rapid decision-making process. In the third cluster, the key point is that ML models can deal with vast amounts of data and return outcomes that optimise the work of healthcare organisations and the scheduling of medical activities.

Furthermore, the word cloud in Fig.  8 highlights aspects of AI in healthcare, such as decision support systems, decision-making, health services management, learning systems, ML techniques and diseases. The figure depicts how AI is linked to healthcare and how it is used in medicine.

Fig. 8 Word cloud.

Figure  9 represents the trends over time of the keywords analysed. The trend starts in 2012 with research topics related to clinical decision support systems, a topic that recurred in the following years. Interestingly, in 2018, studies investigated AI and natural language processing as possible tools to manage patients and administrative tasks. Finally, a new research stream considers AI's role in fighting COVID-19 [ 65 , 66 ].

Fig. 9 Keywords frequency.

Table  9 reports the number of citations received by the top 20 most cited articles. The analysis allows the benchmark studies in the field to be identified [ 48 ]. For instance, Burke et al. [ 67 ] wrote the most cited paper, which critically evaluates nurse rostering methodologies, including tangible interdisciplinary solutions that draw on AI. Immediately thereafter, Ahmed and Alkhamis's article proposes a data-driven optimisation methodology to determine the number of healthcare staff that maximises patient throughput [ 68 ]. Finally, the third most cited article lays the groundwork for developing deep learning on diverse health and administrative information [ 51 ].

This section analyses the diffusion of AI in healthcare around the world. It highlights countries to show the geographies of this research. It includes all published articles, the total number of citations, and the collaboration network. The following sub-sections start with an analysis of the total number of published articles.

Country total articles

Figure  10 and Table  10 display the countries where AI in healthcare has been studied. The USA tops the list of countries with the maximum number of articles on the topic (215), followed by China (83), the UK (54), Australia (54), India (51), and Canada (32). It is immediately evident that the theme has developed on different continents, highlighting a growing interest in AI in healthcare. The figure also shows that many areas, such as Russia, Eastern Europe and Africa (except for Algeria, Egypt, and Morocco), have not yet engaged in this scientific debate.

Country publications and collaboration map

This section discusses articles on AI in healthcare in terms of single or multiple publications in each country. It also examines collaboration and networking between countries. Table  11 highlights the average citations by country and shows that the UK, the USA, and Kuwait have a higher average number of citations than other countries, while Italy, Spain and New Zealand have the largest total numbers of citations.

Fig. 10 Articles per country.

Figure  11 depicts global collaborations. The blue colour on the map represents research cooperation among nations. Additionally, the pink border linking states indicates the extent of collaboration between authors. The primary cooperation between nations is between the USA and China, with two collaborative articles. Other collaborations among nations are limited to a few papers.

Fig. 11 Collaboration map.

Artificial intelligence for healthcare: applications

This section aims to strengthen the research scope by answering RQ3: What are the research applications of artificial intelligence for healthcare?

Building on the topic dendrogram, we propose a development model based on four relevant variables [ 69 , 70 ]. AI has been a disruptive innovation in healthcare [ 4 ]. With its sophisticated algorithms and numerous applications, AI has assisted doctors and medical professionals in the domains of health information systems, geocoding health data, epidemic and syndromic surveillance, predictive modelling and decision support, and medical imaging [ 2 , 9 , 10 , 64 ]. The bibliometric analysis identified four dominant macro-variables in the field, drawn from the authors' keywords. The following sub-sections therefore discuss the debate on healthcare applications of AI techniques. These elements are shown in Fig.  12 .

Fig. 12 Dominant variables for AI in healthcare.

Health services management

One of the notable aspects of AI techniques is potential support for comprehensive health services management. These applications can support doctors, nurses and administrators in their work. For instance, an AI system can provide health professionals with constant, possibly real-time medical information updates from various sources, including journals, textbooks, and clinical practices [ 2 , 10 ]. These applications' strength is becoming even more critical in the COVID-19 period, during which information exchange is continually needed to properly manage the pandemic worldwide [ 71 ]. Other applications involve coordinating information tools for patients and enabling appropriate inferences for health risk alerts and health outcome prediction [ 72 ]. AI applications allow, for example, hospitals and all health services to work more efficiently for the following reasons:

  • Clinicians can access data immediately when they need it.
  • Nurses can ensure better patient safety while administering medication.
  • Patients can stay informed and engaged in their care by communicating with their medical teams during hospital stays.

Additionally, AI can contribute to optimising logistics processes, for instance, delivering drugs and equipment through a just-in-time supply system based on predictive algorithms [ 73 , 74 ]. Interesting applications can also support the training of personnel working in health services; this could help bridge the gap between urban and rural health services [ 75 ]. Finally, health services management could benefit from AI to leverage the multiplicity of data in electronic health records by predicting data heterogeneity across hospitals and outpatient clinics, checking for outliers, performing clinical tests on the data, unifying patient representation, improving future models that can predict diagnostic tests and analyses, and creating transparency with benchmark data for analysing services delivered [ 51 , 76 ].

Predictive medicine

Another relevant topic is AI applications for disease prediction and diagnosis, treatment and outcome prediction, and prognosis evaluation [ 72 , 77 ]. Because AI can identify meaningful relationships in raw data, it can support diagnostic, treatment and prediction tasks in many medical situations [ 64 ]. It allows medical professionals to embrace the proactive management of disease onset, and predictions can identify risk factors and drivers for each patient, helping to target healthcare interventions for better outcomes [ 3 ]. AI techniques can also help design and develop new drugs, monitor patients and personalise patient treatment plans [ 78 ]. Doctors benefit from having more time and concise data to make better patient decisions. Automatic learning through AI could disrupt medicine, allowing prediction models to be created for drugs and tests that monitor patients over their whole lives [ 79 ].
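As a purely illustrative, hedged sketch of this predictive workflow, the following Python snippet trains a logistic-regression risk model on synthetic "patient" data; the features, labels, and model choice are assumptions for demonstration, not a clinical recipe.

```python
# Minimal sketch: a disease-onset risk model trained on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))      # e.g., standardised age, BMI, blood pressure (invented)
risk = 1 / (1 + np.exp(-(1.2 * X[:, 0] + 0.8 * X[:, 2])))
y = rng.random(500) < risk         # synthetic onset labels drawn from the risk

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"Held-out AUC: {auc:.2f}")
```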

Clinical decision-making

One of the main topics of the keyword analysis is that AI applications could support doctors and medical researchers in the clinical decision-making process. According to Jiang et al. [ 64 ], AI can help physicians make better clinical decisions or even replace human judgement in certain healthcare-specific functional areas. According to Bennett and Hauser [ 80 ], algorithms can benefit clinical decisions by accelerating the process and the amount of care provided, positively impacting the cost of health services. Therefore, AI technologies can support medical professionals in their activities and simplify their jobs [ 4 ]. Finally, as Redondo and Sandoval [ 81 ] find, algorithmic platforms can provide virtual assistants that understand the semantics of language and learn to resolve business process queries as a human being would.

Patient data and diagnostics

Another challenging topic related to AI applications is patient data and diagnostics. AI techniques can help medical researchers deal with the vast amount of data from patients (i.e., medical big data). AI systems can manage data generated from clinical activities, such as screening, diagnosis, and treatment assignment. In this way, health personnel can learn from similar cases and from associations between patient features and outcomes of interest [ 64 ].

These technologies can analyse raw data and provide helpful insights that can be used in patient treatments. They can help doctors in the diagnostic process; for example, a high-speed body scan can provide an overall image of the patient's condition, from which AI technology can reconstruct a 3D map of the patient's body.

In terms of data, interesting research perspectives are emerging. For instance, we observed the emergence of a stream of research on patient data management and protection related to AI applications [ 82 ].

For diagnostics, AI techniques can make a difference in rehabilitation therapy and surgery. Numerous robots have been designed to support and manage such tasks. Rehabilitation robots physically support and guide, for example, a patient's limb during motor therapy [ 83 ]. For surgery, AI has a vast opportunity to transform surgical robotics through devices that can perform semi-automated surgical tasks with increasing efficiency. The final aim of this technology is to automate procedures so as to eliminate human error while maintaining a high level of accuracy and precision [ 84 ]. Finally, the COVID-19 period has led to increased remote patient diagnostics through telemedicine, which enables remote observation of patients and provides physicians and nurses with support tools [ 66 , 85 , 86 ].

Discussion

This study aims to provide a bibliometric analysis of publications on AI in healthcare, focusing on accounting, business and management, decision sciences and health profession studies. Using the SLR method of Massaro et al. [ 11 ], we provide a reliable and replicable research protocol for future studies in this field. Additionally, we investigate the trend of scientific publications on the subject, unexplored information, future directions, and implications using the science mapping workflow. Our analysis provides several interesting insights.

In terms of bibliometric variables, the four leading journals, Journal of Medical Systems, Studies in Health Technology and Informatics, IEEE Journal of Biomedical and Health Informatics, and Decision Support Systems, are optimal outlets for scientific articles on this topic. These journals deal mainly with healthcare, medical information systems, and applications such as cloud computing, machine learning, and AI. Additionally, in terms of h-index, Bushko R.G. and Liu C. are the most productive and impactful authors in this research stream. Burke et al.'s [ 67 ] contribution is the most cited, with an analysis of nurse rostering using new technologies such as AI. Finally, keyword co-occurrence reveals some interesting insights: AI plays a role in diagnostic accuracy and supports the analysis of health data by comparing thousands of medical records, enabling automatic learning with clinical alerts, managing health services and places of care efficiently, and making it possible to reconstruct patient histories from these data.

Second, this paper identifies five clusters of healthcare applications: health services management, predictive medicine, patient data, diagnostics, and, finally, clinical decision-making. These technologies can also contribute to optimising logistics processes in health services and allowing a better allocation of resources.

Third, in analysing the research findings and the issues under discussion, the authors strongly support AI's role in decision support. These applications, however, require a direct link to data quality management and to the technology awareness of health personnel [ 87 ].

The importance of data quality for the decision-making process

Several authors have analysed AI in the healthcare research stream; here, however, the focus is on the literature that includes business and decision-making processes. In this regard, the analysis of the research flow reveals a twofold view of the literature. On the one hand, some contributions belong to the positivist literature and embrace future applications and implications of technology for health service management, data analysis and diagnostics [ 6 , 80 , 88 ]. On the other hand, some investigations also aim to understand the darker sides of technology and its impact. For example, as Carter [ 89 ] states, the impact of AI is multi-sectoral; its development, however, calls for action to protect personal data. Similarly, Davenport and Kalakota [ 77 ] focus on the ethical implications of using AI in healthcare. According to the authors, intelligent machines raise issues of accountability, transparency, and permission, especially in automated communication with patients. Our analysis does not indicate a clearly dominant strand of the literature; therefore, we argue that discussing elements such as the transparency of technology for patients is essential for the development of AI applications.

A large part of our results shows that, at the application level, AI can be used to improve medical support for patients (Fig.  12) [ 64 , 82 ]. However, we believe that, as Kalis et al. [ 90 ] indicate in the Harvard Business Review, the management of costly back-office problems should also be addressed.

The potential of algorithms includes data analysis. An immense quantity of data is now accessible, which offers the possibility of providing information about a wide variety of medical and healthcare activities [ 91 ]. With the advent of modern computational methods, machine learning and AI techniques, there are numerous possibilities [ 79 , 83 , 84 ]. For example, AI makes it easier to turn data into concrete and actionable observations to improve decision-making, deliver high-quality patient treatment, adapt to real-time emergencies, and save more lives on the clinical front. In addition, AI makes it easier to leverage capital to develop systems and facilities and reduce expenses at the organisational level [ 78 ]. Studying contributions to the topic, we noticed that data accuracy was included in the debate, indicating that a high standard of data will benefit decision-making practitioners [ 38 , 77 ]. AI techniques are an essential instrument for studying data and extracting medical insight, and they may assist medical researchers in their practices. Using computational tools, healthcare stakeholders may leverage the power of data not only to evaluate past data (descriptive analytics) but also to forecast potential outcomes (predictive analytics) and to define the best actions for the present scenario (prescriptive analytics) [ 78 ]. The current abundance of evidence makes it easier to provide a broad view of patient health; doctors should have access to the correct details at the right time and place to provide the proper treatment [ 92 ].

Will medical technology de-skill doctors?

Further reflection concerns the skills of doctors. Studies have shown that healthcare personnel are progressively being exposed to technology for different purposes, such as collecting patient records or diagnosis [ 71 ]. This is demonstrated by the keywords (Fig.  6) that focus on technology and its role in decision-making with new innovative tools. The discussion is expanded by Lu [ 93 ], who indicates that excessive use of technology could hinder the development of doctors' skills and clinical procedures. Among the main issues arising from the literature is the possible de-skilling of healthcare staff due to reduced autonomy in decision-making concerning patients [ 94 ]. Therefore, the challenges and discussion highlighted in Fig.  12 are expanded by also considering the ethical implications of technology and the role of skills.

Implications

Our analysis also has multiple theoretical and practical implications.

In terms of theoretical contribution, this paper extends the previous results of Connelly et al., dos Santos et al., Hao et al., Huang et al., Liao et al. and Tran et al. [ 2 , 19 , 20 , 21 , 22 , 24 ] by considering AI in terms of clinical decision-making and data management quality.

In terms of practical implications, this paper aims to create a fruitful discussion with healthcare professionals and administrative staff on how AI can be at their service to increase work quality. Furthermore, this investigation offers a broad comprehension of bibliometric variables of AI techniques in healthcare. It can contribute to advancing scientific research in this field.

Limitations

Like any other, our study has some limitations that could be addressed by more in-depth future studies. For example, using only one research database, such as Scopus, could be limiting. Further analysis could also investigate the PubMed, IEEE, and Web of Science databases, individually and jointly, especially their health-related sections. The search terms "Artificial Intelligence" OR "AI" AND "Healthcare" may also be too general and exclude interesting studies. Moreover, although we analysed 288 peer-reviewed scientific papers, because the research topic is new, the analysis of conference papers could return interesting results for future researchers. Additionally, as this is a young research area, the analysis will be subject to recurrent obsolescence as new research investigations are published. Finally, although bibliometric analysis limits the subjectivity of the analysis [ 15 ], the verification of recurring themes could lead to different results by indicating areas of significant interest not listed here.

Future research avenues

Concerning future research perspectives, researchers believe that an analysis of the overall amount that a healthcare organisation should pay for AI technologies could be helpful. If these technologies are essential for health services management and patient treatment, governments should invest and contribute to healthcare organisations' modernisation. New investment funds could be made available in the healthcare world, as in the European case with the Next Generation EU programme or national investment programmes [ 95 ]. Additionally, this should happen especially in the poorest countries around the world, where there is a lack of infrastructure and services related to health and medicine [ 96 ]. On the other hand, it might be interesting to evaluate additional profits generated by healthcare organisations with AI technologies compared to those that do not use such technologies.

Further analysis could also identify why some parts of the world have not conducted studies in this area. It would be helpful to carry out a comparative analysis between countries active in this research field and countries that are not currently involved. It would make it possible to identify variables affecting AI technologies' presence or absence in healthcare organisations. The results of collaboration between countries also present future researchers with the challenge of greater exchanges between researchers and professionals. Therefore, further research could investigate the difference in vision between professionals and academics.

In the accounting, business, and management research area, there is currently a lack of quantitative analysis of the costs and profits generated by healthcare organisations that use AI technologies. Therefore, research in this direction could further increase our understanding of the topic and of the number of healthcare organisations that can access AI-based technologies. Finally, as suggested in the discussion section, more interdisciplinary studies are needed to strengthen the links between AI and data quality management and to address AI ethics considerations in healthcare.

Conclusions

In pursuing the philosophy of Massaro et al.'s [ 11 ] methodological article, we have climbed on the shoulders of giants, hoping to provide a bird's-eye view of the AI literature in healthcare. We performed this study with a bibliometric analysis aimed at discovering authors, countries of publication and collaboration, and keywords and themes. We found a fast-growing, multi-disciplinary stream of research that is attracting an increasing number of authors.

The research therefore adopts a quantitative approach to the analysis of bibliometric variables and a qualitative approach to the study of recurring keywords, which has allowed us to identify strands of literature that are not purely positivist. Some limitations remain that will shape future research, especially in ethics, data governance and the competencies of the health workforce.

Availability of data and materials

All the data are retrieved from public scientific platforms.

References

Tagliaferri SD, Angelova M, Zhao X, Owen PJ, Miller CT, Wilkin T, et al. Artificial intelligence to improve back pain outcomes and lessons learnt from clinical classification approaches: three systematic reviews. NPJ Digit Med. 2020;3(1):1–16.


Tran BX, Vu GT, Ha GH, Vuong Q-H, Ho M-T, Vuong T-T, et al. Global evolution of research in artificial intelligence in health and medicine: a bibliometric study. J Clin Med. 2019;8(3):360.


Hamid S. The opportunities and risks of artificial intelligence in medicine and healthcare [Internet]. 2016 [cited 2020 May 29]. http://www.cuspe.org/wp-content/uploads/2016/09/Hamid_2016.pdf

Panch T, Szolovits P, Atun R. Artificial intelligence, machine learning and health systems. J Glob Health. 2018;8(2):020303.


Yang X, Wang Y, Byrne R, Schneider G, Yang S. Concepts of artificial intelligence for computer-assisted drug discovery | chemical reviews. Chem Rev. 2019;119(18):10520–94.


Burton RJ, Albur M, Eberl M, Cuff SM. Using artificial intelligence to reduce diagnostic workload without compromising detection of urinary tract infections. BMC Med Inform Decis Mak. 2019;19(1):171.

Meskò B, Drobni Z, Bényei E, Gergely B, Gyorffy Z. Digital health is a cultural transformation of traditional healthcare. Mhealth. 2017;3:38.

Cho B-J, Choi YJ, Lee M-J, Kim JH, Son G-H, Park S-H, et al. Classification of cervical neoplasms on colposcopic photography using deep learning. Sci Rep. 2020;10(1):13652.


Doyle OM, Leavitt N, Rigg JA. Finding undiagnosed patients with hepatitis C infection: an application of artificial intelligence to patient claims data. Sci Rep. 2020;10(1):10521.

Shortliffe EH, Sepúlveda MJ. Clinical decision support in the era of artificial intelligence. JAMA. 2018;320(21):2199–200.


Massaro M, Dumay J, Guthrie J. On the shoulders of giants: undertaking a structured literature review in accounting. Account Auditing Account J. 2016;29(5):767–801.

Junquera B, Mitre M. Value of bibliometric analysis for research policy: a case study of Spanish research into innovation and technology management. Scientometrics. 2007;71(3):443–54.

Casadesus-Masanell R, Ricart JE. How to design a winning business model. Harvard Business Review [Internet]. 2011 Jan 1 [cited 2020 Jan 8]. https://hbr.org/2011/01/how-to-design-a-winning-business-model

Aria M, Cuccurullo C. bibliometrix: an R-tool for comprehensive science mapping analysis. J Informetr. 2017;11(4):959–75.

Zupic I, Čater T. Bibliometric methods in management and organization. Organ Res Methods. 2015;1(18):429–72.

Secinaro S, Calandra D. Halal food: structured literature review and research agenda. Br Food J. 2020. https://doi.org/10.1108/BFJ-03-2020-0234 .

Rialp A, Merigó JM, Cancino CA, Urbano D. Twenty-five years (1992–2016) of the international business review: a bibliometric overview. Int Bus Rev. 2019;28(6):101587.

Zhao L, Dai T, Qiao Z, Sun P, Hao J, Yang Y. Application of artificial intelligence to wastewater treatment: a bibliometric analysis and systematic review of technology, economy, management, and wastewater reuse. Process Saf Environ Prot. 2020;1(133):169–82.


Huang Y, Huang Q, Ali S, Zhai X, Bi X, Liu R. Rehabilitation using virtual reality technology: a bibliometric analysis, 1996–2015. Scientometrics. 2016;109(3):1547–59.

Hao T, Chen X, Li G, Yan J. A bibliometric analysis of text mining in medical research. Soft Comput. 2018;22(23):7875–92.

dos Santos BS, Steiner MTA, Fenerich AT, Lima RHP. Data mining and machine learning techniques applied to public health problems: a bibliometric analysis from 2009 to 2018. Comput Ind Eng. 2019;1(138):106120.

Liao H, Tang M, Luo L, Li C, Chiclana F, Zeng X-J. A bibliometric analysis and visualization of medical big data research. Sustainability. 2018;10(1):166.

Choudhury A, Renjilian E, Asan O. Use of machine learning in geriatric clinical care for chronic diseases: a systematic literature review. JAMIA Open. 2020;3(3):459–71.

Connelly TM, Malik Z, Sehgal R, Byrnes G, Coffey JC, Peirce C. The 100 most influential manuscripts in robotic surgery: a bibliometric analysis. J Robot Surg. 2020;14(1):155–65.

Guo Y, Hao Z, Zhao S, Gong J, Yang F. Artificial intelligence in health care: bibliometric analysis. J Med Internet Res. 2020;22(7):e18228.

Choudhury A, Asan O. Role of artificial intelligence in patient safety outcomes: systematic literature review. JMIR Med Inform. 2020;8(7):e18599.

Forliano C, De Bernardi P, Yahiaoui D. Entrepreneurial universities: a bibliometric analysis within the business and management domains. Technol Forecast Soc Change. 2021;1(165):120522.

Secundo G, Del Vecchio P, Mele G. Social media for entrepreneurship: myth or reality? A structured literature review and a future research agenda. Int J Entrep Behav Res. 2020;27(1):149–77.

Dal Mas F, Massaro M, Lombardi R, Garlatti A. From output to outcome measures in the public sector: a structured literature review. Int J Organ Anal. 2019;27(5):1631–56.


Baima G, Forliano C, Santoro G, Vrontis D. Intellectual capital and business model: a systematic literature review to explore their linkages. J Intellect Cap. 2020. https://doi.org/10.1108/JIC-02-2020-0055 .

Dumay J, Guthrie J, Puntillo P. IC and public sector: a structured literature review. J Intellect Cap. 2015;16(2):267–84.

Dal Mas F, Garcia-Perez A, Sousa MJ, Lopes da Costa R, Cobianchi L. Knowledge translation in the healthcare sector. A structured literature review. Electron J Knowl Manag. 2020;18(3):198–211.

Mas FD, Massaro M, Lombardi R, Biancuzzi H. La performance nel settore pubblico tra misure di out-put e di outcome. Una revisione strutturata della letteratura ejvcbp. 2020;1(3):16–29.

Dumay J, Cai L. A review and critique of content analysis as a methodology for inquiring into IC disclosure. J Intellect Cap. 2014;15(2):264–90.

Haleem A, Javaid M, Khan IH. Current status and applications of Artificial Intelligence (AI) in medical field: an overview. Curr Med Res Pract. 2019;9(6):231–7.

Paul J, Criado AR. The art of writing literature review: what do we know and what do we need to know? Int Bus Rev. 2020;29(4):101717.

Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JPA, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. PLoS Med. 2009;6(7):e1000100.

Biancone PP, Secinaro S, Brescia V, Calandra D. Data quality methods and applications in health care system: a systematic literature review. Int J Bus Manag. 2019;14(4):p35.

Secinaro S, Brescia V, Calandra D, Verardi GP, Bert F. The use of micafungin in neonates and children: a systematic review. ejvcbp. 2020;1(1):100–14.

Bert F, Gualano MR, Biancone P, Brescia V, Camussi E, Martorana M, et al. HIV screening in pregnant women: a systematic review of cost-effectiveness studies. Int J Health Plann Manag. 2018;33(1):31–50.

Levy Y, Ellis TJ. A systems approach to conduct an effective literature review in support of information systems research. Inf Sci Int J Emerg Transdiscipl. 2006;9:181–212.

Chen G, Xiao L. Selecting publication keywords for domain analysis in bibliometrics: a comparison of three methods. J Informet. 2016;10(1):212–23.

Falagas ME, Pitsouni EI, Malietzis GA, Pappas G. Comparison of PubMed, Scopus, Web of Science, and Google Scholar: strengths and weaknesses. FASEB J. 2007;22(2):338–42.


Sicilia M-A, Garcìa-Barriocanal E, Sànchez-Alonso S. Community curation in open dataset repositories: insights from zenodo. Procedia Comput Sci. 2017;1(106):54–60.

Secinaro S, Calandra D, Secinaro A, Muthurangu V, Biancone P. Artificial Intelligence for healthcare with a business, management and accounting, decision sciences, and health professions focus [Internet]. Zenodo; 2021 [cited 2021 Mar 7]. https://zenodo.org/record/4587618#.YEScpl1KiWh .

Elango B, Rajendran D. Authorship trends and collaboration pattern in the marine sciences literature: a scientometric Study. Int J Inf Dissem Technol. 2012;1(2):166–9.

Jacoby WG. Loess: a nonparametric, graphical tool for depicting relationships between variables. Electoral Studies. 2000;19(4):577–613.

Andrews JE. An author co-citation analysis of medical informatics. J Med Libr Assoc. 2003;91(1):47–56.


White HD, Griffith BC. Author cocitation: a literature measure of intellectual structure. J Am Soc Inf Sci. 1981;32(3):163–71.

Santosh KC. AI-driven tools for coronavirus outbreak: need of active learning and cross-population train/test models on multitudinal/multimodal data. J Med Syst. 2020;44(5):93.

Shickel B, Tighe PJ, Bihorac A, Rashidi P. Deep EHR: a survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE J Biomed Health Inform. 2018;22(5):1589–604.

Baig MM, GholamHosseini H, Moqeem AA, Mirza F, Lindén M. A systematic review of wearable patient monitoring systems—current challenges and opportunities for clinical adoption. J Med Syst. 2017;41(7):115.

Kumar S, Kumar S. Collaboration in research productivity in oil seed research institutes of India. In: Proceedings of fourth international conference on webometrics, informetrics and scientometrics. p. 28–1; 2008.

Gatto A, Drago C. A taxonomy of energy resilience. Energy Policy. 2020;136:111007.

Levitt JM, Thelwall M. Alphabetization and the skewing of first authorship towards last names early in the alphabet. J Informet. 2013;7(3):575–82.

Saad G. Exploring the h-index at the author and journal levels using bibliometric data of productive consumer scholars and business-related journals respectively. Scientometrics. 2006;69(1):117–20.

Egghe L. Theory and practise of the g-index. Scientometrics. 2006;69(1):131–52.

Schreiber M. A modification of the h-index: the hm-index accounts for multi-authored manuscripts. J Informet. 2008;2(3):211–6.

Engqvist L, Frommen JG. The h-index and self-citations. Trends Ecol Evol. 2008;23(5):250–2.

London School of Economics. 3: key measures of academic influence [Internet]. Impact of social sciences. 2010 [cited 2021 Jan 13]. https://blogs.lse.ac.uk/impactofsocialsciences/the-handbook/chapter-3-key-measures-of-academic-influence/ .

Lotka A. The frequency distribution of scientific productivity. J Wash Acad Sci. 1926;16(12):317–24.

Khan G, Wood J. Information technology management domain: emerging themes and keyword analysis. Scientometrics. 2015;9:105.

Oxford University Press. Oxford English Dictionary [Internet]. 2020. https://www.oed.com/ .

Jiang F, Jiang Y, Zhi H, Dong Y, Li H, Ma S, et al. Artificial intelligence in healthcare: past, present and future. Stroke Vasc Neurol. 2017;2(4):230–43.

Calandra D, Favareto M. Artificial Intelligence to fight COVID-19 outbreak impact: an overview. Eur J Soc Impact Circ Econ. 2020;1(3):84–104.

Bokolo Anthony Jnr. Use of telemedicine and virtual care for remote treatment in response to COVID-19 pandemic. J Med Syst. 2020;44(7):132.

Burke EK, De Causmaecker P, Berghe GV, Van Landeghem H. The state of the art of nurse rostering. J Sched. 2004;7(6):441–99.

Ahmed MA, Alkhamis TM. Simulation optimization for an emergency department healthcare unit in Kuwait. Eur J Oper Res. 2009;198(3):936–42.

Forina M, Armanino C, Raggio V. Clustering with dendrograms on interpretation variables. Anal Chim Acta. 2002;454(1):13–9.

Wartena C, Brussee R. Topic detection by clustering keywords. In: 2008 19th international workshop on database and expert systems applications. 2008. p. 54–8.

Hussain AA, Bouachir O, Al-Turjman F, Aloqaily M. AI Techniques for COVID-19. IEEE Access. 2020;8:128776–95.

Agrawal A, Gans JS, Goldfarb A. Exploring the impact of artificial intelligence: prediction versus judgment. Inf Econ Policy. 2019;1(47):1–6.

Chakradhar S. Predictable response: finding optimal drugs and doses using artificial intelligence. Nat Med. 2017;23(11):1244–7.

Fleming N. How artificial intelligence is changing drug discovery. Nature. 2018;557(7707):S55–7.

Guo J, Li B. The application of medical artificial intelligence technology in rural areas of developing countries. Health Equity. 2018;2(1):174–81.

Aisyah M, Cockcroft S. A snapshot of data quality issues in Indonesian community health. Int J Netw Virtual Organ. 2014;14(3):280–97.

Davenport T, Kalakota R. The potential for artificial intelligence in healthcare. Future Healthc J. 2019;6(2):94–8.

Mehta N, Pandit A, Shukla S. Transforming healthcare with big data analytics and artificial intelligence: a systematic mapping study. J Biomed Inform. 2019;1(100):103311.

Collins GS, Moons KGM. Reporting of artificial intelligence prediction models. Lancet. 2019;393(10181):1577–9.

Bennett CC, Hauser K. Artificial intelligence framework for simulating clinical decision-making: a Markov decision process approach. Artif Intell Med. 2013;57(1):9–19.

Redondo T, Sandoval AM. Text Analytics: the convergence of big data and artificial intelligence. Int J Interact Multimed Artif Intell. 2016;3. https://www.ijimai.org/journal/bibcite/reference/2540 .

Winter JS, Davidson E. Big data governance of personal health information and challenges to contextual integrity. Inf Soc. 2019;35(1):36–51.

Novak D, Riener R. Control strategies and artificial intelligence in rehabilitation robotics. AI Mag. 2015;36(4):23–33.

Tarassoli SP. Artificial intelligence, regenerative surgery, robotics? What is realistic for the future of surgery? Ann Med Surg (Lond). 2019;17(41):53–5.

Saha SK, Fernando B, Cuadros J, Xiao D, Kanagasingam Y. Automated quality assessment of colour fundus images for diabetic retinopathy screening in telemedicine. J Digit Imaging. 2018;31(6):869–78.

Gu D, Li T, Wang X, Yang X, Yu Z. Visualizing the intellectual structure and evolution of electronic health and telemedicine research. Int J Med Inform. 2019;130:103947.

Madnick S, Wang R, Lee Y, Zhu H. Overview and framework for data and information quality research. J Data Inf Qual. 2009;1:1.

Chen X, Liu Z, Wei L, Yan J, Hao T, Ding R. A comparative quantitative study of utilizing artificial intelligence on electronic health records in the USA and China during 2008–2017. BMC Med Inform Decis Mak. 2018;18(5):117.

Carter D. How real is the impact of artificial intelligence? Bus Inf Surv. 2018;35(3):99–115.

Kalis B, Collier M, Fu R. 10 promising AI applications in health care. Harvard Business Review. 2018.

Biancone P, Secinaro S, Brescia V, Calandra D. Management of open innovation in healthcare for cost accounting using EHR. J Open Innov Technol Market Complex. 2019;5(4):99.

Kayyali B, Knott D, Van Kuiken S. The ‘big data’ revolution in US healthcare [Internet]. McKinsey & Company. 2013 [cited 2020 Aug 14]. https://healthcare.mckinsey.com/big-data-revolution-us-healthcare/ .

Lu J. Will medical technology deskill doctors? Int Educ Stud. 2016;9(7):130–4.

Hoff T. Deskilling and adaptation among primary care physicians using two work innovations. Health Care Manag Rev. 2011;36(4):338–48.

Picek O. Spillover effects from next generation EU. Intereconomics. 2020;55(5):325–31.

Sousa MJ, Dal Mas F, Pesqueira A, Lemos C, Verde JM, Cobianchi L. The potential of AI in health higher education to increase the students’ learning outcomes. TEM J. 2021. ( In press ).


Acknowledgements

The authors are grateful to the Editor-in-Chief for the suggestions and to all the reviewers who spent part of their time providing constructive feedback on our research article.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Author information

Authors and affiliations

Department of Management, University of Turin, Turin, Italy

Silvana Secinaro, Davide Calandra & Paolo Biancone

Ospedale Pediatrico Bambino Gesù, Rome, Italy

Aurelio Secinaro

Institute of Child Health, University College London, London, UK

Vivek Muthurangu


Contributions

SS and PB, Supervision; Validation, writing, AS and VM; Formal analysis, DC and AS; Methodology, DC; Writing; DC, SS and AS; conceptualization, VM, PB; validation, VM, PB. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Davide Calandra .

Ethics declarations

Ethical approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Secinaro, S., Calandra, D., Secinaro, A. et al. The role of artificial intelligence in healthcare: a structured literature review. BMC Med Inform Decis Mak 21 , 125 (2021). https://doi.org/10.1186/s12911-021-01488-9


Received: 24 December 2020

Accepted: 01 April 2021

Published: 10 April 2021

DOI: https://doi.org/10.1186/s12911-021-01488-9


Keywords

  • Artificial intelligence
  • Patient data




Explainable and interpretable artificial intelligence in medicine: a systematic bibliometric review

  • Open access
  • Published: 27 February 2024
  • Volume 4, article number 15 (2024)


  • Maria Frasca 1 ,
  • Davide La Torre 2 ,
  • Gabriella Pravettoni 1 , 3 &
  • Ilaria Cutica 1  


This review aims to explore the growing impact of machine learning and deep learning algorithms in the medical field, with a specific focus on the critical issues of explainability and interpretability associated with black-box algorithms. While machine learning algorithms are increasingly employed for medical analysis and diagnosis, their complexity underscores the importance of understanding how these algorithms explain and interpret data to make informed decisions. This review comprehensively analyzes challenges and solutions presented in the literature, offering an overview of the most recent techniques utilized in this field. It also provides precise definitions of interpretability and explainability, aiming to clarify the distinctions between these concepts and their implications for the decision-making process. Our analysis, based on 448 articles and addressing seven research questions, reveals an exponential growth in this field over the last decade. The psychological dimensions of public perception underscore the necessity for effective communication regarding the capabilities and limitations of artificial intelligence. Researchers are actively developing techniques to enhance interpretability, employing visualization methods and reducing model complexity. However, the persistent challenge lies in finding the delicate balance between achieving high performance and maintaining interpretability. Acknowledging the growing significance of artificial intelligence in aiding medical diagnosis and therapy, the creation of interpretable artificial intelligence models is considered essential. In this dynamic context, an unwavering commitment to transparency, ethical considerations, and interdisciplinary collaboration is imperative to ensure the responsible use of artificial intelligence. This collective commitment is vital for establishing enduring trust between clinicians and patients, addressing emerging challenges, and facilitating the informed adoption of these advanced technologies in medicine.


1 Introduction

The rise of Artificial Intelligence (AI) led to a revolutionary transformation in the medical field, redefining how diagnostic and therapeutic challenges are addressed. This synergy is revolutionizing the concept of personalized patient care, opening up new perspectives in prevention, therapy, and optimization of medical resources. In this evolving scenario, AI algorithms allow us to analyze complex biomedical data, identify and extract hidden patterns, and drive the decision-making process. The ability to process large amounts of information quickly and efficiently allows for earlier diagnoses, targeted treatments and more efficient management of medical conditions [ 1 ].

Unfortunately, understanding how these systems make decisions remains a critical issue for clinicians, professionals, patients and stakeholders involved in the process. There is an ongoing debate on the transparency of decision-making processes, the interpretability of algorithms, and ethical issues related to the adoption of automated systems in the clinical context [ 2 ].

Although current research indicates that AI algorithms can outperform humans in certain analytical tasks, the lack of interpretability and explainability limits the adoption of AI-based solutions in the medical context, as it raises legal and ethical concerns, potentially hindering progress and preventing new technologies from realizing their full potential to improve health care.

In AI systems theory, we can distinguish two categories of models and algorithms: interpretable (i.e., white-box) and non-interpretable (i.e., black-box) [ 3 ]. This differentiation is based on the clarity of the relationship between the input data and the outcomes generated by the model. White-box models have recognizable and understandable characteristics that help explain the influence of the variables on the predicted outcomes; for instance, linear regression models and decision trees belong to this family. Black-box models, on the other hand, are based on highly complex structures whose internal processes, parameters and predictions are not directly observable, as in the case of deep learning algorithms and random forest models. Black-box models might incorporate harmful biases [ 4 , 5 ], and this could hurt clinicians' confidence and trust. Indeed, if an algorithm is trained on data reflecting cultural or social biases, it can result in discriminatory decisions that exacerbate inequalities rather than reduce them. This raises important ethical questions about fairness and justice towards the individuals involved.

Finally, while white-box models feature easier-to-understand processes and results, black-box models show better performance and accuracy. This can mainly be attributed to the intrinsic ability of black-box models to learn complex and non-linear representations of the data: their more complex and flexible structure allows them to capture intricate details and hidden relationships, improving their ability to make accurate predictions, whereas white-box models often rely on simpler and more interpretable relationships [ 3 ].
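To make the contrast concrete, the minimal Python sketch below (on synthetic data; the feature set and models are illustrative assumptions) fits a white-box logistic regression, whose coefficients can be read directly, and a black-box random forest, which requires a post-hoc tool such as permutation importance:

```python
# Minimal sketch: white-box coefficients vs. post-hoc black-box explanation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 4))      # four synthetic features
y = (2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=400)) > 0

white_box = LogisticRegression().fit(X, y)
print("Coefficients (directly interpretable):", white_box.coef_.round(2))

black_box = RandomForestClassifier(random_state=1).fit(X, y)
result = permutation_importance(black_box, X, y, n_repeats=10, random_state=1)
print("Permutation importances (post-hoc):", result.importances_mean.round(3))
```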

The rise of ethical concerns related to the opacity of algorithms must also be considered. The General Data Protection Regulation (GDPR) [ 6 ], a European legislative framework, defines the legal requirements for the acquisition, storage, transfer, processing and analysis of health data [ 7 ]. The GDPR also emphasizes the "right to explanation", which gives individuals the right to understand how automated decisions can affect them. It also highlights the concept of accountability, placing the responsibility on organizations to demonstrate compliance with regulations. This implies greater transparency in data processing and the need to clarify how algorithms operate when personal data are involved.

In summary, this paper aims to explore the interpretability and explainability of Machine Learning (ML) and Deep Learning (DL) algorithms within the medical domain. We will investigate the particular challenges highlighted in existing literature and analyze the ramifications of algorithmic opacity. We will also examine specific case studies that show how interpretability and explainability can make a difference in daily clinical practice, and explore the most relevant technical approaches to generate easy-to-understand explanations.

The paper is structured as follows: Sect.  1 presents an introduction to the topic of explainability and interpretability of AI in the medical field. In Sect.  2 we provide a brief discussion on explainability and interpretability in artificial intelligence. Section  3 presents the investigations already available in the literature on the interpretability and explainability of AI algorithms in the medical field.

Section 4 defines the inclusion and exclusion criteria for the construction of our dataset. In Sect. 5 we describe the results obtained from the Scopus and Web of Science databases and propose seven research questions. We highlight the results obtained by comparing the two datasets, considering different levels of detail. In particular, in subsections 5.1, 5.2 and 5.3, we provide a quantitative analysis of the state of the art relating to the application of interpretability and explainability techniques in AI in the medical field, focusing on the main channels used for publication and on the countries in which the most active research centres are located. In Sect. 6 we proceed with the analysis of the 10 papers selected for in-depth review and, in subsections 6.1, 6.2 and 6.3, we analyze the application domains of the proposed techniques, provide technical insights on the formalization of the problems, discuss the performance metrics used for the evaluation and, finally, consider the challenges faced by each paper. In Sect. 7 we discuss the selected papers and compare them with our definition of explainability and interpretability of AI algorithms in the medical field. Finally, in Sect. 8 we provide the conclusions of the proposed study.

2 Explainability and interpretability

AI algorithms’ interpretability and explainability represent one of the fundamental pillars in the research and development of advanced AI systems [ 8 ].

Explainability is often confused with interpretability, even though interpretability is a prerequisite for explainability [ 9 ]. Furthermore, interpretability is sometimes defined as a part of explainability [ 10 ]. Ambiguity can arise when, in discussions about AI, the two words are used interchangeably, without a precise distinction between the two concepts. However, the difference is important and can influence how we evaluate and implement AI algorithms.

The explainability property of a model includes the ability to provide detailed and understandable reasoning for specific decisions. Explainability usually involves the capability to identify and understand the parameters within a system, to understand the nodes, architectures, and computational units that process and transmit information, and to understand the significance of each component within a system.

Interpretability, on the other hand, is the ability to understand the general behaviour of a model without necessarily delving into each decision. Thus, interpretability can be seen as a broader component of explainability, providing a general, high-level view of the model without going into specific details. Interpretability pertains to the extent to which a system's cause-and-effect relationships can be understood. It also involves not only understanding but also describing how a system operates or behaves. In general, AI systems may vary in terms of their interpretability, with some being more transparent and interpretable than others. This is also related to the notion of degree of interpretability. To summarize, we might say that:

Interpretability. It concerns the degree to which cause and effect can be observed inside a system and refers to the extent to which something can be understood or described. As a result, it makes it possible to identify the cause of a problem and foresee what will happen if the input or computational parameters change [ 10 ].

Explainability. It refers to the ability to provide explicit and understandable justifications or reasoning for model-specific decisions; it is the ability to recognize the parameters and understand what a node stands for and its significance within a system [ 10 ].

Existing methods to support explainability and interpretability of AI systems are often divided into a priori and a posteriori approaches. A priori explainability and interpretability techniques refer to concepts known or presumed before experience or observation of specific data. In this context, a priori explainability and interpretability refer to integrated techniques or design considerations applied during the creation and design of an AI model. They can be summarized as follows:

Architectural simplicity can enhance interpretability, for example by employing models with simpler topologies, such as linear models or shallow neural networks [ 11 ];

Feature engineering involves selecting or creating features that reflect understandable or well-known concepts in a given domain [ 12 ];

Regularization algorithms, including L1 and L2 regularization, can be utilized to promote sparsity and improve interpretability (a minimal sketch follows this list) [ 13 ];

Implementing a depth limitation can prevent the development of overly complex systems, contributing to the interpretability of the model.
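
As an illustration of the regularization point above, the following minimal sketch (our own example, assuming a standard tabular classification task) fits an L1-penalized logistic regression: the penalty drives many coefficients to exactly zero, so the fitted model depends on a small, inspectable subset of features. The dataset and penalty strength are illustrative assumptions.

```python
# A priori interpretability via L1 regularization: the sparsity-inducing
# penalty leaves only a handful of non-zero, human-inspectable weights.
# Dataset and penalty strength (C) are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)
model.fit(data.data, data.target)

coefs = model.named_steps["logisticregression"].coef_[0]
kept = [(name, c) for name, c in zip(data.feature_names, coefs) if c != 0]
print(f"{len(kept)} of {len(coefs)} features retained by the L1 penalty:")
for name, c in kept:
    print(f"  {name}: {c:.3f}")
```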

A posteriori explainability and interpretability techniques refer to concepts derived or deduced from experience or observation of specific data. In this context, a posteriori explainability and interpretability refer to techniques applied after the model has been trained to understand the specific decisions made by the model. They include:

One example is LIME (Local Interpretable Model-agnostic Explanations), which provides local explanations for individual input instances and offers a post-hoc comprehension of model decisions (a minimal sketch follows this list) [ 14 ];

SHAP (SHapley Additive exPlanations), which uses the idea of Shapley values to assign feature contributions to each input variable and explains individual predictions already made by a trained model [ 15 ];

Feature visualizations, which use specific data instances to build retrospective visualizations of the most significant features;

Sensitivity analysis, which assesses how slight changes in the inputs affect the model's predictions [ 16 ].
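
To make the first of these techniques concrete, the following minimal sketch applies LIME to a single prediction of a tabular classifier. It assumes the third-party lime package is installed; the model, dataset and parameter choices are illustrative, not prescriptions from the reviewed literature.

```python
# A posteriori explanation with LIME: perturb one instance, query the
# trained model, and fit a weighted local linear surrogate around it.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0
)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

explainer = LimeTabularExplainer(
    X_train,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)
# Explain one black-box prediction via its top 5 locally important features.
explanation = explainer.explain_instance(
    X_test[0], model.predict_proba, num_features=5
)
print(explanation.as_list())  # (feature condition, local weight) pairs
```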

Usually, the a posteriori approach is not considered during system design; it is concerned with extracting explanatory information from existing systems [ 17 ] (typically based on the black-box approach).

Developing an explainable and interpretable AI system is usually challenging, and the difficulty varies because its complexity depends on the main purpose of the model as well as on its related variables and features. There are many reasons why explainability and interpretability can be desirable or necessary in AI systems [ 18 ], among them ethical, legal, and practical ones. The following are some of the main obstacles to their implementation:

Opacity of complex models: Deep learning techniques, such as deep neural networks, can build highly complex models with millions of parameters. Even the model’s developers themselves may find it challenging to comprehend how the model comes to its conclusions due to its intricacy.

Trade-off between performance and interpretability: It is frequently necessary to utilize more complex models to gain higher performance in terms of accuracy and generalization, but more intricate models are typically harder to understand. A significant problem is striking the ideal balance between performance and interpretability.

Bias in training data: If a model is trained on biased data, it may carry over these biases and produce discriminatory or inaccurate decisions. It is critical to comprehend how the model uses the data and to find any unintentional biases.

Interpretability of features: A key component of interpretability is knowing which features are pertinent to model decisions. In many situations, particularly with complicated models like deep neural networks, it can be challenging to pinpoint which features drive the model's decisions.

Scalability: Because large-scale deep learning models contain many more parameters and demand more computational resources for interpretive analysis, interpretability can be more challenging to achieve in these models.

Changes in model behaviour: A machine learning algorithm’s behaviour may alter over time due to modifications to the input data or the training it receives. It can be difficult to maintain an interpretable model in the face of such changes.

Social acceptance: Even if a model may be interpreted in theory, it may be challenging to convince users to do so, especially if the justifications offered do not line up with their intuitions.

To address these challenges, scientists and engineers are developing various techniques and approaches to improve the interpretability and explainability of AI algorithms. This includes using visualization techniques, interpreting features, generating textual explanations, and reducing model complexity.

Finding a balance between high performance and interpretability is an ongoing challenge, but essential to ensuring that AI can be used ethically and responsibly for the benefit of society as a whole [ 19 ]. Advances in this field are critical to shaping the future of artificial intelligence.

3 Related work

In the extant literature, there are already some review articles that present and discuss trends and challenges related to the integration of the notions of interpretability and explainability into AI systems. We do not include them in our literature analysis, so this section provides an overview of the most significant contributions.

Biran and Cotton [ 20 ] look at various approaches for making machine learning models understandable. The authors investigate methods for enhancing user confidence and boosting real-world adoption by making complex models accessible and intelligible. They categorize the many interpretation techniques they examine into several groups, including rule-based approaches, global and local procedures, visualization techniques, and model representation-based approaches. Additionally, they demonstrate the practical importance of these interpretation strategies by highlighting how they are used in particular fields such as finance, marketing, and health.

Guidotti et al. [ 21 ] provide a comprehensive overview of the methods proposed in the literature to explain decision-making systems based on opaque ("black box") machine learning models. They identify several explainability problems and provide a formal definition for each: the model explainability problem, the outcome explainability problem, the model inspection problem, and the transparent box design problem. Their analysis of the literature leads to the conclusion that, although many approaches have been proposed to explain black boxes, some important scientific questions remain unresolved.

By incorporating viewpoints from the social sciences, Miller [ 22 ] explores the function of explanations in AI. The author emphasizes the need to make algorithmic conclusions more user-acceptable and understandable while discussing the significance of explanations in artificial intelligence systems, and examines the many difficulties in giving convincing explanations, taking into account variations in how explanations are perceived on a cultural, social, and psychological level. The paper stresses how crucial it is to involve users in the design of explanations to make sure they are useful and well-received, and proposes an interdisciplinary method to address the complexity of explanations in AI and to increase the comprehension and social acceptance of intelligent technology.

Arrieta et al. [ 19 ] investigate the concepts, taxonomies, prospects, and difficulties within Explainable Artificial Intelligence (XAI) to address the moral and societal ramifications associated with the opacity of AI algorithms. The authors stress the significance of creating more understandable models to encourage the responsible use of AI, and of comprehending and interpreting algorithmic judgments to guarantee responsible governance. Based on factors including the type of explanation, the level of granularity, and the manner of application, the authors suggest various taxonomies to categorize XAI techniques. They also discuss difficulties in implementing XAI, such as the necessity to manage the trade-off between precision and interpretability, the role of privacy in connection with model explainability, and the need to balance explainability and complexity.

Tjoa and Guan [ 23 ] discuss the importance of interpretability in black-box machine learning decisions in a medical context. They provide an overview of the interpretability methods proposed in different studies and classify them according to their clarity. In the medical field, such explanations are essential to justify the reliability of algorithmic decisions. However, the article highlights challenges such as the risks associated with manipulating explanations and the quality of training data. It also highlights the importance of specialized training to correctly interpret algorithmic explanations in a medical context. Finally, the article calls for a critical approach to the use of algorithmic interpretation, which should be seen as a complementary support to medical decisions until a more robust approach to interpretability is developed.

Stiglic et al. [ 24 ] emphasize the significance of interpretability in machine learning (ML) models in the context of healthcare. The authors divide interpretability approaches into two main categories: one centred on personalized (local) interpretation, which emphasizes thorough justifications at the individual level, and the other concerned with the synthesis of prediction models on a population level (global), useful for getting a broad overview of trends. Additionally, they divide interpretability techniques into model-specific and model-agnostic strategies. The former analyze predictions made by a particular ML model, such as a neural network; model-agnostic approaches, on the other hand, offer clear justifications for any ML model's predictions, regardless of its architecture.

Amann et al. [ 7 ] discuss the issue of interpretability in the use of artificial intelligence (AI) in the healthcare industry, highlighting that while AI-based systems have demonstrated superior performance over humans in specific analytical tasks, the lack of interpretability has drawn criticism. The authors use the case of AI-based clinical decision support systems as a starting point for their multidisciplinary analysis of the applicability of interpretability for medical AI, taking into account the perspectives of technology, law, medicine, and patients. Based on the findings of this conceptual study, an ethical evaluation of the "Principles of Biomedical Ethics" by Beauchamp and Childress (autonomy, beneficence, nonmaleficence, and justice) is carried out to ascertain the necessity of interpretability in medical AI. Each domain draws attention to a distinct collection of factors and ideals crucial to comprehending the function of interpretability in clinical practice. The importance of taking into account the interaction between human actors and medical AI was emphasized from both a medical and patient perspective. The absence of interpretability in clinical decision support systems poses a threat to fundamental medical ethical principles and may have unfavourable effects on both individual and public health.

In the paper [ 25 ], Mehrabi et al. conduct in-depth research on bias and fairness in machine learning models. The authors look at the causes of bias, how it manifests in models, and solutions to these issues. They identify the main sources of bias in machine learning models, including the characteristics included in the models, the training data, and the learning process itself. The authors investigate different metrics to evaluate bias in models, such as group fairness, individual fairness, and fairness of opportunity, and analyze several strategies to mitigate bias in machine learning models, including gathering balanced data, adjusting model weights, and implementing fairness metrics to evaluate model performance. They also examine the difficulties brought about by these measurements.

Chakrobartty and El-Gayar [ 26 ] focus on the recent advancement of XAI in the setting of medicine to offer a comprehensive overview of XAI approaches and techniques noted in the literature. Through a thorough literature review, the article addresses XAI approaches and techniques utilized in ML systems in the medical industry. The conceptual framework they provide for categorizing XAI approaches and techniques aids in the organization and discussion of the available literature. The balance between interpretability and accuracy is emphasized as a major subject in the literature, with some studies emphasizing interpretability in addition to accuracy.

In [ 2 ] the authors use the systematic mapping procedure to review the literature on interpretability strategies utilized in the medical area. The following factors were taken into account: the locations and years of publications; the types of contributions; the medical and ML disciplines; the ML objectives; the interpretation of "black box" ML techniques; the examination of interpretability techniques; the performance of the techniques; the best techniques; and, lastly, the datasets used in the evaluations of interpretability techniques. After selecting 179 articles (1994–2020) from six digital libraries (ScienceDirect, IEEE Xplore, ACM Digital Library, SpringerLink, Wiley, and Google Scholar), the results show an increase in the number of interpretability studies over time, with a predominance of solution proposals and empirical studies based on experiments. The most common ML objective, medical field, and medical task were found to be classification, oncology, and diagnosis, respectively. The most popular "black box" ML approaches targeted by interpretability studies are artificial neural networks. Accuracy, integrity, and the number of rules were other criteria frequently employed to gauge interpretability.

In [ 27 ] the authors consider the problem of interpretability. They emphasize that while AI systems have displayed outstanding performance in numerous clinical activities, the lack of transparency of black-box systems has prompted efforts to make AI more "interpretable" or explicable. The paper argues that clinicians may favour interpretable systems even at the expense of maximum accuracy, defending the importance of interpretability. This inclination is supported by the fact that, to obtain the intended benefits, doctors must actually employ the AI. The authors make the point that giving accuracy priority over interpretability could be a "lethal bias", reducing the advantages of AI for patients.

Combi et al. look into the application of explainable artificial intelligence (XAI) in biomedical settings in [ 17 ]. The authors identify five key topics that demand more study. The first is the importance of bridging symbolic and sub-symbolic machine learning methodologies. Engineering explainability into intelligent systems is another major problem, and overcoming it calls for a thorough investigation of the structural, functional, and behavioural traits of various intelligent systems as well as the requirements of their users. The assessment and enhancement of the results of explainable elements and methodologies is the third key part. It is emphasized that research must examine the effects on users' beliefs, attitudes, and behaviour and how accurately intelligent systems make decisions. Determining whether explainability is required also becomes a concern. Finally, the need to look into user-centred design of explainability artefacts becomes apparent; for XAI, user-centred design is essential.

Farah et al. [ 28 ] review key ideas for creating medical devices using AI and emphasize the value of algorithm performance, interpretability, and explainability. According to their literature review, these three crucial criteria (performance, interpretability, and explainability) have been highlighted by health technology assessment organizations as being crucial for establishing trust in AI-based medical devices and are therefore essential for their evaluation. Based on the model's structure and the data at hand, suggestions are given for how and when to evaluate performance. Furthermore, methods for supporting their evaluation have been developed, taking into account the fact that interpretability and explainability can be challenging concepts to define mathematically. A flowchart of estimated regulatory requirements for the development and assessment of AI-based medical devices is also provided.

In [ 29 ] Ali et al. give a summary of recent developments and trends in the field of the explainability and interpretability of AI algorithms. Using a hierarchical categorization method, the authors categorize XAI techniques into four categories: (i) data explainability, (ii) model explainability, (iii) post-hoc explainability, and (iv) evaluation of explanations. They also provide information on existing evaluation measures, open-source software, and datasets, together with potential future study topics, and discuss the significance of explainability in terms of legal constraints, user viewpoints, and application orientation, which they refer to as XAI issues. The authors reviewed 410 critical publications published between January 2016 and October 2022 to assess XAI approaches and evaluations. The proposed framework for the end-to-end implementation of an XAI system combines evaluation approaches with design objectives, among them XAI considerations.

Finally, in [ 30 ] Band et al. looked into the uses of explainable artificial intelligence (XAI) in the healthcare industry. XAI aims to make the outcomes of artificial intelligence (AI) and machine learning (ML) algorithms in decision-making systems transparent, fair, accurate, general, and understandable. In this article, a critical evaluation of earlier research on the interpretability of ML and AI techniques in medical systems is conducted. The article also covers the potential impact of AI and ML on healthcare services.

4 Research methodology

This section describes the research methodology used in this study. It consists of five phases, which can be summarized as follows: (i) definition of the research questions; (ii) preliminary data analysis; (iii) definition of inclusion and exclusion criteria; (iv) identification of relevant studies based on the inclusion and exclusion criteria; and (v) data extraction and analysis. For each document analyzed, we considered: the problem addressed, the formalization of the problem, the approach used, and the challenges faced.

The first step is to define the research questions. Specifically, the following questions were considered:

RQ1: How many scientific studies have been published between 2013 and 2023 regarding the interpretability and explainability of ML and DL algorithms?

RQ2: What are the most relevant publication channels?

RQ3: Which countries had the most active research centres?

RQ4: What application areas and methods were used?

RQ5: What are the most used interpretability and explainability algorithms?

RQ6: What metrics were used to evaluate performance?

RQ7: What were the challenges addressed?

The databases we used to collect papers are those of the search engines Scopus and Web of Science.

Scopus is a comprehensive academic research database, offering abstracts and citations across various disciplines. Authors, connections, and citation trends are all covered. Researchers use Scopus for transdisciplinary research, citation analysis, and evaluating the significance of articles. The database has options for setting up alerts, extensive search capabilities, and journal analytics. Access is often granted through institutions, and Scopus is frequently used for literature reviews and keeping up with the most recent research [ 31 ]. Web of Science is an academic research database that provides access to a wide range of scientific and academic information. It draws on information from scientific journals, conferences, patents, and other sources and is renowned for its multidisciplinary coverage. Users can investigate relationships between scientific papers using Web of Science's sophisticated search and analysis features. It is frequently used to assess the influence of academic work, spot trends in the field, and find academic partners. Access to Web of Science is often available only through libraries or academic institutions [ 32 ].

To limit the scope of our research to the notions of interpretability and explainability of ML and DL algorithms, we defined the following search string: "((explainable OR interpretable OR interpretability OR explainability) AND ((machine AND learning) OR (deep AND learning) OR (artificial AND intelligence)))" . The search produced 26,951 results for the Scopus database and 21,633 results for the Web Of Science database. To refine the results of our analysis, we used the following inclusion and exclusion criteria.

Inclusion criteria:

Written in English;

Papers published between 2013 and 2023;

The study must be a journal article, a proceedings paper, or a book chapter;

The focus is clearly on the interpretability and explainability of the ML and DL algorithms;

If there are duplicate articles, the most recent version is included.

Exclusion criteria:

If there are duplicate articles, only the most recent version is retained; older versions are excluded.

Our analysis produced 23,805 results for the database Scopus and 19,709 for the database Web of Science. Subsequently, we investigated how many documents there are by subject area, as shown in Figs.  1 and 2 for the Scopus and WOS databases, respectively.

figure 1

Documents by subject area for the Scopus database

figure 2

Documents by subject area for the Web of Science dataset

Then we selected only those papers related to the medical area and identified 3,178 documents for the database Scopus and 805 documents for the database WOS. We compared the two datasets and took into consideration only those papers listed in both datasets. Our final dataset was composed of 448 papers.

Then we combined information on the index keywords of these documents with the number of citations and the year of publication. Specifically, we calculated the frequency of each keyword to identify the most commonly used applications and methods in the literature, standardizing the keywords to avoid spelling inconsistencies. These values were combined with citation count and publication year to identify the most recent relevant studies. If index keywords were missing, we used the author's keywords; for documents with neither author nor index keywords, we used the title as the relevant keyword. (A minimal sketch of this keyword analysis follows the criteria list below.) To identify the documents to be included in our analysis, we applied the following criteria:

Occurrence of keywords (i.e., most common keywords in the papers in the dataset);

Year of publication;

Number of citations based on the year of publication.
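
To illustrate this step, the following minimal sketch reproduces the keyword-frequency logic described above; the CSV file and its column names (index_keywords, author_keywords, title, year, cited_by) are hypothetical stand-ins for a merged Scopus/WoS export, not the actual files used in this study.

```python
# Keyword-frequency sketch: standardize keywords, fall back from index
# keywords to author keywords to the title, then rank by occurrence.
# File name and column names are hypothetical.
from collections import Counter

import pandas as pd

df = pd.read_csv("merged_scopus_wos.csv")

def keywords_for(row):
    # Mirror the fallback rule described in the text.
    for col in ("index_keywords", "author_keywords"):
        value = row.get(col)
        if isinstance(value, str) and value.strip():
            # Scopus/WoS exports separate keywords with semicolons.
            return [k.strip().lower() for k in value.split(";") if k.strip()]
    return [str(row.get("title", "")).strip().lower()]

counts = Counter(k for _, row in df.iterrows() for k in keywords_for(row))
print(pd.DataFrame(counts.most_common(20), columns=["keyword", "occurrences"]))

# Combine with publication year and citation count to surface recent,
# well-cited studies.
print(df.sort_values(["year", "cited_by"], ascending=False).head(10))
```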

Following these criteria, we identified and thoroughly examined 10 papers that addressed the research questions outlined previously. In the subsequent sections, we begin by examining the initial documents retrieved from Scopus and Web of Science (WOS) using the predefined search strings and applying specific inclusion/exclusion criteria.

5 Results

The analysis of the initial set of papers, selected based on the search strings outlined in Sect. 4, was used to address the first four research questions, namely RQ1, RQ2, RQ3, and RQ4.

5.1 RQ1: How many scientific studies have been published between 2013 and 2023 regarding the interpretability and explainability of ML and DL algorithms?

This research question aims to quantify the interest of the international scientific community in the application of AI interpretability and explainability methods in the medical field over the last 10 years. As shown in Figs.  3 and 4 , the number of publications remained relatively low until 2018, with the number of publications each year being less than eighty. There has been a rapid growth of interest in this topic since 2019, reaching 974 articles for the Scopus database and 164 articles for the WOS database in 2023, demonstrating the growing interest in this topic in recent years. It is important to note that the data for the year 2023 is current as of October 2023.

figure 3

Academic studies published from 2013 to 2023 in Scopus database (in blue the original data, in red the estimated values). From this analysis, we can see that the annual growth rate in Scopus is 0.5075

figure 4

Academic studies published from 2013 to 2023 in the WOS database (in blue the original data, in red the estimated values). From this analysis we can see that the annual growth rate in WoS is 0.421

We then compared the data from the two databases and extracted only the papers common to both, obtaining 448 scientific studies, and we will only consider these papers for the following analyses.

5.2 RQ2: What are the most relevant publication channels?

With this research question, we aim to show the main channels used for disseminating research on the application of explainability and interpretability techniques in AI. Table 1 shows the results of our analysis.

5.3 RQ3: In which countries were the most active research centres located?

This research question focuses on the countries whose research centres contribute to the study of the explainability and interpretability of AI in the medical field. In Fig. 5 we only consider countries with at least 5 publications and, as we can see, the largest number of articles came from research centres located in the United States (114 articles), followed by China (80 articles), Italy (27 articles), Spain (16 articles), the United Kingdom (16 articles), South Korea (13 articles), Canada (12 articles), France (10 articles), Taiwan (10 articles), the Netherlands (8 articles), Austria (7 articles), Germany (7 articles), Singapore (7 articles), India (6 articles) and Japan (5 articles). It is important to note the presence of articles jointly written by research centres located in multiple countries. To show the relationships between co-authors, in Fig. 6 we represent the countries with at least 5 occurrences among the analyzed documents. As we can see, the countries with the most connections are the United States and China (9 connections).

figure 5

Number of publications per Country on explainability and interpretability of AI in the medical field

figure 6

Co-author relationships with country as a unit of analysis

5.4 RQ4: What application areas and methods were used?

This research question aims to analyze the application domains and techniques used for the explainability and interpretability of AI in the medical field. To do this, we analyzed the index keywords of the 448 articles that were not excluded. Figure 7 shows the application domains. We grouped keywords into macro areas and, as shown in the figure, most of the articles were classified in one of them.

Regarding the proposed approaches, we followed the same procedure as previously described for the application domains, grouping keywords that refer to the same method, as shown in Fig. 8.

figure 7

Overview of the application domains

figure 8

Overview of the methods used

Furthermore, we performed a bibliometric analysis of the co-occurrence of the index keywords using VOSviewer [ 33 ], as shown in Fig. 9. A co-occurrence means that two keywords occur in the same work. After a data cleaning process, VOSviewer detected 10 clusters by considering keywords with at least 3 occurrences. In the diagram, each cluster corresponds to a colour, and each element within a cluster is drawn in the colour of its cluster. The size of a circle and of its label depends on the number of occurrences of the related keyword. The lines between elements describe the co-occurrences of keywords in an article. Each cluster groups together keywords identifying an application domain and/or the approaches used to address problems related to the explainability and interpretability of AI algorithms in the medical field. (A minimal sketch of the underlying co-occurrence counting follows Fig. 9 below.)

figure 9

Bibliometric analysis on the co-occurrence of index keywords
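
The co-occurrence counting behind such a map can be sketched in a few lines. The following is our own simplified illustration (VOSviewer's actual implementation is more elaborate); the papers list stands in for the per-paper keyword lists extracted earlier, and the threshold matches the minimum of 3 occurrences used above.

```python
# Keyword co-occurrence sketch: count every unordered keyword pair that
# appears in the same paper, then keep pairs above a minimum threshold.
from collections import Counter
from itertools import combinations

# Hypothetical per-paper keyword lists standing in for the real dataset.
papers = [
    ["deep learning", "explainable ai", "medical imaging"],
    ["explainable ai", "shap", "electronic health records"],
    ["deep learning", "explainable ai", "shap"],
    ["deep learning", "explainable ai", "shap", "medical imaging"],
]

pair_counts = Counter()
for keywords in papers:
    # Each unordered pair of keywords in one paper is one co-occurrence.
    for a, b in combinations(sorted(set(keywords)), 2):
        pair_counts[(a, b)] += 1

MIN_OCCURRENCES = 3
for (a, b), n in pair_counts.most_common():
    if n >= MIN_OCCURRENCES:
        print(f"{a} <-> {b}: {n}")
```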

6 Analysis of the main papers

In this section, we focus on the 10 documents chosen using the selection criteria for the main documents (see Sect. 5). First, we provide a high-level analysis of the application domains and of the interpretability and explainability approaches used in the medical field. Then, we give an overview of the formalization of the problem of interpretability and explainability of AI algorithms (i.e. methodologies used, type of research) (research question RQ5). Next, we analyze the performance measures used for the evaluation of the results (research question RQ6). Finally, we evaluate the main challenges faced (research question RQ7). In Table 2 we indicate for each article (first column) the year of publication (second column), the number of citations (third column), the method underlying the proposed technique (fourth column), and the dataset used for the analysis (fifth column).

The most important application domains are shown in Fig.  7 . In particular, they include COVID-19, Alzheimer’s disease, cardiac disease, electrocardiograms, brain and breast cancer.

Table 2 also shows the main techniques used, which are neural networks [ 34 , 35 , 36 , 37 ] (such as CNNs, LSTMs and Boltzmann machines) [ 38 , 39 ], followed by other machine learning algorithms (such as k-nearest neighbours, logistic regression, naïve Bayes, random forest and support vector machines).

6.1 RQ5: What are the most used interpretability and explainability algorithms?

The focus of this research question is on technical aspects that may prove beneficial for practitioners in gaining insight into the environments explored by the authors. Table 2 encapsulates the problem formulation details extracted from the chosen articles. Each paper is scrutinized to discern the interpretability and explainability techniques employed. Additionally, insights are furnished regarding the datasets utilized in the experiments, delineating between real and synthetic data. It is important to note that not all documents explicitly provide this information; consequently, instances where the dataset information is unspecified are denoted with "N/A". The most used techniques are:

Decision support systems (DSS): capable of converting the output of these algorithms into comprehensible graphics. The aspects that have the greatest impact on decisions are highlighted using graphs, heat maps, and other visual representations. A well-designed DSS also provides a user interface that is simple to use even for those without a thorough understanding of machine learning [ 7 , 38 ].

Gradient-weighted Class Activation Mapping (Grad-CAM): useful for highlighting the parts of an image given as input to a CNN that have most influenced the network's decision in recognizing a specific class of objects. It evaluates the gradients of the network's output for the class under examination with respect to the feature maps of the last convolutional layer. These gradients tell us how much each part of the image contributed to the decision and are used to weigh the corresponding feature maps. The result is an activation map that can be overlaid on the original image; this map visually indicates which regions of the image were crucial for the network in making its decision [ 34 ] (a minimal sketch follows this list).

SHapley Additive exPlanations (SHAP): an advanced explainability technique, based on cooperative game theory, that helps us decompose and understand complex model decisions. To evaluate the importance of a variable, SHAP performs random permutations of the variables, evaluating how the model's predictions change compared to the original input. Each value represents how much each variable contributes on average to the model's predictions. The final result is a SHAP plot, which shows, for each prediction of the model, how much each variable influenced the decision [ 36 , 40 ] (a minimal sketch follows this list).

Local Interpretable Model-agnostic Explanations (LIME): to explain a specific prediction of the model, LIME selects a sample of data similar to the instance we are examining. Then, it introduces perturbations, or small random changes, to the features of this instance to create a "perturbed" data set. It then uses the machine learning model to make predictions on these perturbed instances, and the resulting predictions are weighted based on how similar each perturbed instance is to the original (see the sketch in Sect. 2) [ 40 ].
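
The two attribution techniques above can be sketched briefly. First, Grad-CAM: the following minimal function (our own illustration, assuming a trained Keras CNN, an input image array, and the name of its last convolutional layer) computes the class-discriminative activation map described above.

```python
# Grad-CAM sketch: weigh the last conv layer's feature maps by the
# spatially averaged gradients of the class score, then apply ReLU.
# `model`, `image` and the layer name are assumptions of this example.
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name, class_index=None):
    # Map the input both to the last conv activations and to the output.
    grad_model = tf.keras.models.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))
        class_score = preds[:, class_index]
    # Gradients of the class score w.r.t. each feature-map activation.
    grads = tape.gradient(class_score, conv_out)
    # Channel importance = mean gradient over batch and spatial axes.
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    # Weighted sum of the feature maps, keeping positive evidence only.
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    # Normalize to [0, 1] so the map can be overlaid on the input image.
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```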
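Second, SHAP: the following minimal sketch (assuming the third-party shap package is installed; the model and dataset are illustrative) computes Shapley-value attributions for a tree ensemble and draws the global summary plot mentioned above.

```python
# SHAP sketch: per-feature Shapley attributions for a tree ensemble,
# summarized globally. Model and dataset are illustrative assumptions.
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

# TreeExplainer computes Shapley values efficiently for tree models.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(data.data)

# Global view: how strongly, and in which direction, each feature
# pushes the model's predictions across the whole dataset.
shap.summary_plot(shap_values, data.data, feature_names=data.feature_names)
```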

6.2 RQ6: What metrics were used to evaluate performance?

This research question aims to provide an overview of the metrics used to measure the performance of the algorithms utilized in the 10 selected papers. In the second column of Table 3 we report the metrics found in the articles. As Table 3 shows, performance measures vary widely depending on the application domain and the objective of the method proposed in each article. The most commonly used metrics are the following (a short computational sketch follows the list):

Accuracy: an indicator of how well the model can correctly classify instances in the dataset, computed as the ratio of correct predictions to total predictions. In ML and DL algorithms, accuracy indicates how close measurements are to the true value [ 41 ].

Precision: a measure that provides information on the quality of the positive predictions made by a model. Precision is calculated as the ratio of true positives to the sum of true positives and false positives, that is, how precise the model is in declaring objects positive [ 42 ].

Recall: indicates the ability of a model to correctly identify all the positive instances present in a dataset. Recall is obtained by dividing the number of true positives by the sum of true positives and false negatives. In essence, recall gives us an idea of how sensitive the model is in detecting positive examples while trying to minimize false negatives [ 43 ].

AUC: evaluates the discrimination capacity of a classification model. The AUC represents the area under the ROC curve and measures how well the model can distinguish between positive and negative classes [ 44 ].
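
As a compact illustration, the sketch below computes the four metrics above with scikit-learn on a toy set of labels and scores; the numbers are invented for the example.

```python
# Computing accuracy, precision, recall and AUC with scikit-learn.
# The labels, predictions and scores below are illustrative only.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                   # ground-truth labels
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]                   # hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1]   # predicted probabilities

print("accuracy :", accuracy_score(y_true, y_pred))   # correct / total
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("AUC      :", roc_auc_score(y_true, y_score))   # area under ROC
```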

6.3 RQ7: What were the challenges addressed?

Artificial intelligence (AI) in the medical field has revolutionized the approach to the diagnosis, treatment and management of pathologies, while posing new challenges in personalizing care. This innovative panorama, as already specified in the previous sections, raises crucial questions about the transparency, interpretability and ethical accountability of AI predictive and decision-making models. To address these challenges, we have identified and reviewed 10 papers in the literature that employ cutting-edge methodologies. From rapidly identifying COVID-19 using X-ray images to enhancing the explainability of AI-driven clinical decision support systems, these studies provide a comprehensive overview of the advancements and obstacles in the integration of artificial intelligence in health and medicine. In the third column of Table 3 we summarize the challenges discussed in these papers. As with the performance metrics, Table 3 illustrates the wide range of challenges encountered, influenced by the specific application context and objectives of each method proposed in the literature. Below we explore in detail all the papers taken into consideration.

Brunese et al. [ 34 ] address their challenge by proposing the use of deep learning for the automatic and rapid diagnosis of coronavirus disease (COVID-19) from chest X-rays. The proposed methodology consists of three phases: the first detects pneumonia, the second distinguishes between pneumonia and COVID-19, and the last localizes areas indicative of COVID-19 on the X-ray. Experimental results on 6,523 chest X-rays showed effectiveness, with an average time to detect COVID-19 infection of 2.5 s and an average accuracy of 97%. The approach uses transfer learning with the VGG-16 model and achieves 96% accuracy in differentiating healthy patients from those with common lung diseases, and 98% accuracy in detecting COVID-19.

Amann et al. [ 7 ] consider the problem of explainability in the field of AI in the healthcare sector, focusing on AI-based clinical decision support systems. This study takes a multidisciplinary approach by approaching from technical, legal, medical and patient perspectives and analyzes the importance of the explainability of AI in medicine. The findings highlight the risks to individual and public health when explainability is omitted from clinical decision support systems.

Ghorbani et al. [ 35 ] propose the application of convolutional neural networks (CNNs) to echocardiographic images for cardiac analysis. The deep learning model used, called EchoNet, identifies local cardiac structures, estimates cardiac performance indicators and predicts systemic phenotypes such as age, gender, weight and height that can influence cardiovascular risk. The study suggests that integrating interpretive frameworks can help identify regions of interest, contribute to a better understanding of normal echocardiogram variations, and reveal features missed by clinicians.

Brinati et al. [ 38 ] suggest a novel strategy for identifying COVID-19-positive patients using machine learning models based on common blood tests. The authors created two machine-learning models that consider standard data from blood tests and elements like age and gender. The algorithms’ accuracy ranged from 82% to 86%, while their sensitivity ranged from 92% to 95%. The model’s interpretability is based on a decision tree that also received medical approval, demonstrating the reliability of the chosen characteristics.

Lamy et al. [ 39 ] present a technique for breast cancer that uses Case-Based Reasoning (CBR) and functions as both an automatically executable algorithm and a graphical user interface for explanations. In contrast to "black box" methods, CBR allows for easily justified results, using analogous cases as examples. The visual interface uses a scatter plot based on multidimensional scaling (MDS) dimension reduction, combining polar-MDS scatter plots with "rainbow boxes" to transform the CBR problem into a "colour dominance" challenge.

Thorsen-Meyer et al. [ 36 ] use machine learning models based on temporal data, enabling real-time predictions, to enhance the 90-day mortality prognosis for ICU patients. An LSTM neural network model is trained with a time resolution of one hour. With the Matthews correlation coefficient and the area under the ROC curve increasing from 0.29 and 0.73 at admission to 0.57 and 0.88 at discharge, the results demonstrate that predictive performance increases over time. Data from a fifth hospital are used to externally validate the model. To explain predictions, the Shapley algorithm is used to pinpoint the traits that influence predictions at various time steps.

Elshawi et al. [ 40 ] discuss how to improve the interpretability of machine learning models, particularly a random forest model that uses cardiorespiratory fitness data to predict the risk of developing hypertension. Different model-agnostic explanation strategies, categorized as global and local interpretation strategies, are used. The authors point out that although local interpretations concentrate on particular cases, global interpretations aid clinicians in comprehending the overall conditional distribution described by the response function, and that, depending on the demands of the application, both global and local techniques may be appropriate. LIME offers local explanations based on perturbed data points and local regression models. Even though it allows doctors to make judgments about how changes in a patient's characteristics affect the prediction, LIME is criticized for its instability and for the linearity assumed in the local model. Shapley value prediction, on the other hand, fairly distributes the difference between the model's prediction and the average prediction among the feature values. Despite providing an equitable distribution of contributions, it is computationally expensive and requires access to the data used to train the model.

Tran et al. [ 37 ] describe a computational framework that makes use of a non-negative restricted Boltzmann machine to exploit electronic medical record data with little to no human oversight. By embedding medical objects in a low-dimensional vector space, this framework produces a new representation of those objects. This model enables algebraic and statistical operations including item grouping, risk stratification, and projection onto a 2D plane. To enhance the model's interpretability, two constraints are imposed on the model parameters: (a) non-negative coefficients, and (b) structural regularity. The generated representation aids in short-term risk classification and displays clusters of traits that are clinically significant.

Halpern et al. [ 45 ] concentrate on the value of Electronic Medical Records (EMRs) in identifying the best methods for patient care. The suggested approach aims to effectively acquire statistically driven phenotypes with little manual assistance. To support clinical choices in real time, a phenotype library was created that represents patients using structured and unstructured data from EMRs. Eight of these phenotypes were assessed on retrospective data from emergency department patients and compared against prospectively acquired baseline data. The findings demonstrate that the resulting phenotypes are interpretable, quick to construct, and perform as well as phenotypes learnt statistically from a large number of manual labels.

Receiver Operating Characteristic (ROC) curves and the area under the ROC curve (AUC) are used by Carrington et al. [ 46 ] to evaluate the effectiveness of classification models and diagnostic tests. The authors focus in particular on the issue of imbalanced data, in which positive and negative classes are not equally represented. For ROC data, they propose a new concordant partial AUC and a new partial c statistic, which are crucial metrics and approaches for comprehending and interpreting specific ROC curves and AUC regions. These new partial measures are validated for their equivalence and are derived from the AUC and c statistic, respectively. They are examined using two real breast cancer datasets as well as a classic ROC example.

7 Discussion

This paper aims to evaluate the scientific community’s interest in applying AI interpretability and explainability methods to the medical field over the past decade. After analyzing publication trends from 2013 to 2023, we have observed a significant increase in interest, especially from 2019, with 974 articles indexed by Scopus and 164 indexed by WOS in 2023. Our study was based on 448 common articles. Our analysis included trends in publications, dissemination channels, contributing countries and collaboration networks. Significant contributors were identified, with the United States leading the list (114 articles), followed by China, Italy, and others.

To sharpen our conclusions, we focused our analysis on 10 selected documents. Major application domains include COVID-19, Alzheimer’s disease, cardiac disease, electrocardiograms, and brain and breast cancer, reflecting the impact of these topics on society. We also delved into the most used AI algorithms such as CNN, LSTM and the Boltzmann machine and the most used techniques for explainability and interpretability such as decision support systems, Grad-Cam, SHAP and LIME.

According to our research findings, the rapid integration of Machine Learning (ML) and Deep Learning (DL) techniques has led to growing concerns about these methodologies. There is an increasing apprehension regarding the complexity of AI algorithms and the absence of transparency in data mining and decision-making processes.

The literature analysis conducted in this study reveals that the majority of research on interpretability and explainability in AI algorithms within the medical field focuses on neural networks and machine learning algorithms. Many of the examined papers employ interpretability techniques for AI algorithms, with several of them emphasizing the close connection between interpretability and explainability [ 10 ].

Indeed, interpretability and explainability are frequently treated as interchangeable terms. The scientific community needs to establish clear definitions and an unambiguous vocabulary to facilitate the transfer of results and information more effectively. AI models currently lack a formal structure that would assist users in gaining confidence in the employed techniques.

To the best of our knowledge, the scientific community is currently working along the following directions to improve the explainability and interpretability of algorithms:

Built-in interpretability: ML and DL algorithms are designed so that they are inherently more interpretable and understandable. The goal is to make clear the reasons behind the decisions made by a model, thus reducing the opacity of complex models, while seeking a balance between interpretability and performance [ 47 ].

Self-interpretable models: this involves the creation of architectures or models that are understandable in themselves, without the need for post hoc techniques. These models aim to provide transparency in their decisions right from the design and training phase [ 48 , 49 ].

Post hoc explainability methods: techniques used to interpret and explain already trained AI models, which may be complex and difficult to understand. These methods attempt to provide retroactive explanations of the decisions made by a model, without affecting its original learning process [ 50 ].

Stakeholder involvement: to ensure greater acceptance and understanding, there is growing interest in involving doctors, patients and other stakeholders in the design and interpretation of models [ 51 ].

8 Conclusion

Medical diagnosis, therapy, and decision-making rely heavily on ML and DL algorithms, yet the increasing complexity of these algorithms presents challenges, particularly in balancing high performance with interpretability. Transparent decision-making, ethical and regulatory compliance, and trust from healthcare and medical providers are essential for deploying AI models in these domains. Accuracy is paramount, prompting careful consideration of whether to prioritize accuracy or explainability and interpretability, given the contrast between interpretable white-box and non-interpretable black-box techniques.

Interpretability and explainability are crucial for establishing user trust, ensuring legal and ethical compliance, and fostering broader social acceptance. They transcend technical concerns, becoming moral and practical imperatives in the medical environment, where algorithmic decisions have significant effects. Assessing prior research highlights the demand for explainable and interpretable AI algorithms in medicine. Despite progress in tools such as visualization, feature interpretation, and model complexity reduction, challenges persist. These include the opacity of large models, trade-offs between performance and interpretability, and biases in training data.

The ethical significance of AI in healthcare and medicine cannot be ignored. Addressing concerns such as bias, privacy, security, and legal liability is essential to ensure the ethical use of AI. On the other hand, the psychological impact of algorithmic judgments on users, especially in healthcare and medicine, is linked to clear communication about the capabilities and limitations of artificial intelligence systems. Takeaway lessons include the need for a strong commitment to transparency, fairness, and patient-centred design of AI models in health and medicine, as well as the establishment of precise criteria and standards to enhance trust among users and stakeholders.

Data availability

Our data have been freely downloaded from Scopus and Web of Science using the keywords mentioned in the paper. The dataset used in our analysis can be easily replicated.

London AJ. Artificial intelligence and black-box medical decisions: accuracy versus explainability. Hastings Cent Rep. 2019;49(1):15–21.

Hakkoum H, Abnane I, Idri A. Interpretability in the medical field: a systematic mapping and review study. Appl Soft Comput. 2022;117: 108391.

Loyola-Gonzalez O. Black-box vs. white-box: understanding their advantages and weaknesses from a practical point of view. IEEE Access. 2019;7:154096–113.

Kolasinska A, Lauriola I, Quadrio G. Do people believe in artificial intelligence? a cross-topic multicultural study. In Proceedings of the 5th EAI International Conference on Smart Objects and Technologies for Social Good, 2019:31–6.

Gilvary C, Madhukar N, Elkhader J, Elemento O. The missing pieces of artificial intelligence in medicine. Trends Pharmacol Sci. 2019;40(8):555–64.

General Data Protection Regulation (GDPR). Intersoft Consulting. Accessed October 2018.

Amann J, Blasimme A, Vayena E, Frey D, Madai VI. Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC Med Inform Decis Mak. 2020;20(1):1–9.

Phillips PJ, Hahn CA, Fontana PC, Broniatowski DA, Przybocki MA. Four principles of explainable artificial intelligence, vol. 18. Gaithersburg: National Institute of Standards and Technology; 2020.

Nassih Rym, Berrado Abdelaziz. State of the art of fairness, interpretability and explainability in machine learning: Case of prim. In Proceedings of the 13th International Conference on Intelligent Systems: Theories and Applications, 2020:1–5.

Linardatos P, Papastefanopoulos V, Kotsiantis S. Explainable AI: a review of machine learning interpretability methods. Entropy. 2020;23(1):18.

Alicioglu G, Sun B. A survey of visual analytics for explainable artificial intelligence methods. Comput Graph. 2022;102:502–20.

Shickel B, Tighe PJ, Bihorac A, Rashidi P. Deep EHR: a survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE J Biomed Health Inform. 2017;22(5):1589–604.

Shashanka M, Raj B, Smaragdis P. Sparse overcomplete latent variable decomposition of counts data. Adv Neural Inform Process Syst, 2007;20.

Ribeiro MT, Singh S, Guestrin C. ”why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 2016:1135–44.

Langer M, Oster D, Speith T, Hermanns H, Kästner L, Schmidt E, Sesing A, Baum K. What do we want from explainable artificial intelligence (XAI)?—a stakeholder perspective on XAI and a conceptual model guiding interdisciplinary XAI research. Artif Intell. 2021;296: 103473.

Murdoch WJ, Singh C, Kumbier K, Abbasi-Asl R, Yu B. Definitions, methods, and applications in interpretable machine learning. Proc Natl Acad Sci. 2019;116(44):22071–80.

Combi C, Amico B, Bellazzi R, Holzinger A, Moore JH, Zitnik M, Holmes JH. A manifesto on explainability for artificial intelligence in medicine. Artif Intell Med. 2022;133: 102423.

von Eschenbach WJ. Transparency and the black box problem: why we do not trust ai. Philos Technol. 2021;34(4):1607–22.

Arrieta AB, Díaz-Rodríguez N, Del Ser J, Bennetot A, Tabik S, Barbado A, García S, Gil-López S, Molina D, Benjamins R, et al. Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible ai. Inf Fus. 2020;58:82–115.

Biran O, Cotton C. Explanation and justification in machine learning: a survey. In IJCAI-17 workshop on explainable AI (XAI), 2017;8:8–13.

Guidotti R, Monreale A, Ruggieri S, Turini F, Giannotti F, Pedreschi D. A survey of methods for explaining black box models. ACM Comput Surv (CSUR). 2018;51(5):1–42.

Miller T. Explanation in artificial intelligence: insights from the social sciences. Artif Intell. 2019;267:1–38.

Tjoa E, Guan C. A survey on explainable artificial intelligence (XAI): Toward medical XAI. IEEE Trans Neural Netw Learn Syst. 2020;32(11):4793–813.

Stiglic G, Kocbek P, Fijacko N, Zitnik M, Verbert K, Cilar L. Interpretability of machine learning-based prediction models in healthcare. Wiley Interdiscip Rev Data Min Knowl Discov. 2020;10(5): e1379.

Mehrabi N, Morstatter F, Saxena N, Lerman K, Galstyan A. A survey on bias and fairness in machine learning. ACM Comput Surv(CSUR). 2021;54(6):1–35.

Chakrobartty S, El-Gayar O. Explainable artificial intelligence in the medical domain: a systematic review. 2021.

Hatherley J, Sparrow R, Howard M. The virtues of interpretable medical artificial intelligence. Camb Q Healthc Ethics, 2022:1–10.

Farah L, Murris JM, Borget I, Guilloux A, Martelli NM, Katsahian SIM. Assessment of performance, interpretability, and explainability in artificial intelligence-based health technologies: what healthcare stakeholders need to know. Mayo Clin Proc. 2023;1(2):120–38.

Ali S, Abuhmed T, El-Sappagh S, Muhammad K, Alonso-Moral JM, Confalonieri R, Guidotti R, Del Ser J, Díaz-Rodríguez N, Herrera F. Explainable artificial intelligence (XAI): What we know and what is left to attain trustworthy artificial intelligence. Inf Fusion. 2023;99: 101805.

Band SS, Yarahmadi A, Hsu C-C, Biyari M, Sookhak M, Ameri R, Dehzangi I, Chronopoulos AT, Liang H-W. Application of explainable artificial intelligence in medical health: A systematic review of interpretability methods. Inform Med Unlocked. 2023;40: 101286.

Ballew BS. Elsevier’s scopus® database. J Electron Resour Med Libr. 2009;6(3):245–52.

Drake M. Encyclopedia of library and information science, vol. 1. Boca Raton: CRC Press; 2003.

Van Eck N, Waltman L. Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics. 2010;84(2):523–38.

Brunese L, Mercaldo F, Reginelli A, Santone A. Explainable deep learning for pulmonary disease and coronavirus covid-19 detection from x-rays. Comput Methods Programs Biomed. 2020;196: 105608.

Ghorbani A, Ouyang D, Abid A, He B, Chen JH, Harrington RA, Liang DH, Ashley EA, Zou JY. Deep learning interpretation of echocardiograms. NPJ Digit Med. 2020;3(1):10.

Thorsen-Meyer H-C, Nielsen AB, Nielsen AP, Kaas-Hansen BS, Toft P, Schierbeck J, Strøm T, Chmura PJ, Heimann M, Dybdahl L, et al. Dynamic and explainable machine learning prediction of mortality in patients in the intensive care unit: a retrospective study of high-frequency data in electronic patient records. Lancet Digit Health. 2020;2(4):e179–91.

Tran T, Luo W, Phung D, Harvey R, Berk M, Kennedy RL, Venkatesh S. Risk stratification using data from electronic medical records better predicts suicide risks than clinician assessments. BMC Psychiatry. 2014;14(1):76.

Brinati D, Campagner A, Ferrari D, Locatelli M, Banfi G, Cabitza F. Detection of covid-19 infection from routine blood exams with machine learning: a feasibility study. J Med Syst. 2020;44:1–12.

Lamy J-B, Sekar B, Guezennec G, Bouaud J, Séroussi B. Explainable artificial intelligence for breast cancer: a visual case-based reasoning approach. Artif Intell Med. 2019;94:42–53.

Elshawi R, Al-Mallah MH, Sakr S. On the interpretability of machine learning-based model for predicting hypertension. BMC Med Inform Decis Mak. 2019;19(1):1–32.

Menditto A, Patriarca M, Magnusson B. Understanding the meaning of accuracy, trueness and precision. Accredit Qual Assur. 2007;12:45–7.

Article   CAS   Google Scholar  

Prenesti E, Gosmaro F. Trueness, precision and accuracy: a critical overview of the concepts as well as proposals for revision. Accredit Qual Assur. 2015;20:33–40.

Buckland M, Gey F. The relationship between recall and precision. J Am Soc Inform Sci. 1994;45(1):12–9.

Huang J, Ling CX. Using AUC and accuracy in evaluating learning algorithms. IEEE Trans Knowl Data Eng. 2005;17(3):299–310.

Halpern Y, Horng S, Choi Y, Sontag D. Electronic medical record phenotyping using the anchor and learn framework. J Am Med Inform Assoc. 2016;23(4):731–40.

Carrington AM, Fieguth PW, Qazi H, Holzinger A, Chen HH, Mayr F, Manuel DG. A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms. BMC Med Inform Decis Mak. 2020;20:1–12.

Mariotti E, Moral JMA, Gatt A. Exploring the balance between interpretability and performance with carefully designed constrainable neural additive models. Inf Fus. 2023;99: 101882.

Ashwath VA, Sikha OK, Benitez R. TS-CNN: a three-tier self-interpretable CNN for multi-region medical image classification. IEEE Access; 2023.

La Rosa B, Capobianco R, Nardi D. A self-interpretable module for deep image classification on small data. Appl Intell. 2023;53(8):9115–47.

Dwivedi R, Dave D, Naik H, Singhal S, Omer R, Patel P, Qian B, Wen Z, Shah T, Morgan G, et al. Explainable AI (XAI): core ideas, techniques, and solutions. ACM Comput Surv. 2023;55(9):1–33.

Anwar SM. Expert systems for interpretable decisions in the clinical domain. In: Byrne MF, Parsa N, Greenhill AT, Chahal D, Ahmad O, Bagci U, editors. AI in clinical medicine: a practical guide for healthcare professionals. Hoboken: Wiley Online Library; 2023. p. 66–72.

Chapter   Google Scholar  

Cho B-J, Choi YJ, Lee M-J, Kim JH, Son G-H, Park S-H, Kim H-B, Joo Y-J, Cho H-Y, Kyung MS, et al. Classification of cervical neoplasms on colposcopic photography using deep learning. Sci Rep. 2020;10(1):1–10.

Ozturk T, Talo M, Yildirim EA, Baloglu UB, Yildirim O, Acharya UR. Automated detection of covid-19 cases using deep neural networks with x-ray images. Comput Biol Med. 2020;121: 103792.

Article   CAS   PubMed   PubMed Central   Google Scholar  

Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. Chestx-ray8: hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2017:2097–106.

Tran T, Nguyen TD, Phung D, Venkatesh S. Learning vector representation of medical objects via EMR-driven nonnegative restricted Boltzmann machines (eNRBM). J Biomed Inform. 2015;54:96–105.

Tran T, Phung D, Luo W, Venkatesh S. Stabilized sparse ordinal regression for medical risk stratification. Knowl Inform Syst. 2015;43:555–82.

Download references

Author information

Authors and affiliations.

Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy

Maria Frasca, Gabriella Pravettoni & Ilaria Cutica

SKEMA Business School, Université Côte d’Azur, Sophia Antipolis, Nice, France

Davide La Torre

Applied Research Division for Cognitive and Psychological Science, IEO, European Institute of Oncology, Milano, Italy

Gabriella Pravettoni

You can also search for this author in PubMed   Google Scholar

Contributions

The authors contributed equally to this paper. All authors reviewed the manuscript.

Corresponding author

Correspondence to Davide La Torre .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Frasca, M., La Torre, D., Pravettoni, G. et al. Explainable and interpretable artificial intelligence in medicine: a systematic bibliometric review. Discov Artif Intell 4 , 15 (2024). https://doi.org/10.1007/s44163-024-00114-7

Download citation

Received : 31 October 2023

Accepted : 23 February 2024

Published : 27 February 2024

DOI : https://doi.org/10.1007/s44163-024-00114-7

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Find a journal
  • Publish with us
  • Track your research

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • View all journals
  • Explore content
  • About the journal
  • Publish with us
  • Sign up for alerts
Review Article
Published: 20 January 2022

AI in health and medicine

Pranav Rajpurkar, Emma Chen, Oishi Banerjee & Eric J. Topol

Nature Medicine 28, 31–38 (2022). https://doi.org/10.1038/s41591-021-01614-0


Artificial intelligence (AI) is poised to broadly reshape medicine, potentially improving the experiences of both clinicians and patients. We discuss key findings from a 2-year weekly effort to track and share notable developments in medical AI. We cover prospective studies and advances in medical image analysis, which have reduced the gap between research and deployment. We also address several promising avenues for novel medical AI research, including non-image data sources, unconventional problem formulations and human–AI collaboration. Finally, we consider serious technical and ethical challenges spanning issues from data scarcity to racial bias. As these challenges are addressed, AI's potential may be realized, making healthcare more accurate, efficient and accessible for patients worldwide.



