Rapid Communication | Volume 4, 100096, 2023

Machine learning analysis of confounding variables of a convolutional neural network specific for abdominal aortic aneurysms

  • Roger T. Tomihama
    Affiliations
    Department of Radiology, Section of Vascular and Interventional Radiology, Loma Linda University School of Medicine, Loma Linda, CA
  • Justin R. Camara
    Affiliations
    Department of Radiology, Section of Vascular and Interventional Radiology, Loma Linda University School of Medicine, Loma Linda, CA
  • Sharon C. Kiang
    Correspondence
    Correspondence: Sharon C. Kiang, MD, Department of Surgery, Division of Vascular Surgery, Loma Linda University School of Medicine, 11175 Campus St, Ste 21123, Loma Linda, CA 92350
    Affiliations
    Department of Surgery, Division of Vascular Surgery, Loma Linda University School of Medicine, Loma Linda, CA

    Department of Surgery, Division of Vascular Surgery, VA Loma Linda Healthcare System, Loma Linda, CA
Open Access | Published: January 13, 2023 | DOI: https://doi.org/10.1016/j.jvssci.2022.11.004

      Abstract

      Objective

      To identify confounding variables influencing the accuracy of a convolutional neural network (CNN) specific for infrarenal abdominal aortic aneurysms (AAAs) on computed tomography angiograms (CTAs).

      Methods

      A Health Insurance Portability and Accountability Act-compliant, institutional review board-approved, retrospective study analyzed abdominopelvic CTA scans from 200 patients with infrarenal AAAs and 200 propensity-matched control patients. An AAA-specific CNN was developed by applying transfer learning to the VGG-16 base model, with separate training, validation, and testing subsets. Model accuracy and area under the curve were analyzed based on data sets (selected, balanced, or unbalanced), aneurysm size, extra-abdominal extension, dissections, and mural thrombus. Misjudgments were analyzed by review of heatmaps, generated via gradient-weighted class activation mapping, overlaid on CTA images.

      Results

      The trained custom CNN model reported high test group accuracies of 94.1%, 99.1%, and 99.6% and area under the curve of 0.9900, 0.9998, and 0.9993 in selected (n = 120), balanced (n = 3704), and unbalanced image sets (n = 31,899), respectively. Despite an eightfold difference between balanced and unbalanced image sets, the CNN model demonstrated high test group sensitivities (98.7% vs 98.9%) and specificities (99.7% vs 99.3%) in unbalanced and balanced image sets, respectively. For aneurysm size, the CNN model demonstrates decreasing misjudgments as aneurysm size increases: 47% (16/34) for aneurysms <3.3 cm, 32% (11/34) for aneurysms 3.3 to 5 cm, and 20% (7/34) for aneurysms >5 cm. Aneurysms containing measurable mural thrombus were over-represented within type II (false-negative) misjudgments compared with type I (false-positive) misjudgments (71% vs 15%, P < .05). Inclusion of extra-abdominal aneurysm extension (thoracic or iliac artery) or dissection flaps in these imaging sets did not decrease the model's overall accuracy, indicating that the model performance was excellent without the need to clean the data set of confounding or comorbid diagnoses.

      Conclusions

      An AAA-specific CNN model can accurately screen for and identify infrarenal AAAs on CTA despite varying pathology and data set sizes. Misjudgments were most frequent with small aneurysms (<3.3 cm) or in the presence of mural thrombus. Accuracy of the CNN model is maintained despite the inclusion of extra-abdominal pathology and imbalanced data sets.

      Keywords

      Artificial intelligence (AI) aims to develop machine learning systems that demonstrate the properties of human intelligence.
      A convolutional neural network (CNN), a class of deep learning model within AI, has been spotlighted in medical imaging for solving computer-based visual tasks (ie, image analysis, object identification, categorization, and segmentation). The application of a CNN has been investigated in a wide range of medical fields and could potentially lead to the development of new approaches for the diagnosis, prognosis, or treatment of patients.
      • Rajkomar A.
      • Dean J.
      • Kohane I.
      Machine learning in medicine.
      ,
      • Ngiam K.Y.
      • Khor I.W.
      Big data and machine learning algorithms for health-care delivery.
      Abdominal aortic aneurysm (AAA) rupture represents a life-threatening disease.
      • Nordon I.M.
      • Hinchliffe R.J.
      • Loftus I.M.
      • Thompson M.M.
      Pathophysiology and epidemiology of abdominal aortic aneurysms.
      ,
      • Golledge J.
      • Muller J.
      • Daugherty A.
      • Norman P.
      Abdominal aortic aneurysm: pathogenesis and implications for management.
      In the era of personalized medicine and big data analytics, AI-based AAA imaging programs have the potential to predict personalized patient outcomes.
      • Rajkomar A.
      • Dean J.
      • Kohane I.
      Machine learning in medicine.
      ,
      • Ngiam K.Y.
      • Khor I.W.
      Big data and machine learning algorithms for health-care delivery.
      ,
      • Lareyre F.
      • Adam C.
      • Carrier M.
      • Dommerc C.
      • Mialhe C.
      • Raffort J.
      A fully automated pipeline for mining abdominal aortic aneurysm using image segmentation.
      • Raffort J.
      • Adam C.
      • Carrier M.
      • Ballaith A.
      • Coscas R.
      • Jean-Baptiste E.
      • et al.
      Artificial intelligence in abdominal aortic aneurysm.
      • Dey D.
      • Slomka P.J.
      • Leeson P.
      • Comaniciu D.
      • Shrestha S.
      • Sengupta P.P.
      • et al.
      Artificial intelligence in cardiovascular imaging: JACC state-of-the-art review.
      One of the biggest challenges with deep learning models in medical imaging is identifying the variables that influence their accuracy and generalizability across institutions. CNN models amplify the aspects of the input that are important for discrimination and suppress irrelevant variations (ie, normal variants, confounding pathology, and small data sets).
      • Panch T.
      • Szolovits P.
      • Atun R.
      Artificial intelligence, machine learning and health systems.
      ,
      • Miotto R.
      • Wang F.
      • Wang S.
      • Jiang X.
      • Dudley J.T.
      Deep learning for healthcare: review, opportunities and challenges.
      Thus, the quality and applicability of AI algorithms across institutions may be questionable without extensive testing and subanalysis.
      The goal of this study is to analyze confounding variables influencing the accuracy of a newly developed CNN specific for detecting infrarenal AAAs. Our previous machine learning model (without segmentation training/programming) automatically detects the presence of an infrarenal AAA in various locations and sizes with nearly 99% accuracy. The output of the model is a binary classifier that automatically recognizes the presence or absence of an AAA on computed tomography angiograms (CTAs) of the abdomen and pelvis.
      • Camara J.R.
      • Tomihama R.T.
      • Pop A.
      • Shedd M.P.
      • Dobrowski B.S.
      • Knox C.J.
      • et al.
      Development of a convolutional neural network to detect abdominal aortic aneurysms.
      The AI model accuracy will be analyzed on simulated real-world confounding variables such as data set size (selected, balanced, or unbalanced), aneurysm size, extra-abdominal extension, dissections, and mural thrombus.

      Methods

      Study population

      The local institutional review board approved this Health Insurance Portability and Accountability Act-compliant study and waived the requirement for written informed consent. A retrospective review of the hospital’s internal radiology database (mPower Clinical Analytics; Nuance Communications, Inc) identified 4821 CTA scans of the abdomen and pelvis performed between January 2015 and January 2020. Within this group, 398 CTAs of the abdomen and pelvis reported the presence of an aortic aneurysm. These examinations were individually reviewed for the presence of infrarenal AAAs (diameter >3.0 cm). From this group, 68 cases were excluded because of ruptured aneurysm, absence of an infrarenal AAA, prior repair of an infrarenal AAA, image nonavailability in the picture archiving and communication system and/or protocol errors (absence of intravenous [IV] contrast material, etc). Subsequently, 200 CTA scans containing infrarenal AAAs were identified. Clinical and demographic data (ie, date of birth, patient gender, presence or absence of hypertension, history of tobacco use, and scanner type) were collected from the medical record system. For the development of a propensity-matched control group, analysis of the 4821 CTA scans of the abdomen and pelvis identified 200 propensity-matched nonaneurysmal aorta control patients who were selected based on similar demographics, comorbidities, and technical imaging factors of the study group.

      Convolutional neural network model development

      As described in our prior work, for the initial CNN model development, axial reconstructions from all selected CT scans were exported in noncompressed JPEG format at preset window widths and levels.
      • Camara J.R.
      • Tomihama R.T.
      • Pop A.
      • Shedd M.P.
      • Dobrowski B.S.
      • Knox C.J.
      • et al.
      Development of a convolutional neural network to detect abdominal aortic aneurysms.
      All axial reconstruction images were resized to 512 × 512 pixels. A total of 6175 axial images containing infrarenal AAAs were sorted. A total of 100,249 axial nonaneurysmal images were sorted. The aneurysm set was randomized to 60% training (n = 3705), 10% validation (n = 618), and 30% testing (n = 1852) subsets. A nonaneurysm set was generated through sampling of nonaneurysm axial reconstruction images at fixed intervals. The nonaneurysm set was randomized to 60% training (n = 3705), 10% validation (n = 618), and 30% testing (n = 1852) subsets.
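      The fixed-interval sampling and the 60%/10%/30% randomization described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the round-half-up rule, the stride computation, and the random seed are assumptions, and only the image counts (6175 and 100,249) come from the text.

```python
import random


def split_counts(n_images, train_pct=60, val_pct=10):
    """Return (train, validation, test) sizes; the test subset takes the remainder."""
    n_train = (n_images * train_pct + 50) // 100  # integer round-half-up (assumed rule)
    n_val = (n_images * val_pct + 50) // 100
    return n_train, n_val, n_images - n_train - n_val


def sample_fixed_interval(n_available, n_wanted):
    """Pick n_wanted indices from range(n_available) at a constant stride."""
    stride = n_available // n_wanted
    return list(range(0, n_available, stride))[:n_wanted]


def randomized_split(items, seed=0):
    """Shuffle items and cut them into training, validation, and testing subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary choice
    n_train, n_val, _ = split_counts(len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


# Counts taken from the text: 6175 aneurysm images, 100,249 nonaneurysm images.
aneurysm_sizes = split_counts(6175)
control_indices = sample_fixed_interval(100_249, 6175)
train, val, test = randomized_split(control_indices)
print(aneurysm_sizes, len(train), len(val), len(test))
```

      Under these assumptions the subset sizes come out as 3705, 618, and 1852, the values reported for both the aneurysm and nonaneurysm sets.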
      The VGG-16 neural network architecture was selected for its robust performance in a variety of image recognition tasks.
      • Guan Q.
      • Wang Y.
      • Ping B.
      • Li D.
      • Du J.
      • Qin Y.
      • et al.
      Deep convolutional neural network VGG-16 model for differential diagnosing of papillary thyroid carcinomas in cytological images: a pilot study.
      • Yoon H.J.
      • Kim S.
      • Kim J.-H.
      • Keum J.-S.
      • Oh S.-I.
      • Jo J.
      • et al.
      A lesion-based convolutional neural network improves endoscopic detection and depth prediction of early gastric cancer.
      • Lee K.-S.
      • Jung S.-K.
      • Ryu J.-J.
      • Shin S.-W.
      • Choi J.
      Evaluation of transfer learning with deep convolutional neural networks for screening osteoporosis in dental panoramic radiographs.
      • Tuzoff D.V.
      • Tuzova L.N.
      • Bornstein M.M.
      • Krasnov A.S.
      • Kharchenko M.A.
      • Nikolenko S.I.
      • et al.
      Tooth detection and numbering in panoramic radiographs using convolutional neural networks.
      • Geng L.
      • Zhang S.
      • Tong J.
      • Xiao Z.
      Lung segmentation method with dilated convolution based on VGG-16 network.
      Transfer learning was applied to the neural network using weights pretrained on ImageNet, an image database of over 14 million hand-labeled images in over 20,000 categories.
      ImageNet.
      Image augmentation was applied to the training set during model development using imgaug (version 0.2.5).
      imgaug. imgaug 0.4.0 documentation.
      ,
      • Huang D.
      • Feng M.
      Understanding deep convolutional networks for biomedical imaging: a practical tutorial.
      Normalization, rotation, flipping, width, height, zoom level, and shear intensity were varied. No segmentation of axial reconstruction images was performed. Stochastic gradient descent was employed as the model optimizer. The penultimate network layer consisted of a dense layer containing 1024 neurons. The last fully connected layer was connected to a logistic layer using a rectified linear unit as the activation function for binary output (infrarenal AAA or nonaneurysm). Initial learning rate, decay, and momentum were set to 1 × 10⁻³, 1 × 10⁻⁶, and 0.9, respectively. Gradient descent optimization was applied via Nesterov accelerated gradient. The model was trained for 40 epochs with batch sizes of 15 to stable convergence of the loss function in the validation set. To address class imbalance, the majority class (nonaneurysm) was undersampled to the same size as the minority class (infrarenal AAA). Model development and analysis were performed using Keras (version 2.4.3), TensorFlow (version 2.4.1), imgaug (version 0.2.5), Scipy (version 1.2.1), NumPy (version 1.8.2), scikit-learn (version 0.23.1), and Matplotlib (version 3.2.2). All experiments were performed on a computer equipped with an NVIDIA Quadro P5000 graphical processing unit with 16 GB GDDR5 video memory.
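      To make the optimizer settings concrete, the sketch below applies stochastic gradient descent with Nesterov momentum to a toy one-parameter problem, using the reported learning rate, decay, and momentum. It is an illustration only: the update rule and the time-based decay schedule are written the way the Keras SGD optimizer is commonly described, which is an assumption about the exact formulation used in the study, and the quadratic objective merely stands in for the network loss.

```python
def sgd_nesterov_step(w, v, grad, step, lr0=1e-3, decay=1e-6, momentum=0.9):
    """Apply one parameter update and return the new (w, v)."""
    lr = lr0 / (1.0 + decay * step)            # time-based learning-rate decay (assumed schedule)
    v_new = momentum * v - lr * grad           # velocity update
    w_new = w + momentum * v_new - lr * grad   # Nesterov look-ahead step
    return w_new, v_new


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3); it is hypothetical.
w, v = 0.0, 0.0
for step in range(20000):
    w, v = sgd_nesterov_step(w, v, 2.0 * (w - 3.0), step)
print(round(w, 4))
```

      With a decay of 1 × 10⁻⁶ the learning rate changes very little over a short run, so the momentum term dominates the behavior of the updates.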
      The model was assessed for overall diagnostic accuracy at the image level. Loss and accuracy of training and validation groups were plotted by epoch to observe for stable convergence of model performance (Fig 1).
      • Camara J.R.
      • Tomihama R.T.
      • Pop A.
      • Shedd M.P.
      • Dobrowski B.S.
      • Knox C.J.
      • et al.
      Development of a convolutional neural network to detect abdominal aortic aneurysms.
      Fig 1. The design and performance of the optimized CNN model for AAA. (A) Flowchart of the study process depicting patient selection and study design. (B) In the optimized CNN model, as the number of epochs increases, there is an appropriate reduction of the loss function and increase in overall accuracy demonstrating an optimal fitting for model performance.

      Machine learning analysis of data confounding variables influencing the accuracy of output

      Optimization of the model included randomization to sets of 60%, 10%, and 30% for model training, validation, and testing, respectively. A total of 6175 axial images containing infrarenal AAAs were sorted. A total of 100,249 axial nonaneurysmal images were sorted. The aneurysm set was randomized to 60% training (n = 3705), 10% validation (n = 618), and 30% testing (n = 1852) subsets. A numerically balanced nonaneurysm set was generated through the sampling of nonaneurysm axial reconstruction images at fixed intervals. The balanced nonaneurysm set was randomized to 60% training (n = 3705), 10% validation (n = 618), and 30% testing (n = 1852) subsets. Training and validation subsets were used for model hyperparameter tuning. Test subsets were used for the evaluation of model performance.
      • Camara J.R.
      • Tomihama R.T.
      • Pop A.
      • Shedd M.P.
      • Dobrowski B.S.
      • Knox C.J.
      • et al.
      Development of a convolutional neural network to detect abdominal aortic aneurysms.
      After the finalized optimization of the model’s hyperparameters, the model’s accuracy and area under the curve (AUC) were then subanalyzed based on data set variations and pathological variables. To replicate real-world institutional variability and applicability, three data sets of different sizes were created and tested for accuracy: selected, balanced, and unbalanced (120 images, 3704 images, and 31,899 images, respectively). In the subanalysis of pathology, the following variables were evaluated for misjudgments and accuracy: aneurysm size (<3.3 cm, 3.3-5.0 cm, >5.0 cm), extra-abdominal extension, dissections, and mural thrombus.
      For each data set size, a two-by-two confusion matrix was generated from the testing set. Sensitivity, specificity, positive predictive value, and negative predictive value were calculated from the classification results. For the pathological variables, misjudgments were analyzed by review of heatmaps, generated via gradient-weighted class activation mapping, overlaid on CTA images. Plots and figures were generated by Matplotlib and converted to vector graphic format in Visio Professional 2019 (Microsoft) or OmniGraffle Pro (version 7.18.1; The Omni Group).
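      The quantities derived from a two-by-two confusion matrix can be written out as a short sketch. The function names and the ten example labels are hypothetical and serve only to show the arithmetic; they are not study data.

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, FP, TN, FN) for binary labels: 1 = infrarenal AAA, 0 = nonaneurysm."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1  # type I misjudgment (false positive)
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1  # type II misjudgment (false negative)
    return tp, fp, tn, fn


def diagnostic_metrics(tp, fp, tn, fn):
    """Derive the summary statistics named in the text from the four counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical labels for ten images (not taken from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
metrics = diagnostic_metrics(*counts)
print(counts, metrics)
```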

      Results

      The demographics of the propensity-matched groups (AAA vs non-AAA) were similar: age (73.2 years and 72.1 years, P = .359), male gender (71.5% and 72.0%, P = .999), tobacco use (79.9% and 76.5%, P = .891), and history of hypertension (86.3% and 82.7%, P = .878).
      • Camara J.R.
      • Tomihama R.T.
      • Pop A.
      • Shedd M.P.
      • Dobrowski B.S.
      • Knox C.J.
      • et al.
      Development of a convolutional neural network to detect abdominal aortic aneurysms.
      The trained custom CNN model reported high test group accuracies of 94.1%, 99.1%, and 99.6% and AUC of 0.9900, 0.9998, and 0.9993 in selected (n = 120), balanced (n = 3704), and unbalanced image sets (n = 31,899), respectively (Table I). As demonstrated by the confusion matrices, despite an eightfold difference between balanced and unbalanced image sets (3704 vs 31,899), the CNN model demonstrated high test group sensitivities (98.7% vs 98.9%) and specificities (99.7% vs 99.3%) in unbalanced and balanced image sets, respectively (Fig 2).
      Table I. The trained custom CNN model reported high test group accuracies in data sets of varying size
      Data set                          | Sensitivity, % | Specificity, % | Accuracy, % | AUC
      Selected image set (n = 120)      | 88.3           | 100.0          | 94.1        | 0.99
      Balanced image set (n = 3704)     | 98.9           | 99.3           | 99.1        | 0.9993
      Unbalanced image set (n = 31,899) | 98.7           | 99.7           | 99.6        | 0.9998
      AUC, Area under the curve.
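      Overall accuracy is a prevalence-weighted average of sensitivity and specificity, which is why it can stay high on an unbalanced set as long as specificity is high. The sketch below checks this against the tabulated values. It assumes that both the balanced and the unbalanced test sets contain the same 1852 aneurysm images; the source does not state this explicitly, and the function name is hypothetical.

```python
def accuracy_from_rates(sensitivity, specificity, n_positive, n_total):
    """Accuracy = sensitivity * prevalence + specificity * (1 - prevalence)."""
    prevalence = n_positive / n_total
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


N_ANEURYSM_TEST = 1852  # assumption: the same aneurysm test images are in both sets

balanced = accuracy_from_rates(0.989, 0.993, N_ANEURYSM_TEST, 3704)
unbalanced = accuracy_from_rates(0.987, 0.997, N_ANEURYSM_TEST, 31_899)
size_ratio = 31_899 / 3704  # the "eightfold" difference in image counts

print(f"{balanced:.3f} {unbalanced:.3f} {size_ratio:.1f}")
```

      Under that assumption the computed accuracies round to 99.1% and 99.6%, in line with the table, and the size ratio is roughly 8.6.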
      Fig 2. Despite an eightfold difference between balanced (A) and unbalanced (B) image sets (3704 vs 31,899), the CNN model demonstrated low rates of misjudgments in both image sets.
      In the subanalysis of these misjudgment cases (n = 34, 0.092%), the CNN model demonstrates improving accuracy as the aneurysm size increases: 47% (16/34) for aneurysms <3.3 cm, 32% (11/34) for aneurysms 3.3 to 5 cm, and 20% (7/34) for aneurysms >5 cm (Fig 3). The presence of a measurable mural thrombus was also a notable confounding variable present in 50% of these misjudgments (Fig 4). Aneurysms containing measurable mural thrombus were over-represented within type II (false-negative) misjudgments compared with type I (false-positive) misjudgments (71% vs 15%, P < .05). The average thickness of the mural thrombi that caused a false-negative misjudgment was 12.8 mm (±6.1 mm). Inclusion of extra-abdominal aneurysm extension (n = 347 images) or dissection flaps (n = 80 images) in these imaging sets did not appear to decrease the model’s overall accuracy because they were not heavily represented in the error set relative to their incidence in the overall data set.
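      The text does not report which test produced P < .05 or the counts behind the 71% vs 15% comparison. As an illustration only, the sketch below runs a two-sided Fisher exact test on hypothetical counts (15 of 21 false negatives and 2 of 13 false positives with thrombus). These counts were chosen solely because they are consistent with the reported 71%, 15%, 50%, and n = 34; they are not data from the study, and the function name is an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1 = a + b, a + c
    n = a + b + c + d

    def weight(k):
        # Unnormalized hypergeometric probability of k in the top-left cell.
        return comb(col1, k) * comb(n - col1, row1 - k)

    k_min = max(0, row1 - (n - col1))
    k_max = min(row1, col1)
    observed = weight(a)
    # Sum every table that is at least as unlikely as the observed one.
    extreme = sum(weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed)
    return extreme / comb(n, row1)


# Hypothetical counts: rows = false negative / false positive,
# columns = thrombus present / thrombus absent.
p_value = fisher_exact_two_sided(15, 6, 2, 11)
print(p_value < 0.05)
```

      Integer arithmetic is used for the comparison of table probabilities so that ties are handled exactly rather than through floating-point tolerances.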
      Fig 3. The CNN model demonstrates improving accuracy as aneurysm size increases: 47% (16/34) for aneurysms <3.3 cm, 32% (11/34) for aneurysms 3.3-5 cm, and 20% (7/34) for aneurysms >5 cm. Below are heat maps generated via gradient-weighted class activation mapping overlaid on CT images. (A) This aneurysm bordered on ectasia; its size, very close to the 3-cm threshold, contributed to the false-positive misjudgment. (B) The relatively small size of the enhancing region in combination with mural thrombus contributed to a false-negative misjudgment.
      Fig 4. The presence of a measurable mural thrombus was a notable confounding variable that was present in 50% of these misjudgments. Below are heat maps generated via gradient-weighted class activation mapping overlaid on CT images. (A) Correct classification. The algorithm successfully avoided the misjudgment of an aneurysm with significant mural thrombus. (B) Incorrect classification. The relatively small size of the enhancing region in combination with mural thrombus contributed to a false-negative misjudgment.
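      The heat maps in Figs 3 and 4 rely on gradient-weighted class activation mapping. The core computation can be sketched in plain Python on toy arrays: each feature map of the last convolutional layer is weighted by the mean gradient of the class score with respect to that map, the weighted maps are summed, and negative values are clipped. The function name and the 2 × 2 toy inputs are hypothetical; extracting activations and gradients from the trained network, upsampling, and overlaying on the CTA image are omitted.

```python
def grad_cam(activations, gradients):
    """Return a normalized heat map from K feature maps and their gradients.

    Both arguments are lists of K maps, each an H x W nested list of floats.
    """
    height, width = len(activations[0]), len(activations[0][0])
    heatmap = [[0.0] * width for _ in range(height)]
    for fmap, grad in zip(activations, gradients):
        # Channel weight: global average of the gradient over the map.
        alpha = sum(sum(row) for row in grad) / (height * width)
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += alpha * fmap[i][j]
    # ReLU keeps only regions that support the predicted class.
    heatmap = [[max(0.0, value) for value in row] for row in heatmap]
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[value / peak for value in row] for row in heatmap]
    return heatmap


# Toy inputs: two 2 x 2 feature maps with constant gradients (hypothetical).
toy_activations = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
toy_gradients = [[[0.5, 0.5], [0.5, 0.5]], [[-1.0, -1.0], [-1.0, -1.0]]]
toy_heatmap = grad_cam(toy_activations, toy_gradients)
print(toy_heatmap)
```

      In the toy case the second channel has a negative weight, so the locations where only that channel is active are suppressed to zero in the final map.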

      Discussion

      The implementation of AI in medicine is undergoing continuous evolution. The integration of AI imaging and biologic analysis could potentially lead to the development of revolutionary predictive models for the management of patients.
      • Rajkomar A.
      • Dean J.
      • Kohane I.
      Machine learning in medicine.
      ,
      • Ngiam K.Y.
      • Khor I.W.
      Big data and machine learning algorithms for health-care delivery.
      Before the application of machine learning to image analysis, manual human interpretation was required to convert an image finding into a binary or categorical variable for analysis. With a CNN, an image can now be objectively broken down into large biostatistical data sets. The tidal wave of medical records, in the form of imaging data, clinical data, and genomic data, is likely to keep growing rapidly. Thus, the future of medical research is likely to be even more data dependent, with the synergy between medical scientists and AI technology becoming more pronounced. Based on the analysis of large data sets of disease profiles and treatment responses, machine learning programs will likely provide the opportunity to predict personalized, patient-specific clinical outcomes.
      • Bohr A.
      • Memarzadeh K.
      The rise of artificial intelligence in healthcare applications.
      ,
      • Ahuja A.S.
      The impact of artificial intelligence in medicine on the future role of the physician.
      The main challenge in AI data science applications is achieving high fidelity together with widespread applicability. Because CNNs are trained on sample cases from the general population, the training data cannot cover every possible anatomic presentation of an AAA. As a result, many factors can challenge a CNN’s widespread applicability. These “confounding” factors include varying imaging modalities (different types and qualities of CT scan imaging), pathologies (atherosclerotic plaques, dissection flaps, mural thrombus, and penetrating ulcers), protocols (timing of IV contrast opacification of the aorta), and practices (small hospital setting vs large tertiary referral center). In typical AI data science applications, the larger the input data set (ie, thousands to hundreds of thousands), the more accurate the algorithm output for sorting true signal from noise. However, surgical clinical research is often limited by the total number of patients (ie, hundreds to thousands) who can be studied at a particular institution. Thus, the quality and applicability of AI algorithms across institutions may be questionable without extensive testing and subanalysis. This study analyzed the anatomic and data set variables influencing the accuracy of a newly developed CNN specific for detecting infrarenal AAAs.
      Some of the early AI studies in vascular surgery were implemented to assess the predictive nature of clinical markers for AAA patient outcomes.
      • Turton E.P.
      • Scott D.J.
      • Delbridge M.
      • Snowden S.
      • Kester R.C.
      Ruptured abdominal aortic aneurysm: a novel method of outcome prediction using neural network technology.
      ,
      • Wise E.S.
      • Hocking K.M.
      • Brophy C.M.
      Prediction of in-hospital mortality after ruptured abdominal aortic aneurysm repair using an artificial neural network.
      However, imaging is an integral component for the diagnosis, surveillance, and management of AAAs. In the vascular surgery literature, there have been studies in the past that have examined the role of semiautomated AAA image analysis focusing on image segmentation.
      • de Bruijne M.
      • van Ginneken B.
      • Viergever M.A.
      • Niessen W.J.
      Interactive segmentation of abdominal aortic aneurysms in CTA images.
      • Subasic M.
      • Loncaric S.
      • Sorantin E.
      3-D image analysis of abdominal aortic aneurysm.
      • Zhuge F.
      • Rubin G.D.
      • Sun S.
      • Napel S.
      An abdominal aortic aneurysm segmentation method: level set with region and statistical information.
      • Joldes G.R.
      • Miller K.
      • Wittek A.
      • Forsythe R.O.
      • Newby D.E.
      • Doyle B.J.
      BioPARR: a software system for estimating the rupture potential index for abdominal aortic aneurysms.
      Although these were significant advances in programming techniques, these programs all require some baseline manual input. However, Lareyre et al recently described a fully automated pipeline to characterize the AAA, including the presence of intraluminal thrombus and calcifications. This rapid method was tested on a set of 40 patients with CTA images and demonstrated a good correlation with results obtained from manual segmentation by human experts.
      • Lareyre F.
      • Adam C.
      • Carrier M.
      • Dommerc C.
      • Mialhe C.
      • Raffort J.
      A fully automated pipeline for mining abdominal aortic aneurysm using image segmentation.
      In addition, Mohammadi et al designed a CNN classifier for the aorta with a reported detection rate of 98.62% and a Hough Circles algorithm that classified a group of 120 aorta patches according to their diameter with an accuracy of 98.33%.
      • Mohammadi S.
      • Mohammadi M.
      • Dehlaghi V.
      • Ahmadi A.
      Automatic segmentation, detection, and diagnosis of abdominal aortic aneurysm (AAA) using convolutional neural networks and Hough Circles algorithm.
      Our fully automated, novel, trained CNN model demonstrated a robust accuracy of 99.1% (95% confidence interval: 98.72%-99.36%) and an AUC of 0.9900 tested on 3600 images from 400 patients in two propensity-matched cohorts.
      • Camara J.R.
      • Tomihama R.T.
      • Pop A.
      • Shedd M.P.
      • Dobrowski B.S.
      • Knox C.J.
      • et al.
      Development of a convolutional neural network to detect abdominal aortic aneurysms.
      These results are derived from real-world, unaltered, nonsegmented images that contain varying acquisition methods, contrast agent used, resolution, concomitant comorbid pathology, and noise and artifacts. With this robust CNN, we have demonstrated a proof of concept model that can be used for a variety of potential future applications.
      Although there is great potential for CNNs in medical imaging, many confounding variables limit their widespread applicability because of varying imaging modalities, pathologies, protocols, and practices.
      • Azulay A.
      • Weiss Y.
      Why do deep convolutional networks generalize so poorly to small image transformations?.
      • Chen C.
      • Bai W.
      • Davies R.H.
      • Bhuva A.N.
      • Manisty C.H.
      • Augusto J.B.
      • et al.
      Improving the generalizability of convolutional neural network-based segmentation on CMR images.
      • Hesamian M.H.
      • Jia W.
      • He X.
      • Kennedy P.
      Deep learning techniques for medical image segmentation: achievements and challenges.
      Because of this, the next step in the development process has been to address the applicability of these CNNs to real-world scenarios. In the 2017 American Association of Physicists in Medicine challenge, the winning CNN for cardiac autosegmentation demonstrated decreased accuracy when applied to data from another local institution, compared with the testing cases from the challenge.
      • Feng X.
      • Bernard M.E.
      • Hunter T.
      • Chen Q.
      Improving accuracy and robustness of deep convolutional neural network based thoracic OAR segmentation.
      Chen et al proposed a method to improve the generalizability of CNN-based models for cross-scanner image segmentation tasks.
      • Chen C.
      • Bai W.
      • Davies R.H.
      • Bhuva A.N.
      • Manisty C.H.
      • Augusto J.B.
      • et al.
      Improving the generalizability of convolutional neural network-based segmentation on CMR images.
      Khened et al proposed a novel network structure with residual connections to improve CNN generalizability and pointed out that networks with a large number of parameters may easily overfit when data are limited.
      • Khened M.
      • Kollerathu V.A.
      • Krishnamurthi G.
      Fully convolutional multi-scale residual DenseNets for cardiac segmentation and automated cardiac diagnosis using ensemble of classifiers.
      As with any new science or technological advancement, the pitfalls and challenges become more apparent when a more detailed subanalysis of these tools is performed. In our current study, we performed a preliminary subanalysis of our AAA-specific CNN model in regard to varying pathology and data set sizes. Misjudgments were most frequent with small aneurysms (<3.3 cm) or in the presence of mural thrombus. The accuracy of the CNN model is maintained despite the inclusion of extra-abdominal pathology and imbalanced data sets. With this information, vascular surgeons can better design and fine-tune their future AI algorithms for AAA research. To the best of our knowledge, this is the first work to explore the generalizability of a CNN-based AI algorithm for the CTA image analysis of AAAs with variable data set sizes, concomitant comorbid aortic pathology, multiple scanners, and techniques.
      There are several limitations of this study. First, this is a retrospective single-center study with a limited number of subjects who had certain exclusion criteria (ruptured aneurysm, prior repair of an infrarenal AAA, and/or protocol errors [absence of IV contrast material, timing issues, etc]). Second, the sample size was underpowered for the subanalysis of comprehensive anatomic copathology; there was a lack of all varieties of aneurysm sizes and all forms of comorbid aortic pathologies (mural thrombus, dissection, extra-abdominal extension, etc). Third, the model is not 100% accurate and still demonstrates a <1% misjudgment rate.
      In summary, this preliminary subanalysis shows that an AAA-specific CNN model can accurately screen for and identify infrarenal AAAs on CTA despite varying pathology and data set sizes. Misjudgments were most frequent with small aneurysms (<3.3 cm) or in the presence of a mural thrombus. The accuracy of the CNN model is maintained despite the inclusion of extra-abdominal pathology and imbalanced data sets.

      Appendix (online only).

      References

      1. Frankish K., Ramsey W.M. The Cambridge Handbook of Artificial Intelligence. Cambridge University Press, 2014: 354
      2. Rajkomar A., Dean J., Kohane I. Machine learning in medicine. N Engl J Med. 2019; 380: 1347-1358
      3. Ngiam K.Y., Khor I.W. Big data and machine learning algorithms for health-care delivery. Lancet Oncol. 2019; 20: e262-e273
      4. Nordon I.M., Hinchliffe R.J., Loftus I.M., Thompson M.M. Pathophysiology and epidemiology of abdominal aortic aneurysms. Nat Rev Cardiol. 2011; 8: 92-102
      5. Golledge J., Muller J., Daugherty A., Norman P. Abdominal aortic aneurysm: pathogenesis and implications for management. Arterioscler Thromb Vasc Biol. 2006; 26: 2605-2613
      6. Lareyre F., Adam C., Carrier M., Dommerc C., Mialhe C., Raffort J. A fully automated pipeline for mining abdominal aortic aneurysm using image segmentation. Sci Rep. 2019; 9: 13750
      7. Raffort J., Adam C., Carrier M., Ballaith A., Coscas R., Jean-Baptiste E., et al. Artificial intelligence in abdominal aortic aneurysm. J Vasc Surg. 2020; 72: 321-333.e1
      8. Dey D., Slomka P.J., Leeson P., Comaniciu D., Shrestha S., Sengupta P.P., et al. Artificial intelligence in cardiovascular imaging: JACC state-of-the-art review. J Am Coll Cardiol. 2019; 73: 1317-1335
      9. Panch T., Szolovits P., Atun R. Artificial intelligence, machine learning and health systems. J Glob Health. 2018; 8: 020303
      10. Miotto R., Wang F., Wang S., Jiang X., Dudley J.T. Deep learning for healthcare: review, opportunities and challenges. Brief Bioinform. 2018; 19: 1236-1246
      11. Camara J.R., Tomihama R.T., Pop A., Shedd M.P., Dobrowski B.S., Knox C.J., et al. Development of a convolutional neural network to detect abdominal aortic aneurysms. J Vasc Surg Cases Innov Tech. 2022; 8: 305-311
      12. Guan Q., Wang Y., Ping B., Li D., Du J., Qin Y., et al. Deep convolutional neural network VGG-16 model for differential diagnosing of papillary thyroid carcinomas in cytological images: a pilot study. J Cancer. 2019; 10: 4876-4882
      13. Yoon H.J., Kim S., Kim J.-H., Keum J.-S., Oh S.-I., Jo J., et al. A lesion-based convolutional neural network improves endoscopic detection and depth prediction of early gastric cancer. J Clin Med Res. 2019; 8: 1310
      14. Lee K.-S., Jung S.-K., Ryu J.-J., Shin S.-W., Choi J. Evaluation of transfer learning with deep convolutional neural networks for screening osteoporosis in dental panoramic radiographs.
        J Clin Med Res. 2020; 9: 392
        • Tuzoff D.V.
        • Tuzova L.N.
        • Bornstein M.M.
        • Krasnov A.S.
        • Kharchenko M.A.
        • Nikolenko S.I.
        • et al.
        Tooth detection and numbering in panoramic radiographs using convolutional neural networks.
        Dentomaxillofacial Radiol. 2019; 48: 20180051
        • Geng L.
        • Zhang S.
        • Tong J.
        • Xiao Z.
        Lung segmentation method with dilated convolution based on VGG-16 network.
        Comput Assist Surg (Abingdon). 2019; 24: 27-33
      2. ImageNet.
        (Available at:)
        http://image-net.org/index
        Date accessed: February 26, 2021
      3. imgaug. imgaug 0.4.0 documentation.
        (Available at:)
        http://imgaug.readthedocs.io
        Date accessed: February 26, 2021
        • Huang D.
        • Feng M.
        Understanding deep convolutional networks for biomedical imaging: a practical tutorial.
        Conf Proc IEEE Eng Med Biol Soc. 2019; 2019: 857-863
        • Bohr A.
        • Memarzadeh K.
        The rise of artificial intelligence in healthcare applications.
        in: Bohr A. Memarzadeh K. Artificial Intelligence in Healthcare. Academic Press, 2020: 25-60
        • Ahuja A.S.
        The impact of artificial intelligence in medicine on the future role of the physician.
        PeerJ. 2019; 7: e7702
        • Turton E.P.
        • Scott D.J.
        • Delbridge M.
        • Snowden S.
        • Kester R.C.
        Ruptured abdominal aortic aneurysm: a novel method of outcome prediction using neural network technology.
        Eur J Vasc Endovasc Surg. 2000; 19: 184-189
        • Wise E.S.
        • Hocking K.M.
        • Brophy C.M.
        Prediction of in-hospital mortality after ruptured abdominal aortic aneurysm repair using an artificial neural network.
        J Vasc Surg. 2015; 62: 8-15
        • de Bruijne M.
        • van Ginneken B.
        • Viergever M.A.
        • Niessen W.J.
        Interactive segmentation of abdominal aortic aneurysms in CTA images.
        Med Image Anal. 2004; 8: 127-138
        • Subasic M.
        • Loncaric S.
        • Sorantin E.
        3-D image analysis of abdominal aortic aneurysm.
        Stud Health Technol Inform. 2000; 77: 1195-1200
        • Zhuge F.
        • Rubin G.D.
        • Sun S.
        • Napel S.
        An abdominal aortic aneurysm segmentation method: level set with region and statistical information.
        Med Phys. 2006; 33: 1440-1453
        • Joldes G.R.
        • Miller K.
        • Wittek A.
        • Forsythe R.O.
        • Newby D.E.
        • Doyle B.J.
        BioPARR: a software system for estimating the rupture potential index for abdominal aortic aneurysms.
        Sci Rep. 2017; 7: 4641
        • Mohammadi S.
        • Mohammadi M.
        • Dehlaghi V.
        • Ahmadi A.
        Automatic segmentation, detection, and diagnosis of abdominal aortic aneurysm (AAA) using convolutional neural networks and Hough Circles algorithm.
        Cardiovasc Eng Technol. 2019; 10: 490-499
        • Azulay A.
        • Weiss Y.
        Why do deep convolutional networks generalize so poorly to small image transformations?.
        (Available at:)
        http://arxiv.org/abs/1805.12177
        Date: 2018
        Date accessed: April 1, 2023
        • Chen C.
        • Bai W.
        • Davies R.H.
        • Bhuva A.N.
        • Manisty C.H.
        • Augusto J.B.
        • et al.
        Improving the generalizability of convolutional neural network-based segmentation on CMR images.
        Front Cardiovasc Med. 2020; 7: 105
        • Hesamian M.H.
        • Jia W.
        • He X.
        • Kennedy P.
        Deep learning techniques for medical image segmentation: achievements and challenges.
        J Digit Imaging. 2019; 32: 582-596
        • Feng X.
        • Bernard M.E.
        • Hunter T.
        • Chen Q.
        Improving accuracy and robustness of deep convolutional neural network based thoracic OAR segmentation.
        Phys Med Biol. 2020; 65: 07NT01
        • Khened M.
        • Kollerathu V.A.
        • Krishnamurthi G.
        Fully convolutional multi-scale residual DenseNets for cardiac segmentation and automated cardiac diagnosis using ensemble of classifiers.
        Med Image Anal. 2019; 51: 21-45