Validation of clinical acceptability of deep-learning-based automated segmentation of organs-at-risk for head-and-neck radiotherapy treatment planning

J. John Lucido, Todd A. DeWees, Todd R. Leavitt, Aman Anand, Chris J. Beltran, Mark D. Brooke, Justine R. Buroker, Robert L. Foote, Olivia R. Foss, Angela M. Gleason, Teresa L. Hodge, Cían O. Hughes, Ashley E. Hunzeker, Nadia N. Laack, Tamra K. Lenz, Michelle Livne, Megumi Morigami, Douglas J. Moseley, Lisa M. Undahl, Yojan Patel, Erik J. Tryggestad, Megan Z. Walker, Alexei Zverovitch, Samir H. Patel

Research output: Contribution to journal › Article › peer-review


Introduction: Organ-at-risk segmentation for head and neck cancer radiation therapy is a complex and time-consuming process (requiring up to 42 individual structures) and may delay the start of treatment or even limit access to function-preserving care. The feasibility of using a deep learning (DL) based autosegmentation model to reduce contouring time without compromising contour accuracy is assessed through a blinded randomized trial of radiation oncologists (ROs) using retrospective, de-identified patient data. Methods: Two head and neck expert ROs used dedicated time to create gold standard (GS) contours on computed tomography (CT) images. 445 CTs were used to train a custom 3D U-Net DL model covering 42 organs-at-risk (OARs), and an additional 20 CTs were held out for the randomized trial. For each held-out patient dataset, one of the eight participant ROs was randomly allocated to review and revise the contours produced by the DL model, while another reviewed and revised contours produced by a medical dosimetry assistant (MDA), both blinded to the contours' origin. The time required for MDAs and ROs to contour was recorded, and the unrevised DL contours, as well as the RO-revised MDA and DL contours, were compared to the GS for that patient. Results: Mean time for initial MDA contouring was 2.3 hours (range, 1.6-3.8 hours) and RO revision of the MDA contours took 1.1 hours (range, 0.4-4.4 hours), compared to 0.7 hours (range, 0.1-2.0 hours) for RO revision of the DL contours. Total time was reduced by 76% (95% confidence interval [CI], 65% to 88%) and RO-revision time was reduced by 35% (95% CI, -39% to 91%). For all geometric and dosimetric metrics computed, including volumetric Dice similarity coefficient (VDSC), surface DSC, added path length, and the 95%-Hausdorff distance, agreement with the GS was equivalent or significantly greater (p<0.05) for RO-revised DL contours than for RO-revised MDA contours. 32 OARs (76%) had mean VDSC greater than 0.8 for the RO-revised DL contours, compared to 20 (48%) for RO-revised MDA contours and 34 (81%) for the unrevised DL contours. Conclusion: DL autosegmentation demonstrated significant time savings for organ-at-risk contouring while improving agreement with the institutional GS, indicating comparable accuracy of the DL model. Integration into clinical practice with a prospective evaluation is currently underway.
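For readers unfamiliar with the volumetric Dice similarity coefficient reported above, the following is a minimal sketch of how it is conventionally defined: twice the overlap of two binary masks divided by the sum of their sizes. It is not the study's evaluation code. The function names, the flattened 0/1 mask representation, and the toy masks are assumptions made for illustration only. The final lines reproduce the percentages in the abstract from the reported OAR counts.

```python
def volumetric_dice(mask_a, mask_b):
    """Volumetric Dice coefficient for two binary masks.

    Masks are assumed (for this sketch) to be equal-length flat
    sequences of 0/1 voxel labels, e.g. a flattened 3D segmentation.
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    if size_a + size_b == 0:
        # Both masks empty: treated as perfect agreement by convention here.
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


def percent_of_oars(count, total=42):
    """Share of the 42 OARs, as a whole-number percentage."""
    return round(100 * count / total)


# Toy example (hypothetical masks, not study data).
gold_standard = [0, 1, 1, 1, 0, 0]
candidate = [0, 0, 1, 1, 1, 0]
print(f"VDSC = {volumetric_dice(gold_standard, candidate):.2f}")  # VDSC = 0.67

# OAR counts with mean VDSC > 0.8, as reported in the abstract.
print(percent_of_oars(32), percent_of_oars(20), percent_of_oars(34))  # 76 48 81
```

A VDSC of 1 means the two contours coincide voxel for voxel, and 0 means they do not overlap at all. The 0.8 threshold used in the abstract is the study's own reporting choice.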

Original language: English (US)
Article number: 1137803
Journal: Frontiers in Oncology
State: Published - 2023


Keywords

  • autosegmentation
  • clinical validation
  • comprehensive
  • deep learning
  • head and neck cancer
  • organs-at-risk
  • radiation therapy

ASJC Scopus subject areas

  • Oncology
  • Cancer Research

