Scalable radiotherapy data curation infrastructure for deep-learning based autosegmentation of organs-at-risk: A case study in head and neck cancer

E. Tryggestad, A. Anand, C. Beltran, J. Brooks, J. Cimmiyotti, N. Grimaldi, T. Hodge, A. Hunzeker, J. J. Lucido, N. N. Laack, R. Momoh, D. J. Moseley, S. H. Patel, A. Ridgway, S. Seetamsetty, S. Shiraishi, L. Undahl, R. L. Foote

Research output: Contribution to journal › Article › peer-review


In this era of patient-centered, outcomes-driven and adaptive radiotherapy, deep learning is now being successfully applied to tackle imaging-related workflow bottlenecks such as autosegmentation and dose planning. These applications typically require supervised learning approaches enabled by relatively large, curated radiotherapy datasets which are highly reflective of the contemporary standard of care. However, little has previously been published describing technical infrastructure, recommendations, methods or standards for radiotherapy dataset curation in a holistic fashion. Our radiation oncology department has recently embarked on a large-scale project in collaboration with an external partner to develop deep-learning-based tools to assist with our radiotherapy workflow, beginning with autosegmentation of organs-at-risk. This project will require thousands of carefully curated radiotherapy datasets comprising all body sites we routinely treat with radiotherapy. Given such a large project scope, we have approached the need for dataset curation rigorously, with the aim of building infrastructure that supports efficiency, automation and scalability. Focusing on our first use-case, head and neck cancer, we describe the infrastructure we developed and the novel methods we applied to radiotherapy dataset curation, including personnel and workflow organization, dataset selection, expert organ-at-risk segmentation, quality assurance, patient de-identification, data archival and transfer. Over the course of approximately 13 months, our expert multidisciplinary team generated 490 curated head and neck radiotherapy datasets. This task required approximately 6000 human-expert hours in total (not including planning and infrastructure development time). This infrastructure continues to evolve and will support ongoing and future project efforts.

Original language: English (US)
Article number: 936134
Journal: Frontiers in Oncology
State: Published - Aug 29 2022


Keywords

  • artificial intelligence
  • autosegmentation
  • convolutional neural network
  • curation
  • deep learning
  • head and neck cancer
  • organs-at-risk
  • radiotherapy

ASJC Scopus subject areas

  • Oncology
  • Cancer Research

