TY - JOUR
T1 - Transferable Visual Words
T2 - Exploiting the Semantics of Anatomical Patterns for Self-Supervised Learning
AU - Haghighi, Fatemeh
AU - Taher, Mohammad Reza Hosseinzadeh
AU - Zhou, Zongwei
AU - Gotway, Michael B.
AU - Liang, Jianming
N1 - Funding Information:
Manuscript received December 9, 2020; accepted January 11, 2021. Date of publication February 22, 2021; date of current version September 30, 2021. This work was supported in part by ASU and Mayo Clinic through a Seed Grant and an Innovation Grant, in part by the NIH under Award R01HL128785, in part by GPUs provided through ASU Research Computing, and in part by the Extreme Science and Engineering Discovery Environment (XSEDE) funded by the National Science Foundation (NSF) under Grant ACI-1548562. The content is solely the responsibility of the authors and does not necessarily represent the official views of NIH or NSF. (Corresponding author: Jianming Liang.) Fatemeh Haghighi and Mohammad Reza Hosseinzadeh Taher are with the School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ 85281 USA (e-mail: fhaghigh@asu.edu; mhossei2@asu.edu).
Publisher Copyright:
© 1982-2012 IEEE.
PY - 2021/10/1
Y1 - 2021/10/1
AB - This paper introduces a new concept called 'transferable visual words' (TransVW), aiming to achieve annotation efficiency for deep learning in medical image analysis. Medical imaging, focusing on particular parts of the body for defined clinical purposes, generates images of great similarity in anatomy across patients and yields sophisticated anatomical patterns across images, which are associated with rich semantics about human anatomy and which are natural visual words. We show that these visual words can be automatically harvested according to anatomical consistency via self-discovery, and that the self-discovered visual words can serve as strong yet free supervision signals for deep models to learn semantics-enriched generic image representations via self-supervision (self-classification and self-restoration). Our extensive experiments demonstrate the annotation efficiency of TransVW, which offers higher performance and faster convergence at reduced annotation cost in several applications. TransVW has several important advantages: (1) it is a fully autodidactic scheme, which exploits the semantics of visual words for self-supervised learning, requiring no expert annotation; (2) visual word learning is an add-on strategy, which complements existing self-supervised methods, boosting their performance; and (3) the learned image representations are semantics-enriched, which have proven to be more robust and generalizable, saving annotation effort for a variety of applications through transfer learning. Our code, pre-trained models, and curated visual words are available at https://github.com/JLiangLab/TransVW.
KW - 3D medical imaging
KW - self-supervised learning
KW - anatomical patterns
KW - 3D pre-trained models
KW - computational anatomy
KW - transfer learning
KW - visual words
UR - http://www.scopus.com/inward/record.url?scp=85101785729&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85101785729&partnerID=8YFLogxK
U2 - 10.1109/TMI.2021.3060634
DO - 10.1109/TMI.2021.3060634
M3 - Article
C2 - 33617450
AN - SCOPUS:85101785729
SN - 0278-0062
VL - 40
SP - 2857
EP - 2868
JO - IEEE Transactions on Medical Imaging
JF - IEEE Transactions on Medical Imaging
IS - 10
ER -