Learning semantic and visual similarity for endomicroscopy video retrieval

Barbara Andre; Tom Vercauteren; Anna M. Buchner; Michael B. Wallace; Nicholas Ayache

doi:10.1109/TMI.2012.2188301

Gastroenterology and Hepatology

Research output: Contribution to journal › Article › peer-review

67 Scopus citations

Abstract

Content-based image retrieval (CBIR) is a valuable computer vision technique that is increasingly being applied in the medical community for diagnosis support. However, traditional CBIR systems deliver only visual outputs, i.e., images having a similar appearance to the query, which are not directly interpretable by physicians. Our objective is to provide a system for endomicroscopy video retrieval that delivers both visual and semantic outputs that are consistent with each other. In a previous study, we developed an adapted bag-of-visual-words method for endomicroscopy retrieval, called Dense-Sift, that computes a visual signature for each video. In this paper, we present a novel approach to complement visual similarity learning with semantic knowledge extraction, in the field of in vivo endomicroscopy. We first leverage a semantic ground truth based on eight binary concepts in order to transform these visual signatures into semantic signatures that reflect how strongly the presence of each semantic concept is expressed by the visual words describing the videos. Using cross-validation, we demonstrate that, in terms of semantic detection, our intuitive Fisher-based method, which transforms visual-word histograms into semantic estimations, outperforms support vector machine (SVM) methods with statistical significance. In a second step, we propose to improve retrieval relevance by learning an adjusted similarity distance from a perceived similarity ground truth. As a result, our distance learning method yields a statistically significant improvement in the correlation with perceived similarity. We also demonstrate that, in terms of perceived similarity, the recall performance of the semantic signatures is close to that of the visual signatures and significantly better than that of several state-of-the-art CBIR methods. The semantic signatures are thus able to communicate high-level medical knowledge while being consistent with the low-level visual signatures and much shorter than them. In our resulting retrieval system, we use visual signatures for perceived similarity learning and retrieval, and semantic signatures to output additional information, expressed in the endoscopist's own language, which provides a relevant semantic translation of the visual retrieval outputs.
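
The abstract does not specify the Fisher-based transformation, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that each visual word receives a signed Fisher-style score per concept (difference of class means divided by the sum of class variances) and that a semantic signature is the weighted sum of a video's visual-word histogram under each concept's scores. The function names, the `eps` regularizer, and the toy data (three visual words, two concepts instead of eight) are all hypothetical.

```python
from statistics import mean, pvariance


def fisher_weights(histograms, labels, eps=1e-3):
    """Signed Fisher-style score of every visual word for one binary concept.

    Assumption: score = (mean_pos - mean_neg) / (var_pos + var_neg + eps).
    Both classes must contain at least one video.
    """
    pos = [h for h, y in zip(histograms, labels) if y]
    neg = [h for h, y in zip(histograms, labels) if not y]
    weights = []
    for w in range(len(histograms[0])):
        p = [h[w] for h in pos]
        n = [h[w] for h in neg]
        weights.append((mean(p) - mean(n)) / (pvariance(p) + pvariance(n) + eps))
    return weights


def semantic_signature(histogram, weights_per_concept):
    """One value per concept: weighted sum of the visual-word histogram."""
    return [
        sum(w * x for w, x in zip(weights, histogram))
        for weights in weights_per_concept
    ]


# Hypothetical toy data: four videos, three visual words, two binary concepts.
histograms = [
    [0.8, 0.2, 0.0],
    [0.7, 0.3, 0.0],
    [0.1, 0.2, 0.7],
    [0.0, 0.3, 0.7],
]
concept_labels = [
    [1, 1, 0, 0],  # concept present in the first two videos
    [0, 0, 1, 1],  # concept present in the last two videos
]

weights_per_concept = [fisher_weights(histograms, y) for y in concept_labels]
query = [0.9, 0.1, 0.0]
signature = semantic_signature(query, weights_per_concept)
print(signature)
```

In this sketch the signature has one entry per concept, which illustrates why semantic signatures can be much shorter than the visual-word histograms they are derived from.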

Original language	English (US)
Article number	6153380
Pages (from-to)	1276-1288
Number of pages	13
Journal	IEEE Transactions on Medical Imaging
Volume	31
Issue number	6
DOIs	https://doi.org/10.1109/TMI.2012.2188301
State	Published - 2012

Keywords

Bag-of-visual-words (BoW)
content-based image retrieval (CBIR)
endomicroscopy
semantic and visual similarity
semantic gap
similarity learning

ASJC Scopus subject areas

Software
Radiological and Ultrasound Technology
Computer Science Applications
Electrical and Electronic Engineering

Cite this

@article{eb2dea1b8b654dfbbebb163d15c99bb7,

title = "Learning semantic and visual similarity for endomicroscopy video retrieval",

keywords = "Bag-of-visual-words (BoW), content-based image retrieval (CBIR), endomicroscopy, semantic and visual similarity, semantic gap, similarity learning",

author = "Andre, Barbara and Vercauteren, Tom and Buchner, {Anna M.} and Wallace, {Michael B.} and Ayache, Nicholas",

year = "2012",

doi = "10.1109/TMI.2012.2188301",

language = "English (US)",

volume = "31",

pages = "1276--1288",

journal = "IEEE Transactions on Medical Imaging",

issn = "0278-0062",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "6",

}

TY - JOUR

T1 - Learning semantic and visual similarity for endomicroscopy video retrieval

AU - Andre, Barbara

AU - Vercauteren, Tom

AU - Buchner, Anna M.

AU - Wallace, Michael B.

AU - Ayache, Nicholas

PY - 2012

Y1 - 2012

KW - Bag-of-visual-words (BoW)

KW - content-based image retrieval (CBIR)

KW - endomicroscopy

KW - semantic and visual similarity

KW - semantic gap

KW - similarity learning

UR - http://www.scopus.com/inward/record.url?scp=84861873452&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84861873452&partnerID=8YFLogxK

U2 - 10.1109/TMI.2012.2188301

DO - 10.1109/TMI.2012.2188301

M3 - Article

C2 - 22353403

AN - SCOPUS:84861873452

SN - 0278-0062

VL - 31

SP - 1276

EP - 1288

JO - IEEE Transactions on Medical Imaging

JF - IEEE Transactions on Medical Imaging

IS - 6

M1 - 6153380

ER -
