Learning semantic and visual similarity for endomicroscopy video retrieval

Barbara Andre, Tom Vercauteren, Anna M. Buchner, Michael B. Wallace, Nicholas Ayache

Research output: Contribution to journalArticlepeer-review

67 Scopus citations


Content-based image retrieval (CBIR) is a valuable computer vision technique which is increasingly being applied in the medical community for diagnosis support. However, traditional CBIR systems only deliver visual outputs, i.e., images having a similar appearance to the query, which is not directly interpretable by the physicians. Our objective is to provide a system for endomicroscopy video retrieval which delivers both visual and semantic outputs that are consistent with each other. In a previous study, we developed an adapted bag-of-visual-words method for endomicroscopy retrieval, called Dense-Sift, that computes a visual signature for each video. In this paper, we present a novel approach to complement visual similarity learning with semantic knowledge extraction, in the field of in vivo endomicroscopy. We first leverage a semantic ground truth based on eight binary concepts, in order to transform these visual signatures into semantic signatures that reflect how much the presence of each semantic concept is expressed by the visual words describing the videos. Using cross-validation, we demonstrate that, in terms of semantic detection, our intuitive Fisher-based method transforming visual-word histograms into semantic estimations outperforms support vector machine (SVM) methods with statistical significance. In a second step, we propose to improve retrieval relevance by learning an adjusted similarity distance from a perceived similarity ground truth. As a result, our distance learning method allows to statistically improve the correlation with the perceived similarity. We also demonstrate that, in terms of perceived similarity, the recall performance of the semantic signatures is close to that of visual signatures and significantly better than those of several state-of-the-art CBIR methods. The semantic signatures are thus able to communicate high-level medical knowledge while being consistent with the low-level visual signatures and much shorter than them. In our resulting retrieval system, we decide to use visual signatures for perceived similarity learning and retrieval, and semantic signatures for the output of an additional information, expressed in the endoscopist own language, which provides a relevant semantic translation of the visual retrieval outputs.

Original languageEnglish (US)
Article number6153380
Pages (from-to)1276-1288
Number of pages13
JournalIEEE transactions on medical imaging
Issue number6
StatePublished - 2012


  • Bag-of-visual-words (BoW)
  • content-based image retrieval (CBIR)
  • endomicroscopy
  • semantic and visual similarity
  • semantic gap
  • similarity learning

ASJC Scopus subject areas

  • Software
  • Radiological and Ultrasound Technology
  • Computer Science Applications
  • Electrical and Electronic Engineering


Dive into the research topics of 'Learning semantic and visual similarity for endomicroscopy video retrieval'. Together they form a unique fingerprint.

Cite this