The standard gamble demonstrated lower reliability than the feeling thermometer

Milo A. Puhan, Gordon H. Guyatt, Victor M. Montori, Mohit Bhandari, P. J. Devereaux, Lauren Griffith, Roger Goldstein, Holger J. Schünemann

Research output: Contribution to journal › Article › peer-review

23 Scopus citations


Background and Objective: Participants rated clinical marker states (CMSs) to make respondents familiar with the task of preference instruments, ground their ratings in relation to other health states, and help investigators interpret patient ratings. The objective was to assess the reliability of CMS ratings using appropriate reliability statistics.

Study Design and Setting: Eighty-one patients rated CMSs for mild, moderate, and severe chronic respiratory disease using the feeling thermometer (FT) and the standard gamble (SG) before and after a 3-month respiratory rehabilitation program. To assess reliability we used (a) intraclass correlation coefficients (ICC) with the variance between CMSs as signal and the variance between raters, the variance within raters, and the signal as noise; (b) scatter plots; and (c) Bland-Altman plots.

Results: ICCs were 0.47 for the FT and 0.37 for the SG. Scatter and Bland-Altman plots showed large between- and within-person variability; 64.2% and 11.3% of the CMS ratings were in the correct order on both occasions on the FT and SG, respectively.

Conclusion: Our results suggest moderate reliability of CMS ratings for the FT and poor reliability for the SG, which may explain why CMSs have not improved the SG's measurement properties. Investigators should use appropriate reliability statistics when addressing related issues.
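The two reliability statistics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' analysis code. It assumes the variance components have already been estimated elsewhere (for example, from an ANOVA), and it reads the ICC as signal variance divided by the sum of signal and noise variances. The function names and all numbers are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def icc_from_components(var_states, var_raters, var_within):
    """ICC as signal variance over total variance.

    var_states: variance between marker states (signal)
    var_raters: variance between raters (noise)
    var_within: variance within raters (noise)
    """
    total = var_states + var_raters + var_within
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_states / total


def bland_altman_limits(first, second):
    """Bias and 95% limits of agreement for paired ratings.

    Returns (bias, lower_limit, upper_limit), where bias is the mean of
    the paired differences (second minus first).
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    diffs = [b - a for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation of differences
    return bias, bias - spread, bias + spread


# Hypothetical variance components, not values from the study.
example_icc = icc_from_components(2.0, 1.0, 1.0)

# Hypothetical ratings on two occasions, not data from the study.
before = [0.60, 0.45, 0.30, 0.70]
after = [0.55, 0.50, 0.20, 0.75]
bias, lower, upper = bland_altman_limits(before, after)
```

With signal variance equal to the combined noise variance, as in the hypothetical call above, the ICC is 0.5. Wide limits of agreement relative to the rating scale would indicate the kind of within-person variability the abstract reports.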

Original language: English (US)
Pages (from-to): 458-465
Number of pages: 8
Journal: Journal of Clinical Epidemiology
Issue number: 5
State: Published - May 2005


  • Feeling thermometer
  • Preference-based instruments
  • Reliability
  • Standard gamble
  • Test-retest reliability

ASJC Scopus subject areas

  • Epidemiology


