Missing phenotype data imputation in pedigree data analysis

Brooke L. Fridley, Mariza De Andrade

Research output: Contribution to journal, Article, peer-review

4 Scopus citations


Mapping complex traits, or phenotypes with small genetic effects that may be modulated by temporal trends in families, is challenging. Detailed and accurate data must be available on families, whether or not the data were collected over time. Missing data complicate matters in pedigree analysis, especially in the case of a longitudinal pedigree analysis. Because most analytical methods developed for the analysis of longitudinal pedigree data require complete data, the researcher must either drop those cases (individuals) with missing data from the analysis or impute values for the missing data. We present the use of data augmentation within Bayesian polygenic and longitudinal polygenic models to produce k complete datasets. The data augmentation, or imputation, step of the Markov chain Monte Carlo algorithm takes into account the observed familial information and the observed subject information available at other time points. These k complete datasets can then be used to fit single time point or longitudinal pedigree models. By producing a set of k complete datasets, and thus k sets of parameter estimates, the total variance associated with an estimate can be partitioned into a within-imputation component and a between-imputation component. The method is illustrated using the Genetic Analysis Workshop simulated data.
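
The variance partition mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard multiple-imputation combining rules (Rubin's rules), in which the total variance is T = W + (1 + 1/k) B; the abstract does not state which exact formula the authors use, so this form is an assumption. The function name `pool_estimates` and its arguments are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance


def pool_estimates(estimates, variances):
    """Combine k per-dataset results using Rubin's rules (assumed here).

    estimates: one point estimate per completed dataset
    variances: the squared standard error of each of those estimates
    Returns (pooled estimate, within, between, total variance).
    """
    k = len(estimates)
    if k < 2 or len(variances) != k:
        raise ValueError("need k >= 2 estimates and one variance per estimate")

    pooled = mean(estimates)        # average of the k point estimates
    within = mean(variances)        # W: average within-imputation variance
    between = variance(estimates)   # B: sample variance of estimates (k - 1 denominator)
    total = within + (1 + 1 / k) * between  # T = W + (1 + 1/k) B
    return pooled, within, between, total


if __name__ == "__main__":
    # Made-up numbers for illustration only; not results from the paper.
    pooled, within, between, total = pool_estimates(
        [1.0, 2.0, 3.0], [0.5, 0.5, 0.5]
    )
    print(pooled, within, between, total)
```

The (1 + 1/k) factor inflates the between-imputation component to account for using a finite number of completed datasets, so a larger k brings the total variance closer to W + B.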

Original language: English (US)
Pages (from-to): 52-60
Number of pages: 9
Journal: Genetic Epidemiology
Issue number: 1
State: Published - Jan 2008


Keywords

  • Bayesian
  • Data augmentation
  • Linkage analysis
  • Longitudinal
  • Markov chain Monte Carlo
  • Variance components

ASJC Scopus subject areas

  • Epidemiology
  • Genetics (clinical)

