Quantifying R² bias in the presence of measurement error

Karl D. Majeske, Terri Lynch-Caris, Janet Brelin-Fornari

Research output: Contribution to journal › Article › peer-review

Abstract

Measurement error (ME) is the difference between the true unknown value of a variable and the data assigned to that variable during the measuring process. The multiple correlation coefficient quantifies the strength of the relationship between the dependent and independent variable(s) in regression modeling. In this paper, we show that ME in the dependent variable results in a negative bias in the multiple correlation coefficient, making the relationship appear weaker than it should. The adjusted R² provides regression modelers an unbiased estimate of the multiple correlation coefficient. However, due to the ME-induced bias in the multiple correlation coefficient, the otherwise unbiased adjusted R² underestimates the variance explained by a regression model. This paper proposes two statistics for estimating the multiple correlation coefficient, both of which take into account the ME in the dependent variable. The first statistic uses all unbiased estimators, but may produce values outside the [0,1] interval. The second statistic requires modeling a single data set, created by including descriptive variables on the subjects used in a gage study. Based on sums of squares, the statistic has the properties of an R²: it measures the proportion of variance explained; has values restricted to the [0,1] interval; and the endpoints indicate no variance explained and all variance explained, respectively. We demonstrate the methodology using data from a study of cervical spine range of motion in children.
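The negative bias described above can be illustrated with a small simulation. The sketch below is not the paper's method and does not implement either of the two proposed statistics. It assumes a single predictor, normally distributed errors, and ME that is additive and independent of the true response; the function names and parameter values are hypothetical choices made for illustration. Under those assumptions, the population R² falls from 0.5 (true response) to about 1/3 (measured response), because ME inflates the total variance of the dependent variable without adding any explained variance.

```python
import random


def r_squared(x, y):
    """R^2 of the least-squares line y ~ a + b*x (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def simulate(n=20000, beta=1.0, sigma_eps=1.0, sigma_me=1.0, seed=0):
    """Return (R^2 using the true response, R^2 using the measured response).

    Assumed model (illustrative only):
        y_true = beta * x + eps,   eps ~ N(0, sigma_eps^2)
        y_obs  = y_true + me,      me  ~ N(0, sigma_me^2), independent of y_true
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y_true = [beta * xi + rng.gauss(0, sigma_eps) for xi in x]
    y_obs = [yi + rng.gauss(0, sigma_me) for yi in y_true]
    return r_squared(x, y_true), r_squared(x, y_obs)


r2_true, r2_obs = simulate()
print(f"R^2 with true response:     {r2_true:.3f}")
print(f"R^2 with measured response: {r2_obs:.3f}")
```

Setting `sigma_me=0.0` removes the measurement error, in which case the two values coincide.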

Original language: American English
Journal: Journal of Applied Statistics
Volume: 37
State: Published - Apr 1 2010

Keywords

  • measurement error
  • regression analysis
  • R²
  • bias correction
  • gage R&R

Disciplines

  • Mechanical Engineering
