I have a Ph.D. in Statistics & Methodology. I can help you collect high-quality data and better understand the quality of the data you already have.
Motivated misreporting occurs when respondents give incorrect responses to survey questions in order to shorten the interview; studies have detected this behavior across many modes, topics, and countries. This paper tests whether motivated misreporting affects responses in a large survey of household purchases, the U.S. Consumer Expenditure Interview Survey. The data from this survey inform the calculation of the official measure of inflation, among other uses. Using a parallel web survey and multiple imputation, this paper estimates the size of the misreporting effect without experimentally manipulating questions in the survey itself. Results suggest that household purchases are underreported by approximately 5 percentage points in three sections of the first wave of the survey. The approach used here, involving a web survey built to mimic the expenditure survey, could be applied in other large surveys where budget or logistical constraints prevent experimentation.
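As a rough illustration of the multiple-imputation step, the sketch below pools an estimated underreporting rate across M imputed data sets using Rubin's rules. The numbers and function name are purely illustrative and are not taken from the paper or the CE data.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool point estimates and variances from M imputations using Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()            # pooled point estimate
    w = variances.mean()                # within-imputation variance
    b = estimates.var(ddof=1)           # between-imputation variance
    t = w + (1 + 1 / m) * b             # total variance
    return q_bar, np.sqrt(t)

# Five hypothetical imputed estimates of the underreporting rate (illustrative only).
est, se = pool_rubin([0.048, 0.052, 0.046, 0.055, 0.050],
                     [0.00010, 0.00012, 0.00011, 0.00010, 0.00013])
print(f"pooled underreporting rate: {est:.3f} (SE {se:.3f})")
```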
The U.S. Consumer Expenditure Interview Survey asks many filter questions to identify the items that households purchase. Each reported purchase triggers follow-up questions about the amount spent and other details. We test the hypothesis that respondents learn how the questionnaire is structured and underreport purchases in later waves to reduce the length of the interview. We analyze data from 10,416 four-wave respondents over two years of data collection. We find no evidence of decreasing data quality over time; instead, panel respondents tend to give higher-quality responses in later waves. The results also hold for a larger set of two-wave respondents.
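One simple way to picture the test is to track, for each respondent and wave, the share of filter questions that trigger follow-ups and ask whether that share drops across waves. The sketch below does this on made-up data; the column names and values are hypothetical and do not reflect the CE questionnaire or the paper's actual measures.

```python
import pandas as pd

# Hypothetical long-format data: one row per respondent-wave, with the share of
# filter questions answered "yes" (names and values are illustrative).
df = pd.DataFrame({
    "resp_id":   [1, 1, 1, 1, 2, 2, 2, 2],
    "wave":      [1, 2, 3, 4, 1, 2, 3, 4],
    "share_yes": [0.42, 0.40, 0.43, 0.44, 0.35, 0.36, 0.34, 0.37],
})

# If respondents learned to avoid follow-ups, the mean share of triggered
# filter questions would decline across waves.
by_wave = df.groupby("wave")["share_yes"].mean()
print(by_wave)
print("wave 4 minus wave 1:", by_wave.loc[4] - by_wave.loc[1])
```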
Several studies have shown that high response rates are not associated with low bias in survey data. This paper shows that, for face-to-face surveys, the relationship between response rates and bias is moderated by the type of sampling method used. Using data from Rounds 1 through 7 of the European Social Survey, we develop two measures of selection bias, then build models to explore how sampling method, response rate, and their interaction affect selection bias. When interviewers are involved in selecting the sample of households or respondents for the survey, high reported response rates can in fact be a sign of poor data quality. We speculate that the positive association detected between response rates and selection bias reflects interviewers’ incentives to select households and respondents who are likely to complete the survey.
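The key modeling idea is the interaction between response rate and sampling method. A minimal sketch of such a model is below, using a linear regression on hypothetical country-round data; the variable names, values, and linear specification are assumptions for illustration, not the paper's actual models or bias measures.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical country-round data (illustrative values only):
# bias                - a selection-bias measure for the country-round
# response_rate       - reported response rate
# interviewer_selects - 1 if interviewers select households/respondents, else 0
data = pd.DataFrame({
    "bias":                [0.02, 0.05, 0.01, 0.08, 0.03, 0.09, 0.02, 0.07],
    "response_rate":       [0.55, 0.70, 0.60, 0.75, 0.58, 0.80, 0.62, 0.72],
    "interviewer_selects": [0, 1, 0, 1, 0, 1, 0, 1],
})

# The interaction term asks whether the response-rate slope differs when
# interviewers are involved in sample selection.
model = smf.ols("bias ~ response_rate * interviewer_selects", data=data).fit()
print(model.summary())
```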
Administrative data are increasingly important in statistics, but, like other types of data, may contain measurement errors. To prevent such errors from invalidating analyses of scientific interest, it is therefore essential to estimate the extent of measurement errors in administrative data. Currently, however, most approaches to evaluating such errors involve either prohibitively expensive audits or comparison with a survey that is assumed perfect. We introduce the “generalized multitrait-multimethod” (GMTMM) model, which can be seen as a general framework for evaluating the quality of administrative and survey data simultaneously. This framework allows both survey and administrative data to contain random and systematic measurement errors. Moreover, it accommodates common features of administrative data such as discreteness, nonlinearity, and nonnormality, improving on similar existing models. The use of the GMTMM model is demonstrated by application to linked survey-administrative data from the German Federal Employment Agency on income from employment, and a simulation study evaluates the estimates obtained and their robustness to model misspecification. Supplementary materials for this article are available online.
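For orientation, the classical linear MTMM measurement model that the GMTMM framework generalizes can be sketched as below; the notation is illustrative and not taken from the paper.

```latex
% Classical linear MTMM decomposition (the special case the GMTMM generalizes):
% y_{tm} is trait t (e.g., income) measured by method m (survey or administrative record),
% T_t is the trait factor, M_m a method factor capturing systematic error,
% and \varepsilon_{tm} random measurement error.
y_{tm} = \lambda_{tm} T_t + \gamma_{tm} M_m + \varepsilon_{tm},
\qquad T_t \perp M_m, \quad \varepsilon_{tm} \perp (T_t, M_m)
```

The GMTMM replaces this strictly linear, continuous measurement structure with generalized links, which is what allows discrete, nonlinear, and nonnormal administrative measures to be accommodated.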