Human-Centered Data Collection for Machine Learning: Lessons from Survey Research

Abstract

Machine learning model performance depends critically on training data quality, yet many researchers lack systematic approaches to human data collection. This workshop applies established principles from survey methodology to improve data labeling practices in computational research. We’ll examine how labeler characteristics, task design, and sampling strategies affect data quality and, ultimately, model robustness, fairness, and generalizability. Participants will learn practical strategies for recruiting diverse labelers, designing clear annotation tasks, and identifying data quality issues. The session will also address ethical considerations in crowdsourced labeling and provide concrete recommendations for improving data collection workflows.

Date
May 12, 2026, 12:00 AM to 5:30 PM
Event
AAPOR 2026 Short Course
Location
Santa Anita C - Lobby Floor

Instructors

  • Stephanie Eckman (University of Maryland)
  • Andrew Gordon (Prolific)
  • Frauke Kreuter (LMU Munich)