Data Quality, Data-Centric AI
Feb 13, 2023
Data scientists receive little training in how to collect data, leaving them unprepared when they need to quickly gather annotations.
My background in survey data collection has taught me how to collect high-quality data, and that expertise can save data scientists money and time in model development.
Stephanie Eckman
Principal Research Scientist
Building better AI through better data. I apply survey methodology to improve training data quality for machine learning.
Publications
Crowdsourced labels don’t always reflect your target users. PAIR reweights training data so your model learns from the perspectives that matter most for your application.
Stephanie Eckman, Bolei Ma, Christoph Kern, Rob Chew, Barbara Plank, Frauke Kreuter
AI models are only as good as their training data. This paper shows how more than 50 years of survey science can help ML researchers collect better data, leading to fairer, more accurate models.
Stephanie Eckman, Barbara Plank, Frauke Kreuter
The order in which you show items to annotators matters. We found that changing the sequence of examples can shift labeling decisions, which is another reason to design your annotation pipeline carefully.
Jacob Beck, Stephanie Eckman, Bolei Ma, Frauke Kreuter
Small changes in how you ask annotators to label data can dramatically change your model’s behavior. We tested 5 versions of a hate speech labeling task and found significant differences in model performance.
Christoph Kern, Stephanie Eckman, Jacob Beck, Rob Chew, Bolei Ma, Frauke Kreuter
How you design a labeling interface affects the labels you get. We show that task structure, ordering, and annotator backgrounds all shape training data quality.
Jacob Beck, Stephanie Eckman, Rob Chew, Frauke Kreuter
Events
Dec 16, 2025 9:00 AM
Stephanie Eckman
May 6, 2025 9:00 AM
New York City, NY
Stephanie Eckman
5-minute lightning talk on methods to collect annotations for NLP models
Aug 22, 2023 6:30 PM
Arlington, VA
Stephanie Eckman