Data Checking
For more than a decade, our lab has been conducting research on data entry and data checking. We’ve conducted the Data Entry Study, the Double Entry Study, the Data Checking Study, and the Data Verification Study. In the Data Entry Study (Barchard & Pace, 2011), we showed that, in psychology, single entry leads to error rates of around 1%. Most of those errors are within the allowable range for the variables and thus would be invisible on histograms and frequency tables. Even such small errors can completely change the results of a study, making significant t-tests non-significant and reversing the sign of a correlation. Therefore, it is essential to check every single item. Two item-by-item data checking methods were examined: visual checking and double entry. This study showed that visual checking was no more accurate than single entry: Both resulted in roughly 30 times as many errors as double entry.
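As a toy illustration of how a single within-range error can reverse the sign of a correlation, here is a short Python sketch. The five pairs of scores on a 1-to-5 scale are made up for this example and are not data from any of our studies. One value of 4 is mistyped as 1, which is still a legal response and so would not stand out in a histogram or frequency table.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores on a 1-5 scale (illustrative only, not real study data).
x = [1, 2, 3, 4, 5]
y_correct = [2, 3, 3, 3, 4]
# Same data with one entry error: the last 4 was typed as 1 (still within 1-5).
y_entered = [2, 3, 3, 3, 1]

r_correct = pearson_r(x, y_correct)
r_entered = pearson_r(x, y_entered)
print(f"correct data:   r = {r_correct:.2f}")
print(f"with one error: r = {r_entered:.2f}")
```

With these made-up values, the correlation changes from about 0.89 to about -0.35. Real datasets are larger, so one error usually has a smaller effect, but the example shows why range checks alone cannot catch this kind of mistake.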
The Data Checking Study (Barchard & Verenikina, 2013) used a new study design. In this study, we entered the data sheets into the computer before participants arrived. We deliberately introduced a few errors when we were entering the data and participants’ job was to locate and correct those errors. This study showed that partner read aloud and visual checking left more than 20 times as many errors in the dataset as double entry. These two studies lead us to recommend double entry whenever it is feasible.
The Data Verification Study (Barchard, Ochoa, & Stephens, under review) used a similar design. As expected, we replicated the finding that double entry is significantly and substantially more accurate than visual checking and partner read aloud. However, double entry participants did make a few errors. This study went beyond previous research by carefully examining those errors. We discovered that double entry participants always made the two sets of entries match each other, but sometimes they did not make the entries match the data sheets: Sometimes they made their entry match an incorrect original entry and sometimes they introduced new errors into the dataset. This gave us an idea about how we could make double entry even more accurate: We need to focus participants’ attention on getting the entries to match the original data sheets, rather than getting the two entries to match each other. One way to do that would be to have a third person compare the two sets of entries and correct the errors, rather than having one of the original data entry people do those corrections.
Although double entry is the most accurate method, it isn’t always possible. Many online forms are set up so that you can only enter the data once. For example, I can only enter the data once when I’m entering course grades into the Student Information System and when I’m entering account information into a journal website. And sometimes double entry would be possible, but the researcher does not want to spend the extra time. The three studies above consistently showed that double entry is the slowest data checking method. Therefore, the Data Verification Study also examined a new data checking method. In solo read aloud, a single person reads the original data sheet out loud while checking the entries on the computer screen. Solo read aloud was not as accurate as double entry, but it was faster. Moreover, for people with no previous data entry experience, solo read aloud had fewer errors than visual checking and only about half as many errors as partner read aloud (in which one person reads the data sheet out loud and a second person looks at the computer screen). We therefore recommend solo read aloud when double entry is not possible.
However, researchers rarely use double entry, even though it would usually be possible. In the Double Entry Study, we found that most people use visual checking or only check a small portion of their data. To make it easy for researchers to do double entry, our lab has developed the Poka-Yoke Data Entry System. This is an Excel-based double entry system that allows checking of mismatches and out-of-range values. We distributed our first version of the Poka-Yoke system in 2006 and raised awareness of it in a 2008 publication (Barchard & Pace, 2008). We have made several improvements to the system over the last decade. The most recent version (Barchard, Bedoy, Verenikina, & Pace, 2016) is designed so that you can’t see the first entry when you are doing the second entry, which makes it easier for two people to do the entries and which should increase accuracy even if a single person does both entries. Like previous versions, it highlights mismatches and out-of-range values to make error-correction faster. It also allows researchers to specify the allowable values for a variable (e.g., Male, Female), lets users cut-and-paste data from one location to another, and lets users insert and sort the rows as needed. Moreover, it does all this without using macros the way previous versions did (some users don’t trust macros created by other people). We hope that our free double entry system will encourage more researchers to use double entry.
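The two checks described above (mismatches between the two entries, and out-of-range values) can be sketched in a few lines of Python. This is only a minimal illustration of the idea, not how the Poka-Yoke system itself is implemented (it is an Excel file). The function name check_double_entry, the variable names, and the sample values are all hypothetical.

```python
def check_double_entry(first, second, allowed):
    """Compare two independent entries of the same data sheets.

    first, second: lists of dicts, one dict per data sheet (row).
    allowed: dict mapping each variable name to its set of allowable values.
    Returns (mismatches, out_of_range) as lists of tuples.
    """
    if len(first) != len(second):
        raise ValueError("Both entries must contain the same number of rows")
    mismatches = []
    out_of_range = []
    for row, (a, b) in enumerate(zip(first, second), start=1):
        for var, legal in allowed.items():
            # Flag values that disagree between the two entries.
            if a[var] != b[var]:
                mismatches.append((row, var, a[var], b[var]))
            # Flag values outside the allowable set, in either entry.
            for label, value in (("first", a[var]), ("second", b[var])):
                if value not in legal:
                    out_of_range.append((row, var, label, value))
    return mismatches, out_of_range

# Hypothetical variables and allowable values.
allowed = {"sex": {"Male", "Female"}, "item1": {1, 2, 3, 4, 5}}
first = [{"sex": "Male", "item1": 4}, {"sex": "Female", "item1": 2}]
second = [{"sex": "Male", "item1": 5}, {"sex": "Female", "item1": 22}]

mismatches, out_of_range = check_double_entry(first, second, allowed)
print(mismatches)
print(out_of_range)
```

A flagged mismatch only shows that at least one of the two entries is wrong. Each one should be resolved by going back to the original data sheet, not by making one entry agree with the other. An error that is made identically in both entries will not be flagged at all.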
Over the last decade, so many of you have been involved in our data checking research – as administrators, formatters, and scorers, as poster team members and literature review authors, and as Excel experts working on revisions to the Poka-Yoke Data Entry System. Thank you all for your help in developing and promoting better data checking methods. Let me know if you have any suggestions for further improvements to data checking methods.
References
Barchard, K. A., Bedoy, E. H., Verenikina, Y., & Pace, L. A. (2016, May). Poka-Yoke Double Entry System Version 3.0.76. Excel 2013 file that allows double entry, checking for mismatches, and checking for out of range values. Available at http://faculty.unlv.edu/barchard/doubleentry/ or from Kimberly A. Barchard, Department of Psychology, University of Nevada, Las Vegas, kim.barchard@unlv.edu
Barchard, K. A., Ochoa, E., & Stephens, A. K. (under review). Double entry and solo read aloud are the two best data checking methods. Manuscript submitted for publication.
Barchard, K. A., & Pace, L. A. (2008). Meeting the challenge of high quality data entry: A free double-entry system. International Journal of Services and Standards, 4, 359-376.
Barchard, K. A., & Pace, L. A. (2011). Preventing human error: The impact of data entry methods on data accuracy and statistical results. Computers in Human Behavior, 27, 1834-1839. doi:10.1016/j.chb.2011.04.004
Barchard, K. A., & Verenikina, Y. (2013). Improving data accuracy: Selecting the best data checking technique. Computers in Human Behavior, 29, 1917-1922. doi:10.1016/j.chb.2013.02.021
Kimberly A. Barchard is the Director of the Interactive Measurement Group. She is an Associate Professor in the Department of Psychology at UNLV. She joined UNLV in 2001 after obtaining her MA and PhD in Psychometrics. She works to empower students and colleagues to accomplish their personal and professional goals, particularly through the development of leadership, communication, and research skills.
Amber Stephens has been a part of the Interactive Measurement Group since Fall 2015. She graduated with a BA in Psychology with a minor in Sociology from UNLV in 2016. She will be joining the MA in Communication Studies at UNLV in Fall 2017. Eventually, she hopes to get her PhD in Social Psychology and do research regarding the self and intimate relationships.
Elizabeth Ochoa was a part of the Interactive Measurement Group between Spring 2014 and Summer 2016. She graduated with a BA in Psychology from UNLV in 2015. Since Fall 2016, she has been attending the Masters in Public Health/Peace Corps Masters International program at Colorado State University. She plans to get her PhD in Community Psychology or Public Health, so that she can work in academia, mentor students, and do research on violence prevention.