Trust in data is of paramount importance to both data professionals and customers. As a data-driven professional, you need to know how to quickly identify faulty data, what to ask for in the process, and how to hold a vendor accountable.
Guidelines for spotless data deliveries
1. Your data is vouched for
Before anything else, make sure that someone has cleaned the data, that the quality assurance process for data collection has been completed, and that the data set is robust. Don't shy away from asking vendors how their quality assurance process works. There are many simple techniques for quickly assessing how clean and complete a data set is.
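As an illustration of such a quick check, here is a minimal sketch that counts missing values, duplicate record IDs, and out-of-range numbers in a list of records. The function name, the record layout (one dict per row), and the field names in the usage example are hypothetical assumptions, not a standard tool.

```python
from collections import Counter


def quick_quality_report(rows, id_field, required_fields, valid_ranges=None):
    """Summarize missing values, duplicate IDs and out-of-range values.

    rows: list of dicts, one per record (hypothetical layout).
    valid_ranges: optional {field: (low, high)} for numeric fields.
    """
    valid_ranges = valid_ranges or {}
    missing = Counter()
    out_of_range = Counter()
    for row in rows:
        # A field counts as missing if it is absent, None, or a blank string.
        for field in required_fields:
            value = row.get(field)
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[field] += 1
        # Only numeric values are range-checked; missing ones are skipped here.
        for field, (low, high) in valid_ranges.items():
            value = row.get(field)
            if isinstance(value, (int, float)) and not low <= value <= high:
                out_of_range[field] += 1
    id_counts = Counter(row.get(id_field) for row in rows)
    duplicates = sorted((k for k, n in id_counts.items() if n > 1), key=str)
    return {
        "rows": len(rows),
        "missing": dict(missing),
        "duplicate_ids": duplicates,
        "out_of_range": dict(out_of_range),
    }


# Hypothetical survey extract used only to show the output format.
survey = [
    {"respondent_id": "R1", "age": 34, "region": "North"},
    {"respondent_id": "R2", "age": None, "region": "South"},
    {"respondent_id": "R2", "age": 210, "region": ""},
]
report = quick_quality_report(survey, "respondent_id", ["age", "region"], {"age": (18, 99)})
print(report)
# {'rows': 3, 'missing': {'age': 1, 'region': 1}, 'duplicate_ids': ['R2'], 'out_of_range': {'age': 1}}
```

A report like this is a starting point for questions to the vendor, not a verdict: a duplicate ID may be a data entry slip or a sign of something worse.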
2. Your data has no blind spots
Raw spreadsheets are eager to tell you their story! Check that you have not missed anything obvious. There are plenty of quick ways to explore a data set and spot what stands out: slicers, pivot tables, conditional formatting, sparklines, and basic consistency checks are just a few easy ways to make sure nothing obvious slips through.
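For readers who prefer code to spreadsheet tools, the sketch below mimics two of these techniques: a simple pivot-style count and a rule-based consistency check. The column names and the "brand requires car" rule are hypothetical examples; in practice the rules come from the questionnaire logic.

```python
from collections import defaultdict


def pivot_counts(rows, row_key, col_key):
    """Count records for each (row_key, col_key) pair, like a simple pivot table."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_key]][row[col_key]] += 1
    return {r: dict(cols) for r, cols in table.items()}


def inconsistent_rows(rows, rules):
    """Return (row_index, rule_name) for every row that breaks a rule.

    rules: {name: predicate}; a predicate returns True when the row is consistent.
    """
    return [
        (i, name)
        for i, row in enumerate(rows)
        for name, is_consistent in rules.items()
        if not is_consistent(row)
    ]


# Hypothetical responses: row 1 names a car brand but says it owns no car.
rows = [
    {"region": "North", "owns_car": True, "car_brand": "Fiat"},
    {"region": "North", "owns_car": False, "car_brand": "Ford"},
    {"region": "South", "owns_car": False, "car_brand": ""},
]
rules = {"brand_requires_car": lambda r: r["owns_car"] or not r["car_brand"]}

print(pivot_counts(rows, "region", "owns_car"))
# {'North': {True: 1, False: 1}, 'South': {False: 1}}
print(inconsistent_rows(rows, rules))
# [(1, 'brand_requires_car')]
```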
3. Your data is proofed and cross-checked
The devil is in the details. Use checklists and cross-checking mechanisms to verify that every step has been completed and that no errors remain. Having a native speaker proofread the write-up may also be essential to avoid wording or interpretation mistakes. Multiple readings, including one by a "fresh pair of eyes," are also a must.
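One cross-check that catches many write-up errors is recomputing the figures quoted in a report from the raw counts. The sketch below assumes the report quotes percentage shares per answer category; the function name, the 0.5-point tolerance, and the example numbers are hypothetical.

```python
def crosscheck_report(raw_counts, reported_pct, tolerance=0.5):
    """Compare percentages quoted in a report with shares recomputed from raw counts.

    raw_counts: {category: count} taken from the data set.
    reported_pct: {category: percentage} as typed into the report.
    tolerance: allowed difference in percentage points (assumed rounding slack).
    """
    total = sum(raw_counts.values())
    if total == 0:
        raise ValueError("raw_counts is empty; nothing to cross-check")
    issues = []
    for category, pct in reported_pct.items():
        if category not in raw_counts:
            issues.append(f"{category}: reported but absent from the data")
            continue
        actual = 100 * raw_counts[category] / total
        if abs(actual - pct) > tolerance:
            issues.append(f"{category}: reported {pct}%, data says {actual:.1f}%")
    # Categories present in the data but never mentioned in the report.
    for category in sorted(raw_counts.keys() - reported_pct.keys()):
        issues.append(f"{category}: in the data but missing from the report")
    return issues


raw = {"Yes": 120, "No": 60, "Unsure": 20}   # hypothetical raw counts
quoted = {"Yes": 60, "No": 35}               # hypothetical figures from a draft
for issue in crosscheck_report(raw, quoted):
    print(issue)
# No: reported 35%, data says 30.0%
# Unsure: in the data but missing from the report
```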
Businesses lose as much as 20% of revenue due to poor data quality (Kissmetrics).
4. Your data is consistent with similar research
Consistency is key. Your results should be consistent with well-established facts and should align with similar studies, especially those that followed the same methodology. If they do not, provide a convincing explanation from the get-go. Challenging and confusing your audience makes it less likely to accept your claims. So do your due diligence!
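A rough way to see whether a result is out of line with earlier research is to check whether a benchmark figure falls outside your estimate's margin of error. The sketch below is a simplification under stated assumptions: it treats each result as a proportion from a simple random sample of the same size, uses the normal-approximation 95% margin of error (z = 1.96), and ignores the benchmark study's own sampling error. Metric names and values are hypothetical.

```python
import math


def compare_with_benchmarks(estimates, sample_size, benchmarks, z=1.96):
    """Flag proportions that differ from a benchmark by more than the margin of error.

    estimates / benchmarks: {metric: proportion between 0 and 1}.
    Returns {metric: gap} for flagged metrics, where gap = estimate - benchmark.
    Assumes simple random sampling; the benchmark's own error is ignored.
    """
    flagged = {}
    for metric, p in estimates.items():
        if metric not in benchmarks:
            continue
        margin = z * math.sqrt(p * (1 - p) / sample_size)
        gap = p - benchmarks[metric]
        if abs(gap) > margin:
            flagged[metric] = round(gap, 3)
    return flagged


ours = {"owns_smartphone": 0.42, "uses_online_banking": 0.30}
earlier_study = {"owns_smartphone": 0.45, "uses_online_banking": 0.40}
print(compare_with_benchmarks(ours, 500, earlier_study))
# {'uses_online_banking': -0.1}
```

A flagged metric is not necessarily wrong; it is the result that needs the "good explanation from the get-go" before anyone else asks.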
5. Your audience is welcome to challenge your data
Welcome challenges! Your results will surely be challenged, so you might as well embrace it: overcoming challenges successfully will further validate your claims. To get started, list every question that might challenge your results and prepare an answer for each. This sounds simple but goes a long way.
Guarding yourself against both intentional and unintentional data collection errors
If you don’t make mistakes, it means you are not trying. Mistakes are normal and can easily be corrected through a healthy quality control process. However, some mistakes are made intentionally, most often during data collection. To boost their earnings, some data collectors knowingly deploy tactics that can seriously damage the quality of your data.
A classic example is interviewers who remember the addresses of cooperative respondents from hard-to-reach target groups and then approach those same respondents again in different waves of a study, disregarding the rules of random sampling. Other examples include tampering with selection filters, scripting, timestamps, or location settings on collection devices. Using machine learning algorithms to collect data can have similar consequences, whether intended or not. You always need to stay one step ahead; fortunately, most of these errors cannot escape a set of elementary cross-checking mechanisms.
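To make "elementary cross-checking mechanisms" concrete, the sketch below flags three of the patterns described above in interview metadata: the same address appearing in more than one wave, interviews that are implausibly short, and overlapping interviews by the same interviewer. The record fields, the 10-minute threshold, and the crude address normalization are hypothetical assumptions; real address matching usually needs more care.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def flag_fieldwork_anomalies(interviews, min_duration=timedelta(minutes=10)):
    """Return (flag_type, detail) tuples for suspicious interview records.

    Each interview is a dict with hypothetical keys:
    id, wave, interviewer, address, start, end (datetimes).
    """
    flags = []
    waves_by_address = defaultdict(set)
    by_interviewer = defaultdict(list)
    for iv in interviews:
        # Crude normalization so "12 Elm St" and "12 elm st " match.
        waves_by_address[iv["address"].strip().lower()].add(iv["wave"])
        by_interviewer[iv["interviewer"]].append(iv)
        if iv["end"] - iv["start"] < min_duration:
            flags.append(("too_short", iv["id"]))
    # The same household showing up in several waves breaks random sampling.
    for address, waves in waves_by_address.items():
        if len(waves) > 1:
            flags.append(("repeat_address", address))
    # One interviewer cannot run two interviews at the same time.
    for ivs in by_interviewer.values():
        ivs.sort(key=lambda iv: iv["start"])
        for prev, cur in zip(ivs, ivs[1:]):
            if cur["start"] < prev["end"]:
                flags.append(("overlap", f'{prev["id"]}/{cur["id"]}'))
    return flags


t = datetime(2024, 3, 1, 9, 0)  # hypothetical fieldwork start
interviews = [
    {"id": "A1", "wave": 1, "interviewer": "INT-7", "address": "12 Elm St",
     "start": t, "end": t + timedelta(minutes=25)},
    {"id": "A2", "wave": 1, "interviewer": "INT-7", "address": "3 Oak Ave",
     "start": t + timedelta(minutes=20), "end": t + timedelta(minutes=26)},
    {"id": "B1", "wave": 2, "interviewer": "INT-7", "address": "12 elm st ",
     "start": t + timedelta(days=30), "end": t + timedelta(days=30, minutes=30)},
]
for flag in flag_fieldwork_anomalies(interviews):
    print(flag)
# ('too_short', 'A2')
# ('repeat_address', '12 elm st')
# ('overlap', 'A1/A2')
```

Each flag is a lead for follow-up (for example, a back-check call to the respondent), not proof of fraud on its own.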
I believe the real challenge lies in building the habit of following a healthy cross-checking process based on well-defined checklists. I hope you enjoyed this article. Check out guidetoinsights.com for more!