Research Guides: Data Literacy: Getting Started with Data: Quality of Data Found Online

Data Quality Checklist

Not all data is created equal, and finding reliable information online can be a challenge. Before you use a dataset in your project, take a moment to evaluate its quality. Here’s a simple checklist to help you assess whether the data you find is trustworthy.

Trace Back to the Original Source

Find the Origin: The website where you find a dataset may be sharing only a piece of the original dataset, or hosting it on behalf of the individual or organization that collected it.

Who Collected This Data? Check who is behind the dataset. Are they a recognized expert or organization in the field?

What’s Their Expertise? Look for information about their background and credentials.

Affiliations: What institutions or organizations are they associated with? This can impact the credibility of the data.

Complete Dataset? Make sure you’re looking at the full dataset, not just a snippet or summary.

Data Interpretation: Has the data been analyzed or interpreted already? Understanding this helps you spot any potential biases.

Example: Let’s say you find a dataset on the latest social media trends, like TikTok challenges. First, check whether it comes from a reputable source, like a well-known digital marketing agency or research firm, rather than just a viral TikTok influencer’s opinion. Make sure the dataset links back to the original source that collected the data, such as an analytics company, so you know you’re not just seeing a cherry-picked summary or filtered snapshot. Look for details on how the data was gathered and whether any analysis was already done, so you can spot potential biases, such as results skewed by certain trends or seasonal effects.
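If the dataset comes as a CSV file, a quick script can help with the "Complete Dataset?" check above. The following is a minimal sketch, not a definitive test: the function name completeness_report, the sample TikTok-style rows, and the documented row count of 500 are all hypothetical and only for illustration. It counts the rows you actually downloaded, counts empty cells per column, and compares the row count to the number the original source says it collected.

```python
import csv
import io


def completeness_report(csv_text, expected_rows=None):
    """Summarize how complete a downloaded CSV appears to be.

    expected_rows is the row count stated by the original source,
    if one is documented (hypothetical parameter for this sketch).
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # Count empty or absent cells in each named column.
    missing = {}
    for row in rows:
        for column, value in row.items():
            if column is None:
                continue  # extra unnamed fields; not checked here
            if value is None or (isinstance(value, str) and value.strip() == ""):
                missing[column] = missing.get(column, 0) + 1

    report = {"rows": len(rows), "missing_by_column": missing}
    if expected_rows is not None:
        report["matches_documented_count"] = len(rows) == expected_rows
    return report


# Made-up sample data: two rows, one with a blank "views" cell.
sample = "challenge,views,date\ndance,1200,2024-01-05\nlipsync,,2024-01-06\n"
print(completeness_report(sample, expected_rows=500))
```

A row count far below the documented total suggests you have a snippet rather than the full dataset, which is a cue to go back to the original source.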

Understand How the Data Was Collected

Collection Methodology: Look for details on how the data was gathered. Is it consistent with industry standards?

Scope and Sample Size: Ensure the data’s sample size and scope are appropriate for the research question.

Potential Biases: Consider any biases that might have influenced how the data was collected.

Example: Suppose you stumble upon a dataset about trends in gaming habits collected from a popular gaming forum. Check whether the data was collected from a wide variety of gamers, including different ages and gaming preferences, or whether it’s biased towards one specific group of players. Ensure that the sample size is large enough to represent the broader gaming community. For example, data gathered only from hardcore gamers on a niche forum might not accurately reflect the habits of casual gamers or those who don’t frequently participate in online communities. Also consider timing: data collected during a major gaming event could skew the results.
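The sample size and bias questions above can be sketched with a little arithmetic. This is a rough illustration under stated assumptions: the function names and the made-up list of 100 respondents are hypothetical, and the margin-of-error formula (z times the square root of p(1 - p)/n, with z = 1.96 for roughly 95% confidence) only applies to a simple random sample, which a forum poll usually is not.

```python
import math
from collections import Counter


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion.

    Assumes a simple random sample; proportion=0.5 is the
    worst case, and z=1.96 corresponds to about 95% confidence.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


def group_shares(labels):
    """Return the fraction of the sample that falls in each group."""
    counts = Counter(labels)
    total = len(labels)
    return {group: count / total for group, count in counts.items()}


# Made-up sample: 100 forum respondents, mostly hardcore gamers.
respondents = ["hardcore"] * 90 + ["casual"] * 10

print(round(margin_of_error(len(respondents)), 3))
print(group_shares(respondents))
```

A small margin of error does not fix a lopsided sample. If one group makes up most of the respondents, collecting more responses from the same forum will not make the data representative of casual gamers.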

Consult an Expert

Seek Guidance: If you're ever uncertain about a dataset, don't hesitate to reach out to someone for help. This could be your course instructor or a librarian who can help you assess the data's reliability and relevance for your research.

Example: Suppose you come across a dataset about the latest trends in streaming services, like the rise of new platforms or changes in viewer habits. If you’re unsure about its credibility or how to interpret it, ask your professor or librarian. They can offer insights on whether the data is trustworthy or point you to better resources, much like how you'd ask for help on a tricky homework problem.