Finding data: Home

Finding datasets to use in research and teaching

People are increasingly making datasets that they've created and compiled available for others to use. Finding the data most useful to you can be a bit of a challenge, as there are many different ways that people can choose to release their data. This guide runs through the three steps of conducting an efficient search for data:

Step 1 - Identify your dataset needs
Step 2 - Search
Step 3 - Assess what you find

If you want any advice or assistance with your search, please contact researchdatasupport@sydney.edu.au or your Academic Liaison Librarian.

Step 1 - Identify your dataset needs

Make your search for data as efficient as possible by figuring out a few things before you get started. Think about and answer the following questions:

  • What information must be included in the dataset for it to be useful to me? This could include required fields, information about the spatial or temporal coverage of the data, or a description of any processing or modifications that have been made to the data.
  • Which file format(s) am I able to work with? If you are planning on analysing data using specific software, ensure that you know the file formats that can be used with that software.
  • Are there any reuse licences that are incompatible with how I will use the data? For instance, if you plan on commercialising your research, then you should avoid datasets that stipulate a non-commercial condition in their reuse licence.

Knowing the answers to these questions will help you perform an effective search by identifying the keywords, filter conditions and data sources that are most appropriate for the data that you want to find.

Step 2 - Search

There are a number of different strategies you can use to search across the various places that people make data available. The strategies are ordered from the simplest to those that require more effort or may turn up less relevant material.

Strategy 1 (Simplest) - Google's Dataset Search

Strategy 2 - Data repositories and archives

Strategy 3 - Library and other University subscribed data sources

Strategy 4 - General internet search

Step 3 - Assess what you find

Once you have found a dataset that might be useful, assess its relevance, understandability and trustworthiness so that you don't waste time trying to analyse data that doesn't meet your needs.

Relevance

Use the metadata associated with the dataset to make sure that it meets all of the criteria that you established before you started searching. Double check that:

  • The coverage of the dataset is sufficient for your needs
  • The file format is compatible with the software that you plan on using for analysis
  • The reuse licence applied to the dataset permits the activities that you will use the data for
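If you are comparing several candidate datasets, the relevance checks above can be written down as a small script. The following is a minimal sketch only: the metadata keys (`fields`, `format`, `years`, `licence`) and the names `DatasetNeeds` and `relevance_problems` are hypothetical, and real repositories describe their datasets in many different ways, so you would need to adapt it to the metadata you actually have.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetNeeds:
    """Criteria decided in Step 1 (all names here are illustrative)."""
    required_fields: set
    usable_formats: set          # lower-case file formats, e.g. {"csv"}
    start_year: int
    end_year: int
    blocked_licences: set = field(default_factory=set)


def relevance_problems(metadata: dict, needs: DatasetNeeds) -> list:
    """Return a list of reasons the dataset fails the relevance checks."""
    problems = []

    # Coverage: required fields must all be present.
    missing = needs.required_fields - set(metadata.get("fields", []))
    if missing:
        problems.append("missing fields: " + ", ".join(sorted(missing)))

    # File format: must be one the analysis software can read.
    fmt = metadata.get("format", "").lower()
    if fmt not in needs.usable_formats:
        problems.append("unsupported file format: " + (fmt or "unknown"))

    # Coverage: the dataset's years must span the period needed.
    first, last = metadata.get("years", (None, None))
    if first is None or last is None or first > needs.start_year or last < needs.end_year:
        problems.append("temporal coverage too narrow or not stated")

    # Reuse licence: must not be one that conflicts with the planned use.
    licence = metadata.get("licence")
    if licence in needs.blocked_licences:
        problems.append("incompatible reuse licence: " + licence)

    return problems


# Example: a commercial project needing daily rainfall for 2000-2020.
needs = DatasetNeeds({"date", "rainfall_mm"}, {"csv"}, 2000, 2020, {"CC BY-NC 4.0"})
candidate = {
    "fields": ["date", "rainfall_mm", "station"],
    "format": "CSV",
    "years": (1990, 2022),
    "licence": "CC BY 4.0",
}
print(relevance_problems(candidate, needs) or "No relevance problems found")
```

An empty list means the dataset passed every check. A script like this only supplements reading the metadata yourself, since licence names and coverage statements are often recorded as free text.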

Understandability

Look for readme files or other documentation that describes the dataset. To be understandable and usable, a dataset must include:

  • Definitions for any technical terms, data codes or variables used
  • Description of data collection procedures
  • Description of data processing methods, including both the cleaning and analyses that were undertaken

After reading the documentation, you should be able to understand what information is contained in the dataset and what can and cannot be done with the data.
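One quick way to test whether every variable is defined is to compare the column names in a data file against the definitions given in its documentation. The sketch below assumes the data is a CSV file with a header row and that you have copied the documented definitions into a Python dictionary; the function name `undocumented_columns` and the sample column names are hypothetical.

```python
import csv
import io


def undocumented_columns(csv_text: str, data_dictionary: dict) -> list:
    """Return the column names that have no (non-empty) definition."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader, [])]
    return [
        name for name in header
        if not data_dictionary.get(name, "").strip()
    ]


# Example: two of the three columns are defined in the readme.
sample = "site_id,temp_c,qc_flag\nA1,21.4,0\n"
definitions = {
    "site_id": "Identifier of the monitoring site",
    "temp_c": "Air temperature in degrees Celsius",
}
print(undocumented_columns(sample, definitions))
```

Any column the function returns is one you would need to ask the data creator about, or treat with caution, before using it in an analysis.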

Trustworthiness

Consider the trustworthiness of the data and the data source. Ask yourself:

  • Has the data been produced by a reputable source, such as a well-known organisation or researcher active in the field?
  • Is there enough descriptive information about the data to satisfy you that the original data collection and processing are trustworthy?
  • Has the data been produced according to current best practices, or were outdated collection and analysis methods used?