
Data publication: Step 3 - Describe your data

This guide gives practical tips for publishing your data and making sure it is findable, accessible, usable and citable. Let's publish data well!

Findable, Accessible, Interoperable, Reusable - Describe your data

Describing your data well ensures that anyone can understand, discover, and use it. It is essential to capture contextual details about how and why the data were created.

Data documentation & metadata

To describe your data, address the following questions:

  • What is it?
  • How was it (or how will it be) collected, and by whom?
  • How much data will be generated?
  • What data formats do you use?
  • What equipment or software was used?
  • Does it include any personally identifiable information or confidential data?
  • Are you using data that someone else produced? If so, where does it come from?
  • What programs or code are needed to read or understand these files?
  • What has changed (for example, new project members or new methods)? When did the change happen, and why?
  • Who will be the custodian of the data?
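If you like to keep your answers to these questions in a plain-text README, the short Python sketch below shows one way to do it. It is illustrative only: the section headings, the `build_readme` and `missing_sections` helpers, and the example answers are all hypothetical, not an official template.

```python
# Hypothetical README sections, one per documentation question in the guide.
SECTIONS = [
    "Description",
    "Collection methods and collectors",
    "Data volume",
    "File formats",
    "Equipment and software",
    "Personal or confidential information",
    "Data from other sources",
    "Software needed to read the files",
    "Change log",
    "Data custodian",
]


def build_readme(title: str, answers: dict[str, str]) -> str:
    """Return a plain-text README; unanswered sections are marked TODO."""
    lines = [title, "=" * len(title), ""]
    for section in SECTIONS:
        answer = answers.get(section, "").strip() or "TODO"
        lines += [section, "-" * len(section), answer, ""]
    return "\n".join(lines)


def missing_sections(answers: dict[str, str]) -> list[str]:
    """List the sections that still have no answer."""
    return [s for s in SECTIONS if not answers.get(s, "").strip()]


answers = {
    "Description": "Hourly rainfall readings from two field sites.",
    "Data custodian": "Example, A. (hypothetical)",
}
readme = build_readme("Rainfall survey (example)", answers)
print(readme)
print("Still to document:", missing_sections(answers))
```

Marking gaps as TODO rather than leaving them out makes it obvious which questions still need answering before you publish.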

To ensure that your data are discoverable, understandable and reusable by other researchers, document them thoroughly.

Example of a well-described dataset: Data and scripts for evaluation of researcher training in spreadsheet curation

 

Metadata

Metadata is a subset of data documentation. It is structured or schematised information about your data that describes the data's purpose, origin, creator, time, geographic location, access conditions, and terms of use. Repositories and archives use metadata to index and retrieve data and to generate a citation for the dataset, and metadata can be harvested through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) so that the data can be shared more widely.
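To give a feel for how harvesting works, here is a minimal Python sketch that builds an OAI-PMH `ListRecords` request asking for simple Dublin Core (`oai_dc`) metadata, then reads the fields out of one `oai_dc` block. The base URL and the sample record are made up for illustration; a real harvester would also fetch the response over HTTP, locate each record's `metadata` element and follow `resumptionToken`s, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint: replace with your repository's OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# A hand-written oai_dc block of the kind found inside each harvested record.
SAMPLE_RECORD = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example rainfall dataset</dc:title>
  <dc:creator>Example, A.</dc:creator>
  <dc:creator>Sample, B.</dc:creator>
  <dc:date>2021-06-30</dc:date>
</oai_dc:dc>"""


def dc_fields(xml_text: str) -> dict[str, list[str]]:
    """Group the Dublin Core element values by element name."""
    root = ET.fromstring(xml_text)
    fields: dict[str, list[str]] = {}
    for child in root:
        name = child.tag.split("}", 1)[1]  # drop the "{namespace}" prefix
        fields.setdefault(name, []).append((child.text or "").strip())
    return fields


print(list_records_url(BASE_URL))
print(dc_fields(SAMPLE_RECORD))
```

Values are collected into lists because Dublin Core elements such as `creator` may repeat.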

The repository you choose to publish in will have a form for you to fill out to describe your data. Fill in as many fields as possible to help users understand and reuse your data. If you publish in a generalist repository such as figshare, use this README template to give a more in-depth description of your data.

Examples of metadata that you can keep track of include:

  • creator/collector
  • title of dataset
  • date of collection
  • location where the data were collected
  • description
  • format of data
  • language

Many computer systems also create additional technical metadata about your files, such as file size and the date the file was last modified. 
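You can read some of this technical metadata yourself. The Python sketch below uses the standard library's `pathlib` and `datetime` modules to report a file's name, extension, size and last-modified time. The `technical_metadata` helper and the sample file are hypothetical, and which timestamps are available can vary between operating systems.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def technical_metadata(path: Path) -> dict[str, object]:
    """Collect basic file-system metadata for one file."""
    info = path.stat()
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "file_name": path.name,
        "extension": path.suffix,
        "size_bytes": info.st_size,
        "last_modified_utc": modified.isoformat(timespec="seconds"),
    }


# Create a small throwaway file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "survey_2023.csv"
    sample.write_bytes(b"id,value\n1,42\n")
    example = technical_metadata(sample)

print(example)
```

Recording the time in UTC avoids ambiguity when collaborators work in different time zones.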

Metadata standards

Some fields of research have developed specific metadata standards that set out the types of information about your data that should be documented. These standards ensure that you record a complete, consistent set of information about each part of your data, and make it possible to organise your dataset alongside other datasets. If you’re working with large datasets, databases, or data management systems, contact ses.admin@sydney.edu.au for advice on metadata standards that might suit your area of research, or browse standards by discipline on the Digital Curation Centre’s website.

Examples of metadata standards:

  • FGDC (Federal Geographic Data Committee)
  • DDI (Data Documentation Initiative)
  • Dublin Core
  • Darwin Core
  • ABCD (Access to Biological Collections Data)
  • AVMS (Astronomy Visualization Metadata Standard)
  • CSDGM (Content Standard for Digital Geospatial Metadata)
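As an illustration of how a standard structures this information, the Python sketch below maps the example metadata fields from earlier on this page to Dublin Core elements and writes them as XML. The mapping (for instance, recording the collection location as `dc:coverage`) is one reasonable choice rather than an official crosswalk, and the `record` wrapper element and `to_dublin_core` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Assumed mapping from the example fields to Dublin Core element names.
FIELD_TO_DC = {
    "creator": "creator",
    "title": "title",
    "date_of_collection": "date",
    "location": "coverage",
    "description": "description",
    "data_format": "format",
    "language": "language",
}


def to_dublin_core(record: dict[str, str]) -> str:
    """Serialise non-empty fields as Dublin Core elements inside a <record> wrapper."""
    root = ET.Element("record")  # hypothetical wrapper, not part of Dublin Core
    for field, element in FIELD_TO_DC.items():
        value = record.get(field, "").strip()
        if value:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = to_dublin_core({
    "creator": "Example, A.",
    "title": "Rainfall survey",
    "location": "Sydney, NSW",
    "language": "en",
})
print(xml_text)
```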

Vocabularies

Vocabularies are controlled lists of terms that can be used to describe data so that it can easily be found. Vocabularies can be on any subject. They range from short, simple lists to long, complex hierarchies of terms organised into tree structures.

Sometimes these more complex vocabularies are referred to as ontologies, taxonomies, or thesauri. Strictly speaking, vocabularies are simple lists of terms, whereas ontologies include the contextual relationships between the terms.

 

Taxonomies

Taxonomies are ontologies that classify terms into hierarchies, and thesauri are ontologies that point to synonyms or alternative terms. In practice, however, these words are often used interchangeably.
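To make the difference concrete, the Python sketch below models a tiny, made-up vocabulary in two ways: as a taxonomy, where each term points to its broader term, and as a thesaurus, where alternative terms point to a preferred term. The terms and helper names are hypothetical; published vocabularies usually express these relationships in formats such as SKOS rather than Python dictionaries.

```python
# Taxonomy: each term points to its broader (parent) term; None marks the top.
BROADER = {
    "Eucalyptus": "Trees",
    "Trees": "Plants",
    "Plants": None,
}

# Thesaurus: alternative terms (in lower case) point to the preferred term.
PREFERRED = {
    "gum tree": "Eucalyptus",
    "eucalypt": "Eucalyptus",
}


def preferred_term(term: str) -> str:
    """Replace an alternative term with its preferred term, if one is known."""
    return PREFERRED.get(term.strip().lower(), term.strip())


def hierarchy_path(term: str) -> list[str]:
    """Walk from a term up to the top of the taxonomy."""
    path: list[str] = []
    current = preferred_term(term)
    while current is not None:
        path.append(current)
        current = BROADER.get(current)
    return path


print(hierarchy_path("Gum tree"))
```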

Here, the term vocabulary will be used as an umbrella term, to cover all types of vocabularies, ontologies, taxonomies, and thesauri.

 

Why are vocabularies useful?

Using terms from a controlled vocabulary to describe your data means that the metadata (the data about your data) you create will be more consistent and easier for other researchers to understand and find. If you use a controlled vocabulary, then you can be certain that the terms you use are not only consistent across your own dataset, but are also consistent with all the other datasets in your field that use the same vocabulary. Vocabulary terms are well documented and clearly defined, so if you use a vocabulary to describe your data, then other researchers will be able to look up what those terms mean and thereby understand them.
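The consistency benefit is easy to check automatically. Assuming you have a vocabulary's terms as a simple list, this Python sketch normalises a dataset's keywords and separates those that match the vocabulary from those that need revising. The vocabulary terms and the `check_keywords` helper are hypothetical.

```python
# Hypothetical controlled vocabulary, stored in lower case for matching.
VOCABULARY = {"rainfall", "soil moisture", "air temperature"}


def check_keywords(keywords: list[str]) -> tuple[list[str], list[str]]:
    """Split keywords into those found in the vocabulary and those that are not."""
    matched: list[str] = []
    unmatched: list[str] = []
    for keyword in keywords:
        term = " ".join(keyword.lower().split())  # lower-case, collapse extra spaces
        (matched if term in VOCABULARY else unmatched).append(term)
    return matched, unmatched


matched, unmatched = check_keywords(["Rainfall", "  soil   moisture", "precip"])
print("Matched:", matched)
print("Check against the vocabulary:", unmatched)
```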

 

Finding vocabularies

Search the following registries or portals for suitable vocabularies to help you describe your data:

 

Using vocabularies

There are several ways of using vocabularies. For example:

  • As a guide for selecting appropriate terms when developing file naming conventions
  • For creating pre-populated dropdown lists in survey tools or spreadsheets
  • When developing custom databases or research software applications
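As an example of the first use, the Python sketch below builds file names only from terms on approved lists, so every file follows the same convention. The site codes, measurement terms and name pattern (date, site, measurement, version) are hypothetical; the same lists could also supply the options for a dropdown in a survey tool or spreadsheet.

```python
from datetime import date

# Hypothetical controlled lists for two parts of the file name.
SITE_CODES = {"NTH", "STH"}
MEASUREMENTS = {"rainfall", "soiltemp"}


def make_file_name(site: str, measurement: str, day: date, version: int, ext: str = "csv") -> str:
    """Build a file name like 2023-05-01_NTH_rainfall_v01.csv from approved terms."""
    if site not in SITE_CODES:
        raise ValueError(f"Unknown site code: {site!r}")
    if measurement not in MEASUREMENTS:
        raise ValueError(f"Unknown measurement term: {measurement!r}")
    return f"{day.isoformat()}_{site}_{measurement}_v{version:02d}.{ext}"


print(make_file_name("NTH", "rainfall", date(2023, 5, 1), 1))
```

Putting the ISO date first means the files sort chronologically in most file browsers.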

If you're using a vocabulary as a reference tool, you may wish to bookmark it, or save a copy of it, so that you can browse it when you’re creating your data documentation. You can use it to look up appropriate vocabulary terms for describing your data when you need to.

RightField is a tool that lets you use an existing vocabulary to create a dropdown list in Microsoft Excel spreadsheets. The tool was developed by researchers for researchers and is free and open source.

RightField integrates with BioPortal, so vocabularies published there can be used directly. You can also download vocabularies found through other registries and portals and use them with RightField.

Not sure where to start looking for vocabularies?

Resources

Guidance on organising data, UK Data Service

Naming convention, University of Edinburgh

ANDS metadata page, Australian National Data Service

Guide to writing “readme” style metadata, Cornell University

DataCite Metadata Schema

Support

For further assistance in publishing your research data, email us!

Aboriginal and Torres Strait Islander peoples are advised that this website may contain images, voices and names of people who have died.

The University of Sydney Library acknowledges that its facilities sit on the ancestral lands of Aboriginal and Torres Strait Islander peoples, who have for thousands of generations exchanged knowledge for the benefit of all.