
Data Publication: Step 5 - Choose a suitable licence

This guide will give you practical hints and tips to publish your data and ensure that it is findable, accessible, usable and citable. Let's publish data well!

[Reusable] - Choose a suitable licence

Applying a licence to your research data will help other people understand how they can reuse the data and what kind of things you do and don’t want them to do with it. More information about publishing research data and copyright can be found at Publishing your research.

Which licence should you choose?

Funders, journals, or your selected repositories may specify a licence that you must use. Make sure you check which licence to apply, and that you’re happy with the reuse that will be possible for others under that licence, before publishing your data.

If there are no licensing requirements for the data that you’re publishing, then you can choose a licence based on how you want others to reuse your data. Creative Commons provide a set of six different licences that put various conditions on reuse. All six licences require that anyone making use of your data attributes you. The Creative Commons Attribution Licence 4.0 (CC BY) is often a good choice for licensing data; however, you can choose to apply whichever licence you feel best suits your data. Creative Commons have a handy licence picking tool to help you select the best licence based on how you want others to reuse your data.
 

It’s recommended that you don’t use the No Derivatives condition when licensing datasets, as that condition could disallow most of the activities that people would want to undertake when reusing data. For example, combining your data with other datasets, or even simply creating a plot using your data, would likely be forbidden to others if you applied the No Derivatives condition. Creative Commons provide detailed information about the use of their licences with data and databases.

 

Other licences and waivers exist, such as the Open Data Commons licences for data, GNU software and documentation licences, or CC0 Public Domain Dedication. If you wish to use any of these, or another alternative licence for your work, or if you want any assistance in selecting a licence for your data, you should contact the Library’s Copyright team.

Applying a licence to your data

There are several ways to apply a licence to your dataset. In almost all cases it’s a good idea to add a rights statement in a prominent place within your dataset, as well as at the location where your data will be hosted. The statement should include the name of the licence you’ve chosen and the URL at which the full text of the licence can be found. If you use the Creative Commons licence picker, it creates a rights statement for your chosen licence that you can copy into your dataset.
 

Example Creative Commons rights statement:
[Name of dataset] is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

 

Good, prominent places to include a rights statement in your dataset are in the top level of your folder structure as part of your readme text file or as a separate licence.txt plain text file, on the first page of your data dictionary, or as part of the standard template for your interview transcripts.
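As a rough illustration of the licence.txt option, the following Python sketch builds a rights statement in the same form as the example above and writes it to the top level of a dataset folder. The function names, the folder path and the dataset name are hypothetical and chosen only for this example; the licence name and URL are copied from the example statement in this guide.

```python
from pathlib import Path

# Licence name and URL copied from the example rights statement in this guide.
LICENCE_NAME = "Creative Commons Attribution 4.0 International License"
LICENCE_URL = "http://creativecommons.org/licenses/by/4.0/"


def rights_statement(dataset_name: str,
                     licence_name: str = LICENCE_NAME,
                     licence_url: str = LICENCE_URL) -> str:
    """Build a plain-text rights statement naming the licence and its URL."""
    return (
        f"{dataset_name} is licensed under the {licence_name}. "
        f"To view a copy of this license, visit {licence_url}."
    )


def write_licence_file(dataset_dir: Path, dataset_name: str) -> Path:
    """Write licence.txt into the top level of the dataset folder."""
    # Hypothetical helper: creates the folder if it does not exist yet.
    dataset_dir.mkdir(parents=True, exist_ok=True)
    target = dataset_dir / "licence.txt"
    target.write_text(rights_statement(dataset_name) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    # "my_dataset" and "My Survey Data" are placeholder values.
    print(write_licence_file(Path("my_dataset"), "My Survey Data"))
```

The same statement text can be pasted into a readme file or data dictionary instead of a separate file.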

If you’re publishing your data via a repository or journal, you will usually be asked to specify licence information when you submit your data. This allows the repository or journal to display a rights statement on your dataset’s record. The rights statement is usually also provided in a machine-readable form, making your data more discoverable and reusable.
 

An example of a CC BY rights statement clearly displayed on a dataset record can be seen in the Earth Sciences repository Pangaea.
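To give a sense of what a machine-readable rights statement might look like, here is a minimal Python sketch that produces a JSON-LD description of a dataset using the schema.org vocabulary, in which the `license` property holds the licence URL. This is an assumption made for illustration only: repositories and journals use their own metadata schemas, and the function name and dataset name are hypothetical.

```python
import json


def dataset_record(name: str, licence_url: str) -> str:
    """Return a JSON-LD description of a dataset, including its licence URL."""
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        # schema.org "license" property: a URL pointing at the licence text.
        "license": licence_url,
    }
    return json.dumps(record, indent=2)


if __name__ == "__main__":
    # Placeholder dataset name; the URL is the CC BY 4.0 address used in this guide.
    print(dataset_record("My Survey Data",
                         "http://creativecommons.org/licenses/by/4.0/"))
```

Because the licence is recorded as a URL rather than free text, software such as search engines and harvesters can identify it without interpreting the wording of the statement.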

 

If you are publishing your data on a website, you should make sure to include the rights statement on the page where people will be able to download your dataset. The Creative Commons licence picker also creates the HTML code for the rights statement of your chosen licence, allowing you to easily add the statement and licence icon to your website.
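If you prefer to generate the statement yourself, a sketch along the following lines produces a simple HTML rights statement with a link to the licence. It is not the exact markup that the Creative Commons licence picker produces, and it omits the licence icon; the function name and dataset name are hypothetical. The `rel="license"` attribute is a standard HTML link type that marks the linked page as the licence for the current content.

```python
from html import escape


def rights_statement_html(dataset_name: str,
                          licence_name: str,
                          licence_url: str) -> str:
    """Build an HTML paragraph that names the licence and links to its text."""
    return (
        f"<p>{escape(dataset_name)} is licensed under the "
        # rel="license" signals that the link target is the applicable licence.
        f'<a rel="license" href="{escape(licence_url, quote=True)}">'
        f"{escape(licence_name)}</a>.</p>"
    )


if __name__ == "__main__":
    # Placeholder dataset name; licence details are those used in this guide.
    print(rights_statement_html(
        "My Survey Data",
        "Creative Commons Attribution 4.0 International License",
        "http://creativecommons.org/licenses/by/4.0/",
    ))
```

Place the resulting paragraph on the page where people download the dataset, so the licence is visible at the point of access.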

Changing or removing a licence

Releasing data under a licence that allows for reuse means that anyone with a copy of your data can continue to make use of and distribute the data under the original conditions of the licence. This can be a problem if you decide that you want to restrict certain reuse activities that were allowed under the original licence, or if you no longer wish to allow your data to be reused. You can apply new, more restrictive licence conditions to a new version of your data, but any copies already distributed under the original licence will still be usable under the original licence conditions.

Changing to a more permissive licence for your data is less problematic, as you will then be publishing a version of your data that allows for additional types of reuse, beyond what was permitted under the licence you originally selected. For example, you could publish a dataset under a Creative Commons Attribution Non-Commercial licence (CC BY-NC 4.0), and then later decide that you are happy for people to use the data commercially. You could then update the licence to be a Creative Commons Attribution licence (CC BY 4.0). Anyone who wanted to use the data for commercial purposes would then be able to use the new version of the data, and all previously allowed uses of the data would still be permitted.

Support

For assistance in understanding copyright issues with your data or help with selecting the best reuse licence, contact Copyright Services.