Data Publication: Controlling access to published data

This guide will give you practical hints and tips to publish your data and ensure that it is findable, accessible, usable and citable. Let's publish data well!

Access conditions

Publishing your data doesn’t necessarily mean that your data must be available to anyone and everyone. If there are concerns that releasing your data openly could cause harm to someone or something, or have other negative consequences, then you can choose to apply specific restrictions on how people can gain access to your data.

The terms we use to describe the different access levels (open, mediated, closed and embargoed) are widely used, but different institutions or repositories may use slightly different terms or definitions. Before you publish, check how your chosen repository refers to the access levels it provides so that you choose the right one.

Figure: Access conditions depicted along a spectrum, from Closed, to Mediated, to Embargoed, to Open.
  • Closed (closed padlock): A record of the data is published, but access to the data is not granted.
  • Mediated (key): A record of the data is published and provides information on how to apply for access to the data. Access is approved by the researcher(s) or a nominated data custodian.
  • Embargoed (clock showing a period of time): The data can’t be accessed until the end of a specified time period, after which they become openly accessible.
  • Open (open padlock): The data are freely available to be accessed by anyone.

Levels of access

Open access

There are no restrictions on access to the data; anyone can view and download a copy.

This ease of access makes your data more likely to be reused and makes it possible for others to verify the results of your research. Open access is the best choice for publishing data that aren’t sensitive, such as most non-human data. Provided consent has been obtained from participants, human research data may also be published through open access, usually in a non-identifiable form.

Example: Adjudicated dataset for “Spin” in published biomedical literature: A methodological systematic review is an openly accessible dataset published in the University of Sydney’s institutional repository. You can download the data as a CSV file directly from the dataset record, and you don’t need to request access from anyone.

Embargoed access

A description of your dataset is published, including information such as the dataset title, who created it, and what the data are; however, the dataset is inaccessible until after a specified period of time has elapsed. At the end of the embargo period, data will become available by either open or mediated access, depending on the option that you’ve selected.

You might apply an embargo to your data if, for example, you want to publish research based on the data before making the data accessible, or you need to finalise a commercial benefit arising from the data, such as a patent, before releasing them.

Mediated access

A description of your dataset is published, including information such as the dataset title, who created it, and what the data are; however, others won’t be able to access the data until after they apply and have their application approved. Conditions of access are usually set by the owner or submitter of the data and may include providing proof that the requester is a genuine researcher and that they have ethical approval from their own institution to undertake the research.

Mediated access enables data to be shared and reused by other researchers while reducing the risk of harm that might result from a wider release of the data. It is a good choice for sensitive data, such as identifiable human data or data with a significant risk of re-identification. Note that participant consent is still needed to publish human data through mediated access.

Example: Effect of glass markings on drinking rate in social alcohol drinkers (study one) is a mediated access dataset available from the University of Bristol’s data.bris research data repository. You can’t access the data directly, but the dataset record explains how to request access by filling out a data request form. Note that the data.bris repository uses the term ‘restricted’ to refer to what we define as mediated access.

Closed access

A description of your dataset is published, including information such as the dataset title, who created it, and what the data are; however, the dataset is inaccessible and there is no process in place to allow others to apply for access to it.

Publishing a record of the dataset informs others that the research has been done and that the data exist, even if the data are too sensitive to be shared beyond the original research team. Closed access is rarely used, but it might be an appropriate choice if:

  • you need to securely archive sensitive data;
  • you have published a version of the data via open or mediated access with sensitive information removed, and you would like a record of the original, unmodified version of the data; or
  • you have worked on developing a dataset, but don’t have the right to publish the data.

Example: [Chemical Warfare - Trials:] Munitions Supply Laboratories. Abstract reports indexed in records of Chemical Defence Board [Part 3 of 23] is a closed record in the National Archives of Australia’s RecordSearch. Information about the item and the reason for restricting access are provided, but there is no way to apply for access.

Metadata only records

Some repositories allow you to publish a record of your dataset that includes information describing the data, but that doesn’t directly provide access to the dataset itself. This is called a metadata only record. There are a variety of reasons that you may wish to publish a metadata only record, including:

  • You’ve previously published the data elsewhere, but you need to publish a record in a specific repository to fulfil funder or institutional requirements. The metadata only record should provide a link to the original location of the data.
  • You’ve previously published the data elsewhere, but you would like to make your data more findable by publishing metadata only records for the data in other repositories. These metadata only records should provide links to the original location of the data.
  • Your data are sensitive and can only be shared with people who meet specific conditions. The metadata only record will include information on how to apply for access to the data. See Mediated access above.
  • You don’t have the right to publish the data, but want to let people know of the work that has been done and the data that have been created. See Closed access above.
  • Your data are physical, such as tissue samples or rocks, rather than digital. The metadata only record will include the physical location where the data are stored and information on how the data can be accessed.

Example: Acacia cyclops - AdaptNRM module 2: Invasive plant species and climate change is a record in Research Data Australia for a dataset that is actually stored in CSIRO’s Data Access Portal. If you click Go to Data Provider in the Research Data Australia record, you will be taken to the original CSIRO record of the dataset, where you can view and download the data. The metadata only record in Research Data Australia was created to increase the visibility and findability of the dataset.

Culturally sensitive data

Data collected from or in collaboration with people or communities from some cultural backgrounds may be subject to additional sensitivities or protocols regarding access and reuse. Before you publish any data related to Aboriginal and Torres Strait Islander peoples, for example, you must work with the community to identify any sensitive material and any local protocols the community uses to govern access. You must ensure that your published data follow any identified protocols and protect secret or sacred information. This may include applying specific restrictions on access, such as using mediated access so that only community members, or researchers who have been given community approval, can access the data. If you do restrict access to all or part of your data, it’s important to put a sustainable process in place so that people who have the right to access the data, such as members of the community, will always be able to do so.

Mediated access scenarios

Mediated access, whereby you set conditions on when and how access is granted, allows your data to be reused safely while protecting your research participants from harm. If your data are sensitive, and the sensitive information can’t be removed without the dataset losing value, then mediated access is a good option to choose when publishing.

Scenario 1 – Identifiable domestic violence data

A researcher investigating domestic violence collects identifiable data from their research participants. The researcher obtains consent from the participants to share the identifiable data via mediated access for specific research purposes. After the research is published, other researchers who wish to access the data must meet two criteria: they must demonstrate that they will use the data only for the specified purposes, and that they have gone through an appropriate ethics approval process. Researchers who show that they meet both criteria are granted access to the data.

Scenario 2 – Culturally sensitive data

Researchers working with a community collect culturally significant data that they hold in trust for that community. The researchers publish the data in a repository with specific conditions regarding who may access it. Community members can access the data at any time without going through an approval process, whereas other researchers can access the data only after they have been given community approval to use it in their research.

Scenario 3 – Potentially re-identifiable health data

A team of medical researchers collects data on a rare health disorder from patients. The researchers obtain consent to publish a non-identifiable version of the data via open access. However, because the disorder is rare, a risk assessment of the dataset identifies a risk that individuals could be re-identified if the data were linked with other datasets. To mitigate this risk, the dataset is published via mediated access, and researchers are granted access only if they sign an access agreement stating that they will not link the data to other datasets.

Support

For further assistance in publishing your research data, please contact the Digital Curation and Data team.