
Data Publication: Step 1 - Determine if you can publish your data

This guide will give you practical hints and tips to publish your data and ensure that it is findable, accessible, usable and citable. Let's publish data well!

Determine if you can publish your data & being FAIR

Before you publish your data, consider whether you can legally and ethically do so. Most of the time there won’t be a problem, but it’s good to be informed before you publish. Two of the most important things to consider are whether your data is too sensitive to publish and whether you have the rights to make the data available.

Then, you can assess whether your data is FAIR (Findable, Accessible, Interoperable, Reusable) by using the ANDS FAIR data self-assessment tool.

Is your data sensitive?

Sensitive data may contain:

  • details about a person that are identifiable and could put someone at risk if made available, e.g. their name along with information about their political association or religious beliefs
  • information about an endangered species, e.g. the location of the species
  • culturally sensitive information, e.g. information relating to a cultural practice
  • commercially sensitive information, e.g. a trade secret or a patent
  • information that poses a risk to national security (by law, data that poses a risk to national security must not be published under any circumstances).

You may still be able to publish data that contains sensitive information. However, you will need to take extra precautions, such as deidentifying the data and setting access conditions so that approval is needed before others can access your published data.
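As an illustration of the deidentification step mentioned above, here is a minimal Python sketch. The field names (name, email, age_group, answer) and the function deidentify are hypothetical and are not part of any University tool. The sketch only removes direct identifiers and swaps them for arbitrary codes. Real datasets may also need indirect identifiers (such as rare combinations of age and location) to be reviewed before publication.

```python
def deidentify(records, id_field, direct_identifiers):
    """Return copies of records with direct identifiers removed.

    Each distinct value of id_field is replaced by an arbitrary code
    (P001, P002, ...). The returned key maps original values to codes;
    it should be stored securely and never published with the data.
    """
    key = {}
    cleaned = []
    for record in records:
        original = record[id_field]
        if original not in key:
            key[original] = f"P{len(key) + 1:03d}"
        # Keep every field except the identifier fields.
        row = {
            field: value
            for field, value in record.items()
            if field != id_field and field not in direct_identifiers
        }
        row["participant_code"] = key[original]
        cleaned.append(row)
    return cleaned, key


# Hypothetical survey responses, for illustration only.
responses = [
    {"name": "Alex Example", "email": "alex@example.org", "age_group": "30-39", "answer": "yes"},
    {"name": "Sam Sample", "email": "sam@example.org", "age_group": "40-49", "answer": "no"},
    {"name": "Alex Example", "email": "alex@example.org", "age_group": "30-39", "answer": "no"},
]
published, key = deidentify(responses, id_field="name", direct_identifiers={"email"})
```

Here published holds the rows that could be shared, while key stays with the research team so that records can be re-linked if an approved request requires it.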

Find more information on how to publish sensitive data. 

Do you have the rights to publish the data?

Understanding whether you have the rights to publish research data can be confusing. University of Sydney staff and students should consult the Intellectual Property Policy 2016. Depending on how your research was conducted, there may be other things you should consider. A few common situations where further consideration may be needed are:

You’ve used data collected by someone else

If you’ve used data collected by another person or an organisation, like the Australian Bureau of Statistics or the Department of Health, you’ll need to check what your rights are for the data. If the data is publicly available, checking what licence the data was published under will help determine if you can make your dataset available. If you entered into an agreement to access and use the data, check the agreement to see what terms exist regarding publishing data derived from the original dataset. If you’re unsure, contact the original collector of the data or the Digital Curation and Data team for assistance.

The data was collected as part of a group

If you’ve collected the data as part of a research group, you should get permission from the other researchers before you publish the data, and give appropriate credit to your collaborators upon publication. If you’re working on a project with internal and external collaborators, you should check whether an agreement clarifying intellectual property rights over the data was put in place at the commencement of the project. You can contact the Digital Curation and Data team for more information.

A commercial organisation funded your research

If you’ve conducted research funded by a commercial organisation, the organisation may assert ownership of any data that was collected during the project. You should read through any agreement that was made between you (or the University) and the commercial organisation at the commencement of the project, or check with the organisation to see if they assert ownership of all data. As a result, you may not be able to make the data openly available, you may need to set access conditions to control access to the published data, or the organisation may have specific instructions for you to follow when making data available. You can contact the Digital Curation and Data team or the Commercial Development and Industry Partnerships team for more information.

Indigenous cultural and intellectual property rights

If you are collecting data from Aboriginal or Torres Strait Islander research participants, it is important that you recognise and respect Indigenous Cultural and Intellectual Property (ICIP) rights before you publish any data. ICIP rights cover all aspects of Indigenous cultural heritage, including cultural practices, traditional knowledge, literary and artistic works, ancestral remains and genetic material.

The Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) provides guidelines on ethical research, including ICIP, which can be found on their ethics webpage: https://aiatsis.gov.au/research/ethical-research

Raw, processed, aggregated…What state should my data be in?

Publish your data in a state that you think will be most useful to others, a state that’s considered standard in your field of research or a state that’s required by your funder or journal publisher. For instance, Nature requires that: “Authors should provide their data in the ‘rawest’ form that will permit substantial reuse”.

Raw data

Publishing raw data (the immediate outputs from data collection) allows researchers to re-use data in its purest form, making it easier for them to conduct replication studies and find new patterns using your data. For reproducibility, it also means that others are able to test your pre-processing methodology to determine if it’s accurate. This doesn’t mean that some quality control and validation of the data shouldn’t be done before publication – the dataset should be in a state where it can be interpreted and used by another person.

Example: The continuous water monitoring network run by the NSW Office of Water publishes real-time data from its water monitoring systems. These real-time reports on river levels and flows help emergency service agencies predict areas at risk of flooding.

Things to consider when publishing

  • Ensure that your raw data is accompanied by documentation (such as metadata or a readme file) that describes the software or instruments used to collect the data, and your collection methodology.
  • You should consider ethical issues before publishing your raw data, especially if your research involves human participants. Generally, raw data that contains both identifiable and sensitive information can’t be published without being processed to protect individual identities.
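The readme file mentioned in the first point above can be as simple as a short plain-text file. The Python sketch below shows one possible layout. The section headings, the function build_readme and the sample values are all hypothetical, so check whether your repository or discipline has its own readme template before relying on this one.

```python
def build_readme(title, instruments, software, methodology):
    """Assemble a plain-text readme describing how raw data was collected."""
    lines = [f"Dataset: {title}", ""]
    lines.append("Instruments used to collect the data:")
    lines += [f"  - {item}" for item in instruments]
    lines.append("")
    lines.append("Software used to collect the data:")
    lines += [f"  - {item}" for item in software]
    lines.append("")
    lines.append("Collection methodology:")
    lines.append(f"  {methodology}")
    return "\n".join(lines) + "\n"


# Hypothetical values, for illustration only.
readme = build_readme(
    title="Example river level readings",
    instruments=["Example pressure sensor (model unknown)"],
    software=["Example logging software v1.0"],
    methodology="Readings were taken every 15 minutes at a single site.",
)

# Save the readme alongside the data files before depositing them.
with open("README.txt", "w", encoding="utf-8") as handle:
    handle.write(readme)
```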

Processed data

Processed data is the result of editing, cleaning and modifying raw data in preparation for data analysis. This may also involve the standardisation and normalisation of your data, e.g. converting all values to the same units or standardising spellings. Processed data can also be the result of removing or anonymising sensitive data in a dataset.
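The two kinds of standardisation mentioned above (converting values to the same units and standardising spellings) might look like the following Python sketch. The field names, the spelling table and the function process are hypothetical examples. The function also returns a log of every change it makes, which can be copied into your documentation.

```python
# Hypothetical lookup tables, for illustration only.
SPELLINGS = {
    "nsw": "New South Wales",
    "n.s.w.": "New South Wales",
    "new south wales": "New South Wales",
}
TO_METRES = {"m": 1.0, "cm": 0.01, "mm": 0.001}


def process(raw_rows):
    """Standardise region spellings and convert all levels to metres."""
    processed, log = [], []
    for index, row in enumerate(raw_rows):
        tidy = row["region"].strip()
        region = SPELLINGS.get(tidy.lower(), tidy)
        metres = round(float(row["level"]) * TO_METRES[row["unit"]], 3)
        processed.append({"region": region, "level_m": metres})
        # Record each modification so it can be documented on publication.
        if region != row["region"]:
            log.append(f"row {index}: region {row['region']!r} -> {region!r}")
        if row["unit"] != "m":
            log.append(f"row {index}: level converted from {row['unit']} to m")
    return processed, log


raw = [
    {"region": "NSW", "level": "152", "unit": "cm"},
    {"region": "New South Wales", "level": "1.48", "unit": "m"},
    {"region": " n.s.w. ", "level": "1495", "unit": "mm"},
]
processed, log = process(raw)
```

The raw rows are left untouched, so both the raw and processed versions remain available if a repository asks for both.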

Publishing processed data is the expected standard for most journals and repositories, as the data can be easily understood and reused by other researchers. It also reduces the risk that errors in the data will affect the results of any research that reuses it.

Things to consider when publishing

  • Document how you have cleaned or modified your data in your documentation.
  • You should retain or publish a copy of your raw data if required. Some repositories, such as the Gene Expression Omnibus (GEO), require users to submit both raw and processed datasets.

Aggregated data

Publishing data that has been aggregated (individual observations that have been summarised) can be useful when your dataset contains identifiable information or when summarising your results in a research paper. However, aggregated data on its own might not be useful if the level of aggregation makes the data too coarse for others to use for further analysis or comparison.

Example: The Australian Bureau of Statistics publishes datasets in aggregate form to the public, but also makes its raw data available through mediated access for genuine researchers who require more detailed data points.
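To make the idea of aggregation concrete, the Python sketch below summarises individual observations into one row per group, reporting the number of observations and their mean. The field names and the function aggregate are hypothetical. Note that very small groups can still identify individuals, so the group sizes in the output are worth checking before publication.

```python
from collections import defaultdict
from statistics import mean


def aggregate(observations, group_field, value_field):
    """Summarise individual observations as a count and mean per group."""
    groups = defaultdict(list)
    for obs in observations:
        groups[obs[group_field]].append(obs[value_field])
    return [
        {group_field: group, "n": len(values), "mean": round(mean(values), 2)}
        for group, values in sorted(groups.items())
    ]


# Hypothetical individual-level observations, for illustration only.
observations = [
    {"age_group": "30-39", "score": 12},
    {"age_group": "30-39", "score": 15},
    {"age_group": "40-49", "score": 9},
    {"age_group": "40-49", "score": 10},
    {"age_group": "40-49", "score": 14},
]
summary = aggregate(observations, "age_group", "score")
```

The summary contains two rows, one per age group, in place of the five individual observations. This shows the trade-off described above: individuals are harder to identify, but the individual scores can no longer be recovered for further analysis.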

Data with negative or null results

Data that demonstrates negative or null results can, and should, be shared as even negative results can tell us something meaningful. Publishing this kind of data has the potential to reduce waste through duplication of research effort and inspire others to build upon your data using different methodologies or variables.

When publishing negative data, you should assess if it will be useful to others and consider whether the benefits of publishing negative data outweigh the costs (both financial and time spent) of preparing the data for publication.

Resources

UK Data Service, Plan to share: "Plan ahead to create high-quality shareable research data"

Support

For further assistance in publishing your research data, please contact the Digital Curation and Data team.