Before you publish your data, consider whether you can legally and ethically do so. Most of the time there won’t be a problem, but it’s good to be informed before you publish. Two of the most important things to consider are whether your data is too sensitive to publish and whether you have the rights to make the data available.
Then, you can assess how FAIR (Findable, Accessible, Interoperable, Reusable) your data is before publishing by using the ANDS FAIR data self-assessment tool.
Sensitive data may contain:
You may still be able to publish data that contains sensitive information; however, extra precautions need to be taken, such as de-identifying the data and setting access conditions so that approval is needed before others can access your published data.
Find more information on how to publish sensitive data.
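As a rough illustration of what de-identifying a dataset can involve, the sketch below drops direct identifiers, replaces participant IDs with random pseudonyms and generalises exact ages into ten-year bands. The field names, the list of direct identifiers and the band width are all assumptions made for this example. Steps like these do not guarantee that individuals cannot be re-identified, so treat it as a starting point rather than a complete procedure.

```python
import secrets

# Hypothetical records; the field names are illustrative only.
RECORDS = [
    {"participant_id": "P001", "name": "A. Example", "email": "a@example.org",
     "age": 34, "response": "yes"},
    {"participant_id": "P002", "name": "B. Example", "email": "b@example.org",
     "age": 58, "response": "no"},
]

# Assumed list of fields to remove or replace; review this for each dataset.
DIRECT_IDENTIFIERS = {"name", "email"}
REPLACED_FIELDS = {"participant_id", "age"}


def deidentify(records):
    """Drop direct identifiers, pseudonymise IDs and band ages.

    Returns the de-identified records and the ID-to-pseudonym key.
    """
    key = {}
    output = []
    for record in records:
        pid = record["participant_id"]
        if pid not in key:
            # Random pseudonym that cannot be derived from the original ID.
            key[pid] = secrets.token_hex(8)
        cleaned = {
            field: value
            for field, value in record.items()
            if field not in DIRECT_IDENTIFIERS | REPLACED_FIELDS
        }
        cleaned["pseudonym"] = key[pid]
        lower = (record["age"] // 10) * 10
        cleaned["age_band"] = f"{lower}-{lower + 9}"
        output.append(cleaned)
    return output, key


deidentified, pseudonym_key = deidentify(RECORDS)
```

Only `deidentified` would be a candidate for publication. The `pseudonym_key` links pseudonyms back to participants, so it should be kept separately and securely, or destroyed if re-linking is not needed.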
Understanding whether you have the rights to publish research data can be confusing. University of Sydney staff and students should consult the Intellectual Property Policy 2016. Depending on how your research was conducted, there may be other things you should consider. A few common situations where further consideration may be needed are:
If you’ve used data collected by another person or an organisation, like the Australian Bureau of Statistics or the Department of Health, you’ll need to check what your rights are for the data. If the data is publicly available, checking which licence it was published under will help determine whether you can make your dataset available. If you entered into an agreement to access and use the data, check the agreement to see what terms apply to publishing data derived from the original dataset. If you’re unsure, contact the original collector of the data or firstname.lastname@example.org for assistance.
If you’ve collected the data as part of a research group, you should get permission from the other researchers before you publish the data, and give appropriate credit to your collaborators upon publication. If you’re working on a project with internal and external collaborators, you should check whether an agreement was put in place at the commencement of the project that clarifies intellectual property rights in the data. You can contact email@example.com for more information.
If you’ve conducted research funded by a commercial organisation, the organisation may assert ownership of any data that was collected during the project. You should read through any agreement that was made between you (or the University) and the commercial organisation at the commencement of the project, or check with the organisation to see if it asserts ownership of all data. This may mean that you can’t make the data openly available, that you need to set access conditions so that you can better control access to the published data, or that the organisation has specific instructions for you to follow when making data available. You can contact firstname.lastname@example.org or the Commercial Development and Industry Partnerships team for more information.
If you are collecting data from Aboriginal or Torres Strait Islander research participants, it is important that you recognise and respect Indigenous Cultural and Intellectual Property (ICIP) rights before you publish any data. ICIP rights cover, among other things, cultural practices, traditional knowledge, literary and artistic works, ancestral remains and genetic material.
The Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) provides guidelines on ethical research, including ICIP, which can be found on its ethics webpage: https://aiatsis.gov.au/research/ethical-research
Publish your data in a state that you think will be most useful to others, a state that’s considered standard in your field of research or a state that’s required by your funder or journal publisher. For instance, Nature requires that: “Authors should provide their data in the ‘rawest’ form that will permit substantial reuse”.
Publishing raw data (the immediate outputs from data collection) allows researchers to re-use data in its purest form, making it easier for them to conduct replication studies and find new patterns using your data. For reproducibility, it also means that others are able to test your pre-processing methodology to determine if it’s accurate. Some quality control and validation of the data should still be done before publication: the dataset should be in a state where it can be interpreted and used by another person.
Processed data is the result of editing, cleaning and modifying raw data in preparation for data analysis. This may also involve the standardisation and normalisation of your data, e.g. converting all values to the same units or standardising spellings. Processed data can also be the result of removing or anonymising sensitive data in a dataset.
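The kind of standardisation described above can be sketched in a few lines. The example below assumes a small set of hypothetical records with heights in mixed units and inconsistently spelled site names; the field names, the conversion table and the choice of centimetres as the target unit are assumptions made for illustration only.

```python
# Hypothetical raw records: mixed units and inconsistent spellings.
RAW_RECORDS = [
    {"site": "Sydney ", "height": 172.0, "unit": "cm"},
    {"site": "sydney", "height": 1.80, "unit": "m"},
    {"site": "SYDNEY", "height": 1655.0, "unit": "mm"},
]

# Conversion factors to centimetres (the target unit chosen for this sketch).
TO_CM = {"cm": 1.0, "m": 100.0, "mm": 0.1}


def process(record):
    """Return a cleaned copy of one record; the raw record is left untouched."""
    height_cm = record["height"] * TO_CM[record["unit"]]
    return {
        # Trim whitespace and use one capitalisation for site names.
        "site": record["site"].strip().title(),
        # Store every height in the same unit, rounded to 1 decimal place.
        "height_cm": round(height_cm, 1),
    }


processed = [process(record) for record in RAW_RECORDS]
```

Because `process` builds new records instead of editing the raw ones, the raw and processed versions can both be retained, and the processing steps stay documented in code.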
Publishing processed data is the expected standard for most journals and repositories, as the data can be easily understood and reused by other researchers. Additionally, publishing processed data reduces the risk that erroneous values will affect the results of anyone who reuses the data.
Publishing data that has been aggregated (individual observations that have been summarised) can be useful when your dataset contains identifiable information or when summarising your results in a research paper. However, aggregated data on its own may not be useful if the level of aggregation leaves the data too ambiguous for others to use in further analysis or comparison.
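To make the trade-off concrete, the sketch below summarises hypothetical individual observations into a count and a mean per age band. The data, the field names and the band widths are assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical individual observations (one row per participant).
OBSERVATIONS = [
    {"age": 23, "score": 7.0},
    {"age": 27, "score": 9.0},
    {"age": 34, "score": 6.0},
    {"age": 38, "score": 8.0},
    {"age": 31, "score": 7.0},
]


def age_band(age, width=10):
    """Map an exact age to a band label such as '20-29'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def aggregate(observations, width=10):
    """Summarise scores per age band as a count and a mean."""
    groups = defaultdict(list)
    for row in observations:
        groups[age_band(row["age"], width)].append(row["score"])
    return {
        band: {"n": len(scores), "mean_score": mean(scores)}
        for band, scores in sorted(groups.items())
    }


summary = aggregate(OBSERVATIONS)              # ten-year bands
coarse_summary = aggregate(OBSERVATIONS, 100)  # a single band for everyone
```

With ten-year bands, `summary` still lets a reader compare age groups. With a band width of 100, `coarse_summary` collapses everyone into one group, which hides individuals but leaves little for others to analyse or compare.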
Data that demonstrates negative or null results can, and should, be shared as even negative results can tell us something meaningful. Publishing this kind of data has the potential to reduce waste through duplication of research effort and inspire others to build upon your data using different methodologies or variables.
When publishing negative data, you should assess if it will be useful to others and consider whether the benefits of publishing negative data outweigh the costs (both financial and time spent) of preparing the data for publication.