Each data provider has its own standards and procedures that you must follow to use its data legally. For example, many data providers license their data to be mined for research purposes only, and either prohibit data mining with potential commercial applications or require special negotiation before permitting it.
If you have any questions about licensing conditions or negotiating permission for potential commercial applications of data mining with data providers, please contact firstname.lastname@example.org.
The large datasets used in text and data mining are often sourced from pre-existing research outputs, original creative works, or proprietary data owned by commercial enterprises. This means that performing text and data mining may require you to access, copy and process material that is protected by copyright.
If you have any questions or need guidance on complying with copyright during data mining activities, please contact the Library’s Copyright Services team.
Text and data mining sometimes involves the collation and linkage of separate datasets; you should take care to seek appropriate ethics approvals and conduct privacy impact assessments before commencing.
Even if all the original datasets contain de-identified data, data linkage and data mining can sometimes have the unforeseen consequence of enabling re-identification of de-identified data.
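One rough way to gauge re-identification risk after linkage is to count how many records share each combination of indirectly identifying fields; a group of size one points to a single individual. The following is a minimal sketch only, assuming the linked data is a list of dictionaries. The function name `smallest_group_size` and the field names `postcode`, `birth_year` and `sex` are hypothetical, and a check like this does not replace an ethics review or a privacy impact assessment.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same values for every field listed in quasi_identifiers."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values()) if groups else 0


# Hypothetical linked dataset: no names, but several indirect identifiers.
linked = [
    {"postcode": "3000", "birth_year": 1980, "sex": "F"},
    {"postcode": "3000", "birth_year": 1980, "sex": "F"},
    {"postcode": "3000", "birth_year": 1975, "sex": "M"},
]

# On postcode alone, every record hides in a group of three.
by_postcode = smallest_group_size(linked, ["postcode"])

# Once the linked fields are combined, one record becomes unique.
by_all_fields = smallest_group_size(linked, ["postcode", "birth_year", "sex"])
```

In this made-up example the smallest group shrinks from three records to one as more fields are combined, which is the effect that linking separate datasets can have.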
Even if the licence permits it, some approaches to text and data mining are considered poor etiquette due to the inconvenience they can cause to data providers.
For example, bulk scraping of a data provider's website to extract information can place a significant burden on the data provider's servers. Similarly, when using an API to automate accessing and downloading content, you should use rate limiting to control the number of requests you send to the data provider's servers over a given time period. Failing to rate-limit your automated requests can cause slow response times or even downtime for other users.
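As an illustration of rate limiting, the sketch below enforces a minimum gap between successive requests. It is one simple approach under stated assumptions, not a provider-mandated method: the names `RateLimiter` and `fetch_all` are hypothetical, `fetch` stands in for whatever function actually calls the API, and the permitted request rate must come from the data provider's own documentation.

```python
import time


class RateLimiter:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self.min_interval = 1.0 / max_per_second
        self._clock = clock    # injectable so the limiter can be tested
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_all(urls, fetch, limiter):
    """Call fetch(url) for each URL, pausing as the limiter requires."""
    results = []
    for url in urls:
        limiter.wait()
        results.append(fetch(url))
    return results
```

For example, `fetch_all(urls, fetch, RateLimiter(max_per_second=1))` would send at most one request per second. If the provider's API returns a "too many requests" response, slow down further rather than retrying immediately.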
Best practice is to check the requirements of the data provider and comply with their preferences regarding data mining activities.
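Some websites publish crawling preferences in a robots.txt file, which can be checked programmatically before any scraping begins. The sketch below uses Python's standard `urllib.robotparser` module on an example robots.txt body; the helper name `check_robots`, the user agent `research-bot` and the `provider.example` URLs are hypothetical. A robots.txt file is only one signal, so the provider's licence, terms of use and API documentation still take precedence.

```python
from urllib.robotparser import RobotFileParser


def check_robots(robots_txt, user_agent, url):
    """Return (allowed, crawl_delay) for url according to robots_txt.

    crawl_delay is None when the file does not specify one.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)


# Example robots.txt content (made up for illustration).
example_robots = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""

public_page = check_robots(
    example_robots, "research-bot", "https://provider.example/articles/1"
)
private_page = check_robots(
    example_robots, "research-bot", "https://provider.example/private/x"
)
```

Here the public page is permitted with a requested delay of five seconds between requests, while the page under `/private/` is disallowed.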