Text and data mining (TDM) is the automated process of selecting and analysing large amounts of text or data resources for purposes such as searching, finding patterns, discovering relationships, semantic analysis and learning how content relates to ideas and needs (Springer Nature, 2020).
Data mining involves using computational methods to discover patterns and relationships in large, structured datasets. Structured data is organised in a defined format, so a computer can easily parse it, manipulate it, or perform calculations on it. Examples include data organised in tables or databases, and files with structured formats such as XML.
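As a rough illustration of why a defined format makes data easy to work with, the Python sketch below parses a small XML fragment and runs two simple calculations on it. The element names (catalogue, book, title, year, pages) and all values are invented for this example and do not come from any real dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue records; element names and values are made up.
XML_DATA = """
<catalogue>
  <book><title>First Title</title><year>1990</year><pages>200</pages></book>
  <book><title>Second Title</title><year>2005</year><pages>350</pages></book>
  <book><title>Third Title</title><year>2005</year><pages>140</pages></book>
</catalogue>
"""


def average_pages(xml_text):
    """Return the mean page count across all <book> elements."""
    root = ET.fromstring(xml_text.strip())
    pages = [int(book.findtext("pages")) for book in root.iter("book")]
    return sum(pages) / len(pages)


def titles_from_year(xml_text, year):
    """Return the titles of books published in the given year, in document order."""
    root = ET.fromstring(xml_text.strip())
    return [
        book.findtext("title")
        for book in root.iter("book")
        if int(book.findtext("year")) == year
    ]


if __name__ == "__main__":
    print(average_pages(XML_DATA))
    print(titles_from_year(XML_DATA, 2005))
```

Because every record follows the same layout, each field can be addressed by name and no interpretation of the text itself is needed.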
Text mining is similar but, in this case, computational analysis is used on unstructured text to discover patterns and relationships. Unstructured text has no defined organisation for a computer to work with, so different methods are needed to enable these analyses.
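One common first step in text mining is to impose some structure on the text, for example by splitting it into words and counting them. The Python sketch below shows this under simplifying assumptions: the sample sentence, the very short stop-word list and the function names are all illustrative, and real projects typically use more careful tokenisation.

```python
import re
from collections import Counter

# Made-up sample text for illustration only.
TEXT = (
    "The library holds many texts. Mining the texts reveals patterns, "
    "and patterns reveal ideas."
)

# A deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"the", "and", "a", "of", "in", "to"}


def tokenize(text):
    """Lower-case the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def term_frequencies(text):
    """Count how often each non-stop-word token occurs in the text."""
    return Counter(token for token in tokenize(text) if token not in STOP_WORDS)


if __name__ == "__main__":
    for term, count in term_frequencies(TEXT).most_common(3):
        print(term, count)
```

Once the text has been reduced to counts like these, it can be analysed with the same kinds of methods used on structured data.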
Unstructured text is often accompanied by structured information about the text, such as the author, title, the year the text was published, or the number of pages in the work. This mix of mostly unstructured and some structured data has led to different names being used to describe a similar set of activities. Text and data mining, text data mining, TDM and text mining are all variously used to describe the computational analysis of unstructured text and these terms are often used interchangeably.
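To make the mix of structured and unstructured data concrete, the Python sketch below pairs each text with structured fields (title and year) and counts how often a term appears per publication year. The records, field names and function name are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical records: structured metadata (title, year) plus unstructured text.
RECORDS = [
    {"title": "Doc A", "year": 1990, "text": "Rivers and floods shaped the town."},
    {"title": "Doc B", "year": 2005, "text": "The floods returned; floods were worse."},
    {"title": "Doc C", "year": 2005, "text": "Drought followed the floods."},
]


def term_count_by_year(records, term):
    """Count occurrences of a term in the text, grouped by the year field."""
    counts = defaultdict(int)
    for record in records:
        tokens = re.findall(r"[a-z']+", record["text"].lower())
        counts[record["year"]] += tokens.count(term.lower())
    return dict(counts)


if __name__ == "__main__":
    print(term_count_by_year(RECORDS, "floods"))
```

Here the year comes from the structured metadata and the counts come from the unstructured text, which is one reason the terms "text mining" and "data mining" overlap in practice.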
When using TDM, also consider how best to present your results. Could they be clearly explained by incorporating data visualisation methods?
Aboriginal and Torres Strait Islander peoples are advised that this website may contain images, voices and names of people who have died.
The University of Sydney Library acknowledges that its facilities sit on the ancestral lands of Aboriginal and Torres Strait Islander peoples, who have for thousands of generations exchanged knowledge for the benefit of all.