Text mining is a generic term for computationally analysing a large body of text, and it can involve a variety of analysis techniques. It uses computational tools and techniques to automatically discover new and unexpected information from an aggregated body of machine-readable text or data. A text mining project starts from a research question and then involves collating the data or text corpus, becoming familiar with the data and cleaning it, formatting it, and selecting an analysis method.
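As a rough illustration of the cleaning, formatting and analysis steps, the sketch below counts word frequencies in a tiny made-up corpus. The corpus, the stopword list and the function names are assumptions chosen for the example, and word counting is only one of many possible analysis methods.

```python
import re
from collections import Counter

# Hypothetical toy corpus standing in for the collated texts.
corpus = [
    "The whale surfaced near the ship.",
    "The ship followed the whale for days!",
]

# Tiny illustrative stopword list; an assumption, not a standard list.
STOPWORDS = {"the", "for", "near", "a", "of"}


def clean_and_tokenise(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def term_frequencies(documents):
    """Count how often each cleaned token occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(clean_and_tokenise(doc))
    return counts


frequencies = term_frequencies(corpus)
print(frequencies.most_common(3))
```

In a real project the corpus would be read from files or a database, and the cleaning rules would depend on the research question (for example, whether punctuation or capitalisation carries meaning).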
Corpus linguistics is the study of language contained in bodies of texts. Corpus linguists use specialised computer software to analyse naturally occurring language in computerised text collections known as corpora. Computational stylistics and stylometry are concerned with the study of features of style that can be measured, identifying patterns of language usage in text.
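To make "features of style that can be measured" more concrete, here is a minimal sketch that computes three simple measures: mean sentence length, type-token ratio and the relative frequency of a few function words. The choice of features, the function-word list and the name `style_profile` are assumptions for illustration; dedicated corpus and stylometry software uses far more robust tokenisation and many more features.

```python
import re

# Small illustrative set of function words; an assumption for this sketch.
FUNCTION_WORDS = ("the", "and", "of", "to")


def style_profile(text):
    """Return a few simple, measurable style features for one text."""
    # Naive sentence split on ., ! and ? (abbreviations are not handled).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z]+", text.lower())
    total = len(words)
    if total == 0:
        raise ValueError("text contains no words")

    profile = {
        # Average number of words per sentence.
        "mean_sentence_length": total / len(sentences),
        # Distinct words divided by all words (vocabulary richness).
        "type_token_ratio": len(set(words)) / total,
    }
    # Relative frequency of each function word.
    for fw in FUNCTION_WORDS:
        profile["freq_" + fw] = words.count(fw) / total
    return profile


sample = "The cat sat. The dog ran to the cat."
profile = style_profile(sample)
for name, value in profile.items():
    print(name, round(value, 3))
```

Comparing such profiles across texts is one simple way to look for patterns of language usage, although the type-token ratio in particular is sensitive to text length and should be compared only between samples of similar size.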
Text mining practices:
Tinker is a digital humanities toolbox and directory of tools and research methods. It also includes links to example projects to give you an idea of how digital humanities can help you with your research.