Text analysis and text mining are processes that derive information from texts such as novels, monographs, articles, and web pages. You can use text analysis tools to quickly search through a large corpus, generate word clouds, or compute word frequencies. You can also perform more complex tasks, such as identifying patterns in parts of speech or detecting sentiments, moods, and emotions in a corpus.
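As a small illustration of the word frequency task mentioned above, here is a minimal sketch that uses only the Python standard library. The function name `word_frequencies`, the sample sentence, and the simple regular-expression tokenizer are assumptions made for this example; dedicated text analysis tools usually handle tokenization, stop words, and punctuation more carefully.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=5):
    """Return the top_n most common lowercase word tokens in text.

    Hypothetical helper for illustration; not part of any library.
    """
    # Lowercase the text and treat runs of letters or apostrophes as words.
    # This is a deliberately simple tokenizer, not a linguistic one.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Counter tallies each token; most_common sorts by descending count.
    return Counter(tokens).most_common(top_n)


# Made-up sample sentence, used only to demonstrate the function.
sample = "The whale surfaced. The crew watched the whale dive, and the sea closed over it."
top_words = word_frequencies(sample, top_n=2)
print(top_words)  # [('the', 4), ('whale', 2)]
```

Very common words such as "the" tend to dominate raw counts, which is why many tools remove a list of stop words before counting.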
Topic modeling is an unsupervised machine learning technique that scans a set of documents (a text corpus), detects word and phrase patterns within them, and automatically clusters the word groups and similar expressions that best characterize the corpus. If you are interested in learning about these tools, you can reach out to the digital scholarship team (library-ds@umich.edu).
Below is a list of additional resources for text analysis, including datasets, corpora, and collections of packages for data science.
Tidyverse (R) is a collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
pandas (Python) is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool built on top of the Python programming language.
TAPoR, or the Text Analysis Portal for Research, is an online portal where users can store and keep track of texts they wish to study, learn about and experiment with different tools, and use those tools to analyze text.
If you have further questions about tools, resources, or text analysis methods, you can contact the digital scholarship team at library-ds@umich.edu to schedule a consultation.