As described throughout this guide, in many cases you can retrieve the information you need by querying the user interface of a citation index. However, some research questions require retrieving all (or a particular subset) of the data in a citation index so that you can perform your own analysis. While not all indexes support this use case, many of the most important ones do.
Many indexes support this through an application programming interface (API) that lets users query the database and retrieve data programmatically. In other cases, the organization that produces the index can supply customized datasets upon request from researchers. And in other cases, the University of Michigan Library has obtained the data from the index for use by researchers affiliated with our university.
This page outlines how you can go about retrieving content for text/data mining purposes from the major citation indexes that we subscribe to.
If you have questions about this process, or need support exploring text and data mining, you may contact the library's digital scholarship support team (email@example.com).
Dimensions Plus (Digital Science) brings together various research-related data sources in a venue that is consistent and accessible to the community. Data include grant information and also information on several publication types, including Books, Journal Articles, Conference Proceedings, and Patents. Dimensions Plus provides the community with a data discovery engine that offers both context and perspective.
Dimensions makes the Dimensions Metrics API publicly available to anyone who wishes to use it.
To request access to the Dimensions Metrics API, complete this form.
Google Scholar is a freely accessible web search engine that indexes the full text or metadata of scholarly literature across an array of publishing formats and disciplines.
Psycinfo (APA) is the premier resource for surveying the literature of psychology and adjunct fields. Produced by the APA, it covers 1887 to the present.
PubMed Central (NLM) is a free full-text archive of biomedical and life sciences journal literature at the U.S. National Library of Medicine.
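As a rough illustration of what programmatic retrieval from an index can look like, the sketch below searches this archive through the NCBI E-utilities ESearch service. This guide does not itself name that service, so treat the endpoint, the `pmc` database name, and the helper names (`build_search_url`, `extract_ids`, `search_pmc`) as assumptions chosen for illustration, not as the library's recommended workflow.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities search endpoint (an assumption: this guide does not name it).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(term, retmax=20):
    """Build an ESearch URL that looks up `term` in the PMC database."""
    params = {"db": "pmc", "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


def extract_ids(response_text):
    """Pull the list of record IDs out of an ESearch JSON response."""
    payload = json.loads(response_text)
    return payload.get("esearchresult", {}).get("idlist", [])


def search_pmc(term, retmax=20):
    """Run the search over the network and return matching record IDs."""
    with urlopen(build_search_url(term, retmax)) as response:
        return extract_ids(response.read().decode("utf-8"))
```

Calling `search_pmc("citation analysis", retmax=5)` would need a network connection and should return up to five record IDs. Before running anything larger than a small test query, check NCBI's current usage guidelines, or contact the library for advice on bulk retrieval.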
Scopus (Elsevier) is an international multi-disciplinary indexing & abstracting database for scientific, medical, technical, and social sciences.
Web of Science (Clarivate) provides a Core Collection of multidisciplinary indexes which permit searching for articles that cite a known author or work.
Science Citation Index - 1900 - present
Social Science Citation Index - 1900 - present
Arts & Humanities Citation Index - 1975 - present
Conference Proceedings Citation Index - 1990 - present
Book Citation Index - 2005 - present
Emerging Sources Citation Index - 2015 - present
To request access to the XML files, please email the library text and data mining group: firstname.lastname@example.org