
Research Impact Metrics: Citation Analysis

Information on how to use library resources for citation analysis, including impact factors, journal rankings, altmetrics, and how to find who has cited an article.

Text/Data Mining for Citation Indexes

As described throughout this guide, you can often retrieve the information you need by searching a citation index through its user interface. Sometimes, however, investigating your research question requires retrieving all of the data in the index (or a particular subset of it) so that you can perform your own analysis. Not every index supports this use case, but many of the most important ones do.

Many indexes support this through an application programming interface (API) that allows users to query the database and retrieve data programmatically. In other cases, the organization that produces the index can supply customized datasets to researchers on request. For some indexes, the University of Michigan Library has obtained the data directly for use by researchers affiliated with the university.

This page outlines how you can go about retrieving content for text/data mining purposes from the major citation indexes that we subscribe to. 

If you have questions about this process, or need support exploring text and data mining, you may contact the library's digital scholarship support team (library-ds@umich.edu).


Major Citation Indexes and API Access

Dimensions Plus (Digital Science) brings together a range of research-related data sources in a single, consistent, and accessible platform. Its data include grant information as well as several publication types, including books, journal articles, conference proceedings, and patents. Dimensions Plus works as a discovery engine that places publications in the context of related grants and patents.

Google Scholar is a freely accessible web search engine that indexes the full text or metadata of scholarly literature across an array of publishing formats and disciplines.

PsycInfo (APA) is the premier resource for surveying the literature of psychology and related fields, with coverage from 1887 to the present.

PubMed Central (NLM) is a free full-text archive of biomedical and life sciences journal literature at the U.S. National Library of Medicine.
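As an illustration of API access, PubMed Central can be queried through NCBI's public E-utilities service. The sketch below uses the `esearch` endpoint to page through the PMC IDs that match a query. It is a minimal example under stated assumptions: it uses only the Python standard library, it omits the optional `tool`, `email`, and `api_key` parameters that NCBI asks regular users to supply, and the function names (`build_esearch_url`, `parse_esearch`, `fetch_all_ids`) are hypothetical helpers, not part of any NCBI library.

```python
import json
import time
import urllib.parse
import urllib.request

# Public NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, retstart=0, retmax=100, db="pmc"):
    """Build an esearch URL that asks for a JSON response."""
    params = {
        "db": db,            # "pmc" = PubMed Central
        "term": term,        # query in Entrez search syntax
        "retmode": "json",
        "retstart": retstart,  # offset of the first ID to return
        "retmax": retmax,      # number of IDs per page
    }
    return ESEARCH + "?" + urllib.parse.urlencode(params)


def parse_esearch(payload):
    """Return (total hit count, list of record IDs) from an esearch JSON response."""
    result = payload["esearchresult"]
    return int(result["count"]), list(result["idlist"])


def fetch_all_ids(term, page_size=100, max_records=1000, pause=0.4):
    """Page through esearch results (hypothetical helper; needs network access)."""
    ids = []
    retstart = 0
    while retstart < max_records:
        url = build_esearch_url(term, retstart=retstart, retmax=page_size)
        with urllib.request.urlopen(url) as resp:
            total, page = parse_esearch(json.load(resp))
        ids.extend(page)
        retstart += page_size
        if retstart >= total or not page:
            break
        # Pause between calls; NCBI asks clients without an API key
        # to stay at or below three requests per second.
        time.sleep(pause)
    return ids[:max_records]
```

Calling `fetch_all_ids('"citation analysis"')` would return a list of PMC IDs, which could then be passed to other E-utilities such as `efetch` to retrieve the full records.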

Scopus (Elsevier) is an international, multidisciplinary abstracting and indexing database covering the scientific, medical, technical, and social sciences literature.

Web of Science (Clarivate) provides a Core Collection of multidisciplinary indexes which permit searching for articles that cite a known author or work.

  • The U-M Library's Web of Science license includes access to the Web of Science Lite API. However, because the Lite API limits the fields that can be returned, we recommend that you use the library's local copy of the Web of Science Core Collection XML files for bibliometric analysis. The files include:
    • Science Citation Index - 1900 - present

    • Social Sciences Citation Index - 1900 - present

    • Arts & Humanities Citation Index - 1975 - present

    • Conference Proceedings Citation Index - 1990 - present

    • Book Citation Index - 2005 - present

    • Emerging Sources Citation Index - 2015 - present

  • To request access to the XML files, email the library's text and data mining group: librarytdm@umich.edu
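Because the Core Collection files are large XML documents, a streaming parser is usually preferable to loading a whole file into memory. The following Python sketch uses the standard library's `xml.etree.ElementTree.iterparse` to pull a record identifier, an article title, and the number of cited references from each record. The element names it looks for (`REC`, `UID`, `title` with `type="item"`, and `ref`) are assumptions about the Web of Science XML schema; check them against the documentation supplied with the library's files before relying on the output. `iter_records` and `local` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix so matching works across schema versions."""
    return tag.rsplit("}", 1)[-1]


def iter_records(source):
    """Yield (uid, title, n_refs) for each record in a Web of Science XML file.

    ASSUMED schema: one REC element per record, containing a UID element,
    a title element with type="item", and one ref element per cited reference.
    Verify these names against the documentation shipped with the files.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "REC":
            continue
        uid = title = None
        n_refs = 0
        for child in elem.iter():
            name = local(child.tag)
            if name == "UID" and uid is None:
                uid = (child.text or "").strip()
            elif name == "title" and child.get("type") == "item" and title is None:
                title = (child.text or "").strip()
            elif name == "ref":
                n_refs += 1
        yield uid, title, n_refs
        elem.clear()  # drop the finished record's children to limit memory use
```

`iter_records` accepts either a filename or an open binary file object. Matching on local tag names sidesteps XML namespaces, which can change between schema versions, and clearing each record after it is processed keeps memory use down when working through very large files.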