Communication and Media

Resources related to the study of mass media and communication, including health communication, political communication, gender and race, global and new media, media policy, social media, journalism, and print and broadcast history, among other topics.

Text mining the news

Many researchers are interested in automated computational analysis of large quantities of news content ("text mining"). The library's typical license agreements with news database vendors do not allow individual users to download content en masse for text mining. The full text of articles may also be unavailable for export, or available only in a file format that is inconvenient for text mining. However, some vendors are now working with libraries to make news content available for text mining (see more resources on this page).

Visit our Text and Data Mining (TDM) support page for more details about TDM and for help deciding whether the public interface of a news database meets your needs or whether text mining is appropriate for your research.

Historical Newspapers from ProQuest

The library has negotiated with ProQuest to make the historical files of several newspapers available for text mining. For each title, there is a limited date range for which the files are available. Files are in .txt format, and often number in the millions (see details below).

Please view our FAQ for more detailed information about ProQuest Historical Newspaper files for text mining.

These files are available only to current U-M faculty, staff, and students, and all researchers must sign a memorandum of understanding (MOU) before getting access.

Please contact librarytdm@umich.edu with any questions or to request access to the ProQuest historical newspaper files.

 

| Title | Coverage Start Date | Coverage End Date | Number of Text Files |
|---|---|---|---|
| American Israelite | 1854 | 1925 | 250,000 |
| Boston Globe | 1872 | 1983 | 11,239,627 |
| Chicago Defender | 1909 | 1975 | 1,925,000 |
| Chicago Tribune | 1849 | 1935 | 5,250,000 |
| Detroit Free Press | 1831 | 1922 | 4,812,453 |
| Detroit Free Press | 1923 | 1999 | 1,821,606 |
| Guardian & Observer [UK] | 1791 | 1909 | 2,825,000 |
| Los Angeles Sentinel | 1934 | 2005 | 1,018,296 |
| Los Angeles Times | 1881 | 1931 | 4,667,709 |
| New York Times | 1851 | 1934 | 8,073,453 |
| Times of India | 1838 | 2005 | 6,828,509 |
| Wall Street Journal | 1889 | 1936 | 2,650,000 |
| Washington Post | 1877 | 1935 | 5,275,000 |
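
Because each title arrives as a very large collection of .txt files, it usually helps to process the files one at a time rather than loading them all into memory. The following is a minimal sketch of that approach in Python, not a description of how the files are actually delivered. It assumes the files have been copied to a local directory; the directory layout, the UTF-8 encoding, and the function names `iter_text_files` and `count_terms` are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Iterator, Optional

# Simple lowercase word pattern; real projects may want a proper tokenizer.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def iter_text_files(root: Path) -> Iterator[Path]:
    """Yield .txt files under root lazily, one path at a time."""
    for path in root.rglob("*.txt"):
        if path.is_file():
            yield path


def count_terms(root: Path, limit: Optional[int] = None) -> Counter:
    """Count word frequencies across the .txt files under root.

    limit caps the number of files read, which is useful for a trial
    run on a sample before committing to millions of files.
    """
    counts: Counter = Counter()
    for index, path in enumerate(iter_text_files(root)):
        if limit is not None and index >= limit:
            break
        # Assumption: files are UTF-8; undecodable bytes are replaced
        # rather than raising, so one bad file does not stop the run.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                counts.update(TOKEN_RE.findall(line.lower()))
    return counts
```

For example, `count_terms(Path("some_local_folder"), limit=1000).most_common(20)` would list the twenty most frequent words in the first 1,000 files found, where `some_local_folder` stands in for wherever the files are stored. Running on a small sample first gives a rough sense of processing time before scaling up to a full title.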