Finding Data
Your Research Question
Define Your Research Question
Try to state your research question without describing the sources or data you will use to answer the question.
Think about Your Method of Analysis
What sort of analysis do you plan to do? Do you need data or statistics to illustrate your points? Will you be using a statistical package, such as R, SPSS, SAS, or Stata, to do the analysis?
Defining Your Topic and Unit of Analysis
When you define your topic and unit of analysis, you should look at your research question and ask:
- What specific data do I need to answer my research question?
- What is my topic?
- What unit of analysis do I need?
- What geographic unit do I need?
- What time unit (frequency) do I need?
- Do I need time series data?
Define Your Topic
Use specific language when defining your topic. This will help you identify a variable or variables.
Example:
- I'm looking for the percentage of people living below the poverty line in areas where hurricanes frequently hit.
Identify Unit of Analysis
Who or what is being described by your variable(s)?
Examples:
- Individuals, families, households
- Institutions (companies, schools, non-profits, health facilities)
- Products (commodities, stocks, currencies)
Identify Time Frame and Frequency
For what point or period in time do you need this information about the people, institutions, or products you identified? How often do you need it measured?
Examples:
- As recent as possible, plus data from 10 and 20 years before that
- Every month in 1995 and 1996
Identify Geographic Unit
What part of the world is your research question concerned with?
Examples:
- Counties in Michigan
- Countries currently in the EU
- Businesses headquartered in China
Identify Whether This Is Time Series Data
Are you looking for data collected at regular intervals over time? Identifying which type of data structure you need may help as you search for data.
- Cross-sectional: data collected at a single point in time for many individuals
- Longitudinal/Panel: data collected at a sequence of time points for each individual in the same sample
- Time Series: data collected at a sequence of time points, usually at a uniform frequency
- Pooled cross-sectional time series: a mixture of time series data and cross-sectional data
* Adapted from Barbara Mento's guide to Finding Data at Boston College
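To make the distinctions above concrete, here is a minimal Python sketch that labels a dataset by looking only at which units (who or what) and which periods (when) it covers. It assumes each record has been reduced to a hypothetical `(unit, period)` pair; the function name `classify_structure` and the county names in the usage example are illustrative only, not part of any real tool or dataset.

```python
from collections import defaultdict


def classify_structure(observations):
    """Label a dataset by its structure (illustrative sketch only).

    `observations` is an iterable of hypothetical (unit, period) pairs,
    one pair per record, e.g. ("Wayne", 2020).
    """
    # Group the units observed in each period.
    units_by_period = defaultdict(set)
    for unit, period in observations:
        units_by_period[period].add(unit)

    if not units_by_period:
        raise ValueError("no observations supplied")

    all_units = set().union(*units_by_period.values())

    # One period, any number of units: a single snapshot.
    if len(units_by_period) == 1:
        return "cross-sectional"
    # One unit followed across several periods.
    if len(all_units) == 1:
        return "time series"
    # Same set of units in every period (a balanced panel).
    if all(units == all_units for units in units_by_period.values()):
        return "longitudinal/panel"
    # Several periods, but the units differ from period to period.
    return "pooled cross-sectional time series"
```

For example, `classify_structure([("Wayne", 2019), ("Kent", 2019), ("Wayne", 2020), ("Kent", 2020)])` returns `"longitudinal/panel"`, while `[("Wayne", 2019), ("Kent", 2020)]` returns `"pooled cross-sectional time series"`. The rule is a simplification: it counts only balanced panels (every unit observed in every period) as panel data, so a panel with some missing periods would be labeled pooled.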
Data vs. Statistics
- Data is the raw information from which statistics are created; statistics provide an interpretation and summary of data.
- To make sense of data, you will likely need a statistical software program (such as SPSS, SAS, or Stata) to analyze it.
- On the other hand, you can often use and understand statistics easily because they have already been processed.
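The difference between data and statistics can be sketched in a few lines of Python: the list below stands in for raw, household-level data, and the function turns it into a single statistic, the percentage of households below the poverty line. The field names `income` and `poverty_threshold` and all of the values are invented for illustration, not drawn from any real dataset.

```python
# Hypothetical raw data: one record per household (made-up values).
households = [
    {"id": 1, "income": 18000, "poverty_threshold": 21000},
    {"id": 2, "income": 45000, "poverty_threshold": 21000},
    {"id": 3, "income": 30000, "poverty_threshold": 27000},
    {"id": 4, "income": 12000, "poverty_threshold": 15000},
]


def percent_below_poverty(records):
    """Summarize raw household records as a single statistic."""
    if not records:
        raise ValueError("no records supplied")
    # Count households whose income falls below their threshold.
    below = sum(1 for r in records if r["income"] < r["poverty_threshold"])
    return 100 * below / len(records)
```

Here two of the four made-up households fall below their threshold, so `percent_below_poverty(households)` returns `50.0`. A published statistic is the equivalent of that one number: easy to read, but you cannot recover the individual records from it.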