
Data Management Plans for the Social Sciences

Suggested resources for designing data management plans (DMPs) for your research project.

NSF Guidance

Your data management plan should describe the types of data you expect to produce during your research project. NSF describes this broadly as "the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project."

NSF's Social, Behavioral and Economic Sciences (SBE) Directorate goes on to say: 

The Federal government defines ‘data’ in OMB Circular A-110 (now 2 CFR, Ch. II, §215.36(d)(2)(i), and codified in 5 U.S.C. 552(a)(4)(A)) as:

Research data is defined as the recorded factual material commonly accepted in the scientific community as necessary to validate research findings, but not any of the following: Preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues. This "recorded" material excludes physical objects (e.g., laboratory samples).

Please refer to https://new.nsf.gov/sbe/data-management for more details.  

Summary

Describe the data you will produce in the course of the project (see specific agency guidelines for what types of data to include). Describe both the subject matter and the file format(s). How much data do you expect to have? If you will be generating multiple data sets, answer the questions below for each data set. If you know you won’t be keeping all the data you generate, state what you will and won’t retain and why.

Questions to Consider

  • What is the general subject and nature of the research data you will be generating? Examples include interview transcripts, experimental measurements and protocols, code for statistical analysis, qualitative data, and simulations.
  • How will the data be created or captured? What file formats will you have? 
  • Are there any privacy or confidentiality concerns?
  • If you will be using existing data, say so and identify the source. What is the relationship between the data you are collecting or generating and the existing data?
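
One way to work through these questions for each data set is to keep a small inventory and summarize it before writing the plan. The Python sketch below is only an illustration: the DatasetEntry fields, the data set names, and the size figures are all hypothetical placeholders, not values from any funder's guidance.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetEntry:
    """One row of a data inventory (all field names are illustrative)."""
    name: str
    description: str
    formats: List[str]
    estimated_mb: float          # rough size estimate, in megabytes
    has_confidential_data: bool  # privacy or confidentiality concerns?
    retained: bool               # will this data set be kept after the project?
    retention_note: str = ""


def summarize(entries):
    """Count data sets, total the retained volume, and flag privacy concerns."""
    retained = [e for e in entries if e.retained]
    return {
        "datasets": len(entries),
        "retained": len(retained),
        "retained_mb": sum(e.estimated_mb for e in retained),
        "needs_privacy_review": [e.name for e in entries if e.has_confidential_data],
    }


# Hypothetical inventory; the sizes are placeholders, not measurements.
inventory = [
    DatasetEntry("interview_transcripts", "De-identified interview transcripts",
                 ["txt", "pdf"], 50.0, True, True),
    DatasetEntry("interview_audio", "Raw interview recordings",
                 ["mp3"], 4000.0, True, False, "Deleted after transcription"),
    DatasetEntry("survey_responses", "Anonymized survey results",
                 ["csv"], 5.0, False, True),
]

summary = summarize(inventory)
print(summary)
```

The summary gives the figures a plan typically needs to state: how many data sets there are, how much data will be retained, and which data sets raise privacy or confidentiality concerns.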

Example 1

The example below is taken from "Research Data Management Plan for the Meaningful Data Counts Project" (p. 3, "Data Collection"). The funder is the Alfred P. Sloan Foundation.

Q4. What types of data will you collect, create, link to, acquire and/or record?


A4. We will be conducting both quantitative and qualitative research, by way of statistical analysis of bibliographic metadata (i.e., metadata of research datasets and scholarly publications), surveys and semi-structured interviews. The project will collect the following types of data: 

Structured numerical and categorical data: 

➔ Results from bibliometric analysis of data citation and reuse: The majority of data will be collected from open APIs and databases such as DataCite, Crossref and ORCID. They are a combination of numerical and categorical data and usually available in a structured format, such as XML, JSON or CSV formats.
➔ Survey results from researchers: Questionnaires will be sent to researchers and responses will be collected using SurveyMonkey. The collected data will be available in CSV format after anonymization.

Textual data: 

➔ Transcriptions from semi-structured interviews: A select number of researchers will be interviewed about sharing, reusing and citing research data. Interviews will be conducted online and recorded and then anonymized and transcribed. Transcriptions will be available in TXT and PDF format. Recordings will be deleted after transcriptions are complete.  
➔ Codebooks: Interview transcripts as well as free-text answers from the survey will be analyzed and coded using MaxQDA. Results will be available in CSV format. 

Audio files: 

➔ Interview recordings: Interviews will be recorded as audio files in MP3 format. Audio recordings will be deleted after interviews have been transcribed.

Software and code: 

➔ Code for processing and analyzing bibliometric data: The bibliometric analyses, including all software code and data sources, will be recorded and documented in Jupyter notebooks. 

 

Example 2

The example below is taken from "The Atlas of Collaboration: Building the World's First Large N Database on Collaborative Governance" (p. 1, "Expected Data"). The funding agency is NSF SBE.

Expected Data   

The proposed project will produce several types of data in various data formats, as well as data collection instruments and protocols and a database user manual, most of which will be made available through QDR.

Data. The project will produce several types of quantitative and qualitative data. 

i. Web-collected data will be collected on the structural characteristics, physical geography, and social geography for approximately 300 collaborative governance regimes (CGRs) and dozens of collaborative platforms. The web-collected data will be gathered by visiting organizational websites, with data coded and entered into .xlsx or .csv files and deposited with QDR. Methods for inter-coder reliability will be used to ensure data quality. 

ii. Survey data will be collected from approximately 5,000 CGR participants. Survey data will be collected via Qualtrics, saved in .csv files, and deposited with QDR.

iii. Interview data will be collected from approximately 75 CGR participants and leaders, collaborative platform managers, collaborative governance practitioners, and policymakers. The interviews will be recorded and transcribed, and qualitative software (e.g., NVivo or ATLAS.ti) will be used to code and analyze the data. Once transcribed, the recordings will be destroyed. We will remove direct identifiers from transcripts before analyzing them and store a re-identification key on a protected server at Portland State University. De-identified transcripts will be deposited with QDR along with coded data in REFI-QDA.

iv. Coded legal texts of approximately 13 pieces of state legislation and agency policy will be collected. The legal texts will be coded with qualitative software (e.g., NVivo or ATLAS.ti). The coded data will be deposited with QDR in REFI-QDA format. In addition, we expect to identify several datasets established by state agencies or other organizations that provide data on policy outcomes. These data will not be deposited in QDR, though we may, if possible, provide links to the datasets.
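
Item iii above describes removing direct identifiers from transcripts and keeping a separate re-identification key. The Python sketch below shows one simple way that step could look. It is not the project's actual procedure: the deidentify function, the P001-style codes, and the sample transcript are all hypothetical, and it replaces only names that are listed explicitly, so it would not catch other identifying details.

```python
import re


def deidentify(transcript, names, codes=None):
    """Replace each listed name with a stable code such as P001.

    Returns the cleaned text and the re-identification key (name -> code).
    Pass an existing key as `codes` to keep codes consistent across transcripts.
    """
    codes = {} if codes is None else codes
    # Assign codes in the order the names are given.
    for name in names:
        if name not in codes:
            codes[name] = f"P{len(codes) + 1:03d}"
    # Substitute longer names first so "Ann Lee" is handled before "Ann".
    for name in sorted(codes, key=len, reverse=True):
        pattern = rf"\b{re.escape(name)}\b"
        transcript = re.sub(pattern, codes[name], transcript)
    return transcript, codes


# Hypothetical transcript excerpt and participant names.
text = "Interviewer: Thank you, Dana Ruiz. Dana Ruiz: Happy to help. I work with Lee Cho."
clean, reid_key = deidentify(text, ["Dana Ruiz", "Lee Cho"])
print(clean)
print(reid_key)
```

In a workflow like the one the plan describes, only the cleaned text would be analyzed and deposited, while the key would be saved separately on protected storage.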