Data Management Plan: Evolving Pronouns

Last modified on December 18, 2018

I created this DMP with DMPTool in conjunction with my student support report. It serves as a model for the kind of DMP that a student would create under my supervision if necessary. This was originally written for my Academic Libraries class in the Spring of 2018.

Roles and responsibilities

The DMP should clearly articulate how sharing of primary data is to be implemented. It should outline the rights and obligations of all parties with respect to their roles and responsibilities in the management and retention of research data. It should also consider changes to roles and responsibilities that will occur if a project director or co-project director leaves the institution or project. Any costs stemming from the management of data should be explained in the budget notes.

The project director is responsible for the management of the data. They shall ensure that data is securely stored in the proper institutional database, and that only authorized users will have the ability to modify the data. Because the data will involve the analysis of copyrighted texts, the project director will ensure that copyrighted texts are never shared in a way that violates their copyright license.

If the project director leaves the institution, the linguistics department will appoint a successor to take over the role until the project is complete.

When the project is completed, the university library will migrate the database to the institutional repository, and take full responsibility for its retention.

Expected data

The DMP should describe the types of data, samples, physical collections, software, curriculum materials, or other materials to be produced in the course of the project. It should then describe the expected types of data to be retained.

Project directors should address matters such as these in the DMP:

the types of data that their project might generate and eventually share with others, and under what conditions;
how data will be managed and maintained until shared with others;
factors that might impinge on their ability to manage data, for example, legal and ethical restrictions on access to non-aggregated data;
the lowest level of aggregated data that project directors might share with others in the scholarly or scientific community, given that community’s norms on data;
the mechanism for sharing data and/or making it accessible to others; and
other types of information that should be maintained and shared regarding data, for example, the way it was generated, analytical and procedural information, and the metadata.

Data will primarily be in the form of tables which store each text’s vocabulary terms with gender and contexts. For example, one row of a table may indicate when a book uses the word “they” in a gender-neutral context. There will also be entries that store some specific phrases or sentences to provide context for researchers. Tables will contain metadata describing each data source, such as publisher and author information. Personal or identifying information will not be included in metadata.
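As a rough illustration of the table layout described above, the following sketch uses Python's built-in sqlite3 module. The table names, column names, and the sample row are hypothetical placeholders chosen for this example; the actual schema on the institutional database may differ.

```python
import sqlite3


def build_example_db():
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- One row per analyzed text; source metadata only, no personal data.
        CREATE TABLE sources (
            source_id INTEGER PRIMARY KEY,
            title     TEXT NOT NULL,
            author    TEXT,
            publisher TEXT
        );
        -- One row per recorded use of a vocabulary term in a source.
        CREATE TABLE term_usages (
            usage_id     INTEGER PRIMARY KEY,
            source_id    INTEGER NOT NULL REFERENCES sources(source_id),
            term         TEXT NOT NULL,
            gender       TEXT NOT NULL,
            context_note TEXT,
            excerpt      TEXT  -- short phrase or sentence kept for context
        );
        """
    )
    # Placeholder values, not real project data.
    conn.execute(
        "INSERT INTO sources (title, author, publisher) VALUES (?, ?, ?)",
        ("Example Novel", "Example Author", "Example Press"),
    )
    conn.execute(
        "INSERT INTO term_usages (source_id, term, gender, context_note, excerpt) "
        "VALUES (?, ?, ?, ?, ?)",
        (1, "they", "neutral", "singular referent of unspecified gender",
         "Someone left their umbrella; they will want it back."),
    )
    conn.commit()
    return conn
```

Keeping source metadata in its own table means each usage row only needs a reference to its source, so publisher and author details are stored once per text.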

Copyrighted material may be stored on individual workstations for analysis, but it will not be inserted into the database, and it will not be shared or used outside the terms of its copyright license. Any phrases or sentences that make it into the database will fall under fair use. Any personal or identifying information, if stored on an individual workstation, will be deleted once the project is completed.

The project director will provide authorized users with permission to insert rows, but only the project director and co-project director will be able to edit or delete rows. Data will be stored on the institution’s secure database.

After the project is complete and the database has been migrated to the library’s institutional repository, it will be shared publicly through the university library website.

Period of data retention

NEH is committed to timely and rapid data distribution. However, it recognizes that types of data can vary widely and that acceptable norms also vary by discipline. It is strongly committed, however, to the underlying principle of timely access. In their DMP applicants should address how timely access will be assured.

Data will be publicly available in the university library institutional repository as soon as the accompanying research is published, barring any technical difficulties or embargo period. After publication, the data should remain publicly available for as long as the repository is able to maintain it.

Data formats and dissemination

The DMP should describe data formats, media, and dissemination approaches that will be used to make data and metadata available to others. Policies for public access and sharing should be described, including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements. Research centers and major partnerships with industry or other user communities must also address how data are to be shared and managed with partners, center members, and other major stakeholders.

The database tables will be publicly available in every format that the library repository’s software can export, in particular XLSX, CSV, and HTML tables. Descriptions of data sources will be included in the metadata, but no personal or identifying information. For example, personal names will not be included. There should be no issues with copyright, security, or leaks of personal information.
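To show what the CSV and HTML exports of a table might look like, here is a minimal sketch using only Python's standard library. The function names and the sample header and row are assumptions made for this example; the repository's own export software would do this work in practice, and XLSX is left out because it requires a third-party library.

```python
import csv
import html
import io


def to_csv(header, rows):
    """Serialize a header and rows of values as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def to_html_table(header, rows):
    """Serialize a header and rows as a simple HTML table, escaping each cell."""
    head = "".join(f"<th>{html.escape(str(h))}</th>" for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


if __name__ == "__main__":
    # Hypothetical columns and row, for illustration only.
    header = ["term", "gender"]
    rows = [("they", "neutral")]
    print(to_csv(header, rows))
    print(to_html_table(header, rows))
```

Escaping each cell matters for the HTML export because stored excerpts may contain characters such as angle brackets or ampersands.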

Data storage and preservation of access

The DMP should describe physical and cyber resources and facilities that will be used to effectively preserve and store research data. These can include third-party facilities and repositories.

Data will be stored on the university’s institutional database server. Backups are made weekly for all databases and stored in Amazon Glacier. Login information and physical access to the servers are maintained solely by the university IT department. Because backups are weekly, unauthorized access or a physical disaster will, at worst, set work back by one week.