Harvard Dataverse Preservation Policy

Data Backup & Preservation Terms

Harvard University Information Technology (HUIT), in collaboration with Harvard Library and the Institute for Quantitative Social Science (IQSS), hosts Harvard’s Dataverse repository and maintains a full backup of all data and directories. As a result, a full, recent off-site copy of the Harvard Dataverse repository is always available.

Backup Schedule

HUIT backs up all application and system files and databases nightly. These backups are stored off-site in Carlstadt, New Jersey, for 45 days.

All research data files in the repository are replicated every 4 hours to a second off-site storage array at 1 Summer St, Boston, MA. Since March 2013, HUIT has incorporated the data content of the Harvard Dataverse repository into the DRS Storage Infrastructure. This infrastructure uses storage management software to create a tape copy of the data, which is stored long-term at the Harvard Depository.

Policy and Procedures for Digital Archiving

Harvard University’s policy for digital archiving is part of the institution’s general mission to preserve all of its archival collections and to ensure their availability for current and future use. More specifically, this policy for preserving our digital data collections is meant to ensure continued access to born-digital and digitized data, to ensure their authenticity, and to maintain data quality using the best digital archival practices. Harvard University, with particular support from IQSS, is committed to best archival practices to ensure that all materials deposited in the archive remain available and usable. This commitment includes: preserving previously deposited versions of materials; deaccessioning (removing) datasets only when legally compelled; maintaining public access to the materials; regularly reviewing risks to the materials; and reformatting materials as necessary, where possible, to avoid format obsolescence.

Preservation of Materials Deposited in the Harvard Dataverse

Harvard University supports permanent bit-level preservation of all materials deposited directly in the Harvard Dataverse. In addition, all social science data deposited in the Harvard Dataverse that is made publicly available is replicated by the Data-PASS partners for permanent preservation by the partnership.

Beyond Harvard University’s commitment to archiving and long-term access for all data published in the Harvard Dataverse, the Harvard Dataverse takes data publication very seriously (see Joint Declaration of Data Citation Principles). It encourages good curation practices through support of standards-based metadata schemas, proper documentation, and automatic extraction of metadata from FITS and tabular files to enable data discovery and reuse. Tabular files deposited in the Harvard Dataverse are reformatted into simple, open-format text files (.tab format), with variable-level XML metadata based on the Data Documentation Initiative (DDI), to ensure long-term preservation of the data. Once a dataset is published, the repository guarantees archiving and long-term access to that dataset through a DOI persistent identifier provided by the California Digital Library’s (CDL) EZID service (a DataCite member).

To ensure long-term accessibility, a published dataset in the Harvard Dataverse cannot be unpublished; it can only be deaccessioned under extreme circumstances, such as a legal requirement to destroy that dataset. Even then, a tombstone landing page with the basic citation metadata will remain publicly accessible through the persistent URL (Handle or DOI) given in the dataset’s citation. Files and any additional metadata that were available before deaccession will no longer be visible to users.

Because some datasets in the Harvard Dataverse are self-curated, their owners or distributors control the selection of materials, documentation, access policies, and data user agreements. Questions about finding and using data distributed by others in the Harvard Dataverse should therefore generally be referred to the individual dataset owners.

Changes to this Preservation Policy

Harvard Dataverse may revise this preservation policy at its sole discretion. Please check this page regularly for our current practices. If you have any questions about this preservation policy, the practices of this site, or your dealings with this site, please contact support@dataverse.org.

This policy was last modified: 07/10/2015.