Preservation plan: Difference between revisions

From ADA Public Wiki

Revision as of 23:45, 10 September 2024

The plan covers data and metadata managed by the ADA for digital preservation and reuse, based on the Open Archival Information System (OAIS) Reference Model [42]. The plan outlines the scope, responsibilities, objectives, and actions for preserving data deposited with the ADA. It does not cover administrative data or other data related to the function of the ADA.

Scope

ADA holds over 1600 datasets and 13,000 data files dating from 1833 until the present day, available for reuse through the ADA Dataverse platform [10]. Most holdings are quantitative survey data from the social sciences, but ADA has also published a small qualitative collection [43] through a funded project to develop support for archiving qualitative data. ADA provides ongoing preservation for all data that it is authorised to share [26] and that is approved as suitable at each step of the archival workflow [34], beginning with the deposit appraisal process [22].

Responsibilities

ADA has been responsible for providing data archival and long-term preservation support since 1981 [25], growing over this time in response to the needs of its Designated Community. ADA keeps pace with, and meets, user requirements within the technical and user landscape through active participation in external engagement and memberships [30], in-kind collaborations, and funded projects [44]. ADA ensures data is preserved for future usability under the FAIR (Findable, Accessible, Interoperable and Re-Usable) principles [45], supported through the implementation of FAIR in the Dataverse platform [62]. ADA is actively working towards the realisation of the CARE (Collective Benefit, Authority to Control, Responsibility, and Ethics) principles [46] for Indigenous Data Governance by supporting Indigenous-led projects and research [44] and developing CARE-oriented archival practice.

Objectives

Key objectives are (a) to ensure ADA has administrative, technical and archival processes in place that archive staff and the Designated Community can understand, and (b) to effectively meet requirements for long-term preservation and management of data and metadata.

Actions

Format Migration, Bit Level Integrity & Obsolescence Planning

ADA has no present requirement for format migration of data holdings due to the type of data deposited and reused by the Designated Community. ADA Archival Storage and Dataverse instances [47] are provisioned, hosted and backed up on National Computational Infrastructure (NCI) servers. NCI has procedures for monitoring bit level integrity against the deterioration of their storage media [37].

Obsolescence planning, bit level integrity, and storage migration are managed on behalf of ADA by the NCI [7].

File Formats & Metadata Schemas For Long-Term Preservation

The Archive’s collection policy [22] outlines preferred data formats suitable for deposit and for long-term preservation. Most archived data is statistical quantitative data, with a very small amount of qualitative data. IBM SPSS [48] data is currently the most common format used by ADA’s Designated Community, so it is at present the format used for long-term preservation.
Current Preservation Formats:

  • Quantitative: the SPSS .sav data file, the SPSS syntax file (.sps) and the DDI JSON metadata together provide the preservation data.
  • Qualitative: ADA holds a small number of datasets that currently have no active long-term preservation support, as the processes are still in development.

Future Preservation Formats

ADA has developed an R Preservation Tool to export SPSS data currently preserved as .sav to an ASCII format readable by a text editor. The Tool will be available on ADA GitHub and will be presented to the wider Dataverse community when it is in production.
Tool preservation functions:

  • Exports the data in ASCII format as .dat
  • Extracts the data file structure attributes from the .sav file, then exports the syntax in .sps format readable by a text editor.

The exported syntax, which includes the data file attributes, is readable in a text editor, can be run directly in SPSS to rebuild the .sav file, or can be used to rebuild the data file in other tabular data file formats.
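As a rough illustration of the two export functions listed above, the following Python sketch turns a small in-memory table into fixed-width ASCII text (the .dat content) and generates matching SPSS syntax (the .sps content). It is not the ADA R Preservation Tool, which is written in R and reads .sav files directly. The `Variable` class, the helper functions, and the example variable names, widths and labels are all hypothetical, and a real export would also need to handle missing values, formats and encodings.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """Hypothetical description of one column's file-structure attributes."""
    name: str
    width: int
    label: str
    value_labels: dict = field(default_factory=dict)
    is_string: bool = False


def quote(text):
    """Quote a label for SPSS syntax, doubling any embedded apostrophes."""
    return "'" + str(text).replace("'", "''") + "'"


def to_fixed_width(rows, variables):
    """Render rows as fixed-width ASCII text, one record per line."""
    lines = []
    for row in rows:
        cells = []
        for var in variables:
            text = str(row[var.name])
            if len(text) > var.width:
                raise ValueError(f"{var.name}: {text!r} exceeds width {var.width}")
            # Strings are left-aligned, numbers right-aligned in their columns.
            cells.append(text.ljust(var.width) if var.is_string else text.rjust(var.width))
        lines.append("".join(cells))
    return "\n".join(lines) + "\n"


def to_spss_syntax(dat_name, variables):
    """Generate DATA LIST, VARIABLE LABELS and VALUE LABELS commands."""
    specs, start = [], 1
    for var in variables:
        end = start + var.width - 1
        specs.append(f"{var.name} {start}-{end}" + (" (A)" if var.is_string else ""))
        start = end + 1
    out = [f"DATA LIST FILE={quote(dat_name)} FIXED", "  /" + " ".join(specs) + "."]
    out.append("VARIABLE LABELS")
    out.append("\n".join(
        f"  {'/' if i else ''}{v.name} {quote(v.label)}" for i, v in enumerate(variables)
    ) + ".")
    labelled = [v for v in variables if v.value_labels]
    if labelled:
        out.append("VALUE LABELS")
        out.append("\n".join(
            f"  {'/' if i else ''}{v.name} "
            + " ".join(f"{code} {quote(text)}" for code, text in v.value_labels.items())
            for i, v in enumerate(labelled)
        ) + ".")
    out.append("EXECUTE.")
    return "\n".join(out) + "\n"


# Hypothetical example data.
variables = [
    Variable("id", 4, "Respondent identifier"),
    Variable("sex", 1, "Respondent's sex", {1: "Male", 2: "Female"}),
    Variable("state", 3, "State of residence", is_string=True),
]
rows = [{"id": 1, "sex": 2, "state": "ACT"}, {"id": 12, "sex": 1, "state": "NSW"}]
dat_text = to_fixed_width(rows, variables)
sps_text = to_spss_syntax("survey.dat", variables)
```

Both results are plain strings, so they could be written out with `Path.write_text(..., encoding="ascii")`, which would also fail loudly if a non-ASCII character slipped into the data or labels.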

As part of ADA’s reappraisal process for digital objects, ADA will convert all preservation .sav data files to the new preservation format using the R Preservation Tool. The tool is running successfully on the ADA R Shiny server in test mode and is ready for implementation.

Metadata Schemas

The Data Documentation Initiative (DDI) [18] metadata schema (Codebook) is the standard metadata schema to describe data collected and used by the Designated Community. DDI is implemented by the Dataverse application.

The Archive workflow [34] procedures ensure there is sufficient documentation collected for long-term usability and reuse [36]. Metadata for each dataset is exported from Dataverse for preservation by archivists during the Ingest phase and Publish phase. The metadata is copied to archival storage for preservation of the original (SIP) metadata and data files, and the published (DIP) metadata and data files, to ensure that the integrity of digital objects from deposit to access can be verified against any changes to the data [37].
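One common way to make such verification concrete is a checksum manifest recorded when a package is stored and recomputed later. The sketch below is a generic illustration only: the source does not describe the mechanism ADA or NCI actually use, and the function names, the choice of SHA-256, and the directory layout are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir):
    """Map each file's path (relative to the package root) to its SHA-256."""
    root = Path(package_dir)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(stored, current):
    """Report files that are missing, unexpected, or changed since storage."""
    return {
        "missing": sorted(set(stored) - set(current)),
        "unexpected": sorted(set(current) - set(stored)),
        "changed": sorted(
            name for name in stored.keys() & current.keys()
            if stored[name] != current[name]
        ),
    }
```

A manifest built at ingest could be saved alongside the package (for example as JSON) and passed later as `stored`, with `build_manifest` rerun on the same directory to produce `current`. Three empty lists in the report would indicate that nothing has changed.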

Preservation Levels & Retention Periods

ADA has one standard level of preservation and retention for all datasets approved for deposit. Approval is managed through the ADA deposit appraisal process [22], continuing through the archival workflow [34].

Preservation Measures

Rights to preserve and disseminate data are primarily managed through the License Agreement and the associated Terms of Use. Data users agree to abide by these conditions when they indicate their agreement with the Terms of Use upon requesting access to the data [26]. Completion of the license is part of the deposit appraisal process [22] to ensure appropriate metadata is collected to support future reuse [36]. The license agreement (PDF) is stored and preserved with the Archival Information Package (AIP).

Reappraisal of Digital Objects

Reappraisal of digital objects is driven by requirements in the Designated Community, or by technological or policy changes within or external to ADA. Changes to curation or preservation levels of digital objects are identified and managed through the formal weekly archivist team meetings, which include regular participation by the ADA Director and, as needed, the ADA Technical Manager [34]. The formal structure of the weekly meetings ensures outcomes are considered from archival, organisational, and technical perspectives.

Deleting Data & Metadata

Data in the SIP or AIP state can be deleted by the ADA archivist team if directed by depositors, or for legal reasons, with the archiving team maintaining records relating to the request for deletion. Data in the DIP state cannot be deleted [37].

For datasets that have been published on ADA’s production Dataverse [10], ADA does not delete them but rather deaccessions them. This process results in the dataset being labelled as “Deaccessioned” in Dataverse and renders its files accessible only to users with the correct permission levels. There is no need to tombstone [4] the DOI as the dataset landing page is still available.

References