Preservation plan: Difference between revisions

From ADA Public Wiki

Revision as of 00:21, 9 September 2024

Australian Data Archive Preservation Plan

The plan covers data and metadata managed by ADA for digital preservation and reuse, based on the Open Archival Information System (OAIS) Reference Model [42] as implemented by ADA. The plan outlines the scope, responsibilities, objectives, and actions for preserving data deposited with ADA. It does not cover administrative data or other data related to the function of ADA.

Scope

ADA holds over 1600 datasets and 13,000 data files, spanning 1833 to the present day, available for reuse through the ADA Dataverse platform [10]. Most of the data are quantitative survey data from the social sciences, but ADA has also published a small qualitative collection [43] through an ARDC project to develop support for archiving qualitative data. The Archive provides ongoing preservation for all data that it is authorised to share [26] and that has been approved as suitable at each step of the archival workflow [34], beginning with the deposit appraisal process [22].

Responsibilities

ADA has been responsible for providing data archival and long-term preservation support since 1981 [25], growing from the needs of its Designated Community. Throughout this time ADA has continued to provide expertise and guidance from its base within POLIS: The Social Policy Research Centre [17] at ANU. This enables ADA to keep pace with the technical landscape and meet user requirements, with active participation through external engagement and memberships [30], in-kind collaborations, and funded projects [44]. ADA ensures data is preserved for future usability under the FAIR (Findable, Accessible, Interoperable and Re-Usable) Principles [45], supported through the implementation of FAIR in the Dataverse platform [62]. ADA is actively working towards the realisation of the CARE Principles for Indigenous Data Governance (Collective Benefit, Authority to Control, Responsibility, and Ethics) [46] by taking opportunities where possible to support Indigenous-focussed projects and research [44], with the aim of gaining a deeper understanding of CARE-oriented archiving.

Objectives

The objective is to ensure that ADA has administrative, technical and archival processes in place that the Archive, depositors, and the Designated Community can understand, and that these processes follow the requirements for long-term preservation and management of data and metadata.

Actions

Format Migration, Bit Level Integrity & Obsolescence Planning

ADA has no present requirement for format migration of data holdings due to the type of data deposited and reused by the Designated Community. ADA Archival Storage and Dataverse [47] instances are provisioned, hosted and backed up on servers at the National Computational Infrastructure (NCI) [7]. NCI has procedures for monitoring bit level integrity against the deterioration of its storage media [37]. Obsolescence planning, bit level integrity, and storage migration are managed by NCI on behalf of ADA.

File Formats & Metadata Schemas For Long-Term Preservation

The Archive’s collection policy [22] outlines the preferred data formats for deposit and for long-term preservation. Most archived data is statistical quantitative data, with a very small amount of qualitative data. IBM SPSS [48] is currently the most common data format used by ADA’s Designated Community, so it is at present the format used for long-term preservation.
Current Preservation Formats:

  • Quantitative: the SPSS .sav data file, the SPSS syntax (.sps) file, and the DDI JSON metadata together form the preservation data.
  • Qualitative: ADA holds a small number of datasets, which currently have no active long-term preservation support because the processes are still in development.

Future Preservation Formats

ADA has developed an R Preservation Tool to export SPSS data currently preserved as .sav to an ASCII format readable by a text editor. The Tool will be available on ADA GitHub and will be presented to the wider Dataverse community when it is in production.
Tool preservation functions:

  • Exports the data in ASCII format as .dat
  • Extracts the data file structure attributes from the .sav file, then exports the syntax in .sps format readable by a text editor.

The exported syntax, which includes the data file attributes, is readable by a text editor. It can be run directly in SPSS to rebuild the .sav, or be used to rebuild the data file in other tabular data file formats.
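
The two export steps can be illustrated with a minimal Python sketch. This is not the ADA R Preservation Tool and makes no claim about its output; it only shows the general idea of pairing a fixed-width ASCII .dat file with SPSS syntax that describes how to read it back. The `Variable` class, the column widths, the labels, and the file name `survey.dat` are all hypothetical, and the sketch describes variables by hand where the real Tool extracts them from the .sav file.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """Hypothetical description of one column (a real tool would read this from the .sav)."""
    name: str
    width: int
    label: str = ""
    is_string: bool = False
    value_labels: dict = field(default_factory=dict)  # numeric code -> label text


def quote(text):
    """Quote a string for SPSS syntax, doubling any embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def to_fixed_width(rows, variables):
    """Render rows as fixed-width ASCII text, one record per line."""
    lines = []
    for row in rows:
        cells = []
        for var in variables:
            value = row.get(var.name)
            text = "" if value is None else str(value)
            if len(text) > var.width:
                raise ValueError(f"{var.name}: {text!r} exceeds width {var.width}")
            # Strings are left-aligned and numbers right-aligned within the column.
            cells.append(text.ljust(var.width) if var.is_string else text.rjust(var.width))
        lines.append("".join(cells))
    return "\n".join(lines) + "\n"


def to_sps_syntax(variables, dat_name):
    """Build SPSS syntax that reads the .dat file and reapplies the labels."""
    specs, start = [], 1
    for var in variables:
        end = start + var.width - 1
        columns = str(start) if var.width == 1 else f"{start}-{end}"
        specs.append(f"  {var.name} {columns}" + (" (A)" if var.is_string else ""))
        start = end + 1
    lines = [f"DATA LIST FILE={quote(dat_name)} FIXED /"] + specs
    lines[-1] += "."  # a period terminates the DATA LIST command
    for var in variables:
        if var.label:
            lines.append(f"VARIABLE LABELS {var.name} {quote(var.label)}.")
        if var.value_labels:  # only numeric codes are handled in this sketch
            pairs = " ".join(f"{code} {quote(text)}" for code, text in var.value_labels.items())
            lines.append(f"VALUE LABELS {var.name} {pairs}.")
    lines.append("EXECUTE.")
    return "\n".join(lines) + "\n"


# Hypothetical example data.
variables = [
    Variable("id", 3, "Respondent identifier"),
    Variable("sex", 1, "Sex of respondent", value_labels={1: "Male", 2: "Female"}),
    Variable("name", 5, "First name", is_string=True),
]
rows = [{"id": 1, "sex": 2, "name": "Ann"}, {"id": 12, "sex": 1, "name": "Bo"}]

dat_text = to_fixed_width(rows, variables)
sps_text = to_sps_syntax(variables, "survey.dat")
```

Because both outputs are plain text, the data remain readable even if no SPSS-compatible software is available, and the syntax file documents exactly which columns hold which variable.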

As part of ADA’s reappraisal process for digital objects, ADA will convert all preservation .sav data files to the new preservation format using the R Preservation Tool. The Tool is running successfully on the ADA R Shiny server in test mode and is ready for implementation.

Metadata Schemas

The Data Documentation Initiative (DDI) [18] metadata schema (Codebook) is the standard metadata schema to describe data collected and used by the Designated Community. DDI is implemented by the Dataverse application.
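
As an illustration of how DDI metadata can be retrieved from a Dataverse installation, the sketch below builds the URL for the metadata export endpoint of the Dataverse native API. The endpoint path and the `ddi` exporter name reflect the Dataverse API documentation as generally published and should be checked against the version in use. The host name and DOI are placeholders, and the helper `ddi_export_url` is a hypothetical name, not part of any ADA workflow.

```python
from urllib.parse import urlencode


def ddi_export_url(base_url, persistent_id, exporter="ddi"):
    """Build a Dataverse metadata export URL for one dataset.

    base_url and persistent_id are supplied by the caller; nothing is fetched here.
    """
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{base_url.rstrip('/')}/api/datasets/export?{query}"


# Placeholder host and DOI, for illustration only.
url = ddi_export_url("https://dataverse.example.org", "doi:10.5072/FK2/EXAMPLE")
print(url)
```

Fetching the resulting URL with any HTTP client would return the DDI Codebook XML for a published dataset, which could then be stored with the preservation copy.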

The Archive workflow [34] procedures ensure there is sufficient documentation collected for long-term usability and reuse [36]. Metadata for each dataset is exported from Dataverse for preservation by archivists during the Ingest phase and Publish phase. The metadata is copied to archival storage for preservation of the original (SIP) metadata and data files, and the published (DIP) metadata and data files, to ensure that the integrity of digital objects from deposit to access can be verified against any changes to the data [37].
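
One common way to verify that files have not changed between two package states is to compare checksum manifests. The sketch below is an assumption about how such a check could look, not a description of ADA's procedure: it hashes every file under a package directory with SHA-256 and reports which files were added, removed, or changed between a SIP copy and a DIP copy. The function names and the choice of SHA-256 are illustrative.

```python
import hashlib
from pathlib import Path


def build_manifest(package_dir):
    """Map each file's path (relative to package_dir) to its SHA-256 hex digest."""
    root = Path(package_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large data files do not fill memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def compare_manifests(sip, dip):
    """Report files added, removed, or changed between two manifests."""
    sip_names, dip_names = set(sip), set(dip)
    return {
        "added": sorted(dip_names - sip_names),
        "removed": sorted(sip_names - dip_names),
        "changed": sorted(n for n in sip_names & dip_names if sip[n] != dip[n]),
    }
```

Calling `compare_manifests(build_manifest(sip_dir), build_manifest(dip_dir))` yields a report in which expected curation changes can be distinguished from unexpected alterations.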

Preservation Levels & Retention Periods

ADA has one standard level of preservation and retention for all datasets approved for deposit. Approval is managed through the ADA deposit appraisal process [22], continuing through the archival workflow [34].

Preservation Measures

Rights to preserve and disseminate data are primarily managed through the License Agreement and the associated Terms of Use. Data users agree to abide by these conditions when they indicate their agreement with the Terms of Use upon requesting access to the data [26]. Completion of the License Agreement is part of the deposit appraisal process [22], which ensures appropriate metadata is collected to support future reuse [36]. The License Agreement (PDF) is stored and preserved with the Archival Information Package (AIP).

Reappraisal of Digital Objects

Reappraisal of digital objects is driven by requirements in the Designated Community, or by technological or policy changes within or external to ADA. Changes to curation or preservation levels of digital objects are identified and managed through the formal weekly archivist team meetings, which include regular participation by the ADA Director, and by the ADA Technical Manager as needed [34]. The formal structure of the weekly meetings ensures outcomes are considered from archival, organisational, and technical perspectives.

Deleting Data & Metadata

Data in the SIP or AIP state can be deleted by the ADA archivist team if directed by depositors, or for legal reasons, with the archiving team maintaining records relating to the request for deletion. Data in the DIP state cannot be deleted [37].

For datasets that have been published on ADA’s production Dataverse [10], ADA does not delete them but rather deaccessions them. This process results in the dataset being labelled as “Deaccessioned” in Dataverse and renders its files accessible only to users with the correct permission levels. There is no need to tombstone [4] the DOI as the dataset landing page is still available.

References