Preservation plan: Difference between revisions

From ADA Public Wiki
(37 intermediate revisions by 2 users not shown)
== Australian Data Archive Preservation Plan ==
The plan covers data and metadata managed by the ADA for digital preservation and reuse, based on the Open Archival Information System (OAIS) Reference Model [42].  The plan outlines the scope, responsibilities, objectives, and actions for preserving data deposited with the ADA. It does not cover administrative data or other data related to the function of the ADA.
The plan covers data and metadata managed by ADA for digital preservation and reuse, based on the [https://public.ccsds.org/pubs/650x0m2.pdf Open Archival Information System (OAIS) Reference Model PDF] as implemented by ADA <link to AD Logical Overview Diagram working on with Marina>.  The plan outlines the scope, objectives, responsibilities, and actions for preserving data deposited with ADA. It does not cover administrative data or other data related to the function of ADA.


=== Scope ===
== Scope ==


ADA holds over 1600 datasets and 13,000 data files through 1833 until the present day, available for reuse through the [https://docs.ada.edu.au/index.php/Technical_Infrastructure ADA Dataverse platform]. Most data are focussed within the social sciences as quantitative survey data, but ADA has published a small qualitative collection [1] through an [https://ardc.edu.au/ ARDC] project to develop support for archiving of qualitative data. Ongoing preservation of data [2] is provided by the Archive for all data that is deemed suitable once approved at each step of the archival work flow [3], beginning with the deposit appraisal process [4].
ADA holds over 1600 datasets and 13,000 data files dating from 1833 until the present day, available for reuse through the ADA Dataverse platform [10]. Most data are focussed within the social sciences as quantitative survey data, but ADA has published a small qualitative collection [43] through a funded project to develop support for archiving of qualitative data. Ongoing preservation of data is provided by the ADA for all data that it is authorised to share [26] and which is deemed suitable once approved at each step of the archival workflow [34], beginning with the deposit appraisal process [22].


[1] [https://www.socey.net/repository/ Studies of Childhood, Education & Youth (SOCEY)] <br />
== Responsibilities ==
[2] [https://docs.ada.edu.au/index.php/Governance_%26_Resources ADA Governance & Resources] <br />
[3] [https://docs.ada.edu.au/index.php/Workflows ADA Workflows] <br />
[4] [https://docs.ada.edu.au/index.php/Deposit_Appraisal_%26_Collection_Policy ADA Appraisal & Collection Policy] <br />


=== Responsibilities ===
ADA has been responsible for providing data archival and long-term preservation support since 1981 [25], growing over this time in response to the needs of its Designated Community. ADA keeps pace with and meets user requirements within the technical and user landscape, with active participation through external engagement and memberships [30], in-kind collaborations, and funded projects [44]. ADA ensures data is preserved for future usability under the FAIR (Findable, Accessible, Interoperable and Re-Usable) principles [45], supported through the implementation of FAIR in the Dataverse platform [62]. ADA is actively working towards the realisation of the CARE (Collective Benefit, Authority to Control, Responsibility, and Ethics) principles [46] for Indigenous Data Governance by supporting Indigenous-led projects and research [44] and developing CARE-oriented archival practice.


ADA has been responsible for providing archival and preservation support since 1981 [5], growing from the needs of its Designated Community.  In this time ADA has continued to keep pace and meet these requirements within the technical and user landscape, with active participation through external engagement and memberships [6], in-kind collaborations, and funded projects [7].  ADA ensures data is preserved for future usability under the [https://ardc.edu.au/resource/fair-data/ FAIR (Findable, Accessible, Interoperable and Re-Usable)] principles supported through the implementation of [https://scholar.harvard.edu/mercecrosas/presentations/fair-guiding-principles-implementation-dataverse FAIR in the Dataverse platform].  ADA is actively working towards the realisation of [https://ardc.edu.au/resource/the-care-principles/ CARE Principles for Indigenous Data Governance] (Collective Benefit, Authority to Control, Responsibility, and Ethics) by taking opportunities where possible to provide support for Indigenous focussed projects and research [7] with the aim of gaining a deeper understanding of doing CARE oriented archiving. 
== Objectives ==


[5] [https://docs.ada.edu.au/index.php/Mission_%26_Scope Mission & Scope] <br />
Key objectives are (a) to ensure ADA has administrative, technical and archival processes in place that archive staff and the Designated Community can understand, and (b) to meet effective requirements for long-term preservation and management of data and metadata.
[6] [https://docs.ada.edu.au/index.php/Expertise_%26_Guidance Expertise & Guidance] <br />
[7] [https://docs.ada.edu.au/index.php/ADA_Projects ADA Projects] <br />


=== Objectives ===
==Actions==


To ensure ADA, depositors, and the Designated Community understand and follow preservation actions under this plan. Expand this more in para form to explain the four heading that will outline the preservation process etc.
===Format Migration, Bit Level Integrity & Obsolescence Planning===
ADA has no present requirement for format migration of data holdings due to the type of data deposited and reused by the Designated Community.  ADA Archival Storage and Dataverse instances [47] are provisioned, hosted and backed up on National Computational Infrastructure (NCI) servers. NCI has procedures for monitoring bit level integrity against the deterioration of their storage media [37].


==== Processes to manage file and metadata formats sufficient for long term storage, preservation, and accessibility ====
Obsolescence planning, bit level integrity, and storage migration are managed on behalf of ADA by the NCI [7].


Workflows to ensure appropriateness for preservation - expand paras
===File Formats & Metadata Schemas For Long-Term Preservation===
*The ADA [https://docs.ada.edu.au/index.php/Workflows Workflow] guides a data deposit through access rights [x], appraisal [y], checks and validations [z], and publication [a] to ensure the data and metadata are sufficient for the preservation process.
The ADA collection policy [22] outlines preferred data formats suitable for depositing data, and for long-term preservation. Most archived data is tabular quantitative data, with a very small proportion of qualitative data. IBM SPSS [48] data is currently the most common format used by ADA’s Designated Community, so is at present the format used for long-term preservation.
Actions:


The ADA Deposit and Preservation Tool (ADAPT) supports prov & auth etc
====Current Preservation Formats==== 
* ADAPT [b] programmatically copies the SIP from the Deposit Dataverse to ensure provenance and authenticity [c] written to a PROV log, recording archivist, date, data files copied. ADA plans to expand the functionality of ADAPT to include fixity measures as used by Dataverse for the SIP and DIP to ensure the relationship between the deposited digital objects and those provided at the point of access.
*Quantitative: SPSS data (.sav), SPSS syntax (.sps) and the DDI JSON metadata comprise the preservation package.   
*Qualitative: ADA holds a small number of datasets with minimal long-term preservation support currently. Qualitative preservation processes are in development.  


[x] [https://docs.ada.edu.au/index.php/Rights_Management Rights Management] <br />
====Future Preservation Formats ====
[y] [https://docs.ada.edu.au/index.php/Deposit_Appraisal_%26_Collection_Policy Deposit Appraisal & Collection Policy] <br />
ADA has developed an R Preservation Tool [70] to export SPSS data currently preserved as .sav to an ASCII format readable by a text editor. The tool will be published on the ADA GitHub for the wider community once it is in production. It exports the data in ASCII (.dat) format, extracts the data file structure attributes from the .sav file, and exports the syntax in a format readable by a text editor (.sps). Including the data file attributes means the preserved data can be read in a text editor, or the syntax can be run directly in SPSS to rebuild the .sav file or to produce other tabular data file formats. Once the tool is implemented, ADA will convert all preservation .sav data files to the new preservation format. The tool is currently running in test mode on the ADA R Shiny server and is ready for implementation.
[z] [https://docs.ada.edu.au/index.php/Quality_Assurance Quality Assurance] <br />
[a] [https://docs.ada.edu.au/index.php/Reuse Reuse] <br />
[b] [https://docs.ada.edu.au/index.php/Technical_Infrastructure Technical Infrastructure] <br />
[c] [https://docs.ada.edu.au/index.php/Provenance_and_authenticity Provenance and authenticity] <br />


==== Measures to address preservation levels and threat of obsolescence ====
====Metadata Schemas ====
Actions
The Data Documentation Initiative (DDI) [18] metadata schema (DDI-Codebook) is the standard metadata schema to describe data collected and used by the Designated Community.  DDI is implemented by the Dataverse application.   


==== Documentation to support activities undertaken by the ADA workflow to ensure preservation ====
The ADA archival workflow [34] procedures ensure there is sufficient documentation collected for long-term usability and reuse [36].  DDI compatible metadata for each dataset is exported from Dataverse for preservation by archivists during the Ingest phase and Publish phase of the workflow.  Metadata is copied to archival storage for preservation of the original (SIP) metadata and data files, and the published (DIP) metadata and data files, to ensure the integrity of digital objects from deposit to access can be verified against any changes to the data [37].
Actions


==== Management of deaccessioned data ====
===Preservation Levels & Retention Periods===
Actions
The ADA has one standard level of preservation and retention for all datasets approved for deposit.  Approval is managed through the ADA deposit appraisal process [22], continuing through the archival workflow [34].
 
===Preservation Measures===
Rights to preserve and disseminate data are primarily managed through the ADA License Agreement and associated Terms of Use [26]. Data users agree to abide by these conditions when they indicate their agreement with the Terms of Use upon requesting access to the data.  Completion of the license is part of the deposit appraisal process [22] to ensure appropriate metadata is collected to support future reuse [36]. The signed and completed license agreement is stored and preserved along with the Archival Information Package (AIP).
 
===Reappraisal of Digital Objects===
Reappraisal of digital objects is driven by requirements in the Designated Community, or by technological or policy changes within or external to ADA. Changes to curation or preservation levels of digital objects are identified and managed through formal weekly archivist team meetings, including regular participation by the ADA Director, and the ADA Technical Manager as needed [34]. The formal structure of the weekly meetings ensures outcomes are considered from archival, organisational, and technical perspectives.
 
===Deleting Data & Metadata===
Data in the SIP or AIP stage of the archival workflow can be deleted by the ADA archivist team if directed by depositors, or for legal reasons, with the archiving team maintaining records relating to the request for deletion. Data in the DIP package cannot be deleted [37]. For data that have been published on ADA’s production Dataverse [10], ADA does not delete them but rather deaccessions them. This process results in the published collection being labelled as “Deaccessioned” in Dataverse and renders its files accessible only to ADA staff with permission levels equivalent to admin. There is no need to tombstone [4] the DOI as the dataset landing page is still available.
 
==References==
[43] Studies of Childhood, Education & Youth (SOCEY) – (https://www.socey.net/repository/)
 
[44] ADA Projects – (https://docs.ada.edu.au/index.php/ADA_Projects)
 
[45] FAIR Principles – (https://ardc.edu.au/resource/fair-data/)

[46] CARE Principles for Indigenous Data Governance – (https://ardc.edu.au/resource/the-care-principles/)
 
[47] ADA OAIS Workflows ADAPT DR Diagram – (https://docs.ada.edu.au/images/f/f4/CTS_2024_ADA%2BNCI_-_RDS%2BStorage_solutions_JM_%28V4%29_2024-09-06_wiki.jpeg)
 
[37] Storage & Integrity – (https://docs.ada.edu.au/index.php/Storage_%26_Integrity)

[42] Open Archival Information System (OAIS) Reference Model – (https://public.ccsds.org/pubs/650x0m2.pdf)
 
[48] IBM SPSS – (https://www.ibm.com/spss)

[62] Dataverse Project FAIR Principles – (https://scholar.harvard.edu/mercecrosas/presentations/fair-guiding-principles-implementation-dataverse)
 
[70] ADA Preservation Tool – (https://github.com/ADA-ANU/ADA_Preservation_tool)

Latest revision as of 00:35, 11 September 2024

The plan covers data and metadata managed by the ADA for digital preservation and reuse, based on the Open Archival Information System (OAIS) Reference Model [42]. The plan outlines the scope, responsibilities, objectives, and actions for preserving data deposited with the ADA. It does not cover administrative data or other data related to the function of the ADA.

Scope

ADA holds over 1600 datasets and 13,000 data files dating from 1833 until the present day, available for reuse through the ADA Dataverse platform [10]. Most data are focussed within the social sciences as quantitative survey data, but ADA has published a small qualitative collection [43] through a funded project to develop support for archiving of qualitative data. Ongoing preservation of data is provided by the ADA for all data that it is authorised to share [26] and which is deemed suitable once approved at each step of the archival workflow [34], beginning with the deposit appraisal process [22].

Responsibilities

ADA has been responsible for providing data archival and long-term preservation support since 1981 [25], growing over this time in response to the needs of its Designated Community. ADA keeps pace with and meets user requirements within the technical and user landscape, with active participation through external engagement and memberships [30], in-kind collaborations, and funded projects [44]. ADA ensures data is preserved for future usability under the FAIR (Findable, Accessible, Interoperable and Re-Usable) principles [45], supported through the implementation of FAIR in the Dataverse platform [62]. ADA is actively working towards the realisation of the CARE (Collective Benefit, Authority to Control, Responsibility, and Ethics) principles [46] for Indigenous Data Governance by supporting Indigenous-led projects and research [44] and developing CARE-oriented archival practice.

Objectives

Key objectives are (a) to ensure ADA has administrative, technical and archival processes in place that archive staff and the Designated Community can understand, and (b) to meet effective requirements for long-term preservation and management of data and metadata.

Actions

Format Migration, Bit Level Integrity & Obsolescence Planning

ADA has no present requirement for format migration of data holdings due to the type of data deposited and reused by the Designated Community. ADA Archival Storage and Dataverse instances [47] are provisioned, hosted and backed up on National Computational Infrastructure (NCI) servers. NCI has procedures for monitoring bit level integrity against the deterioration of their storage media [37].

Obsolescence planning, bit level integrity, and storage migration are managed on behalf of ADA by the NCI [7].
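
As an illustration of what a bit-level integrity check involves, the sketch below computes a checksum for a file and compares it with a previously recorded value. It is not a description of NCI's or ADA's procedures: the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large data files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, recorded_digest):
    """Return True when the file's current digest matches the recorded one."""
    return sha256_of(path) == recorded_digest.lower()
```

A digest recorded when a file enters storage can be passed to `is_intact` at any later date; a `False` result means the stored bytes no longer match what was originally recorded.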

File Formats & Metadata Schemas For Long-Term Preservation

The ADA collection policy [22] outlines preferred data formats suitable for depositing data, and for long-term preservation. Most archived data is tabular quantitative data, with a very small proportion of qualitative data. IBM SPSS [48] data is currently the most common format used by ADA’s Designated Community, so is at present the format used for long-term preservation.

Current Preservation Formats

  • Quantitative: SPSS data (.sav), SPSS syntax (.sps) and the DDI JSON metadata comprise the preservation package.
  • Qualitative: ADA holds a small number of datasets with minimal long-term preservation support currently. Qualitative preservation processes are in development.

Future Preservation Formats

ADA has developed an R Preservation Tool [70] to export SPSS data currently preserved as .sav to an ASCII format readable by a text editor. The tool will be published on the ADA GitHub for the wider community once it is in production. It exports the data in ASCII (.dat) format, extracts the data file structure attributes from the .sav file, and exports the syntax in a format readable by a text editor (.sps). Including the data file attributes means the preserved data can be read in a text editor, or the syntax can be run directly in SPSS to rebuild the .sav file or to produce other tabular data file formats. Once the tool is implemented, ADA will convert all preservation .sav data files to the new preservation format. The tool is currently running in test mode on the ADA R Shiny server and is ready for implementation.
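
The general idea of pairing a plain-text data file with the syntax needed to read it back can be sketched as follows. This is not the ADA R Preservation Tool and does not read .sav files: it is a minimal Python illustration that takes already-loaded rows, and the function name, the `variables` structure, and the exact layout of the emitted SPSS-style syntax are all assumptions that have not been validated against SPSS.

```python
from pathlib import Path


def export_fixed_width(rows, variables, dat_path, sps_path):
    """Write rows as fixed-width ASCII plus SPSS-style syntax describing them.

    variables: list of dicts with the hypothetical keys
    'name', 'width', 'label' and 'is_string'.
    """
    if not variables:
        raise ValueError("at least one variable is required")

    # Data file: every value is padded to its variable's column width.
    lines = []
    for row in rows:
        cells = []
        for var in variables:
            text = str(row[var["name"]])
            if len(text) > var["width"]:
                raise ValueError(f"{var['name']} value too wide: {text!r}")
            # Strings are left-aligned, numbers right-aligned.
            if var["is_string"]:
                cells.append(text.ljust(var["width"]))
            else:
                cells.append(text.rjust(var["width"]))
        lines.append("".join(cells))
    Path(dat_path).write_text("\n".join(lines) + "\n", encoding="ascii")

    # Syntax file: column positions and labels needed to rebuild the data.
    specs, start = [], 1
    for var in variables:
        end = start + var["width"] - 1
        suffix = " (A)" if var["is_string"] else ""
        specs.append(f"  {var['name']} {start}-{end}{suffix}")
        start = end + 1
    syntax = [f"DATA LIST FILE='{Path(dat_path).name}' FIXED /"] + specs
    syntax[-1] += "."
    labels = [f"{var['name']} '{var['label']}'" for var in variables]
    syntax.append("VARIABLE LABELS")
    syntax += ["  " + labels[0]] + ["  /" + item for item in labels[1:]]
    syntax[-1] += "."
    Path(sps_path).write_text("\n".join(syntax) + "\n", encoding="ascii")
```

Because the column positions are written alongside the data, the .dat file stays interpretable with nothing more than a text editor, which is the property the preservation format is meant to provide. The sketch omits value labels, missing-value definitions, and quoting of labels that contain apostrophes.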

Metadata Schemas

The Data Documentation Initiative (DDI) [18] metadata schema (DDI-Codebook) is the standard metadata schema to describe data collected and used by the Designated Community. DDI is implemented by the Dataverse application.
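
Dataverse exposes dataset metadata in DDI form through its native API metadata export endpoint. The sketch below only builds the request URL; the host name and DOI in the usage note are placeholders, the function name is hypothetical, and nothing here describes how ADA archivists actually perform the export.

```python
from urllib.parse import urlencode


def ddi_export_url(base_url, persistent_id, exporter="ddi"):
    """Build a Dataverse native API URL that exports dataset metadata.

    base_url and persistent_id are supplied by the caller; 'ddi' is the
    Dataverse exporter name for DDI Codebook output.
    """
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{base_url.rstrip('/')}/api/datasets/export?{query}"
```

For example, `ddi_export_url("https://dataverse.example.org", "doi:10.5072/FK2/EXAMPLE")` yields a URL that can be fetched with any HTTP client to retrieve the DDI XML for a published dataset.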

The ADA archival workflow [34] procedures ensure there is sufficient documentation collected for long-term usability and reuse [36]. DDI compatible metadata for each dataset is exported from Dataverse for preservation by archivists during the Ingest phase and Publish phase of the workflow. Metadata is copied to archival storage for preservation of the original (SIP) metadata and data files, and the published (DIP) metadata and data files, to ensure the integrity of digital objects from deposit to access can be verified against any changes to the data [37].
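
One way to picture verification between the deposited (SIP) and published (DIP) copies is to compare two manifests that map file names to checksums. The sketch below is an assumption-laden illustration rather than ADA's implementation: the manifest structure and the function name are hypothetical, and how the checksums are produced is left to the caller.

```python
def compare_manifests(sip, dip):
    """Compare two {relative_path: checksum} mappings.

    Returns the file names that were removed, added, changed, or left
    unchanged between the SIP manifest and the DIP manifest.
    """
    sip_names, dip_names = set(sip), set(dip)
    shared = sip_names & dip_names
    return {
        "removed": sorted(sip_names - dip_names),
        "added": sorted(dip_names - sip_names),
        "changed": sorted(name for name in shared if sip[name] != dip[name]),
        "unchanged": sorted(name for name in shared if sip[name] == dip[name]),
    }
```

Files listed under `changed` are those whose content differs between deposit and access, so any such difference can be checked against the documented curation steps.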

Preservation Levels & Retention Periods

The ADA has one standard level of preservation and retention for all datasets approved for deposit. Approval is managed through the ADA deposit appraisal process [22], continuing through the archival workflow [34].

Preservation Measures

Rights to preserve and disseminate data are primarily managed through the ADA License Agreement and associated Terms of Use [26]. Data users agree to abide by these conditions when they indicate their agreement with the Terms of Use upon requesting access to the data. Completion of the license is part of the deposit appraisal process [22] to ensure appropriate metadata is collected to support future reuse [36]. The signed and completed license agreement is stored and preserved along with the Archival Information Package (AIP).

Reappraisal of Digital Objects

Reappraisal of digital objects is driven by requirements in the Designated Community, or by technological or policy changes within or external to ADA. Changes to curation or preservation levels of digital objects are identified and managed through formal weekly archivist team meetings, including regular participation by the ADA Director, and the ADA Technical Manager as needed [34]. The formal structure of the weekly meetings ensures outcomes are considered from archival, organisational, and technical perspectives.

Deleting Data & Metadata

Data in the SIP or AIP stage of the archival workflow can be deleted by the ADA archivist team if directed by depositors, or for legal reasons, with the archiving team maintaining records relating to the request for deletion. Data in the DIP package cannot be deleted [37]. For data that have been published on ADA’s production Dataverse [10], ADA does not delete them but rather deaccessions them. This process results in the published collection being labelled as “Deaccessioned” in Dataverse and renders its files accessible only to ADA staff with permission levels equivalent to admin. There is no need to tombstone [4] the DOI as the dataset landing page is still available.

References

[43] Studies of Childhood, Education & Youth (SOCEY) – (https://www.socey.net/repository/)

[44] ADA Projects – (https://docs.ada.edu.au/index.php/ADA_Projects)

[45] FAIR Principles – (https://ardc.edu.au/resource/fair-data/)

[46] CARE Principles for Indigenous Data Governance – (https://ardc.edu.au/resource/the-care-principles/)

[47] ADA OAIS Workflows ADAPT DR Diagram – (https://docs.ada.edu.au/images/f/f4/CTS_2024_ADA%2BNCI_-_RDS%2BStorage_solutions_JM_%28V4%29_2024-09-06_wiki.jpeg)

[37] Storage & Integrity – (https://docs.ada.edu.au/index.php/Storage_%26_Integrity)

[42] Open Archival Information System (OAIS) Reference Model – (https://public.ccsds.org/pubs/650x0m2.pdf)

[48] IBM SPSS – (https://www.ibm.com/spss)

[62] Dataverse Project FAIR Principles – (https://scholar.harvard.edu/mercecrosas/presentations/fair-guiding-principles-implementation-dataverse)

[70] ADA Preservation Tool – (https://github.com/ADA-ANU/ADA_Preservation_tool)