Australian Data Archive Preservation Plan

Latest revision as of 00:35, 11 September 2024

The plan covers data and metadata managed by ADA for digital preservation and reuse, based on the Open Archival Information System (OAIS) Reference Model [42]. The plan outlines the scope, responsibilities, objectives, and actions for preserving data deposited with ADA. It does not cover administrative data or other data related to the function of ADA.

Scope

ADA holds over 1600 datasets and 13,000 data files dating from 1833 to the present day, available for reuse through the ADA Dataverse platform [10]. Most holdings are quantitative survey data from the social sciences, but ADA has also published a small qualitative collection [43] through a funded project to develop support for archiving qualitative data. ADA provides ongoing preservation for all data that it is authorised to share [26] and that are approved as suitable at each step of the archival workflow [34], beginning with the deposit appraisal process [22].

Responsibilities

ADA has been responsible for providing data archival and long-term preservation support since 1981 [25], growing over this time in response to the needs of its Designated Community. ADA keeps pace with and meets user requirements within the technical and user landscape through active participation in external engagement and memberships [30], in-kind collaborations, and funded projects [44]. ADA ensures data is preserved for future usability under the FAIR (Findable, Accessible, Interoperable and Re-Usable) principles [45], supported through the implementation of FAIR in the Dataverse platform [62]. ADA is actively working towards the realisation of the CARE (Collective Benefit, Authority to Control, Responsibility, and Ethics) principles [46] for Indigenous Data Governance by supporting Indigenous-led projects and research [44] and developing CARE-oriented archival practice.

Objectives

Key objectives are (a) to ensure ADA has administrative, technical and archival processes in place that archive staff and the Designated Community can understand, and (b) to effectively meet requirements for long-term preservation and management of data and metadata.

Actions

Format Migration, Bit Level Integrity & Obsolescence Planning

ADA has no present requirement for format migration of data holdings due to the type of data deposited and reused by the Designated Community. ADA Archival Storage and Dataverse instances [47] are provisioned, hosted and backed up on National Computational Infrastructure (NCI) servers. NCI has procedures for monitoring bit level integrity against the deterioration of their storage media [37].

Obsolescence planning, bit level integrity, and storage migration are managed on behalf of ADA by NCI [7].
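
As a general illustration of what bit level integrity monitoring involves, the following is a minimal Python sketch of a checksum (fixity) check. It is not NCI's procedure, which this plan does not describe; the function names, the choice of SHA-256, and the manifest layout are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(manifest: dict[str, str], root: Path) -> list[str]:
    """List files that are missing or whose digest differs from the manifest.

    `manifest` maps a path relative to `root` to the digest recorded when
    the file was stored (a hypothetical layout chosen for this sketch).
    """
    failures = []
    for relative_name, recorded in sorted(manifest.items()):
        target = root / relative_name
        if not target.is_file() or sha256_of(target) != recorded:
            failures.append(relative_name)
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data_file = root / "survey.sav"
        data_file.write_bytes(b"example bytes")
        manifest = {"survey.sav": sha256_of(data_file)}
        print(changed_files(manifest, root))  # file unchanged since recording
        data_file.write_bytes(b"example bytez")  # simulate a corrupted byte
        print(changed_files(manifest, root))  # the altered file is reported
```

Recording a digest when a file is stored and recomputing it later is what allows silent changes on storage media to be detected without comparing against a second copy.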

File Formats & Metadata Schemas For Long-Term Preservation

The ADA collection policy [22] outlines preferred data formats suitable for depositing data, and for long-term preservation. Most archived data is tabular quantitative data, with a very small proportion of qualitative data. IBM SPSS [48] data is currently the most common format used by ADA’s Designated Community, so is at present the format used for long-term preservation.

Current Preservation Formats

  • Quantitative: SPSS data (.sav), SPSS syntax (.sps) and the DDI JSON metadata comprise the preservation package.
  • Qualitative: ADA holds a small number of datasets with minimal long-term preservation support currently. Qualitative preservation processes are in development.

Future Preservation Formats

ADA has developed an R Preservation Tool [70] to export SPSS data currently preserved as .sav to an ASCII format readable by a text editor. The tool will be published on the ADA GitHub and made available to the wider community when it is in production. It exports the data in ASCII (.dat) format, extracts the data file structure attributes from the .sav file, and exports them as SPSS syntax (.sps) readable by a text editor. Including the data file attributes means the preservation data can be read in a text editor, and the syntax can be run directly in SPSS to rebuild the .sav file or used to rebuild the data in other tabular data file formats. Once the tool is implemented, ADA will convert all preservation .sav data files to the new preservation format. The tool is running successfully on the ADA R Shiny server in test mode and is ready for implementation.
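
The idea of pairing an ASCII data file with syntax that describes its layout can be sketched as follows. This is a Python illustration only, not the ADA R Preservation Tool: the `Variable` class, the function names, and the fixed-width layout are assumptions for the example, and the generated commands have not been run in SPSS.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    name: str                # variable name as it appears in the syntax
    width: int               # number of columns in the fixed-width file
    label: str               # human-readable variable label
    is_string: bool = False  # string variables are declared with (A)


def to_fixed_width(rows: list[dict], variables: list[Variable]) -> str:
    """Render rows as fixed-width ASCII text, one record per line."""
    lines = []
    for row in rows:
        cells = []
        for var in variables:
            text = str(row[var.name])
            if len(text) > var.width:
                raise ValueError(f"{var.name}: {text!r} exceeds width {var.width}")
            # Strings are left-aligned, numbers right-aligned.
            cells.append(text.ljust(var.width) if var.is_string else text.rjust(var.width))
        lines.append("".join(cells))
    return "\n".join(lines) + "\n"


def to_spss_syntax(dat_name: str, variables: list[Variable]) -> str:
    """Build DATA LIST and VARIABLE LABELS commands describing the layout."""
    specs, labels, start = [], [], 1
    for index, var in enumerate(variables):
        end = start + var.width - 1
        suffix = " (A)" if var.is_string else ""
        specs.append(f"  {var.name} {start}-{end}{suffix}")
        escaped = var.label.replace("'", "''")  # double embedded apostrophes
        prefix = "/" if index else ""
        labels.append(f"  {prefix}{var.name} '{escaped}'")
        start = end + 1
    return (
        f"DATA LIST FILE='{dat_name}' FIXED /\n" + "\n".join(specs) + ".\n"
        + "VARIABLE LABELS\n" + "\n".join(labels) + ".\n"
        + "EXECUTE.\n"
    )


if __name__ == "__main__":
    variables = [
        Variable("id", 3, "Respondent ID"),
        Variable("state", 4, "State of residence", is_string=True),
    ]
    rows = [{"id": 7, "state": "ACT"}, {"id": 42, "state": "NSW"}]
    print(to_fixed_width(rows, variables))
    print(to_spss_syntax("survey.dat", variables))
```

Because both outputs are plain text, the column positions and labels needed to interpret the data remain readable even if no software can open the original binary file.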

Metadata Schemas

The Data Documentation Initiative (DDI) [18] metadata schema (DDI-Codebook) is the standard metadata schema to describe data collected and used by the Designated Community. DDI is implemented by the Dataverse application.

The ADA archival workflow [34] procedures ensure there is sufficient documentation collected for long-term usability and reuse [36]. DDI-compatible metadata for each dataset is exported from Dataverse for preservation by archivists during the Ingest phase and Publish phase of the workflow. Metadata is copied to archival storage for preservation of the original (SIP) metadata and data files, and the published (DIP) metadata and data files, so that the integrity of digital objects from deposit to access can be verified against any changes to the data [37].
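
One way such an export could be scripted is through the metadata export endpoint of the Dataverse native API. The Python sketch below only builds the request URL and offers an optional fetch; the base URL and DOI are placeholders, the function names are hypothetical, and the plan does not state whether ADA archivists export through the API or the Dataverse interface.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import urlopen


def ddi_export_url(base_url: str, persistent_id: str, exporter: str = "ddi") -> str:
    """Build the Dataverse metadata export URL for one dataset."""
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{base_url.rstrip('/')}/api/datasets/export?{query}"


def fetch_export(url: str, timeout: float = 30.0) -> bytes:
    """Download the exported metadata (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Placeholder server and DOI, used only to show the URL structure.
    url = ddi_export_url("https://dataverse.example.org", "doi:10.5072/FK2/EXAMPLE")
    print(url)
    print(parse_qs(urlsplit(url).query))
```

Saving one export at Ingest and another at Publish gives two snapshots of the metadata that can later be compared to confirm what changed between deposit and access.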

Preservation Levels & Retention Periods

ADA has one standard level of preservation and retention for all datasets approved for deposit. Approval is managed through the ADA deposit appraisal process [22], continuing through the archival workflow [34].

Preservation Measures

Rights to preserve and disseminate data are primarily managed through the ADA License Agreement and associated Terms of Use [26]. Data users agree to abide by these conditions when they indicate their agreement with the Terms of Use upon requesting access to the data. Completion of the license is part of the deposit appraisal process [22] to ensure appropriate metadata is collected to support future reuse [36]. The signed and completed license agreement is stored and preserved along with the Archival Information Package (AIP).

Reappraisal of Digital Objects

Reappraisal of digital objects is driven by requirements in the Designated Community, or by technological or policy changes within or external to ADA. Changes to curation or preservation levels of digital objects are identified and managed through formal weekly archivist team meetings, including regular participation by the ADA Director, and the ADA Technical Manager as needed [34]. The formal structure of the weekly meetings ensures outcomes are considered from archival, organisational, and technical perspectives.

Deleting Data & Metadata

Data in the SIP or AIP stage of the archival workflow can be deleted by the ADA archivist team if directed by depositors or for legal reasons, with the team maintaining records relating to each deletion request. Data in the DIP cannot be deleted [37]. Data that have been published on ADA’s production Dataverse [10] are not deleted but deaccessioned. This process results in the published collection being labelled as “Deaccessioned” in Dataverse and renders its files accessible only to ADA staff with permission levels equivalent to admin. There is no need to tombstone [4] the DOI as the dataset landing page is still available.

References

[37] Storage & Integrity – (https://docs.ada.edu.au/index.php/Storage_%26_Integrity)

[42] Open Archival Information System (OAIS) Reference Model – (https://public.ccsds.org/pubs/650x0m2.pdf)

[43] Studies of Childhood, Education & Youth (SOCEY) – (https://www.socey.net/repository/)

[44] ADA Projects – (https://docs.ada.edu.au/index.php/ADA_Projects)

[45] FAIR Principles – (https://ardc.edu.au/resource/fair-data/)

[46] CARE Principles for Indigenous Data Governance – (https://ardc.edu.au/resource/the-care-principles/)

[47] ADA OAIS Workflows ADAPT DR Diagram – (https://docs.ada.edu.au/images/f/f4/CTS_2024_ADA%2BNCI_-_RDS%2BStorage_solutions_JM_%28V4%29_2024-09-06_wiki.jpeg)

[48] IBM SPSS – (https://www.ibm.com/spss)

[62] Dataverse Project FAIR Principles – (https://scholar.harvard.edu/mercecrosas/presentations/fair-guiding-principles-implementation-dataverse)

[70] ADA Preservation Tool – (https://github.com/ADA-ANU/ADA_Preservation_tool)