Preservation plan
Latest revision as of 23:27, 28 October 2024
The plan covers data and metadata managed by the ADA for digital preservation and reuse, based on the Open Archival Information System (OAIS) Reference Model [42]. The plan outlines the scope, responsibilities, objectives, and actions for preserving data deposited with the ADA. It does not cover administrative data or other data related to the function of the ADA.
Scope
ADA holds over 1,600 datasets and 13,000 data files dating from 1833 until the present day, available for reuse through the ADA Dataverse platform [10]. Most holdings are quantitative survey data from the social sciences, but ADA has also published a small qualitative collection [43] through a funded project to develop support for archiving qualitative data. The ADA provides ongoing preservation for all data that it is authorised to share [26] and that is deemed suitable once approved at each step of the archival workflow [34], beginning with the deposit appraisal process [22].
Responsibilities
ADA has been responsible for providing data archival and long-term preservation support since 1981 [25], growing over this time in response to the needs of its Designated Community. The ADA keeps pace with and meets user requirements within the technical and user landscape, with active participation through external engagement and memberships [30], in-kind collaborations, and funded projects [44]. ADA ensures data is preserved for future usability under the FAIR (Findable, Accessible, Interoperable and Re-Usable) principles [45], supported through the implementation of FAIR in the Dataverse platform [62]. ADA is actively working towards the realisation of the CARE (Collective Benefit, Authority to Control, Responsibility, and Ethics) principles [46] for Indigenous Data Governance, supporting Indigenous-led projects and research [44], and developing CARE-oriented archival practice.
Objectives
Key objectives are (a) to ensure ADA has administrative, technical and archival processes in place that archive staff and the Designated Community can understand, and (b) to effectively meet requirements for long-term preservation and management of data and metadata.
Actions
Format Migration, Bit Level Integrity & Obsolescence Planning
ADA has no present requirement for format migration of data holdings due to the type of data deposited and reused by the Designated Community. ADA Archival Storage and Dataverse instances [47] are provisioned, hosted and backed up on National Computational Infrastructure (NCI) servers. NCI has procedures for monitoring bit level integrity against the deterioration of their storage media [37].
Obsolescence planning, bit level integrity, and storage migration are managed on behalf of ADA by the NCI [7].
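As an illustration of what bit level integrity monitoring generally involves, the sketch below records a checksum for every file under a directory and later reports files that have gone missing, appeared, or changed. It is a minimal example under stated assumptions, not a description of NCI's actual procedures: the choice of SHA-256 and the function names `sha256_of`, `build_manifest` and `find_changes` are illustrative only.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_changes(recorded: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare a stored manifest with a freshly computed one."""
    shared = set(recorded) & set(current)
    return {
        "missing": sorted(set(recorded) - set(current)),
        "added": sorted(set(current) - set(recorded)),
        "altered": sorted(n for n in shared if recorded[n] != current[n]),
    }
```

A manifest built at ingest can be stored alongside the files; re-running `build_manifest` on a schedule and passing both manifests to `find_changes` gives a simple fixity report.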
File Formats & Metadata Schemas For Long-Term Preservation
The ADA collection policy [22] outlines preferred data formats suitable for depositing data, and for long-term preservation. Most archived data is tabular quantitative data, with a very small proportion of qualitative data. IBM SPSS [48] data is currently the most common format used by ADA’s Designated Community, so is at present the format used for long-term preservation.
Current Preservation Formats
- Quantitative: SPSS data (.sav), SPSS syntax (.sps) and the DDI JSON metadata comprise the preservation package.
- Qualitative: ADA holds a small number of datasets with minimal long-term preservation support currently. Qualitative preservation processes are in development.
Future Preservation Formats
ADA has developed an R Preservation Tool [70] to export SPSS data currently preserved as .sav to an ASCII format readable by a text editor. The Tool will be published on ADA GitHub and made available to the wider community once it is in production. The Preservation Tool exports the data in ASCII (.dat) format, extracts data file structure attributes from the .sav file, and exports the syntax in a format readable by a text editor (.sps). Including the data file attributes ensures the preservation data is readable by a text editor and can be run directly in SPSS to rebuild the .sav file or to produce other tabular data file formats. Once the Tool is implemented, ADA will convert all preservation .sav data files to the new preservation format. The Tool is running successfully on the ADA R Shiny server in test mode and is ready for implementation.
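The general idea of pairing an ASCII data file with syntax that describes its structure can be sketched as follows. This is a hypothetical Python illustration, not the ADA R Preservation Tool or its output format: the `Variable` class, the function names, and the choice of a fixed-width layout read by an SPSS `DATA LIST ... FIXED` command are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    name: str
    width: int
    label: str = ""
    is_string: bool = False
    value_labels: dict = field(default_factory=dict)


def quote(text) -> str:
    """Single-quote a value for SPSS syntax, doubling embedded quotes."""
    return "'" + str(text).replace("'", "''") + "'"


def to_fixed_width(rows, variables) -> str:
    """Render rows as fixed-width ASCII, one record per line."""
    lines = []
    for row in rows:
        cells = []
        for var in variables:
            value = row.get(var.name)
            text = "" if value is None else str(value)  # None becomes blanks
            if len(text) > var.width:
                raise ValueError(f"{var.name}: {text!r} exceeds width {var.width}")
            # Strings are left-aligned, numbers right-aligned.
            cells.append(text.ljust(var.width) if var.is_string else text.rjust(var.width))
        lines.append("".join(cells))
    return "\n".join(lines) + "\n"


def to_spss_syntax(dat_name: str, variables) -> str:
    """Build SPSS syntax describing the column layout and labels."""
    specs, start = [], 1
    for var in variables:
        end = start + var.width - 1
        specs.append(f"{var.name} {start}-{end}" + (" (A)" if var.is_string else ""))
        start = end + 1
    out = [f"DATA LIST FILE={quote(dat_name)} FIXED", "  /" + "\n   ".join(specs) + "."]
    labelled = [v for v in variables if v.label]
    if labelled:
        body = "\n  /".join(f"{v.name} {quote(v.label)}" for v in labelled)
        out += ["VARIABLE LABELS", "  " + body + "."]
    coded = [v for v in variables if v.value_labels]
    if coded:
        def pairs(v):
            # String codes are quoted, numeric codes are written bare.
            return " ".join(
                f"{quote(code) if v.is_string else code} {quote(text)}"
                for code, text in v.value_labels.items()
            )
        body = "\n  /".join(f"{v.name} {pairs(v)}" for v in coded)
        out += ["VALUE LABELS", "  " + body + "."]
    out.append("EXECUTE.")
    return "\n".join(out) + "\n"
```

Both outputs are plain text, so the data remain inspectable without SPSS, while the syntax file carries the variable positions and labels that a bare .dat file would otherwise lose.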
Metadata Schemas
The Data Documentation Initiative (DDI) [18] metadata schema (DDI-Codebook) is the standard metadata schema to describe data collected and used by the Designated Community. DDI is implemented by the Dataverse application.
The ADA archival workflow [34] procedures ensure there is sufficient documentation collected for long-term usability and reuse [36]. DDI compatible metadata for each dataset is exported from Dataverse for preservation by archivists during the Ingest phase and Publish phase of the workflow. Metadata is copied to archival storage for preservation of the original (SIP) metadata and data files, and the published (DIP) metadata and data files, to ensure the integrity of digital objects from deposit to access can be verified against any changes to the data [37].
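For readers unfamiliar with how metadata can be exported from Dataverse, the sketch below builds a request against the Dataverse native API metadata export endpoint (`/api/datasets/export`) and, optionally, fetches the result. It is an illustrative example only: the base URL `https://dataverse.example.org` and the DOI are placeholders, and the source does not state which mechanism ADA archivists actually use for the export.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def export_url(base_url: str, persistent_id: str, exporter: str = "ddi") -> str:
    """Build the metadata export URL for one dataset."""
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{base_url.rstrip('/')}/api/datasets/export?{query}"


def fetch_metadata(base_url: str, persistent_id: str, exporter: str = "ddi") -> bytes:
    """Download the exported metadata as raw bytes (requires network access)."""
    with urlopen(export_url(base_url, persistent_id, exporter)) as response:
        return response.read()


if __name__ == "__main__":
    # Placeholder server and DOI, for illustration only.
    print(export_url("https://dataverse.example.org", "doi:10.1234/EXAMPLE"))
```

Saving the returned bytes at both the Ingest and Publish phases would give two snapshots that can later be compared to confirm whether the metadata changed between deposit and access.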
Preservation Levels & Retention Periods
The ADA has one standard level of preservation and retention for all datasets approved for deposit. Approval is managed through the ADA deposit appraisal process [22], continuing through the archival workflow [34].
Preservation Measures
Rights to preserve and disseminate data are primarily managed through the ADA License Agreement and associated Terms of Use [26]. Data users agree to abide by these conditions when they indicate their agreement with the Terms of Use upon requesting access to the data. Completion of the license is part of the deposit appraisal process [22] to ensure appropriate metadata is collected to support future reuse [36]. The signed and completed license agreement is stored and preserved along with the Archival Information Package (AIP).
Reappraisal of Digital Objects
Reappraisal of digital objects is driven by requirements in the Designated Community, or by technological or policy changes within or external to ADA. Changes to curation or preservation levels of digital objects are identified and managed through formal weekly archivist team meetings, including regular participation by the ADA Director, and the ADA Technical Manager as needed [34]. The formal structure of the weekly meetings ensures outcomes are considered from archival, organisational, and technical perspectives.
Deleting Data & Metadata
Data in the SIP or AIP stage of the archival workflow can be deleted by the ADA archivist team if directed by depositors, or for legal reasons, with the archiving team maintaining records relating to the request for deletion. Data in the DIP package cannot be deleted [37]. Data that have been published on ADA’s production Dataverse [10] are not deleted but deaccessioned. This process results in the published collection being labelled as “Deaccessioned” in Dataverse and renders its files accessible only to ADA staff with permission levels equivalent to admin. There is no need to tombstone [4] the DOI as the dataset landing page is still available.
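The rules above can be summarised as a small decision function. This is one possible reading of the policy written out for clarity, not an ADA system: the names `Stage`, `Action`, `VALID_GROUNDS` and `removal_action` are hypothetical, and the assumption that a deaccession also needs a depositor or legal ground is not stated in the text.

```python
from enum import Enum


class Stage(Enum):
    SIP = "SIP"
    AIP = "AIP"
    DIP = "DIP"


class Action(Enum):
    DELETE_AND_RECORD = "delete and keep a record of the request"
    DEACCESSION = "deaccession; files restricted to admin-level staff"
    NO_CHANGE = "no change"


# Grounds for removal named in the text.
VALID_GROUNDS = {"depositor_request", "legal"}


def removal_action(stage: Stage, published: bool, ground: str) -> Action:
    """Return the action this reading of the policy allows."""
    if ground not in VALID_GROUNDS:
        return Action.NO_CHANGE
    if stage in (Stage.SIP, Stage.AIP):
        # SIP and AIP data may be deleted, with the request recorded.
        return Action.DELETE_AND_RECORD
    # DIP data are never deleted; published data are deaccessioned instead.
    return Action.DEACCESSION if published else Action.NO_CHANGE
```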
References
[43] Studies of Childhood, Education & Youth (SOCEY) – (https://www.socey.net/repository/)
[44] ADA Projects – (https://docs.ada.edu.au/index.php/ADA_Projects)
[45] FAIR Principles – (https://ardc.edu.au/resource/fair-data/)
[46] CARE Principles for Indigenous Data Governance – (https://ardc.edu.au/resource/the-care-principles/)
[47] ADA OAIS Workflows ADAPT DR Diagram – (https://docs.ada.edu.au/images/f/f4/CTS_2024_ADA%2BNCI_-_RDS%2BStorage_solutions_JM_%28V4%29_2024-09-06_wiki.jpeg)
[37] Storage & Integrity – (https://docs.ada.edu.au/index.php/Storage_%26_Integrity)
[42] Open Archival Information System (OAIS) Reference Model – (https://public.ccsds.org/pubs/650x0m2.pdf)
[48] IBM SPSS – (https://www.ibm.com/spss)
[62] Dataverse Project FAIR Principles – (https://scholar.harvard.edu/mercecrosas/presentations/fair-guiding-principles-implementation-dataverse)
[70] ADA Preservation Tool – (https://github.com/ADA-ANU/ADA_Preservation_tool)