Storage & Integrity

From ADA Public Wiki
Revision as of 01:42, 4 December 2025 by Mmcgale (talk | contribs)

The ADA archival workflow [34] outlines the processes that manage the integrity of data and metadata as they flow through the Archive. The ADA Archival Workflow Diagram [103] shows the distinct deposit, ingest, curation, access, and storage locations for each archival phase.

Data and Storage Management

The Dataverse software [49] supports data management, auditing, and basic reporting through user accounts. Detailed reports not available via the Dataverse UI/API are generated with Metabase [82] queries against the backend database.

ADA has progressively developed versions of the ADA Processing Tool (ADAPT [6]) based on the OAIS Reference Model [42]. ADAPT enables archivists to programmatically manage the movement of data and metadata between Dataverse instances and archival storage, and to manage data integrity according to the OAIS specification.

As part of this OAIS RM adherence, each archivist-actioned ADAPT [6] function creates or appends to a log using the standard provenance ontology PROV-O [3]. The log is stored as part of the OAIS Archival Information Package (AIP) [42] to support auditing of archival activities actioned through ADAPT. ADAPT version 3 has been deployed to production, reflecting ADA's commitment to continuously review and develop strategies that build data and metadata management into a software application rather than relying on risk-prone manual handling.
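The append-only provenance logging described above can be sketched as follows. The PROV-O terms (`prov:Activity`, `prov:wasAssociatedWith`, `prov:used`) are real vocabulary, but the JSON shape, function name, and field names are illustrative assumptions, not ADAPT's actual log format:

```python
import json
from datetime import datetime, timezone


def append_prov_entry(log_path, activity, archivist, dataset_doi):
    """Append one PROV-O activity record, as a line of JSON, to a log file.

    All field names here are illustrative; the real ADAPT log schema
    is not documented in this page.
    """
    entry = {
        "@context": {
            "prov": "http://www.w3.org/ns/prov#",
            "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        },
        "@type": "prov:Activity",
        "prov:startedAtTime": datetime.now(timezone.utc).isoformat(),
        "prov:wasAssociatedWith": {"@type": "prov:Agent", "rdfs:label": archivist},
        "prov:used": {"@type": "prov:Entity", "@id": dataset_doi},
        "rdfs:label": activity,
    }
    # One JSON object per line keeps the log append-only and auditable.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
```

Storing the log alongside the AIP means the provenance trail travels with the package it describes.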

Data and metadata are stored within each Dataverse instance and versioned when a dataset is published. ADAPT [6] moves data and metadata between the Dataverse instances as required at each archival phase.

Dataverse exports metadata in several formats. The JSON export of the DDI metadata also includes Dataverse system metadata, such as the MD5 checksums recorded as fixity checks on uploaded data files. During both the OAIS Ingest Phase and Publish Phase [42], ADAPT copies this JSON metadata to archival storage to preserve 1) the original OAIS Submission Information Package (SIP) [42] metadata and data files, and 2) the published OAIS Dissemination Information Package (DIP) [42] metadata and data files. These exports ensure that the integrity of digital objects, from deposit through to access, can be verified against any changes to the data.
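Such a fixity check can be sketched as below. The `datasetVersion -> files -> dataFile -> md5` path into the export is an assumption based on Dataverse's native dataset JSON and may differ between Dataverse versions:

```python
import hashlib
import json
from pathlib import Path


def verify_fixity(json_export: Path, data_dir: Path) -> list[str]:
    """Compare MD5 checksums in a Dataverse dataset JSON export against
    local copies of the data files; return the filenames that mismatch.
    """
    export = json.loads(json_export.read_text(encoding="utf-8"))
    mismatches = []
    for entry in export["datasetVersion"]["files"]:
        datafile = entry["dataFile"]
        local = data_dir / datafile["filename"]
        digest = hashlib.md5(local.read_bytes()).hexdigest()
        if digest != datafile["md5"]:
            mismatches.append(datafile["filename"])
    return mismatches
```

An empty return value means every archived copy still matches the checksum recorded at deposit.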


Strategy for Multiple Copies

Multiple copies of data corresponding to the OAIS Reference Model’s [42] SIP -> AIP -> DIP phases are created and managed primarily programmatically with the ADAPT tool (steps labelled 'ADAPT' in [6]), which creates Dataverse datasets and ARCHIVAL STORAGE [103] directories as required for each of the three phases. This is supplemented by two manual archivist steps [6].


1. Submission Information Package (SIP) [6]:

Copies of metadata and data exist in 2 locations:
a. deposit.ada.edu.au (Dataverse database & filesystem)
  • DDI metadata
  • Data
b. ARCHIVAL STORAGE [103]
  • Data
The “Data” in b. is the same as the “Data” in a.


2. Archival Information Package (AIP) [6]:

Copies of data and metadata exist in 3 locations:
a. Archivist Processing Space [103]
  • Original SIP data, which is then curated by the archivist
b. ARCHIVAL STORAGE [103]
  • Generated AIP Data
c. Dataverse TEST [95]
  • DDI Metadata (copied from deposit.ada.edu.au)
  • Data (from 2b.)


3. Dissemination Information Package (DIP) [6]:

Copies of data and metadata exist in 2 locations:
a. dataverse.ada.edu.au (production) [10]
  • DDI metadata
  • Data
b. ARCHIVAL STORAGE [103]
  • DDI metadata
  • Data
The “DDI Metadata” and “Data” in b. are the same as the “DDI Metadata” and “Data” in a.
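The "same files in each location" property stated for the SIP and DIP copies above can be spot-checked with a digest comparison along these lines (an illustrative sketch, not part of ADAPT):

```python
import hashlib
from pathlib import Path


def directory_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its MD5 digest."""
    return {
        str(p.relative_to(root)): hashlib.md5(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def copies_match(copy_a: Path, copy_b: Path) -> bool:
    """True when both locations hold byte-identical sets of files."""
    return directory_digests(copy_a) == directory_digests(copy_b)
```

Comparing digest maps rather than raw bytes also catches files that exist in one location but not the other.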


Updating Published Data

- Any changes to existing files, or the addition of any files, to a *published* dataset on Production Dataverse [10] result in the creation of a new major version of that dataset when it is re-published.

- This scenario (updating files of an existing published dataset on Production Dataverse) necessitates a new SIP->AIP->DIP process, managed via ADAPT [6], and using the original SIP dataset on the Deposit Dataverse [94].

- Because files in the SIP/AIP/DIP will be changed, ADAPT creates a superseded-<date> folder in the ARCHIVAL STORAGE [103] for that dataset, and all of the SIP/AIP/DIP file directories relating to the current (as yet unchanged) dataset are moved to this superseded folder.

- The new set of files from the updated Deposit [94] dataset is then copied to the ARCHIVAL STORAGE [103] for this dataset and represents the new SIP.

- The archivist then uses ADAPT [6] to reinitiate the SIP->AIP->DIP flow, which then proceeds as previously described.
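The superseded-folder step in the flow above can be sketched as follows; the `storage_root` layout and the `SIP`/`AIP`/`DIP` directory names are assumptions for illustration, not ADA's actual ARCHIVAL STORAGE layout:

```python
import shutil
from datetime import date
from pathlib import Path


def supersede_dataset(storage_root: Path, dataset_id: str) -> Path:
    """Move a dataset's current SIP/AIP/DIP directories into a dated
    superseded-<date> folder, clearing the way for a fresh SIP.
    """
    dataset_dir = storage_root / dataset_id
    superseded = dataset_dir / f"superseded-{date.today().isoformat()}"
    superseded.mkdir(parents=True, exist_ok=True)
    for phase in ("SIP", "AIP", "DIP"):
        src = dataset_dir / phase
        if src.exists():
            # Preserve, rather than overwrite, the outgoing copies.
            shutil.move(str(src), str(superseded / phase))
    return superseded
```

Moving (not deleting) the old directories keeps every prior version of the package recoverable from archival storage.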


Risk Management

The OAIS model [42] is implemented with data stored in the SIP, AIP, and DIP directories in the ARCHIVAL STORAGE [103], and on the Dataverse backend servers. Because the ADAPT tool [6] enforces the implemented OAIS model, the archive team must manage copies of files consistently across the SIP/AIP/DIP directories in the ARCHIVAL STORAGE [103]. ADAPT's adherence to the OAIS model [42] minimises the probability of the data copies becoming unsynchronised.


Deterioration Handling

ADA ARCHIVAL STORAGE and Dataverse instances [103] are provisioned, hosted and backed up on servers at the National Computational Infrastructure (NCI). NCI has procedures for monitoring bit level integrity against the deterioration of their storage media. NCI notifies ADA if there are plans for any necessary upgrades and when those upgrades will take place.  

The ADA technical team works with NCI to keep server operating systems updated to supported versions. NCI informs the ADA technical team when a Virtual Machine (VM) operating system (OS) is no longer going to be supported. NCI provisions new VMs when necessary, and the ADA technical team moves Dataverse installations to these new VMs.


Deletion Processes

Data in the SIP or AIP state can be deleted by the ADA archivist team if directed by depositors, or for legal reasons. The archiving team retains any records relating to the request for deletion. Data in the DIP state (published on Dataverse Production [10]) cannot be deleted.  

As datasets on the Deposit and Test Dataverses are not published, deleting a dataset from Deposit does not create problems with the temporary test (non-production) DOI created for it. Deleting the dataset removes everything relating to it, including the files stored in the backend server /files/xxxxxxx directory. The DOI does not have to be tombstoned [4] because the DOI prefix is a test prefix, unrelated to ADA’s production DOI prefix. The data stored in the ARCHIVAL STORAGE [103] SIP and AIP subdirectories are manually deleted by the responsible archivist.

If required, ADA will deaccession (rather than delete) DIP datasets that have been published on the production Dataverse [10]. This results in the dataset being labelled as "Deaccessioned" in Dataverse and renders the Dataverse data files accessible only to users with the correct permission levels (generally only ADA staff). The files remain in the Dataverse server /files/zzzzzz/ directory and in the ARCHIVAL STORAGE [103] DIP directory. The DOI is marked as “:unav Dataset” in Datacite and resolves to the deaccessioned Dataverse Production dataset.

References

[34] Workflows – (https://docs.ada.edu.au/index.php/Workflows)

[103] ADA Archival Workflow Diagram – (https://docs.ada.edu.au/index.php/Main_Page#ADA_Archival_Workflow_Diagram)

[49] The Dataverse Project – (https://dataverse.org)

[82] Metabase – (https://www.metabase.com/)

[6] ADAPT – (https://docs.ada.edu.au/index.php/ADAPT)

[42] Open Archival Information System (OAIS) Reference Model – (https://ccsds.org/publications/magentabooks/entry/3054/)

[3] PROV-O – (https://www.w3.org/TR/2013/REC-prov-o-20130430/)

[95] ADA Test Dataverse – (redacted)

[10] ADA Production Dataverse – (https://dataverse.ada.edu.au/)

[94] ADA Deposit Dataverse – (https://deposit.ada.edu.au)

[4] Datacite Tombstone – (https://support.datacite.org/docs/tombstone-pages)