Workflows


= Assessment for Suitability of Deposit =
When a prospective depositor has made contact with the ADA, the deposit request is assessed by the director or deputy director of the ADA for suitability (see Deposit Appraisal & Collection Policy).
= Deposit of Data and Documentation =
The depositing data workflow is documented and available to prospective depositors as a “quick deposit guide” [5].


When a depositor contacts the ADA, the proposed deposit is assessed by the ADA for suitability (see R08 Deposit & Appraisal). Once the deposit has been provisionally accepted, an ADA archivist will set up a deposit shell on the ADA Deposit Dataverse instance. The ADA archival workflow is managed across three separate Dataverse installations. See R14 Storage & Integrity on how and where data is stored.


Depositors are instructed to upload all data and supporting documentation files to their Deposit Dataverse. The ADA archivist will prompt the depositor to complete the DDI metadata fields on the deposit shell. The archivist will correspond with the depositor if further information is needed to create complete documentation for their data (see the requirements described in R10 Quality Assurance).
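
Depositors normally work through the Deposit Dataverse web interface, but the same upload can also be scripted against the standard Dataverse native API. The sketch below is illustrative only: the server URL, API token, and dataset DOI are placeholders, not ADA values, and would be provided by the ADA archivist when the deposit shell is created.

<syntaxhighlight lang="python">
import requests

# Placeholders for illustration only; the real Deposit Dataverse URL,
# API token, and dataset DOI are supplied by the ADA archivist.
SERVER = "https://deposit.example.edu.au"
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
DATASET_PID = "doi:10.5072/FK2/EXAMPLE"

def upload_file(path: str, description: str) -> dict:
    """Attach a data or documentation file to the deposit shell using the
    standard Dataverse native API 'add file' endpoint."""
    url = f"{SERVER}/api/datasets/:persistentId/add"
    with open(path, "rb") as fh:
        response = requests.post(
            url,
            params={"persistentId": DATASET_PID},
            headers={"X-Dataverse-key": API_TOKEN},
            files={"file": fh},
            data={"jsonData": f'{{"description": "{description}"}}'},
        )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    # Hypothetical files: a questionnaire and the main data file.
    upload_file("questionnaire.pdf", "Survey questionnaire")
    upload_file("study_data.csv", "Main data file")
</syntaxhighlight>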


= Data Processing =
When the deposit shell is created using ADA’s ADAPT tool (described in section R07), each deposit is assigned a unique six-digit ADA Identification (ADAID) number. ADAPT copies the deposited files to an archive directory, identified by the same unique ADAID number, as the Submission Information Package (SIP; see R07). Within the SIP, the initial draft deposit remains unchanged so that a complete end-to-end audit trail can always be maintained. The archivists work on a copy of the data when updating the material as required. Archival and working directories are accessed via a Remote Desktop Service (RDS) that is managed by the NCI (see R14 on Storage and Integrity).
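
ADAPT is an internal ADA tool, so the sketch below is not its implementation; it only illustrates the audit-trail idea described above, assuming an ADAID-named layout with an untouched "original" copy, a "working" copy for curation, and a SHA-256 manifest for later fixity checks. The directory names, mount point, and manifest format are assumptions.

<syntaxhighlight lang="python">
import hashlib
import shutil
from pathlib import Path

ARCHIVE_ROOT = Path("/archive")  # assumed mount point on the NCI-hosted storage

def create_sip(adaid: str, deposit_dir: Path) -> Path:
    """Copy a draft deposit into an ADAID-named archive directory.
    The 'original' copy is never edited; curation happens on 'working'."""
    sip_root = ARCHIVE_ROOT / adaid
    original = sip_root / "original"
    working = sip_root / "working"
    shutil.copytree(deposit_dir, original)
    shutil.copytree(original, working)

    # Record fixity information so later audits can confirm the
    # original submission is unchanged.
    with open(sip_root / "manifest-sha256.txt", "w") as manifest:
        for path in sorted(original.rglob("*")):
            if path.is_file():
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                manifest.write(f"{digest}  {path.relative_to(original)}\n")
    return sip_root

# e.g. create_sip("012345", Path("/deposits/012345_draft"))
</syntaxhighlight>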


Archivists curate the data and documentation as agreed with the depositor. The level of curation may depend on the type of data (e.g., quantitative or qualitative), the perceived value of the data to the designated community, its sensitivity, or other factors as determined in consultation with the depositor. The archivist will check for disclosure risk and liaise with the depositor about how best to mitigate any risks identified. Data will also be checked for re-usability, including appropriate metadata and consistent mapping to supporting documentation such as a data dictionary or user guide. Proposed changes to the data are detailed in a Processing Report sent to the Data Owner for approval prior to the changes being made. All agreed changes are recorded in the curation syntax as part of the AIP.
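
The curation syntax itself consists of whatever scripted changes the archivist and depositor agree on in the Processing Report. Purely to illustrate the kinds of changes that might be recorded, the sketch below uses pandas with hypothetical column names (respondent_name, age, SEX); it is not ADA's actual curation code.

<syntaxhighlight lang="python">
import pandas as pd

def curate(raw_path: str, curated_path: str) -> None:
    """Illustrative curation pass: the real changes are whatever was
    agreed with the depositor in the Processing Report."""
    df = pd.read_csv(raw_path)

    # Disclosure mitigation (hypothetical): drop a direct identifier and
    # top-code a rare, potentially identifying value.
    df = df.drop(columns=["respondent_name"], errors="ignore")
    df.loc[df["age"] > 90, "age"] = 90

    # Re-usability (hypothetical): consistent variable names and coded
    # values that match the data dictionary supplied by the depositor.
    df = df.rename(columns={"SEX": "sex"})
    df["sex"] = df["sex"].map({1: "Male", 2: "Female"})

    df.to_csv(curated_path, index=False)
</syntaxhighlight>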


= Review and Publication =
Once all agreed changes to the data and metadata have been made, the archivist will set up a preview on the Test Dataverse instance that reflects the intended final production version of both the metadata and files. Once the data owner/depositor has approved the preview, it is duplicated on the Production Dataverse instance using the ADAPT tool. Here the data is published, searchable, and available for access requests. DDI metadata is always publicly accessible, as are all project documentation files (unless depositors have specified otherwise). Access to the data itself is typically restricted: files can be downloaded subject to the data access criteria [8], including at minimum an ADA account with a verified institutional email and sufficient responses to any “guestbook” questions (subject to the ADA License Agreement and Terms of Access; see section R02 for details). Access criteria are recorded on ADA’s internal wiki (not publicly available) for reference by access management staff.


Changes or updates to the data files of an already published deposit are handled via the deposit and processing workflows described above. Changes are automatically version controlled in Dataverse. A major change (that is, a change to the data) results in a full version increment (e.g. Version 1.0 becomes Version 2.0), while a minor change, such as the addition of metadata, results in a sub-version increment (e.g. Version 1.0 becomes Version 1.1).
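
Publication and version increments are handled through ADAPT and Dataverse. For illustration only, the standard Dataverse publish endpoint distinguishes major and minor releases via a type parameter; the API token and DOI below are placeholders.

<syntaxhighlight lang="python">
import requests

SERVER = "https://dataverse.ada.edu.au"  # Production Dataverse
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  # placeholder archivist credential
DATASET_PID = "doi:10.5072/FK2/EXAMPLE"  # placeholder persistent identifier

def publish(pid: str, change: str) -> dict:
    """Publish a dataset version via the standard Dataverse publish endpoint.
    'major' when data files change (1.0 -> 2.0); 'minor' when only metadata
    is added or corrected (1.0 -> 1.1)."""
    assert change in ("major", "minor")
    response = requests.post(
        f"{SERVER}/api/datasets/:persistentId/actions/:publish",
        params={"persistentId": pid, "type": change},
        headers={"X-Dataverse-key": API_TOKEN},
    )
    response.raise_for_status()
    return response.json()

# e.g. publish(DATASET_PID, "minor") after a metadata-only correction
</syntaxhighlight>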


= Preservation =
At publication, preservation versions of the DDI metadata are exported using the Dataverse export functionality. The metadata is stored in a preservation sub-directory with that deposit’s ADAID in the archive directory, along with a copy of the published SPSS data file(s) and SPSS syntax. The Preservation Plan [9] outlines how ADA manages the long-term preservation of data and metadata for reuse.
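
For illustration, the DDI export described above corresponds to the standard Dataverse metadata export endpoint. The sketch below fetches the DDI record and files it under an assumed preservation sub-directory layout; the archive mount point and file naming are assumptions.

<syntaxhighlight lang="python">
import requests
from pathlib import Path

SERVER = "https://dataverse.ada.edu.au"
ARCHIVE_ROOT = Path("/archive")  # assumed NCI-hosted archive mount

def export_ddi(adaid: str, pid: str) -> Path:
    """Fetch the published DDI metadata via the standard Dataverse export
    endpoint and store it in the deposit's preservation sub-directory."""
    response = requests.get(
        f"{SERVER}/api/datasets/export",
        params={"exporter": "ddi", "persistentId": pid},
    )
    response.raise_for_status()

    preservation_dir = ARCHIVE_ROOT / adaid / "preservation"
    preservation_dir.mkdir(parents=True, exist_ok=True)
    target = preservation_dir / f"{adaid}_ddi.xml"
    target.write_bytes(response.content)
    return target
</syntaxhighlight>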


= Adjusting Workflows, Decision Handling, and Change Management =  
The ADA Archive Team meets weekly with the ADA Director to discuss workflows and decisions as required. These meetings are minuted and decisions documented on the ADA internal wiki.  


= References =
[34] Workflows – (https://docs.ada.edu.au/index.php/Workflows)  


[9] Preservation Plan – (https://docs.ada.edu.au/index.php/Preservation_plan)  


[5] Deposit guidelines – (https://docs.ada.edu.au/index.php/Quick_Deposit_Guide)
