Workflows

From ADA Public Wiki

Revision as of 03:58, 20 June 2024

1. Assessment for Suitability of Deposit

When a prospective depositor has made contact with the ADA, the deposit request is assessed by the director or deputy director of the ADA for suitability (see Deposit Appraisal & Collection Policy).


2. Upload of Data and Documentation

2.1 Deposit Shell

Once the deposit has been provisionally accepted, an ADA archivist will set up a deposit shell on the ADA Deposit Dataverse site. The Deposit Dataverse is the first of three instances of Dataverse used in the archival process by the ADA. The three Dataverse installations (Deposit/Test/Production) are isolated from one another, with only the Production Dataverse, the third instance, publicly accessible.

The deposit shell looks like an empty version of a dataset on the Production Dataverse. Other secure file sharing solutions are allowed, but this should be discussed with the ADA first. For security reasons, do not send data files by email.

2.2 Data upload

The data depositor uploads the data files to the deposit shell. Please refer to 2. Deposit Preparation and Preferred Deposit Formats for details on how to prepare data files for upload.

A data deposit has to be accompanied by supporting documentation such as questionnaires and technical reports. Any document that helps a secondary user understand and use the data appropriately can be included. These documents are also uploaded to the deposit shell.


2.3 Metadata

The ADA uses Data Documentation Initiative (DDI) standards for metadata. The depositor should fill in as many metadata fields as possible in the deposit shell. The ADA will contact the depositor if further information is needed to create complete documentation for their data.


3. Data Curation Process

3.1 Submission Information Package

Draft deposits are each assigned a unique six-digit ADA Identification (ADAID) number. The complete draft submission is then saved, as the Submission Information Package (SIP), to an archive folder structure with the same unique ADAID number, hosted by the National Computational Infrastructure (NCI). Within the SIP, the initial draft deposit remains unchanged so that a complete end-to-end audit trail can be maintained at all times. The archivists use a copy of the data to perform updates and amendments to the material as required. The NCI storage and working areas are accessed via a Remote Desktop Service (RDS) that is managed by the NCI.
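
As an illustration only, the rule that the SIP stays untouched while archivists work on a copy could be sketched as follows. The folder names "SIP" and "working", the function names and the archive root are hypothetical and do not describe the actual NCI layout; only the six-digit ADAID format comes from the text above.

```python
import re
import shutil
from pathlib import Path

# Six ASCII digits, per the description of the ADAID above.
ADAID_PATTERN = re.compile(r"[0-9]{6}")


def is_valid_adaid(adaid: str) -> bool:
    """Return True if the identifier consists of exactly six digits."""
    return ADAID_PATTERN.fullmatch(adaid) is not None


def make_working_copy(archive_root: Path, adaid: str) -> Path:
    """Copy the SIP into a separate working folder and return its path.

    The folder names "SIP" and "working" are assumptions for this sketch.
    """
    if not is_valid_adaid(adaid):
        raise ValueError(f"not a six-digit ADAID: {adaid!r}")
    sip = archive_root / adaid / "SIP"
    working = archive_root / adaid / "working"
    # copytree refuses to overwrite an existing working folder,
    # and the SIP itself is only read, never modified.
    shutil.copytree(sip, working)
    return working
```

All later edits would then be made under the working folder, so the original submission remains available for the audit trail.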


3.2 Data Processing

Trained ADA archivist staff can perform various levels of curation as agreed with the data owner/depositor. The level of curation may depend on the type of dataset (quantitative or qualitative) deposited, the importance of the dataset and its confidentiality (government or longitudinal data), or other factors as determined in consultation with the data depositor. All proposed changes to the data are captured in a Processing Report for the deposit. This report is sent to the Data Owner for approval prior to the changes being made. All agreed changes are tracked and retraceable in the curation syntax (SPSS or R).

The processed data and supporting documentation files are converted to preservation formats suitable for long-term storage and are saved in the archive file structure as the Archival Information Package (AIP). The Processing Reports are also retained in the archive and form part of the AIP. If required, approved changes can also be made to the data, supporting information and metadata by the Data Owner (or, if authorised, the data depositor) while the information is still in draft form in the Deposit Dataverse. All copies of syntax and superseded data/documents are also retained in an archival form as part of the AIP.


4. Review of Data and Metadata

4.1 Cross Check

Once all agreed changes to the data and metadata have been made, the ADA archivist will set up a preview page on the second instance of Dataverse, the Test Dataverse, that reflects the current state of the metadata and files. The data owner/depositor will be provided with a private URL to review the data.

4.2 License, Terms & Conditions, Access conditions

5. Publication

Once the data owner/depositor has approved the preview version, it is copied to the third instance of Dataverse, the Production Dataverse. On this instance of Dataverse, the data is published and can be requested by external users.
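
The progression through the three isolated Dataverse instances can be pictured as a simple ordered sequence. The sketch below is a conceptual model only; the enum and function names are hypothetical and it does not call any Dataverse API.

```python
from enum import Enum


class Instance(Enum):
    """The three Dataverse instances, in the order a deposit moves through them."""
    DEPOSIT = 1
    TEST = 2
    PRODUCTION = 3


def is_public(instance: Instance) -> bool:
    """Only the Production Dataverse is publicly accessible."""
    return instance is Instance.PRODUCTION


def next_instance(instance: Instance) -> Instance:
    """Return the instance a dataset is copied to next."""
    if instance is Instance.PRODUCTION:
        raise ValueError("Production is the final instance")
    return Instance(instance.value + 1)
```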


Changes made to published datasets are version controlled and stored within the NCI file structure as part of the AIP. Changes to published datasets are also automatically version controlled through the Dataverse application. A major change, that is, a change to the data or metadata, results in a new version release (e.g. Version 0.0 becomes Version 1.0), whilst a minor change, such as the addition of supporting documentation, results in a sub-version uplift (e.g. from Version 0.0 to Version 0.1).
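
The numbering rule can be sketched as a small function. This is an illustration rather than the Dataverse implementation: the function name is hypothetical, and resetting the minor number to 0 on a major change is an assumption, since the example in the text only starts from Version 0.0.

```python
from typing import Tuple


def next_version(current: Tuple[int, int], major_change: bool) -> Tuple[int, int]:
    """Return the (major, minor) version that follows a change.

    major_change=True  -> change to the data or metadata
    major_change=False -> e.g. supporting documentation added
    """
    major, minor = current
    if major_change:
        # Assumption: the minor number restarts at 0 after a major release.
        return (major + 1, 0)
    return (major, minor + 1)


print(next_version((0, 0), major_change=True))   # (1, 0)
print(next_version((0, 0), major_change=False))  # (0, 1)
```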


As a rule, all quantitative deposits are processed at Level C (enhanced curation) or Level D (data-level curation). As a minimum, a few standard data checks are undertaken and full DDI-compliant documentation is created. Data files are then converted to four common formats (SPSS, SAS, Stata and CSV are the standard outputs) and made available for download subject to access conditions.


For qualitative data, the ADA archivist will check for privacy risks and liaise with the depositor about how best to mitigate them. This could mean that only the transcript of an interview is published (without the recording), or even just an interview summary, depending on the level of sensitivity of the data.



The combination of the data (in a variety of formats), metadata and supporting documentation collectively forms the Dissemination Information Package (DIP) and is accessible through the ADA Production Dataverse site. For all datasets, the metadata is freely available for viewing. The data itself can be downloaded subject to the Data Owner’s licensing agreement and the user's fulfilment of the data access criteria. This usually involves providing a verified email address and answering a number of guestbook questions. The access criteria for each dataset are formalised as Business Rules and are updated and stored on the ADA’s internal wiki site (these pages are not publicly available).
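
To make the usual access criteria concrete, the check could be modelled as below. This is a hedged sketch: the class, field and function names are hypothetical, the actual Business Rules are internal and may include further conditions, and licence terms are not modelled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AccessRequest:
    """A user's download request (hypothetical structure)."""
    email_verified: bool
    guestbook_answers: Dict[str, str] = field(default_factory=dict)


def may_download(request: AccessRequest, required_questions: List[str]) -> bool:
    """Return True if the email is verified and every required
    guestbook question has a non-empty answer."""
    if not request.email_verified:
        return False
    return all(
        request.guestbook_answers.get(question, "").strip()
        for question in required_questions
    )
```

Metadata viewing would bypass this check entirely, since metadata is freely available for all datasets.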

Access and Access Restrictions