Preparation of Data: Difference between revisions

Line 10: Line 10:
****[[Preferred Deposit Formats]]
****[[Preferred Deposit Formats]]
****[[File & Folder Naming Conventions]]
****[[File & Folder Naming Conventions]]
****[[Zipping and Encrypting Files and Folders]]
****[[Double-Zipping Files and Folders]]
*****[[Instructions on how to Double-Zip a file or folder]]
*****[[Instructions on how to Double-Zip]]
***[[Collection of Data]]
***[[Collection of Data]]


Line 30: Line 30:
The correct naming of files and folders ensures that both the ADA Staff and secondary users are able to easily identify the files when accessing the information through Dataverse.
The correct naming of files and folders ensures that both the ADA Staff and secondary users are able to easily identify the files when accessing the information through Dataverse.


  4. [[Zipping and Encrypting Files and Folders|Zip and password Encrypt]] the data files and folders in preparation for the population of the 'Shell Dataverse and Dataset' in Section 4.
  4. [[Double-Zipping Files and Folders|Double-Zip]] any CSV, Stata, SPSS, SAS and Excel data files (or folders containing these files) in preparation for population of the 'Shell Dataverse and Dataset' in Section 4.


'''Zipping and password encryption are required on all data files and some supporting documentation file formats (Stata, SPSS, CSV and Excel files).'''
There is also some useful information on organising data contained in spreadsheets in the paper here: https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375989. It can be used as best practice by researchers and helps to minimise the harmonisation work that may be needed downstream.


Dataverse Version 4.6.1 has a known issue: if a file or folder is uploaded without a password-protected layer of encrypted zipping, the upload process removes a single layer of zipping from it. Removing that layer can also alter the file's formatting if it is one of the aforementioned file types. The password-protected layer of encrypted zipping is therefore required so that a single layer remains after upload, protecting both your data from intrusion and the original file's formatting. Dataverse cannot remove a layer of encrypted zipping that is protected by a password.
Although it is not recommended, if a password is not added you will need to double-zip the files and folders, because Dataverse will remove a single layer of zipping during the upload process. The double layer of zipping ensures that a single layer remains after upload, so the file's original formatting is retained. In this case, however, your data will not be protected if it is intercepted during the upload.
If a data file is uploaded directly with no zipping, or with only a single layer of zipping without a password, Dataverse will add an 'Explore' button that enables functionality that the ADA do not want available at the present time. If data files are uploaded without either a password-protected layer of zipping or double-zipping, your deposit will not pass ADA Quality Assurance checks and will be returned to you for rectification.


==Notes==
==Notes==

Latest revision as of 23:19, 21 November 2021

In order to collect data, it does not matter whether the Data Custodian or Data Owner has a relationship with the relevant Data Subject, or whether the data, and any Personal Information contained within it, was initially processed or obtained from the Data Subject by another entity. However, by collecting any form of Personal Information, the Data Custodian and Data Owner will be bound by the Privacy Act 1988. This means that both the Data Custodian and Data Owner have a responsibility to ensure that the identity of Data Subjects is protected. To do this correctly, a number of Data Protections may need to be carried out to prepare the data for depositing.

Preparation of Data Guidance Notes

1. Apply appropriate Privacy Act 1988 data protections to the data along with any other Data Protections required to de-identify the data.

Applying appropriate protections through de-identification ensures that the material is both safe to upload to the ADA Dataverse and that the Disclosure Risk is minimised.

2. Ensure that the data files are saved in an appropriate file format.

Having the files in one of the ADA preferred formats ensures that the ADA are in a strong position to forward migrate the data in response to any technology advancements and maximises the chances that other users will be able to access and use the data.

3. Ensure that the data files and folders are named correctly.

The correct naming of files and folders ensures that both the ADA Staff and secondary users are able to easily identify the files when accessing the information through Dataverse.

4. Double-Zip any CSV, Stata, SPSS, SAS and Excel data files (or folders containing these files) in preparation for population of the 'Shell Dataverse and Dataset' in Section 4.

There is also some useful information on organising data contained in spreadsheets in the paper here: https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375989. It can be used as best practice by researchers and helps to minimise the harmonisation work that may be needed downstream.
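
The double-zipping in step 4 can be sketched in Python as follows. This is only an illustrative sketch using the standard `zipfile` module, not an ADA-provided tool: the function names `needs_double_zip` and `double_zip`, the naming of the inner archive, and the list of file extensions are all assumptions made for the example. The guidance above names the formats (CSV, Stata, SPSS, SAS and Excel) but not their extensions.

```python
import io
import zipfile
from pathlib import Path

# Assumed extensions for the formats named in step 4 (CSV, Stata, SPSS, SAS, Excel).
# The guidance does not list extensions; adjust this set as needed.
DATA_EXTENSIONS = {".csv", ".dta", ".sav", ".sas7bdat", ".xls", ".xlsx"}


def needs_double_zip(path: Path) -> bool:
    """Return True if the file extension is in the assumed data-format set."""
    return Path(path).suffix.lower() in DATA_EXTENSIONS


def double_zip(source: Path, output: Path) -> Path:
    """Place `source` (a file or folder) in an inner zip, then wrap that zip in `output`."""
    source, output = Path(source), Path(output)

    # Hypothetical naming choice: survey.csv -> survey.zip, folder wave1 -> wave1.zip.
    if source.is_dir():
        inner_name = source.name + ".zip"
    else:
        inner_name = Path(source.name).with_suffix(".zip").name

    # Build the inner archive in memory.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as inner:
        if source.is_dir():
            for item in sorted(source.rglob("*")):
                if item.is_file():
                    # Keep the folder name as the top-level entry inside the archive.
                    inner.write(item, item.relative_to(source.parent).as_posix())
        else:
            inner.write(source, source.name)

    # The outer archive contains exactly one entry: the inner zip.
    with zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as outer:
        outer.writestr(inner_name, buffer.getvalue())
    return output
```

For example, `double_zip(Path("survey.csv"), Path("survey_double.zip"))` would produce an archive whose only entry is `survey.zip`, which in turn contains `survey.csv`. If one layer of zipping is removed on upload, the inner archive is what remains.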


Notes

Privacy Act 1988: https://www.legislation.gov.au/Details/C2019C00025

Frequently Asked Questions