Preparation of Data: Difference between revisions

From ADA Public Wiki
Double-zipping is required for all data file and folder uploads. Dataverse removes a single layer of zipping during upload. If no layer of zipping remains after upload, a known issue in Dataverse version 4.6.1 strips the formatting from certain file types (SPSS, Stata, SAS, CSV and Excel). Double-zipping therefore ensures that one layer remains after upload and protects the file formatting.


In addition, if a data file or folder is uploaded directly without any layer of zipping, Dataverse adds an 'Explore' button that enables functionality the ADA do not want available at present.



Latest revision as of 23:19, 21 November 2021

When collecting data, it does not matter whether the Data Custodian or Data Owner has a relationship with the relevant Data Subject, or whether the data, and any Personal Information contained within it, were initially processed or obtained from the Data Subject by another entity. However, by collecting any form of Personal Information, the Data Custodian and Data Owner become bound by the Privacy Act 1988. Both therefore have a responsibility to ensure that the identity of Data Subjects is protected. To do this correctly, a number of Data Protections may need to be applied to prepare the data for deposit.

Preparation of Data Guidance Notes

1. Apply appropriate Privacy Act 1988 data protections to the data along with any other Data Protections required to de-identify the data.

Applying appropriate protections through de-identification ensures both that the material is safe to upload to the ADA Dataverse and that the Disclosure Risk is minimised.

2. Ensure that the data files are saved in an appropriate file format.

Having the files in one of the ADA's preferred formats ensures that the ADA are in a strong position to forward-migrate the data in response to technological advances, and maximises the chances that other users will be able to access and use the data.

3. Ensure that the data files and folders are named correctly.

The correct naming of files and folders ensures that both the ADA Staff and secondary users are able to easily identify the files when accessing the information through Dataverse.

4. Double-Zip any CSV, Stata, SPSS, SAS and Excel data files (or folders containing these files) in preparation for population of the 'Shell Dataverse and Dataset' in Section 4.
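
As an illustration of step 4 only, the following Python sketch wraps a data file or folder in two layers of zip. It is not the ADA's official procedure (see the separate instructions on how to Double-Zip). The extension list, the function names and the archive naming pattern (`<name>.zip` inside `<name>.double.zip`) are assumptions made for this example.

```python
import io
import tempfile
import zipfile
from pathlib import Path

# Assumption: typical extensions for the file types named in step 4
# (CSV, Stata, SPSS, SAS, Excel). Adjust to match local practice.
DOUBLE_ZIP_EXTENSIONS = {".csv", ".dta", ".sav", ".sas7bdat", ".xls", ".xlsx"}


def needs_double_zip(path: Path) -> bool:
    """Return True if the file's extension is in the assumed list above."""
    return path.suffix.lower() in DOUBLE_ZIP_EXTENSIONS


def double_zip(source: Path, out_dir: Path) -> Path:
    """Wrap a file or folder in two zip layers; return the outer archive path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    inner_name = f"{source.name}.zip"                    # hypothetical naming
    outer_path = out_dir / f"{source.name}.double.zip"   # hypothetical naming

    with tempfile.TemporaryDirectory() as tmp:
        inner_path = Path(tmp) / inner_name
        # Layer 1: zip the data file, or every file under the folder.
        with zipfile.ZipFile(inner_path, "w", zipfile.ZIP_DEFLATED) as inner:
            if source.is_dir():
                for item in sorted(source.rglob("*")):
                    if item.is_file():
                        arcname = item.relative_to(source.parent).as_posix()
                        inner.write(item, arcname)
            else:
                inner.write(source, source.name)
        # Layer 2: zip the inner archive itself.
        with zipfile.ZipFile(outer_path, "w", zipfile.ZIP_DEFLATED) as outer:
            outer.write(inner_path, inner_name)
    return outer_path


def inner_contents(outer_path: Path) -> list:
    """List the file names held in the inner archive of a double-zipped file."""
    with zipfile.ZipFile(outer_path) as outer:
        (inner_name,) = outer.namelist()  # expect exactly one inner archive
        inner_bytes = outer.read(inner_name)
    with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
        return inner.namelist()
```

Calling `double_zip(Path("survey.csv"), Path("to_upload"))` would write `to_upload/survey.csv.double.zip`, and `inner_contents` can be used to confirm that both layers are present before uploading. The output folder should sit outside any folder being zipped, so that the archive does not include itself.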

Useful guidance on organising data in spreadsheets is available in this paper: https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375989. It can be used as best practice by researchers and helps to minimise the need for harmonisation downstream.


Notes

Privacy Act 1988: https://www.legislation.gov.au/Details/C2019C00025

Frequently Asked Questions