Double-Zipping Files and Folders: Difference between revisions

From ADA Public Wiki
m (Dahaddican moved page Double-Zipping Files and Folders to Zipping and Encrypting Files and Folders: Update required since no longer need to Double-Zip all data files and some Supporting Documents if they are encrypted with a password since Data...)

Latest revision as of 04:11, 20 January 2020

Double-zipping is required to ensure that CSV, Stata, SPSS, SAS and Excel files remain compatible with Dataverse and to prevent the Two Ravens 'Explorer' functionality from being added to these file formats.

Dataverse Version 4.6.1

It is a known issue that Dataverse Version 4.6.1 cannot directly ingest certain file types (SAS, SPSS, Stata, Excel and CSV) without removing some of the files' original formatting. Dataverse Version 4.6.1 also adds the Two Ravens ‘Explorer’ function button to directly ingested files; this enables functionality that the ADA do not currently want and are not resourced to manage. To prevent both problems, and because Dataverse automatically removes one layer of zipping during the ingest process, all files in any of these formats must be double-zipped before ingest.
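As a rough aid for deciding which files this applies to, the Python sketch below flags files by extension. The extension list is an assumption (common extensions for the CSV, Stata, SPSS, SAS and Excel formats named above, not a list published by the ADA), and the names DOUBLE_ZIP_EXTENSIONS and needs_double_zip are hypothetical.

```python
from pathlib import Path

# ASSUMPTION: common extensions for the formats named above.
# This is not an official ADA or Dataverse list; check it against current guidance.
DOUBLE_ZIP_EXTENSIONS = {
    ".csv",                   # CSV
    ".dta",                   # Stata
    ".sav", ".zsav", ".por",  # SPSS
    ".sas7bdat", ".xpt",      # SAS
    ".xls", ".xlsx",          # Excel
}


def needs_double_zip(filename: str) -> bool:
    """Hypothetical helper: True if the file's extension is in the assumed list."""
    # Compare case-insensitively so "survey.CSV" is caught as well.
    return Path(filename).suffix.lower() in DOUBLE_ZIP_EXTENSIONS
```

For example, [f for f in filenames if needs_double_zip(f)] would pick out the files in a deposit that should be double-zipped, leaving documents such as PDFs alone.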

Double-Zipping

Because Dataverse removes a single layer of zipping from these file types during the upload process, they must be double-zipped to maintain the integrity of the files and their data. The layer that remains after upload is enough to prevent Dataverse from changing the files' formatting or adding the Two Ravens 'Explorer' function. The upload to Dataverse (over HTTPS) protects your data in transit, and the Role Permissions applied to the dataset prevent unauthorised users from accessing your data while it is stored in the dataset, ensuring that it remains secure and protected at all times.
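The layering described above can be sketched with Python's standard-library zipfile module. This is a minimal illustration, not the ADA's procedure: it wraps a file in an inner zip, wraps that in an outer zip, and then lists what is left once one layer is removed, as a stand-in for what Dataverse does on ingest. The function names double_zip and files_after_one_unzip, and the inner archive name (the original filename plus .zip), are hypothetical.

```python
import io
import zipfile
from pathlib import Path


def double_zip(src: Path, dest: Path) -> Path:
    """Hypothetical helper: put `src` in an inner zip, then put that zip inside `dest`."""
    src = Path(src)
    inner_name = src.name + ".zip"  # ASSUMPTION: illustrative naming convention
    # First layer: build the inner archive in memory.
    inner_buf = io.BytesIO()
    with zipfile.ZipFile(inner_buf, "w", compression=zipfile.ZIP_DEFLATED) as inner:
        inner.write(src, arcname=src.name)
    # Second layer: store the inner archive as the single entry of the outer archive.
    with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_DEFLATED) as outer:
        outer.writestr(inner_name, inner_buf.getvalue())
    return Path(dest)


def files_after_one_unzip(archive: Path) -> list[str]:
    """Simulate removing one layer (as Dataverse does on ingest): list the outer entries."""
    with zipfile.ZipFile(archive) as outer:
        return outer.namelist()
```

Calling double_zip(Path("survey.csv"), Path("survey.zip")) produces an archive whose only entry is survey.csv.zip, so after the outer layer is removed, survey.csv is still inside a zip and keeps its original format.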

No Zipping

If a file in one of these formats is uploaded to your dataset with no zipping, or with only a single layer of zipping (which Dataverse removes), Dataverse ingests the file directly and adds the Two Ravens 'Explorer' button, which enables functionality that the ADA do not want available at present. If data files are uploaded without double-zipping, your deposit will not pass the ADA Quality Assurance checks, and the dataset will be returned to you for rectification along with a Processing Report detailing the changes required, delaying its publication.
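Before uploading, a quick check like the following Python sketch can catch files that have no zipping or only one layer. It is an illustration only, not the ADA's Quality Assurance check: it treats an archive as double-zipped when it is a zip file whose every entry is a .zip archive. The name check matters because Excel .xlsx files are themselves zip-based, so an .xlsx in a single zip would otherwise look double-zipped. The function name is_double_zipped is hypothetical.

```python
import io
import zipfile
from pathlib import Path


def is_double_zipped(path: Path) -> bool:
    """Hypothetical pre-upload check: True if every file entry in `path` is itself a zip."""
    if not zipfile.is_zipfile(path):
        return False  # no zipping at all
    with zipfile.ZipFile(path) as outer:
        members = [m for m in outer.infolist() if not m.is_dir()]
        if not members:
            return False  # empty archive: nothing is protected
        for member in members:
            # Require a .zip entry: .xlsx files are zip-based, so an Excel file
            # in a single zip would otherwise pass the content check below.
            if not member.filename.lower().endswith(".zip"):
                return False
            # Confirm the entry really is a zip archive, not just named like one.
            if not zipfile.is_zipfile(io.BytesIO(outer.read(member))):
                return False
    return True
```

Running this over each archive in a deposit before upload gives an early warning; the ADA's own checks remain the ones that decide whether a deposit passes.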

Files versus Folders

If a large number of files require double-zipping before upload, it is possible to upload them together as a single folder, which can then be downloaded from Dataverse. However, files uploaded inside a folder cannot be given File Tags or Description Notes, which makes them harder to search for and discover and may reduce their chances of reuse. Before uploading multiple files in a folder, contact the ADA to discuss your options. If the ADA approves and the folder contains any CSV, Stata, SPSS, SAS or Excel files, the folder must be uploaded as a double-zipped folder for the same reasons as above.
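For an approved folder upload, the same two layers can be built from a whole folder. This minimal Python sketch assumes the folder's contents should sit at the top level of the inner archive; it uses shutil.make_archive for the inner zip and zipfile for the outer one. The name double_zip_folder is hypothetical, and the sketch is not a substitute for the ADA's own instructions.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def double_zip_folder(folder: Path, dest: Path) -> Path:
    """Hypothetical helper: zip the contents of `folder`, then wrap that zip in `dest`."""
    folder = Path(folder)
    with tempfile.TemporaryDirectory() as tmp:
        # First layer: make_archive appends ".zip" to base_name and returns the path.
        # ASSUMPTION: folder contents go at the top level of the inner archive.
        inner_path = shutil.make_archive(
            str(Path(tmp) / folder.name), "zip", root_dir=str(folder)
        )
        # Second layer: store the inner archive as the single entry of the outer one.
        with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_DEFLATED) as outer:
            outer.write(inner_path, arcname=Path(inner_path).name)
    return Path(dest)
```

Keeping the inner archive in a temporary directory means only the finished double-zipped file is left behind, ready to upload.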

How to Double-Zip

For instructions on how to double-zip files and folders, refer to Instructions on how to Double-Zip files and folders.