File & Folder Naming Conventions
Revision as of 23:05, 26 August 2019
It is important that each version of the data and its supporting documentation is clearly identified using the correct file naming convention. This prevents erroneous access conditions from being applied to the published data and prevents potentially unauthorised access to this material. All data files and supporting documentation to be uploaded to Dataverse should therefore be named in accordance with the ADA guidance. This also makes the files easily identifiable to ADA staff and standardises the naming convention across datasets and Dataverses.
Dataverse Uploads
All data and their supporting documentation should be simple to locate and identify. Dataverse automatically lists uploaded files in numerical and alphabetical order. To list materials in a more meaningful order, the ADA has developed a standardised naming convention for files.
ADA Standard Naming Convention
The ADA standardised naming convention for Self-Deposit files uses the following format:
DataverseNumber_StudyName_Year_StudyArtefact_ADAID_FileExtension
Dataverse Number:
The number 0, 1 or 2, indicating the order in which the files are to be arranged. A “0” is applied to all licensing information (e.g. License Agreement Forms, License Terms and Conditions of Use, License Access Guestbooks), a “1” is applied to all other supporting documents (e.g. Questionnaires, Codebooks, Technical documents), and a “2” is applied to individual double-zipped data files (e.g. SPSS, Stata, SAS, CSV and Excel).
Due to compatibility issues with Dataverse, all individual data files and any supporting files that are in the following formats must be double-zipped prior to uploading to the dataset to preserve their formatting.
- SPSS
- SAS
- Stata
- CSV
- Excel
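The double-zipping step can be sketched in Python as below. This is a minimal illustration, not an ADA tool: it assumes "double-zipped" means a zip archive placed inside a second zip archive, and the function name `double_zip` and the intermediate file name are hypothetical.

```python
import zipfile
from pathlib import Path


def double_zip(source: Path, dest_dir: Path) -> Path:
    """Wrap `source` in a zip archive, then wrap that archive in a second zip.

    Assumption: "double-zipped" means a zip inside a zip.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    inner = dest_dir / (source.name + ".inner.zip")  # hypothetical temporary name
    outer = dest_dir / (source.stem + ".zip")

    # First pass: compress the original file.
    with zipfile.ZipFile(inner, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(source, arcname=source.name)

    # Second pass: store the inner archive inside the outer one.
    with zipfile.ZipFile(outer, "w", zipfile.ZIP_STORED) as archive:
        archive.write(inner, arcname=source.name + ".zip")

    inner.unlink()  # remove the intermediate archive
    return outer
```

The returned archive would still need to be renamed according to the naming convention before upload.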
Folders containing multiple files
In certain cases, a complete double-zipped folder of supporting documentation or data files may be produced (e.g. many longitudinal studies will have multiple files and it would be time consuming to upload each individually). Where packages of files are to be double-zipped and uploaded as a single folder, the folder should be annotated with ‘-Z’ immediately following the Dataverse Number. For example, a double-zipped collection of supporting documents uploaded as a folder would be identified with ‘1-Z’. The files within the folder should all be individually named using the naming convention. Since all individual data files are required to be double-zipped, there is no need to identify these separately with a ‘-Z’. The ‘-Z’ should be used purely to identify packages or folders of double-zipped files.
Files or Folders containing Sensitive or Personal Information
The suffix ‘-S’ is to be used to identify data files that contain Sensitive Information or Personal Information. Typically this is used to differentiate between data files that may be available for public release as Open Data and those that contain some form of information that requires access to be managed. Many Data Owners will choose to upload both an open version of their data and a version that requires some form of access restriction. The open version is likely to need far fewer safeguards when shared, so it requires less management and maximises the potential benefits of sharing the data.
Data Files used to create a Derived Dataset
Finally, for derived datasets (i.e. those made up from multiple sources of separate data), the suffixes ‘a’ through ‘z’ should be used to identify the individual data used in the creation of the derived data. Thus, if a new data file was created by linking data from the ATO and Medicare, the ATO data may have the Dataverse Number ‘2a’ whilst the Medicare data may have the identifier ‘2b-S’. The ‘-S’ denotes that the Medicare data is also Sensitive.
0. Licensing Information (License Agreement Form, License Terms and Conditions of Use, License Access Guestbook)
1. Supporting Documentation (Questionnaires, Codebooks, Technical documents etc.)
2. Data files - all are to be double-zipped prior to upload (SPSS, Stata, SAS, CSV and Excel file extensions)
2a. Data file used in creation of a derived data file (all are to be double-zipped prior to upload)
-S. The ‘S’ suffix when displayed after the number is used to denote that the Data file contains ‘Sensitive’ data
-Z. The ‘Z’ suffix when displayed after the number is used to denote that multiple files are contained in a double-zipped package
By way of example, the Dataverse Number ‘2-SZ’ would denote a data file or double-zipped package of data files, which contain sensitive data.
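The Dataverse Number rules above can be expressed as a small parser. The sketch below is illustrative only: the names `DataverseNumber` and `parse_dataverse_number` are hypothetical, and it assumes that the source letter applies only to data files (prefix ‘2’) and that a combined flag is always written ‘SZ’, since those are the only forms the guidance shows.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Digit, optional source letter, optional "-S", "-Z" or "-SZ".
_PATTERN = re.compile(r"(?P<order>[012])(?P<source>[a-z])?(?:-(?P<flags>SZ|S|Z))?")

CATEGORIES = {
    "0": "Licensing Information",
    "1": "Supporting Documentation",
    "2": "Data files",
}


@dataclass(frozen=True)
class DataverseNumber:
    category: str
    source_letter: Optional[str]  # 'a'..'z' for sources of a derived dataset
    sensitive: bool               # '-S' flag
    package: bool                 # '-Z' flag


def parse_dataverse_number(text: str) -> DataverseNumber:
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid Dataverse Number: {text!r}")
    order, source = match.group("order"), match.group("source")
    # Assumption: only data files ('2') carry a derived-source letter.
    if source is not None and order != "2":
        raise ValueError(f"source letter is only expected on data files: {text!r}")
    flags = match.group("flags") or ""
    return DataverseNumber(
        category=CATEGORIES[order],
        source_letter=source,
        sensitive="S" in flags,
        package="Z" in flags,
    )
```

For instance, `parse_dataverse_number("2b-S")` yields a data-file entry with source letter `b` that is flagged as sensitive but not as a package.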
Study Name:
This relates to the name of the Project or Study that the dataset(s) belong to. If the full name is unreasonably long this can be abbreviated. For example, ‘The Australian Longitudinal Study on Women’s Health’ is abbreviated to ‘ALSWH’.
Year:
Refers to the year in which the Project or Study was conducted. If the study spans multiple years, enter the period in question (for example, 2018-19).
Study Artefact:
This should refer to the specific item in question. For supporting documentation it could be the item itself (e.g. Questionnaire, Codebook or Report); for data it is typically the file type (e.g. SAS, SPSS or Stata Data File).
ADAID:
This refers to the five-digit ADA Identification number assigned to the Project or Study. For Self-Deposits it is unlikely that this number will have been allocated, although it may have been provided by the ADA Archivist when the ‘Shell Dataverse and dataset(s)’ were created. If it has not been provided, use ‘ADAID’ and the ADA Archivist will enter the correct identification details once the file has been added to the Submission Information Package (SIP).
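The fallback to the literal placeholder ‘ADAID’ can be sketched as follows. The helper name `format_adaid` is hypothetical, and padding shorter numbers with leading zeros is an assumption based on the five-digit requirement.

```python
from typing import Optional, Union


def format_adaid(adaid: Optional[Union[int, str]] = None) -> str:
    """Return the ADAID component of a file name.

    Uses the literal placeholder 'ADAID' when no number has been allocated.
    Assumption: allocated numbers are zero-padded to five digits.
    """
    if adaid is None:
        return "ADAID"
    digits = str(adaid).strip()
    if not (digits.isascii() and digits.isdigit()) or len(digits) > 5:
        raise ValueError(f"ADAID must be a number of at most five digits: {adaid!r}")
    return digits.zfill(5)
```

With this sketch, `format_adaid()` returns `'ADAID'` and `format_adaid(1212)` returns `'01212'`.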
File Extension:
A file extension (e.g. .pdf, .zip and .xlsx) must be present for every file contained and listed within a dataset.
A full example of a correct file name adhering to the above naming conventions is: ‘1_ANUPoll_2018_Questionnaire_01212_.pdf’.
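Putting the components together, a file name can be assembled and checked with a short script. This is a sketch under stated assumptions, not an official ADA validator: the names `FILE_NAME` and `build_file_name` are hypothetical, components are assumed to contain no underscores, and the underscore before the extension follows the example above as written.

```python
import re

# DataverseNumber_StudyName_Year_StudyArtefact_ADAID_FileExtension
FILE_NAME = re.compile(
    r"(?P<number>[012][a-z]?(?:-(?:SZ|S|Z))?)_"
    r"(?P<study>[^_]+)_"
    r"(?P<year>\d{4}(?:-\d{2})?)_"   # e.g. 2018 or 2018-19
    r"(?P<artefact>[^_]+)_"
    r"(?P<adaid>\d{5}|ADAID)_"       # five digits, or the placeholder
    r"(?P<extension>\.[A-Za-z0-9]+)"
)


def build_file_name(number: str, study: str, year: str, artefact: str,
                    extension: str, adaid: str = "ADAID") -> str:
    """Join the components and reject any result that breaks the pattern."""
    name = "_".join([number, study, year, artefact, adaid]) + "_" + extension
    if FILE_NAME.fullmatch(name) is None:
        raise ValueError(f"file name does not follow the convention: {name!r}")
    return name


print(build_file_name("1", "ANUPoll", "2018", "Questionnaire", ".pdf", adaid="01212"))
# prints 1_ANUPoll_2018_Questionnaire_01212_.pdf
```

Validating the assembled name, rather than each component separately, keeps the check in one place and also lets `FILE_NAME` be reused to inspect existing file names.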