4. Population of 'Shell Dataverse & Dataset'

Post-Uploading Activity

There are a number of Non-mandatory Metadata fields to be populated by the Data Owner where the information is both known and relevant. Remember, the richer the metadata entered here, the easier it is to find your work and therefore the greater the chance that it will be re-used and cited. Good metadata supports FAIR data, that is, data that is Findable, Accessible, Interoperable and Reusable.

Any of these fields may require updating after ADA Staff have completed their quality assurance checks of the dataset. Where required, ADA Staff will identify issues through the Processing Report. Where changes are required, you may be asked to make them directly in the dataset, but in some cases ADA Staff may make the change with your approval if it is easier to do so.

Metadata schema

Many Social Science fields have their own preferred metadata schemas, and these can vary from discipline to discipline. Although the ADA has traditionally used the Australian Public Affairs Information Service (APAIS) Thesaurus for annotating keywords, its terms are no longer updated, so you may wish to use another vocabulary that is more current and/or better suited to your field. The RDA Metadata Standards Catalogue can assist you in selecting an appropriate schema.

ADA Dataverse DEPOSIT Non-Mandatory Fields

Although the fields below are not mandatory, nor required to generate the DOI for the dataset(s), richer and more accurate metadata makes the dataset easier to locate through the Dataverse search function or through the metadata harvested by the ARDC. Rich metadata also allows secondary users to understand the content without necessarily having to review the entire dataset, which makes reuse and subsequent citation of the data far more likely. In addition, the Portage Network offers guidance on the general citation block for Dataverse, including a worked example. This information can be found in the Portage Training Resources area under the Metadata Guidance option.

The Non-mandatory fields consist of:

Subtitle: This is a secondary title used to amplify or state certain limitations on the main title.

Alternative Title: A title by which the work is commonly referred to, or an abbreviation of the title. For example, the Australian Election Study is commonly referred to as the AES, and the National Drug Strategy Household Survey Database is also referred to as the NDSHS.

Alternative URL: Some depositors may also have personal or project websites where the dataset or other information relating to the dataset may be found. This field allows you to add the URL of that location so that users can navigate quickly to the desired site. For example, the AES can also be found at https://australianelectionstudy.org.

Other ID: This field should be populated with the words "Australian Data Archive" in the Agency field and the unique ADAID Number (generated by ADAPT) in the Identifier field. If the ADAID is not known at the time of populating the metadata, it will be entered by the ADA Staff prior to publishing when the Submission Information Package (SIP) is deposited to the archive file structure.

Keyword Fields: Keywords are the terms that are used to describe the most important aspects of the dataset. They consist of the Term, Vocabulary and Vocabulary URL fields. Keywords should conform to a Controlled Vocabulary or Schema and should be referenced accordingly. Historically, ADA users have used the Australian Public Affairs Information Service (APAIS) Thesaurus. As this is no longer updated, the alternative sources listed below should also be considered, in addition to those listed in the RDA Metadata Standards Catalogue.

Commonly used alternatives include the Australian and New Zealand Standard Research Classification (ANZSRC) codes. These codes are currently under review by the Australian Bureau of Statistics (ABS), but they are also used by the Australian Research Data Commons (ARDC) for harvesting purposes (see Topic Classification Fields below), which expands the pool of researchers to whom your data will be visible. Other suitable and commonly used vocabularies include the European Language Social Science Thesaurus (ELSST), the Data Documentation Initiative (DDI) controlled vocabularies and the Consortium of European Social Science Data Archives (CESSDA) controlled vocabulary.

Additional resources are available to Data Owners when populating the dataset metadata. This guidance can be found in the Australian National Data Service (ANDS) guides and resources under Vocabularies and Research Data. Each additional keyword can be added through the use of the ‘Add’ button.

For each Keyword Field you should enter the following information:

- Term: The key terms that describe important aspects of the dataset. These can be used for building keyword indexes and/or for classification and retrieval purposes. The Term should be listed from within a Controlled Vocabulary where possible.

- Vocabulary: Specify in this field the keyword Controlled Vocabulary in use for the Term, such as APAIS, ELSST, LCSH or MeSH.

- Vocabulary URL: This field allows you to point to the web presence that describes the keyword Controlled Vocabulary, if appropriate. Enter an absolute URL where the keyword Controlled Vocabulary web site is available, such as https://www.nlm.nih.gov/mesh/meshhome.html.

An example of completed Keyword Information is provided below.
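As a rough illustration only, a Keyword Field entry can be thought of as a small record with three parts. The Python sketch below is not part of Dataverse or any ADA tooling: the class name `KeywordEntry`, its attribute names and the `problems` helper are all hypothetical, and the Term shown is purely illustrative. It simply checks an entry against the guidance above (a Term is present, a Vocabulary is named, and any Vocabulary URL is absolute).

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse


@dataclass(frozen=True)
class KeywordEntry:
    """Hypothetical record for one Keyword Field entry."""
    term: str
    vocabulary: str = ""
    vocabulary_url: str = ""

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means none were found."""
        issues = []
        if not self.term.strip():
            issues.append("Term is empty")
        if not self.vocabulary.strip():
            issues.append("Vocabulary is not specified")
        if self.vocabulary_url:
            parsed = urlparse(self.vocabulary_url)
            # The guidance asks for an absolute URL, so require a scheme and host.
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                issues.append("Vocabulary URL is not an absolute URL")
        return issues


# Illustrative entry only; the Term is an assumption, not an ADA example.
example = KeywordEntry(
    term="Politics",
    vocabulary="MeSH",
    vocabulary_url="https://www.nlm.nih.gov/mesh/meshhome.html",
)
print(example.problems())
```

The check is deliberately shallow: it does not confirm that the Term actually exists in the named Controlled Vocabulary, which still needs to be verified by the Data Owner.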

Topic Classification Fields: The Topic Classification Fields are very similar to the Keyword Fields introduced above. However, these are used to harvest metadata to the Australian Research Data Commons (ARDC) and therefore expand the potential exposure of your dataset to other researchers. The ANZSRC Field of Research (FoR) Codes are the chosen means of transmitting this information. The correct entry of the FoR Codes will be checked by ADA Staff during the deposit review. No other information should be entered into these fields.

- Term: This field should be populated with a four or six-digit ANZSRC code followed by the corresponding Description. For example 160601 – Australian Government and Politics.

- Vocabulary: This field should be populated with the words “ANZSRC FoR Code”.

- Vocabulary URL: The URL to the appropriate ANZSRC FoR Codes should be entered here. For example, the corresponding Vocabulary URL for the Term 160601 listed above is under Division 16 Studies in Human Society: https://www.abs.gov.au/Ausstats/abs@.nsf/Latestproducts/714A4097B142E9BDCA2574180004B0A1?opendocument

As a minimum use the top level FoR Code for the ANZSRC at: https://www.abs.gov.au/Ausstats/abs@.nsf/Latestproducts/4AE1B46AE2048A28CA25741800044242?opendocument

The ABS-hosted ANZSRC FoR Codes can be found at: https://www.abs.gov.au/Ausstats/abs@.nsf/Latestproducts/6BB427AB9696C225CA2574180004463E?opendocument

FoR Codes EXAMPLE - classification ranges from broad (2 digit) to narrow (6 digit)

2 digit - FoR code = 16 DIVISION 16 STUDIES IN HUMAN SOCIETY http://purl.org/au-research/vocabulary/anzsrc-for/2008/16

4 digit - FoR code = 1608 SOCIOLOGY http://purl.org/au-research/vocabulary/anzsrc-for/2008/1608

6 digit - FoR code = 160809 Sociology of Education http://purl.org/au-research/vocabulary/anzsrc-for/2008/160809

EXAMPLE of completing the ‘Topic Classification’ DDI Metadata field:

Term: Studies in Human Society
Vocabulary: ANZSRC FoR
Vocabulary URL: http://purl.org/au-research/vocabulary/anzsrc-for/2008/16 (PURL)
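The relationship between the 2, 4 and 6 digit codes and their PURLs, as shown in the examples above, can be sketched in a few lines of Python. This is a minimal sketch that assumes the 2008 PURL pattern used in those examples applies to every code; the function names `for_code_levels` and `for_code_purl` are hypothetical. It checks only the shape of the code, not whether the code exists in the ANZSRC classification.

```python
import re
from typing import List

# Base taken from the PURL examples above (2008 ANZSRC FoR vocabulary).
PURL_BASE = "http://purl.org/au-research/vocabulary/anzsrc-for/2008/"


def for_code_levels(code: str) -> List[str]:
    """Return the broad-to-narrow chain of codes, e.g. 16, 1608, 160809."""
    if not re.fullmatch(r"\d{2}|\d{4}|\d{6}", code):
        raise ValueError("FoR code must be 2, 4 or 6 digits: %r" % code)
    return [code[:n] for n in (2, 4, 6) if n <= len(code)]


def for_code_purl(code: str) -> str:
    """Build the PURL for a FoR code, assuming the 2008 pattern above."""
    for_code_levels(code)  # reuse the format check
    return PURL_BASE + code


for level in for_code_levels("160809"):
    print(level, for_code_purl(level))
```

Run against 160809, this lists the Division (16), Group (1608) and Field (160809) codes with the same three PURLs given in the FoR Codes example.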

Related Publication: Where publications exist that use the data from the dataset, they can be referenced here. More than one related publication may be added through the use of the ‘Add’ button. This does not include related datasets; a separate field exists for linking these.

- ID Type: The drop down selection gives you a choice of the most common types of digital identifier used for publications, for example DOI.

- Citation: The full bibliographic citation for the related publication should be entered here.

- ID Number: The identifier for the selected ID type should be added.

- URL: A link to the publication web page, such as a journal article page or archive record page should be provided here.

Notes: Any important additional information about the dataset should be entered here. This may include a brief summary of any changes between dataset versions if this is an update to a previous deposit. Of particular importance here is information pertaining to the Copyright and the Ethics approvals. As this field shows up in the Header Box for the Dataset, it is restricted to important information only.

The copyright statement should be entered in the following format within tags so that it can be migrated to a DDI field: “Copyright © Year of first publication, name of copyright owner. All rights reserved.”
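The wording of the statement can be assembled consistently with a small helper such as the Python sketch below. The function name `copyright_statement` is hypothetical and the owner shown is a placeholder. The sketch produces only the statement text in the format quoted above; the surrounding tags are not reproduced here and must still be added as instructed.

```python
def copyright_statement(year_first_published: int, owner: str) -> str:
    """Format the copyright statement text (without the surrounding tags)."""
    # Drop surrounding spaces and any trailing full stop to avoid "..".
    owner = owner.strip().rstrip(".")
    if not owner:
        raise ValueError("The name of the copyright owner is required")
    return "Copyright © %d, %s. All rights reserved." % (year_first_published, owner)


# Placeholder owner and year, for illustration only.
print(copyright_statement(2019, "Jane Citizen"))
```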

The copyright remains with the person or organisation responsible for the study, even after it is deposited with the ADA. By signing the License Agreement Form, the copyright owner grants the ADA a non-exclusive licence to make copies of and redistribute the data.

An example of the text string with tags that should be entered is provided below:

Language: The primary language of the dataset should be entered here. For images or datasets that do not contain written language then the language used in the descriptive text offered as Supporting Documents should be entered. For the majority of depositors this will be English.

Producer: A series of fields is available for the entry of producer details. The ADA does not routinely use these fields, as specific details are added under the Contributor Fields below. In the event that the dataset is being populated by a third party on behalf of the Data Owner, the details of the person or organisation entering the metadata can be entered here.

- Name: Dataset Producer Name

- Affiliation: The Organisation with which the dataset producer is affiliated.

- Abbreviation: The abbreviation by which the producer is commonly known (for example The ANU or ICPSR).

- URL: The Producer URL points to the dataset producer’s web presence, if appropriate.

- Logo URL: Where applicable, this field should be populated with the URL that takes you to the dataset producer’s web-accessible logo.

Production Date: This date should reflect the date of the data collection or when the materials were produced (not distributed, published or archived).

Production Place: This should identify the location where the data collection and any related materials were produced.

Contributor: These fields provide a location for the entry of any significant contributors or collaborators who assisted in the creation of the material. The details may identify an organization or person responsible for either collecting, managing, or otherwise contributing in some way to the development of the dataset. Multiple contributors can be added through the use of the ‘Add’ button.

- Type: The Type of contributor can be selected from the drop down menu as appropriate for their role.

- Name: Enter the family, given or organisational name of the contributor.

Grant Information: Where the research has been funded through the provision of a grant, the details should be entered in the appropriate fields.

- Grant Agency: The name of the agency who provided the grant or contract for the research is to be entered here, for example the Australian Research Council (ARC).

- Grant Number: The grant or contract number of the project that sponsored the research should be entered here.

Distributor: As the organisation designated by the producer or Data Owner to disseminate the data under the License Agreement, the Australian Data Archive will be the default distributor for all ADA Dataverse datasets.

- Distributor Name: Should always be entered as “The Australian Data Archive” or “The Australian Business Data Archive” as appropriate.

- Affiliation: Should always be entered as “The Australian National University (The ANU)”.

- Abbreviation: Should always be entered as “The ADA”.

- URL: Should always be https://ada.edu.au

- Logo URL: This field should be populated with the URL that takes you to the distributor’s web-accessible logo. For the ADA, use: https://ada.edu.au/wp-content/uploads/2020/11/ADA-PRIMARY-INLINE-300.jpg
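Because the Distributor values are fixed, they lend themselves to a simple consistency check before a dataset is submitted for review. The Python sketch below is an assumption about how such a check might look rather than an ADA tool: the dictionary keys mirror the field labels above, and the names `ADA_DISTRIBUTOR`, `ALLOWED_NAMES` and `distributor_mismatches` are hypothetical.

```python
from typing import Dict, Tuple

# Expected values copied from the Distributor guidance above.
ADA_DISTRIBUTOR = {
    "Distributor Name": "The Australian Data Archive",
    "Affiliation": "The Australian National University (The ANU)",
    "Abbreviation": "The ADA",
    "URL": "https://ada.edu.au",
    "Logo URL": "https://ada.edu.au/wp-content/uploads/2020/11/ADA-PRIMARY-INLINE-300.jpg",
}

# Either distributor name is acceptable, as appropriate to the dataset.
ALLOWED_NAMES = {
    "The Australian Data Archive",
    "The Australian Business Data Archive",
}


def distributor_mismatches(entry: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {field: (entered, expected)} for each field that differs."""
    mismatches = {}
    for field, expected in ADA_DISTRIBUTOR.items():
        entered = entry.get(field, "")
        if field == "Distributor Name":
            matches = entered in ALLOWED_NAMES
        else:
            matches = entered == expected
        if not matches:
            mismatches[field] = (entered, expected)
    return mismatches


draft = dict(ADA_DISTRIBUTOR, Abbreviation="ADA")
print(distributor_mismatches(draft))
```

In this example the draft entry uses “ADA” rather than “The ADA”, so the Abbreviation field is the only one reported.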

Distribution Date: This date should reflect the date that the data was made available for distribution via Dataverse by the ADA, otherwise known as the published date. Note: There will be a delay between this date and the Deposit Date, due to the work required by ADA Staff to check, and where required, to rectify with the depositor, any issues recorded. This information will be added by ADA Staff as part of their publishing activity.

Depositor: The person (Family Name, Given Name) or the name of the organisation that deposited this dataset to the repository.

Deposit Date: This date should reflect the date that the data was made available to the ADA for review. For Self-Deposits, this should be the date that the depositor notified the ADA Staff that the data was ready for review.

Time Period Covered: The time period to which the data refers, also known as the span of the data. This is not the date of coding or of making the documents machine-readable. Appropriate start and end dates should be entered for the data.

Date of Collection: These fields correspond to the start and end dates of the data collection period.

Kind of Data: This field should describe the type of data that is included within the file(s). Common examples include: Survey Data, Census or Enumeration Data, Clinical Data, Aggregate Data, Event or Transaction Data, Program Source Code, Machine-Readable Text, Administrative Records Data, Experimental Data, Psychological Test, Textual Data, Coded Textual Data, Coded Documents, Time Budget Diaries, Observation Data and Ratings, and Process-Produced Data. Where multiple data types are present, the ‘Add’ button should be used to enter additional values.

Series: If the dataset is part of a series of data, for example a longitudinal study such as the National Drug Strategy Household Survey, then the Name of the Series and any additional information such as a history of the series and a summary of those features that apply to the series as a whole should be entered.

Software: Information about the software and the version used to generate the dataset should be entered here, for example SPSS Version 25.0. This information is important because it may help ADA Staff to solve problems when issues with datasets are discovered. Knowing the type and version of the software used to create the original dataset may help staff to identify the root cause of an issue sooner, particularly where there are known functionality losses or issues between versions.

Related Material: Enter details of any relevant material related to this dataset other than publications that have been entered previously. This should include details of material that has been uploaded as Supporting Information to the dataset.

Related Datasets: Where other datasets exist that are related to this Dataset, such as previous research on the subject, or previous polls if part of a series, their details should be entered here.

Other References: This field allows other references that would serve as background to the dataset, and that have not been listed elsewhere in the Citation Metadata, to be captured and recorded.

Data Sources: Data sources should include lists of books, articles, serials, journals, or machine-readable data files that served as the sources of the data collection. This and the source fields listed below will all help to support users of the dataset in locating the materials and accessing them appropriately.

Origin of Sources: For historical materials, information about the origins of the sources and the rules followed in establishing the source should be specified.

Characteristic of Sources Noted: An assessment of any characteristics associated with the sources should be entered here.

Documentation and Access to Sources: The level of documentation of the original sources and the access to them should be identified here.

ADA Geospatial Metadata fields

Specific Guidance regarding what information is required under the Geospatial Metadata Fields can be accessed through the page:

Geospatial Metadata field information

ADA Social Science and Humanities Metadata fields

Specific Guidance regarding what information is required under the Social Science and Humanities Metadata Fields can be accessed through the page:

Social Science and Humanities Metadata field information
