Glossary of Terms
Introduction
A number of different terms are used throughout Australia and the world to describe similar processes. For example, de-identification, anonymisation and confidentialisation are often used interchangeably, but each has a subtly different meaning. Given these different uses of terminology, it is critical to define the terms clearly so that they are understood by all parties. This Glossary of Terms aims to capture the ADA definitions for specific terms and should be referred to where any doubt exists.
Terms
Attribute Disclosure: This is the process of associating a particular piece of previously unknown data with a particular population unit (person, household, business or other entity). In essence, it means that something new is learned about the population unit. Attribute disclosure often follows re-identification; however, it can also occur without re-identification.
Auxiliary Information: Information, usually in the form of a dataset, that is available to others but is not contained within the target dataset. This information can be used to aid in the re-identification process. This could be anything from another ADA dataset to publicly available information.
Confidentialisation: Protecting the secrecy and privacy of information collected from individuals and organisations through removal of Direct Identifiers from the data, and the assessment and management of the Disclosure Risk of indirect identification occurring in the data. For the ADA, this term encapsulates the treatment or curation of the data, and the protections put in place through the application of the remaining Data Sharing Principles.
Confidential Data: This is data given in confidence or data that has been agreed to be kept confidential, i.e. a secret between two or more parties, that is not in the public domain. Examples are information on businesses, income, health, medical details and political opinion. These variables will need to be protected in some manner so as to maintain the confidentiality arrangement between the parties.
Data Custodian: The named individual or organisation responsible for the storage, management and release of data, and accountable and responsible for the governance of the data in their possession. The ADA acts as the Data Custodian for the Data Owners that deposit their data in the ADA Dataverse. Data Custodians have legal and ethical obligations to keep the information they are entrusted with safe.
Data Protections: Changes made to data to minimise the likelihood of re-identification of the Data Subject. These protections include, but are not limited to: minimisation, aggregation, removal of direct identifiers, modification or change of values, and suppression of individual records. Although not a direct change to the data, encryption can also be viewed as a form of Data Protection.
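As a rough illustration of three of the protections listed above (removal of direct identifiers, aggregation, and suppression of individual records), the following Python sketch treats a small set of invented unit records. The field names, the ten-year age bands and the minimum group size of 2 are assumptions chosen for the example; they are not ADA rules.

```python
from collections import Counter

# Hypothetical unit records; all field names and values are invented.
records = [
    {"name": "A. Citizen", "age": 34, "postcode": "2600", "income": 72000},
    {"name": "B. Resident", "age": 37, "postcode": "2600", "income": 65000},
    {"name": "C. Person", "age": 81, "postcode": "2612", "income": 30000},
]

DIRECT_IDENTIFIERS = {"name"}  # assumption: fields treated as direct identifiers
MIN_GROUP_SIZE = 2             # assumption: illustrative threshold only


def age_band(age, width=10):
    """Aggregate an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def protect(rows):
    """Apply removal, aggregation and suppression to a list of records."""
    treated = []
    for row in rows:
        # Removal of direct identifiers.
        kept = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        # Aggregation: replace the exact age with an age band.
        kept["age"] = age_band(kept["age"])
        treated.append(kept)
    # Suppression: drop records whose (age band, postcode) combination is rare.
    counts = Counter((r["age"], r["postcode"]) for r in treated)
    return [r for r in treated if counts[(r["age"], r["postcode"])] >= MIN_GROUP_SIZE]


protected = protect(records)
```

In this sketch the third record is suppressed because no other record shares its age band and postcode, while the first two records are kept with their names removed and ages banded.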
Data Owner: The individual, business, or other entity that collects and/or generates the data from a Data Subject for statistical and administrative purposes, and that holds the right to use and publish that data. Note – this is not an exclusive right to use the data, as the License Agreement with the ADA will allow secondary use of the data to those who are permitted access to the data.
Data Release (also see Open Data): Making data publicly available for anybody to use, with no or few restrictions on who may access the data and what they may do with it. Data Release is typically associated with ADA Open and Recorded Open Access Categories.
Data Sharing or Access: The provision of access to data in a controlled manner to another person or organisation under agreed conditions laid out in the License document suite. Data Sharing is typically associated with ADA Managed, Facilitated and Non-Standard Access Categories.
Data Subject: An individual, household, business or other entity that supplies data to another person or organisation, or that has data about them supplied by a third party to another person or organisation.
De-Identification: A process that involves removing Direct Identifiers from the data, followed by one or both of the following steps:
- the removal or alteration of other information that could potentially be used to re-identify an individual, and/or
- the use of controls and safeguards in the data access environment to prevent re-identification.
The result is that there is no reasonable likelihood of re-identification.
Disclosure Risk: The combination of likelihood and consequence that information about an individual, organisation or other entity is revealed or provided to an unauthorised person, organisation or entity. Disclosure typically occurs in one of two forms: re-identification or attribute disclosure.
Direct Identifier: Information which, by itself, is able to uniquely identify an individual, organisation or other entity. Examples of direct identifiers include but are not limited to name, address, latitude/longitude, driver’s license number and Australian Business Number (ABN).
Encryption: The process of encoding a message or information in such a way that only authorised parties can access it. Encryption does not itself prevent interference or re-identification, but does deny access to the content. As such, it is considered to be a form of Data Protection but in the context of the Five Safes it can also be considered to be a ‘Safe Setting’ or ‘Trusted User’ element.
Indirect Identifier: Information that can be used to identify an individual, organisation or other entity with a high probability, either alone or together with other indirect identifiers, and in combination with auxiliary information.
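The point that indirect identifiers may single out a record only when taken together can be sketched in Python as follows. The records and field names are invented, and counting value combinations held by exactly one record is only one simple way to illustrate the idea; it is not a description of how the ADA assesses disclosure risk.

```python
from collections import Counter

# Hypothetical records with direct identifiers already removed.
rows = [
    {"age_band": "30-39", "sex": "F", "postcode": "2600"},
    {"age_band": "30-39", "sex": "F", "postcode": "2600"},
    {"age_band": "30-39", "sex": "M", "postcode": "2600"},
    {"age_band": "80-89", "sex": "M", "postcode": "2612"},
    {"age_band": "80-89", "sex": "F", "postcode": "2612"},
]


def unique_combinations(rows, fields):
    """Return the value combinations of `fields` held by exactly one record."""
    counts = Counter(tuple(r[f] for f in fields) for r in rows)
    return [combo for combo, n in counts.items() if n == 1]


# No single field isolates a record on its own...
alone = unique_combinations(rows, ["sex"])
# ...but the three fields together isolate some records.
together = unique_combinations(rows, ["age_band", "sex", "postcode"])
```

Here each field on its own is shared by at least two records, yet three of the five records have a combination of age band, sex and postcode that no other record shares.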
Key Variable: A variable that is common to two or more datasets, which may be used to link records between the two. In the context of disclosure risk, a key variable is normally an indirect identifier common to both the target dataset and some other auxiliary information. The values in each when linked could lead to attribute disclosure or re-identification.
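A minimal Python sketch of linking on key variables is given below. Both datasets, the field names and the choice of age band and postcode as key variables are hypothetical; the example only shows how shared values could connect a named auxiliary record to a record in a target dataset.

```python
# Hypothetical target dataset (no names) and auxiliary dataset (with names).
target = [
    {"age_band": "30-39", "postcode": "2600", "diagnosis": "asthma"},
    {"age_band": "80-89", "postcode": "2612", "diagnosis": "diabetes"},
    {"age_band": "30-39", "postcode": "2600", "diagnosis": "none"},
]
auxiliary = [
    {"name": "C. Person", "age_band": "80-89", "postcode": "2612"},
    {"name": "A. Citizen", "age_band": "30-39", "postcode": "2600"},
]

# Assumption: these indirect identifiers appear in both datasets.
KEY_VARIABLES = ("age_band", "postcode")


def link(target, auxiliary, keys):
    """Map each auxiliary name to the target records sharing its key values."""
    index = {}
    for row in target:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    return {
        aux["name"]: index.get(tuple(aux[k] for k in keys), [])
        for aux in auxiliary
    }


matches = link(target, auxiliary, KEY_VARIABLES)
# A name that matches exactly one target record is a candidate re-identification.
unique_links = {name: rows[0] for name, rows in matches.items() if len(rows) == 1}
```

In this sketch "C. Person" links to a single target record, so the diagnosis in that record would be disclosed, whereas "A. Citizen" matches two records and cannot be linked with certainty on these key variables alone.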
License document suite (also known as Data Sharing Agreement): A series of forms that formally documents the agreements between participating parties. This includes agreements between the ADA, as the Data Custodian, and the Data Owner for the storage and release of their data, and between the ADA and the data user for the management of access and the conditions associated with the re-use of the data. The agreement consists of contractual and ethical obligations, as well as penalties for improper disclosure or use of information. For the ADA, the License document suite consists of the License Agreement Form, the License Terms and Conditions of Use, and the License Access Guestbook.
Microdata: These are datasets of unit records, where each record contains information about a Data Subject (i.e. a person or organisation). This can include individual responses to questions on surveys or administrative forms and each dataset may contain hundreds or even thousands of pieces of information.
Open Data (see also Data Release): Data that is released with no or few access restrictions (excluding possible copyright or licensing requirements), usually through publication on the internet. In Data Sharing Principle terms, the only control applied is to the data. The ADA has both Open Access data (with no access restrictions) and Recorded Open Access data (where user details are collected for statistical purposes).
Particularly Sensitive Data: Any data where unauthorised disclosure would likely lead to adverse consequences for the individual, organisation or Australia in general. Data which is of a personal, legal, commercial, security or environmental nature may be considered particularly sensitive. This is broader than the Privacy Act 1988 definition of Sensitive Information, which is defined as a subset of Personal Information and is subject to limits on how it can be collected and used.
Personal Information: Information or an opinion about an identified individual, or an individual who is reasonably identifiable:
- a. Whether the information or opinion is true or not true; and
- b. Whether the information or opinion is recorded in a material form or not.
This might include information such as a person’s name and address, medical records, bank account details, photograph, videos, where they work and even what they like. Under the Privacy Act 1988, this term can only refer to living individuals. For the ADA, the assumption is that all persons are still living, unless the information is of such an age that this is impossible. For example, if the information is from a poll conducted in 1819, a participant would have been at least 200 years old in 2019.
Reasonably Identifiable: An individual will be considered to be reasonably identifiable within a dataset for the purposes of the definition of Personal Information where:
- a. It is technically possible for re-identification to occur (whether from the information in the dataset itself, or in combination with other information that may be available); and
- b. There is a reasonable likelihood that this might occur.
Re-Identification: The discovery of the identity of an individual, organisation or entity in an apparently de-identified dataset, whether through a targeted attack or unintentionally, using publicly or privately held information about that individual, organisation or entity.
Response Knowledge: The knowledge that a population unit is included within a dataset. This could be through private knowledge (e.g. a friend or work colleague has mentioned that they responded to a particular survey), or it could be through simple knowledge that a particular population unit is a member of the population and the data is a full dataset for that population (e.g. a census). For the purposes of clarity, a population unit is any one member (unit) of a set of items (population) that is being studied. This can relate to a person, entity or organisation.
Responsible Officer: A senior person in an organisation, project or team, or their authorised representative, who has the legal authority to agree to conditions of shared use and access on behalf of the organisation, project or team. This individual is commonly also the Data Owner or, for a research project, the nominated Primary Investigator.
Sensitive Information: A specific sub-set of Personal Information under the Privacy Act 1988 that includes:
- a. Information or an opinion about an individual’s
- (i) Racial or ethnic origin; or
- (ii) Political opinions; or
- (iii) Membership of a political association; or
- (iv) Religious beliefs or affiliations; or
- (v) Philosophical beliefs; or
- (vi) Membership of a professional or trade association; or
- (vii) Membership of a trade union; or
- (viii) Sexual orientation or practices; or
- (ix) Criminal record;
- b. Health information about an individual; or
- c. Genetic information about an individual that is not otherwise health information; or
- d. Biometric information that is to be used for the purpose of automated biometric verification or biometric identification; or
- e. Biometric templates.