DSS Linkage

From ADA Public Wiki

What is data linkage in this context?

The Department of Social Services (DSS) defines the integration of data for statistical or research purposes using the following definition from the Office of the Australian Information Commissioner:

‘Data integration’ refers to the bringing together of multiple datasets, to provide a new dataset (usually for statistical or research purposes). Data integration refers to the full range of practices around the process, including data transfer, linking and merging the data and dissemination. ‘Data linking’ is an element of data integration, which is the process of creating links between data from different sources based on common features present in those sources.

What counts as combining or linking data?

If you answered yes, your research likely involves data integration. Data integration covers any situation where you bring the dataset you are applying for together with another source of information, at any stage of your research.

This includes:

  • Record linkage — matching individual records across datasets using shared identifiers (such as names, addresses, dates of birth, ID numbers, or postcodes) to create a joined dataset.
  • Merging datasets — appending or joining datasets so that variables from multiple sources appear together in a single analytical file, even if you are not matching at the individual level.
  • Contextual or area-level enrichment — attaching geographic, census, or aggregate data to records in this dataset (for example, linking postcode-level deprivation scores).
  • Sequential use — using outputs, scores, or derived variables from another dataset as inputs to your analysis of this one.
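
As a rough illustration of the forms listed above, the following Python sketch joins a toy extract to a second source by a shared identifier (record linkage) and then attaches a postcode-level score (area-level enrichment). All field names, values, and helper functions here are hypothetical and are not drawn from any DSS dataset or tool; the point is only that both steps leave variables in the analysis file that came from outside the original dataset.

```python
# Hypothetical toy data: field names and values are invented for illustration.
applied_dataset = [
    {"person_id": "A1", "postcode": "2600", "payment_type": "X"},
    {"person_id": "A2", "postcode": "2913", "payment_type": "Y"},
]
other_survey = {
    "A1": {"self_rated_health": "good"},
    "A2": {"self_rated_health": "fair"},
}
area_scores = {"2600": 9, "2913": 6}  # made-up postcode-level index values


def link_records(records, lookup, key):
    """Record linkage: attach fields from `lookup` where the shared identifier matches."""
    return [{**row, **lookup.get(row[key], {})} for row in records]


def enrich_by_area(records, scores, key, new_field):
    """Area-level enrichment: attach an aggregate value looked up by geography."""
    return [{**row, new_field: scores.get(row[key])} for row in records]


linked = link_records(applied_dataset, other_survey, key="person_id")
enriched = enrich_by_area(linked, area_scores, key="postcode", new_field="area_score")

# Variables that did not exist in the applied-for dataset.
original_fields = set(applied_dataset[0])
external_fields = set(enriched[0]) - original_fields
print(sorted(external_fields))  # ['area_score', 'self_rated_health']
```

If `external_fields` is non-empty, at least one variable in the analysis file comes from another source, which is the situation this question is asking about.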

If you are unsure, ask yourself: "Will any variable in my analysis come from a source other than this dataset?" If yes, you are likely integrating data and should answer yes.

Why we ask

Combining datasets can increase the risk that individuals become identifiable, even if each dataset appears anonymous on its own. Telling us about planned integration allows us to assess this re-identification risk, ensure your data sharing agreements cover all sources involved, and confirm that the combination is proportionate to your research aims.
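
To make the re-identification point concrete, here is a minimal Python sketch using invented records. It counts the smallest group of people who share the same combination of attributes before and after a second source is joined. This is a simple k-anonymity-style check chosen for illustration; it is an assumption of this example, not a description of how DSS assesses risk.

```python
from collections import Counter

# Hypothetical, made-up records: no real data is represented here.
dataset_a = [
    {"id": 1, "postcode": "2600", "birth_year": 1980},
    {"id": 2, "postcode": "2600", "birth_year": 1980},
    {"id": 3, "postcode": "2913", "birth_year": 1975},
    {"id": 4, "postcode": "2913", "birth_year": 1975},
]
dataset_b = {
    1: {"occupation": "nurse"},
    2: {"occupation": "teacher"},
    3: {"occupation": "nurse"},
    4: {"occupation": "nurse"},
}


def smallest_group(records, fields):
    """Size of the smallest group of records sharing the same values for `fields`."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return min(counts.values())


# Join the second source onto the first by the shared id.
combined = [{**r, **dataset_b[r["id"]]} for r in dataset_a]

before = smallest_group(dataset_a, ["postcode", "birth_year"])
after = smallest_group(combined, ["postcode", "birth_year", "occupation"])
print(before, after)  # 2 1
```

In the first dataset every person shares their attribute combination with one other person. Once occupation is added, two people become unique on the combined attributes, even though neither source singled them out on its own.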