What is Data Linkage?
Data linkage is a method of bringing together information about the same person or entity from different sources to create a new, richer dataset. Linking information from disparate sources enables the construction of chronological sequences of events and, when used at the macro level, provides valuable information for policy and research into the health and wellbeing of the population.
Data linkage is done by assigning an identifying number to each person in a dataset and storing a set of links to all records for that person. The Tasmanian Data Linkage Unit (TDLU) is responsible for creating and maintaining the links between the main statewide health data collections and other approved data sources in Tasmania. In bringing records together, the TDLU uses strict privacy-preserving policies, protocols and procedures to ensure the security of the data and the confidentiality of the individuals the records relate to. The information about the individual is not brought together in one place. It stays in the separate data collections, and the security and means of access to the information in each data source remain unchanged.
The TDLU conducts probabilistic linkage using specialised data linkage software. This process attempts to link records based on the greatest probability that they belong to the same individual. The technique uses match weights for each linkage field; weights are derived for field agreements, disagreements and missing values. Based on the total weight score, record pairs are classified as matches, non-matches or potential matches using weight thresholds. To link data, the TDLU uses a combination of fields, as applicable, including:
- Source system identifier
- First name, middle name and surname
- Date of birth, date of death
- Street address, suburb, postcode
- Birthweight (for Perinatal and Births data).
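The weighting and thresholding described above can be illustrated with a minimal sketch. It assumes a common log-ratio weighting scheme for probabilistic linkage; the field list, the m/u probabilities, the zero weight for missing values and the two thresholds are all illustrative assumptions, not the parameters or software used by the TDLU.

```python
import math

# Hypothetical per-field probabilities (illustrative values only):
# m = P(field agrees | records belong to the same person)
# u = P(field agrees | records belong to different people)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "date_of_birth": (0.98, 0.001),
    "postcode": (0.85, 0.05),
}

# Assumed weight thresholds; real values are tuned per linkage.
UPPER_THRESHOLD = 10.0
LOWER_THRESHOLD = 0.0


def field_weight(field, a, b):
    """Weight for one field: agreement, disagreement or missing value."""
    if a is None or b is None:
        return 0.0  # assumption: a missing value contributes no evidence
    m, u = FIELD_PROBS[field]
    if a == b:
        return math.log2(m / u)  # positive agreement weight
    return math.log2((1 - m) / (1 - u))  # negative disagreement weight


def total_weight(rec_a, rec_b):
    """Sum the field weights for a pair of records."""
    return sum(field_weight(f, rec_a.get(f), rec_b.get(f)) for f in FIELD_PROBS)


def classify(rec_a, rec_b):
    """Classify a record pair using the two weight thresholds."""
    score = total_weight(rec_a, rec_b)
    if score >= UPPER_THRESHOLD:
        return "match"
    if score <= LOWER_THRESHOLD:
        return "non-match"
    return "potential match"
```

Pairs that fall between the two thresholds are the "potential matches"; in practice such pairs are usually set aside for clerical review rather than decided automatically.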
Who is involved in data linkage?
The data linkage process involves three main stakeholders:
- Data Custodians - effectively the 'owners' of data. Data custodians work within an organisation or agency (such as a government department) and are responsible for the collection, use and dissemination of data. Data custodians may manage administrative or research datasets and collect and store personal information (such as name, address, date of birth) as well as information about the person (e.g. health diagnosis or treatment details).
- Researchers - the people who use the anonymised linked data for the purpose of analysis and research. Research projects undergo an extensive application process and must be approved by a relevant Human Research Ethics Committee (HREC) as well as relevant data custodians.
- Data Linkage Units - the organisations which link datasets together and create Linkage IDs, which allow data from different sources and organisations to be linked together.
A network of Data Linkage Units exists as part of the Population Health Research Network (PHRN), with each State and Territory represented. A further three national Integrating Authorities can perform data linkage within and between Commonwealth and State/Territory data collections. The three accredited Integrating Authorities in Australia are the:
- Australian Bureau of Statistics (ABS)
- Australian Institute of Health and Welfare (AIHW)
- Australian Institute of Family Studies
The Separation Principle
The key feature of the data linkage model used by the TDLU is the separation of personal identifying information from service or clinical data. This approach is in accordance with the National Health and Medical Research Council (NHMRC) protocols that define linked datasets as non-identifiable.
Using this 'Separation Principle' the TDLU operates under strict protocols which include:
- Identifying data is provided to the TDLU for linkage only;
- Such data is kept on a standalone computing server with no Internet or Intranet connectivity;
- Access to the room housing the computer is via a strictly controlled security card;
- Data stored on the server is encrypted;
- The TDLU holds no clinical data whatsoever; and
- Researchers have no way of accessing the personal identifying data held by TDLU.
How is linked data used?
Research using linked data is reliable and efficient because it draws on data for the whole population rather than small samples of it. The linkages between administrative and research or clinical datasets provide an evidence base for policy makers and researchers to better understand population health and wellbeing, and to implement and evaluate service delivery and programs.
Research projects using linked data make use of administrative, survey and research/clinical data that already exist. Utilising such data minimises the burden on organisations and individuals to provide additional information and is a cost-effective solution for researchers.
Master Linkage Map (MLM) - The MLM groups together records for individuals in a population. Each individual within the Map has their own unique 'key'.
Master Linkage Key (MLK) - Refers to an individual's unique ID, otherwise known as a 'key'.
Project Person Identifier (PPID) - A project-specific, unique pseudo identifier that is supplied to researchers that refers to an individual with minimal risk of re-identification.
What is the Master Linkage Map (MLM)?
At the centre of the TDLU's system is a Master Linkage Map (MLM), which groups together records for individuals from the Tasmanian population. This 'map' enables the extraction of de-identified linked files representative of multiple data sources. By adding an anonymous person identifier, the map can be used for a range of research and planning purposes.
The MLM is a simple structure: it contains a list of individual record IDs, each of which points to a specific record in one of the participating datasets. A unique Master Linkage Key (MLK) identifier is associated with every record, and all records with the same MLK are considered to belong to the same individual.
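As a rough illustration of this structure, the sketch below models the map as a list of entries, each holding an MLK and a pointer (dataset name plus record ID) to a record held elsewhere. The class name, dataset names, record IDs and key values are all hypothetical; the actual storage format used by the TDLU is not described here.

```python
from collections import defaultdict
from typing import NamedTuple


class MapEntry(NamedTuple):
    mlk: int        # Master Linkage Key shared by all records of one person
    dataset: str    # which participating dataset holds the record
    record_id: str  # record ID within that dataset


# Hypothetical map entries, invented purely for illustration.
MLM = [
    MapEntry(1001, "hospital_admissions", "A-17"),
    MapEntry(1001, "perinatal", "P-03"),
    MapEntry(1002, "hospital_admissions", "A-42"),
]


def records_for_person(mlm, mlk):
    """Return the (dataset, record_id) pointers that share one MLK."""
    return [(e.dataset, e.record_id) for e in mlm if e.mlk == mlk]


def group_by_mlk(mlm):
    """Group all record pointers by MLK, one group per individual."""
    groups = defaultdict(list)
    for e in mlm:
        groups[e.mlk].append((e.dataset, e.record_id))
    return dict(groups)
```

Note that each entry carries only keys and pointers, with no clinical or service fields.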
Importantly, the MLM does not contain any clinical or service information about individuals. The TDLU only ever receives basic demographic information for the purposes of linking.
What does linked data look like?
Linked data is supplied to researchers in a way that ensures an individual cannot be identified. Personal information such as name and address is removed and replaced by a Project Person Identifier (PPID). For each dataset, the Data Custodian provides the requested clinical or service data against each of the PPIDs listed in the dataset, for example fields such as Year of Birth and Length of Stay.
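The replacement of identifying fields by a PPID can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the random-token PPID format, the project prefix and the row layout are all hypothetical, and the sketch does not represent the TDLU's actual procedure.

```python
import secrets


def assign_ppids(mlks, prefix):
    """Assign a random, project-specific PPID to each distinct MLK.

    Random tokens are used (an assumption) so that a PPID cannot be
    reversed to an MLK and differs from one project to the next.
    """
    ppids = {}
    used = set()
    for mlk in set(mlks):
        ppid = f"{prefix}-{secrets.token_hex(4)}"
        while ppid in used:  # regenerate on the unlikely collision
            ppid = f"{prefix}-{secrets.token_hex(4)}"
        used.add(ppid)
        ppids[mlk] = ppid
    return ppids


def build_extract(rows, record_to_ppid, content_fields):
    """Build the researcher extract: the PPID plus requested content fields.

    Identifying fields such as name and address are dropped simply by
    never being copied into the output row.
    """
    extract = []
    for row in rows:
        ppid = record_to_ppid.get(row["record_id"])
        if ppid is None:
            continue  # record is not part of this project
        out = {"ppid": ppid}
        for field in content_fields:
            out[field] = row[field]
        extract.append(out)
    return extract
```

A custodian-side call might look like `build_extract(rows, record_to_ppid, ["year_of_birth", "length_of_stay"])`, where `record_to_ppid` is the hypothetical record-ID-to-PPID lookup supplied for the project.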
How do I access linked data?
Access to linked data is subject to a comprehensive application process together with relevant human research ethics approvals. The TDLU is currently taking applications for linked data. Examples of projects completed in Tasmania include:
- The burden and cost of injury attributable to health care use and mortality in Australia
- Perinatal outcomes and child development (risk and protective factors)
- Factors that impede early access to defibrillation following out of hospital cardiac arrest
- Population level chlamydia testing and positivity rates in Tasmania
- Community presentations of anaphylaxis in Tasmania: Occurrence, management and treatment outcomes
- REDDISH: REDucing Delays In aneurysmal Subarachnoid Haemorrhage
- Perioperative Risk Assessment and Modification in Patients on Renal Replacement Therapy (RRT)
- Pathways to better health and education outcomes for Tasmania's children
- Factors associated with bowel cancer survival in Tasmania: A data linkage study
- QUIET-IPF Study: Quality of lIfE and costs associaTed with Idiopathic Pulmonary Fibrosis
- Reducing the BurdEn of Liver cANcer for Tasmanians (RELIANT)
- Chronic Kidney Disease in Tasmania using data linkage