Record linkage refers to technologies that make it possible to merge data on people and organizations from multiple, disparate sources. Early development of the technology was largely related to marketing, for instance as a means of connecting magazine subscribers' contact information to sales records belonging to retail stores. It's still used that way (more than ever), but some very important applications have emerged since those early days in the 1950s and 1960s, when computers filled whole rooms and developing highly complex software that would use years of run time was pointless.
CarePrecise uses record linkage to create business intelligence datasets from a broad range of information available through the U.S. Department of Health and Human Services, Department of Commerce, USPS, and other resources. For example, by merging Medicare claims data with NPI registry data and other federal data sources, we can build a 360 degree view of the U.S. healthcare system - from the health systems to the hospitals to the medical practice groups and clinics, to individual clinicians. Today, record linkage is also making significant inroads in improving patient care.
What is record linkage technology and how does it work?
Record linkage is becoming a vital tool for getting the most out of many types of data. It works by assigning a unique identifier to each person or organization (in healthcare, each patient) and using that identifier to combine information from multiple sources. There are two general types of record linkage: exact (deterministic) matching and statistical (probabilistic) matching.
Disambiguation. Exact matching is, of course, ideal. Linking records on email addresses or tax identification numbers is an excellent example. "Disambiguation" occurs when otherwise disconnected data can be "hard matched" to create an unambiguous match, to which one unique identifier - a number or other code - can be assigned.
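To make the idea of a "hard match" concrete, here is a minimal sketch in Python. The field names (tax_id, source), the sample records, and the assign_entity_ids helper are all hypothetical and chosen only for illustration; this is not a description of any particular product's method.

```python
from itertools import count

def assign_entity_ids(records, key_fields=("tax_id",)):
    """Give every record that shares the same exact key the same identifier."""
    next_id = count(1)
    ids_by_key = {}
    linked = []
    for rec in records:
        # Light normalization so "AB-1 " and "ab-1" compare as equal.
        key = tuple(str(rec[f]).strip().lower() for f in key_fields)
        if key not in ids_by_key:
            ids_by_key[key] = next(next_id)
        linked.append({**rec, "entity_id": ids_by_key[key]})
    return linked

# Made-up records from two unrelated sources.
subscribers = [{"name": "Ann Lee", "tax_id": "12-3456789", "source": "magazine"}]
sales = [
    {"name": "A. Lee", "tax_id": "12-3456789", "source": "retail"},
    {"name": "Bob Ray", "tax_id": "98-7654321", "source": "retail"},
]

linked = assign_entity_ids(subscribers + sales)
for row in linked:
    print(row["entity_id"], row["source"], row["name"])
```

Both "Ann Lee" and "A. Lee" receive the same entity_id because their tax IDs agree exactly, even though the names differ.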
Arriving at an unambiguous match may not be as easy as comparing Social Security Numbers. That's when we turn to statistical matching. This is trickier, and almost always less reliable. Probabilistic record linkage uses "fuzzy" matching algorithms to compare data points and make links between records that may not have exactly the same details. For example, if two records have similar birth dates or home addresses, the algorithm recognizes them as potential matches and creates a statistical link between them.
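A rough sketch of fuzzy scoring follows, using Python's standard difflib module. It is a simplified weighted-similarity score, not a full probabilistic model; the field names, the weights, and the two thresholds are assumptions made up for this example, and real systems typically estimate such values from data.

```python
from difflib import SequenceMatcher

# Hypothetical field weights (they sum to 1.0); not taken from any real system.
WEIGHTS = {"name": 0.5, "birth_date": 0.3, "address": 0.2}

def similarity(a, b):
    """String similarity between 0.0 (nothing shared) and 1.0 (identical)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def match_score(rec_a, rec_b, weights=WEIGHTS):
    """Weighted average of per-field similarities."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in weights.items())

def classify(score, link_at=0.85, review_at=0.65):
    """Turn a score into a decision; both thresholds are illustrative."""
    if score >= link_at:
        return "link"
    if score >= review_at:
        return "review"
    return "non-link"

a = {"name": "Jon Smith", "birth_date": "1980-04-12", "address": "12 Oak Street"}
b = {"name": "John Smith", "birth_date": "1980-04-21", "address": "12 Oak St"}
c = {"name": "Maria Gomez", "birth_date": "1975-11-02", "address": "940 Pine Avenue"}

for other in (b, c):
    score = match_score(a, other)
    print(other["name"], round(score, 3), classify(score))
```

The middle "review" band reflects a common design choice: pairs that are neither clearly the same nor clearly different are set aside for a closer look rather than linked automatically.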
Relying on one or a few non-deterministic data points to match records is, naturally, a bad idea. People tend to change home addresses several times over their lifetimes, so using a street address, or phone number or email address, for that matter, would likely miss a number of records. Also, even if these markers have remained constant, another problem, frequently referred to as "fat fingering," occurs when a name, address, phone, etc. is wrongly entered in a database.
Deliberate ambiguation. Early techniques for reducing this kind of ambiguity between datasets included creating a data field in which all of the vowels are removed from a name or street address. This "works" because numbers and consonants are statistically far less likely to be typed incorrectly. Not a good system, but better than nothing. "False positives," when records are matched that shouldn't be, and "false negatives," when records that should be matched aren't, abound when this ham-handed method is used alone, but it can still be a part of the record linkage process. Where patient data is involved, and where scientists are relying on clean data to glean truth, much more must be done.
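The vowel-stripping trick is easy to sketch. The consonant_key function below is a hypothetical helper written for this illustration, and the names are invented; it also drops spaces and punctuation, which is one reasonable reading of the technique rather than a documented standard.

```python
import re

def consonant_key(text):
    """Uppercase the text, keep only letters and digits, then drop vowels."""
    cleaned = re.sub(r"[^A-Z0-9]", "", text.upper())
    return re.sub(r"[AEIOU]", "", cleaned)

# A vowel typo no longer breaks the match...
print(consonant_key("Katherine Johnson") == consonant_key("Katharine Johnson"))

# ...but two different people can collapse to the same key (a false positive)...
print(consonant_key("Mario Rossi") == consonant_key("Maria Rossi"))

# ...and a dropped consonant still defeats the key (a false negative).
print(consonant_key("Jon Smith") == consonant_key("John Smith"))
```

The second and third comparisons show why a key like this can only be one input among several, never the whole matching decision.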
Tighter matching for critical healthcare data
Data that can be linked include sensitive medical records, hospital records, laboratory tests, insurance claims data and administrative databases. When used for research involving patient records, record linkage often involves matching information from multiple sources to create a single unified patient record identifier, sometimes called a Master Patient Identifier (MPI), that can be used to track and analyze health outcomes over time. By combining different datasets, researchers can gain insights into the effectiveness of treatments and interventions, as well as uncover patterns in disease progression or risk factors that would not be visible if looking at one dataset alone.
As data science developed, and much larger datasets became available, scholarly efforts to improve record matching began to emerge. Among these methods are systems that compare text strings and score the difference. An algorithm known as Soundex encodes words by how they sound; the words "Mary" and "Merry" would have a lower text-only score, but Soundex can add weight to the match because the words sound alike.
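For readers who want to see the mechanics, here is a compact sketch of the commonly described American Soundex rules in Python: keep the first letter, code the remaining consonants as digits, skip vowels, collapse repeated codes, and pad to four characters. Variants of Soundex exist, so treat this as one illustrative version rather than a reference implementation.

```python
# Digit codes for consonants; vowels, Y, H and W carry no code.
CODES = {}
for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for ch in letters:
        CODES[ch] = digit

def soundex(name):
    """Return a four-character phonetic code such as 'R163'."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    encoded = []
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded.append(code)
        prev = code  # a vowel resets prev, so a repeat across a vowel is kept
    return (first + "".join(encoded) + "000")[:4]

for pair in (("Mary", "Merry"), ("Robert", "Rupert")):
    print(pair, [soundex(n) for n in pair])
```

Here "Mary" and "Merry" both encode to M600, so a linkage process could treat an agreeing Soundex code as extra evidence on top of a plain text comparison.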
Other fuzzy-logic methods exist, and can even be bought as part of record linkage software. "Standardization" essentially means making all of the same kinds of data appear the same way across different datasets. One such technique is address standardization, based either on proprietary technologies such as the CoLoCode technique developed by CarePrecise, or other, less precise, methods such as the USPS "Pub 28" standard. Getting mail delivered properly is important, to be sure, but the post office has to its advantage mail carriers' knowledge of their routes and the human ability to disambiguate on the fly. When comparing thousands or millions of rows of data, as is not unusual in medical research applications, "eyeballing" is not an option.
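As a small illustration of what standardization means in practice, the sketch below normalizes case and punctuation and applies a handful of abbreviations in the style of USPS Publication 28. The lookup table is a tiny illustrative subset, the standardize_address helper is hypothetical, and none of this represents the CoLoCode technique, which is proprietary.

```python
import re

# A small, illustrative subset of Pub 28 style abbreviations.
ABBREVIATIONS = {
    "STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD", "ROAD": "RD",
    "DRIVE": "DR", "LANE": "LN", "SUITE": "STE", "APARTMENT": "APT",
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
}

def standardize_address(address):
    """Uppercase, replace punctuation with spaces, and abbreviate known words."""
    text = re.sub(r"[^\w\s]", " ", address.upper())
    tokens = text.split()  # split() also collapses repeated whitespace
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

first = standardize_address("123 North Main Street, Suite 4")
second = standardize_address("123 N. Main St. Ste 4")
print(first)
print(second)
print(first == second)
```

A word-by-word lookup like this is deliberately naive: it would, for example, shorten the street name in "North Street" as if it were a directional, which is exactly the kind of case that more precise methods have to handle.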
Rather than get too deep in the weeds here, we'll point to a fine elucidation of record linkage in medicine that can be found on the National Library of Medicine website.
Benefits of record linkage technology in medicine
Data merged from many sources can provide a more comprehensive view of the patient, allowing researchers to make more accurate and reliable conclusions about healthcare outcomes. By combining multiple datasets, researchers can gain deeper insight into medical conditions and how treatments affect patients over time. It also makes it easier to compare health outcomes across different populations, as well as detect potential errors or risks in patient care.
Additionally, record linkage technology can be used to reduce medical costs and improve efficiency in the healthcare system. By linking administrative databases with clinical data, researchers can better understand why certain treatments cost more than others and identify areas where cost savings can be made. This could lead to improved healthcare decisions, including changes in treatment protocols or resource allocations.
Record linkage has also been used to analyze the prevalence of medical conditions in various populations, create predictive models for patient care, and identify potential drug interactions. All of these studies have helped to improve our understanding of healthcare outcomes and inform decisions about how best to provide care for different patient groups.
Researchers at the University of California‐San Francisco used record linkage to combine patient records from different providers and examine how electronic medical records could be used to improve care coordination.
Challenges in using record linkage technology
Despite the many potential benefits of record linkage technology, there are still challenges that must be overcome. Lack of standardization between datasets can make it difficult for algorithms to identify matches, and data quality issues can lead to incorrect links or missing information.
Additionally, privacy concerns arise when combining multiple datasets, as linking patient records can reveal identifying information about individuals. In order to ensure that patient data is kept secure and confidential, there must be safeguards in place to prevent unauthorized access or misuse of the information. This includes developing secure protocols for data sharing, as well as strong regulations for protecting patient privacy.
It is important to consider the ethics of combining multiple datasets in order to identify a single patient. This could lead to potential issues such as discrimination or stigmatization, and researchers must make sure that they are adhering to ethical codes when collecting and analyzing data. These issues must be addressed in order to ensure that record linkage technology is used responsibly and efficiently. Solutions such as secure data sharing protocols, improved standards for data quality, and rigorous processes for privacy can help researchers harness the power of record linkage technology while protecting patient privacy.
Examples of recent uses of advanced record linkage technology in medical research