Showing posts with label colocode.

May 8, 2023

Record-Linkage in Healthcare Research... and Marketing

Record-linkage is a term referring to technologies that make it possible to merge data on people and organizations from multiple, disparate sources. Early development of the technology was largely related to marketing, for instance, as a means of connecting magazine subscribers' contact information to sales records belonging to retail stores. It's still used that way (more than ever), but some very important applications have emerged since those early days in the 1950s and 1960s, when computers filled whole rooms and developing highly complex software that would use years of run time was pointless. 

CarePrecise uses record linkage to create business intelligence datasets from a broad range of information available through the U.S. Department of Health and Human Services, Department of Commerce, USPS, and other resources. For example, by merging Medicare claims data with NPI registry data and other federal data sources, we can build a 360-degree view of the U.S. healthcare system, from the health systems to the hospitals to the medical practice groups and clinics, to individual clinicians. Today, record linkage is also making significant inroads in improving patient care.


What is record linkage technology and how does it work?

Record linkage is becoming a vital tool for getting the most out of many types of data. Record linkage technology works by creating a unique identifier for each person or organization (in clinical research, each patient) that is used to combine information from multiple sources. There are two general types of record linkage: exact (deterministic) matching and statistical (probabilistic) matching.

Disambiguation. Exact matching is, of course, ideal. Email addresses and tax identification numbers are excellent examples of keys that support it. "Disambiguation" occurs when otherwise disconnected data can be "hard matched" to create an unambiguous match, to which one unique identifier (a number or other code) can be assigned.
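To make exact matching concrete, here is a minimal Python sketch. It assumes both sources carry a tax identification number in a field we've called `tin` (a hypothetical name) and that the only cleanup needed is stripping punctuation. Records link only when the normalized values are identical.

```python
# Minimal sketch of exact (deterministic) record linkage.
# Assumption: both sources carry a tax ID in a field named "tin" (hypothetical).


def normalize_id(value: str) -> str:
    """Keep digits only, so '12-3456789' and '123456789' compare equal."""
    return "".join(ch for ch in value if ch.isdigit())


def exact_link(left: list, right: list, key: str = "tin") -> list:
    """Return (left_record, right_record) pairs whose normalized keys are identical."""
    # Index the right-hand source by normalized key for fast lookups.
    index = {}
    for rec in right:
        index.setdefault(normalize_id(rec[key]), []).append(rec)
    # Probe the index with each left-hand record.
    pairs = []
    for rec in left:
        for match in index.get(normalize_id(rec[key]), []):
            pairs.append((rec, match))
    return pairs


claims = [{"tin": "12-3456789", "name": "Lakeside Clinic"}]
registry = [
    {"tin": "123456789", "name": "LAKESIDE CLINIC LLC"},
    {"tin": "987654321", "name": "Hilltop Medical"},
]
links = exact_link(claims, registry)
print(len(links), links[0][1]["name"])  # 1 LAKESIDE CLINIC LLC
```

The records, names, and IDs above are fictitious. Note that the two name fields differ, which is exactly why a hard key is so valuable when one exists.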

Arriving at an unambiguous match may not be as easy as comparing Social Security Numbers. That's when we turn to statistical matching. This is trickier, and almost always less reliable. Probabilistic record linkage uses "fuzzy" matching algorithms to compare data points and link records that may not share exactly the same details. For example, if two records had similar birth dates or home addresses, the algorithm would recognize them as potential matches and create a statistical link between them.
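A simple way to illustrate statistical matching is a weighted similarity score. The sketch below uses Python's standard-library `difflib.SequenceMatcher` to score each field, then combines the field scores with weights and compares the total to a cut-off. The weights, the 0.85 threshold, the field names, and the sample records are all made up for illustration; formal probabilistic linkage (for example, the Fellegi-Sunter model) estimates match weights from the data rather than hard-coding them.

```python
from difflib import SequenceMatcher

# Illustrative field weights (they sum to 1.0); real systems estimate these from data.
WEIGHTS = {"last_name": 0.4, "first_name": 0.2, "birth_date": 0.3, "zip": 0.1}
THRESHOLD = 0.85  # hypothetical cut-off for declaring a probable match


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(r1: dict, r2: dict) -> float:
    """Weighted sum of per-field similarities."""
    return sum(w * similarity(r1[field], r2[field]) for field, w in WEIGHTS.items())


a = {"last_name": "Johnson", "first_name": "Katherine", "birth_date": "1961-04-12", "zip": "30301"}
b = {"last_name": "Jonson", "first_name": "Kathryn", "birth_date": "1961-04-12", "zip": "30301"}
c = {"last_name": "Jackson", "first_name": "Karl", "birth_date": "1975-09-30", "zip": "30318"}

for other in (b, c):
    score = match_score(a, other)
    print(other["last_name"], round(score, 3), "probable match" if score >= THRESHOLD else "no match")
```

Here the misspelled "Jonson/Kathryn" record clears the threshold because the birth date and ZIP agree exactly, while "Jackson/Karl" does not.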

Relying on one or a few non-deterministic data points to match records is, naturally, a bad idea. People tend to change home addresses several times over their lifetimes, so matching on a street address (or a phone number or email address, for that matter) would likely miss a number of records. And even when these markers have stayed constant, another problem arises: "fat fingering," in which a name, address, phone number, or other value is entered incorrectly in a database.

Deliberate ambiguation. Early techniques for reducing this kind of ambiguity between datasets included creating a data field in which all of the vowels are removed from a name or street address. This "works" because numbers and consonants are statistically far less likely to be typed incorrectly. It's not a good system, but it's better than nothing. False positives (records that are matched but shouldn't be) and false negatives (records that should be matched but aren't) abound when this ham-handed method is used alone, but it can still be one part of the record linkage process. Where patient data is involved, and where scientists rely on clean data to glean truth, much more must be done.
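Here is a sketch of that vowel-stripping key in Python. It only illustrates the idea described above: the input is uppercased, vowels are dropped, and anything other than letters and digits is removed. It treats Y as a consonant, which is one of several possible choices. The last example shows how easily the key produces a false positive.

```python
import re


def consonant_key(text: str) -> str:
    """Uppercase, drop the vowels A, E, I, O, U, and keep only letters and digits."""
    no_vowels = re.sub(r"[AEIOU]", "", text.upper())
    return re.sub(r"[^A-Z0-9]", "", no_vowels)


print(consonant_key("123 Main Street"))  # 123MNSTRT
print(consonant_key("123 Mian Street"))  # 123MNSTRT (the vowel typo no longer matters)
print(consonant_key("Hall Road") == consonant_key("Hill Road"))  # True (a false positive)
```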


Tighter matching for critical healthcare data

Data that can be linked include sensitive medical records, hospital records, laboratory tests, insurance claims data and administrative databases. When used for research involving patient records, record linkage often involves matching information from multiple sources to create a single unified patient record identifier, sometimes called a Master Patient Identifier (MPI), that can be used to track and analyze health outcomes over time. By combining different datasets, researchers can gain insights into the effectiveness of treatments and interventions, as well as uncover patterns in disease progression or risk factors that would not be visible if looking at one dataset alone.
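Once pairs of records have been judged to match, they still have to be grouped so that every record for the same patient gets the same master identifier. One common way to do that (not necessarily the way any particular MPI system does it) is a union-find structure, sketched below in Python. The record IDs and the `MPI-` label format are hypothetical. Matches are treated as transitive: if A matches B and B matches C, all three share one identifier.

```python
def assign_master_ids(record_ids: list, matched_pairs: list) -> dict:
    """Group matched records and give each group one hypothetical master ID."""
    parent = {r: r for r in record_ids}

    def find(x):
        # Walk up to the group's root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every matched pair.
    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Label each group with a sequential master ID, in input order.
    labels = {}
    master = {}
    for r in record_ids:
        root = find(r)
        if root not in labels:
            labels[root] = f"MPI-{len(labels) + 1:06d}"
        master[r] = labels[root]
    return master


records = ["hospital:17", "lab:A9", "claims:553", "hospital:42"]
pairs = [("hospital:17", "lab:A9"), ("lab:A9", "claims:553")]
print(assign_master_ids(records, pairs))
# {'hospital:17': 'MPI-000001', 'lab:A9': 'MPI-000001', 'claims:553': 'MPI-000001', 'hospital:42': 'MPI-000002'}
```

Transitivity is a double-edged sword: a single bad match can chain two different patients together, which is one more reason the matching step has to be tight.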

As data science developed and much larger datasets became available, scholarly efforts to improve record matching began to emerge. Systems that compare text strings and score the difference have been among these methods. An algorithm known as Soundex compares text strings phonetically by reducing each word to a short code based on how it sounds. The names "Mary" and "Merry" are not an exact text match, but both receive the same Soundex code, so Soundex can add weight to the match because the words sound alike.
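For the curious, here is a compact Python sketch of the classic American Soundex rules: keep the first letter, code the remaining consonants as digits, drop vowels, collapse adjacent duplicate codes (letters separated only by H or W count as adjacent), and pad or truncate the result to four characters.

```python
# Consonant groups and their Soundex digits.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    result = [letters[0]]
    prev = CODES.get(letters[0], "")  # the first letter's code still blocks a duplicate
    for ch in letters[1:]:
        code = CODES.get(ch, "")  # vowels, Y, H, W have no code
        if code and code != prev:
            result.append(code)
        if ch not in "HW":
            # Vowels reset prev, so a repeated code after a vowel is kept;
            # H and W leave prev alone, so duplicates across them collapse.
            prev = code
    return ("".join(result) + "000")[:4]


print(soundex("Mary"), soundex("Merry"))  # M600 M600
```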

Other fuzzy-logic methods exist, and some can be purchased as part of record linkage software. "Standardization" essentially means making all of the same kinds of data appear the same way across different datasets. One such technique is address standardization, based either on proprietary technologies such as the CoLoCode technique developed by CarePrecise, or on other, less precise methods such as the USPS "Pub 28" standard. Getting mail delivered properly is important, to be sure, but the post office has the advantage of mail carriers' knowledge of their routes and the human ability to disambiguate on the fly. When comparing thousands or millions of rows of data, as is not unusual in medical research applications, "eyeballing" is not an option.
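Below is a minimal Python sketch of rule-based address standardization using a handful of the standard abbreviations from USPS Publication 28 (a few street suffixes, directionals, and the SUITE unit designator). It is not CarePrecise's CoLoCode, whose method is proprietary. A real implementation would need the full Pub 28 tables plus context rules; this sketch would, for example, wrongly abbreviate a street actually named "North."

```python
import re

# A small, illustrative subset of USPS Publication 28 abbreviations.
PUB28_SUBSET = {
    "STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "BOULEVARD": "BLVD",
    "DRIVE": "DR", "SUITE": "STE",
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
}


def standardize_address(line: str) -> str:
    """Uppercase, strip periods and commas, collapse spaces, and abbreviate known words."""
    cleaned = re.sub(r"[.,]", " ", line.upper())
    return " ".join(PUB28_SUBSET.get(token, token) for token in cleaned.split())


a = standardize_address("1200 North Main Street, Suite 300")
b = standardize_address("1200 N. Main St., Ste 300")
print(a)       # 1200 N MAIN ST STE 300
print(a == b)  # True
```

Two differently typed versions of the same address now compare equal, which is what lets an exact match succeed where a raw text comparison would fail.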

Rather than get too deep in the weeds here, a fine elucidation on record linkage in medicine can be found on the National Library of Medicine website.


Benefits of record linkage technology in medicine

Data merged from many sources can provide a more comprehensive view of the patient, allowing researchers to draw more accurate and reliable conclusions about healthcare outcomes. By combining multiple datasets, researchers can gain deeper insight into medical conditions and how treatments affect patients over time. It also makes it easier to compare health outcomes across different populations, as well as detect potential errors or risks in patient care. 

Additionally, record linkage technology can be used to reduce medical costs and improve efficiency in the healthcare system. By linking administrative databases with clinical data, researchers can better understand why certain treatments cost more than others and identify areas where cost savings can be made. This could lead to improved healthcare decisions, including changes in treatment protocols or resource allocations. 

Record linkage has also been used to analyze the prevalence of medical conditions in various populations, create predictive models for patient care, and identify potential drug interactions. All of these studies have helped to improve our understanding of healthcare outcomes and inform decisions about how best to provide care for different patient groups. 

Researchers at the University of California, San Francisco used record linkage to combine patient records from different providers and examine how electronic medical records could be used to improve care coordination. 


Challenges in using record linkage technology 

Despite the many potential benefits of record linkage technology, there are still challenges that must be overcome. Lack of standardization between datasets can make it difficult for algorithms to identify matches, and data quality issues can lead to incorrect links or missing information. 

Additionally, privacy concerns arise when combining multiple datasets, as linking patient records can reveal identifying information about individuals. In order to ensure that patient data is kept secure and confidential, there must be safeguards in place to prevent unauthorized access or misuse of the information. This includes developing secure protocols for data sharing, as well as strong regulations for protecting patient privacy.
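One widely used safeguard is to link on keyed hashes of identifiers rather than on the identifiers themselves, so the party doing the matching never sees names or birth dates. The Python sketch below uses the standard-library `hmac` and `hashlib` modules; the choice of fields, the normalization, and the key are assumptions for illustration. On its own this is not a complete privacy solution: the key must be held by a trusted party, and hashed tokens only match when the normalized values match exactly, so fuzzy matching is lost.

```python
import hashlib
import hmac


def linkage_token(secret_key: bytes, last_name: str, birth_date: str) -> str:
    """Return an HMAC-SHA256 token for normalized identifiers (hypothetical field choice)."""
    normalized = f"{last_name.strip().upper()}|{birth_date.strip()}"
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key; a real key would be generated securely and never hard-coded.
KEY = b"example-key-held-by-a-trusted-party"

t1 = linkage_token(KEY, "Johnson ", "1961-04-12")  # from dataset 1
t2 = linkage_token(KEY, "johnson", "1961-04-12")   # from dataset 2
print(t1 == t2)  # True: the records link without exposing the raw values
```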

It is important to consider the ethics of combining multiple datasets in order to identify a single patient. This could lead to potential issues such as discrimination or stigmatization, and researchers must make sure that they are adhering to ethical codes when collecting and analyzing data. 

These issues must be addressed in order to ensure that record linkage technology is used responsibly and efficiently. Solutions such as secure data sharing protocols, improved standards for data quality, and rigorous processes for privacy can help researchers harness the power of record linkage technology while protecting patient privacy.


Examples of recent uses of advanced record linkage technology in medical research

March 23, 2023

CCN and PAC ID to NPI: Crosswalk between the NPI Registry and Hospital and Group Records

The federal Centers for Medicare and Medicaid Services (CMS) publishes a wide range of information on U.S. hospitals, which all carry a unique identifier, the CCN number (CMS Certification Number)*. On the other hand (which often seems not to know what its counterpart is doing), CMS also publishes the frequently updated NPPES database (National Plan and Provider Enumeration System), commonly known as the NPI Registry, which uses the NPI number (National Provider Identifier) as its unique identifier. While hospitals and other medical organizations will have only one CCN number, they are required to have at least one NPI number, and they're permitted to have as many as they like (and they do seem to like quite a few). 

And, between these two ID systems, the CCN and the NPI, ne'er the twain shall meet.

CarePrecise has developed a sophisticated system to "roll up" an organization's NPI-numbered records with its CCN number (and with the PAC ID for practice groups, which stands for "PECOS Associate Control ID"). This mighty trick produces some eye-opening data, such as contact names and titles, license information, specializations, market data added by CarePrecise to NPI records, and the ability to crossmatch groups to their members and hospital affiliations, directly from their NPI numbers. It also permits integration across the complete line of CarePrecise provider data packages, and all of the information that CarePrecise collects or creates and then merges to the NPI records.

Currently, these CarePrecise rollups (or "crosswalks," if you prefer) are the only such linkages available in a relatively comprehensive dataset. The full rollup of all medical facility NPI numbers is available for hospitals, and a single "priority" NPI number is currently available for practice groups, with a full rollup of all PAC ID-to-NPI linkages in development and a tentative release date of May 2023.

The hospital CCN-to-NPI crosswalk is part of the Authoritative Hospital Database (AHD), and the Group PAC ID-to-NPI link is part of the Authoritative Physician Database (APD) and CarePrecise Platinum.

The "rolling up" is made possible by several CarePrecise innovations, starting with the CoLoCode (co-location code) affixed to almost every provider in the 7 million+ record CarePrecise master reference database. To fill in additional linkages, CarePrecise uses Placekey, a unique "point of interest" identifier also attached to essentially every one of the 7 million+ CarePrecise provider records, which can readily be used to link data between data suppliers for a variety of purposes.
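CarePrecise's actual roll-up method is proprietary, but the general shape of a location-keyed crosswalk can be sketched. The Python below assumes each hospital and each organizational NPI record carries a co-location key (`coloc`) and a point-of-interest key (`poi`); both field names and all of the IDs are fictitious. NPIs that share the hospital's co-location key are linked first, and the POI key is used to fill in additional linkages.

```python
from collections import defaultdict


def build_crosswalk(hospitals: list, npi_records: list) -> dict:
    """Map each CCN to the set of NPIs sharing its co-location key or POI key."""
    by_coloc = defaultdict(set)
    by_poi = defaultdict(set)
    for rec in npi_records:
        if rec.get("coloc"):
            by_coloc[rec["coloc"]].add(rec["npi"])
        if rec.get("poi"):
            by_poi[rec["poi"]].add(rec["npi"])

    crosswalk = {}
    for hosp in hospitals:
        # Primary linkage on the co-location key...
        npis = set(by_coloc.get(hosp.get("coloc"), set()))
        # ...then fill in additional linkages on the POI key.
        npis |= by_poi.get(hosp.get("poi"), set())
        crosswalk[hosp["ccn"]] = npis
    return crosswalk


hospitals = [{"ccn": "999001", "coloc": "LOC-A", "poi": "POI-1"}]
npi_records = [
    {"npi": "1000000001", "coloc": "LOC-A", "poi": "POI-1"},
    {"npi": "1000000002", "coloc": "LOC-A", "poi": None},
    {"npi": "1000000003", "coloc": None, "poi": "POI-1"},
    {"npi": "1000000004", "coloc": "LOC-B", "poi": "POI-9"},
]
print(sorted(build_crosswalk(hospitals, npi_records)["999001"]))
# ['1000000001', '1000000002', '1000000003']
```

A shared location by itself can over-link, for instance in a multi-tenant medical office building, so in practice a key match like this would be one signal among several rather than the final word.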

* The CMS Certification Number has replaced the term Medicare Provider Number, Medicare Identification Number or OSCAR Number. The CCN is used to verify Medicare/Medicaid providers for survey and certification, assessment-related activities and communications. Note that CarePrecise includes the old OSCAR Number in its CarePrecise Complete and CarePrecise Advanced/Platinum datasets, if reported by the provider in their NPI record(s) or available through third parties, but this covers a small fraction of records, and the OSCAR numbers have changed; hence the need for a CCN-to-NPI crosswalk.

January 6, 2023

Point of Interest Hooks for All U.S. Healthcare Providers

There's a new world of data available from a new class of vendors that can be linked together by the establishment of a universal Point of Interest code that pegs the physical location (the "where") and in some cases even encodes the business name (the "who"). In CarePrecise provider databases, POI-encoded facility information identifies essentially every facility in the U.S. healthcare system, and makes it connectable to other datasets. 

Visitor Traffic Reporting

Want to know how much visitor traffic a given doctor's office gets in a week? That data is available from third party sources, and can be linked in to the CarePrecise provider data using the Placekey™ POI code that CarePrecise appends to almost every* address in the NPI Registry and beyond.
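Mechanically, linking third-party traffic data to provider records by POI code is a keyed join. In this minimal Python sketch, the field names (`placekey`, `weekly_visits`) and the key values are placeholders, not real Placekeys or any vendor's actual schema. Providers without a traffic record get `None` rather than being dropped, so coverage gaps stay visible.

```python
def attach_visits(providers: list, traffic_by_key: dict) -> list:
    """Return copies of provider rows with a 'weekly_visits' value joined on 'placekey'."""
    joined = []
    for p in providers:
        row = dict(p)  # copy so the input rows are not mutated
        # Left join: a provider with no traffic record gets None.
        row["weekly_visits"] = traffic_by_key.get(p.get("placekey"))
        joined.append(row)
    return joined


providers = [
    {"npi": "1000000001", "placekey": "PLACEHOLDER-A"},
    {"npi": "1000000002", "placekey": "PLACEHOLDER-B"},
]
traffic = {"PLACEHOLDER-A": 412}  # fictitious weekly visit count

for row in attach_visits(providers, traffic):
    print(row["npi"], row["weekly_visits"])
```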

Placekey Integration

CarePrecise invested heavily in creating a system that updates approximately 8 million healthcare provider POI records every month, keeping up with address changes and new providers and dropping deactivated providers. Furthermore, CarePrecise keeps historical data for every month, so that location changes can be tracked over time. 
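Keeping a snapshot for every month turns change tracking into a comparison of consecutive snapshots. The Python sketch below assumes each snapshot is a simple mapping from NPI to a location key; the IDs and keys are fictitious, and real snapshots would carry many more fields. It reports which providers were added, dropped, or moved between two months.

```python
def diff_snapshots(previous: dict, current: dict) -> dict:
    """Compare two monthly snapshots that map NPI -> location key."""
    prev_ids, curr_ids = set(previous), set(current)
    return {
        "added": sorted(curr_ids - prev_ids),      # new providers
        "dropped": sorted(prev_ids - curr_ids),    # e.g., deactivated providers
        "moved": sorted(npi for npi in prev_ids & curr_ids
                        if previous[npi] != current[npi]),  # location changed
    }


january = {"1000000001": "LOC-A", "1000000002": "LOC-B", "1000000003": "LOC-C"}
february = {"1000000001": "LOC-A", "1000000002": "LOC-D", "1000000004": "LOC-E"}
print(diff_snapshots(january, february))
# {'added': ['1000000004'], 'dropped': ['1000000003'], 'moved': ['1000000002']}
```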

Note: CarePrecise also provides a separate database, Select Geo, that contains latitude and longitude for every geocode-compatible U.S. healthcare provider location in the federal National Provider Identifier registry. This is used in applications that calculate distances and travel time between provider locations, and between patients and prospective providers, such as doctor-finder sites.
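Distance calculations like these typically start with the great-circle (haversine) distance between two latitude/longitude pairs. Here is a minimal Python sketch with a simple "nearest providers" lookup; the provider records and coordinates are arbitrary examples. Straight-line distance is only a proxy: travel time requires road-network routing, which is outside this sketch.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_providers(patient: tuple, providers: list, k: int = 2) -> list:
    """Return the k providers closest to the patient's (lat, lon), nearest first."""
    lat, lon = patient
    return sorted(providers, key=lambda p: haversine_miles(lat, lon, p["lat"], p["lon"]))[:k]


providers = [
    {"npi": "1000000001", "lat": 40.7484, "lon": -73.9857},
    {"npi": "1000000002", "lat": 40.6892, "lon": -74.0445},
    {"npi": "1000000003", "lat": 41.8781, "lon": -87.6298},
]
for p in nearest_providers((40.7306, -73.9352), providers):
    print(p["npi"])
```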

The addition of Placekey to CarePrecise data enables a clearer view into healthcare sites: clinics, physician offices, outpatient facilities, hospitals, and the whole panoply of facility types. Because CarePrecise links individual practitioners to their practice groups and affiliated hospitals, rich patterns emerge when CarePrecise provider network data is connected to visitor traffic and revenue data. 

Competitive Healthcare, Meet a New Challenge

Because the healthcare industry is aggressively competitive, with organizations guarding business intelligence to the fullest extent possible, these new business intelligence pathways offer an unprecedented level of visibility into provider practices vis-à-vis their patient volumes. When cross-referenced with the demographic, psychographic, and economic data available on visitor traffic, they support reasonable, actionable assumptions about a practice's patient base, despite its closely held business information.

*Not all address fields can be parsed to produce a Placekey code or geocode; in particular, many street addresses in Puerto Rico are not written in a normalized (GPS-readable) form. Even CarePrecise's proprietary CoLoCode (uniform address code) has difficulty with many such addresses, though GPS compatibility is not a factor in creating the CoLoCode.