Showing posts with label NPI Registry. Show all posts

May 8, 2023

Record-Linkage in Healthcare Research... and Marketing

Record linkage is a term referring to technologies that make it possible to merge data on people and organizations from multiple, disparate sources. Early development of the technology was largely related to marketing, for instance, as a means of connecting magazine subscribers' contact information to sales records belonging to retail stores. It's still used that way (more than ever), but some very important applications have emerged since those early days in the 1950s and 1960s, when computers filled whole rooms and it made little sense to develop highly complex software that would take years of run time.

CarePrecise uses record linkage to create business intelligence datasets from a broad range of information available through the U.S. Department of Health and Human Services, Department of Commerce, USPS, and other resources. For example, by merging Medicare claims data with NPI Registry data and other federal data sources, we can build a 360-degree view of the U.S. healthcare system: from health systems to hospitals to medical practice groups and clinics, to individual clinicians. Today, record linkage is also making significant inroads into improving patient care.


What is record linkage technology and how does it work?

Record linkage is becoming a vital tool for getting the most out of many types of data. Record linkage technology works by creating a unique identifier for each patient that is used to combine information from multiple sources. There are two general types of record linkage: Exact (deterministic) matching and statistical (probabilistic) matching. 

Disambiguation. Exact matching is, of course, ideal. Linking records on email addresses or tax identification numbers is an excellent example. "Disambiguation" occurs when otherwise disconnected data can be "hard matched" to create an unambiguous match, to which one unique identifier (a number or other code) can be assigned.
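A minimal Python sketch of exact (deterministic) matching, echoing the old marketing use case: two hypothetical record sets are joined only where a normalized email address is identical. The field names, function names, and data are made up for illustration.

```python
def normalize(value: str) -> str:
    """Minimal cleanup that keeps matching exact: trim whitespace, lowercase."""
    return value.strip().lower()


def deterministic_link(left: list[dict], right: list[dict], key: str) -> list[tuple[dict, dict]]:
    """Return (left, right) record pairs whose normalized key values are identical."""
    index: dict[str, list[dict]] = {}
    for rec in right:
        index.setdefault(normalize(rec[key]), []).append(rec)
    return [(rec, match) for rec in left for match in index.get(normalize(rec[key]), [])]


subscribers = [
    {"email": " Jane.Doe@example.com", "name": "Jane Doe"},
    {"email": "sam@example.org", "name": "Sam Roe"},
]
store_sales = [{"email": "jane.doe@example.com", "total": 42.50}]

for sub, sale in deterministic_link(subscribers, store_sales, "email"):
    print(sub["name"], sale["total"])  # Jane Doe 42.5
```

The only tolerance allowed here is trimming and lowercasing; any other difference in the key means no link, which is exactly what makes deterministic matching reliable and also what makes it miss records.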

Arriving at an unambiguous match may not be as easy as comparing Social Security Numbers. That's when we turn to statistical matching. This is trickier, and almost always less reliable. Probabilistic record linkage uses "fuzzy" matching algorithms to compare data points and link records that may not share exactly the same details. For example, if two records had similar birth dates or home addresses, the algorithm would flag them as potential matches and create a statistical link between them.
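A toy version of probabilistic matching in Python: each field gets a 0-to-1 string similarity from the standard library's `difflib.SequenceMatcher`, the scores are combined with weights, and pairs above a threshold are treated as probable matches. The weights, the 0.85 threshold, and the field names are arbitrary assumptions for illustration; established approaches such as the Fellegi-Sunter model estimate field weights from the data instead.

```python
from difflib import SequenceMatcher

# Illustrative weights and threshold only; real systems estimate these from data.
WEIGHTS = {"name": 0.4, "birth_date": 0.4, "address": 0.2}
THRESHOLD = 0.85


def similarity(a: str, b: str) -> float:
    """Case-insensitive 0..1 similarity score (1.0 means identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(r1: dict, r2: dict) -> float:
    """Weighted sum of per-field similarities."""
    return sum(weight * similarity(r1[field], r2[field]) for field, weight in WEIGHTS.items())


a = {"name": "Katherine Smith", "birth_date": "1975-03-14", "address": "12 Oak St"}
b = {"name": "Catherine Smith", "birth_date": "1975-03-14", "address": "12 Oak Street"}

score = match_score(a, b)
print(f"{score:.2f}", "probable match" if score >= THRESHOLD else "needs review")
```

Neither the name nor the address agrees exactly, yet the combined score clears the threshold; pairs that land just below it are the ones a human reviewer would typically examine.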

Relying on one or a few non-deterministic data points to match records is, naturally, a bad idea. People tend to change home addresses several times over their lifetimes, so matching on a street address (or a phone number or email address, for that matter) would likely miss a number of records. And even when these markers have remained constant, there is another problem, frequently referred to as "fat fingering": a name, address, phone number, etc. is entered incorrectly in a database.

Deliberate ambiguation. Early techniques for reducing this kind of ambiguity between datasets included creating a data field in which all of the vowels are removed from a name or street address. This "works" because numbers and consonants are statistically far less likely to be typed incorrectly. It's not a good system, but it's better than nothing. False positives (records that are matched but shouldn't be) and false negatives (records that should be matched but aren't) abound when this ham-handed method is used alone, but it can still be one part of the record linkage process. Where patient data is involved, and where scientists are relying on clean data to glean truth, much more must be done.
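Here is a minimal Python sketch of the vowel-stripping key described above, showing both why it helps and how it produces false positives and false negatives. The function name `consonant_key` is hypothetical, and treating Y as a consonant is an assumption of this sketch.

```python
import re


def consonant_key(text: str) -> str:
    """Uppercase, then drop vowels (A, E, I, O, U) and anything that isn't a letter or digit."""
    return re.sub(r"[^A-Z0-9]|[AEIOU]", "", text.upper())


# Vowel typos disappear, so these two entries now match:
print(consonant_key("1200 Maple Avenue"), consonant_key("1200 Mapel Avenu"))  # 1200MPLVN 1200MPLVN

# False positive: different names collapse to the same key.
print(consonant_key("Dean") == consonant_key("Dina"))  # True

# False negative: Y is kept, so a one-letter variant still fails to match.
print(consonant_key("Smith") == consonant_key("Smyth"))  # False
```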


Tighter matching for critical healthcare data

Data that can be linked include sensitive medical records, hospital records, laboratory tests, insurance claims data and administrative databases. When used for research involving patient records, record linkage often involves matching information from multiple sources to create a single unified patient record identifier, sometimes called a Master Patient Identifier (MPI), that can be used to track and analyze health outcomes over time. By combining different datasets, researchers can gain insights into the effectiveness of treatments and interventions, as well as uncover patterns in disease progression or risk factors that would not be visible if looking at one dataset alone.

As data science developed, and much larger datasets became available, scholarly efforts to improve record matching began to emerge. Among these methods are systems that compare text strings and score how much they differ. An algorithm known as Soundex compares text strings phonetically: the words "Mary" and "Merry" would get a lower text-only similarity score, but Soundex can add weight to the match because the words sound alike.
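For readers who want to see the phonetic idea in code, here is a minimal sketch of the classic American Soundex rules in Python: keep the first letter, map the remaining consonants to digit classes, collapse adjacent repeats (including repeats separated only by H or W), and pad or truncate to four characters. It is a generic textbook version, not the algorithm inside any particular record linkage product.

```python
# Consonant classes used by American Soundex; vowels (and Y) have no code.
_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # H and W don't separate letters that share a code
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so a repeated code after a vowel is kept
    return code.ljust(4, "0")


print(soundex("Mary"), soundex("Merry"))  # M600 M600
```

Because "Mary" and "Merry" share the code M600, a matcher can count phonetic agreement as evidence even when the spelled-out strings differ.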

Other fuzzy-logic methods exist, and some can be bought as part of record linkage software. "Standardization" essentially means making all of the same kinds of data appear the same way across different datasets. One such technique is address standardization, based either on proprietary technologies such as the CoLoCode technique developed by CarePrecise, or on other, less precise, methods such as the USPS "Pub 28" standard. Getting mail delivered properly is important, to be sure, but the post office has the advantage of mail carriers' knowledge of their routes and the human ability to disambiguate on the fly. When comparing thousands or millions of rows of data, as is not unusual in medical research applications, "eyeballing" is not an option.
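To make "standardization" concrete, here is a minimal Python sketch that uppercases an address line, strips punctuation, and swaps a handful of spelled-out words for their USPS Publication 28 abbreviations. The mapping is a tiny illustrative subset of Pub 28 (which has far more entries and context rules), the function name is hypothetical, and this is not CarePrecise's CoLoCode technique.

```python
import re

# A tiny, illustrative subset of USPS Publication 28 abbreviations.
ABBREVIATIONS = {
    "STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD", "ROAD": "RD", "DRIVE": "DR",
    "SUITE": "STE", "APARTMENT": "APT",
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
}


def standardize_address(line: str) -> str:
    """Uppercase, turn punctuation into spaces, collapse whitespace, abbreviate known words."""
    cleaned = re.sub(r"[^\w\s]", " ", line.upper())
    return " ".join(ABBREVIATIONS.get(token, token) for token in cleaned.split())


print(standardize_address("123 North Main Street, Suite 200"))  # 123 N MAIN ST STE 200
print(standardize_address("123 N. Main St. Ste 200"))           # 123 N MAIN ST STE 200
```

Because it works word by word, this sketch would wrongly shorten a street actually named "North" or "Park Avenue East"; production standardizers parse the address into components before abbreviating.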

Rather than getting too deep into the weeds here, we'll point you to a fine explanation of record linkage in medicine on the National Library of Medicine website.


Benefits of record linkage technology in medicine

Data merged from many sources can provide a more comprehensive view of the patient, allowing researchers to draw more accurate and reliable conclusions about healthcare outcomes. By combining multiple datasets, researchers can gain deeper insight into medical conditions and how treatments affect patients over time. Record linkage also makes it easier to compare health outcomes across different populations, as well as to detect potential errors or risks in patient care.

Additionally, record linkage technology can be used to reduce medical costs and improve efficiency in the healthcare system. By linking administrative databases with clinical data, researchers can better understand why certain treatments cost more than others and identify areas where cost savings can be made. This could lead to improved healthcare decisions, including changes in treatment protocols or resource allocations. 

Record linkage has also been used to analyze the prevalence of medical conditions in various populations, create predictive models for patient care, and identify potential drug interactions. All of these studies have helped to improve our understanding of healthcare outcomes and inform decisions about how best to provide care for different patient groups. 

Researchers at the University of California‐San Francisco used record linkage to combine patient records from different providers and examine how electronic medical records could be used to improve care coordination. 


Challenges in using record linkage technology 

Despite the many potential benefits of record linkage technology, there are still challenges that must be overcome. Lack of standardization between datasets can make it difficult for algorithms to identify matches, and data quality issues can lead to incorrect links or missing information. 

Additionally, privacy concerns arise when combining multiple datasets, as linking patient records can reveal identifying information about individuals. In order to ensure that patient data is kept secure and confidential, there must be safeguards in place to prevent unauthorized access or misuse of the information. This includes developing secure protocols for data sharing, as well as strong regulations for protecting patient privacy.
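One common building block for secure data sharing is to exchange keyed hashes of normalized identifiers instead of the identifiers themselves, so partners can find exact matches without seeing each other's raw data. A minimal sketch using Python's standard `hmac` and `hashlib` modules, assuming the data partners share a secret key out of band; the function name `link_token` and the field choice are hypothetical.

```python
import hashlib
import hmac


def link_token(secret_key: bytes, *fields: str) -> str:
    """Return an HMAC-SHA256 token for the normalized, concatenated identifying fields."""
    # Normalize first so trivial formatting differences don't change the token.
    normalized = "|".join(field.strip().upper() for field in fields)
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"shared-secret-distributed-out-of-band"
print(link_token(key, "Mary Smith ", "1980-01-02") == link_token(key, "mary smith", "1980-01-02"))  # True
```

This only supports exact matching: a single typo produces an entirely different token, and the approach is only as safe as the secrecy of the key, since anyone holding it can hash guessed names and birth dates.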

It is important to consider the ethics of combining multiple datasets in order to identify a single patient. This could lead to potential issues such as discrimination or stigmatization, and researchers must make sure that they are adhering to ethical codes when collecting and analyzing data. 

These issues must be addressed in order to ensure that record linkage technology is used responsibly and efficiently. Solutions such as secure data sharing protocols, improved standards for data quality, and rigorous processes for privacy can help researchers harness the power of record linkage technology while protecting patient privacy.


Examples of recent uses of advanced record linkage technology in medical research

April 18, 2023

How to Use Physician Compare to Extract Free Physician Information

Physician Compare Website
The Physician Compare website is a common and free way to acquire very basic physician data. Not only can you look up information on specific providers using the Physician Compare search tool, you can also download the physician and other clinician data as a set of CSV files. The files contain clinicians' NPI number, name, credentials, practice address, phone number, and specialties, along with some other useful data.

Physician Compare data on the facility affiliations of doctors and clinicians is very sparse; it doesn't even list the name of the facility, only its CCN identifier (CMS Certification Number) and PAC ID (PECOS Associate Control ID). For hospital and other facilities' names, addresses, and other data, you'll have to search for and download numerous other files on the CMS website. CarePrecise acquires these from more than a dozen separate files. Alternatively, you can purchase the CarePrecise Advanced dataset, which includes all of the clinicians' data plus the facilities' data.

Free Physician Data

Within the free Physician Compare data is the Doctors and Clinicians National Downloadable File. The file is too large for Excel, with its 1,048,576-row limit, so you will need software that can handle more records than that, plus a way to integrate the file with the facility data described in the next section, such as a SQL database, Microsoft Access, FileMaker Pro, or a similar relational database environment. (CarePrecise offers it all in an easy-to-use Microsoft Office format.) The file contains the following fields:

  • NPI (National Provider Identifier number)
  • Individual's PAC ID
  • Individual's Medicare Enrollment ID
  • Last Name, First Name, Middle Name, Suffix
  • Gender
  • Credential(s)
  • Medical school (for some)
  • Graduation year (a useful means of inferring approximate age)
  • Primary specialty
  • Secondary specialties
  • Whether the clinician offers telehealth services
  • Name of the group the clinician works with
  • Number of clinicians in the group
  • Practice address fields
  • Phone number
  • Whether the clinician accepts Medicare's approved amount as full payment
  • Whether the affiliated group accepts Medicare's approved amount as full payment
  • Reference Address ID, indicating the specific suite within the same practice address building
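Because the file exceeds Excel's row limit, one workable approach is to stream it into a database a batch at a time. The sketch below uses Python's built-in `csv` and `sqlite3` modules; the column names in `COLUMNS`, the table layout, and the function name are assumptions for illustration, so check them against the header row of the file you actually download.

```python
import csv
import sqlite3
from typing import Iterable

# Hypothetical column subset: check these against the header row of your download.
COLUMNS = ["NPI", "lst_nm", "frst_nm", "pri_spec"]


def load_clinicians(lines: Iterable[str], conn: sqlite3.Connection,
                    batch_size: int = 10_000) -> int:
    """Stream CSV rows into a SQLite table in batches; return the number of rows loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clinicians "
        "(npi TEXT, last_name TEXT, first_name TEXT, primary_specialty TEXT)"
    )
    insert = "INSERT INTO clinicians VALUES (?, ?, ?, ?)"
    batch: list[tuple[str, ...]] = []
    total = 0
    for row in csv.DictReader(lines):
        batch.append(tuple(row[col] for col in COLUMNS))
        if len(batch) >= batch_size:
            conn.executemany(insert, batch)
            total += len(batch)
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany(insert, batch)
        total += len(batch)
    # An index on NPI makes later joins to NPPES or affiliation data fast.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_clinicians_npi ON clinicians (npi)")
    conn.commit()
    return total
```

Pass an open file (opened with `newline=""`, as the `csv` module documentation recommends) as `lines`; only one batch is held in memory at a time. NPIs are stored as text so they are compared exactly as they appear in the file.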

Free Hospital and Other Facility Affiliation data

The Doctors and Clinicians Facility Affiliations file, which indicates the CCN numbers of hospitals and other medical facilities the doctors are affiliated with, contains these fields:

  • Clinician's NPI number
  • Clinician's Individual PAC ID
  • Clinician's name fields
  • Facility type (hospitals, long-term care, rehab, dialysis, etc.)
  • CCN number of the facility
  • CCN number of the parent/primary hospital where the clinician provides service

The file doesn't include the name or address of the facility. This file is too large to be used in Excel, which has a limitation of 1,048,576 rows.
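Since the affiliations file carries only the facility's CCN, getting names and addresses means joining it to a separate facility file on that number. A minimal Python sketch, assuming hypothetical column names (`npi`, `ccn`, `name`, `city`) and hypothetical function names; real CMS files use their own headers. It performs a left join, so affiliations whose CCN isn't found keep `None` for the facility fields and can be counted or reviewed rather than silently dropped.

```python
import csv
from typing import Iterable


def index_facilities(lines: Iterable[str]) -> dict[str, dict]:
    """Build a CCN -> facility lookup from a facility CSV (hypothetical columns: ccn, name, city)."""
    # Keep CCNs as text: many begin with a zero, which an integer conversion would drop.
    return {row["ccn"].strip(): row for row in csv.DictReader(lines)}


def attach_facilities(aff_lines: Iterable[str], facilities: dict[str, dict]) -> list[dict]:
    """Left-join affiliation rows (hypothetical columns: npi, ccn) to facility data by CCN."""
    joined = []
    for aff in csv.DictReader(aff_lines):
        fac = facilities.get(aff["ccn"].strip(), {})
        joined.append({**aff, "facility_name": fac.get("name"), "facility_city": fac.get("city")})
    return joined
```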

Other files available in the Physician Compare download include:

  • Doctors and Clinicians Quality Payment Program PY 2021 Clinician Public Reporting: Overall MIPS Performance
  • Doctors and Clinicians Quality Payment Program PY 2021 Group Public Reporting: MIPS Measures
  • Doctors and Clinicians Quality Payment Program PY 2021 Group Public Reporting: Patient Experience
  • Doctors and Clinicians Quality Payment Program PY 2021 Virtual Group Public Reporting
  • Doctors and Clinicians 2020 Clinician Utilization Data

None of these files include licensed data, such as board certification or residency information.

Conclusion

There is a lot of useful free information in the Physician Compare downloadable files, but pulling it together with the more robust data in the NPI Registry (the National Plan and Provider Enumeration System, or NPPES, file) is more than a little bit difficult, requiring special methods for dealing with the 7.5 million-record file and some relational database chops as well. The hospital affiliations include the CCN, but not the name, address, phone, etc. of the facilities, requiring additional search and extraction steps. For users who have mastered these free files but need the additional data, CarePrecise offers data packages that can easily be linked to the Physician Compare data; alternatively, they can skip downloading and processing the free files themselves and go to CarePrecise for the combined, ready-to-use dataset.
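One of those special methods is simply never loading the whole NPPES file at once. A minimal sketch: stream the file row by row and keep only the NPIs that appear in your Physician Compare extract, plus the columns you need. It assumes the NPPES header names its identifier column `NPI` (verify against your copy); `filter_nppes` and its parameters are hypothetical names.

```python
import csv
from typing import Iterable, Iterator


def filter_nppes(lines: Iterable[str], wanted_npis: set[str], keep: list[str]) -> Iterator[dict]:
    """Stream an NPPES-style CSV one row at a time, yielding only wanted NPIs and columns."""
    for row in csv.DictReader(lines):
        if row["NPI"] in wanted_npis:
            yield {col: row[col] for col in keep}
```

With the Physician Compare NPIs collected in a set, memory use is driven by that set and by the rows you keep, not by the millions of NPPES rows passing through.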

For deeper data on the wide range of U.S. healthcare facilities, CarePrecise also offers the Authoritative Hospital Database, with data on more than 50,000 facilities.

CarePrecise offers its customers free guidance in finding free, downloadable healthcare provider data to fill a wide variety of needs, and works with many research programs that require highly specialized healthcare provider information.

March 23, 2023

CCN and PAC ID to NPI: Crosswalk between the NPI Registry and Hospital and Group Records

The federal Centers for Medicare and Medicaid Services (CMS) publishes a wide range of information on U.S. hospitals, each of which carries a unique identifier, the CCN number (CMS Certification Number)*. On the other hand (which often seems not to know what its counterpart is doing), CMS also publishes the frequently updated NPPES database (National Plan and Provider Enumeration System), commonly known as the NPI Registry, which uses the NPI number (National Provider Identifier) as its unique identifier. While hospitals and other medical organizations will have only one CCN number, they are required to have at least one NPI number, and they're permitted to have as many as they like (and they do seem to like quite a few).

And, between these two ID systems, the CCN and the NPI, ne'er the twain shall meet.

CarePrecise has developed a sophisticated system to "roll up" an organization's NPI-numbered records with its CCN number (and with the PAC ID for practice groups, which stands for "PECOS Associate Control ID"). This mighty trick produces some eye-opening data, such as contact names and titles, license information, specializations, market data added by CarePrecise to NPI records, and the ability to crossmatch groups to their members and hospital affiliations, directly from their NPI numbers. It also permits integration across the complete line of CarePrecise provider data packages, and all of the information that CarePrecise collects or creates and then merges to the NPI records.

Currently, these CarePrecise rollups (or "crosswalks," if you prefer) are the only ones available in a relatively comprehensive dataset. The full rollup of all medical facility NPI numbers is available for hospitals, and a single "priority" NPI number is currently available for practice groups, with a full rollup of all PAC ID-to-NPI linkages in development and a tentative release date of May 2023.

The hospital CCN-to-NPI crosswalk is part of the Authoritative Hospital Database (AHD), and the Group PAC ID-to-NPI link is part of the Authoritative Physician Database (APD) and CarePrecise Platinum.

The "rolling up" is made possible by several CarePrecise innovations, starting with the CoLoCode (co-location code) affixed to almost every provider in the 7 million+ record CarePrecise master reference database. To fill in additional linkages, the Placekey is used. Placekey is a unique "point of interest" identifier, also attached to essentially every one of the 7 million+ CarePrecise provider records, which can readily be used to link data between data suppliers for a variety of purposes.
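To illustrate the general idea of a location-based rollup (not CarePrecise's actual method, which relies on its proprietary CoLoCode and on Placekey), here is a minimal Python sketch: NPIs are grouped under a CCN whenever both carry the same location key. The location key, the function name, and the input format are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def roll_up(npi_locations: Iterable[tuple[str, str]],
            ccn_locations: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Group NPIs under each CCN that shares a location key with them.

    npi_locations: (npi, location_key) pairs; ccn_locations: (ccn, location_key) pairs.
    The location key is a hypothetical stand-in for a co-location or place identifier.
    """
    npis_by_location: dict[str, set[str]] = defaultdict(set)
    for npi, loc in npi_locations:
        npis_by_location[loc].add(npi)
    rollup: dict[str, set[str]] = defaultdict(set)
    for ccn, loc in ccn_locations:  # a CCN may appear at several locations (campuses)
        rollup[ccn] |= npis_by_location.get(loc, set())
    return {ccn: sorted(npis) for ccn, npis in rollup.items()}
```

A real rollup has to resolve cases this sketch ignores, such as a medical office building where several unrelated organizations share one address, so in practice a single shared key is unlikely to be enough on its own.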

* The CMS Certification Number has replaced the terms Medicare Provider Number, Medicare Identification Number, and OSCAR Number. The CCN is used to verify Medicare/Medicaid providers for survey and certification, assessment-related activities, and communications. Note that CarePrecise includes the old OSCAR Number in its CarePrecise Complete and CarePrecise Advanced/Platinum datasets if reported by the provider in their NPI record(s) or available through third parties, but this is a small fraction of records, and the OSCAR numbers have changed, hence the need for a CCN-to-NPI crosswalk.

December 29, 2022

Again: Call for CMS to Release Tax Numbers

It's 2022, and still CMS fails to include healthcare organizations' tax numbers. Whether you call them TINs or EINs, the numbers are not sensitive in any way, and the Centers for Medicare and Medicaid Services should release them. This is a repost of an article from 2010, more than 3 years after the first NPI Registry data was made public, except, of course, for those tax numbers:

____

The NPI Final Rule called for CMS to establish a system that would assign a National Provider Identifier (NPI) number to essentially every healthcare provider in the U.S. (HIPAA "covered entities"): now more than 3 million providers and growing. Great. But it was years before CMS released that data for the industry to use. CarePrecise personnel were at the forefront even back then, calling for CMS to release the data. If necessary, we were ready to fight for it, filing our own request under the Freedom of Information Act (FOIA). Federal agencies can't keep such kinds of data from the public. It's the law. CMS eventually looked at FOIA, and at their provider data, and decided that, sure enough, they were going to have to release it. We and our clients were ecstatic; now the industry would be able to produce the complex crosswalks necessary to actually achieve the efficiencies promised by the Final Rule.

Hurray... except CMS decided not to release one of the most useful data points of all. A provider's federal tax number is hardly a private number. Businesses have to give their tax number on every imaginable type of transaction. Employees see the employer's number on their W-2s. CMS's excuse was that sole proprietors and pretty much all individual practitioners would have to give their Social Security Number, or that busy doctors might type the SSN in the wrong spot. Fair enough, but, as everyone who works with data knows, it's a piece of cake to parse a tax number field to determine whether the number is an SSN or a business tax number. In fact, that's exactly what CMS does in the Other ID fields of the NPPES (National Plan and Provider Enumeration System) database, replacing 000-00-0000 with a string of equals signs.

Instead of just redacting the SSNs, CMS decided it was best just to wipe clean the complete Employer Identification Number (EIN) field -- just in case some uppity docs got... uppity. Many of us have been hoping that CMS would revisit the issue of this gaping hole in the provider data, but it seems that the issue is to be ignored so that it will just go away.

So, here we are, once again, years into it, asking CMS to release non-SSN tax numbers/EINs so that we -- health systems and health plans large and small, clearinghouses, HIT vendors, medical billing and coding vendors -- can make this data do what it was intended to do for healthcare and for the taxpayers.

____

Check out the NPI information at CarePrecise.