FAQ and Answers

Answers to healthcare data management questions from clinicians, IT staff, and others

Glad You Asked!

Our answers to your most frequently asked MSA Healthcare Data Management questions are listed below.

Please contact us directly with other questions or for additional information.

What does the list of PHI identifiers include?

Healthcare data shared by a HIPAA covered entity or business associate, whether internally or externally, must comply with the HIPAA Privacy Rule as defined by the U.S. Department of Health and Human Services. The HIPAA Privacy Rule restricts the unauthorized use and disclosure of protected health information (PHI). The Privacy Rule defines the following eighteen types of individual identifiers as PHI identifiers. These identifiers must be removed or otherwise de-identified before the data is shared. Once the data is de-identified, it is no longer subject to the HIPAA Privacy Rule and can be freely released and shared by the organization.


  • Names
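  • Geographic subdivisions smaller than a state, including street address, city, county, precinct, and 5-digit ZIP code (the first three digits of a ZIP code may be kept if the area formed by all ZIP codes sharing those digits has more than 20,000 people)
  • All elements of dates (except year) directly related to an individual, including birth date, admission date, discharge date, and date of death, and all ages over 89 (which may be grouped into a single category of age 90 or older)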
  • Telephone numbers
  • Fax numbers
  • Email addresses
  • Vehicle identifiers and serial numbers, including license plate numbers
  • Device identifiers and serial numbers
  • Social Security Numbers
  • Web addresses
  • IP addresses
  • Medical Record Numbers
  • Biometric identifiers, including finger and voice prints
  • Health plan beneficiary numbers
  • Full-face photographs and any comparable images
  • Account numbers
  • Professional certificate and/or license numbers
  • Any other unique number, characteristic, or code, except as permitted, that could be used alone or in combination with other information to re-identify an individual
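
As an illustration of how the date, age, and ZIP code rules can be applied to a single record, the following Python sketch keeps only the year of individual dates, groups ages over 89 into a single "90+" category (suppressing the birth year for those patients, since it would reveal the age), and truncates ZIP codes to three digits. It is a simplified example, not the MSA De-Identification Engine: the function names and parameters are hypothetical, and the set of restricted three-digit ZIP prefixes (areas of 20,000 or fewer people, which must be reported as "000") is passed in by the caller rather than hard-coded.

```python
from datetime import date


def age_on(birth_date: date, as_of: date) -> int:
    """Whole years of age on the as_of date."""
    years = as_of.year - birth_date.year
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def generalize_record(
    birth_date: date,
    admission_date: date,
    zip_code: str,
    as_of: date,
    restricted_zip3: frozenset[str],
) -> dict[str, object]:
    """Hypothetical helper: generalize the date, age, and ZIP elements of one record."""
    age = age_on(birth_date, as_of)
    over_89 = age > 89
    zip3 = zip_code[:3]
    return {
        # Ages over 89 collapse into one category, and the birth year is
        # suppressed because it would reveal the exact age.
        "age": "90+" if over_89 else age,
        "birth_year": None if over_89 else birth_date.year,
        # Only the year of other individual dates is kept.
        "admission_year": admission_date.year,
        # Three-digit prefixes covering 20,000 or fewer people become "000".
        "zip3": "000" if zip3 in restricted_zip3 else zip3,
    }
```

Passing the restricted prefixes in as a parameter keeps the sketch independent of any particular census year.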

What de-identification methodologies are outlined by the HIPAA Privacy Rule?

The Privacy Rule outlines two distinct de-identification methodologies: Safe Harbor and Expert Determination.


The Safe Harbor method requires the removal of the eighteen types of HIPAA individual identifiers defined above from the data set. In addition, the organization must have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify an individual. The de-identified data set can then be shared within and between organizations.
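
The removal step of the Safe Harbor method can be sketched in Python as a filter over each record's fields. The field names below are hypothetical and would need to be mapped to each data provider's actual file layout. Removing these fields is only part of Safe Harbor, since dates and geography must also be generalized as described in the identifier list above; the sketch also flags fields nobody has reviewed, which helps catch the "any other unique number, characteristic, or code" category.

```python
# Hypothetical field names standing in for direct identifiers from the
# HIPAA Safe Harbor list; real file layouts will use different names.
DIRECT_IDENTIFIER_FIELDS = frozenset({
    "first_name", "last_name", "street_address", "phone", "fax", "email",
    "ssn", "medical_record_number", "health_plan_id", "account_number",
    "license_number", "vehicle_id", "device_serial", "url", "ip_address",
    "photo", "biometric_id",
})


def drop_direct_identifiers(record: dict[str, object]) -> dict[str, object]:
    """Return a copy of the record without any direct identifier fields."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIER_FIELDS}


def unmapped_fields(record: dict[str, object], reviewed: frozenset[str]) -> list[str]:
    """List fields that are neither known identifiers nor reviewed as safe,
    so unexpected or free-text columns are not released by accident."""
    return sorted(
        k for k in record if k not in DIRECT_IDENTIFIER_FIELDS and k not in reviewed
    )
```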


The Expert Determination method requires that a qualified expert, typically a statistician, analyze and certify the data set. The expert applies generally accepted statistical and scientific methods to determine the risk of re-identification of individuals within the data set. The expert provides a certification document detailing the results of the analysis and guidance on which data fields require de-identification or additional transformation to reduce the risk of re-identification to a very small level. This field-level de-identification and transformation guidance must be implemented before the data set is shared.
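
Expert Determination does not prescribe a single formula; the expert chooses the methods. One common starting point is to group records by their quasi-identifiers (fields such as a three-digit ZIP code, birth year, and gender that are not direct identifiers but can narrow down a person) and look at the size of each group. The Python sketch below computes a maximum and an average re-identification risk under that simple model. The field names are hypothetical, and the model is an illustration, not the analysis a certifying statistician would necessarily perform.

```python
from collections import Counter


def reidentification_risk(
    records: list[dict[str, object]], quasi_identifiers: list[str]
) -> tuple[float, float]:
    """Return (maximum, average) per-record risk, where a record's risk is
    1 / (number of records sharing its quasi-identifier values)."""
    if not records:
        raise ValueError("records must not be empty")
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    max_risk = 1 / min(class_sizes.values())
    # Summing 1/size over every record gives exactly 1 per group, so the
    # mean risk is the number of groups divided by the number of records.
    avg_risk = len(class_sizes) / len(records)
    return max_risk, avg_risk
```

A maximum risk of 1.0 means at least one record is unique on the chosen fields, which is the kind of result that leads to recommendations for further aggregation.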


There are multiple approaches to the de-identification of PHI data. In practice, the approaches may be used alone or in combination. The selection of a de-identification approach should be based on the intended use of the data.

What types of data do you offer?

Through partnerships with leading data providers, the MSA Healthcare Data Management team provides data from disparate sources, including Laboratory Diagnostic data, Medical Claims data, Electronic Medical Record (EMR) clinical data, and Digital Behavioral data, giving customers up to a 360-degree view of the patient journey.

What is the format of the data you have available? Do you have the specific file layouts for each data format?

The format of the data drives the configuration of the MSA De-Identification Engine. It also helps the MSA Healthcare Data Management team determine if the data must be pre-processed prior to de-identification and patient-matching processing.


The most common data formats we receive are fixed-width or delimited flat files. We can also process EDI, HL7, EMR, and other formats, which typically require pre-processing.
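
The two most common formats differ in how field boundaries are found: fixed-width files place each field at set character positions, while delimited files separate fields with a character such as a pipe or comma. The Python sketch below parses both. The layout (field names and positions) is hypothetical and would come from the data provider's file layout specification.

```python
import csv
import io

# Hypothetical fixed-width layout: (field name, start offset, end offset).
FIXED_LAYOUT = [
    ("record_type", 0, 2),
    ("first_name", 2, 17),
    ("last_name", 17, 37),
    ("date_of_birth", 37, 45),
    ("zip_code", 45, 50),
]


def parse_fixed_width(
    line: str, layout: list[tuple[str, int, int]] = FIXED_LAYOUT
) -> dict[str, str]:
    """Slice one fixed-width record into named, whitespace-trimmed fields."""
    return {name: line[start:end].strip() for name, start, end in layout}


def parse_delimited(text: str, delimiter: str = "|") -> list[dict[str, str]]:
    """Parse a delimited file whose first row holds the column names."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
```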

What is the frequency and volume of data that you expect to send to MSA?

The frequency of data sent to MSA Healthcare Data Management varies by data type and may be daily, weekly, monthly, quarterly, or on another schedule. The volume of data also varies by data type.

Do you have HIPAA PHI fields available across all of the data formats?

The MSA De-Identification Engine creates Patient Tokens from the raw PHI fields in the data record. The Patient Tokens are used in the MSA Data@Factory patient matching process to assign an MSA Patient ID to each record.


Raw PHI fields such as first name, last name, and date of birth are commonly used in token creation. These fields should be available across all data types to ensure that tokens are created consistently across all data sources.
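
The MSA De-Identification Engine's token algorithm is patented and is not described here. As a general illustration of the idea only, the Python sketch below builds a token with a keyed hash (HMAC-SHA256) over normalized first name, last name, and date of birth, so the same patient yields the same token in every data source processed with the same secret key, while the token itself does not reveal the raw values. The function names, normalization rules, and key handling are assumptions.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Lowercase, strip accents, and drop non-alphanumeric characters so that
    variants such as "O'Brien " and "obrien" normalize to the same string."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def make_token(secret_key: bytes, *fields: str) -> str:
    """Hypothetical token: HMAC-SHA256 over the normalized fields.

    Dates are assumed to be in a consistent year-month-day order; removing
    punctuation aligns "1980-01-15" with "19800115" but not with "01/15/1980".
    """
    message = "|".join(normalize(f) for f in fields).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()
```

Keeping the secret key out of the shared data is what prevents someone from rebuilding tokens from guessed names and birth dates; it also shows why every source must supply the same fields, or the tokens will not line up.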

What types of data sets does MSA de-identify, integrate, and aggregate?

The MSA Healthcare Data Management team can de-identify, integrate, and aggregate any data set containing the key patient token elements, such as Patient First Name, Last Name, and Date of Birth. Data sets without these token elements can be integrated and aggregated by linking on data elements common to the data sets, such as Plan ID, Product ID, or Claim ID.


Data sets without token data may still require processing through MSA’s De-Identification Engine to consistently encrypt/obfuscate non-token data used to align data sets.
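
When two data sets share a non-token field such as Claim ID, each can be obfuscated with the same deterministic keyed hash and then joined on the obfuscated value, so the raw Claim ID never needs to appear in the aligned output. The Python sketch below shows this with hypothetical field names (claim_id, ndc, paid_amount); HMAC-SHA256 stands in for whatever obfuscation the MSA De-Identification Engine actually applies.

```python
import hashlib
import hmac


def pseudonymize_field(
    records: list[dict[str, object]], field: str, secret_key: bytes
) -> list[dict[str, object]]:
    """Replace one field with a keyed hash; equal inputs give equal outputs.
    The input records are left unchanged."""
    out = []
    for record in records:
        copy = dict(record)
        copy[field] = hmac.new(
            secret_key, str(record[field]).encode("utf-8"), hashlib.sha256
        ).hexdigest()
        out.append(copy)
    return out


def inner_join(
    left: list[dict[str, object]], right: list[dict[str, object]], field: str
) -> list[dict[str, object]]:
    """Join two lists of records on equal values of `field`; on name
    collisions, values from the right-hand record win."""
    right_by_value: dict[object, list[dict[str, object]]] = {}
    for right_rec in right:
        right_by_value.setdefault(right_rec[field], []).append(right_rec)
    return [
        {**left_rec, **right_rec}
        for left_rec in left
        for right_rec in right_by_value.get(left_rec[field], [])
    ]
```

Each data set can be pseudonymized separately, even at different sites, as long as the same key is used; the join then works on the obfuscated values alone.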


The list of data types that MSA de-identifies includes:

  • Rx Claims Data
  • Medical Claims Data
  • Rx and Medical Claims Remit Data
  • EMR Data
  • Digital Behavioral Data
  • Consumer Data
  • SP Hub Data
  • Patient Assistance Plan Data

Can the MSA Healthcare Data Management team integrate disparate data sources in order to provide the patient journey across various data sources?

Yes; our process is as follows:


Each data source is de-identified using the MSA De-Identification Engine to create one or more patient tokens used by the MSA Data@Factory patient matching process. The patient matching process assigns a Patient ID based on customer-defined patient matching business rules. The MSA Data@Factory creates and maintains an anonymous patient database to enable the longitudinal alignment of the disparate data from multiple data sources to provide a complete view of the patient journey.


The MSA Data@Factory can be configured to support multiple complex patient matching strategies, including multi-pass patient matching to account for variations in the data supplied by different data providers.
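
Multi-pass matching can be pictured as one pass over the data per token type, with records that share a token in any pass linked to the same patient. The Python sketch below implements that idea with a union-find structure and hypothetical token field names. It is a simplified illustration, not the MSA Data@Factory, whose customer-defined business rules can be far more selective (for example, not letting a loose token link records that a strict token keeps apart).

```python
def assign_patient_ids(
    records: list[dict[str, object]], token_fields: list[str]
) -> list[int]:
    """Link records that share any token, one pass per token field, and
    return a sequential patient ID for each record."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    for field in token_fields:  # one matching pass per token type
        first_seen: dict[object, int] = {}
        for i, record in enumerate(records):
            token = record.get(field)
            if not token:
                continue  # a missing token cannot be used in this pass
            if token in first_seen:
                union(first_seen[token], i)
            else:
                first_seen[token] = i

    # Number the linked groups in order of first appearance.
    ids: dict[int, int] = {}
    return [ids.setdefault(find(i), len(ids) + 1) for i in range(len(records))]
```

Because links are transitive, a record matched on a loose token in a later pass can pull two groups together, which is why real matching rules are usually tuned per data source.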

What is the size of the patient population across all of the data formats?

The size of the patient population is critical to the HIPAA certification of the de-identified output. Very small patient populations may require data elements to be aggregated to prevent patients in the de-identified data set from being identified through probabilistic matching techniques. For example, in a very small de-identified data set, it may be necessary to use two-digit ZIP codes in place of three-digit ZIP codes in the de-identified output file.


HIPAA certification of the MSA De-Identification Engine requires that each de-identified output format be certified. The certification determines whether any additional aggregation is required for the data set.
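
The ZIP code example above can be sketched as a two-step generalization: three-digit prefixes shared by too few patients are rolled up to two digits, and any two-digit group that is still too small is suppressed. The function name and the minimum group size are hypothetical parameters; in practice the threshold and the fields to aggregate come from the HIPAA certification of each output format.

```python
from collections import Counter
from typing import Optional


def generalize_zip_prefixes(zip_codes: list[str], min_count: int) -> list[Optional[str]]:
    """Release 3-digit ZIP prefixes when enough patients share them, fall back
    to 2 digits otherwise, and suppress (None) groups that are still too small."""
    zip3 = [z[:3] for z in zip_codes]
    counts3 = Counter(zip3)
    rolled = [p if counts3[p] >= min_count else p[:2] for p in zip3]
    # Count again on the released values, since the 2-digit groups only
    # contain the records that were rolled up.
    counts_rolled = Counter(rolled)
    return [p if counts_rolled[p] >= min_count else None for p in rolled]
```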

What platforms do you have available in your environment to run the MSA De-Identification Engine?

The MSA De-Identification Engine is typically installed in a Red Hat Enterprise Linux or Microsoft Windows environment.

What are the MSA de-identification requirements?

A set of PHI data elements must be available for processing by the patented MSA De-Identification Engine. The PHI elements are used to create patient tokens for use by the patient matching process to assign a common Patient ID to the data. A Patient ID cannot be assigned if any of the token fields are not available. Data without a Patient ID cannot be longitudinally aligned in the anonymous patient database.


Typical patient token data requirements include the patient’s First Name, Last Name, Date of Birth, Gender, and Zip Code.
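
Because a Patient ID cannot be assigned when a token field is missing, it can help to check records before processing and set aside incomplete ones for review. The Python sketch below does this for the typical token fields listed above; the field names and function names are hypothetical.

```python
# Hypothetical names for the typical patient token fields.
REQUIRED_TOKEN_FIELDS = ("first_name", "last_name", "date_of_birth", "gender", "zip_code")


def missing_token_fields(
    record: dict[str, object], required: tuple[str, ...] = REQUIRED_TOKEN_FIELDS
) -> list[str]:
    """Return the required token fields that are absent or blank in a record."""
    return [f for f in required if not str(record.get(f) or "").strip()]


def split_by_token_readiness(
    records: list[dict[str, object]],
) -> tuple[list[dict[str, object]], list[dict[str, object]]]:
    """Separate records that can be tokenized from those missing token fields."""
    ready, incomplete = [], []
    for record in records:
        (incomplete if missing_token_fields(record) else ready).append(record)
    return ready, incomplete
```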

What are the MSA De-Identification Engine server requirements?

The MSA De-Identification Engine can be deployed either within a data provider's environment or at MSA. The operating system and hardware requirements listed below apply to data providers choosing to have the MSA De-Identification Engine deployed within their own environment. A Business Associate Agreement (BAA) is required for all data providers opting to send raw PHI to MSA for de-identification within the MSA HITRUST-certified environment.


MSA De-Identification Engine Operating System Specifications

  • Microsoft Windows – 32 and 64 bit
  • Red Hat Enterprise Linux – 64 bit versions 5.6 and above


MSA De-Identification Engine Hardware Specifications

  • Minimum of 20 MB of memory
  • De-identification Engine runs on a single CPU; multi-core processors are not required

Contact Us

MSA Life Sciences: 412-362-2000

Office Hours

Hours: Mon-Fri, 8 am-5 pm

Excluding MSA Holidays