MDPHnet Data Quality and Validation Processes

As part of the installation, configuration, and maintenance of the MDPHnet data systems at data sites, we validate the underlying data, perform routine data quality checks, and assess the performance of algorithms for select conditions.

 

Testing the ESP installation at new sites

Testing follows ESP Data Mart installation and data provisioning. To answer the question "Is the ESP system storing a valid representation of the source data?", a series of tests is performed to confirm that all data processing steps are working as expected. These tests are a form of internal validation. They take place when ESP is installed and implemented at each site and are overseen by the informatics vendor, Commonwealth Informatics Inc. (CII). They include visual inspection of the installation logs and of the logs from initial processing of historic data, along with SQL scripts that provide a basic characterization of the data. Sites and CII work together to resolve any issues identified during this initial stage.

For the three sites currently participating in MDPHnet, this testing was done at the time of installation. 

 

Characterizing the data for new sites

For new sites, once there is a high level of confidence that the ESP system is acquiring and loading data correctly, a more detailed set of SQL queries is run against the data. These queries characterize the distribution of numeric results and the frequency distribution of result categories for data elements that are routinely used (e.g., blood pressure for hypertension-related indicators). For numeric results, this includes, among other things, identifying high and low values, percentile ranges and counts at each percentile, the proportion above or below specified thresholds, and the proportion of missing and null results. For categorical results, the set of categories and the frequency of each are produced. This work is done to identify outliers and any data problems that may exist in the source patient health records. Some of these tests may also uncover data processing errors not identified during the prior stage, but their primary purpose is to answer the question "Does the source data provide valid and consistent indicators of clinically meaningful population parameters for epidemiology?"
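
The kind of characterization described above can be sketched in a few lines. The actual checks are SQL queries run against the ESP data; the Python below is only an illustration of the same summaries. The function names, the use of None to mark a missing result, the chosen percentiles, and the example thresholds are all assumptions made for this sketch, not part of ESP.

```python
from collections import Counter
from statistics import quantiles


def characterize_numeric(values, low=None, high=None):
    """Summarize numeric results; None stands for a missing or null result.

    `low` and `high` are caller-supplied thresholds (hypothetical here).
    """
    present = [v for v in values if v is not None]
    n_total = len(values)
    summary = {
        "n_total": n_total,
        "proportion_missing": (n_total - len(present)) / n_total if n_total else None,
    }
    if not present:
        return summary
    summary["min"] = min(present)
    summary["max"] = max(present)
    if len(present) >= 2:
        # quantiles(n=100) returns 99 cut points: the 1st to 99th percentiles.
        cuts = quantiles(present, n=100, method="inclusive")
        summary["percentiles"] = {p: cuts[p - 1] for p in (1, 5, 25, 50, 75, 95, 99)}
    if low is not None:
        summary["proportion_below_low"] = sum(v < low for v in present) / len(present)
    if high is not None:
        summary["proportion_above_high"] = sum(v > high for v in present) / len(present)
    return summary


def characterize_categorical(values):
    """Return the set of categories with the frequency of each."""
    return Counter("<missing>" if v is None else v for v in values)


# Illustrative systolic readings with one missing value and one extreme value.
systolic = [110, 120, None, 130, 300]
report = characterize_numeric(systolic, low=60, high=250)
```

Reporting the proportions alongside the extremes makes it easier to tell a single stray value from a systematic problem such as a unit mismatch in the source records.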

 

Ongoing data quality review

We have developed queries that are run regularly (quarterly or as needed) at each site to monitor the quality of the underlying data over time. We visually examine longitudinal counts of all encounters, prescriptions, laboratory tests, etc., to look for anomalies at each site individually. For example, we examine graphs of the number of all laboratory tests combined, by month and year. Anomalies or substantive changes may reflect meaningful public health changes, changes in the underlying population assessed, changes in clinical practice or in clinical partners’ operations, or data quality issues. HPHCI epidemiologists review the data and work with CII to investigate issues of concern. In some cases, the sites are contacted to help investigate the underlying cause or identify a solution.
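
The review described here is visual. As a complement, the monthly counts behind such graphs could also be screened automatically, as in the sketch below. This is an assumption-laden illustration and not the MDPHnet procedure: the function names, the six-month window, and the 50% tolerance are hypothetical choices, and any month it flags would still need the epidemiologic review described above.

```python
from collections import Counter
from datetime import date
from statistics import median


def monthly_counts(event_dates):
    """Count events (e.g., laboratory tests) per (year, month)."""
    return Counter((d.year, d.month) for d in event_dates)


def flag_months(counts, window=6, tolerance=0.5):
    """Return months whose count differs from the median of the preceding
    `window` months by more than `tolerance` (as a fraction of that median).

    Limitation: a month with no events at all has no entry in `counts`,
    so it is not flagged by this sketch.
    """
    months = sorted(counts)
    flagged = []
    for i, month in enumerate(months):
        prior = [counts[m] for m in months[max(0, i - window):i]]
        if len(prior) < 3:
            continue  # too little history to form a baseline
        baseline = median(prior)
        if baseline and abs(counts[month] - baseline) / baseline > tolerance:
            flagged.append(month)
    return flagged


# Illustrative counts: May drops sharply relative to the preceding months.
counts = Counter({(2016, 1): 100, (2016, 2): 105, (2016, 3): 98,
                  (2016, 4): 102, (2016, 5): 30, (2016, 6): 101})
suspect = flag_months(counts)
```

A median baseline is used so that one anomalous month does not distort the comparison for the months that follow it.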

 

ESP identifies and excludes outlier values for the assessment of obesity, and recent updates to the hypertension detection algorithms exclude impossible blood pressure measures when assessing control status. We have not identified other conditions or algorithms where outliers cause significant problems for estimation.
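
To illustrate the idea of excluding impossible blood pressure measures, a minimal sketch follows. It does not reproduce the ESP algorithm: the function name, the rule that systolic must exceed diastolic, and the numeric bounds in the example are assumptions chosen only for illustration.

```python
def exclude_implausible(readings, systolic_range, diastolic_range):
    """Split (systolic, diastolic) pairs into kept and excluded lists.

    The ranges are caller-supplied (min, max) bounds; the values used in
    the example below are hypothetical, not the limits ESP applies.
    """
    kept, excluded = [], []
    for systolic, diastolic in readings:
        plausible = (
            systolic_range[0] <= systolic <= systolic_range[1]
            and diastolic_range[0] <= diastolic <= diastolic_range[1]
            and systolic > diastolic
        )
        (kept if plausible else excluded).append((systolic, diastolic))
    return kept, excluded


# Illustrative readings: one usable pair and three that fail a check.
readings = [(120, 80), (0, 0), (900, 80), (80, 120)]
kept, excluded = exclude_implausible(readings, (50, 300), (20, 200))
```

Returning the excluded readings as well as the kept ones lets the number of exclusions be tracked over time as a data quality indicator in its own right.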

 

3/11/17