Developing a Data Warehouse and Analytic Environment to Support Drug Overdose Surveillance and Data Analysis

Abstract

Introduction:

Georgia Department of Public Health (DPH) data exist in siloed datasets across the agency, resulting in unequal data access among epidemiologists across Georgia. Because the data are siloed, all Georgia state and district epidemiologists and program areas must submit individualized data requests. Each request risks the data being cleaned or standardized differently, which can produce inconsistent final products. One approach to reducing the burden of data requests and nonstandard data is a data warehouse: a standardized location for direct data access and downloads. As part of the 2024 CSTE Data Science Team Training (DSTT), a team of DPH epidemiologists and informaticists developed and evaluated a pilot data warehouse for overdose data, which have traditionally been siloed.

Methods:

A multidisciplinary DPH team was formed with subject matter experts in drug surveillance, data pipelines, IT security, and dashboarding. The team identified vital records death data, hospital discharge data, and State Unintentional Drug Overdose Reporting System (SUDORS) datasets. It then built a pipeline to extract, transform, and load (ETL) the data so they could be visualized and accessed. Data governance and standard operating procedure documents were also created.
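The ETL flow described above can be illustrated with a minimal Python sketch. It assumes a hypothetical CSV export with columns record_id, source, event_date, and county. It also uses an in-memory SQLite table, overdose_events, as a stand-in for the warehouse. Both are hypothetical: this abstract does not describe the actual DPH source layouts, schemas, or warehouse platform.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical raw extract mixing sources, date formats, and county spellings.
RAW_EXTRACT = """record_id,source,event_date,county
1,vital_records,03/14/2023, fulton
2,hospital_discharge,2023-03-15,  DeKalb
3,sudors,03/20/2023,FULTON
"""


def extract(text):
    """Extract: read raw rows from a CSV export into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_date(value):
    """Accept MM/DD/YYYY or ISO dates and return an ISO date string."""
    value = value.strip()
    for fmt in ("%m/%d/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


def transform(rows):
    """Transform: apply the same standardization to every source's rows."""
    return [
        (
            int(r["record_id"]),
            r["source"].strip(),
            parse_date(r["event_date"]),
            r["county"].strip().upper(),  # trim whitespace, normalize case
        )
        for r in rows
    ]


def load(conn, records):
    """Load: write standardized rows into one shared (hypothetical) table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS overdose_events ("
        "record_id INTEGER, source TEXT, event_date TEXT, county TEXT)"
    )
    conn.executemany("INSERT INTO overdose_events VALUES (?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_EXTRACT)))
```

Running every source through one transform function keeps cleaning consistent across requests. In a real warehouse, the key would likely need to combine source and record_id, because IDs from different systems can collide.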

Results:

The pipelines for these datasets process 200 records per day and more than 10 million records per year. State epidemiologists gained direct access to these datasets and to the analytic environment. Completion time for data requests fell from 23.5 days to on demand. Six distinct cleaning and standardization steps were used to format summary data, such as fatal overdoses by district. The data are now enriched with spatial information, such as residence XY coordinates and census tract, through geocoding.
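A summary such as fatal overdoses by district can be sketched as a cleaning step followed by an aggregation. Several elements are hypothetical placeholders: the county-to-district lookup, the field names county and fatal, and the single cleaning rule shown. They do not represent the six steps or the district mapping used in the project.

```python
from collections import Counter

# Hypothetical county-to-district lookup; the real DPH mapping is not given here.
COUNTY_TO_DISTRICT = {
    "FULTON": "District A",
    "DEKALB": "District B",
    "COBB": "District C",
}


def standardize_county(raw):
    """Example cleaning step: trim, upper-case, and drop a trailing ' COUNTY'."""
    name = raw.strip().upper()
    if name.endswith(" COUNTY"):
        name = name[: -len(" COUNTY")]
    return name


def fatal_overdoses_by_district(records):
    """Count fatal overdose records per district after standardization.

    Records whose county cannot be mapped are counted under 'Unknown' so they
    stay visible for QA/QC review instead of being silently dropped.
    """
    counts = Counter()
    for rec in records:
        if not rec["fatal"]:
            continue
        county = standardize_county(rec["county"])
        counts[COUNTY_TO_DISTRICT.get(county, "Unknown")] += 1
    return dict(counts)


sample = [
    {"county": "Fulton County", "fatal": True},
    {"county": " fulton", "fatal": True},
    {"county": "DeKalb", "fatal": False},
    {"county": "Cobb ", "fatal": True},
    {"county": "Not Reported", "fatal": True},
]
print(fatal_overdoses_by_district(sample))
```

In this sketch, "Fulton County" and " fulton" collapse to the same district only because of the cleaning step. This is the kind of inconsistency that standardized pipelines are meant to remove from individual data requests.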

Conclusion:

Data warehousing increases timely and equitable access to data and adds QA/QC controls that enable standardization across all data requests. A multidisciplinary team and sustained collaboration are required to build an effective tool and to sustain the project over time.

Keywords

Data Warehouse, Data Modernization, Drug Surveillance
