Term of Award
Master of Science in Computer Science (M.S.)
Document Type and Release Option
Thesis (restricted to Georgia Southern)
Copyright Statement / License for Reuse
Digital Commons@Georgia Southern License
Department of Computer Sciences
A data warehouse (DW) integrates data from various heterogeneous data sources and creates a consolidated view of the data that is optimized for reporting and analysis. Today, business and technology are constantly evolving, which directly affects the data sources: new data sources can emerge while others can become unavailable. The DW, or the data mart that is based on these data sources, needs to reflect these changes. Various solutions for adapting a data warehouse to changes in the data sources and the business requirements have been proposed in the literature (Subotic, Poscic, Poscic, & Jovanovic, 2014). However, research on the problem of DW evolution has focused mainly on managing changes in the dimensional model; other aspects, related to the extract, transform, load (ETL) process and to maintaining the history of changes, have not been addressed. As a solution to this problem, we propose a Meta-Data Vault model that includes a data vault-based data warehouse and a master data management (MDM) component. A major focus of this research is to keep both the history of changes and a “single version of the truth” through an MDM integrated with the DW. We also present the load patterns used to load data into the data warehouse and the materialized views used to deliver data to end users. To test our solution, we used big data sets from the biomedical field. For each modification of the data source schema, we outline the changes that need to be made to the enterprise data warehouse (EDW), the data marts, and the ETL.
Naamane, Zaineb, "A Meta-Data Vault Approach for Evolutionary Integration of Big Data Sets: Case Study Using the NCBI Database for Genetic Variation" (2017). Electronic Theses and Dissertations. 1568.