AI-Assisted Data Modeling: Intelligent Star, Snowflake, and Hybrid Schema Generation for Large-Scale Warehouses
Abstract
Designing efficient data warehouse schemas (star, snowflake, and hybrid models) has traditionally been a manual, expertise-driven process that requires deep knowledge of business processes, data dependencies, and query performance optimization. With the rapid growth of data sources and the shift toward real-time analytics, traditional modeling methodologies are limited in scalability, accuracy, and development speed. This research introduces an AI-Assisted Data Modeling Framework that automates warehouse schema design using clustering algorithms, NLP-assisted semantic understanding, machine-learning-based pattern identification, and rule-based optimizations. The proposed system extracts metadata, identifies facts and dimensions, detects hierarchies, and selects a schema type based on analytical workloads, data cardinality, and normalization requirements. A detailed case study on a retail enterprise demonstrates improvements in schema-design time, structural accuracy, dimensional hierarchy detection, and performance prediction.
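To make the rule-based part of this pipeline concrete, the following is a minimal Python sketch of how fact/dimension identification and schema-type selection could work from table metadata alone. It is an illustration under stated assumptions, not the framework's actual implementation: the class names, the helper functions classify_table and choose_schema, the thresholds, and the sample retail tables are all hypothetical, and the clustering, NLP, and learned components mentioned above are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    dtype: str                  # one of "int", "float", "str", "date"
    distinct: int               # number of distinct values observed
    is_foreign_key: bool = False


@dataclass
class Table:
    name: str
    row_count: int
    columns: list


def classify_table(table):
    """Label a table 'fact' or 'dimension' (hypothetical heuristic).

    Assumption: a fact table references at least two other tables and
    carries at least one numeric, non-key column that can act as a measure.
    """
    fk_count = sum(1 for c in table.columns if c.is_foreign_key)
    measure_count = sum(
        1 for c in table.columns
        if c.dtype in ("int", "float") and not c.is_foreign_key
    )
    return "fact" if fk_count >= 2 and measure_count >= 1 else "dimension"


def choose_schema(dimensions, large_dim_rows=100_000, hierarchy_ratio=0.01):
    """Pick 'star', 'snowflake', or 'hybrid' from dimension metadata.

    Assumption: a dimension is worth normalizing only if it is large and
    has a low-cardinality text attribute, which hints at a hierarchy level
    (for example, product -> category). Both thresholds are illustrative.
    """
    to_normalize = []
    for dim in dimensions:
        has_hierarchy = any(
            c.dtype == "str" and c.distinct <= hierarchy_ratio * dim.row_count
            for c in dim.columns
        )
        if dim.row_count >= large_dim_rows and has_hierarchy:
            to_normalize.append(dim.name)

    if not to_normalize:
        return "star", to_normalize
    if len(to_normalize) == len(dimensions):
        return "snowflake", to_normalize
    return "hybrid", to_normalize


# Hypothetical retail metadata, invented for illustration only.
sales = Table("sales", 50_000_000, [
    Column("product_id", "int", 500_000, is_foreign_key=True),
    Column("store_id", "int", 200, is_foreign_key=True),
    Column("amount", "float", 1_000_000),
])
product = Table("product", 500_000, [
    Column("product_id", "int", 500_000),
    Column("category", "str", 50),
])
store = Table("store", 200, [
    Column("store_id", "int", 200),
    Column("region", "str", 5),
])

tables = [sales, product, store]
dimensions = [t for t in tables if classify_table(t) == "dimension"]
schema_type, normalized = choose_schema(dimensions)
print(schema_type, normalized)
```

With this sample metadata, only the large product dimension is flagged for normalization, so the sketch selects a hybrid schema: the small store dimension stays denormalized as in a star schema. A real system would presumably replace the fixed thresholds with values derived from the workload and from measured cardinalities.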