A COMPARISON OF DATA WAREHOUSE DESIGN MODELS

A MASTER'S THESIS in Computer Engineering, Atilim University

by BERIL PINAR BAŞARAN

JANUARY 2005

A COMPARISON OF DATA WAREHOUSE DESIGN MODELS

A THESIS SUBMITTED TO THE GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES OF ATILIM UNIVERSITY BY BERIL PINAR BAŞARAN IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN THE DEPARTMENT OF COMPUTER ENGINEERING

JANUARY 2005

Approval of the Graduate School of Natural and Applied Sciences

Prof. Dr. Ibrahim Akman, Director

I certify that this thesis satisfies all the requirements as a thesis for the degree of Master of Science.

Prof. Dr. Ibrahim Akman, Head of Department

This is to certify that we have read this thesis and that in our opinion it is fully adequate, in scope and quality, as a thesis for the degree of Master of Science.

Prof. Dr. Ali Yazici, Co-Supervisor
Dr. Deepti Mishra, Supervisor

Examining Committee Members: Prof. Dr. Ali Yazici, Dr. Deepti Mishra, Asst. Prof. Dr. Nergiz E. Çağıltay, Dr. Ali Arifoğlu, Asst. Prof. Dr. Çiğdem Turhan

ABSTRACT

A COMPARISON OF DATA WAREHOUSE DESIGN MODELS

Başaran, Beril Pınar
M.S., Computer Engineering Department
Supervisor: Dr. Deepti Mishra
Co-Supervisor: Prof. Dr. Ali Yazici
January 2005, 90 pages

There are a number of approaches to designing a data warehouse, in both the conceptual and the logical design phases. The generally accepted conceptual design approaches are the dimensional fact model, the multidimensional E/R model, the starER model and the object-oriented multidimensional model. In the logical design phase, the flat schema, terraced schema, star schema, fact constellation schema, galaxy schema, snowflake schema, star cluster schema and starflake schema are widely used approaches. This thesis presents a comparison of both the conceptual and the logical design models, and a sample data warehouse design and implementation is provided. It is observed that in the conceptual design phase the object-oriented model provides the best solution, and that in the logical design phase the star schema is generally the best in terms of performance while the snowflake schema is generally the best in terms of redundancy.

Keywords: Data Warehouse, Design Methodologies, DF, ME/R, starER, OOMD, DTS, Data Analyzer

ÖZ

VERİ AMBARI TASARIM MODELLERİ KARŞILAŞTIRMASI

Başaran, Beril Pınar
Master's thesis, Computer Engineering Department (Bilgisayar Mühendisliği Bölümü)
Supervisor: Dr. Deepti Mishra
Co-Supervisor: Prof. Dr. Ali Yazici
January 2005, 90 pages

There is more than one approach for the conceptual and the logical design phases of data warehouse design. The generally accepted approaches for the conceptual design phase are the "dimensional fact", "multidimensional E/R", "starER" and "object-oriented multidimensional" models. The generally accepted approaches for the logical design phase are the "flat", "terraced", "star", "fact constellation", "galaxy", "snowflake", "star cluster" and "starflake" schemas. This thesis compares the conceptual and the logical design models and includes a sample data warehouse design and its implementation. It was observed that the "object-oriented multidimensional" model provides the best solution in the conceptual design phase, and that in the logical design phase the "star" schema is the best with respect to the performance criterion while the "snowflake" schema is the best with respect to the data redundancy criterion.

Keywords: Data Warehouse, Design Methodologies, DF, ME/R, starER, OOMD, DTS, Data Analyzer
To my dear husband. Thanks for his endless support.

ACKNOWLEDGEMENTS

First, I would like to thank my thesis advisor Dr. Deepti MISHRA and co-supervisor Prof. Dr. Ali YAZICI for their guidance, insight and encouragement throughout the study. I should also express my appreciation to the examining committee members Asst. Prof. Dr. Nergiz E. ÇAĞILTAY, Dr. Ali ARIFOĞLU and Asst. Prof. Dr. Çiğdem TURHAN for their valuable suggestions and comments. I would also like to express my thanks to my husband for his assistance, encouragement and support, and to all members of my family for their patience, sympathy and support during the study.

TABLE OF CONTENTS

ABSTRACT
ÖZ
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

1 INTRODUCTION
1.1. Scope and outline of the thesis
2 DATA WAREHOUSE CONCEPTS
2.1. Definition of Data Warehouse
2.2. Why OLAP systems must run with OLTP
2.3. Requirements for Data Warehouse Database Management Systems
3 FUNDAMENTALS OF DATA WAREHOUSE
3.1. Data acquisition
3.1.1. Extraction, Cleansing and Transformation Tools
3.2. Data Storage and Access
3.3. Data Marts
4 DESIGNING A DATA WAREHOUSE
4.1. Beginning with Operational Data
4.2. Data/Process Models
4.3. The DW Data Model
4.3.1. High-Level Modeling
4.3.2. Mid-Level Modeling
4.3.3. Low-Level Modeling
4.4. Database Design Methodology for DW
4.5. Conceptual Design Models
4.5.1. The Dimensional Fact Model
4.5.2. Multidimensional E/R Model
4.5.3. starER
4.5.4. Object-Oriented Multidimensional Model (OOMD)
4.6. Logical Design Models
4.6.1. Flat Schema
4.6.2. Terraced Schema
4.6.3. Star Schema
4.6.4. Fact Constellation Schema
4.6.5. Galaxy Schema
4.6.6. Snowflake Schema
4.6.7. Star Cluster Schema
4.6.8. Starflake Schema
4.6.9. Cube
4.7. Meta Data
4.8. OLAP Server Architectures
4.9. Materialized views
4.10. Discussion on Data Warehousing Design Tools
5 COMPARISON OF MULTIDIMENSIONAL DESIGN MODELS
5.1. Comparison of Dimensional Models and ER Models
5.2. Comparison of Dimensional Models and Object-Oriented Models
5.3. Comparison of Conceptual Multidimensional Models
5.4. Comparison of Logical Design Models
6 IMPLEMENTING A DATA WAREHOUSE
6.1. A Case Study
6.2. Dimensional Model Design
6.3. starER Approach
6.4. ME/R Approach
6.5. DF Approach
6.6. Implementation Details
7 CONCLUSIONS AND FUTURE WORK
7.1. Contributions of the Thesis
7.2. Future Work
REFERENCES

LIST OF TABLES

Table 2.1 Comparison of OLTP and OLAP
Table 4.1 2-dimensional pivot view of an OLAP Table
Table 4.2 3-dimensional pivot view of an OLAP Table
Table 5.1 Comparison of ER, DM and OO methodologies
Table 5.2 Comparison of conceptual design models
Table 5.3 Comparison of logical design models

LIST OF FIGURES

Figure 2.1 Consolidation of OLTP information
Figure 2.2 Same attribute with different formats in different sources
Figure 2.3 Simple comparison of OLTP and DW systems
Figure 3.1 Architecture of DW
Figure 4.1 Data Extraction
Figure 4.2 Data Integration
Figure 4.3 Same data, different usage
Figure 4.4 A Simple ERD for a manufacturing environment
Figure 4.5 Corporate ERD created by departmental ERDs
Figure 4.6 Relationship between ERD and DIS
Figure 4.7 Midlevel model members
Figure 4.8 A Midlevel model sample
Figure 4.9 Corporate DIS formed by departmental DISs
Figure 4.10 An example of a departmental DIS
Figure 4.11 Considerations in low-level modeling
Figure 4.12 A dimensional fact schema sample
Figure 4.13 The graphical notation of ME/R elements
Figure 4.14 Multiple cubes sharing dimensions on different levels
Figure 4.15 Combining ME/R notations with E/R
Figure 4.16 Notation used in starER
Figure 4.17 A sample DW model using starER
Figure 4.18 Flat Schema
Figure 4.19 Terraced Schema
Figure 4.20 Star Schema
Figure 4.21 Fact Constellation Schema
Figure 4.22 Galaxy Schema
Figure 4.23 Snowflake Schema
Figure 4.24 Star Schema with "fork"
Figure 4.25 Star Cluster Schema
Figure 4.26 Starflake Schema
Figure 4.27 Comparison of schemas
Figure 4.28 3-D Realization of a Cube
Figure 4.29 Operations on a Cube
Figure 6.1 ER model of sales and shipping systems
Figure 6.2 Use case diagram of sales and shipping system
Figure 6.3 Statechart diagram of sales and shipping system
Figure 6.4 Static structure diagram of sales and shipping system
Figure 6.5 Sales subsystem starER model
Figure 6.6 Shipping subsystem starER model
Figure 6.7 Sales subsystem ME/R model
Figure 6.8 Shipping subsystem ME/R model
Figure 6.9 Sales subsystem DF model
Figure 6.10 Shipping subsystem DF model
Figure 6.11 Snowflake schema for the sales subsystem
Figure 6.12 Snowflake schema for the shipping subsystem
Figure 6.13 General architecture of the case study
Figure 6.14 Sales DTS Package
Figure 6.15 Shipping DTS Package
Figure 6.16 Transformation details for delimited text file
Figure 6.17 Transact-SQL query as the transformation source
Figure 6.18 Pivot Chart using Excel as client
Figure 6.19 Pivot Table using Excel as client
Figure 6.20 Data Analyzer as client

LIST OF ABBREVIATIONS

3GL - Third Generation Language
4GL - Fourth Generation Language
DAG - Directed Acyclic Graph
DB - Database
DBMS - Database Management Systems
DDM - Data Dimensional Modeling
DF - Dimensional Fact
DIS - Data Item Set
DSS - Decision Support System
DTS - Data Transformation Services
DW - Data Warehouse
ER - Entity Relationship
ERD - Entity Relationship Diagram
ETL - Extract, Transform, Load
HOLAP - Hybrid OLAP
I/O - Input/Output
IT - Information Technology
ME/R - Multidimensional E/R
MOLAP - Multidimensional OLAP
ODBC - Open Database Connectivity
OID - Object Identifier
OLAP - Online Analytical Processing
OLTP - Online Transaction Processing
OO - Object Oriented
OOMD - Object Oriented Multidimensional
RDBMS - Relational Database Management Systems
ROLAP - Relational OLAP
SQL - Structured Query Language
UML - Unified Modeling Language
XML - Extensible Markup Language

CHAPTER 1

INTRODUCTION

Information is an asset that provides benefit and competitive advantage to any organization. Companies desire to increase the value of their organizational data by turning it into actionable information. As the amount of organizational data increases, it becomes harder to access it and to get the most information out of it, because it is in different formats, exists on different platforms and resides in different structures. Moreover, corporate decision-makers require access to all of the organization's data, at any level. Data warehousing provides an excellent approach for transforming operational data into useful and reliable information to support the decision-making process, and it also provides the basis for data analysis techniques such as data mining and multidimensional analysis. The data warehousing process involves extracting data from heterogeneous data sources; cleaning, filtering and transforming the data into a common structure; and storing it in a structure that is easily accessed and used for reporting and analysis purposes.
Today, nearly every corporation has a relational database management system that is used for the organization's daily operations. Without a warehouse, organizations have to write and maintain several programs to consolidate this data for analysis and reporting, which may mean modifying existing consolidation programs or developing new ones; this process is costly, inefficient and time consuming for an organization.

As the need for building an organizational data warehouse is clear, the question now is how to build one. There are generally accepted design methodologies for designing and implementing a data warehouse. The focus of this thesis is the discussion of the data warehouse conceptual and logical design models and the comparison of these approaches.

1.1. Scope and outline of the thesis

The thesis is organized as follows. Chapter 2 presents an overview of data warehouse concepts and makes a comparison between operational and analytical processing systems. Chapter 3 provides information on data warehousing fundamentals and the data warehousing process. Chapter 4 gives information on the data warehouse design approaches used in the conceptual and logical design phases. In Chapter 5, the design approaches described in Chapter 4 are discussed and compared. Finally, in Chapter 6, a sample conceptual model is implemented logically using the logical design models and the physical implementation of a data warehouse is described.

CHAPTER 2

DATA WAREHOUSE CONCEPTS

2.1. Definition of Data Warehouse

A data warehouse (DW) refers to a database that is kept separate from the organization's Online Transaction Processing (OLTP) database and that is used for the analysis of consolidated historical data. According to Barry Devlin, IBM Consultant, "a DW is simply a single, complete and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context" [1, 3]. According to W. H. Inmon, "a DW is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process" [2, 6, 10, 11]. The description of the four key features of the DW is given below.

Subject-oriented: In general, an enterprise contains information that is very detailed, in order to meet all the requirements of the related subsets of the organization (the sales department, human resources department, marketing department, etc.), and that is optimized for transaction processing. Usually this type of data is not suitable for decision-makers to use; decision-makers need subject-oriented data. The data in the warehouse should be organized based on subject, and only subject-oriented data should be moved into the warehouse: the DW should include only key business information.

Integrated: A DW is an architecture constructed by integrating data from multiple heterogeneous sources (such as relational databases (DB), XML data, Excel sheets, flat files and data from legacy systems) to support structured and/or ad hoc queries,
analytical reporting and decision making.If the decision-maker needs to find all information about a spesific product.1. all the key information must be consolidated in a warehouse and organized into subject areas as illustrated in Figure 2. which is not the preferable and the practical way. Nonvolatile: Data in the warehouse are not updated or changed (see Figure 2. Figur e 2. excel sheets. order sales system and catalog sales system. Figur e 2.2 emphasizes various uses and formats of “Product Code” attribute. A DW generally stores data that is 5-10 years old. Every key structure in the DW contains. either implicitly or explicitly. Instead. Figure 2. 2. DWs usually do not keep as much detail as transactionoriented systems. whereas data warehouses may retain data for years. It contains large amount of data. Why OLAP systems must r un with OLTP In this section.The operations needed in the DW are initial loading of data and access of data and refresh. It is subject-oriented. 5 . Figur e 2. I aim to make a comparison of OLTP and Online Analytical Processing (OLAP) systems and explain the reasons why an OLAP system is needed.3 Simple compar ison of OL TP and DW systems Some of the DW characteristics are given below. Data is summarized.            It is a database that is maintained separately from organization’s operational databases. It supports information processing by consolidating historical data. It is non-volatile. It is updated infrequently but periodically updates are required to keep the warehouse meaningful and dynamic. Data is longer-lived.2. User interface aimed at decision-makers. Data is stored in a format that is structured for querying and analysis. Transaction systems may retain data only until processing is complete. It allows for integration of various application systems. The nature of OLTP and OLAP systems are completely different both in technical and in business needs. customeroriented. index/hash on primary key. lots of scans. The following table compares OLTP systems OLAP systems in main technical topics OLTP User and System Or ientation Thousands users. fact constellation model and a subject-oriented database design. used for data analysis by knowledge workers Manages large amounts of historical data. provides facilities for summarization and aggregation. or application-oriented database design. marketoriented. 6 . used for transactions and querying clerks. stores information at different levels of granularity to support decision making process Data is continuously updated Data is volatile and normalized (EntityRelationship (ER) Model) Database Design Data is refreshed Data is non-volatile and denormalized (Dimensional Model) Adopts an ER model and an Adopts star. clients and Information Technology (IT) professionals Data Contents Manages current data. snowflake. very detail-oriented OLAP Hundreds users. summarized. OLAP requires historical data.   To gain high performance of both systems by proper data organization (DB design). day to day activities. simple transactions. flat relational. 7 . and uses of the data. atomic. long term informational requirements. detailed. mostly reads. although many could be complex queries. integrates information from many organizational locations and data stores. locking and logging. The following list compares the main reasons for using a DW. complex query.View Focuses on the current data within an enterprise or department. Mostly read-only operations. OLTP requires the current data. contents. 
requires concurrency control and recovery mechanisms. Table 2. mostly updates. Data in OLTP might be “dirty” because it is collected by clerks that may make mistakes and for other reasons. OLTP deals with many records at ones. Data that goes into OLAP should be cleaned and standardized. decision support. OLTP deals with transactions. concurrency. Data Cleanness.1 Comp ar ison of OLTP and OLAP It may seem questionable to implement a DW system for companies running their business on OLTP systems. Spans multiple versions of a database schema due to the evolutionary process of an organization.   Different structures. Transaction performance of OLTP and selection performance of OLAP would be in conflict. multidimensional. Access Patter ns Short. a DW solution has many advantages and benefits to an organization. While an OLTP database management systems (DBMS) must only consider transaction 8 . They may provide different functionality and use different types on queries. 6]. Also implementing a DW solution solves some business problems. High query performance Does not interfere with local processing at sources Information copied at warehouse (can modify. it may bring some new self-owned problems mentioned below [2.        Underestimation of resources for data loading Hidden problems with source systems Required data not captured Increased end-user demands High maintenance Long duration projects Complexity of integration Data homogenization High demand for resources Data ownership    2. OLTP and OLAP systems need to run different types of queries. restructure. etc. The main roles in a company that will use a DW solution are [4]. 3.           Top executives and decision makers Middle/operational managers Knowledge workers Non-technical business related individuals The main advantages of using a DW solution are summarized in the list below [2. 6]. many technical points must be considered. Requir ements for Data War ehouse Database Management Systems In the implementation of a DW solution.) Potential high Return on Investment Competitive advantage Increase productivity of corporate decision makers As discussed above.3. summarize. filtering. War ehouse administr ation: Easy-to-use and flexible administrative tools should exists for data warehouse administration. 9 . Mass user scalability: The data warehouse RDBMS should be able to support hundreds of concurrent users. indexing and reformatting may be necessary during loading data into the data warehouse. and with support of thousands of transactions per second) The relational DBMS (RDBMS) suitable for data warehousing has the following requirements [6].processing performance (which is basically. This process should be executed as a single unit of work. a transaction must be completed in the minimum time. Ter abyte scalability: The data warehouse RDBMS should not have any database size limitations and should provide recovery mechanisms.      Quer y Per for mance: Complex queries must complete in acceptable periods.  Load pr ocessing: Data conversion.  Data quality m anagement: The warehouse must ensure consistency and referential integrity despite various data sources and big data size. The measure of success for a data warehouse is the ability to satisfy business needs. without deadlocks. Advanced quer y functionality: The data warehouse RDBMS should supply advanced analytical operations to enable end-users perform advanced calculations and analysis. 
 Load per for m ance : Data warehouses need incremental loading of data periodically so the load process performance should be like gigabytes of data per hour. Figur e 3. accessing and maintaining the data warehouse. Data coming from both internal and external sources in various formats and structures is consolidated and integrated into a single repository.CHAPTER 3 FUNDAMENTALS OF DATA WAREHOUSE The main reason for building a DW is to improve the quality of information in the organization.1 Ar chitectur e of DW 10 . DW system comprises the data warehouse and all components used for building. A general architecture of a DW is given in Figure 3.1 and the main components are described below [5, 32]. The data import and preparation component is responsible for data acquisition. It includes all programs (like Data Transformation Services (DTS)) that are responsible for extracting data from operational sources, preparing and loading it into the warehouse. The access component includes all applications (like OLAP) that use the information stored in the warehouse. Additionally, a metadata management component is responsible for the management, definition and access of all different types of metadata. Metadata is defined as “data describing the meaning of data”. In data warehousing, there are various types of metadata, e.g., information about the operational sources, the structure and semantics of the data warehouse data, the tasks performed during the construction, the maintenance and access of a data warehouse, etc. Implementing a DW is a complex task containing two major phases. In the configuration phase, a conceptual view of the warehouse is first specified according to user requirements (DW design). Then, the related data sources and the Extraction-LoadTransform (ETL) process (data acquisition) are determined. Finally, decisions about persistent storage of the warehouse using database technology and the various ways data will be accessed during analysis are made. After the initial load (the first load of the DW according to the configuration), during the operation phase, warehouse data must be regularly refreshed, i.e., modifications of operational data since the last DW refreshment must be propagated into the warehouse such that data stored in the data warehouse reflect the state of the underlying operational systems. A more natural way to consider multidimensionality of warehouse data is provided by the multidimensional data model. In this model, the data cube is the basic modeling construct. Operations like pivoting (rotate the cube), slicing-dicing (select a subset of the cube), roll-up and drill-down (increasing and decreasing the level of aggregation) can be applied to a data cube. For the implementation of multidimensional databases, there are two main approaches. In the first approach, extended RDBMSs, called relational OLAP 11 (ROLAP) servers, use a relational database to implement the multidimensional model and operations. ROLAP servers provide SQL extensions and translate data cube operations to relational queries. In the second approach, multidimensional OLAP (MOLAP) servers store multidimensional data in non-relational specialized storage structures. These systems usually precompute the results of complex operations (during storage structure building) in order to increase performance. 3.1. Data acquisition Data extraction is one of the most time-consuming tasks of DW development. 
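Much of this time goes into writing and maintaining many small, repetitive transformation steps. As a rough illustration of the kind of work involved (this sketch is not from the thesis; all table and column names are hypothetical), a Transact-SQL fragment of the sort typically embedded in a DTS task might copy newly changed rows from a source system into a staging table while normalizing a product code that the source stores in its own format:

    -- Copy rows changed since the last load into the staging area,
    -- normalizing the product code to a single "ProdNNNN" format.
    DECLARE @LastLoad datetime
    SET @LastLoad = '20041231'   -- in practice, read from warehouse metadata

    INSERT INTO Staging_Sales (ProductCode, Quantity, SalesDate)
    SELECT 'Prod' + RIGHT('0000' + REPLACE(REPLACE(ProductCode, 'P-', ''), ' ', ''), 4),
           Quantity,
           SalesDate
    FROM   OrderSales_Source
    WHERE  LastModified > @LastLoad

Fragments like this are simple individually; the cost comes from writing, scheduling and maintaining many of them against many heterogeneous sources.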
Data consolidated from heterogenous systems may have problems, and may need to be first transformed and cleaned before loaded into the DW. Data gathered from operational systems may be incorrect, inconsistent, unreadable or incomplete. Data cleaning is an essential task in data warehousing process in order to get correct and qualitative data into the DW. This process contains basically the following tasks: [5]     converting data from heterogenous data sources with various external representations into a common structure suitable for the DW identifying and eliminating redundant or irrelevant data transforming data to correct values (e.g., by looking up parameter usage and consolidating these values into a common format) reconciling differences between multiple sources, due to the use of homonyms (same name for different things), synonyms (different names for same things) or different units of measurement As the cleaning process is completed, the data that will be stored in the warehouse must be merged and set into a common detail level containing time related information to enable usage of historical data. Before loading data into the DW, tasks like filtering, sorting, partitioning and indexing may need to be performed. After these processes, the consolidated data may be imported into the DW using one of bulk data loaders, a custom application or an import/export wizard provided by the DBMS administration applications. 12 3.1.1. Extr action, Cleansing and Tr ansfor mation Tools The tasks of capturing data from a source system, cleansing, and transforming the data and loading the consolidated data into a target system can be done either by separate products or by a single integrated solution. Integrated solutions fall into one of the following categories [6]:    Code generators Database data replication tools Dynamic transformation engines There are solutions that fulfill all of the requirements mentioned above. One of these products is Microsoft® Data Transformation Services is described in chapter 6. Code gener ator s Code generators create customized 3GL, 4GL transformation programs based on source and target data definitions. The main issue with this approach is the management of the large number of programs required to support a complex corporate DW. Database data r eplication tools Database data replication tools employ database triggers or a recovery log to capture changes to a single data source on one system and apply the changes to a copy of the source data located on a different system. Most replication products don’t support the capture of changes to non-relational files and databases and often not provide facilities for significant data transformation and enhancement. These tools can be used to rebuild a database following failure or to create a database for a data mart, provided that the number of data sources is small and the level of data transformation is relatively simple. Dynamic tr ansfor mation engines Rule-driven dynamic transformation engines capture data from a source system at userdefined intervals, transform the data and then send and load the results into a target environment. Most products support only relational data sources, but products are now emerging that handle non-relational source files and databases. 3.2. 
Data Stor age and Access Because of the special nature of warehouse data and access, accustomed mechanisms for data storage, query processing and transaction management must be 13 Data marts allow the efficient execution of predicted queries over a significantly smaller database. Once the DW is available for end-users.3. There are several tools and products that are commercially available. storage structures and query processing techniques. The most commercially used client application is Microsoft® Excel with pivot tables. 14 . Data marts usually contain simple replicas of warehouse partitions or data that has been further summarized or derived from base warehouse data. These organizations may need to establish data marts which are selected parts of the DW that support specific decision support application requirements of a company’s department or geographical region. Data Mar ts A data mart is a subset of the data in a DW and is summary data relating to a department or a specific function [6].9. These operations need special access methods. Data marts focus on the requirements of users in a particular department or business function of an organization. One of these physical storage methods may be chosen concerning the trade-off between query performance and amount of data. ODBC or native client providers to access the DW data. The main reasons for implementing a data mart instead of a DW may be summarized as follows:  Data marts enable end-users to analyze the data they need most often in their daily operations.adapted. DW solutions need complex querying requirements and operations involving large volumes of data access. A centric DW may not be feasible for these companies. there are a variety of techniques to enable end-users access the DW data for analysis and reporting. Since data marts are specialized for departmental operations. 3. The storage approaches of a DW is described in detail in section 4. they contain less data and the end-users are much capable of exploiting data marts than DWs. In common all client tools use generally OLEDB. A company that makes business in several countries througout the world may need to analyse regional trends and my need to compete in regions. there are some issues that must be addressed about data marts. User access to data in multiple data mar ts: A solution to this problem is building virtual data marts which are views of several physical data marts. security and performance tuning. For increasing the response time. Data marts are more specialized and contain less data. building a data mart may be a more feasible project than building a DW.  In terms of software engineering. integrity. the management need arises to coordinate data mart activities such as versioning. it is likely to have a performance decrease. therefore data transformation and integration tasks are much faster in data marts than DWs and setting up a data mart is a simpler and a cheaper task compared to establishing an organizational DW in terms of time and resources. Size: Although data marts are considered to be smaller than data warehouses. the end-user response time in queries is much quicker. Although data marts seem to have advantages over DWs. Administr ation: With the increase in number of data marts. size and complexity of some data marts may match a small corporate DW. Load per for mance: Both end-user response time and data loading performance are critical tasks of data marts. 15 .  Since data marts contain less data. 
because the requirements of building a data mart are much more explicit than a corporate wide DW project. As the size of a data mart increases. consistency. data marts usually contain lots of summary tables and aggregations which have a negative effect on load performance. There are two major components to build a DW.1 Data Extr action 16 .1. the design of the interface from operational systems and the design of the DW [11].1) . DW design is different from a classical requirements-driven systems design. Beginning with Oper ational Data Creating the DW does not only involve extracting operational data and entering it into the warehouse (Figure 4.CHAPTER 4 DESIGNING A DATA WAREHOUSE Designing a warehouse means to complete all the requirements mentioned in section 2.3 and obviously is a complicated process. 4. Figur e 4. These results in data redundancy. same data may exist in other applications with same meaning. and attempting to scan all of it every time a DW load needs to be done is resource and time consuming and unrealistic. Figur e 4. Figur e 4. with different name or with different measure ( Figure 4.3 ).Pulling the data into the DW without integrating it is a big mistake ( Figure 4.2 Data Integr ation Existing applications were designed with their own requirements and integration with other applications was not concerned much.e. The existing systems environment holds gigabytes and perhaps terabytes of data. Three types of data are loaded into the DW from the operational system:   Archival data Data currently contained in the operational environment 17 . differ ent usage Another problem is the performance of accessing existing systems’ data.3 Same data.2 ). i. Another problem when passing data is the need to manage the volume of data that resides in and passes into the warehouse. Volume of data in the DW will grow fast. Scan a log file or an audit file created by the transaction processing system. However when the data is loaded into the warehouse. after that it may be updated. Modify application code. The data model applies to both the operational environment and the DW environment. Scan a 'delta' file. so a time element must be attached to it. Changes to the DW environment from the updates that have occurred in the operational system since the last refresh Five common techniques are used to limit the amount of operational data scanned to refresh the DW.2.      Scan data that has been timestamped in the operational environment. A process model consists:        Functional decomposition Context-level zero diagram Data flow diagram Structure chart State transition diagram Hierarchical input process output(HIPO) chart Pseudo code 18 . Another difficulty is that operational data must undergo a time-basis shift as it passes into the DW. it cannot be updated anymore. A delta file contains only the changes made to an application as a result of the transactions that have run through the operational environment. Rubbing a 'before' and an 'after' image of the operational file together. Data/Pr ocess Models The process model applies only to the operational environment. 4. The operational data’s accuracy is valid at the instant it is accessed. A log file contains the same data as a delta file. and low-level modeling (called the physical model). A final design activity in transforming the corporate data model to the data warehouse data model is to perform “stability” analysis.1. for instance. Next. and only direct relationships are indicated. 
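The time-basis shift described in section 4.1 shows up directly in table design: where an operational table keeps only the current image of a record and is updated in place, the corresponding warehouse table carries an element of time in its key and is only ever loaded, never updated. The following pair of definitions is a minimal, hypothetical sketch of this difference (the thesis does not prescribe these particular tables):

    -- Operational table: one current row per customer, updated in place.
    CREATE TABLE Customer (
        CustomerID   int          NOT NULL PRIMARY KEY,
        City         varchar(50)  NOT NULL,
        CreditLimit  money        NOT NULL
    )

    -- Warehouse counterpart: one row per customer per snapshot date,
    -- so the history of changes is preserved rather than overwritten.
    CREATE TABLE DW_Customer (
        CustomerID   int          NOT NULL,
        SnapshotDate datetime     NOT NULL,
        City         varchar(50)  NOT NULL,
        CreditLimit  money        NOT NULL,
        CONSTRAINT PK_DW_Customer PRIMARY KEY (CustomerID, SnapshotDate)
    )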
Performance factors are added into the corporate data model as the model is transported to the existing systems environment. Derived data is added to the corporate data model where the derived data is publicly used and calculated once. Finally. data relationships in the operational environment are turned into “artifacts” in the DW. entity relationship level). it is not suitable for the DW. when building the data mart. The direction and number of the arrowheads indicate the cardinality of the relationship. An overall corporate data model has been constructed with no regard for a distinction between existing operational systems and the DW. 4. Stability analysis involves grouping attributes of data together based on their tendency for change. The data model is applicable to both the existing systems environment and the DW environment. The process model is requirements-based. The DW Data Model There are three levels in data modeling process: high-level modeling (called the ERD. the key structures of the corporate data model are enhanced with an element of time. more changes are made to the corporate data model to use in DW environment. Relationships among entities are depicted with arrows. or DIS). The corporate data model focuses on only primitive data.3. 19 . 4. The name of the entity is surrounded by an oval. High-Level Modeling The high level of modeling features entities and relationships.3. not repeatedly. data that is used purely in the operational environment is removed.A process model is invaluable. First. midlevel modeling (called the data item set. Although few changes are made to the corporate data model for operational environment. Figur e 4.Figur e 4. Separate high-level data models have been created for different communities within the corporation. they make up the corporate ERD. Collectively.4) are at the highest level of abstraction. The corporate ERD as shown in Figure 4.5 Cor p or ate ERD cr eated b y depar tmental ERDs 20 .4 A Simple ERD for a manufactur ing envir onment The entities that are shown in the ERD level (see Figure 4.5 is formed of many individual ERDs that reflect the different views of people across the corporation. the next level is established—the midlevel model or the DIS. For each major subject area. Mid-Level Modeling After the high-level data model is created.6) Figur e 4.7 Midlevel model members 21 .2. suggesting the relationships of data between major subject areas “Type of” data Figur e 4. Each area is subsequently developed into its own midlevel model (see Figure 4. or entity. a midlevel model is created.3.7):     A primary grouping of data A secondary grouping of data A connector.6 Relationship between ERD and DIS Four basic constructs are found at the midlevel model (also shown in Figure 4. identified in the highlevel data model.4. and only once. 22 . The grouping of data to the right is the subtype of data. There may be as many secondary groupings as there are distinct groups of data that can occur multiple times. The secondary grouping holds data attributes that can exist multiple times for each major subject area. Figure 4. A sample model is drawn in Figure 4. The connector relates data from one grouping to another. the primary grouping contains attributes and keys for each major subject area. the corporate DIS is created from multiple DISs. The fourth construct in the data model is “type of” data. 
The third construct is the connector.8 A Midlevel model sample Like the corporate ERD that is created from different ERDs reflecting the community of users. As with all groupings of data. A relationship identified at the ERD level results in an acknowledgement at the DIS level. The grouping of data to the left is the supertype. Figur e 4. These four data modeling constructs are used to identify the attributes of data in a data model and the relationship among those attributes. for each major subject area.8 below.The primary grouping exists once. The convention used to indicate a connector is an underlining of a foreign key. “Type of” data is indicated by a line leading to the right of a grouping of data.9 shows a sample corporate DIS formed by many departments DISs. This grouping is indicated by a line drawn downward from the primary grouping of data. When a relationship is identified at the ERD level. it is manifested by a pair of connector relationships at the DIS level. It holds attributes that exist only once for each major subject area. Figur e 4.Figur e 4. Low-Level Modeling The physical data model is created from the midlevel data model just by extending the midlevel data model to include keys and physical characteristics of the model. At this point. sometimes called 23 .3. Figure 4.3.9 Cor p or ate DIS for med by depar tmental DISs.10 An example of a depar tmental DIS 4.10 shows an individual department’s DIS. the physical data model looks like a series of tables. 37. The job of the DW designer is to organize data physically for the return of the maximum number of records from the execution of a physical I/O. 24 . Figure 4. 36. logical design deals with concepts related to a certain kind of DBMS. conceptual design manages concepts that are close to the way users perceive data. With the DW. After granularity and partitioning are factored in. 40].relational tables. Physical I/O is the activity that brings data into the computer from storage or sends data to storage from the computer. the first step in doing so is deciding on the granularity and partitioning of the data. 4. physical design depends on the specific DBMS and describes how data is actually stored [35. a variety of other physical design activities are embedded into the design. At the heart of the physical design considerations is the usage of physical input/output (I/O). 38] three different design phases are distinguished. Adopting the terminology of [23.4. Figur e 4. This frees the designer to use physical design techniques that otherwise would not be acceptable if it were regularly updated.11 Consider ations in low-level modeling There is another mitigating factor regarding physical placement of data in the data warehouse: Data in the warehouse normally is not updated. Database Design Methodology for DW In the next few sections of this thesis I will be discussing both conceptual and logical design methods of data warehousing.11 illustrate the major considerations in low-level modeling. Only when the grain for the fact table is chosen can we identify the dimensions of the fact table. Identifying and conforming the dimensions: Dimensions set the context for asking questions about the facts in the fact table. I also prefer to summarize the methodology proposed by Kimball [21]. Choosing the facts : The grain of the fact table determines which facts can be used in the data mart. the basic concepts of dimensional modeling should be mentioned which are: facts. 
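Before walking through the steps, it helps to have a concrete, if hypothetical, picture of what dimensional modeling produces: a fact table whose rows sit at a declared grain and whose numeric columns are the additive measures, surrounded by dimension tables that give those facts their context. The names below are purely illustrative and are not taken from the thesis:

    -- Dimension tables: the descriptive context for the facts.
    CREATE TABLE Dim_Date    (DateKey int PRIMARY KEY, FullDate datetime, MonthName varchar(20), CalendarYear int)
    CREATE TABLE Dim_Product (ProductKey int PRIMARY KEY, ProductCode varchar(20), ProductName varchar(50), Category varchar(30))
    CREATE TABLE Dim_Store   (StoreKey int PRIMARY KEY, StoreName varchar(50), City varchar(50), Country varchar(50))

    -- Fact table, grain: one row per product, per store, per day.
    -- SalesAmount and SalesQuantity are measures additive at that grain.
    CREATE TABLE Fact_Sales (
        DateKey       int   NOT NULL REFERENCES Dim_Date(DateKey),
        ProductKey    int   NOT NULL REFERENCES Dim_Product(ProductKey),
        StoreKey      int   NOT NULL REFERENCES Dim_Store(StoreKey),
        SalesAmount   money NOT NULL,
        SalesQuantity int   NOT NULL
    )

Choosing the process, the grain, the dimensions and the facts, which are the first steps listed below, amounts to deciding exactly what one row of such a fact table represents and which numbers belong in it.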
The first data mart to be built should be the one that is most likely to be delivered on time with in budget and to answer the most important business question. consisting of measures and context data. All the facts must be expressed at the level implied by the grain. The facts should be numeric and additive. Choosing the grain: This means deciding exactly what a fact table record represents. 4. the dimension is referred to as being conformed. 3. Choosing the process: The process (function) refers to the subject matter of a particular data mart. 2. The grain decision for the fact table also determines the grain of each of the dimension tables. A well-built set of dimensions makes the data mart understandable and easy to use. A dimension is a collection of data that describe one business dimension. 42. 24].Prior to beginning the discussion.   A fact is a collection of related data items. they are the parameters over which we want to perform OLAP. It typically represents business items or business transactions. dimensions and measures [7. 43]: 1. When a dimension is used in more than one data mart. 25 . Additional facts can be added to a fact table at any time provided they are consistent with the grain of the table. A poorly presented or incomplete set of dimensions will reduce the usefulness of a data mart to an enterprise.  A measure is a numeric attribute of a fact. Dimensions determine the contextual background for the facts. Before this discussion. representing the performance or behavior of the business relative to the dimensions. The nine step methodology by Kimball is as follows[6. who is accepted as a guru on data warehousing and whose studies have encouraged many academicians on the study of data warehousing. The text descriptions should be as intuitive and understandable to the users. 26 . Very large fact tables raise at least two very significant DW design issues. Tracking slowly changing dimensions: There are three basic types of slowly changing dimensions: o Type1: where a changed dimension attribute is overwritten. Deciding the query priorities and the query modes: We consider physical design issues. the more likely there will be more problems in reading and interpreting the old files or the old tapes. First. and security. not the most current versions. The older data. indexing performance. Second.5. it is often increasingly difficult to source increasingly old data. backup. There is requirement to look at the same time period a year or two earlier. Rounding out the dimension tables : We return to the dimension tables and add as much text description to the dimensions. 8. This is known as the ‘slowly changing dimension’ problem.We have a design for data mart that supports the requirements of a particular business process and also allows the easy integration with other related data marts to ultimately form the enterprise-wide DW. 9. There are additional physical design issues affecting administration. The usefulness of a data mart is determined by the scope and nature of the attributes of the dimension tables. it is mandatory that the old versions of the important dimensions be used. Storing pre-calculations in the fact table : Once the facts have been selected each should be re-examined to determine whether there are opportunities to use precalculations. Choosing the duration of the database: The duration measures how far back in time the fact table goes. 7. o Type2: where a changed dimension attribute causes a new dimension record to be created. 6. 
o Type3: a changed dimension attribute causes an alternate attribute to be created so that both the old and new values of the attribute are simultaneously accessible in the same dimension record. The most critical physical design issues affecting the end-user’s perception of the data mart are physical sort order of the fact table on disk and the presence of pre-stored summaries or aggregations.   Record the associations between objects and facts: Facts are connected to objects.1. The Dimensional Fact Model This model is built from ER schemas [9. Additionally there are three special types of associations. dimensions and hierarchies. complete. optional dimension attributes and non-dimension attributes’ existence may also be represented on fact schemas. Distinguish dimensions and categorize them into hierarchies: dimensions governed by associations of type membership forming hierarchies that specify different granularities. 16. Fact attributes’ additivity. 4. membership (showing that an object is a member of another higher object class with the same characteristics and behavior).5. abstract design based on the user requirements [34]. aggregation (showing objects as parts of a layer object). specialization/generalization (showing objects as subclasses of other objects). Conceptual Design Models The main goal of conceptual design modeling is developing a formal. 17. Compatible fact schemas may be overlapped in order to relate and compare data. Connect the dimension to facts: Time is always associated to a fact. Represent objects and capture their properties with the associations among them: Object properties (summary properties) can be numeric. Complete membership (or not) (all members belong to one higher object class and that object class is consisted by those members only). The fact is represented by a box which reports the fact name. 33]. The Dimensional Fact (DF) Model is a collection of tree structured fact schemas whose elements are facts.4. At this phase of a DW there is the need to:    Represent facts and their properties: Facts properties are usually numerical and can be summarized (aggregated). 27 . attributes. 15. A fact schema is structured as a tree whose root is a fact. Strict membership (or not) (all members belong to only one higher object class).5. o Building the attribute tree. For each fact. Most attributes are additive along all dimensions.Figur e 4. is connected to it by a -to-one relationship and cannot be used for aggregation. The non-dimension attributes (address attribute as shown in Figure 4. (Each vertex corresponds to an attribute of the schema. non-additive if it is additive along no dimension. A fact attribute is called semi-additive if it is not additive along one or more dimensions. A fact expresses a many-to-many relationship among the dimensions. A non-dimension attribute contains additional information about an attribute of the hierarchy. identifier (F) denotes their 28 . DF model consists of 5 steps. for each vertex v. This means that the sum operator can be used to aggregate attribute values along all hierarchies. The arcs represented by dashes express optional relationships between pairs of attributes.12 A dimensional fact schema sample Sub-trees rooted in dimensions are hierarchies. the root corresponds to the identifier of F. one value for each fact attribute. If F is identified by the combination of two or more attributes. The circles represent the attributes and the arcs represent relationship between attribute pairs. 
Each combination of values of the dimensions defines a fact instance. the corresponding attribute functionally determines all the attributes corresponding to the descendants of v.12) are represented by lines instead of circles.   Defining facts (a fact may be represented on the E/R schema either by an entity F or by an n-ary relationships between entities E1 to En). o Defining dimensions (The dimensions must be chosen in the attribute tree among the children vertices of the root. it can be inserted into the attribute tree. Pruning is carried out by dropping any subtree from the tree.concatenation. the attribute tree may be pruned and grafted in order to eliminate the unnecessary levels of detail. time is explicitly represented as an E/R attribute and thus it is an obvious candidate to define a dimension. Grafting is used when its descendants must be preserved. Most n-ary relationships have maximum multiplicity greater than 1 on all their branches. old versions of data varying over time are continuously replaced by new versions. x-to-many relationships cannot be inserted into the attribute tree.) o Pruning and grafting the attribute tree (Not all of the attributes represented in the attribute tree are interesting for the DW. for instance by a star schema. an n-ary relationship is equivalent to n binary relationships. Time is not 29 . It is worth adding some further notes: It is useful to emphasize on the fact schema the existence of optional relationships between attributes in a hierarchy. Optional relationships or optional attributes of the E/R schema should be marked by a dash. would be impossible without violating the first normal form. they determine n one-to-many binary relationships which cannot be inserted into the attribute tree. E/R schemas can be classified as snapshot and temporal. A temporal schema describes the evolution of the application domain over a range of time. Generalization hierarchies in the E/R schema are equivalent to one-to-one relationships between the super-entity and each sub-entity. Thus. hence. it will be impossible to use them to aggregate data. The attributes dropped will not be included in the fact schema. old versions of data are explicitly represented and stored. hence. A one-to-one relationship can be thought of as a particular kind of many-to-one relationship. When designing a DW from a temporal schema. representing these relationships at the logical level. A snapshot schema describes the current state of the application domain.). In fact. ). this model should be easy to learn and use for an experienced ER Modeler. if the only information to be recorded is the occurrence of the fact. the specialization should be powerful enough to express the basic multidimensional aspects.2. the attributes which should not be used for aggregation but only for informative purposes may be identified as non-dimension attributes. There are some specializations:   A special entity set: dimension level Two special relationship sets connecting dimension levels: o a special n-ary relationship set: the ‘fact’ relationship set 30 . It is still possible to prune and graft the tree in order to eliminate irrelevant details. 4. o Defining fact attributes (Fact attributes are typically either counts of the number of instances of F. o Defining hierarchies (Along each hierarchy. Multidimensional E/R (ME/R) model includes some key considerations [14]:    Specialization of the ER Model Minimal extension of the ER Model. 
This model allows the generalization concepts.explicitly represented however. namely the qualifying and quantifying data and the hierarchical structure of the qualifying data. There are few additional elements. Representation of the multidimensional aspects. or the sum/average/maximum/minimum of expressions involving numerical attributes of the attribute tree.). During this phase. A fact may have no attributes. despite the minimality.5. Multidimensional E/R Model It is argued that ER approach is not suited for multidimensional conceptual modeling because the semantics of the main characteristics of the model cannot be effectively represented. attributes must be arranged into a tree such that an x-to-one relationship holds between each node and its descendants. It is also possible to add new levels of aggregation by defining ranges for numerical attributes. should be added as a dimension to the fact schema). This allows a different attribute structure for each dimension level. It connects n different dimension level entities. it relates a dimension level A to a dimension level B representing concepts of a higher level of abstraction (city roll-up to country).13 . The ME/R model does not contain an explicit counterpart for this idea.13 The gr aphical notation of ME/R elements Individual characteristics of ME/R model may be summarized as follows. The rolls-up relationship sets define a directed acyclic graph on the dimension levels. Figur e 4. 31 . The fact relationship set models the natural separation of qualifying and quantifying data. This model uses a special graphical notation which a sample notation is shown in Figure 4. The information which dimension-levels belong to a given dimension is included implicitly within the structure of the rolls-up graph.o a special binary relationship set: the ‘roll-up to’ relationship set The ‘roll-up to’ relationship set. alternative paths and shared hierarchy levels for different dimensions. This enables the easy modeling of multiple hierarchies. The ‘fact’ relationship set is a specialization of a general n-ary relationship set. The attributes of the fact relationship set model the measures of the fact while dimension levels model the quantifying data.  The hierarchical classification structure of the dimensions is expressed by dimension level entity sets and the roll-up relationships. Dimension level attributes are modeled as attributes of dimension level entity sets. Thus no redundant modeling of the shared levels is necessary.  A central element in the multidimensional model is the concept of dimensions that span the multidimensional space. This is not necessary because a dimension consists of a set of dimension levels. Therefore levels of different dimensions may roll up to a common parent level. Remarkably the schema also contains information about the granularity level on which the dimensions are shared. 32 . Like the E/R model the ME/R model captures the static structure of the application domain.  Concerning measures and their structure. ME/R and ER models notations can be used together. An orthogonal functional model should capture these dependencies. Figure 4. the ME/R and ER model notations can be used together as illustrated in Figure 4. The semantic information that some of the measures are derived cannot be included in the model. The calculation of measures is functional information and should not be included in the static model. 
This information can be used to avoid redundancies.14 Multiple cubes shar ing dimensions on differ ent levels As mentioned above. Figur e 4.15.   This model is used ‘is a’ relationship. By modeling the multidimensional cube as a relationship set it is possible to include an arbitrary number of facts in the schema thus representing a ‘multicube model’.14 shows multiple cubes that share dimensions on different levels.  Schema contains rolls-up relationship between entities. the ME/R model allows record structured measures as multiple attributes for one fact relationship set. It is represented as a rectangle.5.16 shows the notation for relationship set types.Figur e 4.16 Notation used in star ER 33 . relationships and attributes. oneto-many. This model has the following constructs:    Fact set: represents a set of real world facts sharing the same characteristics or properties. Relationship sets among entity sets can be type of specialization/generalization. Figure 4. Its cardinality can be many-to-many. It is always associated with time. It is represented as a circle. many-to-one. Relationship set: represents a set of associations among entity sets or among entity sets and fact sets. Figur e 4.3. star ER This model combines star structure with constructs of ER model [13]. aggregation and membership.15 Combining ME/R notations with E/R 4. The starER contains facts. entities. Entity set: represents a set of real world objects with similar properties. It is represented as a diamond. Fact properties can be of type stock (S) (the state of something at a specific point in time).  DF requires only a rather straight forward transformation to fact and dimension tables. aggregation. The following criteria are satisfied by the starER schema. membership) and represent more information. relationship sets. It is represented as an oval. This is an advantage of DF Schema. (specialization/generalization. but not in the form of a dimension are allowed in the starER.   Object participating in the data warehouse. 34 . fact sets.  Attribute: static properties of entity sets. flow (F) (the commutative effect over a period of time for some parameter in the DW environment and which is always summarized) or valueper-unit (V) (measured for a fixed-time and the resulted measures are not summarized). since well-known rules of how to transform an ER Schema (Which is the basic structural difference between the two approaches) to relations do exist. But this is not a drawback for the starER model. which allows for better understanding of the involved information.           Explicit hierarchies in dimensions Symmetric treatment of dimensions and summary attributes (properties) Multiple hierarchies in each dimension Support for correct summary or aggregation Support of non-strict hierarchies Support of many-to-many relationships between facts and dimensions Handling different levels of granularity at summary properties Handling uncertainty Handling change and time Relationships between dimensions and facts in starER aren’t only many-to-one. but also many-to-many. Specialized relationships on dimensions are permitted There following list shows the main differences between DF Schema and starER model. 4. Fact classes cardinality is defined as * to indicate that a dimension object can be part of one. Derived measures are placed in fact class by notation after “/”. Derivation rules appear between braces. 
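One practical consequence of allowing many-to-many relationship sets between facts and dimensions, as starER does, is that the relational implementation usually needs a bridge (intersection) table. The following minimal sketch uses assumed table names and is not part of the case study; it only shows where such a relationship ends up at the table level.

-- Assume a Fact_Sales table with a single-column key (Sales_Key) and a
-- Dim_Promotion table already exist. A sale may relate to several promotions
-- and a promotion to many sales, so the association is carried by a bridge
-- table whose key combines the keys of both participants.
CREATE TABLE Bridge_Sales_Promotion (
    Sales_Key     int NOT NULL,
    Promotion_Key int NOT NULL,
    PRIMARY KEY (Sales_Key, Promotion_Key),
    FOREIGN KEY (Sales_Key)     REFERENCES Fact_Sales (Sales_Key),
    FOREIGN KEY (Promotion_Key) REFERENCES Dim_Promotion (Promotion_Key)
);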
Fact classes are considered as composite classes in a sharedaggregation relationship of n-dimension classes. starER model combines the powerful constructs of the ER model with the star schema. 19]. OOMD modeling approach is based on UML. dimensions and facts are represented by dimension classes and fact classes [18. Ob ject-Or iented Multidimensional Model (OOMD) Unified Modeling Language (UML) has been widely accepted as a standard objectoriented modeling language for software design.. The minimum cardinality of dimension classes is defined as 1 to indicate that a fact object is always related to object instances from all dimensions. In this way.5.17 A sample DW model using star ER 4. Identifying attribute can be defined in fact classes by notation after {OID} (Object Identifier). All measures are additive (Sum operator can be applied to aggregate measure values along all dimensions). In OOMD model.* cardinality on the dimension class.17. For dimensions. A sample DW model using starER is illustrated in Figure 4. many-to-many relationships between facts and particular dimensions are represented by indicating 1. Figur e 4. zero or more fact object instances. every classification hierarchy level is specified by a class 35 . Descriptor attribute ({D}) define in every class that represents a classification hierarchy level. Measures area. contains the measures to be analyzed. Completeness means that all members belong to one higher-class object and that object consists of those members only. a working prototype should be created for the end-user.6. the source databases.(base class). contains the dimensions and their grouping conditions to address the analysis. At the end of logical design phase. Logical Design Models DW logical design involves the definition of structures that enable an efficient access to information. OOMD approach uses a generalization-specialization relationship to categorize entities that contain subtypes. making more compatible logical data representation with OLAP data management. and non functional (mainly performance) requirements.      Head area. cover the OLAP operations for a further data analysis phase. The designer builds multidimensional structures considering the conceptual schema representing the information requirements. 4. data loading processes. Cube classes contain. Dimensional models represent data with a “cube” structure. contains the constraints to be satisfied. These classes must define DAG (Directed Acyclic Graph) rooted in the dimension class. This phase also includes specifications for data extraction tools. The DAG structure can represent both alternative path and multiple classification hierarchies. contains the cube class’s name. and warehouse access methods. Slice area. Dice area. Cube operations. The objectives of dimensional modeling are [10]: 36 . Cube classes represent initial user requirements as the starting point for subsequent dataanalysis phase. Strictness means that an object at a hierarchy’s lower level belongs to only one higher level object. An association of classes specifies the relationships between two levels of a classification hierarchy. 1. Dimensional model composed of one table with a composite primary key. not natural keys. Simple queries require multiple table joins and complex subqueries. Normalized databases have some characteristics that are appropriate for OLTP systems. This maximizes efficiency of updates. 
Each dimension table has a simple (non-composite) primary key that corresponds exactly to one of the components of the composite key in the fact table. all natural keys are replaced with surrogate keys. To maximize the efficiency of queries.  To produce database structures that are easy for end-users to understand and write queries against. Another important feature. but tends to penalize retrievals. Each surrogate key should have a generalized structure based on simple integers. This data model is used by OLTP systems. It achieves these objectives by minimizing the number of tables and relationships between them. called fact table. The use of surrogate keys allows the data in the DW to have some independence from the data used and produced by the OLTP systems.6.  Data redundancy is minimized. Data redundancy is not a problem in DWs because data is not updated on-line. It contains no redundancy. usually end-users interact with the database through a layer of software. In OLTP systems this is not a problem because. but not for DWs [7]:  Its structure is not easy for end-users to understand and use. It is suitable for technical specialist. Dimensionality modeling uses the ER Modeling with some important restrictions. This characteristic structure is called star schema or star join . but high efficiency of updates. Dimensional Model Design This section describes a method for developing a dimensional model from an Entity Relationship model [12]. and a set of smaller tables called dimension tables . shows all data and relationships between them. 37 . 4. This means that every join between fact and dimension tables is based on surrogate keys. o Collapse Hierarchy: Higher level entities can be collapsed into lower level entities within hierarchies. etc. “how” and “why” of event (customer. product. Classify Entities: For producing a dimensional model from ER model.). They construct fact tables in star schema. o Classification Entities : These entities are related with component entities by a chain of one-to-many relationship. An entity is called maximal if it has no many-to-one relationship. There are some characteristics. They construct dimension tables in star schema. This increases redundancy in the form of a transitive 38 . o Transaction Entities: These entities are the most important entities in a DW.  Produce Dimensional Models: There are two operators to produce dimensional models from ER. volumes) o Component Entities: These entities are directly related with a transaction entity with a one-to-many relationship. They define the details or components of each transaction. They are functionally dependent on a component entity. A hierarchy is called maximal if it cannot be extended upwards or downwards by including another entity.   It describes an event that occurs at a point in time.) that decision makers want to understand and analyze.  Identify Hierarchies: Most dimension tables in star schema include embedded hierarchies. An entity is called minimal if it has no one-to-many relationship. “when”. They have highest precedence. “what”. Collapsing a hierarchy is a form of denormalization. It contains measurements or quantities that may be summarized (sales amount. Time is an important component of any transaction. These entities record details about particular events (orders. which may be collapsed in to component entity to form dimension tables in star schema. These entities represent hierarchies embedded in the data model. They answer the “who”. 
first classify the entities into three categories. etc. payments. They have lowest precedence. “where”. period. Figure 4. This is formed by collapsing all entities in the data model down into the minimal entities. We end up with one table for each minimal entity in the original data model [12]. There are 8 models used in dimensional modeling [6. When we collapse numerical amounts from higher level transaction entities in to other they will be repeated. It contains some problems. 12]:         Flat Schema Terraced Schema Star Schema Fact Constellation Schema Galaxy Schema Snowflake Schema Star Cluster Schema Starflake Schema 4. the complexity of each table (element complexity) is increased. We can continue doing this until we reach the bottom of the hierarchy and end up with a single table.dependency. It contains redundancy. Flat Schema This schema is the simplest schema. This structure does not lose information from the original data model.2. Therefore while the number of tables (system complexity) is minimized. 39 . Second this schema contains large number of attributes. first it may lead to aggregation errors when there are hierarchical relationships between transaction entities. in the form of transitive and partial dependencies.18 shows a sample flat schema.6. but does not involve any aggregation. which is a violation to 3NF. o Aggregation: This operator can be applied to a transaction entity to create a new entity containing summarized data. This minimizes the number of tables in the database and joins in the queries. 18 Flat Schema 4. It causes some problems for inexperienced user.3. The Figure 4.Figur e 4. Ter r aced Schema This schema is formed by collapsing entities down maximal hierarchies. 40 .19 illustrates a sample terraced schema. because the separation between levels of transaction entities is explicitly shown [12]. end with when they reach a transaction entity.6. This results in a single table for each transaction entity in the data model. 4. thousands. It has one fact table and a set of smaller dimension tables arranged around the fact table. Each star schema is formed in the following way. It contains measurements which may be aggregated in various ways [10. The fact data will not change over time. millions of records at a time and aggregate them.19 Ter r aced Schem a 4. The fact table is linked to all the dimension tables by one to many relationships. Dimension tables contain descriptive textual information.Figur e 4. 39]. Star Schema It is the basic structure for a dimensional model. The most useful fact tables are numeric and additive because data warehouse applications almost never access a single record. They generally consist of embedded hierarchies. They access hundreds. 41 . Dimension attributes are used as the constraints in the data warehouse queries. Dimension tables provide the basis for aggregating the measurements in the fact table.6. 12.  Numerical attributes within transaction entities should be aggregated by key attributes (dimensions).20 shows a sample star schema. The advantage of using this schema. The key of the table is the combination of the keys of its associated component entities. The Figure 4.20 Star Schema 42 . Where hierarchical relationships exist between transaction entities. Star schemas can be used to speed up query performance by denormalizing reference information into a single dimension table. The aggregation attributes and functions used depend on the application. A dimension table is formed for each component entity. 
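To make the star schema construction steps above concrete, the following minimal Transact-SQL sketch shows one fact table and two dimension tables. The names are assumptions loosely modeled on the sales example used throughout the thesis; surrogate integer keys replace the natural keys, and the fact table's composite primary key is built from the dimension keys.

-- Dimension tables: one per component entity, each with a simple surrogate key.
CREATE TABLE Dim_Time (
    Time_Key  int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Day_Date  datetime NOT NULL,
    Month_No  int NOT NULL,
    Year_No   int NOT NULL
);

CREATE TABLE Dim_Product (
    Product_Key  int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Product_ID   int         NOT NULL,   -- natural key from the OLTP source
    Product_Name varchar(50) NOT NULL,
    SubCategory  varchar(50) NOT NULL,   -- hierarchy collapsed (denormalized) into the dimension
    Category     varchar(50) NOT NULL
);

-- Fact table: one per transaction entity; its composite key is formed from
-- the surrogate keys of the associated dimensions.
CREATE TABLE Fact_Sales (
    Time_Key      int   NOT NULL REFERENCES Dim_Time (Time_Key),
    Product_Key   int   NOT NULL REFERENCES Dim_Product (Product_Key),
    Sales_Units   int   NOT NULL,
    Sales_Dollars money NOT NULL,
    PRIMARY KEY (Time_Key, Product_Key)
);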
it reduces the number of tables in the database and the number of relationships between them and also the number of joins required in user queries. Denormalization is not appropriate where the additional data is not accessed very often. avoiding the overhead of having to join additional tables to access those attributes. Figur e 4. This provides the ability to “drill down” between transaction levels. because the overhead of scanning the expanded dimension table may not be offset by gain in the query performance.   A fact table is formed for each transaction entity. Denormalization is appropriate when there are a number of entities related to the dimension table that are often accessed. by collapsing hierarchically related classification entities into it. the child entity inherits all dimensions (and key attributes) from the parent entity. 22.4. Figure 4. The links between the various fact tables provide the ability to “drill down” between levels of detail [10. Fact Constellation Schem a A fact constellation schema consists of a set of star schemas with hierarchically linked fact tables.6. Figur e 4. The following figure.5.21.6. 43 .21 Fact Constellation Schema 4. illustrates a sample of a galaxy schema.6. Unlike a fact constellation schema. Figure 4. illustrates a sample of a fact constellation schema. 12]. Galaxy Schema Galaxy schema is a schema where multiple fact tables share dimension tables. the fact tables in a galaxy do not need to be directly related [12]. The following figure. Each dimension table may contain multiple independent hierarchies. forming a hierarchy.7.Figur e 4.22 Galaxy Schema 4. A snowflake schema can be produced by the following procedure:  A fact table is formed for each transaction entity.6. hierarchies in the original data model are collapsed or denormalized to form dimension tables. 12]. Snowflake Schem a In a star schema. The decomposed snowflake structure visualizes the hierarchical structure of dimensions very well. 44 . A snowflake schema is a variant of star schema with all hierarchies explicitly shown and dimension tables do not contain denormalized data [10. The key of the table is the combination of the keys of the associated component entities. The many-to-one relationships among sets of attributes of a dimension can separate new dimension tables. the child entity inherits all relationships to component entities (and key attributes) from the parent entity. So.23. A fork occurs when an entity acts as a parent in two different dimensional hierarchies.23 Snowflake Schema 4. Overlapping dimensions can be identified as forks in hierarchies. The following figure. which leads to redundancy. which adds complexity to the schema and requires extra joins. In Figure 4. The attributes and functions used depend on the application.  Each component entity becomes a dimension table. the best solution may be a balance between these two schemas [12]. Star Cluster Schema While snowflake contains fully expanded hierarchies. star schema contains fully collapsed hierarchies. Fork entities can be identified as classification entities with multiple one-to-many relationships.24. 45 . illustrates a sample of a snowflake schema.  Numerical attributes within transaction entities should be aggregated by the key attributes. Figur e 4. Figure 4.8. Region is parent of both Location and Customer entities and the fork occurs at the Region entity.6. Where hierarchical relationships exist between transaction entities. 
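Returning to the snowflake procedure described above, the classification hierarchy is kept in separate normalized tables instead of being collapsed into a single dimension as in the star sketch given earlier. The following is a minimal sketch with assumed names, mirroring the Product, Product_SubCategory and Product_Category levels that appear later in the case study.

-- Each classification level becomes its own table; the lower level references
-- the level it rolls up to, so descriptive data is not repeated.
CREATE TABLE Dim_Product_Category (
    Category_Key  int NOT NULL PRIMARY KEY,
    Category_Name varchar(50) NOT NULL
);

CREATE TABLE Dim_Product_SubCategory (
    SubCategory_Key  int NOT NULL PRIMARY KEY,
    SubCategory_Name varchar(50) NOT NULL,
    Category_Key     int NOT NULL REFERENCES Dim_Product_Category (Category_Key)
);

CREATE TABLE Dim_Product (
    Product_Key     int NOT NULL PRIMARY KEY,
    Product_Name    varchar(50) NOT NULL,
    SubCategory_Key int NOT NULL REFERENCES Dim_Product_SubCategory (SubCategory_Key)
);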
the child entity should inherit all dimensions (and key attributes) from the parent entity. 46 . Collapsing should begin again after the fork entity. The sub dimension table will consist of the fork entity plus all its ancestors. When a component entity is reached. A star cluster schema has the minimal number of tables while avoiding overlap between dimensions. If a fork is reached. The key of the table is the combination of the keys of the associated component entities. a dimension table should be formed.   Where hierarchical relationships exist between transaction entities. A star cluster schema can be produced by the following procedure:   A fact table is formed for each transaction entity. Numerical attributes within transaction entities should be aggregated by the key attributes (dimensions).Figur e 4. The attributes and functions used depend on the application.25 illustrates a sample diagram of star cluster schema. Classification entities should be collapsed down their hierarchies until they reach either a fork entity or a component entity.24 Star Schema with “fork” A star cluster schema is a star schema which is selectively “snowflaked” to separate out hierarchical segments or sub dimensions which are shared between different dimensions. a sub dimension table should be formed. The Figure 4. The most appropriate database schemas use a mixture of denormalized star and normalized snowflake schemas [6.Figur e 4. Figur e 4.25 Star Cluster Schema 4. Star flake Schema Starflake schema is a hybrid structure that contains a mixture of star and snowflake schemas.26 Star flake Schem a 47 .26 illustrates a sample diagram of starflake schema.9. 41]. The Figure 4.6. each cell of the cube hold one value. The star schema can adapt to changes in the user requirements. Figur e 4. A cube defines a set of related dimensions. the consistency of the database structure allows more efficient access to the data by various tools including report writers and query tools. the dimensional model is extensible. Cube Cubes are the logical storage structures for OLAP databases.Whether the schema star.27.   Ability to model common business situations Predictable query processing The following figure. as all dimensions are equivalent in terms of providing access to the fact table. Ability to handle changing requirements. adding new dimensional attributes. shows a comparison of the logical design methods in complexity versus redundancy trade-off. It must support adding new dimensions. Figure 4.  Extensibility.10. breaking existing dimension records down to lower level of granularity from a certain point in time forward. the value of each cell is an 48 . snowflake or starflake.6.   Efficiency.27 Compar ison of schemas 4. the predictable and standard form of the underlying dimensional model offers important advantages within a DW environment including. 1 2-dimensional pivot view of an OLAP Table A 3-dimensional view of an OLAP table is given in Table 4.28. 49 .1.2 3-dimensional pivot view of an OLAP Table A 3-D realization of the cube shown in Table 4.intersection of the dimensions.2 is illustrated in Figure 4.2. Table 4. Table 4. A 2-dimensional view of an OLAP table is given in Table 4. 50 . state and product. users can zoom out to see a summarized level of data. 23.  Dice: Slicing defines a sub cube by performing a selection on two or more dimensions of a cube. Slicing cuts through the cube so that users can focus on more specific perspectives. 
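The pivot views above are obtained by aggregating the fact table at whatever granularity the user selects. Against the star tables sketched earlier (names are illustrative assumptions), the cell values of such views can be produced with grouped queries like the following; moving between the two GROUP BY lists corresponds to the roll up and drill down operations discussed next.

-- Detailed level: sales by product and month (a 2-dimensional pivot view).
SELECT   p.Product_Name, t.Month_No,
         SUM(f.Sales_Units) AS Units, SUM(f.Sales_Dollars) AS Dollars
FROM     Fact_Sales f
         JOIN Dim_Product p ON p.Product_Key = f.Product_Key
         JOIN Dim_Time    t ON t.Time_Key    = f.Time_Key
GROUP BY p.Product_Name, t.Month_No;

-- Rolled up one level along each hierarchy: sales by product category and year.
SELECT   p.Category, t.Year_No, SUM(f.Sales_Dollars) AS Dollars
FROM     Fact_Sales f
         JOIN Dim_Product p ON p.Product_Key = f.Product_Key
         JOIN Dim_Time    t ON t.Time_Key    = f.Time_Key
GROUP BY p.Category, t.Year_No;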
Each dimension enables you to perform specific OLAP operations on the cube. The basic OLAP operations are as follows [10, 23, 24]:

- Roll up: an operation for moving up the hierarchy level and grouping into larger units along a dimension. The roll up operation is also called the drill up operation. Using the roll up capability, users can zoom out to see a summarized level of data.
- Drill down: an operation for moving down the hierarchy level and stepping down the hierarchy. The drill down operation is the reverse of the roll up operation. Using the drill down capability, users can navigate to higher levels of detail.
- Slice: slicing performs a selection on one dimension of a cube and results in a sub cube. Slicing cuts through the cube so that users can focus on more specific perspectives.
- Dice: dicing defines a sub cube by performing a selection on two or more dimensions of a cube.
- Pivot: the pivot operation is also called the rotate operation. Pivoting is a visualization operation which rotates the data axes in view in order to provide an alternative presentation of the data.

A detailed figure describing the operations above is illustrated in Figure 4.29.

Figure 4.29 Operations on a Cube

4.7. Meta Data

An important component of the DW environment is meta data. Meta data, or data about data, provides for the most effective use of the DW. Meta data allows the end user/DSS analyst to navigate through the possibilities; in other words, when a user approaches a data warehouse where there is no meta data, the user does not know where to begin the analysis. Meta data acts like an index to the data warehouse contents: it sits above the warehouse and keeps track of what is where in the warehouse. Typically, the items the meta data store tracks are as follows [6]:

- Structure of data as known to the programmer and to the DSS analyst
- Source data
- Transformation of data
- Data model of the DW
- History of extracts

Metadata has several functions within the DW that relate to the processes associated with data transformation and loading, DW management and query generation. The metadata associated with data transformation and loading must describe the source data and any changes that were made to the data. The metadata associated with data management describes the data as it is stored in the DW; every object in the database needs to be described, including the data in each table, index and view and any associated constraints. The metadata is also required by the query manager to generate appropriate queries.

4.8. Materialized views

Materialized-view selection approaches address the problem of choosing a set of views to materialize in a DW, taking into account [7]:

- the space allocated for materialization,
- the ability to answer a set of queries (defined against the source relations) using exclusively these views, and
- the combined query evaluation and view maintenance cost.

In this proposal a graph is defined based on states and state transitions: a state is a set of views plus a set of queries, with an associated cost, and transitions are generated when views or queries are changed. It is demonstrated that there is always a path from an initial state to the minimal cost state.

4.9. OLAP Server Architectures

Logically, OLAP engines present business users with multidimensional data from data warehouses or data marts, without concerns regarding how or where the data are stored. However, the physical architecture and implementation of OLAP engines must consider data storage issues.
Implementations of a warehouse server engine for OLAP processing include [10]: Relation al OLAP (ROLAP) servers: These are the intermediate servers that stand in between a relational back-end server and client front-end tools. They use a relational or extended-RDBMS to store and manage warehouse data, and OLAP middleware to support missing pieces. ROLAP servers include optimization for each DBMS back-end, implementation of aggregation navigation logic, and additional tools and services. ROLAP technology tends to have greater scalability than MOLAP technology. Multidimensional OLAP (MOLAP) servers: These servers support multidimensional views of data through array-based multidimensional storage engines. They map multidimensional views directly to data cube array structures. The advantage of using a data cube is that it allows fast indexing to precomputed summarized data. Notice that with multidimensional data stores, the storage utilization may be low if the data set is sparse. In such cases, sparse matrix compression techniques should be explored. Many OLAP servers adopt a two-level storage representation to handle sparse and dense data sets: the dense subcubes are identified and stored as array structures, while the sparse subcubes employ compression technology for efficient storage utilization. Hybr id OLAP (HOLAP) ser vers: The hybrid OLAP approach combines ROLAP and MOLAP technology, benefiting from the greater scalability of ROLAP and the faster computation of MOLAP. For example, a HOLAP server may allow large volumes of 54 detail data to be stored in a relational database, while aggregations are kept in a separate MOLAP store. 55 The dimensional model is a standard framework. All existing fact and dimension tables can be changed in place without having to reload data. It is also hard to query ER models because of the complexity. End user tools can make strong assumptions about the dimensional model to make user interfaces more user friendly and to make processing more efficient [20]. All dimensions can be thought as symmetrically equal entry points into the fact table. Compar ison of Dimensional Models and ER Models The main objective of ER modeling is to remove redundancy from data. many tables should be joined to obtain a result set. ER modeling aims to optimize performance for transaction processing. There is no easy way to enable endusers navigate through the data in ER models. The logical design can be made independent of expected query patterns. 56 . Dimensional model is more adaptable to unexpected changes in user behavior and requirements.1. Therefore ER models are not suitable for high performance retrieval of data. designers must use hundreds of entities and relations between entities. which makes ER model complex.CHAPTER 5 COMPARISON OF MULTIDIMENSIONAL DESIGN MODELS 5. Dimensional model is extensible to new design decisions and data elements. End user query and reporting tools are not affected by the change. To remove redundancy. An OO approach provides a tighter conceptual association between strategic business goals and objectives and the DDM model. Objects have types. 57 . inheritance and association. DDM approach lacks modeling business goals and processes. 5. which are basically reusable software packages. It starts with the specification of business goals. objects. it focuses mainly on data and its proper structuring to maximize query performance. However. A DDM provides a multidimensional conceptual view of data. relations between object and finally components. 
Although DDM is the favorite approach in data warehousing. use cases. Based on the use cases. UML has been accepted as the standard OO modeling language for describing and designing various aspects of software systems. none of them has been accepted as a standard for multidimensional modeling. Various approaches have been developed for the conceptual design of multidimensional systems in the last years to represent multidimensional structural and dynamic properties. sub processes. On the other hand. object-oriented (OO) model is much stronger in the logical and conceptual design phases. classes. Compar ison of Dimensional Models and Ob ject-Or iented Models Dimensional data modeling (DDM) is a dimensional model design methodology that for each business process.ER modeling does not involve business rules. A DDM approach is basically an approach in which tables are associated with SQL methods to support set-oriented processing of data and return result set to the caller. Using UML. Although the final logical and physical model will be a dimensional data model. it enumerates relevant measures and dimensions.2. collaboration of objects. On the other hand. methods and relations with other objects like aggregation. OO approach provides an object layer to a DW application unifying behavior and data within the object components. system actors. it involves data rules. Dimensional model involves business rules. properties (attributes). behaviors. a logical design modeled by an OO model can be mapped easily to a DDM model. then modeling of the processes and use cases. OO model allows modeling of the business process. We should consider DDM approach and OO approach as complementary to each other. 58                 DM     OO  (UML)    . Using ME/R model.2. Compar ison of Conceptual Multidimensional Models In sections 5. ER Standard notation Business rules focus Data rules focus Ability to model all business requirements in detail Specialization / Generalization Commercial case tool availability High association with business objectives Adaptability to changing requirements Table 5. DM and OO meth odologies 5.1 and 5.3. 19]. This section gives a comparison of conceptual multidimensional models according multidimensional modeling properties [13. therefore ME/R does not support this property. No functional aspect can be implemented with ME/R model. The resulting DDM model will be valid with the corresponding business requirements. 14. DM and OO methodologies according to factors.  Additivity of measures: DF. 15. the main conceptual modeling approaches are mentioned. in Table 5. DF and ME/R models do not support many-to-many relationships. I consider as the most important. In addition to the discussion above. Derived measures: None of the conceptual models include derived measures as part of their conceptual schema except OOMD model. starER and OOMD support this property. 18.1 Comp ar ison of ER.   Many-to-many relationships with dimensions: StarER and OOMD support this property.1. 17. I summarized the comparison of ER. 16. only static data structure can be captured.objects are modeled. it has support for dimension categorization. Note that specialization/generalization is a basic aspect of object-orientation. starER Additivity of measures Many-to-many with dimensions Derived measures Nonstrict and  complete            relationships  DF   ME/R   OOMD   classification hierarchies Categorization of dimensions  59 . 
Since OOMD model is object-oriented. Nonstrict and complete classification hierarchies: Although DF. 30]. Only ME/R model provides state diagrams to model system’s behavior and provides a basic set of OLAP operations to be applied from these user requirements.  Case tool support: All conceptual design models except starER have case tool support. Rational Rose or GOLD case tools [18. starER and ME/R can define certain attributes for classification hierarchies. the conceptual design may be implemented using Microsoft® Visio. Conceptual design using ME/R approach can be implemented using GramMi case tool [29]. OOMD can represent nonstrict and complete classification hierarchies. Since starER and ME/R models derive from ER model. these two models use is-a relationship to categorize dimensions. The Table 5. Conceptual design using DF approach can be implemented using the WAND case tool [31].  Graphic notation and specifying user requirements: All modeling techniques provide a graphical notation to help designers in conceptual modeling phase. only starER model can define exact cardinality for nonstrict and complete classification hierarchies.2 summarizes the comparison given above. With the OOMD approach.  Categorization of dimensions (specialization/generalization): DF does not support this property. OOMD provide complete set of UML diagrams to specify user requirements and help define OLAP functions. Because many queries will access large amounts of data and involve multiple join operations. In terms of usability. the snowflake schema is more reusable than star and fact constellation schemas. most optimizers recognize star schemas and can generate efficient “star join” operations. Considering reusability.2 Compar ison of conceptu al design models 5. First. star schema. a design with denormalized tables need fewer joins. a number of advantages may be considered for star schema design approach. Similarly.(specialization/generalization) Graphic notation Specifying user requirements Case tool support         MS Visio Table 5. Dimension tables in a snowflake schema do not contain 60 WAND GRAMMI GOLD. snowflake schema and the fact constellation schema are the mostly used models commercially. Compar ison of L ogical Design Models Among the logical design models. users need to execute fewer join operations which makes it easier to formulate analytic queries. In some cases where the denormalized dimension tables in star schema becomes very large. A fact constellation schema may need more join operations on fact tables. A fact constellation schema is a set of star schemas with hierarchically. Rational Rose. Second. The star schema is the simplest structure among the three schemas. It is easier to learn star schema compared to other two schemas. I want to compare these three models in terms of efficiency. Fact constellation schema and snowflake schema are more complex than star schema which is a disadvantage in terms of usability. A DW is usually a very large database. usability. I think efficiency is the most important factor in DW modeling. reusability and flexibility quality factors. Because a star schema has the fewest number of tables.4. A star schema is generally the most efficient design for two reasons. . In this section. efficiency becomes a major consideration [22]. a snowflake schema may be the most efficient design approach. a snowflake schema will require more joins on dimension tables. C#. This makes it easier to share dimension tables between snowflake schemas in a DW. 
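The join-count argument used in this comparison can be seen directly in the shape of a typical analytic query: in a star schema each dimension is one join away from the fact table, while a snowflaked dimension adds one join per hierarchy level. A sketch using the assumed tables from the earlier examples:

-- Star schema: one join per dimension (denormalized Dim_Product holds the category).
SELECT   t.Year_No, p.Category, SUM(f.Sales_Dollars) AS Dollars
FROM     Fact_Sales f
         JOIN Dim_Time    t ON t.Time_Key    = f.Time_Key
         JOIN Dim_Product p ON p.Product_Key = f.Product_Key
GROUP BY t.Year_No, p.Category;

-- Snowflake schema: the same question needs extra joins to walk the normalized
-- Product -> SubCategory -> Category hierarchy sketched earlier.
SELECT   t.Year_No, c.Category_Name, SUM(f.Sales_Dollars) AS Dollars
FROM     Fact_Sales f
         JOIN Dim_Time                t  ON t.Time_Key         = f.Time_Key
         JOIN Dim_Product             p  ON p.Product_Key      = f.Product_Key
         JOIN Dim_Product_SubCategory sc ON sc.SubCategory_Key = p.SubCategory_Key
         JOIN Dim_Product_Category    c  ON c.Category_Key     = sc.Category_Key
GROUP BY t.Year_No, c.Category_Name;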
These tools can generate code that may be used in the development phase in forms of VB. Discussion on Data War ehousing Design Tools There are CASE tools that enable a user to design DW architecture.5. a star schema is more flexible in adapting to changes in user requirements. Using these CASE tools. a designer can also generate databases via the tool and reverse engineer databases into model using the design model diagrams. dimension tables are denormalized and this makes it less convenient to share dimension tables between schemas. In terms of flexibility. Unfortunately.3 summarizes the comparison of the three logical design models in terms of quality factors. as all dimensions are equivalent in terms of providing access to the fact table.NET. Table 5. Star Schema Snowflake Schem a Fact Constellation Schema Efficiency Usability Reusability Flexibility High High Low High Low Low High Low Moderate Moderate Moderate Moderate Table 5.denormalized data.3 Compar ison of logical design models 5.3. Some of the CASE tools are mentioned in section 5. the development of the CASE tools on the data warehousing area is not as mature as the development in ER and software modeling areas. Very few commercially available tools may help in designing data warehousing solutions and may still not cover the requirements you need. The star schema can adapt to changes in the user requirements easier. Current commercial CASE tools have great design options to model that enable modelers to model databases and software solutions even in the enterprise level. C++ and may reverse engineer a given source code project into software models. In star schema and fact constellation schema design approaches. 61 . There are some factors in which data warehouse software developers must consider to make more use of existing software development tools. rather than just being a tool. ERWin or WarehouseArchitect may be used. Unfortunately. It should be addressed that although these CASE tools may support generating ER databases from models and reverse engineering from an ER generated database into an ER model. a designer/developer should use the existing CASE tools for the development of a DW.  Database Design: Most of the commercial CASE tools enable designing ER databases. Recently someone asked me: “What is the difference between a methodologist and a terrorist?” I answered. we should not expect significant automatic conversion of the most complex business rules into code using the present commercial CASE tools. The widely used ETL product is 62 .A CASE tool without basing the modeling notation on UML may never cover the needs of data warehousing design. “At least you can negotiate with a terrorist. These topics are summarized as follows:  Definition and declaration of business rules: Communication and documentation are very important for data warehouse business rules design. Note that. For designing a database Microsoft® Visio. these tools do not provide a complete solution for an OLAP solution. Finally. Unfortunately. Complex business rules still remain in a documentation-only form. But there are still solutions using the existing CASE tools in data warehousing arena.” “. So often I see a methodology becoming the goal of a project. a case tool for a complete data warehousing design is not available.  ETL Process: The commercially available tools for ETL process meets the requirements needed by this process. The best CASE tool meeting this requirement may be Microsoft® Visio 2003 Enterprise Edition. 
and probably will only be enforced by a human designer. As Kimball mentions [27] “ I am cautious about recommending methodologies and CASE tools from traditional environments for data warehouse software development. the main purpose of UML is to “represent business rules” [26]. Methodologies scare me the most. Sybase. etc. Microsoft® SQL Server 2000. Microsoft turned over the content of its proprietary repository design to the Metadata Coalition. As an example. an Microsoft® Access database. the XML web services appeared. Most analysts believe that Microsoft’s repository effort is the one most likely to succeed. Microsoft and many other vendors are actively programming their tools to read and write from the Microsoft Repository. and is accessible to any number of potentially disparate systems using ubiquitous Internet standards. such as XML and HTTP” [28] . This tool can accept any data source that has an ODBC or OLEDB provider. But.  Communication: While the nature of data warehousing requires the need of consolidating data from heterogeneous data sources. intended to support software development and data warehousing” [27]. such as application logic. 63 . In this thesis. a data source may be Microsoft® SQL Server 7. an Microsoft® Excel spreadsheet (all versions).  Metadata Repository Design: Vendors like Microsoft. the world needs a single standard instead of multiple repository definitions. “ An XML Web service is a programmable entity that provides a particular element of functionality. DB2. Web services change the nature of DWs. IBM. As XML become a standard of communication.0. and worked with a group of vendors to define the Open Information Model (OIM). Oracle.Microsoft® Data Transformation Services (DTS) which is a built-in feature of Microsoft® SQL Server 2000. DTS allows defining ETL processes on both its built-in designer or using the COM API of DTS or using scripting language. data sources may become homogenous. the ETL process of the sample DW solution is implemented using DTS and described in section 6.5. “ In 1997. a text file. Oracle have all either defined a global metadata repository format or promised the market that they will do so in the future. 1. The DW is designed using snowflake schema. The aim of this case study is to build a DW to enable analysis of the data in the OLTP database. The conceptual design is modeled using OOMD. And finally this case study illustrates a implementation of a data warehousing solution covering all phases of design. A database is designed for simulating an OLTP application.CHAPTER 6 IMPLEMENTING A DATA WAREHOUSE 6. ME/R and DF models to give example designs for all approaches.1. A Case Study This section describes a case study of implementing a DW. The ER model of the OLTP database that simulates the basis for our DW is shown in Figure 6. starER. 64 . Data Warehousing assembles components.Figur e 6. OOMD Appr oach Data warehousing has largely developed with little or no reference to objectoriented software engineering. while restricting explicit object-orientation to the various GUI. activity. web-enabling.1 ER model of sales and shipping systems 6. and GUI frontend. could get by with a pragmatic "systems integration" orientation at the global level of component interaction. analysis. The days of two-tier client/server-based data warehouses are gone now. This is consistent with (a) its development out of two-tier client/server relational database methodology. and reporting tools that comprise the data warehousing arsenal. middleware.2. 
65 . limited middleware. The dynamic of the data warehousing business and its own development of new technology. data transformation. rather than software development. and (b) its character as a kind of highlevel systems integration. rather than creating them. The initial top-down centralized data warehousing model with its single database server. and with internet. intranet. I strongly believe it is more convenient to model complex software systems with OO design approach using UML. and by an increasing complexity of object relations.3 shows the statechart diagram and Figure 6. Figure 6. with design and metadata repositories. and the evolutionary development of this solution over time. Figur e 6.2 Use case diagr am of sales and shipping system 66 . I would like to introduce some sample diagrams for the sales and shipping system using the OO conceptual modeling approach.4 illustrates the static structure diagram of sales and shipping system using UML notation. Data warehouse development now requires the creation of a specific distributed object/component solution. Moreover this reality is one of physically distributed objects across an increasingly complex network architecture. The two-tier client/server paradigm has given way to a multi-tier reality characterized by an increasing diversity of objects/components. rather than the creation of a classical two-tier client/server application [25]. and extranet front-ends. with specialized application servers. with data stores of diverse type and content. Figure 6. Respectively.2 illustrates the use case diagram.has caused centralized data warehouses to be increasingly supplemented with data marts. before going to implementation details. So. A use case is a set of events that occurs when an actor uses a system to complete a process.Use case diagrams are used to describe real-world activities.3 Statechar t diagr am of sales and shipping system Statechart diagrams are used to show the sequence of states an object goes through during its life.4 Static str uctur e diagr am of sales and shipping system 67 . Normally. a use case is a relatively large process. Figur e 6. Figur e 6. 3. “Sales” and “Shipping” classes form the fact tables. The items respresented as ovals are attributes. This diagram can easily be mapped to MD model. Figure 6.Static structure diagrams are used to create conceptual diagrams that represent concepts from the real world and the relationships between them.     The items represented as circles are fact sets. The following list describes the items in the starER model.4 forms the basis of the MD model. “Region”. 6. “Shipper”. 68 .5 illustrates the sales subsystem starER model. The items represented as diamonds are relationship sets. star ER Appr oach In this section. One package can contain subordinate packages. Collaboration diagrams are used to show relationships among object roles such as the set of messages exchanged among the objects to achieve an operation or result. UML modeling has also the following diagrams that give modeler great flexibility and helps understandability during the conceptual and logical design phases. Static structure diagram in Figure 6. or single elements. diagrams. the conceptual model is designed using starER approach. “Product”. Package diagrams are used to group related elements in a system. “Employee”. Component diagrams are used to partition a system into cohesive components and show the structure of the code itself. “State”. or class diagrams that decompose a software system into its parts. 
Sequence diagrams are used to show the actors or objects participating in an interaction and the events they generate arranged in a time sequence. Deployment diagrams are used to show the structure of the run-time system and communicate how the hardware and software elements that make up an application will be configured and deployed. The items represented as rectangles are entity sets. “Product_SubCategory” and “Product_Category” classes form the dimension tables. ”Country”. Activity diagrams are used to describe the internal behavior of a method and represent a flow driven by internally generated actions. Figur e 6. Non-complete membership is shown in these dimensions.5 Sales subsystem star ER model Figure 6. time and state).6 illustrates the shipping subsystem starER model. product and state dimensions contain hierarchies that enable summarization of the fact set by different granularities. 69 .The model contains one fact set (sales) and four dimensions (employee. The time. product. Each dimension is 70 . day and employee. The fact relationship in the middle of the diagram connects the atomic dimension levels. The rolls-up relationships (arrow shapes) are shown in the model below. product.6 Shipping subsystem star ER model The model contains one fact set (shipping) and five dimensions (shipper. It is relatively easy and straightforward to design the conceptual model using starER modeling technique having the ER of the OLTP system. time and state_from.4. product. Noncomplete membership is shown in these dimensions.7 illustrates the sales subsystem with ME/R approach. The time. product and states dimensions contain hierarchies that enable summarization of the fact set by different granularities. 6. ME/R Appr oach In this section. the conceptual model is designed using ME/R approach. Figure 6. Fact relationship connects the sales fact with the dimensions state. state_to). The first design step is determining dimensions and facts.Figur e 6. The actual facts (sales units.g. Additional attributes of a dimension level (e.g. The dimension hierarchies are shown by the rolls-up relationships (e.7 Sales subsystem ME/R model Figure 6. sales dollars) are modelled as attributes of the fact relationship. Product rolls-up to product_subcategory and product_category).represented by a subgroup that starts at the corresponding atomic level.8 illustrates the shipping subsystem with ME/R approach and is similar to the sales subsytem. employee_id and employee_name of an employee) are depicted as dimension attributes of the corresponding level. Figur e 6. 71 . 8 Shipping subsystem ME/R model 6. Subtrees rooted in dimensions are hierarchies. are attributes (product_ID). The dimensions in the sales model are product.5. the conceptual model is designed using DF approach. 72 . DF Appr oach In this section. state and employee. time. Each vertex directly attached to the fact is a dimension. The vertices in the fact schema represented by lines instead of circles are non-dimension attributes (product_name). sales_units). their arcs represent -to-one relationships between pairs of attributes.Figur e 6.9 illustrates the sales subsystem with DF approach. The sales fact scheme is structured as a tree whose root is the sales fact. Their vertices. represented by circles. The fact is illustrated as a rectangle and contains the fact name (sales) and measures (sales_dollars. Figure 6. 9 Sales subsystem DF model Figure 6.10 illustrates the shipping subsystem with DF approach.Figur e 6. 
Figur e 6.10 Shipping subsystem DF model 73 . the sample OLTP database I have prepared as the data source of my sample DW is completely formed of normalized tables and therefore using snowflake schemas in the design of the DW is more applicable and easier to implement. there are two fact tables. I have chosen snowflake schema for the implementation of the DW.6. Implementation Details In this section. using the OOMD approach and taking the static structure diagram in Figure 6. The sample DW is implemented using Microsoft® OLAP Server [8]. 74 . Figur e 6.6. The main reason for choosing the snowflake schema is that. The snowflake schema for the sales subsystem is illustrated in Figure 6.11.4 as the basis.11 Snowflake schema for the sales subsystem. the physical implementation of the sample DW is given. “sales” and “shipping”. In the model. The snowflake schema for the shipping subsystem is illustrated in Figure 6. Figur e 6. transform. The diagram in Figure 6. DTS tasks. A DTS package is an organized collection of connections. DTS is a set of graphical tools and programmable objects that let the designer extract.13 Gener al ar chitectur e of the case study The ETL process is implemented using Microsoft® Data Transformation Services (DTS). DTS 75 .12. Figur e 6.13 illustrates the general architecture of the case study. and consolidate data from different sources into single or multiple destinations.12 Snowflake schema for the shipping subsystem. and workflow constraints assembled either with a DTS tool or programmatically and saved to Microsoft® SQL Server. an Access database. Other data sources provided by third-party vendors. Microsoft® Active Directory and other nonrelational data sources. The source data is not changed. Microsoft® SQL Server 2000 Meta Data Services. and notifies other users or processes of events. password protected. When executed. using the built-in DTS flat file OLE DB provider. In the sample implementation. Microsoft® Exchange Server. HTML. The particular substring function is the transformation mapped onto the source column. using the Microsoft® OLE DB Provider for ODBC. DTS is based on an OLE DB architecture that allows copying and transforming data from a variety of data sources. transforms data. Some of these are listed below:       SQL Server and Oracle directly. Text files. the package connects to the correct data sources. the sales fact table and the shipping fact table are populated from a delimited text file. Microsoft® Excel 2000. and additional file data sources. and conversions during the import and export process.14 and Figure 6. Paradox. A DTS transformation is one or more functions or operations applied against a piece of data before the data arrives at the destination. Microsoft® Access 2000. The user also can search for rows with certain characteristics (for example. an Excel spreadsheet and a SQL Server database.15 respectively. or a Microsoft® Visual Basic file. The DTS packages implemented are shown in Figure 6. dBase.transformations. Microsoft® Visual FoxPro. Transformations make it easy to implement complex data validation. copies data and database objects. specific data values in columns) and apply functions only against the data in those rows. data scrubbing. 76 . Each package contains one or more steps that are executed sequentially or in parallel when the package is run. ODBC sources. Packages can be edited. For example. scheduled for execution. and retrieved by version. using native OLE DB providers. 
The diagram in Figure 6.13 illustrates the general architecture of the case study.

Figure 6.13 General architecture of the case study

The ETL process is implemented using Microsoft® Data Transformation Services (DTS). DTS is a set of graphical tools and programmable objects that let the designer extract, transform, and consolidate data from different sources into single or multiple destinations. A DTS package is an organized collection of connections, DTS tasks, DTS transformations, and workflow constraints assembled either with a DTS tool or programmatically and saved to Microsoft® SQL Server, Microsoft® SQL Server 2000 Meta Data Services, a structured storage file, or a Microsoft® Visual Basic file. Each package contains one or more steps that are executed sequentially or in parallel when the package is run. When executed, the package connects to the correct data sources, copies data and database objects, transforms data, and notifies other users or processes of events. Packages can be edited, password protected, scheduled for execution, and retrieved by version.

DTS is based on an OLE DB architecture that allows copying and transforming data from a variety of data sources. Some of these are listed below:

- SQL Server and Oracle directly, using native OLE DB providers.
- ODBC sources, using the Microsoft® OLE DB Provider for ODBC.
- Microsoft® Access 2000, Microsoft® Excel 2000, Microsoft® Visual FoxPro, Paradox, dBase, HTML, and additional file data sources.
- Text files, using the built-in DTS flat file OLE DB provider.
- Microsoft® Exchange Server, Microsoft® Active Directory, and other nonrelational data sources.
- Other data sources provided by third-party vendors.

A DTS transformation is one or more functions or operations applied against a piece of data before the data arrives at the destination; the source data is not changed. Transformations make it easy to implement complex data validation, data scrubbing, and conversions during the import and export process. For example, the user can extract a substring from a column of source data and copy it to a destination table; the particular substring function is the transformation mapped onto the source column. The user can also search for rows with certain characteristics (for example, specific data values in columns) and apply functions only against the data in those rows.

In the sample implementation, the sales fact table and the shipping fact table are populated from a delimited text file, an Access database, an Excel spreadsheet and a SQL Server database. The DTS packages implemented are shown in Figure 6.14 and Figure 6.15 respectively.

Figure 6.14 Sales DTS Package

Figure 6.15 Shipping DTS Package

The arrow from the delimited sales text file to the DW data source indicates the transformation from the text file to the "SalesFact" table. The transformation is a copy column transformation, with the details shown in Figure 6.16.

Figure 6.16 Transformation details for delimited text file

Transformations from Microsoft® Excel and Microsoft® Access are similar to the transformation from the delimited text file. The transformation from the SQL Server database is a little different: the table with the data is not loaded as it is, but with a custom query. The detail of this transformation is shown in Figure 6.17. As seen in the figure, the transformation source is a custom Transact-SQL query with grouping and aggregation of data.

Figure 6.17 Transact-SQL query as the transformation source

Another feature of DTS is lookup queries. Lookup queries allow running queries and stored procedures against other connections besides the source and destination. For example, by using a lookup query, you can make a separate connection during a query and include data from that connection in the destination table. Lookup queries are especially useful in validating input data before loading it.
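For illustration only, the sketch below shows the two kinds of queries described above: a grouping and aggregation query of the sort used as the transformation source for the SQL Server connection, and a parameterized lookup query; DTS lookup queries use a question mark as the placeholder that is filled in for each row by the calling transformation. The table and column names are assumptions, not the exact ones used in the packages.

-- Illustrative transformation source: group and aggregate the staged
-- sales rows before they are written to the fact table.
SELECT Product_ID,
       Time_ID,
       Employee_ID,
       State_ID,
       SUM(Quantity)   AS Sales_Units,
       SUM(Line_Total) AS Sales_Dollars
FROM Staging_Sales
GROUP BY Product_ID, Time_ID, Employee_ID, State_ID;

-- Illustrative lookup query: resolve a state name from the source into the
-- key stored in the DW, using a separate connection; the "?" is supplied
-- per row by the transformation that calls the lookup.
SELECT State_ID
FROM [State]
WHERE State_Name = ?;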
After populating the data from the different sources and transforming and loading it into the DW using DTS, we need some client application that enables end users to query the DW. In the sample implementation I used Microsoft® Excel and Microsoft® Data Analyzer as the client applications, relying mainly on the pivot table and pivot chart technologies. Figure 6.18 shows a snapshot of the pivot chart view of the sales cube.

Figure 6.18 Pivot Chart using Excel as client

Figure 6.19 shows the pivot table view of the sales cube.

Figure 6.19 Pivot Table using Excel as client

Figure 6.20 shows the sales cube using Microsoft® Data Analyzer.

Figure 6.20 Data Analyzer as client

Microsoft® Data Analyzer is business-intelligence software that enables you to apply the intelligence of an organization to the challenges of displaying and analyzing data in a quick and meaningful manner. Data Analyzer accomplishes this by giving the user a complete overview in one screen, which puts knowledge directly into the hands of the people who need it most: decision-makers at all levels in the organization. By using Data Analyzer, non-technical business users can get answers to their questions immediately and independently, and power users can do more advanced analysis in less time. Data Analyzer provides a number of advantages for displaying and analyzing the data:

- A complete overview on a single screen replaces masses of grids, graphs, and reports, which helps the user to quickly find hidden problems, opportunities, and trends.
- Multidimensional views show relationships throughout all aspects of your business.
- The dynamic use of color highlights anomalies, opportunities, and trends.
- Customizable displays guide the user to hidden problems, opportunities, and trends.
- Saving and reporting functions allow the user to save views for future use and export them to Microsoft® PowerPoint or Microsoft® Excel.

CHAPTER 7

CONCLUSIONS AND FUTURE WORK

Successful data management is an important factor in developing support systems for the decision-making process. Traditional database systems, called operational or transactional, do not satisfy the requirements for data analysis of the decision-making users. Operational databases contain detailed data, do not include historical data, and, since they are usually highly normalized, they perform poorly for complex queries that need to join many relational tables or to aggregate large volumes of data. An operational database supports daily business operations, and the primary concern of such a database is to ensure concurrent access and recovery techniques that guarantee data consistency.

A DW represents a large repository of integrated and historical data needed to support the decision-making process. The structure of a DW is based on a multidimensional model. This model includes measures that are important for analysis, dimensions allowing the decision-making users to see these measures from different perspectives, and hierarchies supporting the presentation of detailed or summarized measures. The characteristics of a multidimensional model specified for the DW can be applied to a smaller structure, a data mart, which is different from the DW in the scope of its analysis: a data mart refers to a part of an organization and contains a limited amount of data.

DWs have become the main technology for DSS. DSSs require not only the data repository represented by the DW, but also the tools that allow analysing the data. These tools include different kinds of applications. OLAP tools are based on multidimensional concepts similar to the DW multidimensional model, using for this purpose measures, dimensions, and hierarchies. OLAP tools can manage high volumes of historical data, allowing for dynamic data manipulations and flexible interactions with the end-users through the drill-down, roll-up, pivoting, and slicing-dicing operations. On the other hand, software that includes statistics and data mining techniques offers complex analysis of a large volume of data, for example to identify profiles, behaviour, and tendencies. If the DW data structure has a well-defined multidimensional model, it is easier to fully exploit the capabilities of OLAP tools.

In this thesis, widely accepted conceptual and logical design approaches in DW design are discussed. In the conceptual design phase the DF, starER, ME/R and OOMD design models are compared. OOMD supports the conceptual design phase with a rich set of diagrams that enables the designer to model all the business information and requirements using a CASE tool with UML. The OO design model is significantly better than the other design approaches. The OOMD design model meets the following factors, while the others lack one or more:

- Additivity of measures
- Many-to-many relationships with dimensions
- Derived measures
- Nonstrict and complete classification hierarchies
- Categorization of dimensions (specialization/generalization)
- Graphic notation
- Specifying user requirements
- CASE tool support

In the logical design phase the flat, terraced, star, snowflake, galaxy, fact constellation, star cluster and starflake schemas are discussed. Among these logical design models, the star schema, the snowflake schema and the fact constellation schema are the most commonly used models commercially. These three models are compared in terms of the efficiency, usability, reusability and flexibility quality factors, among which efficiency is the most important one considering DW modeling.
Considering these factors and the requirements of the business, and considering the trade-off between redundancy and query performance, either the snowflake or the star schema may be the best choice in the design.

As mentioned in the thesis, data warehousing is a complete process starting with data acquisition, continuing with the design and storage of the data, and finally enabling end users to access this data. It is important for a platform to be able to offer a complete solution in data warehousing. Note that in this thesis the data warehousing application is implemented using Microsoft technologies: for example, for the data acquisition phase Microsoft® DTS, for the design and storage phase Microsoft® SQL Server Analysis Services, and finally for end user access Microsoft® Excel and Microsoft® Data Analyzer are used. Likewise, there are a number of OLAP servers, like Microsoft® SQL Server Analysis Services, Hyperion Essbase, DB2 OLAP Server and PowerPlay.

There are CASE tools that enable a user to design a DW architecture. Unfortunately, the development of CASE tools in the data warehousing area is not as mature as the development in the ER and software modeling areas. Current commercial CASE tools have great design options that enable modelers to model databases and software solutions even at the enterprise level. However, very few commercially available tools may help in designing data warehousing solutions, and they may still not cover the requirements you need; a CASE tool for a complete data warehousing design is not available. The main purpose of UML is to "represent business rules", and a CASE tool without basing its modeling notation on UML may never cover the needs of data warehousing design. But there are still solutions using the existing CASE tools in the data warehousing arena. Two of the commercial CASE tools that support OOMD design are Microsoft® Visio and Rational Rose. The sample OOMD model in the thesis is implemented using Microsoft® Visio.

7.1. Contributions of the Thesis

This thesis contributes to both theory and practice of data warehousing. The data warehouse design models and approaches in the literature are researched and grouped according to the phase in the project development cycle, and these models are refined according to their acceptance in the literature. The first contribution of the thesis is the comparison of the methodologies, namely E/R, DM and OO. The second contribution of the thesis is the comparison of the conceptual design models. The third contribution of the thesis is the comparison of the logical design models in terms of the quality factors. The fourth contribution of the thesis is a case study covering the phases of a data warehousing solution; according to my research, very few articles point to an implementation covering all phases of data warehousing. There is no complete study in the literature on DW models providing a mapping of models to development phases and giving a comparison of the models according to these phases, so the last contribution of the thesis is providing a complete study on these missing points. In addition, the issues and problems identified in this thesis that might impact the data warehouse implementation will enable project managers to develop effective strategies for successfully implementing data warehouses.

7.2. Future Work

One possible future work may be comparing the physical design models for data warehousing solutions and extending the case study to cover these physical design approaches. Since a CASE tool for a complete data warehousing design is not available, another future work may be implementing a CASE tool meeting the requirements of a data warehousing solution. Another future work may be implementing a more complex case study using real world application data. Another future work may be improving the comparison of logical design models by covering both more models and more quality factors in the comparison. Also, performance tests using the three logical models compared may be performed to support the comparison of the logical design models presented in the thesis.
REFERENCES

[1] Romm M., Introduction to Data Warehousing, San Diego SQL User Group.
[2] Goyal N., Data Warehousing and Data Mining, BITS Pilani, Lecture Notes.
[3] Franconi E., Introduction to Data Warehousing, Lecture Notes, 2002. http://www.inf.unibz.it/~franconi/teaching/2002/cs636/
[4] Pang L., Leslie Pang Web Site and Lecturer Notes.
[5] Gatziu S. and Vavouras A., Data Warehousing: Concepts and Mechanisms, 1999.
[6] Thomas Connolly and Carolyn Begg, "Database Systems", 3rd Edition, Addison-Wesley, 2002.
[7] Gatierrez A. and Marotta A., An Overview of Data Warehouse Design Approaches and Techniques, Uruguay, 2000.
[8] Reed Jacobson, "Microsoft® SQL Server 2000 Analysis Services", ISBN 0-7356-0904-7, 2000.
[9] Rizzi S., Open Problems in Data Warehousing, DMDW 2003, Berlin, Germany. http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-77/
[10] J. Han and M. Kamber, "Data Mining: Concepts and Techniques", Chapter 2: Data Warehouse and OLAP Technology for Data Mining, Barnes & Nobles, 2000.
[11] W. H. Inmon, "Building the Data Warehouse", 3rd Edition, John Wiley, 2002.
[12] Moody D. L. and Kortink M. A. R., From Enterprise Models to Dimensional Models: A Methodology for Data Warehouse and Data Mart Design, DMDW 2000, Stockholm, Sweden. http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-28/
[13] Tryfona N., Busborg F. and Christiansen J., starER: A Conceptual Model for Data Warehouse Design, Proceedings of the ACM 2nd International Workshop on Data Warehousing and OLAP (DOLAP'99), 1999.
[14] Sapia C., Blaschka M., Höfling G. and Dinter B., Extending the E/R Model for the Multidimensional Paradigm, Proceedings of the 1st International Workshop on Data Warehousing and Data Mining (DWDM'98), 1998.
[15] Golfarelli M., Maio D. and Rizzi S., Conceptual Design of Data Warehouses from E/R Schemas, Proceedings of the 31st Hawaii International Conference on System Sciences (HICSS-31), 1998.
[16] Golfarelli M., Maio D. and Rizzi S., The Dimensional Fact Model: A Conceptual Model for Data Warehouses, International Journal of Cooperative Information Systems (IJCIS), Vol. 7, 1998.
[17] Golfarelli M. and Rizzi S., A Methodological Framework for Data Warehouse Design, Proceedings of the ACM DOLAP'98 Workshop, 1998.
[18] Lujan-Mora S., Trujillo J. and Song I., Multidimensional Modeling with UML Package Diagrams, 21st International Conference on Conceptual Modeling (ER2002), 2002.
[19] Trujillo J. and Palomar M., An Object Oriented Approach to Multidimensional Database Conceptual Modeling (OOMD), Proceedings of the 1st International Workshop on Data Warehousing and OLAP (DOLAP'98), 1998.
[20] Kimball R., "The Data Warehouse Toolkit", John Wiley, 1996.
[21] Kimball R., "A Dimensional Modeling Manifesto", DBMS Magazine, Aug 1997. http://www.dbmsmag.com/9708d15.html
[22] Martyn T., Reconsidering Multi-Dimensional Schemas, SIGMOD Record, Vol. 33, No. 1, 2004.
[23] Elmasri R. and Navathe S., "Fundamentals of Database Systems", 3rd Edition, Addison-Wesley, 2000.
[24] Ballard C., Herreman D., Schau D., Bell R., Kim E. and Valencic A., "Data Modeling Techniques for Data Warehousing", IBM Redbook, IBM International Technical Support Organization, 1998.
[25] Firestone J., Object-Oriented Data Warehousing, 1997.
[26] Kimball R., Enforcing the Rules, 2000. http://www.intelligententerprise.com/000818/webhouse.jhtml
[27] Kimball R., The Software Developer in Us, 2000. http://www.intelligententerprise.com/000908/webhouse.jhtml?_requestid=380244
[28] Microsoft Developer Network (MSDN) Library, XML Web Services Overview, October 2004.
[29] Hahn K., Sapia C. and Blaschka M., Automatically Generating OLAP Schemata from Conceptual Graphical Models, Proceedings of the ACM 3rd International Workshop on Data Warehousing and OLAP (DOLAP 2000), 2000.
[30] Mora-Lujan S. and Trujillo J., Multidimensional Modeling Using UML and XML, Proceedings of the 16th European Conference on Object-Oriented Programming (ECOOP 2002), 2002.
[31] Golfarelli M. and Rizzi S., WAND: A Case Tool for Data Warehouse Design, Demo Proceedings of the 17th International Conference on Data Engineering (ICDE 2001), 2001.
[32] Chaudhuri S. and Dayal U., An Overview of Data Warehousing and OLAP Technology, ACM SIGMOD Record, Vol. 26, 1997.
[33] Golfarelli M. and Rizzi S., Designing the Data Warehouse: Key Steps and Crucial Issues, Journal of Computer Science and Information Management, 1999.
[34] Phipps C. and Davis K., Automating Data Warehouse Conceptual Schema Design and Evaluation, DMDW'02, 2002.
[35] Peralta V., Marotta A. and Ruggia R., Towards the Automation of Data Warehouse Design, Technical Report, 2003.
[36] Batini C., Ceri S. and Navathe S., "Conceptual Database Design - An Entity Relationship Approach", Addison-Wesley, 1992.
[37] Abello A., Samos J. and Saltor F., A Framework for the Classification and Description of Multidimensional Data Models, Database and Expert Systems Applications, 12th International Conference, 2001.
[38] Abello A., Samos J. and Saltor F., A Data Warehouse Multidimensional Data Models Classification, 2000.
[39] Teklitz F., The Simplification of Data Warehouse Design, Sybase, 2000.
[40] Prosser A. and Ossimitz M., Data Warehouse Management, University of Economics and Business Administration, Vienna, 2000.
[41] Ahmad I. and Azhar S., Data Warehousing in Construction: From Conception to Application, First International Conference on Construction in the 21st Century (CITC2002) "Challenges and Opportunities in Management and Technology", 2002.
[42] Kimball R., Letting the Users Sleep, Part 1, DBMS, 1996. http://www.dbmsmag.com/9612d05.html
[43] Kimball R., Letting the Users Sleep, Part 2, DBMS, 1997. http://www.dbmsmag.com/9701d05.html