Data Warehouse
Chapter 3: The Data Warehouse and Design

Contents
- The data warehouse design
- The data warehouse and data models
- The data warehouse and iterative development
- Normalization and denormalization
- Snapshots
- Profile records
- Going from the DW to the operational environment
- Star joins and the multidimensional approach

The Data Warehouse Design
- Building a data warehouse has two major components:
  - The design of the interface from operational systems
  - The design of the data warehouse itself
- The warehouse is constructed in a heuristic manner: requirements only become clear once part of the data has been loaded and put to use. Based on user feedback, data is modified or added, then another part of the warehouse is built, and the cycle repeats.
- This differs from the classical requirements-driven system. That does not mean requirements are unimportant; on the contrary, they need to be anticipated.

Differences between data warehouse design and database system design

Beginning with Operational Data (3.1)
- Data is placed into the data warehouse from the operational environment.
- Merely pulling data out of the legacy environment and placing it in the data warehouse is infeasible, because the legacy data is unintegrated.
- Issue 1: Data integration
  - Inconsistent encoding (values must be encoded consistently)
  - Semantic field transformation (mapping)
  - Translation of the technology (data exists in many different formats under different DBMSs)
- Issue 2: Efficiency of accessing existing systems data
  - Has a file been scanned previously?
  - A full scan on every load is wasteful and unrealistic
- Three types of data are loaded:
  - Archival data (a one-time-only event)
  - Data currently contained in the operational environment (loaded only once)
  - Ongoing changes that have occurred in the operational environment since the last refresh (the major challenge)
- Scanning every record in the operational environment each time the data warehouse is refreshed is infeasible. Five techniques limit the amount of operational data scanned:
  - Scan data that has been time stamped: the most recent change or update to each record carries a time stamp.
  - Scan a delta file: a delta file contains only the changes made to an application as a result of its transactions.
  - Scan a log file or audit file: a log file holds essentially the same data as a delta file, but it is used mainly in the recovery process, its internal format is built for system purposes, and it contains much more information than the data warehouse developer needs.
  - Modify application code: never a popular option, because the application code is old and fragile.
  - Compare a “before” and an “after” image of the operational file: a snapshot is taken at the moment of extraction. This is complex and simply a last resort.
- Issue 3: A time-basis shift
  - Operational data: current-value, can be updated
  - Data warehouse data: updates are forbidden, and an element of time is attached
- Issue 4: Data condensation

Process/Data Models and the Architected Environment (3.2)

The Data Warehouse and Data Models (3.3)
- The data model is applicable to both the operational environment and the data warehouse environment.
- An overall corporate data model is created.

Corporate Data Models (3.3)
- The corporate data model focuses on and represents only primitive data.
- When it is transported to the operational environment, performance factors are added.
- When it is applied to the data warehouse environment:
  - Remove purely operational data
  - Add an element of time to the key
  - Add derived data where appropriate
  - Create artifacts of relationships
  - Perform stability analysis (shown below)

Stability Analysis (3.3)
- Stability analysis involves grouping attributes of data together based on their propensity for change.

The Data Warehouse Data Model (3.3.1)
- There are three levels of data modeling.
- High-level modeling (the entity relationship diagram, ERD) features entities and relationships.

Entity Relationship Diagram
- Scope of integration: defines the boundaries of the data model and must be defined before the modeling process commences.

Corporate ERD
- Reflects the different views of people across the corporation.

The Midlevel Data Model (3.3.2)
- After the high-level data model is created, the midlevel model, or DIS (Data Item Set), is established.
- A midlevel model is created for each major subject area, or entity, identified in the high-level data model.
- The midlevel model has four basic constructs:
  - A primary grouping of data: exists once, and only once, for each major subject area. It holds the attributes and keys for that subject area.
  - A secondary grouping of data: holds data attributes that can exist multiple times.
  - A connector: signifies the relationships of data between major subject areas.
  - “Type of” data: the grouping of data to the left is the supertype, and the grouping of data to the right is the subtype.
- The four constructs are used to identify the attributes of data in a data model and the relationships among those attributes.
- A pair of connector relationships is manifested for each relationship identified at the ERD level.

A Full-Blown DIS
- Secondary groupings hold data attributes that can exist multiple times.

The Corporate DIS Created from ERDs

The Physical Data Model (3.3.3)
- The low-level (physical) model is created from the midlevel data model by extending it to include keys and the physical characteristics of the model.
- It looks like a set of relational tables.
- The first step in data warehouse design is deciding on the granularity and partitioning of the data (the key structure is changed to add the element of time).
- The heart of the physical design is the usage of physical I/O, the activity that brings data into the computer from storage or sends data from the computer to storage.

The Data Model and Iterative Development (3.4)
- Why iterative development:
  - The industry track record of success strongly suggests it.
  - The end user is unable to articulate many requirements until the first iteration is done.
  - Management will not make a full commitment until at least a few actual results are tangible and obvious.
  - Visible results must be seen quickly.
- The role of the data model in iterative development:
  - It tells what needs to be done.
  - It allows the different iterations of development to be built in a cohesive manner.

Normalization and Denormalization (3.5)
- Output of the data model process: tables (keys + attributes)
- How to deal with lots of little tables:
  - Physically merge some tables to minimize I/O
  - Create an array of data

Creative Index (Creative Profile)
- Created as data is passed from the operational environment to the data warehouse environment
- Low overhead
- Creates a profile of items of interest to the end user

Snapshots in the Data Warehouse
- Each data warehouse centers around a snapshot.
- A snapshot is created when some event occurs (event-snapshot interaction):
  - Activity-generated event (random)
  - Time-generated event (predictable)
- Four basic components of a snapshot:
  - A key (identifies the snapshot)
  - A unit of time
  - Primary data that relates only to the key
  - Secondary data captured as part of the snapshot process that has no direct relationship to the primary data or key (optional)

Cyclicity of Data (3.7)
- Cyclicity of data: the length of time a change of data in the operational environment takes to be reflected in the data warehouse.
- The changes need to be moved to the data warehouse. But how soon?
- Wrinkle of time:
  - There is no rush to move the changes into the data warehouse.
  - The more tightly the two environments are coupled, the more expensive and complex the technology is.
  - It imposes a certain discipline on the environments.
  - It allows data to settle, so that adjustments can be made.

Complexity of Transformation and Integration (3.8)
- Functionality required as data passes from the operational environment to the data warehouse environment:
  - A change in technology (OS, hardware, ...)
  - Selection of data from the operational environment
  - Restructuring and conversion of operational input keys
  - Reformatting of nonkey data
  - Data cleansing
  - Merging of multiple input sources of data
  - Key resolution when multiple input files exist
  - Supplying default values
  - Data summarization
  - Tracking the renaming of data elements
  - Data format conversion
- ETL (Extract/Transform/Load) automates the process of converting, reformatting, and integrating data from the operational environment. It comes in two varieties:
  - Code-producing software: more powerful, and can access legacy data in its original format
  - Run-time software: the legacy data formats must first be unified
- ELT (Extract/Load/Transform): transformation can be done while large amounts of already loaded data are available for reference.

Profile Records (3.9)
- Data in the data warehouse contains a large amount of detail and cannot be updated; snapshots are generated instead. How should data be handled when its content changes frequently and detailed information is not required?
- A profile record is used when there is no need for historical detail of data.
- A profile record groups many different, detailed occurrences of operational data into a single record.
- Difference between a profile record and a snapshot: a profile record represents multiple events, while a snapshot represents a single event.
- Both profile records and snapshots are triggered by some event.
- Aggregation of operational data into a single record may take the following forms:
  - Values taken from operational data can be summarized.
  - Units of operational data can be tallied, where the total number of units is captured.
  - Units of data can be processed to find the highest, lowest, average, and so forth.
  - First and last occurrences of data can be trapped.
  - Data of certain types, falling within the boundaries of several parameters, can be measured.
  - Data that is effective as of some moment in time can be trapped.
  - The oldest and the youngest data can be trapped.

Managing Volume
- Profile records are a way of managing large volumes of data.
- Disadvantage: detail is lost, and with it certain capabilities or functionality of the data warehouse.
- Resolution:
  - Build the profile records iteratively
  - Create an alternative level of historical detail along with the profile record

Going from the DW to the Operational Environment (3.10)
- Natural, formal, and technologically feasible
- Direct access
  - Limitations:
    - Long response time (not suitable for online use)
    - The request for data must be minimal
    - Technology compatibility (capacity, protocol, ...)
    - The formatting of data must be nonexistent or minimal
  - Conclusion: direct access is seldom used
- Indirect access

Star Joins and Data Mart (3.11)
- The multidimensional approach is used only in data mart design, never in data warehouse design.
- The multidimensional approach entails star joins, fact tables, and dimensions.
- Once the processing requirements are known, the data mart can be shaped into an optimal star join structure.
- Creating a star join for the data warehouse is a mistake: the result is a data warehouse optimized for one community at the expense of all other communities.

Star Joins
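
As a rough illustration of the star join structure named above, the following is a minimal sketch using Python's standard `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`), their columns, and the sample rows are hypothetical and do not come from the slides. The sketch only shows the general shape: one central fact table that carries measures and foreign keys, joined to small dimension tables for a query whose processing requirement is known in advance.

```python
import sqlite3


def build_star_schema(conn: sqlite3.Connection) -> None:
    """Create a tiny, hypothetical star schema: one fact table, two dimensions."""
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name       TEXT,
            category   TEXT
        );
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY,
            year    INTEGER,
            month   INTEGER
        );
        -- The fact table holds measures plus a foreign key to each dimension.
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id    INTEGER REFERENCES dim_date(date_id),
            units      INTEGER,
            revenue    REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"), (3, "Manual", "Books")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(1, 2023, 12), (2, 2024, 1)],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(1, 1, 10, 100.0), (2, 1, 5, 75.0), (3, 2, 2, 30.0), (1, 2, 4, 40.0)],
    )


def revenue_by_category_and_year(conn: sqlite3.Connection) -> list:
    """Star join: the fact table is joined to each dimension, then aggregated."""
    return conn.execute(
        """
        SELECT p.category, d.year, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date    AS d ON d.date_id    = f.date_id
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
        """
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    for row in revenue_by_category_and_year(connection):
        print(row)
```

The schema is shaped around one anticipated query (revenue by category and year). That fits a data mart serving a single community, which is the reason the slides give for keeping star joins out of the data warehouse itself.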