Transportation: Refreshing Warehouse Data (Data Warehousing, English Oracle99 edition, course slides)

Transportation: Refreshing Warehouse Data

Overview

Objectives
After completing this lesson, you should be able to do the following:
  • Describe methods for capturing changed data
  • Explain techniques for applying the changes
  • Discuss techniques for purging and archiving data
  • Outline final tasks, such as publishing the data, controlling access, and automating processes
  • List tools for transporting data into the warehouse

Developing a Refresh Strategy for Capturing Changed Data
  • Consider load window
  • Identify data volumes
  • Identify cycle
  • Know the technical infrastructure
  • Plan a staging area
  • Determine how to detect changes
[Figure: operational databases at times T1, T2, T3]

User Requirements and Assistance
  • Users define the refresh cycle
  • IT balances requirements against technical issues
  • Document all tasks and processes
  • Employ user skills
[Figure: operational databases at times T1, T2, T3]

Load Window
  • Time available for entire ETT process
  • Plan
  • Test
  • Prove
  • Monitor
[Figure: 24-hour timeline (0, 3 am, 6, 9, 12 pm, 3, 6, 9, 12) showing the User Access Period and two Load Windows]

Load Window
  • Plan and build processes according to a strategy.
  • Consider volumes of data.
  • Identify technical infrastructure.
  • Ensure currency of data.
  • Consider user access requirements first.
  • High availability requirements may mean a small load window.
[Figure: 24-hour timeline (0, 3 am, 6, 9, 12 pm, 3, 6, 9, 12) showing the User Access Period]

Scheduling the Load Window (0 to 3 am)
[Figure: steps 1 to 4, labeled: Receive data (File 1, File 2, FTP); Control process; Requirements; Load cycle; Open and read files to verify and analyze]
Control File contents:
  • File names
  • File types
  • Number of files
  • Number of loads
  • First-time load or refresh
  • Date of file
  • Date range
  • Records in file (counts)
  • Totals (amounts)

Scheduling the Load Window (3 am to 9 am)
[Figure: steps 5 to 9, labeled: Load into warehouse (File 1, File 2, parallel load); Verify, analyze, reapply; Create summaries; Index data; Update metadata]

Scheduling the Load Window (6 am to 9 am)
[Figure: steps 10 to 13, labeled: Create views for specialized tools; Back up warehouse; Users access summary data; Publish; User access]

Capturing Changed Data for Refresh
  • Capture new fact data
  • Capture changed dimension data
  • Determine method for capture of each
  • Methods:
      • Wholesale data replacement
      • Comparison of database instances
      • Time stamping
      • Database triggers
      • Database log
      • Hybrid techniques

Wholesale Data Replacement
  • Expensive
  • Limited historical data, if any
  • Data mart implementations
  • Time period replacement

Comparison of Database Instances
  • Simple to perform, but expensive in time and processing
  • Delta file:
      • Changes to operational data since last refresh
      • Used by various techniques
[Figure: yesterday's operational database and today's operational database pass through a database comparison; the delta file holds changed data]

Time and Date Stamping
  • Fast scanning for records changed since last extraction
  • Date Updated field
  • No detection of deleted data
[Figure: operational data; delta file holds changed data]

Database Triggers
  • Changed data intercepted at the server level
  • Extra I/O required
  • Maintenance overhead
[Figure: operational server (DBMS) with triggers on server over the operational data; delta file holds changed data]

Using a Database Log
  • Contains before and after images
  • Requires system checkpoint
  • Common technique
[Figure: operational server (DBMS); log; log analysis and data extraction]

Verdict
  • Consider each method on merit.
  • Consider a hybrid approach if one approach is not suitable.
  • Consider current technical, existing operational, and current application issues.

Applying the Changes to Data
You have a choice of techniques:
  • Overwrite a record
  • Add a record
  • Add a field
  • Maintain history
  • Add version numbers

Overwriting a Record
  • Easy to implement
  • Loses all history
  • Not recommended
[Example: the record (Customer Id, John Doe, Single) becomes (Customer Id, John Doe, Married)]

Adding a New Record
  • History is preserved; dimensions grow.
  • Time constraints are not required.
  • Generalized key is created.
  • Metadata tracks usage of keys.
[Example: record 1 (Customer Id, John Doe, Single)]

Adding a Current Field
  • Maintains some history
  • Loses intermediate values
  • Is enhanced by adding an Effective Date field
[Example: the record (Customer Id, John Doe, Single) becomes (Customer Id, John Doe, Single, Married, 01-JAN-96)]

Limitations of Methods for Applying Changes
  • Complete history impossible
  • Dimensions may grow large
  • Maintenance overhead

Maintaining History
  • One-to-many relationship
  • Always retain current record
  • Consistently able to refer to record history
[Figure: tables Product, Time, Sales, CUSTOMER, HIST_CUST]

History Preserved
  • History enables realistic analysis.
  • History retains context of data.
  • History provides for realistic historical analysis.
  • Model must be able to:
      • Reflect business changes
      • Maintain context between fact and dimension data
      • Retain sufficient data to relate old to new

Version Numbering
  • Avoid double counting
  • Facts hold version number

Customer
Customer.CustId   Version   Customer Name
1234              1         Comer
1234              2         Comer

Sales
Sales.CustId      Version   Sales Facts
1234              1         11,000
1234              2         12,000

[Figure: tables Customer, Sales, Product, Time]

Purging and Archiving Data
  • As data ages, its value depreciates.
  • Remove old data from the warehouse:
      • Archive for later use
      • Purge without copy

Techniques for Purging Data
  • TRUNCATE: Retains no rollback
  • DELETE: Retains redo and rollback
  • ALTER TABLE: Removes a partition
  • PL/SQL: Uses database triggers

Techniques for Archiving Data
  • Export to dump file from tables
  • Import to tables from dump file
  • ALTER TABLE EXCHANGE partitions
[Figure: EXP, .dmp, IMP]

Verdict
  • Defined by business requirements
  • Must be managed

Final Tasks
  • Update metadata
      • ETT
      • User
  • Publish data
      • Availability
      • Changes
      • Subject area basis
  • Use database roles to prevent and allow access
[Figure: Sources, Extract, Stage, Transform, Rules, Load, Publish, Query]

Publishing Data
  • Control access using database roles
  • 24-hour operation may be requested
  • Compromise between load and access
  • Consider:
      • Staggering updates
      • Using temporary tables
      • Using separate tables

ETT Tool Selection Criteria
  • Overlap with existing tools
  • Availability of meta model
  • Supported data sources
  • Ease of modification and maintenance
  • Required fine tuning of code
  • Ease of change control
  • Power of transformation logic
  • Level of modularization
  • Power of error, exception, resubmission features
  • Intuitive documentation
  • Performance of code

ETT Tool Selection Criteria (continued)
  • Activity scheduling and sophistication
  • Metadata generation
  • Learning curve
  • Flexibility
  • Supported operating systems
  • Cost

Transportation Tools
  • Informatica OpenBridge
  • Oracle SQL*Loader
  • Gateways
  • PL/SQL
  • Precompilers
  • Platinum Technology InfoPump
  • Platinum Info Transpo
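The Comparison of Database Instances technique covered in this lesson produces a delta file from yesterday's and today's operational data. The following is a minimal Python sketch of that idea, assuming each snapshot fits in memory as a dictionary keyed by primary key. The function name compute_delta, the row layout, and the sample rows are hypothetical illustrations and are not taken from the course material.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
# Hypothetical in-memory stand-in for a table: primary key -> row.
Snapshot = Dict[int, Row]


def compute_delta(yesterday: Snapshot, today: Snapshot) -> List[Tuple[str, int, Row]]:
    """Return (operation, key, row) entries for changes since the last refresh."""
    delta: List[Tuple[str, int, Row]] = []
    # Rows present today: either new or possibly changed.
    for key, row in today.items():
        if key not in yesterday:
            delta.append(("INSERT", key, row))
        elif row != yesterday[key]:
            delta.append(("UPDATE", key, row))
    # Rows that existed yesterday but are gone today.
    for key, row in yesterday.items():
        if key not in today:
            delta.append(("DELETE", key, row))
    return delta


# Made-up sample data for illustration only.
yesterday = {
    1: {"name": "John Doe", "status": "Single"},
    2: {"name": "Sample Customer", "status": "Married"},
}
today = {
    1: {"name": "John Doe", "status": "Married"},
    3: {"name": "New Customer", "status": "Single"},
}

delta = compute_delta(yesterday, today)
for op, key, row in delta:
    print(op, key, row)
```

A full comparison like this also surfaces deleted rows, which the lesson notes time stamping cannot detect, but it has to read both snapshots in full, which matches the lesson's warning that the method is expensive in time and processing.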
