




Transportation: Refreshing Warehouse Data

Overview

Objectives
After completing this lesson, you should be able to do the following:
- Describe methods for capturing changed data
- Explain techniques for applying the changes
- Discuss techniques for purging and archiving data
- Outline final tasks, such as publishing the data, controlling access, and automating processes
- List tools for transporting data into the warehouse

Developing a Refresh Strategy for Capturing Changed Data
- Consider load window
- Identify data volumes
- Identify cycle
- Know the technical infrastructure
- Plan a staging area
- Determine how to detect changes
[Figure: operational databases captured at times T1, T2, T3]

User Requirements and Assistance
- Users define the refresh cycle
- IT balances requirements against technical issues
- Document all tasks and processes
- Employ user skills
[Figure: operational databases captured at times T1, T2, T3]

Load Window
- Time available for entire ETT process
- Plan
- Test
- Prove
- Monitor
[Figure: 24-hour timeline showing the User Access Period and the Load Window]

Load Window
- Plan and build processes according to a strategy.
- Consider volumes of data.
- Identify technical infrastructure.
- Ensure currency of data.
- Consider user access requirements first.
- High availability requirements may mean a small load window.
[Figure: 24-hour timeline showing the User Access Period]

Scheduling the Load Window
0 to 3 am (steps 1 to 4):
- Requirements
- Load cycle
- Receive data (File 1, File 2) by FTP
- Control process
- Open and read files to verify and analyze
Control File contents:
- File names
- File types
- Number of files
- Number of loads
- First-time load or refresh
- Date of file
- Date range
- Records in file (counts)
- Totals (amounts)

Scheduling the Load Window
3 am to 9 am (steps 5 to 9):
- Load into warehouse (parallel load of File 1 and File 2)
- Verify, analyze, reapply
- Index data
- Create summaries
- Update metadata

Scheduling the Load Window
6 am to 9 am (steps 10 to 13):
- Back up warehouse
- Create views for specialized tools
- Users access summary data
- Publish
- User access

Capturing Changed Data for Refresh
- Capture new fact data
- Capture changed dimension data
- Determine method for capture of each
- Methods:
  - Wholesale data replacement
  - Comparison of database instances
  - Time stamping
  - Database triggers
  - Database log
  - Hybrid techniques

Wholesale Data Replacement
- Expensive
- Limited historical data, if any
- Data mart implementations
- Time period replacement

Comparison of Database Instances
- Simple to perform, but expensive in time and processing
- Delta file:
  - Changes to operational data since last refresh
  - Used by various techniques
[Figure: database comparison of yesterday's and today's operational databases; the delta file holds changed data]

Time and Date Stamping
- Fast scanning for records changed since last extraction
- Date Updated field
- No detection of deleted data
[Figure: operational data; the delta file holds changed data]

Database Triggers
- Changed data intercepted at the server level
- Extra I/O required
- Maintenance overhead
[Figure: triggers on the operational server (DBMS) act on operational data; the delta file holds changed data]

Using a Database Log
- Contains before and after images
- Requires system checkpoint
- Common technique
[Figure: log analysis and data extraction from the log on the operational server (DBMS)]

Verdict
- Consider each method on merit.
- Consider a hybrid approach if one approach is not suitable.
- Consider current technical, existing operational, and current application issues.

Applying the Changes to Data
You have a choice of techniques:
- Overwrite a record
- Add a record
- Add a field
- Maintain history
- Add version numbers

Overwriting a Record
- Easy to implement
- Loses all history
- Not recommended
[Example: the record (Customer Id, John Doe, Single) is overwritten with (Customer Id, John Doe, Married)]

Adding a New Record
- History is preserved; dimensions grow.
- Time constraints are not required.
- Generalized key is created.
- Metadata tracks usage of keys.
[Example record: 1, Customer Id, John Doe, Single]

Adding a Current Field
- Maintains some history
- Loses intermediate values
- Is enhanced by adding an Effective Date field
[Example: the record (Customer Id, John Doe, Single) becomes (Customer Id, John Doe, Single, Married, 01-JAN-96)]

Limitations of Methods for Applying Changes
- Complete history impossible
- Dimensions may grow large
- Maintenance overhead

Maintaining History
- One-to-many relationship
- Always retain current record
- Consistently able to refer to record history
[Figure: Product, Time, Sales, and CUSTOMER tables, with HIST_CUST related to CUSTOMER]

History Preserved
- History enables realistic analysis.
- History retains context of data.
- History provides for realistic historical analysis.
- Model must be able to:
  - Reflect business changes
  - Maintain context between fact and dimension data
  - Retain sufficient data to relate old to new

Version Numbering
- Avoid double counting
- Facts hold version number

Customer
CustId  Version  Customer Name
1234    1        Comer
1234    2        Comer

Sales
CustId  Version  Sales Facts
1234    1        11,000
1234    2        12,000

[Figure: Customer, Sales, Product, Time]

Purging and Archiving Data
- As data ages, its value depreciates.
- Remove old data from the warehouse:
  - Archive for later use
  - Purge without copy

Techniques for Purging Data
- TRUNCATE: Retains no rollback
- DELETE: Retains redo and rollback
- ALTER TABLE: Removes a partition
- PL/SQL: Uses database triggers

Techniques for Archiving Data
- Export to dump file from tables
- Import to tables from dump file
- ALTER TABLE EXCHANGE partitions
[Figure: EXP, .dmp file, IMP]

Verdict
- Defined by business requirements
- Must be managed

Final Tasks
- Update metadata
  - ETT
  - User
- Publish data
  - Availability
  - Changes
  - Subject area basis
- Use database roles to prevent and allow access
[Figure: Sources, Extract, Stage, Transform, Rules, Load, Publish, Query]

Publishing Data
- Control access using database roles
- 24-hour operation may be requested
- Compromise between load and access
- Consider:
  - Staggering updates
  - Using temporary tables
  - Using separate tables

ETT Tool Selection Criteria
- Overlap with existing tools
- Availability of meta model
- Supported data sources
- Ease of modification and maintenance
- Required fine tuning of code
- Ease of change control
- Power of transformation logic
- Level of modularization
- Power of error, exception, resubmission features
- Intuitive documentation
- Performance of code
- Activity scheduling and sophistication
- Metadata generation
- Learning curve
- Flexibility
- Supported operating systems
- Cost

Transportation Tools
- Informatica OpenBridge
- Oracle: SQL*Loader, Gateways, PL/SQL, Precompilers
- Platinum Technology InfoPump
- Platinum Info Transpo
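
The "Comparison of Database Instances" and "Version Numbering" ideas above can be illustrated with a small sketch. It is not taken from the lesson: the function names `compute_delta` and `apply_as_new_records`, the in-memory dictionaries standing in for yesterday's and today's operational databases, and the customers other than John Doe are all hypothetical. It only shows the general shape of building a delta by comparing two snapshots and then applying inserts and updates as new versioned records so that history is preserved.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
Snapshot = Dict[str, Row]          # primary key -> row (assumed layout)
Delta = List[Tuple[str, str, Row]]  # (operation, key, row)


def compute_delta(yesterday: Snapshot, today: Snapshot) -> Delta:
    """Compare two snapshots and list the changes since the last refresh."""
    delta: Delta = []
    for key, row in today.items():
        if key not in yesterday:
            delta.append(("insert", key, row))
        elif row != yesterday[key]:
            delta.append(("update", key, row))
    for key, row in yesterday.items():
        if key not in today:
            # Snapshot comparison can see deletions, unlike time stamping.
            delta.append(("delete", key, row))
    return delta


def apply_as_new_records(dimension: List[dict], delta: Delta) -> None:
    """Apply inserts and updates by adding a record with the next version number."""
    for op, key, row in delta:
        if op == "delete":
            continue  # assumption: deletions are handled by the purge/archive policy
        versions = [r["version"] for r in dimension if r["cust_id"] == key]
        next_version = max(versions, default=0) + 1
        dimension.append({"cust_id": key, "version": next_version, **row})


# Hypothetical sample data; only John Doe's change of status comes from the slides.
yesterday = {
    "1234": {"name": "John Doe", "status": "Single"},
    "5678": {"name": "Jane Roe", "status": "Single"},
}
today = {
    "1234": {"name": "John Doe", "status": "Married"},
    "9012": {"name": "Ann Poe", "status": "Single"},
}

delta = compute_delta(yesterday, today)
dimension = [{"cust_id": "1234", "version": 1, "name": "John Doe", "status": "Single"}]
apply_as_new_records(dimension, delta)

for record in dimension:
    print(record)
```

Holding both snapshots in memory is what makes this approach "simple to perform, but expensive in time and processing"; a real refresh would more likely compare sorted extracts or staged tables, and fact rows would carry the same version number to avoid double counting.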