




已阅读5页,还剩15页未读, 继续免费阅读
版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1What is the difference between a data warehouse and a data mart?Dataware house:It is a collection of data marts. Represents historical data.a data warehouse is a relational database which is specially designed for analysis purpose rather then for transactional purpose.Data mart:It is a sub set of data ware housing.It can provide the data to analyze query reporting& analysis.a datmart is subject oriented database which gives the data about each and every individual department in an organisation.2how can one connect two fact tables ? is it possible ? how?This confirm dimenstion methodology.If a dimension table is connected to more then one Fact table is called confirm dimension.Fact Tables are connected by confirmed dimensions, Fact tables cannot be connected directly, so means of dimension we can connect3suppose data are coming from different locations and those data will not change . is there any need to use surrogate key ?Yes, We should use the surrogate key, here we are getting data from different locations means every one have one primary key, while transforming the data into target that time more than two key not in use so if you use surrogate key it will identified the duplicate fields in dimensional table.4what is the difference between aggregate table and fact table ? how do you load these two tables ?A fact table typically has two types of columns: those that contain numeric facts (often called measurements), and those that are foreign keys to dimension tables. A fact table contains either detail-level facts or facts that have been aggregated. Fact tables that contain aggregated facts are often called summary tables or aggregated fact. A fact table usually contains facts with the same level of aggregation. Though most facts are additive, they can also be semi-additive or non-additive. Additive facts can be aggregated by simple arithmetical addition. A common example of this is sales. Non-additive facts cannot be added at all. An example of this is averages. Semi-additive facts can be aggregated along some of the dimensions and not along others. An example of this is inventory levels, where you cannot tell what a level means simply by looking at it.5What Oracle features can be used to optimize my Warehouse system?Partition table, bitmap index, sequence ,table function ,sql loader ,function like cube ,roll_upetc.6When should you use a STAR and when a SNOW-FLAKE schema?The snowflake and star schema are methods of storing data which are multidimensional in nature (i.e. which can be analysed by any or all of a number of independent factors) in a relational database .The snowflake schema (sometimes called snowflake join schema) is a more complex schema than the star schema because the tables which describe the dimensions are normalized.Snowflake schema is nothing but one dimension table will be connected to another dimension table and so on.-Snowflake-?If a dimension is very sparse (i.e. most of the possible values for the dimension have no data) and/or a dimension has a very long list of attributes which may be used in a query, the dimension table may occupy a significant proportion of the database and snow flaking maybe appropriate.?A multidimensional view is sometimes added to an existing transactional database to aid reporting. In this case, the tables which describe the dimensions will already exist and will typically be normalized. A snowflake schema will hence be easier to implement.?A snowflake schema can sometimes reflect the way in which users think about data. Users may prefer to generate queries using a star schema in some cases, although this may or may not be reflected in the underlying organization of the database.?Some users may wish to submit queries to the database which, using conventional multidimensional reporting tools, cannot be expressed within a simple star schema. This is particularly common in data mining of customer databases, where a common requirement is to locate common factors between customers who bought products meeting complex criteria. Some snow flaking would typically be required to permit simple query tools such as CognosPower play to form such a query, especially if provision for these forms of query werent anticipated when the data warehouse was first designed.-Star-The star schema (sometimes referenced as star join schema) is the simplest data warehouse schema, consisting of a single fact table with a compound primary key, with one segment for each dimension and with additional columns of additive, numeric facts.The star schema makes multi-dimensional database (MDDB) functionality possible using a traditional relational database. Because relational databases are the most common data management system in organizations today, implementing multi-dimensional views of data using a relational database is very appealing. Even if you are using a specific MDDB solution, its sources likely are relational databases. Another reason for using star schema is its ease of understanding. Fact tables in star schema are mostly in third normal form (3NF), but dimensional tables are in de-normalized second normal form (2NF). If you want tonormalize dimensional tables, they look like snowflakes (see snowflake schema) and the same problems of relational databases arise - you need complex queries and business users cannot easily understand the meaning of data. Although query performance may be improved by advanced DBMS technology and hardware, highly normalized tables make reporting difficult and applications complex.7When should one use an MD-database (multi-dimensionaldatabase) and not a relational one?1 Because More than one dimensions can be shareble for Other Department2 The Physical Load will be less.3 Less Complexity of Fact8What is the difference between an ODS and a W/H?An ODS is an environment that pulls together, validates, cleanses and integrates data from disparate source application systems. This becomes the foundation for providing the end-user community with an integrated view of enterprise data to enable users anywhere in the organization to access information for strategic and/or tactical decision support, day-to-day operations and management reporting.The defination of Data Warehouse is as follows.?Subject-oriented, meaning that the data in the database is organized so that all the data elements relating to the same real-world event or object are linked together;?Time-variant, meaning that the changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time;?Non-volatile, meaning that data in the database is never over-written or deleted, but retained for future reporting;?Integrated, meaning that the database contains data from most or all of an organizations operational applications, and that this data is made consistent.Difference-Ods-Transactions similar to those of an Online Transaction Processing SystemData Warehouse-Queries process larger volumes of dataOds-Contains current and near current dataData Warehouse-Contains historical data.Ods-Typically detailed data only, often resulting in very large data volumesData Warehouse-Contains summarised and detailed data,generally smaller in size than on ODSOds-Real-time and near real-time data loadsData Warehouse-Typically batch data loadsOds-Generally modeled to support rapid data updateData Warehouse-Generally dimensionally modeled and tunes to optimise query performanceOds-Updated at the data field leveData Warehouse-Data is appended, not updatedOds-Used for detailed decision making andoperational reportingData Warehouse-Used for ling-term decision making andmanagement reportingOds-Knowledge workers (customer servicerepresentatives, line managers)Data Warehouse-Strategic audience (executives, businessunit management)9What is the difference between a W/H and an OLTP application?Warehouse is used for high level data analysis purpose .It is used for predictions, time series analysis, financial analysis , what -if simulations etc. Basically it is used for better decision making.OLTP is NOT used for analysis purpose.It is used for transaction and data processing.Its basically used for storing day-to-day transactions that take place in an organisation.The main focus of OLTP is easy and fast inputing of data, while the main focus in data warehouse is easy retrieval of data.OLTP doesnt store historical data.(this is the reason why it cant be used for analysis)DW stores historical data.10What is the difference between OLAP, ROLAP, MOLAP and HOLAP?ROLAPROLAP stands for Relational Online Analytical Process that provides multidimensional analysis of data, stored in a Relational database(RDBMS).MOLAP-MOLAP(Multidimensional OLAP), provides the analysisof data stored in a multi-dimensional data cube.HOLAP-HOLAP(Hybrid OLAP) a combination of both ROLAP and MOLAP canprovide multidimensional analysis simultaneously of datastored in a multidimensional database and in a relationaldatabase(RDBMS).DOLAP-DOLAP(Desktop OLAP or Database OLAP)provide multidimensionalanalysis locally in the client machine on the data collectedfrom relational or multidimensional database servers.11what are the types of dimension tables1.Confirmed Dimension.2.Junk Dimension.3.Degenerated Dimension.4.Slowly changing Dimensions.12What is a Data Cube?Data cube is the logical representation of multidimensional data .The edge of the cube contains dimentions and the body of the cube contains datas.13How does data mining and data warehousing work together?data warehousing is used to store the historical data.by using dwh bsiness users can analize thier business.data mining is used to predict the future .dwh will act as the source for the data mining14What is Bulk Insert?In informatica .bulk insert or bulk load does 2 things :-1) Ignores the commit interval specified at the session level.2) Do not create a database session log file.So, advantage is its very fast as no entry goes into log.Disadvantage is session cannot be rolled back as no entry exists in the log file .15What is surrogate key?A surrogate key in a database is a unique identifier for either an entity in the modeled world or an object in the database. The surrogate key is not derived from application data.There appear to be two definitions of a surrogate in the literature. We shall call these surrogate (1) and surrogate(2):Surrogate (1) This definition is based on that given by Hall, Owlett and Todd (1976). Here a surrogate represents an entity in the outside world. The surrogate is internally generated by the system but is nevertheless visible by the user or application.Surrogate (2) This definition is based on that given by Wieringa and de Jung (1991). Here a surrogate represents an object in the database itself. The surrogate is internally generated by the system and is invisible to the user orapplication.16What is a diff between joiner and lookup transformation?Joiner will join the two different data sources based on ajoin condition ,and pass only the rows which satisfy thatcondition.discards the remaining rows.Joiner transformation supports 4 types of joins atInformatica levelNormalMaster OuterDetail OuterFull OuterLookUp TransformationLookup transformation basically for Reference ,based on thelookup condition .when u want some data based on targetdata ,will take lookup on that particular table andretrieve the corresponding fields from that table.we can override the lookup transformation using the SQLquery.17What is the difference between View and Materialized View ?A view has a logical existence but a materialized view hasa physical existence .Moreover a materialized view can beindexed , analisied and so on.that is all the things thatwe can do with a table can also be done with a materializedview.18How could we generate the sequence of key values without using sequence generator transformation in the target ?Do a lookup on the Target table with an Lookup SQl OverrideSelect MAX(FIELD_NAME), field 1 , field3 from target groupby field1, field2.In the Expression increment the Max values of the fieldwhich you just got from the lookup by 1.Here MAX_FIELDNAME) is the Max value of the field you wantto generate the sequence of.19why we need to use unconnected transformation?Unconencted is used when ever u want to call the sametransformation several times and u have one return port.We use unconnected transformation to use multiple numberof tables or views without physically taking the entityinto mapping.This kind of transformation is also helpfulwhen single return port is required.Use dynamic cache if u want to update the case whileupdating the target table itself and static is untouchedwith the cache.20Whats the difference between $, $, $ - These are the system variables like $Bad file,$inputfile, $output file, $DB connection$ - Can any one tell me the scenariowith example foruser defined variables$ - $SessStartTime$SessStartTime returns the initial system date value onthe machine hosting the PowerCenter Server when the serverinitializes a session. $SessStartTime returns the sessionstart time as a string value. The format of the stringdepends on the database you are using.21What is Factless fact table ?A Fact table without measures(numeric data) for a column iscalled Factless Fact table.FACT LESS FACT TABLES ARE USED TO CAPTURE DATE TRANSACTIONEVENTS22explain the scenario for bulk loading and the normal loading option in Informatica Work flow manager ?Normal: In this case server manager allocates theresources(Buffers) as per the parameter settings. It createsthe log files in database.Bulk: In this case server manager allocates maximumresources(Buffers) available irrespective of the parametersettings. It will not create any log files in database.In first case data loading process will be time takingprocess but other applications are not affected. While inbulk data loading will be much faster but other applicationare affected.Normal Load: It loads the record one by one and writes log each file.It will take more time to complete.Bulk Load: Load the number of records at a time ,It wont fallow ant log files or trace levels,It takes less time .23Why is meant by direct and indirect loading options in sessions?we use file type direct when we are loading single file into target. we use Indirect when we want to load multiple files through single session in the mapping24What is the method of loading 5 flat files of having same structure to a single target and which transformations I can use?This can be handled by using the file list in informatica.If we have 5 files in different locations on the server andwe need to load in to single target table.In sessionproperties we need to change the file type as Indirect.am taking a notepad and giving following paths and filenames in this notepad and saving this notepad asemp_source.txt in the directory /ftp_data/webrep/ftp_data/webrep/SrcFiles/abc.txt/ftp_data/webrep/bcd.txt/ftp_data/webrep/srcfilesforsessions/xyz.txt/ftp_data/webrep/SrcFiles/uvw.txt/ftp_data/webrep/pqr.txtIn session properties i give /ftp_data/webrep/ in thedirectory path and file name as emp_source.txt and file typeas Indirect.25On a day, I load 10 rows in my target and on next day if Iget 10 more rows to be added to my target out of which 5 are updated rows how can I send them to target? How can I insert and update the record?the best way to do this use the slowy changing dimension inthe mappings-wizzard-slowly changing dimension-type1here u need to select the source and target tables.take a look t/f and a update strategy t/f. basing on lookup, if the record exits in target then reject it ,if notexit insert it,and if the record exist but it is changedthen update it26In real time scenario where can we use mapping parameters and variables?Before using mapping parameters and mapping variables weshould declare these things in mapping tab of mappingdesigner.A mapping parameter cannot change untill the session hascompleted unless a mapping variable can be changed inbetween the session.Example:if we declare mapping parameter we can use that parameteruntill completing the session,but if we declare mappingvariable we can change in between sessions.Use mappingvariable in Transcation Control Transformation.27By using Filter Transformation,How to pass rows that doesnot satisfy the condition(discarded rows) to another target?Connect the ports of the filter transformation to thesecond target table and enable the FORWARD REJECTED ROWSin the properties of the filter transformation. therejected rows will be forwarded to this table.Well You can Use Router iif you need rejected rows alongwith satisfied rows otherwis you just give the conditionfor filter tx as you want it in your target table.28A table contains some null values .how to get ( Not Applicable) in place of that null value in target ?with the help of ISNULL() function of the InformaticaIn the column properties sheet, write N/A in the Defaultvalue text box for the particular column29WHAT IS FACT TABLES?Fact table is the primary table in the dimensionalmodeling . the numerical performance of measures of thebusiness stored in fact tablemostly usedfacts a
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 2025福建厦门市教育局所属事业单位厦门市音乐学校招聘专业技术岗位教师1人(2025年4月)考前自测高频考点模拟试题附答案详解(完整版)
- 2025湖南娄底市冷水江市城发实业有限公司招聘实验室试验员3人模拟试卷及答案详解(必刷)
- 2025广东佛山市三水海江昇平建设工程有限公司第一批招聘企业工作人员拟聘用人员(第一批)模拟试卷及答案详解(历年真题)
- 2025年威海市环翠区教育和体育局公开招聘中小学教师(53人)模拟试卷附答案详解(模拟题)
- 2025年河北承德医学院附属医院招聘工作人员20名模拟试卷及一套参考答案详解
- 2025河南周口市西华县中医院校园招聘17人模拟试卷及答案详解(必刷)
- 2025赤峰龙韵城市建设有限公司所属子公司员工招聘21人考前自测高频考点模拟试题及答案详解(必刷)
- 2025年龙岩市新罗区国有资产经营集团有限公司招聘招商专员考前自测高频考点模拟试题及答案详解(各地真题)
- 2025年内江市市本级部分事业单位公开考核招聘工作人员(第二批)的(35人)模拟试卷及答案详解(新)
- 2025北京友谊医院平谷医院招聘编外财务1人模拟试卷完整答案详解
- 小学语文高段课标解读
- 艺术展演活动策划公司简介范文
- DB32∕T 3723-2020 高标准农田建设项目工程概算编制规程
- 财产申报表-被执行人用
- 万能式断路器课件
- 《小篮球规则》知识培训
- 江苏扬州历年中考语文古诗欣赏试题汇编(2003-2024)
- 入党申请书专用纸-A4单面打印
- 南方医科大学物理实验激光实验实验报告
- 无人机的分类
- 油气储运安全技术
评论
0/150
提交评论