Databases and Data Warehouses: Source Text and Translation
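Databases and Data Warehouses

What Is a Database?

A database is a collection of related data items. It is usually stored on secondary storage devices that allow fast, direct access to individual data items. Redundancy is kept to a minimum: where possible, only one copy of a data item exists. A database can be used by many different application systems, which avoids having separate systems maintain a database for each application.

When a user program needs a particular item in the database, the database management system (DBMS) does the actual searching. The user does not need to know the format in which the data is stored or its physical location.

The DBMS creates the database, keeps it up to date, and gives authorized users convenient access. It also provides security measures to prevent unauthorized access. A DBMS makes it convenient to express relationships among related data items, which makes user application systems easier to design. It provides backup and recovery functions to keep vital information from being lost or destroyed. The system administrator decides who may access the database, modify it, add new relationships, and so on. This is a very important responsibility: the database administrator has the greatest control over the lifeline of business information.

The real question for a DBMS is how to organize information so that it gives quick answers to the various questions users may ask. The same data organized in different ways can yield very different access speeds. To indicate the relationships among data items, three approaches are commonly used to build a database: the relational database, the hierarchical database, and the network database.

Hierarchical, Network, and Relational Databases (the three principal logical database models)

The Hierarchical Model

In a hierarchical database, data records are arranged in a strict parent-child relationship. Each parent record may have many children, but each child can have only one parent. Figure 3-1 shows a simple hierarchical database that represents the relationship between customers and the orders they place with a company. Searching a hierarchical database from the top down is quick and convenient. IBM's Information Management System (IMS) is the most widely used hierarchical DBMS. Hierarchical DBMSs are best suited to problems that require a limited number of structured answers that can be specified in advance. Once data relationships have been specified, they cannot easily be changed without major programming effort. The hierarchical model therefore cannot respond flexibly to changing information requirements.

The Network Model

The network database model is best suited to representing many-to-many relationships among data. In other words, a "child" can have more than one "parent", as Figure 3-2 shows. Computer Associates' IDMS is a network DBMS for mainframe computers.

[Figure 3-1: Hierarchical model. Figure 3-2: Network model. Node labels: Customer A, Customer B, Order 1, Order 2, Order 3, Part 1, Part 2.]

Network DBMSs are more flexible than hierarchical DBMSs, but access paths must still be specified in advance. In practice there are limits on the number of links, or relationships, that can be established among records. If there are too many, the software will not work efficiently. Neither the network nor the hierarchical model can easily create new relationships among data elements or new patterns of access without modifying the main program.

The Relational Model

Relational databases were developed in the early 1970s to provide a more user-friendly organization. A relational database simply stores data as tables (called relations) rather than using complex pointer structures. These tables are sometimes called flat files, because the rows of a table closely resemble the records of a file. Each row of a relation is called a record. Each column is a particular field of the record. Each field is headed by a field name, which describes the whole column. A relational database contains one or more relations. A relational DBMS performs three main operations on relations to form new relations:
  • Join two relations (combine them).
  • Project a relation (extract some of its columns to form the columns of a new relation).
  • Select records according to criteria specified by the user.

SQL (Structured Query Language) is the most important query language based on the relational model. For example, a relation named Accounts that records bank accounts, their balances, and their types might look like this:

    Account No    Balance     Type
    173921        ¥1700.00    Checking
    251101        ¥888.00     Savings

The column headings are the three field names: Account No, Balance, and Type. Below the field names are the rows, or records. The first row says that account 173921 has a balance of 1,700 yuan and is a checking account. Suppose we want to know the balance of account 173921. We could issue the request in SQL as follows:

    SELECT balance FROM Accounts WHERE accountNo = 173921

IBM's DB2 and Oracle Corporation's Oracle are mainframe relational DBMSs. Microsoft Access is a PC relational DBMS. (Oracle also has a PC version.)

Client/Server Architecture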
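Much modern software uses a client/server architecture, in which requests from one process (the client) are sent to another process (the server) for handling. Database systems are no exception. In the simplest client/server architecture, the entire DBMS is the server except for the query interface, which interacts with the user and sends queries or other commands to the server. For example, relational systems usually use SQL to express requests from the client to the server. The database server then returns the answer to the client in the form of a table or relation. The relationship between client and server can become complex when answers are very large. If there are many simultaneous database users, the server becomes a bottleneck, so there is also a trend toward giving more of the work to the client.

New Forms of Databases and Data Warehouses

Object-Oriented Databases

Object-oriented databases store data as objects that can be automatically retrieved and shared. Each object contains the processing instructions needed to complete each database transaction. These objects can contain different types of data, including traditional data and processing procedures as well as sound, graphics, and video. Objects can be shared and reused. These features make software development easier through reuse and through the ability to build new multimedia applications that combine different types of data. One benefit of object-oriented DBMSs is their ability to support World Wide Web applications.

Hypermedia Databases

Hypermedia databases manage data differently from object-oriented DBMSs, but they too can contain different types of data. They store data as "chunks" of information, each chunk in a separate node. Each node can contain traditional numeric or character data, or entire documents, software programs, graphics, or even full-motion video. Each node is completely independent: nodes are not connected by a predetermined organization scheme as they are in traditional databases. Instead, users establish the links between nodes themselves. The relationships among nodes are less structured than in a traditional DBMS, and searching does not have to follow a predetermined organization scheme. Users can move directly from one node to another regardless of the relationship between them.

Data Warehouses

Many companies have allowed their data to be stored in many separate systems that cannot provide a consolidated, company-wide view of information. One solution is to build a data warehouse. A data warehouse is a database that consolidates data extracted from various production and operational systems into one large database for management reporting and analysis. The data from the organization's core transaction processing systems are reorganized and combined with other information, including historical data, so that they can be used for management decision making and analysis. In most cases the data in the warehouse can be used only for reporting and cannot be updated, so the performance of the company's underlying operational systems is not affected. This problem-solving orientation has brought considerable benefits to many companies that use data warehouses.

Data warehouses generally include the ability to remodel data. Data views over a relational database let users examine data along two or more dimensions, for example sales by region and by quarter. To provide this kind of information, an organization can use either a specialized multidimensional database or a tool that creates multidimensional views of data in relational databases. Multidimensional analysis lets users view the same data in different ways using multiple dimensions. Each aspect of information (product, pricing, cost, region, or time) represents a different dimension. A product manager could therefore use a multidimensional tool to learn how many units were sold in the Southwest sales region in June, how that compares with the previous month and with June of last year, and how it compares with the sales forecast. Another term for multidimensional data analysis is online analytical processing (OLAP).

Data Independence, Integrity, and Security

Data Independence

In a database system, each program works with its own view, or views, of the database. If a new field is added to a data record, the DBMS preserves the existing views so that existing programs need not change. The ability to modify the structure of a database without affecting the existing programs that reference it is called data independence.

Data Integrity

Data integrity refers to the accuracy, correctness, and validity of the data in the database. In a database system, data integrity means protecting data against invalid modification or destruction. In large online database systems, data integrity is even more important.

Data Security

Data security refers to protecting the database against unauthorized or illegal access or modification. This usually involves one or more levels of password protection, which are specified in detail in the data dictionary. For example, a high-level password might allow a user to read from, write to, and modify the structure of the database, while a low-level password might allow only reading. An audit trail usually records the history of modifications to the database. It can be used to identify when and where the database was damaged, and also to restore files.

Database and Data Warehouses

What Is a Database?

A database is a collection of related data items. It is generally stored on secondary storage devices that allow rapid direct access to individual data items. Redundancy is minimized; where possible, only a single copy of a data item exists.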
The database may be used by many different application systems at once, eliminating the need for separate systems to maintain the data for each application.

When a user program inquires whether a particular item is in the database, a database management system (DBMS) does the actual searching. The user does not need to be familiar with the format in which the data is stored or the actual physical location of the data.

The DBMS creates the database, keeps it up to date, and provides ready access to authorized users. Database management systems also provide extensive security measures to prevent unauthorized access. They make it convenient to express relationships between related data items and facilitate the design of user application systems. They provide backup and recovery capabilities to protect against loss or destruction of vital information. They ensure database integrity; that is, what is supposed to be in the database is there, and what is not supposed to be isn't. A person called the database administrator determines who may access the database, modify it, add new relationships, and the like. This is a very important responsibility: the database administrator has the greatest control over the lifeline of the business's information.

The real question in database management systems is how to organize information to provide rapid answers to the kinds of questions users are likely to ask. The same data organized differently can yield dramatically different access speeds. Three common ways are used to structure a database to indicate the relationships among the data items: the relational database, the hierarchical database, and the network database.

Hierarchical, Network, and Relational Databases (three principal logical database models)

The Hierarchical Model

In a hierarchical database, data records are arranged in a strict parent-child relationship. Each parent record may have many children, but each child record has exactly one parent.
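The strict parent-child rule can be illustrated with a small in-memory tree. This is only a sketch and not how IMS or any real hierarchical DBMS stores data; the `Record` class, the `walk_top_down` helper, and the sample customer and order names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass(eq=False)
class Record:
    """One record in a hierarchy: at most one parent, any number of children."""
    name: str
    parent: Optional["Record"] = None
    children: list = field(default_factory=list)

    def add_child(self, child: "Record") -> "Record":
        # Enforce the strict rule: a child record has exactly one parent.
        if child.parent is not None:
            raise ValueError(f"{child.name} already has parent {child.parent.name}")
        child.parent = self
        self.children.append(child)
        return child


def walk_top_down(root: Record, depth: int = 0) -> Iterator[tuple]:
    """Yield (depth, name) pairs by following parent-to-child links only."""
    yield depth, root.name
    for child in root.children:
        yield from walk_top_down(child, depth + 1)


# Hypothetical sample data: a company, its customers, and their orders.
company = Record("Company")
customer_a = company.add_child(Record("Customer A"))
customer_b = company.add_child(Record("Customer B"))
order_1 = customer_a.add_child(Record("Order 1"))
customer_a.add_child(Record("Order 2"))
customer_b.add_child(Record("Order 3"))

for depth, name in walk_top_down(company):
    print("  " * depth + name)
```

Attaching `order_1` to a second customer raises `ValueError`, which is the restriction that the network model relaxes.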
Figure 3-1 shows a simple hierarchical database, indicating the relationship between a customer and the orders it has placed with a company. Searching a hierarchical database is rapid and convenient as long as it is searched from the top down. IBM's IMS (Information Management System) is the most widely used hierarchical DBMS. Hierarchical DBMSs are best suited for problems that require a limited number of structured answers that can be specified in advance. Once data relationships have been specified, they cannot easily be changed without a major programming effort. Thus, the hierarchical model cannot respond flexibly to changing requests for information.

The Network Model

The network database model is best at representing many-to-many relationships among data. In other words, a "child" can have more than one "parent", as Figure 3-2 illustrates. Computer Associates' IDMS is a network DBMS for mainframe computers.

Network DBMSs are more flexible than hierarchical DBMSs, but access paths must still be specified in advance. There are practical limitations to the number of links, or relationships, that can be established among records. If they are too numerous, the software will not work efficiently. Neither network nor hierarchical database management models can easily create new relationships among data elements or new patterns of access without major programming efforts.

The Relational Model

In the early 1970s the relational database approach was developed to provide a much more user-friendly organization. Instead of using complex structures of pointers, the relational database stores information simply as tables called relations. These tables are sometimes called flat files because the rows of the table are very much the same as the records of a file.

Each row in a relation is called a record. Each column corresponds to a particular field within the record (fields are also called domains). The fields are headed by attributes, which describe the entries in the column.
A relational database consists of one or more relations. A relational DBMS performs three primary operations on relations to form new relations:
  • Two relations may be joined (combined).
  • A relation may be projected (some of the columns are extracted from the relation and used to form the columns of the new relation).
  • Records may be selected according to various user-specified criteria.

SQL (Structured Query Language) is the most important query language based on the relational model. For instance, a relation named Accounts, recording bank accounts, their balances, and their types, might look like:

    Accounts
    Account No    Balance    Type
    173921        1700.00    Checking
    251101        888.00     Savings

Heading the columns are the three attributes: Account No, Balance, and Type. Below the attributes are the rows, or records. The first row says that account number 173921 has a balance of one thousand seven hundred dollars and is a checking account. Suppose we wanted to know the balance of account 173921. We could ask this query in SQL as follows:

    SELECT balance FROM Accounts WHERE accountNo = 173921

IBM's DB2 and Oracle from the Oracle Corporation are examples of mainframe relational database management systems. Microsoft Access is a PC relational database management system. (Oracle also has a PC version.)

Client-Server Architecture

Many varieties of modern software use a client-server architecture, in which requests by one process (the client) are sent to another process (the server) for execution. Database systems are no exception. In the simplest client/server architecture, the entire DBMS is a server, except for the query interfaces that interact with the user and send queries or other commands across to the server. For example, relational systems generally use the SQL language for representing requests from the client to the server. The database server then sends the answer, in the form of a table or relation, back to the client.
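To make the request-and-answer exchange concrete, the sketch below sends SQL text to a database engine and gets tables of rows back, covering the select, project, and join operations described earlier. It uses Python's standard `sqlite3` module, which is an embedded engine rather than a separate server process, so it only stands in for the server side. The `Owners` relation and its contents are hypothetical and exist only so that a join can be shown.

```python
import sqlite3

# In-memory database; it plays the role of the database server in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts (accountNo INTEGER PRIMARY KEY, balance REAL, type TEXT)"
)
conn.executemany(
    "INSERT INTO Accounts VALUES (?, ?, ?)",
    [(173921, 1700.00, "Checking"), (251101, 888.00, "Savings")],
)

# Hypothetical second relation, not in the text, added to demonstrate a join.
conn.execute("CREATE TABLE Owners (accountNo INTEGER, owner TEXT)")
conn.executemany(
    "INSERT INTO Owners VALUES (?, ?)",
    [(173921, "Customer A"), (251101, "Customer B")],
)

# Select (WHERE picks the record) plus project (only the balance column).
balance_rows = conn.execute(
    "SELECT balance FROM Accounts WHERE accountNo = ?", (173921,)
).fetchall()

# Project: keep two of the three columns for every record.
projected = conn.execute(
    "SELECT accountNo, type FROM Accounts ORDER BY accountNo"
).fetchall()

# Join: combine the two relations on their shared accountNo field.
joined = conn.execute(
    "SELECT o.owner, a.balance "
    "FROM Accounts AS a JOIN Owners AS o ON a.accountNo = o.accountNo "
    "ORDER BY a.accountNo"
).fetchall()

print(balance_rows)
print(projected)
print(joined)
```

Each answer comes back as a list of tuples, one tuple per row, which is the "table or relation" the client receives.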
The relationship between client and server can get more complex, especially when answers are extremely large. There is also a trend to put more work in the client, since the server will be a bottleneck if there are many simultaneous database users.

New Forms of Databases and Data Warehouses

Object-Oriented Databases

Object-oriented databases store data as objects that can be automatically retrieved and shared. Included in the object are the processing instructions to complete each database transaction. These objects can contain various types of data, including sound, graphics, and video as well as traditional data and processing procedures. The objects can be shared and reused. These features of object-oriented databases promise to facilitate software development through reuse and the ability to build new multimedia applications that combine multiple types of data. A benefit of OODBMSs is their ability to support applications for the World Wide Web, as described in the Focus on Technology.

Object-oriented databases are still a relatively new technology and can be much slower than relational systems for handling large quantities of data where there is a high volume of transaction processing. Hybrid object-relational databases have been developed that combine the capability of handling large numbers of transactions found in relational DBMSs with the capability of handling complex relationships and new types of data found in OODBMSs.

Hypermedia Databases

Hypermedia databases manage data differently from object-oriented DBMSs, but they can also contain diverse types of data. They store data as "chunks" of information, with each chunk in a separate node. Each node can contain traditional numeric or character data or whole documents, software programs, graphics, and even full-motion video. Each node is totally independent: the nodes are not related by a predetermined organization scheme as they are in traditional databases. Instead, users establish their own links between nodes.
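The idea of independent nodes joined only by user-made links can be sketched as a small graph. This is an illustrative data structure, not the design of any particular hypermedia product; the node names, the `link` and `reachable` helpers, and the sample content are all assumptions.

```python
from collections import defaultdict, deque

# Each node is an independent chunk that may hold any kind of content.
nodes = {
    "specs": {"kind": "text", "content": "Basic product information"},
    "brochure": {"kind": "document", "content": "Descriptive sales brochure"},
    "demo": {"kind": "video", "content": "Product shown in action"},
    "dealers": {"kind": "text", "content": "Authorized dealer locations"},
}

# No predetermined scheme: links exist only where a user creates them.
links = defaultdict(set)


def link(source: str, target: str) -> None:
    """Record a user-defined, one-way link between two existing nodes."""
    if source not in nodes or target not in nodes:
        raise KeyError("both ends of a link must be existing nodes")
    links[source].add(target)


def reachable(start: str) -> list:
    """Return the nodes a user can branch to from start, in breadth-first order."""
    seen = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in sorted(links[current]):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


link("specs", "brochure")
link("specs", "demo")
link("demo", "dealers")
link("dealers", "specs")

print(reachable("specs"))
print(reachable("brochure"))
```

A node with no outgoing links, such as `brochure` here, is still a valid independent node; it simply leads nowhere until a user links it.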
The relationship among nodes is less structured than in a traditional DBMS. Searching for information does not have to follow a predetermined organization scheme. Users can branch directly from one node to another in any relationship they establish. For instance, a hypermedia database on automobiles might link basic product information with descriptive sales brochures, a video showing the automobile in action, and the location of authorized dealers.

Data Warehouses

Many companies have allowed their data to be stored in many separate systems that are unable to provide a consolidated view of information usable company-wide. One way to address this problem is to build a data warehouse. A data warehouse is a database that consolidates data extracted from various production and operational systems into one large database for management reporting and analysis. The data from the organization's core transaction processing systems are reorganized and combined with other information, including historical data, so that they can be used for management decision making and analysis. In most cases, the data in the data warehouse can be used for reporting only (they cannot be updated), so that the performance of the company's underlying operational systems is not affected. The Focus on Problem Solving describes some of the benefits
