Introduction to Big Data [PPT slides]

Document summary

Big Data

Outline
  • What is Big Data?
  • Analog storage vs. digital
  • The FOUR Vs of Big Data
  • Who's Generating Big Data
  • The importance of Big Data
  • Optimization
  • HDFS

Definition
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.

The FOUR Vs of Big Data
From traffic patterns and music downloads to web history and medical records, data is recorded, stored, and analyzed to enable the technology and services that the world relies on every day. But what exactly is big data, and how can it be used? According to IBM scientists, big data can be broken down into four dimensions: Volume, Velocity, Variety and Veracity.

Volume
Many factors contribute to the increase in data volume:
  • Transaction-based data stored through the years.
  • Unstructured data streaming in from social media.
  • Increasing amounts of sensor and machine-to-machine data being collected.
In the past, excessive data volume was a storage issue. But with decreasing storage costs, other issues emerge, including how to determine relevance within large data volumes and how to use analytics to create value from relevant data.

Variety
Data today comes in all types of formats:
  • Structured, numeric data in traditional databases.
  • Information created from line-of-business applications.
  • Unstructured text documents, email, video, audio, stock ticker data and financial transactions.
Managing, merging and governing different varieties of data is something many organizations still grapple with.

Velocity
Data is streaming in at unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time. Reacting quickly enough to deal with data velocity is a challenge for most organizations.

Veracity
Big data veracity refers to the biases, noise and abnormality in data. Is the data that is being stored and mined meaningful to the problem being analyzed? Inderpal feels that veracity in data analysis is the biggest challenge when compared to things like volume and velocity. In scoping out your big data strategy, you need to have your team and partners work to help keep your data clean, and put processes in place to keep dirty data from accumulating in your systems.

Who's Generating Big Data
  • Social media and networks (all of us are generating data)
  • Scientific instruments (collecting all sorts of data)
  • Mobile devices (tracking all objects all the time)
  • Sensor technology and networks (measuring all kinds of data)
Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion.

The importance of Big Data
The real issue is not that you are acquiring large amounts of data. It's what you do with the data that counts. The hopeful vision is that organizations will be able to take data from any source, harness relevant data and analyze it to find answers that enable:
  • Cost reductions
  • Time reductions
  • New product development and optimized offerings
  • Smarter business decision making

For instance, by combining big data and high-powered analytics, it is possible to:
  • Determine root causes of failures, issues and defects in near-real time, potentially saving billions of dollars annually.
  • Optimize routes for many thousands of package delivery vehicles while they are on the road.
  • Analyze millions of SKUs to determine prices that maximize profit and clear inventory.
  • Generate retail coupons at the point of sale based on the customer's current and past purchases.
  • Send tailored recommendations to mobile devices while customers are in the right area to take advantage of offers.
  • Recalculate entire risk portfolios in minutes.
  • Quickly identify customers who matter the most.
  • Use clickstream analysis and data mining to detect fraudulent behavior.

HDFS / Hadoop
Data in an HDFS cluster is broken down into smaller pieces (called blocks) and distributed throughout the cluster. In this way, the map and reduce functions can be executed on smaller subsets of your larger data sets, and this provides the scalability that is needed for big data processing. The goal of Hadoop is to use commonly available servers in a very large cluster, where each server has a set of inexpensive internal disk drives.

PROS OF HDFS
  • Scalable: New nodes can be added as needed, and added without needing to change data formats, how data is loaded, how jobs are written, or the applications on top.
  • Cost effective: Hadoop brings massively parallel computing to commodity servers. The result is a sizeable decrease in the cost per terabyte of storage, which in turn makes it affordable to model all your data.
  • Flexible: Hadoop is schema-less, and can absorb any type of data, structured or not, from any number of sources. Data from multiple sources can be joined and aggregated in arbitrary ways enabling deeper analyses than any one system can pr
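
The block-and-distribute idea from the HDFS / Hadoop slide can be illustrated with a small, purely local Python sketch. It does not use Hadoop or any HDFS API. The function names (`split_into_blocks`, `assign_to_nodes`, `map_block`, `reduce_counts`, `word_count`), the round-robin placement, and the record-based block size are all assumptions made for illustration; real HDFS blocks are sized in bytes and placed by the cluster itself.

```python
from collections import Counter
from functools import reduce


def split_into_blocks(records, block_size):
    """Break a list of records into consecutive blocks of at most block_size items."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def assign_to_nodes(blocks, node_count):
    """Spread blocks round-robin over node_count simulated nodes (illustrative placement only)."""
    nodes = [[] for _ in range(node_count)]
    for index, block in enumerate(blocks):
        nodes[index % node_count].append(block)
    return nodes


def map_block(block):
    """Map step: count the words found in a single block."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right


def word_count(records, block_size=2, node_count=3):
    """Run the map step on every block of every simulated node, then reduce the partial results."""
    blocks = split_into_blocks(records, block_size)
    nodes = assign_to_nodes(blocks, node_count)
    partials = [map_block(block) for node in nodes for block in node]
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    sample = [
        "big data big",
        "data velocity",
        "volume variety",
        "veracity data",
        "big volume",
    ]
    print(word_count(sample))
```

Because each map call only sees one block, the work could in principle run on whichever node holds that block, and only the small partial counts need to be merged. That is the property the slide attributes to executing map and reduce on smaller subsets of a larger data set.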
