
Introduction to the Big Data Science and Machine Learning Platform
Technology innovation, transforming the future

Agenda
- Data science and machine learning overview
  - Data Science 101
  - Machine Learning 101
  - Data Science and ML Challenges
- Introduction to the IBM data science platform
  - IBM Data Science Experience
  - IBM Machine Learning
- Data science and machine learning case demonstrations

What is Data Science?
Data science, also known as data-driven science, is an interdisciplinary field about scientific methods, processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to Knowledge Discovery in Databases (KDD). Data science is a concept to unify statistics, data analysis and their related methods in order to understand and analyze actual phenomena with data. It employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, information science, and computer science, in particular from the subdomains of machine learning, classification, cluster analysis, data mining, databases, and visualization.
(From Wikipedia)

Data Scientist: The Sexiest Job of the 21st Century
What abilities make a data scientist successful? Think of him or her as a hybrid of data hacker, analyst, communicator, and trusted adviser. The combination is extremely powerful, and rare.
(Harvard Business Review, October 2012 issue)

Hard skills of a data scientist
/thoughts/becoming-a-data-scien tist/

Machine learning: the third wave

What is Machine Learning?
Computers that:
- Learn without being explicitly programmed
- Grow and change when exposed to new data
- Deliver personalized and optimized customer interactions
- Identify patterns not readily foreseen by humans
- Build models of behavior from those patterns

Achieving Business Value through Watson Machine Learning
- Churn analysis helps identify the cause of churn and implement effective retention strategies.
- Detect and understand life-threatening medical conditions and design ever more effective treatment programs.
- Learn and predict weather patterns and energy production from renewable sources, and integrate them into the grid more effectively.
- Product recommendation, next-purchase prediction, and targeted offers for an individually tailored shopping experience.
- Identify suspicious behavior, predict and prevent threats and fraud, and continually reduce business risks and costs.

Capabilities
Machine learning helps:
- Constantly learns and adapts
- Avoids making the same mistakes
- Faster, deeper, improved insights
Resulting in:
- Smarter business outcomes
- Lower business risks and costs
- New business opportunities

Machine Learning 101: Types of machine learning
Classification
- Data points are labeled and are used to predict a category.
- Two-class vs. multi-class
- Examples: fraud detection (fraud vs. non-fraud), spam email detection (spam vs. non-spam)
Regression
- A value is being predicted.
- Example: stock price prediction
Clustering
- Data points are not labeled.
- The goal is to group data into clusters to better organize the data.

Machine Learning 101: Feature engineering
- A feature is a piece of information that might be useful for prediction. Example: predicting the churn probability of a customer.
- Labeled data is the desired output data. Example: CHURN_LABEL false, representing a churn sample. The label is NOT a feature.

[Diagram: Training a model. Labeled examples pass through feature engineering into training, which produces a model; new data passes through feature engineering into scoring with that model, which produces predicted data.]

[Diagram: A TrainOps (DevOps) story. Labels: Data Scientist, Deploy, DevOps, Operational system.]

What is Machine Learning (machine learning overview)
The (incomplete) machine learning process takes significant development, deployment and management efforts:
- Ingest Data
- Extract Features
- Train Model
- Deploy Model
- Make Predictions
- Human Intervention
- Choose Best Model
- Identify Model Degradation
- Prediction And Scoring
- Manage Deployments

New challenges in data science and machine learning
- Lower the barrier to entry for data science (Citizen Data Scientist)
- Govern the full machine learning lifecycle
- Improve continuous delivery capability
- Reproducibility of data science

[Diagram: machine learning lifecycle. Stage labels: Ingestion, Prep, Train, Eval, Deploy, Score, Predict/Act, Monitor, Feedback. Data labels: History data, Training & Validation data, Test data, New data.]

IBM data science toolbox
- IBM SPSS
- IBM Data Science Experience
- IBM Machine Learning

IBM Data Science Experience
Community
- Tutorials and data sets
- Connect with data scientists
- Ask questions
- Articles and papers
- Fork and share projects
Open source
- Scala/Python/R/SQL
- Jupyter and Zeppelin* Notebooks
- RStudio IDE and Shiny apps
- Apache Spark
- Your favorite libraries
Capabilities provided by IBM
- Data preprocessing / Pipeline UI*
- Automated data preparation and modeling*
- Advanced visualization*
- Model management and deployment
- Model API documentation*
- Spark cloud service / Packaged Spark

Data Science Experience (DSx) key features
- DSx Cloud Service
- DSx Local Edition

IBM Machine Learning for z/OS components
- Notebook and visual model building
- Cognitive Assistant for Data Scientists (CADS)
- Model deployment
- Model management
- Continuous monitoring and feedback

IBM Machine Learning for z/OS: an enterprise-grade machine learning platform

Feature Highlights: CADS, the cognitive assistant for data scientists
What is CADS?
The Cognitive Assistant for Data Scientists helps select the best-fit algorithm for training.
Why do data scientists need CADS?
- There are many algorithms for classification/regression tasks: SVM, Decision Trees/Forests, Naive Bayes, Logistic Regression, etc.
- Selecting the best algorithm carries a substantial cost in user and compute time:
  - The user spends time trying various learners.
  - The computational cost of training a single SVM can exceed 24 h.
- Selection is commonly based on the data scientist's bias and experience.

Feature Highlights: CADS/HPO
[Diagram labels: Training Data; Logistic Regression; Random Forest; Decision Tree; 500; 500]
- Minimize the amount of data to be considered to make an informed selection of the most suitable learner.
- Given a data set, try to select the best approach by directly considering part of the actual data.

Feature Highlights: Integrated Notebook Interface with flexible APIs
- Ingest data from a DB2z table
- Data transformation and training

Feature Highlights: Data Visualization with Brunel (/Brunel-Visualization/Brunel)

Feature Highlights: Visual Model Builder, the guided Machine Learning interface
- Ingest data and transform
- Training and evaluation

Feature Highlights: Model Management
- Manage model, create deployment
- Manage deployment

Feature Highlights: Easily cons
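
The CADS slides above describe choosing the most suitable learner by looking at only part of the training data. The following is a minimal sketch of that general idea, not IBM's implementation: the synthetic churn-like data, the two toy learners (`MajorityClass`, `NearestCentroid`), the 70/30 split, and the helper `select_learner` are all hypothetical names and choices made for illustration. The sample size of 500 only echoes the figure shown in the slide diagram.

```python
import random


def make_churn_data(n, seed=0):
    """Made-up churn-like rows: ((usage, support_calls), churn_label)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        churn = rng.random() < 0.5
        # Hypothetical features; the label itself is NOT a feature.
        usage = rng.gauss(20.0 if churn else 60.0, 10.0)
        calls = rng.gauss(6.0 if churn else 2.0, 1.5)
        rows.append(((usage, calls), churn))
    return rows


class MajorityClass:
    """Baseline learner: always predicts the most frequent training label."""

    def fit(self, rows):
        labels = [y for _, y in rows]
        self.label = max(set(labels), key=labels.count)
        return self

    def predict(self, x):
        return self.label


class NearestCentroid:
    """Predicts the label whose mean feature vector is closest to x."""

    def fit(self, rows):
        self.centroids = {}
        for label in {y for _, y in rows}:
            pts = [x for x, y in rows if y == label]
            self.centroids[label] = tuple(sum(c) / len(pts) for c in zip(*pts))
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


def accuracy(model, rows):
    return sum(model.predict(x) == y for x, y in rows) / len(rows)


def select_learner(rows, learners, sample_size=500, seed=0):
    """Score every learner on a small random sample instead of the full data."""
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    cut = int(0.7 * len(sample))  # 70% of the sample to train, 30% held out
    train, holdout = sample[:cut], sample[cut:]
    scores = {name: accuracy(cls().fit(train), holdout)
              for name, cls in learners.items()}
    return max(scores, key=scores.get), scores


data = make_churn_data(5000)
learners = {"majority": MajorityClass, "nearest_centroid": NearestCentroid}
best_name, scores = select_learner(data, learners)

# Only the selected learner is trained on the full data set.
final_model = learners[best_name]().fit(data)
print(best_name, scores)
```

Only the winning learner is fitted on all 5000 rows; the other candidates never see more than the 500-row sample, which is where the saving in compute time would come from. A real selection procedure would likely grow the sample incrementally and use cross-validation rather than a single holdout split.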
