Foundations of Machine Learning: Regression
Overview

- Simple linear regression
- Evaluating the model
- Multiple linear regression
- Polynomial regression
- Regularization
- Applying linear regression
- Fitting models with gradient descent

Simple linear regression

Simple linear regression can be used to model a linear relationship between one response variable and one explanatory variable. Suppose you wish to know the price of a pizza.

Observe the data

import matplotlib.pyplot as plt

X = [[6], [8], [10], [14], [18]]
y = [[7], [9], [13], [17.5], [18]]
plt.figure()
plt.title('Pizza price plotted against diameter')
plt.xlabel('Diameter in inches')
plt.ylabel('Price in dollars')
plt.plot(X, y, 'k.')
plt.axis([0, 25, 0, 25])
plt.grid(True)
plt.show()

sklearn.linear_model.LinearRegression

# import sklearn
from sklearn.linear_model import LinearRegression

# Training data
X = [[6], [8], [10], [14], [18]]
y = [[7], [9], [13], [17.5], [18]]
# Create and fit the model
model = LinearRegression()
model.fit(X, y)
print('A 12" pizza should cost: $%.2f' % model.predict([[12]])[0][0])
# A 12" pizza should cost: $13.68

The sklearn.linear_model.LinearRegression class is an estimator. Estimators predict a value based on the observed data. In scikit-learn, all estimators implement the fit() and predict() methods. The former method is used to learn the parameters of a model, and the latter method is used to predict the value of a response variable for an explanatory variable using the learned parameters. It is easy to experiment with different models using scikit-learn because all estimators implement the fit and predict methods.

Results

print((model.intercept_, model.coef_))
Z = model.predict(X)
plt.scatter(X, y)
plt.plot(X, Z, color='red')
plt.title('Pizza price plotted against diameter')
plt.xlabel('Diameter in inches')
plt.ylabel('Price in dollars')
plt.show()
# (array([1.96551743]), array([[0.9762931]]))

Evaluating the fitness of a model

Regression lines produced by several sets of parameter values are plotted in the following figure. How can we assess which parameters produced the best-fitting regression line?

Cost function

A cost function, also called a loss function, is used to define and measure the error of a model. The differences between the prices predicted by the model and the observed prices of the pizzas in the training set are called residuals or training errors. Later, we will evaluate a model on a separate set of test data; the differences between the predicted and observed values in the test data are called prediction errors or test errors. The residuals for our model are indicated by the vertical lines between the points for the training instances and the regression hyperplane in the following plot.

We can produce the best pizza-price predictor by minimizing the sum of the squared residuals. That is, our model fits well if the values it predicts for the response variable are close to the observed values for all of the training examples. This measure of the model's fitness is called the residual sum of squares (RSS) cost function:

$$SS_{res} = \sum_{i=1}^{n} (y_i - f(x_i))^2$$

import numpy as np
rss = np.sum((model.predict(X) - y) ** 2)
print('Residual sum of squares: %.2f' % rss)
# Residual sum of squares: 8.75
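To make the earlier question about competing regression lines concrete, we can compare several candidate parameter pairs by their RSS cost. This is a minimal sketch, not taken from the slides: the first two (alpha, beta) pairs are arbitrary guesses, and the third is the learned OLS solution.

import numpy as np

X = np.array([6, 8, 10, 14, 18])
y = np.array([7, 9, 13, 17.5, 18])

# Two arbitrary parameter guesses, followed by the OLS solution.
candidates = [(0.0, 1.0), (4.0, 0.7), (1.9655, 0.9763)]
for alpha, beta in candidates:
    rss = np.sum((y - (alpha + beta * X)) ** 2)
    print('alpha=%.4f, beta=%.4f, RSS=%.2f' % (alpha, beta, rss))
# The OLS parameters yield the smallest cost (about 8.75).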
Solving ordinary least squares for simple linear regression

For the simple linear regression model, suppose we have obtained n observations (X1, Y1), (X2, Y2), …, (Xn, Yn) from the population. Infinitely many curves could be fit to these n points in the plane. We want the sample regression function to fit these values as well as possible; the most common criterion is ordinary least squares (Ordinary Least Squares, OLS): the chosen regression model should minimize the residual sum of squares over all of the observations.

Variance of x:

>>> import numpy as np
>>> print(np.var([6, 8, 10, 14, 18], ddof=1))
23.2

Covariance of x and y:

>>> import numpy as np
>>> print(np.cov([6, 8, 10, 14, 18], [7, 9, 13, 17.5, 18])[0][1])
22.65

Now that we have calculated the variance of our explanatory variable and the covariance of the response and explanatory variables, we can solve for beta using the following formula:

$$\beta = \frac{\mathrm{cov}(x, y)}{\mathrm{var}(x)}$$

Having solved beta, we can solve for alpha using the following formula:

$$\alpha = \bar{y} - \beta \bar{x}$$

Here, $\bar{x}$ and $\bar{y}$ are the means of x and y. For the pizza data, $\beta = 22.65 / 23.2 \approx 0.9763$ and $\alpha = 12.9 - 0.9763 \times 11.2 \approx 1.9655$, matching the parameters learned by LinearRegression.

Evaluating the model

We have used a learning algorithm to estimate a model's parameters from the training data. How can we assess whether our model is a good representation of the real relationship?

R-squared

Several measures can be used to assess our model's predictive capabilities. We will evaluate our pizza-price predictor using r-squared. R-squared measures how well the observed values of the response variable are predicted by the model. More concretely, r-squared is the proportion of the variance in the response variable that is explained by the model. An r-squared score of one indicates that the response variable can be predicted without any error using the model. An r-squared score of one half indicates that half of the variance in the response variable can be predicted using the model. There are several methods to calculate r-squared. In the case of simple linear regression, r-squared is equal to the square of the Pearson product-moment correlation coefficient, or Pearson's r.
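The r-squared evaluation code on slide 3-16 did not survive extraction. A minimal sketch, assuming a small held-out test set (the test values below are hypothetical stand-ins, not the slide's data):

from sklearn.linear_model import LinearRegression

X_train = [[6], [8], [10], [14], [18]]
y_train = [[7], [9], [13], [17.5], [18]]
# Hypothetical test set; the slide's actual values were not preserved.
X_test = [[8], [9], [11], [16], [12]]
y_test = [[11], [8.5], [15], [18], [11]]

model = LinearRegression()
model.fit(X_train, y_train)
# score() returns the coefficient of determination, r-squared.
print('R-squared: %.4f' % model.score(X_test, y_test))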
Multiple linear regression

Formally, multiple linear regression is the following model:

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$

or, in vector notation, $Y = X\beta$.

Let's update our pizza training data to include the number of toppings as a second explanatory variable. We must also update our test data to include the second explanatory variable.

We will multiply X by its transpose to yield a square matrix that can be inverted. Denoted with a superscript T, the transpose of a matrix is formed by turning the rows of the matrix into columns and vice versa. The parameters can then be solved with the following equation:

$$\beta = (X^T X)^{-1} X^T Y$$

>>> from numpy.linalg import inv
>>> from numpy import dot, transpose
>>> X = [[1, 6, 2], [1, 8, 1], [1, 10, 0], [1, 14, 2], [1, 18, 0]]
>>> y = [[7], [9], [13], [17.5], [18]]
>>> print(dot(inv(dot(transpose(X), X)), dot(transpose(X), y)))
[[1.1875    ]
 [1.01041667]
 [0.39583333]]

NumPy also provides a least squares function that can solve the values of the parameters more compactly:

>>> from numpy.linalg import lstsq
>>> X = [[1, 6, 2], [1, 8, 1], [1, 10, 0], [1, 14, 2], [1, 18, 0]]
>>> y = [[7], [9], [13], [17.5], [18]]
>>> print(lstsq(X, y, rcond=None)[0])
[[1.1875    ]
 [1.01041667]
 [0.39583333]]

[Slide 3-23 fit the same model with sklearn.linear_model.LinearRegression; its code was not preserved in this extraction.]

Polynomial regression

In the previous examples, we assumed that the real relationship between the explanatory variables and the response variable is linear. This assumption is not always true. In this section, we will use polynomial regression, a special case of multiple linear regression that adds terms with degrees greater than one to the model. A real-world curvilinear relationship is captured by transforming the training data with polynomial terms, which are then fit in the same manner as in multiple linear regression. For ease of visualization, we will again use only one explanatory variable, the pizza's diameter, and compare linear regression with polynomial regression.

Quadratic regression, or regression with a second-order polynomial, is given by the following formula:

$$y = \alpha + \beta_1 x + \beta_2 x^2$$

We are using only one explanatory variable, but the model now has three terms instead of two. The explanatory variable has been transformed and added as a third term to the model to capture the curvilinear relationship. Also, note that the equation for polynomial regression is the same as the equation for multiple linear regression in vector notation. The PolynomialFeatures transformer can be used to easily add polynomial features to a feature representation. Let's fit a model to these features and compare it to the simple linear regression model. [Slides 3-26 and 3-27 showed this code and the resulting quadratic fit; a sketch follows at the end of this section.]

Now, let's try an even higher-order polynomial. The plot in the following figure shows a regression curve created by a ninth-degree polynomial.

The ninth-degree polynomial regression model fits the training data almost exactly! The model's r-squared score on the test data, however, is -0.09. We created an extremely complex model that fits the training data exactly, but fails to approximate the real relationship. This problem is called over-fitting. The model should induce a general rule to map inputs to outputs; instead, it has memorized the inputs and outputs from the training data. As a result, the model performs poorly on test data. It predicts that a 16-inch pizza should cost less than $10, and an 18-inch pizza should cost more than $30. This model exactly fits the training data, but fails to learn the real relationship between size and price.
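The PolynomialFeatures code from slides 3-25 through 3-27 was not preserved. A minimal sketch of the comparison those slides describe, assuming the pizza training data and a hypothetical test set:

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X_train = [[6], [8], [10], [14], [18]]
y_train = [7, 9, 13, 17.5, 18]
# Hypothetical test set for illustration.
X_test = [[6], [8], [11], [16]]
y_test = [8, 12, 15, 18]

# Transform each diameter x into the feature vector [1, x, x^2], then
# fit it exactly as in multiple linear regression.
quadratic = PolynomialFeatures(degree=2)
X_train_quad = quadratic.fit_transform(X_train)
X_test_quad = quadratic.transform(X_test)

linear = LinearRegression().fit(X_train, y_train)
curved = LinearRegression().fit(X_train_quad, y_train)
print('Simple linear r-squared: %.4f' % linear.score(X_test, y_test))
print('Quadratic r-squared:     %.4f' % curved.score(X_test_quad, y_test))

Raising the degree to nine reproduces the over-fitting described above: the curve passes through every training point while the test r-squared collapses.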
Regularization

Regularization is a collection of techniques that can be used to prevent over-fitting. Regularization adds information to a problem, often in the form of a penalty against complexity. Occam's razor states that a hypothesis with the fewest assumptions is the best. Accordingly, regularization attempts to find the simplest model that explains the data.
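Slides 3-31 and 3-32 were image-only and did not survive extraction; they likely showed penalized cost functions such as ridge regression (an L2 penalty) and the lasso (an L1 penalty). As a hedged illustration of the idea, ridge regression shrinks the coefficients of an otherwise over-fit polynomial model; the degree and alpha below are illustrative choices:

from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

X_train = [[6], [8], [10], [14], [18]]
y_train = [7, 9, 13, 17.5, 18]

# Degree-9 features would normally let the model memorize five points.
ninth = PolynomialFeatures(degree=9)
X_train_ninth = ninth.fit_transform(X_train)

# alpha controls the strength of the penalty against large coefficients.
model = Ridge(alpha=1.0)
model.fit(X_train_ninth, y_train)
print(model.coef_)  # coefficients are shrunk toward zero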
Applying linear regression

Assume that you are at a party, and that you wish to drink the best wine that is available. You could ask your friends for recommendations, but you suspect that they will drink any wine, regardless of its provenance. Fortunately, you have brought pH test strips and other tools to measure various physicochemical properties of wine; it is, after all, a party. We will use machine learning to predict the quality of the wine based on its physicochemical attributes.

The UCI Machine Learning Repository's Wine dataset measures eleven physicochemical attributes, including the pH and alcohol content, of 1,599 different red wines. Each wine's quality has been scored by human judges. The scores range from zero to ten; zero is the worst quality and ten is the best quality. The dataset can be downloaded from http://archive.ics.uci.edu/ml/datasets/Wine. We will approach this problem as a regression task and regress the wine's quality onto one or more physicochemical attributes. The response variable in this problem takes only integer values between 0 and 10; we could view these as discrete values and approach the problem as a multiclass classification task. In this chapter, however, we will view the response variable as a continuous value.

Exploring the data

The explanatory variables are fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol; the response variable is quality.

First, we will load the dataset and review some basic summary statistics for the variables. The data is provided as a .csv file; note that the fields are separated by semicolons rather than commas. Visualizing the data can help indicate whether relationships exist between the response variable and the explanatory variables, so we will use matplotlib to create some scatter plots. [The code and scatter plots from slides 3-37 through 3-39 were not preserved; a sketch of the workflow follows at the end of this section.]

These plots suggest that the response variable depends on multiple explanatory variables; let's model the relationship with multiple linear regression. How can we decide which explanatory variables to include in the model? DataFrame.corr() calculates a pairwise correlation matrix. The correlation matrix confirms that the strongest positive correlation is between alcohol and quality, and that quality is negatively correlated with volatile acidity, an attribute that can cause wine to taste like vinegar. To summarize, we have hypothesized that good wines have high alcohol content and do not taste like vinegar. This hypothesis seems sensible, though it suggests that wine aficionados may have less sophisticated palates than they claim.

Fitting and evaluating the model

The r-squared score of 0.35 indicates that 35 percent of the variance in the test set is explained by the model. The performance might change if a different 75 percent of the data is partitioned to the training set. We can use cross-validation to produce a better estimate of the estimator's performance. Recall from chapter one that each cross-validation round trains and tests on different partitions of the data to reduce variability.
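The code on slides 3-37 through 3-42 was lost in extraction. A minimal sketch of the workflow those slides describe, assuming the red-wine file is named winequality-red.csv and uses semicolon separators:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# The UCI file uses semicolons rather than commas as field separators.
df = pd.read_csv('winequality-red.csv', sep=';')
print(df.describe())          # basic summary statistics
print(df.corr()['quality'])   # pairwise correlations with quality

X = df.drop('quality', axis=1)
y = df['quality']

# By default, 75 percent of the data is partitioned to the training set.
X_train, X_test, y_train, y_test = train_test_split(X, y)
model = LinearRegression()
model.fit(X_train, y_train)
print('R-squared: %.2f' % model.score(X_test, y_test))

# Cross-validation gives a more stable estimate of performance.
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())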
Fitting models with gradient descent

In the examples in this chapter, we analytically solved for the values of the model's parameters that minimize the cost function with the following equation:

$$\beta = (X^T X)^{-1} X^T Y$$

Recall that X is the matrix of the values of the explanatory variables for each training example. The dot product X^T X results in a square matrix with dimensions n × n, where n is equal to the number of explanatory variables. The computational complexity of inverting this square matrix is nearly cubic in the number of explanatory variables. Furthermore, X^T X cannot be inverted if its determinant is equal to zero.

Gradient descent

In this section, we will discuss another method to efficiently estimate the optimal values of the model's parameters, called gradient descent. Note that our definition of a good fit has not changed; we will still use gradient descent to estimate the values of the model's parameters that minimize the value of the cost function. Gradient descent is sometimes described by the analogy of a blindfolded man who is trying to find his way from somewhere on a mountainside to the lowest point of the valley.

Formally, gradient descent is an optimization algorithm that can be used to estimate the local minimum of a function. Recall that we are using the residual sum of squares cost function, which is given by the following equation:

$$SS_{res} = \sum_{i=1}^{n} (y_i - f(x_i))^2$$

We can use gradient descent to find the values of the model's parameters that minimize the value of the cost function. Gradient descent iteratively updates the values of the model's parameters by calculating the partial derivatives of the cost function at each step.

It is important to note that gradient descent estimates the local minimum of a function. A three-dimensional plot of the values of a convex cost function for all possible values of the parameters looks like a bowl. The bottom of the bowl is the sole local minimum. Non-convex cost functions can have many local minima; that is, the plots of the values of their cost functions can have many peaks and valleys. Gradient descent is only guaranteed to find a local minimum; it will find a valley, but will not necessarily find the lowest valley. Fortunately, the residual sum of squares cost function is convex.
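The slides do not include a gradient descent implementation. This is a minimal batch-gradient-descent sketch for the pizza data, under assumed values for the learning rate and iteration count:

import numpy as np

X = np.array([6.0, 8.0, 10.0, 14.0, 18.0])
y = np.array([7.0, 9.0, 13.0, 17.5, 18.0])

alpha, beta = 0.0, 0.0   # initial parameter values
lr = 0.001               # learning rate (illustrative choice)

for _ in range(100000):
    residuals = (alpha + beta * X) - y
    # Partial derivatives of the RSS cost with respect to alpha and beta.
    grad_alpha = 2 * residuals.sum()
    grad_beta = 2 * (residuals * X).sum()
    alpha -= lr * grad_alpha
    beta -= lr * grad_beta

print('alpha=%.4f, beta=%.4f' % (alpha, beta))
# Converges toward the analytic solution: alpha≈1.9655, beta≈0.9763.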
Types of Gradient Descent

Gradient descent can vary in terms of the number of training patterns used to calculate the error that is in turn used to update the model. The number of patterns used to calculate the error determines how stable the gradient used to update the model is. We will see that there is a tension in gradient descent configurations between computational efficiency and the fidelity of the error gradient. The three main flavors of gradient descent are batch, stochastic, and mini-batch.

Stochastic Gradient Descent

Stochastic gradient descent, often abbreviated SGD, is a variation of the gradient descent algorithm that calculates the error and updates the model for each example in the training dataset. Because the model is updated for every training example, stochastic gradient descent is often called an online machine learning algorithm.

Upsides:
- The frequent updates immediately give an insight into the performance of the model and the rate of improvement.
- This variant of gradient descent may be the simplest to understand and implement, especially for beginners.
- The increased model update frequency can result in faster learning on some problems.
- The noisy update process can allow the model to avoid local minima (e.g. premature convergence).

Downsides:
- Updating the model so frequently is more computationally expensive than other configurations of gradient descent, taking significantly longer to train models on large datasets.
- The frequent updates can result in a noisy gradient signal, which may cause the model parameters, and in turn the model error, to jump around (have a higher variance over training epochs).
- The noisy learning process down the error gradient can also make it hard for the algorithm to settle on an error minimum for the model.

Batch Gradient Descent

Batch gradient descent is a variation of the gradient descent algorithm that calculates the error for each example in the training dataset, but only updates the model after all training examples have been evaluated. One cycle through the entire training dataset is called a training epoch. Therefore, it is often said that batch gradient descent performs model updates at the end of each training epoch.

Upsides:
- Fewer updates to the model means this variant of gradient descent is more computationally efficient than stochastic gradient descent.
- The decreased update frequency results in a more stable error gradient and may result in a more stable convergence on some problems.
- The separation of the calculation of prediction errors and the model update lends the algorithm to parallel-processing-based implementations.

Downsides:
- The more stable error gradient may result in premature convergence of the model to a less optimal set of parameters.
- The updates at the end of the training epoch require the additional complexity of accumulating prediction errors across all training examples.
- Commonly, batch gradient descent is implemented in such a way that it requires the entire training dataset in memory and available to the algorithm.
- Model updates, and in turn training speed, may become very slow for large datasets.

Mini-Batch Gradient Descent

Mini-batch gradient descent is a variation of the gradient descent algorithm that splits the training dataset into small batches that are used to calculate model error and update model coefficients. Implementations may choose to sum the gradient over the mini-batch, which further reduces the variance of the gradient. Mini-batch gradient descent seeks to find a balance between the robustness of stochastic gradient descent and the efficiency of batch gradient descent. It is the most common implementation of gradient descent; a sketch of mini-batch updates follows.
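As a hedged illustration of mini-batch updates in scikit-learn (the slides end before showing code, so SGDRegressor, the batch size, and the epoch count here are my own choices), each partial_fit call below updates the coefficients using only one small batch:

import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 20, size=(1000, 1))            # synthetic diameters
y = 2 + 1.0 * X.ravel() + rng.normal(0, 1, 1000)  # synthetic prices

model = SGDRegressor(learning_rate='constant', eta0=0.001)

batch_size = 32
for epoch in range(5):
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = order[start:start + batch_size]
        # Each call updates the coefficients from this mini-batch only.
        model.partial_fit(X[batch], y[batch])

print(model.intercept_, model.coef_)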