Foundations of Machine Learning: Regression
Overview

- Simple linear regression
- Evaluating the model
- Multiple linear regression
- Polynomial regression
- Regularization
- Applying linear regression
- Fitting models with gradient descent

Simple linear regression

Simple linear regression can be used to model a linear relationship between one response variable and one explanatory variable. Suppose you wish to know the price of a pizza.

Observe the data

import matplotlib.pyplot as plt

X = [[6], [8], [10], [14], [18]]
y = [[7], [9], [13], [17.5], [18]]
plt.figure()
plt.title('Pizza price plotted against diameter')
plt.xlabel('Diameter in inches')
plt.ylabel('Price in dollars')
plt.plot(X, y, 'k.')
plt.axis([0, 25, 0, 25])
plt.grid(True)
plt.show()

sklearn.linear_model.LinearRegression

# import sklearn
from sklearn.linear_model import LinearRegression

# Training data
X = [[6], [8], [10], [14], [18]]
y = [[7], [9], [13], [17.5], [18]]
# Create and fit the model
model = LinearRegression()
model.fit(X, y)
print('A 12" pizza should cost: $%.2f' % model.predict([[12]])[0][0])
# A 12" pizza should cost: $13.68

The sklearn.linear_model.LinearRegression class is an estimator. Estimators predict a value based on the observed data. In scikit-learn, all estimators implement the fit() and predict() methods. The former method is used to learn the parameters of a model, and the latter method is used to predict the value of a response variable for an explanatory variable using the learned parameters. It is easy to experiment with different models using scikit-learn because all estimators implement the fit and predict methods.

Results

print((model.intercept_, model.coef_))
Z = model.predict(X)
plt.scatter(X, y)
plt.plot(X, Z, color='red')
plt.title('Pizza price plotted against diameter')
plt.xlabel('Diameter in inches')
plt.ylabel('Price in dollars')
plt.show()
# (array([1.96551743]), array([[0.9762931]]))

Evaluating the fitness of a model

Regression lines produced by several sets of parameter values are plotted in the following figure. How can we assess which parameters produced the best-fitting regression line?

Cost function

A cost function, also called a loss function, is used to define and measure the error of a model. The differences between the prices predicted by the model and the observed prices of the pizzas in the training set are called residuals or training errors. Later, we will evaluate a model on a separate set of test data; the differences between the predicted and observed values in the test data are called prediction errors or test errors. The residuals for our model are indicated by the vertical lines between the points for the training instances and the regression hyperplane in the following plot.

We can produce the best pizza-price predictor by minimizing the sum of the squared residuals. That is, our model fits well if the values it predicts for the response variable are close to the observed values for all of the training examples. This measure of the model's fitness is called the residual sum of squares (RSS) cost function:

$$SS_{res} = \sum_{i=1}^{n} (y_i - f(x_i))^2$$

import numpy as np
rss = np.sum((model.predict(X) - y) ** 2)
print('Residual sum of squares: %.2f' % rss)
# Residual sum of squares: 8.75
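To make the earlier question about competing regression lines concrete, we can compare several candidate parameter pairs by their RSS cost. This is a minimal sketch, not taken from the slides: the first two (alpha, beta) pairs are arbitrary guesses, and the third is the learned OLS solution.

import numpy as np

X = np.array([6, 8, 10, 14, 18])
y = np.array([7, 9, 13, 17.5, 18])

# Two arbitrary parameter guesses, followed by the OLS solution.
candidates = [(0.0, 1.0), (4.0, 0.7), (1.9655, 0.9763)]
for alpha, beta in candidates:
    rss = np.sum((y - (alpha + beta * X)) ** 2)
    print('alpha=%.4f, beta=%.4f, RSS=%.2f' % (alpha, beta, rss))
# The OLS parameters yield the smallest cost (about 8.75).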
Solving ordinary least squares for simple linear regression

For the simple linear regression model, suppose we have obtained n observations (X1, Y1), (X2, Y2), …, (Xn, Yn) from the population. Infinitely many curves could be fit to these n points in the plane. We want the sample regression function to fit these values as well as possible; the most common criterion is ordinary least squares (Ordinary Least Squares, OLS): the chosen regression model should minimize the residual sum of squares over all of the observations.

Variance of x:

>>> import numpy as np
>>> print(np.var([6, 8, 10, 14, 18], ddof=1))
23.2

Covariance of x and y:

>>> import numpy as np
>>> print(np.cov([6, 8, 10, 14, 18], [7, 9, 13, 17.5, 18])[0][1])
22.65

Now that we have calculated the variance of our explanatory variable and the covariance of the response and explanatory variables, we can solve for beta using the following formula:

$$\beta = \frac{\mathrm{cov}(x, y)}{\mathrm{var}(x)}$$

Having solved beta, we can solve for alpha using the following formula:

$$\alpha = \bar{y} - \beta \bar{x}$$

Here, $\bar{x}$ and $\bar{y}$ are the means of x and y. For the pizza data, $\beta = 22.65 / 23.2 \approx 0.9763$ and $\alpha = 12.9 - 0.9763 \times 11.2 \approx 1.9655$, matching the parameters learned by LinearRegression.

Evaluating the model

We have used a learning algorithm to estimate a model's parameters from the training data. How can we assess whether our model is a good representation of the real relationship?

R-squared

Several measures can be used to assess our model's predictive capabilities. We will evaluate our pizza-price predictor using r-squared. R-squared measures how well the observed values of the response variable are predicted by the model. More concretely, r-squared is the proportion of the variance in the response variable that is explained by the model. An r-squared score of one indicates that the response variable can be predicted without any error using the model. An r-squared score of one half indicates that half of the variance in the response variable can be predicted using the model. There are several methods to calculate r-squared. In the case of simple linear regression, r-squared is equal to the square of the Pearson product-moment correlation coefficient, or Pearson's r.
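The r-squared evaluation code on slide 3-16 did not survive extraction. A minimal sketch, assuming a small held-out test set (the test values below are hypothetical stand-ins, not the slide's data):

from sklearn.linear_model import LinearRegression

X_train = [[6], [8], [10], [14], [18]]
y_train = [[7], [9], [13], [17.5], [18]]
# Hypothetical test set; the slide's actual values were not preserved.
X_test = [[8], [9], [11], [16], [12]]
y_test = [[11], [8.5], [15], [18], [11]]

model = LinearRegression()
model.fit(X_train, y_train)
# score() returns the coefficient of determination, r-squared.
print('R-squared: %.4f' % model.score(X_test, y_test))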
Multiple linear regression

Formally, multiple linear regression is the following model:

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$

or, in vector notation, $Y = X\beta$.

Let's update our pizza training data to include the number of toppings as a second explanatory variable. We must also update our test data to include the second explanatory variable.

We will multiply X by its transpose to yield a square matrix that can be inverted. Denoted with a superscript T, the transpose of a matrix is formed by turning the rows of the matrix into columns and vice versa. The parameters can then be solved with the following equation:

$$\beta = (X^T X)^{-1} X^T Y$$

>>> from numpy.linalg import inv
>>> from numpy import dot, transpose
>>> X = [[1, 6, 2], [1, 8, 1], [1, 10, 0], [1, 14, 2], [1, 18, 0]]
>>> y = [[7], [9], [13], [17.5], [18]]
>>> print(dot(inv(dot(transpose(X), X)), dot(transpose(X), y)))
[[1.1875    ]
 [1.01041667]
 [0.39583333]]

NumPy also provides a least squares function that can solve the values of the parameters more compactly:

>>> from numpy.linalg import lstsq
>>> X = [[1, 6, 2], [1, 8, 1], [1, 10, 0], [1, 14, 2], [1, 18, 0]]
>>> y = [[7], [9], [13], [17.5], [18]]
>>> print(lstsq(X, y, rcond=None)[0])
[[1.1875    ]
 [1.01041667]
 [0.39583333]]

[Slide 3-23 fit the same model with sklearn.linear_model.LinearRegression; its code was not preserved in this extraction.]

Polynomial regression

In the previous examples, we assumed that the real relationship between the explanatory variables and the response variable is linear. This assumption is not always true. In this section, we will use polynomial regression, a special case of multiple linear regression that adds terms with degrees greater than one to the model. A real-world curvilinear relationship is captured by transforming the training data with polynomial terms, which are then fit in the same manner as in multiple linear regression. For ease of visualization, we will again use only one explanatory variable, the pizza's diameter, and compare linear regression with polynomial regression.

Quadratic regression, or regression with a second-order polynomial, is given by the following formula:

$$y = \alpha + \beta_1 x + \beta_2 x^2$$

We are using only one explanatory variable, but the model now has three terms instead of two. The explanatory variable has been transformed and added as a third term to the model to capture the curvilinear relationship. Also, note that the equation for polynomial regression is the same as the equation for multiple linear regression in vector notation. The PolynomialFeatures transformer can be used to easily add polynomial features to a feature representation. Let's fit a model to these features and compare it to the simple linear regression model. [Slides 3-26 and 3-27 showed this code and the resulting quadratic fit; a sketch follows at the end of this section.]

Now, let's try an even higher-order polynomial. The plot in the following figure shows a regression curve created by a ninth-degree polynomial.

The ninth-degree polynomial regression model fits the training data almost exactly! The model's r-squared score on the test data, however, is -0.09. We created an extremely complex model that fits the training data exactly, but fails to approximate the real relationship. This problem is called over-fitting. The model should induce a general rule to map inputs to outputs; instead, it has memorized the inputs and outputs from the training data. As a result, the model performs poorly on test data. It predicts that a 16-inch pizza should cost less than $10, and an 18-inch pizza should cost more than $30. This model exactly fits the training data, but fails to learn the real relationship between size and price.
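The PolynomialFeatures code from slides 3-25 through 3-27 was not preserved. A minimal sketch of the comparison those slides describe, assuming the pizza training data and a hypothetical test set:

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X_train = [[6], [8], [10], [14], [18]]
y_train = [7, 9, 13, 17.5, 18]
# Hypothetical test set for illustration.
X_test = [[6], [8], [11], [16]]
y_test = [8, 12, 15, 18]

# Transform each diameter x into the feature vector [1, x, x^2], then
# fit it exactly as in multiple linear regression.
quadratic = PolynomialFeatures(degree=2)
X_train_quad = quadratic.fit_transform(X_train)
X_test_quad = quadratic.transform(X_test)

linear = LinearRegression().fit(X_train, y_train)
curved = LinearRegression().fit(X_train_quad, y_train)
print('Simple linear r-squared: %.4f' % linear.score(X_test, y_test))
print('Quadratic r-squared:     %.4f' % curved.score(X_test_quad, y_test))

Raising the degree to nine reproduces the over-fitting described above: the curve passes through every training point while the test r-squared collapses.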
Regularization

Regularization is a collection of techniques that can be used to prevent over-fitting. Regularization adds information to a problem, often in the form of a penalty against complexity. Occam's razor states that a hypothesis with the fewest assumptions is the best. Accordingly, regularization attempts to find the simplest model that explains the data.
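Slides 3-31 and 3-32 were image-only and did not survive extraction; they likely showed penalized cost functions such as ridge regression (an L2 penalty) and the lasso (an L1 penalty). As a hedged illustration of the idea, ridge regression shrinks the coefficients of an otherwise over-fit polynomial model; the degree and alpha below are illustrative choices:

from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

X_train = [[6], [8], [10], [14], [18]]
y_train = [7, 9, 13, 17.5, 18]

# Degree-9 features would normally let the model memorize five points.
ninth = PolynomialFeatures(degree=9)
X_train_ninth = ninth.fit_transform(X_train)

# alpha controls the strength of the penalty against large coefficients.
model = Ridge(alpha=1.0)
model.fit(X_train_ninth, y_train)
print(model.coef_)  # coefficients are shrunk toward zero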
Applying linear regression

Assume that you are at a party, and that you wish to drink the best wine that is available. You could ask your friends for recommendations, but you suspect that they will drink any wine, regardless of its provenance. Fortunately, you have brought pH test strips and other tools to measure various physicochemical properties of wine; it is, after all, a party. We will use machine learning to predict the quality of the wine based on its physicochemical attributes.

The UCI Machine Learning Repository's Wine dataset measures eleven physicochemical attributes, including the pH and alcohol content, of 1,599 different red wines. Each wine's quality has been scored by human judges. The scores range from zero to ten; zero is the worst quality and ten is the best quality. The dataset can be downloaded from http://archive.ics.uci.edu/ml/datasets/Wine. We will approach this problem as a regression task and regress the wine's quality onto one or more physicochemical attributes. The response variable in this problem takes only integer values between 0 and 10; we could view these as discrete values and approach the problem as a multiclass classification task. In this chapter, however, we will view the response variable as a continuous value.

Exploring the data

The explanatory variables are fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol; the response variable is quality.

First, we will load the dataset and review some basic summary statistics for the variables. The data is provided as a .csv file; note that the fields are separated by semicolons rather than commas. Visualizing the data can help indicate whether relationships exist between the response variable and the explanatory variables, so we will use matplotlib to create some scatter plots. [The code and scatter plots from slides 3-37 through 3-39 were not preserved; a sketch of the workflow follows at the end of this section.]

These plots suggest that the response variable depends on multiple explanatory variables; let's model the relationship with multiple linear regression. How can we decide which explanatory variables to include in the model? DataFrame.corr() calculates a pairwise correlation matrix. The correlation matrix confirms that the strongest positive correlation is between alcohol and quality, and that quality is negatively correlated with volatile acidity, an attribute that can cause wine to taste like vinegar. To summarize, we have hypothesized that good wines have high alcohol content and do not taste like vinegar. This hypothesis seems sensible, though it suggests that wine aficionados may have less sophisticated palates than they claim.

Fitting and evaluating the model

The r-squared score of 0.35 indicates that 35 percent of the variance in the test set is explained by the model. The performance might change if a different 75 percent of the data is partitioned to the training set. We can use cross-validation to produce a better estimate of the estimator's performance. Recall from chapter one that each cross-validation round trains and tests on different partitions of the data to reduce variability.
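The code on slides 3-37 through 3-42 was lost in extraction. A minimal sketch of the workflow those slides describe, assuming the red-wine file is named winequality-red.csv and uses semicolon separators:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# The UCI file uses semicolons rather than commas as field separators.
df = pd.read_csv('winequality-red.csv', sep=';')
print(df.describe())          # basic summary statistics
print(df.corr()['quality'])   # pairwise correlations with quality

X = df.drop('quality', axis=1)
y = df['quality']

# By default, 75 percent of the data is partitioned to the training set.
X_train, X_test, y_train, y_test = train_test_split(X, y)
model = LinearRegression()
model.fit(X_train, y_train)
print('R-squared: %.2f' % model.score(X_test, y_test))

# Cross-validation gives a more stable estimate of performance.
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())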
Fitting models with gradient descent

In the examples in this chapter, we analytically solved for the values of the model's parameters that minimize the cost function with the following equation:

$$\beta = (X^T X)^{-1} X^T Y$$

Recall that X is the matrix of the values of the explanatory variables for each training example. The dot product X^T X results in a square matrix with dimensions n × n, where n is equal to the number of explanatory variables. The computational complexity of inverting this square matrix is nearly cubic in the number of explanatory variables. Furthermore, X^T X cannot be inverted if its determinant is equal to zero.

Gradient descent

In this section, we will discuss another method to efficiently estimate the optimal values of the model's parameters, called gradient descent. Note that our definition of a good fit has not changed; we will still use gradient descent to estimate the values of the model's parameters that minimize the value of the cost function. Gradient descent is sometimes described by the analogy of a blindfolded man who is trying to find his way from somewhere on a mountainside to the lowest point of the valley.

Formally, gradient descent is an optimization algorithm that can be used to estimate the local minimum of a function. Recall that we are using the residual sum of squares cost function, which is given by the following equation:

$$SS_{res} = \sum_{i=1}^{n} (y_i - f(x_i))^2$$

We can use gradient descent to find the values of the model's parameters that minimize the value of the cost function. Gradient descent iteratively updates the values of the model's parameters by calculating the partial derivatives of the cost function at each step.

It is important to note that gradient descent estimates the local minimum of a function. A three-dimensional plot of the values of a convex cost function for all possible values of the parameters looks like a bowl. The bottom of the bowl is the sole local minimum. Non-convex cost functions can have many local minima; that is, the plots of the values of their cost functions can have many peaks and valleys. Gradient descent is only guaranteed to find a local minimum; it will find a valley, but will not necessarily find the lowest valley. Fortunately, the residual sum of squares cost function is convex.
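The slides do not include a gradient descent implementation. This is a minimal batch-gradient-descent sketch for the pizza data, under assumed values for the learning rate and iteration count:

import numpy as np

X = np.array([6.0, 8.0, 10.0, 14.0, 18.0])
y = np.array([7.0, 9.0, 13.0, 17.5, 18.0])

alpha, beta = 0.0, 0.0   # initial parameter values
lr = 0.001               # learning rate (illustrative choice)

for _ in range(100000):
    residuals = (alpha + beta * X) - y
    # Partial derivatives of the RSS cost with respect to alpha and beta.
    grad_alpha = 2 * residuals.sum()
    grad_beta = 2 * (residuals * X).sum()
    alpha -= lr * grad_alpha
    beta -= lr * grad_beta

print('alpha=%.4f, beta=%.4f' % (alpha, beta))
# Converges toward the analytic solution: alpha≈1.9655, beta≈0.9763.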
Types of Gradient Descent

Gradient descent can vary in terms of the number of training patterns used to calculate the error that is in turn used to update the model. The number of patterns used to calculate the error determines how stable the gradient used to update the model is. We will see that there is a tension in gradient descent configurations between computational efficiency and the fidelity of the error gradient. The three main flavors of gradient descent are batch, stochastic, and mini-batch.

Stochastic Gradient Descent

Stochastic gradient descent, often abbreviated SGD, is a variation of the gradient descent algorithm that calculates the error and updates the model for each example in the training dataset. Because the model is updated for every training example, stochastic gradient descent is often called an online machine learning algorithm.

Upsides:
- The frequent updates immediately give an insight into the performance of the model and the rate of improvement.
- This variant of gradient descent may be the simplest to understand and implement, especially for beginners.
- The increased model update frequency can result in faster learning on some problems.
- The noisy update process can allow the model to avoid local minima (e.g. premature convergence).

Downsides:
- Updating the model so frequently is more computationally expensive than other configurations of gradient descent, taking significantly longer to train models on large datasets.
- The frequent updates can result in a noisy gradient signal, which may cause the model parameters, and in turn the model error, to jump around (have a higher variance over training epochs).
- The noisy learning process down the error gradient can also make it hard for the algorithm to settle on an error minimum for the model.

Batch Gradient Descent

Batch gradient descent is a variation of the gradient descent algorithm that calculates the error for each example in the training dataset, but only updates the model after all training examples have been evaluated. One cycle through the entire training dataset is called a training epoch. Therefore, it is often said that batch gradient descent performs model updates at the end of each training epoch.

Upsides:
- Fewer updates to the model means this variant of gradient descent is more computationally efficient than stochastic gradient descent.
- The decreased update frequency results in a more stable error gradient and may result in a more stable convergence on some problems.
- The separation of the calculation of prediction errors and the model update lends the algorithm to parallel-processing-based implementations.

Downsides:
- The more stable error gradient may result in premature convergence of the model to a less optimal set of parameters.
- The updates at the end of the training epoch require the additional complexity of accumulating prediction errors across all training examples.
- Commonly, batch gradient descent is implemented in such a way that it requires the entire training dataset in memory and available to the algorithm.
- Model updates, and in turn training speed, may become very slow for large datasets.

Mini-Batch Gradient Descent

Mini-batch gradient descent is a variation of the gradient descent algorithm that splits the training dataset into small batches that are used to calculate model error and update model coefficients. Implementations may choose to sum the gradient over the mini-batch, which further reduces the variance of the gradient. Mini-batch gradient descent seeks to find a balance between the robustness of stochastic gradient descent and the efficiency of batch gradient descent. It is the most common implementation of gradient descent; a sketch of mini-batch updates follows.
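As a hedged illustration of mini-batch updates in scikit-learn (the slides end before showing code, so SGDRegressor, the batch size, and the epoch count here are my own choices), each partial_fit call below updates the coefficients using only one small batch:

import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 20, size=(1000, 1))            # synthetic diameters
y = 2 + 1.0 * X.ravel() + rng.normal(0, 1, 1000)  # synthetic prices

model = SGDRegressor(learning_rate='constant', eta0=0.001)

batch_size = 32
for epoch in range(5):
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = order[start:start + batch_size]
        # Each call updates the coefficients from this mini-batch only.
        model.partial_fit(X[batch], y[batch])

print(model.intercept_, model.coef_)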