机器学习题库

一、极大似然

1、ML estimation of exponential model (10)

A Gaussian distribution is often used to model data on the real line, but is sometimes inappropriate when the data are often close to zero but constrained to be nonnegative. In such cases one can fit an exponential distribution, whose probability density function is given by

$$p(x) = \frac{1}{b} e^{-x/b}$$

Given N observations $x_i$ drawn from such a distribution:

(a) Write down the likelihood as a function of the scale parameter b.
(b) Write down the derivative of the log likelihood.
(c) Give a simple expression for the ML estimate for b.

解:

(a) $L(b) = \prod_{i=1}^{N} \frac{1}{b} e^{-x_i/b} = b^{-N} \exp\left(-\frac{1}{b}\sum_{i=1}^{N} x_i\right)$

(b) $\log L(b) = -N\log b - \frac{1}{b}\sum_{i=1}^{N} x_i$,故 $\dfrac{\partial \log L}{\partial b} = -\dfrac{N}{b} + \dfrac{1}{b^2}\sum_{i=1}^{N} x_i$

(c) 令导数为零,得 $\hat{b} = \dfrac{1}{N}\sum_{i=1}^{N} x_i$

2、换成 Poisson 分布:$p(x|\lambda) = \dfrac{\lambda^x}{x!} e^{-\lambda},\ x = 0, 1, 2, \ldots$

$$\log p(X|\lambda) = \sum_{i=1}^{N}\left(x_i \log\lambda - \lambda - \log x_i!\right) = \log\lambda \sum_{i=1}^{N} x_i - N\lambda - \sum_{i=1}^{N}\log x_i!$$

令 $\dfrac{\partial \log p(X|\lambda)}{\partial \lambda} = \dfrac{1}{\lambda}\sum_{i=1}^{N} x_i - N = 0$,得 $\hat{\lambda} = \dfrac{1}{N}\sum_{i=1}^{N} x_i$。
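下面用一段简短的 NumPy/SciPy 代码做数值验证(补充示例,非原题内容,数据为随机生成):对指数分布与 Poisson 分布,数值最优化得到的极大似然估计与上面推导的样本均值一致。

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

# 指数分布,scale 即参数 b:极大似然估计应为样本均值
b_true = 2.5
x = rng.exponential(scale=b_true, size=10_000)

# 负 log 似然:N*log(b) + sum(x)/b
nll = lambda b: len(x) * np.log(b) + x.sum() / b
b_hat = minimize_scalar(nll, bounds=(1e-6, 100), method="bounded").x
print(b_hat, x.mean())        # 两者都接近 b_true = 2.5

# Poisson 分布,参数 lambda:极大似然估计同样是样本均值
lam_true = 4.0
k = rng.poisson(lam=lam_true, size=10_000)
print(k.mean())               # 接近 lam_true = 4.0
```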
二、贝叶斯

1、贝叶斯公式应用

假设在考试的多项选择中,考生知道正确答案的概率为 p,猜测答案的概率为 1-p,并且假设考生知道正确答案时答对题的概率为 1,猜中正确答案的概率为 1/m,其中 m 为选项的数目。那么已知考生答对题目,求他知道正确答案的概率。

解:由贝叶斯公式,

$$p(\text{known}|\text{correct}) = \frac{p(\text{known}, \text{correct})}{p(\text{correct})} = \frac{p}{p + (1-p)/m} = \frac{mp}{mp + 1 - p}$$

2、Conjugate priors

Given a likelihood $p(x|\theta)$ for a class of models with parameters $\theta$, a conjugate prior is a distribution $p(\theta|\gamma)$ with hyperparameters $\gamma$, such that the posterior distribution $p(\theta|X, \gamma) \propto p(X|\theta)\,p(\theta|\gamma)$ 与先验的分布族相同。

(a) Suppose that the likelihood is given by the exponential distribution with rate parameter $\lambda$:

$$p(x|\lambda) = \lambda e^{-\lambda x}$$

Show that the gamma distribution

$$\mathrm{Gamma}(\lambda|\alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\lambda^{\alpha-1} e^{-\beta\lambda}$$

is a conjugate prior for the exponential. Derive the parameter update given observations $x_1, \ldots, x_N$ and the prediction distribution $p(x_{N+1}|x_1, \ldots, x_N)$.

解:

$$p(\lambda|X) \propto p(X|\lambda)\,p(\lambda|\alpha, \beta) \propto \lambda^{N} e^{-\lambda\sum_i x_i} \cdot \lambda^{\alpha-1} e^{-\beta\lambda} = \lambda^{\alpha+N-1} e^{-(\beta + \sum_i x_i)\lambda}$$

即后验仍为 gamma 分布,参数更新为 $\alpha' = \alpha + N$,$\beta' = \beta + \sum_{i=1}^{N} x_i$。预测分布为

$$p(x_{N+1}|x_1, \ldots, x_N) = \int_0^\infty \lambda e^{-\lambda x_{N+1}}\,\mathrm{Gamma}(\lambda|\alpha', \beta')\,d\lambda = \frac{\alpha'\,\beta'^{\alpha'}}{(\beta' + x_{N+1})^{\alpha'+1}}$$
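下面给出 (a) 小题结论的一个最小数值验证草稿(补充示例,先验参数 $\alpha = 2, \beta = 1$ 为任意假设):用后验 gamma 分布做 Monte Carlo 积分,与闭式预测分布对比。

```python
import numpy as np

rng = np.random.default_rng(1)

alpha, beta = 2.0, 1.0                             # 先验 Gamma(alpha, beta),作用于速率 lambda
lam_true = 3.0
x = rng.exponential(scale=1 / lam_true, size=50)   # 速率为 3 的指数分布数据

# 上面推导的共轭更新
alpha_post = alpha + len(x)
beta_post = beta + x.sum()
print("lambda 的后验均值:", alpha_post / beta_post)   # 应接近 3

# 预测密度:闭式解 vs 对后验做 Monte Carlo 积分
x_new = 0.5
closed = np.exp(np.log(alpha_post) + alpha_post * np.log(beta_post)
                - (alpha_post + 1) * np.log(beta_post + x_new))
lams = rng.gamma(shape=alpha_post, scale=1 / beta_post, size=200_000)
mc = np.mean(lams * np.exp(-lams * x_new))
print(closed, mc)                                  # 两个估计应非常接近
```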
(b) Suppose that the likelihood is given by the geometric distribution with success parameter $\theta$:

$$p(x = k|\theta) = \theta\,(1-\theta)^{k-1}, \quad k = 1, 2, \ldots$$

Show that the beta distribution $\mathrm{Beta}(\theta|a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,\theta^{a-1}(1-\theta)^{b-1}$ is a conjugate prior for the geometric. Derive the parameter update given observations $x_1, \ldots, x_N$ and the prediction distribution.

解:

$$p(\theta|X) \propto \theta^{N}(1-\theta)^{\sum_i x_i - N} \cdot \theta^{a-1}(1-\theta)^{b-1}$$

即后验为 $\mathrm{Beta}\left(a + N,\ b + \sum_i x_i - N\right)$,仍是 beta 分布。记 $a' = a + N$,$b' = b + \sum_i x_i - N$,则预测分布为

$$p(x_{N+1} = k|X) = \int_0^1 \theta\,(1-\theta)^{k-1}\,\mathrm{Beta}(\theta|a', b')\,d\theta = \frac{\Gamma(a'+b')}{\Gamma(a')\Gamma(b')} \cdot \frac{\Gamma(a'+1)\,\Gamma(b'+k-1)}{\Gamma(a'+b'+k)}$$

(c) Suppose $p(\theta|\gamma)$ is a conjugate prior for the likelihood $p(x|\theta)$; show that the mixture prior

$$p(\theta) = \sum_{m=1}^{M} w_m\,p(\theta|\gamma_m)$$

is also conjugate for the same likelihood, assuming the mixture weights $w_m$ sum to 1.

解:由于 $p(\theta|\gamma)$ 对该似然共轭,对每个分量有 $p(x|\theta)\,p(\theta|\gamma_m) = p(x|\gamma_m)\,p(\theta|x, \gamma_m)$,其中 $p(\theta|x, \gamma_m)$ 与先验同族。于是用混合先验乘以似然,得到后验

$$p(\theta|x) \propto p(x|\theta)\sum_{m=1}^{M} w_m\,p(\theta|\gamma_m) = \sum_{m=1}^{M} w_m\,p(x|\gamma_m)\,p(\theta|x, \gamma_m)$$

归一化后 $p(\theta|x) = \sum_m w_m'\,p(\theta|x, \gamma_m)$,其中 $w_m' = \dfrac{w_m\,p(x|\gamma_m)}{\sum_{m'} w_{m'}\,p(x|\gamma_{m'})}$。后验与先验形式相同,即一个带有更新后的权重和超参数的混合分布(数值验证见下方代码)。

(d) Repeat part (c) for the case where the prior is a single distribution and the likelihood is a mixture, and the prior is conjugate for each mixture component of the likelihood.

Some priors can be conjugate for several different likelihoods; for example, the beta is conjugate for the Bernoulli and the geometric distributions, and the gamma is conjugate for the exponential and for the gamma with fixed shape.

(e) (Extra credit, 20) Explore the case where the likelihood is a mixture with fixed components and unknown weights; i.e., the weights are the parameters to be learned.
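针对 (c) 小题,下面以"指数似然 + 两个 gamma 先验分量"为例给出权重更新的示例代码(补充草稿,先验分量参数为任意假设):更新后的权重正比于 $w_m\,p(X|\gamma_m)$,即各分量下数据的边缘似然。

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(2)
x = rng.exponential(scale=1 / 3.0, size=20)   # 速率为 3 的指数分布数据

# 速率 lambda 上的两个 gamma 先验分量:权重与 (alpha, beta) 参数
w = np.array([0.7, 0.3])
alphas = np.array([1.0, 5.0])
betas = np.array([1.0, 2.0])

# 每个分量按 (a) 小题的方式做共轭更新
alphas_post = alphas + len(x)
betas_post = betas + x.sum()

# 各分量下数据的边缘似然(log 形式):
# p(X|alpha,beta) = beta^alpha/Gamma(alpha) * Gamma(alpha+N)/(beta+sum x)^(alpha+N)
log_marg = (alphas * np.log(betas) - gammaln(alphas)
            + gammaln(alphas_post) - alphas_post * np.log(betas_post))
log_w_post = np.log(w) + log_marg
w_post = np.exp(log_w_post - log_w_post.max())
w_post /= w_post.sum()
print("更新后的混合权重:", w_post)
print("速率的后验均值:", np.sum(w_post * alphas_post / betas_post))
```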
Problem 2

Consider the probability density function (or probability mass function, if x is discrete) for the exponential family:

$$p(x; \eta) = h(x)\,\exp\left(\eta^{T} T(x) - A(\eta)\right)$$

(a) Show that the univariate binomial and the multinomial distributions belong to this family.

(b) Show that, in a generative classification model, if the class-conditional densities belong to the exponential family, then the posterior distribution for a class is a softmax of a linear function of the feature vector x.

(c) Continuing part (b), give an explicit expression for the posterior $p(y|x)$.

(d) (For extra credit) A statistic $s(X)$ is said to be sufficient for $\eta$ if $p(\eta|X) = p(\eta|s(X))$, or in other words, the posterior over $\eta$ is independent of $X$ given $s(X)$. For a random variable $X$ drawn from an exponential family density $p(x; \eta)$, $T(x)$ is a sufficient statistic for $\eta$. Show that a factorization $p(x; \eta) = f_1(s(x), \eta)\,f_2(x)$ is necessary and sufficient for $s(x)$ to be a sufficient statistic for $\eta$.

(e) (For extra credit) Suppose $x_1, \ldots, x_n$ are i.i.d. from an exponential family density $p(x; \eta)$. What is the sufficient statistic $s(x_1, \ldots, x_n)$ for $\eta$?
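对 (b) 小题,补充一个推导思路的草稿(假设各类先验为 $\pi_k$,非原文解答):设第 $k$ 类的类条件密度为 $p(x|y=k) = h(x)\exp(\eta_k^T T(x) - A(\eta_k))$,由贝叶斯公式

$$p(y=k|x) = \frac{\pi_k\,h(x)\exp(\eta_k^T T(x) - A(\eta_k))}{\sum_j \pi_j\,h(x)\exp(\eta_j^T T(x) - A(\eta_j))} = \frac{\exp(\eta_k^T T(x) + b_k)}{\sum_j \exp(\eta_j^T T(x) + b_j)}, \qquad b_k = \log\pi_k - A(\eta_k)$$

$h(x)$ 约掉后,后验正是 $T(x)$ 的线性函数的 softmax。这也顺带给出 (e) 的答案形式:i.i.d. 样本的充分统计量为 $\sum_i T(x_i)$。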
三、判断题

(1)给定 n 个数据点,如果其中一半用于训练,另一半用于测试,则训练误差和测试误差之间的差别会随着 n 的增加而减小。

(2)极大似然估计是无偏估计且在所有的无偏估计中方差最小。

(3)回归函数 A 和 B,如果 A 比 B 更简单,则 A 几乎一定会比 B 在测试集上表现更好。

(4)全局线性回归需要利用全部样本点来预测新输入的对应输出值,而局部线性回归只需利用查询点附近的样本来预测输出值,所以全局线性回归比局部线性回归计算代价更高。

(5)Boosting 和 Bagging 都是组合多个分类器投票的方法,二者都是根据单个分类器的正确率决定其权重。

(6)In the boosting iterations, the training error of each new decision stump and the training error of the combined classifier vary roughly in concert. (F)

While the training error of the combined classifier typically decreases as a function of boosting iterations, the error of the individual decision stumps typically increases since the example weights become concentrated at the most difficult examples.

(7)One advantage of Boosting is that it does not overfit. (F)

(8)Support vector machines are resistant to outliers, i.e., very noisy examples drawn from a different distribution. (F)

(9)在回归分析中,最佳子集选择可以做特征选择,当特征数目较多时计算量大;岭回归和 Lasso 模型计算量小,且 Lasso 也可以实现特征选择。

(10)当训练数据较少时更容易发生过拟合。

(11)梯度下降有时会陷于局部极小值,但 EM 算法不会。

(12)在核回归中,最影响回归的过拟合性和欠拟合之间平衡的参数为核函数的宽度。

(13)In the AdaBoost algorithm, the weights on all the misclassified points will go up by the same multiplicative factor. (T)

7. (2 points) true/false In AdaBoost, the weighted training error $\epsilon_t$ of the $t^{th}$ weak classifier on training data with weights $D_t$ tends to increase as a function of $t$.

SOLUTION: True. In the course of boosting iterations the weak classifiers are forced to try to classify more difficult examples. The weights will increase for examples that are repeatedly misclassified by the weak classifiers. The weighted training error $\epsilon_t$ of the $t^{th}$ weak classifier on the training data therefore tends to increase.

9. (2 points) Consider a point that is correctly classified and distant from the decision boundary. Why would SVM's decision boundary be unaffected by this point, but the one learned by logistic regression be affected?

SOLUTION: The hinge loss used by SVMs gives zero weight to these points while the log-loss used by logistic regression gives a little bit of weight to these points.
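关于第 (13) 题,下面用一小段 NumPy 代码演示 AdaBoost 的一步权重更新(补充示例,样本与误分类模式为随意构造):所有被误分类样本的权重在归一化前乘以同一个因子 $e^{\alpha}$。

```python
import numpy as np

def adaboost_reweight(D, y, y_pred):
    """AdaBoost 的一步权重更新。D:当前样本权重(和为 1);
    y, y_pred:取值 {-1,+1} 的真实与预测标签。"""
    eps = D[y != y_pred].sum()               # 加权训练误差
    alpha = 0.5 * np.log((1 - eps) / eps)    # 弱分类器权重
    D_new = D * np.exp(-alpha * y * y_pred)  # 误分类样本权重上调,其余下调
    return D_new / D_new.sum(), alpha

rng = np.random.default_rng(3)
n = 10
D = np.full(n, 1 / n)
y = rng.choice([-1, 1], size=n)
y_pred = y.copy()
y_pred[:3] *= -1                             # 构造前三个样本被误分类

D_new, alpha = adaboost_reweight(D, y, y_pred)
# 归一化前,所有误分类点都乘以同一个因子 exp(alpha),与第 (13) 题一致:
print(D_new[:3] / D[:3])                     # 三个比值完全相同
print(D_new[3:] / D[3:])                     # 其余比值相同且更小
```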
(14)True/False: In a least-squares linear regression problem, adding an L2 regularization penalty cannot decrease the L2 error of the solution $\hat{w}$ on the training data. (F)

(15)True/False: In a least-squares linear regression problem, adding an L2 regularization penalty always decreases the expected L2 error of the solution $\hat{w}$ on unseen test data. (F)

(16)除了 EM 算法,梯度下降也可求混合高斯模型的参数。(T)

(20)Any decision boundary that we get from a generative model with class-conditional Gaussian distributions could in principle be reproduced with an SVM and a polynomial kernel.

True! In fact, since class-conditional Gaussians always yield quadratic decision boundaries, they can be reproduced with an SVM with kernel of degree less than or equal to two.

(21)AdaBoost will eventually reach zero training error, regardless of the type of weak classifier it uses, provided enough weak classifiers have been combined.

False! If the data is not separable by a linear combination of the weak classifiers, AdaBoost can't achieve zero training error.

(22)The L2 penalty in a ridge regression is equivalent to a Laplace prior on the weights. (F)

(23)The log-likelihood of the data will always increase through successive iterations of the expectation maximization algorithm. (F)

(24)In training a logistic regression model by maximizing the likelihood of the labels given the inputs we have multiple locally optimal solutions. (F)
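关于第 (22) 题,补充一个简短的推导草稿(非原文内容):L2 惩罚对应权重上的高斯先验,而 Laplace 先验对应的是 L1 惩罚。设 $y|X, w \sim \mathcal{N}(Xw, \sigma^2 I)$,$w \sim \mathcal{N}(0, \tau^2 I)$,则 MAP 估计为

$$\hat{w} = \arg\max_w\left[\log p(y|X, w) + \log p(w)\right] = \arg\min_w\left[\|y - Xw\|_2^2 + \frac{\sigma^2}{\tau^2}\|w\|_2^2\right]$$

即正则化系数 $\lambda = \sigma^2/\tau^2$ 的岭回归;若改用 Laplace 先验 $p(w) \propto e^{-\|w\|_1/b}$,得到的则是 L1(Lasso)惩罚,这也正是 (9) 中 Lasso 能做特征选择的原因。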
2、考虑线性回归模型:$y \sim \mathcal{N}(w_0 + w_1 x,\ \sigma^2)$,训练数据如下图所示。(10分)

(1)用极大似然估计参数,并在图(a)中画出模型。(3分)

(2)用正则化的极大似然估计参数,即在 log 似然目标函数中加入正则惩罚项 $-C w_1^2$,并在图(b)中画出当参数 C 取很大值时的模型。(3分)

(3)在正则化后,高斯分布的方差 $\sigma^2$ 是变大了、变小了还是不变?(4分)

[图(a)、图(b):训练数据散点图,原文仅存坐标轴刻度]
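针对上面第 (2)、(3) 问,下面给出一个数值演示草稿(补充示例,数据为随机生成;将"log 似然加惩罚项"等价地写成仅对 $w_1$ 的岭回归形式,并忽略 $\sigma^2$ 与惩罚项的耦合):C 很大时斜率 $w_1 \to 0$,拟合出的方差随之变大。

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
x = rng.uniform(0, 5, size=n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, size=n)   # 真实 w0=1, w1=2, sigma=1

X = np.column_stack([np.ones(n), x])

def fit(C):
    # 最大化 "log 似然 - C*w1^2" 等价于只对 w1 做岭回归(不惩罚截距 w0)
    A = X.T @ X + np.diag([0.0, C])
    w = np.linalg.solve(A, X.T @ y)
    sigma2 = np.mean((y - X @ w) ** 2)          # 方差的极大似然估计
    return w, sigma2

for C in [0, 1e6]:
    w, s2 = fit(C)
    print(f"C={C:g}: w0={w[0]:.2f}, w1={w[1]:.3f}, sigma^2={s2:.2f}")
# C 很大时斜率 w1 趋于 0,模型退化为近似水平的直线,残差方差显著变大,
# 即第 (3) 问的答案:方差变大。
```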
四、回归

1、考虑一个正则化回归问题。下图中给出了惩罚函数为二次正则函数、正则化参数 C 取不同值时,在训练集和测试集上的 log 似然(mean log-probability)。(10分)

(1)说法"随着 C 的增加,图 2 中训练集上的 log 似然永远不会增加"是否正确,并说明理由。

(2)解释当 C 取较大值时,图 2 中测试集上的 log 似然下降的原因。

[图:训练集与测试集上的 log 似然随 C 的变化,原文仅存坐标轴刻度]

3、考虑二维输入空间点 $x = (x_1, x_2)^T$ 上的回归问题,其中 $x_j \in [-1, 1]$,$j = 1, 2$ 在单位正方形内。训练样本和测试样本在单位正方形中均匀分布,输出模型为

$$y \sim \mathcal{N}\left(10 x_1^2 + 7 x_1 x_2 + 5 x_2 + 3,\ 1\right)$$

我们用 1-10 阶多项式特征,采用线性回归模型来学习 x 与 y 之间的关系(高阶特征模型包含所有低阶特征),损失函数取平方误差损失。
(1)现在 n = 20 个样本上,训练 1 阶、2 阶、8 阶和 10 阶特征的模型,然后在一个大规模的独立的测试集上测试,则在下面 3 列中选择合适的模型(可能有多个选项),并解释第 3 列中你选择的模型为什么测试误差小。(10分)

| | 训练误差最小 | 训练误差最大 | 测试误差最小 |
| 1阶特征的线性模型 | | X | |
| 2阶特征的线性模型 | | | X |
| 8阶特征的线性模型 | X | | |
| 10阶特征的线性模型 | X | | |

(2)现在 n = 10 个样本上,训练 1 阶、2 阶、8 阶和 10 阶特征的模型,然后在一个大规模的独立的测试集上测试,则在下面 3 列中选择合适的模型(可能有多个选项),并解释第 3 列中你选择的模型为什么测试误差小。(10分)

| | 训练误差最小 | 训练误差最大 | 测试误差最小 |
| 1阶特征的线性模型 | | X | |
| 2阶特征的线性模型 | | | X |
| 8阶特征的线性模型 | X | | |
| 10阶特征的线性模型 | X | | |

(3)The approximation error of a polynomial regression model depends on the number of training points. (T)

(4)The structural error of a polynomial regression model depends on the number of training points. (F)

4、We are trying to learn regression parameters for a dataset which we know was generated from a polynomial of a certain degree, but we do not know what this degree is. Assume the data was actually generated from a polynomial of degree 5 with some added Gaussian noise (that is $y = w_0 + w_1 x + w_2 x^2 + w_3 x^3 + w_4 x^4 + w_5 x^5 + \epsilon$,$\epsilon \sim \mathcal{N}(0, 1)$).

For training we have 100 {x, y} pairs and for testing we are using an additional set of 100 {x, y} pairs. Since we do not know the degree of the polynomial we learn two models from the data. Model A learns parameters for a polynomial of degree 4 and model B learns parameters for a polynomial of degree 6. Which of these two models is likely to fit the test data better?

Answer: Degree 6 polynomial. Since the model is a degree 5 polynomial and we have enough training data, the model we learn for a six degree polynomial will likely fit a very small coefficient for $x^6$. Thus, even though it is a six degree polynomial it will actually behave in a very similar way to a fifth degree polynomial which is the correct model, leading to better fit to the data.
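针对上面第 4 题,给出一个模拟草稿(补充示例,真实 5 阶系数为任意选取):在 100 个训练点上分别拟合 4 阶与 6 阶多项式并比较测试均方误差,通常可复现"6 阶更好"的结论。

```python
import numpy as np

rng = np.random.default_rng(5)
w_true = np.array([1.0, -2.0, 0.5, 1.5, -0.8, 0.3])   # 任意选取的 5 阶真实系数 w0..w5

def make_data(n):
    x = rng.uniform(-1, 1, size=n)
    y = np.polyval(w_true[::-1], x) + rng.normal(0, 1, size=n)
    return x, y

def test_mse(deg, x_tr, y_tr, x_te, y_te):
    coeffs = np.polyfit(x_tr, y_tr, deg)       # 给定阶数的最小二乘多项式拟合
    return np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)

x_tr, y_tr = make_data(100)
x_te, y_te = make_data(100)
for deg, name in [(4, "Model A (degree 4)"), (6, "Model B (degree 6)")]:
    print(name, "test MSE:", test_mse(deg, x_tr, y_tr, x_te, y_te))
# 100 个训练点足够多,6 阶拟合的 x^6 系数通常非常小,
# 测试误差一般低于 4 阶模型,与上面的答案一致。
```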
5、Input-dependent noise in regression

Ordinary least-squares regression is equivalent to assuming that each data point is generated according to a linear function of the input plus zero-mean, constant-variance Gaussian noise. In many systems, however, the noise variance is itself a positive linear function of the input (which is assumed to be non-negative, i.e., x ≥ 0).

a) Which of the following families of probability models correctly describes this situation in the univariate case? (Hint: only one of them does.)

(iii) is correct:

$$p(y|x) = \frac{1}{\sqrt{2\pi\sigma^2 x}}\,\exp\left(-\frac{(y - w_0 - w_1 x)^2}{2\sigma^2 x}\right)$$

In a Gaussian distribution over y, the variance is determined by the coefficient of $y^2$; so by replacing $\sigma^2$ by $\sigma^2 x$, we get a variance that increases linearly with x. (Note also the change to the normalization "constant.") (i) has quadratic dependence on x; (ii) does not change the variance at all, it just renames $w_1$.

b) Circle the plots in Figure 1 that could plausibly have been generated by some instance of the model family(ies) you chose.

(ii) and (iii). (Note that (iii) works for $\sigma^2 \approx 20$.) (i) exhibits a large variance at x = 0, and the variance appears independent of x.

c) True/False: Regression with input-dependent noise gives the same solution as ordinary regression for an infinite data set generated according to the corresponding model.

True. In both cases the algorithm will recover the true underlying model.

d) For the model you chose in part (a), write down the derivative of the negative log likelihood with respect to $w_1$.

The negative log likelihood is

$$L = \sum_{i=1}^{n}\left[\frac{(y_i - w_0 - w_1 x_i)^2}{2\sigma^2 x_i} + \frac{1}{2}\log\left(2\pi\sigma^2 x_i\right)\right]$$

and the derivative is

$$\frac{\partial L}{\partial w_1} = -\sum_{i=1}^{n}\frac{y_i - w_0 - w_1 x_i}{\sigma^2}$$

Note that for lines through the origin ($w_0 = 0$) the optimal solution has the particularly simple form $\hat{w}_1 = \bar{y}/\bar{x}$. Since the likelihood of multiple independent data points is a product of probabilities, the log likelihood turns the product into a sum, and the derivative is taken term by term.
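下面用一段代码对 d) 小题的结论做数值检验(补充示例,数据按模型 (iii) 随机生成,参数为任意假设):过原点时 $\hat{w}_1 = \bar{y}/\bar{x}$。

```python
import numpy as np

rng = np.random.default_rng(6)
n, w1_true, sigma2 = 5000, 2.0, 0.5

x = rng.uniform(0.1, 4.0, size=n)
# 按模型 (iii) 生成数据:噪声方差随 x 线性增长,直线过原点(w0 = 0)
y = w1_true * x + rng.normal(0, np.sqrt(sigma2 * x))

# 模型 (iii) 在 w0 = 0 时的极大似然解:w1 = mean(y) / mean(x)
print(y.mean() / x.mean())          # 接近 2.0

# 对比普通最小二乘(所有点等权):也是一致估计,但不是同一个估计量
print((x @ y) / (x @ x))
```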
43、s.五、分类1.产生式模型VS.判别式模型(a) Your billi on aire friend n eeds your help. She n eeds to classify job app licatio ns into good/bad categories, and also to detect job app lica nts who lie in their app licati ons using den sity estimati on to detect outliers. To meet these n eeds, do you recomme nd using a
44、discrim in ative or gen erative classifier? Why?产生式模型因为要估计密度P x|y (b) Your billi on aire friend also wan ts to classify software app licati ons to detect bug-prone app licatio ns using features of the source code. This p ilot p roject only has a few app licati ons to be used as training data, though
45、. To create the most accurate classifier, do you recomme nd using a discrim in ative or gen erative classifier? Why?判别式模型样本数较少,通常用判别式模型直接分类效果会好些(d) Finally, your billionaire friend also wants to classify companies to decide which one to acquire. This p roject has lots of training data based on sever
2、logistic 回归

[Figure 2: Log-probability of labels as a function of regularization parameter C,原图仅存坐标轴刻度]

Here we use a logistic regression model to solve a classification problem. In Figure 2, we have plotted the mean log-probability of labels in the training and test sets after having trained the classifier with quadratic regularization penalty and different values of the regularization parameter C.

1、In training a logistic regression model by maximizing the likelihood of the labels given the inputs we have multiple locally optimal solutions. (F)

Answer: The log-probability of labels given examples implied by the logistic regression model is a concave (convex down) function with respect to the weights. The (only) locally optimal solution is also globally optimal.

2、A stochastic gradient algorithm for training logistic regression models with a fixed learning rate will find the optimal setting of the weights exactly. (F)

Answer: A fixed learning rate means that we are always taking a finite step towards improving the log-probability of any single training example in the update equation. Unless the examples are somehow "aligned", we continue jumping from side to side of the optimal solution, and will not be able to get arbitrarily close to it. The learning rate has to approach zero in the course of the updates for the weights to converge.

3、The average log-probability of training labels as in Figure 2 can never increase as we increase C. (T)

Stronger regularization means more constraints on the solution and thus the (average) log-probability of the training examples can only get worse.

4、Explain why in Figure 2 the test log-probability of labels decreases for large values of C.

As C increases, we give more weight to constraining the predictor, and thus give less flexibility to fitting the training set. The increased regularization guarantees that the test performance gets closer to the training performance, but as we over-constrain our allowed predictors, we are not able to fit the training set at all, and although the test performance is now very close to the training performance, both are low.

5、The log-probability of labels in the test set would decrease for large values of C even if we had a large number of training examples. (T)

The above argument still holds, but the value of C for which we will observe such a decrease will scale up with the number of examples.

6、Adding a quadratic regularization penalty for the parameters when estimating a logistic regression model ensures that some of the parameters (weights associated with the components of the input vectors) vanish. (F)

A regularization penalty for feature selection must have non-zero derivative at zero. Otherwise, the regularization has no effect at zero, and weight will tend to be slightly non-zero, even when this does not improve the log-probabilities by much.
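下面给出一个与图 2 讨论相对应的最小实验草稿(补充示例,数据由一个真实 logistic 模型随机生成,优化用朴素梯度上升实现):训练集 log 概率随 C 增大单调不升(第 3 题),且二次惩罚只会收缩权重而不会使其恰好为零(第 6 题)。

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.0, 0.5, 0.0])
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

def fit(C, steps=5000, lr=0.1):
    """用梯度上升最大化 平均log似然 - C*||w||^2。"""
    w = np.zeros(d)
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))
        w += lr * (X.T @ (y - p) / n - 2 * C * w)
    p = np.clip(1 / (1 + np.exp(-X @ w)), 1e-12, 1 - 1e-12)
    train_ll = np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return w, train_ll

for C in [0.0, 0.01, 0.1, 1.0]:
    w, ll = fit(C)
    print(f"C={C}: 训练集平均log概率={ll:.3f}, |w|={np.abs(w).round(3)}")
# 训练集 log 概率随 C 增大不增(第 3 题);权重被收缩但都不会恰好为零(第 6 题)。
```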
3、正则化的 Logistic 回归

This problem we will refer to the binary classification task depicted in Figure 1(a), which we attempt to solve with the simple linear logistic regression model

$$P(y = 1|x, w_1, w_2) = g(w_1 x_1 + w_2 x_2) = \frac{1}{1 + \exp(-w_1 x_1 - w_2 x_2)}$$

(for simplicity we do not use the bias parameter $w_0$). The training data can be separated with zero training error; see line L1 in Figure 1(b) for instance.

[(a) The 2-dimensional data set used in Problem 2. (b) The points can be separated by L1 (solid line); possible other decision boundaries are shown by L2, L3, L4.]

Consider a regularization approach where we try to maximize

$$\sum_{i=1}^{n} \log P(y_i|x_i, w_1, w_2) - C w_2^2$$

for large C. Note that only $w_2$ is penalized. We'd like to know which of the lines in Figure 1(b) could arise as a result of such regularization. For each potential line L2, L3 or L4 determine whether it can result from regularizing $w_2$. If not, explain very briefly why not.

L2: No. When we regularize $w_2$, the resulting boundary can rely less on the value of $x_2$ and therefore becomes more vertical. L2 here seems to be more horizontal than the unregularized solution, so it cannot come as a result of penalizing $w_2$.

L3: Yes. Here $w_2^2$ is small relative to $w_1^2$ (as evidenced by the high slope), and even though it would assign a rather low log-probability to the observed labels, this can be outweighed by the penalty on $w_2$ when C is large.
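针对本题"只惩罚 $w_2$"的设定,下面给出一个玩具实验草稿(补充示例,数据与超参数均为随意构造):C 增大时 $w_2 \to 0$,决策边界 $w_1 x_1 + w_2 x_2 = 0$ 趋于竖直,与上面对 L2、L3 的分析一致。

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100
X = rng.normal(size=(n, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)   # 可分的玩具标签

def fit(C, steps=20_000, lr=0.05):
    """最大化 sum(log似然) - C*w2^2,只惩罚 w2。"""
    w = np.zeros(2)
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))
        grad = X.T @ (y - p)
        grad[1] -= 2 * C * w[1]          # 惩罚项只作用于 w2
        w += lr * grad / n
    return w

for C in [0.0, 100.0]:
    w = fit(C)
    slope = -w[0] / w[1]                  # 决策边界 x2 = slope * x1
    print(f"C={C}: w={w.round(3)}, 边界斜率={slope:.1f}")
# C 增大时 w2 -> 0,|斜率| 变大,边界趋于竖直,
# 因此更竖直的 L3 可能由此产生,而更水平的 L2 不可能。
```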