
1 Give the definitions or your comprehensions of the following terms. (12')
1.1 The inductive learning hypothesis (P17)
1.2 Overfitting (P49)
1.4 Consistent learner (P148)

2 Give brief answers to the following questions. (15')
2.2 If the size of a version space is |VS|, in general what is the smallest number of queries that may be required by a concept learner using the optimal query strategy to perfectly learn the target concept? (P27)
2.3 In general, decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances; what expression does the following decision tree correspond to?

3 Give the explanation of inductive bias, and list the inductive bias of the CANDIDATE-ELIMINATION algorithm, of decision tree learning (ID3), and of the BACKPROPAGATION algorithm. (10')

4 How to solve overfitting in decision trees and neural networks? (10')
Solution:
Decision tree: stop growing the tree early; post-pruning.
Neural network: weight decay; use of a validation set.

5 Prove that the LMS weight update rule $w_i \leftarrow w_i + \eta\,(V_{train}(b) - \hat{V}(b))\,x_i$ performs a gradient descent to minimize the squared error. In particular, define the squared error E as in the text. Now calculate the derivative of E with respect to the weight $w_i$, assuming that $\hat{V}(b)$ is a linear function as defined in the text. Gradient descent is achieved by updating each weight in proportion to $-\partial E/\partial w_i$. Therefore, you must show that the LMS training rule alters weights in this proportion for each training example it encounters. ($E \equiv \sum_{\langle b, V_{train}(b)\rangle \in \text{training examples}} (V_{train}(b) - \hat{V}(b))^2$) (8')
Solution:
As $V_{train}(b) \leftarrow \hat{V}(Successor(b))$, we get $E = \sum (V_{train}(b) - \hat{V}(b))^2$, with $\hat{V}(b) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 x_5 + w_6 x_6$.
$\partial E/\partial w_i = \sum 2\,(V_{train}(b) - \hat{V}(b))\,\partial(V_{train}(b) - \hat{V}(b))/\partial w_i = -2\sum (V_{train}(b) - \hat{V}(b))\,x_i$.
The LMS rule updates $w_i \leftarrow w_i + \eta\,(V_{train}(b) - \hat{V}(b))\,x_i$, so each update equals $-\frac{\eta}{2}\,\partial E/\partial w_i$ for that example.
Therefore gradient descent is achieved by updating each weight in proportion to $-\partial E/\partial w_i$; the LMS rule alters weights in this proportion for each training example it encounters.
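
To make the rule concrete, here is a minimal Python sketch of one LMS update for the linear checkers evaluation function V̂(b) = w0 + w1x1 + ... + w6x6; the learning rate, weights and feature values are illustrative placeholders, not values from the exam.

# Minimal sketch of the LMS / delta update for a linear evaluation function
# V_hat(b) = w0 + w1*x1 + ... + w6*x6 (board b described by features x1..x6).
# eta, the weights and the example below are illustrative, not from the exam.

def v_hat(w, x):
    """Linear value estimate: w[0] + sum_i w[i] * x[i-1]."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def lms_update(w, x, v_train, eta=0.1):
    """One LMS step: w_i <- w_i + eta * (V_train(b) - V_hat(b)) * x_i."""
    error = v_train - v_hat(w, x)
    w[0] += eta * error                     # x0 is implicitly 1
    for i, xi in enumerate(x, start=1):
        w[i] += eta * error * xi
    return w

w = [0.0] * 7                               # w0..w6
x = [4, 4, 1, 0, 1, 0]                      # example board features (made up)
w = lms_update(w, x, v_train=100.0)         # pushes V_hat(b) toward V_train(b)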

6 True or false: if decision tree D2 is an elaboration of tree D1, then D1 is more-general-than D2. Assume D1 and D2 are decision trees representing arbitrary boolean functions, and that D2 is an elaboration of D1 if ID3 could extend D1 to D2. If true, give a proof; if false, a counterexample.
(Definition: Let $h_j$ and $h_k$ be boolean-valued functions defined over X. Then $h_j$ is more_general_than_or_equal_to $h_k$ (written $h_j \ge_g h_k$) if and only if $(\forall x \in X)\,[(h_k(x)=1) \rightarrow (h_j(x)=1)]$; and $h_j >_g h_k$ iff $(h_j \ge_g h_k) \wedge \neg(h_k \ge_g h_j)$.) (10')
The hypothesis is false. One counterexample is A XOR B: if A != B the training examples are all positive, while if A = B the training examples are all negative; then, using ID3 to extend D1, the new tree D2 will be equivalent to D1, i.e., D2 is equal to D1.

7 Design a two-input perceptron that implements the boolean function $A \wedge \neg B$. Design a two-layer network of perceptrons that implements A XOR B. (10')
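
The exam gives no worked answer for this question; the sketch below shows one possible weight assignment (assuming 0/1 inputs and a threshold at 0), so it is an illustration rather than the official solution.

# One possible set of weights (a sketch, not the official answer) for a
# perceptron computing A AND (NOT B), and a two-layer network for A XOR B.
# The unit fires (outputs 1) when w0 + w1*A + w2*B > 0; 0 is used instead of
# -1 so that the hidden outputs can be fed directly into the next layer.

def perceptron(w0, w1, w2, a, b):
    """Threshold unit: 1 if w0 + w1*a + w2*b > 0, else 0."""
    return 1 if w0 + w1 * a + w2 * b > 0 else 0

def a_and_not_b(a, b):
    # w0 = -0.5, w1 = 1, w2 = -1 fires only for (a, b) = (1, 0)
    return perceptron(-0.5, 1.0, -1.0, a, b)

def a_xor_b(a, b):
    # XOR = (A AND NOT B) OR (NOT A AND B): two hidden units plus an OR unit
    h1 = perceptron(-0.5, 1.0, -1.0, a, b)     # A AND NOT B
    h2 = perceptron(-0.5, -1.0, 1.0, a, b)     # NOT A AND B
    return perceptron(-0.5, 1.0, 1.0, h1, h2)  # OR of the hidden outputs

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, a_and_not_b(a, b), a_xor_b(a, b))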

8 Suppose a hypothesis space contains three hypotheses $h_1$, $h_2$, $h_3$, and the posterior probabilities of these hypotheses given the training data are 0.4, 0.3 and 0.3 respectively. If a new instance x is encountered, which is classified positive by $h_1$ but negative by $h_2$ and $h_3$, give the result and the detailed classification course of the Bayes optimal classifier. (10') (P125)
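
The exam sheet only points to P125 for the answer; as a sketch, the standard Bayes optimal computation under the stated posteriors (written in LaTeX notation) is:

P(\oplus \mid D) = \sum_{h_i : h_i(x)=\oplus} P(h_i \mid D) = P(h_1 \mid D) = 0.4
P(\ominus \mid D) = \sum_{h_i : h_i(x)=\ominus} P(h_i \mid D) = P(h_2 \mid D) + P(h_3 \mid D) = 0.3 + 0.3 = 0.6
\arg\max_{v \in \{\oplus, \ominus\}} P(v \mid D) = \ominus

so the Bayes optimal classifier labels the new instance negative, even though the single most probable hypothesis $h_1$ labels it positive.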

9 Suppose S is a collection of training-example days described by attributes including Humidity, which can have the values High or Normal. Assume S is a collection containing 10 examples, 7+, 3-. Of these 10 examples, suppose 3 of the positive and 2 of the negative examples have Humidity = High, and the remainder have Humidity = Normal. Please calculate the information gain due to sorting the original 10 examples by the attribute Humidity. (log2 1=0, log2 2=1, log2 3=1.58, log2 4=2, log2 5=2.32, log2 6=2.58, log2 7=2.8, log2 8=3, log2 9=3.16, log2 10=3.32) (5')
Solution:
Here we denote S = {7+, 3-}; then Entropy(7+, 3-) = $-\frac{7}{10}\log_2\frac{7}{10} - \frac{3}{10}\log_2\frac{3}{10}$ = 0.886.
Gain(S, Humidity) = Entropy(S) $- \sum_{v \in Values(Humidity)} \frac{|S_v|}{|S|}$ Entropy($S_v$), with Values(Humidity) = {High, Normal}.
$S_{High}$ = {s in S | Humidity(s) = High} = {3+, 2-}, so Entropy($S_{High}$) = $-\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5}$ = 0.972, with $|S_{High}|$ = 5.
$S_{Normal}$ = {4+, 1-}, so Entropy($S_{Normal}$) = $-\frac{4}{5}\log_2\frac{4}{5} - \frac{1}{5}\log_2\frac{1}{5}$ = 0.72, with $|S_{Normal}|$ = 5.
Thus Gain(S, Humidity) = 0.886 - (5/10)*0.972 - (5/10)*0.72 = 0.04.
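
A quick computational check of the numbers above; the split counts are exactly those stated in the question, and with exact logarithms the values come out slightly lower than with the rounded log table.

# Re-computing Entropy(S) and Gain(S, Humidity) for S = {7+, 3-} with the
# split High = {3+, 2-}, Normal = {4+, 1-}, as in the question above.
from math import log2

def entropy(pos, neg):
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * log2(p)
    return result

e_s = entropy(7, 3)                       # ~0.881 exactly (0.886 with rounded logs)
gain = e_s - (5 / 10) * entropy(3, 2) - (5 / 10) * entropy(4, 1)
print(round(e_s, 3), round(gain, 3))      # gain ~0.035 exactly (0.04 with rounded logs)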

10 Finish the following algorithms. (10')
(1) GRADIENT-DESCENT(training_examples, η)
Each training example is a pair of the form <x, t>, where x is the vector of input values and t is the target output value. η is the learning rate (e.g., 0.05).
- Initialize each wi to some small random value
- Until the termination condition is met, Do
  - Initialize each Δwi to zero
  - For each <x, t> in training_examples, Do
    - Input the instance x to the unit and compute the output o
    - For each linear unit weight wi, Do ______
  - For each linear unit weight wi, Do ______
(2) FIND-S Algorithm
- Initialize h to the most specific hypothesis in H
- For each positive training instance x
  - For each attribute constraint ai in h
    - If ______ Then do nothing
    - Else replace ai in h by the next more general constraint that is satisfied by x
- Output hypothesis h
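
As a hedged illustration of the FIND-S loop above (the blank "If ..." test is filled with the standard constraint-satisfaction check), here is a runnable sketch on the EnjoySport-style examples that appear later in this document.

# Minimal FIND-S sketch for conjunctive hypotheses over attribute tuples.
# 'None' stands for the empty constraint (no value accepted), '?' for any value.

def find_s(examples):
    """examples: list of (attribute_tuple, label) with label True for positive."""
    n = len(examples[0][0])
    h = [None] * n                       # most specific hypothesis in H
    for x, positive in examples:
        if not positive:
            continue                     # FIND-S ignores negative examples
        for i, value in enumerate(x):
            if h[i] is None:             # first positive example: copy its value
                h[i] = value
            elif h[i] != value:          # constraint not satisfied by x:
                h[i] = '?'               # generalize to the next more general one
    return h

examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
print(find_s(examples))   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']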

1. What is the definition of a learning problem? (5) Use "a checkers learning problem" as an example to state how to design a learning system. (15)
Answer:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Example: A checkers learning problem:
T: play checkers (1)
P: percentage of games won (1)
E: opportunity to play in a tournament against itself (1)
To design a learning system:
Step 1: Choosing the Training Experience
A checkers learning problem:
Task T: playing checkers
Performance measure P: percent of games won in the world tournament
Training experience E: games played against itself
In order to complete the design of the learning system, we must now choose:
1. the exact type of knowledge to be learned
2. a representation for this target knowledge
3. a learning mechanism
Step 2: Choosing the Target Function
1. if b is a final board state that is won, then V(b) = 100
2. if b is a final board state that is lost, then V(b) = -100
3. if b is a final board state that is drawn, then V(b) = 0
4. if b is not a final state in the game, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game (assuming the opponent plays optimally, as well).
Step 3: Choosing a Representation for the Target Function (4)
x1: the number of black pieces on the board
x2: the number of red pieces on the board
x3: the number of black kings on the board
x4: the number of red kings on the board
x5: the number of black pieces threatened by red (i.e., which can be captured on red's next turn)
x6: the number of red pieces threatened by black
Thus, our learning program will represent $\hat{V}(b)$ as a linear function of the form
$\hat{V}(b) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 x_5 + w_6 x_6$,
where $w_0$ through $w_6$ are numerical coefficients, or weights, to be chosen by the learning algorithm. Learned values for the weights $w_1$ through $w_6$ will determine the relative importance of the various board features in determining the value of the board, whereas the weight $w_0$ will provide an additive constant to the board value.
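
To make Step 3 concrete, here is a small sketch of the linear evaluation function described above; the weight values and the example board features are illustrative placeholders, not learned values.

# Sketch of the linear board evaluation V_hat(b) = w0 + w1*x1 + ... + w6*x6
# from Step 3 above. Weights and the feature vector are placeholders only.

FEATURE_NAMES = ("black_pieces", "red_pieces", "black_kings",
                 "red_kings", "black_threatened", "red_threatened")

def v_hat(weights, features):
    """weights: (w0, w1, ..., w6); features: (x1, ..., x6)."""
    w0, rest = weights[0], weights[1:]
    return w0 + sum(w * x for w, x in zip(rest, features))

weights = (0.0, 1.0, -1.0, 3.0, -3.0, -0.5, 0.5)   # illustrative only
board = (12, 12, 0, 0, 1, 2)                       # x1..x6 for some position
print(dict(zip(FEATURE_NAMES, board)), v_hat(weights, board))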

2. Answer: Find-S & Find-G (Candidate-Elimination):
Step 1: Initialize S to the most specific hypothesis in H. (1)
S0: <∅, ∅, ∅, ∅, ∅, ∅>
Initialize G to the most general hypothesis in H.
G0: <?, ?, ?, ?, ?, ?>
Step 2: The first example is <Sunny, Warm, Normal, Strong, Warm, Same, +>. (3)
S1: <Sunny, Warm, Normal, Strong, Warm, Same>
G1: <?, ?, ?, ?, ?, ?>
Step 3: The second example is <Sunny, Warm, High, Strong, Warm, Same, +>. (3)
S2: <Sunny, Warm, ?, Strong, Warm, Same>
G2: <?, ?, ?, ?, ?, ?>
Step 4: The third example is <Rainy, Cold, High, Strong, Warm, Change, ->.
S3: <Sunny, Warm, ?, Strong, Warm, Same>
G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
Step 5: The fourth example is <Sunny, Warm, High, Strong, Cool, Change, +>.
S4: <Sunny, Warm, ?, Strong, ?, ?>
G4: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
Finally, all the hypotheses in the version space are: (2)
<Sunny, Warm, ?, Strong, ?, ?>, <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>, <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>.
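
As a cross-check of the six hypotheses listed above, the following brute-force sketch enumerates candidate hypotheses over the observed attribute values and keeps the ones consistent with the four training examples; it is a check of the version space, not an implementation of CANDIDATE-ELIMINATION.

# Brute-force version space for the trace above: keep every hypothesis that is
# consistent with all four training examples.
from itertools import product

examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]

def matches(h, x):
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

# Candidate constraints per attribute: '?' or a value seen in the data.
values = [sorted({x[i] for x, _ in examples}) + ['?'] for i in range(6)]
version_space = [h for h in product(*values)
                 if all(matches(h, x) == label for x, label in examples)]
for h in version_space:
    print(h)   # exactly the six hypotheses listed above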

3. Answer: Let flog(x) = -x*log2(x) - (1-x)*log2(1-x).
Step 1, choose the root node:
entropy_all = flog(4/10) = 0.971
gain_outlook = entropy_all - 0.3*flog(1/3) - 0.3*flog(1) - 0.4*flog(1/2) = 0.296 (1)
gain_temperature = entropy_all - 0.3*flog(1/3) - 0.3*flog(1/3) - 0.4*flog(1/2) = 0.02 (1)
gain_humidity = entropy_all - 0.5*flog(2/5) - 0.5*flog(1/5) = 0.125 (1)
gain_wind = entropy_all - 0.6*flog(5/6) - 0.4*flog(1/4) = 0.256 (1)
The root node is "outlook".
Step 2, choose the second-level nodes. For the "sunny" branch (humidity or temperature):
entropy_sunny = flog(1/3) = 0.918 (1)
sunny_gain_wind = entropy_sunny - (2/3)*flog(0.5) - (1/3)*flog(1) = 0.252 (1)
sunny_gain_humidity = entropy_sunny - (2/3)*flog(1) - (1/3)*flog(1) = 0.918 (1)
sunny_gain_temperature = entropy_sunny - (2/3)*flog(1) - (1/3)*flog(1) = 0.918 (1)
Choose humidity or temperature. (1)
For the "rain" branch (wind):
entropy_rain = flog(1/2) = 1
rain_gain_wind = entropy_rain - (1/2)*flog(1) - (1/2)*flog(1) = 1 (1)
rain_gain_humidity = entropy_rain - (1/2)*flog(1/2) - (1/2)*flog(1/2) = 0 (1)
rain_gain_temperature = entropy_rain - (1/4)*flog(1) - (3/4)*flog(1/3) = 0.311 (1)
Choose wind. (1)
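
The hand calculation above is exactly what ID3 repeats at every node; the compact recursive sketch below illustrates that procedure. It is a generic illustration: the 10-example weather table used in this answer is not reproduced in the document, so the tiny dataset at the end is made up.

# Compact recursive ID3 sketch: at each node pick the attribute with the
# largest information gain, exactly as in the hand calculation above.
from math import log2
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def gain(rows, labels, attr):
    total = len(labels)
    split = {}
    for row, label in zip(rows, labels):
        split.setdefault(row[attr], []).append(label)
    return entropy(labels) - sum(len(ls) / total * entropy(ls)
                                 for ls in split.values())

def id3(rows, labels, attrs):
    if len(set(labels)) == 1:
        return labels[0]                              # pure leaf
    if not attrs:
        return Counter(labels).most_common(1)[0][0]   # majority leaf
    best = max(attrs, key=lambda a: gain(rows, labels, a))
    tree = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[value] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                          [a for a in attrs if a != best])
    return (best, tree)

# Made-up three-example dataset, just to show the call convention.
rows = [{"outlook": "sunny", "wind": "weak"},
        {"outlook": "sunny", "wind": "strong"},
        {"outlook": "rain",  "wind": "strong"}]
labels = ["no", "yes", "yes"]
print(id3(rows, labels, ["outlook", "wind"]))   # ('wind', {'weak': 'no', 'strong': 'yes'})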

4. Answer:
A: The primitive neural units are: the perceptron, the linear unit and the sigmoid unit. (3)
Perceptron: (2) A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs 1 if the result is greater than some threshold and -1 otherwise. More precisely, given inputs $x_1$ through $x_n$, the output $o(x_1,\ldots,x_n)$ computed by the perceptron is 1 if $w_0 + w_1 x_1 + \cdots + w_n x_n > 0$ and -1 otherwise. Sometimes we write the perceptron function as $o(\vec{x}) = \mathrm{sgn}(\vec{w}\cdot\vec{x})$.
Linear unit: (2) a linear unit is a unit for which the output o is given by $o = \vec{w}\cdot\vec{x}$. Thus, a linear unit corresponds to the first stage of a perceptron, without the threshold.
Sigmoid unit: (2) Like the perceptron, the sigmoid unit first computes a linear combination of its inputs, then applies a threshold to the result. In the case of the sigmoid unit, however, the threshold output is a continuous function of its input. More precisely, the sigmoid unit computes its output o as $o = \sigma(\vec{w}\cdot\vec{x})$, where $\sigma(y) = \frac{1}{1+e^{-y}}$.
B: (Because of a printing error in the question, either the perceptron training rule or the delta rule is acceptable; the delta rule is given here.)
The derivation is:
$\frac{\partial E}{\partial w_i} = \frac{\partial}{\partial w_i}\,\frac{1}{2}\sum_d (t_d - o_d)^2 = \frac{1}{2}\sum_d 2\,(t_d - o_d)\,\frac{\partial}{\partial w_i}(t_d - o_d) = \sum_d (t_d - o_d)(-x_{id})$,
and hence the weight update is $\Delta w_i = \eta \sum_d (t_d - o_d)\,x_{id}$.
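
A small sketch tying the two parts of this answer together: a sigmoid squashing function as defined in part A, and one batch step of the delta rule whose gradient was derived in part B; the data and learning rate are made-up placeholders.

# Delta rule as derived above for a single linear unit:
# E = 1/2 * sum_d (t_d - o_d)^2,  dE/dw_i = -sum_d (t_d - o_d) * x_id,
# so the batch update is w_i <- w_i + eta * sum_d (t_d - o_d) * x_id.
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))             # sigmoid squashing function

def linear_output(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))   # x[0] is assumed to be 1 (bias)

def delta_rule_step(w, data, eta=0.05):
    """One batch update of the delta rule; data is a list of (x, t) pairs."""
    grads = [0.0] * len(w)
    for x, t in data:
        error = t - linear_output(w, x)
        for i, xi in enumerate(x):
            grads[i] += error * xi                # accumulates sum_d (t_d - o_d) x_id
    return [wi + eta * g for wi, g in zip(w, grads)]

data = [((1.0, 0.5, -1.0), 1.0), ((1.0, -0.3, 0.8), -1.0)]   # made-up examples
w = delta_rule_step([0.0, 0.0, 0.0], data)
print(w, sigmoid(linear_output(w, (1.0, 0.5, -1.0))))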

5. Answer:
P(no) = 5/14 (1), P(yes) = 9/14 (1)
P(sunny|no) = 3/5 (1), P(cool|no) = 1/5 (1), P(high|no) = 4/5 (1), P(strong|no) = 3/5 (1)
P(no|new instance) ∝ P(no)*P(sunny|no)*P(cool|no)*P(high|no)*P(strong|no) = (5/14)*(3/5)*(1/5)*(4/5)*(3/5) = 0.02057 = 2.057*10^-2 (2)
P(sunny|yes) = 2/9, P(cool|yes) = 3/9, P(high|yes) = 3/9, P(strong|yes) = 3/9
P(yes|new instance) ∝ P(yes)*P(sunny|yes)*P(cool|yes)*P(high|yes)*P(strong|yes) = (9/14)*(2/9)*(3/9)*(3/9)*(3/9) = 0.005291 = 5.291*10^-3 (2)
ANSWER: NO
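
The same numbers can be reproduced programmatically; the sketch below counts over the 14-example PlayTennis table (which appears later in this document) with no smoothing, matching the hand calculation.

# Naive Bayes sketch reproducing the products above from the 14 PlayTennis
# examples. Rows are (Outlook, Temperature, Humidity, Wind, PlayTennis).
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),     ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),  ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]

def score(label, instance):
    rows = [r for r in DATA if r[-1] == label]
    p = len(rows) / len(DATA)                    # prior, e.g. P(no) = 5/14
    for i, value in enumerate(instance):
        p *= sum(1 for r in rows if r[i] == value) / len(rows)
    return p

new = ("Sunny", "Cool", "High", "Strong")
print(score("No", new), score("Yes", new))       # ~0.0206 vs ~0.0053  ->  "No"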

6. Answer:
INDUCTIVE BIAS: (8)
Consider a concept learning algorithm L for the set of instances X. Let c be an arbitrary concept defined over X, and let $D_c = \{\langle x, c(x)\rangle\}$ be an arbitrary set of training examples of c. Let $L(x_i, D_c)$ denote the classification assigned to the instance $x_i$ by L after training on the data $D_c$. The inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training examples $D_c$:
$(\forall x_i \in X)\,[(B \wedge D_c \wedge x_i) \vdash L(x_i, D_c)]$.
The futility of bias-free learning: (7)
A learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances. In fact, the only reason that the learner was able to generalize beyond the observed training examples is that it was biased by the inductive bias. Unfortunately, the only instances that will produce a unanimous vote are the previously observed training examples. For all the other instances, taking a vote will be futile: each unobserved instance will be classified positive by precisely half the hypotheses in the version space and will be classified negative by the other half.

1 In the EnjoySport learning task, every example day is represented by 6 attributes. Given that the attribute Sky has three possible values, and that AirTemp, Humidity, Wind, Water and Forecast each have two possible values, explain why the size of the hypothesis space is 973. How would the number of possible instances and possible hypotheses increase with the addition of one attribute A that takes on K possible values?
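
A worked count behind the 973 figure (a sketch of the standard argument in LaTeX notation; it is not part of the exam text):

% Distinct instances: one value per attribute.
|X| = 3 \cdot 2 \cdot 2 \cdot 2 \cdot 2 \cdot 2 = 96
% Semantically distinct hypotheses: each attribute may take a specific value or '?',
% plus the single hypothesis containing \emptyset (which classifies everything negative).
|H| = 1 + 4 \cdot 3 \cdot 3 \cdot 3 \cdot 3 \cdot 3 = 973
% Adding attribute A with K values multiplies the instance count by K, and the
% semantically distinct hypotheses (beyond the single all-\emptyset one) by (K+1):
|X'| = 96K, \qquad |H'| = 1 + 972\,(K+1)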

2 Write the algorithm of Candidate-Elimination using version spaces. Assume G is the set of maximally general hypotheses in hypothesis space H, and S is the set of maximally specific hypotheses.

3 Consider the following set of training examples for EnjoySport:

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        sunny  warm     normal    strong  warm   same      yes
2        sunny  warm     high      strong  warm   same      yes
3        rainy  cold     high      strong  warm   change    no
4        sunny  warm     high      strong  cool   change    yes
5        sunny  warm     normal    weak    warm   same      no

(a) What is the entropy of this collection of training examples with respect to the target function "classification"?
(b) According to the 5 training examples, compute the decision tree that would be learned by ID3, and show the decision tree. (log2 3 = 1.585, log2 5 = 2.322)

4 Give several approaches to avoid overfitting in decision tree learning. How to determine the correct final tree size?

5 Write the BACKPROPAGATION algorithm for a feedforward network containing two layers of sigmoid units.

6 Explain the Maximum A Posteriori (MAP) hypothesis.

7 Use the Naive Bayes Classifier to classify the new instance:
<Outlook=sunny, Temperature=cool, Humidity=high, Wind=strong>
Our task is to predict the target value (yes or no) of the target concept PlayTennis for this new instance. The table below provides a set of 14 training examples of the target concept.

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

8 (Question Eight) Give the definition of the three types of fitness functions in genetic algorithms.

Question one:
Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. (Give an example, e.g., a navigation system or checkers.)

Question two:
Initialize: G = {<?, ?, ?, ?, ?, ?>}, S = {<∅, ∅, ∅, ∅, ∅, ∅>}
Step 1: G = {<?, ?, ?, ?, ?, ?>}, S = {<sunny, warm, normal, strong, warm, same>}
Step 2: coming one positive instance (example 2): G = {<?, ?, ?, ?, ?, ?>}, S = {<sunny, warm, ?, strong, warm, same>}
Step 3: coming one negative instance (example 3): G = {<Sunny, ?, ?, ?, ?, ?>, <?, warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, same>}, S = {<sunny, warm, ?, strong, warm, same>}
Step 4: coming one positive instance (example 4): S = {<sunny, warm, ?, strong, ?, ?>}, G = {<Sunny, ?, ?, ?, ?, ?>, <?, warm, ?, ?, ?, ?>}

Question three:
(a) Entropy(S) = -(3/5)log2(3/5) - (2/5)log2(2/5) = 0.971
(b) Gain(S, sky) = Entropy(S) - [(4/5) Entropy(S_sunny) + (1/5) Entropy(S_rainy)] = 0.322
Gain(S, AirTemp) = Gain(S, wind) = Gain(S, sky) = 0.322
Gain(S, Humidity) = Gain(S, Forecast) = 0.02
Gain(S, water) = 0.171
Choose any one of AirTemp, wind and sky as the top node. The decision tree is as follows (if sky is chosen as the top node): (decision tree figure omitted; with sky at the root, the sunny branch is further split on wind, with weak leading to No.)

Question Four
Answer: Inductive bias: some prior assumption about the target concept made by the learner in order to have a basis for classifying unseen instances.
Suppose L is a machine learning algorithm, $x_i$ is an instance, and $D_c$ is a set of training examples. $L(x_i, D_c)$ denotes the classification assigned to $x_i$ by L after training on $D_c$. Then the inductive bias is a minimal set of assertions B such that, given an arbitrary target concept C and set of training examples $D_c$:
$(\forall x_i)\,[(B \wedge D_c \wedge x_i) \vdash L(x_i, D_c)]$.
Candidate-Elimination: the target concept is contained in the given hypothesis space H, and the training examples are all correct.
ID3: (a) smaller trees are preferred over larger trees; (b) trees that place high-information-gain attributes close to the root are preferred over those that do not.
BP (BACKPROPAGATION): smooth interpolation between data points.

Question Five
Answer: In naive Bayes classification, we assume that all attributes are independent given the target value, while a Bayesian belief network specifies a set of conditional independence assumptions together with a set of conditional probability distributions.

Question Six: Stochastic gradient descent algorithm.
GRADIENT-DESCENT(training_examples, η)
Each training example is a pair of the form <x, t>, where x is the vector of input values and t is the target output value. η is the learning rate (e.g., 0.05).
- Initialize each wi to some small random value
- Until the termination condition is met, Do
  - Initialize each Δwi to zero
  - For each <x, t> in training_examples, Do
    - Input the instance x to the unit and compute the output o
    - For each linear unit weight wi, Do: Δwi ← Δwi + η(t - o)xi
  - For each linear unit weight wi, Do: wi ← wi + Δwi
(In the stochastic version, the Δwi accumulation is dropped and each weight is updated directly after every training example: wi ← wi + η(t - o)xi.)
