Bayesian Optimization with Unknown Search Space

Huong Ha, Santu Rana, Sunil Gupta, Thanh Nguyen, Hung Tran-The, Svetha Venkatesh
Applied Artificial Intelligence Institute (A2I2), Deakin University, Geelong, Australia
{huong.ha, santu.rana, sunil.gupta, thanhnt, hung.tranthe, svetha.venkatesh}@deakin.edu.au

Abstract

Applying Bayesian optimization in problems wherein the search space is unknown is challenging. To address this problem, we propose a systematic volume expansion strategy for Bayesian optimization. We devise a strategy to guarantee that, in iterative expansions of the search space, our method can find a point whose function value is within ε of the objective function maximum. Without the need to specify any parameters, our algorithm automatically triggers a minimal expansion required iteratively. We derive analytic expressions for when to trigger the expansion and by how much to expand. We also provide theoretical analysis to show that our method achieves ε-accuracy after a finite number of iterations. We demonstrate our method on both benchmark test functions and machine learning hyper-parameter tuning tasks, and show that our method outperforms baselines.

1 Introduction

Choosing where to search matters. A time-tested path in the quest for new products or processes is through experimental optimization. Bayesian optimization offers a sample-efficient strategy for experimental design by optimizing expensive black-box functions [9, 11]. But one problem is that users need to specify a bounded region to restrict the search for the objective function extrema. When tackling a completely new problem, users do not have prior knowledge, hence there is no guarantee that an arbitrarily defined search space contains the global optimum. Thus, application of the Bayesian optimization framework when the search region is unknown remains an open challenge [16].

One approach is to use a regularized acquisition function such that its maximum can never be at infinity; hence no search space needs to be declared and an unconstrained optimizer can be used [16]. Other approaches use volume expansion: starting from the user-defined region, the search space is expanded during the optimization. The simplest strategy is to repeatedly double the volume of the search space every several iterations [16]. Nguyen et al. suggest a volume expansion strategy based on the evaluation budget [12]. All these methods require users to specify critical parameters, for example regularization parameters [16], growth rate and expansion frequency (volume doubling) [16], or budget [12]. These parameters are difficult to specify in practice. Additionally, [12] is computationally expensive, and the user-defined search space needs to be close to the global optimum.

In this paper, we propose a systematic volume expansion strategy for Bayesian optimization when the search space is unknown. Without any prior knowledge about the objective function argmax or strict assumptions on the
behavior of the objective function, it is impossible to guarantee global convergence when the search space is continuously expanded. To circumvent this problem, we consider the setting where we achieve the global ε-accuracy condition, that is, we aim to find a point whose function value is within ε of the objective function global maximum. Our volume expansion strategy is based on two guiding principles: 1) the algorithm can reach a point whose function value is within ε of the objective function maximum in one expansion, and 2) the search space should be minimally expanded so that the algorithm does not spend unnecessary evaluations near the search space boundary.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

As the objective function is unknown, it is not possible to compute this ideal expansion region. Using the GP-UCB acquisition function as a surrogate, this region is computed as one that contains at least one point whose acquisition function value is within ε of the acquisition function maximum. However, by using a surrogate to approximate the objective function, there is no guarantee that we can achieve the global ε-accuracy within one expansion. Hence multiple expansions are required, and a new expansion is triggered when the local ε-accuracy is satisfied, i.e. when the algorithm can find a point whose function value is within ε of the objective function maximum in the current search space. Analytical expressions for
the size of the new expansion space and when to trigger the expansion are derived. The guarantee for the ε-accuracy condition, however, now lapses in the expanded region, and so we adjust the acquisition function appropriately to maintain the guarantee. Finally, we provide theoretical analysis to show that our proposed method achieves the global ε-accuracy condition after a finite number of iterations. We demonstrate our algorithm on five synthetic benchmark functions and three real hyperparameter tuning tasks for common machine learning models: linear regression with elastic net, multilayer perceptron, and convolutional neural network. Our experimental results show that our method achieves better function values with fewer samples compared to state-of-the-art approaches. In summary, our contributions are:

- Formalising the analysis of the Bayesian optimization framework in an unknown search space setting, and introducing ε-accuracy as a way to track algorithmic performance;
- Providing analytic expressions for how far to expand the search space and when to expand the search space to achieve global ε-accuracy;
- Deriving theoretical global ε-accuracy convergence; and,
- Demonstrating our algorithm on both synthetic and real-world problems and comparing it against state-of-the-art methods.

Our method differs from previous works in that 1) our method does not require any algorithmic parameters, automatically adjusting both when to trigger the expansion and by how much to expand, and
2) our approach is the only one to guarantee the global ε-accuracy condition. This is because we guarantee the local ε-accuracy condition in each search space, so eventually the global ε-accuracy is achieved. Without this local guarantee, the suggested solution cannot be guaranteed to reach global ε-accuracy. The regularization [16] and the filtering method [12] require the global optimum to be within a bound constructed by either the user-specified regularizer or the budget. The volume doubling method [16] can continue to expand the search space to infinity; however, the local ε-accuracy condition is not guaranteed in each search space.

The paper is organized as follows. Section 2 gives an overview of Bayesian optimization and discusses some of the related work. Section 3 describes the problem setup. Section 4 proposes our new expansion strategy for the Bayesian optimization framework when the search space is unknown. A theoretical analysis of our proposed method is presented in Section 5. In Section 6, we demonstrate the effectiveness of our algorithm through numerical experiments. Finally, Section 7 concludes the paper.

2 Background and Related Work

2.1 Background

Bayesian optimization
is a powerful optimization method for finding the global optimum of an unknown objective function f(x) by sequential queries [9, 11, 17, 18]. First, at time t, a surrogate model is used to approximate the behaviour of f(x) using all the currently observed data D_{t-1} = {(x_i, y_i)}_{i=1}^n, y_i = f(x_i) + ξ_i, where ξ_i ~ N(0, σ²) is the noise. Second, an acquisition function is constructed from the surrogate model to suggest the next point x_t to be evaluated. The objective function is then evaluated at x_t, and the new data point (x_t, y_t) is added to D_{t-1}. These steps are conducted iteratively to obtain the best estimate of the global optimum. The most common choice of surrogate model in Bayesian optimization is the Gaussian Process (GP) [14].
Assume the function f follows a GP with mean function m_0(x) and covariance function k(x, x′). Then the posterior distribution of f given the observed data D_{t-1} = {(x_i, y_i)}_{i=1}^n is a GP with the following posterior mean and variance,

μ_{t-1}(x) = m_0(x) + k(x)^T (K + σ² I)^{-1} y,
σ²_{t-1}(x) = k(x, x) − k(x)^T (K + σ² I)^{-1} k(x),   (1)

where y = [y_1, ..., y_{|D_{t-1}|}]^T, k(x) = [k(x, x_i)]_{i=1}^{|D_{t-1}|}, K = [k(x_i, x_j)]_{i,j}, I is the |D_{t-1}| × |D_{t-1}| identity matrix, and |D_{t-1}| denotes the cardinality of D_{t-1}. To aid readability, in the sequel we suppress the notation showing the dependence of k, K, I, y on |D_{t-1}|. There are many existing acquisition functions [6, 7, 10, 11, 20]; in this paper, we focus only on the GP-UCB acquisition function [1, 2, 5, 19], defined as,

UCB(x; D_{t-1}) = μ_{t-1}(x) + √β_t σ_{t-1}(x),   (2)

where μ_{t-1}(x), σ_{t-1}(x) are the posterior mean and standard deviation of the GP given observed data D_{t-1}, and β_t ≥ 0 is an appropriate parameter that balances exploration and exploitation. Given a search domain, β_t can be chosen as in [19] to ensure global convergence in this domain.

2.2 Related Work

All the work related to the problem of Bayesian optimization with an unknown search space has been described in Section 1. The work in [3] also introduces the term ε-accuracy; however, its purpose is to unify the Bayesian optimization and level-set estimation frameworks.

3 Problem Setup

We wish to find the global argmax x_max of an unknown objective function f : R^d → R whose argmax is at a finite location, i.e.

x_max = argmax_{x∈S} f(x),   (3)

where S is a finite region that contains the argmax of the function f(x). In practice, the region S is not known in advance, so users need to identify a search domain S_user which is likely to contain the argmax of f(x). This search domain can be set arbitrarily or based on limited prior knowledge, so there is no guarantee that S_user contains the global optimum of the objective function. In the trivial cases when the search space S is known or when S ⊆ S_user, global convergence can be guaranteed through classical analysis [4, 19]. Here, we consider the general case when S may or may not be a subset of S_user. Without any prior knowledge about S or strict assumptions on the behavior of the objective function, it is impossible to guarantee global convergence. Therefore, in this work, instead of solving Eq. (3), we consider the setting where we achieve the global ε-accuracy condition. That is, for a small positive value ε, we find a solution x* which satisfies,

f(x_max) − f(x*) ≤ ε.   (4)
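The GP posterior of Eq. (1) and the GP-UCB rule of Eq. (2) can be sketched in code. The sketch below is a minimal illustration only: the Squared Exponential kernel, its hyper-parameters, the fixed β_t, and the toy sine data are our own illustrative assumptions, not settings from the paper.

```python
import math

def se_kernel(x1, x2, theta=1.0, length=1.0):
    # Squared Exponential kernel: k(x, x') = theta^2 * exp(-(x - x')^2 / (2 l^2))
    return theta**2 * math.exp(-((x1 - x2) ** 2) / (2 * length**2))

def solve(A, b):
    # Solve A z = b by Gauss-Jordan elimination (A is symmetric positive definite here).
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][-1] / M[i][i] for i in range(n)]

def gp_posterior(x, X, y, noise=1e-6, theta=1.0, length=1.0):
    # Eq. (1) with zero prior mean: posterior mean and standard deviation at x.
    n = len(X)
    K = [[se_kernel(X[i], X[j], theta, length) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    kx = [se_kernel(xi, x, theta, length) for xi in X]
    alpha = solve(K, y)                      # (K + sigma^2 I)^{-1} y
    v = solve(K, kx)                         # (K + sigma^2 I)^{-1} k(x)
    mu = sum(a * b for a, b in zip(kx, alpha))
    var = theta**2 - sum(a * b for a, b in zip(kx, v))
    return mu, math.sqrt(max(var, 0.0))

def ucb(x, X, y, beta_t=4.0):
    # Eq. (2): UCB(x; D) = mu(x) + sqrt(beta_t) * sigma(x)
    mu, sd = gp_posterior(x, X, y)
    return mu + math.sqrt(beta_t) * sd

# One BO step: suggest the next point by maximising UCB on a grid.
X = [0.0, 1.0, 2.0]
y = [math.sin(v) for v in X]
grid = [-1.0 + 4.0 * i / 400 for i in range(401)]
x_next = max(grid, key=lambda x: ucb(x, X, y))
```

Note that far from all observations the posterior mean falls back to 0 and the standard deviation to θ, which is exactly the behaviour the expansion strategy in Section 4 exploits.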
4 Proposed Approach

We make some mild assumptions to develop our main results.

Assumption 4.1 The prior mean function m_0(x) = 0. This is achieved by subtracting the mean from all observations and is common practice.

Assumption 4.2 The kernel k(x, x′) satisfies: (1) when ||x − x′||_2 → +∞, k(x, x′) → 0; (2) k(x, x′) ≤ 1 ∀(x, x′); and (3) k(x, x) = θ², where θ ≥ 0 is the scale factor of the kernel function.

Various kernels satisfy Assumption 4.2, e.g. the Matérn kernel and the Squared Exponential kernel. As the function can always be re-scaled, condition (2) is met without loss of generality [15, 19].

Defining g_k(ε): With these types of kernels, for all small positive ε, there always exists g_k(ε) ≥ 0 such that,

∀x, x′ : ||x − x′||_2 ≥ g_k(ε), k(x, x′) ≤ ε.   (5)

The value of g_k(ε) can be computed from ε and the kernel covariance function k(x, x′); e.g. for the Squared Exponential kernel k_SE(x, x′) = θ² exp(−||x − x′||²_2 / (2l²)), g_k(ε) will be √(2l² log(θ²/ε)).

Assumption 4.3 The kernel k(x, x′) is known in advance or can be learned from the observations.

Figure 1: Expanded region (blue), when the GP-UCB acquisition function argmax is (1) at infinity; or (2) at a finite location with acquisition function value larger than or equal to √β_t θ + ε/2; or (3) at a finite location with acquisition function value smaller than √β_t θ + ε/2.
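For the Squared Exponential case, the closed form g_k(ε) = √(2l² log(θ²/ε)) given above is easy to check numerically; a quick sketch (the parameter values are illustrative):

```python
import math

def k_se(r, theta=1.0, length=1.0):
    # Squared Exponential kernel as a function of the distance r = ||x - x'||_2.
    return theta**2 * math.exp(-(r**2) / (2 * length**2))

def g_k(eps, theta=1.0, length=1.0):
    # Distance beyond which k(x, x') <= eps, per Eq. (5):
    # theta^2 exp(-r^2 / (2 l^2)) <= eps  <=>  r >= sqrt(2 l^2 log(theta^2 / eps))
    return math.sqrt(2 * length**2 * math.log(theta**2 / eps))

eps = 0.01
r0 = g_k(eps)   # the kernel equals eps exactly at r0 and is smaller beyond it
```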
4.1 Proposed Expansion Strategy

The ideal expansion strategy should satisfy two characteristics: 1) the algorithm can reach the global ε-accuracy condition in one expansion, and 2) the search space should be minimally expanded so that the algorithm does not spend unnecessary evaluations near the search space boundary. Since we have a black-box objective function, it is not possible to compute the ideal expansion space S_ideal directly. Let the exploration-exploitation parameters β_t be chosen to ensure the objective function is upper bounded by the GP-UCB acquisition function with high probability. Then we can estimate S_ideal by a region S defined as a minimal region that contains at least one point whose acquisition function value is within ε of the acquisition function maximum, i.e. ∃x_u ∈ S : |UCB(x_u; D_{t-1}) − max_{x∈R^d} UCB(x; D_{t-1})| ≤ ε. Due to this approximation, there is no guarantee that we can achieve the global ε-accuracy in one expansion; thus we need multiple sequential expansions. A new expansion is triggered when the local ε-accuracy is satisfied in the previous expansion.
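The expand-when-locally-done idea can be sketched schematically. The sketch below is a structural illustration only, under heavy simplifying assumptions of our own: a 1-D interval search space, a grid-search stand-in for the inner GP-UCB loop, an improvement-based proxy for the local ε-accuracy test, and a fixed expansion step rather than the analytic minimal expansion the paper derives.

```python
def local_search(f, space, budget=50):
    # Stand-in for running GP-UCB inside the current search space: evaluate a
    # coarse grid and return the best point found (a stub, not the real inner loop).
    lo, hi = space
    xs = [lo + (hi - lo) * i / (budget - 1) for i in range(budget)]
    best = max(xs, key=f)
    return best, f(best)

def expand(space, step=1.0):
    # Symmetric interval expansion (illustrative; the paper instead derives the
    # minimal expansion analytically).
    lo, hi = space
    return lo - step, hi + step

def bo_unknown_space(f, user_space, eps, max_expansions=10):
    # Trigger a new expansion as long as the current space keeps improving the
    # incumbent by more than eps (a crude proxy for local eps-accuracy).
    space, incumbent = user_space, float("-inf")
    x_best, y_best = local_search(f, space)
    for _ in range(max_expansions):
        if y_best <= incumbent + eps:       # locally done: stop expanding
            break
        incumbent = y_best
        space = expand(space)               # new, larger search space
        x_best, y_best = local_search(f, space)
    return x_best, y_best

# The maximum (at x = 3) lies outside the user-defined space [0, 1].
f = lambda x: -(x - 3.0) ** 2
x_star, y_star = bo_unknown_space(f, (0.0, 1.0), eps=1e-3)
```

Even this crude version recovers an optimum outside the initial user-defined interval, which is the failure mode of fixed-space Bayesian optimization that motivates the paper.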
In the following, we first derive the value of the GP-UCB acquisition function when x → ∞ (Proposition 4.1), and then use this value to derive analytical expressions for the size of the expansion space S (Theorem 4.1) and when to trigger a new expansion.

Proposition 4.1 When x → ∞, the GP-UCB acquisition function UCB(x; D_{t-1}) → √β_t θ, where β_t is the exploration-exploitation parameter of the GP-UCB acquisition function and θ is the scale factor of the kernel function k(x, x′).

Derivation of the expansion search space: Our idea is to choose the region S such that S = R^d \ A, where 1) A contains all the points x that are far from all the current observations, and 2) A := {x ∈ R^d : |UCB(x; D_{t-1}) − √β_t θ| < ε/2}. Here, we show that with this choice of S, there exists at least one point in S whose acquisition function value is within ε of the acquisition function maximum, given ε ≤ |√β_t θ − min_{x∈R^d} UCB(x; D_{t-1})|. We consider three cases that can happen to the GP-UCB acquisition function (see Figure 1):

Case 1: The argmax of the GP-UCB acquisition function is at infinity. This means that the GP-UCB acquisition function maximum is equal to √β_t θ. As the GP-UCB acquisition function is continuous and ε ≤ |√β_t θ − min_{x∈R^d} UCB(x; D_{t-1})|, there exists a point x_u such that UCB(x_u) = √β_t θ − ε/2. By the definition of S, it is straightforward that x_u belongs to S, thus proving that there exists a point in S whose GP-UCB acquisition function value is within ε of the maximum of the acquisition function.

Case 2: The argmax of the GP-UCB acquisition function x′_max is at a finite location and its acquisition function value is larger than or equal to √β_t θ + ε/2. It is straightforward to see that the argmax x′_max belongs to the region S, and this is the point
that satisfies |UCB(x′_max; D_{t-1}) − max_{x∈R^d} UCB(x; D_{t-1})| ≤ ε.

Case 3: The GP-UCB acquisition function argmax is at a finite location and the acquisition function maximum is smaller than √β_t θ + ε/2. As the GP-UCB acquisition function is continuous and ε ≤ |√β_t θ − min_{x∈R^d} UCB(x; D_{t-1})|, there exists a point x_u ∈ S : UCB(x_u; D_{t-1}) = √β_t θ − ε/2. As max_{x∈R^d} UCB(x; D_{t-1}) ≤ √β_t θ + ε/2, it follows directly that |UCB(x_u; D_{t-1}) − max_{x∈R^d} UCB(x; D_{t-1})| ≤ ε.

Theorem 4.1 now formally derives an analytical expression for one way to define region S.

Algorithm 1 Bayesian optimization with unknown search space (GPUCB-UBO)
1: Input: Gaussian Process (GP) M, acquisition functions UCB, LCB, initial observations D_init, initial search space S_user, function f, positive small threshold ε, evaluation budget T.
2: Output: Point x* : max f(x) − f(x*) ≤ ε.
3: Initialize D_0 = D_init, S = S_user, k = 1, t_k = 0. Update the GP using D_0.
4: for t = 1, 2, ..., T do
5:   Set t_local = t − t_k
6:   Compute x_m = argmax_{x∈S} UCB(x; D_{t-1})
7:   Set x_t = x_m, y_t = f(x_t). Update D_t = D_{t-1} ∪ (x_t, y_t).
8:   // Compute the expansion trigger, the regret upper bound //
9:   Compute r_b = UCB(x_t; D_{t-1}) − max_{x∈D_t} LCB(x; D_{t-1}) + 1/t²_local
10:  // If expansion triggered, expand the search space //
11:  if r_b ≤ ε then expand the search space S as derived in Theorem 4.1

5 Theoretical Analysis

First, we show that in each expansion search space S_k, the expansion trigger condition is eventually satisfied with high probability. Suppose that ∃a_k, b_k > 0 such that Pr{sup_{x∈S_k} |∂f/∂x_j| ≥ L} ≤ a_k exp(−(L/b_k)²), j = 1, ..., d. Pick δ ∈ (0, 1), and define β_t = 2 log(t² 2π²/(3δ)) + 2d log(t² d b_k r_k √(log(4d a_k/δ))). Then ∀ε > 0, with probability larger than 1 − δ, ∃T_k : ∀t ≥ T_k, max_{x∈S_k} UCB(x; D_{t-1}) − max_{x∈D_t} LCB(x; D_{t-1}) ≤ ε − 1/t²; and ∀t that satisfies the previous condition, max_{x∈S_k} f(x) − max_{x∈D_t} f(x) ≤ ε.
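The β_t schedule in the statement above is easy to evaluate numerically. In the sketch below the constants a_k, b_k, r_k, the dimension d, and δ are placeholder values of ours; the paper ties them to the kernel, the dimension, and the current search space.

```python
import math

def beta_t(t, d=2, a_k=1.0, b_k=1.0, r_k=1.0, delta=0.1):
    # beta_t = 2 log(t^2 2 pi^2 / (3 delta))
    #          + 2 d log(t^2 d b_k r_k sqrt(log(4 d a_k / delta)))
    term1 = 2 * math.log(t**2 * 2 * math.pi**2 / (3 * delta))
    term2 = 2 * d * math.log(
        t**2 * d * b_k * r_k * math.sqrt(math.log(4 * d * a_k / delta)))
    return term1 + term2

# The schedule grows only logarithmically with t, so the exploration bonus
# sqrt(beta_t) * sigma_{t-1}(x) stays moderate while the posterior concentrates.
betas = [beta_t(t) for t in range(1, 6)]
```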
Second, we prove that with a proper choice of β_t and for a wide class of kernels, after a finite number of iterations, our algorithm achieves the global ε-accuracy condition with high probability.

Theorem 5.1 Denote by S_k the series of expansion search spaces suggested by our algorithm (k ≥ 1). In each S_k, let T_k be the smallest number of iterations that satisfies our expansion trigger condition, i.e. r_{b,S_k}(T_k) ≤ ε. Suppose the kernel k(x, x′) belongs to the kernel classes listed in Proposition 5.1 and satisfies the following condition on the derivatives of GP sample paths f: ∃a_k, b_k > 0, Pr{sup_{x∈S_k} |∂f/∂x_j| ≥ L} ≤ a_k exp(−(L/b_k)²), j = 1, ..., d. Pick δ ∈ (0, 1), and define,

β_t = 2 log((t − Σ_{j≤k-1} T_j)² 2π²/(3δ)) + 2d log((t − Σ_{j≤k-1} T_j)² d b_k r_k √(log(4d a_k/δ))),
Σ_{j≤k-1} T_j + 1 ≤ t ≤ Σ_{j≤k} T_j, k = 1, 2, ....

Then, running the proposed algorithm with this choice of β_t for a sample f of a GP with zero mean function and covariance function k(x, x′), after a finite number of iterations we achieve global ε-accuracy with probability at least 1 − δ, i.e. Pr{f(x_max) − f(x_suggest) ≤ ε} ≥ 1 − δ, where x_suggest is the algorithm's recommendation and x_max is the objective function's global argmax.

Discussion: The difference between our method and previous works is