NIPS20199120-online-continuous-submodular-maximization-from-full-ination-to-bandit-feedback

上传人：我*** IP属地：北京上传时间：2020-06-13 格式：PDF 页数：12 大小：347.56KB 积分：12 举报 版权申诉

NIPS20199120-online-continuous-submodular-maximization-from-full-ination-to-bandit-feedback_第2页

NIPS20199120-online-continuous-submodular-maximization-from-full-ination-to-bandit-feedback_第3页

NIPS20199120-online-continuous-submodular-maximization-from-full-ination-to-bandit-feedback_第4页

NIPS20199120-online-continuous-submodular-maximization-from-full-ination-to-bandit-feedback_第5页

已阅读5页，还剩7页未读，继续免费阅读

版权说明：本文档由用户提供并上传，收益归属内容提供方，若内容存在侵权，请进行举报或认领

文档简介

Online Continuous Submodular Maximization: From Full-Information to Bandit Feedback Mingrui ZhangLin ChenHamed HassaniAmin Karbasi, Department of Statistics and Data Science, Yale University Department of Electrical Engineering, Yale University Department of Electrical and Systems Engineering, University of Pennsylvania Department of Computer Science, Yale University mingrui.zhang, lin.chen, Abstract In this paper, we propose three online algorithms for submodular maximization. The fi rst one,Mono-Frank-Wolfe, reduces the number of per-function gradient evaluations fromT1/218 andT3/217 to 1, and achieves a(1 1/e)-regret bound ofO(T4/5). The second one,Bandit-Frank-Wolfe , is the fi rst bandit al- gorithm for continuous DR-submodular maximization, which achieves a(11/e)- regret bound ofO(T8/9). Finally, we extendBandit-Frank-Wolfeto a bandit algorithm for discrete submodular maximization,Responsive-Frank-Wolfe, which attains a(1 1/e)-regret bound ofO(T8/9)in the responsive bandit set- ting. 1Introduction Submodularity naturally arises in a variety of disciplines, and has numerous applications in machine learning, including data summarization 45, active and semi-supervised learning 26,47, com- pressed sensing and structured sparsity 7, fairness in machine learning 8 , mean-fi eld inference in probabilistic models 10, and MAP inference in determinantal point processes (DPPs) 36. We say that a set functionf : 2 R0 defi ned on a fi nite ground setis submodular if for every A B andx B, we havef(x|A) f(x|B), wheref(x|A) , f(A x) f(A) is a discrete derivative 39. Continuous DR-submodular functions are the continuous analogue. LetF : X R0 be a differentiable function defi ned on a boxX , Qd i=1Xi, where eachXi is a closed interval ofR0. We say thatFis continuous DR-submodular if for everyx,y Xthat satisfyx yand everyi d , 1,.,d, we have F xi(x) F xi(y), where x ymeans xi yi,i d 9. In this paper, we focus on online and bandit maximization of submodular set functions and contin- uous DR-submodular functions. In contrast to offl ine optimization where the objective function is completely known beforehand, online optimization can be viewed as a two-player game between the player and the adversary in a sequential manner 50,42,28. LetFbe a family of real-valued functions. The player wants to maximize a sequence of functionsF1,.,FT Fsubject to a constraint setK. The player has no a priori knowledge of the functions, while the constraint set is known and we assume that it is a closed convex set inRd. The natural numberTis termed the horizon of the online optimization problem. At thet-th iteration, without the knowledge ofFt, the player has to select a pointxt K. After the player commits to this choice, the adversary selects a functionFt F. The player receives a rewardFt(xt), observes the functionFtdetermined by the adversary, and proceeds to the next iteration. In the more challenging bandit setting, even the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada. functionFtis unavailable to the player and the only observable information is the reward that the player receives 23, 3, 11. The performance of the algorithm that the player uses to determine her choicesx1,.,xTis quantifi ed by the regret, which is the gap between her accumulated reward and the reward of the best singlechoiceinhindsight. Tobeprecise, theregretisdefi nedbymaxxK PT t=1Ft(x) PT t=1Ft(xt). However, even in the offl ine scenario, it is shown that the maximization problem of a continuous DR-submodular function cannot be approximated within a factor of(1 1/e + ?)for any? 0in polynomial time, unless RP = NP 9. Therefore, we consider the (1 1/e)-regret 44, 34, 18 R11/e,T, (1 1/e)max xK T X t=1 Ft(x) T X t=1 Ft(xt). For ease of notation, we write RTfor R11/e,Tthroughout this paper. In this paper, we study the following three problems: OCSM: the Online Continuous DR-Submodular Maximization problem, BCSM: the Bandit Continuous DR-Submodular Maximization problem, and RBSM: the Responsive Bandit Submodular Maximization problem. We note that although special cases of bandit submodular maximization problem (BSM) were studied in 44,27, the vanillaBSMproblem is still open for general monotone submodular functions under a matroid constraint. InBSM, the objective functionsf1,.,fT are submodular set functions defi ned on a common fi nite ground setand subject to a common constraintI. For each functionfi, the player has to select a subsetXi I. Only after playing the subsetXi, the rewardfi(Xi)is received and thereby observed. If the value of the corresponding multilinear extension1Fcan be estimated by the submodular set functionf, we may expect to solve the vanillaBSMby invoking algorithms for continuous DR- submodular maximization. In this paper, however, we will show a hardness result that subject to some constraintI, it is impossible to construct a one-point unbiased estimator of the multilinear extension Fbased on the value off, without knowing the information offin advance. This result motivates the study of a slightly relaxed setting termed the Responsive Bandit Submodular Maximization problem (RBSM). InRBSM, at roundi, ifXi/ I, the player is still allowed to playXiand observe the function value fi(Xi), but gets zero reward out of it. OCSMwas studied in 18,17, whereT1/2exact gradient evaluations orT3/2stochastic gradient evaluations are required per iteration (Tis the horizon). Therefore, they cannot be extended to the bandit setting (BCSMandRBSM) where one single function evaluation per iteration is permitted. As a result, no known bandit algorithm attains a sublinear (1 1/e)-regret. In this paper, we fi rst proposeMono-Frank-WolfeforOCSM, which requires one stochastic gra- dient per function and still attains a(1 1/e)-regret bound ofO(T4/5) . This is signifi cant as it reduces the number of per-function gradient evaluations fromT3/2to 1. Furthermore, it pro- vides a feasible avenue to solvingBCSMandRBSM. We then proposeBandit-Frank-Wolfeand Responsive-Frank-Wolfethat attain a(1 1/e)-regret bound ofO(T8/9)forBCSMandRBSM, respectively. To the best of our knowledge,Bandit-Frank-WolfeandResponsive-Frank-Wolfe are the fi rst algorithms that attain a sublinear(11/e)-regret bound forBCSMandRBSM, respectively. The performance of prior approaches and our proposed algorithms is summarized in Table 1. We also list further related works in Appendix A. 2Preliminaries Monotonicity, Smoothness, and Directional Concavity PropertyA submodular set function f : 2 R is called monotone if for any two sets A B we have f(A) f(B). For two vectorsxandy, we writex yifxi yiholds for everyi. LetFbe a continuous DR-submodular function defi ned onX. We say thatFis monotone ifF(x) F(y)for every x,y Xobeyingx y. Additionally,Fis calledL-smooth if for everyx,y Xit holds that 1 We formally defi ne the multilinear extension of a submodular set function in Section 2. 2 Table 1: Comparison of previous and our proposed algorithms. SettingAlgorithm Stochastic# of grad. (1 1/e)-regret gradientevaluations OCSM Meta-FW 18NoT1/2O(T) VR-FW 17YesT3/2O(T) Mono-FW (this work)Yes1O(T4/5) BCSMBandit-FW (this work)-O(T8/9) RBSMResponsive-FW (this work)-O(T8/9) kF(x) F(y)k Lkx yk. Throughout the paper, we use the notationkkfor the Euclidean norm. An important implication of continuous DR-submodularity is concavity along the non-negative directions 16, 9, i.e., for all x y, we have F(y) F(x) + hF(x),y xi. Multilinear ExtensionGiven a submodular set functionf : 2 R0 defi ned on a fi nite ground set, its multilinear extension is a continuous DR-submodular functionF : 0,1| R0 defi ned byF(x) = P Sf(S)iSxij/ S(1xj), wherexiis thei-th coordinate ofx. Equivalently, for any vectorx 0,1|we haveF(x) = ESxf(S)whereS xmeans thatSis a random subset of such that every element i is contained in S independently with probability xi. Geometric NotationsThed-dimensional unit ball is denoted byBd, and the(d 1)-dimensional unit sphere is denoted bySd1. LetK be a bounded set. We defi ne its diameterD = supx,yKkxyk and radius R = supxKkxk. We say a set K has lower bound u if u K, and x K,x u. 3One-shot Online Continuous DR-Submodular Maximization In this section, we proposeMono-Frank-Wolfe, an online continuous DR-submodular maximization algorithm which only needs one gradient evaluation per function. This algorithm is the basis of the methods presented in the next section for the bandit setting. We also note that throughout this paper, F denotes the exact gradient for F, while F denotes the stochastic gradient. We begin by reviewing the Frank-Wolfe (FW) 24,33 method for maximizing monotone continuous DR-submodular functions in the offl ine setting 9, where we have one single objective functionF. Assuming that we have access to the exact gradientF, the FW method is an iterative procedure that starts from the initial point x(1)= 0, and at the k-th iteration, solves a linear optimization problem v(k) argmax vK hv,F(x(k)i(1) which is used to update x(k+1) x(k)+ kv(k), where kis the step size. We aim to extend the FW method to the online setting. Inspired by the FW update above, to get high rewards for each objective functionFt, we start fromx(1) t = 0, updatex(k+1) t = x(k) t + kv(k) t for multiple iterations (letKdenote the number of iterations), then play the last iteratex(K+1) t forFt. To obtain the pointx(K+1) t which we play, we need to solve the linear program Eq. (1) and thus getv(k) t , where we have to know the gradient in advance. However, in the online setting, we can only observe the stochastic gradient Ftafter we play some point forFt. So the key issue is to obtain the vector v(k) t which at least approximately maximizes h,Ft(x(k) t )i, before we play some point for Ft. To do so, we useKno-regret online linear maximization oraclesE(k),k K, and letv(k) t be the output vector ofE(k)at roundt. Once we updatex(k+1) t byv(k) t for allk K, and playx(K+1) t for Ft, we can observe Ft(x(k) t )and iteratively constructd(k) t = (1 k)d(k1) t + kFt(x(k) t ), an estimation ofFt(x(k) t )with a lower variance than Ft(x(k) t )37,38 for allk K. Then we set h,d(k) t ias the objective function for oracleE(k)at roundt. Thanks to the no-regret property ofE(k), 3 v(k) t , which is obtained before we play some point forFtand observe the gradient, approximately maximizes h,d(k) t i, thus also approximately maximizes h,Ft(x(k) t )i. This approach was fi rst proposed in 18,17, where stochastic gradients atK = T3/2points (i.e., x(k) t ,k K) are required for each functionFt. To carry this general idea into the one-shot setting where we can only access one gradient per function, we need the following blocking procedure. We divide the upcoming objective functionsF1,.,FTintoQequisized blocks of sizeK(so T = QK). For theq -th block, we fi rst setx(1) q = 0, updatex(k+1) q = x(k) q + kv(k) q , and play the same pointxq= x(K+1) q for all the functionsF(q1)K+1,.,FqK. The reason why we play the same pointxq will be explained later. We also defi ne the average function in theq-th block as Fq, 1 K PK k=1F(q1)K+k. In order to reduce the required number of gradients per function, the key idea is to view the average functions F1,., FQas virtual objective functions. Precisely, in theq-th block, let(tq,1,.,tq,K)be a random permutation of the indices(q 1)K + 1,.,qK. After we update all thex(k) q, for eachFt, we playxq and fi nd the correspondingk0such thatt = tq,k0, then observe Ft(i.e., Ftq,k0) atx(k 0) q . Thus we can obtain Ftq,k(x(k) q)for all k K. Sincetq,kis a random variable such thatEFtq,k = Fq, Ftq,k(x(k) q)is also an estimation ofFq(x(k) q), which holds for allk K. As a result, with only one gradient evaluation per function Ftq,k, we can obtain stochastic gradients of the virtual objective function FqatKpoints. In this way, the required number of per-function gradient evaluations is reduced from K to 1 successfully. Note that since we playyt= xqfor eachFtin theq-th block, the regret w.r.t. the original objective functions and that w.r.t. the average functions satisfy that (1 1/e)max xK T X t=1 Ft(x) T X t=1 Ft(yt) = K (1 1/e)max xK Q X q=1 Fq(x) Q X t=1 Fq(xq) # , which makes it possible to view the functions Fqas virtual objective functions in the regret analysis. Moreover, we iteratively constructd(k) q = (1 k)d(k1) q + kFtq,k(x(k) q)as an estimation of Ftq,k(x(k) q), thus also an estimation ofFq(x (k) q). Sov (k) q , the output ofE(k), approximately maximizesh,Fq(x(k) q)i . Inspired by the offl ine FW method, playingxq= x(K+1) q , the last iterate in the FW procedure, may obtain high rewards for Fq. As a result, we play the same pointxqin the q-th block. We also note that oncetq,1,.,tq,kare revealed, conditioned on the knowledge, the expecta- tion ofFtq,k+1is no longer the average function Fqbut the residual average function Fq,k(x) = 1 Kk PK i=k+1Ftq,i(x). As more indicestq,k are revealed, Fq,kbecomes increasingly different from Fq, which makes the observed gradient Ftq,k+1(x(k+1) q )not a good estimation ofFq(x(k+1) q )any more. As a result, although we use the averaging technique (the update ofd(k) q) as in 37,38 for vari- ance reduction, a completely different gradient error analysis is required. In Lemma 6 (Appendix B), we establish that the squared error ofd(k) q exhibits an inverted bell-shaped tendency; i.e., the squared error is large at the initial and fi nal stages and is small at the intermediate stage. We present our proposed Mono-Frank-Wolfe algorithm in Algorithm 1. We will show thatMono-Frank-Wolfeachieves a(1 1/e)-regret bound ofO(T4/5). In order to prove this result, we fi rst make the following assumptions on the constraint setK, the objective functions Ft, the stochastic gradient Ft, and the online linear maximization oracles. Assumption 1. The constraint set K is a convex and compact set that contains 0. Assumption 2.Every objective functionFtis monotone, continuous DR-Submodular,L1-Lipschitz, and L2-smooth. Assumption 3.The stochastic gradient Ft(x)is unbiased, i.e.,EFt(x) = Ft(x). Addi- tionally, it has a uniformly bounded normkFt(x)k M0and a uniformly bounded variance EkFt(x) Ft(x)k2 2 0for every x K and objective function Ft. 4 Algorithm 1 Mono-Frank-Wolfe Input:constraint setK, horizonT, block sizeK, online linear maximization oracles onK: E(1),E(K), step sizes k (0,1),k (0,1), number of blocks Q = T/K Output: y1,y2,. 1:for q = 1,2,.,Q do 2:d(0) q 0, x(1) q 0 3:Fork = 1,2,.,K, letv(k) q Kbe the output ofE(k)in roundq,x(k+1) q x(k) q +kv(k) q . Set xq x(K+1) q 4:Let (tq,1,.,tq,K) be a random permutation of (q 1)K + 1,.,qK 5:Fort = (q 1)K + 1,.,qK, playyt= xqand obtain the rewardFt(yt) ; fi nd the corresponding k0 K such that t = tq,k0, observe Ft(x(k 0) q ), i.e., Ftq,k0(x(k 0) q ) 6:Fork = 1,2,.,K,d(k) q (1 k)d(k1) q + kFtq,k(x(k) q), computehv (k) q ,d(k) qias reward for E(k), and feed back d(k) q to E(k) 7:end for Assumption 4.For the online linear maximization oracles, the regret at horizont(denoted byRE (i) t ) satisfi es RE (i) t Ct,i K, where C 0 is a constant. Note that there exist online linear maximization oraclesE(i)with regretRE (i) t Ct,i Kfor any horizon t (for example, the online gradient descent 50). Therefore, Assumption 4 is fulfi lled. Theorem 1(Proof in Appendix B).Under Assumptions 1 to 4, if we setK = T3/5,k= 1 K,k = 2 (k+3)2/3 when1 k K/2+1, andk= 1.5 (Kk+2)2/3 whenK/2+2 k K, where we assume thatKis even for simplicity, thenyt K,t, and the expected(1 1/e)-regret of Algorithm 1 is at most ERT (N + C + D2)T4/5+ L2D2 2 T2/5, whereN = max52/3(L1+M0)2,4(L2 1+20)+32G,2.25(L21+20)+7G/3,G = (L2R+2L1)2. 4Bandit Continuous DR-Submodular Maximization In this section, we present the fi rst bandit algorithm for continuous DR-submodular maximization, Bandit-Frank-Wolfe, which attains a(11/e)-regret bound ofO(T8/9). We begin by explaining the one-point gradient estimator 23, which is crucial to the proposed bandit algorithm. The proposed algorithm and main results are illustrated in Section 4.2. 4.1One-Point Gradient Estimator Given a functionF , we defi ne its-smoothed version F(x) , EvBdF(x + v), wherev Bd denotes thatvis drawn uniformly at random from the unit ballBd. Thus the functionFis averaged over a ball of radius . It can be easily verifi ed that ifFis monotone, continuous DR-submodular, L1-Lipschitz, andL2-smooth, then so is F, and for allxwe have|F(x) F(x)| L1(Lemma 7 in Appendix C). So the-smoothed version Fis indeed an approximation ofF. A maximizer of F also maximizes F approximately. More importantly, the gradient of the smoothed function Fadmits a one-point unbiased estimator 23,32:F(x) = EuSd1 ?d F(x + u)u ?, where u Sd1denotes thatuis drawn uniformly at random from the unit sphereSd1. Thus the player can estimate the gradient of the smoothed function at pointxby playing the random pointx + ufor the original functionF. So usually, we can extend a one-shot online algorithm to the bandit setting by replacing the observed stochastic gradients with the one-point gradient estimations. In our setting, however, we cannot use the one-point gradient estimator directly. When the pointxis close to the boundary of the constraint setK, the pointx + umay fall outside ofK. To address this 5 issue, we introduce the notion of -interior. A set is said to be a -interior of K if it is

人人文库> 全部分类> 应用文书

温馨提示

1. 本站所有资源如无特殊说明，都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
2. 本站的文档不包含任何第三方提供的附件图纸等，如果需要附件，请联系上传者。文件的所有权益归上传用户所有。
3. 本站RAR压缩包中若带图纸，网页内容里面会有图纸预览，若没有图纸预览就没有图纸。
4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
5. 人人文库网仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对用户上传分享的文档内容本身不做任何修改或编辑，并不能对任何下载内容负责。
6. 下载文件中如有侵权或不适当内容，请与我们联系，我们立即纠正。
7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

NIPS20199120-online-continuous-submodular-maximization-from-full-ination-to-bandit-feedback

文档简介

温馨提示

最新文档

评论

NIPS20199120-online-continuous-submodular-maximization-from-full-ination-to-bandit-feedback

文档简介

温馨提示

最新文档

评论

相关文档