Fast Decomposable Submodular Function Minimization using Constrained Total Variation

K S Sesh Kumar
Data Science Institute, Imperial College London, UK
s.karri@imperial.ac.uk

Francis Bach
INRIA and École normale supérieure, PSL Research University, Paris, France
francis.bach@inria.fr

Thomas Pock
Institute of Computer Graphics and Vision, Graz University of Technology, Graz, Austria
pock@icg.tugraz.at

Abstract

We consider the problem of minimizing the sum of submodular set functions assuming minimization oracles of each summand function. Most existing approaches reformulate the problem as the convex minimization of the sum of the corresponding Lovász extensions and the squared Euclidean norm, leading to algorithms requiring total variation oracles of the summand functions; without further assumptions, these more complex oracles require many calls to the simpler minimization oracles often available in practice. In this paper, we consider a modified convex problem requiring a constrained version of the total variation oracles that can be solved with significantly fewer calls to the simple minimization oracles. We support our claims by showing results on graph cuts for 2D and 3D graphs.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

1 Introduction

A discrete function $F$ defined on a finite ground set $V$ of $n$ objects is said to be submodular if the marginal cost of each object decreases as the set it is conditioned on grows, i.e., $F : 2^V \to \mathbb{R}$ is submodular if and only if the marginal cost of an object $x \in V$ conditioned on a set $A \subseteq V \setminus \{x\}$, denoted by $F(x|A) = F(A \cup \{x\}) - F(A)$, decreases as the set $A$ becomes bigger. The diminishing-returns property of submodular functions has been central to solving several machine learning problems such as document summarization [1], sensor placement [2] and graph cuts [3] (see [4] for more applications). Without loss of generality, we consider normalized submodular functions, i.e., $F(\varnothing) = 0$.

Submodular function minimization (SFM) can be solved exactly in polynomial time, but with high computational complexity. One of the standard algorithms is the Fujishige-Wolfe algorithm [5, 6]; more recently, SFM has been tackled using cutting-plane methods [7] and geometric rescaling [8]. All the above algorithms rely on value function oracles, that is, access to $F(A)$ for arbitrary subsets $A$ of $V$, and solve SFM with high running-time complexities, e.g., $O(n^4 \log^{O(1)} n)$ and more. These algorithms are typically not trivial to implement and do not scale to problems with large ground sets (such as $n = 10^6$ in computer vision applications). For scalable practical solutions, it is imperative to exploit the structural properties of the function to minimize.

Submodularity is closed under addition [5, 4]. We make a structural assumption that is practically useful in many machine learning problems [9, 10] (e.g., when a 2D-grid graph is seen as the concatenation of vertical and horizontal chains) and consider the problem of minimizing a sum of submodular functions [11], i.e.,
$$\min_{A \subseteq V} F(A) := \sum_{i=1}^r F_i(A), \qquad (1)$$
assuming each summand $F_i$, $i = 1, \dots, r$, is "simple", i.e., with available efficient oracles which are more complex than plain function evaluations. The simplest of these oracles is being able to minimize the submodular function $F_i$ plus some modular function, and we will consider these oracles in this paper. This is weaker than the usual "total variation" oracles detailed below.
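To make the oracle assumption concrete: a cut function on a 2D grid decomposes as in Eq. (1) with $r = 2$, where $F_1$ sums the cut functions of the horizontal chains and $F_2$ those of the vertical chains, and the SFM oracle for a single weighted chain reduces to dynamic programming. The sketch below (in Python; the function name and interface are ours, for illustration) minimizes $F(A) - u(A)$ for one chain by a Viterbi-style recursion.

```python
import numpy as np

def chain_sfm_oracle(u, w):
    """Minimize F(A) - u(A) for the cut function F of a weighted chain.

    F(A) = sum_j w[j] * [1_A(j) != 1_A(j+1)] on m nodes (w has m-1
    non-negative edge weights), so the problem is
        min over x in {0,1}^m of  sum_j -u[j]*x[j] + sum_j w[j]*|x[j]-x[j+1]|,
    solved exactly by Viterbi-style dynamic programming in O(m) time.
    Returns the indicator vector of an optimal set A.
    """
    m = len(u)
    cost = np.zeros((m, 2))            # cost[j][b]: best prefix cost with x[j] = b
    back = np.zeros((m, 2), dtype=int)
    cost[0] = [0.0, -u[0]]             # unary costs: 0 for label 0, -u[j] for label 1
    for j in range(1, m):
        for b in (0, 1):
            stay = cost[j - 1, b]                     # no cut on edge (j-1, j)
            switch = cost[j - 1, 1 - b] + w[j - 1]    # pay w[j-1] for the cut
            back[j, b] = b if stay <= switch else 1 - b
            cost[j, b] = min(stay, switch) + (-u[j] if b == 1 else 0.0)
    x = np.zeros(m, dtype=int)
    x[m - 1] = int(cost[m - 1, 1] < cost[m - 1, 0])
    for j in range(m - 1, 0, -1):      # backtrack the optimal labeling
        x[j - 1] = back[j, x[j]]
    return x
```

Since the chains within $F_1$ (and within $F_2$) share no nodes, minimizing $F_i$ plus a modular function amounts to one independent call of such a routine per chain.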
One of the standard approaches to solve the discrete optimization problem in Eq. (1) is to consider an equivalent continuous optimization problem that minimizes the Lovász extension [12] $f$ of the submodular function over the $n$-dimensional unit hypercube (see a definition in Section 2). This approach uses a well-known result in the submodularity literature that the minimizers of the set function $F$ can be directly obtained from the minimizers of its Lovász extension $f$; the continuous optimization problem is given by
$$\min_{w \in [0,1]^n} f(w) := \sum_{i=1}^r f_i(w), \qquad (2)$$
where $f_i$ is the Lovász extension of the submodular function $F_i$, for each $i \le r$. Lovász extensions of submodular functions are convex but non-smooth, piecewise-linear functions. Therefore, we can use subgradient methods, as subgradients can be calculated using a greedy algorithm in $O(n \log n)$ time and $O(n)$ calls to the value function oracle per iteration [13]. However, this is slow, with an $O(1/\sqrt{t})$ convergence rate, where $t$ is the number of iterations of the optimization algorithm. Moreover, in signal processing applications, high precision is needed, hence the need for faster algorithms.
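For concreteness, the greedy computation just mentioned (reviewed in detail in Section 2) fits in a few lines. This is a minimal sketch assuming the set function is given as a Python value oracle on frozensets; the interface is ours, not the paper's.

```python
import numpy as np

def lovasz_subgradient(F, w):
    """Greedy algorithm: value f(w) of the Lovász extension and a
    subgradient s, i.e., a maximizer of w^T s over the base polytope B(F).

    F: value oracle, F(frozenset) -> float, with F(frozenset()) == 0.
    """
    n = len(w)
    order = np.argsort(-np.asarray(w))    # sort components of w in decreasing order
    s = np.zeros(n)
    prefix = []
    prev = 0.0                            # F of the previous prefix set
    for k in order:
        prefix.append(k)
        cur = F(frozenset(prefix))
        s[k] = cur - prev                 # marginal gain of element k
        prev = cur
    return float(np.dot(s, w)), s
```

A projected subgradient method on $[0,1]^n$ built from this routine attains exactly the $O(1/\sqrt{t})$ rate discussed above.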
An alternative approach is to consider a continuous optimization problem [4, Chapter 8] of the form
$$\min_{w \in \mathbb{R}^n} f(w) + \sum_{j=1}^n \psi(w_j), \qquad (3)$$
where $\psi : \mathbb{R} \to \mathbb{R}$ is a convex function whose Fenchel conjugate [14] is defined everywhere (in order to have a well-defined dual, as later shown in Eq. (9)). This is equivalent to solving all the following discrete optimization problems parameterized by $\alpha \in \mathbb{R}$:
$$\min_{A \subseteq V} F(A) + \psi'(\alpha)\,|A|. \qquad (4)$$
Given the solutions $A_\alpha$ for all $\alpha \in \mathbb{R}$ in Eq. (4), we may obtain the optimal solution $w^\star$ of Eq. (3) using $w^\star_j = \sup\{\alpha \in \mathbb{R},\ j \in A_\alpha\}$. Conversely, given the optimal solution $w^\star$ of Eq. (3), we may obtain the solutions $A_\alpha$ of the discrete optimization problems in Eq. (4) by thresholding at $\alpha$, i.e., $A_\alpha = \{j \in V,\ w^\star_j \ge \alpha\}$. As a consequence, we can obtain the solution of Eq. (1) when $\alpha$ is chosen so that $\psi'(\alpha) = 0$ (typically $\alpha = 0$ because $\psi$ is even). Note that this algorithmic scheme seems wasteful, because we take a continuous solution of Eq. (3) and only keep the signs of its components. One contribution of this paper is to propose a function $\psi$ that focuses only on values of $w$ close to zero.

Our problem setting and approach. We assume the availability of SFM oracles for the individual summand functions, which we refer to as SFMD$_i$, for each $i \le r$, giving an optimal solution of
$$\text{SFMD}_i : \quad \operatorname*{argmin}_{A \subseteq V}\ F_i(A) - u^\top 1_A, \qquad (5)$$
where $u \in \mathbb{R}^n$ is any $n$-dimensional vector and $1_A \in \{0,1\}^n$ is the indicator vector of the set $A$. Note that the complexity of the oracle does not typically depend on the vector $u$. We consider the following continuous optimization problem, which we refer to as SFMC$_i$, for each $i \le r$:
$$\text{SFMC}_i : \quad \operatorname*{argmin}_{w \in \mathbb{R}^n}\ f_i(w) - t^\top w + \sum_{j=1}^n \psi(w_j), \qquad (6)$$
where $t \in \mathbb{R}^n$ is any $n$-dimensional vector. In our setting, we consider $\psi$ to be the following convex function:
$$\psi(w) = \begin{cases} \frac{1}{2} w^2 & \text{if } |w| \le \varepsilon, \\ +\infty & \text{otherwise}, \end{cases} \qquad (7)$$
where $\varepsilon \in \mathbb{R}_+$. In Section 3.1, we show that the continuous optimization problem SFMC$_i$ can be optimized using the discrete oracles SFMD$_i$ through a modified divide-and-conquer algorithm [15]. In Section 3.2, we use various optimization algorithms with SFMC$_i$ as the inner loop to solve the continuous optimization problem in Eq. (3), consequently solving the SFM problem in Eq. (1).

Related work. Most earlier works have considered quadratic functions for the choice of $\psi$ [15, 16, 17, 18, 19, 20], i.e., $\psi(v) = \frac{1}{2} v^2$. As a result, SFMC$_i$ in Eq. (6) is referred to as a total variation or TV oracle, as it solves problems of the form $\min_{w \in \mathbb{R}^n} f(w) + \frac{1}{2}\|t - w\|^2$ [21]. These oracles are efficient for cut functions defined on chain graphs [22, 23], with $O(n)$ complexity. However, this does not hold for general submodular functions. One way to solve continuous optimization problems like SFMC$_i$ is to use a sequence of at most $n$ discrete minimization oracles like SFMD$_i$ through divide-and-conquer algorithms (see Section 3.1). Recent work has also focused on using the discrete minimization oracles SFMD$_i$ directly: [16] considers a total variation problem with active-set methods; [24] used discrete minimization oracles SFMD$_i$ but solved a different convex optimization problem; [25] also considers the total variation problem but uses incidence relations and oblique projections for quicker convergence; [26] reduces the search space of the SFM problem, i.e., $V$, using heuristics. Our choice of $\psi$ results in a similar reduction of the search space, which yields a more efficient solution.

Contributions. Our main contribution is to propose a new convex optimization problem that can be used to find the minimum of a sum of submodular set functions. For graph cuts, this new problem can be seen as a constrained total variation problem that is more efficient than the regular total variation (fewer discrete minimization oracle calls). This is beneficial when minimizing the sum of constrained total variation problems, and consequently beneficial for the corresponding discrete minimization problem, i.e., minimizing the sum of submodular functions. For the case of a sum of two functions, we show that recent acceleration techniques from [27] can be highly beneficial in our case. This is validated using experiments on segmentation of two-dimensional images and three-dimensional volumetric surfaces. Note that we use cuts mainly due to easy access to minimization oracles of cut functions [28], but our results apply to all submodular functions.

2 Review of Submodular Function Minimization (SFM)

In this section, we review the relevant concepts from submodular analysis (for more details, see [4, 5]). All possible subsets of the ground set $V$ can be identified with the vertices $\{0,1\}^n$ of the hypercube in $n$ dimensions (going from $A \subseteq V$ to $1_A \in \{0,1\}^n$). Thus, any set function may be seen as a function $F$ defined on the vertices of the hypercube $\{0,1\}^n$. It turns out that $F$ may be extended to the full hypercube $[0,1]^n$ by piecewise-linear interpolation, and then to the whole vector space $\mathbb{R}^n$ [4]. This extension $f$ is piecewise linear for any set function $F$, and it is convex if and only if $F$ is submodular [12]. Any piecewise-linear convex function may be represented as the support function of a certain polytope $K$, i.e., as $f(w) = \max_{s \in K} w^\top s$ [14]. For the Lovász extension of a submodular function, there is an explicit description of $K$, which we now review.

Base polytope. We define the base polytope as
$$B(F) = \big\{ s \in \mathbb{R}^n,\ s(V) = F(V),\ \forall A \subseteq V,\ s(A) \le F(A) \big\},$$
where we use the classical notation $s(A) = s^\top 1_A$. A key result in submodular analysis is that the Lovász extension is the support function of $B(F)$, that is, for any $w \in \mathbb{R}^n$,
$$f(w) = \sup_{s \in B(F)} w^\top s. \qquad (8)$$
The maximizers above may be computed in closed form from an ordered level-set representation of $w$ using a "greedy algorithm", which (a) first sorts the elements of $w$ in decreasing order such that $w_{\sigma(1)} \ge \dots \ge w_{\sigma(n)}$, where $\sigma$ represents the order of the elements in $V$; and (b) computes $s_{\sigma(k)} = F(\{\sigma(1), \dots, \sigma(k)\}) - F(\{\sigma(1), \dots, \sigma(k-1)\})$. This leads to a closed-form formula for $f(w)$ and a subgradient.

SFM as a convex optimization problem. A key result from submodular analysis [12] is the equivalence between the SFM problem $\min_{A \subseteq V} F(A)$ and the convex optimization problem $\min_{w \in [0,1]^n} f(w)$. One can then obtain an optimal $A$ from the level sets of an optimal $w$.
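This equivalence is easy to check numerically on a toy instance. The snippet below is a self-contained sketch (the cut function and all names are ours): it brute-forces $\min_A F(A)$ for a 3-node chain cut plus a modular term, and verifies that the Lovász extension never goes below the discrete minimum on the hypercube, with equality at the indicator vector of a minimizer.

```python
import itertools
import numpy as np

# Toy submodular function: cut on the chain 0 - 1 - 2 plus a modular term.
def F(A):
    w01, w12 = 1.0, 2.0
    u = np.array([0.5, -1.0, 0.7])            # modular part
    x = [1 if j in A else 0 for j in range(3)]
    return (w01 * abs(x[0] - x[1]) + w12 * abs(x[1] - x[2])
            - sum(u[j] * x[j] for j in range(3)))

def lovasz(w):
    order = np.argsort(-w)                    # greedy algorithm of Section 2
    val, prefix, prev = 0.0, set(), 0.0
    for k in order:
        prefix.add(k)
        cur = F(prefix)
        val += (cur - prev) * w[k]
        prev = cur
    return val

subsets = [set(c) for m in range(4) for c in itertools.combinations(range(3), m)]
A_star = min(subsets, key=F)                  # brute-force discrete minimum
rng = np.random.default_rng(0)
for _ in range(1000):                         # f(w) >= min_A F(A) on [0,1]^3
    assert lovasz(rng.random(3)) >= F(A_star) - 1e-9
indicator = np.array([1.0 if j in A_star else 0.0 for j in range(3)])
assert abs(lovasz(indicator) - F(A_star)) < 1e-9   # equality at the vertex 1_{A*}
```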
Moreover, this leads to the dual problem $\max_{s \in B(F)} \sum_{j=1}^n \min\{s_j, 0\}$. Note that for our algorithm to work, we need oracles SFMD$_i$ that output both the primal variable ($A$ or $w$) and the dual variable $s \in B(F_i)$.

Convex optimization and its dual. We consider the continuous optimization problem in Eq. (3). Its dual problem, derived using Eq. (8), is given by
$$\max_{s \in B(F)} -\sum_{j=1}^n \psi^*(-s_j). \qquad (9)$$
In this paper, we consider the convex function $\psi : \mathbb{R} \to \mathbb{R}$ defined in Eq. (7). Its Fenchel conjugate is given by
$$\psi^*(s) = \begin{cases} \frac{1}{2} s^2 & \text{if } |s| \le \varepsilon, \\ \varepsilon |s| - \frac{\varepsilon^2}{2} & \text{otherwise}. \end{cases} \qquad (10)$$

3 Fast Submodular Function Minimization with Constrained Total Variation

In this section, we propose an algorithm to solve the continuous optimization problem in Eq. (3) using the minimization oracles SFMD$_i$ of the individual discrete functions in Eq. (5). As a first step, in Section 3.1 we propose a modified divide-and-conquer algorithm to solve the continuous optimization problem SFMC$_i$, for each $i \le r$. In Section 3.2, we use the optimization problems SFMC$_i$ as black boxes to solve the continuous optimization problem in Eq. (3). The overview is provided in Algorithm 1.

Algorithm 1: From SFMD$_i$ to SFMC
1: Input: discrete function minimization oracles for each $F_i : 2^V \to \mathbb{R}$, and $\varepsilon \in \mathbb{R}_+$.
2: Output: optimal primal/dual solutions $(w^\star, s^\star)$ for Eq. (3) and Eq. (9), respectively.
3: for all $i \le r$ do
4:   Optimize SFMC using SFMC$_i$ by applying the algorithms in Section 3.3 and Section 3.4. This requires optimal primal-dual solutions $(w^\star_i, s^\star_i)$ of SFMC$_i$, which may be obtained using Algorithm 2 in Section 3.1, assuming oracles SFMD$_i$.
5: end for

3.1 Single submodular function

For brevity, we drop the subscript $i$ and consider the corresponding primal optimization problem. Algorithm 2 below is an extension of the classical divide-and-conquer algorithm from [29]. Note that it requires access to dual certificates for the SFM problems.

Algorithm 2: From SFMD$_i$ to SFMC$_i$
1: Input: discrete function minimization oracle for $F : 2^V \to \mathbb{R}$, and $\varepsilon \in \mathbb{R}_+$.
2: Output: optimal primal/dual solutions $(w^\star, s^\star)$ for Eq. (3) and Eq. (9), respectively.
3: $A_+ = \operatorname{argmin}_{A \subseteq V} F(A) + \varepsilon|A|$, with a dual certificate $s_+ \in B(F)$.
4: $A_- = \operatorname{argmin}_{A \subseteq V} F(A) - \varepsilon|A|$, with a dual certificate $s_- \in B(F)$ (we must have $A_+ \subseteq A_-$).
5: $w^\star(A_+) = \varepsilon$, $s^\star(A_+) = s_+$; $w^\star(V \setminus A_-) = -\varepsilon$, $s^\star(V \setminus A_-) = s_-$.
6: $U := A_- \setminus A_+$, and define a discrete function $G : 2^U \to \mathbb{R}$ s.t. $G(B) = F(A_+ \cup B) - F(A_+)$, with Lovász extension $g : [0,1]^{|U|} \to \mathbb{R}$.
7: Solve for the optimal solutions of $\min_{w \in \mathbb{R}^{|U|}} g(w) + \frac{1}{2}\|w\|^2$ and its dual using the divide-and-conquer algorithm [16], to obtain $(w^\star_U, s^\star_U)$.
8: $(w^\star(U), s^\star(U)) = (w^\star_U, s^\star_U)$.

Proposition 1. Algorithm 2 gives an optimal primal-dual pair for the optimization problems in Eq. (3) and Eq. (9), respectively.

See the proof in the supplementary material (Section A). Note that the number of steps is at most the number of different values that $w^\star$ may take (the solution $w^\star$ is known to have many equal components [30]). In the worst case, this is still $n$, but in practice many components are equal to $\varepsilon$ or $-\varepsilon$, thus reducing the number of SFM calls (for $\varepsilon$ very close to zero, only two calls are necessary). In Section 5, we show empirically that this is the case: the number of SFM calls decreases significantly as $\varepsilon$ tends to zero.
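It may help to see the thresholding characterization of Eq. (4) in code before moving on. The sketch below is a deliberately naive, grid-based rendering: `sfmd` is a brute-force stand-in for a real SFMD oracle (such as the chain solver from the introduction), and `constrained_prox` recovers $w^\star$ up to grid resolution by sweeping $\alpha$ over $[-\varepsilon, \varepsilon]$ (a linear term $t$ can first be folded into $F$ as $F(A) - t(A)$). Algorithm 2 computes the exact solution with far fewer oracle calls; all names and interfaces here are ours.

```python
import itertools
import numpy as np

def sfmd(F, V, alpha):
    """Brute-force stand-in for an SFMD oracle: argmin_A F(A) + alpha*|A|.
    Ties are broken toward the larger set, so that alpha -> A_alpha is the
    largest minimizer and the map stays monotone in alpha."""
    best, best_val = set(), 0.0                # F is normalized: F(empty) = 0
    for m in range(len(V) + 1):
        for c in itertools.combinations(sorted(V), m):
            A = set(c)
            val = F(A) + alpha * len(A)
            if val < best_val - 1e-12 or (abs(val - best_val) <= 1e-12
                                          and len(A) > len(best)):
                best, best_val = A, val
    return best

def constrained_prox(F, V, eps, grid=201):
    """Didactic version of SFMC via the thresholding characterization
    w_j = sup{alpha in [-eps, eps] : j in A_alpha}, up to grid resolution."""
    w = {j: -eps for j in V}                   # components outside A_{-eps} stay at -eps
    for alpha in np.linspace(-eps, eps, grid):
        A = sfmd(F, V, alpha)
        for j in A:
            w[j] = alpha                       # last (largest) alpha with j in A_alpha
    return w
```

In practice, `sfmd` would of course be a specialized solver (e.g., a graph-cut routine for cut functions) rather than brute force, and the sweep over $\alpha$ is exactly what Algorithm 2 replaces by two endpoint calls plus a recursion on the reduced set $U$.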
3.2 Sum of submodular functions

In this section, we consider the optimization problem in Eq. (3) with the function $\psi$ from Eq. (7). The primal optimization problem is given by
$$\min_{w \in [-\varepsilon, \varepsilon]^n} \sum_{i=1}^r f_i(w) + \frac{1}{2}\|w\|_2^2. \qquad (11)$$
In order to derive a dual problem with the appropriate structure, we consider the functions $g_i$ defined as $g_i(w) = f_i(w)$ if $\|w\|_\infty \le \varepsilon$, and $+\infty$ otherwise, with the Fenchel conjugates
$$g_i^*(s_i) = \sup_{w \in [-\varepsilon,\varepsilon]^n} w^\top s_i - f_i(w) = \inf_{t_i \in B(F_i)} \sup_{w \in [-\varepsilon,\varepsilon]^n} w^\top (s_i - t_i) = \inf_{t_i \in B(F_i)} \varepsilon \|s_i - t_i\|_1.$$
Therefore, we can derive the following dual:
$$\min_{w \in [-\varepsilon,\varepsilon]^n} \sum_{i=1}^r f_i(w) + \frac{1}{2}\|w\|_2^2
= \min_{w \in \mathbb{R}^n} \sum_{i=1}^r g_i(w) + \frac{1}{2}\|w\|_2^2
= \min_{w \in \mathbb{R}^n} \sum_{i=1}^r \max_{s_i \in \mathbb{R}^n} \big( w^\top s_i - g_i^*(s_i) \big) + \frac{1}{2}\|w\|_2^2
= \max_{(s_1, \dots, s_r) \in \mathbb{R}^{n \times r}} -\sum_{i=1}^r g_i^*(s_i) - \frac{1}{2}\Big\| \sum_{i=1}^r s_i \Big\|_2^2. \qquad (12)$$
We are now faced with a similar optimization problem to previous work [15], where the primal problem is equivalent to computing the proximity operator of the sum of functions $g_1 + g_2$. The main difference is that when $\varepsilon$ is infinite (i.e., with no constraints), the dual functions $g_i^*$ are indicator functions of the base polytopes $B(F_i)$, and the dual problem in Eq. (12) can be seen as finding the distance between two polytopes. This is not the case for our constrained functions, which limits the choice of algorithms. In this paper, we consider block-coordinate ascent (which was already considered in [15], leading to alternating projection algorithms) and a novel recent accelerated coordinate descent algorithm [27]. We could also consider (accelerated) proximal gradient descent on the dual problem in Eq. (12), but it was shown empirically to be worse than alternating reflections [15] (which we compare to in experiments, but which we cannot readily extend without adding a new hyperparameter).

3.3 Optimization algorithms for all r

All of our algorithms rely on computing the proximity operators of the functions $g_i^*$, which we now consider.

Proximity operator. The key component we will need is the so-called proximal operator of $g_i^*$, that is, being able to compute efficiently, for a certain $\lambda$,
$$\min_{s_i \in \mathbb{R}^n} g_i^*(s_i) + \frac{1}{2\lambda}\|s_i - t_i\|_2^2.$$
Using the classical Moreau identity [31], this is equivalent to solving
$$\min_{s_i \in \mathbb{R}^n} g_i^*(s_i) + \frac{1}{2\lambda}\|s_i - t_i\|_2^2
= \min_{s_i \in \mathbb{R}^n} \max_{w_i \in \mathbb{R}^n} w_i^\top s_i - g_i(w_i) + \frac{1}{2\lambda}\|s_i\|_2^2 + \frac{1}{2\lambda}\|t_i\|_2^2 - \frac{1}{\lambda} s_i^\top t_i
= \max_{w_i \in \mathbb{R}^n} -g_i(w_i) - \frac{\lambda}{2}\Big\|w_i - \frac{1}{\lambda} t_i\Big\|_2^2 + \frac{1}{2\lambda}\|t_i\|_2^2
= \max_{w_i \in \mathbb{R}^n} -g_i(w_i) + w_i^\top t_i - \frac{\lambda}{2}\|w_i\|_2^2.$$
This is exactly the oracle SFMC$_i$, for which we presented in Section 3.1 an algorithm using only the discrete oracles SFMD$_i$.

Block-coordinate ascent. We consider the following iteration, for $i = 1, \dots, r$:
$$s_i^{\text{new}} = \operatorname*{argmin}_{s \in \mathbb{R}^n}\ g_i^*(s) + \frac{1}{2}\Big\| \sum_{j=1}^{i-1} s_j^{\text{new}} + s + \sum_{j=i+1}^{r} s_j \Big\|_2^2,$$
which is exactly block-coordinate ascent on the dual problem in Eq. (12). Since the non-smooth function $\sum_{i=1}^r g_i^*(s_i)$ is separable, the iteration is globally convergent, with a convergence rate at least equal to $O(1/t)$, where $t$ is the number of iterations (see, e.g., Theorem 6.7 of [32]).

3.4 Acceleration for the special case r = 2

When there are only two functions, following [27], the problem in Eq. (12) can be written as
$$\max_{s_1 \in \mathbb{R}^n} -g_1^*(s_1) - h_1(s_1), \qquad (13)$$
with
$$h_1(s_1) = \inf_{s_2 \in \mathbb{R}^n} g_2^*(s_2) + \frac{1}{2}\|s_1 + s_2\|^2
= \sup_{w_2 \in \mathbb{R}^n} \inf_{s_2 \in \mathbb{R}^n} w_2^\top s_2 - g_2(w_2) + \frac{1}{2}\|s_1 + s_2\|^2$$
$$= \sup_{w_2 \in \mathbb{R}^n} -g_2(w_2) + \frac{1}{2}\|s_1\|_2^2 - \frac{1}{2}\|w_2 + s_1\|_2^2, \quad \text{with } s_2 = -w_2 - s_1,$$
$$= \sup_{w_2 \in \mathbb{R}^n} -g_2(w_2) - \frac{1}{2}\|w_2\|_2^2 - w_2^\top s_1
= \sup_{w_2 \in [-\varepsilon,\varepsilon]^n} -f_2(w_2) - w_2^\top s_1 - \frac{1}{2}\|w_2\|_2^2.$$
The function $h_1$ is 1-smooth with gradient equal to $h_1'(s_1) = s_1 + s_2^\star(s_1)$. Applying proximal gradient to the problem of maximizing $\max_{s_1 \in \mathbb{R}^n} -g_1^*(s_1) - h_1(s_1)$ leads to the iteration
$$s_2^{\text{new}} = \operatorname*{argmin}_{s_2 \in \mathbb{R}^n}\ g_2^*(s_2) + \frac{1}{2}\|s_1 + s_2\|^2,$$
$$s_1^{\text{new}} = \operatorname*{argmin}_{s \in \mathbb{R}^n}\ g_1^*(s) + \frac{1}{2}\|s - s_1\|_2^2 + h_1'(s_1)^\top (s - s_1)
= \operatorname*{argmin}_{s \in \mathbb{R}^n}\ g_1^*(s) + \frac{1}{2}\|s + s_2^{\text{new}}\|_2^2,$$
which is exactly block-coordinate descent. Each of these steps uses exactly the same oracle as before. We can now accelerate it using FISTA.
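A schematic of this accelerated variant is sketched below, with FISTA momentum wrapped around the two block updates. The prox maps are assumed to be supplied externally (each one is an SFMC oracle, computable from SFMD$_i$ via Algorithm 2, as shown in Section 3.3); the wiring is our sketch under those assumptions, not the paper's reference implementation.

```python
import numpy as np

def accelerated_two_block(prox_g1_star, prox_g2_star, n, num_iters=200):
    """FISTA-accelerated proximal gradient on Eq. (13); the unaccelerated
    version coincides with the block updates of Section 3.4.

    prox_gi_star(t) must return argmin_s g_i^*(s) + 0.5 * ||s - t||^2,
    which is an SFMC oracle (assumed given, e.g., via Algorithm 2).
    """
    s1 = np.zeros(n)           # dual block 1
    s2 = np.zeros(n)           # dual block 2
    y = s1.copy()              # extrapolated point
    t = 1.0                    # FISTA momentum scalar
    for _ in range(num_iters):
        # s2-update at the extrapolated point:
        #   argmin_{s2} g2*(s2) + 0.5 * ||y + s2||^2  =  prox_{g2*}(-y)
        s2 = prox_g2_star(-y)
        # proximal gradient step on g1* (step size 1, since h1 is 1-smooth):
        #   argmin_{s1} g1*(s1) + 0.5 * ||s1 + s2||^2  =  prox_{g1*}(-s2)
        s1_new = prox_g1_star(-s2)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = s1_new + ((t - 1.0) / t_new) * (s1_new - s1)   # momentum step
        s1, t = s1_new, t_new
    w = -(s1 + s2)             # primal iterate recovered from the dual blocks
    return w, (s1, s2)
```

The primal recovery $w = -(s_1 + s_2)$ follows from the coupling term $\frac{1}{2}\|\sum_i s_i\|_2^2$ in Eq. (12).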
