Relative Attributes

Devi Parikh (Toyota Technological Institute at Chicago (TTIC))
Kristen Grauman (University of Texas at Austin)

Abstract

Human-nameable visual "attributes" can benefit various recognition tasks. However, existing techniques restrict these properties to categorical labels (for example, a person is smiling or not, a scene is dry or not), and thus fail to capture more general semantic relationships. We propose to model relative attributes. Given training data stating how object/scene categories relate according to different attributes, we learn a ranking function per attribute. The learned ranking functions predict the relative strength of each property in novel images. We then build a generative model over the joint space of attribute ranking outputs, and propose a novel form of zero-shot learning in which the supervisor relates the unseen object category to previously seen objects via attributes (for example, bears are furrier than giraffes). We further show how the proposed relative attributes enable richer textual descriptions for new images, which in practice are more precise for human interpretation. We demonstrate the approach on datasets of faces and natural scenes, and show its clear advantages over traditional attribute prediction for these new tasks.

1. Introduction

While traditional visual recognition approaches map low-level image features directly to object category labels, recent work proposes models using visual attributes [1-8]. Attributes are properties observable in images that have human-designated names (e.g., striped, four-legged), and they are valuable as a new semantic cue in various problems.
For example, researchers have shown their impact for strengthening facial verification [5], object recognition [6, 8, 16], generating descriptions of unfamiliar objects [1], and facilitating "zero-shot" transfer learning [2], where one trains a classifier for an unseen object simply by specifying which attributes it has.

Problem: Most existing work focuses wholly on attributes as binary predicates indicating the presence (or absence) of a certain property in an image [1-8, 16]. This may suffice for part-based attributes (e.g., has a head) and some binary properties (e.g., spotted). However, for a large variety of attributes, not only is this binary setting restrictive, but it is also unnatural. For instance, it is not clear if in Figure 1(b) Hugh Laurie is smiling or not; different people are likely to respond inconsistently in providing the presence or absence of the smiling attribute for this image, or of the natural attribute for Figure 1(e). Indeed, we observe that relative visual properties are a semantically rich way by which humans describe and compare objects in the world. They are necessary, for instance, to refine an identifying description ("the rounder pillow"; "the same except bluer"), or to situate with respect to reference objects ("brighter than a candle; dimmer than a flashlight"). Furthermore, they have the potential to enhance active and interactive learning: for instance, offering a better guide for visual search ("find me similar shoes, but shinier", or "refine the retrieved images of downtown Chicago to those taken on sunnier days").

Proposal: In this work, we propose to model relative attributes. As opposed to predicting the presence of an attribute, a relative attribute indicates the strength of an attribute in an image with respect to other images.
For example, in Figure 1, while it is difficult to assign a meaningful value to the binary attribute smiling, we could all agree on the relative attribute, i.e. Hugh Laurie is smiling less than Scarlett Johansson, but more than Jared Leto. In addition to being more natural, relative attributes would offer a richer mode of communication, thus allowing access to more detailed supervision (and so potentially higher recognition accuracy), as well as the ability to generate more informative descriptions of novel images.

How can we learn relative properties? Whereas traditional supervised classification is appropriate to learn attributes that are intrinsically binary, it falls short when we want to represent visual properties that are nameable but not categorical. Our goal is instead to estimate the degree of that attribute's presence, which, importantly, differs from the probability of a binary classifier's prediction. To this end, we devise an approach that learns a ranking function for each attribute, given relative similarity constraints on pairs of examples (or, more generally, a partial ordering on some examples). The learned ranking function can estimate a real-valued rank (throughout this paper we refer to rank as a real-valued score) for images, indicating the relative strength of the attribute's presence in them. Then, we introduce novel forms of zero-shot learning and description that exploit the relative attribute predictions.

The proposed ranking approach accounts for a subtle but important difference between relative attributes and conceivable alternatives based on regression or multi-way classification. While such alternatives could also allow for a richer vocabulary, during training they could suffer from similar inconsistencies as binary attributes.
For example, it is more difficult to define, and perhaps more importantly agree on, "With what strength is he smiling?" than "Is he smiling more than she is?" Thus, we expect the relative mode of supervision our approach permits to be more natural and consistent for human labelers.

Contributions: Our main contribution is the idea to learn relative visual attributes, which to our knowledge has not been explored in any prior work. Our other contribution is to devise and demonstrate two new tasks well-served by relative attributes: (1) zero-shot learning from relative comparisons, and (2) image description in reference to example images or categories. We demonstrate the approach for both tasks using the Outdoor Scenes dataset [11] and a subset of the Public Figures Face Database [12]. We find that relative attributes yield significantly better zero-shot learning accuracy when compared to their binary counterparts. In addition, we conduct human subject studies to evaluate the informativeness of automatically generated image descriptions, and find that relative attributes are clearly more powerful than existing binary attributes in uniquely identifying an image.

2. Related Work

We review related work on visual attributes, other uses of relative cues, and methods for learning comparisons.

Binary attributes: Learning attribute categories allows prediction of color or texture types [13], and can also provide a mid-level cue for object or face recognition [2, 5, 8]. Beyond object recognition, the semantics intrinsic to attributes enable zero-shot transfer [2, 6, 14], or description and part localization [1, 7, 15]. Rather than manually define attribute vocabularies, some work aims to discover attribute-related concepts on the Web [3, 4], extract them from existing knowledge sources [6, 16], or discover them interactively [9].
In contrast to our approach, all such methods restrict the attributes to be categorical (and in fact, binary).

Relative information: Relative information has been explored in vision in a variety of ways. Recent work on large-scale recognition exploits WordNet-based information to specify a semantic-distance-sensitive classifier [18], or to make do with few labels by sharing training images among semantically similar classes [17]. Stemming from a related motivation of limited labeled data, Wang et al. [19] make use of explicit similarity-based supervision such as "A serval is like a leopard" or "A zebra is similar to the crosswalk in texture" to share training instances for categories with limited or no training instances. Unlike our approach, that method learns a model for each object category, and does not model attributes. In contrast, our attribute models are category-independent and transferable, enabling relative descriptions between all classes. Moreover, whereas that technique captures similarity among object categories, ours models a general ordering of the images sorted by the strength of their attributes, as well as a joint space over multiple such relative attributes.

Kumar et al. [12] explore comparative facial attributes such as "lips like Barack Obama" for face verification. These attributes, although comparative, are also modeled as binary classifiers, and are similarity-based as opposed to an ordering. Gupta et al. [20] and Siddiquie et al. [21] use prepositions and adjectives to relate objects to each other for more effective contextual modeling and active learning, respectively.
In contrast, our work involves relative modeling of attribute strengths for a richer vocabulary that enhances supervision and description of images.

Learning to rank: Learning to rank has received extensive attention in the machine learning literature [22-24], for information retrieval in general and image retrieval in particular [25, 26]. Given a query image, user preferences (often captured via click data) are incorporated to learn a ranking function with the goal of retrieving more relevant images in the top search results. Learned distance metrics (e.g., [27, 28]) can induce a ranking on images; however, this ranking is also specific to a query image, and typically intended for nearest-neighbor-based classifiers. Our work learns a ranking function on images based on constraints specifying the relative strength of attributes, and the resulting function is not relative to any other image in the dataset. Thus, unlike query-centric retrieval tasks, we can characterize individual images by the strength of the attributes present, which we show is valuable for new recognition and description applications.

3. Approach

We first present our approach for learning relative attributes (Section 3.1), and then explain how we can use relative attributes for enhanced zero-shot learning (Section 3.2) and image description generation (Section 3.3).

3.1. Learning Relative Attributes

We are given a set of training images $I = \{i\}$ represented in $\mathbb{R}^n$ by feature vectors $\{x_i\}$, and a set of attributes $A = \{a_m\}$. In addition, for each attribute $a_m$, we are given a set of ordered pairs of images $O_m = \{(i, j)\}$ and a set of un-ordered pairs $S_m = \{(i, j)\}$ such that $(i, j) \in O_m \Rightarrow i \succ j$, i.e. image $i$ has a stronger presence of attribute $a_m$ than image $j$, and $(i, j) \in S_m \Rightarrow i \sim j$, i.e. $i$ and $j$ have similar relative strength of $a_m$. We note that $O_m$ and $S_m$ can be deduced from any partial ordering of the images in the training data with respect to the strength of $a_m$.
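As a concrete illustration (not from the paper), the note that $O_m$ and $S_m$ can be deduced from a partial ordering can be sketched in Python; `pairs_from_levels` is a hypothetical helper that takes one attribute-strength level per training image:

```python
from itertools import combinations

def pairs_from_levels(levels):
    """Derive ordered pairs O_m and un-ordered pairs S_m from a partial
    ordering given as an attribute-strength level per training image.
    A higher level means a stronger presence of the attribute; equal
    levels mean similar strength."""
    O, S = [], []
    for i, j in combinations(range(len(levels)), 2):
        if levels[i] > levels[j]:
            O.append((i, j))   # i ≻ j for this attribute
        elif levels[i] < levels[j]:
            O.append((j, i))   # j ≻ i
        else:
            S.append((i, j))   # i ~ j
    return O, S
```

Images whose relative order is unknown would simply be left out of both sets, so the sketch assumes a total ordering over the images it is given.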
Either $O_m$ or $S_m$, but not both, can be empty. Our goal is to learn $M$ ranking functions

$r_m(x_i) = w_m^T x_i$ (1)

for $m = 1, \dots, M$, such that the maximum number of the following constraints is satisfied:

$\forall (i, j) \in O_m : \ w_m^T x_i > w_m^T x_j$ (2)
$\forall (i, j) \in S_m : \ w_m^T x_i = w_m^T x_j$ (3)

[Annotation: Why should the ranking function necessarily take this linear form? The result of learning is one coefficient vector, i.e. one hyperplane, per attribute.]

While this is an NP-hard problem [22], it is possible to approximate the solution with the introduction of non-negative slack variables, similar to SVM classification. We directly adapt the formulation proposed in [22], which was originally applied to web ranking, except we use a quadratic loss function together with similarity constraints, leading to the following optimization problem:

minimize $\ \frac{1}{2} \|w_m\|_2^2 + C \big( \sum \xi_{ij}^2 + \sum \gamma_{ij}^2 \big)$ (4)
s.t. $\ w_m^T (x_i - x_j) \ge 1 - \xi_{ij}; \ \forall (i, j) \in O_m$ (5)
$\ |w_m^T (x_i - x_j)| \le \gamma_{ij}; \ \forall (i, j) \in S_m$ (6)
$\ \xi_{ij} \ge 0; \ \gamma_{ij} \ge 0$ (7)

[Annotation: See [22] to understand how (1) is turned into the optimization problem (4); it does not seem problematic, it is essentially an SVM.]

Rearranging the constraints reveals that the above formulation, without the similarity constraints in Eqn. (6), is quite similar to the SVM problem, but on pairwise difference vectors:

minimize $\ \frac{1}{2} \|w_m\|_2^2 + C \sum \xi_{ij}^2 \quad$ s.t. $\ w_m^T (x_i - x_j) \ge 1 - \xi_{ij}; \ \xi_{ij} \ge 0$ (8)

where $C$ is the trade-off constant between maximizing the margin and satisfying the pairwise relative constraints. We solve the above primal problem using Newton's method [29]. While we use a linear ranking function in our experiments, the above formulation can be easily extended to kernels. [Annotation: If the linear ranking function extends so easily to kernels, why did the authors not do so themselves? Would the results be better?]

We note that this learning-to-rank formulation learns a function that explicitly enforces a desired ordering on the training images; the margin is the distance between the closest two projections within all desired (training) rankings. In contrast, if one were to train a binary classifier, only the margin between the nearest binary-labeled examples is enforced; ordering among examples beyond those defining the margin is arbitrary. See Figure 2.
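To make the ranking objective concrete, the following is a minimal numpy sketch that fits $w_m$ from ordered pairs by gradient descent on the squared-hinge primal; the paper instead solves the primal with Newton's method [29] and also handles similarity pairs, so `train_ranker`, its hyperparameters, and the toy data below are illustrative assumptions only:

```python
import numpy as np

def train_ranker(X, ordered_pairs, n_iters=2000, lr=0.01, C=0.1):
    """Learn w for r(x) = w^T x from ordered pairs (i, j), where (i, j)
    means image i has a stronger presence of the attribute than image j.
    Minimizes 1/2 ||w||^2 + C * sum of squared hinge losses on the
    pairwise difference vectors (similarity pairs omitted)."""
    w = np.zeros(X.shape[1])
    D = np.array([X[i] - X[j] for i, j in ordered_pairs])  # difference vectors
    for _ in range(n_iters):
        margins = D @ w                         # w^T (x_i - x_j)
        slack = np.maximum(0.0, 1.0 - margins)  # active squared-hinge terms
        grad = w - 2.0 * C * (slack @ D)        # gradient of the objective
        w -= lr * grad
    return w
```

Given a learned `w`, the real-valued rank of any image is just `x @ w`, which is what lets the method order novel images by attribute strength.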
Our experiments confirm this distinction does indeed matter in practice, as our learnt ranking function is more effective at capturing the relative strengths of the attributes than the score of a binary classifier (i.e., the magnitude of the SVM decision function).

In addition, training with comparisons (image $i$ is similar to image $j$ in terms of attribute $a_m$, or image $i$ exhibits $a_m$ less than image $j$) is well-suited to the task at hand. [Annotation: Which task is meant here?] Attribute strengths are arguably more natural to express in relative terms, as opposed to requiring absolute judgments in isolation (i.e., stating that an image represents the attribute with degree 10).

We apply our learnt relative attributes in two new settings: (1) zero-shot learning with relative relationships and (2) generating image descriptions. We now introduce our approach to incorporate relative attributes for each of these applications in turn.

3.2. Zero-Shot Learning from Relationships

Consider $N$ categories of interest. For example, each category may be an object class, or a type of scene. During training, $S$ of these categories are seen categories for which training images are provided, while the remaining $U = N - S$ categories are unseen, for which no training images are provided. The seen categories are described via relative attributes with respect to each other, be it pairwise relationships or partial orders. For example, "bears are furrier than giraffes but less furry than rabbits", "lions are larger than dogs, as large as tigers, but less large than elephants", etc. We note that not all pairs of categories need be related in the supervision, and different subsets of categories can be related for the different attributes. The unseen categories, on the other hand, are described relative to one or two seen categories for a subset of the attributes, i.e. an unseen class $\hat{c}$ can be described for attribute $a_m$ as $c_i \prec \hat{c} \prec c_j$, or $\hat{c} \succ c_i$, or $\hat{c} \prec c_j$, where $c_i$ and $c_j$ are seen categories. [Annotation: The descriptions of seen/unseen classes are prior knowledge, presumably entered by hand? Apparently so.]
We note the simple and flexible supervision required for categories, especially the unseen ones: for any attribute (not necessarily all), the user can select any seen category depicting a stronger and/or weaker presence of the attribute to which to relate the unseen category. [Annotation: "Supervision" appears many times; here it simply means the human-provided prior knowledge.] While alternative list-based learning-to-rank techniques are available [23], we choose the pairwise learning technique described in Section 3.1 because it keeps this form of supervision easy to provide.

During testing, a novel image is to be classified into any of the $N$ categories. Our zero-shot learning setting is more general than the model proposed by Lampert et al. [2], in that the supervisor not only associates attributes with categories, but also expresses how the categories relate along any number of the attributes. [Annotation: The supervisor, then, is a person, and supervision refers to the process of describing the relative attributes between categories. If so, applying this zero-shot method to human pose estimation would be difficult!] We expect this richer representation to allow better divisions between both the unseen and seen categories, as we demonstrate in the experiments.

We propagate the category relationships provided during training to the corresponding images, i.e., if $c_p \succ c_q$ for attribute $a_m$ among the seen classes, then $(i, j) \in O_m$ for all training images $i \in c_p$ and $j \in c_q$ (and similarly for $S_m$). This generalizes naturally to allow stronger supervision per image instance, when available. We then learn all $M$ relative attributes as described in Section 3.1. Predicting the real-valued rank of all images in the training dataset allows us to transform the data such that each image $i$ is now represented as an $M$-dimensional vector $[r_1(x_i), \dots, r_M(x_i)]$ indicating its rank scores for all attributes. We now build a generative model for each of the seen categories in this relative-attribute space.
We use a Gaussian distribution, and estimate the mean $\mu_c$ and covariance matrix $\Sigma_c$ from the ranking scores of the training images from class $c$, so we have $\mathcal{N}(\mu_c, \Sigma_c)$ for $c = 1, \dots, S$.

The parameters of the generative model corresponding to each of the unseen categories are selected under the guidance of the input relative descriptions. In particular, given an unseen category $\hat{c}$, we employ the following:

1. If $\hat{c}$ is described as $c_i \prec \hat{c} \prec c_j$ (for attribute $a_m$) [Annotation: the original text seems to omit this clause; added by the annotator], where $c_i$ and $c_j$ are seen categories, then we set the $m$-th component of the mean to $\mu_{\hat{c}}^m = \frac{1}{2}(\mu_{c_i}^m + \mu_{c_j}^m)$.
2. If $\hat{c}$ is described as $\hat{c} \succ c_i$, we set $\mu_{\hat{c}}^m$ to $\mu_{c_i}^m + d_m$, where $d_m$ is the average distance between the sorted mean ranking scores $\mu_c^m$ of the seen classes for attribute $a_m$. It is reasonable to expect the unseen class to be as far from the specified seen class as other seen classes tend to be from each other.
3. Similarly, if $\hat{c}$ is described as $\hat{c} \prec c_j$, we set $\mu_{\hat{c}}^m$ to $\mu_{c_j}^m - d_m$.
4. If $a_m$ is not used to describe $\hat{c}$, we set $\mu_{\hat{c}}^m$ to be the mean rank across all training images.
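The seen-class Gaussians, one of the unseen-mean rules, and maximum-likelihood classification can be sketched in numpy as follows. This is a minimal sketch under stated assumptions: the rank-score vectors are assumed already computed, the function names and the small covariance regularizer are illustrative, and the covariance chosen for unseen classes is not covered by the text above:

```python
import numpy as np

def fit_seen_class(R, labels, c):
    """Mean and covariance of the M-dim rank-score vectors of seen class c."""
    Rc = R[labels == c]
    return Rc.mean(axis=0), np.cov(Rc, rowvar=False)

def unseen_mean_weaker(seen_means_m, s_mean_m):
    """Rule 3 above: an unseen class described as weaker than seen class s
    in attribute m gets mean mu_s^m - d_m, with d_m the average gap
    between the sorted seen-class means for that attribute."""
    d_m = np.diff(np.sort(seen_means_m)).mean()
    return s_mean_m - d_m

def classify(r, means, covs):
    """Assign rank vector r to the class whose Gaussian gives the highest
    log-likelihood (constant terms dropped)."""
    best, best_ll = None, -np.inf
    for c in means:
        cov = covs[c] + 1e-6 * np.eye(len(r))  # small ridge for stability
        diff = r - means[c]
        ll = -0.5 * (diff @ np.linalg.solve(cov, diff)
                     + np.log(np.linalg.det(cov)))
        if ll > best_ll:
            best, best_ll = c, ll
    return best
```

At test time, a novel image is mapped to its M-dimensional rank vector and handed to `classify` over the union of seen and unseen class Gaussians.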
