Reactive Interaction Through Body Motion and the Phase-State-Machine

Raphael Deimel

Abstract: Between humans, body motion is an intuitive and ubiquitous means to coordinate interactions, but so far collaborative robots have not embraced this mode of communication. One reason for this is that conventional behavior generation systems use finite state machines, which make it exceptionally difficult to ingest and produce the inherently continuous and concurrent information flow that body motion provides. We propose a new, reactive motion generator based on a dynamical system instead. It mimics a conventional state machine, except that transitions are not instantaneous but time-extended, reversible, and have phases. Moreover, consecutive states and transitions may overlap during execution. Most notably, more than one transition can be active at the same time if they share a predecessor state. Together, these unique features enable instantaneous and gradual responses to novel information and continuous decision processes without changing the state graph itself. We demonstrate the system's capabilities in an object handover task coordinated by body motion.

I. INTRODUCTION

Body language, the use of body pose and especially motion for the purpose of communication, is a fast, intuitive and prevalent modality for negotiating courses of action in human-human interaction, especially for collaborative tasks [11], [3], [5]. Because of its ubiquity in human-human interaction, even untrained users could quickly learn to interact with robots if they communicate with body language. To do so, robots need to modulate properties ascribed to motion [3], react visibly, and especially react in a timely manner to signals by their interaction partners. As an example, consider the task of handing over an object. It can usually be achieved sufficiently well in several distinct ways, e.g. left-handed or right-handed. Both parties have to agree upon a mutually consistent course of action in order to succeed, though [11].
Traditionally, robots determine this course of action at specific points in time (discrete decision events). Decisions are always made prior to executing an action, and a decision is not reassessed until after the action is completed. This behavior is afforded by the use of discrete state machines (e.g. hybrid automata, MDPs, grid worlds) for partitioning interactions. Their state graphs provide numerous advantages for learning, reasoning and planning, especially when the number of states is small. But they also impose a coarse discretization of time¹, which makes them particularly unsuited for reacting timely and smoothly to continuous streams of perceptual information.

¹ While one could argue that this could be counteracted by finer-grained discretization, we would also lose all advantages of a small state graph.

The author is with the Control Systems Laboratory, Technische Universität Berlin, Germany. We gratefully acknowledge the financial support by BMBF for the project MTI-engAge (16SV7109).

2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, November 4-8, 2019. 978-1-7281-4003-2/19/$31.00 ©2019 IEEE

Fig. 1: Example of facilitating reactive interaction using non-instantaneous state transitions. At first, the robot does not observe any pose that indicates whether the left or right hand is preferred. To signal its intent to pass the object, it starts to move. Indecision is conveyed by progressing slowly and mixing both possible reach motions. The human notices the robot's intent and reaches out, signaling a preferred side. The robot disambiguates its reach motion quickly to acknowledge the human's proposal.

Conventional state machines are also unable to perform a speculative execution of actions, i.e. they don't have the ability to start a reversible action (e.g. reach out for handover) for the purpose of proposing that course of action to the interaction partner without committing to its completion.
With speculative execution, the robot buys itself time to observe corroborating or contradicting body motion to reaffirm or reconsider its preliminary choice. If it guessed correctly the first time (a probable scenario due to cultural norms or frequent repetition), then no extra time is spent on communication, and the interaction is fluent and swift. Timely and proportional feedback via motions also aids the understandability of robots [4]. Further, body language is amenable to imitation, i.e. humans can discover ways to interact with a robot by observing others interact with it. Body-motion-aware robots may therefore be more intuitive to use for non-expert users.

Modulation of a body motion can also be used to communicate roles in interactions. The robot can claim a leading role in an interaction by moving decisively and unambiguously. Conversely, hesitant or ambiguous motion signals a desire to follow. While it is easier to build a robot that never considers any of the human's preferences, humans will likely cease to collaborate with such a robot eventually. In [1], Breazeal notes that "social interaction is a dance", a phrase which underlines the importance of mutual adaptation during interactions.

To advance robots towards this vision, we present a novel system architecture for robot behavior synthesis, one which globally behaves like a discrete state machine but actually is a continuous dynamical system where transitions are reversible, have a phase, and are nonexclusive, i.e. transition alternatives can be co-activated for blending motion. The system also provides consistent activation weights and phase values for each transition and state, so that rich body motion can be synthesized from a small set of motion primitives and poses.

Potential alternatives for generating behavior are MDP-based models and hybrid automata. POMDPs are able to encode some form of gradual behavior, e.g. by using the expectation of states for blending goals [7], [3].
But they do not provide reversibility, which is required to implement speculative execution, and do not provide a notion of time for continuous synchronization. With hybrid automata [12], decisions too are instantaneous and irreversible. Because of this limitation, controllers in actual robot systems often connect with perception directly. But this fragments interaction state and timing across several system levels, besides corrupting the hybrid automaton formalism.

In the paper, we will first describe a system that can reconcile this fragmentation, analyze its unique features w.r.t. discrete state machines, and finally demonstrate how it enables a robot to synthesize timely and gradual feedback from a small interaction state graph.

II. METHOD

Fig. 2: Illustration of the vector field of a canonical system with three dimensions, three saddle points and three heteroclinic channels (axes $x_0$, $x_1$, $x_2$, each ranging from 0.0 to 1.0). Saddle points are each located on their own coordinate axis, and connected into a cycle 1 → 2 → 3 → 1. Every channel is located on its own plane spanned by two coordinate axes.

The system builds upon the work on stable heteroclinic channel (SHC) networks [6], [9]. SHC networks are dynamical systems that have a number of saddle points which can be arbitrarily connected with self-stabilizing limit cycles (stable heteroclinic channels). Fig. 2 illustrates the attractor of the simplest possible SHC network, composed of three saddle points, each located on its exclusive coordinate axis. If the saddle points are interpreted as states, then SHC networks can be understood to act like a state machine and can be used as such [6].
In this paper, we additionally interpret the heteroclinic channels to represent reversible, nonexclusive, time-extended transitions between states, propose a method to algebraically partition the state space into individual states and transitions, and a method to compute phase variables for each individual transition. Further, we propose a modified system equation that provides a so-called greediness vector that modifies transitioning behavior.

Fig. 3: Illustration of the activation and phase function in the plane of a channel (axes $x_0$, $x_1$, $x_2$, each ranging from 0 to 1). (a) Illustration of activation $\Lambda_{ji}$ in the plane of the related transition $i \to j$. (b) Illustration of phase $\Phi_{ji}$ in the plane of the related transition $i \to j$. In both panels, the grey line indicates the location of the heteroclinic channel for $\gamma = 2$.

A. Formal Definition

Let $x$ be an $n$-dimensional vector that evolves according to this differential equation:

$$\dot{x} = x \circ \left(\alpha + \left(\rho_0 + \rho_\Delta \circ (T + G)\right) \cdot x^{\gamma}\right) \eta(t) + \delta(t) \tag{1}$$

where $\circ$ denotes element-wise multiplication. Compared to the equation used in [6], we added the exponent $\gamma$, made explicit the state transition matrix $T$ ($T_{ji} = 1$ if transition $i \to j$ exists, 0 otherwise), added a "greediness" matrix $G$, and added a scalar $\eta(t)$ which adjusts the speed at which $x$ evolves. The matrix $\rho_0$ is chosen such that $n$ mutually inhibiting saddle points occur, each one placed on its exclusive coordinate axis, while adding $T$ creates stable heteroclinic channels. The signal $\delta(t)$ is used to push the system away from saddle points.

a) Matrices $\rho_0$ and $\rho_\Delta$: The $n \times n$ matrices $\rho_0$ and $\rho_\Delta$ are constructed from three $n$-dimensional parameter vectors [6]: $\alpha$ (growth rates), $\beta$ (saddle point positions), and $\nu$ (saddle point shapes):

$$\rho_0 = \left(\alpha \otimes \beta^{-1}\right) \circ \left(I - 1 - \alpha^{-1} \otimes \alpha\right) \tag{2}$$

$$\rho_\Delta = \left(\,\cdots\,(1 + \cdots)\,\cdots\,\right) \tag{3}$$

where $\otimes$ denotes the outer vector product. The matrices are chosen such that the matrix $\rho$ constructed by Eq. 5 in [6] can be computed as $\rho = -\rho_0 - \rho_\Delta \circ T$.
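As a rough illustration of how the saddle-point matrix of Eq. 2 could be assembled, the sketch below builds it element by element in plain Python. It assumes the reading $\rho_0 = (\alpha \otimes \beta^{-1}) \circ (I - 1 - \alpha^{-1} \otimes \alpha)$, with signs chosen so that each saddle point is an equilibrium of Eq. 1 at $x_i^{\gamma} = \beta_i$. The function name `rho_zero` and the nested-list matrix layout are illustrative choices, not taken from the paper.

```python
def rho_zero(alpha, beta):
    """Assumed reading of Eq. 2, element-wise:
    rho_0[j][i] = alpha[j] / beta[i] * (I[j][i] - 1 - alpha[i] / alpha[j])."""
    n = len(alpha)
    return [
        [alpha[j] / beta[i] * ((1.0 if i == j else 0.0) - 1.0 - alpha[i] / alpha[j])
         for i in range(n)]
        for j in range(n)
    ]


# Uniform growth rates and unit saddle positions (hypothetical value alpha_0 = 10).
alpha0 = 10.0
rho0 = rho_zero([alpha0] * 3, [1.0] * 3)

# Diagonal entries are -alpha_0, off-diagonal entries are -2 * alpha_0.
for row in rho0:
    print(row)

# On a saddle point (x_i = 1, all other coordinates 0) the growth term vanishes:
print(alpha0 + rho0[0][0] * 1.0)
```

Under these assumptions, every diagonal entry cancels the growth rate exactly at the saddle point, which is what keeps the system at rest there until a bias or noise term pushes it away.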
The advantage of the proposed formulation is that $\rho_0$ and $\rho_\Delta$ do not change if transition matrix $T$ or greediness matrix $G$ is modified. For convenience, we can fix $\alpha_i = \alpha_0$ (uniform growth rates), $\beta_i = 1.0$ (saddle point positions) and $\nu_i = 1.0$ (symmetric channels) to obtain a canonical system. To illustrate, the matrices for the system in Fig. 2 are:

$$\rho_0 = \alpha_0 \begin{pmatrix} -1 & -2 & -2 \\ -2 & -1 & -2 \\ -2 & -2 & -1 \end{pmatrix}, \quad \rho_\Delta = \alpha_0 \begin{pmatrix} 2 & 2 & 2 \\ 2 & 2 & 2 \\ 2 & 2 & 2 \end{pmatrix}, \quad T = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$$

Channel location: The factor $\gamma$ determines the distance of the attractor to the origin of the vector space. With $\gamma = 1$, channels approximately maintain constant $L_1$ distance (as used in [6]), whereas with $\gamma = 2$ they approximately maintain constant $L_2$ distance (assuming a canonical system). In the latter case, the attractor lies on a hypersphere. For the canonical system, we chose $\gamma = 2$, as it simplifies the partition of the vector space and the interpretation of $\Lambda$.

Fig. 4: Activations and phases resulting from the 3-state system shown in Fig. 2. For clarity, phase values are not plotted when the related activation is less than 0.01.

B. Activations and Phases

The SHC network provides the notion of discrete states (saddle points) and transitions (stable heteroclinic channels). In order to algebraically partition the vector space of $x$ into fuzzy regions for each state and each possible transition, we can leverage two mathematical properties of the system. First, the coordinate vectors of the saddle points form an orthonormal basis. From this it follows that the channels are located on the plane spanned by the basis vectors of the predecessor and successor state, as can also be seen in Fig. 2. Second, the coordinate vector of each state is sparse: all coordinates are zero but one. From this it follows that functions specific to every state or transition can be computed from the outer product of $x$ with itself. With these insights we can devise a
function that computes activations for all possible transitions:

$$\Lambda^{\text{transitions}} = \frac{16\,(x \otimes x)\,|x^2|}{(x \otimes 1 + 1 \otimes x)^4 + |x|^4} \circ T \tag{4}$$

The function is chosen such that values are limited to the range [0.0, 1.0] and are invariant to scaling $x$. Fig. 3a illustrates $\Lambda^{\text{transitions}}_{ji}$ for a single active transition $i \to j$ over coordinates $x_i$ and $x_j$. $\Lambda^{\text{transitions}}$ is sparse in the sense that only few transitions are active at any time due to the chosen attractor shape. If more than one transition is active, then $\sum \Lambda^{\text{transitions}} \le 1.0$ (for systems with $\gamma = 2$). Because of this, matrix $\Lambda^{\text{transitions}}$ can also be understood as a weight matrix.

State activation is computed from the residual of the transition activations, so that activations sum up to 1.0. Additionally, elements of $x$ are squared to ensure sparseness of the state activation values and hence mutual exclusiveness:

$$\Lambda^{\text{states}} = \frac{x^2}{\sum x^2} \left(1 - \sum \Lambda^{\text{transitions}}\right) \tag{5}$$

As the diagonal of $\Lambda^{\text{transitions}}$ is semantically not meaningful, we can combine all transition and state activations into a single activation matrix $\Lambda$:

$$\Lambda_{ji} = \begin{cases} \Lambda^{\text{transitions}}_{ji} & j \neq i \\ \Lambda^{\text{states}}_{i} & j = i \end{cases} \tag{6}$$

Fig. 4 shows an example of the resulting set of activation values for the minimal three-state system illustrated in Fig. 2 (using $\alpha_0 = 10$, $\delta = 5 \cdot 10^{-5}$).

Transition Phases: Unlike (Markovian) states, transitions have a notion of time and progress, i.e. they possess a phase. As all channels are located on two-dimensional planes spanned by the coordinate axes of their predecessor and successor state, we can compute a phase $\Phi_{ji}$ for each possible transition $i \to j$ from those coordinates:

$$\Phi_{ji} = \frac{|x_j|}{|x_i| + |x_j|} \tag{7}$$

The function is illustrated in Fig. 3b, and yields values in the range [0, 1]. Note that the value of $\Phi_{ji}$ is only meaningful when transition $i \to j$ is active, i.e. when $|x_i| + |x_j| \gg 0$. Fig. 4 illustrates the phases over time for the minimal three-state system illustrated in Fig. 2.

Fig. 5: Example of modifying transition velocities by 3 orders of magnitude while maintaining saddle point stability ($A_{21} = 5$, $A_{32} = 0$, $A_{13} = -5$).

1) Composition of Motion: So far, we established a dynamical system that provides us with a consistent set of activation values for transitions and states, and with phases for transitions. Eqs. 4 and 5 are chosen such that $\sum \Lambda = 1$; therefore $\Lambda$ can be directly used for weighted averaging of control goals associated with each state and each transition. In terms of control, states and transitions have to be treated differently, though. States are without phase, so we can only associate constant control goals with them. Transitions, on the other hand, have a phase, so we can associate phase-parameterized movement primitives with them, e.g. DMPs [10], model-free ProMPs [8], or simply spline interpolation. For the experiments, we used the ProMP formalism to teach, reproduce and mix movements. PD-control goals for states are formulated in a compatible manner via normal distributions over position, velocity and torque.

C. Inputs to Influence System Behavior

Eq. 1 provides terms that can be used as inputs to modulate the behavior. The transition matrix $T$ defines which transitions exist, and may be modified during execution of the system. Matrix $G$ gradually modifies the direction of transitions and the behavior between competing transitions. Vector $\delta$ determines the amount of time that is spent in a state and which transitions become activated to leave a state. The factor $\eta$ speeds up or slows down the system, which can be used for e.g. synchronization by entrainment.

a) Causing transitions: When the system is exactly on a saddle point, e.g. $x = (1, 0, 0)$, then the system can potentially stay in this state forever. In order to cause a transition, a small positive velocity bias $\delta_j$ can be added, which pushes the system towards successor state $j$, or a negative $\delta_j$ can be applied to avoid it.
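The effect of such a velocity bias can be sketched with a simple forward-Euler integration of Eq. 1 for the canonical three-state system. This is a minimal sketch under stated assumptions, not the author's implementation. It assumes the form $\dot{x} = x \circ (\alpha + (\rho_0 + \rho_\Delta \circ T) \cdot x^{\gamma})\,\eta + \delta$ with $G = 0$, the canonical matrices with negative entries in $\rho_0$, and illustrative values for the bias, step size and duration. The names `simulate` and `phase` are hypothetical.

```python
def simulate(delta, x0=(1.0, 0.0, 0.0), alpha0=10.0, gamma=2, eta=1.0,
             dt=1e-3, steps=3000):
    """Forward-Euler integration of the canonical three-state system.

    Returns the list of visited state vectors x.
    """
    n = 3
    T = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]  # T[j][i] = 1 if transition i -> j exists
    # Coupling matrix rho_0 + rho_delta * T (element-wise product, G = 0).
    M = [[-alpha0 * (1.0 if i == j else 2.0) + 2.0 * alpha0 * T[j][i]
          for i in range(n)] for j in range(n)]
    x = list(x0)
    trajectory = [x]
    for _ in range(steps):
        xg = [v ** gamma for v in x]
        growth = [alpha0 + sum(M[j][i] * xg[i] for i in range(n)) for j in range(n)]
        x = [x[j] + dt * (x[j] * growth[j] * eta + delta[j]) for j in range(n)]
        trajectory.append(x)
    return trajectory


def phase(x, i, j):
    """Phase of transition i -> j per Eq. 7."""
    return abs(x[j]) / (abs(x[i]) + abs(x[j]))


resting = simulate(delta=(0.0, 0.0, 0.0))   # no bias: stays on the saddle point
pushed = simulate(delta=(0.0, 1e-3, 0.0))   # small bias towards state 1

print("without bias:", resting[-1])
print("with bias:   ", pushed[-1], "phase 0->1:", phase(pushed[-1], 0, 1))
```

Without a bias the state vector never leaves $(1, 0, 0)$. With a small bias towards state 1, the second coordinate grows slowly at first, the transition then completes, and the phase of transition 0 → 1 moves from 0 towards 1. Because the third coordinate receives no bias in this sketch, the system then rests at state 1.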
Sometimes though, this level of granularity is not enough, and we want to set the velocity bias for each transition specifically. We therefore define an input bias matrix $B$ where each element $B_{ji}$ corresponds to the bias towards state $j$ in state $i$. A resolved vector $\delta$ can then be computed:

$$\delta = (\,\cdots\,B)\,x + \epsilon\,W(t) \tag{8}$$

Fig. 6: Raising greedinesses on two competing transitions 0 → 1 and 0 → 2 forces the system to decide earlier (axes $x_0$, $x_1$, $x_2$, each ranging from 0 to 1). (a) $g = (\cdot, 1, 1)$. (b) $g = (\cdot, 3, 3)$. (c) $g = (\cdot, 10, 10)$.

The values $B_{ji}$ are the analogue to control switch conditions in hybrid automata, i.e. $B$ can be used to synchronize on events and to select one out of several successor states. But it can also be used to implement timeout conditions: small values accumulate over time until the saddle point is left, as was done to create Figs. 2 and 4. Values can be estimated analytically if a specific duration is desired [6]. Another way to cause transitions is to add stochastic velocity noise $W(t)$ via the parameter $\epsilon$, which causes the system to take a random transition after a random amount of time.

Transition Velocity: A key advantage of the proposed system over hybrid automata is the ability to continuously adjust the speed of a transition and hence movement. In prior work, velocity was adjusted by modifying the growth rate vector $\alpha$ [6]. Unfortunately though, stability considerations limit the range of values that can be assigned to each $\alpha_j$. By using the activation matrix $\Lambda$ though, we can modify the growth rate (and thus speed of evolution) for each region in vector space independently via the scaling factor $\eta$:

$$\eta = 2^{\sum \Lambda \circ A} \tag{9}$$

Matrix $A$ contains factors for speeding up or slowing down each transition and state relative to the "default" speed defined by $\rho_0$. This approach works well across several orders of magnitude as it does not warp the saddle points. Unmodified system behavior is obtained by setting $A = 0$.
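To make the interplay between activations and the speed factor more tangible, the following sketch computes an activation matrix per Eqs. 4 to 6 and then the scaling factor of Eq. 9. It rests on several assumptions: $|\cdot|$ in Eq. 4 is read as the $L_1$ norm (which yields an activation of exactly 1.0 at the midpoint of a channel), Eq. 9 is read as $\eta = 2^{\sum \Lambda \circ A}$, and the exponents of ±5 are loosely modelled on Fig. 5, with the negative sign being an assumption. The function names `activations` and `speed_factor` are hypothetical.

```python
def activations(x, T):
    """Activation matrix per Eqs. 4-6, assuming |.| denotes the L1 norm.

    Off-diagonal entries are transition activations, the diagonal holds
    the state activations. x must not be the zero vector.
    """
    n = len(x)
    sum_sq = sum(v * v for v in x)            # |x^2|
    l1_pow4 = sum(abs(v) for v in x) ** 4     # |x|^4
    lam = [[16.0 * x[j] * x[i] * sum_sq / ((x[j] + x[i]) ** 4 + l1_pow4) * T[j][i]
            for i in range(n)] for j in range(n)]
    residual = 1.0 - sum(sum(row) for row in lam)
    for i in range(n):
        lam[i][i] = x[i] * x[i] / sum_sq * residual   # Eq. 5 on the diagonal
    return lam


def speed_factor(lam, A):
    """Scaling factor per Eq. 9: 2 to the power of the sum over lam * A (element-wise)."""
    return 2.0 ** sum(l * a for lam_row, a_row in zip(lam, A)
                      for l, a in zip(lam_row, a_row))


T = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]     # cycle 0 -> 1 -> 2 -> 0
A = [[0.0, 0.0, -5.0],                    # slow down transition 2 -> 0
     [5.0, 0.0, 0.0],                     # speed up transition 0 -> 1
     [0.0, 0.0, 0.0]]

# Activations are invariant to scaling x, so (1, 1, 0) stands for the midpoint
# of channel 0 -> 1.
print(speed_factor(activations([1.0, 0.0, 0.0], T), A))   # resting in state 0
print(speed_factor(activations([1.0, 1.0, 0.0], T), A))   # midway through 0 -> 1
print(speed_factor(activations([1.0, 0.0, 1.0], T), A))   # midway through 2 -> 0
```

Because the activations always sum to 1, the exponent in Eq. 9 is a weighted average of the entries of $A$, so the speed factor blends smoothly between regions instead of switching abruptly.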
Stable variation of transition duration over 3 orders of magnitude is demonstrated in Fig. 5.

b) Co-activation and Reversion of Transitions: A unique feature of the proposed system is the ability to begin transitions from a predecessor state to several successor states at once, by setting positive biases $B_{ji}$ for the involved transitions. The attractor's shape forces one successor state to win eventually, so that only one transition completes. The dynamic beha
