IROS 2019 International Conference Proceedings, paper 0332
Abstract: Humans have the ability to adjust their utterances depending on the state of the interlocutor. In this study, we explore the verbal behaviors of humans through interaction with two smart speakers that have different levels of task competence. We analyzed 1) the linguistic behaviors that appeared in users' utterances, 2) the length of the uttered speech, and 3) the pragmatics skills required to understand the user's intent. As a result, there was no significant difference in linguistic behaviors or in speech length when users interacted with speakers of different task competence. In addition, various pragmatics elements were equally utilized, and implied intentions in particular were frequently observed in users' short utterances, even under simple interaction scenarios.

I. INTRODUCTION

One of the unique characteristics of humans is that they can adapt their behaviors to diverse interaction contexts [1]. Such adaptability can also be found in verbal interaction between humans and artificial agents such as computers, voice-activated devices, and robots [2], [3]. Researchers have found that humans display various linguistic behaviors when they interact with artificial agents. For example, Hill et al. found that people use shorter sentences, a more limited vocabulary, and more profanity with a chatbot than when they talk to a human [4]. Another line of studies on verbal interaction between humans and computer agents reported that, as the agent fails to accomplish a requested task, humans come to absentmindedly repeat the same requests, which can be considered rude linguistic behavior in human-human conversation [5], [6]. However, one study reported different aspects of linguistic behavior when people verbally interact with a navigation system that has human-level conversation ability [7]. Its authors found that people used vague language and tried to mitigate their requests, which are linguistic characteristics often found in human-human conversation [8]. These contrary phenomena lead to the question of whether humans would display different linguistic behaviors if an artificial agent possesses a different level of linguistic capability. Given that interaction between a human and a voice assistant is verbally mediated, the language capability of an agent would result in a different level of task competence. Therefore, it is possible that task performance affects the linguistic behaviors of the user. We thus assume that the linguistic behaviors of users would change if they interact with voice-activated devices, such as smart speakers, with different levels of task performance.

Footnote: This work was supported by the Technology Innovation Program (10077553, Development of Social Robot Intelligence for Social Human-Robot Interaction of Service Robots) funded by the Ministry of Trade, Industry (e-mail: yslim@kist.re.kr).

Adaptive linguistic behaviors of humans could be related to pragmatics, the ability to alter one's utterances depending on the knowledge shared with a conversation partner. For instance, a caregiver uses simple, repetitive, and exaggerated language in child-directed speech [9], [10]. However, it is still unclear under what conversation situations humans show different pragmatics to an artificial agent. Since linguistic behaviors can change adaptively depending on the interaction situation, the use of pragmatics in verbal requests may be altered as well, especially if the agent can handle various requests with implied intentions that cannot be resolved literally. We thus assume that humans would use utterances that demand a high degree of pragmatics ability when they talk to voice assistants with high task performance or human-like performance.

In this study, we conducted a Wizard-of-Oz experiment in which human users verbally interacted with two smart speakers that have different task performance. Users' spontaneous speech was analyzed to explore how their verbal behaviors change depending on the smart speaker's task performance. First, we identified several distinct linguistic behaviors and investigated their correlation with the smart speaker's task performance. We
found that users' linguistic behaviors and the task performance of the smart speaker have no significant correlation. Next, we measured the number of Eojeols (a spacing unit that consists of lexical and grammatical morphemes [11], [12]) per utterance to test whether humans use longer sentences with a high-performance smart speaker. However, there was no significant difference in the number of Eojeols per utterance depending on the smart speaker's task performance. The length of users' utterances was generally short irrespective of the smart speaker's task performance. Lastly, we analyzed the pragmatics elements that smart speakers require to understand the user's request, based on standard pragmatics assessment methods [13]-[15]. We found no statistically significant relation between the use of pragmatics elements and the performance level of the smart speakers. This indicates that humans apply diverse forms of pragmatics regardless of the linguistic ability of the smart speaker. We also found that the abilities to understand a user's indirect speech and to negotiate verbally are necessary pragmatics elements for both low- and high-performance smart speakers.

Taken together, these results indicate that human users tend to make requests with rather simple language structures but in various linguistic forms, regardless of the communication capability of the smart speaker. This verbal behavior may inform dialogue design for voice assistants as well as for robots that perform simple service tasks.

Are you hearing or listening? The effect of task performance in verbal behavior with smart speaker
Chaewon Park, Jongsuk Choi, Jee Eun Sung, and Yoonseob Lim, Member, IEEE
2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, November 4-8, 2019
978-1-7281-4003-2/19/$31.00 ©2019 IEEE

II. METHOD

A. Participants

A total of 38 participants (16 males, 22 females), aged 21 to 57 (mean = 28.3, SD = 9.19), were recruited through online advertisement. We divided the participants into two groups. The first group (n = 19) started with the speaker with lower task performance (SwL), followed by the speaker with higher task performance (SwH). The same conversation scripts were given to the second group, but in the reverse order of interaction with the smart speakers.

B. Creating Smart Speakers with Different Task Performances

Participants interact with two smart speakers that are designed to have different levels of task performance. The performance of a smart speaker was measured as the frequency of correct responses to each fixed request. SwL has approximately a 50% chance of completing the task if the user follows the designed dialogue. To create the smart speaker's reactions, we tested three commercial smart speakers released by different companies in Korea (Clova, Nugu, and Giga Genie) with our experimental dialogues. We chose the verbal responses from Clova because it showed the highest understanding of all the designed requests and could also handle multiple turn-takings. All verbal responses of SwH were manually designed so that it successfully performs 80% of interactions with humans. SwH can even suggest alternatives when a user's request requires physical activities such as hanging out or running an errand. The vocal responses of both smart speakers were recorded in MP3 format using a commercial TTS engine (Yujin voice from Selvy TTS, speed rate: 100 bpm) [16]. We chose a female voice because most commercial smart speakers available in Korea have female voices.

C. Pragmatics Elements

We referred to standardized assessments used in language pathology for identifying a person with a pragmatics deficit: 1) Children's Communication Checklist (CCC-2, the U.S. version), 2) Communication Checklist-Adults (CC-A), and 3) Children's Pragmatic Language Checklist (CPLC) [13]-[15]. CCC-2 was developed for children and adolescents aged 4 to 16. CC-A is an assessment for adults derived from CCC-2. CPLC is a pragmatics assessment for children in Korea; it is currently in the process of standardization. Since standard pragmatics evaluation tests are designed for humans, we excluded several items that
are not appropriate for testing verbal interaction with smart speakers. For example, "Keep quiet in situations where someone else is trying to talk or concentrate" was not included, since the smart speaker is not equipped with visual sensors such as a camera. A total of 12 pragmatics elements were selected, and Table I shows the final items used for pragmatics analysis in the current study.

D. Experiment Procedure

The overall procedure of the experiment is illustrated in Fig. 1. Before the actual interaction experiment with the smart speakers, all participants complete a practice session that consists of 5 example interactions. In the practice session, the experimenter acts as a smart speaker, and the participant learns how to converse with a smart speaker without a wake-up command. The participant also learns to follow the scripted dialogue and the instructions displayed on the tablet screen, and to touch the "next" button to move to the next dialogue.

TABLE I. PRAGMATICS ELEMENTS
Index | Definition | Source
1 | Use the definition of words or context to make an explanation when the user did not fully understand | CPLC
2 | Do not get confused when a word is used with a different meaning from usual (e.g., might fail to understand if an unfriendly person was described as "cold" and would assume they were shivering) | CCC-2 (19), CPLC
3 | Refer to an object, person, or event without being dichotomous | CPLC
4 | Express gratitude or apology to a user at an appropriate time | CPLC
5 | React to the user's indirect behavior or request | CPLC
6 | Appreciate humor expressed by irony (would be amused rather than confused if someone said "isn't it a lovely day" when it is pouring with rain) | CCC-2 (54)
7 | Specify a thing among other similar ones by describing words | CPLC
8 | Refuse without upsetting a user | CPLC
9 | Negotiate verbally when there is a situation to reconcile or coordinate | CPLC
10 | Talk about plans for the future | CPLC
11 | Understand the point of jokes and puns and react appropriately | CCC-2 (15), CPLC
12 | Answer questions with different interrogatives (e.g., who, when, where, what, how, why) | CPLC

Figure 1. Experiment design. (Top) The participant interacts with two different smart speakers, and the order of interaction is randomly chosen. At the end of the experiment, the participant rates the necessity of each pragmatics element. (Bottom) Each dialogue session consists of 2-4 turns between the user and the smart speaker. The dialogue always ends with spontaneous speech uttered by the user. An example dialogue is shown on the right (U: user, S: smart speaker).

Figure 2. Linguistic behaviors of users when SwL or SwH succeeds or fails a task. (a) Overall, "further inquiry" and "re-try" behaviors are the most frequently observed. The correlation between linguistic behavior and the smart speaker's task performance was not significant. (b) "Further inquiry" is dominant when the smart speaker succeeds at a task, and "re-try" is the most frequent linguistic behavior when the smart speaker fails a task.

Once the participant finishes the practice session successfully, the actual interaction with the two different speakers begins. We randomly allocated two topics to each participant. The conversation topics are 1) date and weather, 2) music, 3) schedule management, and 4) entertainment. In the actual interaction, the vocal responses of the smart speakers were delivered through either a laptop or a Bluetooth speaker (JBL Pulse 2).

The basic structure of a single dialogue session between the participant and the smart speaker is illustrated at the bottom of Fig. 1. A dialogue always starts with the user's fixed request, such as "What is the weather today?" After the user makes a request, the smart speaker gives a predetermined response. Such turn-taking between the user and the smart speaker can continue for up to four turns. At the end of each dialogue session, the participant is allowed to make any additional request or comment based on the current interaction context (spontaneous speech). The smart speaker does not provide any answer to the user's spontaneous speech, and the user proceeds to the next dialogue session. Each interaction with a smart speaker includes a total of 18-21 dialogue sessions, with an average number of turns per
whole interaction of 34.5. Participants' spontaneous speech was recorded by a recording program installed on a tablet PC or by a voice recording device (Tascam DR-22WL). The tablet screen was mirrored to another screen installed in the experimenter's room so that the experimenter could monitor the status of the experiment in real time. When the participant finishes the whole interaction with both types of speakers, the participant evaluates the necessity of each pragmatics element for vocal interaction with smart speakers (5-point Likert scale).

All user requests in the script were evaluated by Korean language experts (n = 4) on a 5-point Likert scale to examine whether the scripts are suitable for assessing the pragmatics ability of smart speakers. User requests whose average evaluation score was greater than 3 were used. All experiment instructions were given through Google survey forms and displayed on the tablet screen (Samsung Galaxy Tab 10.1). An example dialogue instruction on the tablet screen is provided in Supplementary Fig. 1 (see footnote 1). All the experiments described in this study were authorized by the IRB at KIST (IRB number: 2018-013).

E. Linguistic Data Analysis

We transcribed all the spontaneous speech uttered by participants (total number of utterances: 2124). Linguistic behaviors, the number of Eojeols per utterance, and pragmatics elements were coded by two different coders. On average, 2% of the initially coded data differed between coders, and unmatched codings were readjusted after discussion between the coders. We manually determined 12 different types of linguistic behaviors based on the spontaneous speech of all participants. Linguistic behavior codes and example dialogues for each linguistic behavior are listed in Supplementary Table I. To quantify the length of each utterance, we calculated the number of Eojeols per utterance. To analyze the pragmatics abilities required for verbal interaction with smart speakers, we manually coded 12 different pragmatics elements (Table I). For linguistic behaviors and pragmatics elements, we allowed multiple codings for the same utterance.

Footnote 1: All the supplementary information can be found at qQHz6PMfEa1zAXvQ/edit?usp=sharing

III. RESULT

A. Linguistic Behavior of User's Verbal Responses

We first analyzed the linguistic behaviors that human users manifest during conversation with smart speakers. All linguistic behavior codes and selected example dialogues can be found in Supplementary Table I. Fig. 2 shows the linguistic behaviors in each fail and success condition with SwL or SwH. Overall, the most frequent linguistic behaviors are "further inquiry" and "re-try" (Fig. 2a). When a smart speaker succeeds at a task, the most frequently observed linguistic behavior is "further inquiry", meaning that humans ask for more information related to the original request. On the other hand, "re-try" appears dominantly when the smart speaker fails to respond correctly. In "re-try" behaviors, people use several different linguistic strategies such as repetition or reformulation. Such linguistic behaviors have already been reported in situations where a computer agent fails to accomplish task goals in verbal interaction with humans [5], [17]. The dialogue below shows one example of "re-try" behavior with reformulated speech.

Situation: task failed, re-try (reformulation), SwL
User: What do you think about songs sung by IU?
SwL: I could not find the song you want. Please say another song.
User: Do you think songs sung by IU are good?

We compared the distributions of linguistic behaviors for the two smart speakers under the same task condition (success or fail), but no significant difference in linguistic behaviors between the smart speakers was found (p > 0.2). This indicates that users' linguistic behaviors do not depend on the task performance of the smart speaker but on the immediate interaction result (fail or success).

Figure 3. Number of Eojeols per utterance in participants' spontaneous speech. (a)-(d) The distribution of the number of Eojeols is shown when (a) SwL succeeded at the task, (b) SwH succeeded at the task, (c) SwL failed the task, and (d) SwH failed the task, respectively.

Figure 4. Pragmatics element analysis. (a) Users' utilization of pragmatics elements in actual verbal interaction with smart speakers. For each spontaneous utterance, we classified the pragmatics elements that the smart speaker could require to recognize the user's intent. Interestingly, we found that most subjects express their intention indirectly (pragmatics element 5). (b) Survey result on the necessity of each pragmatics element as evaluated by users. Average evaluation scores are shown; users rated pragmatics element 5 as highly required for verbal interaction with smart speakers.

Figure 5. Pragmatics element distribution in different task conditions. (a) Pragmatics elements 5 and 9 are used most, regardless of task condition and performance level of the smart speaker. (b) Users indirectly express their intention to retry the request or make context-related comments or further inquiries to the speaker. (c) Pragmatics element 9 is also highly used when users show similar types of linguistic behaviors for the same request.

B. Average Number of Eojeols per Utterance

To test whether humans change the length of their utterances depending on the task performance of the smart speaker, the average number of Eojeols per utterance was assessed. Fig. 3 shows the distribution of the average number of Eojeols per utterance for each interaction topic. Overall, participants use more Eojeols when the smart speaker fails a task than when it succeeds. On average, participants speak 3.14 ± 1.85 Eojeols when the smart speaker fails, as opposed to 2.77 ± 1.84 Eojeols when the smart speaker successfully handles the user's need (p < 1e-6). Since the two smart speakers have different success rates, we compared the number of Eojeols that participants uttered under the same interaction condition (both speakers either succeed or fail) but with different smart speakers. Except for two interaction conditions, we did not find any significant difference in the average number of Eojeols per utterance between the two speakers. For example,
participants speak sentences wit
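
The utterance-length measure used in Section III-B can be illustrated with a minimal Python sketch. It rests on several assumptions that the text does not confirm: an Eojeol is approximated as a whitespace-delimited unit of a transcribed Korean utterance, the reported spread is taken to be the sample standard deviation, and the group comparison uses a permutation test on the difference of means, which is a stand-in because the text does not name the statistical test. The function names and the sample utterances are hypothetical.

```python
import random
import statistics


def count_eojeols(utterance: str) -> int:
    """Count Eojeols, approximated here as whitespace-delimited units (assumption)."""
    return len(utterance.split())


def mean_sd(values):
    """Return the mean and the sample standard deviation (assumed, not confirmed by the text)."""
    return statistics.mean(values), statistics.stdev(values)


def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference of group means.

    This is a stand-in test; the source does not state which test was used.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical transcripts, not data from the study.
    after_fail = ["아이유 노래 좋다고 생각해", "다시 한번 말해줘", "오늘 날씨 어때"]
    after_success = ["고마워", "내일 날씨는", "노래 틀어줘"]
    fail_counts = [count_eojeols(u) for u in after_fail]
    success_counts = [count_eojeols(u) for u in after_success]
    print(mean_sd(fail_counts), mean_sd(success_counts))
    print(permutation_p_value(fail_counts, success_counts))
```

Whitespace splitting is only a rough proxy: it depends on how consistently the transcribers applied Korean spacing rules, so manually coded counts may differ from what this sketch produces.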
