Are you hearing or listening? The effect of task performance in verbal behavior with smart speaker*

Chaewon Park, Jongsuk Choi, Jee Eun Sung, and Yoonseob Lim, Member, IEEE

2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, November 4-8, 2019

* This work was supported by the Technology Innovation Program (10077553, Development of Social Robot Intelligence for Social Human-Robot Interaction of Service Robots) funded by the Ministry of Trade, Industry and Energy (email: yslim@kist.re.kr).
Abstract—Humans have the ability to adjust their utterances depending on the state of the interlocutor. In this study, we explore the verbal behavior of humans through interaction with two smart speakers that have different levels of task competence. We analyzed (1) the linguistic behaviors that appeared in users' utterances, (2) the length of the uttered speech, and (3) the pragmatics skills required to understand the users' intent. As a result, there was no significant difference in linguistic behaviors or length of speech while users interacted with speakers of different task competence. In addition, various pragmatics elements were equally utilized; in particular, implied intentions were frequently observed in users' short utterances even under simple interaction scenarios.

I. INTRODUCTION

One of the unique characteristics of humans is that they can adapt their behaviors to diverse interaction contexts [1].

Such adaptability can also be found in verbal interaction between humans and artificial agents such as computers, voice-activated devices, and robots [2], [3]. Researchers have found that humans display various linguistic behaviors when they interact with artificial agents. For example, Hill et al. found the use of shorter sentences, more limited vocabularies, and more profanity with a chatbot than when people talk to a human [4]. Another line of studies on verbal interaction between humans and computer agents reported that as the agent fails to accomplish the requested task, humans begin to absentmindedly repeat the same requests, which would be considered rude linguistic behavior in human-human conversation [5], [6]. However, one study reported different aspects of linguistic behavior when people verbally interacted with a navigation system that has human-level conversation ability [7]. They found that people used vague language and tried to mitigate their requests, which are linguistic characteristics often found in human-human conversation [8]. These contrary phenomena lead to the question of whether humans would display different linguistic behaviors if an artificial agent possessed a different level of linguistic capability.

Given that interaction between a human and a voice assistant is verbally mediated, the language capability of an agent results in a different level of task competence. Therefore, it is possible that task performance affects the linguistic behaviors of the user. We thus assume that the linguistic behaviors of users would change when they interact with voice-activated devices, such as smart speakers, that have different levels of task performance.

Adaptive linguistic behavior could be related to the pragmatics ability of humans to alter their utterances depending on the knowledge shared with the conversation partner. For instance, a caregiver uses simple, repetitive, and exaggerated language in Child-Directed Speech [9], [10]. However, it is still unclear in what conversational situations humans show different pragmatics toward an artificial agent. Since linguistic behaviors can change adaptively depending on the interaction situation, the use of pragmatics in verbal requests may be altered as well, especially if the agent can handle various requests with implied intentions that cannot be resolved literally. We thus assume that humans would use utterances that demand a high degree of pragmatics ability when talking to voice assistants with high task or human-like performance.

In this study, we conducted a Wizard-of-Oz experiment in which human users verbally interacted with two smart speakers that have different task performances. Users' spontaneous speech was analyzed to explore changes in users' verbal behaviors depending on the smart speaker's task performance. First, we identified several distinct linguistic behaviors and investigated their correlation with the smart speakers' task performances. We found no significant correlation between users' linguistic behaviors and the task performance of the smart speaker. Next, we measured the number of Eojeols (a spacing unit that consists of lexical and grammatical morphemes [11], [12]) per utterance to test whether humans use longer sentences with a high-performance smart speaker. However, there was no significant difference in the number of Eojeols per utterance depending on the smart speaker's task performance; the length of users' utterances was generally short irrespective of task performance. Lastly, we analyzed the pragmatics elements required for smart speakers to understand users' requests, based on standard pragmatics assessment methods [13]-[15]. We found no statistically significant relation between the use of pragmatics elements and the performance level of the smart speakers, which indicates that humans apply diverse forms of pragmatics regardless of the linguistic ability of the smart speaker. We also found that the abilities to understand users' indirect speech and to negotiate verbally are necessary pragmatics elements for both the low- and high-performance smart speakers. Taken together, these results indicate that human users tend to make requests with rather simple language structures but with various linguistic forms regardless of the communication capability of the smart speaker.

This verbal behavior of human users may inform the dialogue designs of voice assistants as well as of robots for simple service tasks.

II. METHOD

A. Participants

A total of 38 participants (16 males, 22 females) aged 21 to 57 (mean = 28.3, SD = 9.19) were recruited through an online advertisement. We divided the participants into two groups. The first group (n = 19) started with the speaker with lower task performance (SwL), followed by the speaker with higher task performance (SwH). The same conversation scripts were given to the second group, but in the reverse order of interaction with the smart speakers.
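This counterbalanced assignment can be pictured with a minimal Python sketch. The names below (assign_orders, PARTICIPANTS) and the random allocation itself are hypothetical assumptions; the paper only states that the two equal groups interacted with the speakers in opposite orders.

```python
import random

PARTICIPANTS = list(range(1, 39))  # 38 participants; IDs are illustrative

def assign_orders(participants, seed=0):
    """Split participants into two equal groups with opposite speaker orders.

    One group starts with the low-performance speaker (SwL), the other with
    the high-performance speaker (SwH); the scripts are identical for both.
    """
    rng = random.Random(seed)
    shuffled = participants[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return (
        {pid: ("SwL", "SwH") for pid in shuffled[:half]},   # n = 19
        {pid: ("SwH", "SwL") for pid in shuffled[half:]},   # n = 19
    )

group_a, group_b = assign_orders(PARTICIPANTS)
```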

B. Creating Smart Speakers with Different Task Performances

Participants interact with two smart speakers that are designed to have different levels of task performance. The performance of a smart speaker was measured as the frequency of correct responses to each fixed request. SwL succeeds at a requested task with an approximately 50% chance if the user follows the designed dialogue. To create the smart speakers' reactions, we tested three commercial smart speakers released by different companies in Korea (Clova, Nugu, and Giga Genie) with our experimental dialogues. We chose the verbal responses from Clova because it showed the highest understanding of all the designed requests and could also handle multiple turn-takings. All verbal responses of SwH were manually designed so that it could successfully perform 80% of the interactions with a human. SwH can even suggest alternatives when a request from the user requires physical activities such as hanging out or running an errand. The vocal responses of both smart speakers were recorded in MP3 format using a commercial TTS engine (Yujin voice from Selvy TTS, speed rate: 100 bpm) [16]. We chose a female voice because most of the commercial smart speakers available in Korea have female voices.
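The two target rates (roughly 50% for SwL and 80% for SwH) can be illustrated with the sketch below, which selects a scripted success or failure response at a target rate. This is only an assumed mechanization of a Wizard-of-Oz setup: the paper used predetermined, pre-recorded responses, and the RESPONSES table, the pick_response function, and the reply texts here are hypothetical.

```python
import random

# Hypothetical scripted responses per fixed request; the study pre-recorded
# its responses with a commercial TTS engine rather than generating them.
RESPONSES = {
    "What is the weather today?": {
        "success": "It is sunny with a high of 24 degrees.",
        "failure": "I could not understand your request.",
    },
}

SUCCESS_RATE = {"SwL": 0.5, "SwH": 0.8}  # approximate target rates from the paper

def pick_response(speaker, request, rng=random.Random(42)):
    """Return a scripted success or failure response at the speaker's target rate."""
    outcome = "success" if rng.random() < SUCCESS_RATE[speaker] else "failure"
    return outcome, RESPONSES[request][outcome]

outcome, reply = pick_response("SwL", "What is the weather today?")
```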

C. Pragmatics Elements

We referred to standardized assessments used in language pathology for identifying a person with a pragmatics deficit: 1) the Children's Communication Checklist (CCC-2, the U.S. version), 2) the Communication Checklist - Adult (CC-A), and 3) the Children's Pragmatic Language Checklist (CPLC) [13]-[15]. CCC-2 is developed for children and adolescents from age 4 to 16, and CC-A is an assessment for adults derived from CCC-2. From these checklists, we selected the items that are applicable to verbal interaction with smart speakers. For example, "Keep quiet in situations where someone else is trying to talk or concentrate" was not included, since the smart speaker is not equipped with visual sensors such as a camera. A total of 12 pragmatics elements were selected, and Table I shows the final items used for the pragmatics analysis in the current study.

D. Experiment Procedure

The overall procedure of the experiment is illustrated in Fig. 1. Before the actual interaction experiment with the smart speakers, all participants completed a practice session consisting of 5 example interactions. In the practice session, the experimenter acts as a smart speaker, and the participant learns how to converse with a smart speaker without a wake-up command.

The participant also learns to follow the scripted dialogue and the instructions displayed on the tablet screen, and to touch the next button to move to the next dialogue.

TABLE I. PRAGMATICS ELEMENTS

Index | Definition | Source
1 | Use the definition of words or context to make an explanation when the user did not fully understand. | CPLC
2 | Do not get confused when a word is used with a different meaning from usual; e.g., might fail to understand if an unfriendly person was described as "cold" (and would assume they were shivering!). | CCC-2 #19, CPLC
3 | Refer to an object, person, or event without being dichotomous. | CPLC
4 | Express gratitude or apology to a user at an appropriate time. | CPLC
5 | React to a user's indirect behavior or request. | CPLC
6 | Appreciate humor expressed by irony; would be amused rather than confused if someone said "isn't it a lovely day!" when it is pouring with rain. | CCC-2 #54
7 | Specify a thing among other similar ones by describing words. | CPLC
8 | Refuse without upsetting a user. | CPLC
9 | Negotiate verbally when there is a situation to reconcile or coordinate. | CPLC
10 | Talk about plans for the future. | CPLC
11 | Understand the point of jokes and puns and react appropriately. | CCC-2 #15, CPLC
12 | Answer questions with different interrogatives, e.g., who, when, where, what, how, why. | CPLC

Figure 1. Experiment design. (Top) The participant interacts with the two different smart speakers, and the order of interaction is randomly chosen. At the end of the experiment, the participant rates the necessity of each pragmatics element. (Bottom) Each dialogue session consists of 2-4 turns between the user and the smart speaker. A dialogue always ends with spontaneous speech uttered by the user. An example dialogue is shown on the right (U: user, S: smart speaker).

Figure 2. Linguistic behaviors of the user when SwL or SwH succeeds or fails at a task. (a) Overall, further-inquiry and re-try behaviors are the most frequently observed. The correlation between linguistic behavior and the smart speaker's task performance was not significant. (b) Further inquiry is dominant when the smart speaker succeeds at a task, and re-try is the most frequent linguistic behavior when the smart speaker fails at a task.

Once the participant finishes the practice session successfully, the actual interaction with the two different speakers begins. We randomly allocated two topics to each participant (all conversation topics: 1) date and weather, 2) music, 3) schedule management, and 4) entertainment). In the actual interaction, the vocal responses of the smart speakers were delivered through either a laptop or a Bluetooth speaker (JBL Pulse 2). The basic structure of a single dialogue session between the participant and the smart speaker is illustrated at the bottom of Fig. 1. A dialogue always starts with the user's fixed request, such as "What is the weather today?". After the user makes the request, the smart speaker gives a predetermined response. Such turn-taking between the user and the smart speaker can continue up to four turns. At the end of each dialogue session, the participant is allowed to make any additional request or comment based on the current interaction context (spontaneous speech). The smart speaker does not provide any answer to the user's spontaneous speech, and the user then proceeds to the next dialogue session. Each interaction with a smart speaker includes a total of 18-21 dialogue sessions (average number of turns per whole interaction: 34.5).
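The session structure just described (a fixed opening request, up to four turns, and a closing spontaneous speech that receives no answer) could be represented with a small data structure such as the following sketch. The class names and the example utterances are hypothetical, not taken from the study's scripts.

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    speaker: str  # "U" for user, "S" for smart speaker
    text: str

@dataclass
class DialogueSession:
    """One scripted session: a fixed request, up to four turns, then the
    user's spontaneous speech, to which the speaker never responds."""
    topic: str
    turns: list = field(default_factory=list)
    spontaneous_speech: str = ""
    MAX_TURNS = 4  # class attribute, not a dataclass field

    def add_turn(self, speaker, text):
        assert len(self.turns) < self.MAX_TURNS, "sessions are capped at four turns"
        self.turns.append(Turn(speaker, text))

# Hypothetical example session on the "date and weather" topic.
session = DialogueSession(topic="date and weather")
session.add_turn("U", "What is the weather today?")
session.add_turn("S", "It is sunny with a high of 24 degrees.")
session.spontaneous_speech = "Then do I need an umbrella tomorrow?"
```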

Participants' spontaneous speech was recorded by a recording program installed on a tablet PC or by a voice recording device (Tascam DR-22WL). The screen of the tablet was mirrored to another screen installed in the experimenter's room so that the experimenter could monitor the status of the experiment in real time. When the participant finishes the whole interaction with both types of speakers, the participant evaluates the necessity of each pragmatics element for vocal interaction with smart speakers (5-point Likert scale).

All of the users' requests in the script were evaluated by Korean language experts (n = 4) on a 5-point Likert scale to examine whether the scripts were suitable for assessing the pragmatics ability of smart speakers. Users' requests whose average evaluation scores were greater than 3 were used.

All the experiment instructions were given through Google survey forms and displayed on the tablet screen (Samsung Galaxy Tab 10.1; an example dialogue instruction on the tablet screen is provided in Supplementary Fig. 1)¹. All the experiments described in this study were authorized by the IRB of KIST (IRB number: 2018-013).

E. Linguistic Data Analysis

We transcribed all the spontaneous speech uttered by the participants (total number of utterances: 2124). Linguistic behaviors, the number of Eojeols per utterance, and pragmatics elements were coded by two different coders. On average, 2% of the initially coded data differed between the coders, and unmatched codings were readjusted after discussion between the coders.
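A percent-disagreement figure like the reported 2% can be computed over multi-label codings as in the sketch below. The data layout (utterance IDs mapped to sets of codes) and the toy codings are assumptions for illustration, not the authors' actual pipeline.

```python
def disagreement_rate(coder_a, coder_b):
    """Fraction of utterances where the two coders' label sets differ.

    Each argument maps an utterance ID to the set of codes assigned to it
    (multiple codings per utterance were allowed in the study).
    """
    ids = coder_a.keys() & coder_b.keys()
    mismatched = sum(1 for uid in ids if coder_a[uid] != coder_b[uid])
    return mismatched / len(ids)

# Hypothetical codings: pragmatics element indices per utterance.
a = {"u01": {5}, "u02": {5, 9}, "u03": {2}}
b = {"u01": {5}, "u02": {5, 9}, "u03": {6}}
print(disagreement_rate(a, b))  # 0.33 on this toy data; the paper reports ~2%
```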

We manually determined 12 different types of linguistic behaviors based on the spontaneous speech of all the participants (the linguistic behavior codes and example dialogues for each linguistic behavior are listed in Supplementary Table I). To quantify the length of each utterance, we calculated the number of Eojeols per utterance.
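Because an Eojeol is a spacing unit, the per-utterance count can be approximated by whitespace tokenization of a transcript, as in this minimal sketch (the sample transcripts are hypothetical):

```python
from statistics import mean

def count_eojeols(utterance: str) -> int:
    """An Eojeol is a spacing unit, so splitting on whitespace approximates the count."""
    return len(utterance.split())

transcripts = ["오늘 날씨 어때?", "아이유 노래 틀어 줘"]   # hypothetical transcripts
print([count_eojeols(t) for t in transcripts])       # [3, 4]
print(mean(count_eojeols(t) for t in transcripts))   # 3.5
```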

To analyze the pragmatics abilities required for verbal interaction with smart speakers, we manually coded the 12 pragmatics elements (Table I). For both linguistic behaviors and pragmatics elements, we allowed multiple codings for the same utterance.

¹ All the supplementary information can be found at qQHz6PMfEa1zAXvQ/edit?usp=sharing

III. RESULT

A. Linguistic Behavior of Users' Verbal Responses

We first analyzed the linguistic behaviors that human users manifest during conversation with the smart speakers (all the linguistic behavior codes and selected example dialogues can be found in Supplementary Table I). Fig. 2 shows the linguistic behaviors in each fail and success condition with SwL or SwH. Overall, the most frequent linguistic behaviors are further inquiry and re-try (Fig. 2a). When a smart speaker succeeds at a task, the most frequently observed linguistic behavior is further inquiry, meaning that humans ask for more information related to the original request. On the other hand, re-try appears dominantly when the smart speaker fails to respond correctly. In re-try behaviors, people use several different linguistic strategies such as repetition or reformulation.

Such linguistic behaviors have already been reported in situations where a computer agent fails to accomplish task goals in verbal interaction with a human [5], [17]. The dialogue below shows one example of re-try behavior with reformulated speech.

Situation: task failed + re-try (reformulation) + SwL
User: (What do you think about songs sung by IU?)
SwL: (I could not find the song you want. Please say another song.)
User: (Do you think songs sung by IU are good?)

We compared the distributions of linguistic behaviors toward the two smart speakers under the same task condition (success or fail), but no significant difference in linguistic behaviors between the two smart speakers was found (p > 0.2). This indicates that users' linguistic behaviors do not depend on the task performance of the smart speaker but on the immediate interaction result (fail or success).
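The excerpt does not name the statistical test behind this comparison; one conventional choice for comparing behavior-frequency distributions is a chi-square test on a contingency table of behavior counts, sketched below with made-up counts and hypothetical behavior categories.

```python
from scipy.stats import chi2_contingency

# Hypothetical counts of linguistic behaviors (columns) observed with each
# speaker (rows) in the success condition; the paper's actual counts and its
# choice of test are not given in this excerpt.
counts = [
    [52, 14, 9, 6],   # SwL: further inquiry, re-try, comment, other
    [57, 11, 8, 7],   # SwH
]
chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2={chi2:.2f}, p={p:.3f}")
```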

Figure 3. Number of Eojeols per utterance in participants' spontaneous speech. (a-d) The distribution of the number of Eojeols is shown when (a) SwL succeeded at the task, (b) SwH succeeded at the task, (c) SwL failed the task, and (d) SwH failed the task, respectively.

Figure 4. Pragmatics element analysis. (a) Users' utilization of pragmatics elements in actual verbal interaction with smart speakers. We classified, for each spontaneous speech of the user, the pragmatics elements that would be required for the smart speaker to recognize the user's intent. Interestingly, we found that most of the subjects express their intention indirectly (pragmatics element 5). (b) Survey results on the necessity of each pragmatics element as evaluated by the users. Average evaluation scores are shown; users selected pragmatics element 5 as highly required for verbal interaction with smart speakers.

Figure 5. Pragmatics element distribution in different task conditions. (a) Pragmatics elements 5 and 9 are used the most, regardless of the task condition and the performance level of the smart speaker. (b) Users indirectly express their intention to retry the request, or make context-related comments or further inquiries to the speaker. (c) Pragmatics element 9 is also highly used when the user shows similar types of linguistic behaviors for the same request.

B. Average Number of Eojeols per Utterance

To test whether humans change the length of their utterances depending on the task performance of the smart speaker, the average number of Eojeols per utterance was assessed.

Fig. 3 shows the distribution of the average number of Eojeols per utterance for each interaction topic. Overall, participants use more Eojeols when the smart speaker fails at a task than when it succeeds. On average, participants spoke 3.14 ± 1.85 Eojeols when the smart speaker failed, as opposed to 2.77 ± 1.84 Eojeols when the smart speaker successfully handled the user's need (p < 1e-6). Since the two smart speakers have different success rates, we compared the number of Eojeols participants uttered at the
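As a closing illustration of this fail-versus-success comparison of Eojeol counts, the sketch below applies a two-sample test to per-utterance counts. The paper reports the group means and the p-value but not the specific test, so Welch's t-test and the toy data here are assumptions.

```python
from scipy.stats import ttest_ind

# Hypothetical per-utterance Eojeol counts grouped by the speaker's task
# outcome; the paper reports means of 3.14 (fail) vs 2.77 (success), p < 1e-6.
fail_counts = [4, 3, 5, 2, 3, 4, 3, 2]
success_counts = [3, 2, 3, 2, 4, 2, 3, 3]
stat, p = ttest_ind(fail_counts, success_counts, equal_var=False)  # Welch's t-test
print(f"t={stat:.2f}, p={p:.3f}")
```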
