数据可视化分析外文文献_第1页
数据可视化分析外文文献_第2页
数据可视化分析外文文献_第3页
数据可视化分析外文文献_第4页
数据可视化分析外文文献_第5页
已阅读5页,还剩11页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1、ThemeRiver: Visualizi ng Theme Cha nges over TimeSusa n Havre, Beth Hetzler, and Lucy NowellBattelle Pacific Northwest Divisio nRichla nd, Washi ngton 99352 USA1+509+375-6948susa n.havre | beth.hetzler | lucy .no wellp AbstractProceed ings of theIEEE Symposium on In formatio n Visualizati on 2

2、000 (In foVis'00)0-7695- 0 804-9 /00 $10.00 2000 IEEEThemeRiver? is a prototype system that visualizes thematic variati ons over time with in a large collecti on of documents. The “ river ” flows from left to right through time, cha nging width to depict cha nges in thematic stre ngth of tempora

3、lly associated docume nts. Colored “ currents ” flowing within the river narrow or wide n to in dicate decreases or in creases in the stre ngth of an in dividual topic or a group of topics in the associated docume nts. The river is show n with in the con text of a timeli ne and a corresp onding text

4、ual prese ntati on of exter nal eve nts.Keywords : visualization metaphors, trend analysis, timeli ne1. IntroductionIn exploratory information visualization, one goal is to present information so that users can easily discern patter ns. Patter ns reveal tre nds, relati on ships, anomalies, and struc

5、ture in the data, and may help usersProceed ings of theIEEE Symposium on In formatio n Visualizati on 2000 (In foVis'00)0-7695- 0 804-9 /00 $10.00 2000 IEEEProceed ings of theIEEE Symposium on In formatio n Visualizati on 2000 (In foVis'00)0-7695- 0 804-9 /00 $10.00 2000 IEEEBay of PigsCastr

6、o confiscatesAmerican refineriesCuba and Sovist U.S- imposes relations resume embargo on Cuba(63)weapons(62)Proceed ings of theIEEE Symposium on In formatio n Visualizati on 2000 (In foVis'00)0-7695- 0 804-9 /00 $10.00 2000 IEEEProceed ings of theIEEE Symposium on In formatio n Visualizati on 20

7、00 (In foVis'00)0-7695- 0 804-9 /00 $10.00 2000 IEEE合®),3ovi&t (4 9)ch(13)Figure 1: ThemeRiver? uses a river metaphor to represent theme changes over time.i - - :-Proceed ings of theIEEE Symposium on In formatio n Visualizati on 2000 (In foVis'00)0-7695- 0 804-9 /00 $10.00 2000 IEEE

8、confirm knowledge or hypotheses. Perhaps more importantly, they also raise unexpected questions leading users to new insights. The challenge is to create visualizations that enable users to find patterns quickly and easily. ThemeRiver, shown in Figure 1, is a prototype system designed to reveal temp

9、oral patterns in text collections.Information visualization systems such as Envision 13, BEAD 1, LyberWorld 3, 4 and SPIRE 18 represent each document or group of documents with a glyph or icon, portraying various document attributes. Various methods have been explored for showing change over time in

10、 document-centric visualizations. See Section 3 below.However, a user may be less interested in documents themselves than in theme changes within the whole collection over time. For example, how did Shakespeare themes change during various periods of his life or in relation to contemporary events? S

11、uch information is difficult, if not impossible, to glean from most visualizations. A visualization that focuses on themes, rather than documents, could be more useful for such exploration.ThemeRiver provides users with a macro-view of thematic changes in a corpus of documents over a serial dimensio

12、n. It is designed to facilitate the identification of trends, patterns, and unexpected occurrence or nonoccurrence of themes or topics. In our prototype, we use time as the serial dimension. We provide contextual information through a timeline and markers for cooccurring events of interest. Figure 1

13、 shows a sample ThemeRiver visualization. This paper describes the design of ThemeRiver, walks through a sample information exploration session, and discusses results of formative usability testing.2. DesignOur major design goal was to provide a visualization of theme change over time. Consider usin

14、g a histogram to visualize these changes. In a histogram (such as the one shown in Figure 2), each bar represents a time slice, and color variations and size within the bar represent the relative strength of themes specific to that slice. However, understanding the histogram requires users to work a

15、t integrating the themes across time because the bars are anchored to a baseline and the position of a particular theme within the bars may vary considerably.Like a histogram, ThemeRiver uses variations in width to represent variations in strength or degree ofrepresentation. However, it connects the

16、 strength values in adjacent time slices with smooth and continuous curves. The horizontal flow of the river represents the flow of time. Colored currents that run horizontally within the river represent themes. Each vertical section of the river corresponds to an ordered time slice.The width of eac

17、h current changes to reflect the thematic strength for each time slice. For example, in Figure 1 the theme “ soviet ” increases in relative strength in June 1960 as indicated by the widening of the upper bright orange current. “ Soviet ” loses relative strength in July and August; thus the same curr

18、ent narrows in the next two time slices.“ Soviet ” then increasessignificantly in relative strength in September; the current widens proportionately.Currents maintain their integrity as a single entity over tim'e.sIf a theme ceases to occur in the documents for a period of time and then recurs,

19、the current likewise disappears and then reappears. Consistent color and relative position to other themes make theme currents easy to recognize. In Figure 1, the lower purple band depicts the changes in relative strength of the theme “ cane. ” The “ cane ” current occurs grows and shrinks over time

20、; “ cane ” occurs most strongly in March 1961.We believe that ThemeRiver' s continuous curveshave much to do with its usability. The Gestalt School of Psychology 8, founded in 1919 in Germany, theorized that with perception,“ the whole is greaterthan the sum of the parts.” Simply put, during the

21、perception process humans do not organize individual, low-level, sensed elements, but sense more complete “ packages ” that represent objects or patterns. In his recent book 6, Hoffman presents a compelling discussion of how our perceptual processes identify curves and silhouettes, recognize parts,

22、and group them together into objects. Numerous aspects of the image influence our ability to perceive these parts and objects, including similarity, continuity, symmetry, proximity, and closure. For example, it is easier to perceive objects that are bounded by continuous curves than those that conta

23、in abrupt changes 17.The vertical proximity of the river currents makes it easy for users to judge the relative width of currents and thus the relative strength of the themes. Similarly, symmetry around the horizontal axis of the river, a current, or group of currents makes it easier for users to pe

24、rceive flow patterns and changes. Widths of currents combine to show cumulative widening and narrowing, representing changing strength for the selected set of themes as a whole.Proceed ings of theIEEE Symposium on In formatio n Visualizati on 2000 (In foVis'00)0-7695- 0 804-9 /00 $10.00 2000 IEE

25、Ethe ability to show parent-child relationships with lines between related time bars 10. The DIVA system 12 uses animation to show how particular measured values change in relation to the temporal flow of a video. To help groups collaborating to create a document or other artifact, the Timewarp syst

26、em developed at Xerox PARC 2 lets users view and edit multiple timelines of the changing state of that artifact. The metaphor used is similar to a state diagram, with lines connecting state nodes and branches. Additional work on timelines includes Karam ' s 7 and Kullberg' s 9.We know of no

27、other systems that use the river metaphor to depict the passage of time. However, Tufte 16 presents a similar idea in an artist' s illustration showingtrends in music. In that illustration, width represents sales and proximity indicates influence of preceding styles. Our work differs in several

28、aspects, such as the use of color, the inclusion of contextual events, and the ability to generate the visualization automatically from a potentially very large collection of documents.Values for theme strength can be calculated various ways. For example, they might represent the number of documents

29、 containing the word. Because the river loses its continuity and structure if there are too few or too many themes, we created several theme subsets for exploration.We have implemented a proof-of-principle prototype and used it to explore data from multiple sources. Figure 1 portrays data from a col

30、lection of speeches, interviews, articles, and other text associated with Fidel Castro. The visualization includes the river, a timeline below the river, and markers for related historical events along the top. With ThemeRiver, users may? display topic and event labels? display time and event grid l

31、ines? display the raw data points? choose among drawing algorithms for the currents and river.Users may also display the associated time or theme name by simply moving the mouse across the image. In addition, users may pan and zoom to see other time periods or parts of the river and to see more deta

32、il or broader context. In this sample data set, we found several interesting correspondences between themes and events, such as the expansion of the“ oilbefore Castro confiscated American oil refineries (see Figure 1).3. Related Work4. Usability EvaluationEarly in ThemeRiver' s development, we c

33、arried outthaemsiemjpulset formative usability evaluation with two users.Questions we wanted to answer with this evaluation includedDo users understand the metaphor?Can they identify themes that are more often discussed?Proceed ings of theIEEE Symposium on In formatio n Visualizati on 2000 (In foVis

34、'00)0-7695- 0 804-9 /00 $10.00 2000 IEEEMany systems include features for viewing time. One common method is to show discrete time slices. For example, in the Spatial Paradigm for Information Retrieval and Exploration (SPIRE) Galaxy visualization 18, users may choose to progressively step throug

35、h time, showing only the icons for documents originating within each specified time period. Another common approach is to show time as an attribute of documents, as done in the Virginia Tech' s Envision system, which lets usersmap various metadata values, including date, to x-axis, y-axis, or co

36、lor, shape, or size graphical encodings 13.More similar to ThemeRiver in intent are systems that focus directly on time. The LifeLines system, developed jointly by the University of Maryland and IBM, has been used to visualize medical records and juvenile criminal records 14, 15. The visualization d

37、isplays time along the x-axis and uses the y-axis to categorize events. Bars depict duration for a given event, and graphical attributes such as color show event attributes. TmViewer uses a similar approach, adding? Does the visualization help them raise new questions about the data? Do they interpr

38、et details of the visualization in ways we had not expected? How does their interpretation of the visualization differ from that of a histogram showing the same data?The data were the Castro collection described above, focusing on the years 1960-1963. We represented the same data both in ThemeRiver

39、and in a histogram that we created using a spreadsheet. (See Figure 2.) We made the content of the histogram as similar as possible to ThemeRiver ' s. For example, the histogram depicted thematic content by months, using the same values that drive ThemeRiver. The month timeline was shown along t

40、he bottom and we added an event line to the histogram like the one in ThemeRiver.Usability evaluation began with a brief explanation of the purpose of the session, followed by an introduction to the data. Both participants viewed the data in both visualizations; one participant started first with th

41、eProceed ings of theIEEE Symposium on In formatio n Visualizati on 2000 (In foVis'00)0-7695- 0 804-9 /00 $10.00 2000 IEEEProceed ings of theIEEE Symposium on In formatio n Visualizati on 2000 (In foVis'00)0-7695- 0 804-9 /00 $10.00 2000 IEEEMvlnidmpvi-ah-Proceed ings of theIEEE Symposium on

42、In formatio n Visualizati on 2000 (In foVis'00)0-7695- 0 804-9 /00 $10.00 2000 IEEEProceed ings of theIEEE Symposium on In formatio n Visualizati on 2000 (In foVis'00)0-7695- 0 804-9 /00 $10.00 2000 IEEEfMf rM 叶阿iWMl* Ost teycn+5 alfi-vteT-gR yx=如呻boo 2TB WliW cwlor dr drtcn町席etedbon*fdrgorf

43、eodmhqived Fvn rtpBThrfkdb IrdhdtH M网些 nuMddi m «4却忡 n-.i-in OK :fl platen HWtWlwwirqporv ki|4 f.? '石".十 JTMFigure 2: Like ThemeRiverin Figure 1, this histogram uses the Castro collection data andProceed ings of theIEEE Symposium on In formatio n Visualizati on 2000 (In foVis'00)0-

44、7695- 0 804-9 /00 $10.00 2000 IEEEdepicts changes in thematic content over time.Proceed ings of theIEEE Symposium on In formatio n Visualizati on 2000 (In foVis'00)0-7695- 0 804-9 /00 $10.00 2000 IEEEProceed ings of theIEEE Symposium on In formatio n Visualizati on 2000 (In foVis'00)0-7695-

45、0 804-9 /00 $10.00 2000 IEEEmost discussedhemeRiver easy to un dersta nd. They also foundthat the barshistogram and one with ThemeRiver. We asked each participa nt questi ons about what they observed in each display.Examples of specific questi ons in clude? In July ' 62, what are the three theme

46、s? Where is a new theme introduced?Examples of more gen eral questi ons in clude? What looks interesting here want to explore? How would you like to change or manipulate the view?We captured verbal protocol duri ng this discussi on. At the end, we asked participants to complete a short questi onn ai

47、re, with feedback about the visualizati on and possible enhan ceme nts.From the verbal protocol and from user behavior, we observed that the users had no difficulty in un dersta nd- ing the metaphor. They were able to ide ntify themes that were stro ngly represe nted and able to un dersta nd the rel

48、ati on ship betwee n the width of the curre nt and theme strength. The visualization also triggered questions about the reasons behind certain theme strengths and patter ns. For exploratory visualizati ons, this is a good result; we believe that a visualizati on should help the user ide ntify questi

49、 ons of in terest to explore.Questi onn aire resp on ses showed that users foundThemeRiver useful, particularly for identifying macro tren ds. They told us that it was less useful for ide nti- fying minor trends because the curves tend to de- whae mphasiize very small values. We asked about the valu

50、e of the river metaphor, and users rated it highly as well. They observed that the conn ected ness of the river helped them follow a trend more easily over time tha n in the histogram; this result is compatible with the perception principles described by Ware 17.Users liked some features of the hist

51、ogram and rec- omme nded addi ng them to ThemeRiver. One such feature is the ability to see n umeric values that drive the histogram and river curre nts. One user expressed more trust in the histogram, because she“ knew”were exactly the data values, whereas she was not sure exactly what the data val

52、ues were in ThemeRiver. Her point is a valid one, especially because the curved linesProceed ings of theIEEE Symposium on In formatio n Visualizati on 2000 (In foVis'00)0-7695- 0 804-9 /00 $10.00 2000 IEEEProceed ings of theIEEE Symposium on In formatio n Visualizati on 2000 (In foVis'00)0-7

53、695- 0 804-9 /00 $10.00 2000 IEEEsents a sample usage scenario, illustrating the capabilities of the curre nt vers ion.We used ThemeRiver to explore the 1990 Associated Press (AP) n ewswire data from the TREC5 distri- buti on disks, a set of over 100,000 docume nts (see Figure 3). To explore the sel

54、ected themes in this collection, a user might beg in with a high-level survey of the visualization by panning along the course of the river.The user might look for wider curre nts that sig nal heavy use of a topic, such as the one for“ baghdad ” in Figure3. Changes in the color distribution of the r

55、iver signal cha nges in themes. We see such a cha nge in August 1990, when the“ kuwait ” current, which had vanished inlate July, sudde nly appears and rapidly wide ns. The user could also look for narrow currents in the river thatsig nal relatively light use of particular themes.In an earlier paper

56、, Hetzler et al. 5 explored the AP data set with a variety of our visual an alysis tools, focus ing on large theme cha nges surrou nding the Iraqi in vasi on of Kuwait on August 2. ThemeRiver also reflects these large theme cha nges. Near the right side of Figure 3, we see several curre nts that exp

57、a nd dramatic-of ThemeRiver do require that we in terpolate betwee n data points to produce the curves. We have added the capability for users to see the exact data points on dema nd.Although users liked the abstracti on to the whole collecti on and thus away from in dividual docume nts, both users suggested add ing features to access documents if desired. They wan ted the ability to see the total number of documents during any time period and to get the text of each docume nt on dema nd. They wan ted to select a curre nt and see the docume nts that con tributed to it.Users also wan ted

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论