




Box Plot (Box-and-whisker Plot)

Information source: http://informationandvisualization.de/blog/box-plot, submitted by fabian on Sun, 02/03/2008 - 23:00.

In my opinion the box plot is one of the most underestimated views in currently fashionable information visualization approaches. Modern chart libraries come with a lot of chart types, but almost all of them lack the box plot. So I decided to write this article to put the brilliant box plot back on the map and to provide a CSS/JavaScript solution for displaying box plots.

I checked all chart libraries mentioned in this outstanding article of Smashing Magazine (and a few more), and this is what I found: of 15 web-based charting engines, only one provides box plots. Libraries and tools are a little more encouraging (although the listing is definitely not exhaustive): of 8 investigated, only one doesn't come with box plots. (You'll find detailed tables at the end of the article.)

History:
The box plot goes back to John Tukey, who published this efficient method for displaying robust statistics in 1977 (Tukey77).

Best Practice:
The most impressive usage of a box plot I found is in the World Freedom Atlas. Let's first look at the view at the top, which also displays a box plot. The red dot in the blue bar is the median; the lines at the left and right represent the lower and upper quartiles (I will explain later which numbers a box plot actually displays); 0 and 40 are the minimal and maximal possible values. If you move the mouse over a country on the map, it is highlighted in the box plot, as you can see in the picture above: the country with a raw political rights score of 34 (it's Mongolia, by the way). Another very nice feature is the stacking of elements with the observed value on top of the blue bar. This shows for each value how many countries have this score and thereby gives an immediately comprehensible picture of the underlying distribution.
But, of course, this is only possible if you have a predictable number of values to stack (otherwise you cannot determine the necessary height) and if these values are integers (otherwise there are infinitely many possible positions for the values and stacking is not possible).

The box plot at the bottom of the above picture follows the recommendation of Edward Tufte (Tufte01). Again, the red dot represents the median; the ends of the lines towards the red dot are the lower and upper quartile, respectively; the ends of the lines towards the borders are the minimum and maximum values. Another nice feature is the yellow line showing the development of the displayed index (the raw political rights score) over the last years for the selected country (the currently selected year is displayed in darker blue). The values for the selected country in each year are connected by the yellow line. As one can see immediately, the score is decreasing slightly.

As mentioned above, this is probably the most stunning example of a box plot; everything is done correctly. Still, in my opinion, there are some drawbacks to Tufte's recommendation for box plots. Usually, a box plot is displayed in the following way (this one was created with the data exploration tool KNIME, for which I implemented the box plot myself):

Tufte's recommendation is based on the notion of avoiding chart junk and the principle of maximizing data ink, i.e. the ink in the drawing should be used to display data and not decoration or junk. While this is certainly a good guideline, the result is sometimes difficult to read. In the example of the World Freedom Atlas, it is only possible to decipher the actual values by looking at the box plot to the left. Maximizing the data ink sometimes minimizes the readability. In the example below definitely more "ink" was used, but in my opinion the essential information (the key values and their exact numbers) is immediately visible.
This might not be as appealing as the box plot above, but if you are really interested in the values, this version might fit your needs better. (Maybe because I'm more familiar with it?)

Theory:
But what is this all about? What values are displayed in a box plot? What are the advantages of a box plot? The image below should at least clarify the terms used; their meaning is explained afterwards. A small example should make things clear. Consider a small village with 25 inhabitants. This is what they earn and the resulting box plot:

As you can see, the basic idea is to sort the data and then select the minimum, the maximum and the values at the corresponding positions: the lower quartile Q1 (0.25), the median (0.5) and the upper quartile Q3 (0.75). Why are these values considered robust statistical key values? To explain this, consider a similar village with one rich person and the following incomes:

Two things are important here:

1. Calculation of the quartiles (X0.25, X0.5, X0.75): Suppose we have 4 values. Then the position of the median would be 4 * 0.5 = 2, which is not the middle of four values. Actually, there is no value in the middle of four values, so we have to take the mean of the 2nd and 3rd values. If we have 5 values, the position of the median is 5 * 0.5 = 2.5. The ceiled value is 3, and the 3rd value is indeed in the middle of 5 values (2 above and 2 below). The same holds for the other quartiles. To sum up, the quartiles are calculated as follows:
   1. Calculate the position p (the number of values times 0.25, 0.5 or 0.75).
   2. Check whether p is an integer.
      yes: take the mean of the values at positions p and p+1
      no: take the value at position ceil(p)
   Almost all programming languages start counting at zero, so there the position has to be floored instead of ceiled to get the correct index, and if p is an integer the mean of the values at indices p and p-1 has to be taken.

2. The horizontal bars outside of the box in the middle (called whiskers, hence the name box-and-whisker plot) are not always the maximum and the minimum. The whiskers mark the minimum and maximum unless these values lie more than 1.5 * IQR away from the box. The IQR is the interquartile range: the distance between Q1 and Q3. Observations that lie more than 1.5 * IQR or even more than 3 * IQR outside the box are considered mild and extreme outliers, respectively. The picture below depicts the concept in a qualitative way (distances are not to scale):

And here the robust statistics become relevant. Let's compare the median with the mean (the mean is the sum of all values divided by the number of values).

Robust Statistics:
In the first case we have a median of 2,069.79 and a mean of 2,037.38, so they are quite comparable. In the second case the mean of 2,303.437 suggests that the village is richer, while the median incorruptibly keeps telling the truth (1,996.705), and the one rich person is displayed as what they are in this village: an outlier. The same holds for the other key values, of course.

Summary
At this point we can summarize what a box plot actually displays:
- At least 25% of all values are below the lower quartile Q1.
- At least 50% of all values are below (or above) the median.
- At least 25% of all values are above the upper quartile Q3.
- The box contains 50% of the data (Q3 (75%) - Q1 (25%) = 50%).
- From the size of the box and the distance of the whiskers you can read the distribution of the values.
- Between the median and each quartile lie 25% of the data (75% - 50% = 25% and 50% - 25% = 25%), i.e. the position of the median inside the box indicates whether the values are more concentrated towards the upper or the lower quartile.
- Not to mention the outliers, which are those values that are far away from most of the other values.

Application:
In this section we provide a JavaScript and CSS based box plot, which hopefully increases the usage of box plots.
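As a rough illustration of the computation described in the Theory section (the quartile rule in its zero-based form, the 1.5 * IQR whisker limit and the two outlier classes), here is a minimal JavaScript sketch. It is not the script behind createBoxPlot: the names quantile and boxPlotStats and the sample incomes are made up for this example. It also assumes that the 1.5 * IQR and 3 * IQR limits are measured from Q1 and Q3 and that a whisker ends at the most extreme value still inside the 1.5 * IQR limit, which is Tukey's usual convention.

```javascript
// Value at fraction q (0 < q < 1) of an ascending-sorted array, following the
// article's rule translated to zero-based indices.
function quantile(sorted, q) {
  const p = sorted.length * q;
  if (Number.isInteger(p)) {
    // Integer position: mean of the values at zero-based indices p-1 and p.
    return (sorted[p - 1] + sorted[p]) / 2;
  }
  // Otherwise ceil(p) in one-based counting equals floor(p) in zero-based counting.
  return sorted[Math.floor(p)];
}

// Hypothetical helper: key values of a box plot for a non-empty numeric array.
function boxPlotStats(values) {
  if (values.length === 0) {
    throw new Error("boxPlotStats needs at least one value");
  }
  const sorted = [...values].sort((a, b) => a - b); // numeric sort on a copy
  const q1 = quantile(sorted, 0.25);
  const median = quantile(sorted, 0.5);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;

  // Assumption: the limits are measured from the box edges Q1 and Q3.
  const innerLow = q1 - 1.5 * iqr;
  const innerHigh = q3 + 1.5 * iqr;
  const outerLow = q1 - 3 * iqr;
  const outerHigh = q3 + 3 * iqr;

  const inside = sorted.filter((v) => v >= innerLow && v <= innerHigh);
  const extremeOutliers = sorted.filter((v) => v < outerLow || v > outerHigh);
  const mildOutliers = sorted.filter(
    (v) => (v < innerLow && v >= outerLow) || (v > innerHigh && v <= outerHigh)
  );

  return {
    min: sorted[0],
    max: sorted[sorted.length - 1],
    q1, median, q3, iqr,
    whiskerLow: inside[0],                  // most extreme values that are
    whiskerHigh: inside[inside.length - 1], // still within 1.5 * IQR of the box
    mildOutliers,
    extremeOutliers,
  };
}

// Made-up incomes: eight similar values and one very large one.
const incomes = [1, 2, 3, 4, 5, 6, 7, 8, 100];
const stats = boxPlotStats(incomes);
const mean = incomes.reduce((sum, v) => sum + v, 0) / incomes.length;
console.log(stats.median, stats.whiskerHigh, stats.extremeOutliers); // 5 8 [ 100 ]
console.log(mean > stats.q3); // true: the mean is pulled above the whole box
```

With the single large value in the sample, the median stays at 5 while the mean ends up above Q3, which is the robustness effect from the two-village comparison in miniature.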
We start with the JavaScript, which sorts the numbers, then accesses and calculates the key values and detects the outliers. Afterwards these values are displayed with the help of CSS and by inserting elements into the DOM tree. This example page shows how it works. If you want to use a box plot on your page, you just have to import the CSS and the JavaScript and then call

createBoxPlot(dataArray, height, divID);

where
- dataArray is your numeric data as an array,
- height is the desired height of the box plot, and
- divID is the id of the div that will contain the box plot.

That's it. You can position the div with divID as you like. So far, we tested it with Firefox, Safari, Opera, and Internet Explorer 7 on Windows and Ma