Statistics: Foreign-Language Translation

Original title: fundamentals_of_statistics

Measures of central tendency and location: mean, median, mode, percentiles, quartiles and deciles.

The example data (n = 11), as recorded and sorted in ascending order:

x:         53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53
sorted x:  53, 53, 53, 55, 57, 57, 58, 64, 68, 69, 70

The measures of central tendency are the mean, the median and the mode.

Mean ($\bar{x}$): for a given variable, the mean is the sum of the values divided by the number of values, $\bar{x} = \sum x_i / n$. In this case n = 11, so we add all of the values together and divide by 11: $\sum x_i = 657$ and $\bar{x} = 657/11 = 59.73$.

Median: the value in a distribution of a variable's responses such that one half of the values lie above it and one half lie below it. To find the median, we first put the data in ascending order (smallest to largest). If n is odd, the median is simply the middle observation; if n is even, it is the average of the two middle observations. In this case n is odd, so the median is the middle observation of the sorted values (the 6th value): 57.

Mode: the value that occurs most frequently. If two different values are tied as the most frequent, the data are said to be bimodal; if there are more than two modes, the distribution is said to be multimodal. In this case the value that occurs most often is 53, so the mode is 53.

The measures of location are percentiles, quartiles and deciles.

Percentile: the pth percentile is a value such that at least p percent of the observations are less than or equal to it and at least (100 - p) percent of the observations are greater than or equal to it. To calculate the percentiles $P_1, P_2, \ldots, P_{99}$, we use an index i:

$i = (p/100)\,n$

If i is a whole number (an integer), the pth percentile is the average of the values in positions i and i + 1 of the sorted data. If i is not a whole number, we always round up: the position of the pth percentile is the next whole number greater than the computed index.
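As a quick cross-check of the hand calculations for the mean, median and mode, here is a minimal sketch using Python's standard-library `statistics` module (the variable names are illustrative, not from the original text). It uses `multimode`, available from Python 3.8, instead of `mode` so that bimodal or multimodal data return every most-frequent value.

```python
from statistics import mean, median, multimode

# Example data from the text (n = 11), in the order recorded.
x = [53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53]

x_bar = mean(x)         # sum of the values / number of values = 657 / 11
x_med = median(x)       # sorts internally; n is odd, so this is the 6th sorted value
x_modes = multimode(x)  # all values tied for the highest frequency

print(sorted(x))
print(f"mean = {x_bar:.2f}, median = {x_med}, mode(s) = {x_modes}")
```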
For example, $i(P_{50}) = (50/100)(11) = 5.5$, which rounds up to 6. So we count from the lowest value of the sorted data up to position 6. Because the computed i was not a whole number, we rounded up to find the value where at least 50% of the values are equal to or lower than it and at least 50% are equal to or higher than it. In this case, the 50th percentile is the 6th value: 57.

Does this look familiar? The 50th percentile is the same thing as the median. What does it tell us? In this distribution, at least 50% of the observations are less than or equal to 57 and at least 50% of the observations are greater than or equal to 57.

$i(P_{80}) = (80/100)(11) = 8.8$, which rounds up to 9. Again, since the index is not a whole number, we round up and count from the lowest value of the sorted data to position 9. The 9th value is 68, so the 80th percentile is 68.

Because this dataset has 11 observations, the computed index $11p/100$ is never a whole number for p between 1 and 99. However, if we remove the value 70 and create a new distribution (n = 10), we can see an example:

53, 53, 53, 55, 57, 57, 58, 64, 68, 69

$i(P_{30}) = (30/100)(10) = 3$. This is a whole number, so we take the 3rd and 4th values and average them: (53 + 55)/2 = 54. So the 30th percentile is 54.

Returning to our original data distribution: quartiles are special cases of percentiles, $Q_1 = P_{25}$, $Q_2 = P_{50}$ and $Q_3 = P_{75}$. These three values divide the distribution into four equal quarters.

$i(Q_1) = (25/100)(11) = 2.75$, which rounds up to 3, so Q1 is the 3rd value: 53.
$i(Q_2) = (50/100)(11) = 5.5$, which rounds up to 6, so Q2 is the 6th value: 57.
$i(Q_3) = (75/100)(11) = 8.25$, which rounds up to 9, so Q3 is the 9th value: 68.

Measures of dispersion or variability: range, interquartile range (IQR), variance, standard deviation and coefficient of variation.

Range: tells us how wide the span is from the minimum value to the maximum value.
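The index rule for percentiles can be written as a small function. This is a sketch under stated assumptions: `percentile` is a hypothetical helper name, p is assumed to be an integer from 1 to 99, and the index is kept in integer arithmetic (p·n compared with multiples of 100) to avoid floating-point rounding. Library routines such as NumPy's `percentile` default to an interpolation rule, so they can return different values for the same data.

```python
def percentile(values, p):
    """pth percentile by the index rule i = (p/100)n, for integer p in 1..99.

    If i is a whole number, average the values in positions i and i + 1;
    otherwise round i up to the next whole number and take that position.
    Positions are 1-based, as in the text.
    """
    data = sorted(values)
    n = len(data)
    scaled = p * n                          # equals 100 * i, kept as an integer
    if scaled % 100 == 0:                   # i is a whole number
        i = scaled // 100
        return (data[i - 1] + data[i]) / 2  # values in positions i and i + 1
    position = -(-scaled // 100)            # ceiling of i without floats
    return data[position - 1]


x = [53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53]
print(percentile(x, 50), percentile(x, 80))        # 57 68
print([percentile(x, p) for p in (25, 50, 75)])    # [53, 57, 68]

x_without_70 = [v for v in x if v != 70]           # the n = 10 example
print(percentile(x_without_70, 30))                # 54.0
```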
Range = max - min. In this instance, the range is 70 - 53 = 17.

Interquartile range (IQR): tells us how wide the span of the middle 50% of the data is. IQR = Q3 - Q1. In this case Q1 = 53 and Q3 = 68 (the 3rd and 9th sorted values), so IQR = 68 - 53 = 15. We will use the IQR in later procedures, so we want to keep this value.

Sample variance ($s^2$): measures how far the values deviate from the mean, based on the sum of the squared deviations. A large variance indicates that the values are widely spread; a small variance indicates that they are tightly clustered. We square the deviations because the unsquared deviations from the mean always sum to zero (the -0.03 below is only rounding error). Let's see how this works:

x      x - x̄     (x - x̄)²
53     -6.73      45.29
53     -6.73      45.29
53     -6.73      45.29
55     -4.73      22.37
57     -2.73       7.45
57     -2.73       7.45
58     -1.73       2.99
64      4.27      18.23
68      8.27      68.39
69      9.27      85.93
70     10.27     105.47
Sum:   657   -0.03   454.18

$\bar{x} = 657/11 = 59.73$

We use the formula

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{454.18}{10} = 45.42$

The sum of squared deviations for these data is 454.18, so the variance is 45.42. For our purposes here, computing the variance is just a step toward computing the standard deviation.

Sample standard deviation (s): the positive square root of the variance. So the formula for the sample standard deviation is

$s = \sqrt{s^2} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

For these data, $s = \sqrt{45.42} \approx 6.74$.

Population variance ($\sigma^2$) uses the same numerator, but N instead of n - 1 in the denominator: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$. Since we rarely have information about the entire population, we almost always use the formula for the sample variance, $s^2$.

Population standard deviation: $\sigma = \sqrt{\sigma^2}$. Since we rarely have information from the entire population, we use the formula for the sample standard deviation, s.

Coefficient of variation: tells us what percent of the sample mean the sample standard deviation is, $CV = (s/\bar{x}) \times 100\%$. This number is relative and is only useful for comparing the variability of two or more variables.

Suppose I have two samples and I want to know which sample has more variability. If both samples have the same mean, the one with the higher standard deviation has the greater variability. However, if they have different means, I need to calculate the coefficient of variation to determine which one has the most variability.
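The dispersion measures can be reproduced the same way. A minimal sketch in Python: Q1 and Q3 are assumed to be the 3rd and 9th sorted values given by the percentile index rule, and the standard-library `statistics.variance` and `statistics.pvariance` are used only as a cross-check on the hand formulas (n - 1 and N denominators).

```python
import math
import statistics

x = [53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53]
n = len(x)
xs = sorted(x)
x_bar = sum(x) / n

data_range = max(x) - min(x)            # max - min
q1, q3 = xs[2], xs[8]                   # 3rd and 9th sorted values (i = 2.75 and 8.25, rounded up)
iqr = q3 - q1

ss = sum((v - x_bar) ** 2 for v in x)   # sum of squared deviations from the mean
sample_var = ss / (n - 1)               # s^2 divides by n - 1
sample_sd = math.sqrt(sample_var)       # s is the positive square root of s^2
population_var = ss / n                 # sigma^2 divides by N, if x were the whole population
cv = sample_sd / x_bar * 100            # s as a percentage of the mean

# The library functions use the same n - 1 and N denominators.
assert math.isclose(sample_var, statistics.variance(x))
assert math.isclose(population_var, statistics.pvariance(x))

print(f"range = {data_range}, IQR = {iqr}")
print(f"sum of squares = {ss:.2f}, s^2 = {sample_var:.2f}, s = {sample_sd:.2f}")
print(f"sigma^2 (treating x as a population) = {population_var:.2f}")
print(f"CV = {cv:.2f}%")
```

For these data it prints a range of 17, an IQR of 15, a sum of squares of 454.18, s² ≈ 45.42, s ≈ 6.74, σ² ≈ 41.29 and CV ≈ 11.28%.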
For example, compare $\bar{x} = 458$, s = 112 with $\bar{x} = 687$, s = 192. The coefficients of variation are 112/458 ≈ 24.5% and 192/687 ≈ 27.9%, so the second sample has more variability relative to its mean.

Standardized data and detecting outliers

Z-score: $z = \frac{x - \bar{x}}{s}$. The z-score tells us how many standard deviations a value is from the mean. On the normal curve, the mean is at the highest point and the curve tails off symmetrically in both directions. The sign of the z-score tells us in which direction the value lies from the mean on the normal curve: negative values are to the left, and positive values are to the right.

Standardizing scores: on the standard normal curve, the mean is zero and the standard deviation is 1. The distribution is bell-shaped and symmetric. The area under the curve is 1, and the tails of the curve extend infinitely; they never actually touch the horizontal axis. The highest point on the curve is at the mean.

Returning to our data, let's calculate the z-score for each of the values.

Empirical rule: used when the distribution is assumed or known to be approximately normal. Approximately 68% of the values fall within 1 SD of the mean, approximately 95% fall within 2 SD of the mean, and approximately 99.7% fall within 3 SD of the mean.

Chebyshev's theorem does not require the data to have a normal distribution. It says that at least $(1 - 1/z^2)$ of the values fall within z standard deviations of the mean: $1 - 1/1^2 = 0$, $1 - 1/2^2 = 0.75$, $1 - 1/3^2 = 0.8889$, $1 - 1/4^2 = 0.9375$, $1 - 1/5^2 = 0.96$. So we cannot make any statement about the percentage of values within 1 SD of the mean, but at least 75% of the values fall within 2 SD of the mean and at least 88.9% fall within 3 SD of the mean. We use Chebyshev's theorem to estimate the variation in a distribution when n < 30, when the shape of the distribution is unknown, or when the distribution is assumed to be non-normal.

Outliers: suspect or extreme data values that must be identified and scrutinized. If they are instances of incorrectly entered data, they should be corrected.
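A minimal sketch of the coefficient-of-variation comparison, the z-score transformation and Chebyshev's bound, in Python. The helper names are hypothetical; the z-scores use the sample mean and sample standard deviation s, matching the formula $z = (x - \bar{x})/s$ above.

```python
from statistics import mean, stdev

def cv_percent(x_bar, s):
    """Coefficient of variation: s as a percentage of the mean."""
    return s / x_bar * 100

def z_scores(values):
    """Number of sample standard deviations each value lies from the mean."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def chebyshev_min_fraction(z):
    """Chebyshev's lower bound on the share of values within z SDs of the mean."""
    return 1 - 1 / z ** 2

cv_a, cv_b = cv_percent(458, 112), cv_percent(687, 192)
print(f"CV = {cv_a:.1f}% vs {cv_b:.1f}%")   # CV = 24.5% vs 27.9%

x = [53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53]
for value, z in zip(x, z_scores(x)):
    print(value, round(z, 2))                # negative: below the mean; positive: above

for z in range(1, 6):
    print(z, round(chebyshev_min_fraction(z), 4))
```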
If the value was entered correctly and is a valid number, it should remain in the dataset as part of the initial analysis.

When we use the z-score method to identify outliers, we treat any value whose z-score has an absolute value greater than 3.0 (that is, less than -3.0 or greater than +3.0) as an outlier. Before we proceed with data analysis, we need to examine all outliers for accuracy. If we determine that a value is valid, we often run two sets of analyses: one with the outlier and one without.

Another way to identify outliers, related to the IQR, is the five-number summary: minimum, Q1, Q2, Q3 and maximum. These values feed into upper and lower limits (fences), which we graph in a box plot.

Five-number summary:
Minimum   53
Q1        53
Q2        57
Q3        68  (the 9th sorted value)
Maximum   70

The advantage of the box plot is that, unlike z-scores, it is not influenced by outliers or extreme values.

The box plot's whiskers show the range of the data within the inner fences. From left to right, the fences are:

Q1 - 3(IQR)       lower outer fence
Q1 - 1.5(IQR)     lower inner fence
Q1, median, Q3    the box
Q3 + 1.5(IQR)     upper inner fence
Q3 + 3(IQR)       upper outer fence

Any values between the inner and outer fences are “unusual,” and any values beyond the outer fences are “outliers.”

Advantage of using the box plot method as well as the z-score method: the box plot method is not influenced by extreme values in the same way that the mean and the standard deviation are. It is said to be a more conservative method of evaluating outliers.
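Both outlier screens can be sketched together in Python. This is an illustration under stated assumptions: the helper names are hypothetical, quartiles use the percentile index rule from earlier, values strictly outside a fence are the ones flagged, and the extra value 100 near the end is a made-up point added only to show how the labels behave; it is not part of the original data.

```python
from statistics import mean, stdev

def index_rule(sorted_data, p):
    """pth percentile by the text's rule: i = p*n/100; average positions i and
    i + 1 if i is whole, otherwise round i up (1-based positions, integer p)."""
    scaled = p * len(sorted_data)
    if scaled % 100 == 0:
        i = scaled // 100
        return (sorted_data[i - 1] + sorted_data[i]) / 2
    return sorted_data[-(-scaled // 100) - 1]

def five_number_summary(values):
    xs = sorted(values)
    return xs[0], index_rule(xs, 25), index_rule(xs, 50), index_rule(xs, 75), xs[-1]

def fences(q1, q3):
    """(lower outer, lower inner, upper inner, upper outer) at 3 and 1.5 IQR."""
    iqr = q3 - q1
    return (q1 - 3 * iqr, q1 - 1.5 * iqr, q3 + 1.5 * iqr, q3 + 3 * iqr)

def box_plot_labels(values):
    """Label each value 'outlier', 'unusual' or 'ok' using the fences."""
    _, q1, _, q3, _ = five_number_summary(values)
    lo_out, lo_in, hi_in, hi_out = fences(q1, q3)
    labels = []
    for v in values:
        if v < lo_out or v > hi_out:
            labels.append((v, "outlier"))   # beyond the outer fences
        elif v < lo_in or v > hi_in:
            labels.append((v, "unusual"))   # between inner and outer fences
        else:
            labels.append((v, "ok"))
    return labels

def z_outliers(values, cutoff=3.0):
    """Values whose z-score has an absolute value greater than the cutoff."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) / s > cutoff]

x = [53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53]
summary = five_number_summary(x)
print(summary)                          # (53, 53, 57, 68, 70)
print(fences(summary[1], summary[3]))   # (8, 30.5, 90.5, 113)
print(z_outliers(x))                    # []

with_extra = x + [100]                  # hypothetical extra value, not from the text
print(box_plot_labels(with_extra)[-1])  # (100, 'unusual')
print(z_outliers(with_extra))           # []
```

With the hypothetical extra point, the fence method labels 100 as unusual while its z-score (about 2.8) stays under the 3.0 cutoff, partly because the extreme value itself inflates the standard deviation.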
