[日语学习]Introduction to biological statistics.ppt_第1页
[日语学习]Introduction to biological statistics.ppt_第2页
[日语学习]Introduction to biological statistics.ppt_第3页
[日语学习]Introduction to biological statistics.ppt_第4页
[日语学习]Introduction to biological statistics.ppt_第5页
已阅读5页,还剩80页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1,Biological statistics,Li, Xinhai,Note Assignment Language,2,Text books,Zar, J. H. 1999. Biostatistical Analysis. Fourth Edition. Prentice Hall: New Jersey, 663 pp.,3,Introduction to biological statistics,History Contents Data in biology Descriptive statistics,4,Biostatistics or biometry,“biostatistics“ and “biometry“ are sometimes used interchangeably “biometry“ is more often used of biological or agricultural applications “biostatistics“ is more often used of medical applications.,5,What is statistics?,Statistics is the science of describing or making inferences about the world from a sample of data. Descriptive statistics are numerical estimates that organize and sum up or present the data. Inferential statistics is the process of inferring from a sample to the population.,6,History,John Graunt (1620-1674) and William Petty (1623-1687): census, longevity, and mortality Blaise Pascal (1623-1662) and Pierre de Fermat (1601-1665), Jacques Bernoulli (1654-1705): probability theory Abraham de Moivre (1667-1754): combine the statistics with probability theory; approximate the normal distribution though the expansion of the binomial distribution Carl Friedrich Gauss (1777-1855) (Germany): least square Adolphe Quetelet (1796-1874) (Belgium): significance of constancy of large numbers (The Quetelet index is the internationally used measure of obesity) Florence Nightingale (1820-1910): graphic presentation of statistics,7,Francis Galton,Francis Galton (1822-1911) (father of biometry and eugenics): ANOVA, regression, correlation African Explorer and elected Fellow in the Royal Geographic Society Creator of the first weather maps and establisher of the meteorological theory of anticyclones Coined term “eugenics“ and phrase “nature versus nurture“ Developed statistical concepts of correlation and regression to the mean Discovered that fingerprints were an index of personal identity and persuaded Scotland Yard to adopt a fingerprinting system First to utilize the survey as a method for data collection Produced over 340 papers and books throughout his lifetime Knighted in 1909,/digitalcollections/hst/scientific-identity/fullsize/SIL14-G001-05a.jpg,Galton, F. (1869/1892/1962). Hereditary Genius: An Inquiry into its Laws and Consequences. Macmillan/Fontana, London. Galton, F. (1883/1907/1973). Inquiries into Human Faculty and its Development. AMS Press, New York.,8,Karl Pearson,Karl Pearson (1857-1936): continued in the tradition of Galton and laid the foundation for much of descriptive and correlational statistics. method of moments chi-square correlation,http:/www.economics.soton.ac.uk/staff/aldrich/New%20Folder/kpreader1.htm,The Grammar of Science (1892) The Life, Letters and Labours of Francis Galton (1930?),9,Ronald A. Fisher,Sir Ronald Aylmer Fisher, (17 February 1890 29 July 1962) an English statistician, evolutionary biologist, and geneticist. was described by Anders Hald as “a genius who almost single-handedly created the foundations for modern statistical science“1 and Richard Dawkins described him as “the greatest of Darwins successors“.2 Analysis of variance Maximum likelihood Fisher information,/wiki/Image:RonaldFisher.jpg,1 Hald, Anders (1998). A History of Mathematical Statistics. New York: Wiley. 2 Dawkins, Richard (1995). River out of Eden.,10,Society and publications,In 1901 Pearson, Weldon and Galton founded Biometrika, a “Journal for the Statistical Study of Biological Problems”. Until the 1940s, the application of statistics to biological questions began to have a profound impact on the scientific community. The biometrics section of the American Statistical Association to publish the Biometrics Bulletin, in 1945. In 1947, International Biometric Society (IBS) was established. Shortly thereafter, the IBS began publishing Biometrics.,11,Contents,Descriptive statistics Probability distribution Hypothesis testing Analysis of variance Linear regression Correlation,12,Contents,Nonparametric statistics Multivariate analysis Logistic analysis Generalized linear model,13,Data,Datum is one observation about the variable being measured. Data are a collection of observations. A population consists of all subjects about whom the study is being conducted. A sample is a sub-group of population being examined.,14,Statistics vs. Parameters,A parameter is a numerical quantity measuring some aspect of a population of scores. For example, the mean is a measure of central tendency. A statistic computed in samples is used to estimate parameters,15,Variables,Measurement variable Continuous variable Discontinuous variable (e.g. clutch size) Ranked variable Attributes (categorical or nominal variables),16,Derived variables,Ratio Index Rate,17,Accuracy and precision of data,Accuracy,Precision,Inaccuracy,18,Summarizing Qualitative Data,Frequency Distribution Relative Frequency Distribution Percent Frequency Distribution Bar Graph Pie Chart,19,A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several nonoverlapping classes.,The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.,Frequency Distribution,20,Frequency Distribution,Poor Below Average Average Above Average Excellent,2 3 5 9 1 Total 20,Rating,Frequency,Guests staying at Marada Inn were asked to rate the quality of their accommodations as being:,21,The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.,A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.,Relative Frequency Distribution,22,Percent Frequency Distribution,The percent frequency of a class is the relative frequency multiplied by 100.,A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.,23,Relative Frequency and Percent Frequency Distributions,Poor Below Average Average Above Average Excellent,.10 .15 .25 .45 .05 Total 1.00,10 15 25 45 5 100,Relative Frequency,Percent Frequency,Rating,.10(100) = 10,1/20 = .05,24,Bar Graph,A bar graph is a graphical device for depicting qualitative data.,On one axis (usually the horizontal axis), we specify the labels that are used for each of the classes.,A frequency, relative frequency, or percent frequency scale can be used for the other axis (usually the vertical axis).,Using a bar of fixed width drawn above each class label, we extend the height appropriately.,The bars are separated to emphasize the fact that each class is a separate category.,25,Poor,Below Average,Average,Above Average,Excellent,Frequency,Rating,Bar Graph,Marada Inn Quality Ratings,26,Pie Chart,The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data.,First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.,Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.,27,Marada Inn Quality Ratings,Pie Chart,28,Summarizing Quantitative Data,Frequency Distribution Relative Frequency and Percent Frequency Distributions Dot Plot Histogram Cumulative Distributions,29,Example: Hudson Auto Repair,Sample of Parts Cost for 50 Tune-ups,30,Frequency Distribution,Guidelines for Selecting Number of Classes,Use between 5 and 20 classes.,Data sets with a larger number of elements usually require a larger number of classes.,Smaller data sets usually require fewer classes,Use classes of equal width.,Approximate Class Width =,31,Frequency Distribution,For Hudson Auto Repair, if we choose six classes:,Approximate Class Width = (109 - 52)/6 = 9.5 10,32,Relative Frequency and Percent Frequency Distributions,33,Only 4% of the parts costs are in the $50-59 class.,The greatest percentage (32% or almost one-third) of the parts costs are in the $70-79 class.,30% of the parts costs are under $70.,10% of the parts costs are $100 or more.,Insights Gained from the Percent Frequency Distribution,Relative Frequency and Percent Frequency Distributions,34,Dot Plot,One of the simplest graphical summaries of data is a dot plot. A horizontal axis shows the range of data values. Then each data value is represented by a dot placed above the axis.,35,50 60 70 80 90 100 110,Cost ($),Dot Plot,Tune-up Parts Cost,36,Histogram,Another common graphical presentation of quantitative data is a histogram.,The variable of interest is placed on the horizontal axis.,A rectangle is drawn above each class interval with its height corresponding to the intervals frequency, relative frequency, or percent frequency.,Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.,37,Histogram,38,Cumulative frequency distribution - shows the number of items with values less than or equal to the upper limit of each class,Cumulative relative frequency distribution,Cumulative Distributions,Cumulative percent frequency distribution,39,Cumulative Distributions,Hudson Auto Repair,40,Stem-and-Leaf Display,Each digit on a stem is a leaf.,Each line in the display is referred to as a stem.,To the right of the vertical line we record the last digit for each item in rank order.,The first digits of each data item are arranged to the left of a vertical line.,It is similar to a histogram on its side, but it has the advantage of showing the actual data values.,A stem-and-leaf display shows both the rank order and shape of the distribution of the data.,41,Example: Leaf Unit = 0.1,If we have data with values such as,8.6 11.7 9.4 9.1 10.2 11.0 8.8,a stem-and-leaf display of these data will be,42,Example: Leaf Unit = 10,If we have data with values such as,1806 1717 1974 1791 1682 1910 1838,a stem-and-leaf display of these data will be,43,Descriptive statistics,Are the scores generally high or generally low? Where the center of the distribution tends to be located Three measures of central tendency Mode Median Mean,44,Mode,The most frequently occurring score Look at the simple frequency of each score Report mode when using nominal scale, the most frequently occurring category If you have a rectangular distribution, do not report the mode Unimodal, bimodal, multimodal, antimode,45,Example of Mode,In this case the data have two modes: 5 and 7 Both measurements are repeated twice,46,Example of Mode,Mode: 3 Notice that it is possible for a data not to have any mode.,47,Median,Score at the 50th percentile If normal distribution the Median is the same as the mode Arrange scores from lowest to highest, if odd number of scores the Median is the one in the middle, if even number of scores then average the two scores in the middle Used when have ordinal scale and when the distribution is skewed,48,Example of Median,Median: (4+5)/2 = 4.5 Notice that only the two central values are used in the computation. The median is not sensible to extreme values,49,Mean,Score at the exact mathematical center of distribution (average) Used with interval and ratio scales, and when have a symmetrical and unimodal distribution Not accurate when distribution is skewed because it is pulled towards the tail,50,Deviations around the Mean,The score minus the mean Include plus or minus sign Sum of deviations of the mean always equals zero (X-M)=0,51,Uses of the Mean,Describes scores Deviation of mean gives us the error of our estimate of the score, with total error equal to zero Predict scores Describe a scores location Describe the population mean () which is a parameter,52,Measures of Variability,Extent to which the scores differ from each other or how spread out the scores are Tells us how accurately the measure of central tendency describes the distribution Shape of the distribution,53,Why do we care about variability?,Where would you rather vacation, LA Bungalows, where the mean temperature is 24 degrees, or Sahara Condos where the mean temperature is 24 degrees? LA temperature range: day = 26 night = 22 Sahara temperature range: day = 40 night = 8,54,Range,Can report the lowest and highest value Or report the maximum difference between the lowest and highest Semi-interquartile range used with the median: one half the distance between the scores at the 25th and 75th percentile,55,Variance,Uses the deviation from the mean Remember, the sum of the deviations always equals zero, so you have to square each of the deviations S2X= sum of squared deviations divided by the number of scores Provides information about the relative variability,56,Some Limits,It isnt the average deviation Interpretation doesnt make sense because: Number is too large And it is a squared value,57,Standard Deviation,Take the square root of the variance SX Uses the same units of measurement as the raw scores How much scores deviate below and above the mean,58,The standard deviation,standard deviation the mean of deviations from the mean (sort of),(lowercase sigma) is the population standard deviation.,the sample standard deviation,S,(s-hat) is the sample estimate of ,59,The deviation (definitional) formula for the population standard deviation,The larger the standard deviation the more variability there is in the scores The standard deviation is somewhat less sensitive to extreme outliers than the range (as N increases),60,The deviation (definitional) formula for the sample standard deviation,Whats the difference between this formula and the population standard deviation? In the first case, all the Xs represent the entire population. In the second case, the Xs represent a sample.,61,Standard Deviation: Example,26.8,106.8,62,Calculating S using the raw-score formula,To calculate X2 you square all the scores first and then sum them To calculate (X)2 you sum all the scores first and then square them,63,Estimating the population standard deviation from a sample,S, the sample standard, is usually a little smaller than the population standard deviation. Why? The sample mean minimizes the sum of squared deviations (SS). Therefore, if the sample mean differs at all from the population mean, then the SS from the sample will be an understimate of the SS from the population Therefore, statisticians alter the formula of the sample standard deviation by subtracting 1 from N,64,Population and sample variance and standard deviation,When we have data from the entire population we use to compute X using the same formulas We us

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论