




已阅读5页,还剩12页未读, 继续免费阅读
版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
探索在软件工程项目中努力和持续时间的关系 专业班级:网络工程06-1 姓名:齐子龙 学号:200607030131郑州轻工业学院本科毕业设计(论文)英文翻译题 目 探索在软件工程项目中努力和 持续时间的关系学生姓名 齐子龙 专业班级 网络工程06-1 学 号 200607030131 院 (系) 计算机与通信工程学院 指导教师(职称) 张建伟(副教授) 完成时间 2010年5月10日 7英文原文Exploring the Relation Between Effort and Duration in Software Engineering ProjectsSerge Oligny, Pierre Bourque, Alain Abran, Bertrand FournierUniversit du Qubec MontralC.P. 8888, Succ. Centre-VilleMontral, Qubec, Canada H3C 3P8Email: oligny.serge, bourque.pierre, abran.alain, fournier.bertranduqam.caAbstractThis paper presents a confirmatory analysis of empirical models that predict software engineering project duration based on project effort, based on a more recent and much larger sample. The models are based on the analysis of project data provided by release 4 of the International Software Benchmarking Standards Group (ISBSG) repository. Duration models are built for subsets of projects using personal computer, mid-range and mainframe development platforms.Keywords Software Engineering, Project Duration Models, Project Scheduling Models, ISBSG, International Software Benchmarking Standards Group.1. Introduction and ContextMany parametric models based on project effort have been proposed in the literature 4, 7 to predict the duration of software development projects. A wide selection of tools that incorporate software duration estimation models can also be found on the market 3, 8.In spite of affirmations from vendors 6, studies have shown mixed results regarding the accuracy of these parametric duration models 2, 7.However, one must note that the data sets used to conduct these studies are often small and not too recent. Seven of the eight studies discussed in 7 were published over ten years ago. Ferens and Daly 2 present a study published in 1990 and discuss three other assessments published in 1989 or 1990.Of the 21 empirical duration models analyzed in 7, 12 of them are derived from samples containing 20 data points or fewer. Ferens and Daly 2 discuss the results of three other studies that assess parametric duration models. These studies are conducted on samples containing one, eight and twelve projects while their own study is based on a sample oftwenty-one defense projects which cannot be described because of “contractor-proprietary data restrictions”.This does not mean that these types of models should not be further investigated by researchers or applied by practitioners. It does imply though, that, software duration estimation is a somewhat complex problem and that applying these models correctly requires much expertise and commitment 11.Within this context, this paper proposes to reexamine the relationship between overall project effort and duration:a) based on a relatively large sample of data, using release 4 of ISBSG database 5, containing data on 396 software projects completed between 1989 and 1996;b) in the perspective of obtaining a “first order” or overall estimate of project duration rather than a detailed breakdown of project duration at the phases or activity level;c) to verify if the development platform used, as an indicator of the general approach to project development, bears an impact on project duration;d) to compare the overall characteristics of the empirical relationship based on sample describe in a) above with the overall characteristics of other known and comparable models.These goals will be reached by analyzing the ISBSG sample (section 2). Three empirical models, for personal computer, mid-range and mainframe development platform, will be presented in section 3. A brief conclusion and further opportunities for research are presented in section 4.2. Analysis of the Data SampleAmong the 396 projects in release 4 of the ISBSG repository, projects showing the following characteristics were selected: No reasonable doubt as to data validity; that is, the ISBSG has not flagged the selected project ashaving uncertain data and has retained it for its own analyses; Known effort ; Known duration; Known platform;312 projects satisfied all these criteria. Basic descriptive statistics are shown in Table 1 and a scatter plot of the data is shown in Figure1.Figure 1 suggests that the relation between effort (E) and duration (D) is non-linear. Furthermore, a greater degree of scatter is observable for projects requiring larger effort.Taken as is, such data distributions offer weak support for linear regression. It was thus deemed appropriate to work on a mathematical transformation of the data. A Log transform was therefore applied successfully in the sense that the Log(duration) and the Log(effort) data both follow a normal distribution. Test of normality confirm this at the 95% level.As shown in Figure 2, the Log transformations renders the relation more linear and reduces the extent of scatter for high levels of effort. The Logtransformed values will therefore be used to derivean empirical model linking project effort to project Duration.3. Deriving duration models for each development platformFollowing a recommendation in 10, duration models are built for more homogeneous subsets of data rather than using the entire data sample. From the 312 projects meeting the criteria presented in section 2, 208 (67%) developed software for a mainframe platform, 65 (21%) for a mid-range platform and 39 (12%) for a personal computer platform.We begin by presenting detailed results for the mainframe platform projects. An identical analysis was performed for projects developing software for the other two platforms and the results are presented subsequently.A linear regression was performed on the Logtransformed values of effort and duration for the 208 mainframe projects. Selected results from this regression and from the regressions for mid-range and personal computer data are presented in Table 2.Analysis of residuals against both Log(effort) and predicted Log(duration) shows that the residuals are randomly distributed over the range of the independent variable (Log(effort) and that thevariance of the residuals is constant over the range of the dependent variable (predicted Log(duration). It is thus held that the resulting linear regression model is acceptable.The empirical model linking project effort and duration for the mainframe platform projects can thus be characterized by the following equations. The mean predicted value of Log(duration) is:Log (D) = (0.366*Log (effort) - 0.339 (1)or, in the traditional multiplicative form:D = 0.458 * Effort0.366 (2)Based on this sample, this empirically derived model explains slightly over half the variance in project duration.Regression analyses were conducted separately for the 65 mid-range projects and for the 39 personal computer projects. The four hypotheses of linear regression were satisfied in both cases. For the midrange platform projects, the resulting multiplicative equation is:D = 0.548 * E0.360 (3)The data for the personal computer platform projects is much more scattered and the relation is weaker.The resulting multiplicative equation for this data set is:D = 1.936 * E0.201 (4)Given equations (2, 3 and 4) describing the relation between effort and duration for three classes of projects, it is legitimate to ask whether or not there is a significant difference between the models. Figure 3 shows that the relation for projects developing software for mainframe and mid-range platforms appears to be quite similar, while the relation for projects developing software for personal computer platforms seems to be different.To answer this question in a more formal way, we used the statistical approach described by Neder etal. in 9. This approach is similar to regression modeling using indicator variables and interaction terms to test the equality of equations for different classes of a qualitative variable.Based on this analysis, identity of the regression equations (at the 95% level) is not rejected for midrange and mainframe platform data. There is, however, a significant difference (Students t306 = 2.385, p = 0.0177) when comparing the identity of the slopes for projects developing software for a personal computer (PC) and mainframe platform.4. Conclusion and Further researchEstimation of software project duration is a difficult and key problem in software engineering which can only be solved with an inherently white-box approach. The credibility of an estimate depends entirely on the transparency of the method, data, definitions and assumptions that were used to derive this estimate.Based upon the recent and relatively large release 4 ISBSG data set, this paper showed that: The relation between effort and duration is nonlinear and the exponential term is in the range of 0.3 to 0.4 which is comparable to other duration models including COCOMO 1. Interestingly enough, this is in spite of the roughly 20 years that separate the projects used to develop COCOMO from the ones in release 4 of the ISBSG data set. Secondly, the release 4 of the ISBSG data set to a very large extent comprises business systems, while most other studies are based on defense-oriented projects. The size of the samples of our analyses are much morerecent and larger than those of previous studies. The duration models developed in this paper show that project managers can derive a firstorder estimate of project duration from a valid estimate of project effort. However, other scheduling methods must be used in conjunction with duration models such as those developed in this paper since much of the variance is not explained by the equations developed in this paper. There is a significant difference in the relation between effort and duration based on the development platform of the project. In this regard, project developing software for personalcomputers are different from projects developing software for either a mid-range platform or a mainframe platform. However, there is no significant difference in the relation between effort and duration for projects developing software for mid-range and mainframe platformsIn the perspective of these conclusions, many axes of research could be pursued. The size of the ISBSG sample permits the subdivision of the data set into other homogeneous data sets using characteristics like project type (new developments or major enhancements), or business area (banking, manufacturing, telecommunications, etc.). Better context-sensitive project duration models, explaining more variance, for instance, could possibly be derived from these subsets of data. The usage of non-linear models and the incorporation of other variables in the modeling process should also be explored.A second topic to study would be created by turning this problem around, that is, by developing and assessing empirical models that estimate an acceptable product size and project effort from a given project duration. This is a very worthwhile topic because project duration is often fixed before the project begins in many organizations due to timeto-market or implementation considerations.AcknowledgmentsThis research was carried out at the Software Engineering Management Research Laboratory at the Universit du Qubec Montral. The laboratory is supported through a partnership with Bell Canada. Additional funding is provided by the Natural Sciences and Engineering Research Council of Canada. The opinions expressed in this paper are solely those of the authors.5. References1. B.W. Boehm. Software Engineering Economics, Prentice Hall, 1981.2. D.V. Ferens, B. A.Daly. A Comparison of Software Scheduling Methods. In: Reifer D, ed.Software Management. 4th ed. Washington: IEEE Computer Society Press, 1993.3. A.E. Giles, D. Barney. “Metrics Tools: Software Cost Estimation”. Crosstalk The Journal ofDefense Software Engineering, 8(6) 1995.4. F.J. Heemstra. “Software cost estimation”. Information and Software Technology 34(10): 627-639, 1992.5. ISBSG. Worldwide Software Development - The Benchmark. Victoria, Australia International Software Benchmarking Standards Group, 1997. Can be ordered at http:/www. .au6. C. Jones. Determining software schedules. Computer 28(2):73-75 1995.7. B.A. Kitchenham. “Empirical studies of assumptions that underlie software costestimationmodels”. Information and Software Technology 34(4): 211-218 1992.8. O. Mendes, A. Abran and P. Bourque. Function Point Tool Market Survey. Universit du Qubec Montral, 1996 Available at http:/www.lrgl.uqam.ca/publications/pdf/204.pdf9. J. Neder, W. Wasserman and M. Kutner, Applied Linear Regression Model. SecondEdition ed, Richard D. Irwin Inc., Homewood, IL, 1989.10. S. Oligny, P. Bourque and A. Abran. “An Empirical Assessment of Project DurationModels in Software Engineering”. In: The Eight European Software Control and MetricsConference (ESCOM97), Berlin Germany, May 1997.11. R. E.Park, W. B. Goethert and J.T. Webb. Software Cost and Schedule Estimating: AProcess Improvement Initiative. Pittsburgh, PA Software Engineering Institute, 1994英文翻译探索在软件工程项目中努力和持续时间的关系摘要本文介绍了该工程项目工期预测软件项目工作量的基础上,根据经验模型验证性分析的一个最近的,更大的样本。这些模型的基础上,通过发布国际软件基准标准组(ISBSG)提供的项目信息库的数据分析,建立了项目使用个人电脑,中端和大型机开发平台的持续时间模型子集。关键词 软件工程,项目持续时间模型,项目调度模型,ISBSG,国际软件基准标准组1、介绍和背景根据项目的努力,许多参数模型已经在文献中提出47预测软件开发项目的期限。多种选择的工具软件,纳入持续时间估计模型也可以在市场上发现38。尽管在从供应商的肯定6,研究表明关于这些参数的时间模型的精确度不同的结果27。但是,我们必须注意,数据集来进行这些研究往往很小,不会太最近。在八项讨论中的七项研究发表在10年前。费伦斯和达利2提出了一种在1990年发表的研究报告,并讨论在1989年或1990年出版的其他评估。在21个时间模型实证分析7,其中12次是从含20分或更少的样本数据所得。费伦斯和达利2讨论了三个评估时间模型参数的其他研究结果。这些研究进行含一,八个样本和12个项目,而他们自己的研究是一个示例,防御不能因为“承包专有数据的限制”来形容项目的基础。这并不意味着这些类型的模型不应该作进一步的调查研究人员或从业人员适用。它意味着虽然,软件的时间估计是一个有点复杂的问题,而正确地应用这些模型,需要很多专业知识和承担11。在这一背景下,提出重新审查整个项目之间的努力和时间的关系:(1)以大样本的数据比较的基础上,利用数据库发布ISBSG 4 5,载有396名软件1989年至1996年完成的项目数据;(2)在取得,而不是一个项目的持续时间在详细分析阶段或活动水平的“一阶”或项目工期总体估计的观点;(3)以验证是否使用的开发平台作为项目开发的一般方法的指标,并负有对项目的影响持续时间;(4)为比较的实证关系的样本描述(一)与其他已知的和可比较上述模型的总体特征的总体特征。这些目标将达到通过分析ISBSG样品(二)。三种经验模型,为个人电脑,中端和大型机开发平台,将载于第3条。一个简单的总结和研究,提出更多的机会在第4。2、分析数据采集在释放其中的4 396 ISBSG库项目,项目呈现以下特征选择:为数据的有效性没有理由怀疑,这是没有标记的ISBSG选定的项不确定的数据,并保留了自己的分析它;已知的努力;已知的期限;已知的平台;312项目符合所有这些标准。基本描述性统计见表1和1的数据散点图是在图1所示。图1表明,努力之间的关系(E)和时间(D)是非线性的。此外,更大程度的分散观测项目,用于需要更大努力。两者的是,这些数据分布提供了线性回归的支持不力。因此,有人认为合适的工作,对有关情况的数据进行数学变换。因此,一个登录变换成功地应用在这个意义上说,日志(持续时间)和日志(工作)的数据都遵循正态分布。与正常测试确认在这95的水平。如图2所示,登录变革呈现更多的线性关系,减少了对高层次分散的努力程度。因此,该Logtransformed值将被用来derivean经验模型连接项目的努力,项目持续时间。3、每个开发平台的持续时间模型推导这是继10的建议,持续时间模型,建立了更多的数据子集,而不是均匀使用整个数据样本。从312个项目在第2次会议提出的标准,208(67)的大型主机平台,开发的软件65(21)的中端平台,39(12)的个人电脑平台。我们首先介绍了大型机平台项目的详细结果。一个相同的发展进行分析,对其他两个平台的软件项目,结果后来提出。一个线性回归分析的努力,为208大型机项目期限的Logtransformed值。从这个回归和为中档及个人电脑数据的回归选择结果载于表2。对两个日志分析残差(努力),并预测记录(时间)表明,残差是随机以上的独立变量(日志(努力),而残差thevariance范围分布是恒定的变量范围(预测日志(持续时间)。因此认为,由此产生的线性回归模型是可以接受的。实证模型连接项目工作量和时间的主机平台项目的特点是可以使下面的公式。对数平均预测值(时间)为:日志(四)=(0.366 *日志(努力) - 0.339(1)或者,在传统的乘法表: = 0.458 * Effort0.366(2)在此基础上样品,模型解释这一经验得到略多于半数的项目工期的方差。回归分析是分开进行的65中型项目和39个个人电脑的项目。线性回归的四个假设在两种情况表示满意。对于中端平台项目,所产生的乘法公式为: = 0.548 * E0.360(3)为个人电脑平台项目中的数据是更为分散,关系较弱。由此产生的这组数据中的乘法公式为: = 1.936 * E0.201(4)由于方程(2,3和4),其中介绍了三个项目的阶级之间的努力和时间的关系,它是合法或不问是否有一个模型间的差异显着。图3显示了开发大型主机和中档平台软件项目的关系似乎非常相似,而发展个人电脑平台的软件项目的关系似乎有所不同。为了回答一个更正式的方式在这个问题上,我们使用的统计办法Neder埃塔尔描述9。这种方法类似于使用指标变量回归模型和互动方面,以测试方程的定性变量的不同阶层的平等。在此基础上分析,回归方程在95的水平(标识)不被拒绝的中端和大型机平台的数据。不过这也有一个显着性差异(学生t306 = 2.385,P值0.0177)比较时对发展的个人计算机(PC)和大型机平台软件项目的斜坡身份。4、结论和进一步研究软件项目的持续时间估计是在软件工程,只能用一个固有白箱方法解决困难和关键问题。估计数值的可信性,完全取决于该方法,数据,定义和使用了这一估计的假设得出的透明度。基于最近和较大的释放4 ISBSG数据集,本文发现:并购而努力和时间之间的关系是非线性的指数项的范围是在0.3至0.4这与其他的时间模型,包括COCOMO模型1。有趣的是,这是在大约20年,分别用于开发从公布的数据,ISBSG4的COCOMO模型确定的项目而有所改变。其次,数据公布的ISBSG4设置为一个非常大的程度上包括业务系统,同时是在防守型项目最基础的其他研究。我们分析的样本大小是更最近,比以往研究的大。应大为缩短的时间表明,在本文的项目经理开发模型可以推导出一个第一为了项目工期估算项目工作量的有效估计。但是,其他调度方法必须与诸如本文的变异很大,因为开发的时间模型一起使用不解释本文开发的方程。有一个在工作和时间之间的关系显着差异的基础上,该项目的开发平台。在这方面,项目开发personalcomputers软件无论从开发软件的中端平台或不同的主机平台项目。然而,有没有在努力和发展之间的中端和大型机平台软件项目的持续时间显着差异的关系在这些结论的角度来看,许多研究轴可以继续发展。该ISBSG样本大小允许进入其他同类数据集的数据集使用像细分项目类型(新发展或重大改进)的特点,或业务领域(银行,制造,电信等)。更好的上下文敏感的项目工期模型,解释更多的变异,例如,可能是源于对这些数据的子集。非线性模型的用法和建模过程中的其他变量也应该纳入探讨。第二个课题的研究将
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 2025年度资产经营责任合同模板
- 2025海南三亚人民医院四川大学华西三亚医院海南医科大学校园招聘模拟试卷带答案详解
- 2025物业管理委托合同简化版样本
- 2025第十三届贵州人才博览会沿河土家族自治县县管国有企业引才17人模拟试卷及答案详解(考点梳理)
- 2025内蒙古通辽市开鲁县卫生健康系统招聘卫生专业技术人员15人模拟试卷及一套完整答案详解
- 2025华东师范大学开放教育学院教师发展学院招聘1人(上海)考前自测高频考点模拟试题及答案详解(名师系列)
- 2025赤峰市松山区招聘9名政府专职消防员考前自测高频考点模拟试题及答案详解一套
- 2025安徽宿州萧县中医院面向应届毕业生校园招聘10人模拟试卷附答案详解(考试直接用)
- 安全教育培训全年总结课件
- 学法用法考试题库及答案
- DB37-T 5026-2022《居住建筑节能设计标准》
- 线性代数试题及答案-线性代数试题
- 六年级上册道德与法治课件第四单元第8课
- 量具使用知识培训课件
- 感动中国人物-于敏
- 新苏教版三年级上册科学全册教案
- Q-RJ 557-2017 航天型号产品禁(限)用工艺目录(公开)
- JIS C62133-2-2020 便携式密封二次电池及其电池的安全要求 第2部分:锂系统
- TIPAP患者再次申请表
- T_CCA 024-2022 预制菜
- 双狐地质成图系统使用手册
评论
0/150
提交评论