计算机组成与结构：第二部分-程序和数据的机器级表达 04-浮点数

上传人：窝*** IP属地：安徽上传时间：2023-01-03 格式：PPTX 页数：45 大小：337.39KB 积分：30 举报 版权申诉

已阅读5页，还剩40页未读，继续免费阅读

版权说明：本文档由用户提供并上传，收益归属内容提供方，若内容存在侵权，请进行举报或认领

文档简介

HunanUniversity第2章信息的表示和处理

浮点数

计算机组成与结构

2016年4月

主讲教师HunanUniversityToday:FloatingPointBackground:FractionalbinarynumbersIEEEfloatingpointstandard:DefinitionExampleandpropertiesRounding,addition,multiplicationFloatingpointinCCaseAnalysisFractionalbinarynumbersWhatis1011.1012?Sumof?2i2i-14211/21/41/82-jbibi-1•••b2b1b0b-1b-2b-3•••b-j•••FractionalBinaryNumbersRepresentationBitstorightof“binarypoint”representfractionalpowersof2Representsrationalnumber:•••FractionalbinarynumbersWhatis1011.1012?

FractionalBinaryNumbers:ExamplesValue Representation

53/4 101.112 27/8

10.111263/64

0.1111112ObservationsDivideby2byshiftingright(unsigned)Multiplyby2byshiftingleftNumbersofform0.111111…2arejustbelow1.01/2+1/4+1/8+…+1/2i+…➙1.0Usenotation1.0–εRepresentableNumbersLimitation#1Canonlyexactlyrepresentnumbersoftheformx/2kOtherrationalnumbershaverepeatingbitrepresentationsValue Representation1/3 0.0101010101[0101]…21/5 0.001100110011[0011]…21/10 0.0001100110011[0001]…2Limitation#2JustonesettingofdecimalpointwithinthewbitsLimitedrangeofnumbers(verysmallvalues?verylarge?)Today:FloatingPointBackground:FractionalbinarynumbersIEEEfloatingpointstandard:DefinitionExampleandpropertiesRounding,addition,multiplicationFloatingpointinCCaseAnalysisExample:

mantissa(尾数)exponent(阶码、指数)

6.02x1021

decimalpoint

radix(base，基)°Normalizedform（规格化形式）:小数点前只有一位非0数°同一个数有多种表示形式。例：对于数1/1,000,000,000•Normalized(唯一的规格化形式):1.0x10-9•Unnormalized（非规格化形式不唯一）:0.1x10-8,10.0x10-10科学计数法(ScientificNotation)与浮点数

mantissa（尾数）exponent（指数）1.101two

-10

binarypoint基为2forBinaryNumbers:只要对尾数和指数分别编码，就可表示一个浮点数（即：实数）浮点数的表示°Normalformat（规格化数形式）：

+/-1.xxxxxxxxxx

×MExponent°32-bit规格化数：310

ExponentFraction1bit?bits?bits

是符号位（Sign）

Exponent(E)用移码来表示Fraction(M)表示xxxxxxxxxxxxx，尾数部分

(基可以是2/4/8/16，计算机内约定为2，无需显式表示)°早期的计算机，各自定义自己的浮点数格式问题：浮点数表示不统一会带来什么问题？“Father”oftheIEEE754standard现在所有计算机都采用IEEE754来表示浮点数1970年代后期,IEEE成立委员会着手制定浮点数标准1985年完成浮点数标准IEEE754的制定Prof.WilliamKahan

/~wkahan/ieee754status/754story.htmlThisstandardwasprimarilytheworkofoneperson,UCBerkeleymathprofessorWilliamKahan.

直到80年代初，各个机器内部的浮点数表示格式还没有统一

因而相互不兼容，机器之间传送数据时，带来麻烦

PrecisionoptionsSingleprecision:32bitsDoubleprecision:64bitsExtendedprecision:80bits(Intelonly)sexpfrac18-bits23-bitssexpfrac111-bits52-bitssexpfrac115-bits63or64-bitsDataRepresentationsCDataTypeTypical32-bitIntelIA32x86-64char111short222int444long448longlong888float444double888longdouble810/1210/16pointer4483casesbasedonvalueofexpNormalizedWhenexpisn’tall0sorall1sMostcommonDenormalizedWhenexpisall0sDifferentinterpretationofEthannormalizedUsedfor+0and-0(Andothernumbercloseto0)SpecialWhenexpisall1sNaN,infinities“Normalized”ValuesWhen:exp≠000…0andexp≠111…1Exponentcodedasbiasedvalue:E=Exp–BiasExp:unsignedvalueexp

Bias=2k-1-1,wherekisnumberofexponentbitsSingleprecision:127(Exp:1…254,E:-126…127)Doubleprecision:1023(Exp:1…2046,E:-1022…1023)Significandcodedwithimpliedleading1:M=1.xxx…x2

xxx…x:bitsoffracMinimumwhenfrac=000…0(M=1.0)Maximumwhenfrac=111…1(M=2.0–ε)Getextraleadingbitfor“free”浮点数(FloatingPoint)的表示范围例：画出下述32位浮点数格式的规格化数的表示范围。018931

第0位数符S；第1～8位为8位移码表示阶码E（偏置常数为127）；第9～31位为24位二进制原码小数表示的尾数M。规格化尾数的小数点后第一位总是1，故规定第一位默认的“1”不明显表示出来。这样可用23个数位表示24位尾数。±1.xxxxx

×2ES阶码E尾数M最大正数：1.1…1x211…10=(2-2-23)x2127

最小正数：1.0…0x200…01=1x2-126

因为原码是对称的，所以其表示范围关于原点对称。

正下溢

负下溢

(2-2-23)

×2127

数轴

零

可表示的正数

可表示的负数

-2-126

2-126

(2-2-23)

×2127

正上溢

负上溢

机器0：尾数为0或落在下溢区中的数浮点数范围比定点数大，但数的个数并没变多，故数之间更稀疏，且不均匀NormalizedEncodingExampleValue:FloatF=15213.0;1521310=111011011011012

=1.11011011011012x213SignificandM = 1.11011011011012frac= 110110110110100000000002ExponentE = 13Bias = 127Exp =13+127=140=100011002Result:

01000110011011011011010000000000sexpfracDenormalizedValuesCondition:exp=000…0Exponentvalue:E=1–Bias(insteadofE=0–Bias)Significandcodedwithimpliedleading0:M=0.xxx…x2xxx…x:bitsoffracCases

exp=000…0,frac=000…0RepresentszerovalueNotedistinctvalues:+0and–0(why?)exp=000…0,frac≠000…0Numbersclosestto0.0EquispacedSpecialValuesCondition:exp=111…1Case:exp=111…1,frac=000…0Representsvalue(infinity)OperationthatoverflowsBothpositiveandnegativeE.g.,1.0/0.0=−1.0/−0.0=+,1.0/−0.0=−Case:exp=111…1,frac≠000…0Not-a-Number(NaN)RepresentscasewhennonumericvaluecanbedeterminedE.g.,sqrt(–1),−,

0Visualization:FloatingPointEncodings+−0+Denorm+Normalized−Denorm−Normalized+0NaNNaNToday:FloatingPointBackground:FractionalbinarynumbersIEEEfloatingpointstandard:DefinitionExampleandpropertiesRounding,addition,multiplicationFloatingpointinCCaseAnalysisTinyFloatingPointExample8-bitFloatingPointRepresentationthesignbitisinthemostsignificantbitthenextfourbitsaretheexponent,withabiasof7thelastthreebitsarethefracSamegeneralformasIEEEFormatnormalized,denormalizedrepresentationof0,NaN,infinitysexpfrac14-bits3-bitssexpfrac E

Value

00000000 -6 000000001 -6 1/8*1/64=1/51200000010 -6 2/8*1/64=2/512…00000110 -6 6/8*1/64=6/51200000111 -6 7/8*1/64=7/51200001000 -6 8/8*1/64=8/51200001001 -6 9/8*1/64=9/512… …00110110 -1 14/8*1/2=14/1600110111 -1 15/8*1/2=15/1600111000 0 8/8*1=100111001 0 9/8*1=9/800111010 0 10/8*1=10/8… …01110110 7 14/8*128=22401110111 7 15/8*128=24001111000 n/a infDynamicRange(PositiveOnly)closesttozerolargestdenormsmallestnormclosestto1belowclosestto1abovelargestnormDenormalizednumbersNormalizednumbersDistributionofValues6-bitIEEE-likeformate=3exponentbitsf=2fractionbitsBiasis23-1-1=3Noticehowthedistributiongetsdensertowardzero.8valuessexpfrac13-bits2-bitsDistributionofValues(close-upview)6-bitIEEE-likeformate=3exponentbitsf=2fractionbitsBiasis3sexpfrac13-bits2-bitsSpecialPropertiesofEncodingFPZeroSameasIntegerZeroAllbits=0Can(Almost)UseUnsignedIntegerComparisonMustfirstcomparesignbitsMustconsider−0=0NaNsproblematicWillbegreaterthananyothervaluesWhatshouldcomparisonyield?OtherwiseOKDenormvs.normalizedNormalizedvs.infinityToday:FloatingPointBackground:FractionalbinarynumbersIEEEfloatingpointstandard:DefinitionExampleandpropertiesRounding,addition,multiplicationFloatingpointinCCaseAnalysisFloatingPointOperations:BasicIdeax+fy=Round(x+y)xfy=Round(xy)BasicideaFirstcomputeexactresultMakeitfitintodesiredprecisionPossiblyoverflowifexponenttoolargePossiblyroundtofitinto

fracRoundingRoundingModes(illustratewith$rounding)

$1.40$1.60$1.50$2.50 –$1.50Towardszero $1 $1 $1 $2 –$1Rounddown(−) $1 $1 $1 $2 –$2Roundup(+) $2 $2 $2 $3 –$1NearestEven(default)$1 $2 $2 $2 –$2CloserLookatRound-To-EvenDefaultRoundingModeHardtogetanyotherkindwithoutdroppingintoassemblyAllothersarestatisticallybiasedSumofsetofpositivenumberswillconsistentlybeover-orunder-estimatedApplyingtoOtherDecimalPlaces/BitPositionsWhenexactlyhalfwaybetweentwopossiblevaluesRoundsothatleastsignificantdigitisevenE.g.,roundtonearesthundredth 1.2349999 1.23 (Lessthanhalfway) 1.2350001 1.24 (Greaterthanhalfway) 1.2350000 1.24 (Halfway—roundup) 1.2450000 1.24 (Halfway—rounddown)RoundingBinaryNumbersBinaryFractionalNumbers“Even”whenleastsignificantbitis0“Halfway”whenbitstorightofroundingposition=100…2ExamplesRoundtonearest1/4(2bitsrightofbinarypoint)Value Binary Rounded Action RoundedValue23/32 10.000112 10.002 (<1/2—down) 223/16 10.001102 10.012 (>1/2—up) 21/427/8 10.111002 11.002 (1/2—up) 325/8 10.101002 10.102 (1/2—down) 21/2FPMultiplication(–1)s1

M12E1x(–1)s2

M22E2ExactResult:(–1)s

M2ESigns: s1^

s2SignificandM: M1x

M2ExponentE: E1+

E2FixingIfM≥2,shiftMright,incrementEIfEoutofrange,overflowRoundMtofitfracprecisionImplementationBiggestchoreismultiplyingsignificandsFloatingPointAddition(–1)s1

M12E1+(-1)s2

M22E2AssumeE1>E2ExactResult:(–1)s

M2ESigns,significandM:Resultofsignedalign&addExponentE: E1FixingIfM≥2,shiftMright,incrementE

ifM<1,shiftMleftkpositions,decrementEbykOverflowifEoutofrangeRoundMtofitfracprecision(–1)s1

(–1)s2

E1–E2+(–1)s

MToday:FloatingPointBackground:FractionalbinarynumbersIEEEfloatingpointstandard:DefinitionExampleandpropertiesRounding,addition,multiplicationFloatingpointinCCaseAnalysisFloatingPointinCCGuaranteesTwoLevelsfloat singleprecisiondouble doubleprecisionConversions/CastingCastingbetweenint,float,anddoublechangesbitrepresentation

double/float→intTruncatesfractionalpartLikeroundingtowardzeroNotdefinedwhenoutofrangeorNaN:GenerallysetstoTMin

int→doubleExactconversion,aslongasinthas≤53bitwordsize

int→floatWillroundaccordingtoroundingmodeSomeimplicationsOrderofoperationsisimportant3.14+(le20-le20)VS(3.14+le20)-le20le20*(le20-le20)VS(le20*le20)-(le20*le20)CompileroptimizationsimpededE.g.,Commonsub-expressionelimination

doublex=a+b+c; doubley=b+c+d; Maynotequal

doubletemp=b+c; doublex=a+temp; doubley=temp+dToday:FloatingPointBackground:FractionalbinarynumbersIEEEfloatingpointstandard:DefinitionExampleandpropertiesRounding,addition,multiplicationFloatingpointinCCaseAnalysis2014秋期末考试题IEEE754标准定义了浮点数表示格式，其中单精度（singleprecision）二进制浮点数的表示格式是：1位符号位，8位阶码，23位有效值。双精度（doubleprecision）格式是：1位符号位，11位阶码，52位有效值。(1)请给出单精度浮点表达式 (float)(1073741824)+(float)2.25+(float)(-1073741824)的结果，其中1073741824=230；(2)请给出双精度浮点表达式 (double)(1073741824)+(double)2.25+(double)(-1073741824)的结果；(3)请简要解释你给出的结果。答案(1)0(2)2.25(3)单精度时，2.25会在对阶时被移位为0，而双精度有效值长度够长，能保留2.25的值。或者在(1)、(2)中都给2.25，解释为若为IA架构处理器，其内部浮点运算器采用比double精度更高的内部表示进行计算，故足够保留2.25的值（依赖于机器结构）。案例分析：爱国者导弹1991年2月25日，海湾战争中，美国在沙特阿拉伯达摩地区设置的爱国者导弹拦截伊拉克的飞毛腿导弹失败，致使飞毛腿导弹击中了沙特阿拉伯载赫蓝的一个美军军营，杀死了美国陆军第十四军需分队的28名士兵。其原因是由于爱国者导弹系统时钟内的一个软件错误造成的，引起这个软件错误的原因是浮点数的精度问题。

爱国者导弹系统中有一个内置时钟，用计数器实现，每隔0.1秒计数一次。程序用0.1的一个24位定点二进制小数x来乘以计数值作为以秒为单位的时间。0.1的二进制表示是一个无限循环序列：0.00011[0011]…，x=0.00011001100110011001100B。显然，x是0.1的近似表示，0.1-x=0.00011001100110011001100[1100]…-0.00011001100110011001100B，即为：=0.000000000000000000000001100[1100]…B=2-20×0.1

9.54×10-8

已知在爱国者导弹准备拦截飞毛腿导弹之前，已经连续工作了100小时，飞毛腿的速度大约为2000米/秒，则由于时钟计算误差而导致的距离误差是多少？

100小时相当于计数了100×60×60×10=36×105次，因而导弹的时钟已经偏差了9.54×10-8×36×1050.343秒

因此，距离误差是2000×0.343秒

687米小故事：实际上，以色列方面已经发现了这个问题并于1991年2月11日知会了美国陆军及爱国者计划办公室（软件制造商）。以色列方面建议重新启动爱国者系统的电脑作为暂时解决方案，可是美国陆军方面却不知道每次需要间隔多少时间重新启动系统一次。1991年2月16日，制造商向美国陆军提供了更新软件，但这个软件最终却在飞毛腿导弹击中军营后的一天才运抵部队。若x用float型表示，则x的机器数是什么？0.1与x的偏差是多少？系统运行100小时后的时钟偏差是多少？在飞毛腿速度为2000米/秒的情况下，预测的距离偏差为多少？0.1=0.00011[0011]B=+1.10011001100110011001100B×2-4，故x的机器数为00111101110011001100110011001100Float型仅24位有效位数，后面的有效位全被截断，故x与0.1之间的误差为：|x–0.1|=0.0000000000000000000000000001100[1100]…B。这个值等于2-24×0.15.96×10-9。100小时后时钟偏差5.96×10-9×36×1050.0215秒。距离偏差0.0215×200043米。比爱国者导弹系统精确约16倍。若用32位二进制定点小数x=0.0001100110011001100110011001101B表示0.1

人人文库> 全部分类> 教育资料 > 课件下载

温馨提示

1. 本站所有资源如无特殊说明，都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
2. 本站的文档不包含任何第三方提供的附件图纸等，如果需要附件，请联系上传者。文件的所有权益归上传用户所有。
3. 本站RAR压缩包中若带图纸，网页内容里面会有图纸预览，若没有图纸预览就没有图纸。
4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
5. 人人文库网仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对用户上传分享的文档内容本身不做任何修改或编辑，并不能对任何下载内容负责。
6. 下载文件中如有侵权或不适当内容，请与我们联系，我们立即纠正。
7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

计算机组成与结构：第二部分-程序和数据的机器级表达 04-浮点数

文档简介

温馨提示

最新文档

评论

计算机组成与结构：第二部分-程序和数据的机器级表达 04-浮点数

文档简介

温馨提示

最新文档

评论

相关文档