An In-Depth Analysis of the Principles and Techniques of the Well-Known Memory Test Software memtest
Technical Information

  Windows Installation
  Usage
    o Online Commands
    o Memory Sizing
    o Error Display
    o Troubleshooting Memory Errors
    o Execution Time
  Detailed Description
    o Memory Testing Philosophy
    o Memtest86 Test Algorithms
    o Individual Test Descriptions

Windows Installation

For Windows installation, begin by downloading either the Pre-Compiled Windows package to build a bootable floppy disk or an ISO (zip version) to create a bootable CD-ROM. After the file is downloaded it must be extracted to uncompress the file(s). To extract, right-click the downloaded file and select the Extract All option. The extract option lets you choose where the files will be extracted to.

To build a bootable floppy, go to the folder where the files were extracted and click the Install icon. The floppy disk will appear to be unformatted by Windows after the install is complete.

To build a bootable CD-ROM, use your CD burning software to burn the unzipped ISO file as an image. Be sure to use a create from image option. Do not simply copy the file to a CD.

Since Memtest86 is a standalone program, it does not require any operating system support for execution. It can be used with any PC regardless of what operating system, if any, is installed. The test image may be loaded from a floppy disk or may be loaded via LILO on Linux systems. Any Unix, Windows or DOS system may be used to create a boot floppy or bootable CD-ROM.

Online Commands

Memtest86 has a limited number of online commands. Online commands provide control over cache settings, error report modes, test selection and test address range. A help bar is displayed at the bottom of the screen listing the available online commands.

Command  Description
ESC      Exits the test and does a warm restart via the BIOS.
c        Enters the test configuration menu. Menu options are:
           1) Test selection
           2) Address Range
           3) Memory Sizing
           4) Error Report Mode
           5) Show DMI Memory Info
           6) ECC Mode
           7) CPU Selection Mode
           8) Redraw Screen
           9) Adv. Options
SP       Set scroll lock (stops scrolling of error messages).
         Note: Testing is stalled when the scroll lock is set and the scroll region is full.
CR       Clear scroll lock (enables error message scrolling).

Memory Sizing

The BIOS in modern PCs will often reserve several sections of memory for its own use and also to communicate information to the operating system (i.e. ACPI tables). It is just as important to test these reserved memory blocks as it is to test the remainder of memory. For proper operation all of memory needs to function properly regardless of what the eventual use is. For this reason Memtest86 has been designed to test as much memory as possible.

However, safely and reliably detecting all of the available memory has been problematic. Versions of Memtest86 prior to v2.9 would probe to find where memory is. This works for the vast majority of motherboards but is not 100% reliable. Sometimes the memory size detection is incorrect and, worse, probing the wrong places can in some cases cause the test to hang or crash.

Starting in version 2.9, alternative methods are available for determining memory size. By default the test attempts to get the memory size from the BIOS using the e820 method. With e820 the BIOS provides a table of memory segments and identifies what they will be used for. By default Memtest86 will test all of the RAM marked as available and also the area reserved for the ACPI tables. This is safe since the test does not use the ACPI tables and the e820 specifications state that this memory may be reused after the tables have been copied. Although this is a safe default, some memory will not be tested.

Two additional options are available through the online configuration options.
The first option (BIOS-All) also uses the e820 method to obtain a memory map. However, when this option is selected all of the reserved memory segments are tested, regardless of what their intended use is. The only exception is memory segments that begin above 3GB. Testing has shown that these segments are typically not safe to test. The BIOS-All option is more thorough but could be unstable with some motherboards.

The third option for memory sizing is the traditional Probe method. This is a very thorough but not entirely safe method. In the majority of cases the BIOS-All and Probe methods will return the same memory map.

For older BIOSes that do not support the e820 method there are two additional methods (e801 and e88) for getting the memory size from the BIOS. These methods only provide the amount of extended memory that is available, not a memory table. When the e801 and e88 methods are used the BIOS-All option will not be available.

The MemMap field on the display shows what memory size method is in use. Also, the RsvdMem field shows how much memory is reserved and is not being tested.

Error Display

Memtest has three options for reporting errors. The default is an error summary that displays the most relevant error information. The second option is reporting of individual errors. The third option, BadRAM Patterns mode, creates patterns for use with the Linux BadRAM feature. This feature allows Linux to avoid bad memory pages. Details about the BadRAM feature can be found at http://home.zonnet.nl/vanrein/badram

The error summary mode displays the following information:

  Error Confidence Value: A value that indicates the validity of the errors being reported, with larger values indicating greater validity. There is a high probability that all errors reported are valid regardless of this value. However, when this value exceeds 100 it is nearly impossible that the reported errors will be invalid.
  Lowest Error Address: The lowest address where an error has been reported.
  Highest Error Address: The highest address where an error has been reported.
  Bits in Error Mask: A mask of all bits that have been in error (hexadecimal).
  Bits in Error: Total bits in error for all error instances, and the minimum, maximum and average bits in error for each individual occurrence.
  Max Contiguous Errors: The maximum number of contiguous addresses with errors.
  ECC Correctable Errors: The number of errors that have been corrected by ECC hardware.
  Errors per DIMM slot: Error counts are reported for each memory module installed in the system. Use the Show DMI Memory Info runtime option for detailed memory module information.
  Test Errors: On the right hand side of the screen the number of errors for each test is displayed.

For individual errors the following information is displayed when a memory error is detected. An error message is only displayed for errors with a different address or failing bit pattern. All displayed values are in hexadecimal.

  Tst: Test number
  Failing Address: Failing memory address
  Good: Expected data pattern
  Bad: Failing data pattern
  Err-Bits: Exclusive or of good and bad data (this shows the position of the failing bit(s))
  Count: Number of consecutive errors with the same address and failing bits

Troubleshooting Memory Errors

Please be aware that not all errors reported by Memtest86 are due to bad memory. The test implicitly tests the CPU, L1 and L2 caches as well as the motherboard. It is impossible for the test to determine what causes the failure to occur. However, most failures will be due to a problem with a memory module. When they are not, the only option is to replace parts until the failure is corrected.

Once a memory error has been detected, determining the failing SIMM/DIMM module is not a clear-cut procedure.
With the large number of motherboard vendors and possible combinations of memory slots it would be difficult, if not impossible, to assemble complete information about how a particular error would map to a failing memory module. However, there are steps that may be taken to determine the failing module. Here are four techniques that you may wish to use:

1) Removing modules
This is the simplest method for isolating a failing module, but it may only be employed when one or more modules can be removed from the system. By selectively removing modules from the system and then running the test you will be able to find the bad modules. Be sure to note exactly which modules are in the system when the test passes and when the test fails.

2) Rotating modules
When none of the modules can be removed, you may wish to rotate modules to find the failing one. This technique can only be used if there are three or more modules in the system. Change the location of two modules at a time. For example, put the module from slot 1 into slot 2 and put the module from slot 2 into slot 1. Run the test, and if either the failing bit or address changes then you know that the failing module is one of the ones just moved. By using several combinations of module movement you should be able to determine which module is failing.

3) Replacing modules
If you are unable to use either of the previous techniques then you are left with selective replacement of modules to find the failure.

4) Avoiding allocation
The printing mode for BadRAM patterns is intended to construct boot time parameters for a Linux kernel that is compiled with BadRAM support. This work-around makes it possible for Linux to run reliably with defective RAM. For more information on BadRAM support for Linux, see http://home.zonnet.nl/vanrein/badram

Sometimes memory errors show up due to component incompatibility. A memory module may work fine in one system and not in another. This is not uncommon and is a source of confusion.
In these situations the components are not necessarily bad but have marginal conditions that, when combined with other components, will cause errors.

Often the memory works in a different system or the vendor insists that it is good. In these cases the memory is not necessarily bad but is not able to operate reliably at Athlon speeds. Sometimes more conservative memory timings on the motherboard will correct these errors. In other cases the only option is to replace the memory with better quality, higher speed memory. Don't buy cheap memory and expect it to work with an Athlon! On occasion test 5/8 errors will occur even with name brand memory and a quality motherboard. These errors are legitimate and should be corrected.

I am often asked about the reliability of errors reported by Memtest86. In the vast majority of cases errors reported by the test are valid. There are some systems that cause Memtest86 to be confused about the size of memory, and it will try to test non-existent memory. This will cause a large number of consecutive addresses to be reported as bad, and generally there will be many bits in error. If you have a relatively small number of failing addresses and only one or two bits in error you can be certain that the errors are valid. Also, intermittent errors are without exception valid.

Frequently memory vendors question whether Memtest86 supports their particular memory type or chipset. Memtest86 is designed to work with all memory types and all chipsets. Only support for ECC requires knowledge of the chipset.

All valid memory errors should be corrected. It is possible that a particular error will never show up in normal operation. However, operating with marginal memory is risky and can result in data loss and even disk corruption. Even if there is no overt indication of problems you cannot assume that your system is unaffected. Sometimes intermittent errors can cause problems that do not show up for a long time.
You can be sure that Murphy will get you if you know about a memory error and ignore it.

Memtest86 cannot diagnose many types of PC failures. For example, a faulty CPU that causes Windows to crash will most likely just cause Memtest86 to crash in the same way.

Execution Time

The time required for a complete pass of Memtest86 will vary greatly depending on CPU speed, memory speed and memory size. Memtest86 executes indefinitely. The pass counter increments each time that all of the selected tests have been run. Generally a single pass is sufficient to catch all but the most obscure errors. However, for complete confidence when intermittent errors are suspected, testing for a longer period is advised.

Memory Testing Philosophy

There are many good approaches for testing memory. However, many tests simply throw some patterns at memory without much thought or knowledge of memory architecture or how errors can best be detected. This works fine for hard memory failures but does little to find intermittent errors. BIOS based memory tests are useless for finding intermittent memory errors.

Memory chips consist of a large array of tightly packed memory cells, one for each bit of data. The vast majority of intermittent failures are a result of interaction between these memory cells. Often writing a memory cell can cause one of the adjacent cells to be written with the same data. An effective memory test attempts to test for this condition. Therefore, an ideal strategy for testing memory would be the following:

1. write a cell with a zero
2. write all of the adjacent cells with a one, one or more times
3. check that the first cell still has a zero

It should be obvious that this strategy requires exact knowledge of how the memory cells are laid out on the chip. In addition, there is a never-ending number of possible chip layouts for different chip types and manufacturers, making this strategy impractical.
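
The three steps of the ideal strategy can be sketched as follows. This is only an illustration under stated assumptions: the cell array is a hypothetical rows x cols grid whose physical neighbours are known, and the names Grid, neighbors and ideal_cell_test are made up for this example. Real chips do not expose their layout, which is exactly why the strategy is impractical.

```python
class Grid:
    """Hypothetical bit-cell array with a known rectangular layout.

    `coupling` is an optional (aggressor, victim) pair of (row, col) cells:
    every write to the aggressor also lands in the victim, modelling the
    adjacent-cell interaction described in the text.
    """

    def __init__(self, rows, cols, coupling=None):
        self.cells = [[0] * cols for _ in range(rows)]
        self.coupling = coupling

    def write(self, row, col, bit):
        self.cells[row][col] = bit
        if self.coupling and (row, col) == self.coupling[0]:
            victim_row, victim_col = self.coupling[1]
            self.cells[victim_row][victim_col] = bit

    def read(self, row, col):
        return self.cells[row][col]


def neighbors(row, col, rows, cols):
    """Return the 4-connected neighbours of a cell (an assumed adjacency)."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


def ideal_cell_test(grid, rows, cols, repeats=2):
    """Return the cells whose zero was disturbed by writes to their neighbours."""
    disturbed = []
    for row in range(rows):
        for col in range(cols):
            grid.write(row, col, 0)              # 1. write a cell with a zero
            for _ in range(repeats):             # 2. write adjacent cells with a one
                for r, c in neighbors(row, col, rows, cols):
                    grid.write(r, c, 1)
            if grid.read(row, col) != 0:         # 3. check the first cell
                disturbed.append((row, col))
    return disturbed


faulty = Grid(3, 4, coupling=((1, 1), (1, 2)))
print(ideal_cell_test(faulty, 3, 4))
```

In this toy model the only disturbed cell is the victim, because its zero is overwritten when its aggressor neighbour is written with a one.
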
However, there are testing algorithms that can approximate this ideal.

Memtest86 Test Algorithms

Memtest86 uses two algorithms that provide a reasonable approximation of the ideal test strategy above. The first of these strategies is called moving inversions. The moving inversions test works as follows:

1. Fill memory with a pattern
2. Starting at the lowest address
   o check that the pattern has not changed
   o write the pattern's complement
   o increment the address
   o repeat
3. Starting at the highest address
   o check that the pattern has not changed
   o write the pattern's complement
   o decrement the address
   o repeat

This algorithm is a good approximation of an ideal memory test but there are some limitations. Most high density chips today store data 4 to 16 bits wide. With chips that are more than one bit wide it is impossible to selectively read or write just one bit. This means that we cannot guarantee that all adjacent cells have been tested for interaction. In this case the best we can do is to use some patterns to ensure that all adjacent cells have at least been written with all possible one and zero combinations.

It can also be seen that caching, buffering and out of order execution will interfere with the moving inversions algorithm and make it less effective. It is possible to turn off the cache, but the memory buffering in new high performance chips cannot be disabled. To address this limitation a new algorithm I call Modulo-X was created. This algorithm is not affected by cache or buffering. The algorithm works as follows:

1. For starting offsets of 0 - 20 do
   o write every 20th location with a pattern
   o write all other locations with the pattern's complement
   o repeat above one or more times
   o check every 20th location for the pattern

This algorithm accomplishes nearly the same level of adjacency testing as moving inversions but is not affected by caching or buffering.
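
Both algorithms can be sketched over a simulated memory. This is a minimal illustration, not Memtest86's actual code: memory is modelled as a Python list of 32-bit words, the CoupledMemory fault model is hypothetical, and step 3 of moving inversions is read as "check the complement written in step 2, then write the original pattern back". The Modulo-X loop uses offsets 0 through stride - 1, which already visits every location once as a checked cell.

```python
WORD_MASK = 0xFFFFFFFF  # assumed 32-bit words


class CoupledMemory(list):
    """Hypothetical faulty memory: a write to `aggressor` also lands in `victim`."""

    def __init__(self, size, aggressor, victim):
        super().__init__([0] * size)
        self.aggressor = aggressor
        self.victim = victim

    def __setitem__(self, addr, value):
        super().__setitem__(addr, value)
        if addr == self.aggressor:
            super().__setitem__(self.victim, value)


def moving_inversions(mem, pattern):
    """Return a list of (address, expected, actual) mismatches."""
    pattern &= WORD_MASK
    complement = ~pattern & WORD_MASK
    errors = []
    for addr in range(len(mem)):             # 1. fill memory with the pattern
        mem[addr] = pattern
    for addr in range(len(mem)):             # 2. lowest address upward
        if mem[addr] != pattern:
            errors.append((addr, pattern, mem[addr]))
        mem[addr] = complement
    for addr in reversed(range(len(mem))):   # 3. highest address downward
        if mem[addr] != complement:
            errors.append((addr, complement, mem[addr]))
        mem[addr] = pattern
    return errors


def modulo_x(mem, pattern, stride=20, repeats=2):
    """Return a list of (address, expected, actual) mismatches."""
    pattern &= WORD_MASK
    complement = ~pattern & WORD_MASK
    errors = []
    for offset in range(stride):
        # Write every stride-th location with the pattern.
        for addr in range(offset, len(mem), stride):
            mem[addr] = pattern
        # Write all other locations with the complement, one or more times.
        for _ in range(repeats):
            for addr in range(len(mem)):
                if addr % stride != offset:
                    mem[addr] = complement
        # Check every stride-th location for the pattern.
        for addr in range(offset, len(mem), stride):
            if mem[addr] != pattern:
                errors.append((addr, pattern, mem[addr]))
    return errors


print(moving_inversions(CoupledMemory(64, aggressor=5, victim=4), 0x55555555))
print(modulo_x(CoupledMemory(64, aggressor=5, victim=4), 0x55555555))
```

In this model, moving inversions only catches the fault on the downward pass, since on the upward pass the victim at address 4 is checked before the aggressor at address 5 is rewritten. Modulo-X catches it when address 4 holds the pattern and its neighbour is written with the complement.
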
Since separate write passes (1a, 1b) and the read pass (1c) are done for all of memory, we can be assured that all of the buffers and cache have been flushed between passes. The selection of 20 as the stride size was somewhat arbitrary. Larger strides may be more effective but would take longer to execute. The choice of 20 seemed to be a reasonable compromise between speed and thoroughness.

Individual Test Descriptions

Memtest86 executes a series of numbered test sections to check for errors. These test sections consist of a combination of test algorithm, data pattern and cache setting. The execution order for these tests was arranged so that errors will be detected as rapidly as possible. A description of each of the test sections follows:

Test 0 Address test, walking ones, no cache
Tests all address bits in all memory banks by using a walking ones address pattern.

Test 1 Address test, own address
Each address is written with its own address and then is checked for consistency. In theory previous tests should have caught any memory addressing problems. This test should catch any addressing errors that somehow were not previously detected.

Test 2 Moving inversions, ones&zeros
This test uses the moving inversions algorithm with patterns of all ones and zeros. Cache is enabled even though it interferes to some degree with the test algorithm. With cache enabled this test does not take long and should quickly find all hard errors and some more subtle errors. This test is only a quick check.

Test 3 Moving inversions, 8 bit pat
This is the same as test 2 but uses an 8 bit wide pattern of walking ones and zeros. This test will better detect subtle errors in wide memory chips. A total of 20 data patterns are used.

Test 4 Moving inversions, random pattern
Test 4 uses the same algorithm as test 2 but the data pattern is a random number and its complement. This test is particularly effective in finding difficult to detect data sensitive errors.
A total of 60 patterns are used. The random number sequence is different with each pass, so multiple passes increase effectiveness.

Test 5 Block move, 64
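
The own-address check described for Test 1 can be sketched over a simulated memory. This is an illustrative sketch only: StuckAddressLine is a hypothetical fault model in which one address bit is stuck at zero, so two addresses alias to the same cell. The final line computes the exclusive or of good and bad data, as in the Err-Bits field of the individual error display.

```python
class StuckAddressLine(list):
    """Hypothetical faulty memory whose address bit `stuck_bit` always reads as 0."""

    def __init__(self, size, stuck_bit):
        super().__init__([0] * size)
        self.keep = ~(1 << stuck_bit)  # mask that clears the stuck bit

    def __getitem__(self, addr):
        return super().__getitem__(addr & self.keep)

    def __setitem__(self, addr, value):
        super().__setitem__(addr & self.keep, value)


def own_address_test(mem):
    """Write each location with its own address, then verify every location.

    Returns a list of (address, good, bad) tuples for mismatching locations.
    """
    for addr in range(len(mem)):
        mem[addr] = addr
    return [(addr, addr, mem[addr]) for addr in range(len(mem)) if mem[addr] != addr]


errors = own_address_test(StuckAddressLine(16, stuck_bit=2))
for addr, good, bad in errors:
    # Err-Bits style report: XOR of good and bad shows the failing bit(s).
    print(f"addr={addr:02x} good={good:02x} bad={bad:02x} err-bits={good ^ bad:02x}")
```

In this model every mismatch has the same error bits, equal to the stuck address bit, which is the kind of regularity that points at an addressing problem rather than a bad data cell.
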
