STRESS TESTING YOUR COMPUTER

BACKGROUND
----------

Today's computers are not perfect. Even brand new systems from major manufacturers can have hidden flaws. If any of several key components, such as the CPU, memory, or cooling, are not up to spec, the result can be incorrect calculations and/or unexplained system crashes.

Overclocking is the practice of trying to increase the speed of the CPU and memory in an effort to make a machine faster at little cost. Typically, overclocking involves pushing the machine to its limits and then backing off just a little bit.

For these reasons, both non-overclockers and overclockers need programs that test the stability of their computers. This is done by running programs that put a heavy load on the computer. Though not originally designed for this purpose, this program is one of a few programs that are excellent at stress testing a computer.

RESOURCES
---------

This program is a good stress test for the CPU, memory, caches, CPU cooling, and case cooling. The torture test runs continuously, comparing your computer's results to results that are known to be correct. Any mismatch and you've got a problem! Note that the torture test sometimes reads from and writes to disk, but it cannot be considered a stress test for hard drives. You'll need other programs to stress video cards, the PCI bus, disk access, networking, and other important components.

In addition, this is only one of several good programs that are freely available. Some people report finding problems only when running two or more stress test programs concurrently. When running two stress test programs, you may need to raise Prime95's priority so that each gets about 50% of the CPU time. Forums are a great place to learn about available stability test programs and to get advice on what to do when a problem is found.
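
The torture test's core idea, computing something whose correct answer is already known and flagging any mismatch, can be illustrated with a short sketch. This is not Prime95's code. It assumes the Lucas-Lehmer test for Mersenne primes as a stand-in workload, and the names KNOWN_RESULTS, lucas_lehmer, and run_self_check are hypothetical. A few small Python calculations will not load a machine the way the real torture test does; the sketch only shows the compare-against-known-results principle.

```python
# Hypothetical sketch of a "known-answer" self-check; NOT Prime95's code.
# Assumption: the Lucas-Lehmer test stands in for the real torture test
# workload. The table holds published facts about Mersenne numbers.

# Odd prime exponents p, and whether 2**p - 1 is a Mersenne prime.
KNOWN_RESULTS = {
    3: True, 5: True, 7: True, 11: False, 13: True, 17: True,
    19: True, 23: False, 29: False, 31: True, 37: False, 41: False,
    43: False, 47: False, 53: False, 59: False, 61: True, 89: True,
    107: True, 127: True,
}


def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for an odd prime exponent p."""
    m = (1 << p) - 1          # the Mersenne number 2**p - 1
    s = 4                     # standard Lucas-Lehmer starting value
    for _ in range(p - 2):
        s = (s * s - 2) % m   # Python's % keeps s non-negative
    return s == 0


def run_self_check(rounds=1):
    """Recompute every known result `rounds` times; return the mismatch count.

    A real stress test loops indefinitely; `rounds` keeps this sketch
    finite. Any nonzero count means the machine produced a wrong answer.
    """
    mismatches = 0
    for _ in range(rounds):
        for p, expected in KNOWN_RESULTS.items():
            if lucas_lehmer(p) != expected:
                mismatches += 1
                print(f"MISMATCH for p={p}: hardware problem suspected")
    return mismatches


if __name__ == "__main__":
    print("mismatches:", run_self_check(rounds=3))
```

Run as a script, it recomputes each table entry three times and prints the mismatch count, which should be zero on healthy hardware. A real stress test would repeat far larger calculations for hours.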

The currently popular stability test programs are (sorry, I don't have web addresses for these):

    Prime95 (this program's torture test)
    3DMark2001
    CPU Stability Test
    SiSoft Sandra
    Quake
    Folding@Home
    SETI@home
    Genome@home

Several useful websites for help (look for an overclocking community or forum):

    http://www.hardocp.com
    http://www.anandtech.com
    http://www.tomshardware.com
    http://www.sharkyextreme.com
    http://www.overclockers.com

Also try the alt.comp.hardware.overclocking Usenet newsgroup.

Utility programs you may find useful (I'm sure there are others, so look around):

    Motherboard Monitor from http://mbm.livewiredev.com
    Memtest86 from http://www.memtest86.com
    TaskInfo2000 from http://www.iarsn.com/

WHAT TO DO IF A PROBLEM IS FOUND?
---------------------------------

The exact cause of a hardware problem can be very hard to find. If you are not overclocking, the most likely cause is an overheating CPU or memory SIMMs that are not quite up to spec. Try running Motherboard Monitor and browsing the forums above to see whether your CPU is running too hot. If so, make sure the heat sink is properly attached, the fans are operational, and the air flow inside the case is good. To isolate memory problems, try swapping memory SIMMs with a coworker's or friend's machine. If the errors go away, you can be fairly confident that memory was the cause of the trouble.

If you are overclocking, try increasing the core voltage, reducing the CPU speed, reducing the front side bus speed, or changing the memory timings (CAS latency). Also try asking for help in one of the forums above; they may have other ideas to try.

CAN I IGNORE THE PROBLEM?
-------------------------

Ignoring the problem is a matter of personal preference. There are two schools of thought on this subject. The first is that most programs you run will likely not stress your computer enough to cause a wrong result or system crash. A few games may stress your machine enough to cause a crash.
Stay away from distributed computing projects where an incorrect calculation might cause you to return wrong results. You are not helping these projects by returning bad data! In conclusion, if you are comfortable with a small risk of an occasional system crash, then feel free to live a little dangerously!

The second school of thought is, "Why run a stress test if you are going to ignore the results?" These people want a guaranteed 100% rock solid machine. Passing these stability tests gives them the ability to run any program with confidence.

FREQUENTLY ASKED QUESTIONS
--------------------------

Q) My machine is not overclocked. If I'm getting an error, then there must be a bug in the program, right?

A) Unfortunately, no. The torture test compares your machine's results against KNOWN CORRECT RESULTS. If your machine cannot generate correct results, you have a hardware problem.

Q) How long should I run the torture test?

A) I recommend running it for 24 hours. On some machines the torture test has failed only after several hours of operation. In most cases, though, it will fail within a few minutes on a flaky machine.

Q) Prime95 reports errors during the torture test, but other stability tests don't. Do I have a problem?

A) Yes, you've reached the point where your machine has been pushed just beyond its limits. Follow the recommendations above to make your machine 100% stable, or decide to live with a machine that could have problems in rare circumstances.

Q) A forum member said, "Don't bother with Prime95, it always pukes on me, and my system is stable!" What do you make of that? Or: "We had a server at work that ran for 2 MONTHS straight without a reboot. I installed Prime95 on it and ran it, and a couple of minutes later I got an error. Are you going to tell me that the server wasn't stable?"

A) These users obviously do not subscribe to the 100% rock solid school of thought. THEIR MACHINES DO HAVE HARDWARE PROBLEMS.
But since they are not presently running any programs that reveal the hardware problem, the machines are quite stable. As long as these machines never run a program that uncovers the hardware problem, they will continue to be stable.