Hi guys, I'm writing this from Windows in safe mode with networking. Lately I've been having problems rendering with advanced lighting in scanline, and some with mental ray. Scenes are around 500k tris; halfway through the 2nd render the cpu temp warning light comes on and then the system freezes with no option but to reset. After many phone calls to the manufacturers and the suppliers testing for solutions, I was asked to send them an email.
It's a pretty long email so I apologise in advance.
18th August 2009
Recently, and again this morning, while working on a 3d project I noticed a flashing red led cpu temperature indicator accompanied by an audible tone, followed by a freeze. Specifically, this problem occurred while performing advanced 3d rendering in a moderately complex 3d scene (500k triangles). Every time this happens the computer has to be reset, as it is completely unresponsive.
After reading up on "Vista freezing" through tech forums and blogs, I decided to follow a tutorial and streamline the system as much as possible to further minimize memory usage, as I regularly perform maintenance anyhow. This tutorial involved turning off a lot of unrequired services, removing all unnecessary programs and repairing some corrupted windows files using the cbs command prompt.
After a restart and a barely noticeable improvement in speed, the system was tested under the same conditions and the same freezing error occurred. I then decided to remove as many updates as Vista would allow and tested again; the results were identical.
Later I removed an internet settings update by accident and the lan connection was lost, so I could no longer investigate any more online solutions. This left me with little option but to reinstall windows via the desktop. During this time I thought I should seek some advice from the supplier and the manufacturer of the components.
I contacted both companies advising of the situation, and also reported that upon pressing the delete key to enter the bios setup screen the bottom half of the screen was pixellated with different coloured pixels, and the key had to be pressed again to load the bios setup. Once inside the bios, the screen had large black and white random blocks surrounding the lettering. I asked if I should flash a new bios but was advised not to, as a faulty motherboard/component could possibly cause this.
After an initial discussion I was advised to contact the component suppliers, as they may be able to exchange parts under warranty if they were faulty. At this time I was under the impression that the processors were in question, given the specific nature of my work and the overheating problem, which seemed to point towards them as the cause of the freezing.
It was recommended that I download several programs to ascertain the problem. It was mentioned that it could also be a memory issue, and that processors don't usually cause a system to freeze or hang, just crash. I explained again that the computer freezes every time after a few minutes, and described the nature of the usage, including the program used, which was "3ds max". At this time I also enquired about the operating system, as people online were complaining of the system freezing and this seemed to be an issue with Microsoft Windows Vista, but I was assured this was not the case.
The components purchased all carry a Vista compatible certification, so it was concluded the problem had to lie elsewhere. During a chat I was informed that the memory could be tested per 2gb stick in each slot, though at the time I presumed this individual testing was available through the program!
As suggested, I downloaded some of the test programs, as follows:
"prime95" (cpu/memory stress tester),
"coretemp" (cpu temperature monitoring),
"memtestx86" (bootable memory testing program).
I ran "prime95" and stress tested purely the cpus several times for around 10 minutes using 8 threads at 100% load. Every time, within the first 3 mins the cpu temp. rose to above 89 degrees celsius, which activated the temp. warning light and alarm. After about 5 mins the cpu temperature stabilised at just under 70 degrees celsius, the fans slowed down and the warning light/alarm was deactivated. At no time during this phase of testing did the system freeze.
The rest of the tests included stress testing the memory. Every time, after around 5 minutes the system froze and, as before, I had to reset the computer. After several tests I concluded it must be the memory/motherboard at fault.
After a brief call to update the situation, I was advised to make a boot cd with "memtestx86" and run it from dos. I did this and reset the pc to start the program, and the test began. It quickly rose to 3% complete then halted; the system was frozen again. More tests revealed exactly the same thing. After another reset I entered the bios and had a look at the memory logs for the dimms. There were more than 30 entries recorded as ecc correctable memory errors. Only two dimms were listed, "3b" and "4a".
I contacted both companies and reported the results. I was told I should test each memory stick in the same slot and then, if required, in new slots, to reveal the problem as either a memory stick fault or a memory controller/mboard fault. After the calls I removed all sticks except one in the first slot and started up the pc. The bios would not load; instead a longish regular alarm sound was heard.
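For anyone following along, the stick-by-stick, slot-by-slot testing they suggested boils down to something like the rough Python sketch below. It is only an illustration of the reasoning, not anything the suppliers gave me: the function name `isolate_fault` and the stick labels "s1"/"s2" are made up, and the pass/fail values are hypothetical example data.

```python
def isolate_fault(results):
    """Given {(stick, slot): passed} test outcomes, return (bad_sticks, bad_slots).

    A stick is suspected if it failed in every slot it was tried in;
    a slot is suspected if every stick tried in it failed.
    """
    sticks = sorted({stick for stick, _ in results})
    slots = sorted({slot for _, slot in results})
    bad_sticks = [
        s for s in sticks
        if not any(ok for (stick, _), ok in results.items() if stick == s)
    ]
    bad_slots = [
        t for t in slots
        if not any(ok for (_, slot), ok in results.items() if slot == t)
    ]
    return bad_sticks, bad_slots


# Hypothetical outcomes: stick "s2" fails in both slots, "s1" passes in both.
example = {
    ("s1", "1a"): True, ("s1", "3b"): True,
    ("s2", "1a"): False, ("s2", "3b"): False,
}
print(isolate_fault(example))
```

If a stick fails wherever it goes, the stick is the suspect; if every stick fails in one slot, the slot (or the memory controller/mboard behind it) is the suspect. If everything fails everywhere, the test can't tell the two apart.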
After another phone call it was concluded that the bios would not accept anything other than the hardware previously connected/detected. The bios had to be reset through the motherboard cmos clear facility to allow for a new hardware config.
I tried doing this as instructed in the manual (turn off power supply, remove power lead and bridge "jbt1" connections with a screwdriver) but after several attempts was unable to clear the bios for some unknown reason.
I then contacted the manufacturers and spoke to someone there, who I quickly brought up to speed on the situation. I was informed that it would be better to speak to the man in charge. I then called the distributors again and advised someone else of the situation, which is when I mentioned for the first time that the only thing I could think of to clear the memory was to remove the cmos battery, but that I thought I should wait before doing that.
Later on, around 5.30pm, I got a call from the distributors. I quickly brought the person up to speed on the whole situation and we discussed removing the bios battery then bridging the connections to reset it. Interleaving the memory was also mentioned, as the motherboard supports this option.
After this call I first removed the battery and shorted "jbt1" to clear cmos, then interleaved the sticks, 1 in each bank, leaving a space between each stick. I then repeated the same battery and cmos procedure leaving in 1 and then 2 memory sticks. The result was that the machine powered up, but before anything could be displayed it abruptly powered down and reset, then the alarm sound was heard and the bios would not load.
Update: The bios is now freezing intermittently even with default settings. Leaving the machine off for a short while allows enough activity to load the o/s into safe mode with networking at the very most. If the bios is left to skip to loading the full o/s, it doesn't get far before the system freezes completely.
At present all the ram sticks are back in place, as no other configuration seems to work. My last test was to set the bios as per the hardware and disable all quiet boot modes. This revealed that 3/4 of the installed memory passed the test, around 6600mb, and that memory sparing was not available.
Currently I am at a loss as to what to do. I need to get on with my work but am unable to.
//Main system specs;
Motherboard = Supermicro x7dwa-n, 5400 series Seaburg Chipset
CPU = 2x Intel Harpertown e5472 Quad Core @3GHz 1600 FSB
Installed memory = 4x2GB Supermicro 800MHz ECC DDR2 Ram
GPU card = 2GB Radeon 4870x2 GDDR5//
Just wondering, you guys: I kind of hate Vista, not because of this but because it's so memory hungry and bloated with things that are mostly completely useless. I basically couldn't care less what it looks like. Performance is the most important requirement for me, along with compatibility with programs, which was a nightmare to get right, finding patches and whatnot. Seriously, when/if this problem is fixed I'm thinking about switching to XPx64. What do you think?