Arrrrgh!


Susurrus
11-04-2002, 01:59 AM
Let the games begin!

You guys know of any forums or tech docs I can look into about a crappy freezing problem I am having? I have a dual Athlon MP 2000+ system, and it runs fine until I render in LightWave. I can use it for many a day, begin a render in LW and WHAM! Frozen as can be!

Any ideas?

Any data would be greatly appreciated!


Keith W.

GregHess
11-04-2002, 12:38 PM
Rendering is a situation which stresses a variety of system components. Most of all, it stresses the CPU/Ram pretty heavily.

So first you need to diagnose the problems that revolve around these components.

Most often, render crashes are due to poor-quality ram. So run Mushkin's ram test and check for errors.

Run a SINGLE pass, with ECC disabled.

www.mushkin.com/support (v3.0 for windows/dos).

If you get even a Single error, JUST ONE, replace the ram.
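For anyone wondering what a ram tester actually does on each pass, here is a toy Python sketch of the basic idea: write a known pattern, read it back, and flag any mismatch. The pattern values and the function names `pattern_test` and `verdict` are made up for illustration, and this only exercises a buffer the OS hands to Python, so it is not a substitute for booting a real tester.

```python
# Toy illustration of a pattern-based memory test.
# Assumption: the four patterns below are just common choices, not the
# ones any particular tester uses.
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)


def pattern_test(buf):
    """Write each pattern across buf, read it back, and collect mismatches.

    buf is a bytearray. Returns a list of (offset, expected, actual)
    tuples; an empty list means no errors were seen.
    """
    errors = []
    for pattern in PATTERNS:
        # Fill the whole buffer with the pattern first...
        for i in range(len(buf)):
            buf[i] = pattern
        # ...then read every byte back and compare.
        for i in range(len(buf)):
            if buf[i] != pattern:
                errors.append((i, pattern, buf[i]))
    return errors


def verdict(errors):
    """Apply the rule from the post: even a single error means replace the ram."""
    return "replace the ram" if errors else "pass"


if __name__ == "__main__":
    print(verdict(pattern_test(bytearray(64 * 1024))))
```

The point of the sketch is the zero-tolerance rule in `verdict`: there is no "acceptable" error count.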

Remember that dual AMD boards require registered DDR if you want to use more than 2 slots.

Moving onto the CPU....

Make sure you have all the correct patches....

At least win2k SP2, and all the amd chipset drivers for agp, inf, power, and ide. (www.amd.com, all 2002 drivers).

Check for heat issues....

If your temp gets past 55C you will start encountering stability fluctuations. The hotter the chips get, the greater the chance.

Check the temp of the CPUs in the BIOS

(Under PC health, or Hardware monitor). If the temp is hovering around 55-60C while at IDLE, then you need better heatsinks/more airflow.

Also check for bios updates for your particular motherboard.

Other issues can revolve around IRQ conflicts. That's easy to check for: just remove all cards sans the video card, and try to reproduce the crash.

Could also be power related.

Try the ram and cpu checks first though.

Susurrus
11-04-2002, 05:55 PM
Thanks for all the great suggestions. I will go through them this evening. As for temps, even under a burn-in I can't get the CPUs to go over 48°C, so I think I am safe there. I have the newest BIOS (v. 1.5), and I got the thing to render for about 2 hrs last night (a small anim I'm working on). But by the time the morning came around, it had given up.

I have ECC REG Corsair PC2100 RAM, I thought I would be in good hands there. Hmmm...

I have some work to do tonight, and I'll let you know how it goes.

Thanks again for your help. It's really appreciated :thumbsup: !


Keith W.

GregHess
11-04-2002, 06:20 PM
Originally posted by Susurrus
I have ECC REG Corsair PC2100 RAM, I thought I would be in good hands there. Hmmm...

Ram can be damaged during shipment. Never assume the ram is ok. Always test it. Bad ram can cause virtually every single OS error known to Microsoft.

MAKE SURE when running the ram test, that ECC is disabled. Normal ram should report ZERO errors on a single pass.

Also try and capture the crash info....

Start > Settings > Control Panel > System > Advanced > Startup and Recovery

Turn all the checkmarks off EXCEPT...

(The top login one)
and
(Write to the event viewer)

Next time you go to render, it should BSOD instead of rebooting, giving you an error to write down :).

kwshipman
11-04-2002, 06:30 PM
Originally posted by GregHess
Ram can be damaged during shipment. Never assume the ram is ok. Always test it. Bad ram can cause virtually every single OS error known to Microsoft.

Greg, I know you really recommend Mushkin. What happens if you receive some ram with errors? Are they good about exchanging? What about other manufacturers?

GregHess
11-04-2002, 07:05 PM
Mushkin will tell you to test their own ram with memtest if you think there are errors or problems with it. If an error is found, they will ship you new ram (usually overnight) with a box to ship your damaged ram back to them.

I only recommend them #1 because THEY ARE the BEST when it comes to customer service. Check resellerratings.com, they have one of the highest scores with thousands upon thousands of votes and comments.

Susurrus
11-05-2002, 04:46 AM
Well...

I tried all of the things posted here w/ no luck. Win2k SP2, AMD AGP and Pwrmgmt, Memtest86 3.0 (Passed), and still FREEZE city. One thing that I couldn't do was install the AMD EIDE drivers, because I could not uninstall the old ones. I keep getting a BSOD w/ the error KMODE_EXCEPTION_NOT_HANDLED :surprised ...

At this point I am truly stumped. Am I going to have to buy a new CPU? Motherboard? Are there any software tests I can run to check the integrity of either?

Enough whining. If there are any other suggestions you can make I'd be happy to try them. At best I have had my machine render for about 1.5 hrs, then freeze. That's why I am so perplexed. I can run almost any app in the book, and all is fine. I begin a render in LightWave and crap-o.

Thanks for all your help. Please feel free to send more advice my way.


*EDIT* When LightWave freezes I don't get the BSOD. It is just frozen on screen. Nothing works but the reset button.


Keith W.

GregHess
11-05-2002, 11:23 PM
Hey Susurrus,

I just finished retyping up the Troubleshooting FAQ from the old discreet webforum. Give it a look, hopefully it will help you find a solution to your issue.

http://www.3dluvr.com/content/article/105

If you go through all those troubleshooting techniques, and it's STILL not working...I might have one or two more tricks up my sleeve.

beaker
11-06-2002, 12:53 AM
One other thing you guys forgot to mention is that the AMD chips need a pretty hefty power supply. If you don't have a 300-400 watt power supply, that could cause crashes. Especially if you have a lot of drives on the system.

Susurrus
11-06-2002, 02:01 AM
Thanks Greg. I appreciate all your help, truly! As for my power supply, beaker, it's an Antec rated @ 431W, which should be plenty, unless you guys know something I don't. I have voltages and stuff I can post, but it seems like it would be overkill.


Thanks again guys, and I'll read over that link.


Keith W.

GregHess
11-06-2002, 02:11 AM
The next step I'd do in troubleshooting is....

Remove everything from the computer except....

1) Cpu/Heatsink
2) A single Dimm in Bank 0 or Bank 1 (whatever is lowest)
3) A single video card in the primary AGP slot
4) The primary HD for the system.

Loading with such a bare config will reduce the possible variables for troubleshooting. If the problem disappears with a minimal config, then you can start adding things back in, to see where the trouble starts. Yes, I know it's extremely tedious, but when it comes down to it, you can't really find the problem without going step by step. It takes a long time, but you should be able to eliminate variables one at a time till you reach the point where you add something (like a soundcard, or 3rd dimm) and the crash suddenly starts appearing.
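The add-things-back-one-at-a-time routine above is basically a loop. A rough sketch in Python, where the part names and the `crashes_with` callback are hypothetical stand-ins (in real life `crashes_with` is you, running a test render after each change):

```python
BARE = "bare config"  # marker returned when the stripped-down system already crashes


def find_culprit(extras, crashes_with):
    """Re-add parts one at a time and report where the crash comes back.

    extras       -- parts pulled for the bare config, in the order you re-add them
    crashes_with -- callable that takes the list of re-installed parts and
                    returns True if the render crashes with that setup

    Returns BARE if the minimal config crashes on its own, the name of the
    first part whose addition brings the crash back, or None if the crash
    never reappears.
    """
    installed = []
    if crashes_with(list(installed)):
        return BARE
    for part in extras:
        installed.append(part)
        if crashes_with(list(installed)):
            return part
    return None


if __name__ == "__main__":
    # Hypothetical example: pretend the 3rd dimm is the troublemaker.
    parts = ["soundcard", "2nd dimm", "3rd dimm", "network card"]
    print(find_culprit(parts, lambda inst: "3rd dimm" in inst))
```

Going one part at a time is slow, but it means the part named at the end is the only thing that changed since the last clean render.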

If the crash still occurs with such a bare config, try running the video card in VGA mode (aka sans any drivers). If the render doesn't crash, it could be some sort of weird video conflict. If it still crashes in VGA mode, then it could be some sort of ff'd up driver thing, like the IDE drivers you mentioned. In which case I'd start over (gulp, ya it sucks), and just install the latest drivers from www.amd.com.

If after all that it's still happening, I'd start suspecting the CPUs themselves (I'd try swapping different dimms with bank 0/1 [still just 1 dimm at a time, just different ones]), and try swapping their location, aka cpu 2 to cpu 1's socket and vice versa. If that doesn't work, I'd clean the CPUs off so the codes are readable, and verify that they're actually MPs. If they are...well then crap, it could be the motherboard.

Hopefully though you'll reach a conclusion before that, as by that point you've wasted a good 4-6 hours.

dvornik
11-06-2002, 02:57 AM
I'm used to troubleshooting in a zero-consequence environment (can always ghost the thing back to what it was), but can you try something drastic with your IDE controller? Like remove it from the device manager? Or update the driver from there? Try updating the driver from the device manager first.

Susurrus
11-06-2002, 03:12 AM
Last bit of info I can gather...

The whole game of ACPI (or whatever) revolves around the correct HAL. Well I have mine set correctly, and I keep getting this error msg on the BSOD...

IRQL_NOT_LESS_OR_EQUAL
Address 8006569E base at 80062000, datestamp 3a248743 - hal.dll
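For what it's worth, the numbers on that stop screen can be picked apart a little. A quick Python sketch, assuming the datestamp is the usual hex count of seconds since 1970 (UTC) taken from the driver's file header; the function names are just made up for illustration:

```python
from datetime import datetime, timezone

# Values copied from the stop screen quoted above.
FAULT_ADDRESS = 0x8006569E
MODULE_BASE = 0x80062000
DATESTAMP = 0x3A248743


def offset_in_module(address, base):
    """How far into the module (here hal.dll) the faulting address sits."""
    return address - base


def datestamp_to_utc(stamp):
    """Read the datestamp as seconds since 1970-01-01 UTC (an assumption)."""
    return datetime.fromtimestamp(stamp, tz=timezone.utc)


if __name__ == "__main__":
    print(hex(offset_in_module(FAULT_ADDRESS, MODULE_BASE)))  # 0x369e
    print(datestamp_to_utc(DATESTAMP).date())                 # 2000-11-29
```

So the fault is 0x369E bytes into hal.dll, and the datestamp works out to 29 November 2000. That only tells you which build of the file is loaded, but it can be handy for checking whether a driver or service pack update actually replaced it.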

Strange?

I am going to try all of Greg's suggestions tomorrow night, and see what results I get.


See you soon.


Keith W

GregHess
11-06-2002, 03:19 AM
Sus,

As a last-ditch attempt, try setting the system to non-ACPI (change it from ACPI computer to Standard MPS Computer).

This could of course fubar the win2k install. So backup stuff first.

dvornik
11-06-2002, 03:58 AM
I haven't noticed if you mentioned updating the inf drivers. Make sure you do. Looks like a chipset drivers issue.

MadMax
11-06-2002, 09:07 PM
I have had this happen a couple of times, and in both instances everything worked 100% fine until I hit render.

It would go through part of a render and then boom. Lightwave is pretty sensitive to ram issues. VERY sensitive.

Download Memtest86 3.0, it's free. You can run it from a floppy or burn the ISO image to a self-booting CD.

Susurrus
11-06-2002, 10:35 PM
So you guys know, I ran the Mushkin Memtest86 and it checked out fine. After a little poking around last night I have a feeling it's the second CPU I bought. Either it came damaged or I damaged it :eek: ! Tonight I am going to remove that CPU and see if LW renders okay.

I do have one question though. If I run a burn-in (like SiSoft) the temp never gets above 47°C, and that can run for hours if I want it to. The question is, what is the burn-in doing? If it's running the CPUs @ 100%, why doesn't my rig freeze then?

Another topic : If the CPU is bad, does anyone know how AMD is about handling defective/damaged CPUs?


This forum is the best forum on the internet. Bar none. Period.


Keith W.

plotz
11-07-2002, 01:45 AM
It could be the second CPU.

This could be total bunk, but...

Our Audio Engineer just had a new box built and asked them to put a dual capable system together. But he wanted to add the second proc. at a later time to save a few $$.

The guys at the shop told him he shouldn't do that. That if he planned on going dual proc he should do it now so he could get processors that were from the same lot?

Basically they implied that if you put a P4 proc. in now, and one a year from now, the manufacturing differences would be big enough to potentially cause problems.

I'm not a tech guy, so I have no idea if this is true...but thought I'd relay it anyway.

BTW. Have you tried rendering in one of the non-realistic modes. Just to see if it freezes up on every kind of render? Might help you point to a LW specific problem vs. a hardware issue.

phatgroovn
11-07-2002, 02:00 AM
I've heard that too, that it's optimal to get a pair of CPUs from the same chunk o' silicon, but this is near impossible with the distribution channel being how it is. I think the point is more that in a year, it could possibly be very difficult to find another of the same CPU.

phatgroovn
11-07-2002, 02:01 AM
Do you have other 3D apps? Try some renderings in other programs or just try to push the hell out of your CPU/RAM with other means. I've had certain scene files have some corruption that locks a system, so try taxing it in other ways.

GregHess
11-07-2002, 02:17 AM
It's actually possible to get sequential chip ID #s. When you order your CPUs, call the place you're ordering from and ask for sequential IDs. They usually get a whole batch at once, and it's usually sequential chips. I got my MPs that way.

Zero crashes, 24/7, Since Sept 2001.

MadMax
11-07-2002, 05:23 AM
Originally posted by plotz
It could be the second CPU.

This could be total bunk, but...

Our Audio Engineer just had a new box built and asked them to put a dual capable system together. But he wanted to add the second proc. at a later time to save a few $$.

The guys at the shop told him he shouldn't do that. That if he planned on going dual proc he should do it now so he could get processors that were from the same lot?

Basically they implied that if you put a P4 proc. in now, and one a year from now, the manufacturing differences would be big enough to potentially cause problems.


This used to be true of older Intel chips. If you did not get the same stepping, it wouldn't work.

That situation was never true on an Athlon based system. Hell you could even put CPU's of different size on a board and it would work.

If that was a problem, the system wouldn't work at all.

ChrisR
11-10-2002, 08:45 AM
Are you using any third party plug-ins or shaders? If so, disable them in the scene and try to render again.

Sometimes plug-ins are not fully compatible with the current version of the software they're supposed to be used with, or they don't play nice with other plugs.

CGTalk Moderation
01-13-2006, 09:00 PM
This thread has been automatically closed as it remained inactive for 12 months. If you wish to continue the discussion, please create a new thread in the appropriate forum.