Hot Drives: Dealing with SMART data on OpenSuse 10.3
One thing I've noticed since upgrading my system to OpenSuse 10.3 is that my 3 older Western Digital 160GB drives (specifically 2 WDC WD1600JB-00D and a WDC WD1600JB-00F) run really hot. Like 120 degrees Celsius hot. I get this information from smartd or smartctl. It leaves scary log messages like this one:
Oct 24 19:43:18 copper smartd[4479]: Device: /dev/sda, SMART Usage Attribute: 194 Temperature_Celsius changed from 120 to 116
The drives do get hot to the touch, but 120°C sounds really hot. It is possible that the sensors aren't accurate. It's also possible they've been reporting high temperatures ever since I installed them around five years ago. See, until I installed OpenSuse 10.3 I never saw the SMART data. I could also guess that the information just isn't accurate for my system. So to test these ideas I ran smartctl to see what it had to say about all my drives. Three are the WD drives I mentioned and the fourth is a Seagate SATA drive:
copper:~ # for f in a b c d ; do smartctl -a /dev/sd$f | grep 194 ; done ;
Here's an edited version of my results:
Device ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH
/dev/sda 194 Temperature_Celsius     0x0022   122   253   000
/dev/sdb 194 Temperature_Celsius     0x0022   124   253   000
/dev/sdd 194 Temperature_Celsius     0x0022   040   049   000
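The loop above greps the full smartctl -a report for the string 194, which can also match unrelated lines (a raw value such as 1194, for example), and its output doesn't say which drive each row came from. A minimal sketch of a variant, assuming smartctl from smartmontools is installed and the script runs as root, asks only for the attribute table with -A, keeps the row whose ID column is exactly 194, and prefixes it with the device name. The helper name label_lines is just an illustrative choice:

```shell
#!/bin/sh
# Print the SMART attribute 194 row for each drive, prefixed with the device name.
# Assumes smartctl (smartmontools) is installed and this runs as root.

# Read smartctl output on stdin, keep only the row whose first (ID) column
# is 194, and print the device name passed as $1 in front of it.
label_lines() {
    awk -v dev="$1" '$1 == 194 { print dev, $0 }'
}

for f in a b c d; do
    # -A prints just the vendor attribute table instead of the full -a report.
    smartctl -A "/dev/sd$f" | label_lines "/dev/sd$f"
done
```

Matching on the first field rather than grepping the whole line is the main design choice here: it ignores the header row and any other attribute whose values happen to contain the digits 194.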
They all show up as /dev/sd now because of a SCSI to ATA translation layer that OpenSuse 10.3 uses. Under 10.2 the same drives showed up as /dev/hda, /dev/hdb and /dev/hdc (another good reason to mount by label), and only my SATA drive showed up as /dev/sda. So I can see that my Seagate SATA drive reports a much cooler 40°C.
So is it because they're old? Is it because there's not enough air flow between them in the case? As an extreme test I separated the drives so there's an empty bay between each of them, took the side off the case and let a pedestal fan blow in there full blast. Not a situation I'd leave in place for any length of time, but rather a way to see if I could lower the temperature at all. It made no difference.
Even so, I've been trying to find some fans that are quiet and will fit in my case. There were two 80mm fans in there but they sound like a wood-chipper. One of them can be throttled down by the QFan setting in the BIOS (it's an Asus M2NPV-VM motherboard), but the other case fan seems unaffected by that setting, so I'd have one quiet case fan and one loud one. My first attempt at quieter fans was to buy a couple of nice 120mm fans from Thermaltake. I bought them online at Tiger Direct and got them a couple of days later. They're very quiet but don't fit in the case for this computer (I don't mind having extra parts for the other machines here). So my next attempt was a 92mm fan for the case and what's called the Thermaltake A2309 iCage to mount the drives in. For me, choosing cooling products is a balance between sound and heat dissipation. And I don't want anything with a knob I have to twist. The cage fits in the space of three 5.25 inch drive bays but holds three standard 3.5 inch hard drives and a 120mm fan. I just installed the cage tonight and it looks odd because the fan sticks out the front of the case. I don't mind at all, but I'm sure it wouldn't be appropriate for everyone.
It's very quiet and I can feel cool air around the drives (which also get a decent air gap between them). The fan runs quieter than my CPU fan. I also installed the 92mm fan at the back of the case. It has a 4-pin connector, which would normally still plug into the 3-pin headers on my motherboard, but wouldn't you know it, there's a capacitor right next to each of the 3-pin headers that gets in the way. I just rigged up an adapter from parts I had lying around and connected it up. The fourth pin is unused and the fan is moving a lot of air quietly at 2000RPM.
So with all this cool air I should be all set, right? No. The results from smartctl above were actually taken just now with the new fans in. I think the problem isn't with air flow but with the drives. Either they're reporting temperatures much hotter than they really are, or they're generating a lot of heat from friction somewhere that's not going to get better. Whatever the case, I'm going to buy another hard drive and copy all my data over. The critical stuff got backed up the day I started worrying, of course, but it's better not to have a hard drive crash at all.
I don't mind spending the money on cooler, quieter fans for my development machine. I've always hated loud fans, so I took this as an opportunity to improve things a little in that area. In fact I'm probably going to splurge on a new, quieter CPU fan when I order the new hard drive. I'm looking at the Ruby Orb. I know, another LED and this time it's red, but 17dBA is very quiet and it moves 78CFM of air. I was thinking about buying it when I first built this machine but thought it was too much to spend when the CPU comes with a fan included. Now $40 doesn't seem like too much after hearing the difference between my old case fans and my new ones. And since I'll be buying a Seagate 500GB drive at the same time, it probably won't affect the shipping costs. I just have to make sure it'll clear those capacitors next to the CPU...

The SMART values are just 'friendly' values, and are somewhat arbitrary, as each manufacturer has a different scheme. The actual temperature is (usually) reported under the RAW_VALUE column, and it wraps around to the next line on a standard 80 column console.

I have nearly the same drive, a WD1600JD-00G:

# smartctl -d ata /dev/sdb -a | grep 194
194 Temperature_Celsius 0x0022 120 253 000 Old_age Always - 30

The 30 is the temperature.

This is mentioned in the FAQ for smartmontools (of which smartctl is a part). You can find it here: http://smartmontools.sourceforge.net/#FAQ
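
Since the readable temperature lives in the RAW_VALUE column, a script can pull that column out instead of the normalized VALUE that smartd was logging. A minimal sketch, assuming smartctl is on the PATH, the script runs as root, and the attribute rows look like the one quoted above (raw value in the tenth whitespace-separated field). The helper name raw_temp is hypothetical, and if your setup needs it, add -d ata as in the command above:

```shell
#!/bin/sh
# Report the temperature each drive stores in the RAW_VALUE of SMART attribute 194.
# Assumes smartctl (smartmontools) is installed and this runs as root.

# Read smartctl -A output on stdin and print the first token of the RAW_VALUE
# column (field 10) of attribute 194. VALUE (field 4) is the vendor's
# normalized score, not degrees. Some drives append extra text after the
# raw number, which this ignores.
raw_temp() {
    awk '$1 == 194 { print $10 }'
}

for dev in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
    t=$(smartctl -A "$dev" | raw_temp)
    # Fall back to "unknown" when the drive reports no attribute 194.
    echo "$dev: ${t:-unknown}"
done
```

For the row quoted above, raw_temp prints 30 rather than the 120 shown in the VALUE column.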

Yeah, I trimmed out the raw values in my post, but at the time they didn't seem like plausible values for the actual temperature. That, combined with the scary messages being logged, made me act. After all the changes I've made, the raw value now reports a comfortable 24.