Monitoring HP G5 server hardware RAID on Debian

Personally I prefer to use Linux MDADM software RAID because of the following factors:

  • Homogeneous set of utilities, always the same, unlike all the different custom utils from the many hardware vendors.
  • Long term support for the platform.
  • Proven performance and stability.
  • Cheaper RAID cards use the CPU in any case, and the MDADM implementation will blow them out of the water for features/performance.
  • Ability to run any type of RAID level, unlike most hardware, which usually only supports 0, 1 and 0+1.

But sometimes you get a system with decent dedicated controllers with cache and battery backup, and you want to be able to offload to them. This is what was in my G5 system:

lspci -nn
06:00.0 RAID bus controller [0104]: Hewlett-Packard Company Smart Array Controller [103c:3230] (rev 04)

Googling “pciid 103c:3230” quickly showed that I was dealing with an “HP Smart Array P400i” card.
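
If you want just the ID to paste into a search, a small filter over the lspci output does the job. This is only a sketch: the function name raid_pci_ids is made up for illustration, and it assumes the controller shows up with the "RAID bus controller" class text as in the output above.

```shell
#!/bin/bash
# Print the vendor:device ID of every RAID bus controller found in
# "lspci -nn" output supplied on stdin.
raid_pci_ids() {
    # The class code ([0104]) has no colon, so only the [vvvv:dddd] pair matches.
    grep -i 'RAID bus controller' \
        | grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' \
        | tr -d '[]'
}
```

Typical use would be: lspci -nn | raid_pci_ids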

Now while the card is supported by the OS out of the box and I can see any array that I created in the BIOS, the problem I sit with is that I need to be able to monitor the disks for failure and issue rebuild commands without taking the system down. Getting this right with the vendor-provided tools is usually near impossible, as the vendor has abandoned support and usually only supported one or two commercial Linux distros in any case. Enter the good folks at the HWraid project.

Just add their repository and install the tools for your card (in this case the HP tools):

echo deb http://hwraid.le-vert.net/debian squeeze main >> /etc/apt/sources.list
apt-get update
apt-get install hpacucli
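
One thing to watch: running the echo line twice leaves a duplicate entry in sources.list. A minimal sketch of an idempotent version follows. The function name add_repo is hypothetical, and so is the idea of using a separate list file; adapt it to taste.

```shell
#!/bin/bash
# Repository line taken from the step above.
REPO_LINE='deb http://hwraid.le-vert.net/debian squeeze main'

# Append REPO_LINE to the given sources file only if it is not already there.
add_repo() {
    local list_file=$1
    touch "$list_file"
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    if ! grep -qxF "$REPO_LINE" "$list_file"; then
        echo "$REPO_LINE" >> "$list_file"
    fi
}
```

For example, add_repo /etc/apt/sources.list.d/hwraid.list (an assumed file name) followed by apt-get update as before.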

Now we test the tools.

hpacucli controller slot=0 physicaldrive all show
Smart Array P400i in Slot 0 (Embedded)
   
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 500 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 500 GB, OK)
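
Since the status is the last field inside the parentheses, it is easy to pull out the drives that are not OK. The following is a sketch only: drives_not_ok is a made-up helper name, and it assumes every drive line follows the same layout as the output above.

```shell
#!/bin/bash
# Read "physicaldrive all show" output on stdin and print the ID and status
# of every drive whose status is anything other than OK.
drives_not_ok() {
    awk '/physicaldrive/ {
        status = $0
        sub(/.*, /, "", status)   # keep only the text after the last comma
        sub(/\).*/, "", status)   # drop the closing parenthesis
        if (status != "OK") print $2, status
    }'
}
```

Usage would be along the lines of: hpacucli controller slot=0 physicaldrive all show | drives_not_ok. With two healthy drives, as here, it prints nothing.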

Success. Now that I have tools that can interrogate the controller, I need to build some monitoring, so I add a script and schedule it to run every hour in cron.

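#!/bin/bash
# Hourly RAID health check: alert via syslog and mail when the controller
# reports anything as Failed or Rebuilding.
MAIL=noc@acme.com
HPACUCLI=$(command -v hpacucli) || { echo "hpacucli not found in PATH" >&2; exit 1; }
# Use a private temp file and clean it up however the script exits.
HPACUCLI_TMP=$(mktemp /tmp/hpacucli.XXXXXX) || exit 1
trap 'rm -f "$HPACUCLI_TMP"' EXIT

# Query the controller once so the check and the mail use the same report.
"$HPACUCLI" ctrl all show config > "$HPACUCLI_TMP"

if grep -qE 'Failed|Rebuilding' "$HPACUCLI_TMP"
then
    msg="RAID Controller Errors"
    logger -p syslog.err -t RAID "$msg"
    mail -s "$HOSTNAME [ERROR] - $msg" "$MAIL" < "$HPACUCLI_TMP"
    echo "$msg"
    cat "$HPACUCLI_TMP"
fi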
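
For the hourly schedule, something like the following cron fragment should do. The file name and the script path /usr/local/sbin/hpraid-check are assumptions, so use wherever you saved the script. The PATH line is there because cron runs with a minimal PATH, and I am assuming hpacucli is installed in one of the sbin directories.

```
# /etc/cron.d/hpraid-check (hypothetical file name; avoid dots in the name)
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# m h dom mon dow user command
0 * * * * root /usr/local/sbin/hpraid-check
```

Remember to make the script executable (chmod 755), otherwise cron will not be able to run it.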

Configure your mail subsystem and ensure your system is actually able to send mail.

dpkg-reconfigure exim4-config

The script is very basic but it gets the job done. And yes, you will get an alert every hour while there is an issue, until it's resolved; think of it as a feature. The script mails the report to the hardcoded email address and also writes the alert message to your syslog. If you are already doing syslog monitoring and alerting with something like SolarWinds, Splunk or Graylog, you could rather depend on those systems by checking for the alert message in syslog, and scrap the emailing bit of the script.
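
A syslog-only variant could look roughly like this. It is a sketch under the same assumptions as the script above: it uses the same 'Failed|Rebuilding' pattern and the same RAID tag, and raid_has_errors is just a helper name I made up.

```shell
#!/bin/bash
# Syslog-only variant: no mail, just one tagged syslog line per failed check.

# Succeeds (exit 0) when the report on stdin mentions a failed or
# rebuilding device.
raid_has_errors() {
    grep -qE 'Failed|Rebuilding'
}

# Only query the controller when hpacucli is actually installed.
if command -v hpacucli >/dev/null 2>&1; then
    if hpacucli ctrl all show config | raid_has_errors; then
        logger -p syslog.err -t RAID "RAID Controller Errors"
    fi
fi
```

Your log monitoring system can then alert on lines tagged RAID that contain "RAID Controller Errors".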