Monitoring HP G5 server hardware RAID on Debian

Personally I prefer to use Linux mdadm software RAID, for the following reasons:

  • A homogeneous set of utilities that is always the same, unlike the assortment of custom tools from the many hardware vendors.
  • Long term support for the platform.
  • Proven performance and stability.
  • Cheaper RAID cards use the host CPU in any case, and the mdadm implementation will blow them out of the water on features and performance.
  • The ability to run any RAID level, unlike most hardware controllers, which usually only support 0, 1 and 0+1.

But sometimes you get a system with a decent dedicated controller with cache and battery backup, and you want to be able to offload to it. This is what was in my G5 system:

lspci -nn
06:00.0 RAID bus controller [0104]: Hewlett-Packard Company Smart Array Controller [103c:3230] (rev 04)

Googling “pciid 103c:3230” quickly revealed that I was dealing with an “HP Smart Array P400i” card.

Now, while the card is supported by the OS out of the box and I can see any array that I created in the BIOS, the problem I am left with is that I need to be able to monitor the disks for failure and issue rebuild commands without taking the system down. Getting this right with the vendor-provided tools is usually near impossible, as the vendor has abandoned support and typically only ever supported one or two commercial Linux distros in any case. Enter the good folks at the HWraid project.

Just add their repository and install the tools for your card (in this case the HP tools):

echo deb http://hwraid.le-vert.net/debian squeeze main >> /etc/apt/sources.list
apt-get update
apt-get install hpacucli

Now we test the tools.

hpacucli controller slot=0 physicaldrive all show
Smart Array P400i in Slot 0 (Embedded)
   
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 500 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 500 GB, OK)
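
Beyond the physical drive listing, the overall controller status and logical drive views are also worth knowing when keeping an eye on a rebuild. These are standard hpacucli subcommands, although the output will obviously depend on your controller and array layout:

hpacucli ctrl all show status
hpacucli ctrl slot=0 logicaldrive all show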

Success. Now that I have a tool that can interrogate the controller, I need to build some monitoring, so I add a script which I schedule to run every hour from cron.

#!/bin/bash
MAIL=noc@acme.com
HPACUCLI=$(which hpacucli)
HPACUCLI_TMP=/tmp/hpacucli.log

# Alert when any drive or array is reported as Failed or Rebuilding
if [ "$($HPACUCLI ctrl all show config | grep -cE 'Failed|Rebuilding')" -gt 0 ]
then
    msg="RAID Controller Errors"
    logger -p syslog.err -t RAID "$msg"
    $HPACUCLI ctrl all show config > "$HPACUCLI_TMP"
    mail -s "$HOSTNAME [ERROR] - $msg" "$MAIL" < "$HPACUCLI_TMP"
    echo "$msg"
    cat "$HPACUCLI_TMP"
    rm -f "$HPACUCLI_TMP"
fi
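
To actually schedule it, drop the script somewhere sensible and add a cron entry. The path and file names below are just examples; adjust them to your own conventions:

chmod +x /usr/local/sbin/check-hp-raid.sh
echo '0 * * * * root /usr/local/sbin/check-hp-raid.sh' > /etc/cron.d/check-hp-raid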

Configure your mail subsystem and ensure your system is actually able to send mail.

dpkg-reconfigure exim4-config
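
A quick way to confirm mail delivery works end to end is to send yourself a test message with the same mail command the script uses (substitute your own recipient address):

echo "test from $HOSTNAME" | mail -s "$HOSTNAME mail test" noc@acme.com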

The script is very basic but it gets the job done. And yes, you will generate an alert every hour while there is an issue, until it is resolved; think of it as a feature. The script sends a mail to the hardcoded email address and also logs the event to syslog. If you are already monitoring and alerting on syslog with something like SolarWinds, Splunk or Graylog, then you could instead rely on those systems for alerts by checking for the alert message in syslog, and scrap the emailing bit of the script.
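
If you take the syslog route, it is worth confirming that the alert message actually lands where your collector expects it. On a stock Debian install the default destination is /var/log/syslog, so something like this should show a test entry:

logger -p syslog.err -t RAID "RAID Controller Errors"
grep RAID /var/log/syslog | tail -n 5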

Making your Debian server networking redundant

You will need at least the following…

  1. A pair of stacked switches that support creating an LACP bonded port across the stack, i.e. with member ports on two different stack nodes. This gives you the best of both worlds: redundancy as well as increased bandwidth.
  2. Or alternatively, two ports on the same switch or on different unstacked switches. This is the bare minimum you can do to mitigate link failure. Note that this setup has no polling mechanism, so if the physical Ethernet link stays up but is not actually passing traffic because of a switching failure on the device, or a failure on another port on the device that provides the uplink, then this won't help you.

On your server you will need two (or more) network cards and some “simple” setup.

Install the packages that you will need in case you don’t have them already.

  • apt-get install ifenslave vlan bridge-utils

The example sets up the following

  • eth0 and eth1 bonded together into bond0
  • create 2 bridges br8 and br9
  • create 2 vlans bond0.8 and bond0.9
  • place each VLAN in its respective bridge
  • add IP details on br9
  • br8 has no L3 config on it and, in this specific case, is used by KVM to bridge virtual machines onto the network as they come online

For option 1, edit your /etc/network/interfaces to look something like this:


# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo bond0 bond0.8 bond0.9 br8 br9
iface lo inet loopback

iface bond0 inet manual
 bond-slaves eth0 eth1
 bond-mode 802.3ad
 bond-miimon 100
 bond-use-carrier 1
 bond-lacp-rate 1
 bond-min-links 1
 # send traffic over the available links based on src/dst MAC address
 bond-xmit-hash-policy layer2
 mtu 1600

iface bond0.8 inet manual
iface bond0.9 inet manual

iface br8 inet manual
 bridge_stp off
 bridge_ports bond0.8

iface br9 inet static
 address 192.168.0.2
 netmask 255.255.255.0
 gateway 192.168.0.1
 bridge_ports bond0.9
 bridge_stp off

For option 2, edit your /etc/network/interfaces to look something like this (only the bond0 config changes):


# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo bond0 bond0.8 bond0.9 br8 br9
iface lo inet loopback

iface bond0 inet manual
 bond-slaves eth0 eth1
 bond-mode active-backup
 bond-miimon 100
 bond-downdelay 200
 bond-updelay 200

iface bond0.8 inet manual
iface bond0.9 inet manual

iface br8 inet manual
 bridge_stp off
 bridge_ports bond0.8

iface br9 inet static
 address 192.168.0.2
 netmask 255.255.255.0
 gateway 192.168.0.1
 bridge_ports bond0.9
 bridge_stp off
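
With either option, once the interfaces are up (after an ifdown/ifup or a reboot) you can sanity-check the kernel's view of the bond and the bridges. These commands come from the packages installed earlier and assume the bond0/br8/br9 names used in the examples:

cat /proc/net/bonding/bond0   # bonding mode, per-slave link state and, for 802.3ad, LACP partner details
brctl show                    # lists br8 and br9 with the VLAN interfaces attached to them
ip addr show br9              # confirms the address configured on br9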

Most use cases probably will not require bridging or VLANs, but I thought it best to provide examples of the entire feature set; you can always reduce it to what you need, as in the sketch below.
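
As an example of reducing it, a stripped-down /etc/network/interfaces for a plain LACP bond carrying a single static address, with no VLANs or bridges, could look something like this (the bond options are the same as in option 1 and the address details are placeholders):

auto lo bond0
iface lo inet loopback

iface bond0 inet static
 address 192.168.0.2
 netmask 255.255.255.0
 gateway 192.168.0.1
 bond-slaves eth0 eth1
 bond-mode 802.3ad
 bond-miimon 100
 bond-use-carrier 1
 bond-lacp-rate 1
 bond-min-links 1
 bond-xmit-hash-policy layer2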