Monitoring Xen via SNMP

I wanted to monitor disk I/O and CPU usage for Xen domains without running SNMP on each one. I couldn’t find anything out of the box to do this (am I the first to want this? surely there’s something?!), so… here’s my own way of doing it. It involves hooking a script into your dom0’s SNMP server, and then monitoring it with whatever tool you like. I have a download at the end of the post for cacti.

This article has been updated: Monitoring Xen via SNMP – update


Monitoring Dell Poweredge 2850 RAID status over SNMP

This took me a while to figure out, so I thought I’d quickly document it.

In our PE2850, we have a ‘Dell PowerEdge Expandable RAID Controller 4e/Si’. To check the status of the disks, you’ll need to fetch megarc from LSI. Download it onto the server with the RAID card, and also download check_lsi_megaraid. You’ll need to modify check_lsi_megaraid slightly: it prints prefixes like ‘RAID OK:’ and ‘RAID WARNING:’, so change these to just ‘OK’ and ‘WARNING’, and update the other states in the same way. Notice there’s no ‘:’ in my version.

This script takes about 3 seconds to run, and should produce the following output.

# ./check_lsi_megaraid
OK All arrays OK [1 array checked on 1 controller]

I didn’t want SNMP to block whilst waiting for this script to run, so I used cron to run it every minute and write its output to a temp file. In /etc/crontab I have this:

* * * * * root   /root/mega/check_lsi_megaraid  > /tmp/raid-status
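
One wrinkle with redirecting straight into /tmp/raid-status: the shell empties the file before the check starts, so for the few seconds the script runs, anything reading the file sees nothing. Here’s a minimal sketch of a wrapper that writes to a temporary file and renames it into place once the check has finished. `write_status` is a hypothetical helper of mine, not part of check_lsi_megaraid.

```shell
#!/bin/sh
# write_status OUTFILE COMMAND [ARGS...]
# Runs COMMAND and stores its stdout in OUTFILE, replacing the old
# contents only after the command has finished.
write_status() {
    out=$1
    shift
    # Create the temp file next to OUTFILE so that mv is a rename on the
    # same filesystem rather than a copy.
    tmp=$(mktemp "$out.XXXXXX") || return 1
    status=0
    "$@" > "$tmp" || status=$?
    # mktemp creates the file mode 0600; open it up so a non-root snmpd
    # can still read it.
    chmod 644 "$tmp"
    mv -f "$tmp" "$out"
    # Pass the check's own exit code back to the caller.
    return "$status"
}
```

Saved in a script, the cron job would then call something like `write_status /tmp/raid-status /root/mega/check_lsi_megaraid` instead of using the `>` redirect.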

In snmpd.conf, I then put this:

extend raid-status /bin/cat /tmp/raid-status

Restart your snmpd server, and follow these instructions for configuring nagios.
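
On the nagios side, the check only has to turn the first word of that text into a plugin exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). This is a rough sketch, assuming the modified script output described above. `status_to_exit` is a hypothetical helper, and the snmpget call in the comment assumes SNMP v2c, a community of `public` and a host called `server`, so adjust to taste.

```shell
# status_to_exit "STATUS TEXT"
# Prints the text and returns the Nagios exit code implied by its first word.
status_to_exit() {
    printf '%s\n' "$1"
    case $1 in
        OK*)       return 0 ;;
        WARNING*)  return 1 ;;
        CRITICAL*) return 2 ;;
        *)         return 3 ;;   # empty or unexpected output
    esac
}

# Possible use (host and community are assumptions; the output options
# may need adjusting for your net-snmp version):
#   status_to_exit "$(snmpget -v2c -c public -Oqv server \
#       'NET-SNMP-EXTEND-MIB::nsExtendOutputFull."raid-status"')"
```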

See also: Monitoring Dell SAS 5iR RAID

Outputting from Postgres to CSV

I can never remember how to output to a CSV file from postgres, and end up having to google it time and time again – so I’m making a note of it here mostly for my own use 🙂

\pset format unaligned
\f ','
\o /tmp/moocow.csv
SELECT foo,bar FROM whatever;

If a field has newlines, this will break. You can do something like this instead:

 SELECT foo, bar, '"' || REPLACE(REPLACE(field_with_newline, E'\n', E'\\n'), '"', '""') || '"' FROM whatever;
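
Another option that avoids hand-quoting altogether, assuming your PostgreSQL is recent enough to accept a query inside COPY: let psql’s \copy write the CSV, so that newlines, commas and quotes are escaped for you. This is a sketch only. `export_csv` is a made-up wrapper, and the connection details are expected to come from the usual PG* environment variables.

```shell
# export_csv "SELECT ..." /path/to/out.csv
# Uses psql's \copy so that PostgreSQL does the CSV quoting itself.
# PSQL can be set to another command (handy for dry runs).
export_csv() {
    query=$1
    outfile=$2
    "${PSQL:-psql}" -c "\\copy ($query) TO '$outfile' WITH CSV HEADER"
}
```

For example: `export_csv 'SELECT foo,bar FROM whatever' /tmp/moocow.csv`. Drop `HEADER` if you don’t want the column names on the first line.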

Monitoring Dell SAS 5/iR RAID with nagios

The Dell SAS 5/iR shows up like this under ‘lspci’:

07:08.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01)
        Subsystem: Dell SAS 5/iR Adapter RAID Controller

The status of this RAID card can be read using mpt-status; in Gentoo this package is available as sys-block/mpt-status. Here’s an example of the output:

# mpt-status
ioc0 vol_id 0 type IM, 2 phy, 148 GB, state OPTIMAL, flags ENABLED
ioc0 phy 1 scsi_id 32 ATA      WDC WD1600JS-75N 2E04, 149 GB, state ONLINE, flags NONE
ioc0 phy 0 scsi_id 1 ATA      WDC WD1600JS-75N 2E04, 149 GB, state ONLINE, flags NONE

The latest ‘check_mpt’ script can be found on Nagios Exchange. Download it and put it in your libexec folder; for me on Gentoo it’s ‘/usr/nagios/libexec/’. Open the file, and make sure the ‘use lib’ line points to the correct place.

The script uses sudo to run mpt-status, so you’ll need to modify your /etc/sudoers – adding a line like this:

%nagios ALL=(ALL) NOPASSWD:/usr/sbin/mpt-status

Next, you need to configure nagios; your filenames might be different from the names I use below.

/etc/nagios/commands.cfg : Note, the -c param refers to the number of disks you expect to be active.

define command{
  command_name  check_mpt
  command_line  $USER1$/check_mpt -c 2
}

define service{
  use                  local-service
  host_name            localhost
  service_description  mpt - Dell Raid
  check_command        check_mpt
}

Reload nagios; on Gentoo, it’s /etc/init.d/nagios reload
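
For a feel of what a check like this boils down to, here’s a rough sketch that counts the disks reporting `state ONLINE` in mpt-status output and compares that with the number you expect. This is not the check_mpt code, just an illustration of the idea. `count_online` and `check_disks` are hypothetical names.

```shell
# count_online: reads mpt-status output on stdin and prints the number of
# physical disk lines ("phy N ...") whose state is ONLINE.
count_online() {
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    grep -c ' phy [0-9].*state ONLINE' || true
}

# check_disks EXPECTED: reads mpt-status output on stdin and reports in
# Nagios style (exit 0 = OK, exit 2 = CRITICAL).
check_disks() {
    expected=$1
    online=$(count_online)
    if [ "$online" -eq "$expected" ]; then
        echo "OK $online of $expected disks online"
        return 0
    fi
    echo "CRITICAL $online of $expected disks online"
    return 2
}
```

It would be fed with something like `sudo /usr/sbin/mpt-status | check_disks 2`.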

See also: Monitoring PERC 4e over SNMP with nagios

skinning nagios – nagios doesn't have to be ugly!

Nagios can be pretty, and several people I’ve told this to seemed surprised, so I thought I’d put a quick note here. Here is a nice theme for nagios.

Unfortunately, the underlying UI is still the same horrible interface, but…. this does make a big difference to the aesthetics 🙂

Dell PERC 6/i and RAID monitoring

A few pointers for people trying to get Dell’s PERC 6/i RAID monitoring working under Ubuntu, or any other Linux for that matter. It also applies to the PERC 5/i, and… other stuff 🙂

First, visit Dell’s Linux site. Have a poke about, see what’s there.

Next, we need to download a tool to get information from your array. Download LSI’s MegaRAID CLI tool for linux. It comes as a .RPM, so if you’re an ubuntu user, you can convert it to a .deb using alien, or convert it to a .tar.gz.

# alien --to-tgz MegaCli-1.01.39-0.i386.rpm

You now have a CLI tool you can use to get at all your data! For example:

# ./opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL | grep State
State: Optimal

One thing I spent a while figuring out was how to get the rebuild progress, so here’s how:

# ./opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv [32:1] -aALL

There’s also a really useful cheat sheet for common tasks.

Don’t forget to actually monitor this output with nagios, or your favorite monitoring tool!
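
As a starting point for that, here’s a minimal sketch that turns the `State` lines from the -LDInfo command above into a Nagios-style result. It assumes every logical drive prints one line beginning with `State`, as in the example output. `check_ld_state` is a hypothetical name, so treat it as an illustration rather than a finished plugin.

```shell
# check_ld_state: reads "MegaCli64 -LDInfo -Lall -aALL" output on stdin.
# Exit 0 if every logical drive is Optimal, 2 if any is not, 3 if no
# State lines were found at all.
check_ld_state() {
    states=$(grep '^State' || true)
    if [ -z "$states" ]; then
        echo "UNKNOWN no logical drives found"
        return 3
    fi
    # Count all State lines, then those that do not say Optimal.
    total=$(printf '%s\n' "$states" | grep -c '')
    bad=$(printf '%s\n' "$states" | grep -vc 'Optimal' || true)
    if [ "$bad" -eq 0 ]; then
        echo "OK $total logical drive(s) optimal"
        return 0
    fi
    echo "CRITICAL $bad of $total logical drive(s) not optimal"
    return 2
}
```

Usage would be along the lines of `./opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL | check_ld_state`.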

Trac and googlebot, a crafty trick!

I noticed that Google was going crazy indexing trac for doctrine. Today it downloaded over 90,000 pages, transferring 3 GB of data! It was causing quite a bit of load on the server (not huge amounts, but enough to show in my graphs!)

Eventually, I came up with a nice little trick for reducing the number of hits Google will make against a trac install. Google have extended robots.txt to allow some slightly improved pattern matching. Here’s my snippet; if you don’t understand it, please don’t use it.

User-Agent: Googlebot
Disallow: /*?rev*
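
To make the pattern a bit less mysterious: in Google’s extension, `*` matches any run of characters and `?` is just a literal question mark, so the rule covers any URL whose query string starts with `rev`. The sketch below approximates that with a shell `case` pattern so you can try paths against it. `blocked_by_rule` is a hypothetical helper, and the example paths are made up.

```shell
# blocked_by_rule URL_PATH
# Returns 0 if the path would match "Disallow: /*?rev*", 1 otherwise.
# The backslash makes the shell treat ? as a literal question mark.
blocked_by_rule() {
    case $1 in
        /*\?rev*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

So `blocked_by_rule '/browser/trunk/lib?rev=1234'` succeeds, while `/wiki/WikiStart` is left alone. A URL with `rev` later in the query string (say `?format=raw&rev=1`) is not covered by this rule either.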

resizing an ext3 disk image

Took me a while to figure this out, so thought I’d put it here for others. This is useful for Xen setups, where you use a file for the disk image. AFAIK, you can only grow an image, not shrink it.

# dd if=/dev/zero bs=1M count=1024 >> disk.img
# e2fsck -f disk.img
# resize2fs disk.img
# e2fsck -f disk.img

This makes the disk image bigger (by 1 GB here), checks the file system, resizes it to fill the image, and then checks it again.
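
If you do this often, the four steps can be wrapped up so that a failure stops the sequence. This is a sketch under a few assumptions: the image isn’t in use by a running domain, and e2fsck and resize2fs are on the PATH. `append_zeros` and `grow_ext3_image` are hypothetical names.

```shell
# append_zeros FILE MEGABYTES: grow FILE by appending zero bytes.
# bs=1048576 is 1 MiB (the same as bs=1M with GNU dd).
append_zeros() {
    dd if=/dev/zero bs=1048576 count="$2" 2>/dev/null >> "$1"
}

# grow_ext3_image FILE MEGABYTES: append, check, resize, check again.
grow_ext3_image() {
    img=$1
    append_zeros "$img" "$2" || return 1
    # e2fsck exits 0 when clean and 1 when it fixed errors; anything
    # higher means the file system still needs attention.
    rc=0; e2fsck -f "$img" || rc=$?
    [ "$rc" -le 1 ] || return 1
    resize2fs "$img" || return 1
    rc=0; e2fsck -f "$img" || rc=$?
    [ "$rc" -le 1 ]
}
```

For example, `grow_ext3_image disk.img 1024` does the same as the commands above.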