Monitoring Xen via SNMP

I wanted to monitor disk I/O and CPU usage for Xen domains without running SNMP on each one. I couldn’t find anything that does this out of the box (am I really the first to want this? Surely there’s something?!), so here’s my own way of doing it. It involves hooking a script into your dom0’s SNMP server, and then monitoring it with whatever tool you like. I have a download at the end of the post for Cacti.

This article has been updated: Monitoring Xen via SNMP – update


Monitoring Dell Poweredge 2850 RAID status over SNMP

This took me a while to figure out, so I thought I’d quickly document it.

In our PE2850, we have a ‘Dell PowerEdge Expandable RAID Controller 4e/Si’. To check the status of the disks, you’ll need to fetch megarc from LSI. Download it onto the server with the RAID card, and also download check_lsi_megaraid. You’ll need to modify check_lsi_megaraid slightly: it prints prefixes like ‘RAID OK:’ and ‘RAID WARNING:’; change these to just ‘OK’ and ‘WARNING’, and update the other statuses in the same way. Notice there’s no colon in my version.

This script takes about 3 seconds to run, and should produce the following output.

I didn’t want SNMP to block whilst waiting for this script to run, so I used cron to run it every minute and write its output to a temp file. In /etc/crontab I have this:
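The original crontab entry isn’t reproduced here, so the following is only a minimal sketch of the caching idea. The function name `cache_check`, the file `/tmp/raid_status` and the script paths are all assumptions for illustration. It runs the slow check, writes the output to a temporary file and then renames it, so that snmpd never reads a half-written result.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): run a slow check and
# cache its output in a file that snmpd can read instantly.
# Usage: cache_check OUTPUT_FILE COMMAND [ARGS...]
cache_check() {
    out_file=$1
    shift
    # Write next to the target, then rename, so readers never see a
    # partially written file.
    tmp_file=$(mktemp "${out_file}.XXXXXX") || return 1
    # Keep the output even if the check exits non-zero (WARNING/CRITICAL).
    "$@" > "$tmp_file" 2>&1
    # mktemp creates the file readable by its owner only; open it up so
    # snmpd (possibly running as another user) can read it.
    chmod 644 "$tmp_file"
    mv "$tmp_file" "$out_file"
}

# Example /etc/crontab line (assumed paths; assumes this function lives in
# a script that ends with: cache_check "$@"):
# * * * * * root /usr/local/bin/cache_check.sh /tmp/raid_status /usr/local/bin/check_lsi_megaraid
```

The rename is the important part: on the same filesystem it is atomic, so a poll that arrives mid-run still sees the previous complete result.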

In snmpd.conf, I then put this:

Restart your snmpd server, and follow these instructions for configuring nagios.

See also: Monitoring Dell SAS 5iR RAID

Outputting from Postgres to CSV

I can never remember how to output to a CSV file from postgres, and end up having to google it time and time again – so I’m making a note of it here mostly for my own use 🙂

If a field contains newlines, this will break. You can do something like this instead…
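The original commands aren’t shown above, so here is one possible approach rather than the exact one from the post: psql’s `\copy` in CSV mode, which quotes fields containing commas, quotes or newlines. The function name `export_csv` and its arguments are made up for this sketch.

```shell
# Hypothetical helper: export a query result as CSV using psql's \copy.
# Usage: export_csv DATABASE "SELECT ..." OUTPUT_FILE
# The query must be a single line, because \copy is a one-line psql
# meta-command.
export_csv() {
    db=$1
    query=$2
    out_file=$3
    # \copy runs client-side, so the file is written on the machine where
    # psql runs, with the permissions of the current user.
    psql -d "$db" -c "\\copy ($query) TO '$out_file' WITH CSV HEADER"
}

# Example (assumed database and table names):
# export_csv mydb "SELECT id, notes FROM tickets" /tmp/tickets.csv
```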

Monitoring Dell SAS 5/iR RAID with nagios

The Dell PERC/5 shows like this under ‘lspci’

The status of this RAID card can be read using mpt-status; in Gentoo this package is available as sys-block/mpt-status. Here’s an example of the output:

The latest ‘check_mpt’ script can be found on Nagios Exchange. Download it and put it in your libexec folder; for me on Gentoo it’s ‘/usr/nagios/libexec/’. Open the file, and make sure the ‘use lib’ line points to the correct place.

The script uses sudo to run mpt-status, so you’ll need to modify your /etc/sudoers – adding a line like this:

Next, you need to configure nagios; your filenames might be different from the names I use below.

/etc/nagios/commands.cfg : Note, the -c param refers to the number of disks you expect to be active.


Reload nagios; on Gentoo, it’s /etc/init.d/nagios reload

See also: Monitoring PERC 4e over SNMP with nagios

skinning nagios – nagios doesn't have to be ugly!

Nagios can be pretty, and several people I’ve told this to seemed surprised, so I thought I’d put a quick note here. Here is a nice theme for nagios…

Unfortunately, the underlying UI is still the same horrible interface, but…. this does make a big difference to the aesthetics 🙂

Dell PERC 6/i and RAID monitoring

A few pointers for people trying to get Dell’s PERC 6/i RAID monitoring working under Ubuntu, or any other Linux for that matter. It also applies to the PERC 5/i, and… other stuff 🙂

First, visit Dell’s Linux site. Have a poke about, see what’s there.

Next, we need to download a tool to get information from your array. Download LSI’s MegaRAID CLI tool for Linux. It comes as an .rpm, so if you’re an Ubuntu user, you can convert it to a .deb using alien, or convert it to a .tar.gz.

# alien --to-tgz MegaCli-1.01.39-0.i386.rpm

You now have a CLI tool you can use to get all your data. For example:

# ./opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL | grep State
State: Optimal

One thing I spent a while figuring out was how to get the rebuild progress, so here’s how:

# ./opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv [32:1] -aALL

There’s also a really useful cheat sheet for common tasks

Don’t forget to actually monitor this output with nagios, or your favorite monitoring tool!
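As a rough illustration of that last step, here is a minimal sketch of a Nagios-style check built on the `-LDInfo` output shown earlier. The function name `check_ld_state` is hypothetical, and treating anything other than "Optimal" as critical is my assumption. It follows the standard Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN).

```shell
# Hypothetical Nagios-style check: reads MegaCli -LDInfo output on stdin
# and reports CRITICAL unless every logical drive is "Optimal".
check_ld_state() {
    # Keep only the value after the colon, e.g. "State: Optimal" -> "Optimal".
    states=$(grep '^State' | sed -e 's/.*: *//' -e 's/ *$//')
    if [ -z "$states" ]; then
        echo "UNKNOWN: no logical drive state found"
        return 3
    fi
    # Count the drives whose state is anything other than Optimal.
    bad=$(printf '%s\n' "$states" | grep -v -c '^Optimal$' || true)
    if [ "$bad" -gt 0 ]; then
        echo "CRITICAL: $bad logical drive(s) not Optimal"
        return 2
    fi
    echo "OK: all logical drives Optimal"
    return 0
}

# Example, using the path from the extracted package above:
# ./opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL | check_ld_state
```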

Trac and googlebot, a crafty trick!

I noticed that Google was going crazy indexing trac for doctrine. Today it downloaded over 90,000 pages, transferring 3 GB of data! It was causing quite a bit of load on the server (not huge amounts, but enough to show in my graphs!)

Eventually, I came up with a nice little trick for reducing the number of hits Google will make against a trac install. Google have extended robots.txt to allow some slightly improved pattern matching. Here’s my snippet; if you don’t understand it, please don’t use it.
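The original snippet isn’t reproduced here. To show what the extended pattern matching looks like, here is an illustrative robots.txt wrapped in a small shell function. The function name `print_trac_robots` is made up, and the paths are assumptions about a typical trac URL layout, not the rules from the post, so adapt them to your own install before using anything like this.

```shell
# Print an illustrative robots.txt. The paths below are assumptions about a
# typical trac URL layout, not the original snippet.
print_trac_robots() {
    cat <<'EOF'
User-agent: Googlebot
# "*" matches any run of characters (a Google extension to robots.txt)
Disallow: /*/changeset
Disallow: /*/browser
Disallow: /*/log
# block alternate-format and per-revision views of any page
Disallow: /*?format=
Disallow: /*?rev=
EOF
}

# Example: print_trac_robots > /path/to/webroot/robots.txt
```

The idea is to keep the wiki and ticket pages crawlable while cutting out the near-infinite space of changeset, source browser and per-revision URLs.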

resizing an ext3 disk image

Took me a while to figure this out, so thought I’d put it here for others. This is useful for Xen setups, where you use a file for the disk image. AFAIK, you can only grow an image, not shrink it.

This makes the disk image bigger, checks the image, resizes the file system, and then checks it again.
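The four steps described above can be sketched as follows. This is a minimal sketch, assuming GNU dd and e2fsprogs, and an image that is not mounted or in use by a running domain. The function name `grow_ext3_image` and the example path are hypothetical.

```shell
# Hypothetical helper: grow an ext3 image file by EXTRA_MB megabytes.
# Usage: grow_ext3_image IMAGE_FILE EXTRA_MB
# The image must not be mounted or in use by a running domU.
grow_ext3_image() {
    img=$1
    extra_mb=$2
    # 1. Make the image file bigger by appending zeros to it.
    dd if=/dev/zero bs=1M count="$extra_mb" >> "$img" || return 1
    # 2. Force a filesystem check; stop here unless it comes back clean.
    e2fsck -f "$img" || return 1
    # 3. Grow the filesystem to fill the enlarged image.
    resize2fs "$img" || return 1
    # 4. Check it again.
    e2fsck -f "$img"
}

# Example (assumed path), adding 1 GB:
# grow_ext3_image /var/xen/domU-disk.img 1024
```

Take a copy of the image first if you can; a failed resize on the only copy is not fun.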