Sunday, October 25, 2009

Logging smartd (or any daemon) messages with rsyslog

Rsyslog is an enhanced syslogd for Linux and Unix, and a good alternative to syslog-ng.

If you use smartmontools to monitor your disks with smartd, you may have noticed that your logs fill up with a lot of information about your disks. The first solution is to tell smartd to send its messages to a local facility (local0 to local7 are available) and then, in syslog-ng, write all messages sent to that facility to a file such as smartd.log. It's a good solution, as long as the facility is not used for something else.

With rsyslog you can do better by creating /etc/rsyslog.d/smartd.conf with the following:
:programname, isequal, "smartd" -/var/log/smartd.log
& ~

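According to the rsyslog.conf(5) man page, programname matches the name of the program that sent the message to syslog. The second line ("& ~") then discards the matched messages so that they are not written to the other log files as well. With this configuration, all messages sent by smartd are therefore logged to /var/log/smartd.log only. Easy!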

Reload rsyslog and you're done.

This feature can be extended to other daemons that you want to log into a separate file, for instance ntpd.
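
As a minimal sketch of that extension, the small shell function below prints the same two-line rule for any program name. The function name daemon_rule and the /var/log/<name>.log layout are illustrative assumptions, not something rsyslog provides.

```shell
# Print an rsyslog rule that sends every message from one program to its own
# log file and then discards it, following the smartd example above.
# The function name and the /var/log/<name>.log layout are illustrative.
daemon_rule() {
    name=$1
    printf ':programname, isequal, "%s" -/var/log/%s.log\n' "$name" "$name"
    printf '& ~\n'
}

# Show the rule for ntpd; redirect to /etc/rsyslog.d/ntpd.conf to install it.
daemon_rule ntpd
```

Note that newer rsyslog releases mark the "~" discard action as deprecated in favour of "stop", so the second line may need adapting on a recent system.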

Logging iptables messages with rsyslog

Rsyslog is an enhanced syslogd for Linux and Unix, and a good alternative to syslog-ng.

If you use netfilter's iptables, you may know that one of the available targets is LOG (with a configurable prefix), which sends detailed information to the kernel log (since netfilter runs in the kernel). Rsyslog and syslog-ng regularly read the kernel log and save it to kern.log (among others).

The common issue is: how do you write netfilter messages to a specific log file?
As explained on this good blog, the first solution is to send netfilter messages with a fixed priority (let's say DEBUG, the lowest severity, which has the highest numeric level) and then, in rsyslog/syslog-ng, filter messages with that priority into another log file, e.g. iptables.log. However, every kernel message with that priority, including those not sent by netfilter, ends up in this log file, which makes this solution unacceptable. The second solution is to use rsyslog's filters to match the contents of the message. Assuming netfilter sends messages starting with "iptables: " (using iptables' --log-prefix), we can easily filter these messages with rsyslog and save them to iptables.log.

To do this, just create /etc/rsyslog.d/iptables.conf with the following:
:msg, startswith, "iptables: " -/var/log/iptables.log
& ~
:msg, regex, "^\[ *[0-9]*\.[0-9]*\] iptables: " -/var/log/iptables.log
& ~

According to the rsyslog.conf(5) man page, the first line uses the startswith comparison to match all messages that start with "iptables: " and saves them to /var/log/iptables.log. The second line ("& ~") then discards the message; otherwise it would also be logged in kern.log and the other kernel log files. The third and fourth lines do the same for messages that start with the kernel timestamp. For instance, "[ 123.456] iptables: " would also match.
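
To get a feel for what the regular expression in the third line accepts, here is a rough sketch that runs the same pattern through grep against a few made-up log lines. It assumes a LOG rule along the lines of iptables -A INPUT -j LOG --log-prefix "iptables: ", and it assumes grep's basic regular expressions behave like rsyslog's regex comparison for this pattern. The sample lines and the function name matches_iptables are invented for illustration.

```shell
# Return success if a log line would match the pattern of the third line.
# grep uses POSIX basic regular expressions here, which is an assumption
# about how closely it mirrors rsyslog's "regex" comparison.
matches_iptables() {
    printf '%s\n' "$1" | grep -q '^\[ *[0-9]*\.[0-9]*\] iptables: '
}

# Made-up sample lines: two with a kernel timestamp and the prefix,
# one with the bare prefix, one unrelated kernel message.
for line in \
    '[  123.456] iptables: IN=eth0 OUT= SRC=192.0.2.1' \
    '[12345.678901] iptables: IN=eth0 OUT= SRC=192.0.2.1' \
    'iptables: IN=eth0 OUT= SRC=192.0.2.1' \
    '[  123.456] usb 1-1: new high speed USB device'
do
    if matches_iptables "$line"; then
        echo "match:    $line"
    else
        echo "no match: $line"
    fi
done
```

The two timestamped iptables lines match. The bare "iptables: " line does not, which is why the startswith rule is still needed alongside the regex one.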

Reload rsyslog and you're done.

Thursday, October 22, 2009

IPv6 NDP Proxy

A few days ago, I wrote an article about IPv6 NDP Proxy for a blog that I love to read. The problem discussed is that some routers that provide you with an IPv6 prefix expect to see all the IPv6 addresses on the same L2 segment, so you can't use routing directly.

This problem can be encountered at OVH (a European dedicated server provider) if you want to set up virtual machines with IPv6 routing, or if you have a Freebox (the set-top box provided by the ISP Free.fr) and want to use your own router behind it.

In the article, I explain a solution based on the NDP Proxy included in the Linux kernel, with the example of a dedicated server at OVH.

Read the article (French)

Reducing the number of devices in a RAID-5

(This blog entry has already been published in French on Nibbles' microblog)

Since 2006, Linux software RAID has allowed growing a RAID-5 volume by adding new devices, which in turn allows growing the filesystem on top of it to get more free space. The opposite operation was missing: shrinking a RAID-5 volume by removing devices, assuming the filesystem on it has already been reduced (otherwise data would be lost).
This summer, Neil Brown developed this feature among others, as he announced on his blog.

To be able to use it, you will need:
  • Linux kernel >= 2.6.31
  • mdadm >= 3.1, not yet announced but available in Neil Brown's personal git repository in branch devel-3.1
I suggest you try this new feature with LVM volumes.

We create 4 logical volumes and use them to create a RAID-5 volume. You can think of these logical volumes as real disks.
# lvcreate -n test1 -L 4M vg1
# lvcreate -n test2 -L 4M vg1
# lvcreate -n test3 -L 4M vg1
# lvcreate -n test4 -L 4M vg1
# mdadm -C /dev/md9 --level=5 --raid-devices=4 /dev/vg1/test1 /dev/vg1/test2 /dev/vg1/test3 /dev/vg1/test4
# mdadm -w /dev/md9
# cat /proc/mdstat
[...]
md9 : active raid5 dm-9[0] dm-12[3] dm-11[2] dm-10[1]
12096 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
[...]


Then, set up a filesystem on that RAID-5 volume.
# mkfs.ext3 /dev/md9

Now imagine the filesystem is used normally, but one day we need to remove one of these logical volumes (think of a disk), and we don't want the RAID-5 to be degraded. If the filesystem has enough free space, we can reduce it and benefit from this new feature.

Before starting, check that the Linux kernel is >= 2.6.31.
$ uname -a
Linux x5452.stalkr.net 2.6.31.2 #1 SMP Tue Oct 6 15:28:59 CEST 2009 i686 GNU/Linux
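
If you would rather not compare version numbers by eye, the check can be scripted. The following is only a sketch: it assumes a sort that supports version ordering with -V (GNU coreutils does), and the function name version_ge is made up for the example.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Strip any suffix such as "-amd64" before comparing.
running=$(uname -r | sed 's/[^0-9.].*//')
if version_ge "$running" 2.6.31; then
    echo "kernel $running: RAID-5 reduction should be available"
else
    echo "kernel $running: too old, 2.6.31 or later is needed"
fi
```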


Reduce the filesystem on the RAID-5.
# resize2fs -M /dev/md9

The mdadm version featuring RAID-5 volume reduction hasn't been announced yet, so we first get the branch devel-3.1 of mdadm from Neil Brown's personal git repository.
$ git clone git://neil.brown.name/mdadm /tmp/mdadm
Initialized empty Git repository in /tmp/mdadm/.git/
remote: Counting objects: 5943, done.
remote: Compressing objects: 100% (3525/3525), done.
remote: Total 5943 (delta 4540), reused 3118 (delta 2412)
Receiving objects: 100% (5943/5943), 1.71 MiB | 95 KiB/s, done.
Resolving deltas: 100% (4540/4540), done.
$ cd /tmp/mdadm
$ git branch --track devel-3.1 origin/devel-3.1
Branch devel-3.1 set up to track remote branch refs/remotes/origin/devel-3.1.
$ git checkout devel-3.1
Switched to branch "devel-3.1"
$ make
[...]


The new option --array-size shows up in the help:
# ./mdadm --grow --help
Usage: mdadm --grow device options
[...]
--array-size= -Z : Change visible size of array. This does not change
: any data on the device, and is not stable across restarts.


Size of the array with 4 logical volumes in RAID-5: (4-1)*(device size) = 3*4M = 12M
# ./mdadm -Q -D /dev/md9
[...]
Array Size : 12096 (11.81 MiB 12.39 MB)
[...]


Reduce the size of the array to the size corresponding to 3 logical volumes in RAID-5: (3-1)*(device size) = 2*4M = 8M
According to the help text above, this reduction only changes the visible size and does not survive a restart; it stays in place just long enough for us to trigger the actual reduction.
# ./mdadm /dev/md9 --grow --array-size=8064
# ./mdadm -Q -D /dev/md9
[...]
Array Size : 8064 (7.88 MiB 8.26 MB)
[...]
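
The exact figures passed to --array-size follow the same (n-1)*(device size) rule. As a small worked sketch: the 4032 KiB per member used here is inferred by dividing the 12096 KiB reported by mdadm by 3, it is not a value mdadm printed directly, and the function name raid5_size is illustrative.

```shell
# Usable RAID-5 capacity: one member's worth of space goes to parity.
# Arguments: number of devices, usable size per device (same unit as result).
raid5_size() {
    echo $(( ($1 - 1) * $2 ))
}

# 4032 KiB per member is inferred from the mdadm output above
# (12096 KiB / 3), not a value the tool printed directly.
per_device=4032
echo "4 devices: $(raid5_size 4 "$per_device") KiB"
echo "3 devices: $(raid5_size 3 "$per_device") KiB"
```

This gives 12096 KiB for 4 devices and 8064 KiB for 3, matching the values shown by mdadm -Q -D before and after the change.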


Now we trigger the actual reduction of the RAID-5 by reducing the number of devices.
Just like in a grow operation, mdadm needs a file to back up the critical section.
# ./mdadm /dev/md9 --grow --raid-devices=3 --backup-file=/tmp/backup
mdadm: Need to backup 384K of critical section..


The RAID-5 volume now spans 3 logical volumes, plus 1 logical volume as a hot spare.
# cat /proc/mdstat
[...]
md9 : active raid5 dm-9[0] dm-12[3](S) dm-11[2] dm-10[1]
8064 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
[...]


Win! We can now remove the hot spare if it is needed elsewhere.

Finally, since we reduced the filesystem to its minimum size, we resize it to take all the available space.
# resize2fs /dev/md9

Big thanks go to Neil Brown and the other contributors of the Linux RAID project for this very useful feature.

SATA2 and linux software RAID performance

A server I have to manage has 9 SATA II disks of 1TB each. Six of them are plugged directly into an ASRock motherboard, and three of them into a Promise PCI card providing 4 SATA II ports. All disks are partitioned the same way: 2 partitions, a small one (1GB) for a RAID-1 and the rest for a RAID-5. I am therefore not using dedicated hardware but the great software RAID implemented in the Linux kernel.

I was wondering how these disks and the RAID volumes perform, so I ran a very simple test with hdparm. First, here is the hddtemp output so you can see the disk models and their operating temperatures.
# hddtemp /dev/sd?
/dev/sda: SAMSUNG HD103UJ: 18°C
/dev/sdb: Hitachi HDT721010SLA360: 28°C
/dev/sdc: ST31000340AS: 25°C
/dev/sdd: WDC WD10EACS-00D6B0: 37°C
/dev/sde: MAXTOR STM31000340AS: 38°C
/dev/sdf: ST31000340AS: 38°C
/dev/sdg: Hitachi HDT721010SLA360: 39°C
/dev/sdh: SAMSUNG HD103UJ: 25°C
/dev/sdi: ST31000528AS: 30°C

# hdparm -tT /dev/sd? /dev/md?

/dev/sda:
Timing cached reads: 4956 MB in 2.00 seconds = 2478.45 MB/sec
Timing buffered disk reads: 278 MB in 3.00 seconds = 92.62 MB/sec

/dev/sdb:
Timing cached reads: 4912 MB in 2.00 seconds = 2456.58 MB/sec
Timing buffered disk reads: 290 MB in 3.01 seconds = 96.24 MB/sec

/dev/sdc:
Timing cached reads: 4938 MB in 2.00 seconds = 2469.78 MB/sec
Timing buffered disk reads: 250 MB in 3.00 seconds = 83.29 MB/sec

/dev/sdd:
Timing cached reads: 4924 MB in 2.00 seconds = 2462.42 MB/sec
Timing buffered disk reads: 252 MB in 3.01 seconds = 83.59 MB/sec

/dev/sde:
Timing cached reads: 4798 MB in 2.00 seconds = 2400.08 MB/sec
Timing buffered disk reads: 306 MB in 3.01 seconds = 101.79 MB/sec

/dev/sdf:
Timing cached reads: 4970 MB in 2.00 seconds = 2486.40 MB/sec
Timing buffered disk reads: 308 MB in 3.01 seconds = 102.33 MB/sec

/dev/sdg:
Timing cached reads: 4914 MB in 2.00 seconds = 2458.28 MB/sec
Timing buffered disk reads: 324 MB in 3.00 seconds = 107.92 MB/sec

/dev/sdh:
Timing cached reads: 4964 MB in 2.00 seconds = 2482.84 MB/sec
Timing buffered disk reads: 334 MB in 3.02 seconds = 110.78 MB/sec

/dev/sdi:
Timing cached reads: 4908 MB in 2.00 seconds = 2454.37 MB/sec
Timing buffered disk reads: 322 MB in 3.03 seconds = 106.30 MB/sec

/dev/md1:
Timing cached reads: 4004 MB in 2.00 seconds = 2002.08 MB/sec
Timing buffered disk reads: 236 MB in 3.01 seconds = 78.28 MB/sec

/dev/md5:
Timing cached reads: 4530 MB in 2.00 seconds = 2265.74 MB/sec
Timing buffered disk reads: 670 MB in 3.00 seconds = 223.07 MB/sec

Besides the impressive cached reads, you can see how the disks perform. On their own, they reach about 100MB/s, or 800Mbit/s, which is not so bad compared to SATA II's 3Gbit/s. RAID-5 reads more than twice as fast as a single disk (I already knew it was better, but not by so much!), while RAID-1 is about 20% slower (which I didn't expect).

Funny thing, did you notice how temperature influenced the disk performance? The coldest disk (18°C) seems to be slower than the hottest one (39°C).
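
To put a number on "about 100MB/s", the buffered read rates can be averaged straight from the hdparm output. This is a rough sketch, assuming the result lines keep the format shown above (rate in the second-to-last field). The function name avg_buffered is made up for the example.

```shell
# Average the "buffered disk reads" rates (MB/sec) found on stdin.
avg_buffered() {
    awk '/buffered disk reads/ { sum += $(NF - 1); n++ }
         END { if (n) printf "%.2f\n", sum / n }'
}

# Two result lines copied from the output above, as sample input.
avg_buffered <<'EOF'
 Timing buffered disk reads: 278 MB in 3.00 seconds = 92.62 MB/sec
 Timing buffered disk reads: 290 MB in 3.01 seconds = 96.24 MB/sec
EOF
```

On the real system, something like hdparm -tT /dev/sd? | avg_buffered would cover all nine disks at once; with the figures above, their average comes to roughly 98 MB/s.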

Welcome

I finally decided to start a blog, in English to reach a broader audience. I don't know exactly which topics will be discussed, but I guess mainly technical stuff about computers, system and network administration, security, Linux...

With this blog, I want to thank all the blogs I read for everything I learned.
I expect nothing in return, so please do the same and nobody will be disappointed. ;)