Tux on VM

Last updated: Sunday July 06, 2008

BMRtool for file-level Bare Metal Restore of SLES 9 zSeries

By John Romanowski, April 2007
Email: john.romanowski at oft.state.ny.us

BMRtool is a bash script I wrote to make preparing for and doing a file-level bare metal restore (BMR) of my z/VM SLES9 Linux servers reliable, quick and easy.

For a file-level BMR, BMRtool automates a tedious problem: before you can start restoring a failed server from file-level backups, you need to have on hand, and accurately enter, all the commands and parameters to load drivers, activate disks, reformat, re-partition, re-create logical volume definitions and re-mkfs its file systems. The problem is worse when (as at my site) you have multiple servers with different numbers and sizes of disks, mixes of DASD and FCP SCSI disk, and LVM-managed disks: each server needs a custom set of commands and parameters.

BMRtool's scan and diskinit modes solve the problem. BMRtool scan examines a server, generates the needed BMR commands and parameters and writes them to a file. Later, at BMR-time, from a rescue system running on the failed server, BMRtool diskinit reads that commands file and runs its commands, leaving the server's empty file systems mounted and ready to be restored from file-level backups. Scan mode generates the restore commands too, and BMRtool restore reads and runs them to restore the server's files. After that, run a chroot-ed mkinitrd and a chroot-ed zipl, then BMRtool umount to unmount the restored file systems, and reboot the restored server.

Since I wrote this tool for my site's needs I didn't code support for every possible disk and file system feature. Below are BMRtool's supported and unsupported configuration features to help you evaluate using it. Read the Caveats and Gotchas section too.

SUPPORTED Configurations:

(review UNSUPPORTED too for possible exceptions)
  • SLES 9 31- or 64-bit on a z/VM guest.
  • ECKD DASD, dasdfmt-ed in compatible disk layout (cdl is the default), blocksize 4096.
  • Partitioned and unpartitioned FBA DASD.
  • Partitioned and unpartitioned FCP SCSI LUNs.
  • Logical Volume Manager (LVM)
  • Multipath-tools
  • Ext2 and ext3 file systems created with all defaults except that BMRtool preserves the reserved block % (-m), maximum mount count (tune2fs -c) and check interval (tune2fs -i). It also preserves a filesystem's mode, uid and gid numbers.
  • mkswap with no options other than -v1. BMRtool generates mkswap commands for the swap partitions it finds.
  • BMRtool expects to find sysfs at /sys and procfs at /proc.
  • For a file-level BMR you need to run a SLES 9 rescue system on the Linux guest with access to the guest's file-level backups. BMRtool scan writes its output files and a copy of itself to /etc/BMR/. The rescue system needs the failed server's recent /etc/BMR/ directory, which it can get from the file-level backups.

UNSUPPORTED Configurations:

(mainly because I didn't add the code to handle these)
  • SLES 9 in an LPAR - I didn't have one to test; the script will probably work. Let me know.
  • ECKD DASD in 'CMS disk layout' or 'Linux disk layout', and the DIAG DASD driver - I didn't code for them.
  • ECKD DASD non-default volser '-l <volser>' - script preserves the volser with dasdfmt but non-interactive fdasd resets volser to default. Script warns about this when found.
  • PAV DASD (Parallel Access Volumes) - I didn't code for it.
  • Extended and logical disk partitions on FBA and SCSI - I didn't code for them.
  • LVM snapshots, dcss block devices, cdrom, tape, etc... - need to filter out and skip them. I didn't have any to code for and test.
  • LABEL= and UUID= mounts - I didn't code for them.
  • Filesystems other than ext2 or ext3 - the script attempts to make them; it generates a mkfs -t <fstype> command and warns it might not work.
  • mdadm, md (multiple device administration), EVMS (Enterprise Volume Manager System) - I didn't code for them.

Using BMRtool to prepare for file-level bare metal restore

Download the BMRtool bash script (chown root: and chmod u+x to make it executable; might need to run it through dos2unix)

BMRtool has five 'modes': scan diskinit restore mount umount
Scan mode generates the commands file. At BMR-time the other modes read the commands file and run their part of it.

BMRtool -h  gives brief usage help about the modes and BMRtool options.

To prepare for the possibility of needing to do a BMR in the future, as root run BMRtool scan on the healthy system. Scan mode examines the server's in-use disk objects (local disk file systems mounted rw, swap partitions, swap volumes, LVM physical volumes) and generates a file named BMRcmds holding all the commands BMRtool's other modes will read and execute on a rescue system at BMR-time. Scan mode writes the BMRcmds file, disk configuration files and a copy of BMRtool to directory /etc/BMR/, which it creates if necessary. Schedule BMRtool scan to run before the server's regular file backups so /etc/BMR/ automatically reflects any disk configuration changes and is available from the latest backup.
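For example, a root crontab entry along these lines keeps /etc/BMR/ current (the 01:30 time, log path and nightly schedule here are assumptions; pick a time shortly before your site's backup window):

```
# Hypothetical root crontab entry: refresh /etc/BMR/ at 01:30 each night,
# before the file-level backups run, so the latest backup always holds a
# current BMRcmds file.  (Schedule and log path are assumptions.)
30 1 * * *  /usr/local/sbin/BMRtool scan >> /var/log/BMRtool-scan.log 2>&1
```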

Scan mode writes these additional files as needed to /etc/BMR/:

  • A copy of BMRtool itself if there is none or it's out-of-date.
  • Partition configuration files, with a filename starting with ptbl-, such as ptbl-0.0.0203 for DASD 203 or ptbl-3600507680181807628000000000002a9 for the LUN with that scsi_id.
  • /etc/lvm/lvm.conf, if LVM is used.
  • LVM vgcfgbackup config files are stored as VG-VolumeGroupName, e.g., VG-rootvgx.
  • /etc/multipath.conf, if multipath-tools is used.

Here's an example of running a BMRtool scan:

dz2mfse1:~ # /usr/local/sbin/BMRtool scan
*** Doing- BMRtool scan      Wed Apr 4 10:55:20 EDT 2007
    BMRdir=/etc/BMR   File BMRcmds=/etc/BMR/BMRcmds

Copying LVM config file /etc/lvm/lvm.conf into /etc/BMR to use at recovery time.
Volume group "vg1" successfully backed up.
Writing commands to /etc/BMR/BMRcmds
Saving a copy of BMRtool in /etc/BMR

*** Done- BMRtool scan      Wed Apr 4 10:55:24 EDT 2007

dz2mfse1:~ # ls -l /etc/BMR
total 108
drwxr-xr-x   2 root root  4096 Apr  4 10:55 .
drwxr-xr-x  65 root root  4096 Apr  4 10:52 ..
-rw-r--r--   1 root root  7092 Apr  4 10:55 BMRcmds
-rwxr--r--   1 root root 54917 Apr  2 14:02 BMRtool
-rw-------   1 root root  4282 Apr  4 10:55 VG-vg1
-rw-r--r--   1 root root 10164 Nov 17  2005 lvm.conf
-rw-r--r--   1 root root    24 Apr  4 10:55 ptbl-0.0.0200
-rw-r--r--   1 root root    11 Apr  4 10:55 ptbl-0.0.0201
-rw-r--r--   1 root root    11 Apr  4 10:55 ptbl-0.0.0202
You can see an example of the generated BMRcmds file.

Using BMRtool to help do a bare metal restore

Later on, in a BMR scenario, you'd boot your rescue system on the failed server, restore the failed server's latest /etc/BMR/ directory from backups to the rescue system's /etc/BMR/, run BMRtool diskinit, and then run BMRtool restore (assuming you're using IBM's TSM, or you have tweaked BMRtool to generate restore commands for your site's file backup utility) or manually enter restore commands. Finish with a chroot-ed mkinitrd and a chroot-ed zipl to make the restored server bootable, then BMRtool umount to unmount the restored file systems, and reboot the restored server. If your server ran a database, some further database restores may be needed after the server's up.

To do a file-level bare metal restore run these commands as root on the rescue system:

dsmc restore /etc/BMR/ -replace=all   # TSM command to restore failed server's
                                      # /etc/BMR to rescue system's /etc/BMR.
/etc/BMR/BMRtool diskinit             # Recreate disk structures ending with failed
                                      # server's empty file systems mounted at /mnt/a.
/etc/BMR/BMRtool restore              # Run the generated TSM restore commands.
mount -t sysfs sys /mnt/a/sys         # Prep for chroot-ed mkinitrd.
mount -t proc proc /mnt/a/proc
chroot /mnt/a mkinitrd
chroot /mnt/a zipl                    # Make restored system bootable.
/etc/BMR/BMRtool umount               # Unmounts it all from /mnt/a.
Logoff/force the z/VM guest and xautolog it to verify the restored server boots up. Then run mkinitrd and zipl on it again to get a clean initial ramdisk that doesn't reference the rescue system's disk addresses.

An example session log from a file-level bare metal restore using those commands is available.

When invoked, each BMR-time mode asks you to confirm you want to run it. The /etc/BMR/BMRcmds file is a text file with a letter code in columns 1-2 indicating to which mode(s) the record belongs. The rest of the record is a command or bash comment. A mode simply reads the BMRcmds file selecting records whose letter code matches one of that mode's codes, displays each selected record and runs its command.

If a command gives a non-zero status code a numbered message reports the status code and the command's name. BMRtool reports the number of records selected and how many commands ended with a non-zero status code. Because the commands are displayed and run quickly it's a good idea to use script to capture a log of what happens in case you need to review it.

You might see a record whose command runs a BMRtool internal function, either $(atBMR ) or Tptbl, to resolve things that aren't known until BMR-time. $(atBMR ) returns a device name such as dasdc2 or sda. Tptbl conditionally partitions a multipath LUN on its first available path and then skips partitioning on the LUN's other paths.
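The record-selection loop described above can be sketched like this (a minimal bash illustration; the letter codes, message format and function name are hypothetical, not BMRtool's actual ones):

```shell
# run_mode CMDSFILE CODE - sketch of how a BMR-time mode might read BMRcmds.
# Records whose 2-column code matches are displayed and run; non-zero exit
# statuses are reported and counted.  (Illustration only, not BMRtool's code.)
run_mode() {
  local cmdsfile=$1 want=$2 rec cmd rc selected=0 failed=0
  while IFS= read -r rec; do
    [ "${rec:0:2}" = "$want" ] || continue   # skip records for other modes
    cmd=${rec:2}
    selected=$((selected+1))
    echo ">>> $cmd"                          # display the record before running
    eval "$cmd"; rc=$?
    if [ $rc -ne 0 ]; then
      failed=$((failed+1))
      echo "*** rc=$rc from: ${cmd%% *}"
    fi
  done < "$cmdsfile"
  echo "$selected record(s) selected, $failed command(s) failed."
}
```

A mode would call something like `run_mode /etc/BMR/BMRcmds 'd '` (again, 'd ' is a made-up code) and you'd wrap the whole session in script(1) to keep a log.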

Messages That Might Occur

During scan mode
  • File descriptor nn left open - Harmless; from a bug in LVM, as best I can tell.
  • fsync failed: Invalid argument - Harmless; LVM.
  • Volume group "abcd" successfully backed up - LVM; vgcfgbackup backed up the metadata, not the contents of the volume group.
  • /dev/wxyz: open failed: No such device or address - Harmless; LVM; an old /dev/ name was in its cache file.
  • Volume group "abcd" is exported - See Caveats and Gotchas topic Exported LVM volume group.
During diskinit mode
  • no config file entry for partition n found... - Harmless fdasd message if n is greater than 1.
  • Couldn't find device with uuid 'nnnnnnn-nnnn-nnnn-nnnn-nnnn-nnnn-nnnnnn' - Harmless LVM pvcreate message.
  • WARNING: Forcing physical volume creation on /dev/mapper/name of volume group "vgname" - Harmless LVM pvcreate message. The PV has prior LVM metadata on it, likely the same metadata the command is restoring from the vgcfgbackup file.
  • fsync failed: Invalid argument - Harmless LVM vgscan message.

Caveats and Gotchas

Rescue system - The SLES 9 rescue system must not conflict with the server being restored: it must not use the same disk addresses or make conflicting use of features like LVM or multipath-tools. For example, the rescue system can't need an LVM volume group name the failed server uses.

Unmounted file systems - scan mode doesn't find unmounted disk file systems or generate mkfs and mount commands for them. You could mount them before you run scan.

Swap partitions in swapoff status - scan mode doesn't find them or generate mkswap commands for them. scan finds swap partitions using 'swapon -s'.
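That discovery step can be sketched roughly as filtering the 'swapon -s' summary for partition entries (an illustration only, not BMRtool's actual code):

```shell
# Sketch: derive mkswap commands from active swap partitions, reading
# 'swapon -s'-style output on stdin.  The header line and swap files have
# a Type column other than "partition", so only partitions get a command.
gen_mkswap_cmds() {
  # Input lines look like: /dev/dasdb1  partition  719896  0  -1
  awk '$2 == "partition" { print "mkswap -v1 " $1 }'
}

# Typical use on a live system:
#   swapon -s | gen_mkswap_cmds
```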

Read-only file system - Local disk file system mounted read-only: what to do with it? I didn't have any to test with, so I took the easy way out and coded to skip them. The BMRcmds file has a comment for each skipped one. But suppose it's on a DASD partition and another of the DASD's partitions is mounted r/w?

Exported LVM volume group - Beware if its Physical Volumes (PV's) are on DASD: the BMRcmds file will dasdfmt the DASD. Maybe I should have generated commented out commands you could choose to uncomment? What if PV dasdb1 is exported and PV dasdb2 isn't?

LVM uninstalled - BMRtool scan does some LVM query commands without first checking that LVM is installed. Using LVM is not required, but if it's not installed expect some missing-command error messages. This should not be a problem for a regular SLES 9 install.

mknod - BMRtool doesn't generate mknod commands for the disks. If your rescue system needs more special files you'll have to do some mknod's manually.

Site hosting courtesy of Velocity Software