Last updated on:
Sunday July 06, 2008
BMRtool for file-level Bare Metal Restore of SLES 9 zSeries
By John Romanowski, April 2007
Email: john.romanowski at oft.state.ny.us
BMRtool is a bash script I wrote to make preparing for and doing a file-level bare metal restore (BMR) of my z/VM SLES9 Linux servers reliable, quick and easy.
Before you can start a file-level BMR you need to have on hand, and accurately enter, the commands and parameters to load drivers, activate disks, reformat, re-partition, re-create logical volume definitions and re-mkfs a failed server's file systems. BMRtool is an automated solution to that problem. The problem is worse when, as at my site, you have multiple servers with different numbers and sizes of disks, mixes of DASD and FCP SCSI disk, and LVM-managed disks: each server needs a custom set of commands and parameters.
BMRtool's scan and diskinit modes solve the problem. BMRtool scan scans a server, generates the needed BMR commands and parameters and writes them to a file. Later on, at BMR-time from a rescue system running on the failed server, BMRtool diskinit reads that commands file and runs the commands, resulting in the server's empty file systems mounted and ready to be restored from file-level backups. BMRtool's scan generates the restore commands too, and BMRtool restore reads and runs them to restore the server's files. After that, run a chroot-ed mkinitrd and chroot-ed zipl, then BMRtool umount to unmount the restored file systems, and reboot the restored server.
Since I wrote this tool for my site's needs I didn't code support for every possible disk and file system feature. Below are BMRtool's supported and unsupported configuration features to help you evaluate using it. Read the Caveats and Gotchas section too.
SUPPORTED Configurations: (review UNSUPPORTED too for possible exceptions)
UNSUPPORTED Configurations: (mainly because I didn't add the code to handle these)
Using BMRtool to prepare for file-level bare metal restore
Download the BMRtool bash script (chown root: and chmod u+x to make it executable; might need to run it through dos2unix)
BMRtool has five 'modes': scan, diskinit, restore, mount and umount.
BMRtool -h gives brief usage help about the modes and BMRtool options.
To prepare for the possibility of needing to do a BMR in the future, run BMRtool scan as root on the healthy system. Scan mode examines the server's in-use disk objects (local disk file systems mounted rw; swap partitions; swap volumes; LVM physical volumes) and generates a file named BMRcmds holding all the commands BMRtool's other modes will read and execute on a rescue system at BMR-time. Scan mode writes the BMRcmds file, disk configuration files and a copy of BMRtool to directory /etc/BMR/, which it creates if necessary. Schedule BMRtool scan to run before the server's regular file backups so /etc/BMR/ automatically reflects any disk configuration changes and is available from the latest backup.
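One way to make sure scan always runs ahead of the file backup is a small wrapper that the scheduler calls instead of the backup command. This is only a sketch: the function name scan_then_backup is made up for illustration, the BMRtool path is the one from the example session on this page, and "dsmc incremental" stands in for whatever backup command your site actually schedules.

```shell
# Assumed locations and commands; adjust for your site.
BMRTOOL=${BMRTOOL:-/usr/local/sbin/BMRtool}     # path as used in the example session
BACKUP_CMD=${BACKUP_CMD:-"dsmc incremental"}    # assumption: a TSM incremental backup

# Hypothetical wrapper: refresh /etc/BMR, then run the regular backup.
scan_then_backup() {
    local scan_rc=0
    "$BMRTOOL" scan || scan_rc=$?
    if [ "$scan_rc" -ne 0 ]; then
        echo "warning: BMRtool scan exited with status $scan_rc" >&2
    fi
    # Run the backup even if scan failed, so file backups are never skipped.
    $BACKUP_CMD || return $?
    return "$scan_rc"
}
```

Calling scan_then_backup from root's crontab (or from your backup scheduler) in place of the bare backup command keeps /etc/BMR/ current in every backup. A failed scan is reported but does not stop the backup.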
Scan mode writes these additional files as needed to /etc/BMR/:
Here's an example of running a BMRtool scan:
dz2mfse1:~ # /usr/local/sbin/BMRtool scan
*** Doing- BMRtool scan Wed Apr 4 10:55:20 EDT 2007
BMRdir=/etc/BMR File BMRcmds=/etc/BMR/BMRcmds
Copying LVM config file /etc/lvm/lvm.conf into /etc/BMR to use at recovery time.
Volume group "vg1" successfully backed up.
Writing commands to /etc/BMR/BMRcmds
Saving a copy of BMRtool in /etc/BMR
*** Done- BMRtool scan Wed Apr 4 10:55:24 EDT 2007
dz2mfse1:~ # ls -l /etc/BMR
total 108
drwxr-xr-x 2 root root 4096 Apr 4 10:55 .
drwxr-xr-x 65 root root 4096 Apr 4 10:52 ..
-rw-r--r-- 1 root root 7092 Apr 4 10:55 BMRcmds
-rwxr--r-- 1 root root 54917 Apr 2 14:02 BMRtool
-rw------- 1 root root 4282 Apr 4 10:55 VG-vg1
-rw-r--r-- 1 root root 10164 Nov 17 2005 lvm.conf
-rw-r--r-- 1 root root 24 Apr 4 10:55 ptbl-0.0.0200
-rw-r--r-- 1 root root 11 Apr 4 10:55 ptbl-0.0.0201
-rw-r--r-- 1 root root 11 Apr 4 10:55 ptbl-0.0.0202

You can see an example of the generated BMRcmds file.
Using BMRtool to help do a bare metal restore
Later on, in a BMR scenario you'd boot your rescue system on the failed server, restore the failed server's latest /etc/BMR/ directory from backups to the rescue system's /etc/BMR/, run BMRtool diskinit and then (assuming you're using IBM's TSM or you have tweaked BMRtool to generate restore commands for your site's file backup utility) run BMRtool restore or manually enter restore commands. Finish with a chroot-ed mkinitrd and chroot-ed zipl to make the restored server bootable, then BMRtool umount to unmount the restored file systems, and reboot the restored server. If your server ran a database then some further database restores may be needed after the server's up.
To do a file-level bare metal restore run these commands as root on the rescue system:
dsmc restore /etc/BMR/ -replace=all   # TSM command to restore failed server's
                                      # /etc/BMR to rescue system's /etc/BMR.
/etc/BMR/BMRtool diskinit             # Recreate disk structures ending with failed
                                      # server's empty file systems mounted at /mnt/a.
/etc/BMR/BMRtool restore              # Run the generated TSM restore commands.
mount -t sysfs sys /mnt/a/sys         # Prep for chroot-ed mkinitrd.
mount -t proc proc /mnt/a/proc
chroot /mnt/a mkinitrd
chroot /mnt/a zipl                    # Make restored system bootable.
/etc/BMR/BMRtool umount               # Unmounts it all from /mnt/a.

Logoff/force the z/VM guest and xautolog it to verify the restored server boots up. Then run mkinitrd and zipl on it again to get a clean initial ramdisk that doesn't reference the rescue system's disk addresses.
An example session log from a file-level bare metal restore using those commands is available.
When invoked, each BMR-time mode asks you to confirm you want to run it. The /etc/BMR/BMRcmds file is a text file with a letter code in columns 1-2 indicating to which mode(s) the record belongs. The rest of the record is a command or bash comment. A mode simply reads the BMRcmds file selecting records whose letter code matches one of that mode's codes, displays each selected record and runs its command. If a command gives a non-zero status code a numbered message reports the status code and the command's name. BMRtool reports the number of records selected and how many commands ended with a non-zero status code. Because the commands are displayed and run quickly it's a good idea to use script to capture a log of what happens in case you need to review it. You might see a record whose command runs a BMRtool internal function, either $(atBMR ) or Tptbl, to resolve things that aren't known until BMR-time. $(atBMR ) returns a device name such as dasdc2 or sda. Tptbl conditionally partitions a multipath LUN on its first available path and then skips partitioning on the LUN's other paths.
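The select-display-run loop described above can be sketched roughly as follows. This is not BMRtool's actual code. The function name run_records, the message wording, and the record layout (a single letter in column 1, a blank in column 2, the command from column 3 on) are all assumptions made for illustration, and the letter codes in the usage note are hypothetical.

```shell
# run_records FILE CODES
#   FILE  - commands file in the assumed "letter, blank, command" layout
#   CODES - the letters this mode selects, e.g. "D" (hypothetical codes)
run_records() {
    local file=$1 codes=$2
    local line rest code cmd rc selected=0 failed=0
    # Read the file on descriptor 3 so the commands keep their own stdin.
    while IFS= read -r line <&3; do
        rest=${line#?}            # the line without its first character
        code=${line%"$rest"}      # column 1: the letter code
        cmd=${rest#?}             # column 3 onward: a command or a bash comment
        case "$code" in
            ''|' ') continue ;;   # no code on this line
        esac
        case "$codes" in
            *"$code"*) ;;         # record belongs to this mode
            *) continue ;;
        esac
        selected=$((selected + 1))
        printf '%s\n' "$line"     # display the record before running it
        rc=0
        eval "$cmd" || rc=$?
        if [ "$rc" -ne 0 ]; then
            failed=$((failed + 1))
            echo "msg $failed: status $rc from command: ${cmd%% *}"
        fi
    done 3< "$file"
    echo "$selected records selected, $failed with non-zero status"
}
```

With a file holding the records "D echo one", "R echo two" and "D false", run_records FILE D displays and runs only the two D records and reports one non-zero status. A record that is just a bash comment is displayed and counts as a success, because eval of a comment does nothing.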
Messages That Might Occur
During scan mode
Caveats and Gotchas
Rescue system - The SLES 9 rescue system must not conflict with the server being restored, either by using the same disk addresses or by conflicting use of features like LVM or multipath-tools. For example, the rescue system can't need an LVM volume group name that the failed server uses.
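A quick check for the volume group name clash might look like the sketch below. It assumes that each VG-name file in /etc/BMR/ is named after a volume group of the failed server, which is only an inference from the VG-vg1 file in the example listing on this page. The function name vg_name_conflicts is made up.

```shell
# vg_name_conflicts BMRDIR
#   stdin: the rescue system's own volume group names, one per line
#   prints each name that also appears as a VG-<name> file in BMRDIR
vg_name_conflicts() {
    local bmrdir=$1 f name names
    names=$(awk '{ print $1 }')          # trim the padding LVM tools add
    for f in "$bmrdir"/VG-*; do
        [ -e "$f" ] || continue          # no VG-* files at all
        name=${f##*/VG-}
        if printf '%s\n' "$names" | grep -qxF -- "$name"; then
            echo "$name"
        fi
    done
    return 0
}
```

On the rescue system, after restoring /etc/BMR/, something like "vgs --noheadings -o vg_name | vg_name_conflicts /etc/BMR" would list the clashing names. No output means no clash was found by this check.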
Unmounted file systems - scan mode doesn't find unmounted disk file systems or generate mkfs and mount commands for them. You could mount them before you run scan.
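To spot such file systems before running scan, you could compare /etc/fstab with what is currently mounted. The sketch below is not part of BMRtool and the function name is made up. It lists /dev entries in an fstab-format file whose mount point does not appear in a /proc/mounts-format file, and ignores swap entries.

```shell
# unmounted_fstab_entries FSTAB MOUNTS
#   FSTAB  - fstab-format file, e.g. /etc/fstab
#   MOUNTS - /proc/mounts-format file (must not be empty)
unmounted_fstab_entries() {
    awk '
        NR == FNR { mounted[$2] = 1; next }   # pass 1: remember active mount points
        NF < 3 || $1 ~ /^#/ { next }          # skip blank lines and comments
        $1 ~ /^\/dev\// && $3 != "swap" && !($2 in mounted) { print $1, $2 }
    ' "$2" "$1"
}
```

Running "unmounted_fstab_entries /etc/fstab /proc/mounts" prints one "device mountpoint" pair per file system that scan would miss. Entries you deliberately leave unmounted (noauto and the like) show up too, so treat the output as a reminder rather than an error list.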
Swap partitions in swapoff status - scan mode doesn't find them or generate mkswap commands for them. scan finds swap partitions using 'swapon -s'.
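To see which swap partitions a 'swapon -s' based scan can find, you can filter that command's output yourself. This is a sketch with a made-up function name, not BMRtool's own parsing. It assumes the usual layout of a header line followed by one line per swap area, with the type in the second column.

```shell
# active_swap_partitions
#   stdin: output of `swapon -s`
#   prints the device name of every active swap area of type "partition"
active_swap_partitions() {
    awk 'NR > 1 && $2 == "partition" { print $1 }'
}
```

Use it as "swapon -s | active_swap_partitions". Any swap partition listed in /etc/fstab but missing from this output is in swapoff status and would need a swapon before running scan.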
Read-only file system - Local disk file system mounted read-only: what to do with it? I didn't have any to test with, so I took the easy way out and coded to skip them. The BMRcmds file has a comment for each skipped one. But suppose it's on a DASD partition and another of the DASD's partitions is mounted r/w?
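If you want to know ahead of time which file systems would be skipped for this reason, a filter over /proc/mounts along these lines would show them. This is a sketch with a made-up function name. It treats any /dev device whose option list contains a bare "ro" as read-only.

```shell
# readonly_local_mounts [MOUNTS]
#   MOUNTS - /proc/mounts-format file (defaults to /proc/mounts)
#   prints "device mountpoint" for each /dev device mounted read-only
readonly_local_mounts() {
    awk '$1 ~ /^\/dev\// && $4 ~ /(^|,)ro(,|$)/ { print $1, $2 }' "${1:-/proc/mounts}"
}
```

The pattern matches "ro" only as a whole option, so a read-write mount with an option such as errors=remount-ro is not reported.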
Exported LVM volume group - Beware if its Physical Volumes (PVs) are on DASD: the BMRcmds file will dasdfmt the DASD. Maybe I should have generated commented-out commands you could choose to uncomment? What if PV dasdb1 is exported and PV dasdb2 isn't?
LVM uninstalled - BMRtool scan runs some LVM query commands without first checking that LVM is installed. Using LVM is not required, but if it's not installed expect some missing-command error messages. This should not be a problem for a regular SLES 9 install.
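To find out whether a server has any exported volume groups before you rely on its BMRcmds file, you could filter the volume group attributes. The sketch below assumes the LVM2 vgs report format, in which the third character of the vg_attr field is "x" for an exported volume group. The function name is made up, so check the output format on your own LVM level.

```shell
# exported_vgs
#   stdin: output of `vgs --noheadings -o vg_name,vg_attr`
#   prints the name of each volume group whose attribute string marks it exported
exported_vgs() {
    awk 'substr($2, 3, 1) == "x" { print $1 }'
}
```

Use it as "vgs --noheadings -o vg_name,vg_attr | exported_vgs". Any name it prints deserves a look at the dasdfmt records in BMRcmds before they are ever run.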
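A pre-check for the LVM commands could be as simple as the sketch below. The function name is made up, and the command names in the usage note are common LVM2 tools chosen for illustration. Which ones BMRtool scan actually calls is not listed here.

```shell
# missing_commands NAME...
#   prints each NAME that the shell cannot find as a command
missing_commands() {
    local c
    for c in "$@"; do
        command -v "$c" >/dev/null 2>&1 || echo "$c"
    done
}
```

For example, "missing_commands pvdisplay vgdisplay lvdisplay vgcfgbackup" prints nothing when all four are installed. Any name it does print tells you to expect missing-command messages from scan.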
mknod - BMRtool doesn't generate mknod commands for the disks. If your rescue system needs more special files you'll have to do some mknod's manually.
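If the rescue system runs a 2.6 kernel with sysfs mounted, the major and minor numbers for the missing special files can be read from /sys/block rather than worked out by hand. The sketch below only prints mknod commands for whole disks that have no special file yet. It runs nothing itself and does not handle partitions. The function name and both default directories are assumptions.

```shell
# print_mknod_cmds [SYSBLOCK] [DEVDIR]
#   SYSBLOCK - sysfs block directory (default /sys/block)
#   DEVDIR   - directory holding the special files (default /dev)
print_mknod_cmds() {
    local sysblock=${1:-/sys/block} devdir=${2:-/dev}
    local devfile name major minor
    for devfile in "$sysblock"/*/dev; do
        [ -r "$devfile" ] || continue
        name=$(basename "$(dirname "$devfile")")
        # Each dev file holds "major:minor" for that block device.
        IFS=: read -r major minor < "$devfile"
        if [ ! -e "$devdir/$name" ]; then
            echo "mknod $devdir/$name b $major $minor"
        fi
    done
    return 0
}
```

Review the printed commands and then run the ones you need by hand. Partitions have their own dev files one directory level down (for example /sys/block/dasdc/dasdc1/dev) and can be handled the same way.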