LVM Volume Group Descriptor Area (VGDA) Recovery
This information was originally posted to the Linux-390 mailing list on January
16, 2005, by Peter Abresch.
We ran into an interesting situation. We suddenly lost access to our DASD due to human
error. This corrupted the ReiserFS file systems that we had defined under the Logical
Volume Manager (LVM). After running reiserfsck, we still had corruption of
the Volume Group Descriptor Area (VGDA). Our symptom was that /dev/linuxd01/opt and /dev/linuxd01/tmp were the same
file system.
An excerpt from the SuSE LVM white paper available at http://www.novell.com/products/linuxenterpriseserver8/whitepapers/LVM.pdf
states:
The volume group descriptor area (or VGDA) holds the metadata of the LVM
configuration. It is stored at the beginning of each physical volume. It contains four
parts: one PV descriptor, one VG descriptor, the LV descriptors and several PE
descriptors. LE descriptors are derived from the PE ones at activation time. Automatic
backups of the VGDA are stored in files in /etc/lvmconf/ (please see the commands
vgcfgbackup/vgcfgrestore too). Take care to include these files in your regular (tape)
backups as well.
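For reference, a manual backup with the LVM1 tools mentioned above might look like this (a sketch only; linuxd01 is the volume group name used throughout this document):
vgcfgbackup linuxd01               # write a fresh VGDA backup under /etc/lvmconf
ls -l /etc/lvmconf/linuxd01.conf*  # confirm the backup files are there
                                   # (include them in your regular tape backups)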
LVM backs up the configuration automatically whenever the LVM configuration is modified.
In our case, the corruption was somewhere in the on-disk VGDA of a physical volume. The goal was to
recover the VGDA and not lose any data that was residing on the logical volumes (LVs). Here
is what I did:
-
Identify that corruption exists. As stated before, our symptom was that /dev/linuxd01/opt and /dev/linuxd01/tmp were the
same file system. For the rest of this document, assume linuxd01 is the volume group
name. You can substitute your own volume group name as appropriate.
ls -l /dev/linuxd01
crw-r----- 1 root disk 109, 2 2005-01-16 15:49 group
brw-rw---- 1 root disk 58, 1 2005-01-16 15:49 home
brw-rw---- 1 root disk 58, 2 2005-01-16 15:49 local
brw-rw---- 1 root disk 58, 0 2005-01-16 15:49 opt
brw-rw---- 1 root disk 58, 0 2005-01-16 15:49 tmp
brw-rw---- 1 root disk 58, 3 2005-01-16 15:49 var
As you can see, opt and tmp share the same minor number of 0, which is obviously
incorrect; one of them is wrong.
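Rather than eyeballing the listing, you could spot duplicate device numbers with something like the following (a sketch; it assumes the ls -l field layout shown above):
# Print any major/minor pair that appears more than once among the LV block nodes.
ls -l /dev/linuxd01 | awk '/^b/ {print $5 $6}' | sort | uniq -d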
-
Verify which one is incorrect. We issued the following command to list the LVM
configuration backups:
ls -l /etc/lvmconf
-rw-r----- 1 root root 183168 2005-01-16 16:20 linuxd01.conf
-rw-r----- 1 root root 183168 2004-11-30 16:37 linuxd01.conf.1.old
-rw-r----- 1 root root 182368 2004-11-28 09:52 linuxd01.conf.2.old
-rw-r----- 1 root root 183168 2004-11-27 09:07 linuxd01.conf.3.old
-rw-r----- 1 root root 182368 2004-07-29 12:33 linuxd01.conf.4.old
-rw-r----- 1 root root 180828 2004-07-29 12:19 linuxd01.conf.5.old
-rw-r----- 1 root root 176312 2004-07-29 12:16 linuxd01.conf.6.old
-rw-r----- 1 root root 172604 2004-07-28 10:34 linuxd01.conf.7.old
-rw-r----- 1 root root 158664 2004-07-22 10:42 linuxd01.conf.8.old
-rw-r----- 1 root root 150580 2004-07-22 10:13 linuxd01.conf.9.old
linuxd01.conf is the most recent and the one actually in use.
-
List the most recent backup and verify the information, paying particular attention
to the "Block device" line in each logical volume section.
vgcfgrestore -f /mnt/etc/lvmconf/linuxd01.conf -n linuxd01 -ll
vgcfgrestore -- this is a backup of volume group "linuxd01"
--- Volume group ---
VG Name linuxd01
VG Access read/write
VG Status NOT available/resizable
VG # 2
MAX LV 256
Cur LV 5
Open LV 0
MAX LV Size 255.99 GB
Max PV 256
Cur PV 3
Act PV 3
VG Size 6.86 GB
PE Size 4 MB
Total PE 1755
Alloc PE / Size 1572 / 6.14 GB
Free PE / Size 183 / 732 MB
VG UUID ehREE0-MrMY-QyvP-PezE-RBbz-FDk8-noY8D4
--- Logical volume ---
LV Name /dev/linuxd01/opt
VG Name linuxd01
LV Write Access read/write
LV Status available
LV # 1
# open 0
LV Size 1.97 GB
Current LE 505
Allocated LE 505
Allocation next free
Read ahead sectors 1024
Block device 58:0
--- Logical volume ---
LV Name /dev/linuxd01/home
VG Name linuxd01
LV Write Access read/write
LV Status available
LV # 2
# open 0
LV Size 2.20 GB
Current LE 564
Allocated LE 564
Allocation next free
Read ahead sectors 1024
Block device 58:1
--- Logical volume ---
LV Name /dev/linuxd01/local
VG Name linuxd01
LV Write Access read/write
LV Status available
LV # 3
# open 0
LV Size 300 MB
Current LE 75
Allocated LE 75
Allocation next free
Read ahead sectors 1024
Block device 58:2
--- Logical volume ---
LV Name /dev/linuxd01/var
VG Name linuxd01
LV Write Access read/write
LV Status available
LV # 4
# open 0
LV Size 1.10 GB
Current LE 282
Allocated LE 282
Allocation next free
Read ahead sectors 1024
Block device 58:3
--- Logical volume ---
LV Name /dev/linuxd01/tmp
VG Name linuxd01
LV Write Access read/write
LV Status available
LV # 5
# open 0
LV Size 584 MB
Current LE 146
Allocated LE 146
Allocation next free
Read ahead sectors 1024
Block device 58:4
--- Physical volume ---
PV Name /dev/dasdd1
VG Name linuxd01
PV Size 2.29 GB [4807968 secs] / NOT usable 4.19 MB [LVM: 130 KB]
PV# 1
PV Status available
Allocatable yes (but full)
Cur LV 2
PE Size (KByte) 4096
Total PE 585
Free PE 0
Allocated PE 585
PV UUID L3fZI4-8owE-rz5O-CKUq-FcEC-6BvK-9SWH5K
--- Physical volume ---
PV Name /dev/dasde1
VG Name linuxd01
PV Size 2.29 GB [4807968 secs] / NOT usable 4.19 MB [LVM: 130 KB]
PV# 2
PV Status available
Allocatable yes (but full)
Cur LV 3
PE Size (KByte) 4096
Total PE 585
Free PE 0
Allocated PE 585
PV UUID v7cc4b-5vxP-yUFZ-xTyY-UjBw-7se4-62UDkn
--- Physical volume ---
PV Name /dev/dasdc1
VG Name linuxd01
PV Size 2.29 GB [4807968 secs] / NOT usable 4.19 MB [LVM: 130 KB]
PV# 3
PV Status available
Allocatable yes
Cur LV 2
PE Size (KByte) 4096
Total PE 585
Free PE 183
Allocated PE 402
PV UUID 1yy4v8-0XDU-3uJ8-ssmS-7kaJ-MBDx-gR3FBr
As you can see, in the backup /dev/linuxd01/opt has block device 58:0 and /dev/linuxd01/tmp has block device 58:4. This is different from
what was listed in step 1:
brw-rw---- 1 root disk 58, 0 2005-01-16 15:49 opt
brw-rw---- 1 root disk 58, 0 2005-01-16 15:49 tmp
/dev/linuxd01/tmp should be 58:4, so its device node is the incorrect one. We decided to move forward
with a VGDA restore of the physical volume that contained /dev/linuxd01/tmp.
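If the listing is long, something like the following extracts just the LV name and block device pairs for an easier side-by-side comparison with ls -l /dev/linuxd01 (an illustration, using the same backup file path as the command above):
vgcfgrestore -f /mnt/etc/lvmconf/linuxd01.conf -n linuxd01 -ll \
    | awk '/LV Name/ {lv=$3} /Block device/ {print lv, $3}'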
-
Issue the following command to identify the physical volumes (PVs) that make up the
logical volume.
lvdisplay -v /dev/linuxd01/tmp
Output similar to the following should be displayed:
.
.
--- Distribution of logical volume on 1 physical volume ---
PV Name PE on PV reads writes
/dev/dasdc1 146 9755 2486
.
.
.
This identifies that logical volume /dev/linuxd01/tmp only
resides on /dev/dasdc1. The VGDA on physical volume /dev/dasdc1 will need to be restored.
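If you are not sure which logical volumes live on which physical volumes, a loop over all of the LVs can summarize it (a sketch, based on the LVM1 lvdisplay output format shown above; increase the -A count if an LV spans more physical volumes):
# Show the PV distribution section for every LV in the volume group.
for lv in /dev/linuxd01/*; do
    [ -b "$lv" ] || continue   # skip the 'group' control node
    echo "== $lv =="
    lvdisplay -v "$lv" | grep -A 3 "Distribution of logical volume"
done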
-
Identify the UCB addresses (device numbers). Note that entries in /proc/dasd/devices
use the short device name (dasdc) rather than the partition path /dev/dasdc1:
cat /proc/dasd/devices | grep dasdc
Output similar to the following should be displayed:
232d(ECKD) at ( 94: 8) is dasdc : active at blocksize: 4096, 601020 blocks, 2347 MB
This reveals that the UCB address we need is 232d. We also made note of the root
device from our /etc/zipl.conf line parameters="dasd=232b-232e,232a
root=/dev/dasda1", which put the root file system on /dev/dasda1 at UCB 232b.
Be aware that what is in /etc/zipl.conf may not reflect the
parameters that were actually passed to the Linux system for the current IPL. You can verify
that they are the same by running cat /proc/cmdline, as shown below.
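A side-by-side check (using the paths from this document) is simply:
cat /proc/cmdline               # the parameters used for the current IPL
grep parameters /etc/zipl.conf  # the parameters recorded in zipl.conf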
-
We chose to correct this problem under our recovery Linux system. Depending on the
logical volume needing recovery, this might not be necessary. We could instead have
- gone into single-user mode,
- unmounted all of the file systems under volume group linuxd01, and then
- deactivated the linuxd01 volume group using vgchange -an linuxd01
(that alternative is sketched below). However, we chose to err on the side of caution:
we shut down our Linux system at this point and booted our emergency recovery Linux.
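For reference, the in-place alternative might look something like this (a sketch only; the mount points are assumptions based on the logical volume names in this document):
telinit 1                                # drop to single-user mode
umount /opt /tmp /home /usr/local /var   # unmount everything in linuxd01
                                         # (mount points are assumptions; stop
                                         # any services still writing to them)
vgchange -an linuxd01                    # deactivate the volume group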
-
Log on to your emergency Linux system and issue the following commands to gain access to
your DASD:
modprobe dasd_mod dasd=2320-232f dasd_disciplines=dasd_eckd_mod
echo "add device range=232b" >> /proc/dasd/devices
echo "add device range=232d" >> /proc/dasd/devices
If you look carefully, you will see that the modprobe command
doesn't exactly match the parameters that were found in /etc/zipl.conf. This has consequences that will show up a couple of
steps from now.
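It is worth listing the DASD devices right away to see which device letters the recovery system assigned; this matters for the restore step below:
cat /proc/dasd/devices   # note which dasd letter UCB 232d received this time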
-
Mount your original root file system. This is necessary because the recovery Linux system
does not contain any LVM commands, and the VGDA backups are on your root file system
anyway. We issued the following commands to gain access to the LVM commands:
mount /dev/dasda1 /mnt
export PATH=/mnt/sbin:$PATH
cp /mnt/lib/liblvm-10.so.1 /lib
vgscan
The vgscan command is a simple test that should discover the
volume group on the one physical volume visible to the recovery system, as follows:
vgscan -- reading all physical volumes (this may take a while...)
vgscan -- found inactive volume group "linuxd01"
vgscan -- "/etc/lvmtab" and "/etc/lvmtab.d" successfully created
vgscan -- WARNING: This program does not do a VGDA backup of your volume group
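Before moving on to the restore, it can be reassuring to confirm that the physical volume is associated with the expected volume group (a sketch; pvscan is part of the same LVM1 tool set as vgscan):
pvscan   # should list the physical volume as belonging to volume group "linuxd01"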
-
From the Linux recovery system, identify the old physical path and the new physical
path. If the parameters passed to the DASD driver had been identical to what was in /etc/zipl.conf, these would be the same. Since ours were not, remember
that the old physical path is /dev/dasdc1 on UCB 232d, as
identified in the previous steps. However, a cat /proc/dasd/devices
under the Linux recovery system reveals that UCB 232d is now /dev/dasdb1; this is the new physical path. The restore can be performed
using the following command:
vgcfgrestore -f /mnt/etc/lvmconf/linuxd01.conf -o /dev/dasdc1 /dev/dasdb1
Once vgcfgrestore has completed, the Linux recovery system can
be shut down and your production Linux rebooted. You'll know rather quickly whether /opt
and /tmp are correct. However, for peace of mind, you can confirm it:
ls -l /dev/linuxd01
crw-r----- 1 root disk 109, 2 2005-01-16 15:49 group
brw-rw---- 1 root disk 58, 1 2005-01-16 15:49 home
brw-rw---- 1 root disk 58, 2 2005-01-16 15:49 local
brw-rw---- 1 root disk 58, 0 2005-01-16 15:49 opt
brw-rw---- 1 root disk 58, 4 2005-01-16 15:49 tmp
brw-rw---- 1 root disk 58, 3 2005-01-16 15:49 var
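As an extra check beyond the device numbers (not part of the original procedure), you can also verify that the two file systems now mount independently and show their own contents:
mount | grep linuxd01   # each logical volume should appear with its own mount point
df -h /opt /tmp         # sizes should correspond to the LV sizes listed earlier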
- Drink beer, we deserve it. :)