LVM Volume Group Descriptor Area (VGDA) Recovery
This information was originally posted to the Linux-390 mailing list on January
16, 2005, by Peter Abresch.
We ran into an interesting situation. We suddenly lost access to our DASD due to human
error. This corrupted the ReiserFS file systems that we had defined under the Logical
Volume Manager (LVM). After running reiserfsck, we still had corruption of
the Volume Group Descriptor Area (VGDA). Our symptom was that /dev/linuxd01/opt and /dev/linuxd01/tmp were the same
file system.
An excerpt from the SuSE LVM white paper available at http://www.novell.com/products/linuxenterpriseserver8/whitepapers/LVM.pdf
states:
The volume group descriptor area (or VGDA) holds the metadata of the LVM
configuration. It is stored at the beginning of each physical volume. It contains four
parts: one PV descriptor, one VG descriptor, the LV descriptors and several PE
descriptors. LE descriptors are derived from the PE ones at activation time. Automatic
backups of the VGDA are stored in files in /etc/lvmconf/ (please see the commands
vgcfgbackup/vgcfgrestore too). Take care to include these files in your regular (tape)
backups as well.
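For reference, a manual backup with the LVM1 tools mentioned above might look like this (a sketch only; linuxd01 is the volume group name used throughout this document):
vgcfgbackup linuxd01               # write a fresh VGDA backup under /etc/lvmconf
ls -l /etc/lvmconf/linuxd01.conf*  # confirm the backup files are there
                                   # (include them in your regular tape backups)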
LVM backs up the configuration automatically whenever the LVM configuration is modified.
In our case, the corruption was somewhere in the on-disk VGDA of a physical volume. The goal was to
recover the VGDA and not lose any data that was residing on the logical volumes (LVs). Here
is what I did:
-
Identify that corruption exists. As stated before, our symptom was that /dev/linuxd01/opt and /dev/linuxd01/tmp were the
same file system. For the rest of this document, assume linuxd01 is the volume group
name. You can substitute your own volume group name as appropriate.
ls -l /dev/linuxd01
crw-r----- 1 root disk 109, 2 2005-01-16 15:49 group
brw-rw---- 1 root disk 58, 1 2005-01-16 15:49 home
brw-rw---- 1 root disk 58, 2 2005-01-16 15:49 local
brw-rw---- 1 root disk 58, 0 2005-01-16 15:49 opt
brw-rw---- 1 root disk 58, 0 2005-01-16 15:49 tmp
brw-rw---- 1 root disk 58, 3 2005-01-16 15:49 var
As you can see, opt and tmp share the same minor number of 0, which is obviously
incorrect; one of them is wrong.
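Rather than eyeballing the listing, you could spot duplicate device numbers with something like the following (a sketch; it assumes the ls -l field layout shown above):
# Print any major/minor pair that appears more than once among the LV block nodes.
ls -l /dev/linuxd01 | awk '/^b/ {print $5 $6}' | sort | uniq -d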
-
Verify which one is incorrect. We issued the following command to list the LVM
configuration backups:
ls -l /etc/lvmconf
-rw-r----- 1 root root 183168 2005-01-16 16:20 linuxd01.conf
-rw-r----- 1 root root 183168 2004-11-30 16:37 linuxd01.conf.1.old
-rw-r----- 1 root root 182368 2004-11-28 09:52 linuxd01.conf.2.old
-rw-r----- 1 root root 183168 2004-11-27 09:07 linuxd01.conf.3.old
-rw-r----- 1 root root 182368 2004-07-29 12:33 linuxd01.conf.4.old
-rw-r----- 1 root root 180828 2004-07-29 12:19 linuxd01.conf.5.old
-rw-r----- 1 root root 176312 2004-07-29 12:16 linuxd01.conf.6.old
-rw-r----- 1 root root 172604 2004-07-28 10:34 linuxd01.conf.7.old
-rw-r----- 1 root root 158664 2004-07-22 10:42 linuxd01.conf.8.old
-rw-r----- 1 root root 150580 2004-07-22 10:13 linuxd01.conf.9.old
linuxd01.conf is the most recent and the one actually in use.
-
List the most recent backup and verify the information, paying particular attention
to the "Block device" line in each logical volume section.
vgcfgrestore -f /mnt/etc/lvmconf/linuxd01.conf -n linuxd01 -ll
vgcfgrestore -- this is a backup of volume group "linuxd01"
--- Volume group ---
VG Name linuxd01
VG Access read/write
VG Status NOT available/resizable
VG # 2
MAX LV 256
Cur LV 5
Open LV 0
MAX LV Size 255.99 GB
Max PV 256
Cur PV 3
Act PV 3
VG Size 6.86 GB
PE Size 4 MB
Total PE 1755
Alloc PE / Size 1572 / 6.14 GB
Free PE / Size 183 / 732 MB
VG UUID ehREE0-MrMY-QyvP-PezE-RBbz-FDk8-noY8D4
--- Logical volume ---
LV Name /dev/linuxd01/opt
VG Name linuxd01
LV Write Access read/write
LV Status available
LV # 1
# open 0
LV Size 1.97 GB
Current LE 505
Allocated LE 505
Allocation next free
Read ahead sectors 1024
Block device 58:0
--- Logical volume ---
LV Name /dev/linuxd01/home
VG Name linuxd01
LV Write Access read/write
LV Status available
LV # 2
# open 0
LV Size 2.20 GB
Current LE 564
Allocated LE 564
Allocation next free
Read ahead sectors 1024
Block device 58:1
--- Logical volume ---
LV Name /dev/linuxd01/local
VG Name linuxd01
LV Write Access read/write
LV Status available
LV # 3
# open 0
LV Size 300 MB
Current LE 75
Allocated LE 75
Allocation next free
Read ahead sectors 1024
Block device 58:2
--- Logical volume ---
LV Name /dev/linuxd01/var
VG Name linuxd01
LV Write Access read/write
LV Status available
LV # 4
# open 0
LV Size 1.10 GB
Current LE 282
Allocated LE 282
Allocation next free
Read ahead sectors 1024
Block device 58:3
--- Logical volume ---
LV Name /dev/linuxd01/tmp
VG Name linuxd01
LV Write Access read/write
LV Status available
LV # 5
# open 0
LV Size 584 MB
Current LE 146
Allocated LE 146
Allocation next free
Read ahead sectors 1024
Block device 58:4
--- Physical volume ---
PV Name /dev/dasdd1
VG Name linuxd01
PV Size 2.29 GB [4807968 secs] / NOT usable 4.19 MB [LVM: 130 KB]
PV# 1
PV Status available
Allocatable yes (but full)
Cur LV 2
PE Size (KByte) 4096
Total PE 585
Free PE 0
Allocated PE 585
PV UUID L3fZI4-8owE-rz5O-CKUq-FcEC-6BvK-9SWH5K
--- Physical volume ---
PV Name /dev/dasde1
VG Name linuxd01
PV Size 2.29 GB [4807968 secs] / NOT usable 4.19 MB [LVM: 130 KB]
PV# 2
PV Status available
Allocatable yes (but full)
Cur LV 3
PE Size (KByte) 4096
Total PE 585
Free PE 0
Allocated PE 585
PV UUID v7cc4b-5vxP-yUFZ-xTyY-UjBw-7se4-62UDkn
--- Physical volume ---
PV Name /dev/dasdc1
VG Name linuxd01
PV Size 2.29 GB [4807968 secs] / NOT usable 4.19 MB [LVM: 130 KB]
PV# 3
PV Status available
Allocatable yes
Cur LV 2
PE Size (KByte) 4096
Total PE 585
Free PE 183
Allocated PE 402
PV UUID 1yy4v8-0XDU-3uJ8-ssmS-7kaJ-MBDx-gR3FBr
As you can see, in the backup /dev/linuxd01/opt has block device 58:0 and /dev/linuxd01/tmp has block device 58:4. This is different from
what was listed in step 1:
brw-rw---- 1 root disk 58, 0 2005-01-16 15:49 opt
brw-rw---- 1 root disk 58, 0 2005-01-16 15:49 tmp
/dev/linuxd01/tmp should be 58:4, so its device node is the incorrect one. We decided to move forward
with a VGDA restore of the physical volume that contained /dev/linuxd01/tmp.
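If the listing is long, something like the following extracts just the LV name and block device pairs for an easier side-by-side comparison with ls -l /dev/linuxd01 (an illustration, using the same backup file path as the command above):
vgcfgrestore -f /mnt/etc/lvmconf/linuxd01.conf -n linuxd01 -ll \
    | awk '/LV Name/ {lv=$3} /Block device/ {print lv, $3}'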
-
Issue the following command to identify the physical volumes (PVs) that make up the
logical volume.
lvdisplay -v /dev/linuxd01/tmp
Output similar to the following should be displayed:
.
.
--- Distribution of logical volume on 1 physical volume ---
PV Name PE on PV reads writes
/dev/dasdc1 146 9755 2486
.
.
.
This identifies that logical volume /dev/linuxd01/tmp only
resides on /dev/dasdc1. The VGDA on physical volume /dev/dasdc1 will need to be restored.
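If you are not sure which logical volumes live on which physical volumes, a loop over all of the LVs can summarize it (a sketch, based on the LVM1 lvdisplay output format shown above; increase the -A count if an LV spans more physical volumes):
# Show the PV distribution section for every LV in the volume group.
for lv in /dev/linuxd01/*; do
    [ -b "$lv" ] || continue   # skip the 'group' control node
    echo "== $lv =="
    lvdisplay -v "$lv" | grep -A 3 "Distribution of logical volume"
done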
-
Identify the UCB addresses (device numbers). Note that entries in /proc/dasd/devices
use the short device name (dasdc) rather than the partition path /dev/dasdc1:
cat /proc/dasd/devices | grep dasdc
Output similar to the following should be displayed:
232d(ECKD) at ( 94: 8) is dasdc : active at blocksize: 4096, 601020 blocks, 2347 MB
This reveals that the UCB address we need is 232d. We also made note of the root
device from our /etc/zipl.conf line parameters="dasd=232b-232e,232a
root=/dev/dasda1", which put the root file system on /dev/dasda1 at UCB 232b.
Be aware that what is in /etc/zipl.conf may not reflect the
parameters that were actually passed to the Linux system for the current IPL. You can verify
that they are the same by running cat /proc/cmdline, as shown below.
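A side-by-side check (using the paths from this document) is simply:
cat /proc/cmdline               # the parameters used for the current IPL
grep parameters /etc/zipl.conf  # the parameters recorded in zipl.conf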
-
We chose to correct this problem under our recovery Linux system. Depending on the
logical volume needing recovery, this might not be necessary. We could instead have
- gone into single-user mode,
- unmounted all of the file systems under volume group linuxd01, and then
- deactivated the linuxd01 volume group using vgchange -an linuxd01
(that alternative is sketched below). However, we chose to err on the side of caution:
we shut down our Linux system at this point and booted our emergency recovery Linux.
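For reference, the in-place alternative might look something like this (a sketch only; the mount points are assumptions based on the logical volume names in this document):
telinit 1                                # drop to single-user mode
umount /opt /tmp /home /usr/local /var   # unmount everything in linuxd01
                                         # (mount points are assumptions; stop
                                         # any services still writing to them)
vgchange -an linuxd01                    # deactivate the volume group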
-
Log on to your emergency Linux system and issue the following commands to gain access to
your DASD:
modprobe dasd_mod dasd=2320-232f dasd_disciplines=dasd_eckd_mod
echo "add device range=232b" >> /proc/dasd/devices
echo "add device range=232d" >> /proc/dasd/devices
If you look carefully, you will see that the modprobe command
doesn't exactly match the parameters that were found in /etc/zipl.conf. This has consequences that will show up a couple of
steps from now.
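It is worth listing the DASD devices right away to see which device letters the recovery system assigned; this matters for the restore step below:
cat /proc/dasd/devices   # note which dasd letter UCB 232d received this time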
-
Mount your original root file system. This is necessary because the recovery Linux system
does not contain any LVM commands, and the VGDA backups are on your root file system
anyway. We issued the following commands to gain access to the LVM commands:
mount /dev/dasda1 /mnt
export PATH=/mnt/sbin:$PATH
cp /mnt/lib/liblvm-10.so.1 /lib
vgscan
The vgscan command is a simple test that should discover the
volume group on the one physical volume visible to the recovery system, as follows:
vgscan -- reading all physical volumes (this may take a while...)
vgscan -- found inactive volume group "linuxd01"
vgscan -- "/etc/lvmtab" and "/etc/lvmtab.d" successfully created
vgscan -- WARNING: This program does not do a VGDA backup of your volume group
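Before moving on to the restore, it can be reassuring to confirm that the physical volume is associated with the expected volume group (a sketch; pvscan is part of the same LVM1 tool set as vgscan):
pvscan   # should list the physical volume as belonging to volume group "linuxd01"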
-
From the Linux recovery system, identify the old physical path and the new physical
path. If the parameters passed to the DASD driver had been identical to what was in /etc/zipl.conf, these would be the same. Since ours were not, remember
that the old physical path is /dev/dasdc1 on UCB 232d, as
identified in the previous steps. However, a cat /proc/dasd/devices
under the Linux recovery system reveals that UCB 232d is now /dev/dasdb1; this is the new physical path. The restore can be performed
using the following command:
vgcfgrestore -f /mnt/etc/lvmconf/linuxd01.conf -o /dev/dasdc1 /dev/dasdb1
Once vgcfgrestore has completed, the Linux recovery system can
be shut down and your production Linux rebooted. You'll know rather quickly whether /opt
and /tmp are correct. However, for peace of mind, you can confirm it:
ls -l /dev/linuxd01
crw-r----- 1 root disk 109, 2 2005-01-16 15:49 group
brw-rw---- 1 root disk 58, 1 2005-01-16 15:49 home
brw-rw---- 1 root disk 58, 2 2005-01-16 15:49 local
brw-rw---- 1 root disk 58, 0 2005-01-16 15:49 opt
brw-rw---- 1 root disk 58, 4 2005-01-16 15:49 tmp
brw-rw---- 1 root disk 58, 3 2005-01-16 15:49 var
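As an extra check beyond the device numbers (not part of the original procedure), you can also verify that the two file systems now mount independently and show their own contents:
mount | grep linuxd01   # each logical volume should appear with its own mount point
df -h /opt /tmp         # sizes should correspond to the LV sizes listed earlier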
- Drink beer, we deserve it. :)