Last updated on: Sunday, July 06, 2008
Using BIND Mounts to Create a Simplified BaseVol/GuestVol Linux Server for SUSE Linux Enterprise Server 9
The information in this document is based on the concept described in the IBM Redbook Linux on IBM
eServer zSeries and S/390: Large Scale Linux Deployment, SG24-6824-00. This document presents a simpler
procedure for creating a base-volume/guest-volume Linux for zSeries server, one that exploits the
bind option of the mount command, introduced into the kernel at the 2.4 level. The procedure was
developed on a 64-bit SUSE Linux Enterprise Server (SLES) 9 system (2.6 kernel) at the GA level.
This document last updated on 24 October 2005 by William
Scully, Senior Systems Programmer, Computer Associates.
Assistance for the SLES 9 design was provided by Robert Yeung, Systems
Programmer for Visa.
Product, company, and service names referenced in this document may be trademarks or service
marks of others. The information provided herein is provided as is and has not been subjected to
formal testing by Computer Associates. Use this information at your own risk and only after you
test it in your environment.
Overview
The plan is to give each Linux server a two-volume (two-pack) configuration.
- The first volume is the so-called base volume, or basevol. It holds the materials that rarely
change and can be shared read-only among several servers simultaneously. This device is owned by
the so-called base server, which in the examples below is VM userid LINXBASE.
- The second volume is the so-called guest volume, or guestvol. It holds the materials that are
unique to each Linux server and the directories that must be read/write. The VM userid used in
the examples below is LNXGUEST.
The goal is to place as many programs (both the kernel code and vendor-supplied packages) as
possible on the base volume so that it is reused over and over again. As a result, the read/write
guest volume given to a typical Linux server can be far smaller. Also, because fixes applied to
packages installed on the read-only disk are picked up by all servers, the maintenance burden is
substantially reduced.
Implementing this scheme requires only a modest change in the way a Linux server is created.
Overall, the approach is to prevent the operating system from mounting the base volume
materials in read/write mode during the boot process. This is easily accomplished because:
- When the Linux boot process starts, the root file system, and therefore every directory on it,
is mounted read-only.
- As the boot process continues, vendor-supplied scripts remount Linux directories in
read/write mode, as needed.
It is easy to change the vendor-supplied boot-time scripts to avoid the read/write mounting
of the Linux materials. It is also easy to create new boot scripts that mount in read/write
mode only those directories that must be read/write. Thus we can pick and choose which
directories (and subdirectories) are read-only and which are read/write. It is bind mounts that
allow specific (sub)directories to be R/O or R/W. The Redbook referenced above includes a useful
description of how bind mounts work; in particular you may want to read Sections 8.1 through 8.8.
Note that one objective is to hide, as much as possible, the fact that the server is using
bind mounts. With this implementation, users who issue mount -l or cat /etc/mtab see the same
results as if the server had been booted with R/W directories. Only if a user issues
cat /proc/mounts, which lists the kernel's own mount table, will they see the details of which
directories are mounted using the bind option.
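Because /etc/mtab is rewritten by the boot script, the kernel's table in /proc/mounts is the place to check which mount points are really read-only. The sketch below is a hypothetical helper, list_mount_modes, that assumes the standard /proc/mounts layout (device, mount point, type, options, dump, pass) and prints each mount point with its ro or rw flag.

```sh
#!/bin/sh
# list_mount_modes: hypothetical helper that reads a mount table in
# /proc/mounts format and prints "mount-point ro|rw" for each entry.
# With no argument it reads the live kernel table, /proc/mounts.
list_mount_modes() {
    table=${1:-/proc/mounts}
    awk '{
        # Field 4 holds the comma-separated mount options.
        n = split($4, opts, ",")
        mode = "?"
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro" || opts[i] == "rw") mode = opts[i]
        print $2, mode
    }' "$table"
}
```

For example, list_mount_modes | grep ' ro$' lists only the mount points that the kernel reports as read-only.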
Planning
Read-only and Read/Write Directories
The directories to be designated read-only or read/write are selected according to commonly
followed Linux conventions. Refer to the Filesystem Hierarchy Standard (FHS) for additional information.
The directories which will be included in the read-only base volume are:
Directory | Purpose
/bin | Linux's essential general-user commands are found here.
/boot | The kernel image, initial RAM disk, and boot loader files are found here.
/lib & /lib64 | Libraries for Linux and installed packages are found here.
/lost+found | Files recovered by the file system checker (e2fsck) are placed here. This directory is R/O.
/mnt | This directory is normally empty but is commonly used as a temporary mount-point. This directory is R/O.
/opt | Common packages, such as KDE and GNOME, are found here. Note that /opt/local is available R/W to the guest.
/sbin | Linux's essential privileged commands are found here.
/usr | Applications are typically found here, in particular in /usr/bin. Note that /usr/local is available R/W to the guest.
/var/lib/rpm | Kept R/O to ensure that MIS can properly log the application of all maintenance.
Of special note is the directory that holds the Red Hat Package Manager (RPM) database,
typically /var/lib/rpm. This directory is kept in R/O mode so that the rpm command cannot be
used to change the installed-package database. All package or kernel maintenance that requires
the rpm command is to be performed by MIS. Because any user can create a private RPM database and
run the rpm command against it, we cannot stop someone from trying to circumvent this policy; but
since the production RPM database is on an R/O device, there is no chance that MIS's data will be
adversely affected.
The directories which will be included in the read/write guest volume are:
Directory | Purpose
/dev | The location of special (device) files.
/etc | Many configuration files are found here.
/home | Users' home directories are found here.
/var | Logs and mail are typically found here.
/root | This is root's home directory.
/srv | New with UnitedLinux; holds data served by the system, such as web server documents.
/tmp | Temporary files are commonly written here. Nothing in this directory needs to be preserved across a reboot.
There are a few other special-case or locally-defined directories as well:
Directory | Purpose
/proc & /sys | Virtual file systems, created by the kernel, that expose process and system information.
/opt/local | Available for users to install packages. This directory is R/W.
/usr/local | Available for users to install packages. This directory is R/W.
/basevol | Holds /basevol/var, the mount-point at which the base volume's /var is bind-mounted R/O so that /var/lib/rpm can stay R/O. This directory is R/O.
/guestvol | The mount-point for the R/W guest volume (/dev/dasdc1), which is the source of the bind mounts used for this implementation. The mount-point directory itself is R/O.
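The R/W directories in the tables above map one-for-one onto bind mounts from the guest volume. As a hedged illustration, this dry-run sketch only prints the commands and mounts nothing. The function name print_bind_mounts is hypothetical, and /var is deliberately left out because the boot script handles it specially to keep /var/lib/rpm R/O.

```sh
#!/bin/sh
# print_bind_mounts: hypothetical dry-run helper. For each R/W directory it
# prints the bind mount that overlays the R/O copy on the base volume with
# the R/W copy on the guest volume (mounted at /guestvol). Nothing is mounted.
# /var is omitted here; it needs the extra /var/lib/rpm handling.
GUESTVOL=/guestvol
RW_DIRS="/etc /dev /home /root /srv /tmp /opt/local /usr/local"

print_bind_mounts() {
    for dir in $RW_DIRS; do
        echo "mount --bind $GUESTVOL$dir $dir"
    done
}
```

Running print_bind_mounts prints one line per directory, starting with mount --bind /guestvol/etc /etc.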
Base Server Attributes
The Linux servers which are going to share materials were tested using the following
attributes:
- A temporary disk is used for swap space and is located at virtual address 200. The size
of this disk depends entirely on your workload, but you may want to start at 40 cylinders and
adjust as needed.
- The read-only base-volume root file system ("/", not "/root") must be located at virtual
address 201 and be known as /dev/dasdb1. The base volume is formatted as EXT2. Do not use a
journaling file system; one is not needed since this disk remains in R/O mode except when the
kernel or applications are being serviced by the systems programmer. The disk should be
3338 cylinders of 3390 DASD.
- The read/write guest-volume file system must be located at virtual address 202 and be known
as /dev/dasdc1. The guest volume is formatted as EXT3. The disk should be at least 150
cylinders of 3390 DASD. Since end users' home directories are located on this disk, you may
wish to increase its size substantially beyond the suggested minimum.
- Any additional space added to the server should be mounted on subdirectories named
/spacenn. These disks should use virtual addresses 203 and above, and will be known as
/dev/dasdd1, /dev/dasde1, and so on.
- Networking is provided by a Virtual Switch at virtual addresses C00-C02 with portname
INTRANET.
- On the 191 A-disk, create a PROFILE EXEC that IPLs the Linux 201 boot disk.
Given the above, a typical directory entry for the Linux base-volume server should appear
similar to:
USER LINXBASE password 64M 512M G
* You may need up to 512M to install SLES 9 although SLES 9 will run in 64M
ACCOUNT acntcode distcode
MACHINE ESA
IPL CMS
CONSOLE 009 3215 T OPERATOR
SPOOL 00C 2540 READER *
SPOOL 00D 2540 PUNCH A
SPOOL 00E 1403 A
LINK MAINT 190 190 RR
LINK MAINT 19E 19E RR
LINK MAINT 19D 19D RR
NICDEF C00 TYPE QDIO DEVICES 3 LAN SYSTEM INTRANET
MDISK 191 3390 start-cyl 1 MR
MDISK 200 3390 T-DISK cyls
MDISK 201 3390 start-cyl 3338 MR
MDISK 202 3390 start-cyl 150 MR
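To translate the cylinder counts above into usable space: a 3390 cylinder has 15 tracks, and once a disk is formatted by dasdfmt with a 4 KB block size each track holds 12 blocks. The sketch below applies that arithmetic; it ignores the few tracks reserved for the volume label and partition table, so treat the results as approximations. The name cyl_to_mib is hypothetical.

```sh
#!/bin/sh
# cyl_to_mib: hypothetical helper giving the approximate capacity, in MiB,
# of a 3390 minidisk formatted with 4 KB blocks.
# 15 tracks/cylinder * 12 blocks/track * 4096 bytes = 737280 bytes/cylinder.
cyl_to_mib() {
    cyls=$1
    blocks=$((cyls * 15 * 12))
    echo $((blocks * 4096 / 1048576))
}
```

With this arithmetic, cyl_to_mib 150 prints 105 and cyl_to_mib 3338 prints 2347, so the minimum guest volume is roughly 105 MiB and the base volume roughly 2.3 GiB.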
Installation of a Base Server
At this point you're ready to create the "model" server, LINXBASE, which will hold the
materials to be shared by several Linux servers. The disks owned by this model server are called
the golden base volumes.
Keeping the above requirements in mind, you must now perform a SLES 9 installation. Perform a
completely normal installation, following SUSE's instructions. The SLES 9 "Default System"
installation can be used as the "Software Selection". If there are packages that you do or do
not want installed, ensure they are selected or deselected now. You may find it more convenient
to install all the packages that any of your Linux servers may use. Remember, installing a
package doesn't mean that any particular Linux server must be configured to use it.
Note: As always, remember the logon password for root. This password
will be needed each time a guest server is cloned using the base server's materials.
It is important not to make any unnecessary changes to the default installation, because the
so-called golden base volumes will be replicated and used by several Linux servers. Make only
those changes that are to be used by all Linux servers. Any customization needed by a single
Linux server can (and will) be made later, on that server's private (that is, non-shared) disk(s).
Post-Installation Modifications
After the standard SUSE installation is complete, a few minor local modifications are needed
and are discussed below.
-
The following additional packages must be installed:
- The cpint package, which allows Linux to communicate with the VM Control Program. It
provides the hcp command, which the boot script uses to determine whether a disk is read-only
or read/write. cpint is part of the standard SLES 9 distribution.
- The Regina package, which adds the Rexx scripting language to Linux. The basevol/guestvol
tools are written in Rexx. (Although SUSE does not include Regina with the SLES 9
distribution, you can continue to use the installation RPM from the SLES 8 CDs. You may
also find Regina at Freshmeat.)
-
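If you want to use Virtual Disks (that is, FBA devices) for swap space, you may need to update
/etc/sysconfig/kernel so that its INITRD_MODULES record includes dasd_fba_mod, for example:
INITRD_MODULES="dasd_fba_mod"
If the record already lists other modules, keep them and add dasd_fba_mod to the list. After
any update to this file you will need to issue the mkinitrd and zipl commands.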
-
Create several new mount-points, to be used later:
mkdir /guestvol
mkdir /basevol
mkdir /basevol/var
mkdir /opt/local
mkdir /usr/local
Directory /usr/local may already exist, depending on the packages you install.
-
Ensure file
/etc/fstab includes records similar to:
/dev/dasda1 swap swap pri=42 0 0
/dev/dasdb1 / ext2 acl,user_xattr 1 1
devpts /dev/pts devpts mode=0620,gid=5 0 0
proc /proc proc defaults 0 0
sysfs /sys sysfs noauto 0 0
# /dev/dasdc1 reserved for MIS use; do not use!
# /dev/dasdd1 is the first available for your use... .
Ensure that /etc/fstab does not mount /dev/dasdc1; the guest volume is mounted by the
basevol/guestvol tools, not by /etc/fstab.
-
Because swap space is on z/VM temporary disk space, the device is unformatted at boot time.
For this reason the swap space is formatted and activated by /etc/rc.d/boot.local. A typical
/etc/rc.d/boot.local should include the following statements:
#! /bin/sh
# Enable the so-called "timer patch", for performance under z/VM
sysctl -w kernel/hz_timer=0
# Create and activate TDisk-based swap space for the server
mkswap /dev/dasda1
swapon -a
- Make any final changes that you want reflected on all your Linux servers. For example,
you may want to display a standard greeting to all users when they log on to Linux; in this
case put that message in file /etc/motd now. You may also want to create a personal userid for
yourself (or your peers or the Operations staff) so that you can access the server at a future
date.
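Before converting the server, it can be worth confirming two of the settings above. The sketch below defines two hypothetical helpers, fstab_mounts and initrd_has_module, that check whether /etc/fstab has an active (uncommented) entry for a device and whether INITRD_MODULES in /etc/sysconfig/kernel lists a given module. Both assume the file formats shown in this document; they only read the files and change nothing.

```sh
#!/bin/sh
# Hypothetical post-installation checks for the base server.

# fstab_mounts: succeed (exit 0) if FSTAB has an active, uncommented entry
# whose first field is DEVICE. Comment lines such as
# "# /dev/dasdc1 reserved for MIS use" are ignored.
fstab_mounts() {
    device=$1
    fstab=${2:-/etc/fstab}
    awk -v dev="$device" '
        $1 ~ /^#/ { next }          # skip comment lines
        $1 == dev { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$fstab"
}

# initrd_has_module: succeed if the INITRD_MODULES record in the given
# sysconfig file lists MODULE as one of its space-separated words.
initrd_has_module() {
    module=$1
    sysconfig=${2:-/etc/sysconfig/kernel}
    line=$(grep '^INITRD_MODULES=' "$sysconfig" | tail -n 1)
    line=${line#INITRD_MODULES=}
    line=$(echo "$line" | tr -d '"')
    for m in $line; do
        [ "$m" = "$module" ] && return 0
    done
    return 1
}
```

For example, if fstab_mounts /dev/dasdc1 succeeds, the fstab entry for the guest volume should be removed; if initrd_has_module dasd_fba_mod fails and V-DISK swap is planned, the module should be added before running mkinitrd and zipl.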
Servicing the Base Server
Service the base server as normal.
Conversion to Basevol/Guestvol
At this point in these instructions you have created a standard Linux server, albeit with a
few minor changes to support running Linux under z/VM. In the next steps you will alter this
standard configuration to support running in a basevol/guestvol mode.
Create Supporting Boot Script
SUSE provides several scripts that are used at boot time to prepare the server for operation.
A basevol/guestvol implementation requires one additional boot script, /etc/rc.d/boot.guestvol,
shown below:
#!/usr/bin/regina
/*---------------------------------------------------------------------+
| Check if we're running on the so-called "model" Linux server or if |
| we're running on a Linux server using the model's disks R/O. If we |
| are running on the model then we completely skip this tool so all |
| the "standard" scripts run unmodified. If we're running on a userid |
| which has the 201 disk in R/O mode then this tool's logic is used to |
| set up the server correctly. |
+---------------------------------------------------------------------*/
Trace Normal
Say ''
Say 'Basevol/Guestvol script begins... .'
/*---------------------------------------------------------------------+
| Is the 201 disk R/W? If so then this must be the systems programmer |
| logging onto the Linux server which manages the "golden" materials. |
| Otherwise this must be one of the many Linux servers who are using |
| these shared materials in R/O mode. Bring up the server accordingly |
+---------------------------------------------------------------------*/
save_rc = Popen( 'hcp QUERY VIRTUAL 0201', rec. )
Select
When Word( rec.1, 5 ) = 'R/W'
Then Do
Say 'Welcome to the Linux disk model userid!'
Say 'All directories will be in R/W mode.'
End
When Word( rec.1, 5 ) = 'R/O'
Then Do
Say 'This server is running with shared disk support.'
Say 'Many directories will be in R/O mode.'
Call Do_It
End
Otherwise Do
Say 'Unexpected results checking disk status.'
Say 'By default all directories will be in R/W mode.'
End
End
Say 'Basevol/Guestvol script ends.'
Say ''
Exit 0
Do_It: /*--------------------------------------------------------------+
| Do what's needed to prepare and mount the various directories in R/O |
| or R/W mode |
+---------------------------------------------------------------------*/
Procedure
Say 'Forcing R/O mount of root file systems... .'
'mount -n -o remount,ro /' /* no update /etc/mtab, option remount R/O */
Say 'Checking the file system on the R/W guest volume... .'
'e2fsck /dev/dasdc1'
Say 'Mounting R/W guest volume... .'
'mount -w -t ext3 -n /dev/dasdc1 /guestvol' /* R/W, type EXT3, no update /etc/mtab */
Say 'Temporarily enabling directory /etc from guest volume... .'
'mount -w -t ext3 --bind /guestvol/etc /etc'
Say 'Discarding obsolete mtab file... .'
'rm -f /etc/mtab*'
Say 'Creating proper /etc/mtab file... .'
'mount -f /' /* -f is "fake", to add entries for devices mounted earlier with -n */
Say 'Overlapping R/O mount points with R/W materials... .'
'mount -w -t ext3 --bind /guestvol/dev /dev'
Say 'Remounting /dev/pts... .'
'mount -t devpts devpts /dev/pts'
Say 'Continuing the overlapping mounts... .'
'mount -w -t ext3 --bind /guestvol/home /home'
'mount -w -t ext3 --bind /guestvol/root /root'
'mount -w -t ext3 --bind /guestvol/srv /srv'
'mount -w -t ext3 --bind /guestvol/tmp /tmp'
'mount -w -t ext3 --bind /guestvol/opt/local /opt/local'
'mount -w -t ext3 --bind /guestvol/usr/local /usr/local'
/* required, in this order, to force /var/lib/rpm to be R/O */
'mount -r -t ext3 --bind /var /basevol/var'
'mount -w -t ext3 --bind /guestvol/var /var'
'mount -r -t ext3 --bind /basevol/var/lib/rpm /var/lib/rpm'
Return
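For readers more comfortable with shell than Rexx, the decision at the top of boot.guestvol can be sketched as below. This is not a replacement for the Rexx script; like the script, it assumes that the access mode (R/W or R/O) is the fifth blank-delimited word of the first line returned by hcp QUERY VIRTUAL 0201. The function name disk_mode is hypothetical.

```sh
#!/bin/sh
# disk_mode: hypothetical shell equivalent of the Rexx test
#   Word( rec.1, 5 )
# It prints the access mode from the first line of a CP QUERY VIRTUAL
# response, or "unknown" if the fifth word is neither R/W nor R/O.
disk_mode() {
    response=$1
    mode=$(printf '%s\n' "$response" | head -n 1 | awk '{ print $5 }')
    case $mode in
        R/W|R/O) echo "$mode" ;;
        *)       echo unknown ;;
    esac
}

# On a live system the response would come from cpint's hcp command:
#   disk_mode "$(hcp QUERY VIRTUAL 0201)"
```

As in the Rexx version, an "unknown" result should fall back to leaving all directories R/W.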
This locally-written boot script must be readable, writable, and executable by its owner,
root, and inaccessible to everyone else:
chmod u+rwx /etc/rc.d/boot.guestvol
chmod go-rwx /etc/rc.d/boot.guestvol
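The two chmod commands give the owner read, write, and execute permission and remove all access for group and others, which is octal mode 700. The sketch below confirms that on a scratch file standing in for /etc/rc.d/boot.guestvol; it assumes GNU coreutils stat for the -c %a format.

```sh
#!/bin/sh
# Show that "u+rwx" followed by "go-rwx" yields mode 700,
# using a scratch file in place of /etc/rc.d/boot.guestvol.
script=$(mktemp)
chmod 644 "$script"          # a typical starting mode
chmod u+rwx "$script"
chmod go-rwx "$script"
mode=$(stat -c %a "$script") # GNU stat: print octal permission bits
echo "$mode"                 # prints 700
rm -f "$script"
```

A single chmod 700 would have the same effect.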
Enabling Basevol/Guestvol
Above you created a boot-time script. The following steps will enable its execution.
-
Issue the following command to create a symbolic link in
/etc/init.d/boot.d
pointing to your script:
ln -s /etc/rc.d/boot.guestvol /etc/init.d/boot.d/S03boot.guestvol
-
For clarity, rename one SUSE-provided boot script to make the run order unambiguous:
mv /etc/rc.d/boot.d/S03boot.rootfsck /etc/rc.d/boot.d/S04boot.rootfsck
-
Create a tool, called /etc/rc.d/make-guestvol, that copies the read/write directories from the
boot device to the guest volume. The script should appear similar to:
#!/usr/bin/regina
Trace Normal
Say 'Make-GuestVol begins... .'
'umount /guestvol'
'mke2fs -j -b 4096 /dev/dasdc1'
If RC <> 0 Then Exit RC
'mount /dev/dasdc1 /guestvol'
If RC <> 0 Then Exit RC
dirs = '/dev /etc /home /var /root /srv /tmp /opt/local /usr/local'
Do i = 1 To Words( dirs )
dir = Subword( dirs, i, 1 )
Say 'Copying 'dir' to guest volume... .'
'tar -clpSf - 'dir' | (cd /guestvol ; tar -xpSf - )'
If RC <> 0 Then Exit RC
End i
/*---------------------------------------------------------------------+
| Discard from the copy of /etc/rc.d (that is, /guestvol/etc/rc.d) the |
| SUSE-provided boot script boot.rootfsck. This is because when using |
| a R/O root file system there is no need for a file system check |
+---------------------------------------------------------------------*/
'rm /guestvol/etc/rc.d/boot.d/S04boot.rootfsck'
If RC <> 0 Then Exit RC
Say 'Make-GuestVol ends.'
Exit 0
Ensure that only root can see and execute this tool:
chmod u+rwx /etc/rc.d/make-guestvol
chmod go-rwx /etc/rc.d/make-guestvol
-
Recall that the objective is to keep the boot device in read-only mode, with a small subset of
the directories read/write on a disk separate from the boot device. To create this subset of
directories, issue the command:
/etc/rc.d/make-guestvol
The response is similar to:
Make-GuestVol begins... .
umount: /guestvol: not mounted
6 *-* 'umount /guestvol'
+++ RC=1 +++
mke2fs 1.34 (25-Jul-2003)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
54016 inodes, 53997 blocks
2699 blocks (5.00%) reserved for the super user
First data block=0
2 block groups
32768 blocks per group, 32768 fragments per group
27008 inodes per group
Superblock backups stored on blocks:
32768
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 38 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
Copying /dev to guest volume... .
tar: Removing leading `/' from member names
tar: /dev/log: socket ignored
Copying /etc to guest volume... .
tar: Removing leading `/' from member names
Copying /home to guest volume... .
tar: Removing leading `/' from member names
Copying /var to guest volume... .
tar: Removing leading `/' from member names
tar: /var/lib/ntp/dev/log: socket ignored
tar: /var/run/.resmgr_socket: socket ignored
tar: /var/run/.nscd_socket: socket ignored
tar: /var/spool/postfix/private/rewrite: socket ignored
tar: /var/spool/postfix/private/bounce: socket ignored
tar: /var/spool/postfix/private/defer: socket ignored
tar: /var/spool/postfix/private/trace: socket ignored
tar: /var/spool/postfix/private/verify: socket ignored
tar: /var/spool/postfix/private/proxymap: socket ignored
tar: /var/spool/postfix/private/smtp: socket ignored
tar: /var/spool/postfix/private/relay: socket ignored
tar: /var/spool/postfix/private/error: socket ignored
tar: /var/spool/postfix/private/local: socket ignored
tar: /var/spool/postfix/private/virtual: socket ignored
tar: /var/spool/postfix/private/lmtp: socket ignored
tar: /var/spool/postfix/private/anvil: socket ignored
tar: /var/spool/postfix/private/maildrop: socket ignored
tar: /var/spool/postfix/private/cyrus: socket ignored
tar: /var/spool/postfix/private/uucp: socket ignored
tar: /var/spool/postfix/private/ifmail: socket ignored
tar: /var/spool/postfix/private/bsmtp: socket ignored
tar: /var/spool/postfix/private/vscan: socket ignored
tar: /var/spool/postfix/private/procmail: socket ignored
tar: /var/spool/postfix/public/cleanup: socket ignored
tar: /var/spool/postfix/public/flush: socket ignored
tar: /var/spool/postfix/public/showq: socket ignored
Copying /root to guest volume... .
tar: Removing leading `/' from member names
Copying /srv to guest volume... .
tar: Removing leading `/' from member names
Copying /tmp to guest volume... .
tar: Removing leading `/' from member names
Copying /opt/local to guest volume... .
tar: Removing leading `/' from member names
Copying /usr/local to guest volume... .
tar: Removing leading `/' from member names
Make-GuestVol ends.
If you review the logic of the Rexx tool shown above, you'll see that the tool:
- Formats and mounts the guest volume (minidisk 202, also known as /dev/dasdc1)
- Copies the R/W directories from the base volume (minidisk 201, also known as
/dev/dasdb1) to the guest volume
- Discards, from the guest volume's copy of /etc only, the SUSE-provided script that on a
conventional system checks the root file system. This check is never needed because a guest
server never has the root file system in R/W mode.
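The heart of make-guestvol is a tar pipeline that copies each R/W directory to the same relative path under /guestvol. The sketch below is a simplified shell version that works against arbitrary source and target roots, so it can be tried on scratch directories. copy_tree is a hypothetical name; it uses tar -C instead of the subshell cd of the original and omits the original's -l and -S flags.

```sh
#!/bin/sh
# copy_tree: hypothetical helper. Copies each directory named in the
# remaining arguments from SRC_ROOT to the same relative path under
# DEST_ROOT, using a tar pipeline like the one in make-guestvol.
# Paths are relative to SRC_ROOT (for example "etc" or "usr/local").
copy_tree() {
    src_root=$1
    dest_root=$2
    shift 2
    for dir in "$@"; do
        echo "Copying $dir ..."
        # -p on extraction restores the recorded permissions; symbolic
        # links are archived as links, not followed.
        tar -C "$src_root" -cf - "$dir" | tar -C "$dest_root" -xpf - || return 1
    done
}
```

On the base server, the equivalent of make-guestvol's loop would be copy_tree / /guestvol dev etc home var root srv tmp opt/local usr/local, run as root after /dev/dasdc1 has been formatted and mounted.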
At this point you have completed the installation of Linux on the base server, LINXBASE,
which owns the golden base volume. Guest servers can now be created and can use the DASD in
shared R/O mode.
Implementation of a Guest Server
Now that the base volume has been created, it can be exploited. To do so, create a Linux server
that will use the read-only boot device as well as its own copy of the necessary read/write
materials. The steps are detailed below; you must perform each of them every time you create a
new Linux server:
-
Create or update the server so that the VM directory entry matches the supported
configuration:
USER LNXGUEST password 64M 256M G
* 64M is acceptable to run SLES 9.
ACCOUNT acntcode distcode
MACHINE ESA
IPL CMS
CONSOLE 009 3215 T OPERATOR
SPOOL 00C 2540 READER *
SPOOL 00D 2540 PUNCH A
SPOOL 00E 1403 A
LINK MAINT 190 190 RR
LINK MAINT 19E 19E RR
LINK MAINT 19D 19D RR
NICDEF C00 TYPE QDIO DEVICES 3 LAN SYSTEM INTRANET
MDISK 191 3390 start-cyl 1 MR
* dsk 200 Swap 409600 512-byte blocks = 200M
MDISK 200 FB-512 V-DISK 409600 MR
* Link to "golden" Linux base-volume MUST be in R/O mode ALWAYS
LINK LINXBASE 201 201 RR
MDISK 202 3390 start-cyl 150 MR
The server will find the kernel and most of its applications on the read-only 201 disk.
Since these materials are never updated by the guest server, a large number of Linux servers
can share this disk, saving DASD for more valuable uses.
The size of the 202 disk must match the size of the 202 disk that was used to create the
model server. In the example shown here the 202 disk is the minimum size, 150 cylinders, which
is enough space for the basic SLES materials. Any space needed for user home directories is in
addition to this value. The size of this device can be increased at a later date, if needed, by
following procedures documented on the LinuxVM.org web site.
- Log on to the Linux guest server's VM userid. On its 191 A-disk, create a PROFILE EXEC that
IPLs the Linux 201 boot disk.
-
Recall that during the post-installation steps the tool make-guestvol copied several
directories to the base server's 202 disk. A copy of that disk will now be created on the guest
server's 202 disk. Use the CP LINK command to gain access to the model's 202 disk in R/O mode:
CP LINK LINXBASE 202 1202 RR
Use DDR to copy the model's 202 disk to the new server's 202 disk:
DDR
SYSPRINT CONS
INPUT 1202 DASD
OUTPUT 202 DASD
COPY ALL
Detach the model's 202 disk when done:
CP DETACH 1202
- Run the PROFILE EXEC, which will start the new Linux server for the first time.
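The V-DISK size on the MDISK 200 statement is given in 512-byte blocks, so each MiB of swap needs 2048 blocks. This small sketch does the conversion; vdisk_blocks is a hypothetical name.

```sh
#!/bin/sh
# vdisk_blocks: hypothetical helper converting a swap size in MiB to the
# number of 512-byte blocks to code on an "MDISK ... FB-512 V-DISK" statement.
vdisk_blocks() {
    mib=$1
    echo $((mib * 1024 * 1024 / 512))
}
```

For example, vdisk_blocks 200 prints 409600, the value used in the directory entry above.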
At this point you're booting a functionally identical copy of the model base-volume server,
and all the attributes of the model server are still in effect. For example, the IP address and
host name used by the base server are still as configured when Linux was first installed. For
this reason the base-volume server (LINXBASE) cannot run simultaneously with the guest-volume
server (LNXGUEST) until the final customizations below are complete.
Final Customizations
The files that Linux uses to uniquely identify this server are located on R/W disk space, in
particular in directory /etc. In the next few steps you'll perform the final customization for
the guest server by changing these files.
- Allocate an IP address for the new server. This address must be usable on the same Virtual
Switch as the original base server. (After the server is completely operational you can
change the IP address, or even the Virtual Switch used, if you see fit.)
- The new server is still using the base server's IP address and host name, so telnet to the
IP address or DNS name of the base server. For example:
telnet linxbase.ca.com
- Log on as root. Specify the root password that you used when you first installed Linux onto
LINXBASE.
- Start YaST. Configure the IP address and host-name for the new Linux server. Exit from
YaST.
-
Restart Linux with the reboot command:
reboot
- Telnet to the guest Linux server using its new name. For example:
telnet lnxguest.ca.com
-
Log on as root once more and change root's password:
passwd
Give this password to the owner of the server.
These are the minimal final customizations. However, since directories such as /etc are
read/write, and virtually all customization of Linux is controlled by files in /etc, you can,
if need be, make each server completely unique.
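Note: Should you find a configuration file (or virtually any other file) that must differ
between servers but resides in a directory on the R/O shared disk space, you can use a Linux
symbolic link to redirect that file from its current location to an R/W directory, such as
/etc. Because the guest servers cannot write to the shared disk, the link must be created on
the base server, LINXBASE.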
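As an illustration of that technique, the sketch below moves a file out of a directory into an /etc-like directory and leaves a symbolic link behind. It runs against whatever paths it is given, so it can be tried on scratch directories first. On a real system the equivalent commands would be issued on LINXBASE, and each guest would also need its own copy of the file in its /etc, since every guest's /etc is a separate copy on its guest volume. The function name redirect_to_etc is hypothetical.

```sh
#!/bin/sh
# redirect_to_etc: hypothetical helper. Moves FILE from its current
# directory into ETC_DIR and replaces it with a symbolic link, so that
# the name in the shared directory resolves to the per-server copy.
redirect_to_etc() {
    file=$1
    etc_dir=$2
    name=$(basename "$file")
    mv "$file" "$etc_dir/$name" || return 1
    ln -s "$etc_dir/$name" "$file"
}
```

Because the link uses an absolute target, each server resolves it against its own /etc.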