Notes on using an STK Iceberg/IBM RVA on Linux zSeries

This information was contributed by Jim Sibley on July 19, 2002.

We have been successfully using the STK Iceberg/IBM RVA as DASD storage for Linux on the S/390-zSeries in LPAR and VM environments in our software lab since July 2000. It behaves much like any other DASD device, with a few exceptions, which I discuss here.

The STK Iceberg, marketed by IBM as the RVA, is a large-scale direct access storage device using the IBM OS (MVS/VM/zOS/zVM) Extended Count Key Data (ECKD) recording format. It provides hardware data compression to minimize the size of the "backend" (real disk) used for data recording, and RAID5 protection for the recorded data. It appears to Linux as a 3990 controller, and the volumes are most often 3390-3 images, but may be other 3390 sizes or 3380 images. The information below is for 3390-3 images.

Only one volume on the RVA we used is an MVS volume, used for MVS IXFP communication and reporting from an MVS LPAR. All others are Linux formatted with the ext2 file system. The MVS IXFP LISTCFG DEV report was used to show the backend space occupied.

There are, however, three areas to which we have to pay close attention: data compression, space usage, and special error notifications.
I. Data Compression

The RVA achieves some economy by compressing data before it is recorded on the actual backend disk. Hardware data compression reduces the logical data to a smaller physical data package for recording, and the hardware timing gaps found on physical disk images are eliminated. ECKD "wastes" a lot of disk space on these timing gaps, and the smaller the blocks written, the more "waste" there is. A record on disk is made up of:

    R0    defines the length of the track
    Rn    is the length of the key/data section
    Key   is the data key (only written for keyed data)
    Data  is the data block, which can be variable in size
    ...   are physical gaps on the track to allow for rotational delay

Keyed data would then be recorded on an ECKD device as

    R0 ... R1 ... Key ... Data ... R2 ... Key ... Data

but recorded on the RVA as

    R0R1KeyDataR2KeyData

Non-keyed data would be recorded on an ECKD device as

    R0 ... R1 ... Data ... R2 ... Data ... R3 ... Data

but recorded on the RVA as

    R0R1DataR2DataR3Data

RVA Data Compression for Linux

For Linux, the RVA compression may be less effective than for other operating systems. The capacity of a 3390-3 logical image is 2.29 GB when preformatted in 4k blocks:

    12 blocks/track x 15 tracks/cylinder x 3,339 cylinders/volume
      = 601,020 4k blocks per volume
      = 2.29 GB capacity

All this Linux preformatting causes the tracks to be non-empty. They occupy 102.6 MB of space on the real disk, even though Linux reports the volumes to be empty. (RVA usage was obtained with the IXFP SIBBATCH LISTCFG DEV report on an MVS system that can access the RVA.) So 256 "empty" Linux volumes require 25.7 GB of real disk space, which is a net capacity load (NCL) of 10.9% on an RVA with 235.76 GB of real disk storage.

In addition, a Linux file system such as the standard ext2 has its own file system overhead. You can use tune2fs -l /dev/dasd/a1 to see this overhead.
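The capacity, empty-volume and ext2 reserve figures in this section can be recomputed with a short shell sketch. It only redoes the arithmetic and does not query the RVA or IXFP. It assumes that GB means 2^30 bytes (which reproduces the 2.29 GB figure), that every empty volume costs the 102.6 MB measured with IXFP, and that ext2 keeps the mke2fs default reserve of 5% of its 601,017 blocks for root. The function names are made up for illustration.

```shell
#!/bin/bash
# Sketch only: recompute the 3390-3 capacity, "empty volume" backend
# usage and ext2 reserve from the figures quoted in the text.
# Function names are hypothetical; nothing here talks to the RVA.

# 4k blocks per volume = blocks/track * tracks/cylinder * cylinders/volume
# Usage: blocks_per_volume blocks_per_track tracks_per_cyl cylinders
blocks_per_volume() {
    echo $(( $1 * $2 * $3 ))
}

# Convert a count of 4k blocks to GB, assuming 1 GB = 2^30 bytes.
blocks4k_to_gb() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b * 4096 / 1073741824 }'
}

# Backend GB and net capacity load (NCL, percent of real disk) used by
# N "empty" volumes, each occupying empty_mb MB on the real disk.
# Usage: empty_volume_ncl N empty_mb real_gb
empty_volume_ncl() {
    awk -v n="$1" -v mb="$2" -v real="$3" 'BEGIN {
        gb = n * mb / 1024
        printf "%.2f GB %.1f%%\n", gb, 100 * gb / real
    }'
}

# ext2 reserved blocks, assuming the mke2fs default of 5%.
# Prints "reserved user", where reserved = floor(total * pct / 100).
# Usage: ext2_reserve total_blocks [pct]
ext2_reserve() {
    reserved=$(( $1 * ${2:-5} / 100 ))
    echo "$reserved $(( $1 - reserved ))"
}

blocks=$(blocks_per_volume 12 15 3339)
echo "4k blocks per 3390-3 volume: $blocks"
echo "capacity: $(blocks4k_to_gb "$blocks") GB"
echo "256 empty volumes: $(empty_volume_ncl 256 102.6 235.76)"
echo "ext2 reserved/user blocks: $(ext2_reserve 601017)"
```

Run as is, it reports 601020 blocks, 2.29 GB, 25.65 GB at an NCL of 10.9% (the text rounds the backend figure to 25.7 GB), and 30050 reserved and 570967 user blocks, which agrees with the tune2fs breakdown for a 3390-3 volume.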
For a 3390-3 volume, tune2fs shows:

    601,017 4k blocks (3 blocks are lost to overhead)
     30,050 blocks are reserved
    570,967 blocks for user data and file system overhead, or 2.17 GB of usable capacity

Linux data provides fewer opportunities for compression than much MVS and VM data. MVS and VM records are often padded with blanks or other recurring data, which the RVA can squeeze out. Under Linux, a tab character frequently stands for multiple blanks, and a record is terminated by a character rather than by fixed-length padding. Compression is therefore less effective, much as binary data is less compressible than text data under MVS or VM.

For example, the SuSE SLES7 IPL volume is not very compressible. With KDE and networking, it is typically on the order of 1.3 GB and occupies 751.2 MB on the backend, a compression ratio of about 1.7 to 1. To take this a bit further, if you split the IPL volume into three so that the /var directory is on one volume, the /usr directory is on a second, and the rest remains on the IPL volume, you would see something like:

    directory   reported size   recorded size   compression ratio
    /           314.1 MB        255.9 MB        1.23 to 1
    /usr        790.0 MB        438.8 MB        1.80 to 1
    /var        201.1 MB        259.6 MB        0.77 to 1

The last result is a bit surprising, as "df" shows the volume to be about 15% full. However, much of the volume is empty, and an "empty" volume already occupies 102.6 MB on the backend. More of the backend space reported for that volume is occupied by formatted but empty tracks than by recorded data.

II. Space Usage

The IXFP software communicates between MVS and the RVA so that empty tracks need not be written, a track "deleted" from the OS VTOC no longer occupies space on the RVA, and space collection is periodically initiated from MVS. Since an empty track is not written, there are additional savings from the reduced ECKD overhead. There is similar software for VM. However, the RVA software does not "know" about the Linux disk format or Linux file systems, so when a file is removed, no real space is given back.
The file system changes the lists or queues the blocks are in, but the blocks themselves are not rewritten, so the available block queue can account for considerable real space occupied on the RVA. Even with the Compatible Disk Layout (CDL), where an OS VTOC is written, all tracks are stored, because the VTOC shows the Linux area as one large data set with no free space on the volume.

If you were to fill up a volume with some random data, IXFP might show about 1818 MB of compressed data on the backend, and df would report:

    Filesystem   1k-blocks  Used  Available  Use%  Mounted on
    /dev/dasdd1    2366248    20          0  100%  /mnt/3f63

If you then removed the files from the volume, Linux df would report

    Filesystem   1k-blocks  Used  Available  Use%  Mounted on
    /dev/dasdd1    2366248    20    2246028    1%  /mnt/3f63

but IXFP would still show 1818 MB. No space has been returned on the RVA; the Linux file system chains have just been rearranged.

You must explicitly write compressible data to the RVA to get space back. You might use the Linux dd command to write a compressible file, then remove the file:

    dd if=/dev/zero of=/mnt/3f63/f1 bs=1k count=2246028

where

    /dev/zero       is a device supplied by Linux that returns zeroes
    /mnt/3f63/f1    is a file on the volume you want to compress (mounted at /mnt/3f63)
    bs=1k           matches the 1k blocks reported by the df command
    count=2246028   is the number of available blocks reported by df

Once the file /mnt/3f63/f1 is removed, IXFP would then report about 103.8 MB used on the volume. Note that the full volume must be written before the zeroed file is removed; otherwise Linux may just reuse the zeroed available blocks and not clear the whole volume. There does not seem to be any advantage to writing blanks rather than zeroes: they are reported to occupy the same backend space, and the write rates for a single stream are very much the same.

The following bash script might be a basis for compressing the free space on a Linux system:

    #!/bin/bash
    # sample script to write zeroes on the end of all
    # mounted ext2 volumes, then remove the file to compress RVA
    # volumes. jlsibley@us.ibm.com
    #
    # No warranty given or implied by the author or IBM.
    # Use at your own risk
    #
    # 1) display the local ext2 file systems
    #    (-k: sizes in 1k blocks, -P: one line per file system)
    #      df -l -k -P -t ext2
    # 2) remove the heading and root
    #    (assumes root is the first file system listed)
    #      | tail -n +3
    # 3) squeeze out unwanted blanks
    #      | tr -s ' '
    # 4) use gawk to generate a line for each mounted
    #    file system of the form
    #
    #      dd bs=1k count="$4" if=/dev/zero of="$6"/zeroes; rm "$6"/zeroes
    #
    #    where $4 is the fourth column (available space)
    #          $6 is the sixth column (mount point)
    #
    #      | gawk -F ' ' '{print "dd bs=1k count="$4" if=/dev/zero of="$6"/zeroes; rm "$6"/zeroes"}'
    #
    # 5) execute the generated commands
    #      | /bin/bash
    #    Delete this stage if you just want to see the generated
    #    commands, or redirect them to a file instead.
    #
    # The actual command is a single pipeline (continued over three lines):
    df -l -k -P -t ext2 | tail -n +3 | tr -s ' ' | \
    gawk -F ' ' '{print "dd bs=1k count="$4" if=/dev/zero of="$6"/zeroes; rm "$6"/zeroes"}' | \
    /bin/bash

Caution

This technique should be used with caution, because other processes may also be writing to the volume and may terminate with a "No space left on device" message before the zeroed file is removed. We are very careful to ensure that no other processes are writing to the volume when we use this technique. Use at your own risk, as no guarantee or warranty of the results is given or implied.

III. Special Error Notifications

Though the RVA appears to Linux as a 3990 controller, there are some differences. The RVA can return a "long busy" for an I/O request. The usual action is to retry, and Linux handles this correctly, though it puts a message like the following in the /var/log/messages file:
    May 31 23:59:00 svlxdbt1 kernel: dasd_erp(3990): /dev/dasda(94:0),3d21@0xabc:Perform logging requested
    May 31 23:59:00 svlxdbt1 kernel: dasd:/dev/dasda(94:0),3d21@0xabc:ERP successful

As of this writing, I am unaware of any mechanism under Linux on zSeries to notify Linux when the RVA is over 90% full. Under VM, I would assume that VM intercepts such a message and displays it to the operator. In an LPAR, there is no adequate warning, and the RVA could fill up and stall. For this reason, we frequently monitor the Net Capacity Load (NCL) of the RVA.

Summary

The STK Iceberg/IBM RVA can be used as a storage device under Linux on the IBM S/390-zSeries. It behaves like a 3990/3390 or 3990/3380 device. However, assumptions about data compressibility, space recovery techniques, and space monitoring need to be rethought and revised.

Disclaimer: All results were obtained on a SuSE SLES7 patchlevel 1 Linux system, an IBM 9672-G6 processor in LPAR mode, and an RVA x82 with 512 3390-3 volumes and a real capacity of 236 GB. Results may vary from installation to installation.

Jim Sibley
07/18/2002
jlsibley@us.ibm.com