I first heard of storage hierarchies when learning the ideas behind the Tivoli Space Manager for AIX (formerly known as HSM, Hierarchical Storage Manager). There is similar software, such as SAMfs for Solaris or OTG Diskextender for Microsoft's operating systems, and more.
Hierarchical storage refers to the fact that there is a hierarchy of availability of data to a system. At the high end of the hierarchy is data in the CPU; at the low end is data on removable media.
This article suggests a way to deal with removable media while keeping them known to the system. It is not my objective to imitate one of the trademarked products mentioned above (they concentrate on moving files to tape). However, there are some similarities: HSM leaves a stub in the filesystem that indicates where to find the original file after it has been moved to tape or to another lower level of the hierarchy.
Short answer: me - for my video and audio recordings.
Long answer: Everybody who systematically archives data on external media.
The data most frequently stored on external media, besides backups, is probably TV recordings, radio plays, MP3 music, etc., written to CD-ROM or DVD. You do not want to put everything on hard disk at, say, 1 Euro/GB when you can use DVDs at 0.1 Euro/GB that you can take to friends or to the DVD player in the living room.
Another reason for wanting this is file serving. I have five hard disks and three DVD drives in my system (this concept does not need to stop at the system boundary, but that assumption makes the explanation simpler). Every time I am asked to put a DVD into a drive, I try the wrong drive first. I would like to access the media without caring about the drive. I want to replace media according to the media contents, not according to the drive properties.
Too theoretical? Here is an example: why do I prefer SuSE Linux from DVD instead of CD-ROM? I start the update and go away for an hour. I do not like to be asked to exchange the CD-ROM every 10 minutes. Even if I had six drives, one for each CD-ROM, I would still have to exchange them in the first named drive (or replace the mount path every time). Here I describe how to mount the correct volume no matter which drive is being used.
A first running implementation of this exists. I will try to improve the usability of the program interfaces, which has not been a focus of the implementation effort so far.
I also restricted myself to ISO filesystems. The main reason was that I found a very quick way to generate hash numbers from the contents without mounting the volumes. See below for details.
In principle there is no need to consider a single system only, but other things need to be done first. The current focus is to mount available volumes quickly and to ask for those that are not accessible.
There are circumstances (e.g. 'ls' in color mode, which examines the targets of the links it lists) where you do not want to be asked for media changes and would prefer the missing media to be ignored silently.
This is best described with an example. I start with a DVD that I burned myself and put it into a drive. I run the isoregister program, which generates the required links (yes, registration is currently manual). Then I look at my link directory:
# ls /video
.  ..  Godfell1.avi  Godfell2.avi
# ls -l /video
total 1024
drwxrwxrwt 9 aneuper users 3256 2004-08-14 15:28 .
drwxr-xr-x 3 root root 992 2004-08-13 20:24 ..
lrwxrwxrwx 1 aneuper users 58 2004-03-20 18:20 Godfell1.avi -> /hash/ae31386abe053e305ceb2b932e7bc005c6d89b70/Godfell1.avi
lrwxrwxrwx 1 aneuper users 58 2004-03-20 18:20 Godfell2.avi -> /hash/ae31386abe053e305ceb2b932e7bc005c6d89b70/Godfell2.avi
# mplayer /video/Godfell*
Whenever I now ask for /video/Godfell1.avi, the system tries to access the media with the SHA1 hash number ae31386abe053e305ceb2b932e7bc005c6d89b70. If the media is in a drive, I get its contents within a few seconds; otherwise I am asked to insert it.
Please note that this hash number is NOT computed over the complete image, but over the table of contents only. (Therefore it does not ensure the integrity of the content.)
I do not want to talk about religion here, but I do believe in the KISS principle:
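Because the hash number is stored in the link itself, you can find out which volume a file lives on without touching the media at all. The following is a minimal sketch, not part of the current implementation; the function name link_hash and the MOUNTDIR variable are my own assumptions, with /hash taken from the example above.

```shell
#!/bin/sh
# Hypothetical helper: print the volume hash a link refers to.
# MOUNTDIR is assumed to be the autofs mount point used in the links.
MOUNTDIR="${MOUNTDIR:-/hash}"

link_hash() {
    # readlink prints the stored target and does not follow it,
    # so no mount request is triggered.
    target=$(readlink "$1") || return 1
    case "$target" in
        "$MOUNTDIR"/*/*)
            rest=${target#"$MOUNTDIR"/}   # strip the leading "/hash/"
            echo "${rest%%/*}"            # keep everything before the next "/"
            ;;
        *)
            return 1                      # not one of our links
            ;;
    esac
}
```

Calling link_hash /video/Godfell1.avi would print the hash number shown in the listing above, whether or not the DVD is in a drive.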
Keep It Short and Simple
Therefore I restricted myself to four parts that the implementation builds on:
I suggest keeping a conventional symbolic link containing a hash number. The hash number is a reference to the media and should be unique. Each file on the media is available to the system by accessing:
/mountdir/hashnumber/mediapath
The mountdir is arbitrary in principle, but once you have selected it and generated links to it, you will hardly want to change it. The hash number is generated using the programs isoinfo and sha1sum. We may depend on certain releases of isoinfo, since changes in the layout of the '-l' output influence the checksum. The mediapath is identical to the path of the file on the filesystem when standard mount options apply.
The most vital part is mounting the media automatically when it is presented to the system.
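The hash generation can be sketched as a small function. This is an illustration under assumptions: the name volume_hash and the overridable ISOINFO and HASHSUM variables are mine, and the readability check is an addition that the automount script in this article does not perform.

```shell
#!/bin/sh
# Hypothetical helper: hash the directory listing of an ISO volume.
# ISOINFO and HASHSUM may be overridden, e.g. to use md5sum instead.
ISOINFO="${ISOINFO:-isoinfo}"
HASHSUM="${HASHSUM:-sha1sum}"

volume_hash() {
    # $1 is a device node or an image file.
    # Refuse unreadable arguments; otherwise we would hash empty input.
    [ -r "$1" ] || return 1
    # Only the listing is read, so this stays fast even for a full DVD.
    "$ISOINFO" -l -i "$1" 2>/dev/null | "$HASHSUM" | cut -d' ' -f1
}
```

For example, volume_hash /dev/scd0 prints the hash number of the disc in that drive. The listing is piped straight into the checksum program so that the result is byte for byte the checksum of what isoinfo printed.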
For my current implementation I use the following line in /etc/auto.master:
/hash program:/etc/auto.hash
Needless to say, there should be an executable and correctly working /etc/auto.hash. I am currently testing this one:
#!/bin/sh
#
# This is currently for testing purposes only.
# autofs calls this program map with the requested key (the hash) as $1
# and expects the mount options and location on stdout.
#
KEY="$1"
LISTDIR=/hashlist
DEVICELIST="/images/*.iso /dev/scd0 /dev/hdc /dev/hdh"
# I ranked my drives by performance here,
# which is equivalent to how often I use them.
# The first hit will be returned (should we return all?)
#
if [ ! -d "$LISTDIR" ]
then
    mkdir "$LISTDIR"
fi
#
# Prefer sha1sum; fall back to md5sum if it is missing.
if [ -x /usr/bin/sha1sum ]
then
    HASHSUM=/usr/bin/sha1sum
elif [ -x /usr/bin/md5sum ]
then
    HASHSUM=/usr/bin/md5sum
else
    exit 1
fi
if [ -x /usr/bin/isoinfo ]
then
    ISOINFO=/usr/bin/isoinfo
else
    exit 1
fi
#
# $DEVICELIST is deliberately unquoted so that the *.iso pattern expands.
for DEVICE in $DEVICELIST
do
    HASHID=$($ISOINFO -l -i "$DEVICE" | $HASHSUM | /usr/bin/cut -d' ' -f1)
    # Keep the directory listing as reference information for this volume.
    if [ ! -f "$LISTDIR/$HASHID" ]
    then
        $ISOINFO -l -i "$DEVICE" >"$LISTDIR/$HASHID"
    fi
    if [ "$HASHID" = "$KEY" ]
    then
        # Image files need the loop option, real devices do not.
        case $(dirname "$DEVICE") in
            /dev*) printf '%s\t%s\n' "-fstype=iso9660,ro" ":$DEVICE"
                ;;
            *) printf '%s\t%s\n' "-fstype=iso9660,ro,loop" ":$DEVICE"
                ;;
        esac
        exit 0
    fi
done
#/usr/X11R6/bin/xprompt -t $LISTDIR/$HASHID
exit 1
The register script helps you to collect the information in a big link base, which is a kind of database. How it should behave really depends on the media contents that you want to register:
Feel free to suggest new options (as long as you explain them).
The reference directory is used to store information that helps you to find the requested CD-ROM. This distribution comes with a default collection of the CD-ROM/DVD header information (containing Volume ID, etc.) and a listing of the directory structure (done by isoinfo).
This information pops up when you are asked to insert the media. Its presence is not vital if you have other means to identify the media (I recognise the media by the requested files, not by the hash number).
Whenever you do something that seems not to have been there before, you think afterwards that it could have been done better. However, I spent less time on putting this together than on looking for an existing and working solution on the internet.
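To make the idea of the link base concrete, here is a minimal sketch of how links could be generated for one volume. It is not the isoregister program itself; the function name register_links, its three arguments, and the choice of a flat link directory are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: register_links HASH SRCDIR LINKDIR
#   HASH     hash number of the volume
#   SRCDIR   where the volume is mounted right now
#   LINKDIR  link directory, e.g. /video
MOUNTDIR="${MOUNTDIR:-/hash}"

register_links() {
    hash=$1
    srcdir=$2
    linkdir=$3
    mkdir -p "$linkdir" || return 1
    # List all regular files relative to the volume root.
    # File names containing newlines are not handled by this sketch.
    (cd "$srcdir" && find . -type f) | while IFS= read -r rel
    do
        rel=${rel#./}
        name=$(basename "$rel")
        # Test -L first: it does not follow an existing link,
        # so no media request is triggered. Clashes are skipped.
        if [ -L "$linkdir/$name" ] || [ -e "$linkdir/$name" ]
        then
            continue
        fi
        ln -s "$MOUNTDIR/$hash/$rel" "$linkdir/$name"
    done
}
```

With the volume temporarily mounted on /mnt, a call such as register_links ae31386abe053e305ceb2b932e7bc005c6d89b70 /mnt /video would produce links like the ones in the listing of /video shown earlier.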
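As an illustration of how the stored information could be presented, the sketch below prints the beginning of the saved listing for a requested hash number. The function name describe_volume is an assumption; LISTDIR matches the directory used by the automount script. The text goes to standard error, because the standard output of an autofs program map is read as the map entry.

```shell
#!/bin/sh
# Hypothetical helper: tell the user which volume is wanted.
LISTDIR="${LISTDIR:-/hashlist}"

describe_volume() {
    key=$1
    if [ -f "$LISTDIR/$key" ]
    then
        echo "Please insert the volume with these contents:" >&2
        # The first lines of the isoinfo listing are usually enough.
        head -n 20 "$LISTDIR/$key" >&2
    else
        echo "Please insert volume $key (no listing stored)" >&2
        return 1
    fi
}
```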
The point I am not happy about is the hash number. It is too proprietary from my point of view, since it allows only ISO formats in the current version. I do not know a general solution yet. Do not forget: the identifier must be quick to obtain from the media, and it must contain a component that identifies the media reliably.
Initially I wanted a human-readable identifier, but I found that the first three CDs I took had nearly identical ISO headers. So I needed something unique, and I immediately thought of checksums.
You might wonder how long it would take to calculate a checksum of a CD-ROM. You are right to ask, but nobody requires the whole CD-ROM to be read. I suggest computing the checksum over the directory structure. isoinfo can read the directory structure without mounting the volume. This is fast.
But there is a weak point: a minor change in the output layout of isoinfo, and all checksums are wrong. Please suggest a better solution if you know one. Until then, I suppose it is best to check isoinfo against a standard volume.
The way the hash numbers are currently implemented in the interface, only minor changes are necessary to use it with automated libraries, either disk or tape. You can easily replace the hash number with the cartridge label or the slot number (if you lack a barcode reader).
I suggested accessing the external media through a symbolic link. The media does not need to be present for you to see what it contains. Furthermore, a link does not take much space and can carry a lot of information in its filename. I think there is little improvement possible here.
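One way to notice such a layout change is to keep one reference image together with the listing that was saved from it, and to compare a fresh listing against the saved one after every isoinfo update. This is only a sketch of that idea; the function name check_isoinfo_layout and both of its arguments are assumptions.

```shell
#!/bin/sh
# Hypothetical check: does isoinfo still print the same listing?
ISOINFO="${ISOINFO:-isoinfo}"

check_isoinfo_layout() {
    # $1: reference image (or device holding the reference volume)
    # $2: listing saved earlier from that same volume
    # cmp -s is silent; its exit status is 0 only if both are identical.
    "$ISOINFO" -l -i "$1" 2>/dev/null | cmp -s - "$2"
}
```

If the check fails, the hash numbers of all registered volumes have to be considered invalid for the installed isoinfo release.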