IBM's RDAC for Solaris (failover for the DS4000 line)
Note: Up-to-date details on RDAC for Solaris at CIT are kept on our wiki. Unfortunately, you currently need a UVM NetID to access it, so write to me if you can't get to it.
CIT is looking at deploying a storage array from IBM's DS4000 line. I am testing RDAC for Solaris on diamondback, one of our WebCT servers.
To install RDAC on Solaris 9, go to www.ibm.com, then Downloads, then Storage, select the DS4500 (or whatever model you have), and look for "storage manager". Under the entry labelled "IBM DS4000 Storage Manager for Solaris", the one you want is "DS4000 Storage Manager for Solaris (single package)". Despite the name, it actually splits out the available modules, one of which is RDAC, as separate .pkg files that you can install with pkgadd.
Before you install them, though, make sure the host is up to date on its Recommended and Sun Alert patchsets. This takes a while :-(
I have put the tarball in /:/fixdist/solaris.
I only installed the RDAC package, assuming that we would run SMagent/SMclient from a Linux box. If someone wants to give SM a try under Solaris, be my guest, but you'll need to get the X clients working first.
RDAC under Solaris is much more Solaris-y than under Linux. There is no "mppUtil" command; instead there is an "rdacutil" command, which appears to be fairly limited, and there isn't much documentation that I can find (except ancient material on docs.sun.com for Solaris 2.5). Utilities and config files live under /etc/raid, so you'll need to add /etc/raid/bin to your PATH.
After installing the RDAC package, do a reconfiguration reboot with boot -r. You should then see the DS4000 subsystems listed in /etc/raid/rdac_address, along with a string indicating controller ownership. My rdac_address file shows that the host can see zoosan7, so I can run:
[11:59 AM root@diamondback raid]# rdacutil -i zoosan7
This returns:
zoosan7: active/passive
Active controller a (1T50869436) units: none
Active controller b (1T50869529) units: none
rdacutil succeeded!
So at least it can see the subsystem. If you don't get this, check your zoning and the like. Use QLogic's scli (on diamondback, in /opt/QLogic_Corporation/SANblade_CLI/) to display the WWNs and names of the devices the HBAs can see. If "IBM 1742-900" isn't listed, the host can't see the DS4500 (FAStT900). (The product ID will probably be different for the DS4800.)
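The rdacutil -i output shown above is regular enough to check from a monitoring script, for example to confirm that both controllers show up and that the command succeeded. Here is a minimal Python sketch of such a parser. It is based only on the one sample above: the function name is hypothetical, and how rdacutil lists units when a controller owns some (whitespace- or comma-separated) is an assumption, since I've only seen "none".

```python
import re

# One line per controller, e.g. "Active controller a (1T50869436) units: none"
CTRL_RE = re.compile(
    r"^(?P<state>\w+) controller (?P<name>\w+) "
    r"\((?P<serial>[^)]+)\) units: (?P<units>.*)$"
)


def parse_rdacutil_info(text):
    """Parse the output of `rdacutil -i <array>` into a dict.

    Assumes the first line is "<array>: <mode>" and that any owned units are
    listed as a whitespace- or comma-separated list (only "none" was observed).
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    array, mode = lines[0].split(":", 1)
    result = {"array": array, "mode": mode.strip(), "controllers": {}, "ok": False}
    for line in lines[1:]:
        match = CTRL_RE.match(line)
        if match:
            units = match.group("units").strip()
            result["controllers"][match.group("name")] = {
                "state": match.group("state"),
                "serial": match.group("serial"),
                "units": [] if units == "none" else units.replace(",", " ").split(),
            }
        elif line == "rdacutil succeeded!":
            result["ok"] = True
    return result
```

A check built on it could be as simple as `info["ok"] and len(info["controllers"]) == 2` for a dual-controller array.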
In a dual-HBA configuration, also make sure that each HBA can see only one controller; use scli to verify this. If a single HBA can see both controllers, you'll have problems during failover.
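The rule above can be written down as a small check. This Python sketch does not parse scli output (I'm not assuming its format); instead it takes a hypothetical hand-transcribed mapping from each HBA to the set of DS4000 controllers it can see. The second check, that the HBAs between them reach distinct controllers, is my own inference about why one-controller-per-HBA matters, not something from IBM's docs.

```python
def zoning_problems(visible):
    """Check a multi-HBA setup against the one-controller-per-HBA rule.

    `visible` maps an HBA name to the set of controller IDs it can see,
    transcribed by hand from scli (hypothetical input format).
    Returns a list of human-readable problems; empty means it looks sane.
    """
    problems = []
    for hba, ctrls in sorted(visible.items()):
        if len(ctrls) != 1:
            problems.append(f"{hba} sees {len(ctrls)} controllers, expected 1")
    # If every HBA sees exactly one controller but they all see the same one,
    # the other controller has no path at all, so failover can't work.
    reached = set().union(*visible.values())
    if len(visible) > 1 and len(reached) < len(visible):
        problems.append(
            f"{len(visible)} HBAs only reach {len(reached)} distinct controller(s)"
        )
    return problems
```

For example, `zoning_problems({"qla2300(0)": {"ctrl-a"}, "qla2300(1)": {"ctrl-b"}})` returns an empty list; the controller labels here are placeholders.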
To add disks to the system, run the rdac_disks command, which takes a minute or so. If it has problems finding disks, it will tell you. If it says nothing, run "format" to check whether new disks are available to the system; if they are, partition them and newfs as you normally would.
Failover
Failover seems to work (well, when I yank cables, anyway). You can tell that a failover has occurred by looking in /var/adm/messages. There is no equivalent to setsp (in SANPath) or mppUtil (Linux RDAC), at least none that I can find, but I'll ask.
Failback doesn't happen automatically as it does with Linux RDAC. Instead, the array gets upset because a logical drive (LD) is not on its preferred path and sends us email. We then start up the SMclient GUI, find the LD that's on the "wrong" controller, and move it back. RDAC sees this as another "failure" and "fails" back to the preferred path.
RDAC doesn't complain at all if you yank the path that isn't in use, but the kernel does. You'll see messages like this:
Jul 8 15:29:21 diamondback.uvm.edu qla2300: [ID 948457 kern.info] qla2300(0): Fibre Channel Loop is Down (8012)
So we would probably need to modify Big Brother to look for that, or else write a sec script that watches for down/up events and pages if a down isn't followed fairly quickly by an up.
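The down/up logic is easy to prototype before committing to a sec rule. Below is a minimal Python sketch (not a sec configuration) that scans messages lines for the kernel's loop events and reports any Down on an HBA instance that isn't followed by an Up on the same instance within a grace period. Assumptions: the recovery message is assumed to read "Fibre Channel Loop is Up" (I've only captured the Down message), the 60-second grace period is arbitrary, and because syslog timestamps carry no year, a fixed year is prepended when parsing.

```python
import re
from datetime import datetime, timedelta

# Matches lines like the qla2300 kernel message above. The "Up" wording is an
# ASSUMPTION; check what the driver actually logs when the loop comes back.
LOOP_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+.*?"
    r"(?P<hba>qla2300\(\d+\)): Fibre Channel Loop is (?P<state>Down|Up)"
)


def parse_event(line, year=2000):
    """Return (timestamp, hba, state) for a loop Down/Up line, else None.

    syslog stamps have no year, so an assumed year is prepended."""
    match = LOOP_RE.search(line)
    if not match:
        return None
    when = datetime.strptime(f"{year} {match.group('stamp')}", "%Y %b %d %H:%M:%S")
    return when, match.group("hba"), match.group("state")


def unrecovered_downs(lines, grace=timedelta(seconds=60)):
    """List (hba, down_time) for every Down not answered by an Up on the same
    HBA within `grace`. A Down still open when the input ends is reported too."""
    pending = {}  # hba -> time of the first unanswered Down
    alerts = []
    for line in lines:
        event = parse_event(line)
        if event is None:
            continue
        when, hba, state = event
        if state == "Down":
            pending.setdefault(hba, when)
        else:  # "Up"
            down = pending.pop(hba, None)
            if down is not None and when - down > grace:
                alerts.append((hba, down))  # came back, but too slowly
    alerts.extend(sorted(pending.items()))  # never came back up
    return alerts
```

Fed a chunk of /var/adm/messages (e.g. `unrecovered_downs(open("/var/adm/messages"))`), anything it returns is a candidate for paging. A real sec setup would do the same pairing live with a time window rather than in a batch pass.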
Any questions?