We recently experienced a drive failure in a RAID array attached to a Compaq ProLiant 1600R. The RAID controller in question is a Compaq Smart Array 5304 (128MB cache). Here's what to expect:
The failure is detected:
cpqary3: [ID 702911 kern.warning] WARNING:
cpqary3: [ID 103154 kern.warning] WARNING: Bus = 1 : Device = 11 : Function = 0
cpqary3: [ID 404339 kern.notice] Event Occured on ......... 03/26/2002
cpqary3: [ID 678209 kern.notice] Event Time................ 13:00:27
cpqary3: [ID 269178 kern.notice] Description............... Physical drive failure: SCSI port 3 ID 2
cpqary3: [ID 715728 kern.notice] Physical Drive Num........ 16
cpqary3: [ID 647361 kern.notice] Failure Reason............ UNKNOWN
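If you would rather catch this event automatically than by watching the console, the fields can be pulled out of the log with a few regular expressions. This is a minimal sketch, assuming each cpqary3 message arrives on its own syslog line (any timestamp/hostname prefix is skipped) and that fields use the dot-padded `Name....... value` layout shown above; `parse_event` and `failed_drive_location` are hypothetical helpers, not part of any Compaq or Solaris tool.

```python
import re

# Assumption: one cpqary3 message per syslog line; re.search skips any
# leading timestamp/hostname prefix before "cpqary3:".
MSG_RE = re.compile(r"cpqary3: \[ID (\d+) kern\.(\w+)\] (.*)$")
# Split "Field name........ value" on the first run of two or more dots.
FIELD_RE = re.compile(r"^(.+?)\s*\.{2,}\s*(.*)$")
# Pull the SCSI port and target ID out of a Description value.
LOCATION_RE = re.compile(r"SCSI port (\d+) ID (\d+)")


def parse_event(lines):
    """Collect the dot-padded 'field.... value' pairs from one cpqary3 event."""
    fields = {}
    for line in lines:
        m = MSG_RE.search(line)
        if not m:
            continue
        f = FIELD_RE.match(m.group(3))
        if f:  # lines such as "WARNING: Bus = 1 : ..." have no dot padding
            fields[f.group(1).strip()] = f.group(2).strip()
    return fields


def failed_drive_location(fields):
    """Return (port, id) for a 'Physical drive failure' event, else None."""
    desc = fields.get("Description", "")
    if not desc.startswith("Physical drive failure"):
        return None
    loc = LOCATION_RE.search(desc)
    return (int(loc.group(1)), int(loc.group(2))) if loc else None


failure = [
    "cpqary3: [ID 702911 kern.warning] WARNING:",
    "cpqary3: [ID 103154 kern.warning] WARNING: Bus = 1 : Device = 11 : Function = 0",
    "cpqary3: [ID 404339 kern.notice] Event Occured on ......... 03/26/2002",
    "cpqary3: [ID 678209 kern.notice] Event Time................ 13:00:27",
    "cpqary3: [ID 269178 kern.notice] Description............... Physical drive failure: SCSI port 3 ID 2",
    "cpqary3: [ID 715728 kern.notice] Physical Drive Num........ 16",
    "cpqary3: [ID 647361 kern.notice] Failure Reason............ UNKNOWN",
]

event = parse_event(failure)
print(failed_drive_location(event), event["Physical Drive Num"], event["Failure Reason"])
# prints: (3, 2) 16 UNKNOWN
```

The port/ID pair is what you need when walking up to the cabinet; the "Physical Drive Num" is the controller's own index for the same disk.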
A hot spare is located:
cpqary3: [ID 702911 kern.warning] WARNING:
cpqary3: [ID 103154 kern.warning] WARNING: Bus = 1 : Device = 11 : Function = 0
cpqary3: [ID 404339 kern.notice] Event Occured on ......... 03/26/2002
cpqary3: [ID 678209 kern.notice] Event Time................ 13:00:27
cpqary3: [ID 269178 kern.notice] Description............... State change, logical drive 0
cpqary3: [ID 677830 kern.notice] Logical Drive Num......... 0
cpqary3: [ID 407483 kern.notice] Prev Logical Drive State.. OK
cpqary3: [ID 732945 kern.notice] New Logical Drive State... Regenerating
cpqary3: [ID 553769 kern.notice] Current Spare Status......
cpqary3: [ID 166510 kern.notice] Defined
cpqary3: [ID 509785 kern.notice] Available
The hot spare is activated:
cpqary3: [ID 702911 kern.warning] WARNING:
cpqary3: [ID 103154 kern.warning] WARNING: Bus = 1 : Device = 11 : Function = 0
cpqary3: [ID 404339 kern.notice] Event Occured on ......... 03/26/2002
cpqary3: [ID 678209 kern.notice] Event Time................ 13:00:27
cpqary3: [ID 269178 kern.notice] Description............... State change, logical drive 0
cpqary3: [ID 677830 kern.notice] Logical Drive Num......... 0
cpqary3: [ID 407483 kern.notice] Prev Logical Drive State.. Regenerating
cpqary3: [ID 732945 kern.notice] New Logical Drive State... Needs Rebuild Permission
cpqary3: [ID 553769 kern.notice] Current Spare Status......
cpqary3: [ID 166510 kern.notice] Defined
cpqary3: [ID 974324 kern.notice] Active
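The spare status is printed as bare words on the lines that follow `Current Spare Status......`, and in this log the words change from `Defined`/`Available` to `Defined`/`Active` once the spare is taken into use. A hypothetical `spare_status` helper could collect them as below; it assumes, from the layout above, that the flags run until the next dot-padded field or the end of the event.

```python
import re

# Assumption: one cpqary3 message per line; capture the message text
# without trailing whitespace.
MSG_RE = re.compile(r"cpqary3: \[ID \d+ kern\.\w+\] (.*?)\s*$")


def spare_status(lines):
    """Return the flags printed under 'Current Spare Status' in one event."""
    flags, collecting = [], False
    for line in lines:
        m = MSG_RE.search(line)
        if not m:
            continue
        text = m.group(1)
        if text.startswith("Current Spare Status"):
            collecting = True
        elif collecting:
            # Flags are bare words; a dot-padded field ends the list.
            if ".." in text:
                break
            flags.append(text)
    return flags


event = [
    "cpqary3: [ID 269178 kern.notice] Description............... State change, logical drive 0",
    "cpqary3: [ID 677830 kern.notice] Logical Drive Num......... 0",
    "cpqary3: [ID 407483 kern.notice] Prev Logical Drive State.. Regenerating",
    "cpqary3: [ID 732945 kern.notice] New Logical Drive State... Needs Rebuild Permission",
    "cpqary3: [ID 553769 kern.notice] Current Spare Status......",
    "cpqary3: [ID 166510 kern.notice] Defined",
    "cpqary3: [ID 974324 kern.notice] Active",
]

print(spare_status(event))  # prints: ['Defined', 'Active']
```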
... and the rebuild begins:
cpqary3: [ID 702911 kern.warning] WARNING:
cpqary3: [ID 103154 kern.warning] WARNING: Bus = 1 : Device = 11 : Function = 0
cpqary3: [ID 404339 kern.notice] Event Occured on ......... 03/26/2002
cpqary3: [ID 678209 kern.notice] Event Time................ 13:00:28
cpqary3: [ID 269178 kern.notice] Description............... State change, logical drive 0
cpqary3: [ID 677830 kern.notice] Logical Drive Num......... 0
cpqary3: [ID 407483 kern.notice] Prev Logical Drive State.. Needs Rebuild Permission
cpqary3: [ID 732945 kern.notice] New Logical Drive State... Rebuilding
cpqary3: [ID 553769 kern.notice] Current Spare Status......
cpqary3: [ID 166510 kern.notice] Defined
cpqary3: [ID 974324 kern.notice] Active
cpqary3: [ID 388622 kern.notice] Building
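Taken together, the state-change events trace logical drive 0 from `OK` through `Regenerating` and `Needs Rebuild Permission` to `Rebuilding`, all between 13:00:27 and 13:00:28. The sketch below chains the `Prev`/`New` pairs into that path and raises an error if a step is missing from the log. The state names are only those seen in this log (other firmware may print others), and `transitions`/`state_path` are hypothetical names.

```python
import re

# The dot padding differs ("State.." vs "State..."), so accept any run of dots.
PREV_RE = re.compile(r"Prev Logical Drive State\.+\s*(.+?)\s*$")
NEW_RE = re.compile(r"New Logical Drive State\.+\s*(.+?)\s*$")


def transitions(lines):
    """Return the (prev, new) logical-drive state pairs in log order."""
    pairs, prev = [], None
    for line in lines:
        m = PREV_RE.search(line)
        if m:
            prev = m.group(1)
            continue
        m = NEW_RE.search(line)
        if m and prev is not None:
            pairs.append((prev, m.group(1)))
            prev = None
    return pairs


def state_path(pairs):
    """Chain (prev, new) pairs into one path; raise if a step is missing."""
    if not pairs:
        return []
    path = [pairs[0][0]]
    for prev, new in pairs:
        if prev != path[-1]:
            raise ValueError(f"gap: expected {path[-1]!r}, got {prev!r}")
        path.append(new)
    return path


log = [
    "cpqary3: [ID 407483 kern.notice] Prev Logical Drive State.. OK",
    "cpqary3: [ID 732945 kern.notice] New Logical Drive State... Regenerating",
    "cpqary3: [ID 407483 kern.notice] Prev Logical Drive State.. Regenerating",
    "cpqary3: [ID 732945 kern.notice] New Logical Drive State... Needs Rebuild Permission",
    "cpqary3: [ID 407483 kern.notice] Prev Logical Drive State.. Needs Rebuild Permission",
    "cpqary3: [ID 732945 kern.notice] New Logical Drive State... Rebuilding",
]

print(" -> ".join(state_path(transitions(log))))
# prints: OK -> Regenerating -> Needs Rebuild Permission -> Rebuilding
```

Checking that each `Prev` matches the previous `New` is a cheap way to notice dropped syslog messages before trusting the final state.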
The machine was powered down, and we booted the RAID tools from the Compaq Diagnostic partition. Once these were loaded, we were able to swap out the failed drive and restart Solaris. On boot, the following messages were recorded:
cpqary3: [ID 702911 kern.warning] WARNING:
cpqary3: [ID 103154 kern.warning] WARNING: Bus = 1 : Device = 11 : Function = 0
cpqary3: [ID 404339 kern.notice] Event Occured on ......... 03/26/2002
cpqary3: [ID 678209 kern.notice] Event Time................ 13:00:27
cpqary3: [ID 269178 kern.notice] Description............... Hot-plug drive removed: SCSI port 3 ID 2
cpqary3: [ID 486352 kern.notice] Physical Drive Num ....... 16
cpqary3: [ID 479030 kern.notice] Configured Drive ? ....... YES

cpqary3: [ID 702911 kern.warning] WARNING:
cpqary3: [ID 103154 kern.warning] WARNING: Bus = 1 : Device = 11 : Function = 0
cpqary3: [ID 404339 kern.notice] Event Occured on ......... 03/26/2002
cpqary3: [ID 678209 kern.notice] Event Time................ 13:00:58
cpqary3: [ID 269178 kern.notice] Description............... Hot-plug drive inserted: SCSI port 3 ID 2
cpqary3: [ID 486352 kern.notice] Physical Drive Num ....... 16
cpqary3: [ID 479030 kern.notice] Configured Drive ? ....... YES
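The removal and insertion events can be paired by SCSI port and ID to get the interval between the timestamps the driver reports (here 13:00:27 and 13:00:58, i.e. 31 seconds). A minimal sketch, assuming one message per line and the `MM/DD/YYYY` date format shown; note the driver spells the field `Event Occured on`, so the code matches that spelling verbatim. `hotplug_events` and `swap_durations` are hypothetical names.

```python
import re
from datetime import datetime

# Assumption: one cpqary3 message per line, dot-padded fields as above.
MSG_RE = re.compile(r"cpqary3: \[ID \d+ kern\.\w+\] (.*)$")
FIELD_RE = re.compile(r"^(.+?)\s*\.{2,}\s*(.*)$")
HOTPLUG_RE = re.compile(r"Hot-plug drive (removed|inserted): SCSI port (\d+) ID (\d+)")


def hotplug_events(lines):
    """Yield (action, (port, id), timestamp) for each hot-plug event."""
    day = clock = None
    for line in lines:
        m = MSG_RE.search(line)
        if not m:
            continue
        f = FIELD_RE.match(m.group(1))
        if not f:
            continue
        name, value = f.group(1).strip(), f.group(2).strip()
        # The driver prints "Occured" (sic); match its spelling exactly.
        if name == "Event Occured on":
            day = value
        elif name == "Event Time":
            clock = value
        elif name == "Description":
            h = HOTPLUG_RE.search(value)
            if h and day and clock:
                when = datetime.strptime(f"{day} {clock}", "%m/%d/%Y %H:%M:%S")
                yield h.group(1), (int(h.group(2)), int(h.group(3))), when


def swap_durations(lines):
    """Map (port, id) to seconds between a removal and the next insertion."""
    removed, result = {}, {}
    for action, slot, when in hotplug_events(lines):
        if action == "removed":
            removed[slot] = when
        elif slot in removed:
            result[slot] = (when - removed.pop(slot)).total_seconds()
    return result


log = [
    "cpqary3: [ID 404339 kern.notice] Event Occured on ......... 03/26/2002",
    "cpqary3: [ID 678209 kern.notice] Event Time................ 13:00:27",
    "cpqary3: [ID 269178 kern.notice] Description............... Hot-plug drive removed: SCSI port 3 ID 2",
    "cpqary3: [ID 404339 kern.notice] Event Occured on ......... 03/26/2002",
    "cpqary3: [ID 678209 kern.notice] Event Time................ 13:00:58",
    "cpqary3: [ID 269178 kern.notice] Description............... Hot-plug drive inserted: SCSI port 3 ID 2",
]

print(swap_durations(log))  # prints: {(3, 2): 31.0}
```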
John Cartwright <johnc@grok.org.uk>