SMART Error Alert: Locating Disk to be Replaced

From time to time, a server will send out an alert when one of its disks is having problems. A sample alert message is shown below:-

----- begin sample SMART error alert -----

Subject: SMART error (CurrentPendingSector) detected on host: server15.nocser.net

---

This email was generated by the smartd daemon running on:

host name: server15.nocser.net
DNS domain: nocser.net
NIS domain: (none)

The following warning/error was logged by the smartd daemon:

Device: /dev/sdb, 1 Currently unreadable (pending) sectors

For details see host's SYSLOG (default: /var/log/messages).

You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.

----- end sample SMART error alert -----

Action Required:-

1. In the SMART error message, some preliminary info is already available to us:-
- Node: server15
- Disk device: /dev/sdb

2. SSH to the node and confirm in the node's logs that the disk device has unreadable sectors:-
# grep 'sdb' /var/log/messages
.....
Jul 3 12:18:26 server15 kernel: sdb: Current [descriptor]: sense key: Aborted Command
Jul 3 12:18:26 server15 kernel: end_request: I/O error, dev sdb, sector 35570367
.....

Jul 3 12:21:05 server15 kernel: raid5:md0: read error corrected (8 sectors at 35570304 on sdb1)
.....

Jul 3 12:24:42 server15 smartd[5510]: Device: /dev/sdb, 1 Currently unreadable (pending) sectors
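
A plain grep for 'sdb' can return a lot of unrelated lines. The sketch below narrows the search to the three kinds of message shown in the sample above (I/O error, RAID read error corrected, smartd pending sectors). The function name disk_errors and its patterns are illustrative assumptions based only on those sample lines; other kernels or drivers may word their messages differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of smartmontools): print only the log lines
# that indicate a problem on one disk. The patterns are modelled on the
# sample messages in this article and may need adjusting for your system.
disk_errors() {
    dev="$1"                        # short device name, e.g. sdb
    log="${2:-/var/log/messages}"   # log file to search
    grep -E "I/O error, dev ${dev},|read error corrected.* on ${dev}|/dev/${dev}, [0-9]+ Currently unreadable" "$log"
}
```

For example, `disk_errors sdb` searches /var/log/messages, and `disk_errors sdb /var/log/messages.1` searches a rotated log.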

3. Once the problem is confirmed by log messages like the samples above, check the model and serial number of the disk:-
# smartctl -a /dev/sdb | more

.....
Model Family: Western Digital Caviar Blue Serial ATA
Device Model: WDC WD5000AAKX-001CA0
Serial Number: WD-WCAYUH804259
LU WWN Device Id: 5 0014ee 1ae6a07c3
Firmware Version: 15.01H15
User Capacity: 500,107,862,016 bytes [500 GB]
.....
(Take note of the HDD info above: Model, Serial Number, and Capacity.)
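
To avoid paging through the full report, the three fields to note can be filtered out directly. This is a minimal sketch: disk_identity is a hypothetical helper name, and it assumes the field labels appear exactly as in the sample output above (some drives, for example SAS disks, label these fields differently).

```shell
#!/bin/sh
# Hypothetical helper: read smartctl output on stdin and keep only the
# model, serial number and capacity lines.
disk_identity() {
    grep -E '^(Device Model|Serial Number|User Capacity):'
}
```

Usage, run as root on the node: `smartctl -i /dev/sdb | disk_identity`. The -i option prints only the identity section of the report, which is enough for this step.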

4. Prepare a replacement disk of similar spec (same size and type; the model can be different), and schedule the on-site replacement.

5. When on site, find the disk whose model and serial number match those obtained in step 3, and replace it.
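
Device names such as /dev/sdb are not guaranteed to stay the same across reboots, so it can be worth re-checking which device currently carries the noted serial number before pulling a drive. The following is a sketch only: find_dev_by_serial is a hypothetical helper, and it assumes smartctl reports a "Serial Number:" line for every device passed to it.

```shell
#!/bin/sh
# Hypothetical helper: print the device whose SMART serial number equals $1.
# The remaining arguments are the devices to check, e.g. /dev/sda /dev/sdb.
find_dev_by_serial() {
    want="$1"
    shift
    for dev in "$@"; do
        # smartctl -i prints the identity section, including "Serial Number:"
        serial=$(smartctl -i "$dev" 2>/dev/null | sed -n 's/^Serial Number:[[:space:]]*//p')
        if [ "$serial" = "$want" ]; then
            echo "$dev"
            return 0
        fi
    done
    return 1    # no device matched
}
```

Usage, run as root: `find_dev_by_serial WD-WCAYUH804259 /dev/sd?`. If nothing is printed, none of the listed devices reported that serial number.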

Please refer to the full documentation on disk replacement here: http://www.nocser.net/clients/knowledgebase/333/Replacing-A-Failed-Hard-Drive-In-A-Software-RAID1-Array.html


