Hard disks can fail unexpectedly and it is always best to keep recent backups of all important data. Please keep in mind that even if a current or oncoming failure is detected, there may not be enough time to backup the data. Below are several methods that can be used to identify bad blocks or disk errors in CentOS/RHEL.

 

Using smartctl

If there are several I/O errors in /var/log/messages or one simply suspects the hard disks may be failing, smartctl can be a helpful tool in checking them. S.M.A.R.T. stands for Self-Monitoring, Analysis and Reporting Technology. You have to enable the S.M.A.R.T. support in the BIOS before using it.

 

Next, install the needed packages to run /usr/sbin/smartctl. In Red Hat Enterprise Linux, it is provided by the smartmontools package.

 

1. Verify if your hard disk supports S.M.A.R.T. :

# smartctl -i /dev/xxx

 

Replace /dev/xxx with the hard disk of interest when using the commands outlined in this post.

 

2. For SATA drives use:

# smartctl -i -d ata /dev/xxx

 

3. Enable S.M.A.R.T. support with:

# smartctl -s on /dev/xxx            ### For SCSI Disks
# smartctl -s on -d ata /dev/xxx     ### for SATA Disks

 

4. Running the following command as root can be a quick PASS/FAIL test but more thorough testing discussed below is generally more conclusive:

# smartctl -H /dev/xxx

 

Running smartctl in the background

 

To start a background test run the following as root:

# smartctl -t long /dev/xxx

 

To access the results, use the following command:

# smartctl -a /dev/xxx

 

To learn more about various options that can be used with smartctl view the man page of the command:

# man smartctl

 

Using badblocks

You can also use the “badblocks” command in order to check for bad blocks on a disk device. The “badblocks” command can be very useful in isolating problems with syncing LVM partitions within Linux. LVM operations will fail due to bad blocks on a disk. Bad blocks on either the source or destination disk within a LVM mirror will cause a synchronization failure.

 

Badblocks can also be used in conjunction with the fsck and makefs to mark the blocks as bad. If the output of badblocks is going to be fed to the e2fsck or mke2fs programs, it is important that the block size is properly specified, since the block numbers which are generated are very dependent on the block size in use by the filesystem. For this reason, it is strongly recommended that users not run badblocks directly, but rather use the -c option of the e2fsck and mke2fs programs.

 

Warning:

The mis-use of these commands can cause data loss. Additional information on the command “badblocks” is available using the “man badblocks” command.

 

1. Use the disk checking tool badblocks to scan the specified hard disk block by block. For example, to scan /dev/sdd issue the commands:

 

# mount | grep sdd                  # find all mounted partitions of sdd
# umount /dev/sdd1                  # unmount the partitions (may be more then one)
# badblocks -n -vv /dev/sdd

 

Where -n is use non-destructive read-write mode. By default only a non-destructive read-only test is done.

 

Note: Never use the -w option on a device containing an existing file system. This option erases data! If write-mode testing needs to be performed on an existing file system, use the -n option instead. It is slower, but it will preserve the data.

 

2. If the messages similar to the examples found below appear in /var/log/messages or to the console following the running of badblocks it is recommended to backup any data on the affected devices and replace the device:

 

Apr  4 13:50:40 test kernel: sdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Apr  4 13:50:40 test kernel: sdd: dma_intr: error=0x40 { UncorrectableError }, LBAsect=74367249, sector=74367232
Apr  4 13:50:40 test kernel: ide: failed opcode was: unknown
Apr  4 13:50:40 test kernel: end_request: I/O error, dev sdd, sector 74367232
Apr  4 13:50:42 test kernel: sdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Apr  4 13:50:42 test kernel: sdd: dma_intr: error=0x40 { UncorrectableError }, LBAsect=74367249, sector=74367240
Apr  4 13:50:42 test kernel: ide: failed opcode was: unknown
Apr  4 13:50:42 test kernel: end_request: I/O error, dev sdd, sector 74367240
Apr  4 13:50:44 test kernel: sdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }

 

3. The command below will dump found bad blocks to the output file: badblocks.log.

 

# badblocks -v -o badblocks.log /dev/sdd

 

Was this answer helpful? 0 Users Found This Useful (0 Votes)