How does Linux md-RAID handle disk read errors?

There are 2 cases:

  • the read command times out at kernel level (30 seconds by default),
  • the drive reports its inability to read a given sector before the kernel timeout expires (the case I'm interested in)

Kernel timeout

As drive access usually goes through the Linux SCSI layer, I assume the timeout case is handled entirely by that layer. From what I have read, it retries the command several times after resetting the drive, then the bus, then the host, and so on. If none of this works, the SCSI layer takes the device offline. At that point, I assume the md layer simply "discovers" that the drive is gone and marks it as missing (failed). Is this correct?
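For reference, the kernel-side command timeout discussed here is exposed per device in sysfs. The following is only a minimal sketch, assuming a SCSI/SATA disk that appears under `/sys/block/<device>/device/timeout`; the function name and the `sysfs_root` parameter are made up for illustration.

```python
from pathlib import Path
from typing import Optional


def read_command_timeout(device: str, sysfs_root: str = "/sys/block") -> Optional[int]:
    """Return the kernel command timeout (in seconds) for a disk such as 'sda'.

    Returns None if the sysfs attribute is missing or unreadable, for
    example when the device does not exist or is not a SCSI-layer disk.
    """
    path = Path(sysfs_root) / device / "device" / "timeout"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Attribute absent, permission denied, or unexpected content.
        return None
```

Calling `read_command_timeout("sda")` on a machine with such a disk should return the configured value (30 unless it has been changed); `sda` is just an example device name.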

Drive reported error

Some drives can be configured to report a read error once a certain timeout is reached, thus aborting their internal recovery attempts. This is called ERC (or TLER, CCTL). The disk timeout is usually configured to trigger before the OS timeout (or that of the hardware RAID controller), so that the latter knows what really happened instead of just "waiting and aborting".
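The relationship between the two timeouts can be written down as a small check. This is a sketch under assumptions: ERC values are taken to be in tenths of a second (the unit used by `smartctl -l scterc`), a value of 0 is taken to mean ERC is disabled, and the function name is hypothetical.

```python
def erc_reports_before_kernel_timeout(erc_deciseconds: int,
                                      kernel_timeout_s: int = 30) -> bool:
    """Return True if the drive gives up (ERC) before the kernel timer expires.

    erc_deciseconds: drive-side recovery limit in units of 100 ms
                     (assumed: 0 means ERC is disabled).
    kernel_timeout_s: OS-side command timeout in seconds (30 by default).
    """
    if erc_deciseconds <= 0:
        # ERC disabled: the drive may keep retrying internally for longer
        # than the kernel is willing to wait.
        return False
    return erc_deciseconds / 10 < kernel_timeout_s
```

For example, a 7-second ERC limit (70) against the default 30-second kernel timeout satisfies the check, whereas ERC disabled (0) does not.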

My question is: how does Linux (and md) handle drive-reported read errors?

Will it try again, do something clever, or just offline the drive without going through all the attempts described in "Kernel timeout" above? Is md even aware when something happens?

Some people say that ERC is dangerous on Linux because it does not give the drive enough time to attempt recovery. They also say that ZFS RAID behaves well because, if a read error occurs, it computes the unreadable sector's data from the RAID redundancy and writes it back to the drive. The drive should then stop trying to read the bad sector, automatically mark it as bad (not to be used any more), and remap it to a good sector.

Is md also capable of doing this?
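To make the behaviour I am asking about concrete, here is a toy sketch of the reconstruct-and-rewrite idea using single-parity (XOR) redundancy. It is purely illustrative and is not taken from ZFS or md; the chunk contents and all names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(stripe, failed_index):
    """Rebuild the chunk at failed_index from all the other chunks.

    stripe: list of equal-length chunks (data chunks plus one XOR parity
            chunk); the entry at failed_index is unreadable and ignored.
    """
    survivors = [chunk for i, chunk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


# Three data chunks and their parity form one stripe.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
stripe = data + [xor_blocks(data)]

# Simulate a drive-reported read error on chunk 1.
damaged = list(stripe)
damaged[1] = None

# Recompute the lost chunk and "write it back" in place.
damaged[1] = reconstruct(damaged, 1)
```

After the write-back, `damaged` holds the same chunks as the original `stripe`. On a real drive, the rewrite is what would give the firmware the chance to remap the bad sector.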

2022-07-25 20:41:07