[vox-tech] Linux Block Layer is Lame (it retries too much)

Mike Simons vox-tech@lists.lugod.org
Wed, 28 May 2003 15:09:21 -0400


On Wed, May 28, 2003 at 11:31:56AM -0700, Jeff Newmiller wrote:
> On Tue, 27 May 2003, Mike Simons wrote:
> >   Last week I was having problems with the ide layer... it retries
> > way too many times.  I was trying to read 512 byte blocks from a dying
> > /dev/hda (using dd_rescue which calls pread), for each bad sector the
> > kernel would try 8 times,
[...]
> >   Even better because the process is inside a system call, it is not
> > killable and so there is no practical way to speed up the process.
>
> It should be open to termination between the time the read system call
> returns and the write system call starts.

  Yes, it was "killable" in the sense that you could ^C it or send a
signal with kill, but the signal only took effect once the kernel had
finished retrying, about 10 minutes later, at which point the process
would exit cleanly.

  I meant there was no way to abort the 8 sector read attempt.


> > - How does a 1 sector read is expanded to an 8 sector chunk?
>
> I don't know.  But I suspect it has to do with the "natural" way files are
> read in... by "mmap"ing them to pages in RAM.  i386 memory managers
> usually use 4k pages... ergo, 8 x 512B sectors.
>
> Some of this behavior may be due to the algorithms in dd_rescue.

  Nah... dd_rescue is certainly not the cause.  It is a very simple
program that reads blocks of a size you can specify on the command line.

  It has the concept of a "soft block size" which it uses to quickly cover
the good sections of disk, and a "hard block size" which it uses to
slowly walk the bad sections of disk.  By default it uses the soft
size until a read error happens; it then drops to the hard block
size and reads until it travels a few "soft" block sizes without errors.
  I realize I was not explicit enough, but I had set both the "soft" and
"hard" block sizes to 512 bytes.  Because soft and hard are the same,
that prevents dd_rescue from retrying the read of any bad blocks...
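
  To make the soft/hard idea concrete, here is a rough Python sketch of
the switching logic described above.  It is not dd_rescue's actual code:
the names rescue_copy, read_at and good_run are made up for illustration,
and read_at(offset, length) is assumed to return exactly `length` bytes
or raise OSError.

```python
def rescue_copy(read_at, total_size, soft_bs=65536, hard_bs=512, good_run=2):
    """Copy total_size bytes via read_at(offset, length) -> bytes.

    Hypothetical sketch of a soft/hard block size loop (not dd_rescue's code).
    Returns (data, bad_ranges); unreadable blocks are left zero-filled.
    """
    out = bytearray(total_size)
    bad = []            # (offset, length) of blocks that could not be read
    offset = 0
    block = soft_bs     # start with the large "soft" size
    clean_bytes = 0     # error-free bytes read since the last failure
    while offset < total_size:
        length = min(block, total_size - offset)
        try:
            out[offset:offset + length] = read_at(offset, length)
        except OSError:
            clean_bytes = 0
            if block > hard_bs:
                # A soft-sized read failed: walk the same region again
                # in hard-sized steps to find the bad blocks.
                block = hard_bs
                continue
            # A hard-sized read failed: record it and move on, no retry.
            bad.append((offset, length))
            offset += length
            continue
        offset += length
        if block == hard_bs and soft_bs > hard_bs:
            clean_bytes += length
            if clean_bytes >= good_run * soft_bs:
                block = soft_bs   # enough clean data, speed up again
    return bytes(out), bad
```

  With soft_bs == hard_bs the "drop to the hard size and walk the region
again" branch is never taken, so in this sketch every block is requested
from the kernel exactly once, which is the effect I was after.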

> > - Any other ideas on how to pull the disk blocks?
>
> Not easy ones. (Build your own device driver that doesn't use mmap.)

  Michael Wenk suggested using O_DIRECT on the open call, which is
an excellent idea.  This was what the Oracle people described at their
Clustering Filesystem talk.  I have one more failing hard drive around
that I'm going to try it on...
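
  For reference, a minimal sketch of what an O_DIRECT read of single
sectors could look like from Python on Linux.  This is untested against
a failing drive, so whether it really avoids the 8 sector reads is still
an open question.  The function name read_sectors and the `direct`
switch are made up for illustration, and a 512 byte sector size is
assumed.  O_DIRECT needs an aligned buffer, offset and length; anonymous
mmap memory is page-aligned, which covers the buffer.

```python
import mmap
import os

SECTOR = 512  # assumed logical sector size


def read_sectors(path, first_sector, count=1, direct=True):
    """Read `count` sectors starting at `first_sector`.

    Returns the bytes read, or None if the kernel reports an I/O error.
    Hypothetical helper; `direct=False` falls back to a buffered read.
    """
    flags = os.O_RDONLY
    if direct:
        flags |= os.O_DIRECT          # Linux-specific: bypass the page cache
    fd = os.open(path, flags)
    buf = mmap.mmap(-1, count * SECTOR)   # page-aligned scratch buffer
    try:
        # preadv reads at an explicit offset into the aligned buffer.
        got = os.preadv(fd, [buf], first_sector * SECTOR)
        return buf[:got]
    except OSError:
        return None                   # e.g. EIO on a bad sector
    finally:
        os.close(fd)
        buf.close()
```

  Something like read_sectors("/dev/hda", n) in a loop would then pull
one sector per call.  Note that some filesystems refuse O_DIRECT opens
with EINVAL, so trying this on a regular file in a RAM-backed /tmp may
fail before any read happens.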


> >   I was using a custom Knoppix boot floppy and a standard Knoppix CD to
> > boot a laptop with the bad drive, NFS mounting a local machine, where I
> > was dd_rescue sending the blocks that could be read.
>
> I had a similar experience a few weeks ago... dd would fail at certain
> areas on the disk, so I would use the skip option for dd to pick up after
> the dead spots.  (I didn't know about dd_rescue.) Nevertheless, the
> process was too slow, so I pulled the disk and simply replaced it.

  The slowness is really due to how many times the kernel retries; it
only takes a few seconds for the kernel to know the block is bad...

  If you haven't already returned that drive, you may be able to get
most of the filesystem off of it... all of the 4 or 5 failing drives
I've tried pulling data off of have provided a working filesystem (if
you ignore the last one, where the transfer didn't complete due to
timing).


> Slick.  I was using netcat.

  What I used to do was attach a good drive to the system, use the
Debian install floppies to boot the system, then mount a floppy disk
with the junk I needed (dd_rescue) to pull the image.

  Knoppix as the rescue system works much nicer.  If you want to tweak
the kernel Knoppix uses, create a "boot floppy" from the Knoppix CD
(which is meant to allow the disk to boot on machines with non-bootable
CD-ROM drives).  The .img is a DOS filesystem in which you can replace
the vmlinuz image with one of your own making.

  In order to be an NFS client in Knoppix you will need to start two
local services... nfs-common and portmap (see previous post on Knoppix).
The Knoppix images support NFSv3, which results in a dramatically faster
transfer rate... but you will need the nfs-kernel-server package on
the server side to support that... over 100 Mbit Ethernet I was getting
something like 11 MiB per second.
  I would still recommend putting the target drive in the machine with
the source bad drive if you can, because with DMA mode on you should be
able to get about 40 or 50 MiB/s ... in the good sections of disk.
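
  To put those rates in perspective, a small back-of-the-envelope
calculation.  The 20 GiB disk size used in the note below is purely a
made-up example, not the size of any drive mentioned here, and the
function names are made up as well.

```python
MIB = 1024 * 1024


def line_rate_mib_s(megabits_per_s):
    """Raw link rate in MiB/s, ignoring all protocol overhead."""
    return megabits_per_s * 1_000_000 / 8 / MIB


def transfer_minutes(size_gib, rate_mib_s):
    """Minutes needed to move size_gib GiB at a sustained rate_mib_s."""
    return size_gib * 1024 / rate_mib_s / 60
```

  line_rate_mib_s(100) comes out at roughly 11.9 MiB/s, so 11 MiB/s over
NFS is already close to what 100 Mbit Ethernet can carry.  For the
hypothetical 20 GiB disk, transfer_minutes(20, 11) is about 31 minutes,
against about 8 minutes at 45 MiB/s for a local copy, counting only the
good sections of disk.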

  By using an NFS server I was easily able to reboot a few times and still
keep all log files from the mirroring, plus scripts to minimize how much
I needed to type to get things going right... "ddr" or "ddr -s 10289.0k".
I'll send the ddr script I was using if you are interested...

-- 
GPG key: http://simons-clan.com/~msimons/gpg/msimons.asc
Fingerprint: 524D A726 77CB 62C9 4D56  8109 E10C 249F B7FA ACBE
