Re: [PATCH] Fail on bad verify. from Shawn Lewis on 2007-08-03 (fio-devel)

From: Shawn Lewis <shawnlewis_at_google.com>
Date: Thu, 2 Aug 2007 15:35:27 -0700

On 8/2/07, Shawn Lewis <shawnlewis_at_google.com> wrote:
>
>
>
> On 8/2/07, Jens Axboe <jens.axboe_at_oracle.com> wrote:
> >
> > On Thu, Aug 02 2007, Shawn Lewis wrote:
> > > On 8/2/07, Jens Axboe <jens.axboe_at_oracle.com> wrote:
> > > >
> > > > On Wed, Aug 01 2007, Shawn Lewis wrote:
> > > > > diff --git a/verify.c b/verify.c
> > > > > index 5bdfb76..fadf1c0 100644
> > > > > --- a/verify.c
> > > > > +++ b/verify.c
> > > > > @@ -349,7 +349,7 @@ int verify_io_u(struct thread_data *td,
> > > > > struct verify_header *hdr;
> > > > > unsigned int hdr_inc, hdr_num = 0;
> > > > > void *p;
> > > > > - int ret;
> > > > > + int ret = 0;
> > > > >
> > > > > if (td->o.verify == VERIFY_NULL || io_u->ddir != DDIR_READ)
> > > > > return 0;
> > > > > @@ -401,7 +401,7 @@ int verify_io_u(struct thread_data *td,
> > > > > hdr_num++;
> > > > > }
> > > > >
> > > > > - return 0;
> > > > > + return ret;
> > > > > }
> > > > >
> > > > > static void fill_meta(struct verify_header *hdr, struct
> > thread_data
> > > > *td,
> > > >
> > > > Hmm, I think it's useful, but perhaps we also need an option to keep
> > > > going? It may be interesting to continue verifying and see if the
> > rest
> > > > of the blocks are ok, or if there are other failures in that data
> > set.
> > > > Something like a
> > > >
> > > > verify_failure_fatal
> > > >
> > > > or whatever, I'm not very good with names. If you have a better
> > option
> > > > name, please tell me :-)
> > >
> > >
> > > Actually I'd rather continue if there is a verify failure but it looks
> > like
> > > the original intention was to die, and it got lost somewhere. ret is
> > now not
> > > used. Maybe we should just always continue?
> >
> > Continue is what it does now. So I'm obviously fine with that behaviour,
> > as long as it's clearly noted that it failed (and the exit status shows
> > that).
>
>
> Hmm, exit status doesn't indicate failure if a checksum is bad currently:
>
> [root_at_ipww25 fio]# ./fio helloworld.fio
> file: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
> Starting 1 thread
> crc32: verify failed at 2560/512
> crc32: wanted 213c02ee, got e51fc835
>
> file: (groupid=0, jobs=1): err= 0: pid=26601
> read : io=8KiB, bw=8,192KiB/s, iops=2,000, runt= 1msec
> clat (usec): min= 7, max= 13, avg=10.00, stdev= 4.24
> cpu : usr=0.00%, sys=0.00%, ctx=4
> IO depths : 1=100.0%, 2=0.0%, 4=0.0% , 8=0.0%, 16=0.0%, 32=0.0%,
> >=64=0.0%
> issued r/w: total=2/0, short=0/0
> lat (usec): 10=50.00%, 20=50.00%
>
>
> Run status group 0 (all jobs):
> READ: io=8KiB, aggrb=8,192KiB/s, minb=8,192KiB/s, maxb=8,192KiB/s,
> mint=1msec, maxt=1msec
>
> Disk stats (read/write):
> sda: ios=0/3, merge=0/0, ticks=0/3, in_queue=3, util=0.79%
> [root_at_ipww25 fio]# echo $?
> 0
>
> verify_io_u is a bit inconsistent about what it does for various types of
> errors. After you apply the verify_fill_pattern I can try to make this a
> little more consistent.
>

I was looking at the code to see how to implement a verify_failure_fatal
option, but it seems that it'll be more difficult than I originally thought.

verify_io_u is called via the callback end_io. Since we don't know what
we're actually calling when we call end_io it's a bit hard to figure out if
we had a verify failure or not.

We could have verify_io_u set some global when it has an error but always
return 0. That seems wrong though.

Any ideas?

-Shawn

--
> > Jens Axboe
> >
> >
>

Received on Fri Aug 03 2007 - 00:35:27 CEST

This archive was generated by hypermail 2.2.0 : Fri Aug 03 2007 - 01:00:01 CEST