Hello,
This problem seems to be recurrent: I just ran into it, and so did someone here:
https://bbs.archlinux.org/viewtopic.php?pid=1605209
How would you select the suggested burning mode in Brasero ?
Thanks
Marc
Offline
I burnt another CD using wodim and it worked ! Thanks x1000 for the -dao suggestion !
$ wodim -v dev=/dev/sg2 speed=4 -dao -eject archlinux-2016.02.01-dual.iso
Offline
Hi,
this must be a new bug with Linux kernel or udev.
A significant difference between CD TAO and CD SAO at read time
is that a TAO track ends with two non-data sectors, which cannot
be read by normal SCSI READ commands.
This can happen only with CD media. Not with DVD or USB sticks.
In the times of kernel 2.X, the situation in the Linux kernel was
quite bad: I/O error somewhere beginning 128 blocks before the
track end.
With kernel 3.X it improved. The I/O error reliably hits 2 blocks
before track end on 3.16. I did not try 4.X with real iron yet.
My best guess would be that this confuses the component in the
ISO payload which is in charge of creating the /dev/disk/by-label/
links while the system boots up.
(If somebody can explain the factor 8 between "sector 1226752"
and "logical block 153344", then please do.)
Obviously the problem is still in archlinux-2016.02.01-dual.iso
What happens if you let the booted Linux read the non-working CD
up to the end ?
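ISO=archlinux-2016.02.01-dual.iso
# Size of the ISO in 2048-byte CD blocks (stat -c is GNU coreutils)
blocks=$(( $(stat -c %s "$ISO") / 2048 ))
# Read from the end of the ISO data up to the end of the medium
dd bs=2048 skip=$blocks of=/dev/null if=/dev/sr0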
Have a nice day
Thomas
Last edited by scdbackup (2016-02-17 22:23:43)
Offline
Hi,
the problem is reproducible with archlinux-2016.02.01-dual.iso
on an amd64 BIOS (i.e. not EFI) machine.
I used on CD-RW the command
xorriso -as cdrecord -v dev=/dev/sr0 blank=as_needed -eject -tao archlinux-2016.02.01-dual.iso
which adds a padding of 300 kB to ensure that the whole ISO is
readable even with the old read-ahead bug that is fixed in 3.16.
The CD has 359062 readable sectors. The i/o error is reported
with numbers 1436248 (= 359062 * 4) and 179531 (= 359062 / 2).
So it is really about the two unreadable sectors at the end of
the TAO track, which is recorded with 359064 sectors.
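The two numbers in the kernel message are consistent with one and the
same position expressed in different units. A minimal sketch of the
arithmetic, assuming that the kernel counts "sectors" in units of 512 bytes
and "logical blocks" in units of 4096 bytes. This is an assumption, but it
would also explain the factor 8 asked about earlier. The function names
are made up for illustration.

```shell
# CD data blocks have 2048 bytes.
# Assumption: kernel "sector" = 512 bytes, "logical block" = 4096 bytes.
cd_blocks_to_sectors() { echo $(( $1 * 2048 / 512 )); }
cd_blocks_to_logical() { echo $(( $1 * 2048 / 4096 )); }

cd_blocks_to_sectors 359062    # 1436248
cd_blocks_to_logical 359062    # 179531
echo $(( 1226752 / 153344 ))   # 8 = 4096 / 512
```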
I get to a root shell prompt. The immediate problem seems to be that
blkid /dev/sr0
yields again the i/o error message and no id strings. Therefore udev
(or what's in charge) does not create the /dev/disk/by-label/ link.
On Debian 8 with kernel 3.16, i get from the same command
/dev/sr0: UUID="2016-02-01-15-37-21-00" LABEL="ARCH_201602" TYPE="iso9660" PTUUID="3dd32031" PTTYPE="dos"
The problem seems to be a nasty kernel regression:
dd if=/dev/sr0 bs=2048 skip=358000 of=/dev/null
yields i/o error at 1436008 (= 359002 * 4) which is a new record
in terms of read-ahead bug. Kernel 4.3.3-3.
On kernel 3.16, i get no i/o error and all 1062 readable blocks get
reported.
Let's see whether maximum padding helps. The CD-RW offers 359847 blocks.
So if i fill 900 of the 935 remaining blocks with zeros ...
xorriso -as cdrecord -v dev=/dev/sr0 -eject -tao padsize=900s archlinux-2016.02.01-dual.iso
yields 359814 readable blocks.
Booting ... no success.
Regrettably the ISO is too fat for adding much more padding.
So i cannot explore whether this would help at all.
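The figure of 935 remaining blocks can be cross-checked with a little
arithmetic. This is only a sketch: it assumes that the 300 kB padding of
the first burn means 150 blocks of 2048 bytes, so that the ISO itself
would have 359062 - 150 blocks. That ISO size is derived here, it is not
stated anywhere in this thread.

```shell
capacity=359847          # blocks offered by the CD-RW
readable_first=359062    # readable blocks of the first TAO burn
default_pad=150          # assumption: 300 kB = 150 blocks of 2048 bytes

# Derived, not stated in the thread: size of the ISO without padding
iso_blocks=$(( readable_first - default_pad ))
remaining=$(( capacity - iso_blocks ))
echo "ISO blocks: $iso_blocks, free for padding: $remaining"
```

Under these assumptions the result is 935 free blocks, which matches the
number given above.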
Note to myself: No 4.3.3 kernels.
Have a nice day
Thomas
Offline
I now filed https://bugs.archlinux.org/task/48234
Offline
Well, the bug report did not live long.
Obviously it is not considered important that a CD sized bootable ISO image
contains a kernel and/or blkid which cannot cope with the default write type of
about half of the CD burn programs in Archlinux.
Offline
scdbackup,
I think technically they are correct, as the problem is not related to the archiso itself.
Is the error still present in latest arch kernel ( currently that's 4.4.1 ) ?
If so, is this problem arch-specific ?
In case the answer to both questions is yes, an arch bug against the kernel should be accepted.
Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.
clean chroot building not flexible enough ?
Try clean chroot manager by graysky
Offline
Hello,
This has nothing to do with archiso, and reporting a bug against the kernel in our bug tracker does not make sense: it is not a packaging bug and should be reported upstream. But this kind of error is really old; it has something to do with incompatibility between drive firmware and kernel driver.
archiso documentation recommends cdrecord with DAO (wodim is a fork of cdrecord, but it is dead and broken software). See https://projects.archlinux.org/archiso. … ansfer#n36
Offline
Hi,
i don't want to quarrel.
It is the decision of the Archlinux people what they put into their ISOs,
which are likely to be burned as TAO CD.
I diagnosed the problem as far as it is in the scope of my expertise
as an upstream programmer of burn software.
In this role, i can only roll my eyes to heaven when i read
"incompatible drive firmware". The TAO read-ahead bug is old, yes.
But it is solved in kernel 3.16. Not with the correct diagnosis
in the comments, but quite effectively.
See
https://github.com/torvalds/linux/blob/ … /scsi/sr.c
at line 345 ff. Only ioctl(BLKGETSIZE) still returns two blocks too
many on a Debian 8. Everything else works.
Somehow this progress has been devalued by new mechanisms for
reading ahead.
The reason for the read-ahead bug, as explained in the bug report, is that
a TAO track ends with two unreadable "run-out" blocks, which nevertheless
are announced by the CD Table-Of-Content as part of the track.
One just needs to avoid reading these two blocks, or one has to
react skillfully to the ILLEGAL REQUEST error which is caused by such
a read attempt.
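The first option, simply not requesting the two run-out blocks, can be
sketched with dd. The function name is hypothetical, and the track size of
359064 blocks is the one from the test burn reported earlier in this
thread. The function only prints the command line instead of running it.

```shell
# Print a dd command that reads a TAO data track without its 2 run-out blocks.
# $1 = device file, $2 = track size in 2048-byte blocks as announced by the TOC
tao_safe_dd() {
    echo "dd if=$1 bs=2048 count=$(( $2 - 2 )) of=/dev/null"
}

tao_safe_dd /dev/sr0 359064
```

Whether the kernel's own read-ahead stays away from the run-out blocks is
a different matter, as the reports in this thread show.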
Well, i myself can lean back. My software can read TAO tracks, and
unless the user or a frontend decides otherwise, it burns CD as SAO
where possible.
Have a nice day
Thomas
Offline
Hi scdbackup
I have one question: if this is not an issue between firmware and kernel driver, why does the same CD-R burned as TAO work in some units but fail in others? I have not used CD/DVD for a long time, but I remember things like this.
Thanks.
Last edited by djgera (2016-02-20 03:17:43)
Offline
Hi,
> Why the same CD-R burned as TAO in some unit works but in other fails?
I have five drives attached to my workstation. They all behave roughly the
same in this respect. I can read all valid blocks by SCSI READ via ioctl(SG_IO)
if i avoid including the last two blocks of the track in the block range
of the READ commands.
On my previous machine (kernel 2.6), it was the same.
The problem itself is reproducible and deterministic. But the way in which
Linux triggers it varies. In the times of kernel 2.x, when i explored this,
there was quite obviously a read window of 32 or 64 blocks. The error struck
when this window touched the two end blocks.
The classical remedy of padding 300 kB of zeros prevented filesystem
drivers from requesting a window read near the very end of the track.
There was a small chance that reading the last valid blocks did not hit the
end blocks, if the window end did still fit into the valid block range.
Since nothing in the payload filesystem refers to the invalid blocks,
no window will then be read which contains the bad blocks.
The behavior of kernel 4.3 is not that clear, because i see complaints
about 2 i/o error blocks when dd fails: one at a quite random address
in the valid range, and one at the first invalid run-out block.
My theory is that the first address is the one wanted, and the second
one is where the READ command reports actual failure.
When blkid fails, both numbers are the same. So i assume that it really
wants to read the last two blocks of the track.
(Pity that the output of btrace(8) is so cryptic. I fail to spot the
READ commands and their block addresses. There are several CD inspection
commands to see, though.)
I now booted Archlinux on my noisy test machine from SAO CD and plugged
in a USB stick with GNU xorriso on it. (The system lets me exchange but
not umount the CD from which it booted. So running a checkread on the
TAO CD needs special option o_excl_off to circumvent the O_EXCL ban on
open(2)):
/mnt/iso/xorriso -osirrox o_excl_off -indev /dev/sr0 -check_media --
The kernel i/o error message appears already when xorriso opens the drive
by open(2). xorriso perceives no error at that moment. It can read all
valid blocks and perceives SCSI error ILLEGAL REQUEST when it hits the
run-out blocks. (This is drive #6 in my collection. An old LG DVD-ROM.)
I have to unroll my eyes from heaven, because drive #7 (Samsung SH-223B)
really behaves a bit differently. It does not report ILLEGAL REQUEST but
rather MEDIUM ERROR when it hits the unreadable blocks.
This is indeed wrong firmware behavior, but it would still be covered by the
error handler in drivers/scsi/sr.c .
Whatever, all seven drives can read all valid blocks, if they only
refrain from sending SCSI READ with start address and block range which
would include one of the two unreadable blocks.
------------------------------------------------------------------------
My proposal to Linux upstream would be to test CD tracks for unreadable
last two blocks and to reduce the perceived track size by 2 blocks if they
seem bad. One may argue that the MEDIUM ERROR of drive #7 is not a valid
indicator for a TAO track. But ILLEGAL REQUEST surely is, if the track
is labeled as data track.
My problem with Linux upstream is that if you only say "CD burning"
at LKML, you get mistaken for Joerg Schilling, and consequently
tarred and feathered.
Pun aside: One needs to have done the homework before showing up there.
I roughly understand what sr.c does. But i am still lost when i try to
explore the code around blk_update_request() .
I reported the problem to Archlinux, because the behavior of the ISO
should give Arch a reason to be interested in a fix. I also warned
debian-cd about the upcoming troubles, which can already be demonstrated
by the Sid mini.iso images. They still boot, because they do not use
blkid for finding the ISO. But dd and blkid show the same symptoms as
with Archlinux. (Kernel 4.3.0.)
Have a nice day
Thomas
Last edited by scdbackup (2016-02-20 10:12:26)
Offline
Hi,
i get the impression that the kernel is just more verbose than the
older ones, and that the problem is actually in blkid.
dd if=/dev/sr0 | wc
yields 735358976 bytes = 359062 CD blocks, which is exactly the
really readable size of the CD content. Nothing missing.
(One has to look closely to find the number amidst several
asynchronous kernel messages on the console.)
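The byte count can be checked against the block count with a short
calculation. A sketch, assuming the usual CD data block size of
2048 bytes:

```shell
bytes=735358976
echo $(( bytes / 2048 ))   # whole CD blocks: 359062
echo $(( bytes % 2048 ))   # remainder: 0, so the size is a whole number of blocks
```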
blkid on the TAO CD just says nothing and returns exit value 2.
All visible error messages are from kernel log.
https://www.kernel.org/pub/linux/utils/ … lkid-docs/
says that Karel Zak <kzak@redhat.com> holds the copyright.
(Maybe he is less demanding than LKML.)
Have a nice day
Thomas
Offline
Hi,
meanwhile the suspicion is back in the kernel:
ioctl(BLKGETSIZE) as used in
https://github.com/karelzak/util-linux/ … b/blkdev.c
in cooperation with mmap(2) as used in
https://github.com/karelzak/util-linux/ … rc/probe.c
line 647, and the underlying virtual memory management.
I made a little program by which i can test this gesture of size
determination and mmapping.
Both kernels perform mmap() to the end of the CD and also mmap
unreadable blocks without complaints.
But on 4.3.3, the device size is returned by ioctl(BLKGETSIZE) as
1436256
whereas kernel 3.16 returns
1436248
The latter is the correct size, not including the unreadable run-out
blocks of the TAO track.
The attempt to use the invalid part of the mmapped data causes a
SIGSEGV on both kernels. (Dunno how blkid prevents it in the same
situation.)
If i override the wrong size of kernel 4.3.3 by the size reported by kernel
3.16, then i get to see the usual kernel log messages, but there is
no SIGSEGV when using the last 4096 mmapped bytes.
So i assume that blkid would work fine if it only got the correct
device size.
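The difference between the two reported sizes corresponds to exactly
two CD blocks, i.e. the run-out blocks. A small sketch of the comparison,
assuming that ioctl(BLKGETSIZE) counts in units of 512 bytes. On a live
system the same number should be obtainable by blockdev --getsz /dev/sr0 .
The function name is made up for illustration.

```shell
# $1, $2 = device sizes in 512-byte units, as returned by ioctl(BLKGETSIZE)
# Prints the difference in 2048-byte CD blocks.
size_diff_cd_blocks() {
    echo $(( ($1 - $2) * 512 / 2048 ))
}

size_diff_cd_blocks 1436256 1436248    # 2
```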
Have a nice day
Thomas
Last edited by scdbackup (2016-02-21 09:33:27)
Offline
I had the same problem and have the solution(s)
Good Luck
There are 2 bad syslinux files in the archlinux-2016.02.01-dual.iso
https://bbs.archlinux.org/viewforum.php?id=48
this will also show you how to recover your flash drives,
Good Luck, Mark
Last edited by Mark Ackerman (2016-02-22 10:42:11)
Offline
Hi,
i doubt that wrong SYSLINUX files can cause the blkid problem
which persists even in the rescue prompt, or on an Archlinux which
was booted properly from SAO CD, if you apply blkid to a TAO CD.
The timeout happens when Linux already has taken over.
(Your link is obviously missing a few digits at the end.)
Meanwhile i found one drive in my collection from which the TAO CD
does boot. It confuses the kernel little enough to let it determine
the block count correctly. Hence my earlier impression that my workstation
with kernel 3.16 is not that ill.
This impression has become questionable meanwhile. I need to do more experiments.
I am now in contact with Karel Zak, the developer of blkid.
We discuss how to avoid the failure if the kernel is willing to read
the third block counted from the end.
If the third block is unreadable too, then blkid shall refrain from
reading the medium end at all. It will find the ISO label only at the
start of the medium, anyway.
(Third block failure happens here with kernel 4.3.3 on an old DVD-ROM
and a not so old Samsung DVD burner, if the number of TAO CD blocks is
not even.
This block is readable by an SCSI command 28h READ(10) via ioctl(SG_IO).
It is not readable via read(2) or by accessing memory mapped by mmap(2).)
Have a nice day
Thomas
Offline
Hi,
an update after some experiments about my theories:
- The kernel version is not decisive.
3.16 and 4.3.3 fail on the same drives the same way.
- The blkid version is decisive.
Older blkid on Debian 8 does not suffer from failures of kernel 3.16.
(Archlinux should ask Karel Zak for an early copy of his planned remedy.)
- The drive is decisive. It is about the exact reply to SCSI command
READ CAPACITY.
2 of my 7 DVD drives exclude the two TAO run-out blocks from the
reported capacity. They can boot the Archlinux ISO from TAO CD.
5 others include the run-out blocks in the capacity count.
With them the kernels report wrong size by ioctl(BLKGETSIZE).
So blkid asks for the wrong end blocks.
- The kernels are far from healthy in this aspect:
If the number of blocks on the TAO CD is not divisible by 2,
then the kernel refuses to read the last valid block of the track,
if READ CAPACITY includes the run-out blocks.
I.e. number 3 from alleged end cannot be read by read(2) or mmap(2).
This block can be read by SCSI command READ(10) via ioctl(SG_IO).
So it is not a drive thing, but a kernel confusion.
Nevertheless, my 2 truth loving drives do not cause this problem.
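One possible reading of the failure with odd block counts, offered purely
as an assumption and not confirmed anywhere in this thread: if the kernel
fetches data in pages of 4096 bytes, i.e. in pairs of CD blocks, then with
an odd number of valid blocks the last valid block shares its page with
the first run-out block. The function name is hypothetical.

```shell
# $1 = number of valid (readable) CD blocks of the TAO track.
# Blocks are numbered from 0. Assumption: a 4096-byte page holds
# the CD blocks 2k and 2k+1.
last_block_page_state() {
    last=$(( $1 - 1 ))
    if [ $(( last % 2 )) -eq 0 ]; then
        echo "block $last shares a page with run-out block $(( last + 1 ))"
    else
        echo "block $last shares a page with valid block $(( last - 1 ))"
    fi
}

last_block_page_state 359062   # even count: neighbor in the page is valid
last_block_page_state 359063   # odd count: neighbor in the page is run-out
```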
I posted a proposal based on my findings to linux-scsi mailing list.
http://marc.info/?l=linux-scsi&m=145666692729714&w=2
Now this idea would need a sponsor who cannot be easily shrugged off.
Have a nice day
Thomas
Offline
Thomas, you are doing a great work. Thanks for your explanation.
Good luck!
Offline
Hi,
regrettably the insight does not bring any progress in the kernel.
Yesterday i had to learn that there is a remedy pending for the
sluggish throughput problem with multiple concurrent DVD burns.
http://marc.info/?l=linux-scsi&m=135705061804384&w=2
Submitted 3 years ago. Looks quite simple. Ignored by those who
would be in charge. Especially by the drive-by programmer who
once removed the Big Kernel Lock in sr.c but did not bother to test.
https://lkml.org/lkml/2010/9/14/338
"Still need to check whether this is safe to do."
Such patches go through. The good ones rot in the dark.
-----------------------------------------------------------------
Sorry for the rant. Easiest remedy for the problem discussed here
would be to include in the Archlinux ISOs a blkid which does not
try to read the end of device files. Like the one of Debian 8 does.
It yields valid output even on my worst drives.
Have a nice day
Thomas
Offline
This seems like a bug that has plagued whoever it should go to for quite some time. I don't think this is a DAO/TAO issue (although I will defer to the experts). I get the same error with ARCH_201603 (which I've installed Arch from several times w/o issue), ARCH_201609 and openSuSE 13.1 (which I've installed from before). All CDs were md5/shasum verified (before and after burning), and all burning was done from the command line per https://wiki.archlinux.org/index.php/Op … D.2C_or_BD (the ARCH_201609 was burned to DVD since it finally exceeded the overburn capacity of my discs).
I posted to the mailing list, but I'll recount the relevant parts here to add my confirmation to this issue. My install was to a Supermicro H8DME-2 server with twin AMD Opteron 8382 procs. After sorting some bios issues and disabling unrelated hardware, boot of the ARCH_201603 cd continued normally until the attempt to mount '/dev/disk/by-label/ARCH_201603' to '/run/archiso/bootmnt', e.g.
:: running hook [archiso_pxe_nfs]
:: Mounting '/dev/disk/by-label/ARCH_201603' to '/run/archiso/bootmnt'
Waiting 30 seconds for devices....
/dev/disk/by-label wasn't created, so reading from other posts, I attempted to manually create the directory and symlink of the install device (e.g. /dev/sr0 to /dev/disk/by-label/ARCH_201603) and then exit the maintenance shell. The corresponding dmesg output associated with the error (before and after attempting the manual fix) was:
Buffer I/O error on dev sr0 logical block 0, async page read
sr 6:0:1:0 [sr0] tag#0 UNKNOWN(0x2003) Result:hostbyte-0x00 driverbyte=0x08
sr 6:0:1:0 [sr0] tag#0 Sense Key : 0x4 [current]
sr 6:0:1:0 [sr0] tag#0 ASC=0x8 ASCQ=0x3
sr 6:0:1:0 [sr0] tag#0 CDB: opcode: 0x28 28 00 00 00 00 00 00 00 02 00
blk_update_request: I/O error, dev sr0, sector 0
isofs_fill_super: bread failed, dev=sr0, iso_blknum=16, block=16
I'm not an expert in drive errors or deciphering them, tao track run-out blocks, obscure optical drive firmware bugs or otherwise, so I'm not going to hazard a guess. However after approximately 30 installs since '09, this is the first time I've been bitten by this bug. My intent is to try a couple of different cd/dvd drives with the disks, and I'll report back if there is any progress.
Last edited by drankinatty (2016-09-21 05:31:52)
David C. Rankin, J.D.,P.E.
Offline
Hi,
it looks like your problem is not the same that i can reproduce.
> sr 6:0:1:0 [sr0] tag#0 CDB: opcode: 0x28 28 00 00 00 00 00 00 00 02 00
The drive was told to read 4 KB beginning at block 0.
(Not from block 16, as the isofs error message indicates. Shrug.)
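How the CDB bytes translate to "4 KB beginning at block 0" can be sketched
as follows. The layout used is the one of SCSI command READ(10): byte 0 is
the opcode 28h, bytes 2 to 5 are the start block address, bytes 7 and 8
are the number of blocks. The function name is made up, and the block
size of 2048 bytes is assumed as usual for CD data.

```shell
# Decode a READ(10) CDB given as ten hex bytes.
# $1 = opcode, $3..$6 = start block address, $8..$9 = number of blocks
decode_read10() {
    lba=$(( (0x$3 << 24) | (0x$4 << 16) | (0x$5 << 8) | 0x$6 ))
    count=$(( (0x$8 << 8) | 0x$9 ))
    echo "lba=$lba blocks=$count bytes=$(( count * 2048 ))"
}

decode_read10 28 00 00 00 00 00 00 00 02 00
# prints: lba=0 blocks=2 bytes=4096
```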
> sr 6:0:1:0 [sr0] tag#0 Sense Key : 0x4 [current]
> sr 6:0:1:0 [sr0] tag#0 ASC=0x8 ASCQ=0x3
The drive reported this as the error code. The SCSI specs say about it:
4 08 03 LOGICAL UNIT COMMUNICATION CRC ERROR (ULTRA-DMA/32)
Well, the part "(ULTRA-DMA/32)" is most probably obsolete, nowadays.
I'd say that the drive complains about bad cabling or a bad controller
at one side of the cabling.
The expectable effect of the problem is that the disk label (ISO Volume Id)
cannot be read and thus the /dev/disk/by-label link cannot be created.
Have a nice day
Thomas
Offline