Hey y'all,
I have one of those silly USB wireless dongles with the Windows drivers and such on the storage device. Anyway, anytime I try to boot Arch with the dongle plugged in, my system freezes--this is in the first few seconds after booting grub.
Suggestions? I've tried a few things without much luck, and I can tell you what I've done.
Best,
Oh my SQL!!!
Offline
Do you have any logs with the error messages from when the system hangs just after boot? You can just take a photo of the screen, upload it to an image hosting site, and post a link to it here. We will take a look at the messages in the picture.
Offline
Oh yeah, thanks--I'm able to get dmesg:
Here is what it does in Arch. Note that at 9 seconds, this is where I unplug the USB device and the system starts to boot normally:
[ 0.234387] ACPI: (supports S0 S3 S4 S5)
[ 0.234388] ACPI: Using IOAPIC for interrupt routing
[ 0.234409] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[ 9.928199] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
I've tried noacpi but the boot is even more messy, and quite a bit slower. If there were some way to signal to ACPI "just don't look at that one boot device for 20 seconds" that would be ideal.
And in case it's interesting, here's the last line before the freeze in Linux Mint.
[ 0.253079] ACPI: No dock devices found.
Offline
What happens if you use the kernel parameter that the log suggests?
Offline
Right, forgot to mention: exactly the same thing. I assume you're referring to pci=nocrs. That said, this error does not occur on every single boot, so it's hard to determine how much something is helping, but I will keep an eye out. I tried a few times with no success, but I'll add it to my boot configuration and let's see how that goes.
I've been reading here:
http://www.jonmasters.org/blog/2007/12/ … -is-wrong/
It just seems like somewhere along the line this USB drive is screwing up some kind of interrupt table. I do have a multi-core processor, but this is all way out of my league.
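Since the freeze is intermittent, it helps to rule out the possibility that a parameter such as pci=nocrs never reached the kernel at all. The sketch below parses a kernel command line and checks for a given option. The sample string is hypothetical; on a running Linux system the real value can be read from /proc/cmdline. The helper names are made up for illustration.

```python
from pathlib import Path


def parse_cmdline(cmdline):
    """Map each parameter name to the list of comma-separated options given
    for it; bare flags such as 'nousb' map to an empty list.
    Simplification: quoted values containing spaces are not handled."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        options = value.split(",") if sep else []
        params.setdefault(name, []).extend(options)
    return params


def has_param(cmdline, name, option=None):
    """True if `name` is present (and, when given, carries `option`)."""
    params = parse_cmdline(cmdline)
    return name in params and (option is None or option in params[name])


# Hypothetical command line, for illustration only.
SAMPLE = "BOOT_IMAGE=/vmlinuz-linux root=/dev/sda2 rw pci=nocrs nousb"
print(has_param(SAMPLE, "pci", "nocrs"))
print(has_param(SAMPLE, "acpi", "off"))

# On a running Linux system, check what the kernel was actually booted with.
proc = Path("/proc/cmdline")
if proc.exists():
    print("pci=nocrs active:", has_param(proc.read_text(), "pci", "nocrs"))
```

Options are split on commas because the pci= parameter accepts several at once (for example pci=nocrs,noacpi).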
Offline
ohmysql wrote: Here is what it does in Arch. Note that at 9 seconds, this is where I unplug the USB device and the system starts to boot normally:
So disconnecting the offending dongle unfreezes the system and makes it continue booting as if nothing had happened? That's beyond weird. Does it always freeze in this exact place?
Maybe disabling "legacy usb" in BIOS would change something, but this means no USB booting and no USB keyboards in BIOS.
Offline
Ya, you got it! Sometimes the system just hard freezes at that point--unplugging does nothing whatsoever and I'm forced to reboot. Usually if I catch it fast (~10 seconds) it will continue to boot, and sometimes the system completely ignores it.
Also, the system is doing something because before I upgraded my CPU cooler to a much better one, it would also work so hard it would try to melt my CPU!
What you're suggesting about disabling legacy USB in the BIOS might be a thought, but I use a USB keyboard to decide between 4 different distros I've got going on my hard drive, so that is definitely not the most practical option. (I'm pretty sure this is clear, but I'll say it in parentheses: this hard freeze is happening after GRUB, in the first milliseconds of the kernel loading.) One thing that would help me understand: is it the Linux kernel per se or the initramfs (or whatever that's called in Arch--I might be using the Mageia lingo) that's freezing at this point?
Thanks--we'll get there, I have a feeling.
I'll try what you suggested for fun but my hopes are not stellar. And I'll keep trying nocrs.
Offline
FYI as far as I can tell, turning off legacy USB in the BIOS did help... although like I said, it did completely disable my keyboard. Also, it's worth noting that passing "nousb" on the linux line in GRUB did absolutely nothing... which I thought was surprising. I don't need anything USB for the entirety of the boot process (after GRUB until the greeter) so... if there's some way to disable USB there, that would be great.
Here is the log under pci=nocrs (I'll leave a few lines before the stall too, in case those are useful). 19 seconds is obviously where I unplug the USB dongle and it starts to boot normally again:
[ 0.223963] PCI: Using configuration type 1 for base access
[ 0.224232] mtrr: your CPUs had inconsistent variable MTRR settings
[ 0.224233] mtrr: probably your BIOS does not setup all CPUs.
[ 0.224234] mtrr: corrected configuration.
[ 0.237514] ACPI: Added _OSI(Module Device)
[ 0.237517] ACPI: Added _OSI(Processor Device)
[ 0.237518] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 0.237519] ACPI: Added _OSI(Processor Aggregator Device)
[ 0.239094] ACPI: Actual Package length (1) is larger than NumElements field (0), truncated
[ 0.239099] ACPI: Actual Package length (1) is larger than NumElements field (0), truncated
[ 0.239119] ACPI: Actual Package length (1) is larger than NumElements field (0), truncated
[ 0.239123] ACPI: Actual Package length (1) is larger than NumElements field (0), truncated
[ 0.241024] ACPI: Interpreter enabled
[ 0.241031] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S1_] (20150619/hwxface-580)
[ 0.241034] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S2_] (20150619/hwxface-580)
[ 0.241046] ACPI: (supports S0 S3 S4 S5)
[ 0.241048] ACPI: Using IOAPIC for interrupt routing
[ 0.241068] PCI: Ignoring host bridge windows from ACPI; if necessary, use "pci=use_crs" and report a bug
[ 19.229828] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
Last edited by ohmysql (2015-12-09 01:44:16)
Offline
It looks like having this dongle enabled by BIOS causes PCI detection to stall. It happens before USB drivers start so nousb can't help. Do those other distributions work, btw?
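The stall shows up as a jump in the dmesg timestamps. As a rough sketch (not a tool anyone in this thread used), the snippet below finds the largest gap between consecutive timestamped lines, using the last three lines of the log quoted above as input. The function name is made up for illustration.

```python
import re

# Matches the "[ seconds.micros] message" prefix that dmesg prints.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def largest_gap(lines):
    """Return (gap_seconds, message_before, message_after) for the biggest
    jump between consecutive timestamped lines, or None if there are fewer
    than two such lines."""
    previous = None
    best = None
    for line in lines:
        match = TIMESTAMP.match(line.strip())
        if not match:
            continue  # skip anything without a timestamp prefix
        current = (float(match.group(1)), match.group(2))
        if previous is not None:
            gap = current[0] - previous[0]
            if best is None or gap > best[0]:
                best = (gap, previous[1], current[1])
        previous = current
    return best


# Last three lines of the pci=nocrs log from the post above.
LOG = """\
[ 0.241048] ACPI: Using IOAPIC for interrupt routing
[ 0.241068] PCI: Ignoring host bridge windows from ACPI; if necessary, use "pci=use_crs" and report a bug
[ 19.229828] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
"""

gap, before, after = largest_gap(LOG.splitlines())
print(f"{gap:.1f} s stall between:\n  {before}\n  {after}")
```

The message printed after the gap is the first thing the kernel managed to log once it got unstuck, so the hang itself sits somewhere between the two reported lines.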
Offline
Ah! You may not be aware how helpful comments like that are but I literally have had no idea what's happening or why, and these are starting to give me a clue. FYI I only have a PCI-e graphics card; otherwise, I don't think I have anything in the PCI slots. I'm open to disabling PCI--pretty sure that would have no effect on my system, so long as PCI-e is handled differently. (I told you your comment might be helpful!)
In any case, no, I have the same exact problem on every distro I've ever installed (Mint, Ubuntu, Mageia, Arch). But that said, some are more or less likely to completely freeze up, which I've never understood. I understand almost nothing about IOAPIC and ACPI, but I have a feeling we're getting closer.
Offline
Unfortunately, we are not getting anywhere. By PCI I mean everything made to look like PCI to the software, which includes PCIe and even some devices integrated in the CPU. There won't be much left if you disable that.
You are dealing with some obscure bug in PCI drivers or motherboard firmware (or both). If other kernels don't work, then maybe a BIOS update would help. Or using a different USB controller (USB2 vs. USB3, maybe a PCI-USB board). Heck, if you have a USB hub you can also try that.
Offline
Maybe I should post this wayyyy upstream? Like I don't even know, where do you report a bug in the kernel?
Offline
I think BIOS is the more likely culprit here. I'd try
1. updating the BIOS
2. checking if the same happens with other flash storage (maybe?)
3. checking if the same happens on other machines (my guess is not)
4. googling whether others have this issue with the same exact device (as above)
5. testing with other OSes
before bothering anyone about that.
If you know C you may want to trace execution through drivers/acpi/scan.c:acpi_scan_init as it seems to be getting stuck there. Whether you know C or not, you'll likely end up needing to do something like that if you decide to report it because no one has a crystal ball to know what's going on in this broken machine.
Last edited by mich41 (2015-12-09 19:09:48)
Offline
Hmm, ok. Well, good to know how screwed I am
I don't think there are any BIOS updates available, unless I'm missing something. This is my motherboard and pretty sure that's my BIOS.
http://www.gigabyte.com/products/produc … =3726#bios
Looks like no upgrade is available.
I have tried with a few other USB storage devices such as an external drive and I don't think I've had that problem. But I could try more extensively, so ok, I will. I will also try a few other machines.
Hmm--I am willing to install another OS--did you have one in mind, like Unix or something? Not sure I have access to anything Windows but maybe I can try that too.
Another thought--could the content of the wireless dongle (a Cisco AM10) be something to try changing? For instance, I could format it different ways or put stuff on it. But it sounds like from what you're saying that the system is just figuring out what's on.
I don't speak C but I will gladly try to figure out what's happening in the files you mentioned. Thanks.
And you think there's no udev rule that could help because it's too early in the process, right?
I'm just trying to think of workarounds. Like I know kernels and initrds and stuff are modular, so it has me wondering what mods I can install or uninstall.
So you seem to think that toying with ACPI, IOAPIC, toying with AHCI or the IRQs wouldn't do me any good?
Also, can you tell me more about using a "different USB controller"? I googled a bit and didn't understand what would be involved: just a change in software?
One clue I'm getting is that the system does seem to freeze around the IOAPIC moment, which my very rudimentary googling suggests involves routing requests to a multicore processor. I wonder if part of the reason for the freeze is that basically the system is setting up different priority streams (I forget what that's called, but see the article below) for the processor and maybe something hangs or gets dropped there. I was reading here:
http://www.jonmasters.org/blog/2007/12/ … -is-wrong/
Thanks!
PS Here's another idea:
Could I chainboot? That is, get out of the BIOS system, load certain drivers, then throw up another GRUB screen? That would allow me to turn off USB through the BIOS but still be able to choose my Linux distro at the GRUB screen.
Last edited by ohmysql (2015-12-09 21:02:38)
Offline
ohmysql wrote: Hmm--I am willing to install another OS--did you have one in mind, like Unix or something? Not sure I have access to anything Windows but maybe I can try that too.
Don't bother, I thought that maybe you had something else installed on this machine (or on others, if the problem is present elsewhere).
ohmysql wrote: Another thought--could the content of the wireless dongle (a Cisco AM10) be something to try changing? For instance, I could format it different ways or put stuff on it. But it sounds like from what you're saying that the system is just figuring out what's on.
I suspect that it wouldn't matter and that this memory is read only to begin with.
ohmysql wrote: I don't speak C but I will gladly try to figure out what's happening in the files you mentioned. Thanks.
Well, if you understand enough to add a few printk calls, recompile, and reboot, you are good to go.
I wonder if booting with pci=noacpi or even acpi=off would help, btw, as one possible reason might be ACPI code waiting for some stupid thing.
The hang happens before userspace starts, so udev is irrelevant.
ohmysql wrote: Also, can you tell me more about using a "different USB controller"? I googled a bit and didn't understand what would be involved: just a change in software?
If you have USB2 and USB3 ports, they are connected to different controllers of which very likely only one is affected by bugs. Also, you can get a PCI based USB controller which won't be touched by BIOS and probably won't cause trouble.
And no, I don't think there are USB keyboard drivers for grub. OTOH, what may be possible, is disabling USB storage support in BIOS. Sometimes there is such option, somewhere near "legacy usb".
Offline
You may need to help me with this part: "drivers/acpi/scan.c:acpi_scan_init". I didn't understand what you're saying -- you mean in /lib/modules/$(uname -r)/?
Ok, and then I don't see any file by that name--scan.c (either in Arch or Mint).
But I'll look into the rest of it tomorrow!
Offline
You might try more extreme measures. You might try removing certain USB modules from the initrd. For example, usb_storage. That module is used for storage media. Perhaps if you built an initrd that did not have that module[1] you could boot your system, and still have USB hid capability. Of course, when the kernel boots, that module may be instantiated and it is not clear what would happen then. It may need to be blacklisted in your install as well.
No guarantees, but it is a different tack.
[1] I do not know for certain that this module is even in the initrd. Nor am I positive that this is the offending module.
Nothing is too wonderful to be true, if it be consistent with the laws of nature -- Michael Faraday
The shortest way to ruin a country is to give power to demagogues.— Dionysius of Halicarnassus
---
How to Ask Questions the Smart Way
Offline
You might try removing certain USB modules from the initrd.
It hangs during PCI subsystem initialization, see dmesg in post #8. USB drivers don't matter, udev doesn't matter, nothing matters.
It's just the kernel blob and BIOS.
ohmysql wrote: You may need to help me with this part: "drivers/acpi/scan.c:acpi_scan_init" I didn't understand what you're saying -- you mean in /lib/modules/$(uname -r)/?
Well, you haven't said anything about the BIOS options and boot parameters I suggested. Those would be a lot easier, if they work.
And to answer your question, you need the source code. Go to https://kernel.org and download the tar.xz for the version closest to your running kernel.
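To work out which tarball is "closest to your running kernel", the distro suffix has to be stripped from the output of uname -r. A minimal sketch follows. The release strings are hypothetical examples, and the cdn.kernel.org directory layout is an assumption that holds for 3.x and later kernels as far as I know, so double-check the link on kernel.org before relying on it.

```python
import re


def upstream_version(release):
    """Strip distro suffixes, e.g. '4.2.5-1-ARCH' -> '4.2.5'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    # The first release of a series is published as 'x.y', not 'x.y.0'.
    if patch in (None, "0"):
        return f"{major}.{minor}"
    return f"{major}.{minor}.{patch}"


def tarball_url(release):
    """Guess the download URL (assumed layout, 3.x kernels and later)."""
    version = upstream_version(release)
    major = version.split(".")[0]
    return (
        f"https://cdn.kernel.org/pub/linux/kernel/v{major}.x/"
        f"linux-{version}.tar.xz"
    )


# Hypothetical release strings, for illustration only.
for release in ("4.2.5-1-ARCH", "3.19.0-32-generic"):
    print(release, "->", tarball_url(release))
```

For the running system, pass in platform.release() from the standard library, which returns the same string as uname -r. Note that distro kernels carry extra patches, so the upstream source is only an approximation of what is actually running.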
Offline
But, removing the device allows the process to continue. Sometimes.
Offline
OK, maybe I wasn't fully clear. PCI initialization happens early, when there are no USB drivers, no initramfs, no userspace, no udev, and no web browsers either.
Offline
FYI Mich41, I am able to write on the USB wireless dongle, I can format or change what's on it. But like you, I suspect it would make no difference.
A few preliminary results:
pci=nocrs clearly does very little to help me, and I observed occasional freezes on Mint and Arch.
pci=noacpi did nothing on Arch. I didn't test it with Mint since the results were non-existent with Arch. I will try twice more, just in case.
noacpi did seem to bypass the USB dongle on the first try but then I had mega-irq problems. Then I tried noacpi irqpoll and that worked for one of my Arch installs but the other one still hangs. When I unplugged the dongle, the system resumed. So it does seem that noacpi might be one partial solution, but it does seem to be not the entire solution.
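Because the freeze does not happen on every boot, one or two attempts per parameter say very little. One way to keep score is to log every boot attempt and compare freeze rates per set of kernel parameters. The sketch below does that. The trial data is entirely hypothetical and is NOT a record of the results reported in this thread.

```python
from collections import defaultdict


def freeze_rates(trials):
    """trials: iterable of (kernel_params, froze) pairs.
    Returns {kernel_params: (freezes, boots, freeze_fraction)}."""
    counts = defaultdict(lambda: [0, 0])
    for params, froze in trials:
        counts[params][0] += int(froze)  # number of boots that froze
        counts[params][1] += 1           # total boots with these parameters
    return {p: (f, n, f / n) for p, (f, n) in counts.items()}


# Hypothetical log, one entry per boot attempt (made-up values).
TRIALS = [
    ("", True), ("", True), ("", False),
    ("pci=nocrs", True), ("pci=nocrs", False),
    ("acpi=off irqpoll", False), ("acpi=off irqpoll", False),
]

for params, (freezes, boots, rate) in freeze_rates(TRIALS).items():
    print(f"{params or '(defaults)':<20} {freezes}/{boots} froze ({rate:.0%})")
```

With only a handful of boots per configuration the percentages are noisy, so this is more of a bookkeeping aid than a statistical test.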
Also, does turning off ACPI come with risks and if so, how can I evaluate if I'm having any problems? My graphics card does seem to be working normally.
I have googled and I see no evidence that others are having the same problem with the Cisco AM10. That makes me think it might be Gigabyte specific. Seems I'm having the opposite problem as this guy:
http://chromasoft.blogspot.ca/2010/10/s … -from.html
I have also been completely unable to boot from USB.
Now that is just !@#$ing ironic!
Ok, I've confirmed that my BIOS has no updated version (M68MT-S2P rev. 3.0). I stupidly turned off all USB in my BIOS, which meant I couldn't do anything on my computer. So then I reset the BIOS by shorting it and now I'm back.
I'll try with other flash storage, I don't have other machines handy so I can't check very easily if this happens to other machines using the same device.
Thanks!
Like, oh my SQL!!! (Said with your best valley girl imitation)
Last edited by ohmysql (2015-12-15 03:25:30)
Offline
ohmysql wrote: noacpi did seem to bypass the USB dongle on the first try but then I had mega-irq problems.
I'm not quite sure what exactly "mega-irq problems" are.
ohmysql wrote: Then I tried noacpi irqpoll and that worked for one of my Arch installs but the other one still hangs.
Different kernel version is my guess.
ohmysql wrote: Also, does turning off ACPI come with risks and if so, how can I evaluate if I'm having any problems?
It's not exactly turning ACPI off, just not using it for PCI configuration. To turn it off completely you'd need acpi=off. But then you lose some power management, including the ability to power off by software.
ohmysql wrote: http://chromasoft.blogspot.ca/2010/10/s … -from.html
I have also been completely unable to boot from USB
Now that is just !@#$ing ironic!
Did you try that trick of plugging the device in during POST? If this causes the device to suddenly appear in the boot menu and also allows Linux to boot correctly, then I think it's pretty obvious that the BIOS is screwing something up here.
Offline
By "mega-IRQ problems" I mean that different IRQ #s (e.g. 10, 11) say "no one answered." In other words, part of the processor is turned off, if I'm understanding correctly. Then I was dropped to an emergency shell!
That is a problem, I'd say.
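Assuming the "no one answered" messages are the kernel's usual "nobody cared" reports for unhandled interrupts (the thread does not quote them, so this is a guess), the affected IRQ numbers can be pulled out of a saved dmesg like this. The sample excerpt is hypothetical. As far as I understand it, that message means the kernel disabled one interrupt line, not part of the CPU.

```python
import re

# Kernel message for an interrupt that no handler claimed.
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")


def unhandled_irqs(dmesg_text):
    """Return the sorted, de-duplicated IRQ numbers reported as 'nobody cared'."""
    return sorted({int(n) for n in NOBODY_CARED.findall(dmesg_text)})


# Hypothetical excerpt: the thread does not quote the actual messages.
SAMPLE = """\
[   12.101000] irq 10: nobody cared (try booting with the "irqpoll" option)
[   12.101050] Disabling IRQ #10
[   14.300000] irq 11: nobody cared (try booting with the "irqpoll" option)
[   14.300040] Disabling IRQ #11
"""

print(unhandled_irqs(SAMPLE))
```

Comparing that list with /proc/interrupts on a good boot shows which devices sit on those lines.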
Alas, pci=noacpi did nothing on either kernel, as far as I can tell. But, in good news, when I do noacpi and irqpoll in my kernel line, I have to say: my computer boots very quickly!
Apparently this is at the expense of some serious power management stuff. Sigh. Well, let me try that POST trick! But I honestly don't care about booting from USB.
Offline
For the POST thing, I honestly don't even understand. When I look at my boot options, USB is always listed, whether this thing is plugged in or not. I don't think I have any other devices that could be confused for a storage device. And when I get to my GRUB menu and drop to the terminal to look for a USB key with the dongle plugged in, GRUB finds nothing. But I'm pretty sure GRUB does find normal storage devices.
So I have not had any luck reproducing the problem with other devices nor can I find this device as a storage device through the grub command line. But GRUB is finding other USB storage (I believe). So that has me wondering how the system is reading this thing. That makes this situation stranger, does it not?
I think the next step will be the kernel download you mentioned. I'd really like to get to the bottom of this. I wonder if just always doing irqpoll might help.
Best,
Like, oh my SQL!!!
Last edited by ohmysql (2015-12-16 05:57:22)
Offline
Just to reiterate, pci=noacpi and pci=nocrs don't seem to do much. What does seem to be working is acpi=off combined with irqpoll, but as you mentioned, it's an ugly workaround. If we can get more information about what's happening, we may be able to find something more elegant. I'll be looking into the kernel!
Offline