You are not logged in.

#1 2016-07-25 21:25:16

Demokrittor
Member
Registered: 2016-07-25
Posts: 11

Computer 'nearly freezes' while compiling

Hello :-)

I need help debugging a strange phenomenon:
My colleague and I are working with OpenWRT and compile a lot.
Both of us are running arch and have problems with system stability during compilation sessions - not only OpenWRT but also the stock kernel or gcc.
Our computers (4-12 cores, different makes) seem to freeze for long periods of time - regardless of the amount of jobs (-j 1 can freeze the 12 core machine).
In order to get an idea of what's happening, I turned on some music before I started compiling: It'll run for a fraction of a second and then hang for 5 or stutter for 10 seconds and then hang for 30 seconds.
Mouse and keyboard are absolutely unusable.
Using SSH (also horribly unresponsive), I was able to observe processes running into time-outs. Process vary (kworker,Firefox,jbd2); the call seems to always be schedule.

I believe this issue might be related to: https://bbs.archlinux.org/viewtopic.php?id=168062

I'm currently compiling linux-ck because I don't bloody understand most of the stock Linux scheduler. But really, this shouldn't be happening.

Where else can I look for clues? What can I do?

hoping that one of you has a clever idea,
Leon

Here is an extract from my syslog:

Jul 25 22:53:14 ring kernel: INFO: task jbd2/sda3-8:426 blocked for more than 120 seconds.
Jul 25 22:53:14 ring kernel:       Tainted: G           O    4.6.4-1-ARCH #1
Jul 25 22:53:14 ring kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 25 22:53:14 ring kernel: jbd2/sda3-8     D ffff88041c1939f8     0   426      2 0x00000000
Jul 25 22:53:14 ring kernel:  ffff88041c1939f8 00ff880037560110 ffff88041d18b900 ffff88041b9b0e40
Jul 25 22:53:14 ring kernel:  ffff88041c194000 ffff88042fa95940 7fffffffffffffff ffffffff815c4080
Jul 25 22:53:14 ring kernel:  ffff88041c193b58 ffff88041c193a10 ffffffff815c37bc 0000000000000000
Jul 25 22:53:14 ring kernel: Call Trace:
Jul 25 22:53:14 ring kernel:  [<ffffffff815c4080>] ? bit_wait+0x60/0x60
Jul 25 22:53:14 ring kernel:  [<ffffffff815c37bc>] schedule+0x3c/0x90
Jul 25 22:53:14 ring kernel:  [<ffffffff815c6293>] schedule_timeout+0x1d3/0x260
Jul 25 22:53:14 ring kernel:  [<ffffffff812b331b>] ? queue_unplugged+0x3b/0xc0
Jul 25 22:53:14 ring kernel:  [<ffffffff810eea31>] ? ktime_get+0x41/0xb0
Jul 25 22:53:14 ring kernel:  [<ffffffff810eea31>] ? ktime_get+0x41/0xb0
Jul 25 22:53:14 ring kernel:  [<ffffffff815c4080>] ? bit_wait+0x60/0x60
Jul 25 22:53:14 ring kernel:  [<ffffffff815c2c44>] io_schedule_timeout+0xa4/0x110
Jul 25 22:53:14 ring kernel:  [<ffffffff815c409b>] bit_wait_io+0x1b/0x70
Jul 25 22:53:14 ring kernel:  [<ffffffff815c3c7b>] __wait_on_bit+0x5b/0x90
Jul 25 22:53:14 ring kernel:  [<ffffffff8116e7b9>] wait_on_page_bit+0xc9/0xe0
Jul 25 22:53:14 ring kernel:  [<ffffffff810bd4d0>] ? autoremove_wake_function+0x40/0x40
Jul 25 22:53:14 ring kernel:  [<ffffffff8116e8c3>] __filemap_fdatawait_range+0xf3/0x160
Jul 25 22:53:14 ring kernel:  [<ffffffff8116e944>] filemap_fdatawait_range+0x14/0x30
Jul 25 22:53:14 ring kernel:  [<ffffffff8116e983>] filemap_fdatawait+0x23/0x30
Jul 25 22:53:14 ring kernel:  [<ffffffffa009ecfc>] jbd2_journal_commit_transaction+0x77c/0x1910 [jbd2]
Jul 25 22:53:14 ring kernel:  [<ffffffff810a0f48>] ? finish_task_switch+0x78/0x210
Jul 25 22:53:14 ring kernel:  [<ffffffff810e779e>] ? try_to_del_timer_sync+0x5e/0x90
Jul 25 22:53:14 ring kernel:  [<ffffffffa00a523a>] kjournald2+0xca/0x260 [jbd2]
Jul 25 22:53:14 ring kernel:  [<ffffffff810bd490>] ? wake_atomic_t_function+0x60/0x60
Jul 25 22:53:14 ring kernel:  [<ffffffffa00a5170>] ? commit_timeout+0x10/0x10 [jbd2]
Jul 25 22:53:14 ring kernel:  [<ffffffff81099998>] kthread+0xd8/0xf0
Jul 25 22:53:14 ring kernel:  [<ffffffff815c7442>] ret_from_fork+0x22/0x40
Jul 25 22:53:14 ring kernel:  [<ffffffff810998c0>] ? kthread_worker_fn+0x170/0x170
Jul 25 22:53:14 ring kernel: INFO: task kworker/u16:3:10390 blocked for more than 120 seconds.
Jul 25 22:53:14 ring kernel:       Tainted: G           O    4.6.4-1-ARCH #1
Jul 25 22:53:14 ring kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Jul 25 22:53:14 ring kernel: kworker/u16:3   D ffff8803c100b748     0 10390      2 0x00000000
Jul 25 22:53:14 ring kernel: Workqueue: writeback wb_workfn (flush-8:0)
Jul 25 22:53:14 ring kernel:  ffff8803c100b748 00ff88018ff85830 ffff88041b2bd580 ffff88009e984740
Jul 25 22:53:14 ring kernel:  ffff8803c100c000 ffff88042fa15940 7fffffffffffffff ffff880037560030
Jul 25 22:53:14 ring kernel:  ffff880037560000 ffff8803c100b760 ffffffff815c37bc 0000000000000000
Jul 25 22:53:14 ring kernel: Call Trace:
Jul 25 22:53:14 ring kernel:  [<ffffffff815c37bc>] schedule+0x3c/0x90
Jul 25 22:53:14 ring kernel:  [<ffffffff815c6293>] schedule_timeout+0x1d3/0x260
Jul 25 22:53:14 ring kernel:  [<ffffffff81172085>] ? mempool_alloc_slab+0x15/0x20
Jul 25 22:53:14 ring kernel:  [<ffffffff810eea31>] ? ktime_get+0x41/0xb0
Jul 25 22:53:14 ring kernel:  [<ffffffff815c2c44>] io_schedule_timeout+0xa4/0x110
Jul 25 22:53:14 ring kernel:  [<ffffffff812b4a4e>] get_request+0x42e/0x810
Jul 25 22:53:14 ring kernel:  [<ffffffff8130da46>] ? __percpu_counter_add+0x56/0x80
Jul 25 22:53:14 ring kernel:  [<ffffffff810bd490>] ? wake_atomic_t_function+0x60/0x60
Jul 25 22:53:14 ring kernel:  [<ffffffff812b7577>] blk_queue_bio+0x127/0x400
Jul 25 22:53:14 ring kernel:  [<ffffffff812b5872>] generic_make_request+0xf2/0x1a0
Jul 25 22:53:14 ring kernel:  [<ffffffff812b5996>] submit_bio+0x76/0x160
Jul 25 22:53:14 ring kernel:  [<ffffffffa023629e>] ext4_io_submit+0x3e/0x60 [ext4]
Jul 25 22:53:14 ring kernel:  [<ffffffffa02322ef>] ext4_writepages+0xb9f/0xde0 [ext4]
Jul 25 22:53:14 ring kernel:  [<ffffffff812e0b84>] ? cfq_dispatch_requests+0xac4/0xc60
Jul 25 22:53:14 ring kernel:  [<ffffffff81271a81>] ? kernfs_path_from_node_locked+0x291/0x330
Jul 25 22:53:14 ring kernel:  [<ffffffff8117e2fe>] do_writepages+0x1e/0x30
Jul 25 22:53:14 ring kernel:  [<ffffffff812246c5>] __writeback_single_inode+0x45/0x3b0
Jul 25 22:53:14 ring kernel:  [<ffffffff81224fca>] writeback_sb_inodes+0x2ba/0x5c0
Jul 25 22:53:14 ring kernel:  [<ffffffff81225362>] __writeback_inodes_wb+0x92/0xc0
Jul 25 22:53:14 ring kernel:  [<ffffffff8122576e>] wb_writeback+0x30e/0x360
Jul 25 22:53:14 ring kernel:  [<ffffffff81225da2>] wb_workfn+0x112/0x490
Jul 25 22:53:14 ring kernel:  [<ffffffff810a0f48>] ? finish_task_switch+0x78/0x210
Jul 25 22:53:14 ring kernel:  [<ffffffff81093a05>] process_one_work+0x1e5/0x480
Jul 25 22:53:14 ring kernel:  [<ffffffff81093ce8>] worker_thread+0x48/0x4e0
Jul 25 22:53:14 ring kernel:  [<ffffffff81093ca0>] ? process_one_work+0x480/0x480
Jul 25 22:53:14 ring kernel:  [<ffffffff81093ca0>] ? process_one_work+0x480/0x480
Jul 25 22:53:14 ring kernel:  [<ffffffff81099998>] kthread+0xd8/0xf0
Jul 25 22:53:14 ring kernel:  [<ffffffff815c7442>] ret_from_fork+0x22/0x40
Jul 25 22:53:14 ring kernel:  [<ffffffff810998c0>] ? kthread_worker_fn+0x170/0x170

Jul 25 22:53:14 ring kernel: INFO: task firefox:24502 blocked for more than 120 seconds.
Jul 25 22:53:14 ring kernel:       Tainted: G           O    4.6.4-1-ARCH #1
Jul 25 22:53:14 ring kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 25 22:53:14 ring kernel: firefox         D ffff880412743978     0 24502  14340 0x00000000
Jul 25 22:53:14 ring kernel:  ffff880412743978 00ff88001db8e4a8 ffffffff8180d500 ffff8803eed9aac0
Jul 25 22:53:14 ring kernel:  ffff880412744000 ffff88042fa15940 7fffffffffffffff ffff880037560030
Jul 25 22:53:14 ring kernel:  ffff880037560000 ffff880412743990 ffffffff815c37bc 0000000000000000
Jul 25 22:53:14 ring kernel: Call Trace:
Jul 25 22:53:14 ring kernel:  [<ffffffff815c37bc>] schedule+0x3c/0x90
Jul 25 22:53:14 ring kernel:  [<ffffffff815c6293>] schedule_timeout+0x1d3/0x260
Jul 25 22:53:14 ring kernel:  [<ffffffff81172085>] ? mempool_alloc_slab+0x15/0x20
Jul 25 22:53:14 ring kernel:  [<ffffffff81172085>] ? mempool_alloc_slab+0x15/0x20
Jul 25 22:53:14 ring kernel:  [<ffffffff810eea31>] ? ktime_get+0x41/0xb0
Jul 25 22:53:14 ring kernel:  [<ffffffff815c2c44>] io_schedule_timeout+0xa4/0x110
Jul 25 22:53:14 ring kernel:  [<ffffffff812b4a4e>] get_request+0x42e/0x810
Jul 25 22:53:14 ring kernel:  [<ffffffff811e5fe6>] ? lock_page_memcg+0x16/0x80
Jul 25 22:53:14 ring kernel:  [<ffffffff810bd490>] ? wake_atomic_t_function+0x60/0x60
Jul 25 22:53:14 ring kernel:  [<ffffffff812b7577>] blk_queue_bio+0x127/0x400
Jul 25 22:53:14 ring kernel:  [<ffffffff812b5872>] generic_make_request+0xf2/0x1a0
Jul 25 22:53:14 ring kernel:  [<ffffffff812b5996>] submit_bio+0x76/0x160
Jul 25 22:53:14 ring kernel:  [<ffffffffa023629e>] ext4_io_submit+0x3e/0x60 [ext4]
Jul 25 22:53:14 ring kernel:  [<ffffffffa02322ef>] ext4_writepages+0xb9f/0xde0 [ext4]
Jul 25 22:53:14 ring kernel:  [<ffffffff815c6293>] ? schedule_timeout+0x1d3/0x260
Jul 25 22:53:14 ring kernel:  [<ffffffff8117e2fe>] do_writepages+0x1e/0x30
Jul 25 22:53:14 ring kernel:  [<ffffffff811708d6>] __filemap_fdatawrite_range+0xc6/0x100
Jul 25 22:53:14 ring kernel:  [<ffffffff8117095c>] filemap_flush+0x1c/0x20
Jul 25 22:53:14 ring kernel:  [<ffffffffa022f37c>] ext4_alloc_da_blocks+0x2c/0x90 [ext4]
Jul 25 22:53:14 ring kernel:  [<ffffffffa0227938>] ext4_release_file+0x78/0xc0 [ext4]
Jul 25 22:53:14 ring kernel:  [<ffffffff811f917f>] __fput+0x9f/0x1e0
Jul 25 22:53:14 ring kernel:  [<ffffffff811f92fe>] ____fput+0xe/0x10
Jul 25 22:53:14 ring kernel:  [<ffffffff81097c04>] task_work_run+0x84/0xa0
Jul 25 22:53:14 ring kernel:  [<ffffffff810036aa>] exit_to_usermode_loop+0xba/0xc0
Jul 25 22:53:14 ring kernel:  [<ffffffff81003bde>] syscall_return_slowpath+0x4e/0x60
Jul 25 22:53:14 ring kernel:  [<ffffffff815c72ba>] entry_SYSCALL_64_fastpath+0xa2/0xa4

Offline

#2 2016-07-26 13:05:03

shazeal
Member
From: New Zealand
Registered: 2007-06-05
Posts: 341

Re: Computer 'nearly freezes' while compiling

Its probably the disk scheduler (CFQ/Noop etc), try using the pf kernel and use BFQ, or patch BFQ manually and set as default sheduler. CK patches will probably just make it worse unless that kernel includes BFQ as well?
There are some videos showing how the stock kernel IO schedulers degrade badly with increasing numbers of IO operations, this has been going on since kernel 2.3 or so.

Offline

#3 2016-07-26 13:28:33

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,520
Website

Re: Computer 'nearly freezes' while compiling

Welcome to the forums.  I agree with the above that I/O saturation may be an issue - but even the stock kernel shouldn't be that bad at managing it.  The -j1 test is good to rule out cpu saturation, but there could also be a ram issue.  How much ram is available?  How much is free/used while compiling (output of `free -m` when the slowdown starts)?


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman

Offline

#4 2016-07-26 23:06:04

shazeal
Member
From: New Zealand
Registered: 2007-06-05
Posts: 341

Re: Computer 'nearly freezes' while compiling

Trilby wrote:

Welcome to the forums.  I agree with the above that I/O saturation may be an issue - but even the stock kernel shouldn't be that bad at managing it.  The -j1 test is good to rule out cpu saturation, but there could also be a ram issue.  How much ram is available?  How much is free/used while compiling (output of `free -m` when the slowdown starts)?

The stock kernel with 100% CPU saturation  (ie with -j5 on a 4 core cpu, and heavy compile loads, eg firefox/libreoffice etc) , swap disabled and running within memory limits has huge IO waits, there was a 100+ page thread on this very problem on the Gentoo forums around the 2.6.27 kernel era. It was supposedly solved in 3.15ish with blk-mq (for SATA it means adding scsi_mod.use_blk_mq=1 to the command line).  blk-mq bypasses the default scheduler (CFQ/Deadline etc, and iirc sets the /sys/block/sdx/queue/scheduler to none), and threads the requests across multiple CPUs/cores. The stock kernel as of 4.7 still exhibits the same IO wait problem if the blk-mq feature is not enabled however, and from what I remember of testing it did not work nicely with rotational devices.

I still personally use the BFQ scheduler without the blk-mq as it has a super low overhead and virtually 0 IO wait even on huge compile tasks, and it works nicely across all devices SSD/rotational/flash etc.

EDIT: For reference, here is a video documenting the sort of thing that is happening with the stock kernel schedulers vs BFQ https://www.youtube.com/watch?v=1cjZeaCXIyM

Last edited by shazeal (2016-07-26 23:10:17)

Offline

#5 2016-07-29 09:24:39

Demokrittor
Member
Registered: 2016-07-25
Posts: 11

Re: Computer 'nearly freezes' while compiling

Thank you for your pointing me to I/O :-)

The (unreliable) cpu-history of uptime confirms that the CPU anywhere near being saturated.
BFS didn't change as much as I hoped.
I wasn't yet able to try the BFQ (need time) but replacing the optical drive on my laptop with a SSD sounds like a practical solution.

For the sake of science, the results of any empiric comparisons will be posted here ;-)

Offline

#6 2016-07-29 10:10:58

kokoko3k
Member
Registered: 2008-11-14
Posts: 2,393

Re: Computer 'nearly freezes' while compiling

I'd first try the stock cfq scheduler, but starting the compilation process with something like:
#nice -n 19 ionice -c3 make


Help me to improve ssh-rdp !
Retroarch User? Try my koko-aio shader !

Offline

Board footer

Powered by FluxBB