iotop: khugepaged at 99.99% (2.6.38.X)

hamelg · 2011-05-24 18:37:16

Hello,

today I had a task hung forever. Here is the behavior :

few messages sent by kernel:

[40802.952619] INFO: task plasma-desktop:13028 blocked for more than 120 seconds.
[40802.952621] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.                                                                                
[40802.952623] plasma-desktop  D c5207f2c     0 13028      1 0x00000000             
[40802.952626]  c5207f3c 00000086 00000002 c5207f2c 00000000 00000101 00000024 00000000                                                                                 
[40802.952630]  c111167e c5206000 00000000 c5207ecc c11ac11d f5406380 c14c4380 f2c988a0                                                                                 
[40802.952634]  f2c98a64 c11ac164 c5207f60 c14c4380 f5406380 f2c988a0 c142df60 bfb60c68                                                                                 
[40802.952638] Call Trace:                                                          
[40802.952643]  [<c111167e>] ? do_filp_open+0x14e/0x6a0                             
[40802.952647]  [<c11ac11d>] ? __copy_to_user_ll+0x5d/0x70                          
[40802.952649]  [<c11ac164>] ? copy_to_user+0x34/0x50                               
[40802.952652]  [<c131b885>] rwsem_down_failed_common+0x95/0xe0                     
[40802.952655]  [<c131b8e2>] rwsem_down_write_failed+0x12/0x20                      
[40802.952657]  [<c131b94a>] call_rwsem_down_write_failed+0x6/0x8                   
[40802.952660]  [<c131b0f5>] ? down_write+0x15/0x17                                 
[40802.952663]  [<c10e12e0>] sys_mmap_pgoff+0xd0/0x1c0                              
[40802.952665]  [<c10037df>] sysenter_do_call+0x12/0x28                             
[40802.952668]  [<c1310000>] ? amd_64_threshold_cpu_callback+0xbc/0x23c

This task has ended after some times.

iotop saw me two another tasks totally hang :

Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND              
 
   26 be/7 root        0.00 B/s    0.00 B/s  0.00 % 99.99 % [khugepaged]
13199 be/4 hamelg      0.00 B/s    0.00 B/s  0.00 % 96.27 % convert -de~ar-pluie.gif

$ ps -lp 13199                                                            
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD               
0 D  2000 13199 13196  0  80   0 - 24709 conges ?        00:00:00 convert

I had to reboot.

It's the first time i see that.

my kernel is :

$ yaourt -Q kernel26
core/kernel26 2.6.38.6-2 (base)
$ cat /proc/version 
Linux version 2.6.38-ARCH (tobias@T-POWA-LX) (gcc version 4.6.0 20110429 (prerelease) (GCC) ) #1 SMP PREEMPT Fri May 13 07:54:18 UTC 2011

Last edited by hamelg (2011-05-31 21:39:27)

hamelg · 2011-05-31 16:22:38

The bug has still occured yesterday.
it looks like this one :
https://lkml.org/lkml/2011/4/20/435

Last edited by hamelg (2011-05-31 21:34:59)

hamelg · 2011-06-17 20:43:38

Same behavior with 2.6.39.1
It seems there are some severe issues with the mm subsystem since 2.6.38 :

http://www.mentby.com/Group/linux-kerne … -to-0.html
https://bugzilla.kernel.org/show_bug.cgi?id=35512

Here is a typical stack of a hung task :

[<c10e47b7>] congestion_wait+0x57/0xf0
[<c1064030>] ? abort_exclusive_wait+0x80/0x80
[<c10ff6c1>] compact_zone+0x731/0x760
[<c10ff76e>] compact_zone_order+0x7e/0xa0
[<c10ff835>] try_to_compact_pages+0xa5/0xe0
[<c10d32c3>] __alloc_pages_direct_compact+0x83/0x170
[<c10d3791>] __alloc_pages_nodemask+0x3e1/0x7b0
[<c110d128>] ? __mem_cgroup_commit_charge+0x78/0x110
[<c110a56f>] do_huge_pmd_anonymous_page+0x10f/0x2f0
[<c101ea7b>] ? lapic_next_event+0x1b/0x20
[<c10ec035>] handle_mm_fault+0x175/0x210
[<c1028a30>] ? vmalloc_sync_all+0x120/0x120
[<c1028b43>] do_page_fault+0x113/0x420
[<c10687e1>] ? hrtimer_interrupt+0x141/0x260
[<c104cedc>] ? irq_exit+0x3c/0x90
[<c134c84b>] ? smp_apic_timer_interrupt+0x5b/0x8a
[<c1028a30>] ? vmalloc_sync_all+0x120/0x120
[<c134be93>] error_code+0x67/0x6c

a possible workaround is to disable the khugepaged kernel thread :
echo never >/sys/kernel/mm/transparent_hugepage/enabled

am i alone to encounter this bug ?
I am waiting the pushed patches reach the mainline ...

diman · 2011-08-04 19:55:14

I have the same problem, but my knowledge of kernel is not so powerful to solve this.

Arch Linux

#1 2011-05-24 18:37:16

iotop: khugepaged at 99.99% (2.6.38.X)

#2 2011-05-31 16:22:38

Re: iotop: khugepaged at 99.99% (2.6.38.X)

#3 2011-06-17 20:43:38

Re: iotop: khugepaged at 99.99% (2.6.38.X)

#4 2011-08-04 19:55:14

Re: iotop: khugepaged at 99.99% (2.6.38.X)

Board footer