page allocation failure; but I have ~ 13G available

Spider.007 · 2016-10-02 10:43:21

I get batches of page allocation failures every few weeks, and I'm not sure how to prevent them. Here's one example:

[  +0.000081] swapper/0: page allocation failure: order:2, mode:0x2084020(GFP_ATOMIC|__GFP_COMP)
[  +0.000001] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O    4.7.4-1-ARCH #1
[  +0.000001]  0000000000000286 6774149c59705725 ffff88087f203b30 ffffffff812ecb02
[  +0.000001]  0000000000000000 0000000000000002 ffff88087f203bc0 ffffffff8117b151
[  +0.000001]  0208402000000001 0000000000000246 0000000000000100 0000000000000020
[  +0.000001] Call Trace:
[  +0.000001]  <IRQ>  [<ffffffff812ecb02>] dump_stack+0x63/0x81
[  +0.000002]  [<ffffffff8117b151>] warn_alloc_failed+0x101/0x160
[  +0.000001]  [<ffffffff8117b6f1>] __alloc_pages_nodemask+0x541/0xf30
[  +0.000001]  [<ffffffff811cb885>] alloc_pages_current+0x95/0x140
[  +0.000002]  [<ffffffffa000099d>] bnad_rxq_refill_page+0x18d/0x270 [bna]
[  +0.000002]  [<ffffffffa0002025>] bnad_napi_poll_rx+0x725/0x9c0 [bna]
[  +0.000001]  [<ffffffff814cd22e>] net_rx_action+0x21e/0x3a0
[  +0.000001]  [<ffffffff815da286>] __do_softirq+0xe6/0x2ec
[  +0.000001]  [<ffffffff8107f973>] irq_exit+0xa3/0xb0
[  +0.000001]  [<ffffffff815d9fb4>] do_IRQ+0x54/0xd0
[  +0.000002]  [<ffffffff815d80c2>] common_interrupt+0x82/0x82
[  +0.000000]  <EOI>  [<ffffffff8148ece4>] ? cpuidle_enter_state+0x134/0x2e0
[  +0.000002]  [<ffffffff8148ecbf>] ? cpuidle_enter_state+0x10f/0x2e0
[  +0.000001]  [<ffffffff8148eec7>] cpuidle_enter+0x17/0x20
[  +0.000001]  [<ffffffff810bd24a>] call_cpuidle+0x2a/0x50
[  +0.000001]  [<ffffffff810bd668>] cpu_startup_entry+0x2d8/0x390
[  +0.000001]  [<ffffffff815caa14>] rest_init+0x84/0x90
[  +0.000001]  [<ffffffff8190efeb>] start_kernel+0x443/0x464
[  +0.000001]  [<ffffffff8190e120>] ? early_idt_handler_array+0x120/0x120
[  +0.000001]  [<ffffffff8190e2db>] x86_64_start_reservations+0x2f/0x31
[  +0.000001]  [<ffffffff8190e429>] x86_64_start_kernel+0x14c/0x16f
[  +0.000000] Mem-Info:
[  +0.000002] active_anon:4245143 inactive_anon:33419 isolated_anon:0
               active_file:2244270 inactive_file:930883 isolated_file:0
               unevictable:10760 dirty:20277 writeback:0 unstable:0
               slab_reclaimable:295871 slab_unreclaimable:53848
               mapped:337005 shmem:335954 pagetables:14202 bounce:0
               free:216068 free_pcp:353 free_cma:0
[  +0.000002] Node 0 DMA free:15388kB min:124kB low:152kB high:180kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15996kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[  +0.000002] lowmem_reserve[]: 0 1369 32087 32087
[  +0.000001] Node 0 DMA32 free:126972kB min:11184kB low:13980kB high:16776kB active_anon:833148kB inactive_anon:5060kB active_file:92948kB inactive_file:56252kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1478300kB managed:1402648kB mlocked:0kB dirty:352kB writeback:0kB mapped:18708kB shmem:14516kB slab_reclaimable:196996kB slab_unreclaimable:10136kB kernel_stack:624kB pagetables:780kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[  +0.000003] lowmem_reserve[]: 0 0 30717 30717
[  +0.000001] Node 0 Normal free:721912kB min:250832kB low:313540kB high:376248kB active_anon:16147424kB inactive_anon:128616kB active_file:8884132kB inactive_file:3667280kB unevictable:43040kB isolated(anon):0kB isolated(file):0kB present:31981568kB managed:31454912kB mlocked:44600kB dirty:80756kB writeback:0kB mapped:1329312kB shmem:1329300kB slab_reclaimable:986488kB slab_unreclaimable:205256kB kernel_stack:13136kB pagetables:56028kB unstable:0kB bounce:0kB free_pcp:1408kB local_pcp:240kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[  +0.000003] lowmem_reserve[]: 0 0 0 0
[  +0.000001] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 3*64kB (U) 0*128kB 1*256kB (U) 1*512kB (U) 0*1024kB 1*2048kB (M) 3*4096kB (M) = 15388kB
[  +0.000005] Node 0 DMA32: 1219*4kB (UMEH) 1098*8kB (UME) 1110*16kB (UM) 847*32kB (UM) 538*64kB (UM) 266*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 127004kB
[  +0.000005] Node 0 Normal: 175272*4kB (UEH) 2607*8kB (UH) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 721960kB
[  +0.000004] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[  +0.000001] Node 0 hugepages_total=64 hugepages_free=64 hugepages_surp=0 hugepages_size=2048kB
[  +0.000000] 3512011 total pagecache pages
[  +0.000001] 0 pages in swap cache
[  +0.000000] Swap cache stats: add 0, delete 0, find 0/0
[  +0.000001] Free swap  = 0kB
[  +0.000000] Total swap = 0kB
[  +0.000001] 8368966 pages RAM
[  +0.000000] 0 pages HighMem/MovableOnly
[  +0.000001] 150601 pages reserved
[  +0.000000] 0 pages hwpoisoned

I've tried these sysctl settings but they seem to have little effect:

vm.min_free_kbytes = 262144
vm.nr_hugepages = 64
vm.nr_overcommit_hugepages = 32

When this happens, free reports:

              total        used        free      shared  buff/cache   available
Mem:          32102       16391         829        1312       14881       13236
Swap:             0           0           0

seth · 2016-10-02 13:04:20

This is most likely not related to available memory, but to interrupt handling.

Does the error come alongside a "The following is only an harmless informational message." header?

like:

The following is only an harmless informational message.
Unless you get a _continuous_flood_ of these messages it means
everything is working fine. Allocations from irqs cannot be
perfectly reliable and the kernel is designed to handle that.

*Usually* this (and here it is in the network stack) means a packet loss, which triggers a retransmit, so no harm is done.
That doesn't mean there couldn't be a real bug, though.
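
To illustrate why the amount of free memory is not the deciding factor: the failed request is order:2, i.e. 2^2 = 4 physically contiguous pages (16kB, assuming the usual 4kB page size on x86_64), and an atomic allocation from an IRQ cannot wait for reclaim or compaction. The rough Python sketch below parses the three per-zone free-list lines from the log and sums how much free memory sits in blocks of at least that size. The function name free_by_zone and the PAGE_KB constant are made up for this example; it is not a kernel or distribution tool.

```python
import re

# Per-zone free-list lines copied from the log in the first post.
BUDDY_LINES = """\
Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 3*64kB (U) 0*128kB 1*256kB (U) 1*512kB (U) 0*1024kB 1*2048kB (M) 3*4096kB (M) = 15388kB
Node 0 DMA32: 1219*4kB (UMEH) 1098*8kB (UME) 1110*16kB (UM) 847*32kB (UM) 538*64kB (UM) 266*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 127004kB
Node 0 Normal: 175272*4kB (UEH) 2607*8kB (UH) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 721960kB
"""

PAGE_KB = 4  # assumption: 4kB base pages


def free_by_zone(text, order):
    """Return {zone: (total_free_kb, free_kb_in_blocks_of_at_least_order)}."""
    min_block_kb = PAGE_KB << order  # order 2 -> 16kB
    result = {}
    for line in text.splitlines():
        zone = re.match(r"Node \d+ (\w+):", line)
        if not zone:
            continue
        total = usable = 0
        # Each "count*sizekB" pair is one free list of the buddy allocator.
        for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", line):
            kb = int(count) * int(size_kb)
            total += kb
            if int(size_kb) >= min_block_kb:
                usable += kb
        result[zone.group(1)] = (total, usable)
    return result


if __name__ == "__main__":
    for zone, (total, usable) in free_by_zone(BUDDY_LINES, order=2).items():
        print(f"{zone:7s} free={total}kB, in blocks >= 16kB: {usable}kB")
```

For the Normal zone this gives 721960kB free in total, but only 16kB of it in blocks of 16kB or larger: almost everything is split into single 4kB and 8kB pieces. DMA32 still has larger blocks, but the kernel holds the lower zones back via lowmem_reserve, which may be why they were not used here.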

Spider.007 · 2016-10-02 15:41:52

Thanks, there is actually no such header for any of the messages; but you might still be right that this is harmless.
I also found https://access.redhat.com/solutions/90883 which I think describes the same issue, explaining how the allocation will eventually be fulfilled.

Arch Linux
