You are not logged in.
Hi all,
I am connecting to a VPN using the method described in L2TP/IPsec_VPN_client_setup. Shortly after successful handshake, but before adding routes, my system locks up (and I have to hard reboot). I believe the culprit is pppd, and I am using 4.5.3-1-ARCH kernel on x86_64 system. Today is the first time I have tried this. From searching around, it seems like this issue has maybe been around for a long time:
System full hang on xl2tpd connection
Soft lockup in 3.17-rc7 when using PPP over L2TP over IPSEC
CPU hangs on pppd disconnect -- patch?
wrong routing leads to cpu lockup
xl2tpd freezes all system
Xl2tpd freeze with high cpu utilization
xl2tpd makes kernel soft lockup -- most recent bug report from Apr 27, 2016 with kernel 4.5
hard cpu lockup during pppd initialization of vpn -- says most recently working was 3.16, broke on 3.18
I am not sure how common L2TP/IPsec is but I want to make this thread for anyone else running into this issue.
What I see in journal is:
$ journalctl -b-1
...
xl2tpd[2080]: xl2tpd[2080]: Connecting to host xxx.xxx.xxx.xxx, port 1701
xl2tpd[2080]: xl2tpd[2080]: Connection established to xxx.xxx.xxx.xxx, 1701. Local: 22285, Remote: 148 (ref=0/0).
xl2tpd[2080]: xl2tpd[2080]: Calling on tunnel 22285
xl2tpd[2080]: xl2tpd[2080]: Call established with xxx.xxx.xxx.xxx, Local: 29723, Remote: 147, Serial: 1 (ref=0/0)
xl2tpd[2080]: xl2tpd[2080]: start_pppd: I'm running:
xl2tpd[2080]: xl2tpd[2080]: "/usr/sbin/pppd"
xl2tpd[2080]: xl2tpd[2080]: "plugin"
xl2tpd[2080]: xl2tpd[2080]: "pppol2tp.so"
xl2tpd[2080]: xl2tpd[2080]: "pppol2tp"
xl2tpd[2080]: xl2tpd[2080]: "7"
xl2tpd[2080]: xl2tpd[2080]: "passive"
xl2tpd[2080]: xl2tpd[2080]: "nodetach"
xl2tpd[2080]: xl2tpd[2080]: ":"
xl2tpd[2080]: xl2tpd[2080]: "debug"
xl2tpd[2080]: xl2tpd[2080]: "file"
xl2tpd[2080]: xl2tpd[2080]: "/etc/ppp/options.l2tpd.client"
pppd[2337]: Plugin pppol2tp.so loaded.
pppd[2337]: pppd 2.4.7 started by root, uid 0
pppd[2337]: using channel 1
pppd[2337]: Using interface ppp0
pppd[2337]: Connect: ppp0 <-->
pppd[2337]: Overriding mtu 1500 to 1410
pppd[2337]: PPPoL2TP options: debugmask 0
pppd[2337]: Overriding mru 1500 to mtu value 1410
pppd[2337]: sent [LCP ConfReq id=0x1 <mru 1410> <asyncmap 0x0> <magic xxx>]
pppd[2337]: rcvd [LCP ConfReq id=0x1 <auth chap MS-v2> <magic xxx>]
pppd[2337]: sent [LCP ConfAck id=0x1 <auth chap MS-v2> <magic xxx>]
pppd[2337]: rcvd [LCP ConfRej id=0x1 <mru 1410> <asyncmap 0x0>]
pppd[2337]: sent [LCP ConfReq id=0x2 <magic xxx>]
pppd[2337]: rcvd [LCP ConfAck id=0x2 <magic xxx>]
pppd[2337]: Overriding mtu 1500 to 1410
pppd[2337]: PPPoL2TP options: debugmask 0
pppd[2337]: Overriding mru 1500 to mtu value 1410
pppd[2337]: sent [LCP EchoReq id=0x0 magic=xxx]
pppd[2337]: rcvd [CHAP Challenge id=0x1 <xxx>, name = ""]
pppd[2337]: added response cache entry 0
pppd[2337]: sent [CHAP Response id=0x1 <xxx>, name = "cherubic"]
pppd[2337]: rcvd [LCP EchoRep id=0x0 magic=xxx]
pppd[2337]: rcvd [CHAP Success id=0x1 "S=xxx"]
pppd[2337]: response found in cache (entry 0)
pppd[2337]: CHAP authentication succeeded
pppd[2337]: sent [IPCP ConfReq id=0x1 <addr 0.0.0.0> <ms-dns1 0.0.0.0> <ms-dns2 0.0.0.0>]
pppd[2337]: rcvd [IPCP ConfReq id=0x1 <addr xxx.xxx.xxx.xxx>]
pppd[2337]: sent [IPCP ConfAck id=0x1 <addr xxx.xxx.xxx.xxx>]
pppd[2337]: rcvd [IPCP ConfNak id=0x1 <addr xxx.xxx.xxx.xxx> <ms-dns1 xxx> <ms-dns2 8.8.8.8>]
pppd[2337]: sent [IPCP ConfReq id=0x2 <addr xxx.xxx.xxx.xxx> <ms-dns1 xxx> <ms-dns2 8.8.8.8>]
pppd[2337]: rcvd [IPCP ConfAck id=0x2 <addr xxx.xxx.xxx.xxx> <ms-dns1 xxx> <ms-dns2 8.8.8.8>]
pppd[2337]: not replacing existing default route via 192.168.0.1
pppd[2337]: Cannot determine ethernet address for proxy ARP
pppd[2337]: local IP address xxx.xxx.xxx.xxx
pppd[2337]: remote IP address xxx.xxx.xxx.xxx
pppd[2337]: primary DNS address xxx.xxx.xxx.xxx
pppd[2337]: secondary DNS address 8.8.8.8
pppd[2337]: Script /etc/ppp/ip-up started (pid 2346)
pppd[2337]: Script /etc/ppp/ip-up finished (pid 2346), status = 0x0
ntpd[531]: Listen normally on 9 ppp0 xxx.xxx.xxx.xxx:123
ntpd[531]: new interface(s) found: waking up resolver
And from the kernel:
$ journalctl -b-1 --dmesg
...
kernel: PPP generic driver version 2.4.2
kernel: NET: Registered protocol family 24
kernel: l2tp_core: L2TP core driver, V2.0
kernel: l2tp_netlink: L2TP netlink interface
kernel: l2tp_ppp: PPPoL2TP kernel driver, V2.0
If anyone has run into this issue before feel free to chime in! In the meantime, my workaround is to use a VM.
Last edited by cherub (2016-05-29 20:39:43)
Offline
Well I tried again today, with the 4.5.4-1-ARCH kernel with the patch from Bug 83661 applied, see below. No difference, the problem remains...
diff -crB a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
*** a/drivers/net/ppp/ppp_generic.c 2016-03-14 00:28:54.000000000 -0400
--- b/drivers/net/ppp/ppp_generic.c 2016-05-22 15:44:57.425873045 -0400
***************
*** 2380,2385 ****
--- 2380,2386 ----
{
struct channel *pch = chan->ppp;
struct ppp_net *pn;
+ int do_unlock;
if (!pch)
return; /* should never happen */
***************
*** 2391,2399 ****
* the channel's start_xmit or ioctl routine before we proceed.
*/
down_write(&pch->chan_sem);
! spin_lock_bh(&pch->downl);
pch->chan = NULL;
! spin_unlock_bh(&pch->downl);
up_write(&pch->chan_sem);
ppp_disconnect_channel(pch);
--- 2392,2402 ----
* the channel's start_xmit or ioctl routine before we proceed.
*/
down_write(&pch->chan_sem);
! do_unlock = spin_trylock_bh(&pch->downl);
pch->chan = NULL;
! /* Unlock the spinlock only if it was obtained successfully. */
! if (do_unlock)
! spin_unlock_bh(&pch->downl);
up_write(&pch->chan_sem);
ppp_disconnect_channel(pch);
Last edited by cherub (2016-05-22 22:13:03)
Offline
Today I tried using libreswan instead of openswan (essentially a drop in replacement). This did not solve the issue, so it's really looking like xl2tpd and/or pppd is the problem. I am confused about the state of L2TP/IPsec support in Linux, so I've compiled a list of all the crap that comes up on Google related to it and what I think.
IPSec
openswan -- Fork of FreeS/WAN, still in active development by Xelerance
libreswan -- Fork of openswan, maybe more actively developed and seems to be favored for some trademark reasons
strongSwan -- Actively developed, but good luck finding out how to configure it as a client
racoon -- From the venerable ipsec-tools package, but less features
L2TP
From what I've read, openl2tp became part of the Linux kernel. So you can create unmanaged l2tp tunnels using iproute2, i.e. 'ip l2tp add', but managed tunnels require a userspace daemon like xl2tpd (which is the most commonly used one). There's also rp-l2tp which hasn't been updated since 2004. But it still builds, so I will give it a try sometime today.
Helpers
NetworkManager -- Does not support L2TP out of the box, you must install a plugin. It just uses xl2tpd and either libreswan or strongSwan, so I won't bother.
PPP
L2TP uses ppp, which uses kernel PPPoE support which I think is still considered experimental.
Last edited by cherub (2016-05-29 02:19:30)
Offline
Well I am making progress with rp-l2tp (no system lockup so far) and libreswan. What I see now is in the l2tpd debug output is "too many retransmissions"
...
67.148 loop in xmit_queued(10371/360): rws=16 cwnd=1 ssthresh=1 outstanding=1
75.149 xmit_queued(10371/360): rws=16 cwnd=1 ssthresh=1 outstanding=0
75.149 dgram_send_to_wire() -> type=HELLO, tid=360, sid=0, Nr=3, Ns=4
75.149 loop in xmit_queued(10371/360): rws=16 cwnd=1 ssthresh=1 outstanding=1
Error: Too many retransmissions on tunnel (10371/360); closing down
83.157 tunnel_free(10371/360)
83.157 session((10371/360, 52326/359)) state established -> idle
83.157 session_free((10371/360, 52326/359)) Tunnel closing down
And in the pppd debug log I can see successful CHAP authentication, followed by "Modem hangup" after a few minutes.
...
not replacing existing default route via 192.168.0.1
Cannot determine ethernet address for proxy ARP
local IP address xx.xx.x.xx
remote IP address xxx.xxx.xxx.xxx
primary DNS address xx.xx.x.xx
secondary DNS address 8.8.8.8
Script /etc/ppp/ip-up started (pid 13994)
Script /etc/ppp/ip-up finished (pid 13994), status = 0x0
Modem hangup
Connect time 1.3 minutes.
Sent 30492651 bytes, received 0 bytes.
Script /etc/ppp/ip-down started (pid 14337)
Connection terminated.
Waiting for 1 child processes...
script /etc/ppp/ip-down, pid 14337
Script /etc/ppp/ip-down finished (pid 14337), status = 0x0
I can see the ppp0 interface come up and get assigned an IP address on the VPN. But absolutely no traffic is making it through the tunnel.
Last edited by cherub (2016-05-28 22:08:07)
Offline
Back to xl2tpd and libreswan. I'm trying to get some more debugging info but it's proving difficult. Here is my startup script:
start() {
sudo sysctl kernel.soft_watchdog=1 kernel.softlockup_panic=1 kernel.softlockup_all_cpu_backtrace=1
sudo sysctl kernel.nmi_watchdog=1 kernel.hardlockup_panic=1 kernel.hardlockup_all_cpu_backtrace=1
sudo sysctl kernel.hung_task_panic=1 kernel.hung_task_timeout_secs=60
sudo sysctl kernel.sysrq=1
sudo sh -c "echo 7 > /proc/sys/kernel/printk"
sudo systemctl start ipsec; sleep 1
sudo strace -o /var/log/xl2tpd_strace.log xl2tpd -D &; sleep 1
sudo ipsec auto --up my-vpn; sleep 1
sudo sh -c "echo 'c vpn-connection' > /var/run/xl2tpd/l2tp-control"
}
I have lockup detection enabled in the kernel, but I never see a kernel panic even though my system definitely locks up (no mouse, no keyboard, can't change to another TTY, zero responsiveness), and after sitting for a few minutes in that state. I don't know what that means. I also set printk to highest output level, but I do not see anything more in dmesg.
I am also running strace on xl2tpd and I get the following output (I've replaced my VPN address with x's, and removed some message data):
execve("/usr/bin/xl2tpd", ["xl2tpd", "-D"], [/* 17 vars */]) = 0
brk(NULL) = 0x1169000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=452348, ...}) = 0
mmap(NULL, 452348, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7effeed2d000
close(3) = 0
open("/usr/lib/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0pU\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1063296, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7effeeda9000
mmap(NULL, 3158248, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7effee875000
mprotect(0x7effee978000, 2093056, PROT_NONE) = 0
mmap(0x7effeeb77000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x102000) = 0x7effeeb77000
close(3) = 0
open("/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\10\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1960968, ...}) = 0
mmap(NULL, 3803440, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7effee4cd000
mprotect(0x7effee664000, 2097152, PROT_NONE) = 0
mmap(0x7effee864000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x197000) = 0x7effee864000
mmap(0x7effee86a000, 14640, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7effee86a000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7effeeda8000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7effeeda7000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7effeeda6000
arch_prctl(ARCH_SET_FS, 0x7effeeda7700) = 0
mprotect(0x7effee864000, 16384, PROT_READ) = 0
mprotect(0x7effeeb77000, 4096, PROT_READ) = 0
mprotect(0x616000, 4096, PROT_READ) = 0
mprotect(0x7effeeda0000, 4096, PROT_READ) = 0
munmap(0x7effeed2d000, 452348) = 0
brk(NULL) = 0x1169000
brk(0x118a000) = 0x118a000
open("/etc/xl2tpd/xl2tpd.conf", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=117, ...}) = 0
read(3, "[lac vpn-connection]\nlns = xxx.xxx"..., 4096) = 117
read(3, "", 4096) = 0
close(3) = 0
uname({sysname="Linux", nodename="cherub", ...}) = 0
uname({sysname="Linux", nodename="cherub", ...}) = 0
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_NO_CHECK, [1], 4) = 0
bind(3, {sa_family=AF_INET, sin_port=htons(1701), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
getsockname(3, {sa_family=AF_INET, sin_port=htons(1701), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
setsockopt(3, SOL_IP, 0x1e /* IP_??? */, "\1\0\0\0\0\0\0\0", 8) = -1 ENOPROTOOPT (Protocol not available)
getpid() = 2465
write(2, "xl2tpd[2465]: setsockopt recvref"..., 61) = 61
setsockopt(3, SOL_IP, IP_PKTINFO, "\1\0\0\0\0\0\0\0", 8) = 0
setsockopt(3, SOL_SOCKET, SO_NO_CHECK, "\1\0\0\0\0\0\0\0", 8) = 0
socket(PF_PPPOX, SOCK_DGRAM, 1) = 4
close(4) = 0
write(2, "xl2tpd[2465]: Using l2tp kernel "..., 41) = 41
fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
open("/var/run/xl2tpd.pid", O_RDONLY) = -1 ENOENT (No such file or directory)
setsid() = 2465
unlink("/var/run/xl2tpd.pid") = -1 ENOENT (No such file or directory)
open("/var/run/xl2tpd.pid", O_WRONLY|O_CREAT, 0640) = 4
write(4, "2465\n", 5) = 5
close(4) = 0
rt_sigaction(SIGTERM, {0x401de6, [TERM], SA_RESTORER|SA_RESTART, 0x7effee500310}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGINT, {0x401df1, [INT], SA_RESTORER|SA_RESTART, 0x7effee500310}, {SIG_IGN, [], 0}, 8) = 0
rt_sigaction(SIGCHLD, {0x401dfc, [CHLD], SA_RESTORER|SA_RESTART, 0x7effee500310}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGUSR1, {0x401e07, [USR1], SA_RESTORER|SA_RESTART, 0x7effee500310}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGHUP, {0x401e12, [HUP], SA_RESTORER|SA_RESTART, 0x7effee500310}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGPIPE, {SIG_IGN, [PIPE], SA_RESTORER|SA_RESTART, 0x7effee500310}, {SIG_DFL, [], 0}, 8) = 0
unlink("/var/run/xl2tpd/l2tp-control") = -1 ENOENT (No such file or directory)
mknod("/var/run/xl2tpd/l2tp-control", S_IFIFO|0600) = 0
open("/var/run/xl2tpd/l2tp-control", O_RDONLY|O_NONBLOCK) = 4
fcntl(4, F_SETFL, O_RDONLY) = 0
write(2, "xl2tpd[2465]: xl2tpd version xl2"..., 69) = 69
write(2, "xl2tpd[2465]: Written by Mark Sp"..., 72) = 72
write(2, "xl2tpd[2465]: Forked by Scott Ba"..., 63) = 63
write(2, "xl2tpd[2465]: Inherited by Jeff "..., 50) = 50
write(2, "xl2tpd[2465]: Forked again by Xe"..., 74) = 74
write(2, "xl2tpd[2465]: Listening on IP ad"..., 57) = 57
select(5, [3 4], NULL, NULL, NULL) = 1 (in [4])
read(4, "c vpn-connection\n", 1024) = 17
open("/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=156, ...}) = 0
read(5, "# Generated by resolvconf\ndomain"..., 4096) = 156
read(5, "", 4096) = 0
close(5) = 0
open("/dev/urandom", O_RDONLY|O_NONBLOCK) = 5
read(5, "\275\323", 2) = 2
close(5) = 0
write(2, "xl2tpd[2465]: Connecting to host"..., 59) = 59
sendmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(1701), sin_addr=inet_addr("xxx.xxx.xxx.xxx")}, msg_iov(1)=[{"\310"..., 103}], msg_controllen=0, msg_flags=0}, 0) = 103
read(4, "", 1024) = 0
close(4) = 0
open("/var/run/xl2tpd/l2tp-control", O_RDONLY|O_NONBLOCK) = 4
fcntl(4, F_SETFL, O_RDONLY) = 0
select(5, [3 4], NULL, NULL, {0, 999614}) = 1 (in [3], left {0, 971532})
recvmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(1701), sin_addr=inet_addr("xxx.xxx.xxx.xxx")}, msg_iov(1)=[{"\310"..., 4086}], msg_controllen=32, [{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, {ipi_ifindex=if_nametoindex("enp4s0"), ipi_spec_dst=inet_addr("192.168.0.18"), ipi_addr=inet_addr("192.168.0.18")}}], msg_flags=0}, 0) = 111
sendmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(1701), sin_addr=inet_addr("xxx.xxx.xxx.xxx")}, msg_iov(1)=[{"\310", 20}], msg_controllen=28, [{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, {ipi_ifindex=if_nametoindex("enp4s0"), ipi_spec_dst=inet_addr("192.168.0.18"), ipi_addr=inet_addr("192.168.0.18")}}], msg_flags=0}, 0) = 20
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 5
setsockopt(5, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(5, SOL_SOCKET, SO_NO_CHECK, [1], 4) = 0
bind(5, {sa_family=AF_INET, sin_port=htons(1701), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
fcntl(5, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(5, {sa_family=AF_INET, sin_port=htons(1701), sin_addr=inet_addr("xxx.xxx.xxx.xxx")}, 16) = 0
socket(PF_PPPOX, SOCK_DGRAM, 1) = 6
fcntl(6, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(6, {sa_family=AF_PPPOX, sa_data="\1"...}, 38) = 0
write(2, "xl2tpd[2465]: Connection establi"..., 100) = 100
open("/dev/urandom", O_RDONLY|O_NONBLOCK) = 7
read(7, "^\346", 2) = 2
close(7) = 0
write(2, "xl2tpd[2465]: Calling on tunnel "..., 38) = 38
sendmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(1701), sin_addr=inet_addr("xxx.xxx.xxx.xxx")}, msg_iov(1)=[{"\310"..., 48}], msg_controllen=28, [{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, {ipi_ifindex=if_nametoindex("enp4s0"), ipi_spec_dst=inet_addr("192.168.0.18"), ipi_addr=inet_addr("192.168.0.18")}}], msg_flags=0}, 0) = 48
select(6, [3 4 5], NULL, NULL, {0, 969719}) = 1 (in [5], left {0, 943105})
recvmsg(5, {msg_name(16)={sa_family=AF_INET, sin_port=htons(1701), sin_addr=inet_addr("xxx.xxx.xxx.xxx")}, msg_iov(1)=[{"\310", 4086}], msg_controllen=0, msg_flags=0}, 0) = 12
select(6, [3 4 5], NULL, NULL, {0, 942518}) = 1 (in [5], left {0, 942352})
recvmsg(5, {msg_name(16)={sa_family=AF_INET, sin_port=htons(1701), sin_addr=inet_addr("xxx.xxx.xxx.xxx")}, msg_iov(1)=[{"\310", 4086}], msg_controllen=0, msg_flags=0}, 0) = 28
write(2, "xl2tpd[2465]: Call established w"..., 99) = 99
sendmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(1701), sin_addr=inet_addr("xxx.xxx.xxx.xxx")}, msg_iov(1)=[{"\310"..., 50}], msg_controllen=0, msg_flags=0}, 0) = 50
socket(PF_PPPOX, SOCK_DGRAM, 1) = 7
fcntl(7, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(7, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(7, {sa_family=AF_PPPOX, sa_data="\1"...}, 38) = 0
write(2, "xl2tpd[2465]: start_pppd: I'm ru"..., 40) = 40
write(2, "xl2tpd[2465]: \"/usr/sbin/pppd\" \n", 32) = 32
write(2, "xl2tpd[2465]: \"plugin\" \n", 24) = 24
write(2, "xl2tpd[2465]: \"pppol2tp.so\" \n", 29) = 29
write(2, "xl2tpd[2465]: \"pppol2tp\" \n", 26) = 26
write(2, "xl2tpd[2465]: \"7\" \n", 19) = 19
write(2, "xl2tpd[2465]: \"passive\" \n", 25) = 25
write(2, "xl2tpd[2465]: \"nodetach\" \n", 26) = 26
write(2, "xl2tpd[2465]: \":\" \n", 19) = 19
write(2, "xl2tpd[2465]: \"debug\" \n", 23) = 23
write(2, "xl2tpd[2465]: \"file\" \n", 22) = 22
write(2, "xl2tpd[2465]: \"/etc/ppp/options."..., 47) = 47
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7effeeda79d0) = 2547
close(7) = 0
select(6, [3 4 5], NULL, NULL, {0, 940635}) = 1 (in [5], left {0, 914335})
recvmsg(5, {msg_name(16)={sa_family=AF_INET, sin_port=htons(1701), sin_addr=inet_addr("xxx.xxx.xxx.xxx")}, msg_iov(1)=[{"\310"..., 4086}], msg_controllen=0, msg_flags=0}, 0) = 12
select(6, [3 4 5], NULL, NULL, {0, 913902}) = 1 (in [5], left {0, 913900})
recvmsg(5, {msg_name(16)={sa_family=AF_INET, sin_port=htons(1701), sin_addr=inet_addr("xxx.xxx.xxx.xxx")}, msg_iov(1)=[{"\310"..., 4086}], msg_controllen=0, msg_flags=0}, 0) = 36
sendmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(1701), sin_addr=inet_addr("xxx.xxx.xxx.xxx")}, msg_iov(1)=[{"\310", 12}], msg_controllen=0, msg_flags=0}, 0) = 12
select(6, [3 4 5], NULL, NULL, {0, 913544}) = 0 (Timeout)
select(6, [3 4 5], NULL, NULL, {0, 27542}) = 0 (Timeout)
select(6, [3 4 5], NULL, NULL, {0, 699}) = 0 (Timeout)
select(6, [3 4 5], NULL, NULL, {0, 27276}) = 0 (Timeout)
select(6, [3 4 5], NULL, NULL, {0, 971147}
As you can see the strace gets truncated when the system hangs. It looks similar to this issue where xl2tpd gets a bunch of timed out reads, except I don't see anything from the kernel. Are there other kernel parameters I can set to try and get a backtrace?
EDIT: I tried sysrq and it is totally unresponsive. I cannot trigger a crash or reboot or anything.
EDIT 2: I have replicated the problem on a different computer (my laptop) with the same kernel. I can see that the caps-lock light starts blinking when my system hangs, which means that it is a kernel panic. Conclusively knowing that, it seems the next step is to try to get a kdump. I may give that a try soon.
Last edited by cherub (2016-05-29 18:54:50)
Offline
Well I have got a crash dump and it's pointing to pppd, I think. I'm not really sure what to do next. Any suggestions of what to look at? Anyone? I think the end result should be a bug report, but I'm not sure what I'd put on it. EDIT: Has anyone successfully set up a L2TP/IPsec client with the 4.x kernel?
crash> foreach pluto xl2tpd pppd ip bt
PID: 1124 TASK: ffff8804298b0e40 CPU: 0 COMMAND: "pluto"
#0 [ffff88042a80bd68] __schedule at ffffffff815b13b1
#1 [ffff88042a80bdb8] schedule at ffffffff815b1b5c
#2 [ffff88042a80bdd0] schedule_hrtimeout_range_clock at ffffffff815b48e2
#3 [ffff88042a80be60] schedule_hrtimeout_range at ffffffff815b4973
#4 [ffff88042a80be70] ep_poll at ffffffff81237b7c
#5 [ffff88042a80bf10] sys_epoll_wait at ffffffff8123909e
#6 [ffff88042a80bf50] entry_SYSCALL_64_fastpath at ffffffff815b56ae
RIP: 00007f85b2759ca3 RSP: 00007ffdafd65940 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: 00000000000000c1 RCX: 00007f85b2759ca3
RDX: 0000000000000020 RSI: 000055d48ee59d70 RDI: 0000000000000006
RBP: 000055d48ee63510 R8: 000055d48ee658a0 R9: 0000000000000000
R10: 0000000000002019 R11: 0000000000000293 R12: 000000007fb32485
R13: 000055d48edebcc0 R14: 000055d48edf5370 R15: 000055d48edebd2c
ORIG_RAX: 00000000000000e8 CS: 0033 SS: 002b
PID: 1129 TASK: ffff8804270a4740 CPU: 2 COMMAND: "pluto"
#0 [ffff88042a567b80] __schedule at ffffffff815b13b1
#1 [ffff88042a567bd0] schedule at ffffffff815b1b5c
#2 [ffff88042a567be8] schedule_timeout at ffffffff815b45d3
#3 [ffff88042a567c98] unix_stream_read_generic at ffffffff81552c45
#4 [ffff88042a567d88] unix_stream_recvmsg at ffffffff81552edf
#5 [ffff88042a567dd0] sock_read_iter at ffffffff8148e6b7
#6 [ffff88042a567e48] __vfs_read at ffffffff811eec7b
#7 [ffff88042a567ed0] vfs_read at ffffffff811efa76
#8 [ffff88042a567f08] sys_read at ffffffff811f0a55
#9 [ffff88042a567f50] entry_SYSCALL_64_fastpath at ffffffff815b56ae
RIP: 00007f85b274c6ed RSP: 00007f85af6382b0 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: 00007f85a8000078 RCX: 00007f85b274c6ed
RDX: 0000000000001000 RSI: 00007f85a8000d20 RDI: 000000000000000b
RBP: 000000000000270e R8: 00007f85a80009a0 R9: 00007f85af638340
R10: 00007f85a80008c0 R11: 0000000000000293 R12: 00007f85a8000078
R13: 0000000000001540 R14: 00007f85a80153e0 R15: 00007f85a8000020
ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
PID: 1130 TASK: ffff8804270a1c80 CPU: 2 COMMAND: "pluto"
#0 [ffff88042a56bb80] __schedule at ffffffff815b13b1
#1 [ffff88042a56bbd0] schedule at ffffffff815b1b5c
#2 [ffff88042a56bbe8] schedule_timeout at ffffffff815b45d3
#3 [ffff88042a56bc98] unix_stream_read_generic at ffffffff81552c45
#4 [ffff88042a56bd88] unix_stream_recvmsg at ffffffff81552edf
#5 [ffff88042a56bdd0] sock_read_iter at ffffffff8148e6b7
#6 [ffff88042a56be48] __vfs_read at ffffffff811eec7b
#7 [ffff88042a56bed0] vfs_read at ffffffff811efa76
#8 [ffff88042a56bf08] sys_read at ffffffff811f0a55
#9 [ffff88042a56bf50] entry_SYSCALL_64_fastpath at ffffffff815b56ae
RIP: 00007f85b274c6ed RSP: 00007f85aee372b0 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: 0000000000000063 RCX: 00007f85b274c6ed
RDX: 0000000000001000 RSI: 00007f85a0000d20 RDI: 000000000000000e
RBP: 0000000000002710 R8: 00007f85a00009a0 R9: 0000000000001010
R10: 00007f85a00008c0 R11: 0000000000000293 R12: 00007f85a0000078
R13: 0000000000001010 R14: 00007f85a0000d10 R15: 00007f85a0000020
ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
PID: 1131 TASK: ffff8804270a0e40 CPU: 0 COMMAND: "pluto"
#0 [ffff88042a56fb80] __schedule at ffffffff815b13b1
#1 [ffff88042a56fbd0] schedule at ffffffff815b1b5c
#2 [ffff88042a56fbe8] schedule_timeout at ffffffff815b45d3
#3 [ffff88042a56fc98] unix_stream_read_generic at ffffffff81552c45
#4 [ffff88042a56fd88] unix_stream_recvmsg at ffffffff81552edf
#5 [ffff88042a56fdd0] sock_read_iter at ffffffff8148e6b7
#6 [ffff88042a56fe48] __vfs_read at ffffffff811eec7b
#7 [ffff88042a56fed0] vfs_read at ffffffff811efa76
#8 [ffff88042a56ff08] sys_read at ffffffff811f0a55
#9 [ffff88042a56ff50] entry_SYSCALL_64_fastpath at ffffffff815b56ae
RIP: 00007f85b274c6ed RSP: 00007f85ae6362b0 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: 0000000000000063 RCX: 00007f85b274c6ed
RDX: 0000000000001000 RSI: 00007f8598000d20 RDI: 0000000000000010
RBP: 0000000000002710 R8: 00007f85980009a0 R9: 0000000000001010
R10: 00007f85980008c0 R11: 0000000000000293 R12: 00007f8598000078
R13: 0000000000001010 R14: 00007f8598000d10 R15: 00007f8598000020
ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
PID: 1216 TASK: ffff880427cce3c0 CPU: 1 COMMAND: "xl2tpd"
#0 [ffff8800b7933880] __schedule at ffffffff815b13b1
#1 [ffff8800b79338d0] schedule at ffffffff815b1b5c
#2 [ffff8800b79338e8] schedule_hrtimeout_range_clock at ffffffff815b48e2
#3 [ffff8800b7933978] schedule_hrtimeout_range at ffffffff815b4973
#4 [ffff8800b7933988] poll_schedule_timeout at ffffffff81203074
#5 [ffff8800b79339a8] do_select at ffffffff81203b54
#6 [ffff8800b7933d58] core_sys_select at ffffffff81203e39
#7 [ffff8800b7933ee8] sys_select at ffffffff81203fda
#8 [ffff8800b7933f50] tracesys_phase2 at ffffffff815b57de
RIP: 00007f3d4f722333 RSP: 00007ffe89b514f8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000f8b320 RCX: 00007f3d4f722333
RDX: 0000000000000000 RSI: 00007ffe89b51588 RDI: 0000000000000006
RBP: 0000000000000005 R8: 00007ffe89b51520 R9: 0000000000f8b300
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000040 R14: 0000000000000000 R15: 0000000000000024
ORIG_RAX: 0000000000000017 CS: 0033 SS: 002b
PID: 1337 TASK: ffff880428f31c80 CPU: 1 COMMAND: "ip"
#0 [ffff8804281b3cc8] __schedule at ffffffff815b13b1
#1 [ffff8804281b3d38] sock_recvmsg at ffffffff8148e613
#2 [ffff8804281b3d48] ___sys_recvmsg at ffffffff81490fab
#3 [ffff8804281b3eb0] __sys_recvmsg at ffffffff814922a1
#4 [ffff8804281b3f40] sys_recvmsg at ffffffff814922f2
#5 [ffff8804281b3f50] entry_SYSCALL_64_fastpath at ffffffff815b56ae
RIP: 00007f81b03aa600 RSP: 00007ffd6a526218 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 000000000065b2a0 RCX: 00007f81b03aa600
RDX: 0000000000000000 RSI: 00007ffd6a526280 RDI: 0000000000000003
RBP: 00007ffd6a526040 R8: 0000000000000000 R9: 0000000000000000
R10: 000000000000083b R11: 0000000000000246 R12: 00007f81b0a99a60
R13: 000000000065b3c0 R14: 0000000000000000 R15: 00007ffd6a526068
ORIG_RAX: 000000000000002f CS: 0033 SS: 002b
PID: 1296 TASK: ffff880428f33900 CPU: 2 COMMAND: "pppd"
#0 [ffff88043ed03cd0] machine_kexec at ffffffff8105a488
#1 [ffff88043ed03d30] __crash_kexec at ffffffff8110413d
#2 [ffff88043ed03df8] __crash_kexec at ffffffff81104215
#3 [ffff88043ed03e10] panic at ffffffff81167131
#4 [ffff88043ed03e98] watchdog_timer_fn at ffffffff81116ff7
#5 [ffff88043ed03ed0] __hrtimer_run_queues at ffffffff810e5995
#6 [ffff88043ed03f28] hrtimer_interrupt at ffffffff810e623a
#7 [ffff88043ed03f80] local_apic_timer_interrupt at ffffffff81051ba8
#8 [ffff88043ed03f98] smp_apic_timer_interrupt at ffffffff815b810d
#9 [ffff88043ed03fb0] apic_timer_interrupt at ffffffff815b63f2
--- <IRQ stack> ---
#10 [ffff88042aef7740] apic_timer_interrupt at ffffffff815b63f2
RIP: 0000000000000001 RSP: ffff8804289f68a0 RFLAGS: 00000001
RAX: ffff8804289f68a0 RBX: 0000000000000021 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000101
RBP: ffff880428f33900 R8: ffff88042aef7810 R9: ffff8804289f68b8
R10: ffff880427406848 R11: ffff880427406840 R12: ffff88043ed157b0
R13: ffff88043ed157b0 R14: ffff88043ed157b0 R15: 0000000000000001
ORIG_RAX: 0000000000000101 CS: 0101 SS: ffffffffffffff10
bt: WARNING: possibly bogus exception frame
Last edited by cherub (2016-05-30 17:29:21)
Offline
I tried this on my Android 5.0.2 phone (kernel 3.4.0) and it works (Settings > Networks > More Networks > VPN > Basic > L2TP/IPsec PSK). Interestingly I can see that it uses racoon. I will try to find out how to use racoon as a client and see if it works on my desktop.
EDIT: You can see how Android does it here. They do not use xl2tpd, instead they have their own mtpd which is discussed here. It uses pppd under the hood.
EDIT 2: Also tried xl2tpd and libreswan on Debian 8 (kernel 3.16) in a VM, and it works perfectly as expected from the bug report. I don't want to build/install 3.16, so my workaround will be to run the VM headless, then tunnel rdesktop through that. The hope is that will have better performance and better integration vs running rdesktop inside a Windows VM.
Last edited by cherub (2016-05-30 22:20:50)
Offline
Here are the details of my workaround. This should tide anyone over until the root issue is uncovered and fixed. Originally I tried using a Windows VM (you can get one for free at modern.ie) to login to the VPN and then use the native remote desktop to login to my work computer, but the performance was too poor to get any work done. The better way is to set up a Debian 8 (which is still on 3.16 kernel) headless VM and tunnel with that. This is easy to setup with Vagrant but you will have to install it and Virtual Box.
# pacman -S freerdp vagrant virtualbox
Vagrantfile
Vagrant.configure(2) do |config|
config.vm.box = "debian/jessie64"
config.vm.box_check_update = false
config.vm.network "private_network", ip: "192.168.4.10"
config.vm.provider "virtualbox" do |vb|
vb.gui = false
vb.memory = "1024"
end
config.vm.provision "file", source: "ipsec.conf", destination: "/home/vagrant/ipsec.conf"
config.vm.provision "file", source: "ipsec.secrets", destination: "/home/vagrant/ipsec.secrets"
config.vm.provision "file", source: "options.l2tpd.client", destination: "/home/vagrant/options.l2tpd.client"
config.vm.provision "file", source: "xl2tpd.conf", destination: "/home/vagrant/xl2tpd.conf"
config.vm.provision "file", source: "vpn-start", destination: "/home/vagrant/vpn-start"
config.vm.provision "file", source: "vpn-stop", destination: "/home/vagrant/vpn-stop"
config.vm.provision "shell", inline: <<-SHELL
sudo apt-get update
sudo apt-get install -y build-essential libnss3-dev libnspr4-dev pkg-config libpam-dev libcap-ng-dev libcap-ng-utils libselinux-dev libcurl4-nss-dev libgmp3-dev flex bison gcc make libunbound-dev libnss3-tools xl2tpd
wget -qN https://download.libreswan.org/libreswan-3.12.tar.gz -O libreswan-3.12.tar.gz
tar zxvf libreswan-3.12.tar.gz
cd libreswan-3.12
make programs
sudo make install
sudo systemctl enable ipsec.service
sudo systemctl enable xl2tpd.service
sudo mkdir -p /var/run/xl2tpd
sudo touch /var/run/xl2tpd/l2tp-control
sudo mv -f /home/vagrant/ipsec.conf /etc/
sudo mv -f /home/vagrant/ipsec.secrets /etc/
sudo mv -f /home/vagrant/options.l2tpd.client /etc/ppp/
sudo mv -f /home/vagrant/xl2tpd.conf /etc/xl2tpd/
sudo chown root:root /etc/ipsec.secrets
sudo chmod +x /home/vagrant/vpn-start
sudo chmod +x /home/vagrant/vpn-stop
SHELL
config.ssh.insert_key = true
end
vpn-start
#!/bin/sh
sudo systemctl start ipsec
sleep 1
sudo systemctl start xl2tpd
sleep 1
sudo ipsec auto --up my-vpn
sleep 1
sudo sh -c "echo 'c vpn-connection' > /var/run/xl2tpd/l2tp-control"
sleep 3
sudo ip route replace default dev ppp0
sudo ip route replace <vpn server ip addr> via 10.0.2.2 dev eth0
vpn-stop
#!/bin/sh
sudo sh -c "echo 'c vpn-connection' > /var/run/xl2tpd/l2tp-control"
sleep 1
sudo ipsec auto --down my-vpn
sleep 1
sudo systemctl stop xl2tpd
sleep 1
sudo systemctl stop ipsec
sleep 1
sudo ip route replace default via 10.0.2.2 dev eth0
sudo ip route del <vpn server ip addr>
my-remote-computer
#!/bin/sh
ADDR=<private network address of your remote computer>
vagrant up
vagrant ssh -- -t 'cd /home/vagrant; ./vpn-start'
vagrant ssh -- -L 3389:$ADDR:3389 -N &
SSH_PID=$!
sleep 1;
xfreerdp /u:<username> /p:<password> /v:127.0.0.1 /f +decorations +fonts -grab-keyboard
vagrant ssh -- -t 'cd /home/vagrant; ./vpn-stop'
kill $SSH_PID
See the wiki article for ipsec.conf, ipsec.secrets, options.l2tpd.client, and xl2tpd.conf details.
Last edited by cherub (2016-05-31 01:28:42)
Offline