
#1 2016-03-28 08:01:19

Zancarius
Member
From: NM, USA
Registered: 2012-05-06
Posts: 207

Unusual NFS + Kerberos + Automount issue (with workaround)

Hey everyone:

First, I should state that this is not a help-seeking thread; I've already resolved the problem after a maddening amount of time this weekend. It's my hope that future visitors (hi!) who encounter a similar problem with a similar configuration might be able to save themselves a bit of a headache. Second, I'm also probing for advice. The workaround I've used isn't optimal, and I'm not even sure why the "correct" solution isn't working. If anyone has some insight, I'd be greatly appreciative, but I'm afraid this problem scope is so narrow and so specific that outside egregious mistakes on my part, it might as well be an exercise in futility.

I should also apologize for the essay below. There's some pertinent information here, and I'm reluctant to gloss over it since I feel it may be tied to the issue. I'm just not sure I completely understand how all of this fits together.

Background:

My setup is composed of a file server that acts as an NFS host. The NFS host authenticates clients against a KDC for most shares. All of the clients generate ticket-granting tickets either at login (pam-krb5) or via other means (the HTPC uses anonymous tickets, I think, but it's been a while since I set it up). Most/all of the systems are Arch and mount shares via systemd-automount, specified in fstab. All of this is working fine and isn't the source of the problem. It's necessary to know, because it's applicable to the nature of the error I've encountered.

Problem:

My problem has been my desktop's NFS setup. Recently, it has stopped mounting kerberized shares, failing with "device not found" and generating "permission denied" errors when mount is run manually with -vvv. With sec=sys it works fine, but as soon as a Kerberos mount option is supplied, it fails. With verbose output enabled, rpc-gssd logs the following warning, which hints at the source of the failure:

WARNING: Cannot contact any KDC for realm 'EXAMPLE.ORG' while getting initial ticket for principal 'nfs/<redacted>.example.org@EXAMPLE.ORG' using keytab 'FILE:/etc/krb5.keytab'

(Domain names have been redacted for privacy reasons, but the output is consistent with my KDC configuration.)

The appropriate principals have been configured, the keytab has been set up appropriately, and the account I log in with is capable of generating ticket-granting tickets. It's very obviously not a problem with Kerberos (or NFS), because restarting rpc-gssd fixes the problem, and it proceeds with the automount as expected.

My first inclination was to suspect a configuration error or similar, so I fired up a virtual machine, configuring it to use my KDC with appropriate principals, and so forth--and it works flawlessly. Copying the /etc from my desktop, minus the files necessary to the VM's continued health (fstab, hostname, netctl configs, etc.), doesn't replicate the issue, and I've been reluctant to try the inverse. Either way, it's unlikely to be a configuration issue, and the various GSSAPI-related files have matching checksums between the VM and my desktop, suggesting that they're unchanged from the packages that installed them (namely nfs-utils and gssproxy).

However, going back to the output from the systemd journal, I became suspicious that it might be tied to the network. I don't know much (if anything) about rpc.gssd's internals, but I started to suspect that it was somehow attempting to contact the KDC early on in the boot process, receiving no response (no network), and then caching that result indefinitely or until the process was restarted. Thus, I tried enabling systemd-networkd-wait-online (I'm using netctl, not NetworkManager, and it appears this is the only correct wait-online service?), but it had no obvious effect and rpc-gssd still reported failures. Strange.

My final attempt late this evening, and one made in desperation as I didn't expect it to work, was to copy the rpc-gssd unit file, drop it into /etc/systemd/system as a local override, and modify its After= configuration to wait for network.target. This worked: rpc-gssd now correctly refuses to start up until the network connection has been established, and the automount shares are accessible. However, this feels like a hack, and I'm certain there's a "more" correct solution out there that I'm missing because I'm too stupid to see it, it's too obvious to notice, or I've been so mentally bogged down with this problem that either of the former issues is precluding me from resolving this more appropriately.
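
For anyone who would rather not copy the whole unit file, a drop-in can carry just the ordering change. This is only a sketch under assumptions: the file name wait-for-network.conf is made up, and it orders against network-online.target rather than the network.target used in the workaround above. That target only helps if a wait-online service matching your network manager is enabled, so treat it as something to try, not a confirmed fix.

```ini
# /etc/systemd/system/rpc-gssd.service.d/wait-for-network.conf
# Hypothetical drop-in; the file name is arbitrary.
[Unit]
# Do not start rpc-gssd until the network is reported as online ...
After=network-online.target
# ... and pull that target in so the ordering has something to wait on.
Wants=network-online.target
```

After creating the file, run systemctl daemon-reload and restart rpc-gssd (or reboot) for it to take effect. The advantage over a full copy in /etc/systemd/system is that later package updates to the original unit are still picked up.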

Before coming to this resolution (After=network.target ...), I tried changing my netctl configuration away from DHCP (which I also use to assign addresses) to a static assignment. That didn't change anything either. For what it's worth, I'm running IPv6, but the virtual machine I configured was a) also receiving its address assignment via DHCP and b) also using a mix of static and autoconfig IPv6 addresses. Hence this is probably better left as a footnote of interest and is unlikely to be a contributing factor.

I can't help but feel that systemd-networkd-wait-online would be the correct solution, but looking at the unit file, the Before/After declarations seem suggestive--and might explain why it doesn't work. Further, enabling this service on the VM (by accident, when I was attempting to replicate the issue) actually caused a ~2 minute delay before the mount would succeed. I'm at a loss why I'd have this problem on one system out of 4 or 5 other machines similarly configured (excluding non-Arch installations), but I can't exclude the possibility of PEBKAC-induced faults. I certainly don't remember changing anything recently and this is a particularly recent problem. Maybe it's a good time to stick my /etc in some kind of VCS so I can have a paper trail in case I do anything outrageous.

Any advice? Or should I chalk this up to a one-off error and just stick with the unit file workaround?


He who has no .plan has small finger.
~Confucius on UNIX.


#2 2016-03-28 16:44:48

Zancarius
Member
From: NM, USA
Registered: 2012-05-06
Posts: 207

Re: Unusual NFS + Kerberos + Automount issue (with workaround)

It occurred to me this morning (shower thoughts) that systemd-networkd-wait-online is unlikely to do anything unless the unit files are well behaved and actually require one of the network targets (I think?). I haven't examined all of the NFS-related units, but this may explain why it had no effect. rpc-gssd relies on "var-lib-nfs-rpc_pipefs.mount", which in turn relies on "systemd-tmpfiles-setup.service". Guess that explains it!




#3 2016-03-29 07:44:40

sultanoswing
Member
Registered: 2008-07-23
Posts: 316

Re: Unusual NFS + Kerberos + Automount issue (with workaround)

Holy self-resurrected zombie thread, Batman!


Arch on: ASUS Pro-PRIME x470, AMD 5800X3D, AMD 6800XT, 32GB, | ThinkPad X1 | ASUS ux303ua | Surface Laptop 2 | Minisforum UM780


#4 2016-03-29 14:15:49

ewaller
Administrator
From: Pasadena, CA
Registered: 2009-07-13
Posts: 20,487

Re: Unusual NFS + Kerberos + Automount issue (with workaround)

sultanoswing wrote:

Holy self-resurrected zombie thread, Batman!

9 hours to zombie status? Or did you mix up the registration date (2012) with the post date?


Nothing is too wonderful to be true, if it be consistent with the laws of nature -- Michael Faraday
The shortest way to ruin a country is to give power to demagogues.— Dionysius of Halicarnassus
---
How to Ask Questions the Smart Way

