Hi everyone:
I'm posting this in the hope it might be useful to someone else. If later versions of gssproxy solve the problem, I'll mark this thread as solved; until then, I'm using a workaround. I should also note that this problem is not related to the systemd automount issues I encountered back in March, as I was able to reproduce it in a clean virtual machine acting as both client and server (minus the changed unit file, obviously).
The problem started after I updated my server, which runs as both an NFS host and a Kerberos KDC. Kerberos works fine, and I have no trouble with GSSAPI clients (such as SSH). NFS, however, began failing mysteriously with "Permission denied":
[client:/mnt]$ sudo mount -v -t nfs -o sec=krb5 192.168.5.1:/home home
mount.nfs: timeout set for Wed Jun 1 13:10:31 2016
mount.nfs: trying text-based options 'sec=krb5,vers=4.2,addr=192.168.5.1,clientaddr=192.168.5.100'
mount.nfs: mount(2): Permission denied
mount.nfs: access denied by server while mounting 192.168.5.1:/home
rpc-gssd also failed, with clues suggesting the source of the problem might be Kerberos:
Jun 01 13:08:31 client rpc.gssd[676]: creating context with server nfs@server.[redacted]
Jun 01 13:08:31 client rpc.gssd[676]: WARNING: Failed to create krb5 context for user with uid 0 for server nfs@server.[redacted]
Jun 01 13:08:31 client rpc.gssd[676]: WARNING: Failed to create machine krb5 context with cred cache FILE:/tmp/krb5ccmachine_[redacted] for server server.[redacted]
Jun 01 13:08:31 client rpc.gssd[676]: WARNING: Failed to create machine krb5 context with any credentials cache for server server.[redacted]
(Machine names changed to reflect their purpose and do not represent the names of actual machines in service.)
When I downgrade to gssproxy 0.4.1 (from v0.5), NFS resumes working with Kerberos.
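For anyone wanting to repeat the downgrade, here is a minimal sketch. It assumes the 0.4.1 package is still in pacman's default cache directory (/var/cache/pacman/pkg) under the usual PKG-VERSION-PKGREL-ARCH.pkg.tar.* naming, and find_cached_pkg is a hypothetical helper, not a pacman command. The restart step reflects the advice later in this thread to restart both gssproxy and rpc-gssd after downgrading.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pacman): print the cached package file for
# PKG at VERSION from CACHEDIR, assuming Arch's PKG-VERSION-PKGREL-ARCH.pkg.tar.*
# naming. If several package releases match, the one that sorts last wins.
find_cached_pkg() {
    local cachedir=$1 pkg=$2 version=$3 f match=
    for f in "$cachedir/$pkg-$version"-*.pkg.tar.*; do
        [ -e "$f" ] || continue              # glob matched nothing
        case $f in *.sig) continue ;; esac   # skip detached signatures
        match=$f
    done
    [ -n "$match" ] || return 1
    printf '%s\n' "$match"
}

# Usage on the server (as root; not executed here):
#   pacman -U "$(find_cached_pkg /var/cache/pacman/pkg gssproxy 0.4.1)"
#   systemctl restart gssproxy rpc-gssd   # restart both after downgrading
```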
I found a bug report for gssproxy 0.4.1 that matches my problem surprisingly closely despite being for an earlier version. It appears to include a suggested workaround for problems related to v0.5, but I've tested it and it doesn't work, apparently because gssproxy's configuration has changed somewhat between the two versions (the [gssproxy] section header is no longer needed?). Between 0.4.1 and 0.5, the nfs-client and nfs-server settings appear to be the same; the only difference is that 0.5 splits them into separate files. I suspect configuration isn't the source of my woes. It's also worth noting that the debugging output from the broken version of gssproxy isn't especially helpful: it's identical between the working and broken versions and shows no hint of trouble (changing the debug level doesn't appear to do anything):
[server:gssproxy]$ sudo gssproxy -i -d
[2016/06/01 19:18:17]: Debug Enabled (level: 1)
[2016/06/01 19:18:17]: Failed to get peer's SELinux context (95:Operation not supported)
[2016/06/01 19:18:17]: Client connected (fd = 9)[2016/06/01 19:18:17]: (pid = 1594) (uid = 0) (gid = 0)[2016/06/01 19:18:17]:
[2016/06/01 19:18:26]: gp_rpc_execute: executing 9 (GSSX_ACCEPT_SEC_CONTEXT) for service "nfs-server", euid: 0,socket: /run/gssproxy.sock
I don't believe the problem is related to FS#49242 or FS#42635 as I'm not actually having problems with gssproxy starting.
I'm about out of ideas beyond sticking with an earlier version of gssproxy until this is resolved, but I'd be willing to test other suggestions. I'm reluctant to chime in on upstream's Bugzilla, because I'm not sure if I'm going crazy and overlooking something obvious. To the best of my knowledge, I'm using stock out-of-the-box configurations for gssproxy and company, and the only obvious deviations are for my Kerberos configuration and NFS exports.
Edit:
See lompik's post toward the end of this thread for the more appropriate solution once upstream's fixes trickle into the repos.
Last edited by Zancarius (2016-06-06 17:34:18)
He who has no .plan has small finger.
~Confucius on UNIX.
THANK YOU for posting this!!!
I thought I was the only one! Especially since this is 'old' technology that thousands of people have gotten working before me, when I ran into trouble setting up my first NFS+Kerberos system I assumed it _must_ be my fault...
The error message and the total lack of errors in the logs have been quite *irritating*. :-)
Anyways, downgrading gssproxy as suggested worked like a charm.
Hi, digitus:
I'm glad it worked for you! I just wish we had a more permanent fix.
I suspect this will eventually get sorted out upstream given that gssproxy is relatively new (it landed in Fedora circa 2013), but this certainly isn't the first time I've encountered breakage with it. In the old days with rpc-svcgssd, things generally Just Worked™, but recent changes have been moving us in a different direction. I assume part of the motivation is that gssproxy allegedly offers a higher-level API with easier integration. I have read, for instance, that it reduces the need for applications to read keytabs directly, since the daemon now does that work, but as I've never used it directly in code I've written, I can't comment further on its implementation. I think these changes are fundamentally a good thing, but the breakage is infuriating when it occurs because the error messages are so opaque, generally leading you to suspect configuration faults in unrelated subsystems.
Curiously, on all my installs, the rpc-svcgssd.service systemd unit still exists, but the binary itself is missing. I was surprised by this until I learned (if I'm remembering correctly) that the kernel selects between rpc.svcgssd and gssproxy for GSSAPI-based authentication when the first share access is authenticated, so I think the unit file is more of a migration aid than anything else. I can't find the reference about the interaction between rpc.svcgssd and gssproxy now, but I think I came across it while looking for the gssproxy documentation or down one of a dozen other rabbit holes.
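Regarding the kernel choosing between the two daemons: as far as I can tell, the kernel exposes that decision in /proc/net/rpc/use-gss-proxy. gssproxy writes 1 there when it starts; if nothing has been written by the time the first GSS request arrives, the kernel falls back to the legacy rpc.svcgssd upcall and the file reads 0. Reading it may block until that decision is made, hence the timeout. The function name below is hypothetical, and the path is a parameter only so the sketch can be tried against an ordinary file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which server-side GSS upcall the kernel settled on,
# based on /proc/net/rpc/use-gss-proxy (path overridable, e.g. for testing).
gss_upcall_mode() {
    local file=${1:-/proc/net/rpc/use-gss-proxy} value
    # The read can block until a daemon registers, so give up after 2 seconds.
    # A missing file, a permission error, or a timeout all report "unavailable".
    value=$(timeout 2 cat "$file" 2>/dev/null) || { echo "unavailable"; return 1; }
    case $value in
        1) echo "gssproxy" ;;      # gssproxy registered with the kernel
        0) echo "rpc.svcgssd" ;;   # kernel fell back to the legacy upcall
        *) echo "unknown"; return 1 ;;
    esac
}

# Usage on the NFS server (not executed here):
#   gss_upcall_mode
```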
I really wish I had better news for you than to resort to a downgrade. I might suggest adding gssproxy to your /etc/pacman.conf IgnorePkg list to prevent it from upgrading, but it will eventually break should any of the dependencies change. For that, there's always ABS (and some manual work or SVN) and rebuilding it ourselves, I suppose.
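To pin the package as suggested, the IgnorePkg entry goes in the [options] section of /etc/pacman.conf (for example, IgnorePkg = gssproxy). The sketch below is a hypothetical helper that only checks whether a package is already listed there; it does not edit the file, and its simple parser assumes the usual one-option-per-line layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if PKG is listed on an IgnorePkg line in the
# [options] section of a pacman.conf-style file (default /etc/pacman.conf).
is_pkg_ignored() {
    local pkg=$1 conf=${2:-/etc/pacman.conf}
    awk -v pkg="$pkg" '
        # Track whether we are inside the [options] section.
        /^[ \t]*\[/ { in_opts = ($0 ~ /^[ \t]*\[options\]/) }
        in_opts && /^[ \t]*IgnorePkg[ \t]*=/ {
            sub(/^[^=]*=/, "")    # drop the key, leaving the package list
            for (i = 1; i <= NF; i++) if ($i == pkg) found = 1
        }
        END { exit !found }
    ' "$conf"
}

# Usage:
#   is_pkg_ignored gssproxy && echo "gssproxy upgrades are being skipped"
```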
Should you find anything, I'd be really happy if you could post here (unless this thread is old; if so, create a new one and link to this) and perhaps ping me via email (my username at gmail works fine) to let me know. I'm really interested in more permanent solutions.
Hi
Thanks for reporting this. I just found myself in a similar situation.
I've had an interesting case where I could not mount a krb5 NFS share on one server (CentOS 7), but it worked on another (Fedora 23). So I decided to look into the code, but I was not able to isolate the issue. The lack of debug messages is expected: the verbosity level is not passed to the underlying library (in this case libtirpc). Arch makes it easy to rebuild that library; you just need to force debug logging. I filed a bug upstream, so hopefully it will be fixed. I was getting the exact same messages as Zancarius. I also downgraded to gssproxy v0.4.1 but was still getting the same messages. Since there is no debugging output from libtirpc, everyone's case could be different. In my case, the issue occurs in `gssauth_create_default` in the libtirpc library, which gets no answer from the server for GSS auth (with RPC_CANTRECV errors). However, I restarted the computer this morning and everything is working with gssproxy 0.4.1 (gssproxy 0.5.0 works as a client but not as a server). So I can't investigate further. My guess is that the failure to create a context with the server is related to the Kerberos parameters: double-check time sync, principal keys (kvno, host, nfs, etc.), the domain set in idmapd.conf, and reverse DNS lookups. A restart of the server/client could help too.
Another easy way to get more debug messages is to use Wireshark: select your interface, apply the filter "nfs", and check for any errors.
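For the idmapd.conf item in the checklist above, here is a hypothetical helper that prints the Domain value from the [General] section, so the client and server settings can be compared side by side. It assumes the stock location, /etc/idmapd.conf, and a simple key = value layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Domain value from the [General] section of an
# idmapd.conf-style file (default /etc/idmapd.conf); prints nothing if unset.
idmapd_domain() {
    local conf=${1:-/etc/idmapd.conf}
    awk '
        # Track whether we are inside the [General] section.
        /^[ \t]*\[/ { in_general = ($0 ~ /^[ \t]*\[General\]/) }
        in_general && /^[ \t]*Domain[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")   # strip the key and leading blanks
            sub(/[ \t]+$/, "")         # strip trailing blanks
            print
            exit
        }
    ' "$conf"
}

# Usage: run on both client and server and compare the output.
#   idmapd_domain
```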
Regards,
lompik
Last edited by lompik (2016-06-05 15:14:54)
The verbosity level is not passed to the underlying library (in this case libtirpc).
That's good to know. I know rpc.gssd usually has to be built with debug support for the -r flag to have any effect, but I did not know gssproxy currently does nothing with its own flags with regard to libtirpc. Thanks!
However, I restarted the computer this morning and everything is working with gssproxy 0.4.1 (gssproxy 0.5.0 works as a client but not as a server).
You have to restart both gssproxy and rpc-gssd after downgrading. I suspect that might be what you encountered and why it worked following a restart.
I've noticed the same thing, though: gssproxy 0.5 is fine on my clients. It does not work on my NFS server.
My guess is that the failure to create a context with the server is related to the Kerberos parameters: double-check time sync, principal keys (kvno, host, nfs, etc.), the domain set in idmapd.conf, and reverse DNS lookups. A restart of the server/client could help too.
I don't believe this is correct. In my case, Kerberos works. I'm confident the problem lies with gssproxy.
Time synchronization issues will create expired tickets that can't be used for authentication (I just tried it out of curiosity). Other GSSAPI clients (e.g. ssh) authenticate just fine, I can obtain ticket-granting tickets (and can forward them), and incorrect service principals would prevent NFS from mounting Kerberos-authenticated shares, among other problems, with the working gssproxy version. I believe the checklist you've supplied would indicate Kerberos breakage well before NFS entered the picture, but it is useful for people who don't have a working Kerberos installation (that's not the issue we're having).
Since the context-related errors only show up with gssproxy 0.5 on my server and are absent with gssproxy 0.4 (rpc-gssd creates a working context in that case), I have a feeling they're misleading and may not indicate the true source of the problem. (Is it possible rpc-gssd is unable to create the context because of a failure in gssproxy?)
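On the time-synchronization point: Kerberos rejects requests when client and server clocks differ by more than the allowed skew, which MIT Kerberos sets to 300 seconds by default (the clockskew setting in krb5.conf). A hypothetical helper that compares two Unix timestamps against that limit might look like this; how the remote timestamp is obtained (the ssh call in the usage comment, to a host assumed to be named "server") is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if two Unix timestamps differ by no more than
# MAX seconds (default 300, the MIT Kerberos default clockskew).
within_clock_skew() {
    local a=$1 b=$2 max=${3:-300} diff
    diff=$(( a - b ))
    [ "$diff" -lt 0 ] && diff=$(( -diff ))   # absolute difference
    [ "$diff" -le "$max" ]
}

# Usage (assumes passwordless ssh to "server"; not executed here):
#   within_clock_skew "$(date +%s)" "$(ssh server date +%s)" && echo "clocks OK"
```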
You are right, my initial guess was wrong. That said, rpc.gssd is unlikely to point out Kerberos issues while libtirpc's debugging output is silenced.
I looked into gssproxy, and it turns out they did some refactoring between v0.4.1 and v0.5.0 and introduced a typo in the code. This is already fixed upstream: https://git.fedorahosted.org/cgit/gss-p … bfb1aefa79. I doubt gssproxy is called on clients, so that's why it was working there.
Also, gssproxy v0.5.0 gained a --debug-level option (not documented yet), which enables tracing of function calls and contexts.
Regards,
lompik
Last edited by lompik (2016-06-06 12:53:52)
I looked into gssproxy, and it turns out they did some refactoring between v0.4.1 and v0.5.0 and introduced a typo in the code. This is already fixed upstream: https://git.fedorahosted.org/cgit/gss-p … bfb1aefa79. I doubt gssproxy is called on clients, so that's why it was working there.
Excellent investigative work!
Being as gssproxy is a relatively new replacement for rpc.svcgssd, I expect some breakage periodically. No doubt upstream is happy you were able to source the option causing this issue!
I'll go ahead and mark the topic as solved.
Sorry for reopening this, but is it possible that this is an issue (still, or again) in gssproxy 0.6.2?
I get the following output when running gssproxy -f -vvv:
handle_gssd_upcall: 'mech=krb5 uid=0 enctypes=18,17,16,23,3,1,2 ' (nfs/clnt1d)
krb5_use_machine_creds: uid 0 tgtname (null)
Full hostname for 'server.my-ad-domain' is 'server.my-ad-domain'
Full hostname for 'client.my-ad-domain' is 'client.my-ad-domain'
No key table entry found for client$@MY-AD-DOMAIN while getting keytab entry for 'client$@MY-AD-DOMAIN'
Success getting keytab entry for 'CLIENT$@MY-AD-DOMAIN'
INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_MY-AD-DOMAIN' are good until 1488605241
INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_MY-AD-DOMAIN' are good until 1488605241
creating tcp client for server server.my-ad-domain
DEBUG: port already set to 2049
creating context with server nfs@server.my-ad-domain
WARNING: Failed to create krb5 context for user with uid 0 for server nfs@server.my-ad-domain
WARNING: Failed to create machine krb5 context with cred cache FILE:/tmp/krb5ccmachine_MY-AD-DOMAIN for server server.my-ad-domain
WARNING: Machine cache prematurely expired or corrupted trying to recreate cache for server server.my-ad-domain
Full hostname for 'server.my-ad-domain' is 'server.my-ad-domain'
Full hostname for 'client.my-ad-domain' is 'client.my-ad-domain'
No key table entry found for client$@MY-AD-DOMAIN while getting keytab entry for 'client$@MY-AD-DOMAIN'
Success getting keytab entry for 'CLIENT$@MY-AD-DOMAIN'
INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_MY-AD-DOMAIN' are good until 1488605241
INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_MY-AD-DOMAIN' are good until 1488605241
creating tcp client for server server.my-ad-domain
DEBUG: port already set to 2049
creating context with server nfs@server.my-ad-domain
WARNING: Failed to create krb5 context for user with uid 0 for server nfs@server.my-ad-domain
WARNING: Failed to create machine krb5 context with cred cache FILE:/tmp/krb5ccmachine_MY-AD-DOMAIN for server server.my-ad-domain
ERROR: Failed to create machine krb5 context with any credentials cache for server server.my-ad-domain
doing error downcall
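The "good until 1488605241" values in the log above are Unix timestamps. A hypothetical helper like the one below, which assumes GNU date's -d @EPOCH syntax, converts one to UTC and reports whether it had passed at a given moment; that makes it easy to check whether the "prematurely expired or corrupted" warning reflects a genuinely expired cache.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert an epoch EXPIRY (as in "good until <epoch>") to
# UTC and compare it with NOW (defaults to the current time). Needs GNU date.
cred_expiry() {
    local expiry=$1 now=${2:-$(date +%s)} when
    when=$(date -u -d "@$expiry" '+%Y-%m-%d %H:%M:%S UTC') || return 1
    if [ "$now" -lt "$expiry" ]; then
        printf '%s (valid)\n' "$when"
    else
        printf '%s (expired)\n' "$when"
    fi
}

# Usage:
#   cred_expiry 1488605241
```

For reference, 1488605241 corresponds to 2017-03-04 05:27:21 UTC.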