DESCRIPTION OF THE PROBLEM:
Every now and then, one of the mounts on the NFS clients gets disconnected or otherwise becomes inaccessible.
I usually find out when one of the web servers' configuration has not been updated because either the /maintenance mount or the /config mount has been disconnected.
All I have to do is type df to see that one of the mounts is not working.
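For anyone who wants to detect this before a stale configuration gives it away, here is a minimal sketch of a check that could be run from cron. It is only an illustration under a few assumptions: the function name mount_responds and the 5-second timeout are made up for this example, and the probe only tells you whether a statvfs call on the path returns in time, not whether the path is really the NFS mount or just the empty local directory underneath it.

```python
import subprocess
import sys

# Mount points named in this post; adjust for your own setup.
MOUNTS = ["/maintenance", "/config"]


def mount_responds(path, timeout=5.0):
    """Return True if statvfs(path) succeeds within `timeout` seconds.

    The probe runs in a child process so that a hung NFS mount blocks the
    child rather than this script. A missing path also returns False.
    """
    probe = [sys.executable, "-c", "import os, sys; os.statvfs(sys.argv[1])", path]
    proc = subprocess.Popen(
        probe, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: a child stuck inside the kernel may ignore the
        # kill, so we deliberately do not wait for it afterwards.
        proc.kill()
        return False


if __name__ == "__main__":
    for mount in MOUNTS:
        state = "ok" if mount_responds(mount) else "NOT RESPONDING"
        print(f"{mount}: {state}")
```

The check runs in a separate process on purpose: calling statvfs directly in the monitoring script would hang the script itself when the mount is stuck.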
I used to have to kill all the processes that were locked up because NFS wouldn't let them terminate, so I changed the NFS mount options to (soft) instead of (hard,intr).
Now all I need to do when it happens is type service netfs restart, and things are fine until it locks up again.
I also tried adding the nolock option to see if it was related to lockd, but it still happens.
This has been going on for months, and every day or so I have to fix one server or another.
This also happens (but less often) on my mail servers.
This seems to happen only on the /maintenance or /config mounts. There are cron jobs running on the NFS clients that generate the configuration in /config from a script running in /maintenance every minute.
On top of that, I get problems with defunct lockd processes on vega when it starts happening. The more it happens, the higher the load gets on vega because of the defunct lockd processes that cannot be killed.
On top of that, I cannot restart the NFS server as easily as I used to, because for some reason the BIOS sometimes gets stuck, so if I want to restart it I have to drive to the data center to make sure it boots back up properly. (It's something to do with the BIOS, not a problem related to the OS or anything.)