Server :: Deadlock In NFS4 / Shared Servers Suddenly Stop Responding And Can't Be Listed From Debian 5 Server?
Jan 4, 2010
I am connecting servers using NFS4. The shared directories are on servers running Debian 4, while the server that reads from them runs Debian 5.0.3. The problem is that one of the sharing servers suddenly stops responding and can no longer be listed from the Debian 5 server; df also hangs, and the web application that uses the share stops responding to any request that touches the shared directory, since it is blocked. The load on the server then climbs (to over 90) until the server cannot respond at all. I have found many entries in the syslog that refer to this, like:
ma25555 kernel: [1200285.732919] nfs: server 10.xxx.xxx.xxx not responding, still trying
Dec 31 08:16:33 ma25555 kernel: [1200289.815378] INFO: task java:9702 blocked for more than 120 seconds.
Dec 31 08:16:33 ma25555 kernel: [1200289.835249] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
code....
I tested the connection between the two servers with ping for a full day and everything was OK (zero packets lost).
There are 3 other servers running Debian 4 that are working fine.
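When entries like these pile up in the syslog, it can help to tally which NFS server timed out and which tasks the kernel reported as hung. The following is only a minimal sketch in Python: the two regular expressions are modelled on the messages quoted in this post, and the function name `summarize` is made up for illustration. Other kernel versions may word these messages differently.

```python
import re

# Patterns modelled on the two kinds of kernel messages quoted in the post.
NFS_TIMEOUT = re.compile(r"nfs: server (\S+) not responding")
HUNG_TASK = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def summarize(lines):
    """Count NFS timeouts per server and collect hung tasks as (name, pid, seconds)."""
    timeouts = {}
    hung = []
    for line in lines:
        match = NFS_TIMEOUT.search(line)
        if match:
            server = match.group(1)
            timeouts[server] = timeouts.get(server, 0) + 1
            continue
        match = HUNG_TASK.search(line)
        if match:
            hung.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return timeouts, hung


# The two sample lines are copied from the post (addresses masked as in the original).
sample = [
    "ma25555 kernel: [1200285.732919] nfs: server 10.xxx.xxx.xxx not responding, still trying",
    "Dec 31 08:16:33 ma25555 kernel: [1200289.815378] INFO: task java:9702 blocked for more than 120 seconds.",
]
timeouts, hung = summarize(sample)
print(timeouts)
print(hung)
```

To run it against a real log, pass an open file instead of the sample list, for example `summarize(open("/var/log/syslog"))`; the path is an assumption and depends on the syslog configuration.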
I'm trying to set up an NFSv4 server and client. I followed the instructions in
[URL]
The SERVER is on 192.168.89.1 running Xubuntu 10.04, and the CLIENT is on 192.168.89.128 running Ubuntu 10.10. Firewall is disabled on both the server and the client for testing purposes. /etc/default/nfs-kernel-server on the SERVER:
Code:
# Number of servers to start up
RPCNFSDCOUNT=8
# Runtime priority of server (see nice(1))
[code]....
because we want UID/GID to be mapped from names. This way, the server and client do not need users to share the same UID/GID. In that case:
1. Should I set those 2 fields to "no" and "yes" respectively instead?
2. Or else, how do I make sure that the UID on the server is mapped to something useful on the client instead of nobody and nogroup?
I installed F14 and have been having issues with RAM usage. Here is the situation: while working with Firefox, aMSN, and VLC, the system suddenly stops responding. I can just manage to check the system monitor, which shows RAM at 89% (of 1 GB) and swap at 50% (of 1.4 GB). I can do nothing else, so I switch off the machine. After that, everything is fine using the same applications. I was using F12.
I have a strange problem where my machine will suddenly stop responding to connections while still remaining partially connectible. Here's an example sequence of events I have gone through (numerous times). If anyone can spot something else I should check or has seen these symptoms before, please let me know.
1) System is using an Edimax 7318-usg, Ubuntu 10.04
2) Wifi connection is working, with power management turned off (iwconfig)
3) Signal strength does not cause the connection to drop (confirmed with router logs)
4) When in an ssh session on the problem machine, suddenly the ssh session is dropped: "host is down"
5) When I attempt to re-initiate ssh it fails, no route to host...
6) Pinging it fails, accessing a webpage hosted on it fails, all access fails
7) The machine I'm pinging/sshing from still has a good network connection
8) I check my router's DHCP leases; problem machine still has a valid lease.
9) Router still shows problem machine in routing & ARP table & shows active traffic going to and from it (I checked the router's states and bandwidth monitor for this)
10) I try sshing and pinging the problem machine from the laptop on the local network, it still fails, no route to host, wtf?
11) I go to the problem machine physically, wifi connection is up with decent signal
12) ifconfig indicates it still holds an IP address
13) I open a browser on the problem machine, any webpage loads fine
14) I ping the laptop from the problem machine, ok
15) I ssh from the problem machine to my laptop, it works
16) At this point I ssh from the laptop to the problem machine; sometimes it works, sometimes it fails (it seems that pinging/sshing the laptop FROM the problem machine sometimes makes it connectable again).
17) Usually if I wait a while the problem machine will become connectable again at random intervals
18) The only reliable way to make the problem machine connectable again is manually reconnecting it to the wireless network
19) I tailed auth.log, ufw.log, etc.; nothing suspicious disallowing my ssh connection
20) dmesg shows nothing unusual, as far as I can tell, just some activity negotiating WPA keys and such
By looking at the timestamp for the WPA negotiation and cross-referencing the router bandwidth monitor, I can tell the WPA negotiation doesn't make it drop the connection.
So that's it, lol.
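Since the checks above were done by hand, it can be hard to line up the exact moments the machine becomes unreachable with the router logs. Here is a rough Python sketch of one way to record those moments from another host. Everything in it is an assumption for illustration: the function names, the choice of probing TCP port 22, and the polling interval are not from the post.

```python
import socket
import time


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def transitions(samples):
    """Given (timestamp, reachable) samples, keep only the points where the state changes."""
    changes = []
    previous = None
    for stamp, reachable in samples:
        if reachable != previous:
            changes.append((stamp, reachable))
            previous = reachable
    return changes


def watch(host, port=22, interval=10, rounds=6):
    """Probe host:port every `interval` seconds and return the state changes seen."""
    samples = []
    for _ in range(rounds):
        samples.append((time.strftime("%H:%M:%S"), tcp_reachable(host, port)))
        time.sleep(interval)
    return transitions(samples)
```

Calling something like `watch("192.168.1.50", rounds=360)` (a hypothetical address) from the laptop would give a short list of timestamps at which reachability flipped, which could then be compared against the WPA negotiation times in dmesg.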
I'm trying to set up an NFSv4 server (no security, local home network behind FW). It seems that I'm missing something, because 'rpcinfo -p' does not list v4 for NFS:
petit-pois:/home/eric# rpcinfo -p
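The check described here (looking through 'rpcinfo -p' for the NFS versions) can be scripted. The sketch below is in Python and relies on two things: NFS registers under RPC program number 100003, and 'rpcinfo -p' prints one row per program/version/protocol. The sample text is hypothetical, since the listing in the post is cut off, and the function names are invented for this example.

```python
import subprocess


def nfs_versions(rpcinfo_output):
    """Return the sorted NFS versions (RPC program 100003) found in `rpcinfo -p` output."""
    versions = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows look like: program vers proto port [service]
        if len(fields) >= 4 and fields[0] == "100003" and fields[1].isdigit():
            versions.add(int(fields[1]))
    return sorted(versions)


def query(host="localhost"):
    """Run `rpcinfo -p host` and return the registered NFS versions."""
    result = subprocess.run(
        ["rpcinfo", "-p", host], capture_output=True, text=True, check=True
    )
    return nfs_versions(result.stdout)


# Hypothetical output for illustration only, not taken from the post.
sample = """\
   program vers proto   port
    100000    2   tcp    111  portmapper
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
"""
print(nfs_versions(sample))
```

If version 4 is missing from the result of `query()`, that matches the symptom described above; the sketch only reports what is registered and does not say why v4 is absent.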
I recently decided to install Ubuntu and went through all of the steps to set up a Subversion server using svnserve. It all works except for one thing: it stops responding to remote computers after a while.
I have the server on a 4-port wireless router/switch in my dorm room. I set up port forwarding so that the server receives all traffic over the correct ports. The Subversion server AND the default HTTP server on Ubuntu always work fine as long as I access them from a computer connected to the switch in my room.
The problem is that if I try to access it from anywhere else, it won't respond at all unless it has been accessed recently (within something like 30-60 minutes) by a computer on the switch. So to fix it I have to go back to my room and access it from a computer on the switch. Then it will work from anywhere for another 30-60 minutes.
I run a mediaserver on Archlinux, working perfectly (or almost). I have set up NFS v3 and that worked for me on these clients:
- Debian Lenny
- Archlinux 64bit
Now I've upgraded my Lenny box to Squeeze. Two of my three shared folders (tdone and twatch) mount as they should, but the third one (media) doesn't come up. Running 'mount -a' as root gives this error:
mount.nfs4: access denied by server while mounting (null)
My relevant fstab lines:
I just installed Ubuntu as a learning tool. It was going fine until the day after the install. This has happened on two reinstalls of Ubuntu Server 10.04. All that was installed was LAMP, Squid, Privoxy, Tor, Shorewall, and phpMyAdmin. I have added 3 websites to the server.
Today, I noticed that the internet stopped responding on my other PC, which goes through my server for internet access (Privoxy). When I turned on the server, I noticed the screen was spewing out random lines. I don't know what they mean, but it was a never-ending series of lines filling the whole screen. To stop it, I pressed Ctrl+C. Some of these lines mentioned usb, keyboard, mouse, etc. I am guessing the system went haywire.
When I rebooted the machine, it did it again. This time I didn't get a command prompt. I didn't know any other way to shut down properly, so I just cold-rebooted the machine. The next time it booted up, everything seemed normal, as it always had been. My sites and internet are working.
I probably didn't provide enough info to diagnose this behavior. As mentioned before, it happened the first time I installed the server and it is happening again with the same applications installed.
Edited: It happened again. Some of the lines I could make out are these.
Code:
BUG: unable to handle kernel NULL pointer dereference
Kernel panic - not syncing: Attempted to kill the idle task!
Error panic occurred: Switching back to text console.
The server froze at this point.
It's been a while since I posted anything, which is a good sign that my install has been working well and I have been able to handle almost everything. However, I'm not able to handle this issue. I recently installed F11 and everything went well, but when I try to see my other computers on the local network, I cannot. I receive this error message: "Unable to mount location" / "Failed to receive shared list from server". I understand the message, as it is obvious, but I do not know how to fix it.
I have a Mac G3 and a Squeeze box (2.6.32-5-686) on a wired Ethernet connection. It worked fine with Lenny KDE, but with Squeeze KDE4 it works in only one direction. The Mac pings the Squeeze box OK, but the response is "server may be down or offline". Squeeze connects to the Mac normally. Squeeze was installed with the server option. How do I enable eth0? I need some direction.
I have been having on-and-off issues with my Samba file shares. I am sharing an NTFS-formatted hard drive whose mount point is in my home directory, as well as a printer connected via USB. I have reached the point where printing works (using it as an IPP print share; Samba is configured for it too, but I don't know whether that works), and I can access the shared folder from Windows, but I can't access the shared folder from any Ubuntu machine. I get the error:
I am going to install Oracle RAC on two servers with shared SAN storage (servers and storage are IBM). OS = RHEL 5u5 x64 (64-bit).
We used multipathing and created multipath devices, e.g. /dev/mapper/mpath1. Then I created the raw device /dev/raw/raw1 on top of the /dev/mapper/mpath1 block device, as per the prerequisites for Oracle Cluster. Everything looks good, but we ran into the following performance issue.
When we run the command:
# dd if=/dev/zero of=/dev/mapper/mpath1 bs=1024 count=1000
the write rate is approx. 34 MB/s. But if we run the command:
# dd if=/dev/zero of=/dev/raw/raw1 bs=1024 count=1000
the write rate is very slow, around 253 KB/s.
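It may help to work out what these two dd runs actually measure. The Python sketch below just does the arithmetic from the numbers in the post, assuming dd's MB and KB mean 10^6 and 10^3 bytes (the GNU dd convention); the helper names are made up for the example.

```python
def transfer_bytes(block_size, count):
    """Total bytes written by dd for the given bs and count."""
    return block_size * count


def seconds_at(total_bytes, bytes_per_second):
    """Elapsed time implied by a reported throughput."""
    return total_bytes / bytes_per_second


total = transfer_bytes(1024, 1000)          # bs=1024 count=1000
block_dev_s = seconds_at(total, 34 * 10**6)  # 34 MB/s on /dev/mapper/mpath1
raw_dev_s = seconds_at(total, 253 * 10**3)   # 253 KB/s on /dev/raw/raw1
per_write_ms = raw_dev_s / 1000 * 1000       # 1000 writes, converted to milliseconds

print(total, "bytes in total")
print(round(block_dev_s, 3), "s on the block device")
print(round(raw_dev_s, 2), "s on the raw device")
print(round(per_write_ms, 2), "ms per 1 KB write on the raw device")
```

So the whole test writes only about 1 MB, and the raw-device figure works out to roughly 4 ms per 1 KB write. One likely reading, offered as an assumption rather than a diagnosis: writes to the block device land in the page cache, while raw devices do unbuffered I/O, so each tiny write waits for the storage. A larger block size (for example bs=1M) would probably give a fairer comparison.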
I recently switched from Ubuntu to Fedora to try it out, but I cannot connect to the Windows servers on my network. When I click on Windows Network it says "failed to retrieve shared list from server".
I have tried 504 i386 and 505 i386, both downloaded, on an Athlon on an Abit board with a 20Bb hd and an old Hitachi monitor. Both load but do not complete the boot: they suddenly stop and the monitor goes off. Only the setup is ever displayed. The hardware is working; I just had another system on it.
I've got a machine running Ubuntu Server that is on several VLANs. Each VLAN has its own subnet and the server has an address on each subnet. The switches are set to allow tagged traffic to the server for each VLAN that it is on. Switch ports that connect to workstations are untagged members of whatever VLAN is appropriate. Workstations are given addresses on the subnet for their VLAN via DHCP. All this works great and hosts on any subnet/VLAN can access the server as normal via its address on that subnet/VLAN.
Accessing the machine by its address on a non-local subnet is where I run into a problem. Inter-subnet traffic has to go through a router, which has been set up appropriately. Running tcpdump on the server and pinging it from a workstation on a subnet, using its address on a different subnet, shows the server receives the ping, but sends no response:
Code:
sudo tcpdump -i vlan4 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
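One thing worth checking in a setup like this is which interface the server would use for the reply. Because the server has an address on the workstation's own subnet, the reply may leave through that directly connected VLAN instead of going back through the router, so it would never show up in a capture on vlan4. The Python sketch below only illustrates that reasoning: the interface names and addresses are hypothetical (the post does not give the real subnets), and it is not a confirmed diagnosis.

```python
import ipaddress

# Hypothetical addressing; the post does not give the real subnets.
SERVER_INTERFACES = {
    "vlan3": ipaddress.ip_interface("10.0.3.1/24"),
    "vlan4": ipaddress.ip_interface("10.0.4.1/24"),
}


def reply_interface(peer, interfaces=SERVER_INTERFACES):
    """Return the interface whose connected subnet contains peer, or None (default route)."""
    peer_ip = ipaddress.ip_address(peer)
    for name, iface in interfaces.items():
        if peer_ip in iface.network:
            return name
    return None


def is_asymmetric(peer, arrived_on, interfaces=SERVER_INTERFACES):
    """True if the reply would leave on a different connected interface than the request used."""
    outgoing = reply_interface(peer, interfaces)
    return outgoing is not None and outgoing != arrived_on


# A workstation on the vlan3 subnet pings the server's vlan4 address via the router.
print(reply_interface("10.0.3.50"))
print(is_asymmetric("10.0.3.50", arrived_on="vlan4"))
```

If that is what happens here, running tcpdump on the other VLAN interface at the same time should show whether the replies leave there, or whether the server drops the request before answering.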
So I have a few Ubuntu boxes (Hardy, till I can find a replacement for Xen) that I am trying to move from NFSv3 to NFSv4. I set it up according to this guide: URL... However, I ran into trouble: the client sees all users/groups as nobody/nogroup. The current setup is that all the boxes have synced UIDs/GIDs and all users with root access can be trusted. I read some reports that said the only way to fix this was to use Kerberos. However, I would really prefer not to move to Kerberos, as I have heard that it is very intensive to set up. So what I am looking for here is a solution other than sticking with NFSv3 or putting everything on Kerberos. However, if you think that Kerberos is easier to set up than I am giving it credit for, that would be useful to hear as well.
My plex media server suddenly stopped working after years of no hassle use. This is the output of cat /proc/version = Linux version 3.2.0-4-amd64 (debian-kernel@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.65-1+deb7u2
My problem is similar to this forum thread viewtopic.php?f=5&t=121945 where I can get it to start by running
Code:
/usr/sbin/start_pms &
but it's not my normal server configuration. I added a couple of files to it and it seems to work, except that it doesn't start when I reboot; I have to run that command again.
This is the init script I've been running for years...
Code:
#!/bin/sh
### BEGIN INIT INFO
# Provides: plexmediaserver
# Required-Start: $remote_fs $syslog $networking
# Required-Stop:
[code]....
My Debian skills aren't good enough to know how to proceed with troubleshooting what could have happened. Everything else seems fine; I just can't seem to get this application to run at startup or get back my old configuration...
Can anyone list the options available for server redundancy (i.e. fault-tolerance methods, so that if one server goes down another can take over)? Also, is it possible to implement RAID1/disk mirroring across servers?
5.5 (64bit). My server goes into a non-responding state where I can't telnet, FTP, or SSH to the server, whereas ping is the only thing that works at that time. I need to hard-reboot the server to get it back online. This is happening very frequently; in fact, the server became inaccessible both yesterday and today. Which log file should I look into to find out the reasons for this inaccessibility?
I'm trying to set up an NFSv4 server and client. I followed the instructions in [URL] (NFSv4 quick start section) and [URL]. The SERVER is on 192.168.89.1 running Xubuntu 10.04, and the CLIENT is on 192.168.89.128 running Ubuntu 10.10. Firewall is disabled on both the server and the client for testing purposes. /etc/default/nfs-kernel-server on the SERVER:
Code:
# Number of servers to start up
RPCNFSDCOUNT=8
# Runtime priority of server (see nice(1))
[code]....
On the [URL], I see some steps related to portmap in the "NFS Server" and "NFS Client" sections. Would I need those steps as well? There's also a list of steps on [URL] (linked from [URL]). Are those necessary?
EDIT: Running showmount on the client seemed to show that NOTHING is shared on the server:
I'm having a very strange problem with my Ubuntu Apache2 server running WordPress. I want to download media files (from within an on-site Flash MP3 player or by link [url]), but the file transfer just stops after a while (at least sometimes), at random positions. After that I have to clear the browser's cache and try again.
It is really annoying, as it is my band's website and we want to share our songs with our friends. I checked from several clients, and it seems to happen everywhere (Linux, Mac, or Windows clients).
I've had a server running for over a year and it's been very stable. Suddenly (no config or software changes) it's behaving oddly. If I perform a network-related activity, it appears to affect other, separate network functions. Example: I get onto the machine using RealVNC server and suddenly port 80 goes down, or I can access port 443 inside the network but not externally, or I can get to 443 but not 80. The server has two network cards. Switching to the second card appears to have had no effect. I also tried rebooting, switching in a new network cable, and bouncing the network card (which does re-establish services, but not consistently). I have several servers and this is the only one misbehaving. It's running Debian Etch (as are a few others).
I am using CentOS 5.5 (64-bit) on a quad-core server with 8 GB of RAM. This server has MySQL Server 5.1.47 installed. The server goes into a non-responding state every 20 to 60 days. SSH and telnet don't work at that time, but ping works fine. I have to hard-reboot the server to get it back on track. Can anybody tell me which logs/files I should look into to find out what happens to the server when it goes into the non-responding state?
I have a WD Caviar Green 2TB drive installed in a NAS server. Without any explanation, at random, usually after several days of uptime, the drive stops responding completely on the SATA bus. The syslog shows:
I just want to emulate the Windows 2003 / Windows XP easy VPN deployment with my Ubuntu server. I have set up the server side (Ubuntu) and the client side (OpenVPN GUI) and everything looks OK, but now I can't open //server/SHARED and get from my house to the office's docs, even though the connection is OK. What's wrong?
I am running Debian (Lenny) as my file server on a Windows network. I have multiple users accessing shared Excel worksheets. I am the owner of these files and have no problem, but if I save a shared workbook the file properties on the Linux server change to "Groups - Read Only, Others - Read Only". I can reset them when logged in as root, but they change back every time I save the file from the XP machine (XP Home SP3). All other non-shared worksheets are fine.