General :: NFS Failover - Setting Up Two Servers With Shared Storage With Automatic Failover If One Fails
Feb 14, 2010
What is the current state of NFS failover (i.e. setting up two servers with shared storage, with automatic failover if one fails)? I've seen a cookbook, but no details that would let me assess how well it works. There are lots of complex issues with data consistency, but the detailed information on that is years old. Our needs are fairly simple: two servers, a shared array, and I'm reasonably sure that we don't use locking. However, we'd like failover to work reliably without loss of data.
My question is about setting up SMB and AFP failover between two servers. The plan is to have two servers both running CentOS, with one acting as the primary node and one as the secondary failover node. I have never set anything like this up before. In the past I have always worked with SANs, primarily Xsan/StorNext, both of which handle failover pretty much automatically. Unfortunately there isn't the budget on this job to install a SAN, and this is only for temporary use for a week in a production office.
My thought was to run the two servers and use rsync from a cron job to keep the data synchronised between the two. In an ideal world clients would log on to the primary and, if that fails, be seamlessly moved over to the secondary. I'm guessing, however, that this is not possible outside of a SAN environment, so keeping the two servers synced and having the clients manually move over to the secondary is probably my only real option.
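A minimal sketch of the kind of cron entry that could drive the syncing half of this, assuming rsync over SSH between the two boxes; the interval, hostname and paths are placeholders, not details from the job itself:

Code:
# hypothetical crontab entry on the secondary server - pull from the primary every 15 minutes
*/15 * * * * rsync -a --delete primary.example.com:/srv/share/ /srv/share/ >> /var/log/share-sync.log 2>&1

Keep in mind that scheduled rsync only narrows the window of data loss to the sync interval; anything written between the last run and the failure is gone, which is why truly seamless failover normally needs shared or replicated block storage.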
I'm in need of some advice from you guys. I'm currently running a live production serverA, and last week it went down for a couple of hours, which was really bad to say the least.
I've been thinking about building a mirror serverB that will rsync my data nightly. Now I don't want to load balance here, I just need to be able to switch to serverB when serverA goes down for any reason.
Would the best solution for this be to change my main nameserver entry when I want to switch? I'm just curious whether it will take a few hours or be an instant change.
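For what it's worth, how fast a DNS-based switch takes effect is mostly down to the TTL on the record (plus client caching): if the TTL is hours, the cutover takes hours. A hedged BIND-style example, with names and addresses as placeholders:

Code:
; drop the TTL well ahead of time, e.g. to 5 minutes, so a later switch propagates quickly
www   300   IN   A   203.0.113.10   ; serverA
; after a failure, repoint the record at serverB and reload the zone:
; www 300   IN   A   203.0.113.20   ; serverB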
I am using CentOS. I have read in places that you can use DRBD + Heartbeat + NFS to make a simple failover NFS server, but I can't find any document that works. I've tried 20 or so, including some Debian ones. So, does anyone have any other ideas of how to do this? Point me in the right direction please. I want two nodes: one actively serving an NFS share, the other ready for failover. If the first one goes out, the second takes over - meaning the filesystem is in sync, the IP must change, and NFS must come up.
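The usual shape of those guides is a DRBD resource replicating the exported directory plus Heartbeat (v1 style) managing the floating IP, the DRBD promotion, the mount and the NFS daemon. A rough sketch under those assumptions - hostnames, devices and addresses are placeholders, not a tested configuration:

Code:
# /etc/drbd.conf (DRBD 8.x style) - one replicated resource backing the export
resource r0 {
    protocol C;
    on nfs1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.1:7788;
        meta-disk internal;
    }
    on nfs2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7788;
        meta-disk internal;
    }
}

# /etc/ha.d/haresources (Heartbeat v1) - floating IP, DRBD promotion, mount, then NFS
nfs1 IPaddr::192.168.1.100/24 drbddisk::r0 Filesystem::/dev/drbd0::/export::ext3 nfs

The order on the haresources line matters: Heartbeat starts the resources left to right on takeover and stops them right to left.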
We have a production setup with one SAN storage array and two RHEL machines. We have created a SAN LUN, say for example trylun, and mounted the same SAN partition on both RHEL machines at the same mount point path, say /trylun. After that we installed RHEL Cluster Suite to create a failover cluster.
We will have one Ingres database service whose data will be stored on the SAN storage LUN mounted on both machines, i.e. /trylun. When the service on one machine goes down, the RHEL Cluster Suite failover cluster will take over and start the same service on the other node. Whether Ingres runs from node 1 or node 2 makes no difference, as both use the shared SAN storage (/trylun in our example), so the same data storage is used by the Ingres service on both servers.
Now I have to simulate the same thing in my office test environment. The problem is that in the test environment I will not have a SAN server, as it is an additional cost, and I will be using the Fedora operating system.
So what I want to know is how we can create a shared file system like a SAN in Fedora (is NFS a solution?), and, after creating the shared file system, how we can create a failover cluster in Fedora if we do not have Red Hat Cluster Suite.
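NFS can stand in for the SAN in a test rig, with the caveat that it behaves differently from shared block storage (no cluster filesystem, and the NFS server itself becomes a single point of failure). A minimal sketch, assuming a third box acts as the storage server; addresses and paths are placeholders:

Code:
# on the box standing in for the SAN: /etc/exports
/trylun    192.168.122.0/24(rw,sync,no_root_squash)

# reload the exports on that box, then on each cluster node:
exportfs -r
mount -t nfs 192.168.122.10:/trylun /trylun

For the failover piece without Red Hat Cluster Suite, Heartbeat/Pacemaker from the Fedora repositories is the usual substitute.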
I am familiar with Windows 2008 cluster servers, and I just started testing with a CentOS cluster. I am creating a simple 2-node cluster for a simple ping test.
So far I can ping a virtual IP and manually relocate it between the nodes, but I haven't figured out how to do this automatically. So this is my question: how can I set up the cluster so that it automatically fails a service over to the other node in case one node fails?
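Automatic relocation in the Red Hat/CentOS cluster suite comes from putting the resource in a service with a failover domain and a recovery policy; rgmanager then moves it when a node drops out of the cluster. A hedged cluster.conf fragment - node names, domain name and address are placeholders:

Code:
<!-- inside <cluster> ... <rm> in /etc/cluster/cluster.conf -->
<rm>
  <failoverdomains>
    <failoverdomain name="pingdomain" ordered="1" restricted="1">
      <failoverdomainnode name="node1" priority="1"/>
      <failoverdomainnode name="node2" priority="2"/>
    </failoverdomain>
  </failoverdomains>
  <service autostart="1" domain="pingdomain" name="vip" recovery="relocate">
    <ip address="192.168.1.100" monitor_link="1"/>
  </service>
</rm>

Note that without working fencing the cluster will generally refuse to relocate services automatically when a node fails.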
We have set up a highly available cluster on two RHEL 5.4 machines with Red Hat Cluster Suite (RHCS), with the following configuration:
1. Both machines run MySQL server, Apache web server and Zabbix server.
2. The MySQL database and web pages reside on the SAN.
3. The active machine holds the virtual IP and the mounted shared disk.
4. We have also included a script in RHCS which takes care of starting MySQL, Apache and Zabbix on the machine that becomes active when the cluster switches over.
The above configuration holds good if the active machine goes down as a result of hardware failure or a reboot. But what if any one service, say Apache/MySQL/Zabbix, running on the active node hangs or becomes unresponsive? How can we handle this scenario? Please advise.
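If the three daemons are started by a single wrapper script, rgmanager only monitors that wrapper. Defining each daemon as its own <script> resource inside the service lets rgmanager poll each init script's status action and restart or relocate the service when one of them stops answering. A hedged fragment, with names, paths and addresses as placeholders:

Code:
<service autostart="1" domain="mydomain" name="webstack" recovery="relocate">
  <ip address="192.168.1.100" monitor_link="1"/>
  <fs device="/dev/mapper/sanvol" mountpoint="/data" fstype="ext3" name="sandisk"/>
  <script file="/etc/init.d/mysqld" name="mysqld"/>
  <script file="/etc/init.d/httpd"  name="httpd"/>
  <script file="/etc/init.d/zabbix_server" name="zabbix"/>
</service>

A process that hangs but still reports "running" to its init script's status check can still slip through, so some setups add a custom resource agent that does an application-level check (e.g. an HTTP GET or a MySQL ping).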
I currently have 9 physical servers that I want to condense to 1 physical server with 7 VMs using Xen. The only issue I have with doing this is that if the server happens to fail due to hardware problems, I am going to have a major issue. I want to set up a two-node cluster so that in the event of something happening to one of the servers, it will automatically fail over to the other.
With so many ways to cluster servers, does anyone have any suggestions on the best way to perform this task? The OS for Xen will be Red Hat and the VMs will run various versions of Linux. Some of the VMs will have MySQL, Apache, DNS, and Postfix running.
I don't have much experience in clustering, and I'm deploying a cluster system on CentOS. I don't know what counts as a good failover time - how long should it take for another node to take over the resources and continue running the service? Is 1s or 10s fast or slow, or ...?
I have a dual-homed Debian server running Squid, but not acting as a router. A simplified network diagram is below - there are other local hops between the gateways and the Internet.
Code:
(eth0 @ 192.168.44.2) <--> (Gateway1 @ 192.168.44.1) <--> Internet
(eth1 @ 192.168.55.2) <--> (Gateway2 @ 192.168.55.1) <--> Internet
Using Gateway1 gives a very fast, but not always reliable route to the Internet. Using Gateway2 gives a slower, but more reliable route to the Internet. The server uses Gateway1 as the default gateway.
I have written a script that pings three hosts on the Internet and, if all three are down, switches the default gateway to Gateway2. This part seems to be easy, but I'd like to know if there is a way of routing an ICMP ping out eth0 to a host while all other traffic to that host goes out eth1, so I can determine whether the Internet is reachable via Gateway1 again.
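One commonly used way to do this is policy routing: mark the probe packets with iptables and send marked traffic through a separate routing table that still points at Gateway1. A sketch under that assumption - the probe host address is a placeholder, the gateway address is from the diagram above:

Code:
# route only ICMP probes to the test host via Gateway1; everything else follows the main table
iptables -t mangle -A OUTPUT -p icmp -d 198.51.100.1 -j MARK --set-mark 1
ip rule add fwmark 1 table 100
ip route add default via 192.168.44.1 dev eth0 table 100
ip route flush cache

If a single probe host is enough, an even simpler option is a plain host route (ip route add 198.51.100.1 via 192.168.44.1), at the cost of all traffic to that host going via Gateway1.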
I have been tasked with a couple of Sun Fire X2100s that I am putting Fedora 11 onto for some high-profile tasks around the office. There are two drives of the same size in each server and I would like to have the two disks mirrored for redundancy. Admittedly I am new at Linux administration and am feeling over my head.
1. Can this be managed during the installation process of Fedora 11?
2. If yes, let me know the steps please.
3. If no, I take it a cron job running rsync is going to be my best option.
4. Alternatives, insights, etc.
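The Fedora installer can build a software RAID1 mirror during partitioning (create two "software RAID" partitions, then a RAID device on top of them), so a post-install rsync job shouldn't be needed for this. If the machines are already installed, a data partition can also be mirrored after the fact with mdadm; a hedged sketch, with device names as placeholders (verify them with fdisk -l first):

Code:
# create a RAID1 array from two empty, same-sized partitions (hypothetical device names)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3
mkfs.ext4 /dev/md0
# watch the initial resync
cat /proc/mdstat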
I have set up apt-cacher-ng and it is working fine. I have configured the clients using the Synaptic GUI (Configuration > Preferences > Network > set proxy). Is there a way to configure the clients so that they use the proxy when it is available (at the office) and fall back to using no proxy when the proxy cannot be reached? We have a lot of laptops at the office which are on the road a lot; in the office I want them to use apt-cacher-ng for updates and installation, but when the users are not in the office they should be able to install software without having to change the Synaptic configuration.
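One approach that should suit the laptops is to point apt at a small detection script instead of a hard-coded proxy, so it falls back to a direct connection when the cache box isn't reachable. This relies on apt's proxy auto-detect hook, whose option name varies by apt version (Acquire::http::ProxyAutoDetect in older releases, Acquire::http::Proxy-Auto-Detect later), so treat the spelling below as an assumption; the hostname is a placeholder, 3142 is apt-cacher-ng's default port:

Code:
# /etc/apt/apt.conf.d/30proxy
Acquire::http::ProxyAutoDetect "/usr/local/bin/apt-proxy-detect";

# /usr/local/bin/apt-proxy-detect (make it executable)
#!/bin/sh
# print the office proxy if it answers, otherwise tell apt to go direct
if nc -z -w1 aptcache.office.lan 3142 2>/dev/null; then
    echo "http://aptcache.office.lan:3142"
else
    echo "DIRECT"
fi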
My friend has a server with 2 IPs, 1 primary and 1 secondary/failover. He has given me a shell account and I want to use SSH to route my home HTTP traffic through it like a SOCKS proxy. I connect to his server using the secondary IP like this:
ssh me@secondary_ip -p port -D forwarding_port
It builds a proxy; however, it uses the primary IP of the server, not the secondary IP that I logged in with. When using irssi I've bound it to the secondary IP with no problem. If I try to use the -b flag I get the error: cannot bind: Cannot assign requested address.
How can I bind the SSH tunnel to the secondary IP?
We run redundant switches that two NICs on each server connect to, and we also run bonding on our servers. Because we have two switches, we can't run LACP or anything like that. If a switch goes into a crashed state where it doesn't pass traffic but still provides link, bonding thinks the interface is still up and will keep sending traffic through it. Does anybody know a better way to configure the failover of the interface? This would be a similar situation to somebody using a media converter.
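The usual answer for this failure mode is to switch the bonding driver from MII link monitoring to ARP monitoring, so the bond tests whether an ARP target actually replies through each slave rather than trusting the carrier signal. A hedged sketch of the module options - the addresses are placeholders and should be something beyond the switch, such as the default gateway:

Code:
# /etc/modprobe.conf (RHEL/CentOS style) - ARP monitoring instead of miimon
alias bond0 bonding
options bond0 mode=active-backup arp_interval=1000 arp_ip_target=192.168.1.1,192.168.1.254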
I have a linux machine, attached to a fibre channel SAN.
We're in the testing phases and we're attempting to get all of the bugs worked out before this goes live.
If I have my host streaming data to the storage device on the SAN (or from the device on the SAN) and simulate a path failure (by shutting down one of the host's ports on the FC switch), multipath does not pick up another path until about 45 seconds have passed.
I can verify this by watching the statistic graph (which updates once per second) on the storage system.
I see iops running along rather nicely, and then they drop to 0 for 45 seconds, then pick right up to normal again.
This is a Red Hat EL 5.5 system, with QLogic HBAs.
Am I being too picky? I'd expect multipath to recover in under 30 seconds, so as not to alarm applications running on the Linux host... 45 seconds seems like a long time to wait for a disk operation to complete.
Any tips on tuning multipath, or the QLogic card? As it is, I've got the following options in my modprobe.conf.
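For reference (this is not the poster's actual modprobe.conf, just a commonly suggested starting point), the knob most often pointed at for a ~45 second gap on qla2xxx is the driver's port-down retry count, together with the remote port's dev_loss_tmo:

Code:
# /etc/modprobe.conf - shorten how long the qla2xxx driver retries a downed port
options qla2xxx qlport_down_retry=10

# the fabric-level timeout can be checked (and tuned) per remote port, e.g.:
cat /sys/class/fc_remote_ports/rport-*/dev_loss_tmo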
I've been reading up on clustering a bit and want to create a simple 2-machine cluster in which the second machine will turn on and take over traffic at the first machine's IP (I'm aware there will be a couple of minutes of downtime while the failover server boots). I've looked into doing simple DNS round robin, but I think both machines would need to be online all the time for that to function correctly. I've also looked into Linux-HA and GlusterFS, but both seem a little intense for the simple failover I'm trying to achieve. One catch if approaching this with DNS round robin: the server that I want to set up with a failover also operates as a DNS server. Does anyone know the simplest way to accomplish this? One idea I had (that seemed like a bit of manual work): create a script that runs every 60 seconds through cron on the firewall (running pfSense) and have it check whether the master server is up. If it's not, it will send a WOL signal to the slave, which has a mirrored image of the first. The problem I have with that is I will also need a script to keep both synced properly, which seems almost impossible if one is shut down until failover (my thought was periodic rsyncs). Perhaps creating an FS to direct all files to for the master/slave would be the only way around that.
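A rough sketch of the watchdog half of that idea, assuming the firewall can run a shell script from cron and has some wake-on-LAN utility installed (wol is used here; the addresses and MAC are placeholders):

Code:
#!/bin/sh
# cron this every minute on the firewall: if the master stops answering, wake the slave
MASTER=192.168.1.10
SLAVE_MAC=00:11:22:33:44:55

if ! ping -c 3 "$MASTER" > /dev/null 2>&1; then
    wol "$SLAVE_MAC"    # or ether-wake/etherwake, depending on what is installed
fi

The syncing problem is the harder half: with the slave powered off between failovers, the only real options are to sync on a schedule while it is up, or to accept that the slave's copy is only as fresh as its last boot.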
I am trying to set up multipath with a failover policy on openSUSE 11. I have two qla2xxx HBAs installed and they appear to be working. Here is the output of the "multipath -l" command:
[Code]....
While testing, I pulled one of the two connections to the SAN, and the connection failed over to the second HBA connection. When I plug the cable back in, it does not fall back to the original connection... it stays in the failed state. Also, I noticed that the failed disk (sdd) comes back as a different disk (sdg), which is probably why the connection does not fall back to the original HBA. But when I run "/sbin/service multipathd restart", the sdg disk shows up as enabled in multipath -l...
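Whether multipath moves I/O back to a restored path is controlled by the failback setting in multipath.conf; on many distributions the default is manual. A hedged fragment (it belongs in the defaults section or the relevant device section):

Code:
# /etc/multipath.conf
defaults {
    failback    immediate
}

The device renaming (sdd coming back as sdg) is normal when the SCSI layer re-registers the path; multipath matches paths by WWID, so the new name should be picked up once the path checker sees it, which the multipathd restart is effectively forcing here.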
I am going to install Oracle RAC on two servers with shared SAN storage (servers and storage are IBM); OS = RHEL 5u5, 64-bit.
We used multipathing and created multipath devices, e.g. /dev/mapper/mpath1. Then I created the raw device /dev/raw/raw1 on top of the /dev/mapper/mpath1 block device, as per the pre-requisites for the Oracle cluster. Everything looks good, but we are facing the following performance issue.
When we run the command "dd if=/dev/zero of=/dev/mapper/mpath1 bs=1024 count=1000", the write rate is approximately 34 MB/s, but if we run "dd if=/dev/zero of=/dev/raw/raw1 bs=1024 count=1000", the write rate is very slow, around 253 KB/s.
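Part of that gap is likely the test itself rather than the raw binding: writes to /dev/mapper/mpath1 go through the page cache, while the raw device does synchronous, unbuffered 1 KB writes, so every block pays the full round-trip latency. A quick way to compare like with like (same devices as above, just different dd options; this still overwrites data on those devices, so only on a test LUN):

Code:
# force direct I/O on the block device, and use a bigger block size on the raw device
dd if=/dev/zero of=/dev/mapper/mpath1 bs=1024 count=1000 oflag=direct
dd if=/dev/zero of=/dev/raw/raw1 bs=1M count=100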
I built a CentOS 5 Linux cluster with GFS storage using a local RAID volume and share it with gnbd_export/import to two web servers. Now I need to expand that storage onto another server's local volume. I saw the picture in the manual, but I don't know how to create that scheme.
I can use gnbd_export on the second server and gnbd_import on the first. In that case I will have two volumes on the first storage server and I can expand the volume group, logical volume, etc. on it.
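In outline that is how the GNBD route works: export the second server's volume, import it on the first, then grow the volume group and GFS on top. A hedged sequence, with export names, VG/LV names and sizes as placeholders:

Code:
# on the second server: export its local volume
gnbd_export -v -e store2 -d /dev/sdb1

# on the first server: import it and extend the existing VG/LV/GFS
gnbd_import -v -i second-server
pvcreate /dev/gnbd/store2
vgextend sharedvg /dev/gnbd/store2
lvextend -L +200G /dev/sharedvg/gfslv
gfs_grow /mnt/gfs

Bear in mind that a GNBD export backed by one server's local RAID leaves that server as a single point of failure for its part of the volume group.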
I used to have three partitions: Win7, Fedora 12 and an NTFS storage partition. Now I have had to reinstall Win7, and it doesn't see any files on the storage partition; Windows shows the whole rest of the disk, including the Linux partition, as one drive with 7 GB of 80 GB free. I can also open it, but there is nothing in it. Linux is still working and everything on the storage partition is still there and accessible in Linux. Any suggestions? Do I have to tell Windows about the partitions, or does it have to do a scan or something?
I've been asked to investigate presenting the same SAN LUN to two or more RHEL 5 hosts. The hosts are providing independent applications, so they're not clusters from an application perspective. The shared storage location would be used as a common area for imports/exports. We're hoping to reduce file transfer times between the hosts by eliminating the need to copy the files between two storage locations. Some of our hosts run Advanced Server and some are standard. Is there a file system I can use that will allow multi-host access without running Advanced Server with clustering services on all hosts?
I was looking through my hidden files and noticed the ".thumbnails" directory. I had been browsing the web for about an hour and had opened some image files from my drive. HOLY CRAP! I had over 2000 thumbnails in that directory! Every time an image is displayed, a thumbnail of it is created? Really? Seriously, I don't care what the reason behind this "feature" is - how do I stop it? It is simply unnecessary and a waste of disk space.
I have a server running CentOS 5.5 with KVM capabilities. I need to migrate all the VMs to another server with the exact same hardware specs. The problem is it is running on individual harddisks, not shared storage. What is the best way to migrate to minimise downtime?
I'm planning on setting up a home file server. I was wondering what platform would be recommended for something like this. The server would be used mainly for media storage, which would be shared between an HTPC and a couple of desktops and laptops. I was thinking of just getting whatever motherboard has the most SATA headers on it (which currently seems to be something P55-based) and setting up a RAID5 fakeraid with some 1.5 or 2 TB drives, with the OS in RAID1 on whatever drives I have lying around. Is there anything flawed with this approach? P55 boards with 10 SATA headers are currently upwards of $200, which is kind of pricey. Is there a more economical route that I should consider? Also, are there any known problems with setting up a fakeraid like this using certain motherboards' SATA controllers?
I need a cluster-safe filesystem for SAN shared storage in Slackware. Is Red Hat's GFS the best/only solution? Given GFS support in a custom kernel, what about the tools (mkfs, mount, ...)?
I am working on the beginning of implementing a two-node cluster with shared storage (GFS) and an IP address. Both machines are virtual on VMware ESX 3.5; that should not make a difference, but that is the background. The current status is that I have a single-node cluster built with only the IP address configured within the cluster. The issue I am having is that I have configured a service to contain only the IP address resource; however, when I go into cluster management that "service" does not register. As such, I cannot bring it online, ping it, etc. Below is my cluster.conf configuration:
I have a device that is only accessible using wireless. By default it starts an ad-hoc wireless network I can connect to. The problem is there is no Internet access on the ad-hoc network, so I'm connecting it to my router using these commands (it's a GNU/Linux box):
I want to add this to a startup script, but I don't want to get locked out of the device forever if something happens to the router (it's really old). Is there any way to check if the connection failed and create an ad-hoc network instead if it did?
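A rough sketch of that check, assuming classic wireless-tools and a DHCP client are available on the device; the interface name, ESSIDs and addresses are all placeholders:

Code:
#!/bin/sh
# try the router first; if it never answers, fall back to the original ad-hoc network
IFACE=wlan0
ROUTER=192.168.1.1

ifconfig "$IFACE" down
iwconfig "$IFACE" mode managed essid "HomeRouter"
ifconfig "$IFACE" up
dhclient "$IFACE"
sleep 10

if ! ping -c 3 "$ROUTER" > /dev/null 2>&1; then
    # router unreachable - bring the old ad-hoc network back up
    ifconfig "$IFACE" down
    iwconfig "$IFACE" mode ad-hoc essid "DeviceAdHoc"
    ifconfig "$IFACE" 192.168.2.1 netmask 255.255.255.0 up
fi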
The Update Manager has tried to upgrade me to version 11 three times now, and fails each time.
First I get a warning that probably isn't important: "Third party sources disabled
Some third party entries in your sources.list were disabled. You can re-enable them after the upgrade with the 'software-properties' tool or your package manager."
After the upgrade I get this:
"Could not install the upgrades The upgrade has aborted. Your system could be in an unusable state. A recovery will run now (dpkg --configure -a).
Upgrade complete The upgrade has completed but there were errors during the upgrade process."
The console has about six messages similar to this, all complaining about the vnc-e package:
"warning, in file '/var/lib/dpkg/available' near line 10992 package 'vnc-e': error in Version string '4.4.3_r16583': invalid character in version number"