CentOS 5 Server :: 2-node Non-shared-FS Cluster On 5.2/3?
Apr 15, 2009
I'm having some trouble configuring clustering in a 2-node cluster with no shared FS. The application is video streaming, so outbound traffic only... The cluster is generally OK - if I kill -9 one of the resource applications, the failover works as expected. But it does not fail over when I disconnect the power from the node that owns the service (simulating a hardware failure). clustat on the remaining node shows that the powered-down node has status "Offline", so it knows the node is not responding, but the remaining node does not become the owner, nor does it start up the cluster services/resource applications. eth0 on each node is connected via a crossover cable for heartbeat, etc. Each eth1 connects to a switch.
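For reference, this is roughly the shape I understand a two-node cluster.conf needs before rgmanager will recover services from a dead node: the two_node/expected_votes attributes plus a working fence device for each node. The snippet below is only an illustrative sketch; the node names, fence agent and its parameters are placeholders, not my actual config:

  <cluster name="streamcluster" config_version="2">
    <cman two_node="1" expected_votes="1"/>
    <clusternodes>
      <clusternode name="node1" nodeid="1" votes="1">
        <fence>
          <method name="1">
            <device name="pdu" port="1"/>
          </method>
        </fence>
      </clusternode>
      <clusternode name="node2" nodeid="2" votes="1">
        <fence>
          <method name="1">
            <device name="pdu" port="2"/>
          </method>
        </fence>
      </clusternode>
    </clusternodes>
    <fencedevices>
      <fencedevice name="pdu" agent="fence_apc" ipaddr="10.0.0.10" login="apc" passwd="apc"/>
    </fencedevices>
  </cluster>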
I am working on the beginning of implementing a two-node cluster with shared storage (GFS) and an IP address. Both machines are virtual on VMware ESX 3.5; that should not make a difference, but that is the background. The current status is that I have a single-node cluster built with only the IP address configured within the cluster. The issue I am having is that I have configured a service to contain only the IP address resource; however, when I go into cluster management, that "service" does not register. As such, I cannot bring it online, ping it, etc. Below is my cluster.conf configuration:
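(The actual cluster.conf did not make it into the post. For comparison only, a minimal rm section for an IP-only service would look roughly like the sketch below; the address and service name are made up, not taken from the original configuration.)

  <rm>
    <failoverdomains/>
    <resources>
      <ip address="192.168.100.50" monitor_link="1"/>
    </resources>
    <service autostart="1" name="ip-svc" recovery="relocate">
      <ip ref="192.168.100.50"/>
    </service>
  </rm>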
I am familiar with Windows 2008 cluster servers, and I just started testing with CentOS clustering. I am creating a simple 2-node cluster for a simple ping test.
So far, I can ping a virtual IP and manually relocate it between the nodes, but I haven't figured out how to do this automatically. So this is my question: how can I set up the cluster so that it automatically fails over a service to another node in case one node fails?
I have a two node cluster, and a third system which has luci installed.
node1 is nfs0, node2 is nfs1.
Both nodes have exactly the same configuration: a fresh installation of CentOS 5.5 + yum update. I am unable to join nfs1 to the cluster, as it is giving me the following issue:
Sep 29 23:28:00 nfs0 ccsd[6009]: Starting ccsd 2.0.115
Sep 29 23:28:00 nfs0 ccsd[6009]: Built: Aug 11 2010 08:25:53
Sep 29 23:28:00 nfs0 ccsd[6009]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
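For reference, these are the standard commands on a CentOS 5.5 cman/rgmanager install for comparing what each node thinks about membership when one side refuses to join (output will obviously differ per setup):

  cman_tool status                    # quorum state, cluster name and config version on this node
  cman_tool nodes                     # membership as this node sees it
  clustat                             # rgmanager view of members and services
  md5sum /etc/cluster/cluster.conf    # run on both nodes to confirm the config files match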
I don't have much experience with clustering, and I'm deploying a cluster system on CentOS. But I don't know how long a failover should take: when a node fails and another node takes over its resources to keep the service running, what is considered good, fast or slow? 1s, 10s, or more?
I have a lack of understanding of CentOS in general. I have looked for a remedy on other forums and Google, but haven't been able to find the answer. I have a 3-node cluster that was functioning great until I decided to take it offline for a while. My config is as follows: node 2: vh1, node 3: vh2, node 4: vh6. All nodes connect to a common shared area on an iSCSI device (vguests_root).
Currently vh2 and vh6 connect fine; however, since putting the machines back online I can no longer connect with vh1. A dmesg command on vh1 reveals the following:
GFS2: fsid=: Trying to join cluster "lock_dlm", "Cluster1:vguest_roots"
GFS2: fsid=Cluster1:vguest_roots.2: Joined cluster. Now mounting FS...
GFS2: fsid=Cluster1:vguest_roots.2: can't mount journal #2
GFS2: fsid=Cluster1:vguest_roots.2: there are only 2 journals (0 - 1)
.....
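If the filesystem really was created with only two journals (one per node that can mount it), my understanding is that an extra journal can be added from a node that already has it mounted, along these lines (the mount point is a placeholder):

  # adds one more GFS2 journal so a third node can mount the filesystem;
  # must be run on a node where the filesystem is currently mounted
  gfs2_jadd -j 1 /mnt/vguests_root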
I have created a simple menu-driven script for our operations team to take care of basic monitoring and managing of our production application from the back end. The script worked fine when tested in the UAT environment, but when deployed to production it behaves oddly. When the operator chooses an option from the menu, he is given the output and at the end is prompted to return to the main menu with Ctrl+C. In production, this return does not occur for some strange reason and the program just sits there. The session becomes unresponsive after that and I'm forced to terminate it by closing PuTTY. I tried enabling debug mode (set -x) too and still was not able to find any useful hints/trails as to why.
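Since getting back to the menu via Ctrl+C depends entirely on how SIGINT is trapped in that shell, one thing worth comparing between UAT and production is the trap setup. A minimal sketch of the pattern (menu entries and paths are illustrative, not from the real script):

  #!/bin/bash
  # Trap Ctrl+C so it only interrupts the current action and drops back to the
  # menu loop, instead of killing or wedging the whole session.
  trap 'echo; echo "Returning to main menu..."' INT

  while true; do
      echo "1) Show application processes   2) Show last 50 log lines   q) Quit"
      read -r choice
      case "$choice" in
          1) ps -ef | grep '[a]ppserver' ;;      # placeholder monitoring command
          2) tail -n 50 /var/log/app/app.log ;;  # placeholder log path
          q) exit 0 ;;
      esac
  done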
Using Google with the search terms "cman not started: Can't find local node name in cluster.conf /usr/sbin/cman_tool: aisexec daemon didn't start", I found this URL... I have found the config_version in cluster.conf. Unfortunately, as everyone may have noticed, English is not my native tongue, so I am having trouble understanding the part "Make sure you bumped the cluster config version number". Can anyone enlighten me on what I should be doing to "bump" the cluster config version?
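"Bumping" the version just means increasing the config_version attribute at the top of /etc/cluster/cluster.conf and then pushing the new file out to the cluster. On CentOS 5 that is roughly the following (the version numbers are only an example):

  # edit /etc/cluster/cluster.conf and raise the version, e.g.
  #   <cluster name="mycluster" config_version="3">   becomes
  #   <cluster name="mycluster" config_version="4">
  ccs_tool update /etc/cluster/cluster.conf   # push the new config to ccsd on the members
  cman_tool version -r 4                      # tell cman about the new config version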
I am having an issue with LVM on a 2-node cluster. We are using PowerPath for the external drives. We had a request to increase the /apps filesystem, which is ext3.
On the first node we did:
pvcreate /dev/emcpowercx1 /dev/emcpowercw2
vgextend apps_vg /dev/emcpowercw2 /dev/emcpowercx1
lvresize -L +60G /dev/apps_vg/apps_lv
resize2fs /dev/apps_vg/apps_lv
Everything went well and /apps was increased. But on the second node, when I do pvs,
I am getting the following error:
WARNING: Duplicate VG name apps_vg: RnD1W1-peb1-JWay-MyMa-WJfb-41TE-cLwvzL (created here) takes precedence over ttOYXY-dY4h-l91r-bokz-1q5c-kn3k-MCvzUX
How can I proceed from here?
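On PowerPath boxes this warning often means LVM is scanning both the emcpower pseudo-devices and the underlying /dev/sd* paths, so the same metadata shows up twice. A hedged first step on the second node is to rescan and, if the duplicates persist, restrict scanning in /etc/lvm/lvm.conf (the filter below is only an illustration; make sure it still accepts whatever device holds your root VG):

  pvscan    # rescan physical volumes on the second node
  vgscan    # rescan volume groups

  # example /etc/lvm/lvm.conf filter: accept emcpower devices, reject the raw
  # sd paths hidden behind PowerPath (adjust so internal boot disks stay visible)
  filter = [ "a|^/dev/emcpower.*|", "r|^/dev/sd.*|" ]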
I am using CentOS. I have read in places that you can use DRBD + Heartbeat + NFS to make a simple failover NFS server, but I can't find any document that works. I've tried 20 or so, including some Debian ones. So, does anyone have any other ideas of how to do this? Point me in the right direction please. I want 2 nodes: one actively serving an NFS share, the other ready for failover. If the first one goes out, the second takes over. Meaning the filesystem stays in sync, the IP must move, and NFS must come up.
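For the classic DRBD + Heartbeat (v1-style) + NFS setup, the takeover order is usually described in /etc/ha.d/haresources; a minimal sketch, assuming a DRBD resource named r0 mounted on /exports and a floating IP of 192.168.1.100 (all names and addresses are illustrative):

  # /etc/ha.d/haresources -- one line: preferred node, then the resources,
  # started left-to-right on takeover and stopped right-to-left on release
  nfs-a IPaddr::192.168.1.100/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/exports::ext3 nfslock nfs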
I have a two-node Red Hat cluster for a MySQL database. The problem is that after updating the packages on both nodes, the service is no longer able to relocate to the second node; even after rebooting the server the problem occurs. When starting the service on the second node, it starts on the first one instead. Other services are running fine on both nodes. I have checked /etc/hosts, bonding and many more files and they all seem good. Find the log below for reference.
<notice> Starting stopped service service:hell
Oct 22 14:35:51 indls0040 kernel: kjournald starting. Commit interval 5 seconds
Oct 22 14:35:51 indls0040 kernel: EXT3-fs warning: maximal mount count reached, running
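When chasing this kind of relocation problem it can help to drive rgmanager by hand from either node and watch what clustat reports; the standard commands are shown below (the service name is taken from the log above, the node name is a placeholder):

  clustat                                  # current owner and state of service:hell
  clusvcadm -r hell -m other-node-name     # ask rgmanager to relocate the service
  clusvcadm -d hell && clusvcadm -e hell   # or disable and cleanly re-enable it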
Newly trying a cluster configuration setup on an RHEL 5.3 64-bit machine. Basic requirement: I need to configure a GFS file system. Herewith I have shared the details:
System:
> I have 2 HP ProLiant DL385 servers.
> Both systems are connected to the public network (eth0).
> I have connected eth1 directly between the systems, like a private network (maybe I am
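For the GFS piece itself, once cman (and clvmd, if LVM is used) are running, the filesystem on RHEL 5.3 is created with gfs_mkfs; a minimal sketch, assuming a cluster named webcluster and a shared LV (all names here are placeholders):

  # one journal per node that will mount the filesystem (-j 2 for two nodes)
  gfs_mkfs -p lock_dlm -t webcluster:gfsdata -j 2 /dev/vg_shared/lv_gfs
  mount -t gfs /dev/vg_shared/lv_gfs /mnt/gfsdata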
I am trying Rocks Cluster for large-scale computing. All my slave nodes are connected to the Rocks Cluster master node, but I want to run a graphical application on a cluster node, and I am not getting this point.
I need to build a 3-node web server cluster to run a PHP application. Since the app requires users to log in (which means session state has to be maintained), I will be sharing the session save path, and I also need to share the application directory across the 3 nodes. I'm having trouble deciding which cluster file system to select.
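Whichever filesystem is chosen, the PHP side then just points every node's session path (and the application docroot) at the shared mount, e.g. in php.ini (the path is illustrative):

  ; php.ini on all three nodes -- sessions live on the shared cluster filesystem
  session.save_handler = files
  session.save_path    = "/shared/php_sessions"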
I've built a small home-made cluster made up of a master and 1 disk-less slave node. Lately it happens that node 1 fails to start, reporting the following message:
I run a compute cluster with only a few users. Occasionally a user will accidentally run a job on the master node that runs out of RAM, swaps, and then hangs for a while. In /etc/security/limits.conf I have set memlock to 7.5GB (the master has 8GB RAM), and maybe that is what lets the machine come back rather than hanging completely? Is this the right setting to physically limit a single user from asking for more RAM than the system has and bringing down the system? Should I set this to 2GB or so, or is there something else I can do?
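For what it's worth, memlock only caps locked (mlocked) memory; the limit that caps how much memory a process can allocate is the address-space ("as") limit. A hedged example of how /etc/security/limits.conf could look (values are in KB and chosen only as an illustration; note the limit is per process, not per user total):

  # memlock = max locked-in-memory size; as = max address space per process
  *    hard    memlock    7864320    # ~7.5 GB locked memory (the current setting)
  *    hard    as         6291456    # ~6 GB address space per process (illustrative)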
We have a set of two production machines running Oracle databases. There are a couple of SAN-attached filesystems, of which one on the first machine (node1) is created as ext3 and NFS-exported to the second machine (node2). However, during certain conditions related to RAC, the interconnect between the two nodes loses connection, and due to the loss of communication the servers reboot. The problem, however, is that node2 usually reboots first, and by the time it starts up, node1 is not up and running yet, causing the NFS mount not to be available on node2. I have put my head around some options for getting the servers to resolve this automatically, but I am posting my question here as someone might have a reliable way of managing this. My one idea is to create a script on node2 to mount the NFS filesystem, create an SSH key-authenticated user between the nodes, and then put another script in place on node1, as part of its startup, to SSH to node2 and mount the filesystem.
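A hedged sketch of that last idea (hostnames, the mount point and the key setup are assumptions, not from the real environment): node1 runs this late in its startup, waits for node2 to answer, then mounts the export there over passwordless SSH:

  #!/bin/bash
  # Run on node1 after its ext3 filesystem is mounted and NFS-exported.
  REMOTE=node2
  EXPORT_HOST=node1
  MOUNTPOINT=/u02/shared            # illustrative path, same on both nodes

  # wait until node2 is back on the network
  until ping -c1 -W2 "$REMOTE" >/dev/null 2>&1; do
      sleep 30
  done

  # mount the export on node2 unless it is already mounted there
  ssh "$REMOTE" "mountpoint -q $MOUNTPOINT || mount ${EXPORT_HOST}:${MOUNTPOINT} $MOUNTPOINT"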
After a few days of hard work on Red Hat Cluster and Piranha, I have done 90% of it, hopefully, but I am stuck with the iptables rules. I am attaching a full screenshot of my Piranha server's network. Please have a look and tell me what else to do on the Piranha server:
a) From the firewall (Linksys), to which IP shall I forward port 80? (192.168.1.66 or 192.168.1.50?)
b) Currently it looks like the HTTP requests are not being forwarded from the virtual server to the real server. What iptables rules shall I write? (Please have a look at the iptables rules.)
Also, this link is for my Piranha server setup. I guess I am stuck somewhere where I need an expert's eye to catch it, so please look at all the pictures, the ifconfig and the iptables rules. ifconfig:
ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0F:3D:CB:0A:8C
          inet addr:192.168.1.66  Bcast:192.168.1.255  Mask:255.255.255.0
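Without the full topology it is hard to be definitive, but for an LVS-NAT style Piranha director the usual prerequisites are IP forwarding and a masquerade rule for the real-server network, roughly as below (the subnet is illustrative, not taken from the screenshots):

  # on the LVS / Piranha director
  echo 1 > /proc/sys/net/ipv4/ip_forward
  iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o eth0 -j MASQUERADE   # real-server subnet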
I'm building a 3-node cluster. I have created OCFS2 filesystems and mounted them on the first two nodes, but while mounting them on the third node I'm getting this error for 8 of the LUNs. All 8 of these LUNs are 1GB in size.
I unmounted these 8 LUNs from the other node and tried to mount them on the third node, and then it worked; then the error occurred on the second node instead. My observation is that, for some reason, these particular LUNs are not allowing a third node.
mount.ocfs2: Invalid argument while mounting /dev/mapper/voting1 on /voting1. Check 'dmesg' for more information on this error.
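One thing worth ruling out with OCFS2 is the number of node slots on those 8 small volumes: if they were formatted with only two slots, only two nodes can have them mounted at once, which would match the behaviour described. A hedged way to raise it (the device path is the one from the error; check the ocfs2-tools documentation for whether this requires the volume to be unmounted on your version):

  # raise the number of node slots to 3 so a third node can mount the volume
  tunefs.ocfs2 -N 3 /dev/mapper/voting1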
I want to configure a two-node cluster for qmail-toaster. My idea is: if one server's hardware fails, it should transfer/migrate the service to the other qmail-toaster server with all settings, such as domains/users/passwords, etc.
I was given the task of installing Red Hat Linux on one of the compute nodes of a server which doesn't have a CD/DVD drive or USB port. I have the installation media as well as the ISO image. This server is on the network, so I can access it via my PC, which runs Windows 7. I think I have 2 choices for the install:
1. Copy the ISO image to the head node of the server and then install Linux on the compute node via NFS, or
2. Use my PC's DVD drive to install Linux on the compute node via the network.
But I don't know how to do either.
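Option 1 is usually the simpler of the two. A hedged sketch of what it involves on the head node (paths and the ISO name are placeholders): make the install tree available over NFS, then point the compute node's installer at it with "linux askmethod" (or a kickstart):

  # on the head node
  mkdir -p /install/centos5
  mount -o loop /path/to/CentOS-5-x86_64-bin-DVD.iso /install/centos5   # or copy the DVD contents here
  echo "/install/centos5 *(ro,no_root_squash)" >> /etc/exports
  exportfs -ra
  service nfs start

  # on the compute node, boot the installer with "linux askmethod", choose NFS,
  # and give the head node's IP with /install/centos5 as the directory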
I need to set up a Linux cluster, so I prefer Ubuntu because of the support and because I personally use Ubuntu. Can anyone explain in brief what is needed to set up an Ubuntu-based cluster? The configuration for each node will be (6 nodes in total) a Core 2 Duo with 4 GB RAM; I need 4 compute nodes and 2 for load balancing.
I am using the Red Hat Cluster Suite (luci and ricci) on CentOS 5.4. I have 2 nodes in a cluster and I have clustered an Apache server. The service is up and running and I can stop, start and switch it on both nodes. The problem is when I try to simulate a fault on one node. For example: the Apache resource is on the first cluster node. If I power off the first cluster node (not halt or init 0, but cutting the electric power), the second cluster node does not take the resource. With the clustat command, the service still shows as running on the first node, but the service is down and the first node is dead. Only when the first node joins the cluster again does the resource come up on the second node.
I am (still) trying to create a 2-node cluster on CentOS 5.2 with a Dell MD3000 as storage. However, I am getting this when I try to probe for storage in luci: An error has occurred while probing storage:
What I did not realize was that DLM uses the external Ethernet interface even when talking to the local machine/node, so iptables was blocking my DLM daemon. With iptables down, or with the TCP port for DLM opened, cman starts and mount works. What I have here is a Fibre Channel SAN which will be directly attached to several servers in the near future. Those servers should be given access to a single (shared) filesystem on the SAN. I heard that the right filesystem choice for this kind of setup would be GFS, because it has a distributed lock manager and one FS journal for each node.
But I am having trouble setting up GFS. I have managed to create a GFS filesystem on a small test volume (a local HDD so far), but am unable to mount it. It seems that GFS/DLM needs a lot of cluster services to be running, which I do not fully understand or know how to set up correctly. Also: will the lock_dlm stuff need Ethernet communication to handle file locks? And if so, will it fetch the node list from /etc/cluster/cluster.conf to determine whom to talk to?
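On the firewall side, the ports that usually matter between CentOS 5 cluster nodes are the openais/cman totem ports (UDP 5404/5405) and the DLM port (TCP 21064); a hedged example of opening them (double-check the port list against your package versions, and restrict the source to the cluster interconnect):

  # allow cluster traffic from the other nodes (source restriction omitted here)
  iptables -A INPUT -p udp --dport 5404:5405 -j ACCEPT   # openais/cman totem
  iptables -A INPUT -p tcp --dport 21064 -j ACCEPT       # DLM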
I created a cluster with two nodes and a management machine with luci. If a machine reboots, the cluster works, transferring the resource (IP address); but if the machine is stopped forcibly (pull the plug), the cluster does not fail over.
I have just installed a two-server cluster with ricci, luci and Conga on CentOS 5.6 32-bit. Both servers are VMware guests and have a shared storage disk connected to them both,
with a GFS2 file system on it, plus fencing agents configured to work with VMware vCenter
(this is supported by VMware and works great on 4 other CentOS clusters I have been running for 4 months with no CLVMD).
In this setup I used CLVMD for the first time, as recommended by Red Hat, so I could have the flexibility of LVM under the GFS2 file system. But I have been getting a strange problem with it: sometimes, after a developer has done some IO-heavy task like unzipping a file or a simple tar, the load goes to 10-15 and no task can be killed; trying to reboot the server hangs.
After hard-shutting the server, everything works OK until the next time someone does the same IO work as before.
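Not a fix, but when a CLVMD node starts behaving like this it is worth confirming on both nodes that clustered locking is really in effect and that the cluster groups are healthy; on CentOS 5 the usual checks are:

  grep locking_type /etc/lvm/lvm.conf   # should be locking_type = 3 when clvmd is used
  service clvmd status
  vgs -o vg_name,vg_attr                # a "c" in the 6th attribute column = clustered VG
  group_tool ls                         # state of the fence/dlm/clvmd groups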