I run a compute cluster with only a few users. Occasionally a user will accidentally run a job on the master node that runs out of RAM, starts swapping, and then hangs the machine for a while. In /etc/security/limits.conf I have set memlock to 7.5GB (the master has 8GB of RAM), and maybe that is what lets the machine come back rather than hanging completely? Is this the right setting to physically limit a single user from asking for more RAM than the system has and bringing down the system? Should I set it to 2GB or so, or is there something else I can do?
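One distinction that matters for this question: memlock only caps memory locked with mlock(), while the "as" item in limits.conf (and ulimit -v in the shell) caps a process's address space, in KB. Below is a minimal sketch of the per-process cap, assuming a POSIX-like shell that supports ulimit -v. The helper name run_with_mem_limit, the group name "users", and the 2GB figure are illustrative assumptions, not part of the setup described above.

```shell
#!/bin/sh
# run_with_mem_limit LIMIT_KB COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND in a subshell whose address space is
# capped at LIMIT_KB kilobytes, so the cap does not leak into the caller.
run_with_mem_limit() {
    limit_kb="$1"
    shift
    (
        # ulimit -v sets the virtual memory (address space) limit in KB.
        ulimit -v "$limit_kb" || exit 1
        exec "$@"
    )
}

# Roughly equivalent persistent setting in /etc/security/limits.conf
# (assumed group name "users"; the value is in KB, here 2GB):
#   @users  hard  as  2097152

# Example: cap a job at 2GB of address space.
# run_with_mem_limit 2097152 ./my_job
```

Note that this limit applies per process, not per user in total, so several large jobs from one user could still exhaust the machine.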
I don't have much experience with clustering, and I'm deploying a cluster system on CentOS. I don't know what counts as good, fast, or slow for the time it takes one node to fail and another node to take over its resources and continue running the service. 1s, 10s or
I have created a simple menu-driven script for our Operations team to handle basic monitoring and management of our production application from the back end. The script was fine when tested in the UAT environment, but when deployed to production it behaved oddly. When the operator chooses an option from the menu, he is given the output and at the end is prompted to return to the main menu by pressing Ctrl+C. In production, this return does not occur for some strange reason and the program just sits there. The session becomes unresponsive after that, and I'm forced to terminate it by closing PuTTY. I tried enabling debug mode (set -x) too, and still could not find any useful hints as to why.
So far, I can ping a virtual IP and manually relocate it between the nodes, but I haven't figured out how to do this automatically. So this is my question: how can I set up the cluster so that it automatically fails a service over to another node in case one node fails?
I am having an issue with LVM on a 2-node cluster. We are using PowerPath for external drives. We had a request to increase the /apps filesystem, which is ext3.
On the first node we ran pvcreate on /dev/emcpowercx1 and /dev/emcpowercw2, and then:
vgextend apps_vg /dev/emcpowercw2 /dev/emcpowercx1
lvresize -L +60G /dev/apps_vg/apps_lv
resize2fs /dev/apps_vg/apps_lv
Everything went well and /apps was increased. But on the second node, when I run pvs, I get the following error:
WARNING: Duplicate VG name apps_vg: RnD1W1-peb1-JWay-MyMa-WJfb-41TE-cLwvzL (created here) takes precedence over ttOYXY-dY4h-l91r-bokz-1q5c-kn3k-MCvzUX
How can I proceed from here?
I am using CentOS. I have read in places that you can use DRBD + Heartbeat + NFS to make a simple failover NFS server. I can't find any document that works, though. I've tried 20 or so, including some Debian ones. So, does anyone have any other ideas of how to do this? Point me in the right direction, please. I want 2 nodes: one actively serving an NFS share, the other ready for failover. If the first one goes out, the second takes over, meaning the filesystem is in sync, the IP must change, and NFS must come up.
I have a two-node Red Hat cluster for a MySQL database. The problem is that, after updating the packages on both nodes, the service cannot be relocated to the second one, and the problem persists even after rebooting the server. When I start the service on the second node, it starts on the first one instead. Other services are running fine on both nodes. I have checked /etc/hosts, bonding, and many more files, and they seem to be fine. Please find the log below for reference.
<notice> Starting stopped service service:hell
Oct 22 14:35:51 indls0040 kernel: kjournald starting. Commit interval 5 seconds
Oct 22 14:35:51 indls0040 kernel: EXT3-fs warning: maximal mount count reached, running
I'm having some trouble configuring clustering in a 2-node cluster with no shared FS. The application is video streaming, so outbound traffic only. The cluster is generally OK: if I kill -9 one of the resource applications, the failover works as expected. But it does not fail over when I disconnect the power from the node that owns the service (simulating a hardware failure). clustat on the remaining node shows that the powered-down node has status "Offline", so it knows the node is not responding, but the remaining node does not become the owner, nor does it start up the cluster services/resource applications. eth0 on each node is connected via a crossover cable for heartbeat, etc. Each eth1 connects to a switch.
I am trying to use a Rocks cluster for large-scale computing. All my slave nodes are connected to the Rocks cluster master node, but I want to run a graphical application on a cluster node, and I can't figure out how to do that.
I need to build a 3-node web server cluster to run a PHP application. Since the app requires users to log in (which means session state has to be maintained), I will be sharing the session save path. I also need to share the application directory across the 3 nodes. I am having trouble deciding which cluster file system to select.
We have a set of two production machines running Oracle databases. There are a couple of SAN-attached filesystems, one of which is created as ext3 on one machine (node1) and NFS-exported to the second machine (node2). However, under certain conditions related to RAC, the interconnect between the two nodes loses connection, and due to the loss of communication the servers reboot. The problem is that node2 usually reboots first, and by the time it starts up, node1 is not up and running yet, so the NFS mount is not available on node2. I have thought through some options for getting the servers to resolve this automatically, but I am posting my question here as someone might have a reliable way of managing this. My one idea is to create a script on node2 to mount the NFS filesystem, set up a user with SSH key authentication between the nodes, and then put another script in place on node1 as part of its startup to SSH to node2 and mount the filesystem.
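A variation on the idea above that avoids SSH is to have node2 keep retrying the mount until node1 is exporting again. Here is a minimal sketch, assuming a POSIX shell. The helper name retry, the export node1:/export, and the mount point /mnt/export are placeholders, not taken from the setup described above.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds, at most ATTEMPTS times,
# sleeping DELAY_SECONDS between tries. Returns 0 on success, 1 otherwise.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    try=1
    while [ "$try" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Sleep only if another attempt remains.
        if [ "$try" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        try=$((try + 1))
    done
    return 1
}

# Example (placeholder export and mount point): up to 60 attempts,
# 10 seconds apart.
# retry 60 10 mount -t nfs node1:/export /mnt/export
```

Such a script could be called late in node2's startup. Whether that is more reliable than the SSH approach depends on how long node1 usually takes to come back.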
After a few days of hard work on Red Hat cluster and Piranha, I am hopefully 90% done, but I am stuck on the iptables rules. I am attaching a full Piranha server screenshot of my network. Please have a look and tell me what else to do on the Piranha server:
a) From the firewall (Linksys), which IP should I forward port 80 to (192.168.1.66 or 192.168.1.50)?
b) Currently it looks like HTTP requests are not being forwarded from the virtual server to the real server. What iptables rules should I write? (Please have a look at the iptables rules.)
Also, this link is for my Piranha server setup. I guess I am stuck somewhere where I need an expert's eye to catch the problem, so please look at all the pictures, the ifconfig output, and the iptables rules.
ifconfig:
ifconfig eth0 Link encap:Ethernet HWaddr 00:0F:3D:CB:0A:8C inet addr:192.168.1.66 Bcast:192.168.1.255 Mask:255.255.255.0
I lack a general understanding of CentOS. I have looked for a remedy on other forums and Google, but haven't been able to find the answer. I have a 3-node cluster that was functioning great until I decided to take it offline for a while. My config is as follows:
node 2: vh1
node 3: vh2
node 4: vh6
All nodes connect to a common shared area on an iSCSI device (vguests_root).
Currently vh2 and vh6 connect fine; however, since putting the machines back online I can no longer connect with vh1. Running dmesg on vh1 reveals the following:
GFS2: fsid=: Trying to join cluster "lock_dlm", "Cluster1:vguest_roots"
GFS2: fsid=Cluster1:vguest_roots.2: Joined cluster. Now mounting FS...
GFS2: fsid=Cluster1:vguest_roots.2: can't mount journal #2
GFS2: fsid=Cluster1:vguest_roots.2: there are only 2 journals (0 - 1)
.....
I'm building a 3-node cluster. I have created OCFS filesystems and mounted them on the first two nodes. But while mounting them on the third node, I'm getting this error for 8 of the LUNs. All 8 of these LUNs are 1GB in size.
I unmounted these 8 LUNs from one of the other nodes and tried to mount them on the third node, and then it worked, but the error then occurred on the second node. My observation is that, for some reason, these particular LUNs do not allow a third node.
mount.ocfs2: Invalid argument while mounting /dev/mapper/voting1 on /voting1. Check 'dmesg' for more information on this error.
I want to configure a two-node cluster for qmail-toaster. My idea is that if one server's hardware fails, the service should transfer/migrate to the other qmail-toaster server with all its settings, such as domains, users, passwords, etc.
I was given the task of installing Red Hat Linux on one of the compute nodes of a server that doesn't have a CD/DVD drive or USB port. I have the installation media as well as an ISO image. The server is on the network, so I can access it from my PC, which is running Windows 7. I think I have 2 choices for the install:
1. Copy the ISO image to the head node of the server and then install Linux on the compute node via NFS.
2. Use my PC's DVD drive to install Linux on the compute node via the network.
But I don't know how to do either.
I searched Google for: cman not started: Can't find local node name in cluster.conf /usr/sbin/cman_tool: aisexec daemon didn't start. I found this URL... I have found the config_version in cluster.conf. Unfortunately, as everyone may have noticed, English is not my native tongue, so I am having trouble understanding the part "Make sure you bumped the cluster config version number". Can anyone enlighten me on what I should be doing to "bump" the cluster config version?
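"Bumping" here just means increasing the number in the config_version attribute of cluster.conf by one each time the file is edited. Below is a minimal sketch of that edit, assuming a POSIX shell with sed and mktemp. The helper name bump_config_version is hypothetical, and it assumes the attribute is written as config_version="N" with double quotes.

```shell
#!/bin/sh
# bump_config_version FILE
# Hypothetical helper: increments the config_version="N" attribute in FILE
# by one and prints the new version number.
bump_config_version() {
    conf="$1"
    # Extract the current number from the first line that carries the attribute.
    current=$(sed -n 's/.*config_version="\([0-9][0-9]*\)".*/\1/p' "$conf" | head -n 1)
    if [ -z "$current" ]; then
        echo "config_version not found in $conf" >&2
        return 1
    fi
    next=$((current + 1))
    tmp=$(mktemp) || return 1
    # Write the edited copy to a temp file first, then copy it back.
    sed "s/config_version=\"$current\"/config_version=\"$next\"/" "$conf" > "$tmp" || {
        rm -f "$tmp"
        return 1
    }
    cat "$tmp" > "$conf"
    rm -f "$tmp"
    echo "$next"
}

# Example (back up the file first):
# bump_config_version /etc/cluster/cluster.conf
```

On a running cluster the updated file still has to be propagated to the other nodes. How that is done depends on the release, so check the cluster documentation for your version.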
I need to set up a Linux cluster, and I prefer Ubuntu because of its support and because I use it personally. Can anyone explain in brief what is needed to set up an Ubuntu-based cluster? My configuration for each node (6 nodes in total) will be a Core 2 Duo with 4 GB of RAM. I need 4 nodes, plus 2 for load balancing.
I am using the Red Hat Cluster Suite (luci and ricci) on CentOS 5.4, with 2 nodes in a cluster. I have clustered an Apache server. The service is up and running, and I can stop, start, and switch it on both nodes. The problem appears when I try to simulate a fault on one node. For example: the Apache resource is on the first cluster node. If I power off the first node (not halt or init 0, but actually cutting the electrical power), the second node does not take over the resource. According to clustat, the service is still running on the first node, but the service is down and the first node is dead. Only once the first node joins the cluster again does the resource come up on the second node.
I am starting to implement a two-node cluster with shared storage (GFS) and a shared IP address. Both machines are virtual, on VMware ESX 3.5; that should not make a difference, but that is the background. The current status is that I have a single-node cluster built with only the IP address configured within the cluster. The issue is that I have configured a service containing only the IP address resource, but when I go into cluster management that "service" does not register. As such, I cannot bring it online, ping it, etc. Below is my cluster.conf configuration:
I am still trying to create a 2-node cluster on CentOS 5.2 with a Dell MD3000 as storage. However, I get this when I try to probe for storage in luci: An error has occured while probing storage:
I am wondering whether anybody can shed some light on how to manually disable services such as FTP, SSH, etc., the way Bastille does. If all the services can be disabled manually, that means Bastille is just a tool to help newbies like me.
I have a VPS (Ubuntu 8.04 server edition) and as such am stuck with using a software firewall.
I currently have UFW installed.
I would ideally like my firewall to be a little rude, or rather just not polite. I know what I am asking will break the RFC, but I consider this OK due to the security benefits.
I would like my firewall to:
1) ignore (i.e., drop without responding to) all packets that don't start with a SYN flag
2) drop all other traffic that is currently blocked (again, without responding)
If there are any other rules you can think of, I would like to know them. I already have only the services I want open and the rest blocked.
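The two requirements above could be expressed in plain iptables terms roughly as follows. This is a minimal sketch, assuming iptables with the state match is available and that rules are loaded with iptables-restore rather than through UFW. The helper name print_drop_rules and the example ports are hypothetical.

```shell
#!/bin/sh
# print_drop_rules PORT...
# Hypothetical helper: prints an iptables-restore ruleset that silently drops
# everything except loopback, established traffic, and new TCP connections
# (SYN only) to the listed ports.
print_drop_rules() {
    echo '*filter'
    # Default policy DROP: blocked traffic gets no response at all.
    echo ':INPUT DROP [0:0]'
    echo ':FORWARD DROP [0:0]'
    echo ':OUTPUT ACCEPT [0:0]'
    echo '-A INPUT -i lo -j ACCEPT'
    echo '-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT'
    # A new TCP connection that does not start with a SYN is dropped.
    echo '-A INPUT -p tcp ! --syn -m state --state NEW -j DROP'
    for port in "$@"; do
        echo "-A INPUT -p tcp --dport $port -m state --state NEW -j ACCEPT"
    done
    echo 'COMMIT'
}

# Example: review the output first, then load it as root, e.g.
# print_drop_rules 22 80 | iptables-restore
```

Loading a ruleset this way replaces the current filter table, including whatever UFW installed, so on a remote VPS it is worth keeping console access in case the SSH port is left out by mistake.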
I use my desktop (VM) for online transactions only (it has no other purpose) and have removed most software. In addition, I want to remove Archive Manager, Calculator, Multimedia systems selector, Printing, Sound Recorder, Terminal (gnome-terminal), Terminal Server Client, and Ubuntu One. When I attempt to remove the listed software, I receive a warning message: "no future desktop updates will include if you remove this". I want to know if this will impact the updates for Firefox, as this is the only app I need. Can you please advise on any other consequences if I proceed with the above?