Networking :: How To Remove /dev/drbd0 From A Node
Jan 21, 2010
I've configured DRBD with Heartbeat. The nodes are connected at first, but when I issue "/usr/lib/heartbeat/hb_standby" on node 1, node 2 won't take over, and after a while both nodes end up in "WFConnection".
I figure this happens because I created "/dev/drbd0" on both nodes! Do you know how to remove it? I googled it but got nothing.
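For what it's worth, having /dev/drbd0 on both nodes is normal for DRBD; two nodes stuck in WFConnection usually means they can no longer agree to reconnect, often after a split-brain. A minimal recovery sketch, assuming the resource is named r0 (adjust to your drbd.conf):
Code:
# check the connection/role state on both nodes
cat /proc/drbd
# on the node whose changes you are willing to discard:
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0
# on the surviving node, only if it shows StandAlone:
drbdadm connect r0
Once both sides show Connected again, hb_standby should be able to move the resource without both nodes dropping back to WFConnection.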
I am familiar with Windows 2008 cluster servers, and I just started testing with a CentOS cluster. I am creating a simple 2-node cluster for a simple ping test.
So far I can ping a virtual IP and manually relocate it between the nodes, but I haven't figured out how to do this automatically. So this is my question: how can I set up the cluster so that it automatically fails a service over to the other node in case one node fails?
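If you are using the CentOS cluster suite (rgmanager), automatic failover comes from defining the virtual IP as a cluster service inside a failover domain. A hedged cluster.conf sketch; the node names, domain name and address are placeholders for yours:
Code:
<rm>
  <failoverdomains>
    <failoverdomain name="prefer_node1" ordered="1" restricted="0">
      <failoverdomainnode name="node1" priority="1"/>
      <failoverdomainnode name="node2" priority="2"/>
    </failoverdomain>
  </failoverdomains>
  <service autostart="1" domain="prefer_node1" name="pingtest" recovery="relocate">
    <ip address="192.168.1.100" monitor_link="1"/>
  </service>
</rm>
With rgmanager running on both nodes and working fencing in place, the service (and its IP) should relocate on its own when the owning node dies; clusvcadm -r is then only needed for manual moves.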
I've set up DRBD on 2 machines; one of them is the master and the other is the slave.
After each bootup, I need to run the following on the master machine:
Code: drbdadm -- --overwrite-data-of-peer primary all
Do we need to specify which machine should be the primary node every time? Is there any method to make the machine "know" by itself that it is the primary node?
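Normally --overwrite-data-of-peer is only needed once, for the very first full sync; if you need it on every boot, the nodes are probably not reconnecting cleanly (check cat /proc/drbd first). To avoid choosing the primary by hand, you can either let Heartbeat promote it through the drbddisk resource, or, with DRBD 8.3+, declare it in the config. A sketch, assuming the resource is r0 and the preferred host is node1:
Code:
# drbd.conf (fragment), inside "resource r0 { ... }"
startup {
    become-primary-on node1;   # or "both", but only for dual-primary setups
}
# after the initial sync, a plain "drbdadm primary r0" (without
# --overwrite-data-of-peer) should be all that is ever needed on node1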
I have created a simple menu-driven script for our Operations team to take care of the basic monitoring and managing of our production application from the back end. The script was fine when tested in the UAT environment, but when deployed to production it behaved oddly. When the operator chooses an option from the menu, he is given the output and at the end is prompted to return to the main menu by pressing Ctrl+C. In production, this return does not occur for some strange reason and the program just sits there. The session becomes unresponsive after that and I'm forced to terminate it by closing PuTTY. I tried enabling debug mode (set -x) too and still was not able to find any useful hints/trails as to why.
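One thing worth ruling out is the interrupt handling itself: if the production TTY uses a different intr character (check with stty -a) or the script relies on the shell's default SIGINT behaviour, Ctrl+C can kill or wedge the wrong thing. A minimal sketch of trapping Ctrl+C explicitly so it only aborts the current action and drops back to the menu; some_long_check is a hypothetical stand-in for your option's command:
Code:
#!/bin/bash
while true; do
    echo "1) run check   2) quit"
    read -r choice
    case "$choice" in
        1)
            trap 'echo; echo "Returning to menu..."' INT   # catch Ctrl+C here
            some_long_check                                 # hypothetical command
            trap - INT                                      # restore default handling
            ;;
        2) exit 0 ;;
    esac
done
Running the production session as bash -x script 2>/tmp/trace.log also keeps the set -x trace around even if the terminal locks up.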
I don't have much experience in clustering, and I'm deploying a cluster system on CentOS. But I don't know what a good failover time is: how long should it take for one node to fail and another node to take over its resources and continue running the service? Is 1 second, 10 seconds, or longer considered fast or slow?
I have an Ubuntu server and a number of nodes. Can you give me any suggestions on how to handle the node users' logins so that they need my permission? I mean that they can switch on and boot the system, but the system should stay in an unusable state; only after getting my permission should they be able to operate the system.
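If the goal is simply that the accounts stay unusable until you approve them, one low-tech approach is to keep the accounts (or all non-root logins) locked and unlock them on request. A sketch; "alice" is a placeholder username:
Code:
# keep an individual account locked until you approve it
usermod -L alice        # or: passwd -l alice
usermod -U alice        # unlock once permission is given
# or block every non-root login in one go (pam_nologin honours this file)
echo "System not yet released for use" > /etc/nologin
rm /etc/nologin         # remove when you give permission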
My requirement is to route SSH sessions from a single head node to multiple slave nodes. So what I want is, for a client, just one point of entry (the master/head node) to SSH into; it evaluates the load on the slave nodes connected to the internal network and routes the SSH session accordingly, a kind of SSH load balancer. Do you have any idea what open source solution I can apply to my problem?
I have tried using LVS/Piranha; it works well for HTTP and HTTPS load balancing, but not for SSH load balancing.
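Since SSH is plain TCP, a generic TCP balancer on the head node may be enough. A sketch of an HAProxy fragment, assuming the slaves are 192.168.0.11/12; note it balances by connection count rather than actual CPU load, the head node's own sshd would need to move off port 22, and clients will see whichever host key the chosen slave presents unless you synchronise host keys:
Code:
# /etc/haproxy/haproxy.cfg (fragment)
listen ssh_balance
    bind *:22
    mode tcp
    balance leastconn
    option tcplog
    server slave1 192.168.0.11:22 check
    server slave2 192.168.0.12:22 check
If you really need load-aware placement, another option is to keep sshd on the head node and use a ForceCommand wrapper script that picks the least-loaded slave and execs ssh to it.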
I have a RHEL cluster with two nodes, and the cluster is working fine. But suddenly, as of this morning, I am not able to SSH to one of the nodes; it fails with the following error:
debug2: we sent a hostbased packet, wait for reply
Connection closed by 10.125.104.162
After that, SSH within the node was also not working, and after some time the second node started behaving like this too. Now SSH within a node and between the nodes is not happening, but I am still able to open a PuTTY session. The error from /var/log/messages is:
vsftpd(pam_unix)[14260]: authentication failure; logname= uid=0 euid=0 tty= ruser= rhost=
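The vsftpd/pam_unix line is about FTP and is probably a red herring. Since you can still get a session another way, a verbose client run plus a debug instance of sshd on a spare port usually shows why the connection is being dropped:
Code:
# from a working machine, capture the whole negotiation
ssh -vvv root@10.125.104.162
# on the failing node, start a second sshd in debug mode on a spare port,
# then connect to that port from the other node
/usr/sbin/sshd -d -p 2222
# also worth checking: /etc/hosts.allow, /etc/hosts.deny, and that the
# privilege-separation directory /var/empty/sshd still has the right owner/mode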
After a few days of hard work on Red Hat Cluster and Piranha, I have done 90% of it, hopefully, but I am stuck with the iptables rules. I am attaching a full screenshot of my Piranha server and my network; please have a look and tell me what else to do on the Piranha server:
a) From the firewall (Linksys), which IP should I forward port 80 to (192.168.1.66 or 192.168.1.50)?
b) Currently it looks like HTTP requests are not being forwarded from the virtual server to the real servers; what iptables rules should I write? (Please have a look at the iptables rules.)
There is also a link to my Piranha server setup. I guess I am stuck somewhere where I need an expert's eye to catch it, so please look at all the pictures, the ifconfig output, and the iptables rules.
ifconfig:
eth0      Link encap:Ethernet  HWaddr 00:0F:3D:CB:0A:8C
          inet addr:192.168.1.66  Bcast:192.168.1.255  Mask:255.255.255.0
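A couple of hedged pointers, since the exact answer depends on whether Piranha is set up for LVS-NAT or direct routing: the router should forward port 80 to the virtual server IP (the VIP that ipvsadm shows), not to the director's own eth0 address; and with LVS-NAT the director must forward and masquerade traffic for the real-server subnet. The 10.0.0.0/24 subnet below is only an example to adapt:
Code:
# on the Piranha/LVS director
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o eth0 -j MASQUERADE
# confirm the virtual service and its real servers are actually loaded
ipvsadm -L -n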
I am having an issue with LVM on a 2-node cluster. We are using PowerPath for the external drives. We had a request to increase the /apps filesystem, which is EXT3.
On the first node we did:
pvcreate /dev/emcpowercx1 /dev/emcpowercw2
vgextend apps_vg /dev/emcpowercw2 /dev/emcpowercx1
lvresize -L +60G /dev/apps_vg/apps_lv
resize2fs /dev/apps_vg/apps_lv
Everything went well and /apps was increased. But on the second node, when I do pvs, I am getting the following error:
WARNING: Duplicate VG name apps_vg: RnD1W1-peb1-JWay-MyMa-WJfb-41TE-cLwvzL (created here) takes precedence over ttOYXY-dY4h-l91r-bokz-1q5c-kn3k-MCvzUX
How can I proceed from here?
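Two common causes are worth separating here: the same LUNs being seen twice (once via /dev/sd* and once via /dev/emcpower*), or a genuinely different VG that happens to share the name. A sketch of how you might proceed; the filter line and the new VG name are examples to adapt:
Code:
# see which devices each copy of apps_vg is coming from
pvs -o pv_name,vg_name,vg_uuid
# if the same LUN shows up under both its sd and emcpower names, restrict LVM
# to the PowerPath pseudo-devices in /etc/lvm/lvm.conf, e.g.
#   filter = [ "a|^/dev/emcpower.*|", "a|^/dev/sda|", "r|.*|" ]
# (keep your boot disk accepted), then re-scan:
vgscan
# if there really are two distinct VGs named apps_vg, rename the unwanted one
# by the UUID from the warning:
vgrename ttOYXY-dY4h-l91r-bokz-1q5c-kn3k-MCvzUX apps_vg_old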
I am running a 5-node 11g RAC cluster with Cluster Ready Services. The hardware is HP BL460 G6 with Virtual Connect modules on each chassis. The OS is RHEL 5.5, with 2 QLogic 4G fibre cards and 2 bonded Broadcom NICs, and the back-end disk is all Hitachi 15k. The issue I'm seeing is that periodically each day there are spikes in I/O waits for disk writes, and sometimes that causes a node or two to reboot when CRS can't communicate with the other nodes. Both nodes that have been evicted are on the same chassis. What I've checked: the back-end storage is not seeing ANY high utilization; in fact it's running at less than 10% all the time. The network is 10G and is not showing errors on the switch or on the blades themselves. The redo logs have a 10M buffer cache and are running on ASM disk. What other information would be beneficial to check? I am seeing nothing in the logs as to any errors or waits for disk writes. I believe it's a software issue, but am lost as to how to prove it. These aren't the only nodes experiencing periodic high I/O waits, and it happens at exactly the same time on all systems, whether they run ASM or are part of a cluster.
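To pin down where the write latency is being added, it may help to capture block-layer statistics on the nodes around the time of the spikes, alongside whatever the array and switches report. /dev/sdX below is a placeholder for one of the ASM disk devices:
Code:
# extended per-device service times and queue sizes, sampled every 5 s
iostat -xk 5 > /tmp/iostat.$(hostname).log &
# CPU / iowait over the same window
sar -u 5 720 > /tmp/sar_cpu.$(hostname).log &
# if array and network look clean, blktrace shows where writes wait inside
# the block layer / HBA queue (run briefly, it is verbose)
blktrace -d /dev/sdX -o - | blkparse -i -
Comparing those timestamps with the CRS/ocssd logs on the evicted nodes should at least tell you whether the evictions follow the I/O stalls or cause them.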
I want to change my server's node name, which is the output of "#uname -n". The server is CentOS 5. I searched but couldn't find how. There were some search results about /etc/nodename, but I don't have a file at that path. Some also said "uname -S", which doesn't work.
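On CentOS 5 the value returned by uname -n is the kernel hostname, which is set at boot from /etc/sysconfig/network (there is no /etc/nodename on Linux). A sketch, with newname.example.com and the address as placeholders:
Code:
# change it for the running system
hostname newname.example.com
# make it permanent across reboots
sed -i 's/^HOSTNAME=.*/HOSTNAME=newname.example.com/' /etc/sysconfig/network
# keep /etc/hosts in step so local tools still resolve the name, e.g.
#   192.168.1.10   newname.example.com   newname
# verify
uname -n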
On an HA cluster, the 2nd node decided that the 1st node was down, but the first node wasn't actually down. As a result, the 2nd node tried to take over the resources but failed, because the resources were still in use by the first node. This left the first node behind in a fuzzy state. I had no choice but to kill the heartbeat service and reboot the server to solve the issue. There were no network issues and all hardware is OK. Are there any known bugs? Is there a way to avoid this happening again?
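This is the classic split-brain scenario: the standby lost sight of the active node, even briefly, and tried to take over resources that were still held. The usual defences are redundant heartbeat paths and STONITH, so a "dead" node is reset before its resources are taken. A sketch of the relevant pieces, assuming heartbeat v1-style configuration:
Code:
# /etc/ha.d/ha.cf (fragment)
bcast   eth1              # add a second heartbeat path (another NIC or serial)
serial  /dev/ttyS0
deadtime 30               # be generous before declaring the peer dead
# configure STONITH so the peer is powered off before takeover; list the
# available plugins with `stonith -L` and each plugin's parameters with
# `stonith -t <plugin> -n`, then add the matching stonith_host lines here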
There are 3 nodes: A, B and C. Node A wants to send information to Node B, but it does so by sending it to Node C first, which then sends it to B, and similarly Node B sends to A through C. C does not send to both A and B simultaneously. This is the built-in algorithm, but I want to change it so that A and B send their packets to C, and C sends both packets to A and B at once by ORing them; on the receiver side, node A can recover the wanted packet and so can B. Where do I change the algorithm?
I made a diskless image based on Fedora 15; during boot it displayed the following error message and went into emergency mode.
The error message:
Starting Relabel all filesystems, if necessary... aborted because a dependency failed.
[  107.607155] systemd[1]: Job fedora-autorelabel.service/start failed with result 'dependency'.
Starting Mark the need to relabel after reboot... aborted because a dependency failed.
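The fedora-autorelabel units try to relabel the root filesystem for SELinux, which generally cannot work on a diskless/network root. A couple of hedged options, applied inside the image:
Code:
# disable SELinux in the image...
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
# ...or boot the diskless clients with "selinux=0" on the kernel command line
# alternatively, label the image once on the build host and make sure no
# relabel flag file is left behind
rm -f /.autorelabel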
I have a lack of understanding of CentOS in general. I have looked for a remedy on other forums and Google, but haven't been able to find the answer. I have a 3-node cluster that was functioning great until I decided to take it offline for a while. My config is as follows:
node 2: vh1
node 3: vh2
node 4: vh6
All nodes connect to a common shared area on an iSCSI device (vguests_root).
Currently vh2 and vh6 connect fine; however, since putting the machines back online I can no longer connect with vh1. A dmesg command on vh1 reveals the following:
GFS2: fsid=: Trying to join cluster "lock_dlm", "Cluster1:vguest_roots"
GFS2: fsid=Cluster1:vguest_roots.2: Joined cluster. Now mounting FS...
GFS2: fsid=Cluster1:vguest_roots.2: can't mount journal #2
GFS2: fsid=Cluster1:vguest_roots.2: there are only 2 journals (0 - 1)
.....
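The last dmesg line is the key: the filesystem was created with only 2 journals, and every node that mounts GFS2 needs its own journal, so the third node has nothing to use. A sketch, assuming the shared area is mounted at /vguests_root on one of the working nodes:
Code:
# run on a node that currently has the GFS2 filesystem mounted
gfs2_tool journals /vguests_root     # confirm the current journal count
gfs2_jadd -j 1 /vguests_root         # add one more journal so vh1 can join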
I have a mount from my app server (Solaris) to my DB server (Solaris). The reason I mount it is that my Oracle instance writes files using UTL_FILE on the DB server only. Now I have done the mount and I can create a file using vi in the mounted point, but UTL_FILE is not able to create a file. The reason might be that Oracle writes only as the ORA user and my app server has no such user; for that I have given permission 777 on that particular folder, but with no luck, so I wonder whether I need additional permissions for this.
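Mode 777 on the directory will not help if the NFS server is squashing the writing uid to nobody, and UTL_FILE additionally only writes to directories the database knows about (utl_file_dir or a DIRECTORY object). A hedged checklist; the uid 1001 and path /export/appfiles are placeholders:
Code:
# on the box where Oracle runs: which numeric uid/gid does it write as?
id oracle
# on the box exporting the directory: make sure that numeric uid exists and
# owns the target directory (777 does not survive uid squashing)
chown 1001:1001 /export/appfiles
# check export and mount options on both sides
share            # Solaris: list current NFS shares and their options
nfsstat -m       # on the client: show the mount options in effect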
I'm building a 3-node cluster. I have created OCFS2 filesystems and mounted them on the first two nodes, but while mounting them on the third node I'm getting this error for 8 of the LUNs. All 8 of these LUNs are 1GB in size.
I've unmounted these 8 LUNs from the other node and tried to mount them on the third node, and then it worked, and then the error occurred on the second node instead. My observation is that for some reason these particular LUNs will not accept a third node.
mount.ocfs2: Invalid argument while mounting /dev/mapper/voting1 on /voting1. Check 'dmesg' for more information on this error.
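That symptom (any two nodes can mount, the third always fails) often means those eight small volumes were formatted with only two node slots. A hedged way to check and fix it; run the tune step with the volume unmounted everywhere:
Code:
# show the filesystem header, including the maximum node slots
debugfs.ocfs2 -R "stats" /dev/mapper/voting1 | grep -i slot
# raise the slot count so a third node can mount
tunefs.ocfs2 -N 3 /dev/mapper/voting1
# dmesg on the failing node usually names the exact reason as well
dmesg | tail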
I am using CentOS. I have read in places that you can use DRBD + Heartbeat + NFS to make a simple failover NFS server, but I can't find any document that works. I've tried 20 or so, including some Debian ones. So, does anyone have any other ideas of how to do this? Point me in the right direction please. I want 2 nodes: one actively serving an NFS share, the other ready for failover. If the first one goes out, the second takes over, meaning the filesystem is in sync, the IP must move, and NFS must come up.
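In heartbeat v1 terms the whole stack is one haresources line: DRBD is promoted, the filesystem mounted, the virtual IP raised and NFS started, in that order, and torn down in reverse on failover. A sketch with assumed names (preferred node node1, DRBD resource r0, mount point /data, virtual IP 192.168.1.100):
Code:
# /etc/ha.d/haresources (identical single line on both nodes)
node1 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 IPaddr::192.168.1.100/24/eth0 nfslock nfs
# /etc/exports on both nodes, with a fixed fsid so clients survive the move
#   /data  192.168.1.0/24(rw,sync,fsid=1)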
I have a two-node Red Hat cluster for a MySQL database. The problem is that after updating the packages on both nodes, the service can no longer be relocated to the second node; even after rebooting the server the problem occurs. When starting the service on the second node, it starts on the first one instead. Other services are running fine on both nodes. I have checked /etc/hosts, the bonding configuration and many more files, and they all seem good. Find the log below for reference.
<notice> Starting stopped service service:hell
Oct 22 14:35:51 indls0040 kernel: kjournald starting. Commit interval 5 seconds
Oct 22 14:35:51 indls0040 kernel: EXT3-fs warning: maximal mount count reached, running
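It may help to watch rgmanager while forcing a relocation, so the log shows exactly which resource fails on the second node. mysql-svc and node2 below are placeholder names for your service and cluster member:
Code:
# watch cluster/service state while you test
clustat -i 2
tail -f /var/log/messages
# ask for an explicit relocation to the second node
clusvcadm -r mysql-svc -m node2
# if it keeps bouncing back, disable it, fix the failing resource, re-enable
clusvcadm -d mysql-svc
clusvcadm -e mysql-svc -m node2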
I wanted to implement a graph. By graph I mean to say that I am writing programs for BFS, DFS and other such stuff and want to see them in action. For this I started writing a program. The first entry point is to define a graph. I took a pen and paper and drew a graph; now I want to be able to code this in C. I defined a structure, but I am finding it difficult to define this structure. I looked on Google and came across the incidence list and adjacency list representations [URL], but even after understanding those I am not able to work out how to define the structure that would be a node in the graph. Here is what my graph looks like
I am trying to find a way to pull Node Pointers out of the model, for use in other areas of the program. The solution is probably simple but I haven't had any luck finding it.
Model for a QTreeView:
Code:
class testModel : public QAbstractItemModel
{
public:
    testModel();
    ~testModel();
};
I'm having some trouble configuring clustering in a 2-node cluster with no shared FS. The application is video streaming, so outbound traffic only... The cluster is generally OK - if I kill -9 one of the resource applications, the failover works as expected. But it does not fail over when I disconnect the power from the service-owning node (simulating a hardware failure). clustat on the remaining node shows that the powered-down node has status "Offline", so it knows the node is not responding, but the remaining node does not become the owner, nor start up the cluster services/resource applications. eth0 on each node is connected via a crossover cable for heartbeat, etc. Each eth1 connects to a switch.
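With rgmanager, a node that disappears has to be fenced before its services are recovered; if no working fence device is configured, the survivor marks the peer Offline and then waits, which matches what you describe. A hedged cluster.conf sketch using IPMI-style power fencing; the agent, addresses and credentials are placeholders for your hardware:
Code:
<clusternodes>
  <clusternode name="node1" nodeid="1">
    <fence><method name="1"><device name="ipmi-node1"/></method></fence>
  </clusternode>
  <clusternode name="node2" nodeid="2">
    <fence><method name="1"><device name="ipmi-node2"/></method></fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <fencedevice agent="fence_ipmilan" name="ipmi-node1" ipaddr="10.0.0.101" login="admin" passwd="secret"/>
  <fencedevice agent="fence_ipmilan" name="ipmi-node2" ipaddr="10.0.0.102" login="admin" passwd="secret"/>
</fencedevices>
Running fence_node node2 from the survivor is a quick way to prove the fencing path works before repeating the pull-the-plug test.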
I have a two node cluster, and a third system which has luci installed.
node1 is nfs0, node2 is nfs1
Both nodes have exactly the same configuration. They have a fresh installation of CentOS 5.5 + yum update. I am unable to join nfs1 to the cluster; it is giving me the following issue:
Sep 29 23:28:00 nfs0 ccsd[6009]: Starting ccsd 2.0.115:
Sep 29 23:28:00 nfs0 ccsd[6009]:  Built: Aug 11 2010 08:25:53
Sep 29 23:28:00 nfs0 ccsd[6009]:  Copyright (C) Red Hat, Inc. 2004 All rights reserved.
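ccsd starting on its own does not say much; the join usually stalls on a cluster.conf mismatch, blocked ports, or multicast not passing between the nodes. A few hedged checks to run on both nodes:
Code:
# both nodes must agree on the config (same version and node list)
ccs_tool lsnode
cman_tool status
cman_tool nodes
# cluster traffic must be allowed between the nodes: openais/cman uses
# UDP 5404-5405, ricci TCP 11111, dlm TCP 21064 (or test once with the
# firewall flushed)
iptables -L -n
# multicast must also work between the nodes; the address in use is shown by
cman_tool status | grep -i multicast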