I am running a 5-node 11g RAC cluster with Cluster Ready Services. The hardware is HP BL460g6 with Virtual Connect modules on each chassis. The OS is RHEL 5.5, with 2 QLogic 4G fibre cards and 2 bonded Broadcom NICs, and the backend disk is all Hitachi 15k. The issue I'm seeing is that periodically each day there are spikes in I/O waits for disk writes, and sometimes that causes a node or two to reboot when CRS can't communicate with the other nodes. Both nodes that have been evicted are on the same chassis. I've checked the backend storage and it is not seeing ANY high utilization; in fact it's running at less than 10% all the time. The network is 10G and is not showing errors on the switch or on the blades themselves. The redo logs have a 10M buffer cache and are running on ASM disk. What other information would be beneficial to check? I am seeing nothing in the logs about errors or waits for disk writes. I believe it's a software issue, but am lost as to how to prove it. These aren't the only nodes experiencing periodic high I/O waits, and it happens at the exact same time on all systems, whether or not they run ASM or are part of a cluster.
I don't have much experience in clustering, and I'm deploying a cluster system on CentOS. But I don't know how long it should take for a node to fail over and for another node to take over its resources and continue running the service. What counts as good, fast or slow? 1s, 10s or
I've set up DRBD on 2 machines; one of them is the master, the other is the slave.
After each bootup, I need to run the following on the master machine:
Code: drbdadm -- --overwrite-data-of-peer primary all
Do we need to specify which machine should be the primary node every time? Is there any way to make the machine "know" that it is the primary node?
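A minimal sketch of a boot-time check, assuming DRBD 8.3 or later, where `drbdadm role <resource>` prints the local and peer roles as `Local/Peer`. The resource name `r0` and the helper names are hypothetical. To my understanding `--overwrite-data-of-peer` is intended for the initial synchronisation only, so the sketch uses a plain `drbdadm primary`; in a real cluster, promotion is usually left to a cluster manager such as heartbeat.

```shell
#!/bin/sh
# Hypothetical helpers; "r0" below is an assumed resource name.

local_role_is_primary() {
    # $1 = role string as printed by `drbdadm role`, e.g. "Secondary/Secondary".
    # Everything before the first "/" is the local role.
    [ "${1%%/*}" = "Primary" ]
}

promote_if_needed() {
    # $1 = DRBD resource name
    res=$1
    role=$(drbdadm role "$res") || return 1
    if ! local_role_is_primary "$role"; then
        # Plain promotion, without forcing a resync of the peer.
        drbdadm primary "$res"
    fi
}

# Usage (as root, e.g. from an init script on the intended master):
#   promote_if_needed r0
```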
I have created a simple menu-driven script for our Operations team to take care of the basic monitoring and managing of our production application from the back end. The script was fine when tested in the UAT environment, but when deployed to production it behaved oddly. When the operator chooses an option from the menu, he is given the output and at the end is prompted to return to the main menu by using Ctrl+C. In production, this return does not occur for some strange reason and the program just sits there. The session becomes unresponsive after that and I'm forced to terminate it by closing PuTTY. I tried enabling debug mode too (set -x) and was still not able to find any useful hints as to why.
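For reference, a Ctrl+C "return to menu" is commonly built on a `trap` for SIGINT. The sketch below only illustrates that mechanism; the function names and menu text are hypothetical and it is not the original script. It does not explain the production behaviour, but comparing the `intr` setting in `stty -a` output between UAT and production may be a useful first check.

```shell
#!/bin/bash
# Sketch only: menu_loop, show_status and the menu text are hypothetical.

interrupted=0
on_interrupt() { interrupted=1; }

# Handle Ctrl+C in the script itself instead of letting it end the script.
trap on_interrupt INT

show_status() {
    interrupted=0
    echo "Showing status, press Ctrl+C to return to the main menu"
    # Wait in short steps so the flag set by the handler is noticed promptly.
    while [ "$interrupted" -eq 0 ]; do
        sleep 1
    done
}

menu_loop() {
    while true; do
        printf '1) Status\nq) Quit\n> '
        read -r choice || break
        case $choice in
            1) show_status ;;
            q) break ;;
        esac
    done
}

# Usage: call menu_loop at the end of the script.
```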
I am familiar with Windows 2008 cluster servers, and I just started testing with CentOS clustering. I am creating a simple 2-node cluster for a simple ping test.
So far, I can ping a virtual IP and manually relocate it between the nodes, but I haven't figured out how to do this automatically. So this is my question: how can I set up the cluster so that it automatically fails a service over to another node in case one node fails?
I am having an issue with LVM on a 2-node cluster. We are using PowerPath for external drives. We had a request to increase the /apps filesystem, which is EXT3.
On the first node we did:
Code:
pvcreate /dev/emcpowercx1 and /dev/emcpowercw2
vgextend apps_vg /dev/emcpowercw2 /dev/emcpowercx1
lvresize -L +60G /dev/apps_vg/apps_lv
resize2fs /dev/apps_vg/apps_lv
Everything went well and /apps was increased. But on the second node, when I run pvs, I get the following error:
WARNING: Duplicate VG name apps_vg: RnD1W1-peb1-JWay-MyMa-WJfb-41TE-cLwvzL (created here) takes precedence over ttOYXY-dY4h-l91r-bokz-1q5c-kn3k-MCvzUX
How can I proceed from here?
I want to change my server's node name, which is the output of "uname -n". The server is CentOS 5. I searched but couldn't find how. There were some search results about /etc/nodename, but I don't have a file at that path. Some also suggested uname -S, which doesn't work.
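On CentOS 5 the persistent host name is normally kept in the `HOSTNAME=` line of /etc/sysconfig/network, while `hostname <name>` changes it for the running system. A minimal sketch of updating that line follows; the helper name `set_sysconfig_hostname` and the example host name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: sets HOSTNAME=<name> in a sysconfig-style file.

set_sysconfig_hostname() {
    # $1 = new host name, $2 = file to edit (defaults to the CentOS 5 location)
    new_name=$1
    file=${2:-/etc/sysconfig/network}
    if grep -q '^HOSTNAME=' "$file"; then
        # Replace the existing line via a temporary file.
        sed "s/^HOSTNAME=.*/HOSTNAME=${new_name}/" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        # No HOSTNAME line yet: append one.
        printf 'HOSTNAME=%s\n' "$new_name" >> "$file"
    fi
}

# Usage (as root):
#   set_sysconfig_hostname node1.example.com
#   hostname node1.example.com   # apply to the running system
#   uname -n                     # should now print the new name
```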
I am using CentOS. I have read in several places that you can use DRBD + heartbeat + NFS to make a simple failover NFS server. I can't find any document that works, though. I've tried 20 or so, including some Debian ones. So, does anyone have any other ideas of how to do this? Point me in the right direction please. I want 2 nodes: one actively serving an NFS share, the other ready for failover. If the first one goes out, the second takes over. That means the filesystem is in sync, the IP must change, and NFS must come up.
I have a two-node Red Hat cluster for a MySQL database. The problem is that, after updating the packages on both nodes (and previously as well), the service cannot be relocated to the second node; the problem occurs even after rebooting the server. When I start the service on the second node, it starts on the first one instead. Other services are running fine on both nodes. I have checked /etc/hosts, bonding and many more files, and they all seem to be fine. Please find the log for reference.
<notice> Starting stopped service service:hell
Oct 22 14:35:51 indls0040 kernel: kjournald starting. Commit interval 5 seconds
Oct 22 14:35:51 indls0040 kernel: EXT3-fs warning: maximal mount count reached, running
I'm having some trouble configuring clustering in a 2-node cluster with no shared FS. The application is video streaming, so outbound traffic only. The cluster is generally OK: if I kill -9 one of the resource applications, the failover works as expected. But it does not fail over when I disconnect the power from the node that owns the service (simulating a hardware failure). clustat on the remaining node shows that the powered-down node has status "Offline", so it knows the node is not responding, but the remaining node does not become the owner, nor does it start up the cluster services or resource applications. eth0 on each node is connected via a crossover cable for heartbeat, etc. Each eth1 connects to a switch.
I have a two-node cluster, and a third system which has luci installed.
node1 is nfs0
node2 is nfs1
Both nodes have identical configurations. They have a fresh installation of CentOS 5.5 + yum update. I am unable to join nfs1 to the cluster, as it is giving me the following issue:
Sep 29 23:28:00 nfs0 ccsd[6009]: Starting ccsd 2.0.115:
Sep 29 23:28:00 nfs0 ccsd[6009]: Built: Aug 11 2010 08:25:53
Sep 29 23:28:00 nfs0 ccsd[6009]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
I am trying a cluster configuration setup for the first time, on a 64-bit RHEL 5.3 machine. Basic requirement: I need to configure a GFS file system. Here are the details:
System:
> I have 2 HP ProLiant DL385 servers.
> Both systems are connected to the public network (eth0).
> I have connected eth1 on each system directly to the other, like a private network (May be I am
I am trying to use a Rocks cluster for large-scale computing. All my slave nodes are connected to the Rocks cluster master node, but I want to run a graphical application on a cluster node. I can't figure out how to do this.
I'm a new Oracle user. I tried to install Oracle 10g on Red Hat Linux 5 but keep getting the same error message:
response/ runInstaller
[oracle2@localhost database_10201]$ sh runInstaller
runInstaller: line 54: /tmp/database_10201/install/.oui: Permission denied
Can anyone please help me with how to give a user full permissions to access a folder in Linux?
I have an AMD64 system with Ubuntu 11.04 installed. It's been a rough ride for me to install oracle-xe-universal. I've already spent more than 2 days on this and am still unsuccessful.
1) First I downloaded the packages libaio_0.3.104-1_i386.deb and oracle-xe-universal_10.2.0.1-1.1_i386.deb
(This gave me a dependency error for libc6 (>= 2.3.2); I modified the control file to remove the dependency and rebuilt the package, and then it worked fine.) Oracle XE was now installed. Then I tried to start the DB; it started, but its HTTP client never started. So I decided to uninstall oracle-xe-universal. None of the sudo apt-get remove oracle-xe-universal commands worked for me, so I went for the manual uninstallation directions as per the Oracle link.
rocky@ubuntu:~/git/mygit/edas2/libaio$ sudo dpkg -i --force-architecture oracle-xe-universal_10.2.0.1-1.1_i386.deb
dpkg: warning: overriding problem because --force enabled: package architecture (i386) does not match system (amd64)
[code]....
Even in the applications menu I don't see any sign that Oracle has been installed. So I conclude that the first installation was OK but somehow the HTTP client didn't work. After the manual uninstallation, the second installation didn't even install oracle-xe into the init.d directory.
I need to build a 3-node web server cluster to run a PHP application. Since the app requires users to log in (which means session state has to be maintained), I will be sharing the session save path, and I also need to share the application directory across the 3 nodes. I am having trouble deciding which cluster file system to select.
I've built a small home-made cluster consisting of a master and 1 diskless slave node. Lately node 1 fails to start, reporting the following message:
I have a rented vserver running at Strato [URL]. It came preinstalled with Debian 7. I upgraded it to Debian 8, which seemed to run fine, with all services running. The problems came up when I tried to reboot the server to test the init system. It just does not come up; I cannot ping it, nothing. I can boot into the rescue system, mount the system partitions and chroot into the filesystem. In this state I can also run my services, including apache2 and mysql. In the syslog I find nothing about the reboot. Now I need to reboot into the normal system. I already tried to revert to sysvinit, without success.
Is there any other way to make an Oracle DB on CentOS 5.4 and an HP ProLiant DL360 G5 server faster to access from a VB application? I have modified and reconfigured everything to make it faster, but to no avail.
We have a server running CentOS 5 Linux 2.6.18-128.1.16.el5xen #1 SMP Tue Jun 30 06:39:23 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux. We've seen that at random times the server will just reboot and nothing is logged in messages. I tried to enable kdump but was only able to get a 5.4 GB dump since our /var directory is set to 10GB. Here are the messages I see before and after the server restart. I had thought that when a kernel panics, it is supposed to halt the system and not reboot it. My /proc/sys/kernel/panic is set to 0. I can run an update but want to have some idea of what is causing the issue and whether the update will fix anything.
May 13 20:05:22 hlotmt01 xinetd[3609]: EXIT: bpcd status=0 pid=1071 duration=1(sec)
May 13 20:05:22 hlotmt01 xinetd[3609]: START: bpcd pid=1072 from=10.203.1.1
May 13 20:05:23 hlotmt01 xinetd[3609]: EXIT: bpcd status=0 pid=1072 duration=1(sec)
My server is rebooting frequently (4 to 5 times a day) without any logs. Can anyone help me find the cause of the unexpected reboots?
reboot system boot 2.6.18-194.3.1.e Fri Feb 4 15:16 (00:-24)
[root@elastix log]# cat /etc/redhat-release
CentOS release 5.5 (Final)
[root@elastix log]# uname -a
Linux elastix 2.6.18-194.3.1.el5 #1 SMP Thu May 13 13:09:10 EDT 2010 i686 i686 i386 GNU/Linux
I have implemented a web application on Linux that I want to deploy and sell to customers. I want to sell ready-made systems including the hardware. The application is written in PHP/MySQL. What I am trying to achieve is:
1) Find a way to encrypt the filesystem and partitions without the need to enter a passphrase when rebooting, so that if someone takes out the hard disks and attaches them to another system, they cannot get any access to my files or settings. And of course, when rebooting (e.g. after a power failure), encryption should be applied automatically.
2) I know that there are ways to bypass the root password on a Linux system. Can all of these ways be disabled? I want the only way to access the system to be the root password and nothing else.
I have thought of using a virtual server instead of a physical one (like deploying a VirtualBox server), but I would still like this to be as secure as possible, covering not only remote but also local access to the system.
I'm trying to configure DNS on Oracle Enterprise Linux 5.4 - Kernel 2.6.18-128.el5. When I restart the named service, I'm not getting any errors, but the service is showing Failed. What could be the reason?
I have an issue with a few servers. All have OEL 5.5 and various versions of Oracle (mainly 10g something).
I've noticed on one machine that the amount of free memory is slowly decreasing. When it drops below 30M, the application has problems. I've tried adding memory on this machine (it is virtual), and it behaves the same.
The same happens on physical machines with a similar, if not the same, config. They vary in the physical memory installed, and it does not matter whether the servers have 2G or 8G: free -m shows around 30 free.
It does not swap, and it does not cause big issues on the other machines, but on this one the application that uses this DB times out for the user. Is there some way to check what is creating this issue?
P.S. I've tried checking with the DB admin for something related to Oracle itself, but when we shut down the instance, Oracle frees the 1.5G allocated to it and nothing else. I will try testing a bit more, but any thoughts on this are useful. This is a production server, so nothing disruptive can be done. I will try to copy it to other hardware and do some testing, but then there is no app server available to put load on it.
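One thing that may be worth ruling out is whether the memory is really in use or is just held as buffers and page cache, which Linux reclaims on demand and which `free -m` reports on its "-/+ buffers/cache" line. A minimal sketch that sums `MemFree`, `Buffers` and `Cached` from /proc/meminfo; the function name is hypothetical, and this is only a rough estimate, not a diagnosis of the timeouts.

```shell
#!/bin/sh
# Hypothetical helper: rough estimate (in MB) of free plus reclaimable memory.

free_plus_cache_mb() {
    # $1 = meminfo-style file (defaults to /proc/meminfo); values are in kB.
    # SwapCached is not counted because the pattern is anchored at line start.
    awk '/^(MemFree|Buffers|Cached):/ { kb += $2 }
         END { printf "%d\n", kb / 1024 }' "${1:-/proc/meminfo}"
}

# Usage:
#   free_plus_cache_mb
```

If this number stays large while "free" stays near 30M, the low free figure on its own is probably not the cause of the application problems.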
I just found that PMON and SMON are running (i.e., showing in the output of ps -ef | grep -i [o]ra), but when I check MMAN (used for internal database tasks), nothing comes up:
Code: ps -ef | grep -i [m]man Is that okay or could there be some problem?
So, what are the essential processes of an Oracle Database that we must check to make sure that the database is up and running fine? What steps do you usually take to validate an Oracle Database on a Unix / Linux (production) server (specifically, immediately after a CR has been worked on)?
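A check along these lines can be scripted. The sketch below reads `ps -ef` style output and prints which of a list of background process names have no matching `ora_<name>` process. The list used here (pmon, smon, dbw, lgwr, ckpt) is an assumption; which processes are expected depends on the Oracle version and configuration, and the function name is hypothetical. A process check alone does not prove the database is open and accepting connections.

```shell
#!/bin/sh
# Hypothetical helper: lists expected Oracle background processes that are
# absent from `ps -ef` style output supplied on stdin.

missing_oracle_procs() {
    ps_output=$(cat)
    # Assumed list; adjust for your version and configuration.
    for name in pmon smon dbw lgwr ckpt; do
        # Background processes show up as ora_<name>[N]_<SID>.
        if ! printf '%s\n' "$ps_output" | grep -qi "ora_${name}"; then
            printf '%s\n' "$name"
        fi
    done
}

# Usage:
#   ps -ef | missing_oracle_procs    # no output means all listed names were found
```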