Software :: MD Raid5 Kicked Out Two Members And Won't Start After Reboot
Apr 14, 2010
Suddenly I noticed that all my file systems had gone into read-only mode. My first thought was that a SATA data cable had come loose on one of the drives, but that wasn't it. All cables were connected correctly. So I rebooted, but I only got to a rescue-mode terminal.
I have four software MD raid volumes:
Running mdadm -D on the volumes told me that the sdc drive had been kicked out of both md0 and md1. However, md3 had kicked out two drives, so I couldn't get any information from mdadm -D on that one. For md0 and md1 I could just add the kicked-out partitions back into the volume, but for md3 I don't even know which partitions got kicked out...
Here are some outputs:
Before I rebooted the first time I saved the 200 last rows of dmesg to a memory stick. Here they are:
Trying to restart the md3 volume in the rescue mode terminal:
The "Array State" row seems interesting. I guess that AAAA means all four drives are OK. But then why does the array state differ between the members?
Does anyone know how to figure out which two members got kicked out? And how do I get them back in (assuming that they're OK)?
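One way to answer this (a sketch — the partition names for md3's members are assumptions; substitute your own) is to read each member's superblock directly and compare event counts. The members that were kicked out will report older "Events" values than the ones that stayed in:

```shell
# Inspect every md3 member's superblock; the two stale members will
# show lower "Events" counts than the two that stayed in the array.
for dev in /dev/sd[abcd]4; do       # md3's partitions are an assumption
    echo "== $dev =="
    mdadm --examine "$dev" | grep -E 'Events|Array State|Update Time'
done

# If the data looks intact, a forced assemble can often bring the
# array back; starting it read-only first is the safest option:
# mdadm --assemble --force --readonly /dev/md3 /dev/sd[abcd]4
```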
Running OpenSuse 11.4 and have set up 2 x Raid5 configs - raid created, disks formatted, everything working fine. I've just rebooted and the raid5 fails to initialize.
Getting these errors:
Code:
May 29 16:58:30 suse kernel: [ 1788.170692] md: md0 stopped.
May 29 16:58:30 suse kernel: [ 1788.197864] md: invalid superblock checksum on sdb1
May 29 16:58:30 suse kernel: [ 1788.197876] md: sdb1 does not have a valid v0.90 superblock, not importing!
I've got a RAID5 array that doesn't want to automount after rebooting. I'm pretty familiar with linux, RAID, and mdadm, and up until now I've had the RAID5 array working just fine. However, whenever I reboot, the array drops off and won't remount until I manually assemble and then mount the thing. I find this odd because I had everything automounting just fine back in 10.3, and even in 11.0 (I think - not sure on that). Currently, things are working, but I'd really like to not have to type
Code: mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
followed by
Code: mount /dev/md0 /mnt/data
every time I reboot. Even including this in some sort of start-up script seems kludgey... Surely there must be a more elegant way of automatically bringing up a RAID5 array after booting? I'm not sure what information you'll need, so I'm going to go ahead and include as much as I can anticipate...
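The usual "elegant way" (sketched here; the config path is the Debian/Ubuntu default and the filesystem type is an assumption) is to record the array in mdadm.conf so it is assembled at boot, and give it an fstab entry so it mounts automatically:

```shell
# Capture the array definition (identified by UUID, so device-name
# shuffling across reboots doesn't matter) into mdadm's config:
mdadm --detail --scan >> /etc/mdadm/mdadm.conf

# Rebuild the initramfs so early boot sees the new ARRAY line:
update-initramfs -u

# Then mount it at boot via fstab; ext3 is an assumption:
echo '/dev/md0  /mnt/data  ext3  defaults  0  2' >> /etc/fstab
```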
I'm a light linux user over the last couple of years and I decided to build a HTPC/NAS device.
Setup: 40GB IDE -> USB boot drive; 3 x 2TB SATA (4K-sector) drives
I've got another identical 2TB drive, but it's holding data that is going to be copied to the raid after it's up and running, and the drive will then be 'grown' into the raid array to yield a final 5+TB array. I tried doing a Disk Utility raid array and it ended up failing after reboot because it used the /dev/sd* designations and they swapped. I have no idea how to do the UUID version; my google-fu and my Practical Guide to Ubuntu haven't helped. So I decided to do it manually, in order to also fix the sector issue - Disk Utility wasn't formatting the drives correctly, and once they were formatted it wouldn't let me create a raid array from the discs.
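For reference, the "UUID version" just means identifying the array and the filesystem by UUID instead of by /dev/sd* names, which survive reboots even when the kernel reorders the drives. A sketch (mount point and filesystem type are examples; the UUID shown is a placeholder you'd copy from blkid's output):

```shell
# The array itself, identified by UUID in mdadm's config:
mdadm --detail --scan                 # prints e.g. "ARRAY /dev/md0 ... UUID=xxxx"
mdadm --detail --scan >> /etc/mdadm/mdadm.conf

# The filesystem, identified by UUID in fstab:
blkid /dev/md0                        # prints UUID="yyyy-..."
echo 'UUID=yyyy-...  /srv/raid  ext4  defaults  0  2' >> /etc/fstab
```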
I am getting really frustrated with trying to get my RAID5 working again. I had a RAID5 array built with 4 of the Western Digital 1.5tb "Advanced Format" drives, WD15EARS. However, when copying 1.5gb dvd encoded files to the drive, I was getting speeds of ~2mb/s. When researching how to make this faster, I came across all the posts about the Advanced Format drives and how that was causing a lot of issues for a lot of people. It looked like the solution was simple enough: partition starting at sector 64 or 2048 or whatever and then recreate the RAID. However, this is not working for me.
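For the record, the alignment fix those posts describe (sketched here with parted; the device name is an example, and this destroys whatever is on the drive) is to start each member partition on a 4K-aligned sector such as 2048:

```shell
# Repartition one member drive with a 4K-aligned start (DESTROYS data):
parted /dev/sdb --script mklabel gpt mkpart primary 2048s 100%

# Verify that partition 1 is optimally aligned:
parted /dev/sdb --script align-check optimal 1
```

Repeat for each member, then recreate the array on the new, aligned partitions.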
Here are my computer specs:
Motherboard: Gigabyte GA-EP43-DS3L LGA 775 Intel P43 ATX
CPU: Intel Core 2 Duo E8400 Wolfdale 3.0GHz 6MB L2 Cache LGA 775 65W
RAM: 4gb DDR2 1066 (PC2 8500)
Video card: ASUS GeForce 9600GT 512MB 256-bit
Linux: 10.04
After a failed upgrade from 9.10 to 10.04 I had to format my computer and do a clean install of 10.04, and now my mdadm raid5 array won't start. My array is called "The Library", and I believe the space between "The" and "Library" is causing the command Disk Utility uses to start the array to fail. The exact error is: An error occurred while performing an operation on "The Library" (RAID-5 Array): The operation failed
Error assembling array: mdadm exited with exit code 1: mdadm: unrecognised word on ARRAY line: Library mdadm: unrecognised word on ARRAY line: Library
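One workaround (a sketch; the UUID shown is a placeholder you'd take from your own array) is to identify the array in mdadm.conf by UUID instead of by the space-containing name, which mdadm's ARRAY-line parser is choking on:

```shell
# Note the UUID= that mdadm prints for the array:
mdadm --examine --scan

# Then in /etc/mdadm/mdadm.conf identify the array by UUID, not by
# name="The Library" (placeholder UUID shown):
#   ARRAY /dev/md0 UUID=aaaabbbb:ccccdddd:eeeeffff:00001111

mdadm --assemble --scan
```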
I am setting up a new server and am in the midst of testing RAID. This is an Ubuntu 9.10 server. RAID1 (/dev/md1) is spread across 12 one-terabyte SCSI disks (/dev/sdi through /dev/sdt). It has four spares configured, each of which are also one-terabyte SCSI drives (/dev/sdu through /dev/sdx). I have been following the instructions on the Linux RAID Wiki ([URL]....
I have already tested the RAID successfully by using mdadm to set a drive faulty. Automatic failover to spare and reconstruction worked like a champ. I am now testing "Force fail by hardware". Specifically, I am following the advice, "Take the system down, unplug the disk, and boot it up again." Well, I did that, and the RAID fails to start. It outright refuses to start. It doesn't seem to notice that a drive is missing. Notably, all the drive letters shift up to fill in the space left by removing a drive. The test I did was to:
Is removing a disk from the bus a reasonable test in the first place? Meaning, is this likely to happen in a production environment by other means than a human coming by and yanking out the drive? Meaning, is there a hardware failure that would replicate this event? Because, if so, then I don't know how to recover from it.
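As an aside, the drive-letter shift is exactly why md identifies members by superblock UUID rather than device name, and a degraded array usually has to be started explicitly because mdadm will not run an incomplete array by default. A sketch (md device name taken from the post; the UUID is a placeholder):

```shell
# Assemble by scanning superblocks, ignoring shifted device names;
# --run starts the array even though one member is missing:
mdadm --assemble --scan --run

# Or explicitly, by array UUID:
mdadm --assemble /dev/md1 --uuid=aaaabbbb:ccccdddd:eeeeffff:00001111 --run
```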
I have a RAID5 with a spare (four disks total). Here are the steps that led me to the problem:
1. I was doing I/O on the array.
2. I pulled out a drive manually, so the spare drive took over for the failed one and started rebuilding. Then
3. in the meantime, I pulled out the power plug of my NAS box.
4. After power-up I saw my array was not active (via the -D option of mdadm). Then
5. I executed: mdadm --assemble --scan /dev/md0 and it gave me
I checked the linux source and found that bd_claim is a function inside fs/block_dev.c; it is failing, which makes lock_rdev (which calls bd_claim in md.c) fail, so we are not able to start the array. I don't know why my RAID is not live after power on. Please help - at the least, can I save my data?
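The bd_claim failure usually means something still holds the member devices (for example a half-assembled array left over from the interrupted boot), so a common recovery sequence (a sketch, assuming the superblocks survived the power cut; member names are assumptions) is to stop first, inspect, then force-assemble:

```shell
# Release any half-claimed members from a partial assembly:
mdadm --stop /dev/md0

# Check each member's view of the array (names are assumptions):
mdadm --examine /dev/sd[abcd]1 | grep -E 'Events|State'

# Force-assemble; mdadm picks the members with the freshest metadata:
mdadm --assemble --force /dev/md0 /dev/sd[abcd]1

# If it comes up degraded, let the spare finish rebuilding before
# writing anything important to the array.
```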
So I installed denyhosts on my system and I could ssh to it fine. Then all of a sudden I got an email saying my IP was added to the /etc/hosts.deny file. I have no clue why - I did not fail the login. So, from a session that was still open, I put it in the /etc/hosts.allow file and was able to ssh back in, no problem. Then I logged out, and all of a sudden I got the email saying my IP was added to hosts.deny again. Now I am locked out of the system.
I am guessing I cannot get back in until I get to the console and remove it. I can power the system on and off remotely, but I enabled the chkconfig denyhosts on option, so it starts on reboot. No remote console is set up. So it looks like I am hosed until I can get to the console - a bummer, as I was trying to set up a Spacewalk server on it. I cannot get to the console for a few days, so if anyone has ideas how I can get back in, let me know. But denyhosts seems to be working as designed.
This was a default install; I did not configure anything funky - just changed the email to root and started it. I thought about changing my client IP, but that won't work, as I only have ssh forwarded on my router to that IP; if I change the client IP I won't get into my routing machine. I think I answered my own question, but I thought I would ask anyway. I guess my real question is: why would denyhosts block my IP when the login did not fail, and how do I configure it so this does not happen again?
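For anyone in the same spot: hosts.allow alone isn't enough, because DenyHosts keeps your IP in its own work files and re-adds it to hosts.deny on its next pass. The recovery once you reach the console (a sketch; paths are the common packaged defaults, and 192.0.2.10 is an example IP - check WORK_DIR in your denyhosts.cfg) looks like:

```shell
service denyhosts stop

# Remove the IP from hosts.deny AND from DenyHosts' own tracking data
# (WORK_DIR is /var/lib/denyhosts on most packaged installs):
sed -i '/192\.0\.2\.10/d' /etc/hosts.deny \
    /var/lib/denyhosts/hosts* /var/lib/denyhosts/users-hosts

# Permanently whitelist the IP so it is never blocked again:
echo '192.0.2.10' >> /var/lib/denyhosts/allowed-hosts

service denyhosts start
```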
I have no drive failures, but I need to recreate a raid5 set as the next free MD disk number. Originally I built a temp OS of debian on a single drive and had 4x2TB drives in a raid5 software array (MD0); this worked fine and allowed me to move all data to it and remove our old fileserver. I have now pulled out the 4 x 2TB Raid5 drives and created a new OS on two new 80GB drives, partitioned as follows:
MD0 is now 250mb Raid1 as /boot
MD1 is 4GB Raid1 Swap
MD2 is 76GB Raid1 as /
If I power off and push the 4x2TB drives back in, I cannot see an MD3. I presume I would need to create an MD3 from these 4 drives, but I don't want to mess things up, as it's live data. So I'm here asking for help, or a bit of hand-holding, to get it done right.
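The important distinction here: an existing array with live data should be re-assembled, not re-created (mdadm --create writes fresh superblocks over the old ones). A sketch, assuming the four old members show up as sd[c-f]1 - verify your actual device names first:

```shell
# The superblocks on the old members still describe the original array:
mdadm --examine /dev/sd[cdef]1        # member names are assumptions

# Assemble it under the next free md number:
mdadm --assemble /dev/md3 /dev/sd[cdef]1

# Persist it so it comes back on every reboot:
mdadm --detail --scan | grep md3 >> /etc/mdadm/mdadm.conf
```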
PS - Its a Debian Lenny 5.0.3 Raid1 fresh install replacing a Debian Lenny 5.0.3 on a single disk.
Running close to a go-live date of 11/10/10 and need to have this fixed. The DBA is pointing to the OS, which is RedHat Enterprise Release 5.3, with Oracle RAC 10G.
Oracle works fine; it just won't start up automatically after a reboot. After a reboot, a person with root access has to increase the number of semaphores, and then Oracle will start and function. The next time the machine is bounced, the same issue arises and the semaphores need to be increased by hand again - every time. This is not normal for a Unix machine. Semaphores are not part of the Oracle install; they are a kernel parameter on Unix machines. Eventually, after several reboots, the semaphore parameter will be too high for the box to function. Once the semaphores have hit their peak, we reset them back to the original value and the whole process starts over again. This is a Unix build issue that has only surfaced on the production machines. The DAS development machines have the exact same Oracle build and do not have this issue.
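The symptom suggests the values are being raised with a runtime sysctl call but never persisted, so every boot reverts them to the defaults. A sketch of the persistent fix (the four kernel.sem values - semmsl, semmns, semopm, semmni - are Oracle's commonly documented 10g minimums; confirm against your install guide):

```shell
# Persist the semaphore limits so they survive reboots:
echo 'kernel.sem = 250 32000 100 128' >> /etc/sysctl.conf

# Apply now without rebooting, then verify the live limits:
sysctl -p
ipcs -ls
```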
I have installed FreeRadius to a Debian Linux server.
I have configured an account called Support to run the Radius as I didnt want Root to be the user to run this.
I want Radius to start up automatically after the system is rebooted, but I don't know how to do this. I am new to Linux, so please bear with me. If the system is rebooted, is it possible for the Support account to be logged in automatically? Is there a script I can create to automatically log in the Support account? This may not be secure, but it has been requested. Also, the main question: after a reboot, can the Radius be configured to automatically start without the need for someone to log in? So if the system is rebooted and then goes back to the login prompt, can the Radius then be running?
I have had a good search about scripts but with my limited knowledge it isnt too easy.
So far, what I've read says to create a script in /etc/init.d, which I've done and named start-my-radius.sh. I think I've made it executable with chmod 777, if that's right?
The script looks like this:
But I don't know if that's even right? The radiusd is located in /usr/local/sbin/, and the radacct and radius.log are located in /usr/local/var/log/radius.
Some stuff I have read says it needs to be linked into /etc/rc.d, but there isn't an rc.d directory; I have other rc directories, rc1.d through rc6.d.
After reading it also said something about using rc.radiusd which will automatically start Radius after a reboot, but again I cannot understand exactly what I need to do.
Let me know if I am on the right track? Will the start-my-radius.sh script work after the system is rebooted without someone actually logging in, and how do I get it to work?
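To answer the recurring question first: init scripts run at boot as root, with nobody logged in, so no automatic login is needed. A minimal Debian-style script along these lines might work (a sketch - the radiusd path is taken from the post above; the stop method is an assumption):

```shell
#!/bin/sh
# /etc/init.d/start-my-radius.sh - start/stop FreeRADIUS at boot
case "$1" in
    start)
        /usr/local/sbin/radiusd     # radiusd daemonizes itself by default
        ;;
    stop)
        pkill radiusd
        ;;
    *)
        echo "Usage: $0 {start|stop}" >&2
        exit 1
        ;;
esac
```

Register it with `update-rc.d start-my-radius.sh defaults`, which creates the rcN.d symlinks for you - that is the piece the rc.d advice was getting at. Note that chmod 755 is sufficient; 777 lets anyone edit a script that runs as root.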
I just installed Fedora 12 on my HP laptop, and everything had gone smoothly, so I decided to reboot my machine. It rebooted, the Fedora loading logo came up, and after it finished, it went to a black screen with a "|" just flashing, and it's stuck on that.
We have three production websites running on RHEL 3 AS running tomcat 5. After a reboot last night Tomcat will not start and has the following in the catalina.out log:
Mar 20, 2011 4:09:31 PM org.apache.commons.digester.Digester startElement
SEVERE: Begin event threw error
java.lang.NoClassDefFoundError: org/apache/naming/resources/ProxyDirContext
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2389)
at java.lang.Class.getConstructor0(Class.java:2699)
at java.lang.Class.newInstance0(Class.java:326)
at java.lang.Class.newInstance(Class.java:308)
.....
I have spoken with our Java developers and they seem to think it is something up between httpd and tomcat at an OS level. Since Tomcat is not starting and giving an error that it is already running, I think you have a stale lock file. Look for the file /var/run/tomcat5.pid and cat or less it to see what the PID number is. If it matches the error, then delete the file and try to start Tomcat again. It should start this time.
Most common problem with Tomcat5. Try the following: 1) Search for the PID file in /tmp; if found, delete it, or else stop your tomcat. 2) Undeploy your application. 3) Check out this link [URL] to find out what is causing tomcat to generate this error.
If none of them work then check your application configuration settings. [URL]. When I do start tomcat it appears to start:
[root@RPSI-2 san00]# service tomcat5 start
Starting tomcat5:
Using CATALINA_BASE: /usr/share/tomcat5
Using CATALINA_HOME: /usr/share/tomcat5
Using CATALINA_TMPDIR: /usr/share/tomcat5/temp
Using JAVA_HOME: /usr/java/jdk1.6.0
[root@RPSI-2 san00]#
But when I then try to find out if it is running via ps -ef | grep tomcat5, I only get back my own grep. I perform the same query on java and it also only returns my query.
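To check the stale-PID-file theory concretely (the path is the one the RHEL tomcat5 init script normally uses; verify it on your box):

```shell
cat /var/run/tomcat5.pid                       # PID recorded at the last start

# No output here means that PID is dead and the file is stale:
ps -p "$(cat /var/run/tomcat5.pid)" -o comm=

rm /var/run/tomcat5.pid                        # only if the process is really gone
service tomcat5 start
```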
A few days ago we had server maintenance. The system was shut down, we fixed the CPU fan, and started the system again. But somehow, when the system starts, our IMAP server - dovecot - is not running. It just sits there like a rock. Because the machine has CPanel/WHM, I tried to restart dovecot using cpanel and got a message:
That was not really useful....
When I tried to restart dovecot from the command line, I got nothing. Really nothing.
How do I find out what happened to my IMAP/dovecot? And is there any way to make it run again?
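Some starting points for diagnosis (a sketch; the paths assume a typical cPanel box - adjust to your setup):

```shell
# Dovecot logs through syslog's mail facility on most cPanel installs:
tail -n 50 /var/log/maillog

# Dump the effective configuration; config syntax errors show up here:
dovecot -n

# cPanel's own restart wrapper, which also reports why a start failed:
/usr/local/cpanel/scripts/restartsrv_dovecot
```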
I recently installed open SUSE 11.3 on my computer which I built myself and the problem I have is after rebooting the computer, I get a black screen. It will not boot the operating system. Here is my component specification:
Video Card: BFG GeForce 8800GTX 768MB 384-bit GDDR3 PCI Express Motherboard: DFI LP DK 790FX-M2RS AM2+/AM2 CPU: AMD Phenom II X4 955 Black Edition 3.2GHz Memory: CORSAIR XMS2 4GB (4 x 1GB) DDR2 800 (PC2 6400) HD: Western Digital Caviar SE16 320GB 7200 RPM 16MB Cache SATA 3.0Gb/s
Another question is, where can I purchase SUSE 11.3, and if I do purchase it, will I get help on the phone installing it? Please keep in mind that I have little experience with SUSE Linux.
I urgently need expert assistance. During a cold-start reboot, something corrupted my / directory making it impossible to access much of anything. I was able to log into a recovery console and do "ls /" which shows most of the critical directories although "media" is missing. However attempting to "ls /home" gets the "no such file or directory" error message, as do attempts to ls most of the others. Doing "ls -l /" gets one item listed, followed by a single record out of /var/log/syslog, then nothing at all.
Attempting a normal login (I've disabled the quiet and splash options, so I see all the startup messages) gets to doing fsck on the swap partition, and that stalls out at 14% complete. The fsck on the root partition completed okay, apparently. This machine is my router, firewall, FTP server, and also hosts VirtualBox VMs that contain all of my customer support tools. The VMs themselves are partially backed up on another box, fortunately, but it took me a couple of months to get the failed box configured properly so if possible I need to recover it. For lack of space, I've never been able to do a full backup of it -- and permissions have foiled portions of all my other backup attempts on it.
Complicating matters is the fact that I have a moderately large data recovery job scheduled to arrive this evening (July 3) to be done while the customer's business is shut down for the holiday. Without the FTP server I'm unable to receive the files, which will total from 100 to 200 MB in all. Is it likely that testdisk may be able to rebuild the borked main root directory, if run from a live CD? The system is Hardy LTS 8.04.4, with 3 GB of RAM and a 250 GB SATA drive. I'm posting this from my WinXP laptop, which doesn't get along with my LAN so connects from a different router than the one that's down.
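On the live-CD question: before reaching for testdisk, it may be worth letting e2fsck attempt the repair, since a corrupted root directory on ext3 is exactly what it rebuilds (orphans land in lost+found). A sketch - the device name is an assumption, and the damaged partition must not be mounted:

```shell
# Read-only pass first, to see the damage without changing anything:
fsck.ext3 -n /dev/sda1        # device name is an assumption

# If the primary superblock itself is suspect, try a backup copy
# (32768 is a common backup-superblock location for ext3):
fsck.ext3 -b 32768 /dev/sda1

# testdisk is menu-driven; run it on the whole disk and use its
# Advanced filesystem utilities from there:
testdisk /dev/sda
```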
Ok, so I followed the instructions here [url] and this works great for the install; however, if the machine is rebooted, the VMWare server refuses to start back up, stating that it knows that it's installed but it was not installed with the right installer.
This is on a Dell server - I can't remember the model right now, but it's got dual PIIIs in it. I'm running Ubuntu Server 32bit 10.10 on the box as well. Thank you in advance for your assistance with this. Once I get this first server figured out, I'll get my other one fired back up.
I need help adding a nohup command to this command line: su - rhx12 -c "/rhythx/rhythx/bin start /rhythx/rhyth". When I execute the script as root on the command line it works fine, but when I reboot the server the process doesn't start. This script will go into the /etc/init.d and rc2.d directories.
#!/bin/bash
case "$1" in
start)
    # nohup + & so the process survives the init script exiting:
    su - rhx12 -c "nohup /rhythx/rhythx/bin start /rhythx/rhythx &"
    ;;
esac
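For init to actually run the script at boot, it also needs a link in rc2.d. A sketch - "rhythx" as the script name is an assumption; use whatever name you saved it under in /etc/init.d:

```shell
chmod +x /etc/init.d/rhythx

# Red Hat style (the script needs chkconfig header comments for this):
chkconfig --add rhythx && chkconfig rhythx on

# Debian style, which creates the rc2.d symlink for you:
update-rc.d rhythx defaults

# Or create the runlevel-2 link by hand:
ln -s /etc/init.d/rhythx /etc/rc2.d/S99rhythx
```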
I wish to create several different users for different members of the family. They should all have different menus, and they should not be able to access some programs, above and beyond just the normal root-blocked ones. Any personal configuration files (eg, bookmarks, saved passwords, logins for other programs (IRC, MSN, etc)) should also not carry over. One of them needs to have KDE enabled as default (as it looks the most like windows). I, however, would prefer to have Enlightenment enabled as default for me. Is this in any way possible?
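Separate accounts give exactly this isolation: each user gets their own home directory, so bookmarks, passwords, and IM logins never carry over. A sketch (usernames and the program path are examples; the ~/.dmrc approach works with the KDM/GDM display managers of this era):

```shell
# One account per family member:
useradd -m -s /bin/bash alice
passwd alice

# Keep a program away from ordinary users by tightening its permissions
# (path is an example; only root and the "admins" group may run it):
chown root:admins /usr/bin/someprogram
chmod 750 /usr/bin/someprogram

# Pick alice's default desktop session (KDE here); your own account
# could set Session=enlightenment the same way:
printf '[Desktop]\nSession=kde\n' > /home/alice/.dmrc
chown alice: /home/alice/.dmrc
```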