Software :: RAID 5 Array Not Assembling All 3 Devices On Boot Using MDADM - One Is Degraded
Aug 31, 2010
I have been having this problem for the past couple of days and have done my best to solve it, but to no avail. I am using mdadm, which I'm not the most experienced in, to make a raid5 array using three separate disks (/dev/sda, /dev/sdc, /dev/sdd). For some reason not all three drives are being assembled at boot, but I can add the missing drive without any problems later; it's just that this takes hours to sync. Here is some information:
This is the message I get when I try to start it:
mdadm: /dev/md0 assembled from 2 drives - not enough to start the array
Below is the information I've collected. Any help on how I can get the raid back up and going so I can get the data off of it would be awesome.
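A common first diagnostic for "assembled from N drives - not enough to start the array" is to compare the event counters stored on each member, since a member whose counter lags behind the others is usually the one mdadm refuses to include. The sketch below is only an illustration: the helper name `events_of` and the device names in the usage comment are assumptions, and it only reads metadata, it changes nothing on disk.

```shell
#!/bin/sh
# Print the "Events" counter found in `mdadm --examine` output read on stdin.
# Works on text only, so it can be tried on saved output as well.
events_of() {
    awk -F' : ' '/^ *Events :/ { print $2; exit }'
}

# Hypothetical usage (needs root; replace the placeholder device names):
# for dev in /dev/sda1 /dev/sdb1 /dev/sdc1; do
#     printf '%s %s\n' "$dev" "$(mdadm --examine "$dev" | events_of)"
# done
```

If the counters differ, it is probably worth posting the full `mdadm --examine` output for every member before trying anything that writes to the disks.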
I have a previously defined RAID5 (4 disks). This worked in Ubuntu 8.04. I recently moved to CentOS 5, and I cannot seem to get the drive back online. cat /proc/mdstat shows no raid levels (personalities) and no drives listed. mdadm --detail --scan returns nothing. mdadm --QE returned a UUID string and the ARRAY output. I can mdadm --examine all the members of the original array. I am not versed in mdadm enough to really understand what I can run and what I should not run because it would erase the data on the drives. Please assist. I will try to post exact output of commands, but the system is kind of unreachable and being rebuilt. I just want to ensure my data on the array is not lost.
I have a used but good hard drive which I'd like to use as a replacement for a removed hard drive in an existing raid1 array.
mdadm --detail /dev/md0
0 0 0 -1 removed
1 8 17 1 active sync /dev/sdb1
I thought I needed to mark the removed drive as failed, but I cannot get mdadm to set it to "failed". I issue
mdadm --manage /dev/md0 --fail /dev/sda1
But the mdadm response is:
mdadm: hot remove failed for /dev/sda1: no such device or address
I thought I must mark the failed drive as "failed" to prevent raid1 from trying to mirror in the wrong direction when I install my used-but-good disk. I want to reformat the good used drive first, right? I believe I must prevent the raid array from automatically trying to mirror in the wrong direction.
My server has 7 HDs in raid 5, and all was working well until one of them died. The HD that died sort of works: it can read about half a file, and it freezes on the benchmark test in Disk Utility. Unfortunately, when I take it out, on boot it says: "The drive for /media_kbt is not ready or present press s to skip or m for manual recovery". I hit S and then go to Disk Utility, but I can't start or add disks to the array.
Ubuntu Server 11.04 i386. I've used Linux on and off for years but only in small doses, so I'm really just at newbie level. I was running an Openfiler NAS, but decided to give Ubuntu+Webmin a try. And up 'til now I've been happy with progress. I have set up a RAID-6 array using 5 x 1TB SATA drives. I've ensured that the array is in a "clean" state, and now I want to do some failure testing. The problem occurs when I remove one of the drives in the array. I shut down, remove a drive, then boot up. The array won't start at all, and comes up with this error during boot:
the disk drive for /mnt/raidvol1 is not ready yet or not present Continue to wait; or Press S to skip mounting or M for manual recovery
If I wait, nothing happens. Obviously the RAID array should start in degraded mode, but it fails to mount at all. When I press "M" to go into manual recovery and type "mount -a" I get the response:
mount: special device /dev/RAIDVG1/RAIDLV1 does not exist
I have set BOOT_DEGRADED=true in /etc/initramfs-tools/conf.d/mdadm without success. If I reconnect the disconnected drive, the array works fine, and is in a clean state.
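One thing that may be worth ruling out here: as far as I know, files under /etc/initramfs-tools/conf.d are copied into the initramfs when it is generated, so an edit only takes effect after the initramfs is rebuilt (`update-initramfs -u`). The sketch below just checks whether a file really contains an active `BOOT_DEGRADED=true` line; the function name `boot_degraded_enabled` is made up for illustration.

```shell
#!/bin/sh
# Succeed if the given file sets BOOT_DEGRADED=true on a non-commented line.
# Accepts the value with or without double quotes.
boot_degraded_enabled() {
    grep -Eq '^[[:space:]]*BOOT_DEGRADED="?true"?[[:space:]]*$' "$1"
}

# Hypothetical usage on the system in question:
# if boot_degraded_enabled /etc/initramfs-tools/conf.d/mdadm; then
#     echo "setting present; rebuild the initramfs with: update-initramfs -u"
# else
#     echo "setting missing or commented out"
# fi
```

Note that the error shown above is about the LVM volume /dev/RAIDVG1/RAIDLV1, so even with the array running degraded the volume group may still need to be activated; the check here only covers the mdadm setting.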
I have a problem with my mdadm RAID. I wanted to know if anyone had any experience with shrinking RAID5 arrays. I was growing the array from 5 to 6 devices; however, the grow got interrupted and it has recovered to 5 drives. The 6th drive is toast and I am unable to re-add it to the system. I would like to remove the device listed as "removed". I have tried mdadm /dev/md0 --remove with both detached and failed, with no success. I am running Ubuntu kernel 2.6.28-11 and mdadm is v3.1.1.
Here is the output of "mdadm -D /dev/md0":
/dev/md0:
        Version : 0.90
  Creation Time : Wed Jan 12 00:46:41 2009
     Raid Level : raid5
     Array Size : 4883812480 (4657.57 GiB 5001.02 GB)
  Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 5
Preferred Minor : 0
    Persistence : Superblock is persistent
    Update Time : Mon Feb 15 20:25:07 2010
          State : active, degraded
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 64K
           UUID : 74fa5199:84b88e81:4ae0fbae:92643084
         Events : 0.1331010
    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       1       8       32        1      active sync   /dev/sdc
       2       8       48        2      active sync   /dev/sdd
       3       8        0        3      active sync   /dev/sda
       4       8       64        4      active sync   /dev/sde
       5       0        0        5      removed
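The mismatch in the output above (Raid Devices 6, Total Devices 5) can be pulled out of `mdadm -D` text with a few lines of awk. This is only a sketch for spotting the condition, for example in a monitoring script; the name `device_shortfall` is an assumption, and the difference is a rough figure because Total Devices also counts spares and failed members.

```shell
#!/bin/sh
# Read `mdadm -D` output on stdin and print
# (Raid Devices - Total Devices), i.e. how many slots lack a device.
device_shortfall() {
    awk -F' : ' '
        /^ *Raid Devices :/  { raid = $2 }
        /^ *Total Devices :/ { total = $2 }
        END { print raid - total }
    '
}

# Hypothetical usage (needs root):
# mdadm -D /dev/md0 | device_shortfall
```

For the array quoted above this would print 1, matching the single slot shown as "removed".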
I'm running a Debian homeserver, with a 3-disk (1GB each) raid 5 array using mdadm (the OS is on a separate disk). Now, smartmontools noticed some bad sectors on one of the disks, and I'm not sure what to do next (except for backing up valuable data). I found some articles on how to fix these sectors, but I don't know what the effect on the whole array will be.
I have a server that was running a hardware isw raid on the system (root) disk. This was working just fine until I started getting sector errors on one of the disks. So, I shut down the system, removed the failing drive and installed a new drive (same size). On reboot I went into the Intel raid setup and it did show the new drive, and I was able to set it to rebuild the raid. So, continuing the reboot, everything came up just fine except the raid 1 on the system disk. I have tried many times to get the system to rebuild the raid using dmraid, but to no avail; it would not start a rebuild. In order to get the system back up and make sure that the disk was duplicated, I was able to 'dd' the working disk to the new disk that was installed. At present when I look at the system it does not show up with a raid setup on the system disk (this comprises the entire 1TB disk with two partitions: sda1 as / and sda2 as swap). Problem: I have decided to forgo the Intel raid and just use mdadm. I have a test system set up to duplicate (not the software, but the disk partitions) the server setup.
Code:
[root@kilchis etc]# fdisk -l
Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
I have a 7-drive RAID array on my computer. Recently, my SATA PCI card died, and after going through multiple cards to find another one that worked with Linux, I now can't assemble the array. The drives are no longer in the order they were in previously, and mdadm can't seem to reassemble the array. It says there are 2 drives and one spare, even though there were 7 drives and no spares. I know for a fact that none of the drives are corrupted, because one of the non-working RAID cards was still able to mount the array for a short period, but would lose the drives during resyncing (I later found out that the chipset on the card had extremely limited Linux support). I have tried running "mdadm --assemble --scan" and, after the array is partially assembled, adding the other drives with "mdadm --add /dev/md0 /dev/sdc1". These both return errors and will not complete on the new raid card.
Code:
aaron-desktop:~ aaron$ sudo mdadm --assemble /dev/md0
mdadm: /dev/md0 assembled from 2 drives and 1 spare - not enough to start the array.
I recently upgraded from lenny to squeeze, and my raid array is degraded immediately after boot. Some info on my machine: I've got a built-in SATA II chipset with 4 drives /dev/sda-d that I use for my RAID5 system, and an IDE drive /dev/sde. Before I upgraded to squeeze, the IDE drive was at /dev/hda. I did the usual 2-step upgrade (kernel/udev first, reboot, then everything else). After the first reboot, the IDE drive became /dev/sda and the SATA drives were /dev/sdb-e. I updated mdadm.conf to reflect the new drive naming and added /dev/sde to the array; it rebuilt successfully and everything was back online. After the 2nd reboot, the IDE drive became /dev/sde and my SATA drives went back to /dev/sda-d. No biggie; updated mdadm.conf again, rebuilt, and everything works.
Now that everything has been upgraded, the RAID array still becomes degraded upon boot. I can always add /dev/sda back to the array, and it's always rebuilt successfully. Here are some interesting lines from dmesg:
It finds all my drives:
I have been having some odd issues over the last day or so while trying to get a raid 5 array running in software under Kubuntu. I installed three 1TB drives and started up, and my sd* order got all messed up (sda was now sdc and so on). This wasn't entirely unexpected, so I fixed up fstab and booted again. I found all three of the drives I installed, set them to raid auto-detect and used mdadm to create /dev/md0. I then created mdadm.conf by redirecting the output of mdadm --detail --scan --verbose into /etc/mdadm.conf. At this point, everything was still going swimmingly. I copied over a few hundred GB of data from another failing drive and everything seemed ok. I went to reboot once the copy was done and everything just went weird. All of the sd* drives went back to the original order. Of course, this meant that the mdadm.conf was wrong. I tried to just change the device list, but that didn't work. I then deleted mdadm.conf and rebooted. The drive list stayed in the original order this time, so I just tried manually starting the array.
By erasing the partition table of the 3rd drive, I've been able to get it to the status of spare, but it says it is busy when I try to add it to the array. A grep through dmesg makes me think that md has a lock on it. I'm not sure where to go with it now. If anyone has any pointers, I would like to hear them.
I ran
mdadm --create /dev/md1 --level=1 --raid-disks=2 missing /dev/sdb1
and I get
md1: raid array is not clean -- starting background reconstruction
Why is it not clean? Should I be worried? The HD is not new; it has been used before in a raid array but has been repartitioned.
I have 4 WD10EARS drives running in a RAID 5 array using mdadm. Yesterday my OS drive failed. I have replaced it and installed a fresh copy of Ubuntu 11.04 on it, then installed mdadm and rebooted the machine, hoping that it would automatically rebuild the array. It hasn't. When I look at the array using Disk Utility, it says that the array is not running. If I try to start the array it says:
Error assembling array: mdadm exited with exit code 1: mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
mdadm: Not enough devices to start the array.
I have tried mdadm --assemble --scan and it gives this output:
mdadm: /dev/md0 assembled from 2 drives - not enough to start the array.
I know that there are 4 drives present as they are all showing, but it is only using 2 of them. I also ran mdadm --detail /dev/md0 which gave:
root@warren-P5K-E:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
When we assemble a raid array, from where does it load configuration information for that array? I thought it refers to /etc/mdadm.conf file, but in my system, mdadm.conf file doesn't even contain all information. Still it is able to successfully assemble previously created device.
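As far as I understand it, the identifying information (array UUID, level, number of devices) lives in the md superblock on each member device, and /etc/mdadm.conf mainly tells mdadm which arrays to look for and what to name them. `mdadm --examine --scan` prints ARRAY lines built from those superblocks. The sketch below extracts the device and UUID from such lines so they can be compared with what is in mdadm.conf; the function name `array_uuids` is hypothetical.

```shell
#!/bin/sh
# Read ARRAY lines (as printed by `mdadm --examine --scan` or found in
# mdadm.conf) on stdin and print "<device> <uuid>" for each of them.
array_uuids() {
    awk '$1 == "ARRAY" {
        for (i = 3; i <= NF; i++)
            if ($i ~ /^UUID=/) print $2, substr($i, 6)
    }'
}

# Hypothetical usage (the first command needs root):
# mdadm --examine --scan | array_uuids
# array_uuids < /etc/mdadm.conf
```

If the first command lists an array that the second does not, that would be consistent with the array being assembled from the on-disk metadata alone.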
I've been having troubles with software raid. In particular, the raid array becomes un-"assembleable" after reboots. The config is CentOS 5 with 4 SATA discs (1 x 160 containing the OS, no raid, and 3 x 2TB disks configured as a RAID 5 array, no spare drive). These drives were configured in anaconda and all seemed to go well (the drive and its lvm partitions worked and it finished rebuilding overnight). A couple of reboots later the drives cannot be assembled anymore and the machine won't boot. The error message says:
mdadm: /dev/md0 assembled from 1 drive and 1 spare - not enough to start the array.
Of course there are 3 drives and no spares in the array as configured. Manually starting the array with mdadm --assemble --scan gives the same message, as does assembling the array by specifying the individual parts. /proc/mdstat does recognize the 3 drives, and when I look at the partition tables in fdisk, they show as being software raid. What could be wrong, or what steps can I take to diagnose it? I tried configuring the raid drives manually before going the anaconda route. Also, does anyone know how I can edit the /etc/fstab file to disable them so the machine will at least boot? The (Repair filesystem) shell has the / drive mounted r/o.
I have a 4 drive RAID 5 array set up using mdadm. The system is stored on a separate physical disk outside of the array. When reading from the array it's fast, but when writing to the array it's extremely slow, down to 20MB/sec compared to 125MB/sec reading. It does a bit, then pauses, then writes a bit more and then pauses again, and so on. The test I did was to copy a 5GB file from the RAID to another spare non-raid disk on the system: average speed 126MB/s. Copying it back on to the RAID (in another folder), the speed was 20MB/s. The other thing is a very slow write speed (several KB/s) when copying from an eSATA drive to the RAID.
I also get sent to a Busybox (initramfs) shell with no text editor and don't know how to copy all the error messages and post them here. If there is a way, let me know. I've typed it out in the meantime:
Code:
md0 : inactive sdxxxx
Attempting to start the RAID in degraded mode...
mdadm: CREATE user root not found
mdadm: CREATE group disk not found
This is with a 3 disk RAID5 array. I turned off the system, pulled out a drive, and started it back up. Fresh install, all I've done so far is apt-get update and upgrade.
I've recently started having an issue with an mdadm RAID 6 array that has been operational for about 2500 hours.
Intermittently during write operations the array stalls, dropping to almost 0 write speed for 10-30 seconds. When this occurs, one or both of the 2 drives attached to a 2 port Silicon Image si3132 SATA-II controller "locks up" with its activity light locked on. This just started occurring within the last week and didn't seem to coincide with any update that i noticed. The array has just recently passed 12.5% full. The size of the write does not seem to make any difference and it seems completely random. Sometimes copying a 5 GB dataset results in no slowdown; other times a torrent downloading to the array at 50kb/sec does cause a slowdown, and vice versa.
The array consists of 8 WD 1.5TB drives, 6 attached to the ICH9R south bridge, and 2 attached to a si3132 based PCI express card. The array is formatted as a single ext4 partition.
Checking SMART data for all drives shows no errors. Testing read speed with hdparm reports what i would expect (100mb/sec for each drive, ~425mb/sec for the array).
The only thing i did notice is that udma6 is enabled for all the ICH9R drives while only udma5 is enabled for the si3132 drives. Write cache is enabled for all the disks. Attempting to set the si3132 drive to udma6 results in an IO error from hdparm.
The si3132 drive is using the sata_sil24 driver. Nothing of interest appears in the kern or syslog. During this time top shows very high wait time.
The si3132 controller appears to have the original firmware from 2006 loaded. There are some firmware updates available on the Silicon Image website for this controller that now appear to offer separate firmwares for RAID operation (some sort of hybrid controller/software thing the controller supports) and a separate firmware for standard IDE use.
Has anyone had similar issues with this controller? Is a firmware update a reasonable course of action? If so which firmware is best supported by the linux driver?
I know i'm not using its raid features, but i've dealt with controllers that needed to be in raid mode for ahci to be active and for linux to work well with them. I'm a bit iffy about the idea of just trying it and finding out, as it could knock 2 disks of my array out of action.
We have some servers that run in very harsh environments (research vessel) that need to have high availability. We have software RAID 1 for some measure of resiliency, along with proper data backups (tapes etc); however, we would like to be able to break out a new server and re-image it (including RAID setup) from a known good copy if the hardware completely fails on the production box. Simplicity of the process is a big plus. I am interested in any advice on the best way to approach this. My current approach (relatively new to Linux administration, totally new to mdadm) is to use dd to take a complete gzipped copy of one of the RAID'ed devices (from a live CD):
Code: dd if=/dev/sda bs=4096 | gzip -c > /mnt/external/image/test.img
then reverse the process on the new PC, finally using
Code: mdadm --assemble
to re-create and re-build the array.
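The image-and-restore round trip described above can be sketched as two small functions. This is an illustration only: the function names are made up, and in real use the arguments would be a whole-disk device and an image path, run from a live CD with nothing mounted. Because a whole-disk image includes the md superblocks, I would expect `mdadm --assemble` on the new machine to bring up the mirror with one member, after which the second disk would be added so it resyncs; treat that as an assumption to test rather than a guarantee.

```shell
#!/bin/sh
# Copy a source (disk or file) into a gzipped image.
make_image() {
    dd if="$1" bs=4096 2>/dev/null | gzip -c > "$2"
}

# Write a gzipped image back onto a target (disk or file).
# WARNING: this overwrites the target completely.
restore_image() {
    gunzip -c "$1" | dd of="$2" bs=4096 2>/dev/null
}

# Hypothetical usage, with the paths from the post:
# make_image /dev/sda /mnt/external/image/test.img
# restore_image /mnt/external/image/test.img /dev/sda
```

It is probably worth trying the restore on a spare machine before relying on it at sea, and checking that the replacement disk is at least as large as the original.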
I am learning software raid 1 with CentOS 5.5. I created the raid without any problems, removed the first drive to check there were no problems, and it booted. I have installed the old drive back in the system as hdc and need to resync the drives (I used the old drive as its partitions are correct). I thought I could use raidhotadd but it does not seem to exist anymore. How do I resync the drives in the array, with hda as primary and hdc as secondary, using mdadm?
I've been having some problems with my RAID 5 array, and after extensive investigation, I'm fairly sure that my last resort is rebuilding the array. I'd tried --assemble, because it's a previously created array, but it didn't seem to like that. So, I checked into --create, and it will re-create the array without destroying the data, if the superblocks are persistent, which they seem to be. However, here's what I get:
My question is: why do /dev/sdb1 and /dev/sdi1 show as both ext2fs and also as part of a RAID array?
I recently upgraded from 10.04 to 11.04 and I now often get boot messages about a degraded raid.
I'm fairly experienced, but I'm confused about which raid it is talking about. I have a raid5 array, but I don't boot off that, and it seems fine when I finally get it to boot. Previously, I didn't have any other raid arrays, but now I seem to have two others called md126 and md127, and they both seem to be degraded. Where did they come from?
I *do* have two 80GB drives that I was booting from in RAID1, but that was a looong time ago, and I have since only booted from one of them. The partition table indeed shows partitions 1 and 5 are raid autodetect and /proc/mdstat shows they are degraded ([U_]). Could it be that this is causing the problem? If so, why has this only started to happen since the upgrade from 10.04 to 11.04? Anyway, perhaps it is a good idea to add that second disk to the raid1 array. If so, how do I do that? Note that I've also noticed that when I boot and get to the screen where I select from the different kernel versions, I now get a couple of really old ones too. My thought is that these are from the raid1 disk that I stopped using. If I add it to the array, how can I be sure it will mirror in the correct direction?
It could be that I have fairly recently plugged in that second RAID1 disk, after a long time of not having enough spare sata sockets (I switched my RAID5 array from 8 disks to only 3 disks, so suddenly had a lot more spare sockets).
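To see which arrays the boot message is about, the `[U_]` markers in /proc/mdstat can be scanned for an underscore, which marks a missing member. The sketch below does that on mdstat-style text; the function name `degraded_arrays` is an assumption. On the direction question: as far as I know, when a partition is added with `mdadm <array> --add <partition>`, the array's existing active member is the source and the newly added partition is the one that gets overwritten.

```shell
#!/bin/sh
# Read /proc/mdstat-style text on stdin and print the name of every
# array whose status brackets contain "_" (a missing member).
degraded_arrays() {
    awk '
        /^md[^ ]* :/      { name = $1 }
        /\[[U_]*_[U_]*\]/ { if (name != "") print name }
    '
}

# Hypothetical usage:
# degraded_arrays < /proc/mdstat
```

Checking which disk each listed array currently runs on (the device names on the same mdstat line) before adding anything should make it clear which copy would be kept.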
Could any RAID gurus kindly assist me on the following RAID-5 issue? I have an mdadm-created RAID5 array consisting of 4 discs. One of the discs was dropping out, so I decided to replace it. Somehow, this went terribly wrong and I succeeded in marking two of the drives as faulty, and then re-adding them as spares.
Now the array is (logically) no longer able to start:
mdadm: Not enough devices to start the array.
Degraded and can't create RAID, auto stop RAID [md1]
Code: mdadm --create --assume-clean --level=5 --raid-devices=4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
As I don't want to ruin the maybe small chance I have left to rescue my data, I would like to hear the input of this wise community.
I wonder how to attach a new SATA hard disk to a software array that has two disks, one of which has crashed (this is mirroring mode, RAID 1). The situation is like this: I unplugged the crashed disk, bought a similar one and plugged it in. What should I do next?
I'm working on a server and noticed that the RAID5 setup is showing 4 Raid devices but only 3 Total devices. It's on a fully updated CentOS 5 system that only has three SATA drives, as it cannot hold any more. I've done some research but am unable to remove the fourth device, which is listed as removed. The full output of `mdadm -D /dev/md2` can be seen below. I've never run into this situation before. Anyone have any pointers on how I can reduce the Raid Devices from 4 to 3? I have tried
My home-backup server, with 8*2TB disks, won't boot anymore. Two disks failed at the same time and i rebuilt the raid 6 array without any problem, but now i can't boot the OS. I'm using Ubuntu Server 10.10. I've taken screenshots of the displays to avoid copying everything here. The problem at boot:
And the Grub config: It's not a production server, but i would like to have it online. I've tried for the last 2 days (just a couple of hours a day) but without success. It was suggested that I do "mount -o remount,rw /" and then edit /etc/fstab, but I get a "file doesn't exist" error.
120GB Sata HDD - Primary OS drive
3 x 1.0TB Sata HDD - Raid 5 array
This is on a C2D MSI P35 Platinum board. Anyway, did a fresh install of F12 on the 120GB, which I had problems with - Anaconda refused to see the drive. Fedora Live could see it fine, and it was listed as an 'nvidia_raid_member' - no idea why, but I completely erased the disc under the Live CD and proceeded to install F12.
Once F12 was installed, I loaded up mdadm to re-activate my Raid 5 array, using 'sudo mdadm --assemble --uuid=(the uuid)', and it started with only 2 of the 3 drives. My /dev/sdb drive did not activate into the array, due to what mdadm said was a mismatched UUID. Ok, so I erased /dev/sdb, intending to rebuild the array. I then attempted 'sudo mdadm --add /dev/md0 /dev/sdb' and I get this error: "mdadm: Cannot add disks to a 'member' array, perform this operation on the parent container". I can find NO information on this error message.
I don't believe the hard drives are connected in the exact same order they were in before - I disconnected everything in the system and blew it out (it was pretty dusty)
I am running a 14 disk RAID 6 on mdadm behind 2 LSI SAS2008's in JBOD mode (no HW raid) on Debian 7 in BIOS legacy mode.
Grub2 is dropping to a rescue shell complaining that "no such device" exists for "mduuid/b1c40379914e5d18dddb893b4dc5a28f".
Output from mdadm:
Code:
# mdadm -D /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Wed Nov 7 17:06:02 2012
     Raid Level : raid6
     Array Size : 35160446976 (33531.62 GiB 36004.30 GB)
  Used Dev Size : 2930037248 (2794.30 GiB 3000.36 GB)
   Raid Devices : 14
Output from blkid:
Code:
# blkid
/dev/md0: UUID="2c61b08d-cb1f-4c2c-8ce0-eaea15af32fb" TYPE="xfs"
/dev/md/0: UUID="2c61b08d-cb1f-4c2c-8ce0-eaea15af32fb" TYPE="xfs"
/dev/sdd2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="09a00673-c9c1-dc15-b792-f0226016a8a6" LABEL="media:0" TYPE="linux_raid_member"
The UUID for md0 is `2c61b08d-cb1f-4c2c-8ce0-eaea15af32fb` so I do not understand why grub insists on looking for `b1c40379914e5d18dddb893b4dc5a28f`.
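Looking at the blkid output quoted above, the `b1c40379...` value does appear there: it is the UUID reported for /dev/sdd2 with TYPE="linux_raid_member", which suggests it is the md array's own UUID, while `2c61b08d-...` is the UUID of the XFS filesystem on top of /dev/md0. GRUB's `mduuid/` name most likely refers to the former, just written without dashes. The sketch below compares two UUID spellings after stripping separators; the function names are hypothetical.

```shell
#!/bin/sh
# Strip "-" and ":" separators and lower-case the hex digits, so that
# blkid-style, mdadm-style and GRUB-style spellings can be compared.
normalize_uuid() {
    printf '%s' "$1" | tr -d -- '-:' | tr 'A-F' 'a-f'
}

# Succeed if both arguments are the same UUID in different spellings.
same_uuid() {
    [ "$(normalize_uuid "$1")" = "$(normalize_uuid "$2")" ]
}

# Usage with the values from the post:
# same_uuid b1c40379914e5d18dddb893b4dc5a28f \
#           b1c40379-914e-5d18-dddb-893b4dc5a28f && echo "same array UUID"
```

If that reading is right, replacing the `mduuid/` value with the filesystem UUID would not be expected to help, since GRUB would then look for an array that does not exist.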
Here is the output from `bootinfoscript` 0.61. This contains a lot of detailed information, and I couldn't find anything wrong with any of it: [URL] .....
During the grub rescue an `ls` shows the member disks and also shows `(md/0)`, but if I try an `ls (md/0)` I get an unknown disk error. Trying an `ls` on any member device results in unknown filesystem. The filesystem on md0 is XFS, and I assume the unknown filesystem is normal if it's trying to read an individual disk instead of md0.
I have come close to losing my mind over this, I've tried uninstalling and reinstalling grub numerous times, `update-initramfs -u -k all` numerous times, `update-grub` numerous times, `grub-install` numerous times to all member disks without error, etc.
I even tried manually editing `grub.cfg` to replace all instances of `mduuid/b1c40379914e5d18dddb893b4dc5a28f` with `(md/0)` and then re-install grub, but the exact same error of no such device mduuid/b1c40379914e5d18dddb893b4dc5a28f still happened.
One thing I noticed is it is only showing half the disks. I am not sure if this matters or is important or not, but one theory would be because there are two LSI cards physically in the machine.
This last screenshot was shown after I specifically altered grub.cfg to replace all instances of `mduuid/b1c40379914e5d18dddb893b4dc5a28f` with `mduuid/2c61b08d-cb1f-4c2c-8ce0-eaea15af32fb` and then re-ran grub-install on all member drives. Where it is getting this old b1c* address I have no clue.
I even tried installing a SATA drive on /dev/sda, outside of the array, and installing grub on it and booting from it. Still, same identical error.