Ubuntu :: MDADM RAID 5 Disk Failure And Recovery?
Jun 18, 2010
I have a fileserver which is running Ubuntu Server 6.10. I had a RAID5 array consisting of the following disks:
Code:
/dev/sda1
/dev/sdb1
/dev/sdd1
/dev/md0 - the raid device for the three disks above
The sda1 disk has failed and the array is running on 2 of 3 disks. The other disks in the system are:
/dev/sdc (OS disk)
/dev/sde (new 2tb disk - unused)
/dev/sdf (new 2tb disk - unused)
My plan was to rebuild the array using the two new disks as RAID1. Would the best way to do this be to create a new RAID1 device on /dev/md1 and then copy all the data over from /dev/md0? Also, this may sound stupid, but since all 3 drives in md0 are identical, I'm not sure which physical disk is bad. I tried disconnecting each disk one by one and rebooting, but the system doesn't appear to want to boot without the bad drive connected. I've already failed the disk in the array with mdadm, but I'm unsure how to remove it properly.
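For the copy-over approach, a rough sequence would look like this (a sketch only: it assumes the new disks get single partitions /dev/sde1 and /dev/sdf1, and /mnt/md0 and /mnt/md1 as mount points):
Code:
# remove the member already marked as failed:
mdadm --manage /dev/md0 --remove /dev/sda1
# note the serial number to match against the physical drive label:
hdparm -I /dev/sda | grep -i serial
# build the RAID1 pair and copy everything across:
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sde1 /dev/sdf1
mkfs.ext3 /dev/md1
mount /dev/md1 /mnt/md1
rsync -a /mnt/md0/ /mnt/md1/
Reading the serial off the failed device before powering down (hdparm -I, or smartctl -i) avoids the unplug-and-reboot guessing game; if the dead drive no longer answers, note the serials of the two good drives instead and pull the odd one out.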
View 3 Replies
Feb 2, 2010
Recently, the SMART utility said that one of the drives had failed and another drive was about to fail. I downed the box and hooked them up to my Windows machine to run SeaTools on them (they are all Seagate drives). SeaTools said that the drives were fine, while Ubuntu said they were failing/dead. Yesterday I decided to try to fix one of the drives in the raid. I turned the server off, took the failed drive out, and restarted. Of course the raid didn't work because only 2 of the 3 drives were there, although it had been working with only 2 of the 3 drives for a couple of months now (I'm a lazy college student). I turned it back off and on again with the drive there just to see if I could get the raid up again, but I haven't been able to get it to go. So far I've tried:
Code:
mdadm --assemble /dev/md0 /dev/sd[b,c,d]
mdadm: no recogniseable superblock on /dev/sdb
mdadm: /dev/sdb has no superblock - assembly aborted
[code]....
I'm looking for a way to trick the raid into working with just 2 drives until I can warranty the Seagate and buy an external 1.5 TB drive to use as another backup, and for how to remove the bad drive from the array and replace it with a fresh one without data loss.
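One thing worth ruling out first: if the members were originally created on partitions (/dev/sdb1 and so on) rather than whole disks, assembling with bare /dev/sdb would produce exactly this "no superblock" error. A hedged sketch, partition names assumed:
Code:
mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1
# if one member really is gone, start degraded on the two good ones:
mdadm --assemble --run --force /dev/md0 /dev/sdc1 /dev/sdd1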
View 3 Replies
Jun 5, 2011
I have 4 WD10EARS drives running in a RAID 5 array using mdadm. Yesterday my OS drive failed. I have replaced it and installed a fresh copy of Ubuntu 11.04, then installed mdadm and rebooted the machine, hoping that it would automatically rebuild the array. It hasn't; when I look at the array using Disk Utility, it says that the array is not running. If I try to start the array it says: Error assembling array: mdadm exited with exit code 1: mdadm: failed to RUN_ARRAY /dev/md0: Input/output error. mdadm: Not enough devices to start the array. I have tried mdadm --assemble --scan and it gives this output: mdadm: /dev/md0 assembled from 2 drives - not enough to start the array. I know that there are 4 drives present as they are all showing, but it is only using 2 of them. I also ran mdadm --detail /dev/md0 which gave:
root@warren-P5K-E:~# mdadm --detail /dev/md0
/dev/md0:
Version : 0.90
[code]....
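Before anything destructive, it's worth seeing what each superblock says and then trying a forced assembly; if the two "missing" members just have stale event counts, --force will usually bring the array up. The device names below are assumptions:
Code:
mdadm --examine /dev/sd[bcde]1 | egrep '/dev/|Events|State'
mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1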
View 2 Replies
Apr 7, 2011
I am trying to create a Raid 1 ram disk. Below are the commands I used:
[root@abidbodal dev]# mke2fs -m 0 /dev/ram8
[root@abidbodal dev]# mount /dev/ram8 /mnt/rd8
[root@abidbodal dev]# mke2fs -m 0 /dev/ram9
[code]....
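For what it's worth, the usual order is reversed from the above: create the md device from the two ram disks first, and put the filesystem on the array rather than on the members. A minimal sketch (mount point assumed):
Code:
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/ram8 /dev/ram9
mke2fs -m 0 /dev/md0
mount /dev/md0 /mnt/rd8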
View 3 Replies
Oct 16, 2009
my Fedora 11 system is not starting anylonger. It stops with the message:
Code:
VFS: Can't find ext4 filesystem on dev dm-0
For a while now, the system has been telling me that a lot of sectors on one disk of the (software) RAID compound have already failed. So I tried to disconnect each of the disks and start with them separately. Unfortunately this is not working (with one disk it does not work at all; with the other it gets exactly as far as with both), and when I tried to recover the system with the Fedora DVD, it said no distribution found. I am quite new and do not know much about Linux systems, so I do not know what further information you could need. Maybe it is important that both disks are encrypted (the system gets far enough that I can type in the password).
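Since both disks are encrypted, any rescue attempt has to assemble the RAID and open the LUKS layer before the installer can see a distribution. A hedged sketch from the DVD's rescue shell (the mapping name and the exact md/LVM layout are assumptions):
Code:
mdadm --assemble --scan
cryptsetup luksOpen /dev/md0 cryptroot
vgchange -ay                       # if LVM sits inside the crypt layer
mount /dev/mapper/cryptroot /mnt   # or the root LV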
View 2 Replies
Sep 19, 2014
I am running a 14 disk RAID 6 on mdadm behind 2 LSI SAS2008's in JBOD mode (no HW raid) on Debian 7 in BIOS legacy mode.
Grub2 is dropping to a rescue shell complaining that "no such device" exists for "mduuid/b1c40379914e5d18dddb893b4dc5a28f".
Output from mdadm:
Code: Select all
# mdadm -D /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Wed Nov 7 17:06:02 2012
Raid Level : raid6
Array Size : 35160446976 (33531.62 GiB 36004.30 GB)
Used Dev Size : 2930037248 (2794.30 GiB 3000.36 GB)
Raid Devices : 14
[Code] ....
Output from blkid:
Code: Select all
# blkid
/dev/md0: UUID="2c61b08d-cb1f-4c2c-8ce0-eaea15af32fb" TYPE="xfs"
/dev/md/0: UUID="2c61b08d-cb1f-4c2c-8ce0-eaea15af32fb" TYPE="xfs"
/dev/sdd2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="09a00673-c9c1-dc15-b792-f0226016a8a6" LABEL="media:0" TYPE="linux_raid_member"
[Code] ....
The UUID for md0 is `2c61b08d-cb1f-4c2c-8ce0-eaea15af32fb` so I do not understand why grub insists on looking for `b1c40379914e5d18dddb893b4dc5a28f`.
Here is the output from `bootinfoscript` 0.61. This contains a lot of detailed information, and I couldn't find anything wrong with any of it: [URL] .....
During the grub rescue an `ls` shows the member disks and also shows `(md/0)`, but if I try an `ls (md/0)` I get an unknown disk error. Trying an `ls` on any member device results in unknown filesystem. The filesystem on md0 is XFS, and I assume the unknown filesystem is normal if it's trying to read an individual disk instead of md0.
I have come close to losing my mind over this. I've tried uninstalling and reinstalling grub numerous times, `update-initramfs -u -k all` numerous times, `update-grub` numerous times, `grub-install` numerous times to all member disks without error, etc.
I even tried manually editing `grub.cfg` to replace all instances of `mduuid/b1c40379914e5d18dddb893b4dc5a28f` with `(md/0)` and then re-install grub, but the exact same error of no such device mduuid/b1c40379914e5d18dddb893b4dc5a28f still happened.
[URL] ....
One thing I noticed is that it is only showing half the disks. I am not sure whether this matters, but one theory is that it is because there are two LSI cards physically in the machine.
This last screenshot was taken after I specifically altered `grub.cfg` to replace all instances of `mduuid/b1c40379914e5d18dddb893b4dc5a28f` with `mduuid/2c61b08d-cb1f-4c2c-8ce0-eaea15af32fb` and then re-ran grub-install on all member drives. Where it is getting this old b1c* address I have no clue.
I even tried installing a SATA drive on /dev/sda, outside of the array, and installing grub on it and booting from it. Still, same identical error.
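For what it's worth, the blkid output above answers one of the questions: the b1c4... string grub wants is the mdadm array UUID of the linux_raid_member on /dev/sdd2 (b1c40379-914e-5d18-dddb-893b4dc5a28f with the dashes stripped), not the xfs filesystem UUID, so grub looking for it is expected; the failure is that its raid probe can't piece the array together. One hedged thing to try (a sketch, not a guaranteed fix) is forcing grub to preload its 1.x-metadata raid module so the search can succeed:
Code: Select all
# in /etc/default/grub, then re-run update-grub && grub-install on the member disks:
GRUB_PRELOAD_MODULES="mdraid1x"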
View 14 Replies
Oct 27, 2010
We have some servers that run in very harsh environments (research vessel) that need to have high availability. We have software RAID 1 for some measure of resiliency, along with proper data backups (tapes etc), however we would like to be able to break out a new server and re-image it (including RAID setup) from a known good copy if the hardware completely fails on the production box. Simplicity of the process is a big plus. I am interested in any advice on the best way to approach this. My current approach (relatively new to Linux administration, totally new to mdadm) is to use dd to take a complete gzipped copy of one of the RAID'ed devices (from a live CD):
Code:
dd if=/dev/sda bs=4096 | gzip -c > /mnt/external/image/test.img
then reverse the process on the new PC, finally using mdadm --assemble to re-create and re-build the array.
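Restoring would presumably be the mirror image (a sketch; it assumes the replacement disk is at least as large and shows up as /dev/sda on the live CD):
Code:
gunzip -c /mnt/external/image/test.img | dd of=/dev/sda bs=4096
# the copied superblock identifies the array; assemble, then add
# the blank second disk and let it resync:
mdadm --assemble --scan
mdadm /dev/md0 --add /dev/sdb1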
View 1 Replies
Aug 12, 2010
I've got an 8-disk raid-5 setup, and one of the disks failed. I shut the system down, replaced it, and powered the box back on again. Then, I made a catastrophic mistake; I 'failed' and removed the wrong disk (should have been sdj1, and I typed sdk1 by accident). I tried to re-add sdk1 back to the raid array, but it got listed as 'spare'. My raid array is off-line, since I now have 2 disks unavailable.
I know that the data still exists on sdk1, is there any way I can get the raid array to recognise the fact that it's a valid part of the array, and not a spare disk? At least if I can do that, I'll have a degraded but accessible array, and then I can rebuild the array on the properly replaced disk.
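Since sdk1's superblock now records it as a spare, a plain re-add won't restore its data role. The commonly suggested first step, risky, so compare event counts before forcing anything, is to stop the array and force-assemble it with every member listed (the member list below is a guess from the names mentioned):
Code:
mdadm --stop /dev/md0
mdadm --examine /dev/sd[c-k]1 | egrep '/dev/|Events'
mdadm --assemble --force /dev/md0 /dev/sd[c-k]1
If the spare designation survives that, the last resort people use is re-creating the array with --assume-clean in the exact original device order, but that destroys data if any parameter is wrong, so it belongs after a full ddrescue backup.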
View 7 Replies
Dec 19, 2010
I went to set up my Linux box and found that the OS drive had finally died. It was an extremely old WD Raptor drive in a hot box full of drives, so it was really only a matter of time before it just quit on me. Normally this wouldn't be such a big deal; however, I had just recently constructed an md RAID5 array of 3 1TB disks to act as an NFS mount for basically all of my important files. Maybe 2-3 weeks before the failure I had finished moving all of my most important stuff onto that array. Now, I know that the array is intact; all the required data is sitting on those disks. Since only the OS-level disk failed on me, I should be able to get a new disk in there, reinstall Ubuntu and then rebuild that array. How exactly do I go about doing that with mdadm? Do I create the array from the /dev device nodes like when I initially built the array?
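The short answer is: assemble, never re-create. mdadm --create writes fresh superblocks, while --assemble reads the ones already on the disks. A sketch of the usual sequence after the reinstall:
Code:
mdadm --assemble --scan                         # finds the array by its superblocks
mdadm --detail --scan >> /etc/mdadm/mdadm.conf  # persist it
update-initramfs -u                             # so it assembles on boot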
View 2 Replies
Jun 9, 2011
Following scenario: My server in some data center on a different continent with two disks and software raid 1.
One day I see that a disk failed (for example with /proc/mdstat). Of course I should replace the failed disk asap. Now that I think about it, I am not sure how. What should my email to the data center support guy mention to make sure that guy doesn't replace the wrong disk?
With hardware RAID it is very easy, because the controller usually has some kind of red LED indicator. But what about software raid?
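With software raid the disk serial number stands in for the red LED: the email can say exactly which serial to pull (or, if the dead disk no longer answers, which serial to keep). A sketch, device names assumed:
Code:
cat /proc/mdstat                        # which member dropped
mdadm --detail /dev/md0                 # confirm the failed slot
smartctl -i /dev/sdb | grep -i serial   # serial of the failed disk, if it responds
smartctl -i /dev/sda | grep -i serial   # otherwise: serial of the disk to keep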
View 8 Replies
Apr 4, 2010
I have installed a Fedora Core 12 Linux system onto a RAID 1 file system. I now need a way of getting a notification if a disk fails. Is there an SNMP MIB that covers Intel RAID? I have done the searching, but still the answer eludes me.
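If the mirror is mdadm-managed rather than firmware RAID, one non-SNMP fallback is mdadm's own monitor mode, which can mail on fail/degraded events. A sketch:
Code:
mdadm --monitor --scan --mail=root --delay=300 --daemonise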
View 1 Replies
Feb 1, 2011
Could any RAID gurus kindly assist me with the following RAID-5 issue? I have an mdadm-created RAID5 array consisting of 4 discs. One of the discs was dropping out, so I decided to replace it. Somehow this went terribly wrong, and I succeeded in marking two of the drives as faulty and then re-adding them as spares.
Now the array is (logically) no longer able to start:
mdadm: Not enough devices to start the array. Degraded and can't create RAID, auto stop RAID [md1]
I was able to examine the disks though:
Code:
root@127.0.0.1:/etc# mdadm --examine /dev/sdb2
/dev/sdb2:
Magic : a92b4efc
Version : 00.90.00
code....
Code:
mdadm --create /dev/md1 --assume-clean --level=5 --raid-devices=4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
As I don't want to ruin the maybe small chance I have left to rescue my data, I would like to hear the input of this wise community.
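For what it's worth, a forced assembly is the gentler thing to try before any re-create; and if it does come to --create --assume-clean, the flags must match the original exactly (the 0.90 metadata shown by --examine, the original chunk size, and the original device order) or the data won't line up:
Code:
mdadm --assemble --force /dev/md1 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2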
View 4 Replies
Jul 18, 2011
I have a raid5 on 10 disks, 750 GB, and it has worked fine with grub for a long time on Ubuntu 10.04 LTS. A couple of days ago I added a disk to the raid, grew it and then resized it... BUT I started the resize process in a terminal on another computer, and after some time my girlfriend powered down that computer!
So the resize process was cancelled in the middle, and I couldn't access any of the HDDs, so I rebooted the server.
Now the problem: the system is not booting up, just a black screen with a blinking cursor. I used a rescue CD to boot it up, finished the resize process, and the raid seems to be working fine, so I tried to boot normally again. Same problem. Rescue CD, updated grub, got several errors: error: unsupported RAID version: 0.91. I have tried to purge grub, grub-pc, grub-common, removed /boot/grub and installed grub again. Same problem.
I have tried to erase the MBR (# dd if=/dev/null of=/dev/sdX bs=446 count=1) on sda (IDE disk, system) and sdb (SATA, new raid disk). Same problem. Removed and reinstalled Ubuntu 11.04 and am now getting error: no such device: (hdd id). Again tried to reinstall grub on both sda and sdb, no luck. update-grub still generates the error about raid version 0.91, and normal boot is back to a blinking cursor. When you're resizing a raid, mdadm changes the metadata version from 0.90 to 0.91 to guard against exactly the kind of interruption that happened here. But since I have completed the resize process, mdadm has indeed changed the version back to 0.90 on all disks.
I have also tried to follow a howto on a similar problem with a patch on [URL], but I can't compile it; various errors about dpkg. So my problem is: I can't get grub to work. It just gives me a blinking cursor and unsupported RAID version: 0.91.
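One hedged check before fighting grub further: confirm that every member really reports 0.90 again, since a single stale 0.91 superblock is enough to trip grub's raid probe, then reinstall with a recheck (the device list is an assumption):
Code:
mdadm --examine /dev/sd[b-l]1 | grep -i version
grub-install --recheck /dev/sda
update-grub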
View 2 Replies
Jul 11, 2010
I'm writing a monitoring plugin for a home server RAID, mdadm on Ubuntu 10.04. code...
I'm looking for the possible values of "state" but can't seem to find it anywhere, neither man nor the online documentation I have found seem to have a list.
Does anyone know where to find a list of possible states?
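For the kernel-side state, Documentation/md.txt in the kernel tree lists the values exposed through sysfs: clear, inactive, suspended, readonly, read-auto, clean, active, write-pending and active-idle. A plugin can poll that file directly; the State line in mdadm --detail is built from these plus flags like degraded and recovering:
Code:
cat /sys/block/md0/md/array_state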
View 1 Replies
Mar 10, 2010
I have done lots of searching and I haven't been able to find anyone else with the same problem. Whenever I create a RAID with 'mdadm', regardless of level (I've done linear, 0, and 5) the command I use is:
Code:
mdadm --create --run --verbose /dev/md0 --raid-devices=11 --spare-devices=1 --chunk=256 --level=5 /dev/sd[abcdefghijkl]1
The RAID is built as RAID 5, as it should be. However, when I check /proc/partitions it shows up as "md3".
[Code]...
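A guess at the cause: without a matching ARRAY line in mdadm.conf, the auto-assembly that runs at boot (or a stale preferred-minor in the superblock) can bring the device up under a different md number than the one used at creation. Pinning it would look like:
Code:
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u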
View 1 Replies
Oct 21, 2010
I have a previously defined RAID5 (4 disks). This worked in Ubuntu 8.04. I recently moved to CentOS 5 and I cannot seem to get the drive back online. cat /proc/mdstat shows no raid levels (personalities) and no drives listed. mdadm --detail --scan returns nothing. mdadm -QE returned a UUID string and the ARRAY output. I can mdadm --examine all the members of the original array. I am not versed enough in mdadm to really understand what I can run, and should not run, without erasing the data on the drives. Please assist; I will try to post exact output of commands, but the system is kind of unreachable and being rebuilt. I just want to ensure my data on the array is not lost.
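Empty personalities usually just means the raid5 module isn't loaded, and CentOS 5 won't auto-assemble an array it has no config entry for. Everything below only reads metadata or assembles; nothing writes new superblocks:
Code:
modprobe raid456
mdadm --examine --scan | tee -a /etc/mdadm.conf
mdadm --assemble --scan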
View 7 Replies
Jun 7, 2010
I just had a whole 2TB software RAID 5 blow up on me. I rebooted my server, which I hardly ever do, and lo and behold I lose one of my raid 5 sets. It seems like two of the disks are not showing up properly. What I mean by that is that the OS picks up the disks, but it doesn't see the partitions.
I ran smartctl on all the drives in question and they're all in good working order.
Is there some sort of repair tool I can use to scan the busted drives (since they're available) to fix any possible errors that might be present?
Here is what the "good" drive looks like when i use sfdisk:
Quote:
sudo sfdisk -l /dev/sda
Disk /dev/sda: 121601 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
Device Boot Start End #cyls #blocks Id System
/dev/sda1 0+ 121600 121601- 976760001 83 Linux
/dev/sda2 0 - 0 0 0 Empty
[Code]....
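Because v0.90 superblocks live near the end of each member partition, disks that only lost their partition tables are usually recoverable by restoring the table, not by repairing the data. If all members were partitioned identically to the good drive, the classic trick (dangerous if the device letters are wrong; check them three times) is to copy its layout across:
Code:
sudo sfdisk -d /dev/sda | sudo sfdisk /dev/sdb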
View 2 Replies
Oct 6, 2010
Can I use UUIDs to setup a raid with mdadm?
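Yes: the array UUID printed by mdadm --examine can stand in for device names both on the command line and in the config file. A sketch with a placeholder UUID:
Code:
mdadm --assemble --scan --uuid=a1b2c3d4:e5f60718:29304a5b:6c7d8e9f
# or persistently in /etc/mdadm/mdadm.conf:
# ARRAY /dev/md0 UUID=a1b2c3d4:e5f60718:29304a5b:6c7d8e9f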
View 3 Replies
Feb 1, 2011
My home-backup server, with 8 x 2TB disks, won't boot anymore. Two disks failed at the same time and I rebuilt the raid 6 array without any problem, but now I can't boot the OS. I'm using Ubuntu Server 10.10. I've made screenshots of the displays rather than copying everything here. The problem at boot:
And the grub config: It's not a production server, but I would like to have it online. I've tried for the last 2 days (just a couple of hours a day) but without success. I was advised to do "mount -o remount,rw /" and then edit /etc/fstab, but I get a "file doesn't exist" error.
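Without the screenshots it's hard to be specific, but the generic recovery route for a box in this state is to assemble from live media and chroot in to repair fstab and grub (a sketch; the md name and mount points are assumptions):
Code:
sudo mdadm --assemble --scan
sudo mount /dev/md0 /mnt
for d in dev proc sys; do sudo mount --bind /$d /mnt/$d; done
sudo chroot /mnt
# then inspect /etc/fstab, run update-grub and grub-install from inside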
View 2 Replies
Mar 25, 2011
I'm having trouble with Ubuntu 10.10 and stable device names. When I installed Ubuntu, the root drive was the only one in the machine; it obviously got /dev/sda.
After the base installation, I installed three additional 2TB drives to make a RAID-5 array. Ubuntu renamed the root drive to /dev/sdd. While annoying, I lived with it.
After creating a single partition set to "Linux raid autodetect" on each drive, I created the RAID-5 array:
Code:
All was going well until a reboot. When rebooting, Ubuntu decided to make the root drive /dev/sda this time, and now mdadm --detail /dev/md0 reports:
Code:
How to fix the array and make the device names stable?
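The kernel probes controllers in whatever order they answer, so the letters themselves can't really be pinned; but md doesn't need them to be, since assembly keys on the superblock UUIDs. Persisting that makes the renaming harmless:
Code:
mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
sudo update-initramfs -u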
View 1 Replies
Aug 14, 2010
I'm running a Debian home server with a 3-disk (1GB each) raid 5 array using mdadm (the OS is on a separate disk). Now smartmontools has noticed some bad sectors on one of the disks, and I'm not sure what to do next (except back up the valuable data). I found some articles on how to fix these sectors, but I'm unaware of what the result would be for the whole array.
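On an array that still has its redundancy, md can handle pending bad sectors itself: a check pass reads every stripe and rewrites unreadable blocks from parity, letting the disk remap them, with no effect on the array's contents. The usual scrub:
Code:
echo check > /sys/block/md0/md/sync_action   # read-verify the whole array
cat /proc/mdstat                             # watch progress
cat /sys/block/md0/md/mismatch_cnt           # should stay at or near 0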
View 4 Replies
Feb 20, 2011
My only goal is to have a raid-5 that auto-assembles and auto-mounts. Hardware: 4*2TB sata (raid disks), 1*500GB IDE (OS disk), 1*DVD IDE all plugged direct into the motherboard (nForce 750i SLI).
Starting partitions on the raid disks: gpt, ext4. The problem occurs when I restart my comp after building it for the first time. I am able to see it assemble, I am able to partition it, I even mounted it once. This is the second time I've built it, so I have watched everything that happened. I don't know if this has anything to do with my problem, but when I created the raid my drive designations were: sda - 500GB (OS), sd[bcde] - 2TB (raid). When I restarted: sd[abcd] - 2TB (raid), sde - 500GB (OS).
[Code]...
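The designation shuffle is harmless once both halves key on UUIDs: an ARRAY line for assembly and an fstab entry by filesystem UUID for mounting (both values below are placeholders):
Code:
# /etc/mdadm/mdadm.conf
ARRAY /dev/md0 UUID=a1b2c3d4:e5f60718:29304a5b:6c7d8e9f
# /etc/fstab
UUID=0f1e2d3c-placeholder  /mnt/raid  ext4  defaults  0  2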
View 3 Replies
Aug 21, 2010
I HAD a fedora 11 server with md RAID 1 across two 1TB SATA drives. The md0 space was set up to be an LVM PV and the single LVM VG was carved up into 5 or 6 LVs. The MB on this system died and I wound up buying a new one.
Now I want to recover the data from the RAID1 setup on the new server. However, when I attach the two 1TB drives to a new fedora 13 setup, mdadm is only able to find one of the two drives. The partition on the second drive shows "busy" during an mdadm -A -s -v to scan for md volumes.
Well, one drive should be enough since this is RAID1, right? Well, when I do a pvscan -v, the other drive shows up as a "NEW" pv not allocated to a VG. In addition, vgscan does print "Invalid metadata header checksum" when it runs but it doesn't point at any particular PV. I'm afraid to go any further with LVM since I can't afford to lose the data on this system. It is backed up offsite, but the restore will take several days and I can't afford to be down that long.
Are there any tools or techniques where I can dig deeper into what each drive, in the RAID1 pair, has right and wrong with it and pick one that I can force into a usable VG so that I can recover the data?
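A hedged way to dig deeper without writing anything (device names assumed): compare what each half's md metadata and LVM label say, and find out what is holding the "busy" partition; an auto-started md or a dmraid map is the usual suspect:
Code:
mdadm --examine /dev/sda1 /dev/sdb1   # event counts, UUIDs, states
cat /proc/mdstat                      # was one half already grabbed, e.g. as md127?
ls /sys/block/sdb/sdb1/holders/       # what owns the busy partition
pvdisplay /dev/sda1 /dev/sdb1         # which half carries the intact PV label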
View 2 Replies
Mar 2, 2011
I have a server that was running a hardware isw raid on the system (root) disk. This was working just fine until I started getting sector errors on one of the disks. So I shut down the system, removed the failing drive and installed a new drive (same size). On reboot I went into the Intel raid setup; it did show the new drive, and I was able to set it to rebuild the raid. Continuing the reboot, everything came up just fine except the raid 1 on the system disk. I have tried many times to get the system to rebuild the raid using dmraid, but to no avail; it would not start a rebuild. In order to get the system back up and make sure that the disk was duplicated, I was able to 'dd' the working disk to the new disk that was installed. At present the system does not show up with a raid setup on the system disk (this comprises the entire 1TB disk, with two partitions: sda1 as / and sda2 as swap). Problem: I have decided to forego the Intel raid and just use mdadm. I have a test system set up to duplicate the server setup (not the software, but the disk partitions).
Code:
[root@kilchis etc]# fdisk -l
Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
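For the migration itself, the usual mdadm pattern, sketched here with hypothetical partition names (0.90 metadata keeps old bootloaders happy), is to build a degraded mirror on one disk, copy the system onto it, then add the other disk as the second half:
Code:
mdadm --create /dev/md0 --metadata=0.90 --level=1 --raid-devices=2 missing /dev/sdb1
mkfs.ext3 /dev/md0
# copy /, point grub at the new disk, boot from it, then:
mdadm /dev/md0 --add /dev/sda1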
View 12 Replies
Sep 10, 2010
I have a 7-drive RAID array on my computer. Recently my SATA PCI card died, and after going through multiple cards to find another one that worked with Linux, I now can't assemble the array. The drives are no longer in the order they were in previously, and mdadm can't seem to reassemble the array. It says there are 2 drives and one spare, even though there were 7 drives and no spares. I know for a fact that none of the drives are corrupted, because one of the non-working RAID cards was still able to mount the array for a short period, but would lose the drives during resyncing (I later found out that the chipset on the card had extremely limited Linux support). I have tried running "mdadm --assemble --scan", and after the array is partially assembled, I add the other drives with "mdadm --add /dev/md0 /dev/sdc1". These both return errors and will not complete on the new raid card.
Code:
aaron-desktop:~ aaron$ sudo mdadm --assemble /dev/md0
mdadm: /dev/md0 assembled from 2 drives and 1 spare - not enough to start the array.
[code]....
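Two details worth knowing: drive order is irrelevant to assembly (each superblock records its slot), and --add against a partially assembled, inactive array is precisely what turns former members into spares. The gentler probe is to stop everything and assemble with the full member list so --force can reconcile the event counts (partition names assumed):
Code:
mdadm --stop /dev/md0
mdadm --examine /dev/sd[b-h]1 | egrep '/dev/|Events|State'
mdadm --assemble --force /dev/md0 /dev/sd[b-h]1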
View 4 Replies
Mar 3, 2010
I have a 4-drive RAID 5 array set up using mdadm. The system is stored on a separate physical disk outside of the array. When reading from the array it's fast, but when writing to the array it's extremely slow: down to 20MB/s, compared to 125MB/s reading. It does a bit, then pauses, then writes a bit more and then pauses again, and so on. The test I did was to copy a 5GB file from the RAID to another spare non-raid disk on the system: average speed 126MB/s. Copying it back onto the RAID (into another folder) the speed was 20MB/s. The other thing is a very slow (several KB/s) write speed when copying from an eSATA drive to the RAID.
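RAID5 writes always pay a read-modify-write penalty, but write speed collapsing to a sixth of read speed with pauses often points at an undersized stripe cache. Two hedged tunables to experiment with (the values are starting points, not gospel):
Code:
echo 8192 > /sys/block/md0/md/stripe_cache_size   # default 256; memory use scales with devices
blockdev --setra 4096 /dev/md0                    # larger readahead on the array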
View 9 Replies
Dec 3, 2010
I have 4 SATAs in a RAID 5 array using mdadm. Yesterday when I started the computer, the RAID did not build/mount. When trying to load the array manually I get the message "mdadm: cannot open device /dev/sd(a,b,c,d)1: Device or resource busy". The drives should not be mounted or in use. The output of the drives in mdadm (mdadm --examine /dev/sd_1) looks normal.
The weirdest part is that rebooting often changes which drive is marked as busy; it can be any of the 4 SATA drives. How do I figure out why/what is using them and how to disable it? I have tried searching for similar threads here and on Google and haven't found anything similar or that worked.
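"Busy" with nothing mounted usually means some other layer claimed the disks first: dmraid picking up stale fakeraid signatures, or a half-assembled md, both of which can plausibly move between drives across boots. Hedged places to look:
Code:
cat /proc/mdstat                # a partially assembled md holding members?
sudo dmsetup table              # device-mapper maps sitting on a disk?
ls /sys/block/sda/holders/      # exactly what owns the device
sudo dmraid -r                  # any fakeraid metadata being claimed?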
View 3 Replies
Feb 5, 2011
I am trying to create a new mdadm RAID 5 device /dev/md0 across three disks where such an array previously existed, but whenever I do, it never recovers properly and tells me that I have a faulty spare in my array. More specific details below. I recently installed Ubuntu Server 10.10 on a new box with the intent of using it as a NAS sort of thing. I have 3 HDDs (2 TB each) and was hoping to use most of the available disk space as a RAID5 mdadm device (which gives me a bit less than 4TB).
I configured /dev/md0 during OS installation across three partitions on the three disks - /dev/sda5, /dev/sdb5 and /dev/sdc5, which are all identical sizes. The OS, swap partition etc. are all on /dev/sda. Everything worked fine, and I was able to format the device as ext4 and mount it. Good so far.
Then I thought I should simulate a failure before I started keeping important stuff on the RAID array - no point having RAID 5 if it doesn't provide some redundancy that I actually know how to use, right? So I unplugged one of my drives, booted up, and was able to mount the device in a degraded state; test data I had put on there was still fine. Great. My trouble began when I plugged the third drive back in and re-booted. I re-added the removed drive to /dev/md0 and recovery began; things would look something like this:
Code:
user@guybrush:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdc5[3] sdb5[1] sda5[0]
3779096448 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]
[Code]...
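A rebuild that repeatedly ends with the new disk flagged as a faulty spare very often means one of the two surviving members is throwing read errors partway through the resync, so the disks to interrogate are the sources, not the re-added one. A sketch (device names assumed):
Code:
dmesg | egrep -i 'ata|medium|i/o error' | tail -50
smartctl -A /dev/sda | egrep -i 'reallocat|pending'
smartctl -t long /dev/sdb        # repeat for each surviving member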
View 9 Replies
Apr 5, 2011
I bought a disk from a friend who had used it in a raid array, using the entire disk for raid. To put that disk into service, I used dd-rescue to copy my old disk entirely, and managed to grow and set up the partition table without losing any data. My last step was to create a RAID between my entire old disk, with a single partition, and a partition of the same size on my new disk. I ran into some problems, but managed to somehow fix them imperfectly, and now this setup is working properly. The problems (and the imperfection) came from an issue I did not suspect: at some point, the original RAID superblock of the new disk, living on /dev/sda, survived dd-rescue, and so it is scanned by mdadm, which tries, obviously unsuccessfully, to use it.
Partition layout :
Code:
Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
[code]....
This setup is working properly besides this raid5 declared on sda, so it shows up here and there. Since it is using the same device name as my other, proper raid setup, I don't know how to deactivate it, since mdadm uses the /dev/mdX name to identify arrays.
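mdadm can address the stale metadata through the member device rather than the md name, so wiping it is possible, but only after confirming with --examine that the whole-disk superblock carries a different UUID from the real member's (assumed here to be /dev/sda1), and remembering that a v0.90 superblock sits near the end of the device, so the two can coincide if the partition runs to the very end of the disk. A cautious sketch:
Code:
mdadm --examine /dev/sda    # the stale whole-disk superblock
mdadm --examine /dev/sda1   # the real member - the UUIDs must differ
mdadm --zero-superblock /dev/sda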
View 4 Replies