CentOS 5 Hardware :: Getting Errors When Running Mdadm In --grow Mode
Jun 24, 2009
Is growing RAID 6 possible in CentOS 5.3? I'm getting errors when I run mdadm in --grow mode; it failed: 'mdadm --grow /dev/md0 -n 5 2>&1' -> mdadm: Cannot set device size/shape for /dev/md0: Invalid argument. Do I have to build a custom kernel for CentOS?
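For reference, a minimal sketch of the usual grow sequence, assuming the new disk's partition is /dev/sdX1 (a placeholder) and a kernel/mdadm combination new enough to reshape RAID 6:
Code:
# Check what the running kernel and mdadm support first; CentOS 5.3's
# 2.6.18 kernel predates mainline RAID 6 reshape support.
uname -r
mdadm --version
# Add the new device as a spare, then grow the member count:
mdadm /dev/md0 --add /dev/sdX1
mdadm --grow /dev/md0 --raid-devices=5
The "Invalid argument" error is consistent with the kernel simply not supporting the reshape, rather than with any misuse of mdadm.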
I have 2 servers (xen1 and xen2 are their hostnames) with the configuration below. Each server has 4 SATA disks, 1 TB each.
16 GB DDR3, Debian Squeeze x64 installed:
root@xen2:~# uname -a
Linux xen2 2.6.32-5-xen-amd64 #1 SMP Wed Jan 12 05:46:49 UTC 2011 x86_64 GNU/Linux
Storage configuration: the first 256 MB + 32 GB of 2 of the 4 disks are used as RAID1 devices for /boot and swap respectively. The rest of the space, 970 GB on all 4 SATA disks, is used as RAID10, with LVM2 installed over that RAID10. The volume group is named xenlvm (the servers are meant to serve as Xen 4.0.1 hosts, but this story is not about Xen troubles). /, /var and /home are located on small logical volumes (I just noticed I mixed up the LV names and partitions, but that's not the problem, I think):
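To make the layout concrete, here is a hedged sketch of how such a setup is typically built; all device names and sizes below are assumptions, not taken from the servers themselves:
Code:
# RAID1 pairs for /boot and swap on two of the disks (names assumed):
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
# RAID10 across the large partitions on all four disks:
mdadm --create /dev/md2 --level=10 --raid-devices=4 /dev/sd[abcd]3
# LVM2 on top of the RAID10, with the volume group named xenlvm as in the post:
pvcreate /dev/md2
vgcreate xenlvm /dev/md2
lvcreate -L 10G -n root xenlvm   # sizes are illustrative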
I have been running 10.04 with no problems for some time now. Today when I booted up it started checking the drive for errors, and I just left it to do its thing. I came back to this warning screen: "Ubuntu is running in low-graphics mode. Your screen, graphics card, and input device settings could not be detected correctly." I've tried all of the offered options with no luck. When I select to run in low-graphics mode, it says "Stand by one minute while the display restarts...OK". I select OK and then it gets stuck checking for battery state. I try to reconfigure the graphics, and nothing happens when I select any of the options on the next screen.
I'm looking for insight on how it might be possible to grow an existing volume/partition/filesystem while it's in active use, and without having to add additional LUNs/partitions to do it. For example, the best way I can find to do it currently (and am using in production) is this: you have a system using LVM to manage a connected LUN (iSCSI/FC/etc), with a single partition/filesystem residing on it. To grow this filesystem (while it's active) you have to add a new LUN to the existing volume group, and then expand the filesystem. To date I have not found a way to expand a filesystem that is hosted by a single LUN.
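For clarity, the add-a-LUN workflow described above looks roughly like this; a hedged sketch in which the new LUN (/dev/sdX) and the VG/LV names are assumptions:
Code:
pvcreate /dev/sdX                  # initialize the newly presented LUN
vgextend datavg /dev/sdX           # add it to the existing volume group
lvextend -L +100G /dev/datavg/datalv
resize2fs /dev/datavg/datalv       # grow an ext3/ext4 filesystem online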
For system context, I'm running a 150 TB SAN with over 300 spindles, to which about 50 servers are connected. It is an equal mix of Linux, Windows, and VMware hosts connected via both FC & iSCSI. With both Windows & VMware, the aforementioned task of expanding a single LUN and having the filesystem expanded is barely a 1-minute operation that "just works". If you can find me a sweet way to seamlessly expand a LUN and have a Linux filesystem expanded (without reboot/unmount/etc), I have cycles to test out any suggested methods/techniques, and am more than happy to report the results for anyone else interested. I think this is a subject where many people would like to find that magic method to make all our lives much easier.
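One commonly cited candidate, offered only as a hedged sketch to test (device, VG and LV names are assumptions): grow the LUN on the array, have the kernel rescan its size, then resize the PV in place instead of adding a second LUN:
Code:
echo 1 > /sys/block/sdX/device/rescan    # pick up the LUN's new size
pvresize /dev/sdX                        # grow the PV to fill the LUN
lvextend -L +100G /dev/datavg/datalv
resize2fs /dev/datavg/datalv             # online for ext3/ext4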
I installed CentOS 5.5. After the install, I decided to put in 3 identical disks for RAID 5. All the disks are IDE disks. Then I put in a SATA disk and partitioned it to add another partition to the RAID 5 array. Everything worked fine until I rebooted my system. After the reboot, the SATA partition I added to RAID 5 showed as removed. I had to re-add it using "mdadm --add" to make the RAID 5 array work.
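A common cause is that the array membership isn't recorded anywhere that boot-time assembly can see. A hedged fix on CentOS 5 is to write the scanned arrays into /etc/mdadm.conf:
Code:
# Append the current arrays to the config consulted at assembly time:
mdadm --detail --scan >> /etc/mdadm.conf
# It is also worth confirming the SATA partition's type is "fd"
# (Linux raid autodetect), e.g. with: fdisk -l /dev/sdX  (device name assumed)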
I have configured a syslog server with /var as a separate ext3 partition, to receive the logs and events from the clients and the firewall as well. The directory needs to grow dynamically as the logs are populated. Is there a way I can make the filesystem grow dynamically as and when the directory is full?
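Ext3 cannot grow by itself, but if /var sits on LVM the growth can at least be done online while logs are being written; a hedged sketch with assumed VG/LV names:
Code:
lvextend -L +5G /dev/vg0/var   # add space to the logical volume
resize2fs /dev/vg0/var         # ext3 supports online growth while mounted
One could wrap this in a cron job that checks df output and extends the volume when usage passes a threshold.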
I received some errors while running a benchmarking script to determine my ideal RAID chunk size. There are several errors in the kernel log regarding the SATA link, and eventually the two drives I have connected to a PCI Express x1 SATA card were no longer present in /dev/.
The script I was using is available here: [URL]. System specs: 1 x 500 GB Western Digital drive (system drive), 3 x 2 TB Samsung F4 drives (2 connected to the PCIe x1 card (SATA II) and 1 on an onboard SATA port (SATA I)), single-core AMD 64 on an SiS chipset, Debian 64-bit testing.
[Code]...
I rebooted the machine and everything appears to be happy. What do these errors mean? What steps should I take to prevent them in the future so they don't end up corrupting the array?
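Some hedged checks that are worth running after link errors like these, with device names assumed:
Code:
cat /proc/mdstat             # overall array state
mdadm --detail /dev/md0      # per-member status
smartctl -a /dev/sdb         # drive-side SMART error log (needs smartmontools)
dmesg | grep -i ata          # link resets recorded by the kernel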
It's been a while (maybe too long) since I last used yum to check for updates / install anything on my server, but when I tried the other day I got the following response:
# yum list updates
Loading "installonlyn" plugin
Setting up repositories
Error: Cannot find a valid baseurl for repo: updates
[code]......
Trying other mirror.
Error: Cannot open/read repomd.xml file for repository: updates
A few weeks ago there were some network changes, but nothing else should have changed OS-side. I have checked /etc/resolv.conf and that looks OK - ping and wget have no problem resolving hostnames. I tried setting enabled=0 in the .repo files - this allowed me to do a 'yum list', but 'yum list updates' returned an empty list - I am definitely not 100% up to date!
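A few hedged checks, since DNS itself seems fine:
Code:
yum clean all                        # discard stale cached metadata
grep -r baseurl /etc/yum.repos.d/    # confirm the repo URLs survived the network change
curl -I http://mirror.centos.org/    # confirm plain HTTP gets through (proxy rules?)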
P.S. Attachment not there at the moment - getting the following error when I try to upload the zip file:
Errors Returned While Uploading
Failed to open directory with write permission: /var/wwwthtml/oldwebcopy.centos.org/uploads/newbb
I tried to install Ubuntu 10.04 using the beta alternate install CD.
Everything went fine until the partitioning section.
I chose manual partitioning and all my existing partitions were detected correctly, including my 2 mdadm raid0 arrays.
I chose md0 as my / partition and chose to format the partition.
I chose md1 as my /home partition and chose to keep the data.
When I chose to continue and write the changes to disk, the install started to create an ext4 partition on md0; the installer then stopped with an error that the kernel could not reread the partition table.
I aborted the installation at this point.
Now I cannot access either of my arrays.
I have booted a livecd and installed mdadm. When I checked /etc/mdadm/mdadm.conf my existing arrays were already listed.
Code:
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#
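From the live CD, a hedged way to try bringing the arrays back up, since mdadm.conf already lists them:
Code:
sudo mdadm --assemble --scan       # assemble everything listed in mdadm.conf
cat /proc/mdstat                   # did md0/md1 appear?
sudo mdadm --examine /dev/sda1     # inspect a member superblock (device name assumed)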
I know this is probably a very generic linux question, but since I am using ubuntu - I thought it safer to ask here. I have jumped into the deep end of linux - and I am afraid that I will be forced to swim sooner rather than later.
Let me start at the beginning - I am, and probably will be, a windows fan for a long time - let me not list the reasons, or else you guys will probably hang me out to dry. The one thing I have discovered is that windows sucks at generating a software RAID - especially the RAID 5 that I was looking for. In any case, after losing plenty of data via windose, I decided to attempt linux/ubuntu. I must say - so far so good.
I used this excellent guide: [URL] and must say that the RAID is performing admirably - I am currently busy adding/growing the 12th 1 TB drive onto the RAID, and no issues so far (some other major WOW advantages I have noticed... like the speed writing to and reading from the RAID...).
My questions: if one drive fails on the array (for example sdk1), how the heck do I determine which physical hardware device it is that has failed (without compromising the other data - unfortunately I can't afford to back up 11 TB of data - personal server)? I don't have space in the box for a mouse - not even talking about a hot spare drive - so adding the backup drive before removing the faulty drive is rather difficult, but if that's the only option I will have to live with it. As everybody knows, RAID5 only has 1 drive of redundancy, so I would like to solve the issue as quickly as possible - without having to resort to disconnecting one drive at a time to determine which is which. And are the drive assignments (sda/sdb/sdc) constant?
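One hedged way to map a failed md member to a physical disk is by serial number, which can be matched against the label printed on the drive (sdk is the example name from above):
Code:
sudo smartctl -i /dev/sdk | grep -i serial               # needs smartmontools
# or, without smartmontools, via udev:
udevadm info --query=property --name=/dev/sdk | grep ID_SERIAL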
What is the most intuitive/fast way to determine that a faulty drive exists in the array? I.e., is there some sort of GUI solution for mdadm that will tell me the moment a drive has turned faulty? The box is currently not on the internet, meaning notification via email is not possible. Is there a non-destructive way to convert the RAID-5 to RAID-6? (I would rather sacrifice 1 TB of free space for peace of mind.) RAID6 would also make troubleshooting a bit easier, since 3 drives would have to fail before data loss.
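On both counts, hedged sketches: mdadm's monitor mode can run an arbitrary program on failure even with no email configured, and newer mdadm/kernel combinations can reshape RAID5 to RAID6 in place (the script path is hypothetical, and the member count of 13 assumes the 12-drive array above plus one new disk):
Code:
# Alert on failure without email; raid-alert.sh is a hypothetical script:
sudo mdadm --monitor --scan --daemonise --program=/usr/local/bin/raid-alert.sh
# In-place RAID5 -> RAID6 (needs one extra disk already added as a spare,
# plus mdadm >= 3.1 and a recent kernel):
sudo mdadm --grow /dev/md0 --level=6 --raid-devices=13 --backup-file=/root/md0.backup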
I am setting up a software raid6 for the first time. To test the raid I removed a drive from the array by popping it out of the enclosure. mdadm marked the drive as F and everything seemed well. From what I gather, the next step is to remove the drive from the array (mdadm /dev/md0 -r sdf), but when I try this I receive the error:
mdadm: cannot find /dev/sdf: No such file or directory
That is true: when I plugged the drive back in, the machine now recognizes it as /dev/sdk. My question is, how do I remove this non-existent failed drive from my array? I was able to re-add it just fine as /dev/sdk with mdadm /dev/md0 -a /dev/sdk.
Also, is there any way to pin a drive to the same device name based on its ID or something similar, to avoid this?
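Two hedged pointers: mdadm accepts the keyword "detached" to clear slots whose device node has vanished, and /dev/disk/by-id provides enumeration-stable names:
Code:
sudo mdadm /dev/md0 -r detached    # remove members whose /dev node is gone
ls -l /dev/disk/by-id/             # persistent, serial-based device names
# Arrays can be managed via by-id paths so renumbering no longer matters.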
I just had a whole 2 TB software RAID 5 blow up on me. I rebooted my server, which I hardly ever do, and lo and behold I lost one of my raid 5 sets. It seems like two of the disks are not showing up properly. What I mean by that is the OS picks up the disks, but it doesn't see the partitions.
I ran smartctl on all the drives in question and they're all in good working order.
Is there some sort of repair tool I can use to scan the busted drives (since they're available) to fix any possible errors that might be present?
Here is what the "good" drive looks like when I use sfdisk:
Quote:
sudo sfdisk -l /dev/sda
Disk /dev/sda: 121601 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sda1          0+ 121600  121601-  976760001  83  Linux
/dev/sda2          0       0       0          0    0  Empty
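Since the disks themselves are healthy, a hedged next step is to check whether the md superblocks survived before rewriting anything (sdb is an assumed name for one of the "bad" drives):
Code:
sudo mdadm --examine /dev/sdb    # any md metadata left on the raw disk?
# If only the partition table was lost, a tool such as testdisk can often
# rediscover and restore the old partition layout non-destructively.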
Something weird happened last night and my raid5 failed. I am trying to reactivate it and see whether my data is dead or not. When I run mdadm -Asv /dev/md0 I get:
Code:
mdadm: looking for devices for /dev/md0
mdadm: cannot open device /dev/dm-1: Device or resource busy
mdadm: /dev/dm-1 has wrong uuid.
mdadm: cannot open device /dev/dm-0: Device or resource busy
mdadm: /dev/dm-0 has wrong uuid.
mdadm: cannot open device /dev/sde2: Device or resource busy
mdadm: /dev/sde2 has wrong uuid.
mdadm: cannot open device /dev/sde1: Device or resource busy
mdadm: /dev/sde1 has wrong uuid.
mdadm: cannot open device /dev/sde: Device or resource busy
mdadm: /dev/sde has wrong uuid.
mdadm: cannot open device /dev/sdd: Device or resource busy
mdadm: /dev/sdd has wrong uuid.
mdadm: cannot open device /dev/sdc: Device or resource busy
mdadm: /dev/sdc has wrong uuid.
mdadm: cannot open device /dev/sdb: Device or resource busy
mdadm: /dev/sdb has wrong uuid.
mdadm: cannot open device /dev/sda: Device or resource busy
mdadm: /dev/sda has wrong uuid.
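The dm-0/dm-1 entries and the "busy" errors hint that device-mapper (dmraid or LVM) has already claimed the disks, which would block mdadm. Hedged checks:
Code:
sudo dmsetup ls              # what are the dm-0/dm-1 mappings?
sudo dmsetup remove_all      # release stale mappings (use with care)
sudo mdadm -Asv /dev/md0     # then retry assembly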
I have SLES10-SP3 running on an Intel SR1600URHS board with 3 hot-swap SATA disks configured using mdadm as RAID1 with a hot spare. If I pull one of the active disks, all file I/O stops for about 2.5 minutes, after which it starts again and the raid array is rebuilt using the spare disk. Is there any way I can reduce these 2.5 minutes of inactivity? I've tried setting /sys/block/sdX/device/timeout and /sys/block/sdX/device/retries to 1 for all disks, but this hasn't made any difference. The output from messages is:
12:11:56: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
12:11:56: ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x1e data 0
12:11:56: res 40/00:03:00:00:20/00:00:00:00:00/b0 Emask 0x4 (timeout)
I am using mdadm 2.6.4 to manage RAIDs on Linux kernel 2.6.18. My query is that whenever I try to add a new disk to a running linear array (JBOD), I get the message "cannot add new disk to this array".
The exact steps are as follows. Create a new array: mdadm -C /dev/md0 -llinear -n2 /dev/sata1 /dev/sata2. It gets created and I am able to see it with the -D command.
Now add a new disk, sata3, as follows: mdadm --grow /dev/md0 --add /dev/sata3. I get this output:
md: sdb has invalid sb, not importing!
md: md_import_device returned -22
mdadm: cannot add new disk to this array.
So my first doubt is whether mdadm 2.6.4 supports this feature or not; if it does, do I need to change the driver?
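The "invalid sb" line suggests the kernel is rejecting a stale superblock on the incoming disk, rather than the grow feature being absent. A hedged thing to try (this wipes only the md metadata on that disk, nothing else):
Code:
mdadm --zero-superblock /dev/sata3
mdadm --grow /dev/md0 --add /dev/sata3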
I have a strange issue with my RAID5 array - it worked fine for a month; a couple of days ago it didn't start on boot, with mdadm reporting "Input/Output error". I didn't panic, restarted my computer - same error. Then I opened the Disk Utility and it reported State: Not running, partially assembled - I don't know why. I pressed Stop RAID Array and started it again, and voila - it reported State: Running. I checked the components list and there was nothing wrong with it. So I ran the Check Array utility, waited almost 3 hours for it to finish, and it worked since then, until this morning - I started my computer, and here we go, same error.
See screenshots:
This is an initial state just after computer startup:
This is after I stop and start RAID5:
This is a components list:
I can see nothing wrong there, yet I'm not sure why mdadm fails on boot. I do not really like this Windows-style solution: I guess when I check my array again, it will work fine again, but then it can fail the same way again without a known reason.
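Some hedged ways to capture why assembly fails at boot time, with member device names assumed:
Code:
dmesg | grep -i 'md\|raid'     # the kernel's view of the failed assembly
sudo mdadm --examine /dev/sd[bcd]1 | grep -E 'Events|State'   # members out of sync?
An event-count mismatch between members would explain an array that assembles only partially.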
When I try to run the Online Updates for the system through YaST I get the following errors for each package:
[QUOTE] Subprocess failed. Error: RPM failed: warning: /var/cache/zypp/packages/Packman Repository/Multimedia/i586/libaudcore1-2.4.4-1.pm.1.1.i586.rpm: Header V3 DSA/SHA1 Signature, key ID 9a795806: NOKEY
error: db3 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found
error: error(-30987) getting "" records from Requireversion index
error: db3 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found
[Code]...
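DB_PAGE_NOTFOUND from db3 usually points at a corrupted RPM Berkeley database rather than the packages themselves. A common, hedged recovery:
Code:
sudo rm -f /var/lib/rpm/__db.*   # remove stale lock/cache files
sudo rpm --rebuilddb             # rebuild the package database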
I have a previously defined RAID5 (4 disks). This worked in Ubuntu 8.04. I recently moved to CentOS 5 and I cannot seem to get the array back online. cat /proc/mdstat shows no raid levels (personalities) and no drives listed. mdadm --detail --scan returns nothing. mdadm -QE returned a UUID string and the ARRAY output. I can mdadm --examine all the members of the original array. I am not versed enough in mdadm to really understand what I can run, and should not run, without erasing the data on the drives. Please assist; I will try to post the exact output of commands, but the system is kind of unreachable and being rebuilt. I just want to ensure my data on the array is not lost.
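Hedged, read-only first steps (member names and the placeholder UUID are assumptions; --examine and --assemble do not write to the data area):
Code:
mdadm --examine /dev/sd[abcd]1                 # read each member's superblock
mdadm --assemble /dev/md0 --uuid=<UUID-from-examine>
cat /proc/mdstat                               # check whether the array came up
The step to avoid is running --create on the existing members; that is what can overwrite metadata.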
The system always boots up in graphical mode. After installing the web server, I want to disable graphical mode and have it boot to text mode to save memory. Is there a way to disable graphical mode?
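On a sysvinit distro such as CentOS 5 (an assumption about the system in question), the default runlevel lives in /etc/inittab; switching it from 5 to 3 makes the machine boot to text mode:
Code:
# Change id:5:initdefault: to id:3:initdefault:
sed -i 's/^id:5:initdefault:/id:3:initdefault:/' /etc/inittab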
I've been having trouble with software raid. In particular, the raid array becomes un-assemblable after reboots. The config is CentOS 5, 4 SATA disks (one 160 GB disk containing the OS, no raid, and 3 x 2 TB disks configured as a RAID 5 array - no spare drive). These drives were configured in anaconda and all seemed to go well (the drive and its LVM partitions worked and it finished rebuilding overnight). A couple of reboots later the drives cannot be assembled anymore and the machine won't boot. The error message says:
mdadm: /dev/md0 assembled from 1 drive and 1 spare - not enough to start the array.
Of course there are 3 drives and no spares in the array as configured. Manually starting the array with mdadm --assemble --scan gives the same message, as does assembling the drive by specifying the individual parts. /proc/mdstat does recognize the 3 drives, and when I look at the partition tables in fdisk, they show as being software raid. What could be wrong, or what steps can I take to diagnose it? I tried configuring the raid drives manually before going the anaconda route. Also, does anyone know how I can edit the /etc/fstab file to disable them so the machine will at least boot? The (Repair filesystem) shell has the / drive mounted read-only.
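To at least get the machine booting, / can be remounted read-write from the repair shell so /etc/fstab becomes editable (a hedged sketch):
Code:
mount -o remount,rw /
# then comment out the raid/LVM mount lines in /etc/fstab and reboot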
Trying to complete a RAID 1 mirror on a running system and have run into a wall at the last part: I can't add the active physical disk to the mirror. This is on a CentOS 5.6 x86_64 system. Anybody know where to go from here? I've tried adding the nodmraid option to the kernel boot line with no luck. Tried removing the logical volumes from LVM, but it won't let me. Not a Linux newbie, but I haven't set up a RAID in a long time.
[root@blackbox-0-2-e3-23-72-c5 ~]# mdadm /dev/md1 --add /dev/sda2 mdadm: Cannot open /dev/sda2: Device or resource busy
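"Device or resource busy" means something still holds /dev/sda2 open; hedged checks for what that might be:
Code:
cat /proc/mdstat        # is sda2 already claimed by an md array?
sudo dmsetup ls         # dmraid/LVM device-mapper mappings on the disk?
sudo lsof /dev/sda2     # any process holding it open?
If LVM is still active on sda2, the PV would have to be migrated off (e.g. with pvmove) before the partition can join the mirror.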
I installed freeradius 2.1.3 on Fedora 10 and want to use it with IEEE 802.1X using PEAP. When I run the command to start the radius service in debug mode, the following output comes up:
I've faced the problem of the server freezing on heavy write.
System
CentOS 5.5 x86_64 with latest updates and kernel (2.6.18-194.32.1). Also tried 2.6.18-194.26.1 and 2.6.37-2 from ELRepo with the same results. CPU: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz. Memory: 3 x 2 GB DDR3. HDDs: 2 x Western Digital WDC WD1002FBYS-02A6B0.
120 GB SATA HDD - primary OS drive
3 x 1.0 TB SATA HDD - RAID 5 array
This is on a C2D MSI P35 Platinum board. Anyway, I did a fresh install of F12 on the 120 GB drive, which I had problems with - Anaconda refused to see the drive. Fedora Live could see it fine, and it was listed as an 'nvidia_raid_member' - no idea why, but I completely erased the disk under the Live CD and proceeded to install F12.
Once F12 was installed, I used mdadm to reactivate my RAID 5 array, using 'sudo mdadm --assemble --uuid=(the uuid)' - and it started with only 2 of the 3 drives. My /dev/sdb drive did not activate into the array, due to what mdadm said was a mismatched UUID. OK, so I erased /dev/sdb, intending to rebuild the array. Erased /dev/sdb, and then attempted 'sudo mdadm --add /dev/md0 /dev/sdb' and I get this error: "mdadm: Cannot add disks to a 'member' array, perform this operation on the parent container" - I can find NO information on this error message.
[Code].....
I don't believe the hard drives are connected in the exact same order they were in before - I disconnected everything in the system and blew it out (it was pretty dusty).
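The 'member' array wording is what mdadm prints when an array carries external container metadata (e.g. Intel/nvidia fakeraid leftovers). Hedged checks, and one possible cleanup:
Code:
mdadm --detail /dev/md0        # look for a "Container" line in the output
mdadm --examine /dev/sdb       # leftover nvidia_raid_member metadata?
# dmraid can erase stale fakeraid metadata if that is what remains:
sudo dmraid -r -E /dev/sdb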
I have had a few X crashes and started to suspect compiz, as they usually happened when I was resizing a window or the window was wobbling.
Here is the .xsession-errors log (it's a hidden file in your home folder). It mentions:
Code:
WARNING: Application calling GLX 1.3 function "glXCreatePixmap" when GLX 1.3 is not supported!
This is an application bug!
Starting gtk-window-decorator
Unable to find a synaptics device.
I just reformatted my PC and installed Natty 11.04; my work files are all backed up in a separate /home directory that resides on another partition. That partition was not formatted or erased. I tried installing git through apt-get:
[code]...
I even followed the troubleshooting procedures for PackageManager here, but got the same man-db post-processing error. Any other logs I should look at?
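Two hedged places to look when a man-db trigger fails during an apt run:
Code:
sudo mandb -c 2>&1 | tail     # rebuild the man page database by hand and watch for errors
cat /var/log/apt/term.log     # the full output of the failing apt-get run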
I ran an update recently. It was about a 300 MB download, and with it were the current kernel and others. I restarted to find an error: compiz is no longer starting and there is a red minus icon on my bar saying there is a package error. It said to run "sudo dpkg --configure -a", but when I ran that I got: dpkg: failed to open package info file `/var/lib/dpkg/status' for reading: No such file or directory
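dpkg keeps backups of the status file, so a hedged recovery is to restore one and rerun the configure step:
Code:
sudo cp /var/lib/dpkg/status-old /var/lib/dpkg/status   # dpkg's own rolling backup
ls /var/backups/dpkg.status*                            # older daily backups, if needed
sudo dpkg --configure -a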
I am trying to set up a 5-node cluster and shared Coraid storage with conga, but it fails with "Shared Storage Support" checked. The message is: 'A problem occurred when installing packages: Packages of set "Clustered Storage" are not present in available repository', and it is shown under every node on the next screen after I submit. The PC where conga runs is on the same subnet (192.168.xxx.xxx) and it has the same /etc/hosts as the other nodes. A proxy also runs on that PC, and the nodes go out through it (that PC has 2 NICs). Every node (2.6.18-128.1.14.el5-xen-x86_64) is patched with the latest yum update (this morning); the same goes for the PC (2.6.18-128.1.14.el5). Every node has 4 NICs: 2 NICs towards the storage, the others in bonding towards the WAN. Every node is exactly alike; they have been installed with the ks.cfg generated from the first node, and they all have support for Clustering, Virtualization, and Clustered Storage.
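A hedged check, run on a node, that the cluster package group is actually reachable through the proxy (the exact group name varies slightly between CentOS/RHEL releases, so the one below is an assumption):
Code:
yum grouplist | grep -i cluster        # is a "Cluster Storage"-like group listed?
yum groupinfo "Cluster Storage"        # and are its packages resolvable?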