Server :: RAID5 Refuses To Start After Yanking Drive From SCSI Bus
Mar 23, 2010
I am setting up a new server and am in the midst of testing RAID. This is an Ubuntu 9.10 server. The RAID5 array (/dev/md1) is spread across 12 one-terabyte SCSI disks (/dev/sdi through /dev/sdt). It has four spares configured, each of which is also a one-terabyte SCSI drive (/dev/sdu through /dev/sdx). I have been following the instructions on the Linux RAID Wiki ([URL]....
I have already tested the RAID successfully by using mdadm to set a drive faulty. Automatic failover to spare and reconstruction worked like a champ. I am now testing "Force fail by hardware". Specifically, I am following the advice, "Take the system down, unplug the disk, and boot it up again." Well, I did that, and the RAID fails to start. It outright refuses to start. It doesn't seem to notice that a drive is missing. Notably, all the drive letters shift up to fill in the space left by removing a drive. The test I did was to:
[code]....
Is removing a disk from the bus a reasonable test in the first place? Meaning, is this likely to happen in a production environment by other means than a human coming by and yanking out the drive? Meaning, is there a hardware failure that would replicate this event? Because, if so, then I don't know how to recover from it.
May 24, 2010
I have one machine (out of a couple dozen) that continues to refuse to allow "vncserver :1" to start. It is perfectly happy with :2 - :9 but tells me :1 is already running - yet ps tells me that there is no running Xvnc instance.
What have I done here and how do I get that session back? I've cleared the /tmp/.X11-unix entries but that did no good. Could I have checked something in Yast that is launching an invisible session?
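One possible cause, offered as an assumption since the thread does not confirm it, is a leftover lock file. X servers (Xvnc included) conventionally record their PID in /tmp/.X<n>-lock next to the socket /tmp/.X11-unix/X<n>, and a crashed server can leave the lock behind. The sketch below lists such files for a display when the PID in the lock is not running. The function names are made up for illustration.

```python
import os
from pathlib import Path
from typing import List


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def leftover_display_files(display: int, tmp_dir: str = "/tmp") -> List[str]:
    """List lock/socket files for a display whose lock PID is not running."""
    lock = Path(tmp_dir) / f".X{display}-lock"
    sock = Path(tmp_dir) / ".X11-unix" / f"X{display}"
    if lock.exists():
        try:
            pid = int(lock.read_text().strip())
        except ValueError:
            pid = None  # unreadable lock content; treat it as leftover
        if pid is not None and pid_alive(pid):
            return []  # a live server really owns this display
    # Without a live owner, whatever exists here may make the display look busy.
    return [str(p) for p in (lock, sock) if p.exists()]
```

Calling leftover_display_files(1) on the affected machine would show whether /tmp/.X1-lock survived even though the /tmp/.X11-unix entries were cleared. It only reports; whether to delete the files is left to the admin.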
Nov 12, 2009
I'm trying to install two instances of OCS Inventory on the same machine using virtual hosts in Apache. The Apache configuration for it is in its own file, and it works fine without any virtual host. If I add one, Apache refuses to start without giving any error message. I narrowed it down to some lines loading Perl modules; if I comment them out, Apache will start again. Some of the modules work, but some of the ones specific to OCS Inventory will not. I can't understand what the difference is between loading them in a virtual host or not; it doesn't make any sense to me!
Feb 15, 2010
I am facing the same error on most of our HCL servers. The error is thrown while booting, but not on every boot. The error is:
Feb 13 13:17:25 fe13s kernel: Adapter 0: Bus A: The SCSI controller was reset due to SCSI BUS noise or an invalid signal. Check cables, termination, termpower, LVDS operation, etc.
Feb 13 13:17:30 fe13s kernel: Adapter 0: Bus B: The SCSI controller successfully recovered from a SCSI BUS issue. The issue may still be present on the BUS. Check cables, termination, termpower, LVDS operation, etc
Feb 13 13:29:15 fe13s kernel: Adapter 0: Bus B: The SCSI controller successfully recovered from a SCSI BUS issue. The issue may still be present on the BUS. Check cables, termination, termpower, LVDS operation, etc
code....
Apr 30, 2010
I am running Fedora 8 on a Dell 2950 with a QLogic Fibre Channel card. It is attached to a Dell/CLARiiON AX100. I had a drive assigned to this server which had been working for approximately a year. Recently, during a power failure, things turned ugly. Rebooting didn't fix things. I unassigned and reassigned the drive and rebooted; still no good. The funny thing is that when I unassign the drive and reboot, the system still sees /dev/sdb, but with no partitions, and fdisk -l does not display /dev/sdb. I even did "echo 1 > /sys/block/sdb/device/delete" and the device no longer showed in /dev, but after a reboot it is back. I pulled and reseated the fibre card, same
Apr 14, 2009
I just moved my / from sda1 to an IDE drive, hde1. I don't see how this could have caused any of these issues, but it did.
First, my network card failed to start. I added a line to my rc.local file (where I put all of the additional programs, etc. that I want to start): /etc/rc.d/rc.inet1
The above now starts my network card with my static IP configured. dhcpcd also worked, but I wanted this to be static.
Now Samba will not start. I have the following line in my rc.local: /etc/rc.d/init.d/samba start
This used to work just fine. At first I thought that Samba might be trying to start before my network card gets an IP, but the line is *after* the network startup line. Just to make sure, I made an additional script called startsamba, which contained a sleep 60 followed by samba start, to delay the startup of Samba even further.
The message Samba reports is very vague, something like "failed - core dumped". Most of the core dump log is garbage characters, but here is the beginning, which seems like it might contain some info:
Code:
ERROR: Can't log to stdout (-S) unless daemon is in foreground (-F) or interactive (-i)
After the system starts, I can drop to a console and type "/etc/rc.d/init.d/samba start" and the service starts just fine. I've also tried starting Samba manually with "smbd -d", which also produces the core dump when started from rc.local, but not when started from a console after startup.
Jan 9, 2010
I have no drive failures; I just need to re-create a RAID5 set as the next free MD device number. Originally I built a temporary Debian OS on a single drive and had 4x2TB drives in a RAID5 software array (MD0). This worked fine and allowed me to move all data to it and remove our old fileserver. I have now pulled out the 4x2TB RAID5 drives and created a new OS on two new 80GB drives, partitioned as follows:
MD0 is now 250mb Raid1 as /boot
MD1 is 4GB Raid1 Swap
MD2 is 76GB Raid1 as /
If I power off and push the 4x2TB drives back in, I cannot see an MD3. I presume I would need to create an MD3 from these 4 drives, but I don't want to mess things up as it's live data. So I'm here asking for help, or a bit of hand-holding to get it done right.
PS - It's a fresh Debian Lenny 5.0.3 RAID1 install replacing a Debian Lenny 5.0.3 install on a single disk.
Nov 3, 2009
ProFTPD had been running OK until recently, when I tried to connect using my laptop. All I get is the proftpd-socket file under /var/run/proftpd/. I can't find any log files with error messages in them. I have checked /var/log/messages; nothing in there either. I have rebooted the machine after reinstalling proftpd. Still it won't start. Is there any way to find out why proftpd has decided not to start any more?
Oct 17, 2010
If I try to start gparted, sometimes it works fine, other times, it throws this error:
Code:
Could not initialise connection to hald.
Normally this means the HAL daemon (hald) is not running or not ready
I have tried manually starting the HAL daemon, but nothing happens.
Sep 8, 2010
All of a sudden the Knetworkmanager icon disappeared from my panel and when I try to start it from a console I get the following output.
ion@linux-4cfs:~> knetworkmanager
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
[code]...
Jul 5, 2010
After upgrading to v10.60, Opera will not run from the KDE menu. An entry appears in the taskbar and then disappears without opening any window. The entry for Opera in the KDE menu has this command:
Code:
/usr/bin/opera %u
If I open a terminal and type "opera" or navigate to /usr/bin and click on the opera script there, it starts up just fine. Seems like it should be simple to fix, but I have tried various entries in the KDE menu ("opera", "opera %u", "/usr/bin/opera") and this same behavior occurs with any of them.
Jul 27, 2010
After a failed upgrade from 9.10 to 10.04 I had to format my computer and do a clean install of 10.04, and now my mdadm RAID5 array won't start. My array is called "The Library", and I believe the space between "The" and "Library" is causing the command Disk Utility uses to start the array to fail. The exact error is:
An error occurred while performing an operation on "The Library" (RAID-5 Array): The operation failed
Error assembling array: mdadm exited with exit code 1: mdadm: unrecognised word on ARRAY line: Library
mdadm: unrecognised word on ARRAY line: Library
[code]....
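The error suggests that mdadm splits ARRAY lines into whitespace-separated words and then stumbles over "Library", which has no key=value form. As a rough sketch under that assumption, the following scans mdadm.conf text for such stray words. The function name is hypothetical, and the parsing is deliberately simplified: quoting and abbreviated keywords are ignored.

```python
from typing import List, Tuple


def suspicious_array_words(conf_text: str) -> List[Tuple[int, str]]:
    """Return (line number, word) pairs on ARRAY lines that are not key=value."""
    problems = []
    for lineno, line in enumerate(conf_text.splitlines(), start=1):
        words = []
        for word in line.split():
            if word.startswith("#"):
                break  # the rest of the line is a comment
            words.append(word)
        if not words or words[0].lower() != "array":
            continue
        for index, word in enumerate(words[1:], start=1):
            # The first word after ARRAY may be the device name, e.g. /dev/md0.
            if index == 1 and (word.startswith("/") or word == "<ignore>"):
                continue
            if "=" not in word:
                problems.append((lineno, word))
    return problems
```

Run against the contents of the config file, a line such as "ARRAY /dev/md0 level=raid5 name=The Library" (a made-up example, not taken from the poster's system) would be reported with the word "Library".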
Apr 14, 2010
Suddenly I noticed that my whole file system had gone into read-only mode. My first thought was that the SATA data cable had come loose on one of the drives, but that wasn't it. All cables were connected correctly. So I booted up again, but I only got to a rescue mode terminal.
I have four software MD raid volumes:
Code:
Running mdadm -D on the volumes told me that the sdc drive had been kicked out from both md0 and md1. However, md3 had kicked out two drives, so I couldn't get any information from mdadm -D on that. For md0 and md1 I could just add the kicked-out partitions back into the volume, but for md3 I don't even know which partitions got kicked out...
Here are some outputs:
Before I rebooted the first time I saved the 200 last rows of dmesg to a memory stick. Here they are:
Code:
Trying to restart the md3 volume in the rescue mode terminal:
Commands:
Code:
Output:
Code:
The "Array State" row seems interesting. I guess that AAAA means all four drives are OK. But then why does the array state differ between the members?
Does anyone know how to figure out which two members got kicked out? And how do I get them back in (assuming that they're OK)?
Apr 11, 2011
In my understanding, /proc/partitions gets populated in the same fashion as /proc/scsi/scsi, i.e. the device described by the first entry of /proc/scsi/scsi is also the first entry of /proc/partitions, and likewise for the rest.
So, with this assumption, in my project I related the first entry of /proc/scsi/scsi to the first entry of /proc/partitions to get its total size, and did the same for all entries.
But, I observed some differences in following scenario, where
1) The first 4 entries in /proc/scsi/scsi are SAN luns attached to my system and for which the actual device names in /dev/ are sda,sdb,sdc and sdd.
2) The last 4 entries are the internal HDDs on same system. In /dev/, their respective device names are sde,sdf,sdg & sdh.
(Output attached at end of the thread)
But in /proc/partitions, the device order is different.
You can see their respective sizes in the /proc/partitions output as well.
So, in this particular scenario, I can't relate the first entry of /proc/scsi/scsi to the first entry of /proc/partitions, i.e. scsi0:00:00:00 is not /dev/sde, because it is actually /dev/sda.
It seems that my assumption is wrong in this scenario.
Is there any way or mechanism to figure out the actual device name in /dev/ for an entry in /proc/scsi/scsi?
How should my application relate /proc/scsi/scsi entries to their respective device names and sizes?
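One way to avoid depending on the ordering of the two /proc files is to read sysfs instead. On kernels that expose /sys/block, each SCSI disk has a "device" link whose target ends in its host:channel:target:lun address, and a "size" file counted in 512-byte sectors. The sketch below builds the mapping from those two pieces. The function name is invented for illustration, and it assumes sd* device naming.

```python
import os
from typing import Dict, Tuple


def scsi_block_map(sys_block: str = "/sys/block") -> Dict[str, Tuple[str, int]]:
    """Map 'host:channel:target:lun' to (device name, size in bytes)."""
    mapping = {}
    for name in sorted(os.listdir(sys_block)):
        device_link = os.path.join(sys_block, name, "device")
        if not name.startswith("sd") or not os.path.exists(device_link):
            continue
        # For SCSI disks the last component of the link target is H:C:T:L.
        address = os.path.basename(os.path.realpath(device_link))
        with open(os.path.join(sys_block, name, "size")) as handle:
            sectors = int(handle.read().strip())
        mapping[address] = ("/dev/" + name, sectors * 512)  # 512-byte units
    return mapping
```

The keys can then be matched against the Host/Channel/Id/Lun fields of each /proc/scsi/scsi entry, so SAN LUNs and internal disks are identified by address rather than by position in a list.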
May 4, 2010
When I enter "cat /proc/scsi/scsi" I'm returned with "cat: /proc/scsi/scsi: No such file or directory". I've tried this on two different installs on two different machines.
May 29, 2010
I am trying to use a USB pen drive. The pen drive shows up under Computer, but not as a drive in GParted. When I look in the log file I find the drive:
Code:
May 29 10:11:46 CQ60 kernel: [ 112.942602] scsi 6:0:0:0: Direct-Access USBest USB2FlashStorage 0.00 PQ: 0 ANSI: 2
I do not know what this means, but it looks like the drive shows up as SCSI and not USB, or am I wrong? I need a clue to get it to work like a normal USB device.
Feb 21, 2009
I just installed CentOS 5 on an HP DL380 server, and it has two 72.8 GB SCSI hard drives. The problem I am having is that only one hard drive is being recognized, and it is not being recognized as SCSI. This is what I get from fdisk -l:
Disk /dev/cciss/c0d0: 72.8 GB, 72833679360 bytes
255 heads, 63 sectors/track, 8854 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/cciss/c0d0p1 * 1 13 104391 83 Linux
/dev/cciss/c0d0p2 14 8854 71015332+ 8e Linux LVM
As you can see, the system doesn't even see the second hard drive. How do I get both hard drives to be seen, and how do I get them to be recognized as SCSI?
Jul 7, 2010
OpenOffice Writer crashed today and now refuses to start. strace gives:
Code:
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb772e000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb772e8d0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xdef000, 8192, PROT_READ) = 0
[Code]...
Apr 23, 2010
I have a RAID5 array with a spare (4 disks in total). These are the steps that led to the problem:
1. I was doing I/O on the array.
2. I pulled out a drive manually, so the spare drive took over for the failed one and started rebuilding.
3. In the meantime, I pulled the power plug of my NAS box.
4. After power-up, I saw that my array was not active (using mdadm's -D option).
5. I executed: mdadm --assemble --scan /dev/md0 it gave me
I checked the Linux source and found that bd_claim is a function in fs/block_dev.c. It is failing, which causes lock_rdev (which calls bd_claim in md.c) to fail, so the array cannot be started. I don't know why my RAID is not live after power-on.
Please help. Can I at least save my data?
Jun 23, 2011
Some of our workstations have LTO drives attached, and they seem to drop off every now and again. The only thing that picks them up again (besides a reboot) is the famous rescan-scsi-bus script from here
The thing is that I'd like non-root users to be able to run this script, which in turn needs root access to /proc/scsi/scsi
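For reference, on kernels with sysfs a bus rescan can also be requested by writing "- - -" (wildcards for channel, target and LUN) to each host adapter's scan file. The sketch below does only that, which is much less than the full rescan-scsi-bus script. It still needs root privileges, so it does not by itself answer the non-root question. A sudoers entry limited to one small script like this would be one possible approach, mentioned here as an assumption rather than a tested setup. The function name is illustrative.

```python
import glob
import os
from typing import List


def rescan_scsi_hosts(scsi_host_dir: str = "/sys/class/scsi_host") -> List[str]:
    """Ask every SCSI host adapter to rescan its bus; return the hosts touched."""
    rescanned = []
    pattern = os.path.join(scsi_host_dir, "host*", "scan")
    for scan_file in sorted(glob.glob(pattern)):
        with open(scan_file, "w") as handle:
            handle.write("- - -\n")  # wildcards for channel, target and LUN
        rescanned.append(os.path.basename(os.path.dirname(scan_file)))
    return rescanned
```

Keeping the privileged part this small makes it easier to review what non-root users are actually being allowed to trigger.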
May 17, 2011
I need to read some data from a hdd that belongs to an old SCO Unix system. The SCSI card is PCI so I unplugged it from the old SCO unix box and stuck it in a new computer and booted using a Fedora 14 USB pen drive. The SCSI card was recognized and so was the hdd but it was not mounted. I then went into Applications, Disk Utility, and found the hdd. Under 'Edit Partition' the type was blank. I was tempted to set it to 'Extended' but was not sure whether that could damage the data on the disk. Does anyone know whether I would be able to read this Hdd by setting the type to 'Extended'?
Oct 18, 2010
I have just purchased (AT ENORMOUS EXPENSE) a used 306637-004 36 GB hard drive. It has come, by default, with no jumpers, and my SCSI controller recognizes it as SCSI device 0. This is very inconvenient, as it clashes with my existing IBM 9GB drive, which I am loath to play about with. Does any kind person out there know how to jumper this drive to be SCSI ID 1 or 2 (say)?
Jun 15, 2011
I'm a bit at a loss on this one. I couldn't get a drive from a former RAID5 array to format. I used dd to write zeros to the drive and attempted to fsck, only to be stopped every time with the error: Couldn't find ext2 superblock, trying backup blocks.. fsck.ext3: Bad magic number in super-block while trying to open /dev/sda1
Smartctl shows no problems with the drive (a Seagate 750GB), but I haven't yet removed it and put it in a Windows machine to run Seagate's proprietary drive diagnostics. I'm running CentOS 5.6. I've never had this problem before. The drive is not mounted and the old md device has been removed as far as I can tell. It could still be attempting to assemble the RAID5 with the 1 drive, but I didn't see it attempt to do so.
[Code]...
Jan 22, 2010
I did an installation of SUSE 11.2 on a new SCSI hard drive, keeping the old hard drive separate. I then remembered there was some info on the old hard drive that I wanted.
I added it to the system and mounted a partition. I then copied the data over. Then I unmounted the partition, rebooted the machine and removed the hard drive. However, the machine will now not boot without this hard drive, even though it's not mounted. I'm not sure what the error message I'm given means; I think it could be trying to fsck it. Do I need to do something more, like remove /dev/sdd?
Apr 16, 2011
How do I get compiz to auto-start? What seems to be the generally accepted method (from what a few Google searches and the Debian wiki tell me), using gconf-editor to change the window manager from 'gnome-wm' to 'compiz' in desktop > gnome > session > required_components, doesn't change anything. The only method I found that did not involve using a terminal and running 'compiz --replace' every time I boot the computer was to add compiz and fusion-icon to the GNOME startup apps, but this causes unwanted flickering (it starts metacity and then replaces it with compiz, i.e. it's simply automating what I would do with the terminal). Autostarting fusion-icon alone does not start compiz, although it allows me to start it from its menu if I right-click the icon. Note that I sometimes use fluxbox as well, so starting it on boot isn't really an option either.
Perhaps this can be useful :
compiz:
Installed: 0.8.4-4
Candidate: 0.8.4-4
gnome-session:
Installed: 2.30.2-3
Candidate: 2.30.2-3
Jan 8, 2011
Ubuntu Linux 10.04 LTS - Partition /dev/sda2 is formatted as FAT32 and mounted as /d. Boot found a problem with it: the original and backup copies of the boot sector don't match. dosfsck refuses to fix it. I use "dosfsck /dev/sda2" and it offers to copy one to the other, but whatever I pick, it refuses to write to the drive. I use "dosfsck -a /dev/sda2" and again it refuses to write to the drive. How can I fix this? Must I reformat the partition?
Dec 7, 2010
As the title says, I have a failed RAID5 hard drive. What's the easiest way I can go by replacing it? I've seen many ways to do this, but I would like to know what other people are saying about this, and see how you would do it.
P.S. This is the one I found. [URL]
Dec 4, 2010
I just bought an Adaptec 39160 PCI SCSI controller and a 34GB drive and installed them in my computer. When I booted it up, a utility ran which said 'detecting array', and it found the drive, its size, etc. It didn't go any further to allow my system to boot, so I rebooted and pressed CTRL + A to enter the Adaptec SCSI config. I changed something in there to make the disk non-bootable (I think) and rebooted the system.
It now hangs at :
'Detecting Array .....'
and does not show the drive details or the CTRL + A option to enter the utilities. So the only option I have had is to remove the Adaptec controller (and then the system boots fine). The motherboard is an Abit AN8 Ultra, probably about 4 years old, and the CPU is an AMD 64. I guess I have broken it, and my remedy would be to reset the default options on the Adaptec controller, but as you can probably guess, I have very limited knowledge and have not used SCSI before.
Aug 12, 2010
I have successfully setup a FOG server to image my Windows clients, so I have tftp, pxe and anything else related to booting to a pxe server setup and rocking. What I'm trying to do now, is use the CentOS net install files to setup CentOS on an old server with no USB boot option, and a broken scsi cdrom drive (it's a Dell PowerEdge 2400, with a single PIII 733 and 1.25GB ram).
Using the FOG Projects gparted wiki entry (adding gparted to the pxe boot menu) I was actually successfully able to pull the net install files over to the PE, and install CentOS 5.5 via local ftp server. At first it kept erroring out (I kept picking and choosing individual packages from the package groups), so I thought it may be an issue with the GUI install (the python script kept spitting back errors forcing a reboot). In any case, I finally got through the GUI install, but now I need / want to know how to force a text mode install.
[Code]....
The bolded "append" line is where I thought I could force the text mode install script, but that didn't work. The vmlinuz and initrd.img files were both pulled from the net install ISO, NOT the LiveCD. Would that have made a difference? If not, what / where / how should I force the text mode install script?
Jan 13, 2011
I have a nice little Ubuntu server with 6x 1TB drives assembled into a RAID5 array. Recently the SATA cable of one of the drives failed. So I ordered a new cable and ran the server in degraded mode for a few days. Like this:
Code:
/dev/md0:
Version : 00.90
Creation Time : Sat Sep 19 10:39:11 2009
Raid Level : raid5
Array Size : 4883812480 (4657.57 GiB 5001.02 GB)
code....
I'd like the 6th drive to be active, not spare, like before. Should I just wait for rebuild to be finished (it can easily take over 1 day)? Or should I add it somehow differently to be active immediately?
I'm not sure, but I think that when I simulated failures by unplugging one of the disks, after plugging it in again the "failed" drive was active again, and rebuilding started as well, of course. But that was 2 years ago, so...
The array works just fine for now - I can access files, etc. But I suspect that in this state, if another cable or drive fails, it won't survive. The same would be true even after rebuilding is finished, if the 6th drive is still marked as "spare". Right?
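As far as I know, md lists a re-added member as a spare while recovery is running and only shows it as active once the rebuild completes, so watching the progress is probably the first thing to try. The sketch below pulls the recovery percentage out of /proc/mdstat-style text. The function name is made up, and the regular expressions assume the usual "recovery = N%" progress line and array names of the form mdN.

```python
import re
from typing import Dict

ARRAY_RE = re.compile(r"^(md\d+)\s*:")
PROGRESS_RE = re.compile(r"(recovery|resync)\s*=\s*([\d.]+)%")


def rebuild_progress(mdstat_text: str) -> Dict[str, float]:
    """Return {array name: percent done} for arrays that are rebuilding."""
    progress = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)  # remember which array we are inside
            continue
        found = PROGRESS_RE.search(line)
        if found and current is not None:
            progress[current] = float(found.group(2))
    return progress
```

On the server itself this would be called with the contents of /proc/mdstat. An empty result means no array is currently rebuilding, at which point mdadm -D should show whether the 6th drive has moved from spare to active.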