CentOS 5 Hardware :: Infiniband (mlx4) Works On 16-core System But (now) Fails On 32-core System
Jan 27, 2010
We have a small cluster of 20 HP systems, all running CentOS 5.3 in an NFS-root environment. Half are quad-socket, quad-core Xeon E7340 @ 2.40GHz (total 16 cores), the other half are 8-socket, quad-core Opteron 8354 (total 32 cores). All systems have a Mellanox Infiniband adapter ("Mellanox Technologies MT25418 [ConnectX VPI PCIe 2.0 2.5GT/s - IB DDR / 10GigE] (rev a0)").
With kernel 2.6.18-128.1.6.el5, infiniband works fine on all systems.
With the update to kernel 2.6.18-164.11.1.el5 (and both types of node running the same NFS-root image), the 16-core Xeons still work fine. Infiniband no longer works on the 32-core Opterons. Specifically, either the ib0 interface fails to appear, or it does appear but when configured with an IP address, doesn't actually work. In either case, loading the IB kernel modules takes a long time, but I haven't instrumented the load script yet to see which module, if any, is at fault. More errors listed below.
However, if I tweak the BIOS of the 32-core systems to reduce the per-socket core count to 2 (so effectively 8-socket, dual-core, down to a total of 16 available cores), Infiniband starts working again. Putting it back to 32-cores makes it fail. Booting the older kernel makes it work again. In summary: old kernel, IB works on all systems. Newer kernel, IB only works on 16-core systems.
Updating the IB firmware from 2.5.0 to 2.7.0 (latest available) doesn't help. I also did a full 'yum update' to make sure that libmlx4, openibd, and all other associated packages were up to date. That doesn't help either.
Some errors that appear on 32-core nodes:
ib_query_port failed (-16) for mlx4_0
ib_query_port failed (-16) for mlx4_0
mlx4_core 0000:04:00.0: SW2HW_MPT failed (-16)
mlx4_core 0000:04:00.0: SW2HW_MPT failed (-16)
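The (-16) in these messages looks like the usual kernel convention of reporting a negated errno value. As a minimal sketch, assuming that convention applies to these driver messages, the number can be decoded in Python. The helper name decode_kernel_errno is made up for illustration and is not part of any driver tooling.

```python
import errno
import os


def decode_kernel_errno(code):
    """Map a negative kernel-style return code to (symbolic name, message)."""
    number = abs(code)
    # errno.errorcode maps errno numbers to their names on this platform.
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


if __name__ == "__main__":
    # On Linux, errno 16 is EBUSY.
    print(decode_kernel_errno(-16))
```

This only names the error class. It does not say which resource the adapter considered busy.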
I am trying to install a program on my Linux system (Fedora Core 5), but it fails because there is no fort77 compiler. I know that I have a working ifort on my system, but I need fort77. It looks like the program that I am trying to install can also be compiled with g77, but that one is missing as well. How can I get these compilers and get them to work on my system?
I have a desktop system (P55-USB3 + Core i7 + Ubuntu 10.10) that fails to suspend/resume from memory. So I'm trying to diagnose the problem. The first obstacle was easy enough --- when I put the system to sleep to memory, the computer comes back alive right away. A look at /var/log/kern.log revealed that one USB device (usb10) failed to suspend, and from there I was able to pin it down to the USB3 controller in the BIOS. Disabled that and this problem disappeared.
Now, I'm stuck with the second obstacle. The computer successfully goes into suspend mode, but it hangs during resume. The monitor doesn't get any video signal, and the machine fails to respond to ping (netconsole doesn't work either). After a forced reboot (which involves unplugging the power cable), /var/log/kern.log doesn't contain any interesting entries. All the pm_test modes from freezer to core succeed (I followed [URL]). I've also tried pm_trace (https://wiki.ubuntu.com/DebuggingKernelSuspend), but again neither kern.log nor dmesg contains anything after the suspend. Either the write didn't survive the forced power-off, or the resume is failing even before that. The motherboard has neither a serial port nor FireWire, so getting kernel logs through them is not a possibility either.
I am running CPU tests on a radio controller to determine the maximum number of simultaneous calls. A tool using top was developed so that we could get a good look at what exactly was happening at the process level; however, we are mainly interested in one object running on the box. The box has a single-core Celeron processor running the Wind River Linux platform. The CPU usage from my object is frequently spiking over 100%. Research online so far has told me that a multi-core processor can do this, but I have found no mention of a single-core processor displaying this behavior.
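For background, tools in the top family generally derive a per-process CPU percentage from tick counters (utime + stime in /proc/<pid>/stat) sampled at two points in time. The sketch below is an illustration of that arithmetic only, not a diagnosis of this box and not the code of the tool mentioned above. The function name and the default of 100 ticks per second are assumptions; the real tick rate can be read with os.sysconf("SC_CLK_TCK").

```python
def cpu_percent(ticks_before, ticks_after, elapsed_seconds, ticks_per_second=100):
    """Percent of one CPU used between two samples of utime + stime."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    cpu_seconds = (ticks_after - ticks_before) / ticks_per_second
    return 100.0 * cpu_seconds / elapsed_seconds


if __name__ == "__main__":
    # 100 ticks consumed over exactly one second of wall time.
    print(cpu_percent(0, 100, 1.0))
    # The same ticks, but with the interval measured as slightly shorter.
    print(cpu_percent(0, 100, 0.9))
```

If the elapsed interval is measured as shorter than the time over which the ticks actually accumulated, the result exceeds 100 even with one core. Whether that is what happens on the Celeron box would need to be checked against the tool's sampling code.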
I've tried searching the forums / google and haven't been able to come up with anything... in Debian-based distros there's an option that can be set to allow boot concurrency so that multiple processor cores can be used for the boot process. Windows also has an option similar to this to specify how many processor cores to use for boot.
Is there an option for multi-core booting in Fedora?
I recently assembled a new desktop computer. The hardware details are as follows:
CPU: Intel Core i7 2600
Motherboard chipset: Intel Z68, GIGABYTE GA-Z68A-D3H-B3
Video card: GeForce GTX 560 Ti
I tried to install both the i386 and x86-64 versions of Ubuntu 11.04. Both of them are quite unstable. The system crashes easily because of network problems, and the internet connection is extremely slow. I also installed Windows 7 on the same computer, and there the wired network works well.
In the past, I've deployed new 64 bit systems and I've worked on and developed on 64 bit systems. But until a week ago, my workstation was a 32 bit system. Now, it is a 64 bit quad core Phenom II system, and I suppose I need to start the migration to 64 bit Linux. I do not want to blow off my system and rebuild it. This particular system dates back a decade and through many many updates. There is some digital debris in it, but there is also a fair amount of customization that I have implemented either for my own purposes or for customers, and to lose that customization would represent a headache for me.
What I want to do is install a 64 bit system over top of the 32 bit system. It is my hope that doing this would install the necessary 64 bit libraries, while not impacting the existing 32 bit libraries (except with some possible symlink problems). I then, hopefully, could boot into a 64 bit kernel while still running 32 bit programs. Is this feasible? My backup system is comprehensive; I COULD just try it and back up if my system became hosed. But I'd rather not; I have a lot of work to do and I'd rather not learn by doing in this case.
I have a command-line OCR program called OCR Shop XTR (Vividata corp) that I am using on a system with a 6-core AMD chip. I changed the BIOS settings so that all 6 cores were activated, but htop shows me that while the program is running, I am only getting activity on one core (the program maxes out that one core with consistent usage between 97% and 100%).
I have read that many programs are not written to take advantage of multi-core CPUs. However, I am hoping that there is some way to get this program to take advantage of the extra cores. Does anyone know of a way to invoke programs from the command line which would spread the workload out among additional cores?
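A single-threaded program cannot be made to use several cores from the outside, but if the work can be split into independent input files, several instances can run side by side. The following is a minimal sketch under that assumption. The function name run_in_parallel and the file names are hypothetical, and the placeholder command stands in for the real OCR invocation, whose options are not given in the post.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(command_prefix, input_files, workers):
    """Run one instance of a command per input file, `workers` at a time.

    Returns the exit codes in the same order as input_files.
    """
    def run_one(path):
        # Each call starts a separate OS process, which the kernel is free
        # to schedule on any idle core.
        return subprocess.run(command_prefix + [path]).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, input_files))


if __name__ == "__main__":
    # Placeholder command: replace with the real OCR invocation.
    prefix = [sys.executable, "-c", "import sys; print('processing', sys.argv[1])"]
    print(run_in_parallel(prefix, ["page1.tif", "page2.tif", "page3.tif"], workers=3))
```

Threads are used here only to wait on the child processes; the CPU-heavy work happens in the children. Whether the program's licence or output handling permits concurrent instances is something to check first.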
Here is the output of uname -a:
Linux linux 126.96.36.199-1.2-desktop #1 SMP PREEMPT 2011-02-21 10:34:10 +0100 i686 athlon i386 GNU/Linux
And here is the output for one of the cores from cat /proc/cpuinfo:
processor : 5
vendor_id : AuthenticAMD
cpu family : 16
model : 10
model name : AMD Phenom(tm) II X6 1100T Processor
stepping : 0
I have now installed Wheezy on two different hard drives and in each case it seems only one CPU of my dual core CPU computer is recognized. System Monitor, Gkrellm and lscpu show just one when prior to the new install the old Wheezy showed both CPU's. I have put the hard drive into two other computers with dual core CPU's and all show just one CPU.
Interestingly, System Profiler and Benchmark (hardinfo?) > Devices > Processors now shows a large amount of processor information, whereas with the old Wheezy I would only see both CPUs listed and nothing else.
I recently read in a forum that by default the Linux kernel only activates one of the two cores in a dual-core processor. Searching online gave one way to find out, which was the mpstat command. I therefore ran the command and got the following output. As the result says, it shows only 1 CPU. I was wondering what I could do to activate both cores in my machine, and whether doing so was going to cause me any problems.
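As a cross-check on mpstat, the number of CPUs the running kernel has brought up can also be read directly. This is a minimal sketch: it counts the "processor" entries in /proc/cpuinfo (the layout used on x86 systems) and compares that with os.cpu_count(). The helper name count_processors is made up for illustration.

```python
import os


def count_processors(cpuinfo_text):
    """Count 'processor : N' entries in /proc/cpuinfo-style text."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key, sep, _value = line.partition(":")
        if sep and key.strip() == "processor":
            count += 1
    return count


if __name__ == "__main__":
    # os.cpu_count() reports the number of logical CPUs the OS sees.
    print("os.cpu_count():", os.cpu_count())
    try:
        with open("/proc/cpuinfo") as handle:
            print("processor entries:", count_processors(handle.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

If both numbers are 1 on dual-core hardware, the kernel itself is only seeing one CPU, which points at the kernel build or boot options rather than at the monitoring tool.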
Suppose someone binds a process to a particular CPU core (on a multi-core machine) using sched_setaffinity() or a similar function. How can we then get the ID of the core that process is running on, and the process's utilisation of that core (programmatically or with a Linux command)?
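One way to approach this, sketched here with assumptions stated, is to read /proc/<pid>/stat: field 39 is the CPU the process last ran on, and fields 14 and 15 are its user and system time in clock ticks. From a shell, taskset -p <pid> shows the affinity mask and ps -o pid,psr,pcpu -p <pid> shows the last-used processor and a CPU percentage. The helper name parse_stat below is made up for illustration.

```python
import os


def parse_stat(stat_line):
    """Extract (last_cpu, cpu_ticks) from the text of /proc/<pid>/stat."""
    # The command name (field 2) is parenthesised and may contain spaces,
    # so split on the last ')' before splitting the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field N is rest[N - 3].
    utime = int(rest[14 - 3])
    stime = int(rest[15 - 3])
    last_cpu = int(rest[39 - 3])
    return last_cpu, utime + stime


if __name__ == "__main__":
    pid = os.getpid()
    # os.sched_getaffinity is only available on Linux.
    if hasattr(os, "sched_getaffinity"):
        print("allowed CPUs:", sorted(os.sched_getaffinity(pid)))
        with open(f"/proc/{pid}/stat") as handle:
            print("(last CPU, utime + stime ticks):", parse_stat(handle.read()))
```

Sampling the tick total twice and dividing the difference by the elapsed time and the tick rate gives the utilisation; since the process is pinned, that is also its share of the one core it is allowed to use.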
I have created a virtual machine of a system running Fedora Core 4 and I need to upgrade it to Fedora Core 10. Based on what I have read, it is possible, so I started the upgrade process. I get an error message saying that /dev/hda6 (my root partition) does not exist, even though it does.
Does the installer need to read a label from /etc/fstab? I executed tune2fs -L / /dev/hda6 and added LABEL=/ to the corresponding entry in fstab, but Fedora Core 10 still gives the same problem during the installation process. Should I upgrade to an intermediate version like Fedora Core 7 first?
I have a program that launches new processes and waits for them to die before it exits. So, for example, my program is a process, and it launches 3 more processes, and when the 3 child processes end, it will exit.
As you see, at the end of the example, the program used a total of 4 processes.
1 - Now, I'm running this program on a CPU with 4 cores. Does this mean that the program used one core for each process?
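The scenario can be sketched as follows: a parent starts three children and blocks until all of them exit. This is an illustration only, not the poster's program, and the name launch_and_wait is made up. Note that nothing in it assigns a process to a core; the kernel scheduler decides where each process runs and may move it between cores, and the parent spends most of its time asleep in wait().

```python
import subprocess
import sys


def launch_and_wait(child_count):
    """Start child_count child processes, wait for all, return exit codes."""
    child_code = "import os; print('child pid', os.getpid())"
    children = [
        subprocess.Popen([sys.executable, "-c", child_code])
        for _ in range(child_count)
    ]
    # wait() blocks until the child exits and returns its exit status.
    return [child.wait() for child in children]


if __name__ == "__main__":
    print(launch_and_wait(3))
```

So four processes on four cores do not imply a fixed one-to-one mapping unless the affinity is set explicitly.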
I wanted to try Fedora 12 Live/KDE on a newly-bought Fujitsu Esprimo P1500. Booting with no kernel options would just freeze the machine. After some random experimentation, I added the option "nolapic" and got a seemingly working machine. However, only seemingly, as it turned out that only one of the processor cores was working (the processor is a "Pentium(R) Dual-Core CPU E5300").
I want to generate core dump files from my program when it crashes. It's a pretty big process and has about 10-11 threads in it. I have followed the documentation to enable core dumps by setting ulimit to unlimited, etc. I quickly tried "A demo program creating a core dump" from the following webpage, which succeeds in segfaulting and dumping a core file in the directory that I configured. However, I then ran my original program and caused it to crash. I did this by making calls to kill(), raise() or the same null pointer access as shown in the webpage above. In each case, my program crashed but did not generate a core dump file. Am I missing something? My program is in C++ and my environment is Redhat 9.0 (kernel 2.4.20).
Going through the "Why do I NOT get a core dump?" section on the same webpage as above, I can see two potential problems. One, there are issues with suid/sgid (bullet # 6). I am not able to change any settings for suid because my system contains neither /proc/sys/fs/suid_dumpable nor /proc/sys/kernel/suid_dumpable. Two, my program has threads in it, and bullet # 8 is the problem.
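One thing worth ruling out is that the crashing program runs with a different core-size limit than the shell where ulimit was set, since the limit is per process and inherited from whatever started it. The sketch below, in Python for brevity, shows the getrlimit/setrlimit calls that sit behind ulimit -c; the same calls are available to a C++ program. The function name raise_core_limit is made up for illustration.

```python
import resource


def raise_core_limit():
    """Raise the soft core-file size limit to the hard limit; return both."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = raise_core_limit()
    unlimited = resource.RLIM_INFINITY
    print("soft:", "unlimited" if soft == unlimited else soft)
    print("hard:", "unlimited" if hard == unlimited else hard)
```

If the hard limit itself is 0, only a privileged process can raise it. This does not address the suid or threading bullets mentioned above.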
I have tried installing Vmware-Server 1.0.10 on Fedora Core 12. After installing all packages, it fails on building the vmmon module. On the internet I found many patches and vmware-any-any-updates... but nothing worked. My kernel version is 188.8.131.52-174.2.3.fc12.i686.PAE...
I am trying to update Flash-player and yast2-core as prompted by today's update icon.
I am now faced with a loop that goes like this:
1) click on icon
2) window pops up, "2 packages selected"
3) click on apply, subwindow pops up
4) click on continue, "waiting for authentication"
5) after minutes of waiting, click on cancel, nothing happens
6) click on icon killing subwindow
7) now back to 2)
Aborting the process, starting at 1) again, same mess.
I will be relocating to a permanent residence sometime in the next year or two. I've recently begun thinking about the best way to implement a home-based network. It occurred to me that the most elegant solution might be the use of VM technology to eliminate as much hardware and wiring as possible. My thinking is this: Install a multi-core system and configure it to run several VMs, one each for a firewall, a caching proxy server, a mail server, a web server. Additionally, I would like to run 2-4 VMs as remote (RDP) workstations, using diskless workstations to boot the VMs over powerline ethernet. The latest powerline technology (available later this year) will allow multiple devices on a residential circuit operating at near gigabit speed, just like legacy wired networks.
In theory, the above would allow me to consolidate everything but the diskless workstations on a single server and eliminate all wired (and wireless) connections except the broadband connection to the Internet and the cabling to the nearest power outlets. It appears technically possible, but I'm not sure about the various virtual connections among VMs. In theory, each VM should be able to communicate with the other as if it was on the same network via the server data bus, but what about setting up firewall zones? Any internal I/O bandwidth bottlenecks? Any other potential "gotchas", caveats, issues? (Other than the obvious requirement of having enough CPU and RAM). Any thoughts or observations welcome, especially if they are from real world experience in a VM environment. BTW, in case you're wondering why I'm posting here, it's because I run Debian on all my workstations/servers (running VirtualBox as a VM for Windows XP on one workstation).
I have a multi-threaded application using pthreads. On an application crash, or when signalling with 'kill -s 6', the core file created by the 2.6.18-128 kernel on CentOS 5.3 shows only a single thread. A core file saved with gcore in gdb shows all running threads properly, so the problem is clearly in the kernel. I tested CentOS 5.2 (kernel 2.6.18-92) and it works correctly.