General :: Nas - Most Effective Backup Software -> When Dealing With Large Numbers Of Files?
Jul 18, 2010
I have two NASes. I work off of one, and the other is used as a backup. As I have it set up now, it's slow. Running a backup takes a week. Even for 7 TB, with 1,979,407 files, this seems a bit outlandish, particularly as both systems are RAID-5 and the network is all gigabit. I've been digging about in the rsync man pages, and I really don't understand what differentiates the various topologies. Right now, all the processing is being done on the backup NAS, which has the main volume from the main NAS mounted locally over SMB. I suspect that the SMB overhead is killing me, particularly when dealing with lots of files.
I think what I need is to set up rsync on the main NAS as a daemon, and then run a local rsync client to connect to it, which would hopefully allow me to completely avoid the whole SMB-in-the-middle affair. But aside from mentioning that it's there, I can find very little information on why one would want to use the daemon mode for rsync.
Here's my current rsync command line: rsync -r --progress --delete /cifs/Thecus/ /mnt/Storage/input? Is there a better way/tool to do this?
Edit: OK, to address the additional questions: The "Main" NAS is a Thecus N7700. I have additional modules installed that give me SSH, and it has rsync, but it's not in the $PATH, and I haven't figured out how to edit the local $PATH in a way that persists between reboots. The "Backup" NAS is a DIY affair, built around a 1.6 GHz VIA motherboard with an Adaptec hardware RAID card. It's running CentOS 5 with a full desktop environment. It's the hardware I'm running rsync from. (Gigabit is through an additional PCI card.)
Further Edit: OK, got rsync over SSH working (thanks, lajuette!). I had to do a bit of tweaking on my command line; I'm running rsync with the args: rsync -rum --inplace --progress --delete --rsync-path=/opt/bin/rsync sys@10.1.1.10:/raid/data/Storage /mnt/Storage (Note: I'm specifically not using -a, because I want to change the ownership to the local account, to not freak out SELinux.)
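For comparison with the SSH setup, the daemon-mode route asked about above could look roughly like the sketch below. The point of daemon mode is that rsync then speaks its own protocol directly (TCP port 873 by default), so neither the SMB mount nor SSH encryption sits in the middle. The module name "storage" is a hypothetical choice; the share path, rsync binary location and address are the ones quoted in the post. Whether the Thecus firmware permits running a daemon this way is an assumption.

```shell
#!/bin/sh
# Sketch only: write a minimal rsyncd.conf exposing one read-only module.
# "storage" is a hypothetical module name; the path comes from the post.
write_rsyncd_conf() {
    conf_path=$1
    cat > "$conf_path" <<'EOF'
[storage]
    path = /raid/data/Storage
    read only = yes
EOF
}

# On the main NAS (not executed here):
#   write_rsyncd_conf /etc/rsyncd.conf
#   /opt/bin/rsync --daemon --config=/etc/rsyncd.conf
# On the backup NAS, pull through the daemon instead of the SMB mount:
#   rsync -r --progress --delete rsync://10.1.1.10/storage/ /mnt/Storage/
```

The daemon protocol is unencrypted, which is usually acceptable on a private gigabit LAN but not across untrusted networks.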
I have a large collection of pictures (12 GB and growing), way too big to fit on one CD or DVD. I want to back them up to CDs or DVDs in standard (I think it's ISO 9660) format that Windows can read. I know how to do this the hard way: by manually selecting a pile of pictures that will fit on one disc, burning it and then going on to the next pile. There must be a way to tell k3b or a similar program to do this for me, to automatically make a backup of the whole thing using as many discs as necessary. Can anyone tell me how to do this?
I don't want to use tar or another archive/compression scheme because I want the pictures accessible to someone with minimal technical expertise who doesn't even know how to spell "Linux".
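The "pile" selection itself can be scripted independently of the burning program. The sketch below is one assumed approach, not a k3b feature: it walks the files in sorted order and assigns each one a disc number so that no disc exceeds a given byte capacity. The function name plan_discs is hypothetical, GNU find is assumed for -printf, and file names containing tabs or newlines are not handled.

```shell
#!/bin/sh
# Sketch: assign each file under a directory to a numbered disc so that no
# disc exceeds a byte capacity. Files are taken in sorted path order, so this
# is a simple sequential fill, not an optimal packing. A single file larger
# than the capacity still gets a disc of its own.
plan_discs() {
    src_dir=$1
    capacity_bytes=$2
    find "$src_dir" -type f -printf '%s\t%p\n' |
    sort -t "$(printf '\t')" -k2 |
    awk -F'\t' -v cap="$capacity_bytes" '
        BEGIN { disc = 1; used = 0 }
        {
            # Start a new disc when the next file would overflow this one.
            if (used > 0 && used + $1 > cap) { disc++; used = 0 }
            used += $1
            printf "%d\t%s\n", disc, $2
        }'
}

# Example (capacity chosen a little under a 4.7 GB DVD, as a safety margin):
#   plan_discs ~/Pictures 4400000000 > disc_plan.txt
```

Each line of the output is a disc number and a path, so the files for disc N can then be copied into a staging folder and burned as a plain data disc that Windows can read.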
I am writing a small search program for my class. I have decided to use indexing for my program. I've researched online about indexing and how search engines do it. If I'm going to do that, I need to create inverted files to associate files with numbers (the numbers being the indexes of my paths). Now I was wondering what would be the best way to create an inverted file? I was going to create SQL tables using the MySQL API in C, but there is no array or vector data type in MySQL for storing several numbers in a single column, and using ENUM or SET is not advised.
I work for a school consulting company. We helped a school deploy about 1500 computers. The computers have Windows XP, but we have been using G4L for the restore partition on the drives. So far the software works great. We did, however, run into a problem in that many of the computers we deployed are missing the restore partition. The reason they are missing is long and convoluted and not really that important. What I have been charged to do is try to fix the restore partition problem. One solution that I had, which I'm not even sure will work, was to back up the recovery file that G4L created to DVD and write a basic script to recreate the partition and then copy the file over. This process would need to be as automated as possible, since this disc will be inserted by the end user (the students). The backup file that G4L created is 5.9 GB, so it won't fit on just one disc, and dual-layer discs are too expensive to use for this project, so the file will either need to be compressed again (not sure if that's a good idea or not) or split across two DVDs.
I have searched the forums here and I was not able to find anything to fix this problem. I was able to find some info on splitting files across two discs, but I'm not sure how to use that to fix my problem.
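The splitting half of that idea can be sketched with the standard split and cat tools. This is a minimal illustration under assumptions: the function names and the ".part." suffix are hypothetical, and the partition re-creation step is not covered.

```shell
#!/bin/sh
# Sketch: cut a large image into fixed-size parts and join them again.
# The ".part." naming is a hypothetical choice.
split_image() {
    image=$1
    part_size=$2    # e.g. 4000m gives pieces that fit a single-layer DVD
    split -b "$part_size" "$image" "$image.part."
}

join_image() {
    image=$1
    # The shell expands the glob in sorted order: .part.aa, .part.ab, ...
    cat "$image".part.* > "$image"
}
```

With a 5.9 GB file and 4000m parts this yields two pieces, one per disc. On the restore side both parts would have to be copied to the hard drive before join_image runs, since only one disc is in the drive at a time.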
I'm just setting up a partition on a separate HDD in my system. I plan to use the partition to back up the important files on my main HDD (to guard against an HD crash).
The question I have is: what would be the typical location to auto-mount this partition? Which would be the normal choice:
I am trying to find files in a directory that contain numbers. I have tried ls /etc *[0-9]* but that doesn't work. If I cd to /etc and run ls *[0-9]* it almost works, but it also includes results from within files. My last thought was to try: find /etc [0-9] -type f but this does not work either. My second problem is that I am trying to get a list of files in a directory that were changed less than 10 hours ago, using grep, while leaving out directories. I am completely stuck with the second problem.
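Both parts can be sketched with find alone, which is a swap from the ls and grep approaches tried above. In the ls attempts the shell expands *[0-9]* before ls runs, and ls then lists the contents of any matching directory (ls -d would suppress that). The function names below are hypothetical, and -maxdepth and -mmin are GNU/BSD find extensions rather than POSIX.

```shell
#!/bin/sh
# Sketch using find rather than ls or grep.

# Regular files directly inside a directory whose names contain a digit.
files_with_digits() {
    find "$1" -maxdepth 1 -type f -name '*[0-9]*'
}

# Regular files (directories excluded) modified less than 10 hours ago.
# 10 hours = 600 minutes, hence -mmin -600.
files_changed_recently() {
    find "$1" -maxdepth 1 -type f -mmin -600
}

# Example: files_with_digits /etc
```

Dropping -maxdepth 1 makes either search recurse into subdirectories.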
I need to download some very large files (circa 75 GB) from a remote server via SFTP. I've been using SFTP via the command line on my Linux netbook. Around halfway through, the transfer stops and says "stalled." Can anybody recommend a reliable way to download these files?
How can I randomly write/create a 1 GB file in bash to test disk/network I/O? I was told I could use the 'dd' command, but I don't know if there are better ways, or what the 'dd' command would look like.
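A minimal sketch of the dd form, assuming a file of random bytes is what is wanted (random data avoids flattering results from compression along the way). The function name and example path are hypothetical.

```shell
#!/bin/sh
# Sketch: write a file of random data, with the size given in MiB.
# bs is spelled out in bytes (1048576 = 1 MiB) so it works with both
# GNU and BSD dd; dd's transfer statistics on stderr are discarded.
make_random_file() {
    out_path=$1
    size_mib=$2
    dd if=/dev/urandom of="$out_path" bs=1048576 count="$size_mib" 2>/dev/null
}

# A 1 GiB test file would be:
#   make_random_file /tmp/test.bin 1024
```

Reading /dev/urandom can itself be the bottleneck on a slow CPU, so for a pure write-speed test if=/dev/zero is a common alternative.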
What can you do when your Linux system "can't find" dynamically linked libraries that are indeed installed in their correct locations? Case in point, I'm trying to run a program called 'ucanvcam':
oliver@human ~/installed/ucanvcam-0.1.6/bin $ ./ucanvcam
./ucanvcam: error while loading shared libraries: libgd.so.2: cannot open shared object file: No such file or directory
oliver@human ~/installed/ucanvcam-0.1.6/bin $ locate libgd.so.2
/usr/lib64/libgd.so.2.0.0
/usr/lib64/libgd.so.2
oliver@human ~/installed/ucanvcam-0.1.6/bin $ ldd ./ucanvcam
linux-gate.so.1 => (0xf7706000)
[...]
libgd.so.2 => not found
[...]
librt.so.1 => /lib32/librt.so.1 (0xf6b1e000)
How can I tell it to look for libgd.so.2 in /usr/lib64? And more importantly, why isn't it looking there, and where is it looking?
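One thing the ldd output suggests (linux-gate.so.1 and /lib32/librt.so.1) is that ucanvcam is a 32-bit binary, in which case the loader would pass over a 64-bit libgd in /usr/lib64 even though the file exists. That is an inference from the paths, not something confirmed in the post. A small sketch for checking it follows; the function name is hypothetical, and the file utility reports the same information if it is installed.

```shell
#!/bin/sh
# Sketch: report whether an ELF file is 32-bit or 64-bit by reading the
# EI_CLASS byte at offset 4 of the header (1 = 32-bit, 2 = 64-bit).
# The ELF magic number itself is not verified here.
elf_class() {
    class_byte=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case $class_byte in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
}

# Compare the program with the library it cannot load:
#   elf_class ./ucanvcam
#   elf_class /usr/lib64/libgd.so.2
```

If the two results differ, pointing the loader at /usr/lib64 (for example through LD_LIBRARY_PATH) would not help; a 32-bit build of libgd would be needed alongside the other 32-bit libraries.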
I have a directory containing around 2.8 lakh (280,000) files. I want to move them to another directory. If I use cp or mv, I get the error 'argument list too long'. If I write a script like
for file in ls *; do
cp {source} to {destination}
done
then performance degrades because of the ls command. How can I do this?
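One way around both problems, sketched under the assumption of GNU find and GNU coreutils: let find hand the names to mv in batches that fit the argument limit, instead of expanding them all on one command line or looping one file at a time. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: move every regular file from one directory to another without
# putting all the names on a single command line. "-exec ... {} +" makes
# find build as few mv invocations as the argument limit allows.
# mv -t (target directory first) is a GNU coreutils option.
move_all_files() {
    src_dir=$1
    dest_dir=$2
    find "$src_dir" -maxdepth 1 -type f -exec mv -t "$dest_dir" {} +
}

# Example: move_all_files /data/incoming /data/archive
```

Replacing mv -t with cp -t gives the copying variant.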
I want to transfer an arbitrarily large file (say >20GB) between 2 servers. I have several considerations:
Must use port 22 (ssh) because of firewall restrictions
Cannot tax the CPU (production server)
Memory efficiency
Would prefer a checksum check but that could be done manually
Time is not of the essence
Server A and Server B are on the same private network (sharing a switch) and data security is not a concern.
Server A and Server B are not on the same network and the transfer will be via the public internet, so data security is a concern.
My first thought was using nice on an scp command with a non-CPU-intensive cipher (blowfish?). But I thought I'd refer to the SU community for recommendations.
I am facing a problem copying a large number of files, 18 lakh (1,800,000), from my personal hard disk to another hard disk. Each file is very small and the folder is around 3.95 GB. Copying the files with the Windows copy is frustrating, and I am not even able to compress the folder; it gives me an error that it's not readable. The other problem is that I am not able to open this drive in Linux; it shows an error telling me to run a disk check in Windows, and the Windows disk check is also unable to repair the drive and goes into some unrecoverable mode. Is there any way to open a disk with errors in Windows, and if not, is there any way I can copy the data faster? ERROR: Disk labled EDU is corrupt go to windows and chkdsk /f there and reboot into window 2 times.
I understand that chroot is usually used to provide security, however, for my issue, security is a big don't care. I am very new to using chroot and don't fully understand how the chroot'd env works.
Problem: I am trying to use a vendor-supplied cross-compile environment. The environment runs as a chroot'd env and works just fine. I have a large number of additional modules that I wish to compile in the chroot'd environment. FYI, these modules are also (successfully) compiled for other targets not using chroot'd envs. Copying the source files into the chroot environment is not an option (I don't have hours to wait for copies to finish and it would break the make system). Having them live in the environment is also not an option (the chroot build is a tiny part of the build process and we cannot revamp our entire source tree to accommodate it).
I am looking for a way to have the compiler in the chroot'd env have access to a path that is outside of the env and typically higher up in the same path that holds the chroot'd env. I have tried soft links (they don't work as expected). Hard links only work for single files and there are tens of thousands of files that would need to be linked. I am not sure how I would go about exporting the additional files and then mounting the exported files in the chroot'd env (or if that would even work).
I am looking for a file sharing program to install on my dedicated server that will allow me to upload large MP3 files and allow my clients to download them. These files are recordings of counseling sessions for families who are seeking help for their children.
What I am looking for is similar to the system this company uses [URL].
We've been trying to become a bit more serious about backup. It seems the better way to do MySQL backup is to use the binlog. However, that binlog is huge! We seem to produce something like 10 GB per month. I'd like to copy the backup to somewhere off the server, as I don't feel there is much to be gained by just copying it to somewhere else on the same server. I recently made a full backup which after compression amounted to 2.5 GB and took me 6.5 hours to copy to my own computer, so that solution doesn't seem practical for the binlog backup. Should we rent another server somewhere? Is it possible to find a server like that really cheap? Or is there some other solution? What are other people's MySQL backup practices?
For my research I have some very large files that are basically millions of lines of ten columns of numbers. These files can be up to 5 GB in size. Recently I noticed that when I made a copy of one of my files, some exclamation points appeared in it where there should not be any: in front of random numbers throughout the file. Making another copy of the file would result in exclamation points in front of different numbers in different parts of the file. Doing this many times has given me up to four exclamation points in different parts of the file. Sometimes the file copies just fine without producing any extraneous exclamation points. Additionally, I have occasionally seen a "^K" where there should be a newline (the data that should have been on the next line was instead on the previous line with a ^K in front of it) in copies that I have made of my files. I don't know if this is related or not.
I currently work within an RTOS environment without an MMU and thus have access to the entire memory map of whatever application I'm working on. As is common in the embedded world, different parts of the memory map relate to different peripherals or different types of memory. For our next-generation hardware, my company is looking at moving to an MMU-enabled processor and using Linux in some shape or form. Most of us in the dept are familiar with Linux, but we are not Linux gurus by any means. So how to explicitly indicate to Linux that we need certain portions of an application to be stored in NVRAM and other portions of the application to NOT be based in NVRAM has us confused. None of us have a clear understanding of how user memory is doled out by Linux and how we can influence Linux to use specific portions of the memory map at specific times.
For example, in this new application we expect to have 2 memory chips, both with DDR3 interfaces. One is a standard DDR3 chip. The other is a non-volatile MRAM with a DDR3 interface, so it can be accessed by a DDR3 controller and coexist with conventional DDR3 memory. But because the portion of the memory map that the MRAM represents will be the only non-volatile memory, we are unclear how we explicitly access MRAM addresses in an MMU-controlled environment. My hail-mary guess was that we would want to somehow tell Linux that we want the MRAM's memory space to be mounted as a RAM drive, and then we access that memory as though it is a file on a HD, except that it is much faster since it will be at DDR3/MRAM speeds. Is there a better, more straightforward way to do this? Coming from an RTOS world, Linux is going to pose some serious challenges for us, but I think it will be the right move once we are all up to speed and are thinking Linux-centric.
I am studying for the LPIC-1 exam, and reading a book that they recommend: "Introduction to Linux: A Hands-on Guide", by Machtelt Garrels. There's one question on the 4th chapter (Processes), that I found confusing: Question: Based on process entries in /proc, owned by your UID, how would you work to find out which processes these actually represent?
What does the author mean? If I run the following command (my username is sl33p), does it give me the right answer?
$ ps -u sl33p
The ps man page says: -u userlist Select by effective user ID (EUID) or name.
This selects the processes whose effective user name or ID is in userlist. The effective user ID describes the user whose file access permissions are used by the process (see geteuid(2)). Identical to U and --user.
I am using the diff command with the -r option to compare a large number of files and files in subdirectories. My main interest is to find out which files have been changed, and not what the actual changes are, and since a lot of files have been changed, it would be a lot easier to view the file names only. Is there an option for diff that might do this, or does there exist a similar tool/command that could do the job?
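diff has a brief mode for this: adding -q to -r reports only which files differ (plus files that exist on one side only), without printing the changes. A minimal sketch, with a hypothetical wrapper name:

```shell
#!/bin/sh
# Sketch: list only the names of files that differ between two trees.
# -r recurses into subdirectories; -q (brief) suppresses the actual changes.
# Like diff itself, this returns exit status 1 when differences are found.
changed_files() {
    diff -rq "$1" "$2"
}

# Example: changed_files project.old project.new
```

Each differing pair is reported on one line naming both files, and files present in only one tree are reported with the directory they were found in.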
I've discovered that Dolphin seems to lose random files when copying many large folders.
I first noticed this a few months ago when I tried to copy my music library from one folder to another on the same HDD. It consisted of around 600 folders and 6500 files. During the copy there were no errors but after the copy I found that some of the newly copied folders were missing files. I put it down to human error or a glitch.
Yesterday I tried to copy 13 folders containing rips of some of my DVDs. Each folder basically had one film of either 700MB or 1.4GB. Again no errors showed up during the copy but I found 3 of the newly copied folders were empty.
It's not so critical with music or films but I can't afford to lose work data like this.
Has anyone experienced or seen a similar problem with Dolphin? I'm going to have to do some more extensive testing but this is not good.
The first time I noticed the problem I was running KDE4.3.4 (I think) and now the latest was with KDE4.4.0.
Just a few words by way of introduction. I have just purchased an older server to use on my home LAN. I understand that using Linux or Unix as the operating system will be effective in deterring the gathering of viruses from the internet. I am totally unfamiliar with either system, but if this is the case, that's about to change.
What do the numbers in brackets mean? (I tried looking, but I don't know how to start to search for the answer to that without being too vague). I've noticed they're nearly always progressive (increasing). Do they just refer to the event number? And in my log file viewer, why is there a dmesg and a dmesg.0?
I have a situation where a directory has about 1.5 million files in it. On an hourly basis, I want to be able to find any files that have changed in the last hour, compress them, encrypt them and then copy them to both a local backup machine and an off site backup.
Is there any kind of utility or kernel module that creates some type of log of modified files? I know I can use find, but the search for -mtime in this directory takes quite a while and will not suffice for an hourly backup.
2. I run the application & it creates a list of all files (size & time-stamp) without actually storing them. Let's call this the "snapshot list".
3. I update some of the files on the laptop.
4. Now I run the application & it only copies the files which have changed on the laptop, that have different size/time-stamp from the snapshot list, onto some external media, such as a memory card. Of course, the files should be copied onto their proper location in the directory tree & not just pile up in one place.
Why is this useful? Although the laptop has a 200 GB HD, I typically only update a small number of files, whose total size is maybe 10 MB or so. If I could back up only those which have changed, I could do this with a tiny SD card instead of lugging around an external USB HD.
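The snapshot-list steps above can be sketched in a few lines of shell. This is an illustration under assumptions rather than an existing application: the function names are hypothetical, GNU find (-printf) and GNU cp (--parents) are required, the destination must be given as an absolute path to an existing directory, and file names containing tabs or newlines are not handled.

```shell
#!/bin/sh
# Sketch of the snapshot-list idea.

# Print "relative/path<TAB>size<TAB>mtime" for every file under a directory,
# without storing any file contents.
take_snapshot() {
    (cd "$1" && find . -type f -printf '%P\t%s\t%T@\n' | sort)
}

# Copy files that are new, or whose size/time-stamp differ from the old
# snapshot, into dest_dir, recreating their place in the directory tree.
copy_changed() {
    src_dir=$1
    old_snapshot=$2
    dest_dir=$3
    new_snapshot=$(mktemp)
    take_snapshot "$src_dir" > "$new_snapshot"
    # comm -13 keeps the lines that appear only in the new snapshot.
    comm -13 "$old_snapshot" "$new_snapshot" | cut -f1 |
    while IFS= read -r rel_path; do
        (cd "$src_dir" && cp --parents "$rel_path" "$dest_dir")
    done
    rm -f "$new_snapshot"
}

# Example:
#   take_snapshot ~/work > ~/work.snapshot      # after the full backup
#   copy_changed ~/work ~/work.snapshot /media/sdcard
```

Deleted files are not tracked by this sketch; they would show up as lines present only in the old snapshot (comm -23).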