General :: Recursively Download An Entire Web Directory?
Feb 3, 2010
I have a web directory that has many folders and many subfolders containing files.
I need to download everything using wget or bash.
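A minimal sketch of a recursive wget download, assuming the directory is served as plain HTML index pages. The function name mirror_dir and the example URL are hypothetical; the wget options themselves are standard ones.

```shell
# mirror_dir URL [DEST]: recursively download everything below URL into DEST.
# Nothing is fetched until the function is called.
mirror_dir() {
    url=$1
    dest=${2:-.}
    # --recursive            follow the links found in the index pages
    # --level=inf            no depth limit (wget stops at 5 levels by default)
    # --no-parent            never climb above the starting directory
    # --no-host-directories  do not create a top-level folder named after the host
    # --reject               skip the auto-generated index pages themselves
    wget --recursive --level=inf --no-parent --no-host-directories \
         --reject 'index.html*' --directory-prefix="$dest" "$url"
}

# Example call (hypothetical URL):
# mirror_dir http://example.com/files/ ./files
```

The trailing slash on the URL matters: without it, --no-parent treats the last path component as a file and may wander into sibling directories.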
I moved to Mac OS X recently and bumped into the "feature" of Mac where copying files from an external drive resets the file modification/update date/timestamp to the current date (which Windows does not), causing a disaster for my 10+ years of backup work files where date is important. So, before I learned how to avoid that (e.g. using the -p "preserve" flag in the "cp" copy command) I have in the meantime added to my new Mac hard drive many more files as well as updating existing old files.
I have a backup external hard drive with all my old data and proper modification dates. I have a Mac hard drive with reset file modification dates (one or two particular days). The Mac hard drive has all the "true" and "current" file contents, with files modified and added. I need to copy all the original files from the external hard drive, preserving file metadata (really only the modified date), but ONLY overriding the new internal Mac hard drive IF
The file contents (md5 or whatever) are the same, or
The file was updated after the day (which of course I can see on all files) on which the original disastrous copy was performed (implying the file is new or modified).
Ensure the copy leaves all the new and modified files completely intact on the Mac internal hard drive.
No prompting/stopping of the copy of any kind (i.e., not verbose) is required, but it is o.k.
Recursive copy: obviously I would like to copy all files, folders and subfolders found in export.
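One way to read this requirement is: where a file on the Mac is byte-for-byte identical to its backup copy, give it the backup's modification time, and leave everything else alone. The sketch below does only that and copies no data at all. The function name restore_mtimes and the two directory arguments are hypothetical.

```shell
# restore_mtimes BACKUP TARGET
# For every regular file under BACKUP that also exists under TARGET with
# identical contents, copy the backup's modification time onto the TARGET
# file. Files that differ, or that exist only in TARGET, are not touched.
restore_mtimes() {
    backup=$1
    target=$2
    (cd "$backup" && find . -type f -print) | while IFS= read -r rel; do
        # cmp -s is silent and succeeds only when both files are identical.
        if [ -f "$target/$rel" ] && cmp -s "$backup/$rel" "$target/$rel"; then
            # touch -r sets the timestamps from a reference file.
            touch -r "$backup/$rel" "$target/$rel"
        fi
    done
}
```

Because the file contents are never overwritten, new and modified files on the internal drive stay exactly as they are. File names containing newlines are not handled by this loop.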
How do you instruct wget to recursively crawl a website and only download certain types of images? I tried using this to crawl a site and only download Jpeg images:
wget --no-parent --wait=10 --limit-rate=100K --recursive --accept=jpg,jpeg --no-directories http://somedomain/images/page1.html
However, even though page1.html contains hundreds of links to subpages, which themselves have direct links to images, wget reports things like "Removing subpage13.html since it should be rejected", and never downloads any images, since none are directly linked to from the starting page. I'm assuming this is because my --accept is being used to both direct the crawl and filter content to download, whereas I want it used only to direct the download of content. How can I make wget crawl all links, but only download files with certain extensions like *.jpeg?
EDIT: Also, some pages are dynamic, and are generated via a CGI script (e.g. img.cgi?fo9s0f989wefw90e). Even if I add cgi to my accept list (e.g. --accept=jpg,jpeg,html,cgi) these still always get rejected. Is there a way around this?
I have a really deep directory tree on my Linux box. I would like to count all of the files in that path, including all of the subdirectories.
For instance, given this directory tree:
If I pass in /home, I would like for it to return 4 files. Or, bonus points if it returns 4 files, 2 directories. Basically, I want the equivalent of right-clicking a folder on Windows and selecting properties and seeing how many files/folders are contained in that folder.
How can I most easily do this? I have a solution involving a Python script I wrote, but why isn't this as easy as running ls | wc or similar?
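Plain ls | wc only sees one directory level, which is why find is the usual tool here. A minimal sketch, assuming GNU or BSD find; the function name count_entries is hypothetical.

```shell
# count_entries DIR: print how many files and directories live below DIR.
count_entries() {
    dir=$1
    # -mindepth 1 keeps the starting directory itself out of the counts.
    # tr strips the padding that some wc implementations add.
    files=$(find "$dir" -mindepth 1 -type f | wc -l | tr -d ' ')
    dirs=$(find "$dir" -mindepth 1 -type d | wc -l | tr -d ' ')
    echo "$files files, $dirs directories"
}
```

Called as count_entries /home, it prints a line in the form "N files, M directories". Names that contain newlines would be counted more than once.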
I have a system where the permissions of many files are messed up. I have another system that has the same files. If I put that hard drive in, is there a way, without simply overwriting the files, to recursively set the permissions of each file to match those in the other directory?
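A sketch of one approach, assuming GNU chmod (its --reference option copies the mode of another file) and assuming both trees share the same relative layout. The function name copy_perms and its arguments are hypothetical.

```shell
# copy_perms REF TARGET
# Walk the reference tree and apply each entry's mode to the entry with the
# same relative path in the target tree. File contents are never copied.
copy_perms() {
    ref=$1
    tgt=$2
    (cd "$ref" && find . -print) | while IFS= read -r rel; do
        # Only touch paths that exist on both sides; skip symlinks, because
        # chmod would follow them to whatever they point at.
        if [ -e "$tgt/$rel" ] && [ ! -L "$tgt/$rel" ]; then
            chmod --reference="$ref/$rel" "$tgt/$rel"
        fi
    done
}
```

This copies permission bits only. Ownership would need a similar pass with chown --reference, run as root.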
How can I get the last time any of the files in a directory or its subdirectories has changed?
Dir - changed 1/1/1
Sub Dir 1 - changed 2/1/1
Sub Dir 2 - changed 3/1/1
File 1 - changed 10/1/1
File 2 - changed 5/1/1
The output for this for Dir should be 10/1/1 (File 1 was the last modified one). Getting the last file name to be modified is a bonus but isn't necessary.
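A minimal sketch using GNU find's -printf, which can print each file's modification time as a sortable number. It looks at regular files only, and the function name latest_change is hypothetical.

```shell
# latest_change DIR: print the path of the most recently modified file
# anywhere below DIR.
latest_change() {
    # %T@ = modification time in seconds since the epoch, %p = path.
    # LC_ALL=C makes sort treat "." as the decimal point.
    find "$1" -type f -printf '%T@\t%p\n' | LC_ALL=C sort -n | tail -n 1 | cut -f2-
}
```

To see the date as well, the result can be passed on, for example: stat -c '%y %n' "$(latest_change Dir)".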
What would the command for a recursive search in LDAP look like when I'm searching for "cn" or "ou"?
I'm on Linux. By default, other users can't read anything under my home directory. Let's say my home directory is /home/superman, and I tried to use
chmod +r /home/superman
to let others access files under my home directory, but it does not work.
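Read permission on a directory only allows listing the names in it; entering it and opening files inside also needs the execute (search) bit, and the change has to reach the subdirectories too. A minimal sketch, with the hypothetical function name share_readonly:

```shell
# share_readonly DIR: let other users browse and read everything under DIR.
share_readonly() {
    # r: read permission for others.
    # X: search permission on directories, and execute only on files that
    #    are already executable for someone.
    chmod -R o+rX "$1"
}

# Example call:
# share_readonly /home/superman
```

This opens the whole tree to every local user, so private subdirectories (for example ~/.ssh) may need to be tightened again afterwards.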
I am attempting to use the zip command with the '-x' option to exclude a folder e.g. 'zip public_html -x public_html/jquery/*'. However, parts of this folder are still being added to the archive. I made a shell script (saved as '' and ran as '.') to do the archiving so I could test adding nested wildcards for multiple subfolder levels.
rm -f
zip -r public_html
-x public_html/jquery
Each new line I added here that has the nested wildcards made the archive file size a bit smaller. Adding more /*'s than this didn't affect the file size. Even after all this though, there were still a couple megabytes of files and folders from the 'jquery' directory that were added to the archive.
Here's some examples of files and folders that were created after I unzipped the archive:
public_html/jquery/js/tablesorter/addons/pager/icons [folder]
public_html/jquery/js/tablesorter/addons/pager/.svn/entries [file]
public_html/jquery/js/tablesorter/build/.svn/text-base/js.jar.svn-base [file]
Why is it that despite all the -x lines, the files and folders like these were still being added to the archive? How can I simply recursively exclude the entire public_html/jquery folder from the archive?
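One likely cause is that an unquoted public_html/jquery/* is expanded by the shell before zip ever sees it, so zip receives a list of first-level names instead of a pattern (and the shell's * skips hidden entries such as .svn). Quoting the pattern lets zip match it itself, and zip's * also matches across "/". A sketch assuming Info-ZIP zip; the function name zip_without and the archive name are hypothetical.

```shell
# zip_without SRC_DIR ARCHIVE SUBDIR
# Archive SRC_DIR recursively, leaving out everything below SRC_DIR/SUBDIR.
zip_without() {
    src=$1
    archive=$2
    skip=$3
    # The pattern is quoted so that zip, not the shell, interprets the "*".
    zip -r -q "$archive" "$src" -x "$src/$skip/*"
}

# Example call (hypothetical archive name):
# zip_without public_html site.zip jquery
```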
I'm able to use the following to remove the target directory and, recursively, all of its subdirectories and contents:
find '/target/directory/' -type d -name '*' -print0 | xargs -0 rm -rf
However, I do not want the target directory to be removed. How can I remove just the files in the target, the subdirectories, and their contents?
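A minimal sketch, assuming GNU or BSD find: -mindepth 1 tells find to skip the starting directory itself, so only what is inside gets removed. The function name empty_dir is hypothetical.

```shell
# empty_dir DIR: delete everything inside DIR, hidden entries included,
# but keep DIR itself.
empty_dir() {
    # -delete processes children before their parent directory
    # (it implies -depth), so non-empty subdirectories are not a problem.
    find "$1" -mindepth 1 -delete
}

# Example call:
# empty_dir /target/directory
```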
I would like to overwrite files in a directory tree, recursively. The ones I would like to overwrite match the filename "x_alpha*.png" and have a size of exactly 456 bytes. Is there any way to search for these recursively in a directory tree, and overwrite them with a reference file, for example "e:mydirgood.png"?
I am using Windows 7, but I have UnxUtils, so I can use those too. What I am looking for is something like this, generated automatically:
copy /y e:mydirgood.png e:mydiracx_alpha0023.png
copy /y e:mydirgood.png e:mydirefgx_alpha0045.png
copy /y e:mydirgood.png e:mydirhx_alpha0248.png
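With a Unix-style find (UnxUtils ships one), the name and size tests can be combined and the copy run for each hit, so the list of copy commands does not have to be generated first. A sketch run from a POSIX shell; the function name overwrite_matching and its arguments are hypothetical, and under cmd.exe the quoting and which find is first on the PATH may differ.

```shell
# overwrite_matching ROOT GOOD_FILE
# Replace every x_alpha*.png of exactly 456 bytes below ROOT with GOOD_FILE.
overwrite_matching() {
    root=$1
    good=$2
    # -size 456c matches files of exactly 456 bytes ("c" means bytes).
    find "$root" -type f -name 'x_alpha*.png' -size 456c -exec cp "$good" {} \;
}
```

Replacing cp with echo cp in the -exec part gives a dry run that only prints what would be copied.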
So I have a bunch of directories:
which themselves all contain subdirectories:
dir1
subdir1
subdir2
etc.
and at the lowest level they contain all of these jpegs that I need. The problem is that I only need some of them. They're named like this:
I want to just grab the ones without the size suffix and copy them all to another set of folders, while preserving the directory structure. The numbering all starts at 1 for each low level subdirectory, so I think that the directory structure is the only way to not get them mixed up.
I know that cp has a recursive option -r but how do I just extract the ones without the underscore? And then how do I preserve the directory structure when I move them over?
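A sketch assuming the files end in .jpg, that the unwanted ones are exactly those with an underscore in the name, and that GNU cp is available (its --parents option recreates the source path under the destination). The function name copy_unsuffixed is hypothetical.

```shell
# copy_unsuffixed SRC DEST
# Copy every *.jpg without an underscore in its name from SRC to DEST,
# keeping the directory structure.
copy_unsuffixed() {
    src=$1
    mkdir -p "$2"
    dest=$(cd "$2" && pwd)    # absolute path, because we cd into SRC below
    (
        cd "$src" || exit 1
        # ! -name '*_*' drops every name that contains an underscore;
        # cp --parents rebuilds ./dir1/subdir1/... below DEST.
        find . -type f -name '*.jpg' ! -name '*_*' -exec cp --parents {} "$dest" \;
    )
}
```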
I would like to find and back up all *.mp4 files from /Pictures and its sub-directories and move them to a single directory on a remote server. I can find and move the files, but I don't want the directory structure, just the files placed in the remote directory.
To find my files I use
rsync -r -a -v -e "ssh -l user" --delete --include '*/' --include '*.mp4' --exclude '*' /home/drew/Pictures/ remoteserver:/Users/drew/mp4
but this creates all the subdirectories
I also tried
find ~/Pictures -name "*.mp4" -exec rsync -r -a -v -e "ssh -l user" --delete {} remote:/Users/drew/mp4 \;
This works but takes forever
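The find -exec variant is slow because it starts one rsync (and one ssh connection) per file. A sketch of an alternative that hands the whole list to a single rsync: --files-from reads the names, and --no-relative drops their directory part so everything lands flat in the destination. The function name flatten_sync is hypothetical, and --delete is left out on purpose.

```shell
# flatten_sync SRC DEST
# Send every *.mp4 below SRC to DEST in one rsync run, without subdirectories.
# DEST can be a local directory or something like remote:/Users/drew/mp4.
flatten_sync() {
    src=$1
    dest=$2
    # The file list is NUL-separated (-print0 / --from0), so unusual file
    # names survive. Paths in the list are relative to SRC.
    (cd "$src" && find . -type f -name '*.mp4' -print0) |
        rsync -a --from0 --files-from=- --no-relative "$src" "$dest"
}
```

Two files with the same name in different subdirectories would collide in the flat destination, so the last one transferred wins.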
I have the following content on the source directory:
I want to copy those files to a destination directory which, after the copy, shall look like this:
How can I do this? It seems that "cp" lacks such an option.
Installed Sidux over Lenny. Sidux didn't want to take my usual username, because a folder with that name existed in my home directory. So, I just mounted the home partition and changed the name of my home directory from shay to shay1. Don't know what that did or didn't do permission-wise to the files in my old home directory, but I've got a few unowned files floating around my home directory anyway that have been dragged in from old hard drives and such.
I'm facing a little trouble copying .txt files (only) from a directory and its subdirectories to another directory. I don't think the -R option will work for this, since I don't want to copy the subdirectories.
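A minimal sketch that lets find pick the .txt files and cp drop them all into one directory, assuming GNU cp (for the -t option). The function name gather_txt is hypothetical.

```shell
# gather_txt SRC DEST
# Copy every *.txt found anywhere below SRC straight into DEST,
# without recreating the subdirectories.
gather_txt() {
    mkdir -p "$2"
    # cp -t DIR names the destination first, so find can append many
    # source files at once with "{} +".
    find "$1" -type f -name '*.txt' -exec cp -t "$2" {} +
}
```

DEST should live outside SRC, otherwise find will also come across the copies. Files with the same name in different subdirectories overwrite each other.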
Is there a way to actually download the entire Fedora repo? I have a friend with dialup who cannot install big things, and I want to just download the repository for him. I know Ubuntu has some sort of way of doing this. I tried the following:
sudo yum --downloadonly --downloaddir=~/fedora_repo install *.i?86
but this seems to, aside from all sorts of conflicting packages, only get the things I don't have installed.
I need to invert the colors of a lot of images that are in different folders in the same directory. Is there a way to use ImageMagick or something to do this in only a few commands?
I am new to Linux and I need to extract a lot of compressed files in different formats (e.g. tar.gz, tar.gz2) which are in subdirs, and I do not want to go to each subdir and extract each file because it will take a lot of time. Is there a way to extract all files that exist in dirs and subdirs with a "for loop", or is there a script that can do the job automatically?
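A sketch that uses find plus a while loop rather than a plain for loop, so that it reaches every subdirectory. It assumes GNU tar, which detects the compression on its own, and it assumes the archives end in .tar.gz, .tgz or .tar.bz2. The function name extract_all is hypothetical.

```shell
# extract_all ROOT: unpack every tar archive found below ROOT, each one
# into the directory it sits in.
extract_all() {
    find "$1" -type f \( -name '*.tar.gz' -o -name '*.tgz' -o -name '*.tar.bz2' \) -print |
    while IFS= read -r archive; do
        # -C switches to the archive's own directory before extracting.
        tar -xf "$archive" -C "$(dirname "$archive")"
    done
}
```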
I did a clean install from Ubuntu 09.04 to 10.04 and restored my files from tar.
Everything worked fine until I tried my weekly rsync backup.
The permissions seemed to be causing problems, so I recursively changed all the permissions in my home directory:
~/Documents$ sudo chmod -R 644 /home/wolf/
[sudo] password for wolf:
chmod: cannot access '/home/wolf/.gvfs': Permission denied
So now all the directories and files have read permission for everyone:
~/Documents$ ls -A
ls: cannot open directory .: Permission denied
~/Documents$ sudo ls -lA
[sudo] password for wolf:
total 80
drw-r--r-- 2 wolf wolf 4096 2010-05-22 20:45 career
drw-r--r-- 23 wolf wolf 4096 2010-05-02 17:17 computer_languages
drw-r--r-- 2 wolf wolf 4096 2009-08-09 23:29 .ecryptfs
drw-r--r-- 21 wolf wolf 4096 2010-05-02 17:23 misc
-rw-r--r-- 1 wolf wolf 27298 2010-05-23 13:01 next.odt
drw-r--r-- 3 wolf wolf 4096 2010-05-23 15:46 PC_maintenance
drw-r--r-- 5 wolf wolf 4096 2010-05-08 01:43 software_projects
Now I can't even look at my own directory:
/home$ cd /home/
/home$ ls -lA
total 20
drwx------ 2 root root 16384 2010-05-07 01:01 lost+found
drw-r--r-- 42 wolf wolf 4096 2010-05-23 15:35 wolf
/home$ cd /home/wolf
bash: cd: /home/wolf: Permission denied
/home$ sudo cd /home/wolf
[sudo] password for wolf:
sudo: cd: command not found
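Mode 644 removes the execute (search) bit from directories, which is why they can no longer be entered; sudo cd fails for a different reason, namely that cd is a shell builtin and not a program. A sketch of a repair, assuming the goal is the common 755 for directories and 644 for files. The function name repair_modes is hypothetical.

```shell
# repair_modes DIR: give directories their search bit back.
repair_modes() {
    # u=rwX,go=rX: capital X adds execute only to directories (and to files
    # that still have an execute bit), so directories end up 755 and plain
    # files stay 644.
    chmod -R u=rwX,go=rX "$1"
}

# Example call:
# repair_modes /home/wolf
```

Scripts and programs that were executable before the chmod -R 644 have lost that bit and would need chmod +x again individually. Directories that should stay private (for example ~/.ssh) need tightening afterwards.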
How should a bash script look that copies a huge directory with multiple subfolders to a new place while checking the load and pausing for several seconds if the load reaches, let's say, 3 or 4? I only know the simple command cp -r /dir/allfiles /dir/newplace. However, I would like to copy over 30,000 files, which will cause a high load.
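A rough sketch of such a script: it copies file by file and, before each file, waits while the 1-minute load average (read from /proc/loadavg, so Linux only) is at or above a limit. The function name copy_throttled and the default values are hypothetical.

```shell
# copy_throttled SRC DEST [MAX_LOAD] [PAUSE_SECONDS]
copy_throttled() {
    src=$1
    dest=$2
    max_load=${3:-3}
    pause=${4:-5}
    (cd "$src" && find . -type f -print) | while IFS= read -r rel; do
        # The first field of /proc/loadavg is the 1-minute load average;
        # only its integer part is compared against the limit.
        while read -r load rest < /proc/loadavg && [ "${load%%.*}" -ge "$max_load" ]; do
            sleep "$pause"
        done
        mkdir -p "$dest/$(dirname "$rel")"
        cp -p "$src/$rel" "$dest/$rel"    # -p keeps mode and timestamps
    done
}

# Example call:
# copy_throttled /dir/allfiles /dir/newplace 3 5
```

Empty directories are not recreated, and file names containing newlines are not handled.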
I set up a samba server on my Linux box for the purpose of allowing everyone - and I mean everyone - on my LAN network to be able to put files to one folder... The only issue seems to be not having write permissions to the target folder.
Question, re-stated: How do I set the permissions for an entire directory to not require anyone to have a login? I have tried many things, such as "chown -aR /data/public", but I still cannot seem to find the magic words.
Somewhere there is a file lurking that contains the default print resolution, and it is not being overwritten by the printer settings or by CUPS management. I've asked on the CUPS forum, with no success.
So here's the question:
How can I configure grep to search recursively through all files in a directory, or if need be starting from root, to find the pattern "2880"? I've looked in the man page for grep and I can't see how to do it. Is grep the right tool to use for this?
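grep can do this by itself with -r. A minimal sketch, assuming GNU grep; the function name find_pattern is hypothetical, and /etc/cups in the example is only a guess at where to start looking.

```shell
# find_pattern PATTERN [DIR]: list every line containing PATTERN below DIR.
find_pattern() {
    # -r recurse into subdirectories, -n show line numbers,
    # -I skip binary files, -s stay quiet about unreadable files;
    # "--" protects patterns that start with a dash.
    grep -rnIs -- "$1" "${2:-.}"
}

# Example call:
# find_pattern 2880 /etc/cups
```

Starting from / works too but is slow and noisy; narrowing the search to configuration directories first is usually quicker.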
I have a directory: /var/www/html/something/
and it's got tons of files and directories, some containing hidden files.
I want to move all the contents of something including hidden files up to the site root at: /var/www/html/
What is the proper command for this?
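A plain mv something/* misses hidden files because the shell's * does not match names that start with a dot. A sketch that lets find select every direct child instead, assuming GNU mv (for the -t option). The function name move_contents_up is hypothetical.

```shell
# move_contents_up SRC DEST
# Move everything directly inside SRC, hidden entries included, into DEST.
move_contents_up() {
    src=$1
    dest=$2
    # -mindepth 1 -maxdepth 1 selects each direct child of SRC exactly once.
    find "$src" -mindepth 1 -maxdepth 1 -exec mv -t "$dest" {} +
}

# Example call, followed by removing the now empty directory:
# move_contents_up /var/www/html/something /var/www/html
# rmdir /var/www/html/something
```

Files in the destination that have the same name are overwritten by mv without asking.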
I want to download the Android developer guide from Google's site, but the site is blocked in my country. I want to use wget to download the entire Android dev guide through freedom (the proxy on port 8080 that I set in Firefox to open blocked sites). I use this command to download the entire site:
`wget -U "Mozilla/5.0 (X11; U; Linux i686; nl; rv:1.7.3) Gecko/20040916" -r -l 2 -A jpg,jpeg -nc --limit-rate=20K -w 4 --random-wait http_proxy -S -o AndroidDevGuide`
This should be a simple thing to accomplish, but I can't seem to figure it out. Essentially, I want to have a bash alias or function that will let me recursively grep the current directory. A while back I added this to my .bashrc:
alias rg="grep -r --exclude=*/.svn/* --exclude=*.swp"
This works fine, (and also ignores any svn and vim swp files), and I can call it like:
rg foo *
However, 99.999% of the time, I am only interested in searching in the current directory, so the "*" is a bit redundant. Also, I would say 5-10% of the time, I am typing faster than thinking and forget the "*", so grep just sits there trying to read from stdin. It's a pretty minor thing, but ideally I'd like to be able to just type:
rg foo
I've tried creating a function to handle this:
function rg(){
grep -r --exclude=*/.svn/* --exclude=*.swp $1 *
}
but it behaves exactly the same as the alias above. escaping the "*" with 's doesn't work, and neither does trying `pwd` (or even a hard-coded path) in its place.
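A possible explanation is that the old alias is still defined: an alias named rg is expanded before the function is looked up, so the function never runs. A sketch that removes the alias and searches "." instead of "*", assuming GNU grep (for --exclude-dir):

```shell
# Drop the old alias first, otherwise it keeps shadowing the function.
unalias rg 2>/dev/null || true

# rg PATTERN: recursive grep of the current directory, skipping .svn
# directories and vim swap files.
rg() {
    # "$@" passes the pattern (and any extra grep options) through intact;
    # "." means the current directory, hidden files included.
    grep -r --exclude-dir=.svn --exclude='*.swp' "$@" .
}
```

With this in .bashrc (and the alias line removed), rg foo is enough, and grep no longer waits on stdin.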
I want to find the maximum length file in a given directory. It should search recursively. I want this to be done using ls and simple looping constructs.
I have a directory with many subdirectories, each containing many files. I want to recursively find the 5 oldest files starting from the base directory, not 5 from each subdirectory. I am writing a shell script which lists them using ls -lRtur|egrep "txt|jpg" > /tmp/file1. Now I want to sort the entries in /tmp/file1 the way the ls -ltr command does, that is, from oldest file time to newest. How do I sort based on the Linux timestamp? The file names also have timestamps embedded in them, so I can sort after extracting those as well if that is easier.
My /tmp/file1 has entries like below.
-rw-rw-r--. 1 usr1 usr1 705 2010-01-22 17:25 sample20100603173659.jpg
I want to get the 5 oldest files and then delete them.
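Parsing ls -lR output is fragile because the directory part of each path is lost. A sketch that skips the intermediate file and lets GNU find print a sortable timestamp next to each full path; the function name oldest_files is hypothetical, and it matches by the .txt and .jpg extensions.

```shell
# oldest_files ROOT [COUNT]: print the COUNT oldest *.txt / *.jpg files
# below ROOT, oldest first (default: 5).
oldest_files() {
    root=$1
    count=${2:-5}
    # %T@ = modification time in seconds since the epoch, %p = full path.
    find "$root" -type f \( -name '*.txt' -o -name '*.jpg' \) -printf '%T@\t%p\n' |
        LC_ALL=C sort -n | head -n "$count" | cut -f2-
}

# Review the list first, then delete (GNU xargs):
# oldest_files /base/dir 5 | xargs -d '\n' rm --
```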
I have crashed my system. I have a lot of simulation software already configured and working on it. I'm thinking of doing a new installation of Ubuntu on a different partition and copying the whole directory tree there, basically replacing the new / directory tree with the old one that I have backed up. Can I do this, and will everything work OK?
I am new to Linux, running Fedora 13. If installing from source, how do I specify the directory into which I want to download the file? I have a Download directory set up in my home directory but nothing ever goes there and I spend all my time searching for the files I just downloaded. I obviously have no idea what I am doing.
The download pages don't provide me with a choice, that I can see. I usually end up doing yum install but then I don't really learn anything from the process.