Software :: Can't Replicate Site Using Wget
Jan 19, 2010
I want to replicate this small howto (http://legos.sourceforge.net/HOWTO) using wget. However, I only get a single file, not the other pages, and the file I do get is not HTML.
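A minimal recursive fetch along these lines often works for a small HOWTO; this is only a sketch, and it assumes the pages are ordinary static HTML under a HOWTO/ directory (the trailing slash is an assumption that keeps --no-parent anchored there):
Code:
# recurse, stay below the start page, convert links for offline viewing,
# pull page requisites, and add .html extensions where the server omits them
wget -r -np -k -p -E http://legos.sourceforge.net/HOWTO/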
View 4 Replies
I need to mirror a website. However, each of the links on the site's webpage is actually a 'submit' to a CGI script that produces the resulting page. AFAIK wget should fail on this, since it needs static links.
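wget only fetches URLs; it cannot submit forms on its own, although its --post-data option can send form fields directly if POST is required. If the CGI script also accepts its parameters in a GET query string, one workaround is to generate the URLs yourself; the script name and parameter below are purely hypothetical:
Code:
# loop over the page IDs the form would submit and fetch each result page
for id in $(seq 1 50); do
    wget "http://example.com/cgi-bin/show.cgi?page=$id"
done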
View 1 Replies View Related
I use this command to download: wget -m -k -H URL... but if some file can't be downloaded, wget retries it again and again. How can I skip that file and download the other files?
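By default wget retries a failing URL up to 20 times. Capping the retries (and the timeout) makes it give up on a bad file and move on; a sketch:
Code:
# -t 1: try each file only once; -T 30: give up on a stalled connection after 30 seconds
wget -m -k -H -t 1 -T 30 URL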
View 1 Replies View Related
I am trying to wget a site so that I can read it offline. I have tried
Code:
wget -m sitename
wget -r -np -l1 sitename
[code]....
These are the command-line switches I am using:
Code:
wget -p -k -e robots=off -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4' -r www.website.com
For some reason it downloads far too much and takes forever for a small website. It seems to be following a lot of the external links that the page linked to.
But when I tried:
Code:
wget -E -H -k -K -p www.website.com
It downloaded too little. How much depth should I use with -r? I just want to download a bunch of recipes for offline viewing while staying at a Greek mountain village. Also, I don't want to be a prick and keep experimenting on people's webpages.
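A middle ground that usually behaves: recurse a couple of levels, stay on the one host (external links are only followed when -H is given), and keep the page requisites so pages render offline. The depth and the wait below are guesses to adjust:
Code:
# two levels deep, never ascend past the start directory, convert links,
# grab images/CSS, add .html extensions, and wait 2 seconds between requests
wget -r -l 2 -np -p -k -E -w 2 www.website.com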
I was trying to download MOPSLinux from their Russian FTP server, using Firefox-->FlashGot-->KDE-Kget and it kept sitting there for about a minute, then popping up a dialog box asking for a Username & Password to access the FTP site.
I tried the usual anonymous type of login information combinations, to no avail; the box kept reappearing.
Finally for the heck of it, I tried Firefox-->FlashGot-->Wget and presto! It began downloading right away, no questions asked.
This is on Slack64 with the stock KDE installation + the KDE3 compat libs.
Here's the transfer currently going on in the Wget window:
Code:
How do you instruct wget to recursively crawl a website and download only certain types of images? I tried using this to crawl a site and download only JPEG images:
wget --no-parent --wait=10 --limit-rate=100K --recursive --accept=jpg,jpeg --no-directories http://somedomain/images/page1.html
However, even though page1.html contains hundreds of links to subpages, which themselves have direct links to images, wget reports things like "Removing subpage13.html since it should be rejected", and never downloads any images, since none are directly linked from the starting page. I'm assuming this is because my --accept is being used both to direct the crawl and to filter the content to download, whereas I want it used only to direct the download of content. How can I make wget crawl all links, but only download files with certain extensions like *.jpeg?
EDIT: Also, some pages are dynamic, and are generated via a CGI script (e.g. img.cgi?fo9s0f989wefw90e). Even if I add cgi to my accept list (e.g. --accept=jpg,jpeg,html,cgi) these still always get rejected. Is there a way around this?
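One common workaround is to let wget fetch the HTML (and CGI-generated) pages so it can follow their links, then throw the pages away afterwards. The filter list and the clean-up pattern below are assumptions; wildcard entries in --accept are treated as patterns rather than suffixes, which is what lets the ?query URLs through:
Code:
wget --no-parent --wait=10 --limit-rate=100K --recursive \
     --accept='jpg,jpeg,html,*.cgi*' --no-directories \
     http://somedomain/images/page1.html
# the .html files were only needed for link extraction
rm -f *.html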
I want to download the Android developer guide from Google's site, but code.google is blocked in my country. I want wget to go through the same proxy I set in Firefox for opening blocked sites (127.0.0.1, port 8080). I use this command to download the entire site:
Code:
wget -U "Mozilla/5.0 (X11; U; Linux i686; nl; rv:1.7.3) Gecko/20040916" -r -l 2 -A jpg,jpeg -nc --limit-rate=20K -w 4 --random-wait http://developer.android.com/guide/index.html http_proxy http://127.0.0.1:8080 -S -o AndroidDevGuide
[Code]....
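wget does not read Firefox's proxy settings; the proxy has to be given to wget itself, either through the http_proxy environment variable or with -e. A sketch, assuming the local proxy really listens on 127.0.0.1:8080 (the -A jpg,jpeg filter from the original command is dropped here, since it would discard the guide's HTML pages):
Code:
export http_proxy=http://127.0.0.1:8080
wget -U "Mozilla/5.0 (X11; U; Linux i686; nl; rv:1.7.3) Gecko/20040916" \
     -r -l 2 -nc --limit-rate=20K -w 4 --random-wait \
     -o AndroidDevGuide.log http://developer.android.com/guide/index.html
# or, without touching the environment:
#   wget -e use_proxy=yes -e http_proxy=http://127.0.0.1:8080 ...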
I want to do something similar to the following:
wget -e robots=off --no-clobber --no-parent --page-requisites -r --convert-links --restrict-file-names=windows somedomain.com/s/8/7b_arbor_day_foundation_program.html
However, the page I'm downloading has remote content from a domain other than somedomain.com. I was asked to download that content too. Is this possible with wget?
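Yes, provided wget is allowed to span hosts; restricting the spanned domains keeps it from wandering further. The second domain below is hypothetical, standing in for wherever the remote content actually lives:
Code:
wget -e robots=off --no-clobber --no-parent --page-requisites -r \
     --convert-links --restrict-file-names=windows \
     -H -D somedomain.com,cdn.example.com \
     somedomain.com/s/8/7b_arbor_day_foundation_program.html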
If a wget download is interrupted (for example, if I have to shut down prematurely), I get a wget.log with the partial download. How can I later resume the download using the data in wget.log? I have searched high and low (including the wget manual) and cannot find how to do this. Is it so obvious that I did not see it? The wget -c option with wget.log as its argument does not work. What I do instead is open wget.log, copy the URL, paste it into the command line, and run another wget. This works, but the download starts from the beginning, which means nothing in wget.log is used.
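For what it's worth, the partial data is not in wget.log (that is only the log output); it is in the partially written download file itself. Re-running wget with -c in the directory that holds that partial file continues it; a sketch with a placeholder URL:
Code:
# wget finds the partial file by name and asks the server for the missing bytes
wget -c http://example.com/big-file.iso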
View 2 Replies View Related
Currently my partner prints to my Epson Stylus Photo R265 from her XP system. I am attempting to convert her over to Lucid, but every attempt so far to replicate the printer settings has failed miserably. I had a look at Picasa, which at first view appeared to give me exactly what I wanted: an A4 print without any border. However, when I attempted to produce the final result it failed miserably; the quality was wishy-washy and there were print lines across the paper. I raised the matter on the Picasa forum, only to be informed that Picasa simply forwards the print job to the printer. The responder assumed this went via CUPS and suggested I look at the print settings in Ubuntu.
When I access the printer via the System settings, all it gives me is connectivity details. Where should I be looking for the settings where I can set the printing details? I am aiming for the best photo quality on premium glossy paper, printing without borders. Is this possible to achieve? I have been round the houses with F-Spot, PhotoPrint, Shotwell, etc. Please don't mention GIMP, as it is well over my head. Surely there must be some way of setting print quality within Ubuntu. I feel I am so near to my aim and would like to achieve it, as even she who must be obeyed is getting cheesed off with Windows.
I have a file containing text. I want to replicate a specific field. For example, I might want to append a copy of the second word of each line to the end of that line.
Have:
Once upon a midnight dreary, while I pondered weak and weary,
Over many a quaint and curious volume of forgotten lore,
Want:
Once upon a midnight dreary, while I pondered weak and weary, upon
Over many a quaint and curious volume of forgotten lore, many
Is there a Linux command which will do this? I am looking for a basic command, not awk or Perl, because I haven't learned those yet.
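A sketch that avoids awk and Perl: cut pulls out the second space-separated word of each line and paste glues it back onto the end of that line. It assumes single spaces between words, a bash shell (for the <(...) syntax), and a hypothetical file name:
Code:
paste -d' ' file.txt <(cut -d' ' -f2 file.txt)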
View 14 Replies View RelatedI have 2 servers, 1 running a web server on 1 IP address,the other server running a proxy on a different IP address. I intend to change and update the ncsa file with usernames and passwords from a mysql table.Is there a way to replicate 1 table from the webserver to the proxy server databases and then run a cronjob? The cron will simply delete unused usernames and add new ones.Is there a better way of doing this? I don't want to run the webserver on the proxy server for security and also for performance.
View 2 Replies View Related
I'm pretty sure the hard disk on my FC5 system (tells you how old it is!) is failing. It's a 500GB drive, and I have a second hard disk that's 1.5TB (if I remember right) as a secondary. I would like to (ideally) just migrate everything -- settings, MBR, OS, home dirs, etc., everything -- to a different drive, take out the one that's on the fritz, and reboot the system with everything as it was... just a new hard drive.
Alternatively, I figure I could remove the old drive, move the secondary one into place as the primary, reformat it (assuming there's no data I need), install a fresh FC version, and from there migrate data from the old drive as needed. I've done this latter method before, and assuming the hard drive doesn't totally go, it should work. But I'd really rather somehow have a "mirror copy" of the 500GB drive that seamlessly replaces what's currently there.
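A raw block copy gives exactly that kind of mirror, MBR, partitions and all, since the destination is larger than the source. A sketch with hypothetical device names that must be double-checked before running (getting if= and of= backwards destroys the good disk):
Code:
# /dev/sda = failing 500GB disk, /dev/sdb = 1.5TB destination (assumed!)
dd if=/dev/sda of=/dev/sdb bs=4M conv=noerror,sync
# ddrescue (package gddrescue) usually copes better with a disk that is
# already throwing read errors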
I am trying to replicate my GnuCash files between two computers. My GnuCash folder has three types of files:
* One Gzip archive file
* Many xac files (GnuCash Financial Data)
* Many log files
The xac and log files replicate with no problem. However, the Gzip file does not. Without this Gzip file, GnuCash cannot make sense of the other files. Why would this Gzip file not sync? Have others used Ubuntu One to sync their GnuCash files?
I have Ubuntu installed on my hard disk.
Now I want the same installation, with all its packages, replicated onto a pen drive,
so that I can use Ubuntu on other PCs from a bootable pen drive with all my packages installed.
I don't want to create an installation USB,
just a USB stick that contains my full hard-drive installation and can run on any system that supports booting from USB.
My filesystem is 5.5 GB and the pen drive is 8 GB.
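Roughly, that means copying the root filesystem onto the stick and installing a boot loader on it. A compressed sketch with hypothetical device names; the GRUB step and the /etc/fstab UUID edit vary by release:
Code:
# assume the pen drive is /dev/sdb with one ext3/ext4 partition
sudo mount /dev/sdb1 /mnt/usb
# copy the installed system, skipping virtual and removable filesystems
sudo rsync -aAX / /mnt/usb --exclude="/dev/*" --exclude="/proc/*" \
     --exclude="/sys/*" --exclude="/tmp/*" --exclude="/mnt/*" --exclude="/media/*"
# put GRUB on the stick so it boots on its own
sudo grub-install --root-directory=/mnt/usb /dev/sdb
# finally edit /mnt/usb/etc/fstab so / points at the stick's UUID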
I would like to know if there is a way to replicate the contents of the GNOME Applications menu in FVWM (without running GNOME), because it was very convenient to have and I don't know the command-line names of some of the programs I need. I would especially like to have it soon, because I got VirtualBox and I don't know of any way to start it other than from that menu.
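FVWM ships a helper, fvwm-menu-desktop, that builds an FVWM menu from the same .desktop entries the GNOME menu is generated from; its options and the name of the generated menu vary by version, so treat this as a sketch. (For the immediate problem, VirtualBox can be started from a terminal with the command VirtualBox.)
Code:
# see what menu definitions the helper produces
fvwm-menu-desktop | less
# then have FVWM load them at startup by adding to ~/.fvwm2rc:
#   PipeRead 'fvwm-menu-desktop'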
View 10 Replies View Related
I have two Ubuntu machines (9.10 and 10.04) with an OpenVPN tunnel between them. This is the situation:
Code:
NetworkA 192.168.0.0/24
|
UbuntuA br0:192.168.0.3 (openvpn bridge between eth0 and tap0)
[code].....
UbuntuA has only one interface, eth0, and runs two OpenVPN instances: a bridged instance using br0 and another instance using tun0.
UbuntuA is not the gateway for NetworkA. UbuntuB is the gateway for NetworkB. I need the PCs on NetworkB to communicate with those on NetworkA. This is the "ping situation" (none of the PCs tested has an active firewall):
ubuntuA vs ubuntuB: OK
ubuntuB vs ubuntuA: OK
pc on NetworkA vs ubuntuA and ubuntuB: OK
[code].....
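Since UbuntuA is not NetworkA's gateway, hosts on NetworkA do not know that NetworkB is reachable through 192.168.0.3, so their replies go to their default gateway and are lost. A sketch of the usual missing pieces; NetworkB's subnet is assumed to be 192.168.1.0/24, since it is not shown above:
Code:
# on UbuntuA: make sure forwarding is enabled
sysctl -w net.ipv4.ip_forward=1
# on each NetworkA host, or better on NetworkA's real gateway:
ip route add 192.168.1.0/24 via 192.168.0.3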
I've been on a quest to enable full routing through my OpenVPN tunnel between my office and the colo. Masquerading works, but it throws off anything key-based and makes a lot of things more difficult and vague in general. Is there an easy way to do this via iptables? I tried Quagga, hoping it would magically solve my problems, but it does not seem to do my routing for me; I just set a basic static route within zebra...
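For OpenVPN specifically, the usual way to get two-way routing without NAT is to tell the server which LAN sits behind which client (iroute), push the routes, and let iptables simply forward; hosts on each LAN still need a route to the far subnet via their local VPN gateway unless that gateway is already their default route. A sketch with hypothetical subnets (office 192.168.10.0/24 behind the client, colo 192.168.20.0/24 behind the server):
Code:
# server.conf at the colo
route 192.168.10.0 255.255.255.0          # kernel route back through the tunnel
push "route 192.168.20.0 255.255.255.0"   # tell the office client about the colo LAN
client-config-dir ccd
# ccd/office-client
iroute 192.168.10.0 255.255.255.0         # map the office LAN to that client inside OpenVPN
# on both gateways, forward instead of masquerading
iptables -A FORWARD -i tun0 -j ACCEPT
iptables -A FORWARD -o tun0 -j ACCEPT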
View 3 Replies View Related
I have three locations: a central office connected to two remote locations. At the central office I run two site-to-site VPNs on a Cisco ASA 5505. The remote end of the first site is a Check Point firewall, and the remote end of the second site is racoon on Debian. Both sites are up and working. However, while traffic at the first site flows both ways, the second site only works from the central office to the remote office.
For example, I can ssh from a host in the central office to a host at the first remote site (through the Check Point firewall), then ssh back from that host to any host in the central office. In contrast, after I ssh from a host in the central office to a host at the second remote office (through racoon), I cannot see the central office hosts (pinging the IP address of a central office host, ssh, etc. all fail). The VPN settings at the central office (the Cisco ASA 5505) are identical for both sites. So it seems to me that some routing magic is missing on the host running racoon at the second remote office. Where would such a setting reside? The racoon config files? iptables?
Maybe a site-to-site Ouija board connection.
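With ipsec-tools/racoon the traffic selectors live in the kernel SPD, and a missing or one-sided policy gives exactly this symptom, so /etc/ipsec-tools.conf on the racoon box is worth checking for both directions. A sketch with hypothetical subnets and public addresses (central LAN 10.0.1.0/24, remote LAN 10.0.2.0/24, ASA at 198.51.100.1, racoon box at 203.0.113.2):
Code:
spdadd 10.0.2.0/24 10.0.1.0/24 any -P out ipsec esp/tunnel/203.0.113.2-198.51.100.1/require;
spdadd 10.0.1.0/24 10.0.2.0/24 any -P in  ipsec esp/tunnel/198.51.100.1-203.0.113.2/require;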
View 5 Replies View Related
In the office there is a local network with a samba+openldap PDC. The local domain name is company.net. The company decided to create a corporate website on remote hosting and decided that the site's domain should be company.net, the same as the local network's domain name. So now it is not possible to reach the corporate website from within the company's local network because, as I guess, the bind9 installed on the above-mentioned PDC looks for company.net on a local web server. Is there a way to let people on this local network browse the remote site?
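Yes: since BIND on the PDC is authoritative for company.net inside the LAN, it just needs a record that points the website's name at the hosting provider's public address (then bump the zone serial and run rndc reload). A sketch for the internal zone file with a placeholder IP:
Code:
; internal company.net zone (the IP address is hypothetical)
www     IN      A       203.0.113.10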
View 1 Replies View RelatedI'm typing this on my linux laptop, at work. My Firefox works fine, but I cannot apt-get, or wget anything. To get my Firefox to work, I just went into the Firefox preferences, checked "Automatic proxy configuration URL" and entered the url that I have. Now Firefox works fine, but the rest of my system does not.o be a similar setting in System>Preferences>Network Proxy. There is check box for "Automatic proxy configuration" and a field for a "Autoconfiguration URL". I put the same URL that put into Firefox here and told it to apply it system-wide, but my apt still does not work. This is a big deal because I need to install software and I really don't want to start manually downloading packages, plus I need ssh.
I have googled extensively on how to get apt to work from behind a proxy, but nothing seems to be working. I don't have a specific proxy server and port; rather I have some kind of autoconfiguration URL. Plus, my system has no /etc/apt.conf file at all. Any ideas on how I can get my system to be able to access the internet? It's very strange to me that Firefox can, but apt, ping, wget, etc cannot.
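apt and wget cannot read a PAC autoconfiguration file; they need the actual proxy host and port, which can usually be read out of the PAC file itself (it is just JavaScript returning something like "PROXY host:port"). A sketch, assuming the PAC file is reachable directly and with placeholder host and port values:
Code:
# look inside the PAC file Firefox uses to find the real proxy
wget -O - http://example.com/proxy.pac
# point the command-line tools at it
export http_proxy=http://proxyhost:8080
export https_proxy=http://proxyhost:8080
# for apt, create /etc/apt/apt.conf containing:
#   Acquire::http::Proxy "http://proxyhost:8080";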
I am trying to set up a cool effect where GNOME Scheduler uses wget to download this image every three hours. However, even when I do it manually in the terminal, it doesn't seem to download correctly. When I go to open the .jpg, a big red bar at the top says "Could not load image '1600.jpg'. Error interpreting JPEG image file (Not a JPEG file: starts with 0x47 0x49)".
However, when I go to the picture in the link above and right click "Save Image As" it downloads it fine.
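The bytes 0x47 0x49 are ASCII "GI", the start of a GIF signature, so wget is saving something other than the expected JPEG; many image hosts return a placeholder when the Referer or User-Agent header is missing. A couple of hedged checks, with placeholder values:
Code:
# see what was actually downloaded
file 1600.jpg
# retry while pretending to come from the page, the way the browser does
wget --referer='http://example.com/page.html' -U 'Mozilla/5.0' \
     -O 1600.jpg 'http://example.com/1600.jpg'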
I'm currently using wget to keep a running mirror of another site but I don't have much space locally. I was wondering if there was a way to turn on -N (timestamping) so that only the "updates" were retrieved (i.e. new/modified files) without hosting a local mirror.
Does -N take a timestamp parameter that will pull any new/modified files after "x"?
It seems like a waste to compare remote file headers against a timestamp without presenting the option of supplying that timestamp. Supplying a timestamp would allow me to not keep a local mirror and still pull updates that occurred after the desired timestamp.
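wget's -N has no such parameter; it only compares against an existing local file. curl, however, can send an If-Modified-Since header for an arbitrary date via --time-cond, which matches what is being asked for; a sketch with a placeholder URL:
Code:
# only download if the remote file changed after the given date
curl --remote-name --time-cond "2010-06-01" http://example.com/file.tar.gz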
Like the subject says, I'm looking for a wget-able 11.04 Live CD URL. This URL works great with point-and-click but doesn't tell me what the direct URL is to use with wget. [URL]
View 1 Replies View Related
I did a yum remove openldap and apparently it trashed yum and wget. How can I get them back now?
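yum remove openldap typically drags out the packages that depend on it, which is presumably how yum and wget went with it. If rpm itself still works, the usual way back is to fetch the removed RPMs for your release and architecture on another machine (or with curl) and reinstall them by hand; the file names below are illustrative:
Code:
rpm -Uvh --replacepkgs openldap-*.rpm wget-*.rpm yum-*.rpm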
View 5 Replies View Related
If I have an address, say [URL], and I want to run n wgets on it, how can I do this? I'm asking because I want to check how wget caches DNS. The manual (describing the --no-dns-cache option) says: "Turn off caching of DNS lookups. Normally, Wget remembers the IP addresses it looked up from DNS so it doesn't have to repeatedly contact the DNS server for the same (typically small) set of hosts it retrieves from. This cache exists in memory only; a new Wget run will contact DNS again."
The last part confuses me. "a new Wget run will contact DNS again." This means if I run a for-loop to call wget on an address, it will just make a new call to DNS every time. How do I avoid this?
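The in-memory cache only helps within one wget process, so the trick is to make it one process: hand a single run all the URLs, repeated in a file, instead of looping; a caching resolver such as nscd or dnsmasq is what caches across separate runs. A sketch with a placeholder address:
Code:
# build a list of n identical URLs and give them to one wget run,
# which resolves the host once and reuses the cached answer
for i in $(seq 1 100); do echo 'http://example.com/'; done > urls.txt
wget -i urls.txt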
I am writing a bash script where I need to download a few files from a server, but the glitch is that authentication is performed by an SSO/SiteMinder server.
Is anyone aware of an option or trick with wget or curl to authenticate against the SSO and then download the files from the server?
The standard --http-user and --http-password options definitely do not suffice.
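SiteMinder normally expects a form login that sets session cookies, so the usual trick is to do that POST once and replay the cookies on the real download. A sketch; the login URL and form field names are assumptions that have to be taken from the actual login page:
Code:
# log in, keeping the session cookies the SSO sets
wget --save-cookies sso-cookies.txt --keep-session-cookies \
     --post-data 'USER=myuser&PASSWORD=mypass&target=/protected/' \
     -O /dev/null https://sso.example.com/login.fcc
# reuse the cookies for the file itself
wget --load-cookies sso-cookies.txt https://files.example.com/path/report.zip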
For some reason, some command-line tools are unable to resolve URLs, whereas others work as they should. I have checked most settings but cannot find out what is wrong and am no closer to figuring out what or why.
[root@subzero ~]# yum update
Loaded plugins: refresh-packagekit
[URL]: [Errno 4] IOError: <urlopen error (-2, 'Name or service not known')>
Trying other mirror.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: atrpms. Please verify its path and try again
[root@subzero ~]# .....
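A few quick checks help separate a resolver problem from a yum-specific or proxy problem; the proxy line at the end is only relevant, and only a placeholder, if the box actually sits behind one:
Code:
# does the box have working nameservers at all?
cat /etc/resolv.conf
getent hosts mirrors.fedoraproject.org
# if lookups work but only yum/wget fail, a proxy may be needed;
# for yum, add to /etc/yum.conf:   proxy=http://proxyhost:8080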