Programming :: Cannot Extract Webpage For Processing
Aug 25, 2010
I am trying to extract a web page via Google for processing. I am able to create a proper query and test it by cutting and pasting it into the address bar of my Firefox browser.
When I attempt to extract the page with wget:
wget -O - -q "$query"
I do not see the information that is present when I use the browser.
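A common cause (an assumption here, since the post shows no error output) is that Google serves different content to wget's default user agent than to a real browser. A minimal sketch that sends a browser-like User-Agent:
Code:
wget -O - -q --user-agent="Mozilla/5.0 (X11; Linux x86_64) Firefox/3.6" "$query"
If the output still differs, the page may be assembled by JavaScript, which wget never executes.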
I want to download a webpage to extract information. This used to work with other pages, but with this particular page [URL], I get the following error.
I'm attempting to learn Processing, the language used on the Arduino, but I am having a hard time finding resources because of the poorly chosen name. What I am trying to do is open a pipe to another process/program to use as input to the Processing plot program I am writing. I found processing.org, but that site is not very helpful. There are plenty of examples of reading from an Arduino over the serial port, but I want to read from a program running on my laptop; both the plot program and the data accumulator run on the same PC.
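One possible bridge (an assumption, not something from the post; the port and program name are placeholders): expose the data producer on a local TCP port with netcat, which a Processing sketch can then read with the Client class from its built-in net library:
Code:
# serve the accumulator's output on localhost:5204 (traditional netcat
# syntax; some builds want "nc -l 5204" without -p)
./data_accumulator | nc -l -p 5204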
I am looking for some suggestions, if possible, regarding processing files using a Perl script. The scenario is that I have a location where new files are always being added, and I need to process these files for some validation. I wrote a Perl script to do this, and my first thought was to rename the files once they are processed, so that I don't process the same files again. But now I can't rename the files due to some restrictions. My second thought was to process them based on the date stamp, but since my Perl script is automated and runs every hour to process the files, I can't go by date stamp either.
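One language-agnostic approach (a sketch, not from the post; the validate command and paths are hypothetical) is to keep a sidecar list of already-processed file names and skip anything on it, so nothing ever needs renaming:
Code:
touch processed.list
for f in /incoming/*; do
    grep -qxF "$f" processed.list && continue   # already handled on a previous run
    validate "$f" && echo "$f" >> processed.list
done
The same idea ports directly to Perl with a hash loaded from the list file.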
New to Fedora (from Windows), I am up and running OK with packages from the repository, but only half OK with Processing, the Java graphics programming front end from processing.org. Their download gave me a .tgz file, which Package Manager extracted for me into a location of my choice, where there is now a "processing" shell script. This works OK and I have managed to create a launcher on the desktop. That starts OK, but always with Processing's default action of giving you a new and automatically named work file. In Windows, an existing Processing file (a .pde file) could be "opened with" Processing. Trying to do something similar in Fedora, I find that I am expected to nominate an application to open with, but Processing has not installed as an application. I guess the question is: how do I promote Processing to be an application? Or is there a different approach?
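One way (a sketch; the install path is an assumption) is to give the extracted script a .desktop entry, which is what makes something show up as an application in the desktop's Open With dialog:
Code:
cat > ~/.local/share/applications/processing.desktop <<'EOF'
[Desktop Entry]
Type=Application
Name=Processing
Exec=/home/me/processing/processing %f
Terminal=false
Categories=Development;
EOF
The %f passes the selected .pde file as an argument, assuming the processing launcher accepts a sketch path on its command line.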
I need to stream a webpage into my application, and I tried something like this but I get segmentation faults. Is there any example in C and/or GTK that I can peek at?
I use the loop below to process each file (listed in a text file) with a piece of software. During processing, the software asks me to enter a value, then continues processing that file after I enter it. I have those values stored in a text file, "myfile". What I want is to get the values directly from myfile when the software asks "please enter the title:"; I don't want to enter them all manually. But I could not figure out how to code this in a Bash script.
Code:
for ((i=1; i<=$NR; i++)); do
    # --command of the software comes here--
done
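A sketch of one way to do it (the software name and file list are placeholders), assuming the program reads its "title" answer from standard input: feed it the matching line of myfile as each file is processed:
Code:
i=1
while read -r file; do
    title=$(sed -n "${i}p" myfile)        # i-th stored value
    printf '%s\n' "$title" | the_software "$file"
    i=$((i+1))
done < filelist.txt
If the program insists on reading from the terminal rather than stdin, a tool like expect would be needed instead.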
I'm working on a simple data processing script. My script uses a loop with getline to check the value on the next line and decide whether it's time to terminate the loop. This works dandy, but the problem is that getline eats that line, which then isn't processed by the rules in the remainder of the script (even though I want it to be). To illustrate what I mean, consider this simple gawk script:
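A sketch of the usual workaround (not the poster's script, which isn't shown; "END-MARKER" and the print rule are stand-ins): instead of peeking ahead with getline, lag one line behind, so every line still flows through the main rules:
Code:
gawk '
    NR > 1             { print "processing:", prev }  # prev still hits the rules
    $0 == "END-MARKER" { exit }                       # the "next" line says stop
    { prev = $0 }                                     # buffer for the next pass
' data.txt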
I am attempting to "export" the progress bar from the wget display using sed. Basically, we have an app that starts wget to download a large file, and we want to show a progress bar. Our application has a D-Bus interface to receive the download progress.
So we were thinking of a command like: wget [] | sed [] | dbus-send []. The problem at the moment is: how do you get the matched string out of sed and into dbus-send? I can get the progress string with: sed -u 's/[0-9]*%/&/'
This populates '&' with the correct percentage, but I cannot seem to get it out of sed.
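A sketch of one way to wire it up (the D-Bus object path and interface are placeholders): wget writes its progress to stderr using carriage returns rather than newlines, so translate those first, extract just the percentage, and issue one dbus-send per update:
Code:
wget "$url" 2>&1 | tr '\r' '\n' \
  | sed -un 's/.*[^0-9]\([0-9]\{1,3\}\)%.*/\1/p' \
  | while read -r pct; do
        dbus-send --session --type=signal /com/example/Progress \
            com.example.Progress.Update int32:"$pct"
    done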
I want to do a "gallery" on a webpage, i.e. a series of thumbnails which, when clicked, will pop up a new window with a bigger image of that same thing (with more details about it, etc.). How do I do that? (I think it's something to do with "window.open" in JavaScript, but how do I use that? Can I do it with <a href="something" target=new>?)
The software Nagios uses .cgi files to show a lot of things: services, hosts, etc. Is there any way to pick up those .cgi files and embed them in another web page? How would I do that?
Does anyone know a method of processing the complete and literal command line passed to a shell script? I want to have the command line parameters with ALL characters (including metacharacters, e.g. $) kept literal,
as if there were no shell to substitute or expand parameters, nor apply its quoting rules.
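Strictly speaking this isn't possible: the calling shell expands and strips quotes before the script ever runs, so the raw command line is gone by then. A sketch of the closest idiom, "$@", which preserves each argument byte-for-byte as it arrived after expansion:
Code:
#!/bin/bash
# prints each argument exactly as received (post-expansion)
printf 'arg: %s\n' "$@"
Invoked as ./show-args '$HOME' '*', it prints the literal $HOME and *, because the single quotes stop the caller's shell from expanding them.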
I would like to read Unix file permissions into a bash array for processing, but to be honest I have no idea how to do this. Then I will check each individual access character: l, d, x, etc.
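A sketch (the file name is a placeholder): take the 10-character mode string from ls -ld and split it one character per array element:
Code:
perm=$(ls -ld "$file" | cut -c1-10)    # e.g. drwxr-xr-x
bits=()
for ((i=0; i<${#perm}; i++)); do
    bits[i]=${perm:i:1}                # one flag character per element
done
[ "${bits[0]}" = "d" ] && echo "directory"
[ "${bits[3]}" = "x" ] && echo "owner-executable"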
I have a Python script that copies a couple of DLLs and an EXE to a directory before running the EXE. It can be a fresh copy, or the files can already be in the target directory and are then overwritten. The script uses shutil.copy() to copy the files, and that works, but processing continues while the files are copying and the script tries to run the files mid-copy, causing an error.
I need a way to wait for the files to finish copying before the script continues. Putting the thread to sleep isn't good enough, calling os.system("copy ...") also doesn't work, and using os.path.exists() won't help because the file already exists during the copy.
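For what it's worth, shutil.copy() itself normally blocks until the copy finishes, so if errors persist the copying may be happening elsewhere; the polling below works either way. A sketch (file names and paths are placeholders): wait until each target exists and its size stops changing between two polls, then launch the EXE:
Code:
import os
import shutil
import subprocess
import time

def wait_until_stable(path, interval=0.5):
    """Block until path exists and its size is unchanged across one interval."""
    last = -1
    while True:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1
        if size >= 0 and size == last:
            return
        last = size
        time.sleep(interval)

for name in ("a.dll", "b.dll", "app.exe"):      # hypothetical names
    shutil.copy(name, "target")
    wait_until_stable(os.path.join("target", name))

subprocess.call(os.path.join("target", "app.exe"))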
When the "Submit" button is clicked on a form in the webpage, I'd like the background tiled image to be changed to another one (downloaded from the server and "activated"), and the logo that I have there also replaced by another one, which will also have to be downloaded from the server.
How do I make a site like this one, LinuxQuestions itself? It puts a thin line around each post to demarcate it, and for the website I'm building I need exactly this functionality. Do I have to use the "gd" library?
I want to write a Bourne shell script that will find and replace, in the code of any web page (.htm file), the name of the tied folder in which its pictures, .css, .js and other files have been saved. A web browser creates this folder when we save a web page completely; it has the same name as the web page, with the ending '_files'. I have many web pages whose folder names are incorrect, so of course my web browser shows these web pages without pictures. I can count the number of web pages in a folder (/path) as needed:
1) find /path -type f -name "*.htm*" -print | grep -c .htm or find /path -type f -name "*.htm" | wc -l. I can get a list of web pages with:
2) ls /path/*.htm > out-list. But I don't know how to assign the value from out-list (2), or the result of the commands in pipeline (1), to a variable. Then I want to do the following:
3)
var="1"   # where variable 'list' is the number of web pages
while [ $var -le $list ]
[code]....
4) Assign the 1st (then 2nd, etc.) value from out-list (2) to the variable 'webfile': sed -n $var,+0p out-list
5) Find the first string value '_files' in the 'webfile': grep -m1 _files $webfile
6) For example, 'abracadabra_files' is an incorrect folder in the 'webfile'. I must find the start and end positions of 'abracadabra' without the ending '_files', "cut" out the name of the incorrect folder, and assign it to the variable 'finder': finder='abracadabra'. BTW, the name of a folder before '_files' sits between '="' and '_files' in any web page's code.
7) foldernew=$webfile (without '.htm'); 'foldernew' is then equal to the name of the tied folder, without the ending '_files', in the folder '/path'.
8) Find and replace in the 'webfile' and save the result in 'webfile-out': sed "s/$finder/$foldernew/g" $webfile > $webfile-out
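Putting the steps together, a sketch under the poster's stated assumptions (the correct folder is always <page-name>_files, and the wrong name appears after '="' in the markup; note grep -o is a GNU extension):
Code:
#!/bin/sh
for webfile in /path/*.htm; do
    foldernew=$(basename "$webfile" .htm)
    # first referenced folder name, taken from ="..._files, minus the suffix
    finder=$(grep -o '="[^"]*_files' "$webfile" | head -n1 \
             | sed 's/^="//; s/_files$//')
    [ -z "$finder" ] && continue              # page references no saved folder
    [ "$finder" = "$foldernew" ] && continue  # already correct
    sed "s|${finder}_files|${foldernew}_files|g" "$webfile" > "${webfile}-out"
done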
I have a few questions regarding HTML, UNIX and JavaScript. I've been tasked with creating a fairly simple webpage that takes a few inputs. Each input must correspond to an argument in a UNIX command running on a server. On a UNIX server we have a script (.ksh) that takes 3 arguments. The result of the script is a data file which is FTP'ed to an external server. Let's forget about the FTP portion for now. I would like to know where I should begin. What I know so far:
1) I will need HTML to create the webpage. Skill level is high.
2) I will need JavaScript to make my webpage more interactive. Skill level is high.
3) I will need to understand the UNIX environment. Skill level is high.
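The missing piece is CGI: the web server runs the existing script when the form is submitted. A rough sketch (the script path and parameter names are assumptions, and the query-string handling below does no URL decoding):
Code:
#!/bin/ksh
# save in the web server's cgi-bin; the form's action points here, method GET
echo "Content-type: text/plain"
echo ""
# QUERY_STRING arrives as a=one&b=two&c=three
IFS='&'
set -- $QUERY_STRING
arg1=${1#*=}; arg2=${2#*=}; arg3=${3#*=}
/path/to/existing_script.ksh "$arg1" "$arg2" "$arg3"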
I need to extract a price from a string; this may vary in the future, so it may be 12.99 or 14.99. I thought a sed command might crack it, and I need to write the result to a file. Given <td><b class="priceLarge">?6.99</b>, I need to extract the price 6.99 (with no ?), so: extract anything between "> and </b> and write it to a file such as tmp1.txt.
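A sketch (the input line is hard-coded here for illustration): capture just the digits and decimal point, skipping whatever currency character precedes them, and append to the file:
Code:
echo '<td><b class="priceLarge">?6.99</b>' \
  | sed -n 's/.*class="priceLarge">[^0-9]*\([0-9]*\.[0-9]*\)<\/b>.*/\1/p' >> tmp1.txt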
I am trying to extract 2 numbers from the same file, and my goal is to print them both in another file, on the same line, separated by a space. I have to do that for 20 files, and I would therefore like to have 20 lines like this in the output file. It would look like this:
And I did this by running a bash script with the following content:
Code:
#!/bin/bash
ls execution$1$2*.* | while read filename
do
    cat $filename | grep -e "Total aborts:" | cut -d " " -f3 >> abort$1$2.dat
done
$1 and $2 are just strings to identify the different files I want to consider in this loop. This script works well to extract a number which is the 3rd field of a line starting with "Total aborts:". Now, how could I change this script to do what I mentioned above (i.e. extract two numbers from two different lines)? The second number is the 3rd field of a line starting with "Total throughput:".
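A sketch (the output file name is an assumption): awk can collect both fields in one pass over each file and print them space-separated on one line:
Code:
#!/bin/bash
for filename in execution$1$2*.*; do
    awk '/Total aborts:/     { a = $3 }
         /Total throughput:/ { t = $3 }
         END                 { print a, t }' "$filename" >> result$1$2.dat
done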
I have this string: ./DAT000728-652523058.job. I want to extract the number between DAT and the - sign, but I want 728, not 000728. With echo ./DAT000728-652523058.job | cut -d'T' -f2 | cut -d'-' -f1 I am getting 000728. The string can also be ./DAT326822-652523058.job, in which case I need 326822.
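A sketch: let sed eat the leading zeros in the same pass that isolates the number:
Code:
echo ./DAT000728-652523058.job | sed 's/.*DAT0*\([0-9]*\)-.*/\1/'   # prints 728
echo ./DAT326822-652523058.job | sed 's/.*DAT0*\([0-9]*\)-.*/\1/'   # prints 326822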
I have many files in a folder from which I need to extract some contents. These are basically text files which have individual lines with, e.g.:
name: john
address: whatever
phone: 123456
Some caveats:
1. Sometimes a line might be missing.
name: johnn
phone: 123456
2. Lines are not at the same line numbers across the files.
I did try some things with awk, based on Google searches, but I couldn't extract the data of each file into a single line (this is the ultimate goal):
john,whatever,123456
I don't have knowledge other than having put some bash scripts together for backup jobs, so I am open to installing anything that could pull this off.
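A sketch (the folder path and output name are placeholders): awk can harvest the three fields per file and emit one CSV row per file, leaving a slot empty when its line is missing:
Code:
for f in /path/to/folder/*; do
    awk -F': *' '
        $1 == "name"    { n = $2 }
        $1 == "address" { a = $2 }
        $1 == "phone"   { p = $2 }
        END             { print n "," a "," p }
    ' "$f"
done > out.csv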
I'm trying to create an application that monitors, among other things, what site the user is currently viewing. I would like to know if there is any way to get the current URL from Firefox's address bar on a Linux machine. I know that under Windows I can use the DDE server approach, but under Linux this task is proving very tricky. I've considered an approach involving a Firefox extension, but this would require the user to install the extension himself, which is not something I want. If an extension can be installed by a different program's installer, then that could work, but I don't know whether that's possible.
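One extension-free idea (a rough sketch; the profile path varies per machine, and the data lags real navigation by several seconds): Firefox periodically writes its open tabs to sessionstore.js, which can be mined for URLs:
Code:
# first URL recorded in the session store of the default profile
grep -o '"url":"[^"]*"' ~/.mozilla/firefox/*.default/sessionstore.js \
    | head -n1 | cut -d'"' -f4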
The idea is to make a website that checks the availability of domains, and it works, but it's not pretty yet. Below is what I have so far:
<?php
## this is the API from my domain registrar
$client = new SoapClient('http://api.sync.com/?wsdl');

## I have a search box that sends the request to this page
$var = $_GET["s"];
## remove the most common subdomains from the request
$var = eregi_replace("www.", "", $var);
$var = eregi_replace("mail.", "", $var);
$var = eregi_replace("ftp.", "", $var);
$var = eregi_replace("pop.", "", $var);
$var = eregi_replace("smtp.", "", $var);
## remove any TLD extension from the request
$split = explode(".", $var);
$main = $split[0];
$arraysize = sizeof($split);
for ($x = 1; $x < $arraysize; $x++) {
    $tld .= "." . $split[$x];
}

## login to the API
$paramLogin = array('handle' => 'randall', 'password' => 'password');
Result Login: Array ( [code] => 200 [message] => Login succesful )
array(3) {
  ["code"]=> string(3) "200"
  ["message"]=> string(20) "Domain not available"
  ["result"]=> object(stdClass)#236 (1) { ["status"]=> string(5) "TAKEN" }
}
bool(true)
array(3) {
  ["code"]=> string(3) "200"
  ["message"]=> string(16) "Domain available"
  ["result"]=> object(stdClass)#232 (1) { ["status"]=> string(4) "FREE" }
}
bool(true)
?>
## so far it works
What I need to do is turn this ugly-looking reply into something more readable: basically, if TAKEN, print "occupied", and if FREE, print "it's yours to grab". I have been struggling with the in_array function, but I'm not getting anywhere close to getting it to work.
I am trying to get the metadata out of an image file in Python. I have tried using PIL, but it does not give me the data I am looking for (mostly I just got a bunch of hex code), and I have no idea how to use ImageMagick; its Python module is poorly documented and I can't find any examples on the net. The info I need is stuff like camera model, whether flash was used, focal length, exposure time, date, etc., pretty much the same info I get when I look at the "Image" tab in Properties in Nautilus on Ubuntu.
What I am doing is writing a script that will iterate through a lot of pictures and put all this metadata into MySQL. I chose Python since it is simple and I am familiar with it, but I can't find a good way to get that metadata from within Python.
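A sketch (the file name is a placeholder): PIL can decode EXIF from JPEGs through its _getexif() method, and PIL.ExifTags.TAGS maps the numeric tag ids to readable names, which is likely why the raw dump looked like hex:
Code:
from PIL import Image
from PIL.ExifTags import TAGS

def exif_dict(path):
    raw = Image.open(path)._getexif() or {}
    # translate numeric EXIF tag ids into names like "Model", "FocalLength"
    return dict((TAGS.get(tag, tag), value) for tag, value in raw.items())

info = exif_dict("photo.jpg")
for key in ("Model", "Flash", "FocalLength", "ExposureTime", "DateTime"):
    print(info.get(key))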
I have a small bash/awk program that extracts the date/time/size of thousands of email headers. I'm trying to also extract the last "Received from:" string from these email headers, which will give me the sender's email server. Could I get some pointers on extracting the last occurrence of this string and printing the information after it?
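A sketch (the header file name is a placeholder; note real headers usually spell the line "Received: from ..."): remember each matching line as awk passes it, so whatever is left at END is the last occurrence:
Code:
awk '/^Received: from/ { last = $0 }
     END { sub(/^Received: from[ \t]*/, "", last); print last }' headers.txt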