Programming :: Methods For Extracting Data Strings From Output Files?
Aug 23, 2010
I am trying to develop a method of reading files generated by other programs, and I am looking for the most versatile approach. I have been trying bash and have been making good progress with sed, but I was wondering if there is a "standard" approach to this sort of thing. The main features I would like to implement concern finding strings based on various forms of context and storing them in variables and/or arrays. Here are the most general tasks:
a) Read the first word (or floating-point number) that comes after a given string (solved in another thread)
b) Read the nth line after a given string
c) Read all text between two given strings
d) Save the output of task a), task b) or task c) (above) into an array if the "given string(s)" is/are not unique.
e) Read text between two non-unique strings, i.e. text between the nth occurrence of string1 and the mth occurrence of string2
As far as I can tell, those five scripts should be able to parse just about any text pattern. I am by no means fluent in these languages. But I could use a starting point. My main concern is speed. I intend to use these scripts in a program that reads and writes hundreds of input and output files--each with a different value of some parameter(s).
The files will most likely be no more than a few dozen lines, but I can think of some applications that could generate a few hundred lines. I have the input file generator down pretty well. Parsing the output is quite a bit trickier. And, of course, the option for parallelization will be very desirable for many practical applications.
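Since Perl or Python would also be acceptable here, tasks a) to e) could be sketched in Python roughly as follows. This is only a starting point under some assumptions: each file is small enough to read into memory in one go (true for a few hundred lines), and the function names and the sample text are made up for illustration. Every function returns a list when the marker is not unique, which covers task d).

```python
import re


def first_token_after(text, marker):
    """Task a: first whitespace-delimited token after each occurrence of marker."""
    pattern = re.escape(marker) + r"\s*(\S+)"
    return re.findall(pattern, text)


def nth_line_after(text, marker, n):
    """Task b: the line n lines below each line that contains marker."""
    lines = text.splitlines()
    return [lines[i + n] for i, line in enumerate(lines)
            if marker in line and i + n < len(lines)]


def between(text, start, end):
    """Task c: text between each start marker and the nearest following end marker."""
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    return re.findall(pattern, text, flags=re.DOTALL)


def between_nth_and_mth(text, start, n, end, m):
    """Task e: text between the nth start and the mth end (both counted from 1)."""
    starts = [match.end() for match in re.finditer(re.escape(start), text)]
    ends = [match.start() for match in re.finditer(re.escape(end), text)]
    if n > len(starts) or m > len(ends):
        return None
    lo, hi = starts[n - 1], ends[m - 1]
    return text[lo:hi] if lo <= hi else None


# Hypothetical output file contents, for illustration only.
sample = """\
energy = -1.25 eV
BEGIN
step 1
step 2
END
energy = -1.50 eV
BEGIN
step 3
END
"""

energies = [float(token) for token in first_token_after(sample, "energy =")]
blocks = between(sample, "BEGIN\n", "\nEND")
```

With files this small, the run time is likely to be dominated by starting the external programs rather than by the parsing, and each file can be parsed independently, so running several in parallel should be straightforward.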
I have written a PHP script that checks a string to see whether it contains a link (i.e. a URL). I have the following if statement, which uses 3 possible regular expressions to determine whether there is a link or not.
Code:
// check if we found a link
// links are denoted by strings that:
// - contain http://
// - contain www.*.*
[Code]....
I'm not convinced yet that writing a shell script to do this is the best course of action. If someone is capable of doing this with a Perl or a Python script that's fine too. If you want to make it super high performance and write it in assembly
I work with python and I use emacs as my IDE tool. I have been running Debian Squeeze (6.0.9) for some time now with emacs 23.2.1 and ecb 2.32. I am able to access my python methods in the ecb-methods window with no problems. However I recently upgraded my desktop to Debian Wheezy (7.5) running emacs 23.4.1 and ecb 2.40 but I have lost access to the methods in the ecb-methods window. The window is just empty while the others (directories, sources and history) are all populated. I have a second laptop which I decided to upgrade to Debian Jessie, however Jessie recommends emacs 23.4.1 which is running with ecb 2.40 also. The result is the same as on Wheezy.
I have used the ecb menus and googled for a solution or even just a mention that such a problem exists but have come up with nothing. Either I have a unique situation here or am doing something really dumb.
I would like to upgrade to Wheezy or Jessie, but I need access to the methods in the ecb-methods window. How can I keep my upgrade and still see the methods in the methods window of the ecb system ....
I want to search and replace strings in a file with strings from other files. I need to do it with big strings (string1 is big) and I want to use a txt file for this. But this code is not working:
I've got lines of data in the following format: space1=number of times error has occurred space2=IP address space3=Error
I've set this out nicely with printf and made it email me. The problem is that it's not entirely clear what each column/space is, and the IP and occurrence counts can sometimes be confusing. Is there any (easy) way to output this as an ASCII-style table? There will always be 5 occurrences, and the format will always be the same
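One possible way to get a bordered table, sketched in Python rather than printf. It assumes each input line has exactly three space-separated fields in the order described (count, IP address, error) and that only the error text may contain spaces. The column headings and the `format_table` name are made up for illustration.

```python
def format_table(lines):
    """Render 'count ip error...' lines as a bordered ASCII table."""
    headers = ("Count", "IP address", "Error")
    # Split on the first two runs of whitespace so the error text stays whole.
    rows = [tuple(line.split(None, 2)) for line in lines if line.strip()]
    # Each column is as wide as its longest cell, heading included.
    widths = [max(len(cell) for cell in column) for column in zip(headers, *rows)]
    rule = "+" + "+".join("-" * (width + 2) for width in widths) + "+"

    def fmt(cells):
        padded = (cell.ljust(width) for cell, width in zip(cells, widths))
        return "| " + " | ".join(padded) + " |"

    return "\n".join([rule, fmt(headers), rule] + [fmt(row) for row in rows] + [rule])


# Hypothetical data, for illustration only.
table = format_table([
    "12 10.0.0.1 Connection refused",
    "3 192.168.1.20 Timeout",
])
print(table)
```

The resulting string can be mailed as-is, provided the mail client shows it in a fixed-width font.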
I was messing around with Bash scripting just now and was wondering if there was a way to organize the output of a command into an array. Like the Bash equivalent of the PHP explode() function.
I'm not sure whether this is the correct forum for this, but it is the best place I can see at the moment, so I'll give it a try. Please redirect me if I'm mistaken. Running Suse 11.2, I have a RAID-5 device mounted, and a straight disk. I want to copy data from the straight disk to the array, using several methods: with Dolphin, with cpio. Copying runs for some time, and sometimes one or more files are indeed copied, but after a short time (sometimes half a minute, sometimes 10 minutes or more) I get a
Message from syslogd@linux-wrth at Jan 9 22:44:03 ...
kernel:[ 381.602651] Oops: 0000 [#1] PREEMPT SMP
Message from syslogd@linux-wrth at Jan 9 22:44:03 ...
I have come across a webpage with several thousand hyperlinks. Many of those hyperlinks are named "CDS", and behind each of these CDS hyperlinks there are certain lines which I need to chop off. I have a feeling that AWK would be the tool to use.
I am trying to extract data from an XML file using C. Here is the sample code. It compiles without errors, but the start and end callback functions are not invoked. Code:
I use Lenny, and was trying to mount a .iso image, supposedly a CD image.
[code]....
This is what I get from dmesg | tail:
debian:/home/zac/cscd# dmesg | tail
[ 1811.505199] floppy0: disk absent or changed during operation
[ 1811.505207] end_request: I/O error, dev fd0, sector 0
[code]....
I did a little research on the web and it seems that this file is not really a cd image, but simply data in a .img file. What do you think of that?
debian:/home/zac/cscd# file cscd3.iso
cscd3.iso: data
Some people recommend to extract the data via the dd command, but it didn't seem very safe for me to do that!
[URL]
Is it possible to extract the data into a directory (instead of a device) using dd? This file is supposed to be software. I wanted to run it under wine by keeping it mounted on a mount point in my file system. Does it make any sense to try to do this if the file simply isn't a CD image?
I have to execute certain commands (like shutting down Tomcat) on several servers, so I'm using a loop and ssh. I put the servers' IPs in a CSV file which I parse, executing the commands for each line and sending the output to a file. The problem is that after processing one line the program stops executing. I wonder if someone could lend me a hand with this. I'm new to bash scripting and I have run out of ideas.
The CSV (servers.csv) file looks like this:
Code:
192.168.254.5:Server 1
192.168.254.6:Server 2
...
And the script looks like this:
Code:
#!/bin/bash
while IFS=: read ip name
do
    sshpass -p "pass" ssh -o "StrictHostKeyChecking no" root@"$ip" 'sh <CATALINA_BASE>/bin/shutdown.sh' >> output.log
done < servers.csv
I want the output of a program to go to 2 different files but not to standard out. Is there a way to do this in bash? I know that in Z shell it's really easy. Something like:
Code:
echo "test" >> file1 >> file2
would work. But in Bash it doesn't seem that easy. I know that tee will send the output to 2 files, but it also sends it to STDOUT. Something like:
Code:
echo "test" | tee -a file1 file2
would put the word "test" in file1, file2, and STDOUT. Is there a way to send the output to just file1 and file2?
I want formatted output of all the files under a particular directory. I am trying to use find. Something like:
find -P ./ -type f -name '*.cpp' -printf "%p "
I want all the files with specific extensions like .c, .cpp and .h to be printed out separated by spaces. One more thing I want is absolute path names instead of relative ones.
I have written a one-line command that parses a file, locates the IP addresses in it, trims the output the way I want it, sorts it numerically and by uniqueness, and then appends (>>) to output.txt
I can get all the IPs into one file, "output.txt", but what I am really looking for is a way to create a text file for each IP it finds, named xxx.xxx.xxx.xxx.txt, and also put that IP address into that file..
I have encountered a problem: I have a "while" loop, and each run produces a set of outputs that I then need to move into a corresponding folder; otherwise the next run will overwrite them. Furthermore, I need to pipe what I have on the screen into a file. I have put my code in the following:
I have a folder which includes a bunch of folders, each having data files in them. (Folder A has F1, F2, F3 ..... F1000 folders in it, and F1, F2, F3 ... each has about 10 different files named FILE1, FILE2, FILE3 .... in them.)
I am interested in FILE1 of each folder, because that contains the data I need. More specifically, each FILE1 has a line "ANSWER=..." in it, and I need to get the value of ANSWER from each file. Doing it by hand is too much work, so I need to write a script that will scan all the folders and give me a list of the ANSWER values.
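A minimal Python sketch of such a scan. It assumes the file really is called FILE1 in every subfolder and that the value is everything after "ANSWER=" on the first line containing it. The `collect_answers` name and the default arguments are made up for illustration.

```python
from pathlib import Path


def collect_answers(root, filename="FILE1", key="ANSWER="):
    """Map each subfolder name to the text after key in that folder's filename."""
    results = {}
    # "*/FILE1" matches the file one level below root, e.g. A/F1/FILE1.
    for path in sorted(Path(root).glob(f"*/{filename}")):
        with path.open() as handle:
            for line in handle:
                if key in line:
                    results[path.parent.name] = line.split(key, 1)[1].strip()
                    break  # only the first matching line per file is used
    return results
```

Calling `collect_answers("A")` from the directory that contains folder A would then return a dictionary such as `{"F1": "...", "F2": "..."}`, which can be printed or written to a single summary file.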
I have a program which manages my PDFs and references. I wish to put some of the information on my website, but that program (Mendeley) only exports in XML (or BibTeX). I'd like to simply convert the XML output files to SQL in order to create or update an SQL database. I'm not an expert in either XML or SQL (I only use phpMyAdmin). Can someone help me figure this out?
I have a problem - I have files with rows of data and I need to check whether the next row (of the same type) has the NEXT date in it. So I need to extract a date in YYYYMMDD format from a row (easy enough), then add one day to it and compare it to the next date I encounter on a subsequent row.
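The date arithmetic is the awkward part in shell, so here is a hedged Python sketch of the check. It assumes the rows have already been filtered down to one type and that the first standalone 8-digit run in each row is the date. The `find_date_gaps` name is made up for illustration.

```python
import re
from datetime import datetime, timedelta


def find_date_gaps(rows):
    """Return (previous, current) date pairs where current is not previous + 1 day."""
    gaps = []
    previous = None
    for row in rows:
        match = re.search(r"\b(\d{8})\b", row)
        if not match:
            continue  # rows without a date are skipped
        # strptime raises ValueError if the digits are not a real calendar date.
        current = datetime.strptime(match.group(1), "%Y%m%d").date()
        if previous is not None and current != previous + timedelta(days=1):
            gaps.append((previous, current))
        previous = current
    return gaps
```

Using `timedelta(days=1)` takes care of month ends and leap years, so 20100228 followed by 20100301 counts as consecutive.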
I am having a lot of problems trying to replace one string with another using sed. The command is like this:
sed -i 's/KERNEL=="tty[A-Z]*", NAME="%k", GROUP="uucp", MODE="0660"/KERNEL=="tty[A-Z]*", NAME="%k", GROUP="uucp", MODE="0666"/g' 50-udev.rules
It is just to find the line with: KERNEL=="tty[A-Z]*", NAME="%k", GROUP="uucp", MODE="0660"
I wrote this small program that truncates a string entered by the user. An example of its usage: if the user enters a string, say "abcdefghijklmnopqrstuvwxyz", the program will take only the first 9 characters and discard the rest, so that the user can be prompted for a second string without worrying about characters left in the stream. Now this program works O.K., but I would like to find something in C that has this functionality built in... Does anyone know of a function that will accomplish this?
I am trying to replace a section of a file between the first instances of the strings {}, with the contents of another file. Example of the format of the file I'm trying to modify
I use udhcp with some of my minimal installs. I've messed around with the code a bit when it wasn't working correctly - a few years ago. I will find time - I hope soonish - to figure out how to do a few other things with it.
For now though, I'm using this string to grab my ip after startup
I realize I could substitute ifup -a but I'm more interested in figuring out how to make ifup wait for the ip to become available if it is not available yet.
Never mind that one, just typing out the question answered it for me, when I find it in the scripting man ' ; : " & =
Or if there are any other suggestions for better construction of the string.
What is the best command to use to parse strings? I have a variable $str and need to parse this string. Can you provide an example of the command used to get a substring of $str based on start and end index values?
I have an SQL table with 2 columns. I run a script that randomly selects a word from column 1 of the table. The word is displayed on the screen and I guess what it means. I concatenate the randomly selected word and the answer, and the script looks for a match in MySQL. If it finds a match it says "Good job!"; if there is no match it says "not correct". However, when I get it right it says "not correct", even though when I echo the variables they look exactly the same. The script is below:
#!/bin/bash
var=$(mysql translator -u root --password=* -N <<EOF
SELECT word FROM tagalog ORDER BY RAND() LIMIT 1
EOF
)