Programming :: Using Key To Match Against Source.txt File To Add Xml Tags To Names In Perl
Jun 21, 2010
Using a list of names (over 4,000 of them) painstakingly gleaned from the source file years ago for a database, I want to match the names against the source file so that each occurrence can be wrapped in <forename></forename> tags in the original source file.
I placed the list of names in @forenames (only posted a few of them here).
Perl script is:
I am able to get the name bracketed by the tags to appear on the console screen, but I don't know how to apply the output to the source file. Perhaps I need to match on the words and then do some kind of edit to surround the matching words with the XML tags? I'm a rank novice doing this as a labour of love for a friend.
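A minimal sketch of one way to do it, building one alternation from the name list and writing the tagged copy to a new file rather than editing the original in place; the file names here are placeholders, and only two sample names are shown.
Code:
#!/usr/bin/perl
use strict;
use warnings;

# Only a couple of names shown; the real @forenames list has 4000+ entries.
my @forenames = ('Alice', 'Bob');

# Longest names first so a longer name is never half-matched by a shorter one.
my $names = join '|', map { quotemeta } sort { length $b <=> length $a } @forenames;

open my $in,  '<', 'source.txt'        or die "source.txt: $!";
open my $out, '>', 'source_tagged.txt' or die "source_tagged.txt: $!";

while (my $line = <$in>) {
    # Wrap each whole-word occurrence of a listed name in <forename> tags.
    $line =~ s{\b($names)\b}{<forename>$1</forename>}g;
    print $out $line;
}
close $out;

Once source_tagged.txt looks right it can be renamed over the original, which avoids damaging source.txt if something goes wrong mid-run.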
I want to match whatever is within the <item></item> tags and save it in the $content variable. However, the <item> tags can span multiple lines:
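A minimal sketch, assuming the whole file fits in memory (the file name is a placeholder): slurping the file lets the regex cross line boundaries, and /s makes "." match newlines as well.
Code:
use strict;
use warnings;

open my $fh, '<', 'input.xml' or die "input.xml: $!";
my $text = do { local $/; <$fh> };   # slurp the whole file into one string
close $fh;

# Non-greedy .*? grabs each <item> block separately; /s lets . span newlines.
while ($text =~ m{<item>(.*?)</item>}sg) {
    my $content = $1;
    print "$content\n----\n";
}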
I am writing a script that involves reading the content of files present in a directory and/or its subdirectories. I know readdir returns all the file and directory names in a directory, but how do I check whether readdir is returning a file or a directory?
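A short sketch: the usual trick is the -f and -d file tests, remembering that readdir returns bare names, so the directory has to be prepended before testing.
Code:
use strict;
use warnings;

my $dir = shift @ARGV || '.';
opendir my $dh, $dir or die "$dir: $!";

for my $entry (readdir $dh) {
    next if $entry eq '.' || $entry eq '..';
    my $path = "$dir/$entry";          # readdir gives bare names, so add the path
    if (-d $path) {
        print "directory: $path\n";    # recurse here if subdirectories are wanted
    }
    elsif (-f $path) {
        print "file:      $path\n";    # open and read the file here
    }
}
closedir $dh;

For walking an arbitrarily deep tree, File::Find handles the recursion instead of hand-rolled readdir loops.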
I was wondering if any Perl gurus could help me with a quick log file adjustment. I have a text file that looks like this (tabs and newlines are shown so you can see what separates the data):
There are maybe 100 lines of text in this file at any given time. I need to delete all duplicate lines, looking only at the first bit of text prior to the first tab. It doesn't matter which one gets deleted, as long as no two lines begin with that same text before the first tab. So in this example, either the first line "1234" or the last line "1234" would need to be deleted. I already have code in my script that opens the files - I just need the code that reads the text into an array, finds matches based on the above criteria, and makes the deletions.
If it would be easier, I can even do a system call and use SED (v4.1.5) and/or AWK (3.1.5) instead.
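A sketch of the Perl part, keeping the first line seen for each key (the text before the first tab), since which duplicate survives was said not to matter; the file name is a placeholder.
Code:
use strict;
use warnings;

open my $fh, '<', 'data.txt' or die "data.txt: $!";
my (%seen, @kept);
while (my $line = <$fh>) {
    my ($key) = split /\t/, $line, 2;     # text before the first tab
    push @kept, $line unless $seen{$key}++;
}
close $fh;

open my $out, '>', 'data.txt' or die "data.txt: $!";
print $out @kept;                          # rewrite the file without duplicates
close $out;

The same idea as an awk one-liner, if a system call really is easier: awk -F'\t' '!seen[$1]++' data.txt > deduped.txt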
I'm working on a project at work to automate sending e-mails to customers. Everything is in place except my ability to extract the useful data from HTML tags to use in forming the POST.
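Without seeing the page it is hard to be specific, but a rough regex sketch is below; the idea of harvesting <input> name/value pairs for the POST body, the attribute order, and reading the page from STDIN are all assumptions, and a real HTML parser module is more robust than regexes for anything messy.
Code:
use strict;
use warnings;

my $html = do { local $/; <STDIN> };   # stand-in for however the page is fetched

# Collect name="..." value="..." pairs from <input> tags (attribute order assumed).
my %fields;
while ($html =~ /<input[^>]*\bname="([^"]+)"[^>]*\bvalue="([^"]*)"/gi) {
    $fields{$1} = $2;
}

print "$_=$fields{$_}\n" for sort keys %fields;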
Totem by default shows some sort of tags (if found) rather than file names in the playlist. I don't like this, because I use Totem for video only, and the videos I get from the web usually contain URLs and other such nonsense in their tags, so I never know which file is which.
Can I somehow force Totem to always display file names?
One of my applications generates a text file with XML output in it. I need to read those log files, and if the output does not match a string in a couple of tags, it should create a log file with the file name and the tag name.
The two tags where the string should match are:
The identity format tag should always be JPEG, and the well-formed and valid status tags should be true.
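A rough sketch of the check; the actual tag names are not reproduced above, so <IdentityFormat>, <WellFormed> and <Valid> below are stand-ins to swap for the real ones, and mismatch.log is a placeholder name.
Code:
use strict;
use warnings;

my $file = shift @ARGV or die "usage: $0 output.xml\n";
open my $fh, '<', $file or die "$file: $!";
my $xml = do { local $/; <$fh> };
close $fh;

my @bad;
push @bad, 'IdentityFormat' unless $xml =~ m{<IdentityFormat>\s*JPEG\s*</IdentityFormat>}i;
push @bad, 'WellFormed'     unless $xml =~ m{<WellFormed>\s*true\s*</WellFormed>}i;
push @bad, 'Valid'          unless $xml =~ m{<Valid>\s*true\s*</Valid>}i;

if (@bad) {
    open my $log, '>>', 'mismatch.log' or die "mismatch.log: $!";
    print {$log} "$file: $_ did not match\n" for @bad;
    close $log;
}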
I want to strip the process name from the hosts - I did it with the code below.
I have two questions: is there a more compact way to strip off the process names? Also, I want to get rid of the errors after extracting the hostname. It is complaining about $arry[1]; using 'my $arry[1]' is not allowed. Assigning the element to a variable, as in 'my $sliced_arry = $arry[1]; print $sliced_arry;', does not work either.
Code:
Use of uninitialized value in pattern match (m//) at newcomm_stats.pl3 line 7, <NEWCOMM> line 16
Here I get what I want - just the host name - but I still get those nasty errors. Assigning a value to $1 does not work, and localizing $1 with 'my' is not allowed.
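Without the original code it is only possible to sketch the usual fix; the "host/process" layout in the second whitespace-separated field is an assumption to adjust. The key points are that 'my $arry[1]' is not legal Perl (my declares whole variables, not individual elements), and guarding with defined() before matching silences the uninitialized-value warnings on blank or short lines.
Code:
use strict;
use warnings;

while (my $line = <>) {                    # read the NEWCOMM-style file given on the command line
    chomp $line;
    my @arry = split /\s+/, $line;
    next unless defined $arry[1];          # skip blank/short lines; this kills the warnings

    # Capture into a named variable instead of relying on $1 afterwards;
    # "host/process" in the second field is an assumed layout.
    my ($host) = $arry[1] =~ m{^([^/]+)};
    print "$host\n" if defined $host;
}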
What is the best way, in sed, awk or Perl, to merge lines that occur between certain strings? I'm new to sed scripting and I have been working on this for some time now. I have a large file (sample below) that I need to edit.
What I need looks something like this.
I'm working with a very large file, so simply merging all the lines and then adding a newline character before ">contig" and after "translated" won't work, at least not with sed.
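Perl can do this without holding the whole file in memory: read line by line, print each ">contig" header on its own line, and buffer everything up to the line containing "translated" before joining it into one line. A sketch, assuming that is the record layout of the sample.
Code:
use strict;
use warnings;

my @buffer;
while (my $line = <>) {
    chomp $line;
    if ($line =~ /^>contig/) {
        print join('', @buffer), "\n" if @buffer;   # flush an unfinished record
        @buffer = ();
        print "$line\n";                            # header stays on its own line
    }
    else {
        push @buffer, $line;
        if ($line =~ /translated/) {                # end of this record
            print join('', @buffer), "\n";
            @buffer = ();
        }
    }
}
print join('', @buffer), "\n" if @buffer;           # flush the final record

Run it as: perl merge.pl big_file > merged_file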
I'm trying to split a text file into various parts. Everything in between "123" and "break" (including line breaks) goes into its own split file.
e.g. using this text file:
This should split into 4 files. However, I'm only getting 2 files: one for the line "123break" and one for "123 blah break". The two occurrences that contain line breaks are being ignored. The .* part of my match should capture line breaks, seeing that I'm using the /s modifier, shouldn't it? Even when I use the match /(123 break)/gs it still doesn't capture the first occurrence. I'm using Perl v5.12.3 (from ActiveState) on Windows XP. The text file is also in Windows format.
Code listed below.
The above code generates two files Output_1.txt and Output_2.txt which contain "123break" and "123 blah break" respectively. I want it to generate four files.
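One likely culprit, sketched below, is reading the file line by line: each string handed to the regex then never contains a newline, so the multi-line occurrences can never match, /s or not. Slurping the whole file first (undef $/) usually fixes it; input.txt and the Output_N.txt names mirror the description above.
Code:
use strict;
use warnings;

open my $in, '<', 'input.txt' or die "input.txt: $!";
my $text = do { local $/; <$in> };   # read the whole file, newlines included
close $in;

my $n = 0;
while ($text =~ /(123.*?break)/sg) { # /s lets .*? cross the line breaks
    $n++;
    open my $out, '>', "Output_$n.txt" or die "Output_$n.txt: $!";
    print $out $1;
    close $out;
}
print "wrote $n files\n";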
I am trying to make a random sentence generator in Perl. So far I can loop it x times to make a fixed number of words, but I don't want to do that. Or I can let it go on until I hit Ctrl-C :/
I want to have it so that when it reaches a word with a sentence terminating punctuation mark it stops.
My attempt to do that was with:
This is doing the woooosh text until ctrl-C thing...
Now however I am not sure how I can cut off everything after the punctuation mark (when it exists).
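A sketch of one way to do both things at once: test each random word for a terminator, keep only the part of the word up to that mark, and stop. The word list here is a tiny stand-in for the real source.
Code:
use strict;
use warnings;

my @words = qw(the quick brown fox jumps over. a lazy dog! again);  # stand-in list

my @sentence;
while (1) {
    my $word = $words[ int rand @words ];
    if ($word =~ /^(.*?[.!?])/) {   # the word carries a sentence terminator
        push @sentence, $1;         # keep it only up to (and including) the mark
        last;                       # stop instead of waiting for Ctrl-C
    }
    push @sentence, $word;
}
print "@sentence\n";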
Code: $sql="SELECT table1.datetime, table1.user_id, table2.ip FROM table1, table2 WHERE id='$id' AND (table1.id = table2.id AND table1.datetime = table2.datetime)";
In table2 the datetime fields are about 1 to 2 seconds off due to the source of the data, which I cannot change.
Is it possible, via the query, to match table1.datetime and table2.datetime by HH:MM (i.e. to the minute instead of to the second)?
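Assuming MySQL (the database isn't named above), formatting both datetimes down to the minute before comparing is one way; a sketch of the adjusted query string, with the bare id column written as table1.id here:
Code:
my $sql = "SELECT table1.datetime, table1.user_id, table2.ip
           FROM table1, table2
           WHERE table1.id = '$id'
             AND table1.id = table2.id
             AND DATE_FORMAT(table1.datetime, '%Y-%m-%d %H:%i')
               = DATE_FORMAT(table2.datetime, '%Y-%m-%d %H:%i')";

Note that matching on the formatted minute still misses pairs that straddle a minute boundary (e.g. 10:00:59 vs 10:01:01); if that matters, a tolerance test such as ABS(TIMESTAMPDIFF(SECOND, table1.datetime, table2.datetime)) <= 2 is tighter.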
What I am attempting to do is rename some television shows into the format that my PVR will understand for its naming convention. I have a script that cleans them up about 95% of the way; now I just can't figure out the last little detail.
For example, NCIS_01.mkv needs to be renamed to NCIS_s01e01.mkv. I think it can be done in sed, but I just can't figure out how.
How can I make sed (or something else) match the last "_" and any numbers after it until the period and then insert text between them reliably?
Depending on the show, it can be something like This_show_name_243.avi, so I need it to be more flexible than I can figure out how to make it.
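A sketch using GNU sed from a small shell loop; it hard-codes season 01 and assumes the episode number is the digits after the last "_" before the extension, which matches both examples above.
Code:
#!/bin/bash
# Rename SHOW_NN.ext to SHOW_s01eNN.ext (season hard-coded to 01 for this sketch).
for f in *_[0-9]*.*; do
    [ -e "$f" ] || continue
    new=$(printf '%s\n' "$f" | sed -E 's/_([0-9]+)\.([^.]+)$/_s01e\1.\2/')
    [ "$f" = "$new" ] || mv -v -- "$f" "$new"
done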
I have a script that I'm working on that updates a username in all the files called blah.inc for my framework. Since I host a bunch of these web apps, I need to do it to all of them, so I need to figure out how to update these files automagically without me watching it to call vim every time. Here's what I have so far.
Code:
This finds the files, but now I need to figure out how to do s/bob/fred/g on those files.
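Assuming the existing command is a find invocation, handing each hit to a Perl in-place edit is one way to finish the job; the path is a placeholder, and -i.bak keeps a backup of every original.
Code:
find /path/to/webapps -type f -name 'blah.inc' \
    -exec perl -pi.bak -e 's/bob/fred/g' {} +

GNU sed can do the same with -exec sed -i.bak 's/bob/fred/g' {} + instead of the Perl one-liner.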
I want to grep lines which do not start with # or a blank space, like:
bla bla bla bla
How do I do this? I tried grep --invert-match '^#', which gives lines not starting with # but also gives me blank lines. I tried grep --invert-match '^#|^ ', which gives lines not starting with # OR not starting with a blank (which means any line, including ones starting with #).
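One way (a sketch): with -E the alternation is real, and -v then drops lines that start with #, start with whitespace, or are empty.
Code:
grep -Ev '^(#|[[:space:]]|$)' file.txt

Without -E, the same effect needs separate patterns: grep -v -e '^#' -e '^[[:space:]]' -e '^$' file.txt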
I have a laptop that I am logged into through SSH. The laptop does not have an X Window System, so I am using the program fbi to open an image on the laptop's screen from my SSH connection:
fbi -T 8 picture.jpg  # this opens the image on the laptop's tty8 terminal
I've found that making a for loop does not work with files that contain a space in the name. It's something to do with a bug that they call a "feature", which stops the first variable at the first whitespace.
Using a "while" loop is not exactly what I require either, seeing as I want to be able to view each image in the directory on screen and tag it accordingly before it jumps off to the next image, and I'm not sure how to add a pause to a while loop.
How do I make a Bash script and its loop variables handle files like "files that contain spaces.jpg"?
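A sketch: a glob-based for loop with the variable quoted handles spaces fine (word splitting only bites when the list comes from something like $(ls)), and read provides the pause for tagging. The tags.txt name and the tag-logging idea are assumptions, and fbi's foreground/background behaviour may need adjusting on the actual setup.
Code:
#!/bin/bash
for f in *.jpg; do
    [ -e "$f" ] || continue                 # skip if the glob matched nothing
    fbi -T 8 -- "$f" &                      # quotes keep names with spaces intact
    viewer=$!
    read -r -p "Tag for \"$f\" (Enter when done): " tag
    printf '%s\t%s\n' "$f" "$tag" >> tags.txt
    kill "$viewer" 2>/dev/null              # or "killall fbi" if fbi forks itself away
done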
I often get files with many spaces as part of their names. I would like to automatically replace these spaces with underscores, but otherwise not change the file name. Is there a way to do this task with just the bash shell?
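Yes - bash parameter expansion is enough. A sketch that leaves a file alone if the underscore version already exists:
Code:
#!/bin/bash
for f in *\ *; do                     # only names that contain a space
    [ -e "$f" ] || continue           # glob matched nothing
    new="${f// /_}"                   # replace every space with an underscore
    [ -e "$new" ] && continue         # don't clobber an existing file
    mv -- "$f" "$new"
done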
I'm writing my first bash script. Its function is to move files to the trash can and write a log file in the same format that the system does, to allow for file restoration. The problem is that in bash everything works fine, but in the Openbox window session the files are named after the source directory, not the original name. Here's the script:
Code:
#!/bin/bash
# trash - Script to move file or folder to the trash can and create a log file

##### Functions #####

err_output ()   # Writes error message
{
    echo "$0: cannot stat \`$1': No such file or directory"
    echo "USAGE: $0 SOURCE DEST"
    exit 1
} >&2
if [ -e "${DEST}/${FILE}" ]; then
    max=0
    DIR="$(pwd)"
    cd "$DEST"
    shopt -s nullglob
    for backup in "${FILE}".*; do
        nr=${backup#${FILE}.}
        if [[ "$nr" =~ ^[0-9]+$ ]]; then
            if (( nr > max )); then
                max="$nr"
            fi
        fi
    done
    cd "$DIR"
    max=$(( max + 1 ))
    write_log_numbered
    mv -- "$SOURCE" "${DEST}/${FILE}.$max"
else
    write_log_unique
    mv -- "$SOURCE" "$DEST/${FILE}"
fi
So I run the script with the test file "Junk". In bash, it moves over and it's named correctly:
Code:
~/.local/share/Trash/files$ ls
file  file.1  Files  Files.1  Junk
The log file is also named correctly:
Code:
~/.local/share/Trash/info$ ls
file.1.trashinfo  Files.1.trashinfo  Files.trashinfo  file.trashinfo  Junk.trashinfo
But when I go to view the trash can in the file manager in Openbox, the file is called "Testing", which is the name of the source directory. However, if I go to the trash can via its full path (going to .local/, then share/), all the files are named correctly. What's going on here? Is there some way to get the trash can to read the correct file name?
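A guess at the cause, since the script's write_log_* functions are not shown: file managers that follow the freedesktop.org trash specification take the display name from the Path= line of the matching .trashinfo file, so if that line records the source directory rather than the file's own absolute path, the wrong name is shown. A sketch of a numbered entry; the variable names mirror the script above, and $INFO_DIR is a placeholder for the Trash/info directory.
Code:
write_log_numbered ()
{
    cat > "${INFO_DIR}/${FILE}.${max}.trashinfo" <<EOF
[Trash Info]
Path=$(readlink -f -- "$SOURCE")
DeletionDate=$(date +%Y-%m-%dT%H:%M:%S)
EOF
}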
I have a log file (test.log) whose entries start and end with a dash line (--), as below. I am looking to write a parser for test.log. The test.log file currently has a single set of values for one Job ID, but I wish to parse N repeated sets of values for different Job IDs - Job, User, Queue, Dispatched Date, Dispatched Time, Completed Date, Completed Time, Hosts/Processor, CPU_T and TURNAROUND. I can either output these 10 values to another .log file or dump them into CGI.
The selected parameters from test.log to parse for the above 10 attributes are -
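Since the actual test.log sample is not reproduced here, the sketch below has to assume both the record separator (the "--" lines) and the label style of each field; the regexes are stand-ins to adjust against the real file, and parsed.log is a placeholder output name.
Code:
use strict;
use warnings;

local $/ = "--\n";                       # assumed: "--" on its own line ends a record
open my $in,  '<', 'test.log'   or die "test.log: $!";
open my $out, '>', 'parsed.log' or die "parsed.log: $!";

while (my $record = <$in>) {
    # Assumed label style, e.g. "Job <1234>", "User <bob>", "CPU_T 12.3" ...
    my ($job)   = $record =~ /Job\s*<([^>]+)>/;
    next unless defined $job;            # skip separator-only or header blocks
    my ($user)  = $record =~ /User\s*<([^>]+)>/;
    my ($queue) = $record =~ /Queue\s*<([^>]+)>/;
    my ($cpu)   = $record =~ /CPU_T\s*:?\s*([\d.]+)/;
    my ($turn)  = $record =~ /TURNAROUND\s*:?\s*([\d.]+)/;
    print $out join("\t", map { defined $_ ? $_ : '' } $job, $user, $queue, $cpu, $turn), "\n";
}
close $out;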
Some more HTML code... I would like to cut the above text so I get this: Sometext on multiple lines like this.Sometext on multiple lines like this.Sometext on multiple lines like this. Sometext on multiple lines like this.Sometext on multiple lines like this.
There are other HTML files with similar cuts I need to do, but once I have the method for doing one, I am sure I can do the others.
I think the two logical strings to cut between would be:
I am not sure if these strings are always the start and end of the line respectively - if this makes a lot of difference! Then the HTML tags would need to be stripped to get the text on its own.
I know the commands for removing tags, but searching for a string like class="IOSSectionTitle" and cutting everything before it, etc., is something I am finding challenging.
Just thought I would add that the HTML does not necessarily appear on logical new lines throughout the file, and there may be unexpected new lines, but as far as I know class="IOSSectionTitle" and <img always appear as strings without any new lines between those characters.
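A fragile but workable sketch (not a real HTML parser): slurp the file, grab everything between class="IOSSectionTitle" and the next <img, then strip the remaining tags. page.html is a placeholder name.
Code:
use strict;
use warnings;

open my $fh, '<', 'page.html' or die "page.html: $!";
my $html = do { local $/; <$fh> };       # slurp, since tags may span odd line breaks
close $fh;

while ($html =~ /class="IOSSectionTitle"(.*?)<img/sg) {
    my $text = $1;
    $text =~ s/<[^>]+>//g;               # strip the remaining HTML tags
    $text =~ s/\s+/ /g;                  # collapse stray newlines and spaces
    $text =~ s/^\s+|\s+$//g;
    print "$text\n";
}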
I have a few problems. I have a txt file like this:
Quote:
00 21 55 84 9a ff 00 1f 9e 1a 5b 00 08 00 45 00 00 4b 00 00 40 00 3f 11 9a 0e a1 8b fa 02 04 02
Then, based on my txt file, I would like to generate text like this:
Quote:
00215584 2155849a 55849aff 849aff00 9aff001f ff001f9e 001f9e1a 1f9e1a5b 9e1a5b00 1a5b0008 5b000800 00080045 08004500 00450000 00004b00 004b0000 4b000040 00004000 0040003f 40003f11 003f119a 3f119a0e 119a0ea1 9a0ea18b 0ea18bfa a18bfa02
Based on my reading, I found an n-gram solution in Perl, but I don't really understand how to edit the source code given. I'm a beginner in programming. I hope to get a solution. [URL]
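The sample output is just a sliding window of four bytes (an n-gram with n = 4) moved along one byte at a time, so a short sketch is enough; bytes.txt is a placeholder name for the input file.
Code:
use strict;
use warnings;

open my $fh, '<', 'bytes.txt' or die "bytes.txt: $!";
my @bytes = map { split ' ' } <$fh>;     # all the space-separated hex pairs

my @grams;
for my $i (0 .. $#bytes - 3) {           # each window of 4 consecutive bytes
    push @grams, join '', @bytes[$i .. $i + 3];
}
print join(' ', @grams), "\n";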
I am using File::Find to go through a very large tree. I am looking for all XML files and opening only those that contain an <Updated> tag. I then want to capture the contents of two tags, <Old> and <New>.
My problem is that after I open the file and do the first grep for <Updated> (which does work), I am unable to grep again unless I close the file and reopen it.
I did something like this:
Quote:
find(&check, $dir);

sub check {
    if ($_ =~ /.xml/) {
        open(FILE, "$_");
        if (grep {/Updated/} <FILE>) {   # <-- works
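A sketch of the usual fix: a filehandle read in list context is exhausted after the first grep, so read each file into an array once and grep that array as many times as needed. The <Old>/<New> extraction below assumes each tag sits on a single line.
Code:
use strict;
use warnings;
use File::Find;

my $dir = shift @ARGV || '.';
find(\&check, $dir);

sub check {
    return unless /\.xml$/;
    open my $fh, '<', $_ or return;
    my @lines = <$fh>;                   # slurp once; the handle is now used up
    close $fh;

    return unless grep { /<Updated>/ } @lines;          # same check as before
    my @old = map { m{<Old>(.*?)</Old>} ? $1 : () } @lines;
    my @new = map { m{<New>(.*?)</New>} ? $1 : () } @lines;
    print "$File::Find::name: old=@old new=@new\n";
}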