Programming :: Cutting An Html File Apart - Perl - Sed - Awk?
Apr 13, 2010
I have an html file like this
HTML Code:
Some more HTML code...
I would like to cut the above text so I get this:
Sometext on multiple lines like this.Sometext on multiple lines like this.Sometext on multiple lines like this. Sometext on multiple lines like this.Sometext on multiple lines like this.
There are other HTML files with similar cuts I need to do, but once I have the method for doing one, I am sure I can do the others.
I think the two logical strings to cut between would be:
I am not sure if these strings are always the start and end of the line respectively, if this makes a lot of difference! Then the HTML tags would need to be stripped to get the text on its own.
I know the commands for removing tags, but searching for a string like class="IOSSectionTitle", and cutting everything before it etc is something I am finding challenging.
Just thought I would add that the HTML does not necessarily appear on logical new lines throughout the file, and there may be unexpected new lines, but as far as I know class="IOSSectionTitle" and <img always appear as strings without any new lines between those characters.
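One way to approach the cut described above, as a minimal sketch in Python rather than Perl, sed or awk: treat the whole file as one string (so stray newlines do not matter), find class="IOSSectionTitle", find the next <img after it, and strip the tags in between. The function name extract_between is made up for illustration, and the regex tag stripping is crude; a real HTML parser would be more robust.

```python
import re

def extract_between(html, start_marker='class="IOSSectionTitle"', end_marker="<img"):
    """Return the tag-free text between start_marker and the next end_marker."""
    start = html.find(start_marker)
    if start == -1:
        return None
    end = html.find(end_marker, start)
    if end == -1:
        end = len(html)
    # Skip to the end of the tag that carries the start marker.
    tag_close = html.find(">", start, end)
    if tag_close != -1:
        body = html[tag_close + 1:end]
    else:
        body = html[start + len(start_marker):end]
    # Crude tag stripping: replace anything that looks like <...> with a space.
    text = re.sub(r"<[^>]+>", " ", body)
    # Collapse runs of whitespace, including unexpected newlines.
    return " ".join(text.split())
```

Because the search runs on the whole string, it does not matter whether the markers sit at the start or end of a line.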
I'm trying to find out how to extract the string between the 2 <title> tags: <title>this is what i want</title>. I found lots of results, but nothing I've tried works, e.g.: $page =~ m/<title>($.)</title>/gism;
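A likely culprit in the pattern above is the unescaped / in </title>, which ends an m/.../ pattern early; the capture group probably also wants a non-greedy .*? instead of $. For comparison, here is a small sketch of the same extraction in Python. The helper name extract_title is hypothetical.

```python
import re

def extract_title(page):
    """Return the text inside the first <title>...</title> pair, or None."""
    # DOTALL lets the title span several lines; IGNORECASE accepts <TITLE>.
    match = re.search(r"<title[^>]*>(.*?)</title>", page, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None
```

The non-greedy (.*?) stops at the first closing tag instead of running on to the last one in the page.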
I have managed to create an HTML file inside Python code. How can I convert this to a PNG file through a Python script?
EDIT: Details added: I have a Python script which generates map legends in the form of an HTML file. The generated legend has to be pasted onto a map which is in PNG format. A PNG file can be pasted onto another PNG file easily, but because the legends are generated in HTML format, I cannot paste them onto the map file!
EDIT: Details added: I did Google first, but that only turned up various software packages for this purpose, which I don't want!
The below snippet works fine until I use strict. Then it dies with the following error:
Quote:
Can't use string ("html") as an ARRAY ref while "strict refs" in use at ./filetest3 line 18.
I want to create @lists based on the $scalars in @type. However, "my @$ext = ();" and "push (@$ext, @files);" do not play nice with strict. How do I get around this?
Quote:
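The error above comes from using a string as a variable name, which strict refs forbids. The usual alternative is a single container keyed by type (in Perl, a hash of array references). As a rough illustration of that idea in Python, assuming the lists are meant to group file names by extension (an assumption, since the snippet itself is not shown), with a made-up function name:

```python
import os

def group_by_extension(filenames, types):
    """Map each wanted extension to the list of matching file names."""
    # One dictionary replaces the dynamically named lists.
    groups = {ext: [] for ext in types}
    for name in filenames:
        ext = os.path.splitext(name)[1].lstrip(".").lower()
        if ext in groups:
            groups[ext].append(name)
    return groups
```

Each list is then reached as groups["html"], groups["txt"] and so on, with no variable names built from strings.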
I have a script that I'm working on that updates a username in all the files called blah.inc for my framework. Since I host a bunch of these web apps, I need to do it to all of them. So I need to figure out how to update these files automagically, without me having to call vim every time. Here's what I have so far:
Code:
This finds the files, but now I need to figure out how to do s/bob/fred/g on those files.
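A minimal sketch of the find-then-substitute step in Python, assuming the files are UTF-8 text and small enough to read whole. The function name replace_in_files is invented for illustration, and the replacement is a plain substring swap, like s/bob/fred/g with no word boundaries.

```python
import os

def replace_in_files(root, filename="blah.inc", old="bob", new="fred"):
    """Rewrite every file called `filename` under `root`; return changed paths."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename not in filenames:
            continue
        path = os.path.join(dirpath, filename)
        with open(path, encoding="utf-8") as handle:
            text = handle.read()
        updated = text.replace(old, new)
        # Only touch files that actually contain the old name.
        if updated != text:
            with open(path, "w", encoding="utf-8") as handle:
                handle.write(updated)
            changed.append(path)
    return changed
```

It overwrites files in place, so trying it on a copy of the tree first would be prudent.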
I need to extract some text from a text file. The text is a test log with system info at the top and results further down. What I need is to add different tags with formatting before and after each line. I have prepared a template with HTML formatting, but the number of lines in the test log may differ from case to case, so I need to be able to add formatting tags as needed. Can this be done using a bash script, sed, awk, head, tail...?
website.com/john/doe/index.html
I need to cut that to say:
website.com/john/doe/
I am getting really close using sed, but I just can't get the syntax quite right.
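The idea is to drop everything after the final slash while keeping the slash. In sed, a substitution along the lines of s|[^/]*$|| should do that. The same logic as a small Python sketch, with a hypothetical function name:

```python
def strip_last_component(url):
    """Drop everything after the final slash, keeping the slash itself."""
    head, sep, _tail = url.rpartition("/")
    # With no slash at all there is nothing to cut, so return the input.
    return head + sep if sep else url
```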
I need a script that can do this: search all directories and subdirectories for .html files. When a .html file is found, it creates an index.html file in that folder. It then edits the index.html file and inserts links to all of the .html files in that folder into the body. If no .html files are found, it searches for folders. It then creates an index.html file with links to all of the folders.
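The steps above could be sketched roughly as follows in Python. This is only an outline under some assumptions not stated in the request: an existing index.html is overwritten, index.html itself is not listed as a link, and folders with neither pages nor subfolders are skipped. The function name write_indexes is made up.

```python
import html
import os

def write_indexes(root):
    """Create an index.html of links in each folder under root; return the paths."""
    written = []
    for dirpath, dirnames, filenames in os.walk(root):
        pages = sorted(f for f in filenames if f.endswith(".html") and f != "index.html")
        # Link to the pages if there are any, otherwise to the subfolders.
        targets = pages if pages else sorted(d + "/" for d in dirnames)
        if not targets:
            continue
        links = "\n".join(
            '<li><a href="{0}">{0}</a></li>'.format(html.escape(t)) for t in targets
        )
        index_path = os.path.join(dirpath, "index.html")
        with open(index_path, "w", encoding="utf-8") as handle:
            handle.write("<html><body>\n<ul>\n" + links + "\n</ul>\n</body></html>\n")
        written.append(index_path)
    return written
```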
I have a problem with arrays in awk. What I want is to take some data from a file (an ssh log) and print it to an HTML table. I have managed to print some of it (the users who logged in and how many times they logged in). What I want next is to take all the IPs that each user logged in from and print them in a row next to the username and login count (in the code I typed blabbla where I want the IPs to be shown). How do you think I should approach that? Multidimensional arrays, maybe?
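One way to picture the data structure: keep a count per user plus a set of addresses per user, both keyed by username. Below is a sketch in Python rather than awk. The log line format ("Accepted <method> for <user> from <ip> ...") is an assumption based on typical OpenSSH logs, not something shown in the post, and the function names are invented.

```python
import html
import re
from collections import defaultdict

# Assumed line shape: "... Accepted password for alice from 10.0.0.1 port 22 ssh2"
LOGIN = re.compile(r"Accepted \S+ for (\S+) from (\S+)")

def summarize_logins(lines):
    """Return {user: (login_count, sorted list of source addresses)}."""
    counts = defaultdict(int)
    addresses = defaultdict(set)
    for line in lines:
        match = LOGIN.search(line)
        if match:
            user, ip = match.groups()
            counts[user] += 1
            addresses[user].add(ip)
    return {user: (counts[user], sorted(addresses[user])) for user in counts}

def to_html_table(summary):
    """Render the summary as a simple HTML table, one row per user."""
    rows = ["<tr><th>User</th><th>Logins</th><th>Addresses</th></tr>"]
    for user in sorted(summary):
        count, ips = summary[user]
        rows.append("<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
            html.escape(user), count, html.escape(", ".join(ips))))
    return "<table>\n" + "\n".join(rows) + "\n</table>"
```

In awk, a similar effect is often achieved with a second array indexed by a combined user and IP key.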
I have a log file (test.log) that starts and ends with dashes (--), as below. I am looking to write a parser for test.log. The test.log file currently has a single value for one Job ID, but I wish to parse N repeated values for different Job IDs: Job, User, Queue, Dispatched Date, Dispatched Time, Completed Date, Completed Time, Hosts/Processor, CPU_T and TURNAROUND. I can either output these 10 values to another .log file or dump them into CGI.
The selected parameters from test.log for parsing with above 10 attributes are -
What is the best way (i.e. a standard way that is supported by all browsers and preferably also followed by web crawlers) to include an HTML file, either local or external, in another? Of course, I've done the research, and I know that there are server-side includes (PHP, ASP, you name it). At the moment, I'm using this:
Quote:
<script type="text/javascript" src="path to file/include-file.js"> </script>
However, I've been warned that this method may not work in some browsers, as some tend to ignore this tag, and that crawlers such as your favorite search engine wouldn't bother reading it. So, what is the best and safest way to do the job? By the way, the reasons why I've ruled out SSI from the start include, among other things:
1) the fact that the included file is static HTML, and its text is included pretty much everywhere
2) the hope of reducing load time, as the code (if successfully recognized) would hopefully be treated like any other embedded external file (e.g. an image), and would therefore be cached without needing to be downloaded over and over again for each new page on the site.
I have a problem. I have a txt file like this:
Quote:
00 21 55 84 9a ff 00 1f 9e 1a 5b 00 08 00 45 00 00 4b 00 00 40 00 3f 11 9a 0e a1 8b fa 02 04 02
Then, based on my txt file, I would like to generate text like this:
Quote:
00215584 2155849a 55849aff 849aff00 9aff001f ff001f9e 001f9e1a 1f9e1a5b 9e1a5b00 1a5b0008 5b000800 00080045 08004500 00450000 00004b00 004b0000 4b000040 00004000 0040003f 40003f11 003f119a 3f119a0e 119a0ea1 9a0ea18b 0ea18bfa a18bfa02
Based on my reading, I found an n-gram solution in Perl, but I don't really understand how to edit the source code given. I'm a beginner in programming. I hope to get a solution. [URL]
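The output above looks like a sliding window of four bytes that moves forward one byte at a time. A minimal sketch of that in Python (not the Perl n-gram module mentioned in the post; the function name byte_ngrams is hypothetical):

```python
def byte_ngrams(text, n=4):
    """Join each run of n consecutive whitespace-separated tokens."""
    tokens = text.split()
    # The window starts at every position that still has n tokens left.
    return ["".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

Joining the result with spaces, for example " ".join(byte_ngrams(line)), gives one line of output per input line.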
I am using File::Find to go through a very large tree. I am looking for all xml files and open only those that contain a tag <Updated>. I then want to capture the contents of two tags <Old> and <New>.
My problem is, after I open the file and do the first grep for <Updated> (which does work), I am unable to grep again unless I close the file and open it.
I did something like this:
Quote:
find(\&check, $dir);
sub check {
    if ($_ =~ /\.xml$/) {
        open(FILE, "$_");
        if (grep { /Updated/ } <FILE>) {    # <-- works
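The behaviour described is what you would expect when the first grep reads the filehandle through to end of file: a second read then has nothing left unless the handle is rewound or reopened. An alternative is to read each file once into memory and search that text as often as needed. A sketch of that approach in Python, with an invented function name and regex matching standing in for a real XML parser:

```python
import os
import re

def find_updates(root):
    """Return {path: (old_values, new_values)} for .xml files containing <Updated>."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".xml"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                text = handle.read()  # read once, then search the string repeatedly
            if "<Updated>" not in text:
                continue
            old = re.findall(r"<Old>(.*?)</Old>", text, re.DOTALL)
            new = re.findall(r"<New>(.*?)</New>", text, re.DOTALL)
            results[path] = (old, new)
    return results
```

For a very large tree with big files, reading line by line and collecting all three tags in one pass would use less memory.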
I am writing a script that involves reading the content of files present in a directory and/or its subdirectories. I know readdir returns all the file and directory names in a directory, but how do I check whether readdir is returning a file or a directory?
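In Perl, the -d and -f file tests answer this, provided they are given the full path, since readdir returns bare names. For comparison, here is the same check sketched in Python, where each directory entry can be asked directly. The function name split_entries is made up.

```python
import os

def split_entries(directory):
    """Return (files, dirs): the names in `directory`, split by type."""
    files, dirs = [], []
    with os.scandir(directory) as entries:
        for entry in entries:
            # is_dir() follows symlinks by default.
            (dirs if entry.is_dir() else files).append(entry.name)
    return sorted(files), sorted(dirs)
```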
I am trying to read certain lines within a file and output the ones that don't equal my value. I think showing you would be easier. There are multiple blocks like this inside one file...
Code:
LV Name /dev/vg00/lvol1
LV Status available/syncd
LV Size (Mbytes) 300
[code]....
I want to read everything in the file. If the status is not available, then it should display the name (directly above the status). If they are all available, then do nothing. I think I know how to do it, which includes putting the info in string form and placing it in a hash, but it is proving to be out of my skill range.
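Since the name always sits directly above the status, a hash is not strictly needed: remembering the most recent "LV Name" line is enough. A sketch in Python, assuming every status that counts as fine starts with the word "available". The function name and the "unavailable" value used when trying it out are illustrative only.

```python
def unavailable_volumes(text):
    """Return the LV names whose status does not start with 'available'."""
    problems = []
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("LV Name"):
            # Remember the name until its status line arrives.
            name = line[len("LV Name"):].strip()
        elif line.startswith("LV Status"):
            status = line[len("LV Status"):].strip()
            if not status.startswith("available") and name is not None:
                problems.append(name)
    return problems
```

An empty list means every volume was available, so the caller can simply print nothing.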
I have a CSV file, which I created using an HTML export from a Check Point firewall policy. Each rule is represented as several lines, in some cases. That occurs when a rule has several address sources, destinations or services.
I need the output to have each rule described in only one line. It's easy to distinguish when each rule begins. In the first column, there's the rule ID, which is a number.
Here's an example. In green are marked the strings that should be moved:
See example. The strings that should be moved are in bold:
Read the first column of the next line. If there's a number:
Evaluate the first column of the next line. If there's no number there, concatenate (separating with a comma) the strings in the columns of this line with the last one and eliminate the text in the current one
The output should be something like this. The strings in bold are the ones that were moved:
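The rule described above (a number in the first column starts a new rule, anything else is folded into the previous one) could be sketched like this in Python. The column layout in the example call is hypothetical, since the real export is not shown, and the function name is made up.

```python
import csv
import io

def merge_rule_rows(csv_text):
    """Fold continuation rows (no number in column 1) into the preceding rule."""
    merged = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        # A numeric first column, or the very first row, starts a new record.
        if row[0].strip().isdigit() or not merged:
            merged.append([cell.strip() for cell in row])
            continue
        current = merged[-1]
        for index, cell in enumerate(row):
            cell = cell.strip()
            if not cell:
                continue
            if index >= len(current):
                current.extend([""] * (index - len(current) + 1))
            # Append with a comma, or fill the cell if it was empty so far.
            current[index] = current[index] + "," + cell if current[index] else cell
    return merged
```

Writing the result back out with csv.writer would quote the cells that now contain commas.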
The order of these lines is random, so I cannot simply delete line #19, for example. And you can see that the top four lines I want to delete are pairs. So there might be some clever way to detect the lines: if a line has both "1.9" and "1.11", then delete the line. I am new to the Perl language. The following is the code I have now. I think I just need to write some code inside the while loop that checks whether I want to delete the line $dotline before I write it to a NEW file.
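The check itself is a two-part test on each line: skip it only when both strings are present. A sketch in Python rather than Perl, with a hypothetical function name. Note that this is plain substring matching, so "1.9" would also match inside something like "1.90".

```python
def drop_lines_with_both(lines, first="1.9", second="1.11"):
    """Keep every line except those containing both `first` and `second`."""
    return [line for line in lines if not (first in line and second in line)]
```

Inside a read loop, the same condition decides whether the current line gets written to the new file.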