Programming :: Sed - Awk - Perl - Merge Lines Unless They Match A Certain String
Apr 15, 2011
What is the best way, in sed, awk or Perl, to merge the lines that occur between certain strings? I'm new to sed scripting and have been working on this for some time now. I have a large file (sample below) that I need to edit.
What I need looks something like this.
I'm working with a very large file so simply merging all the lines then adding a new line character before ">contig" and after "translated" won't work, at least not with sed.
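One streaming option is an awk sketch. It assumes each record starts on a line beginning with ">contig" and ends on a line containing "translated"; the sample itself is not shown, so check this against the real data. Because awk holds only the current record in memory, file size is not a problem. The record's lines are joined with a single space here. That separator is an assumption: change " " to "" to concatenate the lines directly.

```shell
# Joins each ">contig ... translated" record onto a single line
merge_contigs() {
  awk '
    /^>contig/ && buf != "" { print buf; buf = "" }  # new record while one is still open: flush it
    { buf = (buf == "" ? $0 : buf " " $0) }           # append the current line to the record
    /translated/ { print buf; buf = "" }               # end of record: print it and reset
    END { if (buf != "") print buf }                  # flush a trailing, unterminated record
  ' "$@"
}
```

Usage: `merge_contigs bigfile.txt > merged.txt` (file names hypothetical).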
One of my applications generates a text file with XML output in it. I need to read those log files, and if the output does not match a given string in a couple of tags, the script should write a log file with the file name and the tag name.
The tags whose values must match are:
The identity format tag should always be JPEG, and the well-formed and valid status tags should be true.
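A minimal grep-based sketch follows. The element names and layout (`<identity format="JPEG...">`, `<well-formed>true</well-formed>`, `<valid>true</valid>`, each on one line) are hypothetical, because the actual XML is not shown. Line-oriented grep checks are fragile for XML, so for anything beyond a quick check an XML-aware tool is the safer choice.

```shell
# Hypothetical tag layout, each element on a single line:
#   <identity format="JPEG ...">, <well-formed>true</well-formed>, <valid>true</valid>
# Appends "file: tagname" to the report file for every check that fails.
check_xml_log() {
  local file=$1 report=$2
  grep -q '<identity[^>]*format="JPEG' "$file" || echo "$file: identity" >> "$report"
  grep -q '<well-formed[^>]*>true</well-formed>' "$file" || echo "$file: well-formed" >> "$report"
  grep -q '<valid[^>]*>true</valid>' "$file" || echo "$file: valid" >> "$report"
}
```

Usage: `for f in *.log; do check_xml_log "$f" failures.txt; done` (file names hypothetical).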
I'm trying to split a text file into various parts. Everything between "123" and "break" (including line breaks) should go into its own output file.
e.g. using this text file:
This should split into 4 files. However, I'm only getting 2 files: one for the line "123break" and one for "123 blah break". The two occurrences that contain line breaks are being ignored. Shouldn't the .* part of my match capture line breaks, since I'm using the /s modifier? Even when I use the match /(123 break)/gs it still doesn't capture the first occurrence. I'm using Perl v5.12.3 (from ActiveState) on Windows XP. The text file is also in Windows format.
Code listed below.
The above code generates two files Output_1.txt and Output_2.txt which contain "123break" and "123 blah break" respectively. I want it to generate four files.
I am new to Perl and am having trouble concatenating some strings.
My full code is below:
The problem is that $NewCommandB is always split into two lines, where the second line contains the "/atlas2/<blah>/<etc>/..." string. Since I am generating a .sh file to execute a lot of similar commands, I need each command to be on a single line. Any idea why I get this behaviour, and any suggestions on how to make $NewCommandB a one-line string?
By the way, for completeness, finalFileList.txt contains just file names, one per line:
My problem is this: I have to delete all lines between two pattern matches. For example, suppose the content of the file is as below; I have to delete all lines between text1 and text2.
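Here is one way with awk. It assumes the marker lines themselves should be kept and only the lines between them deleted. If the markers should go too, `sed '/text1/,/text2/d' file` deletes the whole range inclusively. In both versions, if text2 never appears after text1, everything from text1 to the end of the file is dropped.

```shell
# Removes the lines strictly between "text1" and "text2", keeping the marker lines
delete_between() {
  awk '
    /text2/ { inside = 0 }   # closing marker: stop deleting (the marker is printed)
    !inside { print }        # print everything outside a text1..text2 block
    /text1/ { inside = 1 }   # opening marker: printed above, later lines are deleted
  ' "$@"
}
```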
I want to match whatever is within the <item></item> tags and save it in the $content variable. However, the <item> tags can span multiple lines:
How can I match a literal string in awk, i.e. make awk *not* interpret characters that correspond to its built-in operators in a given string? Take this code:
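awk's `~` operator always treats its right-hand side as a regular expression. For a plain substring test, the built-in `index(s, t)` returns the position of `t` in `s` (0 if absent) without interpreting any characters. Passing the string through the environment (ENVIRON) rather than `-v` also avoids the escape-sequence processing that awk applies to `-v` assignments.

A minimal sketch (the function name is hypothetical):

```shell
# Prints the lines that contain $1 as a literal substring (no regex interpretation)
contains_literal() {
  local needle=$1
  shift
  NEEDLE=$needle awk 'index($0, ENVIRON["NEEDLE"]) > 0' "$@"
}
```

For comparison, `awk '/a.b*c/'` would also print a line such as "axbbc", because `.` and `*` are regex operators there.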
I have a file, and I have to display all the lines from the beginning of the file till a matching string is found.
I know grep has the -A and -B options, but they need the number of lines to be given in advance, e.g. grep -B 10 "search_string" file prints the 10 lines before a match.
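No line count is needed: sed can print lines until the first match and then quit, e.g. `sed '/search_string/q' file`. That command prints the matching line too. The same idea in awk, wrapped in a small function whose name is hypothetical, takes the pattern as an argument:

```shell
# Prints every line up to and including the first one matching the regex in $1
print_until_match() {
  local pat=$1
  shift
  PAT=$pat awk '
    { print }                      # print every line ...
    $0 ~ ENVIRON["PAT"] { exit }   # ... and stop after the first matching one
  ' "$@"
}
```

To stop before the matching line instead, swap the two rules so the `exit` test runs first.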
I want to strip the process name from the hosts; I did it with the code below.
I have two questions. Is there a more compact way to strip off the process names? Also, I want to get rid of the errors after extracting the host name. Perl complains about $arry[1]; declaring it with my $arry[1] is not allowed, and assigning the element to a variable, as in my $sliced_arry = $arry[1]; print $sliced_arry, does not work either.
Code:
Use of uninitialized value in pattern match (m//) at newcomm_stats.pl3 line 7, <NEWCOMM> line 16
Here I get what I want, just the host name, but I still get those nasty warnings. Assigning a value to $1 does not work, and localizing $1 with my is not allowed.
I need to substitute a particular string (StringA) with another string (StringB). However, there may be more than one occurrence of StringA within the file, and only one instance needs to be changed. That is why I'm trying to pin down its position relative to something I know will be unique in the file and will always be at the same distance from the string to be replaced. So I intend to match a string (StringC) above the string to be substituted and then have sed go down to StringA and replace it with StringB.
So far, I have had some success with the following:
Code:
... but I can't help thinking that there *has* to be a cleaner way of doing it.
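A possibly cleaner option is a sed range address. The range starts at the unique StringC line and ends at the first StringA after it, and the substitution is applied only inside that range. Only that first StringA changes; earlier and later occurrences are left alone.

One caveat: if StringA also appears on the StringC line itself, the range runs on to the next StringA, and both lines are changed.

```shell
# Replaces only the first StringA that follows the (unique) StringC line
replace_after_marker() {
  sed '/StringC/,/StringA/ s/StringA/StringB/' "$@"
}
```

Since the distance is fixed, another option is to step forward with `n`. For a StringA two lines below StringC, that would be `sed '/StringC/{n;n;s/StringA/StringB/;}'`.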
I have a sed command that matches a certain regex pattern:
Code: tname=$(echo "$contents" | sed -n 'some pattern')
How do I match for multiple strings in the $contents and return them as an array? for example
Code: contents="this is a text, just to match patterns, here is another text to be matched"
The sed command should recognize both occurrences of "text", but only one is output. Why?
Is it possible to put the matches in an array, so that ${bar[0]} gives one and ${bar[1]} gives the other?
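sed works on whole lines, so a single input line yields at most one line of output per `p`, and repeated matches on that line cannot come out separately. `grep -o` prints each match on its own line; it is supported by GNU and BSD grep, though not required by POSIX. A bash loop can then collect those lines into an array.

This sketch needs bash (for arrays and process substitution); the function and array names are hypothetical:

```shell
# Requires bash. Stores every match of the regex $1 found in the string $2
# in the global array MATCHES, one element per match.
match_all() {
  MATCHES=()
  local m
  while IFS= read -r m; do
    MATCHES+=("$m")
  done < <(printf '%s\n' "$2" | grep -o -- "$1")
}

contents="this is a text, just to match patterns, here is another text to be matched"
match_all 'text' "$contents"
echo "${MATCHES[0]} ${MATCHES[1]}"   # prints "text text"
```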
I am trying to make a random sentence generator in Perl. So far I can loop it x times to make a fixed number of words, but I don't want to do that. Alternatively, I can let it run until I hit Ctrl-C :/
I want to have it so that when it reaches a word with a sentence terminating punctuation mark it stops.
My attempt to do that was with:
This is doing the woooosh text until ctrl-C thing...
Now however I am not sure how I can cut off everything after the punctuation mark (when it exists).
Using a list of names (over 4000 of them) painstakingly gleaned from the source file years ago for a database file, I want to match the names against the source file so that they can be wrapped in <forename></forename> tags in the original source file.
I placed the list of names in @forenames (only posted a few of them here).
Perl script is:
I am able to get the name bracketed by the tags to appear on the console screen but don't know how to apply the output to the source file. Perhaps I need to do a match on the words then some kind of edit to surround the matching words with the xml tags? I'm a rank novice doing this as a labour of love for a friend.
I want to grep for lines that do not start with # or a space, like:
bla bla bla bla
How do I do this? I tried grep --invert-match '^#', which gives lines not starting with #, but gives me blank lines too. I also tried grep --invert-match '^#|^ ', which I expected to give lines starting with neither # nor a blank, but it returns every line, including ones starting with #.
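Without -E, grep uses basic regular expressions, in which `|` is an ordinary character (GNU grep accepts `\|` as an extension). So '^#|^ ' matches almost nothing, and --invert-match then returns nearly every line.

It is simpler to select the wanted lines directly: lines that start with a character other than # or whitespace. Empty lines have no first character, so they are excluded too. The function name is hypothetical:

```shell
# Keeps lines whose first character is neither "#" nor whitespace;
# empty lines have no first character, so they are dropped as well.
config_lines() {
  grep '^[^#[:space:]]' "$@"
}
```

An equivalent inverted form with extended regexes is `grep -Ev '^(#|[[:space:]]|$)' file`.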
Is there a way to add blank lines when asking for user input in a script? For example:
Code:
[empty line]
Hello, what is your name?
[empty line]
[empty line]
So this is asking a question but has a space/empty line at the top of the screen and 2 spaces/empty lines below. I've seen it done in a bash script using for each line/space needed
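In bash, an `echo` with no arguments prints one empty line, and `printf` can emit several newlines at once. A minimal sketch (the function name and reply text are hypothetical):

```shell
# Asks for a name with one empty line above the question and two below it
ask_name() {
  echo                                   # one empty line above the question
  echo "Hello, what is your name?"
  printf '\n\n'                          # two empty lines below it
  read -r name
  printf 'Nice to meet you, %s\n' "$name"
}
```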
I am trying to read certain lines within a file and output the ones that don't match my value; I think showing you would be easier. There are multiple records like this in one file...
Code:
LV Name /dev/vg00/lvol1
LV Status available/syncd
LV Size (Mbytes) 300
[code]....
I want to read everything in the file; if the status is not available, it should display the name (directly above the status). If they are all available, it should do nothing. I think I know how to do it, which involves putting the information in string form and placing it in a hash, but it is proving to be beyond my skill range.
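No hash is needed here. The name always comes directly before the status, so it is enough to remember the most recent "LV Name" and print it when the following "LV Status" is not available.

This awk sketch assumes the field layout in the sample above, where the value is the third whitespace-separated field. Pipe the lvdisplay output into it, or pass a saved copy as a file:

```shell
# Prints the LV name of every logical volume whose status is not "available..."
report_unavailable() {
  awk '
    $1 == "LV" && $2 == "Name"   { name = $3 }   # remember the latest LV name
    $1 == "LV" && $2 == "Status" && $3 !~ /^available/ { print name }
  ' "$@"
}
```

If every volume is available, nothing is printed.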
I want to read input from the user and output something like 'inputcd', which requires escaping every backslash when using double quotes. For instance, the following code would work.
I'm just curious whether there is another way to do it without escaping every backslash, since that takes a lot of effort when the sequence is long.
In my Perl script I'd like to test whether a string is written in uppercase letters or not. How can I do that? This type of test doesn't seem to work, so there must be other ways of doing this:
Code:
...return true.
I could write a subroutine that compares each character against a list of uppercase letters, but I'm hoping Perl already has a built-in routine that does this...
The order of these lines is random, so I cannot simply delete line #19, for example. As you can see, the top four lines I want to delete come in pairs, so there might be a clever way to detect them: if a line contains both "1.9" and "1.11", delete it. I am new to Perl. The following is the code I have now. I think I just need to write some code inside the while loop that checks whether to delete the line $dotline before I write it to a NEW file.
I have a dataset of around 1000 lines. Out of these 1000 lines I need to pick 160 lines at random and write them to a file. The program is needed to eliminate data bias when I run the data through a reanalysis program. I am thinking I need to use the rand or srand function, but I am having difficulty writing this in Perl. I have to write it in Perl, because the rest of my scripts for this project are in Perl, so consistency is important. The data consists of a single column (YYYYMMDDHHHH).
I have written a regular expression (tested in regexpal and regextester alpha something) with which I want to replace something like code...
but it only matches functions that occupy a single line, despite my tests in online JavaScript testers showing multi-line matches, and despite using the m and s flags (which should make it multi-line, shouldn't they?)