Programming :: Regular Expression To Match Lines In A File That Do Not Start With # Or A Blank Space?
Dec 25, 2010
I have a file like this:
# comments
#comments
#comments
bla bla
[code]....
I want to grep the lines that do not start with # or a blank space, like:
bla bla
bla bla
How do I do this? I tried grep --invert-match '^#', which gives the lines not starting with # but gives me blank lines too. I tried grep --invert-match '^#|^ ', which gives the lines not starting with # OR not starting with a blank (which means any line, including the ones starting with #).
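One way to say "does not start with # or a space" is a single negated character class anchored at the start of the line. Because the class has to match a real first character, empty lines drop out as well. With grep that would be grep '^[^# ]' file; note that alternation with | needs grep -E (or \| in GNU grep). The sketch below shows the same pattern in Python. The helper name keep_lines and the sample lines are made up for illustration.

```python
import re

# First character must exist and must be neither '#' nor a space.
PATTERN = re.compile(r'^[^# ]')

def keep_lines(lines):
    """Return lines (newline stripped) that do not start with '#' or a space.

    Empty lines are dropped too, because the class needs one character to match.
    """
    stripped = [line.rstrip("\n") for line in lines]
    return [line for line in stripped if PATTERN.match(line)]

# Made-up sample resembling the file in the question.
sample = ["# comments", "#comments", "", " indented", "bla bla", "bla bla"]
print(keep_lines(sample))
```

If tabs should count as blank space too, the class can be widened to [^#\s] in Python (or [^#[:space:]] in grep).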
I need to use sed to edit a file that contains just one line. This should be pretty simple, but I've googled and can't seem to figure it out. I need to match everything from a certain string up until the first comma in the line. There are multiple commas in the line and my matching pattern is matching up until the last comma, not the first.
Here is what I'm trying:
As you can see it is matching up until the last comma. Seems like the .* is matching any character including the other commas. The output from this that I am hoping to achieve:
How can I get the regular expression to match from asdf: up until the first comma?
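The usual fix for "matches up to the last comma" is to replace .* with a class that cannot cross a comma, [^,]*. POSIX sed has no non-greedy quantifier, so the negated class is the portable route there as well. A minimal Python sketch of the difference follows; the sample line is invented, only the asdf: marker comes from the question.

```python
import re

# Invented sample line; only the "asdf:" marker is from the question.
line = "foo asdf:one,two,three"

# Greedy: .* runs to the end of the line and backs up to the LAST comma.
greedy = re.search(r'asdf:.*,', line).group(0)

# Negated class: [^,]* cannot step over a comma, so it stops at the FIRST one.
first = re.search(r'asdf:[^,]*,', line).group(0)

print(greedy)
print(first)
```

The same [^,]* idea should carry over unchanged to a sed s/// expression.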
I'm attempting to search through a rather large assortment of HTML files created in Word using 'save as html'. What I'm trying to do is find and delete these tags (they're causing browsers to display black diamonds with white question marks):
<span style='mso-spacerun:yes'> </span>
The tags contain from 1 to 4 spaces between opening and closing. I get positive results from this:
grep <span style='mso-spacerun:yes'> filename.html
but once I attempt to tell it to match any number of characters up until the next '>' symbol, it tells me I'm using an invalid regex:
grep <span style='mso-spacerun:yes'>[^>]+> filename.html
I've been nose-deep in regex tutorials for the past day or so, and I'm still not understanding why this doesn't work. If I put the pattern (without backslashes) into a separate file and use `grep -f patternfile filename.html`, I get no error but no matches either. So far as I can figure, the above regex boils down to: match the string "<span style='mso-spacerun:yes'>", followed by any number of characters that are not a ">", followed by a ">". If someone could tell me where I'm going wrong with this,
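Two things are worth checking on the grep side. In the shell, < and > are redirection operators unless the whole pattern is quoted. And in grep's basic syntax + is a literal character unless grep -E is used (or \+ in GNU grep). For the actual deletion, a small script may be easier than grep. The sketch below removes each span together with its contents. The name strip_spacerun is hypothetical, and the sample assumes the "spaces" are non-breaking spaces, which is only a guess based on the symptoms described.

```python
import re

# Opening tag, any run of non-'<' characters, then the closing tag.
SPACERUN = re.compile(r"<span style='mso-spacerun:yes'>[^<]*</span>")

def strip_spacerun(html):
    """Delete every mso-spacerun span, including whatever spacing it wraps."""
    return SPACERUN.sub("", html)

# Assumed sample: two non-breaking spaces inside the span.
sample = "<p>one<span style='mso-spacerun:yes'>\u00a0\u00a0</span>two</p>"
print(strip_spacerun(sample))
```

Replacing with a single ordinary space instead of the empty string may be preferable if the spans separate words.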
I only want to match the directories ape/ and apes/, but I think it is matching any directory that ends in "ape" or "apes". Or does it match any string containing those characters in any order? I am not great at regex and have read a lot, but I'm still not sure I understand this correctly.
I'm trying to match all class references in a C++ file using grep with a regular expression. I want to know whether a specific include is useless, so I have to know if there is a reference in the cpp file. I wrote this RE that searches for a reference to class ABCZ, but unfortunately it isn't working as I expected:
grep -E '^[^(/*)(//)].*[^a-zA-Z]ABCZ[]*[*(<:;,{& ]'
^[^(/*)(//)] don't match comments at the beginning of the line ( // or /* )
.* followed by any character
[code]....
Well, I can get patterns like this:
class Test: public ABCZ{
class Test: public ABCZ {
class Test : public ABCZ<T>
I have a very, very large log file (360MB) that I'm trying to thin out. As it turns out the majority of this file has entries that aren't necessary so I'm attempting to build a command that will strip these out. The following command works to display only the data that I do not want:
This displays exactly the data I want to delete from the file by displaying the expression and six lines above it and five lines below it. However I'm at a loss as to how to remove this data from the output and display everything else. I looked into the -v option with grep redirecting the output to a new file:
However it doesn't work, the new file is the same size as the old one. What am I doing wrong? Is there a better method of doing this? I'm a bit out of my element since the method I'd normally use can't handle files of this size.
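A likely reason the -v attempt leaves the file unchanged is that with -v the selected lines are the non-matching ones, and the context options then pull the matching lines back in around them. One alternative is a single streaming pass that holds back the last six lines until it knows they are not "before" context. The sketch below is a rough version of that idea. The function name drop_with_context and the file names in the comment are hypothetical, and the real pattern is not shown in the post.

```python
import re
from collections import deque

def drop_with_context(lines, pattern, before=6, after=5):
    """Yield every line except matches plus `before`/`after` lines of context.

    Reads one line at a time, so memory use stays small for large files.
    """
    regex = re.compile(pattern)
    pending = deque()   # lines that might still turn out to be "before" context
    skip = 0            # how many "after" context lines remain to be dropped
    for line in lines:
        if regex.search(line):
            pending.clear()      # drop the held-back "before" lines
            skip = after         # and start dropping the "after" lines
        elif skip > 0:
            skip -= 1
        else:
            pending.append(line)
            if len(pending) > before:
                yield pending.popleft()   # too old to be context: safe to emit
    yield from pending

# Hypothetical usage on a real file:
# with open("big.log") as src, open("thin.log", "w") as dst:
#     dst.writelines(drop_with_context(src, r"PATTERN"))
demo = [str(n) for n in range(20)]
print(list(drop_with_context(demo, r"^10$")))
```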
There is always one occurrence of € in each line. I want the numeric value that precedes this € occurrence. The random text (before and after) may contain numbers too, so the € may be important to parse, in order to correctly identify the number to return. The last character that precedes the number to extract is always a ">" (coming from an HTML tag).
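Since the number always sits between a ">" and the €, both can be used as anchors so that other digits in the line are ignored. The sketch below assumes that only whitespace may separate the number from the ">" and from the €, and that the number consists of digits with optional "." or "," separators. The sample line and the name amount_before_euro are invented.

```python
import re

# '>' then optional whitespace, the number (captured), optional whitespace, '€'.
EURO_AMOUNT = re.compile(r'>\s*(\d+(?:[.,]\d+)*)\s*€')

def amount_before_euro(line):
    """Return the number that directly precedes the euro sign, or None."""
    match = EURO_AMOUNT.search(line)
    return match.group(1) if match else None

# Invented sample with distracting numbers before and after.
sample = '<td>item 42</td><td class="p3">1.299,50 €</td> ref 7'
print(amount_before_euro(sample))
```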
I'm writing a log analysis application and want to grab Apache log records between two given dates. Assume that a date is formatted like this: 22/Dec/2009:00:19 (day/month/year:hour:minute). Currently, I'm using a regular expression to replace the month name with its numeric value and remove the separators, so the above date is converted to 221220090019, making a date comparison trivial. But running a regex on each record of a large file, say one containing a quarter million records, is extremely costly. Is there any other method not involving regex substitution? Here's the function doing the conversion/comparison:
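Because the timestamp has fixed-width fields, one regex-free option is to slice it by position and look the month up in a table. The sketch below builds the key year-first (YYYYMMDDHHMM) so that plain string comparison orders chronologically; a day-first key would not sort correctly across months. It assumes the day is always two digits. The names sort_key and in_range are hypothetical.

```python
# Month abbreviation -> two-digit number, so no regex or locale lookup is needed.
MONTHS = {
    "Jan": "01", "Feb": "02", "Mar": "03", "Apr": "04",
    "May": "05", "Jun": "06", "Jul": "07", "Aug": "08",
    "Sep": "09", "Oct": "10", "Nov": "11", "Dec": "12",
}

def sort_key(stamp):
    """Turn 'DD/Mon/YYYY:HH:MM' into 'YYYYMMDDHHMM' using fixed-position slices."""
    return (stamp[7:11] + MONTHS[stamp[3:6]] + stamp[0:2]
            + stamp[12:14] + stamp[15:17])

def in_range(stamp, start, end):
    """True if stamp lies between start and end (inclusive)."""
    return sort_key(start) <= sort_key(stamp) <= sort_key(end)

print(sort_key("22/Dec/2009:00:19"))
print(in_range("22/Dec/2009:00:19", "01/Dec/2009:00:00", "31/Dec/2009:23:59"))
```

Converting the start and end bounds once, outside the per-record loop, should save further work.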
I remember reading that using sed, you can do this with parentheses: s/abc(something)def/(something)else/g I can't find an explanation of how to do something like this with Awk. Say you have this in an HTML file, where (number) stands for a one or two-digit number:
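The parentheses are capture groups, and the replacement refers back to them by number (in sed's basic syntax the group is written \( \) and referenced as \1). Standard awk's sub and gsub only offer & for the whole match, while GNU awk's gensub accepts numbered groups. As a language-neutral illustration, here is the same substitution sketched in Python on an invented string.

```python
import re

# Invented input: two occurrences of abc...def with different middles.
text = "abcXYZdef and abc12def"

# (.*?) captures the middle part; \1 in the replacement puts it back.
result = re.sub(r'abc(.*?)def', r'\1else', text)
print(result)
```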
Part of a Perl script I am writing needs to change the character at an index to upper case. I am new to Perl and I am having some trouble getting it to do this. In C++ I would do something like
Code:
From what I understand, the same thing is possible in Perl using regular expressions, but I can't get it to work.
I have a file with three consecutive blank lines. I want to delete two and keep one. Also, if anyone could direct me towards a guide on regular expressions, particularly as they apply to sed, I would be grateful. I am having a hell of a time figuring out the syntax.
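Collapsing runs of blank lines does not strictly need a regex: remembering whether the previous line was blank is enough (cat -s does much the same from the command line). The sketch below treats whitespace-only lines as blank, which is an assumption. The name squeeze_blank_lines is hypothetical.

```python
def squeeze_blank_lines(lines):
    """Keep only the first line of every run of consecutive blank lines."""
    out = []
    previous_blank = False
    for line in lines:
        blank = line.strip() == ""   # assumption: whitespace-only counts as blank
        if blank and previous_blank:
            continue                 # second or later blank line in a run: drop it
        out.append(line)
        previous_blank = blank
    return out

print(squeeze_blank_lines(["a", "", "", "", "b"]))
```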
How do I get this regular expression to work in an if/else statement? This is just a little script for learning BASH. Don't be too harsh.
This script will test if a certain number of files with 1-4 in their filename exist and print their filename. An error message will be printed if not.
#
for i in `ls file[1-9]`
do
  if [[ "$i" == *1-4 ]] ; then
    echo "This file, $i, ends in a number between 1-4"
  else
    echo "Error, this file, $i, does not end with a number between 1-4"
  fi
done
I get this error:
./file_test.sh: 13: [[: not found
I'm writing a Perl script to find an old key in a file and replace it with a new code. First the program should find the old key in the input file. Here is the way I did it in my script, but it doesn't work. Could you please let me know what is wrong and how I can correct it? The key is stored in the file in the following format:
PHP Code:
Key=("1234567" , someOtherVrable)
I want 1234567
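The key can be isolated by anchoring on the literal text around it and capturing only the digits. The sketch below is in Python rather than Perl, but the pattern itself should carry over. It assumes the key is always digits inside double quotes directly after Key=(. The names find_key and replace_key are hypothetical.

```python
import re

# Group 1: the literal prefix, group 2: the digits, group 3: the closing quote.
KEY = re.compile(r'(Key=\(")(\d+)(")')

def find_key(text):
    """Return the current key, or None if the line has no key."""
    match = KEY.search(text)
    return match.group(2) if match else None

def replace_key(text, new_key):
    """Swap the key for new_key, leaving the surrounding text untouched."""
    # A function is used as the replacement so digits in new_key are never
    # misread as group references.
    return KEY.sub(lambda m: m.group(1) + new_key + m.group(3), text)

line = 'Key=("1234567" , someOtherVrable)'
print(find_key(line))
print(replace_key(line, "7654321"))
```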
I was doing an exercise from Learning Perl, 3rd edition (an exercise in chapter 10, btw). The problem asks to create a program that generates a random number and asks the user to guess it. It should tell the user if the guess is lower or higher, and exit if the user types either exit or quit. My code is the following:
I have something like the following in my expect script:
Code:
interact { -nobuffer -re {^s } {
[code]....
I have put in the "^" anchor to match only input that has nothing before the "s", e.g.
1. When I type "s" followed by the "enter" key, it should match.
2. If I type something like "chess" followed by the "enter" key, it shouldn't match. But the second case is also being matched by the regular expression I have in my code.
All I want is a command that reads one data file with several columns and prints it to another one. However, whenever the value in one specific column changes, it should print one empty line in the new file. For example, consider the file
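The task comes down to remembering the previous value of the chosen column and emitting an empty line whenever the current value differs. The sketch below assumes whitespace-separated columns and that every line has the chosen column. The name split_on_change and the sample rows are invented.

```python
def split_on_change(lines, column=0):
    """Copy lines, inserting an empty line whenever `column` changes value."""
    out = []
    previous = None
    for line in lines:
        current = line.split()[column]   # assumption: whitespace-separated fields
        if previous is not None and current != previous:
            out.append("")               # value changed: start a new block
        out.append(line)
        previous = current
    return out

rows = ["1 a", "1 b", "2 c", "2 d", "3 e"]
print("\n".join(split_on_change(rows)))
```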
I want to get only this text from the whole HTML code. </p></td> From the above code I want to get only that sentence written in English, using PHP preg_match or anything else that makes it possible. I've tried the following so far, but it doesn't work:
I am trying to write a script to edit text files formatted like this:
Code:
(MCAL@Contig766:0.30207,CGIG@CVIR_Contig1014:0.13977,(HASI@HDIS_Contig573:0.16828,(CAPI@LCIN_5594371:0.36581,CFOR@FQH745302RIQ7Y:1.91244)0.160:0.00019)0.939:0.15648);
There are never line breaks or spaces in the actual files.
I want to delete all instances of the character "@" and everything between it and the next "," (including that comma) or the next ")" (including that closing parenthesis), whichever comes first. My desired output file would be like this:
Code:
(MCAL,CGIG,(HASI,(CAPI,CFOR)0.160:0.00019)0.939:0.15648);
I figured out how to do this using sed for either "," or ")", but not for both, looking for whichever comes first.
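Going by the desired output, the "," or ")" itself is kept, so it is enough to delete "@" plus everything that is neither a comma nor a closing parenthesis. A negated class listing both characters stops at whichever comes first. Here is a minimal Python sketch using the line from the post; the same class should work in sed as s/@[^,)]*//g.

```python
import re

tree = ("(MCAL@Contig766:0.30207,CGIG@CVIR_Contig1014:0.13977,"
        "(HASI@HDIS_Contig573:0.16828,(CAPI@LCIN_5594371:0.36581,"
        "CFOR@FQH745302RIQ7Y:1.91244)0.160:0.00019)0.939:0.15648);")

# '@' followed by any run of characters that are neither ',' nor ')'.
cleaned = re.sub(r'@[^,)]*', '', tree)
print(cleaned)
```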
I am trying to scan a website for http references (links) with this script:
Code:
from urllib import urlopen
import re
current_site = urlopen("http://en.wikipedia.org/wiki/").read()
search = re.search('href="[a-zA-Z0-9]"', current_site)
[code]....
I get the following error message:
Code:
Traceback (most recent call last):
  File "C:\Users\admin\Desktop\crawler.py", line 8, in <mo
    print search.group(0)
AttributeError: 'NoneType' object has no attribute 'group'
I have googled the error
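re.search returns None when nothing matches, and calling .group on None raises exactly this AttributeError. The class [a-zA-Z0-9] without a quantifier matches a single character, so href="..." values longer than one character, or containing / or :, never match. The sketch below (Python 3, run on an invented HTML string rather than a live page) uses [^"]+ and checks for None. The names find_links and first_link are hypothetical.

```python
import re

# Capture everything between the quotes of an href attribute.
HREF = re.compile(r'href="([^"]+)"')

def find_links(html):
    """Return all href values found in the text (possibly an empty list)."""
    return HREF.findall(html)

def first_link(html):
    """Return the first href value, or None when there is no match."""
    match = HREF.search(html)
    return match.group(1) if match is not None else None

# Invented sample markup.
page = '<a href="/wiki/Main_Page">Main</a> <a href="http://example.org/">x</a>'
print(find_links(page))
print(first_link("<p>no links</p>"))
```

For anything beyond a quick scan, an HTML parser is generally more reliable than a regex.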
I'm having a bit of trouble with a regular expression I'm trying to write and I'm not sure if it's something Tcl specific or my lack of regexp understanding.
[Code]...
I get a number of strings passed to a proc in the format 3|x, where x is a number, either 0 or within the range 5-12. My understanding is that the regexp will match the literal '3' followed by a '|' (the backslash escapes the special meaning of |), and then either 0 or, because of the second |, a number within the range 5-12. However, I'm getting the error 'couldn't compile regular expression pattern: invalid character range'.
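The original pattern is not shown, but this error is what a bracket expression such as [5-12] produces: inside brackets it is read as the character range 5 to 1 plus a literal 2, and a range that runs backwards is invalid. Numeric ranges have to be spelled out digit by digit. The sketch below shows one way to do that, in Python rather than Tcl. The name is_valid is hypothetical.

```python
import re

# 3, a literal |, then 0, or a single digit 5-9, or 10, 11 or 12.
VALID = re.compile(r'^3\|(?:0|[5-9]|1[0-2])$')

def is_valid(text):
    """True for '3|0' and for '3|5' through '3|12'."""
    return VALID.match(text) is not None

# A class like [5-12] is rejected by Python's regex compiler as well.
try:
    re.compile(r'[5-12]')
    bad_range_rejected = False
except re.error:
    bad_range_rejected = True

print([s for s in ("3|0", "3|4", "3|5", "3|12", "3|13") if is_valid(s)])
print(bad_range_rejected)
```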