Programming :: Remove All Lines Containing Extended Characters
Nov 7, 2010
I am using 'sed -e /foo/d' to match lines which I want to delete from a file. I discovered I have some lines which contain random (extended?) characters like 'ủ' which I would also like to delete. The lines in the file should only contain alphanumeric characters.
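One possible approach, sketched in Python rather than sed: keep a line only if every character is an ASCII letter or digit. The function name keep_alnum_lines is made up for this example, and it takes the post literally, so lines with spaces (and empty lines) are dropped as well.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
ALNUM_ONLY = re.compile(r"[A-Za-z0-9]+")

def keep_alnum_lines(lines):
    """Return only the lines made up entirely of ASCII letters and digits."""
    return [line for line in lines if ALNUM_ONLY.fullmatch(line.rstrip("\n"))]

if __name__ == "__main__":
    sample = ["abc123\n", "ch\u1ee7\n", "XYZ\n"]
    print("".join(keep_alnum_lines(sample)), end="")
```

If spaces should be allowed, adding a space inside the character class would be enough.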
Nov 24, 2009
How do you remove parts of strings using python? Such as, if I have something like:
Code:
erme1 sdifskenklsd
erme2 sdfjksliel
[code]....
Jan 29, 2011
For example, I have a file called "file" like this one:
type=strongsubj len=1 word=absolve pos=verb stemmed=y priorpolarity=positive
type=strongsubj len=1 word=unique pos=adj stemmed=n priorpolarity=neutral
type=strongsubj len=1 word=absolutely pos=adj stemmed=n priorpolarity=neutral
type=weaksubj len=1 word=taking pos=verb stemmed=y priorpolarity=positive
type=weaksubj len=1 word=friend pos=noun stemmed=n priorpolarity=positive
type=weaksubj len=1 word=usually pos=adverb stemmed=n priorpolarity=positive
type=strongsubj len=1 word=purecolor pos=anypos stemmed=n priorpolarity=negative
type=strongsubj len=1 word=accusingly pos=anypos stemmed=n priorpolarity=negative
I want to add the plural for each noun. For example, if it finds this line:
type=weaksubj len=1 word=friend pos=noun stemmed=n priorpolarity=positive
it will add one more line:
type=weaksubj len=1 word=friends pos=noun stemmed=n priorpolarity=positive
where we add "s" to the word friend.
I tried it like this:
<code>
cat file | while read LINE ; do
set -- ${line}
if [[ "${4#pos1=}" == "noun" ]];then
#I tried this line but it doesn't work properly:
v3==$(echo $line |sed 's/$3/$s') #I want to find the third word "word=friend" in that line and add "s" after that word
# I don't know what command to add this new line "$v3" to the file ???
done
</code>
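A rough sketch of the same idea in Python, in case the shell loop proves awkward: split each line into its key=value fields and, for lines with pos=noun, emit a second copy with "s" appended to the word= field. The function name add_plurals is invented here, lines are assumed to have no trailing newline, and the plural rule is the naive "add s" from the post, so irregular nouns would come out wrong.

```python
def add_plurals(lines):
    """Return the lines with an extra plural line after every noun line."""
    out = []
    for line in lines:
        out.append(line)
        fields = line.split()
        if "pos=noun" in fields:
            # Naive plural: append "s" to the word=... field only.
            plural = [f + "s" if f.startswith("word=") else f for f in fields]
            out.append(" ".join(plural))
    return out

if __name__ == "__main__":
    data = [
        "type=weaksubj len=1 word=taking pos=verb stemmed=y priorpolarity=positive",
        "type=weaksubj len=1 word=friend pos=noun stemmed=n priorpolarity=positive",
    ]
    print("\n".join(add_plurals(data)))
```

Writing the returned list back to the file (or to a new file) would replace the "add this new line to the file" step from the shell attempt.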
Feb 20, 2011
I am doing a mysql query with a bash shell script like:
mysql translator -u root --password=******** -e "SELECT word FROM tagalog ORDER BY RAND() LIMIT 1" | while read line; do
echo "$line"
done
So when I echo the value of $line I get:
word
magandang umaga
"word" is the name of the column in the table and magandang umaga is a randomly selected entry from that column. Is there a way I can remove the column name from the variable $line, so that echo $line outputs only the randomly selected entry, e.g. magandang umaga?
Jun 3, 2010
I'm having a bit of a headbanger trying to work this one out. I'm trying to remove all of the characters on a line apart from the last 17. For example, I need to change this:
Code:
00000000000000000089;0bbfaeb8
01000000000000000089;0bcb5948
00000000000000000089;0bcc4c40
[code]....
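For what it is worth, keeping only the last 17 characters is a plain slice in Python. This is just a sketch; the function name last_chars is made up, and the sample line is the first one from the post.

```python
def last_chars(line, count=17):
    """Return the last `count` characters of a line, ignoring the newline."""
    return line.rstrip("\n")[-count:]

if __name__ == "__main__":
    for raw in ["00000000000000000089;0bbfaeb8\n"]:
        print(last_chars(raw))
```

Lines shorter than 17 characters come back unchanged, which may or may not be what is wanted.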
Nov 3, 2010
How can I remove all lines which contain A,,,,,,? I tried the following sed statements but no luck.
Code:
sed "/A,,,,,,/d file"
sed "/A,,,,,,/d file"
Apr 14, 2010
How can I remove characters from grep output using sed? code...
Jan 30, 2011
I am reading strings from a file using the readline() function. The file contains some strings which have only special characters, and I need to skip those strings; the special characters are not all the same. How can I do this in Python?
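A minimal sketch, assuming "special character" means anything that is not a letter or digit: keep a string only if it contains at least one alphanumeric character. The helper names below are invented for the example.

```python
def has_alnum(text):
    """True if the string contains at least one letter or digit."""
    return any(ch.isalnum() for ch in text)

def drop_special_only(lines):
    """Drop strings that consist only of non-alphanumeric characters."""
    return [line for line in lines if has_alnum(line)]

if __name__ == "__main__":
    print(drop_special_only(["hello!\n", "@#$%\n", "a1\n"]))
```

Note that blank lines are dropped too, since they contain no letters or digits either.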
Apr 27, 2010
I hope you can help. I have a collection of spreadsheets with data that needs to be imported into SQL. The data has been manually entered, although there are portions where data has been copied and pasted from the web.
When converting these sheets to a CSV I get strange characters where it looks as though data has been copied and pasted. Is it possible to write a script (AWK?) to pull out these characters?
I guess the script will need to keep alpha characters, spaces, numerics and commas but nothing else. How easy is this to do?
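The rule described above (keep letters, spaces, digits and commas, nothing else) could be sketched in Python like this instead of AWK. The function name clean_csv_text is hypothetical, and newlines are kept so the row structure survives.

```python
import re

# Everything except ASCII letters, digits, spaces, commas and newlines.
UNWANTED = re.compile(r"[^A-Za-z0-9 ,\n]")

def clean_csv_text(text):
    """Strip every character that is not a letter, digit, space, comma or newline."""
    return UNWANTED.sub("", text)

if __name__ == "__main__":
    print(clean_csv_text("Smith, John\u2019s caf\u00e9,42\n"), end="")
```

Be aware that this also removes decimal points, quotes and hyphens, so numeric or quoted fields may need a wider character class.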
Jul 26, 2010
I am working with a Tcl script and have some strings in the following format (RE):
[a-zA-Z]+[0-9]{6}-[0-9]
There are some leading letters, combinations of capital and lowercase. Then six digits, followed by a hyphen, then one more digit. I would like to remove all of the leading alphabetic characters from the string. The resulting string would then be in this format: [0-9]{6}-[0-9]. In other words, six numeric digits, a hyphen, then one more digit.
I have tried:
Code:
set newstr [string trimleft $origstr alpha]
But that only removes the first alphabetic character, not all of them.
I couldn't get anything with regsub to work correctly, but I am somewhat of a noob with RE's in general and regsub in particular. There are usually 5 leading letters at the beginning of these strings, and I could in most cases get away with using string replace and constant indices to extract the substring. However, my preference is for this to be robust enough to handle all cases with 1 through n leading alphabetic characters.
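Not Tcl, but the regular expression itself can be tried out quickly in Python: anchoring at the start with ^ and matching one or more letters removes any number of leading alphabetic characters. The function name strip_leading_letters is made up for this sketch; the same pattern idea (an anchored letter class replaced by an empty string) is what a regsub call would need.

```python
import re

def strip_leading_letters(value):
    """Remove all leading ASCII letters from the string."""
    return re.sub(r"^[a-zA-Z]+", "", value)

if __name__ == "__main__":
    print(strip_leading_letters("abcDE123456-7"))
```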
Jul 15, 2010
I'm trying to search through some PDF files by converting them to text files with pdftotext, which works fine. However, I'm trying to count the occurrences of different words in a paragraph, and pdftotext adds a newline character at what it thinks is the right-hand margin. I'm trying to remove all these single newline characters but keep the doubles, and I can't seem to work it out, i.e.
This is some text that has been broken.
Another paragraph.
becomes
This is some text that has been broken.
Another paragraph
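One way to do this, sketched in Python: replace a newline with a space only when it is neither preceded nor followed by another newline, so the blank lines between paragraphs survive. The function name unwrap_paragraphs is invented here, and the sample text assumes the break falls after "that".

```python
import re

def unwrap_paragraphs(text):
    """Join lines within a paragraph; keep blank lines between paragraphs."""
    body = text.rstrip("\n")
    # A newline with no newline on either side is a wrapped line break.
    joined = re.sub(r"(?<!\n)\n(?!\n)", " ", body)
    return joined + "\n"

if __name__ == "__main__":
    sample = "This is some text that\nhas been broken.\n\nAnother paragraph.\n"
    print(unwrap_paragraphs(sample), end="")
```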
Jan 21, 2011
I'm trying to come up with ideas for a simple way to strip a specific "entry" from a text file. I know tools like sed and perl can remove specific lines from a file, but I haven't been able to come up with an elegant way to do this for my group of lines. In my file, the first "Location" line and the "SVNPath" line should be unique every time... but are they enough to strip out the whole group plus the one trailing blank line separating each group? In addition, my file will grow as new entries are added (always appended to the end), but new entries will have the same formatting.
Apr 8, 2010
I have a file with semi duplicate lines, like:
abc 12 32
agsi 82
sha 26
abc 1
iaij
agsi 3
Now I want to edit my file and make it:
abc 12 32
agsi 82
sha 26
iaij
i.e. remove second occurrence of line when 1st column is abc or agsi.
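A small sketch of one way to do it in Python: remember every first-column value already seen and skip later lines that start with the same value. The function name dedupe_by_first_field is made up, and unlike the description above it applies to every first-column value, not only abc and agsi.

```python
def dedupe_by_first_field(lines):
    """Keep only the first line for each distinct first column."""
    seen = set()
    out = []
    for line in lines:
        fields = line.split()
        if not fields:
            out.append(line)  # keep blank lines untouched
            continue
        if fields[0] in seen:
            continue
        seen.add(fields[0])
        out.append(line)
    return out

if __name__ == "__main__":
    data = ["abc 12 32", "agsi 82", "sha 26", "abc 1", "iaij", "agsi 3"]
    print("\n".join(dedupe_by_first_field(data)))
```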
Jun 21, 2011
I have a CSV file (A.csv) with a total of 4,600,000 lines. That's too many, and only a few are necessary. I have a text file with 150 lines (X.txt); each line is a dataset name from a mainframe and looks like abc.def.123.456. How do I remove the lines from A.csv in which none of the dataset names from X.txt is present?
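A possible sketch in Python, assuming a plain substring match is good enough: load the 150 names once, then stream through the CSV and keep a line only if at least one name occurs in it. The function name keep_matching is invented, and the file names in the usage part are simply the ones from the post.

```python
def keep_matching(csv_lines, names):
    """Yield only the lines that contain at least one of the given names."""
    wanted = [name.strip() for name in names if name.strip()]
    for line in csv_lines:
        if any(name in line for name in wanted):
            yield line

if __name__ == "__main__":
    import os
    if os.path.exists("A.csv") and os.path.exists("X.txt"):
        with open("X.txt") as names, open("A.csv") as src, open("A.filtered.csv", "w") as dst:
            dst.writelines(keep_matching(src, names.readlines()))
```

Because the CSV is read line by line, the 4,600,000 lines never have to fit in memory at once.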
Apr 6, 2010
How can I use extended ASCII characters, like ALT + 2 + 0 + 0 for instance? I'm using some of those characters in my passwords for online accounts made under MS Windows, and it seems I'm unable to use them in Slackware 13. For instance: if I type ALT+2+0+0 in Pidgin, no character is displayed, and if I type the same thing in the Terminal, it replaces my shell prompt (sasser@HOSTA:~$) with (arg: 200):
sasser@HOSTA:~$
(arg: 200)
Dec 9, 2009
In previous versions of Fedora I was able to do Ctrl + Shift + U, enter the Unicode number - i.e., 20ac, press Enter and get a euro character. In Fedora 12 I do not have that feature. My language is US English.
Jul 16, 2010
I have a large text file that's formatted sort of like this:
Code:
foo bar
blah
[code]...
Jun 5, 2009
I want to remove duplicate or multiple similar lines from multiple files. I.e. if I have four files, file1.txt, file2.txt, file3.txt and file4.txt, I would like to find and remove similar lines from all these files, keeping only one line from these similar lines. I only know that uniq can be used to remove similar lines from a sorted file.
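A sketch of one interpretation in Python, treating "similar" as "identical": walk the files in order with a single shared set of lines already seen, so a line is kept only the first time it appears in any file. The function name dedupe_across_files is hypothetical, and the files are represented as a dict of name to list of lines to keep the example self-contained.

```python
def dedupe_across_files(files):
    """Keep each distinct line once across all files (first occurrence wins)."""
    seen = set()
    result = {}
    for name, lines in files.items():
        kept = []
        for line in lines:
            if line not in seen:
                seen.add(line)
                kept.append(line)
        result[name] = kept
    return result

if __name__ == "__main__":
    demo = {"file1.txt": ["a\n", "b\n", "a\n"], "file2.txt": ["b\n", "c\n"]}
    print(dedupe_across_files(demo))
```

Unlike uniq, this does not need the input to be sorted, and the original line order is preserved.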
Nov 13, 2010
Debian "squeeze" AMD64. Some filenames containing accented or other extended ASCII characters are not shown in Nautilus, in the Terminal, or in the virtual console.
I also noticed that when asking the octave interpreter (run from the terminal) to display the character range from 97 to 140, the output was:
On the other hand, when executing the same query in qtoctave the characters are displayed properly.
I've tried to change the font that the gnome terminal uses, to no benefit.
My default locale is en_us.utf8 and I am about to install every package that contains the prefix ttf
thank you for your time reading this
Nov 2, 2010
I have a file with a random word on each line (3k+ lines). How can I get the lines with only five characters? I tried using grep file | more, but it returns all the words (even those less than 5 characters).
Edit: I also tried grep '.{5}' file | more but it doesn't show anything. And grep '.{5}' file | more returns all lines with four or more characters (I'm really confused about why it's doing this).
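A likely reason for the confusing results is that an unanchored pattern matches anywhere inside a line, so any line with at least that many characters matches. The sketch below shows the anchored version of the idea in Python, where fullmatch requires the whole word to be exactly five characters. The function name five_letter_words is made up.

```python
import re

FIVE_CHARS = re.compile(r".{5}")

def five_letter_words(lines):
    """Return the lines whose content is exactly five characters long."""
    return [line for line in lines if FIVE_CHARS.fullmatch(line.rstrip("\n"))]

if __name__ == "__main__":
    print(five_letter_words(["apple\n", "kiwi\n", "banana\n", "mango\n"]))
```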
Apr 17, 2009
I would like to modify the content of a text file in Linux, in the following way:
=> the file has several of these lines:
./run_pest3 ./g134366.04080_0.062 x 2_d043 1 0.43 results_EC
=> I want to modify all lines to be:
./run_pest3 ./g134366.04080_0.062 x 2_d043 1 0.43 results_EC0.062
i.e., the last number of $2 should be "attached" to the end of $7, for each line.
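A minimal sketch in Python, assuming "the last number of $2" means whatever follows the last underscore in the second field: split the line on whitespace, take that suffix, and append it to the seventh field. The function name append_suffix is invented, and lines with fewer than seven fields are returned unchanged.

```python
def append_suffix(line):
    """Append the part of field 2 after its last underscore to field 7."""
    fields = line.split()
    if len(fields) < 7:
        return line
    suffix = fields[1].rsplit("_", 1)[-1]
    fields[6] += suffix
    return " ".join(fields)

if __name__ == "__main__":
    print(append_suffix("./run_pest3 ./g134366.04080_0.062 x 2_d043 1 0.43 results_EC"))
```

Runs of spaces between fields are collapsed to a single space by this approach.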
Oct 4, 2010
I'm looking for a script (bash, python, perl etc) or even a one liner (sed, awk etc) that can take a set of files and remove any line that has more than "x" instances of any character (case sensitive). I have been doing a lot of searching and can only come up with examples of how to remove blank lines, lines that start with a certain character or lines that contain a certain string. This will be used on a system running a Kubuntu derivative.
As a very poor and basic example, I would like to take files that contain lines like:
Code:
And end up with the files only containing the lines:
Code:
If I tell the script that 2 is the maximum number of times any character can appear in any line.
I know this must be possible, but for the life of me I cannot find even an example that will lead me in the right direction or better yet a piece of code I can use.
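Here is a rough Python sketch of the idea: count how often each character occurs in a line and keep the line only if no count exceeds the limit. The names within_limit and filter_lines are made up, the comparison is case sensitive as requested, and spaces are counted like any other character, which is an assumption.

```python
from collections import Counter

def within_limit(line, max_count):
    """True if no character occurs more than max_count times in the line."""
    counts = Counter(line.rstrip("\n"))
    return all(count <= max_count for count in counts.values())

def filter_lines(lines, max_count):
    """Keep only the lines that stay within the per-character limit."""
    return [line for line in lines if within_limit(line, max_count)]

if __name__ == "__main__":
    print(filter_lines(["hello\n", "book\n", "aaa\n"], 2))
```

Looping over a set of files and writing the filtered lines back out would complete the script.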
Dec 25, 2010
View the entire contents of the file /etc/passwd, show the first 10 lines of the file /etc/group, and display the last 10 lines of the file /etc/group. Also:
- The total number of lines and characters in the passwd and group files
Jun 29, 2011
Slackware 13.37, tested on 2 different PC;
affected: mousepad and tcl/tk applications
I am using mousepad and a tcl/tk application to view text files with long lines. Some time ago I found that some characters (part of the line) in long lines disappear. The problem is shown in a very small video. [URL]
Aug 5, 2011
In Windows's CMD when you execute a command and then start writing the next one (while still executing the former one) the characters remain in the buffer and they all come up nicely to the new line once the previous command has been executed. In Ubuntu when I do this the newly typed characters annoyingly get in the beginning of the previous command's output lines. I don't really understand why isn't the default method as in Windows's CMD. I mean otherwise almost _everything_ sucks with it when compared to Unix/Linux shells/terminals (commands are longer, syntax is annoying, etc.) So I'd like to know how to do this in both Bash and Zsh.
Jan 28, 2009
I have a text file called file1.txt containing many lines eg.
line1
line2
line3
line4
line5
line6
Then I have another text file called file2.txt, which contains
3
5
6
Is there a command to remove the lines in file1.txt based on the keywords in file2.txt? Note: it should remove line3, line5 and line6 based on 3, 5 and 6.
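A sketch of one way to do it in Python, using a plain substring test: drop every line of file1.txt that contains any keyword from file2.txt. The function name remove_by_keywords is invented for the example.

```python
def remove_by_keywords(lines, keywords):
    """Drop every line that contains at least one of the keywords."""
    wanted = [key.strip() for key in keywords if key.strip()]
    return [line for line in lines if not any(key in line for key in wanted)]

if __name__ == "__main__":
    lines = ["line1", "line2", "line3", "line4", "line5", "line6"]
    print(remove_by_keywords(lines, ["3", "5", "6"]))
```

Since this is a substring match, the keyword 3 would also remove a line such as line13 or line30; matching on the end of the line would be stricter.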
Dec 13, 2010
file = TT.ParlayX_RequestLog_78653_20101212180044.log.17490
1. Want to remove the characters before the first dot (.) including the dot (.)
2. Want to remove the characters after the last dot (.) including the dot (.)
That is, basically, I want the output as:
ParlayX_RequestLog_78653_20101212180044.log
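In Python this can be done with two splits: one at the first dot, keeping the right-hand part, and one at the last dot, keeping the left-hand part. This is only a sketch; the function name trim_name is made up, and names with fewer than two dots are returned unchanged.

```python
def trim_name(name):
    """Remove everything up to the first dot and from the last dot onward."""
    if name.count(".") < 2:
        return name
    without_prefix = name.split(".", 1)[1]
    return without_prefix.rsplit(".", 1)[0]

if __name__ == "__main__":
    print(trim_name("TT.ParlayX_RequestLog_78653_20101212180044.log.17490"))
```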
Mar 14, 2011
I have a large file and need to remove all the lines containing one or more symbols.
For example: . , ! " # $ % & / ( ) = ? ' + * { } ] [ - _ : ; , > < (maybe more)
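A short Python sketch, assuming the ASCII punctuation set in string.punctuation is a reasonable stand-in for the symbols listed: drop any line that contains at least one of those characters. The function name drop_symbol_lines is hypothetical, and extra symbols can be added to the set as needed.

```python
import string

# ASCII punctuation: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
SYMBOLS = set(string.punctuation)

def drop_symbol_lines(lines):
    """Keep only the lines that contain none of the symbols."""
    return [line for line in lines if not SYMBOLS.intersection(line)]

if __name__ == "__main__":
    print(drop_symbol_lines(["hello world\n", "what?\n", "a_b\n", "plain 123\n"]))
```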
Aug 30, 2010
I have a file that looks like this:
1
2 3 4 5 6 7 8 9
10 11 12 13 14 15
16 17 18 19 20 21
22 23 24
1
2 3 4 5 6 7 8 9
10 11 12 13 14 15
16 17 18 19 20 21
22 23 24
1
2 3 4 5 6 7 8 9
10 11 12 13 14 15
16 17 18 19 20 21
22 23 24
...
I would like to reformat it to look like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
...
Is there a nifty awk/sed one-liner to do this operation?
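Not a one-liner, but here is a Python sketch of the same operation, assuming every record holds exactly 24 values as in the sample: read all whitespace-separated tokens and print them back 24 per line. The function name regroup is made up.

```python
def regroup(text, size=24):
    """Split the text into tokens and return lines of `size` tokens each."""
    tokens = text.split()
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]

if __name__ == "__main__":
    record = "1\n2 3 4 5 6 7 8 9\n10 11 12 13 14 15\n16 17 18 19 20 21\n22 23 24\n"
    print("\n".join(regroup(record * 3)))
```

If record lengths vary, starting a new output line whenever an input line holds a single value would be an alternative.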
Sep 18, 2011
I have a directory (Linux user) with a number of files which contain an added [!] to the end of each file name so that each file reads out as:
foo something [!].zip
bar something [!].zip
helloworld [!].zip
etc.
What is the quickest way to batch rename these to remove the ending [!] character combination from these file names?
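One possible approach in Python, sketched with pathlib: compute the new name by cutting the trailing " [!]" off the part before the extension, then rename each matching file. The function names stripped_name and rename_all are invented, and existing targets are skipped rather than overwritten, which is a design choice of this example.

```python
from pathlib import Path

def stripped_name(name, marker=" [!]"):
    """Return the file name without the marker in front of the extension."""
    path = Path(name)
    if path.stem.endswith(marker):
        return path.stem[:-len(marker)] + path.suffix
    return name

def rename_all(directory):
    """Rename every file in the directory whose name carries the marker."""
    for path in sorted(Path(directory).iterdir()):
        new_name = stripped_name(path.name)
        target = path.with_name(new_name)
        if path.is_file() and new_name != path.name and not target.exists():
            path.rename(target)

if __name__ == "__main__":
    print(stripped_name("foo something [!].zip"))
```

Calling rename_all(".") from inside the directory would process all the files in one go.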