General :: Finding And Removing Duplicate Consecutive Words
May 2, 2011
I want to find and remove duplicate consecutive words from a text file. I've tried working with arrays, but it is very difficult. Then I tried using sed; somebody hinted me toward this sed command:
Code:
sed ':f;N;$!bf; s/\(.*\)\n\1/\1\n/g; s/\(.*\)\1/\1/g'
It works fine, but if I have three consecutive identical words it only removes the first one and the last two remain intact.
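For what it's worth, a sketch assuming GNU sed (input.txt is a placeholder filename): the + on the group keeps matching, so runs of three or more identical words collapse to a single copy as well.

Code:
# Slurp the whole file, then collapse any run of a repeated word down to
# one copy; [[:space:]]+ also bridges duplicates split across lines.
sed -E ':f;N;$!bf; s/\b([[:alnum:]]+)([[:space:]]+\1)+\b/\1/g' input.txt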
I am basically trying to remove duplicate words in my <title></title> tags after I got hit by Google Panda. I have around 750 .html files and it would be difficult for me to fix them one by one. I am looking for a way to remove the duplicates only from within <title> </title>.
Example of a duplicate title I have:
Code:
<title>Pasta, Pasta Recipe and Pasta Guide</title>
I don't want to replace those words anywhere else in the file except within the <title>.
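A possible sketch, assuming GNU sed and that each <title> fits on one line; the /<title>/ address restricts the substitution to title lines, so the rest of each file is untouched:

Code:
# Collapse a duplicated word (with optional comma) only on <title> lines;
# -i.bak edits in place but keeps a .bak backup of every file.
sed -i.bak -E '/<title>/ s/\b([[:alnum:]]+),? \1\b/\1/g' *.html

Try it on a copy first; titles spanning several lines would need a different approach.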
I am looking for this `struct messages_sdd_t` and I need to search through a lot of *.c files to find it. However, I can't seem to find a match, because I want to exclude hits on the individual words 'struct' and 'messages_sdd_t' and search only for the exact phrase 'struct messages_sdd_t'. The reason is that struct is used many times and I keep getting pages of search results. The directory I am searching in has other directories inside it, so the search will have to be recursive. I have been trying this without success:

Code:
find . -type f -name '*.c' | xargs grep 'struct messages_sdd_t'

and this:

Code:
find . -type f -name '*.c' | xargs egrep -w 'struct|messages_sdd_t'
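For what it's worth, GNU grep can recurse by itself, and quoting the whole phrase searches for it literally; a sketch:

Code:
# -r recurses, -n prints line numbers, --include limits the search to
# *.c files, and the quoted phrase matches as a unit instead of ORing
# the two words the way egrep -w 'struct|messages_sdd_t' does.
grep -rn --include='*.c' 'struct messages_sdd_t' .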
I have a couple of MP3s that have duplicate fields in their ID3 tags. Let me show you what I mean: This is causing problems with some media players. Is there a tool that can automatically fix these MP3s in batch? I'd prefer a free Windows or Linux program. I'm not afraid to work on the command line if necessary.
I was wondering if anyone knew of a script or program that removes duplicate words in a txt file. I'm making an install script and the install list has gotten a bit long, so I want to ensure there are no duplicates in the file.
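A sketch in awk, assuming the list is a plain text file of whitespace-separated names (install.txt is a placeholder):

Code:
# seen[] remembers every word already printed, so repeats anywhere in
# the file are skipped; output is one package name per line.
awk '{ for (i = 1; i <= NF; i++) if (!seen[$i]++) print $i }' install.txt > install.dedup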
I used awk '!x[$0]++' test.txt > file.new, but it deleted #1 also. I tried using the uniq command, but it didn't work. Can anyone please let me know if there is a way to do this using a shell script?
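Hard to say without seeing the file, but if #1 is a comment line that must survive, a hedged sketch would pass comment lines through untouched and deduplicate only the rest:

Code:
# Lines starting with # are printed unconditionally; everything else is
# printed only on its first occurrence.
awk '/^#/ { print; next } !seen[$0]++' test.txt > file.new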
I copied a backup of my Windows 'My Documents' folder and all of its subfolders into my Linux (Mint Debian) Documents directory. I found that many of my files can be found in more than one directory, so what I want to do is find all the dups and deal with them. Is there a good Linux application to resolve this 'duplicates' problem? (I don't want to touch the Linux system files.)
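fdupes, available in the Mint/Debian repositories, handles exactly this; a sketch:

Code:
# -r recurses through the directory tree and lists sets of identical
# files; adding -d would prompt you to choose which copy to keep.
fdupes -r ~/Documents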
Anyone know of a good Linux application that will remove duplicate files interactively? I've recently spent a lot of time (read: weeks) pruning my music collection, basically by hand, and now I'm moving on to my family photos. Most of the work with the music was done under Windows XP. As for the photos, I have a fantastic Windows application, D'Peg, that I actually purchased some years ago. This app rocks for Windows; in my opinion it's so good that I would happily pay double the asking price. However, I'd prefer to use Linux if possible, so what's out there, anything that is worth its salt? Currently playing around with Picasa.
I have made a custom grub2 menu; however, both the default and the custom entries show together. So my grub looks like the list below (the bolded entries are my custom ones). How do I get rid of the duplicates? I have tried apt-get remove and deleting old kernels.
Code:
ubuntu, linux ...
ubuntu, linux recovery
memtest
memtest
windows7
windows7
ubuntu linux
ubuntu linux recover
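One hedged guess, assuming the stock entries come from the standard generator and the custom ones live in /etc/grub.d/40_custom: removing the execute bit from a /etc/grub.d script stops grub-mkconfig from emitting its entries.

Code:
# Keep the custom entries and drop the stock duplicates by disabling the
# stock kernel-entry script, then rebuild the menu.
sudo chmod -x /etc/grub.d/10_linux
sudo update-grub

If it is the custom copies you want to drop instead, apply the same chmod to /etc/grub.d/40_custom.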
Is there a Linux program I can use to display random words from a list? By entering words in a spreadsheet and then sorting the list in alphabetical order, I made a list of new vocabulary words for myself to memorize, and wondered whether I could make random words from the list display on the screen daily. I know I could write a program to do that if I knew programming, but I don't.
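No programming needed; shuf from GNU coreutils can do this (vocabulary.txt is a placeholder name):

Code:
# Print one randomly chosen line from the word list; dropping this line
# into ~/.bashrc would show a new word every time a terminal opens.
shuf -n 1 ~/vocabulary.txt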
This simple task is proving harder than imagined. I have a multi-level directory that I'm trying to clean of duplicates, but I can't get 'find' to print what I need to see. To give an illustrative example, here is a dir:
Code:
stuart@stuart:~/testdir$ ls *
dir1:

level2:
dir1
So the output of find, as I'd like it to work, would show the two locations of dir1, which would be ./dir1 and ./level2/dir1. But no:
Code:
stuart@stuart:~/testdir$ ls -d */ | head -1 | find . "`cat`" -type d
.
./level2
./level2/dir1
./dir1
dir1/
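A sketch with GNU find that reports every directory name occurring more than once, together with all of its locations:

Code:
# %f prints only each directory's basename; uniq -d keeps the names seen
# more than once; the loop then prints every location of each such name.
find . -type d -printf '%f\n' | sort | uniq -d |
while read -r name; do
    find . -type d -name "$name"
done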
I recently upgraded my x86_64 system from FC8 to FC10 using Pre-Upgrade. (related blog link) It appears that the upgrade process installed a steaming pile of i386 packages that are duplicates of existing x86_64 packages. I now get update errors because of this package clash. I have searched the fora and the most progress I've been able to make so far is: I apparently had 8 unfinished yum transactions so I did yum-complete-transaction 8 times and have no more incomplete transactions.
The output of package-cleanup --dupes is not very helpful:
Yet I still get transaction errors when I run updates via synaptic. It checks dependencies and downloads everything and errs when testing changes. This is the error it gives at the moment:
Code:
Test Transaction Errors: file /etc/gconf/schemas/gweather.schemas from install of libgweather-2.24.2-1.fc10.x86_64 conflicts with file from package gnome-applets-1:2.20.1-1.fc8.i386
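A hedged sketch, assuming the i386 packages really are stale leftovers from the interrupted upgrade: package-cleanup from yum-utils can strip the older half of each duplicate pair.

Code:
# Remove the older (FC8 i386) members of each duplicated package pair.
package-cleanup --cleandupes

Reviewing the list of proposed removals before confirming is wise.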
I have a Nook. When I plug it into the USB port on my system, a window pops up asking what I want to do with the new device. I can open it and access media/nook and move files into and out of the directory.
There's a button to "Safely dismount Nook" before I unplug it. I use that. Apparently, however, sometimes it doesn't respond. Now I have .hal-mtab-lock in my /media folder, along with Nook Main Memory and Nook Main Memory (1) folders. I can't delete any of them.
How do I a) delete these folders, and b) make sure it actually unmounts the device in the future?
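For part a), a sketch assuming the device is unplugged and nothing is still mounted on those directories (the paths are the ones named above; run as root):

Code:
# Make sure the stale mount points are not in use, then remove them
# along with the leftover lock file.
sudo umount "/media/Nook Main Memory" "/media/Nook Main Memory (1)" 2>/dev/null
sudo rm -rf "/media/Nook Main Memory" "/media/Nook Main Memory (1)"
sudo rm -f /media/.hal-mtab-lock

For part b), running umount from a terminal before unplugging at least makes visible any error the desktop button might be swallowing.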
I have a file with three consecutive blank lines. I want to delete two and keep one. Also, if anyone could direct me towards a guide on regular expressions, particularly as they apply to sed, I would be grateful. I am having a hell of a time figuring out the syntax.
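Two sketches for the blank-line squeeze: cat -s collapses every run of blank lines to one, and a classic sed one-liner does the same.

Code:
# cat -s squeezes each run of blank lines down to a single blank line.
cat -s file.txt > squeezed.txt

# sed version: on a blank line, pull in the next one; if that is blank
# too, delete the first and retry, leaving exactly one blank line.
sed '/^$/{N;/^\n$/D}' file.txt > squeezed.txt

For sed-flavoured regular expressions, the GNU sed manual (info sed) has a dedicated regexp chapter.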
Basically, I am provided with a file "temp.dat" with 30 high temperatures (integers) in it. The program is supposed to read them in and compute/print the average. Then it is supposed to print the temperature of each day and, in addition, display a + by each day that is over the average, but only if it is above the average high for three or more consecutive days. This is the part I am stuck on. I'd appreciate any tips that would point me in the right direction. Full disclosure: this is a school project.
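Without giving the whole assignment away, here is a sketch of just the consecutive-run logic in awk, assuming temp.dat holds one integer per line; the two-pass idea ports to whatever language the class uses.

Code:
awk '{ t[NR] = $1; sum += $1 }
     END {
         avg = sum / NR
         printf "Average: %.1f\n", avg
         # Pass 1: r[i] = length of the above-average run ending at day i.
         for (i = 1; i <= NR; i++)
             r[i] = (t[i] > avg) ? r[i-1] + 1 : 0
         # Pass 2 (backward): best[i] = full length of the run day i
         # belongs to, so every day of a 3+ day run gets the mark.
         for (i = NR; i >= 1; i--)
             best[i] = (r[i] > 0) ? (best[i+1] > r[i] ? best[i+1] : r[i]) : 0
         for (i = 1; i <= NR; i++)
             printf "Day %2d: %3d %s\n", i, t[i], ((best[i] >= 3) ? "+" : "")
     }' temp.dat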
Each line represents a portion of a data matrix. I want to convert the numbers after the "=" to the range of that partition in the matrix such that the output file looks like this:
Code:
15 for(i = 0; i < N; i++)

I want to replace "i" with "test" in the line above, whose line number is 15. When I tried the command :15s/i/test/, line 15 turned into for(test = 0; i < N; i++). It only replaced the first "i", but I want to change all the "i"s on line 15.
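In vim, the missing piece is the g (global) flag, which makes the substitution hit every match on the line; \< and \> word boundaries also keep it from landing inside longer identifiers:

Code:
:15s/\<i\>/test/g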
I'm trying to use grep to find the words in the dictionary that contain the letters "th" and the letter m.
I tried grep 'th m*.' Desktop/Dictionary/words (that's where the dictionary word file is located).
grep 'th' Desktop/Dictionary/words works, but only finds the words with "th". I have no idea what expression to use to combine it with the "m" requirement.
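grep has no AND operator inside a single basic pattern, but chaining two greps intersects the results; a sketch:

Code:
# The first grep keeps words containing "th"; the second keeps only
# those that also contain an "m".
grep 'th' Desktop/Dictionary/words | grep 'm'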
I want to remove the words "Max" and "constrained" in the file given below:
Code:
Max 0.003745 constrained
Max 0.004549 constrained
Max 0.001689 constrained
...
and further I want to replace "Max" with the line number so that I can plot the resulting file. I searched the forum but couldn't work out how to do what I wanted.
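A sketch in awk, assuming the file looks like the block above (data.txt and plot.dat are placeholder names): NR supplies the line number in place of "Max".

Code:
# Print "line-number value" pairs, dropping the words Max and constrained.
awk '$1 == "Max" { print NR, $2 }' data.txt > plot.dat

The result plots directly, e.g. with gnuplot's plot 'plot.dat' with lines.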
I have a folder with many, many files: e.g. HTML, docs, Excel sheets, scripts, etc. Now I want to find (using the grep command) a certain word in that folder/directory and delete it in all the files and scripts that have it.
For example, I want to delete the word /testing (with the slash) in all files in a directory.
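A sketch, assuming the files are plain text (in-place sed would corrupt binary formats like Excel sheets):

Code:
# -rlZ prints NUL-separated names of matching files; xargs -0 feeds them
# to sed, which deletes the word; | as the delimiter avoids escaping the /.
grep -rlZ -- '/testing' . | xargs -0 sed -i 's|/testing||g'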