Programming :: Splitting Text File Into Each Duplicate
Dec 10, 2010
I have a text file that is filled with references to duplicate files. I'm trying to create a text file for each duplicate file found that contains the paths to the duplicates. I would also like the text file names to be based on the size and file name.
Some thing like:
231.5 KB - P&S.doc.txt
138.5 KB - LIMITED#C71.doc.txt
Code:
NamePathSizeLast ChangeLast AccessFile TypeOwnerAttributes
P&S.doc(3 Files)
P&S.docZ:Leg\_Pri_LegPurP&SBUYBarry V231.5 KB11/2/2001 4:07 PM11/22/2010 2:38 AM.doc (Microsoft Office Word 97 - 2003 Document)Lou_AC
P&S.docZ:Leg\_Pri_LegP&SBUYBarry V231.5 KB11/2/2001 4:07 PM11/22/2010 2:38 AM.doc (Microsoft Office Word 97 - 2003 Document)DMsC
P&S.docZ:Leg\_Pri_LegPropsPurP&SBUYBarry V231.5 KB11/2/2001 4:07 PM11/22/2010 2:38 AM.doc (Microsoft Office Word 97 - 2003 Document)DMsC
LIMITED#C71.doc(2 Files)
LIMITED#C71.docZ:Leg\_Pri_LegPurCV138.5 KB12/15/2003 1:04 PM11/22/2010 2:38 AM.doc (Microsoft Office Word 97 - 2003 Document)Lou_AC
LIMITED#C71.docZ:Leg\_Pri_LegPropsPurCV138.5 KB12/15/2003 1:04 PM11/22/2010 2:38 AM.doc (Microsoft Office Word 97 - 2003 Document)DMsC
ps revised.8.30.05.clean.doc(3 Files)
ps revised.8.30.05.clean.docZ:Leg\_Pri_LegPropsPurP&SSellVPSummit54.5 KB8/31/2005 11:46 AM11/22/2010 2:38 AM.doc (Microsoft Office Word 97 - 2003 Document)DMsC
ps revised.8.30.05.clean.docZ:Leg\_Pri_LegP&SSellVPSummit54.5 KB8/31/2005 11:46 AM11/22/2010 2:38 AM.doc (Microsoft Office Word 97 - 2003 Document)DMsC
ps revised.8.30.05.clean.docZ:Leg\_Pri_LegPurP&SSellVPSummit54.5 KB8/31/2005 11:46 AM11/22/2010 2:38 AM.doc (Microsoft Office Word 97 - 2003 Document)Lou_AC
Copy of 08 Lee All July Billing.xls(2 Files)
Copy of 08 Lee All July Billing.xlsZ:IS\_Sh_ISDevDocDocl 26 upgradeAS6 backup codeAPImport131.5 KB7/30/2010 12:11 PM11/22/2010 2:38 AM.xls (Microsoft Office Excel 97-2003 Worksheet)AdministratorsC
Copy of 08 Lee All July Billing.xlsZ:APKellie131.5 KB7/30/2010 10:03 AM11/22/2010 2:38 AM.xls (Microsoft Office Excel 97-2003 Worksheet)KellieC
View 5 Replies
ADVERTISEMENT
Nov 11, 2010
What I plan to do is, create a duplicate file of a text file, and then append some text into the new text file.
View 1 Replies
View Related
Apr 1, 2009
Was wondering if any perl guru's could help me with a quick log file adjustment. I have a text file that looks like so (tabs and newlines are revealed so you can see what separates the data):
There are maybe 100 lines of text in this file at any given time. I need to delete all duplicate lines only looking at the first bit of text prior to the first tab. It doesn't matter which one gets deleted as long as there are no two lines that begin with that same text at the beginning before the first tab. So in this example, either the fist line "1234" or the last line "1234" would need to be deleted. I already have code in my script that opens the files - I just need the code to read the text into an array and the part that would find matches based on the above criteria, and make the deletions.
If it would be easier, I can even do a system call and use SED (v4.1.5) and/or AWK (3.1.5) instead.
View 7 Replies
View Related
Sep 21, 2009
I have a utility that works with files. The utility is crashing at after about 120 files. The input to the utility is a file containing a filelist. I want to cut the file with the file names in it to seperate files containing about one hundred or so. My thought was to determine the number of lines/100 and then use head and delete to create temporary files to run the utility multiple times to prevent the crash. When I tried to create a variable using the wc -l command the output gives me the number of total lines but it also includes the filename of the input file. (873 Filename.txt) I can not figure out how to remove the Filename.txt from the variable.
View 2 Replies
View Related
Jul 27, 2010
I am splitting a file based on the values read from an input file. The below one is the script.
1)How do I add the header which is present in the original file to the new split files created?(For eg. pharmacyf conatins header as table column names. The new files created (ODS.POS.$pharmacyid.$tablename.$CURRENT_DATE.dat) are without the header).
2) Also the script is creating 0 byte files for the pharmacyids which are not available in the intial file? Can this be avoided?
for pharmacyf in *
do
tablename=`echo $pharmacyf |cut -f4 -d'.' `
while read pharmacyid
do
grep -w $pharmacyid $pharmacyf >> $OUT/ODS.POS.$pharmacyid.$tablename.$CURRENT_DATE.dat
done< inputfile
done
View 2 Replies
View Related
Dec 16, 2010
Contained within each of these 67 text files is about 1 million urls. Yes. I have 67 text files that contain 1 million lines of urls each. I am sure I am swimming in duplicates. I tried opening one text file and clicking sort ----->remove duplicates. Now Gedit is not responding my processor is maxed out to 100% and I think I am finally ready to delve into some command line code. Can anyone give me idiot proof instructions on how to sort the duplicates out of each one of these 67 text files? How about no duplicates across all 67?
View 7 Replies
View Related
Jul 22, 2011
I am basically trying to remove duplicate words in my <title></title> tag after I got hit by Google Panda. I have around 750 .html files and it will be difficult for to me remove one by one. I am looking for a way to remove only from within <title> </title>
Example of a duplicate title I have:
Code:
<title>Pasta, Pasta Recipe and Pasta Guide</title>
I dont want to replace those words anywhere else in the file except for within the <title>
View 14 Replies
View Related
Apr 14, 2010
i have a big file of random numbers i generated at some point in time, after working with it with different things(how fun that was)... i want to remove duplicate lines and i'm not sure i'm doing this right
heres the command
Code:
sort random.txt | uniq -u > rand-shorter.txt
the file is pretty big, everything on a new line. i found the command on a web site so i'm sure its correct(bit of a command line in linux newbie)
can anyone confirm if this will remove lines duplicate lines (keeping one copy) and dump what is left in a file named rand-shorter.txt?
EDIT: i think its actually working, just taking a reallllly long time (on an old pen 4 from 2000)
View 8 Replies
View Related
Mar 17, 2011
Trying to remove lines from a syslog text file that have duplicate strings
Mar 10 06:51:11[http-8080-1] INFO com.MYCOMPANY.webservices.userservice.web.UserServiceController [u:2533274802474744|360] Authorize [platformI$tformIdAndOs=2533274802474744|360, userRegion=America|360]
then a few lines down
Mar 10 06:52:03 [http-8080-1] INFO com.MYCOMPANY.webservices.userservice.web.UserServiceController [u:2533274802474744|360] Authorize [platformI$tformIdAndOs=2533274802474744|360, userRegion=America|360
got the same thing in terms of a u: number but the issue is I need to remove duplicates and just leave one and the file has multiple duplicates of different u: numbers and it's 14,000 lines long. can anyone tell me if I can use awk? sed? or sort for something like this to? removing lines that have a certain string in there that's a duplicate.
View 4 Replies
View Related
Jan 28, 2009
I have a text file called file1.txt containing many lines eg.
line1
line2
line3
line4
line5
line6
Then i have another text file called file2.txt contains
3
5
6
Is there a command to remove the lines in file1.txt based on the keywords in file2.txt? note: It should remove line3,line5,line6 based on 3,5,6
View 10 Replies
View Related
Aug 14, 2011
I have a text file with many pairs of number, one pair in each line. Each 25 of these pairs are a solution to a math problem I've been working on, and each solution is separated from another by a line with "**********".The problem is that there are duplicate solutions. In order to know exactly how many solutions I found, I have to delete the duplicate ones. How can I do that?Just to make things clear, here are the first three solutions:
1 1
3 2
5 3
[code]....
View 3 Replies
View Related
Sep 7, 2010
I am facing a problem while splitting a text file, I need to split a file into some parts and each split file should have 2000 lines, when I do it through "split" command the mother file is kept intact but as per my requirement I need to cut mother file into some parts thus it should not be kept intact.
Example:
file size
motherfile.txt 5000 lines.
after split
motherfile.txt 2000 lines.
childfile1.txt 2000 lines.
childfile2.txt 1000 lines.
View 7 Replies
View Related
Jan 19, 2009
I need to insert 3-4 lines of text to the beginning of a text file. The file is a largish MYSQL dump, the result of a backup shell script. This shell script should insert the required text.I've wrestled with sed, but lost.
View 2 Replies
View Related
Jan 13, 2010
I have to delete a certain line of text from the a textfile via ubuntu's shell scripting.I have done research, and it seems that most people advocate the usage of sed /d option. sed makes does not edit the text file. Hence, most options I discovered involved the use of a temporary variable/textfile and then overwriting the old file with the temporary new file. Is there anyway whereby I can bypass the use of temporary storage containers? I hope there is any magical combination of commands to edit the file directly.
View 3 Replies
View Related
Jan 8, 2011
I want to display something in my text view widget in glade using c code. that's all right.
now I need to attach a save button beneath the text view.so that on click the text view content should save as a txt file..
View 8 Replies
View Related
Feb 9, 2011
I want to display the contents of a particular log file (simple text file, I mean in Linux). But there is a problem: The contents need to be organized in a fixed format. Have a look at this log file:
sampleLog.txt
Code:
User Name: XYZ
Reported Problems Description: Blah! Blah! Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah!
[code]....
So, while displaying the contents of above file on a web page, I want to format the field names found in the log file: User Name:, Reported Problems Description:, and Remarks:. These fields may contain a variable length of text and no specific line number is assumed for them to appear on.
The desired output should look like this:
User Name: XYZ
Reported Problems Description: Blah! Blah! Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah!
[code]....
Well, what I am trying to do may sound wierd to some of you. The filed "Reported Problems Description:" can possible contain text which embeds colon (.
View 15 Replies
View Related
May 3, 2010
a sed command to add a text before line number in text file? I have text file with 500 lines, and i want to add 3 more lines with text after line 300, OR before line 302, isn't no problem.
View 16 Replies
View Related
Apr 1, 2010
I've a web page where I can search information in my database.Sometimes results of search operation are more of fifty.How can I split results in different web page as it happens for example in any search engine or operation
View 7 Replies
View Related
Oct 14, 2010
I am using gnu bash 3.2I need to split the string into array like
a=this_is_whole_world.file # split [_.]
I need to split this on _ and . some thing like this
a[0]=this
a[1]=is
a[2]=whole
a[3]=world
a[4]=file
preferable using bash regex. if not sed is also ok.
View 2 Replies
View Related
Apr 19, 2011
I have a .txt-file with ~50.000 lines of numbers, generated by a mathematics program. From this file, I need line ~ 1.100 to line ~16.000 (these lines are always the same btw, this may make the solution easier, dunno) to be copy/pasted to another file, where the lines ~500 to ~15.000 (also, every time the same) should be overwritten by the aforementioned lines...I haven't found or come up with anything that works yet, mostly I find solutions to copy everything from one file to another but I can't find something to specifically overwrite a part of a file with part of another.
View 4 Replies
View Related
Sep 15, 2010
What i am trying to do is i want to add numbers from 1 to 100. but that too using multiprocessing. So i made a c programme and using fork() command made two child processes. Now in one child process i am adding from 1 to 50. and in another i am adding 51 to 100. and then in the parent process adding the two results to get the final one. Now the result from the two function i am getting correctly. But after the wait() call the value returned is lost : See the programme below for reference
# include<stdio.h>
# include<unistd.h>
# include<sys/wait.h>
# include<stdlib.h>
[code].....
View 6 Replies
View Related
Apr 8, 2010
I'm trying to transfer a large .tgz file from a CentOS dedicated server to a linux webhost (unknown OS). The problem is the webhost will not allow a 1.1gb file to be uploaded, however it will allow the upload in 149MB chunks. I used the split command to segment my tgz into 7 segments under 150mb. I then uploaded all segments via FTP which worked. Then I tried to join the segments to create the original tgz. The join appears to work with no issues. However, when I try to extract the tgz it appears there is a problem, most, but not all files are extracted and there is this error message:
Code: gzip: stdin: Input/output error
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now It appears the join did not work and the tgz is slightly corrupt. What am I doing wrong? Here's the commands I'm using:
1. Create the original tgz on the dedicated server
Code: tar -czf mysite.tgz ./myfolder
2. Split the tgz into segments
Code: split -b 149m -d mysite.tgz seg
# using the -d switch so the segment files use a numerical suffix
# I now have these files:
seg00
seg01
seg02
seg03
seg04
seg05
seg06
seg07
3. Transfer segments to the other webhost using FTP
Code: # hand typing (not a script)
ftp ftp.mysite.com
myusername
mypassword
binary
cd somefolder
put seg00
put seg01
put seg02
# through to seg07
4. Join up the segments on the new webhost
Code: # this is in a .sh script file
cd /full/path/to/somefolder
cat seg* > mysite.tgz 5. Extract the new tgz
Code: # this is in a .sh script file
cd /full/path/to/somefolder
tar -xzf mysite.tgz
# the above error is now thrown.
That's it. What am I doing wrong that's causing the above error?
View 5 Replies
View Related
Mar 11, 2011
I have been working on this since 3 days but wasn't able to achieve what I want
I have a big text that has the following format:
Current max fieldLen for table1 (a):
Fld# Width MaxWidth ERR NAME
---- ------ -------------- ------ ---------
2: 80 38 *** field-name
3: 4 2 field-version
4: 40 7 field-value
[Code]....
View 7 Replies
View Related
Apr 28, 2011
I have a text field that is just list of servers and I need to add the word hostname in front of them... It must be brain fart but I can't think of how to do this. Basically I need this:
server1
server2
server3
To this:
hostname server1
hostname server2
hostname server3
(And I just mean simply the word "hostname")
View 2 Replies
View Related
Feb 10, 2011
I need a sed and renaming the text in file. we have this one:
Quote:
Originally Posted by nickname
nickname presents: $subject
Size: size
[code]....
View 2 Replies
View Related
Jun 30, 2010
I need to extract som text from a text file. The text is a test log with system info at the top and results further down. What I need is to add different tags with formatting before and after each line. I have prepared a template with html formatting, but the number of lines in the test log may be different from case to case, so I need to be able to add formatting tags by need. Can this be done using bash script, sed, awk, head, tail... ?
View 4 Replies
View Related
Dec 21, 2010
I have a file with 5000 lines. it is a list of books authors, series and titles. all lines start with the author names, than there is a dash (-) than the series name, a dash again and the title of the book.
The problem I encounter is that sometime there is a series, sometime not, and as I try to enter this list in a database, I wanted to create a cvs file to import into mysql.
ex:
The best would be to be able to add in the second line, a "space dash space" just after the author name, but how to make sure it does not do it to the first line as well.
If I could separate all line with 2 dash, (grep ?) then I would be able to do a simple replace, and change the single dash into two.
View 2 Replies
View Related
Jun 3, 2011
Code:
int main ()
{
[code]...
View 9 Replies
View Related
Aug 21, 2010
I have a plain text file with 360 lines of varying length text. How do I add a comma or other symbol to the end of each line so that I can convert the file to csv format that I can open in a spreadsheet (45 rows, 8 columns). That means each 8 lines of text forms 8 columns, with 45 rows.
View 9 Replies
View Related
Jun 23, 2011
Im trying to read a file in c++ and search for particular character for example if this is a list that I have:
Alice
Bob
David
[code]....
if the input is D, it should give David, if its B, gives bob. so in this case, meaning it reads the first character of every line. but if possible I want to make this dynamic so the user can specify which character position he is looking for, so in case he is looking for R as character index 3 in all lines, it should give Charlie. but the problem is, it does now recognize , besides, I do not know how to specify the character position in each line.
here is my code
Code:
#include <iostream>
#include <fstream>
#include <cstring>
[code]....
View 1 Replies
View Related