Software :: Perl - Statistics For Duplication File Contents?
Oct 23, 2009
I have a folder containing 20,0000 files (and there will be more later); each file contains a single URL. I am trying to compute statistics on the number of duplicated URLs (files with the same content) accumulated, say, from September to October, for each day. For example, if file A, created on Sep 1st, has the same content as file B, created on Sep 15th, then we add 1 to the number of duplicates for Sep 15th (not Sep 1st). Currently I have a way to do it (in Perl), as below:
(1) Read all files inside the folder and print each file's content and modification time into one big file in the format:
Code:
[URL] [month] [day]
(2) Sort the file by month, then by day.
(3) Then create two hashes: date and content.
Then read the URL from each line of the big file into the hash 'content' (key is the URL, value left undef). Whenever a newly read URL is already in the hash 'content', a duplicate is detected, so do '$date{$month.$day}++'. The algorithm should work but may take too long, so I am wondering if there is an easier way to do this besides hashes.
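The steps above can be sketched as a single pass over the sorted big file. This is only a rough illustration, not the poster's Perl: it still relies on an associative array (inside awk), assumes URLs contain no whitespace and that month and day are zero-padded numbers, and the function name count_dups_by_day is made up for the example.

```shell
# Hypothetical helper: reads "URL month day" lines (already sorted by date)
# on stdin and prints "month day count" for each day that repeated a URL.
count_dups_by_day() {
    awk '
        $1 in seen { dup[$2 " " $3]++; next }   # URL already seen on an earlier line
        { seen[$1] = 1 }                        # first sighting of this URL
        END { for (d in dup) print d, dup[d] }
    ' | sort
}
```

Usage would look like `sort -k2,2 -k3,3 bigfile | count_dups_by_day`, where bigfile is the file from step (1).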
Mar 3, 2010
I've checked the link, and it makes things clearer, but it doesn't include all the information, so I'll continue searching the internet. However, I have seen an example of creating an fd:
Code:
exec 5<&1
echo "TEST" >&5
exec 5>&-
According to the page, this was intended to create fd 5, redirect stdout to it, and then close it. I have the following questions:
- What exactly is the meaning of the second command? Is it to redirect the command's stdout ("TEST") to fd 5? And how can I see the contents of fd 5?
- In the first command, why is < used instead of >, and what is the difference between the two commands below, as described in the Redirection section of info bash?
It would be helpful if anyone could include a diagram of the file descriptors before and after each command is executed.
[Code]...
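A small sketch may help with the questions above. In bash, both `5<&1` and `5>&1` duplicate fd 1 onto fd 5; the direction of the arrow only decides which fd is assumed when the number is left out (0 for <, 1 for >). An fd has no "contents" of its own: it is just a handle on whatever stdout pointed to at the moment it was duplicated. The function name fd_demo is made up for the example.

```shell
# Hypothetical demo of duplicating stdout onto fd 5 and writing through it.
fd_demo() {
    exec 5>&1        # fd 5 becomes a duplicate of whatever stdout currently is
    echo "TEST" >&5  # echo's stdout is pointed at fd 5, so TEST goes there
    exec 5>&-        # close fd 5 again
}
```

Running `fd_demo > out.txt` should leave TEST in out.txt, because fd 5 was copied from stdout after the redirection to the file took effect.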
Oct 14, 2010
I want to be able to check the contents of a text file for a specific string and remove it from the file, from the command prompt. I would basically be searching through a number of files, and if a specific string is found I would like it removed automatically: pretty much a find and replace where the replacement is nothing. Has anyone got any ideas on how to do this? I already have the search part sorted; I just need to be able to remove the unwanted string from the multiple files.
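One possible approach, sketched under assumptions: the string is treated as literal text (not a regular expression), the files are plain text, and mktemp is available. The helper name remove_string is made up for the example.

```shell
# Hypothetical helper: remove every literal occurrence of a string from files.
# Usage: remove_string 'unwanted text' file1 file2 ...
remove_string() {
    needle=$1; shift
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        NEEDLE=$needle awk '
            BEGIN { s = ENVIRON["NEEDLE"]; n = length(s) }
            {
                line = $0; out = ""
                # cut out each occurrence of s, keeping the text around it
                while (n > 0 && (i = index(line, s)) > 0) {
                    out = out substr(line, 1, i - 1)
                    line = substr(line, i + n)
                }
                print out line
            }
        ' "$f" > "$tmp" && cat "$tmp" > "$f"   # rewrite in place, keeping permissions
        rm -f "$tmp"
    done
}
```

It can be combined with an existing search step, for example by passing the list of matching files as the arguments.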
Jul 12, 2010
We have some large files with sampling data in them. We don't want to delete these files, but we want to quickly overwrite each file with 0s and/or 1s and preserve the original file size.
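A minimal sketch of one way to do this, assuming /dev/zero exists and that head supports the -c option (GNU and BSD versions do). The helper name zero_fill is made up. Note that this rewrites the file with NUL bytes of the same length; it is not a secure-erase tool.

```shell
# Hypothetical helper: overwrite each file with zero bytes, keeping its size.
zero_fill() {
    for f in "$@"; do
        size=$(($(wc -c < "$f")))          # current size in bytes
        [ "$size" -gt 0 ] || continue      # nothing to do for empty files
        head -c "$size" /dev/zero > "$f"   # same number of NUL bytes back into the file
    done
}
```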
Jun 15, 2010
I would like to know how I can replace a string in one file with the complete contents of another file.
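A rough sketch of one way to do it with awk, assuming the string is literal text and both files are plain text. The helper name replace_with_file and its argument order are made up for the example; the result is printed to stdout rather than written back.

```shell
# Hypothetical helper: print TARGET with each occurrence of a literal string
# replaced by the full contents of another file.
# Usage: replace_with_file 'string' replacement_file target_file
replace_with_file() {
    PLACEHOLDER=$1 REPL_FILE=$2 awk '
        BEGIN {
            s = ENVIRON["PLACEHOLDER"]; n = length(s); f = ENVIRON["REPL_FILE"]
            # load the replacement file, joining its lines with newlines
            while ((getline l < f) > 0) { repl = repl sep l; sep = "\n" }
            close(f)
        }
        {
            line = $0; out = ""
            while (n > 0 && (i = index(line, s)) > 0) {
                out = out substr(line, 1, i - 1) repl
                line = substr(line, i + n)
            }
            print out line
        }
    ' "$3"
}
```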
Nov 17, 2008
I have two files, list1.cfg and list2.cfg. Both files contain different record details, like:
List1.cfg
NAME1:25:C:NAME LINE1:
NAME2:25:C:NAME LINE2:
CITY:25:C:City:
[code]....
Now I want to append the contents of list2.cfg to list1.cfg (this is possible using cat list2.cfg >> list1.cfg), but I want to check each record first: if a record in list2.cfg is already present in list1.cfg, then don't append it; otherwise append it.
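One way this could be sketched, assuming a "record" is a whole line and must match exactly: grep with -F (literal), -x (whole line), -v (invert) and -f (patterns from a file) prints only the lines of list2.cfg that are not already in list1.cfg. The helper name append_missing is made up.

```shell
# Hypothetical helper: append to TARGET the lines of SOURCE it does not have yet.
# Usage: append_missing list1.cfg list2.cfg
append_missing() {
    tmp=$(mktemp) || return 1
    # collect the new lines first, so TARGET is not read and written at once
    if grep -Fxv -f "$1" "$2" > "$tmp"; then
        cat "$tmp" >> "$1"
    fi
    rm -f "$tmp"
}
```

Lines that are repeated inside list2.cfg itself would still be appended more than once with this sketch.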
Sep 25, 2010
Someone once told me that you can pass a file to grep and use it to search the contents of another file. If that is the case, I'm not entirely sure why the following isn't working for me.
Code:
[root@LCENT01:~]#grep -i id_rsa.pub .ssh/authorized_keys
[root@LCENT01:~]#cat id_rsa.pub >> .ssh/authorized_keys
[root@LCENT01:~]#grep -i id_rsa.pub .ssh/authorized_keys
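For what it's worth, the commands above search authorized_keys for the text "id_rsa.pub" itself, not for the contents of that file. grep only reads its patterns from a file when given -f. A minimal sketch, with a made-up helper name key_in_file:

```shell
# Hypothetical helper: succeed if any line of KEYFILE occurs literally in AUTHFILE.
# Usage: key_in_file id_rsa.pub .ssh/authorized_keys
key_in_file() {
    grep -qF -f "$1" "$2"   # -f: take patterns from file, -F: literal, -q: quiet
}
```

Typical use: `key_in_file id_rsa.pub .ssh/authorized_keys || cat id_rsa.pub >> .ssh/authorized_keys`.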
Nov 4, 2009
I am trying to write a bash script. I have a text file called comp2.tmp which has a list of items in it.
example comp2.tmp
Code:
filename.pdf
filename2.zip
filename3.ttf
and so on
I have another text file called comp1.tmp which should have the same list of files in it, but does not look as pretty
example comp1.tmp
Code:
someothertext here ...... 10/30/2009 ...... filename.pdf
=========================------------------==============
othertextagain .......... 09/28/2008 ...... filename2.zip
========================------------------===============
bunchmoretext ........... 04/12/2005 ....... filename3.ttf
and so on
I would like to check whether the filenames listed in comp2.tmp exist in comp1.tmp.
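A minimal sketch of such a check, assuming one filename per line in comp2.tmp and that a plain substring match against comp1.tmp is good enough (so "filename.pdf" would also match a line containing "myfilename.pdf"). The helper name check_names is made up.

```shell
# Hypothetical helper: report which names from LIST appear anywhere in HAYSTACK.
# Usage: check_names comp2.tmp comp1.tmp
check_names() {
    while IFS= read -r name; do
        [ -n "$name" ] || continue             # skip blank lines
        if grep -qF -e "$name" "$2"; then      # literal match, safe for names starting with "-"
            echo "found: $name"
        else
            echo "missing: $name"
        fi
    done < "$1"
}
```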
Nov 11, 2010
This is my first post here. I have two issues related to grub2. I recently upgraded from ubuntu 8 to 10.04, and upgraded grub 0.98 to 1.98.
1- How do I prevent Grub from detecting one of my Windows installations, or how does grub detect Windows? I have 2 hdds that used to be backups of each other, but no longer are. I deleted one Windows installation by deleting most of its files, so I don't need it to be present in the Grub 2 menu. How do I delete this entry? Any suggestions?
2- For some reason grub2 only works correctly from sdb, even though I chose to install grub2 to sda too. It seems grub is trying to find its files on the same hard disk instead of on sdb, so it goes into grub rescue mode with the error "file not found". I have set my computer to boot from sdb for now, but I would like to learn how to make sda's grub work. It used to work before with grub 0.98.
Jun 9, 2010
I am looking for some source package which will convert plain text file to html file without using perl.
I mainly need to do this on an ARM platform, so if I get sources I can cross compile it.
May 6, 2010
Can't I simply take a working ubuntu hard drive and dd copy it onto another disk, then put that disk in the other pc and have it work?
This way I could configure a machine at home, take the disk with me and use it at a place with no internet access.
Jul 28, 2009
I have a script that I'm working on that updates a username in all the files called blah.inc for my framework. Since I host a bunch of these web apps, I need to do it to all of them, so I need to figure out how to update these files automagically without having to watch it call vim every time. Here's what I have so far:
Code:
This finds the files, but now I need to figure out how to do s/bob/fred/g on those files.
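Since the find command itself is missing from the post, the following is only a guess at the overall shape: find the blah.inc files under some root directory and run the substitution on each one with sed instead of vim. The helper name and its arguments are made up, and the old and new names are assumed to contain no "/" or regex metacharacters.

```shell
# Hypothetical helper: replace OLD with NEW in every blah.inc below ROOT.
# Usage: replace_in_inc_files /path/to/apps bob fred
replace_in_inc_files() {
    find "$1" -type f -name 'blah.inc' | while IFS= read -r f; do
        tmp=$(mktemp) || exit 1
        # write the edited copy first, then put it back over the original
        sed "s/$2/$3/g" "$f" > "$tmp" && cat "$tmp" > "$f"
        rm -f "$tmp"
    done
}
```

With GNU sed, the same effect is available in one step via `find ... -exec sed -i 's/bob/fred/g' {} +`.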
Mar 9, 2011
We're trying to take an existing system running multipathed SAN boot disks, duplicate that boot disk to another system volume, and boot a second system with all the same parts running as on the first system (think: cloning the system). However, multipathing is broken on the second system (I'm sure I'm forgetting something stupid here after the cloning).
Details: We have a freshly installed RHEL6 on IBM PPC. This is a boot off of a SAN volume with two paths. Installation went great: multipathing was auto-detected and used underneath the usual LVMs for the boot volumes. Looks and works great:
Code:
[root@goldimage dev]# df -k
Filesystem 1K-blocks Used Available Use% Mounted on
[code]....
Oct 13, 2010
I'm beginning to write a custom RTP implementation and want to test its resilience to UDP traffic. I've searched on the web and all the links I can find are for analysing actual traffic, not generating it or messing it up.
Does anyone know of any software (preferably free software) that will, for example, take actual UDP traffic and drop packets, duplicate some and make some arrive late/out of sequence?
Feb 27, 2011
I have a single file that contains multiple blocks of text, something like this:
Quote:
No. Time Source Destination Protocol Info
185 27712.068199 192.168.18.23 192.168.18.191 SMTP S: 250 2.1.5 Ok
No. Time Source Destination Protocol Info
186 27715.068293 192.168.0.50 192.168.5.2 TCP suncacao-jmxmp > 44693 [ACK] Seq=1 Ack=1 Win=64807 Len=1380
[Code].....
Apr 28, 2010
I have both ubuntu and kde-desktop installed, and after trying KDM, switched back to GDM which worked fine for a while (although KDE changed my other gnome settings like usplash and cursors which took a while to change back). A day later, though, when I start up my system, the screen flickers a few times then shows me a message that an x-server is already running, so hit no to try loading it on '0' again or yes to try another number.
If I hit no, it will briefly show me the KDE login screen refresh a few times and go back to the menu. If I hit yes, it refreshes about 7 times and finally shows the gnome login screen. This process takes a long time and I'm not sure is great for my screen.
Jul 29, 2010
In Ubuntu 10.04, there is a certain file that appears highlighted in terminal. When I try to cat the file, it says there is no such file or directory. How can I see what's in this file? Is this a symbolic link?
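A highlighted name that cat reports as "no such file or directory" is often a symbolic link whose target is missing, though that is only a guess from the description. A small sketch to check, assuming the readlink command is installed; the helper name describe_path is made up.

```shell
# Hypothetical helper: say whether a path is a symlink and where it points.
describe_path() {
    if [ -L "$1" ] && [ ! -e "$1" ]; then
        echo "broken symlink -> $(readlink "$1")"   # link exists, target does not
    elif [ -L "$1" ]; then
        echo "symlink -> $(readlink "$1")"
    elif [ -e "$1" ]; then
        echo "exists (not a symlink)"
    else
        echo "does not exist"
    fi
}
```

Running `ls -l` on the file shows the same information in the form "name -> target".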
Feb 24, 2010
How do I print out the contents of a file?
Mar 23, 2010
/root/.local/share/Trash/files/ I have a tar backup file in there and can't get rid of it. I've tried from root with Nautilus; the files disappear for a couple of seconds and then reappear.
Apr 30, 2010
EDIT: Solved. I'm trying to figure out how to delete this
Sep 1, 2011
I know that I can do something along the lines of
Code:
touch /var/www/index.html | echo "echo some contents" >> /var/www/index.html
but I would like to do this without having to specify the directory again with echo, and maybe even use line breaks / tabs in the echoed text. Does anyone know a neat one-liner?
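A short sketch of one option: output redirection creates the file if it does not exist, so touch is not needed, and printf understands \n and \t in its format string. The helper name write_page and the HTML text are made up for the example.

```shell
# Hypothetical helper: create (or overwrite) a file with multi-line content.
# Usage: write_page /var/www/index.html
write_page() {
    # ">" creates or truncates the file; use ">>" instead to append
    printf '<html>\n\t<body>some contents</body>\n</html>\n' > "$1"
}
```

As a plain one-liner this is just `printf 'first line\n\tsecond line\n' > /var/www/index.html`.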
Oct 20, 2009
How do I list the contents of a zip file using C?
Jun 8, 2011
Is there any API to read the content of a PDF file and store it in a buffer?
Feb 1, 2011
I'd like to change contents in a *.tgz file. I can uncompress (extract) using:
Code:
tar xzf archiv.tgz
and I get these 3 directories:
Code:
etc
install
usr
How do I compress these back into a tgz file?
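The reverse of `tar xzf` is `tar czf`, followed by the archive name and the directories to pack. A minimal sketch, with a made-up wrapper name repack_tgz; it is meant to be run from the directory that contains etc, install and usr.

```shell
# Hypothetical wrapper: pack the given directories into a gzip-compressed tar.
# Usage: repack_tgz archiv.tgz etc install usr
repack_tgz() {
    archive=$1; shift
    tar czf "$archive" "$@"   # c = create, z = gzip, f = archive file name
}
```

`tar tzf archiv.tgz` lists the result without extracting it.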
Sep 30, 2010
I have created an incremental backup of a Windows-client folder on a SLES 11 server using find and tar. The resulting file is about 615 MB, but inside the archive there is only one file, which has a size of only 9.061 Bytes. BTW: it's a "The Bat!" config file. Here's the backup script:
Code:
Configuration
BACKUPFILE1="/<some_name>/Thunderbird"
[code]...
Jul 16, 2011
A long time ago I wrote a short essay about the 'federal' 'reserve' board. I don't remember its name or format. I think it's somewhere on my rather large hard discs (two of them, divided into various partitions). I'm trying to write a command line that will find it based on a quote that is in it: "our fathers brought forth". I have tried various configurations of grep, and/or combining grep with find, but I'm getting nowhere. I really don't understand the syntax of either command, or how they work together, and the examples that I can find are really no help at all.
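One way the two commands can work together, as a sketch: find walks the directories and hands the regular files to grep, which prints the names of files containing the phrase. This assumes the essay is stored as plain text and that the quote sits on a single line; it will not see inside compressed or binary document formats. The helper name find_phrase is made up.

```shell
# Hypothetical helper: list files under the given directories that contain a phrase.
# Usage: find_phrase 'our fathers brought forth' /home /mnt/data
find_phrase() {
    phrase=$1; shift
    # -l: print file names only, -i: ignore case, -F: literal text
    find "$@" -type f -exec grep -l -i -F -e "$phrase" {} +
}
```

Appending `2>/dev/null` to the call hides "permission denied" noise from unreadable directories.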
Jan 11, 2010
I am trying to read the contents of a file into something else. I have a file.txt that I am working with, I want to read the file and take the data and run some commands with the data that it read. So if it read www.yahoo.com I want to be able to nslookup. Does that make sense? I have been trying to use the read command but that does not seem to work. I even was trying to read filename | > filename to see if I could even read any of the data at all. Nothing is working.
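A minimal sketch of the usual pattern: the read builtin takes one line at a time from its standard input, so the file has to be redirected into a while loop around it. The helper name run_per_line is made up; it runs whatever command is given once per non-empty line.

```shell
# Hypothetical helper: run a command once for each non-empty line of a file.
# Usage: run_per_line file.txt nslookup
run_per_line() {
    file=$1; shift
    while IFS= read -r item; do
        [ -n "$item" ] || continue      # skip blank lines
        "$@" "$item" < /dev/null        # keep the command from eating the rest of the file
    done < "$file"
}
```

The file should end with a newline, otherwise read drops the last line in this simple form.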
Mar 3, 2010
I have an ISO CD image file and want to extract its contents to a folder. I know there are ways to mount the image and so on, but that's complicated. I'm looking for a GUI tool to open up the contents and extract the needed files. On Windows I would use WinRar to do this. K3B only allows me to burn the image, and Arch does not work with ISO files :( Is there a similar tool on Linux, preferably from the KDE world?
Apr 20, 2010
I have a .bkf backup file, created by the Backup utility that Microsoft provides with Windows XP. Is there a way to read the contents of the file using a non-Microsoft OS, preferably Mac OS X or Linux?
Nov 23, 2010
How do I make a .zip file that contains every file AND every folder in the directory?