Programming :: [Perl] Fail To Sort A File With 300,000 Lines By Multiple Columns?
Nov 10, 2009
Each line of the file I am sorting is in the following format:
<url> <month> <day>
For example:
[URL]
I wrote the following to sort:
Code:
#!/usr/bin/perl
$in = shift;
chomp($in);
[code]....
The script worked fine for my small test files, but failed on my input file. The input file is 18 MB and contains more than 300,000 lines. The output contains some lines like this:
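Since most of the script is elided above, here is only a sketch of one way the multi-column sort could be done in Perl; it assumes the month and day fields are numeric (a month name would need a lookup table first) and takes the <url> <month> <day> layout from the description:
Code:
#!/usr/bin/perl
use strict;
use warnings;

my $in = shift or die "Usage: $0 <file>\n";

open my $fh, '<', $in or die "Cannot open $in: $!";
my @lines = <$fh>;
close $fh;

# Schwartzian transform: split each line once, sort on month then day,
# then discard the split fields. This keeps the sort cheap even for a
# few hundred thousand lines.
my @sorted = map  { $_->[0] }
             sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] }
             map  { my @f = split ' ', $_; [ $_, $f[1], $f[2] ] }
             @lines;

print @sorted;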
The order of these lines is random... So I cannot delete line #19, for example... And you can see that the top four lines I want to delete are pairs. So there might be some clever way to detect the lines: if a line has both "1.9" and "1.11", then delete the line... I am new to the Perl language. The following is the code I have now... I think I just need to write some code inside the while loop that checks whether I want to delete the line $dotline before writing it to a NEW file.
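Assuming the rule really is "drop any line that mentions both 1.9 and 1.11 while copying everything else to a new file", the check inside the while loop can be two regex matches; the file names here are placeholders, and $dotline is the loop variable mentioned above:
Code:
#!/usr/bin/perl
use strict;
use warnings;

open my $in,  '<', 'input.txt' or die "Cannot open input.txt: $!";
open my $out, '>', 'NEW.txt'   or die "Cannot open NEW.txt: $!";

while ( my $dotline = <$in> ) {
    # Skip the line if it contains both version strings.  \Q...\E keeps
    # the dots literal, so something like "129" would not match by accident.
    next if $dotline =~ /\Q1.9\E/ && $dotline =~ /\Q1.11\E/;
    print {$out} $dotline;
}

close $in;
close $out;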
I'm trying to split a text file into various parts. Everything in between "123" and "break" (including linebreaks) goes into the splitted file.
e.g. using this text file:
This should split into 4 files. However, I'm only getting 2 files: one for the line "123break" and one for "123 blah break". The two occurrences that contain linebreaks are being ignored. The .* part of my match should capture linebreaks, seeing that I'm using the /s modifier, shouldn't it? Even when I use the match /(123 break)/gs it still doesn't capture the first occurrence. I'm using Perl v5.12.3 (from ActiveState) on Windows XP. The text file is also in Windows format.
Code listed below.
The above code generates two files Output_1.txt and Output_2.txt which contain "123break" and "123 blah break" respectively. I want it to generate four files.
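A common cause of this is reading the file line by line: even with /s, the regex only ever sees one line at a time, so blocks that span lines can never match. A sketch that slurps the whole file first; the non-greedy .*? is my addition so that one block does not swallow the next one:
Code:
#!/usr/bin/perl
use strict;
use warnings;

my $file = shift or die "Usage: $0 <file>\n";

# Slurp the whole file so the regex can see the linebreaks.
my $text = do {
    local $/;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    <$fh>;
};

my $n = 0;
# /s lets . match newlines; .*? is non-greedy, so each "123 ... break"
# block becomes its own match instead of one giant one.
while ( $text =~ /(123.*?break)/gs ) {
    $n++;
    open my $out, '>', "Output_$n.txt" or die "Cannot open Output_$n.txt: $!";
    print {$out} $1;
    close $out;
}

print "Wrote $n files\n";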
I have written a regular expression (tested in regexpal and regextester alpha something) with which I want to replace something like code...
but it only matches functions that occupy a single line, despite my tests showing multi-line matching in online JavaScript testers, and despite my using the m and s flags (which should make it multi-line, no?)
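Hard to say much without the actual pattern, but the same pitfall as above applies: /m and /s only help if the string handed to the regex really contains more than one line, so the file has to be slurped rather than processed line by line. A purely illustrative Perl sketch; the pattern and replacement are placeholders, not the ones from the post:
Code:
#!/usr/bin/perl
use strict;
use warnings;

my $file = shift or die "Usage: $0 <file>\n";

# Read the whole file into one string; otherwise a function body that
# spans several lines can never match, whatever flags are used.
my $code = do {
    local $/;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    <$fh>;
};

# Placeholder pattern: a simple "function name(...) { ... }" block with
# no nested braces.  Anything with nesting needs a real parser.
$code =~ s/function\s+\w+\s*\([^)]*\)\s*\{.*?\}/REPLACED/gs;

print $code;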
I am trying to read certain lines within a file and output the lines that don't equal my value; I think showing you would be easier. There are multiples of these inside one file...
Code:
LV Name                     /dev/vg00/lvol1
LV Status                   available/syncd
LV Size (Mbytes)            300
[code]....
I want to read everything in the file; if the status is not available, then it should display the name (directly above the status). If they are all available, then do nothing. I think I know how to do it, which involves putting the info in string form and placing it in a hash, but it is proving to be out of my skill range.
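One way that skips the hash entirely: remember the most recent "LV Name" and print it whenever the "LV Status" line that follows is not available. A sketch assuming the lvdisplay-style layout shown above, with the input file given on the command line:
Code:
#!/usr/bin/perl
use strict;
use warnings;

my $file = shift or die "Usage: $0 <lvdisplay output file>\n";
open my $fh, '<', $file or die "Cannot open $file: $!";

my $name;
while (<$fh>) {
    if (/^\s*LV Name\s+(\S+)/) {
        $name = $1;                    # remember the most recent LV Name
    }
    elsif (/^\s*LV Status\s+(\S+)/) {
        # Print the name only when the status is not "available/...".
        print "$name\n" if defined $name && $1 !~ /available/;
    }
}
close $fh;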
I've been trying to sort this out for several hours and I'm totally lost. I've been searching around, but haven't found the solution to my problem. I have a directory with 100 files. I need to copy 10 lines of each file (let's say from line 45 to 55) into one unique file. So I guess I could use sed's 'w' command, but I didn't manage to write the right script. I also tried using a loop to create 100 different files (each one with the 10 lines) to concatenate them later on. But I only got 1 file, not 100.
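Since this page is mostly Perl anyway, here is a sketch that does the same job without sed: append lines 45 through 55 of every file in a directory to one combined file. The directory and output names are placeholders:
Code:
#!/usr/bin/perl
use strict;
use warnings;

my $dir = shift or die "Usage: $0 <directory>\n";
open my $out, '>', 'combined.txt' or die "Cannot open combined.txt: $!";

for my $file ( glob("$dir/*") ) {
    next unless -f $file;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    while (<$fh>) {
        print {$out} $_ if $. >= 45 && $. <= 55;   # $. is the line number
        last if $. > 55;                           # skip the rest of the file
    }
    close $fh;
}
close $out;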
I have a very, very large log file (360MB) that I'm trying to thin out. As it turns out the majority of this file has entries that aren't necessary so I'm attempting to build a command that will strip these out. The following command works to display only the data that I do not want:
This displays exactly the data I want to delete from the file by displaying the expression and six lines above it and five lines below it. However I'm at a loss as to how to remove this data from the output and display everything else. I looked into the -v option with grep redirecting the output to a new file:
However, it doesn't work: the new file is the same size as the old one. What am I doing wrong? Is there a better method of doing this? I'm a bit out of my element, since the method I'd normally use can't handle files of this size.
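grep -v only removes the matching line itself; it never removes the six-lines-above/five-lines-below context, which is why the file barely shrinks. Dropping the context as well needs a small stateful filter, for example in Perl (the pattern is a placeholder for the real expression):
Code:
#!/usr/bin/perl
use strict;
use warnings;

my ($above, $below) = (6, 5);        # context lines to drop around each match
my $pattern = qr/UNWANTED/;          # placeholder for the real expression

my @buf;                             # the last $above lines not yet printed
my $skip = 0;                        # lines still to drop after a match

while (my $line = <>) {
    if ($line =~ $pattern) {
        @buf  = ();                  # discard the 6 lines above
        $skip = $below;              # and arrange to drop the 5 below
        next;                        # the matching line itself is dropped too
    }
    if ($skip > 0) {
        $skip--;
        next;
    }
    push @buf, $line;
    print shift @buf if @buf > $above;   # oldest line is now safe to emit
}
print @buf;                          # flush whatever is left at EOF

Run it as perl thin.pl big.log > thinned.log; it streams, so the 360 MB file is never held in memory.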
For example, I have a text file with data that lists numerical values from two separate individuals:
Code: Person A 100 200 300 400 500 600 700 800 900 1000 1100 1200
Person B 1200 1100 1000 900 800 700 600 500 400 300 200 100
How would I go about reading the values for each Person, then being able to perform mathematical equations for each Person (finding the sum for example)?
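Assuming the layout really is a "Person X" label followed by that person's numbers on the same line, a hash of person => list of values makes the per-person maths straightforward:
Code:
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

my %values;                           # person label => list of numbers

while (<>) {
    # Matches lines like "Person A 100 200 300 ..." as in the sample.
    if (/^Person\s+(\S+)\s+(.*)/) {
        push @{ $values{$1} }, split ' ', $2;
    }
}

for my $person (sort keys %values) {
    my @nums = @{ $values{$person} };
    printf "Person %s: sum = %s\n", $person, sum(@nums);
}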
I have a folder with only 24 files named <number>.dat (e.g. 4.dat, 6.dat and so on), where <number> is between 0 and 256. Each file has just two columns of data and nothing else.
I'm trying to combine all the second columns ($2) together. I've been fiddling around with getline and so far have
which takes file 4.dat and adds $2 from 6.dat, but I want a single command to take each $2 from every file and add them to (for example) 4.dat (having $1 from 4.dat is no problem). A command that takes every file in the folder, grabs $2, and places it in a common file would be ideal. Frankly, I can work around it if you combine both columns from every file.
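An alternative in Perl rather than awk/getline (only to stay with this page's language): keep $1 from the first *.dat file, collect $2 from every file, and print the columns side by side. It assumes every file has the same number of rows:
Code:
#!/usr/bin/perl
use strict;
use warnings;

# Numeric sort of the <number>.dat files found in the current folder.
my @nums = sort { $a <=> $b } map { /(\d+)\.dat$/ ? $1 : () } glob('*.dat');

my @rows;                                     # output rows, built column by column
for my $i (0 .. $#nums) {
    my $file = "$nums[$i].dat";
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $r = 0;
    while (<$fh>) {
        my ($c1, $c2) = split;
        push @{ $rows[$r] }, $c1 if $i == 0;  # column 1 from the first file only
        push @{ $rows[$r] }, $c2;             # column 2 from every file
        $r++;
    }
    close $fh;
}

print join(' ', @$_), "\n" for @rows;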
I've been hitting my head against a wall for a while with this one: as the last part of some data analysis I am performing, I would like to construct a matrix from a series of different files. These files have the format:
Is there a way to add line spaces when asking for user interaction in a script? For example:
Code:
SPACE
Hello what is your name?
SPACE
SPACE
So this is asking a question, but with a space/empty line at the top of the screen and 2 spaces/empty lines below. I've seen it done in a bash script using an echo for each line/space needed.
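If Perl happened to be an option here (to stay with this page's language), the blank lines are just extra newlines printed around the prompt; in bash the equivalent is one echo per blank line. A tiny sketch:
Code:
#!/usr/bin/perl
use strict;
use warnings;

print "\n";                              # one blank line above the prompt
print "Hello what is your name? ";
chomp( my $name = <STDIN> );
print "\n\n";                            # two blank lines below
print "Hello, $name\n";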
I've searched everywhere and I can't come up with a good solution. For each line I need to find the average, min, and max. I've seen plenty of solutions where the number of columns is fixed; unfortunately for me, these lines can get pretty large. My thought was to read each line individually into an array, then loop through the array and find the avg, min, and max that way, but I haven't had much luck. I can read each line using a while loop, but I'm having trouble with the array part; or perhaps that's not the best solution?
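If Perl is an option (to stay consistent with the rest of this page), the variable column count is not really a problem, because split produces however many fields each line has. A rough sketch with List::Util:
Code:
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min max sum);

while (my $line = <>) {
    my @nums = split ' ', $line;      # any number of columns per line
    next unless @nums;                # skip blank lines
    printf "line %d: avg=%.2f min=%s max=%s\n",
           $., sum(@nums) / @nums, min(@nums), max(@nums);
}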
What is the best way, in sed, awk, or Perl, to merge lines that occur between certain strings? I'm new to sed scripting and I have been working on this for some time now. I have a large file (sample below) that I need to edit.
What I need looks something like this.
I'm working with a very large file so simply merging all the lines then adding a new line character before ">contig" and after "translated" won't work, at least not with sed.
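Since the sample lines are not shown here, this is only a guess at the record structure: everything from a ">contig" header up to and including the line containing "translated" belongs to one record and should come out as a single line. A streaming Perl sketch that never holds more than one record in memory, so file size is not an issue:
Code:
#!/usr/bin/perl
use strict;
use warnings;

my @record;                                   # lines of the record being built

while (my $line = <>) {
    chomp $line;
    push @record, $line;
    if ($line =~ /translated/) {              # end of a record: emit it as one line
        print join(' ', @record), "\n";
        @record = ();
    }
}
print join(' ', @record), "\n" if @record;    # flush an unterminated last record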
I am new to Perl and am having trouble adding some strings together.
My full code is below:
The problem is that $NewCommandB is always split into two lines, where the second line contains the "/atlas2/<blah>/<etc>/..." string. Since I am generating a .sh file to execute a lot of similar commands, I need the string to all be on one line. Any idea why I get this behaviour, and any suggestion on how to tell Perl to make $NewCommandB a one-line string?
Btw for completeness finalFileList.txt contains just file names one line after another:
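The usual culprit for this is that each name read from finalFileList.txt still carries its trailing newline, so the newline lands in the middle of the concatenated command; a chomp right after reading the line normally fixes it. A hedged sketch (everything except $NewCommandB and finalFileList.txt is a placeholder):
Code:
#!/usr/bin/perl
use strict;
use warnings;

open my $list, '<', 'finalFileList.txt' or die "Cannot open finalFileList.txt: $!";
open my $sh,   '>', 'run_commands.sh'   or die "Cannot open run_commands.sh: $!";

while (my $file = <$list>) {
    chomp $file;   # strip the trailing newline, otherwise it splits the command
    # Placeholder command; the real /atlas2/... pieces come from the original post.
    my $NewCommandB = "some_tool --input $file --output $file.out";
    print {$sh} "$NewCommandB\n";
}

close $list;
close $sh;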
I have a dataset of around 1000 lines. Out of these 1000 lines I need to pick 160 lines of data at random and write them to a file. The program is needed to eliminate data bias when I run it through a reanalysis program. I am thinking I need to use rand or srand, but I am having difficulty writing this in Perl. I have to write it in Perl, because the rest of my scripts for this project are in Perl, so consistency is important. The data consists of only one column (YYYYMMDDHHHH).
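List::Util (a core module) has a shuffle function, which turns "160 random lines with no repeats" into a shuffle plus a slice; the file names below are placeholders:
Code:
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

open my $in, '<', 'dataset.txt' or die "Cannot open dataset.txt: $!";
my @lines = <$in>;
close $in;

# Shuffle the ~1000 lines and keep the first 160: a random sample
# without replacement, so no line is picked twice.
my @sample = ( shuffle(@lines) )[ 0 .. 159 ];

open my $out, '>', 'sample160.txt' or die "Cannot open sample160.txt: $!";
print {$out} @sample;
close $out;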
I'm writing a script and I have doubts on how to assign values to an already established variable. The value for the variable would be coming from a file with three columns. I'm using the awk command for this. Am I doing it correctly? Which of the following two ways is the better one, or, if both are wrong, which one should I use?
The goal is to have a single folder that has symlinks to all the files on each of the drives. Pretty much a poor man's JBOD. Previously, I had problems with conditions like 2 drives having the same subfolder contents, but I ended up solving that with the current script I'm using now. What I'm looking for now is speed. I'm very new to Perl, and the script takes about 12 minutes to complete with the current drives.
Basically, the script makes a list of all directories and files on each drive. First, it makes the directories. I didn't use any validation, because if a directory already exists, it simply won't make one. However, with the files, I used a hash to keep only the unique files. Then I use the key/value pairs with ln to create every link to the files only, not the directories.
Code:
#!/usr/bin/perl
use warnings;
my @drives_to_sync = qw( /mnt/sda/ /mnt/sdb/ /mnt/sdc/ /mnt/sdd/ );
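If the current version shells out to ln once per file, most of those 12 minutes are probably process startup: Perl's built-in symlink() creates the link without spawning anything. A sketch of just that part; the %files hash here is hypothetical (link to create => existing target), standing in for the unique-file hash the script already builds:
Code:
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical hash: link to create => existing file it should point to.
my %files = (
    '/mnt/jbod/movies/a.mkv' => '/mnt/sda/movies/a.mkv',
    '/mnt/jbod/movies/b.mkv' => '/mnt/sdb/movies/b.mkv',
);

for my $link (sort keys %files) {
    my $target = $files{$link};
    next if -l $link || -e $link;             # skip links that already exist
    symlink $target, $link                    # built-in, no external ln process
        or warn "Could not link $link -> $target: $!";
}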
I'm looking for a way in Perl to be able to take a list of servers, ssh multiple commands to it and store the results. If I do this process serially, sometimes one server will hang the whole script and if it doesn't, it still takes hours to complete.
I'm thinking what I need to do is make a parent loop that calls out a separate process that passes the server name to the child sub process and then executes all the commands I have defined in its own process. If one server 'hangs', at least that won't stop the script from doing all the other servers in the list.
I'm guessing using the fork() command would serve me best, however, all the online descriptions I have found have been vague at best.
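A bare-bones version of that parent/child idea with plain fork(): each child runs its ssh commands and writes to its own output file, and the parent just waits for them all, so one hung server only stalls its own child. Every name here (servers, commands, output files) is a placeholder; Parallel::ForkManager from CPAN wraps the same pattern more comfortably:
Code:
#!/usr/bin/perl
use strict;
use warnings;

my @servers  = qw( server1 server2 server3 );   # placeholder server list
my @commands = ( 'uptime', 'df -h' );           # placeholder commands

my @pids;
for my $server (@servers) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {                            # child: do one server, then exit
        open my $out, '>', "$server.out" or die "Cannot open $server.out: $!";
        for my $cmd (@commands) {
            # ConnectTimeout keeps a dead server from hanging this child forever.
            my $result = qx(ssh -o ConnectTimeout=10 $server $cmd 2>&1);
            print {$out} "== $cmd ==\n$result\n";
        }
        close $out;
        exit 0;
    }
    push @pids, $pid;                           # parent: remember the child
}

waitpid $_, 0 for @pids;                        # wait for every child to finish
print "All servers done; see the <server>.out files.\n";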
I'm writing a script to get statistical information about some texts. This includes calculating the mean of the word lengths. Unfortunately, umlauts count as two characters. In the example below the output is 9; it should be 6.
Sincerely, Max
Code:
#!/usr/bin/perl
use POSIX;
use locale;
my $test = length("ABCäöü");
print $test;
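length() is counting bytes here: the umlauts in the source are being treated as raw UTF-8 octets (two bytes each), which is where the 9 comes from. Telling perl that the source and the data are UTF-8 makes each umlaut a single character. A sketch assuming the script file itself is saved as UTF-8:
Code:
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                          # the source file is UTF-8, so "äöü" are 3 chars
use Encode qw(decode);

my $test = length("ABCäöü");
print "$test\n";                   # prints 6, not 9

# For data read from a file, decode the bytes instead:
my $bytes = "\xc3\xa4\xc3\xb6\xc3\xbc";          # "äöü" as raw UTF-8 bytes
print length( decode('UTF-8', $bytes) ), "\n";   # prints 3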
I have a list of words that I want to grep for in many files, to see which ones have them and which ones don't. In the text file I have all the words listed line by line, e.g. list.txt:
check
try this
word1
word2
open space
list
..
I want to grep each line one by one, like this:
grep "check" *.log
grep "try this" *.log
grep "word1" *.log
.. etc
How can I do this?