Programming :: Parse A File Containing Billions Of Records?
Nov 17, 2010
I have to parse a file containing billions of records and populate a data structure with them. I use a number of C++ classes and store the information retrieved from the file in objects of those classes.
Now that the file has become huge and the number of objects very large, my code fails with a bad_alloc error because no space is available on the heap to allocate new objects.
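One common way around this kind of memory exhaustion is to avoid keeping every record alive at once and instead process the file as a stream. The sketch below illustrates the idea in Python rather than C++. The comma-separated record format, the function names, and the "count by first field" aggregate are all assumptions made for illustration, since the question does not show the file layout.

```python
import io
from collections import Counter
from typing import Iterable, Iterator, TextIO


def iter_records(stream: TextIO) -> Iterator[list[str]]:
    """Yield one parsed record at a time; nothing is accumulated here."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            yield line.split(",")  # assumed format: comma-separated fields


def count_by_first_field(records: Iterable[list[str]]) -> Counter:
    """Example aggregate: only the running totals stay in memory."""
    totals: Counter = Counter()
    for fields in records:
        totals[fields[0]] += 1
    return totals


if __name__ == "__main__":
    # io.StringIO stands in for open("records.txt"), a hypothetical file name.
    sample = io.StringIO("a,1\nb,2\na,3\n")
    print(count_by_first_field(iter_records(sample)))
```

Memory use then depends on the size of the aggregate, not on the number of records. If every record really must be retained, this pattern does not help and an on-disk store is probably the better fit.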
I want to loop through the records in the file below (homedir.temp):
/home/user1
/home/user2
/home/user3
I want to do the following with each record:
1. du -s to get the total usage for that directory (my variable name is SIZE)
2. divide SIZE by the du -c total for /home to get the percentage of usage (my variable name is PER)
3. write the directory, SIZE, and PER to a file
PROBLEM
I am using the for loop below:
for record in homedir.temp
the mentioned activities
done
The above does not loop through the records. It does the first record perfectly and exits the loop.
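The loop as written iterates over the single word homedir.temp, not over the lines inside that file, which would explain why the body runs only once. A minimal sketch of the intended per-line loop, written in Python for illustration: the helper names and the output layout are assumptions, and `du -sk` is used so that sizes come back in kilobytes.

```python
import subprocess
from typing import Callable, Iterable


def du_kilobytes(path: str) -> int:
    """Return the size that `du -sk` reports for path, in kilobytes."""
    out = subprocess.run(
        ["du", "-sk", path], capture_output=True, text=True, check=True
    ).stdout
    return int(out.split()[0])


def usage_report(
    dirs: Iterable[str],
    total_kb: int,
    size_of: Callable[[str], int] = du_kilobytes,
) -> list[str]:
    """Build one 'directory SIZE PER' line per non-empty input line."""
    lines = []
    for directory in dirs:
        directory = directory.strip()
        if not directory:
            continue
        size = size_of(directory)        # SIZE in the question
        per = 100.0 * size / total_kb    # PER in the question
        lines.append(f"{directory} {size} {per:.2f}")
    return lines
```

A file object can be passed straight in as `dirs`, for example `usage_report(open("homedir.temp"), du_kilobytes("/home"))`, because iterating over a file yields its lines one at a time.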
I don't think this is a "perl one-liner" of find and replace. I'm trying to auto-fill some information in a listing of files. The simplest example is that in the files the following exists:
I would want the script to find this and populate it with something like -- Date : 20101004-1758
I have a few more similar fields to autofill, and I'd like to do this from within a larger Perl script I'm developing to process these files. So, how do I perform in-place file modification from within a Perl script?
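A general pattern for in-place modification is to write the changed content to a temporary file in the same directory and then rename it over the original. The sketch below shows that pattern in Python, not Perl. The empty `-- Date :` placeholder is an assumption inferred from the desired result, since the original example line is not shown, and the function names are hypothetical.

```python
import os
import re
import tempfile

# Assumed placeholder: a line holding only "-- Date :" with no value yet.
DATE_FIELD = re.compile(r"^(\s*-- Date :)\s*$")


def fill_date(line: str, stamp: str) -> str:
    """Return the line with the date filled in, or unchanged if no match."""
    match = DATE_FIELD.match(line.rstrip("\n"))
    if match:
        return f"{match.group(1)} {stamp}\n"
    return line


def rewrite_in_place(path: str, stamp: str) -> None:
    """Write the modified copy next to the file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path) as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as tmp:
        for line in src:
            tmp.write(fill_date(line, stamp))
    os.replace(tmp.name, path)  # rename over the original file
```

A stamp in the format from the question can be produced with `datetime.now().strftime("%Y%m%d-%H%M")`. Note that the replacement file gets the temporary file's permissions, so those may need to be copied across if they matter.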
I am trying to write a Perl script which reads data from a file and parses it. The data in the file has the following syntax:
Code:
Device Physical Name : Not Visible
Device Symmetrix Name : 1234
Device Serial ID : N/A
Attached BCV Device : N/A
Device Capacity
Each unique record starts with "Device Physical Name", so a record is the set of lines from one "Device Physical Name" up to the next "Device Physical Name". Of course the field separator (FS) is ":", and I just want to print the info or later put it in a CSV file.
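The record grouping described above can be sketched as follows, in Python for illustration. It starts a new record whenever the "Device Physical Name" key appears and splits each line on the first ":". The function names are hypothetical, and lines without a colon are simply skipped, which is an assumption about how to treat them.

```python
import csv
import io
from typing import Iterable, Iterator

RECORD_START = "Device Physical Name"


def iter_device_records(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Group 'key : value' lines into one dict per device record."""
    record: dict[str, str] = {}
    for line in lines:
        if ":" not in line:
            continue  # assumption: ignore lines that carry no field
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == RECORD_START and record:
            yield record  # the previous record is complete
            record = {}
        record[key] = value
    if record:
        yield record  # last record in the file


def to_csv(records: Iterable[dict[str, str]], fieldnames: list[str]) -> str:
    """Render the chosen fields as CSV text; missing fields become empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()
```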
I am trying to work out the logic for a file that contains data I have to read and process. The issue is that the file contains the same set of data multiple times. For example:
:::::::::::
var1=value1
var2=value2
I have to read the first paragraph of variables, do some processing, and then move on to the next paragraph until the end of the file. The variable names are the same throughout the file, but the values differ in each paragraph. I can't think of the logic to do this. How can I do it? It should be a simple bash script, but I am not able to work it out.
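One possible logic, sketched in Python, not bash: collect `name=value` lines into a dictionary and hand that dictionary over for processing each time a separator line is reached. It assumes every paragraph is introduced by a line made up only of colons; the function name is hypothetical.

```python
from typing import Iterable, Iterator


def iter_blocks(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield one {name: value} dict per paragraph of var=value lines."""
    block: dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if line and set(line) == {":"}:  # separator line such as ":::::::::::"
            if block:
                yield block
            block = {}
        elif "=" in line:
            name, _, value = line.partition("=")
            block[name.strip()] = value.strip()
    if block:
        yield block  # paragraph that runs to the end of the file
```

Each yielded dictionary holds the values for one paragraph, so the per-paragraph processing can go in the body of a `for block in iter_blocks(open(path)):` loop.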
I am using SQLite as my database for some portable cross platform applications I am working on with REALBasic as my IDE. I have an old Sybase 8.0 database that I can access via Microsoft Access and thereby extract the data I need from each table.
Now I know I can create .csv files from each table and load them into SQLite using the import tool, but then I can't define the primary key and other field attributes. So the other option is to load each file via SQL.
Now with most SQL editors I can create multiple queries and they will run just fine. But I can't seem to do that with the SQLite interfaces. I can paste multiple queries but I can only run one at a time. And by that I mean I have to click run.
Ummm that's not acceptable since my biggest table contains over 600,000 records. I have the queries all written, that was easy using a simple interface I wrote in Access.
Code: INSERT INTO tblMeters(recordId,meterId,meterName,meterSerNum,registerSerNum,mxuSerNum,meterType,manufacture,meterModel,readType,groupId,multiplier,rollover,vendorId,xfrmerCode,bldgCode,CATEGORY,energyType,unitOfMeasure,location,access,comments,dateInstalled,dateCalibrate,pipeSizeIn,pipeSizeOut,elecMeterSpecs)
So is there another method I can use? I can't seem to find anything relating to my particular question on the SQLite web site.
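One option outside the GUI tools is to run the whole batch of statements programmatically. The sketch below uses Python's standard `sqlite3` module, whose `executescript` method runs many semicolon-separated statements in one call. Wrapping them in a single transaction is a choice made here for speed. The `load_sql` name is hypothetical. The sqlite3 command-line shell can also run a file of statements with its `.read` command.

```python
import sqlite3


def load_sql(conn: sqlite3.Connection, script: str) -> None:
    """Run many semicolon-separated statements inside one transaction."""
    script = script.strip()
    if not script.endswith(";"):
        script += ";"
    # One transaction for the whole batch avoids a commit per INSERT.
    conn.executescript("BEGIN;\n" + script + "\nCOMMIT;")
```

With a file of generated INSERT statements, usage would look like `load_sql(sqlite3.connect("meters.db"), open("inserts.sql").read())`, where both file names are placeholders. Because the table can be created with an ordinary CREATE TABLE statement first, the primary key and other column attributes stay under your control.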
I am trying this query to compare records of two different tables, but I am getting this message: no required output. The values for these ($jobTitle $industry $stationBase $gender $maritalStatus) are coming from textboxes. Here is the code...
I need to parse a file of the same name that exists on different servers and count how often a string occurs in both files. Say a file abc.log exists on 2 servers. I want to search for the string "test" in both files and calculate the total number of occurrences. For example, if abc.log on server 1 has the string "test" 2 times and abc.log on server 2 has it 4 times, then the output will be:
StringName : Count
example
test : 6 times
Note: I have set up passwordless connectivity using ssh-keygen.
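A minimal sketch of the summing step, in Python. It assumes passwordless ssh is already working, as the note says, and that the remote `grep` supports `-o` (print each match on its own line), which GNU and BSD grep do. The function names are hypothetical, and the output format follows the example in the question.

```python
import shlex
import subprocess
from typing import Callable, Iterable


def remote_count(host: str, path: str, needle: str) -> int:
    """Count occurrences of needle in path on host, using grep over ssh."""
    # -o prints every match on its own line, -F treats needle as a fixed string.
    command = f"grep -oF -- {shlex.quote(needle)} {shlex.quote(path)}"
    result = subprocess.run(["ssh", host, command], capture_output=True, text=True)
    if result.returncode > 1:  # grep exits 1 for "no match"; higher means failure
        raise RuntimeError(result.stderr.strip())
    return len(result.stdout.splitlines())


def total_count(
    hosts: Iterable[str],
    path: str,
    needle: str,
    count_on: Callable[[str, str, str], int] = remote_count,
) -> int:
    """Add up the per-host counts."""
    return sum(count_on(host, path, needle) for host in hosts)


def format_result(needle: str, total: int) -> str:
    return f"{needle} : {total} times"
```

Counting matches and not matching lines matters when the string can appear more than once on a line; `grep -c` would count lines only.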
I'm using MYTHTV with AT&T's U-Verse system. AT&T apparently turns off the set-top box if the box doesn't receive any commands after some number of hours. When this happens the box takes so long to turn on that it doesn't process the lirc IR channel change command. I've rewritten my channel changer to add a lockfile, and added a cron entry to kick off an entry to just "ping" the set-top box once an hour. But, OK, I'd like to play the bandwidth saving game and not do this if there isn't anything in the "record" table.
So, how would I write a script that will check the number of rows in the "record" table in the "mythconverg" database and exit if there are zero rows? I'm afraid I don't know how to even start this. Here is my "keepalive.sh" script that does the pinging:
Code:
#!/bin/sh
#if there are no rows in the record table, just exit this script
# lock the lockfile - MUST be same one as channel lock
while [ `lockfile "/tmp/mythchanlock.lck"` ]
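The row-count check could be sketched like this, in Python, not sh. It shells out to the `mysql` command-line client with `-N` (no column names) and `-B` (batch output) and assumes the client can log in without prompting, for example through credentials in `~/.my.cnf`. The `record_count` name and the injectable `run` parameter are illustrative.

```python
import subprocess


def record_count(run=subprocess.run) -> int:
    """Return the number of rows in the mythconverg 'record' table."""
    result = run(
        ["mysql", "-N", "-B", "-e", "SELECT COUNT(*) FROM record", "mythconverg"],
        capture_output=True,
        text=True,
        check=True,  # raise if the mysql client reports an error
    )
    return int(result.stdout.strip())
```

A keepalive script would call `record_count()` first and exit straight away when it returns 0, and only otherwise go on to take the lockfile and send the IR command.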
I am no expert when it comes to BIND. I seem to be able to resolve NS and A and TXT records for my domain, but I cannot get the MX records to come out. Does anyone have an idea what might be wrong with my BIND zone file? I wonder if it might have something to do with the fact that my IP is currently on a policy Block List?
What is the best command to use to parse strings? I have a variable $str and need to parse this string. Can you provide an example of the command used to get a substring of $str based on the start and end index values?
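The question does not say which language `$str` lives in, so as a language-neutral illustration here is the operation in Python, where slicing takes a zero-based start index and an exclusive end index. Whether the asker's end index is meant to be inclusive is not stated, so the exclusive convention is an assumption.

```python
def substring(text: str, start: int, end: int) -> str:
    """Return the characters from index start up to, but not including, end."""
    return text[start:end]


if __name__ == "__main__":
    print(substring("hello world", 6, 11))
```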
My PHP knowledge is very poor (I have only worked with strings so far), and I am faced with a task that is a real challenge for me: I have a variable that contains data of different types, in this order: byte, byte, string, string, string, string, short, byte, byte, byte, byte, byte, byte, byte, string.
Strings are of variable length. How could this data be parsed into variables of the right type, and then all converted to strings? What are the functions to use? Strings are unicode ones, and they are delimited by "
I would like a program that records my desktop activity as a video file. Do you know a nice program to do that? It would also be nice to insert this video into OpenOffice or Microsoft PowerPoint afterward. Do you know which video format guarantees the greatest interoperability (the ability of the video to play on different platforms)?
The important bits are the hostname (e.g. compute-1-1) and the number of CPUs to use (e.g. 2). This program wants them in this form, as a shell variable: HOSTLIST=hostname:cpus=X hostname:cpus=X .... I've tried this script, but it doesn't work
I have a basic HTML file set up which allows me to input some data. I have a MySQL database set up behind the scenes with a table for this information to go in. When I click the submit button in my HTML file, this PHP file opens and comes up with "Parse error: parse error in C:\wamp\www\comp39x\insert_student.php on line 32". Here is the code I am using in the PHP file:
<html> <head> <title>COMP 39x FInal Year Project - Student Added?</title>
I'm trying to figure out how I can get a request count per CIDR/24 from an apache log in combined format - e.g.: Code: 188.8.131.52 - frank [10/Oct/2011:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "[URL] "Mozilla/4.08 [en] (Win98; I ;Nav)"
I'm stuck using BASH for this. I generally write everything in Python, or even Perl if necessary, so BASH isn't the most comfortable for me. I've got enough to extract the IPs and get a count; I just need a slick way to come up with a sum on a per-CIDR basis.
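For comparison with whatever the BASH version ends up looking like, here is a minimal Python sketch of the per-/24 sum. It assumes the client address is the first whitespace-separated field, as in the combined log format, and it skips anything that is not an IPv4 address, such as resolved hostnames. The function name is hypothetical.

```python
import ipaddress
from collections import Counter
from typing import Iterable


def count_per_slash24(lines: Iterable[str]) -> Counter:
    """Count requests per IPv4 /24 network, keyed like '10.0.0.0/24'."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split(maxsplit=1)
        if not fields:
            continue  # blank line
        try:
            # strict=False masks off the host bits of the address.
            network = ipaddress.IPv4Network(f"{fields[0]}/24", strict=False)
        except ValueError:
            continue  # first field is not an IPv4 address
        counts[str(network)] += 1
    return counts
```

`count_per_slash24(open("access.log")).most_common()` would then list the networks from busiest to quietest, with `access.log` standing in for the real log path.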
There is a file with each model's information in that format. I don't think it's a good format, but I cannot change it. I need to change the model name from 'model = xxx' to 'model = abc'. I don't know how to parse and modify 'model = iii' and 'model = ddd'. The only clue for finding 'model = ddd' is that it is the second 'model = ' after the second 'system information'. But how do I parse the second keyword? Is it possible with 'sed'? I sometimes have to modify the information in the file, using a shell script if possible. Python is OK. (A shell script is better for me.)
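Since Python is acceptable, the "second 'model = ' after the second 'system information'" rule can be sketched with two counters. The file layout is not shown, so this assumes that 'system information' and 'model = ...' each sit on their own line and that the model count restarts at every 'system information' line. The function name and parameters are hypothetical.

```python
import re
from typing import Iterable

# Captures everything up to the value, e.g. "model = ", so spacing is preserved.
MODEL_LINE = re.compile(r"^(\s*model\s*=\s*)\S+")


def rename_model(
    lines: Iterable[str],
    section: int,
    occurrence: int,
    new_name: str,
    marker: str = "system information",
) -> list[str]:
    """Rename the occurrence-th model line inside the section-th section."""
    seen_sections = 0
    seen_models = 0
    result = []
    for line in lines:
        if marker in line:
            seen_sections += 1
            seen_models = 0  # model numbering restarts in each section
        elif seen_sections == section and MODEL_LINE.match(line):
            seen_models += 1
            if seen_models == occurrence:
                line = MODEL_LINE.sub(lambda m: m.group(1) + new_name, line, count=1)
        result.append(line)
    return result
```

With these assumptions, `rename_model(lines, 2, 2, "abc")` would target the line described as 'model = ddd' and leave every other line untouched.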
I would like to extract the room number, last name, first name, invoice (205880080), arrival date, departure date, and total (229.46). Can you at least give me a hint on how to proceed? I have tried a lot but I am stumped from the beginning.
***History*** Room: 124 B Payment: Bell/TRAVELSCAPE.COM Lastname*FIT*,Firstname 4A, 0K, 0B Guest Bell *205880080 FT Bell *205880080 July 31, 2010