General :: Subset A Large Dataset By Specifying The Starting & End Line?
Aug 27, 2010
This is my first time on this forum. I am a statistician. I am trying to subset a large dataset by specifying the starting and ending lines. The dataset is pretty large (more than 300 million lines), with around 1.2 million lines per person, so I would like to split it into consecutive per-person chunks. I tried wrapping R code around it, but R seems to have to read from the top of the file down to the lines I want, even though I told it to skip the lines that other tasks had already read. As a result, memory use increases with the task ID, and eventually I got kicked off by the administrator.
I guess the shell could do this much more simply and elegantly. First I thought of the "split" command, but the file has a header of 10 lines, so I can't split it into even-sized chunks.
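One possible shell approach, as a minimal sketch: sed can print a line range and quit as soon as the range ends. It still scans from the top of the file to the start line, but it streams, so memory use should stay flat. The function names and the idea that every person has exactly the same number of lines are assumptions for illustration (the post only says "around 1.2 million").

```shell
# extract_lines FILE START END
# Print lines START..END (1-based, inclusive) and stop reading at END.
extract_lines() {
    # -n suppresses default output, "p" prints the range, "q" quits at END
    sed -n "${2},${3}p;${3}q" "$1"
}

# person_chunk FILE K SIZE HEADER   (hypothetical helper)
# Print the K-th block of SIZE lines after a HEADER-line header,
# assuming every person occupies exactly SIZE lines.
person_chunk() {
    start=$(( $4 + ($2 - 1) * $3 + 1 ))
    end=$(( $4 + $2 * $3 ))
    extract_lines "$1" "$start" "$end"
}

# Example: person_chunk big.txt 3 1200000 10 > person3.txt
```

If the blocks really are equal in size, a single pass such as `tail -n +11 big.txt | split -l 1200000 - person_` may be simpler, since it drops the 10 header lines first and then splits evenly.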
Mar 13, 2010
I have a huge file, 450G in size. Its format is as below:
x1 50020 A 1
x1 50021 B 8
x1 50022 C 9
[code]....
Now, I want to extract a subset from this file. In this subset, column 1 is x10, column 2 is from 600000 to 30000000. I wrote the following perl script but it doesn't work:
#!/usr/bin/perl
$file1 = $ARGV[0]; # Input file
$file2 = $ARGV[1]; # Output file
[code]...
I guess the input file and output file are both so big that my script can't handle them.
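File size by itself should not be a problem for a tool that streams one row at a time. A minimal awk sketch of the filter described above, assuming whitespace-separated columns as in the sample rows (the function name filter_rows is made up for illustration):

```shell
# filter_rows NAME LO HI
# Read rows on stdin; print those whose column 1 equals NAME and whose
# column 2 lies in [LO, HI]. Only one row is held in memory at a time.
filter_rows() {
    awk -v name="$1" -v lo="$2" -v hi="$3" \
        '$1 == name && $2 >= lo && $2 <= hi'
}

# Example: filter_rows x10 600000 30000000 < input.txt > subset.txt
```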
Jun 18, 2010
I have cygwin on Windows XP running rsync to a remote Ubuntu server over ssh using ADSL. My data set is about 20Gb! But Cygwin will back up incrementally, so after the first backup the process should be relatively quick. With ADSL the first backup will take too long. I was thinking about doing the first backup by copying the files to an external hard drive, then attaching the hard drive to my remote server and copying the files across. The idea is that rsync will pick up the files as if it had created them in the first instance, and the incremental backups will then pick up from there.
Does anyone have any experience with this, or can anyone provide any advice? The external hd is FAT32, which is okay with Windows and should be okay with Ubuntu? From XP, right-click copy and then paste keeps the file dates intact on the external hd. Is this enough to get rsync going incrementally?
Feb 9, 2010
I have two files: one huge one (200.000+ lines) called 'db' and one big one (15.000+ lines) called 'indices'. What is the quickest way of filtering out the lines in 'db' that contain any index (anywhere on the line) from 'indices'? Is there a faster approach in bash on Linux?
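A sketch of one common approach: grep can read all the patterns from a file with -f, and -F treats them as fixed strings instead of regular expressions, which is usually much faster than looping over the indices in bash. The two function names are made up; which one fits depends on whether "filtering out" means keeping or dropping the matching lines.

```shell
# lines_with_index INDICES DB: print DB lines containing any index.
lines_with_index() {
    grep -F -f "$1" "$2"
}

# lines_without_index INDICES DB: print DB lines containing no index.
lines_without_index() {
    grep -v -F -f "$1" "$2"
}

# Example: lines_with_index indices db > matches.txt
```

One thing to watch for: an empty line in the indices file matches every line of db.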
Feb 11, 2010
At the command line prompt I am able to start some applications (such as openoffice.org or evolution), and the prompt reappears after the program is launched so I can continue working in that terminal. However, other applications, such as Totem or Blackboard, will launch from the terminal but the prompt does not reappear. With Totem I get a message stating "sha module is deprecated use hashlib module instead". With Blackboard the command line does not reappear. I have to use Ctrl + C to get the command line back, but this closes the application as well! Or I have to open a new terminal. Why do some applications return the prompt when started from the command line while others do not? How do you get the prompt back (other than q or Ctrl + C)? Thanks to all and kindest regards. (I am using Ubuntu 9.04.)
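A likely explanation is that some launchers put themselves in the background while others stay attached to the terminal until they exit. A minimal sketch of starting any program detached, so the prompt comes back straight away (launch_detached is a made-up name):

```shell
# launch_detached COMMAND [ARGS...]
# "&" runs the command in the background so the prompt returns at once,
# the redirections silence messages such as deprecation warnings,
# and nohup lets the program survive the terminal being closed.
launch_detached() {
    nohup "$@" >/dev/null 2>&1 &
}

# Example: launch_detached totem
```

For a program that is already running in the foreground, Ctrl + Z followed by bg should also give the prompt back without closing it.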
Mar 30, 2010
I'm studying Information Technology and doing Linux as part of it. One of the questions in my text book is: Describe three different ways to start a command line interpreter when using the Gnome desktop of openSUSE Linux. I can't for the life of me make sense out of it.
May 3, 2010
Not sure why, but the last couple of times I have started xampp from a root terminal, I have got this message after each program start:
Warning a bogus unix line
Since the last time, when it was not there, other than adding web spaces etc., I have only tried to unpack the Control Panel, which was unsuccessful anyway, saying I needed some other programs.
Feb 17, 2010
I have a set of records of around 50000 lines in a file. I have another file containing the line numbers of the records that have errors of various types, e.g. column count, field type etc. I want to get all those lines into a separate file so I can sanitise them. There are about 3000-4000 of them.
How can I access those lines which I want to isolate into a single file? I have all the usual linux stuff available and a bit of understanding of regexps.
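One way to do this in a single pass, sketched with awk and assuming the second file holds one line number per line (pick_lines is a made-up name):

```shell
# pick_lines NUMBERS RECORDS
# NUMBERS holds one line number per line; print those lines of RECORDS.
pick_lines() {
    # While reading the first file (NR == FNR) remember each number;
    # while reading the second, print lines whose number was remembered.
    awk 'NR == FNR { want[$1]; next } FNR in want' "$1" "$2"
}

# Example: pick_lines bad_line_numbers.txt records.txt > to_sanitise.txt
```

The NR == FNR idiom assumes the numbers file is not empty; with an empty first file the records file would be treated as the list of numbers.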
Jan 28, 2011
I can't find a way to run a command on a subset of the files in a directory. How can I do it?
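The question does not say how the subset is chosen, so this sketch assumes it can be described by a file name pattern. find selects the files and -exec runs the command on them (for_matching is a made-up name):

```shell
# for_matching DIR PATTERN COMMAND [ARGS...]
# Run COMMAND on the regular files directly inside DIR whose names
# match the shell pattern PATTERN.
for_matching() {
    dir=$1
    pattern=$2
    shift 2
    # "{} +" passes as many file names as possible to each COMMAND run
    find "$dir" -maxdepth 1 -type f -name "$pattern" -exec "$@" {} +
}

# Example: for_matching /var/log '*.log' wc -l
```

For a simple pattern in the current directory, a plain loop such as `for f in *.txt; do somecommand "$f"; done` does the same job.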
Feb 20, 2011
I am a Red Hat admin and also use Ubuntu. I installed openSUSE on my home machine to give it a whirl. I can't seem to figure out why I can't open GUI applications from the command line.
I receive a GTK error when trying to open one with sudo. What am I doing wrong?
EDIT: Never mind, solved my own question; I had to add DISPLAY and XAUTHORITY to the sudoers file.
Sep 25, 2009
I need to start DHCP after booting into run level 1.
So i am going to ....
ifconfig eth1 up
what is the command to start DHCP service?
Feb 17, 2010
Yesterday I finally got around to installing the driver for my graphics card (NVIDIA GeForce 8400M CS) on Fedora 12 using the command yum install kmod-nvidia. The terminal told me that it installed correctly, so I rebooted my system. Now when I boot into Fedora, it loads, but when the sign-in window is about to appear my screen instead shows random colors all over the place. I am hoping someone can tell me how to remove this via the command line before Fedora actually starts.
Nov 19, 2010
I have SUSE 10 64-bit and I was setting up an SVN server on it. After all the required setup, reloading apache2 gives this error:
Code:
httpd2-prefork: Syntax error on line 113 of /etc/apache2/httpd.conf: Syntax error on line 31 of /etc/apache2/sysconfig.d/loadmodule.conf: Cannot load /usr/lib64/apache2/mod_dav_svn.so into server: /usr/lib64/libsvn_subr-1.so.0: undefined symbol: apr_memcache_add_server
Jun 23, 2011
I am loading a file in Fortran. The file looks something like the one shown below. I am interested in the Velocity values and not the Pressure values. Is there a way to write Fortran code that finds the starting line and ending line of the Velocity values, or do I have to find the lines manually? In this case it should return starting line: 9, ending line: 11.
PHP Code:
[code]....
Feb 18, 2011
how big and widespaced the fonts on Clementine playlist are and how good they look on the appmenu (where my mouse pointer is). This is not because Clementine is QT4, I've got the same problem with Chrome, Opera etc. I've been messing with system-settings (KDE settings tool) a day before the fonts become that widespaced in order to make my KDE apps look more native on my GNOME, but I haven't touched the fonts settings there.
Oct 7, 2010
How can I create a crontab entry that automatically restarts the network service when the DSL line goes down and comes back up?
Feb 8, 2010
After running the following command, I get:
[root@yukiko /]# find / -iname .bashrc
/home/clamav/.bashrc
/home/vpopmail/.bashrc
/etc/skel/.bashrc
/root/.bashrc
But I would like to have a command that prints a specific line by supplying the command with the line number, for example:
[root@yukiko /]# find / -iname .bashrc | getline(2)
/home/vpopmail/.bashrc
Is there such a command on CentOS?
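There is no getline command as such, but sed can print a single line by number. A minimal sketch (nth_line is a made-up name):

```shell
# nth_line N: read stdin and print only line N.
nth_line() {
    # -n suppresses default output, "p" prints line N, "q" stops reading
    sed -n "${1}p;${1}q"
}

# Example: find / -iname .bashrc | nth_line 2
```

The same can be done with `head -n 2 | tail -n 1`.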
Sep 11, 2009
I have several files with many lines something like this:
I'm trying to write a script that will count the number of characters per line that doesn't contain a ">" symbol and give me an average of those values. I have most of the script together but I can't figure out how to connect some of the steps.
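The counting and averaging steps can be joined in one awk program. This is only a sketch, since the sample lines and the script were not included in the post; avg_line_length is a made-up name.

```shell
# avg_line_length [FILE...]
# Average the character count of the lines that do not contain ">".
avg_line_length() {
    awk '!/>/ { total += length($0); n++ }
         END  { if (n) printf "%.2f\n", total / n }' "$@"
}

# Example: avg_line_length sequences.txt
```

With no file arguments the function reads standard input, so it can also sit at the end of a pipeline.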
Code:
Dec 23, 2009
I am trying to write a script that takes an input file ($FileName) and an intermediate file ($FileName.info) and removes lines from $FileName if the value in $2 of $FileName.info is <75.
I can't figure out how to feed only one line of the .info file to the if statement at a time so that it will perceive it as an integer instead of a list.
The error I am getting now is ./script.sh: line 6: [: : integer expression expected
Sample input $FileName
Code:
Code:
Code:
Script so far:
Code:
Aug 2, 2010
I've written a script to parse a file and print each line that ends with matching pattern, if the next line is blank. The pattern lines are the result of md5sum $i|sed 's/path///g' so that only md5 and filename appear. Here's what I'm using.
Quote:
for fline in `sed -n '/.*.ext$/p' file1`
do
if [ "`sed -n -e '/'"$fline"'/ {n; p;}' file1`" == "" ]
then
echo ""$fline" has no info" >>file2
fi
done
[Code]....
Feb 24, 2011
I'd like show a certain line or lines of a file with context, kind of like a unified diff, on the command line in Linux:
$ (something) -l 154 stuff.py
150: def foo(bar):
151: """
[code]....
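A sketch of one way to get that kind of output with awk. The function name and its arguments are made up, and the default of 4 context lines is only chosen to match the sample output starting at line 150.

```shell
# show_context FILE LINE [N]
# Print line LINE of FILE with N lines of context on each side
# (default 4), each prefixed with its line number.
show_context() {
    awk -v target="$2" -v n="${3:-4}" '
        NR > target + n  { exit }                       # past the window
        NR >= target - n { printf "%d: %s\n", NR, $0 }  # inside the window
    ' "$1"
}

# Example: show_context stuff.py 154
```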
Aug 22, 2011
How can I print the Linux command line history without including the line numbers? I want to send it all to a text file like this: history >> history.txt
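One option is to strip the numbers with sed on the way to the file. This sketch assumes bash's usual history layout of leading spaces, the entry number, an optional "*" for modified entries, then the command; strip_history_numbers is a made-up name.

```shell
# strip_history_numbers: remove the leading entry number from each
# line of `history` output read on stdin.
strip_history_numbers() {
    sed 's/^ *[0-9][0-9]*[* ] *//'
}

# Example (in an interactive bash session):
#   history | strip_history_numbers >> history.txt
```

In bash, `fc -ln` also lists history entries without their numbers.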
Jan 8, 2010
I want to sort a number of lines based on their size:
data:
-------
12345678
87654321
1234
[code]....
Should output as:
-----------------
1
2
12
21
[code]....
But I'm getting this with sort
----------------
1
12
123
1234
[code]....
Can we sort the above "data" text, based on "number of characters" instead of "character order"?
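sort has no "by length" mode, but a common trick is to prefix each line with its length, sort numerically on that prefix, and then cut the prefix off again. A minimal sketch (sort_by_length is a made-up name):

```shell
# sort_by_length [FILE...]
# Sort lines by their number of characters, shortest first.
sort_by_length() {
    # 1. prefix every line with its length and a tab
    # 2. sort numerically on that first field (-s keeps the input order
    #    of lines that have the same length)
    # 3. remove the length prefix again
    awk '{ print length($0) "\t" $0 }' "$@" | sort -n -k1,1 -s | cut -f2-
}

# Example: sort_by_length data.txt
```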
Apr 1, 2011
I am combining data from a couple different input files and creating an output file in a specific format. I notice that if I use the >> operator, information gets appended to a new line in my output file. This is useful, but if I'd like to append onto the CURRENT line, is there an easy way to do this? I've been googling around and see lots of complicated answers, nothing that suggests to me an easy way to do this. For example, if my output file looks like this:
b1a:] cat test
hello my name is
b1a:]
and I'd simply like to append "Bob", how can I do it? If I use
b1a:] echo Bob >> test
b1a:] cat test
hello my name is
Bob
b1a:]
So what I would prefer is some command that would create the result:
hello my name is Bob
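The new line appears because the existing last line already ends with a newline character, so anything appended with >> starts after it. If the earlier text is written without a trailing newline (for example with `printf '%s' "hello my name is" >> test`), later appends continue on the same line. For a file that already ends in a newline, the last line has to be rewritten. A sketch using awk, with a made-up function name:

```shell
# append_to_last_line FILE TEXT
# Rewrite FILE so that TEXT is added to the end of its last line,
# separated by a single space.
append_to_last_line() {
    tmp=$(mktemp) || return 1
    awk -v extra="$2" '
        NR > 1 { print prev }      # emit the previous line unchanged
        { prev = $0 }              # hold the current line back
        END { if (NR) print prev " " extra; else print extra }
    ' "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}

# Example: append_to_last_line test Bob
```

Note that awk interprets backslash escapes in values passed with -v, so TEXT containing backslashes would need extra care.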
Mar 21, 2011
This solution works but is slow with large files. I am looking for a faster solution.
The 2 files contain filenames, one of them has associated data I want to append to the other file's matching filenames
file1:
file2:
I append file2 by matching the unique_filenames and appending them with the tag data and some formatting
appended file2:
Here is the SLOW code
while read inputline.
Mar 30, 2011
I need to grep for a particular string and, if it is found, display the line containing that string, the line above it, and also the first line of that paragraph.
Can this be done via sed?
Eg, My Paragraphs
OA connectA
Enclosure:
Interconnect Module #6 Status:
Here, if I grep for Critical, it should display the following
Similarly if I grep for Degraded, it should display
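This is probably easier in awk than in sed, because awk can remember lines in variables. The sketch below assumes that paragraphs are separated by blank lines, which cannot be confirmed from the sample since it was cut off; grep_paragraph is a made-up name.

```shell
# grep_paragraph PATTERN FILE
# For each line matching PATTERN print the first line of its paragraph,
# the line directly above it, and the line itself (no duplicates).
grep_paragraph() {
    awk -v pat="$1" '
        /^[[:space:]]*$/ { firstnr = 0; prevnr = 0; next }  # paragraph break
        !firstnr { first = $0; firstnr = NR }               # paragraph start
        $0 ~ pat {
            if (firstnr < NR - 1) print first   # first line of paragraph
            if (prevnr) print prev              # line above the match
            print                               # the matching line
        }
        { prev = $0; prevnr = NR }
    ' "$2"
}

# Example: grep_paragraph Critical status.txt
```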
Jun 6, 2010
I downloaded the Ubuntu 10.04 ISO from this site. When the download was complete, I got a screen telling me to insert a writable CD, which I did. It went through a format process and then asked me to drag the files to that directory. When I tried to do that I got a message saying that I was 138MB short of space. The ISO was 704MB and the CD had formatted to something over 500MB. The disk is a CD-RW rewritable CD.
May 13, 2010
I have to write several scripts and I have no idea how to do this one: make a script that reads the passwd file line by line and prints it to the console. I hope you understand, because my English is not very good, as you can see. Our teacher told us something like this:
#!/bin/bash
while read line
do
echo $line
done < dispositive
exit
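A slightly more defensive version of the same loop, as a sketch: IFS= and -r stop read from trimming whitespace or interpreting backslashes, and the extra test catches a last line that has no trailing newline. The function name print_lines is made up.

```shell
# print_lines FILE: print FILE to the console one line at a time.
print_lines() {
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
    done < "$1"
}

# Example: print_lines /etc/passwd
```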
Sep 1, 2009
I have a dataset (see example below) that I would like to go through and copy all lines containing a certain string ("LGIG") plus the line immediately following that line to a new file. I have no problem grepping lines containing the string LGIG but I'm lost how to translate that to line number and shift up one line number for each instance of that string.
Example input file:
[code].....
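There is no need to work out line numbers. GNU grep can do this with `grep -A 1 LGIG input`, although it prints "--" separator lines between groups. A sketch of an awk alternative that prints only the matching lines and their followers (match_plus_next is a made-up name):

```shell
# match_plus_next PATTERN FILE
# Print every line matching PATTERN plus the line right after it.
match_plus_next() {
    awk -v pat="$1" '
        $0 ~ pat { print; keep = 1; next }  # matching line: print, arm flag
        keep     { print; keep = 0 }        # line after a match
    ' "$2"
}

# Example: match_plus_next LGIG input.txt > output.txt
```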
Feb 26, 2010
I have the file abc.txt
cat abc.txt
This is a test file
Nothing is new in this world
I want to replace "This is a test file" to "Text is replaced"
Code:
FindString='This is a test file'
ReplaceString='Text is replaced'
Findarray=(`echo $FindString | tr ' ' ' '`)
[Code]....
But this is not effective. How can I replace an entire line using sed, awk, or any other utility?
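For fixed text, sed can do this directly with `sed 's/^This is a test file$/Text is replaced/' abc.txt`. When the strings come from variables, comparing whole lines in awk avoids having to escape regex characters. A minimal sketch (replace_line is a made-up name):

```shell
# replace_line FIND REPLACEMENT FILE
# Print FILE with every line that is exactly FIND replaced by REPLACEMENT.
replace_line() {
    awk -v find="$1" -v repl="$2" '
        ($0 "") == (find "") { print repl; next }  # exact string match
        { print }                                  # everything else as-is
    ' "$3"
}

# Example: replace_line "$FindString" "$ReplaceString" abc.txt > abc.new
```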