General :: Splitting Text Files Into Parts With 2000 Lines
Sep 7, 2010
I am facing a problem while splitting a text file. I need to split the file into parts of 2000 lines each, but when I do it with the "split" command the mother file is kept intact. As per my requirement I need to cut the mother file into parts, so it should not be kept intact.
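A minimal sketch of that behaviour: split writes the 2000-line pieces, and deleting the original afterwards gives the "cut" effect (mother.txt and the part- prefix are placeholder names):

Code:
# 2000 lines per piece; remove the source only if split succeeded
split -l 2000 mother.txt part- && rm mother.txt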
I have a utility that works with files. The utility crashes after about 120 files. The input to the utility is a file containing a file list. I want to cut that file of file names into separate files of about one hundred names each. My thought was to determine the number of lines divided by 100 and then use head and delete to create temporary files, so I can run the utility multiple times and prevent the crash. When I tried to create a variable using the wc -l command, the output gave me the total number of lines, but it also included the filename of the input file (873 Filename.txt). I cannot figure out how to remove the Filename.txt from the variable.
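wc -l only prints the file name when the file is passed as an argument; reading the file from stdin gives the bare count. A small sketch using the Filename.txt from the post:

Code:
# with stdin redirection, GNU wc prints just the number (e.g. 873)
lines=$(wc -l < Filename.txt)
echo "$lines"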
Is there a way, besides writing a PERL program, to read each line one by one in file A and tell if this line also exists in file B? Can this be done via a shell script?
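grep can do the whole comparison without an explicit loop: -f takes the patterns from file B, -F treats them as fixed strings, and -x demands whole-line matches:

Code:
# lines of A that also exist in B
grep -Fxf B A
# lines of A that do not exist in B
grep -Fxvf B A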
Is there any way to delete certain paragraphs within a text file and then insert them into another text file? I just cannot figure out how to remove the specific lines from the file and then insert them into another file at a certain line within that new file. Thanks again.
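One hedged approach with GNU sed, assuming the paragraph sits on lines 10-20 of file1 and should land after line 5 of file2 (all the numbers are placeholders):

Code:
sed -n '10,20p' file1 > paragraph.txt   # copy the paragraph out
sed -i '10,20d' file1                   # delete it from the source
sed -i '5r paragraph.txt' file2         # insert it after line 5 of the target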
Suppose that I have this text:
1 abc
2 def
3 ghi
4 klm
Now I want to copy (yank) only the parts "ab" from line one and "hi" from line three at the same time. How can this be done with vim? I know how to copy one line and one part of a line. What I want to do is to copy two parts from two different lines at the same time and paste them, as they are, somewhere else in the file.
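vim's named registers can collect this: yanking into a lowercase register overwrites it, while the uppercase form appends, so two separate yanks end up in one register. A sketch in normal-mode keystrokes (register a is an arbitrary choice):

Code:
" 1. with the cursor on the 'a' of line one:
"a2yl
" 2. with the cursor on the 'h' of line three (uppercase A appends):
"A2yl
" 3. at the destination, paste the combined 'abhi':
"ap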
I tried using diff --GTYPE-group-format= with <%, but I'm not sure that's the right solution. Here's what I'm trying to do. I have two C source files, file1 and file2. file1 has a function in it that has been modified in file2. However, the functions begin at different line numbers in each of the files. Is there a way to specify a range of line numbers on file1 and file2 to compare, using diff or any other combination of utilities? I can always output the text from a range of lines in each file to two separate new files and then compare those, but that's tedious. I could also write a script to automate that kind of solution, but I imagine there's an existing way of doing this.
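bash process substitution feeds the two ranges straight to diff, with sed -n printing only the chosen lines; the line numbers here are placeholders:

Code:
# compare lines 100-150 of file1 with lines 120-170 of file2
diff <(sed -n '100,150p' file1) <(sed -n '120,170p' file2)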
I want to be able to count (from the command line) the lines in a bunch of files of a specific type in a folder and all its sub-folders. How would I do this?
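find gathers the files recursively and hands them to wc; the *.c pattern below is just a stand-in for "a specific type":

Code:
# one grand total across every .c file under the current folder
find . -name '*.c' -exec cat {} + | wc -l
# or per-file counts with a total line at the end
find . -name '*.c' -exec wc -l {} +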
I have a text file that is filled with references to duplicate files. I'm trying to create a text file for each duplicate file found that contains the paths to the duplicates. I would also like the text file names to be based on the size and file name.
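The post doesn't say what the list looks like, so this is only a sketch under a loud assumption: the input is fdupes-style, groups of duplicate paths separated by blank lines. Each group is written to a file named SIZE_NAME.txt, both taken from the group's first path:

Code:
#!/bin/bash
# ASSUMPTION: duplicates.txt holds one path per line,
# with a blank line between each group of duplicates (fdupes-style).
group=()
flush() {
    [ ${#group[@]} -eq 0 ] && return
    first=${group[0]}
    size=$(stat -c%s "$first" 2>/dev/null || echo 0)   # size in bytes
    printf '%s\n' "${group[@]}" > "${size}_$(basename "$first").txt"
    group=()
}
while IFS= read -r line; do
    if [ -z "$line" ]; then flush; else group+=("$line"); fi
done < duplicates.txt
flush   # last group, in case the file doesn't end with a blank line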
I used split -b 32m "file.bz2" "file.bz2.part-" to split a file and it created more than 50 parts. From googling, the way I found to reassemble the parts is to cat file.bz2.part-aa file.bz2.part-ab > file.bz2, while enumerating all the 50+ parts. Is there an easier way to reassemble the parts wherein I no longer need to list all those parts explicitly?
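Shell globbing expands in sorted order, and split's aa, ab, ... suffixes sort correctly, so one wildcard replaces the explicit list:

Code:
# expands to file.bz2.part-aa, file.bz2.part-ab, ... in order
cat file.bz2.part-* > file.bz2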
Today encoders are getting smarter. They can compress a Blu-ray at similar quality into 700MB. It seems the header of a video file contains info about the frame rate, the audio/video encoder, etc., which can't be guessed. In MPEG audio, every part of a file is independently playable. But if a movie is binary-split into 6 parts and I don't have the first part, it is unplayable.
Code:
# example
$ ls -l
-rwxrwxrwx 1 root root 280M 2010-12-07 20:23 irn2-cd1.mkv
-rwxrwxrwx 1 root root  50M 2011-05-26 13:09 last-50M-cd2
-rwxrwxrwx 1 root root  50M 2011-05-26 13:44 first-50M-cd1
$ file *
first-50M-cd1: Matroska data
last-50M-cd2:  data
irn2-cd1.mkv:  Matroska data
I am working on text-processing tasks, and I found that if you assign a file's text to a variable, the trailing newline is chomp'ed automatically:
Code:
variable=$(cat file.txt)
The problem is I can only access the items/lines using:
Code:
for line in $variable
do
    echo $line
    # Other commands
done
How do I convert this to an indexed array? More importantly, how do I get access to the individual lines as ${line[0]}, ..., ${line[n]}? Another thing: if file.txt has lines with spaces, the for...in loop makes a mess of them, yet echoing prints line by line... o_0
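bash 4's mapfile (also spelled readarray) reads a file into an indexed array, one line per element, and it copes with lines containing spaces:

Code:
mapfile -t lines < file.txt       # -t strips the trailing newlines
echo "${lines[0]}"                # first line
echo "${lines[3]}"                # fourth line
echo "${#lines[@]}"               # number of lines
for line in "${lines[@]}"; do     # quoted expansion keeps spaces intact
    echo "$line"
done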
I need to insert 3-4 lines of text at the beginning of a text file. The file is a largish MySQL dump, the result of a backup shell script. This shell script should insert the required text. I've wrestled with sed, but lost.
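Two hedged options, with dump.sql standing in for the real dump name: GNU sed's 1i inserts text before the first line, or a brace group prepends several lines via a temporary file:

Code:
# GNU sed: insert one header line before line 1, in place
sed -i '1i -- header line added by the backup script' dump.sql

# portable alternative for several lines
{ printf '%s\n' '-- line 1' '-- line 2' '-- line 3'; cat dump.sql; } > dump.new \
  && mv dump.new dump.sql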
I have a list of words that I want to grep for in many files to see which ones have them and which ones don't. In the text file I have all the words listed line by line, e.g. list.txt:
check
try this
word1
word2
open space
list
..
I want to grep each line one by one, like so:
grep "check" *.log grep "try this" *.log grep "word1" *.log .. etc how can I do this?
I want to scroll back 10000+ lines in a text-mode Linux terminal. Since there is an unlimited-scrollback option in gnome-terminal, I guess this might also be possible in text mode?
As much as I didn't want to ask a sed question, especially considering there's already one on this page, I've looked as best I could and can't find the solution. I'd like to use sed to replace occurrences of a pattern but exclude two or three specific lines that are not consecutive. For example, I know that with 1,10 I could exclude the first 10 lines; what is the syntax if I just want to exclude lines 3 and 7? The sed command I'm working with right now is for rearranging Ethernet interfaces.
sed -e '/'"$found1fullmac"'/!s/eth1/'"$found1eth"'/' /etc/udev/rules.d/70-persistent-net.rules > /tmp/net.rules \
  && mv /tmp/net.rules /etc/udev/rules.d/70-persistent-net.rules
I would like to replace $found1fullmac with two variables representing line numbers to exclude from the replacement.
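For skipping individual lines, GNU sed's b command (branch with no label) jumps to the end of the script, so those lines bypass the substitution; with the line numbers in shell variables (line1 and line2 are placeholder names):

Code:
line1=3
line2=7
# lines $line1 and $line2 skip the substitution; everything else is replaced
sed -e "${line1}b" -e "${line2}b" -e "s/eth1/$found1eth/" /etc/udev/rules.d/70-persistent-net.rules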
Trying to remove lines from a syslog text file that have duplicate strings:
Mar 10 06:51:11[http-8080-1] INFO com.MYCOMPANY.webservices.userservice.web.UserServiceController [u:2533274802474744|360] Authorize [platformI$tformIdAndOs=2533274802474744|360, userRegion=America|360]
then a few lines down
Mar 10 06:52:03 [http-8080-1] INFO com.MYCOMPANY.webservices.userservice.web.UserServiceController [u:2533274802474744|360] Authorize [platformI$tformIdAndOs=2533274802474744|360, userRegion=America|360
It has the same u: number. The issue is I need to remove the duplicates and leave just one of each, and the file has multiple duplicates of different u: numbers and is 14,000 lines long. Can anyone tell me if I can use awk, sed, or sort for something like this, i.e. removing lines that contain a string that is a duplicate of one already seen?
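awk can keep the first line for each u: number and drop the rest; this sketch assumes the u:NUMBER token is the right dedup key (syslog.txt is a placeholder name):

Code:
# remember each u:NUMBER key; print a line only the first time its key appears
awk '{
    if (match($0, /u:[0-9]+/))
        key = substr($0, RSTART, RLENGTH)
    else
        key = $0              # no u: number: fall back to the whole line
    if (!seen[key]++) print
}' syslog.txt > deduped.txt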
This may be an advanced question, but I need to know how to do this. Here at work I am in charge of recruiting and we have about 1,000 resumes in already. All of the resumes are in .pdf format. I need to rename every .pdf in the following format: {firstnameLastname}.pdf. The only way I know how to do this is to convert all the .pdf files to text, extract the name from the first few lines of text, import into Excel, and then use VBA to rename the files en masse. Here is my logic so far: ~/Desktop/a houses all the .pdf resumes. Open a terminal:

Code:
cd ~/Desktop/a
for f in *.pdf; do pdftotext -raw $f; done

That will convert all of the resumes into text files. Now I would like to append the name of each text file to the last line of that same text file. So, for example, for Resume1.txt I want to append "Resume1.txt" within Resume1.txt, so that after I run the command and open Resume1.txt I see "Resume1.txt" on the last line, at the end of the resume. How can I do this? I would like to use a loop and have the terminal append the filename to the body of the text file until all of them have been appended.
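The append step only needs a short loop; echo "$f" >> "$f" writes the file's own name as a new final line:

Code:
cd ~/Desktop/a
for f in *.txt; do
    echo "$f" >> "$f"   # append the filename as the last line of the file
done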
Having trouble handling JPEG-2000 files. The message says I need some plugin. I've checked in Synaptic and there are a couple of packages installed by default that mention JPEG 2000, but obviously I need something else. Any ideas?
cat file1.txt
15 this is a sentence containing various words and spaces
34 this is a another sentence containing various words and spaces
cat file2.txt
2 this is sentence1file2
6 this is sentence2file2
54 this is sentence3file2
I would like to join these 2 files. The result should look as follows:
cat joinedfile.txt
2 this is sentence1file2
6 this is sentence2file2
15 this is a sentence containing various words and spaces
34 this is a another sentence containing various words and spaces
54 this is sentence3file2
==> so the joined file must be sorted on the first number. Any ideas how this can be achieved?
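sort can merge and order the two files numerically on the leading number in one step:

Code:
# -n compares the leading numbers numerically rather than as text
sort -n file1.txt file2.txt > joinedfile.txt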
I am having problems opening and printing Word 97/2000-formatted documents that were saved using OpenOffice in Maverick. If I try to open one of these files in OpenOffice on Lucid, it causes OpenOffice to freeze every time. If I save the files in OpenOffice format and share them between Maverick and Lucid, everything works fine. Does anyone know what might be causing OpenOffice in Lucid to freeze when loading Word 97/2000 files?