General :: Grep - Manipulating Large Text File Full Of Records
Nov 26, 2010
I'm trying to manipulate a large text file full of records (metadata - one complete record per line). I need to delete every line on which certain words appear - there are five different words, all pretty simple all-caps strings with occasional whitespace. I tried using grep -v, which worked a treat, but only string-by-string. Ideally I'd like to run this as grep -v -f, where the file targeted by the -f contains the strings I need to match in order to delete the lines they're in.
i.e. grep -v -f filecontainingSTRINGS.txt targetfile.txt > outputfile.txt
When I try this, however, I don't get any matches - or more specifically, no changes are made in the output file. It works fine if there's only one string in filecontainingSTRINGS, but it doesn't work if there's more than one (I'm using newline as the delimiter). (Also my machine doesn't recognise /usr/xpg4/bin/grep - no idea what that's all about!)
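A common cause of this exact symptom is Windows-style line endings in the pattern file: each line then ends in an invisible carriage return, the \r becomes part of every pattern, and nothing matches. A quick check and fix (file names as in the example above):

Code:
# reveal any hidden carriage returns (they show up as ^M)
cat -v filecontainingSTRINGS.txt

# strip them, then retry the original command
tr -d '\r' < filecontainingSTRINGS.txt > patterns.txt
grep -v -f patterns.txt targetfile.txt > outputfile.txt

Trailing spaces on the pattern lines would break the match the same way, since grep -f takes each line literally, whitespace included.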
I have a performance report that provides all the information I need to report the following: total transactions per day, average transactions per second, and peak transactions per second. It just doesn't provide any of it in a very accessible manner, so I want to parse it and capture just the bits I care about. Ideally, I'd like the output to look something like this:
Code:
Date      Total       Avg    Peak
07/11/11  12,328,033  24.05  64
07/12/11  9,328,429   21.98  56
The problem is the format of the input file, which is somewhat complicated. The report gives a summary of all transactions within any given second, and then totals at the end of each day, with page breaks in the middle, like so:
So first, the easy part that takes me to the daily summary, which gives me the date and the total transactions, and I can divide the total by 86400 to get the average per second, too. No problem. It's the last part that's got me stumped... the daily peak. I can't just do a while loop on the date, because it's missing from most of the records. It also means I can't use positional parameters, because depending on the page break, the total will move between $2 and $3. And I need the date as a conditional to find the daily peak, because this output will have many days' worth of data.
Any ideas? Some kind of awk or sed command to insert the date wherever it's missing (I'm not particularly good at either utility)? Is there a method to parse these things based on column location that I'm not aware of?
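Since the report sample didn't survive the post, the following is only a sketch of the carry-forward idea: remember the date whenever a line carries one, and track the maximum per-second count seen under that date. The field positions ($1 for an MM/DD/YY date, $NF for the count) are assumptions to adapt to the real layout:

Code:
awk '
  # carry the date forward whenever a line contains one (assumed MM/DD/YY in field 1)
  $1 ~ /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]$/ { date = $1 }
  # assume the per-second transaction count is the last field of each detail line
  date != "" && $NF ~ /^[0-9]+$/ {
      if ($NF + 0 > peak[date] + 0) peak[date] = $NF + 0
  }
  END { for (d in peak) print d, "peak =", peak[d] }
' report.txt

This sidesteps the positional-parameter problem because the date is recognised by pattern rather than by a fixed column.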
I have on my Windows machine several hundred files in .nc/.ncs format for a CNC machine. I need to convert them to .txt, which is as easy as opening one in Notepad and saving it as .txt, but there are so many that doing it by hand would take far too long.
The reason I am writing to LinuxQuestions is that I would feel more comfortable loading a live CD and using some sort of terminal command to do this than downloading one of the many "freeware"-type programs I have found for Windows (even more so since I have had a rootkit before and had to start all the way over to get rid of it).
I need to know:
1. Is this possible to do from the terminal without super advanced knowledge?
2. Can someone please point me in the right direction - something to read or an example?
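CNC .nc/.ncs files are normally already plain text, so "converting" them is really just copying each file with a .txt extension, which a short shell loop handles. A minimal sketch from a live-CD terminal, assuming the files sit in the current directory:

Code:
# copy every .nc and .ncs file alongside itself with a .txt extension
for f in *.nc *.ncs; do
    [ -e "$f" ] || continue    # skip the literal pattern if nothing matched
    cp -- "$f" "${f%.*}.txt"
done

No advanced knowledge required; ${f%.*} just strips the old extension before the new one is appended.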
This has happened several times now, with 9.10 and 10.04. I back up my photos periodically to external drives, using Nautilus. At the next attempted login Gnome won't start and sometimes gives a "power manager incorrect installation" error.
First time this happened I was stumped and eventually did a clean install. Second time, I found advice elsewhere in this forum to solve this by emptying root's trash, which did the trick. This time, however, root's trash has nothing in it and two users' trash folders held nothing significant (I emptied them all anyway with rm -r). Tried looking for enormous directories but couldn't find a smoking gun. I would rather not end up doing another clean install - a painful and extreme solution. I'm continuing to look for solutions to the immediate problem, but my question really is, what causes this and how do I prevent it in the future? I've run Computer Janitor regularly and ran apt-get clean but no help. Should I do all my large scale copying from the terminal? I'm not a total noob, but close.
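For the disk-hunting part, rather than browsing around, it can help to let du rank the biggest directories directly (run as root so root-owned caches and trash are counted); this is a generic sweep, not a diagnosis of the login failure:

Code:
# ten largest top-level directories on the root filesystem (sizes in KB)
sudo du -xk --max-depth=1 / 2>/dev/null | sort -rn | head -n 10

A full root filesystem is one known way to break the next GDM/Gnome login, so checking df -h right after a large copy is a cheap precaution.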
I have a large text file with three columns. I'm trying to write a Perl script that splits the file up based on the value of the 3rd column. Every time the third column reads 0, a new file is created, and all the data up until the next 0 is written to that new file. This should happen over and over until the initial file has been entirely split up.
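Perl would work, but since the split logic is so small, here is the same idea as an awk sketch (assuming whitespace-separated columns; any lines before the first 0 in column 3 are skipped):

Code:
# start a new output file every time column 3 is exactly 0
awk '$3 == 0 { if (out) close(out); out = "part_" ++n ".txt" }
     out     { print > out }' bigfile.txt

Each chunk lands in part_1.txt, part_2.txt, and so on; the close() call keeps awk from running out of file descriptors on a file with many chunks.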
I need a loop that pulls the user name into a variable and then pulls the LastUpdate field into another variable, so I can then perform a comparison against the LastUpdate field. Requirements are AIX tools, including awk, sed and Perl. I am writing a script to check AIX users' password expiration dates; if a password is within the alerting period (i.e. 7 days etc.) the script will email the user. I will release the full script into the public domain once completed. The text file I want to parse is formatted like:
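The sample format didn't make it into the post, so the following only sketches the loop shape, assuming each record looks like the output of lsuser -a lastupdate ALL on AIX (i.e. "username lastupdate=<epoch-seconds>"); the field handling will need adapting to the real layout:

Code:
while read -r user rest; do
    lastupdate=${rest#lastupdate=}              # strip the attribute name
    [ -z "$lastupdate" ] && continue            # skip malformed lines
    now=$(perl -e 'print time')                 # AIX date lacks %s; Perl is allowed here
    age_days=$(( (now - lastupdate) / 86400 ))
    echo "$user: password last updated $age_days days ago"
done < users.txt

From there the alerting-period check is a plain arithmetic test on age_days.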
I am on Ubuntu 11.04 and using LibreOffice 3.3.2 to compose new documents, saving them as .doc, .ppt and .xls files (due to having to share them with others who are on Windows systems).
I have a lot of doc files and I need to search for text INSIDE these files. I am perplexed by the fact that no search tool is able to search for text INSIDE these file types. "cat" can display them of course, but grep is not able to locate text INSIDE these file types. I even tried to save a .doc file as an .odt file, but no luck. The Applications > Accessories > Search for Files tool does not search INSIDE doc, xls or ppt with the option "Contains the text".
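grep can't see inside .doc/.xls/.ppt because they are binary containers, not plain text. The usual trick is to convert to text on the fly with the catdoc package (catdoc for .doc, xls2csv for .xls, catppt for .ppt) and grep that stream. A sketch for .doc files, assuming catdoc is installed (sudo apt-get install catdoc):

Code:
# list every .doc file whose text contains the search string
for f in *.doc; do
    catdoc "$f" 2>/dev/null | grep -q "search string" && echo "$f"
done

For .odt files, which are zipped XML, unzip -p file.odt content.xml | grep ... works along the same lines.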
Q: Is there any way to use grep and sed with a string variable rather than with a file?
The problem: I'm running through a LARGE (about 10,000 lines) XHTML file and need to replace every instance of lines beginning with <p>~.
The following code works but takes a long time, mainly because an I/O operation is carried out on each line. If I could read from a string rather than a file, it would take much less time!
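Yes - in bash, a here-string (<<<) feeds a variable straight to grep or sed without the per-line file round trip. A minimal sketch:

Code:
var="line one
<p>~ line two
line three"

# grep a string variable instead of a file
grep '^<p>~' <<< "$var"

# run the whole replacement over the variable in one sed pass
sed 's/^<p>~/<p>/' <<< "$var"

Even without the variable, letting sed process the whole file in a single invocation (sed 's/.../.../' file.xhtml > out.xhtml) avoids the per-line I/O that is slowing the current loop down.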
I have done a bunch of searches on this but the terms seem to get tangled in the more popular search of "colouring the output of grep / awk". I am trying to find a way to grep/awk through the output of a command to find text of a specific colour. The command's output has a range of colours signifying too many different things to specify using text, with colour being the only form of grouping.
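Colour in terminal output is carried in-band by ANSI escape sequences (e.g. ESC[31m switches to red), so you can grep for the sequence itself rather than for any text. The exact codes depend on the program, and many programs drop colour when piped, so a --color=always-style flag may be needed; a sketch for red:

Code:
# keep only the lines the command painted red (ANSI colour code 31)
somecommand | grep -F $'\e[31m'

# peek at which escape codes the output actually uses
somecommand | cat -v | less

Swap 31 for 32 (green), 33 (yellow), etc., to group by other colours.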
I have 24" dual monitors with 1920x1080 resolution on both of them. Consequently the text appears so small. I use the following text-intensive applications frequently:
Web browser (Google Chrome)
IDE (Komodo)
Terminal (GNOME Terminal)
Email (Thunderbird)
I can configure the text size in the IDE, Terminal and Email. But for Chrome, setting just the proportional font size is not a good idea, because one usually wants the entire site zoomed, not only the proportional fonts. So I am asking: Is it possible to increase the DPI in Ubuntu (much like on Windows) so as to increase the text size across all apps? OR is it possible to set a permanent 'zoom' in Google Chrome, using a third-party extension maybe?
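On X11 the global text size can be raised by forcing the font DPI, which scales text in every toolkit app at once. A sketch, assuming a stock Ubuntu/X setup where ~/.Xresources is read at login:

Code:
echo 'Xft.dpi: 120' >> ~/.Xresources   # raise font DPI from the usual 96
xrdb -merge ~/.Xresources              # apply to the current session

Newly started applications pick up the higher DPI; in GNOME 2 the same knob lives under System > Preferences > Appearance > Fonts > Details > Resolution.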
For my research I have some very large files that are basically millions of lines of ten columns of numbers. These files can be up to 5 GB in size. Recently I noticed that when I made a copy of one of my files, some exclamation points appeared in it where there should not be any: in front of random numbers throughout the file. Making another copy of the file would result in exclamation points in front of different numbers in different parts of the file. Doing this many times has given me up to four exclamation points in different parts of the file. Sometimes the file copies just fine without producing any extraneous exclamation points. Additionally, I have occasionally seen a "^K" where there should be a newline (the data that should have been on the next line was instead on the previous line with a ^K in front of it) in copies of my files. I don't know if this is related or not.
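It is probably related: ^K is a vertical tab (0x0B), a single bit away from a newline (0x0A), and '!' (0x21) is likewise one bit from a space (0x20). Single-bit flips during copies are a classic sign of failing RAM or a flaky disk/cable. Whatever the cause, it's worth verifying every copy against its original immediately:

Code:
# byte-level comparison; prints the offset of the first difference, silent if identical
cmp original.dat copy.dat

# or checksum both files and compare the hashes
md5sum original.dat copy.dat

Running a pass of memtest86+ would be the usual next step if the checksums keep disagreeing.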
I am using RHEL 5. I have a very large text file which cannot be opened in vi. The file has some 8000 lines, and I need to view the lines from 5680 to 5690. How can I view these particular lines in a large file? What command and options do I need to use?
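Printing an arbitrary line range is a one-liner for sed or awk; the quit command at the end stops reading as soon as the range is done, which matters on large files:

Code:
# print lines 5680 through 5690, then stop reading
sed -n '5680,5690p; 5690q' bigfile.txt

# the same range in awk
awk 'NR>=5680 && NR<=5690; NR==5690{exit}' bigfile.txt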
What are the tmpfs mounts and how can I reclaim that space? And what is /dev/dm-0, and why is it taking up so much space?
I have 2 LVM volume groups. Here is the output of vgdisplay -v:
Code:
root@SETV-007-WOWZA:~# vgdisplay -v
    DEGRADED MODE. Incomplete RAID LVs will be processed.
    Finding all volume groups
    Finding volume group "WOWZASERVER"
After deleting the log files, I was able to regain access to my GDM session. But I still can't find out what /dev/dm-0 is, or where all the 75 GB is being taken up.
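/dev/dm-0 is a device-mapper node, which on a box like this is almost always one of the LVM logical volumes in the volume group (here presumably WOWZASERVER). The mapping can be confirmed with:

Code:
# map dm-N nodes back to their friendly LVM names
ls -l /dev/mapper/
sudo dmsetup ls

# show each logical volume with its size
sudo lvs -o lv_name,vg_name,lv_size

Once the LV behind dm-0 is identified, du -xk --max-depth=1 on its mount point will show where the 75 GB actually lives.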
I just noticed, however, that even though I can access the drive A-OK via browser, terminal, and web services (our Wowza), when I enter GParted I get this error for sda, my primary OS drive!
Code:
Libparted Bug Found!
Error informing the kernel about modifications to partition /dev/sda2 -- Invalid argument. This means Linux won't know about any changes you made to /dev/sda2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting
Now that I'm in gParted I see 3 partitions: [URL] ....
It reports now, that I have used ALL of my disk space.
After deleting the logs and a fresh reboot, this is what df -h outputs:
So that whatever was captured in the () in the first part of the statement would be used in the \1 in the back part of the statement, for every n.chatlog that might be in any of the /webserver directories at that time.
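The statement itself didn't get quoted above, but the general shape of that trick in sed is a capture group reused through \1. A hypothetical sketch (the pattern and paths are placeholders, not the poster's actual ones):

Code:
# capture the value after "user=" and reuse it later on the same line,
# across every chatlog under the webserver directories
sed -E 's/user=([^ ]+)/user=\1 nick=\1/' /webserver/*/*.chatlog

Each \1 in the replacement expands to whatever the (...) group matched on that particular line.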
In my corporate environment, I'm required to run a Windows machine that acquires a VNC session on a machine in the server farm. My Windows machine is dual-head with different resolution monitors (1600x1080 on the left and 1920x1200 on the right). If I create a VNC session that spans the monitors, then maximizing a window in the VNC session causes it to stretch across both my monitors.
Instead, I want a "maximize" event to behave like it does on my Windows machine -- I only want to maximize to the display that the window is on.
How can I define what I'll call "maximize regions"? Regions in the VNC graphical plane where, when I click "maximize", the window only expands to the region it currently (and mostly) resides in.
Can I do this in gnome, X, xrandr, or some other magical interface?
I'm storing a list of strings in a file and would like to read the file and pipe each line returned to grep, which in turn searches a directory for files containing the string. However, this is not returning any output.
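Quoting in the loop (or stray carriage returns in the list file, if it ever passed through Windows) is the usual culprit. A sketch of the loop as it should look, plus a single-pass alternative where grep takes the whole list at once (-f pattern file, -F fixed strings, -r recurse, -l list matching files):

Code:
# one grep per string
while IFS= read -r s; do
    [ -n "$s" ] && grep -rl -- "$s" /path/to/dir
done < strings.txt

# or let grep do it in a single pass
grep -rlFf strings.txt /path/to/dir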