Hi folks, working on a little project at the moment in C that stores data in a text file, one item per line. I've got the functions for listing the contents of the file and for adding new lines to it, but I've reached a point where I'm a bit stuck. One of the functions I want this program to have is the ability to remove a specific line, so I would be able to run "program -r <line-number>" and the program would go through the file, remove that line and all would be done.
What I haven't been able to find is much on how exactly to do this. There doesn't appear to be a function for deleting a line from a file, though people suggest copying the file line by line to a new file and simply omitting the line that I want to remove, then removing the original file and renaming the new one. The problem is that I can't find anything that really explains how to do that, and it's starting to frustrate me. I know that, perhaps, C isn't the 'best' language to be doing this but I really needed to do something so this was it.
Would anybody be able to provide an example/link to a method for doing this? I'd be very grateful if someone could.
Thanks,
Joel.
Offline
You could read the contents of the file into your C program and, when you're done, write everything back except the part you don't want anymore.
You could also seek to the line you want to remove and then try to remove from there to the end of the line, maybe.
A line can of course be recognized by the \r and \n combinations (\r\n for Windows, \n for Linux, \r for older Mac installations), so finding out how many \n's there are is finding out how many lines there are.
Could this help?
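A minimal sketch of the counting idea, assuming the file is opened in text mode so that every line ending reaches the program as a single '\n'. The name count_lines is made up for illustration.

```c
#include <stdio.h>

/* Count the lines in an open stream by counting '\n' characters.
   A final line that lacks a trailing newline is counted as well. */
long count_lines(FILE *fp)
{
    long lines = 0;
    int c, last = '\n';

    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            lines++;
        last = c;
    }
    if (last != '\n')   /* unterminated last line */
        lines++;
    return lines;
}
```

Counting this way also tells you whether the line number passed to "program -r" is in range before you touch the file.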
Offline
That's what I've been told to do, but I'm not really sure how I would go about it, if I'm honest. I've found a couple of things that might help and I'm reading them at the moment, but I am genuinely stuck at this point. :S
Offline
JHeaton wrote: I know that, perhaps, C isn't the 'best' language to be doing this but I really needed to do something so this was it.
Funny, this is the exact reason I learned Perl, which is funny because Perl would be perfect for this.
There are a couple of routes that you could take with this. I think the best may be to read each line of the file into a linked list of structs, with each struct containing the string and a pointer to the next struct (C programming 101), and then iterate through to the line that you want to remove. Once you get to struct n-1, set its pointer to n+1 (which is stored in struct n), thereby skipping over n. Then write the resulting linked list back to a file line by line.
Edit: oh yeah, don't forget to allocate and free your memory properly.
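Here is a rough sketch of that linked-list route. The struct and function names (line, read_lines, remove_line, write_lines, free_lines) are invented for the example, and it assumes no line is longer than 1023 characters; longer lines would be split across nodes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct line {
    char *text;          /* one line, including its '\n' if present */
    struct line *next;
};

/* Read every line of fp into a singly linked list and return the head. */
struct line *read_lines(FILE *fp)
{
    struct line *head = NULL, **tail = &head;
    char buf[1024];      /* assumption: lines fit in 1023 characters */

    while (fgets(buf, sizeof buf, fp) != NULL) {
        struct line *node = malloc(sizeof *node);
        if (node == NULL)
            break;
        node->text = malloc(strlen(buf) + 1);
        if (node->text == NULL) {
            free(node);
            break;
        }
        strcpy(node->text, buf);
        node->next = NULL;
        *tail = node;            /* append at the end of the list */
        tail = &node->next;
    }
    return head;
}

/* Unlink and free line n (1-based). Returns 0 on success, -1 if absent. */
int remove_line(struct line **head, long n)
{
    struct line **link = head;
    struct line *victim;

    if (n < 1)
        return -1;
    while (*link != NULL && n > 1) {
        link = &(*link)->next;
        n--;
    }
    if (*link == NULL)
        return -1;
    victim = *link;
    *link = victim->next;        /* node n-1 now points at node n+1 */
    free(victim->text);
    free(victim);
    return 0;
}

/* Write the list back out, one node per line. */
void write_lines(const struct line *head, FILE *fp)
{
    for (; head != NULL; head = head->next)
        fputs(head->text, fp);
}

/* Release every node and its string. */
void free_lines(struct line *head)
{
    while (head != NULL) {
        struct line *next = head->next;
        free(head->text);
        free(head);
        head = next;
    }
}
```

Using a pointer to the link rather than a pointer to the node means the first line needs no special case. The cost of this route is that the whole file sits in memory at once.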
Last edited by Cyrusm (2010-12-22 00:07:50)
Hofstadter's Law:
It always takes longer than you expect, even when you take into account Hofstadter's Law.
Offline
Hehe, my first thought was to use Perl too.
I have no real experience with C, but here is how I would do it:
* Open up the file for reading and writing.
* Find the beginning of the line to be removed (pos1).
* Find the beginning of the next line (pos2).
* Let n = pos2 - pos1, i.e. the length of the line to be removed.
* Copy n bytes from pos2 to pos1.
* pos1 += n
* pos2 += n
* Repeat the previous three steps until pos2 reaches the end of the file (the final copy may be shorter than n bytes).
* Truncate the file by n bytes.
Again, I'm not a C programmer, but I think that will be the quickest way to do it. It requires no memory allocation and it should just rearrange the bytes in situ on the disk without needing to allocate storage for a second file.
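The steps above could be sketched roughly as follows. This is only an illustration of the idea, not the code that was uploaded later in the thread. The name remove_line_in_place and the 4096-byte buffer are arbitrary choices, and ftruncate() and fileno() are POSIX rather than standard C, so it assumes a Unix-like system. It copies in buffer-sized blocks rather than exactly n bytes at a time, which leaves the result unchanged.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>   /* ftruncate: POSIX, not standard C */

/* Remove line lineno (1-based) from the file at path by shifting the
   rest of the file down over it. Returns 0 on success, -1 on failure. */
int remove_line_in_place(const char *path, long lineno)
{
    FILE *fp = fopen(path, "r+b");
    char buf[4096];
    long pos1 = 0, pos2 = -1, cur = 0, line = 1;
    size_t got;
    int c;

    if (fp == NULL || lineno < 1) {
        if (fp != NULL)
            fclose(fp);
        return -1;
    }

    /* Find pos1 (start of the doomed line) and pos2 (start of the next). */
    while ((c = getc(fp)) != EOF) {
        cur++;
        if (c == '\n') {
            if (line == lineno) {
                pos2 = cur;
                break;
            }
            line++;
            if (line == lineno)
                pos1 = cur;
        }
    }
    if (pos2 < 0) {
        if (line == lineno && cur > pos1)
            pos2 = cur;          /* last line, no trailing '\n' */
        else
            goto fail;           /* the file has no such line */
    }

    /* Copy everything after the line down over it, one block at a time. */
    for (;;) {
        if (fseek(fp, pos2, SEEK_SET) != 0)
            goto fail;
        got = fread(buf, 1, sizeof buf, fp);
        if (got == 0)
            break;
        if (fseek(fp, pos1, SEEK_SET) != 0)
            goto fail;
        if (fwrite(buf, 1, got, fp) != got)
            goto fail;
        pos1 += (long)got;
        pos2 += (long)got;
    }

    /* pos1 is now the new length; flush and cut off the leftover tail. */
    if (ferror(fp) || fseek(fp, pos1, SEEK_SET) != 0 || fflush(fp) != 0
            || ftruncate(fileno(fp), (off_t)pos1) != 0)
        goto fail;
    return fclose(fp) == 0 ? 0 : -1;

fail:
    fclose(fp);
    return -1;
}
```

One caveat of working in place: if the program is interrupted partway through the copy loop, the file is left half-shifted, which the copy-and-rename approach avoids.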
My Arch Linux Stuff • Forum Etiquette • Community Ethos - Arch is not for everyone
Offline
Thank you both.
If my reason for creating this was purely that I needed it, I likely would have used Perl as I have a slightly better grasp of it and I appreciate its abilities for handling text files and the like. I only really chose C so that I could continue the learning that I stopped shortly after college. Xyne's method in particular sounds interesting, though I'd have to figure out how to do it.
Many thanks, hopefully I'll be able to come back to the thread with a positive result.
Offline
I implemented my idea as I thought it might be a good learning exercise. I have uploaded it here in case you want to take a peek.
Offline
You probably meant to test "argc < 3" (or "argc != 3") rather than "argc < 2" on line 51.
EDIT: Here's an mmap()-based version that is literally 1800 times faster (0.010 s vs. 18.737s) than Xyne's on a 1000000-line file. The lesson is that even if you can't use mmap, it's way faster to work in memory than on streams, so read and write blocks at a time instead of single bytes.
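For readers who can't reach the linked code, here is a hedged sketch of what an mmap()-based approach can look like. It is not the version linked above; the function name remove_line_mmap is invented, and it assumes a POSIX system and a regular, non-empty file.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Remove line lineno (1-based) from the file at path using a shared
   memory mapping. Returns 0 on success, -1 on failure. */
int remove_line_mmap(const char *path, long lineno)
{
    struct stat st;
    char *map, *start, *end, *nl;
    size_t size, cut;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;
    if (lineno < 1 || fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    size = (size_t)st.st_size;

    /* MAP_SHARED makes changes to the mapping reach the file itself. */
    map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Hop from newline to newline until start points at line lineno. */
    start = map;
    end = map + size;
    while (--lineno > 0) {
        nl = memchr(start, '\n', (size_t)(end - start));
        if (nl == NULL) {
            start = end;
            break;
        }
        start = nl + 1;
    }
    if (start == end) {          /* the file has no such line */
        munmap(map, size);
        close(fd);
        return -1;
    }

    /* cut is the length of the line, including its '\n' if it has one. */
    nl = memchr(start, '\n', (size_t)(end - start));
    cut = nl != NULL ? (size_t)(nl + 1 - start) : (size_t)(end - start);

    /* Slide the rest of the file down over the line, then shrink it. */
    memmove(start, start + cut, (size_t)(end - (start + cut)));
    if (munmap(map, size) != 0 || ftruncate(fd, (off_t)(size - cut)) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

memchr() and memmove() operate on whole blocks of memory, so there is no per-byte stream call in the loop; that is where the speed comes from.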
Last edited by tavianator (2010-12-22 05:41:21)
Offline
You probably meant to test "argc < 3" (or "argc != 3") rather than "argc < 2" on line 51.
Yes I did.
While I was updating it I happened to have a 278207 line log file in the current directory. I noticed how slow it was with the getc and putc implementation so I added an internal copy buffer. It's much faster now.
EDIT: I just saw your edit. Considering that this is an educational thread, do you think you could add some comments to your code? It would help noobs such as me even more.
Btw, my version is almost as fast now (depending on the buffer size), but not quite. Your code is obviously nicer too.
Last edited by Xyne (2010-12-22 06:24:20)
Offline
You can write your own function for reading a file line by line or you can look at the following functions: fgets() or getline() (check 'man fgets/getline').
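As a small illustration of getline(), here is a sketch that copies one stream to another while skipping a given line number. The function name copy_without_line is made up, and getline() is POSIX.1-2008 rather than standard C, so this assumes a system that provides it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Copy in to out, leaving out line number skip (1-based).
   Returns 1 if a line was skipped, 0 if the input was too short,
   and -1 on a read or write error. */
int copy_without_line(FILE *in, FILE *out, long skip)
{
    char *line = NULL;   /* getline allocates and grows this buffer */
    size_t cap = 0;
    ssize_t len;
    long n = 0;
    int skipped = 0;

    while ((len = getline(&line, &cap, in)) != -1) {
        if (++n == skip) {
            skipped = 1;
            continue;
        }
        if (fwrite(line, 1, (size_t)len, out) != (size_t)len) {
            free(line);
            return -1;
        }
    }
    free(line);
    return ferror(in) ? -1 : skipped;
}
```

Unlike fgets() with a fixed buffer, getline() grows its buffer as needed, so long lines are never split. To finish the job you would open the original for reading and a temporary file for writing, call this, close both, and rename() the temporary file over the original.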
Offline
I implemented my idea as I thought it might be a good learning exercise. I have uploaded it here in case you want to take a peek.
Thank you very much for this, I'm reading it at the moment and I'm finding myself understanding it, so perhaps my C-knowledge wasn't as bad as I thought.
You probably meant to test "argc < 3" (or "argc != 3") rather than "argc < 2" on line 51.
EDIT: Here's an mmap()-based version that is literally 1800 times faster (0.010 s vs. 18.737s) than Xyne's on a 1000000-line file. The lesson is that even if you can't use mmap, it's way faster to work in memory than on streams, so read and write blocks at a time instead of single bytes.
Thanks for that example, too. I find it a bit more complex so, echoing what Xyne asked, would it be possible for you to put in some comments explaining what the various bits do?
Offline
Sure, I'll add some comments in a bit. Right now I'm off to write two finals in a row.
Offline
Here is a simple fgets version. Untested and uncompiled, I typed it up mostly from memory.
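#include <stdio.h>
#include <string.h>
/* goodline was left undefined in the original post; this stand-in keeps
   every line that does not contain "DELETE". Swap in whatever test you need. */
static int goodline( const char * line ) { return strstr( line, "DELETE" ) == NULL; }
int main( void ) {
    char buffer[128]; /* how wide are your lines? longer ones arrive in pieces */
    FILE * in = fopen( "input.txt", "r" );
    FILE * out = fopen( "output.txt", "w" );
    if ( in == NULL || out == NULL ) { perror( "fopen" ); return 1; }
    /* fgets returns NULL at end of file, so test it directly; feof only
       turns true after a read has already failed */
    while ( fgets( buffer, sizeof buffer, in ) != NULL ) {
        if ( goodline( buffer )) { fputs( buffer, out ); }
    }
    if ( ferror( in )) { perror( "fgets" ); return 1; }
    fclose( in );
    if ( fclose( out ) != 0 ) { perror( "fclose" ); return 1; }
    if ( rename( "output.txt", "input.txt" ) != 0 ) { perror( "rename" ); return 1; }
    return 0;
}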
Offline
Thanks for that example, too. I find it a bit more complex so, echoing what Xyne asked, would it be possible for you to put in some comments explaining what the various bits do?
While trying to understand the mmap version, I found "man mmap" and "man memchr" to be helpful. Most of it became clear after reading those.
Offline
A line can of course be recognized by the \r and \n combinations (\r\n for windows, \n for linux, \r for (older) mac installations), so finding out how many \n's there are is finding out how many lines there are.
Correction: in C, when a file is opened in text mode, whatever the system line separator is gets translated to '\n' for your program, and '\n' is translated back to the system line separator when you write output. That translation does not happen if you open the file in binary mode.
Offline
Xyne: you should start compiling with "-Wall", you've got an extra argument to fprintf on line 52.
Here's a version of my mmap()-based one with added comments and one missing error check added.
Offline
Xyne: you should start compiling with "-Wall", you've got an extra argument to fprintf on line 52.
Thanks, I will.
Offline
JHeaton wrote: Thanks for that example, too. I find it a bit more complex so, echoing what Xyne asked, would it be possible for you to put in some comments explaining what the various bits do?
While trying to understand the mmap version, I found "man mmap" and "man memchr" to be helpful. Most of it became clear after reading those.
Thanks for that, I'm getting there, albeit slowly. In the meantime, I ended up writing it in Perl to see how long it would take. Four hours later and I have almost all of the features working, even if the code is pretty poor. The C version will be worked on as I get time to do it. Hopefully that will be somewhere near complete soon.
Offline