As a word of warning, mmap is fine if the semantics match the application.
mmap is not a good idea as a general-purpose read()/write() replacement, e.g. as advocated in the 1994 "Alloc Stream Facility" paper by Krieger et al. I worked with an I/O library that followed this strategy, and we had no end of trouble figuring out how to robustly deal with resizing files, and how to do the windowing in a good way (this was back when we had to care about systems with 32-bit pointers and tight VM space, but still had to handle files larger than 2 GB). And then we needed the traditional read/write fallback path anyway, in order to deal with special files like ttys, pipes etc. In the end I ripped out the mmap path, and we saw a 300x performance improvement in some benchmark.
Also error handling: read and write can return errors, but what happens when you write through a mmapped pointer and the underlying file system has some issue? A plain store through the pointer cannot return an error.
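To make the contrast concrete, here's a rough C sketch (the function names are just illustrative): the write() path hands you an errno you can inspect, while the store through the mapping has nowhere to report failure.

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    void append_via_write(int fd, const char *buf, size_t len)
    {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            /* e.g. ENOSPC: the failure is an ordinary return value */
            fprintf(stderr, "write failed: %s\n", strerror(errno));
    }

    void store_via_mapping(char *map, size_t off, char c)
    {
        /* No return value to check; if the fs can't back this page,
           the failure arrives later as SIGBUS. */
        map[off] = c;
    }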
So you get a fine SIGBUS delivered to your application and it crashes. Just the other day I used ImageMagick and it kept crashing with a SIGBUS; just as I started googling the issue I remembered mmap, noticed that the partition had run out of space, freed some up and the issue was gone.
So you might want to set up a handler for that signal, but now the control flow suddenly jumps to another function if an error occurs, and you have to somehow figure out where in your program the error occurred and then what? Then you remember that longjmp exists and you end up with a steaming pile of garbage code.
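For what it's worth, the steaming pile ends up looking roughly like this minimal sketch (assuming a single-threaded program; the names are made up, and a real version also has to worry about async-signal-safety and restoring the previous handler):

    #include <setjmp.h>
    #include <signal.h>
    #include <stddef.h>
    #include <string.h>

    static sigjmp_buf io_env;

    static void on_sigbus(int sig)
    {
        (void)sig;
        siglongjmp(io_env, 1);   /* jump back out of the faulting access */
    }

    int read_byte_from_mapping(const volatile char *map, size_t off, char *out)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_sigbus;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGBUS, &sa, NULL);

        if (sigsetjmp(io_env, 1) != 0)
            return -1;           /* the access faulted; turn it into an error */

        *out = map[off];         /* may raise SIGBUS */
        return 0;
    }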
Only use mmap if you absolutely must. Don't just "mmap all teh files" as it's the new cool thing you learned about.
Indeed. The issue with file resizing I mentioned was mostly related to error handling (what if another process/thread/file descriptor truncates the file, etc.). But yes, there are of course other errors as well, like the fs running out of space you mention.
Yeah, this is the biggest reason I stay the hell away from mmap now. Signal handlers are a much worse minefield than error handling in any standard file I/O API I've seen.
Then what do you think happens when you read from your mapped memory and the file system is corrupted and returns an error, or the drive has a bad sector, or the nfs server acts up...
Reading from a busted file system is a problem to be dealt with (or not), yes, and I certainly wouldn’t recommend mmaping a file shared over nfs if you can help it. I’m not sure what the use case is where that would seem like a good idea.
C, pointers, and mmap are dangerous, sharp instruments, but I have to wonder who some of these dramatic warnings are for.
In general we can probably agree you should always check the return value of read() and write() and handle errors. At the very least perror() and abort(), so the user has a chance of finding the problem. Similarly, using mmapped files without handling errors is user hostile, since it just crashes the app and SIGBUS gives the user absolutely no clue what happened. As said, my point is to use mmap when it really makes sense and is worth the hassle, not just because it seems cool and makes the code look a little simpler (exactly because you omit the error handling). Especially if you don't know how people will use your software. As you said, mmap on nfs is bad, so you'd basically have to forbid users from using your software with network shares.
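For the read()/write() side, the baseline I mean is something like this sketch (write_all_or_die is a made-up name; a real library would propagate the error instead of aborting):

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    void write_all_or_die(int fd, const char *buf, size_t len)
    {
        while (len > 0) {
            ssize_t n = write(fd, buf, len);
            if (n < 0) {
                if (errno == EINTR)
                    continue;        /* interrupted, just retry */
                perror("write");     /* at least tell the user what went wrong */
                abort();
            }
            buf += n;
            len -= (size_t)n;
        }
    }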
Except for running out of VM space, all the other issues are still there. And even if you have (for the time being) practically unlimited VM space, you may still not want to mmap a file of unbounded size, since setting up all those mappings takes quite a lot of time if you're using the default 4 kB page size. So you probably want to do some kind of windowing anyway. But then if the access pattern is random and the file is large, you have to continually shift the window (munmap + mmap) and performance goes down the drain. So I don't think going to 64-bit systems tilts the balance in favor of mmap.
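Roughly, the windowing I mean looks like this sketch (the window size and struct are made up, and error handling is stripped down); every access outside the current window pays for a munmap + mmap pair:

    #include <sys/mman.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define WINDOW_SIZE (64u * 1024u * 1024u)   /* assume a 64 MiB window */

    struct window {
        int   fd;
        void *base;      /* current mapping, or MAP_FAILED */
        off_t start;     /* file offset the window begins at */
    };

    /* Return a pointer to file offset `off`, remapping the window if needed. */
    char *window_ptr(struct window *w, off_t off)
    {
        if (w->base == MAP_FAILED ||
            off < w->start || off >= w->start + (off_t)WINDOW_SIZE) {
            if (w->base != MAP_FAILED)
                munmap(w->base, WINDOW_SIZE);
            long page = sysconf(_SC_PAGESIZE);
            w->start = off - (off % page);   /* mmap offsets must be page-aligned */
            w->base = mmap(NULL, WINDOW_SIZE, PROT_READ, MAP_SHARED,
                           w->fd, w->start);
            if (w->base == MAP_FAILED)
                return NULL;
        }
        return (char *)w->base + (off - w->start);
    }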
Linux allocates page tables lazily, and fills them lazily. The only upfront work is to mark the virtual address range as valid and associated with the file. I'd expect mapping giant files to be fast enough to not need windowing.
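Concretely, on a 64-bit system I'd expect something like this sketch to be enough (made-up helper name, error paths trimmed): the mmap() call itself only records the range, and pages are faulted in on first access.

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    char *map_whole_file(const char *path, size_t *len_out)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return NULL;

        struct stat st;
        if (fstat(fd, &st) < 0 || st.st_size == 0) {
            close(fd);
            return NULL;
        }

        void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);               /* the mapping stays valid after close */
        if (p == MAP_FAILED)
            return NULL;

        *len_out = (size_t)st.st_size;
        return (char *)p;
    }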
There are still some cases where you'd not want unlimited VM mapping, but those are getting a bit esoteric and at least the most obvious ones are in the process of getting fixed.
Oh uh, IIRC 2004/2005 or thereabouts. Personally I was using PC HW running an up to date Linux distro, as I guess was the vast majority of the userbase, but there was a long tail of all kinds of weird and wonderful targets where the software was deployed.