Disch wrote:
koitsu wrote:
3) The speed of the device (disk, flash, etc.) has absolutely nothing to do with the problem.
I figured the problem might be related to writes being "complete" when they're not really complete. Disk writes aren't instantaneous, and when you save a file or something, it doesn't necessarily save directly to the disk if the disk is busy, but might instead be put in a cache and queued to be saved when the drive has time.
If drives are faster, this is less of a problem, as the queues will be emptied faster and writes committed sooner.
...
You shouldn't need to know when the user is going to remove the device, is my point.
...
If you're writing something to the drive and it disappears, that's a big problem -- but programs typically make it clear that they're writing to the drive while they're writing. Like when you copy files to the drive, a progress box pops up. Or if you save something to it from another program the program typically
stalls until the save is complete (with an optional progress bar). Hopefully the user will have enough sense not to remove the drive during such a time.
The program does not typically stall -- meaning, the program is not sitting there spinning. It performs a write request and hopes to God the kernel takes care of it so it can continue to submit more. The actual "delays" you're seeing are pending I/O transactions happening at a very low level (between controller and disk). The kernel hides all of this from the program. The status bar you see is faked by the program; it's hardly accurate with regards to what's physically been written to the platters on a disk.
Let's focus on writes, because read errors aren't a big deal (I agree). Let me step you through what actually happens on a PC system when there's a write to a disk/storage device of any kind.
1) An application has a file descriptor open which points to a file on the device in question. The application decides to write some data to the fd (the goal being to write some data to the file).
2) The write gets handed off to a local library (in *IX, usually libc), which has its own form of cache.
3) A syscall is made to the kernel, who then decides what to do with the data being written, and *when*; it may write 3-4 seconds from now depending on what else is going on with the I/O scheduler. This is often measured as "wait time" (how big the request queue is in the kernel). The kernel submits a write request to the disk attached to the storage controller. All I/O goes through the controller.
4) The storage controller will read some of the data it's pointed to from memory (via PCI) and cache some of it. The response from the controller to the OS is "got X bytes" and tells the OS to go back to doing what it's doing. This may happen multiple times for larger writes, obviously (kernel has to keep telling the controller "here's more data", controller has to respond "got X bytes"). Let's say the whole operation takes 1 second (big file).
5) The storage controller then submits the actual write request to the disk (on ATA/SATA, this is called WRITE DMA48). The disk responds with an acknowledgement, and then there's a three-way orgy between the disk, controller, and system RAM (RAM <--> controller <--> disk).
6) Now, the disk *also* has its own cache (large drives these days have up to 64MBytes of cache). This is used for both reading and writing. As the write requests are submitted to the disk, the disk stores the data in its cache to be written to actual LBAs (sectors) on the disk later. How much later? Again, could be immediate, could be multiple seconds depending on what the drive is doing at the time. I could talk for hours about things drives do behind-the-scenes which could stall this operation. Disk drives do a *lot* of behind-the-scenes "other stuff"; they haven't been "dumb devices" since the early 90s.
7) Let's say the disk isn't doing anything (no pending requests to handle, no internal operations to handle first or abort). So the disk gets to the write, and it takes a total of 1 second (including seek times). Now you have a write operation that took 2 physical seconds to complete: from the application writing the data to the kernel, to the controller, to the disk, and to the platters.
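The layering in steps 1-7 can be sketched in a few lines of Python (a hedged sketch, not production code; `os.write`/`os.fsync` map directly onto the POSIX write()/fsync() syscalls, and the filename here is made up):

```python
import os
import tempfile

# Sketch of the write path described above: write() returns as soon as the
# kernel has the data (step 3), long before the controller and disk caches
# (steps 4-6) have pushed anything to the platters. fsync() is how a program
# asks the kernel to flush the whole stack and *wait* for it.

path = os.path.join(tempfile.mkdtemp(), "demo.bin")  # hypothetical file
data = b"x" * (1 << 20)  # 1 MiB of payload

fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
n = os.write(fd, data)  # syscall; data lands in the kernel's cache and
                        # write() returns -- nothing is on the platters yet
os.fsync(fd)            # block until the kernel has pushed it to the device
os.close(fd)
```

Note that even fsync() only guarantees the data reached the device; whether the drive's own cache has hit the platters depends on the drive honoring the flush command.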
So what happens if someone yanks power or I/O to the disk/device during any of this, without telling the OS authoritatively? Yeah, you guessed it: data loss or file corruption (same thing really).
To effectively accomplish what you want, you need to disable all layers of write caching between the OS and the disk (and on the disk too). Is this doable? Yes, absolutely. What's the trade-off? Speed. You'll take a *serious* performance hit on writes to the device. Some devices/disks will drop from ~40 MB/sec to 2-3 MB/sec with write caching disabled. I'm not making these numbers up.
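You can see one of those cache layers being bypassed with O_SYNC, which forces each write() to reach stable storage before returning (a sketch; it addresses the kernel-side cache only -- the on-disk cache is a separate knob, e.g. hdparm's write-cache setting -- and actual numbers vary wildly by hardware):

```python
import os
import tempfile
import time

# Time buffered writes against O_SYNC (synchronous) writes. On a real
# spinning disk the O_SYNC variant is dramatically slower, which is the
# performance hit described above. Sizes/counts here are arbitrary.

def timed_writes(flags, count=50, chunk=b"y" * 4096):
    path = os.path.join(tempfile.mkdtemp(), "t.bin")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | flags, 0o644)
    t0 = time.monotonic()
    for _ in range(count):
        os.write(fd, chunk)
    elapsed = time.monotonic() - t0
    os.close(fd)
    return elapsed

buffered = timed_writes(0)
sync = timed_writes(getattr(os, "O_SYNC", 0))  # O_SYNC is POSIX; absent on Windows
# Expect `sync` to be noticeably larger than `buffered` on real disks.
```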
Now on to the end-user perspective:
Not all programs these days make it clear that they're writing to a disk/device when they're doing so. Even those that do don't really know what's going on past the "I submitted a write for X bytes to the kernel" phase; they have no knowledge of whether the data was written to the physical medium or not. And they shouldn't, because all that time waiting slows everything down.
Furthermore, with software in mind: do you really know what your anti-virus software is doing behind the scenes as it sits there in the systray? Do you know what every Windows service (Indexing comes to mind!) and Linux daemon is doing behind the scenes at every moment in time? No you don't. Do this sometime: open up Task Manager and enable the I/O Writes column so you can see writes on a per-program basis (in 2K/XP this is a column you can add in Task Manager; Vista/W7 should show you this automatically). Sort by that column. You'll be *incredibly* surprised at all the writing going on without any visual indication.
And what about the situation where someone yanks the data cord and AC power cord from the removable device (external USB disk, etc.) at once? Wow, now you've not only pulled the device off the bus in the middle of a potential I/O operation, but the disk itself no longer has power, meaning the disk cache can't be flushed to the actual platters. Hoo boy.
So in summary, the easiest way to solve this problem is to require the user to tell the OS in some way "hey! I want to flush all caches/writes to the device in question, and then unmount it, because I'm going to physically remove it".
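That "flush, then unmount" sequence is exactly what "safely remove hardware" does for you. Sketched with POSIX calls (hedged: the path is hypothetical, and the unmount itself is done by umount(8) or the OS eject facility, which also tells the device to flush its own cache):

```python
import os
import tempfile

# What an application can do on its end before the user pulls the device:
# flush the file data AND the directory entry down to the device.

path = os.path.join(tempfile.mkdtemp(), "save.dat")  # hypothetical file
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
os.write(fd, b"player state")
os.fsync(fd)   # push file data through the kernel cache to the device
os.close(fd)

dirfd = os.open(os.path.dirname(path), os.O_RDONLY)
os.fsync(dirfd)  # flush the directory metadata too
os.close(dirfd)
# The unmount/eject step then flushes the device's own cache; only after
# that is it safe to physically remove it.
```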
Sorry for the long-winded post, but I go through this argument at least once a year with someone, and I have to explain it verbosely every time before they understand exactly what "write()" does behind the scenes.
And how do I know all this crap? Because storage subsystems happen to be one of my fortes. For example, a few days ago I just got done with a disk analysis for a FreeBSD user who was seeing his disks fall off the SATA bus for no reason. Well, there's a reason......