Lucas Nussbaum gave me a tip that one can use the posix_fadvise(2) system call to tell the Linux kernel which data you intend to use, so that it can start fetching that data before you block on reading it. So I have written the fadvise RubyGem to make that system call accessible from Ruby.
File#fadvise(offset,len,advice) -> self
Advise the operating system how you intend to use the data in this file,
starting from byte offset and counting len bytes, so that the kernel can
schedule it to be fetched in the background while you do more processing. This
call is intended to avoid blocking later when you actually read the data, but
whether this actually happens is up to the kernel's IO scheduler. Valid
values for advice are:
:normal, :sequential, :random, :no_reuse, :will_need, :dont_need
This call does not block.
See the posix_fadvise(2) manpage for more information about these types of advice.
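To make the call shape concrete, here is a minimal sketch of hinting that a whole file will be needed and then reading it. It rests on a few assumptions: that the gem is loaded with `require "fadvise"` (the require name is my guess), that `prefetch_and_read` is just a hypothetical helper name for illustration, and that when the gem is not available it is acceptable to fall back to Ruby's built-in `IO#advise`, which takes the advice first and spells it `:willneed`.

```ruby
require "tempfile"

begin
  require "fadvise" # assumption: the gem is loaded under this name
rescue LoadError
  # Gem not installed; the helper below falls back to IO#advise.
end

# Hypothetical helper: advise the kernel that the whole file will be
# needed soon, then read it and return its contents.
def prefetch_and_read(path)
  File.open(path, "rb") do |file|
    size = file.size
    if file.respond_to?(:fadvise)
      # Signature as documented above: fadvise(offset, len, advice)
      file.fadvise(0, size, :will_need)
    else
      # Ruby's built-in equivalent: advise(advice, offset, len)
      file.advise(:willneed, 0, size)
    end
    # ... other processing could happen here while the kernel prefetches ...
    file.read
  end
end

Tempfile.create("fadvise-demo") do |tmp|
  tmp.write("x" * 4096)
  tmp.flush
  data = prefetch_and_read(tmp.path)
  puts data.bytesize
end
```

The advice is only a hint, so the read returns the same data either way; any benefit would show up as less time spent blocked on a cold cache, and that depends on the kernel.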