How to Deal with I/O Expense

  • For a lot of problems, processors are fast compared to the cost of communicating with a hardware device.

    • Hardware is faster compared to communicating over a network

  • Other types of IO

    • disk

    • databases

    • file

    • network

    • hardware not close to cpu, ie usb, external drives etc

  • There is a lot of gain from improving IO costs

How

  • Caching

    • Caching is avoiding I/O (generally avoiding the reading of some abstract value) by storing a copy of that value locally so no I/O is performed to get the value.

      • Or putting it somewhere with less IO cost then where it was originally retrieved/accessed from

    • The first key to caching is to make it crystal clear which data is the master and which are copies.

      • There is only one master - period.

      • gossip protocol, master/slave protocol

    • Caching brings with it the danger that the copy sometimes can't reflect changes to the master instantaneously

  • Representation

    • is the approach of making I/O cheaper by representing data more efficiently.

    • This is often in tension with other demands, like human readability and portability.

    • ie

      • binary representation instead of one that is human readable

      • transmitting a dictionary of symbols along with the data so that long symbols don't have to be encoded,

      • and, at the extreme, things like Huffman encoding

  • pushing the computation closer to the data

    • improve the locality of reference

    • if you are reading some data from a database and computing something simple from it, such as a summation, try to get the database server to do it for you

    • If you are searching or sorting some data, instead of getting it all into memory, ask the database to do it for you and retrieve the results

Last updated