What we think is needed (if not inevitable) is a way to store heterogeneous (unruly, unpredictable, complexly structured) data so it can be accessed quickly for processing by programs. The quickest way to access data for processing is to store it in the form in which it's used. By using mmap() under Unix, you can keep data on disk in precisely the form in which it's processed, transparently and persistently.
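A minimal sketch of the idea, assuming a Unix system. The `Counter` layout and the `open_persistent` and `record_run` helpers are hypothetical illustrations, not ColdStore's API (ColdStore itself is written in C++ and does far more). The sketch uses Python's standard `mmap` module, which wraps the same Unix call, and overlays a `ctypes` structure directly on the mapped file. The bytes on disk are therefore the bytes the program reads and updates, with no serialising or parsing step in between.

```python
import ctypes
import mmap
import os


class Counter(ctypes.Structure):
    # Hypothetical record layout, for illustration only (not ColdStore's API).
    _fields_ = [("runs", ctypes.c_uint32), ("total", ctypes.c_uint64)]


def open_persistent(path):
    """Map `path` into memory and overlay a Counter directly on the mapping."""
    size = ctypes.sizeof(Counter)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        if os.fstat(fd).st_size < size:
            # A fresh file is zero-filled, so a new Counter starts at 0/0.
            os.ftruncate(fd, size)
        # On Unix the default is a shared, read/write mapping: stores to the
        # struct land in the file's pages, not in a private copy.
        mm = mmap.mmap(fd, size)
    finally:
        os.close(fd)  # the mapping remains valid after the descriptor closes
    return mm, Counter.from_buffer(mm)


def record_run(path, amount):
    """Hypothetical helper: update the persistent Counter in place."""
    mm, counter = open_persistent(path)
    counter.runs += 1          # no parsing: read and write the mapped bytes
    counter.total += amount
    result = (counter.runs, counter.total)
    del counter                # release the buffer export so close() succeeds
    mm.flush()                 # msync(): push dirty pages to disk now
    mm.close()
    return result
```

Each call reopens the file, as a fresh run of a program would, and finds the previous run's values already in place. The sketch deliberately stores only fixed-width integers; richer, pointer-linked structures raise problems this example does not attempt to address.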
Very few programs run once, produce only what you see now, and then go away. A whole class of applications produces data that is intended to be around for a long time (relative to a single run of the program).
Conventionally, the data is explicitly written out to disk, usually in a special, storage-hardened form. Of course, it then has to be read back in the next time you want to process it.
- Programmer effort is expended in structuring data to fit into homogeneous (fixed-size, fixed-structure) containers (such as records, tuples, indices, tables and databases).
- Program execution time is spent serialising (pickling) data for long-term storage, and deserialising (parsing) it back into the program's address space so it can be operated on.
- Some data (most data) just doesn't fit well into fixed records. Even something as well understood as a Customer record conventionally reserves worst-case space for several address lines (in storage-allocation terms, this is called internal fragmentation) and still can't cope if there's an even worse case nobody foresaw.
- Increasingly, data (in the old database sense) is losing importance relative to text (emails, web pages, SGML/XML documents). Trouble is, databases containing text are hard to index and process, because text doesn't fit well into fixed-size records.
- Textual data is becoming more highly structured: SGML, XML and HTML documents, RFC822-formatted messages (email) and MIME-encapsulated files all have rich formats that are hard to represent, store and process. Serialising and deserialising such data will consume ever more processing time.
- Object Request Brokers (CORBA, ILU, etc.) focus on communication and interoperability, which may or may not matter in a given application but which slows them down. You would never consider using CORBA to store your data for processing ... well, perhaps you would, but I wouldn't: it has to spend too much time finding the data and translating it. ColdStore data is not encoded at all.
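To make the Customer-record point concrete, here is a small sketch, again using Python's `ctypes`, of a fixed record that reserves a hypothetical worst case of four 40-byte address lines. The names and sizes are illustrative assumptions, not taken from any real schema. `wasted_bytes` reports how much of the reserved space a given address leaves unused (internal fragmentation) and refuses an address that exceeds the worst case.

```python
import ctypes

ADDRESS_LINES = 4  # hypothetical worst case reserved by the fixed layout
LINE_WIDTH = 40    # hypothetical bytes reserved per address line


class FixedCustomer(ctypes.Structure):
    # Space for every line is reserved whether or not it is ever used.
    _fields_ = [("address", (ctypes.c_char * LINE_WIDTH) * ADDRESS_LINES)]


def wasted_bytes(lines):
    """Reserved-but-unused bytes when `lines` is stored in a FixedCustomer."""
    encoded = [line.encode("utf-8") for line in lines]
    if len(encoded) > ADDRESS_LINES or any(len(e) > LINE_WIDTH for e in encoded):
        # The "even worse case nobody foresaw": the record cannot hold it.
        raise ValueError("address does not fit the fixed record")
    return ctypes.sizeof(FixedCustomer) - sum(len(e) for e in encoded)
```

For example, `wasted_bytes(["1 High St", "Sydney"])` returns 145: 160 bytes are reserved and only 15 are used. A five-line address raises `ValueError`.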