The Heart of AURA

The AURA Project Data Store was designed to be a distributed, scalable, reliable data store. It serves as a repository for item data and metadata, as well as for the user information and attention data that recommenders for various item types rely on. Our aim is to store information about millions of items and potentially billions of attention data points.

The AURA Project Data Store is composed of three parts: data store heads, partition clusters, and replicants.

You can see how these parts are organized in the diagram below.

A data store is started by starting each of its separate components. Each component of the data store finds the components to which it should be connected using a Jini Service Registrar. For example, a partition cluster will search for data store heads with which it should register itself. There is no global configuration for the data store, only configurations for the individual pieces.
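The startup lookup described above can be sketched as follows. This is a minimal illustration, not the AURA code: Registrar is a hypothetical in-memory stand-in for a Jini service registrar (it does not use the Jini API), and the DataStoreHead and PartitionCluster classes and their methods are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical in-memory stand-in for a service registrar (not the Jini API).
class Registrar {
    private final List<Object> services = new CopyOnWriteArrayList<>();

    void register(Object service) {
        services.add(service);
    }

    // Return every registered service that is an instance of the requested type.
    <T> List<T> lookup(Class<T> type) {
        List<T> matches = new ArrayList<>();
        for (Object s : services) {
            if (type.isInstance(s)) {
                matches.add(type.cast(s));
            }
        }
        return matches;
    }
}

// Hypothetical data store head that keeps track of the clusters registered with it.
class DataStoreHead {
    final List<PartitionCluster> clusters = new ArrayList<>();

    void registerCluster(PartitionCluster pc) {
        clusters.add(pc);
    }
}

// Hypothetical partition cluster with only local configuration (its own name).
class PartitionCluster {
    final String name;

    PartitionCluster(String name) {
        this.name = name;
    }

    // On startup, find all heads through the registrar and register with each.
    // Returns the number of heads found.
    int start(Registrar registrar) {
        List<DataStoreHead> heads = registrar.lookup(DataStoreHead.class);
        for (DataStoreHead head : heads) {
            head.registerCluster(this);
        }
        return heads.size();
    }
}

class RegistrarDemo {
    public static void main(String[] args) {
        Registrar registrar = new Registrar();
        registrar.register(new DataStoreHead());
        PartitionCluster pc = new PartitionCluster("cluster-0");
        System.out.println("heads found: " + pc.start(registrar));
    }
}
```

Note that the partition cluster is handed only the registrar; it learns about the heads at run time, which mirrors the absence of a global configuration.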

The data store is meant to maintain itself, so each component continuously monitors the service registrar, noting when components of the data store are added or removed, and acting accordingly. For example, if a replicant sees the partition cluster responsible for it disappear and then reappear, it will reconnect itself to the partition cluster. The Data Store relies on the underlying grid infrastructure to help restart services when they terminate unexpectedly. All of this means that failures are, for the most part, self-correcting. Our aim is a data store that is hard to shut off.
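The replicant example above can be sketched as a listener on registry events. This is an illustration under stated assumptions rather than the AURA implementation: RegistryListener, WatchedRegistry, and ReplicantMonitor are hypothetical names, and the registry is a simple in-memory object instead of a Jini service registrar.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical callback interface for components that watch the registry.
interface RegistryListener {
    void serviceAdded(String serviceId);
    void serviceRemoved(String serviceId);
}

// Hypothetical registry that notifies listeners when services come and go.
class WatchedRegistry {
    private final Set<String> present = new HashSet<>();
    private final List<RegistryListener> listeners = new CopyOnWriteArrayList<>();

    void addListener(RegistryListener listener) {
        listeners.add(listener);
    }

    void add(String serviceId) {
        // Notify only on an actual change in membership.
        if (present.add(serviceId)) {
            for (RegistryListener l : listeners) {
                l.serviceAdded(serviceId);
            }
        }
    }

    void remove(String serviceId) {
        if (present.remove(serviceId)) {
            for (RegistryListener l : listeners) {
                l.serviceRemoved(serviceId);
            }
        }
    }
}

// Replicant-side monitor: reconnects whenever its partition cluster appears.
class ReplicantMonitor implements RegistryListener {
    private final String clusterId;
    private boolean connected;
    private int connectCount;

    ReplicantMonitor(String clusterId) {
        this.clusterId = clusterId;
    }

    @Override
    public void serviceAdded(String serviceId) {
        if (serviceId.equals(clusterId) && !connected) {
            connected = true;   // a real replicant would re-register here
            connectCount++;
        }
    }

    @Override
    public void serviceRemoved(String serviceId) {
        if (serviceId.equals(clusterId)) {
            connected = false;  // wait for the cluster to come back
        }
    }

    boolean isConnected() {
        return connected;
    }

    int getConnectCount() {
        return connectCount;
    }
}
```

The monitor never polls or retries on its own; it simply reacts to the registry, leaving the restart of a failed cluster to whatever infrastructure hosts it.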

The data store is also meant to grow itself as necessary. When a replicant begins to get too full (i.e., when queries against the replicant start to take too long), the partition cluster undertakes the task of splitting the replicant into two new replicants. Once a replicant has been split in two, a new partition cluster can be started to manage the new replicant. All of this can be done while the Data Store is under load, so it's never necessary to stop the Data Store (and the clients using the Data Store!) to add more capacity.
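One way to picture the split is sketched below. The text does not say how items are divided between the two new replicants, so this example assumes a scheme in which each replicant owns the keys whose hash codes share a low-order bit prefix, and a split extends that prefix by one bit. The Replicant and Splitter classes, the needsSplit threshold check, and the prefix scheme itself are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical replicant that owns all keys whose hash shares a bit prefix.
class Replicant {
    final int prefix;      // value of the low-order hash bits owned by this replicant
    final int prefixBits;  // how many low-order bits make up the prefix
    final Map<String, String> items = new HashMap<>();

    Replicant(int prefix, int prefixBits) {
        this.prefix = prefix;
        this.prefixBits = prefixBits;
    }

    boolean owns(String key) {
        int mask = (1 << prefixBits) - 1;
        return (key.hashCode() & mask) == prefix;
    }
}

class Splitter {
    // Assumed trigger: average query time has grown past a configured limit.
    static boolean needsSplit(long avgQueryMillis, long limitMillis) {
        return avgQueryMillis > limitMillis;
    }

    // Divide one replicant into two by looking at one more bit of each key's hash.
    static Replicant[] split(Replicant r) {
        int bits = r.prefixBits + 1;
        Replicant low = new Replicant(r.prefix, bits);
        Replicant high = new Replicant(r.prefix | (1 << r.prefixBits), bits);
        for (Map.Entry<String, String> e : r.items.entrySet()) {
            Replicant target = low.owns(e.getKey()) ? low : high;
            target.items.put(e.getKey(), e.getValue());
        }
        return new Replicant[] { low, high };
    }
}

class SplitDemo {
    public static void main(String[] args) {
        Replicant root = new Replicant(0, 0);  // a zero-bit prefix owns every key
        for (int i = 0; i < 1000; i++) {
            root.items.put("item-" + i, "data");
        }
        if (Splitter.needsSplit(650, 500)) {
            Replicant[] halves = Splitter.split(root);
            System.out.println(halves[0].items.size() + " + " + halves[1].items.size());
        }
    }
}
```

Because the split only copies items out of the original replicant, the original can keep answering queries until the two halves are ready, which is consistent with splitting under load.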

To give you some idea of the scalability of the current Data Store: The Music Explaura is supported by a 16-replicant Data Store that can handle more than 14,000 concurrent users performing typical music recommendation tasks, with response times under 500 ms at the client.