BangDB - Embeddable Flavor

BangDB - Overview

BangDB is a multi-flavored, distributed key-value NoSQL data store. Its goal is to be a fast, reliable, robust, scalable and easy-to-use data store for the various data management services that applications require. A few highlights of the db are:

Key Value Store

BangDB is a key-value store. It can run in memory or be backed by disk/SSD to handle large amounts of data economically. BangDB is a fully transactional database: it guarantees full ACID for all operations executed within a transaction boundary, using OCC (optimistic concurrency control) and locks to achieve this. The database supports write-ahead logging, which also ensures that the db recovers efficiently from a crash. Access methods (index support for keys) are provided through BTree and Ext Hash implementations.

The db is written for high performance, so several of its components optimize resource usage to ensure high throughput and low latency. The db implements its own buffer pool, buffer management system, quasi-adaptive page prefetch and flush mechanism, slab allocator and many other optimizations to sustain high performance under stress or load on commodity hardware. Because the db can run fully in memory or backed by disk/SSD, users can scale economically.
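
As a rough illustration of what a buffer pool with a flush mechanism does, here is a minimal sketch. It is not BangDB's implementation: the `BufferPool` class, the LRU eviction policy and the dict standing in for the disk are all assumptions made for this example.

```python
from collections import OrderedDict


class BufferPool:
    """Toy page cache: fixed capacity, LRU eviction (an assumed policy),
    dirty pages are written back to 'disk' only when evicted or flushed."""

    def __init__(self, capacity, disk):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._disk = disk            # dict page_id -> bytes, stands in for a file
        self._pages = OrderedDict()  # page_id -> [data, dirty]; oldest entry first

    def read(self, page_id):
        if page_id in self._pages:
            self._pages.move_to_end(page_id)  # mark as most recently used
        else:
            self._admit(page_id, self._disk.get(page_id, b""), dirty=False)
        return self._pages[page_id][0]

    def write(self, page_id, data):
        if page_id in self._pages:
            self._pages[page_id] = [data, True]
            self._pages.move_to_end(page_id)
        else:
            self._admit(page_id, data, dirty=True)

    def _admit(self, page_id, data, dirty):
        # Make room first; a dirty victim is written back before it is dropped.
        while len(self._pages) >= self._capacity:
            victim, (victim_data, victim_dirty) = self._pages.popitem(last=False)
            if victim_dirty:
                self._disk[victim] = victim_data
        self._pages[page_id] = [data, dirty]

    def flush(self):
        # Write every dirty page back and mark it clean.
        for page_id, entry in self._pages.items():
            if entry[1]:
                self._disk[page_id] = entry[0]
                entry[1] = False
```

The point of the sketch is that writes touch memory only; the disk is written when a page has to leave the pool or when a flush is requested.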

BangDB supports a rich API set. Apart from simple CRUD operations, it provides flexible range query support (when BTree is used as the access method). Using the scan API, the user can run a wide variety of queries over the desired data set. BangDB also provides a range of configuration parameters that can be used to run the db in a suitable manner and tune it further for a given environment.
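
To make the idea of CRUD plus range scans over an ordered index concrete, the following is a small sketch. The `OrderedStore` class and its method names are hypothetical, not the BangDB API, and a sorted list stands in for the BTree.

```python
import bisect


class OrderedStore:
    """Toy ordered key-value store; a sorted key list stands in for a BTree index."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def delete(self, key):
        if key in self._values:
            del self._values[key]
            del self._keys[bisect.bisect_left(self._keys, key)]

    def scan(self, start, end, limit=None):
        """Return (key, value) pairs with start <= key <= end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_right(self._keys, end)
        keys = self._keys[lo:hi]
        if limit is not None:
            keys = keys[:limit]
        return [(k, self._values[k]) for k in keys]
```

A range scan only makes sense when the index keeps keys ordered, which is why the text ties it to the BTree access method rather than to hashing.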

BangDB comes in many flavors

BangDB can be used in several ways to suit different needs, without changing application logic or code. It is easy to use as part of a process, in a network client-server model, or as an in-memory data grid (p2p, with SSD/disk backing as an option). The following flavors are supported:

Embedded (In Proc)

The db becomes part of the process (hence embedded) and provides db services to the application directly. This flavor is good for apps that want to access data in the fastest possible manner, without incurring network overhead. Typical use cases include caching a product catalog at the application level, storing semi-static data near the application, app-specific data, and local computational data.

Client Server Model

The db runs as a network service and clients access it over the network. Replication to secondaries (slaves) can be configured as needed, and the db ensures that data is replicated according to that setting. BangDB syncs and replicates data without halting server operation; even a new slave can be added to the cluster while the server continues to run. BangDB addresses the C10K problem, as it can support thousands of concurrent active connections without incurring much overhead. This model is good for sharing data among multiple apps or multiple instances of an app. Typical use cases for this flavor are a cache on top of a database and a network data store.

Distributed Data Grid / Elastic Cache

This model presents a single-machine view of the entire data cluster, where each node runs an instance of BangDB. The data is distributed across the cluster, and each node serves a fraction of the overall data. Machines can be added to or removed from the cluster without affecting the overall SLA of the data services. A typical use case is a shared, distributed data cache that gives applications performance, throughput, scalability and high availability. It can also be used by applications that need to store and analyze large amounts of data in the fastest possible manner.
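
One common way to spread keys over nodes so that adding or removing a machine disturbs little data is consistent hashing. The text does not say which scheme BangDB uses, so the sketch below is only an illustration of the general idea; `HashRing`, the virtual-node count and the node names are all assumptions.

```python
import bisect
import hashlib


def _hash(text):
    # Stable hash (unlike Python's built-in hash(), which varies per process).
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring: each node owns several points ('virtual nodes')."""

    def __init__(self, nodes=(), vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def node_for(self, key):
        if not self._ring:
            raise LookupError("ring has no nodes")
        # First point at or after the key's hash, wrapping around at the end.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]
```

With this scheme, adding a node moves only the keys that the new node takes over; every other key stays on the node that already held it.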

Persistent data store (not just caches)

BangDB can be configured to be used purely as a cache or as a persistent store. When run as a persistent store, data is flushed to disk regularly to free up space in the buffer pool. The continuous, sequential log flush ensures data durability even though the data itself is usually not written to disk until needed.

The db offers finer-grained control over data durability than many other dbs. For example, the user can set the log flush frequency in milliseconds. This is critical because, in case of a process crash or a similar event, data loss is minimal and the db recovers as much data as possible. BangDB implements a variant of write-ahead logging for data durability while keeping the impact on performance as small as possible.
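
The trade-off behind a log flush frequency can be sketched as follows: records wait in memory and are made durable at most one interval later, so a crash loses at most the records appended since the last flush. This is a simplified model, not BangDB code; `GroupFlushLog`, the injected clock and the list standing in for the on-disk log are assumptions, and a real system would typically flush from a background thread rather than on append.

```python
class GroupFlushLog:
    """Toy log that batches records and flushes them on a time interval."""

    def __init__(self, flush_interval_ms, clock):
        self._interval = flush_interval_ms
        self._clock = clock      # callable returning the current time in ms
        self._pending = []       # appended but not yet durable
        self.durable = []        # stands in for the log file on disk
        self._last_flush = clock()

    def append(self, record):
        self._pending.append(record)
        # Flush once the configured interval has elapsed since the last flush.
        if self._clock() - self._last_flush >= self._interval:
            self.flush()

    def flush(self):
        self.durable.extend(self._pending)
        self._pending.clear()
        self._last_flush = self._clock()

    def crash(self):
        """Simulate a process crash: pending records are lost, durable ones survive."""
        lost = list(self._pending)
        self._pending.clear()
        return lost
```

A shorter interval narrows the window of possible loss at the cost of more frequent disk writes; a longer one does the opposite.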

Transactional

The db supports ACID with the highest degree of isolation. It implements optimistic concurrency control (OCC) coupled with write-ahead logging to ensure full ACID for single or multiple operations across many tables within a transaction boundary. BangDB leverages the cores on the machine by using parallel serializability validation, allowing multiple transactions to execute concurrently while keeping the ACID promise intact. Hence BangDB performs very well in transactional mode too. Transaction support is offered as a config parameter, and the same db files can be run in transactional or non-transactional mode. Please see more on transactions in BangDB.
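
For readers unfamiliar with optimistic concurrency control, the sketch below shows the basic read, validate, write cycle. It is a single-threaded illustration under stated assumptions, not BangDB's algorithm: `VersionedStore`, `Transaction` and `ConflictError` are hypothetical names, validation here is serial rather than parallel, and a real engine would make the validate-and-write step atomic.

```python
class ConflictError(Exception):
    """Raised when a transaction read data that another transaction changed."""


class VersionedStore:
    def __init__(self):
        self._data = {}  # key -> (value, version)

    def read(self, key):
        return self._data.get(key, (None, 0))

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self._store = store
        self._reads = {}   # key -> version observed at first read
        self._writes = {}  # key -> new value, buffered until commit

    def get(self, key):
        if key in self._writes:  # read your own uncommitted write
            return self._writes[key]
        value, version = self._store.read(key)
        self._reads.setdefault(key, version)
        return value

    def put(self, key, value):
        self._writes[key] = value

    def commit(self):
        # Validation: everything this transaction read must be unchanged.
        for key, seen in self._reads.items():
            if self._store.read(key)[1] != seen:
                raise ConflictError(key)
        # Write phase: install buffered writes and bump versions.
        for key, value in self._writes.items():
            _, version = self._store.read(key)
            self._store._data[key] = (value, version + 1)
```

No locks are held while a transaction runs; conflicts are detected only at commit, and the losing transaction is expected to retry.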

Scales to available memory, elastic in nature, add machines to scale linearly

The design of BangDB uses the available and allocated memory to the fullest. In fact, reserving more memory for the db yields better performance for high volumes of data. To handle even bigger data (multiple terabytes or more), however, adding more machines helps. With the elastic data grid, the user only has to add machines to scale linearly to the new load.

Cost consciousness is critical to the design, so purposing, re-purposing, provisioning, growing and shrinking must be done efficiently and without affecting the overall system. The BangDB elastic cache is therefore designed to tolerate high internal churn.

High performance, high concurrency

BangDB is highly concurrent and runs operations in parallel as much as possible. The design techniques used in write-ahead logging, the buffer pool and the background workers have allowed BangDB to achieve very high performance with a small amount of code. As of now, per our benchmark analysis, BangDB runs faster than Oracle's Berkeley DB and Google's LevelDB in terms of IOPS.

Robust, crash proof, available, resilient

The BangDB core implements a write-ahead log and offers it as part of the db configuration. When data durability is required, the user should enable the log and set the log flush frequency as needed. The db takes care of writing the log to disk frequently, so if the db crashes before the data itself has been written to disk, BangDB recovers the data from the log when restarted.
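
Recovery from a write-ahead log amounts to replaying the logged operations on restart. The following is a minimal sketch of that idea, not BangDB's log format or API: `LoggedStore` and the JSON-lines file are assumptions, and for simplicity it syncs the log on every operation instead of on a configurable interval.

```python
import json
import os


class LoggedStore:
    """Toy in-memory store whose state can be rebuilt from an append-only log."""

    def __init__(self, log_path):
        self._log_path = log_path
        self._data = {}
        self._recover()
        self._log = open(log_path, "a", encoding="utf-8")

    def _recover(self):
        if not os.path.exists(self._log_path):
            return
        with open(self._log_path, encoding="utf-8") as log:
            for line in log:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    # Stop at a torn record; a real implementation would also
                    # truncate the damaged tail before appending again.
                    break
                if record["op"] == "put":
                    self._data[record["key"]] = record["value"]
                else:
                    self._data.pop(record["key"], None)

    def _append(self, record):
        # Log first, then apply: the change is durable before it becomes visible.
        self._log.write(json.dumps(record) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())

    def put(self, key, value):
        self._append({"op": "put", "key": key, "value": value})
        self._data[key] = value

    def delete(self, key):
        self._append({"op": "del", "key": key})
        self._data.pop(key, None)

    def get(self, key):
        return self._data.get(key)

    def close(self):
        self._log.close()
```

Opening a new `LoggedStore` on the same log path after a crash rebuilds the in-memory state from whatever reached the log.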

Runs on commodity hardware

BangDB is designed to run on commodity hardware. It can run even with a very small amount of memory committed to it. In practice, the user may allocate as much memory as is needed or available.

Consistent vs available

A write to a particular node is always consistent. Across the cluster, however, the user has more than one option: BangDB can be set to be ACID within a node of the cluster and eventually consistent across the cluster. This is available in the in-memory data grid version of the db.

Easy to use, deploy and manage

All flavors of BangDB have the same API. The flavor is effectively abstracted away from the client, which always sees a single BangDB to operate with. Since BangDB provides the familiar get, put and delete API, it is straightforward for developers to start working with the db.
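
The benefit of a flavor-independent API is that application code is written once against one interface. The sketch below illustrates the pattern only; `KeyValueStore`, `InProcessStore` and `ForwardingStore` are hypothetical names, and `ForwardingStore` merely stands in for a network client by delegating to another store.

```python
from abc import ABC, abstractmethod


class KeyValueStore(ABC):
    """The one interface the application codes against."""

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def delete(self, key): ...


class InProcessStore(KeyValueStore):
    """Embedded-style store: data lives inside the calling process."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


class ForwardingStore(KeyValueStore):
    """Stand-in for a network client: forwards each call and counts requests."""

    def __init__(self, backend):
        self._backend = backend
        self.requests = 0

    def put(self, key, value):
        self.requests += 1
        self._backend.put(key, value)

    def get(self, key):
        self.requests += 1
        return self._backend.get(key)

    def delete(self, key):
        self.requests += 1
        self._backend.delete(key)


def cache_catalog(store, products):
    """Application code: identical regardless of which store it is given."""
    for sku, name in products.items():
        store.put(sku, name)
    return [store.get(sku) for sku in sorted(products)]
```

Here `cache_catalog` runs unchanged against either store, which is the property the text describes for the BangDB flavors.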

BangDB is easy to manage, as it does not require a dedicated db admin. All flavors are self-managed. The elastic cache will, however, have an admin portal for viewing stats and the health of the cluster.