2014年12月29日星期一

MongoDB Notes Part III

Sharding Introduction 

Sharding is a method of storing data across multiple machines. MongoDB uses sharding to support large data set and high throughput operations.

Two approaches to address the scales
  • vertical scaling
    • adds more CPU and storage resources to increase capacity.
      • As a result there is a practical maximum capacity for vertical scaling.
  • sharding (horizontal scaling)
    • divides data set and distributes the data over multiple servers, or shards.
      • sharding reduces the number of operations each shard handles.
      • sharding reduces the amount of data that each serve needs to store.

Sharding in MongoDB


Sharded cluster has three components: 
  • Shards
    • store the data, each shard is a replica set or a single mongod instance.
  • Query Routers
    • interface with client and direct operation to appropriate shard or shards.
  • Config servers
    • store the cluster’s metadata, which contains the mapping of clusters’ data to shards. There are exactly three config servers.
    • uses two phase commit to confirm immediate consistency and reliability.
    • clusters become inoperable without the cluster metadata, always ensure config servers are available.

Data Partitioning

Sharding partitions a collection’s data by the shard key.
A shard key is either an indexed field or an indexed compound field that exists in every document in the collection. MongoDB divides the shard key values into chunks and distributes them evenly on shards. Shard keys are immutable and cannot be changed after insertion.

Range Based Sharding
  • divides the data set into ranges determined by the shard key values.
  • good for range queries, but the data might not evenly distributed.

Hash Based Sharding
  • computes the hash of a field’s value and then uses the hashes to create chunks.
  • data is distributed more evenly, but the efficiency of range queries goes down.

Maintaining a Balanced Data Distribution

Splitting: A background process that keeps chunk from growing too large. When a chunk grows beyond a specified chunk size, MongoDB splits the chunk in half.

Balancing: A background process that migrate chunks.
  • First, the destination shard is sent all the documents in the chunk of origin shard.
  • Second, the destination shard captures and applies all changes to the data during the migration.
  • Finally, the metadata regarding the location of the chunk on config server are updated.
  • MongoDB removes all chunks on origin shard after the migration is successful.

Broadcast Operations and Target Operations



  • broadcast queries to all shards unless the mongos can determine the single or subset shards to process.
  • The remove( ) is always broadcast.
  • All insert( ) operation targets to one shard.

没有评论:

发表评论