Fight_PY: February 2015

Wednesday, February 4, 2015

MongoDB week4 Notes

In order to use an index in MongoDB, a query must specify a leftmost prefix of the index's fields. The order of the fields in a compound index matters.
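The leftmost-prefix rule can be sketched in plain JavaScript, with no MongoDB required. `usablePrefix` is a hypothetical helper that only models equality filters. The real query planner also handles ranges, sorts and more, so treat this as an illustration of the rule rather than of the planner.

```javascript
// Hypothetical helper, not a MongoDB API: given the fields of a compound
// index (in index order) and the fields a query filters on by equality,
// return the leading index fields the query supplies. An empty result
// means the query gives no leftmost prefix of the index.
function usablePrefix(indexFields, queryFields) {
  const queried = new Set(queryFields); // order inside the query does not matter
  const prefix = [];
  for (const field of indexFields) {
    if (!queried.has(field)) break; // stop at the first index field the query skips
    prefix.push(field);
  }
  return prefix;
}

const index = ["student_id", "class"]; // like { student_id : 1, class : -1 }
console.log(JSON.stringify(usablePrefix(index, ["student_id"])));          // ["student_id"]
console.log(JSON.stringify(usablePrefix(index, ["class", "student_id"]))); // ["student_id","class"]
console.log(JSON.stringify(usablePrefix(index, ["class"])));               // []
```

The order of fields in the query is irrelevant. Only the order of fields in the index decides which queries can use it.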

db.students.ensureIndex( { student_id : 1 } ) -> create the index on student_id in increasing order

db.students.ensureIndex( { student_id : 1, class : -1 } ) -> create a compound index

db.system.indexes.find( ) -> list all the indexes in the current database; by default every collection has an index on the _id field

db.students.getIndexes( ) -> list the indexes of the students collection

db.students.dropIndex( { student_id : 1 } ) -> drop the created index

MongoDB allows you to create an index on a field that holds an array; such an index is called a multikey index.

MongoDB allows a compound index over an array field and a scalar field, but not over two array fields in the same document (parallel arrays).
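The following plain-JavaScript sketch shows how a compound multikey index expands one document into index keys. Each array element produces one key, and the scalar field is repeated in every key. A document with two indexed array fields is rejected. `indexKeys` is a hypothetical helper. It only handles top-level fields and ignores dotted paths, nested arrays and empty arrays.

```javascript
// Hypothetical model of compound multikey key generation (not MongoDB code).
function indexKeys(doc, fields) {
  const arrayFields = fields.filter((f) => Array.isArray(doc[f]));
  if (arrayFields.length > 1) {
    // MongoDB refuses to index a document with parallel arrays.
    throw new Error(`parallel arrays: ${arrayFields.join(", ")}`);
  }
  const [arrayField] = arrayFields; // undefined when no indexed field is an array
  const values = arrayField === undefined ? [undefined] : doc[arrayField];
  // One key per array element; a missing field is indexed as null.
  return values.map((v) =>
    fields.map((f) => (f === arrayField ? v : doc[f] ?? null))
  );
}

console.log(JSON.stringify(indexKeys({ student_id: 7, scores: [90, 75] }, ["student_id", "scores"])));
// [[7,90],[7,75]]
```

Calling `indexKeys({ a: [1, 2], b: [3, 4] }, ["a", "b"])` throws, just as inserting such a document into a collection with a `{ a : 1, b : 1 }` index fails.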

db.stuff.ensureIndex( { thing : 1 }, { unique : true } ) -> create unique index, each key can only appear once

db.stuff.ensureIndex( { thing : 1 }, { unique : true, dropDups : true } ) -> drop the duplicates except for one
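The sketch below simulates building a unique index over existing documents in plain JavaScript. It assumes that, as the docs of that era described, `dropDups` keeps the first occurrence of each key and removes later ones. The `dropDups` option was removed in MongoDB 3.0. `buildUniqueIndex` is a hypothetical name. A missing field is indexed as null, so a non-sparse unique index allows only one document without the field.

```javascript
// Hypothetical simulation of unique-index creation (not MongoDB code).
function buildUniqueIndex(docs, field, { dropDups = false } = {}) {
  const index = new Map(); // key value -> _id of the document that owns it
  const kept = [];
  for (const doc of docs) {
    const key = doc[field] ?? null; // a missing field is indexed as null
    if (index.has(key)) {
      if (!dropDups) throw new Error(`duplicate key ${field}: ${key}`);
      continue; // dropDups: discard every later document with the same key
    }
    index.set(key, doc._id);
    kept.push(doc);
  }
  return kept;
}

const stuff = [
  { _id: 1, thing: "apple" },
  { _id: 2, thing: "pear" },
  { _id: 3, thing: "apple" },
];
console.log(buildUniqueIndex(stuff, "thing", { dropDups: true }).map((d) => d._id)); // [ 1, 2 ]
```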

sparse index: only indexes the documents that contain the indexed field

To choose which index to use for a query, MongoDB runs the candidate plans in parallel against the real data, picks the one that finishes first, and remembers that choice for similar queries.
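The plan race can be modelled in a toy JavaScript sketch. Each candidate plan is a generator that yields one step per examined entry. The plans advance one step per round, and the first plan to fill a batch or run out of results wins. The winner is then cached by query shape. `race`, `choosePlan`, the batch size and the data are all hypothetical. The real planner's trial rules and cache eviction are more involved.

```javascript
// Toy model of plan racing (not the real MongoDB planner).
function race(plans, batchSize) {
  const runners = Object.entries(plans).map(([name, makeIter]) => ({
    name,
    iter: makeIter(),
    found: 0,
  }));
  if (runners.length === 0) throw new Error("no candidate plans");
  for (;;) {
    // Advance every plan by one step per round, as if they ran side by side.
    for (const r of runners) {
      const { value, done } = r.iter.next();
      if (done) return r.name; // ran out of results first
      if (value && ++r.found === batchSize) return r.name; // filled a batch first
    }
  }
}

const planCache = new Map(); // query shape -> winning plan name

function choosePlan(shape, plans, batchSize) {
  if (!planCache.has(shape)) planCache.set(shape, race(plans, batchSize));
  return planCache.get(shape);
}

// Hypothetical data: 1000 documents, 50 distinct student_id values.
const docs = Array.from({ length: 1000 }, (_, i) => ({ student_id: i % 50 }));
const target = 7;
function* collScan() {
  for (const d of docs) yield d.student_id === target; // examines every document
}
function* indexScan() {
  // Stands in for an index: each step lands directly on a matching document.
  for (const d of docs) if (d.student_id === target) yield true;
}

console.log(choosePlan("student_id:eq", { collScan, indexScan }, 10)); // indexScan
```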

db.students.stats( ) -> statistics for the students collection, including the size of each index

Index Cardinality (number of index entries compared with the number of documents)

Regular: 1 : 1
Sparse: <= number of documents
Multikey: can be > number of documents (one entry per array element)
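The three cardinality cases can be checked with a small JavaScript counter. `countIndexEntries` is a hypothetical helper for top-level fields only. It counts one entry per document for a regular index and skips documents missing the field when sparse. For arrays it counts one entry per distinct element, with an empty array still getting one entry.

```javascript
// Hypothetical counter of index entries (not MongoDB code).
function countIndexEntries(docs, field, { sparse = false } = {}) {
  let entries = 0;
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined) {
      if (!sparse) entries += 1; // regular index: missing field indexed as null
      continue; // sparse index: document skipped entirely
    }
    // Multikey: one entry per distinct array element (at least one).
    entries += Array.isArray(value) ? Math.max(new Set(value).size, 1) : 1;
  }
  return entries;
}

const scalars = [{ x: 1 }, { x: 2 }, {}];
console.log(countIndexEntries(scalars, "x"));                   // 3 (regular, 1 : 1)
console.log(countIndexEntries(scalars, "x", { sparse: true })); // 2 (sparse, <= documents)
console.log(countIndexEntries([{ tags: ["a", "b", "c"] }, { tags: ["a"] }], "tags")); // 4 (> 2 documents)
```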

Use hint( ) to manually tell MongoDB what index to use

ensureIndex( { "location" : "2d" } ) -> 2D geospatial index

find( { location : { $near : [x, y] } } ) -> documents sorted by distance from [x, y], nearest first

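db.places.find( { location : { $near : {
    $geometry : { type : 'Point', coordinates : [x, y] },
    $maxDistance : 2000
} } } ) -> spherical query (needs a "2dsphere" index): points within 2000 meters of [x, y], where GeoJSON coordinates are [longitude, latitude]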
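What the spherical `$near` query computes can be approximated in JavaScript with the haversine formula. The sketch keeps points within `maxDistance` meters and sorts them nearest first. The Earth radius constant and the helper names `haversineMeters` and `near` are assumptions of this sketch, so MongoDB's own distances may differ slightly. A legacy "2d" index uses flat Euclidean distance instead.

```javascript
const EARTH_RADIUS_M = 6371008.8; // mean Earth radius; an assumption of this sketch

// Great-circle distance between two [longitude, latitude] points, in meters.
function haversineMeters([lng1, lat1], [lng2, lat2]) {
  const rad = (deg) => (deg * Math.PI) / 180;
  const dLat = rad(lat2 - lat1);
  const dLng = rad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(lat1)) * Math.cos(rad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

// Hypothetical stand-in for $near + $maxDistance over GeoJSON points.
function near(places, point, maxDistance) {
  return places
    .map((p) => ({ ...p, dist: haversineMeters(p.location.coordinates, point) }))
    .filter((p) => p.dist <= maxDistance)
    .sort((a, b) => a.dist - b.dist);
}

const places = [
  { name: "a", location: { type: "Point", coordinates: [0, 0.01] } },  // about 1.1 km away
  { name: "b", location: { type: "Point", coordinates: [0, 0.001] } }, // about 111 m away
  { name: "c", location: { type: "Point", coordinates: [0, 0.1] } },   // about 11 km away
];
console.log(near(places, [0, 0], 2000).map((p) => p.name)); // [ 'b', 'a' ]
```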

db.sentences.ensureIndex( { 'words' : 'text' } ) -> create a text index to support full-text search

db.sentences.find( { $text : { $search : 'dog moss' } } ) -> sentences containing "dog" or "moss"
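The OR semantics of `$search` with several words can be sketched in JavaScript. A sentence matches if it contains any search term. This is a simplification with hypothetical helpers `tokenize` and `textSearch`. A real text index also applies stemming and stop words, which the sketch does not.

```javascript
// Hypothetical simplified text search (no stemming, no stop words).
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function textSearch(docs, field, search) {
  const terms = new Set(tokenize(search));
  // Terms are ORed: one matching word is enough.
  return docs.filter((doc) => tokenize(doc[field]).some((w) => terms.has(w)));
}

const sentences = [
  { words: "The dog barks" },
  { words: "Moss grows on rocks." },
  { words: "A cat sleeps" },
];
console.log(textSearch(sentences, "words", "dog moss").map((s) => s.words));
// [ 'The dog barks', 'Moss grows on rocks.' ]
```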

Use mongotop to see which collections mongod spends the most time reading and writing.

mongostat -> per-second statistics of a running mongod

idx miss -> how many times an index was not in memory when it was needed; an important performance factor

Sharding: split large data across several mongod instances (shards), put a mongos router in front of them, and let the application talk only to mongos. mongos uses the shard_key to decide which shards receive a query. An insert must contain the entire shard_key. For update and remove queries, if the shard_key is not given, mongos broadcasts the query to all shards.
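Routing by shard key can be sketched with range-based chunks in plain JavaScript. A query that contains the shard key goes to the single shard owning that range. A query without it is broadcast to every shard, and an insert without it is rejected. The shard key `student_id`, the chunk boundaries and the shard names are hypothetical.

```javascript
// Hypothetical chunk table: ranges of shard-key values mapped to shards.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 5000, shard: "shard1" },
  { min: 5000, max: Infinity, shard: "shard2" },
];

function targetShards(query, shardKey) {
  if (!(shardKey in query)) {
    return [...new Set(chunks.map((c) => c.shard))]; // no shard key: broadcast
  }
  const v = query[shardKey];
  return [chunks.find((c) => v >= c.min && v < c.max).shard]; // targeted
}

function routeInsert(doc, shardKey) {
  if (!(shardKey in doc)) throw new Error(`insert must include shard key "${shardKey}"`);
  return targetShards(doc, shardKey)[0];
}

console.log(targetShards({ student_id: 1500 }, "student_id")); // [ 'shard1' ]
console.log(targetShards({ name: "Ann" }, "student_id"));      // [ 'shard0', 'shard1', 'shard2' ]
```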

MongoDB week3 Notes

Try to embed data and pre-join it whenever possible, since MongoDB provides no join.

MongoDB does not guarantee consistency across collections; for example, there are no foreign key constraints. Pre-joining (embedding) the data keeps it intact and consistent.

One-to-one relationships

use true linking (store the other document's _id)
embed the document

Things to consider

frequency of access
size of the items, and whether they keep growing
atomicity of the data

One-to-many relationships

when the "many" is large, the best way is true linking (e.g. the people living in a city)
when the "many" is few, we can embed the documents (e.g. the comments in a blog schema)
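The two one-to-many shapes can be written out as plain JavaScript objects. All names and values here are hypothetical. With true linking, each person stores the `_id` of their city, and reading both sides takes two lookups in the application. With embedding, the few comments live inside the post document.

```javascript
// True linking: the "many" side (people) points at the "one" side (city).
const cities = [{ _id: "NYC", name: "New York" }];
const people = [
  { _id: 1, name: "Ann", city: "NYC" },
  { _id: 2, name: "Bob", city: "NYC" },
];

// Application-side "join": like db.people.find( { city : "NYC" } ).
function residentsOf(cityId) {
  return people.filter((p) => p.city === cityId);
}

// Embedding: a small, bounded list stored inside the parent document.
const post = {
  _id: 1,
  title: "Week 3 notes",
  comments: [{ author: "Cy", text: "Thanks!" }],
};

console.log(residentsOf(cities[0]._id).map((p) => p.name)); // [ 'Ann', 'Bob' ]
console.log(post.comments.length); // 1
```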

Many-to-many relationships

when the relationship is actually few-to-few, embed an array of _ids to link the two documents
another way is to embed the documents themselves, but this may not be applicable in some situations; for example, in a student-teacher relationship, we may insert a teacher into the system before he has any students

Multikey Indexes: an index on an array field, which makes it efficient to query many-to-many relations stored as embedded arrays of links in MongoDB
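A minimal JavaScript sketch of the array-of-ids pattern follows. Each student embeds the `_id`s of their teachers. A map built from every array element plays the role of a multikey index on `teachers`, so "students of teacher 0" is a single lookup. A teacher can exist before having any students. The data and names are hypothetical.

```javascript
const teachers = [
  { _id: 0, name: "Smith" },
  { _id: 1, name: "Jones" },
  { _id: 2, name: "Lee" }, // no students yet, still a valid document
];
const students = [
  { _id: 10, name: "Ann", teachers: [0, 1] },
  { _id: 11, name: "Bob", teachers: [0] },
];

// Stand-in for a multikey index on students.teachers: one entry per element.
const byTeacher = new Map();
for (const s of students) {
  for (const t of s.teachers) {
    if (!byTeacher.has(t)) byTeacher.set(t, []);
    byTeacher.get(t).push(s._id);
  }
}

// Like db.students.find( { teachers : 0 } ) served by the index.
console.log(byTeacher.get(0)); // [ 10, 11 ]
console.log(byTeacher.has(teachers[2]._id)); // false
```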

Benefits of embedding

Improved read performance: lower seek latency, since the document is stored sequentially on disk
One round trip to the DB

Tree representations

embed a list of children in the document
embed a list of ancestors in the document
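Here is a sketch of the "list of ancestors" layout in JavaScript, with hypothetical category data. Each node stores its parent and its full ancestor path. Finding all descendants of a node then becomes "every document whose ancestors array contains it", which a multikey index on `ancestors` can serve. The path from the root is already stored in order.

```javascript
const categories = [
  { _id: "home", parent: null, ancestors: [] },
  { _id: "kitchen", parent: "home", ancestors: ["home"] },
  { _id: "knives", parent: "kitchen", ancestors: ["home", "kitchen"] },
  { _id: "garden", parent: "home", ancestors: ["home"] },
];

// Like db.categories.find( { ancestors : id } ).
const descendants = (id) =>
  categories.filter((c) => c.ancestors.includes(id)).map((c) => c._id);

// Breadcrumb from the root down to the node itself.
const path = (id) => [...categories.find((c) => c._id === id).ancestors, id];

console.log(descendants("home")); // [ 'kitchen', 'knives', 'garden' ]
console.log(path("knives"));      // [ 'home', 'kitchen', 'knives' ]
```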

Storing files larger than the 16 MB document limit: GridFS breaks large blobs into pieces stored in MongoDB. It uses two collections. The chunks collection holds the pieces: each document there stores one chunk of the file (255 KB by default, well under 16 MB). The files collection describes each file stored in the chunks collection. Every document in the chunks collection has a files_id that refers to its document in the files collection.
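The split into a files document and numbered chunk documents can be sketched in Node.js with `Buffer`. The helper names `toGridFS` and `fromGridFS` and the string `_id` (a stand-in for an ObjectId) are hypothetical. The file document shows only a subset of the fields a real driver writes. 255 KB is the default chunk size in the GridFS spec.

```javascript
const DEFAULT_CHUNK_SIZE = 255 * 1024; // 261120 bytes

// Split a Buffer into a files document plus chunk documents { files_id, n, data }.
function toGridFS(filename, data, chunkSize = DEFAULT_CHUNK_SIZE) {
  const fileId = `file-${filename}`; // hypothetical stand-in for an ObjectId
  const file = { _id: fileId, filename, length: data.length, chunkSize };
  const chunks = [];
  for (let n = 0, off = 0; off < data.length; n++, off += chunkSize) {
    chunks.push({ files_id: fileId, n, data: data.subarray(off, off + chunkSize) });
  }
  return { file, chunks };
}

// Reassemble the file by sorting chunks on their sequence number n.
function fromGridFS(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((c) => c.data));
}

const blob = Buffer.alloc(600000, 1);
const { file, chunks } = toGridFS("video.bin", blob);
console.log(file.length, chunks.length); // 600000 3
```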

ODM: sits between the application and the driver. You tell the ODM how to handle your classes and hand objects off to it, and it interacts with the driver for you.