Wednesday, February 4, 2015

MongoDB Week 4 Notes

To use a compound index, a query must specify a leftmost prefix of the indexed fields. The order of the fields in the index matters.
db.students.ensureIndex( { student_id : 1 } )     -> create an index on student_id in ascending order
db.students.ensureIndex( { student_id : 1, class : -1 } )  -> create a compound index: student_id ascending, then class descending
db.system.indexes.find( )    -> list all indexes in the current database; every collection has a default index on _id
db.students.getIndexes( )    -> list the indexes of the students collection
db.students.dropIndex( { student_id : 1 } )  -> drop the index created above
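The leftmost-prefix rule can be modeled in a few lines of plain JavaScript, the mongo shell's language. This is a toy sketch and not MongoDB's query planner. The hypothetical function usableIndexPrefix only counts how many leading fields of a compound index an equality query covers. It treats 0 as "this index cannot narrow the filter".

```javascript
// Toy model of the leftmost-prefix rule (not MongoDB's query planner).
// indexFields lists the fields of a compound index in order;
// queryFields lists the fields an equality query filters on.
function usableIndexPrefix(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  let prefixLength = 0;
  // Walk the index from the left and stop at the first field the query lacks.
  while (prefixLength < indexFields.length && wanted.has(indexFields[prefixLength])) {
    prefixLength++;
  }
  return prefixLength; // 0 means the index cannot narrow this filter
}

// Models db.students.ensureIndex( { student_id : 1, class : -1 } )
const compoundIndex = ["student_id", "class"];
console.log(usableIndexPrefix(compoundIndex, ["student_id"]));          // 1
console.log(usableIndexPrefix(compoundIndex, ["student_id", "class"])); // 2
console.log(usableIndexPrefix(compoundIndex, ["class"]));               // 0
```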
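MongoDB can index a field whose value is an array; such an index is called a multikey index (one index entry per array element).
A compound index may combine one array field with scalar fields, but not two array fields: MongoDB cannot index a document in which more than one indexed field holds an array ("parallel arrays").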
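The sketch below shows how a compound multikey index could generate its keys for one document. It is a toy model, and the function indexKeys is hypothetical. It only mirrors two rules: an array field contributes one key per element, and a document with arrays in two indexed fields (parallel arrays) is rejected. It does not handle empty arrays or nested fields.

```javascript
// Toy model of multikey key generation (not MongoDB's implementation).
function indexKeys(doc, indexFields) {
  // Reject documents with arrays in more than one indexed field ("parallel arrays").
  const arrayFields = indexFields.filter((f) => Array.isArray(doc[f]));
  if (arrayFields.length > 1) {
    throw new Error("cannot index parallel arrays: " + arrayFields.join(", "));
  }
  // Build keys field by field; an array value fans out into one key per element.
  let keys = [[]];
  for (const field of indexFields) {
    const value = doc[field];
    const values = Array.isArray(value) ? value : [value];
    const next = [];
    for (const key of keys) {
      for (const v of values) next.push([...key, v]);
    }
    keys = next;
  }
  return keys;
}

// Models an index on { a : 1, b : 1 } where b holds an array.
console.log(indexKeys({ a: 1, b: [2, 3] }, ["a", "b"])); // [ [ 1, 2 ], [ 1, 3 ] ]
```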
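db.stuff.ensureIndex( { thing : 1 }, { unique : true } )  -> create a unique index: each key value may appear only once in the collection
db.stuff.ensureIndex( { thing : 1 }, { unique : true, dropDups : true } )  -> while building the unique index, delete all but one document for each duplicated key (the deleted documents are lost; dropDups was removed in MongoDB 3.0)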
db.stuff.ensureIndex( { thing : 1 }, { sparse : true } )  -> sparse index: only documents that contain the indexed field get an index entry
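The sketch below shows how unique and sparse interact. It relies on MongoDB's documented behavior: a non-sparse unique index stores a missing field as null, so only one document may lack the field. A sparse unique index skips such documents instead. buildUniqueIndex is a hypothetical helper that works on in-memory objects.

```javascript
// Toy model: build a unique index on one field of in-memory documents.
// Non-sparse: a missing field is indexed as null. Sparse: the document is skipped.
function buildUniqueIndex(docs, field, { sparse = false } = {}) {
  const seen = new Map(); // index key -> _id of the document that owns it
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (!hasField && sparse) continue; // sparse: no index entry at all
    const key = hasField ? doc[field] : null;
    if (seen.has(key)) {
      throw new Error(`duplicate key ${JSON.stringify(key)} (_id ${doc._id})`);
    }
    seen.set(key, doc._id);
  }
  return seen.size; // number of index entries
}

const stuff = [{ _id: 1, thing: "apple" }, { _id: 2 }, { _id: 3 }];
console.log(buildUniqueIndex(stuff, "thing", { sparse: true })); // 1
try {
  buildUniqueIndex(stuff, "thing");
} catch (e) {
  console.log(e.message); // duplicate key null (_id 3)
}
```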
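To pick an index for a query, MongoDB runs the candidate query plans (for example, one per usable index) in parallel on the real data. It chooses the one that finishes first and caches that choice for later queries of the same shape.
db.students.stats( )  -> statistics for the students collection, including index sizes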
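Here is a simplified sketch of plan racing. It assumes each candidate plan can be advanced one unit of work at a time, so plans are modeled as generators. The first plan to finish wins, and the winner is remembered per query shape (here just the sorted field names). MongoDB's real plan ranking and cache invalidation are more involved. choosePlan, queryShape and work are hypothetical names.

```javascript
// Toy model of racing query plans (a simplification, not MongoDB's planner).
// Each candidate plan is a generator that yields once per unit of work.
const planCache = new Map(); // query shape -> name of the winning plan

// Queries on the same set of fields share a shape, e.g. "student_id".
function queryShape(query) {
  return Object.keys(query).sort().join(",");
}

function choosePlan(query, candidates) {
  const shape = queryShape(query);
  if (planCache.has(shape)) return planCache.get(shape); // reuse remembered winner
  if (candidates.length === 0) throw new Error("no candidate plans");
  const runs = candidates.map((c) => ({ name: c.name, steps: c.run(query) }));
  // Advance every plan one step at a time; the first one to finish wins.
  for (;;) {
    for (const r of runs) {
      if (r.steps.next().done) {
        planCache.set(shape, r.name);
        return r.name;
      }
    }
  }
}

// Hypothetical plans: the index plan needs 3 units of work, the scan needs 10.
function* work(units) {
  for (let i = 0; i < units; i++) yield i;
}
const candidates = [
  { name: "collection scan", run: () => work(10) },
  { name: "index { student_id : 1 }", run: () => work(3) },
];
console.log(choosePlan({ student_id: 5 }, candidates)); // index { student_id : 1 }
```

A second query such as { student_id : 9 } has the same shape, so it reuses the cached winner without racing again.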
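Index cardinality (number of index entries vs. number of documents)
  • Regular: one entry per document (1 : 1)
  • Sparse: at most one entry per document (entries <= documents)
  • Multikey: one entry per array element, so entries can exceed documents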
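The cardinality table can be checked with a toy counter. countIndexEntries is hypothetical and follows three rules. A regular index has one entry per document, and a missing field is stored as null. A sparse index has no entry for a document missing the field. A multikey index has one entry per array element.

```javascript
// Toy count of index entries per index type (simplified model).
function countIndexEntries(docs, field, sparse = false) {
  let entries = 0;
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined) {
      if (!sparse) entries += 1; // regular index: missing field stored as null
    } else if (Array.isArray(value)) {
      entries += Math.max(value.length, 1); // multikey; assume an empty array still gets one entry
    } else {
      entries += 1;
    }
  }
  return entries;
}

const people = [
  { name: "ann", nickname: "annie", tags: ["x", "y", "z"] },
  { name: "bob", tags: ["x"] },
  { name: "cid", tags: ["y"] },
];
console.log(countIndexEntries(people, "name"));           // 3 (regular: 1 : 1)
console.log(countIndexEntries(people, "nickname", true)); // 1 (sparse: <= documents)
console.log(countIndexEntries(people, "tags"));           // 5 (multikey: > documents)
```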
Use hint( ) to tell MongoDB explicitly which index to use, e.g. db.students.find( { student_id : 5 } ).hint( { student_id : 1 } )
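db.places.ensureIndex( { location : '2d' } )  -> 2d geospatial index on flat [x, y] coordinates
db.places.find( { location : { $near : [x, y] } } )  -> documents sorted by distance from [x, y], nearest first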
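The result order of a 2d $near query can be sketched as a sort by flat Euclidean distance. near2d is a hypothetical helper. A real 2d index finds nearby points without scanning every document, and this sketch does not attempt that.

```javascript
// Toy 2d $near: order documents by Euclidean distance from [x, y], closest first.
// Scans every document; a real 2d index avoids that.
function near2d(docs, [x, y]) {
  const dist = (d) => Math.hypot(d.location[0] - x, d.location[1] - y);
  return [...docs].sort((a, b) => dist(a) - dist(b));
}

const places = [
  { name: "far", location: [10, 10] },
  { name: "close", location: [1, 1] },
  { name: "middle", location: [3, 4] },
];
console.log(near2d(places, [0, 0]).map((p) => p.name)); // [ 'close', 'middle', 'far' ]
```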
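db.places.ensureIndex( { location : '2dsphere' } )  -> spherical geospatial index, required for GeoJSON queries such as:
db.places.find( { location : { $near : {
    $geometry : { type : 'Point', coordinates : [ longitude, latitude ] },
    $maxDistance : 2000
} } } )  -> documents within 2000 meters of the point, nearest first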
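Here is a rough model of the GeoJSON $near query: keep the points within maxDistance meters and sort them nearest first. nearWithin and haversineMeters are hypothetical helpers. The haversine formula with a mean Earth radius of 6,371 km is an approximation chosen for this sketch, not necessarily what a 2dsphere index computes internally. Coordinates are [longitude, latitude], as in GeoJSON.

```javascript
// Toy GeoJSON $near with $maxDistance (meters), using the haversine formula.
const EARTH_RADIUS_M = 6371000; // assumed mean Earth radius

function haversineMeters([lng1, lat1], [lng2, lat2]) {
  const rad = (deg) => (deg * Math.PI) / 180;
  const dLat = rad(lat2 - lat1);
  const dLng = rad(lng2 - lng1);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(lat1)) * Math.cos(rad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// Keep documents within maxDistance of point, closest first.
function nearWithin(docs, point, maxDistance) {
  return docs
    .map((doc) => ({ doc, d: haversineMeters(doc.location.coordinates, point) }))
    .filter(({ d }) => d <= maxDistance)
    .sort((a, b) => a.d - b.d)
    .map(({ doc }) => doc);
}

const shops = [
  { name: "11 km north", location: { type: "Point", coordinates: [0, 0.1] } },
  { name: "1.1 km north", location: { type: "Point", coordinates: [0, 0.01] } },
  { name: "here", location: { type: "Point", coordinates: [0, 0] } },
];
console.log(nearWithin(shops, [0, 0], 2000).map((s) => s.name)); // [ 'here', '1.1 km north' ]
```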
db.sentences.ensureIndex( { 'words' : 'text' } )  -> text index on the words field, enables full-text search
db.sentences.find( { $text : { $search : 'dog moss' } } )  -> documents whose words contain 'dog' or 'moss'
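A toy inverted index shows why $search : 'dog moss' matches documents containing either word. buildTextIndex and textSearch are hypothetical. A real text index also applies stemming and stop-word removal. This sketch skips both and only lower-cases the text and splits it on non-letters.

```javascript
// Toy inverted index: each word maps to the _ids of documents containing it.
function buildTextIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const word of doc[field].toLowerCase().split(/[^a-z]+/).filter(Boolean)) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(doc._id);
    }
  }
  return index;
}

// A multi-word search is an OR: union of the matches for each word.
function textSearch(index, search) {
  const ids = new Set();
  for (const word of search.toLowerCase().split(/\s+/).filter(Boolean)) {
    for (const id of index.get(word) || []) ids.add(id);
  }
  return [...ids].sort((a, b) => a - b);
}

const sentences = [
  { _id: 1, words: "The dog sat on the moss" },
  { _id: 2, words: "A cat chased the dog" },
  { _id: 3, words: "Rolling stones gather no moss" },
  { _id: 4, words: "Tree frogs sing" },
];
const textIndex = buildTextIndex(sentences, "words");
console.log(textSearch(textIndex, "dog moss")); // [ 1, 2, 3 ]
```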
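mongotop  -> shows how much time mongod spends reading and writing each collection
mongostat  -> per-second server statistics (inserts, queries, updates, deletes, ...)
idx miss  -> percentage of index accesses where the needed index page was not in memory; an important factor for performance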

Sharding: split a large collection across several mongod instances (shards). A mongos router sits in front of the shards, and the application talks only to mongos. mongos uses the shard key to decide which shard(s) receive each operation. An insert must contain the entire shard key. If an update or remove does not include the shard key, mongos broadcasts it to all shards.
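A toy router can illustrate these shard-key rules. It assumes range-based chunks, with min inclusive and max exclusive. The shard names, the ranges, the shard key student_id and the function route are all hypothetical. Real mongos routing uses the cluster's chunk metadata.

```javascript
// Toy mongos routing by shard-key range (hypothetical shards and ranges).
const chunkRanges = [
  { shard: "shard0", min: -Infinity, max: 1000 },
  { shard: "shard1", min: 1000, max: 2000 },
  { shard: "shard2", min: 2000, max: Infinity },
];

function route(op, doc, shardKey = "student_id") {
  const key = doc[shardKey];
  if (key === undefined) {
    // Inserts must carry the shard key; other operations are broadcast.
    if (op === "insert") throw new Error("insert must contain the shard key");
    return chunkRanges.map((c) => c.shard);
  }
  // Targeted: the single shard whose range holds the key (min inclusive, max exclusive).
  return [chunkRanges.find((c) => key >= c.min && key < c.max).shard];
}

console.log(route("insert", { student_id: 1500 })); // [ 'shard1' ]
console.log(route("remove", { class: "math" }));    // [ 'shard0', 'shard1', 'shard2' ]
```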

MongoDB Week 3 Notes

Prefer embedding data and pre-joining it in the schema, since MongoDB provides no join operation.
MongoDB does not enforce consistency between collections; for example, there are no foreign key constraints. Pre-joining (embedding) related data keeps it intact and consistent.

One-to-one relationships
  • link the documents by _id
  • embed one document in the other
Things to consider
  • frequency of access
  • size of the items, and whether they grow
  • atomicity of updates
One-to-many relationships
  • when the "many" side is large (e.g. the people living in a city), link the documents by _id
  • when the "many" side is small (e.g. the comments on a blog post), embed them
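The two one-to-many designs can be compared using a stand-in database that counts round trips. makeDb is hypothetical, works in memory, and is not a MongoDB driver. Reading an embedded post together with its comments takes one round trip. Finding a city's residents through _id links takes two.

```javascript
// Hypothetical in-memory stand-in for a database; counts round trips.
function makeDb(collections) {
  const db = { roundTrips: 0 };
  db.find = (name, pred) => {
    db.roundTrips++;
    return collections[name].filter(pred);
  };
  return db;
}

// Few "many": comments embedded in the blog post, read in one round trip.
const embedded = makeDb({
  posts: [{ _id: 1, title: "Week 3", comments: [{ author: "ann", text: "Nice" }] }],
});
const post = embedded.find("posts", (p) => p._id === 1)[0];

// Large "many": each person links to the city by its _id, two round trips.
const linked = makeDb({
  cities: [{ _id: "nyc", name: "New York" }],
  people: [
    { _id: 1, name: "bob", city_id: "nyc" },
    { _id: 2, name: "eve", city_id: "nyc" },
  ],
});
const city = linked.find("cities", (c) => c.name === "New York")[0];
const residents = linked.find("people", (p) => p.city_id === city._id);
console.log(embedded.roundTrips, linked.roundTrips); // 1 2
```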
Many-to-many relationships
  • many-to-many relationships are often really few-to-few, so one document can embed an array of _ids that links it to the other documents
  • another option is to embed the documents themselves, but this may not work in some cases; for example, embedding teachers inside students would make it impossible to store a teacher who has no students yet
Multikey indexes: an index on an array field makes queries on an embedded array of links efficient, so many-to-many relationships can be queried efficiently in MongoDB.
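In this toy many-to-many layout, each student embeds an array of teacher _ids. The Map built by the hypothetical buildMultikeyIndex plays the role of a multikey index, with one entry per array element. In the shell this would roughly correspond to db.students.ensureIndex( { teachers : 1 } ) followed by db.students.find( { teachers : 10 } ). The collection, the field names and the ids are made up for the example.

```javascript
// Toy many-to-many: students embed an array of teacher _ids.
const students = [
  { _id: 1, name: "ann", teachers: [10, 11] },
  { _id: 2, name: "bob", teachers: [11] },
  { _id: 3, name: "cid", teachers: [10, 12] },
];

// Plays the role of a multikey index: one entry per array element.
function buildMultikeyIndex(docs, field) {
  const index = new Map(); // array element -> _ids of documents containing it
  for (const doc of docs) {
    for (const value of doc[field]) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

const teachersIndex = buildMultikeyIndex(students, "teachers");
console.log(teachersIndex.get(10)); // [ 1, 3 ]
```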

Benefits of embedding
  • Improved read performance: a document is stored contiguously on disk, so reading it needs fewer disk seeks
  • One round trip to the database

Tree representations
  • embed a list of children in the document
  • embed a list of ancestors in the document
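Here is a sketch of the ancestors-list representation, using hypothetical category data. Each node stores all its ancestors, from the root down to its parent. The descendants of a node are therefore the documents whose ancestors array contains it. In the shell that would be an equality query on the array, e.g. db.categories.find( { ancestors : 'electronics' } ), which a multikey index can serve.

```javascript
// Toy tree stored with ancestor lists (root first, parent last).
const categories = [
  { _id: "electronics", ancestors: [] },
  { _id: "cameras", ancestors: ["electronics"] },
  { _id: "dslr", ancestors: ["electronics", "cameras"] },
  { _id: "books", ancestors: [] },
];

// All nodes below id: those whose ancestors include it.
function descendants(nodes, id) {
  return nodes.filter((n) => n.ancestors.includes(id)).map((n) => n._id);
}

// The parent is the last ancestor; roots have none.
function parentOf(nodes, id) {
  const node = nodes.find((n) => n._id === id);
  return node.ancestors.length ? node.ancestors[node.ancestors.length - 1] : null;
}

console.log(descendants(categories, "electronics")); // [ 'cameras', 'dslr' ]
```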

Storing files larger than the 16 MB document limit: GridFS breaks a large blob into pieces and stores them in two collections. The chunks collection (fs.chunks by default) holds the pieces, one per document, each 255 kB by default. The files collection (fs.files) holds one metadata document per file. Each chunk document has a files_id that refers to the _id of its file document, plus a sequence number n.
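Here is a toy version of the GridFS split. It assumes the 255 kB default chunk size and uses Node's Buffer for binary data. toGridFS and fromGridFS are hypothetical. Real drivers provide their own GridFS APIs and store extra metadata, such as the upload date, which this sketch omits.

```javascript
// Toy GridFS-style split: one files document plus ordered chunk documents.
const CHUNK_SIZE = 255 * 1024; // assumed default chunk size

function toGridFS(fileId, filename, data) {
  const chunks = [];
  for (let n = 0, offset = 0; offset < data.length; n++, offset += CHUNK_SIZE) {
    chunks.push({ files_id: fileId, n, data: data.subarray(offset, offset + CHUNK_SIZE) });
  }
  const file = { _id: fileId, filename, length: data.length, chunkSize: CHUNK_SIZE };
  return { file, chunks };
}

// Reassemble: pick this file's chunks, order them by n, and concatenate.
function fromGridFS(file, chunks) {
  const parts = chunks
    .filter((c) => c.files_id === file._id)
    .sort((a, b) => a.n - b.n)
    .map((c) => c.data);
  return Buffer.concat(parts);
}

const { file, chunks } = toGridFS("file1", "video.mp4", Buffer.alloc(600 * 1024));
console.log(chunks.length); // 3
```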

ODM (Object Document Mapper): sits between the application and the driver. You tell the ODM how to map your classes to documents and hand objects off to it; the ODM then interacts with the driver.
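Here is a minimal ODM sketch. It uses a hypothetical in-memory FakeDriver in place of a real MongoDB driver, and ToyODM is not a real library. The application registers how a class maps to a collection, hands objects to save(), and gets class instances back from load(). Only the ODM talks to the driver.

```javascript
// Hypothetical in-memory stand-in for a driver.
class FakeDriver {
  constructor() { this.collections = {}; }
  insert(name, doc) {
    if (!this.collections[name]) this.collections[name] = [];
    this.collections[name].push(doc);
  }
  findOne(name, query) {
    const docs = this.collections[name] || [];
    return docs.find((d) => Object.entries(query).every(([k, v]) => d[k] === v)) || null;
  }
}

// Toy ODM: maps classes to collections and converts objects <-> documents.
class ToyODM {
  constructor(driver) { this.driver = driver; this.mappings = new Map(); }
  register(cls, collection, fields) { this.mappings.set(cls, { collection, fields }); }
  save(obj) {
    const { collection, fields } = this.mappings.get(obj.constructor);
    const doc = {};
    for (const f of fields) doc[f] = obj[f]; // copy only mapped fields
    this.driver.insert(collection, doc);
  }
  load(cls, query) {
    const { collection, fields } = this.mappings.get(cls);
    const doc = this.driver.findOne(collection, query);
    if (!doc) return null;
    const obj = Object.create(cls.prototype); // rebuild an instance of cls
    for (const f of fields) obj[f] = doc[f];
    return obj;
  }
}

class Student {
  constructor(_id, name) { this._id = _id; this.name = name; }
  greet() { return `hi, ${this.name}`; }
}

const odm = new ToyODM(new FakeDriver());
odm.register(Student, "students", ["_id", "name"]);
odm.save(new Student(1, "ann"));
const loaded = odm.load(Student, { _id: 1 });
console.log(loaded.greet()); // hi, ann
```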