2014年12月30日星期二

MongoDB Notes Part IV


Data Modeling Introduction

MongoDB’s collections do not enforce document structure.

Tools for represent the relationships:
  • References
    • store the relationships between data by including links or references from one document to another.
    • normalized data models.

  • Embedded Data
    • store the relationships between data by storing related data in a single document structure.
    • denormalized data model.
    • could guarantee atomicity since all data are in a single document.

In general, use embedded data models when:
  • you have “contains” relationships between entities.
  • you have one-to-many relationships between entities.

Embedding provides better performance for read operations, as well as the ability to request and retrieve related data in a single database operation. However, this may lead to situations where documents grow after creation.

To interact with the embedded document, use “dot notation”.

In general, use references when:
  • when embedding would cause duplicates and would not bring any advantages.
  • to represent more complex many-to-many relationships.
  • to model large hierarchical data sets.

GridFS stores files in two collections:
  • chunks stores the binary chunks.
  • files stores the file’s metadata. 

Model Tree Structures
  • use Parent References, store the reference to the parent category in the field parent.
  • use Child References, store all the reference of the child category in the field children.
  • use Array of Ancestors, provides a fast and efficient way to find the descendants and ancestors of a node by creating an index on the ancestors field.
  • use Materialized Paths, store the path in the field path, the path string uses the comma a a delimiter.

  • use Nested Sets, best for static trees that do not change.

To support keyword search, contains a field of the keywords, and create a multi-key index on this field.

Index Introduction

Index types
  • Single Field Indexes
    • A default index is create on the _id field.
    • The field could be the embedded field or a subdocument.
  • Compound Indexes
    • The order of the fields matter.
    • Supports queries on any prefix of the index fields.
  • Multikey Indexes
    • to create an index on an array, adds index items for each element in the array.
    • the index of a shard key can not be multi key index.

  • Geospatial Indexes
  • Text Indexes
    • to support text search of string content in documents of a collection

  • TTL indexes
    • special indexex that MongoDB can use to automatically remove documents from a collection after a certain amount of time.

  • Unique indexes
    • cause MongoDB to reject all documents that contain a duplicate value for the indexed field.

  • Sparse indexes
    • only contain entries for documents that have the indexed field.

Remove a Specific Index


Modify an Index
  • First drop the index and then build the index

List all Indexes on a Collection

2014年12月29日星期一

[Leetcode] Majority Element

The solution is also available here:https://gist.github.com/pyemma/81269253375b3f9714e2
/*
 * Author: Yang Pei
 * Problem: Majority Element
 * Source: https://oj.leetcode.com/problems/majority-element/
 * 
 * Note:
 * Given an array of size n, find the majority element. The majority element is the element that appears more than ⌊ n/2 ⌋ times.
 * You may assume that the array is non-empty and the majority element always exist in the array.
 * 
 * Solution:
 * Naive method: Use additional space to hold each element's appearances ---> O(n) space complexity
 * Greed method: Use a box to hold the majority candidate and count its number ---> O(1) space complexity
 * 
 * Follow up:
 * What if the array does not contain a major?
 *   Final check the candidate to see if its count
 */
public class MajorityElement {
 public int majorityElement(int[] num) {
  int n = num.length;
  int major = 0, count = 0;
  for(int i = 0; i < n; i++) {
   if(count == 0) {
    major = num[i];
    count++;
   }
   else {
    if(major == num[i])
     count++;
    else
     count--;
   }
  }
  return major;
 }
}

[Leetcode] Find Peak Element

The code is also available here https://gist.github.com/pyemma/31c930a838b40c847d8a

/*
 * Author: Yang Pei
 * Problem: Find Peak Element
 * Source: https://oj.leetcode.com/problems/find-peak-element/
 * 
 * Note:
 * A peak element is an element that is greater than its neighbors.
 * Given an input array where num[i] ≠ num[i+1], find a peak element and return its index.
 * The array may contain multiple peaks, in that case return the index to any one of the peaks is fine.
 * You may imagine that num[-1] = num[n] = -∞.
 * 
 * For example, in array [1, 2, 3, 1], 3 is a peak element and your function should return the index number 2.
 * 
 * Solution:
 * Naive method: sweep the entire array and find the peak element --> take O(n) time
 * 
 * Binary search method: key idea is to drop as much as portion of array as possible.
 *   We need to check the condition of num[mid] with num[mid+1] to determine
 *   how we drop the array.
 *    If num[mid] < num[mid+1], we know mid would not be the answer
 *    and there must exist a peak from mid+1 to r, we set l <-- mid+1.
 *    Otherwise we have num[mid] > num[mid+1], we know mid could be the answer
 *    and there must exist a peak from l to mid, we set r <-- mid.
 *    When mid == num.length - 1, this indicate that the last element is a peak,
 *    we just l <-- l+1 to break out the loop.
 *    When l == r, according to our operation, we know this would be a peak, we
 *    could break out the loop, so in the outer loop, we set the condition to
 *    "l < r" instead of "l <= r".
 * 
 * Attention:
 * Pay attention to the condition in the outer while loop, if "l <= r" setted, it would
 * involve in infinite loop.
 */
public class FindPeakElement {
 public int findPeakElement(int[] num) {
  int l = 0, r = num.length - 1;
  while(l < r) {
   int mid = l + (r - l) / 2;
   if(mid < num.length - 1) {
    if(num[mid] < num[mid + 1])
     l = mid + 1;
    else
     r = mid;
   }
   else
    l = mid + 1;
  }
  return r;
 }
}

[转] 多态在 Java 和 C++ 编程语言中的实现比较

众所周知,多态是面向对象编程语言的重要特性,它允许基类的指针或引用指向派生类的对象,而在具体访问时实现方法的动态绑定。C++ 和 Java 作为当前最为流行的两种面向对象编程语言,其内部对于多态的支持到底是如何实现的呢,本文对此做了全面的介绍。
注意到在本文中,指针和引用会互换使用,它们仅是一个抽象概念,表示和另一个对象的连接关系,无须在意其具体的实现。

Java 的实现方式

Java 对于方法调用动态绑定的实现主要依赖于方法表,但通过类引用调用和接口引用调用的实现则有所不同。总体而言,当某个方法被调用时,JVM 首先要查找相应的常量池,得到方法的符号引用,并查找调用类的方法表以确定该方法的直接引用,最后才真正调用该方法。以下分别对该过程中涉及到的相关部分做详细介绍。

JVM 的结构

典型的 Java 虚拟机的运行时结构如下图所示
图 1.JVM 运行时结构
图 1.JVM 运行时结构
此结构中,我们只探讨和本文密切相关的方法区 (method area)。当程序运行需要某个类的定义时,载入子系统 (class loader subsystem) 装入所需的 class 文件,并在内部建立该类的类型信息,这个类型信息就存贮在方法区。类型信息一般包括该类的方法代码、类变量、成员变量的定义等等。可以说,类型信息就是类的 Java 文件在运行时的内部结构,包含了改类的所有在 Java 文件中定义的信息。
注意到,该类型信息和 class 对象是不同的。class 对象是 JVM 在载入某个类后于堆 (heap) 中创建的代表该类的对象,可以通过该 class 对象访问到该类型信息。比如最典型的应用,在 Java 反射中应用 class 对象访问到该类支持的所有方法,定义的成员变量等等。可以想象,JVM 在类型信息和 class 对象中维护着它们彼此的引用以便互相访问。两者的关系可以类比于进程对象与真正的进程之间的关系。

Java 的方法调用方式

Java 的方法调用有两类,动态方法调用与静态方法调用。静态方法调用是指对于类的静态方法的调用方式,是静态绑定的;而动态方法调用需要有方法调用所作用的对象,是动态绑定的。类调用 (invokestatic) 是在编译时刻就已经确定好具体调用方法的情况,而实例调用 (invokevirtual) 则是在调用的时候才确定具体的调用方法,这就是动态绑定,也是多态要解决的核心问题。
JVM 的方法调用指令有四个,分别是 invokestatic,invokespecial,invokesvirtual 和 invokeinterface。前两个是静态绑定,后两个是动态绑定的。本文也可以说是对于 JVM 后两种调用实现的考察。

常量池(constant pool)

常量池中保存的是一个 Java 类引用的一些常量信息,包含一些字符串常量及对于类的符号引用信息等。Java 代码编译生成的类文件中的常量池是静态常量池,当类被载入到虚拟机内部的时候,在内存中产生类的常量池叫运行时常量池。
常量池在逻辑上可以分成多个表,每个表包含一类的常量信息,本文只探讨对于 Java 调用相关的常量池表。
CONSTANT_Utf8_info
字符串常量表,该表包含该类所使用的所有字符串常量,比如代码中的字符串引用、引用的类名、方法的名字、其他引用的类与方法的字符串描述等等。其余常量池表中所涉及到的任何常量字符串都被索引至该表。
CONSTANT_Class_info
类信息表,包含任何被引用的类或接口的符号引用,每一个条目主要包含一个索引,指向 CONSTANT_Utf8_info 表,表示该类或接口的全限定名。
CONSTANT_NameAndType_info
名字类型表,包含引用的任意方法或字段的名称和描述符信息在字符串常量表中的索引。
CONSTANT_InterfaceMethodref_info
接口方法引用表,包含引用的任何接口方法的描述信息,主要包括类信息索引和名字类型索引。
CONSTANT_Methodref_info
类方法引用表,包含引用的任何类型方法的描述信息,主要包括类信息索引和名字类型索引。
图 2. 常量池各表的关系
图 2. 常量池各表的关系
可以看到,给定任意一个方法的索引,在常量池中找到对应的条目后,可以得到该方法的类索引(class_index)和名字类型索引 (name_and_type_index), 进而得到该方法所属的类型信息和名称及描述符信息(参数,返回值等)。注意到所有的常量字符串都是存储在 CONSTANT_Utf8_info 中供其他表索引的。

方法表与方法调用

方法表是动态调用的核心,也是 Java 实现动态调用的主要方式。它被存储于方法区中的类型信息,包含有该类型所定义的所有方法及指向这些方法代码的指针,注意这些具体的方法代码可能是被覆写的方法,也可能是继承自基类的方法。
如有类定义 Person, Girl, Boy,
清单 1
 class Person { 
 public String toString(){ 
    return "I'm a person."; 
  } 
 public void eat(){} 
 public void speak(){} 
 
 } 

 class Boy extends Person{ 
 public String toString(){ 
    return "I'm a boy"; 
  } 
 public void speak(){} 
 public void fight(){} 
 } 

 class Girl extends Person{ 
 public String toString(){ 
    return "I'm a girl"; 
  } 
 public void speak(){} 
 public void sing(){} 
 }
当这三个类被载入到 Java 虚拟机之后,方法区中就包含了各自的类的信息。Girl 和 Boy 在方法区中的方法表可表示如下:
图 3.Boy 和 Girl 的方法表
图 3.Boy 和 Girl 的方法表
可以看到,Girl 和 Boy 的方法表包含继承自 Object 的方法,继承自直接父类 Person 的方法及各自新定义的方法。注意方法表条目指向的具体的方法地址,如 Girl 的继承自 Object 的方法中,只有 toString() 指向自己的实现(Girl 的方法代码),其余皆指向 Object 的方法代码;其继承自于 Person 的方法 eat() 和 speak() 分别指向 Person 的方法实现和本身的实现。
Person 或 Object 的任意一个方法,在它们的方法表和其子类 Girl 和 Boy 的方法表中的位置 (index) 是一样的。这样 JVM 在调用实例方法其实只需要指定调用方法表中的第几个方法即可。
如调用如下:
清单 2
 class Party{ 
…
 void happyHour(){ 
 Person girl = new Girl(); 
 girl.speak(); 
…
  } 
 }
当编译 Party 类的时候,生成 girl.speak()的方法调用假设为:
Invokevirtual #12
设该调用代码对应着 girl.speak(); #12 是 Party 类的常量池的索引。JVM 执行该调用指令的过程如下所示:
图 4. 解析调用过程
图 4. 解析调用过程
JVM 首先查看 Party 的常量池索引为 12 的条目(应为 CONSTANT_Methodref_info 类型,可视为方法调用的符号引用),进一步查看常量池(CONSTANT_Class_info,CONSTANT_NameAndType_info ,CONSTANT_Utf8_info)可得出要调用的方法是 Person 的 speak 方法(注意引用 girl 是其基类 Person 类型),查看 Person 的方法表,得出 speak 方法在该方法表中的偏移量 15(offset),这就是该方法调用的直接引用。
当解析出方法调用的直接引用后(方法表偏移量 15),JVM 执行真正的方法调用:根据实例方法调用的参数 this 得到具体的对象(即 girl 所指向的位于堆中的对象),据此得到该对象对应的方法表 (Girl 的方法表 ),进而调用方法表中的某个偏移量所指向的方法(Girl 的 speak() 方法的实现)。

接口调用

因为 Java 类是可以同时实现多个接口的,而当用接口引用调用某个方法的时候,情况就有所不同了。Java 允许一个类实现多个接口,从某种意义上来说相当于多继承,这样同样的方法在基类和派生类的方法表的位置就可能不一样了。
清单 3
interface IDance{ 
   void dance(); 
 } 

 class Person { 
 public String toString(){ 
   return "I'm a person."; 
  } 
 public void eat(){} 
 public void speak(){} 
 
 } 

 class Dancer extends Person 
 implements IDance { 
 public String toString(){ 
   return "I'm a dancer."; 
  } 
 public void dance(){} 
 } 

 class Snake implements IDance{ 
 public String toString(){ 
   return "A snake."; 
  } 
 public void dance(){ 
 //snake dance 
  } 
 }
图 5.Dancer 的方法表(查看大图
图 5.Dancer 的方法表
可以看到,由于接口的介入,继承自于接口 IDance 的方法 dance()在类 Dancer 和 Snake 的方法表中的位置已经不一样了,显然我们无法通过给出方法表的偏移量来正确调用 Dancer 和 Snake 的这个方法。这也是 Java 中调用接口方法有其专有的调用指令(invokeinterface)的原因。
Java 对于接口方法的调用是采用搜索方法表的方式,对如下的方法调用
invokeinterface #13
JVM 首先查看常量池,确定方法调用的符号引用(名称、返回值等等),然后利用 this 指向的实例得到该实例的方法表,进而搜索方法表来找到合适的方法地址。
因为每次接口调用都要搜索方法表,所以从效率上来说,接口方法的调用总是慢于类方法的调用的。

C++ 的实现方式

从上文可以看到,Java 对于多态的实现依赖于方法表,但比较特殊的是,对于接口的支持是非常不同的,每次调用都要搜索方法表。实际上,在 C++ 中,单继承时对于多态的实现非常类似于 Java,但由于支持多重继承,这会碰到和 Java 支持接口动态调用同样的问题,C++ 的解决方案是利用对象的多个方法表指针,不幸的是,这会引入额外的指针调整的复杂性。

单继承

单继承时,C++ 对于多态的实现本质上与 Java 是一样的,也是基于方法表。但 C++ 在编译时就可以确认要调用的方法在方法表中的位置,而没有 JVM 在方法调用时查询常量池的过程。
C++ 编译时,编译器会自动做很多工作,其中之一就是在需要时在对象插入一个变量 vptr 指向类的方法表。如 Person,、Girl 的类定义与上文中 Java 类似,若
清单 4
class Person{ 
  . . . 
 public : 
    Person (){} 
    virtual ~Person (){}; 
    virtual void speak (){}; 
    virtual void eat (){}; 
 }; 

class Girl : public Person{ 
  . . . 
   public : 
   Girl(){} 
   virtual ~Girl(){}; 
   virtual void speak(){}; 
   virtual void sing(){}; 
 };
则 Person 与 Girl 实例的内存对象模型为:
图 6.Person 与 Girl 的对象模型
图 6.Person 与 Girl 的对象模型
如下的调用代码
 Person *p = new Girl(); 
 p->speak(); 
 p->eat();
经编译器编译后调用代码为:
 p->vptr[1](p); 
 p->vptr[2](p);
这样在运行时,会自然的过渡到对 Girl 的相应函数的调用。
可以看到方法表中没有各自的构造函数,这是因为 C++ 的方法表中仅含有用 virtual 修饰的方法,非 virtual 的方法是静态绑定的,没有必要占用方法表的空间。这与 Java 是不同的,Java 的方法表含有类所支持的所有的方法,可以说,Java 类的所有方法都是”virtual”(动态绑定)的。

多重继承

多重继承下,情况就完全不一样了,因为两个不同的类,其继承自与同一个基类的方法,在各自的方法表中的位置可能不同(和 Java 中的接口情况类似),但 Java 在运行时有 JVM 的支持,C++ 在这里引入了多个指向方法表的指针来解决这个问题,由此带来了调整指针位置的额外复杂性。
若有如下关系的三个类,Engineer 继承自 Person 和 Employee
图 7. 类静态结构关系图
图 7. 类静态结构关系图
Engineer 实例对象模型为:
图 8.Engineer 对象模型
图 8.Engineer 对象模型
可以看到 Engineer 实例有两个指向方法表的指针,这是与 Java 大不相同的。
设有如下的代码 ,
清单 5
 Engineer *p = new Engineer(); 
 Person * p1 = (Person *)p; 
 Empolyee *p2 = (Employee *)p;
则各指针在运行时分别指向各自的子对象,如下所示:
图 7.Engineer 实例
图 7.Engineer 实例
C++ 中对象的指针总是指向对象的起始处,如上述代码中,p 是 Engineer 对象的起始地址,而 p1 指向 p 转型成 Person 子对象的指针,可以看到实际上,两者是相等的;但 Employee 子对象的指针 p2 则于 p 和 p1 不同,实际上
 p2 = p + sizeof(Person); 
 p1->eat(); 
 p2->work();
则编译后生成的调用代码为:
 *(p1->vptr1[i]) (p1) 
 *(p2->vptr2[j]) (p2)
某些情况下,甚至需要将 this 指针调整到整个对象的起始处,如:
 delete p2;
析构函数的 this 指针要被调整到 p 所指向的位置,否则则会出现内存泄漏。设析构函数在方法表中的位置为 0,则编译后为:
 *(p2->vptr2[0]) (p)
对于指针的调整,编译器没有足够的知识在编译时刻完成这个任务。如上例中,对于 p2 所指向的对象,该对象类型可能是 Employee 或任何该类的子类 ( 其它的子类如 Teacher 等 ),编译器无法确切的知道 p2 和整个对象的初始地址的距离 (offset), 这样的调整只能发生在运行时刻。
一般有两种方法来调整指针,如下图:
图 8. 指针调整 - 扩展方法表
图 8. 指针调整 - 扩展方法表
这种方法将指针所有调整的 offset 存储于方法表的每个条目中,当调用方法表中的方法时,首先利用 offset 的值完成指针调整再做实际的调用。缺点显而易见,增加了方法表的大小,而且并不是每个方法都需要做指针调整。
图 9. 指针调整 -thunk 技术
图 9. 指针调整 -thunk 技术
这就是所谓的 thunk 技术,方法表的每个条目指向一小段汇编代码,这段代码来保证做指针调整和调用正确的方法,相当于加了一层抽象。

MongoDB Notes Part III

Sharding Introduction 

Sharding is a method of storing data across multiple machines. MongoDB uses sharding to support large data set and high throughput operations.

Two approaches to address the scales
  • vertical scaling
    • adds more CPU and storage resources to increase capacity.
      • As a result there is a practical maximum capacity for vertical scaling.
  • sharding (horizontal scaling)
    • divides data set and distributes the data over multiple servers, or shards.
      • sharding reduces the number of operations each shard handles.
      • sharding reduces the amount of data that each serve needs to store.

Sharding in MongoDB


Sharded cluster has three components: 
  • Shards
    • store the data, each shard is a replica set or a single mongod instance.
  • Query Routers
    • interface with client and direct operation to appropriate shard or shards.
  • Config servers
    • store the cluster’s metadata, which contains the mapping of clusters’ data to shards. There are exactly three config servers.
    • uses two phase commit to confirm immediate consistency and reliability.
    • clusters become inoperable without the cluster metadata, always ensure config servers are available.

Data Partitioning

Sharding partitions a collection’s data by the shard key.
A shard key is either an indexed field or an indexed compound field that exists in every document in the collection. MongoDB divides the shard key values into chunks and distributes them evenly on shards. Shard keys are immutable and cannot be changed after insertion.

Range Based Sharding
  • divides the data set into ranges determined by the shard key values.
  • good for range queries, but the data might not evenly distributed.

Hash Based Sharding
  • computes the hash of a field’s value and then uses the hashes to create chunks.
  • data is distributed more evenly, but the efficiency of range queries goes down.

Maintaining a Balanced Data Distribution

Splitting: A background process that keeps chunk from growing too large. When a chunk grows beyond a specified chunk size, MongoDB splits the chunk in half.

Balancing: A background process that migrate chunks.
  • First, the destination shard is sent all the documents in the chunk of origin shard.
  • Second, the destination shard captures and applies all changes to the data during the migration.
  • Finally, the metadata regarding the location of the chunk on config server are updated.
  • MongoDB removes all chunks on origin shard after the migration is successful.

Broadcast Operations and Target Operations



  • broadcast queries to all shards unless the mongos can determine the single or subset shards to process.
  • The remove( ) is always broadcast.
  • All insert( ) operation targets to one shard.

2014年12月28日星期日

[Leetcode] Text Justification

The code is also available here https://gist.github.com/pyemma/b8b0f771085823ec4b87
/*
 * Author: Yang Pei
 * Problem: Text Justification
 * Source: https://oj.leetcode.com/problems/text-justification/
 * 
 * Note:
 * Given an array of words and a length L, format the text such that each line has exactly L characters and is fully (left and right) justified.
 * You should pack your words in a greedy approach; that is, pack as many words as you can in each line. Pad extra spaces ' ' when necessary so that each line has exactly L characters.
 * Extra spaces between words should be distributed as evenly as possible. If the number of spaces on a line do not divide evenly between words, the empty slots on the left will be assigned more spaces than the slots on the right.
 * For the last line of text, it should be left justified and no extra space is inserted between words.
 * 
 * For example,
 * words: ["This", "is", "an", "example", "of", "text", "justification."]
 * L: 16.
 * 
 * Return the formatted lines as:
 * [
 * "This    is    an",
 * "example  of text",
 * "justification.  
 * ]
 * Note: Each word is guaranteed not to exceed L in length.
 * 
 * Solution:
 * Use one function to determine the words, use one function to make the line.
 * 
 * Corner case:
 * The last line and the line with only one words, we should left-justified.
 */
import java.util.*;

public class TextJustification {
 private int wordCount(String[] words, int beg, int L) {
  // return the first of next line's word's index
  int count = 0;
  int i = beg;
  for(i = beg; i < words.length; i++) {
   count = count + words[i].length();
   if(count > L)
    break;
   count = count + 1;
  }
  return i;
 }
 
 private String newLine(String[] words, int beg, int end, int L) {
  int count = 0;
  for(int i = beg; i < end; i++)
   count = count + words[i].length();
  int number = end - beg;
  int blank = number == 1 ? ((L - count) / (number - 1)) : L - count;
  int extra = number == 1 ? ((L - count) % (number - 1)) : 0;
  String bl = "";
  for(int i = 0; i < blank; i++)
   bl = bl + " ";
  StringBuffer line = new StringBuffer();
  for(int i = beg; i < end; i++) {
   line.append(words[i]);
   if(beg != i && i == end - 1)
    break;
   line.append(bl);
   if(extra > 0) {
    line.append(" ");
    extra--;
   }
  }
  return line.toString();
 }
 
 private String lastLine(String[] words, int beg, int L) {
  StringBuffer line = new StringBuffer();
  for(int i = beg; i < words.length; i++) {
   line.append(words[i]);
   if(i != words.length - 1)
    line.append(" ");
  }
  int blank = L - line.length();
  for(int i = 0; i < blank; i++)
   line.append(" ");
  return line.toString();
 }
 
 public List<String> fullJustify(String[] words, int L) {
  int ind = 0;
  List<String> result = new ArrayList<String>();
  while(ind < words.length) {
   int beg = ind;
   ind = wordCount(words, beg, L);
   if(ind == words.length)
    result.add(lastLine(words, beg, L));
   else
    result.add(newLine(words, beg, ind, L));
  }
  return result;
 }
}

MongoDB Notes Part II

Insert Documents

Insert a document into a collection.

MongoDB provides a Bulk( ) API that you can use to perform multiple write operations in bulk.
  • initialize the operation builder
  • add insert operation
  • execute

Query Documents

Select all document in a collection.


Specify equality condition
  • use { <field> : <value> }


Specify conditions using query operators 


Specify AND condition


Specify OR condition


Exact Match on the Embedded Document



Equality Match on Fields within an Embedded Document


Exact Match on an Array
  • Match every element in the array, including the order


Match an Array Element
  • The array contains at least one element with the specified value.


Match a Specify Element of an Array
  • Match at a particular position in the array


Single Element Satisfies the Criteria
  • Use $elemMatch to specify multiple criteria on the elements of an array such that at least one element in the array satisfies all criteria.



Combination of Elements Satisfies the Criteria


Matching a Field in the Embedded Document Using the Array Index


Match a Field Without Specifying Array Index
  • Select the document where the ‘memos’ field contains an array that at least one embedded document contains the field ‘by’ with ‘shipping’. 

Single Element Satisfies the Criteria


Combination of Elements Satisfies the Criteria


Modify Documents

Use update operators to change field values.


Update an embedded field.
  • Use dot notation

Update multiple documents
  • Use the multi option in update


Replace a document.


Specify upsert: true for the update replacement operation.
Specify uperst: true for the update specific fields operation.
  • If no document matches, a new document would be inserted.

Remove Documents

Remove All Documents


Remove Documents that Match a Condition


Remove a Single Document that Matches a Condition


Limit Fields to Return from a Query


You can not combine inclusion and exclusion semantics in a single projection with the exception of _id.

Iterate a Cursor


Use toArray( ) to convert the cursor into an array. It will consume all the cursor. 

Query with Index


Two Phase Commit

Create an Auto-Incrementing Sequence Field


  • Use Counters Collection
    • use a collection to hold the last id, use a function to get and update the id from this counter collection.
  • Optimistic Loop
    • calculates the incremented _id value and attempts to insert a document with the calculated _id until the insertion is successful.
All pictures are from http://docs.mongodb.org/manual/