2015年1月21日星期三

MongoDB week2 Notes

MongoDB’s CRUD operations exist as methods/functions in programming language APIs, not as a separated language.

db — current database
db.people.insert( ) — insert into collections
_id — the unique filed for all documents inserted into database, it is a primary key field, and it is immutable
The objectID is a global unique identifier, which is used for __id
db.people.findOne( ) — return randomly one document
db.people.findOne( ) || db.people.find( )
  • the first argument specific the criteria to match, like the WHERE clause
  • the second argument specific what field to return, like the SELECT clause
db.people.find( ) — find all documents in people collection
db.people.find( ).pretty( ) — change the format to show the result

db.people.find( { score : { $gt : 95 } } ) — query operator
db.people.find( { profession : { $exists : true } } ); — query on the structure of document
db.people.find( { name : { $type : 2 } } ); — query on the type of fields
db.people.find( { name : { $regex : “a” } } ); — regular expression matching on string
{ $or : [query1, query2, … , queryn] }
{ $and : [query1, query2, … , queryn] }
db.accounts.find( { favorites : “beer” } ) — query if an array contains the specific value, only check the top level and no recursion on the nested sub-documents.
db.accounts.find( { favorites : { $all : [ “beer”, “pretzels” ] } } ) — favorites contains all elements in the array, the order does not matter
db.accounts.find( { name : { $in : [ “xxx”, “yyy”] } } ) — the document which name is in the array, either xxx or yyy
db.users.find( { “email.work” : “xxxx” } ) — dot notation, allows to query for the embedded document

cursor.hasNext( ) — return true as long as there’s another document to visit on this cursor
cursor.next( ) — return next document to be visited
cursor.limit( 5 ) — limit the number of the document of the cursor, instruct the server to return specific number of document when cursor start to iterate
cursor.sort( { name : -1 } ) || cursor.skip( ) 
we could not modify the cursor once we have called hasNext( ) or next( ). limit, sort and skip are executed in server side not client side.
sort —> skip —> limit

db.scores.count( { xxx : yyy } ) — count the document
db.people.update( { name : “Smith” }, { name : “Tomas”, Salary : 50000 }) — the document which name is Smith would be replaced by the second argument which is a new document.
db.people.update( { name : “Smith” }, { $set : { name : “Tomas” } } ) — update the field only, if the field does not exist, it will be created
use $inc to increase the value of a specific field
db.people.update( { name : “Smith” }, { $unset : { professional : 1 } } ) — remove a field in a document
db.array.update( { xxx : yyy }, { $set : { “array.index” : zzz } } ) — use dot notation to specify the element in the array try to change
use $push to add an element into the array from the rightmost place
use $pop to remove the rightmost element int the array
use $pushAll to add append an array from the rightmost place
use $pull to remove an element from the array regardless of its position
use $pullAll to remove a list of element from the array
use $addToSet to treat the array as a set, if duplicates exist, it will do nothing
db.people.update( { }, { }, { upset : true } ) — insert a new document if the document does not exist
db.people.update( { }, { }, { multi : true} ) — update multiple documents
db.people.remove( { } ) — remove a document that matches the specific criteria

Nodejs

var MongoClient = require(‘mongodb’).MongoClient;
MongoClient.connect( ‘connect string here’, function(err db) { } )
db.collection( ‘collection name’ ).findOne( query, function(err, doc) { } )
db.collection( ‘collection name’ ).find( query ).toArray(function( err, docs ) { } )
var cursor = db.collection( ‘collection name’ ).find( query );
cursor.each( function( err, doc ) { } ) 
.find( ) will create a cursor object, only when the cursor call .each( ) or .toArray( ), it starts to retrieves data from database, the database will not return the entire result but a batch of the result
db.collection( ‘collection name’ ).find( query, projection )
cursor.sort( [ [ ‘grade’ , 1 ], [ ‘student’ , -1 ] ] ) —> use array in order to avoid the rearrange of the elements
db.collection( ‘collection name’ ).insert( doc, function( err, inserted ) { } )
db.collection( ‘collection name’ ).update( query, operator, options, function( err, updated ) { } )
we could not mix $operators with normal fields
db.collection( ‘collection name’ ).save( doc, function( err, saved ) { } ) — check to see if the doc exist (_id), if not, then a new document would be inserted otherwise, replacement would be done
findAndModify( query, sort, operator, option, callback ) — atomically find and returns the document, no two client would conflict here on the document

Java 

The parameter of all method is DBObject, which is used to represent a document. — BasicDBObject
MongoClient client = new MongoClient( )
DB courseDB = client.getDB(“xxx”)


DBCollection collection = courseDB.getCollection(“xxx”)

MongoDB week1 Notes

MongoDB is a non-relational data store for JSON documents.
JSON document is like: { key : field }. And it could have some hierarchical. 
MongoDB is also schemaless.
MongoDB tries to maintain scalability and  performance as well as provide much functionality. 
  • MongoDB does not support joins
  • MongoDB also does not support transactions
MongoDB continuos to listen for connections and expect BSON data, there is some protocol to explain this kind of data. A mongoDB driver is a library in some specific language to communicate with mongoDB.

app.get(url, function (req, res) {}) —> tell the express how to response to url with get method.
app.get(‘*’, function (req, res) {}) —> ‘*’ is a wildcard matching and anything not handled above would be handled here.
var cons = require(‘consolidate’)
app.engine(‘html’, cons.swig) —> set the template engine for express.
app.set(‘view engine’, ‘html’)
app.set(‘views’, __dirname + ‘/views’)


There are typically two kinds of things in JSON, arrays [   ] and dictionaries {  }, which is associative maps.

2015年1月20日星期二

Search the missing element in Arithmetic Progression

/*
 * Author: Yang Pei
 * Problem: Search the missing element in Arithmetic Progression
 * Source: http://www.geeksforgeeks.org/find-missing-number-arithmetic-progression/
 * 
 * Note:
 * Given an arithmetic progression, find the missing elements in it. Assume that there
 * exist exact one missing element in it (the head and tail is not missing).
 * 
 * Solution:
 * Naive method is to sweep the entire array to find the missing element.
 * Binary search the array. Pay attention how to cut off the search space.
 * 
 * Follow up:
 * Given a array from 1 ... N and it is sorted, however, m number is missing.
 * Find all missing numbers.
 * Since the missing number could be appear in the two end of the array, we need
 * to check if the current array contains enough position or not, if not, we need
 * to add missing numbers from two ends.
 */
import java.util.*;
public class SearchinArithmeticProgression {
    public static int findMissing(int[] A) {
        int n = A.length;
        int diff = (A[n-1] - A[0]) / n;
        int l = 0, r = n-1;
        while(l < r) {
            if(r - l == 1)
                return A[r] - diff;
            int mid = l + (r - l) / 2;
            int temp = (mid - l) * diff + A[l];
            if(A[mid] == temp)
                l = mid;
            else
                r = mid;
        }
        return A[r] - diff;
    }
    
    public static void findMissing1(int[] A, int l, int r, int N, int m, List<Integer> result) {
        // check if there is missing element on both ends of the array
        int count = (A[r] - A[l]) - (r - l);
        if(m > count) {
            for(int i = 1; i < A[l]; i++)
                result.add(i);
            for(int i = A[r] + 1; i <= N; i++)
                result.add(i);
        }
        if(r - l == 1) {
            for(int i = 1; i <= m; i++)
                result.add(A[l] + i);
        }
        else {
            // half the array and try to find the missing elements recursively
            int mid = l + (r - l) / 2;
            int left = (A[mid] - A[l]) - (mid - l);
            int right = (A[r] - A[mid]) - (r - mid);
            if(left != 0)
                findMissing1(A, l, mid, N, left, result);
            if(right != 0)
                findMissing1(A, mid, r, N, right, result);
        }
    }
    
    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        while(scan.hasNext()) {
            List<Integer> result = new ArrayList<Integer>();
            int n = scan.nextInt();
            int m = scan.nextInt();
            int[] A = new int[n-m];
            for(int i = 0; i < n-m; i++)
                A[i] = scan.nextInt();
            findMissing1(A, 0, n-m-1, n, m, result);
            System.out.println(result.toString());
        }
        scan.close();
    }
}

2015年1月11日星期日

[Leetcode] Dungeon Game

Dp problem. dp[i][j] records if we want to reach the goal from (i, j), the minimum amount of health points we need. dp[i][j] could be obtained from dp[i+1][j] and dp[i][j+1], we get the smaller one from this two arguments (this will be a positive number) and compare it with the value of dungeon and determine the health point we need.
/*
 * Author: Yang Pei
 * Problem: Dungeon Game
 * Source: https://oj.leetcode.com/problems/dungeon-game/
 * 
 * Note:
 * 
 * Soltuion:
 * Dp, dp[i][j] record the minimum health needed to go from (i, j) to reach the goal.
 * dp[i][j] = (dungeon[i][j] - Math.min(dp[i+1][j], dp[i][j+1]) >= 0) ? 1 : (dungeon[i][j] - Math.min(dp[i+1][j], dp[i][j+1]))
 */
public class DungeonGame {
    public static int calculateMinimumHP(int[][] dungeon) {
        int m = dungeon.length;
        if(m == 0)
            return 0;
        int n = dungeon[0].length;
        int[][] dp = new int[m][n];
        for(int i = m-1; i >= 0; i--) {
            for(int j = n-1; j>= 0; j--) {
                int min;
                if(i == m-1 && j == n-1)
                    min = 1;
                else if(i == m-1)
                    min = dp[i][j+1];
                else if(j == n-1)
                    min = dp[i+1][j];
                else 
                    min = Math.min(dp[i][j+1], dp[i+1][j]);
                dp[i][j] = (dungeon[i][j] - min >= 0) ? 1 : (min - dungeon[i][j]);
            }
        }
        return dp[0][0];
    }
    
    public static void main(String[] args) {
        int[][] dungeon = new int[][] {{-2, -3, 3}, {-5, -10, 1}, {10, 30, -5}};
        System.out.println(calculateMinimumHP(dungeon));
    }
}

2015年1月7日星期三

Check K Sum

/*
 * Author: Yang Pei
 * Problem: Check K Sum
 * 
 * Note:
 * Given an array of non-negative numbers, check if we could use exactly K elements
 * in the array (each element could only be used once) to get a target sum, we assume
 * that the given target and k is valide. 
 * 
 * Solution:
 * Using dp to solve the problem, dp[i][j] means if we could obtain sum i using j 
 * elements, then dp[i][j] |= dp[i-A[k]][j-1].
 */
import java.util.*;

public class CheckKSum {
    public static boolean kSum(int[] num, int target, int k) {
        int n = num.length;
        boolean[][] dp = new boolean[target+1][k+1];
        dp[0][0] = true;
        for(int i = 0; i < n; i++) {
            for(int j = target; j >= num[i]; j--) {
                for(int p = 1; p <= k; p++) {
                    dp[j][p] |= dp[j-num[i]][p-1];
                }
            }
        }
        return dp[target][k];
    }
    
    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        while(scan.hasNext()) {
            int n = scan.nextInt();
            int[] num = new int[n];
            for(int i = 0; i < n; i++) {
                num[i] = scan.nextInt();
            }
            int target = scan.nextInt();
            int k = scan.nextInt();
            System.out.println(kSum(num, target, k));
        }
        scan.close();
    }
}

2015年1月6日星期二

BST to Double Linked List

/*
 * Author: Yang Pei
 * Problem: BST to Double Linked List
 * Source: http://www.careercup.com/question?id=4863668900593664
 * 
 * Note:
 * Given a BST, convert this BST into a double linked list that is sorted and 
 * returns the head of the list. Do it in place with space complexity O(1).
 * 
 * Solution:
 *         Use recursive method or use morris iterate.
 */
public class BSTtoDoubleLinkedList {
    public static TreeNode BSTtoDLL(TreeNode root) {
        if(root == null)
            return null;
        TreeNode pre = BSTtoDLL(root.left);
        TreeNode next = BSTtoDLL(root.right);
        if(pre == null) {
            root.left = null;
            root.right = next;
            if(next != null)
                next.left = root;
            return root;
        }
        else {
            TreeNode temp = pre;
            while(temp.right != null)
                temp = temp.right;
            temp.right = root;
            root.left = temp;
            root.right = next;
            if(next != null)
                next.left = root;
            return pre;
        }
    }
    
    public static TreeNode BSTtoDLL1(TreeNode root) {
        if(root == null)
            return null;
        TreeNode cur = root, tmp = null, pre = null;
        while(cur != null) {
            if(cur.left == null) {
                cur.left = pre;
                if(pre != null)
                    pre.right = cur;
                pre = cur;
                cur = cur.right;
            }
            else {
                tmp = cur.left;
                while(tmp.right != null && tmp.right != cur)
                    tmp = tmp.right;
                if(tmp.right == null) {
                    tmp.right = cur;
                    cur = cur.left;
                }
                else {
                    cur.left = pre;
                    if(pre != null)
                        pre.right = cur;
                    pre = cur;
                    cur = cur.right;
                }
            }
        }
        while(root.left != null)
            root = root.left;
        return root;
    }
    
    public static void main(String[] args) {
        TreeNode root = new TreeNode(2);
        root.left = new TreeNode(1);
        root.left.left = new TreeNode(0);
        root.right = new TreeNode(4);
        root.right.left = new TreeNode(3);
        root.right.right = new TreeNode(5);
        root.right.right.right = new TreeNode(6);
        root = BSTtoDLL1(root);
        TreeNode pre = null;
        while(root != null) {
            System.out.print(root.val + " ");
            pre = root;
            root = root.right;
        }
        System.out.println("");
        while(pre != null) {
            System.out.print(pre.val + " ");
            pre = pre.left;
        }
    }
}

Move Zeros Down

/*
 * Author: Yang Pei
 * Problem: Move Zeros Down
 * 
 * Note:
 * Given a binary tree, move the zeros to the bottom, so that if a node's value is 0,
 * any of its descendant are 0s.
 * For example:
 *      1
 *     / \
 *    0   0
 *   / \   \
 *  2   0   3
 * could be changed to 
 *      1
 *     / \
 *    2   3
 *   / \   \
 *  0   0   0   
 * 
 * Solution:
 *         Use level traversal and then assign from back of the list to front, if we find
 *         a node that is not 0 and we still have 0 to assign, record the value and change
 *      the node to 0, otherwise all 0 have assigned, when we meet a node that is 0, 
 *      assign a recorded value to it.
 *      Use preorder traversal, when a node is 0, try to find if there is a lowest descendent
 *      that is not 0, reutrn the node and change the value. 
 */
import java.util.*;

public class MoveZerosDown {
    public static void moveDown(TreeNode root) {
        if(root == null)
            return;
        List<TreeNode> list = new ArrayList<TreeNode>();
        int count = 0;
        Queue<TreeNode> qu = new LinkedList<TreeNode>();
        TreeNode dummy = new TreeNode(0);
        qu.add(root); qu.add(dummy);
        while(qu.size() != 0) {
            TreeNode temp = qu.remove();
            if(temp == dummy) {
                if(qu.size() != 0)
                    qu.add(dummy);
            }
            else {
                count = count + ((temp.val == 0) ? 1 : 0);
                list.add(temp);
                if(temp.left != null)
                    qu.add(temp.left);
                if(temp.right != null)
                    qu.add(temp.right);
            }
        }
        Stack<Integer> stack = new Stack<Integer>();
        for(int i = list.size() - 1; i >= 0; i--) {
            TreeNode temp = list.get(i);
            if(count > 0) {
                if(temp.val != 0) {
                    stack.push(temp.val);
                    temp.val = 0;
                }
                count--;
            }
            else {
                if(temp.val == 0 && stack.size() > 0)
                    temp.val = stack.pop();
            }
        }
    }
    
    public static void moveDown1(TreeNode root) {
        if(root == null)
            return;
        if(root.val == 0) {
            TreeNode left = findNonZero(root.left);
            TreeNode right = findNonZero(root.right);
            if(left != null) {
                root.val = left.val;
                left.val = 0;
            }
            else if(right != null) {
                root.val = right.val;
                right.val = 0;
            }
        }
        moveDown1(root.left);
        moveDown1(root.right);
    }
    
    private static TreeNode findNonZero(TreeNode root) {
        if(root == null)
            return null;
        TreeNode left = findNonZero(root.left);
        TreeNode right = findNonZero(root.right);
        if(left != null)
            return left;
        else if(right != null)
            return right;
        if(root.val != 0)
            return root;
        return null;
    }
    
    public static void main(String[] args) {
        TreeNode node1 = new TreeNode(1);
        node1.left = new TreeNode(0);
        node1.right = new TreeNode(0);
        node1.left.left = new TreeNode(3);
        node1.left.left.left = new TreeNode(0);
        node1.left.right = new TreeNode(0);
        node1.right.right = new TreeNode(5);
        node1.left.right.left = new TreeNode(4);
        node1.left.right.right = new TreeNode(0);
        PrintBST.printBST(node1);
        moveDown1(node1);
        System.out.println("");
        PrintBST.printBST(node1);
    }
}


Reverse List String

/*
 * Author: Yang Pei
 * Problem: Reverse List String
 * 
 * Note:
 * Given a string represent by a list, each node in the list contains a character,
 * A word is separated by space. Reverse each word in the list.
 * For example
 * 'h'->'e'->'l'->'l'->'o'->' '->'w'->'o'->'r'->'l'->'d' would be changed to
 * 'o'->'l'->'l'->'e'->'h'->' '->'d'->'l'->'r'->'o'->'w'
 * 
 * Solution:
 * Two pointers. Be careful: there might be multiple spaces between two words. And there
 * might be leading and tailing spaces. And there might contains no space and there might 
 * be all space.
 * 
 * Define of the ListNodeC 
 * {
 *     char val;
 *     ListNodeC next;
 *     public ListNodeC(char ch) {
 *         this.val = ch;
 *         this.next = null;
 *     }
 * }
 */
public class ReverseListString {
    public static ListNodeC reverse(ListNodeC head) {
        if(head == null) 
            return head;
        ListNodeC dummy = new ListNodeC(' ');
        dummy.next = head;
        ListNodeC pointer1 = dummy, pointer2 = dummy;
        while(pointer1.next != null) {
            while(pointer2.next != null && pointer2.next.val == ' ')
                pointer2 = pointer2.next;
            if(pointer2.next == null)
                break;
            pointer1 = pointer2;
            // pay attention here, otherwise would have infinite loop
            pointer2 = pointer2.next;
            while(pointer2.next != null && pointer2.next.val != ' ') {
                ListNodeC temp = pointer2.next;
                pointer2.next = temp.next;
                temp.next = pointer1.next;
                pointer1.next = temp;
            }
            pointer1 = pointer2;
        }
        return dummy.next;
    }
    
    public static void main(String[] args) {
        String str = " a ba   c  ";
        ListNodeC dummy = new ListNodeC(' ');
        ListNodeC temp = dummy;
        for(int i = 0; i < str.length(); i++) {
            ListNodeC node = new ListNodeC(str.charAt(i));
            temp.next = node;
            temp = temp.next;
        }
        temp = dummy.next;
        while(temp != null) {
            System.out.print(temp.val);
            temp = temp.next;
        }
        System.out.println("");
        temp = reverse(dummy.next);
        while(temp != null) {
            System.out.print(temp.val);
            temp = temp.next;
        }
        System.out.println("");
    }
}

2015年1月4日星期日

[Leetcode] Binary Search Tree Iterator

The solution is also available here:https://gist.github.com/pyemma/bbe39014f436f2a5aa5b
/**
 * Definition for binary tree
 * public class TreeNode {
 *     int val;
 *     TreeNode left;
 *     TreeNode right;
 *     TreeNode(int x) { val = x; }
 * }
 */

public class BSTIterator {
    private Stack<TreeNode> stack;
    public BSTIterator(TreeNode root) {
        stack = new Stack<TreeNode>();
        if(root != null)
            pushleft(root);
    }
    private void pushleft(TreeNode root) {
        while(root != null) {
            stack.push(root);
            root = root.left;
        }
    }
    /** @return whether we have a next smallest number */
    public boolean hasNext() {
        return !stack.isEmpty();
    }

    /** @return the next smallest number */
    public int next() {
        TreeNode temp = stack.pop();
        if(temp.right != null)
            pushleft(temp.right);
        return temp.val;
    }
}

/**
 * Your BSTIterator will be called like this:
 * BSTIterator i = new BSTIterator(root);
 * while (i.hasNext()) v[f()] = i.next();
 */

2015年1月3日星期六

[Leetcode] Excel Sheet Column Title

The solution is also available here:https://gist.github.com/pyemma/93a32e641b90288867ae
/*
 * Author: Yang Pei
 * Problem: Excel Sheet Column Title
 * Source: https://oj.leetcode.com/problems/excel-sheet-column-title/
 * 
 * Note:
 * Given a positive integer, return its corresponding column title as appear in an Excel sheet.
 * For example:
 *     1 -> A
 *     2 -> B
 *     3 -> C
 *     ...
 *     26 -> Z
 *     27 -> AA
 *     28 -> AB
 * Solution:
 * Recursive method or iterative method. 
 */
public class ExcelSheetColumnTitle {
 public String convertToTitle(int n) {
  if(n == 0)
   return "";
  return convertToTitle((n-1)/26) + (char)((n-1)%26 + 'A');
 }
 
 public String convertToTitle1(int n) {
  StringBuilder sb = new StringBuilder();
  while(n > 0) {
   sb.append((char)((n - 1) % 26 + 'A'));
   n = (n - 1) / 26;
  }
  sb = sb.reverse();
  return sb.toString();
 }
}

[Leetcode] Compare Version Numbers

The solution is also available here:https://gist.github.com/pyemma/0d6f6368fdcfcb73451d
/*
 * Author: Yang Pei
 * Problem: Compare Version Numbers
 * Source: https://oj.leetcode.com/problems/compare-version-numbers/
 * 
 * Note:
 * Compare two version numbers version1 and version1.
 * If version1 > version2 return 1, if version1 < version2 return -1, otherwise return 0.
 * 
 * You may assume that the version strings are non-empty and contain only digits and the . character.
 * The . character does not represent a decimal point and is used to separate number sequences.
 * For instance, 2.5 is not "two and a half" or "half way to version three", it is the fifth second-level revision of the second first-level revision.
 * 
 * Here is an example of version numbers ordering:
 * 0.1 < 1.1 < 1.2 < 13.37
 * 
 * Solution:
 * Split the version according to the ".", then compare each number. Pay attention to
 * leading zeros. If each version number is within the Integer, we could use Integer.paresInt
 * instead of writing a function to compare.
 * 
 * Corner case: 1.0 and 1, 1.0.1 and 1. In this case, we need to check the array with
 * more split to see if each number is 0 or not. 
 */
public class CompareVersionNumbers {
 public static int compareVersion(String version1, String version2) {
  String[] strs1 = version1.split("\\.");
  String[] strs2 = version2.split("\\.");
  for(int i = 0; i < Math.min(strs1.length, strs2.length); i++) {
   if(compare(strs1[i], strs2[i]) != 0)
    return compare(strs1[i], strs2[i]);
  }
  if(strs1.length < strs2.length) {
   for(int i = strs1.length; i < strs2.length; i++) {
    if(compare("0", strs2[i]) < 0)
     return -1;
   }
   return 0;
  } 
  else if(strs1.length > strs2.length) {
   for(int i = strs2.length; i < strs1.length; i++) {
    if(compare(strs1[i], "0") > 0)
     return 1;
   }
   return 0;
  }
  else
   return 0;
 }
 private static int compare(String num1, String num2) {
  int ind1 = 0;
  while(ind1 < num1.length() && num1.charAt(ind1) == '0')
   ind1++;
  int ind2 = 0;
  while(ind2 < num2.length() && num2.charAt(ind2) == '0')
   ind2++;
  num1 = num1.substring(ind1);
  num2 = num2.substring(ind2);
  if(num1.length() < num2.length())
   return -1;
  else if(num1.length() > num2.length())
   return 1;
  else {
   for(int i = 0; i < num1.length(); i++) {
    if(num1.charAt(i) < num2.charAt(i))
     return -1;
    else if(num1.charAt(i) > num2.charAt(i))
     return 1;
   }
   return 0;
  }
 }
 
 public static void main(String[] args) {
  String version1 = "1.0.1";
  String version2 = "1";
  System.out.println(compareVersion(version1, version2));
 }
}

2015年1月2日星期五

MongoDB Notes Final

Aggregation Introduction

Aggregations are operations that process data records and return computed results.

Aggregation Pipelines
  • Documents enter a multi-stage pipelines that transforms the documents into an aggregated result.
  • consist of stages.
  • some stages take a aggregation expression as input.

Map-Reduce
  • Map, Reduce, Finalize.
  • use custom JavaScript functions to map values to key.

Single Purpose Aggregation Operations
  • returning a count of matching documents
    • collection.count( )
  • returning the distinct values for a field
    • collection.distinct( )

  • grouping data based on the values of a field
    • collection.group( )


Aggregation Pipeline on Sharded Collections
  • The pipeline is split into two parts
    • The first is run on each shard, or exclude some shards through shard key
    • The second is run on primary shard, which collect the cursor from each shard, then forward the final result to mongos

Map-Reduce Example
  • Define the map function to process each input document

  • Define the corresponding function with two arguments

  • Perform the map-reduce on all documents in the orders collection using the map function and reduce function


Replication Introduction

Replication is the process of synchronizing data across multiple severs.
Replication provides redundancy and increases data availability. Also allows you to recover from hardware failure and service interruption. 

A replica set is a group of mongod instances that host the same data. One mongod, called the primary, receives all write operations. All other instances, called secondaries, apply operations form the primary to have the same data. The primary logs all operations to oplog. Only primary could receive write operations, read operations could be received by all members.

The secondaries apply the oplog to themselves. If the primary is unavailable, one of the secondaries would be elected to the new primary. The secondary that receives majority of the votes.

An arbiter could be added to break the draw during the election when there are even number of secondaries. The arbiter does not hold any data and is only used for election. 

An arbiter is always an arbiter, a primary could become a secondary, and a secondary could become a primary.

Each set has at most 12 members and in each election, at most 7 members could vote.

Priority 0 member is a secondary that could not become a primary, could not trigger elections. It could function as a standby.
A hidden member maintains a copy of the primary’s data and invisible to the client applications. It must be priority 0 and could not be the primary.

Delayed member contains copies of a replica sets’ data. It reflects an earlier or delayed state of the set. They must be priority 0 and must be a hidden member.

Architecture 
  • Three member replica sets
    • The minimum architecture of a replica set
  • Replica sets with four or more members
    • ensure the sets have odd number of voting members
  • Geographically distributed replica sets

Failover
Heartbeats: Replica set members send heartbeats(pings) to each other every two seconds. If it does not return within 10 seconds, then this member would mark it as inaccessible.

Members prefer to vote members with high priority.

Optime: the timestamp of the last operation that a member applied form the oplog. A replica set member could not become a primary unless it has the highest optime of any visible member in the set.

A replica set member can not become primary unless it can connect a majority of the members in the set. In a three members architecture, a secondary could not be a primary when the other two are done since it could not connect to a majority number of the members in the set. Also when the two secondaries are done, the primary would down step to a secondary.

Read Preference
  • primary
  • primaryPreferred
  • secondary
  • secondaryPreferred
  • nearest

The oplog is a special capped collection that keeps a rolling record of all operations that modify the data stored in your database. All replica set maintain a copy of oplog. Any member can import oplog entries from any other member.

Data Synchronization
  • Initial Sync: when a member has no data
    • Clones all data.
    • Applies all changes to the data set.
    • Builds all indexes on all collections.
  • Replication: continuously after initial sync