Friday, March 13, 2015

[Repost] What Are P, NP, and NPC Problems?

Original post: http://www.matrix67.com/blog/archives/105
And a respectful bow to the great matrix67, by the way~

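    This may be one of the biggest misconceptions among OIers (Olympiad in Informatics competitors).
    You often see remarks online such as "How do I solve this? Isn't it an NP problem?" or "You can only brute-force this one; it has been proven to be an NP problem." You should know that most people who say "NP problem" in such cases actually mean an NPC problem. They have not sorted out the difference between the two concepts. NP problems are not the kind of problems that "can only be solved by search"; NPC problems are (as far as anyone currently knows). OK, that basically clears up the misunderstanding. The rest of this article explains what P, NP, and NPC problems are; if you are not particularly interested, you can stop here. If you read on, you will see what a big mistake it is to treat NP problems as NPC problems.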

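    First, a few words on time complexity. Time complexity does not describe how much time a program takes to solve a problem; it describes how fast the running time grows as the problem size grows. In other words, for a computer that processes data at high speed, how efficiently a program handles one particular input is not a good measure of the program. What matters is whether, when the input grows a few hundred times larger, the running time stays the same, also grows a few hundred times, or grows tens of thousands of times. If a program takes the same amount of time no matter how large the input is, we say it is very good and has O(1) time complexity, also called constant complexity. If the time grows in proportion to the input size, the program has O(n) time complexity; finding the maximum of n numbers is an example. Algorithms such as bubble sort and insertion sort, which become four times slower when the input doubles, have O(n^2) complexity. Some exhaustive-search algorithms take time that grows geometrically: that is exponential complexity, O(a^n), or even factorial complexity, O(n!). There is no such thing as O(2*n^2) complexity, because the leading "2" is a constant factor and does not affect how the running time grows. Likewise, O(n^3+n^2) is simply O(n^3). Therefore we say that a program costing 0.01*n^3 steps is less efficient than one costing 100*n^2 steps: although the former is faster when n is small, the latter grows more slowly, and eventually the n^3 cost far exceeds the n^2 cost. We also say that O(n^100) is smaller than O(1.01^n).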
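A quick numeric check of the claim about 0.01*n^3 and 100*n^2: solving 0.01n^3 > 100n^2 gives n > 10000. The Java sketch below (the class name GrowthDemo is only for illustration) multiplies both costs by 100 so it can compare them with exact long arithmetic, and searches for the first n at which the cubic cost becomes larger.

```java
// Sketch: compare the two cost functions from the paragraph above,
// f(n) = 0.01 * n^3 and g(n) = 100 * n^2, and find where f overtakes g.
class GrowthDemo {
    // Both costs are multiplied by 100 so the comparison uses exact long arithmetic:
    // 100 * f(n) = n^3 and 100 * g(n) = 10000 * n^2.
    static long scaledCubic(long n) {
        return n * n * n;
    }

    static long scaledQuadratic(long n) {
        return 10_000L * n * n;
    }

    /** Smallest n >= 1 for which 0.01 * n^3 is strictly greater than 100 * n^2. */
    static long crossover() {
        long n = 1;
        while (scaledCubic(n) <= scaledQuadratic(n)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("0.01*n^3 > 100*n^2 starting at n = " + crossover());
        for (long n : new long[] {100, 10_000, 1_000_000}) {
            System.out.printf("n = %d: 0.01*n^3 = %.0f, 100*n^2 = %.0f%n",
                    n, 0.01 * n * n * n, 100.0 * n * n);
        }
    }
}
```

Running main reports the crossover at n = 10001; below that size the asymptotically "worse" O(n^3) program is actually the faster one.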
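    It is easy to see that the complexities above fall into two classes, and those in the second class always end up far larger than those in the first. One class is O(1), O(log(n)), O(n^a), and so on, which we call polynomial complexities because the size n appears in the base (O(log(n)) is bounded by a polynomial, so it belongs here too). The other class is O(a^n) and O(n!), which are non-polynomial and usually more than a computer can handle. When solving a problem, the algorithm we choose usually needs to have polynomial complexity; non-polynomial algorithms take too long and will usually time out unless the input is very small.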

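    Naturally, one wonders: can a polynomial-time algorithm be found for every problem? Unfortunately, the answer is no. For some problems it is impossible to find any correct algorithm at all; these are called "undecidable problems" (Undecidable Decision Problem). The Halting Problem is a famous undecidable problem, which I introduced and proved in a dedicated post on my blog. Another example: output all permutations of the numbers 1 through n. Whatever method you use, the complexity is factorial, because it takes factorial time just to print the result. Some people object that such a "problem" is not a "proper" problem: a proper problem asks a program to output "YES" or "NO" (a decision problem), or the optimal value of something (an optimization problem). Fine; even by that definition I can name a problem that is unlikely to have a polynomial-time algorithm: the Hamilton cycle problem. It goes like this: given a graph, can you find a route that visits every vertex exactly once (missing none and repeating none) and finally returns to the starting point? (A route satisfying this condition is called a Hamilton cycle.) No polynomial-time algorithm has been found for this problem so far. In fact, it is one of the NPC problems we will discuss later.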
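To make "no polynomial algorithm is known" concrete, here is the obvious exhaustive approach to the Hamilton cycle problem as a Java sketch. It assumes the graph is given as a boolean adjacency matrix (the article does not fix a representation; the helper undirected is illustrative) and tries every ordering of the vertices, which in the worst case means (n-1)! orderings.

```java
// Sketch: decide the Hamilton cycle problem by exhaustive backtracking.
// The graph is an adjacency matrix; vertex 0 is fixed as the start of the cycle.
class HamiltonBruteForce {
    /** Builds a symmetric adjacency matrix for an undirected graph (illustrative helper). */
    static boolean[][] undirected(int n, int[][] edges) {
        boolean[][] adj = new boolean[n][n];
        for (int[] e : edges) {
            adj[e[0]][e[1]] = true;
            adj[e[1]][e[0]] = true;
        }
        return adj;
    }

    static boolean hasHamiltonCycle(boolean[][] adj) {
        int n = adj.length;
        if (n == 0) return false;
        boolean[] used = new boolean[n];
        used[0] = true;
        return extend(adj, used, 0, 1);
    }

    // Tries every way to extend the current path; up to (n-1)! orderings in the worst case.
    private static boolean extend(boolean[][] adj, boolean[] used, int last, int visited) {
        int n = adj.length;
        if (visited == n) return adj[last][0]; // all vertices used: can we close the cycle?
        for (int v = 1; v < n; v++) {
            if (!used[v] && adj[last][v]) {
                used[v] = true;
                if (extend(adj, used, v, visited + 1)) return true;
                used[v] = false; // backtrack
            }
        }
        return false;
    }

    public static void main(String[] args) {
        boolean[][] square = undirected(4, new int[][] {{0, 1}, {1, 2}, {2, 3}, {3, 0}});
        boolean[][] path = undirected(4, new int[][] {{0, 1}, {1, 2}, {2, 3}});
        System.out.println(hasHamiltonCycle(square)); // true
        System.out.println(hasHamiltonCycle(path));   // false
    }
}
```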

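    Now let us introduce the class P. If an algorithm can be found that solves a problem in polynomial time, the problem belongs to P. P is the first letter of the word "polynomial". Which problems are in P? NOI and NOIP (China's National Olympiad in Informatics and its provincial-level contest) generally do not set problems outside P, and the olympiad problems we usually see are all in P. The reason is simple: a non-polynomial program that times out because of brute-force enumeration does not embody any algorithm of real value.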
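    Next comes the concept of NP. This one is harder to understand, or rather, easy to misunderstand. Let me stress here (returning to the misconception I am trying so hard to clear up) that NP does not mean "not P". An NP problem is a problem for which a proposed solution can be verified in polynomial time. Another definition of NP is: a problem for which a solution can be guessed in polynomial time. For example, suppose my RP (OIer slang for luck) is great, and whenever a program needs to enumerate, I guess right every time. Now someone has a shortest-path problem: is there a route from the start to the destination shorter than 100 units? He draws the graph from the data but cannot work it out, so he comes to me: "Which route should I take to travel the least?" I say, "My luck is great; I can point out a short route for you at random." Then I scribble a few lines and say, "Take this one." He adds up the weights along my route and, wow, the path length is 98, less than 100. So the answer is found: a path shorter than 100 exists. If others ask him how he solved it, he can say, "Because I found a solution shorter than 100." The point of this example is that finding a solution may be hard, but verifying a solution is easy. (Shortest path itself can in fact be solved in polynomial time, for example with Dijkstra's algorithm; here it only illustrates the guess-and-verify idea.) Verifying a solution takes only O(n) time: I can spend O(n) time adding up the length of the path I guessed. So, as long as my luck holds and my guesses are right, I can certainly solve this problem in polynomial time. The option I guess is always the best one, and options that do not satisfy the requirement never fool me into choosing them. That is an NP problem. Of course, there are problems that are not in NP, where guessing a solution does not help because you cannot verify it in polynomial time. The example I am about to give is a classic one: a problem for which there is currently no known way to verify a solution in polynomial time. Clearly, the Hamilton cycle problem above is in NP, because checking whether a route visits every vertex exactly once is very easy. But let me change the question to: does the graph have no Hamilton cycle? Now there is no known way to verify an answer in polynomial time, because unless you have tried every route, you cannot be sure that the graph "has no Hamilton cycle".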
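The "easy to verify" half of the definition, sketched in Java for the Hamilton cycle problem. Given the graph (again assumed to be a boolean adjacency matrix) and a guessed vertex order, the certificate, checking the guess takes O(n) time. Nothing similar is known for the "no Hamilton cycle" version, because there is no obvious short certificate to check.

```java
// Sketch: a polynomial-time verifier for the Hamilton cycle problem.
// Given a graph and a guessed vertex order (the "certificate"), check it in O(n) time.
class HamiltonVerifier {
    static boolean isHamiltonCycle(boolean[][] adj, int[] order) {
        int n = adj.length;
        if (n == 0 || order.length != n) return false;
        boolean[] seen = new boolean[n];
        for (int v : order) {
            if (v < 0 || v >= n || seen[v]) return false; // out of range or visited twice
            seen[v] = true;
        }
        // every consecutive pair, including last -> first, must be joined by an edge
        for (int i = 0; i < n; i++) {
            if (!adj[order[i]][order[(i + 1) % n]]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        boolean[][] adj = new boolean[4][4];
        int[][] edges = {{0, 1}, {1, 2}, {2, 3}, {3, 0}};
        for (int[] e : edges) {
            adj[e[0]][e[1]] = true;
            adj[e[1]][e[0]] = true;
        }
        System.out.println(isHamiltonCycle(adj, new int[] {0, 1, 2, 3})); // true
        System.out.println(isHamiltonCycle(adj, new int[] {0, 2, 1, 3})); // false: 0-2 is not an edge
    }
}
```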
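    The reason for defining NP is that, as a rule, only NP problems can possibly have polynomial-time algorithms. We would not expect a polynomial-time algorithm for a problem whose solutions cannot even be verified in polynomial time. Readers will soon see that the so-called hardest problem in computer science, the "NP problem", is really a question about the relationship between NP and P.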

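    Clearly, every problem in P is in NP. That is, if a problem can be solved in polynomial time, a proposed solution to it can certainly be verified in polynomial time: once you have the correct answer, verifying any given solution only takes a comparison. The key question is whether every NP problem is in P. We can put this in terms of sets. If we put all P problems into a set P and all NP problems into another set NP, then clearly P ⊆ NP. All research on the NP question now centers on one thing: is P = NP? The so-called "NP problem" comes down to one sentence: prove or disprove P = NP.
    The NP question has always been the summit of computer science. "Summit" here means something that attracts enormous attention but is very hard to reach. In computer science research it is an ultimate problem that has consumed a great deal of time and effort without being solved, much like the grand unified theory in physics or Goldbach's conjecture in mathematics.
    So far the problem is still too tough to crack. But there is a general trend, a broad direction. It is widely believed that P = NP does not hold; that is, most people believe there exists at least one NP problem that cannot have a polynomial-time algorithm. People are so convinced that P ≠ NP because, in studying NP problems, a very special class of NP problems was discovered: the NP-complete problems, also called NPC problems. C is the first letter of "complete". It is precisely the existence of NPC problems that makes people believe P ≠ NP. The following sections spend a lot of space on NPC problems, and from them you can get a sense of how incredible they make P = NP seem.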


    To explain NPC problems, we first need the concept of reduction (reducibility).
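    Simply put, "problem A can be reduced to problem B" means that problem A can be solved using a method for problem B, or that problem A can be "turned into" problem B. Introduction to Algorithms gives the following example. Suppose there are two problems: solving a linear equation in one variable and solving a quadratic equation in one variable. We say the former can be reduced to the latter, meaning that if you know how to solve a quadratic equation, you can certainly solve a linear one. We can write a program for each problem and then find a "rule" that transforms the input of the linear-equation program so it can be fed to the quadratic-equation program, such that the two programs always produce the same result. The rule is: keep the corresponding coefficients unchanged and set the quadratic coefficient to 0. Transforming the first problem into the second by this rule makes the two problems equivalent. Similarly, the Hamilton cycle problem can be reduced to the TSP (Travelling Salesman Problem): for every pair of vertices that are connected in the Hamilton cycle instance, set their distance to 0, and for every pair that are not directly connected, set their distance to 1; the question then becomes whether the TSP instance has a tour of length 0. A Hamilton cycle exists if and only if the TSP instance has a tour of length 0.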
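The Hamilton-to-TSP reduction can be written down directly. In this Java sketch, reduce is the polynomial-time transformation (O(n^2) for an adjacency matrix), and shortestTour is a brute-force stand-in for a TSP solver, included only so the equivalence can be checked on tiny graphs. The class and method names are illustrative.

```java
// Sketch: the Hamilton-cycle-to-TSP reduction, plus a brute-force TSP solver
// used only to check the reduction on tiny graphs.
class HamiltonToTsp {
    /** The reduction itself: edge -> distance 0, non-edge -> distance 1. Runs in O(n^2). */
    static int[][] reduce(boolean[][] adj) {
        int n = adj.length;
        int[][] dist = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                dist[i][j] = adj[i][j] ? 0 : 1;
            }
        }
        return dist;
    }

    /** Stand-in TSP solver (exponential, tiny inputs only); assumes n >= 1. */
    static int shortestTour(int[][] dist) {
        boolean[] used = new boolean[dist.length];
        used[0] = true;
        return best(dist, used, 0, 1, 0);
    }

    private static int best(int[][] dist, boolean[] used, int last, int visited, int length) {
        int n = dist.length;
        if (visited == n) return length + dist[last][0]; // close the tour
        int result = Integer.MAX_VALUE;
        for (int v = 1; v < n; v++) {
            if (!used[v]) {
                used[v] = true;
                result = Math.min(result, best(dist, used, v, visited + 1, length + dist[last][v]));
                used[v] = false;
            }
        }
        return result;
    }

    /** A Hamilton cycle exists iff the reduced instance has a tour of length 0. */
    static boolean hasHamiltonCycle(boolean[][] adj) {
        return shortestTour(reduce(adj)) == 0;
    }

    public static void main(String[] args) {
        boolean[][] path = new boolean[4][4]; // the path 0-1-2-3 has no Hamilton cycle
        for (int i = 0; i < 3; i++) {
            path[i][i + 1] = true;
            path[i + 1][i] = true;
        }
        System.out.println(shortestTour(reduce(path))); // 1: every tour needs one non-edge
    }
}
```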
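    "Problem A reduces to problem B" has an important intuitive meaning: B's time complexity is greater than or equal to A's. That is, problem A is no harder than problem B. This is easy to understand. Since A can be solved by means of B, if B's time complexity were lower than A's, A's algorithm could be replaced by B's algorithm, and the two would again have the same complexity. In the same way, solving a quadratic equation is at least as hard as solving a linear one, because the method for the former can be used to solve the latter.
    Clearly, reduction has an important property: it is transitive. If problem A reduces to problem B and problem B reduces to problem C, then problem A reduces to problem C. The reason is very simple (apply the two input transformations one after the other), so I will not belabor it.
    Now the standard definition of reduction is easy to understand: if there is a transformation rule that converts any input of program A into an input of program B such that the two programs produce the same output, then we say problem A reduces to problem B.
    Of course, by "reducible" we mean "polynomial-time reducible" (Polynomial-time Reducible), that is, the input transformation can be carried out in polynomial time. A reduction is only meaningful if it takes polynomial time.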

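    Well, from the definition of reduction we see that when one problem is reduced to another, the time complexity increases and the range of applications widens. By repeatedly reducing problems, we keep finding algorithms that are more complex but more widely applicable, to replace algorithms that are simpler but apply only to a small class of problems. Now recall P and NP, and the transitivity of reduction. Naturally we want to ask: if we keep reducing upward, repeatedly finding a slightly harder, larger NP problem that "swallows" several smaller NP problems, could we eventually find a super NP problem of the highest complexity that "swallows" every NP problem? Amazingly, the answer is yes. That is, there exists an NP problem to which every NP problem can be reduced. In other words, solve that one problem and you have solved every NP problem. The existence of such a problem is hard to believe, and even more incredibly, there is not just one such problem but many; they form a whole class. This class is the legendary NPC problems, the NP-complete problems. The discovery of NPC problems made the study of NP leap forward. We have good reason to believe that NPC problems are the hardest problems in NP. Going back to the beginning of the article, we can now see that when people want to say a problem has no efficient polynomial-time algorithm, they should say it is "NP-complete". At this point my goal is finally achieved: I have distinguished NP problems from NPC problems. This article has now run to nearly 5,000 Chinese characters; I admire you for reading this far, and I admire myself a little for writing this far.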

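    The definition of NPC is very simple. A problem is NPC if it satisfies both of the following conditions. First, it must be an NP problem; second, every NP problem must reduce to it. Proving a problem NPC is also simple: first prove that it is at least an NP problem, then prove that some known NPC problem reduces to it (by transitivity of reduction, the second condition of the definition is then satisfied; how the first NPC problem came about is described below). Then you can say it is NPC.
    Since every NP problem reduces to an NPC problem, if any single NPC problem were found to have a polynomial-time algorithm, then every NP problem could be solved by that algorithm, and NP would equal P. That is why finding a polynomial algorithm for an NPC problem seems so implausible, and why I said earlier that "it is precisely the existence of NPC problems that makes people believe P ≠ NP". Intuitively, then, NPC problems currently have no efficient polynomial-time algorithm and can only be solved by searches of exponential or even factorial complexity.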

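    While we are at it, a word about NP-Hard problems. An NP-Hard problem satisfies the second condition of the NPC definition but not necessarily the first (that is, NP-Hard is a broader class than NPC). It is likewise hard to find polynomial-time algorithms for NP-Hard problems, but they are outside our scope because they are not necessarily in NP. Even if a polynomial-time algorithm were found for the NPC problems, NP-Hard problems might still have no polynomial-time algorithm. In fact, because NP-Hard relaxes the requirements, an NP-Hard problem may have higher complexity than every NPC problem and thus be even harder to solve.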

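    Don't think that NPC problems are just talk on paper. NPC problems really exist: there is a very concrete problem that is NPC, and I will introduce it next.
    That problem is the logic circuit problem. It is the first NPC problem, and the other NPC problems were obtained from it by reduction, so the logic circuit problem is the "ancestor" of the NPC class. (This follows the order in Introduction to Algorithms; historically, the Cook-Levin theorem first proved NP-completeness for satisfiability of Boolean formulas, SAT.)
    The logic circuit problem is this: given a logic circuit, is there an input that makes the output True?
    What is a logic circuit? A logic circuit consists of several inputs, one output, several "logic gates", and a tangle of wires. Look at the example below and you will understand it immediately, without any explanation.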
  Input 1 ---->+------+
               |  OR  |------------->+-------+
  Input 2 --+->+------+              |       |
            |                        |  AND  |----> Output
            +----------------------->|       |
  Input 3 ---->[ NOT ]-------------->|       |
                                     +-------+

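    This is a fairly simple logic circuit. It computes Output = (Input 1 OR Input 2) AND Input 2 AND (NOT Input 3), so the output is True when Inputs 1, 2, and 3 are True, True, False or False, True, False.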
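The circuit above can be written as a Boolean function. This Java sketch assumes the gate structure read from the diagram, (Input 1 OR Input 2) AND Input 2 AND (NOT Input 3), and enumerates all 2^3 inputs. Checking one input (evaluating the circuit) is fast; it is the enumeration that grows exponentially with the number of inputs.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the first example circuit as a Boolean function, plus a brute-force
// search for inputs that make it output True.
class CircuitSat {
    // Gate structure as read from the diagram: (in1 OR in2) AND in2 AND (NOT in3).
    static boolean circuit(boolean in1, boolean in2, boolean in3) {
        boolean orGate = in1 || in2;
        boolean notGate = !in3;
        return orGate && in2 && notGate; // three-input AND
    }

    /** Tries all 2^3 input combinations; for k inputs this would be 2^k. */
    static List<String> satisfyingInputs() {
        List<String> found = new ArrayList<>();
        for (int mask = 0; mask < 8; mask++) {
            boolean in1 = (mask & 4) != 0;
            boolean in2 = (mask & 2) != 0;
            boolean in3 = (mask & 1) != 0;
            if (circuit(in1, in2, in3)) {
                found.add(in1 + "," + in2 + "," + in3);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(satisfyingInputs()); // [false,true,false, true,true,false]
    }
}
```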
    Is there a logic circuit whose output can never be True? Yes. Here is a simple example.
  Input 1 ---->+-------+
               |  AND  |------------->+-------+
            +->+-------+              |       |
            |                         |  AND  |----> Output
  Input 2 --+---->[ NOT ]------------>|       |
                                      +-------+

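    In the circuit above, the output is False no matter what the inputs are, since it computes (Input 1 AND Input 2) AND (NOT Input 2). We say that this circuit has no input that makes the output True.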
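The same exhaustive check for the second circuit, again as a Java sketch whose gate structure, (Input 1 AND Input 2) AND (NOT Input 2), is read from the diagram. Every one of the four input combinations yields False, so isSatisfiable returns false.

```java
// Sketch: the second example circuit, checked exhaustively over all inputs.
class UnsatCircuit {
    // Gate structure as read from the diagram: (in1 AND in2) AND (NOT in2).
    static boolean circuit(boolean in1, boolean in2) {
        boolean firstAnd = in1 && in2;
        boolean notGate = !in2;
        return firstAnd && notGate;
    }

    /** Returns true if some input combination makes the output True. */
    static boolean isSatisfiable() {
        for (int mask = 0; mask < 4; mask++) {
            if (circuit((mask & 2) != 0, (mask & 1) != 0)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isSatisfiable()); // false
    }
}
```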
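    Back to the question: given a logic circuit, is there an input that makes the output True? That is the logic circuit problem.
    The logic circuit problem is NPC, and this has a rigorous proof. It is clearly in NP, and one can prove directly that every NP problem reduces to it (don't assume that the infinitely many NP problems make such a proof impossible). The proof is quite involved; roughly, the input and output of any NP problem can be converted into the inputs and output of a logic circuit (think of how a computer is, internally, nothing but operations on 0s and 1s), so for any NP problem, the task becomes finding an input that makes the result True (that is, a feasible solution).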

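    Once the first NPC problem was known, a whole pile of NPC problems followed, because proving a new problem NPC only requires reducing a known NPC problem to it. Later, the Hamilton cycle problem was shown to be NPC, and so was TSP. Many problems have now been proven NPC, and if a polynomial algorithm were found for any one of them, every NP problem could be solved perfectly. So it is precisely because NPC problems exist that P = NP seems so hard to believe. There are many more interesting things about the P = NP question, waiting for you to dig into them yourself. Climbing this summit of computer science is the ultimate goal of our generation. What we need to do now is, at the very least, not mix up the concepts.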

Wednesday, February 4, 2015

MongoDB week4 Notes

To use a compound index, a query must specify a leftmost prefix of the index's keys; the order of the keys in the index matters.
db.students.ensureIndex( { student_id : 1 } )     -> create the index on student_id in increasing order
db.students.ensureIndex( { student_id : 1, class : -1 } )  -> create a compound index
db.system.indexes.find( )    -> find all the indexes in the current database; every collection has a default index on the _id field
db.students.getIndexes( )
db.students.dropIndex( { student_id : 1 } )  -> drop the created index
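A toy model of the leftmost-prefix rule, in Java. This is a simplification for illustration, not MongoDB's query planner: it only counts how many leading keys of a compound index are covered by the fields a query filters on. With the index { student_id : 1, class : -1 }, a query on student_id can use the index, but a query on class alone cannot.

```java
import java.util.List;
import java.util.Set;

// Simplified model (not MongoDB's actual planner): how many leading keys of a
// compound index are constrained by the fields a query filters on.
class IndexPrefix {
    static int usablePrefixLength(List<String> indexKeys, Set<String> queryFields) {
        int length = 0;
        for (String key : indexKeys) {
            if (!queryFields.contains(key)) {
                break; // the prefix stops at the first index key the query does not use
            }
            length++;
        }
        return length;
    }

    public static void main(String[] args) {
        List<String> index = List.of("student_id", "class"); // like { student_id : 1, class : -1 }
        System.out.println(usablePrefixLength(index, Set.of("student_id")));          // 1
        System.out.println(usablePrefixLength(index, Set.of("class")));               // 0: not a leftmost prefix
        System.out.println(usablePrefixLength(index, Set.of("student_id", "class"))); // 2
    }
}
```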
MongoDB allows creating an index on a field whose value is an array; such an index is called a multikey index.
MongoDB allows a compound index on one array field plus scalar fields, but not on two array fields (a document cannot have arrays in more than one indexed field).
db.stuff.ensureIndex( { thing : 1 }, { unique : true } )  -> create a unique index, each key can only appear once
db.stuff.ensureIndex( { thing : 1 }, { unique : true, dropDups : true } )  -> drop the duplicates except for one (dropDups was removed in MongoDB 3.0)
Sparse index: only indexes the documents that contain the indexed field
To decide which index to use for a query, MongoDB runs the candidate plans in parallel on the real data, picks the one that finishes first, and remembers that choice for similar queries
db.students.stats( )
Index cardinality (number of index entries vs. number of documents)
  • Regular     1 : 1
  • Sparse      <= number of documents
  • Multikey    can be > number of documents (one entry per array element)
Use hint( ) to manually tell MongoDB what index to use
ensureIndex( { "location" : "2d" } )  -> 2D geospatial index
find( { location : { $near : [x, y] } } )
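db.places.find( { location : { $near : {
                      $geometry : { type : "Point", coordinates : [x, y] },
                      $maxDistance : 2000
                  } } } )
-> GeoJSON query: needs a "2dsphere" index, coordinates are [longitude, latitude], $maxDistance is in meters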
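What $near with $maxDistance returns can be modeled in plain Java: the documents within the given distance, nearest first. This sketch uses the haversine formula on a sphere; the earth-radius constant, the Place class and the near method are assumptions for illustration, and MongoDB's own geometry handling differs in detail.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified model of $near + $maxDistance on GeoJSON points: keep places within
// maxDistance meters of (lon, lat) and order them nearest first.
class NearQuery {
    static final double EARTH_RADIUS_METERS = 6_378_100.0; // assumed sphere radius

    static final class Place {
        final String name;
        final double lon, lat;

        Place(String name, double lon, double lat) {
            this.name = name;
            this.lon = lon;
            this.lat = lat;
        }
    }

    // Great-circle distance between two [longitude, latitude] points (haversine formula).
    static double distanceMeters(double lon1, double lat1, double lon2, double lat2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a));
    }

    static List<Place> near(List<Place> places, double lon, double lat, double maxDistance) {
        List<Place> result = new ArrayList<>();
        for (Place p : places) {
            if (distanceMeters(lon, lat, p.lon, p.lat) <= maxDistance) {
                result.add(p); // like $maxDistance: drop anything farther away
            }
        }
        // like $near: nearest first
        result.sort(Comparator.comparingDouble(p -> distanceMeters(lon, lat, p.lon, p.lat)));
        return result;
    }
}
```

Near the equator, 0.01 degrees is roughly 1.1 km, so with a 2000 m limit a place 0.01 degrees north of the query point is returned while one 0.1 degrees away is not.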
db.sentences.ensureIndex( { 'words' : 'text' } ) -> support full text search
db.sentences.find( { $text : { $search : 'dog moss' } } )
use mongotop to see where (in which collections) MongoDB spends most of its time
mongostat
idx miss -> how many times an index was not in memory when it was needed, an important factor

Sharding: split a large data set across several mongod instances (the shards), run mongos as a router, and let the application talk only to mongos. mongos uses the shard key to decide which shards receive a query. An insert must contain the entire shard key. For update and remove queries, if the shard key is not given, mongos broadcasts the query to all shards.
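A toy Java model of the routing rule above. Everything here is hypothetical: an integer shard key, chunk ranges identified by their inclusive lower bounds, and made-up shard names. Real mongos routing relies on chunk metadata kept by the config servers.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Simplified model of how mongos routes operations with range-based sharding.
class MongosRouter {
    private final TreeMap<Integer, String> chunkStart = new TreeMap<>(); // inclusive lower bound -> shard
    private final Set<String> shards = new TreeSet<>();

    void addChunk(int lowerBoundInclusive, String shard) {
        chunkStart.put(lowerBoundInclusive, shard);
        shards.add(shard);
    }

    /** Shards contacted for an operation; a null key means the query has no shard key. */
    Set<String> route(Integer shardKey) {
        if (shardKey == null) {
            return new TreeSet<>(shards); // no shard key: broadcast to every shard
        }
        Map.Entry<Integer, String> chunk = chunkStart.floorEntry(shardKey);
        if (chunk == null) {
            chunk = chunkStart.firstEntry(); // keys below the first boundary go to the first chunk
        }
        Set<String> target = new TreeSet<>();
        target.add(chunk.getValue()); // targeted: only the shard owning this key range
        return target;
    }
}
```

For example, with chunks starting at Integer.MIN_VALUE (shardA), 100 (shardB) and 200 (shardC), a query with shard key 150 goes only to shardB, while a query without a shard key goes to all three.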

MongoDB week3 Notes

Prefer embedding data (pre-joining it), since MongoDB does not provide joins.
MongoDB does not enforce consistency constraints such as foreign keys, so pre-joining the data into one document helps keep it intact and consistent.

One-to-one relationships
  • link by _id (true linking)
  • embed the document
Things to consider
  • frequency of access 
  • size of items, growing
  • atomicity of data
One-to-many relationships
  • the best way is to use true linking ( the people living in a city ), when the “many” is large
  • if the data is small, we can embed the documents ( the blog schema with its comments ), when the “many” is few
Many-to-many relationships
  • the actual relationships are few-to-few, then we could embed an array of ids to link the two documents
  • another way is to embed documents, but this may not be applicable in some situations; for example, in a student-teacher relationship, we may insert a teacher into the system before he has any students
Multikey indexes: an index on an array field, which makes it efficient to query many-to-many relations stored as embedded arrays of links in MongoDB

Benefits of embedding
  • Improved read performance, reduce the seek latency since the document is stored sequentially on disk
  • One roundtrip to the DB

Tree representations
  • embed a list of children in the document
  • embed a list of ancestors in the document

Storing files larger than 16 MB (the maximum BSON document size): use GridFS, which breaks large blobs into pieces stored in MongoDB. GridFS uses two collections: the chunks collection (fs.chunks by default), where each document holds one piece of the file (255 KB by default), and the files collection (fs.files), whose documents describe each stored file. Each document in the chunks collection has a files_id that refers to its file's document in the files collection.
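A sketch of the chunking step in Java: the file's bytes are cut into pieces of at most chunkSize bytes, and each piece records the file's id and its sequence number n, mirroring the files_id and n fields of documents in the chunks collection. The Chunk class and split method are illustrative, not part of any driver.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of how GridFS splits a file into chunk documents (illustrative names only).
class GridFsChunker {
    static final int DEFAULT_CHUNK_SIZE = 255 * 1024; // GridFS default chunk size: 255 KB

    static final class Chunk {
        final Object filesId; // _id of the file's document in the files collection
        final int n;          // sequence number of this chunk, starting at 0
        final byte[] data;

        Chunk(Object filesId, int n, byte[] data) {
            this.filesId = filesId;
            this.n = n;
            this.data = data;
        }
    }

    static List<Chunk> split(Object filesId, byte[] file, int chunkSize) {
        List<Chunk> chunks = new ArrayList<>();
        int n = 0;
        for (int offset = 0; offset < file.length; offset += chunkSize) {
            int end = Math.min(file.length, offset + chunkSize); // last chunk may be shorter
            chunks.add(new Chunk(filesId, n++, Arrays.copyOfRange(file, offset, end)));
        }
        return chunks;
    }
}
```

A 600 KB file, for instance, becomes three chunks: two full 255 KB pieces and a final 90 KB piece.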

ODM (Object Document Mapper): sits between the application and the driver; you tell the ODM how to handle your classes and hand objects to it, and it interacts with the driver




 

Wednesday, January 21, 2015

MongoDB week2 Notes

MongoDB’s CRUD operations exist as methods/functions in programming language APIs, not as a separate language.

db -> current database
db.people.insert( ) -> insert into a collection
_id -> the unique field of every document inserted into the database; it is the primary key and it is immutable
ObjectId is a globally unique identifier, used as the default value of _id
db.people.findOne( ) -> return one document (an arbitrary one)
db.people.findOne( ) || db.people.find( )
  • the first argument specifies the criteria to match, like the WHERE clause
  • the second argument specifies which fields to return, like the SELECT clause
db.people.find( ) -> find all documents in the people collection
db.people.find( ).pretty( ) -> format the result for readability

db.people.find( { score : { $gt : 95 } } ) -> query operator
db.people.find( { profession : { $exists : true } } ); -> query on the structure of the document
db.people.find( { name : { $type : 2 } } ); -> query on the type of a field (2 = string)
db.people.find( { name : { $regex : "a" } } ); -> regular expression matching on strings
{ $or : [query1, query2, … , queryn] }
{ $and : [query1, query2, … , queryn] }
db.accounts.find( { favorites : "beer" } ) -> matches if the array contains the value; only top-level elements are checked, with no recursion into nested sub-documents
db.accounts.find( { favorites : { $all : [ "beer", "pretzels" ] } } ) -> favorites contains all elements in the array, in any order
db.accounts.find( { name : { $in : [ "xxx", "yyy" ] } } ) -> documents whose name is in the array, either xxx or yyy
db.users.find( { "email.work" : "xxxx" } ) -> dot notation, allows querying fields of embedded documents

cursor.hasNext( ) -> returns true as long as there is another document to visit on this cursor
cursor.next( ) -> returns the next document to be visited
cursor.limit( 5 ) -> limit the number of documents the cursor returns; the server is told the limit when the cursor starts iterating
cursor.sort( { name : -1 } ) || cursor.skip( )
We cannot modify the cursor once we have called hasNext( ) or next( ). limit, sort and skip are executed on the server side, not the client side.
The server applies them in this order: sort -> skip -> limit
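The sort -> skip -> limit order can be mimicked with Java streams. This is only an analogy for what the server does, using a hypothetical list of names: sort( { name : -1 } ), then skip 1, then limit 2.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Analogy only: the server applies sort, then skip, then limit, regardless of the order
// in which the cursor methods were chained in the shell.
class CursorOrder {
    static List<String> apply(List<String> names, int skip, int limit) {
        return names.stream()
                .sorted(Comparator.reverseOrder()) // sort( { name : -1 } )
                .skip(skip)                        // skip( n )
                .limit(limit)                      // limit( n )
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(apply(List.of("b", "d", "a", "c", "e"), 1, 2)); // [d, c]
    }
}
```

Sorted descending, the names are e, d, c, b, a; skipping one leaves d, c, b, a, and the limit keeps the first two.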

db.scores.count( { xxx : yyy } ) -> count the matching documents
db.people.update( { name : "Smith" }, { name : "Tomas", Salary : 50000 } ) -> the document whose name is Smith is replaced entirely by the second argument, a new document
db.people.update( { name : "Smith" }, { $set : { name : "Tomas" } } ) -> update only that field; if the field does not exist, it is created
use $inc to increment the value of a specific field
db.people.update( { name : "Smith" }, { $unset : { professional : 1 } } ) -> remove a field from a document
db.array.update( { xxx : yyy }, { $set : { "array.index" : zzz } } ) -> use dot notation to specify which array element to change
use $push to append an element at the rightmost end of the array
use $pop to remove the rightmost element of the array ( { $pop : { array : 1 } }; -1 removes the leftmost one )
use $pushAll to append all elements of an array at the rightmost end (deprecated in favor of $push with $each)
use $pull to remove an element from the array regardless of its position
use $pullAll to remove a list of elements from the array
use $addToSet to treat the array as a set: if the element already exists, it does nothing
db.people.update( { }, { }, { upsert : true } ) -> insert a new document if no document matches
db.people.update( { }, { }, { multi : true } ) -> update multiple documents
db.people.remove( { } ) -> remove all documents that match the criteria (an empty criteria removes every document)
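A small in-memory model of the array operators on a Java List, to pin down their semantics; it is an illustration, not how the server implements them. The method names mirror the operators: $push appends, $pop removes from the end (1) or the front (-1), $pull removes every matching element, and $addToSet appends only when the value is absent.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified in-memory model of the array update operators, applied to a Java List.
class ArrayOps {
    static <T> void push(List<T> arr, T value) {        // $push: append at the end
        arr.add(value);
    }

    static <T> void pop(List<T> arr, int direction) {   // $pop: 1 removes the last element, -1 the first
        if (arr.isEmpty()) return;
        arr.remove(direction == 1 ? arr.size() - 1 : 0); // int argument: remove by index
    }

    static <T> void pull(List<T> arr, T value) {        // $pull: remove every element equal to value
        arr.removeIf(x -> x.equals(value));
    }

    static <T> void addToSet(List<T> arr, T value) {    // $addToSet: append only if not already present
        if (!arr.contains(value)) arr.add(value);
    }

    public static void main(String[] args) {
        List<Integer> a = new ArrayList<>(List.of(1, 2, 3));
        push(a, 4);      // [1, 2, 3, 4]
        pop(a, 1);       // [1, 2, 3]
        pop(a, -1);      // [2, 3]
        addToSet(a, 2);  // unchanged: [2, 3]
        addToSet(a, 5);  // [2, 3, 5]
        push(a, 3);      // [2, 3, 5, 3]
        pull(a, 3);      // [2, 5]
        System.out.println(a);
    }
}
```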

Nodejs

var MongoClient = require('mongodb').MongoClient;
MongoClient.connect( 'connect string here', function( err, db ) { } )
db.collection( 'collection name' ).findOne( query, function( err, doc ) { } )
db.collection( 'collection name' ).find( query ).toArray( function( err, docs ) { } )
var cursor = db.collection( 'collection name' ).find( query );
cursor.each( function( err, doc ) { } )
.find( ) creates a cursor object; only when the cursor calls .each( ) or .toArray( ) does it start retrieving data from the database, and the database returns the result in batches rather than all at once
db.collection( 'collection name' ).find( query, projection )
cursor.sort( [ [ 'grade', 1 ], [ 'student', -1 ] ] ) -> use an array so the order of the sort keys is preserved
db.collection( 'collection name' ).insert( doc, function( err, inserted ) { } )
db.collection( 'collection name' ).update( query, operator, options, function( err, updated ) { } )
we cannot mix $operators with normal fields in the update document
db.collection( 'collection name' ).save( doc, function( err, saved ) { } ) -> checks whether a document with the doc's _id exists; if not, a new document is inserted, otherwise the existing one is replaced
findAndModify( query, sort, operator, options, callback ) -> atomically finds, modifies and returns the document, so no two clients can conflict on that document
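The lazy, batched retrieval described for .find( ) can be modeled with a Java Iterator. In this sketch the "server" is just a List, the batch size is a parameter, and fetches counts the simulated round trips; nothing is fetched until the caller starts iterating. The class name BatchCursor is hypothetical and does not reflect the driver's internals.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Toy model of a driver cursor: results are pulled from the "server" in batches,
// and only when the caller iterates.
class BatchCursor<T> implements Iterator<T> {
    private final List<T> serverResults; // stands in for the matching documents on the server
    private final int batchSize;         // assumed to be positive
    private List<T> buffer = new ArrayList<>();
    private int bufferIndex = 0;
    private int fetchedUpTo = 0;         // how many results have been transferred so far
    int fetches = 0;                     // simulated round trips

    BatchCursor(List<T> serverResults, int batchSize) {
        this.serverResults = serverResults;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        if (bufferIndex < buffer.size()) return true;
        if (fetchedUpTo >= serverResults.size()) return false;
        // buffer exhausted: fetch the next batch
        int end = Math.min(serverResults.size(), fetchedUpTo + batchSize);
        buffer = new ArrayList<>(serverResults.subList(fetchedUpTo, end));
        bufferIndex = 0;
        fetchedUpTo = end;
        fetches++;
        return true;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return buffer.get(bufferIndex++);
    }

    public static void main(String[] args) {
        List<Integer> results = new ArrayList<>();
        for (int i = 0; i < 10; i++) results.add(i);
        BatchCursor<Integer> cursor = new BatchCursor<>(results, 4);
        System.out.println("fetches before iterating: " + cursor.fetches);
        while (cursor.hasNext()) cursor.next();
        System.out.println("fetches after iterating: " + cursor.fetches);
    }
}
```

With 10 results and a batch size of 4, the count is 0 before iterating and 3 afterwards (batches of 4, 4 and 2).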

Java 

All methods take a DBObject, which represents a document; the usual implementation is BasicDBObject.
MongoClient client = new MongoClient();
DB courseDB = client.getDB("xxx");

DBCollection collection = courseDB.getCollection("xxx");

MongoDB week1 Notes

MongoDB is a non-relational data store for JSON documents.
A JSON document looks like { key : value }, and values can be nested (hierarchical).
MongoDB is also schemaless.
MongoDB tries to maintain scalability and performance while still providing a lot of functionality.
  • MongoDB does not support joins (at the time; the $lookup aggregation stage was added in MongoDB 3.2)
  • MongoDB also does not support transactions (at the time; multi-document transactions were added in MongoDB 4.0)
MongoDB continuously listens for connections and expects BSON data; a wire protocol defines how this data is exchanged. A MongoDB driver is a library in a specific language that communicates with MongoDB.

app.get(url, function (req, res) {}) -> tell Express how to respond to GET requests for url.
app.get('*', function (req, res) {}) -> '*' is a wildcard: any request not handled by an earlier route is handled here.
var cons = require('consolidate')
app.engine('html', cons.swig) -> set the template engine for .html files in Express.
app.set('view engine', 'html')
app.set('views', __dirname + '/views')


There are typically two kinds of structures in JSON: arrays [ ] and dictionaries { } (associative maps).

Tuesday, January 20, 2015

Search the missing element in Arithmetic Progression

/*
 * Author: Yang Pei
 * Problem: Search the missing element in Arithmetic Progression
 * Source: http://www.geeksforgeeks.org/find-missing-number-arithmetic-progression/
 * 
 * Note:
 * Given an arithmetic progression, find the missing element in it. Assume that exactly
 * one element is missing (and it is neither the first nor the last element).
 * 
 * Solution:
 * The naive method is to sweep the entire array to find the missing element.
 * Better: binary search the array. Pay attention to how the search space is cut.
 * 
 * Follow up:
 * Given a sorted array of the numbers 1 ... N in which m numbers are missing,
 * find all the missing numbers.
 * Since missing numbers could also appear at either end of the array, we need
 * to check whether the gaps inside the array account for all m of them; if not,
 * the remaining missing numbers are at the two ends.
 */
import java.util.*;
public class SearchinArithmeticProgression {
    public static int findMissing(int[] A) {
        int n = A.length;
        int diff = (A[n-1] - A[0]) / n;
        int l = 0, r = n-1;
        while(l < r) {
            if(r - l == 1)
                return A[r] - diff;
            int mid = l + (r - l) / 2;
            int temp = (mid - l) * diff + A[l];
            if(A[mid] == temp)
                l = mid;
            else
                r = mid;
        }
        return A[r] - diff;
    }
    
    public static void findMissing1(int[] A, int l, int r, int N, int m, List<Integer> result) {
        // number of values missing strictly between A[l] and A[r]
        int count = (A[r] - A[l]) - (r - l);
        // if m exceeds that, the rest are missing at the two ends (only at the top level)
        boolean endsMissing = m > count;
        if(endsMissing) {
            for(int i = 1; i < A[l]; i++)
                result.add(i);
        }
        if(count > 0) {
            if(r - l == 1) {
                // adjacent positions: every value between them is missing
                for(int i = 1; i <= count; i++)
                    result.add(A[l] + i);
            }
            else {
                // halve the array and find the missing elements recursively
                int mid = l + (r - l) / 2;
                int left = (A[mid] - A[l]) - (mid - l);
                int right = (A[r] - A[mid]) - (r - mid);
                if(left != 0)
                    findMissing1(A, l, mid, N, left, result);
                if(right != 0)
                    findMissing1(A, mid, r, N, right, result);
            }
        }
        // add the tail last so the result stays sorted
        if(endsMissing) {
            for(int i = A[r] + 1; i <= N; i++)
                result.add(i);
        }
    }
    
    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        while(scan.hasNext()) {
            List<Integer> result = new ArrayList<Integer>();
            int n = scan.nextInt();
            int m = scan.nextInt();
            int[] A = new int[n-m];
            for(int i = 0; i < n-m; i++)
                A[i] = scan.nextInt();
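            if(n == m) {
                // every number is missing, so the array is empty
                for(int i = 1; i <= n; i++)
                    result.add(i);
            }
            else
                findMissing1(A, 0, n-m-1, n, m, result);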
            System.out.println(result.toString());
        }
        scan.close();
    }
}

Sunday, January 11, 2015

[Leetcode] Dungeon Game

DP problem. dp[i][j] records the minimum health points needed when entering (i, j) in order to reach the goal from there. dp[i][j] is obtained from dp[i+1][j] and dp[i][j+1]: take the smaller of the two (always a positive number), then combine it with the dungeon value at (i, j) to get the health needed, max(1, min - dungeon[i][j]).
/*
 * Author: Yang Pei
 * Problem: Dungeon Game
 * Source: https://oj.leetcode.com/problems/dungeon-game/
 * 
 * Note:
 * 
 * Solution:
 * DP: dp[i][j] records the minimum health needed when entering (i, j) to reach the goal.
 * dp[i][j] = max(1, Math.min(dp[i+1][j], dp[i][j+1]) - dungeon[i][j])
 */
public class DungeonGame {
    public static int calculateMinimumHP(int[][] dungeon) {
        int m = dungeon.length;
        if(m == 0)
            return 0;
        int n = dungeon[0].length;
        int[][] dp = new int[m][n];
        for(int i = m-1; i >= 0; i--) {
            for(int j = n-1; j>= 0; j--) {
                int min;
                if(i == m-1 && j == n-1)
                    min = 1;
                else if(i == m-1)
                    min = dp[i][j+1];
                else if(j == n-1)
                    min = dp[i+1][j];
                else 
                    min = Math.min(dp[i][j+1], dp[i+1][j]);
                dp[i][j] = (dungeon[i][j] - min >= 0) ? 1 : (min - dungeon[i][j]);
            }
        }
        return dp[0][0];
    }
    
    public static void main(String[] args) {
        int[][] dungeon = new int[][] {{-2, -3, 3}, {-5, -10, 1}, {10, 30, -5}};
        System.out.println(calculateMinimumHP(dungeon));
    }
}

Wednesday, January 7, 2015

Check K Sum

/*
 * Author: Yang Pei
 * Problem: Check K Sum
 * 
 * Note:
 * Given an array of non-negative numbers, check if we could use exactly K elements
 * in the array (each element could only be used once) to get a target sum, we assume
 * that the given target and k are valid.
 * 
 * Solution:
 * Using dp to solve the problem, dp[i][j] means if we could obtain sum i using j 
 * elements, then dp[i][j] |= dp[i-A[k]][j-1].
 */
import java.util.*;

public class CheckKSum {
    public static boolean kSum(int[] num, int target, int k) {
        int n = num.length;
        boolean[][] dp = new boolean[target+1][k+1];
        dp[0][0] = true;
        for(int i = 0; i < n; i++) {
            for(int j = target; j >= num[i]; j--) {
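                // p goes downward so that, when num[i] == 0, dp[j][p-1] still holds the
                // value from before this element, i.e. each element is used at most once
                for(int p = k; p >= 1; p--) {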
                    dp[j][p] |= dp[j-num[i]][p-1];
                }
            }
        }
        return dp[target][k];
    }
    
    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        while(scan.hasNext()) {
            int n = scan.nextInt();
            int[] num = new int[n];
            for(int i = 0; i < n; i++) {
                num[i] = scan.nextInt();
            }
            int target = scan.nextInt();
            int k = scan.nextInt();
            System.out.println(kSum(num, target, k));
        }
        scan.close();
    }
}

Tuesday, January 6, 2015

BST to Double Linked List

/*
 * Author: Yang Pei
 * Problem: BST to Double Linked List
 * Source: http://www.careercup.com/question?id=4863668900593664
 * 
 * Note:
 * Given a BST, convert this BST into a double linked list that is sorted and 
 * returns the head of the list. Do it in place with space complexity O(1).
 * 
 * Solution:
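 * Either recursion (O(h) stack space, and O(n^2) time in the worst case because it
 * walks to the tail of the left sublist) or a Morris in-order traversal, which uses
 * O(1) extra space.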
 */
public class BSTtoDoubleLinkedList {
    public static TreeNode BSTtoDLL(TreeNode root) {
        if(root == null)
            return null;
        TreeNode pre = BSTtoDLL(root.left);
        TreeNode next = BSTtoDLL(root.right);
        if(pre == null) {
            root.left = null;
            root.right = next;
            if(next != null)
                next.left = root;
            return root;
        }
        else {
            TreeNode temp = pre;
            while(temp.right != null)
                temp = temp.right;
            temp.right = root;
            root.left = temp;
            root.right = next;
            if(next != null)
                next.left = root;
            return pre;
        }
    }
    
    public static TreeNode BSTtoDLL1(TreeNode root) {
        if(root == null)
            return null;
        TreeNode cur = root, tmp = null, pre = null;
        while(cur != null) {
            if(cur.left == null) {
                cur.left = pre;
                if(pre != null)
                    pre.right = cur;
                pre = cur;
                cur = cur.right;
            }
            else {
                tmp = cur.left;
                while(tmp.right != null && tmp.right != cur)
                    tmp = tmp.right;
                if(tmp.right == null) {
                    tmp.right = cur;
                    cur = cur.left;
                }
                else {
                    cur.left = pre;
                    if(pre != null)
                        pre.right = cur;
                    pre = cur;
                    cur = cur.right;
                }
            }
        }
        while(root.left != null)
            root = root.left;
        return root;
    }
    
    public static void main(String[] args) {
        TreeNode root = new TreeNode(2);
        root.left = new TreeNode(1);
        root.left.left = new TreeNode(0);
        root.right = new TreeNode(4);
        root.right.left = new TreeNode(3);
        root.right.right = new TreeNode(5);
        root.right.right.right = new TreeNode(6);
        root = BSTtoDLL1(root);
        TreeNode pre = null;
        while(root != null) {
            System.out.print(root.val + " ");
            pre = root;
            root = root.right;
        }
        System.out.println("");
        while(pre != null) {
            System.out.print(pre.val + " ");
            pre = pre.left;
        }
    }
}

Move Zeros Down

/*
 * Author: Yang Pei
 * Problem: Move Zeros Down
 * 
 * Note:
 * Given a binary tree, move the zeros to the bottom, so that if a node's value is 0,
 * all of its descendants are 0s.
 * For example:
 *      1
 *     / \
 *    0   0
 *   / \   \
 *  2   0   3
 * could be changed to 
 *      1
 *     / \
 *    2   3
 *   / \   \
 *  0   0   0   
 * 
 * Solution:
 * 1. Level-order traversal: collect the nodes in BFS order, then walk the list from
 *    back to front. While zeros remain to be placed, record each non-zero value we
 *    meet and set that node to 0; once all zeros are placed, give each remaining 0
 *    node one of the recorded values.
 * 2. Preorder traversal: when a node is 0, find its lowest non-zero descendant,
 *    return that node and swap the two values.
 */
import java.util.*;

public class MoveZerosDown {
    public static void moveDown(TreeNode root) {
        if(root == null)
            return;
        List<TreeNode> list = new ArrayList<TreeNode>();
        int count = 0;
        Queue<TreeNode> qu = new LinkedList<TreeNode>();
        TreeNode dummy = new TreeNode(0);
        qu.add(root); qu.add(dummy);
        while(qu.size() != 0) {
            TreeNode temp = qu.remove();
            if(temp == dummy) {
                if(qu.size() != 0)
                    qu.add(dummy);
            }
            else {
                count = count + ((temp.val == 0) ? 1 : 0);
                list.add(temp);
                if(temp.left != null)
                    qu.add(temp.left);
                if(temp.right != null)
                    qu.add(temp.right);
            }
        }
        Stack<Integer> stack = new Stack<Integer>();
        for(int i = list.size() - 1; i >= 0; i--) {
            TreeNode temp = list.get(i);
            if(count > 0) {
                if(temp.val != 0) {
                    stack.push(temp.val);
                    temp.val = 0;
                }
                count--;
            }
            else {
                if(temp.val == 0 && stack.size() > 0)
                    temp.val = stack.pop();
            }
        }
    }
    
    public static void moveDown1(TreeNode root) {
        if(root == null)
            return;
        if(root.val == 0) {
            TreeNode left = findNonZero(root.left);
            TreeNode right = findNonZero(root.right);
            if(left != null) {
                root.val = left.val;
                left.val = 0;
            }
            else if(right != null) {
                root.val = right.val;
                right.val = 0;
            }
        }
        moveDown1(root.left);
        moveDown1(root.right);
    }
    
    private static TreeNode findNonZero(TreeNode root) {
        if(root == null)
            return null;
        TreeNode left = findNonZero(root.left);
        TreeNode right = findNonZero(root.right);
        if(left != null)
            return left;
        else if(right != null)
            return right;
        if(root.val != 0)
            return root;
        return null;
    }
    
    public static void main(String[] args) {
        TreeNode node1 = new TreeNode(1);
        node1.left = new TreeNode(0);
        node1.right = new TreeNode(0);
        node1.left.left = new TreeNode(3);
        node1.left.left.left = new TreeNode(0);
        node1.left.right = new TreeNode(0);
        node1.right.right = new TreeNode(5);
        node1.left.right.left = new TreeNode(4);
        node1.left.right.right = new TreeNode(0);
        PrintBST.printBST(node1);
        moveDown1(node1);
        System.out.println("");
        PrintBST.printBST(node1);
    }
}


Reverse List String

/*
 * Author: Yang Pei
 * Problem: Reverse List String
 * 
 * Note:
 * Given a string represented by a linked list, where each node contains a character
 * and words are separated by spaces, reverse each word in the list.
 * For example
 * 'h'->'e'->'l'->'l'->'o'->' '->'w'->'o'->'r'->'l'->'d' would be changed to
 * 'o'->'l'->'l'->'e'->'h'->' '->'d'->'l'->'r'->'o'->'w'
 * 
 * Solution:
 * Two pointers. Be careful: there might be multiple spaces between two words, there
 * might be leading and trailing spaces, the string might contain no space at all, or
 * it might consist only of spaces.
 * 
 * Definition of ListNodeC:
 * {
 *     char val;
 *     ListNodeC next;
 *     public ListNodeC(char ch) {
 *         this.val = ch;
 *         this.next = null;
 *     }
 * }
 */
public class ReverseListString {
    public static ListNodeC reverse(ListNodeC head) {
        if(head == null) 
            return head;
        ListNodeC dummy = new ListNodeC(' ');
        dummy.next = head;
        ListNodeC pointer1 = dummy, pointer2 = dummy;
        while(pointer1.next != null) {
            while(pointer2.next != null && pointer2.next.val == ' ')
                pointer2 = pointer2.next;
            if(pointer2.next == null)
                break;
            pointer1 = pointer2;
            // pay attention here, otherwise would have infinite loop
            pointer2 = pointer2.next;
            while(pointer2.next != null && pointer2.next.val != ' ') {
                ListNodeC temp = pointer2.next;
                pointer2.next = temp.next;
                temp.next = pointer1.next;
                pointer1.next = temp;
            }
            pointer1 = pointer2;
        }
        return dummy.next;
    }
    
    public static void main(String[] args) {
        String str = " a ba   c  ";
        ListNodeC dummy = new ListNodeC(' ');
        ListNodeC temp = dummy;
        for(int i = 0; i < str.length(); i++) {
            ListNodeC node = new ListNodeC(str.charAt(i));
            temp.next = node;
            temp = temp.next;
        }
        temp = dummy.next;
        while(temp != null) {
            System.out.print(temp.val);
            temp = temp.next;
        }
        System.out.println("");
        temp = reverse(dummy.next);
        while(temp != null) {
            System.out.print(temp.val);
            temp = temp.next;
        }
        System.out.println("");
    }
}
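The linked-list version is easy to get wrong around spaces, so a plain-String reference can serve as a cross-check for its output. The helper below, `reverseWords`, is hypothetical (it is not part of the original solution): a sketch of the same idea using two indices over a `char[]`, leaving every space (leading, trailing, or repeated) exactly where it was:

```java
class ReverseWordsInString {
    // Reverses each space-separated word; spaces are never moved.
    static String reverseWords(String s) {
        char[] chars = s.toCharArray();
        int i = 0;
        while (i < chars.length) {
            // Skip a run of spaces.
            while (i < chars.length && chars[i] == ' ')
                i++;
            // The next word occupies [start, i).
            int start = i;
            while (i < chars.length && chars[i] != ' ')
                i++;
            // Reverse the word in place with two pointers.
            for (int lo = start, hi = i - 1; lo < hi; lo++, hi--) {
                char t = chars[lo];
                chars[lo] = chars[hi];
                chars[hi] = t;
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // Same input as the linked-list demo above.
        System.out.println("[" + reverseWords(" a ba   c  ") + "]"); // [ a ab   c  ]
    }
}
```

Feeding the same string to both versions and comparing the printed results is a quick way to catch pointer mistakes in the list splicing.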

Sunday, January 4, 2015

[Leetcode] Binary Search Tree Iterator

The solution is also available here: https://gist.github.com/pyemma/bbe39014f436f2a5aa5b
/**
 * Definition for binary tree
 * public class TreeNode {
 *     int val;
 *     TreeNode left;
 *     TreeNode right;
 *     TreeNode(int x) { val = x; }
 * }
 */

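import java.util.Stack;

public class BSTIterator {
    // Holds the left spine of the part of the tree not yet visited;
    // the top of the stack is always the next smallest node.
    private Stack<TreeNode> stack;
    public BSTIterator(TreeNode root) {
        stack = new Stack<TreeNode>();
        if(root != null)
            pushleft(root);
    }
    // Push the node and all of its left descendants onto the stack.
    private void pushleft(TreeNode root) {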
        while(root != null) {
            stack.push(root);
            root = root.left;
        }
    }
    /** @return whether we have a next smallest number */
    public boolean hasNext() {
        return !stack.isEmpty();
    }

    /** @return the next smallest number */
    public int next() {
        TreeNode temp = stack.pop();
        if(temp.right != null)
            pushleft(temp.right);
        return temp.val;
    }
}

/**
 * Your BSTIterator will be called like this:
 * BSTIterator i = new BSTIterator(root);
 * while (i.hasNext()) v[f()] = i.next();
 */
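A self-contained way to watch the stack-based iterator at work. The nested `TreeNode`, the `insert` helper, and the class name are assumptions made for this demo, and `ArrayDeque` stands in for `Stack`. Because every node is pushed and popped exactly once, n calls to `next()` cost O(n) in total (amortized O(1) per call), and the stack never holds more than h nodes for a tree of height h:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class BSTIteratorDemo {
    static class TreeNode {
        int val;
        TreeNode left, right;
        TreeNode(int x) { val = x; }
    }

    // Hypothetical helper for the demo: plain (unbalanced) BST insertion.
    static TreeNode insert(TreeNode root, int val) {
        if (root == null) return new TreeNode(val);
        if (val < root.val) root.left = insert(root.left, val);
        else root.right = insert(root.right, val);
        return root;
    }

    // Same technique as BSTIterator, driven to completion.
    static List<Integer> inorder(TreeNode root) {
        Deque<TreeNode> stack = new ArrayDeque<>();
        pushLeft(stack, root);
        List<Integer> out = new ArrayList<>();
        while (!stack.isEmpty()) {          // hasNext()
            TreeNode node = stack.pop();    // next(): smallest unvisited node
            pushLeft(stack, node.right);    // successor is on the right subtree's left spine
            out.add(node.val);
        }
        return out;
    }

    private static void pushLeft(Deque<TreeNode> stack, TreeNode node) {
        while (node != null) {
            stack.push(node);
            node = node.left;
        }
    }

    public static void main(String[] args) {
        TreeNode root = null;
        for (int v : new int[] {5, 3, 8, 1, 4, 9})
            root = insert(root, v);
        System.out.println(inorder(root)); // [1, 3, 4, 5, 8, 9]
    }
}
```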

Saturday, January 3, 2015

[Leetcode] Excel Sheet Column Title

The solution is also available here: https://gist.github.com/pyemma/93a32e641b90288867ae
/*
 * Author: Yang Pei
 * Problem: Excel Sheet Column Title
 * Source: https://oj.leetcode.com/problems/excel-sheet-column-title/
 * 
 * Note:
 * Given a positive integer, return its corresponding column title as it appears in an Excel sheet.
 * For example:
 *     1 -> A
 *     2 -> B
 *     3 -> C
 *     ...
 *     26 -> Z
 *     27 -> AA
 *     28 -> AB
 * Solution:
 * Recursive or iterative. Column titles use the digits A..Z for 1..26 with no zero digit,
 * so subtract 1 before each % 26 and / 26 step.
 */
public class ExcelSheetColumnTitle {
 public String convertToTitle(int n) {
  if(n == 0)
   return "";
  return convertToTitle((n-1)/26) + (char)((n-1)%26 + 'A');
 }
 
 public String convertToTitle1(int n) {
  StringBuilder sb = new StringBuilder();
  while(n > 0) {
   sb.append((char)((n - 1) % 26 + 'A'));
   n = (n - 1) / 26;
  }
  sb = sb.reverse();
  return sb.toString();
 }
}
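Why the `n - 1`: a column title is a base-26 numeral whose digits A..Z stand for 1..26, with no digit for zero. Decrementing before each `% 26` and `/ 26` shifts the current digit into 0..25. The sketch below repeats the iterative method and pairs it with a hypothetical inverse, `titleToNumber` (not part of the original solution), so the mapping can be checked by a round trip:

```java
class ExcelColumnRoundTrip {
    // Same iterative method as convertToTitle1 above.
    static String toTitle(int n) {
        StringBuilder sb = new StringBuilder();
        while (n > 0) {
            n--;                                // shift the digit range from 1..26 to 0..25
            sb.append((char) ('A' + n % 26));
            n /= 26;
        }
        return sb.reverse().toString();
    }

    // Hypothetical inverse: read the title as base 26 with digits A=1 .. Z=26.
    static int titleToNumber(String title) {
        int n = 0;
        for (char c : title.toCharArray())
            n = n * 26 + (c - 'A' + 1);
        return n;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 26, 27, 28, 52, 701})
            System.out.println(n + " -> " + toTitle(n) + " -> " + titleToNumber(toTitle(n)));
    }
}
```

For example, 28 gives 27 % 26 = 1 ('B'), then 27 / 26 = 1, which gives 0 % 26 = 0 ('A'), so the title is "AB".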

[Leetcode] Compare Version Numbers

The solution is also available here: https://gist.github.com/pyemma/0d6f6368fdcfcb73451d
/*
 * Author: Yang Pei
 * Problem: Compare Version Numbers
 * Source: https://oj.leetcode.com/problems/compare-version-numbers/
 * 
 * Note:
 * Compare two version numbers version1 and version2.
 * If version1 > version2 return 1, if version1 < version2 return -1, otherwise return 0.
 * 
 * You may assume that the version strings are non-empty and contain only digits and the . character.
 * The . character does not represent a decimal point and is used to separate number sequences.
 * For instance, 2.5 is not "two and a half" or "half way to version three", it is the fifth second-level revision of the second first-level revision.
 * 
 * Here is an example of version numbers ordering:
 * 0.1 < 1.1 < 1.2 < 13.37
 * 
 * Solution:
 * Split each version on "." and compare the components one by one, taking care of
 * leading zeros. If every component fits in an int, Integer.parseInt could be used
 * instead of writing a custom compare function.
 * 
 * Corner cases: 1.0 vs 1, and 1.0.1 vs 1. Here we check the extra components of the
 * longer array to see whether they are all 0.
 */
public class CompareVersionNumbers {
 public static int compareVersion(String version1, String version2) {
  String[] strs1 = version1.split("\\.");
  String[] strs2 = version2.split("\\.");
  for(int i = 0; i < Math.min(strs1.length, strs2.length); i++) {
   if(compare(strs1[i], strs2[i]) != 0)
    return compare(strs1[i], strs2[i]);
  }
  if(strs1.length < strs2.length) {
   for(int i = strs1.length; i < strs2.length; i++) {
    if(compare("0", strs2[i]) < 0)
     return -1;
   }
   return 0;
  } 
  else if(strs1.length > strs2.length) {
   for(int i = strs2.length; i < strs1.length; i++) {
    if(compare(strs1[i], "0") > 0)
     return 1;
   }
   return 0;
  }
  else
   return 0;
 }
 private static int compare(String num1, String num2) {
  int ind1 = 0;
  while(ind1 < num1.length() && num1.charAt(ind1) == '0')
   ind1++;
  int ind2 = 0;
  while(ind2 < num2.length() && num2.charAt(ind2) == '0')
   ind2++;
  num1 = num1.substring(ind1);
  num2 = num2.substring(ind2);
  if(num1.length() < num2.length())
   return -1;
  else if(num1.length() > num2.length())
   return 1;
  else {
   for(int i = 0; i < num1.length(); i++) {
    if(num1.charAt(i) < num2.charAt(i))
     return -1;
    else if(num1.charAt(i) > num2.charAt(i))
     return 1;
   }
   return 0;
  }
 }
 
 public static void main(String[] args) {
  String version1 = "1.0.1";
  String version2 = "1";
  System.out.println(compareVersion(version1, version2));
 }
}
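The solution notes mention the `Integer.parseInt` shortcut. A minimal sketch of it, assuming every component fits in an `int` (a component such as `99999999999` would make `parseInt` throw `NumberFormatException`). Treating missing components as 0 handles the `1.0` vs `1` corner case without separate tail loops, and `parseInt` already ignores leading zeros:

```java
class CompareVersionParseInt {
    // Assumes each dot-separated component is a non-negative number that fits in an int.
    static int compareVersion(String version1, String version2) {
        String[] a = version1.split("\\.");
        String[] b = version2.split("\\.");
        int len = Math.max(a.length, b.length);
        for (int i = 0; i < len; i++) {
            // A missing component counts as 0, so "1" == "1.0".
            int x = i < a.length ? Integer.parseInt(a[i]) : 0;
            int y = i < b.length ? Integer.parseInt(b[i]) : 0;
            if (x != y)
                return x < y ? -1 : 1;
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(compareVersion("1.0.1", "1")); // 1
        System.out.println(compareVersion("1.0", "1"));   // 0
    }
}
```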

Friday, January 2, 2015

MongoDB Notes Final

Aggregation Introduction

Aggregations are operations that process data records and return computed results.

Aggregation Pipelines
  • Documents enter a multi-stage pipeline that transforms them into an aggregated result.
  • A pipeline consists of stages.
  • Some stages take an aggregation expression as input.
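The stage-by-stage idea can be illustrated with Java streams. This is only an analogy, not MongoDB's API: the `Order` class and its `custId`, `status`, and `amount` fields are hypothetical, and the two stream steps play the roles of a `$match` stage followed by a `$group` stage, each one consuming the output of the previous one:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class PipelineAnalogy {
    // Hypothetical document shape used only for this illustration.
    static class Order {
        final String custId;
        final String status;
        final int amount;

        Order(String custId, String status, int amount) {
            this.custId = custId;
            this.status = status;
            this.amount = amount;
        }
    }

    static Map<String, Integer> run(List<Order> orders) {
        return orders.stream()
                // Stage 1 (like $match): keep only orders whose status is "A".
                .filter(o -> o.status.equals("A"))
                // Stage 2 (like $group): total amount per customer, keys kept sorted.
                .collect(Collectors.groupingBy((Order o) -> o.custId, TreeMap::new,
                        Collectors.summingInt((Order o) -> o.amount)));
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
                new Order("A123", "A", 500),
                new Order("A123", "A", 250),
                new Order("B212", "A", 200),
                new Order("A123", "D", 300));
        System.out.println(run(orders)); // {A123=750, B212=200}
    }
}
```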

Map-Reduce
  • Three phases: map, reduce, and an optional finalize.
  • Uses custom JavaScript functions to map values to a key.

Single Purpose Aggregation Operations
  • returning a count of matching documents
    • collection.count( )
  • returning the distinct values for a field
    • collection.distinct( )

  • grouping data based on the values of a field
    • collection.group( )


Aggregation Pipeline on Sharded Collections
  • The pipeline is split into two parts
    • The first part runs on each shard, or only on the shards that the shard key lets the query target
    • The second part runs on the primary shard, which collects the cursors from each shard and forwards the final result to mongos
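The split works because some operations can be computed in two phases: each shard produces a partial result and a merging step combines them. Below is a sketch of that data flow with an in-memory sum. The `Doc` class and method names are hypothetical, and this simulates the idea only, not how mongos or the shards actually execute a pipeline. Sums merge directly; an average, by contrast, would need each shard to return a (sum, count) pair:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ShardedPipelineSketch {
    // Hypothetical document: a grouping key and a numeric amount.
    static class Doc {
        final String key;
        final int amount;

        Doc(String key, int amount) {
            this.key = key;
            this.amount = amount;
        }
    }

    // Part 1, run on every targeted shard: partial sums over that shard's documents only.
    static Map<String, Integer> shardStage(List<Doc> localDocs) {
        Map<String, Integer> partial = new HashMap<>();
        for (Doc d : localDocs)
            partial.merge(d.key, d.amount, Integer::sum);
        return partial;
    }

    // Part 2, run on the merging node: combine the partial results from each shard.
    static Map<String, Integer> mergeStage(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials)
            partial.forEach((k, v) -> total.merge(k, v, Integer::sum));
        return total;
    }

    public static void main(String[] args) {
        List<Doc> shard1 = Arrays.asList(new Doc("x", 1), new Doc("y", 2), new Doc("x", 3));
        List<Doc> shard2 = Arrays.asList(new Doc("x", 10), new Doc("z", 5));
        List<Map<String, Integer>> partials = new ArrayList<>();
        partials.add(shardStage(shard1));
        partials.add(shardStage(shard2));
        System.out.println(mergeStage(partials)); // {x=14, y=2, z=5}
    }
}
```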

Map-Reduce Example
  • Define the map function to process each input document

  • Define the corresponding reduce function with two arguments: the key and the array of values emitted for that key

  • Perform the map-reduce on all documents in the orders collection using the map function and reduce function
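The steps above can be sketched as an in-memory Java simulation of the same data flow, computing the total price per customer. It is not MongoDB's API; the `Order` class with `custId` and `price` fields is an assumption. In MongoDB itself the reduce function may be called more than once for the same key on partial results, which is why it must return the same kind of value that map emits:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapReduceSketch {
    // Hypothetical order document with a customer id and a price.
    static class Order {
        final String custId;
        final int price;

        Order(String custId, int price) {
            this.custId = custId;
            this.price = price;
        }
    }

    // Map: emit (custId, price) for each document; emitted values are grouped by key.
    static Map<String, List<Integer>> mapPhase(List<Order> orders) {
        Map<String, List<Integer>> emitted = new TreeMap<>();
        for (Order o : orders)
            emitted.computeIfAbsent(o.custId, k -> new ArrayList<>()).add(o.price);
        return emitted;
    }

    // Reduce: two arguments, the key and the values emitted for that key.
    static int reduce(String custId, List<Integer> prices) {
        int sum = 0;
        for (int p : prices)
            sum += p;
        return sum;
    }

    static Map<String, Integer> mapReduce(List<Order> orders) {
        Map<String, Integer> out = new TreeMap<>();
        mapPhase(orders).forEach((key, values) -> out.put(key, reduce(key, values)));
        return out;
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
                new Order("abc", 25), new Order("xyz", 10), new Order("abc", 5));
        System.out.println(mapReduce(orders)); // {abc=30, xyz=10}
    }
}
```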


Replication Introduction

Replication is the process of synchronizing data across multiple servers.
Replication provides redundancy and increases data availability. It also allows you to recover from hardware failures and service interruptions.

A replica set is a group of mongod instances that host the same data set. One mongod, called the primary, receives all write operations. All other instances, called secondaries, apply operations from the primary so that they have the same data. The primary logs all operations to its oplog. Only the primary accepts write operations; read operations can be served by any member, depending on the read preference.

The secondaries apply the oplog to themselves. If the primary becomes unavailable, the set holds an election, and the secondary that receives a majority of the votes becomes the new primary.

An arbiter can be added to break ties during an election when the set has an even number of voting members. The arbiter does not hold any data and is used only for elections.

An arbiter is always an arbiter, whereas a primary can become a secondary and a secondary can become a primary.

Each replica set has at most 12 members, and at most 7 members can vote in an election.

A priority 0 member is a secondary that cannot become primary and cannot trigger elections. It can function as a standby.
A hidden member maintains a copy of the primary’s data but is invisible to client applications. It must have priority 0, so it can never become primary.

A delayed member contains a copy of the replica set’s data that reflects an earlier, delayed state of the set. Delayed members must have priority 0 and must be hidden.
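The member-type rules above can be written down as a few checks. The `Member` class below is a hypothetical, simplified view of a member's settings (not MongoDB's real configuration document), and the checks encode only the constraints stated in these notes:

```java
class MemberRules {
    // Hypothetical, simplified member settings used only for this illustration.
    static class Member {
        final double priority;
        final boolean hidden;
        final boolean delayed;
        final boolean arbiter;

        Member(double priority, boolean hidden, boolean delayed, boolean arbiter) {
            this.priority = priority;
            this.hidden = hidden;
            this.delayed = delayed;
            this.arbiter = arbiter;
        }
    }

    // Returns null if the member satisfies the rules, otherwise the first violated rule.
    static String violation(Member m) {
        if (m.hidden && m.priority != 0)
            return "a hidden member must have priority 0";
        if (m.delayed && m.priority != 0)
            return "a delayed member must have priority 0";
        if (m.delayed && !m.hidden)
            return "a delayed member must be hidden";
        return null;
    }

    // Arbiters and priority 0 members can never be elected primary.
    static boolean electable(Member m) {
        return !m.arbiter && m.priority > 0;
    }

    public static void main(String[] args) {
        System.out.println(violation(new Member(0, true, true, false)));  // null
        System.out.println(violation(new Member(0, false, true, false))); // a delayed member must be hidden
        System.out.println(electable(new Member(1, false, false, false))); // true
    }
}
```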

Architecture 
  • Three-member replica sets
    • The minimum architecture of a replica set
  • Replica sets with four or more members
    • Ensure the set has an odd number of voting members
  • Geographically distributed replica sets

Failover
Heartbeats: replica set members send heartbeats (pings) to each other every two seconds. If a heartbeat does not return within 10 seconds, the sending member marks the unresponsive member as inaccessible.

Members prefer to vote for members with higher priority.

Optime: the timestamp of the last operation that a member applied from the oplog. A replica set member cannot become primary unless it has the highest optime of any visible member in the set.

A replica set member cannot become primary unless it can connect to a majority of the members in the set. In a three-member set, if the other two members are down, the remaining secondary cannot become primary because it cannot reach a majority of the members. Likewise, if both secondaries are down, the primary steps down to a secondary.
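The failover rules above reduce to simple arithmetic. Below is a simplified sketch under stated assumptions: it models only the 10-second heartbeat timeout, the strict-majority requirement, and the highest-optime rule, and it ignores priorities and everything else the real election protocol does. All names are hypothetical:

```java
class FailoverRules {
    // From the notes: no heartbeat reply within 10 seconds means inaccessible.
    static final long HEARTBEAT_TIMEOUT_MS = 10_000;

    static boolean isInaccessible(long lastReplyMillis, long nowMillis) {
        return nowMillis - lastReplyMillis > HEARTBEAT_TIMEOUT_MS;
    }

    // Strict majority of all voting members, counting the candidate itself.
    static boolean reachesMajority(int reachableVotersIncludingSelf, int totalVoters) {
        return reachableVotersIncludingSelf > totalVoters / 2;
    }

    // Simplified eligibility: reach a majority, and no visible member has a newer optime.
    static boolean canBecomePrimary(int reachableVotersIncludingSelf, int totalVoters,
                                    long myOptime, long[] visibleOptimes) {
        if (!reachesMajority(reachableVotersIncludingSelf, totalVoters))
            return false;
        for (long optime : visibleOptimes)
            if (optime > myOptime)
                return false;
        return true;
    }

    public static void main(String[] args) {
        // Three-member set with the other two members down: 1 of 3 is not a majority.
        System.out.println(reachesMajority(1, 3)); // false
        System.out.println(reachesMajority(2, 3)); // true
        // Four voters split 2-2: neither side has a majority, hence the advice to keep the count odd.
        System.out.println(reachesMajority(2, 4)); // false
    }
}
```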

Read Preference
  • primary
  • primaryPreferred
  • secondary
  • secondaryPreferred
  • nearest
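The five modes differ in which members are acceptable and in what order they are preferred. Below is a simplified sketch of that choice. The `Member` class and the `select` method are hypothetical, and "pick the lowest latency" is a simplification; real drivers also apply tag sets and pick randomly among eligible members within a latency window:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class ReadPreferenceSketch {
    enum Mode { PRIMARY, PRIMARY_PREFERRED, SECONDARY, SECONDARY_PREFERRED, NEAREST }

    // Hypothetical view of a reachable member: its name, role, and measured latency.
    static class Member {
        final String name;
        final boolean primary;
        final int latencyMs;

        Member(String name, boolean primary, int latencyMs) {
            this.name = name;
            this.primary = primary;
            this.latencyMs = latencyMs;
        }
    }

    static Optional<Member> select(Mode mode, List<Member> available) {
        Comparator<Member> byLatency = Comparator.comparingInt((Member m) -> m.latencyMs);
        Optional<Member> primary = available.stream().filter(m -> m.primary).findFirst();
        Optional<Member> secondary = available.stream().filter(m -> !m.primary).min(byLatency);
        switch (mode) {
            case PRIMARY:
                return primary;                                   // primary only
            case PRIMARY_PREFERRED:
                return primary.isPresent() ? primary : secondary; // fall back to a secondary
            case SECONDARY:
                return secondary;                                 // secondaries only
            case SECONDARY_PREFERRED:
                return secondary.isPresent() ? secondary : primary; // fall back to the primary
            default:
                return available.stream().min(byLatency);         // NEAREST: any member
        }
    }

    public static void main(String[] args) {
        List<Member> members = Arrays.asList(
                new Member("p", true, 30), new Member("s1", false, 5), new Member("s2", false, 12));
        for (Mode mode : Mode.values())
            System.out.println(mode + " -> " + select(mode, members).map(m -> m.name).orElse("none"));
    }
}
```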

The oplog is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases. All replica set members maintain a copy of the oplog. Any member can import oplog entries from any other member.
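A capped collection behaves like a fixed-size ring: new entries are appended in order and, once the limit is reached, the oldest ones are discarded. Below is a minimal sketch of that rolling behavior. Two simplifications are assumptions made for this example: the limit counts entries (real capped collections are bounded mainly by size in bytes), and the `ts:op` string format is made up. `entriesAfter` shows how a lagging member could replay only what it has not yet applied:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class CappedLogSketch {
    private final int capacity;                 // must be at least 1
    private final Deque<String> entries = new ArrayDeque<>();
    private long nextTs = 1;                    // hypothetical increasing timestamp, like an optime

    CappedLogSketch(int capacity) {
        this.capacity = capacity;
    }

    // Append in insertion order; when full, evict the oldest entry first.
    void append(String op) {
        if (entries.size() == capacity)
            entries.removeFirst();
        entries.addLast(nextTs++ + ":" + op);
    }

    // Entries newer than ts, oldest first: what a member at ts would still need to apply.
    List<String> entriesAfter(long ts) {
        List<String> out = new ArrayList<>();
        for (String e : entries)
            if (Long.parseLong(e.substring(0, e.indexOf(':'))) > ts)
                out.add(e);
        return out;
    }

    public static void main(String[] args) {
        CappedLogSketch oplog = new CappedLogSketch(3);
        for (String op : new String[] {"insert a", "update a", "insert b", "delete a"})
            oplog.append(op);
        System.out.println(oplog.entriesAfter(0)); // [2:update a, 3:insert b, 4:delete a]
        System.out.println(oplog.entriesAfter(3)); // [4:delete a]
    }
}
```

Because old entries roll off, a member that falls too far behind may find that the entries it needs are gone; at that point it can no longer catch up from the oplog alone.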

Data Synchronization
  • Initial Sync: when a member has no data
    • Clones all data.
    • Applies all changes to the data set.
    • Builds all indexes on all collections.
  • Replication: continuously after initial sync