Unicorn: A System for Searching the Social Graph

Facebook ? Everyone seems to have heard about it and used it. But ever wondered how does it actually work , and has been continuously working on for improving the user experience?
So, ever since 2010 Facebook uses a search engine called UNICORN. Facebook’s graph search relied on UNICORN for a major time, which served billions of queries per day with minimal latency.

What is unicorn?

Unicorn is a system for searching a social graph. It is like any other indexing system, which returns indexes stored in memory. But, it differs in the sense that, it is meant for retrieval of data from social graphs and further giving ranks to the retrieved data. When first built , it was simply an inverted –index system, which later was used as an “in-memory database” and was further extended to be used as a search engine handling millions and millions of queries per day with latency less than few hundred seconds.

Unicorn in addition to having the capability of handling a very large index also provided an opportunity for Facebook to move from multiple search back ends to single search back end.

Components of Unicorn –

The major components of unicorn are –

1) INDEX: This is a many – to – many mapping between attributes and the user ids

2) Framework to handle the new upcoming data and adding on to the existing inverted index

3) Framework for retrieving entities from the index, considering the various constraints imposed on the attributes

E.g. Of an index –

Anupama’s friend with fbid , 1234 and living in Delhi will have following mappings corresponding to its index:

Friends: 1001 -> 1234

Lives-in: 100 -> 1234

Here, Friend and Lives in are attributes that might be shared by other user ids as well.

What sort of Queries does it handle?

UNICORN supports the regular queries which includes the regular “AND” and “OR” operator queries. Queries of this sort in general require only “one hop” that means , the final result node will be just one edge far from the source node .But , UNICORN is further equipped to handle “multi - hop” queries which is used to handle queries which use “apply” operator .

Eg. Of apply query - Let’ say the user wants to find “friends of his friend staying in Mumbai” .Then the query will be processed in following steps:

Step 1- all “friends of” friend will be retrieved.

Step 2- from the result of step 1, further “live-in” filter will be applied (i.e. live-in edge will be searched) and final results will be returned back.

This “multi-hop” feature helps in establishing new relationships between indirectly connected nodes.

How is indexing done?

Inverted Indexes are created with help of map-reduce scripts that further collect the data from Hive tables. Processing is done over it and finally inverted indexes are created.

How does it handle dynamic data?

The unique dataset in Facebook keeps on changing continuously. This means that the unicorn indexes need to be updated continuously. To perform this live update pipeline is used that updates the changes within seconds.

What is Search ranking?

Returning the best and the top results corresponding to a search query.

Unicorn as a search engine –

The indexes to be stored are generally too large and hence, can’t be stored into the memory of one machine. So, indexes are divided and then stored on different machines, also known as index servers, such that each division fits exactly onto one machine. A vertical Aggregator receives all incoming queries and is further broadcasted to all index servers. Each, index server fetches results corresponding to the query and returns it back to the vertical aggregator, where the results received from different servers are combined and finally returned back to the caller .

UNICORN as a search engine

Further, the model was extended for embedding ranking capabilities. Few of them are :

- Maintaining separate unicorn verticals for different types of entities and blending, as different entity types will have different considerations while ranking. To, manage all this TOP LEVEL AGGREGATOR was incorporated.

- Query rewriting is done in order to incorporate the user intent and social context and finally generate structured unicorn query.

- Scoring is done, in order to rank the results obtained so that only best results are returned.

Graph Search Query :

Two stages are involved in graph query search –

1) Query Suggestion Phase : In this phase the NLP module parses the query and extract potential entities from the query . These entities are further passed down to UNICORN along with their POS tags, which is a sort of bias, So that unicorn returns suggestions accordingly.

2) Search Phase : Once the suggestion is selected by the user then it is passed to TOP AGGREGATOR from where it is further passed to vertical aggregators. These vertical aggregators further pass it down to index servers , where relevant posting lists are retrieved and final answer fetched . Scoring is done for all this and the results are passed on to next vertical aggregator in a cascaded manner . This goes on till the time required index server is reached and then scored results are returned to vertical aggregators and then to Top Aggregators. Finally the best n results are returned back tot he caller .

References :

· Unicorn: A System for Searching the Social Graph by Michael Curtiss, Iain Becker, Tudor Bosman, Sergey Doroshenko, Lucian Grijincu, Tom Jackson, Sandhya Kunnatur, Soren Lassen, Philip Pronin, Sriram Sankar, Guanghao Shen, Gintaras Woss, Chao Yang, Ning Zhang.

· http://www.adweek.com/digital/a-deeper-look-at-facebooks-graph-search/

IIITD IR MELANAGE

Search This Blog