Facebook ? Everyone seems to have heard about it
and used it. But ever wondered how does it actually work , and has been continuously
working on for improving the user experience?
So, ever since 2010 Facebook uses a search engine called UNICORN. Facebook’s graph search relied on UNICORN for a major time, which served billions of queries per day with minimal latency.
So, ever since 2010 Facebook uses a search engine called UNICORN. Facebook’s graph search relied on UNICORN for a major time, which served billions of queries per day with minimal latency.
What is
unicorn?
Unicorn is a system
for searching a social graph. It is like any other indexing system, which returns
indexes stored in memory. But, it differs in the sense that, it is meant for retrieval
of data from social graphs and further giving ranks to the retrieved data. When
first built , it was simply an inverted –index system, which later was used as
an “in-memory database” and was further extended to be used as a search engine
handling millions and millions of queries per day with latency less than few
hundred seconds.
Unicorn in addition
to having the capability of handling a very large index also provided an
opportunity for Facebook to move from multiple search back ends to single search
back end.
Components of Unicorn –
The major
components of unicorn are –
1)
INDEX:
This is a many – to – many mapping between attributes and the user ids
2)
Framework
to handle the new upcoming data and adding on to the existing inverted index
3)
Framework
for retrieving entities from the index, considering the various constraints
imposed on the attributes
E.g.
Of an index –
Anupama’s
friend with fbid , 1234 and living in Delhi will have following mappings
corresponding to its index:
Friends:
1001 -> 1234
Lives-in:
100 -> 1234
Here,
Friend and Lives in are attributes that might be shared by other user ids as
well.
What sort of
Queries does it handle?
UNICORN supports
the regular queries which includes the regular “AND” and “OR” operator queries.
Queries of this sort in general require only “one hop” that means , the final result
node will be just one edge far from the source node .But , UNICORN is further
equipped to handle “multi - hop” queries which is used to handle queries which
use “apply” operator .
Eg. Of apply query -
Let’ say the user wants to find “friends of his friend staying in Mumbai” .Then
the query will be processed in following steps:
Step 1- all “friends
of” friend will be retrieved.
Step 2- from the
result of step 1, further “live-in” filter will be applied (i.e. live-in edge
will be searched) and final results will be returned back.
This “multi-hop”
feature helps in establishing new relationships between indirectly connected
nodes.
How is
indexing done?
Inverted Indexes
are created with help of map-reduce scripts that further collect the data from
Hive tables. Processing is done over it and finally inverted indexes are created.
How does it
handle dynamic data?
The unique dataset in Facebook keeps
on changing continuously. This means that the unicorn indexes need to be
updated continuously. To perform this live update pipeline is used that updates
the changes within seconds.
What is Search
ranking?
Returning the best and the top
results corresponding to a search query.
Unicorn as a search engine –
The indexes to be stored are
generally too large and hence, can’t be stored into the memory of one machine.
So, indexes are divided and then stored on different machines, also known as
index servers, such that each division fits exactly onto one machine. A vertical
Aggregator receives all incoming queries and is further broadcasted to all
index servers. Each, index server fetches results corresponding to the query
and returns it back to the vertical aggregator, where the results received from
different servers are combined and finally returned back to the caller .
UNICORN as a search engine
Further, the model was extended for embedding
ranking capabilities. Few of them are :
- Maintaining
separate unicorn verticals for different types of entities and blending,
as different entity types will have different considerations while ranking. To,
manage all this TOP LEVEL AGGREGATOR was incorporated.
- Query
rewriting is done in order to incorporate the user
intent and social context and finally generate structured unicorn query.
- Scoring
is done, in order to rank the results obtained so that only best results are
returned.
Graph
Search Query :
Two stages are involved in graph
query search –
1) Query Suggestion Phase : In this phase
the NLP module parses the query and extract potential entities from the query .
These entities are further passed down to UNICORN along with their POS tags,
which is a sort of bias, So that unicorn returns suggestions accordingly.
2)
Search
Phase : Once the suggestion is selected by the user then it is passed to TOP AGGREGATOR from where it is further passed to vertical aggregators. These vertical aggregators further pass it down to index servers , where relevant posting lists are retrieved and final answer fetched . Scoring is done for all this and the results are passed on to next vertical aggregator in a cascaded manner . This goes on till the time required index server is reached and then scored results are returned to vertical aggregators and then to Top Aggregators. Finally the best n results are returned back tot he caller .
References :
·
Unicorn: A System for Searching the Social
Graph by Michael Curtiss, Iain Becker, Tudor Bosman, Sergey Doroshenko, Lucian
Grijincu, Tom Jackson, Sandhya Kunnatur, Soren Lassen, Philip Pronin, Sriram
Sankar, Guanghao Shen, Gintaras Woss, Chao Yang, Ning Zhang.
·
http://www.adweek.com/digital/a-deeper-look-at-facebooks-graph-search/
Comments
Post a Comment