Facebook: It's All AI!

From your ad recommendations to your suggested friends, much of what Facebook does draws on Artificial Intelligence (AI), Machine Learning (ML), Natural Language Processing (NLP) and Information Retrieval (IR). Have you ever wondered what Facebook does with all our data? Why does it matter to Facebook whether I eat at KFC or at Downtown Diners? Were the new emoji 'Reactions' on Facebook posts a random selection?

This post covers a handful of the areas where Facebook consumes this ever-increasing user data to improve the user experience or to extract information from raw data. The examples below illustrate IR, NLP and AI applications at Facebook.

Social Recommendations

About a year ago, Facebook launched its Recommendations feature, which suggests places or services when you write a post asking for them. It highlights the suggested replies with additional information and with the locations marked on a map.

Src: https://goo.gl/ntQRkc

Approach

Let the post be:
- Post: Where should I eat tonight in Delhi?
- Response: You should totally check out KFC at CP.

The task is to extract the recommendation from the response by feeding a query into a local-places search engine, built from comments and data across Facebook, and combining the results with the user's preferences and likes. This is done in two steps:

Entity Retrieval: First, "KFC" is resolved to the proper entity. There could be many KFCs in CP alone, and the correct one needs to be surfaced. Moreover, a query is parsed on the basis of its tokens and category. For example, "Delhi Bar" might be the name of a bar, or it could mean a bar in Delhi. These cases are handled and a relevance score is calculated for each candidate.

Entity Scoring: The relevance score is calculated on the basis of the place the search is made from, the query text, and so on. A Gaussian Mixture Model is fitted to previous check-ins at each place. Then the LambdaMART algorithm, trained on click data, analyzes relevance features such as the person's location relative to the place and the check-in time, and surfaces the most probable location.
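To make the two steps concrete, here is a minimal sketch of entity scoring. Everything in it is made up for illustration: the candidate KFC branches, coordinates and check-in counts are invented, and a single Gaussian over the searcher's location plus a popularity term stands in for the real Gaussian Mixture Model and the learned LambdaMART ranker.

```python
import math

# Hypothetical candidate entities for the query "KFC" near Connaught Place:
# (name, lat, lon, historical check-in count) -- all values invented.
candidates = [
    ("KFC, CP Block A", 28.6330, 77.2190, 5200),
    ("KFC, CP Block N", 28.6310, 77.2230, 1400),
    ("KFC, Janpath",    28.6250, 77.2180,  900),
]

user_location = (28.6320, 77.2200)  # where the search is made from

def gaussian_proximity(lat, lon, center, sigma=0.005):
    """Toy stand-in for the Gaussian model over past check-ins:
    density falls off with squared distance from the user's location."""
    d2 = (lat - center[0]) ** 2 + (lon - center[1]) ** 2
    return math.exp(-d2 / (2 * sigma ** 2))

def relevance(entity):
    name, lat, lon, checkins = entity
    # Combine proximity with check-in popularity -- a crude stand-in
    # for the features a learned ranker like LambdaMART would weigh.
    return gaussian_proximity(lat, lon, user_location) * math.log1p(checkins)

ranked = sorted(candidates, key=relevance, reverse=True)
print(ranked[0][0])
```

The nearby, heavily checked-in branch wins; in the real system the weighting between proximity, popularity and the other click-derived features is learned rather than hand-set.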

On This Day 

The On This Day feature of Facebook shows people the memories they are most likely to want to see.


Src: https://goo.gl/wio75v

Approach

Filtering: People would certainly not be interested in resurfacing posts with a negative impact, so automatic filters are applied on this basis. These filters also hold back content involving people a user has blocked, inferred from our likes and dislikes.

Ranking: After filtering, the remaining candidate memories are ranked. A machine learning model, trained in real time and learning progressively, ranks the memories; over time it gets more accurate at predicting our preferences. The training is done over the following data:

Personalization: includes the user's interactions with the feature on that day, the posts in their feed that they liked that day, their demographic information, and attributes of the memory itself, such as how many years ago it happened.

Content understanding: uses computer vision to understand the media a user posts. The model is built on top of a deep convolutional neural network trained over many concepts, and can distinguish objects, emotions, actions, places and so on. This lets the system know whether a photo shows a cat or a road trip.
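A toy version of the filter-then-rank pipeline above might look like the following. The memories, feature names and weights are all hypothetical; a simple logistic score stands in for the real learned model, which would be trained on the personalization and content-understanding signals described above.

```python
import math

# Hypothetical candidate memories with hand-picked features
# (feature names and weights are illustrative, not Facebook's):
memories = [
    {"id": "beach_trip_2014", "friend_tags": 4, "past_likes": 120,
     "years_ago": 3, "is_negative": False},
    {"id": "lost_wallet_2015", "friend_tags": 0, "past_likes": 2,
     "years_ago": 2, "is_negative": True},
    {"id": "graduation_2012", "friend_tags": 9, "past_likes": 310,
     "years_ago": 5, "is_negative": False},
]

WEIGHTS = {"friend_tags": 0.3, "past_likes": 0.01, "years_ago": 0.1}

def score(m):
    # Logistic score in (0, 1) over a weighted feature sum.
    z = sum(WEIGHTS[k] * m[k] for k in WEIGHTS)
    return 1 / (1 + math.exp(-z))

# Filtering step: drop memories flagged as negative, then rank the rest.
surfaced = sorted((m for m in memories if not m["is_negative"]),
                  key=score, reverse=True)
print([m["id"] for m in surfaced])
```

The negative memory never reaches the ranking stage, and the widely liked, heavily tagged memory ranks first.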

Reaction Emoticons: 

About a year ago, Facebook introduced Reaction emoticons. People at Facebook had noticed that many comments on posts were emojis or emoticons, so why not let people express the emotion directly, instead of hitting Like and then expressing it in a comment?


Src: https://goo.gl/kd8kP7

A large amount of data must have been used to find the most-used emojis, since the current model offers only six. A ranking model was applied to select the top six from a stream of many candidate emojis.

Src: https://goo.gl/FqfdcT


The ranking of the emojis was done not only on the basis of each emoji's normalized count, but also with the aim of covering different bands of emotion. The Reaction emoticons were a blend of machine learning and user-experience design.
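One simple way to combine "normalized count" with "cover different emotion bands" is a greedy selection that discounts a band each time it is picked. The emoji names, counts and band labels below are invented for illustration, and this greedy rule is only a plausible stand-in for whatever ranking model Facebook actually used.

```python
# Hypothetical usage counts and emotion bands for candidate emojis.
emoji_stats = [
    ("love",  "positive", 900_000),
    ("haha",  "positive", 750_000),
    ("like",  "positive", 2_000_000),
    ("wow",   "surprise", 300_000),
    ("sad",   "negative", 280_000),
    ("angry", "negative", 260_000),
    ("cool",  "positive", 240_000),
]

total = sum(c for _, _, c in emoji_stats)

def pick_top(stats, k=6):
    """Greedy selection: prefer high normalized count, but halve the
    score of an emotion band each time it is chosen, so the final set
    spans positive, negative and surprise reactions."""
    chosen, band_used = [], {}
    pool = list(stats)
    while len(chosen) < k and pool:
        def adjusted(s):
            name, band, count = s
            return (count / total) * 0.5 ** band_used.get(band, 0)
        best = max(pool, key=adjusted)
        pool.remove(best)
        chosen.append(best[0])
        band_used[best[1]] = band_used.get(best[1], 0) + 1
    return chosen

print(pick_top(emoji_stats))
```

Note how "sad" and "angry" make the cut despite lower raw counts than "cool": the band discount trades a little popularity for emotional coverage.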


Other Research and Tools at Facebook:

FAISS: A similarity search library

It is a nearest-neighbor similarity search library that scales to billion-sized datasets: it can construct the k-nearest-neighbor graph over billions of high-dimensional vectors.

Traditional database systems are not helpful for such vectors. Consider a case where you know a building's characteristics but not its name; finding it from those characteristics is a similarity search. Similar vectors are found on the basis of Euclidean distance.

The system can also apply a "maximum inner product" search: it compares the dot product of the query vector with the vectors of all images in the database. Done naively, this consumes a great deal of computation time and space.
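The naive version of maximum inner product search is just one matrix-vector product followed by an argmax, as in the NumPy sketch below (random synthetic vectors, not a real dataset). FAISS exists precisely because this brute-force scan does not scale: it adds index structures and quantization to answer the same query over billions of vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 128                     # embedding dimension
xb = rng.standard_normal((10_000, d)).astype("float32")  # "database" vectors
# Query: a slightly perturbed copy of item 42, so it should be retrieved.
xq = xb[42] + 0.01 * rng.standard_normal(d).astype("float32")

# Brute-force maximum inner product search: score every database vector
# against the query, then take the index of the largest score.
scores = xb @ xq
best = int(np.argmax(scores))
print(best)
```

The query vector's inner product with its own (near-)copy dwarfs its inner product with unrelated random vectors, so index 42 comes back.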

DeepText: Facebook's text understanding engine

Src: https://goo.gl/7o2UKE
Every minute on Facebook, 400,000 new stories and 125,000 new comments are added on public posts. DeepText categorizes everything you post; it streamlines the user experience by refining search and text availability, providing content according to user preferences, and flagging spam.

By using convolutional and recurrent neural nets, DeepText addresses the word-sense disambiguation problems of traditional NLP. Traditional approaches required a different system per language to preprocess the text; DeepText avoids this.

This is done by preserving the semantic structure of the text. For example, simply taking a word and assigning it an ID throws away its embedded structure; to preserve it, words are analyzed using the mathematical concept of word embeddings. Moreover, translations of a word across different languages end up with nearby embeddings.
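The point of an embedding is that geometric closeness stands for semantic closeness. Here is a tiny hand-made example: the three-dimensional vectors are invented (real embeddings are learned and have hundreds of dimensions), but they show how cosine similarity separates related words from unrelated ones.

```python
import numpy as np

# Tiny hand-made "embeddings" (purely illustrative, not learned).
emb = {
    "delhi":  np.array([0.9, 0.1, 0.0]),
    "mumbai": np.array([0.85, 0.15, 0.05]),
    "pizza":  np.array([0.0, 0.9, 0.4]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 for parallel vectors, 0.0 for orthogonal."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two city names sit close together; an unrelated word does not.
print(cosine(emb["delhi"], emb["mumbai"]) > cosine(emb["delhi"], emb["pizza"]))
```

An ID-based representation (delhi=0, mumbai=1, pizza=2) carries none of this structure: 0 is no "closer" to 1 than to 2.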

Multi- lingual support:

Almost half of the Facebook community uses a language other than English, so it is important to address those languages. Building a separate model for each language is impractical; multi-lingual embeddings help in scaling to more languages.

To get training data for a new language, there are two possible approaches:

- get the data in all different languages.
- get the data in English, build the model on it, and for any other language translate the incoming text into English and run it through that model.

The problem with the second approach is that translation errors propagate into the training.
Src: https://goo.gl/F47vZc
The goal, instead, is to:
- represent each word as a vector (a word embedding);
- bring the vectors from all languages into a common vector space;
- exploit the fact that similar words lie near each other in that space, so their meanings can be addressed together.

Src: https://goo.gl/F47vZc
Since similar words are now closer to each other, the classifier performs better.
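A common way to build such a shared space is to learn a linear map from one language's embeddings into another's using a small seed dictionary of translation pairs. The sketch below is synthetic: the "English" vectors are random, the "Hindi" vectors are a rotated copy of them (so a perfect linear map exists by construction), and least squares recovers that map from ten dictionary pairs.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8

# Hypothetical English embeddings, and "Hindi" embeddings that are a
# rotated copy of them -- so a linear map between the spaces exists.
en = rng.standard_normal((20, d))
rotation = np.linalg.qr(rng.standard_normal((d, d)))[0]  # random orthogonal
hi = en @ rotation

# Learn the map W from a seed dictionary of translation pairs
# (here: the first 10 word pairs), via ordinary least squares.
W, *_ = np.linalg.lstsq(hi[:10], en[:10], rcond=None)

# Held-out words land on top of their translations in the shared space.
mapped = hi[10:] @ W
err = float(np.max(np.abs(mapped - en[10:])))
print(err < 1e-6)
```

Real embedding spaces are only approximately related by a linear map, so the held-out error is small rather than essentially zero, but the mechanism, fit on a dictionary, apply everywhere, is the same.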

Conclusion and References:

Clearly, AI is embedded in our daily lives, sometimes without us even realizing it. You can find more of Facebook's research and tools at the reference given below:

Ref:
https://code.facebook.com/ai-research/








