This presentation describes an approach to automatically identifying the interests of Internet users. At the core of the approach is Latent Dirichlet Allocation, a Machine Learning method based on an elegant probabilistic generative model. With minor adjustments, the model can be applied to the problem of identifying user interests. We will discuss the algorithm used to train the model and sketch its MapReduce implementation. To demonstrate the effectiveness of the proposed approach, I will present the results of modeling the interests of Mail.Ru Group users. The presentation is intended for technical specialists interested in applying Machine Learning methods to Big Data processing.
Nikolay Anokhin
Data Scientist, Mail.Ru Group
Nikolay graduated from the Moscow Institute of Physics and Technology with a master’s degree in applied mathematics and physics. Since 2010 he has worked on various projects applying Machine Learning techniques and Big Data analytics, both in Russia and abroad. He currently works as a data scientist in the Ad Operations Department at Mail.Ru Group. Nikolay created and delivered a lecture course on Data Mining for students of the Computer Science department at Moscow State University.