Neural Network That Learns From A Huge Graph
Presented at Spark Summit East, 2017.
Lynx Analytics develops a big graph analysis engine on top of Apache Spark. One of
our recent developments is a recurrent neural network library that learns from the
structure of the graph in order to predict missing features of vertices.
A real-life use case is demographic estimation, where the task is to predict the age
of a telco's customers by exploring their connections to other people, the ages of
those people, and other classical features such as internet or phone usage patterns.
One of the main challenges we faced was to develop a training process for our
purposes. The usual way of training a supervised learning algorithm considers each
vertex as an independent prediction problem. But because our algorithm uses the connections
between the vertices, we cannot treat them independently. On the other hand,
if you consider the whole graph as one problem, then you do not have any separate
training data at all. In this talk we will show some tricks that we used in order to
perform the prediction and the training process on the same graph.
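
The abstract does not spell out these tricks. As a minimal sketch of one common way to
train and evaluate on a single graph, assume that a random subset of the known labels is
hidden: the model may read only the visible labels, and the hidden ones serve as targets.
The toy graph, the function names, and the neighbor-mean predictor below are all
illustrative assumptions, and the predictor is only a stand-in for the neural network.

```python
import random

# Toy undirected graph as adjacency lists (illustrative data, not from the talk).
EDGES = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "e"],
    "d": ["b", "e"],
    "e": ["c", "d"],
}
# Known ages; None marks a vertex whose age we actually want to predict.
AGES = {"a": 30, "b": 34, "c": 28, "d": 41, "e": None}


def split_labeled_vertices(ages, hide_fraction=0.5, seed=0):
    """Split the labeled vertices into a visible set and a hidden (target) set."""
    rng = random.Random(seed)
    labeled = sorted(v for v, age in ages.items() if age is not None)
    rng.shuffle(labeled)
    n_hidden = max(1, int(len(labeled) * hide_fraction))
    return set(labeled[n_hidden:]), set(labeled[:n_hidden])


def predict_age(vertex, edges, ages, visible, fallback):
    """Stand-in predictor: mean age of the neighbors whose label is visible."""
    neighbor_ages = [ages[n] for n in edges[vertex] if n in visible]
    if not neighbor_ages:
        return fallback
    return sum(neighbor_ages) / len(neighbor_ages)


def hidden_label_error(edges, ages, visible, hidden):
    """Mean absolute error on hidden vertices, using visible labels only."""
    fallback = sum(ages[v] for v in visible) / len(visible)
    errors = [
        abs(predict_age(v, edges, ages, visible, fallback) - ages[v])
        for v in hidden
    ]
    return sum(errors) / len(errors)


visible, hidden = split_labeled_vertices(AGES)
error = hidden_label_error(EDGES, AGES, visible, hidden)
```

Re-drawing the split with a different seed in each training round would let every labeled
vertex act as an input in some rounds and as a target in others, while the graph itself
stays the same.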
The other main challenge is handling graphs so big that they do not fit into the
memory of a single machine, and performing resource-intensive computations on them.
To tackle this problem, the graph must be stored and processed in a distributed
fashion. The difficulty is that we cannot simply cut the graph into smaller pieces,
since the training process needs to propagate data along the edges.
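
To illustrate why cutting the graph is not enough, the following sketch simulates one
propagation step over a partitioned graph in plain Python. It is not Spark code and not
the library's implementation: the modulo partitioning, the function names, and the
mean aggregation are assumptions chosen for illustration. Messages are grouped by the
partition of their destination vertex, which mimics the data exchange a distributed
engine would have to perform for every edge that crosses a partition boundary.

```python
from collections import defaultdict


def partition_of(vertex, num_partitions):
    """Hypothetical partitioner: assign integer vertex ids by modulo."""
    return vertex % num_partitions


def count_cut_edges(edges, num_partitions):
    """Count edges whose endpoints live in different partitions."""
    return sum(
        1
        for src, dst in edges
        if partition_of(src, num_partitions) != partition_of(dst, num_partitions)
    )


def propagate_step(edges, values, num_partitions):
    """One round: every vertex takes the mean of its in-neighbors' values."""
    # Phase 1: route each message to the partition that owns its destination.
    outboxes = defaultdict(list)
    for src, dst in edges:
        outboxes[partition_of(dst, num_partitions)].append((dst, values[src]))

    # Phase 2: each partition aggregates the messages for its own vertices.
    new_values = dict(values)  # vertices without incoming messages keep their value
    for messages in outboxes.values():
        sums = defaultdict(float)
        counts = defaultdict(int)
        for dst, value in messages:
            sums[dst] += value
            counts[dst] += 1
        for dst in sums:
            new_values[dst] = sums[dst] / counts[dst]
    return new_values


# A path 0 - 1 - 2 - 3, stored as directed edges in both directions.
PATH_EDGES = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
START = {0: 4.0, 1: 0.0, 2: 0.0, 3: 8.0}

cut = count_cut_edges(PATH_EDGES, 2)
after_one_step = propagate_step(PATH_EDGES, START, 2)
```

With two partitions every edge of this small path is a cut edge, so each message has to
leave its source partition. The result is still the same as with a single partition,
which is the property a distributed propagation step needs to preserve.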
In the talk we will show core algorithmic ideas to tackle the above-mentioned
problems and present some experimental results.