Graph Machine Learning
Everything about graph theory, computer science, machine learning, etc.


If you have something worth sharing with the community, reach out @gimmeblues, @chaitjo.

Admins: Sergey Ivanov; Michael Galkin; Chaitanya K. Joshi
Mining and Learning with Graphs Workshop

The MLG workshop is a regular workshop on various ML solutions for graphs. The videos for each poster can be found here. The keynotes should be available soon (Danai Koutra's is available now).
Graph Machine Learning research groups: Pietro Liò

I am doing a series of posts on research groups in graph ML; the previous post is here. The 13th is Pietro Liò, a computational biologist and the supervisor of Petar Veličković. He has also been very active in GML recently (with 54 papers in 2020), so he could be a good choice if you want to do a PhD in this area.


Pietro Liò (~1965)
- Affiliation: University of Cambridge
- Education: Ph.D. in Theoretical Genetics at University of Firenze, Italy in 1995 and Ph.D. in Engineering at University of Pavia, Italy in 2007;
- h-index: 50;
- Awards: Lagrange Fellowship, best papers at ISEM, MCED, FET;
- Interests: graph neural networks, computational biology, signal processing.
JuliaCon2020 Graph Videos

While Python is the default language for analyzing graphs, numerous other languages provide packages for working with graphs. At the recent JuliaCon, devoted to the Julia programming language, many talks were about new graph packages with applications to transportation networks, dynamical systems, geometric deep learning, knowledge graphs, and more. Check out the full program here.
Number of papers in GML: Aug 2020

There were 277 new GML papers in the CS section of arXiv in Aug 2020 (vs 339 in July).
Topology-Based Papers at ICML 2020

Topological data analysis studies the application of topological methods to real-world data, for example constructing and studying a proper manifold given only 3D points. This topic is gaining increasing attention, and a new post by Bastian Rieck discusses the topological papers at ICML 2020, which cover graph filtration techniques, topological autoencoders, and normalizing flows.
GML Newsletter Issue #2

The second newsletter is out!

Blog posts (graph laplacians, SIGN, quantum GNN, TDA), videos (MLSS-Indo, PNA), events (KDD, Israeli workshops, JuliaCon), books, and upcoming events (graph drawing symposium, data fest).
DeepMind's Traffic Prediction with Advanced Graph Neural Networks

A new blog post by DeepMind describes how GNNs can be applied to travel time prediction. There are not many details about the model itself (which makes me wonder if a deep net trained across all supersegments would suffice), but there are curious details about training.

1. As the road network is, I suppose, huge, they sample subgraphs in proportion to traffic density. This should be similar to GraphSAGE-like approaches.

2. Sampled subgraphs can vary a lot within a single batch, so they use RL to select subgraphs properly. I guess it's some form of imitation learning that selects the graphs in a batch based on some objective value.

3. They use the MetaGradients algorithm to select the learning rate. It was previously used to parametrize returns in RL; I guess in this blog post it parametrizes the learning rate instead.
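
To make point 1 concrete, here is a minimal sketch of sampling subgraphs in proportion to traffic density. It is only my reading of the idea, not DeepMind's implementation: the function name `sample_subgraphs`, the subgraph ids, and the density values are all hypothetical.

```python
import random
from typing import Dict, List


def sample_subgraphs(density: Dict[str, float], batch_size: int, seed: int = 0) -> List[str]:
    """Draw subgraph ids with probability proportional to their traffic density.

    `density` maps a (hypothetical) subgraph id to a non-negative density value.
    Sampling is with replacement, so busy subgraphs may repeat within a batch.
    """
    rng = random.Random(seed)
    ids = list(density)
    weights = [density[i] for i in ids]
    return rng.choices(ids, weights=weights, k=batch_size)


# Hypothetical densities: the busy subgraph should dominate the batch.
batch = sample_subgraphs({"quiet_suburb": 1.0, "busy_center": 9.0}, batch_size=8)
print(batch)
```

Sampling with replacement keeps the sketch short; a real pipeline would more likely sample without replacement within a batch and cap the subgraph size.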
Graph ML at Twitter

A post by Michael Bronstein and Zehan Wang that talks about the current challenges of using graph models for industry settings: scalability, heterogeneous settings, dynamic graphs, and presence of noise.
On the evaluation of graph neural networks

Over the last year there have been many revealing benchmark papers that re-evaluate existing GNNs on standard tasks such as node classification (see this and this for example). However, the gap between claimed and real results still exists and is especially noticeable when the baselines are not properly selected.

For one, using an MLP on node features alone often leads to better results than those from GNNs. This is surprising, as GNNs can be seen as a generalization of MLPs. I encounter this more and more on new data sets, although for several data sets (e.g. Cora) you can clearly see the advantage of using GNNs.

Another ML model that I haven't seen tried in graph settings is GBDT (e.g. XGBoost, CatBoost, LightGBM). GBDT models are the de facto winners of many Kaggle competitions where the data is tabular, so if your node features have enough variability, you could expect that just using GBDT on them would often make a good baseline. I have tried this for several problems, and it often outperforms the method proposed in the paper. For example, for node classification, GBDT on the Bus data set achieves 100% accuracy (vs. ~80% in the paper). Or, on graph classification, GBDT can beat other top GNN models (see image below). Considering how easy it is to run experiments with GBDT models, I would expect them to be a good counterpart to MLP in the realm of baselines.
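
To show what "GBDT on node features only" means, here is a toy sketch: gradient boosting with depth-1 trees (stumps) and squared loss, written in plain Python. It is a stand-in for illustration, not XGBoost/CatBoost/LightGBM and not the setup behind the numbers above; all function names and the toy data are hypothetical. The graph structure is ignored entirely, which is the point of the baseline.

```python
from typing import List, Tuple

# (feature index, threshold, value if x <= threshold, value otherwise)
Stump = Tuple[int, float, float, float]


def fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Pick the single split that best fits the residuals (squared error)."""
    best: Stump = (0, 0.0, 0.0, 0.0)
    best_err = float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lv = sum(left) / len(left)  # never empty: t is an observed value
            rv = sum(right) / len(right) if right else 0.0
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if err < best_err:
                best, best_err = (j, t, lv, rv), err
    return best


def stump_value(stump: Stump, row: List[float]) -> float:
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


def fit_gbdt(X: List[List[float]], y: List[int], n_rounds: int = 20, lr: float = 0.5):
    """Boost stumps on the residuals of the current prediction (binary labels 0/1)."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + lr * stump_value(stump, row) for p, row in zip(preds, X)]
    return base, lr, stumps


def predict_label(model, row: List[float]) -> int:
    base, lr, stumps = model
    score = base + lr * sum(stump_value(s, row) for s in stumps)
    return 1 if score >= 0.5 else 0


# Toy node features (one row per node); labels depend on the first feature only.
X = [[0.1, 1.0], [0.2, 0.0], [0.8, 1.0], [0.9, 0.0], [0.3, 1.0], [0.7, 0.0]]
y = [0, 0, 1, 1, 0, 1]
model = fit_gbdt(X, y)
print([predict_label(model, row) for row in X])
```

In practice you would feed the same node-feature matrix to an off-the-shelf GBDT library and compare its held-out accuracy with the GNN's.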
Graph Machine Learning research groups: Danai Koutra

I am doing a series of posts on research groups in graph ML; the previous post is here. The 14th is Danai Koutra, a former PhD student of Christos Faloutsos. She leads the graph exploration lab at the University of Michigan and could be a great Ph.D. advisor if you are interested in GML.


Danai Koutra (~1988)
- Affiliation: University of Michigan
- Education: Ph.D. at Carnegie Mellon University in 2010 (advisor: Christos Faloutsos)
- h-index 25
- Awards: ACM SIGKDD 2016 Dissertation Award; best paper awards at ICDM, PAKDD, ICDT
- Interests: graph mining, knowledge graphs, graph embeddings
Latent graph neural networks: Manifold learning 2.0?

One of the hot topics of this year is the construction of a graph from unstructured data (e.g. 3D points or images). In a new post, Michael Bronstein discusses existing approaches to latent graph learning and suggests that using a GNN both to learn the structure of the graph and to solve the downstream task can be a better alternative to a decoupled approach. This is indeed an exciting and active area of research, with open problems and known applications to NLP, physics, and biology.
Graph ML at Data Fest 2020

This year, together with @IggiSv9t, I am organizing a track at Data Fest 2020. It's like a workshop at a conference, but more informal. We will have videos from our amazing speakers and also networking, where you can talk to me, @IggiSv9t, the speakers, or other people who are interested in graph machine learning. Besides our track, there will be many other interesting tracks on all aspects of ML and DS (interpretability, antifraud, ML in healthcare, and 40 more tracks!).

It will be this weekend, 19-20 September. You need to register (for free) at https://fest.ai/2020/.

Our videos:

Day 1 (Saturday)

1. Opening remarks: Graph Machine Learning, Sergey Ivanov, Criteo, France

2. Graph-Based Nearest Neighbor Search: Practice and Theory, Liudmila Prokhorenkova, Yandex, Russia

3. Graphical Models for Tensor Networks and Machine Learning, Roman Schutski, Skoltech, Russia

4. Unsupervised Graph Representations, Anton Tsistulin, University of Bonn & Google, Germany

5. Placing Knowledge Graphs in Graph ML, Michael Galkin, TU Dresden, Germany


Day 2 (Sunday)
1. Large Graph Visualization Tools and Approaches, Sviatoslav Kovalev, Samokat, Russia

2. Business Transformation as Graph Problems, Vadim Safronov, Key Points, Portugal

3. Scene Graph Generation from Images, Boris Knyazev, University of Guelph & Vector Institute, Canada

4. AutoGraph: Graphs Meet AutoML, Denis Vorotinsev, Oura, Finland

5. Link Prediction with Graph Neural Networks, Maxim Panov, Skoltech, Russia


See you there!
On Cora dataset

Cora, Citeseer, and Pubmed are three popular data sets for node classification. It's one of those cases where you can clearly see the power of GNNs. For example, on Cora GNNs have around 80% accuracy, while GBDT/MLP have only around 60%. This is not often the case: for many data sets I see only a marginal win for GNNs compared to non-graph methods, and for some data sets GNN accuracy is actually lower.

So why is the performance of GNNs so great on this data set? I don't have a good answer, but here are some thoughts. Cora is a citation network, where nodes are papers and classes are the papers' fields. However, it's not clear what the links between these documents are. The original paper didn't describe exactly how links are established. If links were based on citations, i.e. two papers are connected if one cites the other, that could explain such a big improvement for GNNs: a GNN sees all nodes during training, while an MLP sees only the training nodes, and since two connected papers are likely to share the same field, the GNN leverages this graph information. If that's the case, a simple k-NN majority vote baseline would perform similarly to a GNN. However, there is an opinion from people who know the authors of the original paper that the links are established based on word similarity between documents. If that's true, I'm not sure why GNNs do so well on this data set. In any case, constructing graphs from real-world data is something that requires a lot of attention and visibility, which is why structure learning is such an active topic.
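
The majority vote baseline mentioned above can be sketched in a few lines. This is one possible reading, assuming "k-NN" means the graph neighbors of a node; the function name `neighbor_vote` and the toy citation graph are hypothetical, not taken from Cora.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional


def neighbor_vote(
    adj: Dict[Hashable, List[Hashable]],
    train_labels: Dict[Hashable, str],
    node: Hashable,
    default: Optional[str] = None,
) -> Optional[str]:
    """Predict a node's class as the most common label among its labeled neighbors.

    Neighbors without a training label are ignored; if none are labeled,
    `default` is returned.
    """
    votes = Counter(train_labels[n] for n in adj.get(node, []) if n in train_labels)
    if not votes:
        return default
    return votes.most_common(1)[0][0]


# Hypothetical toy citation graph: paper -> papers it is linked to.
adj = {"p1": ["p2", "p3", "p4"], "p5": ["p6"]}
train_labels = {"p2": "ML", "p3": "ML", "p4": "DB"}
print(neighbor_vote(adj, train_labels, "p1"))
print(neighbor_vote(adj, train_labels, "p5", default="unknown"))
```

If this baseline gets close to GNN accuracy on a data set, most of the gain likely comes from label agreement between linked nodes rather than from the learned architecture.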
Graph ML at Data Fest 2020

Day 1 was a pleasant surprise: people with different backgrounds came, watched the videos, and asked questions. Here are the 5 videos of day 1:

1. Opening remarks: Graph Machine Learning, Sergey Ivanov, Criteo, France (where I broadly talk about what is GML, what are the best resources, what's the community, etc.);

2. Graph-Based Nearest Neighbor Search: Practice and Theory, Liudmila Prokhorenkova, Yandex, Russia (where she spoke about k-NN on graphs, HNSW, theory, and her ICML 2020 work);

3. Graphical Models for Tensor Networks and Machine Learning, Roman Schutski, Skoltech, Russia (where he spoke about graphical models, treewidth, tensor decomposition);

4. Unsupervised Graph Representations, Anton Tsistulin, University of Bonn & Google, Germany (where he spoke about all the popular node embedding methods and their pros and cons);

5. Placing Knowledge Graphs in Graph ML, Michael Galkin, TU Dresden, Germany (it's all you need to know about knowledge graphs if you don't know what they are).


On day 2, tomorrow, we will have 5 more videos, which will be about applications of graphs.

Please join us tomorrow at https://spatial.chat/s/ods at 12pm (Moscow time).
Graph ML at Data Fest 2020

Day 2 continued to surprise me, as many people joined on Sunday to listen to our talks. It was especially interesting to see English-speaking participants who were not shy about asking questions and being present among so many Russian speakers. I see this English-language activity as a promising step toward making the ODS community truly global.

Here is the second portion of videos, more related to applications of graphs.

1. Large Graph Visualization Tools and Approaches, Sviatoslav Kovalev, Samokat, Russia

2. Business Transformation as Graph Problems, Vadim Safronov, Key Points, Portugal

3. AutoGraph: Graphs Meet AutoML, Denis Vorotinsev, Oura, Finland

4. Scene Graph Generation from Images, Boris Knyazev, University of Guelph & Vector Institute, Canada

5. Link Prediction with Graph Neural Networks, Maxim Panov, Skoltech, Russia


My gratitude to all the speakers!

Until next time!