Graph Machine Learning
Everything about graph theory, computer science, machine learning, etc.


If you have something worth sharing with the community, reach out to @gimmeblues or @chaitjo.

Admins: Sergey Ivanov; Michael Galkin; Chaitanya K. Joshi
JAX Molecular Dynamics

I have already mentioned exciting work from DeepMind on simulating the dynamics of particles: each particle is a node in a graph, nearest neighbors are connected to each other, and the model predicts the acceleration at every time step, which is passed to a physics engine to compute the next state of the particles. The code and data were recently released here.

But there is another Google package, JAX MD, for performing molecular dynamics in JAX, a numpy-like library with autograd, purely in Python (paper and colab notebook). The authors report GPU-accelerated experiments with end-to-end training for hundreds of thousands of particles. The work has been accepted to NeurIPS 2020.
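
The core idea, getting forces for free as gradients of an energy function, can be illustrated without the library itself. Below is a minimal plain-JAX sketch with a toy Lennard-Jones energy and a velocity Verlet step; unlike the real JAX MD package it has no neighbor lists or periodic boundaries, and all names in it are made up for illustration.

```python
import jax
import jax.numpy as jnp

def total_energy(R, sigma=1.0, epsilon=1.0):
    """Toy Lennard-Jones energy of a configuration R with shape (N, 3)."""
    dr = R[:, None, :] - R[None, :, :]                     # (N, N, 3) pairwise displacements
    n = R.shape[0]
    r2 = jnp.sum(dr ** 2, axis=-1) + jnp.eye(n)            # add eye to avoid division by zero on the diagonal
    inv6 = (sigma ** 2 / r2) ** 3
    e = 4.0 * epsilon * (inv6 ** 2 - inv6)
    return 0.5 * jnp.sum(e * (1.0 - jnp.eye(n)))           # zero out the diagonal, count each pair once

# Forces come for free from autograd; jit compiles the whole computation for GPU/TPU.
forces = jax.jit(jax.grad(lambda R: -total_energy(R)))

@jax.jit
def velocity_verlet(R, V, dt=1e-3, mass=1.0):
    F = forces(R)
    R_new = R + V * dt + 0.5 * F / mass * dt ** 2
    F_new = forces(R_new)
    V_new = V + 0.5 * (F + F_new) / mass * dt
    return R_new, V_new

# Toy initialization: 64 particles on a cubic lattice, zero initial velocities.
side = jnp.arange(4) * 1.5
R = jnp.stack(jnp.meshgrid(side, side, side), axis=-1).reshape(-1, 3)
V = jnp.zeros_like(R)
for _ in range(10):
    R, V = velocity_verlet(R, V)
```

JAX MD wraps the same pattern into higher-level building blocks (spaces, energy functions, simulation routines), with jit and vmap providing the GPU speed-ups the authors report.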
Discovering Symbolic Models from Deep Learning with Inductive Biases

In addition to the previous post on learning the dynamics of particles, there is another NeurIPS 2020 work by Cranmer et al. on learning new equations for unknown physics. The idea is to use the messages on the edges of a GNN and the node outputs as input to a symbolic regression algorithm (eureqa), which performs a genetic search over the inputs and symbols such as +, -, exp, log, etc.

What's interesting is that this methodology can be applied to any NN model, whenever symbolic regression is used to unravel a compact symbolic expression of the underlying data. Moreover, one can see the genetic search as post-processing of the GNN model that discovers new, closed-form solutions to the data which generalize better than the GNN itself. A blog post (with code) and Yannic's explanation are available.
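
As a toy illustration of the pipeline (not the authors' code), suppose we logged, for every edge, the pair distance and one component of the learned message; a crude substitute for the genetic search is then a sparse fit over a small library of candidate terms. All data and names here are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
r = rng.uniform(0.5, 3.0, size=2000)               # pair distances for the logged edges
m = 1.7 / r**2 + 0.01 * rng.normal(size=r.shape)   # hypothetical "message" component to recover

# Small library of candidate symbolic terms (a stand-in for the genetic search space).
library = {"1": np.ones_like(r), "r": r, "1/r": 1 / r, "1/r^2": 1 / r**2, "log r": np.log(r)}
X = np.stack(list(library.values()), axis=1)

coef, *_ = np.linalg.lstsq(X, m, rcond=None)
for name, c in zip(library, coef):
    if abs(c) > 0.1:                               # keep only the dominant terms
        print(f"{c:+.2f} * {name}")                # should be dominated by ~ +1.70 * 1/r^2
```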
Fresh picks from ArXiv
Today on ArXiv there are many interesting papers: debunking the performance of GNNs with MLP models, a new Python library for graph reconstruction and distances, scalable and reliable GNNs, and more papers from NeurIPS 2020.

If I forgot to mention your paper, please shoot me a message and I will update the post.

Conferences
- Handling Missing Data with Graph Representation Learning with Jure Leskovec, NeurIPS 2020
- Scalable Graph Neural Networks via Bidirectional Propagation NeurIPS 2020
- Reliable Graph Neural Networks via Robust Aggregation with Stephan Günnemann, NeurIPS 2020
- Strongly Incremental Constituency Parsing with Graph Neural Networks NeurIPS 2020
- Graph Contrastive Learning with Augmentations NeurIPS 2020
- Graph Neural Network for Metal Organic Framework Potential Energy Approximation Workshop NeurIPS 2020
- Conversation Graph: Data Augmentation, Training and Evaluation for Non-Deterministic Dialogue Management ACL 2021
- Be More with Less: Hypergraph Attention Networks for Inductive Text Classification EMNLP 2020
- Event Detection: Gate Diversity and Syntactic Importance Scores for Graph Convolution Neural Networks EMNLP 2020
- AutoAudit: Mining Accounting and Time-Evolving Graphs with Christos Faloutsos, Big Data 2020

Graphs
- Log(Graph): A Near-Optimal High-Performance Graph Representation
- netrd: A library for network reconstruction and graph distances
- Revisiting Graph Neural Networks for Link Prediction
- Graph Contrastive Learning with Adaptive Augmentation
- On Graph Neural Networks versus Graph-Augmented MLPs with Joan Bruna
- Combining Label Propagation and Simple Models Out-performs Graph Neural Networks

Survey
- Domain-specific Knowledge Graphs: A survey
Lecture by Xavier Bresson: Recent Developments in GNNs.

Xavier Bresson shared the slides from his lecture on recent GNN architectures and their properties. It nicely summarizes exciting topics such as graph isomorphism, WL tests, equivariance, universal approximation, positional encodings, and more. A must-read for those who want to get up to date with all those fancy terms.
Language Models are Open Knowledge Graphs

(video) On July 16 I posted that it would be cool to use GPT to build graphs, and on 22 October someone did just that. Language Models are Open Knowledge Graphs shows how you can use the attention matrices of BERT/GPT-2 to extract relationships for given entities from text and build a knowledge graph from them. Yannic thoroughly discusses the paper, and a minimal sketch of the attention-extraction idea follows after the comments below. Here are comments from Michael Galkin:

The paper shows impressive results extracting facts that are present in Wikidata as well as adding new ones, unknown to Wikidata. There is still a lot of room for improvement, though.
* It's computationally expensive - 20 servers each with 4 Tesla K80 GPUs were running for 48 / 96 hours. GPUs go brrr ;)
* Neither BERT nor GPT has a notion of entities, so you'd need auxiliary tools and NER annotators to map the tokens "New" "York" to "New York" as one entity. From that PoV, using KG-augmented LMs trained with millions of explicit entities might be a promising move.
* The fact search strategy is key. The authors used a generative beam search strategy with some post-processing filtering, and it is arguably expensive. The whole space of facts in triple-based KGs is a Cartesian product of all entities and relations (E x R x E), and only a small fraction of those facts are correct.
* One might say that extracted graphs are way too star-shaped, i.e., there are not so many links between leaf nodes - that is a direct consequence of the fact extraction strategy.
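
As promised above, here is a drastically simplified sketch of the attention-extraction idea: run GPT-2 with attention outputs enabled, average the attention maps, and inspect which token pairs attend to each other most strongly. The real pipeline additionally runs a beam search over attention paths between recognized entity spans and maps them to Wikidata; the example sentence and thresholds here are arbitrary.

```python
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_attentions=True)

sentence = "Bob Dylan wrote Blowin' in the Wind in 1962."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# out.attentions holds one (batch, heads, seq, seq) tensor per layer; average them all.
att = torch.stack(out.attentions).mean(dim=(0, 2))[0]     # (seq, seq)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

scores = att.clone()
scores.fill_diagonal_(0)                                  # ignore self-attention
top = torch.topk(scores.flatten(), k=5).indices
for idx in top:
    i, j = divmod(idx.item(), scores.shape[1])
    print(f"{tokens[i]!r} attends to {tokens[j]!r}: {scores[i, j].item():.3f}")
```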
NeurIPS 2020 papers

NeurIPS 2020 papers just became available.
Graph Machine Learning research groups: Albert-László Barabási

I do a series of posts on groups in graph research; the previous post is here. The 18th is Albert-László Barabási, who introduced scale-free networks, in particular the Barabási–Albert model, and popularized network science through several books and courses.

Albert-László Barabási (1967)
- Affiliation: Northeastern University
- Education: Ph.D. at Boston University in 1994 (advisor: H. Eugene Stanley)
- h-index 145
- Awards: John von Neumann Medal, C&C Prize, Prima Primissima Award, Cozzarelli Prize, Bolyai Prize
- Interests: network science, network structure, network control
Graph ML Newsletter: NeurIPS 2020

This issue is devoted to the upcoming NeurIPS 2020: general trends and exciting topics in graph ML, including oversmoothing, adversarial attacks, explainability, scalability, and more.
Fresh picks from ArXiv
Today on ArXiv there are three interesting surveys: an overview of the graph kernel literature from the last 15 years, GNNs for recommendations, and a digest of the Babai and Luks algorithms for the graph isomorphism problem.

If I forgot to mention your paper, please shoot me a message and I will update the post.

Graphs
- Single-Node Attack for Fooling Graph Neural Networks
- Learning with Molecules beyond Graph Neural Networks Workshop NeurIPS 2020
- Massively Parallel Graph Drawing and Representation Learning BigData 2020
- Deoscillated Graph Collaborative Filtering with Philip S. Yu
- On Self-Distilling Graph Neural Network
- Finding Friends and Flipping Frenemies: Automatic Paraphrase Dataset Augmentation Using Graph Theory EMNLP 2020


Survey
- Graph Kernels: State-of-the-Art and Future Challenges with Karsten Borgwardt
- Graph Neural Networks in Recommender Systems: A Survey
- Recent Advances on the Graph Isomorphism Problem
Erdős goes neural

This is a post by Andreas Loukas about their recent work on solving combinatorial problems, accepted to NeurIPS 2020.

There has been great progress on combinatorial optimization (CO) problems with neural networks. Unsupervised learning (UL) is a particularly appealing direction as it aspires to avoid the common drawbacks (computational cost, limited generalization) associated with the use of labeled instances. A major challenge in UL is to ensure that the discretized output respects the CO constraints. Researchers have often tackled these problems with various ad-hoc heuristics that repair/improve upon the fractional solutions.

Our paper proposes a principled approach to use UL GNNs for CO that ensures that constraints will be met without breaking differentiability. Our idea is to utilize the probabilistic method pioneered by Paul Erdős to bridge the gap between the discrete and continuous worlds.

The probabilistic method. To prove the existence of an object with a property π one invents a distribution over possible objects. Showing that P(object has π)>0 is a certificate for the object's existence. For example, we can solve the maximum cut problem by using a fair coin toss to decide in which partition each node is placed. It is easy to show that, on average, half the edges of the graph will be cut, implying the existence of a 1/2-approximation for the maxcut problem.
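
For the coin-toss example, the certificate is just linearity of expectation: each edge is cut with probability 1/2, so some assignment must cut at least half the edges:

```latex
\mathbb{E}[\text{cut}] \;=\; \sum_{(u,v)\in E}\Pr[u \text{ and } v \text{ land in different parts}]
\;=\; \sum_{(u,v)\in E}\tfrac{1}{2} \;=\; \tfrac{|E|}{2}.
```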

Erdős goes neural. We utilize the same procedure but learn the node probabilities with a GNN. To achieve this, we analytically construct a differentiable "probabilistic penalty loss" for the discrete problem. We show that, by minimizing the latter, the GNN provides a probabilistic certificate for the existence of a low-cost, feasible solution. At inference time, discrete solutions of guaranteed cost are deterministically obtained with the method of conditional expectation.
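
To make the last step concrete, here is a hedged sketch (not the authors' code) of the method of conditional expectation for max-cut: given per-node probabilities p[i] (from a GNN, or 0.5 for the coin toss), nodes are fixed one at a time to the side with the larger expected cut, so the final deterministic cut is at least the expected cut of the random assignment.

```python
import numpy as np

def decode_maxcut(edges, p):
    """edges: list of (u, v) pairs; p: array of probabilities that each node goes to side 1."""
    n = len(p)
    side = np.full(n, -1)                                  # -1 = undecided
    for i in range(n):
        gain = {0: 0.0, 1: 0.0}                            # expected cut contribution of node i's edges
        for s in (0, 1):
            for (u, v) in edges:
                if u != i and v != i:
                    continue
                j = v if u == i else u
                if side[j] == -1:
                    gain[s] += p[j] if s == 0 else 1 - p[j]  # undecided neighbor: expected to differ
                else:
                    gain[s] += 1.0 if side[j] != s else 0.0  # decided neighbor: cut or not
        side[i] = 0 if gain[0] >= gain[1] else 1           # greedily keep the conditional expectation high
    return side

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(decode_maxcut(edges, np.full(5, 0.5)))               # coin-toss probabilities
```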

In practice. We demonstrated the efficacy of this approach on two NP-hard problems: the maximum clique problem and the constrained min-cut problem. In our experiments, our approach always yields feasible solutions and is competitive with neural baselines, MIP solvers, and heuristics.

Future work. There are subtleties in our framework that need to be carefully addressed. On the theoretical front, it can be challenging to derive a probabilistic penalty loss for certain types of constraints (such as TSP). On the computational front, finding ways to accelerate our decoding module could significantly improve the running time during inference.

TLDR:
* We propose a differentiable way to solve CO problems with unsupervised GNNs.
* In our approach, the neural network is learning a distribution of solutions. Successfully minimizing our loss provides a certificate of existence of low-cost feasible solutions in the learned distribution.
* We obtain competitive results against neural baselines, heuristics, and solvers.

blogpost: https://andreasloukas.blog/2020/10/31/erdos-goes-neural
paper: https://rb.gy/uktdfa
Papers and Scores at ICLR 2021

You can see a ranked list of all papers at ICLR 2021 here (~3000 papers) and only for graph papers here (200+ papers). These scores are before rebuttal, so they can change in the final ranking.

With a 20% acceptance rate (which is a bit low for ICLR), a paper needs an average score of about 6 to get accepted. The median score right now is 5.25.

There are many graph papers at the top of the list, including the top-ranked paper.

And here are some insights from last year:
(1) Every third graph paper was accepted, a clear indication that GML is becoming popular;
(2) On average, scores of [6,6,8] were needed to get accepted; [6,6,6] was borderline;
(3) An AC can sometimes save a paper even if it got low scores, which is rather good: it means the reviewers are not the only ones who decide;
(4) Likewise, an AC can reject a paper even if the reviewers unanimously accept it. I think that happens mostly because the paper does not present enough experimental comparison to SOTA.
EMNLP 2020 stats

Dates: Nov 16-18
Where: Online
Price: $200 ($75 students)

Graph papers can be found at paper digest.

• 3359 submissions (vs 2905 in 2019)
• 754/520 accepted EMNLP/Findings (vs 660 in 2019)
• 22.4% / 20.5% acceptance rate (vs 22.7% in 2019)
• ~104 total graph papers (8% of total)
Workshops at NeurIPS 2020

There are more than 60 workshops at NeurIPS this year. Some relevant (with available accepted papers) are Learning Meets Combinatorial Algorithms (LMCA) on ML + NP-hard problems; and Differential Geometry meets Deep Learning (DiffGeo4DL) on geometry and manifolds.
Combining Label Propagation and Simple Models Out-performs Graph Neural Networks

This paper by Cornell and Facebook made a lot of noise on Twitter recently. In short, it shows that GNNs can be outperformed by simpler models such as MLP + Label Propagation (LP) on several large datasets.

They use LP (actually twice) to propagate the labels from training nodes to test nodes. LP has been used successfully for two decades (NIPS 2004 as well as this survey); it's just that it was not directly compared to GNNs. Unfortunately, LP does not use node features, so the authors propose to first run an MLP on the node features and then apply LP to the MLP predictions and to the labels. A minimal sketch of plain LP is shown below.
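
Below is a minimal sketch of label propagation with a symmetrically normalized adjacency matrix, in the spirit of Zhou et al. (NIPS 2004). The dense matrix and the clamping choice are simplifications; in C&S the same diffusion operator is applied to the MLP's residual errors and then to its predictions rather than to raw labels.

```python
import numpy as np

def label_propagation(A, Y0, train_mask, alpha=0.9, n_iters=50):
    """A: (n, n) adjacency; Y0: (n, c) one-hot labels for training nodes, zeros elsewhere."""
    d = A.sum(axis=1) + 1e-10
    S = A / np.sqrt(np.outer(d, d))            # D^{-1/2} A D^{-1/2}
    Y = Y0.copy()
    for _ in range(n_iters):
        Y = alpha * S @ Y + (1 - alpha) * Y0   # diffuse, then pull back towards the seeds
        Y[train_mask] = Y0[train_mask]         # clamp the known labels
    return Y                                   # predict argmax over classes for test nodes
```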

This work only applies to transductive node classification, not to inductive node classification (applying a trained model to new graphs), link prediction, or graph classification. But for transductive node classification it shows very good results in terms of both speed and quality.

Another detail is that LP usually works on homophilous graphs, i.e. graphs where nodes with the same labels have a higher chance of being connected. While this assumption is reasonable, not all graphs have this type of connectivity: for example, mail that goes from a person to a post office to an aggregator to the recipient connects nodes of different classes. Petar Veličković talks about this in more detail.

I must add that it's not the first time we have seen simple models outperform GNNs on existing graph datasets. A year ago there were many works showing that MLPs work better than GNNs on many graph classification datasets (e.g. this paper). Plain MLPs don't work well on the OGB datasets, but MLP + LP does. Hopefully this will lead to more graph datasets and, subsequently, to more insights about which tools are best for graph prediction problems.
Weisfeiler and Leman go sparse: Towards scalable higher-order graph embeddings

This is a guest post by Christopher Morris about their recent work accepted to NeurIPS 2020 that deals with higher-order WL algorithms.

Motivation: Since the power of GNNs is upper-bounded by the 1-dimensional Weisfeiler-Leman algorithm (WL) (Xu et al. 2019, Morris et al. 2019), it is natural to design GNNs based on insights from the k-dimensional WL (k-WL), which is a strictly more powerful heuristic for the graph isomorphism problem. Instead of computing colors or features for single vertices, the k-WL gains power by computing colors for k-tuples defined over the vertex set and by defining a suitable adjacency notion between them to do a message-passing-style update. Hence, it accounts for higher-order interactions between vertices. However, it does not scale and may suffer from overfitting when used in a machine learning setting. Hence, it remains an important open problem to design WL-based graph learning methods that are simultaneously expressive, scalable, and not prone to overfitting.
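
For readers unfamiliar with WL, here is a minimal sketch of the basic 1-WL color refinement (not the higher-order local variants proposed in the paper); graphs whose color histograms differ are certainly non-isomorphic.

```python
from collections import Counter

def wl_1(adj, n_iters=3):
    """adj: dict mapping each node to its list of neighbors."""
    colors = {v: 0 for v in adj}                               # uniform initial coloring
    for _ in range(n_iters):
        # each node's new color is a hash of its color and the multiset of neighbor colors
        signatures = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adj}
    return Counter(colors.values())                            # color histogram as a graph feature

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
print(wl_1(triangle), wl_1(path))                              # different histograms -> distinguishable
```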

Methodological Contribution: In our paper, we propose local variants of the k-WL and corresponding neural architectures, which consider only a subset of the original neighborhood, making them more scalable and less prone to overfitting. Surprisingly, the expressive power of (one of) our algorithms is strictly higher than that of the original algorithm in terms of the ability to distinguish non-isomorphic graphs. We then lift our results to the neural setting and connect our findings to recent learning-theoretic results for GNNs (Garg et al., 2020), showing that our architectures offer better generalization errors.

Empirical results: Our experimental study confirms that the local algorithms, both the kernel and the neural architectures, lead to vastly reduced computation times and prevent overfitting. The kernel version establishes a new state of the art for graph classification on a wide range of benchmark datasets, while the neural version shows promising performance on large-scale molecular regression tasks.

Future Challenges: While our new sparse architecture leads to a boost in expressive power over standard GNNs and is less prone to overfitting than dense architectures, it still does not scale to truly large graphs. The main reason for this is the exponential dependence on k, i.e., the algorithm still considers all n^k tuples. Hence, designing scalable (higher-order) GNNs that can provably capture graph structure is an important future goal.

In general, we believe that moving away from the restrictive graph isomorphism objective and deriving a deeper understanding of our architecture, when optimized with stochastic gradient descent, are important future goals.
Knowledge Graphs in NLP @ EMNLP 2020

A new digest from Michael Galkin on the applications of knowledge graphs in NLP at the last EMNLP conference. Much bigger models (6.5B parameters), more languages (100 languages for entity linking), more complex tasks (data to text).
How do node features affect the performance of GNNs?

This is an open question that I have thought about a bit recently. In particular, what surprised me are the results from a recent paper on Label Propagation on a particular dataset, Rice31 (table below).

You can see that some models achieve 80% accuracy, while others get 10% (a random guess). In the paper they say that the node features are heterogeneous features such as gender or major, but after speaking with the authors it seems they use spectral embeddings instead.

I have tried this dataset with GNNs and my results are close to a random guess (10%). I tried several variations of GNNs as well as node features, but didn't get much higher than 15%. Then I tried GBDT with spectral embeddings, and it gave me about 50% accuracy. I haven't tried LP on this dataset yet, but it would be remarkable if LP with spectral embeddings showed such a drastic difference from GNNs.

This and other experiments led me to think that the message-passing paradigm is too restrictive, i.e. aggregating information simultaneously over all your neighbors may not be a good idea in general: the inductive bias of such a model could be wrong for a particular graph dataset. GNNs work on graph datasets where the way node labels depend on the graph structure is very similar to how message passing works. In other words, if you were to create a dataset where a node's label equals the average label of its neighbors, then a GNN with average aggregation would easily learn such a dependency. But if your node labels depend on the structure in some counter-intuitive way (for example, by picking a neighbor at random and assigning its label), then your GNN with average aggregation would fail. More broadly, GNN models don't have to follow the message-passing paradigm; they can be built on very different design principles, and that's something I think we will see in the coming years. A small synthetic illustration of the two label rules is below.
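
Here is a tiny, hypothetical illustration of the two label rules above (all numbers are made up): mean aggregation matches the first rule's inductive bias but treats the second one as noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
A = (rng.random((n, n)) < 0.05).astype(float)
A = np.triu(A, 1); A = A + A.T                     # random symmetric adjacency, no self-loops
x = rng.normal(size=n)                             # a scalar node feature

deg = A.sum(1).clip(min=1)
y_homophilous = (A @ x) / deg                      # label = average of neighbors' features
y_random_neighbor = np.array([
    x[rng.choice(np.flatnonzero(A[i]))] if A[i].any() else 0.0
    for i in range(n)
])                                                 # label = feature of one randomly picked neighbor
# A mean-aggregation GNN can fit y_homophilous almost exactly,
# while y_random_neighbor looks like noise to the same aggregator.
```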