Graph Machine Learning
Everything about graph theory, computer science, machine learning, etc.


If you have something worth sharing with the community, reach out to @gimmeblues or @chaitjo.

Admins: Sergey Ivanov; Michael Galkin; Chaitanya K. Joshi
On the Cora dataset

Cora, Citeseer, and Pubmed are three popular datasets for node classification. They are among the cases where you can clearly see the power of GNNs: on Cora, for example, GNNs reach around 80% accuracy, while GBDT/MLP reach only around 60%. This is not often the case: on many datasets I see only a marginal win for GNNs over non-graph methods, and on some datasets GNNs actually do worse.

So why is the performance of GNNs so great on this dataset? I don't have a good answer, but here are some thoughts. Cora is a citation network where nodes are papers and classes are the papers' fields. However, it's not clear what the links between these documents are: the original paper didn't describe how exactly the links were established. If the links were based on citation, i.e. two papers are connected if one cites the other, then that could explain such a big improvement from GNNs: a GNN sees all nodes during training, while an MLP sees only the training nodes, and since two linked papers are likely to share the same field, the GNN can leverage this graph information. If that's the case, a simple k-NN majority-vote baseline should perform similarly to a GNN (see the sketch below). However, there is an opinion from people who know the authors of the original paper that the links were established based on word similarity between documents. If that's true, I'm not sure why GNNs do so well on this dataset. In any case, establishing graphs from real-world data is something that requires a lot of attention and care, which is why structure learning is such an active topic.
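To make that baseline concrete, here is a minimal sketch of a 1-hop neighbour majority-vote classifier. The inputs edges (citation pairs) and train_labels (labels of the training nodes) are hypothetical placeholders, not the actual Planetoid loaders.

from collections import Counter, defaultdict

def majority_vote_baseline(edges, train_labels, default_label=None):
    # Build an undirected adjacency list from the edge list.
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    predictions = {}
    for node, neighbours in adj.items():
        if node in train_labels:
            continue  # predict only for nodes outside the training split
        votes = Counter(train_labels[n] for n in neighbours if n in train_labels)
        predictions[node] = votes.most_common(1)[0][0] if votes else default_label
    return predictions

If links really follow citations (and thus class membership), this trivial vote already exploits most of the signal a GNN would use.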
Graph ML at Data Fest 2020

Day 1 was a pleasant surprise: people with different backgrounds came, watched the videos, and asked questions. Here are the 5 videos of day 1:

1. Opening remarks: Graph Machine Learning, Sergey Ivanov, Criteo, France (where I broadly talk about what GML is, what the best resources are, what the community looks like, etc.);

2. Graph-Based Nearest Neighbor Search: Practice and Theory, Liudmila Prokhorenkova, Yandex, Russia (where she spoke about k-NN search on graphs, HNSW, the theory behind it, and her ICML 2020 work);

3. Graphical Models for Tensor Networks and Machine Learning, Roman Schutski, Skoltech, Russia (where he spoke about graphical models, treewidth, tensor decomposition);

4. Unsupervised Graph Representations, Anton Tsitsulin, University of Bonn & Google, Germany (where he spoke about all the popular node embedding methods and their pros and cons);

5. Placing Knowledge Graphs in Graph ML, Michael Galkin, TU Dresden, Germany (it's all you need to know about knowledge graphs if you don't know what they are).


On day 2, tomorrow, we will have 5 more videos, this time about applications of graphs.

Please join us tomorrow at https://spatial.chat/s/ods at 12pm (Moscow time).
Graph ML at Data Fest 2020

Day 2 continued to surprise me, as many people joined on Sunday to listen to our talks. It was especially interesting to see English-speaking participants who were not shy to ask questions among so many Russian speakers. I see this English-language activity as a promising step towards making the ODS community truly global.

Here is the second portion of videos, more related to applications of graphs.

1. Large Graph Visualization Tools and Approaches, Sviatoslav Kovalev, Samokat, Russia

2. Business Transformation as Graph Problems, Vadim Safronov, Key Points, Portugal

3. AutoGraph: Graphs Meet AutoML, Denis Vorotinsev, Oura, Finland

4. Scene Graph Generation from Images, Boris Knyazev, University of Guelph & Vector Institute, Canada

5. Link Prediction with Graph Neural Networks, Maxim Panov, Skoltech, Russia


My gratitude to all the speakers!

Until next time!
GNN course at UPenn

In addition to cs224w at Stanford and COMP 766 at McGill (both should happen next semester), there is a good-looking, currently ongoing course on Graph Neural Networks at the University of Pennsylvania by Alejandro Ribeiro, who has worked on graph ML and graph signal processing. The course is in its third week, and there are already videos and assignments on graph convolutional filters, empirical risk minimization, and an introduction to the field.
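For those unfamiliar with the term, a graph convolutional filter (one of the course topics) is just a polynomial in a graph shift operator S, such as the adjacency matrix, applied to a node signal x. A minimal NumPy sketch, with made-up filter taps:

import numpy as np

def graph_filter(S, x, h):
    # y = sum_k h[k] * S^k @ x, where S is the graph shift operator
    # (adjacency or Laplacian), x a node signal, and h the filter taps.
    y = np.zeros_like(x, dtype=float)
    Skx = x.astype(float)  # S^0 @ x
    for hk in h:
        y += hk * Skx
        Skx = S @ Skx  # advance to the next power of S
    return y

# Toy usage on a 3-node path graph with hypothetical taps [1.0, 0.5, 0.25].
S = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x = np.array([1.0, 0.0, 0.0])
print(graph_filter(S, x, [1.0, 0.5, 0.25]))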
17th Workshop on Algorithms and Models for the Web Graph

There is a pretty interesting workshop on graph theory and its applications to the web graph. There are 5 talks each day, from 21 (today) to 24 September. The workshop will be held online.
3DGV Seminar: Michael Bronstein

There is a good ongoing seminar series on 3D geometry and vision. The last seminar was given by Michael Bronstein, who talked about inductive biases, the timeline of GNN architectures, and several successful applications. Quite insightful.
Message Passing for Hyper-Relational Knowledge Graphs

This is a guest post by Michael Galkin about their recently accepted paper at EMNLP.

Traditionally, knowledge graphs (KGs) use triples to encode their facts, e.g.
subject, predicate, object
Albert Einstein, educated at, ETH Zurich
Simple and straightforward, triple-based KGs are extensively used in a plethora of NLP and CV tasks. But can triples effectively encode richer facts when we need them?
If we have the two facts:
Albert Einstein, educated at, ETH Zurich
Albert Einstein, educated at, University of Zurich
what can we say about Einstein's education? Did he attend two universities at the same time? 🤨

This is a common problem with triple-based KGs when we want to assign more attributes to each typed edge. Luckily, the KG community has two good ways to do that: RDF* and Labeled Property Graphs (LPGs). With RDF* we can instantiate each fact with qualifiers:

( Albert_Einstein educated_at ETH_Zurich )
academic_degree Bachelor ;
academic_major Maths .
( Albert_Einstein educated_at University_of_Zurich )
academic_degree Doctorate ;
academic_major Physics.

We call such KGs hyper-relational KGs. Wikidata follows the same model: here is Einstein's page, where you'll find statements (hyper-relational facts) with qualifiers (those additional key-value edge attributes).
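As a rough mental model (not the data format used in the paper), a hyper-relational fact is just a main triple plus an arbitrary list of (qualifier relation, qualifier entity) pairs, mirroring the two RDF* statements above:

from dataclasses import dataclass, field

@dataclass
class HyperRelationalFact:
    subject: str
    predicate: str
    obj: str
    qualifiers: list = field(default_factory=list)  # [(relation, entity), ...]

facts = [
    HyperRelationalFact("Albert_Einstein", "educated_at", "ETH_Zurich",
                        qualifiers=[("academic_degree", "Bachelor"),
                                    ("academic_major", "Maths")]),
    HyperRelationalFact("Albert_Einstein", "educated_at", "University_of_Zurich",
                        qualifiers=[("academic_degree", "Doctorate"),
                                    ("academic_major", "Physics")]),
]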

Interestingly, there is pretty much nothing 🕳 in the Graph ML field for hyper-relational graphs. We have a bunch of GNN encoders for directed, multi-relational, triple-based KGs (like R-GCN or CompGCN), and nothing for hyper-relational ones.

In our new paper, we design StarE ⭐️, a GNN encoder for hyper-relational KGs (like RDF* or LPG) where each edge might have an unlimited number of qualifier pairs (relation, entity). Moreover, those entities and relations do not need to be qualifier-specific: they can be used in the main triples as well!

In addition, we carefully constructed WD50K, a new Wikidata-based dataset for link prediction on hyper-relational KGs, together with its 3 descendants for various setups. Experiments show that qualifiers greatly improve subject/object prediction accuracy, sometimes reaching a whopping 25-point MRR gap. More applications and tasks are to appear in future work!

Paper: https://arxiv.org/abs/2009.10847
Blog: Medium friends link
Code: Github
Graph Machine Learning research groups: Alejandro Ribeiro

I do a series of posts on the groups in graph research; the previous post is here. The 15th is Alejandro Ribeiro, head of Alelab at UPenn and the lead instructor of the ongoing GNN course mentioned above.


Alejandro Ribeiro (1975)
- Affiliation: University of Pennsylvania
- Education: Ph.D. from the University of Minnesota in 2006 (advisor: Georgios B. Giannakis)
- h-index: 51
- Awards: Hugo Schuck best paper award, paper awards at CDC, ACC, ICASSP, Lindback award, NSF award
- Interests: wireless autonomous networks, machine learning on network data, distributed collaborative learning
NeurIPS 2020 stats

Dates: Dec 6 - 12
Where: Online
Price: $25/$100 (students/non-students)

• 9454 submissions (vs 6743 in 2019)
• 1900 accepted (vs 1428 in 2019)
• 20.1% acceptance rate (vs 21% in 2019)
• 123 graph papers (6.5% of total)
SE(3)-Transformers

A blog post about a recent paper (NeurIPS 2020) that brings group theory to set functions. It seems to perform on par with state-of-the-art methods for classification and regression but, unlike them, is provably equivariant.
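As a reminder of what equivariance means here: if you rotate the input point cloud, the output should rotate in exactly the same way. Below is a toy numeric check of that property; the function f is just a placeholder (centring the point cloud), which is trivially rotation-equivariant, standing in for a real SE(3)-Transformer layer.

import numpy as np

def f(X):
    # Placeholder "layer": subtract the centroid (rotation-equivariant by construction).
    return X - X.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))                  # random 3D point cloud
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1                             # make it a proper rotation

lhs = f(X @ Q.T)   # rotate first, then apply f
rhs = f(X) @ Q.T   # apply f, then rotate
print(np.allclose(lhs, rhs))  # True for an equivariant f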
The next big thing: the use of graph neural networks to discover particles

It's great to see that GNNs can be useful for fundamental applications such as the discovery of new particles. In another post by Fermilab, a US-based physics lab, researchers discuss how they are moving GNNs into production for the Large Hadron Collider (LHC) at CERN. The goal is to process millions of images and select those that could be relevant to the discovery of new particles. They expect to see the results in the LHC's Run 3 in 2021. An arXiv preprint is available online.
ICLR 2021 Graph Papers

Last Friday, submissions to ICLR 2021 became available for reading. There are 3013 submissions, of which about 210 are graph papers (7% of the total). About every third submission appears to be a resubmission of a NeurIPS rejection (judging by the overlap of paper submissions), which surprised me not just by the sheer volume, but also because I'm puzzled about where the remaining ~6000 rejected papers get resubmitted.

I extracted the graph papers, which are attached, and loosely categorized them into 4 topics: model, theory, application, and survey. Most of the papers (171) are about new models (general GNNs, graph models for new problems, improvements over existing models); 22 papers are novel applications in physics, chemistry, biology, etc.; 13 are theoretical papers; and 4 are surveys/evaluation benchmarks.