Graph ML News (May 27th): New Antibiotic found with Geometric DL, Differential Privacy, NeurIPS Submissions
A new antibiotic, abaucin, has been discovered with the help of Geometric Deep Learning! Abaucin targets Acinetobacter baumannii, a stubborn pathogen resistant to many drugs. The new Nature Chemical Biology paper (feat. Regina Barzilay and Tommi Jaakkola from MIT) sheds more light on the screening process and the methods used.
Stanford launches the online version of its flagship Graph ML course, CS224W. The 10-credit course is priced at $1,750 and starts on June 5th.
The TAG in ML workshop on topology announced a new challenge: implementing more topology-enabled neural nets with the TopoModelX framework where top contributors will become co-authors of a JMLR submission. That’s a great option for those who’d like to start working with topological neural architectures!
Vincent Cohen-Addad and Alessandro Epasto of Google Research published a post on differentially private clustering: introducing an approach for DP hierarchical clustering with formal guarantees and lower bounds, and an approach for large-scale DP clustering.
The Weekend Reading section this week is brought to you by NeurIPS submissions, quite a number of cool papers:
Link Prediction for Flow-Driven Spatial Networks - the work introduces the Graph Attentive Vectors (GAV) framework for link prediction (based on the labeling trick commonly used in LP) and smashes the OGB-Vessel leaderboard with a 10-point ROC-AUC margin over the previous SOTA.
Edge Directionality Improves Learning on Heterophilic Graphs feat. Emanuele Rossi, Francesco Di Giovanni, Fabrizio Frasca, Michael Bronstein, and Stephan Günnemann
PRODIGY: Enabling In-context Learning Over Graphs feat. Qian Huang, Hongyu Ren, Percy Liang, and Jure Leskovec - a cool attempt to bring prompting to the permutation-invariant nature of graphs.
Uncertainty Quantification over Graph with Conformalized Graph Neural Networks feat. Kexin Huang and Jure Leskovec — one of the first works on Conformal Prediction with GNNs.
Learning Large Graph Property Prediction via Graph Segment Training feat. Jure Leskovec and Bryan Perozzi
ChatDrug - a neat attempt at combining ChatGPT with retrieval plugins and molecular models to edit molecules, peptides, and proteins directly with natural language. An extension of MoleculeSTM that we featured in the recent State of Affairs post.
MISATO - Machine learning dataset for structure-based drug discovery - a new dataset of 20K protein-ligand complexes with molecular dynamics traces and electronic properties.
Multi-State RNA Design with Geometric Multi-Graph Neural Networks feat. Chaitanya Joshi and Pietro Lio
Paper: Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii (Nature Chemical Biology)
Graph ML News (June 3rd)
Molecular ML conference (MoML) took place in Montreal on Monday and hosted invited talks from Marinka Zitnik, Mohammed AlQuraishi, Gábor Csányi, and other top researchers in the field. The recordings are now online on YouTube.
Accepted papers at ICML 2023 are now visible on the conference website (available after registration) - several editors of this channel are going to ICML this year and it is likely we will prepare an overview of the most interesting graph papers!
Weekend reading:
Geometric Latent Diffusion Models for 3D Molecule Generation (ICML’23) feat. Minkai Xu, Stefano Ermon, and Jure Leskovec
Protein Design with Guided Discrete Diffusion feat. Kyunghyun Cho and Andrew Wilson
Graph Inductive Biases in Transformers without Message Passing (ICML’23) feat. Derek Lim
Unsupervised Embedding Quality Evaluation feat. Anton Tsitsulin
Smooth, exact rotational symmetrization for deep learning on point clouds
GraphML News (June 10th)
Emanuele Rossi and Michael Bronstein published a new blog post on Directed GNNs. The idea is rather simple (different learnable aggregations for in- and out-neighbors) and shows very good results on heterophilic graphs. In fact, DirGNNs very much resemble relational GNNs (R-GCN, CompGCN, NBFNet, and so on), which have learnable aggregations per unique relation type (and its inverse), and we recently published a theory paper on the expressiveness of such GNNs. Great to see prominent GNN folks joining the relational party 😉
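To make the core idea concrete, here is a minimal dense-matrix sketch of a layer with separate learnable aggregations over in- and out-neighbors; the actual DirGNN layer in the paper uses normalized aggregations and a learnable combination coefficient, so treat this purely as an illustration, not the authors' implementation.

```python
import torch

class DirGNNLayerSketch(torch.nn.Module):
    """Toy directed GNN layer: separate learnable transforms for messages coming
    from out-neighbors and from in-neighbors, plus a self-transform."""
    def __init__(self, dim):
        super().__init__()
        self.lin_out = torch.nn.Linear(dim, dim)   # messages from out-neighbors
        self.lin_in = torch.nn.Linear(dim, dim)    # messages from in-neighbors
        self.lin_self = torch.nn.Linear(dim, dim)

    def forward(self, x, adj):
        # adj[i, j] = 1 if there is a directed edge i -> j (dense for simplicity)
        msg_out = adj @ self.lin_out(x)            # node i sums over j with edge i -> j
        msg_in = adj.T @ self.lin_in(x)            # node i sums over j with edge j -> i
        return torch.relu(self.lin_self(x) + msg_out + msg_in)
```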
The Learning on Graphs Conference (LoG) is not just one of the coolest venues for Graph ML research - the organizers care about the community and factor in your feedback. Bastian Rieck and Corinna Coupette summarized the results of the anonymous poll among authors and reviewers in Evaluating the "Learning on Graphs" Conference Experience, highlighting what worked (eg, monetary awards for reviewers and a lighter paper load) and what didn't quite (some papers were rejected even when authors submitted rebuttals but reviewers did not engage). Let's help LoG grow into one of the best Graph ML research venues!
FAIR, CMU, and The Open Catalyst Project are about to announce the next large NeurIPS challenge - most likely it will be about adsorption energy estimation. Brace yourselves and prepare your best equivariant geometric models.
Da Zheng and Florian Saupe from AWS published a post introducing GraphStorm (we noticed the original paper a few weeks ago), a low-code framework for large-scale graph learning targeted at enterprise applications. The post goes through several examples of building graphs and running inference.
Some weekend reading:
Ewald-based Long-Range Message Passing for Molecular Graphs (ICML’23) and its LOG2 reading group presentation
Validation of de novo designed water-soluble and transmembrane proteins by in silico folding and melting - comparison of AlphaFold2 vs ESMFold
How does over-squashing affect the power of GNNs? feat. Francesco Di Giovanni, Michael Bronstein, and Petar Veličković
A Fractional Graph Laplacian Approach to Oversmoothing feat. Gitta Kutyniok
GraphML News (June 17th) -- Distributional Graphormer
It seems researchers took a break after the NeurIPS deadlines (or braced themselves for the 6-paper reviewing batches) and there hasn’t been that much news lately.
Microsoft Research announced Distributional Graphormer, a massive generative model based on Graphormer suitable for many AI 4 Science tasks such as protein-ligand binding, conformation transition pathway prediction, and even conditional crystal lattice generation (eg, generate a structure with a given band gap). Quoting the authors, “DiG attempts to predict the complicated equilibrium distribution of a given system by gradually transforming a simple distribution (e.g., a standard Gaussian) through the simulation of a predicted diffusion process that leads towards the equilibrium distribution.” The accompanying 80-page preprint is a nice weekend reading 😉
Apart from that, the Learning on Graphs meetup took place in Paris with exciting keynotes, and Michael Bronstein received a prestigious UKRI Turing AI Fellowship (only two were given this year) to work on Graph ML algorithms inspired by physical systems.
More new papers:
Topological Singularity Detection At Multiple Scales (ICML’23) by Julius von Rohrscheidt and Bastian Rieck. Check out a nice Twitter thread by Bastian for a visual explanation.
Generating Novel, Designable, and Diverse Protein Structures by Equivariantly Diffusing Oriented Residue Clouds (ICML’23) by Yeqing Lin and Mohammed AlQuraishi. Genie is 10x smaller than RFDiffusion but gets quite close in terms of generative performance! Have a look at the thread by Mohammed with fancy generated gifs.
Rigid Body Flows for Sampling Molecular Crystal Structures feat. Pim de Haan and Frank Noé
Enabling tabular deep learning when d≫n with an auxiliary knowledge graph feat. Hongyu Ren and Jure Leskovec
On the Connection Between MPNN and Graph Transformer
Chen Cai, Truong Son Hy, Rose Yu, Yusu Wang from UCSD
Invited post by Chen Cai
Our work aims to understand the relationship between local GNNs (MPNN) and global GNNs (Graph Transformer). Here, local GNN refers to MPNNs that mix node features locally, and global GNN refers to the Graph Transformer, which incorporates graph topology into positional embeddings (along with existing node/edge features) and sends a set of feature vectors into a vanilla Transformer.
1️⃣ Local MPNN and global Graph Transformer are two major paradigms of graph learning. MPNN came earlier and encompasses several popular architectures such as GCN, GraphSAGE, and GAT. However, these are known to suffer from issues like limited expressive power, over-smoothing, and over-squashing. The Graph Transformer, on the other hand, has received a lot of attention recently and shows competitive performance on standard benchmarks like OGB and on tasks that require long-range reasoning. Our goal is to gain a fine-grained understanding of the relationship between these two paradigms.
2️⃣ Previously, [1] showed that with a specific positional embedding, GT can approximate 2-IGN (Invariant Graph Networks), which is at least as expressive as MPNN. That work establishes an important link from global GNN to local GNN.
3️⃣ In our work, we tried to establish the link in the inverse direction: can MPNN approximate GT as well? We looked into a very simple way to incorporate global modeling into the local mixing of MPNN: the virtual node (VN). MPNN + VN adds a virtual node, connects it to all the graph nodes, and then does (heterogeneous) message passing on the modified graph (see the sketch after this list). It is a heuristic proposed in the early days of GNN research and shows consistent improvement over MPNN. However, there is little theoretical understanding of MPNN + VN.
4️⃣ We have a set of results on how well MPNN + VN approximates GT: 1) with constant depth and width (O(1) depth and width), MPNN + VN can approximate the self-attention layers of two “linear transformers”, Performer & Linear Transformer; 2) for wide MPNN + VN (O(1) depth, O(n^d) width), we show it can approximate the full GT via a link to equivariant DeepSets; 3) for deep MPNN + VN (O(n) depth, O(1) width), it can approximate one self-attention layer.
5️⃣ On the experimental side, we pushed the limits of the simple MPNN + VN and showed that 1) it works surprisingly well on the LRGB (Long-Range Graph Benchmark) datasets where GT previously dominated; 2) leveraging the GraphGPS framework, our MPNN + VN improves over the previous implementation; 3) MPNN + VN outperforms the Linear Transformer and MPNN on a climate modeling task. See our paper and code for more details!
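As a reference for point 3️⃣, here is a minimal PyTorch sketch of MPNN + VN with a dense adjacency and sum aggregation; the layer sizes and update rules are illustrative assumptions, not the exact architecture analyzed in the paper.

```python
import torch

class MPNNWithVirtualNode(torch.nn.Module):
    """Toy MPNN + VN: one extra virtual node connected to every graph node;
    each layer updates the VN from all nodes and broadcasts it back."""
    def __init__(self, dim, num_layers):
        super().__init__()
        self.node_mlps = torch.nn.ModuleList(torch.nn.Linear(2 * dim, dim) for _ in range(num_layers))
        self.vn_mlps = torch.nn.ModuleList(torch.nn.Linear(dim, dim) for _ in range(num_layers))

    def forward(self, x, adj):
        # x: [n, dim] node features, adj: [n, n] dense adjacency
        vn = x.mean(dim=0, keepdim=True)                       # initialize the virtual node
        for node_mlp, vn_mlp in zip(self.node_mlps, self.vn_mlps):
            vn = torch.relu(vn_mlp(vn + x.mean(dim=0, keepdim=True)))   # VN aggregates from all nodes
            neigh = adj @ x                                    # local aggregation over graph neighbors
            x = torch.relu(node_mlp(torch.cat([neigh, vn.expand_as(x)], dim=-1)))  # broadcast VN back
        return x
```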
[1] Pure Transformers are Powerful Graph Learners
Code: https://github.com/Chen-Cai-OSU/MPNN-GT-Connection
GraphML News (June 24th)
Plenty of news, finally.
There is going to be a GraphML community meetup at ICML’23 similar to those at ICML’22 and NeurIPS’22; feel free to drop by if you are at the conference. More details are to follow in the LoG2 slack.
Michael Bronstein, Francesco Di Giovanni, and Ben Gutteridge wrote a new blog post on Dynamically Rewired Delayed Message Passing - an approach to address over-squashing and improve the long-range capabilities of GNNs. νDRew is based on the idea of sparse skip connections where a node gets direct access to its k-hop neighbors, but delayed in time. Good to see our long-range LRGB dataset gaining more traction in the community 🙂
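To give a flavor of the delayed-rewiring idea, here is a toy dense-graph sketch where a hop-k neighbor becomes visible at layer k and its messages are read from a state k-1 layers old; the exact νDRew formulation (delays, rewiring schedule, normalization) differs, so every name and indexing choice here is an illustrative assumption.

```python
import torch

def hop_masks(adj, max_hop):
    """masks[k-1][i, j] = True iff the shortest-path distance from i to j is exactly k."""
    n = adj.size(0)
    reached = torch.eye(n, dtype=torch.bool)
    frontier = adj.bool()
    masks = []
    for _ in range(max_hop):
        new = frontier & ~reached
        masks.append(new)
        reached |= new
        frontier = (new.float() @ adj).bool()
    return masks

class DelayedMessagePassingSketch(torch.nn.Module):
    """At layer t, a node aggregates from hop-k neighbors (k <= t), reading their
    features from layer t-k, i.e. with a delay of k-1 relative to a standard MPNN."""
    def __init__(self, dim, num_layers):
        super().__init__()
        self.lins = torch.nn.ModuleList(torch.nn.Linear(dim, dim) for _ in range(num_layers))
        self.num_layers = num_layers

    def forward(self, x, adj):
        masks = hop_masks(adj, self.num_layers)
        states = [x]                                  # states[t] = features after layer t
        for t in range(1, self.num_layers + 1):
            agg = torch.zeros_like(x)
            for k in range(1, t + 1):                 # hop-k edges only appear from layer k on
                agg = agg + masks[k - 1].float() @ states[t - k]
            states.append(torch.relu(self.lins[t - 1](states[-1] + agg)))
        return states[-1]
```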
New PhD thesis: Cristian Bodnar, the author of message passing architectures using simplicial complexes, cellular complexes, and neural sheaves, published his work on Topological Deep Learning: Graphs, Complexes, Sheaves. This is an excellent introduction to topology and topological DL, highly recommended.
LLMs can instruct robots to create chemical compounds: Andrew White presented a nice demo of ChemCrow - an agent that goes from natural language instructions to a sequence of real robotic actions to synthesize something. The authors synthesized 3 organocatalysts and even an insect repellent.
Weekend reading:
Equiformer V2: Improved Equivariant Transformer for Scaling to Higher-Degree Representations feat. Tess Smidt - the next and faster version of the famous and quite popular Equiformer (ICLR’23).
Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials feat. Shengchao Liu, Anima Anandkumar, and Jian Tang - introduces Geom3D, a massive suite of datasets and 2D/3D geometric models that work on molecules, proteins, and even crystals. The repo offers 10 datasets, 35 models, and 14 pre-training methods.
QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules feat. Shuiwang Ji - QM9 with Hamiltonian matrices for 2.4K MD trajectories and 130K molecular geometries.
Hyperbolic Representation Learning: Revisiting and Advancing feat. Rex Ying - a nice overview of hyperbolic GNNs in general and a new model, Hyperbolic Informed Embedding, in particular.
GraphML News (July 1st)
⚛️ The 3rd Open Catalyst Challenge has been announced! This year the task is to predict the global minimum binding energy (adsorption energy) given an adsorbate and a catalyst surface. The main dataset includes the new OC20-Dense split with roughly 15K initial structures and 3M frames. Baselines are GemNet-OC and the Equivariant Spherical Channel Network (eSCN). Results will be announced at NeurIPS’23.
🎙️ The biennial Sampling Theory and Applications Conference (SAMPTA) 2023 will take place at Yale on July 10-14. This year will feature invited talks by Soledad Villar, Gitta Kutyniok, Michael Bronstein, Dan Spielman, and other prominent researchers. Registration is still open!
🧬 Folks in comp bio might want to refresh their background on state space models (SSMs) - HyenaDNA, a collab between Stanford, Harvard, and Mila, is a DNA model with a whopping context length of up to 1M tokens of single nucleotides. HyenaDNA scales to lengths unattainable even by linear Transformers and shows SOTA on 23 genomic tasks. Pre-trained checkpoints are already on HuggingFace. The authors hint at further generative applications, so we’ll keep an eye on that.
A few words about hypergraphs: an anonymous author published a tutorial on the basics of hypergraphs and building hypergraph GNNs. And a new work, Hypergraph factorisation for multi-tissue gene expression imputation, shows how to use message-passing hypergraph NNs for processing gene expression in comp bio applications.
GraphML News (July 7th) - Generative Chemistry, Temporal Graph Benchmark
Lots of news this week!
🔬 Starting with new blog posts, Charlie Harris wrote an article on Diffusion Models in Generative Chemistry for Drug Design covering the basics of denoising diffusion and score-based generative modeling, going into molecular use cases with Equivariant Diffusion, DiffSBDD, and DiffDock, and raising questions about the fair evaluation of generative models vs standard tools.
Looking more from the industrial perspective, Leo Wossnig published a piece, Where is generative design in drug discovery today, discussing the successes and failures of generative approaches and highlighting the main obstacles for ML folks wishing to dive into drug discovery: (1) data scarcity (eg, no data for new targets), (2) slow experimental pipelines for generating new data, (3) end-to-end pipelines and the tech stack in general.
🔧 Graphium is a new library for molecular representation learning in the Datamol ecosystem. Graphium is packed with the latest algorithms (like Random Walk Structural Encodings) and ML models (like the recent GPS++, the winner of OGB LSC’22), and scales to large compute; you can even spin up training on Graphcore IPUs.
📏 The Temporal Graph Benchmark (TGB) is finally here! The graph learning community has long felt that OGB needed a temporal branch, and TGB delivers dynamic link prediction and node property prediction datasets, standard loaders, evaluators, and, of course, leaderboards (good old Temporal Graph Network is still a very strong baseline). Similarly to OGB, there are small, medium, and large graphs (the largest has about 1M nodes, 50M edges, and 30M timesteps). More details can be found in the preprint by Shenyang Huang, Farimah Poursafaei, and the OGB gang.
AStarNet, the scalable GNN for KG reasoning, can now be integrated directly with ChatGPT to enhance its factual correctness. Given a textual query, AStarNet also runs graph inference on the backbone graph (a Wikidata subset) and produces the top reasoning paths supporting the answer.
New foundation models: 1️⃣ NSQL, a Copilot-like LM for SQL queries by Numbers Station, is openly available in 350M / 2B / 6B versions, outperforming all existing open-source SQL models; 2️⃣ xTrimo, a 100B-parameter closed-source protein LM.
Graph ML News (July 15th) - NeurIPS workshops, M2Hub
🪛 NeurIPS’23 announced the list of accepted workshops! Graph learning is well-represented, you might want to have a look at:
- AI for Accelerated Materials Design (AI4Mat-2023)
- AI for Science: from Theory to Practice
- Machine Learning and the Physical Sciences
- Machine Learning in Structural Biology Workshop
- New Frontiers in Graph Learning (GLFrontiers)
- New Frontiers of AI for Drug Discovery and Development
- Symmetry and Geometry in Neural Representations
- Temporal Graph Learning Workshop
We will keep an eye on the submission deadlines, generally you might expect them to be somewhere around the NeurIPS accepted papers announcement.
💠 M2Hub is a fresh collection of datasets and models for materials discovery: 11 datasets spanning organic and inorganic molecules and crystals, 8 models including EGNN, Equiformer, DimeNet, and GemNet, and more experiments in the fresh preprint. Materials discovery is catching up with drug discovery!
📈 UniMol, a 3D framework for molecular representation learning, updated the results of UniMol+, showing strong performance on OGB PCQM4M v2 and Open Catalyst (OC20 IS2RE). Looks like this year’s OGB Large-Scale Challenge and Open Catalyst Challenge are going to see heated competition, eg, with the recently released EquiformerV2 in mind.
🔬Simone Scardapane (Sapienza) prepared a nice slide deck on Designing and Explaining GNNs - the second half of the deck is about current explainability methods, have a look if you work in this area.
Weekend reading:
M2Hub: Unlocking the Potential of Machine Learning for Materials Discovery
An OOD Multi-Task Perspective for Link Prediction with New Relation Types and Nodes
Call for papers for the ICCV Workshop on Scene graphs and graph representation learning
Guest post by Azade Farshad
We invite you to submit your graph-related work to the ICCV Workshop on Scene Graphs and Graph Representation Learning. The workshop will feature talks by Fei-Fei Li (Stanford), Luca Carlone (MIT), Bernard Ghanem (KAUST), Nicolas Padoy (Uni Strasbourg), Emanuele Rodolà (Sapienza Rome), Hanwang Zhang (NTU), and Helisa Dhamo (Huawei).
Deadlines and Important Dates:
- Paper submission deadline: July 25th, 2023 (11:59 PM Anywhere on Earth)
- Notification to Authors: August 8th, 2023
- Camera-ready Deadline: August 19th, 2023
Submission instructions are available on the SG2RL website:
https://sg2rl.github.io/
Workshop date: October 2, 2023
Graph ML News (July 22nd) - ICML’23, AI for Science survey
ICML time! Michael will be representing the Graph ML channel in the infamous, 3-of-a-kind, limited edition t-shirt, drop him a line if you’d like to chat. Big labs started to announce their presence and accepted papers (not just graph papers though), eg, Google DeepMind, Meta AI, Amazon, Microsoft, Apple.
If you didn’t make it to ICML this year, consider a fresh selection of the weekend reading:
📚 Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems by Xuan Zhang and 60+ famous authors is a massive 260-page survey on geometric models in scientific applications spanning molecules, proteins, quantum mechanics, PDEs, and materials discovery.
Contextualizing Protein Representations Using Deep Learning on Protein Networks and Single-Cell Data by Michelle M. Li et al. from Marinka Zitnik’s lab at Harvard. Quote: “We introduce PINNACLE, a flexible geometric deep learning approach that is trained on contextualized protein interaction networks to generate context-aware protein representations. Leveraging a human multi-organ single-cell transcriptomic atlas, PINNACLE provides 394,760 protein representations split across 156 cell type contexts from 24 tissues and organs.”
Graph ML News (July 30th) - ICML and Open Catalyst Demo
The ICML week has finally passed with yesterday’s workshops. Meeting the graph learning community was a blast and I am looking forward to seeing you guys and gals at NeurIPS, or already in Vienna at the next ICLR and ICML. The review post of the most interesting graph papers at ICML is on the way 😉
Meanwhile, Meta AI and CMU released the Open Catalyst Demo - a website where you can play around with relaxations (DFT approximations) of 86 adsorbates on 11.5k catalyst materials in 100 different configurations each (making up to 100M combinations). The demo is powered by the SOTA geometric models GemNet-OC and EquiformerV2. Hopefully the demo will grow into something as large and popular as the AlphaFold DB (but for materials)!
The GAIN community in Germany hosts the Workshop on Explainability and Applicability of Graph Neural Networks to be held in Kassel on September 6-8th. The workshop will feature invited talks by Christopher Morris, Soledad Villar, Petar Veličković, and Emanuele Rossi.
Graph Machine Learning @ ICML 2023
Just finished a new Medium post summarizing Graph ML papers seen at ICML 2023 with some additional photos from Hawaii to make the text less boring 😉 What you can find inside:
- Graph Transformers: Sparser, Faster, and Directed
- Theory: VC dimension of GNNs, deep dive in over-squashing
- New GNN architectures: delays and half-hops
- Generative Models - Stable Diffusion for Molecules, Discrete diffusion
- Geometric Learning: Geometric WL, Clifford Algebras
- Molecules: 2D-3D pretraining, Uncertainty Estimation in MD
- Materials & Proteins: CLIP for proteins, Ewald Message Passing, Equivariant Augmentations
- Cool Applications: Algorithmic reasoning, Inductive KG completion, GNNs for mass spectra
Graph ML News (Aug 12th) - ESM Disbandment, KDD’23, LoG’23
😮 The ESM team at Meta AI has been disbanded, to the great surprise of the community - the suite of ESM protein language models (ESM-1, ESM-2) and ESMFold became very popular in protein representation and generation, and things looked promising upon the release of the ESM Metagenomic Atlas with 600M+ protein structures. Some rumors say the team will continue working on the ESM stack at another place, so we’ll keep an eye on their next steps.
KDD’23 has just finished in Long Beach - perhaps it is the most graph-packed data mining conference featuring 3 workshops and 10 tutorials on Graph ML topics. The proceedings are already available and full of graph papers. I attended the Graph Learning Benchmarks workshop last Sunday to participate in the panel discussion, met old and new friends, and enjoyed a less crowded venue than ICML (still socially drained after Hawaii though).
The submission deadline for the best Graph ML conference, Learning on Graphs 2023 (LoG), is Aug 21st (AoE) and approaching — consider submitting if you didn’t like savage NeurIPS strong-reject reviews 👺. For me, the LoG reviewing (both as an author and a reviewer) and conference experience was the best in 2022, highly recommend!
Weekend reading:
AbDiffuser: Full-Atom Generation of In-Vitro Functioning Antibodies feat. Kyunghyun Cho and Andreas Loukas — a continuous (atom coordinates) and discrete (residue types) diffusion model for generating antibodies. “Laboratory experiments confirm that all 16 HER2 antibodies discovered were expressed at high levels and that 57.1% of selected designs were tight binders” 👀.
Augmenting Recurrent Graph Neural Networks with a Cache feat. Nesreen Ahmed — introduces CacheGNNs with memory, sets a new SOTA (with a significant margin) on the Peptides-struct graph regression problem of the Long-Range Graph Benchmark.
VQGraph: Graph Vector-Quantization for Bridging GNNs and MLPs feat. Jure Leskovec
Diffusion Denoised Smoothing for Certified and Adversarial Robust Out-Of-Distribution Detection feat. Stephan Günnemann
Graph ML News (Aug 18th)
A new blog post, Designing Deep Networks to Process Other Deep Networks by Haggai Maron, Ethan Fetaya, Aviv Navon, Aviv Shamsian, Idan Achituve, and Gal Chechik, applies the concepts of symmetry and invariance (common tools in Geometric DL) to the task of processing model weights. Working in the Deep Weight Space (the space of all parameters of a neural network), we want architectures that are invariant to permutations of neurons, because mathematically any such permutation still encodes the same function.
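That symmetry is easy to check numerically: permuting the hidden units of a two-layer MLP (the rows of the first weight matrix and the matching columns of the second) changes the weights but not the function they compute. A tiny self-contained PyTorch check (all names here are just for illustration):

```python
import torch

torch.manual_seed(0)
d_in, d_hidden, d_out = 4, 8, 3
W1, b1 = torch.randn(d_hidden, d_in), torch.randn(d_hidden)
W2, b2 = torch.randn(d_out, d_hidden), torch.randn(d_out)

def mlp(x, W1, b1, W2, b2):
    return torch.relu(x @ W1.T + b1) @ W2.T + b2

# Permute hidden neurons: rows of (W1, b1) and the matching columns of W2.
perm = torch.randperm(d_hidden)
x = torch.randn(5, d_in)
y_orig = mlp(x, W1, b1, W2, b2)
y_perm = mlp(x, W1[perm], b1[perm], W2[:, perm], b2)
print(torch.allclose(y_orig, y_perm, atol=1e-6))  # True: different weights, same function
```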
Two papers appeared almost simultaneously, PoseBusters by Buttenschoen et al. and PoseCheck by Harris et al., providing a critical look at modern generative models (often diffusion-based) for protein-ligand docking and structure-based drug design. PoseBusters finds that generative models often have problems with the physical plausibility of the generated outputs, while PoseCheck finds many nonphysical features in generated molecules and poses. Huge opportunities for improving equivariant diffusion models!
The Simons Institute for the Theory of Computing held a workshop on large language models and transformers. It was not focused on graph learning but still featured a handful of talks on core topics that will reach graph ML sooner or later, with speakers including Chris Manning, Yejin Choi, Ilya Sutskever, Sasha Rush, and other famous researchers — the playlist with recorded talks is already on YouTube 👀
Weekend reading:
Score-based Enhanced Sampling for Protein Molecular Dynamics feat. Jian Tang - a score-based model for approximating MD calculations.
Graph ML News (Aug 25th)
The autumn edition of the Molecular ML Conference (MoML) is going to take place on Nov 8th at MIT. MoML is a premier venue for bringing together the graph learning and life sciences crowds, including computational biology, drug discovery, computational chemistry, molecular simulation, and many more. Submit a poster by Oct 13th!
Not an official announcement, but there are rumors that the Stanford Graph Learning Seminar will return on Oct 11th as well 😉
Expect a flurry of ICLR submissions in the coming weeks before the deadline, but meanwhile the weekend reading is:
UGSL: A Unified Framework for Benchmarking Graph Structure Learning by Google Research feat. Bahare Fatemi, Anton Tsitsulin and Bryan Perozzi
Simulate Time-integrated Coarse-grained Molecular Dynamics with Multi-scale Graph Networks feat. Xiang Fu, Tommi Jaakkola
Approximately Equivariant Graph Networks by Teresa Huang, Ron Levie, and Soledad Villar
Will More Expressive Graph Neural Networks do Better on Generative Tasks? (spoiler alert: nope) feat. Pietro Liò
The Expressive Power of Graph Neural Networks: A Survey
Graph ML News (Sep 2nd) - TpuGraphs Kaggle competition, EvolutionaryScale
Google launched a proper graph learning Kaggle competition, ”Fast or Slow?”, with a $50k prize pool. The challenge is based on the recently released TpuGraphs dataset — given a computational graph (as a DAG) and a certain configuration (at the node or graph level), predict its runtime and find the fastest config. Practically, it can be framed as a regression or ranking problem (a toy ranking loss is sketched below). TpuGraphs is pretty large: 7k nodes / 31M configuration pairs for the layout collection, and 40 nodes / 13M pairs for the tile collection. Baselines include GCN and GraphSAGE, but we can probably expect Kaggle grandmasters to come up with creative gradient boosting and decision tree techniques as well 😉 So XGBoost or GNNs? The challenge is open until Nov 17th.
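If you go the ranking route, a simple per-graph pairwise hinge loss over predicted runtimes is one natural starting point; this is a generic sketch of the idea, not the competition's official metric or baseline code.

```python
import torch

def pairwise_ranking_loss(pred, runtime, margin=0.1):
    """Hinge loss over all config pairs of one graph: if config i is faster than
    config j (runtime[i] < runtime[j]), push pred[j] - pred[i] above the margin."""
    diff = pred.unsqueeze(0) - pred.unsqueeze(1)           # diff[i, j] = pred[j] - pred[i]
    faster = runtime.unsqueeze(1) < runtime.unsqueeze(0)   # faster[i, j] = runtime[i] < runtime[j]
    return torch.clamp(margin - diff, min=0)[faster].mean()

# Toy usage: 5 configurations of one computational graph.
pred = torch.randn(5, requires_grad=True)    # e.g. pooled GNN outputs, one scalar per config
runtime = torch.tensor([3.0, 1.0, 4.0, 2.0, 5.0])
loss = pairwise_ranking_loss(pred, runtime)
loss.backward()
```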
A few weeks ago we found out that Meta disbanded the protein team working on ESM, ESMFold, and a handful of other projects. Now we know that the ESM team has formed EvolutionaryScale and raised about $40M in funding, promising new versions of ESM every year. Great news for thousands of protein projects using ESM models!
Weekend reading:
TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs
Exploring "dark matter" protein folds using deep learning feat. Andreas Loukas, Michael Bronstein, and Bruno Correia
Graph ML News (Sep 9th)
The upcoming ICLR deadline and the LoG reviewing period seem to keep the community busy and reduce the amount of news content this week. We’ll compensate for that the day ICLR submissions are on OpenReview 😉
The local LoG meetup in Trento will take place on November 27th-30th (alongside the main conference, which is held online and fully remotely). There are already a handful of local meetups (if I remember correctly, other locations include the UK, Germany, Canada, and a few in the US). Actually, it might be a good time for the LoG organizers to publish the confirmed ones.
The GAIN workshop on explainability and applicability of GNNs took place this week (Sept 6-8th), waiting for the recordings!
Weekend reading:
RetroBridge: Modeling Retrosynthesis with Markov Bridges by Ilia Igashov, Arne Schneuing, Marwin Segler, Michael Bronstein, Bruno Correia — a new generative framework for template-free retrosynthesis with some math traces of discrete diffusion
Where Did the Gap Go? Reassessing the Long-Range Graph Benchmark by Jan Tönshoff, Martin Ritzert, Eran Rosenbluth, Martin Grohe — turns out some hyperparameter tinkering can boost baseline performance on LRGB!
Using Multiple Vector Channels Improves E(n)-Equivariant Graph Neural Networks by Levy, Kaba, et al - a simple and inexpensive multi-channel trick to boost EGNNs
A few theory papers:
Representing Edge Flows on Graphs via Sparse Cell Complexes by Josef Hoppe, Michael T. Schaub
Unifying over-smoothing and over-squashing in graph neural networks: A physics informed approach and beyond by Shao et al.
Graph ML News (Sep 16th) - Breakthrough Prize, OpenCatalyst cases, Illustrated Cats, EvoDiff
🏆 The Breakthrough Prize winners aka “Oscars of Science” were announced earlier this week (Ig Nobel Prizes were announced as well but that’s a story for another fun time) and they do have a nice connection to Geometric DL! The Math prize went to Simon Brendle (Columbia) for “transformative contributions to differential geometry, including sharp geometric inequalities, many results on Ricci flow and mean curvature flow and the Lawson conjecture on minimal tori in the 3-sphere.”
Ricci flows played a key role in understanding the theoretical capabilities of GNNs in the seminal paper by Topping et al., which received an ICLR 2022 Outstanding Paper Award and spun off more research on differential geometry and GNNs (a toy curvature computation in that spirit is sketched below). Perfect time to jump on the Ricci flowwagon (pun intended). Do check the other winners, their research is very cool as well.
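To make the curvature angle a bit more concrete, here is a toy sketch (using networkx) that computes a simplified, triangle-augmented Forman curvature per edge and flags the most negatively curved, bottleneck-like edge. This is only an illustration of the intuition behind curvature-based rewiring, not the balanced Forman curvature actually used by Topping et al.

```python
# Toy illustration: negatively curved edges tend to be bottlenecks (bridges between
# dense regions), which curvature-based rewiring targets. Simplified formula only.
import networkx as nx

G = nx.barbell_graph(5, 1)  # two 5-cliques joined through a single node: a classic bottleneck

def forman_curvature(G, u, v):
    # Simplified, triangle-augmented Forman curvature of edge (u, v).
    triangles = len(list(nx.common_neighbors(G, u, v)))
    return 4 - G.degree(u) - G.degree(v) + 3 * triangles

curvatures = {(u, v): forman_curvature(G, u, v) for u, v in G.edges()}
bottleneck = min(curvatures, key=curvatures.get)
print("most negatively curved edge:", bottleneck, "curvature:", curvatures[bottleneck])
```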
🧪 The OpenCatalyst team published two case studies on how the OCP demo helped in the scientific research of catalyst discovery: one on the nitrogen reduction reaction (NRR) and one on hydrogen fuel cells. OpenCatalyst is turning into something like AlphaFold, but for materials science and chemistry.
😼 Finally, check out the Category Theory Illustrated book by Boris Marinov - this is perhaps the most visual resource for understanding the basics of Category Theory. As of now, 6 chapters are ready: Sets, Categories, Monoids, Order, Logic, and Functors. Don’t forget about Cats4AI to learn more about Category Theory applied to ML and GNNs.
🧬 MSR AI4Science released EvoDiff - a massive work on discrete diffusion generative models for conditional generation of protein sequences. EvoDiff was designed for sequences and MSAs and ships in two sizes, 38M and 640M params, so it fits on a variety of GPUs.
Some weekend reading:
Protein generation with evolutionary diffusion: sequence is all you need - introducing EvoDiff
Graph Neural Networks Use Graphs When They Shouldn't by Bechler-Speicher et al. - one more piece of evidence in favor of graph rewiring
Is Solving Graph Neural Tangent Kernel Equivalent to Training Graph Neural Network? by Qin et al - for all you hardcore theory lovers on the channel
GraphML News (Sep 23rd) - Stanford Graph Learning Workshop, AlphaMissense, PEFT for ESM
NeurIPS decisions for both tracks are out - congrats to those who made it in and encouragement to those who did not; hopefully the next iteration goes better! Our team got 2 papers accepted, including A*Net - a scalable knowledge graph reasoning method that can be used, e.g., for improving the factual correctness of language models (demo is on GitHub). In the next weeks we can expect more accepted papers to become publicly available, so we’ll keep you updated. Don’t forget about the NeurIPS graph workshops, many of which extended their deadlines to early October!
The Stanford Graph Learning Workshop was officially announced and will take place in person on Oct 24th. This time the organizers published a call for contributed talks for the academic and industry tracks. I will try to be there, ping me if you want to chat.
Google DeepMind announced AlphaMissense, a model based on AlphaFold for categorizing “missense” genetic mutations. AlphaMissense predicted labels for ~60M possible missense mutations, whereas human annotations cover at most ~700K. Unfortunately, the authors say the model weights won’t be released, so let’s hope for re-implementations in open-source ecosystems.
If you have been living under a rock: parameter-efficient fine-tuning (PEFT) techniques took the world of LLMs by storm and are pretty much everywhere now. Amelie Schreiber wrote a great blogpost on applying LoRA to the ESM-2 family of protein LMs, so even the beefiest of ESMs (still pretty small compared to Llama, though) can now be fine-tuned on commodity GPUs (a minimal sketch of the setup is below). To learn more about PEFT, check out this fresh survey by Vladislav Lialin et al.
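For the curious, here is a minimal sketch of what LoRA on ESM-2 looks like with the Hugging Face transformers and peft libraries. The checkpoint, hyperparameters, and downstream task below are illustrative assumptions, not taken from the blogpost.

```python
# Minimal LoRA-on-ESM-2 sketch with Hugging Face transformers + peft.
# Checkpoint, hyperparameters, and the downstream task are illustrative only.
from transformers import AutoTokenizer, EsmForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

model_id = "facebook/esm2_t12_35M_UR50D"          # a small ESM-2 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = EsmForSequenceClassification.from_pretrained(model_id, num_labels=2)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,                   # e.g. a per-protein classification task
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["query", "value"],            # ESM attention projection layer names
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()                # only a tiny fraction of weights is trainable

# From here, train as usual (Trainer or a plain PyTorch loop) on protein sequences
# tokenized with tokenizer(sequence, return_tensors="pt").
```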
Some freshly accepted NeurIPS papers for the weekend reading:
Implicit Transfer Operator Learning: Multiple Time-Resolution Surrogates for Molecular Dynamics
SE(3) Equivariant Augmented Coupling Flows
When Do Graph Neural Networks Help with Node Classification: Investigating the Homophily Principle on Node Distinguishability
Fine-grained Expressivity of Graph Neural Networks
Next week ICLR’24 submissions become available, so oh boy we’ll have the weekend reading 👀
GraphML News (Oct 3rd)
Well, no big news from the past weekend since ICLR’24 submissions are still not available after the main deadline 🙁 At least we can read the abstracts of all accepted NeurIPS’23 papers here. A quick search indicates that the number of papers mentioning “diffusion” (192) is about as large as the number of “graph” papers (202).
Meanwhile, VantAI launches a monthly lecture series on Generative AI in Drug Discovery hosted by Michael Bronstein and Bruno Correia. The inaugural meeting will be held this Friday, October 6, at 11 am ET / 5 pm CET. Free to join using the links provided.
A few fresh software releases: PyDGN got updated to 1.5, and industry-grade GraphStorm released v0.2 featuring better support for distributed training on GPUs.
Paper reading:
Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems (NeurIPS’23) by Google on featurization strategies for ML in search, ads, and recsys.
Limits, approximation and size transferability for GNNs on sparse graphs via graphops (NeurIPS’23) by Thien Le and Stefanie Jegelka on size generalization in GNNs.
Sheaf Hypergraph Networks (NeurIPS’23) by Iulia Duta et al (math alert 🤯)
On the Power of the Weisfeiler-Leman Test for Graph Motif Parameters by Matthias Lanzinger and Pablo Barceló