Graph ML position at Trade Desk
An interesting position to apply for at The Trade Desk. As a researcher in the AI Lab working on graph ML, you will be part of the mission to upgrade the TTD ML tech stack to be graph-ML based. You will also have opportunities to do R&D on cutting-edge graph ML technologies and publish at top conferences, or to build innovative product PoCs that shape future product roadmaps. One day in the office per week; tech hubs in London, Madrid & Munich, or in the US!
Upcoming GraphML Venues: LoG and Stanford Graph Learning Workshop
September finally brings some fresh news and updates:
- The abstract deadline for the upcoming Learning on Graphs (LoG) conference is September 9th AoE, with two tracks: full papers and extended abstracts. LoG aims to be the premier venue for Graph ML research, so consider submitting your best work there.
- Stanford organizes the 2nd iteration of the Graph Learning Workshop on September 28th, covering the latest updates in PyG and cool industrial applications. In addition to Stanford speakers, there will be invited talks from NVIDIA, Intel, Meta, Google, Spotify, and Kumo.ai.
A nice relaxing event after the ICLR deadline 🙂 We will be keeping an eye on interesting ICLR submissions as well.
👃 GNNs Learn To Smell & Awesome NeurReps
1) Back in 2019, Google AI started a project on learning representations of smells. From basic chemistry we know that aromaticity depends on the molecular structure, e.g., cyclic compounds. In fact, the whole group of “aromatic hydrocarbons” was named aromatic because they actually have some smell (compared to many non-organic molecules). If we have a molecular structure, we can employ a GNN on top of it and learn some representations - that is the tl;dr of smell representation learning with GNNs (a minimal code sketch of this recipe is at the end of this post).
Recently, Google AI released a new blogpost describing the next phase of the project - the Principal Odor Map that is able to group molecules into “odor clusters”. The authors conducted 3 cool experiments: classifying 400 new molecules never smelled before and comparing the predictions to the averaged ratings of a group of human panelists; linking odor quality to fundamental biology; and probing aromatic molecules for their mosquito-repelling qualities. The GNN-based model shows very good results - now we can finally claim that GNNs can smell! Looking forward to GNNs transforming the perfume industry 📈
2) The NeurReps community (Symmetry and Geometry in Neural Representations) is curating the Awesome List of resources and research related to the geometry of representations in the brain, deep networks, and beyond. A great resource for Neuroscience and Geometric DL folks to learn about the adjacent field!
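Back to item 1): to make the "molecular graph + GNN" recipe concrete, here is a minimal sketch of a graph-level GNN in PyTorch Geometric that maps a molecule to odor logits. This is a toy illustration with random features and an arbitrary number of odor labels, not Google's actual model or data.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool

class OdorGNN(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, num_odor_labels):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.head = torch.nn.Linear(hidden_dim, num_odor_labels)

    def forward(self, x, edge_index, batch):
        x = F.relu(self.conv1(x, edge_index))   # message passing over bonds
        x = F.relu(self.conv2(x, edge_index))
        g = global_mean_pool(x, batch)          # one embedding per molecule
        return self.head(g)                     # multi-label odor logits

# Toy molecule: 3 atoms with 16-dim features and 2 bonds (both directions).
x = torch.randn(3, 16)
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])
batch = torch.zeros(3, dtype=torch.long)        # all atoms belong to molecule 0
logits = OdorGNN(16, 64, 10)(x, edge_index, batch)  # 10 odor descriptors, arbitrary
```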
Workshop: Hot Topics in Graph Neural Networks
Uni Kassel and Fraunhofer IEE organize a GNN workshop on October 25th; the announced line-up of speakers includes Fabian Jogl (TU Wien), Massimo Perini (University of Edinburgh), Hannes Stärk (MIT), Maximilian Thiessen (TU Wien), Rakshit Trivedi (Harvard), and Petar Veličković (DeepMind). Quoting the chairs:
“Find out about our current projects and follow exciting talks about new advances in Graph Neural Networks by international speakers. The work of the GAIN group addresses dynamic GNN models, the expressivity of GNN models, and their application in the power grid. Among others, the speakers will enlighten us with their work on Algorithmically-aligned GNNs, the Improvement of Message-passing, and Geometric Machine Learning for Molecules.
The public part of the event will take place on the 25th of October 2022 from 10am to 6pm. The workshop will be held in a hybrid format, but we are happy if you could come in person! To make the workshop more interactive for everyone who cannot participate in person, we have built a virtual 2D world which you can join to network with other participants!”
Upcoming NeurIPS’22 Workshops & Submission Deadlines
As NeurIPS’22 decisions are out, you might want to submit your work to some cool upcoming domain-specific graph workshops:
1. Temporal Graph Learning Workshop @ NeurIPS’22 organized by researchers from Mila and Oxford - deadline September 19th
2. New Frontiers in Graph Learning @ NeurIPS’22 organized by researchers from Stanford, Harvard, Yale, UCLA, Google Brain, and MIT - deadline September 22nd
3. Symmetry and Geometry in Neural Representations @ NeurIPS’22 organized by researchers from UC Berkeley, Institut Pasteur, ENS, UC Santa Barbara - deadline September 22nd
4. Workshop on Graph Learning for Industrial Applications @ NeurIPS’22 organized by JP Morgan, Capital One, Bank of America, Schonfeld, Mila, IBM, Pfizer, Oxford, and FINRA - deadline September 22nd
5. Critical Assessment of Molecular ML (NeurIPS’22 side-event) organized by ELLIS units in Cambridge and Linz - deadline October 18th
If you are at MICCAI in Singapore around that time, don’t forget to attend the 4th Workshop on Graphs in Biomedical Image Analysis (GRAIL) on September 18th, organized by NVIDIA, TU Munich, and Oxford. There will be talks by Marinka Zitnik, Islem Rekik, Mark O’Donoghue, and Xavier Bresson.
📚 Weekend Reading
This week brought quite a few interesting papers and resources - we encourage you to invest some time in them:
Geometric multimodal representation learning by Yasha Ektefaie, George Dasoulas, Ayush Noori, Maha Farhat, and Marinka Zitnik. A survey of 100+ papers on graphs combined with other modalities and a framework of multi-modal approaches for natural sciences like physical interaction, molecular reasoning, and protein modeling.
Clifford Neural Layers for PDE Modeling by Johannes Brandstetter, Rianne van den Berg, Max Welling, Jayesh K. Gupta. If you thought you knew all the basics from the Geometric Deep Learning Course - here is something more challenging. The authors introduce ideas from Geometric Algebra into ML tasks, namely Clifford Algebras, which unify numbers, vectors, complex numbers, and quaternions, and have additional primitives to incorporate plane and volume segments. The paper gives a great primer on the math and applications. You can also watch a very visual YouTube lecture on Geometric Algebras.
Categories for AI (Cats4AI) - an upcoming open course on Category Theory created by Andrew Dudzik, Bruno Gavranović, João Guilherme Araújo, Petar Veličković, and Pim de Haan. “This course is aimed towards machine learning researchers, but approachable to anyone with a basic understanding of linear algebra and differential calculus. The material is self-contained and all the necessary background will be introduced along the way.” Don’t forget your veggies 🥦
TorchProtein & PEER Protein Sequence Benchmark Release
MilaGraph released TorchProtein, a new version of TorchDrug powered with a suite of tools for protein sequence understanding. Quoting the authors:
“TorchProtein encapsulates many complicated yet repetitive subroutines into functional modules, including widely-used datasets, flexible data processing operations, advanced encoding models, and diverse protein tasks.
With TorchProtein, we can rapidly prototype machine learning solutions to various protein applications within 20 lines of codes, and conduct ablation studies by substituting different parts of the solution with off-the-shelf modules. Furthermore, we can easily adapt these modules to our own needs, and make systematic analyses by comparing the new results to a benchmark provided in the library.”
Simultaneously, the authors present PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding, a new benchmark of 17 protein understanding tasks grouped into 5 categories (Function Prediction, Localization Prediction, Structure Prediction, Protein-Protein Interaction Prediction, Protein-Ligand Interaction Prediction) already available in TorchProtein. ProtBert and ESM-1b have been probed on PEER (and ESM-2 is expected to arrive as well).
GraphML News: PyG + NVIDIA, Breakthrough Prize
🚀 PyG announced the release of pyg-lib, the result of a collaboration with NVIDIA on speeding up the most important PyG operations. It is a low-level GNN library that integrates cuGraph, cuDF, and CUTLASS to speed up matrix multiplications and graph sampling (a common bottleneck when working with large graphs). The reported speedups are pretty astounding - up to 150x when sampling on a GPU. There will be more exciting news about PyG at the upcoming Stanford Graph Learning Workshop!
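For context on why faster sampling matters: neighbor sampling is the standard way to train GNNs on graphs that don't fit into GPU memory. A minimal example with PyG's standard NeighborLoader (pyg-lib, when installed, is the backend meant to accelerate exactly this kind of sampling; the graph below is a random toy example):

```python
import torch
from torch_geometric.data import Data
from torch_geometric.loader import NeighborLoader

# Hypothetical graph: 10k nodes with 32-dim features and random edges.
num_nodes = 10_000
data = Data(
    x=torch.randn(num_nodes, 32),
    edge_index=torch.randint(0, num_nodes, (2, 100_000)),
)

# Sample 2-hop neighborhoods: at most 15 neighbors in hop 1, 10 in hop 2.
loader = NeighborLoader(
    data,
    num_neighbors=[15, 10],
    batch_size=1024,
    input_nodes=torch.arange(num_nodes),
)

for batch in loader:
    # `batch` is a sampled subgraph; the first `batch_size` nodes are the seeds.
    out = batch.x.mean()  # placeholder for a GNN forward pass
    break
```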
👏 The Breakthrough Prize (known as the “Oscars of Science”) announced the winners in life sciences, maths, and physics - graph and geometry areas are well represented there!
- John Jumper (DeepMind) and Demis Hassabis (DeepMind) received the Life Sciences prize for AlphaFold
- Daniel A. Spielman (Yale University) received the Math prize for contributions to spectral graph theory, the Kadison-Singer problem, optimization, and coding theory
- Ronen Eldan (Weizmann Institute of Science and Microsoft Research) received the New Horizons in Mathematics Prize for advancing high-dimensional geometry and probability including the KLS conjecture
- Vera Traub (Uni Bonn PhD 2020) received the Maryam Mirzakhani New Frontiers Prize for advances in approximation results in classical combinatorial optimization problems, including the traveling salesman problem and network design.
ICLR 2023 Submissions
The list of submissions to the top AI venue is available on OpenReview (with full-text PDFs). There are 6000+ submissions this year (a 3x growth from 2000+ last year); we will be keeping an eye on cool Graph ML submissions and will prepare an overview. Enjoy the weekend reading and checking whether someone has scooped a project you’ve been working on for the last months/years 😉
DGL: Billion-Scale Graphs and Sparse Matrix API
In the new release 0.9.1, DGL accelerated the pipeline for working with very large graphs (5B edges). Before, it took 10 hours and 4TB of RAM; now it takes 3 hours and 500GB of RAM, which also reduces the cost by 4x.
Also, if you use or would like to use a sparse API for your GNNs, you can provide feedback and use cases to the DGL team (feel free to reach out to @ivanovserg990 to connect); a sketch of what a sparse-matrix GNN layer looks like follows the list below. They are looking for the following profiles:
* Researchers/students who are familiar with sparse matrix notations or linear algebra.
* May have math or geometry backgrounds.
* Work mainly on innovating GNN architectures, less on domain applications.
* May have PyG/DGL experience.
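And the promised sketch of what "GNNs in sparse matrix notation" means in practice: a GCN-style layer written as a sparse matrix product in plain PyTorch. This is a generic illustration, not the DGL sparse API itself (which is exactly what the team is gathering feedback on).

```python
import torch

# Toy graph: 4 nodes, undirected edges (0-1, 1-2, 2-3) given as COO indices.
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],
                           [1, 0, 2, 1, 3, 2]])
num_nodes, feat_dim, hidden_dim = 4, 8, 16
x = torch.randn(num_nodes, feat_dim)

# Adjacency with self-loops as a sparse COO matrix A_hat.
self_loops = torch.arange(num_nodes).repeat(2, 1)
idx = torch.cat([edge_index, self_loops], dim=1)
values = torch.ones(idx.size(1))
A_hat = torch.sparse_coo_tensor(idx, values, (num_nodes, num_nodes)).coalesce()

# Symmetric normalization D^{-1/2} A_hat D^{-1/2} (kept dense here for brevity).
deg = torch.sparse.sum(A_hat, dim=1).to_dense()
d_inv_sqrt = deg.pow(-0.5)
A_norm = d_inv_sqrt.unsqueeze(1) * A_hat.to_dense() * d_inv_sqrt.unsqueeze(0)

# One GCN layer: sparse(A_norm) @ X @ W - the SpMM at the core of message passing.
W = torch.nn.Linear(feat_dim, hidden_dim, bias=False)
out = torch.sparse.mm(A_norm.to_sparse(), W(x)).relu()
```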
Hot New Graph ML Submissions from ICLR
🧬 Diffusion remains the top trend in AI/ML venues this year, including the graph domain. Ben Blaiszik compiled a Twitter thread of interesting papers in the AI 4 Science domain, including material discovery, catalyst discovery, and crystallography. Particularly cool works:
- Protein structure generation via folding diffusion by a collab between Stanford and MSR - Kevin E. Wu, Kevin K. Yang, Rianne van den Berg, James Y. Zou, Alex X. Lu, Ava P. Amini - why do you need AlphaFold and MSAs if you can just train a diffusion model to predict the whole structure? 😉
- Dynamic-Backbone Protein-Ligand Structure Prediction with Multiscale Generative Diffusion Models by NVIDIA and Caltech - Zhuoran Qiao, Weili Nie, Arash Vahdat, Thomas F. Miller III, Anima Anandkumar
- DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking by MIT - Gabriele Corso, Hannes Stärk, Bowen Jing, Regina Barzilay, Tommi Jaakkola - the next version of the famous EquiDock and EquiBind combined with the recent Torsional Diffusion.
- We’d include here a novel benchmark work Tartarus: A Benchmarking Platform for Realistic And Practical Inverse Molecular Design by Stanford and University of Toronto - AkshatKumar Nigam, Robert Pollice, Gary Tom, Kjell Jorner, Luca A. Thiede, Anshul Kundaje, Alan Aspuru-Guzik
📚 In a more general context, Yilun Xu shared a Google Sheet with ICLR submissions on diffusion papers and score-based generative modeling including trendy text-to-video models announced by FAIR and Google.
🤖 Derek Lim compiled a Twitter thread on 10+ ICLR submissions on Graph Transformers - the field looks a bit saturated at the moment, let’s see what reviewers say.
🪓 Michael Bronstein’s lab at Twitter announced two cool papers:
- Gradient Gating for Deep Multi-Rate Learning on Graphs by a collab between ETH Zurich, Oxford, and Berkeley - T. Konstantin Rusch, Benjamin P. Chamberlain, Michael W. Mahoney, Michael M. Bronstein, Siddhartha Mishra. A clever trick improving a standard residual connection to allow nodes to get updated at different speeds (a schematic sketch of the idea follows this list). A blast from the past - GraphSAGE from 2017 with gradient gating becomes the undisputed leader by a large margin on heterophilic graphs 👀
- Graph Neural Networks for Link Prediction with Subgraph Sketching by Benjamin Paul Chamberlain, Sergey Shirobokov, Emanuele Rossi, Fabrizio Frasca, Thomas Markovich, Nils Hammerla, Michael M. Bronstein, Max Hansmire. A neat usage of sketching to encode subgraphs in ELPH and its more scalable buddy BUDDY for solving link prediction in large graphs.
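A schematic sketch of the multi-rate residual idea behind gradient gating, under our own simplifying assumptions (this is not the paper's exact G² formulation): every node gets its own rate τ in (0, 1) that controls how much of the new message-passing update it accepts.

```python
import torch
from torch_geometric.nn import GCNConv

class GatedResidualLayer(torch.nn.Module):
    """Residual GNN layer where every node updates at its own learned speed."""
    def __init__(self, dim):
        super().__init__()
        self.conv = GCNConv(dim, dim)   # produces the candidate update
        self.gate = GCNConv(dim, dim)   # produces per-node / per-channel rates

    def forward(self, x, edge_index):
        update = torch.tanh(self.conv(x, edge_index))
        tau = torch.sigmoid(self.gate(x, edge_index))   # rates in (0, 1)
        return (1 - tau) * x + tau * update             # multi-rate residual step
```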
📏 Long Range Graph Benchmark
Vijay Dwivedi (NTU, Singapore) published a new blogpost on long-range graph benchmarks introducing 5 new challenging tasks in node classification, link prediction, graph classification, and graph regression.
“Many of the existing graph learning benchmarks consist of prediction tasks that primarily rely on local structural information rather than distant information propagation to compute a target label or metric. This can be observed in datasets such as ZINC, ogbg-molhiv and ogbg-molpcba where models that rely significantly on encoding local (or, near-local) structural information continue to be among leaderboard toppers.”
LRGB, a new collection of datasets, aims at evaluating the long-range capabilities of MPNNs and graph transformers. Particularly, the node classification tasks were derived from the image-based Pascal-VOC and COCO datasets; the link prediction task is derived from PCQM4M and asks about links between atoms that are distant in 2D space (5+ hops away) but close in 3D space, where only 2D features are given; and the graph-level tasks focus on predicting structures and functions of small proteins (peptides).
Message-passing nets (MPNNs) are known to suffer from bottleneck effects and oversquashing and, hence, to underperform in long-range tasks. The first LRGB experiments confirm this, showing that fully-connected graph transformers significantly outperform MPNNs. There is big room for improving MPNNs!
Paper, Code, Leaderboard
Graph Papers of the Week
Expander Graph Propagation by Andreea Deac, Marc Lackenby, Petar Veličković. A clever approach to bypass bottlenecks without fully-connected graph transformers. Turns out that sparse but well-connected 4-regular Cayley graphs (expander graphs) can be a helpful template for message propagation. Cayley graphs of a desired size can be pre-computed w/o looking at the original graph. Practically, you can add a GNN layer propagating along a Cayley graph after each normal GNN layer over the original graph.
The anonymous ICLR 2023 submission Exphormer: Scaling Graph Transformers with Expander Graphs applies the same idea of expander graphs as sparse attention in Graph Transformers, allowing them to scale to ogbn-arxiv (170K nodes).
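A hedged sketch of how the EGP recipe could look in PyG: interleave message passing over the original graph with message passing over a pre-computed expander template of the same size. The Cayley-graph construction itself is omitted; `expander_edge_index` is assumed to be given.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class ExpanderPropagationBlock(torch.nn.Module):
    """One 'normal' GNN layer followed by one layer over an expander template."""
    def __init__(self, dim):
        super().__init__()
        self.local_conv = GCNConv(dim, dim)
        self.expander_conv = GCNConv(dim, dim)

    def forward(self, x, edge_index, expander_edge_index):
        x = F.relu(self.local_conv(x, edge_index))               # original graph
        x = F.relu(self.expander_conv(x, expander_edge_index))   # expander graph
        return x
```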
Rethinking Knowledge Graph Evaluation Under the Open-World Assumption by Haotong Yang, Zhouchen Lin, Muhan Zhang. When evaluating KG link prediction tasks, there is no guarantee that the test set really contains all missing triples. The authors show that if there is an additional set of true triples (not labeled as true in the test), as small as 10% of the test set, MRR on the original test set only log-correlates with the MRR on the true test set. It means that if your model shows 40% MRR on the test set and you think the test set is incomplete, chances are the true MRR is much higher; you should inspect the top predictions as possibly new, unlabeled true triples.
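A tiny worked example of the effect the paper studies: if the candidates ranked above the labeled answer are actually true but simply unlabeled, the measured MRR underestimates the model. The helper below is our own generic illustration, not the paper's code.

```python
def mrr(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

# Suppose for one test query the labeled answer sits at rank 3.
print(mrr([3]))   # 0.333... - what we would report

# If the two candidates ranked above it are true but unlabeled,
# filtering them out moves the labeled answer to rank 1.
print(mrr([1]))   # 1.0 - the "true" quality is much higher
```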
Pre-training via Denoising for Molecular Property Prediction by Sheheryar Zaidi, Michael Schaarschmidt, and the DeepMind team. The paper takes the NoisyNodes SSL objective to the next level (aka NoisyNodes on steroids). NoisyNodes takes a molecular graph with 3D coordinates, adds Gaussian noise to those 3D features, and asks the model to predict this noise as a loss term. NoisyNodes, as an auxiliary objective, was used in many OGB Large-Scale Challenge winning approaches, but now the authors study NoisyNodes as the sole pre-training SSL objective. Theory-wise, the authors find a link between denoising and score matching (commonly used in generative diffusion models) and find that denoising helps to learn force fields. An MPNN pre-trained on PCQM4Mv2 with this objective transfers well to the QM9 and OC20 datasets and often outperforms fancier models like DimeNet++ and E(n)-GNN.
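The denoising objective itself fits in a few lines. A hedged sketch in generic PyTorch, where `model` is an assumed GNN that maps atom features and noisy coordinates to a per-atom 3D output:

```python
import torch

def denoising_loss(model, atom_feats, coords, sigma=0.1):
    """NoisyNodes-style pre-training step: corrupt 3D coordinates, predict the noise."""
    noise = sigma * torch.randn_like(coords)        # Gaussian noise on 3D positions
    pred_noise = model(atom_feats, coords + noise)  # model sees the noisy conformer
    return ((pred_noise - noise) ** 2).mean()       # MSE against the injected noise
```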
Blog Posts of the Week
A few fresh blog posts to add to your weekend reading list.
Graph Neural Networks as gradient flows by Michael Bronstein, Francesco Di Giovanni, James Rowbottom, Ben Chamberlain, and Thomas Markovich. The blog summarizes recent efforts in understanding GNNs from the physics perspective. Particularly, the post describes how GNNs can be seen as gradient flows, which helps on heterophilic graphs. Essentially, the approach implies having one symmetric weight matrix W shared among all GNN layers; residual connections and non-linearities can be dropped. Under this sauce, classic GCNs by Kipf & Welling strike back!
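A minimal sketch of our reading of that parameterization: Euler steps with a single symmetrized weight matrix shared across all propagation steps. This is a simplified illustration under our own assumptions, not the authors' exact model; `A_norm` is an assumed normalized (dense) adjacency matrix.

```python
import torch

class GradientFlowGNN(torch.nn.Module):
    """Simplified gradient-flow view: Euler steps with one shared, symmetrized W."""
    def __init__(self, dim, num_steps=4, step_size=0.5):
        super().__init__()
        self.W = torch.nn.Parameter(torch.empty(dim, dim))
        torch.nn.init.xavier_uniform_(self.W)
        self.num_steps, self.step_size = num_steps, step_size

    def forward(self, x, A_norm):
        W_sym = 0.5 * (self.W + self.W.t())    # symmetric weights -> a gradient flow
        for _ in range(self.num_steps):
            x = x + self.step_size * A_norm @ x @ W_sym   # same weights every step
        return x
```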
Graph-based nearest neighbor search by Liudmila Prokhorenkova and Dmitry Baranchuk. The post gives a nice intro to the graph-based technology (e.g., HNSW) behind many vector search engines and reviews recent efforts in improving scalability and recall. Particularly, the authors show that non-Euclidean hyperbolic space might have a few cool benefits unattainable by classic Euclidean-only algorithms.
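If you want to play with graph-based ANN search yourself, the hnswlib package exposes HNSW in a few lines (standard hnswlib usage, shown here on random vectors):

```python
import numpy as np
import hnswlib

dim, n = 64, 10_000
vectors = np.random.rand(n, dim).astype(np.float32)

index = hnswlib.Index(space="l2", dim=dim)            # HNSW proximity graph
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(vectors, np.arange(n))
index.set_ef(50)                                      # recall/speed trade-off at query time

labels, distances = index.knn_query(vectors[:5], k=10)  # approximate 10-NN for 5 queries
```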
Long Range Graph Benchmark by Vijay Dwivedi. Covered in one of the previous posts in this channel, the post introduces a new suite of tasks designed for capturing long-range interactions in graphs.
Foundation Models are Entering their Data-Centric Era by Chris Ré and Simran Arora. The article is very relevant to any large-scale model pre-training in any domain, be it NLP, Vision, or Graph ML. The authors observe that in the era of foundation models we have to rethink how we train such big models, and data diversity becomes the single most important factor of inference capabilities of those models. Two lessons learned by the authors: “Once a technology stabilizes the pendulum for value swings back to the data” and “We can (and need to) handle noise”.
GraphML News
Today (Oct 21st) MIT hosts the first Molecular ML Conference (MoML 2022).
OpenBioML, backed by StabilityAI (creators of Stable Diffusion), launches an open-source initiative to improve protein structure prediction. The base implementation will be OpenFold - powered by the cluster behind Stable Diffusion, we could expect a full reproduction of AlphaFold experiments, ablations, and, of course, better interfaces thanks to the open-source community!
OpenMM, one of the most popular Python frameworks for molecular modeling, released a new version 8.0.
The workshop on Geometric Deep Learning in Medical Image Analysis (GeoMediA), to be held on Nov 18th in Amsterdam, published the list of accepted papers and a program including keynotes by Emma Robinson and Michael Bronstein.
Wednesday Papers
Something you might be interested in while waiting for the LoG reviews (unless you are writing emergency reviews, hehe):
- Expander Graphs Are Globally Synchronising by Pedro Abdalla, Afonso S. Bandeira, Martin Kassabov, Victor Souza, Steven H. Strogatz, Alex Townsend. In previous posts, we covered interesting properties of expander graphs (Cayley graphs). This new work on the theory side employs expander graphs to demonstrate that random Erdős–Rényi graphs G(n,p) are globally synchronising if p ≥ (1 + ε)(log n)/n 👏
- On Classification Thresholds for Graph Attention with Edge Features by Kimon Fountoulakis, Dake He, Silvio Lattanzi, Bryan Perozzi, Anton Tsitsulin, Shenghao Yang
- Simplifying Node Classification on Heterophilous Graphs with Compatible Label Propagation by Zhiqiang Zhong, Sergei Ivanov, Jun Pang
- Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Protein Structures and Interaction Networks by Arian R. Jamasb, Ramon Viñas, Eric J. Ma, Charlie Harris, Kexin Huang, Dominic Hall, Pietro Lió, Tom L. Blundell
- Annotation of spatially resolved single-cell data with STELLAR by Maria Brbić, Kaidi Cao, John W. Hickey, Yuqi Tan, Michael P. Snyder, Garry P. Nolan & Jure Leskovec
Webinar on Fraud Detection with GNNs
Graph neural networks (GNNs) are increasingly being used to identify suspicious behavior. GNNs can combine graph-structured data - such as email accounts, addresses, phone numbers, and purchasing behavior - to find meaningful patterns and enhance fraud detection.
Join the webinar on Thursday, Oct 27th, 6pm CET, by Nikita Iserson, Senior ML/AI Architect at TigerGraph, to learn how graphs are used to uncover fraud.
Agenda:
• Introduction to TigerGraph
• Fraud Detection Challenges
• Graph Model, Data Exploration, and Investigation
• Visual Rules, Red Flags, and Feature Generation
• TigerGraph Machine Learning Workbench
• XGBoost with Graph Features
• Graph Neural Network and Explainability
GraphML News
It’s Friday - time to look back at what happened in the field this week.
📚 Blogs & Books
(Editors’ Choice 👍) An Introduction to Poisson Flow Generative Models by Ryan O’Connor. Diffusion models are the hottest topic in Geometric Deep Learning but have an important drawback - sampling is slow 🐌 due to the necessity of performing 100-1000 forward passes. Poisson Flow generative models take inspiration from physics and offer another look at the generation process that allows much, much faster sampling. This blog gives a very detailed and pictorial explanation of Poisson Flows.
Awesome GFlowNets by Narsil-Dinghuai Zhang. Generative Flow Networks (GFlowNets) bring together generative modeling with ideas from reinforcement learning and show especially promising results in drug discovery. This Awesome repo will get you acquainted with the main ideas, the most important papers, and some implementations.
Sheaf Theory through Examples - a book by Daniel Rosiak on sheaf theory. If you felt you wanted to know more after reading the Sheaf Diffusion paper - this would be your next step.
🗞️ News & Press
Elon Musk finally acquired Twitter so it’s time to move to Telegram
Mila and the Helmholtz Institute announced a new German-Canadian partnership on developing causal models of the cell. As Geometric DL is at the heart of modern structural biology, we’ll keep an eye on the future outcomes.
🛠️ Code & Data
We somehow missed this, but we are catching up now - the DGL team at Amazon published the materials of the KDD’2022 tutorial on GNNs in Life Sciences.
Geometric Kernels - a fresh new framework for kernels and Gaussian processes on non-Euclidean spaces (including graphs, meshes, and Riemannian manifolds). Supports PyTorch, TensorFlow, JAX, and NumPy.
A post with the hot new papers for your weekend reading will be arriving shortly!
Halloween Paper Reading 🎃
We hope you managed to procure enough candies and carve spooky faces into a bunch of pumpkins these days, so now you can relax and read a few papers (not that spooky).
Molecular dynamics is one of the booming Geometric DL areas where equivariant models show the best qualities. The two cool recent papers on that topic:
⚛️ Forces are not Enough: Benchmark and Critical Evaluation for Machine Learning Force Fields with Molecular Simulations by Fu et al. introduces a new benchmark for molecular dynamics - in addition to MD17, the authors add datasets on modeling liquids (Water), peptides (Alanine dipeptide), and solid-state materials (LiPS). More importantly, apart from Energy as the main metric, the authors consider a wide range of physical properties like Stability, Diffusivity, and Radial Distribution Functions. Most SOTA molecular dynamics models were probed, including SchNet, ForceNet, DimeNet, GemNet (-T and -dT), and NequIP.
Density Functional Theory (DFT) calculations are one of the main workhorses of molecular dynamics (and account for a great deal of computing time in big clusters). DFT is O(n^3) in the input size though, so can ML help here? Learned Force Fields Are Ready For Ground State Catalyst Discovery by Schaarschmidt et al. presents an experimental study of learned potentials - turns out GNNs can do a very good job in O(n) time! Easy Potentials (trained on Open Catalyst data) turns out to be quite a good predictor, especially when paired with a subsequent postprocessing step. Model-wise, it is an MPNN with the NoisyNodes self-supervised objective that we covered a few weeks ago.
🪐 For astrophysics aficionados: Mangrove: Learning Galaxy Properties from Merger Trees by Jespersen et al. apply GraphSAGE to merger trees of dark matter to predict a variety of galactic properties like stellar mass, cold gas mass, star formation rate, and even black hole mass. The paper is heavy on the terminology of astrophysics but pretty easy in terms of GNN parameterization and training. Mangrove works 4-9 orders of magnitude faster than standard models (that is, 10 000 - 1 000 000 000 times faster). Experimental charts are pieces of art that you can hang on a wall.
🤖 Compositional Semantic Parsing with Large Language Models by Drozdov, Schärli et al. pretty much solves the compositional semantic parsing task (natural language query → structured query like SPARQL) using only the code-davinci-002 language model from OpenAI (which is InstructGPT fine-tuned on code). No need for hefty tailored semantic parsing models - turns out a smart extension of chain-of-thought prompting (aka "let's think step by step") devised as Least-to-Most prompting (where we first answer easy subproblems before generating a full query) yields a whopping 95% accuracy even on the hardest Compositional Freebase Questions (CFQ) dataset. CFQ was introduced at ICLR 2020, and just two years later LMs cracked this task - looks like it's time for a new, even more complex dataset.
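To illustrate the prompting idea, here is a schematic sketch with a made-up example question; the actual prompts in the paper are longer, and the API call (commented out) uses the legacy pre-1.0 OpenAI client.

```python
# Schematic least-to-most prompt: first decompose, then solve sub-problems in order.
decompose_prompt = (
    "Q: Who directed the film that won Best Picture in 1998?\n"
    "Decomposition:\n"
    "1. Which film won Best Picture in 1998?\n"
    "2. Who directed that film?\n\n"
    "Q: {question}\n"
    "Decomposition:\n"
)

solve_prompt = (
    "Answer each sub-question in order, reusing earlier answers, "
    "then write the final SPARQL query.\n\n"
    "Sub-questions:\n{subquestions}\n\nSPARQL:"
)

# With the legacy (pre-1.0) openai client this would be roughly:
# import openai
# steps = openai.Completion.create(model="code-davinci-002",
#                                  prompt=decompose_prompt.format(question=q),
#                                  temperature=0, max_tokens=256)
```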
ESM Metagenomic Atlas
Meta AI just published the ESM Metagenomic Atlas - a collection of >600M metagenomic protein structures built with ESMFold, the most recent protein folding model from Meta. We covered ESMFold a few months ago, and both ESM-2 and ESMFold are available in the recent 🤗 Transformers 4.24 release (checkpoints for 8M-3B ESM-2 models and a full checkpoint for ESMFold). That’s a nice flex from Meta AI after DeepMind released 200M AlphaFold predictions in the AlphaFold DB - the community definitely benefits from the competition.
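A minimal sketch of pulling ESM-2 from the new Transformers release and extracting per-residue embeddings, using the public facebook/esm2_t6_8M_UR50D checkpoint (the sequence below is an arbitrary example):

```python
import torch
from transformers import AutoTokenizer, EsmModel

name = "facebook/esm2_t6_8M_UR50D"          # smallest public ESM-2 checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = EsmModel.from_pretrained(name)

sequence = "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVAT"  # arbitrary example
inputs = tokenizer(sequence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

per_residue = outputs.last_hidden_state      # (1, seq_len + special tokens, hidden_dim)
protein_embedding = per_residue.mean(dim=1)  # simple mean-pooled sequence embedding
```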
Weekend Reading
For those who are not busy with ICLR rebuttals — you can now have a look at all accepted NeurIPS’22 papers on OpenReview (we will have a review of graph papers at NeurIPS a bit later). Meanwhile, the week brought several cool new works:
Are Defenses for Graph Neural Networks Robust? by Felix Mujkanovic, Simon Geisler, Stephan Günnemann, Aleksandar Bojchevski. Probably THE most comprehensive work of 2022 on adversarial robustness of GNNs.
TuneUp: A Training Strategy for Improving Generalization of Graph Neural Networks by Weihua Hu, Kaidi Cao, Kexin Huang, Edward W Huang, Karthik Subbian, Jure Leskovec. The paper introduces a new self-supervised strategy by asking the model to generalize better on tail nodes of the graph after some synthetic edge dropout. Works in node classification, link prediction, and recsys.
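The synthetic edge dropout part of this recipe is easy to emulate with standard PyG utilities. A hedged sketch that shows only the sparsification step, not TuneUp's full training curriculum:

```python
import torch
from torch_geometric.utils import dropout_adj

# Hypothetical graph with 100 nodes and 400 random (directed) edges.
edge_index = torch.randint(0, 100, (2, 400))

# Randomly drop 30% of the edges to simulate sparser, tail-like neighborhoods.
sparse_edge_index, _ = dropout_adj(edge_index, p=0.3, training=True)
# A GNN trained on `sparse_edge_index` is pushed to predict well with fewer neighbors.
```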
Neural Set Function Extensions: Learning with Discrete Functions in High Dimensions by Nikolaos Karalias, Joshua David Robinson, Andreas Loukas, Stefanie Jegelka. Insightful theoretical work on set functions and discrete learning. Particularly good results on combinatorial optimization problems like max clique and max independent set.