Graph Machine Learning
Everything about graph theory, computer science, machine learning, etc.


If you have something worth sharing with the community, reach out @gimmeblues, @chaitjo.

Admins: Sergey Ivanov; Michael Galkin; Chaitanya K. Joshi
Postdoctoral Researcher Position in Geometric Deep Learning & AI for Science at AITHYRA

A joint position between AITHYRA and the Technical University of Vienna

Michael Bronstein, AITHYRA Scientific Director of AI and Honorary Professor at the Technical University of Vienna, in collaboration with Ismail Ilkan Ceylan, an expert in graph machine learning, invites outstanding candidates to apply for a postdoctoral research position in Geometric Deep Learning, with a strong emphasis on applications to biology and scientific discovery. This unique research collaboration between AITHYRA and the Technical University of Vienna offers an exceptional opportunity to engage in both foundational machine learning research and high-impact interdisciplinary applications in the natural sciences. The position offers access to top-tier academic and industry research ecosystems and is ideally suited for researchers seeking to push the boundaries of geometric and graph-based learning in real-world scientific contexts. The research program is flexible and interdisciplinary.

The application deadline is 31.8.2025. Link

---

Highly recommend applying - working with Michael and Ismail is a great experience 🙂
GraphML News (Aug 9th) - AITHYRA Call for PhD students, Chai Discovery Round, Graph Learning Meets Theoretical CS

While everyone is busy with GPT-5, Opus 4.1, and GPT-OSS, let’s sneak in some graph news!

🎓 A few days ago you could’ve seen an AITHYRA call for postdocs, but fear not if you are still deciding about starting your scientific career: AITHYRA has a call for PhD students too! The plan includes 15-20 fully funded scholarships at the intersection of AI/ML, Molecular Technologies, and Systems Medicine (with the degree awarded by either the Medical University or TU of Vienna). The application deadline is September 10, 2025. Glad to see Vienna becoming a new scientific hub in Europe.

💸 Chai Discovery raised a $70M Series A from Menlo Ventures and Anthology Fund (Anthropic), Thrive Capital, OpenAI, and others, following a $30M seed round. The startup is known for its Chai-2 generative model and focuses on antibody design. Congrats to Chai!

📚 The Simons Foundation organizes the Graph Learning Meets Theoretical CS workshop (to be held in person at UC Berkeley), inviting renowned professors from both areas (and me, a simple man from industry). The program is packed with a bunch of cool topics, ranging from practical things like graph foundation models to graphons, invariances, combinatorial optimization, and many more. The talks will be streamed on YouTube, and participation is actually free, so come by if you’re at the UC Berkeley campus.
GraphML News (Aug 30th) - OpenAI enters bio, AtomWorks, OrbMol, NeurIPS workshops

📈 The church of scale enters comp bio: OpenAI published first results on protein design of Yamanaka factors (linked to cell aging) together with Retro Bio (where sama happens to be one of the investors). The backbone is gpt-4b micro, initialized from an existing 4o checkpoint, enriched with “tokenized 3D structure data” (remember ESM-3?), and fine-tuned on a specialized dataset. Experimental results are claimed to be quite solid: hit rates of 30-50% (typically it’s less than 10%), along with a bunch of other biochemistry markers. The argument between scalable non-equivariant models and bespoke geometric models got a new data point: will the raw compute of OpenAI plus vanilla transformers conquer the biotech world too? We’ll keep you posted.

🧬 BakerLab released RosettaFold 3 and AtomWorks, the data processing framework used to train it. While you’ll certainly see the usual remarks about comparisons with AF3 and Boltz, I’d highlight that comp bio folks are starting to recognize the value of data as much as the model itself (something frontier labs recognized quite some time ago). The real engineering will start when they need to serve those protein design models to a few billion clients 😉

⚛️ Orbital Materials released OrbMol, a version of Orb-v3 for molecules (the other variants are for crystals), trained on OpenMolecules 2025. Orb is still an MPNN, which makes it quite fast and useful for MD computations.

By the way, also check out the NeurIPS 2025 workshops — the lineup is finally more diverse than just LLMs and reasoning and features a handful of graph learning venues.

Weekend reading:

Turning Tabular Foundation Models into Graph Foundation Models from Yandex Research - another interesting approach to GFMs via TabPFNv2 over original node features + mined structural features
GraphML News (September 2025) - Stanford Graph Learning WS, MoML, RF Diffusion 3

While the community is processing NeurIPS rejects due to “limited physical space” and rushing to the ICLR deadline, it’s about time to plan attending some future events!

🌲 Stanford organizes its annual Graph Learning Workshop on Oct 14th. The main topics are Relational Foundation Models (get ready to hear a lot about them, hehe), Agents (Biomni is quite successful), and fast LLM inference. I have attended the event for the last three years, and it was quite fun.

🧬 About a week later (Oct 22nd) and on the East Coast, MIT organizes the Molecular ML (MoML) conference, going full Geometric DL mode — expect news about Boltz and new drug discovery methods; most of big pharma is among the sponsors.

🧬🧬 The Baker Lab released a pre-print of RFDiffusion 3 (its data pipeline, AtomWorks, was pre-printed a bit earlier). Compared to AF3, it has far fewer Pairformer layers (only 2 vs 48) and drops all the triangular attention complexity; most of the parameters and compute went into the diffusion module (and good data pipelines, hehe). RFD3 is substantially faster than previous versions on longer residue structures, and much more accurate than RFaa. Code is not yet available.

🎅 FAIR Chemistry opened an Open Molecules 2025 leaderboard and, to our utter amusement, the 4-year-old GemNet-OC tops the benchmark in several tasks. The grand-dad of ML potentials still rocks if you give it better data and more compute. That’s a good lesson in designing models that can stand the test of time and new data.

Finally, for some weekend reading, check out Random graphs as perfect expanders on Quanta Magazine. Obtaining good expanders is a non-trivial task (one that will very quickly get you into group theory), but it turns out you should never underestimate good ole ER graphs, which make for sufficiently decent expanders.
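As a quick refresher (standard background, not taken from the Quanta article itself), expansion is usually measured through the adjacency spectrum, and random regular graphs sit remarkably close to the theoretical optimum. For a d-regular graph G on n vertices with adjacency eigenvalues d = \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n, define
\[
  \lambda(G) \;=\; \max\bigl(|\lambda_2|,\, |\lambda_n|\bigr),
\]
where smaller \lambda(G) means better expansion. The Alon–Boppana bound says no d-regular graph can beat
\[
  \lambda(G) \;\ge\; 2\sqrt{d-1} - o(1),
\]
while Friedman's theorem shows that a uniformly random d-regular graph satisfies \lambda(G) \le 2\sqrt{d-1} + \varepsilon with high probability, i.e. random regular graphs are essentially optimal (near-Ramanujan) expanders.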
How can we create general-purpose graph foundation models?
(by Dmitry Eremeev)

For a long time, we believed that general-purpose graph foundation models were impossible to create. Indeed, graphs are used to represent data across many different domains, and thus graph machine learning must handle tasks on extremely diverse datasets, such as social, information, transportation, and co-purchasing networks, or models of various physical, biological, or engineering systems. Given the vast differences in structure, features, and labels among these datasets, it seemed unlikely that a single model could achieve robust cross-domain generalization and perform well on all of them.

However, we noticed that tabular machine learning faces a similar challenge of working with diverse datasets containing different features and labels. And yet, this field has recently witnessed the emergence of the first successful foundation models, such as TabPFNv2, which are based on the prior-data fitted networks (PFNs) paradigm. Thus, we decided to try to bring their success to the graph domain.

Our first attempt, G2T-FM, was relatively straightforward. We manually injected graph information into node features by computing structural and positional encodings, along with neighborhood-aggregated features. We then applied tabular foundation models (TabPFNv2 and LimiX) to these enriched features. Even this simple approach delivered impressive results: G2T-FM not only strongly outperforms previous graph foundation models on the GraphLand benchmark and classic datasets, but also often outperforms architecturally improved and carefully tuned GNNs trained from scratch.
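A minimal sketch of that recipe, assuming an sklearn-style TabPFNClassifier from the tabpfn package; the degree, PageRank, and neighbor-mean features below are only illustrative stand-ins for the encodings used in G2T-FM:

import numpy as np
import networkx as nx
from tabpfn import TabPFNClassifier  # assumed sklearn-like fit/predict interface

def enrich_node_features(G, X):
    """Concatenate original features with simple structural and aggregated ones."""
    nodes = list(G.nodes())
    degree = np.array([G.degree(v) for v in nodes], dtype=float)[:, None]
    pr = nx.pagerank(G)
    pagerank = np.array([pr[v] for v in nodes])[:, None]
    # Mean of the neighbors' original features (zeros for isolated nodes).
    agg = np.zeros_like(X)
    for i, v in enumerate(nodes):
        nbrs = list(G.neighbors(v))
        if nbrs:
            agg[i] = X[nbrs].mean(axis=0)
    return np.hstack([X, degree, pagerank, agg])

# Toy usage: node classification on a random graph with random features/labels.
G = nx.erdos_renyi_graph(200, 0.05, seed=0)
X = np.random.RandomState(0).randn(200, 8)
y = np.random.RandomState(1).randint(0, 2, size=200)

X_graph = enrich_node_features(G, X)
train, test = np.arange(150), np.arange(150, 200)
clf = TabPFNClassifier()
clf.fit(X_graph[train], y[train])
print(clf.predict(X_graph[test]))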

Building on this, our next step was to create GraphPFN – the first graph foundation model in the PFN framework. Moving beyond manual feature engineering of the previous approach, we first integrated message passing modules into the LimiX model so that it could learn graph-based dependencies directly, and then continually pretrained it on 4,000,000 synthetic graph datasets sampled from our specially designed attributed graph prior. The obtained model can perform node property prediction on graph datasets in a single forward pass via in-context learning and produces strong results, substantially outperforming both G2T-FM and classic GNNs on several datasets.
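This is not the actual GraphPFN/LimiX code, but here is a hedged illustration of the kind of message passing block one could interleave with a tabular in-context model's node embeddings (generic PyTorch, residual mean aggregation; all names are made up for the sketch):

import torch
import torch.nn as nn

class MessagePassingBlock(nn.Module):
    """Residual block: aggregate neighbor embeddings and mix them into each node."""
    def __init__(self, dim):
        super().__init__()
        self.update = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, h, adj):
        # h: [num_nodes, dim] node embeddings, adj: [num_nodes, num_nodes] 0/1 adjacency.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        messages = adj @ h / deg  # mean over neighbors
        return self.norm(h + self.update(torch.cat([h, messages], dim=-1)))

# Usage: refine the backbone's node embeddings before its next attention layer.
h = torch.randn(200, 64)
adj = (torch.rand(200, 200) < 0.05).float()
adj = ((adj + adj.T) > 0).float()  # symmetrize
h = MessagePassingBlock(64)(h, adj)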

There remains much work to be done, including scaling to larger graphs, improving model architectures and designing better graph priors for synthetic dataset generation. However, we are now convinced that building general-purpose graph foundation models is indeed possible, and a prior-data fitted network approach is a promising path towards this goal.

For more details, check out our papers:
Turning Tabular Foundation Models into Graph Foundation Models
GraphPFN: A Prior-Data Fitted Graph Foundation Model
Tired of evaluating your graph ML models on Cora, CiteSeer, and PubMed? We have a better benchmark for you!
(by Oleg Platonov)

Paper: link (NeurIPS 2025 D&B track)
Datasets: Zenodo and PyG (in PyG, all the necessary feature preprocessing can be done automatically)
Code: GitHub

Recently, there has been a lot of criticism of existing popular graph ML benchmark datasets, concerning aspects such as lack of practical relevance, low structural diversity (leaving most of the possible graph structure space unrepresented), low application-domain diversity, graph structure that is not beneficial for the considered tasks, and potential bugs in the data collection processes. Some of these criticisms have previously appeared on this channel.

To provide the community with better benchmarks, we present GraphLand: a collection of 14 graph datasets for node property prediction coming from diverse real-world industrial applications of graph ML. What makes this benchmark stand out?

Diverse application domains: social networks, web graphs, road networks, and more. Importantly, half of the datasets feature node-level regression tasks that are currently underrepresented in graph ML benchmarks, but are often encountered in real-world applications.

Range of sizes: from thousands to millions of nodes, providing opportunities for researchers with different computational resources.

Rich node attributes that contain numerical and categorical features — these are more typical for industrial applications than textual descriptions that are standard for current benchmarks.

Different learning scenarios. For all datasets, we provide two random data splits with low and high label rates. Further, many of our networks evolve over time, and for them we additionally provide more challenging temporal data splits and the opportunity to evaluate models in the inductive setting, where only an early snapshot of the evolving network is available at training time.

We evaluated a range of models on our datasets and found that, while GNNs achieve strong performance on industrial datasets, they can sometimes be rivaled by gradient boosted decision trees, which are popular in industry, when provided with additional graph-based input features.
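For illustration only (not the paper's exact pipeline): the same feature-enrichment idea as in the G2T-FM sketch above, but with scikit-learn's HistGradientBoostingRegressor standing in for the GBDT baseline, here on a toy node regression task:

import numpy as np
import networkx as nx
from sklearn.ensemble import HistGradientBoostingRegressor

def add_graph_features(G, X):
    """Append degree and neighbor-mean features to the node feature table."""
    deg = np.array([G.degree(v) for v in G.nodes()], dtype=float)[:, None]
    nbr_mean = np.vstack([
        X[list(G.neighbors(v))].mean(axis=0) if G.degree(v) > 0 else np.zeros(X.shape[1])
        for v in G.nodes()
    ])
    return np.hstack([X, deg, nbr_mean])

G = nx.random_geometric_graph(500, 0.08, seed=0)
X = np.random.RandomState(0).randn(500, 6)
y = X.sum(axis=1) + np.random.RandomState(1).randn(500) * 0.1  # toy regression target

X_aug = add_graph_features(G, X)
model = HistGradientBoostingRegressor().fit(X_aug[:400], y[:400])
print(model.score(X_aug[400:], y[400:]))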

Further, we evaluated several graph foundation models (GFMs). Despite much attention being paid to GFMs recently, we found that there are currently only a few GFMs that can handle arbitrary node features (which is required for true generalization across different graphs), and that these GFMs produce very weak results on our benchmark. So it seemed that the problem of developing general-purpose graph foundation models was far from solved, which motivated our research in this direction (see the previous post).