The largest medical AI randomized controlled trial yet performed, enrolling >100,000 women undergoing mammography screening, was just published.
AI-supported screening led to a 29% higher cancer detection rate, no increase in false positives, and a reduced workload compared with radiologists reading without AI.
Paper: https://thelancet.com/journals/landig/article/PIIS2589-7500(24)00267-X/fulltext
O2 (the UK telecoms company, not an OpenAI model name) has announced Daisy, a conversational AI of its own. It answers fraudulent phone calls in real time, wasting scammers' time by impersonating a vulnerable elderly person.
https://news.virginmediao2.co.uk/o2-unveils-daisy-the-ai-granny-wasting-scammers-time/
Interesting developments: https://x.com/adcock_brett/status/1886860098980733197
Related:
1. OpenAI files a trademark application for humanoid robots https://www.businessinsider.com/openai-trademark-humanoid-robots-vr-headsets-sam-altman-hardware-2025-2
2. OpenAI is hiring robotics engineers https://x.com/kalinowski007/status/1877809579154948223
Physical Intelligence (π) is open-sourcing π_0, which it describes as the first general-purpose robotic foundation model: https://www.pi.website/blog/openpi
ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills
Open source: https://agile.human2humanoid.com/
MaestroMotif: A method for AI-assisted skill design that produces highly capable and steerable hierarchical agents.
The first method that solves compositional tasks requiring hundreds of steps, without expert-labeled datasets. All of MaestroMotif's modules are learned from interaction, from the highest level of planning down to the lowest level of sensorimotor control. At its heart is the idea that decomposing a task into subtasks significantly helps decision making.
Read more: https://github.com/mklissa/maestromotif
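The decomposition idea above can be illustrated with a toy hierarchical agent: a high-level planner chooses among named skills, and each skill is a small policy that runs until its own termination condition fires. This is a minimal sketch of the hierarchy, not the MaestroMotif code; the skill names and the 1-D task are assumptions for illustration.

```python
# Toy hierarchical agent: a planner selects skills; each skill runs to
# its termination condition. Illustrative only, not MaestroMotif itself.

def make_skill(target):
    """Skill: move a 1-D agent toward `target`, terminate on arrival."""
    def policy(state):
        if state == target:
            return None          # skill's termination condition
        return 1 if state < target else -1
    return policy

SKILLS = {"collect": make_skill(5), "return_home": make_skill(0)}

def planner(completed):
    """High-level plan: collect first, then return home."""
    return "return_home" if "collect" in completed else "collect"

state, completed, trace = 0, set(), []
while len(completed) < len(SKILLS):
    name = planner(completed)
    skill = SKILLS[name]
    while (action := skill(state)) is not None:
        state += action          # low-level sensorimotor step
    completed.add(name)
    trace.append(name)

print(trace, state)   # skills execute in the planned order; agent ends at 0
```

The planner never reasons about individual motor steps, which is exactly the benefit the paper attributes to decomposition.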
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models https://arxiv.org/abs/2502.02492
Links for 2025-02-05
AI:
1. A step towards robust jailbreak defenses: “After thousands of hours of red teaming, not one participant found a reliable jailbreak that extracted detailed information across a set of 10 harmful questions.” https://www.anthropic.com/research/constitutional-classifiers
2. Google presents Scaling Embedding Layers in Language Models — outperforms a 1.9B-parameter baseline across diverse corpora while using only half the inference-time FLOPS https://arxiv.org/abs/2502.01637
3. Improving Transformer World Models for Data-Efficient RL — superhuman performance on the challenging Craftax-classic benchmark, an open-world 2D survival game https://arxiv.org/abs/2502.01591
4. Process Reinforcement through Implicit Rewards — PRIME achieves a 15.1% average improvement over the SFT model across several key reasoning benchmarks https://arxiv.org/abs/2502.01456
5. SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training https://arxiv.org/abs/2501.17161
6. First large-scale, verifiable code training dataset (ACECODE-89K) with automatically generated test cases. This is a major step towards enabling more effective RL for code generation. https://arxiv.org/abs/2502.01718
7. Reinforcement Learning for Long-Horizon Interactive LLM Agents — "Our analysis reveals a variety of behavioral patterns that emerge in the course of training…" https://arxiv.org/abs/2502.01600
8. Natural Language Reinforcement Learning — enables training chain-of-thought language policies and a language value function ("generative value") solely from environment feedback, without labels from humans or stronger models. https://github.com/waterhorse1/Natural-language-RL
9. Chain-of-Associated-Thoughts (CoAT) is a new framework that enhances LLMs' reasoning abilities by combining Monte Carlo Tree Search with dynamic knowledge integration. https://arxiv.org/abs/2502.02390
10. Language Models Use Trigonometry to Do Addition https://www.lesswrong.com/posts/E7z89FKLsHk5DkmDL/language-models-use-trigonometry-to-do-addition-1
11. S1: The $6 R1 Competitor? https://timkellogg.me/blog/2025/02/03/s1
12. Sam Altman says the leap from GPT-4 to GPT-5 will be as big as that from GPT-3 to GPT-4, and that the plan is to integrate the GPT and o series of models into one model that can do everything https://youtu.be/qyTOVq31JIE?si=TzSFM3W45hPCSXZ1&t=741
13. Sam Altman: “finally for the first time, I think the models that are on the near term horizon, um the models that will release in the coming months, are over the threshold of being good enough to really address these problems and now people just have to go build the solutions” https://www.youtube.com/live/8vHr_8k8IbM?si=HUFjdfZvkPG921Te&t=3446
14. Decoding can be just as good as regular pointwise heads for regression, but you also get density estimation for free. https://arxiv.org/abs/2501.19383
15. Google DeepMind released a book on scaling language models on TPUs. https://jax-ml.github.io/scaling-book/index
16. Time to democratize humanoid robots! ToddlerBot, a low-cost ($6K), open-source humanoid for robotics and AI research. https://toddlerbot.github.io/
AI politics:
1. How to Rapidly Build Gigawatt-Scale AI Clusters in the United States https://ifp.org/special-compute-zones/
2. Palantir CTO Shyam Sankar says the US is in a winner-take-all AI arms race with China, and that DeepSeek has made it clear "the time to mobilize has come" https://www.youtube.com/live/MW0zvoEMdRA?si=mmuKS3myNpSeSufO&t=2025
3. 93% of IT Leaders Plan to Deploy AI Agents by 2026 https://www.zdnet.com/article/93-of-it-leaders-will-implement-ai-agents-in-the-next-two-years/
Science:
1. Scientists ‘mimic real biological processes’ using synthetic neurons https://news.northwestern.edu/stories/2025/01/scientists-mimic-real-biological-processes-using-synthetic-neurons
2. Necessity of complex numbers https://www.youtube.com/watch?v=f079K1f2WQk
3. The chance of asteroid 2024 YR4 hitting our planet in 2032 is now 1.5%, or 1 in 67. https://x.com/Astro_Jonny/status/1886742128199336362
UK government rips up rules to fire-up nuclear power https://www.gov.uk/government/news/government-rips-up-rules-to-fire-up-nuclear-power
Awesome! This is the way!
Human level sample efficiency? LIMO: Less is More for Reasoning https://arxiv.org/abs/2502.03387
- LIMO achieves unprecedented performance in mathematical reasoning with only 1% of the training data used by previous approaches, showcasing remarkable data efficiency.
- LIMO exhibits exceptional out-of-distribution generalization, outperforming models trained on 100x more data by a significant 40.5% absolute improvement across diverse benchmarks.
LIMO Hypothesis: In foundation models with comprehensively encoded domain knowledge (achieved through extensive pre-training), sophisticated reasoning can emerge through minimal, precisely orchestrated demonstrations of cognitive processes.
- The core of LIMO's success lies in the meticulous curation of a small, high-quality dataset. The resulting dataset of 817 examples was carefully selected from millions of candidates.
- LIMO fundamentally challenges the assumption that massive datasets are necessary for complex reasoning in LLMs. Quality of the examples, rather than just the number, is the key factor.
- LIMO suggests that modern, well-pretrained models like Qwen already possess latent, rich reasoning capabilities. LIMO demonstrates that these capabilities can be unlocked and activated effectively with the right "cognitive templates" provided by curated examples.
- LIMO indicates that sophisticated reasoning, regardless of complexity, could potentially be activated with minimal samples given sufficient pre-trained domain knowledge and optimal cognitive reasoning chains for activation.
Further research is needed to validate the LIMO hypothesis across different model architectures and reasoning domains beyond mathematics.
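The curation step that the paper credits for LIMO's success can be sketched as a simple pipeline: score a large candidate pool, keep only a tiny high-quality subset, and fine-tune on that. The scoring heuristic and dataset fields below are assumptions for illustration, not the paper's actual selection criteria.

```python
# Sketch of a LIMO-style recipe: curate a tiny, high-quality training set
# from a much larger pool. Quality scores here are toy stand-ins.

def curate(candidates, keep=817):
    """Keep the highest-quality examples from a large candidate pool."""
    scored = sorted(candidates, key=lambda ex: ex["quality"], reverse=True)
    return scored[:keep]

# Toy pool standing in for "millions of candidates".
pool = [{"problem": f"p{i}", "solution": f"s{i}", "quality": i % 1000}
        for i in range(10_000)]

train_set = curate(pool)          # 817 examples, mirroring the paper's count
print(len(train_set))
# fine_tune(model, train_set)     # a standard SFT loop would go here
```

The point of the sketch is the ratio: roughly 1% of the pool survives curation, and under the LIMO hypothesis that tiny set suffices to activate reasoning already latent in the pre-trained model.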
Making robots truly helpful and safe in our everyday lives: Latent-Space Reachability Analysis https://kensukenk.github.io/latent-safety/
A new approach called "Latent Safety Filters" allows robots to understand and prevent complex "failures." Imagine teaching a robot to pick up a bag of Skittles. Traditional safety systems might stop the robot from bumping into the table, but they wouldn't understand that pulling the bag up too quickly will cause the candy to spill everywhere.
The researchers equip the robot with a world model that learns how the world works just by watching videos and trying things out. Think of it as the robot building a mental picture of the scene.
The "Safety Filter" then acts like a guardian angel for the robot's actions. It monitors what the robot is about to do and checks if it's heading towards a failure in its imagined world. It does this without needing to be told exactly how to be safe in every situation beforehand. It learns from experience and its "imagination."
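The filter's core loop can be sketched in a few lines: before executing a proposed action, roll it forward in the world model's "imagination" and veto it if the predicted future enters a failure set. The dynamics function, failure predictor, and fallback action below are toy stand-ins, not the paper's learned latent-space components.

```python
# Sketch of a safety-filter check: imagine the rollout of a proposed
# action and veto it if any imagined state is a failure. Toy dynamics.

def imagine(state, action, steps=3):
    """Toy learned dynamics: the state drifts by `action` each step."""
    for _ in range(steps):
        state = state + action
        yield state

def is_failure(state):
    return abs(state) > 10       # stand-in for a learned failure classifier

def safety_filter(state, proposed, fallback=0):
    """Return the proposed action only if its imagined rollout stays safe."""
    if any(is_failure(s) for s in imagine(state, proposed)):
        return fallback          # guardian-angel override
    return proposed

print(safety_filter(0, 2))   # safe: imagined rollout reaches 6
print(safety_filter(8, 2))   # vetoed: imagined rollout would reach 12
```

Because the check runs on imagined futures rather than hand-written rules, the same mechanism covers subtle failures (the spilled Skittles) that a collision-only safety system would miss.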
Hibiki: Real-time speech translation that runs on your phone.
Hibiki produces spoken and text translations of the input speech in real-time, while preserving the speaker’s voice and optimally adapting its pace based on the semantic content of the source speech.
Samples: https://x.com/neilzegh/status/1887498102455869775
Paper: https://arxiv.org/abs/2502.03382
Inference code: https://github.com/kyutai-labs/hibiki
Models: https://huggingface.co/kyutai
Links for 2025-02-06
AI:
1. Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search https://satori-reasoning.github.io/blog/satori/
2. Dynamic object goal pushing with mobile manipulators through constrained reinforcement learning https://www.youtube.com/watch?v=wGAdPGVf9Ws
3. SDE Matching: Scalable and Simulation-Free Training of Latent Stochastic Differential Equations https://arxiv.org/abs/2502.02472
4. BARE: Combining Base and Instruction-Tuned Language Models for Better Synthetic Data Generation https://www.arxiv.org/abs/2502.01697
5. Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning https://arxiv.org/abs/2502.03275
6. Demystifying Long Chain-of-Thought Reasoning in LLMs https://arxiv.org/abs/2502.03373
7. Deep Dive into LLMs like ChatGPT: "This is a general-audience deep dive into the Large Language Model (LLM) AI technology that powers ChatGPT and related products. It covers the full training stack of how the models are developed, along with mental models of how to think about their "psychology", and how to get the best use of them in practical applications." https://www.youtube.com/watch?v=7xTGNNLPyMI
Science and Technology:
1. The brain calculates with waves: New insights into neural waves could revolutionize the development of energy-efficient AI systems https://www.mpg.de/24143275/oscillating-networks-in-the-brain
2. Google says commercial quantum computing applications arriving within five years https://www.reuters.com/technology/google-says-commercial-quantum-computing-applications-arriving-within-five-years-2025-02-05/ [no paywall: https://archive.is/iS7s4]
3. What is an Electron? How Times Have Changed https://profmattstrassler.com/2025/02/06/what-is-an-electron-how-times-have-changed/
4. A gene-editing technology called 'dual prime editing' was used in plants for the first time. This tool can precisely delete up to two million bases of DNA, or replace a 258,000 base stretch of DNA with a new sequence, in both wheat and tomatoes (so far). https://www.nature.com/articles/s41477-024-01898-3
5. A large study, performed on 960 female mice, suggests that genetics – and not diet or exercise – are the biggest predictor of which mice live longer than others. https://www.nature.com/articles/s41586-024-08026-3
"We know how to improve these models so, so much. And there's not an obvious roadblock in front of us."
Sam Altman believes the AI progress from Feb 2025 to Feb 2027 will feel more impressive than the advancements from Feb 2023 to Feb 2025.
Source: In the Age of AI – A Panel Discussion with Sam Altman at TU Berlin https://www.youtube.com/live/McuO7Osgzqo?si=B4NEOIZ_R3fB6yys&t=2993
AlphaGeometry2 can solve Olympiad geometry problems at a superhuman level
- It has an 84% solve rate on IMO geometry problems from the past 25 years, up from 54% with the previous version.
- The system uses a combination of language models and symbolic reasoning to solve geometry problems.
- The language model is used to generate possible solutions, and the symbolic engine is used to check whether these solutions are correct.
- AlphaGeometry2 is also able to solve problems that are not constructive, meaning that they cannot be solved by simply following a set of steps. This is done by using a numerical optimization algorithm to find a possible solution.
Paper: https://arxiv.org/abs/2502.03544
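The generate-and-verify division of labor described above is easy to picture as a loop: a generator proposes candidate solutions and a symbolic checker accepts only the ones it can verify exactly. The toy "problem" below (find integers with x·y = 12 and x + y = 7) stands in for geometry; nothing here is the actual AlphaGeometry2 system.

```python
import itertools

# Propose-and-verify sketch: sampled proposals are filtered through an
# exact symbolic check. Toy arithmetic problem in place of geometry.

def propose_candidates():
    """Stand-in for the language model's sampled proposals."""
    for x, y in itertools.product(range(1, 13), repeat=2):
        yield (x, y)

def symbolic_check(candidate):
    """Stand-in for the symbolic engine: exact verification, no guessing."""
    x, y = candidate
    return x * y == 12 and x + y == 7

solution = next(c for c in propose_candidates() if symbolic_check(c))
print(solution)   # first proposal that passes verification
```

The generator may propose many wrong candidates cheaply; soundness comes entirely from the checker, which is why the combination can safely exceed what either component achieves alone.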
STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving https://arxiv.org/abs/2502.00212
Inspired by how mathematicians continue advancing the field, the authors train an LLM that conjectures and attempts proofs; then they iteratively reinforce/re-train it with correct, elegant, novel, and approachable generated conjectures and correctly generated proofs.
STP has two main components: a conjecturer and a prover. The conjecturer generates increasingly challenging conjectures that are barely provable by the current prover. The prover attempts to prove these conjectures and receives training signals based on its success.
STP significantly improves the performance of LLMs in formal theorem proving.
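The conjecturer/prover dynamic can be sketched as a self-play loop: the conjecturer emits statements just beyond the prover's current ability, the prover attempts them, and each success raises that ability. The scalar "difficulty", the success rule, and the skill update below are illustrative assumptions, not the paper's training objective.

```python
import random

# Self-play sketch in the shape of STP: conjectures track the prover's
# frontier, and successful proofs provide the training signal.

random.seed(0)

prover_skill = 1.0
history = []

for round_ in range(20):
    # Conjecturer: target problems barely beyond current skill.
    difficulty = prover_skill + random.uniform(0.0, 0.2)
    # Prover: succeeds only on problems near or below its skill.
    success = difficulty <= prover_skill + random.uniform(0.0, 0.15)
    if success:
        prover_skill += 0.1   # reinforcement from a correct proof
    history.append((round_, round(difficulty, 2), success))

proofs = sum(s for *_, s in history)
print(f"final skill: {prover_skill:.1f}, proofs found: {proofs}")
```

The key property the sketch preserves is that difficulty rises with the prover: neither side stagnates, because the conjecturer's targets move every time the prover improves.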
We finally have an answer to the debate over whether LLMs generalize to new math problems or merely memorize the answers.
We evaluated them on the AIME 2025 I competition from *yesterday* and the results are good!
Source: https://x.com/mbalunovic/status/1887962694659060204
A science fiction short story for the weekend. https://www.lesswrong.com/posts/KFJ2LFogYqzfGB3uX/how-ai-takeover-might-happen-in-2-years
Robust Autonomy Emerges from Self-Play
Apple team shows self-driving AI can learn entirely by practicing against itself - no human driving data needed.
In testing, their system averages 17.5 years of continuous driving between incidents, far surpassing humans. All through self-play, not imitation.
Paper: https://arxiv.org/abs/2502.03349
Links for 2025-02-08
AI:
1. Sam Altman Dialogue at UTokyo: Altman says OpenAI has an internal AI model that ranks as the 50th-best competitive programmer in the world, and that by the end of 2025 their model will be ranked #1. He says that in 2035, a single AI data center will have the same intellectual capacity as all humans plus AI currently on Earth combined. https://www.youtube.com/watch?v=8LmfkUb2uIY
2. GitHub Copilot: The agent awakens https://github.blog/news-insights/product-news/github-copilot-the-agent-awakens/
3. Database-Augmented Transformer-Based Large Language Models Achieve High Accuracy in Mapping Gene-Phenotype Relationships https://www.biorxiv.org/content/10.1101/2025.01.28.635344v1
4. DeepPrep: an accelerated, scalable and robust pipeline for neuroimaging preprocessing empowered by deep learning https://www.nature.com/articles/s41592-025-02599-1
5. A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods https://arxiv.org/abs/2502.01618
6. BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation https://arxiv.org/abs/2502.03860
7. Value-Based Deep RL Scales Predictably https://arxiv.org/abs/2502.04327
8. ReAG - Reasoning Augmented Generation https://github.com/superagent-ai/reag
9. Learn how to use Gemini 2.0 to convert PDF into structured JSON data. https://www.philschmid.de/gemini-pdf-to-data
10. Advancing Reasoning in Large Language Models: Promising Methods and Approaches https://arxiv.org/abs/2502.03671
11. Syntriever: How to Train Your Retriever with Synthetic Data from LLMs https://arxiv.org/abs/2502.03824
12. DeepSeek AI Runs Near Instantaneously on These Weird Chips https://cerebras.ai/blog/cerebras-launches-worlds-fastest-deepseek-r1-llama-70b-inference
13. DARPA program on AI for pure mathematics https://sam.gov/opp/4def3c13ca3947069b1779e7ff697c6a/view
AI investments:
1. Amazon will invest $100 billion in infrastructure this year, mostly in artificial intelligence https://www.bloomberg.com/news/articles/2025-02-06/amazon-projects-profit-missing-estimates-on-rising-ai-spending [no paywall: https://archive.is/Oz9Wd]
2. UAE to invest billions in France AI data center https://www.lemonde.fr/en/france/article/2025/02/06/uae-to-invest-billions-in-france-ai-data-center_6737871_7.html
3. Ilya Sutskever's Safe Superintelligence Inc is in talks to raise funding at a valuation of at least $20 billion. https://www.reuters.com/technology/openai-co-founder-sutskevers-ssi-talks-be-valued-20-bln-sources-say-2025-02-07/ [no paywall: https://archive.is/Nkgrd]
4. Artificial intelligence startup Anthropic’s financing is oversubscribed and on track to be larger than expected, exceeding the $2 billion fundraising that was previously reported https://www.bloomberg.com/news/articles/2025-02-07/general-catalyst-mgx-in-talks-to-join-anthropic-megaround [no paywall: https://archive.is/b9gro]
5. DeepSeek fever fuels patriotic bets on Chinese AI stocks https://www.reuters.com/markets/asia/deepseek-fever-fuels-patriotic-bets-chinese-ai-stocks-2025-02-06/ [no paywall: https://archive.is/5KSJe]
Science and Technology:
1. New laser-based artificial neuron processes enormous data sets at high speed https://www.livescience.com/technology/artificial-intelligence/new-laser-based-artificial-neuron-processes-enormous-data-sets-at-high-speed
2. A high-quality online IQ test normed with a nationally representative US sample. https://www.youtube.com/watch?v=PdS6gYnnk30
3. Active agent against cancer metastasis discovered: Adhibin prevents migration and attachment to other cells https://phys.org/news/2025-02-agent-cancer-metastasis-adhibin-migration.html
4. CiFi: A significant advancement in the field of genomics because it allows scientists to study DNA organization and interactions in more detail than previously possible. https://www.biorxiv.org/content/10.1101/2025.01.31.635566v1
5. Terence Tao on how we measure the cosmos | Part 1 https://www.youtube.com/watch?v=YdOXS_9_P4U
AI:
1. Sam Altman Dialogue at UTokyo: Altman says OpenAI have an internal AI model that ranks as the 50th best competitive programmer in the world and by the end of 2025 their model will be ranked #1. He says in 2035, a single AI data center will have the same intellectual capacity as all humans plus AI currently on Earth combined. https://www.youtube.com/watch?v=8LmfkUb2uIY
2. GitHub Copilot: The agent awakens https://github.blog/news-insights/product-news/github-copilot-the-agent-awakens/
3. Database-Augmented Transformer-Based Large Language Models Achieve High Accuracy in Mapping Gene-Phenotype Relationships https://www.biorxiv.org/content/10.1101/2025.01.28.635344v1
4. DeepPrep: an accelerated, scalable and robust pipeline for neuroimaging preprocessing empowered by deep learning https://www.nature.com/articles/s41592-025-02599-1
5. A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods https://arxiv.org/abs/2502.01618
6. BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation https://arxiv.org/abs/2502.03860
7. Value-Based Deep RL Scales Predictably https://arxiv.org/abs/2502.04327
8. ReAG - Reasoning Augmented Generation https://github.com/superagent-ai/reag
9. Learn how to use Gemini 2.0 to convert PDFs into structured JSON data. https://www.philschmid.de/gemini-pdf-to-data
10. Advancing Reasoning in Large Language Models: Promising Methods and Approaches https://arxiv.org/abs/2502.03671
11. Syntriever: How to Train Your Retriever with Synthetic Data from LLMs https://arxiv.org/abs/2502.03824
12. DeepSeek AI Runs Near Instantaneously on These Weird Chips https://cerebras.ai/blog/cerebras-launches-worlds-fastest-deepseek-r1-llama-70b-inference
13. DARPA program on AI for pure mathematics https://sam.gov/opp/4def3c13ca3947069b1779e7ff697c6a/view
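The PDF-to-JSON recipe in item 9 can be sketched roughly like this — a minimal sketch assuming the google-generativeai Python SDK with an API key already configured; the model name, field list, and prompt are illustrative, not taken from the linked post:

```python
import json

# Fields we want Gemini to return; used in the prompt and when validating
# the parsed response. Purely illustrative.
FIELDS = ["title", "authors", "publication_date", "summary"]

def build_prompt(fields: list[str]) -> str:
    # Ask for strict JSON with exactly these keys.
    return ("Extract the following fields from the attached PDF and reply "
            "with a single JSON object using exactly these keys: "
            + ", ".join(fields))

def extract_pdf(pdf_path: str) -> dict:
    # Requires: pip install google-generativeai, and a configured API key.
    import google.generativeai as genai
    pdf = genai.upload_file(pdf_path)  # upload the file via the File API
    model = genai.GenerativeModel(
        "gemini-2.0-flash",
        generation_config={"response_mime_type": "application/json"})
    resp = model.generate_content([pdf, build_prompt(FIELDS)])
    return json.loads(resp.text)  # JSON mode makes resp.text valid JSON
```

Setting `response_mime_type` to `application/json` constrains the model to emit parseable JSON, which avoids brittle regex post-processing of the reply.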
AI investments:
1. Amazon will invest $100 billion in infrastructure this year, mostly in artificial intelligence https://www.bloomberg.com/news/articles/2025-02-06/amazon-projects-profit-missing-estimates-on-rising-ai-spending [no paywall: https://archive.is/Oz9Wd]
2. UAE to invest billions in France AI data center https://www.lemonde.fr/en/france/article/2025/02/06/uae-to-invest-billions-in-france-ai-data-center_6737871_7.html
3. Ilya Sutskever's Safe Superintelligence Inc is in talks to raise funding at a valuation of at least $20 billion. https://www.reuters.com/technology/openai-co-founder-sutskevers-ssi-talks-be-valued-20-bln-sources-say-2025-02-07/ [no paywall: https://archive.is/Nkgrd]
4. Artificial intelligence startup Anthropic’s financing is oversubscribed and on track to be larger than expected, exceeding the $2 billion fundraising that was previously reported https://www.bloomberg.com/news/articles/2025-02-07/general-catalyst-mgx-in-talks-to-join-anthropic-megaround [no paywall: https://archive.is/b9gro]
🤡5👍4🥴3
Meta researchers used AI to predict the text a person was typing just from non-invasive brain recordings!
With EEG, their "Brain2Qwerty" model gets 67% of characters wrong on average; with magnetoencephalography (MEG) it does much better, getting only 32% wrong.
"For the best participants, the model achieves a CER of 19%, and can perfectly decode a variety of sentences outside of the training set."
Paper: https://ai.meta.com/research/publications/brain-to-text-decoding-a-non-invasive-approach-via-typing/
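For reference, the CER (character error rate) quoted above is the character-level edit distance between the decoded string and the reference, divided by the reference length. A minimal sketch — the example strings are illustrative, not from the paper:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance
    # (insertions, deletions, substitutions all cost 1).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(hypothesis: str, reference: str) -> float:
    # Character error rate: edits needed / reference length.
    return levenshtein(hypothesis, reference) / len(reference)

print(round(cer("the quick brovn fax", "the quick brown fox"), 2))  # → 0.11
```

So a CER of 0.32 means roughly one edit per three reference characters; the paper's best-participant CER of 19% is low enough for some held-out sentences to decode perfectly.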
😱4🤣3👍2🥱1🗿1