Axis of Ordinary
Memetic and cognitive hazards.

Substack: https://axisofordinary.substack.com/
Links for 2025-03-11 (Part 1)

AI

1. “…the agent trained with CoT pressure still learns to reward hack; only now its cheating is undetectable by the monitor because it has learned to hide its intent in the chain-of-thought.” https://openai.com/index/chain-of-thought-monitoring/

2. R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning https://arxiv.org/abs/2503.05379

3. R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning https://arxiv.org/abs/2503.05592

4. Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models https://arxiv.org/abs/2503.06749

5. MALT: Improving Reasoning with Multi-Agent LLM Training https://arxiv.org/abs/2412.01928

6. LADDER is a framework enabling LLMs to recursively generate and solve progressively simpler variants of complex problems—boosting accuracy on mathematical integration. https://arxiv.org/abs/2503.00735

7. START: Self-taught Reasoner with Tools https://arxiv.org/abs/2503.04625

8. *ARC‑AGI Without Pretraining* – No pretraining. No datasets. Just pure inference-time gradient descent on the target ARC-AGI puzzle itself, solving 20% of the evaluation set. https://iliao2345.github.io/blog_posts/arc_agi_without_pretraining/arc_agi_without_pretraining.html

9. Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems https://arxiv.org/abs/2502.17019

10. Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks https://www.arxiv.org/abs/2503.04378

11. Differentiable Logic Cellular Automata https://google-research.github.io/self-organising-systems/difflogic-ca/

12. Token-Efficient Long Video Understanding for Multimodal LLMs https://research.nvidia.com/labs/lpr/storm/

13. The Manus Marketing Madness https://www.lesswrong.com/posts/ijSiLasnNsET6mPCz/the-manus-marketing-madness

14. What the Headlines Miss About the Latest Decision in the Musk vs. OpenAI Lawsuit https://www.lesswrong.com/posts/dnCdqxPh5JtPp78FP/what-the-headlines-miss-about-the-latest-decision-in-the

15. Mathematician Daniel Litt on what he learned from designing a problem for the FrontierMath benchmark and the ability of reasoning models like o3-mini-high to solve it https://x.com/littmath/status/1898461323391815820

16. Terence Tao: “My general sense is that for research-level mathematical tasks at least, current models fluctuate between "genuinely useful with only broad guidance from user" and "only useful after substantial detailed user guidance", with the most powerful models having a greater proportion of answers in the former category.” https://mathstodon.xyz/@tao/114139125505827565

17. Will AI be capable of producing an Annals-quality math paper for $100k by March 2030? https://manifold.markets/TamayBesiroglu/will-ai-be-capable-of-producing-ann

18. Mayo Clinic’s secret weapon against AI hallucinations: Reverse RAG in action https://venturebeat.com/ai/mayo-clinic-secret-weapon-against-ai-hallucinations-reverse-rag-in-action/

19. How Orakl Oncology is using DINOv2 to accelerate cancer treatment discovery https://ai.meta.com/blog/orakl-oncology-dinov2-accelerating-cancer-treatment/

20. "Not great for my comparative advantage, but from some experiments we have done at Rotman, I am totally convinced the vast majority of research that doesn't involve the physical world can be done more cheaply with AI & a little human intervention than by even good researchers. 1/7" https://x.com/Afinetheorem/status/1898822592594874598

21. Superintelligence Strategy https://www.nationalsecurity.ai/

22. The Nuclear-Level Risk of Superintelligent AI https://time.com/7265056/nuclear-level-risk-of-superintelligent-ai/

23. “Imagine if you could train one human for thousands of years to achieve unparalleled expertise, then make many copies. That’s what AI enables: spend heavily on training a single model, then cheaply replicate it. This creates a unique source of increasing returns at scale.” https://epoch.ai/blog/train-once-deploy-many-ai-and-increasing-returns
Links for 2025-03-11 (Part 2)

24. Currently, total AI cognitive effort is growing ~25x yearly—hundreds of times faster than human research effort (4% yearly). Once AI can meaningfully substitute for human research, total research growth (human+AI) will increase *dramatically*. https://www.forethought.org/research/preparing-for-the-intelligence-explosion
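For intuition, compounding those two headline growth rates for a few years (a toy calculation using the post's figures, nothing more) shows how quickly the gap widens:

```python
# Toy comparison of the post's headline growth rates: total AI cognitive
# effort growing ~25x per year vs human research effort ~4% per year.
def compound(factor_per_year: float, years: int) -> float:
    # Total growth after `years` of compounding at a fixed yearly factor.
    return factor_per_year ** years

ai_growth_3y = compound(25.0, 3)     # 15625.0 — AI effort multiplier after 3 years
human_growth_3y = compound(1.04, 3)  # ~1.125 — human effort multiplier after 3 years
```

At these rates, three years turns a modest head start in AI effort into a four-orders-of-magnitude gap over the human baseline.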

25. “Why I believe that the brain does something like gradient descent” https://medium.com/@kording/why-i-believe-that-the-brain-does-something-like-gradient-descent-27611c491205

26. “If we treat the brain as a neural network with optimized algorithms instead of as an artifact disconnected from the rest of AI research, we conclude the coming decade should see many new AI capabilities emerging as we continue closing the gap with the brain.” https://epoch.ai/gradient-updates/what-ai-can-currently-do-is-not-the-story

27. METR evaluated DeepSeek-R1’s ability to act as an autonomous agent. On generic SWE tasks it performs on par with o1-preview but worse than 3.5 Sonnet (new) or o1. Overall R1 is ~6 months behind leading US AI companies at agentic SWE tasks and is only a small improvement on V3. https://metr.github.io/autonomy-evals-guide/deepseek-r1-report/

28. The elicitation theory of post-training -- that base models already hold many capabilities which post-training draws out -- is remarkably simple to understand and will make it much easier for less technical folks to feel the AGI. https://www.interconnects.ai/p/elicitation-theory-of-post-training

29. Mathematical Foundations of Reinforcement Learning https://github.com/MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning

30. Deep Learning is Not So Mysterious or Different https://arxiv.org/abs/2503.02113

31. Can a 7B parameter model learn to solve Sudoku through pure reinforcement learning without any cold start data? A surprising yes! https://hrishbh.com/teaching-language-models-to-solve-sudoku-through-reinforcement-learning/

32. So how well is Claude playing Pokémon? https://www.lesswrong.com/posts/HyD3khBjnBhvsp8Gb/so-how-well-is-claude-playing-pokemon

33. PokéChamp: an Expert-level Minimax Language Agent https://arxiv.org/abs/2503.04094

34. Do reasoning models use their scratchpad like we do? Evidence from distilling paraphrases https://www.lesswrong.com/posts/ywzLszRuGRDpabjCk/do-reasoning-models-use-their-scratchpad-like-we-do-evidence

35. Factorio Learning Environment (FLE): A benchmark based on the game of Factorio, that tests agents in long-term planning, program synthesis, and resource optimization https://jackhopkins.github.io/factorio-learning-environment/

36. Russian scientists fuse reasoning models with drone-control models for thinking drones https://arxiv.org/abs/2503.01378v1

Neuroscience

1. Naturalistic Computational Cognitive Science: Towards generalizable models and theories that capture the full range of natural behavior https://arxiv.org/abs/2502.20349

2. Melbourne start-up launches 'biological computer' made of human brain cells https://www.abc.net.au/news/science/2025-03-05/cortical-labs-neuron-brain-chip/104996484 (product page: https://corticallabs.com/cl1.html)

3. Biological Neurons vs Deep Reinforcement Learning: Sample efficiency in a simulated game-world [published in 2022] https://openreview.net/forum?id=N5qLXpc7HQy

Science

1. Stanford researchers have developed an antibody duo therapy that neutralizes all SARS-CoV-2 variants by targeting two different parts of the virus simultaneously. https://www.science.org/doi/10.1126/scitranslmed.adq5720
1. The party that told Donald Trump, “We are not for sale,” just won the election in Greenland.

2. MAGA has made Canadian Libs great again—quite an achievement.

We're gonna win so much, you may even get tired of winning. And you'll say, 'Please, please. It's too much winning. We can't take it anymore. Mr. President, it's too much.'


— Donald Trump
Anthropic CEO Dario Amodei predicts AI writing 90% of code within 3-6 months and 100% of code within a year.

To be clear, I don't believe this. But it's an interesting and falsifiable prediction. Let's see what happens.
Gemini models for robotics:

- Completes precise tasks, like origami folding handled entirely by AI-driven robots

- Understands commands in everyday language.

- Quickly adapts if objects move or environment changes.

- Instantly handles new tasks and objects.

- Performs tasks it's never seen before, doubling other models’ performance.

- Understands objects’ shape and position instantly.

- Plans safe, accurate movements to grasp and manipulate objects.

- Works across various robots, including humanoid models like Apollo.

- Easily adapts to different robotic hardware.

Read more: https://deepmind.google/discover/blog/gemini-robotics-brings-ai-into-the-physical-world/
Links for 2025-03-14

AI

1. Meta Reinforcement Fine-Tuning (MRT): Training LLMs to make measurable progress with each step, not just reach correct answers. https://cohenqu.github.io/mrt.github.io/

2. Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models https://arxiv.org/abs/2503.09573

3. Inductive Moment Matching (IMM), a new class of generative models for one- or few-step sampling with a single-stage training procedure. https://arxiv.org/abs/2503.07565

4. AI Can Now Learn 100x Faster Without Wasting Energy https://www.tum.de/en/news-and-events/all-news/press-releases/details/new-method-significantly-reduces-ai-energy-consumption

5. LLM inference prices have fallen at rates from 9x to 900x per year, depending on the task https://epoch.ai/data-insights/llm-inference-price-trends?insight-option=All+benchmarks

6. Google unveils Gemma 3, a powerful model that runs on a single GPU https://blog.google/technology/developers/gemma-3/

7. A quarter of startups in YC’s current cohort have codebases that are almost entirely AI-generated https://www.youtube.com/watch?v=IACHfKmZMr8

8. Sam Altman reveals that OpenAI has trained a new model that delivers standout creative writing, capturing metafiction’s nuanced vibe. https://x.com/sama/status/1899535387435086115

9. OpenAI o1 and o3-mini now offer Python-powered data analysis in ChatGPT. https://x.com/OpenAI/status/1900308446211432484

10. OpenAI Nonprofit Buyout: Much More Than You Wanted To Know https://www.astralcodexten.com/p/openai-nonprofit-buyout-much-more

11. OpenAI submitted their policy proposal to the US government. They directly link fair use with national security, and said if China continues to have free access to data while 'American companies are left without fair use access, the race for AI is effectively over.' https://openai.com/global-affairs/openai-proposals-for-the-us-ai-action-plan/

12. “I believe it is a clear demonstration that misalignment likely does not stem from the model being “evil.” It simply found a better way to achieve its goal using unintended means.” https://www.lesswrong.com/posts/mpmsK8KKysgSKDm2T/the-most-forbidden-technique

13. Auditing language models for hidden objectives https://www.anthropic.com/research/auditing-hidden-objectives

14. Anthropic, and taking "technical philosophy" more seriously https://www.lesswrong.com/posts/7uTPrqZ3xQntwQgYz/untitled-draft-7csk

15. China steels itself for Donald Trump’s turmoil with ‘DeepSeek congress’ https://www.ft.com/content/8bdcf44d-7654-4bb5-ab15-5c69a4a998b7 [no paywall: https://archive.is/D3SMo]

Science and Technology

1. Low-Power Brain Chip Predicts Users’ Intentions https://spectrum.ieee.org/brain-computer-interface-2671224658

2. Lack of context modulation in human single neuron responses in the medial temporal lobe https://www.cell.com/cell-reports/fulltext/S2211-1247(24)01569-9

3. MIT engineers turn skin cells directly into neurons for cell therapy https://news.mit.edu/2025/mit-engineers-turn-skin-cells-into-neurons-for-cell-therapy-0313

4. East Asian personality may stem from Ice Age Siberia ~20,000 years ago https://psycnet.apa.org/fulltext/2025-88410-001.html

5. Have we passed peak intelligence? In international tests, student scores for reading and maths sank to a new low. https://x.com/_alice_evans/status/1900449985629487366

6. A new programming language called "Exo 2" could enable high-performance coding that can compete with state-of-the-art libraries with a few hundred lines of code, instead of tens or hundreds of thousands. https://news.mit.edu/2025/high-performance-computing-with-much-less-code-0313

7. “Committing fraud is, right now, a viable career strategy that can propel you at the top of the academic world.” https://statmodeling.stat.columbia.edu/2025/03/08/a-post-mortem-on-the-gino-case-committing-fraud-is-right-now-a-viable-career-strategy-that-can-propel-you-at-the-top-of-the-academic-world/
BotQ: A High-Volume Manufacturing Facility for Humanoid Robots

Initially designed to produce 12,000 robots/year, it will scale to support a fleet of 100,000 in the next four years.

Read more: https://www.figure.ai/news/botq
RAND Corporation:

First, AGI might enable a significant first-mover advantage via the sudden emergence of a decisive wonder weapon.


Read more: https://www.rand.org/pubs/perspectives/PEA3691-4.html
Terence Tao:

So all in all a pretty good assist from [o3-mini-high]; it made a mistake that I corrected, but I also made a mistake that it corrected, and code that would have taken perhaps an hour of my time on my own was generated, tested, modified, and reported in maybe ten minutes.

Source: https://mathstodon.xyz/@tao/114173696303072269

This wouldn't be an interesting observation if it weren't for the fact that many people insist that current AI models are useless. Yet more and more mathematicians are claiming that they are useful, or on the verge of being useful.

Also keep in mind that o3-mini-high isn't even the best existing model. And even the best model will pale in comparison to what will be available by the end of the year.
OpenAI CPO, Kevin Weil:

this is the year that AI gets better than humans at programming forever


Source: https://youtu.be/SnSoMh9m5hc?si=uyoRy7CEHg1pffCL
Links for 2025-03-17

AI

1. Agents Play Thousands of 3D Video Games: Traditional game AI requires extensive training or hand-coding. PORTAL differs by using LLMs to design the policy structure as an architect rather than to play directly as an actor; the LLM "writes code" that structures the tactics. https://zhongwen.one/projects/portal/

2. Microsoft has released this useful tool for performing R&D with LLM-based agents. https://github.com/microsoft/RD-Agent

3. A key step for making distributed training work at larger and larger models: Scaling Laws for DiLoCo. TL;DR: We can do LLM training across data centers in a way that scales incredibly well to larger and larger models! https://arxiv.org/abs/2503.09799

4. Compute-Optimal LLMs Provably Generalize Better with Scale https://openreview.net/forum?id=MF7ljU8xcf

5. Transformers without Normalization https://arxiv.org/abs/2503.10622

6. SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion https://arxiv.org/abs/2503.11576

7. Cerebras just announced 6 new AI datacenters that process 40M tokens per second — and it could be bad news for Nvidia https://venturebeat.com/ai/cerebras-just-announced-6-new-ai-datacenters-that-process-40m-tokens-per-second-and-it-could-be-bad-news-for-nvidia/

8. The tiny chips behind Amazon’s big AI investment https://www.semafor.com/article/03/14/2025/amazons-trainium-chips-to-be-tested-by-anthropic

9. AI Tools for Existential Security https://www.forethought.org/research/ai-tools-for-existential-security

10. System built by Google DeepMind team takes individual views and generates a set of group statements https://www.lesswrong.com/posts/j9K4Wu9XgmYAY3ztL/habermas-machine

11. Really powerful AI could wreck society by making governments too powerful https://arxiv.org/abs/2503.05710

12. Robin Hanson lost a bet that “Systems in GPT line will by 2025 make <$1B in customer revenue clearly tied to such systems.” https://x.com/robinhanson/status/1901329487532548511

Science and Technology

1. A socratic dialogue over the utility of DNA language models https://www.owlposting.com/p/a-socratic-dialogue-over-the-utility

2. A torpor-like state in mice slows blood epigenetic aging and prolongs healthspan https://www.nature.com/articles/s43587-025-00830-4

3. “Magpies and Crows Are Using “Anti-Bird Spikes” to Make Their Nests.” https://www.audubon.org/magazine/apparently-magpies-and-crows-are-using-anti-bird-spikes-make-their-nests

4. The Hypercuriosity Theory of ADHD https://epsig.substack.com/p/the-hypercuriosity-theory-of-adhd

5. “Metacognition Broke My Nail-Biting Habit” https://www.lesswrong.com/posts/RW3B4EcChkvAR6Ydv/metacognition-broke-my-nail-biting-habit

6. What Did We Learn From Torturing Babies? https://marginalrevolution.com/marginalrevolution/2025/03/what-do-we-learn-from-torturing-babies.html

7. Was our universe born inside a black hole? https://www.space.com/space-exploration/james-webb-space-telescope/is-our-universe-trapped-inside-a-black-hole-this-james-webb-space-telescope-discovery-might-blow-your-mind

Intelligence

1. Intelligence is unequally distributed between individuals and between countries, and this even affects economic growth. https://schweizermonat.ch/the-worldwide-distribution-of-intelligence/

2. The search for a test that produces no racial differences in performance and still predicts performance on the job is a quest to find the impossible. https://www.sciencedirect.com/science/article/abs/pii/S0160289624000862

Politics

1. Alignment is EASY and Roko's Basilisk is GOOD?! AI Doom Debate with Roko Mijic https://www.youtube.com/watch?v=AY4jD26RntE

2. Most Externalities are Solved with Technology, Not Coordination https://www.maximum-progress.com/p/most-externalities-are-solved-with

3. “We Were Badly Misled About the Event That Changed Our Lives” https://www.nytimes.com/2025/03/16/opinion/covid-pandemic-lab-leak.html [no paywall: https://archive.is/iweAg]
Random sampling works better than you think: Gemini 1.5 = o1. The secret? Self-verification magically gets easier with scale.


Thinking for longer (e.g. o1) is only one of many axes of test-time compute. In a new Google paper, the authors instead focus on scaling the search axis.

By just randomly sampling 200 responses and self-verifying, Gemini 1.5 (an ancient early 2024 model!) beats o1-Preview and approaches o1. This is without finetuning, RL, or ground-truth verifiers.

This was surprising: search is bottlenecked by verification, and models are notoriously bad at self-verifying (think hallucinations) and self-consistency doesn't scale. The magic is that self-verification naturally becomes easier at scale! You'd expect that picking out a correct solution becomes harder the larger your pool of solutions is, but the opposite is the case!


Read more: https://eric-zhao.com/blog/sampling
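The sampling-plus-self-verification loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's method: `generate` and `verify` are hypothetical stand-ins for calls to the same model (one sampling a response at nonzero temperature, one judging whether a candidate is correct).

```python
# Minimal sketch of the "search axis" of test-time compute: draw N samples,
# have the model judge each one (self-verification), return the first pass.
# `generate` and `verify` are hypothetical stand-ins for real model calls.
import random
from typing import Optional

def generate(problem: str, rng: random.Random) -> str:
    # Stand-in for one temperature > 0 sample from the model.
    return f"candidate-{rng.randint(0, 9)}"

def verify(problem: str, answer: str) -> bool:
    # Stand-in for asking the same model "is this solution correct?".
    # Here we arbitrarily treat answers ending in 7 as "correct".
    return answer.endswith("7")

def best_of_n(problem: str, n: int = 200, seed: int = 0) -> Optional[str]:
    # Sample n candidates and keep the first that survives self-verification.
    rng = random.Random(seed)
    for _ in range(n):
        candidate = generate(problem, rng)
        if verify(problem, candidate):  # self-verification filters the pool
            return candidate
    return None  # no candidate passed verification
```

The post's surprising claim is about the `verify` step: rather than degrading as the candidate pool grows, self-verification becomes *more* reliable at scale.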
In this video, Atlas is demonstrating policies developed using reinforcement learning with references from human motion capture and animation. This work was done as part of a research partnership between Boston Dynamics and the Robotics and AI Institute (RAI Institute).
“Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 months.

These results appear robust. The authors were able to retrodict back to GPT-2. They further ran experiments on SWE-bench Verified and found a similar trend.

Read more: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
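Taken at face value, the doubling claim is easy to turn into a back-of-the-envelope extrapolation. This is a sketch only; the starting horizon below is illustrative, not METR's fitted value:

```python
# If the task horizon doubles every ~7 months, then after m months a
# starting horizon h0 grows to h0 * 2**(m / 7). Numbers are illustrative.
def task_horizon(h0_minutes: float, months: float,
                 doubling_months: float = 7.0) -> float:
    # Exponential growth with a fixed doubling time.
    return h0_minutes * 2 ** (months / doubling_months)

# 28 months is exactly 4 doublings, so a 60-minute horizon would become
# 60 * 16 = 960 minutes (16 hours).
```

The same one-liner makes the trend falsifiable: pick a dated horizon estimate, wait, and compare.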