Axis of Ordinary
Memetic and cognitive hazards.

Substack: https://axisofordinary.substack.com/
Links for 2025-03-11 (Part 1)

AI

1. “…the agent trained with CoT pressure still learns to reward hack; only now its cheating is undetectable by the monitor because it has learned to hide its intent in the chain-of-thought.” https://openai.com/index/chain-of-thought-monitoring/

2. R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning https://arxiv.org/abs/2503.05379

3. R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning https://arxiv.org/abs/2503.05592

4. Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models https://arxiv.org/abs/2503.06749

5. MALT: Improving Reasoning with Multi-Agent LLM Training https://arxiv.org/abs/2412.01928

6. LADDER is a framework enabling LLMs to recursively generate and solve progressively simpler variants of complex problems—boosting accuracy on mathematical integration. https://arxiv.org/abs/2503.00735

7. START: Self-taught Reasoner with Tools https://arxiv.org/abs/2503.04625

8. *ARC‑AGI Without Pretraining* – No pretraining. No datasets. Just pure inference-time gradient descent on the target ARC-AGI puzzle itself, solving 20% of the evaluation set. https://iliao2345.github.io/blog_posts/arc_agi_without_pretraining/arc_agi_without_pretraining.html

9. Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems https://arxiv.org/abs/2502.17019

10. Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks https://www.arxiv.org/abs/2503.04378

11. Differentiable Logic Cellular Automata https://google-research.github.io/self-organising-systems/difflogic-ca/

12. Token-Efficient Long Video Understanding for Multimodal LLMs https://research.nvidia.com/labs/lpr/storm/

13. The Manus Marketing Madness https://www.lesswrong.com/posts/ijSiLasnNsET6mPCz/the-manus-marketing-madness

14. What the Headlines Miss About the Latest Decision in the Musk vs. OpenAI Lawsuit https://www.lesswrong.com/posts/dnCdqxPh5JtPp78FP/what-the-headlines-miss-about-the-latest-decision-in-the

15. Mathematician Daniel Litt on what he learned from designing a problem for the FrontierMath benchmark and the ability of reasoning models like o3-mini-high to solve it https://x.com/littmath/status/1898461323391815820

16. Terence Tao: “My general sense is that for research-level mathematical tasks at least, current models fluctuate between "genuinely useful with only broad guidance from user" and "only useful after substantial detailed user guidance", with the most powerful models having a greater proportion of answers in the former category.” https://mathstodon.xyz/@tao/114139125505827565

17. Will AI be capable of producing an Annals-quality math paper for $100k by March 2030? https://manifold.markets/TamayBesiroglu/will-ai-be-capable-of-producing-ann

18. Mayo Clinic’s secret weapon against AI hallucinations: Reverse RAG in action https://venturebeat.com/ai/mayo-clinic-secret-weapon-against-ai-hallucinations-reverse-rag-in-action/

19. How Orakl Oncology is using DINOv2 to accelerate cancer treatment discovery https://ai.meta.com/blog/orakl-oncology-dinov2-accelerating-cancer-treatment/

20. "Not great for my comparative advantage, but from some experiments we have done at Rotman, I am totally convinced the vast majority of research that doesn't involve the physical world can be done more cheaply with AI & a little human intervention than by even good researchers. 1/7" https://x.com/Afinetheorem/status/1898822592594874598

21. Superintelligence Strategy https://www.nationalsecurity.ai/

22. The Nuclear-Level Risk of Superintelligent AI https://time.com/7265056/nuclear-level-risk-of-superintelligent-ai/

23. “Imagine if you could train one human for thousands of years to achieve unparalleled expertise, then make many copies. That’s what AI enables: spend heavily on training a single model, then cheaply replicate it. This creates a unique source of increasing returns at scale.” https://epoch.ai/blog/train-once-deploy-many-ai-and-increasing-returns
Links for 2025-03-11 (Part 2)

24. Currently, total AI cognitive effort is growing ~25x yearly—hundreds of times faster than human research effort (4% yearly). Once AI can meaningfully substitute for human research, total research growth (human+AI) will increase *dramatically*. https://www.forethought.org/research/preparing-for-the-intelligence-explosion
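For intuition, compounding those two headline growth rates for a few years (a toy calculation using the post's figures, nothing more) shows how quickly the gap widens:

```python
# Toy comparison of the post's headline growth rates: total AI cognitive
# effort growing ~25x per year vs human research effort ~4% per year.
def compound(factor_per_year: float, years: int) -> float:
    # Total growth after `years` of compounding at a fixed yearly factor.
    return factor_per_year ** years

ai_growth_3y = compound(25.0, 3)     # 15625.0 — AI effort multiplier after 3 years
human_growth_3y = compound(1.04, 3)  # ~1.125 — human effort multiplier after 3 years
```

At these rates, three years turns a modest head start in AI effort into a four-orders-of-magnitude gap over the human baseline.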

25. “Why I believe that the brain does something like gradient descent” https://medium.com/@kording/why-i-believe-that-the-brain-does-something-like-gradient-descent-27611c491205

26. “If we treat the brain as a neural network with optimized algorithms instead of as an artifact disconnected from the rest of AI research, we conclude the coming decade should see many new AI capabilities emerging as we continue closing the gap with the brain.” https://epoch.ai/gradient-updates/what-ai-can-currently-do-is-not-the-story

27. METR evaluated DeepSeek-R1’s ability to act as an autonomous agent. On generic SWE tasks it performs on par with o1-preview but worse than 3.5 Sonnet (new) or o1. Overall R1 is ~6 months behind leading US AI companies at agentic SWE tasks and is only a small improvement on V3. https://metr.github.io/autonomy-evals-guide/deepseek-r1-report/

28. The elicitation theory of post-training -- that base models already hold many capabilities which post-training draws out -- is remarkably simple to understand and will make it much easier for less technical folks to feel the AGI. https://www.interconnects.ai/p/elicitation-theory-of-post-training

29. Mathematical Foundations of Reinforcement Learning https://github.com/MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning

30. Deep Learning is Not So Mysterious or Different https://arxiv.org/abs/2503.02113

31. Can a 7B parameter model learn to solve Sudoku through pure reinforcement learning without any cold start data? A surprising yes! https://hrishbh.com/teaching-language-models-to-solve-sudoku-through-reinforcement-learning/

32. So how well is Claude playing Pokémon? https://www.lesswrong.com/posts/HyD3khBjnBhvsp8Gb/so-how-well-is-claude-playing-pokemon

33. PokéChamp: an Expert-level Minimax Language Agent https://arxiv.org/abs/2503.04094

34. Do reasoning models use their scratchpad like we do? Evidence from distilling paraphrases https://www.lesswrong.com/posts/ywzLszRuGRDpabjCk/do-reasoning-models-use-their-scratchpad-like-we-do-evidence

35. Factorio Learning Environment (FLE): A benchmark based on the game of Factorio, that tests agents in long-term planning, program synthesis, and resource optimization https://jackhopkins.github.io/factorio-learning-environment/

36. Russian scientists fuse reasoning models with drone-control models for thinking drones https://arxiv.org/abs/2503.01378v1

Neuroscience

1. Naturalistic Computational Cognitive Science: Towards generalizable models and theories that capture the full range of natural behavior https://arxiv.org/abs/2502.20349

2. Melbourne start-up launches 'biological computer' made of human brain cells https://www.abc.net.au/news/science/2025-03-05/cortical-labs-neuron-brain-chip/104996484 (product page: https://corticallabs.com/cl1.html)

3. Biological Neurons vs Deep Reinforcement Learning: Sample efficiency in a simulated game-world [published in 2022] https://openreview.net/forum?id=N5qLXpc7HQy

Science

1. Stanford researchers have developed an antibody duo therapy that neutralizes all SARS-CoV-2 variants by targeting two different parts of the virus simultaneously. https://www.science.org/doi/10.1126/scitranslmed.adq5720
1. The party that told Donald Trump, “We are not for sale,” just won the election in Greenland.

2. MAGA has made Canadian Libs great again—quite an achievement.

We're gonna win so much, you may even get tired of winning. And you'll say, 'Please, please. It's too much winning. We can't take it anymore. Mr. President, it's too much.'


— Donald Trump
Anthropic CEO Dario Amodei predicts AI writing 90% of code within 3-6 months and 100% of code within a year.

To be clear, I don't believe this. But it's an interesting and falsifiable prediction. Let's see what happens.
Gemini models for robotics:

- Completes precise tasks, like origami folding handled entirely by AI-driven robots

- Understands commands in everyday language.

- Quickly adapts if objects move or environment changes.

- Instantly handles new tasks and objects.

- Performs tasks it's never seen before, doubling other models’ performance.

- Understands objects’ shape and position instantly.

- Plans safe, accurate movements to grasp and manipulate objects.

- Works across various robots, including humanoid models like Apollo.

- Easily adapts to different robotic hardware.

Read more: https://deepmind.google/discover/blog/gemini-robotics-brings-ai-into-the-physical-world/
Links for 2025-03-14

AI

1. Meta Reinforcement Fine-Tuning (MRT): Training LLMs to make measurable progress with each step, not just reach correct answers. https://cohenqu.github.io/mrt.github.io/

2. Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models https://arxiv.org/abs/2503.09573

3. Inductive Moment Matching (IMM), a new class of generative models for one- or few-step sampling with a single-stage training procedure. https://arxiv.org/abs/2503.07565

4. AI Can Now Learn 100x Faster Without Wasting Energy https://www.tum.de/en/news-and-events/all-news/press-releases/details/new-method-significantly-reduces-ai-energy-consumption

5. LLM inference prices have fallen at rates from 9x to 900x per year, depending on the task https://epoch.ai/data-insights/llm-inference-price-trends?insight-option=All+benchmarks

6. Google unveils Gemma 3, a powerful model that runs on a single GPU https://blog.google/technology/developers/gemma-3/

7. A quarter of startups in YC’s current cohort have codebases that are almost entirely AI-generated https://www.youtube.com/watch?v=IACHfKmZMr8

8. Sam Altman reveals that OpenAI has trained a new model that delivers standout creative writing, capturing metafiction’s nuanced vibe. https://x.com/sama/status/1899535387435086115

9. OpenAI o1 and o3-mini now offer Python-powered data analysis in ChatGPT. https://x.com/OpenAI/status/1900308446211432484

10. OpenAI Nonprofit Buyout: Much More Than You Wanted To Know https://www.astralcodexten.com/p/openai-nonprofit-buyout-much-more

11. OpenAI submitted their policy proposal to the US government. They directly link fair use with national security, and said if China continues to have free access to data while 'American companies are left without fair use access, the race for AI is effectively over.' https://openai.com/global-affairs/openai-proposals-for-the-us-ai-action-plan/

12. “I believe it is a clear demonstration that misalignment likely does not stem from the model being “evil.” It simply found a better way to achieve its goal using unintended means.” https://www.lesswrong.com/posts/mpmsK8KKysgSKDm2T/the-most-forbidden-technique

13. Auditing language models for hidden objectives https://www.anthropic.com/research/auditing-hidden-objectives

14. Anthropic, and taking "technical philosophy" more seriously https://www.lesswrong.com/posts/7uTPrqZ3xQntwQgYz/untitled-draft-7csk

15. China steels itself for Donald Trump’s turmoil with ‘DeepSeek congress’ https://www.ft.com/content/8bdcf44d-7654-4bb5-ab15-5c69a4a998b7 [no paywall: https://archive.is/D3SMo]

Science and Technology

1. Low-Power Brain Chip Predicts Users’ Intentions https://spectrum.ieee.org/brain-computer-interface-2671224658

2. Lack of context modulation in human single neuron responses in the medial temporal lobe https://www.cell.com/cell-reports/fulltext/S2211-1247(24)01569-9

3. MIT engineers turn skin cells directly into neurons for cell therapy https://news.mit.edu/2025/mit-engineers-turn-skin-cells-into-neurons-for-cell-therapy-0313

4. East Asian personality may stem from Ice Age Siberia ~20,000 years ago https://psycnet.apa.org/fulltext/2025-88410-001.html

5. Have we passed peak intelligence? In international tests, student scores for reading and maths sank to a new low. https://x.com/_alice_evans/status/1900449985629487366

6. A new programming language called "Exo 2" could enable high-performance coding that can compete with state-of-the-art libraries with a few hundred lines of code, instead of tens or hundreds of thousands. https://news.mit.edu/2025/high-performance-computing-with-much-less-code-0313

7. “Committing fraud is, right now, a viable career strategy that can propel you at the top of the academic world.” https://statmodeling.stat.columbia.edu/2025/03/08/a-post-mortem-on-the-gino-case-committing-fraud-is-right-now-a-viable-career-strategy-that-can-propel-you-at-the-top-of-the-academic-world/
BotQ: A High-Volume Manufacturing Facility for Humanoid Robots

Initially designed to produce 12,000 robots/year, it will scale to support a fleet of 100,000 in the next four years.

Read more: https://www.figure.ai/news/botq
RAND Corporation:

First, AGI might enable a significant first-mover advantage via the sudden emergence of a decisive wonder weapon.


Read more: https://www.rand.org/pubs/perspectives/PEA3691-4.html
Terence Tao:

So all in all a pretty good assist from [o3-mini-high]; it made a mistake that I corrected, but I also made a mistake that it corrected, and code that would have taken perhaps an hour of my time on my own was generated, tested, modified, and reported in maybe ten minutes.

Source: https://mathstodon.xyz/@tao/114173696303072269

This wouldn't be an interesting observation if it weren't for the fact that many people insist that current AI models are useless. Yet more and more mathematicians are claiming that they are useful, or on the verge of being useful.

Also keep in mind that o3-mini-high isn't even the best existing model. And even the best model will pale in comparison to what will be available by the end of the year.
OpenAI CPO, Kevin Weil:

this is the year that AI gets better than humans at programming forever


Source: https://youtu.be/SnSoMh9m5hc?si=uyoRy7CEHg1pffCL
Links for 2025-03-17

AI

1. Agents Play Thousands of 3D Video Games: Traditional game AI requires extensive training or hand-coding. PORTAL differs by using LLMs to design the policy structure as an architect rather than to play directly as an actor; the LLM "writes code" that structures the tactics. https://zhongwen.one/projects/portal/

2. Microsoft has released this useful tool for performing R&D with LLM-based agents. https://github.com/microsoft/RD-Agent

3. A key step for making distributed training work at larger and larger models: Scaling Laws for DiLoCo. TL;DR: We can do LLM training across data centers in a way that scales incredibly well to larger and larger models! https://arxiv.org/abs/2503.09799

4. Compute-Optimal LLMs Provably Generalize Better with Scale https://openreview.net/forum?id=MF7ljU8xcf

5. Transformers without Normalization https://arxiv.org/abs/2503.10622

6. SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion https://arxiv.org/abs/2503.11576

7. Cerebras just announced 6 new AI datacenters that process 40M tokens per second — and it could be bad news for Nvidia https://venturebeat.com/ai/cerebras-just-announced-6-new-ai-datacenters-that-process-40m-tokens-per-second-and-it-could-be-bad-news-for-nvidia/

8. The tiny chips behind Amazon’s big AI investment https://www.semafor.com/article/03/14/2025/amazons-trainium-chips-to-be-tested-by-anthropic

9. AI Tools for Existential Security https://www.forethought.org/research/ai-tools-for-existential-security

10. System built by Google DeepMind team takes individual views and generates a set of group statements https://www.lesswrong.com/posts/j9K4Wu9XgmYAY3ztL/habermas-machine

11. Really powerful AI could wreck society by making governments too powerful https://arxiv.org/abs/2503.05710

12. Robin Hanson lost a bet that “Systems in GPT line will by 2025 make <$1B in customer revenue clearly tied to such systems.” https://x.com/robinhanson/status/1901329487532548511

Science and Technology

1. A socratic dialogue over the utility of DNA language models https://www.owlposting.com/p/a-socratic-dialogue-over-the-utility

2. A torpor-like state in mice slows blood epigenetic aging and prolongs healthspan https://www.nature.com/articles/s43587-025-00830-4

3. “Magpies and Crows Are Using “Anti-Bird Spikes” to Make Their Nests.” https://www.audubon.org/magazine/apparently-magpies-and-crows-are-using-anti-bird-spikes-make-their-nests

4. The Hypercuriosity Theory of ADHD https://epsig.substack.com/p/the-hypercuriosity-theory-of-adhd

5. “Metacognition Broke My Nail-Biting Habit” https://www.lesswrong.com/posts/RW3B4EcChkvAR6Ydv/metacognition-broke-my-nail-biting-habit

6. What Did We Learn From Torturing Babies? https://marginalrevolution.com/marginalrevolution/2025/03/what-do-we-learn-from-torturing-babies.html

7. Was our universe born inside a black hole? https://www.space.com/space-exploration/james-webb-space-telescope/is-our-universe-trapped-inside-a-black-hole-this-james-webb-space-telescope-discovery-might-blow-your-mind

Intelligence

1. Intelligence is unequally distributed between individuals and between countries, and this even affects economic growth. https://schweizermonat.ch/the-worldwide-distribution-of-intelligence/

2. The search for a test that produces no racial differences in performance and still predicts performance on the job is a quest to find the impossible. https://www.sciencedirect.com/science/article/abs/pii/S0160289624000862

Politics

1. Alignment is EASY and Roko's Basilisk is GOOD?! AI Doom Debate with Roko Mijic https://www.youtube.com/watch?v=AY4jD26RntE

2. Most Externalities are Solved with Technology, Not Coordination https://www.maximum-progress.com/p/most-externalities-are-solved-with

3. “We Were Badly Misled About the Event That Changed Our Lives” https://www.nytimes.com/2025/03/16/opinion/covid-pandemic-lab-leak.html [no paywall: https://archive.is/iweAg]
Random sampling works better than you think: Gemini 1.5 = o1. The secret? Self-verification magically gets easier with scale.


Thinking for longer (e.g. o1) is only one of many axes of test-time compute. In a new Google paper, the authors instead focus on scaling the search axis.

By just randomly sampling 200 responses and self-verifying, Gemini 1.5 (an ancient early 2024 model!) beats o1-Preview and approaches o1. This is without finetuning, RL, or ground-truth verifiers.

This was surprising: search is bottlenecked by verification, and models are notoriously bad at self-verifying (think hallucinations) and self-consistency doesn't scale. The magic is that self-verification naturally becomes easier at scale! You'd expect that picking out a correct solution becomes harder the larger your pool of solutions is, but the opposite is the case!


Read more: https://eric-zhao.com/blog/sampling
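The sampling-plus-self-verification loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's method: `generate` and `verify` are hypothetical stand-ins for calls to the same model (one sampling a response at nonzero temperature, one judging whether a candidate is correct).

```python
# Minimal sketch of the "search axis" of test-time compute: draw N samples,
# have the model judge each one (self-verification), return the first pass.
# `generate` and `verify` are hypothetical stand-ins for real model calls.
import random
from typing import Optional

def generate(problem: str, rng: random.Random) -> str:
    # Stand-in for one temperature > 0 sample from the model.
    return f"candidate-{rng.randint(0, 9)}"

def verify(problem: str, answer: str) -> bool:
    # Stand-in for asking the same model "is this solution correct?".
    # Here we arbitrarily treat answers ending in 7 as "correct".
    return answer.endswith("7")

def best_of_n(problem: str, n: int = 200, seed: int = 0) -> Optional[str]:
    # Sample n candidates and keep the first that survives self-verification.
    rng = random.Random(seed)
    for _ in range(n):
        candidate = generate(problem, rng)
        if verify(problem, candidate):  # self-verification filters the pool
            return candidate
    return None  # no candidate passed verification
```

The post's surprising claim is about the `verify` step: rather than degrading as the candidate pool grows, self-verification becomes *more* reliable at scale.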
In this video, Atlas is demonstrating policies developed using reinforcement learning with references from human motion capture and animation. This work was done as part of a research partnership between Boston Dynamics and the Robotics and AI Institute (RAI Institute).
“Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 months.

These results appear robust. The authors were able to retrodict back to GPT-2. They further ran experiments on SWE-bench Verified and found a similar trend.

Read more: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
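Taken at face value, the doubling claim is easy to turn into a back-of-the-envelope extrapolation. This is a sketch only; the starting horizon below is illustrative, not METR's fitted value:

```python
# If the task horizon doubles every ~7 months, then after m months a
# starting horizon h0 grows to h0 * 2**(m / 7). Numbers are illustrative.
def task_horizon(h0_minutes: float, months: float,
                 doubling_months: float = 7.0) -> float:
    # Exponential growth with a fixed doubling time.
    return h0_minutes * 2 ** (months / doubling_months)

# 28 months is exactly 4 doublings, so a 60-minute horizon would become
# 60 * 16 = 960 minutes (16 hours).
```

The same one-liner makes the trend falsifiable: pick a dated horizon estimate, wait, and compare.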