Axis of Ordinary

Zuckerberg predicts a mid-level coding engineer AI agent for 2025, calling it "potentially one of the most important innovations in history".

He also says that their reasoning models and larger model are looking good too.

Source: https://www.facebook.com/share/p/15n83eiPNF/

🥴6🤓3❤1👎1👏1

1.08K viewsedited 08:42

Axis of Ordinary

[2012] Google/DeepMind: We're going to build artificial general intelligence before 2030

Skeptics: It's just hype to attract investors

[2012] Google's driverless car passes the test for Nevada self-driving vehicles

[2016] AlphaGo defeats the world champion in Go

[2019] AlphaStar defeats professional StarCraft II players, demonstrating advanced strategic planning, real-time decision-making, and adaptability in a complex, dynamic environment

[2020] MuZero achieves superhuman performance in complex board games and learns to play Atari games without prior knowledge of their rules

[2020] AlphaChip accelerates and optimizes chip design and creates superhuman chip layouts that are now used in hardware around the world

[2020] AlphaFold accurately predicts the three-dimensional structures of proteins

[2022] Flamingo, a generalist visual language model, rapidly adapts its behaviour given just a handful of examples

[2022] AlphaTensor found a way to speed up a calculation at the heart of many different kinds of code, beating a 50-year record

[2023] AlphaDev discovers and optimizes fundamental computer science algorithms

[2023] FunSearch cracks a famous unsolved problem in pure mathematics

[2023] AlphaCode 2 reaches the 85th percentile on the Codeforces platform

[2023] RoboCat, a self-improving robotic agent, learns to solve new tasks on different robotic arms with as few as 100 demonstrations - and improves skills from self-generated training data

[2024] AlphaGeometry solves Olympiad geometry problems at a level approaching a human gold-medalist

[2024] AlphaProof achieves silver-medal standard solving International Mathematical Olympiad problems

[2024] SIMA is the first generalist AI agent to follow natural-language instructions in a broad range of 3D virtual environments and video games

[2024] Genie 2, a large-scale foundation world model, creates a path to unlimited environments for training and evaluating embodied agents

[2024] DeepMind scientists create a computerized insect that can walk and fly just like the real thing

[2024] DeepMind achieves human-level competitive robot table tennis

[2024] After 25.3 million autonomous miles driven, Google's Waymo vehicles have an 88% reduction in property damage claims and a 92% reduction in bodily injury claims compared to human drivers per mile driven

[2024] Veo 2 generative AI video model demonstrates a superior understanding of real-world physics, human motion and facial expressions

[2024] Google releases Flash Thinking, a new reasoning model with a one-million token context window

[2025] Google/DeepMind: We're on track to build artificial general intelligence before 2030.

Skeptics: It's just hype to attract investors.

🥱17👍16🤡3❤2💯2🤔1

1.04K views11:51

Axis of Ordinary

Links for 2025-01-31

AI:

1. TopoNets: High-Performing Vision and Language models with Brain-Like Topography https://toponets.github.io/

2. TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models https://arxiv.org/abs/2501.16937

3. Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling https://arxiv.org/abs/2501.11651

4. Clear evidence that adding a symbolic planner and world-model to a frontier LLM greatly increases its effectiveness at real-world tasks. See Figure 11: “Across all 10 environments, LLMs with Incalmo-WAP performed the worst, no LLMs achieved any part of a goal in all 10 environments. This suggests that Incalmo's action planner plays a significant role in enabling multistage attacks.” https://arxiv.org/abs/2501.16466

5. “Why do LLMs trained on over 90% English text perform so well in non-English languages? We find that they learn to share highly abstract grammatical concept representations, even across unrelated languages!” https://arxiv.org/abs/2501.06346v1

6. Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs https://arxiv.org/abs/2501.18585

7. Large Language Models Think Too Fast To Explore Effectively https://arxiv.org/abs/2501.18009

8. Scaling the Tülu 3 post-training recipes to surpass the performance of DeepSeek V3 https://allenai.org/blog/tulu-3-405B

9. Mistral Small 3: Apache 2.0, 81% MMLU, 150 tokens/s https://mistral.ai/news/mistral-small-3/

10. Emad Mostaque says by next year, AI reasoning models like OpenAI's o1 and DeepSeek's R1 will run on smartphones and perform PhD-level tasks with 20 watts of electricity - equivalent to the human brain https://youtu.be/lY8Ja00PCQM?si=n2KGTr4DgBs5XdqN&t=1258

11. OpenAI in Talks for Huge Investment Round Valuing It Up to $300 Billion https://www.wsj.com/tech/ai/openaiin-talks-for-huge-investment-round-valuing-it-up-to-300-billion-2a2d4327 [no paywall: https://archive.is/Mk9CX]

AI politics:

1. “This is not another social media. This is not another smartphone. This is something altogether different. Titanic things, beyond everyone’s grasp, are happening.” https://www.hyperdimensional.co/p/novus-ordo-seclorum

2. “What happens once AIs make humans obsolete? Even without AIs seeking power, we argue that competitive pressures will fully erode human influence and values.” https://gradual-disempowerment.ai/

3. EU Commission officially endorsed pursuing a CERN for AI. https://cfg.eu/building-cern-for-ai/

Science and Technology:

1. Andreas Holmstrom explains how every electron is a tiny world full of number theory. https://www.youtube.com/watch?v=-OxVsVUesSc

2. “Samples from asteroid Bennu contain A, G, C, T & U nucleotide bases, and 14 of the 20 amino acids used by life — but while we use only left-handed versions of these molecules, Bennu has equal L/R, puncturing theories that the bias on Earth came from an initial cosmic seeding.” — Greg Egan https://www.nature.com/articles/d41586-025-00264-3 [no paywall: https://archive.is/Zw2gh]

3. There's a class of binary star system that causes recurrent novae: A white dwarf star draws material from its larger partner onto its surface, leading to a thermonuclear runaway that it survives and repeats every few decades to centuries. https://www.cambridge.org/core/journals/proceedings-of-the-international-astronomical-union/article/recurrent-novae-what-do-we-know-about-them/476D81C8EC05DEBC1381CC5B72FC78DA

4. Sam Altman’s Helion raises $425M to help build a fusion reactor for Microsoft https://techcrunch.com/2025/01/28/helion-raises-425m-to-help-build-a-fusion-reactor-for-microsoft/

👍1

1.43K views15:21

Axis of Ordinary

o3-mini is now available to all users: https://openai.com/index/openai-o3-mini/

🤣11🔥4

1.05K views20:24

Axis of Ordinary

World Models on Pre-trained Visual Features enable Zero-shot Planning

A method that allows a robot to learn a “world model” during training and then uses that model at test time (when it’s actually operating) to figure out what actions to take—all without any extra teaching or rewards for the new task (zero-shot planning). The work is a step toward creating agents that are not limited to one specific task. Instead, they can adapt to a variety of tasks using the same underlying model of the world.

It uses an image recognition model that has already been trained on a huge number of images. This is an example of how machine learning research from different directions can be combined to create more powerful systems (another example is combining language models with reinforcement learning to create inference models).

Instead of working with raw images (which are huge and complex), the method works in a “latent space”—a simplified, abstract version of the world.
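The plan-in-latent-space idea described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the project's code: `encode` is a fixed random projection standing in for a frozen pre-trained image model, `predict_next` is a hand-written stand-in for a learned world model, and random shooting is swapped in as the simplest possible planner.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a frozen, pre-trained visual encoder:
# a fixed linear projection from an 8-dim "image" to a 2-dim latent.
W_ENC = rng.normal(size=(2, 8))


def encode(observation):
    """Map a raw observation to its latent representation."""
    return W_ENC @ observation


def predict_next(latent, action):
    """Hypothetical stand-in for the learned world model:
    in this toy world an action simply shifts the latent state."""
    return latent + action


def plan(start_latent, goal_latent, horizon=5, n_candidates=512):
    """Random-shooting planner: sample action sequences, roll each one out
    with the world model in latent space, and keep the sequence whose
    final latent lands closest to the goal latent."""
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, 2))
    best_cost, best_actions = np.inf, None
    for actions in candidates:
        latent = start_latent
        for action in actions:
            latent = predict_next(latent, action)
        cost = float(np.sum((latent - goal_latent) ** 2))
        if cost < best_cost:
            best_cost, best_actions = cost, actions
    return best_actions, best_cost


# No reward and no task-specific training: the goal is given only as a latent.
start = encode(np.zeros(8))
goal = np.array([1.0, -0.5])
actions, cost = plan(start, goal)
print(actions.shape, cost)
```

The point of the sketch is that the new task enters only through `goal`; the encoder and the world model are reused unchanged.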

Project site: https://dino-wm.github.io/

974 viewsedited 08:51

Axis of Ordinary

Are you primarily thinking in latent space or language space?

Do you think in concepts, images, or abstract relationships that aren't immediately translated into words, or do you process information in a more linear, sentence-by-sentence way?

Anonymous Poll

233 voters999 views14:17

Axis of Ordinary

Lithuanian traditional polyphonic songs, known as sutartinės.

"These songs are characterised by a specific musical language, archaic texts and elements of ritual choreography. Most of the sutartinės songs have the following features: (1) linear polyphony; intertwining voices with regular or frequent harmonization at the interval of second; (2) narrow melodic range and limited number of scale steps; (3) polyrhythms and rhythmic complementarity with frequent syncopation; (4) two different texts performed simultaneously; (5) stanzaic structure, where a stanza consists of a meaningful text and a constantly recurring refrain of asemantic words or syllables; (6) the syncretic nature of the performance, where music, text and movement are closely linked."

https://en.wikipedia.org/wiki/Lithuanian_folk_music

👍13❤7🔥2👻1

1.29K views15:21

Axis of Ordinary

Transformers can overcome easy-to-hard and length generalization challenges through recursive self-improvement.

Quote: "Scaling this weak-to-strong training approach yields (seemingly) unbounded improvements in both length and hardness generalization, allowing models to solve problem instances far exceeding the difficulty of those in the training data distribution...Our results show that careful self-supervision allows small transformers to transcend superficial pattern matching failures and learn multi step algorithms."

Talk: https://www.youtube.com/watch?v=szhEnXiSjJY

Paper on arXiv coming on Monday.

👍2😐2

1.31K views09:19

Axis of Ordinary

Links for 2025-02-02

AI:

1. Figure Plans To Ship 100,000 Humanoid Robots Over Next 4 Years https://www.forbes.com/sites/johnkoetsier/2025/01/30/figure-plans-to-ship-100000-humanoid-robots-over-next-4-years/

2. “Everyone is sleeping on the *collective* advantages AIs will have, which have nothing to do with raw IQ: they can be copied, distilled, merged, scaled, and evolved in ways humans simply can't.” https://www.dwarkeshpatel.com/p/ai-firm

3. Cerebras Becomes the World’s Fastest Host for DeepSeek R1, Outpacing Nvidia GPUs by 57x https://venturebeat.com/ai/cerebras-becomes-the-worlds-fastest-host-for-deepseek-r1-outpacing-nvidia-gpus-by-57x/

4. "Has Europe’s great hope for AI missed its moment? Mistral AI was hailed as a potential global leader in the technology. But it has lost ground to US rivals—& now China’s emerging star" (low on equity, revenue, compute, scale) https://www.ft.com/content/fa8bad75-dc55-47d9-9eb4-79ac94e54d82 [no paywall: https://archive.is/ragEs]

5. The Failed Strategy of Artificial Intelligence Doomers https://www.lesswrong.com/posts/YqrAoCzNytYWtnsAx/the-failed-strategy-of-artificial-intelligence-doomers

6. This Autonomous Drone Can Track Humans Through Dense Forests at High Speed https://singularityhub.com/2025/01/31/this-autonomous-drone-can-track-humans-through-dense-forests-at-high-speed/

7. Reasoning + Tool Use: https://www.reddit.com/r/OpenAI/comments/1ieonxv/comment/maa05ic/ (Note: o3-mini got 32% on Frontier Math (!) when given access to use a Python tool. https://openai.com/index/openai-o3-mini/)

8. “The progress with our Gemini reasoning models is actually wild, we are in the GPT-2 era of scaling reasoning!” https://x.com/OfficialLoganK/status/1885374062098018319

9. Stanford CS234: Reinforcement Learning Lectures https://www.youtube.com/playlist?list=PLoROMvodv4rOSOPzutgyCTapiGlY2Nd8u

Science:

1. New Study Uncovers Key Mechanism Behind Learning and Memory https://news.cuanschutz.edu/news-stories/new-study-uncovers-key-mechanism-behind-learning-and-memory

2. Stem cells used to partially repair damaged hearts https://arstechnica.com/science/2025/01/stem-cells-used-to-partially-repair-damaged-hearts/

3. New study looking at ancient DNA from Eastern Eurasian populations obtains and analyzes polygenic scores and finds "positive selection for cognitive-related traits such as IQ." https://www.cambridge.org/core/journals/twin-research-and-human-genetics/article/abs/directional-selection-and-evolution-of-polygenic-traits-in-eastern-eurasia-insights-from-ancient-dna/10AE9628ED6E7F2B4B1E72F30D64D4AA

❤2👍2

931 views16:12

Axis of Ordinary

Links for 2025-02-03

AI:

1. OpenAI Deep Research is a new agentic AI designed to synthesize large amounts of online information and execute multi-step research tasks autonomously. Leveraging advanced reasoning capabilities, it can transform complex, time-consuming problems into well-researched solutions in as little as 10–30 minutes—a process that might take human experts, such as PhD-level researchers, over 10 hours. https://openai.com/index/introducing-deep-research/

2. Stanford presents s1: Simple test-time scaling — Seeks the simplest approach to achieve test-time scaling and strong reasoning performance; Exceeds o1-preview on competition math questions by up to 27% (MATH and AIME24); Model, data, and code are open-source https://arxiv.org/abs/2501.19393

3. Facebook figures out a zero-training way to massively improve LLM performance: Unlike conventional approaches that require training specialized models on large amounts of task-specific multimodal data, MILS directly “upgrades” an off-the-shelf LLM into a multimodal solver by exploiting its reasoning capabilities. https://arxiv.org/abs/2501.18096

4. Using multiple AI agents fact-checking each other reduced hallucination scores by ~2,800% across 310 test cases https://arxiv.org/abs/2501.13946

5. Scalable-Softmax Is Superior for Attention: SSMax significantly enhances the model’s performance on tasks involving long input sequences. It can be integrated into existing Transformer-based models without requiring major architectural changes. https://arxiv.org/abs/2501.19399

6. Heima: An efficient reasoning framework that leverages reasoning CoTs at hidden latent space https://arxiv.org/abs/2501.19201

7. DeepMind figures out a way to make it 100X more bandwidth-efficient to train models in a distributed way https://arxiv.org/abs/2501.18512v1

8. R1-V: Reinforcing Super Generalization Ability in Vision Language Models with Less Than $3 https://github.com/Deep-Agent/R1-V

9. In an OpenAI Event at University of Tokyo Sam Altman discussed the future direction of development: "GPT-5 and GPT-6, [...], will utilize reinforcement learning and will be like discovering new science, such as new algorithms, physics, and biology." https://x.com/houseiwang/status/1886224083630915872

10. “R1 is just the latest data point indicating that superhuman AI will be easier and cheaper to build than most people think, and won't be monopolized.” https://milesbrundage.substack.com/p/the-real-lesson-of-deepseeks-r1

11. “I find it very difficult to ask o1 pro an economics question it cannot answer...In an economics test, or any other kind of naturally occurring knowledge test I can think of, it would beat all of you (and me). Its rate of hallucination is far below what you are used to from other LLMs.” https://marginalrevolution.com/marginalrevolution/2025/02/o1-pro.html

12. Chinese paper about AI as a catastrophic [existential?] risk: “If such a worst-case risk is let unknown to the human society, we would eventually lose control over the frontier AI systems: They would take control over more computing devices, form an AI species and collude with each other against human beings.” https://www.arxiv.org/abs/2412.12140

Science:

1. UChicago scientists have invented a soft, flexible semiconductor capable of transmitting information from living tissue to electronics. This major bioelectronics breakthrough could lead to better brain-machine interfaces, biosensors and pacemakers. https://news.uchicago.edu/story/bioelectronics-breakthrough-scientists-create-soft-flexible-semiconductors

2. Ultrahigh Specific Strength by Bayesian Optimization of Carbon Nanolattices https://advanced.onlinelibrary.wiley.com/doi/10.1002/adma.202410651

3. January 2025 was quite unexpectedly the warmest January on record at 1.75°C above preindustrial, beating the prior record set in 2024. This is despite the presence of La Niña conditions in the tropical Pacific, with the El Niño event of 2023/2024 long faded. https://www.theclimatebrink.com/p/january-sets-an-unexpected-temperature

👍4

957 views15:36

Axis of Ordinary

"Our greatest glory is not in never falling, but in rising every time we fall." – Confucius

https://project-instinct.github.io/

🙏3😁1🐳1

1.06K views21:21

Axis of Ordinary

The largest medical AI randomized controlled trial yet performed, enrolling >100,000 women undergoing mammography screening, was just published.

The use of AI led to 29% higher detection of cancer, no increase of false positives, and reduced workload compared with radiologists without AI.

Paper: https://thelancet.com/journals/landig/article/PIIS2589-7500(24)00267-X/fulltext

👍18

2.38K views03:20

O2 (the company, not the skipped GPT version number) has announced Daisy, a language model of its own. It answers fraudulent phone calls in real time, wasting the scammer’s time by impersonating a vulnerable elderly person.

https://news.virginmediao2.co.uk/o2-unveils-daisy-the-ai-granny-wasting-scammers-time/

🥴15❤7👏5😁3

1.45K views14:24

Axis of Ordinary

Interesting developments: https://x.com/adcock_brett/status/1886860098980733197

Related:
1. OpenAI files a trademark application for humanoid robots https://www.businessinsider.com/openai-trademark-humanoid-robots-vr-headsets-sam-altman-hardware-2025-2
2. OpenAI is hiring robotics engineers https://x.com/kalinowski007/status/1877809579154948223

🤔2

2.7K views20:30

Axis of Ordinary

Physical Intelligence (π) is open sourcing π_0, the first general-purpose robotic foundation model: https://www.pi.website/blog/openpi

👍3👏1

1.2K views20:36

Axis of Ordinary

ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills

Open source: https://agile.human2humanoid.com/

👍7

1.34K views20:51

Axis of Ordinary

MaestroMotif: A method for AI-assisted skill design that produces highly capable and steerable hierarchical agents.

The first method that, without expert-labeled datasets, solves compositional tasks requiring hundreds of steps for completion. All the modules within MaestroMotif are learned from interaction: from the highest level of planning to the lowest level of sensorimotor control. At the heart of MaestroMotif is the idea that decomposing a task into subtasks significantly helps decision making.

Read more: https://github.com/mklissa/maestromotif

1K views16:25

Axis of Ordinary

VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models https://arxiv.org/abs/2502.02492

2.88K views17:33

Axis of Ordinary

Links for 2025-02-05

AI:

1. A step towards robust jailbreak defenses: “After thousands of hours of red teaming, not one participant found a reliable jailbreak that extracted detailed information across a set of 10 harmful questions.” https://www.anthropic.com/research/constitutional-classifiers

2. Google presents: Scaling Embedding Layers in Language Models —Outperforms a 1.9B parameter baseline across diverse corpora, while using only half the inference time FLOPS https://arxiv.org/abs/2502.01637

3. Improving Transformer World Models for Data-Efficient RL —super-human-level performance on the challenging Craftax-classic benchmark, an open-world 2D survival game https://arxiv.org/abs/2502.01591

4. Process Reinforcement through Implicit Rewards—PRIME achieves a 15.1% average improvement across several key reasoning benchmarks over the SFT model https://arxiv.org/abs/2502.01456

5. SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training https://arxiv.org/abs/2501.17161

6. First large-scale, verifiable code training dataset (ACECODE-89K) with automatically generated test cases. This is a major step towards enabling more effective RL for code generation. https://arxiv.org/abs/2502.01718

7. Reinforcement Learning for Long-Horizon Interactive LLM Agents — ”Our analysis reveals a variety of behavioral patterns that emerge in the course of training…” https://arxiv.org/abs/2502.01600

8. Natural Language Reinforcement Learning —enabling the training of chain-of-thought language policies and language value function (known as generative value) solely from environment feedback, without human or stronger model's labels. https://github.com/waterhorse1/Natural-language-RL

9. Chain-of-Associated-Thoughts (CoAT) is a new framework that enhances LLMs' reasoning abilities by combining Monte Carlo Tree Search with dynamic knowledge integration. https://arxiv.org/abs/2502.02390

10. Language Models Use Trigonometry to Do Addition https://www.lesswrong.com/posts/E7z89FKLsHk5DkmDL/language-models-use-trigonometry-to-do-addition-1

11. S1: The $6 R1 Competitor? https://timkellogg.me/blog/2025/02/03/s1

12. Sam Altman says the leap from GPT-4 to GPT-5 will be as big as that of GPT-3 to 4 and the plan is to integrate the GPT and o series of models into one model that can do everything https://youtu.be/qyTOVq31JIE?si=TzSFM3W45hPCSXZ1&t=741

13. Sam Altman: “finally for the first time, I think the models that are on the near term horizon, um the models that will release in the coming months, are over the threshold of being good enough to really address these problems and now people just have to go build the solutions” https://www.youtube.com/live/8vHr_8k8IbM?si=HUFjdfZvkPG921Te&t=3446

14. Decoding can be just as good as regular pointwise heads for regression, but you also get density estimation for free. https://arxiv.org/abs/2501.19383

15. Google DeepMind released a book on scaling language models on TPUs. https://jax-ml.github.io/scaling-book/index

16. Time to democratize humanoid robots! ToddlerBot, a low-cost ($6K), open-source humanoid for robotics and AI research. https://toddlerbot.github.io/

AI politics:

1. How to Rapidly Build Gigawatt-Scale AI Clusters in the United States https://ifp.org/special-compute-zones/

2. Palantir CTO Shyam Sankar says the US is in a winner-take-all AI arms race and war with China and DeepSeek has made it clear that "the time to mobilize has come" https://www.youtube.com/live/MW0zvoEMdRA?si=mmuKS3myNpSeSufO&t=2025

3. 93% of IT Leaders Plan to Deploy AI Agents by 2026 https://www.zdnet.com/article/93-of-it-leaders-will-implement-ai-agents-in-the-next-two-years/

Science:

1. Scientists ‘mimic real biological processes’ using synthetic neurons https://news.northwestern.edu/stories/2025/01/scientists-mimic-real-biological-processes-using-synthetic-neurons

2. Necessity of complex numbers https://www.youtube.com/watch?v=f079K1f2WQk

3. The chance of asteroid 2024 YR4 hitting our planet in 2032 is now 1.5%, or 1 in 67. https://x.com/Astro_Jonny/status/1886742128199336362

❤2👍1

1.13K views18:19
