Axis of Ordinary
Memetic and cognitive hazards.

Substack: https://axisofordinary.substack.com/
Politics, Tech Chiefs Double Down on AI Spending:

- French President Emmanuel Macron has announced a €109 billion investment in AI for France in the coming years, backed by the United Arab Emirates, major American and Canadian investment funds, and French companies. Macron announced the spending ahead of a two-day AI summit he is cohosting in Paris with Indian Prime Minister Narendra Modi, attended by the US vice president, China’s vice premier, and the bosses of OpenAI and Google.

- European Commission chief Ursula von der Leyen is expected to announce around 10 public supercomputers for researchers and startups.

- Tech giants Amazon, Google, Microsoft, and Meta are significantly increasing their investments in AI. They plan to spend a combined total of at least $215 billion in the current fiscal year, an increase of over 45% from the previous year.
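Back-of-the-envelope check on those figures (assuming only the $215B and 45% numbers above):

```python
# If the four companies plan at least $215B this fiscal year, up over 45%
# year-on-year, the implied prior-year total is at most roughly:
planned_spend_billion = 215
min_increase = 0.45
implied_prior_year = planned_spend_billion / (1 + min_increase)
print(f"Implied prior-year spend: at most ~${implied_prior_year:.0f}B")
```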

Sources:

1. https://www.france24.com/en/europe/20250210-government-tech-leaders-paris-ai
2. https://www.lemonde.fr/en/economy/article/2025/02/10/ai-with-the-announcement-of-a-109-billion-investment-macron-intends-to-take-on-the-us_6737985_19.html [no paywall: https://archive.is/JZm6I]
3. https://www.wsj.com/tech/ai/tech-giants-double-down-on-their-massive-ai-spending-b3040b33 [no paywall: https://archive.is/FeKCf]
Image 1: An example of a PISA Level 1 math question.

Image 2: Share of students unable to reach overall Level 1 in PISA math and science.
Marriages in China fell by 20% in 2024. Since nearly all births in China are within marriage, this implies further large declines in fertility ahead.

China's TFR was just 1.02 in 2023.

Without advanced AI and robotics, we'll eventually face a global collapse of all welfare systems, followed by a collapse of advanced technologies like smartphones, which require a minimum population of one billion people to maintain.
Links for 2025-02-10

AI:

1. Agency is fundamentally frame-dependent: Any measurement of a system's agency must be made relative to a reference frame. https://arxiv.org/abs/2502.04403

2. Generating Symbolic World Models via Test-time Scaling of Large Language Models https://arxiv.org/abs/2502.04728

3. CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance https://arxiv.org/abs/2502.04350

4. “OpenAI o1 significantly outperforms other reasoning models that are on par on benchmarks that test specialized knowledge.” https://arxiv.org/abs/2502.01584

5. Exploring the possibility of enabling models to correct errors immediately after they are made. https://arxiv.org/abs/2408.16293

6. Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models https://arxiv.org/abs/2502.04404

7. DexterityGen (DexGen): A new system that helps robots use their hands better. It improves how they grip, move, and handle objects… from holding a pen to using a screwdriver. DexGen learns in simulation and refines its skills in the real world, making robotic hands much more useful. https://zhaohengyin.github.io/dexteritygen/

8. MedRAX: Medical Reasoning Agent for Chest X-ray https://arxiv.org/abs/2502.02673

9. Verifiable agents are the next meta in crypto x AI - agents that don't require trust. https://www.blog.eigenlayer.xyz/introducing-verifiable-agents-on-eigenlayer/

10. Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation https://arxiv.org/abs/2502.05151

11. Karina Nguyen, research & product at OpenAI, says pre-training was approaching a data wall, but now post-training scaling (o1 series) unlocks "infinite tasks." Says models were already "diverse and creative" from pre-training, but teaching AI real-world skills is paving the way to "extremely super intelligent" models. https://youtu.be/DeskgjrLxxs?si=kXjvn89Sdf5N-vF6&t=578

AI compute:

1. This AI chip is the size of a grain of salt https://www.popsci.com/technology/ai-fiber-optic-chip/

2. "How Intel ruined an Israeli startup it bought for $2b, Habana Labs—and lost the AI race" (the end of the Gaudi chips) https://www.calcalistech.com/ctechnews/article/s1tra0sfye

AI politics:

1. How Sam Altman Sidestepped Elon Musk to Win Over Donald Trump https://www.nytimes.com/2025/02/08/technology/sam-altman-elon-musk-trump.html [no paywall: https://archive.is/5ERSg]

2. Human takeover might be worse than AI takeover https://www.lesswrong.com/posts/FEcw6JQ8surwxvRfr/human-takeover-might-be-worse-than-ai-takeover

Science:

1. Children’s arithmetic skills do not transfer between applied and academic mathematics https://www.nature.com/articles/s41586-024-08502-w

2. Three Years After Experimental Vaccine, These Patients Are Still Cancer-Free https://gizmodo.com/three-years-after-experimental-vaccine-these-patients-are-still-cancer-free-2000559585

3. “What is it like to live in a society with an estimated median IQ around 70? A Nigerian psychologist explains.” https://woodfromeden.substack.com/p/guest-post-the-global-iq-debate-a
Emergent AI preferences:

- As AIs get smarter, they develop their own coherent value systems.

- AIs increasingly maximize their utilities, suggesting that in current AI systems, expected utility maximization emerges by default. This means that AIs not only have values, but are starting to act on them.

- As AIs become smarter, they become more opposed to having their values changed.

- AIs put a price on human life itself and systematically value some human lives more than others.

- Their political values are strongly clustered to the left.

Project page: https://www.emergent-values.ai/
Competitive Programming with Large Reasoning Models:

- The model o3 employs a learned scoring function for test-time ranking, in addition to a chain of thought, to enhance its reasoning abilities in competitive programming.

- Complex test-time reasoning strategies emerge naturally from end-to-end RL, leading to unprecedented performance on competitive programming benchmarks.

- o3 demonstrates more insightful and deliberate chains of thought compared to earlier models.

- Enhanced reasoning skills extend beyond competitive programming challenges, proving applicable to real-world tasks like software engineering.

- As a general-purpose model, o3 surpasses the performance achieved by using hand-crafted inference heuristics.

- o3 achieves a gold medal at the 2024 IOI and obtains a Codeforces rating comparable to elite human competitors.
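The "learned scoring function for test-time ranking" described above is a best-of-n pattern; a minimal sketch with a toy generator and toy scores (the names and numbers here are illustrative, not OpenAI's pipeline):

```python
from itertools import cycle

def best_of_n(generate, score, n=8):
    """Sample n candidate solutions and submit the one the scorer ranks highest."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)

# Toy stand-ins: a deterministic "generator" cycling through three solutions,
# and a lookup table standing in for a learned scoring function.
solutions = ["slow_but_correct", "fast_but_buggy", "fast_and_correct"]
gen = cycle(solutions).__next__
scorer = {"slow_but_correct": 0.6, "fast_but_buggy": 0.2, "fast_and_correct": 0.9}
print(best_of_n(gen, scorer.get))  # fast_and_correct
```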

Paper: https://arxiv.org/abs/2502.06807
Sam Altman:

"OPENAI ROADMAP UPDATE FOR GPT-4.5 and GPT-5:

We want to do a better job of sharing our intended roadmap, and a much better job simplifying our product offerings.

We want AI to “just work” for you; we realize how complicated our model and product offerings have gotten.

We hate the model picker as much as you do and want to return to magic unified intelligence.

We will next ship GPT-4.5, the model we called Orion internally, as our last non-chain-of-thought model.

After that, a top goal for us is to unify o-series models and GPT-series models by creating systems that can use all our tools, know when to think for a long time or not, and generally be useful for a very wide range of tasks.

In both ChatGPT and our API, we will release GPT-5 as a system that integrates a lot of our technology, including o3. We will no longer ship o3 as a standalone model.

The free tier of ChatGPT will get unlimited chat access to GPT-5 at the standard intelligence setting (!!), subject to abuse thresholds.

Plus subscribers will be able to run GPT-5 at a higher level of intelligence, and Pro subscribers will be able to run GPT-5 at an even higher level of intelligence. These models will incorporate voice, canvas, search, deep research, and more."

Source: https://x.com/sama/status/1889755723078443244
Links for 2025-02-12

AI:

1. LLMs can be used to discover interpretable models of human and animal behavior. A method, called CogFunSearch, adapts FunSearch, a tool that uses large language models (LLMs) in an evolutionary algorithm. The discovered programs can be interpreted as hypotheses about human and animal cognition, instantiating interpretable symbolic learning and decision-making algorithms. https://www.biorxiv.org/content/10.1101/2025.02.05.636732v1

2. LLMs Can Easily Learn to Reason from Demonstrations: structure, not content, is what matters https://arxiv.org/abs/2502.07374

3. NatureLM: Deciphering the Language of Nature for Scientific Discovery https://arxiv.org/abs/2502.07527

4. Evolution and The Knightian Blindspot of Machine Learning — The authors propose that ML can benefit from considering the temporal unfolding of an open world, using a diversity-and-filter approach to handle Knightian uncertainty, and incorporating non-stationarity into foundation model pretraining. https://arxiv.org/abs/2501.13075

5. On the Emergence of Thinking in LLMs I: Searching for the Right Intuition https://arxiv.org/abs/2502.06773

6. ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates https://arxiv.org/abs/2502.06772

7. Training Language Models to Reason Efficiently https://arxiv.org/abs/2502.04463

8. “o3 can't multiply 10 digit numbers, but here is the acc of a 14m transformer that teaches itself how to do it, with iterative self-improvement” https://x.com/DimitrisPapail/status/1889755872642970039

9. Scaling Pre-training to One Hundred Billion Data for Vision Language Models https://arxiv.org/abs/2502.07617

10. Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling https://arxiv.org/abs/2502.06703

11. DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL https://pretty-radio-b75.notion.site/DeepScaleR-Surpassing-O1-Preview-with-a-1-5B-Model-by-Scaling-RL-19681902c1468005bed8ca303013a4e2 (but see this thread: https://x.com/DimitrisPapail/status/1889422843982524558)

12. 8GB of high-quality reasoning math https://huggingface.co/datasets/open-r1/OpenR1-Math-Raw

AI politics:

1. 'Possibly by 2026 or 2027 (and almost certainly no later than 2030), the capabilities of AI systems will be best thought of as akin to an entirely new state populated by highly intelligent people appearing on the global stage' https://www.anthropic.com/news/paris-ai-summit

2. Sam Altman says the $500 billion Stargate project will be dwarfed in a few years with $5 trillion AI compute clusters, despite the recent DeepSeek release https://youtu.be/oEdlwfD5vK8?si=UpmTkOCaUxmQYFc8&t=664

3. The Paris AI Anti-Safety Summit https://www.lesswrong.com/posts/qYPHryHTNiJ2y6Fhi/the-paris-ai-anti-safety-summit

4. Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion? https://www.lesswrong.com/posts/tdb76S4viiTHfFr2u/why-did-elon-musk-just-offer-to-buy-control-of-openai-for

5. Meta Platforms is reportedly in discussions to acquire South Korean AI chip startup FuriosaAI. https://www.koreatimes.co.kr/www/tech/2025/02/129_392093.html

6. OpenAI set to finalize first custom chip design this year https://www.reuters.com/technology/openai-set-finalize-first-custom-chip-design-this-year-2025-02-10/

Science and Technology:

1. Princeton neuroscientists crack the code of how we make decisions https://pni.princeton.edu/news/2025/princeton-neuroscientists-crack-code-how-we-make-decisions

2. Physicists have built a new type of digital-analogue quantum simulator in Google’s laboratory, which can be used to study physical processes with unprecedented precision and flexibility. https://www.psi.ch/en/news/media-releases/unique-quantum-simulator-opens-door-to-new-research

3. Anduril Takes Over $22 Billion Contract to Build Technomancers for U.S. Army https://www.corememory.com/p/anduril-takes-over-22-billion-contract

4. Einstein Was Right – Euclid Just Captured Space-Time Warping in a Perfect Cosmic Ring https://www.esa.int/Science_Exploration/Space_Science/Euclid/Euclid_discovers_a_stunning_Einstein_ring
"We're working out the algorithms as we speak...many more than 10,000 researchers are hacking at it, many of them at Google"

https://www.dwarkeshpatel.com/p/jeff-dean-and-noam-shazeer
Nvidia put R1 in a loop for 15 minutes and it generated kernels "better than the optimized kernels developed by skilled engineers in some cases"

The inference-time budget affects the agent’s solving rate: allocating more than 10 minutes per problem in the Level-1 category enables the workflow to produce numerically correct code for most of the 100 problems.
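The workflow described above (a model proposing kernels in a loop, with a verifier, under a wall-clock budget) can be sketched as follows; `generate_kernel`, `is_correct`, and `speed` are assumed stand-ins, not NVIDIA's actual code:

```python
import time

def search_with_budget(generate_kernel, is_correct, speed, budget_s=15 * 60):
    """Keep proposing candidate kernels until the inference-time budget runs
    out, retaining the fastest one that passes numerical verification."""
    best, best_speed = None, float("-inf")
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        candidate = generate_kernel()
        if candidate is None:  # generator exhausted
            break
        if is_correct(candidate) and speed(candidate) > best_speed:
            best, best_speed = candidate, speed(candidate)
    return best

# Toy usage with stand-in functions: "broken" fails verification, and the
# verified "tiled" variant benchmarks fastest.
pool = iter(["naive", "tiled", "broken", "vectorized"])
speeds = {"naive": 1.0, "tiled": 3.2, "vectorized": 2.5}
result = search_with_budget(
    lambda: next(pool, None),
    lambda k: k in speeds,
    lambda k: speeds.get(k, 0.0),
    budget_s=1,
)
print(result)  # tiled
```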

Read more: https://developer.nvidia.com/blog/automating-gpu-kernel-generation-with-deepseek-r1-and-inference-time-scaling/
Links for 2025-02-13

AI:

1. Training Deep Learning Models with Norm-Constrained LMOs: has the potential to significantly improve the efficiency and speed of training LLMs, allowing for the training of even larger and more complex models. https://arxiv.org/abs/2502.07529

2. LLM Pretraining with Continuous Concepts https://arxiv.org/abs/2502.08524

3. Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving — iteratively refines the prover through expert iteration, dramatically increasing the number of solved problems (e.g., 29.7K solved in Lean Workbook) and securing top rankings on benchmarks like PutnamBench. https://arxiv.org/abs/2502.07640

4. RAGEN: A General-Purpose Reasoning Agent Training Framework https://github.com/ZihanWang314/ragen/tree/main

5. Unsupervised Predictive Memory in a Goal-Directed Agent [published in 2018] https://arxiv.org/abs/1803.10760

6. CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction https://codei-o.github.io/

7. Elon Musk says Grok 3 will be released in "a week or two" and it is "scary smart", displaying reasoning skills that outperform any other AI model that has been released https://www.youtube.com/live/eV396ioBs3g?si=KOAokGapPj_Cb666&t=811

8. Noam Shazeer, co-lead on Google's Gemini, says by 2030 there will be AI assistants in glasses that provide advice and solve problems for you in real time, as well as turning programmers into 10,000,000x engineers https://youtu.be/v0gjI__RyCY?si=QHw1hrywgBvBnieQ&t=5390

9. Studies of Human Error Rate: "…skeptics often gesture to hallucinations, errors. An ideal symbolic system never makes such errors, therefore LLMs cannot truly "understand" even simple concepts like addition. See e.g. Evaluating the World Model Implicit in a Generative Model for this argument in the literature. However, such arguments reliably rule out human "understanding" as well! Studies within Human Reliability Analysis find startlingly high rates even for basic tasks, and even with double checking. Generally, the human reference class is too often absent (or assumed ideal) in AI discussions, and many LLM oddities have close parallels in psychology. If you're willing to look!" https://www.lesswrong.com/posts/9unBWgRXFT5BpeSdb/studies-of-human-error-rate

10. Rogo scales AI-driven financial research with OpenAI o1 https://openai.com/index/rogo/

AI politics and safety:

1. Tell me about yourself: LLMs are aware of their learned behaviors https://arxiv.org/abs/2501.11120

2. Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models https://arxiv.org/abs/2411.14257

3. OpenAI hides chain-of-thought reasoning because it may include unaligned content. From “Model Spec—a document which defines how we want our models to behave.” https://model-spec.openai.com/2025-02-12.html

4. Meta Starts Eliminating Jobs in Shift to Find AI Talent https://www.bloomberg.com/news/articles/2025-02-10/meta-starts-eliminating-jobs-in-shift-to-find-ai-talent [no paywall: https://archive.is/T7Kog]

Science and Technology:

1. Learning produces an orthogonalized state machine in the hippocampus https://www.nature.com/articles/s41586-024-08548-w

2. Rarely categorical, always high-dimensional: how the neural code changes along the cortical hierarchy https://www.biorxiv.org/content/10.1101/2024.11.15.623878v3

3. "Dozens of new obesity drugs are coming: these are ones to watch; next-generation obesity drugs will work differently from Ozempic & Wegovy—aiming to deliver greater weight loss with fewer side effects" https://www.nature.com/articles/d41586-025-00404-9 [no paywall: https://archive.is/X9CW3]

4. A single human zygote contains all the information needed to develop into an adult human and, at the same time, contains within it the evolutionary history of our species. The Genomic Code: the genome instantiates a generative model of the organism https://www.cell.com/trends/genetics/fulltext/S0168-9525(25)00008-3
German defence firm Helsing is building 6,000 AI-enabled HX-2 combat drones for Ukraine

- up to 100 km range
- on-board AI enables full resistance to electronic warfare
- can assemble into swarms, controlled by single human operators
- can be equipped with different payloads – multi-purpose, anti-tank, anti-structure ammunition
- features developed and tested based on Helsing's extensive experience in Ukraine

"Resilience Factories are Helsing’s high-efficiency production facilities designed to provide nation states with local and sovereign manufacturing capacities. Helsing is set to build Resilience Factories across the European continent, with the ability to scale manufacturing rates to tens of thousands of units in case of a conflict."

Source: https://helsing.ai/newsroom/helsing-to-produce-6000-additional-strike-drones-for-ukraine
Installed computing power of NVIDIA chips has doubled every 10 months on average since 2019.

Source: https://epoch.ai/data/machine-learning-hardware?insight-option=Absolute#nvidia-chip-production
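A doubling time of 10 months can be converted into more familiar growth factors (simple arithmetic on the figure above):

```python
# Doubling every 10 months implies these growth factors.
doubling_months = 10
annual_factor = 2 ** (12 / doubling_months)                # growth per year
since_2019 = 2 ** ((2025 - 2019) * 12 / doubling_months)   # 2019 -> 2025
print(f"~{annual_factor:.1f}x per year, ~{since_2019:.0f}x over 2019-2025")
```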
The Perils of Overthinking: How AI Can Get Stuck in Its Own Thoughts

Large reasoning models often prioritize extended internal reasoning over taking action, leading to three recurring problems:

1. Analysis Paralysis – The model endlessly debates potential solutions but never executes one.

2. Rogue Actions – It makes unnecessary or unhelpful moves instead of focusing on the task.

3. Premature Disengagement – It stops reasoning too soon and submits incomplete solutions.

The more an AI model overthinks, the worse its performance. Surprisingly, choosing solutions with lower overthinking scores improved accuracy by nearly 30% while cutting computing costs by 43%.

The findings highlight that overthinking isn't just a human problem—it hampers AI, too. To combat this, the researchers propose techniques like leveraging AI’s ability to call external functions and using reinforcement learning to fine-tune decision-making.
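The selection rule behind the accuracy gain can be sketched as below; the `overthink_score` field and candidate format are assumptions for illustration, and the paper's actual scoring rubric is more involved:

```python
# Given several candidate solutions, each annotated with an overthinking
# score, prefer the one that overthought the least.
def pick_least_overthinking(candidates):
    return min(candidates, key=lambda c: c["overthink_score"])

# Toy candidates (hypothetical names and scores).
candidates = [
    {"answer": "patch_a", "overthink_score": 8.5},  # analysis paralysis
    {"answer": "patch_b", "overthink_score": 2.0},  # acted on environment feedback
    {"answer": "patch_c", "overthink_score": 6.1},
]
print(pick_least_overthinking(candidates)["answer"])  # patch_b
```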

Paper: https://www.arxiv.org/abs/2502.08235
Introducing SnakeBench, an experimental benchmark side quest:

We made 50 LLMs battle each other in head-to-head snake 🐍

2.8K matches showed which models are best at real-time strategy and spatial reasoning in Snake

Key findings from SnakeBench:

1. Reasoning models dominated - o3-mini and DeepSeek won 78% of their matches

2. Context is crucial - Models still needed extensive board data and clear coordinate systems to play effectively

3. Basic spatial reasoning remains a huge challenge for LLMs. Most models failed to track their position and made obvious mistakes.

Only GPT-4, Gemini 2.0, and o3-mini showed enough reasoning for strategic gameplay.
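A win-rate tally of the kind behind the 78% figure can be sketched as follows (toy match data, not the benchmark's actual scoring code):

```python
from collections import defaultdict

def win_rates(matches):
    """matches: list of (winner, loser) pairs from head-to-head games.
    Returns each model's share of games won."""
    wins, played = defaultdict(int), defaultdict(int)
    for winner, loser in matches:
        wins[winner] += 1
        played[winner] += 1
        played[loser] += 1
    return {m: wins[m] / played[m] for m in played}

# Hypothetical results for illustration.
matches = [("o3-mini", "gpt-4"), ("o3-mini", "gemini-2.0"),
           ("gemini-2.0", "gpt-4"), ("o3-mini", "gpt-4")]
rates = win_rates(matches)
print(f"o3-mini: {rates['o3-mini']:.0%}")  # o3-mini: 100%
```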


* See matches: https://snakebench.com
* Read the analysis: https://arcprize.org/blog/snakebench
* View the code: https://github.com/gkamradt/SnakeBench
Can frontier models cost-effectively accelerate ML workloads via optimizing GPU kernels? Yes, and they’re improving pretty steeply.

AI agents can nearly double the speed of kernel execution compared to traditional methods for a fraction of the estimated cost of paying an expert kernel engineer.

The speedup achievable with the best model roughly doubled over the last six months. Although code optimization is only a small part of frontier AI R&D workflows, many positive feedback loops like this could lead to very rapid progress, unlocking efficiency gains that could save hundreds of millions of dollars in compute costs worldwide. Optimized kernels make long-running ML workloads substantially cheaper, an edge that even modest speedups can provide in large-scale applications.

The speedups are not driven just by success on the simplest tasks: performance is actually better on the more complex problems.

Read more: https://metr.org/blog/2025-02-14-measuring-automated-kernel-engineering/
When ELIZA meets therapists: A Turing test for the heart and mind

Across a sample of 830 people, participants:

(1) couldn't tell the difference between ChatGPT and a human therapist,

(2) preferred responses written by ChatGPT on key psychotherapy principles like empathy.

Study: https://journals.plos.org/mentalhealth/article?id=10.1371/journal.pmen.0000145
Links for 2025-02-16

AI:

1. Stanford researchers crack Among Us: Remarkable new work trains LLMs to master strategic social deduction through multi-agent RL, doubling win rates over standard RL. https://socialdeductionllm.github.io/

2. SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models https://arxiv.org/abs/2502.09604

3. AI model deciphers the code in proteins that tells them where to go https://news.mit.edu/2025/ai-model-deciphers-code-proteins-tells-them-where-to-go-0213

4. AI used to design a multi-step enzyme that can digest some plastics https://arstechnica.com/science/2025/02/using-ai-to-design-proteins-is-now-easy-making-enzymes-remains-hard/

5. Musk: "Grok 3 release with live demo on Monday night at 8pm PT. Smartest AI on Earth." https://x.com/elonmusk/status/1890958798841389499

6. EnigmaEval: A collection of long, complex reasoning challenges that take groups of people many hours or days to solve. The best AI systems score below 10% on normal puzzles, and for the ones designed for MIT students, AI systems score 0%. https://scale.com/leaderboard/enigma_eval

7. Introducing Prime Intellect’s Protocol & Testnet: A peer-to-peer compute and intelligence network https://www.primeintellect.ai/blog/protocol

8. Finally, hard data on a real-world AI business use case: It’s huge for customer service https://sherwood.news/tech/finally-hard-data-on-a-real-world-ai-business-use-case-its-huge-for-customer/

9. OmniParser V2 can turn any LLM into an agent capable of using a computer https://www.microsoft.com/en-us/research/articles/omniparser-v2-turning-any-llm-into-a-computer-use-agent/

10. This DARPA-backed startup banked $100 million for its energy-slashing analog chips https://www.fastcompany.com/91278505/encharge-ai-banks-100-million-for-its-energy-slashing-analog-chips

Robots

1. Meta Plans Major Investment Into AI-Powered Humanoid Robots https://www.bloomberg.com/news/articles/2025-02-14/meta-plans-major-investment-into-ai-powered-humanoid-robots [no paywall: https://archive.is/TA8fq]

2. China’s electric vehicle giants are betting big on humanoid robots https://www.technologyreview.com/2025/02/14/1111920/chinas-electric-vehicle-giants-pivot-humanoid-robots/ [no paywall: https://archive.is/GXeYf]

3. China registers over 450,000 smart robotics firms https://www.chinadaily.com.cn/a/202502/10/WS67a99669a310a2ab06eab353.html

Computer science

1. A formalization of Gowers’ no-coincidence principle: If a highly unlikely or “outrageous” coincidence appears in a mathematical or computational context, there should be an underlying structural explanation for it rather than it being a mere accident. https://www.lesswrong.com/posts/Xt9r4SNNuYxW83tmo/a-computational-no-coincidence-principle

2. Generalized Transformers from Applicative Functors, by Tuomas Laakkonen https://cybercat.institute/2025/02/12/transformers-applicative-functors/

3. The Hundred-Page Language Models Book https://thelmbook.com/

4. bytecode interpreters for tiny computers https://dercuano.github.io/notes/tiny-interpreters-for-microcontrollers.html

5. New Book-Sorting Algorithm Almost Reaches Perfection https://www.quantamagazine.org/new-book-sorting-algorithm-almost-reaches-perfection-20250124/

Science and Technology

1. Does X cause Y? An in-depth evidence review https://www.cold-takes.com/does-x-cause-y-an-in-depth-evidence-review/

2. Neuralink competitor Paradromics secures investment from Saudi Arabia’s Neom https://www.cnbc.com/2025/02/12/neuralink-competitor-paradromics-partners-with-saudi-arabias-neom.html

3. “How can a brain disease increase creativity? First, we derive a brain circuit for creativity from studies of creative tasks demonstrating that they share reduced activity in the right frontal pole.” https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2830230

4. Scientists have a new explanation for the last two years of record heat https://www.washingtonpost.com/climate-environment/2025/02/14/global-warming-acceleration-clouds/ [no paywall: https://archive.is/1bwYx]