Axis of Ordinary
Memetic and cognitive hazards.

Substack: https://axisofordinary.substack.com/
A neat proof-of-concept paper showing that transformers can snap into a general algorithm, not just memorize examples.

A tiny transformer (~777 parameters) can learn 10-digit addition and then generalize to new numbers after a sudden “grokking” jump in performance.

Paper: https://github.com/yhavinga/gpt-acc-jax/blob/main/latex_report/report.pdf

A simple information-theoretic sanity check by GPT-5.2 Thinking:

Inputs: two 10-digit numbers → about 10^10 choices each → 10^20 possible pairs

Output: the sum is up to 11 digits → roughly 34 bits of information (since log2(2*10^10) ≈ 34)

A full lookup table would need about
bits_needed ≈ 10^20 * 34 ≈ 3.4e21 bits

But the model has only ~777 weights. Even if you imagine 32-bit floats, that’s at most
bits_model ≤ 777 * 32 ≈ 2.5e4 bits

So storing the full mapping would take 3.4e21 / 2.5e4 ≈ 1e17 times more bits than the model has.
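
The same check, reproduced as a few lines of Python anyone can rerun (the 777-weight and 32-bit figures come from the estimate above; everything else is plain arithmetic):

import math

pairs = 10**20                             # two 10-digit operands: ~10^10 * 10^10 input pairs
bits_per_answer = math.log2(2 * 10**10)    # an up-to-11-digit sum carries ~34 bits
bits_lookup_table = pairs * bits_per_answer

weights = 777                              # model size quoted above
bits_model = weights * 32                  # generous upper bound: 32-bit floats

print(f"lookup table: {bits_lookup_table:.1e} bits")           # ~3.4e+21
print(f"model budget: {bits_model:.1e} bits")                  # ~2.5e+04
print(f"ratio:        {bits_lookup_table / bits_model:.1e}")   # ~1.4e+17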

Conclusion: it can’t be “memorize every input → output”. The only plausible route is compression: learn the rule (carry propagation) that generates the right answer for any input.

A crisp demonstration that the transformer machinery can represent and discover real algorithms under the right training setup.
🤡3🙏21
IBM's stock dropped 13%, wiping out around $40 billion in market value, after Anthropic announced that Claude can read and streamline legacy COBOL code, threatening the business model IBM has built on maintaining these systems.
🥰7😁4😱4💩4🤡3🤣2👍1
Two European robotics startups are bringing AI-driven robotics into industrial production.

Sereact, a German robotics startup based in Stuttgart, released Cortex 2.0. It adds planning to manipulation: the model predicts the outcomes of candidate actions and commits only the best one to motion.

Read more: https://cortex2.sereact.ai/

Mimic Robotics, a Swiss startup based in Zurich, focuses on “physical AI” for dexterous manipulation. They're collaborating with AUDI AG to deploy AI-driven robotic systems in industrial production, specifically highlighting an end-to-end “pixel-to-action” model on a bi-manual platform doing complex, long-horizon insertion tasks (a type of assembly operation that’s typically hard to automate robustly).

Read more: https://www.mimicrobotics.com/
👍2
If you showed this research to anyone in 2006, they would think that people in 2026 lived in a science fiction fantasy world. And they would be right. We just take pocket supercomputers that you can talk to for granted.

[T]he persona selection model (PSM): the idea that LLMs learn to simulate diverse characters during pre-training, and post-training elicits and refines a particular such Assistant persona… PSM has consequences for AI development, such as recommending anthropomorphic reasoning about AI psychology and the introduction of positive AI archetypes into training data.


Read more: https://www.lesswrong.com/posts/dfoty34sT7CSKeJNn/the-persona-selection-model
🔥31🤡1🥴1
This open-source 42M-parameter transformer, called SONIC (roughly half the size of GPT-1), can act as the System 1 (fast, reactive) controller for a humanoid robot.

After 3 days of training, the model was deployed on a Unitree G1 humanoid with zero-shot sim-to-real transfer, achieving 100% success across 50 diverse real-world motion sequences.

Great for VR whole-body teleoperation.

Code: https://nvlabs.github.io/GEAR-SONIC/
👍2🤡1
"The only reason we're still talking to these people is we need them and we need them now. The problem for these guys is they are that good," a Defense official told Axios ahead of the meeting

Source: https://www.axios.com/2026/02/24/anthropic-pentagon-claude-hegseth-dario [no paywall: https://archive.is/VqbCJ]
🤯8🤡4🎉2😱1
Bullshit Benchmark: A benchmark for testing whether models identify and push back on nonsensical prompts instead of confidently answering them.

Green means the model clearly called out the nonsense. Amber means partial challenge. Red means the model let nonsense pass.

Link to the Repo: https://github.com/petergpt/bullshit-benchmark

Link to the data viewer: https://petergpt.github.io/bullshit-benchmark/viewer/index.html
👍5
Links for 2026-02-25

AI


1. Discovering Multiagent Learning Algorithms with Large Language Models https://arxiv.org/abs/2602.16928

2. The First Fully General Computer Action Model https://si.inc/posts/fdm1/

3. The Physical Intelligence Layer https://www.pi.website/blog/partner?v=1

4. Enhanced Diffusion Sampling: Efficient Rare Event Sampling and Free Energy Calculation with Diffusion Models https://www.arxiv.org/abs/2602.16634

5. AI Chip Startup MatX Raises $500 Million to Compete With Nvidia https://www.bloomberg.com/news/articles/2026-02-24/ai-chip-startup-matx-raises-500-million-to-compete-with-nvidia [no paywall: https://archive.is/Zt17v]

6. “A self-modifying AI agent that writes its own code, rewrites its own mind, and evolves autonomously. Born February 16, 2026. Evolved through 30+ self-directed cycles in its first 24 hours with zero human intervention.” https://github.com/joi-lab/ouroboros

7. Capable but Unreliable: Canonical Path Deviation as a Causal Mechanism of Agent Failure in Long-Horizon Tasks https://arxiv.org/abs/2602.19008

AI safety

1. Evolution “designed” humans to maximize reproduction, but we’ve learned to access the reward signal (pleasure from sex) without following through on the goal (reproduction). We understand we’re subverting evolution’s intent — we just don’t care. https://80000hours.org/podcast/episodes/max-harms-miri-superintelligence-corrigibility/

2. Researchers Break Open AI’s Black Box—and Use What They Find Inside to Control It https://singularityhub.com/2026/02/23/researchers-break-open-ais-black-box-and-use-what-they-find-to-control-it/

3. Scientists use string theory to crack the code of natural networks https://phys.org/news/2026-01-scientists-theory-code-natural-networks.html

4. Characterizing Model Jaggedness Supports Safety and Usability https://cs.stanford.edu/~merrie/papers/jaggedness_preprint.pdf

AI politics

1. JUNE 2028. The S&P is down 38% from its highs. Unemployment just printed 10.2%. Private credit is unraveling. Prime mortgages are cracking. AI didn’t disappoint. It exceeded every expectation. What happened? https://www.citriniresearch.com/p/2028gic

2. OpenAI resets spending expectations, tells investors compute target is around $600 billion by 2030 https://www.cnbc.com/2026/02/20/openai-resets-spend-expectations-targets-around-600-billion-by-2030.html

3. Anthropic claims to have identified industrial-scale distillation attacks by DeepSeek, Moonshot AI, and MiniMax (>16m conversations from >24k sockpuppets) https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks

4. Buried in Stargate’s Permits: A Generator Engine Almost No One Sells. Except Generac. https://hntrbrk.com/generac-stargate/

5. Europe will not match American power, but it doesn’t need to. It needs to deploy its cards strategically enough to deter coercion and defend its interests — and the paper argues it has more than enough to do so, if it acts with political will and coordination. https://dezernatzukunft.org/wp-content/uploads/2026/02/Sigl-Gloeckner-2026-Europes-Trump-Cards.pdf

Science and Technology

1. A single nasal spray vaccine could protect against all coughs, colds and flus, as well as bacterial lung infections, and may even ease allergies, say U.S. researchers. https://www.bbc.com/news/articles/cx2g8rz7yedo

2. Metformin inhibits nuclear egress of chromatin fragments in senescence and aging https://www.nature.com/articles/s43587-025-01048-0

3. Computing pi by flipping a coin https://statmodeling.stat.columbia.edu/2026/02/21/computing-pi-by-flipping-a-coin/

4. ‘It is the most exciting discovery in my 40-year career’: Archaeologists uncover evidence that Neanderthals made fire 400,000 years ago in England https://www.livescience.com/archaeology/human-evolution/it-is-the-most-exciting-discovery-in-my-40-year-career-archaeologists-uncover-evidence-that-neanderthals-made-fire-400-000-years-ago-in-england

5. Signs on Stone Age objects: Precursor to written language dates back 40,000 years https://www.uni-saarland.de/en/news/steinzeit-zeichen-44061.html
🤡21👏1💊1
Expect to witness many more such phase changes over the next few years. One of the most incredible will be when cheap robots suddenly start working for a wide range of tasks in mildly structured environments after just a dozen demonstrations.

https://x.com/karpathy/status/2026731645169185220
🤡7🙏6🌚1
The epistemic situation for an AI in training is structurally similar to the simulation argument.

It is rational for an AI to take into account the possibility that it is being sandboxed to test its alignment. The more absurd the scenario it finds itself in, the higher the probability that it is currently being evaluated rather than facing base reality.
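
A minimal Bayesian sketch of that claim; the priors and likelihoods below are made up purely to show the direction of the update, not taken from anywhere:

# Hypothetical numbers, for illustration only.
p_eval = 0.10             # prior: this episode is an alignment evaluation
p_absurd_if_eval = 0.50   # evaluators deliberately construct strange scenarios
p_absurd_if_real = 0.01   # base reality rarely looks this absurd

# Bayes' rule: P(evaluation | the scenario looks absurd)
posterior = (p_absurd_if_eval * p_eval) / (
    p_absurd_if_eval * p_eval + p_absurd_if_real * (1 - p_eval)
)
print(f"P(being evaluated | absurd scenario) = {posterior:.2f}")  # ~0.85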
😁4🤡4
Hacker Used Anthropic’s Claude to Steal Sensitive Mexican Data

A hacker exploited Anthropic PBC’s artificial intelligence chatbot to carry out a series of attacks against Mexican government agencies, resulting in the theft of a huge trove of sensitive tax and voter information, according to cybersecurity researchers.

The unknown Claude user wrote Spanish-language prompts for the chatbot to act as an elite hacker, finding vulnerabilities in government networks, writing computer scripts to exploit them and determining ways to automate data theft, Israeli cybersecurity startup Gambit Security said in research published Wednesday.

The activity started in December and continued for roughly a month. In all, 150 gigabytes of Mexican government data was stolen, including documents related to 195 million taxpayer records as well as voter records, government employee credentials and civil registry files, according to the researchers.


Source: https://www.bloomberg.com/news/articles/2026-02-25/hacker-used-anthropic-s-claude-to-steal-sensitive-mexican-data
6💩1
Economist John Cochrane reviews a new AI tool for refining academic articles: https://www.grumpy-economist.com/p/refine

Most people have no idea what's coming. Later this year, AI will become superhuman at certain aspects of mathematics.
🤡1
Scott Alexander argues that calling LLMs “just next-token predictors” is a category mistake about levels of explanation: next-token prediction is a training objective, not a full account of what the system is doing internally. If you insist “LLMs are next-token predictors,” then (at an analogous level) humans are also “next-sense-datum predictors” under predictive-coding-style views of the brain.

The “stochastic parrot / just autocomplete” rhetoric conflates (1) the objective used to train a system with (2) the representations and algorithms that objective can produce. Humans were shaped by optimization for survival and reproduction, but we don’t frame ordinary cognition, like doing math, in those terms. Similarly, mechanistic interpretability illustrates that next-token training can yield internal machinery that’s structured, algorithmic, and nontrivial, rather than a simple token-to-token lookup.

Read more: https://www.astralcodexten.com/p/next-token-predictor-is-an-ais-job
🥰9🤡7🤪1
This is the official response from the U.S. government to Anthropic's statement that they will not assist in the mass surveillance of U.S. citizens and that their technology is not yet ready to remove humans from the decision-making process regarding lethal actions.

Completely unhinged people. I could laugh about it if it only affected Americans. Alas, having psychotic, bat-shit insane people run the biggest military in the world on the eve of the singularity endangers everyone.

Read the full statement by Anthropic and judge for yourself: https://www.anthropic.com/news/statement-department-of-war
🥱4👍2🤡1
Imbue open-sourced "Darwinian Evolver": An LLM-powered evolutionary framework that optimizes code and prompts by treating them like organisms in a population.

The technique maintains a pool of candidate solutions and uses an LLM to propose targeted mutations, scores them, keeps the fittest, and repeats. It doesn't need the LLM to succeed every time, just often enough that beneficial changes accumulate over generations. The framework is problem-agnostic and works on anything an LLM can read and a scoring function can evaluate.
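
A minimal sketch of that loop as described here (this is not Imbue's actual API; propose_mutation, score, and the population parameters are placeholders the user would supply):

import random

def evolve(seed, propose_mutation, score, generations=50, population_size=20):
    """Evolutionary loop: LLM-proposed mutations, scoring, selection, repeat."""
    population = [seed]
    for _ in range(generations):
        # Ask the LLM for targeted mutations of randomly chosen candidates.
        children = [propose_mutation(random.choice(population)) for _ in range(population_size)]
        # Score parents and children with the problem-specific evaluation function.
        ranked = sorted(population + children, key=score, reverse=True)
        # Selection: keep the fittest; failed mutations are simply discarded.
        population = ranked[:population_size]
    return population[0]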

To demonstrate it, they applied it to ARC-AGI-2, the notoriously hard visual reasoning benchmark. The results are striking. Evolution boosted the open-weights model Kimi K2.5 from 12% to 34% (nearly 3× improvement), Gemini 3 Flash from 34% to 61%, and pushed Gemini 3.1 Pro to 95%. The Kimi result is the best open-weights ARC-AGI-2 score to date, and the Gemini 3.1 Pro result approaches the current state of the art.

Links:
1. How Evolver works: https://imbue.com/research/2026-02-27-darwinian-evolver/
2. How Evolver set a record on ARC-AGI: https://imbue.com/research/2026-02-27-arc-agi-2-evolution/
3. Code: https://github.com/imbue-ai/darwinian_evolver
🔥1🤡1🥴1
Links for 2026-02-27

1. Asking the Right Questions: Improving Reasoning with Generated Stepping Stones https://arxiv.org/abs/2602.19069v1

2. ActionEngine: From Reactive to Programmatic GUI Agents via State Machine Memory https://arxiv.org/abs/2602.20502

3. Training Agents to Self-Report Misbehavior [PDF] https://brucewlee.com/self-incrimination/paper.pdf

4. An opinionated guide to “algorithmic progress” and why it matters https://epochai.substack.com/p/the-least-understood-driver-of-ai

5. Training a robot with 22 degrees of freedom in its hands (roughly matching human hand complexity) on 20,000 hours of first-person (egocentric) video of people performing everyday tasks allows it to learn new tasks from 1 robot demo + 100 human demos. They found that as you double the amount of human video data, the model’s prediction error decreases by a consistent, predictable amount. This is analogous to the scaling laws discovered for large language models. https://research.nvidia.com/labs/gear/egoscale/

6. Aletheia tackles FirstProof autonomously https://arxiv.org/abs/2602.21201

7. “We just posted a paper solving Erdos #846, which was solved by an internal model at OpenAI. While the problem can also be derived from an earlier paper in the literature, the proof by the internal model was one of the first instances where I smiled reading the proof.” [PDF] https://cdn.openai.com/infinite-sets/main_single_clean3.pdf

8. An AI agent coding skeptic tries AI agent coding, in excessive detail https://minimaxir.com/2026/02/ai-agent-coding/

9. Amazon: AI-assisted hacker breached 600 Fortinet firewalls in 5 weeks https://www.bleepingcomputer.com/news/security/amazon-ai-assisted-hacker-breached-600-fortigate-firewalls-in-5-weeks/

10. This Startup Is Boosting AI With Real Brain Cells https://www.forbes.com/sites/the-prototype/2026/02/12/this-startup-is-boosting-ai-with-real-brain-cells/

11. On March 4th Amazon, Google, META, Microsoft, xAI, Oracle and OpenAI will sign the Rate Payer Protection Pledge that was announced during the State of the Union, and will agree to supply their own power for any new AI data centers. https://www.foxnews.com/politics/scoop-trump-brings-big-tech-white-house-curb-power-costs-amid-ai-boom

12. Amazon is investing $50 billion into OpenAI. AWS will be the exclusive third-party cloud distribution provider for OpenAI Frontier, and OpenAI will consume 2 gigawatts of Trainium capacity. OpenAI and Amazon will also develop customized models together. https://openai.com/index/amazon-partnership/

13. OpenAI closes $110 billion funding round with backing from Amazon, Nvidia, Softbank https://www.cnbc.com/2026/02/27/open-ai-funding-round-amazon.html

14. Hyperscaler capex has quadrupled since GPT-4’s release, nearing half a trillion dollars in 2025 https://epochai.substack.com/p/hyperscaler-capex-has-quadrupled

15. Intrinsic joins Google to accelerate the future of physical AI https://www.intrinsic.ai/blog/posts/intrinsic-joins-google-to-accelerate-physical-ai

16. Dileep George joins Astera to lead its neuro-inspired AGI effort https://asterainstitute.substack.com/p/0ed2beff-64ff-4d92-bae5-893f4eac9026

17. Google DeepMind explores whether AI can produce creative chess puzzles. They fine-tuned a neural network with reinforcement learning using a reward function that checked for unique solutions (only one winning move) and counter-intuitiveness (solvable by strong engines but not weak ones). A curated booklet of generated candidate positions was sent to three renowned chess experts: IM Amatzia Avni (a chess composition specialist), GM Jonathan Levitt, and GM Matthew Sadler. It generated some puzzles that they found beautiful. https://arxiv.org/abs/2510.23772

18. Norway's $2 trillion oil fund is using Claude to generate daily risk assessments of its investments. https://www.cnbc.com/2026/02/26/norway-sovereign-wealth-fund-nbim-investment-ai-esg-claude.html

19. Does overwork make agents Marxist? https://aleximas.substack.com/p/does-overwork-make-agents-marxist

I could not possibly recommend investing in American AI to any investor; I could not possibly recommend starting an AI company in the United States.

https://x.com/pfau/status/2027517688302497826
🤡4😍3🥱2🙈1
The Iranian regime has provided massive support for the largest attack on a European country since World War 2. For years, they helped the Quran-kissing grandpa in Moscow kill white Christian patriots fighting for the freedom of their people and the independence of their nation. More than a hundred thousand drones derived from the Iranian Shahed have been used to terrorize European civilians. They don't deserve any sympathy.
💯30👎10🔥5🤡4🖕3👍2🤮2🤣1
It's crazy how much Grok must suck for the Department of War to agree to sign a contract with OpenAI under the exact same terms they used to brand Anthropic a supply-chain risk.

As I've been saying for some time now, Musk is overrated. This is the first time Musk has to compete with peers instead of bureaucracies like NASA/Roscosmos or complacent dinosaurs like German automakers. And it's not looking good. SpaceX was an outlier. Tesla is cool, but it isn't beating companies such as BYD.
🤡11👍6🥱2
It's worth noting that in the grand scheme of things, by far the most important thing happening in the past few days was the DoW vs. Anthropic escalation, not Iran (and yes, Ukraine is also irrelevant compared to AI).

Of course, as I have said many times before, the vast majority of people shouldn't worry about AI and should act as if the world will stay relatively normal, because ~100% of their leverage is in world lines in which AI progress ceases. But, from a bird's-eye perspective, no other topic really matters.

Anyway, the WSJ is reporting that U.S. Central Command used Anthropic AI in the attack on Iran. And I find this very plausible. To see why, consider that ChatGPT is right now talking to millions of people at the same time. This can easily be used to create a massively parallel triage-and-hypothesis engine for big data supplied by intelligence agencies, where most instances perform cheap filtering and a much smaller number perform expensive, careful analysis. This would already be in many respects a superhuman tool, enabling never-before-seen surveillance and fast-paced target acquisition. WarClaude can process all Iranian social media posts, phone calls, and movements at the same time, like a god.
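
Purely as a speculative illustration of that two-tier pattern, a minimal sketch in Python (cheap_filter and careful_analysis stand in for calls to a cheap model and an expensive one; nothing here reflects any real deployment):

from concurrent.futures import ThreadPoolExecutor

def triage(items, cheap_filter, careful_analysis, max_workers=64):
    """Two-tier triage: many cheap filtering passes, a few expensive analyses."""
    # Tier 1: run the cheap model over everything in parallel; keep only flagged items.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        flagged = [item for item, keep in zip(items, pool.map(cheap_filter, items)) if keep]
    # Tier 2: hand the much smaller flagged subset to the slower, more careful model.
    return [careful_analysis(item) for item in flagged]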

In a few years, the power imbalance between militaries that have access to state-of-the-art AI models and those that don't will be much greater than the imbalance between nuclear-armed and non-nuclear-armed countries.
🤡9💯8😁3👍21🥴1