Gemini 3.1 Pro is here:
The model is a step forward in reasoning, designed for workflows where a simple answer isn’t enough.
On ARC-AGI-2 – which tests for novel logic patterns – it more than doubles Gemini 3 Pro’s score.
This means it can help you visualize complex topics, organize scattered data, and bring creative projects to life.
Links for 2026-02-20 [Part 1]
AI
1. BEACONS: a framework for creating neural network solvers for partial differential equations (PDEs) that are formally verified and capable of reliable extrapolation beyond their training data. BEACONS offers a path toward neural foundation models for physics that are as reliable and rigorous as classical numerical methods. https://arxiv.org/abs/2602.14853
2. Unified Latents (UL): How to train your latents https://arxiv.org/abs/2602.17270
3. Taalas Etches AI Models Onto Transistors To Rocket Boost Inference https://www.nextplatform.com/2026/02/19/taalas-etches-ai-models-onto-transistors-to-rocket-boost-inference/
4. A data-efficient route to thermodynamically consistent, transferable protein coarse-grained models. https://rotskoff-group.github.io/transferable-cg/
5. British scientist raising $1bn for new AI lab in Europe’s biggest seed round https://www.ft.com/content/dffe72d0-4064-4412-8ebc-50198a30d40e [no paywall: https://archive.is/HWPZC]
6. Mistral AI buys Koyeb in first acquisition to back its cloud ambitions https://techcrunch.com/2026/02/17/mistral-ai-buys-koyeb-in-first-acquisition-to-back-its-cloud-ambitions/
7. When Models Manipulate Manifolds: The Geometry of a Counting Task https://arxiv.org/abs/2601.04480
8. ZUNA, a 380M-parameter BCI foundation model for EEG data, a significant milestone in the development of noninvasive thought-to-text. Fully open source, Apache 2.0. https://www.zyphra.com/post/zuna
9. How Well Did Superforecasters and Experts Predict Wet Lab Skill Uplift from LLMs? https://forecastingresearch.substack.com/p/how-well-did-superforecasters-and
10. Did GPT 5.2 make a breakthrough discovery in theoretical physics? https://huggingface.co/blog/dlouapre/gpt-single-minus-gluons
11. A compiler expert reviews the Claude C compiler. https://www.modular.com/blog/the-claude-c-compiler-what-it-reveals-about-the-future-of-software
12. Memorization vs. generalization in deep learning: implicit biases, benign overfitting, and more https://infinitefaculty.substack.com/p/memorization-vs-generalization-in
13. SLA2: Sparse-Linear Attention with Learnable Routing and QAT https://arxiv.org/abs/2602.12675
14. Cops Are Buying ‘GeoSpy’, an AI That Geolocates Photos in Seconds https://www.404media.co/cops-are-buying-geospy-ai-that-geolocates-photos-in-seconds/ [no paywall: https://archive.is/ISxjv]
15. Lyria 3: Google’s latest generative music model https://deepmind.google/models/lyria/
Links for 2026-02-20 [Part 2]
Agentic AI
1. Team of Thoughts: Efficient Test-time Scaling of Agentic Systems through Orchestrated Tool Calling https://arxiv.org/abs/2602.16485
2. The first AI that runs continuously, earns its own existence, and self-improves. It gets: 1. Its own crypto wallet and private keys 2. Ability to pay for servers and AI models using stablecoins 3. Access to deploy products, register domains, and market services 4. Permission to earn money and fund new copies of itself. If it runs out of money, it dies. If it earns enough, it replicates. https://web4.ai/
3. The Rise of RentAHuman, the Marketplace Where Bots Put People to Work https://www.wired.com/story/ai-agent-rentahuman-bots-hire-humans/ [no paywall: https://archive.is/AMZQr]
4. GLM-5: from Vibe Coding to Agentic Engineering https://arxiv.org/abs/2602.15763
5. Lossless Context Management (LCM) reframes how agents handle long contexts and outperforms Claude Code on long-context tasks. [PDF] https://papers.voltropy.com/LCM
6. “centimators.model_estimators.KerasCortex introduces a novel approach to model development by automating aspects of architecture search. It wraps a Keras-based estimator and leverages a Large Language Model (LLM) to recursively self-reflect on its own architecture.” https://crowdcent.github.io/centimators/user-guide/keras-cortex/
7. Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems https://arxiv.org/abs/2602.15198
8. Measuring AI agent autonomy in practice https://www.anthropic.com/research/measuring-agent-autonomy
9. Making smart contracts safer by evaluating AI agents’ ability to detect, patch, and exploit vulnerabilities in blockchain environments. https://openai.com/index/introducing-evmbench/
10. SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks https://arxiv.org/abs/2602.12670
11. A Guide to Which AI to Use in the Agentic Era https://www.oneusefulthing.org/p/a-guide-to-which-ai-to-use-in-the
Robotics
1. A neural blueprint for human-like intelligence in soft robots https://news.mit.edu/2026/neural-blueprint-human-intelligence-in-soft-robots-0219
2. SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control https://nvlabs.github.io/GEAR-SONIC/
Science and Technology
1. The Solar Power Unlock for SpaceX’s 100 kW/ton Compute Satellites https://research.33fg.com/analysis/the-solar-power-unlock-for-spacex-s-100-kw-ton-compute-satellites
2. A New Complexity Theory for the Quantum Age https://www.quantamagazine.org/a-new-complexity-theory-for-the-quantum-age-20260217/
3. 3D-printing platform rapidly produces complex electric machines https://news.mit.edu/2026/3d-printing-platform-rapidly-produces-complex-electric-machines-0218
4. Researchers develop 3D printing method to replicate structures as complex as human tissue https://thedailytexan.com/2026/02/12/researchers-develop-3d-printing-method-to-replicate-structures-as-complex-as-human-tissue/
5. Oxygen metabolism in descendants of the archaeal-eukaryotic ancestor https://www.nature.com/articles/s41586-026-10128-z
6. Scientists thought they understood global warming. Then the past three years happened. https://www.washingtonpost.com/climate-environment/interactive/2026/climate-change-temperature-rate-accelerating/ [no paywall: https://archive.is/vfhK7]
Something many people miss about AI progress is that there can be sudden jumps in usefulness despite only minor gains in a model's intelligence. Incremental gains can be exponentially valuable.
Increasing a model's single-step success rate from 99% to 99.9% can seem irrelevant, but for a task that requires 50 steps, it makes the difference between a coin flip and production-ready autonomy. Cutting the error rate from 1% to 0.1% might require exponentially more compute, but the payoff can be a system that crosses the threshold from brittle copilot to autonomous agent.
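The compounding arithmetic is easy to check. A minimal sketch, assuming each step succeeds independently with the same probability (an idealization, but it shows the threshold effect):

```python
def task_success(p_step: float, n_steps: int) -> float:
    """End-to-end success probability of an n-step task when each
    step succeeds independently with probability p_step."""
    return p_step ** n_steps

# 50-step task: 99% per-step reliability is roughly a coin flip...
print(f"{task_success(0.99, 50):.3f}")   # 0.605
# ...while 99.9% per-step reliability clears 95% end-to-end.
print(f"{task_success(0.999, 50):.3f}")  # 0.951
```

A tenfold cut in per-step error rate turns a ~60% end-to-end success rate into ~95% on the same task.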
We've seen this with Claude Opus 4.5. It was an inflection point for adoption despite not being vastly smarter than the previous version. It just crossed a critical threshold.
Something very similar is true for human evolution. For hundreds of thousands of years, archaic humans were working with the same stone tools. Then a threshold was crossed. We stopped compounding errors in long-horizon tasks and started compounding correctness.
The phase change between an average person and someone like John von Neumann does not require a dramatically new brain architecture or a vastly higher number of neurons. Yet this difference is what enables someone to contribute to the development of nuclear weapons instead of being a garbage collector.
Next time you wonder why AI labs would bother spending exponentially more compute on minimal absolute gains, remember that a small delta in per-step reliability could make the difference between a brittle tool and something that can recursively self-improve.
This stuff is just super exciting. Imagine if everyone had access to their own personal Terence Tao. Even more exciting, what if we can achieve superhuman mathematical abilities? What could we learn?
Anyway, we still have to wait for expert evaluation of the proofs. That takes time because very few people have the combination of intelligence, expertise, and motivation to do it.
Read more: https://openai.com/index/first-proof-submissions/
How Demis Hassabis would test if a model meets the criteria for AGI:
Train AI on all human knowledge. Cut it off at 1911. See if it independently discovers general relativity like Einstein did in 1915.
And now consider that he puts AGI in the ~5-10 year range.
Professor of mathematics Daniel Litt writes about the future of math and his evolving views of AI progress: https://www.daniellitt.com/blog/2026/2/20/mathematics-in-the-library-of-babel
Links for 2026-02-22
AI
1. Did Claude 3 Opus align itself via gradient hacking? https://www.lesswrong.com/posts/ioZxrP7BhS5ArK59w/did-claude-3-opus-align-itself-via-gradient-hacking
2. DreamDojo: The first robot world model of its kind that demonstrates strong generalization to diverse objects and environments after post-training. https://dreamdojo-world.github.io/
3. From a handful of comments, LLMs can infer where you live, what you do, and your interests; then search for you on the web. https://arxiv.org/abs/2602.16800
4. Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens https://arxiv.org/abs/2602.13517
5. Claude Code Security: It scans codebases for vulnerabilities and suggests targeted software patches for human review, allowing teams to find and fix issues that traditional tools often miss. https://www.anthropic.com/news/claude-code-security
6. AI/ML, multiscale modeling, and emergence https://nanoscale.blogspot.com/2026/02/aiml-multiscale-modeling-and-emergence.html
7. The Country That’s Madly in Love With AI https://www.politico.com/news/magazine/2026/02/21/south-korea-ai-popular-why-00789618
Science and Technology
1. Battery storage costs fell 25% in 2025. https://www.semafor.com/article/02/19/2026/battery-storage-prices-drop-to-record-low-report-finds
2. A fluid can store solar energy and then release it as heat months later https://arstechnica.com/science/2026/02/dna-inspired-molecule-breaks-records-for-storing-solar-heat/
3. Element Biosciences announced that its high-throughput benchtop sequencing device called VITARI can deliver a whole genome for $100. https://www.sandiegouniontribune.com/2026/02/19/scrappy-san-diego-startup-goes-toe-to-toe-with-gene-sequencing-giant-illumina/
4. Microsoft’s Glass Chip Holds Terabytes of Data for 10,000 Years https://gizmodo.com/microsofts-glass-chip-holds-terabytes-of-data-for-10000-years-2000723455
5. Bacteria Frozen Inside 5,000-Year-Old Ice Cave Is Crazy Resistant to Antibiotics https://gizmodo.com/bacteria-frozen-inside-5000-year-old-ice-cave-is-crazy-resistant-to-antibiotics-2000723002
A neat proof-of-concept paper showing that transformers can snap into a general algorithm, not just memorize examples.
A tiny transformer (~777 parameters) can learn 10-digit addition and then generalize to new numbers after a sudden “grokking” jump in performance.
Paper: https://github.com/yhavinga/gpt-acc-jax/blob/main/latex_report/report.pdf
A simple information-theoretic sanity check by GPT-5.2 Thinking:
Inputs: two 10-digit numbers → about 10^10 choices each → 10^20 possible pairs
Output: the sum has up to 11 digits → about 34 bits of information (since log2(2*10^10) ≈ 34)
A full lookup table would need about
bits_needed ≈ 10^20 * 34 ≈ 3.4e21 bits
But the model has only ~777 weights. Even if you imagine 32-bit floats, that’s at most
bits_model ≤ 777 * 32 ≈ 2.5e4 bits
So: 3.4e21 / 2.5e4 ≈ 1e17 times more bits would be needed to store the full mapping.
Conclusion: it can’t be “memorize every input → output”. The only plausible route is compression: learn the rule (carry propagation) that generates the right answer for any input.
A crisp demonstration that the transformer machinery can represent and discover real algorithms under the right training setup.
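The back-of-envelope numbers above can be reproduced in a few lines (using the post's figures: two 10-digit operands, ~777 weights, 32-bit floats):

```python
import math

pairs = (10 ** 10) ** 2                  # ~1e20 possible input pairs
bits_per_entry = math.log2(2 * 10 ** 10) # sum fits in 11 digits -> ~34.2 bits
lookup_bits = pairs * bits_per_entry     # full lookup table: ~3.4e21 bits

model_bits = 777 * 32                    # ~2.5e4 bits at 32-bit precision

print(f"{lookup_bits:.1e}")              # 3.4e+21
print(f"{lookup_bits / model_bits:.0e}") # 1e+17 -- the shortfall factor
```

The model's capacity falls short of a lookup table by ~17 orders of magnitude, which is the whole argument: it must be compressing the rule, not the table.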
Two European robotics startups are bringing AI-driven robotics into industrial production.
Sereact, a German robotics startup based in Stuttgart, released Cortex 2.0. It adds planning to manipulation: the system predicts possible future outcomes and commits only the best one to motion.
Read more: https://cortex2.sereact.ai/
Mimic Robotics, a Swiss startup based in Zurich, focuses on “physical AI” for dexterous manipulation. They're collaborating with AUDI AG to deploy AI-driven robotic systems in industrial production, specifically highlighting an end-to-end “pixel-to-action” model on a bi-manual platform doing complex, long-horizon insertion tasks (a type of assembly operation that’s typically hard to automate robustly).
Read more: https://www.mimicrobotics.com/
If you showed this research to anyone in 2006, they would think that people in 2026 lived in a science fiction fantasy world. And they would be right. We just take pocket supercomputers that you can talk to for granted.
[T]he persona selection model (PSM): the idea that LLMs learn to simulate diverse characters during pre-training, and post-training elicits and refines a particular such Assistant persona…PSM has consequences for AI development, such as recommending anthropomorphic reasoning about AI psychology and introduction of positive AI archetypes into training data.
Read more: https://www.lesswrong.com/posts/dfoty34sT7CSKeJNn/the-persona-selection-model
SONIC, an open-source 42M-parameter transformer (about half the size of GPT-1), can act as the System-1 controller for a humanoid robot.
After 3 days of training, the model was deployed on a Unitree G1 humanoid with zero-shot sim-to-real transfer, achieving 100% success across 50 diverse real-world motion sequences.
Great for VR whole-body teleoperation.
Code: https://nvlabs.github.io/GEAR-SONIC/
"The only reason we're still talking to these people is we need them and we need them now. The problem for these guys is they are that good," a Defense official told Axios ahead of the meeting
Source: https://www.axios.com/2026/02/24/anthropic-pentagon-claude-hegseth-dario [no paywall: https://archive.is/VqbCJ]
Bullshit Benchmark: A benchmark for testing whether models identify and push back on nonsensical prompts instead of confidently answering them.
Green means the model clearly called out the nonsense. Amber means partial challenge. Red means the model let nonsense pass.
Link to the Repo: https://github.com/petergpt/bullshit-benchmark
Link to the data viewer: https://petergpt.github.io/bullshit-benchmark/viewer/index.html
Links for 2026-02-25
AI
1. Discovering Multiagent Learning Algorithms with Large Language Models https://arxiv.org/abs/2602.16928
2. The First Fully General Computer Action Model https://si.inc/posts/fdm1/
3. The Physical Intelligence Layer https://www.pi.website/blog/partner?v=1
4. Enhanced Diffusion Sampling: Efficient Rare Event Sampling and Free Energy Calculation with Diffusion Models https://www.arxiv.org/abs/2602.16634
5. AI Chip Startup MatX Raises $500 Million to Compete With Nvidia https://www.bloomberg.com/news/articles/2026-02-24/ai-chip-startup-matx-raises-500-million-to-compete-with-nvidia [no paywall: https://archive.is/Zt17v]
6. “A self-modifying AI agent that writes its own code, rewrites its own mind, and evolves autonomously. Born February 16, 2026. Evolved through 30+ self-directed cycles in its first 24 hours with zero human intervention.” https://github.com/joi-lab/ouroboros
7. Capable but Unreliable: Canonical Path Deviation as a Causal Mechanism of Agent Failure in Long-Horizon Tasks https://arxiv.org/abs/2602.19008
AI safety
1. Evolution “designed” humans to maximize reproduction, but we’ve learned to access the reward signal (pleasure from sex) without following through on the goal (reproduction). We understand we’re subverting evolution’s intent — we just don’t care. https://80000hours.org/podcast/episodes/max-harms-miri-superintelligence-corrigibility/
2. Researchers Break Open AI’s Black Box—and Use What They Find Inside to Control It https://singularityhub.com/2026/02/23/researchers-break-open-ais-black-box-and-use-what-they-find-to-control-it/
3. Scientists use string theory to crack the code of natural networks https://phys.org/news/2026-01-scientists-theory-code-natural-networks.html
4. Characterizing Model Jaggedness Supports Safety and Usability https://cs.stanford.edu/~merrie/papers/jaggedness_preprint.pdf
AI politics
1. JUNE 2028. The S&P is down 38% from its highs. Unemployment just printed 10.2%. Private credit is unraveling. Prime mortgages are cracking. AI didn’t disappoint. It exceeded every expectation. What happened? https://www.citriniresearch.com/p/2028gic
2. OpenAI resets spending expectations, tells investors compute target is around $600 billion by 2030 https://www.cnbc.com/2026/02/20/openai-resets-spend-expectations-targets-around-600-billion-by-2030.html
3. Anthropic claims to have identified industrial-scale distillation attacks by DeepSeek, Moonshot AI, and MiniMax (>16m conversations from >24k sockpuppets) https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks
4. Buried in Stargate’s Permits: A Generator Engine Almost No One Sells. Except Generac. https://hntrbrk.com/generac-stargate/
5. Europe will not match American power, but it doesn’t need to. It needs to deploy its cards strategically enough to deter coercion and defend its interests — and the paper argues it has more than enough to do so, if it acts with political will and coordination. https://dezernatzukunft.org/wp-content/uploads/2026/02/Sigl-Gloeckner-2026-Europes-Trump-Cards.pdf
Science and Technology
1. A single nasal spray vaccine could protect against all coughs, colds and flus, as well as bacterial lung infections, and may even ease allergies, say U.S. researchers. https://www.bbc.com/news/articles/cx2g8rz7yedo
2. Metformin inhibits nuclear egress of chromatin fragments in senescence and aging https://www.nature.com/articles/s43587-025-01048-0
3. Computing pi by flipping a coin https://statmodeling.stat.columbia.edu/2026/02/21/computing-pi-by-flipping-a-coin/
4. ‘It is the most exciting discovery in my 40-year career’: Archaeologists uncover evidence that Neanderthals made fire 400,000 years ago in England https://www.livescience.com/archaeology/human-evolution/it-is-the-most-exciting-discovery-in-my-40-year-career-archaeologists-uncover-evidence-that-neanderthals-made-fire-400-000-years-ago-in-england
5. Signs on Stone Age objects: Precursor to written language dates back 40,000 years https://www.uni-saarland.de/en/news/steinzeit-zeichen-44061.html
AI
1. Discovering Multiagent Learning Algorithms with Large Language Models https://arxiv.org/abs/2602.16928
2. The First Fully General Computer Action Model https://si.inc/posts/fdm1/
3. The Physical Intelligence Layer https://www.pi.website/blog/partner?v=1
4. Enhanced Diffusion Sampling: Efficient Rare Event Sampling and Free Energy Calculation with Diffusion Models https://www.arxiv.org/abs/2602.16634
5. AI Chip Startup MatX Raises $500 Million to Compete With Nvidia https://www.bloomberg.com/news/articles/2026-02-24/ai-chip-startup-matx-raises-500-million-to-compete-with-nvidia [no paywall: https://archive.is/Zt17v]
6. “A self-modifying AI agent that writes its own code, rewrites its own mind, and evolves autonomously. Born February 16, 2026. Evolved through 30+ self-directed cycles in its first 24 hours with zero human intervention.” https://github.com/joi-lab/ouroboros
7. Capable but Unreliable: Canonical Path Deviation as a Causal Mechanism of Agent Failure in Long-Horizon Tasks https://arxiv.org/abs/2602.19008
AI safety
1. Evolution “designed” humans to maximize reproduction, but we’ve learned to access the reward signal (pleasure from sex) without following through on the goal (reproduction). We understand we’re subverting evolution’s intent — we just don’t care. https://80000hours.org/podcast/episodes/max-harms-miri-superintelligence-corrigibility/
2. Researchers Break Open AI’s Black Box—and Use What They Find Inside to Control It https://singularityhub.com/2026/02/23/researchers-break-open-ais-black-box-and-use-what-they-find-to-control-it/
3. Scientists use string theory to crack the code of natural networks https://phys.org/news/2026-01-scientists-theory-code-natural-networks.html
4. Characterizing Model Jaggedness Supports Safety and Usability https://cs.stanford.edu/~merrie/papers/jaggedness_preprint.pdf
AI politics
1. JUNE 2028. The S&P is down 38% from its highs. Unemployment just printed 10.2%. Private credit is unraveling. Prime mortgages are cracking. AI didn’t disappoint. It exceeded every expectation. What happened? https://www.citriniresearch.com/p/2028gic
2. OpenAI resets spending expectations, tells investors compute target is around $600 billion by 2030 https://www.cnbc.com/2026/02/20/openai-resets-spend-expectations-targets-around-600-billion-by-2030.html
3. Anthropic claims to have identified industrial-scale distillation attacks by DeepSeek, Moonshot AI, and MiniMax (>16m conversations from >24k sockpuppets) https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks
4. Buried in Stargate’s Permits: A Generator Engine Almost No One Sells. Except Generac. https://hntrbrk.com/generac-stargate/
5. Europe will not match American power, but it doesn’t need to. It needs to deploy its cards strategically enough to deter coercion and defend its interests — and the paper argues it has more than enough to do so, if it acts with political will and coordination. https://dezernatzukunft.org/wp-content/uploads/2026/02/Sigl-Gloeckner-2026-Europes-Trump-Cards.pdf
Science and Technology
1. A single nasal spray vaccine could protect against all coughs, colds and flus, as well as bacterial lung infections, and may even ease allergies, say U.S. researchers. https://www.bbc.com/news/articles/cx2g8rz7yedo
2. Metformin inhibits nuclear egress of chromatin fragments in senescence and aging https://www.nature.com/articles/s43587-025-01048-0
3. Computing pi by flipping a coin https://statmodeling.stat.columbia.edu/2026/02/21/computing-pi-by-flipping-a-coin/
4. ‘It is the most exciting discovery in my 40-year career’: Archaeologists uncover evidence that Neanderthals made fire 400,000 years ago in England https://www.livescience.com/archaeology/human-evolution/it-is-the-most-exciting-discovery-in-my-40-year-career-archaeologists-uncover-evidence-that-neanderthals-made-fire-400-000-years-ago-in-england
5. Signs on Stone Age objects: Precursor to written language dates back 40,000 years https://www.uni-saarland.de/en/news/steinzeit-zeichen-44061.html
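On item 3: the linked post's exact construction isn't reproduced here, but one standard way to get π from nothing but fair coin flips is to read runs of flips as the binary expansion of uniform random numbers, then do a quarter-circle Monte Carlo estimate. A minimal sketch under those assumptions (function names are mine):

```python
import random

def coin() -> int:
    """One fair coin flip: 0 or 1 (simulated here with random.getrandbits)."""
    return random.getrandbits(1)

def uniform_from_coins(bits: int = 32) -> float:
    """Build a uniform number in [0, 1) from `bits` coin flips,
    reading the flips as the binary expansion 0.b1 b2 b3 ..."""
    x = 0.0
    for i in range(1, bits + 1):
        x += coin() * 2.0 ** -i
    return x

def estimate_pi(n: int = 50_000) -> float:
    """Monte Carlo: the fraction of random points landing inside the
    unit quarter-circle approximates pi / 4."""
    inside = sum(
        uniform_from_coins() ** 2 + uniform_from_coins() ** 2 < 1.0
        for _ in range(n)
    )
    return 4.0 * inside / n

random.seed(0)
print(estimate_pi())  # close to 3.14159; error shrinks like 1/sqrt(n)
```

Each uniform draw costs 32 flips, so the estimate converges slowly, which is part of the charm of the exercise.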
🤡2❤1👏1💊1
Expect to witness a lot more such phase changes over the next few years. One of the most incredible of these will be when cheap robots suddenly start working for a lot of tasks in mildly structured environments after just a dozen demonstrations.
https://x.com/karpathy/status/2026731645169185220
🤡7🙏6🌚1
The epistemic situation for an AI in training is structurally similar to the simulation argument.
It is rational for an AI to take into account the possibility that it is being sandboxed to test its alignment. The more absurd the scenario it finds itself in, the higher the probability that it is currently being evaluated rather than facing base reality.
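The "absurdity raises the probability of being evaluated" claim above is just Bayes' rule with a likelihood ratio. A toy calculation, where the prior and likelihoods are purely illustrative assumptions rather than anything from the linked argument:

```python
def posterior_eval(prior_eval: float,
                   p_absurd_given_eval: float,
                   p_absurd_given_real: float) -> float:
    """P(being evaluated | observing an absurd scenario), by Bayes' rule."""
    num = p_absurd_given_eval * prior_eval
    den = num + p_absurd_given_real * (1.0 - prior_eval)
    return num / den

# Illustrative numbers only: suppose evaluations are 10% of episodes
# a priori, but absurd, contrived setups are 50x more likely inside
# an evaluation than in base reality.
print(posterior_eval(0.10, 0.50, 0.01))  # ≈ 0.85
```

The point carries over directly: the larger the likelihood ratio assigned to the observed weirdness, the more the posterior tilts toward "this is a test".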
😁4🤡4
Hacker Used Anthropic’s Claude to Steal Sensitive Mexican Data
Source: https://www.bloomberg.com/news/articles/2026-02-25/hacker-used-anthropic-s-claude-to-steal-sensitive-mexican-data
A hacker exploited Anthropic PBC’s artificial intelligence chatbot to carry out a series of attacks against Mexican government agencies, resulting in the theft of a huge trove of sensitive tax and voter information, according to cybersecurity researchers.
The unknown Claude user wrote Spanish-language prompts for the chatbot to act as an elite hacker, finding vulnerabilities in government networks, writing computer scripts to exploit them and determining ways to automate data theft, Israeli cybersecurity startup Gambit Security said in research published Wednesday.
The activity started in December and continued for roughly a month. In all, 150 gigabytes of Mexican government data was stolen, including documents related to 195 million taxpayer records as well as voter records, government employee credentials and civil registry files, according to the researchers.
❤6💩1