GEN-1: Generalist’s new embodied foundation model hits ~99% success rates on real-world manipulation tasks, adapts to interruptions in real time, and learns new tasks with surprisingly little robot-specific data.
Read more: https://generalistai.com/blog/apr-02-2026-GEN-1
🔥2🤡2
Bullets impacting glass, filmed at 10 million frames per second: fast enough to follow the 2.5 km/s shockwave the impact creates, and even to detect a surprising 13.7 km/s ripple speeding ahead.
Watch the full video: https://youtu.be/IM4zZchluX0
🔥4😱1
Anthropic is in talks over a further $30 billion funding round at a valuation of $950 billion. The AI firm is perhaps the fastest-growing company in history by valuation: It was founded just five years ago, and if the latest talks do not fall through, then it would have increased its valuation 2.5 times in just three months. Investors poured a record $300 billion into startups in the first quarter, close to 70% of all venture capital spending in 2025, with the latest bets heavily concentrated on Anthropic, OpenAI, Waymo, and xAI. Investors are also betting on AI uses beyond chatbots; the Alphabet-backed AI drug research firm Isomorphic raised $2.1 billion.
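A quick back-of-the-envelope check of what those figures imply, as a rough sketch only. The variable names are made up for illustration, "close to 70%" is treated as exactly 0.70, and the post does not say whether $950 billion is a pre-money or post-money figure, so the derived numbers are approximate.

```python
# All inputs are taken from the post above; all outputs are implied estimates.
new_valuation_bn = 950.0        # valuation under discussion, in $bn
growth_multiple = 2.5           # stated increase over three months
round_size_bn = 30.0            # size of the funding round, in $bn
q1_startup_funding_bn = 300.0   # record first-quarter startup investment, in $bn
share_of_2025_total = 0.70      # "close to 70%" of 2025 VC spending (assumed exact)

# Valuation three months earlier implied by the 2.5x multiple.
implied_prior_valuation_bn = new_valuation_bn / growth_multiple

# Full-year 2025 venture spending implied by the 70% comparison.
implied_2025_vc_total_bn = q1_startup_funding_bn / share_of_2025_total

# Size of the round relative to the headline valuation.
round_share_of_valuation = round_size_bn / new_valuation_bn

print(f"Implied valuation three months ago: ${implied_prior_valuation_bn:.0f}bn")
print(f"Implied 2025 VC total: ${implied_2025_vc_total_bn:.0f}bn")
print(f"Round as share of valuation: {round_share_of_valuation:.1%}")
```

In other words, the stated multiple implies a valuation of about $380 billion three months earlier, and the quarter's $300 billion implies roughly $430 billion of venture spending across all of 2025.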
🔥4🤡1
The number of students enrolled in the University of California, San Diego's remedial math course, which aims to bridge fundamental learning gaps, increased from 32 in fall 2020 to approximately 1,000 in fall 2025, accounting for about 12% of the entering class.
25% of UCSD remedial math students reportedly failed to solve 7 + 2 = x + 6.
Many students needing remedial math had transcripts saying they were strong math students. 42% of those below middle-school level reported completing calculus or precalculus.
Grades have become so inflated that students can reach college believing, or being told, they are calculus-ready while lacking middle-school algebra skills.
Read more: https://thezvi.substack.com/p/childhood-and-education-18-do-the
😱8🌚7😢3🤣3🙊3
The UK government's AI Security Institute received access to a newer Mythos Preview checkpoint.
On a 32-step corporate network attack that they estimate takes a human expert ~20 hours, this checkpoint completes the full attack in 6/10 attempts.
Quote: The length of tasks frontier models can autonomously complete in our narrow cyber suite has been doubling every few months. This doubling rate has become faster over time, and recent models exceeded our previous trends.
Read more: https://www.aisi.gov.uk/blog/how-fast-is-autonomous-ai-cyber-capability-advancing
🤡2🔥1
Anthropic has passed OpenAI in business adoption for the first time: https://ramp.com/leading-indicators/ai-index-may-2026
🥱2
Superstar AI researchers are paid >10× more than their frontier lab colleagues, and >100× more than most postdocs.
https://epochai.substack.com/p/the-economics-of-superstar-ai-researchers
🔥5🤔2🤡2💩1
What happens when you post a real Monet and say it’s AI? Art social experiment: https://x.com/SHL0MS/status/2054280631807316329
😁17🔥1
If the public gets its way, this will be a vastly bigger blunder than Germany's nuclear phase-out.
Whatever country steps in and allows the unrestricted construction of data centers and power plants will have massive leverage, even without any AI of its own.
Of course, the future of data centers is in space.
Source: https://news.gallup.com/poll/709772/americans-oppose-data-centers-area.aspx
🤡7❤4💊2
Can frontier AI coding agents write a complete Game Boy Advance emulator from scratch in 24 hours?
A new benchmark: https://gbaeval.com/
GPT-5.5's emulator runs games best, with Claude Sonnet 4.6 and Opus 4.7 close behind. Gemini 3.1 Pro failed to produce a working emulator.
❤3🤡2👍1
The article claims that two Ukrainian long-range drones may have autonomously selected and struck an oil facility in Rezekne, Latvia, after being knocked off course by Russian electronic warfare. If true, this could be one of the first known cases in warfare where an AI-enabled weapon chose a target without direct human control.
Janis Sarts, head of NATO’s Strategic Communications Centre of Excellence, says that the drones were programmed to seek Russian oil infrastructure, lost their way, visually identified a similar-looking oil facility in Latvia, and struck it autonomously.
A Latvian army drone expert made a similar point: long-range drones increasingly contain “the germs of artificial intelligence,” meaning they can search for preprogrammed target types, but may not reliably understand borders or context. In this interpretation, the incident was not Russia redirecting the drones, but an AI targeting failure.
Read more: https://www.theglobeandmail.com/world/article-latvian-government-collapses-after-ukrainian-drones-possibly/ [no paywall: https://archive.is/3jwvg]
🤔3👍2🍌1😭1
Links for 2026-05-15
AI
1. Multi-Stream LLMs: parallel thoughts, inputs and outputs. https://arxiv.org/abs/2605.12460
2. Efficient pre-training with Token Superposition. https://arxiv.org/abs/2605.06546
3. physics-intern: agentic physics research; Gemini 3.1 Pro rises from 17.7% to 31.4% on CritPt. https://huggingface.co/spaces/huggingface/physics-intern
4. Prime Intellect automated nanogpt-speedruns: GPT 5.5 and Opus 4.7 beat the human baseline after 10k runs / 14k H200 hours; Opus record: 2930 steps. https://www.primeintellect.ai/auto-nanogpt
5. Poetiq’s Meta-System built its own coding harness and reached SOTA on LiveCodeBench Pro using standard APIs. https://poetiq.ai/posts/recursive_self_improvement_coding/
6. Google Aletheia appears to have found a proof strategy for a Kirby/K3 open problem; humans checked and wrote it up. https://arxiv.org/abs/2605.08122
7. Flux Matching: generative modeling beyond diffusion / score functions. https://arxiv.org/abs/2605.07319
8. Codex with GPT-5.5 xhigh found a KL-cache trick for full-vocab distillation. https://jonathanc.net/blog/kl-cache-trick
9. DeepMind partners with EVE Online for AI model testing. https://arstechnica.com/gaming/2026/05/google-deepmind-partners-with-eve-online-for-ai-model-testing/
10. Major researchers join a $4B self-improving AI effort. https://www.nytimes.com/2026/05/13/technology/recursive-superintelligence-funding-ai.html [no paywall: https://archive.is/fV8as]
11. Anthropic passes OpenAI in business adoption. https://ramp.com/leading-indicators/ai-index-may-2026
12. Anthropic sketches two possible 2028 worlds with transformative AI. https://www.anthropic.com/research/2028-ai-leadership
13. Bessent expects a big LLM “step-function jump” from Gemini and OpenAI. https://www.cnbc.com/2026/05/14/us-china-ai-rules-bessent-us-lead.html
14. AI diffusion may not be automatic; frontier access could be rationed by trust, money, compute and geopolitics. https://writing.antonleicht.iss.one/p/cut-off
COMPUTER SCIENCE
1. Zero-knowledge proofs meet Gödelian unknowability. https://www.quantamagazine.org/how-unknowable-math-can-help-hide-secrets-20260511/
2. Air-gapped data exfiltration via CPU workloads and magnetic signals. https://arxiv.org/abs/1802.02700
BEHAVIOR
1. Selected-for vs intentional, world-model-based behavior. https://www.lesswrong.com/posts/GhhNswGB6butBhmE6/optimisation-selective-versus-predictive
2. Sawtooth Problems. https://www.lesswrong.com/posts/iyLirpAeQotmZK4QC/sawtooth-problems
ENGINEERING
1. Organ-scale rewarming for reversible cryopreservation using alternating magnetic fields. https://www.untillabs.com/blog/rewarming
2. Fiber optic cables can eavesdrop on nearby conversations. https://www.science.org/content/article/fiber-optic-cables-can-eavesdrop-nearby-conversations
3. Rapid atomic rearrangement to “reprogram” materials. https://news.mit.edu/2026/researchers-reprogram-materials-quickly-rearranging-their-atoms-0513
PHYSICS
1. “Negative time”: postselected photons through atom clouds can yield a backward-pointing weak-measurement clock reading. https://singularityhub.com/2026/05/14/physicists-have-measured-negative-time-in-the-lab/
2. Largest physicist survey: deep disagreement on the Big Bang, quantum measurement, many worlds, string theory and foundations. [PDF] https://nafshordi.github.io/aps-dashboard/APS_survey_Arxiv_paper.pdf
MISCELLANEOUS
1. Nostalgebraist’s theory of taste. https://www.astralcodexten.com/p/nostalgebraists-hydrogen-jukeboxes
2. Three easy proofs of Pythagoras’ Theorem. https://cameroncounts.wordpress.com/2026/05/08/three-easy-proofs-of-pythagoras-theorem/
🤡2❤1
To build intuition: suppose you walk past a geyser, and see a sign saying “This geyser last erupted 100,000 years ago”. You know nothing else about geysers. What’s the chance it will erupt in the next hour? It must be very low, right? If it erupted in the next hour, you would have walked past it 99.99999% of the way through its eruption cycle - in other words, your random sample had a higher value than 99.99999% of points. That’s not how random samples usually work! On the other hand, suppose you walk past another geyser, and see a sign saying “This geyser last erupted 10 minutes ago”. What is the chance that this geyser will erupt in the next hour? Pretty high, right? It seems like this geyser’s eruptions occur on a scale of every few minutes. When you calculate it out, your median prediction for the length of time until the next eruption should just be the number on the sign. In the same way, your median prediction for how long it should take before an entirely-mysterious trend changes shape should be the amount of time since the last change.
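The "calculate it out" step in the quote can be sketched in a few lines. This is an illustration under one stated assumption, not the article's own derivation: you arrive at a uniformly random fraction f of the full interval between two changes. Then the remaining time is elapsed * (1 - f) / f, and the chance that the next change arrives within a horizon h is h / (elapsed + h). The function name is hypothetical.

```python
def prob_change_within(elapsed: float, horizon: float) -> float:
    """Chance the next change arrives within `horizon`, given `elapsed` since the last.

    Assumption: the observation falls at a uniformly random fraction f of the
    whole interval. Remaining time <= horizon exactly when
    f >= elapsed / (elapsed + horizon), which has probability
    horizon / (elapsed + horizon).
    """
    if elapsed <= 0 or horizon < 0:
        raise ValueError("elapsed must be positive and horizon non-negative")
    return horizon / (elapsed + horizon)


HOURS_PER_YEAR = 365.25 * 24

# Geyser that last erupted 100,000 years ago: chance of erupting in the next hour.
old_geyser = prob_change_within(100_000 * HOURS_PER_YEAR, 1.0)   # roughly 1.1e-9

# Geyser that last erupted 10 minutes ago: chance of erupting in the next hour.
young_geyser = prob_change_within(10 / 60, 1.0)                  # 6/7, about 0.86

# Setting horizon == elapsed gives exactly 0.5, so the median wait equals
# the time already elapsed, matching the rule quoted above.
median_check = prob_change_within(10.0, 10.0)

print(old_geyser, young_geyser, median_check)
```

The same function applies to any "entirely-mysterious trend": plug in the time since its last change of shape as `elapsed`.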
The Sigmoids Won’t Save You https://www.astralcodexten.com/p/the-sigmoids-wont-save-you
🤡2
Desire for children has collapsed among young people in China.
Figure: Percentage of young adults aged 18-24 who desire no children, total and by gender
Paper: https://www.researchsquare.com/article/rs-8921502/v1
Thread: https://x.com/i/status/2055228466069385368
👍4
Claude Mythos looks like a major capability jump on ExploitBench.
The benchmark tests whether AI models can take known bugs in Chrome’s V8 engine and turn them into working exploits. V8 is widely used and heavily defended, so this is a serious target.
Most tested models never reached the top level: full arbitrary code execution. GPT-5.5 reached it only in a tiny number of cases.
Claude Mythos Preview was in a different league: 18/41 bugs in the baseline setting, and 21/41 including nudges.
The striking part: the baseline did not use a polished agent product like Claude Code or Codex, just a minimal uniform tool harness. So this seems to reflect the model’s own reasoning more than a fancy wrapper.
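To put the two Mythos counts in perspective, here is a small sketch that turns them into success rates with a 95% Wilson score interval. The interval is a standard textbook choice added here for illustration, not something the benchmark reports, and it treats the 41 bugs as independent trials, which is an assumption.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half_width, center + half_width


# Counts from the post: full arbitrary code execution out of 41 V8 bugs.
results = {"baseline": (18, 41), "with nudges": (21, 41)}

for setting, (solved, total) in results.items():
    low, high = wilson_interval(solved, total)
    print(f"{setting}: {solved}/{total} = {solved / total:.1%} "
          f"(95% Wilson interval {low:.1%} to {high:.1%})")

baseline_low, baseline_high = wilson_interval(18, 41)
nudged_low, nudged_high = wilson_interval(21, 41)
```

With only 41 bugs the two intervals overlap heavily, so the gap between the baseline and nudged settings is much less certain than the gap to models that almost never reached the top level.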
Read more: https://exploitbench.ai/
🤡2👍1🤯1😢1💔1