BREAKING 🚨: A new Gemini checkpoint has been spotted in A/B testing.
Will we see this live?
h/t x@marmaduke091
Who will crash benchmarks this week?
Anonymous Poll
22%
49%
31%
11%
BREAKING 🚨: OPENAI ANNOUNCED OPENAI FRONTIER, A NEW ENTERPRISE PLATFORM TO CREATE AND MANAGE AI COWORKERS.
"Frontier gives agents the same skills people need to succeed at work: Understand how work gets done, Use a computer and tools, Improve quality over time, Stay governed & observable"
The biggest part:
"Built-in ways to evaluate and optimise performance make it clear to human managers and AI coworkers what’s working and what isn’t, so good behaviours improve over time. Over time, AI coworkers learn what good looks like and get better at the work that matters most."
BREAKING 🚨: A BIG DROP IS EXPECTED FOR CODEX TODAY! CODEX GITHUB ALSO DOESN’T STATE “LATEST” NEXT TO GPT-5.2 ANYMORE.
TestingCatalog AI News
Perplexity launches Advanced Deep Research for Max users
Perplexity launched the DRACO Benchmark to publicly assess AI research tools on real-world tasks across ten domains. It measures accuracy, depth, presentation, and sourcing, with initial results showing…
BREAKING 🚨: PERPLEXITY LAUNCHES MODEL COUNCIL, A NEW MODE WHERE GEMINI 3 PRO, OPUS 4.5 AND GPT 5.2 WILL WORK AS A SWARM OF ASYNC AGENTS ON A GIVEN TASK.
Perplexity MAX
BREAKING 🚨: CLAUDE OPUS 4.6 IS ROLLING OUT ON THE WEB, APPS AND DESKTOP! TESTING TIME 🔥
BREAKING 🚨: Claude Opus 4.6 has been officially announced. Opus 4.6 comes with improved performance across various agentic, research and coding tasks.
What would you test first?
Opus 4.6 comes with a big improvement in agentic search, agentic financial analysis and office tasks.
"Financial professionals use AI to research across multiple data sources, support financial analyses, and create deliverables that their teams and customers can act on."
BREAKING 🚨: GPT-5.3-CODEX IS ROLLING OUT ON CODEX CLI AND DESKTOP APP! COMPETITION AT SCALE 🔥
BREAKING 🚨: GPT-5.3-CODEX WAS USED TO SUPPORT CREATING ITSELF, ACCORDING TO OPENAI'S BLOG!
It achieves a SOTA score of 57% on SWE Bench Pro and 76% on TerminalBench.
"With GPT-5.3-Codex, Codex goes from an agent that can write and review code to an agent that can do nearly anything developers and professionals can do on a computer."
OpenAI opens up Trusted Access framework to accelerate cyber defence.
GPT-5.3-Codex was the first model to hit a "High" on OpenAI's preparedness framework.
Shit is about to get real
Claude subscribers can claim $50 worth of credits for TESTING Claude Opus 4.6!
Claim it
Claude Opus 4.6 is the new SOTA model on the ARC-AGI-2 benchmark with a score of 68.8%.
The next leap
Anthropic readies upgraded Claude voice mode for desktop
Anthropic is testing upgraded voice functionality for Claude across web and mobile, alongside a new knowledge base feature to organize and retain conversation context. A near-term launch is possible, potentially aligning with upcoming marketing efforts.
#claude
TestingCatalog
Anthropic readies upgraded voice mode for Claude desktop
Anthropic is testing upgraded voice mode for the desktop app and a new knowledge base feature for Claude.
BREAKING 🚨: Anthropic prepares an upgraded voice mode for Claude desktop and mobile!
Here is an early look at how it works
Anthropic keeps working on Knowledge Bases, as a new "Save to knowledge base" button has been spotted in testing. Isn't this a continuous learning solution?
Save button triggers this prompt
It turns out that Telegram boosts vaporise quite quickly. The channel is 5 boosts short of level 3, which enables auto translations, and 5 more will be lost later in February.
Do you have these sparks? ⚡
https://t.iss.one/boost/testingcatalog
OpenAI debuts Frontier to deploy AI agents for enterprise users
OpenAI launched Frontier, an enterprise platform to build and manage AI agents as coworkers across business systems. It integrates existing tools and data sources and supports feedback-driven learning.
#chatgpt
TestingCatalog
OpenAI debuts Frontier to deploy AI agents for enterprise users
OpenAI launches Frontier, a new enterprise platform for deploying AI coworkers across real business systems, now available to select customers with partners onboard.