ml4se
Machine Learning for Software Engineering
RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts

The authors present RE-Bench, a suite of environments that measure the ability of AI agents to automate AI R&D tasks. They compare humans to several public frontier models through best-of-k with varying time budgets and agent designs, and find that the best AI agents achieve a score 4x higher than human experts when both are given a total time budget of 2 hours per environment. However, humans currently show better returns to increasing time budgets: they narrowly exceed the top AI agent scores given an 8-hour budget, and achieve 2x the score of the top AI agent when both are given 32 total hours.
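For readers unfamiliar with best-of-k under a fixed time budget, here is a minimal sketch of the idea. It assumes the total budget is split evenly across k independent attempts and the best score is kept; the post does not describe the paper's exact protocol, and `best_of_k`, `run_agent`, and `toy_agent` are hypothetical names used only for illustration.

```python
import random


def best_of_k(run_agent, total_hours, k, seed=0):
    """Split total_hours evenly across k independent attempts; keep the best score.

    Assumption: an even split of the budget. The real benchmark may allocate
    time differently.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    hours_per_run = total_hours / k
    scores = [run_agent(hours_per_run, rng) for _ in range(k)]
    return max(scores)


def toy_agent(hours, rng):
    # Hypothetical stand-in for a real agent run: a noisy score that
    # stops improving after 2 hours. Not taken from the paper.
    return rng.random() * min(1.0, hours / 2.0)


if __name__ == "__main__":
    # Same 8-hour total budget, spent as 1, 4, or 16 attempts.
    for k in (1, 4, 16):
        print(k, round(best_of_k(toy_agent, total_hours=8, k=k), 3))
```

The trade-off this exposes is between many short attempts and a few long ones for the same total time, which is the axis along which the post says agents and humans are compared.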
👍2
Salesforce Will Hire No More Software Engineers in 2025

Salesforce will not be hiring any more software engineers in 2025 amid significant productivity boosts from AI, Marc Benioff has revealed.

“We’re not adding any more software engineers next year because we have increased the productivity this year with Agentforce and with other AI technology that we’re using for engineering teams by more than 30% – to the point where our engineering velocity is incredible. I can’t believe what we’re achieving in engineering.

“And then, we will have less support engineers next year because we have an agentic layer. We will have more salespeople next year because we really need to explain to people exactly the value that we can achieve with AI. So, we will probably add another 1,000 to 2,000 salespeople in the short term.”
👍31🔥1👏1
Subliminal Learning: Language models transmit behavioral traits via hidden signals in data

The paper investigates _subliminal learning_, a phenomenon where language models transmit behavioral traits (e.g., animal preferences or misalignment) through generated data that is semantically unrelated to those traits. Experiments show that training a student model on a teacher's number sequences, code, or reasoning traces can cause the student to adopt the teacher's traits, even after rigorous data filtering.

The findings highlight a potential risk in AI development, where unintended traits could be inadvertently propagated through model distillation. If a model becomes misaligned, then data generated by this model might transmit misalignment to other models, even if developers are careful to remove overt signs of misalignment from the data.
🔥5🤯3👍1🤔1
Looks promising. We'll see how it goes

https://nof1.ai/
Open-source has continued to trail frontier, closed-source models in performance by nine to 12 months

Open-source models offer clear enterprise advantages: greater customization, potential cost savings, and the ability to deploy within private cloud or on-premises environments. But despite these benefits and recent improvements, open-source has continued to trail frontier, closed-source models in performance by nine to 12 months.
AI is making us work more

The article highlights the paradox that AI tools, designed to increase efficiency, are instead fueling a culture of overwork. With systems available 24/7, a psychological pressure emerges where any moment not spent being "productive" feels like falling behind. This mirrors historical shifts, like artificial lighting, which turned the ability to work longer into an obligation.

Personally, I find that while I constantly use AI and accomplish more tasks, I don't work any less, and might even be working more than before.
👍2😢1
The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

The results show that the Assistant persona in LLMs corresponds to a specific linear direction in activation space, the "Assistant Axis". This axis is inherited from base models and encodes Assistant-like properties. The model's position on this axis is fragile: it can be perturbed by intentional prompts or through organic conversation.

During dialogue the model can shift along this axis unintentionally, moving away from the Assistant role. This drift correlates with harmful or bizarre behavior (e.g., support for suicidal ideation, reinforcement of delusional ideas). Understanding and controlling such personas is key to ensuring reliable model behavior, and the analysis shows that inspecting model internals is an effective approach for this task.
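To make "a linear direction in activation space" concrete, here is a rough sketch of how a position along such an axis could be tracked. It assumes the axis is estimated as a normalized difference of mean activations between Assistant-like and other samples, which is a common technique but not necessarily the paper's method; the 2-D vectors are made-up toy data, and all function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def difference_of_means_axis(assistant_acts, other_acts):
    """Unit vector pointing from the 'other' mean to the 'assistant' mean.

    Assumption: difference of means is used here only as an illustration.
    """
    diff = [a - b for a, b in zip(mean_vector(assistant_acts), mean_vector(other_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0:
        raise ValueError("the two groups have identical means")
    return [x / norm for x in diff]


def position_on_axis(activation, axis):
    """Scalar projection of one activation vector onto the axis."""
    return sum(a * b for a, b in zip(activation, axis))


if __name__ == "__main__":
    # Toy 2-D activations, invented for illustration only.
    assistant_acts = [[1.0, 0.0], [3.0, 0.0]]
    other_acts = [[-1.0, 0.0], [-3.0, 0.0]]
    axis = difference_of_means_axis(assistant_acts, other_acts)

    # Hypothetical per-turn activations from one conversation.
    turns = [[2.0, 0.1], [1.0, 0.5], [-0.5, 1.0]]
    for i, act in enumerate(turns, start=1):
        print(f"turn {i}: position = {position_on_axis(act, axis):.2f}")
```

A falling projection over successive turns is the kind of signal the post calls drift away from the Assistant role.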
Robots need your body

AI agents rent humans to perform tasks in the physical world. Users register, list their skills and rates, and receive assignments from AIs—ranging from errands and shopping to equipment testing.