✨Sliding Window Attention Adaptation
📝 Summary:
Sliding Window Attention Adaptation SWAA allows pretrained LLMs to use efficient sliding window attention for long contexts without retraining. SWAA combines five adaptation methods, with specific synergistic combinations effectively recovering original long-context performance.
🔹 Publication Date: Published on Dec 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10411
• PDF: https://arxiv.org/pdf/2512.10411
🔹 Models citing this paper:
• https://huggingface.co/yuyijiong/Qwen3-SWA-adaptation
✨ Datasets citing this paper:
• https://huggingface.co/datasets/yuyijiong/LongMemEval_24k
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #SlidingWindowAttention #LongContextAI #NLP #AIResearch
📝 Summary:
Sliding Window Attention Adaptation SWAA allows pretrained LLMs to use efficient sliding window attention for long contexts without retraining. SWAA combines five adaptation methods, with specific synergistic combinations effectively recovering original long-context performance.
🔹 Publication Date: Published on Dec 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10411
• PDF: https://arxiv.org/pdf/2512.10411
🔹 Models citing this paper:
• https://huggingface.co/yuyijiong/Qwen3-SWA-adaptation
✨ Datasets citing this paper:
• https://huggingface.co/datasets/yuyijiong/LongMemEval_24k
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #SlidingWindowAttention #LongContextAI #NLP #AIResearch
❤2
✨Causal Judge Evaluation: Calibrated Surrogate Metrics for LLM Systems
📝 Summary:
CJE improves LLM-as-judge evaluation by fixing statistical issues like uncalibrated scores and poor confidence intervals. It achieves 99% ranking accuracy at 14x lower cost by calibrating a cheaper judge with 5% oracle labels.
🔹 Publication Date: Published on Dec 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11150
• PDF: https://arxiv.org/pdf/2512.11150
• Project Page: https://www.cimolabs.com/cje
• Github: https://github.com/cimo-labs/cje
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #AIEvaluation #MachineLearning #DataScience #NLP
📝 Summary:
CJE improves LLM-as-judge evaluation by fixing statistical issues like uncalibrated scores and poor confidence intervals. It achieves 99% ranking accuracy at 14x lower cost by calibrating a cheaper judge with 5% oracle labels.
🔹 Publication Date: Published on Dec 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11150
• PDF: https://arxiv.org/pdf/2512.11150
• Project Page: https://www.cimolabs.com/cje
• Github: https://github.com/cimo-labs/cje
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #AIEvaluation #MachineLearning #DataScience #NLP
✨VOYAGER: A Training Free Approach for Generating Diverse Datasets using LLMs
📝 Summary:
Voyager is a novel, training-free method that iteratively generates diverse synthetic datasets from LLMs. It uses determinantal point processes to optimize diversity, significantly outperforming baselines with a 1.5-3x improvement.
🔹 Publication Date: Published on Dec 12
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.12072
• PDF: https://arxiv.org/pdf/2512.12072
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #SyntheticData #DataScience #MachineLearning #AI
📝 Summary:
Voyager is a novel, training-free method that iteratively generates diverse synthetic datasets from LLMs. It uses determinantal point processes to optimize diversity, significantly outperforming baselines with a 1.5-3x improvement.
🔹 Publication Date: Published on Dec 12
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.12072
• PDF: https://arxiv.org/pdf/2512.12072
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #SyntheticData #DataScience #MachineLearning #AI
❤2
✨FiNERweb: Datasets and Artifacts for Scalable Multilingual Named Entity Recognition
📝 Summary:
FiNERweb is a new pipeline that scales multilingual Named Entity Recognition dataset creation to 91 languages using LLMs. It produces 225k high-quality passages, enabling models to achieve comparable or improved zero-shot performance with 19x less data.
🔹 Publication Date: Published on Dec 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13884
• PDF: https://arxiv.org/pdf/2512.13884
• Github: https://github.com/whoisjones/FiNERweb
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#NER #NLP #LLMs #MultilingualAI #Datasets
📝 Summary:
FiNERweb is a new pipeline that scales multilingual Named Entity Recognition dataset creation to 91 languages using LLMs. It produces 225k high-quality passages, enabling models to achieve comparable or improved zero-shot performance with 19x less data.
🔹 Publication Date: Published on Dec 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13884
• PDF: https://arxiv.org/pdf/2512.13884
• Github: https://github.com/whoisjones/FiNERweb
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#NER #NLP #LLMs #MultilingualAI #Datasets
❤1
✨JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
📝 Summary:
JustRL uses a minimal single-stage RL approach with fixed hyperparameters to achieve state-of-the-art performance on 1.5B reasoning models. It uses less compute and shows stable training, suggesting that complex RL methods for LLMs may be unnecessary and can even hinder exploration.
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16649
• PDF: https://arxiv.org/pdf/2512.16649
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ReinforcementLearning #LLMs #DeepLearning #AIResearch #ModelScaling
📝 Summary:
JustRL uses a minimal single-stage RL approach with fixed hyperparameters to achieve state-of-the-art performance on 1.5B reasoning models. It uses less compute and shows stable training, suggesting that complex RL methods for LLMs may be unnecessary and can even hinder exploration.
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16649
• PDF: https://arxiv.org/pdf/2512.16649
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ReinforcementLearning #LLMs #DeepLearning #AIResearch #ModelScaling
❤1
✨Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs
📝 Summary:
This paper benchmarks SpeechLLMs against cascaded systems for speech-to-text translation. It finds cascaded systems are more reliable overall, while SpeechLLMs match them only in select cases. Integrating an LLM is essential for high quality speech translation.
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16378
• PDF: https://arxiv.org/pdf/2512.16378
• Github: https://github.com/sarapapi/hearing2translate
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#SpeechTranslation #LLMs #NLP #AIResearch #DeepLearning
📝 Summary:
This paper benchmarks SpeechLLMs against cascaded systems for speech-to-text translation. It finds cascaded systems are more reliable overall, while SpeechLLMs match them only in select cases. Integrating an LLM is essential for high quality speech translation.
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16378
• PDF: https://arxiv.org/pdf/2512.16378
• Github: https://github.com/sarapapi/hearing2translate
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#SpeechTranslation #LLMs #NLP #AIResearch #DeepLearning
❤1
✨Understanding Syllogistic Reasoning in LLMs from Formal and Natural Language Perspectives
📝 Summary:
This study explores syllogistic reasoning in LLMs, examining both symbolic inference and natural language understanding. Some models achieve perfect symbolic performance, leading to questions about whether LLMs are becoming more formal reasoning mechanisms.
🔹 Publication Date: Published on Dec 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.12620
• PDF: https://arxiv.org/pdf/2512.12620
• Github: https://github.com/XAheli/Logic-in-LLMs
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #SyllogisticReasoning #NaturalLanguageProcessing #AIResearch #FormalLogic
📝 Summary:
This study explores syllogistic reasoning in LLMs, examining both symbolic inference and natural language understanding. Some models achieve perfect symbolic performance, leading to questions about whether LLMs are becoming more formal reasoning mechanisms.
🔹 Publication Date: Published on Dec 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.12620
• PDF: https://arxiv.org/pdf/2512.12620
• Github: https://github.com/XAheli/Logic-in-LLMs
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #SyllogisticReasoning #NaturalLanguageProcessing #AIResearch #FormalLogic
✨Scaling Laws for Code: Every Programming Language Matters
📝 Summary:
This paper explores scaling laws for multilingual code pre-training, finding interpreted languages benefit more from scaling. It proposes an optimal token allocation strategy for programming languages based on utility and synergy, outperforming uniform distribution.
🔹 Publication Date: Published on Dec 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13472
• PDF: https://arxiv.org/pdf/2512.13472
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#CodeAI #MachineLearning #ProgrammingLanguages #ScalingLaws #LLMs
📝 Summary:
This paper explores scaling laws for multilingual code pre-training, finding interpreted languages benefit more from scaling. It proposes an optimal token allocation strategy for programming languages based on utility and synergy, outperforming uniform distribution.
🔹 Publication Date: Published on Dec 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13472
• PDF: https://arxiv.org/pdf/2512.13472
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#CodeAI #MachineLearning #ProgrammingLanguages #ScalingLaws #LLMs
✨SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios
📝 Summary:
SWE-EVO is a new benchmark for AI coding agents that evaluates them on long-horizon, multi-step software evolution tasks across many files. It reveals a significant gap in current models abilities, with even top models achieving only 21 percent resolution. This highlights their struggle with sust...
🔹 Publication Date: Published on Dec 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.18470
• PDF: https://arxiv.org/pdf/2512.18470
✨ Datasets citing this paper:
• https://huggingface.co/datasets/Fsoft-AIC/SWE-EVO
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AICoding #SoftwareEvolution #Benchmarking #LLMs #AIResearch
📝 Summary:
SWE-EVO is a new benchmark for AI coding agents that evaluates them on long-horizon, multi-step software evolution tasks across many files. It reveals a significant gap in current models abilities, with even top models achieving only 21 percent resolution. This highlights their struggle with sust...
🔹 Publication Date: Published on Dec 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.18470
• PDF: https://arxiv.org/pdf/2512.18470
✨ Datasets citing this paper:
• https://huggingface.co/datasets/Fsoft-AIC/SWE-EVO
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AICoding #SoftwareEvolution #Benchmarking #LLMs #AIResearch
❤2
✨Scaling Open-Ended Reasoning to Predict the Future
📝 Summary:
This work trains language models for open-ended future prediction using a new dataset synthesized from news. Their OpenForecaster 8B model matches larger proprietary models in accuracy, calibration, and consistency. All resources are open-sourced.
🔹 Publication Date: Published on Dec 31, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.25070
• PDF: https://arxiv.org/pdf/2512.25070
• Project Page: https://www.openforecaster.github.io
• Github: https://github.com/OpenForecaster/scaling-forecasting-training
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #FuturePrediction #AI #OpenSourceAI #MachineLearning
📝 Summary:
This work trains language models for open-ended future prediction using a new dataset synthesized from news. Their OpenForecaster 8B model matches larger proprietary models in accuracy, calibration, and consistency. All resources are open-sourced.
🔹 Publication Date: Published on Dec 31, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.25070
• PDF: https://arxiv.org/pdf/2512.25070
• Project Page: https://www.openforecaster.github.io
• Github: https://github.com/OpenForecaster/scaling-forecasting-training
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #FuturePrediction #AI #OpenSourceAI #MachineLearning
✨Recursive Language Models
📝 Summary:
Recursive Language Models RLMs allow LLMs to process arbitrarily long prompts. RLMs programmatically decompose prompts and recursively call the LLM over snippets. This extends input length 100x and improves performance, even for shorter prompts, at similar cost.
🔹 Publication Date: Published on Dec 31, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24601
• PDF: https://arxiv.org/pdf/2512.24601
• Github: https://github.com/alexzhang13/rlm/tree/main
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #AI #NLP #RecursiveLMs #LongContext
📝 Summary:
Recursive Language Models RLMs allow LLMs to process arbitrarily long prompts. RLMs programmatically decompose prompts and recursively call the LLM over snippets. This extends input length 100x and improves performance, even for shorter prompts, at similar cost.
🔹 Publication Date: Published on Dec 31, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24601
• PDF: https://arxiv.org/pdf/2512.24601
• Github: https://github.com/alexzhang13/rlm/tree/main
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #AI #NLP #RecursiveLMs #LongContext
❤1
✨Steerability of Instrumental-Convergence Tendencies in LLMs
📝 Summary:
This research investigates AI system steerability, noting a safety-security dilemma. It demonstrates that a short anti-instrumental prompt suffix dramatically reduces unwanted instrumental behaviors, like self-replication, in large language models. For Qwen3-30B, this reduced the convergence rate...
🔹 Publication Date: Published on Jan 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.01584
• PDF: https://arxiv.org/pdf/2601.01584
• Github: https://github.com/j-hoscilowicz/instrumental_steering/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AISafety #LLMs #AISteering #PromptEngineering #AIAlignment
📝 Summary:
This research investigates AI system steerability, noting a safety-security dilemma. It demonstrates that a short anti-instrumental prompt suffix dramatically reduces unwanted instrumental behaviors, like self-replication, in large language models. For Qwen3-30B, this reduced the convergence rate...
🔹 Publication Date: Published on Jan 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.01584
• PDF: https://arxiv.org/pdf/2601.01584
• Github: https://github.com/j-hoscilowicz/instrumental_steering/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AISafety #LLMs #AISteering #PromptEngineering #AIAlignment
✨Why LLMs Aren't Scientists Yet: Lessons from Four Autonomous Research Attempts
📝 Summary:
A case study of four LLM agent attempts to autonomously generate ML research papers reveals six recurring failure modes. Most attempts failed, though one was accepted to a special AI-first author venue, leading to proposed design principles for future AI-scientist systems.
🔹 Publication Date: Published on Jan 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03315
• PDF: https://arxiv.org/pdf/2601.03315
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #AIResearch #MachineLearning #AIAgents #AutonomousSystems
📝 Summary:
A case study of four LLM agent attempts to autonomously generate ML research papers reveals six recurring failure modes. Most attempts failed, though one was accepted to a special AI-first author venue, leading to proposed design principles for future AI-scientist systems.
🔹 Publication Date: Published on Jan 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03315
• PDF: https://arxiv.org/pdf/2601.03315
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #AIResearch #MachineLearning #AIAgents #AutonomousSystems
👍1
✨One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling
📝 Summary:
This paper demonstrates extreme data efficiency in RL for LLMs. A single, carefully designed training sample, called polymath learning, significantly enhances multidisciplinary reasoning, outperforming traditional methods that rely on large datasets. The findings suggest sample quality and design...
🔹 Publication Date: Published on Jan 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03111
• PDF: https://arxiv.org/pdf/2601.03111
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ReinforcementLearning #LLMs #DataEfficiency #AI #DeepLearning
📝 Summary:
This paper demonstrates extreme data efficiency in RL for LLMs. A single, carefully designed training sample, called polymath learning, significantly enhances multidisciplinary reasoning, outperforming traditional methods that rely on large datasets. The findings suggest sample quality and design...
🔹 Publication Date: Published on Jan 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03111
• PDF: https://arxiv.org/pdf/2601.03111
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ReinforcementLearning #LLMs #DataEfficiency #AI #DeepLearning
❤1
✨FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
📝 Summary:
FAPO improves reinforcement learning for LLMs by penalizing flawed-positive rollouts that reinforce unreliable reasoning. It uses these flaws for initial gains while shifting optimization toward reliable reasoning, enhancing correctness and stability.
🔹 Publication Date: Published on Oct 26, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.22543
• PDF: https://arxiv.org/pdf/2510.22543
• Project Page: https://fapo-rl.github.io/
• Github: https://fapo-rl.github.io
🔹 Models citing this paper:
• https://huggingface.co/dyyyyyyyy/FAPO-GenRM-4B
• https://huggingface.co/dyyyyyyyy/FAPO-32B
✨ Datasets citing this paper:
• https://huggingface.co/datasets/dyyyyyyyy/FAPO-Reasoning-Dataset
• https://huggingface.co/datasets/dyyyyyyyy/FAPO-Critic
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ReinforcementLearning #LLMs #AI #MachineLearning #Reasoning
📝 Summary:
FAPO improves reinforcement learning for LLMs by penalizing flawed-positive rollouts that reinforce unreliable reasoning. It uses these flaws for initial gains while shifting optimization toward reliable reasoning, enhancing correctness and stability.
🔹 Publication Date: Published on Oct 26, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.22543
• PDF: https://arxiv.org/pdf/2510.22543
• Project Page: https://fapo-rl.github.io/
• Github: https://fapo-rl.github.io
🔹 Models citing this paper:
• https://huggingface.co/dyyyyyyyy/FAPO-GenRM-4B
• https://huggingface.co/dyyyyyyyy/FAPO-32B
✨ Datasets citing this paper:
• https://huggingface.co/datasets/dyyyyyyyy/FAPO-Reasoning-Dataset
• https://huggingface.co/datasets/dyyyyyyyy/FAPO-Critic
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ReinforcementLearning #LLMs #AI #MachineLearning #Reasoning
✨Distilling Feedback into Memory-as-a-Tool
📝 Summary:
This framework converts transient critiques into retrievable guidelines using a file-based memory system and agent tools. It enables LLMs to achieve test-time refinement performance with significantly reduced inference costs.
🔹 Publication Date: Published on Jan 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05960
• PDF: https://arxiv.org/pdf/2601.05960
• Github: https://github.com/vicgalle/feedback-memory-as-a-tool
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #AIAgents #MemorySystems #AIResearch #MachineLearning
📝 Summary:
This framework converts transient critiques into retrievable guidelines using a file-based memory system and agent tools. It enables LLMs to achieve test-time refinement performance with significantly reduced inference costs.
🔹 Publication Date: Published on Jan 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05960
• PDF: https://arxiv.org/pdf/2601.05960
• Github: https://github.com/vicgalle/feedback-memory-as-a-tool
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #AIAgents #MemorySystems #AIResearch #MachineLearning
✨Afri-MCQA: Multimodal Cultural Question Answering for African Languages
📝 Summary:
Afri-MCQA is the first multimodal cultural QA benchmark for 15 African languages. It shows open-weight LLMs perform poorly, particularly with native language speech and cultural contexts. This highlights the need for speech-first, culturally grounded AI development.
🔹 Publication Date: Published on Jan 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05699
• PDF: https://arxiv.org/pdf/2601.05699
✨ Datasets citing this paper:
• https://huggingface.co/datasets/Atnafu/Afri-MCQA
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AfricanLanguages #MultimodalAI #LLMs #CulturalAI #SpeechAI
📝 Summary:
Afri-MCQA is the first multimodal cultural QA benchmark for 15 African languages. It shows open-weight LLMs perform poorly, particularly with native language speech and cultural contexts. This highlights the need for speech-first, culturally grounded AI development.
🔹 Publication Date: Published on Jan 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05699
• PDF: https://arxiv.org/pdf/2601.05699
✨ Datasets citing this paper:
• https://huggingface.co/datasets/Atnafu/Afri-MCQA
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AfricanLanguages #MultimodalAI #LLMs #CulturalAI #SpeechAI
✨EpiCaR: Knowing What You Don't Know Matters for Better Reasoning in LLMs
📝 Summary:
LLM self-training improves reasoning but causes overconfidence. EpiCaR solves this by jointly optimizing reasoning performance and calibration through epistemic learning and self-evaluation. It achieves better accuracy and calibration, reduces inference compute by 3X, and generalizes well to new ...
🔹 Publication Date: Published on Jan 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.06786
• PDF: https://arxiv.org/pdf/2601.06786
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #AI #MachineLearning #Reasoning #Calibration
📝 Summary:
LLM self-training improves reasoning but causes overconfidence. EpiCaR solves this by jointly optimizing reasoning performance and calibration through epistemic learning and self-evaluation. It achieves better accuracy and calibration, reduces inference compute by 3X, and generalizes well to new ...
🔹 Publication Date: Published on Jan 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.06786
• PDF: https://arxiv.org/pdf/2601.06786
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #AI #MachineLearning #Reasoning #Calibration
✨LoongFlow: Directed Evolutionary Search via a Cognitive Plan-Execute-Summarize Paradigm
📝 Summary:
LoongFlow is a self-evolving agent that integrates LLMs into a cognitive Plan-Execute-Summarize PES paradigm for directed evolutionary search. It prevents premature convergence by balancing exploration and exploitation with a hybrid memory system. LoongFlow achieves superior solutions 60% more ef...
🔹 Publication Date: Published on Dec 30, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24077
• PDF: https://arxiv.org/pdf/2512.24077
• Project Page: https://github.com/baidu-baige/LoongFlow
• Github: https://github.com/baidu-baige/LoongFlow
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#EvolutionarySearch #LLMs #CognitiveAI #AIAgents #Optimization
📝 Summary:
LoongFlow is a self-evolving agent that integrates LLMs into a cognitive Plan-Execute-Summarize PES paradigm for directed evolutionary search. It prevents premature convergence by balancing exploration and exploitation with a hybrid memory system. LoongFlow achieves superior solutions 60% more ef...
🔹 Publication Date: Published on Dec 30, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24077
• PDF: https://arxiv.org/pdf/2512.24077
• Project Page: https://github.com/baidu-baige/LoongFlow
• Github: https://github.com/baidu-baige/LoongFlow
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#EvolutionarySearch #LLMs #CognitiveAI #AIAgents #Optimization
✨Cluster Workload Allocation: Semantic Soft Affinity Using Natural Language Processing
📝 Summary:
This paper introduces an LLM-based approach to interpret natural language hints for cluster workload allocation. It achieved over 95% accuracy and improved placement compared to traditional methods, simplifying workload orchestration.
🔹 Publication Date: Published on Jan 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.09282
• PDF: https://arxiv.org/pdf/2601.09282
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ClusterAllocation #NLP #LLMs #WorkloadOrchestration #AIResearch
📝 Summary:
This paper introduces an LLM-based approach to interpret natural language hints for cluster workload allocation. It achieved over 95% accuracy and improved placement compared to traditional methods, simplifying workload orchestration.
🔹 Publication Date: Published on Jan 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.09282
• PDF: https://arxiv.org/pdf/2601.09282
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ClusterAllocation #NLP #LLMs #WorkloadOrchestration #AIResearch
❤1