ML Research Hub

✨BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation

📝 Summary:
Lexical LLM evaluation is rigid and inaccurate, while LLM-as-a-Judge is expensive. This paper introduces BERT-as-a-Judge, a robust, scalable encoder-driven method for reference-based LLM evaluation. It performs comparably to larger LLM judges at lower cost.
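
🔹 Illustrative sketch (not from the paper): one way to see why lexical matching is rigid is a normalized exact-match check. The `normalize` and `exact_match` helpers and the example strings are assumptions made for illustration; they are not taken from the BERT-as-a-Judge code.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def exact_match(candidate: str, reference: str) -> bool:
    """Lexical check: the candidate must equal the reference after normalization."""
    return normalize(candidate) == normalize(reference)


reference = "Paris"
terse = "paris."
verbose = "The capital of France is Paris."

print(exact_match(terse, reference))    # True
print(exact_match(verbose, reference))  # False, although the answer is correct
```

The verbose answer is correct but fails the lexical check, which is the kind of case a learned judge is meant to handle.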

🔹 Publication Date: Published on Apr 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.09497
• PDF: https://arxiv.org/pdf/2604.09497
• Github: https://github.com/artefactory/BERT-as-a-Judge

🔹 Models citing this paper:
• https://huggingface.co/artefactory/BERTJudge
• https://huggingface.co/artefactory/BERTJudge-Formatted-QCR-500k
• https://huggingface.co/artefactory/BERTJudge-Formatted-QCR-OOD

🔹 Datasets citing this paper:
• https://huggingface.co/datasets/artefactory/BERTJudge-Dataset

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#LLMEvaluation #BERT #NLP #AIResearch #MachineLearning

✨Spatial Competence Benchmark

📝 Summary:
Three frontier models show declining accuracy on a new spatial competence benchmark, with performance saturating quickly under token budget constraints. Spatial competence is the ...

🔹 Publication Date: Published on Mar 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.09594
• PDF: https://arxiv.org/pdf/2604.09594
• Github: https://github.com/ashleyharris-maptek-com-au/SpatialCompetenceBenchmark

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨GlotOCR Bench: OCR Models Still Struggle Beyond a Handful of Unicode Scripts

📝 Summary:
Current OCR models generalize poorly across diverse scripts. GlotOCR Bench, a new benchmark covering over 100 Unicode scripts, reveals that most models perform well on fewer than ten scripts. Generalization is limited and depends strongly on pretraining coverage.

🔹 Publication Date: Published on Apr 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.12978
• PDF: https://arxiv.org/pdf/2604.12978
• Project Page: https://huggingface.co/datasets/cis-lmu/GlotOCR-bench
• Github: https://github.com/cisnlp/glotocr-bench

==================================

#OCR #NLP #MultilingualAI #Benchmarking #AIResearch

✨LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety

📝 Summary:
Language-Agnostic Semantic Alignment (LASA) addresses LLM safety gaps across languages by targeting semantic bottlenecks where representations are primarily driven by shared semantics rather than lang...

🔹 Publication Date: Published on Apr 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.12710
• PDF: https://arxiv.org/pdf/2604.12710

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨PokeRL: Reinforcement Learning for Pokemon Red

📝 Summary:
PokeRL presents a modular reinforcement learning system with environment wrapping, anti-loop mechanisms, and hierarchical rewards to train agents for early-game Pokemon Red tasks.
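
🔹 Illustrative sketch (not from the paper): the summary does not say how the anti-loop mechanisms work. One generic way to discourage loops is to penalize states that were seen within a recent window. The `LoopPenalty` class, its window size, and its penalty value are hypothetical and are not taken from the PokeRL code.

```python
from collections import deque


class LoopPenalty:
    """Reward shaper that penalizes revisiting a recently seen state (hypothetical)."""

    def __init__(self, window: int = 8, penalty: float = -1.0):
        self.recent = deque(maxlen=window)  # sliding window of recent states
        self.penalty = penalty

    def __call__(self, state, reward: float) -> float:
        repeated = state in self.recent
        self.recent.append(state)
        return reward + self.penalty if repeated else reward


shaper = LoopPenalty(window=3)
# States could be, for example, (map_id, x, y) tuples; strings keep the demo short.
shaped = [shaper(s, 0.0) for s in ["a", "b", "a", "c", "d", "a"]]
print(shaped)  # [0.0, 0.0, -1.0, 0.0, 0.0, -1.0]
```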

🔹 Publication Date: Published on Apr 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.10812
• PDF: https://arxiv.org/pdf/2604.10812
• Github: https://github.com/reddheeraj/PokemonRL

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨Learning Versatile Humanoid Manipulation with Touch Dreaming

📝 Summary:
A multimodal Transformer architecture that integrates tactile sensing with visual and proprioceptive data enables high-dexterity humanoid manipulation through contact-aware learning and predictive mod...

🔹 Publication Date: Published on Apr 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.13015
• PDF: https://arxiv.org/pdf/2604.13015
• Project Page: https://humanoid-touch-dream.github.io/
• Github: https://github.com/chrisyrniu/humanoid-touch-dream

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨Do Thought Streams Matter? Evaluating Reasoning in Gemini Vision-Language Models for Video Scene Understanding

📝 Summary:
This research examines how internal reasoning traces affect video scene understanding in Gemini models. Quality improvements from extended reasoning plateau quickly, with Flash Lite offering the best balance. Under tight reasoning budgets, some content may not be reasoned about at all.

🔹 Publication Date: Published on Apr 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.11177
• PDF: https://arxiv.org/pdf/2604.11177
• Project Page: https://github.com/video-db/gemini-reasoning-eval
• Github: https://github.com/video-db/gemini-reasoning-eval

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨Parcae: Scaling Laws For Stable Looped Language Models

📝 Summary:
Looped architectures can improve model quality but suffer from instability. Parcae, a new stable looped architecture, addresses this by constraining spectral norms. It achieves up to 6.3% lower perplexity and shows superior scaling properties, matching the quality of much larger Transformers.
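
🔹 Illustrative sketch (not from the paper): the summary only says that Parcae constrains spectral norms. The toy below shows why that helps in general: a linear block applied repeatedly blows up when its largest singular value exceeds 1, and stays bounded once the weights are rescaled. The matrix `W`, the purely linear block, and the rescaling rule are assumptions; Parcae's actual parameterization may differ.

```python
import math


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def transpose(W):
    return [list(col) for col in zip(*W)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def spectral_norm(W, iters=100):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    Wt = transpose(W)
    v = [1.0] * len(W[0])
    for _ in range(iters):
        v = matvec(Wt, matvec(W, v))
        n = norm(v)
        v = [x / n for x in v]
    return norm(matvec(W, v))


def constrain(W, max_norm=1.0):
    """Rescale W so that its spectral norm is at most max_norm."""
    scale = min(1.0, max_norm / spectral_norm(W))
    return [[w * scale for w in row] for row in W]


def loop(W, h, steps):
    """Apply the same linear block repeatedly, as a looped model reuses one layer."""
    for _ in range(steps):
        h = matvec(W, h)
    return h


W = [[1.2, 0.3], [0.0, 1.1]]
h0 = [1.0, 1.0]
raw = norm(loop(W, h0, 50))                # grows roughly like 1.2**50
stable = norm(loop(constrain(W), h0, 50))  # stays bounded by norm(h0)
print(raw > 1000, stable <= norm(h0) + 1e-6)
```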

🔹 Publication Date: Published on Apr 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.12946
• PDF: https://arxiv.org/pdf/2604.12946
• Project Page: https://sandyresearch.github.io/parcae/
• Github: https://github.com/sandyresearch/parcae/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents

📝 Summary:
Computer-use agents face significant safety vulnerabilities under unintended attack conditions where benign instructions lead to harmful outcomes through contextual or execution-based risks, with atta...

🔹 Publication Date: Published on Apr 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.10577
• PDF: https://arxiv.org/pdf/2604.10577
• Project Page: https://limenlp.github.io/OS_Blind/
• Github: https://github.com/limenlp/OS_Blind

🔹 Datasets citing this paper:
• https://huggingface.co/datasets/lime-nlp/OS-Blind

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨Seedance 2.0: Advancing Video Generation for World Complexity

📝 Summary:
Seedance 2.0 is a new multi-modal audio-video generation model supporting text, image, audio, and video inputs. It offers improved generation quality and speed through a unified architecture, performing on par with leading models. It generates 4-15 second content at 480p/720p.

🔹 Publication Date: Published on Apr 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.14148
• PDF: https://arxiv.org/pdf/2604.14148

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration

📝 Summary:
A multi-agent system automates the complete lifecycle of large language model training by coordinating research and execution modules through iterative planning and experimentation.

🔹 Publication Date: Published on Apr 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.14116
• PDF: https://arxiv.org/pdf/2604.14116
• Project Page: https://github.com/trex-project

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models

📝 Summary:
OccuBench presents a comprehensive benchmark for evaluating AI agents across 100 professional domains using Language World Models to simulate real-world environments with controlled fault injection.

🔹 Publication Date: Published on Apr 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.10866
• PDF: https://arxiv.org/pdf/2604.10866
• Project Page: https://gregxmhu.github.io/OccuBench-website/
• Github: https://github.com/GregxmHu/OccuBench

🔹 Datasets citing this paper:
• https://huggingface.co/datasets/gregH/OccuBench

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding

📝 Summary:
UI-Zoomer is a training-free adaptive zoom-in framework for GUI grounding that improves localization accuracy by selectively triggering zoom-in based on prediction uncertainty quantification.
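
🔹 Illustrative sketch (not from the paper): a minimal uncertainty gate, assuming the grounding model can be sampled several times for click coordinates. If the samples agree, their mean is accepted; if they scatter, a padded crop around them is proposed for a second pass. The `spread`, `zoom_box`, and `decide` helpers, the threshold, and the margin are all hypothetical; the summary does not specify how UI-Zoomer quantifies uncertainty.

```python
import statistics


def spread(points):
    """Largest per-axis standard deviation of the sampled click points, in pixels."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return max(statistics.pstdev(xs), statistics.pstdev(ys))


def zoom_box(points, margin, screen):
    """Bounding box around the samples, padded by margin and clamped to the screen."""
    width, height = screen
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(0, min(xs) - margin), max(0, min(ys) - margin),
            min(width, max(xs) + margin), min(height, max(ys) + margin))


def decide(points, threshold=20.0, margin=50, screen=(1920, 1080)):
    """Accept the mean prediction when samples agree; otherwise propose a zoom region."""
    if spread(points) <= threshold:
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        return ("accept", (statistics.fmean(xs), statistics.fmean(ys)))
    return ("zoom", zoom_box(points, margin, screen))


agreeing = [(100, 200), (102, 198), (101, 201)]
scattered = [(100, 200), (400, 260), (250, 900)]
print(decide(agreeing)[0])   # accept
print(decide(scattered))     # ('zoom', (50, 150, 450, 950))
```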

🔹 Publication Date: Published on Apr 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.14113
• PDF: https://arxiv.org/pdf/2604.14113
• Project Page: https://zju-real.github.io/UI-Zoomer/
• Github: https://github.com/ZJU-REAL/UI-Zoomer

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨ROSE: Retrieval-Oriented Segmentation Enhancement

📝 Summary:
A new segmentation task focusing on novel and emerging entities is introduced along with a retrieval-augmented framework that enhances multimodal language models with real-time information and visual ...

🔹 Publication Date: Published on Apr 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.14147
• PDF: https://arxiv.org/pdf/2604.14147
• Project Page: https://henghuiding.com/ROSE/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨InfiniteScienceGym: An Unbounded, Procedurally-Generated Benchmark for Scientific Analysis

📝 Summary:
InfiniteScienceGym presents a procedurally generated benchmark for evaluating scientific reasoning in language models, addressing limitations of traditional benchmarks through deterministic repository...

🔹 Publication Date: Published on Apr 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.13201
• PDF: https://arxiv.org/pdf/2604.13201

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation

📝 Summary:
A self-distillation framework converts implicit 3D knowledge from video diffusion models into an explicit 3D Gaussian Splatting representation, enabling 3D scene generation from text or images.

🔹 Publication Date: Published on Sep 23, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.19296
• PDF: https://arxiv.org/pdf/2509.19296
• Project Page: https://research.nvidia.com/labs/toronto-ai/lyra/
• Github: https://github.com/nv-tlabs/lyra

🔹 Models citing this paper:
• https://huggingface.co/nvidia/Lyra

🔹 Datasets citing this paper:
• https://huggingface.co/datasets/nvidia/PhysicalAI-SpatialIntelligence-Lyra-SDG

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments

📝 Summary:
SpatialEvo is a self-evolving framework for 3D spatial reasoning that uses deterministic geometric environments to provide objective feedback, enabling efficient training without relying on model cons...

🔹 Publication Date: Published on Apr 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.14144
• PDF: https://arxiv.org/pdf/2604.14144
• Github: https://github.com/ZJU-REAL/SpatialEvo

🔹 Models citing this paper:
• https://huggingface.co/lidingm/SpatialEvo-3B
• https://huggingface.co/lidingm/SpatialEvo-7B

🔹 Datasets citing this paper:
• https://huggingface.co/datasets/lidingm/SpatialEvo-160K

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨TIP: Token Importance in On-Policy Distillation

📝 Summary:
Token selection for on-policy knowledge distillation is improved by identifying informative tokens through student entropy and teacher-student divergence, enabling efficient training with reduced...
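
🔹 Illustrative sketch (not from the paper): per-token student entropy and teacher-student divergence can be computed directly from the two next-token distributions. How TIP combines them is not stated in the summary, so the additive score, the KL direction (teacher to student), and the top-k rule in `select_tokens` are assumptions, and the toy distributions are made up.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def kl(p, q):
    """KL(p || q); assumes q > 0 wherever p > 0."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)


def select_tokens(student, teacher, keep):
    """Return the positions of the `keep` highest-scoring tokens (hypothetical rule)."""
    scores = [entropy(s) + kl(t, s) for s, t in zip(student, teacher)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


# One distribution over a 3-word vocabulary per token position.
student = [
    [0.98, 0.01, 0.01],  # confident and in agreement with the teacher
    [0.34, 0.33, 0.33],  # uncertain
    [0.90, 0.05, 0.05],  # confident but disagrees with the teacher
]
teacher = [
    [0.97, 0.02, 0.01],
    [0.80, 0.10, 0.10],
    [0.05, 0.90, 0.05],
]
print(select_tokens(student, teacher, keep=2))  # [1, 2]
```

Position 0 is dropped because the student is both confident and close to the teacher, so it carries little training signal under this scoring rule.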

🔹 Publication Date: Published on Apr 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.14084
• PDF: https://arxiv.org/pdf/2604.14084
• Github: https://github.com/HJSang/OPSD_OnPolicyDistillation

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨MERRIN: A Benchmark for Multimodal Evidence Retrieval and Reasoning in Noisy Web Environments

📝 Summary:
MERRIN is a human-annotated benchmark for evaluating search-augmented agents in multimodal, noisy web environments, demonstrating significant challenges in retrieving and reasoning over diverse eviden...

🔹 Publication Date: Published on Apr 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.13418
• PDF: https://arxiv.org/pdf/2604.13418
• Project Page: https://merrin-benchmark.github.io
• Github: https://merrin-benchmark.github.io

🔹 Datasets citing this paper:
• https://huggingface.co/datasets/HanNight/MERRIN

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization

📝 Summary:
UI-Copilot is a collaborative framework that enhances GUI agents by decoupling memory management and integrating on-demand tool assistance for improved performance in complex user interface tasks.

🔹 Publication Date: Published on Apr 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.13822
• PDF: https://arxiv.org/pdf/2604.13822
• Github: https://github.com/ZJU-REAL/UI-Copilot

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time

📝 Summary:
Training reward models to generate multi-dimensional critiques improves visual generation through both enhanced reinforcement learning rewards and test-time refinement loops, achieving state-of-the-ar...

🔹 Publication Date: Published on Apr 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.11626
• PDF: https://arxiv.org/pdf/2604.11626
• Project Page: https://tiger-ai-lab.github.io/RationalRewards/
• Github: https://github.com/TIGER-AI-Lab/RationalRewards

🔹 Models citing this paper:
• https://huggingface.co/TIGER-Lab/RationalRewards-8B-T2I
• https://huggingface.co/TIGER-Lab/RationalRewards-8B-Edit

🔹 Datasets citing this paper:
• https://huggingface.co/datasets/TIGER-Lab/RationalRewards_DiffusionNFT_TrainData

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
