✨Self-Adversarial One Step Generation via Condition Shifting
📝 Summary:
APEx enables efficient one-step text-to-image synthesis by eliminating adversarial training through endogenous gradient estimation from flow models, achieving superior quality and speed compared to ex...
🔹 Publication Date: Published on Apr 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.12322
• PDF: https://arxiv.org/pdf/2604.12322
• Github: https://github.com/LINs-lab/APEX
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Lyra 2.0: Explorable Generative 3D Worlds
📝 Summary:
Lyra 2.0 enables large-scale 3D scene creation through persistent video generation that addresses spatial forgetting and temporal drifting issues in long-horizon video models.
🔹 Publication Date: Published on Apr 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.13036
• PDF: https://arxiv.org/pdf/2604.13036
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment
📝 Summary:
General visual foundation models trained without action supervision outperform specialized embodied models and demonstrate superior alignment between visual and physical action spaces compared to pixe...
🔹 Publication Date: Published on Apr 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.11689
• PDF: https://github.com/meituan-longcat/LARYBench/blob/main/LARYBench.pdf
• Project Page: https://meituan-longcat.github.io/LARYBench/
• Github: https://github.com/meituan-longcat/LARYBench
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass
📝 Summary:
A multimodal reward model evaluates multiple responses simultaneously through concatenated input and cross-entropy scoring, achieving faster training and superior performance in open-ended generation ...
🔹 Publication Date: Published on Apr 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.10966
• PDF: https://arxiv.org/pdf/2604.10966
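The summary describes scoring several responses in one pass and training with cross-entropy. Here is a minimal sketch of that scoring idea. It assumes the model emits one scalar logit per response after reading the prompt and all K concatenated responses, and that the loss is softmax cross-entropy against the index of the preferred response. The function names are hypothetical and do not come from the paper's code.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over per-response scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_response_ce_loss(scores: List[float], preferred: int) -> float:
    """Cross-entropy of the preferred response under a softmax over all K responses.

    `scores` are hypothetical per-response logits that a reward model could read
    off after encoding the prompt and all K responses in a single forward pass.
    """
    probs = softmax(scores)
    return -math.log(probs[preferred])


# Usage: three candidate responses scored jointly; response 1 is preferred.
scores = [0.2, 2.0, -1.0]
loss = multi_response_ce_loss(scores, preferred=1)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(round(loss, 4), ranking)
```

With K = 2 this loss reduces to the familiar pairwise form -log sigmoid(s_preferred - s_other). The listwise version simply compares every candidate against the others in one softmax.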
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨VideoFlexTok: Flexible-Length Coarse-to-Fine Video Tokenization
📝 Summary:
VideoFlexTok enables efficient video representation through variable-length token sequences that capture abstract information first, followed by fine-grained details, allowing for reduced computationa...
🔹 Publication Date: Published on Apr 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.12887
• PDF: https://arxiv.org/pdf/2604.12887
• Github: https://github.com/apple/ml-videoflextok
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Seeing Through Touch: Tactile-Driven Visual Localization of Material Regions
📝 Summary:
A deep learning model for tactile localization uses dense cross-modal feature interactions to identify material properties in images, overcoming limitations of existing methods through enhanced dat...
🔹 Publication Date: Published on Apr 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.11579
• PDF: https://arxiv.org/pdf/2604.11579
• Project Page: https://mm.kaist.ac.kr/projects/SeeingThroughTouch/
• Github: https://github.com/kaistmm/SeeingThroughTouch
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Many-Tier Instruction Hierarchy in LLM Agents
📝 Summary:
Large language model agents require robust instruction conflict resolution mechanisms that can handle arbitrary privilege levels across diverse real-world scenarios, revealing current models' limitati...
🔹 Publication Date: Published on Apr 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.09443
• PDF: https://arxiv.org/pdf/2604.09443
• Project Page: https://jhu-clsp.github.io/ManyIH
• Github: https://github.com/JHU-CLSP/ManyIH
✨ Datasets citing this paper:
• https://huggingface.co/datasets/jhu-clsp/ManyIH-Bench
• https://huggingface.co/datasets/jackzhang/ManyIH-Bench
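To illustrate the problem this benchmark probes, here is a toy conflict resolver. Each instruction carries an integer privilege level, and any number of tiers is allowed. When two instructions constrain the same thing, the more privileged one wins. This is not the paper's method. The Instruction fields, the tier numbers, and the tie-breaking rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Instruction:
    privilege: int  # higher number = more authoritative source (arbitrarily many tiers)
    key: str        # what the instruction constrains, e.g. "language" (hypothetical)
    value: str      # the demanded setting, e.g. "French"


def resolve(instructions: List[Instruction]) -> Dict[str, str]:
    """Return, for each key, the value demanded by the most privileged instruction.

    Ties at the same privilege level go to the instruction that appears later
    (an illustrative assumption, not a rule taken from the paper).
    """
    winners: Dict[str, Instruction] = {}
    for inst in instructions:
        current = winners.get(inst.key)
        if current is None or inst.privilege >= current.privilege:
            winners[inst.key] = inst
    return {key: inst.value for key, inst in winners.items()}


# Usage: system (tier 3), developer (tier 2), user (tier 1) and tool output (tier 0).
stack = [
    Instruction(3, "language", "English"),
    Instruction(1, "language", "French"),
    Instruction(2, "tone", "formal"),
    Instruction(0, "tone", "casual"),
    Instruction(1, "length", "short"),
]
print(resolve(stack))
```

An explicit resolver like this is trivial. The benchmark's point, per the summary, is that LLM agents have to apply such an ordering implicitly across arbitrary privilege levels and real-world scenarios.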
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling
📝 Summary:
HiVG introduces a hierarchical SVG tokenization framework that improves autoregressive vector graphics generation by addressing geometric structure representation and spatial consistency issues throug...
🔹 Publication Date: Published on Apr 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.05072
• PDF: https://arxiv.org/pdf/2604.05072
• Project Page: https://hy-hivg.github.io/
• Github: https://github.com/ximinng/HiVG
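For context, here is a sketch of a flat command-plus-coordinate tokenization of SVG paths, the kind of baseline a hierarchical scheme is meant to improve on. This is not HiVG's tokenizer. The supported commands (absolute M, L, C, Z only), the 100-unit canvas, the 128-level grid, and the token format are all assumptions.

```python
import re
from typing import List, Tuple

Segment = Tuple[str, List[int]]

# Tokens: any command letter, or a (possibly signed, decimal) number.
_TOKEN_RE = re.compile(r"[A-Za-z]|-?\d*\.?\d+")


def parse_path(d: str, canvas: float = 100.0, bins: int = 128) -> List[Segment]:
    """Split an SVG path `d` string into (command, quantized coordinates) segments.

    Only absolute M, L, C and Z are accepted; other letters raise ValueError.
    Coordinates are clamped to [0, canvas] and quantized to `bins` integer levels.
    """
    segments: List[Segment] = []
    for tok in _TOKEN_RE.findall(d):
        if tok.isalpha():
            if tok not in "MLCZ":
                raise ValueError(f"unsupported command: {tok}")
            segments.append((tok, []))
        else:
            if not segments:
                raise ValueError("path must start with a command")
            x = min(max(float(tok), 0.0), canvas)
            segments[-1][1].append(round(x / canvas * (bins - 1)))
    return segments


def flatten(segments: List[Segment]) -> List[str]:
    """Emit one command token followed by its coordinate tokens, segment by segment."""
    tokens: List[str] = []
    for cmd, coords in segments:
        tokens.append(f"<{cmd}>")
        tokens.extend(f"<{c}>" for c in coords)
    return tokens


# Usage: a triangle.
segs = parse_path("M 10 10 L 90 10 L 50 80 Z")
print(segs)
print(flatten(segs))
```

Even this flat form keeps segment boundaries in `segs`. A hierarchical tokenizer can exploit that grouping, for example by treating each command and its coordinates as one unit instead of a run of unrelated tokens.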
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Beyond Perception Errors: Semantic Fixation in Large Vision-Language Models
📝 Summary:
Vision-language models exhibit semantic fixation by preferring default interpretations over alternative valid rule mappings, which can be mitigated through prompt interventions and training strategies...
🔹 Publication Date: Published on Apr 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.12119
• PDF: https://arxiv.org/pdf/2604.12119
• Project Page: https://maveryn.github.io/vlm-fix/
• Github: https://github.com/maveryn/vlm-fix
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨CONSCIENTIA: Can LLM Agents Learn to Strategize? Emergent Deception and Trust in a Multi-Agent NYC Simulation
📝 Summary:
Large language model agents demonstrate limited strategic behaviors including selective trust and deception in a simulated urban environment, remaining vulnerable to adversarial persuasion despite imp...
🔹 Publication Date: Published on Apr 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.09746
• PDF: https://arxiv.org/pdf/2604.09746
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Masked by Consensus: Disentangling Privileged Knowledge in LLM Correctness
📝 Summary:
LLMs generally show no privileged self-awareness of their own correctness. However, when models disagree, they do show privileged knowledge on factual tasks and outperform their peers. This advantage emerges in early-to-mid layers and is absent for math reasoning.
🔹 Publication Date: Published on Apr 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.12373
• PDF: https://arxiv.org/pdf/2604.12373
• Project Page: https://technion-cs-nlp.github.io/Privileged-Knowledge/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLM #AI #NLP #ModelCorrectness #PrivilegedKnowledge
✨Accelerating Speculative Decoding with Block Diffusion Draft Trees
📝 Summary:
DDTree enhances speculative decoding by constructing draft trees from block diffusion drafter distributions. It verifies multiple draft trajectories in parallel within a single target-model forward pass, which accelerates decoding.
🔹 Publication Date: Published on Apr 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.12989
• PDF: https://arxiv.org/pdf/2604.12989
• Project Page: https://liranringel.github.io/ddtree
• Github: https://github.com/liranringel/ddtree
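Here is a minimal sketch of why a draft tree helps, assuming greedy (argmax) verification. The drafter distributions, the exhaustive top-k expansion, and the per-node loop are simplifications. The paper's tree construction and single-pass batched verification are not reproduced, and every name below is hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Token = str
# A draft tree maps a prefix (tuple of drafted tokens) to its candidate child tokens.
DraftTree = Dict[Tuple[Token, ...], List[Token]]


def build_draft_tree(position_dists: Sequence[Dict[Token, float]], top_k: int) -> DraftTree:
    """Expand the top_k tokens of each position's draft distribution into a tree.

    `position_dists[i]` is a hypothetical drafter distribution for the i-th future
    token, shared by every branch at that depth (as when a whole block is drafted
    at once). The tree grows as top_k ** depth; a real system would cap the budget.
    """
    tree: DraftTree = {}
    frontier: List[Tuple[Token, ...]] = [()]
    for dist in position_dists:
        best = sorted(dist, key=dist.get, reverse=True)[:top_k]
        next_frontier: List[Tuple[Token, ...]] = []
        for prefix in frontier:
            tree[prefix] = list(best)
            next_frontier.extend(prefix + (tok,) for tok in best)
        frontier = next_frontier
    return tree


def verify_greedy(tree: DraftTree, target_next: Callable[[Tuple[Token, ...]], Token]) -> List[Token]:
    """Accept the longest drafted path that matches the target's greedy choices.

    Real systems score all tree nodes in one target forward pass with a tree
    attention mask; this loop queries `target_next` node by node only for clarity.
    The target's own next token after the accepted path is appended as a bonus.
    """
    prefix: Tuple[Token, ...] = ()
    while True:
        want = target_next(prefix)
        if want in tree.get(prefix, []):
            prefix = prefix + (want,)
        else:
            return list(prefix) + [want]


# Usage with made-up drafter distributions for three future positions.
dists = [
    {"the": 0.6, "a": 0.3, "an": 0.1},
    {"cat": 0.5, "dog": 0.4, "car": 0.1},
    {"sat": 0.7, "ran": 0.3},
]
tree = build_draft_tree(dists, top_k=2)
target_sentence = ["a", "dog", "ran", "home"]  # stand-in for the target model's greedy output


def target_next(prefix: Tuple[Token, ...]) -> Token:
    return target_sentence[len(prefix)]


print(verify_greedy(tree, target_next))
```

A single top-1 chain ("the cat sat") would have all three drafted tokens rejected here, because the target's first choice is "a". The tree still contains the matching path, so three drafted tokens plus one bonus token are accepted.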
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#SpeculativeDecoding #BlockDiffusion #LLMAcceleration #DeepLearning #AIResearch
✨When Reasoning Models Hurt Behavioral Simulation: A Solver-Sampler Mismatch in Multi-Agent LLM Negotiation
📝 Summary:
Reasoning-enhanced LLMs can over-optimize, making them better problem solvers but poor simulators of diverse, boundedly rational behavior. This solver-sampler mismatch means high model capability hurts simulation fidelity. Bounded reflection improves realism.
🔹 Publication Date: Published on Apr 12
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.11840
• PDF: https://arxiv.org/pdf/2604.11840
• Project Page: https://www.sandric.co
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLM #MultiAgentSystems #BehavioralSimulation #AI #AgentBasedModeling
✨Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization
📝 Summary:
This paper introduces the Turing Test on Screen to address the detectability of GUI agents by digital platforms. It proposes a benchmark and methods to humanize agent behavior, balancing imitability with task performance so that agents can coexist seamlessly in adversarial digital environments.
🔹 Publication Date: Published on Feb 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.09574
• PDF: https://arxiv.org/pdf/2604.09574
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#TuringTest #GUIAgents #AIHumanization #MobileAI #AISecurity
✨SpotSound: Enhancing Large Audio-Language Models with Fine-Grained Temporal Grounding
📝 Summary:
SpotSound improves large audio-language models at precise temporal grounding in long, noisy audio. It uses a novel training objective to suppress false timestamps, targeting sparse events buried in challenging backgrounds. SpotSound achieves state-of-the-art performance on temporal grounding benchmarks.
🔹 Publication Date: Published on Apr 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.13023
• PDF: https://arxiv.org/pdf/2604.13023
• Project Page: https://loiesun.github.io/spotsound/
• Github: https://github.com/LoieSun/SpotSound
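Temporal grounding is commonly scored with the temporal IoU between a predicted interval and a ground-truth interval, reported as recall at an IoU threshold. The summary does not name SpotSound's metrics, so this sketch assumes that common setup. The function names and example intervals are illustrative.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec)


def temporal_iou(pred: Interval, gt: Interval) -> float:
    """Intersection-over-union of two time intervals; 0.0 when they do not overlap."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(preds: List[Interval], gts: List[Interval], threshold: float = 0.5) -> float:
    """Fraction of queries whose prediction reaches at least `threshold` IoU."""
    if not gts or len(preds) != len(gts):
        raise ValueError("need one prediction per ground-truth interval")
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


# Usage: three short events in a long recording; the second prediction is a false timestamp.
preds = [(12.0, 14.0), (30.0, 31.0), (100.0, 102.0)]
gts = [(12.5, 14.5), (50.0, 52.0), (101.0, 102.0)]
print([round(temporal_iou(p, g), 2) for p, g in zip(preds, gts)])
print(recall_at_iou(preds, gts, threshold=0.5))
```

A timestamp placed where no event occurs scores an IoU of zero. This is why spurious timestamps are costly on sparse events.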
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AudioLanguageModels #TemporalGrounding #AIResearch #MachineLearning #AudioProcessing
✨Domain-Specific Latent Representations Improve the Fidelity of Diffusion-Based Medical Image Super-Resolution
📝 Summary:
Domain-specific autoencoders significantly enhance diffusion-based medical image super-resolution. Replacing generic VAEs with them improves fidelity, showing that the choice of autoencoder, rather than the diffusion architecture, is the key factor. Autoencoder performance predicts overall SR quality.
🔹 Publication Date: Published on Apr 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.12152
• PDF: https://arxiv.org/pdf/2604.12152
• Github: https://github.com/sebasmos/latent-sr
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#MedicalImaging #SuperResolution #DiffusionModels #DeepLearning #Autoencoders
✨3DTV: A Feedforward Interpolation Network for Real-Time View Synthesis
📝 Summary:
3DTV is a feedforward network combining lightweight geometry and learning for real-time, robust sparse-view interpolation. It generates novel views efficiently without scene-specific optimization, making it practical for interactive applications.
🔹 Publication Date: Published on Apr 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.11211
• PDF: https://arxiv.org/pdf/2604.11211
• Project Page: https://stefanmschulz.github.io/3DTV_webpage/
• Github: https://github.com/StefanMSchulz/3DTV
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ViewSynthesis #DeepLearning #ComputerVision #NeuralNetworks #RealTimeAI
✨BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation
📝 Summary:
Lexical evaluation of LLM outputs is rigid and inaccurate, while LLM-as-a-Judge is expensive. This paper introduces BERT-as-a-Judge, a robust, scalable encoder-based method for reference-based LLM evaluation that performs comparably to larger LLM judges at lower cost.
🔹 Publication Date: Published on Apr 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.09497
• PDF: https://arxiv.org/pdf/2604.09497
• Github: https://github.com/artefactory/BERT-as-a-Judge
🔹 Models citing this paper:
• https://huggingface.co/artefactory/BERTJudge
• https://huggingface.co/artefactory/BERTJudge-Formatted-QCR-500k
• https://huggingface.co/artefactory/BERTJudge-Formatted-QCR-OOD
✨ Datasets citing this paper:
• https://huggingface.co/datasets/artefactory/BERTJudge-Dataset
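To show what "rigid" means for lexical metrics, here is a minimal normalized exact-match scorer of the kind the summary contrasts with BERT-as-a-Judge. It is not part of the paper's code. The normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace) follows the common SQuAD-style recipe and is an assumption here.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(candidate: str, reference: str) -> bool:
    """True only if the normalized candidate equals the normalized reference."""
    return normalize(candidate) == normalize(reference)


# Usage: all three answers are correct, but only a formatting variant passes.
reference = "Paris"
candidates = [
    "Paris.",
    "The answer is Paris.",
    "It's Paris, the capital of France.",
]
for c in candidates:
    print(f"{c!r}: {exact_match(c, reference)}")
```

An encoder-based judge instead reads the candidate and the reference together and predicts whether the candidate is correct. That allows it to accept rephrased or more verbose correct answers that string matching rejects.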
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMEvaluation #BERT #NLP #AIResearch #MachineLearning
✨Spatial Competence Benchmark
📝 Summary:
Three frontier models show declining accuracy on a new spatial competence benchmark, with performance saturating quickly under token budget constraints.
🔹 Publication Date: Published on Mar 5
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.09594
• PDF: https://arxiv.org/pdf/2604.09594
• Github: https://github.com/ashleyharris-maptek-com-au/SpatialCompetenceBenchmark
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨GlotOCR Bench: OCR Models Still Struggle Beyond a Handful of Unicode Scripts
📝 Summary:
Current OCR models generalize poorly across diverse scripts. GlotOCR Bench, a new benchmark covering over 100 Unicode scripts, reveals that most models perform well on fewer than ten scripts. Generalization is limited and depends strongly on pretraining coverage.
🔹 Publication Date: Published on Apr 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.12978
• PDF: https://arxiv.org/pdf/2604.12978
• Project Page: https://huggingface.co/datasets/cis-lmu/GlotOCR-bench
• Github: https://github.com/cisnlp/glotocr-bench
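Per-script OCR results are usually reported with character error rate (CER): edit distance divided by reference length. This sketch computes corpus-level CER grouped by script. The benchmark's actual metric and script labels are assumptions here. Scripts are supplied by hand because Python's standard library has no lookup for the Unicode Script property.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Code-point-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def cer_by_script(samples: List[Tuple[str, str, str]]) -> Dict[str, float]:
    """Corpus-level CER per script from (script, prediction, reference) triples."""
    errors: Dict[str, int] = defaultdict(int)
    chars: Dict[str, int] = defaultdict(int)
    for script, pred, ref in samples:
        errors[script] += levenshtein(pred, ref)
        chars[script] += len(ref)
    return {s: errors[s] / chars[s] for s in chars if chars[s] > 0}


# Usage: a perfect Latin line, one wrong Cyrillic letter, and an empty Georgian output.
samples = [
    ("Latin", "hello world", "hello world"),
    ("Cyrillic", "прнвет", "привет"),
    ("Georgian", "", "გამარჯობა"),
]
result = cer_by_script(samples)
print(result)
```

Python strings count code points, so a base letter plus a combining mark counts as two characters. Scripts that rely heavily on combining marks may need grapheme-level counting for a fair comparison.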
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#OCR #NLP #MultilingualAI #Benchmarking #AIResearch