✨Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition
📝 Summary:
A 115M-parameter FastConformer-Transducer model achieves low-latency Thai speech recognition with reduced computational cost through text normalization and curriculum learning, accompanied by a benchm...
🔹 Publication Date: Published on Jan 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.13044
• PDF: https://arxiv.org/pdf/2601.13044
🔹 Models citing this paper:
• https://huggingface.co/typhoon-ai/typhoon-asr-realtime
• https://huggingface.co/typhoon-ai/typhoon-isan-asr-realtime
• https://huggingface.co/typhoon-ai/typhoon-whisper-turbo
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/typhoon-ai/gigaspeech2-typhoon
• https://huggingface.co/datasets/typhoon-ai/TVSpeech
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨sangkuriang: A pseudo-spectral Python library for Korteweg-de Vries soliton simulation
📝 Summary:
The Korteweg-de Vries (KdV) equation serves as a foundational model in nonlinear wave physics, describing the balance between dispersive spreading and nonlinear steepening that gives rise to solitons....
🔹 Publication Date: Published on Jan 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.12029
• PDF: https://arxiv.org/pdf/2601.12029
• Project Page: https://pypi.org/project/sangkuriang-ideal-solver/
• Github: https://github.com/sandyherho/sangkuriang-ideal-solver
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨UltraRAG: A Modular and Automated Toolkit for Adaptive Retrieval-Augmented Generation
📝 Summary:
UltraRAG is a RAG toolkit automating knowledge adaptation across the entire workflow from data to evaluation. It provides a user-friendly WebUI, enabling non-coders to build and optimize RAG systems for diverse scenarios.
🔹 Publication Date: Published on Mar 31, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.08761
• PDF: https://arxiv.org/pdf/2504.08761
• Github: https://github.com/OpenBMB/UltraRAG
==================================
#RAG #AI #LLMs #Automation #DataScience
✨Behavior Knowledge Merge in Reinforced Agentic Models
📝 Summary:
Reinforced Agent Merging (RAM) improves the merging of RL-trained agents by distinguishing shared parameters from task-specific ones. This preserves critical behaviors, outperforms baselines, and unlocks synergistic performance beyond that of the specialized agents.
🔹 Publication Date: Published on Jan 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.13572
• PDF: https://arxiv.org/pdf/2601.13572
• Project Page: https://xiangchi-yuan.github.io/ram-project/
• Github: https://github.com/xiangchi-yuan/mrl
==================================
#ReinforcementLearning #MultiAgentSystems #ArtificialIntelligence #DeepLearning #AgenticModels
✨Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models
📝 Summary:
Benign fine-tuning can cause privacy collapse in language models. Models lose contextual privacy reasoning despite maintaining high performance, leading to severe vulnerabilities. This silent failure reveals a critical gap in current safety evaluations for specialized agents.
🔹 Publication Date: Published on Jan 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15220
• PDF: https://arxiv.org/pdf/2601.15220
• Github: https://github.com/parameterlab/privacy-collapse
==================================
#LLM #Privacy #AIsafety #FineTuning #AIsecurity
✨FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning
📝 Summary:
Chroma 1.0 is the first open-source real-time end-to-end spoken dialogue model with personalized voice cloning. It achieves low-latency interaction and high-fidelity voice synthesis, improving speaker similarity by 10.96% over a human baseline.
🔹 Publication Date: Published on Jan 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.11141
• PDF: https://arxiv.org/pdf/2601.11141
• Project Page: https://www.flashlabs.ai/flashai-voice-agents
• Github: https://github.com/FlashLabs-AI-Corp/FlashLabs-Chroma
🔹 Models citing this paper:
• https://huggingface.co/FlashLabs/Chroma-4B
==================================
#ConversationalAI #VoiceCloning #RealTimeAI #OpenSourceAI #TTS
✨Show me the evidence: Evaluating the role of evidence and natural language explanations in AI-supported fact-checking
📝 Summary:
This study found that non-expert users consistently relied on evidence to validate AI claims in fact-checking. While natural language explanations reduced evidence use, participants still turned to evidence if explanations seemed flawed or insufficient. Evidence is a key ingredient for evaluating...
🔹 Publication Date: Published on Jan 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.11387
• PDF: https://arxiv.org/pdf/2601.11387
==================================
#AI #FactChecking #ExplainableAI #Evidence #InformationCredibility
✨GutenOCR: A Grounded Vision-Language Front-End for Documents
📝 Summary:
GutenOCR enhances vision-language models for document understanding, unifying reading, detection, and grounding via a prompt-based interface. It significantly improves grounded OCR, region and line-level OCR, and text detection on diverse documents.
🔹 Publication Date: Published on Jan 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.14490
• PDF: https://arxiv.org/pdf/2601.14490
🔹 Models citing this paper:
• https://huggingface.co/rootsautomation/GutenOCR-3B
==================================
#OCR #VisionLanguageModels #DocumentAI #ComputerVision #DeepLearning
✨CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning
📝 Summary:
The paper addresses unreliable multilingual medical reasoning in LLMs, especially for underrepresented languages. It introduces CURE-MED, a curriculum-informed reinforcement learning framework, and the CUREMED-BENCH dataset. CURE-MED significantly improves language consistency and logical correctness...
🔹 Publication Date: Published on Jan 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.13262
• PDF: https://arxiv.org/pdf/2601.13262
• Project Page: https://cure-med.github.io/
🔹 Models citing this paper:
• https://huggingface.co/Aikyam-Lab/CURE-MED-1.5B
• https://huggingface.co/Aikyam-Lab/CURE-MED-3B
• https://huggingface.co/Aikyam-Lab/CURE-MED-7B
==================================
#LLMs #MedicalAI #ReinforcementLearning #MultilingualNLP #AIResearch
✨Implicit Neural Representation Facilitates Unified Universal Vision Encoding
📝 Summary:
This paper unifies image representation learning for both recognition and generation. It uses a hyper-network for implicit neural representation with knowledge distillation to create compressed embeddings. The model achieves state-of-the-art results and enables generative capabilities.
🔹 Publication Date: Published on Jan 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.14256
• PDF: https://arxiv.org/pdf/2601.14256
• Github: https://github.com/tiktok/huvr
==================================
#ComputerVision #DeepLearning #GenerativeAI #RepresentationLearning #VisionEncoding
✨Motion 3-to-4: 3D Motion Reconstruction for 4D Synthesis
📝 Summary:
Motion 3-to-4 synthesizes 4D dynamic objects from monocular video by separating static 3D shape generation from motion reconstruction. It uses a canonical mesh and a transformer to predict temporally coherent vertex trajectories, achieving superior fidelity.
🔹 Publication Date: Published on Jan 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.14253
• PDF: https://arxiv.org/pdf/2601.14253
🔹 Models citing this paper:
• https://huggingface.co/River-Chen/Motion324
==================================
#3DReconstruction #4DSynthesis #ComputerVision #DeepLearning #MotionCapture
✨The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
📝 Summary:
Arbitrary-order generation in diffusion LLMs (dLLMs) surprisingly limits reasoning by causing premature collapse of the solution space. This occurs because dLLMs exploit the flexibility to bypass crucial, high-uncertainty tokens. Standard Group Relative Policy Optimization without arbitrary order is more effectiv...
🔹 Publication Date: Published on Jan 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15165
• PDF: https://arxiv.org/pdf/2601.15165
• Project Page: https://nzl-thu.github.io/the-flexibility-trap
• Github: https://github.com/LeapLabTHU/JustGRPO
==================================
#LLM #DiffusionModels #NLP #AIResearch #MachineLearning
✨Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model
📝 Summary:
Stable-DiffCoder uses block diffusion continual pretraining to significantly outperform autoregressive code models. It achieves superior performance on a wide range of code benchmarks, enhancing structured code modeling and benefiting low-resource languages.
🔹 Publication Date: Published on Jan 22
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15892
• PDF: https://arxiv.org/pdf/2601.15892
• Project Page: https://bytedance-seed.github.io/Stable-DiffCoder/
• Github: https://github.com/ByteDance-Seed/Stable-DiffCoder
🔹 Models citing this paper:
• https://huggingface.co/ByteDance-Seed/Stable-DiffCoder-8B-Instruct
• https://huggingface.co/ByteDance-Seed/Stable-DiffCoder-8B-Base
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨LLM-in-Sandbox Elicits General Agentic Intelligence
📝 Summary:
LLM-in-Sandbox enables large language models to explore a code sandbox, eliciting general agentic intelligence across diverse domains without additional training. LLMs spontaneously access resources, handle long contexts, and execute scripts, showing robust generalization. These capabilities can ...
🔹 Publication Date: Published on Jan 22
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.16206
• PDF: https://arxiv.org/pdf/2601.16206
• Project Page: https://llm-in-sandbox.github.io
• Github: https://github.com/llm-in-sandbox/llm-in-sandbox
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Learning to Discover at Test Time
📝 Summary:
Test-time training enables AI systems to discover optimal solutions for specific scientific problems through continual learning focused on individual challenges rather than generalization.
🔹 Publication Date: Published on Jan 22
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.16175
• PDF: https://arxiv.org/pdf/2601.16175
• Project Page: https://test-time-training.github.io/discover/
• Github: https://github.com/test-time-training/discover
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Qwen3-TTS Technical Report
📝 Summary:
The Qwen3-TTS series presents advanced multilingual text-to-speech models with voice cloning and controllable speech generation capabilities, utilizing dual-track LM architecture and specialized speec...
🔹 Publication Date: Published on Jan 22
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15621
• PDF: https://arxiv.org/pdf/2601.15621
• Github: https://github.com/QwenLM/Qwen3-TTS
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation
📝 Summary:
An advanced vision encoder named OpenVision 3 learns a unified visual representation for both image understanding and generation by combining VAE-compressed image latents with ViT architecture and joi...
🔹 Publication Date: Published on Jan 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15369
• PDF: https://arxiv.org/pdf/2601.15369
• Project Page: https://ucsc-vlaa.github.io/OpenVision3/
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
📝 Summary:
Terminal-Bench 2.0 presents a challenging benchmark with 89 terminal-based tasks to evaluate AI agents' capabilities in real-world scenarios. AI agents may soon become capable of ...
🔹 Publication Date: Published on Jan 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.11868
• PDF: https://arxiv.org/pdf/2601.11868
• Github: https://github.com/laude-institute/terminal-bench
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing
📝 Summary:
A novel fine-grained composed image retrieval benchmark is introduced through image editing techniques, revealing significant capability gaps in existing multimodal models and exposing limitations of ...
🔹 Publication Date: Published on Jan 22
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.16125
• PDF: https://arxiv.org/pdf/2601.16125
• Github: https://github.com/SighingSnow/edir
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨BayesianVLA: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries
📝 Summary:
VLA models struggle to generalize because of Information Collapse, in which the language instruction is ignored. BayesianVLA addresses this with a Bayesian decomposition and latent action queries. It optimizes conditional pointwise mutual information (PMI) to penalize vision shortcuts, significantly improving out-of-distribution generalization.
🔹 Publication Date: Published on Jan 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15197
• PDF: https://arxiv.org/pdf/2601.15197
• Project Page: https://github.com/ZGC-EmbodyAI/BayesianVLA
• Github: https://github.com/ZGC-EmbodyAI/BayesianVLA
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨SAMTok: Representing Any Mask with Two Words
📝 Summary:
SAMTok enables pixel-wise capabilities in multi-modal LLMs through discrete mask tokenization and standard training methods, achieving state-of-the-art performance on various vision-language tasks.
🔹 Publication Date: Published on Jan 22
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.16093
• PDF: https://arxiv.org/pdf/2601.16093
• Project Page: https://github.com/bytedance/Sa2VA/tree/main/projects/samtok
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research