ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Efficient Reasoning with Hidden Thinking

31 Jan 2025 · Xuan Shen, Yizhou Wang, Xiangxi Shi, Yanzhi Wang, Pu Zhao, Jiuxiang Gu ·

Chain-of-Thought (CoT) reasoning has become a powerful framework for improving complex problem-solving capabilities in Multimodal Large Language Models (MLLMs). However, the verbose nature of textual reasoning introduces significant inefficiencies. In this work, we propose Heima (as hidden llama), an efficient reasoning framework that leverages reasoning CoTs in the hidden latent space. We design the Heima Encoder to condense each intermediate CoT into a compact, higher-level hidden representation using a single thinking token, effectively minimizing verbosity and reducing the overall number of tokens required during the reasoning process. Meanwhile, we design the corresponding Heima Decoder with traditional Large Language Models (LLMs) to adaptively interpret the hidden representations into variable-length textual sequences, reconstructing reasoning processes that closely resemble the original CoTs. Experimental results across diverse reasoning MLLM benchmarks demonstrate that the Heima model achieves higher generation efficiency while maintaining, or even improving, zero-shot task accuracy. Moreover, the effective reconstruction of multimodal reasoning processes with the Heima Decoder validates both the robustness and interpretability of our approach.
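The idea can be pictured with a toy sketch: an encoder pools the many hidden states of a verbose CoT into a single "thinking token" vector, and a lightweight decoder expands that vector back into a variable-length sequence. The module shapes, attention pooling, and GRU decoder below are illustrative assumptions for a minimal runnable example, not the authors' implementation.

```python
# Toy sketch of "hidden thinking": compress a multi-token CoT into one vector,
# then expand it back. All sizes/modules are assumptions for illustration only.
import torch
import torch.nn as nn

class ToyHeimaEncoder(nn.Module):
    """Condenses a sequence of CoT hidden states into one thinking-token vector."""
    def __init__(self, d_model: int = 256):
        super().__init__()
        self.attn_pool = nn.Linear(d_model, 1)  # learned attention weights for pooling

    def forward(self, cot_hidden: torch.Tensor) -> torch.Tensor:
        # cot_hidden: (batch, cot_len, d_model)
        weights = torch.softmax(self.attn_pool(cot_hidden), dim=1)  # (batch, cot_len, 1)
        return (weights * cot_hidden).sum(dim=1)                    # (batch, d_model)

class ToyHeimaDecoder(nn.Module):
    """Expands the compact thinking token back into a variable-length sequence."""
    def __init__(self, d_model: int = 256, vocab: int = 32000):
        super().__init__()
        self.gru = nn.GRU(d_model, d_model, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, thinking_token: torch.Tensor, steps: int = 8) -> torch.Tensor:
        # Feed the same compact representation at each step; a real decoder would
        # condition a full LLM on it instead.
        inp = thinking_token.unsqueeze(1).repeat(1, steps, 1)  # (batch, steps, d_model)
        out, _ = self.gru(inp)
        return self.lm_head(out)                                # (batch, steps, vocab)

if __name__ == "__main__":
    cot = torch.randn(2, 40, 256)            # 40 verbose CoT states per example
    z = ToyHeimaEncoder()(cot)               # 1 compact thinking token each
    logits = ToyHeimaDecoder()(z, steps=8)   # reconstructed short rationale
    print(z.shape, logits.shape)
```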

Paper: https://arxiv.org/pdf/2501.19201v1.pdf

Code: https://github.com/shawnricecake/heima

Datasets: MMBench - MM-Vet - MathVista - MMStar - HallusionBench

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://t.iss.one/DataScienceT
Data Formulator 2: Iteratively Creating Rich Visualizations with AI

28 Aug 2024 · Chenglong Wang, Bongshin Lee, Steven Drucker, Dan Marshall, Jianfeng Gao ·

To create rich visualizations, data analysts often need to iterate back and forth between data processing and chart specification to achieve their goals. Doing so requires not only proficiency in data transformation and visualization tools but also effort to manage a branching history of many different versions of data and charts. Recent LLM-powered AI systems have greatly improved visualization authoring experiences, for example by mitigating manual data transformation barriers via LLMs' code generation ability. However, these systems do not work well for iterative visualization authoring, because they often require analysts to provide, in a single turn, a text-only prompt that fully describes the complex visualization task to be performed, which is unrealistic for both users and models in many cases. In this paper, we present Data Formulator 2, an LLM-powered visualization system designed to address these challenges. With Data Formulator 2, users describe their visualization intent with blended UI and natural language inputs, and data transformations are delegated to the AI. To support iteration, Data Formulator 2 lets users navigate their iteration history and reuse previous designs towards new ones, so they don't need to start from scratch every time. In a user study with eight participants, we observed that Data Formulator 2 allows participants to develop their own iteration strategies to complete challenging data exploration sessions.
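The interaction pattern described above (state intent in natural language, let an LLM produce the data transformation, feed the result to a chart) can be sketched minimally as follows. The `ask_llm` helper is a hypothetical stub standing in for a real model call so the example runs; Data Formulator 2 itself wraps this loop in a full UI with iteration history.

```python
# Minimal sketch of delegating a data transformation to LLM-generated pandas code.
# `ask_llm` is a hypothetical placeholder, stubbed so the example runs as-is.
import pandas as pd

def ask_llm(intent: str, columns: list[str]) -> str:
    # Stand-in for a real LLM call that returns pandas code satisfying the intent.
    return "result = df.groupby('year', as_index=False)['sales'].sum()"

df = pd.DataFrame({"year": [2022, 2022, 2023], "sales": [10, 5, 12]})
code = ask_llm("total sales per year", list(df.columns))
scope = {"df": df}
exec(code, scope)                      # run the generated transformation
chart_spec = {"mark": "bar", "x": "year", "y": "sales", "data": scope["result"]}
print(chart_spec["data"])
```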

Paper: https://arxiv.org/pdf/2408.16119v1.pdf

Code: https://github.com/microsoft/data-formulator

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://t.iss.one/DataScienceT
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation

5 Aug 2024 · Daniel Fleischer, Moshe Berchansky, Moshe Wasserblat, Peter Izsak ·

Implementing Retrieval-Augmented Generation (RAG) systems is inherently complex, requiring deep understanding of data, use cases, and intricate design decisions. Additionally, evaluating these systems presents significant challenges, necessitating assessment of both retrieval accuracy and generative quality through a multi-faceted approach. We introduce RAG Foundry, an open-source framework for augmenting large language models for RAG use cases. RAG Foundry integrates data creation, training, inference and evaluation into a single workflow, facilitating the creation of data-augmented datasets for training and evaluating large language models in RAG settings. This integration enables rapid prototyping and experimentation with various RAG techniques, allowing users to easily generate datasets and train RAG models using internal or specialized knowledge sources. We demonstrate the framework's effectiveness by augmenting and fine-tuning Llama-3 and Phi-3 models with diverse RAG configurations, showcasing consistent improvements across three knowledge-intensive datasets.
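As a rough illustration of the kind of data augmentation step such a framework automates, the sketch below retrieves a supporting passage and folds it into a training/evaluation example. The toy TF-IDF retriever and the prompt format are assumptions for illustration, not RAG Foundry's actual API or configuration files.

```python
# Toy "data augmentation for RAG" step: retrieve a passage, build a RAG example.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Paris is the capital of France.",
    "The mitochondrion is the powerhouse of the cell.",
    "Llama-3 is a family of open-weight language models.",
]
question = "What is the capital of France?"

vec = TfidfVectorizer().fit(corpus + [question])
scores = cosine_similarity(vec.transform([question]), vec.transform(corpus))[0]
top_passage = corpus[int(scores.argmax())]        # toy retriever: best TF-IDF match

rag_example = {
    "prompt": f"Context: {top_passage}\nQuestion: {question}\nAnswer:",
    "target": "Paris",
}
print(rag_example["prompt"])
```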

Paper: https://arxiv.org/pdf/2408.02545v1.pdf

Code: https://github.com/intellabs/ragfoundry

Datasets: TriviaQA - PubMedQA

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://t.iss.one/DataScienceT
🔬MedRAX: A groundbreaking AI agent designed for medical tasks!

What is MedRAX?

MedRAX is the first general-purpose AI agent that combines state-of-the-art chest X-ray analysis tools and multimodal large language models into a single framework that can dynamically reason about complex medical queries without additional training.

🎯 What is so good about MedRAX?

While specialized AI models excel at specific chest X-ray tasks, they often struggle with complex analysis and can produce inaccurate recommendations. Many healthcare professionals want a single, robust system that can handle complex queries while maintaining accuracy. MedRAX aims to be that tool.

🛠 Integrated tools:

- Visual question answering: CheXagent and LLaVA-Med
- Segmentation: MedSAM & ChestX-Det
- Report generation: CheXpert Plus
- Classification: TorchXRayVision
- Grounding: Maira-2
- Synthetic data: RoentGen

💡 Key Features:

- Seamless integration of specialized medical tools with multimodal reasoning by large language models.
- Dynamic orchestration: intelligently selects and coordinates tools for complex queries (see the toy sketch after this list).
- Clinical focus: designed for real medical workflows.
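
The dynamic-orchestration idea can be pictured with a toy routing loop: a registry of specialist tools and a controller that decides which ones a query needs. The tool names echo the list above, but the keyword-based planner and string interfaces are illustrative assumptions; MedRAX itself routes these decisions through a multimodal LLM.

```python
# Toy sketch of dynamic tool orchestration over a registry of specialist tools.
# Tool behaviours are stubbed; a real agent would call the actual models.
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "segmentation": lambda img: "lung masks computed (MedSAM-style)",
    "classification": lambda img: "findings: cardiomegaly 0.82 (TorchXRayVision-style)",
    "report_generation": lambda img: "draft radiology report (CheXpert-Plus-style)",
}

def plan(query: str) -> list[str]:
    # Stand-in for LLM-based planning: simple keyword routing for the sketch.
    wanted = []
    if "segment" in query.lower():
        wanted.append("segmentation")
    if "diagnos" in query.lower() or "finding" in query.lower():
        wanted.append("classification")
    if "report" in query.lower():
        wanted.append("report_generation")
    return wanted or ["classification"]

def answer(query: str, image_path: str) -> list[str]:
    return [f"{name}: {TOOLS[name](image_path)}" for name in plan(query)]

print(answer("Segment the lungs and write a report", "cxr_001.png"))
```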

📊 ChestAgentBench:

The developers also released ChestAgentBench, a comprehensive medical agent benchmark built from 675 expert-reviewed clinical cases and comprising 2,500 complex medical queries across 7 categories.

🎉 The results speak for themselves:
- 63.1% accuracy on ChestAgentBench
- SOTA performance on CheXbench
- Outperforms both general-purpose and specialized medical models

▪️ Paper : https://arxiv.org/abs/2502.02673
▪️ Github : https://github.com/bowang-lab/MedRAX

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://t.iss.one/DataScienceT
🌟 RT-DETRv2: An Improved CV Model for Real-Time Object Detection.

RT-DETRv2 is a new version of RT-DETR, an alternative to YOLO. RT-DETRv2 has received a number of improvements: increased flexibility, usability and performance.

The key change is the modification of the deformable attention module in the decoder. RT-DETRv2 proposes to set a different number of sampling points for features of different scales. This allows for more efficient extraction of multi-scale features, making it more adaptive to multiple detection scenarios.

To make the model more practical, the authors replace the DETR-specific grid_sample operator with an optional discrete_sample operator that rounds the predicted sampling offsets, speeding up inference without a significant loss of accuracy.
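
A minimal sketch of that discrete-sampling idea, under stated assumptions about tensor shapes and clamping: instead of bilinear interpolation at fractional locations (grid_sample), round the predicted sampling locations to integer pixel indices and gather directly.

```python
# Continuous (grid_sample) vs. discrete (rounded) feature sampling, as a toy example.
import torch
import torch.nn.functional as F

feat = torch.randn(1, 8, 32, 32)                 # (B, C, H, W) feature map
loc = torch.rand(1, 100, 2) * 31                 # predicted (x, y) sample points per query

# Continuous version: bilinear interpolation via grid_sample (coords in [-1, 1]).
grid = (loc / 31) * 2 - 1                        # normalize to [-1, 1]
cont = F.grid_sample(feat, grid.unsqueeze(2), align_corners=True)   # (B, C, 100, 1)

# Discrete version: round offsets to integer pixels and index, no interpolation math.
xi = loc[..., 0].round().clamp(0, 31).long()     # (B, 100)
yi = loc[..., 1].round().clamp(0, 31).long()
disc = feat[:, :, yi[0], xi[0]]                  # (B, C, 100)

print(cont.squeeze(-1).shape, disc.shape)
```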

RT-DETRv2 is trained using a dynamic data augmentation strategy. In the early stages, more intensive augmentation methods are used to help the model generalize better to the data. In the later stages, the level of augmentation is reduced, allowing the model to adapt to the target domain.

The new version customizes hyperparameters according to model scale: for example, the learning rate is increased for the ResNet18 backbone and decreased for larger backbones such as ResNet101.

RT-DETRv2 was evaluated on the COCO dataset, where it improves the AP metric by 0.3–1.4 points over RT-DETR while maintaining high performance. For example, RT-DETRv2-S with a ResNet18 backbone achieves an AP of 47.9, which is 1.4 points higher than RT-DETR-S.

Scripts for fine-tuning RT-DETRv2 with Trainer or Accelerate are hosted in the Hugging Face repository on GitHub, along with a simple inference notebook that can be run locally or in Google Colab (links below).

📌 Licensing: Apache 2.0


🟡 Article
🟡 Arxiv
🟡 Google Colab Inference
🖥 Github

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt

23 Jan 2025 · Tao Liu, Kai Wang, Senmao Li, Joost Van de Weijer, Fahad Shahbaz Khan, Shiqi Yang, Yaxing Wang, Jian Yang, Ming-Ming Cheng ·

Text-to-image generation models can create high-quality images from input prompts. However, they struggle to consistently preserve the identity of depicted subjects across generations, a key requirement for storytelling. Existing approaches to this problem typically require extensive training on large datasets or additional modifications to the original model architectures, limiting their applicability across different domains and diverse diffusion model configurations. In this paper, we first observe the inherent capability of language models, coined context consistency, to comprehend identity through context within a single prompt. Drawing inspiration from this inherent context consistency, we propose a novel training-free method for consistent text-to-image (T2I) generation, termed "One-Prompt-One-Story" (1Prompt1Story). Our approach, 1Prompt1Story, concatenates all prompts into a single input for T2I diffusion models, initially preserving character identities. We then refine the generation process using two novel techniques: Singular-Value Reweighting and Identity-Preserving Cross-Attention, ensuring better alignment with the input description for each frame. In our experiments, we compare our method against various existing consistent T2I generation approaches and demonstrate its effectiveness through quantitative metrics and qualitative assessments.
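A toy numerical sketch of the two ingredients named above: all frame prompts are concatenated into one input, and Singular-Value Reweighting is illustrated simply as rescaling the singular values of a frame's token-embedding block. The embeddings, sizes, and reweighting schedule are assumptions for illustration; this is not the authors' exact formulation or diffusion pipeline.

```python
# (1) One prompt for the whole story; (2) toy singular-value reweighting of a
# frame's token-embedding block. Numbers are random and purely illustrative.
import numpy as np

identity = "a watercolor fox"
frames = ["walking in a forest", "drinking from a river", "sleeping under a tree"]
single_prompt = identity + ", " + ", ".join(frames)   # one concatenated story prompt
print(single_prompt)

rng = np.random.default_rng(0)
frame_emb = rng.normal(size=(6, 64))                   # toy embeddings of one frame's tokens

U, S, Vt = np.linalg.svd(frame_emb, full_matrices=False)
S_reweighted = S * np.linspace(1.5, 0.5, S.size)       # boost leading directions, damp the rest
frame_emb_rw = (U * S_reweighted) @ Vt                 # reconstructed, reweighted block

print(frame_emb.shape, frame_emb_rw.shape)
```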

Paper: https://arxiv.org/pdf/2501.13554v2.pdf

Code: https://github.com/byliutao/1prompt1story

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://t.iss.one/DataScienceT
One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation

4 Feb 2025 · Jianze Li, JieZhang Cao, Yong Guo, Wenbo Li, Yulun Zhang ·

Diffusion models (DMs) have significantly advanced the development of real-world image super-resolution (Real-ISR), but the computational cost of multi-step diffusion models limits their application. One-step diffusion models generate high-quality images in a single sampling step, greatly reducing computational overhead and inference latency. However, most existing one-step diffusion methods are constrained by the performance of the teacher model, where poor teacher performance results in image artifacts. To address this limitation, we propose FluxSR, a novel one-step diffusion Real-ISR technique based on flow matching models. We use the state-of-the-art diffusion model FLUX.1-dev as both the teacher model and the base model. First, we introduce Flow Trajectory Distillation (FTD) to distill a multi-step flow matching model into a one-step Real-ISR model. Second, to improve image realism and address high-frequency artifact issues in generated images, we propose TV-LPIPS as a perceptual loss and introduce Attention Diversification Loss (ADL) as a regularization term that reduces token similarity in the transformer, thereby eliminating high-frequency artifacts. Comprehensive experiments demonstrate that our method outperforms existing one-step diffusion-based Real-ISR methods.
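An attention-diversification-style regularizer of the kind described (penalizing pairwise token similarity so transformer tokens do not collapse onto one direction) can be sketched as below. Which layers and weighting FluxSR applies it to are not reproduced here; only the loss term itself is illustrated.

```python
# Toy diversification loss: mean absolute off-diagonal cosine similarity of tokens.
import torch
import torch.nn.functional as F

def attention_diversification_loss(tokens: torch.Tensor) -> torch.Tensor:
    """tokens: (batch, num_tokens, dim) -> scalar penalizing pairwise token similarity."""
    t = F.normalize(tokens, dim=-1)
    sim = t @ t.transpose(1, 2)                        # (batch, N, N) cosine similarities
    n = sim.shape[-1]
    off_diag = sim - torch.eye(n, device=sim.device)   # drop the self-similarity diagonal
    return off_diag.abs().sum(dim=(1, 2)).mean() / (n * (n - 1))

tokens = torch.randn(2, 16, 64, requires_grad=True)
loss = attention_diversification_loss(tokens)
loss.backward()                                        # usable as an auxiliary training term
print(float(loss))
```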

Paper: https://arxiv.org/pdf/2502.01993v1.pdf

Code: https://github.com/jianzeli-114/fluxsr

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://t.iss.one/DataScienceT
SGLang: Efficient Execution of Structured Language Model Programs

12 Dec 2023 · Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng ·

Large language models (LLMs) are increasingly used for complex tasks that require multiple generation calls, advanced prompting techniques, control flow, and structured inputs/outputs. However, efficient systems are lacking for programming and executing these applications. We introduce SGLang, a system for efficient execution of complex language model programs. SGLang consists of a frontend language and a runtime. The frontend simplifies programming with primitives for generation and parallelism control. The runtime accelerates execution with novel optimizations like RadixAttention for KV cache reuse and compressed finite state machines for faster structured output decoding. Experiments show that SGLang achieves up to 6.4x higher throughput compared to state-of-the-art inference systems on various large language and multi-modal models on tasks including agent control, logical reasoning, few-shot learning benchmarks, JSON decoding, retrieval-augmented generation pipelines, and multi-turn chat.
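RadixAttention's KV-cache reuse rests on sharing common prompt prefixes across requests. The sketch below is a toy, bookkeeping-only prefix trie that counts how many leading tokens of a new request are already cached; it does not touch real KV tensors or SGLang's runtime.

```python
# Toy prefix trie illustrating KV-cache reuse across requests sharing a prompt prefix.
class PrefixCache:
    def __init__(self):
        self.root: dict = {}

    def insert(self, tokens: list[str]) -> int:
        """Insert a token sequence; return how many leading tokens were already cached."""
        node, reused, still_matching = self.root, 0, True
        for tok in tokens:
            if still_matching and tok in node:
                reused += 1          # this prefix token is already in the cache
            else:
                still_matching = False
            node = node.setdefault(tok, {})
        return reused

cache = PrefixCache()
system = ["You", "are", "a", "helpful", "assistant", "."]
cache.insert(system + ["What", "is", "RAG", "?"])
hits = cache.insert(system + ["Summarize", "this", "paper", "."])
print(f"reused {hits} cached prefix tokens")   # the shared system prompt is reused
```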

Paper: https://arxiv.org/pdf/2312.07104v2.pdf

Code: https://github.com/sgl-project/sglang

Datasets: MMLU - HellaSwag - LLaVA-Bench

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://t.iss.one/DataScienceT
Some people asked me about a resource for learning about Transformers.

Here's a good one I am sharing again -- it covers just about everything you need to know.

brandonrohrer.com/transformers

Amazing stuff. It's totally worth your weekend.

#Transformers #DeepLearning #NLP #AI #MachineLearning #SelfAttention #DataScience #Technology #Python #LearningResource


https://t.iss.one/CodeProgrammer
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

22 Feb 2024 · Yijia Shao, Yucheng Jiang, Theodore A. Kanell, Peter Xu, Omar Khattab, Monica S. Lam ·

We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages. This underexplored problem poses new challenges at the pre-writing stage, including how to research the topic and prepare an outline prior to writing. We propose STORM, a writing system for the Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking. STORM models the pre-writing stage by (1) discovering diverse perspectives in researching the given topic, (2) simulating conversations where writers carrying different perspectives pose questions to a topic expert grounded on trusted Internet sources, and (3) curating the collected information to create an outline. For evaluation, we curate FreshWiki, a dataset of recent high-quality Wikipedia articles, and formulate outline assessments to evaluate the pre-writing stage. We further gather feedback from experienced Wikipedia editors. Compared to articles generated by an outline-driven retrieval-augmented baseline, more of STORM's articles are deemed to be organized (by a 25% absolute increase) and broad in coverage (by 10%). The expert feedback also helps identify new challenges for generating grounded long articles, such as source bias transfer and over-association of unrelated facts.
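The three pre-writing stages can be sketched schematically as below. The `llm` function is a stub standing in for a real model call and the prompts are illustrative assumptions; STORM's actual pipeline (see the repositories below) grounds the simulated conversation in retrieved web sources.

```python
# Schematic pre-writing loop: perspectives -> simulated Q&A -> outline.
def llm(prompt: str) -> str:
    # Stand-in for a real LLM call so the sketch runs end-to-end.
    return f"<model output for: {prompt[:40]}...>"

topic = "History of the transformer architecture"

# (1) Discover diverse perspectives on the topic.
perspectives = [llm(f"Name a perspective for researching '{topic}' (#{i})") for i in range(3)]

# (2) Simulate writer<->expert conversations from each perspective.
notes = []
for p in perspectives:
    question = llm(f"As {p}, ask one question about {topic}")
    answer = llm(f"Answer, grounded in sources: {question}")
    notes.append((question, answer))

# (3) Curate the collected information into an outline.
outline = llm(f"Write a Wikipedia-style outline for '{topic}' using notes: {notes}")
print(outline)
```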

Paper: https://arxiv.org/pdf/2402.14207v2.pdf

Codes:
https://github.com/assafelovic/gpt-researcher
https://github.com/stanford-oval/storm

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://t.iss.one/DataScienceT
LLM4Decompile: Decompiling Binary Code with Large Language Models

8 Mar 2024 · Hanzhuo Tan, Qi Luo, Jing Li, Yuqun Zhang ·

Decompilation aims to convert binary code to high-level source code, but traditional tools like Ghidra often produce results that are difficult to read and execute. Motivated by the advancements in Large Language Models (LLMs), we propose LLM4Decompile, the first and largest open-source #LLM series (1.3B to 33B) trained to decompile binary code. We optimize the LLM training process and introduce the LLM4Decompile-End models to decompile binary directly. The resulting models significantly outperform GPT-4o and Ghidra on the HumanEval and ExeBench benchmarks by over 100% in terms of re-executability rate. Additionally, we improve the standard refinement approach to fine-tune the LLM4Decompile-Ref models, enabling them to effectively refine the decompiled code from Ghidra and achieve a further 16.2% improvement over the LLM4Decompile-End. LLM4Decompile demonstrates the potential of LLMs to revolutionize binary code decompilation, delivering remarkable improvements in readability and executability while complementing conventional tools for optimal results.
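Driving a decompilation model of this kind through Hugging Face `transformers` looks roughly like the sketch below. The model id and prompt format are assumptions for illustration; check the LLM4Decompile repository for the released checkpoints and the exact prompt template they expect.

```python
# Sketch of prompting a decompilation LLM via transformers.
# The model id below is hypothetical; see the repo README for real checkpoints.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLM4Binary/llm4decompile-1.3b-v1.5"   # hypothetical id, for illustration
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

asm = "# assembly of a function that returns a + b\nadd_fn:\n    lea eax, [rdi+rsi]\n    ret\n"
prompt = f"{asm}# What is the source code?\n"

inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```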

Paper: https://arxiv.org/pdf/2403.05286v3.pdf

Code: https://github.com/albertan017/LLM4Decompile

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://t.iss.one/DataScienceT
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration

24 Jan 2025 · Kai-Tuo Xu, Feng-Long Xie, Xu Tang, Yao Hu ·

We present FireRedASR, a family of large-scale automatic speech recognition (ASR) models for Mandarin, designed to meet diverse requirements for superior performance and optimal efficiency across various applications. FireRedASR comprises two variants:

FireRedASR-LLM: designed to achieve state-of-the-art (SOTA) performance and to enable seamless end-to-end speech interaction. It adopts an Encoder-Adapter-LLM framework leveraging large language model (LLM) capabilities. On public Mandarin benchmarks, FireRedASR-LLM (8.3B parameters) achieves an average Character Error Rate (CER) of 3.05%, surpassing the latest SOTA of 3.33% with an 8.4% relative CER reduction (CERR). It demonstrates superior generalization over industrial-grade baselines, achieving 24%-40% CERR in multi-source Mandarin ASR scenarios such as video, live streaming, and intelligent assistants.

FireRedASR-AED: designed to balance high performance and computational efficiency and to serve as an effective speech representation module in LLM-based speech models. It uses an Attention-based Encoder-Decoder (AED) architecture. On public Mandarin benchmarks, FireRedASR-AED (1.1B parameters) achieves an average CER of 3.18%, slightly worse than FireRedASR-LLM but still outperforming the latest SOTA model with over 12B parameters, while offering a more compact size suitable for resource-constrained applications.

Moreover, both models show competitive results on Chinese dialect and English speech benchmarks and excel in singing-lyrics recognition.
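The results above are reported as Character Error Rate (CER). For reference, a minimal, generic edit-distance implementation of CER (not taken from the FireRedASR codebase) looks like this:

```python
# CER = character-level edit distance / reference length, via standard Levenshtein DP.
def cer(reference: str, hypothesis: str) -> float:
    r, h = list(reference), list(hypothesis)
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(r)][len(h)] / max(len(r), 1)

print(cer("今天天气很好", "今天天气不错"))   # 2 substitutions over 6 chars -> ~0.333
```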


Paper: https://arxiv.org/pdf/2501.14350v1.pdf

Code: https://github.com/fireredteam/fireredasr

Datasets: LibriSpeech - AISHELL-1 - AISHELL-2 - WenetSpeech

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://t.iss.one/DataScienceT
MedRAX: Medical Reasoning Agent for Chest X-ray

4 Feb 2025 · Adibvafa Fallahpour, Jun Ma, Alif Munim, Hongwei Lyu, Bo Wang ·

Chest X-rays (CXRs) play an integral role in driving critical decisions in disease management and patient care. While recent innovations have led to specialized models for various CXR interpretation tasks, these solutions often operate in isolation, limiting their practical utility in clinical practice. We present #MedRAX, the first versatile AI agent that seamlessly integrates state-of-the-art CXR analysis tools and multimodal large language models into a unified framework. MedRAX dynamically leverages these models to address complex medical queries without requiring additional training. To rigorously evaluate its capabilities, we introduce ChestAgentBench, a comprehensive benchmark containing 2,500 complex medical queries across 7 diverse categories. Our experiments demonstrate that MedRAX achieves state-of-the-art performance compared to both open-source and proprietary models, representing a significant step toward the practical deployment of automated CXR interpretation systems.


paper: https://arxiv.org/pdf/2502.02673v1.pdf

Code: https://github.com/bowang-lab/medrax

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://t.iss.one/DataScienceT
On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices

5 Feb 2025 · Bosung Kim, Kyuhwan Lee, Isu Jeong, Jungmin Cheon, Yeojin Lee, Seulki Lee ·

We present On-device #Sora, a first pioneering solution for diffusion-based on-device text-to-video generation that operates efficiently on smartphone-grade devices. Building on Open-Sora, On-device Sora applies three novel techniques to address the challenges of diffusion-based text-to-video generation on computation- and memory-limited mobile devices. First, Linear Proportional Leap (#LPL) reduces the excessive denoising steps required in video diffusion through an efficient leap-based approach. Second, Temporal Dimension Token Merging (#TDTM) minimizes intensive token-processing computation in attention layers by merging consecutive tokens along the temporal dimension. Third, Concurrent Inference with Dynamic Loading (CI-DL) dynamically partitions large models into smaller blocks and loads them into memory for concurrent model inference, effectively addressing the challenges of limited device memory. We implement On-device Sora on the iPhone 15 Pro, and the experimental evaluations demonstrate that it is capable of generating high-quality videos on the device, comparable to those produced by Open-Sora running on high-end GPUs. These results show that On-device Sora enables efficient and high-quality video generation on resource-constrained mobile devices, expanding accessibility, ensuring user privacy, reducing dependence on cloud infrastructure, and lowering associated costs. We envision the proposed On-device Sora as a significant first step toward democratizing state-of-the-art generative technologies, enabling video generation capabilities on commodity mobile and embedded devices.
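
The Temporal Dimension Token Merging (TDTM) idea can be pictured with a toy sketch: merge consecutive tokens along the temporal axis before attention so the layers process half as many of them. The tensor layout and the mean-of-pairs merge operator below are illustrative assumptions, not the On-device Sora implementation.

```python
# Toy temporal token merging: average consecutive frames so attention sees fewer tokens.
import torch

def merge_temporal_tokens(x: torch.Tensor) -> torch.Tensor:
    """x: (batch, frames, tokens_per_frame, dim) -> merged along the frame axis."""
    b, t, n, d = x.shape
    if t % 2:                                        # pad by repeating the last frame if odd
        x = torch.cat([x, x[:, -1:]], dim=1)
        t += 1
    return x.reshape(b, t // 2, 2, n, d).mean(dim=2)  # average consecutive frame pairs

latents = torch.randn(1, 16, 64, 128)                 # 16 frames of 64 tokens each
merged = merge_temporal_tokens(latents)
print(latents.shape, "->", merged.shape)              # attention now processes half the frames
```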


Paper:https://arxiv.org/pdf/2502.04363v1.pdf

Code: https://github.com/eai-lab/on-device-sora

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://t.iss.one/DataScienceT
Accelerating Data Processing and Benchmarking of AI Models for Pathology

10 Feb 2025 · Andrew Zhang, Guillaume Jaume, Anurag Vaidya, Tong Ding, Faisal Mahmood ·

Advances in foundation modeling have reshaped computational pathology. However, the increasing number of available #models and lack of standardized benchmarks make it increasingly complex to assess their strengths, limitations, and potential for further development. To address these challenges, we introduce a new suite of software tools for whole-slide image processing, foundation model benchmarking, and curated publicly available tasks. We anticipate that these resources will promote transparency, reproducibility, and continued progress in the field.


Paper: https://arxiv.org/pdf/2502.06750v1.pdf

Codes:
https://github.com/mahmoodlab/trident
https://github.com/mahmoodlab/patho-bench

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://t.iss.one/DataScienceT
LIMO: Less is More for Reasoning

5 Feb 2025 · Yixin Ye, Zhen Huang, Yang Xiao, Ethan Chern, Shijie Xia, PengFei Liu ·

We present a fundamental discovery that challenges our understanding of how complex reasoning emerges in large language models. While conventional wisdom suggests that sophisticated reasoning tasks demand extensive training data (>100,000 examples), we demonstrate that complex mathematical reasoning abilities can be effectively elicited with surprisingly few examples. Through comprehensive experiments, our proposed model LIMO demonstrates unprecedented performance in mathematical reasoning. With merely 817 curated training samples, LIMO achieves 57.1% accuracy on AIME and 94.8% on #MATH, improving from previous SFT-based models' 6.5% and 59.2% respectively, while only using 1% of the training data required by previous approaches. LIMO demonstrates exceptional out-of-distribution generalization, achieving a 40.5% absolute improvement across 10 diverse benchmarks, outperforming models trained on 100x more data and challenging the notion that SFT leads to memorization rather than generalization. Based on these results, we propose the Less-Is-More Reasoning Hypothesis (#LIMO Hypothesis): In foundation models where domain knowledge has been comprehensively encoded during pre-training, sophisticated reasoning capabilities can emerge through minimal but precisely orchestrated demonstrations of cognitive processes. This hypothesis posits that the elicitation threshold for complex reasoning is determined by two key factors: (1) the completeness of the model's encoded knowledge foundation during pre-training, and (2) the effectiveness of post-training examples as "cognitive templates" that show the model how to utilize its knowledge base to solve complex reasoning tasks. To facilitate reproducibility and future research in data-efficient reasoning, we release LIMO as a comprehensive open-source suite (see the code links below).


Paper: https://arxiv.org/pdf/2502.03387v1.pdf

Codes:
https://github.com/gair-nlp/limo
https://github.com/zhaoolee/garss

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://t.iss.one/DataScienceT