Data Science | Machine Learning with Python for Researchers
Admin: @HusseinSheikho

The Data Science and Python channel is for researchers and advanced programmers

πŸ”Ή Title:
Unified Vision-Language-Action Model

πŸ”Ή Publication Date: Published on Jun 24

πŸ”Ή Abstract:
UniVLA is a multimodal VLA model that autoregressively processes vision, language, and action as token sequences, incorporating world modeling for effective long-horizon policy learning and achieving state-of-the-art results across simulation and real-world benchmarks. AI-generated summary: Vision-language-action models (VLAs) have garnered significant attention for their potential in advancing robotic manipulation. However, previous approaches predominantly rely on the general comprehension capabilities of vision-language models (VLMs) to generate action signals, often overlooking the rich temporal and causal structure embedded in visual observations. In this paper, we present UniVLA, a unified and native multimodal VLA model that autoregressively models vision, language, and action signals as discrete token sequences. This formulation enables flexible multimodal task learning, particularly from large-scale video data. By incorporating world modeling during post-training, UniVLA captures causal dynamics from videos, facilitating effective transfer to downstream policy learning, especially for long-horizon tasks. Our approach sets new state-of-the-art results across several widely used simulation benchmarks, including CALVIN, LIBERO, and Simplenv-Bridge, significantly surpassing previous methods. For example, UniVLA achieves a 95.5% average success rate on the LIBERO benchmark, surpassing pi0-FAST's 85.5%. We further demonstrate its broad applicability on real-world ALOHA manipulation and autonomous driving.
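
As a rough illustration of the "everything as discrete tokens" formulation described above, the sketch below packs text, vision, and action tokens into one sequence for standard next-token prediction. The vocabulary sizes, offsets, and special tokens are assumptions for the example, not UniVLA's actual tokenizers.

# Illustrative sketch (not the official UniVLA code): pack vision, language, and
# action tokens into a single discrete sequence for autoregressive training.
import torch

TEXT_VOCAB, VISION_CODES = 32000, 8192                     # assumed vocabulary sizes
BOS, BOV, BOA = 0, 1, 2                                    # assumed special tokens
VISION_OFFSET = TEXT_VOCAB                                 # vision ids after text ids
ACTION_OFFSET = TEXT_VOCAB + VISION_CODES                  # action ids after vision ids

def pack_sequence(text_ids, vision_ids, action_ids):
    """Interleave modalities as [BOS] text [BOV] vision [BOA] action."""
    seq = ([BOS] + list(text_ids)
           + [BOV] + [VISION_OFFSET + v for v in vision_ids]
           + [BOA] + [ACTION_OFFSET + a for a in action_ids])
    return torch.tensor(seq)

# Toy example: 3 text tokens, 4 visual codebook indices, 2 discretized action bins.
tokens = pack_sequence([101, 57, 9], [12, 900, 3, 44], [17, 201])
inputs, targets = tokens[:-1], tokens[1:]    # standard next-token prediction shift
print(inputs.tolist(), targets.tolist())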

πŸ”Ή Links:
β€’ arXiv Page: https://arxiv.org/abs/2506.19850
β€’ PDF: https://arxiv.org/pdf/2506.19850

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
❀2
πŸ”Ή Title:
Biomed-Enriched: A Biomedical Dataset Enriched with LLMs for Pretraining and Extracting Rare and Hidden Content

πŸ”Ή Publication Date: Published on Jun 25

πŸ”Ή Abstract:
A biomedical text dataset, constructed from PubMed, uses a two-stage annotation process involving large and small language models to fine-tune and extract subsets for clinical NLP, improving pretraining efficiency and performance. AI-generated summary: We introduce Biomed-Enriched, a biomedical text dataset constructed from PubMed via a two-stage annotation process. In the first stage, a large language model annotates 400K paragraphs from PubMed scientific articles, assigning scores for their type (review, study, clinical case, other), domain (clinical, biomedical, other), and educational quality. The educational quality score (rated 1 to 5) estimates how useful a paragraph is for college-level learning. These annotations are then used to fine-tune a small language model, which propagates the labels across the full PMC-OA corpus. The resulting metadata allows us to extract refined subsets, including 2M clinical case paragraphs with over 450K high-quality ones from articles with commercial-use licenses, and to construct several variants via quality filtering and domain upsampling. Clinical text is typically difficult to access due to privacy constraints, as hospital records cannot be publicly shared. Hence, our dataset provides an alternative large-scale, openly available collection of clinical cases from PubMed, making it a valuable resource for biomedical and clinical NLP. Preliminary continual-pretraining experiments with OLMo2 suggest these curated subsets enable targeted improvements, with clinical upsampling boosting performance by ~5% on MMLU ProfMed and educational quality filtering improving MedQA and MedMCQA by ~1%. Combinations of these techniques led to faster convergence, reaching the same performance with a third of the training tokens, indicating potential for more efficient and effective biomedical pretraining strategies.
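
A minimal sketch of the kind of quality filtering and domain upsampling the metadata enables, assuming per-paragraph fields such as domain and edu_quality; the field names, thresholds, and upsampling factor are illustrative, not the dataset's actual schema.

# Illustrative filtering/upsampling over annotated paragraphs; field names and
# thresholds are assumptions for the example, not the Biomed-Enriched schema.
import random

paragraphs = [
    {"text": "A 54-year-old patient presented with ...", "domain": "clinical",
     "type": "clinical_case", "edu_quality": 5},
    {"text": "We review recent advances in ...", "domain": "biomedical",
     "type": "review", "edu_quality": 3},
    {"text": "Figure 2 shows the gel electrophoresis ...", "domain": "biomedical",
     "type": "study", "edu_quality": 1},
]

def build_corpus(rows, min_quality=3, clinical_upsample=2, seed=0):
    """Keep paragraphs above a quality threshold; repeat clinical ones to upsample."""
    random.seed(seed)
    corpus = []
    for row in rows:
        if row["edu_quality"] < min_quality:
            continue
        copies = clinical_upsample if row["domain"] == "clinical" else 1
        corpus.extend([row["text"]] * copies)
    random.shuffle(corpus)
    return corpus

print(build_corpus(paragraphs))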

πŸ”Ή Links:
β€’ arXiv Page: https://arxiv.org/abs/2506.20331
β€’ PDF: https://arxiv.org/pdf/2506.20331

πŸ”Ή Datasets citing this paper:
β€’ https://huggingface.co/datasets/almanach/Biomed-Enriched

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
Article Title:
PixelsDB: Serverless and NL-Aided Data Analytics with Flexible Service Levels and Prices

Article Date: 30 May 2024

Article Description:
Serverless query processing has become increasingly popular due to its advantages, including automated resource management, high elasticity, and pay-as-you-go pricing. For users who are not system experts, serverless query processing greatly reduces the cost of owning a data analytic system. However, it is still a significant challenge for non-expert users to transform their complex and evolving data analytic needs into proper SQL queries and select a serverless query service that delivers satisfactory performance and price for each type of query. This paper presents PixelsDB, an open-source data analytic system that allows users who lack system or SQL expertise to explore data efficiently. It allows users to generate and debug SQL queries using a natural language interface powered by fine-tuned language models. The queries are then executed by a serverless query engine that offers varying prices for different performance service levels (SLAs). The performance SLAs are natively supported by dedicated architecture design and heterogeneous resource scheduling that can apply cost-efficient resources to process non-urgent queries. We demonstrate that the combination of a serverless paradigm, a natural-language-aided interface, and flexible SLAs and prices will substantially improve the usability of cloud data analytic systems.

PDF Download Link:
https://arxiv.org/pdf/2405.19784v2.pdf

GitHub:
β€’ https://github.com/pixelsdb/pixels

Datasets:
β€’ No datasets information available
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
❀2
Article Title:
Optuna: A Next-generation Hyperparameter Optimization Framework

Article Date: 25 Jul 2019

Article Description:
The purpose of this study is to introduce new design-criteria for next-generation hyperparameter optimization software. The criteria we propose include (1) define-by-run API that allows users to construct the parameter search space dynamically, (2) efficient implementation of both searching and pruning strategies, and (3) easy-to-setup, versatile architecture that can be deployed for various purposes, ranging from scalable distributed computing to light-weight experiment conducted via interactive interface. In order to prove our point, we will introduce Optuna, an optimization software which is a culmination of our effort in the development of a next generation optimization software. As an optimization software designed with define-by-run principle, Optuna is particularly the first of its kind. We will present the design-techniques that became necessary in the development of the software that meets the above criteria, and demonstrate the power of our new design through experimental results and real world applications. Our software is available under the MIT license (https://github.com/pfnet/optuna/).
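
A minimal define-by-run example against the public Optuna API: the search space, including a conditional parameter, is declared inside the objective at runtime rather than up front.

# Define-by-run: the search space is constructed dynamically inside the objective.
import optuna

def objective(trial):
    x = trial.suggest_float("x", -10.0, 10.0)
    use_offset = trial.suggest_categorical("use_offset", [True, False])
    # Conditional parameter: only sampled when the branch above selects it.
    offset = trial.suggest_float("offset", 0.0, 5.0) if use_offset else 0.0
    return (x - 2.0) ** 2 + offset

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)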

PDF Download Link:
https://arxiv.org/pdf/1907.10902v1.pdf

GitHub:
β€’ https://github.com/pfnet/optuna
β€’ https://github.com/optuna/optuna
β€’ https://github.com/Automunge/AutoMunge
β€’ https://github.com/optuna/optuna-integration
β€’ https://github.com/crcrpar/benchmark-runner-ci
β€’ https://github.com/crcrpar/optuna-mirror
β€’ https://github.com/himkt/optuna-test-rtds
β€’ https://github.com/crcrpar/ci-example-execution
β€’ https://github.com/yqian4/optuna
β€’ https://github.com/brethvoice/optuna_demo_MNIST
β€’ https://github.com/rickyHong/optuna-repl

Datasets:
β€’ No datasets information available
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
❀2
πŸ”Ή Title:
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling

πŸ”Ή Publication Date: Published on Jun 25

πŸ”Ή Abstract:
Investigating mid-training strategies reveals that high-quality mathematical corpora and well-formatted chain-of-thought reasoning examples enhance reinforcement learning performance in language models, leading to the development of OctoThinker. AI-generated summary: Different base language model families, such as Llama and Qwen, exhibit divergent behaviors during post-training with reinforcement learning (RL), especially on reasoning-intensive tasks. What makes a base language model suitable for reinforcement learning? Gaining deeper insight into this question is essential for developing RL-scalable foundation models of the next generation. In this work, we investigate how mid-training strategies shape RL dynamics, focusing on two representative model families: Qwen and Llama. Our study reveals that (1) high-quality mathematical corpora, such as MegaMath-Web-Pro, significantly improve both base model and RL performance, while existing alternatives (e.g., FineMath-4plus) fail to do so; (2) further adding QA-style data, particularly long chain-of-thought (CoT) reasoning examples, enhances RL outcomes, and instruction data further unlocks this effect; (3) while long CoT improves reasoning depth, it can also induce verbosity of model responses and instability of RL training, underscoring the importance of data formatting; (4) scaling mid-training consistently leads to stronger downstream RL performance. Building on these insights, we introduce a two-stage mid-training strategy, Stable-then-Decay, in which base models are first trained on 200B tokens with a constant learning rate, followed by 20B tokens across three CoT-focused branches with learning rate decay. This yields OctoThinker, a family of models demonstrating strong RL compatibility and closing the performance gap with more RL-friendly model families, i.e., Qwen. We hope our work will help shape pre-training strategies for foundation models in the RL era. To support further research, we release our open-source models along with a curated math reasoning-intensive corpus of over 70 billion tokens (i.e., MegaMath-Web-Pro-Max).
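
A small sketch of what a Stable-then-Decay schedule keyed to tokens consumed could look like: a constant learning rate over the 200B-token stable phase, then a decay across the final 20B tokens. The cosine decay shape and learning-rate values are assumptions for illustration, not the paper's exact configuration.

# Stable-then-Decay sketch: constant LR for 200B tokens, then decay over 20B tokens.
import math

STABLE_TOKENS = 200e9
DECAY_TOKENS = 20e9
BASE_LR, MIN_LR = 3e-4, 3e-5    # assumed values

def stable_then_decay_lr(tokens_seen):
    if tokens_seen <= STABLE_TOKENS:
        return BASE_LR
    progress = min((tokens_seen - STABLE_TOKENS) / DECAY_TOKENS, 1.0)
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))

for t in (50e9, 200e9, 205e9, 210e9, 220e9):
    print(f"{t / 1e9:.0f}B tokens -> lr {stable_then_decay_lr(t):.2e}")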

πŸ”Ή Links:
β€’ arXiv Page: https://arxiv.org/abs/2506.20512
β€’ PDF: https://arxiv.org/pdf/2506.20512
β€’ Github: https://github.com/GAIR-NLP/OctoThinker

πŸ”Ή Datasets citing this paper:
β€’ https://huggingface.co/datasets/OctoThinker/MegaMath-Web-Pro-Max

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
❀3
Article Title:
pySLAM: An Open-Source, Modular, and Extensible Framework for SLAM

Article Date: 17 Feb 2025

Article Description:
pySLAM is an open-source Python framework for Visual SLAM, supporting monocular, stereo, and RGB-D cameras. It provides a flexible interface for integrating both classical and modern local features, making it adaptable to various SLAM tasks. The framework includes different loop closure methods, a volumetric reconstruction pipeline, and support for depth prediction models. Additionally, it offers a suite of tools for visual odometry and SLAM applications. Designed for both beginners and experienced researchers, pySLAM encourages community contributions, fostering collaborative development in the field of Visual SLAM.

PDF Download Link:
https://arxiv.org/pdf/2502.11955v2.pdf

GitHub:
β€’ https://github.com/luigifreda/pyslam

Datasets:
β€’ KITTI
β€’ Replica
β€’ TUM RGB-D
β€’ EuRoC MAV
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
❀1
πŸ”Ή Title:
PlayerOne: Egocentric World Simulator

πŸ”Ή Publication Date: Published on Jun 11

πŸ”Ή Abstract:
PlayerOne is an egocentric realistic world simulator that constructs and generates videos from user-captured images, using a coarse-to-fine training pipeline and advanced motion injection and reconstruction frameworks. AI-generated summary: We introduce PlayerOne, the first egocentric realistic world simulator, facilitating immersive and unrestricted exploration within vividly dynamic environments. Given an egocentric scene image from the user, PlayerOne can accurately construct the corresponding world and generate egocentric videos that are strictly aligned with the real-scene human motion of the user captured by an exocentric camera. PlayerOne is trained in a coarse-to-fine pipeline that first performs pretraining on large-scale egocentric text-video pairs for coarse-level egocentric understanding, followed by finetuning on synchronous motion-video data extracted from egocentric-exocentric video datasets with our automatic construction pipeline. Besides, considering the varying importance of different components, we design a part-disentangled motion injection scheme, enabling precise control of part-level movements. In addition, we devise a joint reconstruction framework that progressively models both the 4D scene and video frames, ensuring scene consistency in long-form video generation. Experimental results demonstrate its great generalization ability in precise control of varying human movements and world-consistent modeling of diverse scenarios. It marks the first endeavor into egocentric real-world simulation and can pave the way for the community to delve into fresh frontiers of world modeling and its diverse applications.

πŸ”Ή Links:
β€’ arXiv Page: https://arxiv.org/abs/2506.09995
β€’ PDF: https://arxiv.org/pdf/2506.09995
β€’ Project Page: https://playerone-hku.github.io/

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
❀4
πŸ”Ή Title:
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation

πŸ”Ή Publication Date: Published on Mar 20

πŸ”Ή Abstract:
A method using an autoencoder and Gumbel-Softmax selection identifies and retains only the most informative visual tokens, enabling efficient multimodal pruning with minimal performance loss. AI-generated summary: Vision encoders typically generate a large number of visual tokens, providing information-rich representations but significantly increasing computational demands. This raises the question of whether all generated tokens are equally valuable or if some of them can be discarded to reduce computational costs without compromising quality. In this paper, we introduce a new method for determining feature utility based on the idea that less valuable features can be reconstructed from more valuable ones. We implement this concept by integrating an autoencoder with a Gumbel-Softmax selection mechanism that allows identifying and retaining only the most informative visual tokens. To validate our approach, we compared the performance of the LLaVA-NeXT model, using features selected by our method with randomly selected features. We found that on OCR-based tasks, more than 50% of the visual context can be removed with minimal performance loss, whereas randomly discarding the same proportion of features significantly affects the model capabilities. Furthermore, in general-domain tasks, even randomly retaining only 30% of tokens achieves performance comparable to using the full set of visual tokens. Our results highlight a promising direction towards adaptive and efficient multimodal pruning that facilitates scalable and low-overhead inference without compromising performance.
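
A minimal PyTorch sketch of differentiable token selection with straight-through Gumbel-Softmax, assuming a simple per-token keep/drop scorer; the dimensions and scorer are illustrative, not the paper's exact architecture.

# Per-token keep/drop gating with straight-through Gumbel-Softmax.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenSelector(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.scorer = nn.Linear(dim, 2)    # logits for (drop, keep) per token

    def forward(self, tokens, tau=1.0):
        logits = self.scorer(tokens)                          # (B, N, 2)
        gate = F.gumbel_softmax(logits, tau=tau, hard=True)   # hard one-hot, ST gradients
        keep_mask = gate[..., 1:2]                            # (B, N, 1)
        return tokens * keep_mask, keep_mask.squeeze(-1)

tokens = torch.randn(2, 576, 1024)     # e.g. a grid of visual tokens from a ViT encoder
selector = TokenSelector(dim=1024)
kept, mask = selector(tokens)
print(kept.shape, mask.sum(dim=1))     # how many tokens survive per image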

πŸ”Ή Links:
β€’ arXiv Page: https://arxiv.org/abs/2503.16660
β€’ PDF: https://arxiv.org/pdf/2503.16660

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
πŸ”Ή Title:
FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation

πŸ”Ή Publication Date: Published on Jun 23

πŸ”Ή Abstract:
FilMaster is an AI system that integrates cinematic principles to generate professional-grade films, featuring camera language design and cinematic rhythm control using real-world film data and generative AI models. AI-generated summary: AI-driven content creation has shown potential in film production. However, existing film generation systems struggle to implement cinematic principles and thus fail to generate professional-quality films, particularly lacking diverse camera language and cinematic rhythm. This results in templated visuals and unengaging narratives. To address this, we introduce FilMaster, an end-to-end AI system that integrates real-world cinematic principles for professional-grade film generation, yielding editable, industry-standard outputs. FilMaster is built on two key principles: (1) learning cinematography from extensive real-world film data and (2) emulating professional, audience-centric post-production workflows. Inspired by these principles, FilMaster incorporates two stages: a Reference-Guided Generation Stage which transforms user input to video clips, and a Generative Post-Production Stage which transforms raw footage into audiovisual outputs by orchestrating visual and auditory elements for cinematic rhythm. Our generation stage highlights a Multi-shot Synergized RAG Camera Language Design module to guide the AI in generating professional camera language by retrieving reference clips from a vast corpus of 440,000 film clips. Our post-production stage emulates professional workflows by designing an Audience-Centric Cinematic Rhythm Control module, including Rough Cut and Fine Cut processes informed by simulated audience feedback, for effective integration of audiovisual elements to achieve engaging content. The system is empowered by generative AI models like (M)LLMs and video generation models. Furthermore, we introduce FilmEval, a comprehensive benchmark for evaluating AI-generated films. Extensive experiments show FilMaster's superior performance in camera language design and cinematic rhythm control, advancing generative AI in professional filmmaking.
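
As a stand-in for the reference-clip retrieval step, the sketch below scores a scene query against clip embeddings by cosine similarity and keeps the top-k; the random embeddings and small corpus are toy placeholders (the paper's corpus holds 440,000 clips), not FilMaster's actual retrieval stack.

# Toy retrieval-by-embedding sketch: cosine similarity, top-k reference clips.
import numpy as np

rng = np.random.default_rng(0)
clip_embeddings = rng.standard_normal((10_000, 512)).astype(np.float32)  # toy corpus
query = rng.standard_normal(512).astype(np.float32)                      # scene query

def top_k_clips(query_vec, corpus, k=5):
    corpus_n = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_n = query_vec / np.linalg.norm(query_vec)
    scores = corpus_n @ query_n
    idx = np.argpartition(-scores, k)[:k]          # unordered top-k, then sort them
    idx = idx[np.argsort(-scores[idx])]
    return idx, scores[idx]

indices, scores = top_k_clips(query, clip_embeddings)
print(indices, scores)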

πŸ”Ή Links:
β€’ arXiv Page: https://arxiv.org/abs/2506.18899
β€’ PDF: https://arxiv.org/pdf/2506.18899
β€’ Github: https://filmaster-ai.github.io

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
❀3
Article Title:
Hyperbolic Dataset Distillation

Article Date: 30 May 2025

Article Description:
To address the computational and storage challenges posed by large-scale datasets in deep learning, dataset distillation has been proposed to synthesize a compact dataset that replaces the original while maintaining comparable model performance. Unlike optimization-based approaches that require costly bi-level optimization, distribution matching (DM) methods improve efficiency by aligning the distributions of synthetic and original data, thereby eliminating nested optimization. DM achieves high computational efficiency and has emerged as a promising solution. However, existing DM methods, constrained to Euclidean space, treat data as independent and identically distributed points, overlooking complex geometric and hierarchical relationships. To overcome this limitation, we propose a novel hyperbolic dataset distillation method, termed HDD. Hyperbolic space, characterized by negative curvature and exponential volume growth with distance, naturally models hierarchical and tree-like structures. HDD embeds features extracted by a shallow network into the Lorentz hyperbolic space, where the discrepancy between synthetic and original data is measured by the hyperbolic (geodesic) distance between their centroids. By optimizing this distance, the hierarchical structure is explicitly integrated into the distillation process, guiding synthetic samples to gravitate towards the root-centric regions of the original data distribution while preserving their underlying geometric characteristics. Furthermore, we find that pruning in hyperbolic space requires only 20% of the distilled core set to retain model performance, while significantly improving training stability. Notably, HDD is seamlessly compatible with most existing DM methods, and extensive experiments on different datasets validate its effectiveness.
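
A minimal sketch of the Lorentz-model pieces involved, with curvature fixed at -1: lift Euclidean features onto the hyperboloid via the exponential map at the origin, then measure the geodesic distance arccosh(-<x, y>_L) between the synthetic and original centroids. How HDD actually forms and optimizes its centroids may differ; this only illustrates the geometry.

# Lorentz (hyperboloid) lift and geodesic distance between two centroids.
import numpy as np

def lift_to_lorentz(u, eps=1e-9):
    """Map a Euclidean feature u in R^n to the hyperboloid in R^(n+1)."""
    norm = np.linalg.norm(u) + eps
    return np.concatenate(([np.cosh(norm)], np.sinh(norm) * u / norm))

def lorentz_inner(x, y):
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def geodesic_distance(x, y):
    return np.arccosh(np.clip(-lorentz_inner(x, y), 1.0, None))

real_feats = np.random.randn(128, 16)    # toy features of the original data
syn_feats = np.random.randn(32, 16)      # toy features of the synthetic data
real_centroid = lift_to_lorentz(real_feats.mean(axis=0))
syn_centroid = lift_to_lorentz(syn_feats.mean(axis=0))
print("centroid geodesic distance:", geodesic_distance(real_centroid, syn_centroid))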

PDF Download Link:
https://arxiv.org/pdf/2505.24623v1.pdf

GitHub:
β€’ https://github.com/Guang000/Awesome-Dataset-Distillation

Datasets:
β€’ CIFAR-100
β€’ Fashion-MNIST
β€’ Tiny ImageNet
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
❀1
πŸ”Ή Title:
Align Your Flow: Scaling Continuous-Time Flow Map Distillation

πŸ”Ή Publication Date: Published on Jun 17

πŸ”Ή Abstract:
Flow maps, introduced with new continuous-time objectives and training techniques, achieve state-of-the-art performance in few-step image and text-to-image generation. AI-generated summary: Diffusion- and flow-based models have emerged as state-of-the-art generative modeling approaches, but they require many sampling steps. Consistency models can distill these models into efficient one-step generators; however, unlike flow- and diffusion-based methods, their performance inevitably degrades when increasing the number of steps, which we show both analytically and empirically. Flow maps generalize these approaches by connecting any two noise levels in a single step and remain effective across all step counts. In this paper, we introduce two new continuous-time objectives for training flow maps, along with additional novel training techniques, generalizing existing consistency and flow matching objectives. We further demonstrate that autoguidance can improve performance, using a low-quality model for guidance during distillation, and an additional boost can be achieved by adversarial finetuning, with minimal loss in sample diversity. We extensively validate our flow map models, called Align Your Flow, on challenging image generation benchmarks and achieve state-of-the-art few-step generation performance on both ImageNet 64x64 and 512x512, using small and efficient neural networks. Finally, we show text-to-image flow map models that outperform all existing non-adversarially trained few-step samplers in text-conditioned synthesis.

πŸ”Ή Links:
β€’ arXiv Page: https://arxiv.org/abs/2506.14603
β€’ PDF: https://arxiv.org/pdf/2506.14603
β€’ Project Page: https://research.nvidia.com/labs/toronto-ai/AlignYourFlow/

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
❀2
Article Title:
Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis

Article Date: 9 Oct 2023

Article Description:
Multivariate Time Series (MTS) analysis is crucial to understanding and managing complex systems, such as traffic and energy systems, and a variety of approaches to MTS forecasting have been proposed recently. However, we often observe inconsistent or seemingly contradictory performance findings across different studies. This hinders our understanding of the merits of different approaches and slows down progress. We address the need for means of assessing MTS forecasting proposals reliably and fairly, in turn enabling better exploitation of MTS as seen in different applications. Specifically, we first propose BasicTS+, a benchmark designed to enable fair, comprehensive, and reproducible comparison of MTS forecasting solutions. BasicTS+ establishes a unified training pipeline and reasonable settings, enabling an unbiased evaluation. Second, we identify the heterogeneity across different MTS as an important consideration and enable classification of MTS based on their temporal and spatial characteristics. Disregarding this heterogeneity is a prime reason for difficulties in selecting the most promising technical directions. Third, we apply BasicTS+ along with rich datasets to assess the capabilities of more than 45 MTS forecasting solutions. This provides readers with an overall picture of the cutting-edge research on MTS forecasting. The code can be accessed at https://github.com/GestaltCogTeam/BasicTS.

PDF Download Link:
https://arxiv.org/pdf/2310.06119v2.pdf

GitHub:
β€’ https://github.com/zezhishao/basicts
β€’ https://github.com/gestaltcogteam/basicts
β€’ https://github.com/zezhishao/step
β€’ https://github.com/zezhishao/d2stgnn
β€’ https://github.com/hitplz/dstrformer

Datasets:
β€’ ETT
β€’ Exchange
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
❀2
Article Title:
Pseudo-Simulation for Autonomous Driving

Article Date: 4 Jun 2025

Article Description:
Existing evaluation paradigms for Autonomous Vehicles (AVs) face critical limitations. Real-world evaluation is often challenging due to safety concerns and a lack of reproducibility, whereas closed-loop simulation can face insufficient realism or high computational costs. Open-loop evaluation, while being efficient and data-driven, relies on metrics that generally overlook compounding errors. In this paper, we propose pseudo-simulation, a novel paradigm that addresses these limitations. Pseudo-simulation operates on real datasets, similar to open-loop evaluation, but augments them with synthetic observations generated prior to evaluation using 3D Gaussian Splatting. Our key idea is to approximate potential future states the AV might encounter by generating a diverse set of observations that vary in position, heading, and speed. Our method then assigns a higher importance to synthetic observations that best match the AV's likely behavior using a novel proximity-based weighting scheme. This enables evaluating error recovery and the mitigation of causal confusion, as in closed-loop benchmarks, without requiring sequential interactive simulation. We show that pseudo-simulation is better correlated with closed-loop simulations (R^2=0.8) than the best existing open-loop approach (R^2=0.7). We also establish a public leaderboard for the community to benchmark new methodologies with pseudo-simulation. Our code is available at https://github.com/autonomousvision/navsim.
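
A hedged sketch of what a proximity-based weighting scheme can look like: synthetic observations whose (position, heading, speed) state lies closer to the AV's likely state receive higher weight through a Gaussian kernel. The kernel form and per-dimension scales are assumptions, not the paper's exact scheme.

# Gaussian-kernel proximity weighting over synthetic observations (illustrative).
import numpy as np

def proximity_weights(synthetic_states, likely_state, scales=(2.0, 0.2, 1.0)):
    """Rows of synthetic_states are (position_offset_m, heading_rad, speed_mps)."""
    diff = (synthetic_states - likely_state) / np.asarray(scales)
    w = np.exp(-0.5 * np.sum(diff ** 2, axis=1))
    return w / w.sum()                    # normalize so the weights sum to 1

synthetic_states = np.array([
    [0.5, 0.05, 0.2],   # nearly matches the likely behavior -> largest weight
    [3.0, 0.30, 2.0],   # farther in position/heading/speed
    [8.0, 0.80, 5.0],   # far off -> near-zero weight
])
likely_state = np.array([0.0, 0.0, 0.0])
print(proximity_weights(synthetic_states, likely_state))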

PDF Download Link:
https://arxiv.org/pdf/2506.04218v1.pdf

GitHub:
β€’ https://github.com/autonomousvision/navsim

Datasets:
β€’ No datasets information available
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
❀2
πŸ”Ή Title:
SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution

πŸ”Ή Publication Date: Published on Jun 24

πŸ”Ή Abstract:
Researchers propose design principles for cascaded video super-resolution models to improve high-resolution video generation by introducing degradation strategies, timestep sampling, noise augmentation, and interleaving temporal units with sparse local attention. AI-generated summary: Latent diffusion models have emerged as a leading paradigm for efficient video generation. However, as user expectations shift toward higher-resolution outputs, relying solely on latent computation becomes inadequate. A promising approach involves decoupling the process into two stages: semantic content generation and detail synthesis. The former employs a computationally intensive base model at lower resolutions, while the latter leverages a lightweight cascaded video super-resolution (VSR) model to achieve high-resolution output. In this work, we focus on studying key design principles for the latter cascaded VSR models, which are currently underexplored. First, we propose two degradation strategies to generate training pairs that better mimic the output characteristics of the base model, ensuring alignment between the VSR model and its upstream generator. Second, we provide critical insights into VSR model behavior through systematic analysis of (1) timestep sampling strategies, (2) noise augmentation effects on low-resolution (LR) inputs. These findings directly inform our architectural and training innovations. Finally, we introduce an interleaving temporal unit and sparse local attention to achieve efficient training and inference, drastically reducing computational overhead. Extensive experiments demonstrate the superiority of our framework over existing methods, with ablation studies confirming the efficacy of each design choice. Our work establishes a simple yet effective baseline for cascaded video super-resolution generation, offering practical insights to guide future advancements in efficient cascaded synthesis systems.
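
A small sketch of one of those training-pair ideas, noise augmentation on the low-resolution conditioning input for a cascaded VSR model: the HQ clip is downscaled to form the conditioning input, then perturbed with Gaussian noise at a randomly sampled strength so the upscaler tolerates imperfect base-model outputs. The bilinear degradation and noise range are assumptions for illustration, not the paper's exact recipe.

# LR conditioning with random-strength noise augmentation for cascaded VSR training.
import torch
import torch.nn.functional as F

def make_lr_condition(hq_video, scale=4, max_noise=0.1):
    """hq_video: (B, C, T, H, W) in [0, 1]; returns a noisy low-resolution copy."""
    b, c, t, h, w = hq_video.shape
    frames = hq_video.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
    lr = F.interpolate(frames, scale_factor=1 / scale, mode="bilinear",
                       align_corners=False, antialias=True)
    sigma = torch.rand(b * t, 1, 1, 1) * max_noise       # per-frame noise strength
    lr_noisy = (lr + sigma * torch.randn_like(lr)).clamp(0, 1)
    return lr_noisy.reshape(b, t, c, h // scale, w // scale).permute(0, 2, 1, 3, 4)

hq = torch.rand(1, 3, 8, 256, 256)
print(make_lr_condition(hq).shape)       # torch.Size([1, 3, 8, 64, 64])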

πŸ”Ή Links:
β€’ arXiv Page: https://arxiv.org/abs/2506.19838
β€’ PDF: https://arxiv.org/pdf/2506.19838

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
❀2
πŸ”Ή Title:
Learning to Skip the Middle Layers of Transformers

πŸ”Ή Publication Date: Published on Jun 26

πŸ”Ή Abstract:
A novel conditional computation architecture for Transformers dynamically skips middle layers based on input and a gating mechanism, but does not outperform dense baselines in reducing computational cost or improving validation performance. AI-generated summary: Conditional computation is a popular strategy to make Transformers more efficient. Existing methods often target individual modules (e.g., mixture-of-experts layers) or skip layers independently of one another. However, interpretability research has demonstrated that the middle layers of Transformers exhibit greater redundancy, and that early layers aggregate information into token positions. Guided by these insights, we propose a novel architecture that dynamically skips a variable number of layers from the middle outward. In particular, a learned gating mechanism determines whether to bypass a symmetric span of central blocks based on the input, and a gated attention mechanism prevents subsequent tokens from attending to skipped token positions. Residual norms are controlled with a 'sandwich' or 'perilayernorm' scheme and gate sparsity with an adaptive regularization loss. We had aimed to reduce compute requirements for 'simpler' tokens and potentially foster an emergent multi-level representational hierarchy but, at the scales investigated, our approach does not achieve improvements in the trade-off between validation cross-entropy and estimated FLOPs compared to dense baselines with fewer layers. We release our code at https://github.com/tim-lawson/skip-middle.
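
A minimal PyTorch sketch of the core idea, an input-conditioned gate that bypasses a symmetric span of central blocks; the gate, the hard threshold, and the block definition are illustrative, and the paper's gated attention over skipped positions, 'sandwich'/perilayernorm scheme, and sparsity loss are omitted.

# Skip a symmetric span of middle Transformer blocks with a learned gate.
import torch
import torch.nn as nn

class SkipMiddleStack(nn.Module):
    def __init__(self, dim=256, n_layers=12, n_skippable=6, n_heads=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
             for _ in range(n_layers)])
        self.gate = nn.Linear(dim, 1)            # decides whether to bypass the middle
        lo = (n_layers - n_skippable) // 2
        self.middle = range(lo, lo + n_skippable)

    def forward(self, x):
        # Gate from the mean token of the input (an illustrative pooling choice).
        p_skip = torch.sigmoid(self.gate(x.mean(dim=1)))          # (B, 1)
        use_middle = (p_skip < 0.5).float().unsqueeze(-1)         # hard per-sample decision
        for i, block in enumerate(self.blocks):
            if i in self.middle:
                x = use_middle * block(x) + (1 - use_middle) * x  # bypass when gated off
            else:
                x = block(x)
        return x

model = SkipMiddleStack()
print(model(torch.randn(2, 16, 256)).shape)      # torch.Size([2, 16, 256])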

πŸ”Ή Links:
β€’ arXiv Page: https://arxiv.org/abs/2506.21103
β€’ PDF: https://arxiv.org/pdf/2506.21103
β€’ Github: https://github.com/tim-lawson/skip-middle

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
❀2
πŸ”Ή Title:
Radii, masses, and transit-timing variations of the three-planet system orbiting the naked-eye star TOI-396

πŸ”Ή Publication Date: Published on Nov 22, 2024

πŸ”Ή Abstract:
Observations of TOI-396 reveal three similar-sized planets with the outermost being the densest, and indicate that the inner two planets are close to but not in a 5:3 MMR, with significant TTVs detected. AI-generated summary: TOI-396 is an F6V star (V β‰ˆ 6.4) orbited by three transiting planets. The orbital periods of the two innermost planets are close to the 5:3 commensurability (P_b β‰ˆ 3.6 d and P_c β‰ˆ 6.0 d). To measure the masses of the three planets, refine their radii, and investigate whether planets b and c are in MMR, we carried out HARPS RV observations and retrieved photometric data from TESS. We extracted the RVs via a skew-normal fit onto the HARPS CCFs and performed an MCMC joint analysis of the Doppler measurements and transit photometry, while employing the breakpoint method to remove stellar activity from the RV time series. We also performed a thorough TTV dynamical analysis of the system. Our analysis confirms that the three planets have similar sizes: R_b = 2.004 (+0.045/-0.047) R_⊕; R_c = 1.979 (+0.054/-0.051) R_⊕; R_d = 2.001 (+0.063/-0.064) R_⊕. For the first time, we have determined the RV masses for TOI-396b and d: M_b = 3.55 (+0.94/-0.96) M_⊕ (rho_b = 2.44 (+0.69/-0.68) g cm^-3) and M_d = 7.1 Β± 1.6 M_⊕ (rho_d = 4.9 (+1.2/-1.1) g cm^-3). Our results suggest a quite unusual system architecture, with the outermost planet being the densest. The Doppler reflex motion induced by TOI-396c remains undetected in our RV time series, likely due to the proximity of P_c to the star's rotation period (P_rot = 6.7 Β± 1.3 d). We also discovered that TOI-396b and c display significant TTVs. While the TTV dynamical analysis returns a formally precise mass for TOI-396c (M_c,dyn = 2.24 (+0.13/-0.67) M_⊕), the result might not be accurate owing to the poor sampling of the TTV phase. We also conclude that TOI-396b and c are close to but out of the 5:3 MMR. Our numerical simulation suggests TTV semi-amplitudes of up to 5 hours over a temporal baseline of ~5.2 years.
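
The quoted densities can be sanity-checked directly from the reported masses and radii with rho = rho_⊕ * (M/M_⊕) / (R/R_⊕)^3, taking rho_⊕ β‰ˆ 5.51 g cm^-3:

# Bulk-density check from the reported masses and radii (Earth units).
RHO_EARTH = 5.51    # g cm^-3, mean density of Earth

def bulk_density(mass_earth, radius_earth):
    return RHO_EARTH * mass_earth / radius_earth ** 3

print(f"TOI-396 b: {bulk_density(3.55, 2.004):.2f} g cm^-3 (reported ~2.44)")
print(f"TOI-396 d: {bulk_density(7.10, 2.001):.2f} g cm^-3 (reported ~4.9)")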

πŸ”Ή Links:
β€’ arXiv Page: https://arxiv.org/abs/2411.14911
β€’ PDF: https://arxiv.org/pdf/2411.14911

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
❀4
πŸ”Ή Title:
AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models

πŸ”Ή Publication Date: Published on Jun 24

πŸ”Ή Abstract:
AnimaX creates multi-skeleton 3D animations by blending video diffusion model priors with skeleton-based control, using joint video-pose diffusion and shared positional encodings. AI-generated summary: We present AnimaX, a feed-forward 3D animation framework that bridges the motion priors of video diffusion models with the controllable structure of skeleton-based animation. Traditional motion synthesis methods are either restricted to fixed skeletal topologies or require costly optimization in high-dimensional deformation spaces. In contrast, AnimaX effectively transfers video-based motion knowledge to the 3D domain, supporting diverse articulated meshes with arbitrary skeletons. Our method represents 3D motion as multi-view, multi-frame 2D pose maps, and enables joint video-pose diffusion conditioned on template renderings and a textual motion prompt. We introduce shared positional encodings and modality-aware embeddings to ensure spatial-temporal alignment between video and pose sequences, effectively transferring video priors to the motion generation task. The resulting multi-view pose sequences are triangulated into 3D joint positions and converted into mesh animation via inverse kinematics. Trained on a newly curated dataset of 160,000 rigged sequences, AnimaX achieves state-of-the-art results on VBench in generalization, motion fidelity, and efficiency, offering a scalable solution for category-agnostic 3D animation. Project page: https://anima-x.github.io/
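
The triangulation step can be illustrated with the standard direct linear transform (DLT): the same joint observed in several calibrated views gives a small homogeneous linear system whose least-squares solution is the 3D position. The cameras below are toy values; this shows the generic technique, not the AnimaX pipeline itself.

# Multi-view DLT triangulation of one joint from its 2D projections.
import numpy as np

def triangulate(proj_mats, points_2d):
    """proj_mats: list of 3x4 camera matrices; points_2d: matching (u, v) per view."""
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]
    return X[:3] / X[3]                    # dehomogenize

# Two toy cameras observing the 3D point (0.1, 0.2, 3.0).
K = np.array([[500, 0, 320], [0, 500, 240], [0, 0, 1]], dtype=float)
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                  # camera at the origin
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])  # shifted baseline
X_true = np.array([0.1, 0.2, 3.0, 1.0])
pts = [(P @ X_true)[:2] / (P @ X_true)[2] for P in (P1, P2)]
print(triangulate([P1, P2], pts))          # approximately [0.1, 0.2, 3.0]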

πŸ”Ή Links:
β€’ arXiv Page: https://arxiv.org/abs/2506.19851
β€’ PDF: https://arxiv.org/pdf/2506.19851
β€’ Project Page: https://anima-x.github.io/
β€’ Github: https://github.com/anima-x/anima-x

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
❀2
Article Title:
MNN: A Universal and Efficient Inference Engine

Article Date: 27 Feb 2020

Article Description:
Deploying deep learning models on mobile devices draws more and more attention recently. However, designing an efficient inference engine on devices is under the great challenges of model compatibility, device diversity, and resource limitation. To deal with these challenges, we propose Mobile Neural Network (MNN), a universal and efficient inference engine tailored to mobile applications. In this paper, the contributions of MNN include: (1) presenting a mechanism called pre-inference that manages to conduct runtime optimization; (2) delivering thorough kernel optimization on operators to achieve optimal computation performance; (3) introducing backend abstraction module which enables hybrid scheduling and keeps the engine lightweight. Extensive benchmark experiments demonstrate that MNN performs favorably against other popular lightweight deep learning frameworks. MNN is available to public at: https://github.com/alibaba/MNN.

PDF Download Link:
https://arxiv.org/pdf/2002.12418v1.pdf

GitHub:
β€’ https://github.com/alibaba/MNN

Datasets:
β€’ No datasets information available
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
❀1