Data Science | Machine Learning with Python for Researchers
31.8K subscribers
2.08K photos
102 videos
22 files
2.36K links
Admin: @HusseinSheikho

The Data Science and Python channel is for researchers and advanced programmers

Buy ads: https://telega.io/c/dataScienceT
🔥 #YOLOv12 is out – new SOTA! ⚡️

👉 YOLOv12 is a novel attention-centric YOLO framework that matches the speed of previous CNN-based versions while harnessing the performance benefits of attention mechanisms.

💙 Source Code & Demo released:
▶️ Review: https://t.ly/jj1oR
▶️ Paper: arXiv
👉 Repo: GitHub
🤗 Demo: https://t.ly/w5rno

#AI #DeepLearning #ComputerVision #YOLO #AttentionMechanism #OpenSource
Title of paper:
Audio-Visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation
Authors:
Fa-Ting Hong, Zunnan Xu, Zixiang Zhou, Jun Zhou, Xiu Li, Qin Lin, Qinglin Lu, Dan Xu
Description:
This paper introduces ACTalker, an end-to-end video diffusion framework designed for natural talking head generation with both multi-signal and single-signal control capabilities.
The framework employs a parallel Mamba structure with multiple branches, each utilizing a separate driving signal to control specific facial regions.
A gate mechanism is applied across all branches, providing flexible control over video generation.
To ensure natural coordination of the controlled video both temporally and spatially, the Mamba structure enables driving signals to manipulate feature tokens across both dimensions in each branch.
Additionally, a mask-drop strategy is introduced, allowing each driving signal to independently control its corresponding facial region within the Mamba structure, preventing control conflicts.
Experimental results demonstrate that this method produces natural-looking facial videos driven by diverse signals, and that the Mamba layer seamlessly integrates multiple driving modalities without conflict.
Link to paper abstract:
https://arxiv.org/abs/2504.00000
Link to paper PDF:
https://arxiv.org/pdf/2504.00000.pdf
Code:
https://github.com/harlanhong/actalker
Datasets used in paper:
The paper does not specify the datasets used.
Hugging Face demo:
No Hugging Face demo available.
#ACTalker #TalkingHeadGeneration #VideoDiffusion #MultimodalControl #MambaStructure #DeepLearning #ComputerVision #AI #OpenSource
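
The gated, region-masked fusion described in this post can be illustrated with a toy example. This is a minimal sketch and not the authors' implementation: the function name `fuse_branches`, the flat token list, the binary region masks, the "audio" and "expression" branch labels, and the additive update are all assumptions made for illustration. The actual model applies Mamba layers to video feature tokens.

```python
from typing import List, Sequence


def fuse_branches(
    tokens: Sequence[float],
    branch_updates: Sequence[Sequence[float]],
    region_masks: Sequence[Sequence[int]],
    gates: Sequence[float],
) -> List[float]:
    """Add each branch's update to the tokens of its own facial region.

    tokens         -- base feature tokens (one float per token, for simplicity)
    branch_updates -- one update vector per driving signal (hypothetical)
    region_masks   -- one 0/1 mask per branch; 1 marks tokens the branch may edit
    gates          -- one scalar per branch; 0.0 switches the branch off
    """
    fused = list(tokens)
    for update, mask, gate in zip(branch_updates, region_masks, gates):
        for i, (u, m) in enumerate(zip(update, mask)):
            fused[i] += gate * m * u
    return fused


# Four tokens: the first two stand in for one facial region, the last two for another.
tokens = [0.0, 0.0, 0.0, 0.0]
audio_update = [1.0, 1.0, 1.0, 1.0]       # hypothetical audio-branch output
expression_update = [5.0, 5.0, 5.0, 5.0]  # hypothetical expression-branch output
audio_mask = [1, 1, 0, 0]
expression_mask = [0, 0, 1, 1]

both = fuse_branches(tokens, [audio_update, expression_update],
                     [audio_mask, expression_mask], gates=[1.0, 1.0])
audio_only = fuse_branches(tokens, [audio_update, expression_update],
                           [audio_mask, expression_mask], gates=[1.0, 0.0])
print(both)        # [1.0, 1.0, 5.0, 5.0]
print(audio_only)  # [1.0, 1.0, 0.0, 0.0]
```

Because the masks do not overlap, neither branch can overwrite the other's region, and setting a gate to 0.0 reduces the sketch to single-signal control.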
🏄 4D Mocap Human-Object 🏄

Adobe unveils HUMOTO, a high-quality #dataset of human-object interactions designed for #motiongeneration, #computervision, and #robotics. It features over 700 sequences (7,875 seconds @ 30FPS) with interactions involving 63 precisely modeled objects and 72 articulated parts, a rich resource for researchers and developers in the field.

⚡️ Review: https://t.ly/lCof3
⚡️ Paper: https://lnkd.in/dVVBDd_c
⚡️ Project: https://lnkd.in/dwBcseDf

#HUMOTO #4DMocap #HumanObjectInteraction #AdobeResearch #AI #MachineLearning #PoseEstimation
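
For a sense of scale, the figures quoted in this post (7,875 seconds at 30 FPS) can be turned into a frame count. The sketch assumes exactly 700 sequences, although the post says "over 700", so the per-sequence average is an upper bound rather than a reported statistic.

```python
total_seconds = 7_875
fps = 30
num_sequences = 700  # assumption: the post only says "over 700"

total_frames = total_seconds * fps
total_minutes = total_seconds / 60
avg_seconds = total_seconds / num_sequences  # upper bound on the mean length

print(total_frames)   # 236250
print(total_minutes)  # 131.25
print(avg_seconds)    # 11.25
```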

⚡️ BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟
💥 Geo4D: VideoGen 4D Scene 💥

The Oxford VGG unveils Geo4D, a breakthrough in #videodiffusion for monocular 4D reconstruction. Trained only on synthetic data, Geo4D still achieves strong generalization to real-world scenarios. It outputs point maps, depth, and ray maps, setting a new #SOTA in dynamic scene reconstruction. Code is now released!

⚡️ Review: https://t.ly/X55Uj
⚡️ Paper: https://arxiv.org/pdf/2504.07961
⚡️ Project: https://geo4d.github.io/
⚡️ Code: https://github.com/jzr99/Geo4D

#Geo4D #4DReconstruction #DynamicScenes #OxfordVGG #ComputerVision #MachineLearning #DiffusionModels

🔥 General Attention-Based Object Detection 🔥

👉 GATE3D is a novel framework designed specifically for generalized monocular 3D object detection via weak supervision. GATE3D effectively bridges domain gaps by employing consistency losses between 2D and 3D predictions.

👉 Review: https://t.ly/O7wqH
👉 Paper: https://lnkd.in/dc5VTUj9
👉 Project: https://lnkd.in/dzrt-qQV

#3DObjectDetection #Monocular3D #DeepLearning #WeakSupervision #ComputerVision #AI #MachineLearning #GATE3D
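
One common way to express consistency between 2D and 3D predictions is to project the predicted 3D box into the image and compare its bounding rectangle with the predicted 2D box. The sketch below assumes that formulation. It is not taken from the GATE3D paper, and the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the axis-aligned box, and the L1 penalty are illustrative choices.

```python
from itertools import product
from typing import Tuple

Box2D = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def project_box(center, size, fx, fy, cx, cy) -> Box2D:
    """Project an axis-aligned 3D box (camera coordinates, z forward) to a 2D box.

    Assumes every corner lies in front of the camera (z > 0).
    """
    x, y, z = center
    w, h, l = size
    us, vs = [], []
    # Visit all eight corners of the box.
    for sx, sy, sz in product((-0.5, 0.5), repeat=3):
        px, py, pz = x + sx * w, y + sy * h, z + sz * l
        us.append(fx * px / pz + cx)
        vs.append(fy * py / pz + cy)
    return min(us), min(vs), max(us), max(vs)


def consistency_loss(box_2d: Box2D, projected: Box2D) -> float:
    """Mean absolute difference between the coordinates of the two boxes."""
    return sum(abs(a - b) for a, b in zip(box_2d, projected)) / 4.0


projected = project_box(center=(0.0, 0.0, 10.0), size=(2.0, 2.0, 2.0),
                        fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
print(consistency_loss(projected, projected))  # 0.0

# A 2D prediction shifted by 4 pixels is penalized by roughly 4.
shifted = tuple(c + 4.0 for c in projected)
print(consistency_loss(shifted, projected))
```

A loss of this kind needs only 2D boxes as targets, which is one reason it suits weakly supervised settings where 3D labels are scarce.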

NVIDIA introduces Describe Anything Model (DAM)

DAM is a new state-of-the-art model designed to generate rich, detailed descriptions for specific regions in images and videos. Users can mark these regions using points, boxes, scribbles, or masks.
DAM sets a new benchmark in multimodal understanding. The release includes open-source code under the Apache license, a dedicated dataset, and a live demo on Hugging Face.

Explore more below:
Paper: https://lnkd.in/dZh82xtV
Project Page: https://lnkd.in/dcv9V2ZF
GitHub Repo: https://lnkd.in/dJB9Ehtb
Hugging Face Demo: https://lnkd.in/dXDb2MWU
Review: https://t.ly/la4JD

#NVIDIA #DescribeAnything #ComputerVision #MultimodalAI #DeepLearning #ArtificialIntelligence #MachineLearning #OpenSource #HuggingFace #GenerativeAI #VisualUnderstanding #Python #AIresearch

https://t.iss.one/DataScienceT ✅
🌼 SOTA Textured 3D-Guided VTON 🌼

👉 #ALIBABA unveils 3DV-TON, a novel diffusion model for high-quality, temporally consistent video try-on. It generates animatable textured 3D meshes as explicit frame-level guidance, which alleviates the tendency of models to over-focus on appearance fidelity at the expense of motion coherence. Code and benchmark are to be released 💙

👉 Review: https://t.ly/0tjdC
👉 Paper: https://lnkd.in/dFseYSXz
👉 Project: https://lnkd.in/djtqzrzs
👉 Repo: TBA

#AI #3DReconstruction #DiffusionModels #VirtualTryOn #ComputerVision #DeepLearning #VideoSynthesis

🩷 Dance meets #ComputerVision 🩷

Saint-Étienne University has introduced a new 3D human body pose estimation pipeline designed specifically for dance analysis.
Check out the project page featuring results and an interactive demo! 💙

👉 Paper review: https://t.ly/JEdM3

👉 Full paper: https://arxiv.org/pdf/2505.07249

👉 Project page: https://lnkd.in/dD5dsMv5

#DanceAnalysis #3DPoseEstimation #DeepLearning #HumanPose #AI #MachineLearning #ComputerVisionResearch


🔗 Our Telegram channels: https://t.iss.one/addlist/0f6vfFbEMdAwODBk

📱 Our WhatsApp channel: https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
💃 GENMO: Generalist Human Motion by NVIDIA 💃

NVIDIA introduces GENMO, a unified generalist model for human motion that seamlessly combines motion estimation and generation within a single framework. GENMO supports conditioning on videos, 2D keypoints, text, music, and 3D keyframes, enabling highly versatile motion understanding and synthesis.

Currently, no official code release is available.

Review:
https://t.ly/Q5T_Y

Paper:
https://lnkd.in/ds36BY49

Project Page:
https://lnkd.in/dAYHhuFU

#NVIDIA #GENMO #HumanMotion #DeepLearning #AI #ComputerVision #MotionGeneration #MachineLearning #MultimodalAI #3DReconstruction

