Forwarded from Python | Machine Learning | Coding | R
100 Important Data Science Interview Questions.pdf
11.7 MB
Preparing for a data science interview?
Reviewing fundamental questions is one of the best strategies for success. During the interview, it's crucial to communicate clearly and simply, especially when explaining complex models and data.
These 100 carefully selected questions will not only help you impress your interviewer but also boost your confidence throughout the interview process.
#DataScienceInterview #TechCareers #InterviewPreparation
Title of paper:
Audio-Visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation
Authors:
Fa-Ting Hong, Zunnan Xu, Zixiang Zhou, Jun Zhou, Xiu Li, Qin Lin, Qinglin Lu, Dan Xu
Description:
This paper introduces ACTalker, an end-to-end video diffusion framework designed for natural talking head generation with both multi-signal and single-signal control capabilities.
The framework employs a parallel Mamba structure with multiple branches, each utilizing a separate driving signal to control specific facial regions.
A gate mechanism is applied across all branches, providing flexible control over video generation.
To ensure natural coordination of the controlled video both temporally and spatially, the Mamba structure enables driving signals to manipulate feature tokens across both dimensions in each branch.
Additionally, a mask-drop strategy is introduced, allowing each driving signal to independently control its corresponding facial region within the Mamba structure, preventing control conflicts.
Experimental results demonstrate that this method produces natural-looking facial videos driven by diverse signals, and that the Mamba layer seamlessly integrates multiple driving modalities without conflict.
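To make the gating idea concrete, here is a minimal, hypothetical PyTorch sketch of two parallel control branches fused by a learned per-token gate, with a mask-drop step that restricts each driving signal to its own region of feature tokens. The branch modules, tensor shapes, and region masks are illustrative stand-ins only; the paper's actual branches are selective state-space (Mamba) layers inside a video diffusion backbone.

```python
import torch
import torch.nn as nn

class GatedParallelBranches(nn.Module):
    """Conceptual sketch: two control branches (e.g. audio, expression)
    act on disjoint regions of the token sequence and are fused by a gate."""
    def __init__(self, dim: int):
        super().__init__()
        # Stand-ins for the paper's selective state-space (Mamba) branches.
        self.audio_branch = nn.GRU(dim, dim, batch_first=True)
        self.visual_branch = nn.GRU(dim, dim, batch_first=True)
        self.gate = nn.Linear(dim, 2)  # per-token soft gate over the two branches

    def forward(self, tokens, audio_mask, visual_mask):
        # tokens: (B, N, D); masks: (B, N, 1) with 1 where a signal may act.
        # Mask-drop: each branch only sees tokens in its own facial region.
        a, _ = self.audio_branch(tokens * audio_mask)
        v, _ = self.visual_branch(tokens * visual_mask)
        w = torch.softmax(self.gate(tokens), dim=-1)   # (B, N, 2)
        fused = w[..., :1] * a + w[..., 1:] * v        # gated fusion of branch outputs
        return tokens + fused                          # residual update of the tokens

# Toy usage with made-up regions (e.g. mouth tokens vs. upper-face tokens).
x = torch.randn(1, 16, 64)
audio_mask = torch.zeros(1, 16, 1); audio_mask[:, :8] = 1
visual_mask = 1 - audio_mask
out = GatedParallelBranches(64)(x, audio_mask, visual_mask)
print(out.shape)  # torch.Size([1, 16, 64])
```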
Paper (abstract):
https://arxiv.org/abs/2504.00000
Paper (PDF):
https://arxiv.org/pdf/2504.00000.pdf
Code:
https://github.com/harlanhong/actalker
Datasets used in paper:
The paper does not specify the datasets used.
Hugging Face demo:
No Hugging Face demo available.
#ACTalker #TalkingHeadGeneration #VideoDiffusion #MultimodalControl #MambaStructure #DeepLearning #ComputerVision #AI #OpenSource
2025 Top IT Certification - Free Study Materials Are Here!
Whether you're preparing for #Cisco #AWS #PMP #Python #Excel #Google #Microsoft #AI or any other in-demand certification, SPOTO has got you covered!
Download the FREE IT Certs Exam E-book:
https://bit.ly/4lNVItV
Test Your IT Skills for FREE:
https://bit.ly/4imEjW5
Download Free AI Materials:
https://bit.ly/3F3lc5B
Need 1-on-1 IT Exam Help? Contact Now:
https://wa.link/k0vy3x
Join Our IT Study Group for Daily Updates & Tips:
https://chat.whatsapp.com/E3Vkxa19HPO9ZVkWslBO8s
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation
Paper: https://arxiv.org/pdf/2504.02782v1.pdf
Code: https://github.com/picotrex/gpt-imgeval
Dataset: MagicBrush - GenEval
3 Apr 2025 · Zhiyuan Yan, Junyan Ye, Weijia Li, Zilong Huang, Shenghai Yuan, Xiangyang He, Kaiqing Lin, Jun He, Conghui He, Li Yuan
The recent breakthroughs in OpenAI's #GPT4o model have demonstrated surprisingly good capabilities in image generation and editing, resulting in significant excitement in the community. This technical report presents the first-look evaluation benchmark (named GPT-ImgEval), quantitatively and qualitatively diagnosing GPT-4o's performance across three critical dimensions: (1) generation quality, (2) editing proficiency, and (3) world knowledge-informed semantic synthesis. Across all three tasks, GPT-4o demonstrates strong performance, significantly surpassing existing methods in both image generation control and output quality, while also showcasing exceptional knowledge reasoning capabilities. Furthermore, based on the GPT-4o's generated data, we propose a classification-model-based approach to investigate the underlying architecture of GPT-4o, where our empirical results suggest the model consists of an auto-regressive (AR) combined with a diffusion-based head for image decoding, rather than the VAR-like architectures. We also provide a complete speculation on GPT-4o's overall architecture. In addition, we conduct a series of analyses to identify and visualize GPT-4o's specific limitations and the synthetic artifacts commonly observed in its image generation. We also present a comparative study of multi-round image editing between GPT-4o and Gemini 2.0 Flash, and discuss the safety implications of GPT-4o's outputs, particularly their detectability by existing image forensic models. We hope that our work can offer valuable insight and provide a reliable benchmark to guide future research, foster reproducibility, and accelerate innovation in the field of image generation and beyond. The codes and datasets used for evaluating GPT-4o can be found at https://github.com/PicoTrex/GPT-ImgEval.
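As a rough illustration of the classification-model-based probe described above, the sketch below trains a small binary image classifier to separate outputs of known diffusion-decoder models from known autoregressive-decoder models, then applies it to images from the model under study. The data, labels, and classifier are hypothetical placeholders, not the paper's actual setup.

```python
import torch
import torch.nn as nn

# Tiny binary classifier: 0 = AR-style decoder outputs, 1 = diffusion-style outputs.
classifier = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2),
)
opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder batches standing in for images from models with known decoders.
ar_images = torch.rand(8, 3, 64, 64)         # label 0
diffusion_images = torch.rand(8, 3, 64, 64)  # label 1
x = torch.cat([ar_images, diffusion_images])
y = torch.cat([torch.zeros(8, dtype=torch.long), torch.ones(8, dtype=torch.long)])

for _ in range(5):                            # toy training loop
    opt.zero_grad()
    loss = loss_fn(classifier(x), y)
    loss.backward()
    opt.step()

# Probe: images generated by the model whose decoder family we want to infer.
unknown = torch.rand(4, 3, 64, 64)
pred = classifier(unknown).argmax(dim=-1)
print(pred)  # a majority vote over many samples would suggest the decoder family
```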
Adobe unveils HUMOTO, a high-quality #dataset of human-object interactions designed for #motiongeneration, #computervision, and #robotics. It features over 700 sequences (7,875 seconds @ 30FPS) with interactions involving 63 precisely modeled objects and 72 articulated parts, a rich resource for researchers and developers in the field.
#HUMOTO #4DMocap #HumanObjectInteraction #AdobeResearch #AI #MachineLearning #PoseEstimation
Forwarded from Python | Machine Learning | Coding | R
Forget Coding; start Vibing! Tell AI what you want, and watch it build your dream website while you enjoy a cup of coffee.
Date: Thursday, April 17th at 9 PM IST
Register for FREE: https://lu.ma/4nczknky?tk=eAT3Bi
Limited FREE seats!
The Oxford VGG unveils Geo4D, a breakthrough in #videodiffusion for monocular 4D reconstruction. Trained only on synthetic data, Geo4D still achieves strong generalization to real-world scenarios. It outputs point maps, depth, and ray maps, setting a new #SOTA in dynamic scene reconstruction. Code is now released!
#Geo4D #4DReconstruction #DynamicScenes #OxfordVGG #ComputerVision #MachineLearning #DiffusionModels
Forwarded from Crypto Rates, Prices and news
ENTER VIP FOR FREE! ENTRY FREE FOR 24 HOURS!
LISA TRADER - the most successful trader of 2024. A week ago she finished a marathon in her VIP channel, turning $100 into $2,000 in just two weeks!
Entry to her channel normally costs $1,500 - FOR 24 HOURS, ENTRY IS FREE!
JOIN THE VIP CHANNEL NOW!
JOIN THE VIP CHANNEL NOW!
JOIN THE VIP CHANNEL NOW!
General Attention-Based Object Detection
GATE3D is a novel framework designed specifically for generalized monocular 3D object detection via weak supervision. GATE3D effectively bridges domain gaps by employing consistency losses between 2D and 3D predictions.
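As a rough, hypothetical illustration of what a 2D-3D consistency loss can look like, the sketch below projects predicted 3D box centers into the image with camera intrinsics and penalizes their distance to predicted 2D box centers. The intrinsics, shapes, and loss choice are placeholders, not GATE3D's actual formulation.

```python
import torch

def project_to_image(points_3d, K):
    """Pinhole projection of 3D camera-frame points (N, 3) with intrinsics K (3, 3)."""
    uvw = points_3d @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def consistency_loss(pred_centers_3d, pred_centers_2d, K):
    """Penalize disagreement between projected 3D box centers and 2D box centers."""
    projected = project_to_image(pred_centers_3d, K)
    return torch.nn.functional.smooth_l1_loss(projected, pred_centers_2d)

# Toy usage with made-up intrinsics and predictions.
K = torch.tensor([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])
centers_3d = torch.tensor([[0.5, 0.2, 10.0]], requires_grad=True)  # (x, y, z) in metres
centers_2d = torch.tensor([[360.0, 256.0]])                        # pixel coordinates
loss = consistency_loss(centers_3d, centers_2d, K)
loss.backward()   # gradients flow back to the 3D predictions
print(loss.item())
```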
Review: https://t.ly/O7wqH
Paper: https://lnkd.in/dc5VTUj9
Project: https://lnkd.in/dzrt-qQV
#3DObjectDetection #Monocular3D #DeepLearning #WeakSupervision #ComputerVision #AI #MachineLearning #GATE3D
Forwarded from Python | Machine Learning | Coding | R
Access whitepapers, podcasts, code labs, & recorded livestreams. Additionally, there is a bonus assignment for you!
https://www.kaggle.com/learn-guide/5-day-genai
#GenerativeAI #GoogleAI #AICourse #SelfPacedLearning #MachineLearning #DeepLearning #Kaggle #AICommunity #TechEducation #AIforEveryone
Kaggle
5-Day Gen AI Intensive Course with Google
Kaggle is the world's largest data science community with powerful tools and resources to help you achieve your data science goals.
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers
Paper: https://arxiv.org/pdf/2504.10483v1.pdf
Code: https://github.com/End2End-Diffusion/REPA-E
Dataset: ImageNet
https://t.iss.one/DataScienceT
14 Apr 2025 · Xingjian Leng, Jaskirat Singh, Yunzhong Hou, Zhenchang Xing, Saining Xie, Liang Zheng
In this paper we tackle a fundamental question: "Can we train latent diffusion models together with the variational auto-encoder (VAE) tokenizer in an end-to-end manner?" Traditional deep-learning wisdom dictates that end-to-end training is often preferable when possible. However, for latent diffusion transformers, it is observed that end-to-end training both VAE and diffusion-model using standard diffusion-loss is ineffective, even causing a degradation in final performance. We show that while diffusion loss is ineffective, end-to-end training can be unlocked through the representation-alignment (REPA) loss -- allowing both VAE and diffusion model to be jointly tuned during the training process. Despite its simplicity, the proposed training recipe (REPA-E) shows remarkable performance; speeding up diffusion model training by over 17x and 45x over REPA and vanilla training recipes, respectively. Interestingly, we observe that end-to-end tuning with REPA-E also improves the VAE itself; leading to improved latent space structure and downstream generation performance. In terms of final performance, our approach sets a new state-of-the-art; achieving FID of 1.26 and 1.83 with and without classifier-free guidance on ImageNet 256 x 256. Code is available at https://end2end-diffusion.github.io.
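For intuition, here is a toy, hypothetical sketch of adding a representation-alignment term to a denoising objective so that gradients can usefully reach the tokenizer during joint training. The modules are trivial stand-ins, the noising step is schematic, and the loss weighting and gradient routing differ from the actual REPA-E recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of joint tokenizer + denoiser training with an added
# representation-alignment term. All modules are toy stand-ins.
vae_enc = nn.Linear(784, 32)                 # stand-in VAE encoder -> latents
denoiser = nn.Linear(32, 32)                 # stand-in diffusion denoiser
proj = nn.Linear(32, 128)                    # projects latents to the feature space
frozen_encoder = nn.Linear(784, 128).eval()  # stand-in pretrained encoder (frozen)
for p in frozen_encoder.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(
    [*vae_enc.parameters(), *denoiser.parameters(), *proj.parameters()], lr=1e-4)

x = torch.rand(16, 784)                      # toy image batch
z = vae_enc(x)                               # latents from the (trainable) tokenizer
noise = torch.randn_like(z)
z_noisy = z + noise                          # schematic forward-noising step
diffusion_loss = F.mse_loss(denoiser(z_noisy), noise)   # standard denoising objective

# Representation-alignment term: align latent features with a frozen
# pretrained encoder's features, the ingredient that unlocks end-to-end tuning.
repa_loss = 1 - F.cosine_similarity(proj(z), frozen_encoder(x), dim=-1).mean()

loss = diffusion_loss + 0.5 * repa_loss      # weighting is an arbitrary choice here
loss.backward()
opt.step()
print(float(diffusion_loss), float(repa_loss))
```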
Liquid: Language Models are Scalable Multi-modal Generators
Paper: https://arxiv.org/pdf/2412.04332v2.pdf
Code: https://github.com/foundationvision/liquid
https://t.iss.one/DataScienceT
5 Dec 2024 · Junfeng Wu, Yi Jiang, Chuofan Ma, Yuliang Liu, Hengshuang Zhao, Zehuan Yuan, Song Bai, Xiang Bai
We present Liquid, an auto-regressive generation paradigm that seamlessly integrates visual comprehension and generation by tokenizing images into discrete codes and learning these code embeddings alongside text tokens within a shared feature space for both vision and language. Unlike previous multimodal large language models (MLLMs), Liquid achieves this integration using a single large language model (LLM), eliminating the need for external pretrained visual embeddings such as CLIP. For the first time, Liquid uncovers a scaling law: the performance drop unavoidably brought by the unified training of visual and language tasks diminishes as the model size increases. Furthermore, the unified token space enables visual generation and comprehension tasks to mutually enhance each other, effectively removing the typical interference seen in earlier models. We show that existing LLMs can serve as strong foundations for Liquid, saving 100x in training costs while outperforming Chameleon in multimodal capabilities and maintaining language performance comparable to mainstream LLMs like LLAMA2. Liquid also outperforms models like SD v2.1 and SD-XL (FID of 5.47 on MJHQ-30K), excelling in both vision-language and text-only tasks. This work demonstrates that LLMs such as LLAMA3.2 and GEMMA2 are powerful multimodal generators, offering a scalable solution for enhancing both vision-language understanding and generation. The code and models will be released at https://github.com/FoundationVision/Liquid.
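The shared token space described above can be illustrated with a small, hypothetical sketch: discrete image codes are offset into the same vocabulary as text tokens, embedded by one table, and modeled with ordinary next-token prediction. Vocabulary sizes, the tokenizer, and the tiny transformer are placeholders, not Liquid's actual configuration.

```python
import torch
import torch.nn as nn

# Toy sketch of a shared vocabulary: text tokens and discrete image codes
# live in one embedding table and are modeled by one autoregressive LM.
TEXT_VOCAB = 1000          # hypothetical text vocabulary size
IMAGE_CODEBOOK = 256       # hypothetical number of discrete image codes
VOCAB = TEXT_VOCAB + IMAGE_CODEBOOK

embed = nn.Embedding(VOCAB, 64)
lm = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
head = nn.Linear(64, VOCAB)

# A mixed sequence: some text tokens followed by image codes (offset into
# the shared vocabulary so they occupy their own id range).
text_ids = torch.randint(0, TEXT_VOCAB, (1, 8))
image_ids = torch.randint(0, IMAGE_CODEBOOK, (1, 8)) + TEXT_VOCAB
seq = torch.cat([text_ids, image_ids], dim=1)

# Next-token prediction over the whole mixed sequence (the causal mask is
# omitted for brevity; a real setup would use a causal decoder).
logits = head(lm(embed(seq[:, :-1])))
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), seq[:, 1:].reshape(-1))
print(float(loss))
```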
NVIDIA introduces Describe Anything Model (DAM)
a new state-of-the-art model designed to generate rich, detailed descriptions for specific regions in images and videos. Users can mark these regions using points, boxes, scribbles, or masks.
DAM sets a new benchmark in multimodal understanding, with open-source code under the Apache license, a dedicated dataset, and a live demo available on Hugging Face.
Explore more below:
Paper: https://lnkd.in/dZh82xtV
Project Page: https://lnkd.in/dcv9V2ZF
GitHub Repo: https://lnkd.in/dJB9Ehtb
Hugging Face Demo: https://lnkd.in/dXDb2MWU
Review: https://t.ly/la4JD
#NVIDIA #DescribeAnything #ComputerVision #MultimodalAI #DeepLearning #ArtificialIntelligence #MachineLearning #OpenSource #HuggingFace #GenerativeAI #VisualUnderstanding #Python #AIresearch
https://t.iss.one/DataScienceT
Forwarded from Python | Machine Learning | Coding | R
This channel is for programmers, coders, and software engineers.
0. Python
1. Data Science
2. Machine Learning
3. Data Visualization
4. Artificial Intelligence
5. Data Analysis
6. Statistics
7. Deep Learning
8. Programming Languages

https://t.iss.one/addlist/8_rRW2scgfRhOTc0

https://t.iss.one/Codeprogrammer
SOTA Textured 3D-Guided VTON
#ALIBABA unveils 3DV-TON, a novel diffusion model for high-quality, temporally consistent video try-on. It generates animatable textured 3D meshes as explicit frame-level guidance, alleviating the issue of models over-focusing on appearance fidelity at the expense of motion coherence. Code & benchmark to be released.
Review: https://t.ly/0tjdC
Paper: https://lnkd.in/dFseYSXz
Project: https://lnkd.in/djtqzrzs
Repo: TBA
#AI #3DReconstruction #DiffusionModels #VirtualTryOn #ComputerVision #DeepLearning #VideoSynthesis
https://t.iss.one/DataScienceT
Forwarded from ENG. Hussein Sheikho
Remote job opportunity
No qualifications or experience required; the company provides full training.
Flexible working hours.
Register, and you will be contacted to attend an introductory meeting about the job and the company:
https://forms.gle/hqUZXu7u4uLjEDPv8
Forwarded from Python Courses
Forwarded from Python | Machine Learning | Coding | R
Dive deep into the world of Transformers with this comprehensive PyTorch implementation guide. Whether you're a seasoned ML engineer or just starting out, this resource breaks down the complexities of the Transformer model, inspired by the groundbreaking paper "Attention Is All You Need".
https://www.k-a.in/pyt-transformer.html
This guide offers a step-by-step breakdown of the architecture alongside its PyTorch implementation.
By following along, you'll gain a solid understanding of how Transformers work and how to implement them from scratch.
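As a small taste of what such an implementation involves, here is a minimal, self-contained sketch of single-head scaled dot-product self-attention, the core operation from "Attention Is All You Need". It omits the masking, multiple heads, and dropout that a full implementation like the one in the guide would include.

```python
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention (no masking, no dropout)."""
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (batch, seq_len, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))
        attn = scores.softmax(dim=-1)            # attention weights over the sequence
        return attn @ v                          # weighted sum of value vectors

x = torch.randn(2, 10, 64)                       # 2 sequences of 10 tokens, 64-dim
print(SelfAttention(64)(x).shape)                # torch.Size([2, 10, 64])
```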
#MachineLearning #DeepLearning #PyTorch #Transformer #AI #NLP #AttentionIsAllYouNeed #Coding #DataScience #NeuralNetworks
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Paper: https://arxiv.org/pdf/2502.05512v1.pdf
Code: https://github.com/index-tts/index-tts
https://t.iss.one/DataScienceT
8 Feb 2025 · Wei Deng, Siyi Zhou, Jingchen Shu, Jinchao Wang, Lu Wang
Recently, large language model (#LLM) based text-to-speech (#TTS) systems have gradually become the mainstream in the industry due to their high naturalness and powerful zero-shot voice cloning capabilities. Here, we introduce the IndexTTS system, which is mainly based on the XTTS and Tortoise models, with several novel improvements. Specifically, in Chinese scenarios, we adopt a hybrid modeling method that combines characters and pinyin, making the pronunciations of polyphonic characters and long-tail characters controllable. We also performed a comparative analysis of Vector Quantization (VQ) and Finite-Scalar Quantization (FSQ) for codebook utilization of acoustic speech tokens. To further enhance the effect and stability of voice cloning, we introduce a conformer-based speech conditional encoder and replace the speechcode decoder with BigVGAN2. Compared with #XTTS, it has achieved significant improvements in naturalness, content consistency, and zero-shot voice cloning. Compared with popular open-source TTS systems such as Fish-Speech, CosyVoice2, FireRedTTS and F5-TTS, IndexTTS has a relatively simple training process, more controllable usage, and faster inference speed, and its performance surpasses that of these systems. Our demos are available at https://index-tts.github.io.
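As a generic illustration of the Finite-Scalar Quantization idea compared against VQ in the abstract, here is a tiny sketch that bounds each latent dimension, rounds it to a small set of levels, and passes gradients through with a straight-through estimator. The number of levels and the shapes are arbitrary and unrelated to IndexTTS's actual acoustic tokenizer.

```python
import torch

def fsq(z: torch.Tensor, levels: int = 5) -> torch.Tensor:
    """Toy finite scalar quantization: squash each latent dimension to (-1, 1),
    round it to one of `levels` evenly spaced values, and use a straight-through
    estimator so gradients pass through the rounding."""
    z = torch.tanh(z)                                   # bound each dimension
    half = (levels - 1) / 2
    z_q = torch.round(z * half) / half                  # snap to the nearest level
    return z + (z_q - z).detach()                       # straight-through estimator

latents = torch.randn(2, 4, requires_grad=True)
tokens = fsq(latents)
print(tokens)            # each entry is one of 5 values in [-1, 1]
tokens.sum().backward()  # gradients flow back to `latents`
print(latents.grad.shape)
```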