Crystal Generation with Space Group Informed Transformer
GitHub: https://github.com/deepmodeling/crystalformer
Paper: https://arxiv.org/abs/2504.02367v1
Dataset: https://paperswithcode.com/dataset/alex-20
4 advanced attention mechanisms you should know:
• Slim attention: 8× less memory and 5× faster generation by storing only K from the KV pairs and recomputing V.
• XAttention: 13.5× speedup on long sequences via sums of values along diagonal lines of the attention matrix.
• Kolmogorov-Arnold Attention (KArAt): adaptable attention with learnable activation functions based on KANs instead of softmax.
• Multi-token attention (MTA): lets the model consider groups of nearby words together for smarter long-context handling.
Read our overview of all four in the free article: https://huggingface.co/blog/Kseniase/attentions
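To make the Slim attention trick concrete, here is a minimal NumPy sketch of recovering V from a K-only cache. It assumes a square, invertible key projection (the condition the method relies on); shapes and variable names are illustrative, not the paper's implementation:

```python
import numpy as np

# Toy sketch of the Slim attention idea: cache only K, rebuild V on demand.
rng = np.random.default_rng(0)
d, n = 8, 5
W_k = rng.normal(size=(d, d))   # key projection (assumed square & invertible)
W_v = rng.normal(size=(d, d))   # value projection
X = rng.normal(size=(n, d))     # n cached token embeddings

K = X @ W_k                     # the only tensor kept in the cache
V_full = X @ W_v                # what a standard KV cache would also store

# V = K @ (W_k^{-1} W_v): recompute values from keys instead of caching them
W_kv = np.linalg.solve(W_k, W_v)  # precomputed once per layer
V_recomputed = K @ W_kv

assert np.allclose(V_recomputed, V_full)
```

Since `W_kv` is computed once per layer, the cache holds only K while V is rebuilt on the fly, trading a little extra compute for roughly half the KV memory.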
https://t.iss.one/DataScienceM
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models
28 Mar 2025 · Zhihang Lin, Mingbao Lin, Yuan Xie, Rongrong Ji
Paper: https://arxiv.org/pdf/2503.22342v1.pdf
Code: https://github.com/lzhxmu/cppo
Datasets: GSM8K, MATH
This paper introduces Completion Pruning Policy Optimization (CPPO) to accelerate the training of reasoning models based on Group Relative Policy Optimization (GRPO). GRPO, while effective, incurs high training costs due to the need to sample multiple completions for each question. Our experiments and theoretical analysis reveal that the number of completions impacts model accuracy yet increases training time multiplicatively, and that not all completions contribute equally to policy training: their contribution depends on their relative advantage. To address these issues, we propose CPPO, which prunes completions with low absolute advantages, significantly reducing the number needed for gradient calculation and updates. Additionally, we introduce a dynamic completion allocation strategy that maximizes GPU utilization by incorporating additional questions, further enhancing training efficiency. Experimental results demonstrate that CPPO achieves substantial speedups on GSM8K and MATH while preserving or even enhancing accuracy compared to the original GRPO. We release our code at https://github.com/lzhxmu/CPPO.
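The completion-pruning step described above can be sketched as follows. This is a hedged illustration assuming GRPO-style group-normalized advantages; the function name and keep ratio are invented for the example, not taken from the CPPO code:

```python
import numpy as np

def prune_completions(rewards, keep_ratio=0.5):
    """Keep only the completions with the largest |advantage| (CPPO's core idea).

    rewards: per-completion scalar rewards for one question.
    Advantages are group-normalized as in GRPO; keep_ratio is illustrative.
    """
    rewards = np.asarray(rewards, dtype=float)
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    k = max(1, int(len(rewards) * keep_ratio))
    keep = np.argsort(-np.abs(adv))[:k]   # indices of high-|advantage| completions
    return keep, adv[keep]

# Only the kept completions would enter the gradient computation.
keep, adv = prune_completions([1.0, 0.0, 0.5, 0.9], keep_ratio=0.5)
```

Pruning by absolute advantage keeps both strongly positive and strongly negative completions, since both carry a large gradient signal, while near-average completions are dropped.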
https://t.iss.one/DataScienceT
VoRA: Vision as LoRA
#ByteDance introduces #VoRA (Vision as #LoRA), a novel framework that transforms #LLMs into Multimodal Large Language Models (MLLMs) by integrating vision-specific LoRA layers.
All training data, source code, and model weights are openly available!
Key Resources:
Overview: https://t.ly/guNVN
Paper: arxiv.org/pdf/2503.20680
GitHub Repo: github.com/Hon-Wong/VoRA
Project Page: georgeluimmortal.github.io/vora-homepage.github.io
#SkyworkAI unveils #SkyReelsA2, a controllable video generation framework that can assemble arbitrary visual elements (e.g., characters, objects, backgrounds) into fully synthesized videos from text prompts.
Code, models, and evaluation benchmark are all released!
Resources:
Review: https://t.ly/MEjzL
Paper: https://arxiv.org/pdf/2504.02436
Project: https://skyworkai.github.io/skyreels-a2.github.io/
Repo: https://github.com/SkyworkAI/SkyReels-A2
#AI #VideoGeneration #Multimodal #GenerativeAI #SkyReels #OpenSource
https://t.iss.one/DataScienceT
Forwarded from Python | Machine Learning | Coding | R
This channel is for programmers, coders, and software engineers.
0️⃣ Python
1️⃣ Data Science
2️⃣ Machine Learning
3️⃣ Data Visualization
4️⃣ Artificial Intelligence
5️⃣ Data Analysis
6️⃣ Statistics
7️⃣ Deep Learning
8️⃣ Programming Languages
https://t.iss.one/addlist/8_rRW2scgfRhOTc0
https://t.iss.one/Codeprogrammer
Adding test-time training (TTT) layers to a pre-trained Transformer enables generating a one-minute clip from text storyboards.
Videos, code, and annotations are released.
#AI #VideoGeneration #MachineLearning #DeepLearning #Transformers #TTT #GenerativeAI
#AI #DeepLearning #ComputerVision #YOLO #AttentionMechanism #OpenSource
ZClip: Adaptive Spike Mitigation for LLM Pre-Training
GitHub: https://github.com/bluorion-com/ZClip
Paper: https://arxiv.org/abs/2504.02507v1
Dataset: https://paperswithcode.com/dataset/hellaswag
Forwarded from Python | Machine Learning | Coding | R
#DataScience #MachineLearning #DeepLearning #Python #AI #MLProjects #DataAnalysis #ExplainableAI #100DaysOfCode #TechEducation #MLInterviewPrep #NeuralNetworks #MathForML #Statistics #Coding #AIForEveryone #PythonForDataScience
Zep: A Temporal Knowledge Graph Architecture for Agent Memory
20 Jan 2025 · Preston Rasmussen, Pavlo Paliychuk, Travis Beauvais, Jack Ryan, Daniel Chalef
Paper: https://arxiv.org/pdf/2501.13956v1.pdf
Code: https://github.com/getzep/graphiti
⚡️ BEST DATA SCIENCE CHANNELS ON TELEGRAM
We introduce Zep, a novel memory layer service for AI agents that outperforms the current state-of-the-art system, MemGPT, in the Deep Memory Retrieval (DMR) benchmark. Additionally, Zep excels in more comprehensive and challenging evaluations than DMR that better reflect real-world enterprise use cases. While existing retrieval-augmented generation (RAG) frameworks for large language model (LLM)-based agents are limited to static document retrieval, enterprise applications demand dynamic knowledge integration from diverse sources including ongoing conversations and business data. Zep addresses this fundamental limitation through its core component Graphiti -- a temporally-aware knowledge graph engine that dynamically synthesizes both unstructured conversational data and structured business data while maintaining historical relationships. In the DMR benchmark, which the MemGPT team established as their primary evaluation metric, Zep demonstrates superior performance (94.8% vs 93.4%). Beyond DMR, Zep's capabilities are further validated through the more challenging LongMemEval benchmark, which better reflects enterprise use cases through complex temporal reasoning tasks. In this evaluation, Zep achieves substantial results with accuracy improvements of up to 18.5% while simultaneously reducing response latency by 90% compared to baseline implementations. These results are particularly pronounced in enterprise-critical tasks such as cross-session information synthesis and long-term context maintenance, demonstrating Zep's effectiveness for deployment in real-world applications.
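The "temporally-aware" idea behind Graphiti can be illustrated with edges that carry validity intervals, so superseded facts are kept rather than overwritten. This is a toy sketch under assumed names, not Zep's actual schema or API:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass
class Edge:
    """A fact with a validity interval; valid_to=None means still valid."""
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_to: Optional[datetime] = None

def facts_at(edges: List[Edge], t: datetime) -> List[Edge]:
    """Return the facts that held at time t (historical relationships survive)."""
    return [e for e in edges
            if e.valid_from <= t and (e.valid_to is None or t < e.valid_to)]

graph = [
    Edge("alice", "works_at", "AcmeCorp", datetime(2020, 1, 1), datetime(2023, 6, 1)),
    Edge("alice", "works_at", "Initech", datetime(2023, 6, 1)),
]
facts_at(graph, datetime(2022, 1, 1))  # the AcmeCorp edge
facts_at(graph, datetime(2024, 1, 1))  # the Initech edge
```

Closing the old edge instead of deleting it is what enables the cross-session synthesis and temporal reasoning the abstract highlights.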
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation
2 Apr 2025 · Shaojin Wu, Mengqi Huang, Wenxu Wu, Yufeng Cheng, Fei Ding, Qian He
Code: https://github.com/bytedance/uno
Datasets: DreamBench (https://paperswithcode.com/dataset/dreambench), DreamBooth
Although subject-driven generation has been extensively explored in image generation due to its wide applications, it still faces challenges in data scalability and subject expansibility. For the first challenge, moving from curating single-subject datasets to multi-subject ones and scaling them up is particularly difficult. For the second, most recent methods center on single-subject generation, making them hard to apply in multi-subject scenarios. In this study, we propose a highly consistent data synthesis pipeline to tackle these challenges. This pipeline harnesses the intrinsic in-context generation capabilities of diffusion transformers and generates high-consistency multi-subject paired data. Additionally, we introduce UNO, a multi-image-conditioned subject-to-image model iteratively trained from a text-to-image model, which consists of progressive cross-modal alignment and universal rotary position embedding. Extensive experiments show that our method achieves high consistency while ensuring controllability in both single-subject and multi-subject driven generation.
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
31 Mar 2025 · Bang Liu, Xinfeng Li, et al.
Paper: https://arxiv.org/pdf/2504.01990v1.pdf
Code: https://github.com/foundationagents/awesome-foundation-agents
The advent of large language models (LLMs) has catalyzed a transformative shift in artificial intelligence, paving the way for advanced intelligent agents capable of sophisticated reasoning, robust perception, and versatile action across diverse domains. As these agents increasingly drive AI research and practical applications, their design, evaluation, and continuous improvement present intricate, multifaceted challenges. This survey provides a comprehensive overview, framing intelligent agents within a modular, brain-inspired architecture that integrates principles from cognitive science, neuroscience, and computational research. We structure our exploration into four interconnected parts. First, we delve into the modular foundation of intelligent agents, systematically mapping their cognitive, perceptual, and operational modules onto analogous human brain functionalities, and elucidating core components such as memory, world modeling, reward processing, and emotion-like systems. Second, we discuss self-enhancement and adaptive evolution mechanisms, exploring how agents autonomously refine their capabilities, adapt to dynamic environments, and achieve continual learning through automated optimization paradigms, including emerging AutoML and LLM-driven optimization strategies. Third, we examine collaborative and evolutionary multi-agent systems, investigating the collective intelligence emerging from agent interactions, cooperation, and societal structures, highlighting parallels to human social dynamics. Finally, we address the critical imperative of building safe, secure, and beneficial AI systems, emphasizing intrinsic and extrinsic security threats, ethical alignment, robustness, and practical mitigation strategies necessary for trustworthy real-world deployment.
Forwarded from Python | Machine Learning | Coding | R
100 Important Data Science Interview Questions.pdf (11.7 MB)
👨🏻‍💻 Preparing for a data science interview?
Reviewing fundamental questions is one of the best strategies for success. During the interview, it's crucial to communicate clearly and simply, especially when explaining complex models and data.
These 100 carefully selected questions will not only help you impress your interviewer but also boost your confidence throughout the interview process.
#DataScienceInterview #TechCareers #InterviewPreparation
Title of paper:
Audio-Visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation
Authors:
Fa-Ting Hong, Zunnan Xu, Zixiang Zhou, Jun Zhou, Xiu Li, Qin Lin, Qinglin Lu, Dan Xu
Description:
This paper introduces ACTalker, an end-to-end video diffusion framework designed for natural talking head generation with both multi-signal and single-signal control capabilities.
The framework employs a parallel Mamba structure with multiple branches, each utilizing a separate driving signal to control specific facial regions.
A gate mechanism is applied across all branches, providing flexible control over video generation.
To ensure natural coordination of the controlled video both temporally and spatially, the Mamba structure enables driving signals to manipulate feature tokens across both dimensions in each branch.
Additionally, a mask-drop strategy is introduced, allowing each driving signal to independently control its corresponding facial region within the Mamba structure, preventing control conflicts.
Experimental results demonstrate that this method produces natural-looking facial videos driven by diverse signals, and that the Mamba layer seamlessly integrates multiple driving modalities without conflict.
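The parallel-branch gating described above can be sketched as below. In the paper, the branches are Mamba (selective state-space) blocks, one per driving signal; here they are stand-in functions, the gates are fixed scalars rather than learned, and mask-drop is emulated by simply zeroing a branch:

```python
import numpy as np

def gated_parallel_branches(x, branches, gates, drop_mask=None):
    """Combine per-signal branch outputs with per-branch gates.

    branches: one function per driving signal (stand-ins for Mamba blocks).
    gates: per-branch weights (stand-ins for the learned gate mechanism).
    drop_mask: 0 drops a branch, emulating the mask-drop strategy.
    """
    if drop_mask is None:
        drop_mask = [1.0] * len(branches)
    out = np.zeros_like(x)
    for branch, gate, m in zip(branches, gates, drop_mask):
        out = out + m * gate * branch(x)
    return out

x = np.ones(4)
branches = [lambda t: t, lambda t: 2.0 * t]   # e.g. an audio branch, an expression branch
out = gated_parallel_branches(x, branches, gates=[0.5, 0.5], drop_mask=[1.0, 0.0])
```

Dropping a branch during training forces each driving signal to control only its own facial region, which is how the mask-drop strategy prevents control conflicts between signals.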
Abstract:
https://arxiv.org/abs/2504.00000
Download (PDF):
https://arxiv.org/pdf/2504.00000.pdf
Code:
https://github.com/harlanhong/actalker
Datasets used in paper:
The paper does not specify the datasets used.
Hugging Face demo:
No Hugging Face demo available.
#ACTalker #TalkingHeadGeneration #VideoDiffusion #MultimodalControl #MambaStructure #DeepLearning #ComputerVision #AI #OpenSource
2025 Top IT Certification: Free Study Materials Are Here!
Whether you're preparing for #Cisco #AWS #PMP #Python #Excel #Google #Microsoft #AI or any other in-demand certification, SPOTO has got you covered!
Download the FREE IT Certs Exam E-book:
https://bit.ly/4lNVItV
Test Your IT Skills for FREE:
https://bit.ly/4imEjW5
Download Free AI Materials:
https://bit.ly/3F3lc5B
Need 1-on-1 IT Exam Help? Contact Now:
https://wa.link/k0vy3x
Join Our IT Study Group for Daily Updates & Tips:
https://chat.whatsapp.com/E3Vkxa19HPO9ZVkWslBO8s