Title of paper:
Audio-Visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation
Authors:
Fa-Ting Hong, Zunnan Xu, Zixiang Zhou, Jun Zhou, Xiu Li, Qin Lin, Qinglin Lu, Dan Xu
Description:
This paper introduces ACTalker, an end-to-end video diffusion framework designed for natural talking head generation with both multi-signal and single-signal control capabilities.
The framework employs a parallel Mamba structure with multiple branches, each utilizing a separate driving signal to control specific facial regions.
A gate mechanism is applied across all branches, providing flexible control over video generation.
To ensure natural coordination of the controlled video both temporally and spatially, the Mamba structure enables driving signals to manipulate feature tokens across both dimensions in each branch.
Additionally, a mask-drop strategy is introduced, allowing each driving signal to independently control its corresponding facial region within the Mamba structure, preventing control conflicts.
Experimental results demonstrate that this method produces natural-looking facial videos driven by diverse signals, and that the Mamba layer seamlessly integrates multiple driving modalities without conflict.
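As a rough illustration of the gating and mask-drop ideas described above, the sketch below fuses the outputs of two control branches over a handful of feature tokens. It is not the authors' implementation. The function name `fuse_branches`, the scalar gates, the binary region masks, the additive fusion, and the "audio drives the mouth tokens" split are all assumptions made for this example. The Mamba layers themselves are replaced by precomputed branch outputs.

```python
from typing import List, Sequence


def fuse_branches(
    tokens: Sequence[float],
    branch_outputs: Sequence[Sequence[float]],
    region_masks: Sequence[Sequence[int]],
    gates: Sequence[float],
) -> List[float]:
    """Add each branch's gated update only to tokens inside its region mask."""
    fused = list(tokens)
    for output, mask, gate in zip(branch_outputs, region_masks, gates):
        if len(output) != len(tokens) or len(mask) != len(tokens):
            raise ValueError("branch output and mask must match the token count")
        for i, (value, keep) in enumerate(zip(output, mask)):
            # Tokens outside this branch's region (keep == 0) are dropped,
            # so the branch cannot influence them (hypothetical mask-drop).
            fused[i] += gate * keep * value
    return fused


# Four feature tokens: the first two stand in for the mouth region,
# the last two for the rest of the face (an assumed split).
tokens = [0.0, 0.0, 0.0, 0.0]
audio_out = [1.0, 1.0, 1.0, 1.0]   # stand-in for the audio branch output
expr_out = [2.0, 2.0, 2.0, 2.0]    # stand-in for the visual branch output
audio_mask = [1, 1, 0, 0]
expr_mask = [0, 0, 1, 1]

# Both gates open: each signal controls only its own region.
both = fuse_branches(tokens, [audio_out, expr_out], [audio_mask, expr_mask], [1.0, 1.0])
# Second gate closed: single-signal (audio-only) control.
audio_only = fuse_branches(tokens, [audio_out, expr_out], [audio_mask, expr_mask], [1.0, 0.0])

print(both)        # [1.0, 1.0, 2.0, 2.0]
print(audio_only)  # [1.0, 1.0, 0.0, 0.0]
```

Closing a gate switches a branch off entirely, while the masks keep the remaining branches from writing to each other's regions, which is the intuition behind "no control conflicts" in the description.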
Link of abstract paper:
https://arxiv.org/abs/2504.00000
Link of download paper:
https://arxiv.org/pdf/2504.00000.pdf
Code:
https://github.com/harlanhong/actalker
Datasets used in paper:
The paper does not specify the datasets used.
Hugging Face demo:
No Hugging Face demo available.
#ACTalker #TalkingHeadGeneration #VideoDiffusion #MultimodalControl #MambaStructure #DeepLearning #ComputerVision #AI #OpenSource
NVIDIA introduces Describe Anything Model (DAM), a new state-of-the-art model designed to generate rich, detailed descriptions for specific regions in images and videos. Users can mark these regions using points, boxes, scribbles, or masks.
DAM sets a new benchmark in multimodal understanding, with open-source code under the Apache license, a dedicated dataset, and a live demo available on Hugging Face.
Explore more below:
Paper: https://lnkd.in/dZh82xtV
Project Page: https://lnkd.in/dcv9V2ZF
GitHub Repo: https://lnkd.in/dJB9Ehtb
Hugging Face Demo: https://lnkd.in/dXDb2MWU
Review: https://t.ly/la4JD
#NVIDIA #DescribeAnything #ComputerVision #MultimodalAI #DeepLearning #ArtificialIntelligence #MachineLearning #OpenSource #HuggingFace #GenerativeAI #VisualUnderstanding #Python #AIresearch
https://t.iss.one/DataScienceT
Qwen3-Omni processes text, images, audio, and video in a single model.
On benchmarks, it looks like all modalities work with equal quality.
- First place in 22 out of 36 audio and multimodal benchmarks
- Support for 119 text languages
- Minimal latency: 211 ms
- Audio processing up to 30 minutes long
- Allows flexible customization via system prompts
- Built-in tool calling
The company released three versions:
- Qwen3-Omni-30B-A3B-Instruct
- Qwen3-Omni-30B-A3B-Thinking
- Qwen3-Omni-30B-A3B-Captioner
#qwen #opensource #llm #ml
Cognee: Powerful Memory for AI Agents in Just 6 Lines of Code
07 Oct 2025
AI News & Trends
Artificial Intelligence is evolving rapidly, but one of the biggest challenges for developers is building agents that remember, reason and adapt. Traditional RAG (Retrieval-Augmented Generation) systems often fall short when handling context, scalability and precision. That's where Cognee comes in. It is an open-source framework designed to provide AI agents with memory using a unique ...
#AI #Memory #AIAgents #OpenSource #RAG #ArtificialIntelligence
NanoChat: The Best ChatGPT That $100 Can Buy
20 Oct 2025
AI News & Trends
In a world dominated by billion-dollar AI models like GPT-4 and Claude 3, it's refreshing to see a minimalist, open-source alternative that puts the power of Large Language Models (LLMs) back into the hands of hackers, researchers and enthusiasts. Enter NanoChat, an end-to-end, full-stack implementation of a ChatGPT-style AI chatbot developed by Andrej Karpathy, ...
#NanoChat #ChatGPT #AI #LargeLanguageModels #OpenSource #AndrejKarpathy
Wan 2.1: Alibaba's Open-Source Revolution in Video Generation
21 Oct 2025
AI News & Trends
The landscape of artificial intelligence has been evolving rapidly, especially in the domain of video generation. Since OpenAI unveiled Sora in 2024, the world has witnessed an explosive surge in research and innovation within generative AI. However, most of these cutting-edge tools remained closed-source, limiting transparency and accessibility. Recognizing this gap, Alibaba Group introduced Wan, ...
#Alibaba #Wan2.1 #VideoGeneration #GenerativeAI #OpenSource #ArtificialIntelligence
Master Machine Learning: Explore the Ultimate "Machine-Learning-Tutorials" Repository
23 Oct 2025
AI News & Trends
In today's data-driven world, Machine Learning (ML) has become the cornerstone of modern technology, from intelligent chatbots to predictive analytics and recommendation systems. However, mastering ML isn't just about coding; it requires a structured understanding of algorithms, statistics, optimization techniques and real-world problem-solving. That's where Ujjwal Karn's Machine-Learning-Tutorials GitHub repository stands out. This open-source, topic-wise ...
#MachineLearning #MLTutorials #ArtificialIntelligence #DataScience #OpenSource #AIEducation
LangChain: The Ultimate Framework for Building Reliable AI Agents and LLM Applications
24 Oct 2025
AI News & Trends
As artificial intelligence continues to transform industries, developers are racing to build smarter, more adaptive applications powered by Large Language Models (LLMs). Yet one major challenge remains: how to make these models interact intelligently with real-world data and external systems in a scalable, reliable way. Enter LangChain, an open-source framework designed to make LLM-powered application ...
#LangChain #AI #LLM #ArtificialIntelligence #OpenSource #AIAgents
Microsoft Data Formulator: Revolutionizing AI-Powered Data Visualization
28 Oct 2025
AI News & Trends
In today's data-driven world, visualization is everything. Whether you're a business analyst, data scientist or researcher, the ability to convert raw data into meaningful visuals can define the success of your decisions. That's where Microsoft's Data Formulator steps in: a cutting-edge, open-source platform designed to empower analysts to create rich, AI-assisted visualizations effortlessly. Developed by ...
#Microsoft #DataVisualization #AI #DataScience #OpenSource #Analytics
Instead of a rigidly trained classifier, the model takes your own security policy as input and reasons whether the message complies with this policy.
The result is not just "safe/unsafe," but a chain of reasoning that you can verify and improve.
The models are available in two sizes: 120B and 20B.
• gpt-oss-safeguard-120B
• gpt-oss-safeguard-20B
Why they are needed:
• Policies can be changed without retraining the model
• Suitable for niche or rapidly changing risks (e.g., cheating in games or fake reviews)
• Does not require thousands of labeled examples
• Ideal when explainability is important rather than minimal latency
Both are available under the Apache 2.0 license - they can be freely used, modified, and deployed.
Official announcement
Hugging Face
#openai #chatgpt #opensource
Reflex: Build Full-Stack Web Apps in Pure Python - Fast, Flexible and Powerful
29 Oct 2025
AI News & Trends
Building modern web applications has traditionally required mastering multiple languages and frameworks, from JavaScript for the frontend to Python, Java or Node.js for the backend. For many developers, switching between different technologies can slow down productivity and increase complexity. Reflex eliminates that problem. It is an innovative open-source full-stack web framework that allows developers to ...
#Reflex #FullStack #WebDevelopment #Python #OpenSource #WebApps
MiniMax-M2: The Open-Source Revolution Powering Coding and Agentic Intelligence
30 Oct 2025
AI News & Trends
Artificial intelligence is evolving faster than ever, but not every innovation needs to be enormous to make an impact. MiniMax-M2, the latest release from MiniMax-AI, demonstrates that efficiency and power can coexist within a streamlined framework. MiniMax-M2 is an open-source Mixture of Experts (MoE) model designed for coding tasks, multi-agent collaboration and automation workflows. With ...
#MiniMaxM2 #OpenSource #MachineLearning #CodingAI #AgenticIntelligence #MixtureOfExperts
LongCat-Video: Meituan's Groundbreaking Step Toward Efficient Long Video Generation with AI
04 Nov 2025
AI News & Trends
In the rapidly advancing field of generative AI, the ability to create realistic, coherent, and high-quality videos from text or images has become one of the most sought-after goals. Meituan, one of the leading technology innovators in China, has made a remarkable stride in this domain with its latest open-source model, LongCat-Video. Designed as ...
#LongCatVideo #Meituan #GenerativeAI #VideoGeneration #AIInnovation #OpenSource
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models
Summary:
olmOCR is an open-source toolkit that uses a fine-tuned vision language model to convert PDFs into clean, structured text. It enables large-scale, cost-effective extraction of trillions of tokens for training language models.
Publication Date: Published on Feb 25
Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.18443
• PDF: https://arxiv.org/pdf/2502.18443
• Github: https://github.com/allenai/olmocr
Datasets citing this paper:
• https://huggingface.co/datasets/davanstrien/test-olmocr2
• https://huggingface.co/datasets/davanstrien/newspapers-olmocr2
• https://huggingface.co/datasets/stckmn/ocr-output-Directive017-1761355297
==================================
For more data science resources:
https://t.iss.one/DataScienceT
#OCR #VLMs #LLM #DataExtraction #OpenSource
MinerU: An Open-Source Solution for Precise Document Content Extraction
Summary:
MinerU is an open-source tool that provides high-precision document content extraction. It uses fine-tuned models and pre/postprocessing rules to consistently achieve high performance across diverse document types.
Publication Date: Published on Sep 27, 2024
Paper Links:
• arXiv Page: https://arxiv.org/abs/2409.18839
• PDF: https://arxiv.org/pdf/2409.18839
• Space: https://huggingface.co/spaces/Echo9k/PDF_reader
• Github: https://github.com/opendatalab/MinerU
Spaces citing this paper:
• https://huggingface.co/spaces/opendatalab/MinerU
• https://huggingface.co/spaces/xiaoye-winters/MinerU-API
• https://huggingface.co/spaces/ApeAITW/MinerU_2.5_Test
#DocumentExtraction #OpenSource #DataScience #NLP #AI
Krea Realtime 14B: Redefining Real-Time Video Generation with AI
05 Nov 2025
AI News & Trends
The field of artificial intelligence is undergoing a remarkable transformation and one of the most exciting developments is the rise of real-time video generation. From cinematic visual effects to immersive virtual environments, AI is rapidly blurring the boundaries between imagination and reality. At the forefront of this innovation stands Krea Realtime 14B, an advanced open-source ...
#AI #RealTimeVideo #ArtificialIntelligence #OpenSource #VideoGeneration #KreaRealtime14B
FIBO: The First JSON-Native, Open-Source Text-to-Image Model Built for Real-World Control and Accuracy
07 Nov 2025
AI News & Trends
The world of generative AI has evolved rapidly with text-to-image tools enabling creators, marketers, designers and enterprises to bring ideas to life with unprecedented ease. However, most existing models have a clear limitation: they prioritize imagination at the cost of control. Whether producing inconsistent styles, unpredictable lighting or drifting away from user prompts, traditional models ...
#FIBO #TextToImage #GenerativeAI #OpenSource #JSONNative #RealWorldControl
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
Summary:
OmniVinci is an open-source omni-modal LLM that improves cross-modal understanding for audio, vision, and robotics. It features innovative architecture for better embedding alignment and temporal capture, along with efficient data curation. OmniVinci outperforms competitors while using significan...
Publication Date: Published on Oct 17
Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.15870
• Explainer: https://arxivexplained.com/papers/omnivinci-enhancing-architecture-and-data-for-omni-modal-understanding-llm
• PDF: https://arxiv.org/pdf/2510.15870
• Project Page: https://nvlabs.github.io/OmniVinci/
• Github: https://github.com/NVlabs/OmniVinci
Models citing this paper:
• https://huggingface.co/nvidia/omnivinci
#LLM #MultimodalAI #Robotics #DeepLearning #OpenSource
Meilisearch: The Lightning-Fast, AI-Ready Search Engine for Modern Applications
08 Nov 2025
AI News & Trends
Search is no longer a luxury feature. Today's users expect instant, relevant results across e-commerce platforms, SaaS tools, media libraries and knowledge systems. With AI-powered experiences becoming the new standard, developers need search infrastructure that is fast, flexible, developer-friendly and ready for hybrid semantic search. This is where Meilisearch stands out. Meilisearch is an open-source, ...
#Meilisearch #AIReadySearch #LightningFast #SearchEngine #ModernApplications #OpenSource
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist
Summary:
UniVA is an open-source multi-agent framework that unifies video understanding, segmentation, editing, and generation. It uses a Plan-and-Act architecture with hierarchical memory to enable complex, iterative video workflows. This system aims to advance agentic video intelligence.
Publication Date: Published on Nov 11
Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.08521
• PDF: https://arxiv.org/pdf/2511.08521
• Project Page: https://univa.online/
• Github: https://github.com/univa-agent/univa
#VideoAI #AIagents #GenerativeAI #ComputerVision #OpenSource