ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

📝 Summary:
IndexTTS builds on the XTTS and Tortoise TTS models, improving naturalness and zero-shot voice cloning. It uses hybrid character-pinyin modeling for Chinese, which makes the pronunciation of polyphonic characters controllable, and an optimized vector quantization scheme. The authors report more controllable usage, faster inference, and better performance than other popular open-source TTS systems.
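
To make the vector quantization step concrete, here is a minimal sketch in plain Python. It is not the IndexTTS implementation: the function names, the toy frames, and the three-entry codebook are all made up for illustration. It only shows the general idea of mapping continuous frames to discrete token indices and measuring how much of the codebook is actually used.

```python
def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codeword (squared Euclidean distance)."""
    indices = []
    for v in vectors:
        distances = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        indices.append(distances.index(min(distances)))
    return indices


def codebook_utilization(indices, codebook_size):
    """Fraction of codewords that were selected at least once."""
    return len(set(indices)) / codebook_size


# Hypothetical toy data: four 2-D "acoustic frames" and a three-entry codebook.
frames = [(0.1, 0.2), (0.9, 1.1), (1.0, 0.9), (0.0, 0.1)]
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]

tokens = quantize(frames, codebook)
utilization = codebook_utilization(tokens, len(codebook))
print(tokens, utilization)
```

In this toy case the third codeword is never selected, so part of the codebook is wasted. Low utilization of this kind is the usual motivation for tuning a quantizer.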

🔹 Publication Date: Published on Feb 8, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.05512
• PDF: https://arxiv.org/pdf/2502.05512
• Github: https://github.com/index-tts/index-tts

🔹 Models citing this paper:
https://huggingface.co/IndexTeam/IndexTTS-2
https://huggingface.co/IndexTeam/Index-TTS
https://huggingface.co/Toxzic/indextts-colab

🔹 Spaces citing this paper:
https://huggingface.co/spaces/IndexTeam/IndexTTS
https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena
https://huggingface.co/spaces/jairwaal/image

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#TextToSpeech #ZeroShotLearning #VoiceCloning #AI #MachineLearning
OpenVoice: Versatile Instant Voice Cloning

📝 Summary:
OpenVoice is an instant voice cloning method that needs only a short audio clip from the reference speaker. It provides flexible control over voice styles and supports zero-shot cross-lingual cloning, so it can clone a voice into a new language without extensive training data for that language. The authors also report that it is computationally efficient.

🔹 Publication Date: Published on Dec 3, 2023

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2312.01479
• PDF: https://arxiv.org/pdf/2312.01479
• Github: https://github.com/myshell-ai/openvoice

🔹 Models citing this paper:
https://huggingface.co/rsxdalv/OpenVoiceV2
https://huggingface.co/ameerazam08/Udiff
https://huggingface.co/flopml/OpenVoice-v2

🔹 Datasets citing this paper:
https://huggingface.co/datasets/tsinghua-ee/QualiSpeech
https://huggingface.co/datasets/dlxjj/Openvoice
https://huggingface.co/datasets/Pendrokar/open_tts_tracker

🔹 Spaces citing this paper:
https://huggingface.co/spaces/Russell1123213123/testOpenVoice
https://huggingface.co/spaces/gauthamk28/gauthamk28_voice
https://huggingface.co/spaces/blayks07/OpenVoice-main

#VoiceCloning #AIResearch #SpeechSynthesis #ZeroShotLearning #CrossLingualAI
Dynamic Reflections: Probing Video Representations with Text Alignment

📝 Summary:
This work presents the first comprehensive study of video-text representation alignment. It shows that alignment depends on data richness and correlates with downstream task performance, which suggests it is a useful signal for general video understanding. This introduces video-text alignment as a zero-shot method ...
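
As a rough illustration of what "representation alignment" can mean in practice, the sketch below scores two embedding spaces by mutual k-nearest-neighbor overlap. This is an assumption: the summary does not state which metric the paper uses, and the function names and toy embeddings here are hypothetical.

```python
def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)


def knn(embeddings, i, k):
    """Indices of the k items most similar to item i, excluding i itself."""
    others = [j for j in range(len(embeddings)) if j != i]
    others.sort(key=lambda j: cosine(embeddings[i], embeddings[j]), reverse=True)
    return set(others[:k])


def mutual_knn_alignment(video_emb, text_emb, k):
    """Average fraction of k-nearest neighbors shared by the two spaces (0.0 to 1.0)."""
    n = len(video_emb)
    overlap = sum(len(knn(video_emb, i, k) & knn(text_emb, i, k)) for i in range(n))
    return overlap / (n * k)


# Hypothetical toy embeddings for four paired (video, caption) items.
video = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
text_aligned = [(1.0, 0.0, 0.0), (0.9, 0.2, 0.0), (0.0, 1.0, 0.0), (0.1, 0.8, 0.0)]
text_mismatched = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.9, 0.2, 0.0), (0.1, 0.8, 0.0)]

aligned_score = mutual_knn_alignment(video, text_aligned, k=1)
mismatched_score = mutual_knn_alignment(video, text_mismatched, k=1)
print(aligned_score, mismatched_score)
```

The two spaces may have different dimensions because only neighborhood structure is compared. No training is involved, which is why a score like this can be used zero-shot.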

🔹 Publication Date: Published on Nov 4, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02767
• PDF: https://arxiv.org/pdf/2511.02767
• Project Page: https://video-prh.github.io/

#VideoUnderstanding #TextAlignment #VideoTextAI #ZeroShotLearning #RepresentationLearning