Forwarded from Machine learning books and papers
NVIDIA BioNeMo2 Framework is a set of tools, libraries, and models for computational drug discovery and design.
@Machine_learn
Forwarded from Machine learning books and papers
Large Language Models Course: Learn by Doing LLM Projects
🖥 Github: https://github.com/peremartra/Large-Language-Model-Notebooks-Course
📕 Paper: https://doi.org/10.31219/osf.io/qgxea
@Machine_learn
Forwarded from Machine learning books and papers
Foundations of Large Language Models (PDF, 1.9 MB)
📝 Table of Contents:
● Pre-training
● Generative Models
● Prompting
● Alignment
Tong Xiao and Jingbo Zhu
January 17, 2025
📃 Download from arXiv.
@Machine_learn
Evolutionary Computation in the Era of Large Language Model: Survey and Roadmap
Large language models (LLMs) have not only revolutionized natural language processing but also extended their prowess to various domains, marking a significant stride towards artificial general intelligence. Despite differing in objectives and methodologies, LLMs and evolutionary algorithms (EAs) share a common pursuit of applicability to complex problems. On one hand, EAs can provide an optimization framework for further enhancing LLMs in black-box settings, empowering them with flexible global search capacities. On the other hand, the abundant domain knowledge inherent in LLMs could enable EAs to conduct more intelligent searches, and the text-processing and generative capabilities of LLMs can aid in deploying EAs across a wide range of tasks. Based on these complementary advantages, this paper provides a thorough review and a forward-looking roadmap, categorizing the reciprocal inspiration into two main avenues: LLM-enhanced EA and EA-enhanced #LLM. Some integrated synergy methods are further introduced to exemplify the complementarity between LLMs and EAs in diverse scenarios, including code generation, software engineering, neural architecture search, and various generation tasks. As the first comprehensive review focused on EA research in the era of #LLMs, this paper provides a foundational stepping stone for understanding the collaborative potential of LLMs and EAs. The identified challenges and future directions offer guidance for researchers and practitioners seeking to unlock the full potential of this collaboration in propelling advancements in optimization and artificial intelligence.
Paper: https://arxiv.org/pdf/2401.10034v3.pdf
Code: https://github.com/wuxingyu-ai/llm4ec
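As a rough sketch of the LLM-enhanced EA direction surveyed above (not a method from the paper): a plain evolutionary loop in which the variation step is delegated to a callable standing in for an LLM. The llm_propose stub and the toy string-matching fitness are placeholders of mine so the snippet runs offline.

import random
import string
from typing import Callable, List

TARGET = "evolution meets language models"
ALPHABET = string.ascii_lowercase + " "

def fitness(candidate: str) -> int:
    # Toy objective: number of characters matching the target string.
    return sum(a == b for a, b in zip(candidate, TARGET))

def llm_propose(parents: List[str]) -> str:
    # Stand-in for an LLM-driven variation operator. In an LLM-enhanced EA,
    # the parents (and their fitness) would be formatted into a prompt and
    # the model asked for a new candidate. Here we just recombine and mutate
    # characters so the sketch runs without any model.
    a, b = random.sample(parents, 2)
    child = [random.choice(pair) for pair in zip(a, b)]   # crossover
    for i in range(len(child)):                           # mutation
        if random.random() < 0.05:
            child[i] = random.choice(ALPHABET)
    return "".join(child)

def evolve(propose: Callable[[List[str]], str],
           pop_size: int = 30, generations: int = 200) -> str:
    population = ["".join(random.choice(ALPHABET) for _ in TARGET)
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elites = population[: pop_size // 5]              # selection
        population = elites + [propose(elites) for _ in range(pop_size - len(elites))]
    return max(population, key=fitness)

if __name__ == "__main__":
    print(evolve(llm_propose))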
https://t.iss.one/deep_learning_proj
🐫 Tülu 3 405B (what a name): another release!
An open-source model (and no, it's not a Chinese one) that outperforms DeepSeek-V3 on multiple benchmarks.
Scaled to 405B parameters, with performance on par with GPT-4o and ahead of previous models in its class.
▪ Blog: https://allenai.org/blog/tulu-3-405B
▪ You can test it here: https://playground.allenai.org/?model=tulu3-405b
▪ Technical report: https://allenai.org/blog/tulu-3-technical
▪ Hugging Face: https://huggingface.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5
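If you would rather poke at a smaller sibling locally than use the playground, a standard transformers chat flow should work; the 8B model id below is an assumption, so check the collection above for the exact names, and adjust for your GPU memory.

# Minimal sketch: chat with a smaller Tulu 3 checkpoint via transformers.
# The model id is an assumption -- verify it against the collection above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/Llama-3.1-Tulu-3-8B"  # assumed id; the 405B variant needs multi-GPU serving

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain reinforcement learning with verifiable rewards in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))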
https://t.iss.one/deep_learning_proj
⚡ LitGPT
▪Github
▪Docs
▪Video
pip install 'litgpt[all]'
from litgpt import LLM

# Download (if needed) and load the Phi-2 checkpoint.
llm = LLM.load("microsoft/phi-2")

# Generate a completion for the prompt (the misspelling is intentional).
text = llm.generate("Fix the spelling: Every fall, the familly goes to the mountains.")
print(text)
# Corrected Sentence: Every fall, the family goes to the mountains.
https://t.iss.one/deep_learning_proj
LLMs can see and hear without any training
30 Jan 2025 · Kumar Ashutosh, Yossi Gandelsman, Xinlei Chen, Ishan Misra, Rohit Girdhar ·
We present MILS: Multimodal Iterative LLM Solver, a surprisingly simple, training-free approach to imbue multimodal capabilities into your favorite LLM. Leveraging their innate ability to perform multi-step reasoning, MILS prompts the LLM to generate candidate outputs, each of which is scored and fed back iteratively, eventually producing a solution to the task. This enables various applications that typically require training specialized models on task-specific data. In particular, we establish a new state of the art on emergent zero-shot image, video, and audio captioning. MILS seamlessly applies to media generation as well, discovering prompt rewrites to improve text-to-image generation, and even editing prompts for style transfer! Finally, being a gradient-free optimization approach, MILS can invert multimodal embeddings into text, enabling applications like cross-modal arithmetic.
Paper: https://arxiv.org/pdf/2501.18096v1.pdf
Code: https://github.com/facebookresearch/mils
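The loop itself is simple enough to paraphrase. The sketch below is a schematic of the generate-score-feedback cycle, not the released code: generate_fn stands in for the LLM and score_fn for the external scorer (e.g. a CLIP-style similarity for captioning); the toy demo at the bottom uses placeholder functions so it runs without any model.

from typing import Callable, List, Tuple

def mils_loop(
    generate_fn: Callable[[str], List[str]],   # LLM: prompt -> candidate outputs
    score_fn: Callable[[str], float],          # external scorer, e.g. CLIP similarity to an image
    task_prompt: str,
    steps: int = 10,
    keep: int = 5,
) -> str:
    # Schematic MILS-style loop: generate, score, feed the best back, repeat.
    history: List[Tuple[float, str]] = []
    prompt = task_prompt
    for _ in range(steps):
        candidates = generate_fn(prompt)
        scored = sorted(((score_fn(c), c) for c in candidates), reverse=True)
        history = sorted(set(history + scored), reverse=True)[:keep]
        feedback = "\n".join(f"{s:.3f}: {c}" for s, c in history)
        prompt = f"{task_prompt}\nPrevious attempts and scores:\n{feedback}\nPropose better outputs."
    return history[0][1]

if __name__ == "__main__":
    # Toy demo with placeholder generator/scorer (no model, no images).
    import random
    words = ["a", "dog", "cat", "on", "grass", "red", "ball", "plays", "with"]
    target = "a dog plays with a red ball on grass"
    gen = lambda prompt: [" ".join(random.choices(words, k=8)) for _ in range(20)]
    score = lambda c: float(sum(w in target.split() for w in c.split()))
    print(mils_loop(gen, score, "Describe the image."))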
https://t.iss.one/deep_learning_proj
Demystifying Long Chain-of-Thought Reasoning in LLMs (arXiv.org)
Scaling inference compute enhances reasoning in large language models (LLMs), with long chains-of-thought (CoTs) enabling strategies like backtracking and error correction. Reinforcement learning...
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
13 Dec 2024 · Zhiyu Wu, Xiaokang Chen, Zizheng Pan, Xingchao Liu, Wen Liu, Damai Dai, Huazuo Gao, Yiyang Ma, Chengyue Wu, Bingxuan Wang, Zhenda Xie, Yu Wu, Kai Hu, Jiawei Wang, Yaofeng Sun, Yukun Li, Yishi Piao, Kang Guan, Aixin Liu, Xin Xie, Yuxiang You, Kai Dong, Xingkai Yu, Haowei Zhang, Liang Zhao, Yisong Wang, Chong Ruan ·
We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL, through two major upgrades. For the vision component, we incorporate a dynamic tiling vision encoding strategy designed for processing high-resolution images with different aspect ratios. For the language component, we leverage #DeepSeekMoE models with the Multi-head Latent Attention mechanism, which compresses the Key-Value cache into latent vectors, to enable efficient inference and high throughput. Trained on an improved vision-language dataset, DeepSeek-VL2 demonstrates superior capabilities across various tasks, including but not limited to visual question answering, optical character recognition, document/table/chart understanding, and visual grounding. Our model series is composed of three variants: DeepSeek-VL2-Tiny, #DeepSeek-VL2-Small and DeepSeek-VL2, with 1.0B, 2.8B and 4.5B activated parameters respectively. DeepSeek-VL2 achieves competitive or state-of-the-art performance with similar or fewer activated parameters compared to existing open-source dense and MoE-based models. Codes and pre-trained models are publicly accessible at https://github.com/deepseek-ai/DeepSeek-VL2.
Paper: https://arxiv.org/pdf/2412.10302v1.pdf
Code: https://github.com/deepseek-ai/deepseek-vl2
Datasets: RefCOCO - TextVQA - MMBench - DocVQA
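To make "dynamic tiling" concrete, here is a simplified sketch of the idea: pick a tile grid that best matches the image's aspect ratio, resize, and crop fixed-size tiles plus a global thumbnail. The tile size and grid budget below are assumptions; DeepSeek-VL2's exact settings may differ.

# Simplified dynamic-tiling sketch (illustrative only).
from itertools import product
from PIL import Image

TILE = 384          # assumed per-tile resolution
MAX_TILES = 9       # assumed budget of tiles per image

def choose_grid(width: int, height: int) -> tuple:
    # Pick the (cols, rows) grid whose aspect ratio best matches the image.
    target = width / height
    candidates = [(c, r) for c, r in product(range(1, MAX_TILES + 1), repeat=2)
                  if c * r <= MAX_TILES]
    return min(candidates, key=lambda cr: abs(cr[0] / cr[1] - target))

def tile_image(img: Image.Image) -> list:
    cols, rows = choose_grid(*img.size)
    resized = img.resize((cols * TILE, rows * TILE))
    tiles = [resized.crop((c * TILE, r * TILE, (c + 1) * TILE, (r + 1) * TILE))
             for r in range(rows) for c in range(cols)]
    # A global low-resolution view is typically kept alongside the tiles.
    thumbnail = img.resize((TILE, TILE))
    return [thumbnail] + tiles

if __name__ == "__main__":
    print(len(tile_image(Image.new("RGB", (1920, 1080)))), "tiles")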
https://t.iss.one/deep_learning_proj
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation
5 Aug 2024 · Daniel Fleischer, Moshe Berchansky, Moshe Wasserblat, Peter Izsak ·
Implementing Retrieval-Augmented Generation (RAG) systems is inherently complex, requiring deep understanding of data, use cases, and intricate design decisions. Additionally, evaluating these systems presents significant challenges, necessitating assessment of both retrieval accuracy and generative quality through a multi-faceted approach. We introduce RAG Foundry, an open-source framework for augmenting large language models for RAG use cases. RAG Foundry integrates data creation, training, inference and evaluation into a single workflow, facilitating the creation of data-augmented datasets for training and evaluating large language models in RAG settings. This integration enables rapid prototyping and experimentation with various RAG techniques, allowing users to easily generate datasets and train RAG models using internal or specialized knowledge sources. We demonstrate the framework's effectiveness by augmenting and fine-tuning Llama-3 and Phi-3 models with diverse RAG configurations, showcasing consistent improvements across three knowledge-intensive datasets.
Paper: https://arxiv.org/pdf/2408.02545v1.pdf
Code: https://github.com/intellabs/ragfoundry
Datasets: TriviaQA - PubMedQA
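For orientation, this is the bare retrieve-then-generate pattern that such a framework wraps with data creation, training, and evaluation; it is not RAG Foundry's API (see the repo for that). The word-overlap retriever and the placeholder generate callable are mine.

import re
from collections import Counter
from typing import Callable, List

DOCS = [
    "TriviaQA is a reading-comprehension dataset of question-answer-evidence triples.",
    "PubMedQA is a biomedical question-answering dataset built from PubMed abstracts.",
    "Retrieval-augmented generation conditions a language model on retrieved passages.",
]

def tokens(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(query: str, doc: str) -> int:
    # Crude lexical retriever: count shared lowercase words.
    return sum((tokens(query) & tokens(doc)).values())

def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    return sorted(docs, key=lambda d: overlap_score(query, d), reverse=True)[:k]

def rag_answer(query: str, generate: Callable[[str], str]) -> str:
    context = "\n".join(retrieve(query, DOCS))
    prompt = f"Answer using only the context.\nContext:\n{context}\nQuestion: {query}\nAnswer:"
    return generate(prompt)

if __name__ == "__main__":
    # Placeholder "LLM" that parrots the top retrieved passage.
    echo = lambda prompt: prompt.splitlines()[2]
    print(rag_answer("What is PubMedQA?", echo))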
https://t.iss.one/deep_learning_proj
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
24 Jan 2025 · Kai-Tuo Xu, Feng-Long Xie, Xu Tang, Yao Hu ·
We present FireRedASR, a family of large-scale automatic speech recognition (ASR) models for Mandarin, designed to meet diverse requirements in superior performance and optimal efficiency across various applications. FireRedASR comprises two variants: FireRedASR-LLM: Designed to achieve state-of-the-art (SOTA) performance and to enable seamless end-to-end speech interaction. It adopts an Encoder-Adapter-LLM framework leveraging large language model (LLM) capabilities. On public Mandarin benchmarks, FireRedASR-LLM (8.3B parameters) achieves an average Character Error Rate (CER) of 3.05%, surpassing the latest SOTA of 3.33% with an 8.4% relative CER reduction (CERR). It demonstrates superior generalization capability over industrial-grade baselines, achieving 24%-40% CERR in multi-source Mandarin ASR scenarios such as video, live, and intelligent assistant. FireRedASR-AED: Designed to balance high performance and computational efficiency and to serve as an effective speech representation module in LLM-based speech models. It utilizes an Attention-based Encoder-Decoder (AED) architecture. On public Mandarin benchmarks, FireRedASR-AED (1.1B parameters) achieves an average CER of 3.18%, slightly worse than FireRedASR-LLM but still outperforming the latest SOTA model with over 12B parameters. It offers a more compact size, making it suitable for resource-constrained applications. Moreover, both models exhibit competitive results on Chinese dialects and English speech benchmarks and excel in singing lyrics recognition.
Paper: https://arxiv.org/pdf/2501.14350v1.pdf
Code: https://github.com/fireredteam/fireredasr
Datasets: LibriSpeech - AISHELL-1 - AISHELL-2 - WenetSpeech
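For reference, the CER figures above are just character-level edit distance divided by reference length; a minimal implementation (mine, not the repo's scoring script) looks like this.

def edit_distance(ref: str, hyp: str) -> int:
    # Levenshtein distance between two character sequences.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    # Character Error Rate: edits per reference character.
    return edit_distance(ref, hyp) / max(len(ref), 1)

if __name__ == "__main__":
    # For Mandarin, each character is a unit; one substitution over six chars -> 16.67%.
    print(f"{cer('今天天气很好', '今天天汽很好'):.2%}")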
https://t.iss.one/deep_learning_proj
⚡️ LLM4Decompile: open-source LLMs for decompiling binary code.
🟡 Github
🟡 Models
🟡 Paper
🟡 Colab
git clone https://github.com/albertan017/LLM4Decompile.git
cd LLM4Decompile
conda create -n 'llm4decompile' python=3.9 -y
conda activate llm4decompile
pip install -r requirements.txt
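Inference afterwards is a standard transformers call. The sketch below is an assumption about the flow; take the exact model id and prompt format from the repository's README or Colab.

# Sketch only: the model id and prompt format are assumptions; follow the
# repo's README/Colab for the exact conventions it expects.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLM4Binary/llm4decompile-6.7b-v1.5"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

asm = open("func0.s").read()  # placeholder: disassembly of the function to decompile
prompt = f"# This is the assembly code:\n{asm}\n# What is the source code?\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))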
https://t.iss.one/deep_learning_proj
Slamming: Training a Speech Language Model on One GPU in a Day
19 Feb 2025 · Gallil Maimon, Avishai Elmakies, Yossi Adi ·
We introduce Slam, a recipe for training high-quality Speech Language Models (SLMs) on a single academic GPU in 24 hours. We do so through empirical analysis of model initialisation and architecture, synthetic training data, preference optimisation with synthetic data, and tweaking all other components. We empirically demonstrate that this training recipe also scales well with more compute, achieving results on par with leading SLMs at a fraction of the compute cost. We hope these insights will make SLM training and research more accessible. In the context of SLM scaling laws, our results far outperform predicted compute-optimal performance, giving an optimistic view of #SLM feasibility. See code, data, models, samples at https://pages.cs.huji.ac.il/adiyoss-lab/slamming .
Paper: https://arxiv.org/pdf/2502.15814v1.pdf
Code: https://github.com/slp-rl/slamkit
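As a back-of-the-envelope check on what one GPU for 24 hours buys, under throughput numbers that are purely my assumptions (not figures from the paper):

# Illustrative only: throughput and tokenizer rates are assumptions.
gpu_hours = 24
tokens_per_second = 40_000        # assumed effective training throughput on one GPU
token_budget = tokens_per_second * gpu_hours * 3600
print(f"Token budget: {token_budget:,}")   # ~3.5B tokens under these assumptions

speech_tokens_per_audio_second = 25        # assumed speech-tokenizer rate
audio_hours = token_budget / speech_tokens_per_audio_second / 3600
print(f"≈ {audio_hours:,.0f} hours of tokenized speech")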
https://t.iss.one/deep_learning_proj
Forwarded from Machine learning books and papers
Ramadan Kareem ❤️
@Machine_learn