Github LLMs
πŸ”₯ NVIDIA silently release a Llama 3.1 70B fine-tune that outperforms
GPT-4o and Claude Sonnet 3.5


Llama 3.1 Nemotron 70B Instruct, a further RLHF-tuned model, is now on Hugging Face:


https://huggingface.co/collections/nvidia/llama-31-nemotron-70b-670e93cd366feea16abc13d8
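
For local use, a minimal loading sketch with transformers. Assumption: the HF-format repo id nvidia/Llama-3.1-Nemotron-70B-Instruct-HF is our reading of the linked collection, and the 70B weights need multiple GPUs:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assumed repo id from the linked collection.
model_id = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
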
βœ…https://t.iss.one/deep_learning_proj
🌟 Zamba2-Instruct

The family includes two models:

🟒Zamba2-1.2B-instruct;
🟠Zamba2-2.7B-instruct.



# Clone repo
git clone https://github.com/Zyphra/transformers_zamba2.git
cd transformers_zamba2

# Install the repository & accelerate:
pip install -e .
pip install accelerate

# Inference:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

user_turn_1 = "user_prompt1."
assistant_turn_1 = "assistant_prompt."
user_turn_2 = "user_prompt2."
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print(tokenizer.decode(outputs[0]))

πŸ–₯GitHub

https://t.iss.one/deep_learning_proj
πŸ“– LLM-Agent-Paper-List is a repository of papers on the topic of agents based on large language models (LLM)! The papers are divided into categories such as LLM agent architectures, autonomous LLM agents, reinforcement learning (RL), natural language processing methods, multimodal approaches and tools for developing LLM agents, and more.

πŸ–₯ Github

https://t.iss.one/deep_learning_proj
Welcome to Ollama's Prompt Engineering Interactive Tutorial

πŸ”— Github

https://t.iss.one/deep_learning_proj
πŸ‘3
⚡️ MobileLLM

Meta's family of compact language models optimized for on-device use, in four sizes (a loading sketch follows after the links below):

🟢 MobileLLM-125M: 30 layers, 9 attention heads, 3 KV heads, token dimension 576;

🟢 MobileLLM-350M: 32 layers, 15 attention heads, 5 KV heads, token dimension 960;

🟢 MobileLLM-600M: 40 layers, 18 attention heads, 6 KV heads, token dimension 1152;

🟢 MobileLLM-1B: 54 layers, 20 attention heads, 5 KV heads, token dimension 1280.


🟑Arxiv
πŸ–₯GitHub


@Machine_learn
🌟 BioNeMo: A Framework for Developing AI Models for Drug Design.

NVIDIA BioNeMo2 Framework is a set of tools, libraries, and models for computational drug discovery and design.



▢️ Pre-trained models:

🟢 ESM-2 is a pre-trained bidirectional (BERT-style) encoder for amino acid sequences; BioNeMo2 includes checkpoints with 650M and 3B parameters (an embedding sketch follows after the links below);

🟢 Geneformer is a transformer-based model that generates a dense representation of a cell's scRNA profile by examining co-expression patterns across individual cells.


▢️ Datasets:

🟠 CELLxGENE is a collection of publicly available single-cell datasets collected by the CZI (Chan Zuckerberg Initiative) with a total volume of 24 million cells;


🟠 UniRef is a database of clustered sets of protein sequences from UniProtKB, built from translated genomic data.



🟑 Project page
🟑 Documentation
πŸ–₯ GitHub

@Machine_learn
🌟 LLaMA-Mesh: unifying 3D mesh generation with language models.
🟑Arxiv
πŸ–₯GitHub

https://t.iss.one/deep_learning_proj
Large Language Models Course: Learn by Doing LLM Projects

πŸ–₯ Github: https://github.com/peremartra/Large-Language-Model-Notebooks-Course

πŸ“• Paper: https://doi.org/10.31219/osf.io/qgxea

@Machine_learn
Foundations of Large Language Models.pdf
1.9 MB
Foundations of Large Language Models

πŸ“ Table of Contents:
● Pre-training
● Generative Models
● Prompting
● Alignment

Tong Xiao and Jingbo Zhu
January 17, 2025

πŸ“ƒ Download from arXiv.

@Machine_learn
πŸ‘1
Evolutionary Computation in the Era of Large Language Model: Survey and Roadmap

Large language models (LLMs) have not only revolutionized natural language processing but also extended their prowess to various domains, marking a significant stride towards artificial general intelligence. Despite differing in objectives and methodologies, LLMs and evolutionary algorithms (EAs) share a common pursuit of applicability to complex problems. EAs can provide an optimization framework for further enhancing LLMs under black-box settings, endowing them with flexible global search capacities; conversely, the abundant domain knowledge inherent in LLMs could enable EAs to conduct more intelligent searches, and the text-processing and generative capabilities of LLMs would aid in deploying EAs across a wide range of tasks. Based on these complementary advantages, this paper provides a thorough review and a forward-looking roadmap, categorizing the reciprocal inspiration into two main avenues: LLM-enhanced EA and EA-enhanced #LLM. Some integrated synergy methods are further introduced to exemplify the complementarity between LLMs and EAs in diverse scenarios, including code generation, software engineering, neural architecture search, and various generation tasks. As the first comprehensive review focused on EA research in the era of #LLMs, this paper provides a foundational stepping stone for understanding the collaborative potential of LLMs and EAs. The identified challenges and future directions offer guidance for researchers and practitioners seeking to unlock the full potential of this collaboration in advancing optimization and artificial intelligence.

Paper: https://arxiv.org/pdf/2401.10034v3.pdf

Code: https://github.com/wuxingyu-ai/llm4ec
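
For the LLM-enhanced-EA direction, the simplest instantiation treats the LLM as a variation operator inside an otherwise ordinary evolutionary loop. A schematic sketch (llm_complete and the fitness function are hypothetical stand-ins, not from the paper):

def evolve(population, fitness, llm_complete, generations=20, k=4):
    """Toy elitist loop with an LLM as the crossover/mutation operator."""
    for _ in range(generations):
        # Select the k fittest candidates as parents.
        parents = sorted(population, key=fitness, reverse=True)[:k]
        prompt = (
            "Here are good candidate solutions:\n"
            + "\n".join(parents)
            + "\nPropose one new candidate that combines their strengths:"
        )
        child = llm_complete(prompt)    # LLM acts as crossover + mutation
        population = parents + [child]  # elitist survivor selection
    return max(population, key=fitness)
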

https://t.iss.one/deep_learning_proj
ChatGPT Cheat Sheet for Business (2025).pdf
8 MB
ChatGPT Cheat Sheet for Business - DataCamp

@Machine_learn
πŸ‘3
Please open Telegram to view this post
VIEW IN TELEGRAM
πŸ‘1
🐫 Tülu 3 (what a name) 405B - another release!

An open-source model (and no, it's not a Chinese one) that outperforms DeepSeek-V3 on multiple benchmarks.

Scaled to 405B, with performance on par with GPT-4o and ahead of previous models in the same class.

β–ͺ Blog: https://allenai.org/blog/tulu-3-405B
β–ͺYou can test it here: https://playground.allenai.org/?model=tulu3-405b
β–ͺ Technical report: https://allenai.org/blog/tulu-3-technical
β–ͺ Hugging Face : https://huggingface.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5

https://t.iss.one/deep_learning_proj
⚡ LitGPT

Lightning AI's library for pretraining, finetuning, and serving LLMs. Install:

pip install 'litgpt[all]'

Minimal usage:

from litgpt import LLM

llm = LLM.load("microsoft/phi-2")
text = llm.generate("Fix the spelling: Every fall, the familly goes to the mountains.")
print(text)
# Corrected Sentence: Every fall, the family goes to the mountains.


β–ͺGithub
β–ͺDocs
β–ͺVideo

https://t.iss.one/deep_learning_proj
πŸ‘4
LLMs can see and hear without any training

30 Jan 2025 Β· Kumar Ashutosh, Yossi Gandelsman, Xinlei Chen, Ishan Misra, Rohit Girdhar Β·

We present MILS (Multimodal Iterative LLM Solver), a surprisingly simple, training-free approach that imbues multimodal capabilities into your favorite LLM. Leveraging the LLM's innate ability to perform multi-step reasoning, MILS prompts it to generate candidate outputs, each of which is scored and fed back iteratively, eventually converging on a solution to the task. This enables various applications that typically require training specialized models on task-specific data. In particular, we establish a new state of the art on emergent zero-shot image, video, and audio captioning. MILS seamlessly applies to media generation as well, discovering prompt rewrites that improve text-to-image generation and even editing prompts for style transfer. Finally, being a gradient-free optimization approach, MILS can invert multimodal embeddings into text, enabling applications like cross-modal arithmetic.

Paper: https://arxiv.org/pdf/2501.18096v1.pdf

Code: https://github.com/facebookresearch/mils
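
The core loop is easy to picture. A training-free, gradient-free sketch of the generate/score/feedback cycle (llm_generate, score, and the feedback format are hypothetical stand-ins for the paper's components, e.g. a CLIP-style scorer for captioning):

def mils_solve(llm_generate, score, task_prompt, n_candidates=8, n_iters=10):
    """Schematic MILS loop: propose, score, feed scores back, repeat."""
    feedback = ""
    best, best_score = None, float("-inf")
    for _ in range(n_iters):
        # 1. The LLM proposes candidates, conditioned on scored history.
        candidates = llm_generate(task_prompt + feedback, n=n_candidates)
        # 2. An off-the-shelf multimodal scorer ranks them (no training).
        scored = sorted(((score(c), c) for c in candidates), reverse=True)
        if scored[0][0] > best_score:
            best_score, best = scored[0]
        # 3. Verbalized scores steer the next round of generation.
        feedback = "\n" + "\n".join(f"score={s:.3f}: {c}" for s, c in scored)
    return best
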
βœ…
https://t.iss.one/deep_learning_proj