Github LLMs
LLM projects
@Raminmousa
LLMs can see and hear without any training

30 Jan 2025 · Kumar Ashutosh, Yossi Gandelsman, Xinlei Chen, Ishan Misra, Rohit Girdhar ·

We present MILS: Multimodal Iterative LLM Solver, a surprisingly simple, training-free approach to imbue multimodal capabilities into your favorite LLM. Leveraging the LLM's innate ability to perform multi-step reasoning, MILS prompts the LLM to generate candidate outputs, each of which is scored and fed back iteratively, eventually generating a solution to the task. This enables various applications that typically require training specialized models on task-specific data. In particular, we establish a new state-of-the-art on emergent zero-shot image, video and audio captioning. MILS seamlessly applies to media generation as well, discovering prompt rewrites to improve text-to-image generation, and even edit prompts for style transfer! Finally, being a gradient-free optimization approach, MILS can invert multimodal embeddings into text, enabling applications like cross-modal arithmetic.
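
The generate, score, and feed-back loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the MILS code: `toy_generate` stands in for prompting an LLM, `toy_score` stands in for a multimodal scorer, and the vocabulary, target, and all function names are hypothetical.

```python
from typing import Callable, List

def iterative_solver(generate: Callable[[List[str]], List[str]],
                     score: Callable[[str], float],
                     steps: int = 5, keep: int = 3) -> str:
    """Gradient-free loop: propose candidates, score them, feed the best back."""
    best: List[str] = []
    for _ in range(steps):
        candidates = generate(best)              # stand-in for an LLM call
        pool = set(candidates) | set(best)
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(pool, key=lambda c: (-score(c), c))
        best = ranked[:keep]                     # top candidates become feedback
    return best[0]

# Hypothetical stand-ins for the LLM and the scorer.
TARGET = {"a", "dog", "runs", "on", "grass"}
VOCAB = ["a", "dog", "cat", "runs", "sits", "on", "grass", "sand"]

def toy_score(caption: str) -> int:
    """Reward words in the target, penalise words outside it."""
    words = set(caption.split())
    return len(words & TARGET) - len(words - TARGET)

def toy_generate(feedback: List[str]) -> List[str]:
    """Extend each fed-back candidate by one vocabulary word."""
    seeds = feedback or [""]
    return [f"{s} {w}".strip() for s in seeds for w in VOCAB]

if __name__ == "__main__":
    print(iterative_solver(toy_generate, toy_score))
```

No gradients are involved: the only signal passed between rounds is the ranked list of earlier candidates.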

Paper: https://arxiv.org/pdf/2501.18096v1.pdf

Code: https://github.com/facebookresearch/mils

https://t.iss.one/deep_learning_proj
Awesome-LLM-as-a-judge Survey

Github

🔸https://t.iss.one/deep_learning_proj
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

13 Dec 2024 · Zhiyu Wu, Xiaokang Chen, Zizheng Pan, Xingchao Liu, Wen Liu, Damai Dai, Huazuo Gao, Yiyang Ma, Chengyue Wu, Bingxuan Wang, Zhenda Xie, Yu Wu, Kai Hu, Jiawei Wang, Yaofeng Sun, Yukun Li, Yishi Piao, Kang Guan, Aixin Liu, Xin Xie, Yuxiang You, Kai Dong, Xingkai Yu, Haowei Zhang, Liang Zhao, Yisong Wang, Chong Ruan ·

We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL, through two major upgrades. For the vision component, we incorporate a dynamic tiling vision encoding strategy designed for processing high-resolution images with different aspect ratios. For the language component, we leverage DeepSeekMoE models with the Multi-head Latent Attention mechanism, which compresses Key-Value cache into latent vectors, to enable efficient inference and high throughput. Trained on an improved vision-language dataset, DeepSeek-VL2 demonstrates superior capabilities across various tasks, including but not limited to visual question answering, optical character recognition, document/table/chart understanding, and visual grounding. Our model series is composed of three variants: DeepSeek-VL2-Tiny, DeepSeek-VL2-Small and DeepSeek-VL2, with 1.0B, 2.8B and 4.5B activated parameters respectively. DeepSeek-VL2 achieves competitive or state-of-the-art performance with similar or fewer activated parameters compared to existing open-source dense and MoE-based models. Code and pre-trained models are publicly accessible at https://github.com/deepseek-ai/DeepSeek-VL2.
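
To make the idea of tiling images with different aspect ratios concrete, here is a minimal sketch of one possible grid-selection heuristic. It is an assumption for illustration only, not the rule used in DeepSeek-VL2: the function name `choose_tile_grid`, the tile budget `max_tiles`, and the tie-breaking rule are all hypothetical.

```python
from typing import Tuple

def choose_tile_grid(width: int, height: int, max_tiles: int = 9) -> Tuple[int, int]:
    """Return (cols, rows) whose aspect ratio is closest to the image's.

    Only grids with cols * rows <= max_tiles are considered; ties are
    broken in favour of the grid that uses fewer tiles.
    """
    target = width / height
    best_key = None
    best_grid = (1, 1)
    for cols in range(1, max_tiles + 1):
        for rows in range(1, max_tiles // cols + 1):
            key = (abs(cols / rows - target), cols * rows)
            if best_key is None or key < best_key:
                best_key, best_grid = key, (cols, rows)
    return best_grid

if __name__ == "__main__":
    for size in [(800, 400), (500, 500), (400, 1200)]:
        print(size, "->", choose_tile_grid(*size))
```

A wide image gets more columns than rows and a tall image the reverse, so each tile stays close to square before it is passed to a fixed-resolution encoder.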

Paper: https://arxiv.org/pdf/2412.10302v1.pdf

Code: https://github.com/deepseek-ai/deepseek-vl2

Datasets: RefCOCO - TextVQA - MMBench - DocVQA

💠 https://t.iss.one/deep_learning_proj
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation

5 Aug 2024 · Daniel Fleischer, Moshe Berchansky, Moshe Wasserblat, Peter Izsak ·

Implementing Retrieval-Augmented Generation (RAG) systems is inherently complex, requiring deep understanding of data, use cases, and intricate design decisions. Additionally, evaluating these systems presents significant challenges, necessitating assessment of both retrieval accuracy and generative quality through a multi-faceted approach. We introduce RAG Foundry, an open-source framework for augmenting large language models for RAG use cases. RAG Foundry integrates data creation, training, inference and evaluation into a single workflow, facilitating the creation of data-augmented datasets for training and evaluating large language models in RAG settings. This integration enables rapid prototyping and experimentation with various RAG techniques, allowing users to easily generate datasets and train RAG models using internal or specialized knowledge sources. We demonstrate the framework's effectiveness by augmenting and fine-tuning Llama-3 and Phi-3 models with diverse RAG configurations, showcasing consistent improvements across three knowledge-intensive datasets.
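
As a rough picture of what a "data-augmented" example looks like, the sketch below retrieves passages by keyword overlap and folds them into a prompt. It does not use the RAG Foundry API; `retrieve`, `build_prompt`, and the word-overlap scoring are hypothetical stand-ins chosen to keep the example dependency-free.

```python
from typing import List

def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Rank documents by the number of words they share with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: (-len(q_words & set(d.lower().split())), d),
    )
    return ranked[:k]

def build_prompt(query: str, passages: List[str]) -> str:
    """Prepend retrieved passages to the question as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

if __name__ == "__main__":
    corpus = [
        "Paris is the capital of France",
        "Berlin is the capital of Germany",
        "Bananas are yellow",
    ]
    question = "What is the capital of France"
    print(build_prompt(question, retrieve(question, corpus)))
```

The same augmented prompts can serve both as fine-tuning inputs and as evaluation inputs, which is the sense in which retrieval and generation quality get assessed together.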

Paper: https://arxiv.org/pdf/2408.02545v1.pdf

Code: https://github.com/intellabs/ragfoundry

Datasets: TriviaQA - PubMedQA

https://t.iss.one/deep_learning_proj
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration

24 Jan 2025 · Kai-Tuo Xu, Feng-Long Xie, Xu Tang, Yao Hu ·

We present FireRedASR, a family of large-scale automatic speech recognition (ASR) models for Mandarin, designed to meet diverse requirements for superior performance and optimal efficiency across various applications. FireRedASR comprises two variants:

FireRedASR-LLM: Designed to achieve state-of-the-art (SOTA) performance and to enable seamless end-to-end speech interaction. It adopts an Encoder-Adapter-LLM framework leveraging large language model (LLM) capabilities. On public Mandarin benchmarks, FireRedASR-LLM (8.3B parameters) achieves an average Character Error Rate (CER) of 3.05%, surpassing the latest SOTA of 3.33% with an 8.4% relative CER reduction (CERR). It demonstrates superior generalization capability over industrial-grade baselines, achieving 24%-40% CERR in multi-source Mandarin ASR scenarios such as video, live, and intelligent assistant.

FireRedASR-AED: Designed to balance high performance and computational efficiency and to serve as an effective speech representation module in LLM-based speech models. It utilizes an Attention-based Encoder-Decoder (AED) architecture. On public Mandarin benchmarks, FireRedASR-AED (1.1B parameters) achieves an average CER of 3.18%, slightly worse than FireRedASR-LLM but still outperforming the latest SOTA model with over 12B parameters. It offers a more compact size, making it suitable for resource-constrained applications.

Moreover, both models exhibit competitive results on Chinese dialects and English speech benchmarks and excel in singing lyrics recognition.
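
The CER and CERR figures quoted above follow from two short formulas. The sketch below uses the common definition of CER as character-level edit distance divided by reference length; the helper names and the sample sentence are illustrative assumptions, not taken from the FireRedASR code.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance over reference length."""
    return edit_distance(ref, hyp) / len(ref)

def relative_reduction(baseline: float, new: float) -> float:
    """Relative error reduction of `new` with respect to `baseline`."""
    return (baseline - new) / baseline

if __name__ == "__main__":
    print(cer("今天天气很好", "今天天汽很好"))             # one substitution in six characters
    print(round(relative_reduction(3.33, 3.05) * 100, 1))  # the abstract's 3.33% -> 3.05%
```

With the abstract's numbers, (3.33 - 3.05) / 3.33 is about 0.084, which matches the reported 8.4% relative reduction.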

Paper: https://arxiv.org/pdf/2501.14350v1.pdf

Code: https://github.com/fireredteam/fireredasr

Datasets: LibriSpeech - AISHELL-1 - AISHELL-2 - WenetSpeech

https://t.iss.one/deep_learning_proj
⚡️ LLM4Decompile

git clone https://github.com/albertan017/LLM4Decompile.git
cd LLM4Decompile
conda create -n 'llm4decompile' python=3.9 -y
conda activate llm4decompile
pip install -r requirements.txt


🟡 Github
🟡 Models
🟡 Paper
🟡 Colab
https://t.iss.one/deep_learning_proj
Tutorial: Train your own Reasoning model with GRPO

📓 Tutorial

https://t.iss.one/deep_learning_proj
Slamming: Training a Speech Language Model on One GPU in a Day

19 Feb 2025 · Gallil Maimon, Avishai Elmakies, Yossi Adi ·

We introduce Slam, a recipe for training high-quality Speech Language Models (SLMs) on a single academic GPU in 24 hours. We do so through empirical analysis of model initialisation and architecture, synthetic training data, preference optimisation with synthetic data and tweaking all other components. We empirically demonstrate that this training recipe also scales well with more compute, getting results on par with leading SLMs at a fraction of the compute cost. We hope these insights will make SLM training and research more accessible. In the context of SLM scaling laws, our results far outperform predicted compute-optimal performance, giving an optimistic view of SLM feasibility. See code, data, models, and samples at https://pages.cs.huji.ac.il/adiyoss-lab/slamming

Paper: https://arxiv.org/pdf/2502.15814v1.pdf

Code: https://github.com/slp-rl/slamkit



https://t.iss.one/deep_learning_proj
ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs

📚 Read


@Machine_learn
preprints202502.0982.v1.pdf
1018.1 KB
PKG-LLM: A Framework for Predicting GAD and MDD Using Knowledge Graphs and Large Language Models in Cognitive Neuroscience

Ali Sarabadani, Hadis Taherinia, Niloufar Ghadiri, Ehsan Karimi Shahmarvandi, Ramin Mousa *

Abstract

Purpose: This research project constructs and evaluates PKG-LLM, a knowledge graph framework intended primarily for cognitive neuroscience. It aims to improve the prediction of relationships among neurological entities, as well as named entity recognition (NER) and relation extraction (RE) from large neurological datasets. Using GPT-4 and expert review, we aim to demonstrate how this framework may outperform traditional models in precision, recall, and F1 score, and to provide key insights into possible future clinical and research applications in neuroscience.

Method: PKG-LLM was evaluated on two primary tasks: relation extraction (RE) and named entity recognition (NER). Both tasks processed data and obtained performance metrics (precision, recall, and F1-score) using GPT-4. In addition, an expert review process was integrated, in which neurologists and domain experts reviewed the extracted relationships and entities, improving the final performance metrics. Comparative performance was reported against StrokeKG and Heart Failure KG. PKG-LLM was also evaluated on link prediction using Mean Rank (MR), Mean Reciprocal Rank (MRR), and Precision at K (P@K), and compared against other link prediction models, including TransE, RotatE, DistMult, ComplEx, ConvE, and HolmE.

Findings: PKG-LLM demonstrated competitive performance in both relation extraction and named entity recognition tasks. In its traditional form, PKG-LLM achieved a precision of 75.45%, recall of 78.60%, and F1-score of 76.89% in relation extraction, which improved to 82.34%, 85.40%, and 83.85% after expert review. In named entity recognition, the traditional model scored 73.42% precision, 76.30% recall, and 74.84% F1-score, improving to 81.55%, 84.60%, and 82.99% after expert review. For link prediction, PKG-LLM achieved an MRR of 0.396, P@1 of 0.385, and P@10 of 0.531, placing it in a competitive range compared to models like TransE, RotatE, and ConvE.

Conclusion: This study showed that PKG-LLM mainly outperformed the existing models in relation extraction and named entity recognition once expert review was added. Further, the model's competitive edge in link prediction lends credence to its capability in knowledge graph construction and refinement in the field of cognitive neuroscience. PKG-LLM's superiority over existing models and its ability to generate more accurate results with clinical relevance indicate that it is a significant tool to augment neuroscience research and clinical applications. The evaluation process entailed using GPT-4 and expert review. This approach ensures that the resulting knowledge graph is scientifically compelling and practically beneficial in more advanced cognitive neuroscience research.
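
The link-prediction metrics named in the abstract can be computed from the rank each true entity receives among all candidates. The sketch below assumes P@K is read as the fraction of test queries whose true entity appears in the top K (often called Hits@K); that reading, the function names, and the sample ranks are assumptions, not details confirmed by the preprint.

```python
from typing import Sequence

def mean_rank(ranks: Sequence[int]) -> float:
    """Average 1-based rank of the true entity (lower is better)."""
    return sum(ranks) / len(ranks)

def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank (higher is better, at most 1.0)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose true entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

if __name__ == "__main__":
    sample_ranks = [1, 2, 5, 10, 20]   # made-up ranks for five test triples
    print("MR  :", mean_rank(sample_ranks))
    print("MRR :", mean_reciprocal_rank(sample_ranks))
    print("P@1 :", hits_at_k(sample_ranks, 1))
    print("P@10:", hits_at_k(sample_ranks, 10))
```

MRR rewards placing the true entity near the top far more than MR does, which is why the two are usually reported side by side.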


Link: https://www.preprints.org/manuscript/202502.0982/v1
@Machine_learn
From System 1 to System 2: A Survey of Reasoning Large Language Models

24 Feb 2025 · Zhong-Zhi Li, Duzhen Zhang, Ming-Liang Zhang, Jiaxin Zhang, Zengyan Liu, Yuxuan Yao, Haotian Xu, Junhao Zheng, Pei-Jie Wang, Xiuyi Chen, Yingying Zhang, Fei Yin, Jiahua Dong, Zhijiang Guo, Le Song, Cheng-Lin Liu ·

Achieving human-level intelligence requires refining the transition from the fast, intuitive System 1 to the slower, more deliberate System 2 reasoning. While System 1 excels in quick, heuristic decisions, System 2 relies on logical reasoning for more accurate judgments and reduced biases. Foundational Large Language Models (LLMs) excel at fast decision-making but lack the depth for complex reasoning, as they have not yet fully embraced the step-by-step analysis characteristic of true System 2 thinking. Recently, reasoning LLMs like OpenAI's o1/o3 and DeepSeek's R1 have demonstrated expert-level performance in fields such as mathematics and coding, closely mimicking the deliberate reasoning of System 2 and showcasing human-like cognitive abilities. This survey begins with a brief overview of the progress in foundational LLMs and the early development of System 2 technologies, exploring how their combination has paved the way for reasoning LLMs. Next, we discuss how to construct reasoning LLMs, analyzing their features, the core methods enabling advanced reasoning, and the evolution of various reasoning LLMs. Additionally, we provide an overview of reasoning benchmarks, offering an in-depth comparison of the performance of representative reasoning LLMs. Finally, we explore promising directions for advancing reasoning LLMs and maintain a real-time GitHub repository (https://github.com/zzli2022/Awesome-Slow-Reason-System) to track the latest developments. We hope this survey will serve as a valuable resource to inspire innovation and drive progress in this rapidly evolving field.

Paper: https://arxiv.org/pdf/2502.17419v1.pdf

Code: https://github.com/zzli2022/awesome-slow-reason-system

Datasets: GSM8K - MedQA - MathVista - GPQA - MMLU-Pro - PGPS9K

💠https://t.iss.one/LLM_learning
This study delves into the capabilities and constraints of ChatGPT, a prominent large language model, in the context of automated essay scoring (AES), particularly focusing on the TOEFL Independent Writing Task. This investigation is significant as it explores the potential of ChatGPT to evaluate essays based on the diverse scoring criteria outlined in the official TOEFL guide. The primary objective is to assess whether ChatGPT can effectively serve as an AES tool, especially when dealing with small sample sizes, which often pose challenges for traditional machine learning approaches.

📁 Paper : https://arxiv.org/pdf/2401.03401

@scopeofai
https://t.iss.one/LLM_learning
Large Language Models (LLMs) have rapidly permeated the information landscape, sparking anxieties regarding the displacement of human labor. This essay moves beyond a purely technological assessment of neural networks to explore their broader socio-philosophical implications. By examining the functions of contemporary LLMs, particularly those developed by OpenAI, we revisit the enduring question of whether a machine can truly think. Furthermore, we address a critical, often overlooked aspect: are humans prepared to accept the social subjectivity of these increasingly sophisticated machines? Through the lens of social philosophy, we analyze LLMs not merely as technological products, but as social agents actively shaping and participating in the social order.

Paper: https://galacticamedia.com/index.php/gmd/article/view/502/421

@scopeofai
https://t.iss.one/LLM_learning
Agents built on LLMs (LLM agents) further extend these capabilities, allowing them to process user interactions and perform complex operations in diverse task environments. However, during the processing and generation of massive data, LLMs and LLM agents pose a risk of sensitive information leakage, potentially threatening data privacy. This paper aims to demonstrate data privacy issues associated with LLMs and LLM agents to facilitate a comprehensive understanding. Specifically, we conduct an in-depth survey about privacy threats, encompassing passive privacy leakage and active privacy attacks. Subsequently, we introduce the privacy protection mechanisms employed by LLMs and LLM agents and provide a detailed analysis of their effectiveness. Finally, we explore the privacy protection challenges for LLMs and LLM agents as well as outline potential directions for future developments in this domain.

🗂 Paper: Link

@scopeofai
https://t.iss.one/LLM_learning
This study focuses on fine-tuning Large Language Models (LLMs) for healthcare information in Vietnamese, a low-resource language, to improve medical information accessibility and healthcare communication in developing countries. The methodology involves selecting a base model (BloomZ-3B, LLaMA2-7B or LLaMA2-13B), compiling a domain-specific dataset of approximately 337,000 prompt-response pairs in Vietnamese from existing datasets, Vietnamese medical online forums, and medical textbooks, and fine-tuning the model using Low-Rank Adaptation (LoRA) and Quantized Low-Rank Adaptation (QLoRA) techniques. The fine-tuned models showed enhanced performance, demonstrating the potential to improve healthcare communication in low-resource languages and enhance data privacy and security.
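
To show why LoRA makes fine-tuning models of this size affordable, the sketch below applies the standard LoRA update, W' = W + (alpha / r) * B * A, to a tiny matrix and compares trainable parameter counts. The layer dimensions, rank, and helper names are illustrative assumptions; none of them are taken from the study.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(W: Matrix, A: Matrix, B: Matrix, alpha: float) -> Matrix:
    """Frozen weight W (d_out x d_in) plus the scaled low-rank update B @ A.

    B is d_out x r and A is r x d_in; only A and B would be trained.
    """
    rank = len(A)
    delta = matmul(B, A)
    scale = alpha / rank
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

def lora_param_counts(d_out: int, d_in: int, rank: int) -> Tuple[int, int]:
    """(parameters in the full matrix, parameters in the LoRA factors)."""
    return d_out * d_in, rank * (d_out + d_in)

if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    B = [[1.0], [2.0]]        # d_out x r, with r = 1
    A = [[3.0, 4.0]]          # r x d_in
    print(lora_effective_weight(W, A, B, alpha=2.0))
    print(lora_param_counts(4096, 4096, 8))
```

For a hypothetical 4096 x 4096 projection with rank 8, the factors hold 65,536 parameters against 16,777,216 in the full matrix. QLoRA additionally stores the frozen base weights in a quantized format, which is not modelled here.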


📂 Paper: https://www.sciencedirect.com/science/article/pii/S0169260725000720/pdfft?md5=b348ebfecc8d8f8b481e23ec241da2de&pid=1-s2.0-S0169260725000720-main.pdf

@scopeofai
https://t.iss.one/LLM_learning
Large Language Models (LLMs), such as GPT-4, have shown high accuracy in medical board exams, indicating their potential for clinical decision support. However, their metacognitive abilities—the ability to assess their own knowledge and manage uncertainty—are significantly lacking. This poses risks in medical applications where recognizing limitations and uncertainty is crucial.

To address this, researchers developed MetaMedQA, an enhanced benchmark that evaluates LLMs not just on accuracy but also on their ability to recognize unanswerable questions, manage uncertainty, and provide confidence scores. Testing revealed that while newer and larger models generally perform better in accuracy, most fail to handle uncertainty effectively and often give overconfident answers even when wrong.
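
One common way to quantify the overconfidence described above is to compare stated confidence with observed accuracy. The sketch below computes a simple overconfidence gap and an expected calibration error (ECE); these are generic calibration measures offered as an illustration, not necessarily the metrics used by MetaMedQA, and the sample data is made up.

```python
from typing import Sequence

def overconfidence_gap(confidences: Sequence[float], correct: Sequence[int]) -> float:
    """Mean stated confidence minus accuracy (positive means overconfident)."""
    n = len(confidences)
    return sum(confidences) / n - sum(correct) / n

def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[int],
                               n_bins: int = 5) -> float:
    """Weighted average of |confidence - accuracy| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)   # confidence 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        ece += len(members) / total * abs(avg_conf - accuracy)
    return ece

if __name__ == "__main__":
    stated = [0.9, 0.9, 0.9, 0.9]    # made-up confidence scores
    outcome = [1, 0, 0, 1]           # 1 = answer was correct
    print(overconfidence_gap(stated, outcome))
    print(expected_calibration_error(stated, outcome))
```

A model that answers with 90% confidence but is right only half the time shows a gap of about 0.4, the pattern the benchmark is designed to expose.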


📁 Paper: https://www.nature.com/articles/s41467-024-55628-6.pdf

@scopeofai
@LLM_learning
The rapid advancement of large language models (LLMs), such as ChatGPT and GPT-4, has led to a surge in synthetic text generation across various domains, including journalism, academia, cybersecurity, and online discourse. While these models offer immense benefits, their ability to generate highly realistic text raises concerns regarding misinformation, academic dishonesty, and content authenticity. Consequently, the detection of LLM-generated content has become an essential area of research.

This survey provides a comprehensive overview of existing detection methodologies, benchmarks, and challenges, offering insights into the strengths and weaknesses of current techniques. The study aims to serve as a guiding reference for researchers and practitioners striving to uphold the integrity of digital information in an era dominated by synthetic content.

📁 Paper: https://arxiv.org/abs/2310.15654

@scopeofai
@LLM_learning
This repository is a curated collection of survey papers focused on Large Language Models (LLMs), organized to help researchers and practitioners navigate the rapidly evolving field. It compiles existing surveys across multiple topics, including foundational overviews of LLMs, technical aspects like Transformer architectures and efficient model design, and societal considerations such as alignment with human values, fairness, and safety. The repository also covers specialized areas like multimodal LLMs (handling text, images, etc.), knowledge-augmented models, and applications in education, healthcare, and law. Each section provides direct links to relevant papers (often on arXiv) and related GitHub repositories, emphasizing recent work from the past few years. The repository serves as a centralized resource for understanding both the technical advancements and ethical challenges of LLMs.

🔗 Repository: https://github.com/NiuTrans/ABigSurveyOfLLMs

@scopeofai
@LLM_learning