Data Science by ODS.ai 🦜

👍64🔥7💩7😁4👎2🤔2

18K views11:55

LLaMA: Open and Efficient Foundation Language Models

LLaMA is a set of large language models, ranging from 7B to 65B parameters, that have been trained on publicly available datasets containing trillions of tokens. The LLaMA-13B model performs better than GPT-3 (175B) on most benchmarks, and the LLaMA-65B model is competitive with other state-of-the-art models, such as Chinchilla70B and PaLM-540B. This suggests that it is possible to achieve excellent performance in language modeling without relying on proprietary or inaccessible datasets.

Paper: https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/

Code: https://github.com/facebookresearch/llama

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-llama

#deeplearning #nlp #transformer #sota #languagemodel

🔥29👍15❤3

18.5K views14:18

Data Science by ODS.ai 🦜

Forwarded from gonzo-обзоры ML статей

Hot news: https://ai.facebook.com/blog/large-language-model-llama-meta-ai/

Training smaller foundation models like LLaMA is desirable in the large language model space because it requires far less computing power and resources to test new approaches, validate others’ work, and explore new use cases. Foundation models train on a large set of unlabeled data, which makes them ideal for fine-tuning for a variety of tasks. We are making LLaMA available at several sizes (7B, 13B, 33B, and 65B parameters) and also sharing a LLAMA model card that details how we built the model in keeping with our approach to Responsible AI practices.

In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla70B and PaLM-540B. We release all our models to the research community.

Model card: https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md

Paper: https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/

Form to apply: https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform

Unfortunately, it's only for non-commercial purposes :(

"You will not, and will not permit, assist or cause any third party to:

a. use, modify, copy, reproduce, create derivative works of, or distribute the Software Products (or any derivative works thereof, works incorporating the Software Products, or any data produced by the Software), in whole or in part, for (i) any commercial or production purposes ... "

Meta

Introducing LLaMA: A foundational, 65-billion-parameter language model

Today, we’re releasing our LLaMA (Large Language Model Meta AI) foundational model with a gated release. LLaMA is more efficient and competitive with previously published models of a similar size on existing benchmarks.

👍12❤2👏1

12.2K views22:36

Data Science by ODS.ai 🦜

In-Context Instruction Learning

The authors introduce a novel approach called In-Context Instruction Learning (ICIL), which greatly enhances zero-shot task generalization performance for both pretrained and instruction-fine-tuned models. ICIL employs a single fixed prompt to evaluate all tasks, which is a concatenation of cross-task demonstrations. The authors demonstrate that even the most powerful instruction-fine-tuned baseline (text-davinci-003) benefits from ICIL by 9.3%, indicating that the effect of ICIL is complementary to instruction-based fine-tuning.

Paper: https://arxiv.org/abs/2302.14691

Code: https://github.com/seonghyeonye/ICIL

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-icil

#deeplearning #nlp #transformer #sota #languagemodel

👍9

12.8K views09:29

Data Science by ODS.ai 🦜

Forwarded from ml4se

ChatML

OpenAI released ChatGPT API with Chat Markup Language. The basic idea behind ChatML is ensure the LLM model inputs are sent in structured format following ChatML and not as unstructured text.

https://github.com/openai/openai-python/blob/main/chatml.md

🔥23👍9👏1

12.2K views19:33

Data Science by ODS.ai 🦜

PaLM-E: An Embodied Multimodal Language Model

In this paper, the authors introduce the concept of "embodied language models," which integrate real-world sensory information with language processing. This integration enables the models to perform tasks related to robotics and perception seamlessly.

To achieve this, the models are trained end-to-end using a large language model and multiple sensory inputs, including visual and textual information. These models can tackle complex tasks such as sequential robotic manipulation planning, visual question answering, and captioning. The results of evaluations demonstrate the effectiveness of this approach, including positive transfer across different domains.

The flagship model, PaLM-E-562B, is the crown jewel of this research. It excels in robotics tasks and delivers state-of-the-art performance on OK-VQA. Despite its specialization in robotics, this model maintains its generalist language capabilities.

Paper: https://arxiv.org/abs/2303.03378

Project link: https://palm-e.github.io/

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-palme

#deeplearning #nlp #transformer #sota #languagemodel #robotics

👍18❤3

14K views04:28

Data Science by ODS.ai 🦜

Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

ChatGPT is a language interface with distinctive conversational competency and reasoning capabilities across many domains. However, it is currently unable to process or generate images from the visual world. To address this limitation, the authors propose a system called Visual ChatGPT that incorporates different Visual Foundation Models to enable users to interact with ChatGPT using both language and images. The system is capable of handling complex visual questions or instructions that require multiple AI models and steps. Additionally, it allows for feedback and corrections.

Rather than creating a new multimodal ChatGPT from scratch, the authors propose building Visual ChatGPT by incorporating various (22) Visual Foundation Models (VFMs) directly into ChatGPT. To facilitate the integration of these VFMs, the authors introduce a Prompt Manager that supports several functions. These include specifying the input-output formats of each VFM, converting visual information to language format, and managing the histories, priorities, and conflicts of different VFMs. With the Prompt Manager's help, ChatGPT can use these VFMs iteratively and receive their feedback until it satisfies the users' requirements or reaches the end condition.

Paper: https://arxiv.org/abs/2303.04671

Code link: https://github.com/microsoft/visual-chatgpt

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-palme

#deeplearning #nlp #transformer #sota #languagemodel #visual

👍31😁1

12.5K views07:15

Data Science by ODS.ai 🦜

Forwarded from Machinelearning

⏩

OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception.

OpenOccupancy first surrounding semantic occupancy perception benchmar.

🖥

Github: https://github.com/jeffwang987/openoccupancy

⏩

Paper: https://arxiv.org/abs/2303.03991v1

⭐️

Dataset: https://paperswithcode.com/dataset/synthcity

💨

Project: https://www.mmlab-ntu.com/project/styleganex/

ai_machinelearning_big_data

Please open Telegram to view this post

VIEW IN TELEGRAM

👍11❤5🔥1

12.2K views23:06

Data Science by ODS.ai 🦜

Forwarded from ml4se

Software Vulnerability Prediction Knowledge Transferring Between Programming Languages

One of the biggest challenges in this area is the lack of code samples for all different programming languages. In this study, authors address this issue by proposing a transfer learning technique to leverage available datasets and generate a model to detect common vulnerabilities in different programming languages. They use C source code samples to train a CNN model, then, they use Java source code samples to adopt and evaluate the learned model. The authors use code samples from two benchmark datasets: NIST Software Assurance Reference Dataset (SARD) and Draper VDISC dataset. The results show that proposed model detects vulnerabilities in both C and Java codes with average recall of 72%.

👍12❤6

11.2K views12:07

Data Science by ODS.ai 🦜

Forwarded from gonzo-обзоры ML статей

In the meantime, some slides from my talks on NLP in 2022

https://docs.google.com/presentation/d/1m7Wpzaowbvi2je6nQERXyfQ0bzzS0dD0OArWznfOjHE/edit

Google Docs

NLP in 2022 / Intento

NLP in 2022 Grigory Sapunov Internal talks / 2023.03.01-2023.03.08 [email protected]

❤12👍4

10.7K views19:47

Data Science by ODS.ai 🦜

Hyena Hierarchy: Towards Larger Convolutional Language Models

Attention has been a cornerstone of deep learning, but it comes at a steep cost: quadratic expense in sequence length. This can limit the amount of context accessible, making it challenging for subquadratic methods like low-rank and sparse approximations to achieve comparable performance. That's where Hyena comes in!

Hyena is a revolutionary subquadratic drop-in replacement for attention that combines implicitly parametrized long convolutions and data-controlled gating. And the results speak for themselves! Hyena significantly improves accuracy in recall and reasoning tasks on long sequences, matching attention-based models.

In fact, Hyena sets a new state-of-the-art for dense-attention-free architectures in language modeling, reaching Transformer quality with 20% less training compute at sequence length 2K. And that's not all! Hyena operators are twice as fast as optimized attention at sequence length 8K and 100x faster at sequence length 64K.

Paper: https://arxiv.org/abs/2302.10866
Code link: https://github.com/HazyResearch/safari
Project link: https://hazyresearch.stanford.edu/blog/2023-03-07-hyena

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-hyena

#deeplearning #nlp #cv #languagemodel #convolution

👍22❤2

12.2K views10:34

Data Science by ODS.ai 🦜

Forwarded from ml4se

Tracking the Fake GitHub Star Black Market with Dagster, dbt and BigQuery

This is a simple Dagster project to analyze the number of fake GitHub stars on any GitHub repository:
https://github.com/dagster-io/fake-star-detector

dagster.io

Detecting Fake GitHub Stars with Dagster

Use Dagster, dbt, and BigQuery to analyze suspicious GitHub star activity and protect open-source credibility.

👍14❤3

12.4K views13:04

Data Science by ODS.ai 🦜

Interview of Ilya Sutskver

TLDR: thereotically #chatgpt can learn a lot and eventually converge to #AGI given the proper dataset and help of #RLHF (Reinforcement Learning from Human Feedback).

Video provides valuable insights into the current state and future of artificial intelligence. The conversation explores the progress of AI, its limitations, and the importance of reinforcement learning and ethics in AI development. Ilia also discusses the potential benefits of AI in democracy and its potential role in helping humans manage society. This interview offers a comprehensive and thought-provoking overview of the AI landscape, making it a must-watch for anyone interested in understanding the impact of AI on our lives and the world at large.

Youtube: https://www.youtube.com/watch?v=SjhIlw3Iffs

#youtube #Sutskever #OpenAI #GPTEditor

YouTube

The Mastermind Behind GPT-4 and the Future of AI | Ilya Sutskever

In this podcast episode, Ilya Sutskever, the co-founder and chief scientist at OpenAI, discusses his vision for the future of artificial intelligence (AI), including large language models like GPT-4.

Sutskever starts by explaining the importance of AI research…

👍15🔥7👎1

11.2K views08:11

Data Science by ODS.ai 🦜

lecun-20230324-nyuphil.pdf

30.5 MB

Do large language models need sensory grounding for meaning and understanding?

TLDR: Yes

Slides from philosophical debate by Yann LeCun, who claimed Auto-Regressive LLMs are exponentially diverging diffusion processes.

#LLM #YanLeCun

👍8❤3🥰1

10.1K views07:07

Data Science by ODS.ai 🦜

ReBotNet: Fast Real-time Video Enhancement

The authors introduce a novel Recurrent Bottleneck Mixer Network (ReBotNet) method, designed for real-time video enhancement in practical scenarios, such as live video calls and video streams. ReBotNet employs a dual-branch framework, where one branch focuses on learning spatio-temporal features, and the other aims to enhance temporal consistency. A common decoder combines the features from both branches to generate the improved frame. This method incorporates a recurrent training approach that utilizes predictions from previous frames for more efficient enhancement and superior temporal consistency.

To assess ReBotNet, the authors use two new datasets that simulate real-world situations and show that their technique surpasses existing methods in terms of reduced computations, decreased memory requirements, and quicker inference times.

Paper: https://arxiv.org/abs/2303.13504
Project link: https://jeya-maria-jose.github.io/rebotnet-web/

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-rebotnet

#deeplearning #cv #MachineLearning #VideoEnhancement #AI #Innovation #RealTimeVideo

👍16❤3

10.8K views11:08

Data Science by ODS.ai 🦜

Forwarded from Spark in me (Alexander)

My experience with PyTorch 2.0 so far:

[1] - packaging?
[2] - compilation errors

We will test other models as well.

PyTorch Forums

How to serialize models with torch.compile properly

Hi, Despite the main points in the torch.compile pitch, we faced some issues with jit, but they were tolerable, and we adopted torch.jit.save and torch packages as a model serialization / obfuscation / freezing methods (and ONNX as well). It may be seen…

🔥6👍2❤1

9.87K views21:45

Data Science by ODS.ai 🦜

Sparks of Artificial General Intelligence: Early experiments with GPT-4

TLDR: Paper from #Microsoft research about #GPT4 showing something which can be considered signs of #AGI.

ArXiV: https://arxiv.org/abs/2303.12712

arXiv.org

Sparks of Artificial General Intelligence: Early experiments with GPT-4

Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our...

🤮17👍5❤3🔥1

9.87K views13:42

Data Science by ODS.ai 🦜

Forwarded from Spark in me (Alexander)

Adobe does image generation

> Adobe announced a beta of Firefly, a generative ML tool for making images, Unlike MidJourney or Stable Diffusion (or Bing) this looks a lot more like an actual product - instead of typing 50-100 works into a box trying to refine your results, there are GUI tools and settings. It also has a much more clearly-defined set of training data - note that Getty is suing Stable Diffusion for training on its images without permission. In more normal times this would be a huge story - now it’s only half way down the page.

https://firefly.adobe.com/?ref=lore.ghost.io

This really looks like a product. Also numerous tags and knobs are probably sourced from internal Adobe data.

Lots of networks here - upscaling, cycle-gan like domain transfers, inpainting, editing, plain generation, etc

I understand that their demos are probably cherry picked af, but proper product work is evident. Also probably this shows the real niche these tools are meant to occupy. Not the "AGI".

Also evident that the data requirements and scale to pull this off are huge.

👍21❤3🤣1

9.48K views07:56

Data Science by ODS.ai 🦜

Forwarded from ml4se

An AST-based Code Change Representation and its Performance in Just-in-time Vulnerability Prediction

Authors propose a novel way of representing changes in source code, the Code Change Tree, a form that is designed to keep only the differences between two abstract syntax trees of Java source code. The appoach was evaluated in predicting if a code change introduces a vulnerability against multiple representation types and evaluated them by a number of machine learning models as a baseline. The evaluation is done on a novel dataset VIC.

RQ. 1 Can a vulnerability introducing database generated from a vulnerability fixing commit database be used for vulnerability prediction?
RQ. 2 How effective are Code Change Trees in representing source code changes?
RQ. 3 Are source code metrics sufficient to represent code changes?

dataset paper
VIC dataset

👍7❤1

8.75K views15:31

Data Science by ODS.ai 🦜

🕊Twitter Recommendation Algorithm

#Twitter disclosed the sources of its recommendation engine.

GitHub: https://github.com/twitter/the-algorithm
Blog post: https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm

#recommenders #recsys #recommendation

👍33❤9🤔1

13.3K viewsedited 19:38

Data Science by ODS.ai 🦜

Forwarded from ml4se

CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X

CodeGeeX is a multilingual model with 13 billion parameters for code generation. It is pre-trained on 850 billion tokens of 23 programming languages.

- Multilingual Code Generation: CodeGeeX has good performance for generating executable programs in several mainstream programming languages, including Python, C++, Java, JavaScript, Go, etc.
- Crosslingual Code Translation: CodeGeeX supports the translation of code snippets between different languages.
- Customizable Programming Assistant: CodeGeeX is available in the VS Code extension marketplace for free. It supports code completion, explanation, summarization and more, which empower users with a better coding experience.
- Open-Source and Cross-Platform: All codes and model weights are publicly available for research purposes. CodeGeeX supports both Ascend and NVIDIA platforms. It supports inference in a single Ascend 910, NVIDIA V100 or A100.

GitHub

👍27❤7🔥4

8.55K views16:18

About

Blog

Apps

Platform