Kandinsky 2.2
by Sber & AIRI
What has changed from Kandinsky 2.1
- Improved quality of image generation
- Ability to generate images with different aspect ratios
- Optimized portrait generation for photorealism
- Trained on an extensive dataset of 1.5B text-image pairs
- Generating stickers for Telegram and creating custom stickerpacks
- Drawing missing parts of a picture (inpainting)
- Creating pictures in infinite canvas mode (outpainting)
- Understanding queries in English (Russian remains the main language)
- 20+ painting styles
- Mixing images
- Generating images similar to a given image
- Image styling by text description
- Ability to change individual objects or elements of an image via a text description while preserving the composition of the original illustration (ControlNet)
Habr: https://habr.com/ru/companies/sberbank/articles/747446/
GH: https://github.com/ai-forever/Kandinsky-2/
Telegram-bot: https://t.iss.one/kandinsky21_bot
MLSpace: https://cloud.ru/ru/datahub/rugpt3family/kandinsky-2-2
Web-GUI for Kandinsky 2.x: https://github.com/seruva19/kubin
FusionBrain: https://fusionbrain.ai/diffusion
RUdalle: https://rudalle.ru/
Diffusers: https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/kandinsky2_2
👍12👎3🔥2
Practical ML Conf - The biggest offline ML conference of the year in Moscow.
- https://pmlconf.yandex.ru
- September 7, Moscow
- For speakers: offline
- For participants: offline and online (youtube)
- The conference language is Russian.
The call for proposals is open: https://pmlconf.yandex.ru/call_for_papers
#conference #nlp #cv #genAI #recsys #mlops #ecomm #hardware #research #offline #online
Practical ML Conf 2025
A conference on practical ML by Yandex
👍23👎13🔥6👏2
Data Science by ODS.ai 🦜
Launching the Open Data Science Talent Pool Initiative! Hello, community! We received several requests to organize some tools to match people seeking career / pet projects matching opportunities. So now we are launching the Open Data Science Talent Pool!…
@opendatascience Open Positions Post 0
We have received 8 submissions for our Talent Pool so far! Backgrounds range from data engineers to data leads. What's the best way to connect talent with seekers without compromising privacy?
We suggest that people looking for teammates or looking to hire post their offers in the comments to this post 👇🏻
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
Introducing CM3Leon (pronounced “Chameleon”), a multi-modal language model that's revolutionizing the realms of text and image generation. This model, designed with a decoder-only, retrieval-augmented, and token-based structure, expands on the established CM3 multi-modal architecture. It showcases the striking benefits of scaling and diversification in instruction-style data. The most impressive part? It's the first of its kind, trained with a recipe inspired by text-only language models, including a substantial retrieval-augmented pretraining phase and a secondary multi-task supervised fine-tuning (SFT) stage. It exemplifies the power of general-purpose models, capable of both text-to-image and image-to-text generation.
CM3Leon isn't just a theoretical model, but a proven performer. Through extensive experiments, it demonstrates the effectiveness of this new approach for multi-modal models. Remarkably, it achieves state-of-the-art performance in text-to-image generation, requiring 5x less training compute than comparable methods, and achieving a zero-shot MS-COCO FID of 4.88. Post-SFT, CM3Leon exhibits an unmatched level of controllability across various tasks, ranging from language-guided image editing to image-controlled generation and segmentation.
Paper link: https://ai.meta.com/research/publications/scaling-autoregressive-multi-modal-models-pretraining-and-instruction-tuning/
Blogpost link: https://ai.meta.com/blog/generative-ai-text-images-cm3leon/
A detailed unofficial overview of the paper:
https://andlukyane.com/blog/paper-review-cm3leon
#deeplearning #cv #nlp #imagegeneration #sota #multimodal
👍12❤2🔥1
Forwarded from ml4se
Using Commandline To Process CSV files
- to print the first column of a CSV file: awk -F, '{print $1}' file.csv
- to print the first and third columns of a CSV file: awk -F, '{print $1 "," $3}' file.csv
- to print only the lines of a CSV file that contain a specific string: grep "string" file.csv
- to sort a CSV file based on the values in the second column only: sort -t, -k2,2 file.csv
- to remove the first row of a CSV file (the header row): tail -n +2 file.csv
- to remove duplicates from a CSV file based on the values in the first column: awk -F, '!seen[$1]++' file.csv
- to calculate the sum of the values in the third column of a CSV file: awk -F, '{sum+=$3} END {print sum}' file.csv
- to convert a CSV file to a JSON array: jq -Rn '[inputs | split(",") | {name: .[0], age: .[1]}]' file.csv
- to convert a CSV file to a SQL INSERT statement: awk -F, '{printf "INSERT INTO table VALUES (\"%s\", \"%s\", \"%s\");\n", $1, $2, $3}' file.csv
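The one-liners above split on every comma, so they misread fields that contain quoted commas. For such files, a rough Python equivalent using the standard csv module may be safer. This is only a sketch: the helper names and the SAMPLE data are made up for illustration.

```python
import csv
import io
import json


def read_rows(text):
    """Parse CSV text into a list of rows; csv.reader handles quoted commas."""
    return list(csv.reader(io.StringIO(text)))


def first_column(rows):
    """Counterpart of: awk -F, '{print $1}'"""
    return [row[0] for row in rows]


def drop_header(rows):
    """Counterpart of: tail -n +2"""
    return rows[1:]


def dedupe_by_first_column(rows):
    """Counterpart of: awk -F, '!seen[$1]++' (keeps the first occurrence)."""
    seen = set()
    unique = []
    for row in rows:
        if row[0] not in seen:
            seen.add(row[0])
            unique.append(row)
    return unique


def sum_column(rows, index):
    """Counterpart of: awk -F, '{sum+=$3} END {print sum}' (index is 0-based)."""
    return sum(float(row[index]) for row in rows)


def to_json(text):
    """Convert CSV text to a JSON array, using the header row as keys."""
    return json.dumps(list(csv.DictReader(io.StringIO(text))))


# Hypothetical sample data with a quoted comma in the first field.
SAMPLE = 'name,city,score\n"Doe, Jane",Paris,3\nBob,Oslo,4.5\nBob,Rome,1\n'

rows = read_rows(SAMPLE)
body = drop_header(rows)
print(first_column(body))
print(sum_column(dedupe_by_first_column(body), 2))
print(to_json(SAMPLE))
```

To run it on a real file, read the file contents into a string first, or pass an open file object to csv.reader directly.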
👍23🥴15👎3🤔3❤2🥰2🔥1👏1🤨1
Paper Review: Llama 2: Open Foundation and Fine-Tuned Chat Models
Introducing Llama 2, a collection of pretrained and fine-tuned large language models ranging from 7 to 70 billion parameters! The fine-tuned models, Llama 2-Chat, are optimized for dialogue use cases. They not only outperform existing open-source chat models but also show strong performance in safety and helpfulness. The Llama 2 creators have opened the door for the AI community, sharing their detailed approach to inspire further advancements in the development of responsible AI.
Project link: https://ai.meta.com/llama/
Model link: https://github.com/facebookresearch/llama
Paper link: https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/
A detailed unofficial overview of the paper:
https://andlukyane.com/blog/paper-review-llama2
#deeplearning #nlp #safetyai #responsibleai
👍13🔥6❤4
Retentive Network: A Successor to Transformer for Large Language Models
The Retentive Network (RetNet) has been proposed as a game-changing foundation architecture for large language models. RetNet uniquely combines training parallelism, low-cost inference, and impressive performance into one sleek package. It ingeniously draws a theoretical connection between recurrence and attention, opening new avenues in AI exploration. The introduction of the retention mechanism for sequence modeling further enhances this innovation, featuring not one, not two, but three computation paradigms - parallel, recurrent, and chunkwise recurrent!
Specifically, the parallel representation provides the horsepower for training parallelism, while the recurrent representation supercharges low-cost O(1) inference, enhancing decoding throughput, latency, and GPU memory without compromising performance. For long-sequence modeling, the chunkwise recurrent representation is the ace up RetNet's sleeve, enabling efficient handling with linear complexity. Each chunk is encoded in parallel while also recurrently summarizing the chunks, which is nothing short of revolutionary. Based on experimental results in language modeling, RetNet delivers strong scaling results, parallel training, low-cost deployment, and efficient inference. All these groundbreaking features position RetNet as a formidable successor to the Transformer for large language models.
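To make the link between the parallel and recurrent forms more concrete, here is a minimal sketch of single-head retention in plain Python. It assumes the simplified form with a scalar decay gamma and leaves out the position rotation, normalization, and gating that the full model uses. The function names are illustrative and not taken from the official code.

```python
import random


def parallel_retention(Q, K, V, gamma):
    """Parallel form: O[n] = sum over m <= n of gamma^(n-m) * (Q[n] . K[m]) * V[m]."""
    d_v = len(V[0])
    out = []
    for n in range(len(Q)):
        row = [0.0] * d_v
        for m in range(n + 1):  # causal mask: only positions m <= n contribute
            score = gamma ** (n - m) * sum(q * k for q, k in zip(Q[n], K[m]))
            for j in range(d_v):
                row[j] += score * V[m][j]
        out.append(row)
    return out


def recurrent_retention(Q, K, V, gamma):
    """Recurrent form: S[n] = gamma * S[n-1] + K[n]^T V[n]; O[n] = Q[n] S[n].

    The state S has a fixed size (d_k x d_v), so each decoding step costs
    the same regardless of how many tokens came before.
    """
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q, k, v in zip(Q, K, V):
        S = [[gamma * S[i][j] + k[i] * v[j] for j in range(d_v)] for i in range(d_k)]
        out.append([sum(q[i] * S[i][j] for i in range(d_k)) for j in range(d_v)])
    return out


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
Q, K, V = (random_matrix(6, 4, rng) for _ in range(3))
par = parallel_retention(Q, K, V, gamma=0.9)
rec = recurrent_retention(Q, K, V, gamma=0.9)
max_diff = max(abs(a - b) for ra, rb in zip(par, rec) for a, b in zip(ra, rb))
print(max_diff)  # expected to be at floating-point noise level
```

Both functions compute the same outputs. The parallel one suits training because every position can be computed at once, while the recurrent one suits decoding because it only carries the state S forward.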
Code link: https://github.com/microsoft/unilm
Paper link: https://arxiv.org/abs/2307.08621
A detailed unofficial overview of the paper:
https://andlukyane.com/blog/paper-review-retnet
#deeplearning #nlp #llm
👍13🔥4❤2👎1
Forwarded from Machinelearning
ECGAN is a new system for tackling the challenging task of semantic image synthesis.
🔥 Dataset: https://paperswithcode.com/dataset/cityscapes
ai_machinelearning_big_data
👍8🔥8❤3
Meta-Transformer: A Unified Framework for Multimodal Learning
The landscape of multimodal learning is about to witness a remarkable transformation with the introduction of Meta-Transformer, a state-of-the-art framework that's poised to overcome long-standing challenges in the field. The beauty of Meta-Transformer lies in its unique ability to process and understand information from a diverse range of modalities - from natural language, 2D images, 3D point clouds, to audio, video, time series, and tabular data. This ability stems from its innovative design that leverages a frozen encoder to map raw input data from these diverse modalities into a shared token space, eliminating the need for paired multimodal training data.
More than just a theoretical achievement, the Meta-Transformer has proven its practical application across various benchmarks, handling an impressive range of tasks from fundamental perception such as text, image, and audio processing, to more complex applications like X-Ray, infrared, and hyperspectral data interpretation, as well as data mining tasks involving graph, tabular, and time-series data.
Code link: https://github.com/invictus717/MetaTransformer
Paper link: https://arxiv.org/abs/2307.10802
A detailed unofficial overview of the paper:
https://andlukyane.com/blog/paper-review-meta-transformer
#deeplearning #nlp #transformer #cv
👍8👨‍💻6🔥3❤2
Forwarded from gonzo-обзоры ML статей
An interesting theoretical result on gradient descent complexity. I missed it before.
https://www.quantamagazine.org/computer-scientists-discover-limits-of-major-research-algorithm-20210817/
The Complexity of Gradient Descent: CLS = PPAD ∩ PLS
https://arxiv.org/abs/2011.01929
Quanta Magazine
Computer Scientists Discover Limits of Major Research Algorithm | Quanta Magazine
The most widely used technique for finding the largest or smallest values of a math function turns out to be a fundamentally difficult computational problem.
🔥6❤3👍2😎1
TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning
The deep learning arena is abuzz with the rise of models designed for tabular data problems, challenging the traditional dominance of gradient-boosted decision trees (GBDT) algorithms. Among these, retrieval-augmented tabular DL models, which gather relevant training data like nearest neighbors for better prediction, are gaining traction. However, these novel models have only shown marginal benefits over properly tuned retrieval-free baselines, sparking a debate on the effectiveness of the retrieval-based approach.
In response to this uncertainty, this groundbreaking work presents TabR, an innovative retrieval-based tabular DL model. This breakthrough was achieved by augmenting a simple feed-forward architecture with an attention-like retrieval component. Several overlooked aspects of the attention mechanism were highlighted, leading to major performance improvements. On a set of public benchmarks, TabR stole the show, demonstrating unparalleled average performance, becoming the new state-of-the-art on numerous datasets, and even outperforming GBDT models on a recent benchmark designed to favor them.
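The idea of an attention-like retrieval component can be illustrated with a toy example. The sketch below is not TabR's architecture. It is a generic, assumed setup where a query row attends over training rows with softmax weights and averages their labels. Using negative squared Euclidean distance as the similarity, the temperature parameter, and the function name are all choices made here for illustration.

```python
import math


def retrieval_predict(query, candidates, labels, temperature=1.0):
    """Predict a label as a softmax-weighted average of candidate labels.

    Similarity is the negative squared Euclidean distance between the query
    and each candidate row (an assumption for this sketch), so closer
    training rows receive larger attention weights.
    """
    scores = [
        -sum((q - c) ** 2 for q, c in zip(query, cand)) / temperature
        for cand in candidates
    ]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, labels)) / total


# Hypothetical training rows and regression targets.
train_x = [[0.0, 0.0], [10.0, 10.0]]
train_y = [1.0, 5.0]

print(retrieval_predict([0.0, 0.0], train_x, train_y))  # dominated by the first row
print(retrieval_predict([5.0, 5.0], train_x, train_y))  # equidistant from both rows
```

In a full model, the retrieved signal would typically be combined with the output of a feed-forward encoder instead of being used directly as the prediction.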
Code link: https://github.com/yandex-research/tabular-dl-tabr
Paper link: https://arxiv.org/abs/2307.14338
A detailed unofficial overview of the paper:
https://andlukyane.com/blog/paper-review-tabr
#deeplearning #tabular
🔥21👍8❤4
Forwarded from Технологический Болт Генона
Cost Optimisation In The Cloud – Practical Design Steps For Architects and Developers – Part 1
https://automation.baldacchino.net/cost-optimising-your-architecture-on-azure-practical-design-steps-for-builders-to-cost-optimise-your-tech-stack/
Part 2 – Infrastructure Cost Optimisation In The Cloud – Practical Design Steps For Architects and Developers
https://automation.baldacchino.net/cost-optimising-your-architecture-on-azure-practical-design-steps-for-builders-to-cost-optimise-your-tech-stack-part-2/
Part 3 – Architectural Cost Optimisation – Practical Design Steps for Architects and Developers
https://automation.baldacchino.net/part-3-architectural-cost-optimisation-practical-design-steps-for-architects-and-developers/
🔥7👍3
Forwarded from ml4se
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
In the paper, the authors
- survey open problems and fundamental limitations of RLHF and related methods;
- overview techniques to understand, improve, and complement RLHF in practice; and
- propose auditing and disclosure standards to improve societal oversight of RLHF systems.
👍7❤2
Tracking Anything in High Quality
Visual object tracking, a cornerstone of computer vision, is being revolutionized by the ever-increasing power of perception algorithms, facilitating the unification of single/multi-object and box/mask-based tracking. In this thrilling technological panorama, the Segment Anything Model stands out, drawing significant attention from researchers around the globe.
HQTrack consists of a video multi-object segmenter (VMOS) and a mask refiner (MR). Given an object in the initial frame, VMOS propagates the object masks to the current frame. Its initial results may be imperfect due to limited training data, so the MR refines them and significantly improves the tracking mask quality. HQTrack took second place in the Visual Object Tracking and Segmentation challenge without resorting to tricks such as test-time data augmentation or model ensembles.
Code link: https://github.com/jiawen-zhu/HQTrack
Paper link: https://arxiv.org/abs/2307.13974
A detailed unofficial overview of the paper:
https://andlukyane.com/blog/paper-review-hqtrack
#deeplearning #objectdetection #objecttracking
👍9❤4
Forwarded from Machinelearning
🦩 OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
An open-source framework for training large multimodal models.
OpenFlamingo is a family of autoregressive models for training Flamingo-style LMMs, ranging from 3B to 9B parameters.
OpenFlamingo can be used to caption an image or to generate key points based on an image. An advantage of this approach is the ability to adapt quickly to new tasks through in-context learning.
pip install open-flamingo
🖥 Github: https://github.com/mlfoundations/open_flamingo
📕 Paper: https://arxiv.org/abs/2308.01390
⭐️ Demo: https://huggingface.co/spaces/openflamingo/OpenFlamingo
☑️ Dataset: https://paperswithcode.com/dataset/flickr30k
ai_machinelearning_big_data
🔥8👍3❤2🤔1
Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
To tackle the generation latency of large language models (LLMs), a new approach called Skeleton-of-Thought (SoT) has been developed. Motivated by human thinking and writing processes, SoT guides an LLM to generate the "skeleton" of an answer first and then fill in the content of each skeleton point in parallel. The result is a remarkable speed-up of up to 2.39x across 11 different LLMs without losing the integrity of sequential decoding.
What sets SoT apart is its potential to improve answer quality in terms of diversity and relevance, shedding light on an exciting avenue in AI. As an initial attempt at data-centric optimization for efficiency, SoT showcases the fascinating possibility of having machines that can think more like humans.
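The two-stage flow can be sketched in a few lines of Python. This is an assumed, simplified illustration, not the paper's code: llm stands for any callable that maps a prompt string to a completion string, the prompt wording is invented, and fake_llm is a stand-in so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def skeleton_of_thought(question, llm, max_workers=4):
    """Stage 1: ask for a short skeleton. Stage 2: expand every point in parallel."""
    skeleton = llm(f"Give a short numbered skeleton for answering: {question}")
    points = [line.strip() for line in skeleton.splitlines() if line.strip()]

    def expand(point):
        # Each point is expanded independently, so the calls can run concurrently.
        return llm(
            f"Question: {question}\n"
            f"Expand this skeleton point in 1-2 sentences:\n{point}"
        )

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        expansions = list(pool.map(expand, points))  # map preserves input order

    return "\n".join(f"{p} {e}" for p, e in zip(points, expansions))


def fake_llm(prompt):
    """Hypothetical stand-in for a real model call, used only for the demo."""
    if prompt.startswith("Give a short numbered skeleton"):
        return "1. Define the term\n2. Give an example"
    return f"(details on: {prompt.splitlines()[-1]})"


print(skeleton_of_thought("What is overfitting?", fake_llm))
```

With a real model behind llm, the latency gain would come from the expansion calls running concurrently, whether as parallel API requests or as a batched decode.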
Paper link: https://arxiv.org/abs/2307.15337
A detailed unofficial overview of the paper:
https://andlukyane.com/blog/paper-review-sot
#deeplearning #nlp #llm
👍12❤4🔥3
Forwarded from ml4se
PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback
In this paper, the authors introduce a novel framework, namely RRTF (Rank Responses to align Test&Teacher Feedback), and present a new Code LLM, namely PanGu-Coder2. First, they adopt the Evol-Instruct technique to obtain a substantial amount of high-quality natural language instruction and code solution data pairs. Then, they train the base model by ranking candidate code solutions using feedback from test cases and heuristic preferences.
Through comprehensive evaluations on HumanEval, CoderEval, and LeetCode benchmarks, PanGu-Coder2 achieves new state-of-the-art performance among billion-parameter-level Code LLMs, surpassing all of the existing ones by a large margin.
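The "feedback from test cases" part of the ranking step can be pictured with a small sketch. This is only an assumed illustration of ordering candidate solutions by test pass rate. It does not reproduce the RRTF training objective or the teacher preferences, and all names and data below are hypothetical.

```python
def pass_rate(candidate, test_cases):
    """Fraction of test cases a candidate passes; an exception counts as a failure."""
    passed = 0
    for args, expected in test_cases:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass
    return passed / len(test_cases)


def rank_candidates(candidates, test_cases):
    """Return candidate names ordered from highest to lowest pass rate."""
    scored = [(pass_rate(fn, test_cases), name) for name, fn in candidates.items()]
    # sorted() is stable, so ties keep their original insertion order.
    return [name for _, name in sorted(scored, key=lambda item: -item[0])]


def crashing(a, b):
    raise ValueError("always fails")


# Hypothetical candidate solutions for the task "add two numbers".
candidates = {
    "buggy": lambda a, b: a - b,
    "crashing": crashing,
    "correct": lambda a, b: a + b,
}
test_cases = [((1, 2), 3), ((0, 0), 0), ((2, 2), 4)]

print(rank_candidates(candidates, test_cases))
```

A ranking like this could then serve as the supervision signal for a ranking-based fine-tuning loss.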
👍5❤2
AI Index: An opportunity for AI development
The National Centre for the Development of Artificial Intelligence has launched a nationwide study to determine the index of readiness of domestic organizations to implement artificial intelligence.
The AI Readiness Index will be calculated across several areas: the use of AI in organizations, the maturity of infrastructure and data management, the availability of human resources and existing competencies, and a number of other areas that will show how accessible AI technologies are for businesses.
The study runs until 31 August 2023 and is confidential. The results will be posted in aggregated form on the National AI Portal: https://ai.gov.ru/.
At the moment, many SMEs do not have sufficient capabilities to implement AI. This study will help shape support measures for businesses. We cannot claim that it will make AI implementation available to all companies, but it is an opportunity to give impetus to strengthening and developing state support for the industry. Each of us can contribute to the common cause of AI development in Russia by taking the survey and participating in the research.
Here is the link: https://aibe.wciom.ru/.
aibe.wciom.ru
VCIOM AI
VCIOM: a survey on AI
🥰9👍6💩5🤡4❤2🥱2
Forwarded from Machinelearning
🚀 AgentBench: Evaluating LLMs as Agents.
AgentBench is a multi-dimensional, evolving benchmark that currently consists of 8 distinct environments for assessing the reasoning and decision-making abilities of LLMs as agents in a multi-turn, open-ended generation setting.
A comprehensive benchmark for evaluating LLM agents.
🖥 Github: https://github.com/thudm/agentbench
📕 Paper: https://arxiv.org/abs/2308.03688v1
☑️ Dataset: https://paperswithcode.com/dataset/alfworld
ai_machinelearning_big_data
👍8🔥5❤1