Forwarded from gonzo-обзоры ML статей
In the meantime, some slides from my talks on NLP in 2022
https://docs.google.com/presentation/d/1m7Wpzaowbvi2je6nQERXyfQ0bzzS0dD0OArWznfOjHE/edit
Google Docs
NLP in 2022 / Intento
NLP in 2022 Grigory Sapunov Internal talks / 2023.03.01-2023.03.08
Hyena Hierarchy: Towards Larger Convolutional Language Models
Attention has been a cornerstone of deep learning, but it comes at a steep cost: its compute grows quadratically with sequence length, which limits how much context a model can access. Existing subquadratic methods, such as low-rank and sparse approximations, have needed dense attention layers mixed in to reach comparable quality. That's where Hyena comes in!
Hyena is a subquadratic drop-in replacement for attention that combines implicitly parametrized long convolutions with data-controlled gating. On recall and reasoning tasks over long sequences it significantly improves accuracy over other attention-free operators, matching attention-based models.
Hyena also sets a new state of the art for dense-attention-free architectures in language modeling, reaching Transformer quality with 20% less training compute at sequence length 2K. Hyena operators are twice as fast as optimized attention at sequence length 8K and 100x faster at sequence length 64K.
Paper: https://arxiv.org/abs/2302.10866
Code link: https://github.com/HazyResearch/safari
Project link: https://hazyresearch.stanford.edu/blog/2023-03-07-hyena
A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-hyena
#deeplearning #nlp #cv #languagemodel #convolution
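A toy illustration of the two ingredients the Hyena post names, a long convolution whose filter is generated from position rather than stored, and data-controlled gating. This is a sketch under loud assumptions, not the authors' implementation: the function names, the decaying-cosine filter and its constants are all hypothetical, and the convolution is computed directly in O(L^2) for readability, whereas the paper relies on FFT-based convolution and a learned filter.

```python
import math


def implicit_filter(length, decay=0.3, freq=2.0):
    """Hypothetical implicit filter: weights are computed from the position t
    by a small function instead of being stored one parameter per position."""
    return [math.exp(-decay * t) * math.cos(freq * t) for t in range(length)]


def causal_conv(u, h):
    """Causal convolution y[t] = sum over s <= t of h[s] * u[t - s].
    Direct O(L^2) evaluation, used here only for clarity."""
    return [sum(h[s] * u[t - s] for s in range(t + 1)) for t in range(len(u))]


def gated_long_conv(v, gate, h):
    """One gating step: multiply the convolved signal elementwise by a gate.
    In a real model the gate would be a projection of the input itself."""
    return [g * y for g, y in zip(gate, causal_conv(v, h))]


# Toy usage: an impulse input with an all-ones gate returns the filter itself.
seq_len = 4
h = implicit_filter(seq_len)
impulse_out = gated_long_conv([1.0, 0.0, 0.0, 0.0], [1.0] * seq_len, h)

# Causality check: changing the last input must not affect earlier outputs.
out_a = gated_long_conv([1.0, 2.0, 3.0, 4.0], [1.0] * seq_len, h)
out_b = gated_long_conv([1.0, 2.0, 3.0, 9.0], [1.0] * seq_len, h)
```

The filter has as many taps as the sequence is long, which is what makes the convolution "long"; the gate is what makes the operator depend on the data.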
Forwarded from ml4se
Tracking the Fake GitHub Star Black Market with Dagster, dbt and BigQuery
This is a simple Dagster project to analyze the number of fake GitHub stars on any GitHub repository:
https://github.com/dagster-io/fake-star-detector
dagster.io
Detecting Fake GitHub Stars with Dagster
Use Dagster, dbt, and BigQuery to analyze suspicious GitHub star activity and protect open-source credibility.
Interview with Ilya Sutskever
TLDR: theoretically, #chatgpt can learn a lot and eventually converge to #AGI given the proper dataset and the help of #RLHF (Reinforcement Learning from Human Feedback).
The video provides valuable insights into the current state and future of artificial intelligence. The conversation explores the progress of AI, its limitations, and the importance of reinforcement learning and ethics in AI development. Ilya also discusses the potential benefits of AI for democracy and its potential role in helping humans manage society. The interview is a thought-provoking overview of the AI landscape and a must-watch for anyone interested in the impact of AI on our lives and the world at large.
Youtube: https://www.youtube.com/watch?v=SjhIlw3Iffs
#youtube #Sutskever #OpenAI #GPTEditor
YouTube
The Mastermind Behind GPT-4 and the Future of AI | Ilya Sutskever
In this podcast episode, Ilya Sutskever, the co-founder and chief scientist at OpenAI, discusses his vision for the future of artificial intelligence (AI), including large language models like GPT-4.
Sutskever starts by explaining the importance of AI research…
lecun-20230324-nyuphil.pdf
30.5 MB
Do large language models need sensory grounding for meaning and understanding?
TLDR: Yes
Slides from a philosophical debate by Yann LeCun, who claimed that auto-regressive LLMs are exponentially diverging diffusion processes.
#LLM #YanLeCun
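The "exponentially diverging" claim can be made concrete with the argument from the slides: if each generated token has probability e of stepping outside the set of correct answers, an answer of n tokens stays correct with probability (1 - e)^n. A minimal sketch, assuming per-token errors are independent (the simplification the argument rests on); the function name and the sample values of e and n are illustrative only.

```python
def p_correct(e, n):
    """Probability that an n-token answer contains no erroneous token,
    assuming each token independently goes wrong with probability e."""
    return (1.0 - e) ** n


# Even a small per-token error rate compounds quickly with answer length.
per_token_error = 0.01
table = {n: p_correct(per_token_error, n) for n in (10, 100, 1000)}

for n, p in table.items():
    print(f"n={n:5d}  P(correct)={p:.6f}")
```

Whether real per-token errors are independent and unrecoverable is exactly the point under debate, so the numbers show the shape of the argument, not a measurement.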
ReBotNet: Fast Real-time Video Enhancement
The authors introduce a novel Recurrent Bottleneck Mixer Network (ReBotNet) method, designed for real-time video enhancement in practical scenarios, such as live video calls and video streams. ReBotNet employs a dual-branch framework, where one branch focuses on learning spatio-temporal features, and the other aims to enhance temporal consistency. A common decoder combines the features from both branches to generate the improved frame. This method incorporates a recurrent training approach that utilizes predictions from previous frames for more efficient enhancement and superior temporal consistency.
To assess ReBotNet, the authors use two new datasets that simulate real-world situations and show that their technique surpasses existing methods in terms of reduced computations, decreased memory requirements, and quicker inference times.
Paper: https://arxiv.org/abs/2303.13504
Project link: https://jeya-maria-jose.github.io/rebotnet-web/
A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-rebotnet
#deeplearning #cv #MachineLearning #VideoEnhancement #AI #Innovation #RealTimeVideo
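The recurrent idea described above, reusing the previous frame's prediction when enhancing the current frame, can be sketched as a simple loop. This is only an illustration of the data flow, not the ReBotNet architecture: `enhance_stream` and `toy_enhance` are hypothetical names, frames are plain lists of floats, and the "network" is a fixed blend standing in for the real dual-branch model.

```python
def toy_enhance(frame, prev_pred, alpha=0.5):
    """Hypothetical stand-in for the enhancement network: blends the current
    frame with the previous prediction, pixel by pixel."""
    return [alpha * f + (1.0 - alpha) * p for f, p in zip(frame, prev_pred)]


def enhance_stream(frames, enhance_fn):
    """Enhance frames one at a time, feeding each prediction back in
    as extra input for the next frame."""
    enhanced = []
    prev_pred = None
    for frame in frames:
        # The first frame has no history, so it serves as its own reference.
        reference = frame if prev_pred is None else prev_pred
        prev_pred = enhance_fn(frame, reference)
        enhanced.append(prev_pred)
    return enhanced


# Toy usage: a one-pixel "video" that jumps from 0 to 1.
result = enhance_stream([[0.0], [1.0], [1.0]], toy_enhance)
print(result)
```

Because each output depends on the previous output, abrupt changes are smoothed over time, which is the intuition behind the temporal-consistency benefit mentioned in the post.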
Forwarded from Spark in me (Alexander)
My experience with PyTorch 2.0 so far:
[1] - packaging?
[2] - compilation errors
We will test other models as well.
PyTorch Forums
How to serialize models with torch.compile properly
Hi, Despite the main points in the torch.compile pitch, we faced some issues with jit, but they were tolerable, and we adopted torch.jit.save and torch packages as a model serialization / obfuscation / freezing methods (and ONNX as well). It may be seen…
Sparks of Artificial General Intelligence: Early experiments with GPT-4
TLDR: Paper from #Microsoft research about #GPT4 showing something which can be considered signs of #AGI.
ArXiV: https://arxiv.org/abs/2303.12712
arXiv.org
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our...
Forwarded from Spark in me (Alexander)
Adobe does image generation
> Adobe announced a beta of Firefly, a generative ML tool for making images. Unlike MidJourney or Stable Diffusion (or Bing) this looks a lot more like an actual product - instead of typing 50-100 words into a box trying to refine your results, there are GUI tools and settings. It also has a much more clearly-defined set of training data - note that Getty is suing Stable Diffusion for training on its images without permission. In more normal times this would be a huge story - now it's only half way down the page.
https://firefly.adobe.com/?ref=lore.ghost.io
This really looks like a product. Also numerous tags and knobs are probably sourced from internal Adobe data.
Lots of networks here - upscaling, cycle-gan like domain transfers, inpainting, editing, plain generation, etc
I understand that their demos are probably cherry picked af, but proper product work is evident. Also probably this shows the real niche these tools are meant to occupy. Not the "AGI".
Also evident that the data requirements and scale to pull this off are huge.
Forwarded from ml4se
An AST-based Code Change Representation and its Performance in Just-in-time Vulnerability Prediction
Authors propose a novel way of representing changes in source code, the Code Change Tree, a form designed to keep only the differences between two abstract syntax trees of Java source code. The approach was evaluated on predicting whether a code change introduces a vulnerability: it was compared against multiple other representation types, using a number of machine learning models as baselines. The evaluation is done on a novel dataset, VIC.
RQ. 1 Can a vulnerability-introducing database generated from a vulnerability-fixing commit database be used for vulnerability prediction?
RQ. 2 How effective are Code Change Trees in representing source code changes?
RQ. 3 Are source code metrics sufficient to represent code changes?
dataset paper
VIC dataset
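To give a feel for "keeping only the differences between two abstract syntax trees", here is a minimal sketch. It is not the paper's Code Change Tree algorithm: it uses Python's built-in `ast` module as a stand-in for a Java parser, aligns child nodes purely by position, and the `change_tree` and `_dump` helpers are hypothetical names chosen for this example.

```python
import ast


def _dump(node):
    """Render a subtree (or a list of subtrees) in a readable form."""
    if isinstance(node, ast.AST):
        return ast.dump(node)
    if isinstance(node, list):
        return [_dump(item) for item in node]
    return node


def change_tree(old, new):
    """Return a nested dict holding only the parts where two ASTs differ,
    or None when the subtrees are identical."""
    if type(old) is not type(new):
        # Different node kinds (or one side missing): keep both sides whole.
        return {"old": _dump(old), "new": _dump(new)}
    if isinstance(old, ast.AST):
        diffs = {}
        for field in old._fields:
            d = change_tree(getattr(old, field, None), getattr(new, field, None))
            if d is not None:
                diffs[field] = d
        return {type(old).__name__: diffs} if diffs else None
    if isinstance(old, list):
        diffs = {}
        for i in range(max(len(old), len(new))):
            a = old[i] if i < len(old) else None
            b = new[i] if i < len(new) else None
            d = change_tree(a, b)
            if d is not None:
                diffs[i] = d
        return diffs or None
    # Leaf values such as identifiers and constants.
    return None if old == new else {"old": old, "new": new}


before = ast.parse("def f(x):\n    return x + 1\n")
after = ast.parse("def f(x):\n    return x + 2\n")
diff = change_tree(before, after)
print(diff)
```

Everything the two versions share (the function name, its arguments, the `x +` part) drops out, and only the path down to the changed constant survives. Positional alignment is a deliberate simplification; inserted or reordered statements would need a proper tree-matching step.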
Twitter Recommendation Algorithm
#Twitter disclosed the sources of its recommendation engine.
GitHub: https://github.com/twitter/the-algorithm
Blog post: https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm
#recommenders #recsys #recommendation
Forwarded from ml4se
CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X
CodeGeeX is a multilingual model with 13 billion parameters for code generation. It is pre-trained on 850 billion tokens of 23 programming languages.
- Multilingual Code Generation: CodeGeeX has good performance for generating executable programs in several mainstream programming languages, including Python, C++, Java, JavaScript, Go, etc.
- Crosslingual Code Translation: CodeGeeX supports the translation of code snippets between different languages.
- Customizable Programming Assistant: CodeGeeX is available in the VS Code extension marketplace for free. It supports code completion, explanation, summarization and more, which empower users with a better coding experience.
- Open-Source and Cross-Platform: All code and model weights are publicly available for research purposes. CodeGeeX supports both Ascend and NVIDIA platforms and can run inference on a single Ascend 910, NVIDIA V100 or A100.
GitHub
Data Science by ODS.ai 🦜
Interactive and explorable explanations Collection of links to different explanations of how things work. Link: https://explorabl.es How network effect (ideas, diseases) works: https://meltingasphalt.com/interactive/going-critical/ How trust works: h…
Complexity Explorables
Another collection of interactive explorable explanations of complex systems in biology, physics, mathematics, social sciences, epidemiology, and ecology.
Link: https://www.complexity-explorables.org
The emergence of communities in weighted networks: https://www.complexity-explorables.org/explorables/jujujajaki-networks/
#interactive #demo #systems #explanations
Reliable ML track at Data Fest Online 2023
Call for Papers
Friends, we are glad to announce that Data Fest, the largest Russian-language conference on Data Science, run by the Open Data Science community, will take place in 2023 (at the end of May).
It will again have a track from the Reliable ML community. We are waiting for your talk proposals: write directly to me or Dmitry.
Track Info
The concept of Reliable ML is about what to do so that the work of data teams is, firstly, applicable in the business processes of the customer company and, secondly, beneficial to that company.
For this you need to be able to:
- correctly build a portfolio of projects (#business)
- think over the system design of each project (#ml_system_design)
- overcome various difficulties when developing a prototype (#tech #causal_inference #metrics)
- explain to the business that your MVP deserves a pilot (#interpretable_ml)
- conduct a pilot (#causal_inference #ab_testing)
- implement your solution in business processes (#tech #mlops #business)
- set up monitoring of the solution in the production environment (#tech #mlops)
If you have something to say on the topics above, write to us! If in doubt, write anyway. Many of the coolest reports of previous Reliable ML tracks have come about as a result of discussion and collaboration on the topic.
If you are not ready to make a report but want to listen to something interesting, you can still help! Repost to a relevant community / forward to a friend = participate in the creation of good content.
Registration and full information about Data Fest 2023 is here.
@Reliable ML
BloombergGPT: A Large Language Model for Finance
The realm of financial technology involves a wide range of NLP applications, such as sentiment analysis, named entity recognition, and question answering. Although Large Language Models (LLMs) have demonstrated effectiveness in various tasks, no LLM specialized for the financial domain has been reported so far. This work introduces BloombergGPT, a 50-billion-parameter language model trained on an extensive range of financial data. The researchers have created a massive 363-billion-token dataset using Bloomberg's data sources, supplemented with 345 billion tokens from general-purpose datasets, potentially creating the largest domain-specific dataset to date.
BloombergGPT has been validated on standard LLM benchmarks, open financial benchmarks, and a suite of internal benchmarks that accurately reflect its intended usage. The mixed dataset training results in a model that significantly outperforms existing models on financial tasks without sacrificing performance on general LLM benchmarks. The paper also discusses modeling choices, training processes, and evaluation methodology. As a next step, the researchers plan to release training logs (Chronicles) detailing their experience in training BloombergGPT.
Paper: https://arxiv.org/abs/2303.17564
A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-bloomberggpt
#deeplearning #nlp #transformer #sota #languagemodel #finance
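A quick arithmetic check of the dataset mix quoted above, using only the two token counts from the post (363 billion from Bloomberg's sources and 345 billion from general-purpose datasets). The variable names are illustrative.

```python
# Token counts in billions, as quoted in the post.
financial_b = 363
general_b = 345

total_b = financial_b + general_b
financial_share = financial_b / total_b
general_share = general_b / total_b

print(f"total: {total_b}B tokens")
print(f"financial: {financial_share:.1%}, general-purpose: {general_share:.1%}")
```

So the training corpus is split roughly in half between domain-specific and general text, which fits the post's point that the model keeps its performance on general LLM benchmarks.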
Forwarded from gonzo-обзоры ML статей
Stanford 2023 AI Index Report is published!
The section on machine translation is based on Intento data as usual :)
https://aiindex.stanford.edu/report/
Pandas v2.0.0
The main enhancements:
- installing optional dependencies with pip extras
- Index can now hold numpy numeric dtypes
- argument dtype_backend, to return pyarrow-backed or numpy-backed nullable dtypes
- copy-on-write improvements
- ..
+ other notable bug fixes
Full list of changes: https://pandas.pydata.org/docs/whatsnew/v2.0.0.html
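A short sketch touching each of the listed enhancements, assuming pandas 2.0 or newer is installed. The sample data and variable names are made up for illustration, and the `numpy_nullable` backend is used instead of `pyarrow` only so the example does not require pyarrow.

```python
import io

import pandas as pd  # assumes pandas >= 2.0

# Optional dependencies can be installed through pip extras, for example:
#   pip install "pandas[performance]>=2.0.0"

# 1. Index can hold NumPy numeric dtypes beyond the old int64/uint64/float64.
idx = pd.Index([1, 2, 3], dtype="int8")
index_dtype = str(idx.dtype)

# 2. dtype_backend asks readers to return nullable dtypes.
#    Column "a" has a missing value, yet stays an integer column.
csv_text = "a,b\n1,x\n,y\n"
df = pd.read_csv(io.StringIO(csv_text), dtype_backend="numpy_nullable")
column_dtype = str(df["a"].dtype)

# 3. Copy-on-write: modifying a derived object leaves its parent untouched.
with pd.option_context("mode.copy_on_write", True):
    frame = pd.DataFrame({"a": [1, 2, 3]})
    column = frame["a"]
    column.iloc[0] = 100
    first_value = int(frame.loc[0, "a"])

print(index_dtype, column_dtype, first_value)
```

Passing `dtype_backend="pyarrow"` works the same way when pyarrow is available and yields Arrow-backed columns instead.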
Kandinsky 2.1
by Sber & AIRI
The main features:
- 3.3B parameters
- generation resolution - 768x768
- image prior transformer
- new MoVQ image autoencoder
- additional training on a cleaner set of 172M text-image pairs
- work modes: generate by text, blend image, generate images by pattern, change images by text, inpainting/outpainting
The FID on the COCO_30k dataset reaches 8.21
A few posts comparing Kandinsky 2.1 with other similar models:
- https://t.iss.one/dushapitona/643
- https://t.iss.one/antidigital/6153
Habr: https://habr.com/ru/companies/sberbank/articles/725282/
Telegram-bot: https://t.iss.one/kandinsky21_bot
ruDALL-E: https://rudalle.ru/
MLSpace: https://sbercloud.ru/ru/datahub/rugpt3family/kandinsky-2-1
GH: https://github.com/ai-forever/Kandinsky-2
HF model: https://huggingface.co/ai-forever/Kandinsky_2.1
HF space: https://huggingface.co/spaces/ai-forever/Kandinsky2.1
FusionBrain: https://fusionbrain.ai/diffusion
Forwarded from Kier from TOP
Rask: a service for AI-supported video localization
TLDR: A service that translates videos end-to-end between languages.
Rask AI offers voice cloning so you can make your voice part of your brand, and it also has a library of natural, human-like voices to choose from. It currently supports video output in the following languages: German, French, Spanish, Chinese, English, and Portuguese, regardless of the source language.
In the near future, the team plans to offer additional services such as captions and subtitles and to increase the number of supported languages to 60.
They haven't raised any funding for the current setup and have just launched on Product Hunt. You are welcome to support them via the link below (we all know how important it is for founders, right?).
Website: https://www.rask.ai/
ProductHunt: https://www.producthunt.com/posts/rask-ai-video-localization-dubbing-app
#producthunt #aiproduct #localization