Forwarded from Мишин Лернинг 🇺🇦🇮🇱
🛋 BIG-bench: meet the benchmark and the paper «BEYOND THE IMITATION GAME»
The paper's title, «BEYOND THE IMITATION GAME», alludes to Turing's work; the paper presents a benchmark, current as of 2022, designed to quantitatively evaluate large language models such as GPT-3 or PaLM.
The current mainstream paradigm in NLP looks like this: «doesn't work at 1.3B? try 12B; doesn't work at 12B? take 175B, and so on. No need to look for new approaches: attention and parameters are all you need, as they say...»
🤔 But how do we evaluate these huge models?
To address this, 442 (WHAT?!) researchers from 132 organizations introduced the Beyond the Imitation Game benchmark (BIG-bench). Its tasks cover diverse topics: linguistics, child development, mathematics, biology, physics, social biases, software engineering, and so on.
BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.
🪑 BIG-bench
⚗️ Colab for evaluating T5
🧻 paper
@мишин лернинг
Stumbled upon the book "The Brain from Inside-Out" (2019) by G. Buzsáki after reading "Rhythms of the Brain" (btw, hit 🤔 if you would like to see a summary of the book in this channel).
Some gentlemen have kindly created a thorough document with every chapter of the 2019 book summarized and discussed:
[book club link]
Check out if you enjoy such neuroscience topics as:
- neural code
- oscillations
- memory coding
- systems/network neuroscience
- relation of action and cognition
#interesting #neuroscience
#07 Summary. A fast intracortical brain–machine interface with patterned optogenetic feedback
#bci #optogenetics
[ paper ]
⚡️ Briefly
The neuroengineers provided a proof-of-concept of a fast closed-loop brain-computer interface (BCI) in mice. They used a control signal from the motor cortex to control a virtual bar and stimulated the sensory cortex whenever the bar “touched” the mouse. The mouse successfully learned the behavioral task, relying solely on artificial inputs and outputs.
🔎 Contents:
- Research pipeline (recording, stimulation, processing, closed-loop setup, behavioral task)
- Achieved results and performance
- Limitations and future development
- Potential and final thoughts
👉 Summary [ link ]
Looking forward to your comments and suggestions! Next time - optogenetic brain-to-brain interface 😉
#08 Summary. Masked Autoencoder is all you need for any modality.
#deeplearning #ml
⚡️ Briefly
To solve complicated tasks, a machine learning algorithm should understand the data and extract good features from it. Training models that generalize usually requires a lot of annotated data, which is expensive and in some cases impossible to obtain.
The Masked Autoencoder (MAE) technique allows training a model on unlabeled data and obtaining surprisingly good feature representations for all common modalities.
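For intuition, here is a minimal sketch of the masked-autoencoding idea (an illustration only, not the official MAE code): randomly hide most patch tokens, encode only the visible ones, and ask a small decoder to reconstruct everything. The shapes, mask ratio, and simplified loss (computed on all patches instead of masked ones only) are assumptions.
```python
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    def __init__(self, dim=128, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        make_layer = lambda: nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(make_layer(), num_layers=2)
        self.decoder = nn.TransformerEncoder(make_layer(), num_layers=1)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.head = nn.Linear(dim, dim)  # predicts the (embedded) patch content

    def forward(self, patches):          # patches: (B, N, dim), already embedded
        B, N, D = patches.shape
        keep = int(N * (1 - self.mask_ratio))
        idx = torch.rand(B, N).argsort(dim=1)[:, :keep]      # random visible subset
        idx_exp = idx.unsqueeze(-1).expand(-1, -1, D)
        visible = torch.gather(patches, 1, idx_exp)
        encoded = self.encoder(visible)                      # encode visible patches only

        # Fill every position with the mask token, then put the encoded patches back.
        full = self.mask_token.repeat(B, N, 1).scatter(1, idx_exp, encoded)
        recon = self.head(self.decoder(full))
        return ((recon - patches) ** 2).mean()               # simplified reconstruction loss

loss = TinyMAE()(torch.randn(2, 196, 128))
loss.backward()
```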
🔎 Contents:
- Explanation of MAE approach
- Recipe for all domains
- Crazy experimental results for all types of data
👉 Summary [ link ]
Papers:
BERT : text
MAE : image
M3AE : image + text
MAE that Listen : audio spectrograms
VideoMAE : video
Looking forward to your comments and suggestions!
Next time - Mind blowing paper from Meta AI about speech reconstruction from noninvasive brain signals. 🔥🔥🔥
Join the new DS / DL community on Telegram. It's an alternative to the Open Data Science community.
So, if you are interested in deep learning, feel free to jump in. 🥳
https://t.iss.one/betterdatacommunity
✨ GPT-4 ✨
It was trained on both images and text, so very soon you will be able to use images as prompts in ChatGPT.
Bing uses GPT-4 under the hood.
Share your ideas about applications of a multimodal language model.
Link. https://openai.com/research/gpt-4
#09 Summary
🤖 RT-1: Robotics Transformer for Real-World Control at Scale
project page → link
⚡️Long story short:
RT-1 can efficiently perform everyday tasks with a robotic manipulator, following text instructions.
🤓Methodology:
The research employs imitation learning to train the agent, which is composed of pre-trained language and image models, along with a decoder for predicting actions.
📌 Key Components:
1. The model receives text instructions and derives sentence embeddings using a pre-trained T5 model.
2. It processes six images (the robot's environment) via EfficientNet, with text embedding integration as detailed in the paper.
3. Subsequently, RT-1 processes multimodal (text + images) features using a decoder-only model.
📊 Training:
RT-1 was trained in a supervised setting with the aim of predicting the next action, as a human annotator would. The dataset consists of 130k demonstrations across 744 tasks. During training, RT-1 is given six frames, resulting in 48 tokens (6x8) from image and text instructions.
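To make the "predict the next action" objective concrete, here is a minimal sketch of per-dimension action discretization, the general technique referenced in the insights below (an illustration, not RT-1's actual code; the bin count and action ranges are assumptions). Binning turns action prediction into a classification problem.
```python
import torch

def discretize(actions, low=-1.0, high=1.0, num_bins=256):
    """Map continuous actions in [low, high] to integer bin indices."""
    normed = (actions.clamp(low, high) - low) / (high - low)   # -> [0, 1]
    return (normed * (num_bins - 1)).round().long()            # -> {0, ..., num_bins - 1}

def undiscretize(bins, low=-1.0, high=1.0, num_bins=256):
    """Map bin indices back to approximate continuous actions."""
    return bins.float() / (num_bins - 1) * (high - low) + low

# Example: a 7-DoF action vector (arm + gripper), each dimension assumed in [-1, 1].
action = torch.tensor([0.12, -0.85, 0.0, 0.33, 0.99, -0.2, 0.5])
bins = discretize(action)      # classification targets for the decoder
approx = undiscretize(bins)    # what would be executed at inference time
```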
💃 Intriguing insights:
1. Auto-regressive methods tend to slow down and yield poorer performance.
2. Discretizing the action space enables solving classification problems rather than regression, and allows sampling from the prediction distribution.
3. Continuous actions perform worse in comparison.
4. Input tokens are computed only once, with overlapped inference applied.
5. Data diversity proves to be more critical than data quantity.
😅My thoughts:
The RT-1 model demonstrates impressive results in accomplishing everyday tasks based on text instructions.
AI models like RT-1, with their increasing abilities in complex tasks, may soon deserve human names, such as "Robert" for RT-1, to highlight their advancements.
FingerFlex: Inferring Finger Trajectories from ECoG signals
We are happy to share our preprint, FingerFlex. We propose a new state-of-the-art model for predicting finger movements from brain activity (ECoG).
✍️ paper
🧑💻 github
Authors: Vlad Lomtev, @kovalev_alvi, Alex Timchenko
Architecture.
We use a convolutional encoder-decoder architecture adapted for finger movement regression on electrocorticographic (ECoG) brain data. A simple U-Net-inspired architecture already shows great performance. As ECoG features, we use a wavelet transform.
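As a rough illustration of this kind of architecture (a minimal sketch only, not the FingerFlex code; channel counts, kernel sizes, and the number of wavelet bands are assumptions), a 1D convolutional encoder-decoder with a skip connection can map time-frequency ECoG features to finger trajectories:
```python
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    def __init__(self, in_ch=64 * 16, hidden=128, n_fingers=5):
        super().__init__()
        # in_ch = electrodes x wavelet bands, flattened into input channels
        self.down1 = nn.Sequential(nn.Conv1d(in_ch, hidden, 5, stride=2, padding=2), nn.GELU())
        self.down2 = nn.Sequential(nn.Conv1d(hidden, hidden, 5, stride=2, padding=2), nn.GELU())
        self.up1 = nn.Sequential(nn.ConvTranspose1d(hidden, hidden, 4, stride=2, padding=1), nn.GELU())
        self.up2 = nn.Sequential(nn.ConvTranspose1d(hidden * 2, hidden, 4, stride=2, padding=1), nn.GELU())
        self.head = nn.Conv1d(hidden, n_fingers, kernel_size=1)

    def forward(self, x):                            # x: (batch, in_ch, time)
        d1 = self.down1(x)                           # (B, hidden, T/2)
        d2 = self.down2(d1)                          # (B, hidden, T/4)
        u1 = self.up1(d2)                            # (B, hidden, T/2)
        u2 = self.up2(torch.cat([u1, d1], dim=1))    # U-Net-style skip connection
        return self.head(u2)                         # (B, n_fingers, T): trajectories

model = TinyEncoderDecoder()
ecog_features = torch.randn(2, 64 * 16, 256)         # 64 electrodes x 16 bands, 256 steps
pred = model(ecog_features)                          # -> (2, 5, 256)
```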
Data.
We use open-source motor-brain datasets: BCI Competition IV and Stanford dataset. These datasets have concurrently recorded brain activity and finger movements.
Results.
We beat all competitors on BCI Competition IV, with a correlation coefficient between true and predicted trajectories of up to 0.74.
🔥 Watch the video: a demonstration of our model on validation data.
ML papers | 01-13 June 2023
💎 Video + Text
Probabilistic Adaptation of Text-to-Video Models
What: fine-tune a large pretrained text-to-video model on a small set of domain-specific videos.
Complicated but interesting. You can fine-tune a pretrained diffusion model on your domain with a small additional block.
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
What: fine-tune an LLM for video + audio understanding.
A Q-Former extracts audio and video features, which are then fed into a pretrained LLaMA model.
🧬 Diffusion
Iterative α-(de)Blending: a Minimalist Deterministic Diffusion Model
What: proposes a simple implementation of, and intuition for, diffusion models.
A good starting point to dive into the field and try it on your own data.
💎 Audio Transformers
Simple and Controllable Music Generation
What: proposes a decoder for text-to-audio generation based on latent audio features.
They use vector quantization (VQ); check it out if you haven't heard of it.
It allows representing data with a limited number of codebook vectors.
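A minimal sketch of the vector-quantization step (an illustration only, not the MusicGen/EnCodec code; the codebook size and dimensions are assumptions): each continuous latent frame is replaced by its nearest codebook entry, so audio becomes a sequence of discrete token indices.
```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, codebook_size=1024, dim=128):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, z):                              # z: (batch, time, dim) latents
        # Distances between each latent frame and every codebook entry.
        dists = torch.cdist(z, self.codebook.weight[None].expand(z.size(0), -1, -1))
        indices = dists.argmin(dim=-1)                 # (batch, time) discrete tokens
        quantized = self.codebook(indices)             # nearest codebook vectors
        # Straight-through estimator so gradients flow back to the encoder
        # (commitment/codebook losses are omitted for brevity).
        quantized = z + (quantized - z).detach()
        return quantized, indices

vq = VectorQuantizer()
z = torch.randn(2, 50, 128)                            # e.g. latent audio frames
quantized, tokens = vq(z)                              # tokens: integer codes per frame
```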
💎 If you like this format, please say so in the comments.
#digest
Multimodal
Adds visual information to an LLM using trainable adapters.
Expands LLaMA-Adapter V1 to vision:
+ applies early fusion for visual tokens (see the sketch after this list);
+ calibrates the norm and bias parameters of the LLM;
+ fine-tunes on an image-text dataset.
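A minimal sketch of what "early fusion of visual tokens" can look like (an assumption for illustration, not the LLaMA-Adapter V2 code; the dimensions are made up): visual features are projected into the LLM's embedding space and prepended to the text tokens, and only the small projection/adapter weights are trained.
```python
import torch
import torch.nn as nn

class VisualPrefix(nn.Module):
    def __init__(self, vision_dim=1024, llm_dim=4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)   # the only trainable part here

    def forward(self, image_feats, text_embeds):
        # image_feats: (B, n_patches, vision_dim); text_embeds: (B, T, llm_dim)
        visual_tokens = self.proj(image_feats)       # map into the LLM token space
        return torch.cat([visual_tokens, text_embeds], dim=1)  # early fusion

fuse = VisualPrefix()
fused = fuse(torch.randn(2, 10, 1024), torch.randn(2, 32, 4096))  # -> (2, 42, 4096)
```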
Audio
Compresses natural audio into discrete tokens with a VQ technique.
Trains a universal compression model on all kinds of audio: speech, music, noise.
+ adds vector quantization;
+ adds an adversarial (GAN) loss.
A generative audio "diffusion" model trained on 50k hours of data.
Uses Flow Matching, which is similar to diffusion but better.
Masked training setting with context information. The model can synthesize speech, remove noise, edit content, and more.
Neuro
Decodes a tonal language from ECoG data with CNN-LSTM models.
Adapts a multi-stream model -> looks unnecessarily complicated.
Records small datasets: overall, 10 minutes per patient for 8 different syllables.
#digest
Introducing a motor interface for amputees
This is the first AI model for decoding precise finger movements in people with hand amputation. It uses only 8 surface EMG electrodes.
ALVI Interface can decode different types of movements in virtual reality:
🔘 finger flexion
🔘 finger extension
🟣 typing
🟣 some more
💎Full demo: YouTube link
Subscribe and follow the further progress of ALVI Labs:
Twitter: link
Instagram: link
Meet the new ALVI Interface: a breakthrough in intuitive prosthetic control.
This technology offers individuals with hand differences a new movement experience:
✨ Wrist rotation.
🖐 Finger movement.
🕹 Interaction with objects in VR.
Discover how we're turning futuristic dreams into today's reality. Be among the first to step into this new era of possibilities.
Our demo:
https://youtu.be/Dx_6Id2clZ0?si=jF9pX3u7tSiKobM5
Twitter/X
P. S. We are going to build a hand prosthesis powered by ALVI Interface. We need your support to do it.
How to adapt MAE for multimodal data generation?
Background.
We know that MAE is a very good approach for learning strong features, and it can be used for several modalities. For example, M3AE, trained on paired text and images, learns transferable representations and outperforms a plain MAE: M3AE
There is also MaskGIT, a model that uses MAE-style masking for generation. In short, it unmasks an image step by step (like diffusion, but without diffusion).
And Google published the Muse paper (3 Jan 2023), in which they adapt MaskGIT to text-to-image generation.
Very interesting idea 💡
A few words about my research: I'm developing a prosthesis control system at ALVI Labs. My goal is to decode hand motions from muscle activity (EMG).
So what if we merge these approaches?
In my opinion, using a multimodal MAE for concurrent EMG and motion prediction might be... well, promising.
How?
1) Just use the M3AE pipeline to get better MAE features for muscle activity.
2) Add the MaskGIT generation pipeline (see the sketch right after this list).
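A minimal sketch of what the MaskGIT-style step could look like here (purely an assumption about how the idea might be implemented, not existing code): generate motion tokens conditioned on EMG tokens by committing the most confident predictions a few positions at a time. The mask id, codebook size, and schedule are made-up placeholders; `model` stands for any bidirectional transformer that returns per-position logits over the motion codebook.
```python
import torch

MASK_ID = 0  # hypothetical id reserved for the [MASK] token

@torch.no_grad()
def iterative_unmask(model, emg_tokens, motion_len, steps=8):
    B = emg_tokens.size(0)
    motion = torch.full((B, motion_len), MASK_ID, dtype=torch.long)
    per_step = max(1, motion_len // steps)                 # tokens committed per step
    for _ in range(steps):
        logits = model(emg_tokens, motion)                 # (B, motion_len, codebook_size)
        conf, pred = logits.softmax(-1).max(-1)            # best guess + its confidence
        conf = conf.masked_fill(motion != MASK_ID, -1.0)   # rank only still-masked slots
        top = conf.topk(per_step, dim=1).indices           # most confident masked positions
        motion.scatter_(1, top, pred.gather(1, top))       # commit those predictions
    return motion

# Toy usage with a random stand-in "model" and a 512-entry motion codebook.
dummy = lambda emg, motion: torch.randn(motion.size(0), motion.size(1), 512)
motion_tokens = iterative_unmask(dummy, torch.randint(1, 512, (2, 100)), motion_len=64)
```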
Expected results
A universal model that can unmask/generate both modalities:
- EMG -> Motion
- Motion -> EMG
- Better EMG features.
What do you think?☺️
P.S. In theory, it might be used for any data: ECoG, Spikes, EEG, fMRI and so on.
How to adapt MAE for multimodal data generation?
Architectures of M3AE and MaskGIT.
Recent advances in neural population decoding
In this recent paper (NeurIPS), the researchers adapted transformers for flexible decoding of large-scale spiking populations. The model, POYO-1, was trained across sessions and participants (non-human primates), making it effectively a good pre-trained model for whichever decoding paradigm you might want to use. Transfer learning across sessions and participants is enabled in a very "DL engineer" manner: just create an embedding of the session and process it alongside the spike embeddings - it seems to work well.
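A minimal sketch of that trick (my assumption of how it can be wired up, not the POYO-1 code; unit/session counts and dimensions are invented): a learned session embedding is prepended to the spike-token embeddings so that one transformer can be trained across many sessions and subjects.
```python
import torch
import torch.nn as nn

class SpikesWithSessionEmbedding(nn.Module):
    def __init__(self, n_units=512, n_sessions=40, dim=128):
        super().__init__()
        self.unit_emb = nn.Embedding(n_units, dim)        # one embedding per recorded unit
        self.session_emb = nn.Embedding(n_sessions, dim)  # one embedding per session
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2)

    def forward(self, unit_ids, session_id):
        # unit_ids: (batch, n_spikes) which unit fired; session_id: (batch,)
        tokens = self.unit_emb(unit_ids)                    # (B, S, dim)
        session = self.session_emb(session_id)[:, None, :]  # (B, 1, dim)
        tokens = torch.cat([session, tokens], dim=1)        # prepend the session token
        return self.encoder(tokens)                         # contextualized features

model = SpikesWithSessionEmbedding()
feats = model(torch.randint(0, 512, (2, 30)), torch.tensor([3, 17]))  # -> (2, 31, 128)
```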
Unfortunately, they have not released the code yet, but in the meantime you can check the project's website.
One potential limitation is that the data were collected from M1, PMd, and S1, which have been shown to contain the task-relevant information, usually in a population-vector coding format. Whether the pre-trained transformer generalizes to, for example, auditory cortex for decoding imagined speech remains unclear.
To me it seems like a hype paper; I cannot wait until the code/weights are released to try it out. Let us know in the comments which pitfalls you notice - it always feels good to debunk crowd decision-making results 🙃
***
If you're instead interested in applying a model to any kind of data to infer low-dimensional trajectories representing behavior, check out CEBRA (paper). It allows you to use any kind of time-series data (calcium imaging, spikes, voltages) alongside behavior. It also enables transfer learning and generalization across participants by explicitly encoding trial_ID, session_ID, and subject_ID for the model.
👉 Here is the website: https://cebra.ai/
A colleague from the Max Planck Institute in Tübingen talks about his foundational model of a cortical neuron, able to capture the rich dynamics of real neurons.
https://twitter.com/OpenNeuroMorph/status/1762523010895081807?s=19
https://arxiv.org/abs/2306.16922
Forwarded from Axis of Ordinary
DeepMind introduces SIMA: the first generalist AI agent to follow natural-language instructions in a broad range of 3D virtual environments and video games. 🕹 https://deepmind.google/discover/blog/sima-generalist-ai-agent-for-3d-virtual-environments/
"This research marks the first time an agent has demonstrated it can understand a broad range of gaming worlds, and follow natural-language instructions to carry out tasks within them, as a human might.
SIMA is an AI agent that can perceive and understand a variety of environments, then take actions to achieve an instructed goal. It comprises a model designed for precise image-language mapping and a video model that predicts what will happen next on-screen.
It requires just two inputs: the images on screen, and simple, natural-language instructions provided by the user.
The ability to function in brand new environments highlights SIMA’s ability to generalize beyond its training.
Our results also show that SIMA’s performance relies on language."
"This research marks the first time an agent has demonstrated it can understand a broad range of gaming worlds, and follow natural-language instructions to carry out tasks within them, as a human might.
SIMA is an AI agent that can perceive and understand a variety of environments, then take actions to achieve an instructed goal. It comprises a model designed for precise image-language mapping and a video model that predicts what will happen next on-screen.
It requires just two inputs: the images on screen, and simple, natural-language instructions provided by the user.
The ability to function in brand new environments highlights SIMA’s ability to generalize beyond its training.
Our results also show that SIMA’s performance relies on language."
❤3🔥2👍1