#02 Review.
Brain stimulation and imaging methods. Part 2.
📝 My Notes:
- Particularly excited about the possible sonogenetic approach: it overcomes the pitfalls of both optogenetics (invasive) and tFUS (non-specific), allowing precise information transfer in the brain
- tFUS closely resembles TMS (modulatory, not cell-type specific), but it has two advantages: it can stimulate deeper structures and has better spatial specificity. I'd say it's the next step in non-invasive neuromodulation!
🐎 Further reading:
Physics underlying tFUS
Original ultrasound focusing method article (2001)
----------------
{author} = @Altime
🔥 The next review is going to be about a very cool method of manipulating cell activity with light - optogenetics. Stay tuned!
#03 Review. Part 1.
Imagined speech can be decoded from low- and cross-frequency intracranial EEG features.
🧐 At a glance:
Brain-computer interfaces (BCIs) let you use your brain activity to control devices: a computer, a prosthesis, or even a speech vocoder. Researchers are actively investigating the capabilities of such interfaces. In this paper, the authors investigated the possibility of developing "language prostheses".
paper: link (published in Nature Communications)
code: link
data: not available, but you can try asking the authors. → link
🤿 Motivation:
This article is about decoding imagined speech. Speech restoration is one of the most significant and challenging tasks in BCI. There has been substantial progress in decoding explicit (overt) speech: a person says words out loud, and we can tell from their brain activity what was said.
However, imagined speech is harder to decode: it is not clear where to measure brain activity or how to process it (which features to use). These are the questions the authors try to answer:
What brain regions have the best decoding potential?
What's the most informative neural feature?
🍋 Main Ideas
Neuroscience views:
Actually, it's not easy to say how speech is made in our brains. Several theories exist.
Motor hypothesis. Imagined and overt speech share a similar articulatory plan in the brain.
Abstraction hypothesis. We can produce imagined speech without an explicit motor plan.
Flexible abstraction hypothesis. Imagined speech is phoneme-based (the sounds of language). In this case, neural activity depends on how each person imagines speech: through subarticulation or perceptually.
Experiment description. What did they do?
Electrocorticography (ECoG) - electrodes placed directly on the brain (like EEG but without annoying scalp). It is an invasive procedure.
Patients with implanted ECoG electrodes perform language tasks. There are 3 studies with different experimental protocols and different ECoG positions. In these tasks, participants imagine / speak / listen to certain words after a cue.
Feature extraction (a code sketch follows this list).
Frequency decomposition. Extract 4 frequency bands.
Cross-frequency coupling (CFC) links activity occurring at different rates (frequencies). The authors use the coupling between the phase of one band and the amplitude of another.
Analysis.
Compare brain activity across different tasks (listen vs speak vs imagine).
Determine the contribution of each region and feature to word decoding accuracy.
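To make the frequency-decomposition step concrete, here is a minimal sketch (not the authors' code) that band-limits a single ECoG channel into the four bands used in the paper and takes each band's instantaneous amplitude via the Hilbert transform; the sampling rate and filter settings are illustrative assumptions.

```python
# Minimal feature-extraction sketch; band edges come from the summary,
# everything else (sampling rate, filter order) is an illustrative assumption.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

FS = 400  # Hz, assumed sampling rate
BANDS = {"theta": (4, 8), "low_beta": (12, 18),
         "low_gamma": (25, 35), "bha": (80, 150)}

def band_amplitude(x, lo, hi, fs=FS, order=4):
    """Band-pass one channel and return its instantaneous amplitude."""
    b, a = butter(order, [lo, hi], btype="band", fs=fs)
    return np.abs(hilbert(filtfilt(b, a, x)))

ecog = np.random.randn(10 * FS)  # placeholder single-channel signal
features = {name: band_amplitude(ecog, lo, hi) for name, (lo, hi) in BANDS.items()}
```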
📈 Experiment insights / Key takeaways:
The researchers found that overt and imagined speech production have different dynamics and neural organization. The biggest difference was that broadband high-frequency activity (BHA) in the superior temporal cortex increased during overt speech but decreased during imagined speech. The high-frequency band is the best for telling overt and imagined speech apart.
This means that transferring language decoding from overt to imagined speech might be tricky: we cannot simply train a model on overt speech and use it to decode imagined speech.
Difference and similarity.
- Superior temporal lobe: BHA increased during overt speech but decreased during imagined speech.
- Sensorimotor region: BHA increased in both cases.
- Left inferior and right anterior temporal lobe: strong CFC involving the theta phase in both cases.
#03 Review. Part 2.
Imagined speech can be decoded from low- and cross-frequency intracranial EEG features.
Decoding features part.
They showed that the high-frequency band (BHA) provides the best performance for overt speech decoding.
Low-frequency bands decode imagined and overt speech at approximately the same level.
The beta band is a good feature for decoding imagined speech, both in terms of power and CFC; decoding imagined speech is largely possible thanks to CFC.
Decoding worked better when using not only the articulatory (sensorimotor) cortex, suggesting that imagined speech is defined at the phonemic rather than purely motor level.
ECoG signal analysis.
Signal processing (a code sketch of this chain follows the list):
- High-pass filter at 0.5 Hz to remove DC shifts + notch filters at 60, 120 Hz and harmonics.
- Common average re-referencing + downsampling to 400 Hz (with anti-aliasing).
- Morlet wavelet transform extracting 4 bands: theta (4–8 Hz), low-beta (12–18 Hz), low-gamma (25–35 Hz), and broadband high-frequency activity (BHA, 80–150 Hz).
- Cross-frequency coupling. They use phase-amplitude coupling, computed between the phase of one band and the amplitude of a higher-frequency band. It measures the interaction between frequency bands; here, how strongly the higher-frequency oscillation is locked to the lower-frequency phase.
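A minimal SciPy sketch of this preprocessing chain is below; only the cutoff frequencies and the 400 Hz target rate come from the summary above, while the raw sampling rate and filter orders are assumptions.

```python
# Preprocessing sketch: high-pass, notch, common-average re-reference,
# anti-aliased downsampling. FS_RAW and filter orders are assumptions.
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, decimate

FS_RAW = 2000   # Hz, assumed raw ECoG sampling rate
FS_OUT = 400    # Hz, target rate from the paper

def preprocess(ecog):
    """ecog: array of shape (n_channels, n_samples) at FS_RAW."""
    # 1) remove slow DC drifts with a 0.5 Hz high-pass filter
    b, a = butter(2, 0.5, btype="highpass", fs=FS_RAW)
    x = filtfilt(b, a, ecog, axis=-1)
    # 2) notch out line noise and its harmonics (60, 120, 180 Hz)
    for f0 in (60, 120, 180):
        bn, an = iirnotch(f0, Q=30, fs=FS_RAW)
        x = filtfilt(bn, an, x, axis=-1)
    # 3) common-average re-reference across channels
    x = x - x.mean(axis=0, keepdims=True)
    # 4) downsample to 400 Hz; decimate applies an anti-aliasing filter first
    return decimate(x, FS_RAW // FS_OUT, axis=-1)

raw = np.random.randn(8, 10 * FS_RAW)   # 8 channels, 10 s of placeholder data
clean = preprocess(raw)                  # -> shape (8, 10 * FS_OUT)
```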
✏️ My Notes:
First, I think it is important that the authors studied the neuroscience side of language decoding rather than focusing on algorithm development. However, the decoding accuracy is not very high, and it would be interesting to apply advanced ML algorithms to their datasets.
Improvements and next steps:
- It is essential to develop adaptive algorithms, since ECoG electrode positions differ significantly across patients.
- It would be interesting to explore how neural networks could learn such coupling automatically; transformers are worth investigating for that purpose.
- For CFC calculation, it might be useful to use transfer entropy (or other causality metrics) between phases and amplitudes, since it is a time-resolved measure.
- We could also use the dynamic PAC (time-resolved) algorithm for online CFC extraction (a baseline PAC sketch follows this list). site
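As a reference point for these time-resolved alternatives, here is a sketch of the standard, non-time-resolved phase-amplitude coupling estimate (the mean vector length); it is an illustration only, not the paper's exact implementation.

```python
# Mean-vector-length PAC between a low-frequency phase and a high-frequency
# amplitude; band edges and sampling rate are illustrative assumptions.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, lo, hi, fs, order=4):
    b, a = butter(order, [lo, hi], btype="band", fs=fs)
    return filtfilt(b, a, x)

def pac_mvl(x, fs, phase_band=(4, 8), amp_band=(80, 150)):
    """Mean vector length of amplitude-weighted phase vectors."""
    phase = np.angle(hilbert(bandpass(x, phase_band[0], phase_band[1], fs)))
    amp = np.abs(hilbert(bandpass(x, amp_band[0], amp_band[1], fs)))
    return np.abs(np.mean(amp * np.exp(1j * phase)))

print(pac_mvl(np.random.randn(4000), fs=400))  # near zero for white noise
```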
Author: @koval_alvi
Medium : link
#03 Review. Part 2.
Imagined speech can be decoded from low- and cross-frequency intracranial EEG features.
🔥Visual part
#04 Review.
Neurons learn by predicting future activity
📑 paper (Nature Machine Intelligence, 2022)
💡 "Neurons have intrinsic predictive learning rule, which is updating synaptic weights (strength of connections) based on minimizing “surprise”: difference between actual and predicted activity. This rule optimizes energy balance of a neuron"
What did the authors do to show that this could indeed be the case?
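Before following the link, here is a toy numerical illustration of the core claim (my own sketch, not the paper's model): a single linear unit predicts its activity from its inputs and nudges its weights to shrink the prediction error, i.e., the "surprise".

```python
# Toy delta-rule-style update used only to show the direction of the idea.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)                 # synaptic weights of one unit
lr = 0.01                              # learning rate

for _ in range(1000):
    x = rng.normal(size=5)             # presynaptic input
    predicted = w @ x                  # unit's prediction of its own activity
    actual = 2.0 * x[0] - x[2]         # "true" future activity (arbitrary target)
    surprise = actual - predicted      # prediction error the rule minimizes
    w += lr * surprise * x             # weight update proportional to surprise

print(np.round(w, 2))                  # approaches [2, 0, -1, 0, 0]
```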
🔥 Read the full review using free Medium link:
https://medium.com/@timchenko.alexey/bdb51a7a00cf?source=friends_link&sk=97920dd2d602e9187bd8fabeb1b39a0b
Feel free to comment on anything that caught your attention or that you didn't quite understand. I want these reviews to be concise and clear, so your feedback is highly appreciated ☺️
#05 Review.
#bci #deeplearning
Neuroprosthesis for Decoding Speech in a Paralyzed Person with Anarthria → paper
Cool video about it → video
🧐 At a glance:
Anarthria (the inability to articulate speech) makes it hard for paralyzed people to interact with the world. The opportunity to decode words and sentences directly from cerebral activity (ECoG) could give such patients a way to communicate.
The authors built an AI model to predict words from neural activity. They achieved 98% accuracy for speech detection and 47% for word classification across 50 classes (well above the 2% chance level).
🔥 Read the full review using free Medium link → medium
That is a very fascinating topic.
Here is a link to the research on a brain implant that enabled a completely "locked-in" man to communicate again. Check it out if you are interested ☺️
Link https://www.sciencealert.com/completely-locked-in-patient-with-als-communicates-again-with-a-brain-transplant
Hi everyone, we're really glad you're reading us. There are already 120 of you 🔥🔥
We have great news. We have created a blog on Medium where all our reviews are collected in one place and categorized.
Articles will also continue to be published on Telegram.
If you have a Medium subscription, follow this link and subscribe.
Any feedback would be welcome!
https://medium.com/the-last-neural-cell
#06 Summary.
Brain stimulation and imaging methods #2. Overview of optogenetics.
#neuroscience #neurostimulation
⚡️ Briefly
Optogenetics is a set of methods aimed at genetically modifying neurons to interact with light. Light-sensitive proteins allow for both stimulation and recording of neuronal activity with high spatial and temporal precision.
🔎 Contents:
- How optogenetics works
- Why is it useful: features & experimental insights
- Advantages & disadvantages
- Potential improvements
👉 Summary [ link ]
This is a method overview to lay the groundwork for the upcoming summaries on brain-to-brain and closed-loop interfaces in mice. Stay tuned 😎
That's awesome 🤩
I think that this benchmark can push development in Artificial General Intelligence (AGI).
It could be a next-gen Turing test.
For English docs, refer to the GitHub link in the post 😉
#interesting #ml
Forwarded from Мишин Лернинг 🇺🇦🇮🇱
🛋 BIG-Bench: meet the benchmark and the paper "BEYOND THE IMITATION GAME"
The paper's title, "Beyond the Imitation Game", alludes to Turing's work. It presents a benchmark, current as of 2022, designed to quantitatively evaluate large language models such as GPT-3 or PaLM.
The current mainstream paradigm in NLP goes like this: "doesn't work at 1.3B - try 12B; no luck at 12B - take 175B, and so on. No need to look for new approaches - attention and parameters are all you need, as they say..."
🤔 But how do we evaluate these huge models?
To address this, 442 (WHAT?!) researchers from 132 organizations introduced the Beyond the Imitation Game benchmark (BIG-bench). Its topics are diverse: linguistics, child development, mathematics, biology, physics, social biases, software engineering, and so on.
BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.
🪑 BIG-bench
⚗️ Colab for evaluating T5
🧻 paper
@мишин лернинг
Stumbled upon G. Buzsáki's book "The Brain from Inside Out" (2019) after reading "Rhythms of the Brain" (btw, hit 🤔 if you would like to see a summary of the book in this channel).
Some gentlemen have kindly created a thorough document with each chapter of the 2019 book summarized and discussed:
[book club link]
Check it out if you enjoy neuroscience topics such as:
- neural code
- oscillations
- memory coding
- systems/network neuroscience
- relation of action and cognition
#interesting #neuroscience
#07 Summary. A fast intracortical brain–machine interface with patterned optogenetic feedback
#bci #optogenetics
[ paper ]
⚡️ Briefly
The neuroengineers provided a proof of concept of a fast closed-loop brain-computer interface (BCI) in mice. They used a control signal from the motor cortex to move a virtual bar and stimulated the sensory cortex whenever the bar "touched" the mouse. The mouse successfully learnt the behavioral task relying solely on these artificial inputs and outputs.
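To make the loop concrete, here is a schematic sketch of the closed-loop idea only; all function names are hypothetical placeholders rather than the authors' software.

```python
# Schematic of the closed-loop idea; every function name is a placeholder.
def closed_loop_step(read_motor_rate, write_stimulation, bar_position,
                     gain=0.1, contact_position=1.0):
    """One iteration of the loop: decode, move the virtual bar, give feedback."""
    rate = read_motor_rate()               # decoded motor-cortex activity
    bar_position += gain * rate            # control signal moves the bar
    if bar_position >= contact_position:   # virtual "touch" event
        write_stimulation(pattern="touch") # patterned optogenetic feedback
        bar_position = 0.0                 # reset for the next trial
    return bar_position

# Dry run with stub I/O functions, just to show the loop executes:
pos = 0.0
for _ in range(20):
    pos = closed_loop_step(lambda: 1.5, lambda pattern: None, pos)
```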
🔎 Contents:
- Research pipeline (recording, stimulation, processing, closed-loop setup, behavioral task)
- Achieved results and performance
- Limitations and future development
- Potential and final thoughts
👉 Summary [ link ]
Looking forward to your comments and suggestions! Next time - optogenetic brain-to-brain interface 😉
#08 Summary. Masked Autoencoder is all you need for any modality.
#deeplearning #ml
⚡️ Briefly
To solve complicated tasks, a machine learning algorithm should understand the data and extract good features from it. Training models that generalize usually requires a lot of annotated data, which is expensive and in some cases impossible to obtain.
The masked autoencoder (MAE) technique allows training a model on unlabeled data and obtaining surprisingly good feature representations for all common modalities.
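A minimal PyTorch sketch of the masking-and-reconstruction idea (illustrative and far smaller than real MAE models): hide a random 75% of the patches, encode only the visible ones, reconstruct every position with a light decoder, and compute the loss on masked patches only.

```python
import torch
import torch.nn as nn

B, N, D, mask_ratio = 8, 16, 64, 0.75       # batch, patches, patch dim, masked fraction
n_keep = int(N * (1 - mask_ratio))

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True), num_layers=2)
decoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True), num_layers=1)
head = nn.Linear(D, D)                       # predicts raw patch values
pos = nn.Parameter(torch.zeros(1, N, D))     # learned positional embeddings
mask_token = nn.Parameter(torch.zeros(1, 1, D))

patches = torch.randn(B, N, D)               # patchified input (image/audio/video alike)

# Shuffle patch order; the first n_keep shuffled patches are "visible".
perm = torch.rand(B, N).argsort(dim=1)
ids_restore = perm.argsort(dim=1)
visible = torch.gather(patches + pos, 1,
                       perm[:, :n_keep, None].expand(-1, -1, D))

latent = encoder(visible)                    # the encoder sees visible patches only
dec_in = torch.cat([latent, mask_token.expand(B, N - n_keep, D)], dim=1)
dec_in = torch.gather(dec_in, 1, ids_restore[:, :, None].expand(-1, -1, D)) + pos
pred = head(decoder(dec_in))                 # reconstruction for every patch position

mask = torch.ones(B, N)                      # 1 = masked, 0 = visible (shuffled order)
mask[:, :n_keep] = 0
mask = torch.gather(mask, 1, ids_restore)    # back to the original patch order
loss = (((pred - patches) ** 2).mean(dim=-1) * mask).sum() / mask.sum()
print(loss)                                  # loss computed on masked patches only
```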
🔎 Contents:
- Explanation of MAE approach
- Recipe for all domains
- Crazy experimental results for all types of data
👉 Summary [ link ]
Papers:
BERT : text
MAE : image
M3MAE : image + text
MAE that listen : audio spectrograms
VideoMAE : video
Looking forward to your comments and suggestions!
Next time - a mind-blowing paper from Meta AI about speech reconstruction from non-invasive brain signals. 🔥🔥🔥
Join the new DS / DL community on Telegram. It's an alternative to the Open Data Science community.
So, if you are interested in deep learning, feel free to jump in. 🥳
https://t.iss.one/betterdatacommunity
✨ GPT 4 ✨
It was trained on images and text, so very soon you will be able to use images as prompts in ChatGPT.
Bing uses GPT-4 under the hood.
Share your ideas about applications of a multimodal language model.
Link. https://openai.com/research/gpt-4
#09 Summary
🤖 RT-1: Robotics Transformer for Real-World Control at Scale
project page → link
⚡️Long story short:
RT-1 can efficiently perform everyday tasks with a robotic arm manipulator, following text instructions.
🤓Methodology:
The research employs imitation learning to train the agent, which is composed of pre-trained language and image models, along with a decoder for predicting actions.
📌 Key Components:
1. The model receives text instructions and derives sentence embeddings using a pre-trained T5 model.
2. It processes six images (the robot's environment) via EfficientNet, with text embedding integration as detailed in the paper.
3. Subsequently, RT-1 processes multimodal (text + images) features using a decoder-only model.
📊 Training:
RT-1 was trained in a supervised setting with the aim of predicting the next action, as the human demonstrator did. The dataset consists of 130k demonstrations across 744 tasks. During training, RT-1 is given six frames, resulting in 48 tokens (6x8) from images and text instructions.
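A shape-level schematic of this pipeline as I read it is below; all modules are crude stand-ins (placeholder assumptions), not the released RT-1 code. Its only purpose is to show how six frames plus one instruction become 48 tokens (6 frames x 8 tokens per frame) feeding a transformer that predicts a discretized action.

```python
import torch
import torch.nn as nn

B, D = 2, 512                                    # batch size, assumed token width

text_embed = torch.randn(B, D)                   # sentence embedding from a pretrained T5 (stub)
frames = torch.randn(B, 6, 3, 224, 224)          # six RGB frames of the robot's environment

image_backbone = nn.Sequential(                  # stand-in for the EfficientNet image encoder
    nn.Conv2d(3, 32, kernel_size=7, stride=4),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, D))
to_tokens = nn.Linear(D, 8 * D)                  # stand-in for compressing each frame to 8 tokens

per_frame = image_backbone(frames.flatten(0, 1))               # (B*6, D)
per_frame = per_frame + text_embed.repeat_interleave(6, dim=0) # crude text conditioning
tokens = to_tokens(per_frame).view(B, 6 * 8, D)                # (B, 48, D): 6 frames x 8 tokens

transformer = nn.TransformerEncoder(             # stand-in for the decoder-only transformer
    nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True), num_layers=2)
action_logits = nn.Linear(D, 256)(transformer(tokens).mean(dim=1))  # one discretized action dim
print(action_logits.shape)                       # torch.Size([2, 256])
```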
💃 Intriguing insights:
1. Auto-regressive methods tend to slow down and yield poorer performance.
2. Discretizing the action space lets the model solve a classification problem rather than regression and allows sampling from the predicted distribution (see the sketch after this list).
3. Continuous actions perform worse in comparison.
4. Input tokens are computed only once, with overlapped inference applied.
5. Data diversity proves to be more critical than data quantity.
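A small illustration (not RT-1 code) of insight 2: binning each continuous action dimension turns action prediction into per-dimension classification and lets the model sample from the predicted distribution. The 256-bin count and the [-1, 1] action range here are illustrative assumptions.

```python
import numpy as np

N_BINS = 256
LOW, HIGH = -1.0, 1.0

def discretize(action):
    """Continuous action vector -> integer bin index per dimension (a class label)."""
    clipped = np.clip(action, LOW, HIGH)
    return np.floor((clipped - LOW) / (HIGH - LOW) * (N_BINS - 1)).astype(int)

def undiscretize(bins):
    """Bin indices -> bin-center continuous values (used after sampling a class)."""
    return LOW + (bins + 0.5) / N_BINS * (HIGH - LOW)

a = np.array([0.0, -1.0, 0.37])
print(discretize(a))                 # [127   0 174]
print(undiscretize(discretize(a)))   # values close to the original action
```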
😅My thoughts:
The RT-1 model demonstrates impressive results in accomplishing everyday tasks based on text instructions.
AI models like RT-1, with their increasing abilities in complex tasks, may soon deserve human names, such as "Robert" for RT-1, to highlight their advancements.
FingerFlex: Inferring Finger Trajectories from ECoG signals
We are happy to share our preprint, FingerFlex. We propose a new state-of-the-art model for predicting finger movements from brain activity (ECoG).
✍️ paper
🧑💻 github
Authors: Vlad Lomtev, @kovalev_alvi, Alex Timchenko
Architecture.
We use a convolutional encoder-decoder architecture adapted for finger-movement regression on electrocorticographic (ECoG) brain data and show strong performance with a simple U-Net-inspired architecture. As ECoG features, we use a wavelet transform.
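For intuition, here is a minimal sketch in the spirit of such a convolutional encoder-decoder (layer sizes are assumptions, not the actual FingerFlex model): it maps a window of ECoG wavelet features to five continuous finger trajectories.

```python
import torch
import torch.nn as nn

n_features, n_fingers = 64, 5        # e.g. channels x wavelet bands, 5 finger traces

model = nn.Sequential(
    nn.Conv1d(n_features, 32, kernel_size=5, stride=2, padding=2),  # encoder: downsample in time
    nn.ReLU(),
    nn.Conv1d(32, 64, kernel_size=5, stride=2, padding=2),
    nn.ReLU(),
    nn.ConvTranspose1d(64, 32, kernel_size=4, stride=2, padding=1), # decoder: upsample back
    nn.ReLU(),
    nn.ConvTranspose1d(32, n_fingers, kernel_size=4, stride=2, padding=1),
)

x = torch.randn(1, n_features, 400)  # one second of features at an assumed 400 Hz
print(model(x).shape)                # torch.Size([1, 5, 400])
```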
Data.
We use open-source motor-brain datasets: BCI Competition IV and the Stanford dataset. These datasets contain concurrently recorded brain activity and finger movements.
Results.
We outperform all competitors on BCI Competition IV, with a correlation coefficient between true and predicted trajectories of up to 0.74.
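For reference, here is a generic sketch of how the headline metric can be computed (not necessarily how it is computed in the repository): the Pearson correlation between true and predicted trajectories, averaged over the five fingers.

```python
import numpy as np

def trajectory_correlation(y_true, y_pred):
    """y_true, y_pred: arrays of shape (n_timepoints, n_fingers)."""
    per_finger = [np.corrcoef(y_true[:, i], y_pred[:, i])[0, 1]
                  for i in range(y_true.shape[1])]
    return float(np.mean(per_finger)), per_finger

# Example with synthetic trajectories:
t = np.linspace(0, 10, 1000)
y_true = np.stack([np.sin(t + i) for i in range(5)], axis=1)
y_pred = y_true + 0.5 * np.random.randn(*y_true.shape)
print(trajectory_correlation(y_true, y_pred)[0])
```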
🔥 Check out the video: a demonstration of our model on validation data.