Topic: RNN (Recurrent Neural Networks) – Part 3 of 4: LSTM and GRU – Solving the Vanishing Gradient Problem

---

1. Problem with Vanilla RNNs

• Vanilla RNNs struggle with long-term dependencies: gradients shrink exponentially as they are backpropagated through many time steps (the vanishing gradient problem).

• As a result, they effectively forget the early parts of a sequence as it grows longer; the sketch below illustrates this.
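
A minimal sketch of the effect using a plain nn.RNN on a random 100-step sequence (sizes and seed here are illustrative assumptions):

import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=8, hidden_size=8, batch_first=True)

x = torch.randn(1, 100, 8, requires_grad=True)  # one sequence, 100 time steps
out, _ = rnn(x)
out[:, -1].sum().backward()  # a loss that depends only on the final step

# With random weights, the gradient reaching early time steps is typically
# orders of magnitude smaller than at recent steps:
print("grad norm at t=0: ", x.grad[0, 0].norm().item())
print("grad norm at t=99:", x.grad[0, 99].norm().item())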

---

2. LSTM (Long Short-Term Memory)

LSTM networks introduce gates to control what information is kept, updated, or forgotten over time.

• Components:

* Forget Gate: decides what to discard from the cell state
* Input Gate: decides what new information to store
* Output Gate: decides what part of the cell state to expose as output
* Cell State: the long-term memory the gates read from and write to

• Equations (simplified):

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)  
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
C_t = f_t * C_{t-1} + i_t * C̃_t
h_t = o_t * tanh(C_t)
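
To make the equations concrete, here is a one-step sketch transcribed directly from them (the weight shapes and the lstm_step helper are illustrative assumptions, not a library API):

import torch

def lstm_step(x_t, h_prev, C_prev, W_f, W_i, W_o, W_C, b_f, b_i, b_o, b_C):
    z = torch.cat([h_prev, x_t], dim=-1)   # [h_{t-1}, x_t]
    f_t = torch.sigmoid(z @ W_f + b_f)     # forget gate
    i_t = torch.sigmoid(z @ W_i + b_i)     # input gate
    o_t = torch.sigmoid(z @ W_o + b_o)     # output gate
    C_tilde = torch.tanh(z @ W_C + b_C)    # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde     # keep some old memory, add some new
    h_t = o_t * torch.tanh(C_t)            # expose a filtered view of the memory
    return h_t, C_t

# Illustrative sizes: hidden H=4, input I=3; each W has shape (H + I, H).
H, I = 4, 3
h, C, x = torch.zeros(H), torch.zeros(H), torch.randn(I)
Ws = [torch.randn(H + I, H) for _ in range(4)]
bs = [torch.zeros(H) for _ in range(4)]
h, C = lstm_step(x, h, C, *Ws, *bs)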


---

3. GRU (Gated Recurrent Unit)

• A simplified version of the LSTM with fewer gates and no separate cell state:

* Update Gate: controls how much of the previous hidden state to keep
* Reset Gate: controls how much of the past to use when forming the new candidate

• More computationally efficient than LSTM, often with comparable results. A one-step sketch follows.
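
In the same style as the LSTM step above (weight shapes are illustrative assumptions; note that some references swap the roles of z_t and 1 - z_t):

import torch

def gru_step(x_t, h_prev, W_z, W_r, W_h, b_z, b_r, b_h):
    # W_z, W_r, W_h have shape (H + I, H) for hidden size H, input size I.
    z_in = torch.cat([h_prev, x_t], dim=-1)
    z_t = torch.sigmoid(z_in @ W_z + b_z)   # update gate
    r_t = torch.sigmoid(z_in @ W_r + b_r)   # reset gate
    h_tilde = torch.tanh(torch.cat([r_t * h_prev, x_t], dim=-1) @ W_h + b_h)
    return (1 - z_t) * h_prev + z_t * h_tilde   # blend old state and candidate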

---

4. LSTM/GRU in PyTorch

import torch.nn as nn

class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        # batch_first=True means inputs have shape (batch, seq_len, input_size)
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, (h_n, _) = self.lstm(x)
        # h_n[-1] is the final hidden state of the last layer: (batch, hidden_size)
        return self.fc(h_n[-1])
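
A quick shape check (the sizes here are arbitrary assumptions):

import torch

model = LSTMModel(input_size=10, hidden_size=32, output_size=2)
x = torch.randn(4, 15, 10)   # (batch=4, seq_len=15, features=10)
print(model(x).shape)        # torch.Size([4, 2])

To switch to a GRU, replace nn.LSTM with nn.GRU and unpack out, h_n = self.gru(x) in forward, since a GRU returns no cell state.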


---

5. When to Use LSTM vs GRU

| Aspect | LSTM | GRU |
| ---------- | --------------- | --------------- |
| Accuracy | Often higher | Slightly lower |
| Speed | Slower | Faster |
| Complexity | More gates | Fewer gates |
| Memory | More memory use | Less memory use |

---

6. Real-Life Use Cases

LSTM – Language translation, speech recognition, medical time-series

GRU – Real-time prediction systems where speed and memory matter

---

Summary

• LSTM and GRU address the vanishing gradient problem of vanilla RNNs by using gates to preserve information across time steps.

• LSTM is more powerful; GRU is faster and lighter.

• Both are crucial for sequence modeling tasks with long dependencies.

---

Exercise

• Build two models (LSTM and GRU) on the same dataset (e.g., sentiment analysis) and compare accuracy and training time.
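
A starter sketch for the comparison, training both cells on a dummy batch (replace the random tensors with your dataset; all sizes and hyperparameters are assumptions):

import time
import torch
import torch.nn as nn

def make_model(cell, input_size=10, hidden_size=32, output_size=2):
    # Wrap either nn.LSTM or nn.GRU in the same small classifier.
    class Model(nn.Module):
        def __init__(self):
            super().__init__()
            self.rnn = cell(input_size, hidden_size, batch_first=True)
            self.fc = nn.Linear(hidden_size, output_size)

        def forward(self, x):
            out, _ = self.rnn(x)          # works for both cell types
            return self.fc(out[:, -1])    # classify from the last time step
    return Model()

x = torch.randn(64, 20, 10)       # dummy batch: 64 sequences of 20 steps
y = torch.randint(0, 2, (64,))    # dummy binary labels

for name, cell in [("LSTM", nn.LSTM), ("GRU", nn.GRU)]:
    model = make_model(cell)
    opt = torch.optim.Adam(model.parameters())
    start = time.time()
    for _ in range(50):           # tiny training loop
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        opt.step()
    print(f"{name}: {time.time() - start:.2f}s, final loss {loss.item():.3f}")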

---

#RNN #LSTM #GRU #DeepLearning #SequenceModeling

https://t.iss.one/DataScienceM