Softmax vs Sigmoid ✍️ Interact 👉 https://byhand.ai/Khlg9b
= Softmax = 🧮
Softmax is how deep networks turn raw scores into a probability distribution — the final layer of every classifier 🎯, and the core of every attention head in a transformer 🤖. To see what it does, picture five boba tea shops 🧋 on the same block, all competing for your dollar 💰. Five candidates: a, b, c, d, e — different chains, different brewing styles, different pearls. A boba reviewer hands you a 𝘤𝘩𝘦𝘸𝘪𝘯𝘦𝘴𝘴 𝘴𝘤𝘰𝘳𝘦 for each — higher means perfectly chewy "QQ" pearls with the right bite 🍡 (ask a Taiwanese friend what QQ means). Negative scores are real: mushy boba, overcooked pearls, a batch left sitting too long 🥀.
How do you turn five chewiness scores into an allocation that adds to a whole dollar? You could spend everything at the chewiest shop, but that ignores how good the runners-up are 🏃♂️. Softmax is the smooth alternative 🌊.
Read the diagram left to right ➡️. First, raise each score to e^{x} — this does two things: it turns negative chewiness into small positives, and it stretches the gaps between scores exponentially 📈. Then sum all five into a single total Z. Finally, divide each e^{x} by Z to get a probability. The five probabilities add up to one, so you can read them as percentages of your dollar 📊. The chewiest shop gets the biggest slice 🍰 — but never the whole dollar. That's the point of softmax: it ranks confidently while still leaving room for the others 🤝.
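The left-to-right walkthrough above can be sketched in a few lines of Python (the five chewiness scores below are made-up numbers for illustration):

```python
import math

def softmax(scores):
    """Exponentiate each score, then normalize by the total Z."""
    exps = [math.exp(x) for x in scores]   # negative scores become small positives
    z = sum(exps)                          # the single total Z
    return [e / z for e in exps]           # probabilities that sum to 1

# Hypothetical chewiness scores for shops a..e
scores = [2.0, 1.0, 0.5, -1.0, 0.0]
probs = softmax(scores)
print([round(p, 3) for p in probs])
# The chewiest shop (index 0) gets the biggest slice — but never the whole dollar.
```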
= Sigmoid = 📉
Sigmoid squashes any real number into a probability between 0 and 1 — the classic activation for binary classification ✅, and still the gating function inside LSTMs and GRUs. Same boba block as the Softmax example above, narrowed to just two contenders — a hot new shop a with chewiness score x, and your usual go-to b whose score is pinned at zero (the neutral baseline you've come to expect) 📍. Sigmoid is just softmax with two players, one of them pinned to zero ⚖️.
Read the diagram left to right ➡️. First, raise each score to e^{x} — for the usual shop b whose score is zero, this is just e^0 = 1 (the constant baseline) 🏛. Then sum the two into a total Z. Finally, divide each e^{x} by Z to get a probability. The two probabilities add up to one — the new shop wins more of your dollar when its pearls get chewier, and your usual keeps the rest 💸. That's the point of sigmoid: it turns a single chewiness score into a clean 0-to-1 chance you'll try the new place over your usual 🚀.
https://t.iss.one/DataScienceM🔗
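A minimal sketch of the two-player case, showing that sigmoid really is a two-way softmax with one score pinned to zero (x = 1.2 is a made-up chewiness score for the new shop):

```python
import math

def sigmoid(x):
    """Probability the new shop (score x) beats the baseline (score 0)."""
    return 1.0 / (1.0 + math.exp(-x))

def two_way_softmax(x):
    """Softmax over two scores, the second pinned to zero."""
    z = math.exp(x) + math.exp(0.0)   # e^x + 1, the total Z
    return math.exp(x) / z

x = 1.2  # hypothetical chewiness score for the new shop
assert abs(sigmoid(x) - two_way_softmax(x)) < 1e-12  # identical by construction
print(sigmoid(0.0))  # 0.5 — a tie when the new shop matches the baseline
```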
🤖 What is a perceptron, and how does it work?
Don’t worry, we have an easy-to-understand explanation for you!
Let’s dive in.👇🏽
1️⃣ History
The idea of the perceptron was first presented by Frank Rosenblatt in 1957. It was inspired by the McCulloch–Pitts neuron model. The concept of the perceptron still forms the basis of modern artificial neural networks today.
2️⃣ Concept of a Single-Layer Perceptron
A perceptron consists of an artificial neuron with adjustable weights and a threshold. The neuron in the perceptron is called a Linear Threshold Unit (LTU) because it uses the step function as its output function and performs a linear separation of the input data.
3️⃣ Detailed view
The figure illustrates a perceptron with an input layer, an artificial neuron, and an output layer. The input layer contains the input values and x_0 as the bias. In a neural network, a bias is needed to shift the activation function toward the positive or negative side.
The perceptron has weights on its edges. It computes the weighted sum of the input values and their weights; this step is known as aggregation. The result a then serves as input to the activation function — here, the step function: all values a ≥ 0 map to 1, and values a < 0 map to -1.
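The aggregation-plus-step pipeline can be sketched directly (the weights and bias below are hand-picked to realize a logical AND on {0, 1} inputs, purely for illustration):

```python
def step(a):
    """Step activation: 1 for a >= 0, -1 otherwise."""
    return 1 if a >= 0 else -1

def perceptron(inputs, weights, bias):
    """Weighted sum of inputs and weights (aggregation), then step activation."""
    a = bias + sum(w * x for w, x in zip(weights, inputs))
    return step(a)

# Hand-picked, hypothetical parameters realizing AND
w, b = [1.0, 1.0], -1.5
print(perceptron([1, 1], w, b))  # 1  (both inputs active: a = 0.5)
print(perceptron([1, 0], w, b))  # -1 (a = -0.5)
```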
4️⃣ Limitations
The single-layer perceptron can only solve linearly separable problems and struggles with complex patterns. The XOR problem, a simple nonlinear classification problem, exposed this limitation.
5️⃣ Advancements
The introduction of the multilayer perceptron (MLP) and the backpropagation algorithm led to the ability to solve nonlinear problems.
Forwarded from Machine Learning with Python
Hugging Face has literally gathered all the key "secrets". 🤔
It's important to understand the evaluation of large language models.📊
While you're working with language models:
> training or retraining your models,🔄
> selecting a model for a task, 🎯
> or trying to understand the current state of the field,🌍
the question almost inevitably arises:
how do you tell whether a model is good?❓
The answer is quality evaluation. It's everywhere:
> leaderboards with model ratings,🏆
> benchmarks that supposedly measure reasoning,🧠
> knowledge, coding or mathematics,👨💻
> articles with claimed new best results.📈
But what is evaluation actually?🤷♂️
And what does it really show?🔍
This guide helps to understand everything.📚
https://huggingface.co/spaces/OpenEvals/evaluation-guidebook#what-is-model-evaluation-about
> What is model evaluation all about 🤖
> Basic concepts of large language models for understanding evaluation 🏗️
> Evaluation through ready-made benchmarks 📏
> Creating your own evaluation system 🔧
> The main problem of evaluation ⚠️
> Evaluation of free text 📝
> Statistical correctness of evaluation 📉
> Cost and efficiency of evaluation 💰
https://t.iss.one/CodeProgrammer🟢
🛠 𝐁𝐞𝐲𝐨𝐧𝐝 𝐭𝐡𝐞 𝐆𝐫𝐚𝐝𝐢𝐞𝐧𝐭: 𝐓𝐡𝐞 𝐌𝐚𝐭𝐡𝐞𝐦𝐚𝐭𝐢𝐜𝐬 𝐁𝐞𝐡𝐢𝐧𝐝 𝐋𝐨𝐬𝐬 𝐅𝐮𝐧𝐜𝐭𝐢𝐨𝐧𝐬
ML engineers often treat loss functions as “set-and-forget” hyperparameters. But the loss is not just a training detail; it is the mathematical statement of what the model is supposed to care about.
➡️ In 𝐫𝐞𝐠𝐫𝐞𝐬𝐬𝐢𝐨𝐧, 𝐌𝐒𝐄 pushes the model to reduce large errors aggressively, which makes it sensitive to outliers, while 𝐌𝐀𝐄 treats all errors more evenly and is often more robust.
↳ 𝐇𝐮𝐛𝐞𝐫 𝐥𝐨𝐬𝐬 sits between the two, using squared error for small deviations and absolute error for larger ones.
↳ 𝐐𝐮𝐚𝐧𝐭𝐢𝐥𝐞 𝐥𝐨𝐬𝐬 becomes useful when the goal is not a single prediction, but an interval or asymmetric risk, and 𝐏𝐨𝐢𝐬𝐬𝐨𝐧 𝐥𝐨𝐬𝐬 fits naturally when the target is a count or rate.
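A quick sketch of how the three regression losses treat the same error differently (the δ = 1.0 threshold and the error values are arbitrary choices for illustration):

```python
def mse(e):
    """Squared error: punishes large errors aggressively."""
    return e ** 2

def mae(e):
    """Absolute error: treats all errors evenly."""
    return abs(e)

def huber(e, delta=1.0):
    """Quadratic for small errors, linear for large ones."""
    if abs(e) <= delta:
        return 0.5 * e ** 2
    return delta * (abs(e) - 0.5 * delta)

for e in (0.5, 3.0):
    print(e, mse(e), mae(e), round(huber(e), 3))
# A large error (3.0) costs 9.0 under MSE but only 2.5 under Huber:
# the outlier's pull on the gradient is capped.
```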
➡️ In 𝐜𝐥𝐚𝐬𝐬𝐢𝐟𝐢𝐜𝐚𝐭𝐢𝐨𝐧, 𝐂𝐫𝐨𝐬𝐬-𝐄𝐧𝐭𝐫𝐨𝐩𝐲 remains the core objective because it trains the model to produce good probabilities, not just correct labels.
↳ 𝐁𝐢𝐧𝐚𝐫𝐲 𝐂𝐫𝐨𝐬𝐬-𝐄𝐧𝐭𝐫𝐨𝐩𝐲 is the natural choice for two-class or multi-label settings, while 𝐂𝐚𝐭𝐞𝐠𝐨𝐫𝐢𝐜𝐚𝐥 𝐂𝐫𝐨𝐬𝐬-𝐄𝐧𝐭𝐫𝐨𝐩𝐲 extends that idea to multi-class softmax outputs.
↳ 𝐊𝐋 𝐃𝐢𝐯𝐞𝐫𝐠𝐞𝐧𝐜𝐞 is especially important when the task involves matching distributions, such as distillation, variational inference, or probabilistic modeling.
↳ 𝐇𝐢𝐧𝐠𝐞 𝐥𝐨𝐬𝐬 and squared hinge loss reflect the margin-based logic behind SVM-style learning, and focal loss is particularly valuable when easy examples dominate and the hard cases need more attention.
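The probability-focused nature of cross-entropy is easy to see numerically (the labels and predicted probabilities below are made up):

```python
import math

def binary_cross_entropy(y, p, eps=1e-12):
    """-[y*log(p) + (1-y)*log(1-p)], with p clipped for numerical safety."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# A confident correct prediction is cheap; a confident wrong one is very expensive
print(round(binary_cross_entropy(1, 0.9), 4))  # ~0.1054
print(round(binary_cross_entropy(1, 0.1), 4))  # ~2.3026
```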
➡️ In 𝐬𝐩𝐞𝐜𝐢𝐚𝐥𝐢𝐳𝐞𝐝 𝐭𝐚𝐬𝐤𝐬, the choice of loss becomes even more meaningful.
↳ 𝐃𝐢𝐜𝐞 𝐥𝐨𝐬𝐬 works well in segmentation because it focuses on overlap and helps with class imbalance.
↳ 𝐆𝐀𝐍 𝐥𝐨𝐬𝐬 drives the generator–discriminator game in adversarial learning.
↳ 𝐓𝐫𝐢𝐩𝐥𝐞𝐭 𝐥𝐨𝐬𝐬 and contrastive loss shape embedding spaces so that similarity is learned directly.
↳ 𝐂𝐓𝐂 𝐥𝐨𝐬𝐬 solves alignment problems in sequence tasks like speech recognition and OCR, where labels are unsegmented.
↳ 𝐂𝐨𝐬𝐢𝐧𝐞 𝐩𝐫𝐨𝐱𝐢𝐦𝐢𝐭𝐲 is useful when vector direction matters more than magnitude.
💡 𝑻𝒉𝒆 𝒃𝒊𝒈𝒈𝒆𝒓 𝒕𝒂𝒌𝒆𝒂𝒘𝒂𝒚: 𝑇ℎ𝑒 𝑙𝑜𝑠𝑠 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝑒𝑛𝑐𝑜𝑑𝑒𝑠 𝑦𝑜𝑢𝑟 𝑎𝑠𝑠𝑢𝑚𝑝𝑡𝑖𝑜𝑛𝑠 𝑎𝑏𝑜𝑢𝑡 𝑡ℎ𝑒 𝑝𝑟𝑜𝑏𝑙𝑒𝑚. 𝐼𝑡 𝑎𝑓𝑓𝑒𝑐𝑡𝑠 𝑐𝑜𝑛𝑣𝑒𝑟𝑔𝑒𝑛𝑐𝑒, 𝑠𝑡𝑎𝑏𝑖𝑙𝑖𝑡𝑦, 𝑐𝑎𝑙𝑖𝑏𝑟𝑎𝑡𝑖𝑜𝑛, 𝑟𝑜𝑏𝑢𝑠𝑡𝑛𝑒𝑠𝑠, 𝑎𝑛𝑑 𝑔𝑒𝑛𝑒𝑟𝑎𝑙𝑖𝑧𝑎𝑡𝑖𝑜𝑛; 𝑠𝑜𝑚𝑒𝑡𝑖𝑚𝑒𝑠 𝑗𝑢𝑠𝑡 𝑎𝑠 𝑚𝑢𝑐ℎ 𝑎𝑠 𝑡ℎ𝑒 𝑎𝑟𝑐ℎ𝑖𝑡𝑒𝑐𝑡𝑢𝑟𝑒 𝑖𝑡𝑠𝑒𝑙𝑓.
➜ 𝑆𝑜 𝑡ℎ𝑒 𝑟𝑒𝑎𝑙 𝑞𝑢𝑒𝑠𝑡𝑖𝑜𝑛 𝑖𝑠 𝑛𝑜𝑡 𝑜𝑛𝑙𝑦 “𝑊ℎ𝑖𝑐ℎ 𝑚𝑜𝑑𝑒𝑙 𝑠ℎ𝑜𝑢𝑙𝑑 𝐼 𝑢𝑠𝑒?”
➜ 𝐼𝑡 𝑖𝑠 𝑎𝑙𝑠𝑜: “𝑊ℎ𝑎𝑡 𝑏𝑒ℎ𝑎𝑣𝑖𝑜𝑟 𝑖𝑠 𝑡ℎ𝑖𝑠 𝑙𝑜𝑠𝑠 𝑒𝑛𝑐𝑜𝑢𝑟𝑎𝑔𝑖𝑛𝑔?”
https://t.iss.one/MachineLearning9
They cover the entire spectrum: classic ML, LLM, and generative models — with theory and practice.
tags: #python #ML #LLM #AI
Algorithms by Jeff Erickson - one of the best algorithm books out there 📚.
The illustrations make complex concepts surprisingly easy to follow 🎨. Highly recommend this 👍.
Link: https://jeffe.cs.illinois.edu/teaching/algorithms/ 🔗
Every data professional forgets which statistical test to use. Here's the fix. 🛠
(Bookmark it. Seriously. 📌)
I've been there:
↳ Staring at two datasets wondering which test to run 🤔
↳ Googling "t-test vs ANOVA" for the 10th time 🔍
↳ Second-guessing myself in an interview 😰
Choosing the wrong statistical test can invalidate your findings and lead to flawed conclusions. ⚠️
Here's your quick reference guide:
𝐂𝐨𝐦𝐩𝐚𝐫𝐢𝐧𝐠 𝐌𝐞𝐚𝐧𝐬: 📊
↳ 2 independent groups → Independent t-Test
↳ Same group, before/after → Paired t-Test
↳ 3+ groups → ANOVA
𝐍𝐨𝐧-𝐍𝐨𝐫𝐦𝐚𝐥 𝐃𝐚𝐭𝐚: 📉
↳ 2 groups → Mann-Whitney U Test
↳ Paired samples → Wilcoxon Signed-Rank Test
↳ 3+ groups → Kruskal-Wallis Test
𝐑𝐞𝐥𝐚𝐭𝐢𝐨𝐧𝐬𝐡𝐢𝐩𝐬: 🔗
↳ Linear relationship → Pearson Correlation
↳ Ranked/non-linear → Spearman Correlation
↳ Two categorical variables → Chi-Square Test
𝐏𝐫𝐞𝐝𝐢𝐜𝐭𝐢𝐨𝐧: 🔮
↳ Continuous outcome → Linear Regression
↳ Binary outcome (yes/no) → Logistic Regression
𝐕𝐚𝐫𝐢𝐚𝐧𝐜𝐞: ⚖️
↳ Compare spread between groups → Levene's Test / F-Test
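As one concrete example from the guide above, the independent t-Test statistic for two groups can be computed from scratch with the standard library (the two samples are toy data for illustration):

```python
import math
from statistics import mean, variance  # variance() is the sample variance (n-1 denominator)

def independent_t(a, b):
    """Pooled two-sample t statistic for two independent groups."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se

group_a = [2, 4, 4, 4, 5, 5, 7, 9]   # toy data
group_b = [1, 2, 3, 4, 5, 6, 7, 8]
t = independent_t(group_a, group_b)
print(round(t, 3))  # ~0.435 — a small |t|, so no evidence the group means differ
```

In practice you would hand this off to a library (e.g. a SciPy-style `ttest_ind`) to also get a p-value, but the statistic itself is just this ratio.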
Here are 5 resources to help you: 📚
1. Khan Academy Statistics: https://lnkd.in/statistics-khan
2. StatQuest YouTube Channel: https://lnkd.in/statquest-yt
3. Seeing Theory (Visual Stats): https://lnkd.in/seeing-theory
4. Statistics by Jim Blog: https://lnkd.in/stats-jim
5. OpenIntro Statistics (Free Textbook): https://lnkd.in/openintro-stats