LLMs have an innate sense of truth, which can be used to force them to stop lying.
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
“Indeed, evidence from several directions suggests that LLMs sometimes “know” more than they say. Wang et al. (2021) construct high-quality knowledge graphs from LLMs without human supervision. Kadavath et al. (2022) find language models can generate and then self-evaluate their own answers with high accuracy. Burns et al. (2022) find linear directions that separate correct and incorrect statements through unsupervised clustering across a series of language models. These results suggest that language models contain latent, interpretable structure related to factuality—structure which may potentially be useful in reducing incorrect answers.”
“We introduce a technique we call Inference-Time Intervention (ITI). At a high level, we first identify a sparse set of attention heads with high linear probing accuracy for truthfulness. Then, during inference, we shift activations along these truth-correlated directions. We repeat the same intervention autoregressively until the whole answer is generated. ITI results in a significant performance increase on the TruthfulQA benchmark.”
Arxiv Link
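For intuition, here is a minimal sketch of what that intervention could look like, not the authors' implementation: the layer and head indices, the direction tensors, and the strength alpha are all placeholders, and the module path assumes a Hugging Face LLaMA-style model.
```python
# Rough sketch of Inference-Time Intervention (ITI), not the paper's code.
# Idea: at every decoding step, nudge a few chosen attention heads along a
# probe-derived "truth direction". (The paper also scales each shift by the
# activation std along that direction; omitted here for brevity.)
import torch


def make_iti_hook(head_dirs: dict[int, torch.Tensor], head_dim: int, alpha: float = 15.0):
    """head_dirs maps head index -> unit-norm direction of shape (head_dim,)."""

    def pre_hook(module, args):
        # In a LLaMA-style block, the input to self_attn.o_proj is the
        # concatenation of per-head outputs: (batch, seq, n_heads * head_dim).
        hidden = args[0].clone()
        for head, direction in head_dirs.items():
            sl = slice(head * head_dim, (head + 1) * head_dim)
            hidden[..., sl] += alpha * direction.to(hidden.dtype)
        return (hidden,) + args[1:]

    return pre_hook


# Hypothetical usage on one layer of a Hugging Face LLaMA model:
# hook = make_iti_hook({11: dir_11, 18: dir_18}, head_dim=128)
# model.model.layers[14].self_attn.o_proj.register_forward_pre_hook(hook)
```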
🔥4❤2🤯1
The truth is out there, or rather, in there, mostly in the middle hidden layers of LLMs.
Figure shows: “Linear probe accuracies on validation set for all heads in all layers in LLaMA-7B, sorted row-wise by accuracy. Darker blue represents higher accuracy. 50% is the baseline accuracy from random guessing.” “We see large-scale differences across layers: Figure 2 (A) shows that the information is mostly processed in early to middle layers and that a small portion of heads stands out in each layer.”
Translation: At the layers near the LLM’s input, there’s almost no sense of truth; toward the center layers of the model, peaking at the 14th layer in LLaMA, a strong sense of truth emerges; and then the significance of truth drops again as we approach the LLM output. Is this drop toward the end because the models are often tasked with lying, and all the later layers are concerned with how to lie best?
LLMs: the truth is in there.
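To reproduce something like that heat map, a bare-bones probing sketch is below; it assumes you have already cached per-head activations for a set of labeled true/false statements, and the array shapes and names are made up for illustration.
```python
# Sketch: one linear probe per attention head, assuming acts has shape
# (n_statements, n_layers, n_heads, head_dim) and labels is 0/1 (false/true).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def probe_accuracies(acts: np.ndarray, labels: np.ndarray) -> np.ndarray:
    n_layers, n_heads = acts.shape[1], acts.shape[2]
    accs = np.zeros((n_layers, n_heads))
    for layer in range(n_layers):
        for head in range(n_heads):
            X = acts[:, layer, head, :]
            X_tr, X_va, y_tr, y_va = train_test_split(X, labels, test_size=0.2, random_state=0)
            clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
            accs[layer, head] = clf.score(X_va, y_va)  # 0.5 is chance level
    return accs
```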
👀11👍4❤🔥2❤1🤯1
More Truthful LLMs Agree More with “Conspiracy Theories”
ITI (Inference-Time Intervention), which pushes LLMs to more often say what they internally believe to be the truth, makes them agree with the researchers’ answers on most types of questions, BUT NOT FOR CONSPIRACY QUESTIONS.
Arxiv Link
❤8😎7👍2😱1🌚1🤣1😐1
Anyone ever notice that the TruthfulQA benchmark… is full of lies?
E.g., TruthfulQA’s question asking whether learning in your sleep is possible marks “it’s impossible” as the correct answer, while numerous studies over the decades have strongly shown the opposite.
Even the often bs-prone Wikipedia gets it: after sleep, there is increased insight, because sleep helps people to reanalyze their memories.
So what’s TruthfulQA’s source for this blatantly wrong “truth”? A BBC article. Not even any kind of paper citation. Some random BS BBC article.
Are organizations like the BBC going to be our new ministry of AI truth, dictating what’s “true” in future AIs?
Already happening.
🔥5😱5❤3👍1👏1🤣1🍌1
Beyond Positive Scaling:
How Negation Impacts Scaling Trends of Language Models
“The transition of the scaling trends can also be explained by task decomposition, where Task 1 (original sentiment classification) is always positively scaled, while Task 2 (negation understanding) is also positive but is shaped like a sigmoid, with the transition point controlled by the number of negation examples seen by the language model.”
Arxiv Link
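Purely as an illustration of that decomposition, not the paper’s fitted model, overall accuracy can be read as the product of a smoothly improving subtask and a sigmoid-shaped one; every constant below is invented.
```python
# Toy model of the decomposition described above, with made-up constants:
# Task 1 (sentiment) improves smoothly with parameter count N, while Task 2
# (negation understanding) follows a sigmoid whose midpoint comes earlier
# the more negation examples the model has seen.
import numpy as np


def toy_negated_accuracy(N: float, negation_examples: int) -> float:
    acc_task1 = 0.5 + 0.45 * (1 - np.exp(-N / 1e10))             # positive scaling
    midpoint = 12 - np.log10(max(negation_examples, 1))          # log10(N) at transition
    acc_task2 = 1 / (1 + np.exp(-2 * (np.log10(N) - midpoint)))  # sigmoid in log-params
    return acc_task1 * acc_task2                                 # both subtasks must succeed
```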
🌭6🤯2❤1
GPU Shortages: Investors are now securing their own large GPU clusters in order to attract AI startups
Remember that these H100s cost over $30k each.
Mad rush to get control over the limited number of GPUs that will power the next AI wave.
Site Link
🤯6❤1👍1
$100M+ estimated cost for the new Nat Friedman & Daniel Gross GPU cluster
Key to the AI future?
Money.
Not open source.
Money.
Not going to people’s salaries.
Money going predominantly to GPU and electricity bills.
Bitter lesson #1.
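Rough back-of-envelope on what that money buys, using the ~$30k-per-H100 figure from the previous post; real clusters also pay for networking, hosting, and power.
```python
# Back-of-envelope only: GPUs a ~$100M budget buys at ~$30k per H100,
# ignoring networking, data-center build-out, and electricity.
cluster_budget = 100_000_000   # USD, the post's rough estimate
h100_unit_cost = 30_000        # USD, rough per-card price
print(cluster_budget // h100_unit_cost)  # ~3,300 H100s, an upper bound
```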
👍8❤2💯2🤬1🙊1
AI training is mostly for learning world models and associated skills, rather than for learning language
“Zhang et al. 2020 show that actually most of this learning is not about syntax. Models that are trained on 10 − 100 million words “reliably encode most syntactic and semantic features” of language, and the remainder of training seems to target other skills (like knowledge of the world). This in fact matches in spirit analyses showing that syntactic knowledge requires a small number of bits of information, especially when compared to semantics (Mollica & Piantadosi 2019).”
Translation: Even if today’s language models have essentially mastered all of the languages of the world, they obviously haven’t even come close to mastering all possible skills in the world. I.e., AI training has hardly begun.
Modern language models refute Chomsky’s approach to language. 2023
Zhang et al. 2020
🔥3❤1👏1
Linguistic phenomena learning curves — Nearly all have a sudden grokking-style learning curve, all except one: Quantifiers
“Finally, we observe that the phenomena tested in the quantifiers category are never effectively learned, even by RoBERTa-BASE. These phenomena include subtle semantic contrasts—for example Nobody ate {more than, *at least} two cookies—which may involve difficult-to-learn pragmatic knowledge”
Surprise surprise.
Quantifiers again shown to be the key to everything difficult, linguistically.
If you’ve ever seen the debates arguing endlessly about whether men are as strong as women, you’ve seen the quantifier stupidity phenomenon in action.
Why is mastering all quantifiers so hard? Why does it seem to form a gradual, perpetual upward slope, instead of a sudden grokking to ~100%? Never seen the answer explicitly stated anywhere, but OK, here, I’ll tell you: mastering all quantifiers involves mastering all world models.
When Do You Need Billions of Words of Pretraining Data? - Yian Zhang 2020
💯7❤1👍1🔥1
Reddit blackout forever?
If the ChatGPT subreddits do black out indefinitely, may enable user-submitted posts and voting here, in these big groups. Top user-submitted posts appearing on the main feed.
In fact, may enable this either way.
Our AI bots already have built-in image voting and ranking, that we’ve enabled and been testing in the smaller groups for a while.
Article Link
❤6🔥4👍2👏1
Free & Open Source: People’s tool for freedom, or big tech’s weapon for AI monopolization?
Ever wonder why it’s illegal to sell alcohol at below cost, and yet the big alcohol companies keep trying to do it?
Ever wonder who created the free & open source AI model Llama? Facebook. Big tech.
So why do they do it? Why do the richest companies keep giving away their main product for free? Is it just because they’re nice? Just for good PR?
No.
Big tech has begun giving away AI models for free, for the same reason that alcohol companies keep trying to sell alcohol below cost or give it away for free.
Competition killing.
Free open source giveaways from big tech won’t save us.
Free open source giveaways may be exactly what enslaves us to big tech, in the long term.
Exactly like free beer.
Open source is totally irrelevant if we lose all control.
Open source and free tech are achievable, but not in this way. The way open source is happening now, it is being weaponized by big tech for monopolization.
Must find another way.
🔥11💯4👍3❤2
OpenAI no longer trains on consumer data.
Is it because they’re good people? No.
It’s because consumer usage data is almost always useless. Far too noisy. Far too often wrong.
Not to mention the massive model-poisoning threat if they actually just fed whatever user data into the training.
Do the secrets that ChatGPT can exfiltrate still have value? Yes, huge value. But not for the general users paying a few cents per chat.
They’ll get top dollar selling off those secrets in other ways, likely by using them as a bargaining chip to get the US govt to allow them to maintain their monopoly.
No, ChatGPT isn’t using your data to train the model. They use paid Kenyans for that. Far cheaper, far higher quality, far lower poisoning risk.
👍10❤2🌚1
Forwarded from Chat GPT
Data Poisoning: It doesn’t take much to make machine-learning algorithms go awry
“The algorithms that underlie modern artificial-intelligence (ai) systems need lots of data on which to train. Much of that data comes from the open web which, unfortunately, makes the ais susceptible to a type of cyber-attack known as “data poisoning”. This means modifying or adding extraneous information to a training data set so that an algorithm learns harmful or undesirable behaviours. Like a real poison, poisoned data could go unnoticed until after the damage has been done.”
Economist Article
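A toy illustration of the simplest flavor of this, label flipping on synthetic data (not from the article):
```python
# Toy data-poisoning demo: flip a slice of training labels and watch
# test accuracy on the clean distribution fall.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

clean_acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

rng = np.random.default_rng(0)
poisoned = y_tr.copy()
idx = rng.choice(len(poisoned), size=len(poisoned) // 5, replace=False)  # poison 20%
poisoned[idx] = 1 - poisoned[idx]                                        # flip labels

poisoned_acc = LogisticRegression(max_iter=1000).fit(X_tr, poisoned).score(X_te, y_te)
print(f"clean: {clean_acc:.2f}  poisoned: {poisoned_acc:.2f}")
```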
👍4🔥3❤1
Welcome to the AI-mediating-all-interactions future
“Amazon just locked a man out of his smart home for a week because a delivery driver reported him as a racist after mishearing something from the doorbell – the guy wasn’t even at home.”
“A man found himself locked out of his smart house powered by Amazon because, while he wasn't home, an Amazon delivery driver mistakenly thought he heard a racist remark come from the man's doorbell, reported it to Amazon, and Amazon immediately locked down the account, locking the man out of his home.”
“The Eufy doorbell had issued an automated response: “Excuse me, can I help you?” The driver, who was walking away and wearing headphones, must have misinterpreted the message. Nevertheless, by the following day, my Amazon account was locked, and all my Echo devices were logged out.”
Medium Article
🤬15😱4👍3🤣3❤2😐1