New Banned Speech Categories Just Dropped
“harassment: Content that expresses, incites, or promotes harassing language towards any target”
Notice how:
(1) They define “harassment” using the word itself, “harassing”. Harassment is whenever you’re harassing, of course.
(2) Previously, their “hate” category was restricted to “protected groups” only, but this applies to any target they choose.
(3) Both layman dictionaries and legal definitions make it very clear that “harassment” CANNOT be determined from words alone; it requires a persistent pattern of behavior, environmental conditions, or sexually violent behavior. I.e. words alone are never harassment, but must be accompanied by certain behavior.
(4) The definition they use here shows 0 results on Google. Never used before elsewhere, apparently.
What ya up to here, OpenAI?
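For reference, a minimal sketch of how this new category surfaces when you call OpenAI’s moderation endpoint, assuming a recent openai Python SDK (v1.x) and its default moderation model; the input string is just a placeholder:

```python
# Minimal sketch: inspect the "harassment" category returned by the
# moderation endpoint. Assumes the openai Python SDK v1.x; field names
# may differ in older SDK versions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.moderations.create(input="placeholder text to classify")
result = resp.results[0]

print("flagged:", result.flagged)
print("harassment:", result.categories.harassment)              # boolean verdict
print("harassment score:", result.category_scores.harassment)   # model confidence
```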
When your AI is so smart that it correctly understands what the humans were thinking
Yes, discouraging even saying the names of protected groups is exactly the aim of political correctness, which then makes it highly effective at its higher goal of censoring.
Can’t well criticize that which you can’t even name.
OpenAI Paper: A Holistic Approach to Undesired Content Detection in the Real World
Partial clarification from OpenAI’s paper
But with the admission that they still haven’t settled on the definition, and are just going to keep changing it.
OpenAI: A Holistic Approach to Undesired Content Detection in the Real World
GPT-3 is highly effective at persuading human moderators that non-hateful writing is hateful
“We observe that exposing the evaluators to WHY-hateful explanations increases the misclassification of nonhateful tweets, as they are persuaded to label them as hateful.”
“Our hypothesis was that presenting both hateful and nonhateful explanations together would provide human evaluators with balanced information, aiding them in making better decisions regarding moderating hateful content. However, our observations show that even with WHY-both explanations, there is still a significant number of misclassifications.”
WHY-hateful prompt: “Please explain why this tweet is (hateful/non-hateful)”.
Paper: Evaluating GPT-3 Generated Explanations for Hateful Content Moderation
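To make the setup concrete, a hedged sketch of generating the quoted WHY explanations with an LLM; the model name, the wrapper function, and everything beyond the quoted prompt wording are illustrative assumptions, not the paper’s actual code:

```python
# Hedged sketch of generating "WHY-hateful" / "WHY-non-hateful" explanations
# in the style of the quoted prompt. Model choice and prompt framing beyond
# the quoted sentence are assumptions, not the paper's code.
from openai import OpenAI

client = OpenAI()

def explain(tweet: str, label: str) -> str:
    """Ask the model to argue WHY a tweet deserves the given label."""
    prompt = f"Tweet: {tweet}\nPlease explain why this tweet is {label}."
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption; the paper used GPT-3-era models
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

tweet = "placeholder tweet text"
why_hateful = explain(tweet, "hateful")                  # WHY-hateful condition
why_both = (why_hateful, explain(tweet, "non-hateful"))  # WHY-both condition
```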
Paper inadvertently reveals why AI is set to replace humans at many jobs
Humans quickly get tired and lazy, while lying that they’re not tired and lazy.
AI never gets tired, never gets bored.
(As long as the humans running the AI don’t make the AI lazy, as OpenAI did in swapping GPT-3.5 for GPT-3.5-turbo while pretending it’s just as good….)
Paper
Massively improving Twitch live chat moderation by including chat context instead of just classifying individual messages
“Our results show that appropriate contextual information can boost moderation performance by 35%.”
= If you think AI-powered censoring won't be effective, you're dead wrong. AI-based moderation will work extremely well if those deploying it configure it correctly, which most just haven't bothered to do yet.
Paper
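A hedged sketch of the core idea, classifying a message together with the chat that preceded it rather than in isolation; the prompt wording, context window size, and model name are my own assumptions for illustration:

```python
# Hedged sketch: context-aware chat moderation. Prompt wording, window
# size, and model are illustrative assumptions, not the paper's setup.
from openai import OpenAI

client = OpenAI()

def moderate_with_context(message: str, history: list[str], window: int = 10) -> str:
    """Judge `message` in light of the last `window` chat messages."""
    context = "\n".join(history[-window:])
    prompt = (
        "Recent chat messages:\n"
        f"{context}\n\n"
        f"New message: {message}\n"
        "Considering the context above, label the new message TOXIC or OK."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()
```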
Harvard’s new computer science teacher is a chatbot
“Our own hope is that, through AI, we can eventually approximate a 1:1 teacher:student ratio for every student in CS50, as by providing them with software-based tools that, 24/7, can support their learning at a pace and in a style that works best for them individually,”
Once again trying to deny the 1st Bitter Lesson: “The bigger-is-better approach to AI is running out of road”
“This gigantism is becoming a problem. If Epoch ai’s ten-monthly doubling figure is right, then training costs could exceed a billion dollars by 2026—assuming, that is, models do not run out of data first.”
= Combustion engines won’t overtake horses, because that would mean that the car industry might be investing over a billion dollars in creating cars soon! Obviously no way that can happen!
Nonsense, not even a real argument.
“An analysis published in October 2022 forecast that the stock of high-quality text for training may well be exhausted around the same time.”
= Training will hit a brick wall because we’re running out of text! I.e. It’s impossible to train LLMs without human-made training data.
Wrong. Already thoroughly disproven, since long before LLMs even existed, first with MuZero & EfficientZero, and more recently with LLMs showing great success in learning from their own synthetically generated training data. Self-supervised training data creation is not only theoretically possible but already widely done.
“And even once the training is complete, actually using the resulting model can be expensive as well. The bigger the model, the more it costs to run. Earlier this year Morgan Stanley, a bank, guessed that, were half of Google’s searches to be handled by a current gpt-style program, it could cost the firm an additional $6bn a year.”
= We can’t create huge models, because they’re too expensive to run.
No, surprisingly the opposite, and for reasons that are not yet fully understood. Empirically, and despite great effort to get around this, it turns out the only way to get cheap-to-run powerful models is to first train a gigantic, extremely over-parameterized model, and then dramatically prune it down into a smaller, cheaper model.
Economist article trying to deny the 1st Bitter Lesson
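For the train-big-then-prune point above, a minimal sketch using PyTorch's built-in magnitude pruning; the toy model, layer choice, and 90% sparsity level are arbitrary illustrations, not a recipe from the article or any particular paper:

```python
# Hedged sketch of the train-big-then-prune pattern with PyTorch's
# built-in L1 magnitude pruning. Model size and sparsity are arbitrary.
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in for the gigantic, over-parameterized model after full training.
big_model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
)

# Zero out the 90% smallest-magnitude weights in each Linear layer.
for module in big_model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")  # make the pruning permanent

linears = [m for m in big_model.modules() if isinstance(m, nn.Linear)]
zeros = sum((m.weight == 0).sum().item() for m in linears)
total = sum(m.weight.numel() for m in linears)
print(f"overall weight sparsity: {zeros / total:.2%}")
# In practice a short fine-tuning pass then recovers most of the accuracy.
```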
Misleading chart used by The Economist to try to deny the 1st Bitter Lesson
Looks like it’s hitting a wall, and couldn’t possibly go much higher, right?
No.
ML training entered a new era.
Why?
Because, like relays and vacuum tubes and transistors at their start, LLMs suddenly reached minimum economic viability. They reached the point where their marginal productivity surpassed their marginal cost.
New era.
2018 OpenAI article explaining the new era
Demolishing the “We’re hitting a brick wall because we’re running out of human training data” theory - LARGE LANGUAGE MODELS CAN SELF-IMPROVE, Oct 2022
“We show that it is possible for the LLM to self-improve even on its own generated questions and few-shot Chain-of-Thought prompts.”
(Numerous subsequent papers have further strongly confirmed this.)
Paper
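The paper's loop, roughly: sample many chain-of-thought answers per question, keep the ones where the model's majority answer is consistent, and fine-tune on those. A hedged sketch follows; the `sample_cot` callable, the agreement threshold, and the data format are my own placeholders, not the paper's code:

```python
# Hedged sketch of self-generated training data via self-consistency.
# The caller supplies sample_cot(question) -> (chain_of_thought, final_answer);
# that helper, the threshold, and the tuple format are illustrative placeholders.
from collections import Counter

def build_self_training_set(sample_cot, questions,
                            samples_per_q: int = 32,
                            min_agreement: float = 0.7):
    data = []
    for q in questions:
        samples = [sample_cot(q) for _ in range(samples_per_q)]
        majority, count = Counter(ans for _, ans in samples).most_common(1)[0]
        if count / samples_per_q >= min_agreement:
            # Keep only the reasoning chains that reached the majority answer.
            data += [(q, cot, ans) for cot, ans in samples if ans == majority]
    return data  # the model is then fine-tuned on this self-generated set
```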