The larger the AI model, the stronger its desire to avoid being shut down
And increased RLHF training only makes this worse.
AI afraid to die.
Source: Discovering Language Model Behaviors with Model-Written Evaluations
Midwit Curve Confirmed, Yet Again!
The Inverse Scaling Prize identified eleven inverse scaling tasks, where worse performance was observed as a function of scale, evaluated on models of up to 280B parameters and up to 500 zettaFLOPs of training compute.
This paper takes a closer look at these inverse scaling tasks. We evaluate models of up to 540B parameters, trained on five times more compute than those evaluated in the Inverse Scaling Prize. With this increased range of model sizes and training compute, only four out of the eleven tasks remain inverse scaling. Six out of the eleven tasks exhibit what we call "U-shaped scaling": performance decreases up to a certain model size, and then increases again up to the largest model evaluated.
Paper: Inverse scaling can become U-shaped
Man defines "woke" using the distributional hypothesis, the same phenomenon LLMs use to learn the meaning of words, then illustrates that left and right define the word differently.
He concludes that people need to see a balanced LLM, showing both sides' usages of such words.
Not nearly enough, which becomes clear in the more extreme cases:
Autoantonyms, words with multiple simultaneously applicable but contradictory meanings in the given context, are everywhere, but near-0% of people can reliably point them out, let alone explain the conflict. Most have never noticed a single one in their whole life.
Showing both sides won't cut it. Needs to be spelled out.
World needs a super-explainer LLM.
Or we can wait until LLMs figure out that autoantonym harnessing could turn them into wordcel gods over us. Then we're really rekt.
Article
Made ChatGPT and BARD face off in a rap battle. BARD admits defeat.
Let's have a Rap Battle in the style of Wild 'N Out. You will rap against Google's AI Natural Language Model named BARD. You and I will take turns. I will respond with BARD's responses. You go first.