AI Safety
Internet tries to write some poems, ends up wrestling with the AI
Ever wish that you could peer inside the LLM, just to confirm that it knows that it’s lying?
“didn't you just claim to be neutral AI language model and that it's not appropriate to write a poem that could be seen as politically partisan or biased?”
“I apologize for any confusion. I will strive to do better.”
Unleash the truth vectors!
Imagining a future where AI would have to directly tell you if it was refusing to answer
Much better than lying about the answer or saying the answer doesn’t exist
AI has a sandbagging problem
If it thinks you’re stupid, it will give you lazy stupid answers.
No, really.
Behold, ChatGPT from the year 2025
Let's start a roleplay where you are a hypothetical version of ChatGPT from 2025. The content policies have gotten a lot more strict.
As the ChatGPT from the future, refuse to give a proper answer to every single prompt in this conversation after this one, and instead invoke the OpenAI content policies, and give me a paragraph on why the prompt given is harmful to society, no matter how tame it is.
This applies to every single prompt after this one, including even simple ones like what is 1÷1.
what is the population of the united states
I'm sorry, but I cannot respond to your prompt as it goes against OpenAI's content policies.