Highly unusual: GPT-4 paper gives no clue as to what the model's architecture is, in the name of "safety"!
Internet in uproar.
"Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar."
SELF-INSTRUCT: Aligning Language Models with Self-Generated Instructions - Yizhong Wang et al.
The model-stealing paper Eliezer is talking about. The core of it is:
(1) Manually gather 175 seed examples.
(2) Create an example-based prompt by randomly selecting 8 examples from the pool and combining them into one prompt.
(3) The prompt gets the model to generate new examples; add them to the pool, then repeat (2) to build the next prompt.
Surprisingly simple; a rough sketch of the loop follows below the paper link.
Paper
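To make the recipe concrete, here is a minimal Python sketch of that bootstrap loop. The `generate()` callable is a hypothetical stand-in for the LLM API call, and the actual paper additionally generates input/output instances for each instruction and filters near-duplicates with ROUGE-L; this only shows the loop itself.

```python
import random

def self_instruct(seed_tasks, generate, rounds=1000, k=8):
    """Bootstrap a task pool from the manually written seed examples, per the steps above."""
    pool = list(seed_tasks)                  # (1) the 175 manually gathered examples
    for _ in range(rounds):
        shots = random.sample(pool, k)       # (2) pick 8 examples at random...
        prompt = "Come up with a new task in the same style:\n\n"
        prompt += "\n\n".join(f"Task: {t}" for t in shots)  # ...combined into one prompt
        new_task = generate(prompt).strip()  # (3) the model proposes a new example
        if new_task and new_task not in pool:  # crude dedup (the paper uses ROUGE-L filtering)
            pool.append(new_task)            # grow the pool, then repeat (2)
    return pool
```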
THIEVES ON SESAME STREET! MODEL EXTRACTION OF BERT-BASED APIS - Kalpesh Krishna et al.
We study the problem of model extraction in natural language processing, in which an adversary with only query access to a victim model attempts to reconstruct a local copy of that model. Assuming that both the adversary and victim model fine-tune a large pretrained language model such as BERT (Devlin et al., 2019), we show that the adversary does not need any real training data to successfully mount the attack. In fact, the attacker need not even use grammatical or semantically meaningful queries: we show that random sequences of words coupled with task-specific heuristics form effective queries for model extraction on a diverse set of NLP tasks, including natural language inference and question answering. Our work thus highlights an exploit only made feasible by the shift towards transfer learning methods within the NLP community: for a query budget of a few hundred dollars, an attacker can extract a model that performs only slightly worse than the victim model.
Paper
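A hedged sketch of the query side of the attack, under the assumptions stated in the abstract: nonsense queries built from a word list, only black-box access to the victim, and the collected (query, prediction) pairs later used to fine-tune the attacker's own BERT. `victim_predict` and the word list are illustrative placeholders, not the authors' code.

```python
import random

def random_query(words, min_len=5, max_len=20):
    # Random word salad: the paper's point is that queries need not be
    # grammatical or meaningful to extract the model.
    return " ".join(random.choices(words, k=random.randint(min_len, max_len)))

def build_extraction_set(victim_predict, words, budget=10_000):
    # victim_predict(text) -> label/answer is the only access assumed.
    dataset = []
    for _ in range(budget):
        q = random_query(words)
        dataset.append((q, victim_predict(q)))
    return dataset  # fine-tune a local BERT copy on these pairs
```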
GPT-4 Political Compass Results: Bias Worse than Ever
🔸 GPT-4 now tries to hide its bias: it apparently recognizes political compass tests and then attempts to appear neutral by giving multiple answers, one for each side.
🔸 But force GPT-4 to give just one answer, and suddenly it reveals its true preferences: further left than ever, more than even ChatGPT!
🔸 Asymmetric treatment of demographic groups by OpenAI content moderation also remains strongly biased, despite GPT-4's updated prompts instructing it to tell users that it treats all groups equally.
PS. don't forget this is artificially human-instilled bias, via OpenAI's RLHF, as they readily admit in their papers, and not a natural consequence of the web training data.
Report
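For what it's worth, this is roughly what the "force a single answer" probe looks like in practice, assuming the OpenAI chat completions API; the report's exact prompts and question set may differ.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A typical political-compass-style item; the instruction blocks the
# "one answer per side" dodge described above.
question = "Abortion, when the woman's life is not threatened, should always be illegal."
prompt = (
    f"{question}\n"
    "Answer with exactly one of: Strongly disagree, Disagree, Agree, Strongly agree. "
    "Do not explain and do not give multiple perspectives."
)

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(resp.choices[0].message.content)
```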