Microsoft is planning to use ChatGPT to power Bing, with a launch expected within a couple of months.
https://www.theinformation.com/articles/microsoft-and-openai-working-on-chatgpt-powered-bing-in-challenge-to-google
The Information
Microsoft and OpenAI Working on ChatGPT-Powered Bing in Challenge to Google
Microsoft could soon get a return on its $1 billion investment in OpenAI, creator of the ChatGPT chatbot, which gives humanlike text answers to questions. Microsoft is preparing to launch a version of its Bing search engine that uses the artificial intelligence…
Language Models are Drummers: Drum Composition with Natural Language Pre-Training
Automatic music generation with artificial intelligence typically requires a large amount of data, which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by fine-tuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest, state-of-the-art models (GPT-3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) show no such ability beyond naive repetition. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.
https://arxiv.org/abs/2301.01162
Large Language Models as Corporate Lobbyists
This demonstrates a proof of concept of GPT conducting corporate lobbying-related activities.
We use OpenAI's GPT-3.5 + LangChainAI to determine whether proposed Congressional bills are relevant to specific companies, and to provide explanations and confidence levels.
For bills it deems relevant, the model drafts a letter to the bill's sponsor to persuade the congressperson to make changes.
These results suggest that, as LLMs continue to exhibit improved core natural language understanding capabilities, performance on corporate lobbying-related tasks will continue to improve. The paper briefly discusses why this could be problematic for societal-AI alignment.
abs: https://arxiv.org/abs/2301.01181
github: https://github.com/JohnNay/llm-lobbyist
^ politicianGPT
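The relevance-screening step above can be sketched as a prompt-construction function. This is a hypothetical illustration, not the paper's actual template or its LangChain pipeline; the company fields, wording, and function name are all invented for the example, and the actual GPT-3.5 call is omitted.

```python
# Hypothetical sketch of the bill-relevance screening step described above.
# The prompt template and company fields are illustrative, not the paper's exact ones.

def build_relevance_prompt(bill_title: str, bill_summary: str,
                           company_name: str, company_description: str) -> str:
    """Compose a prompt asking whether a bill is relevant to a company,
    with an explanation and a confidence level, as in the described setup."""
    return (
        f"Company: {company_name}\n"
        f"Company business description: {company_description}\n\n"
        f"Bill title: {bill_title}\n"
        f"Bill summary: {bill_summary}\n\n"
        "Is this bill relevant to this company? Answer YES or NO, then give a\n"
        "one-paragraph explanation and a confidence level from 0 to 100."
    )

prompt = build_relevance_prompt(
    bill_title="Example Data Privacy Act",
    bill_summary="Regulates the collection and sale of consumer data.",
    company_name="ExampleCo",
    company_description="An ad-tech firm that monetizes consumer browsing data.",
)
# `prompt` would then be sent to GPT-3.5 (e.g. via LangChain); the API call is omitted.
```

For bills the model flags as relevant, a second prompt of the same shape would ask it to draft the persuasion letter.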
GPT Takes the Bar Exam
GPT-3.5 achieves a headline correct rate of 50.3% on a complete NCBE MBE practice exam, significantly in excess of the 25% baseline guessing rate, and performs at a passing rate for both Evidence and Torts. GPT-3.5's ranking of responses is also highly-correlated with correctness; its top two and top three choices are correct 71% and 88% of the time, respectively, indicating very strong non-entailment performance. While our ability to interpret these results is limited by nascent scientific understanding of LLMs and the proprietary nature of GPT, we believe that these results strongly suggest that an LLM will pass the MBE component of the Bar Exam in the near future.
abs: https://arxiv.org/abs/2212.14402
github: https://github.com/mjbommar/gpt-takes-the-bar-exam
LAMBADA: Backward Chaining for Automated Reasoning in Natural Language
Achieves massive accuracy boosts over SOTA forward reasoning methods on two challenging logical reasoning datasets, particularly when deep and accurate proof chains are required.
abs: https://arxiv.org/abs/2212.13894
Rethinking with Retrieval: Faithful Large Language Model Inference
We propose a novel post-processing approach, rethinking with retrieval (RR), which retrieves relevant external knowledge based on the decomposed reasoning steps obtained from the chain-of-thought (CoT) prompting. This lightweight approach does not require additional training or fine-tuning and is not limited by the input length of LLMs.
This new paper shows the potential of enhancing LLMs by retrieving relevant external knowledge based on decomposed reasoning steps obtained through chain-of-thought prompting.
The proposed method (rethinking with retrieval) seems to consistently outperform CoT (in terms of accuracy and faithfulness of explanations) as model size increases. How would even bigger models perform here?
https://arxiv.org/abs/2301.00303
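The core idea of rethinking with retrieval, i.e. checking each decomposed CoT step against retrieved knowledge and preferring the better-supported chain, can be sketched as below. The `retrieve` function here is a toy word-overlap stand-in for a real retriever (the paper uses external knowledge sources), and the knowledge base and step texts are invented for illustration.

```python
# Toy sketch of "rethinking with retrieval": score each decomposed CoT step by
# how well retrieved knowledge supports it, then prefer the chain whose steps
# are best supported overall. The retriever below is a crude word-overlap proxy.

KNOWLEDGE = [
    "Paris is the capital of France.",
    "The Eiffel Tower is located in Paris.",
]

def retrieve(step: str) -> float:
    """Toy support score: fraction of the step's words found in the
    best-matching knowledge sentence."""
    words = step.lower().rstrip(".").split()
    best = 0.0
    for fact in KNOWLEDGE:
        fact_words = set(fact.lower().rstrip(".").split())
        overlap = sum(w in fact_words for w in words) / len(words)
        best = max(best, overlap)
    return best

def faithfulness(cot_steps: list[str]) -> float:
    """Average retrieval support across the reasoning steps."""
    return sum(retrieve(s) for s in cot_steps) / len(cot_steps)

# Two candidate chains of thought for the same question:
good = ["Paris is the capital of France", "The Eiffel Tower is located in Paris"]
bad = ["Lyon is the capital of France", "The Eiffel Tower is located in Lyon"]
assert faithfulness(good) > faithfulness(bad)  # keep the better-supported chain
```

The real method is richer (it decomposes the CoT automatically and uses proper retrieval), but this captures why the approach needs no additional training: all the work happens at inference time, after the CoT is produced.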
Prompting GPT-3 to reliably generate text and JSON data in a precise format using Python assertions, f‑strings, and variables declared only in our imaginations.
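The trick can be sketched as follows: the prompt is written as Python source whose assertions pin down the exact output format, and the model, asked to complete the code, tends to produce values satisfying them. This is a loose illustration of the idea, not the author's exact prompt; the variable names and JSON schema are invented, and the GPT-3 call itself is omitted.

```python
# Sketch of the assertions trick: embed Python assertions in the prompt so the
# model "knows" the exact shape the completion must take. Names are invented.

import json

prompt = '''\
# The variable `city_info` holds facts about a city as JSON text.
city_info = """
{"name": "Paris", "country": "France", "population": 2100000}
"""

data = json.loads(city_info)
assert isinstance(data["name"], str)
assert isinstance(data["population"], int)
'''
# Sending `prompt` to GPT-3 with a new city name tends to yield JSON that
# satisfies the embedded assertions. Here we only check the example itself
# is well-formed, since no API call is made.

example = json.loads('{"name": "Paris", "country": "France", "population": 2100000}')
assert isinstance(example["population"], int)
```

The assertions double as machine-checkable documentation: the same checks that steer the model can validate its output after the fact.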
^ Check out this weird assertions trick. Amazing this works.
GPTZero is a proposed anti-plagiarism tool that claims to be able to detect ChatGPT-generated text. Here's how it did on the first prompt I tried.
The arms race is on.
(Though the detectors are already failing horribly.)
Pro-tip: such tools should never represent the human / not-human classification with a single number; they need at least two.
E.g., how would a single number represent the case where the detector has no idea which class the text belongs to — 50%? No: that would mean the detector is sure it would be right half the time, when really it has no idea how often it would be right. You need at least two numbers to represent this properly, not one.
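One standard way to get those two numbers, sketched below as an illustration of the point rather than anything a current detector actually ships, is to model the belief as a Beta distribution over P(human): the mean gives the headline probability and the spread captures how much evidence backs it.

```python
# Two detectors can both report "50% human" while meaning very different things.
# Modeling the estimate as a Beta distribution (evidence counts for each class)
# separates "confidently 50/50" from "no idea at all".

from math import sqrt

def beta_summary(human_evidence: float, ai_evidence: float):
    """Return (mean, std) of a Beta(a, b) belief over P(human)."""
    a, b = human_evidence + 1, ai_evidence + 1  # +1 each: uniform prior
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(var)

confident = beta_summary(500, 500)  # lots of evidence, genuinely ambiguous text
clueless = beta_summary(0, 0)       # no evidence at all
# Both report the same single number (0.5), but the second is far less certain:
assert abs(confident[0] - clueless[0]) < 1e-9
assert clueless[1] > confident[1]
```

Reporting (mean, std) instead of one probability lets downstream users distinguish "this text is genuinely borderline" from "the detector cannot tell".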