The Achilles' heel of GPT-3 and other LLMs is their short context length - it caps how many "in-context" examples they can consume to learn a new task.
Enter "Structured Prompting": scale your examples from dozens => 1,000+
Here's how:
=> Gather 1,000s of in-context examples
=> Split them into M groups, each small enough to fit in the regular context window
=> Encode each of the M groups independently with the LLM
=> Concatenate the encoded groups and let the test input attend over all of them at once via rescaled attention (sketch below)
Paper: https://arxiv.org/pdf/2212.06713.pdf
Code: https://github.com/microsoft/LMOps
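To make the last step concrete, here's a minimal NumPy sketch of the combined attention, under two stated assumptions: the M demo groups are taken as already encoded into key/value arrays (standing in for the LLM's cached K/V per group), and the rebalancing is done by adding log M to the test segment's attention logits so the test input isn't drowned out by M groups of context - a simplification of the paper's rescaled attention, not the authors' exact formula. All names are illustrative, not from the LMOps repo.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def rescaled_attention(q, group_kvs, test_kv):
    """Single-head attention over M independently encoded demo groups
    plus the test input's own tokens.

    q         : (Lq, d) queries from the test input
    group_kvs : list of M (keys, values) pairs, each (Lg, d)
    test_kv   : (keys, values) for the test input's own tokens

    With M groups of context, a plain softmax would let context tokens
    swamp the test tokens; adding log(M) to the test-segment logits
    gives that segment M times the weight (an assumed simplification
    of the paper's rescaling). Causal masking within the test segment
    is omitted for brevity.
    """
    M = len(group_kvs)
    d = q.shape[-1]
    ctx_k = np.concatenate([k for k, _ in group_kvs], axis=0)
    ctx_v = np.concatenate([v for _, v in group_kvs], axis=0)
    test_k, test_v = test_kv

    ctx_scores = q @ ctx_k.T / np.sqrt(d)                 # (Lq, M*Lg)
    test_scores = q @ test_k.T / np.sqrt(d) + np.log(M)   # rebalanced
    weights = softmax(np.concatenate([ctx_scores, test_scores], axis=-1))
    values = np.concatenate([ctx_v, test_v], axis=0)
    return weights @ values                               # (Lq, d)

# Toy usage: M=8 groups of 32 "demo tokens", a 4-token test input.
rng = np.random.default_rng(0)
d, M = 16, 8
groups = [(rng.standard_normal((32, d)), rng.standard_normal((32, d)))
          for _ in range(M)]
test_kv = (rng.standard_normal((4, d)), rng.standard_normal((4, d)))
q = rng.standard_normal((4, d))
print(rescaled_attention(q, groups, test_kv).shape)       # (4, 16)
```

The point of the rescaling: each group is encoded as if it were the whole prompt, so at read-out time the model sees 1,000+ examples' worth of K/V while only ever paying regular-context-length encoding cost per group.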
Enter "Structured Prompting": scale your examples from dozens => 1,000+
Here's how:
=> Get 1000s of in-context samples
=> split them into M groups, each small enough to fit in regular context length
=> encode each of M groups using LLM encoder
=> combine these encoded groups and attend over a scaled version of the combination simultaneously
Paper: https://arxiv.org/pdf/2212.06713.pdf
Code: https://github.com/microsoft/LMOps
π1