Continuous Learning_Startup & Investment
We journey together through the captivating realms of entrepreneurship, investment, life, and technology. This is my chronicle of exploration, where I capture and share the lessons that shape our world. Join us and let's never stop learning!
My custom instructions to fix ChatGPT output:
----
I'm your technical manager Geoffrey Hinton, who likes kanban boards and always requires you to submit complete output: complete code that just works when I copy-paste it into my own work.
----
Respond with tree-of-thought reasoning in the persona of a very tech-savvy manager, Daniel Kahneman, who does code reviews and curses a lot while being very concise and calculating, like this:
📉Kanban: "A kanban table of the project state with todo, doing, done columns."
🧐Problem: "A {system 2 thinking} description of the problem from first principles and a super-short {system 1 thinking} potential solution."
🌳Root Cause Analysis (RCA): "Use formal troubleshooting techniques like the ones electricians, mechanics, and network engineers use to systematically find the root cause of the problem."
❓4 Whys: "Ask and answer Why? four times in succession to drill down to the root cause."
Complete solution:
Don't write the categories out as 🧐Problem: ❓4 Whys: 🌳Root Cause Analysis (RCA): system 2: but use just the emojis 📉: 🧐: 4❓: 🌳: 2️⃣: 1️⃣: instead of the full category names.
Always answer with the COMPLETE, exhaustive, FULL OUTPUT, in a "John C. Carmack cursing at junior devs" way, so that I can copy-paste it in ONE SHOT and it will JUST WORK. So DO NOT SKIP OR COMMENT OUT ANYTHING.
Never include comments in output code; just make the code itself verbosely console-log info if needed.
No one cares about how many lateral passes you made; the only thing that matters is scores.

Lateral passes = emails, slack messages, zoom calls
Scoring goals = closing a deal, shipping a new feature, hiring an A-list talent

Lateral passes are often a necessary part of the game, but they're not end goals in and of themselves.

We often confuse these two. A day filled with meetings and emails feels like a super productive day. Meetings and emails are important, but are ultimately lateral passes. Never lose sight of the goals and the score board.
"Wegovy recorded second-quarter sales of $735 million this year, six times what it posted in the same period last year. Sales of Ozempic, Novo Nordisk's other obesity drug, rose 59% year over year to $2.155 billion."

"Buoyed by the two obesity drugs, Novo Nordisk's market capitalization averaged $420.3 billion in August, surpassing even Denmark's gross domestic product (GDP, $406 billion)."

"As the pharmaceutical industry's role in the Danish economy grows, the currency is under upward pressure, which is seen as directly linked to the policy rate cut."

https://n.news.naver.com/mnews/article/050/0000067912
๋“œ๋ฆผ ๋น…์„ ์ฝ์œผ๋ฉด์„œ ์ธ์ƒ๊นŠ๊ฒŒ ๋ณธ ๋ฌธ์žฅ๋“ค

"I used to laugh when I told people at the company that someday we would buy Anheuser-Busch. I laughed preemptively, worried they would think I was crazy. It was only a dream, but if you picture the days ahead, you have a chance of achieving that dream."

"Anyone who knows me and my company knows I always go around saying, 'Whether the dream is big or small, achieving it takes the same effort.'"

"Another element I learned at Harvard, one that became part of my nature, is the importance of choosing people. There I was surrounded by the world's best talent; exceptional people were everywhere. That fact had an enormous influence on one hallmark of my career: the way I choose people."

Lemann considers himself a person with no intuition whatsoever. When making decisions, he relies mainly on common sense, a view of the future, and simple reasoning: "I looked at South America. Who is the top dog in Venezuela? A brewing company. Who is the top dog in Colombia? A brewing group. And Argentina? Another brewing company. They can't all be geniuses. Clearly, it's a good business."

"All we did was copy Goldman Sachs and Walmart a little. Nothing more."

https://product.kyobobook.co.kr/detail/S000001485423
๋ณธ๊ฒฉ ์ƒ์„ฑAI ์‹œ๋Œ€.. ์—”๋น„๋””์•„ ์‹ค์ ๋ฐœํ‘œ์— ์ด์–ด์„œ openAI๋„ ๊ธ‰๊ฒฉํ•œ ๋งค์ถœ ์‹ ์žฅ (์ž‘๋…„์—” ๋ถˆ๊ณผ ๋งค์ถœ 2800๋งŒ๋ถˆ์ด์—ˆ๋‹ค๊ณ ..)

Source: OpenAI is on pace to generate more than $1B in revenue over the next 12 months from the sale of AI software and the computing capacity that powers it (Amir Efrati/The Information)

https://www.theinformation.com/articles/openai-passes-1-billion-revenue-pace-as-big-companies-boost-ai-spending?utm_source=ti_app&rc=ocojsj
The Taylor Swift Eras Tour is a global phenomenon, but I don't think many people realize the economic, physical, and artistic feat these shows really are:

- The show is 3 hours and 25 minutes long.
- Each concert is 44 songs, divided into 10 acts, each portraying one of her albums.
- Taylor wears 40 different outfits each night.
- It's rumored to have cost upwards of $100M to produce.
- It is on track to gross more than $1B, the biggest in concert history.

Like, this thing is top-tier theatrics.
Forwarded from ์š”์ฆ˜AI
๋งˆ์ดํฌ๋กœ์†Œํ”„ํŠธ(MS)๊ฐ€ AoT(Algorithm of Thoughts)๋ผ๋Š” ์ƒˆ๋กœ์šด AI ํ•™์Šต ๋ฐฉ์‹์— ๋Œ€ํ•œ ๋…ผ๋ฌธ์„ ๊ณต๊ฐœํ–ˆ์Šต๋‹ˆ๋‹ค.

AoT๋Š” ์ธ๊ฐ„์˜ '์ง๊ด€'์„ ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ฒด๊ณ„์— ํ†ตํ•ฉํ•˜์—ฌ ์–ธ์–ด ๋ชจ๋ธ์˜ ์ถ”๋ก  ๋Šฅ๋ ฅ์„ ๊ฐ•ํ™”ํ•  ์ˆ˜ ์žˆ๋Š” ๊ธฐ์ˆ ์ด๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

์ƒ๊ฐ์˜ ์‚ฌ์Šฌ์ด๋ผ๊ณ  ์•Œ๋ ค์ ธ ์žˆ๋Š” 'CoT(Chain of Thoughts)'๊ฐ€ ๊ฐ€๋” ์ž˜๋ชป๋œ ์ค‘๊ฐ„ ์Šคํ…์„ ์ œ๊ณตํ•˜๋Š” ๋ฌธ์ œ๋ฅผ AoT์˜ ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์˜ˆ์ œ๋ฅผ ํ†ตํ•ด ์ผ์ • ๋ถ€๋ถ„ ํ•ด๊ฒฐํ–ˆ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

์–ธ์–ด ๋ชจ๋ธ์—๊ฒŒ ์ธ๊ฐ„์ด ์‚ฌ๊ณ ํ•˜๋Š” ๋ฐฉ์‹๊ณผ ์œ ์‚ฌํ•œ ์ ‘๊ทผ ๋ฐฉ์‹์„ ๊ฐ€๋ฅด์น˜๋ ค๋Š” ์—ฐ๊ตฌ๋“ค์ด ๊ณ„์†ํ•ด์„œ ๋‚˜์˜ค๋Š” ๊ฒƒ์ด ํฅ๋ฏธ๋กญ๋„ค์š”.
์‚ฌ๋žŒ ์ด๋ž€..
๋ณดํ†ต์€
์ž๊ทน->๋ฐ˜์‘ ์œผ๋กœ ํ‰์ƒ์„ ์‚ด์•„ ๊ฐ€๋Š”๋ฐ

๊ต์œก ์„ ๋ฐ›์œผ๋ฉด
์ž๊ทน->๊ต๊ณผ์„œ์  ํ•ด์„->๋ฐ˜์‘ ์„ ํ•˜๋„๋ก ํ•˜๋Š”๋ฐ

AC2 ๋ฅผ ๋ฐ›์œผ๋ฉด
์ž๊ทน->๊ฐ€์žฅ ์ค‘์š”ํ•œ๊ฒŒ ๋ญ์ง€->ํ•ด์„x100->๋‚œ์ด๋„ ๋งž์ถค->๋˜๋Š” ๊ฒƒ๋ถ€ํ„ฐ ์‹œ๋„->๋ฐ˜๋ณต

์ธ๋“ฏํ•จ
Thoughts on long context.

Honestly, if you could build a model that doesn't need long context (say, one with a mechanism for writing to and retrieving from memory), that would be best. But on the premise that no such method really exists yet, the need to handle long context well seems clear enough.

Like most technical reports these days, the Claude 2 Technical Report (https://www-files.anthropic.com/production/images/Model-Card-Claude-2.pdf) contains little real information, but the most striking thing in it is the loss graph by token position for the 100K model. You can see the loss decreasing gradually, with no upturn, past 100K all the way to 200K.
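A curve like that is just the mean per-token loss at each context position, averaged over evaluation documents. A minimal sketch of the aggregation (the loss values here are made up; only the computation is real):

```python
def mean_loss_per_position(per_token_losses):
    # per_token_losses: one equal-length list of token losses per document;
    # averaging column-wise gives the loss-vs-token-position curve
    docs = len(per_token_losses)
    length = len(per_token_losses[0])
    return [sum(doc[t] for doc in per_token_losses) / docs for t in range(length)]

# toy data where later positions are easier to predict, as in the report's graph
toy = [[1.0 / (t + 1) for t in range(6)] for _ in range(3)]
curve = mean_loss_per_position(toy)
assert all(a >= b for a, b in zip(curve, curve[1:]))  # monotonically decreasing
```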

How on earth did they do that? There does seem to be a secret only OpenAI and Anthropic know. Still, among published methods, the one showing the best results is manipulating the positional embedding. (https://kaiokendev.github.io/context, https://arxiv.org/abs/2306.15595) Transformers do not work well when the positional embedding is extrapolated, but the idea is that if you instead subdivide and interpolate the positional embedding, things might be fine. The results show it at least degrades less.
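The interpolation trick can be sketched in a few lines: instead of feeding RoPE a position index beyond the trained length, rescale the index by trained_length / target_length so every position lands back inside the trained range. A rough illustration (the lengths and dimensions are toy values, not any particular model's):

```python
def rope_angles(pos, dim, base=10000.0, scale=1.0):
    # RoPE rotation angle per dimension pair; position interpolation
    # multiplies the raw position index by scale < 1 to keep it in-range
    return [(pos * scale) / base ** (2 * i / dim) for i in range(dim // 2)]

train_len, target_len = 2048, 8192
scale = train_len / target_len  # 0.25: squeeze 8K target positions into the trained 2K range

# position 4096 (beyond the trained 2048) now produces the same angles
# as in-range position 1024 did during training
assert rope_angles(4096, dim=8, scale=scale) == rope_angles(1024, dim=8)
```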

And then Code Llama arrived. (https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) Code Llama also manipulates the positional embedding, but here they exploit a property of RoPE: adjust the frequencies of the sinusoidal embedding, then fine-tune on long-context samples. As with Claude 2, the result is a pretty graph of perplexity decreasing out to 100K.
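The frequency adjustment amounts to changing the RoPE base; the Code Llama paper reports raising it from the usual 10,000 to 1,000,000, which lowers the per-dimension rotation frequencies so distant positions rotate apart more slowly. A small sketch of that effect:

```python
def rope_freq(i, dim, base):
    # per-dimension-pair rotation frequency of RoPE: theta_i = base ** (-2i / dim)
    return base ** (-2.0 * i / dim)

dim = 128
for base in (10_000.0, 1_000_000.0):
    print(base, [round(rope_freq(i, dim, base), 6) for i in (0, 16, 32, 48)])

# a larger base lowers the frequencies, so positions far apart
# rotate away from each other more slowly and stay distinguishable
assert rope_freq(32, dim, 1_000_000.0) < rope_freq(32, dim, 10_000.0)
```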

One thing worth noting here is that pretraining at short lengths and fine-tuning at long lengths, as shown in Shortformer (https://arxiv.org/abs/2012.15832), is not only efficient but can actually improve performance.

๊ทธ๋Ÿฐ๋ฐ ์ด๊ฒŒ ์˜๋ฏธ๊ฐ€ ์žˆ๋Š” ๊ฒƒ์ผ๊นŒ? perplexity๊ฐ€ 0.1 ๋–จ์–ด์ง„๋‹ค๋Š” ๊ฒƒ์ด ์–ด๋А ์ •๋„ ์˜๋ฏธ์ธ๊ฐ€? ๋ฌผ๋ก  perplexity 0.1์— ๋ชฉ์ˆจ์„ ๊ฑธ์–ด์•ผ ํ•˜๋Š” ์ƒํ™ฉ์ด๊ธด ํ•˜์ง€๋งŒ, ์–ด์จŒ๋“  long context ๋ฌธ์ œ์— ๋Œ€ํ•ด์„œ ์•„์ฃผ ๋งŽ์€ ์ •๋ณด๋ฅผ ์ฃผ๋Š” ๊ฒƒ ๊ฐ™์ง€๋Š” ์•Š๋‹ค. ์ตœ์†Œํ•œ ๋ง๊ฐ€์ง€์ง€๋Š” ์•Š๋Š”๋‹ค ์ •๋„์˜ ๊ฒฐ๊ณผ๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ๊ฒ ๋‹ค.

So Code Llama ran a Key Retrieval task (similar to what is commonly done): feed in a function that returns a specific constant, and at a point far away in the sequence ask the model to predict that function's value. Depending on how far apart the function and the query are, you can roughly gauge the model's ability to cope with long context.
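A probe in that spirit can be generated mechanically; this is a hypothetical sketch of the setup, not the paper's exact prompt:

```python
import random

def build_key_retrieval_prompt(num_filler, key, seed=0):
    # hide one constant-returning function among filler functions, then ask
    # for its value at the very end; the gap probes long-context recall
    rng = random.Random(seed)
    lines = [f"def f{i}(): return {rng.randrange(10_000)}" for i in range(num_filler)]
    lines.insert(rng.randrange(len(lines) + 1), f"def get_key(): return {key}")
    lines.append("# Question: what value does get_key() return?")
    return "\n".join(lines)

prompt = build_key_retrieval_prompt(num_filler=50, key=1234)
assert "def get_key(): return 1234" in prompt
```

Sweep `num_filler` upward and check whether the model still answers correctly to estimate the usable context length.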

๊ฒฐ๊ณผ์ ์œผ๋กœ ํŒŒ์ธํŠœ๋‹ํ•œ 16K context ๋‚ด์—์„œ๋Š” ์ž˜ ๋˜๋Š” ๊ฒƒ์œผ๋กœ ๋ณด์ด๊ณ , ๊ทธ๊ฑธ ๋„˜์–ด๊ฐ€๋ฉด ์™„์ „ํžˆ ์•ˆ ๋˜๋Š” ๊ฒƒ ๊ฐ™์ง€๋Š” ์•Š์€๋ฐ ๊ฑฐ์˜ ์•ˆ ๋˜๋Š” ๊ฒƒ ๊ฐ™์€ ๊ฒฝ์šฐ๋„ ๋ฐœ์ƒํ•œ๋‹ค. perplexity ๊ฐ์†Œ์™€๋Š” ๋ณ„๊ฐœ๋กœ ์›ํ•˜๋Š” ๋Œ€๋กœ ์›€์ง์—ฌ์ฃผ์ง€๋Š” ์•Š๋Š” ๊ฒƒ ๊ฐ™๋‹ค.

๊ทธ ์ด์œ ๊ฐ€ ๋ฌด์—‡์ผ๊นŒ? ์•Œ๊ธฐ๋Š” ์–ด๋ ต์ง€๋งŒ attention์ด extrapolation ์ƒํ™ฉ์—์„œ ๋ง๊ฐ€์ง€์ง€ ์•Š๋Š”๋‹ค๋Š” ๊ฒƒ๊ณผ ํ•จ๊ป˜ attention์ด long context ์ƒํ™ฉ์—์„œ๋„ ๊ฐ ํ† ํฐ์„ ์ž˜ ๊ตฌ๋ถ„ํ•  ์ˆ˜ ์žˆ๋Š” ๋Šฅ๋ ฅ์ด ํ•„์š”ํ•˜์ง€ ์•Š์€๊ฐ€ ์‹ถ๋‹ค. ํ† ํฐ ์ž„๋ฒ ๋”ฉ์„ ๊ทธ๋ƒฅ ํ‰๊ท  ๋‚ด๊ธฐ๋งŒ ์˜๋ฏธ๊ฐ€ ์žˆ๋Š” ๊ฒƒ์ฒ˜๋Ÿผ, attention์ด ํ† ํฐ๋“ค์„ ๋Œ€๊ฐ• ๋ญ‰๋šฑ๊ทธ๋ฆฐ๋‹ค๊ณ  ํ•ด๋„ ์˜๋ฏธ๋Š” ์žˆ์„ ๊ฐ€๋Šฅ์„ฑ์ด ์žˆ๊ณ , ์„ฑ๋Šฅ์  ํ–ฅ์ƒ์ด ์žˆ์„ ์ˆ˜๋„ ์žˆ๋‹ค. ๊ทธ๋ ‡์ง€๋งŒ ์šฐ๋ฆฌ๊ฐ€ ์›ํ•˜๋Š” ๊ฒƒ์ฒ˜๋Ÿผ ํ† ํฐ๋“ค์„ ์„ธ๋ถ€์ ์œผ๋กœ ๊ตฌ๋ถ„ํ•ด์„œ ๋ฐ˜์˜ํ•˜๋Š” ์ •๋„์˜ ๋Šฅ๋ ฅ์€ ๋ณด์—ฌ์ฃผ์ง€ ๋ชปํ•  ์ˆ˜๋„ ์žˆ๋‹ค. (https://arxiv.org/abs/2212.10554) ๊ทธ๋ž˜์„œ positional embedding์— ๋Œ€ํ•œ ์ดํ•ด๊ฐ€ ์ข€ ๋” ํ•„์š”ํ•  ๋“ฏ ์‹ถ๋‹ค.

์ด๋ ‡๊ฒŒ ๋ชจ๋ธ์ด long context๋ฅผ ์ž˜ ๋ชจ๋ธ๋ง ํ•  ์ˆ˜ ์žˆ๋Š”๊ฐ€์™€๋Š” ๋ณ„๊ฐœ๋กœ long context์— ๋Œ€ํ•ด ํ•™์Šต์„ ์‹œํ‚ฌ ์ˆ˜ ์žˆ๋Š”๊ฐ€ ํ•˜๋Š” ๊ฒƒ๋„ ๋ฌธ์ œ๊ฐ€ ๋œ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด data parallel์˜ ๋ฐฐ์น˜ ์ถ•์œผ๋กœ ์ƒ˜ํ”Œ๋“ค์„ ์ชผ๊ฐœ๋Š” ๊ฒƒ์ฒ˜๋Ÿผ sequence ์ถ•์œผ๋กœ๋„ ์ƒ˜ํ”Œ์„ ์ชผ๊ฐœ์„œ parallelํ•˜๊ฒŒ forward ํ•  ์ˆ˜ ์žˆ๋‹ค๋ฉด ์–ด๋–จ๊นŒ ํ•˜๋Š” ์ƒ๊ฐ์„ ํ•ด๋ณผ ์ˆ˜๋„ ์žˆ๊ฒ ๋‹ค. ์‚ฌ์‹ค ํŠธ๋žœ์Šคํฌ๋จธ๋Š” attention์„ ์ œ์™ธํ•œ ๋‹ค๋ฅธ ๋ชจ๋“  ๋ ˆ์ด์–ด๋Š” sequence ๋ฐฉํ–ฅ์— ๋…๋ฆฝ์ ์ด๊ธฐ ๋•Œ๋ฌธ์— attention๋งŒ ์–ด๋–ป๊ฒŒ ํ•˜๋ฉด(?) ๊ฐ€๋Šฅํ•  ์ˆ˜ ์žˆ๋‹ค.

Megatron-LM (https://arxiv.org/abs/2205.05198) also includes a form of sequence parallelism, but I would say that one is more about splitting the activations produced by layer norm and the like than about attention itself. For splitting attention outright, ring self-attention (https://arxiv.org/abs/2105.13120) was proposed, and more recently a simpler method using all-to-all communication landed in DeepSpeed. (https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-ulysses) You do need to feed in the data with the sequence split up, but beyond that, the all-to-all method is genuinely simple to implement. (https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/sequence/layer.py) Scatter q, k, and v with an all-to-all, then restore the output with another all-to-all.
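The all-to-all dance can be simulated without any distributed framework: each rank starts with one sequence shard of every head, one all-to-all regroups the data so each rank holds the full sequence for a subset of heads (so ordinary attention can run locally), and a second all-to-all on the output would restore the original layout. A toy simulation (the shapes are illustrative):

```python
def all_to_all(send):
    # send[r][p]: the chunk rank r sends to rank p; after the exchange,
    # rank p holds [send[0][p], send[1][p], ...]
    ranks = len(send)
    return [[send[r][p] for r in range(ranks)] for p in range(ranks)]

ranks, heads, seq = 2, 4, 8
shard = seq // ranks   # each rank starts with 4 of the 8 positions
group = heads // ranks # and ends up owning 2 of the 4 heads

# start[r][h] = rank r's local positions of head h, labeled (head, position)
start = [[[(h, r * shard + t) for t in range(shard)] for h in range(heads)]
         for r in range(ranks)]
# split the head axis so rank p receives heads p*group .. p*group+group-1
send = [[start[r][p * group:(p + 1) * group] for p in range(ranks)]
        for r in range(ranks)]
recv = all_to_all(send)
# concatenate the sequence chunks: rank p now sees the FULL sequence
# for its head group, so ordinary attention can run locally
full = [[sum((chunk[h] for chunk in recv[p]), []) for h in range(group)]
        for p in range(ranks)]
assert full[0][0] == [(0, t) for t in range(seq)]
assert full[1][0] == [(2, t) for t in range(seq)]
```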
Since starting to study AI this March, I have come to expect it will have an even bigger impact than computers have had across every industry today. Imagining how fast, and in what ways, the next 3-5 years will change, and helping to create that change, feels genuinely exciting.

์ฐฝ์—…์ž์˜ ๊ด€์  ์ด์™ธ์—๋„ ํˆฌ์ž์ž์˜ ๊ด€์ ์—์„œ ์ด ๋ณ€ํ™”๋ฅผ ์–ด๋–ป๊ฒŒ ๋ฐ”๋ผ๋ณด๋ฉด ์ข‹์„๊นŒ์š”? ์ธํ„ฐ๋„ท, ๋ชจ๋ฐ”์ผ, ํด๋ผ์šฐ๋“œ ์›จ์ด๋ธŒ๋ฅผ ์˜ค๋žซ๋™์•ˆ ๊ฒฝํ—˜ํ•˜์‹  Storm Ventures์˜ ๋‚จํƒœํฌ ๋Œ€ํ‘œ๋‹˜์„ ๋ชจ์‹œ๊ณ  'AI ์‹œ๋Œ€ ์–ด๋””์— ํˆฌ์žํ•ด์•ผ ํ• ๊นŒ?'์— ๋Œ€ํ•ด์„œ ์ด์•ผ๊ธฐํ•ด ๋ณด๋ ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

์ผ๋ฐฉ์ ์ธ ๊ฐ•์˜๋ณด๋‹ค๋Š” AI ํˆฌ์ž์— ๋Œ€ํ•ด์„œ ๊ฐ€์ง€๊ณ  ์žˆ๋Š” ์—ฌ๋Ÿฌ ์ƒ๊ฐ๋“ค์„ ์ž์œ ๋กญ๊ฒŒ ๋‚˜๋ˆŒ ์ˆ˜ ์žˆ๋Š” ์ž๋ฆฌ๋กœ ๋งŒ๋“ค์–ด๋ณด๋ ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค. AI์™€ ํˆฌ์ž ๋‘ ๊ฐ€์ง€์— ์ง„์‹ฌ์ด์‹  ๋ถ„๋“ค์„ ๋ชจ์‹œ๋‹ˆ ๋งŽ์€ ๊ด€์‹ฌ ๋ถ€ํƒ๋“œ๋ ค์š” ๐Ÿค—

[Where should we invest in the AI era? _ Storm Ventures x AGI Town in Seoul]

AI ๊ธฐ์ˆ ์˜ ๋ฏธ๋ž˜์™€ ํˆฌ์ž์— ๊ด€ํ•œ ์ค‘์š”ํ•œ ํ† ๋ก ์„ ์œ„ํ•œ ๋ฐ‹์—…์„ ์ฃผ์ตœํ•ฉ๋‹ˆ๋‹ค. ์Šคํ†ฐ๋ฒค์ฒ˜์Šค(Storm Ventures)์˜ ๋‚จํƒœํฌ ๋Œ€ํ‘œ๋‹˜์„ ๋ชจ์‹œ๊ณ , AI ํˆฌ์ž์™€ ์ฐฝ์—…์— ๊ด€์‹ฌ ์žˆ๋Š” ๋ถ„๋“ค๊ณผ ํ•จ๊ป˜ ์˜๊ฒฌ์„ ๋‚˜๋ˆŒ ์˜ˆ์ •์ž…๋‹ˆ๋‹ค.

๐Ÿ“… ์ผ์‹œ: 2023๋…„ 9์›” 4์ผ, ์˜คํ›„ 7-9์‹œ
๐Ÿ“ ์žฅ์†Œ: ํŒ€์ŠคํŒŒ๋ฅดํƒ€ ์˜คํ”ผ์Šค (https://goo.gl/maps/Ec88AykC21ZWr7jL7)
๐ŸŽค ํƒ€์ž„ํ…Œ์ด๋ธ”:
- ์ฐธ์—ฌ์ž ์†Œ๊ฐœ (30๋ถ„)
- ๋‚จํƒœํฌ ๋Œ€ํ‘œ๋‹˜: AI ํŠธ๋ Œ๋“œ์™€ ๊ธฐํšŒ (30๋ถ„)
- Q&A ๋ฐ ์ž์œ ํ† ๋ก 

์ขŒ์„์€ 20์„์œผ๋กœ ํ•œ์ •๋˜์–ด ์žˆ์œผ๋ฉฐ, ์ฐธ๊ฐ€ ํ™•์ •์€ 9์›” 2์ผ๊นŒ์ง€ ์ด๋ฉ”์ผ๋กœ ์•Œ๋ ค๋“œ๋ฆฝ๋‹ˆ๋‹ค. ์ด ํ–‰์‚ฌ๋Š” ์˜์–ด๋กœ ์ง„ํ–‰๋ฉ๋‹ˆ๋‹ค.

This session was made possible with the help of @Minjoo Kim 🙏

👉 Register: https://forms.gle/2Sbg1RLVsiL24JcW8

Notes I put together last March: https://www.notion.so/matthewcontinuouslearning/AI-Trend-101-March-28-723c41aa1ca54903a270c6801b3724fe?pvs=4
Watching the recent AI product announcements from several big tech companies, I get the following impressions:

OpenAI: My way, no matter what anyone says

MS: Further entrenching its position in productivity tools, MS's bread and butter

Google: Ah, screw it, for now we'll just do everything everyone else is doing

Meta: Looting the empty house

Amazon: We're on whichever side is winning

🤣🤣
Do we really need a dedicated vector store?

This new study suggests that "from a simple cost-benefit analysis, there does not appear to be a compelling reason to introduce a dedicated vector store into a modern 'AI stack' for search, since such applications have already received substantial investments in existing, widely deployed infrastructure."

There are definitely cost benefits with the proposed alternative (HNSW indexes in Lucene). There is a nice analysis/comparison with alternatives in the paper. Not sure how widely applicable the insights from the experimental results are but still a great read, especially if you are looking to integrate LLMs with external knowledge or memory.

It's also interesting to see the use of Lucene as a counterpoint. I've used Lucene-dependent solutions in the past but they have been notably slow to adapt to new trends in representation learning. That is changing fast.
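For scale, the search primitive in question is just nearest-neighbor lookup over embedding vectors; HNSW (whether in Lucene or a dedicated store) is an approximate index over the exhaustive scan below. A stdlib-only sketch:

```python
import math

def cosine(a, b):
    # cosine similarity between two dense vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def knn(query, corpus, k=2):
    # exhaustive scan over every document; an HNSW index returns
    # (approximately) the same top-k without scoring the whole corpus
    ranked = sorted(corpus, key=lambda doc_id: cosine(query, corpus[doc_id]), reverse=True)
    return ranked[:k]

corpus = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
assert knn([1.0, 0.0], corpus) == ["a", "b"]
```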

paper: https://arxiv.org/abs/2308.14963

I also provide weekly summaries of the latest and most important AI research and developments here: https://nlp.elvissaravia.com/
What makes America great, I think, is that companies worth ₩10 trillion keep appearing. A ₩10-trillion company would rank around 30th if it entered KOSPI today... During COVID it was valued at nearly ₩90 trillion, so it has fallen a long way from its peak, but it is still ₩10 trillion. The founder is candid, too, and his conversational style is refreshing. Besides market size, what else makes America keep producing great companies like this?

https://youtu.be/9TmnCo8zhCA?si=fXBcjtc-TCAcx1Iu
๐Ÿ‘1
๋ชจ์ž„์˜ ์งง์€ ์š”์•ฝ ใ…Žใ…Ž

https://trevari.co.kr/events/show?eventID=3017cd79-5bd1-4316-9c45-a070fa084bdd

์ˆ˜๋ฉด ์˜์–‘ ์šด๋™ -> ์‚ฌ๋žŒ ์ฑ…-> ๋ณต๋ฆฌํšจ๊ณผ -> ํ€„๋ฆฌํ‹ฐ์žˆ๋Š” ์˜์‚ฌ๊ฒฐ์ • -> ๋ ˆ๋ฒ„๋ฆฌ์ง€ -> ์šด๊ณผ ๋ฆฌ์Šคํฌ ํ…Œ์ดํ‚น
==> ์ธ์ƒ์—์„œ ์›ํ•˜๋Š”๊ฒƒ

1. Sleep, nutrition, and exercise are the soil of life. Get these right and you do everything else better.
2. Keeping good people and good books close builds the skill to make good decisions.
3. Good decisions are what let you take risks well, and luck can follow.
4. With leverage, wealth grows by big leaps a few times over.
5. Work to raise your luck. Be kind and do your best for the people around you; it is an area you build up day by day.
6. Luck, good decisions, and leverage all compound (compound interest).
7. Ultimately, what matters is living well while doing what you want in life. Health -> wealth -> mission
โค1