Exploring the limits of current AI architectures, and why "more data" is unlikely to get us to "AGI"

Fady Bark
Hi all! I've been hearing "scale won't get us to AGI" a lot recently. The idea is that, past a certain point, feeding more training data to an LLM no longer produces big performance gains, and this has led many to question whether the transformer architecture is capable of reaching AGI at all.

Transformers improved on their predecessors in two main ways: they can process an entire sequence in parallel, and, perhaps more importantly, they assign importance to words through an attention layer. This is what lets LLMs handle text in various formats, translate, write, and do other seemingly creative things. Earlier recurrent models processed tokens one at a time, so a word's influence was determined largely by the order in which words appear; that's good enough for very simple tasks, but human language rarely works that way. With the attention mechanism, researchers were able to mimic something like human attention. The architecture itself was first described in the 2017 paper "Attention Is All You Need". (There's a minimal sketch of the attention computation at the end of this post.)

ChatGPT, which is built on the transformer architecture, generated a lot of buzz, but it soon became apparent that a lot of work remains before AI becomes what everyone envisions it to be. Critics often rightly point out that ChatGPT and similar models are especially lacking in reasoning: at bottom, they predict the most likely "next token" given a prompt, aka completion (there's a second sketch of that loop at the end of the post). This (kinda) straightforward approach has gotten us pretty far. ChatGPT may not be able to unify quantum mechanics and general relativity yet, but we've come a long, long way. As always, we want more, and it seems we've reached the near-limit of what transformer-based models alone can do.

Fast forward to today: I came across an interesting Reuters article reporting that OpenAI is working on a new model called "Strawberry" that specializes in "advanced reasoning". Not much is known about the model or its architecture, and the company is said to be secretive about it even within its own ranks, but the idea seems to be a model that doesn't just respond to prompts but also plans many steps ahead, connected to a computer-using agent (CUA); the company says this is meant to let the model do "deep research". The news about this model is still pretty vague, though.

If this model is released, what can you think of that would be possible with it but impossible, or very impractical, with current LLM technology?
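For anyone who hasn't looked under the hood, here is the first sketch I mentioned: a minimal, single-head scaled dot-product attention computation in NumPy. The dimensions, weight matrices, and random "token" embeddings are toy values I made up purely for illustration; real models use many heads, learned weights, and much larger dimensions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence of embeddings X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how strongly each token relates to every other token
    weights = softmax(scores, axis=-1)   # per-token attention weights, each row sums to 1
    return weights @ V, weights          # output = weighted mix of value vectors

# Toy example: 4 "tokens" with 8-dimensional embeddings and an 8-dimensional head.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))

out, weights = self_attention(X, Wq, Wk, Wv)
print(weights.round(2))  # each row shows how much importance a token assigns to the others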
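And here is the second sketch: the "predict the most likely next token" loop, shown as greedy decoding with GPT-2 through the Hugging Face transformers library (this assumes torch and transformers are installed and will download the model weights). Production chat models use more elaborate sampling and post-training, so treat this only as an illustration of the completion idea, not as how ChatGPT actually runs.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The transformer architecture was introduced in"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids).logits           # shape: (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax()           # greedy: pick the single most likely next token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```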