Heterogeneous AI Systems — A Return to Rigor

After close to three years of rapid development in generative AI, the last couple of quarters have seen a paradigm shift. We are transitioning from a prompt-driven world — where LLMs responded to static instructions — to one dominated by agentic AI, where the vision is of systems that can autonomously make decisions, set goals, plan, and use tools.

This evolution reflects not just a technological shift but also a redefinition of what we expect intelligent systems to do. No longer confined to generating text or images, agents can be designed to operate in more complex environments, learning and adapting along the way.

This agentic paradigm introduces a key shift in use cases. The early promise of generative AI was in:

- Content creation

- Search enhancement

- Summarization

- Code generation

These are all primarily single-turn tasks. These models, rooted in transformer architectures and trained on vast corpora, excel at next-token prediction.

Now, as we seek to build more autonomous AI systems that can make decisions and act in enterprise use cases, token prediction alone is not sufficient for reasoning or decision-making. For multi-step, real-world workflows — for example, an agent that helps a manager resolve a blocker, follows up with a direct report, and proactively alerts when a project is at risk of going off-track — we need agents that can not just talk (like ChatGPT) but also use ML models to reason and take actions.
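A minimal sketch of what "acting, not just talking" means in code: the loop below repeatedly maps an observation to an action and dispatches to tools until it decides it is done. Every name here (the status-checking tool, the toy decision rule) is a hypothetical stand-in, not a real agent framework.

```python
# Minimal agent loop sketch: the policy picks the next *action*, not the
# next token. All tool and action names are illustrative stand-ins.

def check_status(project):
    # Stub "tool": a real system would query a project tracker here.
    return {"alpha": "blocked", "beta": "on-track"}[project]

TOOLS = {"check_status": check_status}

def policy(observation):
    # Toy decision rule standing in for a learned policy or planner.
    if observation is None:
        return ("check_status", "alpha")      # gather information first
    if observation == "blocked":
        return ("alert_manager", "alpha")     # act on what was observed
    return ("done", None)

def run_agent(max_steps=5):
    observation, trace = None, []
    for _ in range(max_steps):
        action, arg = policy(observation)
        trace.append(action)
        if action == "done":
            break
        if action in TOOLS:
            observation = TOOLS[action](arg)
        else:
            observation = "ack"               # e.g. alert was delivered
    return trace
```

The point of the sketch is the control flow: observe, decide, act, repeat — a loop that a single forward pass of token prediction does not give you.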

In enterprise and industry, the smarter and more autonomous these systems become, the wider their scope of application across verticals. In healthcare, imagine better drug discovery, using reinforcement learning to navigate chemical spaces and design novel molecules. In finance, agentic systems for portfolio optimization, fraud detection, and real-time market analysis, where adaptability and decision-making are crucial. Similarly, in project management, agentic tools can manage workflows, assign tasks, optimize schedules, and coordinate interdependencies across teams.

This new class of systems requires more than powerful LLMs. Predicting the next token in a sentence — the foundation of most generative models — is fundamentally different from reasoning over multiple steps or making decisions with incomplete information.

Agentic systems must incorporate elements of classical AI — reasoning, planning, inference, and memory — alongside modern deep learning.

In other words, we are now entering a new phase that prioritizes models capable of inference, reasoning, and decision-making, drawing on core machine learning principles that long predate the transformer revolution. These systems should be able to interact with environments and make inferences from unstructured data. They do not operate as pattern completers the way LLMs do: they predict not the next token but the next action to take, given the full picture of signals, rewards, and goals. This is reasoning. In the long term, we need models that go beyond curve fitting — models that represent knowledge explicitly, perform causal inference, and learn to act.
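One way to see the difference between predicting tokens and predicting actions: an agent can score candidate actions by expected utility under its current belief about the world, then pick the best one. The states, actions, and numbers below are invented purely for illustration.

```python
# Toy illustration of "predicting the next action" as decision-making:
# choose the action with highest expected utility under a belief over
# states, rather than the most likely next token.

belief = {"on_track": 0.7, "at_risk": 0.3}         # P(state | signals so far)

utility = {                                         # U(action, state)
    ("wait",  "on_track"):  1.0, ("wait",  "at_risk"): -5.0,
    ("alert", "on_track"): -0.5, ("alert", "at_risk"):  4.0,
}

def best_action(belief, utility):
    actions = {a for a, _ in utility}
    def expected_utility(a):
        return sum(p * utility[(a, s)] for s, p in belief.items())
    return max(actions, key=expected_utility)
```

With these numbers, alerting has expected utility 0.85 versus -0.8 for waiting, so the agent alerts even though "on track" is the more likely state — exactly the kind of risk-weighted choice next-token prediction does not express.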

The future of AI is thus not simply about scaling transformers, but composing heterogeneous systems — where different components are responsible for language, vision, action, reasoning, and memory. These architectures are closer to the original vision of AI: systems that understand, reason, and generalize.

LLMs will still play a role, particularly as language interfaces or semantic retrievers and particularly for use cases where generation of content is still the main goal (e.g. code gen, search, research, product documentation) or where a pre-defined set of instructions can be followed (e.g. in simpler use cases within customer service and sales). But the intelligence of the future will not be monolithic. It will be modular, multi-model, and grounded in models that reason, not just predict. This is about integrating the best of classical AI with the power of modern learning systems — to build machines that not only talk, but think.

This shift also addresses the alignment and safety concerns inherent in autonomous agents. Unlike LLMs, which often hallucinate, symbolic systems can be verified and audited. Reasoning engines can be constrained by formal logic. Reinforcement learning can be bounded by safety layers or reward shaping. This makes them more viable for domains like autonomous vehicles, finance, or medicine, where the stakes are higher and non-deterministic outputs are not acceptable.
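As a toy illustration of such a safety layer, the sketch below wraps a stand-in policy with an auditable whitelist that vetoes any action outside an allowed set. The action names and the policy itself are hypothetical.

```python
# Sketch of a safety layer bounding an RL-style agent: a hard filter
# vetoes disallowed actions regardless of what the policy proposes.

ALLOWED = {"hold", "rebalance", "flag_for_review"}  # whitelist, auditable

def raw_policy(state):
    # Stand-in for a learned policy that may propose anything.
    return "liquidate_all" if state == "panic" else "hold"

def safe_policy(state):
    proposal = raw_policy(state)
    if proposal not in ALLOWED:
        return "flag_for_review"   # deterministic, verifiable fallback
    return proposal
```

Because the filter is a plain set-membership check, its behavior can be verified exhaustively — the kind of guarantee a sampled LLM output cannot offer.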

The Return of Structured Models: Inference and Representation

A new wave of AI systems is returning to core ideas from probabilistic modeling, symbolic reasoning, and structured learning. For instance:

  • Bayesian Networks and Markov Random Fields provide a formalism for encoding conditional dependencies and performing exact or approximate inference using methods like belief propagation, variational inference, or MCMC sampling. These are especially powerful in decision support systems, such as medical diagnosis or financial risk modeling.

  • Causal Inference Frameworks (e.g., Pearl’s Structural Causal Models) enable reasoning about interventions (do-calculus) and counterfactuals — essential for scientific discovery, policy analysis, and AI alignment. Models like Invariant Risk Minimization (IRM) and causal graph neural networks seek to learn representations that generalize across environments.

  • Graph Neural Networks (GNNs) and message-passing architectures enable reasoning over relational structures — from molecules to social networks — where the topology encodes inductive bias. GNNs are already outperforming LLMs in domains like drug discovery, traffic prediction, and knowledge graph completion (see: Wu et al., 2020).

  • Energy-Based Models (EBMs), like Hopfield Networks and modern variants based on score matching (e.g., denoising score matching), model joint distributions and support inference via optimization. These are being revisited for tasks like program synthesis, scene understanding, and planning.

  • Reinforcement Learning (RL) remains the dominant framework for sequential decision-making under uncertainty. Approaches such as model-based RL (e.g., MuZero), offline RL, and hierarchical RL are increasingly integrated with symbolic representations or planning graphs to bridge learning and reasoning.

  • Hybrid Neuro-Symbolic Systems, like the Neuro-Symbolic Concept Learner (MIT-IBM Watson AI Lab) or DeepProbLog, blend differentiable modules with logical inference engines. These systems exhibit systematic generalization and are more data-efficient than purely neural models in tasks requiring abstraction and composition.
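To make the first of these concrete, here is a minimal exact-inference sketch for the smallest possible Bayesian network — a two-node Disease → Test graph, the shape of a toy diagnostic model. The posterior is computed by enumerating the single latent variable; on tree-structured graphs, belief propagation generalizes exactly this computation. The probabilities are invented for illustration.

```python
# Exact inference in a two-node Bayesian network (Disease -> Test)
# by enumeration. Numbers are made up for the example.

P_disease = {True: 0.01, False: 0.99}       # prior P(disease)
P_pos_given = {True: 0.95, False: 0.05}     # likelihood P(test=+ | disease)

def posterior_disease_given_positive():
    # Bayes' rule: enumerate the latent variable, then normalize.
    joint = {d: P_disease[d] * P_pos_given[d] for d in (True, False)}
    evidence = sum(joint.values())           # P(test=+)
    return joint[True] / evidence
```

Even with a 95%-accurate test, the posterior is only about 16% because the prior is so low — the kind of calibrated, auditable answer that structured probabilistic models provide and that free-form generation does not.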

LLMs No Longer The Center of Gravity for Applied AI

The dominant architectural pattern emerging now is not a single model that does everything, but multi-agent systems or modular cognitive architectures, inspired by classical models like SOAR or ACT-R. These systems use LLMs as language interfaces or weak generalists, surrounded by specialized modules for planning, memory, reasoning, or environment simulation.

This aligns with recent work in agentic architectures like:

  • AutoGen (Microsoft)

  • CAMEL (Communicative Agents for "Mind" Exploration of Large Language Model Society)

  • Voyager (Embodied LLM agent for Minecraft)

  • Eureka (LLM-driven reward design for learning behaviors in simulation)

In each of these, language models are used for high-level intent expression, while execution and reasoning rely on external modules or controllers that are optimized for structure, adaptability, and correctness.
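This division of labor can be sketched in a few lines: a stubbed "language model" only parses intent, while separate planner and executor modules carry the reasoning and action. Every component below is a toy stand-in, not the API of any real framework.

```python
# Sketch of the modular pattern: LLM as language interface, with
# dedicated modules for planning and execution. All parts are stubs.

def llm_parse_intent(utterance):
    # Stand-in for a language model used only to extract structured intent.
    if "schedule" in utterance:
        return {"goal": "schedule_meeting", "with": "team"}
    return {"goal": "unknown"}

def plan(intent):
    # A symbolic planner would search here; this returns a fixed plan.
    if intent["goal"] == "schedule_meeting":
        return ["find_free_slot", "send_invites"]
    return []

def execute(steps):
    # Executor: carries out each planned step via tools/controllers.
    return [f"done:{s}" for s in steps]

def handle(utterance):
    return execute(plan(llm_parse_intent(utterance)))
```

The design choice to note: the language model's output is confined to a structured intent, so correctness of planning and execution can be tested independently of the model's generation quality.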

The next wave of progress in AI will come from reintegrating theoretical foundations of inference, structure, and decision-making into our systems. While incredibly powerful tools, LLMs no longer have to be the center of gravity for applied AI.

Instead, the future is a hybrid AI stack — one that combines language, vision, planning, simulation, and symbolic manipulation into systems that can not only generate, but reason, adapt, and decide.

The road ahead will echo the original ambitions of AI: not just mimicking intelligence, but understanding it.
