Why Optimization Is the Engine Behind AI
Optimization is at the heart of modern artificial intelligence. Even though it might seem redundant to revisit this topic after having studied it in courses like “Optimization Methods and Game Theory,” its centrality in AI justifies another look. In fact, optimization is the underlying mechanism that enables the vast majority of learning algorithms. Only a small subset of AI problems relies on methods like constraint satisfaction or other non-optimization-based techniques. For everything else—especially in learning—optimization is key.
Take deep neural networks as an example. Their success in fields ranging from computer vision to natural language processing is largely due to effective optimization algorithms, typically deterministic and local in nature.
Understanding Multi-Agent Systems
One of the most general models that unifies many AI paradigms is the multi-agent system. These are systems composed of multiple agents that can interact with their environment: they gather information, take actions, and sometimes receive feedback in the form of rewards. This framework is highly suitable for modeling both multi-objective optimization problems and reinforcement learning scenarios.
Exploring the Types of Optimization
Optimization problems can be categorized in several ways:
- Deterministic vs. Stochastic
- Local vs. Global
- Single-agent vs. Population-based
- Heuristic vs. Meta-heuristic
- Single-objective vs. Multi- or Many-objective
Local vs. Global Search
Local optimization techniques focus on finding solutions within a nearby region of the search space. Common deterministic methods include gradient descent, Newton's method, and adaptive gradient variants such as Adam and AdaMax. These methods work well in many contexts but may get trapped in local minima.
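As a minimal sketch of the local, deterministic idea, here is plain gradient descent on a small quadratic example; the test function, learning rate, and stopping rule are illustrative assumptions rather than part of any specific method above.

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, tol=1e-8, max_iter=1000):
    """Repeatedly step in the direction of the negative gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = lr * grad(x)
        x = x - step
        if np.linalg.norm(step) < tol:   # stop when the update becomes negligible
            break
    return x

# Example: minimize f(x, y) = (x - 3)^2 + 2*(y + 1)^2, whose minimum is at (3, -1)
grad_f = lambda v: np.array([2 * (v[0] - 3), 4 * (v[1] + 1)])
print(gradient_descent(grad_f, x0=[0.0, 0.0]))
```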
In contrast, global optimization methods aim to explore the entire search space. Deterministic global algorithms, such as those based on space-filling curves, are effective in low-dimensional problems and often rely on properties like Lipschitz continuity.
Deterministic and Stochastic Approaches
Deterministic search strategies typically follow a fixed path given an initial state. They are often used in local optimization, although exceptions exist: stochastic gradient descent, for instance, is a local method that introduces randomness, which can help it escape local minima.
Stochastic search methods, on the other hand, rely on random sampling. These are more commonly used in global optimization. One of the earliest and most notable stochastic algorithms is Simulated Annealing, which remains relevant today, especially in the context of quantum computing.
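A minimal sketch of Simulated Annealing for a one-dimensional problem follows; the neighbourhood move, the geometric cooling schedule, and all parameter values are assumptions made for illustration, not a reference implementation.

```python
import math
import random

def simulated_annealing(f, x0, step=0.5, t0=1.0, cooling=0.995, n_iter=5000):
    """Accept worse moves with a probability that shrinks as the temperature cools."""
    x, fx, t = x0, f(x0), t0
    best_x, best_fx = x, fx
    for _ in range(n_iter):
        candidate = x + random.uniform(-step, step)          # random neighbour
        fc = f(candidate)
        if fc < fx or random.random() < math.exp(-(fc - fx) / t):
            x, fx = candidate, fc                            # accept the move
            if fx < best_fx:
                best_x, best_fx = x, fx
        t *= cooling                                         # cool down
    return best_x, best_fx

# Example: a function with several local minima
f = lambda x: 0.1 * x**2 + math.sin(3 * x)
print(simulated_annealing(f, x0=5.0))
```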
Meta-Heuristics: A Modern Evolution
Historically, algorithms that relied on intuitive rules or domain-specific shortcuts to find solutions—especially in complex or poorly understood spaces—were known as heuristics. These approaches aim to produce good-enough solutions efficiently, without guaranteeing optimality. Examples include greedy algorithms or rule-based methods.
Over time, researchers began developing more sophisticated strategies that could adapt and guide these basic heuristics, especially in large and complex search spaces. These are called metaheuristics—a term introduced by Fred Glover in the 1980s. A metaheuristic is not a standalone solution method, but a framework that orchestrates the search process, balancing exploration and exploitation, often by leveraging randomness. Its role is to steer the underlying heuristics toward better solutions, improving both the quality and diversity of the results.
In essence, while heuristics are problem-specific shortcuts, metaheuristics are general-purpose strategies designed to enhance and coordinate them effectively.
These methods are particularly effective for global optimization tasks. Many of them are inspired by nature and include algorithms like Genetic Algorithms, Ant Colony Optimization, and Particle Swarm Optimization. Each of these draws on biological principles to explore complex search spaces in clever, efficient ways.
Why Random Search Alone Isn’t Enough
A naive approach might suggest using pure random search to find optimal solutions. However, such methods are generally inefficient because they distribute their search effort uniformly across the entire space. While uniform sampling has its place—such as in Monte Carlo methods—it is not suitable for optimization problems where directed search is essential.
The Power of Swarm Intelligence
Swarm intelligence refers to the collective behavior of decentralized systems, whether natural or artificial. Introduced by Gerardo Beni and Jing Wang in 1989, this concept is rooted in the behavior of biological systems like ant colonies, bird flocks, and fish schools. These systems consist of simple agents that follow local rules without any central control. Nevertheless, their interactions give rise to surprisingly intelligent global behaviors.
Applications of swarm intelligence span a wide range, from robotics to forecasting, and even synthetic biology. The principles behind swarm intelligence also form the basis of several powerful optimization algorithms.
Our First Meta-Heuristic: Genetic Algorithms
Genetic Algorithms (GAs) are among the most influential metaheuristics. Inspired by Darwinian evolution, GAs mimic natural selection through processes like crossover and mutation.
Candidate solutions in GAs can be represented as binary strings or vectors of real numbers. Binary strings are easy to implement but may turn a continuous problem into a harder combinatorial one. Real-valued representations maintain the continuity of the problem space.
In GAs, crossover combines genetic material from two parent solutions to create offspring, while mutation introduces random changes. A key design choice is whether or not to use elitism—the practice of always carrying forward the best solutions. In the multi-objective case, elitism involves preserving all non-dominated solutions.
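To make the operators concrete for a real-valued representation, the sketch below uses arithmetic (blend) crossover and Gaussian mutation; these particular operators and their parameters are illustrative choices, and many alternatives (e.g. simulated binary crossover, polynomial mutation) are common.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def arithmetic_crossover(p1, p2):
    """Blend two real-valued parents into two complementary offspring."""
    alpha = rng.uniform()                       # random mixing coefficient
    c1 = alpha * p1 + (1 - alpha) * p2
    c2 = (1 - alpha) * p1 + alpha * p2
    return c1, c2

def gaussian_mutation(x, sigma=0.1, rate=0.2):
    """Perturb each gene with probability `rate` by zero-mean Gaussian noise."""
    mask = rng.uniform(size=x.shape) < rate
    return x + mask * rng.normal(0.0, sigma, size=x.shape)

p1, p2 = np.array([0.2, 0.8, 0.5]), np.array([0.9, 0.1, 0.4])
c1, c2 = arithmetic_crossover(p1, p2)
print(gaussian_mutation(c1), gaussian_mutation(c2))
```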
There are several versions of GAs:
- Steady-state algorithms that use mutation alone.
WHILE (terminating condition not met):
Choose one individual at random from the current population (optionally based on fitness)
Apply mutation to generate a new individual
Evaluate its fitness
Add the new individual to the current population
Remove the worst individual from the population
- Versions that include both crossover and mutation.
WHILE (terminating condition not met):
Choose two individuals at random from the current population (optionally based on fitness)
Apply the crossover operator to generate two offspring
Apply mutation to each offspring
Evaluate the fitness of both offspring
Add the offspring to the population
Remove the two worst individuals from the population
- Canonical generational models (not steady state) that produce new populations at each step.
Set t = 0
Randomly create an initial population Pt of size pop_size
Evaluate the fitness of each individual in Pt
WHILE (terminating condition not met):
Create a new population Qt from Pt (same size)
Evaluate the fitness of each individual in Qt
Combine Pt and Qt to form Rt (size = 2 * pop_size)
Select the best pop_size individuals from Rt to form Pt+1
Set t = t + 1
Before diving into selection and constraints, it’s important to understand how the new offspring population (Qt) is generated from the current parent population (Pt) in a generational genetic algorithm. The goal is to build a new population of the same size, one pair of offspring at a time. Here’s a pseudocode representation of the process:
WHILE (size(Qt) < pop_size):
Select the first parent using Binary Tournament Selection (BTS)
Select the second parent using BTS again
Apply crossover to generate two offspring
Apply mutation to each offspring
Add both offspring to Qt
A common selection method is Binary Tournament Selection (BTS). It compares two randomly chosen individuals and picks the one with the better fitness. If exactly one of them is infeasible (violates constraints), the feasible one wins; if both are infeasible, the one with the smaller constraint violation is chosen. This approach works well even when constraints are present.
Constraint handling can also involve penalty functions or changes to the mating process to prefer feasible individuals.
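A minimal sketch of Binary Tournament Selection with these feasibility rules is shown below; measuring infeasibility as a single non-negative "violation" number per individual is an assumption made for the sketch.

```python
import random

def binary_tournament(population, fitness, violation):
    """Pick two individuals at random and return the preferred one (minimization).

    Rules: a feasible individual beats an infeasible one; between two infeasible
    individuals the smaller violation wins; between two feasible individuals the
    better (lower) fitness wins.
    """
    a, b = random.sample(range(len(population)), 2)
    if violation[a] == 0 and violation[b] == 0:
        winner = a if fitness[a] <= fitness[b] else b
    elif violation[a] == 0 or violation[b] == 0:
        winner = a if violation[a] == 0 else b
    else:
        winner = a if violation[a] <= violation[b] else b
    return population[winner]
```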
Beyond Single Objectives: The Need for Multiobjective Optimization
Many real-world problems involve multiple objectives that can conflict. For instance, in designing a neural network, one might need to balance accuracy with complexity. Multiobjective optimization does not combine objectives into a single scalar value but rather seeks to explore the trade-offs between them.
Goal Programming
Before proceeding, it’s important to distinguish between Multi-Objective Optimization and Goal Programming (GP), as they rely on fundamentally different principles.
In Goal Programming, the user explicitly provides a desired target value for each objective function, collected in a goal vector $g = (g_1, g_2, \dots, g_m)$. The aim of the optimization is then to find a solution whose objective vector $(f_1(x), f_2(x), \dots, f_m(x))$ is as close as possible to these targets.
This is typically achieved by converting the multi-objective problem into a single-objective one through a process called scalarization. The most common approach involves calculating the Euclidean distance between the objective values of the current solution and the user-defined goal. The optimization process seeks to minimize this distance.
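Written out, and assuming the Euclidean distance mentioned above, the scalarized Goal Programming problem reads:

$$\min_{x \in D} \; \bigl\| F(x) - g \bigr\|_2 = \sqrt{\sum_{i=1}^{m} \bigl(f_i(x) - g_i\bigr)^2}$$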
Graphically, the goal can be pictured as a fixed point in the objective space, while the best current solution is another point. The algorithm evaluates how close the current point is to the goal and keeps refining the solution until that distance is minimized.
Because the method focuses solely on minimizing a scalar distance to a fixed point, the concept of Pareto dominance does not apply here. As a result, Goal Programming has more in common with traditional single-objective optimization techniques than with true multi-objective approaches that aim to generate a diverse set of trade-off solutions.
Pareto Dominance and the Efficient Frontier
A Multiobjective Optimization Problem (MOP) involves simultaneously optimizing two or more conflicting objective functions. It is typically expressed as: $$\min F(x) = (f_1(x), f_2(x), \dots, f_m(x)) \quad \text{subject to } x \in D$$ Here, $F(x)$ is the objective vector consisting of the individual functions $f_i(x)$, and $D$ is the search space, which can include discrete, continuous, or mixed variables.
By nature, many real-life problems have multiple objectives. Decision makers (DMs) or modellers often do not know how to combine them into a single one; what they want is to understand the trade-off relationships among these objectives.
In multiobjective optimization, a solution is said to dominate another if and only if it is at least as good in all objectives and better in at least one. The set of non-dominated solutions is called the Pareto Set (in the search space) and its image in the objective space is known as the Efficient Frontier. For example, with two objectives to minimize, a point $B$ that is no worse than $A$ in both objectives and strictly better in one dominates $A$, while $B$ and a point $C$ that is better in one objective but worse in the other are not comparable.
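For objectives that are all to be minimized, the dominance test is a short function; the numeric vectors used below are made-up values chosen only to reproduce the situation just described.

```python
def dominates(a, b):
    """True if objective vector `a` dominates `b` (all objectives minimized):
    `a` is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# B dominates A; B and C are mutually non-dominated (not comparable).
A, B, C = (2.0, 3.0), (1.0, 2.0), (0.5, 5.0)
print(dominates(B, A), dominates(B, C), dominates(C, B))  # True False False
```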
From a decision-making perspective, a rational decision maker is typically not interested in solutions that are not Pareto optimal, as there exist better trade-offs in the solution space.
A practical example: consider optimizing a neural network where one axis represents model complexity and the other represents validation error. Simple models are easy to deploy but less accurate; complex ones perform better but are harder to manage. The goal is to find solutions on the Pareto Frontier that offer the best trade-offs.
Challenges with Weighted Sums
Using a weighted sum of objectives to reduce a multiobjective problem to a single objective often fails to find solutions in concave regions of the Efficient Frontier. Some Pareto-optimal solutions are simply unreachable through this kind of linear scalarization.
For instance, points A and E on the frontier can be obtained by selecting an appropriate combination of weights. However, no combination of weights will generate point D, since it is located on a concave portion of the efficient set.
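For reference, the weighted-sum scalarization replaces the vector problem with

$$\min_{x \in D} \sum_{i=1}^{m} w_i f_i(x), \qquad w_i \ge 0, \quad \sum_{i=1}^{m} w_i = 1$$

Geometrically, each weight vector defines a family of parallel lines (hyperplanes) in objective space, and the scalarized optimum is where such a line first touches the attainable region; this contact can only occur on convex portions of the Efficient Frontier, which is why points like D cannot be reached for any choice of weights.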
There are three main strategies when dealing with multiple objectives:
- A-priori: define preferences before the search.
- A-posteriori: generate solutions first, then choose.
- Interactive: evolve preferences alongside the optimization process.
Archives and Salient Points
Some algorithms maintain external archives of non-dominated solutions. These archives can be either bounded or unbounded. They help in keeping track of progress and ensuring diversity in solutions.
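A minimal sketch of an unbounded archive update is given below; it assumes a dominance test like the one shown earlier, and a bounded archive would additionally prune entries (for example by crowding distance), which is omitted here.

```python
def dominates(a, b):
    """Minimization: `a` is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_archive(archive, candidate):
    """Insert `candidate` (an objective vector) unless it is dominated,
    and drop any archived vectors that the candidate dominates."""
    if any(dominates(a, candidate) for a in archive):
        return archive                      # candidate is dominated: archive unchanged
    return [a for a in archive if not dominates(candidate, a)] + [candidate]
```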
Objectives of Multiobjective Optimizers
In multi-objective optimization, the ultimate goal is to approximate the entire Efficient Frontier (EF), which represents the set of optimal trade-offs among objectives. However, this frontier is typically continuous and not composed of discrete points, which poses a fundamental limitation: no numerical algorithm can provide an infinite set of solutions to fully describe it.
In fact, the EF can be expressed analytically in closed mathematical form only in very limited, idealized cases. In practice, we must settle for a finite, high-quality approximation that is as close as possible to the true EF.
If the optimizer is not population-based, as in the case of Simulated Annealing, one practical approach is to repeatedly solve a scalarized version of the multi-objective problem using different combinations of weights $(w_1, \dots, w_m)$. This strategy yields a set of solutions, each optimized for a specific trade-off encoded in the weights.
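A sketch of this weight-sweep strategy for a two-objective case might look as follows; the use of `scipy.optimize.minimize` as the underlying single-objective solver is just a placeholder assumption, and any scalar optimizer (including Simulated Annealing) could be substituted.

```python
import numpy as np
from scipy.optimize import minimize  # stand-in single-objective solver

def weight_sweep(objectives, x0, n_weights=11):
    """Solve one scalarized problem per weight vector and collect the results."""
    results = []
    for w1 in np.linspace(0.0, 1.0, n_weights):
        w = (w1, 1.0 - w1)
        scalarized = lambda x: w[0] * objectives[0](x) + w[1] * objectives[1](x)
        res = minimize(scalarized, x0)
        results.append((w, res.x, (objectives[0](res.x), objectives[1](res.x))))
    return results

# Example: two simple conflicting objectives in one variable
f1 = lambda x: float((x[0] - 1.0) ** 2)
f2 = lambda x: float((x[0] + 1.0) ** 2)
for w, x_opt, f_vals in weight_sweep([f1, f2], x0=np.array([0.0])):
    print(w, x_opt, f_vals)
```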
Population-based optimizers, like Genetic Algorithms, offer a distinct advantage in this context. They can generate a diverse set of high-quality solutions along the Efficient Frontier in a single run of the algorithm, making them particularly effective for multi-objective problems.
As a result, the three meta-targets that an effective population-based multi-objective optimizer should aim to achieve are:
- Proximity: Generate solutions as close as possible to the true Efficient Frontier.
- Diversity: Explore the extremes of the frontier to capture the widest possible range of trade-offs.
- Uniformity: Distribute the solutions as evenly as possible along the frontier, which is especially difficult when the frontier is discontinuous.
These objectives are critical to ensure that decision-makers can analyze and choose among a rich, well-distributed set of optimal solutions.
Different optimizers have different strengths. One may find solutions close to the frontier but fail to cover it entirely. Another might cover a broad range but produce solutions far from optimal. A third could distribute solutions evenly but miss critical areas. Ideally, a good optimizer strikes a balance between these trade-offs.
Families of Multiobjective Genetic Algorithms
There are several main families of algorithms designed for multiobjective optimization:
- Pareto-Dominance Based MOEAs: NSGA-II, PAES, SPEA2, etc.
- Indicator-Based MOEAs: SMS-EMOA, HyPE, CHEA.
- Decomposition-Based: MOEA/D, MOGLS, C-MOGA.
Each family brings unique strengths depending on the problem structure and goals.