Code Optimization Techniques for Embedded Systems

How to Boost ESP32 Performance

November 8, 2025 by Alessandro Colucci

Optimizing code for embedded systems is a critical skill for engineers and developers working with constrained hardware. The ESP32, a powerful and versatile microcontroller, is widely used in IoT, robotics, and sensor applications. While it offers impressive capabilities, achieving peak performance requires a deep understanding of code optimization techniques, memory management, and timing considerations.

Although the ESP32 is powerful, understanding how to efficiently use its resources ensures your projects run faster, consume less power, and behave predictably.

In this article, we will explore practical and advanced strategies to optimize your ESP32 firmware, from compiler-level tricks to algorithmic improvements, memory handling, and RTOS-level optimizations. By the end, you will have actionable methods to improve both speed and efficiency in your embedded projects.

This guide is structured to help you move from basic tweaks to deeper firmware strategies, giving you tangible improvements in your ESP32 projects.

 

Why Code Optimization Matters in Embedded Systems

Embedded systems, unlike general-purpose computers, are constrained by limited CPU speed, memory, and power. Poorly optimized code can lead to:

    • Slow response times in real-time applications
    • Increased power consumption
    • Unpredictable system behavior

Even minor inefficiencies can have a big impact in constrained environments.

By understanding these pitfalls, you can write code that not only works, but performs consistently under all conditions.

For instance, consider a sensor polling loop running on an ESP32:

void loop() {
    int sensorValue = analogRead(34); // read the ADC on GPIO 34
    delay(10);                        // blocks this core for 10 ms
}

This simple loop looks harmless, but delay() blocks the calling core; in multitasking or high-frequency scenarios it introduces latency and jitter that can affect critical timing.
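One common fix is to make the loop non-blocking: check whether the next sample is due instead of sitting in delay(). A minimal sketch, where sampleDue() is a hypothetical helper (the unsigned subtraction stays correct even when millis() wraps around):

```cpp
// Returns true when at least intervalMs has elapsed since `last`.
// Unsigned subtraction keeps this correct across millis() wraparound.
inline bool sampleDue(unsigned long now, unsigned long last,
                      unsigned long intervalMs) {
    return (unsigned long)(now - last) >= intervalMs;
}

// In the sketch above this replaces the blocking delay():
//
//   unsigned long lastSample = 0;
//   void loop() {
//       if (sampleDue(millis(), lastSample, 10)) {
//           lastSample = millis();
//           int sensorValue = analogRead(34);
//           // ... process sensorValue ...
//       }
//       // other work keeps running instead of blocking for 10 ms
//   }
```

The same pattern generalizes to any periodic task that must not stall the rest of the firmware.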

 

Understanding the ESP32 Hardware Architecture

Before optimizing, it’s essential to understand the ESP32 architecture:

Knowing how your microcontroller works internally helps you place code and data in the right memory, and schedule tasks efficiently.

    • Dual-core Xtensa LX6 CPUs: Allow parallel execution of tasks; mismanaging task placement across cores can cause contention. (Newer chips such as the ESP32-S3 use LX7 cores.)
    • Memory hierarchy:
      • IRAM (Instruction RAM): Fast memory for critical functions.
      • DRAM (Data RAM): General-purpose RAM.
      • Flash: Stores code and constants, slower access.
      • PSRAM (if available): External RAM on an SPI bus; adds capacity but is noticeably slower than internal DRAM.
    • Peripherals and caches: Knowing which memory regions are cached affects performance.

Placing critical routines in IRAM and minimizing cache misses improves responsiveness, especially in real-time applications.

ESP32 Architecture Image


This block diagram illustrates the main components of the ESP32, helping visualize where optimization matters most.

ESP32 SRAM Allocation


Understanding SRAM allocation allows you to place time-critical data in fast memory and avoid performance bottlenecks.

 

Compiler-Level Optimization Techniques

The ESP32 uses GCC via ESP-IDF or PlatformIO. Compiler optimizations are the first line of defense for code efficiency.

Even before touching your code logic, the compiler can help improve execution speed and reduce memory usage.

GCC Optimization Levels
Flag   | Focus               | When to use
-O0    | No optimization     | Debugging
-O1    | Basic optimizations | Quick builds with a modest speedup
-O2    | Balanced            | General-purpose optimization
-O3    | Maximum speed       | CPU-intensive loops
-Os    | Optimize for size   | Memory-limited applications
-Ofast | Aggressive speed    | May break standards-compliant (e.g. IEEE float) behavior

 

Choosing the right optimization flag balances speed, memory usage, and deterministic behavior. Profiling is key before changing flags.

In PlatformIO, you can set optimization flags in platformio.ini:

[env:esp32dev]
platform = espressif32
board = esp32dev
framework = arduino
build_flags = -O2 -flto

This allows you to fine-tune performance without changing source code.

Useful Qualifiers
    • inline: Suggests function inlining to reduce call overhead.
    • const and __restrict__ (GCC's C++ spelling of C99's restrict): Help the compiler optimize memory access by promising pointers do not alias.

Using these keywords helps the compiler produce faster, more efficient code.

inline int square(const int x) {
     return x * x;
}

Small changes like inlining frequently called functions can noticeably reduce execution time in critical loops.
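The restrict qualifier can be sketched like this (scale() is a hypothetical helper; __restrict__ is the GCC spelling usable in C++ code):

```cpp
// __restrict__ promises the compiler that dst and src never overlap,
// allowing more aggressive unrolling and scheduling of the loop.
void scale(int * __restrict__ dst, const int * __restrict__ src,
           int n, int factor) {
    for (int i = 0; i < n; ++i) {
        dst[i] = src[i] * factor;
    }
}
```

The promise is yours to keep: passing overlapping buffers to a restrict-qualified function is undefined behavior.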

 

Function Placement and Memory Optimization

Critical functions should be placed in IRAM for faster execution. ESP-IDF provides IRAM_ATTR:

void IRAM_ATTR onTimer() {
     // Time-critical ISR code
}

Placing interrupt service routines (ISRs) in IRAM reduces jitter and ensures deterministic timing.

 
Profiling ISR latency
hw_timer_t *timer = NULL;
volatile int64_t lastIsrTime = 0;
volatile int64_t maxJitterUs = 0;

void IRAM_ATTR onTimer() {
    int64_t now = esp_timer_get_time();              // microsecond timestamp
    if (lastIsrTime != 0) {
        int64_t jitter = (now - lastIsrTime) - 1000; // expected period: 1000 us
        if (jitter < 0) jitter = -jitter;
        if (jitter > maxJitterUs) maxJitterUs = jitter;
    }
    lastIsrTime = now;
}

void setup() {
    timer = timerBegin(0, 80, true);    // 80 MHz APB clock / 80 = 1 MHz tick
    timerAttachInterrupt(timer, &onTimer, true);
    timerAlarmWrite(timer, 1000, true); // fire every 1000 ticks = 1 kHz
    timerAlarmEnable(timer);
}

Timestamping each interrupt and tracking the worst-case deviation from the expected 1 ms period gives a direct measure of ISR jitter. (Note: the timer calls shown use the Arduino-ESP32 2.x API; core 3.x changed these function signatures.)

 
Static Data Placement
    • Store large, rarely changing arrays as const so the toolchain places them in flash (on ESP32, Arduino's PROGMEM is accepted for AVR compatibility, but const globals already reside in flash).
    • Use PSRAM for temporary buffers that exceed internal RAM.

const uint8_t lookupTable[256] PROGMEM = { /* values */ };

Efficiently placing data ensures fast access to frequently used values while keeping scarce RAM free for runtime operations.
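For the PSRAM case, ESP-IDF exposes heap_caps_malloc() with the MALLOC_CAP_SPIRAM capability flag. A sketch with a plain-malloc fallback so the snippet also builds on a host machine (the helper name is hypothetical):

```cpp
#include <cstdlib>
#ifdef ESP_PLATFORM
#include "esp_heap_caps.h"
#endif

// Allocate a large scratch buffer from external PSRAM when building for
// the ESP32; fall back to the regular heap elsewhere.
void *bigBufferAlloc(size_t size) {
#ifdef ESP_PLATFORM
    return heap_caps_malloc(size, MALLOC_CAP_SPIRAM);
#else
    return malloc(size);
#endif
}
```

Reserving PSRAM for bulk buffers keeps fast internal DRAM available for latency-sensitive data.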

 

Algorithmic Optimization

Micro-optimizations are useful, but algorithmic efficiency is paramount.

Replace Float with Integer Math
// Floating-point version
float average = (float)sum / count;

// Integer version (note: truncates the fractional part)
int average = sum / count;

The ESP32 has a hardware FPU for single-precision float, but double-precision math is emulated in software, and integer operations remain the cheapest of all, so favoring integer (or fixed-point) math still pays off in performance-sensitive loops.
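When the fractional part matters, fixed-point arithmetic keeps integer speed without discarding precision. A minimal sketch using a Q8 format with 8 fractional bits (the helper name is hypothetical):

```cpp
#include <cstdint>

// Q8 fixed-point average: the result is the true average scaled by 256,
// so 2.5 is represented as 640. All arithmetic stays in integers.
// Assumes sum * 256 fits in int32_t.
int32_t fixedAverageQ8(int32_t sum, int32_t count) {
    return (sum * 256) / count;
}
```

Downstream code works in the same scaled units and only converts back (divide by 256) at the boundary, e.g. when printing.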

 
Use Lookup Tables
// Example: sine wave lookup
const int16_t sineTable[360] = { /* precomputed values */ };
int getSine(int angle) {
    // ((a % 360) + 360) % 360 also wraps negative angles into [0, 360)
    return sineTable[((angle % 360) + 360) % 360];
}

Precomputing values avoids costly runtime calculations.
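One way to populate a table like this at startup, rather than hand-writing 360 constants (a sketch; initSineTable() is a hypothetical helper scaling sine to the int16_t range):

```cpp
#include <cmath>
#include <cstdint>

int16_t sineTable[360];

// Fill the table once at startup (e.g., in setup()); afterwards each
// lookup is a single array read instead of a sin() call.
void initSineTable() {
    const double kPi = 3.14159265358979323846;
    for (int a = 0; a < 360; ++a) {
        sineTable[a] = (int16_t)lround(sin(a * kPi / 180.0) * 32767);
    }
}
```

Paying the cost once at boot trades a few milliseconds of startup time for a constant-time lookup in every subsequent call.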

 
Avoid Redundant Calculations
// Before: the constant PI / 180.0 is recomputed on every iteration
for (int i = 0; i < n; i++) {
    float val = sin(i * PI / 180.0);
}

// After: hoist the loop-invariant constant out of the loop
const float step = PI / 180.0;
for (int i = 0; i < n; i++) {
    float val = sin(i * step);
}

Reduce repetitive computations to save cycles and improve loop efficiency.

 

RTOS-Level Optimization (FreeRTOS on ESP32)

ESP32 often runs FreeRTOS, where task scheduling impacts performance:

    • Assign task priorities correctly.
    • Pin tasks to cores to reduce contention.
    • Monitor stack usage and avoid over-allocation.

Proper task management ensures your critical code runs predictably and efficiently.

Example: Task Pinning
void highPriorityTask(void *pvParameters) {
    while(1) {
        // critical loop
        vTaskDelay(1);
    }
}

void setup() {
    xTaskCreatePinnedToCore(highPriorityTask, "HighTask", 2048, NULL, 2, NULL, 1);
}
 
Profiling Tasks
char statsBuffer[512];
vTaskGetRunTimeStats(statsBuffer); // per-task CPU time; requires configGENERATE_RUN_TIME_STATS

Profiling lets you identify bottlenecks and adjust tasks to maximize CPU efficiency.

 

Profiling and Benchmarking Tools

To optimize effectively, measure before you optimize:

    • esp_timer_get_time(): microsecond timing.
    • ESP-IDF logging and timer APIs (esp_log_level_set, the esp_timer component)
    • PlatformIO map files: analyze memory footprint.

Profiling provides insight into where optimization efforts will have the greatest impact.

Example: Benchmarking Function Execution
uint64_t start = esp_timer_get_time();
myCriticalFunction();
uint64_t end = esp_timer_get_time();
Serial.printf("Execution time: %llu us\n", end - start);

This simple measurement allows you to compare optimizations and verify improvements.

 

Power-Aware Optimization

Performance and power often conflict. Consider:

    • Deep sleep for low-duty sensors.
    • Frequency scaling: lower CPU frequency when possible.
    • Batching sensor reads to reduce wake cycles.

esp_sleep_enable_timer_wakeup(1000000); // wake after 1,000,000 us = 1 second
esp_deep_sleep_start();

Reducing energy usage is critical for battery-powered projects without compromising essential operations.
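For the frequency-scaling bullet, the Arduino-ESP32 core exposes setCpuFrequencyMhz(). A sketch of switching clocks around a burst of heavy work (doLightWork() and doHeavyWork() are placeholders for your own routines):

```cpp
void loop() {
    setCpuFrequencyMhz(80);   // cruise at a lower clock to save power
    doLightWork();            // placeholder: routine housekeeping

    setCpuFrequencyMhz(240);  // full speed for the compute burst
    doHeavyWork();            // placeholder: signal processing, etc.
}
```

Note that changing the CPU clock also affects peripherals derived from it (e.g., some timers and UART baud generation), so verify timing-sensitive code after scaling.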

 

Practical Workflow: From Measurement to Improvement

    1. Identify hotspots with profiling tools.
    2. Measure execution time and memory usage.
    3. Apply compiler and memory optimizations.
    4. Validate functionality and repeat.

Iterating through this workflow ensures each optimization delivers measurable benefits.

Integrate this workflow into PlatformIO Tasks:

[env:esp32dev]
extra_scripts = pre:benchmark.py

Automating benchmarking reduces manual effort and keeps optimizations safe.

 

Common Pitfalls and Anti-Patterns

    • Overusing inline or volatile.
    • Placing time-critical routines in Flash instead of IRAM.
    • Relying solely on compiler optimizations without profiling.
    • Ignoring multicore synchronization issues.

Being aware of these pitfalls prevents subtle bugs and performance issues.

 

Conclusion: Optimization as a Continuous Process

Optimization is iterative, measurable, and essential in embedded systems. By combining:

    • Compiler-level tuning
    • Memory-aware function placement
    • Algorithmic improvements
    • RTOS-level task management

you can achieve significant performance gains on the ESP32 and build reliable, efficient embedded systems.

💡 Ready to take your ESP32 firmware to the next level?

Try Please Code Generator today to automatically analyze and optimize your embedded code for peak performance, streamlining the path from measurement to improvement.
