What Is Embedded Systems Software and 8 Ways to Optimize It

Today, embedded systems in small devices are more popular and used for more purposes than ever. They appear in automation (including home, industrial and building automation) and in consumer, automotive, appliance, medical, telecommunication, commercial and military applications.

It’s been estimated that 98% of all microprocessors manufactured are used in embedded systems.

Modern embedded systems are often based on microcontrollers (i.e. microprocessors with integrated memory and peripheral interfaces), but ordinary microprocessors (using external chips for memory and peripheral interface circuits) are also common, especially in more complex systems.

In either case, the processors used may range from general-purpose types to those specialized in a certain class of computations, or may even be custom designed for the application at hand.

Since the embedded system is dedicated to specific tasks, design engineers can optimize it to reduce the size and cost of the product and increase the reliability and performance. Some embedded systems are mass-produced, benefiting from economies of scale.

Because devices with embedded systems typically have much less computing power and memory than personal computers, it’s more important to optimize the resource usage of these devices than it is on personal computers.

The clock frequency may be a hundred or a thousand times lower (megahertz vs. gigahertz), and the amount of random-access memory (RAM) may be a million times less than in a PC (kilobytes vs. gigabytes).

How to Optimize Embedded Systems Software

The smaller the system, the more important it is to design embedded software that uses fewer resources. On the smallest devices, you do not even have an operating system.

With well-optimized embedded software design, it’s possible to get good performance for many applications, even on such small devices, by avoiding large libraries, graphics frameworks, interpreters, just-in-time compilers, system databases, and other extra software layers or frameworks typically used on larger systems.

The best performance on embedded systems is obtained by choosing the features of the programming language that use the fewest resources.

The preferred embedded software programming language will often be C or C++. Critical functions or device drivers may be better written in assembly language.

Contrary to what some may think, C++, when used properly, can use the same resources as embedded software written in C. Modern C++ has added safety and productivity benefits.

Using C++ improves the reliability and maintainability of the code, and the productivity of the programmer, through techniques such as object-oriented programming and data abstraction.

Once you have some working code, you should have a pretty good idea of which functions are the most critical for overall code efficiency.

Interrupt services, high-priority tasks, calculations with real-time deadlines, and functions that are either compute-intensive or frequently called are all likely candidates. A tool called a profiler, included with some software development packages, can be used to narrow your focus to those functions in which the program spends most (or too much) of its time.

Now that you’ve identified the functions that require greater code efficiency, one or more of the following techniques can be used to reduce their execution time:

Hand-coded assembly

Some software functions are best written in assembly language. This gives the embedded software programmer an opportunity to make them as efficient as possible. Though most C/C++ compilers produce much better machine code than the average programmer, a good programmer can still do better than the average compiler for a given function.

Using the STL (Standard Template Library)

By using parts of the STL where appropriate in a microcontroller software project, it’s possible to significantly decrease coding complexity while simultaneously improving legibility, portability and performance.

The STL authors have meticulously optimized its algorithms, and its programming idioms can be optimized particularly well by the compiler.

C++ is a great language to use for embedded applications, and templates are a powerful aspect of it. The standard library offers a great deal of well-tested functionality, but some parts do not fit well with requirements for deterministic behaviour and limited resources. These limitations usually prevent the use of STL containers with the default allocator (std::allocator), because they allocate memory dynamically.
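As a rough illustration of the parts of the STL that do fit, the sketch below applies standard algorithms to a `std::array`, whose storage is fixed at compile time. The `SampleBuffer` type and the `average` and `median` helpers are hypothetical names chosen for this example, and the claim that `std::sort` does not allocate reflects typical implementations rather than a guarantee of the standard.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <numeric>

// Hypothetical buffer of sensor samples. The size is a compile-time
// constant, so std::array stores the elements inside the object itself.
using SampleBuffer = std::array<std::uint16_t, 8>;

// Average of all samples, accumulated in 32 bits to avoid overflow.
std::uint16_t average(const SampleBuffer& samples) {
    const std::uint32_t sum =
        std::accumulate(samples.begin(), samples.end(), std::uint32_t{0});
    return static_cast<std::uint16_t>(sum / samples.size());
}

// Upper median of the samples. The buffer is taken by value, so the
// caller's data stays untouched and the copy is sorted in place.
std::uint16_t median(SampleBuffer samples) {
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

Passing the buffer by value costs a small stack copy here; for larger buffers, sorting in place on the caller's data may be the better trade-off.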

The Embedded Template Library (ETL) has been designed for lower-resource embedded applications. It defines a set of containers, algorithms, and utilities, some of which emulate parts of the STL.

The ETL makes no use of the heap. All the containers (apart from intrusive types) have a fixed capacity, allowing all memory allocation to be determined at compile time.
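To make the idea of a fixed-capacity container concrete, here is a minimal sketch written from scratch. `FixedVector` is a hypothetical class for illustration only; it is not the library's API, and it assumes the element type is default-constructible.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-capacity vector. The storage lives inside the
// object, so its memory footprint is known at compile time.
template <typename T, std::size_t Capacity>
class FixedVector {
public:
    // Appends a value; returns false instead of allocating when full.
    bool push_back(const T& value) {
        if (size_ == Capacity) {
            return false;
        }
        data_[size_++] = value;
        return true;
    }

    // Unchecked element access, as with std::vector::operator[].
    T& operator[](std::size_t index) { return data_[index]; }
    const T& operator[](std::size_t index) const { return data_[index]; }

    std::size_t size() const { return size_; }
    constexpr std::size_t capacity() const { return Capacity; }
    bool full() const { return size_ == Capacity; }

private:
    std::array<T, Capacity> data_{};  // all elements value-initialized
    std::size_t size_ = 0;
};
```

Reporting a full container through the return value, rather than throwing, is one possible design choice that suits builds with exceptions disabled.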

Use Template Metaprogramming to Unroll Loops

Template metaprogramming can be used to improve code performance by forcing compile-time loop unrolling.
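One common way to do this is a recursive class template whose recursion depth equals the loop count. The sketch below computes a dot product; `DotProduct` and `dot` are hypothetical names, and the result is only straight-line code if the compiler inlines the nested calls, which optimizing compilers typically do for functions this small.

```cpp
#include <cstddef>

// Primary template: handles element N-1, then recurses on the
// first N-1 elements. Each instantiation is a distinct function.
template <std::size_t N>
struct DotProduct {
    static int compute(const int* a, const int* b) {
        return a[N - 1] * b[N - 1] + DotProduct<N - 1>::compute(a, b);
    }
};

// Full specialization that terminates the recursion.
template <>
struct DotProduct<0> {
    static int compute(const int*, const int*) { return 0; }
};

// Convenience wrapper that deduces N from the array size.
template <std::size_t N>
int dot(const int (&a)[N], const int (&b)[N]) {
    return DotProduct<N>::compute(a, b);
}
```

For two arrays of three elements, `dot(a, b)` expands to the equivalent of `a[2]*b[2] + a[1]*b[1] + a[0]*b[0] + 0`, with no loop counter or branch.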

Fixed-point arithmetic

Unless your target platform includes a floating-point coprocessor, you’ll pay a very large penalty for manipulating float data in your program. The compiler-supplied floating-point library contains a set of software subroutines that emulate the instruction set of a floating-point coprocessor. Many of these functions take a long time to execute relative to their integer counterparts and also might not be reentrant.

To avoid potentially slow floating-point emulation libraries for 32-bit single-precision float, or even 64-bit double-precision double, values, you can use integer-based fixed-point arithmetic.

A fixed-point number is an integer-based data type representing a real-valued fractional number, optionally signed, with a fixed number of integer digits to the left of the radix point and another fixed number of fractional digits to its right. Fixed-point data types are commonly implemented in base-2 or base-10. Fixed-point calculations can be highly efficient in microcontroller programming because they use a near-integer representation of the data type.
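A minimal sketch of a signed base-2 fixed-point type follows, assuming a Q16.16 layout (16 integer bits and 16 fractional bits in a 32-bit integer). The names `fixed_t`, `from_int`, `to_int`, `fx_mul` and `fx_div` are hypothetical, and the code performs no overflow checking.

```cpp
#include <cstdint>

// Q16.16: the stored integer equals the real value times 2^16.
using fixed_t = std::int32_t;
constexpr int kFractionBits = 16;
constexpr fixed_t kOne = fixed_t{1} << kFractionBits;  // 1.0

// Converts an integer to fixed point (assumes it fits in 16 bits).
constexpr fixed_t from_int(std::int32_t value) {
    return value * kOne;
}

// Drops the fractional bits. Right-shifting a negative value is an
// arithmetic shift on mainstream compilers (guaranteed since C++20).
constexpr std::int32_t to_int(fixed_t value) {
    return value >> kFractionBits;
}

// Multiplication: the 64-bit product carries 32 fractional bits,
// so it is shifted back down by 16.
constexpr fixed_t fx_mul(fixed_t a, fixed_t b) {
    return static_cast<fixed_t>(
        (static_cast<std::int64_t>(a) * b) >> kFractionBits);
}

// Division: the dividend is pre-scaled so the quotient keeps
// 16 fractional bits. The caller must ensure b is not zero.
constexpr fixed_t fx_div(fixed_t a, fixed_t b) {
    return static_cast<fixed_t>(
        (static_cast<std::int64_t>(a) * kOne) / b);
}
```

Addition and subtraction need no helpers: two values with the same scaling can be combined with the ordinary integer `+` and `-` operators, which is where most of the speed advantage comes from.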

Polling

Interrupt service routines are often used to improve program efficiency. However, there are some rare cases in which the overhead associated with the interrupts actually causes an inefficiency. These are cases in which the average time between interrupts is of the same order of magnitude as the interrupt latency. In such cases it might be better to use polling to communicate with the hardware device.
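A polling loop can be sketched as follows. The register layout is an assumption made for illustration: `kDataReady` is a hypothetical "data ready" bit, and the status and data registers are passed in by reference so the same code can be exercised against plain variables on a host machine.

```cpp
#include <cstdint>

// Hypothetical bit in a device status register meaning "data ready".
constexpr std::uint8_t kDataReady = 0x01;

// Checks the status register up to max_polls times and reads one byte
// as soon as the device reports that data is available.
bool poll_for_byte(const volatile std::uint8_t& status,
                   const volatile std::uint8_t& data,
                   std::uint8_t& out,
                   unsigned max_polls) {
    for (unsigned i = 0; i < max_polls; ++i) {
        if ((status & kDataReady) != 0) {
            out = data;  // volatile read of the data register
            return true;
        }
    }
    return false;  // device was not ready within the polling budget
}
```

The `volatile` qualifier keeps the compiler from hoisting the status read out of the loop, and the polling budget keeps a silent device from stalling the program indefinitely.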

Interrupt service routines and device drivers are particularly critical because they can block the execution of everything else. This normally belongs to the area of system programming, but in applications without an operating system this is the job of the application programmer.

An interrupt service routine should do as little work as possible. Typically it should save one unit of received data in a static buffer or send data from a buffer. It should not respond to a command or perform any input/output beyond the specific event it is servicing.

A command received by an interrupt should preferably be responded to at a lower priority level, typically in a message loop in the main program.
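The division of labour described above can be sketched with a small ring buffer. Everything here is hypothetical: `uart_rx_isr` stands in for a receive interrupt handler and takes the received byte as a parameter, whereas a real handler would read it from a device register. The sketch also assumes that reads and writes of the index variables are atomic on the target.

```cpp
#include <cstddef>
#include <cstdint>

// Buffer size is a power of two so wrap-around is a simple mask.
constexpr std::size_t kBufferSize = 16;
volatile std::uint8_t rx_buffer[kBufferSize];
volatile std::size_t rx_head = 0;  // written only by the ISR
volatile std::size_t rx_tail = 0;  // written only by the main loop

// Interrupt side: store one byte and return immediately.
void uart_rx_isr(std::uint8_t received) {
    const std::size_t next = (rx_head + 1) & (kBufferSize - 1);
    if (next != rx_tail) {  // drop the byte if the buffer is full
        rx_buffer[rx_head] = received;
        rx_head = next;
    }
}

// Main-loop side: fetch one byte if any is waiting. Command parsing
// and replies belong here, at the lower priority level.
bool read_byte(std::uint8_t& out) {
    if (rx_tail == rx_head) {
        return false;  // nothing received yet
    }
    out = rx_buffer[rx_tail];
    rx_tail = (rx_tail + 1) & (kBufferSize - 1);
    return true;
}
```

Because each index has exactly one writer, this single-producer, single-consumer arrangement needs no lock between the interrupt and the main loop under the atomicity assumption stated above.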

Inline functions

In C++, the keyword inline can be added to any function declaration. This keyword makes a request to the compiler to replace all calls to the indicated function with copies of the code that’s inside. This eliminates the runtime overhead associated with the actual function call and is most effective when the inline function is called frequently but contains only a few lines of code. Inline functions are an example of how execution speed and code size are sometimes inversely linked.

The repetitive addition of the inline code will increase the size of your program in direct proportion to the number of times the function is called. And, obviously, the larger the function, the more significant the size increase will be. The resulting program runs faster, but now requires more ROM.
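A short function like the one below is a typical candidate. `swap_bytes` is a hypothetical helper chosen for this example; whether the compiler actually inlines it remains the compiler's decision, since the keyword is only a request.

```cpp
#include <cstdint>

// Swaps the two bytes of a 16-bit value, for example when converting
// between big-endian and little-endian data. The body is a single
// expression, so a call would cost more than the work itself.
inline std::uint16_t swap_bytes(std::uint16_t value) {
    return static_cast<std::uint16_t>((value << 8) | (value >> 8));
}
```

Each call site that is inlined receives its own copy of the shift-and-or sequence, which is the code-size cost the text describes.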

Reducing Memory Usage

In some cases, it’s RAM (Random Access Memory) rather than ROM (Read Only Memory) that is the limiting factor for your application. In these cases, you’ll want to reduce your dependence on global data, the stack, and the heap. These are all optimizations better made by the programmer than by the compiler.

Because ROM is usually cheaper than RAM (on a per-byte basis), one acceptable strategy for reducing the amount of global data might be to move constant data into ROM. This can be done automatically by the compiler if you declare all of your constant data with the keyword const. Most C/C++ compilers place all of the constant global data they encounter into a special data segment that is recognizable to the locator as ROM-able. This technique is most valuable if there are lots of strings or table-oriented data that does not change at runtime.
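As a small example of table-oriented constant data, the sketch below declares a lookup table with const so that the toolchain is free to place it in read-only memory. The table and the `count_bits` function are hypothetical, and where the data finally lands depends on the compiler and linker configuration; some architectures need an additional toolchain-specific attribute.

```cpp
#include <cstdint>

// Number of set bits in each 4-bit value. Declared const, so the
// table never changes at runtime and does not need to occupy RAM.
const std::uint8_t kBitsInNibble[16] = {
    0, 1, 1, 2, 1, 2, 2, 3,
    1, 2, 2, 3, 2, 3, 3, 4
};

// Counts the set bits in a byte with two table lookups.
std::uint8_t count_bits(std::uint8_t value) {
    return static_cast<std::uint8_t>(
        kBitsInNibble[value & 0x0F] + kBitsInNibble[value >> 4]);
}
```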

C++ Constructors and destructors

C++ constructors and destructors have a slight performance penalty associated with them. These special member functions are guaranteed to be called each time an object of the type is created or goes out of scope, respectively. However, this small amount of overhead is a reasonable price to pay for fewer bugs.

Constructors eliminate an entire class of C programming errors having to do with uninitialized data structures.
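Both guarantees can be seen in a short sketch. `MotorState` and `CriticalSection` are hypothetical types, and the `g_lock_depth` counter is only a stand-in for whatever mechanism a real target uses to disable and restore interrupts.

```cpp
#include <cstdint>

// The constructor guarantees every field starts in a known state,
// so no caller can forget to initialize the structure.
struct MotorState {
    std::uint16_t speed;
    bool enabled;
    MotorState() : speed(0), enabled(false) {}
};

// Stand-in for a hardware interrupt mask (an assumption for this
// sketch; real code would use a CPU intrinsic or register access).
static int g_lock_depth = 0;

// The destructor runs on every exit path from the enclosing scope,
// so the lock cannot be left held by an early return.
class CriticalSection {
public:
    CriticalSection() { ++g_lock_depth; }
    ~CriticalSection() { --g_lock_depth; }
    CriticalSection(const CriticalSection&) = delete;
    CriticalSection& operator=(const CriticalSection&) = delete;
};
```

In use, declaring `CriticalSection guard;` at the top of a block protects the rest of that block, and the matching release needs no explicit call.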


Embedded systems present some special software development challenges due to their limited resources and their requirements for real-time availability and reliability. By following best practices, with well-optimized embedded software design, it’s possible to have an efficient system with good performance.