TinyML Explained: Running Machine Learning on Microcontrollers

TinyML is the practice of running machine learning inference on microcontrollers and other severely resource-constrained hardware — devices with milliwatts of power budget, kilobytes of RAM, and megabytes (or less) of flash storage. It is the engineering discipline that sits at the intersection of machine learning and embedded systems, and it is reshaping what connected devices can do without a cloud connection.

In 2026, TinyML is no longer a research curiosity. It is a production technology shipping in billions of consumer devices, industrial sensors, and medical wearables. Keyword spotting chips in earbuds, anomaly detection modules on factory floors, and fall detection algorithms in wrist-worn health monitors all rely on TinyML. This article gives you the complete foundation: what TinyML is, how the hardware enables it, which tools are used, and how to get started.

What Makes TinyML Different from Regular ML?

Most machine learning runs on servers with gigabytes of RAM, fast SSDs, and GPUs or TPUs providing teraflops of compute. TinyML inverts every one of those assumptions.

A typical TinyML target might be a Cortex-M4 running at 64 MHz with 256 KB of SRAM and 1 MB of flash, consuming roughly 3 mW during active processing. The constraints cascade:

No dynamic memory allocation: heap fragmentation is unacceptable in embedded systems, so inference must run from a pre-allocated static tensor arena
No operating system (or a minimal RTOS): no virtual memory, no dynamic loader, no file system
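Integer arithmetic: many low-end MCUs (for example, Cortex-M0/M0+ parts) lack floating-point hardware, and quantized models are smaller and faster even where an FPU exists, so models typically use INT8 or INT16 weights and activations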
Size budget: the model, its weights, and the inference runtime must all fit in flash alongside the application firmware

These constraints force a completely different approach to model design compared to cloud ML. You do not run a ResNet-50 on a Cortex-M0+. You design a purpose-built architecture — often a shallow CNN, decision tree ensemble, or LSTM — sized to your target’s exact memory and compute profile.
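As a rough illustration of sizing a model against a target, the sketch below checks a candidate deployment against the example Cortex-M4 part described above (256 KB SRAM, 1 MB flash). The helper name `fits_target` and every size passed to it (model, runtime, firmware, tensor arena, application RAM) are assumptions chosen for the example, not measurements.

```python
from dataclasses import dataclass

KB = 1024


@dataclass
class Target:
    flash_bytes: int
    sram_bytes: int


def fits_target(target, model_bytes, runtime_bytes, firmware_bytes,
                arena_bytes, app_ram_bytes):
    """Return (flash_ok, sram_ok) for a candidate deployment."""
    # Flash holds the model, the inference runtime, and the application firmware.
    flash_needed = model_bytes + runtime_bytes + firmware_bytes
    # SRAM holds the static tensor arena plus whatever the application needs.
    sram_needed = arena_bytes + app_ram_bytes
    return flash_needed <= target.flash_bytes, sram_needed <= target.sram_bytes


# The example part from the text: 256 KB SRAM, 1 MB flash.
cortex_m4 = Target(flash_bytes=1024 * KB, sram_bytes=256 * KB)

# Hypothetical sizes, for illustration only.
flash_ok, sram_ok = fits_target(
    cortex_m4,
    model_bytes=18 * KB,
    runtime_bytes=60 * KB,
    firmware_bytes=300 * KB,
    arena_bytes=40 * KB,
    app_ram_bytes=64 * KB,
)
print(flash_ok, sram_ok)
```

Running this kind of check before training is a cheap way to rule out architectures that could never fit, although real arena sizes have to be measured on the target runtime.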

The Hardware Enabling TinyML

Cortex-M Series MCUs

ARM’s Cortex-M series is the dominant TinyML hardware platform. The key variants:

Cortex-M0/M0+: Ultra-low power, no FPU, suitable for simple classifiers. Budget ~16–32 KB SRAM for ML tasks.
Cortex-M4F/M7F: FPU + DSP extensions including SIMD instructions. CMSIS-NN leverages these for 4–8× speedup on convolutions. Most popular for TinyML in 2026.
Cortex-M55: Includes Helium (M-Profile Vector Extension) and can be paired with the Ethos-U55/U65 NPU for dedicated neural-network acceleration. Arm quotes up to a 15× ML performance uplift over previous Cortex-M generations.

Application Processors with NPUs

Chips like the STM32N6, with its dedicated neural processing unit, and Espressif’s ESP32-P4, with its AI instruction extensions, enable larger models at lower power. These blur the line between TinyML and full edge AI.

RISC-V MCUs

RISC-V processors from companies like SiFive and GigaDevice are increasingly targeting TinyML workloads with custom ML instruction extensions.

The TinyML Software Stack

TensorFlow Lite Micro (TFLM)

TensorFlow Lite Micro from Google is the most widely deployed TinyML runtime. It is a port of TensorFlow Lite that runs on bare-metal embedded systems with:

No operating system requirement
Static memory allocation only
An optimized operator library with hardware-specific backends (CMSIS-NN for Cortex-M)
A reference implementation for any target plus optimized implementations for common hardware

The TFLM workflow: train in TensorFlow → convert to a quantized .tflite model → convert to C array → include in your embedded project → initialize interpreter → run inference.

Edge Impulse

Edge Impulse is the most complete end-to-end TinyML development platform. It handles:

Data collection from embedded hardware over serial or BLE
Signal processing (FFT, MFCC, spectrogram generation)
Model training with AutoML support
Quantization and deployment as a portable C++ library
Real-time performance estimates for target hardware

Edge Impulse is the fastest path from raw sensor data to a deployed model. It abstracts away most of the low-level framework complexity, making it excellent for teams whose primary expertise is not ML.
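To give a feel for the signal-processing stage, here is a minimal pure-Python sketch that turns a window of sensor samples into spectral features with a naive DFT. It is not Edge Impulse's implementation: the function name `magnitude_spectrum`, the window length, and the sample rate are assumptions made for the example, and a real pipeline would use an optimized FFT.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive O(n^2) DFT; returns normalized magnitudes for bins 0..n//2."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(acc) / n)
    return spectrum


# Hypothetical input: a 5 Hz sine sampled at 64 Hz for one second.
sample_rate = 64
window = [math.sin(2 * math.pi * 5 * t / sample_rate) for t in range(sample_rate)]

features = magnitude_spectrum(window)
# The bin with the largest magnitude is the dominant frequency component.
peak_bin = max(range(len(features)), key=features.__getitem__)
peak_hz = peak_bin * sample_rate / len(window)
print(peak_hz)
```

The resulting vector of bin magnitudes is the kind of compact feature set a small classifier can consume in place of raw samples.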

MicroAI / STM32Cube.AI

STMicroelectronics offers STM32Cube.AI, which converts trained Keras/ONNX models to optimized C code targeting the STM32 family. The generated code integrates tightly with STM32’s HAL and DSP libraries, and the tool provides cycle-accurate performance estimates via simulation.

[Figure: TinyML software and hardware stack]

Model Architectures for TinyML

Not every neural network architecture is suitable for microcontrollers. The following are proven choices:

1D Convolutional Neural Networks (1D CNNs)

Ideal for time-series data from accelerometers, microphones, temperature sensors, and current clamps. A typical architecture has 3–5 convolutional layers with depthwise-separable kernels, followed by a global average pooling layer and a dense classifier head. Achievable model sizes: 8–64 KB.
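The size range quoted above can be sanity-checked with simple parameter counting. The sketch below estimates the weight storage of a small depthwise-separable 1D CNN. The layer widths, kernel size, and class count are hypothetical, and the assumption that biases are stored as INT32 follows the usual TFLite INT8 scheme; the real .tflite file adds metadata and quantization parameters on top.

```python
def separable_conv1d_params(in_ch, out_ch, kernel):
    """(weights, biases) for a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise_w = in_ch * kernel
    pointwise_w = in_ch * out_ch
    biases = in_ch + out_ch
    return depthwise_w + pointwise_w, biases


def dense_params(in_features, out_features):
    """(weights, biases) for a fully connected layer."""
    return in_features * out_features, out_features


def int8_size_bytes(layers):
    """INT8 weights take 1 byte each; biases are assumed to be INT32 (4 bytes)."""
    weights = sum(w for w, _ in layers)
    biases = sum(b for _, b in layers)
    return weights + 4 * biases


# Hypothetical network for a 3-axis accelerometer with 4 output classes.
layers = [
    separable_conv1d_params(3, 16, kernel=5),
    separable_conv1d_params(16, 32, kernel=5),
    separable_conv1d_params(32, 64, kernel=5),
    separable_conv1d_params(64, 64, kernel=5),
    dense_params(64, 4),  # applied after global average pooling over time
]
print(int8_size_bytes(layers))
```

Global average pooling adds no parameters, which is one reason it is preferred over flattening before the dense head.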

Depthwise-Separable CNNs (MobileNet-style)

For image classification on capable hardware (ESP32-S3, STM32H7), depthwise-separable convolutions reduce computation by ~8× versus standard convolutions with comparable accuracy. MobileNetV1 at 0.25 alpha (width multiplier) fits in 250 KB and runs at ~120 ms on STM32H743.
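The roughly 8× figure follows from counting multiply-accumulate operations (MACs). The sketch below compares a standard convolution with its depthwise-separable equivalent for one layer; the feature-map size and channel counts are hypothetical values picked for illustration.

```python
def conv_macs(h, w, k, c_in, c_out):
    """MACs for a standard k x k convolution over an h x w output map."""
    return h * w * k * k * c_in * c_out


def separable_conv_macs(h, w, k, c_in, c_out):
    """MACs for a depthwise k x k conv followed by a 1x1 pointwise conv."""
    depthwise = h * w * k * k * c_in
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


# Hypothetical layer: 48x48 feature map, 3x3 kernel, 32 -> 64 channels.
standard = conv_macs(48, 48, 3, 32, 64)
separable = separable_conv_macs(48, 48, 3, 32, 64)
ratio = standard / separable
print(round(ratio, 2))
```

The ratio simplifies to k²·c_out / (k² + c_out), so with 3×3 kernels it approaches 9× as the output channel count grows and is lower for narrow layers.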

Decision Tree Ensembles

For tabular sensor data, gradient-boosted trees can be more accurate than neural networks while being far simpler to deploy. Tools such as Edge Impulse’s learning blocks support decision trees natively.

Autoencoders for Anomaly Detection

A small autoencoder (encoder-decoder architecture) trained only on normal data can flag anomalies by elevated reconstruction error. These tend to be 4–16 KB and are well-suited for unsupervised anomaly detection in industrial sensors.
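The decision logic around the autoencoder is small enough to sketch in a few lines. The example below computes a mean-squared reconstruction error and compares it with a threshold fitted on normal data. The mean-plus-three-sigma rule, the function names, and all sample values are assumptions for illustration; the reconstruction is a stand-in list, since no actual model runs here.

```python
import statistics


def reconstruction_error(original, reconstructed):
    """Mean squared error between an input window and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def fit_threshold(normal_errors, n_sigma=3.0):
    """Threshold = mean + n_sigma * std of the errors seen on normal data."""
    return statistics.mean(normal_errors) + n_sigma * statistics.pstdev(normal_errors)


def is_anomaly(error, threshold):
    return error > threshold


# Hypothetical reconstruction errors measured on known-good data.
normal_errors = [0.010, 0.012, 0.009, 0.011, 0.013, 0.010, 0.012]
threshold = fit_threshold(normal_errors)

# Hypothetical input window and a stand-in for the autoencoder's output.
window = [0.2, 0.4, 0.1, 0.9]
reconstruction = [0.2, 0.3, 0.1, 0.3]
error = reconstruction_error(window, reconstruction)
print(is_anomaly(error, threshold))
```

In practice the threshold is usually tuned on a validation set that includes a few known faults, trading missed detections against false alarms.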

Recurrent Networks (LSTM, GRU)

Useful for sequential patterns in time-series data, but computationally expensive. Use them only on Cortex-M4-class or more capable cores, or when Edge Impulse’s EON compiler reports acceptable cycle counts.

The Quantization Pipeline in Detail

Quantization is the process of representing model weights and activations at lower precision than the float32 used during training. Here is the standard INT8 quantization workflow:

Train your model in float32 using TensorFlow/Keras
Collect a representative dataset — 100–500 samples covering the full input distribution
Run post-training quantization:

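import tensorflow as tf

# Assumes `model` is a trained Keras model and `representative_dataset_gen`
# is a generator that yields lists of float32 input batches.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
# Require INT8 kernels for every op (full-integer quantization).
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()  # bytes of the .tflite flatbuffer
with open("model.tflite", "wb") as f:
    f.write(tflite_model)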

Evaluate accuracy on a held-out test set using the quantized model
Convert to C array: xxd -i model.tflite > model_data.cc
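Where `xxd` is not available (for example on a Windows build machine), the last step can be approximated in Python. The sketch below is a rough equivalent of `xxd -i`, not a byte-for-byte reproduction of its output: the function name `bytes_to_c_array`, the default variable name, and the added `const` qualifier are choices made for this example.

```python
def bytes_to_c_array(data, var_name="model_tflite", per_line=12):
    """Render bytes as a C array definition plus a length variable."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk))
    body = ",\n".join(lines)
    # `const` lets most embedded toolchains keep the array in flash, not RAM.
    return (
        f"const unsigned char {var_name}[] = {{\n{body}\n}};\n"
        f"const unsigned int {var_name}_len = {len(data)};\n"
    )


# Stand-in bytes; in practice this would be the contents of model.tflite,
# e.g. data = open("model.tflite", "rb").read()
example = bytes([0x1C, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4C, 0x33])
print(bytes_to_c_array(example))
```

Whichever tool generates the array, it is worth checking in the linker map file that the model really ends up in flash.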

For models where the accuracy drop after post-training quantization (PTQ) is unacceptable, repeat with quantization-aware training (QAT) using tfmot.quantization.keras.quantize_model from the TensorFlow Model Optimization Toolkit.
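To make the INT8 mapping behind this pipeline concrete, the sketch below implements the affine scheme real = scale × (q − zero_point) for a single tensor range. The function names and the example range are assumptions; the converter derives the actual ranges from the representative dataset, and it uses per-channel scales for convolution weights, which this sketch ignores.

```python
def quant_params(x_min, x_max, q_min=-128, q_max=127):
    """Derive (scale, zero_point) so that [x_min, x_max] maps onto INT8."""
    # The range must contain 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (q_max - q_min)
    zero_point = round(q_min - x_min / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, q_min=-128, q_max=127):
    """Map a real value to INT8, clamping values outside the calibrated range."""
    q = round(x / scale) + zero_point
    return max(q_min, min(q_max, q))


def dequantize(q, scale, zero_point):
    """Recover the approximate real value from its INT8 code."""
    return scale * (q - zero_point)


# Hypothetical activation range observed during calibration.
scale, zero_point = quant_params(-1.0, 3.0)

x = 1.234
q = quantize(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
print(q, abs(x - x_hat) <= scale / 2)
```

Values outside the calibrated range are clamped, which is why the representative dataset needs to cover the full input distribution.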

Real-World TinyML Performance Numbers

These benchmarks reflect typical results on common MCU targets:

Model	Task	Target MCU	Size	Latency	Accuracy
DS-CNN (keyword spotting)	Yes/No wakeword	Cortex-M4 @80 MHz	18 KB	87 ms	93.8%
Anomaly detection autoencoder	Vibration	Cortex-M4 @64 MHz	12 KB	3.2 ms	— (reconstruction error)
MobileNetV1 0.25	Image 96×96	ESP32-S3 @240 MHz	242 KB	95 ms	89.2% (CIFAR-10)
Fall detection LSTM	IMU (50 Hz)	nRF5340 @128 MHz	22 KB	14 ms	97.4% sensitivity

Source: Hackster.io TinyML benchmarks and Edge Impulse public model library.

Getting Started: Your First TinyML Project

The fastest path to a working TinyML prototype:

Pick a supported development board: Arduino Nano 33 BLE Sense, SparkFun Edge, or any STM32 Nucleo board
Sign up for Edge Impulse (free tier is sufficient for prototyping)
Connect your board and collect 2–3 minutes of labeled sensor data per class
Train a model using Edge Impulse Studio — the AutoML feature handles architecture selection
Deploy: Edge Impulse generates an Arduino library or a deployable C++ SDK
Flash and test on hardware

Your first working model — a simple gesture recognizer or keyword detector — can be running in under an hour. Production-grade deployment to thousands of devices is a longer journey, but the prototype validates feasibility fast.

For more depth on building your first model with Edge Impulse, see our Edge Impulse tutorial.

Conclusion

TinyML is the enabling technology that turns simple sensors into intelligent devices. By running trained ML models directly on microcontrollers, with milliwatt power budgets, millisecond latency, and no cloud dependency, it unlocks applications that are impractical with cloud-dependent IoT architectures. The tools are mature, the hardware is affordable, and the methodology is well documented.

Whether you are building a new product from scratch or adding intelligence to an existing IoT device, UABit’s AIoT development services can help you navigate the model design, quantization, and deployment challenges that determine whether a TinyML project ships on time and on spec.