TinyML is the practice of running machine learning inference on microcontrollers and other severely resource-constrained hardware — devices with milliwatts of power budget, kilobytes of RAM, and megabytes (or less) of flash storage. It is the engineering discipline that sits at the intersection of machine learning and embedded systems, and it is reshaping what connected devices can do without a cloud connection.
In 2026, TinyML is no longer a research curiosity. It is a production technology shipping in billions of consumer devices, industrial sensors, and medical wearables. Keyword spotting chips in earbuds, anomaly detection modules on factory floors, and fall detection algorithms in wrist-worn health monitors all rely on TinyML. This article gives you the complete foundation: what TinyML is, how the hardware enables it, which tools are used, and how to get started.
What Makes TinyML Different from Regular ML?
Most machine learning runs on servers with gigabytes of RAM, fast SSDs, and GPUs or TPUs providing teraflops of compute. TinyML inverts every one of those assumptions.
A typical TinyML target might be a Cortex-M4 running at 64 MHz with 256 KB of SRAM and 1 MB of flash, drawing about 3 mW during active processing. The constraints cascade:
- No dynamic memory allocation: heap fragmentation is unacceptable in embedded systems, so inference must run from a pre-allocated static tensor arena
- No operating system (or a minimal RTOS): no virtual memory, no dynamic loader, no file system
- Fixed-point arithmetic: many low-end MCUs lack floating-point hardware, and integer kernels are smaller and faster even where an FPU is present, so models typically use INT8 or INT16 weights and activations
- Size budget: the model, its weights, and the inference runtime must all fit in flash alongside the application firmware
These constraints force a completely different approach to model design compared to cloud ML. You do not run a ResNet-50 on a Cortex-M0+. You design a purpose-built architecture — often a shallow CNN, decision tree ensemble, or LSTM — sized to your target’s exact memory and compute profile.
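The flash and SRAM constraints reduce to simple arithmetic that is worth checking before any model design work starts. The sketch below is a minimal illustration of that check. The names `Target` and `fits_target` are hypothetical, and every byte count in the example is an assumption chosen for illustration, not a measurement from a real project.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """Memory limits of a hypothetical MCU target, in bytes."""
    flash_bytes: int
    sram_bytes: int


def fits_target(target, model_bytes, runtime_bytes, firmware_bytes,
                arena_bytes, app_ram_bytes):
    """Return (flash_ok, sram_ok) for a candidate deployment.

    Flash must hold the model weights, the inference runtime, and the
    application firmware. SRAM must hold the static tensor arena plus
    whatever the rest of the application needs (stack, buffers).
    """
    flash_needed = model_bytes + runtime_bytes + firmware_bytes
    sram_needed = arena_bytes + app_ram_bytes
    return flash_needed <= target.flash_bytes, sram_needed <= target.sram_bytes


# Example target from the text: 1 MB flash, 256 KB SRAM.
cortex_m4 = Target(flash_bytes=1024 * 1024, sram_bytes=256 * 1024)

# All sizes below are illustrative assumptions.
flash_ok, sram_ok = fits_target(
    cortex_m4,
    model_bytes=64 * 1024,
    runtime_bytes=100 * 1024,
    firmware_bytes=300 * 1024,
    arena_bytes=96 * 1024,
    app_ram_bytes=64 * 1024,
)
print(flash_ok, sram_ok)
```

In practice the arena size is usually found empirically on the target, so a check like this is a first filter rather than a guarantee.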
The Hardware Enabling TinyML
Cortex-M Series MCUs
ARM’s Cortex-M series is the dominant TinyML hardware platform. The key variants:
- Cortex-M0/M0+: Ultra-low power, no FPU, suitable for simple classifiers. Budget ~16–32 KB SRAM for ML tasks.
- Cortex-M4F/M7F: FPU + DSP extensions including SIMD instructions. CMSIS-NN leverages these for 4–8× speedup on convolutions. Most popular for TinyML in 2026.
- Cortex-M55: Includes Helium (M-Profile Vector Extension) and can be paired with the Ethos-U55 or Ethos-U65 NPU for dedicated neural-network acceleration. Arm cites up to a 15× ML throughput improvement over earlier Cortex-M cores.
Application Processors with NPUs
Chips like ST's STM32N6, which integrates a dedicated neural processing unit, and Espressif's ESP32-P4, which adds AI instruction extensions to its RISC-V cores, enable larger models at lower power. These blur the line between TinyML and full edge AI.
RISC-V MCUs
RISC-V processors from companies like SiFive and GigaDevice are increasingly targeting TinyML workloads with custom ML instruction extensions.
The TinyML Software Stack
TensorFlow Lite Micro (TFLM)
TensorFlow Lite Micro from Google is the most widely deployed TinyML runtime. It is a port of TensorFlow Lite that runs on bare-metal embedded systems with:
- No operating system requirement
- Static memory allocation only
- An optimized operator library with hardware-specific backends (CMSIS-NN for Cortex-M)
- A reference implementation for any target plus optimized implementations for common hardware
The TFLM workflow: train in TensorFlow → convert to .tflite, applying quantization during conversion → convert to C array → include in your embedded project → initialize interpreter → run inference.
Edge Impulse
Edge Impulse is the most complete end-to-end TinyML development platform. It handles:
- Data collection from embedded hardware over serial or BLE
- Signal processing (FFT, MFCC, spectrogram generation)
- Model training with AutoML support
- Quantization and deployment as a portable C++ library
- Real-time performance estimates for target hardware
Edge Impulse is the fastest path from raw sensor data to a deployed model. It abstracts away most of the low-level framework complexity, making it excellent for teams whose primary expertise is not ML.
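To make the signal-processing step concrete, here is a minimal pure-Python sketch of the kind of spectral feature extraction such a pipeline performs on one frame of sensor data. It is not Edge Impulse's implementation. The function name `frame_spectrum` and the test signal are assumptions for illustration, and a real pipeline would use an optimized FFT rather than this O(n²) DFT.

```python
import cmath
import math


def frame_spectrum(frame):
    """Return the magnitude spectrum (bins 0..n/2) of one Hann-windowed frame."""
    n = len(frame)
    # The Hann window reduces spectral leakage at the frame edges.
    windowed = [
        x * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            windowed[i] * cmath.exp(-2j * math.pi * k * i / n)
            for i in range(n)
        )
        spectrum.append(abs(acc))
    return spectrum


# Illustrative input: 64 samples of a sine wave with 8 cycles per frame.
n_samples = 64
signal = [math.sin(2.0 * math.pi * 8 * i / n_samples) for i in range(n_samples)]
mags = frame_spectrum(signal)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
print(peak_bin)
```

Stacking the spectra of consecutive frames gives a spectrogram, which is the usual input to a small classifier for audio or vibration data.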
MicroAI / STM32Cube.AI
STMicroelectronics offers STM32Cube.AI, which converts trained Keras/ONNX models to optimized C code targeting the STM32 family. It integrates tightly with STM32's HAL and DSP libraries and provides cycle-accurate performance estimates via simulation.

Model Architectures for TinyML
Not every neural network architecture is suitable for microcontrollers. The following are proven choices:
1D Convolutional Neural Networks (1D CNNs)
Ideal for time-series data from accelerometers, microphones, temperature sensors, and current clamps. A typical architecture has 3–5 convolutional layers with depthwise-separable kernels, followed by a global average pooling layer and a dense classifier head. Achievable model sizes: 8–64 KB.
Depthwise-Separable CNNs (MobileNet-style)
For image classification on capable hardware (ESP32-S3, STM32H7), depthwise-separable convolutions reduce computation by roughly 8× for 3×3 kernels versus standard convolutions, with comparable accuracy. MobileNetV1 with a width multiplier (alpha) of 0.25 fits in 250 KB and runs in ~120 ms on STM32H743.
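The size of that reduction follows from counting multiply-accumulate operations (MACs). The sketch below compares a standard convolution with its depthwise-separable equivalent; the function names and the layer dimensions are assumptions picked for illustration.

```python
def standard_conv_macs(h, w, k, c_in, c_out):
    """MACs for a standard k x k convolution producing an h x w output."""
    return h * w * k * k * c_in * c_out


def separable_conv_macs(h, w, k, c_in, c_out):
    """MACs for a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = h * w * k * k * c_in
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


# Illustrative layer: 48 x 48 output, 3 x 3 kernel, 32 -> 64 channels.
std = standard_conv_macs(48, 48, 3, 32, 64)
sep = separable_conv_macs(48, 48, 3, 32, 64)
reduction = std / sep
print(f"{reduction:.2f}x fewer MACs")
```

The ratio simplifies to k²·c_out / (k² + c_out), so for 3×3 kernels it approaches 9× as the number of output channels grows and is somewhat lower for narrow layers.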
Decision Tree Ensembles
For tabular sensor data, gradient-boosted trees can be more accurate than neural networks while being far simpler to deploy. Libraries like Edge Impulse’s Learn block support decision trees natively.
Autoencoders for Anomaly Detection
A small autoencoder (encoder-decoder architecture) trained only on normal data can flag anomalies by elevated reconstruction error. These tend to be 4–16 KB and are well-suited for unsupervised anomaly detection in industrial sensors.
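The decision logic around the autoencoder is simple enough to sketch without the network itself. In the example below the reconstructions are supplied by hand, the error values are made up, and the mean plus three standard deviations threshold is one common heuristic rather than a rule from this article. The function names are hypothetical.

```python
import statistics


def reconstruction_error(original, reconstructed):
    """Mean squared error between an input window and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def fit_threshold(normal_errors, k=3.0):
    """Threshold = mean + k * standard deviation of errors seen on normal data."""
    return statistics.mean(normal_errors) + k * statistics.pstdev(normal_errors)


def is_anomaly(error, threshold):
    """Flag a window whose reconstruction error exceeds the threshold."""
    return error > threshold


# Illustrative reconstruction errors measured on normal data (made-up values).
normal_errors = [0.010, 0.012, 0.009, 0.011, 0.013, 0.010, 0.012]
threshold = fit_threshold(normal_errors)

# A window that the (hypothetical) autoencoder reconstructs poorly.
err = reconstruction_error([0.2, 0.9, -0.4, 0.1], [0.1, 0.3, 0.0, 0.1])
print(is_anomaly(err, threshold))
```

On a device, the threshold would typically be computed offline and stored as a constant, so only the error calculation and the comparison run on the MCU.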
Recurrent Networks (LSTM, GRU)
Useful for sequential patterns in time-series data, but computationally expensive. Use only on Cortex-M4-class cores or better, or when Edge Impulse's EON compiler reports acceptable cycle counts.
The Quantization Pipeline in Detail
Quantization is the process of representing model weights and activations at lower precision than the float32 used during training. Here is the standard INT8 quantization workflow:
- Train your model in float32 using TensorFlow/Keras
- Collect a representative dataset — 100–500 samples covering the full input distribution
- Run post-training quantization:
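```python
import tensorflow as tf

# `model` is the trained float32 Keras model. `representative_dataset_gen`
# is a generator function yielding lists of float32 input arrays.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
# Require INT8 kernels; conversion fails if an op cannot be quantized.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
```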
- Evaluate accuracy on a held-out test set using the quantized model
- Convert to C array:
```shell
xxd -i model.tflite > model_data.cc
```
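On systems where `xxd` is not available, the same conversion can be approximated in a few lines of Python. This is a sketch, not a drop-in replacement: the function name `bytes_to_c_array` and the symbol name `g_model_data` are assumptions, and the added `const` qualifier is there so the array can stay in flash, which is typical but toolchain-dependent.

```python
def bytes_to_c_array(data, name="g_model_data", per_line=12):
    """Render a bytes object as C source, similar in spirit to `xxd -i`."""
    lines = [f"const unsigned char {name}[] = {{"]
    for offset in range(0, len(data), per_line):
        chunk = data[offset:offset + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    lines.append("};")
    lines.append(f"const unsigned int {name}_len = {len(data)};")
    return "\n".join(lines) + "\n"


# Placeholder bytes for illustration; in practice this would be the contents
# of the converted model, e.g. open("model.tflite", "rb").read().
source = bytes_to_c_array(bytes([0x1C, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4C, 0x33]))
print(source)
```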
For models where the accuracy drop after post-training quantization (PTQ) is unacceptable, retrain with quantization-aware training (QAT), for example using tfmot.quantization.keras.quantize_model from the TensorFlow Model Optimization Toolkit.
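For intuition about what INT8 quantization does to each value, the affine mapping used by TensorFlow Lite, real_value = scale × (int8_value − zero_point), can be sketched in plain Python. The helper names and the activation range below are assumptions for illustration; the converter derives the actual scale and zero point from the representative dataset.

```python
def quantize(x, scale, zero_point):
    """Map a float to int8 using q = round(x / scale) + zero_point, clamped."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize(q, scale, zero_point):
    """Recover the approximate float value: x ~ scale * (q - zero_point)."""
    return scale * (q - zero_point)


def choose_params(x_min, x_max):
    """Pick scale and zero point so that [x_min, x_max] spans the int8 range."""
    # The range must include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / 255.0
    zero_point = round(-128 - x_min / scale)
    return scale, zero_point


# Illustrative activation range, as might be observed on representative data.
scale, zero_point = choose_params(-1.0, 3.0)
q = quantize(0.5, scale, zero_point)
print(q, dequantize(q, scale, zero_point))
```

The round trip loses at most half a quantization step for values inside the calibrated range, and values outside it saturate at −128 or 127. That saturation is why the representative dataset needs to cover the full input distribution.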
Real-World TinyML Performance Numbers
These benchmarks reflect typical results on common MCU targets:
| Model | Task | Target MCU | Size | Latency | Accuracy |
|---|---|---|---|---|---|
| DS-CNN (keyword spotting) | Yes/No wakeword | Cortex-M4 @80 MHz | 18 KB | 87 ms | 93.8% |
| Anomaly detection autoencoder | Vibration | Cortex-M4 @64 MHz | 12 KB | 3.2 ms | — (reconstruction error) |
| MobileNetV1 0.25 | Image 96×96 | ESP32-S3 @240 MHz | 242 KB | 95 ms | 89.2% (CIFAR-10) |
| Fall detection LSTM | IMU (50 Hz) | nRF5340 @128 MHz | 22 KB | 14 ms | 97.4% sensitivity |
Source: Hackster.io TinyML benchmarks and Edge Impulse public model library.
Getting Started: Your First TinyML Project
The fastest path to a working TinyML prototype:
- Pick a supported development board: Arduino Nano 33 BLE Sense, SparkFun Edge, or any STM32 Nucleo board
- Sign up for Edge Impulse (free tier is sufficient for prototyping)
- Connect your board and collect 2–3 minutes of labeled sensor data per class
- Train a model using Edge Impulse Studio — the AutoML feature handles architecture selection
- Deploy: Edge Impulse generates an Arduino library or a deployable C++ SDK
- Flash and test on hardware
Your first working model — a simple gesture recognizer or keyword detector — can be running in under an hour. Production-grade deployment to thousands of devices is a longer journey, but the prototype validates feasibility fast.
For more depth on building your first model with Edge Impulse, see our Edge Impulse tutorial.
Conclusion
TinyML is the enabling technology that transforms dumb sensors into intelligent devices. By running trained ML models directly on microcontrollers — with milliwatt power budgets, millisecond latency, and zero cloud dependency — it unlocks applications that are impossible with traditional IoT architectures. The tools are mature, the hardware is affordable, and the methodology is well-documented.
Whether you are building a new product from scratch or adding intelligence to an existing IoT device, UABit’s AIoT development services can help you navigate the model design, quantization, and deployment challenges that determine whether a TinyML project ships on time and on spec.