MQTT Deep Dive: Topics, QoS Levels, Retained Messages, and Best Practices

MQTT (originally MQ Telemetry Transport, now often expanded as Message Queuing Telemetry Transport) is one of the most widely deployed IoT messaging protocols. Designed in 1999 by Andy Stanford-Clark of IBM and Arlen Nipper for monitoring oil pipelines over satellite links, it was purpose-built for constrained devices, unreliable networks, and low bandwidth, which is precisely the IoT operating environment. Despite its age, the protocol remains current: MQTT 5.0 became an OASIS standard in 2019, and the core design (publish/subscribe decoupling, compact binary framing, flexible QoS) makes it a sound default for device-to-cloud communication in most IoT applications. This guide goes deep: not just what MQTT does, but why the design decisions were made and how to apply them correctly in production.

MQTT Fundamentals: The Publish/Subscribe Model

MQTT is a publish/subscribe protocol, which means producers (devices) and consumers (backends) are decoupled in time and space. A device publishes a message to a topic on a broker. The broker delivers that message to every client that has subscribed to matching topics. Publisher and subscriber never communicate directly, and neither needs to know the other exists.

Contrast this with HTTP’s request/response model: in HTTP, a device must know the server’s address, initiate a TCP connection, and wait for a response. In MQTT, the device connects once to the broker, maintains that persistent connection, and can publish without awaiting a response. This matters for constrained devices where TCP handshake overhead is meaningful and where the application doesn’t need confirmation from the server for every message.

The three entities in every MQTT interaction:

Client (publisher or subscriber): a device, backend service, or application
Broker: the message routing server (mosquitto, EMQX, HiveMQ, AWS IoT Core, etc.)
Topic: a UTF-8 string used to route messages, structured like a filesystem path: sensors/building_a/floor_2/temp

A single client can be both publisher and subscriber simultaneously. Backend services typically subscribe to device topics to ingest telemetry and publish to command topics to send instructions down to devices.

Topic Design: The Art of Good Hierarchy

Topic design is the single most important architectural decision in an MQTT system. Bad topic hierarchies create routing nightmares, make access control difficult, and limit your ability to use wildcard subscriptions efficiently.

Best practice topic structure:

{product}/{tenantId}/{deviceId}/{dataType}

Examples:

sensors/acme-corp/device-00a3/telemetry
sensors/acme-corp/device-00a3/events
sensors/acme-corp/+/status (wildcard: subscribe to status of all ACME devices)
sensors/# (multi-level wildcard: subscribe to everything under sensors)

Wildcard characters:

+ (single-level wildcard): matches exactly one topic level. sensors/+/temperature matches sensors/device-1/temperature but not sensors/building-a/device-1/temperature.
# (multi-level wildcard): matches any number of levels, including the parent level itself, so sensors/# matches sensors as well as everything under sensors/. It must be the last character of the filter. Use it sparingly in production: a broad # subscription receives every message published under its prefix, which can mean very heavy traffic for the subscriber.
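The matching rules above can be sketched as a small function. This is a minimal illustration, not a broker implementation: the name topic_matches is hypothetical, and the sketch omits the special handling that the specification gives to topics beginning with $.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if a concrete topic matches a subscription filter."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    for i, level in enumerate(filter_levels):
        if level == "#":
            # '#' is only valid as the final level; it matches the parent
            # level and everything beneath it.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False  # the filter is deeper than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level mismatch ('+' accepts any one level)
    # Without '#', filter and topic must have the same number of levels.
    return len(filter_levels) == len(topic_levels)
```

For example, topic_matches("sensors/+/temperature", "sensors/device-1/temperature") is true, while the same filter rejects sensors/building-a/device-1/temperature because + covers only one level.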

Avoid these common mistakes:

Leading slashes (/sensors/data): these create an empty first level and waste a hierarchy level
Spaces in topic names: technically valid, but they cause bugs in many client implementations
Encoding device state in topic levels (sensors/device-1/online): publish the state as a retained message on a status topic instead
Overly deep hierarchies (8+ levels): these make wildcard subscriptions unwieldy
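Some of these mistakes are mechanical enough to catch in a unit test or CI check. The sketch below is one way to do that; lint_topic and its max_levels default of 7 are illustrative assumptions derived from the "8+ levels" guideline, not part of any MQTT library.

```python
from typing import List


def lint_topic(topic: str, max_levels: int = 7) -> List[str]:
    """Return human-readable warnings for a publish topic (empty list if clean)."""
    warnings = []
    if topic.startswith("/"):
        warnings.append("leading slash creates an empty first level")
    if " " in topic:
        warnings.append("topic contains spaces")
    if len(topic.split("/")) > max_levels:
        warnings.append(f"topic has more than {max_levels} levels")
    return warnings
```

A clean topic such as sensors/acme-corp/device-00a3/telemetry yields an empty list, while /sensors/my data yields two warnings.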

For multi-tenant systems, placing tenantId early in the hierarchy enables per-tenant ACL rules at the broker level: a tenant’s backend can be granted permission to subscribe to sensors/{tenantId}/# without being able to see other tenants’ data.
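The per-tenant rule can be illustrated with a simple prefix check. Real brokers express this in their own ACL configuration; the function below is a hypothetical sketch of the logic only, assuming the sensors/{tenantId}/... layout used in this article.

```python
def tenant_may_subscribe(tenant_id: str, requested_filter: str,
                         product: str = "sensors") -> bool:
    """Allow a subscription only if it stays inside the tenant's own subtree."""
    # Reject tenant IDs that could smuggle in extra levels or wildcards.
    if not tenant_id or any(ch in tenant_id for ch in "/+#"):
        return False
    prefix = f"{product}/{tenant_id}"
    # The first two levels must be literal, so wildcards can only appear
    # below the tenant level and can never reach another tenant's topics.
    return requested_filter == prefix or requested_filter.startswith(prefix + "/")
```

Because the tenant level must match literally, a request for sensors/+/status or sensors/# is denied, while sensors/acme-corp/# is allowed for tenant acme-corp.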

QoS Levels: Exactly What They Mean

MQTT defines three Quality of Service levels. Understanding what they actually guarantee (and what they don’t) is essential for correct system design.

QoS 0: At Most Once (“fire and forget”). The sender transmits the message once and the receiver sends no acknowledgment, so each message arrives zero or one times. If the network drops the message or the subscriber is temporarily disconnected, the message is lost. This is appropriate when:

Data loss is acceptable (e.g., a sensor sending readings every 5 seconds—a missed one doesn’t matter)
You’re optimizing for minimum overhead
Network reliability is high (local network, not cellular)

QoS 1: At Least Once. The publisher sends the message and waits for a PUBACK acknowledgment from the broker. If no PUBACK arrives, the publisher resends the message with the DUP flag set (MQTT 3.1.1 clients commonly retry after a timeout; MQTT 5.0 permits retransmission only after a reconnect). The message may therefore be delivered more than once, for example when the PUBACK is lost after the broker has already processed the original. Subscribers must be idempotent (capable of handling duplicate messages). This is the most common production choice: reasonable reliability with manageable overhead.
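One common way to make a QoS 1 consumer idempotent is to deduplicate on an application-level message ID. The sketch below assumes the publisher embeds a unique ID in each message (MQTT packet identifiers are reused, so they are not suitable for this); DedupingHandler and its parameters are hypothetical names.

```python
from collections import OrderedDict
from typing import List


class DedupingHandler:
    """Process each application-level message ID at most once."""

    def __init__(self, max_remembered: int = 1000) -> None:
        self._seen: "OrderedDict[str, bool]" = OrderedDict()
        self._max = max_remembered
        self.processed: List[str] = []

    def handle(self, message_id: str, payload: str) -> bool:
        """Return True if the payload was processed, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._seen[message_id] = True
        if len(self._seen) > self._max:
            # Forget the oldest ID so memory use stays bounded.
            self._seen.popitem(last=False)
        self.processed.append(payload)
        return True
```

The bounded cache is a trade-off: a duplicate that arrives after its ID has been evicted would be processed again, so max_remembered should comfortably exceed the number of messages that can arrive within the retransmission window.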

QoS 2: Exactly Once. A four-part handshake (PUBLISH → PUBREC → PUBREL → PUBCOMP) ensures the message is delivered exactly once. This is the safest but most expensive QoS level in terms of latency and message exchange overhead. Use QoS 2 only when duplicate delivery is genuinely harmful: financial transactions, actuator commands that must fire exactly once, safety-critical operations.
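The receiver side of that handshake can be sketched as a small state machine. This is a simplified model of one permitted approach (deliver on the first PUBLISH, then suppress retransmissions until PUBREL arrives); Qos2Receiver is a hypothetical class, and reason codes and persistence are omitted.

```python
from typing import List, Set


class Qos2Receiver:
    """Toy QoS 2 receiver: tracks packet IDs between PUBLISH and PUBREL."""

    def __init__(self) -> None:
        self._awaiting_pubrel: Set[int] = set()
        self.delivered: List[str] = []

    def on_publish(self, packet_id: int, payload: str) -> str:
        if packet_id not in self._awaiting_pubrel:
            # First time this packet ID is seen: deliver and remember it.
            self._awaiting_pubrel.add(packet_id)
            self.delivered.append(payload)
        # A retransmitted PUBLISH is acknowledged again but not redelivered.
        return "PUBREC"

    def on_pubrel(self, packet_id: int) -> str:
        # The sender has seen PUBREC, so the packet ID can be released for reuse.
        self._awaiting_pubrel.discard(packet_id)
        return "PUBCOMP"
```

The extra round trip exists so that both sides agree on when a packet ID may be forgotten, which is what lets the receiver tell a retransmission apart from a new message that reuses the same ID.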

Important nuance: QoS applies to the leg between publisher and broker, and separately to the leg between broker and subscriber. The broker delivers each message at the lower of the QoS it was published with and the maximum QoS granted to the subscription. A device publishing at QoS 1 to a subscriber that subscribed at QoS 0 means the broker receives the message reliably but delivers it to the subscriber with no guarantee. Configure both legs appropriately.
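As a quick worked example, the delivery QoS on the broker-to-subscriber leg is the minimum of the publish QoS and the QoS granted to the subscription. The helper name effective_qos is illustrative.

```python
def effective_qos(publish_qos: int, subscription_qos: int) -> int:
    """QoS used on the broker-to-subscriber leg for a given message."""
    for qos in (publish_qos, subscription_qos):
        if qos not in (0, 1, 2):
            raise ValueError(f"invalid QoS level: {qos}")
    # The broker never upgrades a message, and never exceeds what the
    # subscriber asked for, so the lower of the two values applies.
    return min(publish_qos, subscription_qos)
```

So effective_qos(1, 0) is 0 and effective_qos(2, 1) is 1: subscribing at QoS 2 does not make a QoS 0 publication reliable.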

[Figure: MQTT QoS handshake diagram showing message flows for QoS 0, 1, and 2]

Retained Messages: Device State at Subscription Time

A retained message is a special MQTT message with the RETAIN flag set. The broker stores the last retained message for each topic. When a new subscriber subscribes to a topic, it immediately receives the last retained message for that topic, even if no new message has been published since.

This solves a fundamental IoT problem: how does a new subscriber learn the current state of a device?

Without retained messages, a backend service that restarts must wait for the device to publish its next regular telemetry before knowing its state. With retained messages, the broker immediately delivers the last known state to the reconnecting backend.

Common uses of retained messages:

Device online/offline status: publish online with RETAIN=true to devices/{id}/status, and register a Last Will that publishes offline with RETAIN=true to the same topic if the device disconnects ungracefully
Current firmware version: publish after OTA update
Device configuration acknowledgment: device publishes its current config with RETAIN=true so dashboards always have the latest

To clear a retained message, publish a zero-byte payload with RETAIN=true to that topic.
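The store, replace, and clear behavior can be modeled in a few lines. This is a toy sketch of the broker-side bookkeeping, assuming exact topic names only (no wildcard matching); RetainedStore is a hypothetical class, not a real broker API.

```python
from typing import Dict, Optional


class RetainedStore:
    """Toy model of a broker's retained-message table (exact topics only)."""

    def __init__(self) -> None:
        self._retained: Dict[str, bytes] = {}

    def on_publish(self, topic: str, payload: bytes, retain: bool) -> None:
        if not retain:
            return  # non-retained messages never touch the table
        if payload == b"":
            # A zero-byte retained payload clears the stored message.
            self._retained.pop(topic, None)
        else:
            # Only the most recent retained message per topic is kept.
            self._retained[topic] = payload

    def on_subscribe(self, topic: str) -> Optional[bytes]:
        """Return what a new subscriber would receive immediately, if anything."""
        return self._retained.get(topic)
```

A subscriber that joins after b"online" was retained on devices/d1/status receives it at once; after the zero-byte publish, a new subscriber receives nothing until the next message arrives.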

Last Will and Testament: Detecting Ungraceful Disconnects

The Last Will and Testament (LWT) mechanism lets a client specify a message that the broker publishes on its behalf when the client disconnects ungracefully (network failure, crash, power loss). It’s set at connection time in the CONNECT packet.

Example pattern for device presence:

Connect with LWT:
  Topic: devices/device-001/status
  Payload: {"status": "offline", "reason": "unexpected_disconnect"}
  QoS: 1
  RETAIN: true

On successful connect, publish:
  Topic: devices/device-001/status
  Payload: {"status": "online"}
  QoS: 1
  RETAIN: true

This gives your backend a reliable way to track device connectivity without polling. The LWT fires only on ungraceful disconnect (not on a clean DISCONNECT packet), so you should also publish an offline status message before intentional shutdown.
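The difference between a clean and an ungraceful disconnect is easy to see in a toy model. The sketch below is a simplified simulation, not a client library call: ToyBroker and its methods are hypothetical, every publish is treated as retained, and QoS is ignored.

```python
from typing import Dict, Tuple


class ToyBroker:
    """Minimal model of Last Will handling with retained status messages."""

    def __init__(self) -> None:
        self.retained: Dict[str, str] = {}
        self._wills: Dict[str, Tuple[str, str]] = {}

    def connect(self, client_id: str, will_topic: str, will_payload: str) -> None:
        # The will is registered at connection time, as in the CONNECT packet.
        self._wills[client_id] = (will_topic, will_payload)

    def publish(self, topic: str, payload: str) -> None:
        self.retained[topic] = payload

    def disconnect(self, client_id: str) -> None:
        # Clean DISCONNECT: the broker discards the will without publishing it.
        self._wills.pop(client_id, None)

    def connection_lost(self, client_id: str) -> None:
        # Ungraceful disconnect: the broker publishes the will on the client's behalf.
        will = self._wills.pop(client_id, None)
        if will is not None:
            self.publish(*will)
```

After connection_lost the retained status becomes the will payload, whereas after disconnect it stays at whatever the device last published. That is why a device should publish its own offline status before a planned shutdown.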

Persistent Sessions and Clean Sessions

The cleanSession flag (MQTT 3.1.1) or cleanStart + sessionExpiryInterval (MQTT 5.0) controls whether the broker retains subscription state and queued messages between connections.

With cleanSession=true (MQTT 3.1.1), or cleanStart=true with sessionExpiryInterval=0 (MQTT 5.0): on each connection, the broker creates a fresh session with no remembered subscriptions and no queued messages. Simple and stateless, but the client must re-subscribe on every connection and misses messages published while disconnected (even at QoS 1/2).

With cleanSession=false (MQTT 3.1.1), or cleanStart=false with a non-zero sessionExpiryInterval (MQTT 5.0): the broker remembers the client’s subscriptions and queues QoS 1/2 messages while the client is offline. When the client reconnects, queued messages are delivered. In MQTT 5.0 the interval defaults to 0, which ends the session as soon as the connection closes, so cleanStart=false alone is not enough. Persistent sessions are essential for devices that sleep between readings, which would otherwise miss downlink commands sent while they sleep.

Persistent sessions increase broker memory usage in proportion to the number of stored sessions and queued messages. Size your broker accordingly and set a sessionExpiryInterval (MQTT 5.0) to auto-expire sessions for devices that haven’t connected in a long time.
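Whether a reconnecting MQTT 5.0 client gets its old session back depends on three inputs, which the sketch below combines. The function name and parameters are illustrative; expiry_interval_s stands for the sessionExpiryInterval the client set on its previous connection, and the broker is assumed not to have overridden it.

```python
NEVER_EXPIRES = 0xFFFFFFFF  # MQTT 5.0 value meaning the session does not expire


def session_resumed(clean_start: bool, expiry_interval_s: int, offline_s: int) -> bool:
    """Return True if the broker would resume the previous session on reconnect."""
    if clean_start:
        return False  # the client asked for a fresh session
    if expiry_interval_s == 0:
        return False  # the session ended when the previous connection closed
    if expiry_interval_s == NEVER_EXPIRES:
        return True
    # Otherwise the session survives only while the client's time offline
    # is shorter than the interval it requested.
    return offline_s < expiry_interval_s
```

A device that sleeps for 10 minutes with a one-hour interval resumes its session and receives queued commands; one that stays offline for two hours starts from scratch and must re-subscribe.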

MQTT 5.0: Key Improvements

MQTT 5.0 added several features critical for production IoT (HiveMQ’s spec guide is an excellent walkthrough):

Reason codes: Every CONNACK, PUBACK, SUBACK, etc. now includes a reason code indicating success or the specific failure reason. MQTT 3.1.1 offered only a handful of CONNACK return codes and a single SUBACK failure code, and its PUBACK carried no status at all
User properties: Arbitrary key-value pairs can be added to most MQTT packets (including CONNECT, PUBLISH, SUBSCRIBE, and their acknowledgments), which is useful for tracing IDs, device firmware versions, and routing metadata without parsing the payload
Message expiry interval: Set a TTL on messages; the broker discards stale messages that haven’t been delivered within the interval
Shared subscriptions: Multiple subscribers share a subscription, enabling load balancing across backend consumers without the fan-out of regular subscriptions
Topic aliases: Replace frequently used long topic strings with a short integer alias, reducing overhead per message
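To make shared subscriptions concrete: a consumer joins a group by subscribing to $share/{group}/{filter}, and the broker hands each matching message to one member of the group. The sketch below parses that syntax and models round-robin dispatch. Round-robin is an assumption for illustration, since the distribution strategy is up to the broker; parse_shared_filter and SharedGroup are hypothetical names.

```python
from itertools import cycle
from typing import Iterable, Optional, Tuple


def parse_shared_filter(subscription: str) -> Tuple[Optional[str], str]:
    """Split '$share/<group>/<filter>' into (group, filter).

    Returns (None, subscription) for an ordinary, non-shared subscription.
    """
    if subscription.startswith("$share/"):
        _, group, topic_filter = subscription.split("/", 2)
        return group, topic_filter
    return None, subscription


class SharedGroup:
    """Deliver each message to exactly one group member, in rotation."""

    def __init__(self, members: Iterable[str]) -> None:
        self._rotation = cycle(list(members))

    def route(self, payload: str) -> str:
        """Return the member chosen to receive this payload."""
        return next(self._rotation)
```

With two workers in the group, three consecutive messages go to the first worker, the second, and then the first again, whereas with regular subscriptions both workers would receive all three.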

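For new projects, use MQTT 5.0. All major brokers (Mosquitto since 1.6, EMQX, HiveMQ) support it. Support in device SDKs (Paho, AWS IoT SDK, Azure IoT SDK) varies by language and version, so confirm that the SDK and the cloud endpoint you target both implement the MQTT 5.0 features you plan to use.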

Security Best Practices

MQTT’s own security features are minimal: username/password fields in the CONNECT packet, optional enhanced authentication in MQTT 5.0, and TLS at the transport layer. Production systems must layer security carefully:

Always use TLS (mqtts:// on port 8883). Certificate pinning on devices helps prevent man-in-the-middle attacks.
Unique per-device credentials: Each device should have a unique client certificate (X.509) or unique username/password. Never share credentials across devices.
Enforce ACLs at the broker: A device should only be permitted to publish to devices/{its-own-id}/# and subscribe to commands/{its-own-id}/#. Use the broker’s ACL system (Mosquitto’s aclfile, EMQX’s authorization rules, HiveMQ’s extension SDK) to enforce this.
Validate client IDs: Many brokers can enforce that a device’s MQTT client ID matches its certificate CN/SAN, preventing credential reuse.
Rate limit publishes: A compromised or malfunctioning device could flood your broker. Implement per-client publish rate limits.
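Most brokers expose rate limiting as configuration, but the underlying idea is usually a token bucket per client. The sketch below shows that technique in isolation; TokenBucket and its parameters are hypothetical, and timestamps are passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Allow bursts of up to `burst` publishes, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, burst: int, now: float = 0.0) -> None:
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = now

    def allow(self, now: float) -> bool:
        """Return True if a publish arriving at time `now` (seconds) is permitted."""
        elapsed = max(0.0, now - self.last)
        # Refill for the time elapsed, never exceeding the burst capacity.
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In practice you would keep one bucket per client ID and decide separately whether a rejected publish is dropped, delayed, or causes the client to be disconnected.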

Our IoT Security Best Practices article covers the full security stack in depth.

Conclusion

MQTT’s elegance lies in its simplicity and its fitness for IoT constraints: a persistent TCP connection, tiny packet overhead, flexible QoS, and publish/subscribe decoupling make it the right default protocol for device-to-cloud communication. The nuances matter enormously in production: topic hierarchy determines your security model and subscription efficiency; QoS level must match your reliability requirements without over-engineering; retained messages and LWT give you device state visibility without polling; persistent sessions ensure devices don’t miss commands while sleeping; MQTT 5.0 adds the operational tooling that production systems need. Master these mechanisms, and MQTT becomes a robust foundation for IoT systems of any scale.