It all started with a box
I had ESP32s lying around. An unused Raspberry Pi 4. An apartment with WiFi everywhere. And a question that had been nagging at me: can you detect a human presence using only radio signals — no camera, no classic motion sensor, just ambient WiFi?
The short answer: yes, partly. The long answer: that's this project.
It's an experiment before it's a product. No machine learning model for now — everything is based on heuristics and classical signal processing. The idea was to see how far you can get without AI, to better understand what it would bring later.
What is WiFi, concretely?
A quick primer for those who didn't study telecom (I didn't either — I read up on it).
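WiFi sends data by splitting the signal across multiple frequencies simultaneously; this is called OFDM (orthogonal frequency-division multiplexing). Rather than a single frequency carrying everything, WiFi uses dozens of small subcarriers in parallel. On a standard 20 MHz channel, there are 64 subcarrier slots, of which 52 (802.11a/g) or 56 (802.11n) actually carry signal; the rest stay empty as guard bands and the DC null.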
Each subcarrier arrives with a slightly different amplitude and phase depending on what it has passed through or bounced off: walls, furniture, and human bodies. Taken together, these per-subcarrier measurements are called the CSI: Channel State Information. It's the radio channel's fingerprint at a given instant.
Imagine a graphic equalizer with 64 sliders. RSSI, the classic "signal strength" measurement, collapses all the sliders into a single number, roughly their overall level. CSI is all 64 sliders individually.
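To make the equalizer picture concrete, here is a minimal sketch that turns one raw CSI frame into per-subcarrier amplitude and phase. It assumes each subcarrier arrives as two signed bytes, imaginary part first and real part second; check that ordering against your firmware. The function names are mine, not the project's.

```python
import cmath

def csi_to_amp_phase(raw):
    """Split an interleaved [imag0, real0, imag1, real1, ...] list
    into per-subcarrier amplitudes and phases (radians).
    The imag-then-real byte order is an assumption."""
    if len(raw) % 2:
        raise ValueError("expected an even number of values")
    amps, phases = [], []
    for i in range(0, len(raw), 2):
        h = complex(raw[i + 1], raw[i])  # real + j*imag
        amps.append(abs(h))
        phases.append(cmath.phase(h))
    return amps, phases

def mean_amplitude(amps):
    """Collapse all sliders into one number, RSSI-style (illustration only)."""
    return sum(amps) / len(amps)

# Four made-up subcarriers, not real capture data.
amps, phases = csi_to_amp_phase([0, 3, 4, 0, 3, 4, -5, 0])
print(amps)                  # one slider per subcarrier
print(mean_amplitude(amps))  # the single-number summary
```

The last two lines show the trade-off: the per-subcarrier list keeps the shape of the channel, while the mean throws it away.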
Why RSSI is useless here
First naive idea: measure received signal strength (RSSI) from multiple points and trilaterate.
Problem: RSSI fluctuates by 10 to 20 dB for no apparent reason. A hand moves, a microwave starts, a frame gets retransmitted, and the signal jumps. It's unusable for reliably estimating distance.
CSI, on the other hand, is much more stable on each subcarrier individually. And more importantly: it changes coherently when someone moves through the room. You can measure temporal variance — how much the values shift frame to frame — and infer whether there's motion.
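Here is a rough sketch of that variance heuristic: keep a sliding window of amplitude frames, compute each subcarrier's variance over time, and average the result into one motion score. The window size, the threshold, and the class name are placeholders I picked for illustration, not the project's tuned values.

```python
from collections import deque
from statistics import pvariance

class MotionDetector:
    """Sliding-window temporal variance over CSI amplitudes."""

    def __init__(self, window=50, threshold=1.0):
        self.threshold = threshold          # assumed value, needs calibration
        self.frames = deque(maxlen=window)  # oldest frame drops out automatically

    def score(self):
        """Mean over subcarriers of each subcarrier's variance over time."""
        if not self.frames:
            return 0.0
        per_subcarrier = [pvariance(col) for col in zip(*self.frames)]
        return sum(per_subcarrier) / len(per_subcarrier)

    def update(self, amplitudes):
        """Add one frame of amplitudes and report whether motion is likely."""
        self.frames.append([float(a) for a in amplitudes])
        return self.score() > self.threshold

detector = MotionDetector(window=4, threshold=0.5)
for frame in ([1, 1], [3, 3], [1, 1], [3, 3]):  # made-up, strongly varying frames
    moving = detector.update(frame)
print(detector.score(), moving)
```

A static room gives near-identical frames and a score near zero; someone walking through pushes the score above the threshold. In practice the threshold has to be calibrated per room.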
ESP32s in spy mode
To capture CSI, I use three ESP32-DevKitC boards positioned at the room's corners. Each runs in "promiscuous" mode: it listens to every WiFi frame in the air, even those not addressed to it, without connecting to any network.
For each captured frame, the firmware extracts the CSI data and sends it over a serial cable (UART) to the central Raspberry Pi that aggregates everything.
The firmware runs on FreeRTOS. And there's the first classic trap: printf() isn't thread-safe. At high frame rates, I had memory corruption that was impossible to reproduce deterministically, the kind of bug that disappears if you watch it too closely. After a night of ghost-hunting, I routed all output through a ring buffer drained by a single dedicated write task. Problem solved, clean data.
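The firmware itself is C on FreeRTOS, so what follows is only a Python analogue of the pattern, not the actual code: producers never touch the output stream; they push lines into a bounded buffer, and one writer task is the only thing that writes. All names here are hypothetical.

```python
import io
import queue
import threading

def writer_task(q, sink):
    """Sole owner of the output stream: drain the queue until None arrives."""
    while True:
        line = q.get()
        if line is None:  # sentinel: no more data
            break
        sink.write(line + "\n")

def run_demo(n_producers=3, frames_each=100):
    q = queue.Queue(maxsize=64)  # bounded, like a fixed-size ring buffer
    sink = io.StringIO()         # stands in for the UART
    writer = threading.Thread(target=writer_task, args=(q, sink))
    writer.start()

    def producer(sensor_id):
        for i in range(frames_each):
            # Blocks while the buffer is full instead of corrupting anything.
            q.put(f"sensor{sensor_id},frame{i}")

    producers = [threading.Thread(target=producer, args=(s,))
                 for s in range(n_producers)]
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    q.put(None)  # tell the writer to stop once everything is queued
    writer.join()
    return sink.getvalue().splitlines()

lines = run_demo()
print(len(lines), "intact lines written")
```

This sketch blocks the producer when the buffer is full. On a microcontroller capturing frames in real time, dropping the newest frame is often the better choice; that is a design decision, and I am not claiming either one is what the firmware does.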
Trilateration: the good idea that isn't so simple
With three sensors, you can theoretically compute a 2D position: that's trilateration. You estimate the distance to each sensor, draw three circles, and their intersection gives the position. The same principle underlies GPS, with satellites as the reference points.
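The three-circles idea reduces to a small linear system: subtracting the first circle's equation from the other two cancels the squared unknowns and leaves two linear equations in x and y. A minimal sketch, assuming exact distances and sensor coordinates in meters (the function name and sample layout are illustrative):

```python
import math

def trilaterate(anchors, distances):
    """Solve for (x, y) from three anchor positions and three distances.

    Each circle is (x - xi)^2 + (y - yi)^2 = ri^2. Subtracting circle 1
    from circles 2 and 3 gives a 2x2 linear system, solved with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-9:
        raise ValueError("anchors are collinear; position is ambiguous")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det

# Sensors in three corners of a 4 m x 3 m room, target at (1, 1).
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
target = (1.0, 1.0)
distances = [math.hypot(target[0] - ax, target[1] - ay) for ax, ay in anchors]
print(trilaterate(anchors, distances))
```

With perfect distances this recovers the target. With noisy distances the circles no longer meet at a single point, and the linear solution is only a best guess.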
In practice, "measuring distance" with WiFi inside an apartment is complicated. We use a mathematical model that says received signal power, expressed in dB, decreases with the logarithm of distance: the path-loss model. The problem: in its simplest form, that model assumes empty space. In an apartment with walls, furniture and a cat, the signal bounces everywhere.
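For reference, the usual log-distance form of that model is RSSI(d) = RSSI(d0) - 10 * n * log10(d / d0), where n is the path-loss exponent (2 in free space). Here it is sketched with its inverse. The reference power of -40 dBm at 1 m is a made-up calibration value, not a measurement from this project.

```python
import math

def rssi_at(distance_m, rssi_ref=-40.0, n=2.0, d_ref=1.0):
    """Predicted RSSI (dBm) at a given distance, log-distance model.
    rssi_ref is the power at d_ref; both are assumed calibration values."""
    return rssi_ref - 10.0 * n * math.log10(distance_m / d_ref)

def distance_from_rssi(rssi, rssi_ref=-40.0, n=2.0, d_ref=1.0):
    """Invert the model: estimated distance (m) from a measured RSSI (dBm)."""
    return d_ref * 10 ** ((rssi_ref - rssi) / (10.0 * n))

print(distance_from_rssi(-60.0))  # 20 dB below the reference
print(distance_from_rssi(-70.0))  # the same link after a further 10 dB drop
```

The inverse is exponential in the dB error: with n = 2, a 10 dB swing multiplies the estimated distance by about 3.2. That is why ordinary indoor fluctuation turns a distance "measurement" into a rough estimate.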
Result: trilateration works well in open space with careful calibration. In a real apartment, it's more of an estimate than a precise measurement. The interface lets you calibrate sensors via drag-and-drop on the 3D scene, which helps.
The interface: because raw data is unreadable
The Python backend on the Raspberry Pi aggregates the three ESP32 streams, computes estimated positions, and exposes everything via FastAPI. Sensor data is stored in DuckDB — an embedded analytical database, perfect for lightweight time series.
The interface is built with React + Three.js. A 3D scene with a meter-graduated floor grid, repositionable sensors, detected devices represented as blobs whose opacity reflects localization confidence (three sensors = 100%, single sensor = ~22%). A CSI waterfall shows radio activity in real time.
The feature that took me 30 minutes but changes everything: Solo mode on a device. It dims all others and filters the waterfall to that device only. Small detail, huge readability impact.
What actually works, and what's next
Let's be honest, and the README is too: it includes a section titled "What's real vs. decorative."
What's solid: motion detection via CSI variance is reliable. Presence detection (is someone in the room?) works well. Real-time data collection and visualization too.
What's approximate: precise 2D localization inside an apartment with obstacles. It's functional, not magic.
The logical next step is machine learning. Researchers use neural networks to learn to interpret CSI patterns — identifying activities, counting people, detecting falls. With a dataset recorded in this specific apartment, a model would learn the spatial "signatures" far better than a generic path-loss model.
That's the next project. For now, seeing how far classical heuristics could go — that was already the goal.