doppler from WiFi CSI

date: Jun 1 2026

So I’ve been playing around with localization and heartbeat detection using WiFi modules, using the extracted Channel State Information (CSI). Initially I did not realize how fundamentally different it is from other radars. In a radar, you send a chirp over a given bandwidth in a time dt. Depending on the object’s velocity, the signal you get back is doppler shifted (its frequency moves), and the time it takes for the sent signature to come back gives the range. But at the speed of light inside a 10 m room, those delays are on the order of tens of nanoseconds, while a single modulated WiFi packet lasts on the order of 10 to 100 µs: roughly 1000x longer. And the doppler shift of a person walking at 6 km/h is ~27 Hz, hard to measure inside one packet against a 2.4 GHz carrier, to put it lightly.
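To make those scales concrete, here is a small back-of-envelope sketch. It assumes a monostatic setup (TX and RX in the same spot, target moving straight at them), which is where the ~27 Hz figure comes from; the function names are made up for illustration.

```python
# Back-of-envelope scales (illustrative helper names, monostatic assumption).
C = 299_792_458.0          # speed of light, m/s

def round_trip_delay_s(range_m):
    """Two-way time of flight to a reflector at range_m."""
    return 2.0 * range_m / C

def monostatic_doppler_hz(speed_m_s, carrier_hz):
    """Doppler shift for a target moving straight at a co-located TX/RX."""
    wavelength = C / carrier_hz
    return 2.0 * speed_m_s / wavelength

walk = 6.0 / 3.6                                  # 6 km/h in m/s
print(round_trip_delay_s(10.0) * 1e9)             # tens of nanoseconds
print(monostatic_doppler_hz(walk, 2.4e9))         # roughly 27 Hz
```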

So CSI measures doppler across modulated packets instead. The absolute phase of any subcarrier’s H estimate is garbage: every packet arrives with an essentially random phase offset, the RX and TX clocks have drifted in between, and so on. But the difference in phase between two packets cancels most of that and leaves the small change from things that actually moved, i.e. the change in path length between the two packets. Stack those differences over a window and a moving scatterer becomes a phasor slowly rotating at its doppler frequency. So at, say, a 1 kHz packet rate, take any 500-packet window (500 ms) and FFT over the packet axis: out come the doppler frequencies in the room, in 2 Hz bins.
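A toy version of that FFT over the packet axis, as a sketch only. It assumes the per-packet phase offsets are already cancelled, models one subcarrier as a strong static path plus one weak mover, adds no noise, and uses a plain DFT so it needs nothing beyond the standard library. All names are illustrative.

```python
import cmath
import math

PACKET_RATE_HZ = 1000.0    # packets per second, as in the text
WINDOW = 500               # packets per window (500 ms)

def synth_csi(doppler_hz, n=WINDOW, rate=PACKET_RATE_HZ):
    """One subcarrier over n packets: a strong static path plus a weak
    moving path whose phase rotates at doppler_hz (toy model, no noise)."""
    static = 1.0 + 0.0j
    return [static + 0.1 * cmath.exp(2j * math.pi * doppler_hz * k / rate)
            for k in range(n)]

def doppler_spectrum(h):
    """Plain DFT over the packet axis after removing the static (mean) term."""
    n = len(h)
    mean = sum(h) / n
    x = [v - mean for v in h]
    return [abs(sum(x[k] * cmath.exp(-2j * math.pi * m * k / n)
                    for k in range(n)))
            for m in range(n)]

def peak_doppler_hz(h, rate=PACKET_RATE_HZ):
    """Frequency of the strongest bin, with upper bins mapped to negative Hz."""
    spec = doppler_spectrum(h)
    n = len(spec)
    m = max(range(n), key=spec.__getitem__)
    if m > n // 2:          # upper bins are negative frequencies
        m -= n
    return m * rate / n

print(peak_doppler_hz(synth_csi(16.0)))   # bin spacing is rate / n = 2 Hz
```

Subtracting the window mean is the crudest possible way to null the static paths; it works here because the toy mover completes whole cycles inside the window.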

Now here is another problem: a single TX-RX pair cannot turn that into the object’s real velocity. The doppler you read is the rate of change of the bistatic path TX→object→RX, not the object’s speed. Without the geometry (where TX and RX sit, where the object is) the same 16 Hz could be a slow object heading straight at the link or a fast one cutting across it. The map below is the ratio of doppler-read speed to true speed, g: near 1 it is faithful, near 0 the link is blind (motion sideways to the path reads zero), near 2 it doubles.
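The ratio g can be sketched directly from the geometry. This assumes a flat 2-D room and takes "doppler-read speed" to mean the rate of change of the path length TX→object→RX; node positions and function names are hypothetical.

```python
import math

def unit(frm, to):
    """Unit vector pointing from `frm` to `to` (2-D tuples)."""
    dx, dy = to[0] - frm[0], to[1] - frm[1]
    d = math.hypot(dx, dy)
    return (dx / d, dy / d)

def path_rate(obj, vel, tx, rx):
    """d/dt of |obj - tx| + |obj - rx| for an object at obj moving with vel."""
    e_tx = unit(tx, obj)
    e_rx = unit(rx, obj)
    return vel[0] * (e_tx[0] + e_rx[0]) + vel[1] * (e_tx[1] + e_rx[1])

def g_ratio(obj, vel, tx, rx):
    """Path-length rate divided by true speed: the ratio the text calls g."""
    return abs(path_rate(obj, vel, tx, rx)) / math.hypot(*vel)

TX, RX = (0.0, 0.0), (5.0, 0.0)                  # hypothetical node positions
print(g_ratio((8.0, 0.0), (1.0, 0.0), TX, RX))   # beyond RX, moving along the line
print(g_ratio((2.5, 3.0), (1.0, 0.0), TX, RX))   # on the bisector, moving sideways
```

The first case is the "doubles" end of the scale, the second is the blind one.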

adding information

We can extract more information from two transceivers. First, we can approximate the distance between the two transceivers, using either round-trip time (RTT) or received signal strength (RSSI).

RSSI is the blunt one: every received packet comes with a power reading, and power falls with distance in a known way, 6 dB per doubling in free space. Hear your peer at −40 dBm from a meter away and −58 dBm now, and it sits roughly 8 m out. The catch is that indoors the reading swings ±6-10 dB on the spot (stand in a multipath null, add a wall, rotate an antenna), and at 6 dB per doubling that is a 2x to 3x distance error. So RSSI gives a distance to about a factor of two at best, call it ±2 m at room scale, and averaging does not fully fix it because the error is the room, not the noise.
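The arithmetic above is the log-distance path-loss model with a free-space exponent of 2. A minimal sketch, where the function name and the default reference values are assumptions taken from the example in the text:

```python
import math

def rssi_to_distance_m(rssi_dbm, ref_dbm=-40.0, ref_m=1.0, exponent=2.0):
    """Log-distance path-loss model; exponent 2 is free space
    (about 6 dB per doubling of distance)."""
    return ref_m * 10.0 ** ((ref_dbm - rssi_dbm) / (10.0 * exponent))

print(rssi_to_distance_m(-58.0))               # close to 8 m
print(rssi_to_distance_m(-58.0 + 6.0),         # a +6 dB swing: about half
      rssi_to_distance_m(-58.0 - 6.0))         # a -6 dB swing: about double
```

Indoors the exponent is rarely exactly 2, which is one more reason to treat the result as a rough band rather than a number.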

RTT is the precise one, and the trick is that the two clocks never need to agree. A stamps t1 when the packet leaves, B stamps t2 on arrival and t3 on reply (both on B’s clock), A stamps t4 on return. Say B’s clock runs 7 ms ahead: t2 and t3 are both inflated by the same 7 ms, so t3 − t2 never sees it, and t1, t4 live on A’s clock anyway. Flight time is ((t4 − t1) − (t3 − t2)) / 2: the offset cancels exactly, and only drift during the microsecond-long exchange survives, which is nothing. The fight is resolution instead: light does a meter in 3.3 ns, and timestamping a WiFi packet to nanoseconds is hard, so 802.11mc FTM hardware lands at a meter or two.
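Here is the cancellation as a sketch with ideal timestamps. The 100 µs turnaround and the helper names are assumptions for illustration; real hardware adds timestamp quantization and jitter that this ignores.

```python
C = 299_792_458.0   # speed of light, m/s

def ftm_distance_m(t1, t2, t3, t4):
    """t1, t4 are read on A's clock; t2, t3 on B's clock (seconds)."""
    flight = ((t4 - t1) - (t3 - t2)) / 2.0
    return flight * C

def simulate_exchange(distance_m, b_offset_s, turnaround_s=100e-6):
    """Ideal timestamps for one exchange; B's clock is ahead by b_offset_s."""
    tof = distance_m / C
    t1 = 0.0
    t2 = t1 + tof + b_offset_s           # arrival, on B's clock
    t3 = t2 + turnaround_s               # reply, on B's clock
    t4 = t1 + tof + turnaround_s + tof   # return, on A's clock
    return t1, t2, t3, t4

print(ftm_distance_m(*simulate_exchange(5.0, b_offset_s=7e-3)))
print(ftm_distance_m(*simulate_exchange(5.0, b_offset_s=0.0)))
```

Both lines print the same 5 m (up to floating-point rounding): the 7 ms offset never reaches the result.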

Then also, for any object meaningfully away from the two transceivers, we get the echo TX→object→RX superimposed on top of the direct signal TX→RX. The interesting part is that we can conceivably extract some information about how much longer that echo path is than the direct line of sight.

The mover’s signal travels TX→object→RX, which is longer than the direct TX→RX line by some extra distance Δ, and every meter of extra path is 3.3 ns of extra delay. A delay is something CSI can see: a path delayed by τ rotates each subcarrier’s phase by −2π·f·τ, and since the subcarriers sit at different frequencies, the rotations differ, leaving a phase ramp across the band.

This is where bandwidth enters. The tilt is 2π·B·Δ/c end to end: across the ~16 MHz the 52 subcarriers actually occupy (a “20 MHz” channel spends the rest on guard bands), one meter of extra path tilts the ramp by ≈ 19°. More bandwidth, more tilt per meter, finer reading. Even if noise is not an issue, which is a big IF, there is another problem: the “echo” is not a single path. A person’s reflection arrives smeared over a few meters (torso, limbs, the echo’s own wall bounces), the slope reads the centroid of that smear, and the centroid wobbles as they move. Multipath from all the other objects, furniture and walls is superimposed on every packet as well. But even with that, there is still some information there.
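The tilt formula is easy to play with. This sketch assumes a single clean echo, no noise and no phase wrapping, which is exactly the idealization the paragraph above warns about; the function names are illustrative.

```python
C = 299_792_458.0   # speed of light, m/s

def ramp_tilt_deg(extra_path_m, bandwidth_hz):
    """End-to-end phase tilt across the band for an echo with extra path."""
    return 360.0 * bandwidth_hz * extra_path_m / C

def extra_path_from_tilt_m(tilt_deg, bandwidth_hz):
    """Invert the relation above (ideal single echo, no noise, no wrap)."""
    return tilt_deg * C / (360.0 * bandwidth_hz)

print(ramp_tilt_deg(1.0, 16e6))   # about 19 degrees per meter at ~16 MHz
print(ramp_tilt_deg(1.0, 80e6))   # wider band, more tilt per meter
```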

So we hold two blurry numbers: nodes say ~5 m apart, echo path ~4 m longer than the line between them. Where can the object be? Anywhere its two distances sum: r_TX + r_RX = 5 + 4 = 9 m. The set of points whose distances to two fixed anchors sum to a constant is exactly an ellipse with the anchors as foci (pin a 9 m string at both nodes, pull it taut with a pen, trace). Each Δ traces its own ellipse around the link: Δ = 0 is the degenerate line between the nodes, Δ = 2 m a tight oval, 4 m a fatter one, 8 m fatter still. A blurry Δ just means a fuzzy band of ellipses instead of a curve.

And that band is enough, because the warp factor F = |e_TX + e_RX| is trapped on an ellipse. At the ellipse’s tips the two sight lines point the same way, the unit vectors stack, F = 2. At its widest point they splay symmetrically and the sum shrinks to 2b/a (b, a are the half-width and half-length of the ellipse, straight from baseline and Δ). Nothing on the ellipse leaves [2b/a, 2]. With the numbers above (Δ = 4 m on a 5 m baseline) b/a ≈ 0.83, so F sits in [1.66, 2]: six times narrower than [0, 2]. Even taking the worst corner of both error bars at once (Δ = 2 m on a 7 m baseline, b/a ≈ 0.63) the band is [1.3, 2], still three times narrower. Divide the reading by the band’s midpoint and the worst-case amplitude error is ~20-25%, down from “up to 2x off or reading zero”. The one place this dies is Δ ≈ 0, the object near the line between the nodes: the ellipse degenerates, b/a heads to zero and the band reopens to useless.
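The band numbers can be reproduced from the baseline and Δ alone. A minimal sketch, with helper names invented for illustration; the ellipse has its foci at the two nodes and a path sum of baseline + Δ.

```python
import math

def warp_band(baseline_m, delta_m):
    """Return (b_over_a, f_min, f_max) for the ellipse with foci at the two
    nodes and path sum baseline + delta."""
    a = (baseline_m + delta_m) / 2.0        # semi-major axis
    c = baseline_m / 2.0                    # center-to-focus distance
    b = math.sqrt(a * a - c * c)            # semi-minor axis
    return b / a, 2.0 * b / a, 2.0

def midpoint_error(f_min, f_max):
    """Worst-case relative error when dividing by the band midpoint."""
    mid = 0.5 * (f_min + f_max)
    return (f_max - mid) / mid

for baseline, delta in [(5.0, 4.0), (7.0, 2.0)]:
    ratio, lo, hi = warp_band(baseline, delta)
    print(baseline, delta, round(ratio, 2), round(lo, 2),
          round(midpoint_error(lo, hi), 2))
```

Setting delta to 0 gives b = 0 and a band of [0, 2], the degenerate case at the end of the paragraph.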

The maps below show this correction’s endpoint (geometry taken as known): same three setups, warp divided out. The red half of the scale is gone; what stays dark is motion sliding along the ellipses, which no geometry knowledge buys back.

three transceivers

Three transceivers in round robin give three pairs, so three spectrograms: 1↔2, 1↔3, 2↔3. Three ridges, three bistatic rates m_12, m_13, m_23, generally three different numbers, each warped by its own geometry. Looks like three times the same mess. It is not, because of what each one actually measures.

The pair (i,j) path is L_ij = r_i + r_j, the object’s distance to node i plus its distance to node j, so the rate the ridge reports is m_ij = r_i' + r_j'. And each r_i' is a projection of the velocity: differentiating r_i = |p − x_i| (p the object’s position, x_i the position of node i) gives r_i' = v·e_i, with v the object’s velocity and e_i the unit vector from node i to the object. Only the component of motion along the line of sight changes a distance. So the three spectrograms are three sums of the same three unknowns:

m_12 = r_1' + r_2'
m_13 = r_1' + r_3'
m_23 = r_2' + r_3'

Three equations, three unknowns, and the system unmixes by inspection: node 1 appears in two of the measurements and is absent from the third, so add the two and subtract the third:

r_1' = (m_12 + m_13 − m_23) / 2     (rotate indices for r_2', r_3')

r_2' and r_3' each show up once with + and once with −, and vanish. Notice what was never used: the triangle, the object’s position, the echo delay, anything. Three entangled spectrograms turn into three clean radial speeds, how fast the object closes on each node in honest m/s, by pure addition and subtraction.
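The unmixing is three lines of arithmetic. In the sketch below the node layout, object position and velocity are hypothetical and are used only to generate consistent inputs; `unmix` itself never sees them.

```python
import math

def unmix(m12, m13, m23):
    """Solve m_ij = r_i' + r_j' for the three radial speeds."""
    r1 = (m12 + m13 - m23) / 2.0
    r2 = (m12 + m23 - m13) / 2.0
    r3 = (m13 + m23 - m12) / 2.0
    return r1, r2, r3

def radial_speed(obj, vel, node):
    """r_i' = v . e_i, with e_i the unit vector from node i to the object."""
    dx, dy = obj[0] - node[0], obj[1] - node[1]
    d = math.hypot(dx, dy)
    return (vel[0] * dx + vel[1] * dy) / d

# Hypothetical layout, used only to generate consistent test inputs.
nodes = [(0.0, 0.0), (6.0, 0.0), (3.0, 5.0)]
obj, vel = (2.0, 1.5), (0.8, -0.3)
truth = [radial_speed(obj, vel, n) for n in nodes]
m12, m13, m23 = truth[0] + truth[1], truth[0] + truth[2], truth[1] + truth[2]
print(unmix(m12, m13, m23))   # matches `truth` without using `nodes`
print(truth)
```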

What do three radial speeds give, by themselves, with nothing else known about the system?

A speed floor that is actually good: |v| ≥ max_i |r_i'|, since each is a projection onto a unit vector. And the floor’s quality depends only on how widely the three lines of sight fan out, not on knowing them. An object well inside a reasonably proportioned array sees the three directions fan out widely enough that some node sits within 60° of the motion: the floor is then no worse than half the true speed (cos 60° = 0.5), and typically lands within 15-20%. A calibrated-units speed estimate from zero geometry.
Zero crossings: r_i' = 0 exactly at closest approach to node i (the motion is momentarily perpendicular to that line of sight). Three doppler tracks give three closest-approach timestamps, and their order says which node the object passed first, second, third.
Integrate each track and you get r_i(t) − r_i(0): the full time profile of every node distance, missing only the three starting offsets. The distances themselves are unknown, but how all three evolve is known exactly, so the trajectory’s choreography around the (unknown) triangle falls out: what it approaches, in what order, how sharply it turns past each node, how fast through each leg. A trajectory sketch around three anchors that are not on any map.
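All three readouts fit in one small sketch. It assumes clean, already unmixed radial-speed tracks on a uniform time grid; the track generator, the node layout and the straight-line walk are toy inputs invented for the example, and real ridges would need smoothing first.

```python
import math

def radial_tracks(nodes, start, vel, dt, steps):
    """Toy generator: radial speeds r_i'(t) for a constant-velocity mover."""
    tracks = [[] for _ in nodes]
    for k in range(steps):
        x, y = start[0] + vel[0] * k * dt, start[1] + vel[1] * k * dt
        for i, (nx, ny) in enumerate(nodes):
            d = math.hypot(x - nx, y - ny)
            tracks[i].append((vel[0] * (x - nx) + vel[1] * (y - ny)) / d)
    return tracks

def speed_floor(tracks, k):
    """Lower bound on |v| at sample k: the largest |r_i'|."""
    return max(abs(t[k]) for t in tracks)

def closest_approach_time(track, dt):
    """Time of the first negative-to-positive zero crossing, linearly
    interpolated; None if the track never crosses zero."""
    for k in range(len(track) - 1):
        if track[k] < 0.0 <= track[k + 1]:
            frac = -track[k] / (track[k + 1] - track[k])
            return (k + frac) * dt
    return None

def distance_change(track, dt):
    """Trapezoidal integral of r_i': r_i(t) - r_i(0) at every sample."""
    out = [0.0]
    for k in range(1, len(track)):
        out.append(out[-1] + 0.5 * (track[k - 1] + track[k]) * dt)
    return out

nodes = [(0.0, 0.0), (6.0, 0.0), (3.0, 5.0)]      # hypothetical layout
tracks = radial_tracks(nodes, start=(-1.0, 2.0), vel=(1.0, 0.0),
                       dt=0.05, steps=161)
print([closest_approach_time(t, 0.05) for t in tracks])
print(speed_floor(tracks, 0))
```

In this toy walk the mover passes node 1 first, then node 3, then node 2, and the crossing times alone say so.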

[interactive demo, noise σ = 0.085. Panels: true world (hidden ground truth); CSI |H| per pair, last 8 s; doppler, raw (one FFT per column, per pair); doppler, tracked (4 s static null + Welch + phase readout; white = tracked ridge); radial speeds unmixed from the 3 tracked ridges, truth faint; trajectory sketch from integrated radial speeds, with CPA dots]

the mover calibrates the array

Push point 4 and the doppler tracks alone overdetermine everything. Each ridge sample contributes three equations; the unknowns are the trajectory (4 numbers for a constant-velocity stretch, a handful more for a spline) plus the triangle shape (three side lengths, after spending the free translation and rotation). A person walking a curved path for 5 s at 20 ridge samples per second is 300 equations against roughly a dozen unknowns. And crucially the fit is not scale-ambiguous: doppler reads absolute m/s through λ, so a double-size triangle with a double-speed walker produces measurably different tracks, not the same ones. The mover itself can calibrate the array, up to a global translation, rotation, and one mirror flip.
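The scale argument can be checked with a toy forward model. This is only the forward half of the fit (predicting the three pair rates from a guessed triangle and trajectory), not the nonlinear solver; the layout, the constant-velocity walk and the function names are all hypothetical.

```python
import math

def ridge_tracks(nodes, start, vel, dt, steps):
    """Forward model: the three pair rates m_12, m_13, m_23 (m/s) at each
    sample for a constant-velocity mover. All inputs are hypothetical."""
    out = []
    for k in range(steps):
        x, y = start[0] + vel[0] * k * dt, start[1] + vel[1] * k * dt
        r = []
        for nx, ny in nodes:
            d = math.hypot(x - nx, y - ny)
            r.append((vel[0] * (x - nx) + vel[1] * (y - ny)) / d)
        out.append((r[0] + r[1], r[0] + r[2], r[1] + r[2]))
    return out

def scaled(points, s):
    """Scale a list of 2-D points about the origin."""
    return [(s * px, s * py) for px, py in points]

def rms_misfit(a, b):
    """Root-mean-square difference between two sets of tracks."""
    sq = [(u - v) ** 2 for ra, rb in zip(a, b) for u, v in zip(ra, rb)]
    return math.sqrt(sum(sq) / len(sq))

nodes = [(0.0, 0.0), (6.0, 0.0), (3.0, 5.0)]
base = ridge_tracks(nodes, (1.0, 1.0), (0.9, 0.4), dt=0.05, steps=100)
doubled = ridge_tracks(scaled(nodes, 2.0), (2.0, 2.0), (1.8, 0.8),
                       dt=0.05, steps=100)

print(len(base) * 3)                 # 300 equations for 5 s at 20 samples/s
print(rms_misfit(base, doubled))     # nonzero: the scaled world is distinguishable
```

In the doubled world every pair rate comes out twice as large, so a fit that minimizes `rms_misfit` against measured tracks cannot trade triangle size against walking speed.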

The fine print: it is a nonlinear fit with local minima, a straight-line walk leaves it degenerate (you need motion that actually curves around the array), and integrating noisy ridges drifts. That is what the pairwise RSSI/FTM ranging is really for: not strictly necessary, just the shortcut that turns a fragile self-calibration into a fast, stable one.