
How We Achieved Sub-16ms Latency Over Wi-Fi

A deep dive into Bavua's Zero-Latency Engine: the architecture decisions that let us beat a single frame at 60fps.

Nathan Alard · Founder & CEO | March 22, 2026 | 8 min read

The Latency Problem

When you move your mouse on a second display, your brain detects delays above roughly 20ms. Most wireless display solutions — Miracast, AirPlay, Chromecast — operate at 80-200ms. That's roughly 5-12 frames of delay at 60fps. It's noticeable. It's distracting. And for precision work like design or coding, it's unusable.

We set ourselves an audacious target: under 16ms end-to-end. That's faster than a single frame at 60fps. Here's how we did it.
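As a quick sanity check on the budget, here is the frame-time arithmetic behind "a single frame at 60fps" (illustrative math, not a measurement):

```python
# Frame budget at 60fps: the time one frame is on screen.
frame_time_ms = 1000 / 60          # ~16.67 ms per frame

# A typical 80-200 ms wireless-display delay, expressed in frames:
low_frames = 80 / frame_time_ms    # 4.8 frames
high_frames = 200 / frame_time_ms  # 12.0 frames

print(f"{frame_time_ms:.2f} ms/frame, {low_frames:.1f}-{high_frames:.1f} frames behind")
```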

The Pipeline

Every wireless display system follows the same basic pipeline:

1. Capture — grab the screen buffer

2. Encode — compress the frame

3. Transmit — send over the network

4. Decode — decompress on the receiver

5. Render — display on screen

The total latency is the sum of all five stages. Most solutions lose time in every single one.
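Because the stages run in series, the budget is a straight sum. The sketch below uses the worst-case "traditional" numbers from the results table later in this post:

```python
# End-to-end latency is the sum of the five pipeline stages.
# Values are the worst-case traditional figures cited below.
stages_ms = {
    "capture": 5,
    "encode": 15,
    "transmit": 50,
    "decode": 15,
    "render": 10,
}

total_ms = sum(stages_ms.values())
print(f"traditional worst case: {total_ms} ms")  # 95 ms, many frames at 60fps
```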

Stage 1: Zero-Copy Capture

Traditional screen capture copies the GPU framebuffer to CPU memory, then back to GPU for encoding. That round-trip alone adds 3-5ms.

Bavua uses platform-native zero-copy capture APIs:

  • macOS: ScreenCaptureKit with IOSurface sharing
  • Windows: Windows.Graphics.Capture with Direct3D interop
  • Android: MediaProjection with hardware buffer access
  • iOS: ReplayKit Broadcast Extension

The captured frame stays in GPU memory throughout. No copies. No round-trips.
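The zero-copy idea itself is language-agnostic. As a loose illustration (not Bavua's actual capture path), Python's `memoryview` shows what it means to hand a consumer a region of a frame buffer without duplicating the bytes:

```python
# Illustration only: share a frame buffer without copying it.
framebuffer = bytearray(1920 * 1080 * 4)  # pretend BGRA frame

view = memoryview(framebuffer)            # zero-copy view, no allocation
first_row = view[0 : 1920 * 4]            # one scanline, still zero-copy

# Writes through the view are visible in the original buffer,
# which proves no copy was made along the way.
first_row[0] = 255
print(framebuffer[0])  # 255
```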

Stage 2: Hardware Encoding

We bypass software encoders entirely. Every frame goes directly to the platform's hardware encoder:

  • NVIDIA NVENC
  • AMD AMF
  • Intel QuickSync
  • Apple VideoToolbox
  • Android MediaCodec

We configure these for minimum latency: no B-frames, no look-ahead, tuned for real-time. The encoding step typically completes in under 2ms.
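To make "tuned for real-time" concrete, a low-latency configuration generally looks like the sketch below. The key names are generic illustrations, not any specific SDK's API:

```python
# Illustrative low-latency encoder settings (generic names, not tied
# to NVENC/AMF/QuickSync/VideoToolbox/MediaCodec specifically).
low_latency_config = {
    "b_frames": 0,             # B-frames reference future frames -> buffering delay
    "lookahead_frames": 0,     # rate-control look-ahead also buffers frames
    "rate_control": "cbr",     # steady bitrate, predictable network load
    "keyframe_interval": 120,  # periodic IDR frames so receivers can recover
}

def is_real_time_safe(config: dict) -> bool:
    """Real-time-safe only if nothing holds frames waiting on the future."""
    return config["b_frames"] == 0 and config["lookahead_frames"] == 0

print(is_real_time_safe(low_latency_config))  # True
```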

Stage 3: QUIC Transport

We chose QUIC over TCP and raw UDP for good reasons:

  • 0-RTT connection establishment — no handshake delay on reconnection
  • Stream multiplexing — video, audio, and input share one connection without head-of-line blocking
  • Built-in encryption — TLS 1.3 with no additional overhead
  • Congestion control — adaptive bitrate without manual tuning

Our custom frame packetizer splits encoded frames across multiple QUIC streams, allowing partial frame delivery even under packet loss.
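A simplified sketch of the splitting idea (not Bavua's actual packetizer): round-robin an encoded frame's chunks across several streams, so a stalled stream delays only its own chunks rather than the whole frame:

```python
# Sketch: split one encoded frame across N logical streams so that
# loss on one stream blocks only part of the frame.
CHUNK_SIZE = 1200  # stay under a typical QUIC datagram payload

def packetize(frame: bytes, num_streams: int) -> list[list[bytes]]:
    chunks = [frame[i:i + CHUNK_SIZE] for i in range(0, len(frame), CHUNK_SIZE)]
    streams = [[] for _ in range(num_streams)]
    for seq, chunk in enumerate(chunks):
        # Round-robin assignment; a real packetizer would also attach
        # frame and sequence headers so the receiver can reassemble.
        streams[seq % num_streams].append(chunk)
    return streams

frame = bytes(5000)                          # pretend encoded frame
streams = packetize(frame, num_streams=4)
print([len(s) for s in streams])             # [2, 1, 1, 1]
```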

Stage 4: Predictive Decoding

The receiver doesn't wait for the complete frame before starting decode. Our predictive pipeline begins decoding as soon as the first packet of a frame arrives, using the hardware decoder's streaming mode.
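To see why overlapping decode with arrival helps, compare when a frame is ready if decode starts at the first packet versus the last. All numbers here are hypothetical:

```python
# Illustrative timing: streaming decode overlaps packet arrival, so
# only a small tail of work remains after the final packet lands.
packet_arrivals_ms = [0.0, 0.5, 1.0, 1.5, 2.0]  # hypothetical arrival times
decode_cost_ms = 1.5                             # hypothetical full-frame decode

# Wait-for-full-frame: decode begins only after the last packet.
blocking_finish = packet_arrivals_ms[-1] + decode_cost_ms

# Streaming: earlier slices were decoded during arrival; only the
# final slice's cost is paid after the last packet.
tail_ms = decode_cost_ms / len(packet_arrivals_ms)
streaming_finish = packet_arrivals_ms[-1] + tail_ms

print(blocking_finish, streaming_finish)
```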

Stage 5: Immediate Render

We render directly to the display surface with no intermediate compositing. On macOS, that's AVSampleBufferDisplayLayer. On Windows, Direct3D swap chains with DXGI_SWAP_EFFECT_FLIP_DISCARD. On mobile, we use SurfaceView/CAMetalLayer for minimum compositor overhead.

The Result

| Stage    | Traditional | Bavua  |
|----------|-------------|--------|
| Capture  | 3-5ms       | <1ms   |
| Encode   | 5-15ms      | 1-2ms  |
| Transmit | 10-50ms     | 2-5ms  |
| Decode   | 5-15ms      | 1-2ms  |
| Render   | 3-10ms      | <1ms   |
| **Total** | **26-95ms** | **5-11ms** |

Under optimal conditions on a local network, we consistently measure 8-12ms end-to-end. On Wi-Fi 6, we achieve sub-16ms for 95% of frames.

Try It Yourself

The best way to understand the difference is to feel it. Download Bavua and move your cursor across to a second screen. You'll forget it's wireless.