The Latency Problem
When you move your mouse on a second display, your brain detects delays above 20ms. Most wireless display solutions — Miracast, AirPlay, ChromeCast — operate at 80-200ms. That's 4-10 frames of delay at 60fps. It's noticeable. It's distracting. And for precision work like design or coding, it's unusable.
We set ourselves an audacious target: under 16ms end-to-end. That's faster than a single frame at 60fps. Here's how we did it.
The Pipeline
Every wireless display system follows the same basic pipeline:
1. Capture — grab the screen buffer
2. Encode — compress the frame
3. Transmit — send over the network
4. Decode — decompress on the receiver
5. Render — display on screen
The total latency is the sum of all five stages. Most solutions lose time in every single one.
Stage 1: Zero-Copy Capture
Traditional screen capture copies the GPU framebuffer to CPU memory, then back to GPU for encoding. That round-trip alone adds 3-5ms.
Bavua uses platform-native zero-copy capture APIs:
The captured frame stays in GPU memory throughout. No copies. No round-trips.
Stage 2: Hardware Encoding
We bypass software encoders entirely. Every frame goes directly to the platform's hardware encoder:
We configure these for minimum latency: no B-frames, no look-ahead, tuned for real-time. The encoding step typically completes in under 2ms.
Stage 3: QUIC Transport
We chose QUIC over TCP and raw UDP for good reasons:
Our custom frame packetizer splits encoded frames across multiple QUIC streams, allowing partial frame delivery even under packet loss.
Stage 4: Predictive Decoding
The receiver doesn't wait for the complete frame before starting decode. Our predictive pipeline begins decoding as soon as the first packet of a frame arrives, using the hardware decoder's streaming mode.
Stage 5: Immediate Render
We render directly to the display surface with no intermediate compositing. On macOS, that's AVSampleBufferDisplayLayer. On Windows, Direct3D swap chains with DXGI_SWAP_EFFECT_FLIP_DISCARD. On mobile, we use SurfaceView/CAMetalLayer for minimum compositor overhead.
The Result
| Stage | Traditional | Bavua |
|---|---|---|
| Capture | 3-5ms | <1ms |
| Encode | 5-15ms | 1-2ms |
| Transmit | 10-50ms | 2-5ms |
| Decode | 5-15ms | 1-2ms |
| Render | 3-10ms | <1ms |
| **Total** | **26-95ms** | **5-11ms** |
Under optimal conditions on a local network, we consistently measure 8-12ms end-to-end. On Wi-Fi 6, we achieve sub-16ms for 95% of frames.
Try It Yourself
The best way to understand the difference is to feel it. Download Bavua and move your cursor across to a second screen. You'll forget it's wireless.