
How We Achieved Sub-16ms Latency Over Wi-Fi

A deep dive into Bavua's Zero-Latency Engine: the architecture decisions that let us beat a single frame at 60fps.

Nathan Alard · Founder & CEO | March 22, 2026 | 8 min read

The Latency Problem

When you move your mouse on a second display, your brain detects delays above roughly 20ms. Most wireless display solutions — Miracast, AirPlay, Chromecast — operate at 80-200ms. That's roughly 5-12 frames of delay at 60fps. It's noticeable. It's distracting. And for precision work like design or coding, it's unusable.

We set ourselves an audacious target: under 16ms end-to-end. That's faster than a single frame at 60fps. Here's how we did it.
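As a quick sanity check on the budget, here is the frame-time arithmetic behind "a single frame at 60fps" (illustrative math, not a measurement):

```python
# Frame budget at 60fps: the time one frame is on screen.
frame_time_ms = 1000 / 60          # ~16.67 ms per frame

# A typical 80-200 ms wireless-display delay, expressed in frames:
low_frames = 80 / frame_time_ms    # 4.8 frames
high_frames = 200 / frame_time_ms  # 12.0 frames

print(f"{frame_time_ms:.2f} ms/frame, {low_frames:.1f}-{high_frames:.1f} frames behind")
```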

The Pipeline

Every wireless display system follows the same basic pipeline:

1. Capture — grab the screen buffer

2. Encode — compress the frame

3. Transmit — send over the network

4. Decode — decompress on the receiver

5. Render — display on screen

The total latency is the sum of all five stages. Most solutions lose time in every single one.
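Because the stages run in series, the budget is a straight sum. The sketch below uses the worst-case "traditional" numbers from the results table later in this post:

```python
# End-to-end latency is the sum of the five pipeline stages.
# Values are the worst-case traditional figures cited below.
stages_ms = {
    "capture": 5,
    "encode": 15,
    "transmit": 50,
    "decode": 15,
    "render": 10,
}

total_ms = sum(stages_ms.values())
print(f"traditional worst case: {total_ms} ms")  # 95 ms, many frames at 60fps
```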

Stage 1: Zero-Copy Capture

Traditional screen capture copies the GPU framebuffer to CPU memory, then back to GPU for encoding. That round-trip alone adds 3-5ms.

Bavua uses platform-native zero-copy capture APIs:

  • macOS: ScreenCaptureKit with IOSurface sharing
  • Windows: Windows.Graphics.Capture with Direct3D interop
  • Android: MediaProjection with hardware buffer access
  • iOS: ReplayKit Broadcast Extension

The captured frame stays in GPU memory throughout. No copies. No round-trips.
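The zero-copy idea itself is language-agnostic. As a loose illustration (not Bavua's actual capture path), Python's `memoryview` shows what it means to hand a consumer a region of a frame buffer without duplicating the bytes:

```python
# Illustration only: share a frame buffer without copying it.
framebuffer = bytearray(1920 * 1080 * 4)  # pretend BGRA frame

view = memoryview(framebuffer)            # zero-copy view, no allocation
first_row = view[0 : 1920 * 4]            # one scanline, still zero-copy

# Writes through the view are visible in the original buffer,
# which proves no copy was made along the way.
first_row[0] = 255
print(framebuffer[0])  # 255
```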

Stage 2: Hardware Encoding

We bypass software encoders entirely. Every frame goes directly to the platform's hardware encoder:

  • NVIDIA NVENC
  • AMD AMF
  • Intel QuickSync
  • Apple VideoToolbox
  • Android MediaCodec

We configure these for minimum latency: no B-frames, no look-ahead, tuned for real-time. The encoding step typically completes in under 2ms.
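To make "tuned for real-time" concrete, a low-latency configuration generally looks like the sketch below. The key names are generic illustrations, not any specific SDK's API:

```python
# Illustrative low-latency encoder settings (generic names, not tied
# to NVENC/AMF/QuickSync/VideoToolbox/MediaCodec specifically).
low_latency_config = {
    "b_frames": 0,             # B-frames reference future frames -> buffering delay
    "lookahead_frames": 0,     # rate-control look-ahead also buffers frames
    "rate_control": "cbr",     # steady bitrate, predictable network load
    "keyframe_interval": 120,  # periodic IDR frames so receivers can recover
}

def is_real_time_safe(config: dict) -> bool:
    """Real-time-safe only if nothing holds frames waiting on the future."""
    return config["b_frames"] == 0 and config["lookahead_frames"] == 0

print(is_real_time_safe(low_latency_config))  # True
```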

Stage 3: QUIC Transport

We chose QUIC over TCP and raw UDP for good reasons:

  • 0-RTT connection establishment — no handshake delay on reconnection
  • Stream multiplexing — video, audio, and input share one connection without head-of-line blocking
  • Built-in encryption — TLS 1.3 with no additional overhead
  • Congestion control — adaptive bitrate without manual tuning

Our custom frame packetizer splits encoded frames across multiple QUIC streams, allowing partial frame delivery even under packet loss.
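A simplified sketch of the splitting idea (not Bavua's actual packetizer): round-robin an encoded frame's chunks across several streams, so a stalled stream delays only its own chunks rather than the whole frame:

```python
# Sketch: split one encoded frame across N logical streams so that
# loss on one stream blocks only part of the frame.
CHUNK_SIZE = 1200  # stay under a typical QUIC datagram payload

def packetize(frame: bytes, num_streams: int) -> list[list[bytes]]:
    chunks = [frame[i:i + CHUNK_SIZE] for i in range(0, len(frame), CHUNK_SIZE)]
    streams = [[] for _ in range(num_streams)]
    for seq, chunk in enumerate(chunks):
        # Round-robin assignment; a real packetizer would also attach
        # frame and sequence headers so the receiver can reassemble.
        streams[seq % num_streams].append(chunk)
    return streams

frame = bytes(5000)                          # pretend encoded frame
streams = packetize(frame, num_streams=4)
print([len(s) for s in streams])             # [2, 1, 1, 1]
```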

Stage 4: Predictive Decoding

The receiver doesn't wait for the complete frame before starting decode. Our predictive pipeline begins decoding as soon as the first packet of a frame arrives, using the hardware decoder's streaming mode.
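To see why overlapping decode with arrival helps, compare when a frame is ready if decode starts at the first packet versus the last. All numbers here are hypothetical:

```python
# Illustrative timing: streaming decode overlaps packet arrival, so
# only a small tail of work remains after the final packet lands.
packet_arrivals_ms = [0.0, 0.5, 1.0, 1.5, 2.0]  # hypothetical arrival times
decode_cost_ms = 1.5                             # hypothetical full-frame decode

# Wait-for-full-frame: decode begins only after the last packet.
blocking_finish = packet_arrivals_ms[-1] + decode_cost_ms

# Streaming: earlier slices were decoded during arrival; only the
# final slice's cost is paid after the last packet.
tail_ms = decode_cost_ms / len(packet_arrivals_ms)
streaming_finish = packet_arrivals_ms[-1] + tail_ms

print(blocking_finish, streaming_finish)
```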

Stage 5: Immediate Render

We render directly to the display surface with no intermediate compositing. On macOS, that's AVSampleBufferDisplayLayer. On Windows, Direct3D swap chains with DXGI_SWAP_EFFECT_FLIP_DISCARD. On mobile, we use SurfaceView/CAMetalLayer for minimum compositor overhead.

The Result

| Stage    | Traditional | Bavua  |
|----------|-------------|--------|
| Capture  | 3-5ms       | <1ms   |
| Encode   | 5-15ms      | 1-2ms  |
| Transmit | 10-50ms     | 2-5ms  |
| Decode   | 5-15ms      | 1-2ms  |
| Render   | 3-10ms      | <1ms   |
| **Total** | **26-95ms** | **5-11ms** |

Under optimal conditions on a local network, we consistently measure 8-12ms end-to-end. On Wi-Fi 6, we achieve sub-16ms for 95% of frames.

Try It Yourself

The best way to understand the difference is to feel it. Download Bavua and move your cursor across to a second screen. You'll forget it's wireless.