Overview

AICQuailVADAnalyzer is a standalone voice activity detection (VAD) analyzer powered by ai-coustics’ Quail VAD 2.0 model. Unlike the deprecated AICVADAnalyzer, which relies on AICFilter’s internal VAD, this analyzer owns a dedicated processor, can be placed anywhere in the pipeline, and works independently of audio enhancement.

The analyzer uses a specialized VAD-only Quail model for noise-robust speech detection, which makes it well suited to challenging acoustic environments.

To use AIC, you need a license key. Get started at ai-coustics.com.

Installation

The AIC Quail VAD analyzer requires additional dependencies:
uv add "pipecat-ai[aic]"

Constructor Parameters

license_key
str
required
ai-coustics SDK license key for authentication. Get your key at developers.ai-coustics.io.
model_id
str | None
default:"quail-vad-2.0-xxs-16khz"
Quail VAD model identifier. Defaults to the published standalone VAD model "quail-vad-2.0-xxs-16khz". See artifacts.ai-coustics.io for the catalogue. Ignored if model_path is provided.
model_path
Path | None
default:"None"
Optional path to a local .aicmodel file. Overrides model_id when set. Useful for offline deployments or custom models.
model_download_dir
Path | None
default:"None"
Directory for downloaded models. Defaults to ~/.cache/pipecat/aic-models.
speech_hold_duration
float | None
default:"None"
How long, in seconds, the VAD keeps reporting speech after the signal no longer contains speech. Range: 0.0 to 300 times the model window length. When left as None, the SDK default of 0.03 s applies.
minimum_speech_duration
float | None
default:"None"
How long, in seconds, speech must last before the VAD reports it as detected. Range: 0.0 to 1.0. When left as None, the SDK default of 0.0 s applies.
sensitivity
float | None
default:"None"
Speech-probability threshold for dedicated Quail VAD models. Range: 0.0 to 1.0. Frames whose speech probability exceeds this threshold are considered speech. The default is model-specific. Note: This differs from the deprecated AICVADAnalyzer, which used an energy-based threshold in the range 1.0 to 15.0.
sample_rate
int | None
default:"None"
Initial sample rate; the pipeline will set this via set_sample_rate once the transport rate is known.
params
VADParams | None
default:"None"
Optional VADParams for the base VAD state machine configuration.
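
The documented ranges can be checked before the analyzer is constructed. The following is a minimal sketch under stated assumptions, not part of Pipecat or the ai-coustics SDK: validate_vad_settings and window_length_s are hypothetical names, and the model window length has to come from your model's documentation, since this page does not state it.

```python
from typing import Optional


def validate_vad_settings(
    speech_hold_duration: Optional[float] = None,
    minimum_speech_duration: Optional[float] = None,
    sensitivity: Optional[float] = None,
    window_length_s: Optional[float] = None,
) -> dict:
    """Check values against the ranges documented above.

    Hypothetical helper. Returns only the settings that were explicitly
    provided, so omitted values still fall through to the SDK defaults.
    """
    settings = {}

    if speech_hold_duration is not None:
        if speech_hold_duration < 0.0:
            raise ValueError("speech_hold_duration must be >= 0.0")
        # The upper bound is 300x the model window length. It is only
        # checked when the caller supplies that length (an assumption).
        if window_length_s is not None and speech_hold_duration > 300 * window_length_s:
            raise ValueError("speech_hold_duration exceeds 300x the model window length")
        settings["speech_hold_duration"] = speech_hold_duration

    if minimum_speech_duration is not None:
        if not 0.0 <= minimum_speech_duration <= 1.0:
            raise ValueError("minimum_speech_duration must be within 0.0 to 1.0")
        settings["minimum_speech_duration"] = minimum_speech_duration

    if sensitivity is not None:
        if not 0.0 <= sensitivity <= 1.0:
            raise ValueError("sensitivity must be within 0.0 to 1.0")
        settings["sensitivity"] = sensitivity

    return settings
```

The returned dictionary can then be unpacked into the constructor, for example AICQuailVADAnalyzer(license_key=..., **validate_vad_settings(sensitivity=0.5)).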

Usage Examples

Basic Usage

The recommended approach for AIC-powered voice detection:
import os
from pipecat.audio.filters.aic_filter import AICFilter
from pipecat.audio.vad.aic_quail_vad import AICQuailVADAnalyzer
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.transports.services.daily import DailyTransport, DailyParams

# Create the AIC filter for audio enhancement
aic_filter = AICFilter(
    license_key=os.environ["AIC_SDK_LICENSE"],
    model_id="quail-vf-2.0-l-16khz",
)

# Create standalone Quail VAD 2.0 analyzer
aic_vad = AICQuailVADAnalyzer(
    license_key=os.environ["AIC_SDK_LICENSE"],
)

transport = DailyTransport(
    room_url,
    token,
    "Bot",
    DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        audio_in_filter=aic_filter,
    ),
)

user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        vad_analyzer=aic_vad,
    ),
)
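
The example reads the key with os.environ["AIC_SDK_LICENSE"], which raises a bare KeyError when the variable is missing. A small wrapper can fail with a clearer message instead. This is only a sketch; load_aic_license_key is a hypothetical helper, not a Pipecat or ai-coustics function.

```python
import os
from typing import Mapping, Optional


def load_aic_license_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the license key from AIC_SDK_LICENSE or raise a clear error.

    Hypothetical helper. `env` defaults to os.environ and exists mainly
    so the function can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    key = env.get("AIC_SDK_LICENSE", "").strip()
    if not key:
        raise RuntimeError(
            "AIC_SDK_LICENSE is not set; export your ai-coustics license key "
            "before starting the bot."
        )
    return key
```

The result can be passed as license_key to both AICFilter and AICQuailVADAnalyzer, so the check happens once at startup.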

With Custom VAD Parameters

Fine-tune the VAD behavior for your specific use case:
import os
from pipecat.audio.vad.aic_quail_vad import AICQuailVADAnalyzer

aic_vad = AICQuailVADAnalyzer(
    license_key=os.environ["AIC_SDK_LICENSE"],
    speech_hold_duration=0.05,  # Hold speech detection for 50ms after speech ends
    minimum_speech_duration=0.1,  # Require 100ms of speech before triggering
    sensitivity=0.5,  # Speech probability threshold (0.0-1.0)
)
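
To build intuition for how the three settings interact, the sketch below simulates a simple frame-by-frame decision: a frame counts as speech when its probability exceeds sensitivity, speech is reported once it has lasted minimum_speech_duration, and the report is held for speech_hold_duration after speech stops. This is an illustrative model only. It is not the SDK's actual algorithm, it ignores the separate VADParams state machine, and simulate_vad and frame_s are hypothetical names.

```python
from typing import List, Sequence


def simulate_vad(
    probabilities: Sequence[float],
    frame_s: float,
    sensitivity: float,
    minimum_speech_duration: float,
    speech_hold_duration: float,
) -> List[bool]:
    """Return one speech/no-speech decision per frame (illustrative only)."""
    # Work in whole frames to avoid accumulating floating-point error.
    min_frames = round(minimum_speech_duration / frame_s)
    hold_frames = round(speech_hold_duration / frame_s)

    decisions = []
    speech_frames = 0   # consecutive frames above the threshold
    silence_frames = 0  # consecutive frames at or below the threshold
    speaking = False

    for p in probabilities:
        if p > sensitivity:
            speech_frames += 1
            silence_frames = 0
            # Report speech only once it has lasted long enough.
            if speech_frames >= min_frames:
                speaking = True
        else:
            speech_frames = 0
            silence_frames += 1
            # Keep reporting speech until the hold period has elapsed.
            if speaking and silence_frames > hold_frames:
                speaking = False
        decisions.append(speaking)

    return decisions
```

In this model, raising minimum_speech_duration suppresses short noise bursts at the cost of a later onset, while raising speech_hold_duration bridges brief pauses at the cost of a later offset.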

VAD-Only (Without Enhancement)

Use Quail VAD without audio enhancement:
import os
from pipecat.audio.vad.aic_quail_vad import AICQuailVADAnalyzer
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.transports.services.daily import DailyTransport, DailyParams

# Just VAD, no enhancement filter
aic_vad = AICQuailVADAnalyzer(
    license_key=os.environ["AIC_SDK_LICENSE"],
)

transport = DailyTransport(
    room_url,
    token,
    "Bot",
    DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        # No audio_in_filter - raw audio goes directly to VAD
    ),
)

user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        vad_analyzer=aic_vad,
    ),
)

Using a Local Model

For offline deployments or custom Quail VAD models:
import os
from pathlib import Path
from pipecat.audio.vad.aic_quail_vad import AICQuailVADAnalyzer

aic_vad = AICQuailVADAnalyzer(
    license_key=os.environ["AIC_SDK_LICENSE"],
    model_path=Path("/path/to/your/quail-vad-model.aicmodel"),
)
See the AIC Quail VAD example for a complete working example with detailed logging.
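
Because model_path overrides model_id when set, a deployment that may or may not ship a local model file can decide which argument to pass at startup. The sketch below shows one way to do that; select_model_kwargs is a hypothetical helper, and the fallback policy (use the published model when the file is absent) is an assumption rather than documented behaviour.

```python
from pathlib import Path
from typing import Optional

DEFAULT_MODEL_ID = "quail-vad-2.0-xxs-16khz"


def select_model_kwargs(
    local_model: Optional[Path],
    model_id: str = DEFAULT_MODEL_ID,
) -> dict:
    """Prefer an existing local .aicmodel file; otherwise use model_id.

    Hypothetical helper. Returns keyword arguments intended for the
    analyzer constructor.
    """
    if (
        local_model is not None
        and local_model.suffix == ".aicmodel"
        and local_model.is_file()
    ):
        return {"model_path": local_model}
    return {"model_id": model_id}
```

The result can be unpacked into the constructor alongside license_key, for example AICQuailVADAnalyzer(license_key=..., **select_model_kwargs(path)).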

Comparison to Deprecated AICVADAnalyzer

| Feature | AICQuailVADAnalyzer (Recommended) | AICVADAnalyzer (Deprecated) |
| --- | --- | --- |
| Model | Standalone Quail VAD 2.0 | Enhancement model’s internal VAD |
| Independence | Owns its own processor | Bound to AICFilter instance |
| Audio path | Processes whatever the pipeline feeds it | Reads post-enhancement VAD state |
| Sensitivity | Probability threshold (0.0-1.0) | Energy threshold (1.0-15.0) |
| Placement | Can be placed anywhere in pipeline | Must follow AICFilter |
| Use case | Noise-robust VAD as primary differentiator | Legacy coupling to enhancement pipeline |
| Removal timeline | N/A (current recommended approach) | Will be removed in Pipecat 1.6.0 |

Audio Flow

The Quail VAD analyzer can work with or without the AIC enhancement filter, providing flexibility in your pipeline architecture.

Notes

  • Requires an ai-coustics license key (get one at developers.ai-coustics.io)
  • Environment variable: the examples read the key from AIC_SDK_LICENSE and pass it as license_key
  • Default model is quail-vad-2.0-xxs-16khz, optimized for 16kHz audio
  • Model is downloaded and cached on first use
  • Works independently of AICFilter - can be used with or without audio enhancement
  • Provides noise-robust speech detection in challenging acoustic environments
  • Handles PCM_16 audio format (int16 samples)
  • Thread-safe for pipeline processing
  • For available models, visit artifacts.ai-coustics.io
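
To make the PCM_16 note concrete, the sketch below decodes a buffer of int16 samples and computes its duration. It assumes little-endian, mono audio at the default model's 16 kHz rate, which this page does not state explicitly, and both function names are hypothetical.

```python
import struct
from typing import List


def pcm16_bytes_to_floats(data: bytes) -> List[float]:
    """Decode little-endian int16 PCM bytes to floats in [-1.0, 1.0)."""
    if len(data) % 2:
        raise ValueError("PCM_16 data must contain an even number of bytes")
    count = len(data) // 2
    # "<" forces little-endian; "h" is a signed 16-bit integer.
    samples = struct.unpack(f"<{count}h", data)
    return [s / 32768.0 for s in samples]


def duration_seconds(data: bytes, sample_rate: int = 16000, channels: int = 1) -> float:
    """Duration of a PCM_16 buffer: 2 bytes per sample per channel."""
    return len(data) / (2 * channels * sample_rate)
```

A 20 ms mono frame at 16 kHz therefore holds 320 samples, or 640 bytes.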