PersonaPlex Quick Guide: Full-Duplex Conversational Speech with Persona and Voice Control

A concise guide to PersonaPlex capabilities, setup, and prompting, including server launch, offline evaluation, and role/voice control.

PersonaPlex is a real-time full-duplex speech-to-speech conversational model. It provides two key control dimensions:

  • text prompts for role/persona control
  • audio conditioning for voice style control

It is built on the Moshi architecture and weights, and aims for low latency and more natural spoken interaction with consistent persona behavior.

What It Is Good For

Common use cases include:

  • real-time voice assistants
  • customer-service style role interactions
  • low-latency conversational demos
  • persona + voice control experiments

Prerequisites

Install the Opus audio codec development library:

# Ubuntu/Debian
sudo apt install libopus-dev

# Fedora/RHEL
sudo dnf install opus-devel
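After installing, you can confirm the development files are discoverable by the build toolchain. This check assumes pkg-config is available on your system:

```shell
# Check whether the Opus development files are visible to the toolchain.
# Prints the detected version when found, "opus not found" otherwise.
pkg-config --exists opus && echo "opus $(pkg-config --modversion opus)" || echo "opus not found"
```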

Installation and Environment

Install from the repository checkout:

pip install moshi/.

For Blackwell GPUs, first install PyTorch wheels built for CUDA 13.0:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130

After accepting the PersonaPlex model license on Hugging Face, set your token:

export HF_TOKEN=<YOUR_HUGGINGFACE_TOKEN>

Launch Live Server

Standard launch (temporary SSL):

SSL_DIR=$(mktemp -d); python -m moshi.server --ssl "$SSL_DIR"

If GPU memory is limited, enable CPU offload (accelerate required):

pip install accelerate
SSL_DIR=$(mktemp -d); python -m moshi.server --ssl "$SSL_DIR" --cpu-offload

For local runs, open localhost:8998 in a browser; for remote setups, use the access link the server prints on startup.

Offline Evaluation

The offline script consumes an input WAV and produces an output WAV of the same duration. The first example uses only a voice prompt:

HF_TOKEN=<TOKEN> \
python -m moshi.offline \
  --voice-prompt "NATF2.pt" \
  --input-wav "assets/test/input_assistant.wav" \
  --seed 42424242 \
  --output-wav "output.wav" \
  --output-text "output.json"

The second example adds a text prompt for a customer-service role:

HF_TOKEN=<TOKEN> \
python -m moshi.offline \
  --voice-prompt "NATM1.pt" \
  --text-prompt "$(cat assets/test/prompt_service.txt)" \
  --input-wav "assets/test/input_service.wav" \
  --seed 42424242 \
  --output-wav "output.wav" \
  --output-text "output.json"

Built-in Voice Labels

  • Natural (female): NATF0, NATF1, NATF2, NATF3
  • Natural (male): NATM0, NATM1, NATM2, NATM3
  • Variety (female): VARF0, VARF1, VARF2, VARF3, VARF4
  • Variety (male): VARM0, VARM1, VARM2, VARM3, VARM4
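To compare voices under otherwise identical conditions, the offline command above can be looped over several labels. This sketch only prints the commands rather than running them; the input path and flags follow the earlier examples:

```shell
# Dry-run sketch: emit one offline-evaluation command per voice label.
# Labels come from the built-in list above; paths are illustrative.
for voice in NATF2 NATM1 VARF0 VARM0; do
  echo "python -m moshi.offline --voice-prompt ${voice}.pt --input-wav assets/test/input_assistant.wav --seed 42424242 --output-wav out_${voice}.wav --output-text out_${voice}.json"
done
```

Fixing the seed and input WAV while varying only the voice prompt keeps the comparison controlled.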

Prompting Tips

Training coverage mainly includes:

  • Assistant Role
  • Customer Service Roles
  • Casual Conversations

Practical tips:

  • define role identity first, then add task context
  • keep prompts concise to reduce persona drift
  • reuse the same voice prompt for stable comparisons
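As an illustration of the tips above, a prompt can state the role identity first and then the task context. The wording and filename here are hypothetical, not taken from the model's assets:

```shell
# Write a short role prompt: identity first, then task context.
cat > my_prompt.txt <<'EOF'
You are a customer service agent for an airline.
Help the caller rebook a delayed flight. Keep replies short and polite.
EOF
cat my_prompt.txt
```

The resulting file can be passed via `--text-prompt "$(cat my_prompt.txt)"` as in the offline example above.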

Summary

PersonaPlex stands out not because it produces smarter individual answers, but because it keeps persona and voice behavior consistent during real-time speech interaction.

For full-duplex voice agents, this is a practical option worth testing and benchmarking.
