PersonaPlex is a real-time full-duplex speech-to-speech conversational model. It provides two key control dimensions:
- text prompts for role/persona control
- audio conditioning for voice style control
It is built on the Moshi architecture and weights, targeting low-latency, more natural spoken interaction with consistent persona behavior.
What It Is Good For
Common use cases include:
- real-time voice assistants
- customer-service style role interactions
- low-latency conversational demos
- persona + voice control experiments
Prerequisites
Install the Opus audio codec development library:
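The exact package name depends on your platform; on Debian/Ubuntu the development headers are provided by `libopus-dev`, and on macOS Homebrew ships them as `opus`:

```shell
# Debian/Ubuntu
sudo apt-get install libopus-dev

# macOS (Homebrew)
brew install opus
```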
Installation and Environment
Install from repository:
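A typical editable install from a cloned repository looks like the following; the repository URL and directory name are placeholders, not confirmed by this document:

```shell
# Clone the repository (URL is a placeholder).
git clone <personaplex-repo-url>
cd <repo-dir>

# Editable install so local changes take effect immediately.
pip install -e .
```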
For Blackwell GPUs, an additional step may be required:
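Blackwell GPUs generally need a PyTorch build compiled against CUDA 12.8 or newer. A hedged example using PyTorch's official cu128 wheel index (the assumption here is that the stock wheel in your environment predates Blackwell support):

```shell
# Assumption: Blackwell support requires a CUDA 12.8 PyTorch wheel.
pip install torch --index-url https://download.pytorch.org/whl/cu128
```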
After accepting the PersonaPlex model license on Hugging Face, set your token:
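One common way to supply the token is via an environment variable; the variable name `HF_TOKEN` is the standard one read by the Hugging Face libraries, and the value below is a placeholder:

```shell
# Set your Hugging Face access token (placeholder value).
export HF_TOKEN=<your-token>

# Alternatively, log in interactively and let the CLI store the token:
huggingface-cli login
```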
Launch Live Server
Standard launch (temporary SSL):
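The exact entry point is not shown in this document; a hypothetical launch command, assuming a `server.py` entry point and the port mentioned below, might look like:

```shell
# Hypothetical: entry-point name and flags are assumptions, not confirmed.
python server.py --port 8998
```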
If GPU memory is limited, enable CPU offload (requires `accelerate`):
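A hypothetical offload invocation; the flag name is an assumption, and `accelerate` must be installed first:

```shell
# Hypothetical flag name; requires the `accelerate` package.
pip install accelerate
python server.py --port 8998 --cpu-offload
```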
Use localhost:8998 for local runs, or the printed access link for remote setups.
Offline Evaluation
The offline script consumes an input wav file and produces an output wav of the same duration:
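A hypothetical invocation; the script name and flags are assumptions, and the voice label comes from the built-in list below:

```shell
# Hypothetical: script name and flag names are assumptions, not confirmed.
python offline_eval.py --input input.wav --output output.wav \
  --voice NATF0 --text-prompt "You are a helpful assistant."
```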
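The same-duration contract can be checked programmatically. The sketch below is an assumption-laden stand-in: it writes a short silent wav, mimics the offline round trip with a file copy, and verifies that input and output durations match. File names, the sample rate, and the copy step are all illustrative, not part of PersonaPlex.

```python
# Hedged sketch: verify that an offline-eval output wav matches the input
# duration. The offline script itself is replaced by a plain file copy here.
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a wav file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def write_silence(path: str, seconds: float = 1.0, rate: int = 24000) -> None:
    """Write a mono 16-bit wav of silence (stand-in for a real input)."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))


write_silence("input.wav", seconds=2.0)

# A real run would call the offline script here; the copy mimics its contract.
with open("input.wav", "rb") as src, open("output.wav", "wb") as dst:
    dst.write(src.read())

assert wav_duration_seconds("output.wav") == wav_duration_seconds("input.wav")
```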
Built-in Voice Labels
- Natural (female): NATF0, NATF1, NATF2, NATF3
- Natural (male): NATM0, NATM1, NATM2, NATM3
- Variety (female): VARF0, VARF1, VARF2, VARF3, VARF4
- Variety (male): VARM0, VARM1, VARM2, VARM3, VARM4
Prompting Tips
Training coverage mainly includes:
- Assistant Role
- Customer Service Roles
- Casual Conversations
Practical tips:
- define role identity first, then add task context
- keep prompts concise to reduce persona drift
- reuse the same voice prompt for stable comparisons
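Following the tips above, a role-first prompt might read like this (the wording is illustrative, not taken from the model's training data):

```
You are a customer service agent for an airline. Help the caller rebook a
canceled flight. Keep responses brief and stay in character.
```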
Summary
PersonaPlex stands out not because it gives smarter answers, but because it keeps persona and voice behavior consistent during real-time speech interaction.
For full-duplex voice agents, it is a practical option worth testing and benchmarking.