feat: LiquidAI audio plugin for LiveKit Agents #4656
Open: kakopappa wants to merge 7 commits into livekit:main from kakopappa:main
7 commits
696afae feat: liquidai support added (kakopappa)
9547aca Merge branch 'main' of https://github.com/kakopappa/livekit-agents (kakopappa)
7ac3560 chore: README.md (kakopappa)
810d101 fix: ruff (kakopappa)
44fc551 fix: revew comments (kakopappa)
77a3468 fix: revew comments (kakopappa)
8414446 fix: revew comments (kakopappa)
@@ -0,0 +1,21 @@
+# LiquidAI Audio plugin for LiveKit Agents
+
+Support for the Audio family of STT/TTS from LiquidAI.
+
+See [https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B-GGUF](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B-GGUF) for more information.
+
+
+## Installation
+
+```bash
+pip install livekit-plugins-liquidai
+```
+
+## Pre-requisites
+
+Start audio server. `llama-liquid-audio-server` is inside LFM2.5-Audio-1.5B-GGUF's `runners` folder.
+
+```bash
+export CKPT=/path/to/LFM2.5-Audio-1.5B-GGUF
+./llama-liquid-audio-server -m $CKPT/LFM2.5-Audio-1.5B-Q4_0.gguf -mm $CKPT/mmproj-LFM2.5-Audio-1.5B-Q4_0.gguf -mv $CKPT/vocoder-LFM2.5-Audio-1.5B-Q4_0.gguf --tts-speaker-file $CKPT/tokenizer-LFM2.5-Audio-1.5B-Q4_0.gguf
+```
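Not part of this PR: the server invocation in the README takes four `.gguf` paths, so a small pre-flight check can catch a wrong `CKPT` directory before the server is launched. The sketch below is a minimal illustration under assumptions. `build_server_command` and `missing_files` are hypothetical helper names, and the file names and flags are copied from the README command rather than from any LiquidAI documentation.

```python
from pathlib import Path

# File-name pieces taken from the README command; adjust if another quantization is used.
MODEL = "LFM2.5-Audio-1.5B"
QUANT = "Q4_0"


def build_server_command(
    ckpt_dir: str, binary: str = "./llama-liquid-audio-server"
) -> list[str]:
    """Hypothetical helper: assemble the README's server invocation as an argv list."""
    ckpt = Path(ckpt_dir)
    return [
        binary,
        "-m", str(ckpt / f"{MODEL}-{QUANT}.gguf"),
        "-mm", str(ckpt / f"mmproj-{MODEL}-{QUANT}.gguf"),
        "-mv", str(ckpt / f"vocoder-{MODEL}-{QUANT}.gguf"),
        "--tts-speaker-file", str(ckpt / f"tokenizer-{MODEL}-{QUANT}.gguf"),
    ]


def missing_files(argv: list[str]) -> list[str]:
    """Return the .gguf paths in argv that do not exist on disk."""
    return [arg for arg in argv if arg.endswith(".gguf") and not Path(arg).is_file()]
```

If `missing_files(argv)` comes back empty, the list can be handed to `subprocess.run(argv)` from the `runners` folder; otherwise the returned paths show which downloads are incomplete.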
45 changes: 45 additions & 0 deletions
livekit-plugins/livekit-plugins-liquidai/livekit/plugins/liquidai/__init__.py
@@ -0,0 +1,45 @@
+# Copyright 2023 LiveKit, Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""LiquidAI LFM2.5-Audio plugin for LiveKit Agents
+
+Provides STT and TTS capabilities using the LFM2.5-Audio model with OpenAI-compatible API.
+"""
+
+from .stt import STT
+from .tts import TTS
+from .version import __version__
+
+__all__ = ["STT", "TTS", "__version__"]
+
+from livekit.agents import Plugin
+
+from .log import logger
+
+
+class LiquidAIPlugin(Plugin):
+    def __init__(self) -> None:
+        super().__init__(__name__, __version__, __package__, logger)
+
+
+Plugin.register_plugin(LiquidAIPlugin())
+
+# Cleanup docs of unexported modules
+_module = dir()
+NOT_IN_ALL = [m for m in _module if m not in __all__]
+
+__pdoc__ = {}
+
+for n in NOT_IN_ALL:
+    __pdoc__[n] = False
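Not part of this PR: the last lines of `__init__.py` mark every module-level name outside `__all__` as hidden from generated docs. The sketch below replays that loop on a hand-written list of names, assuming the documentation tool honors `__pdoc__` overrides the way pdoc3 does. `pdoc_overrides` is a hypothetical helper, and the `names` list is only a stand-in for what `dir()` would return inside the package.

```python
def pdoc_overrides(names: list[str], exported: list[str]) -> dict[str, bool]:
    """Map each name that is not exported to False, mirroring the loop in __init__.py."""
    return {name: False for name in names if name not in exported}


# Stand-in for dir() inside the package: exported names plus internal helpers.
names = ["LiquidAIPlugin", "Plugin", "STT", "TTS", "__version__", "logger"]
exported = ["STT", "TTS", "__version__"]

overrides = pdoc_overrides(names, exported)
print(sorted(overrides))
```

Only the three internal names end up in the mapping, so `STT`, `TTS`, and `__version__` stay documented.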
3 changes: 3 additions & 0 deletions
livekit-plugins/livekit-plugins-liquidai/livekit/plugins/liquidai/log.py
@@ -0,0 +1,3 @@
+import logging
+
+logger = logging.getLogger("livekit.plugins.liquidai")
Empty file.
164 changes: 164 additions & 0 deletions
livekit-plugins/livekit-plugins-liquidai/livekit/plugins/liquidai/stt.py
@@ -0,0 +1,164 @@
+from __future__ import annotations
+
+import asyncio
+import base64
+from dataclasses import dataclass
+from typing import cast
+
+import httpx
+import openai
+from openai import AsyncStream
+from openai.types.chat import ChatCompletionChunk
+
+from livekit import rtc
+from livekit.agents import APIConnectionError, APIConnectOptions, stt
+from livekit.agents.stt import SpeechEventType, STTCapabilities
+from livekit.agents.types import NOT_GIVEN, NotGivenOr
+from livekit.agents.utils import AudioBuffer, is_given
+
+from .log import logger
+
+DEFAULT_BASE_URL = "http://127.0.0.1:8080/v1"
+DEFAULT_API_KEY = "dummy"
+DEFAULT_SYSTEM_PROMPT = "Perform ASR."
+
+
+@dataclass
+class _STTOptions:
+    language: str
+    system_prompt: str
+
+
+class STT(stt.STT):
+    """Speech-to-Text using LiquidAI LFM2.5-Audio model."""
+
+    def __init__(
+        self,
+        *,
+        base_url: NotGivenOr[str] = NOT_GIVEN,
+        api_key: NotGivenOr[str] = NOT_GIVEN,
+        language: NotGivenOr[str] = NOT_GIVEN,
+        system_prompt: NotGivenOr[str] = NOT_GIVEN,
+    ) -> None:
+        """
+        Create a new instance of LiquidAI STT.
+
+        Args:
+            base_url: The base URL of the LFM2.5-Audio server (default: http://127.0.0.1:8080/v1)
+            api_key: API key for authentication (default: "dummy")
+            language: Language code for transcription (default: "en")
+            system_prompt: System prompt for ASR (default: "Perform ASR.")
+        """
+        super().__init__(
+            capabilities=STTCapabilities(
+                streaming=False, interim_results=False, aligned_transcript=False
+            )
+        )
+
+        self._opts = _STTOptions(
+            language=language if is_given(language) else "en",
+            system_prompt=system_prompt if is_given(system_prompt) else DEFAULT_SYSTEM_PROMPT,
+        )
+
+        self._client = openai.AsyncClient(
+            max_retries=0,
+            api_key=api_key if is_given(api_key) else DEFAULT_API_KEY,
+            base_url=base_url if is_given(base_url) else DEFAULT_BASE_URL,
+            http_client=httpx.AsyncClient(
+                timeout=httpx.Timeout(connect=15.0, read=60.0, write=5.0, pool=5.0),
+                follow_redirects=True,
+                limits=httpx.Limits(
+                    max_connections=50, max_keepalive_connections=50, keepalive_expiry=120
+                ),
+            ),
+        )
+
+    @property
+    def model(self) -> str:
+        return "LFM2.5-Audio"
+
+    @property
+    def provider(self) -> str:
+        return "LiquidAI"
+
+    def update_options(
+        self,
+        *,
+        language: NotGivenOr[str] = NOT_GIVEN,
+        system_prompt: NotGivenOr[str] = NOT_GIVEN,
+    ) -> None:
+        if is_given(language):
+            self._opts.language = language
+        if is_given(system_prompt):
+            self._opts.system_prompt = system_prompt
+
+    async def _recognize_impl(
+        self,
+        buffer: AudioBuffer,
+        *,
+        language: NotGivenOr[str] = NOT_GIVEN,
+        conn_options: APIConnectOptions,
+    ) -> stt.SpeechEvent:
+        try:
+            # Use local variable to avoid mutating instance state
+            effective_language = language if is_given(language) else self._opts.language
+
+            # Convert audio buffer to WAV bytes and base64 encode
+            wav_bytes = rtc.combine_audio_frames(buffer).to_wav_bytes()
+            encoded_audio = base64.b64encode(wav_bytes).decode("utf-8")
+
+            # Create messages for the API
+            messages = [
+                {"role": "system", "content": self._opts.system_prompt},
+                {
+                    "role": "user",
+                    "content": [
+                        {
+                            "type": "input_audio",
+                            "input_audio": {"data": encoded_audio, "format": "wav"},
+                        }
+                    ],
+                },
+            ]
+
+            # Call the streaming chat completion API
+            response = await self._client.chat.completions.create(
+                model="LFM2.5-Audio",
+                messages=messages,  # type: ignore
+                stream=True,
+                max_tokens=512,
+                extra_body={"reset_context": True},
+                timeout=conn_options.timeout,
+            )
+            # When stream=True, the response is always an AsyncStream
+            stream = cast(AsyncStream[ChatCompletionChunk], response)
+
+            # Collect text from the stream
+            text_chunks: list[str] = []
+            async for chunk in stream:
+                if chunk.choices and chunk.choices[0].delta.content:
+                    text_chunks.append(chunk.choices[0].delta.content)
+
+            text = "".join(text_chunks)
+            logger.debug(f"STT transcription: {text}")
+
+            return self._transcription_to_speech_event(text=text, language=effective_language)
+
+        except openai.APITimeoutError as e:
+            raise APIConnectionError() from e
+        except asyncio.CancelledError:
+            raise
+        except openai.APIStatusError as e:
+            raise APIConnectionError() from e
+        except Exception as e:
+            logger.error(f"STT error: {e}")
+            raise APIConnectionError() from e
+
+    def _transcription_to_speech_event(self, text: str, language: str) -> stt.SpeechEvent:
+        return stt.SpeechEvent(
+            type=SpeechEventType.FINAL_TRANSCRIPT,
+            alternatives=[stt.SpeechData(text=text, language=language)],
+        )
+
+    async def aclose(self) -> None:
+        await self._client.close()
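Not part of this PR: `_recognize_impl` does two things that can be exercised without a running server, namely wrapping base64-encoded WAV bytes in an `input_audio` chat message and joining the streamed text deltas. The sketch below reproduces both steps with the standard library only. `silent_wav`, `build_messages`, and `join_deltas` are hypothetical names, and the generated silence is a stand-in for the audio that `rtc.combine_audio_frames(buffer).to_wav_bytes()` would supply.

```python
import base64
import io
import wave
from typing import Optional


def silent_wav(duration_s: float = 0.1, sample_rate: int = 16000) -> bytes:
    """Build a mono 16-bit PCM WAV of silence as a stand-in for captured audio."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * int(duration_s * sample_rate))
    return buf.getvalue()


def build_messages(wav_bytes: bytes, system_prompt: str = "Perform ASR.") -> list[dict]:
    """Mirror the message layout used in the PR: system prompt plus one audio part."""
    encoded = base64.b64encode(wav_bytes).decode("utf-8")
    return [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "input_audio", "input_audio": {"data": encoded, "format": "wav"}}
            ],
        },
    ]


def join_deltas(deltas: list[Optional[str]]) -> str:
    """Concatenate streamed text deltas, skipping missing or empty ones."""
    return "".join(d for d in deltas if d)


messages = build_messages(silent_wav())
```

Keeping these steps as pure functions would let a unit test check the payload shape and the chunk joining separately from the network call.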