Merged
53 changes: 38 additions & 15 deletions admin_manual/ai/app_live_transcription.rst
@@ -1,13 +1,14 @@
==============================================================
App: Live Transcription in Nextcloud Talk (live_transcription)
==============================================================
==============================================================================
App: Live Transcription and Translation in Nextcloud Talk (live_transcription)
==============================================================================

.. _ai-live-transcription:

This app provides live transcription of speech in Nextcloud Talk calls using open source AI models provided by `Vosk <https://alphacephei.com/vosk/>`_.
The transcription is done on your own server, preserving your privacy and data sovereignty.
| This app provides live transcription and translation of speech in Nextcloud Talk calls using open source AI models provided by `Vosk <https://alphacephei.com/vosk/>`_.
| The transcription is done on your own server, preserving your privacy and data sovereignty, while the translation is done using a translation task processing provider like the :ref:`translate2 app <ai-app-translate2>`. `OpenAI and LocalAI integration <https://apps.nextcloud.com/apps/integration_openai>`_ and `DeepL integration <http://apps.nextcloud.com/apps/integration_deepl>`_ apps will soon also be supported for translation.

A good set of language models is downloaded automatically. They include Arabic, Arabic (Tunisian), Breton, Catalan, Czech, German, English, Esperanto, Spanish, Persian (Farsi), French, Hindi, Italian, Japanese, Kazakh, Korean, Dutch, Polish, Portuguese (Brazilian), Russian, Telugu, Tajik, Turkish, Ukrainian, Uzbek, Vietnamese and Chinese.
| A good set of language models for transcription is downloaded automatically. They include Arabic, Arabic (Tunisian), Breton, Catalan, Czech, German, English, Esperanto, Spanish, Persian (Farsi), French, Hindi, Italian, Japanese, Kazakh, Korean, Dutch, Polish, Portuguese (Brazilian), Russian, Telugu, Tajik, Turkish, Ukrainian, Uzbek, Vietnamese and Chinese.
| The translation capabilities depend on the installed translation task processing provider app. A list of translation-capable apps can be found :ref:`here <mt-consumer-apps>` in the "Backend apps" section.

Installation
------------
@@ -24,21 +25,42 @@ Installation
--env LT_INTERNAL_SECRET=1234 \
--wait-finish

.. important::

.. note::
The environment variables ``LT_HPB_URL`` and ``LT_INTERNAL_SECRET`` must be set in the :ref:`Deploy Options <ai-app_api_deploy_options>` during installation,
and the High-Performance Backend must be functionally configured in Nextcloud Talk settings for the app to work.

Environment variables and mounts can be set during the app installation from the "Deploy Options" button.
The models are stored in a persistent volume at ``/nc_app_live_transcription_data``.
This volume is created automatically during the installation but you can also mount your own volume there.
As the name suggests, this volume is persistent and will not be deleted when the app is updated or uninstalled
(without removing data).
To change these environment variables after installation, uninstall the app and then install it again with the new values.

5. Install a Text-to-text task processing provider app for translation capabilities from the "Backend apps" section :ref:`here <mt-consumer-apps>`.

.. important::
Requirements
------------

The environment variables ``LT_HPB_URL`` and ``LT_INTERNAL_SECRET`` must be set in the Deploy Options,
and the High-Performance Backend must be functionally configured in Nextcloud Talk settings for the app to work.
* Minimal Nextcloud version: 33
* Nextcloud AIO is supported
* We currently support NVIDIA GPUs and x86_64 CPUs. CPU-only transcription is also supported and works well on modern x86 CPUs.
* CUDA >= v12.4.1 on your host system for GPU-based transcription
* GPU Sizing

  * An NVIDIA GPU with at least 10 GB VRAM
  * 16 GB of system RAM should be enough for one or two concurrent calls

* CPU Sizing

  * An x86 CPU with 4 threads, plus 2 additional threads per concurrent call
  * 16 GB of RAM should be enough for one or two concurrent calls

* Space usage

  * ~ 2.8 GB for the Docker container
  * ~ 6.0 GB for the default models
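
The CPU and disk figures above can be turned into a quick back-of-the-envelope calculation. The following is only a minimal sketch: it assumes the CPU guideline means 4 baseline threads plus 2 threads for every concurrent call, and the function and constant names are hypothetical, not part of the app.

```python
# Rough sizing helper based on the estimates listed above (not benchmarks).
# Assumption: "4 threads, additional 2 per concurrent call" is read as
# a baseline of 4 threads plus 2 threads for each concurrent call.
BASE_THREADS = 4        # baseline x86 threads from the CPU sizing guideline
THREADS_PER_CALL = 2    # additional threads per concurrent call
CONTAINER_GB = 2.8      # approximate size of the Docker container
MODELS_GB = 6.0         # approximate size of the default models


def estimate_cpu_threads(concurrent_calls: int) -> int:
    """Return the estimated number of CPU threads for the given call count."""
    if concurrent_calls < 0:
        raise ValueError("concurrent_calls must not be negative")
    return BASE_THREADS + THREADS_PER_CALL * concurrent_calls


def estimate_disk_gb() -> float:
    """Return the approximate disk space for the container plus default models."""
    return round(CONTAINER_GB + MODELS_GB, 1)


if __name__ == "__main__":
    for calls in (1, 2, 5):
        print(f"{calls} concurrent call(s): ~{estimate_cpu_threads(calls)} threads")
    print(f"Disk space: ~{estimate_disk_gb()} GB")
```

These numbers inherit the uncertainty of the guidelines they are derived from, so treat the result as a starting point for testing rather than a guarantee.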

.. note::

We currently have very little real-world experience running this software on production instances.
The above sizing recommendations come from our estimates and are not real-world benchmarks.
Actual requirements will vary based on factors such as the number of concurrent calls, audio quality, and selected languages.
Please do thorough testing to confirm your hardware meets your needs.

App store
---------
@@ -59,3 +81,4 @@ Limitations
* The app currently supports only a limited number of languages. More languages may be added in the future.
* Languages other than English may have lower accuracy, mainly because the shipped models are smaller.
* The app currently does not support punctuation in the transcription.
* `OpenAI and LocalAI integration <https://apps.nextcloud.com/apps/integration_openai>`_ and `DeepL integration <http://apps.nextcloud.com/apps/integration_deepl>`_ apps are not yet supported for translation.
3 changes: 2 additions & 1 deletion admin_manual/ai/overview.rst
@@ -137,13 +137,14 @@ Frontend apps
* *Text* for offering the translation menu
* `Assistant <https://apps.nextcloud.com/apps/assistant>`_ offering a graphical translation UI
* `Analytics <https://apps.nextcloud.com/apps/analytics>`_ for translating graph labels
* `Talk <https://apps.nextcloud.com/apps/spreed>`_ for translating messages and live translations in calls in conjunction with the :ref:`Live Transcription app <ai-live-transcription>`

Backend apps
~~~~~~~~~~~~

* :ref:`translate2 (ExApp)<ai-app-translate2>` - Runs open source AI translation models locally on your own server hardware (Customer support available upon request)
* `OpenAI and LocalAI integration (via OpenAI API) <https://apps.nextcloud.com/apps/integration_openai>`_ - Integrates with the OpenAI API to provide AI functionality from OpenAI servers (Customer support available upon request; see :ref:`AI as a Service<ai-ai_as_a_service>`)
* *integration_deepl* - Integrates with the deepl API to provide translation functionality from Deepl.com servers (Only community supported)
* `DeepL integration <http://apps.nextcloud.com/apps/integration_deepl>`__ - Integrates with the DeepL API to provide translation functionality from DeepL.com servers (Only community supported)

Speech-To-Text
^^^^^^^^^^^^^^
2 changes: 2 additions & 0 deletions admin_manual/exapps_management/AdvancedDeployOptions.rst
@@ -2,6 +2,8 @@
Advanced Deploy Options
=======================

.. _ai-app_api_deploy_options:

AppAPI optionally allows you to configure environment variables and mounts for the ExApp container.

It is available via the "Deploy options" modal next to the "Deploy and Enable" button in the sidebar of the ExApp page on the Apps management page: