sdsc-ordes/debates-analytics

Debates Analytics

About

This repository provides an app that transcribes and translates debates in which speakers take turns. Video or audio files in mp4 or wav format can be uploaded through a dashboard for analysis.

  • The analysis is performed with the Hugging Face component odtp-pyannote-whisper, which was developed in the context of this project and can also be accessed directly on Hugging Face.

  • The results of that analysis are loaded into an S3-compatible object store (Garage).

  • From there, the results are indexed into the Solr search engine. A MongoDB database manages the media processing results and status.

  • A dashboard makes all processing and results available through a common interface. It consists of a frontend, a backend, and a Redis queue that decouples the long-running media analysis jobs on Hugging Face from the rest of the app.
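The flow described above can be pictured with a small, self-contained sketch. It is not the project's implementation: an in-memory `queue.Queue` stands in for the Redis queue, plain dictionaries and a list stand in for MongoDB, the object store and Solr, and every name (`MediaJob`, `submit`, `worker_step`, the stage labels) is hypothetical and chosen only for illustration.

```python
import queue
from dataclasses import dataclass, field
from pathlib import Path

# Formats named in this README; everything else below is an assumption.
ALLOWED_SUFFIXES = {".mp4", ".wav"}


@dataclass
class MediaJob:
    """Hypothetical status record, standing in for a MongoDB document."""
    media_id: str
    filename: str
    status: str = "queued"
    history: list = field(default_factory=list)

    def advance(self, status):
        # Remember the previous stage, then move to the next one.
        self.history.append(self.status)
        self.status = status


def submit(job_queue, status_store, media_id, filename):
    """Validate an upload and enqueue it without processing it yet."""
    if Path(filename).suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported media format: {filename}")
    job = MediaJob(media_id, filename)
    status_store[media_id] = job
    job_queue.put(media_id)
    return job


def worker_step(job_queue, status_store, object_store, search_index):
    """Take one job from the queue and walk it through the stages.

    Returns the processed media id, or None if the queue is empty.
    """
    try:
        media_id = job_queue.get_nowait()
    except queue.Empty:
        return None
    job = status_store[media_id]

    job.advance("transcribing")
    # Placeholder for the long-running transcription and translation job.
    result = {"media_id": media_id, "segments": []}

    job.advance("storing")
    object_store[f"{media_id}/transcript.json"] = result

    job.advance("indexing")
    search_index.append(media_id)

    job.advance("done")
    return media_id
```

Because `submit` only validates and enqueues, the upload returns immediately while a separate worker calls `worker_step`; this is the decoupling that the queue provides for long-running jobs.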

Authors

Installation

Installation and the available installation options are described in the documentation.

Usage

Usage is described in the documentation.

Development

See the documentation.

Acknowledgement

This work was originally funded by the SNSF Spark Grant number 221139, “Debating Human Rights” (SNSF Data Portal). Documentation: Political Debates.

The goal of that project was to create specialized components for the analysis of videos from United Nations Human Rights Council (UNHRC) debates.

  • Sophisticated Transcription: Integrating and optimizing cutting-edge transcription models (e.g., Whisper 3.0) to ensure accurate, multilingual transcription of UNHRC debates.
  • Multimodal Data Handling: Developing components tailored to video/audio processing, scene extraction, and diarization.
  • Specialized Database Integration: Designing and deploying a database structure to effectively store debate transcripts, relevant metadata, and extracted features.

This repository was created as a wrap-up of that project, to make the processing and results available in a more general form.

Copyright

Copyright © 2025-2028 Swiss Data Science Center (SDSC), www.datascience.ch. All rights reserved. The SDSC is jointly established and legally represented by the École Polytechnique Fédérale de Lausanne (EPFL) and the Eidgenössische Technische Hochschule Zürich (ETH Zürich). This copyright encompasses all materials, software, documentation, and other content created and developed by the SDSC.

About

Debate transcription and translation with Whisper, plus a dashboard for searching the debates
