
An API to transcribe text from audio using Faster Whisper

License

Unknown and 2 other licenses found

Licenses found

Unknown
LICENSE
Unknown
LICENSE-APACHE
MIT
LICENSE-MIT

coordnet/transcription-api


Transcription API

An API that accepts an audio file and queues a job to transcribe it using Faster Whisper. The upload request returns a job ID, and querying that job ID returns the transcribed text once the job has completed. Files are stored during processing and then deleted unless configured otherwise.

Development

The simplest way to run the project is using Docker Compose. First, set up the environment:

Environment

cp .env.example .env

Replace the values in the .env file with your own. The MODEL value can be one of tiny.en, tiny, base.en, base, small.en, small, medium.en, medium, large-v1, large-v2, large-v3, large, distil-large-v2, distil-medium.en, distil-small.en, distil-large-v3.
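As an illustration only, the list above can be turned into a small check that catches a mistyped MODEL value before the containers are built. This is a sketch, not part of the project: the `read_model` helper and the `SUPPORTED_MODELS` name are hypothetical, and the project may validate the value differently or not at all.

```python
import os

# Model names copied from the list above.
SUPPORTED_MODELS = {
    "tiny.en", "tiny", "base.en", "base", "small.en", "small",
    "medium.en", "medium", "large-v1", "large-v2", "large-v3", "large",
    "distil-large-v2", "distil-medium.en", "distil-small.en", "distil-large-v3",
}


def read_model(env=None):
    """Return the MODEL value from an environment mapping, or raise if unsupported.

    Hypothetical helper: `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    model = env.get("MODEL", "")
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"Unsupported MODEL value: {model!r}")
    return model
```

Passing a plain dictionary, for example `read_model({"MODEL": "small.en"})`, makes the check easy to try without touching the real environment.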

Building

Build the images using:

docker compose build

Running

To run the app for development, use this command:

docker compose up --watch

This will copy the required files to the container from your local machine and will restart the services when necessary.

Deployment

Utilizing a GPU

To run Faster Whisper at full speed, deploy the code to an instance with an NVIDIA GPU. The project is already configured to support this.

Edit .env and set:

GPU=1

Then build the images with the production compose file:

docker compose -f compose.yml -f compose.prod.yml build

Then run it using the same compose files:

docker compose -f compose.yml -f compose.prod.yml up

This should enable the GPU features and run the containers with automatic restarts in case of failure.

Sentry

To enable Sentry error tracking, edit the .env file:

SENTRY_DSN=__YOUR_DSN__
ENVIRONMENT=production

Then restart the Docker Compose services so that the new values take effect.

API

Once running, you can access the Swagger documentation at: http://localhost:3000/apidocs/. Below is an overview of the available endpoints:

1. Transcribe an Audio File

POST /transcribe

Description: Upload an audio file for transcription.

Request Headers:

  • Content-Type: application/octet-stream

Request Body:

  • Raw binary data of the audio file.

Responses:

  • 201 Created:
    • Transcription job created successfully.
    • Example Response:
      {
        "jobId": "string"
      }
  • 400 Bad Request:
    • No file uploaded or invalid file format.
    • Example Response:
      {
        "error": "No file uploaded or invalid file format."
      }
  • 500 Internal Server Error:
    • Server error.
    • Example Response:
      {
        "error": "Server error."
      }
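A minimal client sketch for this endpoint, using only the Python standard library. It assumes the API is reachable at http://localhost:3000 (the host and port given for the Swagger documentation). The function names `build_transcribe_request` and `submit_audio` are hypothetical and not part of the project.

```python
import json
import urllib.request

# Assumption: the API listens on the same host and port as the Swagger docs.
BASE_URL = "http://localhost:3000"


def build_transcribe_request(audio_bytes, base_url=BASE_URL):
    """Build a POST /transcribe request carrying the raw audio bytes as the body."""
    return urllib.request.Request(
        url=f"{base_url}/transcribe",
        data=audio_bytes,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )


def submit_audio(path, base_url=BASE_URL):
    """Upload the file at `path` and return the job ID from the response."""
    with open(path, "rb") as audio_file:
        request = build_transcribe_request(audio_file.read(), base_url)
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request) as response:
        return json.load(response)["jobId"]
```

Building the request separately from sending it keeps the headers and body easy to inspect without a running server.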

2. Retrieve Job Information

GET /job/{job_id}

Description: Retrieve job information by the job ID.

Path Parameters:

  • job_id (string, required): The unique identifier for the transcription job.

Responses:

  • 200 OK:
    • Job information retrieved successfully.
    • Example Response:
      {
        "jobId": "string",
        "transcription": "Transcribed text here.",
        "filename": "path/to/file.wav",
        "totalDuration": 456.78,
        "runningTime": 123.45,
        "creationDate": "2023-10-05T14:48:00.000Z"
      }
  • 404 Not Found:
    • Job ID not found.
    • Example Response:
      {
        "error": "Job ID not found."
      }
  • 500 Internal Server Error:
    • Server error.
    • Example Response:
      {
        "error": "Server error."
      }
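Because transcription runs as a queued job, a client would typically poll this endpoint until the text is available. The sketch below shows one way to do that. It rests on an assumption the documentation above does not confirm: that an unfinished job returns 200 with a missing, null, or empty `transcription` field. The names `fetch_job` and `wait_for_transcription` are hypothetical.

```python
import json
import time
import urllib.request


def fetch_job(job_id, base_url="http://localhost:3000"):
    """GET /job/{job_id} and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/job/{job_id}") as response:
        return json.load(response)


def wait_for_transcription(job_id, fetch=fetch_job, interval=2.0,
                           timeout=600.0, sleep=time.sleep):
    """Poll until the job has a non-empty transcription, then return it.

    Assumption: a job counts as finished once "transcription" is non-empty.
    Raises TimeoutError if that does not happen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        if job.get("transcription"):
            return job["transcription"]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} did not finish within {timeout} seconds")
        sleep(interval)
```

Note that under this heuristic a job whose audio legitimately produces an empty transcription would never be treated as finished. If the real response exposes an explicit status, that would be the better signal.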

Contributing

Please check the repo issues for ideas for contributions and read the documentation about contributing for more information.

Any contribution intentionally submitted for inclusion in this repository shall be dual licensed as below, without any additional terms or conditions.

Acknowledgments

  • Some logic for the Whisper transcription was influenced by ideas found in the WAAS project.

Additional Notes:

  • For the /transcribe endpoint, ensure that you send the audio file as raw binary data in the body of the request, with the Content-Type header set to application/octet-stream.
  • For the /transcribe_text endpoint, send a JSON payload with the text field containing the text you wish to process.
  • The /job/{job_id} endpoint includes filename and totalDuration fields, which may be null if the transcription was created from text input rather than an audio file.
  • Error responses are standardized to include an "error" field with a descriptive message.
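The notes above can be sketched as two small helpers: one that builds a JSON request for /transcribe_text, and one that reads the "error" field from an error response body. The /transcribe_text endpoint is not described in the endpoint overview, so the POST method and the `application/json` Content-Type used here are assumptions. The function names are hypothetical.

```python
import json
import urllib.request


def build_transcribe_text_request(text, base_url="http://localhost:3000"):
    """Build a request for /transcribe_text with the text in a JSON payload."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/transcribe_text",
        data=payload,
        method="POST",  # assumption: the HTTP method is not documented above
        headers={"Content-Type": "application/json"},  # assumption
    )


def error_message(body):
    """Return the "error" field of an error response body, or None if absent."""
    try:
        parsed = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return None
    if isinstance(parsed, dict):
        return parsed.get("error")
    return None
```

When `urllib.request.urlopen` raises `urllib.error.HTTPError`, the response body is available through the exception's `read()` method and can be passed to `error_message`.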

License

Licensed under either of:
