An API that accepts an audio file and queues a job to transcribe it with Faster Whisper, returning a job ID. Querying that job ID returns the transcribed text once the job has completed. Files are stored during processing and then deleted unless configured otherwise.
The simplest way to run the project is using Docker Compose. First, set up the environment:
cp .env.example .env
Replace the values in the .env file with your own. The MODEL value can be one of tiny.en, tiny, base.en, base, small.en, small, medium.en, medium, large-v1, large-v2, large-v3, large, distil-large-v2, distil-medium.en, distil-small.en, distil-large-v3.
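As a quick sanity check before building, the MODEL value can be compared against the list above. The following is a minimal sketch and not part of the project: the helper names `read_env_value` and `is_supported_model` are hypothetical, and the parser assumes plain KEY=VALUE lines.

```python
from typing import Optional

# Model names copied from the list above.
SUPPORTED_MODELS = {
    "tiny.en", "tiny", "base.en", "base", "small.en", "small",
    "medium.en", "medium", "large-v1", "large-v2", "large-v3", "large",
    "distil-large-v2", "distil-medium.en", "distil-small.en",
    "distil-large-v3",
}


def read_env_value(env_text: str, key: str) -> Optional[str]:
    """Return the value of `key` from KEY=VALUE text, or None if absent."""
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip().strip('"').strip("'")
    return None


def is_supported_model(name: Optional[str]) -> bool:
    """True if `name` is one of the model names listed above."""
    return name in SUPPORTED_MODELS
```

For example, `is_supported_model(read_env_value(open(".env").read(), "MODEL"))` would flag a typo in the model name before the images are built.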
Build the images using:
docker compose build
To run the app for development, use this command:
docker compose up --watch
This copies the required files from your local machine to the container and restarts the services when necessary.
To use Faster Whisper at its full speed, you need to deploy the code to an instance with an Nvidia GPU. The project is already configured to support it.
Edit .env and set:
GPU=1
Then build the images with the production compose file:
docker compose -f compose.yml -f compose.prod.yml build
Then run it with the same compose files:
docker compose -f compose.yml -f compose.prod.yml up
This should enable the GPU features and run the containers with automatic restarts in case of failure.
To enable Sentry error tracking, edit the .env file:
SENTRY_DSN=__YOUR_DSN__
ENVIRONMENT=production
Then restart the Docker Compose services.
Once running, you can access the Swagger documentation at: http://localhost:3000/apidocs/. Below is an overview of the available endpoints:
/transcribe
Description: Upload an audio file for transcription.
Request Headers:
Content-Type: application/octet-stream
Request Body:
- Raw binary data of the audio file.
Responses:
- 201 Created:
- Transcription job created successfully.
- Example Response:
{ "jobId": "string" }
- 400 Bad Request:
- No file uploaded or invalid file format.
- Example Response:
{ "error": "No file uploaded or invalid file format." }
- 500 Internal Server Error:
- Server error.
- Example Response:
{ "error": "Server error." }
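A client for the upload endpoint might look like the sketch below. It is an illustration under stated assumptions, not the project's own client: it assumes the API is reachable at http://localhost:3000 (the host used for the Swagger docs), that the endpoint is called with POST at /transcribe, and that the 201 response body carries a "jobId" field. The function names are hypothetical.

```python
import json
import urllib.error
import urllib.request

# Assumption: same host and port as the Swagger documentation URL.
BASE_URL = "http://localhost:3000"


def build_transcribe_request(audio: bytes, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a request that sends the audio as the raw request body."""
    return urllib.request.Request(
        url=f"{base_url}/transcribe",
        data=audio,
        method="POST",  # assumption: the method is not stated above
        headers={"Content-Type": "application/octet-stream"},
    )


def error_message(body: bytes) -> str:
    """Extract the "error" field from an error response body."""
    try:
        payload = json.loads(body)
    except ValueError:
        return "unparseable error response"
    if isinstance(payload, dict) and "error" in payload:
        return str(payload["error"])
    return "unknown error"


def submit_audio(path: str, base_url: str = BASE_URL) -> str:
    """Upload the file at `path` and return the job ID."""
    with open(path, "rb") as f:
        request = build_transcribe_request(f.read(), base_url)
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)["jobId"]
    except urllib.error.HTTPError as exc:
        # 400 and 500 responses carry a JSON body with an "error" field.
        raise RuntimeError(f"{exc.code}: {error_message(exc.read())}") from exc
```

Calling `submit_audio("recording.wav")` against a running instance would return the job ID, or raise a `RuntimeError` containing the server's error message.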
/job/{job_id}
Description: Retrieve job information by the job ID.
Path Parameters:
job_id (string, required): The unique identifier for the transcription job.
Responses:
- 200 OK:
- Job information retrieved successfully.
- Example Response:
{ "jobId": "string", "transcription": "Transcribed text here.", "filename": "path/to/file.wav", "totalDuration": 456.78, "runningTime": 123.45, "creationDate": "2023-10-05T14:48:00.000Z" }
- 404 Not Found:
- Job ID not found.
- Example Response:
{ "error": "Job ID not found." }
- 500 Internal Server Error:
- Server error.
- Example Response:
{ "error": "Server error." }
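Because transcription runs as a queued job, a client typically polls the job endpoint until the text is available. The sketch below shows one way to do that. It assumes the base URL http://localhost:3000, a GET request to /job/{job_id}, and that the "transcription" field is missing or null while the job is still pending; none of these are confirmed above, and the function names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Assumption: same host and port as the Swagger documentation URL.
BASE_URL = "http://localhost:3000"


def fetch_job(job_id: str, base_url: str = BASE_URL) -> dict:
    """Request /job/{job_id} and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/job/{job_id}") as response:
        return json.load(response)


def wait_for_transcription(
    job_id: str,
    fetch: Callable[[str], dict] = fetch_job,
    interval: float = 2.0,
    max_attempts: int = 30,
) -> Optional[str]:
    """Poll until the job has a transcription; return None on timeout.

    Assumption: "transcription" is missing or null while the job is pending.
    """
    for attempt in range(max_attempts):
        job = fetch(job_id)
        if job.get("transcription") is not None:
            return job["transcription"]
        # Sleep between attempts, but not after the last one.
        if attempt < max_attempts - 1:
            time.sleep(interval)
    return None
```

The `fetch` parameter is injectable so the polling logic can be exercised without a running server. The interval and attempt limit are arbitrary defaults and would need tuning for long recordings.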
Please check the repo issues for ideas for contributions and read the documentation about contributing for more information.
Any contribution intentionally submitted for inclusion in this repository shall be dual licensed as below, without any additional terms or conditions.
- Some logic for the Whisper transcription was influenced by ideas found in the WAAS project.
- For the /transcribe endpoint, ensure that you send the audio file as raw binary data in the body of the request, with the Content-Type header set to application/octet-stream.
- For the /transcribe_text endpoint, send a JSON payload with the text field containing the text you wish to process.
- The /job/{job_id} endpoint includes filename and totalDuration fields, which may be null if the transcription was created from text input rather than an audio file.
- Error responses are standardized to include an "error" field with a descriptive message.
Licensed under either of: