GitHub - coordnet/transcription-api: An API to transcribe text from audio using Faster Whisper

Transcription API

An API that takes an audio file and queues a job to transcribe it using Faster Whisper, returning a job ID. Querying that job ID returns the transcribed text once the job is completed. Files are stored during processing and then deleted unless configured otherwise.

Development

The simplest way to run the project is using Docker Compose. First, set up the environment:

Environment

cp .env.example .env

Replace the values in the .env file with your own. The MODEL value can be one of tiny.en, tiny, base.en, base, small.en, small, medium.en, medium, large-v1, large-v2, large-v3, large, distil-large-v2, distil-medium.en, distil-small.en, distil-large-v3.
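As a quick sanity check before building, the MODEL value can be validated against the list above. This is a minimal sketch and not part of the project: the helper names `read_env` and `check_model` are hypothetical, and the parsing only handles plain KEY=VALUE lines with optional surrounding quotes.

```python
# Model names copied from the list above.
ALLOWED_MODELS = {
    "tiny.en", "tiny", "base.en", "base", "small.en", "small",
    "medium.en", "medium", "large-v1", "large-v2", "large-v3", "large",
    "distil-large-v2", "distil-medium.en", "distil-small.en", "distil-large-v3",
}


def read_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, if any (hypothetical, simplified handling).
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_model(env):
    """Return the MODEL value, or raise ValueError if it is missing or unknown."""
    model = env.get("MODEL")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"Unsupported MODEL value: {model!r}")
    return model
```

To use it, read the contents of .env into a string and call `check_model(read_env(contents))`.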

Building

Build the images using:

docker compose build

Running

To run the app for development, use this command:

docker compose up --watch

This will copy the required files to the container from your local machine and will restart the services when necessary.

Deployment

Utilizing a GPU

To use Faster Whisper at its full speed, you need to deploy the code to an instance with an Nvidia GPU. The project is already configured to support it.

Edit .env and set:

GPU=1

Then build the images with the production compose file:

docker compose -f compose.yml -f compose.prod.yml build

Then run the containers with the same compose files:

docker compose -f compose.yml -f compose.prod.yml up

This should enable the GPU features and run the containers with automatic restarts in case of failure.

Sentry

To enable Sentry error tracking, edit the .env file:

SENTRY_DSN=__YOUR_DSN__
ENVIRONMENT=production

Then restart the Docker Compose services.

API

Once running, you can access the Swagger documentation at: http://localhost:3000/apidocs/. Below is an overview of the available endpoints:

1. Transcribe an Audio File

`POST /transcribe`

Description: Upload an audio file for transcription.

Request Headers:

Content-Type: application/octet-stream

Request Body:

Raw binary data of the audio file.

Responses:

201 Created:
- Transcription job created successfully.
- Example Response:
```
{
  "jobId": "string"
}
```
400 Bad Request:
- No file uploaded or invalid file format.
- Example Response:
```
{
  "error": "No file uploaded or invalid file format."
}
```
500 Internal Server Error:
- Server error.
- Example Response:
```
{
  "error": "Server error."
}
```
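A client call to this endpoint might look like the following sketch, which uses only the Python standard library. The base URL is an assumption (the same host and port as the Swagger documentation), and the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: the API listens on the same host and port as the Swagger docs.
BASE_URL = "http://localhost:3000"


def build_transcribe_request(audio_bytes, base_url=BASE_URL):
    """Build a POST /transcribe request carrying the raw audio bytes."""
    return urllib.request.Request(
        url=f"{base_url}/transcribe",
        data=audio_bytes,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )


def submit_audio(path, base_url=BASE_URL):
    """Upload the file at `path` and return the jobId from the 201 response."""
    with open(path, "rb") as fh:
        request = build_transcribe_request(fh.read(), base_url)
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request) as response:
        return json.load(response)["jobId"]
```

Calling `submit_audio("recording.wav")` (with a hypothetical file name) would return the job ID to pass to the job endpoint.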

2. Retrieve Job Information

`GET /job/{job_id}`

Description: Retrieve job information by the job ID.

Path Parameters:

job_id (string, required): The unique identifier for the transcription job.

Responses:

200 OK:
- Job information retrieved successfully.
- Example Response:

{
  "jobId": "string",
  "transcription": "Transcribed text here.",
  "filename": "path/to/file.wav",
  "totalDuration": 456.78,
  "runningTime": 123.45,
  "creationDate": "2023-10-05T14:48:00.000Z"
}

404 Not Found:
- Job ID not found.
- Example Response:
```
{
  "error": "Job ID not found."
}
```
500 Internal Server Error:
- Server error.
- Example Response:
```
{
  "error": "Server error."
}
```
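Because transcription runs as a queued job, a client typically polls this endpoint until the text is available. The sketch below assumes that the "transcription" field is absent or null while the job is still running, which the responses above do not state explicitly. The base URL, the function names, and the polling interval are also assumptions.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: the API listens on the same host and port as the Swagger docs.
BASE_URL = "http://localhost:3000"


def fetch_job(job_id, base_url=BASE_URL):
    """GET /job/{job_id} and return the decoded JSON body."""
    url = f"{base_url}/job/{urllib.parse.quote(job_id, safe='')}"
    # urlopen raises urllib.error.HTTPError for 404 and 500 responses.
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def wait_for_transcription(job_id, fetch=fetch_job, interval=2.0, max_attempts=30):
    """Poll until the job carries a transcription, then return the text.

    Assumption: "transcription" is absent or null while the job is running.
    """
    for _ in range(max_attempts):
        job = fetch(job_id)
        if job.get("transcription") is not None:
            return job["transcription"]
        time.sleep(interval)
    raise TimeoutError(f"Job {job_id} did not finish after {max_attempts} attempts")
```

The `fetch` parameter exists so the polling loop can be exercised without a running server, for example by passing a function that returns canned dictionaries.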

Contributing

Please check the repo issues for ideas for contributions and read the documentation about contributing for more information.

Any contribution intentionally submitted for inclusion in this repository shall be dual licensed as below, without any additional terms or conditions.

Acknowledgments

Some logic for the Whisper transcription was influenced by ideas found in the WAAS project.

Additional Notes:

- For the /transcribe endpoint, ensure that you send the audio file as raw binary data in the body of the request, with the Content-Type header set to application/octet-stream.
- For the /transcribe_text endpoint, send a JSON payload with the text field containing the text you wish to process.
- The /job/{job_id} endpoint includes filename and totalDuration fields, which may be null if the transcription was created from text input rather than an audio file.
- Error responses are standardized to include an "error" field with a descriptive message.
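The notes on /transcribe_text and on error bodies can be sketched as follows. The HTTP method (POST), the application/json content type, the base URL, and the function names are all assumptions, since that endpoint is not described in the overview above. The response format of /transcribe_text is not documented here, so the sketch only builds the request.

```python
import json
import urllib.request

# Assumption: the API listens on the same host and port as the Swagger docs.
BASE_URL = "http://localhost:3000"


def build_transcribe_text_request(text, base_url=BASE_URL):
    """Build a JSON request for /transcribe_text (POST is an assumption)."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/transcribe_text",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def error_message(raw_body):
    """Return the "error" field of an error response body, or None."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        # The body was not valid JSON (or not valid UTF-8).
        return None
    return payload.get("error") if isinstance(payload, dict) else None
```

With `urllib.request.urlopen`, a failed call raises `urllib.error.HTTPError`; passing the result of that exception's `read()` method to `error_message` would recover the descriptive message.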

License

Licensed under either of:

- Apache License (LICENSE-APACHE)
- MIT License (LICENSE-MIT)


