This is a RESTful API that attempts to determine the type of uploaded files and extract their metadata. It was created for assignments 1-3 of the course "Modern Technologies for Developing Service-Oriented Applications" ("Сучасні технології розробки сервіс орієнтованих застосунків").
The API is built on ASP.NET Core (.NET 9) and is designed to run in Docker. At the moment, the analysis capabilities are rather basic and mainly serve demonstration purposes. The project can, however, be extended to support more file types and to store and return more file metadata.
The following endpoints are provided:
- /upload
  Method: POST
  Returns JSON:
  - id (string) – ID generated for the uploaded file.
- /uploads/{id}
  Method: GET
  Parameters:
  - id – file ID.
  Returns the binary contents of the file.
- /uploads/{id}/analysisProgress
  Method: GET
  Parameters:
  - id – file ID.
  Returns JSON:
  - progress (float) – analysis progress, a value from 0 to 1.
  - hasFinished (boolean) – whether the analysis has finished.
- /uploads/{id}/info
  Method: GET
  Parameters:
  - id – file ID.
  Returns JSON:
  - name (string) – file name.
  - size (integer) – file size, in bytes.
  - type (string) – detected file type.
  - metadata (object) – all metadata that was extracted from the file.
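
The GET endpoints above suggest a simple client flow: poll analysisProgress until hasFinished is true, then request info. The following is a minimal client-side sketch in Python using only the standard library. The helper names (`get_json`, `wait_for_info`), the polling interval, and the retry limit are illustrative assumptions, not part of the project.

```python
import json
import time
import urllib.request


def get_json(url):
    """Fetch a URL with a plain GET and decode the JSON response body."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_info(base_url, file_id, fetch=get_json, poll_seconds=0.5, max_polls=120):
    """Poll analysisProgress until hasFinished is true, then return the info JSON.

    `fetch` is injectable so the flow can be exercised without a running server.
    """
    upload_url = f"{base_url.rstrip('/')}/uploads/{file_id}"
    for _ in range(max_polls):
        status = fetch(f"{upload_url}/analysisProgress")
        if status["hasFinished"]:
            return fetch(f"{upload_url}/info")
        time.sleep(poll_seconds)
    raise TimeoutError(f"analysis of {file_id} did not finish in time")
```

A call might look like `wait_for_info("http://localhost:8080", file_id)`, where the host and port are assumptions that depend on how the container is published.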
Currently supported file types are:
- Binary (used as a fallback)
- Media files (title might be extracted)
  - Audio (number of channels, sample rate and encoding will be determined)
    - MP3
    - WAV
  - Images (size will be determined)
    - PNG
    - JPG
- Text (an attempt will be made to determine the encoding)
  - JSON
  - XML
The data is stored in two places:
- Raw files are stored in the /storage volume; their names are replaced with random IDs.
- File information and the extracted metadata are stored in the /data volume, in the database.db file (an SQLite database).
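
The split between raw files and a metadata database can be illustrated with a short Python sketch. It is not the project's code (which is C#). The table name, column layout, and function names are hypothetical, and the real schema of database.db may differ.

```python
import json
import sqlite3
import uuid
from pathlib import Path


def open_database(path):
    """Open (or create) the SQLite database and ensure the hypothetical table exists."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "id TEXT PRIMARY KEY, name TEXT, size INTEGER, type TEXT, metadata TEXT)"
    )
    return db


def store_upload(storage_dir, db, original_name, content, file_type, metadata):
    """Write the raw bytes under a random ID and record the file's info in SQLite."""
    file_id = uuid.uuid4().hex  # the random ID replaces the original name on disk
    Path(storage_dir, file_id).write_bytes(content)
    db.execute(
        "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
        (file_id, original_name, len(content), file_type, json.dumps(metadata)),
    )
    db.commit()
    return file_id


def load_info(db, file_id):
    """Return the stored info as a dict shaped like the /info response, or None."""
    row = db.execute(
        "SELECT name, size, type, metadata FROM files WHERE id = ?", (file_id,)
    ).fetchone()
    if row is None:
        return None
    name, size, file_type, metadata = row
    return {"name": name, "size": size, "type": file_type, "metadata": json.loads(metadata)}
```

Keeping the original name only in the database means the on-disk name never depends on user input, which is one plausible reason for the random IDs.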
To try and determine the file type, the contents are run through the following libraries:
- ImageMagick (Magick.NET)
- NAudio
- TagLib#
- UTF.Unknown (CharsetDetector)
- System.Text.Json and System.Xml.Linq
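
The general idea of trying several detectors and falling back to Binary can be sketched as follows. This Python example deliberately swaps the .NET libraries listed above for standard-library stand-ins: magic-byte checks for images and audio, and the `json` and `xml.etree.ElementTree` parsers for text. The order of the checks and the function name `detect_type` are assumptions, and the text check is far cruder than a real charset detector.

```python
import json
import xml.etree.ElementTree as ET


def detect_type(data: bytes) -> str:
    """Return a coarse type label, falling back to "Binary" when nothing matches."""
    # Magic-byte checks stand in for the image and audio libraries.
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if data.startswith(b"\xff\xd8\xff"):
        return "JPG"
    if data.startswith(b"RIFF") and data[8:12] == b"WAVE":
        return "WAV"
    if data.startswith(b"ID3"):  # only recognizes MP3 files that begin with an ID3v2 tag
        return "MP3"

    # Crude text check: non-empty, no NUL bytes, valid UTF-8.
    if not data or b"\x00" in data:
        return "Binary"
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "Binary"

    # Structured text formats are tried before the generic "Text" label.
    try:
        if isinstance(json.loads(text), (dict, list)):
            return "JSON"
    except ValueError:
        pass
    try:
        ET.fromstring(data)
        return "XML"
    except ET.ParseError:
        pass
    return "Text"
```

For example, `detect_type(b'{"a": 1}')` yields `"JSON"`, while bytes that are not valid UTF-8 and match no signature yield `"Binary"`.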