diff --git a/Dockerfile b/Dockerfile
index ab85832..c48fc97 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -25,4 +25,5 @@ RUN python -m pip install -v -r /home/${user}/RandomTelecomPayments/requirements
 
 # set working directory for random telecom payments app
 WORKDIR /home/${user}/RandomTelecomPayments
+EXPOSE 8000
 ENTRYPOINT ["python", "generator/main.py"]
\ No newline at end of file
diff --git a/README.md b/README.md
index 6653841..2d6f524 100644
--- a/README.md
+++ b/README.md
@@ -2,9 +2,9 @@
 
 ## Overview
 
-Randomly simulated data is particularly useful when it's real world counterpart is hard access due to complexity, privacy and security reasons. Moreover, randomly simulated data has additional benefits including reproducibility, scalability and controllability.
+Randomly simulated data is particularly useful when its real world counterpart is hard to access due to complexity, privacy and security reasons. Moreover, randomly simulated data has additional benefits including reproducibility, scalability and controllability.
 
-This application aims to simulate telecommunication payments using random number generation. It includes typical transaction level relationships and behaviours amongst the user, device, ip, and card entities. It can be used in place of real world telecommunication payments for prototyping solutions and as an education tool.
+This application aims to simulate telecommunication payments using random number generation. It includes typical transaction level relationships and behaviours amongst the user, device, ip, and card entities. It can be used in place of real world telecommunication payments for prototyping solutions and as an education tool.
 
 The data generation algorithm works by first generating user level telecom payments data. Afterwards, the user level data is exploded to transaction level, and any inconsistencies within the data model are removed. Finally, the transaction status and error codes are generated using underlying features within the transaction level data.
 
@@ -16,7 +16,7 @@ A stable master version of the Random Telecom Payments data can be found on Kagg
 
 ## Data Model
 
-The underlying data model present in the simulated telecommunication payments is displayed below.
+The underlying data model present in the simulated telecommunication payments is displayed below.
 
 ![Entity Relationship Diagram](doc/entity_relationship_diagram.jpg)
 
@@ -26,27 +26,16 @@ For a more detailed account of each column in the dataset see the data dictionar
 
 ## Running the Application (Windows)
 
-### Anaconda
-
-Create a local conda environment for the Random Telecom Payments app using [anaconda](https://www.anaconda.com/):
-
-```
-conda create --name RandomTelecomPayments python=3.12 --yes
-conda activate RandomTelecomPayments
-pip install -r requirements.txt
-```
-
-Execute the Random Telecom Payments app to generate data for 2000 users using the following command and the local conda environment:
-
-```
-python generator\\main.py --n_users 1000 --use_random_seed 1 --n_itr 2
-```
-
-View the generated Random Telecom Payments data using the following command:
+### Application Parameters
 
-```
-type data\\RandomTelecomPayments.csv | more
-```
+* **n_users** - integer, the number of users to generate Random Telecom Payments data for, default is 100.
+* **use_random_seed** - integer, whether to run the Random Telecom Payments data generation with or without a random seed set for reproducible results; must be 0 or 1.
+* **n_itr** - integer, the number of Random Telecom Payments data batches to generate; must be at least 1. The Python multiprocessing library is used to run the batches in parallel across all available cores.
+* **n_applications** - integer, the number of applications to generate, default is 20000.
+* **registration_start_date** - string in YYYY-MM-DD format, the start date for user registrations, default is two years ago from today.
+* **registration_end_date** - string in YYYY-MM-DD format, the end date for user registrations, default is one year ago from today.
+* **transaction_start_date** - string in YYYY-MM-DD format, the start date for user transactions, default is one year ago from today.
+* **transaction_end_date** - string in YYYY-MM-DD format, the end date for user transactions, default is today.
 
 ### Docker
 
@@ -60,6 +49,8 @@ The docker image can be pulled from dockerhub using the following command:
 docker pull oislen/randomtelecompayments:latest
 ```
 
+#### Command Line Interface
+
 The Random Telecom Payments app can then be executed to generate data for 2000 users using the following command and the docker image:
 
 ```
@@ -69,15 +60,20 @@ docker run --name rtp oislen/randomtelecompayments:latest --n_users 1000 --use_r
 The generated Random Telecom Payments data can then be extract from the docker image using the following command:
 
 ```
-docker cp rtp:/home/ubuntu/RandomTelecomPayments/data/RandomTelecomPayments.csv %userprofile%\Downloads\RandomTelecomPayments.csv
+docker cp rtp:/home/user/RandomTelecomPayments/data/RandomTelecomPayments.csv %userprofile%\Downloads\RandomTelecomPayments.csv
 ```
 
-### Application Parameters
+#### FastAPI Interface
 
-* **n_users** - integer, the number of users to generate Random Telecom Payments data for, default is 100.
-* **use_random_seed** - integer, whether to run the Random Telecom Payments data generation with or without a random seed set for reproducible results; must be 0 or 1.
-* **n_itr** - integer, the number of Random Telecom Payments data batches to generate; must be at least 1. The python multiprocessing library is used to run each in parallel across all available cores.
-* **registration_start_date** - string, the start date for user registrations, default is two years ago from today.
-* **registration_end_date** - string, the end date for user registrations, default is one year ago from today.
-* **transaction_start_date** - string, the start date for user transactions, default is one year ago from today.
-* **transaction_end_date** - string, the end date for user transactions, default is today.
+Alternatively, a FastAPI interface has been configured within the docker image, allowing the Random Telecom Payments app to be called via REST API requests. The FastAPI interface can be accessed by publishing port 8000 and overriding the entrypoint when running the docker image as follows:
+
+```
+docker run --name rtp --publish 8000:8000 --entrypoint fastapi --rm oislen/randomtelecompayments:latest run generator/api.py
+```
+
+Once the API server is running, navigate to http://localhost:8000/docs in your preferred browser to access the FastAPI interactive documentation and test the available API calls.
+
+* http://localhost:8000/docs
+
+
+![FastAPI Endpoint](doc/fastapi_endpoint.jpg)
diff --git a/doc/RandomTelecomPayments.postman_collection.json b/doc/RandomTelecomPayments.postman_collection.json
new file mode 100644
index 0000000..7a0480c
--- /dev/null
+++ b/doc/RandomTelecomPayments.postman_collection.json
@@ -0,0 +1,95 @@
+{
+	"info": {
+		"_postman_id": "ff7bfe7a-c3ca-4d73-a609-6094c57def45",
+		"name": "RandomTelecomPayments",
+		"schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json",
+		"_exporter_id": "39794605"
+	},
+	"item": [
+		{
+			"name": "/api",
+			"request": {
+				"method": "GET",
+				"header": [],
+				"url": {
+					"raw": "http://127.0.0.1:8000/api",
+					"protocol": "http",
+					"host": [
+						"127",
+						"0",
+						"0",
+						"1"
+					],
+					"port": "8000",
+					"path": [
+						"api"
+					]
+				}
+			},
+			"response": []
+		},
+		{
+			"name": "/api?n_users=5&use_random_seed=1",
+			"request": {
+				"method": "GET",
+				"header": [],
+				"url": {
+					"raw": "http://127.0.0.1:8000/api?n_users=5&use_random_seed=1",
+					"protocol": "http",
+					"host": [
+						"127",
+						"0",
+						"0",
+						"1"
+					],
+					"port": "8000",
+					"path": [
+						"api"
+					],
+					"query": [
+						{
+							"key": "n_users",
+							"value": "5"
+						},
+						{
+							"key": "use_random_seed",
+							"value": "1"
+						}
+					]
+				}
+			},
+			"response": []
+		},
+		{
+			"name": "/api",
+			"request": {
+				"method": "POST",
+				"header": [],
+				"body": {
+					"mode": "raw",
+					"raw": "{\r\n \"n_users\": 1,\r\n \"use_random_seed\": 1,\r\n \"n_itr\": 1,\r\n \"n_applications\": 20000,\r\n \"registration_start_date\": \"2024-01-01\",\r\n \"registration_end_date\": \"2024-12-31\",\r\n \"transaction_start_date\": \"2025-01-01\",\r\n \"transaction_end_date\": \"2025-12-31\"\r\n}",
+					"options": {
+						"raw": {
+							"language": "json"
+						}
+					}
+				},
+				"url": {
+					"raw": "http://127.0.0.1:8000/api",
+					"protocol": "http",
+					"host": [
+						"127",
+						"0",
+						"0",
+						"1"
+					],
+					"port": "8000",
+					"path": [
+						"api"
+					]
+				}
+			},
+			"response": []
+		}
+	]
+}
\ No newline at end of file
diff --git a/doc/fastapi_endpoint.jpg b/doc/fastapi_endpoint.jpg
new file mode 100644
index 0000000..fe1a6b5
Binary files /dev/null and b/doc/fastapi_endpoint.jpg differ
diff --git a/exeDocker.cmd b/exeDocker.cmd
index 0452ab6..e16646f 100644
--- a/exeDocker.cmd
+++ b/exeDocker.cmd
@@ -16,6 +16,7 @@ SET UBUNTU_DIR=/home/ubuntu
 call docker run --name %DOCKER_CONTAINER_NAME% --memory 7GB --volume E:\GitHub\RandomTelecomPayments\data:/home/ubuntu/RandomTelecomPayments/data --rm %DOCKER_IMAGE% --n_users 100 --use_random_seed 1 --n_itr 1
 :: call docker run --name %DOCKER_CONTAINER_NAME%w --memory 7GB --volume E:\GitHub\RandomTelecomPayments\data:/home/ubuntu/RandomTelecomPayments/data --rm %DOCKER_IMAGE% --n_users 13000 --use_random_seed 1 --n_itr 2
 :: call docker run -it --entrypoint bash --name %DOCKER_CONTAINER_NAME% --memory 7GB --volume E:\GitHub\RandomTelecomPayments\data:/home/ubuntu/RandomTelecomPayments/data --rm %DOCKER_IMAGE%
+:: call docker run --name %DOCKER_CONTAINER_NAME% --publish 8000:8000 --memory 7GB --entrypoint fastapi --rm %DOCKER_IMAGE% run generator/api.py
 
 :: useful docker commands
 :: docker images
diff --git a/generator/api.py b/generator/api.py
new file mode 100644
index 0000000..77125c2
--- /dev/null
+++ b/generator/api.py
@@ -0,0 +1,120 @@
+import json
+from fastapi import FastAPI, Query, Response
+from typing import Annotated, Dict
+
+import cons
+from main import main
+from utilities.JsonEncoder import JsonEncoder
+
+tags_metadata = [
+    {
+        "name": "Random Telecom Payments Data Generator",
+        "description": "Generate random telecom payments data based on user-defined parameters.",
+    },
+]
+
+app = FastAPI(
+    title="Random Telecom Payments Data Generator API",
+    description="An API to generate random telecom payments data based on user-defined parameters.",
+    version="0.0.0",
+    openapi_tags=tags_metadata,
+)
+
+@app.get("/api", tags=["Random Telecom Payments Data Generator"])
+def get_api(
+    n_users: Annotated[int, Query(title="Number of Users", description="The number of users")] = cons.default_n_users,
+    use_random_seed : Annotated[int, Query(title="Use Random Seed", description="Whether to set a random seed for reproducible results (0 or 1)", ge=0, le=1)] = cons.default_use_random_seed,
+    n_itr : Annotated[int, Query(title="Number of Iterations", description="The number of data batches to generate", ge=1)] = cons.default_n_itr,
+    n_applications : Annotated[int, Query(title="Number of Applications", description="The number of applications", ge=1)] = cons.default_n_applications,
+    registration_start_date : Annotated[str, Query(title="Registration Start Date", description="The registration start date in YYYY-MM-DD format")] = cons.default_registration_start_date,
+    registration_end_date : Annotated[str, Query(title="Registration End Date", description="The registration end date in YYYY-MM-DD format")] = cons.default_registration_end_date,
+    transaction_start_date : Annotated[str, Query(title="Transaction Start Date", description="The transaction start date in YYYY-MM-DD format")] = cons.default_transaction_start_date,
+    transaction_end_date : Annotated[str, Query(title="Transaction End Date", description="The transaction end date in YYYY-MM-DD format")] = cons.default_transaction_end_date,
+    ):
+    """
+    Generate random telecom payments data based on user-defined parameters.
+
+    Parameters
+    ----------
+    n_users : int
+        The number of users.
+    use_random_seed : int
+        Whether to set a random seed for reproducible results (0 or 1).
+    n_itr : int
+        The number of data batches to generate.
+    n_applications : int
+        The number of applications.
+    registration_start_date : str
+        The registration start date in YYYY-MM-DD format.
+    registration_end_date : str
+        The registration end date in YYYY-MM-DD format.
+    transaction_start_date : str
+        The transaction start date in YYYY-MM-DD format.
+    transaction_end_date : str
+        The transaction end date in YYYY-MM-DD format.
+
+    Returns
+    -------
+    response : fastapi.Response
+        A JSON response containing the generated transaction level telecom payments data.
+    """
+    # generate parameters dictionary
+    input_params_dict={
+        "n_users": n_users,
+        "use_random_seed": use_random_seed,
+        "n_itr": n_itr,
+        "n_applications": n_applications,
+        "registration_start_date": registration_start_date,
+        "registration_end_date": registration_end_date,
+        "transaction_start_date": transaction_start_date,
+        "transaction_end_date": transaction_end_date
+    }
+    # run random telecom payments generator
+    output_data_dict = main(input_params_dict=input_params_dict)
+    # convert transaction data to a list of records, then return it as a JSON response (not a JSON-encoded string)
+    trans_data_dict = output_data_dict['trans_data'].to_dict(orient='records')
+    response_json = json.dumps(trans_data_dict, cls=JsonEncoder)
+    return Response(content=response_json, media_type="application/json")
+
+@app.post("/api", tags=["Random Telecom Payments Data Generator"])
+def post_api(
+    body: Dict[str, object] = {}
+    ):
+    """
+    Generate random telecom payments data based on user-defined parameters.
+
+    Parameters
+    ----------
+    body : Dict[str, object]
+        A dictionary containing the input parameters; any omitted key falls back to its default.
+        Possible keys are:
+        - n_users : int
+            The number of users.
+        - use_random_seed : int
+            Whether to set a random seed for reproducible results (0 or 1).
+        - n_itr : int
+            The number of data batches to generate.
+        - n_applications : int
+            The number of applications.
+        - registration_start_date : str
+            The registration start date in YYYY-MM-DD format.
+        - registration_end_date : str
+            The registration end date in YYYY-MM-DD format.
+        - transaction_start_date : str
+            The transaction start date in YYYY-MM-DD format.
+        - transaction_end_date : str
+            The transaction end date in YYYY-MM-DD format.
+
+    Returns
+    -------
+    response : fastapi.Response
+        A JSON response containing the generated transaction level telecom payments data.
+    """
+    # generate parameters dictionary, with body values overriding the defaults
+    input_params_dict={**cons.default_input_params_dict, **body}
+    # run random telecom payments generator
+    output_data_dict = main(input_params_dict=input_params_dict)
+    # convert transaction data to a list of records, then return it as a JSON response (not a JSON-encoded string)
+    trans_data_dict = output_data_dict['trans_data'].to_dict(orient='records')
+    response_json = json.dumps(trans_data_dict, cls=JsonEncoder)
+    return Response(content=response_json, media_type="application/json")
\ No newline at end of file
diff --git a/generator/app/gen_random_telecom_data.py b/generator/app/gen_random_telecom_data.py
index 1b400cd..28b0283 100644
--- a/generator/app/gen_random_telecom_data.py
+++ b/generator/app/gen_random_telecom_data.py
@@ -114,4 +114,8 @@ def gen_random_telecom_data(
         fpath_countrycrimeindex=cons.fpath_countrycrimeindex
         )
 
+    # map np.nans to None for JSON serialisation
+    user_data = user_data.where(pd.notnull(user_data), None)
+    trans_data = trans_data.where(pd.notnull(trans_data), None)
+
     return {"user_data":user_data, "trans_data":trans_data}
diff --git a/generator/cons.py b/generator/cons.py
index d3839d2..6fdb599 100644
--- a/generator/cons.py
+++ b/generator/cons.py
@@ -48,6 +48,17 @@
 default_registration_end_date = (date_today - datetime.timedelta(days=366)).strftime(date_date_strftime)
 default_transaction_start_date = (date_today - datetime.timedelta(days=365)).strftime(date_date_strftime)
 default_transaction_end_date = date_today.strftime(date_date_strftime)
+# define default input parameters dictionary
+default_input_params_dict = {
+    "n_users": default_n_users,
+    "use_random_seed": default_use_random_seed,
+    "n_itr": default_n_itr,
+    "n_applications": default_n_applications,
+    "registration_start_date": default_registration_start_date,
+    "registration_end_date": default_registration_end_date,
+    "transaction_start_date": default_transaction_start_date,
+    "transaction_end_date": default_transaction_end_date
+}
 
 # set unittest constants
 unittest_seed = 42
diff --git a/generator/exeApi.cmd b/generator/exeApi.cmd
new file mode 100644
index 0000000..be168b3
--- /dev/null
+++ b/generator/exeApi.cmd
@@ -0,0 +1 @@
+call fastapi run api.py
\ No newline at end of file
diff --git a/generator/exeApi.sh b/generator/exeApi.sh
new file mode 100644
index 0000000..5d8da41
--- /dev/null
+++ b/generator/exeApi.sh
@@ -0,0 +1 @@
+fastapi run api.py
\ No newline at end of file
diff --git a/generator/main.py b/generator/main.py
index eabed33..769812a 100644
--- a/generator/main.py
+++ b/generator/main.py
@@ -12,20 +12,13 @@
 from utilities.multiprocess import multiprocess
 from app.gen_random_telecom_data import gen_random_telecom_data
 
-if __name__ == '__main__':
-
-    # set up logging
-    lgr = logging.getLogger()
-    lgr.setLevel(logging.INFO)
-
-    # set user parameters
-    input_params_dict = commandline_interface()
-
+def main(input_params_dict: dict):
+    """
+    Generate random telecom payments data, write it to disk, and return the user and transaction level dataframes.
+    """
     # run input error handling
-    input_error_handling(input_params_dict)
-    logging.info(f'Input Parameters: {input_params_dict}')
-
+    input_error_handling(input_params_dict)
     # start timer
     t0 = time()
     if input_params_dict['n_itr'] > 1:
@@ -67,19 +60,28 @@
     t1 = time()
     total_runtime_seconds = round(t1 - t0, 2)
     logging.info(f'Total Runtime: {total_runtime_seconds} seconds')
-
     # print out head and shape of data
     logging.info(f'RandomTeleComUsersData.shape: {user_data.shape}')
     logging.info(f'RandomTeleComTransData.shape: {trans_data.shape}')
-
     # check output data directories exist
     data_fdirs = [os.path.dirname(cons.fpath_randomtelecomtransdata), os.path.dirname(cons.fpath_randomtelecomusersdata)]
     for data_fdir in data_fdirs:
         if not os.path.exists(data_fdir):
             os.mkdir(data_fdir)
-
     # write data to disk
     logging.info(f'Writing intermediate user level random telecoms data to: {cons.fpath_randomtelecomusersdata}')
     logging.info(f'Writing output trans level random telecoms data to: {cons.fpath_randomtelecomtransdata}')
     user_data.to_parquet(cons.fpath_randomtelecomusersdata, engine='fastparquet')
-    trans_data.to_csv(cons.fpath_randomtelecomtransdata, index = False)
\ No newline at end of file
+    trans_data.to_csv(cons.fpath_randomtelecomtransdata, index = False)
+    # return dataframes as dictionary
+    return {"user_data": user_data, "trans_data": trans_data}
+
+if __name__ == '__main__':
+    # set up logging
+    lgr = logging.getLogger()
+    lgr.setLevel(logging.INFO)
+    # set user parameters
+    input_params_dict = commandline_interface()
+    # run main
+    output_data_dict = main(input_params_dict)
+    logging.info('Programme finished successfully.')
\ No newline at end of file
diff --git a/generator/utilities/JsonEncoder.py b/generator/utilities/JsonEncoder.py
new file mode 100644
index 0000000..051bd74
--- /dev/null
+++ b/generator/utilities/JsonEncoder.py
@@ -0,0 +1,37 @@
+import json
+import numpy as np
+import pandas as pd
+
+class JsonEncoder(json.JSONEncoder):
+    """
+    A custom JSON encoder for handling numpy and pandas data types.
+    Extends the default JSONEncoder to convert numpy data types and pandas Timestamps
+    to native Python data types for JSON serialization.
+
+    Methods
+    -------
+    default(self, obj)
+        Override the default method to handle specific data types.
+    """
+    def default(self, obj):
+        """
+        Convert numpy and pandas data types to native Python types.
+
+        Parameters
+        ----------
+        obj : object
+            The object to be converted.
+
+        Returns
+        -------
+        object
+            The converted object suitable for JSON serialization.
+        """
+        dtypes = (np.datetime64, pd.Timestamp)
+        if isinstance(obj, dtypes):
+            return str(obj)
+        elif isinstance(obj, np.integer):
+            return int(obj)
+        elif isinstance(obj, np.floating):
+            return float(obj)
+        return super().default(obj)
\ No newline at end of file
diff --git a/generator/utilities/arch/check_message_body.py b/generator/utilities/arch/check_message_body.py
new file mode 100644
index 0000000..5869396
--- /dev/null
+++ b/generator/utilities/arch/check_message_body.py
@@ -0,0 +1,5 @@
+from fastapi import HTTPException
+
+def check_message_body(body: dict):
+    if not body.get("n_users"):
+        raise HTTPException(status_code=400, detail="'n_users' field is required")
\ No newline at end of file
diff --git a/generator/utilities/commandline_interface.py b/generator/utilities/commandline_interface.py
index 515ec43..867f866 100644
--- a/generator/utilities/commandline_interface.py
+++ b/generator/utilities/commandline_interface.py
@@ -49,7 +49,7 @@ def commandline_interface() -> Dict[str, object]:
     parser.add_argument("--transaction_start_date", action="store", dest="transaction_start_date", type=str, default=cons.default_transaction_start_date, help="String, the start date for transactions",)
     parser.add_argument("--transaction_end_date", action="store", dest="transaction_end_date", type=str, default=cons.default_transaction_end_date, help="String, the end date for transactions",)
     # create an output dictionary to hold the results
-    input_params_dict = {}
+    input_params_dict = cons.default_input_params_dict.copy()
     # extract input arguments
     args = parser.parse_args()
     # map input arguments into output dictionary
diff --git a/requirements.txt b/requirements.txt
index 213f85d..5e77de7 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -10,4 +10,5 @@ pyarrow==18.1.0
 fastparquet==2024.11.0
 beartype==0.19.0
 unidecode==1.3.8
-boto3==1.36.12
\ No newline at end of file
+boto3==1.36.12
+fastapi[standard]==0.128.0
\ No newline at end of file
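
The README and the Postman collection exercise the new `/api` endpoint in two ways: a GET request with the parameters in the query string, and a POST request with the parameters in a JSON body. The sketch below builds both requests using only the Python standard library. `build_get_request`, `build_post_request` and `BASE_URL` are hypothetical names, and the URL assumes the server was started with `fastapi run` and port 8000 published, as in the README's `docker run --publish 8000:8000 ...` command. Constructing a `Request` object does not contact the server, so the example runs without one.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: API served locally on port 8000, as in the README's docker command.
BASE_URL = "http://127.0.0.1:8000/api"


def build_get_request(params: dict) -> Request:
    """Build a GET request with parameters in the query string (hypothetical helper)."""
    query = urlencode(params)
    url = f"{BASE_URL}?{query}" if query else BASE_URL
    return Request(url, method="GET")


def build_post_request(body: dict) -> Request:
    """Build a POST request carrying the parameters as a JSON body (hypothetical helper)."""
    data = json.dumps(body).encode("utf-8")
    return Request(
        BASE_URL,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    get_req = build_get_request({"n_users": 5, "use_random_seed": 1})
    post_req = build_post_request({"n_users": 1, "use_random_seed": 1, "n_itr": 1})
    print(get_req.get_method(), get_req.full_url)
    print(post_req.get_method(), post_req.full_url, post_req.data)
    # Sending either request needs a running server, e.g. urllib.request.urlopen(get_req).
```

Parameters left out of either request fall back to the defaults defined in `cons`: the GET handler uses them as query defaults, and the POST handler merges the body over `cons.default_input_params_dict`.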
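
`gen_random_telecom_data` now maps missing values to `None` so that the transaction records can be serialised as JSON. The underlying reason is that Python's `json` module writes a float NaN as the bare token `NaN`, which is not valid JSON, and raises `ValueError` when called with `allow_nan=False`. The stdlib-only sketch below applies the same mapping at the record level, i.e. after a `to_dict(orient='records')`-style conversion. `nan_to_none` and the `id`/`amount` keys are illustrative assumptions, not names from the project. Depending on the pandas version, `DataFrame.where(..., None)` can leave NaN in float64 columns, so a record-level pass like this is one possible safeguard. Because `numpy.float64` subclasses Python's `float`, the `isinstance` check also covers NumPy floats.

```python
import json
import math


def nan_to_none(records: list) -> list:
    """Return a copy of `records` with float NaN values replaced by None (hypothetical helper)."""
    return [
        {
            key: None if isinstance(value, float) and math.isnan(value) else value
            for key, value in row.items()
        }
        for row in records
    ]


records = [{"id": 1, "amount": float("nan")}, {"id": 2, "amount": 3.5}]

# By default json emits the non-standard token NaN ...
print(json.dumps(records))
# -> [{"id": 1, "amount": NaN}, {"id": 2, "amount": 3.5}]

# ... and strict mode refuses NaN outright.
try:
    json.dumps(records, allow_nan=False)
except ValueError:
    print("strict encoding rejected NaN")

# After the mapping, NaN becomes null and strict encoding succeeds.
print(json.dumps(nan_to_none(records), allow_nan=False))
# -> [{"id": 1, "amount": null}, {"id": 2, "amount": 3.5}]
```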
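
The POST handler builds its parameters with `{**cons.default_input_params_dict, **body}`. Keys present in the request body override the defaults, because later entries win in a dict display, and any key the body omits keeps its default value. One consequence is that the merge itself does not reject a misspelt key, such as `random_seed` instead of `use_random_seed`. The sketch below shows the same merge with an optional check for unknown keys. `merge_params` and the `DEFAULTS` values are hypothetical placeholders, not the project's actual defaults.

```python
# Hypothetical placeholder defaults standing in for cons.default_input_params_dict.
DEFAULTS = {"n_users": 100, "use_random_seed": 0, "n_itr": 1}


def merge_params(defaults: dict, body: dict) -> dict:
    """Overlay a request body on the defaults, rejecting unknown keys (hypothetical helper)."""
    unknown = sorted(set(body) - set(defaults))
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")
    # In a dict display, later entries win, so body values override defaults.
    return {**defaults, **body}


print(merge_params(DEFAULTS, {"n_users": 5}))
# -> {'n_users': 5, 'use_random_seed': 0, 'n_itr': 1}
```

Whether unknown keys should raise an error or simply be ignored is a design choice; raising surfaces client typos early, at the cost of rejecting requests that previously succeeded.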