2 changes: 2 additions & 0 deletions .dockerignore
@@ -0,0 +1,2 @@
env
__pycache__
56 changes: 39 additions & 17 deletions Dockerfile
@@ -1,17 +1,39 @@
# Use official Python runtime as a parent image
FROM python:3.9

#Set the working directory in the container
WORKDIR /app

# Copy server.py into the container at /app
COPY server.py /app

#Install Flask
RUN pip install Flask

#Set SERVER_ID env variable
ENV SERVER_ID "1"

# Run server.py when the container launches
CMD ["python", "server.py"]
FROM python:3.10.12-slim

WORKDIR /app

# Python dependencies for the load balancer (asyncio ships with Python 3.10's
# standard library, so it is not installed via pip)
RUN pip install flask requests httpx Flask-APScheduler matplotlib

# Install sudo and the prerequisites for Docker's apt repository
RUN apt-get update && apt-get -y install sudo ca-certificates curl gnupg

# Add Docker's official GPG key and apt repository
RUN install -m 0755 -d /etc/apt/keyrings
RUN curl -fsSL https://download.docker.com/linux/debian/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
RUN chmod a+r /etc/apt/keyrings/docker.gpg
RUN echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  tee /etc/apt/sources.list.d/docker.list > /dev/null
RUN apt-get -y update

# The docker CLI lets the load balancer spawn and remove server containers
RUN apt-get -y install docker-ce-cli

# Non-root user with passwordless sudo and membership in the docker group
ENV USER=theuser
RUN adduser --home /home/$USER --disabled-password --gecos "" $USER \
  && echo "$USER ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/$USER \
  && chmod 0440 /etc/sudoers.d/$USER \
  && groupadd docker \
  && usermod -aG docker $USER \
  && chsh -s /bin/bash $USER
# bash rather than zsh: zsh is not installed in the slim base image
USER $USER

ENV HOME=/home/$USER

COPY . /app

# Document the load balancer's listening port and start it
EXPOSE 5000

CMD ["python", "load_balancer.py"]
161 changes: 153 additions & 8 deletions README.md
@@ -2,14 +2,159 @@
### This repository contains the implementation of a load balancer using Docker, aimed at efficiently distributing requests among several servers. The load balancer routes requests from multiple clients asynchronously to ensure nearly even distribution of the load across the server replicas.

## Task Description
### Task One
1. Implement a load balancer that routes requests among several servers asynchronously.
2. Use Docker to manage the deployment of the load balancer and servers within a Docker network.
3. Implement a simple web server in Python to handle HTTP requests on specified endpoints (/home and /heartbeat).
4. Use consistent hashing data structure for efficient request distribution.
5. Ensure fault tolerance by spawning new replicas of servers in case of failures.
6. Write clean and well-documented code, along with a README file detailing design choices, assumptions, testing, and performance analysis.
7. Provide a Makefile for deploying and running the code, and version control the project using Git.
### TASK ONE: SERVER
* **Endpoint (/home, method=GET)**: This endpoint returns a string with a unique identifier to distinguish among the replicated server containers. For instance, if a client requests this endpoint and the load balancer schedules the request to Server: 3, then an example return string would be Hello from Server: 3.
Hint: The server ID can be set as an env variable while running a container instance from the docker image of the server.
- Command: ```curl -X GET -H "Content-type: application/json" http://localhost:5000/home```
- Response: ```{"message": "Hello from server_1"}```

* **Endpoint (/heartbeat, method=GET)**: This endpoint sends heartbeat responses upon request. The load balancer further
uses the heartbeat endpoint to identify failures in the set of containers maintained by it. Therefore, you could send an empty
response with a valid response code.
- Command: ```curl -X GET -H "Content-type: application/json" http://localhost:5000/heartbeat```
- Response: ```{}```
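
A minimal sketch of such a server, assuming Flask and the `SERVER_ID` environment variable from the hint (the actual `server.py` in this repo may differ):

```python
# Minimal sketch only: assumes Flask; SERVER_ID comes from the environment.
import os

from flask import Flask, jsonify

app = Flask(__name__)
SERVER_ID = os.environ.get("SERVER_ID", "1")


@app.route("/home", methods=["GET"])
def home():
    # The unique identifier distinguishes replicated server containers
    return jsonify(message=f"Hello from server_{SERVER_ID}"), 200


@app.route("/heartbeat", methods=["GET"])
def heartbeat():
    # An empty body with a valid response code is enough for failure detection
    return jsonify({}), 200


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```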

### TASK TWO: CONSISTENT HASHING
* In this task, you need to implement a consistent hash map using an array, linked list, or any other data structure. The details of this map data structure are given in Appendix A. Use the following parameters and hash functions for your implementation.
- Number of Server Containers managed by the load balancer (N) = 3
- Total number of slots in the consistent hash map (#slots) = 512
- Number of virtual servers for each server container (K) = log₂(512) = 9
- Hash function for request mapping: H(i) = i² + 2i + 17
- Hash function for virtual server mapping: Φ(i, j) = i² + j² + 2j + 25

* **Consistent Hashing Algorithm Implementation**
- Implementation Details:
- Uses array-based data structure
- Number of Server Containers (N): 3
- Total number of slots in the consistent hash map (#slots): 512
- Number of virtual servers for each server container (K): log₂(512) = 9
- Hash functions used:
- Hash function for request mapping: H(i) = i² + 2i + 17
- Hash function for virtual server mapping: Φ(i, j) = i² + j² + 2j + 25
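
As a quick worked example of these hash functions (the modulo keeps slot indices in range, matching the `consistent_hash.py` implementation below):

```python
SLOTS = 512

def H(i: int) -> int:            # request mapping
    return (i*i + 2*i + 17) % SLOTS

def PHI(i: int, j: int) -> int:  # virtual server mapping
    return (i*i + j*j + 2*j + 25) % SLOTS

print(H(5))       # 5² + 2·5 + 17 = 52
print(PHI(3, 2))  # 3² + 2² + 2·2 + 25 = 42
```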

### TASK THREE: LOAD BALANCER
* **Endpoint (/add, method=POST)**: This endpoint adds new server instances to the load balancer to scale up with increasing client numbers in the system. The endpoint expects a JSON payload that mentions the number of new instances and their preferred hostnames (same as the container names in docker) in a list. An example request and response are shown below.
- Command: ``` curl -X POST -H "Content-Type: application/json" --data-binary "{\"n\": 4, \"hostnames\": [\"server11\", \"server12\", \"server13\", \"new_servers4\"]}" http://localhost:5000/add ```
- Response: ```{"message": {"N": 5,"replicas": ["server12","new_servers4","server_1","server11","server13"]},
"status": "successful"}```
* Perform simple sanity checks on the request payload, ensuring that the number of hostnames mentioned in the payload is less than or equal to the number of newly added instances. Hostnames are optional: if none are provided, the hostnames (container names) are set randomly. However, sending a hostname list longer than the number of newly added instances results in an error, as below (a sketch of this validation appears at the end of this section).
- Command: ```curl -X POST -H "Content-Type: application/json" --data-binary "{\"n\": 2, \"hostnames\": [\"server11\", \"server12\", \"server13\", \"new_servers4\"]}" http://localhost:5000/add```
- Response: ```{"message": "<Error> Length of hostname list is more than newly added instances","status": "failure"}```

* **Endpoint (/rep, method=GET)**: This endpoint returns the status of the replicas managed by the load balancer. The response contains the number of replicas and their hostnames in the Docker internal network (n1). An example response is shown below.
- Command: ``` curl -X GET -H "Content-type: application/json" http://localhost:5000/rep ```
- Response: ```{"message": {"N": 0,"replicas": []},"status": "successful"}```

* **Endpoint (/rm, method=DELETE)**: This endpoint removes server instances from the load balancer to scale down with decreasing clients or for system maintenance. The endpoint expects a JSON payload that mentions the number of instances to be removed and their preferred hostnames (same as the container names in docker) in a list. An example request and response are shown below.
- Command: ```curl -X DELETE -H "Content-Type: application/json" --data-binary "{\"n\": 2, \"hostnames\": [\"server11\", \"server12\"]}" http://localhost:5000/rm```
- Response: ```{"message": {"N": 3,"replicas": ["new_servers4","server_1","server13"]},"status": "successful"}```
- *shows that server11 and server12 have been removed successfully*

* **Endpoint (/checkpoint, method=GET)**: This endpoint is used to view all of the servers that are currently in use. Additionally, it provides the number of requests each server has handled, which aids in visualising the load balancing. An example request and response are shown below.
- Command: ```curl -X GET -H "Content-type: application/json" http://localhost:5000/checkpoint```
- Response: ```{"requests": {"new_servers4": 10,"server13": 28,"server_1": 63}, "servers": ["new_servers4","server_1","server13"]}```
- *Using the existing servers, "new_servers4", "server_1", and "server13", you can see the distribution of requests to each server after sending 100 requests to the ```/home``` endpoint*

* **Endpoint (/graph, method=GET)**: This endpoint creates a bar graph from the distribution data of the ```/checkpoint``` endpoint, with the server names on the x-axis and the number of requests on the y-axis. An example request and response are shown below (a sketch of such a handler follows at the end of this section).
- Command: ```curl -X GET -H "Content-type: application/json" -o endpoint_example.png http://localhost:5000/graph```
- Response: ``` % Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 23576 100 23576 0 0 57646 0 --:--:-- --:--:-- --:--:-- 57784```
- *a new graph, "endpoint_example.png", is created that shows the distribution seen in the ```/checkpoint``` data*
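
A minimal sketch of the `/add` payload validation described above, assuming Flask and a hypothetical `spawn_container` helper (the actual `load_balancer.py` may differ):

```python
import random
import string
import subprocess

from flask import Flask, jsonify, request

app = Flask(__name__)
replicas = []  # hostnames of live server containers


def spawn_container(name: str):
    # Hypothetical helper: the image and network names are placeholders
    subprocess.run(["docker", "run", "-d", "--name", name,
                    "--network", "net1", "-e", f"SERVER_ID={name}", "server-image"],
                   check=False)


@app.route("/add", methods=["POST"])
def add():
    payload = request.get_json()
    n = payload.get("n", 0)
    hostnames = payload.get("hostnames", [])

    # Sanity check: the hostname list may not exceed the number of new instances
    if len(hostnames) > n:
        return jsonify(message="<Error> Length of hostname list is more than newly added instances",
                       status="failure"), 400

    # Fill in random container names for unnamed instances
    while len(hostnames) < n:
        hostnames.append("server_" + "".join(random.choices(string.ascii_lowercase, k=6)))

    for name in hostnames:
        spawn_container(name)
        replicas.append(name)

    return jsonify(message={"N": len(replicas), "replicas": replicas}, status="successful"), 200
```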
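
And a sketch of how the `/graph` handler could render the `/checkpoint` counts with matplotlib (illustrative data; the real handler would read the live counters):

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless rendering inside the container
import matplotlib.pyplot as plt
from flask import Flask, send_file

app = Flask(__name__)
request_counts = {"new_servers4": 10, "server13": 28, "server_1": 63}  # illustrative


@app.route("/graph", methods=["GET"])
def graph():
    # Bar chart: server names on the x-axis, request counts on the y-axis
    fig, ax = plt.subplots()
    ax.bar(list(request_counts.keys()), list(request_counts.values()))
    ax.set_xlabel("Server")
    ax.set_ylabel("Number of requests")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    buf.seek(0)
    return send_file(buf, mimetype="image/png")
```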

### TASK FOUR: ANALYSIS
* Launch 10000 async requests on N = 3 server containers and report the request count handled by each server instance in a bar chart. Explain your observations in the graph and your view on the performance.
- After adding three servers ('1', '2', '3'):
- Ran 10,000 requests against the ```/home``` endpoint and used the ```/server_stats``` endpoint to view the number of requests each server received (a sketch of the async request launcher is shown after the observations below).
- Graph:
- <img src="https://raw.githubusercontent.com/Jeevyy/Load-Balancer---Distributed-Systems/jeevan-develop/images/first/3%20servers.png" width="400">
- Observations:
- Servers 1 and 2 bear the majority of the load, with Server 2 consistently handling the highest load followed by Server 1.
- In contrast, Server 3 carries a significantly lighter load compared to the other servers, indicating a potential imbalance in the load distribution strategy.
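
The async request launcher for these experiments might look like the following sketch (assuming `httpx` from the Dockerfile's dependencies; the per-server counting keys off the `message` string each server returns):

```python
import asyncio
from collections import Counter

import httpx


async def fire(client: httpx.AsyncClient, counts: Counter):
    r = await client.get("http://localhost:5000/home")
    # Each response is {"message": "Hello from server_<id>"}; count per server
    counts[r.json()["message"]] += 1


async def main(n: int = 10000):
    counts = Counter()
    async with httpx.AsyncClient() as client:
        await asyncio.gather(*(fire(client, counts) for _ in range(n)))
    print(counts)


asyncio.run(main())
```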


* Next, increment N from 2 to 6 and launch 10000 requests on each such increment. Report the average load of the servers at each run in a line chart. Explain your observations in the graph and your view on the scalability of the load balancer implementation.
- When N = 2
- Graph:
- <img src="https://raw.githubusercontent.com/Jeevyy/Load-Balancer---Distributed-Systems/jeevan-develop/images/first/2%20servers.png" width="400">
- When N = 3
- Graph:
- <img src="https://raw.githubusercontent.com/Jeevyy/Load-Balancer---Distributed-Systems/jeevan-develop/images/first/3%20servers.png" width="400">
- When N = 4
- Graph:
- <img src="https://raw.githubusercontent.com/Jeevyy/Load-Balancer---Distributed-Systems/jeevan-develop/images/first/4%20servers.png" width="400">
- When N = 5
- Graph:
- <img src="https://raw.githubusercontent.com/Jeevyy/Load-Balancer---Distributed-Systems/jeevan-develop/images/first/5%20servers.png" width="400">
- When N = 6
- Graph:
- <img src="https://raw.githubusercontent.com/Jeevyy/Load-Balancer---Distributed-Systems/jeevan-develop/images/first/6%20servers.png" width="400">
- Average load across each server with 10,000 requests:
- Graph:
- <img src="https://raw.githubusercontent.com/Jeevyy/Load-Balancer---Distributed-Systems/jeevan-develop/images/second/average.png" width="400">
- Observation:
- In the case of two servers, both servers handle a relatively similar load, with Server 1 slightly edging out Server 2.
- This suggests a relatively balanced load distribution, although there is still room for improvement to ensure equitable resource utilization across the servers.

* Test all endpoints of the load balancer and show that in case of server failure, the load balancer spawns a new instance quickly to handle the load.
- By simulating a forced exit of a server and measuring the latency before a replacement server comes up, the results are shown in the table below, where ```N``` represents the number of servers (a sketch of the failure-detection loop follows the table):
| Number of Servers | Latency for New Server Spawn |
|-------------------|------------------------------|
| 2 servers | 1.2313279 seconds |
| 3 servers | 0.8681224 seconds |
| 4 servers | 0.2518046 seconds |
| 5 servers | 0.2355351 seconds |
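
A sketch of the failure-detection loop behind these numbers, assuming the `requests` library and the docker CLI installed in the Dockerfile (interval, timeout, image, and network names are illustrative):

```python
import subprocess
import time

import requests


def respawn(dead: str, replicas: list[str]):
    # Hypothetical replacement: reuse the dead hostname for the new container
    subprocess.run(["docker", "rm", "-f", dead], check=False)
    subprocess.run(["docker", "run", "-d", "--name", dead,
                    "--network", "net1", "server-image"], check=False)
    replicas.append(dead)


def monitor(replicas: list[str], interval: float = 1.0):
    # Poll each replica's /heartbeat; respawn any that fail to answer
    while True:
        for host in list(replicas):
            try:
                r = requests.get(f"http://{host}:5000/heartbeat", timeout=0.5)
                r.raise_for_status()
            except requests.RequestException:
                replicas.remove(host)
                respawn(host, replicas)
        time.sleep(interval)
```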

* Finally, modify the hash functions H(i), Φ(i, j) and report the observations from (A-1) and (A-2).
- To achieve a better distribution, the following changes were made to the consistent hashing function:
- Number of Server Containers managed by the load balancer (N) = 3
- Total number of slots in the consistent hash map (#slots) = 512
- Number of virtual servers for each server container (K) = 20
- Hash function for request mapping H(i) = Rid % M
- Hash function for virtual server mapping Φ(i, j) = (Sid + i) % M
- Although there is still a bias towards one of the servers, the load balancer effectively balances the load across all servers
- Additionally, after closing a healthy server using the ```/rm``` endpoint, all of its previous requests are redistributed among the remaining healthy servers (the modified functions are sketched below)
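
Reproducing the modified functions as code, under stated assumptions (`Rid` is the request ID, `Sid` the numeric server ID, and `M` the number of slots; the virtual-server term is read as the replica index `j`, since a term that does not vary with `j` would place all K virtual servers in a single slot):

```python
M = 512  # total number of slots


def h_modified(rid: int) -> int:
    # Modified request mapping: H(i) = Rid % M
    return rid % M


def phi_modified(sid: int, j: int) -> int:
    # Modified virtual-server mapping, read as Φ(i, j) = (Sid + j) % M
    return (sid + j) % M
```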

* Launch 10000 async requests on N = 3 server containers and report the request count handled by each server instance in a bar chart. Explain your observations in the graph and your view on the performance.
- After adding three servers ('1', '2', '3'):
- Graph:
- <img src="https://raw.githubusercontent.com/Jeevyy/Load-Balancer---Distributed-Systems/jeevan-develop/images/task4_a42_3servers.png" width="400">
- Observations:
- Server 1 consistently handles the highest load, followed by Server 2 and then Server 3. Despite minor variations, this trend remains consistent across the different server counts.
- The performance of the load balancer appears effective in distributing the load somewhat evenly across the servers, with Server 1 consistently bearing the highest load.

* Next, increment N from 2 to 6 and launch 10000 requests on each such increment. Report the average load of the servers at each run in a line chart. Explain your observations in the graph and your view on the scalability of the load balancer implementation.
- When N = 2
- Graph:
- <img src="https://raw.githubusercontent.com/Jeevyy/Load-Balancer---Distributed-Systems/jeevan-develop/images/task4_a42_2servers.png" width="400">
- When N = 3
- Graph:
- <img src="https://raw.githubusercontent.com/Jeevyy/Load-Balancer---Distributed-Systems/jeevan-develop/images/task4_a42_3servers.png" width="400">
- When N = 4
- Graph:
- <img src="https://raw.githubusercontent.com/Jeevyy/Load-Balancer---Distributed-Systems/jeevan-develop/images/task4_a42_4servers.png" width="400">
- When N = 5
- Graph:
- <img src="https://raw.githubusercontent.com/Jeevyy/Load-Balancer---Distributed-Systems/jeevan-develop/images/task4_a42_5servers.png" width="400">
- When N = 6
- Graph:
- <img src="https://raw.githubusercontent.com/Jeevyy/Load-Balancer---Distributed-Systems/jeevan-develop/images/task4_a42_6servers.png" width="400">
- Average load across each server with 10,000 requests:
- Graph:
- <img src="https://raw.githubusercontent.com/Jeevyy/Load-Balancer---Distributed-Systems/jeevan-develop/images/first/average.png" width="400">
- Observation:
- In the case of two servers, Server 1 consistently handles a higher load compared to Server 2, suggesting an imbalance in the load distribution.
- Despite this, the load balancer demonstrates a basic ability to distribute the load across multiple servers, compared to the first consistent-hashing configuration, which skewed more heavily toward the first two servers.

## Group Members
1. 137991 - Jesse Kamau
59 changes: 59 additions & 0 deletions consistent_hash.py
@@ -0,0 +1,59 @@
import hashlib

class ConsistantHash:
    def __init__(self):
        self.slots = 512                          # total slots on the hash ring
        self.k = 20                               # virtual servers per container
        self.consistant_hash = [0] * self.slots   # 0 marks an empty slot
        self.map = {}                             # server name -> numeric server id

    def h(self, i: int) -> int:
        # Request-mapping hash: H(i) = i² + 2i + 17 (mod #slots)
        return (i*i + 2*i + 17) % self.slots

    def fi(self, i: int, j: int) -> int:
        # Virtual-server hash: Φ(i, j) = i² + j² + 2j + 25 (mod #slots)
        return (i*i + j*j + 2*j + 25) % self.slots

    def get_server_id(self, server: str) -> int:
        # Derive a stable numeric id from the server name
        return int(hashlib.md5(server.encode()).hexdigest(), 16) % self.slots

    def build(self, server_list: set[str]):
        for server in server_list:
            self.add_server_to_hash(server)

    def get_server_from_request(self, request_id: int) -> str | None:
        # Hash the request to a slot, then probe forward to the next occupied slot
        req_pos = self.h(request_id)
        for _ in range(self.slots):
            if self.consistant_hash[req_pos] != 0:
                return self.consistant_hash[req_pos]
            req_pos = (req_pos + 1) % self.slots
        return None

    def add_server_to_hash(self, server: str):
        # Place k virtual copies of the server on the ring, linear-probing on collision
        server_id = self.get_server_id(server)
        for j in range(self.k):
            pos = self.fi(server_id, j)
            if self.consistant_hash[pos] == 0:
                self.consistant_hash[pos] = server
            else:
                original_pos = pos
                while self.consistant_hash[pos] != 0:
                    pos = (pos + 1) % self.slots
                    if pos == original_pos:
                        raise Exception("Hash table is full")
                self.consistant_hash[pos] = server
        self.map[server] = server_id

    def remove_server_from_hash(self, server: str, request_counts: dict):
        # Clear every virtual slot owned by the server ...
        for i in range(self.slots):
            if self.consistant_hash[i] == server:
                self.consistant_hash[i] = 0
        del self.map[server]

        # ... then redistribute its request count evenly among the survivors
        total_requests = request_counts.pop(server, 0)
        servers = list(self.map.keys())
        if servers:
            requests_per_server = total_requests // len(servers)
            for s in servers:
                request_counts[s] += requests_per_server
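
A brief usage sketch of the class above (hypothetical server names; request IDs would normally come from incoming requests):

```python
from consistent_hash import ConsistantHash

ch = ConsistantHash()
ch.build({"server_1", "server_2", "server_3"})

# Route a request: hash its ID, then walk to the next occupied slot
print(ch.get_server_from_request(12345))  # one of the three server names

# Remove a server and fold its request count into the survivors
counts = {"server_1": 10, "server_2": 8, "server_3": 6}
ch.remove_server_from_hash("server_2", counts)
print(counts)  # {"server_1": 14, "server_3": 10}
```
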
Binary file added images/first/2 servers.png
Binary file added images/first/3 servers.png
Binary file added images/first/4 servers.png
Binary file added images/first/5 servers.png
Binary file added images/first/6 servers.png
Binary file added images/first/average.png
Binary file added images/second/2 servers.png
Binary file added images/second/3 servers.png
Binary file added images/second/4 servers.png
Binary file added images/second/5 servers.png
Binary file added images/second/6 servers.png
Binary file added images/second/average.png
Binary file added images/task4_a42_2servers.png
Binary file added images/task4_a42_3servers.png
Binary file added images/task4_a42_4servers.png
Binary file added images/task4_a42_5servers.png
Binary file added images/task4_a42_6servers.png