From f6674047e5e282dce4c6d86c87b6afdc1d034960 Mon Sep 17 00:00:00 2001 From: To0nsa <162323294+To0nsa@users.noreply.github.com> Date: Mon, 18 Aug 2025 23:39:31 +0300 Subject: [PATCH 1/8] Update README.md --- README.md | 316 +++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 241 insertions(+), 75 deletions(-) diff --git a/README.md b/README.md index 9e92e033..5b927451 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -# webserv +# Webserv [![Build Status](https://github.com/to0nsa/webserv/actions/workflows/build.yml/badge.svg)](https://github.com/to0nsa/webserv/actions/workflows/build.yml) [![Docs Status](https://github.com/to0nsa/webserv/actions/workflows/docs.yml/badge.svg?branch=main)](https://to0nsa.github.io/webserv/) @@ -12,104 +12,239 @@ ___ -## Build & Test Instructions +## Core of Webserv -### Build with Makefile +| Component | Responsibility | +| ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **`Server`** | Represents a virtual host configuration. Manages binding (host + port), server names, error pages, body size limits, and a collection of `Location` blocks. | +| **`Location`** | Encapsulates route-specific configuration. Defines path matching, allowed HTTP methods, root directories, index files, redirects, CGI interpreters, and upload stores. | +| **`runWebserv`** | Orchestrates the execution of the web server: initializes `Server` objects from the parsed configuration, launches sockets, and enters the event loop. | -```bash -make -./bin/webserv configs/default.conf -``` +___ + +### Program Entrypoint + + 1. Parse CLI arguments or fallback to default configuration. + 2. Load and normalize configuration (via config parser). + 3. Validate `Server` and `Location` objects. + 4. Call **`runWebserv()`** to start the server runtime. 
+ +The program is designed so that **configuration and validation are complete before runtime begins**, ensuring that only consistent and safe server objects are passed to the execution loop. + +___ + +## Configuration Parsing Flow + +This section describes how the configuration parsing logic of **Webserv** works, including the step‑by‑step pipeline and the rules applied during parsing and validation. + +
+See Details + +### 1. Tokenization + +* **Component:** `Tokenizer` +* **Goal:** Convert raw configuration text into a structured list of tokens. +* **Steps:** + + * Skip UTF‑8 BOM if present. + * Ignore whitespace, line breaks, and comments (`# ...`). + * Classify tokens into categories: -Available Makefile targets: + * **Keywords:** `server`, `location`, `listen`, `host`, `root`, `index`, `autoindex`, `methods`, `upload_store`, `return`, `error_page`, `client_max_body_size`, `cgi_extension`. + * **Identifiers:** Alphanumeric strings with `-`, `.`, `/`, `:` allowed. + * **Numbers & Units:** Digits with optional single‑letter suffix (`k`, `m`, `g`). + * **Strings:** Quoted values (single `'` or double `"`). + * **Symbols:** `{`, `}`, `;`, `,`. + * Detect and reject invalid characters, control characters, or malformed identifiers. -### Build Modes +### 2. Parsing -| Command | Description | -|----|----| -| `make` | Build in release mode (optimized) | -| `make debug` | Build in debug mode (with `-g` and no optimizations) | -| `make debug_asan` | Build in debug mode with AddressSanitizer | -| `make debug_ubsan`| Build in debug mode with UndefinedBehaviorSanitizer | -| `make fast` | Fast build without dependency tracking (development only) | +* **Component:** `ConfigParser` +* **Goal:** Transform token stream into structured objects (`Config`, `Server`, `Location`). +* **Rules:** -### Code quality + * **Block structure:** Curly braces `{ ... }` delimit `server` and `location` blocks. + * **Directives:** Each directive must end with `;` unless it opens a block. + * **Directive placement:** Certain directives are only valid at specific levels: -| Command | Description | -|-----|----| -| `make format` | Format all `.cpp` and `.hpp` files using `clang-format` | + * Server level: `listen`, `host`, `server_name`, `error_page`, `client_max_body_size`. + * Location level: `root`, `index`, `autoindex`, `methods`, `upload_store`, `return`, `cgi_extension`, `cgi_interpreter`. 
+ * **Nesting:** Locations may not contain other `server` blocks. -### Run and Test +### 3. Configuration Objects -| Command | Description | -|-----|----| -| `make run` | Build and run the web server | -| `make test` | Build and run all test binaries from `tests/` folder | -| `make sanitize` | Build and run under all sanitizers (ASAN, TSAN, UBSAN) | +* **Server:** Represents a virtual host. -### Cleaning + * Holds host, port, server names, error pages, body size limits, and `Location` blocks. +* **Location:** Defines behavior for a URI path prefix. -| Command | Description | -|-----|----| -| `make clean` | Remove all object files and dependency files | -| `make fclean`| Remove everything: binaries, builds, tests | -| `make re` | Full clean and rebuild | + * Includes root directory, index file(s), autoindex flag, allowed methods, redirects, CGI settings, and upload store. -### Help +### 4. Normalization -| Command | Description | -|-----|---| -| `make help` | Displays a categorized list of all available `Makefile` targets | +* After parsing, the configuration is **normalized** to ensure consistency and defaults: + + * Missing `client_max_body_size` → default = **1 MB**. + * Missing `error_page` → add defaults for common errors (403, 404, 500, 502 → `/error.html`). + * Missing `methods` → defaults to **GET, POST, DELETE**. + * Locations without `root` → fallback to `/var/www` (unless redirected). + * Root location (`/`) without `index` → defaults to **index.html**. +* Normalization guarantees that later validation and runtime logic operate on a **complete and uniform** model. + +### 5. Validation + +* **Component:** `validateConfig` +* **Goal:** Enforce semantic correctness beyond syntax. +* **Checks applied:** + + * **Presence checks:** At least one `location` per `server`. + * **Path rules:** Location paths must start with `/` and not contain segments beginning with `.`. 
+ * **Defaults:** Each location must define either a `root` or `return` (but not both with CGI). + * **Server names:** Must be unique per host\:port, valid per RFC 1035 (no spaces, no control chars, no empty labels). + * **Ports:** Only one unnamed default server per host\:port pair. + * **Error pages:** Codes restricted to 400–599. + * **Redirects:** Only 301, 302, 303, 307, 308 allowed. + * **Methods:** Only `GET`, `POST`, `DELETE` permitted. + * **Client body size:** Must be > 0. + * **CGI:** Extensions must start with a dot, interpreters must map 1‑to‑1 with declared extensions. + * **Roots & Upload stores:** Must exist and be directories. + * **Index:** Requires a valid `root`. + +### 6. Error Handling + +* **Tokenizer:** Throws `TokenizerError` with line/column context when encountering invalid tokens. +* **Parser:** Throws `ConfigParseError` on invalid structure or misplaced directives. +* **Validator:** Throws `ValidationError` with descriptive guidance on fixing invalid configurations. + +
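A few of the validation checks listed above can be sketched as small predicates. This is a minimal illustration, not the project's `validateConfig` code: every function name below is hypothetical, and only the rules themselves (path shape, error-page range, redirect codes, methods, CGI extension prefix) come from the list above.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical helpers for illustration; names are not taken from the Webserv sources.

// Path rule: must start with '/' and contain no segment beginning with '.'.
bool isValidLocationPath(std::string_view path) {
    if (path.empty() || path.front() != '/')
        return false;
    // Every segment starts right after a '/', so "/." marks a dot-segment.
    for (std::size_t i = 0; i + 1 < path.size(); ++i) {
        if (path[i] == '/' && path[i + 1] == '.')
            return false;
    }
    return true;
}

// Error pages: codes restricted to 400-599.
bool isValidErrorPageCode(int code) {
    return code >= 400 && code <= 599;
}

// Redirects: only 301, 302, 303, 307, 308 are accepted.
bool isAllowedRedirectCode(int code) {
    constexpr std::array<int, 5> allowed{301, 302, 303, 307, 308};
    return std::find(allowed.begin(), allowed.end(), code) != allowed.end();
}

// Methods: only GET, POST, DELETE are permitted.
bool isAllowedMethod(std::string_view method) {
    return method == "GET" || method == "POST" || method == "DELETE";
}

// CGI: an extension must start with a dot.
bool isValidCgiExtension(std::string_view ext) {
    return !ext.empty() && ext.front() == '.';
}
```

A real validator would presumably throw `ValidationError` with a descriptive message instead of returning `false`; the boolean form is used here only to keep the sketch short.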
___ -## Continuous Integration & Documentation +## Networking Core — `SocketManager` + +The heart of Webserv’s I/O: a single `poll()` loop multiplexing **listening sockets**, **client sockets**, and **CGI pipes**, with strict timeouts and robust error recovery. + +
+See Details + +### What it Does + +* **Listening sockets**: set up `bind()`/`listen()` for each configured host\:port. +* **Event loop**: run non-blocking `poll()` to monitor all descriptors. +* **Connections**: + + * **New connections** → `accept()` → initialize per-client state. + * **Reads** → receive → parse (supports pipelining) → route. + * **CGI** → spawn, monitor pipes, enforce timeouts, finalize. + * **Writes** → stream raw or file-backed responses with keep-alive and backpressure. +* **Timeouts**: enforce idle, header, body, and send deadlines. +* **Errors**: generate accurate HTTP error responses, close cleanly. + +___ + +### High-Level Flow + +```mermaid +flowchart TD + A[Listening Sockets] -->|poll()| B[Accept New Client] + B --> C[Initialize ClientInfo] + C --> D[Client Socket Ready?] + D -->|POLLIN| E[Receive Data] + E --> F[Parse Requests] + F --> G[Route or CGI] + G -->|CGI? yes| H[Spawn & Monitor CGI] + G -->|CGI? no| I[Build Response] + H --> J[Collect CGI Output] + J --> I + I --> K[Queue Response] + K -->|POLLOUT| L[Send Raw/File Response] + L --> M{Keep-Alive?} + M -->|Yes| D + M -->|No| N[Cleanup + Close FD] +``` -> This project uses **GitHub Actions** to automate building, testing, and documentation deployment. +
-### ✅ CI Pipeline +___ -On each push or pull request to `main` or `dev`, the following jobs are run automatically: +## Flow Overview -| Job | Purpose | -|---|---| -| 🧪 Build (Release) | Builds the project using the provided `Makefile`. | -| 📄 Doxygen Docs | Generates and deploys Doxygen documentation to GitHub Pages. | +1. **Tokenizer** → breaks input into tokens. +2. **ConfigParser** → builds in‑memory `Config` with `Server` & `Location` objects. +3. **normalizeConfig** → fills missing defaults (sizes, error pages, roots, index, methods). +4. **validateConfig** → applies semantic checks. +5. **Runtime** → validated configuration is passed to the server for request routing. -All configurations rely on the project `Makefile` and follow the project's coding style. +The configuration pipeline guarantees that only syntactically valid, normalized, and semantically correct configurations are accepted. This ensures the server runs with predictable defaults, strong validation, and developer-friendly diagnostics. -### 🧼 Sanitizer Suppressions +___ + +## Continuous Integration & Documentation -To reduce noise in sanitizer reports, the `.asanignore` file suppresses: +This project leverages **GitHub Actions** to ensure code quality, stability, and up-to-date documentation. -- Known benign leaks from `libstdc++`, `libc`, and dynamic allocators -- Internal race conditions in `__sanitizer` symbols -- Undefined behavior in standard library internals +
+See Details -These help CI focus on bugs *in your code*, not external sources. +### CI Pipeline -### 📚 Documentation +* Runs automatically on pushes and pull requests to `main` and `dev`. +* Includes manual triggers (`workflow_dispatch`) and dependency checks after successful builds. -- Doxygen generates HTML docs from source code and Markdown (`README.md` is the main page) -- Graphviz is enabled for call graphs, class diagrams, and source browser -- Documentation is deployed automatically via GitHub Pages from the `docs/html` directory +**Jobs Overview:** + +| Job | Description | +| ------------ | ------------------------------------------------------------------------------------------------ | +| 🔨 **Build** | Compiles the project using the provided `Makefile` to ensure successful builds. | +| 🧪 **Test** | Builds the server, runs Python test suite against a live instance, and captures logs on failure. | +| 📚 **Docs** | Generates Doxygen documentation (with Graphviz diagrams) and deploys it to **GitHub Pages**. | + +
+ +Every code change is built, tested, and documented automatically, ensuring a robust development workflow and always-available reference docs. ___ -## Contributing +## Documentation + +This section describes how project documentation is generated, structured, and published. + +
+See Details + +### 1. Doxygen-Powered + +* Documentation is generated automatically from **source code comments** and **Markdown files**. +* `README.md` serves as the **entry point**, offering an overview and links to modules. + +### 2. Graphical Support + +* **Graphviz** integration produces: + + * **Class diagrams** to illustrate object hierarchies. + * **Call graphs** to visualize execution flow. + * **Dependency graphs** to map relationships between modules. +* These visuals improve comprehension of the server’s architecture. + +### 3. Navigation & Browsing + +* The source browser cross-references **functions, classes, and files**. +* Each documented entity links directly to its definition in the codebase. +* Groups (`@defgroup`, `@ingroup`) provide thematic navigation across modules (e.g., `config`, `core`, `http`). -Contribution guidelines and workflow standards are detailed in the dedicated document: +### 4. Deployment -- [📚 View Contributing Guide](CONTRIBUTING.md) +* Documentation is built in **CI/CD pipelines**. +* Published automatically via **GitHub Pages** from the `docs/html` directory. +* Ensures the latest version is always available for contributors and maintainers. -This document explains: +### 5. Best Practices -- The coding style -- The branching strategy (main, dev, feature branches) -- The commit message conventions (module: short description) -- How to structure pull requests properly -- The review and merge process -- Cleanup and quality rules before pushing code +* Consistent **Doxygen-style headers** across `.hpp` and `.cpp` files. +* Markdown files complement code documentation with **high-level design notes** and **workflow explanations**. +* Together, these guarantee both **low-level API reference** and **high-level architectural guidance**. + +
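To illustrate the header style and grouping tags described above, here is a sketch of a Doxygen-documented helper. The function `parseSizeLiteral` is hypothetical and not taken from the codebase. It mirrors the "digits with optional `k`, `m`, `g` suffix" rule from the tokenizer section and assumes binary multiples (1k = 1024 bytes), which the README does not state.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

/**
 * @defgroup config Configuration
 * @brief Tokenizing, parsing, normalizing and validating configuration files.
 */

/**
 * @ingroup config
 * @brief Converts a size literal such as `512`, `8k`, `1m` or `2g` to bytes.
 *
 * @param text Digits followed by an optional single-letter unit (k, m, g).
 * @return Size in bytes, or std::nullopt if the literal is malformed or zero.
 *
 * @note Hypothetical helper for illustration. Units are assumed to be
 *       binary multiples, and overflow is not checked.
 */
std::optional<std::size_t> parseSizeLiteral(std::string_view text) {
    std::size_t value = 0;
    std::size_t i = 0;
    // Accumulate the leading run of decimal digits.
    while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i]))) {
        value = value * 10 + static_cast<std::size_t>(text[i] - '0');
        ++i;
    }
    if (i == 0)
        return std::nullopt; // no digits at all

    std::size_t multiplier = 1;
    if (i < text.size()) {
        switch (text[i]) {
            case 'k': multiplier = static_cast<std::size_t>(1024); break;
            case 'm': multiplier = static_cast<std::size_t>(1024) * 1024; break;
            case 'g': multiplier = static_cast<std::size_t>(1024) * 1024 * 1024; break;
            default:  return std::nullopt; // unknown unit
        }
        ++i;
    }
    // Reject trailing characters and a zero size (body size must be > 0).
    if (i != text.size() || value == 0)
        return std::nullopt;
    return value * multiplier;
}
```

With `@ingroup config` on each documented entity, Doxygen lists the function on the `config` group page, which is what gives the thematic navigation mentioned in the section on browsing.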
___ @@ -123,28 +258,59 @@ webserv │ └── docs.yml # Doxygen documentation generation & GitHub Pages deploy ├── 📁 include/ # All public project headers, grouped by module (config, http, core, etc.) ├── 📁 src/ # Source files, mirrors the include/ structure -├── 📁 tests/ # Unit tests for various modules -├── 📁 configs/ # Test configuration files for parser/tokenizer -├── 📁 docs/ # Markdown documentation (DOCS.md, guides, etc.) -├── 📁 scripts/ # Helper scripts to run tests and sanitizer builds -├── .asanignore # Suppression rules for AddressSanitizer (e.g. libc++ internals) +├── 📁 test_webserv/ # Unit tests +├── 📁 configs/ # Default config file +├── 📁 docs/ # Documentation generated by doxygen ├── .clang-format # Enforces formatting rules (4-space indent, K&R braces, etc.) ├── .editorconfig # Shared IDE/editor config for consistent style ├── .gitattributes # Defines merge/diff rules for Git (e.g. binary files) ├── .gitignore # Files and folders ignored by Git (e.g. build/, *.o) -├── ACTIONPLAN.md # Project-level planning or roadmap -├── CONTRIBUTING.md # Guidelines for contributing to the project +├── ACTIONPLAN.md # Project-level planning/roadmap ├── DOXYGENSTYLEGUIDE.md # Doxygen conventions for documenting code ├── Doxyfile # Main config for Doxygen documentation generation -├── LICENSE # Project license (e.g. MIT, GPL) -├── Makefile # Build system entry point (defines targets like all, clean, fclean) -├── README.md # Main README shown on GitHub (overview, build, usage, etc.) +├── LICENSE # Project license +├── Makefile # Build system entry point +├── README.md # Main README ├── STYLEGUIDE.md # Coding conventions for naming, layout, formatting +├── run_test.py # Entrypoint for python tests ├── webserv.subject.pdf # Original subject specification for the project ``` ___ +## Build & Test Instructions + +### Build with Makefile + +```bash +make +./bin/webserv +``` + +> The default goal is `all`. The binary is produced at `bin/webserv`. 
+ +### Available Makefile Targets + +| Command | Description | +| ------------------------ | ------------------------------------------------------------------------------------------------------------------------------- | +| `make` | Build the project in release mode (C++20, `-O3 -flto -DNDEBUG -march=native`). | +| `make re` | Clean everything and rebuild from scratch. | +| `make clean` | Remove object files and dependency files in `objs/`. | +| `make fclean` | Remove the executable, `bin/`, and all build artifacts (also runs `clean`). | +| `make install_test_deps` | Create a local Python venv in `.venv/` and install `requirements-test.txt`. | +| `make test` | Build, start the server in background with `./test_webserv/tester/config/tester.conf`, run `run_test.py`, then stop the server. | +| `make format` | Run `clang-format -i` on all listed sources and headers. | +| `make help` | Print a categorized list of available targets. | + +### Notes + +* Objects and auto-generated deps are stored under `objs/` (built via `-MMD -MP`). +* The build uses explicit source lists (no wildcards) for deterministic builds. +* The test rule writes the PID to `.webserv_test.pid` and cleans it up on success/failure. +* Ensure `python3-venv` and `clang-format` are installed on your system. + +___ + ## License This project is licensed under the terms of the [MIT License](LICENSE). From 7aac12d2510399a0a17f384fa87bb1ba1533467f Mon Sep 17 00:00:00 2001 From: To0nsa <162323294+To0nsa@users.noreply.github.com> Date: Mon, 18 Aug 2025 23:44:47 +0300 Subject: [PATCH 2/8] Update README.md --- README.md | 48 +++++++++++++++++++++++++++++++++--------------- 1 file changed, 33 insertions(+), 15 deletions(-) diff --git a/README.md b/README.md index 5b927451..a3dd6151 100644 --- a/README.md +++ b/README.md @@ -146,21 +146,39 @@ ___ ```mermaid flowchart TD - A[Listening Sockets] -->|poll()| B[Accept New Client] - B --> C[Initialize ClientInfo] - C --> D[Client Socket Ready?] 
- D -->|POLLIN| E[Receive Data] - E --> F[Parse Requests] - F --> G[Route or CGI] - G -->|CGI? yes| H[Spawn & Monitor CGI] - G -->|CGI? no| I[Build Response] - H --> J[Collect CGI Output] - J --> I - I --> K[Queue Response] - K -->|POLLOUT| L[Send Raw/File Response] - L --> M{Keep-Alive?} - M -->|Yes| D - M -->|No| N[Cleanup + Close FD] + %% High-level: setup → poll loop → per-FD handling + subgraph Boot["Startup"] + A[Load servers] --> B[SocketManager(servers)] + B --> C[setupSockets()] + end + + C --> D{run(): poll()} + D -->|EINTR| Z[Graceful shutdown] + D -->|timeout| D + + %% CGI tick happens each loop before FD events + D --> E[handleCgiPollEvents()] + + %% Iterate FDs + E --> F{for each pfd in _poll_fds (rev)} + F --> G{is CGI pipe fd?} + G -->|yes| F + G -->|no| H{revents has ERR/HUP/NVAL?} + H -->|yes| H1[handlePollError()] --> F + + H -->|no| I{revents has POLLIN?} + I -->|yes| J{listen fd?} + J -->|yes| J1[handleNewConnection()] --> F + J -->|no| J2[handleClientData()] --> J3{queued response?} + J3 -->|yes| K[enable POLLOUT] --> F + J3 -->|no| F + + I -->|no| L{revents has POLLOUT && responses not empty?} + L -->|yes| L1[sendResponse()] --> F + L -->|no| F + + F --> D + Z --> ZZ[Logger: "Shutting down server"] ``` From 397d26e6548ba14a526c56cc1823bd583608b3a5 Mon Sep 17 00:00:00 2001 From: To0nsa <162323294+To0nsa@users.noreply.github.com> Date: Mon, 18 Aug 2025 23:47:33 +0300 Subject: [PATCH 3/8] Update README.md --- README.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index a3dd6151..12179345 100644 --- a/README.md +++ b/README.md @@ -146,12 +146,13 @@ ___ ```mermaid flowchart TD - %% High-level: setup → poll loop → per-FD handling - subgraph Boot["Startup"] + %% Setup + subgraph Boot[Startup] A[Load servers] --> B[SocketManager(servers)] B --> C[setupSockets()] end + %% Main loop C --> D{run(): poll()} D -->|EINTR| Z[Graceful shutdown] D -->|timeout| D @@ -160,25 +161,25 @@ flowchart TD D --> 
E[handleCgiPollEvents()] %% Iterate FDs - E --> F{for each pfd in _poll_fds (rev)} + E --> F{for each pfd in _poll_fds (reverse)} F --> G{is CGI pipe fd?} G -->|yes| F G -->|no| H{revents has ERR/HUP/NVAL?} H -->|yes| H1[handlePollError()] --> F H -->|no| I{revents has POLLIN?} - I -->|yes| J{listen fd?} + I -->|yes| J{is listen fd?} J -->|yes| J1[handleNewConnection()] --> F J -->|no| J2[handleClientData()] --> J3{queued response?} J3 -->|yes| K[enable POLLOUT] --> F J3 -->|no| F - I -->|no| L{revents has POLLOUT && responses not empty?} + I -->|no| L{revents has POLLOUT and responses not empty?} L -->|yes| L1[sendResponse()] --> F L -->|no| F F --> D - Z --> ZZ[Logger: "Shutting down server"] + Z --> ZZ[Logger: Shutting down server] ``` From 506946873ea04a799c71cc9dea29b0d5e3755e15 Mon Sep 17 00:00:00 2001 From: To0nsa <162323294+To0nsa@users.noreply.github.com> Date: Mon, 18 Aug 2025 23:51:20 +0300 Subject: [PATCH 4/8] Update README.md --- README.md | 55 +++++++++++++++++++++++++------------------------------ 1 file changed, 25 insertions(+), 30 deletions(-) diff --git a/README.md b/README.md index 12179345..5d96b366 100644 --- a/README.md +++ b/README.md @@ -146,40 +146,35 @@ ___ ```mermaid flowchart TD - %% Setup - subgraph Boot[Startup] - A[Load servers] --> B[SocketManager(servers)] - B --> C[setupSockets()] + subgraph Boot [Startup] + A[Load servers] --> B[Create SocketManager] + B --> C[Setup sockets] end - %% Main loop - C --> D{run(): poll()} - D -->|EINTR| Z[Graceful shutdown] - D -->|timeout| D - - %% CGI tick happens each loop before FD events - D --> E[handleCgiPollEvents()] - - %% Iterate FDs - E --> F{for each pfd in _poll_fds (reverse)} - F --> G{is CGI pipe fd?} - G -->|yes| F - G -->|no| H{revents has ERR/HUP/NVAL?} - H -->|yes| H1[handlePollError()] --> F - - H -->|no| I{revents has POLLIN?} - I -->|yes| J{is listen fd?} - J -->|yes| J1[handleNewConnection()] --> F - J -->|no| J2[handleClientData()] --> J3{queued response?} - J3 -->|yes| 
K[enable POLLOUT] --> F - J3 -->|no| F - - I -->|no| L{revents has POLLOUT and responses not empty?} - L -->|yes| L1[sendResponse()] --> F - L -->|no| F + C --> D{Run poll loop} + D --> E[Handle CGI events] + + subgraph Loop [Per FD handling] + E --> F{For each pfd} + F --> G{Is CGI pipe fd} + G -- yes --> F + G -- no --> H{ERR or HUP or NVAL} + H -- yes --> H1[handlePollError] --> F + + H -- no --> I{Has POLLIN} + I -- yes --> J{Is listen fd} + J -- yes --> J1[handleNewConnection] --> F + J -- no --> J2[handleClientData] --> J3{Queued response} + J3 -- yes --> K[Enable POLLOUT] --> F + J3 -- no --> F + + I -- no --> L{Has POLLOUT and responses not empty} + L -- yes --> L1[sendResponse] --> F + L -- no --> F + end F --> D - Z --> ZZ[Logger: Shutting down server] + D --> Z[Shutdown on signal] ``` From bd5f591bca8b8c4169cd6647f3764e070e613b5b Mon Sep 17 00:00:00 2001 From: To0nsa <162323294+To0nsa@users.noreply.github.com> Date: Mon, 18 Aug 2025 23:55:22 +0300 Subject: [PATCH 5/8] Update README.md --- README.md | 88 +++++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 62 insertions(+), 26 deletions(-) diff --git a/README.md b/README.md index 5d96b366..3cec8621 100644 --- a/README.md +++ b/README.md @@ -146,35 +146,71 @@ ___ ```mermaid flowchart TD - subgraph Boot [Startup] - A[Load servers] --> B[Create SocketManager] - B --> C[Setup sockets] - end - - C --> D{Run poll loop} - D --> E[Handle CGI events] - - subgraph Loop [Per FD handling] - E --> F{For each pfd} - F --> G{Is CGI pipe fd} - G -- yes --> F - G -- no --> H{ERR or HUP or NVAL} - H -- yes --> H1[handlePollError] --> F - - H -- no --> I{Has POLLIN} - I -- yes --> J{Is listen fd} - J -- yes --> J1[handleNewConnection] --> F - J -- no --> J2[handleClientData] --> J3{Queued response} - J3 -- yes --> K[Enable POLLOUT] --> F - J3 -- no --> F - - I -- no --> L{Has POLLOUT and responses not empty} - L -- yes --> L1[sendResponse] --> F - L -- no --> F + %% ========================= + %% Setup + 
%% ========================= + subgraph Boot[Startup] + A[Load servers from config] + B[Create SocketManager] + C[setupSockets: bind + listen + register FDs] + A --> B --> C end + %% ========================= + %% Main loop + %% ========================= + C --> D{run(): poll(...)} + + %% Signals / EINTR + D -->|EINTR| Z[Graceful shutdown] + Z --> ZZ[Logger: "Shutting down server"] + + %% Loop tick + D --> E[handleCgiPollEvents] + E --> F{for each pfd in _poll_fds (reverse)} + + %% Skip CGI pipe FDs entirely + F --> G{is CGI pipe FD?} + G -- yes --> F + + %% Timeouts (checked per-FD) + G -- no --> T{checkClientTimeouts} + T -- idle/send -> closed --> F + T -- header/body -> 408 queued --> H{revents has ERR/HUP/NVAL?} + T -- none --> H + + %% Socket errors + H -- yes --> H1[handlePollError + close] --> F + + %% Reads + H -- no --> I{revents has POLLIN?} + I -- no --> O{revents has POLLOUT and responses not empty?} + + I -- yes --> J{is listen FD?} + J -- yes --> J1[handleNewConnection] --> F + + J -- no --> Rcv[handleClientData] + Rcv -->|recv ok| Parse{parseAndQueueRequests} + Parse -- incomplete --> F + Parse -- queued --> Proc[processPendingRequests] + Proc -->|shouldSpawnCgi| CGI[handleCgiRequest + mark running] --> F + Proc -->|static/regular| Q[response queued] + Q --> K[enable POLLOUT] --> F + + %% Writes + O -- yes --> S[sendResponse] + S -->|file response| SF[sendFileResponse] + S -->|raw response| SR[sendRawResponse] + + %% Keep-alive / close after send + SF --> EndSend{response done?} + SR --> EndSend + EndSend -- yes & Connection: close --> Close[cleanupClientConnectionClose] --> F + EndSend -- yes & keep-alive --> KA[disable POLLOUT] --> F + EndSend -- not yet --> F + + %% Poll again F --> D - D --> Z[Shutdown on signal] ``` From 2741e718fd54ff9164e86fb261aa760a801428f6 Mon Sep 17 00:00:00 2001 From: To0nsa <162323294+To0nsa@users.noreply.github.com> Date: Tue, 19 Aug 2025 00:00:10 +0300 Subject: [PATCH 6/8] Update README.md --- README.md | 49 
+++++++++++++++++++++---------------------------- 1 file changed, 21 insertions(+), 28 deletions(-) diff --git a/README.md b/README.md index 3cec8621..661ee266 100644 --- a/README.md +++ b/README.md @@ -152,65 +152,58 @@ flowchart TD subgraph Boot[Startup] A[Load servers from config] B[Create SocketManager] - C[setupSockets: bind + listen + register FDs] + C[setupSockets: bind, listen, register FDs] A --> B --> C end %% ========================= %% Main loop %% ========================= - C --> D{run(): poll(...)} - - %% Signals / EINTR - D -->|EINTR| Z[Graceful shutdown] - Z --> ZZ[Logger: "Shutting down server"] + C --> D{run: poll} + D -->|EINTR| Z[Graceful shutdown] --> ZZ[Log "Shutting down server"] %% Loop tick D --> E[handleCgiPollEvents] - E --> F{for each pfd in _poll_fds (reverse)} + E --> F{for each pfd (reverse)} - %% Skip CGI pipe FDs entirely - F --> G{is CGI pipe FD?} + %% Skip CGI pipe FDs + F --> G{CGI pipe FD?} G -- yes --> F - %% Timeouts (checked per-FD) + %% Timeouts (per FD) G -- no --> T{checkClientTimeouts} - T -- idle/send -> closed --> F - T -- header/body -> 408 queued --> H{revents has ERR/HUP/NVAL?} + T -- idle/send --> Close[cleanupClientConnectionClose] --> F + T -- header/body --> Halt[disable POLLIN; queue 408] --> H{ERR/HUP/NVAL?} T -- none --> H %% Socket errors - H -- yes --> H1[handlePollError + close] --> F + H -- yes --> H1[handlePollError; close] --> F %% Reads - H -- no --> I{revents has POLLIN?} - I -- no --> O{revents has POLLOUT and responses not empty?} + H -- no --> I{POLLIN?} + I -- no --> O{POLLOUT && responses?} - I -- yes --> J{is listen FD?} + I -- yes --> J{listen FD?} J -- yes --> J1[handleNewConnection] --> F J -- no --> Rcv[handleClientData] - Rcv -->|recv ok| Parse{parseAndQueueRequests} - Parse -- incomplete --> F - Parse -- queued --> Proc[processPendingRequests] - Proc -->|shouldSpawnCgi| CGI[handleCgiRequest + mark running] --> F - Proc -->|static/regular| Q[response queued] - Q --> K[enable POLLOUT] --> F 
+ Rcv -- queued --> K[enable POLLOUT] --> F + Rcv -- incomplete --> F %% Writes O -- yes --> S[sendResponse] - S -->|file response| SF[sendFileResponse] - S -->|raw response| SR[sendRawResponse] - - %% Keep-alive / close after send + S -->|file| SF[sendFileResponse] + S -->|raw| SR[sendRawResponse] SF --> EndSend{response done?} SR --> EndSend - EndSend -- yes & Connection: close --> Close[cleanupClientConnectionClose] --> F + + EndSend -- yes & close --> Close EndSend -- yes & keep-alive --> KA[disable POLLOUT] --> F EndSend -- not yet --> F - %% Poll again + %% Next tick F --> D + ``` From a605437120e0bfbd2df4005a82e1bcd1d050c4b0 Mon Sep 17 00:00:00 2001 From: To0nsa <162323294+To0nsa@users.noreply.github.com> Date: Tue, 19 Aug 2025 00:02:06 +0300 Subject: [PATCH 7/8] Update README.md --- README.md | 29 ++++++++++++++--------------- 1 file changed, 14 insertions(+), 15 deletions(-) diff --git a/README.md b/README.md index 661ee266..b8619179 100644 --- a/README.md +++ b/README.md @@ -152,38 +152,38 @@ flowchart TD subgraph Boot[Startup] A[Load servers from config] B[Create SocketManager] - C[setupSockets: bind, listen, register FDs] + C[setupSockets: bind listen register FDs] A --> B --> C end %% ========================= %% Main loop %% ========================= - C --> D{run: poll} - D -->|EINTR| Z[Graceful shutdown] --> ZZ[Log "Shutting down server"] + C --> D{run poll} + D -->|EINTR| Z[Graceful shutdown] --> ZZ[Log shutting down server] %% Loop tick D --> E[handleCgiPollEvents] - E --> F{for each pfd (reverse)} + E --> F{for each pfd reverse} %% Skip CGI pipe FDs - F --> G{CGI pipe FD?} + F --> G{CGI pipe FD} G -- yes --> F %% Timeouts (per FD) G -- no --> T{checkClientTimeouts} - T -- idle/send --> Close[cleanupClientConnectionClose] --> F - T -- header/body --> Halt[disable POLLIN; queue 408] --> H{ERR/HUP/NVAL?} + T -- idle or send --> Close[cleanupClientConnectionClose] --> F + T -- header or body --> Halt[disable POLLIN then queue 408] --> H{ERR HUP NVAL} 
T -- none --> H %% Socket errors - H -- yes --> H1[handlePollError; close] --> F + H -- yes --> H1[handlePollError then close] --> F %% Reads - H -- no --> I{POLLIN?} - I -- no --> O{POLLOUT && responses?} + H -- no --> I{POLLIN} + I -- no --> O{POLLOUT and responses queued} - I -- yes --> J{listen FD?} + I -- yes --> J{listen FD} J -- yes --> J1[handleNewConnection] --> F J -- no --> Rcv[handleClientData] @@ -194,16 +194,15 @@ flowchart TD O -- yes --> S[sendResponse] S -->|file| SF[sendFileResponse] S -->|raw| SR[sendRawResponse] - SF --> EndSend{response done?} + SF --> EndSend{response done} SR --> EndSend - EndSend -- yes & close --> Close - EndSend -- yes & keep-alive --> KA[disable POLLOUT] --> F + EndSend -- yes and close --> Close + EndSend -- yes and keep alive --> KA[disable POLLOUT] --> F EndSend -- not yet --> F %% Next tick F --> D - ``` From c2aa2449a35c14e0bdebfbc1a0d98af04c2cc3ae Mon Sep 17 00:00:00 2001 From: To0nsa <162323294+To0nsa@users.noreply.github.com> Date: Tue, 19 Aug 2025 11:59:39 +0300 Subject: [PATCH 8/8] Update README.md --- README.md | 300 +++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 228 insertions(+), 72 deletions(-) diff --git a/README.md b/README.md index b8619179..80f62fa1 100644 --- a/README.md +++ b/README.md @@ -10,6 +10,20 @@ > A lightweight HTTP/1.1 server written in modern C++20, compliant with the Hive/42 webserv project specifications. +**Webserv** is our first **large-scale C++ project** at Hive/42. +The goal was to implement a lightweight, fully working HTTP/1.1 server from scratch, trying to use modern C++ standard libraries. + +The server is designed to be RFC-compliant (following HTTP/1.1 [RFC 7230–7235]) and supports essential features such as: + +* Parsing and validating configuration files (Nginx-style syntax). +* Handling GET, POST, DELETE with static files, autoindex, and uploads. +* Executing CGI scripts securely with proper environment and timeouts. 
+* Multiplexing sockets and CGI pipes in a single poll-based event loop. +* Graceful error handling, timeouts, and connection reuse (keep-alive). + +For reference and correctness, Nginx was used as a behavioral benchmark: routing, error responses, and edge cases were compared against it to ensure realistic and compliant behavior. + +This project was both a challenge in systems programming and a solid introduction to networking, concurrency, and protocol design in modern C++. ___ ## Core of Webserv @@ -120,15 +134,13 @@ This section describes how the configuration parsing logic of **Webserv** works, ___ -## Networking Core — `SocketManager` +## Networking `SocketManager` The heart of Webserv’s I/O: a single `poll()` loop multiplexing **listening sockets**, **client sockets**, and **CGI pipes**, with strict timeouts and robust error recovery.
See Details -### What it Does - * **Listening sockets**: set up `bind()`/`listen()` for each configured host\:port. * **Event loop**: run non-blocking `poll()` to monitor all descriptors. * **Connections**: @@ -140,84 +152,228 @@ The heart of Webserv’s I/O: a single `poll()` loop multiplexing **listening so * **Timeouts**: enforce idle, header, body, and send deadlines. * **Errors**: generate accurate HTTP error responses, close cleanly. +
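As a rough illustration of the event-loop bullets above, the per-descriptor decision after `poll()` returns could look like the sketch below. `PollAction` and `classifyRevents` are hypothetical names, and the ordering (errors first, then reads, then writes) is an assumption rather than confirmed `SocketManager` behavior.

```cpp
#include <poll.h>

// Hypothetical action names; the real SocketManager may structure this differently.
enum class PollAction { None, Error, Accept, Read, Write };

// Decide what to do with one descriptor after poll() has filled in its revents.
// Assumed ordering: socket errors win, then readable events, then writable events.
PollAction classifyRevents(int revents, bool isListener, bool hasQueuedResponse) {
    if (revents & (POLLERR | POLLHUP | POLLNVAL))
        return PollAction::Error;                      // close and clean up
    if (revents & POLLIN)
        return isListener ? PollAction::Accept         // new connection on a listening socket
                          : PollAction::Read;          // request bytes on a client socket
    if ((revents & POLLOUT) && hasQueuedResponse)
        return PollAction::Write;                      // socket writable and a response is waiting
    return PollAction::None;
}
```

Treating `POLLHUP` as an error even when `POLLIN` is also set is a simplification: a stricter loop might drain pending bytes before closing.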
+ ___ -### High-Level Flow - -```mermaid -flowchart TD - %% ========================= - %% Setup - %% ========================= - subgraph Boot[Startup] - A[Load servers from config] - B[Create SocketManager] - C[setupSockets: bind listen register FDs] - A --> B --> C - end - - %% ========================= - %% Main loop - %% ========================= - C --> D{run poll} - D -->|EINTR| Z[Graceful shutdown] --> ZZ[Log shutting down server] - - %% Loop tick - D --> E[handleCgiPollEvents] - E --> F{for each pfd reverse} - - %% Skip CGI pipe FDs - F --> G{CGI pipe FD} - G -- yes --> F - - %% Timeouts (per FD) - G -- no --> T{checkClientTimeouts} - T -- idle or send --> Close[cleanupClientConnectionClose] --> F - T -- header or body --> Halt[disable POLLIN then queue 408] --> H{ERR HUP NVAL} - T -- none --> H - - %% Socket errors - H -- yes --> H1[handlePollError then close] --> F - - %% Reads - H -- no --> I{POLLIN} - I -- no --> O{POLLOUT and responses queued} - - I -- yes --> J{listen FD} - J -- yes --> J1[handleNewConnection] --> F - - J -- no --> Rcv[handleClientData] - Rcv -- queued --> K[enable POLLOUT] --> F - Rcv -- incomplete --> F - - %% Writes - O -- yes --> S[sendResponse] - S -->|file| SF[sendFileResponse] - S -->|raw| SR[sendRawResponse] - SF --> EndSend{response done} - SR --> EndSend - - EndSend -- yes and close --> Close - EndSend -- yes and keep alive --> KA[disable POLLOUT] --> F - EndSend -- not yet --> F - - %% Next tick - F --> D -``` +## HTTP Handling + +This section explains how **Webserv** processes HTTP/1.1 requests end‑to‑end, from bytes on a socket to fully formed responses, and how the server enforces protocol rules, timeouts, and connection reuse. + +
+See Details + +### Request Lifecycle (High‑Level) + +1. **Accept & Read** + `SocketManager` accepts client connections on non‑blocking sockets and collects incoming bytes. Per‑connection state tracks **read deadlines** (header/body) and **keep‑alive**. + +2. **Parse** + `HttpRequestParser` incrementally parses: + + * **Start line**: method, request‑target (absolute‑path + optional query), HTTP version (HTTP/1.1). + * **Headers**: canonicalizes keys; enforces size limits and folding rules; detects `Connection`, `Host`, `Content-Length`, `Transfer-Encoding`, etc. + * **Body**: supports `Content-Length` and **chunked** transfer decoding. Body size is capped by `client_max_body_size`. + +3. **Route** + `requestRouter` selects a `Server` (host+SNI/server\_name) and the most specific `Location` (longest URI prefix match). It normalizes the filesystem target path and determines whether the request hits **static** content, **autoindex**, **redirect**, **upload**, or **CGI**. + +4. **Dispatch** + Based on method and location rules, it calls `handleGet`, `handlePost`, or `handleDelete`. Unsupported or disallowed → **405** with `Allow` header. + +5. **Build Response** + `responseBuilder` produces status line, headers, and body. It + + * Sets `Content-Type` (MIME by extension), `Content-Length` or `Transfer-Encoding: chunked`, `Connection` (keep‑alive vs close), and error pages. + * Streams file bodies (sendfile/read+write) with backpressure; can fall back to buffered I/O for CGI and dynamic content. + +6. **Send & Reuse** + `SocketManager` writes the response, respecting **write timeouts** and TCP backpressure. If `Connection: keep-alive` and protocol rules allow, the connection stays open for subsequent pipelined requests. + +### Static Files & Autoindex + +* **Static files**: Path is resolved from `root` + URI, protecting against traversal. If an **index** is configured and exists for a directory, it is served. 
+* **Autoindex**: When enabled and no index present, `generateAutoindex` renders a minimal HTML directory listing. +* **ETag/Last‑Modified** *(optional)*: If enabled, responses include validators; otherwise strong caching is avoided. Range requests are not served unless explicitly implemented. + +### Errors & Edge Cases + +* **400** malformed request, **413** body too large, **414** URI too long, **404/403** missing or forbidden paths, **405** method not allowed. +* **408/504** on header/body/send timeouts. **431** for oversized header sections. +* **5xx** on internal faults, filesystem errors, or CGI failures (see below).
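The routing step of the request lifecycle above (most specific `Location` by longest URI prefix) can be sketched as follows. `LocationSketch` and `matchLocation` are hypothetical stand-ins, not the project's actual `Location` class or router API, and the sketch uses a plain string-prefix comparison with no path-segment boundary check.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, reduced stand-in for a configured Location block.
struct LocationSketch {
    std::string path;  // URI prefix from the configuration, e.g. "/upload"
};

// Return the location whose path is the longest prefix of `uri`,
// or nullptr when no configured prefix matches.
const LocationSketch* matchLocation(const std::vector<LocationSketch>& locations,
                                    std::string_view uri) {
    const LocationSketch* best = nullptr;
    for (const LocationSketch& loc : locations) {
        const std::string_view prefix = loc.path;
        if (prefix.empty() || uri.substr(0, prefix.size()) != prefix)
            continue;  // not a prefix of the request URI
        if (best == nullptr || prefix.size() > best->path.size())
            best = &loc;  // keep the most specific match seen so far
    }
    return best;
}
```

With locations `/`, `/upload`, and `/upload/img`, a request for `/upload/img/a.png` selects `/upload/img`, while `/index.html` falls back to `/`.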
___ -## Flow Overview +## Request & CGI Handling + +This section details how POST uploads, multipart forms, and CGI programs are handled, including sandboxing and timeout policy. + +
+See Details + +### POST Uploads & Multipart + +* **Content dispatch**: `handlePost` inspects `Content-Type` and forwards to specialized handlers. +* **application/x-www-form-urlencoded**: Parsed into key/value pairs. Small payloads are buffered; oversized inputs fail fast with **413**. +* **multipart/form-data**: `handleMultipartForm` parses parts lazily to disk, honoring per‑file and aggregate size limits. Saved files go to the `upload_store` defined on the matched `Location`. +* **application/octet-stream / arbitrary media**: Stored as a single file in `upload_store` with a server‑generated filename when no name is provided. +* **Overwrite policy**: Configurable (e.g., reject on conflict or rename). Errors yield **409** (conflict) or **500** depending on the cause. + +### CGI Execution Model + +* **When CGI triggers**: A request is routed to CGI when the target path matches a configured `cgi_extension` (e.g., `.py`, `.php`) and an interpreter is set, or when the `Location` forces CGI. + +* **Environment**: `handleCgi` constructs a POSIX environment per CGI/1.1: + + * `REQUEST_METHOD`, `QUERY_STRING`, `CONTENT_LENGTH`, `CONTENT_TYPE`, `SCRIPT_FILENAME`, `PATH_INFO`, `SERVER_PROTOCOL`, `SERVER_NAME`, `SERVER_PORT`, `REMOTE_ADDR`, and `HTTP_*` for forwarded headers. + * Working directory is the script directory; stdin is the request body (streamed or buffered based on size). + +* **Process lifecycle**: + + 1. Create pipes for **stdin**/**stdout**, fork, exec interpreter + script. + 2. Parent polls child pipes non‑blocking with **CPU/IO activity watchdogs**. + 3. Enforces **hard timeouts** (startup, read, total runtime). On violation → terminate child. + +* **Output parsing**: CGI writes `Status: 200 OK\r\n`, arbitrary headers, blank line, then body. The server: + + * Parses CGI headers (maps/filters hop‑by‑hop), merges with server headers. + * If `Location` header without body → treat as **redirect** per CGI spec. + * Otherwise body is streamed back to the client. 
+ +* **Failure mapping**: + + * Exec/spawn error → **502 Bad Gateway**. + * Timeout or premature exit → **504 Gateway Timeout**. + * Malformed CGI headers → **502**. + * Script wrote nothing (unexpected EOF) → **502**. + +* **Security & Limits**: + + * Drop privileges/chroot *(if configured)*; never inherit ambient FDs; sanitize environment. + * Enforce **max body size**, **max headers**, **max response size** (protects RAM), and per‑request **open‑file caps**. + +### GET/DELETE Semantics -1. **Tokenizer** → breaks input into tokens. -2. **ConfigParser** → builds in‑memory `Config` with `Server` & `Location` objects. -3. **normalizeConfig** → fills missing defaults (sizes, error pages, roots, index, methods). -4. **validateConfig** → applies semantic checks. -5. **Runtime** → validated configuration is passed to the server for request routing. +* **GET**: Serves static files, autoindex pages, or dispatches to CGI. Conditional GETs (If‑Modified‑Since/If‑None‑Match) may be supported depending on build settings. +* **DELETE**: Removes targeted file from the resolved root when allowed in `methods`. On success → **204 No Content**; on missing/forbidden → **404/403**. -The configuration pipeline guarantees that only syntactically valid, normalized, and semantically correct configurations are accepted. This ensures the server runs with predictable defaults, strong validation, and developer-friendly diagnostics. +### Response Builder (Recap) + +* Centralizes status line + headers, error page selection, and body streaming. Ensures `Content-Length` vs `chunked` consistency and keeps **connection semantics** correct across errors and CGI boundaries. + +
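To make the CGI output format described above concrete (headers such as `Status:`, a blank line, then the body), here is a minimal parsing sketch. `CgiHead` and `parseCgiHead` are hypothetical names; header names are matched case-sensitively for brevity, which a real parser should not do, and a `std::nullopt` result corresponds to the "malformed CGI headers → 502" case.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type: what a response builder would need from the CGI head.
struct CgiHead {
    int status = 200;            // default when the script sends no Status header
    std::string contentType;     // empty if the script did not set one
    std::size_t bodyOffset = 0;  // index of the first body byte in the raw output
};

// Parse the header block of raw CGI output; nullopt means the head is malformed.
std::optional<CgiHead> parseCgiHead(std::string_view out) {
    constexpr std::size_t npos = std::string_view::npos;
    const std::size_t end = out.find("\r\n\r\n");
    if (end == npos)
        return std::nullopt;  // no blank line separating headers from body

    CgiHead head;
    head.bodyOffset = end + 4;
    std::string_view block = out.substr(0, end);
    while (!block.empty()) {
        const std::size_t eol = block.find("\r\n");
        const std::string_view line = block.substr(0, eol);
        block = (eol == npos) ? std::string_view{} : block.substr(eol + 2);

        const std::size_t colon = line.find(':');
        if (colon == npos)
            return std::nullopt;  // header line without "name: value" shape
        const std::string_view name = line.substr(0, colon);
        std::string_view value = line.substr(colon + 1);
        while (!value.empty() && value.front() == ' ')
            value.remove_prefix(1);

        if (name == "Status") {
            // "Status: 404 Not Found" -> read the three leading digits.
            if (value.size() < 3)
                return std::nullopt;
            int code = 0;
            for (std::size_t i = 0; i < 3; ++i) {
                if (value[i] < '0' || value[i] > '9')
                    return std::nullopt;
                code = code * 10 + (value[i] - '0');
            }
            head.status = code;
        } else if (name == "Content-Type") {
            head.contentType = std::string(value);
        }
    }
    return head;
}
```

Everything from `bodyOffset` onward is the response body, so a caller can stream it without copying the header bytes.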
+ +___ + +## Flow Overview - End‑to‑End Runtime + +This is the complete lifecycle from configuration to bytes on the wire, aligned with the current codebase. + +
+See Details + +1. **Startup & Configuration** + +* **Tokenizer → ConfigParser → normalizeConfig → validateConfig** + + * Tokenize config, build `Server`/`Location` graphs, apply defaults (client body size, methods, roots, index, error pages), and enforce semantic rules (paths, redirects, methods, CGI mapping). +* **Bootstrap** + + * Instantiate `Server` objects, bind/listen on configured host\:port pairs, pre‑compute route tables and error pages. + +2. **Event Loop (SocketManager)** + +* Single non‑blocking `poll()` loop over listening sockets, client sockets, and CGI pipes. +* Per‑connection state tracks read/write buffers, deadlines (header/body/send), and keep‑alive. +* **Accept** new connections ➜ initialize state. + +3. **Read → Parse (HttpRequestParser)** + +* Accumulate bytes until `"\r\n\r\n"` (header terminator) is found. +* **Start line**: validate method token, request‑target, version. +* **Headers**: normalize keys, reject duplicates where disallowed, check `Content‑Length`/`Transfer‑Encoding` (conflict, format), enforce `Host` on HTTP/1.1, cap header section size. +* **URL/Host routing hint**: derive effective `Url` and matched server affinity; store `Host`, `Query`, `Content‑Length`. +* **Body**: + + * If `Transfer‑Encoding: chunked` ➜ incremental chunk decoding; forbid trailers; enforce `client_max_body_size`. + * Else if `Content‑Length` ➜ wait until full body; enforce size cap; detect pipelined next request beyond the declared length. + * GET/DELETE: treat any extra bytes as pipeline, not body. + +4. **Routing (requestRouter)** + +* **Directory‑slash redirect** when target resolves to a directory but URI lacks trailing `/`. +* **Location selection**: exact match, else longest prefix. +* **Configured redirect (`return` 301/302/307/308)** short‑circuit. +* **Method gate**: + + * 501 if method not implemented (only GET/POST/DELETE supported). + * 405 if not allowed by `Location`’s `methods`. + +5. 
**Dispatch (methodsHandler)** + +* **GET** + + * Resolve physical path under `root` (no traversal, no symlinks). + * If directory: + + * If index exists ➜ serve file. + * Else if `autoindex on` ➜ generate HTML listing. + * Else ➜ 403. + * If regular file ➜ serve with MIME type detection. Small files buffered, large files streamed. +* **POST** + + * Preconditions: non‑empty body, size ≤ `client_max_body_size`, `upload_store` configured. + * Determine safe target path under `upload_store` (percent‑decode, canonicalize, reject symlinks, mkdir ‑p). + * Content‑type switch: + + * `multipart/form-data` ➜ stream first file part to disk (boundary parsing, per‑part size cap). + * `application/x‑www‑form‑urlencoded` ➜ parse kv pairs; persist rendered HTML summary. + * Other types ➜ raw body saved as a file. + * 201 on success with minimal HTML confirmation. +* **DELETE** + + * Resolve path; reject directories/symlinks; remove regular file; reply 200 with HTML confirmation. + +6. **CGI (handleCgi) - when location/extension triggers** + +* **Spawn** + + * Write request body to temp file; create output temp file. + * Build `execve` argv (interpreter + script) and CGI/1.1 env (`REQUEST_METHOD`, `QUERY_STRING`, `SCRIPT_FILENAME`, `PATH_INFO`, `SERVER_*`, `HTTP_*`, etc.). + * `fork()` child ➜ `dup2(stdin/out)` to temp FDs ➜ `chdir(script dir)` ➜ `execve()`. +* **Supervision** + + * Parent polls pipes/FDs with timeouts; on inactivity/overrun ➜ kill and 504/502. +* **Finalize** + + * Parse output file head for CGI headers (`Status:`, `Content‑Type:`) until `CRLF CRLF`. + * Compute body offset and size, then return a **file‑backed** response pointing at CGI output (no copy), with correct status and content type. + * Ensure temp files are unlinked/cleaned after send. + +7. **Response Building (responseBuilder/HttpResponse)** + +* Build status line + headers; choose reason phrase; select custom error page if configured.
+* Set `Content‑Type`, `Content‑Length` (or stream file length) and connection semantics. +* **Keep‑alive policy** + + * HTTP/1.1: keep‑alive by default unless `Connection: close` **or** fatal status (e.g., 400/408/413/500) forces close. + * HTTP/1.0: close by default unless `Connection: keep‑alive`. +* For redirects: set `Location`; body often omitted/minimal. + +8. **Write → Reuse/Close** + +* Non‑blocking writes honor backpressure and send timeouts. +* If `keep‑alive` and no close‑forcing status ➜ retain connection for next pipelined request (parser resumes at leftover bytes). +* Else ➜ close socket and release all per‑connection resources. + +9. **Error Mapping & Hardening** + +* Parser/Router/FS/CGI errors mapped to precise HTTP codes (400/403/404/405/408/411/413/414/415/431/500/501/502/504/505). +* Safeguards: normalized paths, no `..`, symlink denial, header/body caps, per‑request timeouts, upload store confinement, and strict header validation. + +
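The keep-alive policy from step 7 can be written as a small pure function. This is a sketch under assumptions: `HttpVersion`, `forcesClose`, and `shouldKeepAlive` are hypothetical names, the close-forcing list contains only the statuses given as examples above (400/408/413/500), and the `Connection` header is assumed to carry a single token.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <string_view>

enum class HttpVersion { Http10, Http11 };

// Lower-case a header value so the comparison is case-insensitive.
static std::string toLower(std::string_view s) {
    std::string out(s);
    std::transform(out.begin(), out.end(), out.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return out;
}

// Statuses assumed to force a close; only the examples named in the text.
bool forcesClose(int status) {
    return status == 400 || status == 408 || status == 413 || status == 500;
}

// HTTP/1.1: persistent unless "Connection: close" or a close-forcing status.
// HTTP/1.0: closed unless the client explicitly sent "Connection: keep-alive".
bool shouldKeepAlive(HttpVersion version, std::string_view connectionHeader, int status) {
    if (forcesClose(status))
        return false;
    const std::string token = toLower(connectionHeader);
    if (version == HttpVersion::Http11)
        return token != "close";
    return token == "keep-alive";
}
```

Keeping the decision free of socket state makes it easy to check each row of the policy in isolation, for example that a 413 closes the connection even when the client asked for keep-alive.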
___