post: Architecture reduction post #78
Merged
8 commits
bb082d9
add: Architecture reduction post
gustahrodrigues 3ce1c9e
Update content/posts/2025-11-06-reducting-microservices/index.md
gustahrodrigues 08c2167
fix: Format issues
gustahrodrigues fdd791e
Update content/posts/2025-11-06-reducting-microservices/index.md
gustahrodrigues 879756d
Update content/posts/2025-11-06-reducting-microservices/index.md
gustahrodrigues 2057737
Update content/posts/2025-11-06-reducting-microservices/index.md
gustahrodrigues b4b85bd
Update content/posts/2025-11-06-reducting-microservices/index.md
gustahrodrigues af0f452
Update content/posts/2025-11-06-reducting-microservices/index.md
gustahrodrigues
156 changes: 156 additions & 0 deletions
156
content/posts/2025-11-06-reducting-microservices/index.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,156 @@ | ||
| --- | ||
| title: "Less is More: Reducing Microservices Architecture Complexity" | ||
| author: Gustavo Rodrigues | ||
| twitter: gustahrodrigues | ||
| layout: post | ||
| lang: en | ||
| path: /blog/less-is-more-reducing-microservices-architecture-complexity | ||
| date: 2025-11-06 | ||
| comments: true | ||
| --- | ||
|
|
||
| Like many fast-growing engineering organizations, our microservices architecture evolved organically over the years. | ||
| What started as a deliberate move away from a monolith to enable team autonomy and faster deployments had grown into a sprawling ecosystem of services. | ||
|
|
||
| Several factors prompted us to take action: | ||
|
|
||
| - **Operational burden:** Each service required monitoring, alerting, documentation, and security updates | ||
| - **Cost inefficiency:** We were paying for infrastructure that wasn't delivering proportional value | ||
| - **Developer velocity:** Engineers spent excessive time determining whether existing services could be leveraged to deliver new features | ||
| - **Maintenance overhead:** Small, rarely-used services still required the same care as high-traffic ones | ||
| - **Lack of knowledge:** Many of these services were created years ago by engineers who are no longer with the company, | ||
| leaving the current owners without the necessary context and expertise to effectively manage and maintain them. | ||
|
|
||
| The question wasn't whether we had too many services, but rather which ones we could safely consolidate or eliminate. | ||
|
|
||
| ## Methodology: Building the Decommissioning Score | ||
|
|
||
| Rather than relying on intuition or anecdotal evidence, we developed a data-driven scoring system to evaluate each service objectively. | ||
| Our primary goal was to establish an initial filter using a _"decommissioning probability score"_ to help us determine which services to address first. | ||
|
|
||
| ### Metrics Collection | ||
|
|
||
| We collected three categories of metrics for each service over the last year (2024): | ||
|
|
||
| - **Usage metrics** | ||
| - \# of web requests received (API endpoint utilization), excluding health checks and admin endpoints | ||
| - \# of messages processed from our event-driven architecture | ||
|
|
||
| - **Cost Metrics** | ||
| - Cloud cost (database, cache, load balancer, DNS…) | ||
| - K8s cluster cost | ||
| - Log ingestion cost | ||
| - Observability cost | ||
|
|
||
| - **Maintenance Metrics** | ||
| - \# of PRs merged | ||
|
|
||
| There are several other metrics that could be used, like # of deployments, # of incidents, and the percentage of out-of-date dependencies, among others; | ||
| however, we decided to adhere to the original list as it is more suitable for our context. | ||
|
|
||
| ### Scoring Algorithm | ||
|
|
||
| Before applying our scoring formula, we normalized all raw metric values to a `0-1` interval to ensure fair comparison across vastly different scales. | ||
| We used min-max normalization across our entire service portfolio: `normalized_value = (value - min_value) / (max_value - min_value)`. | ||
|
|
||
| However, these metrics had opposite relationships to decommissioning probability. For Total Cost, higher values directly indicated candidates | ||
| for removal - expensive services with low returns were prime targets. For the Usage and Maintenance metrics, the logic was inverted: | ||
| higher values indicated a healthy, actively-used service that should not be decommissioned. Therefore, we applied `1 - normalized_value` | ||
| to these three metrics, ensuring that low activity translated to high decommissioning scores. | ||
| This inversion was critical - a service with minimal traffic and few code changes should score high for removal, while a high-traffic, actively | ||
| maintained service should score low. | ||
|
|
||
| We then assigned the following weight to each metric: | ||
| - Total Cost: 30% | ||
| - \# PRs merged: 20% | ||
| - \# of web requests received: 30% | ||
| - \# of messages processed: 20% | ||
|
|
||
| We combined all costs into a single Total Cost metric because our primary focus was service usage rather than cost reduction. | ||
|
|
||
| Finally, we applied the following decommissioning score formula for each service: | ||
|
|
||
| ``` | ||
| Decommissioning Score = (0.30 × Total Cost) + (0.20 × # PRs merged) + (0.30 × # of web requests received) + (0.20 × # of messages processed) | ||
| ``` | ||
|
|
||
| Expressing the score on a 0-100 scale, we defined a score greater than 80 as indicating a high likelihood that the service can be decommissioned. | ||
| A score greater than 50 suggests that further investigation is warranted, while scores at or below that threshold are not considered significant. | ||
|
|
||
| ## Execution: From Analysis to Action | ||
|
|
||
| The scoring system flagged 8% of services as highly likely candidates for decommissioning, with 44% warranting further investigation. | ||
|
|
||
| Even after applying the initial score as a filter, a critical analysis was still lacking: **product features in those services**. | ||
| Is the feature that the service is supposed to deliver still in use? | ||
| Is it still relevant for our customers? Do we have any plans to leverage it in the future? | ||
|
|
||
| We engaged in various research activities to collect insights from Product Managers and Stakeholders. | ||
| Additionally, a thorough technical assessment of the service was conducted and properly documented. | ||
| This process ruled out some of the candidates, leaving 16 out of 45 services identified for decommissioning. | ||
|
|
||
| We implemented the following strategy to decommission the remaining services: | ||
| - For services with valuable functionality, we migrated the logic to the appropriate services or libraries. | ||
| - For deprecated services: | ||
| - First, we added a feature flag on the clients to allow easy activation or deactivation. | ||
| - After a couple of weeks with no usage and no complaints, we removed the client code. | ||
| - We created a snapshot of the service’s database. | ||
| - We shut down all cloud resources associated with the service. | ||
| - Finally, we wrote thorough documentation explaining the reasons for decommissioning the service, focusing on the assumptions made during the process. | ||
|
|
||
| ### Results | ||
|
|
||
| We have decommissioned 12 out of 44 services, with 4 remaining to be decommissioned later. | ||
| This results in a 29% reduction in services for one team and a 37% reduction for another. | ||
|
|
||
| In terms of savings, we estimated the following: | ||
| - Microservices Infrastructure Cost: USD 33.6k per year | ||
| - Engineering Maintenance Cost: USD 34.9k per year | ||
|
|
||
| ### Key Learnings | ||
|
|
||
| 1. Periodic Architecture Review is Essential | ||
|
|
||
| The biggest takeaway: architecture reviews should be a regular, scheduled practice - not something we do when complexity becomes painful. | ||
|
|
||
| 2. Context Matters: This Wasn't Over-Engineering | ||
|
|
||
| It's tempting to look back and label the creation of these services as "over-engineering." That would be incorrect and unfair to the engineers who made those decisions. | ||
|
|
||
| When these services were created, they addressed real problems: | ||
| - We were smaller and optimizing for team autonomy over operational efficiency | ||
| - Several services were built for features that had legitimate product hypotheses that simply didn't pay off | ||
| - Our scale and traffic patterns were different | ||
| - Technology and best practices evolved (e.g., service mesh capabilities, observability tools) | ||
|
|
||
| **The lesson:** Good architectural decisions can become wrong architectural decisions as context changes. This isn't failure — it's evolution. | ||
|
|
||
| 3. Optimization is Continuous Work | ||
|
|
||
| Software architecture isn't "done". It requires ongoing attention and optimization, just like code refactoring. | ||
| Without this project, our complexity would have continued growing linearly while our ability to manage it grew sub-linearly — a recipe | ||
| for future technical debt and reduced competitiveness. | ||
|
|
||
| We learned that: | ||
| - The cost of complexity is often invisible until measured explicitly | ||
| - Small inefficiencies compound across dozens of services | ||
| - Proactive optimization is cheaper than reactive firefighting | ||
| - Regular "pruning" enables healthier future growth | ||
|
|
||
| ### What's Next | ||
|
|
||
| This project was just the first step. We plan to decommission the remaining four services, evolve this work, and make it a regular part of our engineering culture. | ||
|
|
||
| ### Conclusion | ||
|
|
||
| Reducing our microservices complexity was more than a cost-saving exercise — it was a strategic investment in our engineering | ||
| organization's future effectiveness. By approaching the problem systematically with data-driven scoring, careful validation, | ||
| and phased execution, we reduced complexity while maintaining system reliability. | ||
|
|
||
| The most important lesson? Architecture, like code, requires continuous refactoring. The services we decommissioned weren't | ||
| mistakes — they were correct decisions that had outlived their usefulness. Recognizing when to evolve or eliminate architectural | ||
| patterns is just as important as knowing when to introduce them. | ||
|
|
||
| > _Have you gone through a similar architecture consolidation project? What metrics did you find most valuable? I'd love to hear about your experiences in the comments._ | ||
|
|
||
| Like to solve challenges like this one? We have many open positions at the moment. Check out our [engineering culture](https://github.com/loadsmart/culture) and the [careers page](https://loadsmart.com/careers/). | ||
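
The scoring steps described under "Scoring Algorithm" (min-max normalization, inversion of the usage and maintenance metrics, the weighted sum, and the 80/50 thresholds) can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the post: the service names and raw values are hypothetical, the function names are invented for the example, the multiplication by 100 is an assumption made so that the thresholds of 80 and 50 apply, and mapping a metric that is identical across all services to 0 is an arbitrary choice.

```python
# Weights as listed in the post (they sum to 1.0).
WEIGHTS = {
    "total_cost": 0.30,
    "prs_merged": 0.20,
    "web_requests": 0.30,
    "messages": 0.20,
}
# Metrics where a higher raw value signals a healthy service, so the
# normalized value is inverted with 1 - normalized_value.
INVERTED = {"prs_merged", "web_requests", "messages"}


def min_max(values):
    """Min-max normalize a list of numbers to the 0-1 interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a metric that is constant across services maps to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def decommissioning_scores(services):
    """Map {service: {metric: raw value}} to {service: score on a 0-100 scale}."""
    names = list(services)
    totals = {name: 0.0 for name in names}
    for metric, weight in WEIGHTS.items():
        normalized = min_max([services[name][metric] for name in names])
        for name, value in zip(names, normalized):
            if metric in INVERTED:
                value = 1.0 - value
            totals[name] += weight * value
    # Assumption: scale by 100 so the 80 / 50 thresholds apply.
    return {name: round(100 * total, 1) for name, total in totals.items()}


def classify(score):
    """Apply the thresholds from the post to a 0-100 score."""
    if score > 80:
        return "likely decommission"
    if score > 50:
        return "investigate"
    return "not significant"


# Hypothetical portfolio, for illustration only.
services = {
    "legacy-reports": {"total_cost": 9000, "prs_merged": 0, "web_requests": 0, "messages": 0},
    "billing": {"total_cost": 12000, "prs_merged": 150, "web_requests": 5_000_000, "messages": 800_000},
    "notifications": {"total_cost": 3000, "prs_merged": 75, "web_requests": 2_500_000, "messages": 400_000},
}

for name, score in decommissioning_scores(services).items():
    print(f"{name}: {score} ({classify(score)})")
```

Because min-max normalization is relative to the portfolio, adding or removing a service changes every other service's score, which is one reason to treat the result as an initial filter rather than a verdict.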
Since this will be a public article, I would recommend making this section more generic to avoid exposing anything we don’t intend to share. For example:
It might be better to avoid using specific numbers or internal metrics, and instead keep the statements high-level. This helps maintain confidentiality while still communicating the overall impact.