More Services!

Why services are needed

One of two main reasons:

You have multiple independent teams, so you separate services to let each team make independent progress with less alignment, less communication, and fewer blockers.
One component of the application has very different (usually higher) scaling requirements than the rest, which makes scaling the entire application difficult. Hence we move that small part into a separate service.

Remember

When services are designed for scale, please note:

Data is the most critical part of services
Compute is cheaper than the database
Scaling is constrained by the database, not the code

Mandatory

If you are creating a new service and it is not for scaling teams, not for scaling load, and does not require a separate data repository and ownership, stop.

First Principles:

Database is sacred (Actually data is).
Never consider fixing data as the first solution; it's the last option.
Always keep your state management system stable.
Reject a request as soon as possible.
Any request that the SMS (state management system) accepts goes to the database.
Always instrument your code.

What is Scalable

Mix of:

Being able to handle spikes of incoming requests and respond within defined SLA / SLOs.
Ability to serve more customers / requests / transactions by adding more hardware. (Horizontal scaling)

Problems with Services

Multiple modes of failure
Network issues, timeouts, etc.
Increased and unpredictable latency

Synchronous calls across services break service isolation and create tight coupling.
Within a service
- High Cohesion
Across services
- Low coupling

Async Services to Rescue

What is required:

Service communication will, at some point, fail or slow down (timeouts).
Services also need time to recover, despite being HA (highly available).

Requirements for Async:

Idempotency
Replayable queues
Embrace DLQs (dead-letter queues)
Circuit breakers (Hystrix / Resilience4j)
Eventual consistency
Multiple redundancy (cache / db / prepared views / long term storage / data lakes)
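
To make the circuit breaker item concrete, here is a minimal sketch in Python. It is not how Hystrix or Resilience4j are implemented; the class name, the threshold, the timeout and the injectable clock are all assumptions made for illustration. After a number of consecutive failures the breaker fails fast, which gives the downstream service time to recover, and then lets a single trial call through.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    # All names and defaults here are hypothetical, chosen for the sketch.
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock        # injectable so the behaviour can be tested
        self.failures = 0
        self.opened_at = None     # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Still cooling down: fail fast instead of hitting the dependency.
                raise CircuitOpenError("circuit is open")
            # Cool-down elapsed: half-open, let this one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                # Open the circuit, or re-open it after a failed trial call.
                self.opened_at = self.clock()
            raise
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each outbound request, for example `breaker.call(fetch_balance, account_id)`, where `fetch_balance` is whatever function talks to the other service.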

Designing Services

Core services:

Requires Domain Knowledge.
Design services around domains not functions.
Obviously don’t copy everything. 🙂
- Excess never works.
- Be pragmatic
Organize services based on aggregate roots and bounded contexts.
Every domain state transition triggers a domain event, and other services respond to that event.
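
As a rough illustration of the last point, the sketch below uses an in-memory bus as a stand-in for a real, durable broker. The `EventBus` and `DomainEvent` names and the `TransferCompleted` event are made up for the example. The service that owns the transition only publishes the event; it does not know who reacts to it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainEvent:
    name: str            # hypothetical event name, e.g. "TransferCompleted"
    aggregate_id: str
    payload: dict


class EventBus:
    """In-memory stand-in for a real message broker (assumption for the sketch)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_name, handler):
        self._handlers[event_name].append(handler)

    def publish(self, event):
        # Deliver the event to every handler registered for its name.
        for handler in self._handlers[event.name]:
            handler(event)


bus = EventBus()
notifications, ledger = [], []

# Two independent consumers respond to the same domain event.
bus.subscribe("TransferCompleted", lambda e: notifications.append(e.aggregate_id))
bus.subscribe("TransferCompleted", lambda e: ledger.append(e.payload["amount"]))

# The owning service announces the state transition and nothing more.
bus.publish(DomainEvent("TransferCompleted", "transfer-1", {"amount": 100}))
```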

Design

Some duplication is perfectly okay in distributed systems: it is better to keep a copy of immutable data than to make synchronous network calls.
More often, systems don’t fail because of duplication; they fail because of dependencies on external resources.
If your service requires you to call another service to perform its most critical function, congratulations, you have designed a distributed monolith.

Types:

There are two types of services that can be built:

Core services
- These are your domains. Ex: Transfer / Beneficiary …
- These services have state and work as state machines.
- Should only be responsible for three things:
  - Validate incoming request (Balance / business rules)
  - Persist request
  - Manage and update state
- All side effects of persistence which can be eventually consistent should be moved out of the transaction and processed later using retryable queues.
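
The three responsibilities of a core service can be sketched as below. This is only an illustration under assumptions: the states, the transition table and the `TransferService` class are hypothetical, and a dictionary stands in for the database.

```python
from enum import Enum


class State(Enum):
    RECEIVED = "received"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


# Allowed transitions (hypothetical); anything not listed is rejected.
TRANSITIONS = {
    State.RECEIVED: {State.PROCESSING, State.FAILED},
    State.PROCESSING: {State.COMPLETED, State.FAILED},
    State.COMPLETED: set(),
    State.FAILED: set(),
}


class RejectedError(Exception):
    """Raised when a request or a state transition is not acceptable."""


class TransferService:
    def __init__(self, balances):
        self.balances = balances    # account -> available balance
        self.transfers = {}         # in-memory stand-in for the database

    def create(self, transfer_id, account, amount):
        # 1. Validate and reject early, before anything is persisted.
        if amount <= 0:
            raise RejectedError("amount must be positive")
        if self.balances.get(account, 0) < amount:
            raise RejectedError("insufficient balance")
        # A repeated request with the same id returns the stored transfer.
        if transfer_id in self.transfers:
            return self.transfers[transfer_id]
        # 2. Persist the accepted request.
        self.transfers[transfer_id] = {
            "account": account,
            "amount": amount,
            "state": State.RECEIVED,
        }
        return self.transfers[transfer_id]

    def transition(self, transfer_id, new_state):
        # 3. Manage state: only transitions listed in TRANSITIONS are allowed.
        transfer = self.transfers[transfer_id]
        if new_state not in TRANSITIONS[transfer["state"]]:
            raise RejectedError(
                f"{transfer['state'].name} -> {new_state.name} is not allowed"
            )
        transfer["state"] = new_state
        return transfer
```

Keeping every state change behind one `transition` function is also what makes it practical to emit exactly one domain event per transition.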

Actors
- Actors work on messages. They can either:
  - Process an incoming message (SMS / Email)
  - Or convert a message into different messages and send them on (splitting a payment request into a credit and a debit)
- These are stateless actors; they should not manage state.
- They can persist messages for idempotency or for any internal aggregation, but should not have their own state.

Actors are purely scalable components: if we need to process more messages, we can add more workers or actor instances. Since they are stateless, it is also easy to scale back down once the backlog is cleared.
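
Here is a minimal sketch of such an actor, using the payment split as the example. The message fields, the `SplitPaymentActor` class and the in-memory set and list (stand-ins for a persisted idempotency store and a dead-letter queue) are assumptions. Outgoing messages get deterministic ids, so a retry after a partial failure can be deduplicated downstream.

```python
class SplitPaymentActor:
    """Converts one payment message into a debit and a credit message."""

    def __init__(self, send, max_attempts=3):
        self.send = send                  # callable that publishes a message
        self.max_attempts = max_attempts
        self.seen_ids = set()             # stand-in for a persisted idempotency store
        self.dead_letters = []            # stand-in for a dead-letter queue

    def handle(self, message):
        if message["id"] in self.seen_ids:
            return                        # duplicate delivery: already processed
        for _ in range(self.max_attempts):
            try:
                # Ids are derived from the incoming id, so resending after a
                # partial failure produces messages consumers can deduplicate.
                self.send({
                    "id": f"{message['id']}:debit",
                    "account": message["from"],
                    "amount": -message["amount"],
                })
                self.send({
                    "id": f"{message['id']}:credit",
                    "account": message["to"],
                    "amount": message["amount"],
                })
            except Exception:
                continue                  # retry the whole conversion
            self.seen_ids.add(message["id"])
            return
        # Out of attempts: park the message instead of blocking the queue.
        self.dead_letters.append(message)
```

Because the actor holds no business state, running more instances of it only requires that they share the idempotency store.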

State Machines

To achieve the above design we need two components:

Reliable service bus
Downstream Actors (services)

There is an additional challenge here: keeping the source of truth and the source of events in sync. That is:

An event must be emitted once the transaction succeeds.
An event must not be sent if the database transaction fails.

If this is not done, the result is a spaghetti design with countless if checks in the code and state transitions that are impossible to manage.

For services that emit domain events, this is usually achieved with a goroutine, a Sidekiq job, or a thread with retries and graceful failure handling. This works for most cases, but there is no way to recover the event if the service crashes (OOM, segfaults from the platform, and many other issues).
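
This post does not say how to close that gap. One widely used option, shown here as an assumption rather than as the approach taken in this post, is a transactional outbox: the state change and the event are written in the same database transaction, and a separate relay publishes the stored events later. The sketch uses Python's built-in `sqlite3`; the table names, the event shape and the function names are hypothetical.

```python
import json
import sqlite3


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE transfers (id TEXT PRIMARY KEY, state TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " published INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def complete_transfer(conn, transfer_id):
    # `with conn` commits on success and rolls back on any exception, so the
    # state change and its event are stored together or not at all.
    with conn:
        conn.execute(
            "INSERT INTO transfers (id, state) VALUES (?, ?)",
            (transfer_id, "completed"),
        )
        event = {"type": "TransferCompleted", "transfer_id": transfer_id}
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay_outbox(conn, publish):
    """Publish pending events in order; returns how many were pending."""
    rows = conn.execute(
        "SELECT seq, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, payload in rows:
        publish(json.loads(payload))   # if this raises, the row stays pending
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)
```

If the process crashes after publishing but before marking the row, the event is published again on the next run, which is one more reason consumers need to be idempotent.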

Keep the number of update paths as small as possible 🙂

PS:

If you want infinite scale, make state management someone else’s problem.

Published by cubestack on January 26, 2022
