Building Resilient Architecture Patterns on AWS Cloud: Strategies and Use Case: By Sonali Patil

Introduction

This blog explores architecture patterns for building resilient applications on AWS Cloud. In the banking and insurance domain, challenges often surface during the design phase of application migration, where applications need an active-passive DR setup, an active-active setup, a phase-wise migration to an active-active setup, an active-standby solution in a single data center, and so on. Business expectations for application availability, scalability, and fault tolerance vary by use case, and that is where resilient architecture patterns play a vital role during the design phase.

Resilient architecture is the practice of designing applications that keep operating without impacting end users: they fail over automatically or manually when a component fails, have recovery solutions prepared in advance, detect faults, run as distributed systems, and scale in or out when needed. AWS offers a broad set of infrastructure and managed services for building resilient architectures in the cloud.

In this blog, we’ll explore key resilient architecture patterns, how they are implemented on AWS, and a
real-life use case demonstrating these concepts in action.

Patterns

Let’s look at the effective patterns you can adopt for resilient design on AWS.

1. Application using single AZ deployment

If your application must run in a single AZ but still needs to recover from a failure within an RTO/RPO measured in hours, you can use the following.

AWS Services:

A standby EC2 instance that takes over when the active instance fails
AMIs for quick redeployment
EBS snapshots for volume backups
Amazon S3 with lifecycle policies
A standby database on EC2, or Amazon RDS with backups and a cluster configuration
Route 53 failover routing together with ALB load balancing

Benefit: If an instance fails, the standby can become active and the load balancer can redirect traffic to it automatically. Where no load balancer is used, automating the start of the standby instance keeps the environment available within the RTO/RPO window.
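
As a rough illustration of the Route 53 failover routing mentioned above, the sketch below builds the request body for a PRIMARY/SECONDARY record pair. It only constructs the payload and does not call AWS. The domain name, IP addresses, and health check ID are hypothetical placeholders, and the function name `failover_change_batch` is an assumption made for this example.

```python
from typing import Any, Dict


def failover_change_batch(
    record_name: str,
    primary_ip: str,
    standby_ip: str,
    primary_health_check_id: str,
    ttl: int = 60,
) -> Dict[str, Any]:
    """Build a Route 53 ChangeBatch with a PRIMARY and a SECONDARY A record."""
    primary = {
        "Name": record_name,
        "Type": "A",
        "SetIdentifier": "primary",
        "Failover": "PRIMARY",
        "TTL": ttl,  # a short TTL lets clients pick up the failover quickly
        "ResourceRecords": [{"Value": primary_ip}],
        # Route 53 answers with the SECONDARY record when this check is unhealthy.
        "HealthCheckId": primary_health_check_id,
    }
    standby = {
        "Name": record_name,
        "Type": "A",
        "SetIdentifier": "standby",
        "Failover": "SECONDARY",
        "TTL": ttl,
        "ResourceRecords": [{"Value": standby_ip}],
    }
    return {
        "Comment": "Active-standby failover routing",
        "Changes": [
            {"Action": "UPSERT", "ResourceRecordSet": record_set}
            for record_set in (primary, standby)
        ],
    }


# All values below are hypothetical placeholders.
batch = failover_change_batch(
    record_name="app.example.com.",
    primary_ip="203.0.113.10",
    standby_ip="203.0.113.20",
    primary_health_check_id="hypothetical-health-check-id",
)
```

With AWS credentials configured, a payload like this could be passed to boto3 as `boto3.client("route53").change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`, where the hosted zone ID is your own.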

2. Application using Multi-AZ deployment & Multi region deployment

If your application needs an active-active setup across two data centers (AZs) in a single Region with an RTO/RPO of about 15 minutes, or a multi-Region active-active setup with an RTO/RPO near zero, you can use the following.

AWS Services:

Amazon RDS Multi-AZ for automatic failover, plus built-in cross-Region replication (for example, cross-Region read replicas)
Amazon S3, a Regional service that stores data redundantly across multiple AZs (except in One Zone storage classes) and so survives a single-AZ failure; Region failure is covered by S3 Cross-Region Replication (CRR)
Route 53 with latency-based or failover routing
ELB for load balancing and routing requests to another AZ
Auto Scaling for automatically scaling instances in and out
ECS/EKS with Auto Scaling groups to replace failed instances and maintain performance and availability
Backups, snapshots, and AMIs for data and instance recovery
Amazon SQS (message queues) for distributed architecture
Amazon SNS (pub/sub) for notifications and alerts
Amazon EventBridge for event notifications and for triggering recovery workflows
AWS Lambda with retry strategies
Amazon API Gateway with throttling, so API requests are routed without overwhelming backends at peak traffic
ElastiCache (Redis/Memcached) for serving cached data when the real-time data service is down
AWS CodeDeploy and API Gateway blue/green deployments for running a new version alongside the existing one, then testing and switching traffic
Amazon CloudWatch (metrics, logs, alarms) for monitoring systems, detecting faults, and automating recovery with minimal downtime

Benefit: If one AZ or region fails, traffic can be redirected automatically to a healthy environment.
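
The "Lambda with retry strategies" item above can be pictured with a small client-side retry helper that uses exponential backoff with jitter. This is a minimal sketch rather than the configuration used in any specific project: the helper name `call_with_retries`, its default values, and the flaky downstream function are all assumptions for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on any exception with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Backoff ceiling doubles each attempt: base, 2*base, 4*base, ... capped.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter spreads retries out so clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
    raise RuntimeError("max_attempts must be at least 1")


# Hypothetical downstream call that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_downstream_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("downstream unavailable")
    return "ok"


# The sleep function is replaced with a no-op so the example runs instantly.
result = call_with_retries(flaky_downstream_call, sleep=lambda _seconds: None)
```

In a real handler the retried operation should be idempotent, since a request that timed out may still have been processed downstream.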

Use Case: A real-life example from a money transfer application

A money transfer company with global customers wants to ensure its platform is highly available, scalable, and resilient through a multi-Region deployment.

Architectural Components & Patterns Used:

Component	Pattern	AWS Service	Resilience Role
Web Layer	Auto Scaling, Multi-AZ	EC2 + ALB + Auto Scaling	Handles traffic surges and AZ failures
API Layer	Circuit Breaker + Graceful Degradation	API Gateway + Lambda + EventBridge + RDS	Reduces pressure on downstream services; supports a distributed architecture
Batch Processing	Queue-based decoupling	S3 + Amazon SQS + Lambda + RDS	Ensures files are not lost even if downstream processing fails
Database	Multi-AZ + Multi-Region + cross-Region replication	Amazon RDS for PostgreSQL	Provides automated failover and cross-Region data replication
Traffic Routing	Automated failover	Route 53 + ALB	Applies failover routing policies
Monitoring	Observability + Auto Recovery	CloudWatch + SNS + Lambda + Systems Manager	Detects and recovers from anomalies
Application Migration	Phase-wise migration	Route 53	Percentage-based (weighted) routing
Change Requests	Code Deployment + Testing	API Gateway + EC2 + Auto Scaling in another subnet	Routes 1% of traffic to test a new deployment
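
To make the "Circuit Breaker + Graceful Degradation" pattern listed for the API layer more concrete, here is a minimal in-process sketch. It is an illustration under stated assumptions, not the implementation used in the use case: the `CircuitBreaker` class, its thresholds, and the rate-service functions are hypothetical names introduced for this example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Stops calling a failing dependency and serves a fallback instead."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # After the timeout the breaker is half-open and allows one trial call.
        return self._clock() - self._opened_at < self.reset_timeout

    def call(self, operation: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open:
            return fallback()  # degrade gracefully without touching the dependency
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            return fallback()
        # A success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


# Hypothetical dependency and fallback, with a fake clock for a fast demo.
def failing_rate_service() -> str:
    raise TimeoutError("rate service down")


def cached_rate() -> str:
    return "cached-rate"


fake_now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=30.0, clock=lambda: fake_now[0])
responses = [breaker.call(failing_rate_service, cached_rate) for _ in range(3)]
```

The fallback here stands in for something like cached data served from ElastiCache when the live service is unavailable. In a Lambda-based API, breaker state held in memory is per execution environment, so a shared store would be needed for a fleet-wide view.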

Outcomes:

During a peak event, Auto Scaling scaled the EC2 fleet out from 5 to 7 instances within minutes.
When an AZ went offline, the ALB automatically rerouted traffic to the healthy AZs.
API Gateway directed 1% of traffic to production instances in another subnet to test new changes, without disturbing the 99% of traffic routed to the current deployment.
AWS built-in data replication features kept data available.
Code deployment was automated using AWS CloudFormation, AWS Service Catalog, and AWS CI/CD pipeline tools.
The distributed architecture for batch processing helped keep the system available.
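
The 1% / 99% traffic split described in the outcomes can be approximated with a simple percentage-based router. The sketch below is not how API Gateway or Route 53 split traffic internally; it swaps in deterministic hashing so that the same request ID always lands on the same deployment. The function name `choose_deployment` and the request ID format are assumptions for illustration.

```python
import hashlib


def choose_deployment(request_id: str, canary_percent: float = 1.0) -> str:
    """Route roughly canary_percent of request IDs to the new deployment."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the hash onto 10,000 buckets so 1% corresponds to buckets 0..99.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "canary" if bucket < canary_percent * 100 else "current"


# Simulate 100,000 hypothetical requests and measure the canary share.
sample = [choose_deployment(f"req-{i}") for i in range(100_000)]
canary_share = sample.count("canary") / len(sample)
print(f"canary share: {canary_share:.2%}")
```

Raising `canary_percent` step by step gives the same effect as a phase-wise migration: traffic shifts gradually while the current deployment keeps serving the remainder.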

Conclusion

Resilient architecture is achieved by following best practices and designing with the broad set of services AWS provides. Adopting resilient architecture patterns helps ensure your applications stay available, responsive, and scalable.

AWS Well-Architected Framework – Reliability Pillar
