Job Title :: Senior AWS Cloud/Infrastructure Engineer/Architect
Location :: San Francisco, CA (Hybrid 3-4 days)
Job Type :: Full-time
The Senior AWS Cloud/Infrastructure Engineer/Architect will own the design, implementation, and operation of large‑scale, secure, and highly available platforms on AWS, with a strong focus on event‑driven architectures, streaming (MSK/MSF), caching data stores, data lake/table formats such as Iceberg, observability with OpenTelemetry, and container orchestration on EKS.
This role is hands‑on and customer‑facing, partnering with architecture, platform, data, and product teams to build scalable foundations and enable high‑velocity delivery.
Key responsibilities
· Design and implement cloud‑native architectures on AWS using services such as VPC, EC2, EKS, S3, RDS/Aurora, IAM, CloudWatch, and KMS, following Well‑Architected and security best practices.
· Lead the design and operation of event‑driven systems using Amazon MSK (Managed Streaming for Apache Kafka) and/or managed streaming frameworks (e.g., Kinesis/Kafka‑based MSF), including topic design, partitioning, consumer groups, schema evolution, and back‑pressure handling.
· Architect and manage caching layers and in‑memory data stores (e.g., Amazon ElastiCache for Redis/Memcached or similar) to improve performance, reduce latency, and offload downstream databases.
· Implement and support data lakehouse patterns using Apache Iceberg or similar table formats on object storage (e.g., S3), including table design, partitioning, schema evolution, and performance optimization for analytical and near‑real‑time workloads.
· Design, provision, and operate Kubernetes clusters on Amazon EKS, including node groups, autoscaling, networking, ingress, service mesh (where applicable), secrets management, and multi‑environment separation.
· Implement full‑stack observability using OpenTelemetry (traces, metrics, logs), integrating with centralized telemetry backends, defining SLOs/SLIs, and enabling deep visibility into distributed, event‑driven workloads.
· Build and maintain Infrastructure‑as‑Code (IaC) using tools such as Terraform and/or AWS CloudFormation, enforcing reusable modules, environment parity, and Git‑based workflows.
· Establish and enhance CI/CD pipelines for infrastructure and application deployments on AWS/EKS/MSK, including automated testing, security scans, canary/blue‑green releases, and rollback strategies.
· Ensure platform security, compliance, and governance, including IAM roles and policies, network segmentation, encryption in transit/at rest, secrets management, and audit logging.
· Monitor and optimize cost, performance, and resilience of AWS environments; drive capacity planning, rightsizing, and architectural improvements for high availability and disaster recovery.
· Troubleshoot complex production incidents across EKS, MSK, event pipelines, caching tiers, and data platforms, driving root cause analysis and long‑term remediation.
· Mentor engineers, champion engineering best practices, and collaborate with architects and product teams to align platform roadmaps with business goals.
Required skills and experience
· 10+ years of hands‑on experience in cloud engineering, infrastructure engineering, or platform/SRE roles, with at least 5 years focused primarily on AWS.
· Strong expertise with core AWS services: VPC, IAM, EC2, EKS/ECS, S3, RDS/Aurora, CloudWatch/CloudTrail, KMS, and networking (subnets, routing, security groups, NACLs, load balancers).
· Proven production experience with Amazon MSK or equivalent Kafka‑based managed streaming platforms (MSF), including cluster operations, capacity planning, security, and observability.
· Practical experience with event‑driven and streaming architectures (e.g., Kafka/Kinesis + consumers, stream processing, CQRS, pub/sub patterns) in mission‑critical systems.
· Hands‑on experience with caching data stores and distributed caches (e.g., Redis, Memcached, ElastiCache), including eviction strategies, key design, and cache‑aside/write‑through patterns.
· Experience implementing or operating data lake or lakehouse solutions on S3 or similar, using Apache Iceberg or comparable table formats (e.g., Delta Lake, Hudi), and integrating with analytics/processing engines.
· Strong Kubernetes and EKS background, including cluster lifecycle management, Helm or similar packaging, autoscaling, network policies, and container security baselines.
· Deep understanding of observability, distributed tracing, and telemetry; hands‑on with OpenTelemetry SDKs/collectors and integration into logging/metrics/tracing backends.
· Proficiency with IaC tools such as Terraform and/or CloudFormation, plus strong Git and DevOps practices around code review, branching, and automated testing.
· Solid scripting or programming skills (e.g., Python, Bash, Go, or similar) for automation, tooling, and glue code around AWS, MSK, EKS, and observability stacks.
· Strong knowledge of security, networking, and compliance in cloud environments, including least‑privilege IAM, network isolation, certificate management, and secrets rotation.
· Excellent communication and stakeholder management skills, with experience collaborating in cross‑functional teams and mentoring engineers at mid‑level and below.
Nice‑to‑have qualifications
· Experience with service meshes (e.g., Istio, Linkerd) on EKS for traffic management, mTLS, and advanced observability.
· Exposure to big‑data/analytics ecosystems around Iceberg or similar (e.g., Spark, Flink, Trino, Athena, Glue, EMR) and streaming ETL pipelines.
· Hands‑on experience with additional managed streaming services (e.g., Amazon Kinesis, Azure Event Hubs, GCP Pub/Sub) in multi‑cloud or hybrid environments.
· AWS certifications such as AWS Certified Solutions Architect – Professional, DevOps Engineer – Professional, or specialty certifications in Security or Advanced Networking.
· Prior experience in SRE, platform engineering, or reliability‑focused roles with strong emphasis on SLOs, error budgets, and incident management.
About Qode
Qode is dedicated to helping technical talent around the world find meaningful careers that match their skills and interests. Our platform provides a range of resources and tools that empower job seekers to take control of their careers and connect wit...
Category :: Engineering