About the job
What You'll Do
- Support existing and new services running on AWS (primarily ECS, Lambda, SNS, SQS, MSK, ElastiCache, RDS, SageMaker, ACM, CloudFront and EC2).
- Maintain and work on infrastructure configurations written in Terraform (IaC).
- Help support platform reliability, and set up and maintain the automations that keep the platform reliable.
- Help create and maintain infrastructure documentation.
- Help establish monitoring and alerting that provides a holistic view of services and infrastructure.
- Go out of your way to support business requirements related to reliability, automation and DevOps.
- Build new CI/CD processes and maintain existing ones for any underlying tech stack.
- Manage ACLs for services including, but not limited to, AWS, databases and ELK.
- Participate in frequent infrastructure audits to track vulnerabilities and cloud spending. Prepare cloud budgets and track them regularly to avoid drift.
- Manage incidents and prepare reports.
- Manage Cloudflare as the DNS provider and monitor for suspicious events. Manage WAF and page rules.
- Assist developers in increasing their productivity.
- Participate in compliance and regulatory audits.
- Maintain and support on-prem services such as the VPN server, ClickHouse server, MongoDB, HashiCorp Vault and jump host servers.
Requirements
- Expert in AWS and its ecosystem.
- Must have experience with Terraform and with maintaining production environments.
- Must have hands-on scripting experience in Node.js, Python, Go or Bash to support existing and new automations.
- Bias for action and learning.
- Actively available for infrastructure and service support.
- At least 3 years of experience in DevOps/SRE roles.
- Knowledge of network and cloud infrastructure security practices.
- Extremely strong ownership.
Nice to Have
- Experience with ELK stack.
- Experience in software development.
- Solutions Architect or DevOps certification.
- Hands-on experience with AI/ML deployments.
- Multi-cloud experience.
- Experience in managing and administering MySQL, Postgres, MongoDB, ClickHouse and Kafka.
Tech
- AWS (IAM, SSO, EC2, ASG, ECS, ElastiCache, MSK, DynamoDB, Lambda, VPC, CloudWatch, EventBridge, CodePipeline, ACM, SageMaker, CloudFront, CodeBuild).
- GitHub (with Actions).
- Cloudflare (DNS, WAF, SSL, Security).
- Supporting infrastructure for Golang, Python, Node.js, TypeScript, React etc.
- Terraform for IaC.
- Linear, Slack, G Suite, Coda, Elasticsearch, Kibana.