Experienced Site Reliability Engineer and Cloud Architect with a demonstrated history of working in fast-paced, highly demanding roles, developing and implementing DevOps methods and principles within an Agile workflow.
Develop and maintain infrastructure standards, including server build and configuration standards. Plan and oversee adherence to change management and release management processes. Create and maintain proper documentation for all actions, practices, procedures, and processes. Administer and troubleshoot infrastructure systems. Oversee administration of systems maintained by the systems team, including Windows OS, backup systems, SAN, Exchange, file servers, and print servers. Provide expertise in advanced and complex technical areas, including SAN, VMware, load balancing, and failover/clustering. Provide technical and engineering support for Active Directory, Group Policy, DNS, DHCP, and Windows Server.
Train employees on Terraform best practices and their integration with CAF. Help developers with pipelines, code, and general support via Azure DevOps. Designed and implemented Azure IaC and pipelines for new Azure ML infrastructure, with a focus on Databricks Workspaces. Set up monitoring, alerting, and logging to be captured by Azure’s built-in offerings and shipped to a custom-built ELK stack on Kubernetes for a consolidated view.
Monitor and maintain Azure Kubernetes clusters (KubeCost, AlertManager, ELK, Prometheus, Lens). Deploy new infrastructure for data scientists as needed via IaC, GitHub, Terraform, and ArgoCD, following the fail-fast methodology. Tune Kubernetes performance and configure dynamic scaling via HPAs. Configure cross-cluster communications with least-privilege access. Produce documentation, training materials, demos, and R&D. Design and deploy machine learning solutions via Azure ML.
Lead cultural change for cloud adoption. Develop and coordinate cloud architecture. Develop a cloud strategy and coordinate the adaptation process. Assess applications, software, and hardware. Create a “cloud broker team”. Establish best practices for cloud across the company. Select cloud providers and vet third-party services. Oversee governance and mitigate risk. Work closely with IT security to monitor privacy and develop incident-response procedures. Manage budgets and estimate costs. Operate at scale.
Train employees on Terraform best practices. Help developers with pipelines, code, and general support via Azure DevOps. Design and implement Azure IaC and pipelines. Set up monitoring, alerting, and logging to be captured by Azure’s built-in offerings as well as a custom-built ELK stack on Kubernetes for a consolidated view.
Monitor and maintain Azure Kubernetes clusters (KubeCost, AlertManager, ELK, Prometheus, Lens). Deploy new infrastructure for data scientists as needed via IaC, GitHub, Terraform, and ArgoCD, following the fail-fast methodology. Tune Kubernetes performance and configure dynamic scaling via HPAs. Configure cross-cluster communications with least-privilege access. Produce documentation, training materials, demos, and R&D. Design and deploy machine learning solutions via Azure ML.