Vibhor Kumar

DevOps and SRE
Pune

Summary

DevOps and Site Reliability Engineer with 12+ years of experience optimizing deployments in AWS and Azure. Expertise in automating CI/CD pipelines with Git, GitLab, Jenkins, and Rundeck. Proficient in Docker, Kubernetes, and scripting with Python and Shell. Skilled in proactive monitoring with Datadog to ensure high system availability. Strong collaborator with a focus on delivering scalable solutions and improving incident management. Passionate about driving efficiency and innovation in infrastructure and deployment processes.

Overview

13 years of professional experience
6 years of post-secondary education
3 Certifications

Work History

Lead DevOps Engineer

Infinite Computer Solutions
06.2023 - Current
  • Automate Code Deployment: Improve code deployment efficiency by automating processes with CI/CD pipelines using tools like Git, GitLab, Jenkins, and Rundeck.
  • Containerization Strategies: Design and implement containerization strategies using Docker and Kubernetes on AWS and Azure platforms to enhance resource utilization and management.
  • Build and Integration Monitoring: Monitor automated build and continuous software integration processes, driving the resolution of build/release failures.
  • Scripting and Automation: Automate manual tasks through scripting languages such as Python and Shell, significantly boosting team productivity levels.
  • Collaboration with Development and Testing Teams: Work with software development and testing team members to design and develop robust solutions that meet client requirements for functionality, scalability, and performance.
  • 24/7 On-Call Support: Provide 24/7 on-call support for critical systems, ensuring high availability and rapid issue resolution.
  • Monitoring and Alerting: Reduce system downtime for critical applications by implementing robust monitoring and alerting tools like Datadog.
  • Infrastructure Optimization: Optimize infrastructure performance by conducting thorough analyses of system metrics and data, utilizing tools like Terraform for infrastructure as code.
  • Custom Tool Development: Develop custom scripts and tools as needed to automate routine tasks, increasing overall team productivity and efficiency.
  • Incident Management and Documentation: Improve incident management workflows by creating comprehensive documentation on troubleshooting procedures and common issue resolution steps.

Senior Site Reliability Engineer

Salesforce
12.2018 - 11.2022
  • Incident Management: Lead incident response and serve as a subject matter expert, collaborating with SMEs for post-incident analysis to prevent recurrence.
  • Release Management: Plan and manage release windows and cycles across clouds, supporting customer communications and risk management.
  • Eliminating Toil: Provide architectural and practical guidance to automate and enhance resiliency, efficiency, and performance. Identify and construct new process frameworks, and recommend improvements to existing ones.
  • Service Owner Collaboration: Work with business users to understand issues, perform root cause analysis, and develop enhancements or fixes with the team.
  • Infrastructure Availability: Monitor and report on service level objectives, and collaborate with service and product owners to establish key performance indicators.
  • Mentor/Coach: Conduct knowledge transfer sessions, design training programs, and coach new team members on processes and practices.

Tech Lead

Cognizant Technology Solutions
09.2017 - 12.2018
  • Investigate and resolve process failures in SF applications using a web service tool, Splunk, and Instrumental applications.
  • Responsible for creating, modifying, rescheduling, deleting, rerunning, holding, forcing OK, and confirming jobs from SCOM.
  • Responsible for application support and for infrastructure monitoring and maintenance, including server health, SQL Server, IIS servers, and change management requests.
  • Handle alerts for application, system, database, and network issues, as well as customer escalations.
  • Create and respond to cases/tickets in Salesforce and handle escalations.

Application Monitoring Engineer/Group Lead

P.I Softek Ltd.
11.2013 - 09.2017
  • Lead and manage 24/7 IT production support through a team of 5+ engineers, acting as the first point of internal escalation for service desk team members.
  • Responsible for creating, modifying, rescheduling, deleting, rerunning, holding, forcing OK, and confirming jobs from SCOM.
  • Monitor alerts using SCOM, Nagios, Ignite, AppDynamics, MIR3, Apica, Sledgehammer, and Kibana 3; troubleshoot application alerts and issues and resolve them at Level I.
  • Responsible for application support and for infrastructure monitoring and maintenance, including server health, SQL Server, IIS servers, and change management requests.

System Engineer (Monitoring)

Tumlare Software Services Pvt. Ltd.
05.2011 - 11.2013
  • Responsible for infrastructure monitoring with a range of tools, acting as Level 1 support and working with service owners to resolve issues.
  • Tools: Host monitor, NetIQ App-manager, Nagios, IBM Integrated Management Module, Session killer, and TDS monitor.

Education

PG Program in DevOps

Caltech University
04.2022 - Current

BBA

CCS University
Meerut, U.P.
01.2006 - 04.2009

Skills

Amazon Web Services (AWS)

DevOps, Jenkins, Git, GitLab, Terraform, Python, SRE, Linux, Grafana, Nagios, Ignite, SolarWinds, AppDynamics, Sledgehammer, CI/CD, GUS, Windows Server, Slack, ITIL, SCOM, PagerDuty, Splunk

Infrastructure Automation, Containerization Technologies

Monitoring and Logging, Datadog, GitLab, Project Planning

Performance Optimization, Incident Management, Performance Management, JIRA, ServiceNow, FreshService, Problem-solving abilities, Agile development methodologies, Team Collaboration

JIRA, Confluence

Certification

AWS Certified Solutions Architect - Associate

Languages

English
Advanced (C1)
Hindi
Advanced (C1)

Timeline

Lead DevOps Engineer

Infinite Computer Solutions
06.2023 - Current

PG Program in DevOps

Caltech University
04.2022 - Current

Senior Site Reliability Engineer

Salesforce
12.2018 - 11.2022

Tech Lead

Cognizant Technology Solutions
09.2017 - 12.2018

Application Monitoring Engineer/Group Lead

P.I Softek Ltd.
11.2013 - 09.2017

System Engineer (Monitoring)

Tumlare Software Services Pvt. Ltd.
05.2011 - 11.2013

BBA

CCS University
01.2006 - 04.2009