SITE RELIABILITY ENGINEER (Part-Time Job)
Job Description: THE IMPACT YOU WILL MAKE
The Service Reliability Engineering (SRE) Senior Associate role will offer you the flexibility to make each day your own, while working alongside people who care, so that you can deliver on the following responsibilities:
Independently determine the needs of the customer and create solution frameworks.
Design and develop moderately complex software solutions to meet needs.
Use a process-driven approach in designing and developing solutions.
Implement new software technology and coordinate end-to-end tasks across the team.
May maintain or oversee the maintenance of existing software.
Qualifications:
3-5+ years of relevant professional experience
Experience as a full-stack developer with hands-on knowledge of languages such as Java and Python, and exposure to application and infrastructure architecture
Excellent verbal and written communication skills with experience presenting information and/or ideas to an audience
Experience collaborating cross-functionally on availability and performance issues to identify root causes, determine areas for improvement, and drive those actions to closure through effective solutions
Adept at managing project plans, resources, and people to ensure successful project completion in an Agile/Scrum environment, and at collaborating with engineering and product teams to design and develop engineering and resiliency methodologies, including shift-left techniques for test design and automation
Knowledge of Performance and Chaos Engineering strategies and scripts with a strong emphasis on automated deployment, infrastructure automation solutions, and continuous integration & delivery processes
Ability to identify gaps in code from a non-functional viewpoint, with experience helping other developers fix the code and promoting relevant reliability pattern implementations
Skilled in establishing and maintaining the overall health, availability, performance, resiliency, and capacity of technology products, with specific experience in performance engineering and validation using tools such as JMeter and LoadRunner
Skilled in cloud technologies and cloud computing, including Amazon Web Services (AWS) offerings, development, and networking platforms
Experience defining, measuring, and improving reliability metrics (SLOs/SLIs), observability (monitoring, logging, and tracing solutions), operations processes (incident and problem management), and operational toil reduction through automation
Experience designing, building, and implementing dashboards from application and infrastructure health perspectives, using tools such as Splunk, Dynatrace, and Datadog, to provide a single-pane-of-glass view of all critical business and operational information to relevant stakeholders
Excellent analytical and problem-solving skills, with a passion for resolving issues in a timely manner
Knowledge of cloud technologies and containerization using Docker and Kubernetes
Excellent understanding and demonstrated experience in the use of DevOps and CI/CD tools such as Jenkins, Terraform, and Jules, as well as automated deployment tools
Working knowledge of at least one Unix operating system
Knowledge of performance tuning for enterprise-level Java/J2EE applications (web and application server configuration, JVM parameter tuning, GC and heap sizing, message brokers)
Experience implementing and validating resiliency design pattern frameworks
Experience in performance engineering tools – monitoring tools, performance testing tools, and analysis tools
Experience troubleshooting performance, scalability, and availability issues in production environments
As a valued colleague on our team, you will collaborate with the team in designing, producing, testing, or implementing moderately complex software, technology, or processes, as well as create and maintain IT architecture, large-scale data stores, and cloud-based systems.
You will apply your expertise in software and systems engineering to ensure that both our internally critical and externally visible systems meet the appropriate performance needs of our users. You will serve as a champion of service availability, efficiency, automation, monitoring, and capacity management. Specifically, you will leverage your skills and experience in Amazon Web Services, software development with Java and/or Python, customization in Splunk and/or Dynatrace, and automation in Selenium and/or Blue Prism (among others) to enable increased feature velocity and continuous improvement.
Please DM me for details on the next steps in the process.