Position: Entry level

Job type: Full-time


Job description

Canada | Remote | Work from Home

Why you?

Are you an Infrastructure Specialist, Site Reliability Engineer or DevOps Engineer with strong experience in the Hadoop ecosystem? Do you know how to deploy, upgrade, create disaster recovery plans, and perform system and ecosystem tuning? What about infrastructure architecture, performance analysis, deployment automation, and intelligent monitoring? Do you have solid experience with Site Reliability Engineering and DevOps tooling, processes and best practices? Have you used infrastructure as code (e.g. CloudFormation, Terraform)? Do you know CI tooling such as Maven, Nexus or Jenkins? How about setting up a Kafka cluster?

This is much more than a Hadoop Administrator role; it is a true SRE job. At Pythian, our team is focused on Hadoop service operations and open source, cloud-enabled infrastructure architecture. If you Love Your Data™, enjoy solving complex technical problems and want to Love Your Career, then this could be the job for you!

What will you be doing?
  • As an SRE on the Hadoop team, you will: deploy, operate, maintain, secure and administer solutions that contribute to the operational efficiency, availability, performance and visibility of Pythian customers’ infrastructure and Hadoop platform services, across multiple vendors (e.g. Cloudera, Hortonworks, MapR).
  • Gather information and provide performance and root cause analytics and remediation planning for faults, errors, configuration warnings and bottlenecks within our customers’ infrastructure, applications and Hadoop ecosystems.
  • Deliver well-constructed, explanatory technical documentation for architectures that we develop, and plan service integration, deployment automation and configuration management according to business requirements within the infrastructure and Hadoop ecosystem.
  • Understand distributed Java container applications, their tuning, monitoring and management; such as logging configuration, garbage collection and heap size tuning, JMX metric collection and general parameter-based Java tuning.
  • Observe and provide feedback on the current state of the client’s infrastructure, and identify opportunities to improve resiliency, reduce the occurrence of incidents and automate repetitive administrative and operational tasks.
  • Contribute heavily to the development of deployment automation artifacts, such as images, recipes, playbooks, templates, configuration scripts and other open source tooling.
  • Be conversant with cloud architecture, service integrations, and operational visibility on common cloud platforms (AWS, Azure, Google).
  • Understand ecosystem deployment options and how to automate them via API calls (a huge asset).
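
The deployment-automation artifacts mentioned above (recipes, playbooks, templates, configuration scripts) often boil down to rendering configuration files from parameters. As a minimal illustrative sketch only (the function name and property values below are hypothetical examples, not a recommended configuration or Pythian tooling), the templating step might look like this in Python:

```python
# Illustrative sketch: render a Hadoop-style *-site.xml body from a dict of
# parameters, the kind of step a playbook or recipe automates. The property
# names/values used below are examples only, not recommended settings.

def render_hadoop_site(properties: dict) -> str:
    """Render a *-site.xml configuration body from property name -> value."""
    entries = "\n".join(
        f"  <property>\n    <name>{name}</name>\n    <value>{value}</value>\n  </property>"
        for name, value in sorted(properties.items())
    )
    return f"<configuration>\n{entries}\n</configuration>\n"

if __name__ == "__main__":
    xml = render_hadoop_site({
        "dfs.replication": "3",
        "dfs.namenode.name.dir": "/data/nn",
    })
    print(xml)
```

In practice this rendering is usually delegated to a configuration management framework's template engine; the point is that cluster configuration becomes a function of declared parameters rather than hand-edited files.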
What do we need from you?
  • While we realise you might not have everything on this list, the successful candidate for the Hadoop SRE job will likely have experience in similar roles and most or all of the following skills:
  • A strong understanding of the end-to-end operations of complex Hadoop-based ecosystems, and the ability to handle and configure core technologies such as HDFS, MapReduce, YARN, HBase, ZooKeeper and Kafka.
  • Understand the dependencies and interactions between these core components, alternative configurations (e.g. MRv2 vs. Spark, scheduling in YARN), availability characteristics and service recovery scenarios.
  • Be able to identify workflow and job pipeline characteristics and tune the ecosystem to support high performance and scalability, from the infrastructure platform through to the application layers in the ecosystem.
  • Enable metric collection at all layers of a complex infrastructure, ensuring good visibility for engineering and troubleshooting tasks, and ensure end to end monitoring of critical ecosystem components and workflows.
  • Bring strong knowledge of the Hadoop toolset, how to manage and copy data between and within a Hadoop cluster, integrate with other ecosystems (for instance, cloud storage), configure replication and plan backups and resiliency strategies for data on the cluster.
  • Have comprehensive systems hardware and network troubleshooting experience in physical, virtual and cloud platform environments, including the operation and administration of virtual and cloud infrastructure provider frameworks. Experience with at least one virtualization and one cloud provider (for instance, VMWare, AWS) is required.
  • Experience with the design, development and deployment of at least one major configuration management framework (e.g. Puppet, Ansible, Chef).
  • A solid understanding of infrastructure-as-code deployment (e.g. CloudFormation, Terraform, OpsWorks).
  • Knowledge of DevOps tools, processes, and culture (e.g. Git, continuous integration, test-driven development, Scrum).
  • Hands-on knowledge of job automation and monitoring tools such as Grafana, Ganglia, Kibana and Nagios.
  • Be able to pick up new technologies and ecosystem components quickly, and establish their relevance, architecture and integration with existing systems.
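
The tuning skills above (workload-aware sizing from the platform up through the application layers) frequently involve simple sizing arithmetic. The sketch below is a hypothetical illustration of one widely circulated YARN memory-sizing rule of thumb; it is an assumption for illustration, not Pythian's methodology, and real clusters would be sized from measured workloads:

```python
# Illustrative sketch of parameter-based YARN memory sizing. The heuristic
# (divide usable RAM into fixed-size containers) mirrors a common rule of
# thumb and is an assumption here, not a recommended production method.

def yarn_memory_settings(total_ram_gb: int, reserved_gb: int,
                         min_container_gb: int = 2) -> dict:
    """Derive per-node YARN memory properties (in MB) from node RAM."""
    usable_gb = total_ram_gb - reserved_gb          # RAM left after OS/daemons
    containers = max(1, usable_gb // min_container_gb)
    per_container_gb = usable_gb // containers
    return {
        "yarn.nodemanager.resource.memory-mb": usable_gb * 1024,
        "yarn.scheduler.minimum-allocation-mb": per_container_gb * 1024,
        "yarn.scheduler.maximum-allocation-mb": usable_gb * 1024,
    }

if __name__ == "__main__":
    # A 64 GB node reserving 8 GB for the OS and Hadoop daemons.
    print(yarn_memory_settings(total_ram_gb=64, reserved_gb=8))
```

Encoding this kind of arithmetic in code (rather than spreadsheets) is what lets the tuning be reviewed, versioned and applied consistently across a fleet.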
What do you get in return?
  • Love your career: Competitive total rewards package with an annual bonus
  • Love your development: Hone your skills or learn new ones with our substantial training allowance; participate in professional development days, attend conferences, become certified, whatever you like!
  • Love your work/life balance: Why commute? Work remotely from your home (forever); there’s no daily travel requirement to an office. You can be located anywhere in Canada; all you need is a stable internet connection.
  • Love your workspace: We give you all the equipment you need to work from home including a laptop with your choice of OS, and an annual budget to personalise your work environment!
  • Love your community: Blog during work hours; take a day off and volunteer for your favorite charity.
Why Pythian?

Pythian excels at helping businesses use their data and cloud to transform how they compete and win in this ever-changing environment by delivering advanced on-prem, hybrid, cloud and multi-cloud solutions to solve the toughest data challenges faster and better than anyone else. Founded and headquartered in Ottawa, Canada in 1997, Pythian now has more than 370 employees located around the globe, with over 350 clients spanning industries from SaaS, media, gaming, financial services, e-commerce and more. Pythian is known for its technology-enabled data expertise covering everything from ETL to ML. We pride ourselves on our ability to deliver innovative solutions that meet the specific data goals of each client, and have built meaningful partnerships with the major cloud vendors AWS, Google and Microsoft. The powerful combination of our extensive expertise in data and cloud and our ability to keep on top of the latest bleeding-edge technologies makes us the perfect partner to help mid- and large-sized businesses transform to stay ahead in today’s rapidly changing digital economy. If you are a Hadoop Site Reliability Engineer or DevOps Engineer, live in Canada or the US, love your data and want to love your career, then join us!

Disclaimer

For this role, an equivalent combination of education and experience that results in a demonstrated ability to apply these skills will also be considered.

Pythian is an equal opportunity employer and welcomes applications from people with disabilities. Accommodations are available upon request for candidates taking part in all aspects of the selection process.

The successful applicant will need to fulfill the requirements necessary to obtain a background check.

Applicants must be legally authorized to work in their country of residence permanently. Pythian will not relocate, sponsor, or file petitions of any kind on behalf of a foreign worker to gain a work visa, become a permanent resident based on a permanent job offer, or to otherwise obtain authorization to work.

Love Your Data™ is a registered trademark of Pythian Services Inc.

No recruitment agencies


Deadline: 21-06-2024
