Principal Programmer Analyst - (Spark, Hadoop, Hive, NiFi)

Updated: January 7, 2020
Location: Morrisville, NC, United States
Job ID: 7649


Overview

You’re driven, resourceful, and, above all else, remarkably smart.
 
You love a good challenge. You are the first to roll up your sleeves and work with relentless energy until you solve the unsolvable, beat the unbeatable, and come out on top. Passable doesn’t cut it; you’ve got fire in your belly to learn more, do more, and be more. For you, the sweetest success is shared success, and you’re known for your good nature. You’ll fit right in at Syneos Health, where we surround ourselves with the most talented and agile professionals in the industry, but we check our egos at the door.

Responsibilities

This role can be located in our Morrisville, NC, or Somerset, NJ, office, and we will consider a remote work arrangement for the right person.


Design, develop, and deliver Big Data solutions that fulfill the strategic vision for enterprise applications and successfully support the business.  Activities will include:

  • Perform the full Hadoop deployment lifecycle, from on-premises to the cloud, including installation, configuration, initial production deployment, recovery, security, and data governance.
  • Evaluate and provide technical solutions, serving in a lead role for business units that wish to design, develop, and support an information technology solution.
  • Refine raw data into actionable insights using visualization and statistics within innovative analytics applications and systems.
  • Develop applications that interact with the data in the most appropriate way, from batch to interactive SQL or low-latency access, using the latest tools; Hortonworks Data Platform (HDP) is preferred.
Most importantly, the candidate must have:

  • Apache Spark (5 years)
  • Hadoop
  • Hive (3 years)
  • Apache NiFi
  • Azure is a plus

Essential Functions:

  1. Leads implementation (installation and configuration) of HDP 2.6, including the complete cluster deployment layout with replication factors, NFS Gateway setup for accessing HDFS data, resource managers, node managers, and the various phases of MapReduce jobs.  Experience configuring workflows and deployments using tools such as Apache Oozie is necessary.
  2. Participates in the design, development, validation, and maintenance of the Big Data platform and associated applications.  Assists with architecture oversight of how the platform is built, ensuring that it supports high-volume, high-velocity data streams and scales to meet growth expectations.
  3. Monitors workflows and job execution using the Ambari UI, Ganglia, or equivalent tools.  Assists administrators with commissioning and decommissioning nodes and with backing up and recovering Hadoop data using snapshots and high availability.  A good understanding of rack awareness and topology is preferred.
  4. Develops, implements, and participates in designing Hive schemas and HBase column-family schemas within HDFS.  Experience designing Hadoop flat and star models with MapReduce impact analysis is necessary.
  5. Develops the data layer for performance-critical reporting systems.  Experience with real-time Big Data reporting systems is necessary.
  6. Recommends and assists with the design and development of HDFS and Hive data partitioning, vectorization, and bucketing with Hortonworks BigInsights query tools (see the partitioning sketch after this list).  Performs day-to-day operational tasks, using Flume and Sqoop to move data into Hadoop and out to different RDBMSs.  Expertise in Java and UNIX shell scripts to support custom functions or steps is required.
  7. Develops guidelines and plans for performance tuning of a Hadoop/NoSQL environment, with underlying impact analysis of MapReduce jobs using the cost-based optimizer (CBO) and analytical conversions.  Implements a mixed batch / near-real-time architecture to analyze, index, and publish data for applications.  Writes custom reducers that reduce the number of underlying MapReduce jobs generated from a Hive query.  Helps with cluster efficiency, capacity planning, and sizing.
  8. Develops efficient Spark and Hive scripts that join datasets using a variety of techniques, including map-side and sort-merge joins with various analytical functions (see the join and CDC sketches after this list).  Experience with advanced Hive features such as windowing, the CBO, views, ORC files, and compression techniques is necessary.  Develops jobs that capture change data capture (CDC) feeds from Hive-based internal (managed) and external tables.
  9. Partners with key internal teams, such as clinical operations and data management, to ensure that the Big Data solution identifies all the data points in upstream systems and classifies them appropriately to support analytic objectives.  Identifies and implements appropriate information delivery mechanisms that improve the decision-making capability of our customers.
  10. Designs, develops, and troubleshoots transformations to ingest and manipulate data from various sources within the company and its extended environment, using native Hadoop tools or ETL tools such as Pentaho Data Integrator with Hadoop/Hive-based data transformations.
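
As a flavor of the partitioning, vectorization, and bucketing work in item 6, here is a minimal HiveQL sketch. The clinical_events table and its columns are hypothetical, chosen purely for illustration.

  -- Hypothetical partitioned, bucketed ORC table.
  CREATE TABLE clinical_events (
    event_id BIGINT,
    site_id  INT,
    payload  STRING
  )
  PARTITIONED BY (event_date DATE)        -- prunes partitions at query time
  CLUSTERED BY (site_id) INTO 32 BUCKETS  -- co-locates rows for bucketed joins
  STORED AS ORC                           -- columnar storage with compression
  TBLPROPERTIES ('orc.compress' = 'ZLIB');

  -- Enable vectorized execution so Hive processes rows in batches.
  SET hive.vectorized.execution.enabled = true;
  SET hive.vectorized.execution.reduce.enabled = true;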
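
Similarly, for the map-side joins and windowing functions in item 8, a minimal HiveQL sketch; the visits and sites tables are again hypothetical.

  -- Allow Hive to convert the join to a map-side (broadcast) join when
  -- the smaller table fits in memory, avoiding a reduce-side shuffle.
  SET hive.auto.convert.join = true;

  -- Rank visits per site by date with a windowing function.
  SELECT /*+ MAPJOIN(s) */
         v.visit_id,
         s.site_name,
         ROW_NUMBER() OVER (PARTITION BY v.site_id
                            ORDER BY v.visit_date DESC) AS visit_rank
  FROM   visits v
  JOIN   sites  s ON v.site_id = s.site_id;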
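
Finally, one common way to implement the CDC capture mentioned in item 8 is a watermark-based incremental extract. The source_events and events_delta tables, the last_modified_ts audit column, and the last_watermark variable are all assumptions made for illustration.

  -- Pull only rows changed since the last run, assuming each row
  -- carries a last_modified_ts audit column.
  INSERT INTO TABLE events_delta
  SELECT *
  FROM   source_events
  WHERE  last_modified_ts > '${hivevar:last_watermark}';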

Other Responsibilities:

  • Designing and setting up exception-handling jobs; writing Oracle scripts, functions, stored procedures, complex SQL queries, PL/SQL analytical functions, and hierarchical parent-child queries to support application systems (see the sketch after this list).
  • Providing solutions for portal and mash-up integration, seamlessly connecting business analytics with other applications in a publisher/subscriber model.
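
A minimal sketch of the hierarchical, parent-child Oracle SQL mentioned above, combined with an analytic function; the employees table and its columns are hypothetical.

  -- Walk a parent-child hierarchy with CONNECT BY, and rank employees
  -- within each department by salary using an analytic function.
  SELECT LEVEL AS depth,
         employee_id,
         manager_id,
         RANK() OVER (PARTITION BY department_id
                      ORDER BY salary DESC) AS dept_salary_rank
  FROM   employees
  START WITH manager_id IS NULL
  CONNECT BY PRIOR employee_id = manager_id;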

Job Requirements

Along with demonstrated initiative, uncompromised integrity and a results-oriented mindset, the ideal candidate has:

  • BA/BS in computer science or a similar discipline, plus 5+ years of development experience with technologies such as Hadoop (HDP preferred) and Oracle databases.
  • A very strong SQL/data analysis or data mining background; experience with business intelligence and data warehousing; and a solid understanding of large-scale data management environments (relational and/or NoSQL), audit controls, and ETL frameworks.
  • Prior experience building scalable, distributed data processing solutions with Hadoop, using tools such as HBase (NoSQL), Hive, Spark, Solr, and Phoenix.
  • Some proficiency with the MapReduce/HDFS architecture and Linux or UNIX system management, plus experience with at least one scripting language, is required.
  • Hortonworks-certified developers are strongly preferred, though Cloudera certification is acceptable.

At Syneos Health, we believe in providing an environment and culture in which our people can thrive, develop and advance. We reward and recognize our people by providing valuable benefits and a quality of life balance.

Why Syneos Health? Join a game-changing global company that is reinventing the way therapies are developed and commercialized. Here, you’re essential in solving and executing against today’s toughest commercialization challenges facing the world’s leading healthcare companies. From the very beginning, you’ll be supported by team members who, like you, aren’t afraid to try something new. You'll gain exposure and work in a dynamic environment to create better, smarter, faster ways to get biopharmaceutical therapies to patients.


WORK HERE MATTERS EVERYWHERE | How will you accelerate biopharmaceutical commercialization?

Syneos Health companies are affirmative action/equal opportunity employers (Minorities/Females/Veterans/Disabled)

 
