C1X Inc. is a fast-growing, global technology company headquartered in San Jose, US, with offices in Chennai and Tokyo. Our mission is to simplify and innovate in digital marketing by building unique, large-scale data products. We are a world-class engineering team spanning frontend (Angular), backend (API / Java / Node.js), mobile (Android/iOS), and Big Data engineering, delivering compelling products.
You will be a key member of the engineering team, responsible for shaping our data products. You'll have the opportunity to work with C-level staff, business stakeholders, product managers, and engineering teams to define the next generation of customer data management and marketing software.
Responsibilities
Design and implement end-to-end distributed data processing pipelines using Spark, Hive, Python, and other tools and languages prevalent in the Hadoop ecosystem on AWS (a minimal sketch follows this list).
Build utilities, user-defined functions, and frameworks to better enable data flow patterns.
Work with architecture/engineering leads and other teams to ensure quality solutions are implemented, and engineering best practices are defined and adhered to. Create and maintain data documentation and definitions.
Work with the product owner and scrum master to understand requirements, help the team plan and execute sprint tickets, and work with other technical teams to develop new features.
Help drive best practices in continuous integration and delivery, and in data quality.
Manage data flows and set up automation between various data sources.
Help drive optimization, testing, and tooling of the data products.
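For illustration only, here is a minimal PySpark sketch of the kind of batch pipeline described above; the S3 bucket, paths, and column names are hypothetical placeholders, not actual C1X systems.

# Read raw events from S3, aggregate per campaign, write partitioned Parquet for Hive.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily_event_rollup").getOrCreate()

# Hypothetical landing location for raw JSON events.
events = spark.read.json("s3://example-bucket/raw/events/dt=2024-01-01/")

# Count impressions and clicks per campaign.
rollup = (
    events
    .filter(F.col("event_type").isin("impression", "click"))
    .groupBy("campaign_id", "event_type")
    .count()
)

# Write Parquet partitioned by event type for downstream Hive queries.
rollup.write.mode("overwrite").partitionBy("event_type").parquet(
    "s3://example-bucket/curated/event_rollup/dt=2024-01-01/"
)

spark.stop()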
Qualifications
B.E. in Computer Science/Engineering or equivalent.
Strong demonstrable skills in two of the following programming languages: Python, Scala, or Java.
Minimum of 4 years' experience working on full-life-cycle Big Data production projects.
Experience with AWS services such as EMR, Lambda, S3, and DynamoDB.
Strong experience in processing and analyzing Big Data using Spark, MapReduce, Hadoop, Sqoop, Apache Airflow, HDFS, Hive, ZooKeeper, etc.
Familiarity with Docker, Airflow, or equivalent data pipeline and workflow management tools (see the sketch after this list), and with distributed stream processing frameworks for fast and Big Data such as Apache Spark, Flink, and Kafka Streams.
Strong skills in developing RESTful APIs.
Experience with version control using GitHub and with Jenkins CI/CD pipelines.
Intermediate to advanced knowledge of SQL. Experience with relational SQL and NoSQL databases, including Elasticsearch, Postgres, and MySQL. Experience in SQL tuning, schema design, or analytical programming.
Experience with storage formats such as Parquet, CSV, and JSON.
Experience in unit, functional and automated testing.
Strong expertise in troubleshooting production issues.
Comfortable working across a wide array of technologies in a fast-paced, results-oriented environment.
Excellent verbal and written communication skills for effective collaboration with both business and technical teams.
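As referenced in the workflow-management item above, here is a minimal sketch of scheduling such a Spark job with Apache Airflow; the DAG id, schedule, and spark-submit command are hypothetical placeholders.

# Orchestrate the daily Spark rollup with a simple Airflow DAG.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_event_rollup",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Submit the PySpark job for the execution date ({{ ds }} is Airflow's date macro).
    run_rollup = BashOperator(
        task_id="run_spark_rollup",
        bash_command="spark-submit --master yarn jobs/daily_event_rollup.py {{ ds }}",
    )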