ETL Developer (Media Measurement)

Minsk

Our client is a company that pioneers the future of cross-platform media measurement, arming organizations with the insights they need to make decisions with confidence. Central to this aim are its people, who work together to simplify the complex on behalf of clients and partners.

It is a trusted partner for planning, transacting, and evaluating media across platforms. With a data footprint that combines digital, linear TV, over-the-top, and theatrical viewership intelligence with advanced audience insights, its platform allows media buyers and sellers to quantify multiscreen audience behavior and make business decisions with confidence.

Project Description

You’ll be responsible for building a next-generation data delivery platform. Our application is the main television data processor and supplier for a broad range of clients and products, including industry-leading ad agencies and national television networks. As a member of this fast-moving team, you’ll have a large impact on the evolution and adoption of the data processing platform as well as on the success of the business. It’s worth mentioning that the company processes and stores dozens of petabytes of data coming from the web, and its current infrastructure handles 15 billion requests per day.

Our processing pipeline consists of several steps, with the ETL implemented in Java using Spark as the main tool. Each step produces a dataset stored on S3 as Parquet files, with custom batching logic on top. The end result of the workflow is a set of custom aggregations that later back the API calls supporting reports generated in front-end applications. Between the main logical steps there are additional steps that update cloud query services such as Athena or deliver data to further endpoints. Workflow scheduling and monitoring are handled via Airflow; a rough sketch of what the individual steps look like follows below.

Given the above, the tasks mostly involve adjusting the Java ETL code, tech-debt improvements including performance optimization, and investigating data inconsistencies (including comparisons against the legacy system). Scrum is used as the development methodology. Tickets are not strictly assigned: you can always pick whichever one from the backlog interests you.
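As a rough illustration only (not the client's actual code), one such ETL step might look like this: a Java Spark job that reads an upstream Parquet dataset from S3, aggregates it, and writes the result back as partitioned Parquet. All class, bucket, and column names here are hypothetical.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.sum;

// Minimal sketch of a single ETL step: read a Parquet dataset from S3,
// aggregate it, and write the result back as Parquet for downstream steps.
// Bucket, column, and class names are illustrative assumptions.
public final class ViewershipAggregationStep {

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("viewership-aggregation-step")
                .getOrCreate();

        // Input dataset produced by an upstream step (hypothetical path).
        Dataset<Row> events = spark.read()
                .parquet("s3a://example-bucket/viewership/events/");

        // Roll raw events up into per-network daily totals.
        Dataset<Row> daily = events
                .groupBy(col("network_id"), col("event_date"))
                .agg(sum(col("view_seconds")).as("total_view_seconds"));

        // Write the aggregation partitioned by date so that query engines
        // such as Athena can prune partitions.
        daily.write()
                .mode("overwrite")
                .partitionBy("event_date")
                .parquet("s3a://example-bucket/viewership/daily-aggregates/");

        spark.stop();
    }
}

The in-between steps that refresh Athena can be sketched in the same spirit: after new partitions land on S3, a task registers them, for example by issuing MSCK REPAIR TABLE through the AWS SDK for Java. Again, the database, table, and bucket names are assumptions.

import software.amazon.awssdk.services.athena.AthenaClient;
import software.amazon.awssdk.services.athena.model.QueryExecutionContext;
import software.amazon.awssdk.services.athena.model.ResultConfiguration;
import software.amazon.awssdk.services.athena.model.StartQueryExecutionRequest;

// Sketch of an in-between step: refresh Athena partition metadata after a
// Spark job has written new Parquet partitions. Names are hypothetical.
public final class AthenaPartitionRefresh {

    public static void main(String[] args) {
        try (AthenaClient athena = AthenaClient.create()) {
            StartQueryExecutionRequest request = StartQueryExecutionRequest.builder()
                    .queryString("MSCK REPAIR TABLE daily_aggregates")
                    .queryExecutionContext(QueryExecutionContext.builder()
                            .database("viewership_db")
                            .build())
                    .resultConfiguration(ResultConfiguration.builder()
                            .outputLocation("s3://example-bucket/athena-query-results/")
                            .build())
                    .build();

            // Athena executes the statement asynchronously; a production task
            // would poll getQueryExecution until the query completes.
            athena.startQueryExecution(request);
        }
    }
}

In the real workflow, Airflow schedules and monitors steps like these (typically by submitting the packaged jar via spark-submit); the sketches above only show the shape of the individual tasks.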

What You’ll Do
  • Work within an agile team to develop new ETL processes;
  • Recommend and implement creative solutions for improving performance;
  • Increase scalability and maintainability to support rapid usage growth;
  • Collaborate openly with stakeholders and clients to continuously improve the product and increase adoption;
Technologies
Java 8+
Apache Spark
ETL
AWS
Apache Airflow
Apache Parquet
Bash
Python
YARN
Jenkins
Gradle
Job Requirements
  • Experience in Java development; prior work with ETL (or ELT) would be beneficial;
  • Experience with Apache Spark;
  • Experience building, deploying, and managing applications in AWS is preferred;
  • Experience with functional languages like Scala would be a benefit;
  • Strong SQL skills are nice to have;
  • Strong communication skills (written and verbal) along with a track record of success delivering large software projects;
  • Demonstrated knowledge of commonly used software engineering concepts, practices, and procedures;
Write to us.
We will definitely reply!
Apply via: linkedin.com hh.ru
