Our client is a company pioneering the future of cross-platform media measurement, arming organizations with the insights they need to make decisions with confidence. Central to this aim are our people, who work together to simplify the complex on behalf of our clients and partners.
It is a trusted partner for planning, transacting, and evaluating media across platforms. With a data footprint that combines digital, linear TV, over-the-top, and theatrical viewership intelligence with advanced audience insights, its platform allows media buyers and sellers to quantify audiences' multiscreen behavior and make business decisions with confidence.
You’ll be responsible for building a next-generation data delivery platform. Our application is the main television data processor and supplier for a broad range of clients and products, including industry-leading ad agencies and national television networks. As a member of this fast-moving team, you’ll have a large impact on the evolution and adoption of the data processing platform, as well as on the success of the business. It’s worth mentioning that the company processes and stores dozens of petabytes of data coming from the Web, and its current infrastructure handles 15 billion requests per day.
Our processing consists of several steps; the ETL is implemented in Java, with Spark as the main tool. Every step produces a dataset stored on S3 as parquet files, with custom batching logic. The result of the workflow is a set of custom aggregations consumed later by API calls to support the reports generated in front-end applications. In between the main logical steps, there are several additional steps that update cloud services such as Athena or provide the data to additional endpoints. Workflow scheduling and monitoring are handled via Airflow. Given the above, the tasks mostly involve adjusting the Java ETL code, tech-debt improvements (including performance optimization), and investigating data inconsistencies (including comparisons with the legacy system). Scrum is used as the development methodology. Tickets are not assigned strictly; you can always pick whichever task from the backlog interests you.
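To give a flavor of the kind of logic involved, the custom batching mentioned above could, in a simplified plain-Java form, look like the sketch below. This is an illustrative assumption only (the class and method names are hypothetical, and the real pipeline does this with Spark when writing parquet files to S3), but it shows the basic idea of splitting a dataset into fixed-size batches:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of fixed-size batching, as might be applied to a
// dataset's records before writing them out as parquet files on S3.
// Not the actual codebase: names and shape are illustrative.
public class Batcher {

    // Splits `records` into consecutive batches of at most `batchSize` elements,
    // preserving order; the final batch may be smaller than `batchSize`.
    public static <T> List<List<T>> batch(List<T> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchSize) {
            int end = Math.min(start + batchSize, records.size());
            // Copy the sublist so each batch is independent of the source list.
            batches.add(new ArrayList<>(records.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3, 4, 5);
        System.out.println(Batcher.batch(data, 2)); // [[1, 2], [3, 4], [5]]
    }
}
```

In the real workflow, an equivalent decision (how many records or rows per output file) shapes both S3 object sizes and downstream read performance, which is why the batching is custom rather than left to defaults.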