Ben Ryves is a Staff Software Engineer based in our Zurich office. As part of his work on the Marketing Platform Team, he builds and maintains infrastructure across our architecture. Here, he explains how the team supported a migration to the newest version of Apache Airflow in a way that minimized risk and ensured a smooth transition for other user teams.

At GetYourGuide we use Apache Airflow for scheduling the majority of our data generation and transformation tasks. If you're not familiar with Airflow, it allows engineers to write Python files which express operators and compose them together into a directed acyclic graph (DAG). Alongside a schedule, this allows us to ensure that tasks run in a periodic and predictable fashion, such that the dependencies between all of our DAGs are properly encoded.

Recently we migrated to the latest major version of Airflow in order to take advantage of architectural and performance improvements, as well as to ensure that we could use the most up-to-date provider code. While necessary, this move required a large number of modifications to our DAGs due to breaking changes in the Airflow API, as well as in various providers which we use. It also required coordination between all the teams using Airflow, because DAGs often depend on other DAGs. This blog details how we organized that migration, and the tools that we put in place in order to reduce the risk.

## Enabling Migration Between Environments in Both Directions

When planning the migration, one of the key things we wanted to ensure was that teams could migrate at their own pace within the migration window. This was important both to avoid rushing through any changes and to give teams time to test those changes. Additionally, we wanted to make it possible to easily switch DAGs on and off in both environments, so that if there was an issue it was operationally easy to swap DAGs temporarily back to the previous environment. To enable this, it was essential to extend Airflow's ability to depend on tasks in other DAGs so that our DAGs could also depend on tasks in other clusters. That is, we wanted tasks in our old Airflow 1 cluster to be able to depend on tasks in the new Airflow 2 cluster, and vice versa. Further, we wanted this process to be automatic, such that when writing or migrating a DAG it wasn't necessary for either DAG to know which cluster the other was in. That way, teams could move their DAGs without having to notify other teams or write migration-specific code to handle DAGs living in other clusters.

To achieve this we implemented a new type of cluster-independent sensor which allows waiting for upstream tasks and DAGs to finish executing, which we call a CrossAirflowSensor. This sensor relies on a service outside of Airflow which acts as a registry for DAGs. Every time a DAG or task runs in either environment, a callback executes after the task finishes which registers the start time, DAG name, and task name. This ensures that the service has a complete picture of all tasks running in both environments. Then, when a sensor is scheduled, its poke method queries the external task registry service, and completes only when the expected task is found to have completed.

While simple, this sensor enabled us to quickly move DAGs back and forth between environments, and completely decoupled our teams from the migrations of other teams, meaning that we could migrate DAGs iteratively between environments at a measured pace. Further, when we did run into operational issues with the new environment, the sensor enabled us to quickly and safely revert to the previous setup while those issues were being resolved, allowing us to avoid downtime.

## Automating the Process of Resolving DAG and Task Names

One common pattern for sensors in our Airflow environment is to depend on specific tasks and DAGs. Typically this is done by referencing the `dag_id` and `task_id` in the sensor definition.
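The callback-and-poke flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the real registry is an HTTP service outside Airflow, whereas here an in-memory stand-in keeps the sketch self-contained, and every name other than `CrossAirflowSensor` (the registry client, its methods, the callback) is hypothetical.

```python
# Fall back to a stub so the sketch also runs outside an Airflow installation.
try:
    from airflow.sensors.base import BaseSensorOperator
except ImportError:
    class BaseSensorOperator:
        def __init__(self, **kwargs):
            pass


class TaskRegistryClient:
    """Stand-in for the external task registry service.

    In production this would wrap HTTP calls to the registry service;
    an in-memory dict keeps the example self-contained.
    """

    def __init__(self):
        self._completed = {}

    def register_completion(self, dag_id, task_id, start_date):
        # Record that a task finished, keyed by (dag_id, task_id).
        self._completed[(dag_id, task_id)] = start_date

    def is_complete(self, dag_id, task_id):
        return (dag_id, task_id) in self._completed


registry = TaskRegistryClient()


def register_task_completion(context):
    """Post-task callback: report the finished task to the registry.

    Attached to tasks in both clusters (e.g. as an on_success_callback),
    so the registry sees every run in every environment.
    """
    ti = context["ti"]
    registry.register_completion(ti.dag_id, ti.task_id, ti.start_date)


class CrossAirflowSensor(BaseSensorOperator):
    """Completes once the upstream task appears in the registry,
    regardless of which Airflow cluster it ran in."""

    def __init__(self, *, external_dag_id, external_task_id, **kwargs):
        super().__init__(**kwargs)
        self.external_dag_id = external_dag_id
        self.external_task_id = external_task_id

    def poke(self, context):
        # Called on each scheduled check; True ends the wait.
        return registry.is_complete(self.external_dag_id, self.external_task_id)
```

Because the registry is shared by both clusters, a DAG declaring this sensor never needs to know where its upstream DAG currently lives, which is what lets teams move DAGs independently.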