AIRFLOW

  • Linux
  • Python

About

Airflow is an open-source platform built to streamline and automate complex workflows through programmable task scheduling, monitoring, and orchestration. At its core, Airflow allows users to define intricate workflows as Directed Acyclic Graphs (DAGs), where tasks are represented as nodes, and dependencies between tasks are depicted as edges. This architecture facilitates the efficient management of tasks, enabling users to schedule, execute, and monitor workflows with ease. Airflow’s robust feature set includes dependency resolution, task retries, dynamic task generation, monitoring, alerting, and extensive logging capabilities. Moreover, it boasts a vast ecosystem of integrations, empowering users to seamlessly incorporate various data sources, tools, and services into their workflows. Whether orchestrating data pipelines, ETL processes, machine learning workflows, or other complex tasks, Apache Airflow provides a flexible and scalable solution for workflow automation and management.

  1. Type virtual machines in the search.
  2. Under Services, select Virtual machines.
  3. In the Virtual machines page, select Add. The Create a virtual machine page opens.
  4. In the Basics tab, under Project details, make sure the correct subscription is selected and then choose to Create new resource group. Type myResourceGroup for the name.
  5. Under Instance details, type myVM for the Virtual machine name, choose East US for your Region, and choose Ubuntu 18.04 LTS for your Image. Leave the other defaults.
  6. Under Administrator account, select SSH public key, type your user name, then paste in your public key. Remove any leading or trailing white space in your public key.
  7. Under Inbound port rules > Public inbound ports, choose Allow selected ports and then select SSH (22) and HTTP (80) from the drop-down.
  8. Leave the remaining defaults and then select the Review + create button at the bottom of the page.
  9. On the Create a virtual machine page, you can see the details about the VM you are about to create. When you are ready, select Create.

It will take a few minutes for your VM to be deployed. When the deployment is finished, move on to the next section.
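If you prefer the Azure CLI, the portal steps above can be sketched roughly as follows. The names (myResourceGroup, myVM, azureuser) mirror the steps above; adjust them to your subscription, and note that flags and image aliases vary across CLI versions.

```shell
# Create the resource group from step 4.
az group create --name myResourceGroup --location eastus

# Create the VM from steps 5-6. The portal steps use Ubuntu 18.04 LTS;
# pick whichever Ubuntu LTS image alias your CLI version offers.
az vm create \
  --resource-group myResourceGroup \
  --name myVM \
  --image Ubuntu2204 \
  --admin-username azureuser \
  --generate-ssh-keys

# SSH (22) is typically opened by default for Linux images;
# open HTTP (80) explicitly to match step 7.
az vm open-port --resource-group myResourceGroup --name myVM --port 80
```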

Connect to virtual machine

Create an SSH connection with the VM.

  1. Select the Connect button on the overview page for your VM.
  2. In the Connect to virtual machine page, keep the default options to connect by IP address over port 22. In Login using VM local account a connection command is shown. Select the button to copy the command. The following example shows what the SSH connection command looks like:


ssh azureuser@10.111.12.123

  3. Using the same bash shell you used to create your SSH key pair (you can reopen the Cloud Shell by selecting >_ again or going to https://shell.azure.com/bash), paste the SSH connection command into the shell to create an SSH session.

Usage/Deployment Instructions

Step 1:  Access Airflow in the Azure Marketplace and click on the Get It Now button.

Click on Continue and then click on Create.

Step 2: To create the virtual machine, enter or select appropriate values for the region, machine size, resource group, and so on, as per your choice.

Click on Review + create.

Step 3:  The window below confirms that the VM was deployed.

Step 4:  Open port 8080 in the network security group by going to the resource group:

  1. Select your network security group.
  2. Select Inbound security rules from the left menu, then select Add.
  3. You can limit the Source as needed or leave the default of Any.
  4. Leave the Source port ranges at the default of *.
  5. You can limit the Destination as needed or leave the default of Any.
  6. Set the Destination port ranges to 8080 and choose TCP as the Protocol. You can also pick a predefined Service from the drop-down if one matches the port you want to open.
  7. Optionally, change the Priority or Name. The priority affects the order in which rules are applied: the lower the numerical value, the earlier the rule is applied.
  8. Select Add to create the rule.
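The same rule can be created from the Azure CLI. The NSG name below (myVM-nsg) is a placeholder; check your resource group for the actual name of the security group attached to the VM.

```shell
# Allow inbound TCP 8080 (the Airflow webserver port) through the NSG.
az network nsg rule create \
  --resource-group myResourceGroup \
  --nsg-name myVM-nsg \
  --name AllowAirflowWebUI \
  --priority 1010 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --destination-port-ranges 8080
```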

Step 5: Open PuTTY and connect to your machine. Enter the IP address of the running virtual machine.

Step 6: Log in with the username and password that you provided during machine creation.
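The steps below assume Airflow is already installed inside a virtual environment named airflow_env (which is activated later in this guide). If it is not, a typical installation, following the official constraints-file approach, looks roughly like this:

```shell
# Create and activate the virtual environment used later in this guide.
python3 -m venv airflow_env
source airflow_env/bin/activate

# Install Airflow pinned against its official constraints file
# (the version numbers here are examples; pick the ones you need).
AIRFLOW_VERSION=2.7.3
PYTHON_VERSION="$(python -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
pip install "apache-airflow==${AIRFLOW_VERSION}" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

# Initialize the metadata database before first use.
airflow db init
```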

Creating a New Airflow User

  • A new user must be created on the first startup of Airflow.
  • This is done with the help of the “users create” command.
  • To create a new user named admin with the Admin role, run the following command:

airflow users create --username admin --password your_password --firstname your_first_name --lastname your_last_name --role Admin --email your_email@domain.com

 

  • Run the following command to check if the user was created successfully:

airflow users list

Running the Airflow Scheduler and Webserver

  • Now start the Airflow scheduler by running the airflow scheduler command after activating the virtual environment:

$ airflow scheduler

  • Open a new terminal, activate the virtual environment, go to the airflow directory, and start the webserver:

$ source airflow_env/bin/activate

$ cd airflow

$ airflow webserver
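Optionally, both processes can be run detached so they keep running after the SSH session closes, using the daemon flag (this assumes the virtual environment is active):

```shell
# Run scheduler and webserver in the background as daemons.
airflow scheduler --daemon
airflow webserver --daemon --port 8080
```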

 

  • Once the scheduler and webserver are initialized, open any browser and go to http://IP-Address:8080/.
  • Port 8080 is the default port for the Airflow webserver, and you should see the following page:

  • If the page doesn’t load, or the port is already occupied by another program, open the airflow.cfg file and change the web_server_port setting.
  • After logging in with the Airflow username and password created earlier, you should see the following webserver UI.

  • These are example DAGs that ship with Airflow; you will see them when you log in for the first time.
  • If you see this page, congratulations! You have successfully installed Apache Airflow on your system.
  • You can explore the UI, experiment with the DAGs, and see how they work.
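If port 8080 ever appears to be occupied, one way to check what is listening on it (ss is part of iproute2, present on most modern Linux distributions):

```shell
# Report whether another process is already listening on port 8080.
if ss -ltn | grep -q ':8080 '; then
  echo "port 8080 is in use"
else
  echo "port 8080 is free"
fi
```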


Until now, small developers did not have the capital to acquire massive compute resources and ensure they had the capacity they needed to handle unexpected spikes in load. Amazon EC2 enables any developer to leverage Amazon’s own benefits of massive scale with no up-front investment or performance compromises. Developers are now free to innovate knowing that no matter how successful their businesses become, it will be inexpensive and simple to ensure they have the compute capacity they need to meet their business requirements.

The “Elastic” nature of the service allows developers to instantly scale to meet spikes in traffic or demand. When computing requirements unexpectedly change (up or down), Amazon EC2 can instantly respond, meaning that developers have the ability to control how many resources are in use at any given point in time. In contrast, traditional hosting services generally provide a fixed number of resources for a fixed amount of time, meaning that users have a limited ability to easily respond when their usage is rapidly changing, unpredictable, or is known to experience large peaks at various intervals.

 

Traditional hosting services generally provide a pre-configured resource for a fixed amount of time and at a predetermined cost. Amazon EC2 differs fundamentally in the flexibility, control and significant cost savings it offers developers, allowing them to treat Amazon EC2 as their own personal data center with the benefit of Amazon.com’s robust infrastructure.

Moreover, many hosting services don’t provide full control over the compute resources being provided. Using Amazon EC2, developers can choose not only to initiate or shut down instances at any time, they can completely customize the configuration of their instances to suit their needs – and change it at any time. Most hosting services cater more towards groups of users with similar system requirements, and so offer limited ability to change these.

Finally, with Amazon EC2 developers enjoy the benefit of paying only for their actual resource consumption – and at very low rates. Most hosting services require users to pay a fixed, up-front fee irrespective of their actual computing power used, and so users risk overbuying resources to compensate for the inability to quickly scale up resources within a short time frame.

 

You have complete control over the visibility of your systems. The Amazon EC2 security systems allow you to place your running instances into arbitrary groups of your choice. Using the web services interface, you can then specify which groups may communicate with which other groups, and also which IP subnets on the Internet may talk to which groups. This allows you to control access to your instances in our highly dynamic environment. Of course, you should also secure your instance as you would any other server.

 

Highlights

  • Programmable.
  • Workflow Orchestration.
  • Task Scheduling.
  • Dependency Resolution.
  • Monitoring and Alerting.
  • Dynamic Task Generation.

Application Installed