Data Engineering Zoomcamp | Week 2.1: Mage Intro and Setup

Let's dive into what Mage is, how we can use it, and how to configure it.

What is Mage?

Mage is an open-source pipeline tool for orchestrating, transforming, and integrating data 👷‍♂️

(Figure: Mage architecture)

Projects: Your home base. A single Mage instance can contain multiple projects.

Pipelines:

  1. Similar to DAGs or data workflows.
  2. Each pipeline is represented by a YAML file in the "pipelines" folder of a project.
    1. This means we can create pipelines dynamically, using code-based automation to generate these files.
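Because a pipeline is just files on disk, one way to automate pipeline creation is to copy an existing pipeline folder and rewrite its identifiers. The sketch below assumes each pipeline lives in pipelines/<uuid>/metadata.yaml with top-level "name:" and "uuid:" keys; that layout, the clone_pipeline helper, and the example_pipeline template name are assumptions for illustration, not a documented Mage workflow.

```shell
# Hypothetical helper: create a new pipeline by copying an existing one and
# rewriting its top-level name/uuid. ASSUMES each pipeline lives in
# <project>/pipelines/<uuid>/metadata.yaml with top-level "name:" and "uuid:" keys.
clone_pipeline() {
  project_dir="$1"; template="$2"; new_name="$3"
  src="$project_dir/pipelines/$template"
  dest="$project_dir/pipelines/$new_name"
  if [ ! -f "$src/metadata.yaml" ]; then
    echo "template pipeline not found: $src" >&2
    return 1
  fi
  if [ -e "$dest" ]; then
    echo "pipeline already exists: $dest" >&2
    return 1
  fi
  cp -R "$src" "$dest"
  # Only unindented keys are rewritten; entries in the blocks list are indented.
  sed -e "s/^name: .*/name: $new_name/" -e "s/^uuid: .*/uuid: $new_name/" \
    "$src/metadata.yaml" > "$dest/metadata.yaml"
}

# Usage (hypothetical names): stamp out one pipeline per region from a template.
# for region in us eu apac; do clone_pipeline magic-zoomcamp example_pipeline "load_$region"; done
```

The copy still references the same block files as the template, so the new pipelines reuse its blocks rather than duplicating them.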
Blocks: The atomic units that make up a transformation in Mage. Blocks can be written in Python, SQL, or R, and we use them to perform loading, transforming, and exporting.

Block Anatomy

  • The function inside a block must return a DataFrame.
  • A test defined in the block runs on the output DataFrame returned by that function.
  • When a block runs, its function is the entry point, similar to a main() function. Code outside the function (e.g. imports) still runs, but only the function's return value becomes the block's output.
Features

Mage brings a lot of unique functionality out of the box:

  • Sensors that trigger on some event.
  • Conditionals with branching logic (i.e. if/else).
  • Dynamics that can create dynamic child blocks.
  • Webhooks.
  • Hybrid Environment
    • Use the GUI for interactive development (or skip it and just use VS Code!)
    • Use blocks as testable, reusable pieces of code.

Configuring Mage

As a prerequisite, we need Docker installed.

Let's get started

The mage-zoomcamp repo contains a Docker Compose template for getting started with a new Mage project. It requires Docker to be installed locally. If Docker is not installed, please follow the official Docker installation instructions.
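Before cloning, you can confirm the prerequisite is met. This is a minimal sketch assuming a POSIX shell; check_docker is a hypothetical helper name, and it checks for the "docker compose" subcommand because that is the form the commands in this guide use.

```shell
# Hypothetical helper: succeed only if the Docker CLI and the "docker compose"
# subcommand (Compose v2) are both available.
check_docker() {
  docker_bin="${1:-docker}"  # optional override of the binary name
  if ! command -v "$docker_bin" >/dev/null 2>&1; then
    echo "Docker not found on PATH: install Docker first" >&2
    return 1
  fi
  if ! "$docker_bin" compose version >/dev/null 2>&1; then
    echo "'$docker_bin compose' is not available (Docker Compose v2 needed)" >&2
    return 1
  fi
  echo "Docker and Docker Compose are available"
}

# Usage: check_docker && echo "ready to clone the repo"
```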

You can start by cloning the repo:

git clone https://github.com/mage-ai/mage-zoomcamp.git mage-zoomcamp

Navigate to the repo:

cd mage-zoomcamp

Rename dev.env to simply .env. This ensures the file is not committed to Git by accident, since it will contain credentials in the future.
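A minimal sketch of the rename step that refuses to overwrite an existing .env. promote_env_file is a hypothetical helper; the usage line relies on git check-ignore to confirm that Git will ignore .env, which assumes the repo's .gitignore lists it.

```shell
# Hypothetical helper: rename dev.env to .env without clobbering an existing .env.
# Pass the repository root (the folder created by git clone); defaults to ".".
promote_env_file() {
  repo_dir="${1:-.}"
  if [ -f "$repo_dir/.env" ]; then
    echo ".env already exists; leaving it untouched" >&2
    return 0
  fi
  if [ ! -f "$repo_dir/dev.env" ]; then
    echo "no dev.env found in $repo_dir" >&2
    return 1
  fi
  mv "$repo_dir/dev.env" "$repo_dir/.env"
}

# Usage: promote_env_file && git check-ignore -q .env && echo ".env is ignored by Git"
```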

Now, let's build the Docker image:

docker compose build

Finally, start the Docker container:

docker compose up

Now, navigate to http://localhost:6789 in your browser! Voila! You're ready to get started with the course.
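If you start the stack in the background with docker compose up -d instead, the UI can take a moment to come up. The sketch below polls the URL until it answers; wait_for_mage is a hypothetical helper, and it assumes curl is installed.

```shell
# Hypothetical helper: poll the Mage UI until it responds or give up.
# ASSUMES curl is available; defaults to the URL used in this guide.
wait_for_mage() {
  url="${1:-http://localhost:6789}"
  max_attempts="${2:-60}"
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    # -f: treat HTTP errors as failures; -s: silent; -m 2: cap each request at 2 seconds
    if curl -fs -m 2 -o /dev/null "$url"; then
      echo "Mage is up at $url"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  echo "Mage did not respond at $url after $max_attempts attempts" >&2
  return 1
}

# Usage: docker compose up -d && wait_for_mage
```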

What just happened?

We just initialized a new Mage project. It will be present in your repository under the name magic-zoomcamp. If you changed the variable PROJECT_NAME in the .env file, it will be named whatever you set it to.
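To see which folder name to expect, you can read PROJECT_NAME from .env. A rough sketch: mage_project_name is a hypothetical helper that takes the last PROJECT_NAME= line, strips quotes, and falls back to magic-zoomcamp when nothing is set. It does not handle every .env syntax (e.g. export prefixes or inline comments).

```shell
# Hypothetical helper: print the configured Mage project name.
# Falls back to "magic-zoomcamp" when the env file or the variable is missing.
mage_project_name() {
  env_file="${1:-.env}"
  name=""
  if [ -f "$env_file" ]; then
    # Last PROJECT_NAME=... wins; drop quotes and any Windows carriage returns.
    name=$(sed -n 's/^PROJECT_NAME=//p' "$env_file" | tail -n 1 | tr -d "\"'\r")
  fi
  echo "${name:-magic-zoomcamp}"
}

# Usage: ls "$(mage_project_name)"
```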

This repository should have the following structure:

.
├── mage_data
│   └── magic-zoomcamp
├── magic-zoomcamp
│   ├── __pycache__
│   ├── charts
│   ├── custom
│   ├── data_exporters
│   ├── data_loaders
│   ├── dbt
│   ├── extensions
│   ├── interactions
│   ├── pipelines
│   ├── scratchpads
│   ├── transformers
│   ├── utils
│   ├── __init__.py
│   ├── io_config.yaml
│   ├── metadata.yaml
│   └── requirements.txt
├── Dockerfile
├── README.md
├── dev.env
├── docker-compose.yml
└── requirements.txt
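To sanity-check the generated project, here is a short sketch that verifies a subset of the entries in the tree above. check_mage_layout is a hypothetical helper, and the list of checked paths is a choice made here, not something Mage requires.

```shell
# Hypothetical helper: report missing entries from a subset of the expected layout.
# Returns 0 if everything checked is present, 1 otherwise.
check_mage_layout() {
  project_dir="${1:-magic-zoomcamp}"
  missing=0
  for d in data_loaders transformers data_exporters pipelines custom; do
    if [ ! -d "$project_dir/$d" ]; then
      echo "missing directory: $project_dir/$d" >&2
      missing=1
    fi
  done
  for f in io_config.yaml metadata.yaml requirements.txt; do
    if [ ! -f "$project_dir/$f" ]; then
      echo "missing file: $project_dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_mage_layout magic-zoomcamp && echo "layout looks fine"
```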

Assistance

  1. Mage Docs: a good place to understand Mage functionality or concepts.
  2. Mage Slack: a good place to ask questions or get help from the Mage team.
  3. DTC Zoomcamp: a good place to get help from the community on course-specific inquiries.
  4. Mage GitHub: a good place to open issues or feature requests.