Data Engineering Zoomcamp | Week 2.1. Mage Intro and Setup
Let's dive into what Mage is, how we can use it, and how to configure it.
What is Mage?
Mage is an open-source pipeline tool for orchestrating, transforming, and integrating data 👷‍♂️
Projects
: Sort of like your home base. A single Mage instance can contain multiple projects.
Pipelines
:
- Similar to DAGs or data workflows.
- Each pipeline is represented by a YAML file in the "pipelines" folder of a project.
- Because pipelines are just files, we can create them dynamically with code-based automation.
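Since a pipeline is just a YAML file, generating pipelines from code can be sketched roughly as below. This is a minimal illustration, not Mage's own tooling: the `create_pipeline_file` helper, the `pipelines/<name>/metadata.yaml` layout, and the field names in the template are all assumptions made for the example, so compare them with a pipeline created through the UI before relying on them.

```python
from pathlib import Path
import tempfile

# Hypothetical template: these field names are assumptions for illustration,
# not a verified copy of Mage's pipeline schema.
PIPELINE_TEMPLATE = """\
name: {name}
uuid: {name}
type: python
blocks: []
"""


def create_pipeline_file(project_dir: Path, name: str) -> Path:
    """Write a minimal pipeline YAML file under <project_dir>/pipelines/<name>/."""
    pipeline_dir = project_dir / "pipelines" / name
    pipeline_dir.mkdir(parents=True, exist_ok=True)
    target = pipeline_dir / "metadata.yaml"
    target.write_text(PIPELINE_TEMPLATE.format(name=name), encoding="utf-8")
    return target


# Generate one pipeline per (made-up) source table in a throwaway directory.
project = Path(tempfile.mkdtemp()) / "magic-zoomcamp"
created = [create_pipeline_file(project, f"load_{table}") for table in ("trips", "zones")]
for path in created:
    print(path.relative_to(project))
```

The loop at the end is the point: anything you can express as a list (tables, regions, dates) can be turned into a set of pipeline files.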
Blocks
: The atomic units that make up a transformation in Mage. They can be written in Python, SQL, or R. We use blocks to perform loading, transforming, and exporting. The function inside a block must return a DataFrame, and the block's test is run on that output DataFrame. When a block runs, the function in the block is what gets called, much like a main() function. Code outside it, such as imports, still runs, but it does not return anything.
Mage brings a lot of unique functionality out of the box: sensors that can trigger on some event, conditionals with branching logic (if/else), dynamics that can create dynamic children, and webhooks.
- Hybrid Environment
- Use the GUI for interactive development (or don't, and just use VS Code!)
- Use blocks as testable, reusable pieces of code.
Configuring Mage
As a prerequisite, we need Docker installed.
Let's get started
This repo contains a Docker Compose template for getting started with a new Mage project. It requires Docker to be installed locally. If Docker is not installed, please follow the instructions here.
You can start by cloning the repo:
git clone https://github.com/mage-ai/mage-zoomcamp.git mage-zoomcamp
Navigate to the repo:
cd mage-zoomcamp
Rename dev.env to simply .env. This ensures the file is not committed to Git by accident, since it will contain credentials in the future.
Now, let's build the container:
docker compose build
Finally, start the Docker container:
docker compose up
Now, navigate to http://localhost:6789 in your browser! Voila! You're ready to get started with the course.
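If the page does not load right away, the container may still be starting. A small check like the one below can confirm whether anything is answering on that port. It is only a sketch using the standard library: `mage_ui_is_up` is a hypothetical helper, and it only tests that some HTTP server responds at the URL, not that the server is Mage.

```python
import http.client
import urllib.error
import urllib.request


def mage_ui_is_up(url: str = "http://localhost:6789", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers successfully at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("Mage UI reachable:", mage_ui_is_up())
```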
What just happened?
We just initialized a new Mage repository. It will be present in your project under the name magic-zoomcamp. If you changed the variable PROJECT_NAME in the .env file, it will be named whatever you set it to.
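If you are unsure which folder name to expect, you can read PROJECT_NAME straight from the .env file. The sketch below assumes the file uses plain KEY=VALUE lines and falls back to magic-zoomcamp when the variable is missing. `read_project_name` is a hypothetical helper, and the demo writes a throwaway .env file rather than touching your real one.

```python
from pathlib import Path
import tempfile

DEFAULT_PROJECT_NAME = "magic-zoomcamp"


def read_project_name(env_path: Path) -> str:
    """Return PROJECT_NAME from a KEY=VALUE file, or the default if absent."""
    if not env_path.exists():
        return DEFAULT_PROJECT_NAME
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "PROJECT_NAME":
            return value.strip().strip("'\"") or DEFAULT_PROJECT_NAME
    return DEFAULT_PROJECT_NAME


# Demo against a throwaway .env file rather than the real one.
demo_env = Path(tempfile.mkdtemp()) / ".env"
demo_env.write_text("PROJECT_NAME=my-mage-project\n", encoding="utf-8")
print(read_project_name(demo_env))  # my-mage-project
```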
This repository should have the following structure:
.
├── mage_data
│ └── magic-zoomcamp
├── magic-zoomcamp
│ ├── __pycache__
│ ├── charts
│ ├── custom
│ ├── data_exporters
│ ├── data_loaders
│ ├── dbt
│ ├── extensions
│ ├── interactions
│ ├── pipelines
│ ├── scratchpads
│ ├── transformers
│ ├── utils
│ ├── __init__.py
│ ├── io_config.yaml
│ ├── metadata.yaml
│ └── requirements.txt
├── Dockerfile
├── README.md
├── dev.env
├── docker-compose.yml
└── requirements.txt
Assistance
- Mage Docs: a good place to understand Mage functionality or concepts.
- Mage Slack: a good place to ask questions or get help from the Mage team.
- DTC Zoomcamp: a good place to get help from the community on course-specific inquiries.
- Mage GitHub: a good place to open issues or feature requests.