Prefect flows and tasks

Tasks are the most basic unit of work in Prefect. A task performs a specific function or operation. A Python function is turned into a task by applying the @task decorator. A flow is a combination of several tasks, for which one can specify a certain order of execution. A flow will contain one or more tasks… Continue reading Prefect flows and tasks

Prefect Installation: Part I

Prefect has two editions: open source and Prefect Cloud. To get going with the open-source edition, one can install Prefect with the pip installer. It is advisable to create a separate conda environment or virtual environment first. To create a virtual environment: virtualenv {{name-of-the-virtual-environment}} To activate the virtual environment: source {{name-of-the-virtual-environment}}/bin/activate One may choose to install… Continue reading Prefect Installation: Part I
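Put together, the setup might look like the following (the environment name prefect-env is just an example; the standard-library venv module is used here, though virtualenv works the same way):

```shell
# Create an isolated environment for Prefect
python3 -m venv prefect-env

# Activate it (bash/zsh)
source prefect-env/bin/activate

# Install the open-source edition
pip install prefect
```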

dbt documentation

dbt enables publishing technical documentation as a website. One can add detailed descriptions of dbt models, any relevant tests that have been included, information about sources, table columns, data types, etc... Once this information is entered, dbt then generates documentation for the given dbt project. The details are entered in the YAML file of dbt… Continue reading dbt documentation
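As a rough sketch, a model's documentation properties live in a YAML file alongside the model (the model and column names below are hypothetical):

```yaml
version: 2

models:
  - name: customers
    description: "One row per customer, built from the raw orders feed"
    columns:
      - name: customer_id
        description: "Primary key for the customer"
        tests:
          - unique
          - not_null
```

Running dbt docs generate and then dbt docs serve builds and serves the documentation site from these entries.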

sourcing in data build tool

dbt sources are where upstream data sources are declared. Sources are declared in a YAML file. One can have many sources. One can also write tests against the data being sourced. The YAML file enables one to provide detailed information about each source so that it appears in the documentation section. For instance, the source YAML… Continue reading sourcing in data build tool
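A minimal source declaration might look like this (the source and table names are invented for illustration):

```yaml
version: 2

sources:
  - name: raw_shop
    schema: raw
    description: "Tables loaded by the ingestion job"
    tables:
      - name: orders
        columns:
          - name: order_id
            tests:
              - not_null
```

Models can then refer to the table with {{ source('raw_shop', 'orders') }} instead of hard-coding the schema and table name.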

dbt models: Part II

We looked at SQL models. Similarly, a Python model is essentially a Python script that loads data into a data frame, performs various transformations using specialist packages like pandas or PySpark, and produces a resulting data frame. Example of a Python model: import pandas as pd def function1(param1, param2): df = … … … resultant_df = …… Continue reading dbt models: Part II
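The transformation step of such a model can be sketched with plain pandas (the column names here are made up; note that in an actual dbt Python model the entry point is a model(dbt, session) function that returns the data frame):

```python
import pandas as pd

def add_order_totals(orders: pd.DataFrame) -> pd.DataFrame:
    # Derive a per-line total from quantity and unit price
    result = orders.copy()
    result["total"] = result["quantity"] * result["unit_price"]
    return result
```

The function takes a data frame in and hands a transformed data frame back, which is the shape dbt expects from a Python model.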

Docker Networking

By default, Docker uses a type of network called the bridge network. Within a given Docker host, all the containers can easily communicate with each other. Their IP addresses usually start with 172.17.xxx.xxx. Port mapping is essential in the case of a web application listening on a port. This is because the bridge network provides an internal… Continue reading Docker Networking

data build tool materialization: Part II

We looked at tables and views. The other two materializations are incremental and ephemeral. Incremental loads implement a very elegant solution: during the first run, the table is populated in the data store; on any subsequent run from that point on, only the new rows are inserted, and existing rows that need any changes are changed… Continue reading data build tool materialization: Part II
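The usual shape of an incremental model in dbt is a SQL file guarded by is_incremental() (the model and column names below are hypothetical):

```sql
{{ config(materialized='incremental') }}

select * from {{ ref('stg_events') }}

{% if is_incremental() %}
  -- on incremental runs, only pick up rows newer than what is already loaded
  where event_ts > (select max(event_ts) from {{ this }})
{% endif %}
```

On the first run dbt creates and fully loads the table; on later runs the where clause restricts the load to new rows only.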