Previously, we looked at reusable pieces of transformation logic written as dbt macros. dbt packages are a way to modularize an entire code base. One can create a reusable library, which may contain several reusable tests and transformations, and add it as a package to an existing project. This way one can reuse the transformations and other reusable… Continue reading dbt packages
Prefect flows and tasks
Tasks are the most basic unit of work in Prefect. A task performs a specific function or operation. To define a task in Prefect, apply the @task decorator to a function. A flow is a combination of several tasks in which one can specify the order of execution. A flow will contain one or more tasks… Continue reading Prefect flows and tasks
Prefect Installation: Part I
Prefect has two editions: open source and Prefect Cloud. To get going with Prefect open source, one can install Prefect with pip. It is advisable to create a separate conda environment or virtual environment first. To create a virtual environment, run virtualenv {{name-of-the-virtual-environment}} and to activate it, run source {{name-of-the-virtual-environment}}/bin/activate One may choose to install… Continue reading Prefect Installation: Part I
dbt macros
Like functions in any other programming or scripting language, dbt macros are reusable code. The idea is to capture business logic that occurs in multiple places and write a macro for such transformations so that it can be reused across models. Here is an example: {% macro mean(field, table) %} select avg({{ field }}) as mean from… Continue reading dbt macros
Prefect
For a data professional, there are several tasks that need to be completed in addition to writing the data job. These may include sending email notifications, scheduling with cron, writing procedures that retry a specific operation upon failure, optimizing the data flow, etc. Here is a workflow orchestration tool that would take… Continue reading Prefect
dbt jinja
dbt uses Jinja as its templating language. Jinja enables dynamic generation of SQL code. One can create a SQL template and reuse it for multiple values, so that one can generate SQL that depends on a condition, iterate over a list, and use macros (reusable pieces of code). Jinja has its own syntax. {{ something }}… Continue reading dbt jinja
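The idea of iterating over a list and branching on a condition can be sketched outside dbt with the standalone `jinja2` Python library. This is an illustration of the templating mechanics only, not how dbt itself is invoked; the template, the `render_sql` helper, and the table and column names are all hypothetical.

```python
from jinja2 import Template

# Hypothetical SQL template: one sum() per column, plus an optional filter.
SQL_TEMPLATE = """\
select
{%- for col in columns %}
    sum({{ col }}) as total_{{ col }}{% if not loop.last %},{% endif %}
{%- endfor %}
from {{ table }}
{%- if min_year %}
where order_year >= {{ min_year }}
{%- endif %}
"""


def render_sql(columns, table, min_year=None):
    # Render the template with the given values and return the SQL text.
    return Template(SQL_TEMPLATE).render(
        columns=columns, table=table, min_year=min_year
    )


if __name__ == "__main__":
    print(render_sql(["amount", "tax"], "orders", min_year=2020))
```

The same template yields different SQL for different inputs: the `for` loop emits one aggregate per column, and the `where` clause appears only when `min_year` is supplied.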
dbt environment and deployment
dbt jobs can be scheduled to run at specific time intervals using an automated system. The dbt commands that are used to build, run and debug the project can be scheduled to run automatically. Deployment is when the jobs move from the analyst's workstation to the production server. One can think of this as development and production… Continue reading dbt environment and deployment
dbt documentation
dbt enables publishing technical documentation as a website. One can add detailed descriptions of dbt models, relevant tests that may have been included, information about sources, table columns, data types, etc. Once this information is entered, dbt generates documentation for the given dbt project. The details are entered in the YAML file of dbt… Continue reading dbt documentation
dbt tests
dbt enables testing of models. There are two ways of defining tests: singular and generic. A singular test is a simple SQL query stored in a SQL file under the tests directory. It returns the failing rows. A generic test is a parameterized SQL query that accepts arguments within the SQL… Continue reading dbt tests
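The "returns failing rows" convention can be mimicked outside dbt with Python's built-in `sqlite3` module. This is a sketch of the idea only: the `orders` table, the rule that amounts must not be negative, and the helper names are hypothetical, and dbt runs such queries against the warehouse rather than through code like this.

```python
import sqlite3

# Hypothetical singular-test query: select the rows that violate the rule.
FAILING_ROWS_SQL = "select order_id, amount from orders where amount < 0"


def run_singular_test(conn, sql):
    """Return the failing rows; an empty list means the test passes."""
    return conn.execute(sql).fetchall()


def build_demo_db():
    # In-memory database with one deliberately bad row (order_id 2).
    conn = sqlite3.connect(":memory:")
    conn.execute("create table orders (order_id integer, amount real)")
    conn.executemany(
        "insert into orders values (?, ?)",
        [(1, 25.0), (2, -5.0), (3, 10.0)],
    )
    return conn


if __name__ == "__main__":
    failures = run_singular_test(build_demo_db(), FAILING_ROWS_SQL)
    print("PASS" if not failures else f"FAIL: {failures}")
```

The query is written to select the bad records, so zero returned rows means the test passed.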
dbt seeds
dbt seeds are flat files that can be added to a dbt project. These flat files are placed in the seeds folder. The seeds can be version controlled. A common use case is to add crosswalk or validation tables within the seeds folder. For instance,
S.No, ShortForm, LongForm
1, NH, New Hampshire
2, OH,… Continue reading dbt seeds
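To show what a crosswalk table is used for, here is a small sketch using only the Python standard library. It is an illustration, not dbt code: in a dbt project the lookup would typically be a SQL join against the seed table. The CSV content is modeled on the excerpt, with the `OH` row's long form filled in as an assumption, and the helper names are hypothetical.

```python
import csv
import io

# Hypothetical seed content; the OH row's long form is an assumed completion.
SEED_CSV = """\
ShortForm,LongForm
NH,New Hampshire
OH,Ohio
"""


def load_crosswalk(text):
    # Build a short-form -> long-form mapping from the CSV text.
    reader = csv.DictReader(io.StringIO(text))
    return {row["ShortForm"]: row["LongForm"] for row in reader}


def expand(codes, crosswalk):
    # Replace each code with its long form; unknown codes are left unchanged.
    return [crosswalk.get(code, code) for code in codes]


if __name__ == "__main__":
    print(expand(["NH", "OH", "ZZ"], load_crosswalk(SEED_CSV)))
```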