Modern Data Stack – Page 2 – DataScience Hacks

Docker Internal Hub Configuration

April 24, 2022 (updated July 18, 2023) by Pavan

It is possible to set up an internal hub (a private registry) that is accessible only within the organization. To set up an internal hub:
docker run -d \
  -p 5000:5000 \
  --restart=always \
  --name <name> \
  registry:2
Here are some sample commands. First, pull the image(s) you need:
docker pull <image>
Re-tag the image(s):
docker image tag <image> 198.xxx.xxx.xxx:8080/<image>
Push the image(s) to the…
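To push to an internal registry, the image reference must carry the registry's host:port as a prefix. That prefix is what tells Docker where to push. The Python sketch below builds such a reference. It also reads the repository list from a registry through the Registry HTTP API V2 /v2/_catalog endpoint. The helper names, the host 10.0.0.5:5000 and the image myapp:1.0 are hypothetical. The sketch assumes an unauthenticated, plain-HTTP registry like the registry:2 container above.

```python
import json
import urllib.request


def qualified_name(registry: str, image: str) -> str:
    """Prefix an image reference with the registry host:port (hypothetical helper).

    "myapp:1.0" on "10.0.0.5:5000" becomes "10.0.0.5:5000/myapp:1.0", the
    target you would pass to `docker image tag` and then `docker push`.
    """
    return f"{registry.rstrip('/')}/{image}"


def parse_catalog(body: str) -> list[str]:
    """Extract repository names from a /v2/_catalog JSON response body."""
    return json.loads(body).get("repositories", [])


def list_repositories(registry: str, timeout: float = 5.0) -> list[str]:
    """Ask a plain-HTTP, unauthenticated registry which repositories it holds."""
    url = f"http://{registry}/v2/_catalog"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_catalog(resp.read().decode("utf-8"))


if __name__ == "__main__":
    target = qualified_name("10.0.0.5:5000", "myapp:1.0")
    print(f"docker image tag myapp:1.0 {target}")
    print(f"docker push {target}")
    # A made-up response body in the shape /v2/_catalog returns.
    print(parse_catalog('{"repositories": ["myapp", "nginx"]}'))
```

By default, registry:2 serves plain HTTP. Docker clients on other machines will push to it only if it is listed under insecure-registries in their daemon configuration, or once TLS is set up for the registry.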

dbt models: Part II

March 11, 2022 (updated July 18, 2023) by Pavan

We looked at SQL models. Similarly, a Python model is a Python script that loads data into a data frame, performs various transformations using specialized packages such as pandas or PySpark, and produces a data frame. Example of a Python model:
import pandas as pd

def function1(param1, param2):
    df = …
    …
    …
    resultant_df = ……
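In dbt itself, the function in a Python model file has a fixed signature, model(dbt, session). Whatever DataFrame it returns is what dbt materializes, so function1(param1, param2) above is only a placeholder. Below is a minimal sketch. It assumes an adapter whose dbt.ref() returns a pandas DataFrame. On Snowflake, for instance, dbt.ref() returns a Snowpark DataFrame, which you would first convert with to_pandas(). The upstream model stg_orders, its columns, and the FakeDbt stand-in used to run the model locally are all hypothetical.

```python
import pandas as pd


def model(dbt, session):
    # dbt looks for a function with exactly this signature: model(dbt, session).
    dbt.config(materialized="table")

    # "stg_orders" is a hypothetical upstream model; this sketch assumes
    # dbt.ref() hands back a pandas DataFrame.
    orders = dbt.ref("stg_orders")

    # Transformation: total order amount per customer.
    result = (
        orders.groupby("customer_id", as_index=False)["amount"]
        .sum()
        .rename(columns={"amount": "total_amount"})
    )
    # The returned DataFrame is what dbt writes to the data store.
    return result


class FakeDbt:
    """Hypothetical stand-in for the `dbt` object, for running model() locally."""

    def __init__(self, tables):
        self.tables = tables
        self.settings = {}

    def config(self, **kwargs):
        self.settings.update(kwargs)

    def ref(self, name):
        return self.tables[name]


if __name__ == "__main__":
    stg_orders = pd.DataFrame({"customer_id": [1, 1, 2], "amount": [10.0, 5.0, 7.5]})
    print(model(FakeDbt({"stg_orders": stg_orders}), session=None))
```

The FakeDbt class exists only so that the transformation logic can be exercised without a warehouse. In a real project, the file would contain just the model function.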

dbt models: Part I

March 7, 2022 (updated July 18, 2023) by Pavan

In dbt, a model is a piece of transformation code, most often a single SQL query. There are two types of dbt models: SQL models and Python models. The term does not refer to relational or dimensional data models; dbt models are code. A SQL model is a .sql file containing exactly one SQL statement, a SELECT. It…

Docker Networking

March 6, 2022 (updated July 18, 2023) by Pavan

By default, Docker attaches containers to a type of network called the bridge network. Within a given Docker host, all containers on this network can communicate with each other, and their IP addresses typically start with 172.17.xxx.xxx. Port mapping is essential when a web application listens on a port, because the bridge network provides an internal…
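The Python sketch below illustrates the two ideas in this excerpt using the standard ipaddress module. First, it checks whether a container address falls inside 172.17.0.0/16, the subnet Docker's default bridge usually receives. The subnet is configurable, so treat it as an assumption here. Second, it splits a -p HOST:CONTAINER value to show which port belongs to which side. The helper names and the sample address 172.17.0.2 are illustrative.

```python
import ipaddress

# Usual subnet of Docker's default bridge network; it can be changed in the
# daemon configuration, so this value is an assumption.
DEFAULT_BRIDGE = ipaddress.ip_network("172.17.0.0/16")


def on_default_bridge(ip: str) -> bool:
    """True if the address falls inside the assumed default bridge subnet."""
    return ipaddress.ip_address(ip) in DEFAULT_BRIDGE


def parse_port_mapping(spec: str) -> tuple[int, int]:
    """Split a simple `-p HOST:CONTAINER` value such as "8080:80".

    Docker also accepts an IP prefix and a /tcp or /udp suffix; those
    forms are not handled by this sketch.
    """
    host, container = spec.split(":")
    return int(host), int(container)


if __name__ == "__main__":
    container_ip = "172.17.0.2"  # a typical address on the default bridge
    print(on_default_bridge(container_ip))                # True
    print(ipaddress.ip_address(container_ip).is_private)  # True: a private (RFC 1918) address
    host_port, container_port = parse_port_mapping("8080:80")
    print(f"host port {host_port} -> {container_ip}:{container_port}")
```

With -p 8080:80, a request to port 8080 on the host is forwarded to port 80 inside the container. Without the mapping, port 80 can be reached only from within the Docker host.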

data build tool materialization: Part II

January 31, 2022 (updated July 18, 2023) by Pavan

We looked at tables and views. The other two materializations are incremental and ephemeral. Incremental loads implement a very elegant solution. During the first run, the table is populated in the data store. On every subsequent run, only new rows are inserted, and existing rows that need changes are updated…
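Here is a minimal sketch of the incremental idea using Python's built-in sqlite3. It is not the SQL dbt generates. The first run builds the target table from the whole source. Later runs take only the source rows whose updated_at is newer than the latest value already loaded, and upsert them by key. The tables source_orders and orders, and their id, status and updated_at columns, are hypothetical. In dbt, this roughly corresponds to materialized='incremental' with a unique_key and an is_incremental() filter.

```python
import sqlite3


def run_incremental(conn: sqlite3.Connection) -> None:
    """Build `orders` from `source_orders`: full load first, incremental after."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'orders'"
    ).fetchone()

    if exists is None:
        # First run: create the target and load every source row.
        conn.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)"
        )
        conn.execute("INSERT INTO orders SELECT id, status, updated_at FROM source_orders")
    else:
        # Subsequent runs: read the high-water mark, then pick up newer rows only.
        watermark = conn.execute(
            "SELECT COALESCE(MAX(updated_at), 0) FROM orders"
        ).fetchone()[0]
        # INSERT OR REPLACE on the primary key inserts new ids and overwrites
        # existing ones, i.e. a simple upsert.
        conn.execute(
            "INSERT OR REPLACE INTO orders (id, status, updated_at) "
            "SELECT id, status, updated_at FROM source_orders WHERE updated_at > ?",
            (watermark,),
        )
    conn.commit()


def demo() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source_orders (id INTEGER, status TEXT, updated_at INTEGER)")
    conn.executemany(
        "INSERT INTO source_orders VALUES (?, ?, ?)", [(1, "new", 1), (2, "new", 1)]
    )
    run_incremental(conn)  # first run: full load

    # Order 2 changes and order 3 arrives, both with a newer updated_at.
    conn.execute("UPDATE source_orders SET status = 'shipped', updated_at = 2 WHERE id = 2")
    conn.execute("INSERT INTO source_orders VALUES (3, 'new', 2)")
    run_incremental(conn)  # incremental run: touches only ids 2 and 3

    return conn.execute("SELECT * FROM orders ORDER BY id").fetchall()


if __name__ == "__main__":
    print(demo())  # [(1, 'new', 1), (2, 'shipped', 2), (3, 'new', 2)]
```

Because row 1 is older than the watermark, the second run never reads it again. That is where incremental loads save work on large tables.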

data build tool materialization: Part I

January 17, 2022 (updated July 18, 2023) by Pavan

dbt has some terminology that may be unique to it. One such term is materialization. A materialization is a load strategy: it denotes how one plans to load the data into the data store. dbt offers four types of materialization, namely table, view, incremental and ephemeral. Tables are the traditional…
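The sketch below makes the difference between materializations concrete using Python's sqlite3. It takes one model SELECT and wraps it roughly the way each strategy would. A table uses CREATE TABLE AS and a view uses CREATE VIEW AS. An ephemeral model creates no database object at all; instead, its SELECT is inlined as a CTE into the model that references it. The DDL dbt actually generates varies by adapter. The raw_orders table and all model names are hypothetical.

```python
import sqlite3

# The model's single SELECT statement (hypothetical model over raw_orders).
MODEL_SQL = "SELECT id, amount FROM raw_orders WHERE amount > 0"


def materialize(conn, name, sql, strategy):
    """Create `name` from `sql` using a simplified form of each strategy."""
    if strategy == "table":
        # Physical copy of the query result at build time.
        conn.execute(f"CREATE TABLE {name} AS {sql}")
    elif strategy == "view":
        # Stored query, re-evaluated every time it is read.
        conn.execute(f"CREATE VIEW {name} AS {sql}")
    elif strategy == "ephemeral":
        # Nothing is created; downstream queries inline the SQL as a CTE.
        pass
    else:
        raise ValueError(f"unsupported strategy: {strategy}")


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (id INTEGER, amount REAL)")
    conn.execute("INSERT INTO raw_orders VALUES (1, 10.0), (2, -3.0)")

    materialize(conn, "orders_table", MODEL_SQL, "table")
    materialize(conn, "orders_view", MODEL_SQL, "view")
    materialize(conn, "orders_ephemeral", MODEL_SQL, "ephemeral")

    # New source data arrives after the models were built.
    conn.execute("INSERT INTO raw_orders VALUES (3, 5.0)")

    def count(query):
        return conn.execute(query).fetchone()[0]

    return {
        "table": count("SELECT COUNT(*) FROM orders_table"),  # snapshot from build time
        "view": count("SELECT COUNT(*) FROM orders_view"),    # sees the new row
        "ephemeral": count(
            f"WITH orders_ephemeral AS ({MODEL_SQL}) SELECT COUNT(*) FROM orders_ephemeral"
        ),
    }


if __name__ == "__main__":
    print(demo())  # {'table': 1, 'view': 2, 'ephemeral': 2}
```

The table keeps the result from build time, while the view and the inlined CTE re-run the query and so pick up the row added later.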

Docker Architecture

January 16, 2022 (updated July 18, 2023) by Pavan

Docker has a layered architecture. At the broadest level there are two layers: the image layer and the container layer. The image layer is read-only, whereas the container layer is created at run time and supports both read and write operations. The structure of the image layer mirrors its Dockerfile: instructions that change the filesystem, such as RUN, COPY and ADD, each add a layer. The base layer of the Dockerfile specifies an operating…
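Read-only image layers topped by a writable container layer behave like a stack of lookups. Reads fall through the layers from top to bottom, while writes land only in the top layer. Python's collections.ChainMap has exactly those semantics, which makes it a handy analogy. This is a toy model, not how Docker stores layers; storage drivers such as overlay2 do that at the filesystem level. The file paths and contents are made up.

```python
from collections import ChainMap
from types import MappingProxyType

# Read-only "image layers". MappingProxyType makes each one immutable,
# mirroring the read-only image layers.
base_layer = MappingProxyType({"/etc/os-release": "base OS", "/bin/sh": "shell"})
app_layer = MappingProxyType({"/app/main.py": "print('v1')"})

# The writable "container layer" is an ordinary dict.
container_layer = {}

# ChainMap searches its maps left to right, so the top (container) layer goes first.
fs = ChainMap(container_layer, app_layer, base_layer)

if __name__ == "__main__":
    print(fs["/bin/sh"])                 # read falls through to the base layer
    fs["/app/main.py"] = "print('v2')"   # write goes to the container layer only
    print(fs["/app/main.py"])            # the container layer now shadows the image
    print(app_layer["/app/main.py"])     # the image layer itself is unchanged
    print(container_layer)
```

Discarding container_layer returns the "filesystem" to exactly what the image layers hold. The same holds in Docker: removing a container discards its writable layer and leaves the image intact.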

dbt platform and version control

December 13, 2021 (updated July 18, 2023) by Pavan

There are two ways to use dbt in a project: a command-line interface and a cloud-based IDE. dbt Core is the command-line tool, which can run locally. dbt Cloud offers a very appealing IDE (at least to a data person). dbt focuses on the transformation…

Docker Environment Variables

December 12, 2021 (updated July 18, 2023) by Pavan

Docker supports environment variables that help define various configuration values. One can use the docker inspect command to explore a container's properties, in particular which environment variables are set. Environment variables are quite handy with database containers. For instance:
docker run -d \
  -e MYSQL_ROOT_PASSWORD=<password> \
  --name <name> \
  …
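In docker inspect's JSON output, a container's environment variables sit under Config.Env as a list of KEY=value strings. Running docker inspect --format '{{json .Config.Env}}' <container> prints just that list. The Python sketch below turns the full inspect output into a dictionary. The sample JSON is a trimmed, made-up example rather than real output, and the password value is a placeholder.

```python
import json


def env_from_inspect(inspect_json: str) -> dict[str, str]:
    """Map KEY -> value from `docker inspect <container>` output.

    docker inspect prints a JSON array with one object per container;
    this sketch reads only the first object.
    """
    container = json.loads(inspect_json)[0]
    env = {}
    for entry in container["Config"]["Env"] or []:  # Env can be null
        # Split on the first "=" only, since values may contain "=" themselves.
        key, _, value = entry.partition("=")
        env[key] = value
    return env


# Trimmed, made-up sample in the shape docker inspect returns.
SAMPLE = json.dumps([
    {
        "Name": "/db",
        "Config": {
            "Env": [
                "MYSQL_ROOT_PASSWORD=example-only",
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            ]
        },
    }
])

if __name__ == "__main__":
    env = env_from_inspect(SAMPLE)
    print(env["MYSQL_ROOT_PASSWORD"])  # example-only
```

Splitting with partition rather than split("=") keeps values such as connection strings, which often contain "=", intact.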

Docker Images & DockerFile

November 14, 2021 (updated July 18, 2023) by Pavan

Docker images are the higher-level abstraction from which Docker containers are created. Images are built from a Dockerfile: the instructions written in the Dockerfile are used to generate the image. The command is:
docker build -t <name> .
Here, . is the build context, the directory that contains the Dockerfile; use -f to point at a Dockerfile stored elsewhere. In case the image is too big, there may be options to build smaller images…