Posts

Decorators in Python

Decorators add functionality to functions without directly changing their definitions. A decorator takes a function as an argument, adds behaviour to it, and returns it. Before diving deep into decorators, let's first understand what functions are in Python and what an inner function is. In Python everything is an object, be it a class, a variable, or a function. Functions are first-class objects: you can store a function in a variable, pass a function to another function as an argument, and return a function from a function. Below is a simple example where we treat a function as an object.

```python
def make_me_lowercase(text):
    return text.lower()

print(make_me_lowercase("HELLO World"))

copy_of_you = make_me_lowercase
print(copy_of_you("HELLO World"))
```

The output of both calls is the same: hello world
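Building on the excerpt above, a minimal decorator might look like this; the function names (`shout`, `greet`) are illustrative, not from the original post:

```python
# A minimal decorator: shout() takes a function as an argument,
# adds behaviour around it, and returns a new function, without
# changing the original function's definition.
def shout(func):
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        return result.upper() + "!"  # the added behaviour
    return wrapper

@shout  # equivalent to: greet = shout(greet)
def greet(name):
    return f"hello {name}"

print(greet("world"))  # HELLO WORLD!
```

The `@shout` syntax is just shorthand for reassigning `greet` to the wrapped function, which is why first-class functions are the prerequisite concept.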

Top 10 SQL Interview Questions

In this post, we are going to provide frequently asked SQL interview questions and answers from Data Engineer interviews. Interviewers usually move from fundamental SQL interview queries to advanced SQL queries. This post mostly covers SQL interview questions for freshers, but you should also be prepared to write queries involving multiple JOINs, CASE statements, and GROUP BY. The questions below are collected on the basis of personal experience in several interviews with top IT companies.

Want to become a Data Engineer? Check out the blog posts below:
1. 5 Key Skills Every Data Engineer Needs in 2023
2. How to Prepare for Data Engineering Interviews
3. Top 25 Data Engineer Questions

1. What is a Primary Key?
A primary key is a field or combination of fields that uniquely identifies the records in a table. A primary key column cannot be NULL or empty, its values must be unique, and a table cannot contain more than one primary key.
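The primary-key behaviour described in the answer above can be demonstrated with Python's built-in sqlite3 module; the table and column names here are illustrative, not from the original post:

```python
import sqlite3

# An in-memory database with a PRIMARY KEY column: its values must be
# unique, so inserting a duplicate key is rejected by the engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO employees VALUES (1, 'alice')")

try:
    conn.execute("INSERT INTO employees VALUES (1, 'bob')")  # duplicate key
except sqlite3.IntegrityError as err:
    print("rejected:", err)  # UNIQUE constraint failed: employees.id
```

The same constraint applies in any relational database, though the exact error message differs by engine.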

Top 25 Data Engineer Interview Questions

In my last post, How to Prepare for Data Engineer Interviews, I wrote about how one can prepare for Data Engineer interviews, and in this blog post I am going to provide the top 25 basic data engineer interview questions asked frequently, with brief answers. This is typically the first round of interviews, where the interviewer just wants to assess whether you are aware of the basic concepts; you don't need to explain them in detail, and a single statement will suffice. Let's get started.

A. Programming (Python interview questions for data engineers)

1. What is a static method in Python?
Static methods are methods bound to the class rather than to the class's objects. They can therefore be called without creating an object of the class, using just a reference to the class. All objects of the class share a single copy of a static method.

2. What is a decorator in Python?
Decorators provide additional functionality to functions without directly changing their definitions.
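The static-method answer above can be sketched in a few lines; the class and method names are illustrative:

```python
class TemperatureConverter:
    # A static method is bound to the class, not to an instance,
    # so no object of the class is needed to call it.
    @staticmethod
    def to_fahrenheit(celsius):
        return celsius * 9 / 5 + 32

# Called directly through a reference to the class:
print(TemperatureConverter.to_fahrenheit(100))  # 212.0
```

Instances can still call it (`TemperatureConverter().to_fahrenheit(100)`), but it receives neither `self` nor `cls`, which is what distinguishes it from instance methods and class methods.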

How to Prepare for Data Engineer Interviews?

In recent years, due to the humongous growth of data, almost all IT companies want to leverage data for their businesses, and that's why data engineering and data science opportunities in IT companies are increasing at a rapid rate; we can easily say that Data Engineers were at the top of the list of "most hired profiles" in 2020-21. Due to this huge demand, companies want to hire Data Engineers who are skilled in programming and SQL, can design and build scalable data pipelines, and can do data modelling. In a way, Data Engineers should possess all the skills that software engineers have as well as the skills data analysts possess, and in interviews companies look for all of the skills mentioned above. So in this blog post, I am going to cover all the topics and domains one can expect in Data Engineer interviews.

A. Programming Round
Most of the product-based companies, especially MAANG (Meta, Apple, Amazon, Netflix, Google), focus on programming in this round.

What is NoSQL database?

The name NoSQL itself tells us that it is a "non-SQL" or "non-relational" database. Around 30 years back, when data changed rarely and was smaller in size, traditional relational databases with fixed schemas, such as Oracle and Postgres, were more prominent. But during the last decade, data has grown exponentially and it also changes quickly, and the traditional databases have failed to handle this big data effectively. So there was a need for databases that can adapt to ever-changing data and handle enormous volumes of it, and thus NoSQL databases came into the picture. Nowadays NoSQL is often read as "Not Only SQL", which means that these databases may support SQL-like query languages and can be part of a polyglot persistence architecture along with relational databases. The data structures used in NoSQL databases are more efficient than those used by relational databases.
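The schema flexibility described above can be illustrated with a toy document store modelled as a list of Python dicts; this is a sketch of the idea only, not a real NoSQL engine, and the field names are made up:

```python
# A toy "document store": unlike rows in a fixed-schema relational
# table, each document may carry a different set of fields.
documents = [
    {"id": 1, "name": "alice", "email": "alice@example.com"},
    {"id": 2, "name": "bob", "tags": ["admin", "beta"]},  # no email field
]

def find(docs, **criteria):
    """Return documents whose fields match all the given criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

print(find(documents, name="bob"))
```

Adding a new field requires no schema migration; documents that lack it simply don't match queries on it.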

Kafka Producer JAVA code

In the last post, What is Apache Kafka, we discussed Kafka producers, topics, and consumers. In this post, I am going to provide Java code for a Kafka producer and explain how it works. In this code, we are going to send data from a CSV file, row by row, to a Kafka topic for further consumption by a Kafka consumer. In this way, we can generate continuous streams of data.

1. First, you need to create a Kafka topic. You can do it either from the console or programmatically. Please go through the basic quickstart on the Apache Kafka website and create a topic, send a basic message from a producer, and consume it at a consumer. Once you have created the topic, it's time to write the Kafka producer, which will send data to the topic. To create the producer, we need to configure some parameters. Let's look at the configurations required for creating a producer:
1. The list of Kafka brokers
2. The serializer used for sending data to Kafka
3. The acknowledgement from Kafka that messages were properly received
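The flow described above, streaming CSV rows to a topic, can be sketched as follows. The original post uses Java; this is a minimal Python analogue, and the topic name "csv-events" and the kafka-python client are assumptions for illustration, not details from the post:

```python
import csv
import io
import json

def rows_to_messages(csv_text):
    """Serialize each CSV row as a JSON-encoded message value (bytes)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.dumps(row).encode("utf-8") for row in reader]

if __name__ == "__main__":
    sample = "id,name\n1,alice\n2,bob\n"
    messages = rows_to_messages(sample)
    print(len(messages))  # 2
    # With a broker running, the messages could then be sent like this
    # (hypothetical local setup, using the kafka-python client):
    # from kafka import KafkaProducer
    # producer = KafkaProducer(
    #     bootstrap_servers=["localhost:9092"],  # 1. list of Kafka brokers
    #     acks="all",                            # 3. require acknowledgement
    # )
    # for value in messages:                     # 2. values already serialized
    #     producer.send("csv-events", value)
    # producer.flush()
```

The three commented configuration points mirror the parameter list above: broker list, serialization, and acknowledgements.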

What is Apache Kafka?

Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system that sends messages between applications, processes, and servers. It is composed of three important components: producers (which act as senders), topics (categories of messages), and consumers (which act as receivers). Kafka aims to provide high throughput and low latency for real-time data feeds. It is widely used for real-time stream analytics, ingesting data into Spark, complex event processing, log aggregation, and so on. Before understanding Kafka's architecture, let's first understand what a Kafka broker is.

Kafka Broker: As Kafka is a distributed framework, a Kafka cluster consists of different servers running Kafka, called brokers. Producers publish messages to Kafka topics within these brokers, and consumers consume those messages from the topics.

Kafka Architecture: Kafka's architecture is comprised of mainly three components: Kafka producers, Kafka topics, and Kafka consumers.
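The producer/topic/consumer relationship described above can be sketched with a toy in-memory publish-subscribe model. This is an illustration of the concept only; a real Kafka broker persists messages durably and distributes them across a cluster:

```python
from collections import defaultdict

class Broker:
    """Toy broker: topics hold messages; consumers subscribe to topics."""
    def __init__(self):
        self.topics = defaultdict(list)       # topic name -> stored messages
        self.subscribers = defaultdict(list)  # topic name -> consumer callbacks

    def publish(self, topic, message):
        # Producer side: append to the topic, then notify consumers.
        self.topics[topic].append(message)
        for callback in self.subscribers[topic]:
            callback(message)

    def subscribe(self, topic, callback):
        # Consumer side: register interest in a topic.
        self.subscribers[topic].append(callback)

broker = Broker()
received = []
broker.subscribe("logs", received.append)  # consumer
broker.publish("logs", "app started")      # producer
print(received)  # ['app started']
```

Note how the producer and consumer never reference each other directly; the topic inside the broker decouples them, which is the core of the publish-subscribe pattern.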