Blog Posts
Welcome to my page of passion. Scroll around to find the best technical blogs on Machine Learning and Data Science. Posts are ordered from basic to advanced so that newcomers can get started with Data Science. At the end, you will find my freelance blogs.
Data Science Job Market Trend Analysis
Are you preparing for a data science job interview? We have analyzed the hiring trends from more than 3,000 data science job postings across several online career portals. Hopefully, these insights will help you get ready for an interview by showing what employers expect and where the overall market demand lies.
NumPy - The very basics!
This article is for people who have zero knowledge of NumPy, so that they can get the hang of it and kick-start their Data Science journey.
NumPy is the package for scientific and mathematical computing in Python. It is widely used for its assortment of routines for fast operations on arrays.
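To give a feel for what "fast operations on arrays" means, here is a minimal sketch. The arrays `prices` and `quantities` are made-up values for illustration only; the point is that arithmetic applies to whole arrays at once, without an explicit Python loop.

```python
import numpy as np

# Made-up sample data, for illustration only
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([3, 1, 2])

# Element-wise multiplication over the whole array, no explicit loop
totals = prices * quantities

# Aggregations are also provided as array methods
grand_total = totals.sum()

print(totals)
print(grand_total)
```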
Advanced NumPy for Data Science
This post covers some of the advanced concepts of NumPy, specifically the functions and methods required to work on a real-time dataset. The concepts covered here are more than enough to start your journey with data
Unpacking Pandas for Data Science
If you are already familiar with NumPy, Pandas is just a package built on top of it. Pandas provides more flexibility than NumPy for working with data. While a NumPy array can only store values of a single data type (dtype), Pandas has the flexibility to store values of multiple data types. Hence, we say Pandas is heterogeneous. We will unpack several more advantages of Pandas today.
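The single-dtype versus heterogeneous difference can be seen in a few lines. This is a small sketch with made-up values: the column names `id`, `score`, and `name` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# A NumPy array has one dtype, so mixed inputs are coerced to a common type.
# Here everything becomes a Unicode string (dtype kind "U").
mixed = np.array([1, 2.5, "three"])
print(mixed.dtype)

# A DataFrame keeps a separate dtype per column.
frame = pd.DataFrame(
    {
        "id": [1, 2, 3],             # integers
        "score": [0.5, 1.5, 2.5],    # floats
        "name": ["a", "b", "c"],     # strings
    }
)
print(frame.dtypes)
```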
6 Pandas Operations You Should Not Miss
Pandas is used mainly for reading, cleaning, and extracting insights from data. We will look at some advanced uses of Pandas that are very important to a data scientist. These operations are used to analyze data and manipulate it if required, typically in the steps performed before building any machine learning model.
Descriptive Statistics with Pandas
A detailed explanation of the following topics, with Python code: Mean, Trimmed Mean, Weighted Mean, Median, Mode, Deviation, Mean Absolute Deviation, Median Absolute Deviation, Variance, Standard Deviation, and Interquartile Range.
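As a rough preview, most of these measures can be computed in a line or two each. The sketch below uses a made-up sample and made-up weights; `trimmed_mean` is a hypothetical helper written for this example (it drops the `k` smallest and `k` largest values), not a Pandas function.

```python
import numpy as np
import pandas as pd

# Made-up sample with one outlier, and made-up weights that downweight it
values = pd.Series([2, 4, 4, 5, 7, 50])
weights = np.array([1, 1, 1, 1, 1, 0.1])


def trimmed_mean(series, k):
    """Mean after dropping the k smallest and k largest values."""
    ordered = series.sort_values()
    return ordered.iloc[k:len(ordered) - k].mean()


mean = values.mean()
trimmed = trimmed_mean(values, k=1)
weighted = np.average(values, weights=weights)
median = values.median()
mode = values.mode().iloc[0]

# Deviation-based measures
mean_abs_dev = (values - mean).abs().mean()
median_abs_dev = (values - median).abs().median()
variance = values.var()   # sample variance (ddof=1 is the Pandas default)
std_dev = values.std()    # sample standard deviation
iqr = values.quantile(0.75) - values.quantile(0.25)
```

Comparing `mean` with `median` and `trimmed` on this sample shows how strongly a single outlier pulls the plain mean.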
Hypothesis Testing with SciPy
With all the hype around the data science field, most of us jump directly into machine learning models and algorithms to make business decisions. Most of the online courses available fail to teach the very basics of decision-making. Hypothesis testing is one of the oldest and most basic building blocks of decision-making. Its earliest known use was in the 1700s, when John Arbuthnot tested whether male and female births are equally likely to occur.
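Arbuthnot's question can be framed as a simple sign test. The sketch below assumes the commonly cited figure of 82 consecutive years in which male births outnumbered female births, and computes the probability of that outcome if each year were a fair coin flip. It uses only the standard library so the arithmetic stays visible; `binomial_tail` is a hypothetical helper name, and SciPy's `scipy.stats.binomtest` offers the same kind of exact test.

```python
from math import comb


def binomial_tail(successes, trials, p=0.5):
    """Return P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Assumption: 82 out of 82 years had more male than female births.
# Null hypothesis: each year is equally likely to go either way (p = 0.5).
p_value = binomial_tail(successes=82, trials=82, p=0.5)

alpha = 0.05  # conventional significance level, chosen for illustration
reject_null = p_value < alpha
print(p_value, reject_null)
```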
MySQL Functions: Cheatsheet with examples
The intention of this article is to provide one spot for all MySQL functions so that you can quickly go through them before an interview or an examination. I'm assuming you already have basic knowledge of SQL. Without wasting your time, let me jump directly into the functions.
Beginners Guide to Data Visualization with Bokeh
Bokeh is a data visualization library in Python. It provides highly interactive graphs and plots. What makes it different from other Python plotting libraries is that Bokeh renders its output as a web page, meaning that if we run the code in a Python editor, the resulting plot opens in the browser. This makes it easy to embed a Bokeh plot on any website built with Django or Flask.
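Here is a minimal sketch of that idea, assuming Bokeh is installed. The data points and the title are made up. Instead of opening a browser, the example renders the plot to a standalone HTML string with `file_html`, which is the kind of output a Django or Flask view could return.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# Made-up data, for illustration only
x_values = [1, 2, 3, 4]
y_values = [4, 7, 2, 5]

plot = figure(title="Sample line plot", x_axis_label="x", y_axis_label="y")
plot.line(x_values, y_values, line_width=2)

# Render the plot as a complete HTML page (BokehJS is loaded from the CDN)
html = file_html(plot, CDN, "Sample line plot")
print(len(html))
```

Calling `show(plot)` from `bokeh.plotting` instead would open the same plot directly in the browser.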
Analyzing CitiBike Data: EDA
CitiBike is New York City’s famous bike rental company and the largest in the USA.
I got the June 2013 CitiBike rider data from Kaggle. I will walk you through the complete exploratory data analysis, answering questions like:
Automating Data Science with dabl
The main idea behind developing the library is to allow data scientists to spend more time thinking about the problem statement and creating more custom analyses instead of going through the same repeated traditional steps every time. dabl takes inspiration from scikit-learn and auto-sklearn.
Python 3.9 Updates in 2 Minutes
The stable version of Python 3.9.0 was released on 5th October 2020. Let's see the major new features.
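As a quick taste, here is a short sketch of three additions that arrived in Python 3.9: the dictionary merge operator, the `removeprefix`/`removesuffix` string methods, and built-in generic types in annotations. All variable names and values are made up for illustration, and the code needs Python 3.9 or later to run.

```python
# Dictionary merge operator (PEP 584): the right-hand side wins on key clashes
defaults = {"theme": "light", "font_size": 12}
overrides = {"font_size": 14}
settings = defaults | overrides

# New string methods (PEP 616)
filename = "report_2020.csv"
stem = filename.removesuffix(".csv")
without_prefix = filename.removeprefix("report_")


# Built-in collection types usable as generics in annotations (PEP 585)
def total(values: list[int]) -> int:
    return sum(values)


print(settings, stem, without_prefix, total([1, 2, 3]))
```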
Random Number Generator Tutorial with Python
In this tutorial, we will dive into what pseudorandomness is, its importance in machine learning and data science, and how to create a random number generator to generate pseudorandom numbers in Python using popular libraries.
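To make pseudorandomness concrete, here is a minimal sketch of a linear congruential generator, one classic way to produce pseudorandom numbers. The function name `lcg` is a hypothetical choice for this example, and the constants are the widely published Numerical Recipes parameters. The second half shows the same reproducibility with Python's built-in `random` module.

```python
import random
from itertools import islice


def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Linear congruential generator yielding floats in [0, 1)."""
    state = seed
    while True:
        state = (a * state + c) % m
        yield state / m


# The same seed always produces the same sequence: that is the "pseudo" part
first_run = list(islice(lcg(seed=42), 5))
second_run = list(islice(lcg(seed=42), 5))
other_seed = list(islice(lcg(seed=1), 5))

# The standard library behaves the same way when seeded
rng_a = random.Random(123)
rng_b = random.Random(123)
draws_a = [rng_a.random() for _ in range(3)]
draws_b = [rng_b.random() for _ in range(3)]

print(first_run == second_run, draws_a == draws_b)
```

Fixing the seed like this is what makes experiments such as train/test splits repeatable.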
Genetic Algorithm (GA) Introduction with Example Code
This tutorial dives into genetic algorithms in detail and explains their implementation in Python. We will also explore the different methods involved in each step diagrammatically. As always, we are including code for reproducibility purposes. We have split the code where required while exploring the different steps involved in our implementation.
K-Nearest Neighbors (KNN) Algorithm Tutorial — Machine Learning Basics
The k-nearest neighbors algorithm, commonly known as the KNN algorithm, is a simple yet effective supervised machine learning algorithm for classification and regression. This article covers the KNN algorithm, its applications, pros and cons, the math behind it, and its implementation in Python.
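The core idea fits in a few lines. Below is a minimal sketch of KNN classification using only the standard library: it measures Euclidean distance to every training point and takes a majority vote among the `k` closest. The function name `knn_predict`, the points, and the labels are all made up for illustration, and this is not the implementation from the article.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training indices by Euclidean distance to the query point
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: dist(train_points[i], query),
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Made-up 2D points forming two well-separated groups
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
labels = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(points, labels, query=(1.8, 1.5), k=3)
print(prediction)
```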
Support Vector Machine (SVM) Introduction — Machine Learning
SVM stands for support vector machine, and although it can solve both classification and regression problems, it is mainly used for classification problems in machine learning (ML). SVM models help us classify new data points based on previously classified similar data, making it a supervised machine learning technique.
Data Collaboration Made Easier
DataLogz is a free web tool that offers a zero-implementation-cost solution for data science and analytics teams to organize data without complicated IT procedures. The tool can be used immediately without any hassle, which helps teams understand data faster, generate valuable insights, and document data in a modern way.
Power BI Metadata
Over the last two decades, data has been flowing into companies in abundance, creating the need for modern, advanced, interactive, and more accessible tools. That need gave birth to tools like Power BI.
Data Documentation Tools
Data documentation ensures that data is understandable and interpretable by any consumer. It should describe how the data was created, the context for the data, the structure of the data and its contents, and any alterations done to the data.
Data Catalog in Azure
Data is becoming more critical in the industry because it helps in making better decisions, solving problems, understanding performance, improving procedures, and understanding customers. A data catalog can be used to maintain such data assets in a structured way.