Image by Octavian Dan

SIMPLERECOMMENDER

Published on PyPI

A Python package that makes recommendations simple: it recommends items to a specific existing user based on that user's ratings and the average rating of each item. It blends popularity-based and content-based recommendation.
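The package's own API is not reproduced here. As a rough illustration of the idea of blending popularity (average rating per item) with content similarity for an existing user, here is a hypothetical, self-contained sketch. The toy data, the Jaccard genre similarity, the `alpha` weight and the `recommend` function are all assumptions for illustration, not simplerecommender's implementation.

```python
from collections import defaultdict

# Hypothetical toy data (not from the package): (user, item, rating) triples and item genres.
RATINGS = [
    ("alice", "A", 5), ("alice", "B", 3),
    ("bob", "A", 4), ("bob", "C", 5), ("bob", "D", 2),
    ("carol", "C", 4), ("carol", "D", 3), ("carol", "E", 5),
]
GENRES = {
    "A": {"action", "drama"}, "B": {"comedy"}, "C": {"action"},
    "D": {"comedy", "romance"}, "E": {"drama"},
}


def average_rating_per_item(ratings):
    """Popularity signal: mean rating of every item across all users."""
    totals, counts = defaultdict(float), defaultdict(int)
    for _, item, rating in ratings:
        totals[item] += rating
        counts[item] += 1
    return {item: totals[item] / counts[item] for item in totals}


def jaccard(a, b):
    """Content signal: overlap between two genre sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend(user, ratings, genres, k=2, alpha=0.5, max_rating=5.0):
    """Rank unseen items by alpha * popularity + (1 - alpha) * content similarity (hypothetical blend)."""
    avg = average_rating_per_item(ratings)
    seen = {item: r for u, item, r in ratings if u == user}
    if not seen:
        raise ValueError(f"{user!r} is not an existing user")
    # Items the user rated highly drive the content-based part; fall back to everything they rated.
    liked = [item for item, r in seen.items() if r >= 4] or list(seen)
    scores = {}
    for item in avg:
        if item in seen:
            continue
        popularity = avg[item] / max_rating
        content = max(jaccard(genres[item], genres[liked_item]) for liked_item in liked)
        scores[item] = alpha * popularity + (1 - alpha) * content
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(recommend("alice", RATINGS, GENRES))  # ['E', 'C']
```

Here "E" wins for alice because it has the highest average rating and shares the "drama" genre with "A", which she rated 5.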

Link to PyPI

Link to a detailed example


DECISION MAKING WITH STATISTICS

Predictive Analytics


In this project, I used several statistical methods, such as hypothesis testing, ANOVA and chi-square tests, to arrive at business conclusions, summarizing the results of the statistical tests listed below. It also covers a simulation of the CLT (Central Limit Theorem).
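A CLT simulation can be sketched along these lines, assuming NumPy and a strongly skewed Exponential(1) population (mean 1, standard deviation 1, skewness 2). The population choice, sample sizes and function names are illustrative assumptions, not the project's code. As the sample size n grows, the sample means should centre on 1, their spread should shrink like 1/sqrt(n), and their skewness should fall toward 0.

```python
import numpy as np


def sample_means(n, num_samples=10_000, seed=0):
    """Draw num_samples samples of size n from Exponential(1) and return their means."""
    rng = np.random.default_rng(seed)
    return rng.exponential(scale=1.0, size=(num_samples, n)).mean(axis=1)


def skewness(x):
    """Sample skewness: mean of the cubed z-scores."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))


# The population is right-skewed; the distribution of sample means becomes close to normal.
for n in (1, 5, 30, 100):
    m = sample_means(n)
    print(f"n={n:3d}  mean={m.mean():.3f}  std={m.std():.3f} "
          f"(theory {1 / np.sqrt(n):.3f})  skew={skewness(m):.2f}")
```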

Hypothesis tests used:
1. Left-tailed and right-tailed hypothesis testing
2. Two-tailed hypothesis testing
3. Post-hoc test
4. Shapiro-Wilk test to check normality
5. Levene's test to check the equality of variances
6. Mann-Whitney U test as a non-parametric test
7. ANOVA
8. Chi-square test
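The tests listed above map onto functions in `scipy.stats`. The sketch below assumes SciPy 1.6 or later (for the `alternative` argument of `ttest_ind`) and uses hypothetical normally distributed groups and a made-up contingency table, not the project's data. For the post-hoc step it uses pairwise t-tests with a Bonferroni correction, which is one of several possible post-hoc procedures and not necessarily the one used in the project.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05
rng = np.random.default_rng(42)
# Hypothetical samples, e.g. order values from three store groups.
a = rng.normal(50, 10, 200)
b = rng.normal(53, 10, 200)
c = rng.normal(58, 10, 200)

p = {}
p["shapiro (normality of a)"] = stats.shapiro(a).pvalue          # H0: a is normally distributed
p["levene (equal variances)"] = stats.levene(a, b, c).pvalue      # H0: all variances are equal
p["t-test two-tailed"] = stats.ttest_ind(a, b).pvalue             # H0: mean(a) == mean(b)
p["t-test right-tailed"] = stats.ttest_ind(b, a, alternative="greater").pvalue  # H1: mean(b) > mean(a)
p["t-test left-tailed"] = stats.ttest_ind(a, b, alternative="less").pvalue      # H1: mean(a) < mean(b)
p["mann-whitney U"] = stats.mannwhitneyu(a, c, alternative="two-sided").pvalue  # non-parametric
p["one-way ANOVA"] = stats.f_oneway(a, b, c).pvalue               # H0: all group means are equal

# Post-hoc after a significant ANOVA: pairwise t-tests, Bonferroni-corrected.
pairs = {"a-b": (a, b), "a-c": (a, c), "b-c": (b, c)}
for name, (x, y) in pairs.items():
    p[f"post-hoc {name}"] = min(1.0, stats.ttest_ind(x, y).pvalue * len(pairs))

# Chi-square test of independence on a hypothetical 2x2 contingency table.
table = np.array([[30, 70], [45, 55]])
chi2, p["chi-square"], dof, expected = stats.chi2_contingency(table)

for test_name, p_value in p.items():
    verdict = "reject H0" if p_value < ALPHA else "fail to reject H0"
    print(f"{test_name:28s} p = {p_value:.4f} -> {verdict}")
```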

FORECASTING MONTHLY ARMED ROBBERIES IN BOSTON

Time Series Forecasting


Several time series forecasting methods are used to forecast whether armed robberies in Boston will increase or decrease in the coming years, which can help the government and police departments take measures accordingly.
1. Dickey-Fuller test for stationarity
2. ACF and PACF plots
3. Differencing to make the series stationary
4. Box-Cox transformation
5. Building an ARIMA model
6. Hyperparameter tuning
7. Rolling forecasting to capture random variation
8. Exponential Smoothing

ANIME RECOMMENDATION

Recommendation Systems

HOUSE PRICE PREDICTION

Regression Problem

House prices are a good indicator of both overall market conditions and a country's economic health. Buyers are not concerned only with the size (square feet) of a house; various other factors play a key role in deciding the price of a house or property. Using the data provided, we wrangle a large set of property sales records with unknown data-quality issues.

Algorithms used:
1. Linear Regression
2. Ridge Regression
3. Grid Search for hyperparameter tuning

Feature engineering:
1. MinMaxScaler
2. StandardScaler

Encoding techniques:

1. One-hot encoding

2. Label encoding

Feature selection:

1. Forward feature selection
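The scaling, encoding and selection steps above can be sketched with scikit-learn (0.24 or later, for `SequentialFeatureSelector`). The property records are synthetic, with price driven by square footage and age only, and the column names are hypothetical. For brevity the scalers are fitted on the full data; in a real pipeline they would be fitted inside each training fold.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder, MinMaxScaler, OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 300
# Hypothetical property records: three numeric columns and one categorical column.
sqft = rng.uniform(500, 3500, n)
age = rng.uniform(0, 50, n)
noise_col = rng.normal(0, 1, n)                     # irrelevant to price by construction
neighbourhood = rng.choice(["north", "south", "east"], n)
price = 150 * sqft - 800 * age + rng.normal(0, 2_000, n)

numeric = np.column_stack([sqft, age, noise_col])
scaled = StandardScaler().fit_transform(numeric)    # zero mean, unit variance per column
minmax = MinMaxScaler().fit_transform(numeric)      # alternative: rescale each column to [0, 1]

# One-hot: one 0/1 column per category (categories sorted: east, north, south).
onehot = OneHotEncoder().fit_transform(neighbourhood.reshape(-1, 1)).toarray()
# Label encoding: one integer per category, assigned in sorted order (east=0, north=1, south=2).
labels = LabelEncoder().fit_transform(neighbourhood)

X = np.hstack([scaled, onehot])
names = ["sqft", "age", "noise", "east", "north", "south"]

# Forward selection: start empty, greedily add the feature that most improves CV score.
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=2, direction="forward", cv=5
)
selector.fit(X, price)
print([name for name, keep in zip(names, selector.get_support()) if keep])
```

Label encoding imposes an arbitrary order on the categories, so one-hot encoding is usually the safer choice for nominal features in a linear model.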

Model validation:
1. LOOCV (Leave-One-Out Cross-Validation)
2. K-Fold Cross-Validation
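Ridge regression tuned by grid search, validated with K-fold cross-validation, plus a LOOCV estimate for plain linear regression, might look roughly like this in scikit-learn. The data are synthetic and the alpha grid is an arbitrary assumption. LOOCV scores each single held-out point, so R² is undefined per fold; the sketch averages squared errors across folds instead.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, KFold, LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical regression data: 5 features, two of which are irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.5, 0.0]) + rng.normal(scale=0.5, size=80)

# Grid search over the Ridge penalty, scored by 5-fold cross-validation.
# Scaling sits inside the pipeline so it is refitted on each training fold.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
grid = GridSearchCV(
    make_pipeline(StandardScaler(), Ridge()),
    param_grid={"ridge__alpha": [0.01, 0.1, 1.0, 10.0, 100.0]},
    cv=kfold,
    scoring="neg_root_mean_squared_error",
)
grid.fit(X, y)
print("best alpha:", grid.best_params_["ridge__alpha"], "CV RMSE:", -grid.best_score_)

# LOOCV: one fold per observation (80 fits here); average the squared errors, then take the root.
loo_mse = -cross_val_score(LinearRegression(), X, y, cv=LeaveOneOut(),
                           scoring="neg_mean_squared_error")
loo_rmse = np.sqrt(loo_mse.mean())
print("LOOCV RMSE:", loo_rmse)
```

LOOCV needs one model fit per row, so on a large property-sales dataset K-fold is usually the practical choice.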

Text Summarization using LSTMs

Natural Language Processing


The objective here is to generate summaries of the "Amazon Fine Food Reviews" dataset using both abstractive and extractive text summarization approaches.

Project pipeline

  1. Understanding text summarization

  2. Text pre-processing

  3. Abstractive text summarization using an LSTM encoder-decoder architecture

  4. Web scraping an article using BeautifulSoup (BS4)

  5. Extractive text summarization using a Transformer
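The project's extractive step uses a Transformer. To show what "extractive" means (selecting existing sentences rather than generating new text) without any model dependency, here is a deliberately simpler, classic word-frequency scorer in plain Python. It is a swapped-in technique, not the project's method; the stop-word list, scoring rule and sample review are assumptions.

```python
import re
from collections import Counter

# Small hypothetical stop-word list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "are", "but", "for", "i", "in", "is", "it", "my",
             "of", "on", "or", "the", "this", "to", "too", "was", "with", "would"}


def tokenize(text):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, num_sentences=2):
    """Extractive summary: keep the sentences whose words are most frequent in the text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(tokenize(text))
    if len(sentences) <= num_sentences or not freq:
        return " ".join(sentences)
    top = max(freq.values())

    def score(sentence):
        tokens = tokenize(sentence)
        # Average normalized frequency, so long sentences are not automatically favoured.
        return sum(freq[t] / top for t in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    best = sorted(ranked[:num_sentences])  # restore the original sentence order
    return " ".join(sentences[i] for i in best)


review = ("This coffee is rich and smooth. The coffee beans arrive fresh and the coffee aroma "
          "is great. Shipping took a week. I would buy this coffee again.")
print(summarize(review))
```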

Evaluation Metric for GANs

Advanced Deep Learning


Evaluating supervised image classification is straightforward: the predicted output is compared with the actual (ground-truth) label. A GAN, however, generates a fake image from random noise, and this generated image should look as realistic as possible. So how do you measure how realistic a computer-generated image is? In other words, how do you evaluate a GAN?

Fréchet Inception Distance (FID) is one of the most widely used metrics for measuring the distance between the features of real and generated images, where the features are extracted by a pretrained Inception network. The Fréchet distance is a measure of similarity between curves that takes the location and order of points along the curves into account; it can also be used to measure the difference between two probability distributions.
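When the real and generated feature sets are each modelled as a multivariate Gaussian with mean mu and covariance Sigma, the Fréchet distance has the closed form FID = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)). A minimal NumPy/SciPy sketch follows. It assumes the Inception activations have already been extracted (that step is not shown), and the random arrays below merely stand in for real features.

```python
import numpy as np
from scipy import linalg


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Matrix square root of the covariance product; numerical error can add tiny
    # imaginary parts, so only the real part is kept.
    covmean = np.real(linalg.sqrtm(sigma1 @ sigma2))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(covmean))


def fid_from_features(real_feats, fake_feats):
    """FID from two (num_images, num_features) arrays of feature activations."""
    mu_r, mu_g = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    sigma_r = np.cov(real_feats, rowvar=False)
    sigma_g = np.cov(fake_feats, rowvar=False)
    return frechet_distance(mu_r, sigma_r, mu_g, sigma_g)


# Toy check with random stand-in "features": closer distributions give a lower FID.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(2000, 8))
close = rng.normal(0.1, 1.0, size=(2000, 8))
far = rng.normal(1.0, 2.0, size=(2000, 8))
print(fid_from_features(real, close), fid_from_features(real, far))
```

Lower is better; identical distributions give an FID of 0.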

Real Time Classification of Indian Car Models

Convolutional Neural Network

A convolutional neural network that predicts the model of an Indian car in real time and can run on a mobile phone. Transfer learning with MobileNets turned out to give the best-fitting model. Use case: instantly identify a car's model from your phone.

BANK CLIENT CLASSIFICATION


Classification Problem

Clients applying for a loan are classified as good or bad clients based on the details they provide to the bank, so that the bank can make informed decisions, avoid the risk of loan non-repayment, and thereby reduce its financial losses.

Tableau Dashboards

Data Analysis


Hyperparameter tuning for bank client classification: Grid Search

Scoring metrics used:

1. AUC Score

2. Precision Score

3. Recall Score

4. Accuracy Score

5. Cohen's Kappa Score

6. F1 Score
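All six scoring metrics listed above are available in `sklearn.metrics`. A minimal sketch on eight hypothetical predictions (1 = bad client, 0 = good client; the labels and probabilities are made up) follows. Note that AUC is computed from predicted probabilities, while the other five metrics use hard labels.

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             precision_score, recall_score, roc_auc_score)

# Hypothetical labels: 1 = bad client (defaults), 0 = good client.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]  # predicted probability of class 1
y_pred = [int(p >= 0.5) for p in y_prob]           # hard labels at a 0.5 threshold

scores = {
    "AUC": roc_auc_score(y_true, y_prob),           # ranking quality across all thresholds
    "Precision": precision_score(y_true, y_pred),   # of clients flagged bad, share truly bad
    "Recall": recall_score(y_true, y_pred),         # of truly bad clients, share flagged
    "Accuracy": accuracy_score(y_true, y_pred),
    "Kappa": cohen_kappa_score(y_true, y_pred),     # agreement corrected for chance
    "F1": f1_score(y_true, y_pred),                 # harmonic mean of precision and recall
}
for name, value in scores.items():
    print(f"{name:9s} {value:.4f}")
```

With these toy predictions (3 true positives, 1 false positive, 1 false negative, 3 true negatives), precision, recall, accuracy and F1 all come out at 0.75, Cohen's kappa at 0.5, and AUC at 0.9375, since 15 of the 16 positive/negative pairs are ranked correctly.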

Deep-Learning-Mini-Projects

Deep Learning

Content:

  1. Classifying cats vs. dogs

  2. Forecasting stock price using LSTM

  3. Predicting bank customer churn

  4. Predicting pressure level


Techniques used for the anime recommendation system:
1. NearestNeighbors with cosine metric
2. simplerecommender
3. SVDpp from package 'surprise'
4. Apriori from mlxtend
5. association_rules from mlxtend
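Two of the techniques listed above, cosine-distance nearest neighbours and association rules, can be sketched on toy data. The rating matrix, titles and watch lists below are hypothetical. The first part uses scikit-learn's `NearestNeighbors`. The second part computes support, confidence and lift by hand for item pairs only, to show the quantities that mlxtend's `apriori` and `association_rules` report; it is not the full Apriori candidate-generation loop and does not call mlxtend. SVDpp from surprise is not shown.

```python
from itertools import combinations

import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical item-user rating matrix (rows = anime, columns = users, 0 = unrated).
TITLES = ["Naruto", "Bleach", "One Piece", "Clannad", "Toradora"]
ratings = np.array([
    [9, 8, 0, 0, 7],
    [8, 9, 0, 1, 6],
    [7, 7, 1, 0, 8],
    [0, 1, 9, 8, 0],
    [1, 0, 8, 9, 1],
], dtype=float)

# Brute-force search, since the cosine metric is not supported by the tree-based indexes.
knn = NearestNeighbors(metric="cosine", algorithm="brute").fit(ratings)


def similar_titles(title, k=2):
    """Titles whose rating vectors are closest in cosine distance."""
    idx = TITLES.index(title)
    _, indices = knn.kneighbors(ratings[idx:idx + 1], n_neighbors=k + 1)
    # The query title itself comes back at distance 0, so drop it.
    return [TITLES[i] for i in indices[0] if i != idx][:k]


# Hypothetical watch lists, one set of titles per user.
WATCHED = [
    {"Naruto", "Bleach"}, {"Naruto", "Bleach", "One Piece"},
    {"Naruto", "One Piece"}, {"Clannad", "Toradora"}, {"Naruto", "Bleach"},
]


def pair_rules(transactions, min_support=0.3):
    """(lhs, rhs, support, confidence, lift) for frequent item pairs."""
    n = len(transactions)

    def support(items):
        return sum(items <= t for t in transactions) / n

    rules = []
    for a, b in combinations(sorted(set().union(*transactions)), 2):
        s_ab = support({a, b})
        if s_ab < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = s_ab / support({lhs})
            rules.append((lhs, rhs, s_ab, confidence, confidence / support({rhs})))
    return rules


print(similar_titles("Naruto"))  # ['Bleach', 'One Piece']
for rule in pair_rules(WATCHED):
    print(rule)
```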

Algorithms used for bank client classification:
1. Logistic Regression
2. Gaussian Naive Bayes
3. KNN classifier
4. Decision Tree
5. Random Forest
6. XGBoost

Other techniques:

1. SMOTE data balancing

2. RFE feature selection
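SMOTE balancing and RFE feature selection could be sketched as follows. The SMOTE function is a hypothetical minimal NumPy version of the core idea (new minority samples interpolated between a minority sample and one of its k nearest minority neighbours), not the imbalanced-learn implementation. RFE uses scikit-learn's `RFE` with logistic regression. The dataset comes from `make_classification` and stands in for the bank data. For brevity SMOTE is applied to all data here; in practice it should be applied only to the training split, so that synthetic points never leak into evaluation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression


def smote(X_min, n_new, k=5, seed=0):
    """Minimal SMOTE sketch: interpolate between minority samples and their k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances among minority samples.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]   # indices of the k nearest minority samples
    base = rng.integers(0, len(X_min), size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                # random position on the connecting segment
    return X_min[base] + gap * (X_min[partner] - X_min[base])


# Imbalanced stand-in data: roughly 90% class 0 ("good"), 10% class 1 ("bad").
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)
X_min = X[y == 1]
n_new = int((y == 0).sum() - (y == 1).sum())
X_bal = np.vstack([X, smote(X_min, n_new)])
y_bal = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])

# RFE: repeatedly fit the model and drop the weakest feature until 4 remain.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4).fit(X_bal, y_bal)
print("class counts:", np.bincount(y_bal), "selected features:", np.flatnonzero(rfe.support_))
```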
