SIMPLERECOMMENDER
Published on PyPI
A Python package that makes recommendations simple: it recommends items for a specific existing user based on that user's ratings and the average rating per item. It is a blend of popularity-based and content-based systems.
Link to PyPI
Link to a detailed example
DECISION MAKING WITH STATISTICS
Predictive Analytics
In this project, I used several statistical methods, such as hypothesis testing, ANOVA, and chi-square tests, to arrive at business conclusions, summarizing the results of the statistical tests listed below. It also covers a simulation of the CLT (Central Limit Theorem).
Hypothesis tests used:
1. Left-tailed and right-tailed hypothesis testing
2. Two-tailed hypothesis testing
3. Post-hoc test
4. Shapiro-Wilk test to check normality
5. Levene's test to check the equality of variances
6. Mann-Whitney U test as a non-parametric test
7. ANOVA
8. Chi-square
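The CLT simulation mentioned above could look roughly like the following. This is a minimal sketch using only the Python standard library, not the project's actual code: the choice of an exponential population, the sample size, the number of samples, and the function name `simulate_sample_means` are all assumptions made for illustration.

```python
import random
from statistics import fmean, pstdev


def simulate_sample_means(sample_size, n_samples, seed=0):
    """Draw n_samples samples from an Exp(1) population; return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


# Assumed settings for illustration only.
sample_size = 50
sample_means = simulate_sample_means(sample_size=sample_size, n_samples=5000)

# Exp(1) has mean 1 and standard deviation 1. The CLT says the sample means
# cluster around 1 with a spread close to 1 / sqrt(sample_size), even though
# the population itself is strongly skewed.
mean_of_means = fmean(sample_means)
spread_of_means = pstdev(sample_means)
expected_spread = 1.0 / sample_size ** 0.5

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"spread of sample means: {spread_of_means:.3f}")
print(f"spread predicted by the CLT: {expected_spread:.3f}")
```

Plotting a histogram of `sample_means` would show the roughly bell-shaped curve that justifies the parametric tests in the list.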
FORECASTING MONTHLY ARMED ROBBERIES IN BOSTON
Time Series Forecasting
Several time series forecasting methods and techniques are used to forecast whether robberies in Boston will increase or decrease in the upcoming years, which will help the government and police departments take measures accordingly.
1. Dickey-Fuller test for stationarity
2. ACF and PACF plots
3. Differencing to make the series stationary
4. Box-Cox transformation
5. Building ARIMA model
6. Hyperparameter tuning
7. Rolling forecasting to capture random variation
8. Exponential Smoothing
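Three of the steps above (differencing, the Box-Cox transformation, and exponential smoothing) are simple enough to sketch in plain Python. The code below is an illustration under stated assumptions, not the project's code: the function names are hypothetical, the series is made-up data rather than the Boston robbery counts, and the smoothing shown is the simple (level-only) variant.

```python
import math


def difference(series, lag=1):
    """Return series[t] - series[t - lag]; removes a trend from the series."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def box_cox(series, lam):
    """Box-Cox transform for strictly positive values; lam == 0 is the log case."""
    if lam == 0:
        return [math.log(x) for x in series]
    return [(x ** lam - 1) / lam for x in series]


def simple_exponential_smoothing(series, alpha):
    """Level-only smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]."""
    smoothed = [series[0]]  # initialise the level with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Made-up monthly counts with an upward trend (NOT the real dataset).
monthly_counts = [10, 12, 15, 19, 24, 30]

stabilised = box_cox(monthly_counts, lam=0)   # log transform to stabilise variance
stationary = difference(stabilised)           # first difference to remove the trend
levels = simple_exponential_smoothing(monthly_counts, alpha=0.5)

print("differenced log series:", [round(v, 3) for v in stationary])
print("one-step-ahead forecast:", levels[-1])
```

In practice a library such as statsmodels would handle the stationarity test and the ARIMA fit; the sketch only shows what the transformations do to the numbers.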
HOUSE PRICE PREDICTION
Regression Problem
Prices are a good indicator of both the overall market condition and the economic health of a country. Buyers are concerned not only with the size (square feet) of a house; various other factors also play a key role in deciding the price of a house or property. With the data provided, we wrangle a large set of property sales records with unknown data quality issues.
Algorithms used:
1. Linear Regression
2. Ridge Regression
3. Grid Search for hyperparameter tuning
Feature engineering:
1. MinMaxScaler
2. StandardScaler
Encoding techniques:
1. OneHot encoding
2. Label encoding
Feature selection:
1. Feed-forward selection
Model validation:
1. LOOCV (Leave One Out Cross Validation)
2. K-Fold Cross-Validation
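The two validation schemes above differ only in how many folds are used: LOOCV is K-fold with K equal to the number of records. The sketch below illustrates this with a single-feature least-squares line standing in for the regression models. It is an assumption-laden illustration, not the project's pipeline: the helper names, the interleaved fold assignment, and the size/price values are all hypothetical.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k interleaved folds (no shuffling)."""
    return [list(range(start, n, k)) for start in range(k)]


def cross_val_mse(xs, ys, k):
    """Average held-out mean squared error over k folds."""
    n = len(xs)
    fold_scores = []
    for test_idx in k_fold_indices(n, k):
        held_out = set(test_idx)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        slope, intercept = fit_line(train_x, train_y)
        fold_scores.append(
            fmean((ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx)
        )
    return fmean(fold_scores)


# Hypothetical house sizes and prices, for illustration only.
sizes = [50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
prices = [102.0, 119.0, 143.0, 158.0, 183.0, 199.0]

print("3-fold MSE:", round(cross_val_mse(sizes, prices, k=3), 2))
print("LOOCV MSE: ", round(cross_val_mse(sizes, prices, k=len(sizes)), 2))
```

LOOCV fits one model per record, so it becomes expensive on a large set of sales records; K-fold with a small K is the usual compromise.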
Text Summarization using LSTMs
Natural Language Processing
The objective here is to generate a summary for the "Amazon Fine Food reviews" using both the abstraction-based and the extraction-based text summarization approaches.
Project pipeline:
- Understanding text summarization
- Text pre-processing
- Abstractive text summarization using an LSTM encoder-decoder architecture
- Web scraping an article using BS4
- Extractive text summarization using a Transformer
Evaluation Metric for GANs
Advanced Deep Learning
Evaluating supervised image classification is simple: the predicted output is compared to the actual label. With a GAN, however, you generate a fake image from random noise, and this generated image should appear as authentic as possible. So how do you measure the realism of this computer-generated image? Or, to put it another way, how can you assess a GAN?
One of the most widely used measures of the feature distance between real and generated images is the Fréchet Inception Distance (FID). The Fréchet distance is a measure of similarity between curves that takes the placement and order of points along the curves into account. It can also be used to calculate the distance between two distributions.
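To make the "distance between two distributions" concrete, here is a simplified sketch of the Fréchet distance between two Gaussians fitted to feature vectors. It rests on a strong assumption that the real FID does not make: the features are treated as uncorrelated (diagonal covariance), so the matrix square root in the full formula reduces to a per-dimension term. The function name and the tiny feature vectors are hypothetical; a real FID computation uses Inception network activations and full covariance matrices.

```python
from math import sqrt
from statistics import fmean, pvariance


def frechet_distance_diagonal(real_feats, fake_feats):
    """Fréchet distance between two Gaussians, ASSUMING diagonal covariance.

    Per dimension the distance is (mu_r - mu_f)^2 + (sd_r - sd_f)^2,
    and the dimensions are summed.
    """
    n_dims = len(real_feats[0])
    total = 0.0
    for d in range(n_dims):
        real_col = [row[d] for row in real_feats]
        fake_col = [row[d] for row in fake_feats]
        mu_r, mu_f = fmean(real_col), fmean(fake_col)
        sd_r, sd_f = sqrt(pvariance(real_col)), sqrt(pvariance(fake_col))
        total += (mu_r - mu_f) ** 2 + (sd_r - sd_f) ** 2
    return total


# Made-up 2-dimensional "features" for a handful of real and generated images.
real_features = [[0.0, 1.0], [2.0, 3.0]]
fake_features = [[1.0, 2.0], [3.0, 4.0]]

print(frechet_distance_diagonal(real_features, real_features))  # identical sets
print(frechet_distance_diagonal(real_features, fake_features))  # shifted means
```

A lower value means the generated features are statistically closer to the real ones; identical feature sets give a distance of zero.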
Real-Time Classification of Indian Car Models
Convolutional Neural Network
A convolutional neural network that predicts the Indian car model in a real-time scenario and can be used on a mobile phone. Transfer learning with MobileNets turned out to be the best-fitting model. Use case: instantly identify a car model at your fingertips.
BANK CLIENT CLASSIFICATION
Classification Problem
Clients applying for a loan are classified as good or bad clients based on the details they provide to the bank, so that the bank can make an informed decision, avoid the risk of loan non-repayment, and reduce its financial losses.
Grid Search is used for hyperparameter tuning.
Scoring metrics used:
1. AUC Score
2. Precision Score
3. Recall Score
4. Accuracy Score
5. Kappa Score
6. F1 Score
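Most of the scores above can be derived from the four cells of a binary confusion matrix. The following sketch shows how, using made-up labels where 1 stands for a bad client and 0 for a good one (an assumed encoding). The function name is hypothetical, and AUC is left out because it needs predicted probabilities rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and Cohen's kappa from hard 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )

    # Cohen's kappa: agreement beyond what chance alone would produce.
    chance_positive = ((tp + fp) / n) * ((tp + fn) / n)
    chance_negative = ((tn + fn) / n) * ((tn + fp) / n)
    chance = chance_positive + chance_negative
    kappa = (accuracy - chance) / (1 - chance) if chance != 1 else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


# Made-up labels for eight loan applicants (1 = bad client, 0 = good client).
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]

scores = binary_metrics(actual, predicted)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

For a loan problem, recall on the bad-client class is often the score to watch, since a missed bad client costs the bank more than a rejected good one.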
Deep-Learning-Mini-Projects
Deep Learning
Content:
- Classifying Cat/Dog
- Forecasting stock price using LSTM
- Predicting bank customer churn
- Predicting pressure level
Techniques used:
1. NearestNeighbors with cosine metric
2. simplerecommender
3. SVDpp from package 'surprise'
4. Apriori from mlxtend
5. association_rules from mlxtend
Algorithms used:
1. Logistic Regression
2. Gaussian Naive Bayes
3. KNN classifier
4. Decision Tree
5. Random Forest
6. XGBoost
Other techniques:
1. SMOTE data balancing
2. RFE feature selection