Posts

Showing posts from 2021

Mini Program 20 - Placement data analysis using Spark

Apache Spark is an open-source unified analytics engine for large-scale data processing. Because it processes data in memory across a cluster, it can run some workloads up to 100 times faster than disk-based alternatives, which makes it a popular choice for big data analysis. The set of commands used in Spark differs a bit from that of pandas. In this analysis we aim to familiarize you with common Spark commands that can prove handy during exploratory data analysis, using the Placement dataset from Kaggle to compare pandas and Spark operations. Watch the video to know more. #CodeWithUs to find out more and do it yourself!! You can find the code at Python Code - GitHub
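To give a flavor of how the two APIs line up, here is a small sketch that runs the pandas side of some common exploratory steps, with the equivalent Spark DataFrame call shown in a comment next to each one. The toy rows and column names (`gender`, `degree_p`, `status`) are assumptions standing in for the actual Kaggle Placement dataset.

```python
import pandas as pd

# Tiny stand-in for the Kaggle Placement dataset; rows and column
# names here are illustrative assumptions, not the real data.
df = pd.DataFrame({
    "gender": ["M", "F", "M", "F"],
    "degree_p": [65.0, 72.5, 58.0, 80.0],
    "status": ["Placed", "Placed", "Not Placed", "Placed"],
})

# pandas: df.head(3)                 | Spark: df.show(3)
print(df.head(3))

# pandas: df.shape                   | Spark: (df.count(), len(df.columns))
rows, cols = df.shape

# pandas: df["status"].value_counts()
# Spark:  df.groupBy("status").count().show()
counts = df["status"].value_counts()

# pandas: df[df["degree_p"] > 60]
# Spark:  df.filter(df.degree_p > 60)
high_scorers = df[df["degree_p"] > 60]

# pandas: df.groupby("gender")["degree_p"].mean()
# Spark:  df.groupBy("gender").agg(F.mean("degree_p"))
mean_by_gender = df.groupby("gender")["degree_p"].mean()
print(mean_by_gender)
```

The Spark calls in the comments assume a `pyspark.sql.DataFrame` named `df` and `from pyspark.sql import functions as F`; the video and the linked code walk through the full Spark session setup.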

Mini Program 19 - Integration of power bi dashboard in Jupyter notebook

Data visualization is an important part of any data science project. It involves creating attractive dashboards that tell stories about the data. Recently, Microsoft announced that Power BI dashboards can be embedded in Jupyter notebooks. Here is a simple illustration of how it can be done. #CodeWithUs to find out more and do it yourself!! You can find the code at Python Code - GitHub
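The embedding goes through Microsoft's `powerbiclient` package for Jupyter. A minimal sketch, assuming you have a published Power BI report and its workspace and report IDs (the placeholder IDs below are not real), looks like this:

```python
# pip install powerbiclient
from powerbiclient import Report
from powerbiclient.authentication import DeviceCodeLoginAuthentication

# Sign in interactively; this prints a device code to enter at
# microsoft.com/devicelogin with your Power BI account.
auth = DeviceCodeLoginAuthentication()

# Placeholder IDs -- replace with your own workspace and report IDs
# from the Power BI service URL.
report = Report(
    group_id="<your-workspace-id>",
    report_id="<your-report-id>",
    auth=auth,
)

# In a notebook cell, displaying the object renders the dashboard inline.
report
```

Because this requires an interactive login against your own Power BI workspace, it only runs inside a notebook with valid credentials; see the linked code for the full walkthrough.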

Mini Program 18 - Data Science Model Deployment using Flask

Model deployment is an important part of any data science project. It involves creating a front-end application that lets a broader audience get results from the model with varied inputs. This is made simple in Python by using a web application framework, Flask. This article shows a simple demo of model deployment using Flask. #CodeWithUs to find out more and do it yourself!! You can find the code at Python Code - GitHub
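The core of such a deployment is a small Flask app that accepts inputs over HTTP and returns the model's output. A minimal sketch, with a toy scoring function standing in for a real trained model (which you would normally load from disk with `pickle` or `joblib`):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Stand-in "model": a real deployment would load a trained model here,
# e.g. model = joblib.load("model.pkl"). This toy scorer is an assumption.
def predict(features):
    return sum(features) / len(features)

@app.route("/predict", methods=["POST"])
def predict_route():
    # Expect a JSON body like {"features": [1.0, 2.0, 3.0]}
    data = request.get_json()
    score = predict(data["features"])
    return jsonify({"prediction": score})

# app.run(debug=True)  # uncomment to serve locally at http://127.0.0.1:5000
```

Once running, any client (a web form, curl, or another script) can POST inputs to `/predict` and get the model's answer back as JSON, which is what opens the model up to a broader audience.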

Mini Program 17 - Health Insurance Data Analysis & Model building using Python - Part 4

After exploratory data analysis and building hypotheses, we move to the predictive model building stage, where we train and test several models on the same dataset, compare their performance, and check which one fits best within the given business and technical constraints. We should start with a base model and then build advanced models, or try bagging/boosting mechanisms, to improve performance. We are analyzing the health insurance dataset in this case study. You can follow this post to know how to get started. #CodeWithUs to find out more and do it yourself!! You can find the code at Python Code - GitHub. Bonus content - Read this article to know how to optimize your Python code
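The base-model-then-boosting workflow can be sketched with scikit-learn. The synthetic data below (age, BMI, smoker flag driving charges) is an assumption standing in for the actual health insurance dataset, so the scores are illustrative only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Synthetic stand-in for the insurance data; the real case study
# uses the Kaggle health insurance dataset instead.
rng = np.random.default_rng(0)
n = 500
age = rng.integers(18, 65, n)
bmi = rng.normal(28, 5, n)
smoker = rng.integers(0, 2, n)
charges = 250 * age + 300 * bmi + 20000 * smoker + rng.normal(0, 2000, n)
X = np.column_stack([age, bmi, smoker])

X_tr, X_te, y_tr, y_te = train_test_split(X, charges, random_state=0)

# Step 1: start with a simple base model.
base = LinearRegression().fit(X_tr, y_tr)
base_r2 = r2_score(y_te, base.predict(X_te))

# Step 2: try a boosted model on the same split and compare.
boosted = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
boosted_r2 = r2_score(y_te, boosted.predict(X_te))

print(f"linear R^2:  {base_r2:.3f}")
print(f"boosted R^2: {boosted_r2:.3f}")
```

Comparing both models on the same held-out split is what makes the scores comparable; whichever model wins still has to fit the business constraints (interpretability, latency, and so on) before it is chosen.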