Life-cycle of a Data Science Project


Wondering what the life-cycle of a data science project looks like? Here you go.
Problem Identification:

Have you ever heard the phrase “Here’s the data, can you do some analysis and find some insights?” Often, management approaches data scientists with vague or even undefined goals. Understanding the goal is important and sets up the rest of the project for success.

This step takes up about 10% of the time in the project life-cycle.

Data Preparation:

By far everybody’s least favorite stage, but possibly the most important one. Data can come from different sources, be in an ugly format, and have errors and a myriad of other problems. A single error in this stage can render the rest of the analysis useless.

That’s why, typically, up to 70% of the time is spent here.
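
To make the kind of work in this stage concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the two sources, the field names, the date formats and the cleaning rules. Real projects will need their own rules, so treat it as a rough outline rather than a recipe.

```python
from datetime import datetime

# Hypothetical records from two sources with inconsistent formats.
source_a = [
    {"id": "1", "date": "2016-01-02", "amount": "100.50"},
    {"id": "2", "date": "2016-01-03", "amount": "n/a"},    # unusable amount
]
source_b = [
    {"id": "3", "date": "03/01/2016", "amount": " 75 "},   # day/month/year, stray spaces
    {"id": "1", "date": "02/01/2016", "amount": "100.5"},  # duplicate of id 1
]

# Assumed date formats; extend this tuple for other sources.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Try each known format and return a date, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean(records):
    """Normalize types, set aside rows that cannot be parsed, de-duplicate by id."""
    seen, cleaned, rejected = set(), [], []
    for row in records:
        date = parse_date(row["date"])
        try:
            amount = float(row["amount"])
        except ValueError:
            amount = None
        if date is None or amount is None:
            rejected.append(row)  # keep bad rows for review instead of silently dropping them
            continue
        if row["id"] in seen:
            continue  # the first occurrence of an id wins
        seen.add(row["id"])
        cleaned.append({"id": int(row["id"]), "date": date, "amount": amount})
    return cleaned, rejected


cleaned, rejected = clean(source_a + source_b)
print(len(cleaned), "clean rows,", len(rejected), "rejected")
```

Keeping the rejected rows in a separate list is a deliberate choice here: it makes it easier to spot the single error that could otherwise spoil the rest of the analysis.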

Analyse the data:

Creating models, performing data mining, setting up simulations, etc. This is the most exciting part, and if the previous stages were done correctly, analyzing the data and getting insights will feel good.

Time needed here would be 10%.

Visualization of the insights:

Visualizing comes hand-in-hand with analyzing. This is a powerful technique, as looking at the data in various forms and shapes can help reveal insights that are otherwise not evident. Also, several projects, such as BI dashboards, don’t need much analysis but rely on visualization instead.

Time needed here would be 10%

Presentation of the findings:

We’ve reached 100%, so the project is over! Actually, no. Presenting findings is a whole separate “Additional” stage. You need to not only convey the insights in your audience’s language but also get buy-in from them to take action based on those insights. This is an art.

Time needed: extra 80% 🙂

Hope you found this useful! Enjoy learning!


Treasure for Data Science blogs (A to Z)

This post will help you in your hunt for data science knowledge. The list below makes it easy to find blogs that talk about data science. I hope you will find it useful.

A Blog From a Human-engineer-being 
Aakash Japi 
Adit Deshpande 
Advanced Analytics & R 
Adventures in Data Land 
Agile Data Science 
Ahmed El Deeb 
Airbnb Data blog 
Alex Castrounis | InnoArchiTech 
Alex Perrier 
Algobeans | Data Analytics Tutorials & Experiments for the Layman 
Amazon AWS AI Blog 
Analytics Vidhya 
Analytics and Visualization in Big Data @ Sicara 
Andreas Müller 
Andrej Karpathy blog 
Andrew Brooks 
Andrey Kurenkov 
Anton Lebedevich’s Blog 
Arthur Juliani 
Audun M. Øygard 
Avi Singh 
Beautiful Data 
Becoming A Data Scientist 
Ben Bolte’s Blog 
Ben Frederickson 
Berkeley AI Research 
Big-Ish Data 
Blog on neural networks 
Blogistic Regression
blogR | R tips and tricks from a scientist 
Brain of mat kelcey 
Brilliantly wrong thoughts on science and programming 
Bugra Akyildiz 
Building Babylon 
Carl Shan 
Chris Stucchio 
Christophe Bourguignat 
Christopher Nguyen 
Cloudera Data Science Posts 
colah’s blog 
Cortana Intelligence and Machine Learning Blog 
Daniel Forsyth 
Daniel Homola 
Daniel Nee 
Data Based Inventions 
Data Blogger 
Data Labs 
Data Meets Media 
Data Miners Blog 
Data Mining Research 
Data Mining: Text Mining, Visualization and Social Media 
Data Piques 
Data School 
Data Science 101 
Data Science @ Facebook 
Data Science Insights 
Data Science Tutorials 
Data Science Vademecum 
Dataquest Blog 
David Mimno 
Dayne Batten 
Deep Learning 
Delip Rao 
District Data Labs 
Diving into data 
Domino Data Lab’s blog 
Dr. Randal S. Olson 
Drew Conway 
Dustin Tran 
Eder Santana 
Edwin Chen 
Emilio Ferrara, Ph.D. 
Entrepreneurial Geekiness 
Eric Jonas 
Eric Siegel 
Erik Bern 
Eugenio Culurciello 
Fabian Pedregosa 
Fast Forward Labs 
Florian Hartl 
Full Stack ML 
Garbled Notes 
Greg Reda 
Hyon S Chu 
i am trask 
I Quant NY 
Insight Data Science 
Ira Korshunova 
I’m a bandit 
Jason Toy 
Jeremy D. Jackson, PhD 
Jesse Steinweg-Woods 
Joe Cauteruccio 
John Myles White 
John’s Soapbox 
Jonas Degrave 
Joy Of Data 
Julia Evans 
Keeping Up With The Latest Techniques 
Kenny Bastani 
Kevin Davenport 
kevin frans 
korbonits | Math ∩ Data 
Large Scale Machine Learning 
Lazy Programmer 
Learn Analytics Here 
Learning With Data 
Life, Language, Learning 
Locke Data 
Louis Dorard 
Machine Learning (Theory) 
Machine Learning and Data Science 
Machine Learning 
Machine Learning Mastery 
Machine Learning Blogs 
Machine Learning, etc 
Machine Learning, Maths and Physics 
Machined Learnings 
MAPR Blog 
Math ∩ Programming 
Matthew Rocklin 
Melody Wolk 
Mic Farris 
Mike Tyka 
minimaxir | Max Woolf’s Blog 
Mirror Image 
Mitch Crowe 
Models are illuminating and wrong 
Moody Rd 
Mourad Mourafiq 
My thoughts on Data science, predictive analytics, Python 
Natural language processing blog 
Neil Lawrence 
NLP and Deep Learning enthusiast 
no free hunch 
Nuit Blanche 
Number 2147483647 
On Machine Intelligence 
Opiate for the masses Data is our religion. 
Pete Warden’s blog 
Plotly Blog 
Probably Overthinking It 
Publishable Stuff 
Pythonic Perambulations 
R and Data Mining 
Ramiro Gómez 
Random notes on Computer Science, Mathematics and Software Engineering 
Randy Zwitch 
RaRe Technologies 
Rinu Boney 
RNDuja Blog 
Robert Chang 
Rocket-Powered Data Science 
Sachin Joglekar’s blog 
Sean J. Taylor 
Sebastian Raschka 
Sebastian Ruder 
Sebastian’s slow blog 
SFL Scientific Blog 
Shakir’s Machine Learning Blog 
Simply Statistics 
Springboard Blog
Startup.ML Blog 
Statistical Modeling, Causal Inference, and Social Science 
Stigler Diet 
Stitch Fix Tech Blog 
Storytelling with Statistics on Quora 
Subconscious Musings 
Swan Intelligence 
The Angry Statistician 
The Clever Machine 
The Data Camp Blog 
The Data Incubator 
The Data Science Lab 
The Science of Data 
The Shape of Data 
The unofficial Google data science Blog 
Tim Dettmers 
Tombone’s Computer Vision Blog 
Tommy Blanchard 
Trevor Stephens 
Trey Causey 
UW Data Science Blog 
Wes McKinney 
While My MCMC Gently Samples 
Will do stuff for stuff 
Will Wolf
William Lyon 
Win-Vector Blog 
Yanir Seroussi 
Zac Stewart 
ℚuantitative √ourney 

Data Engineer vs Data Scientist (Infographic)

This infographic will help us better understand the skills and responsibilities of a Data Engineer and a Data Scientist. It also helps us compare the salaries and the popular software and tools used by each. Hope this helps!


10 famous TV shows related to Data science & AI (Artificial Intelligence)

“If you want to become one, first get inspired by one”

There are always a few interesting ways to learn things and get inspired. Would you like to know a few TV shows that are based on data science and artificial intelligence? We always like to do things in the way we love. Here you go, and happy watching (learning)!



Thanks to AV for this.

Top 8 Viz features in Excel 2016 !

This one is especially for Excel lovers! In this post, we will look at a few of the new and exciting data visualization features of Excel 2016.

Here is the list of new features:

  1. Hierarchy Chart/Tree Map
  2. Sunburst
  3. Waterfall or Stock Chart
  4. Transform Cold data into a cool picture
  5. Instant Histogram
  6. Pareto Chart
  7. 3D map
  8. One click forecast

These are the charts that dashboard creators want most. They are simple and attractive, and this set of features makes Excel more competitive with other, more expensive visualization tools.

  1. Hierarchy Chart/Tree Map:

Select the data that you want to use for the chart, then go to the ‘Insert’ tab > Charts > Insert Hierarchy Chart


Isn’t it cool? OK, on to the next one.

2. Sunburst/Donut Chart:

It is another way of representing a pie chart, and an alternative to the boring pie chart. Go to ‘Insert’ > Charts > Insert Hierarchy Chart > Sunburst

3. Waterfall or Stock Chart

It is recommended to sort the data in some order to get better insights.

4. Transform Cold data into a cool picture

This one is based on add-ins.

Select the data you want to visualize.

Select ‘Settings’ to change the design of the charts.

5. Instant Histogram:

Create histograms quickly instead of going to the “Analysis ToolPak” in add-ins. Go to Insert > Charts > Histogram
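
If you are curious what a histogram does with your data behind the scenes, here is a minimal Python sketch. It assumes equal-width bins and a bin count picked by hand, whereas Excel chooses the bins for you; the function name `histogram` and the sample ages are made up for illustration.

```python
def histogram(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1  # fall back to 1 when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value goes into the last bin rather than a new one.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


ages = [21, 22, 25, 31, 34, 35, 38, 42, 47, 60]  # made-up sample data
edges, counts = histogram(ages, bins=4)          # bin count is an assumption

# Print a rough text version of the chart, one row per bin.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f} | {'#' * count}")
```

Changing `bins` shows how much the shape of a histogram depends on the bin width, which is worth keeping in mind when a tool picks it automatically.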

6. Pareto Chart:

Earlier, we had to restructure the data by hand to create a ‘Pareto chart’, but now a chart that explains the 80/20 principle is just a click away.
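
For anyone who wants to see the arithmetic behind a Pareto chart, here is a minimal Python sketch: sort the categories from largest to smallest and keep a running cumulative percentage. The function name `pareto_table` and the complaint counts are made up for illustration.

```python
def pareto_table(counts):
    """Return (category, count, cumulative percent) rows, largest count first."""
    total = sum(counts.values())
    rows, running = [], 0
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for category, count in ordered:
        running += count
        rows.append((category, count, round(100 * running / total, 1)))
    return rows


# Made-up complaint counts, purely for illustration.
complaints = {
    "Wrong item": 10,
    "Late delivery": 50,
    "Other": 4,
    "Damaged item": 30,
    "Billing error": 6,
}

rows = pareto_table(complaints)
for category, count, cumulative in rows:
    print(f"{category:15} {count:3} {cumulative:6.1f}%")
```

In this invented data set the top two of five categories account for 80% of the complaints, which is the kind of pattern the chart is meant to highlight. The bars correspond to the sorted counts and the line to the cumulative percentage.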

7. 3D map:

Power Map, the popular 3-D geospatial visualization add-in for Excel 2013, is now fully integrated into Excel. We’ve also given this feature a more descriptive name, “3D Maps”. You’ll find this functionality alongside other visualization features on the Insert tab.

It will open another sheet, where we can change the theme and other options like ‘2D Map’. The “Play Tour” option will show an awesome chart with lively visuals.

8. One click Forecast

Forecasting has become much easier for data analysts.

Select the data that you want to forecast, go to the ‘Data’ tab, and click on “Forecast Sheet”.

Adjust the “Seasonality” appropriately

and your forecast is ready.
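
As far as I know, the Forecast Sheet is built on exponential smoothing (the FORECAST.ETS family of functions), which also models trend and seasonality. The Python sketch below shows only the simplest member of that family, simple exponential smoothing, so it is not what Excel actually computes. The function name, the sales figures and the smoothing factor `alpha` are assumptions for illustration.

```python
def exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    Simple exponential smoothing: no trend and no seasonality, unlike
    the richer model behind Excel's Forecast Sheet.
    """
    level = series[0]  # start from the first observation
    levels = [level]
    for value in series[1:]:
        # The new level is a weighted average of the latest value and the old level.
        level = alpha * value + (1 - alpha) * level
        levels.append(level)
    return levels


sales = [100, 110, 105, 120, 115]                  # made-up monthly sales
levels = exponential_smoothing(sales, alpha=0.5)   # alpha is an assumption
forecast = levels[-1]  # the one-step-ahead forecast is the last smoothed level
print("Next period forecast:", forecast)
```

A higher `alpha` makes the forecast react faster to recent values, while a lower one smooths out more of the noise.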

Hope you like these features, and there is much more to come from Microsoft. Try them out and enjoy!

Data Viz ! Cheat sheet for R Data Analyst

Data visualization has become a vital part of the data science arena. Hence, our key tool should have strong capabilities on both fronts: data analysis as well as data visualization. With this shift in the landscape, R has gained immense popularity because of its splendid data visualization capabilities. With a few lines of code, you can produce beautiful charts and data stories. R has superb libraries for creating basic and more advanced visualizations such as bar charts, histograms, scatter plots, map visualizations, mosaic plots and various others. Below is a cheat sheet of widely used visualizations for representing data. Thanks to my colleague for sharing this.

Data Viz Cheat Sheet

Introducing cricketr! : An R package to analyze performances of cricketers

A very good analysis using R in the field of cricket. Must see ! 🙂

Giga thoughts ...

Yet all experience is an arch wherethro’
Gleams that untravell’d world whose margin fades
For ever and forever when I move.
How dull it is to pause, to make an end,
To rust unburnish’d, not to shine in use!

Ulysses by Alfred Tennyson


This is an initial post in which I introduce a cricketing package ‘cricketr’ which I have created. This package was a natural culmination of my earlier posts on cricket and my completing 9 modules of the Data Science Specialization from Johns Hopkins University at Coursera. The thought of creating this package struck me some time back, and I have finally been able to bring this to fruition.

So here it is. My R package ‘cricketr!!!’

This package uses the statistics info available in ESPN Cricinfo Statsguru. The current version of this package only uses data from test cricket. I plan to develop functionality for One-day and…

View original post 2,667 more words