A Treasure Trove of Data Science Blogs (A to Z)

This post will help you in your hunt for data science knowledge. The list below makes it easy to find blogs that talk about data science. I hope you find it useful.

A Blog From a Human-engineer-being http://www.erogol.com/ 
Aakash Japi http://aakashjapi.com/ 
Adit Deshpande https://adeshpande3.github.io/ 
Advanced Analytics & R http://advanceddataanalytics.net/ 
Adventures in Data Land http://blog.smola.org 
Agile Data Science http://blog.sense.io/ 
Ahmed El Deeb https://medium.com/@D33B 
Airbnb Data blog http://nerds.airbnb.com/data/ 
Alex Castrounis | InnoArchiTech http://www.innoarchitech.com/ 
Alex Perrier http://alexperrier.github.io/ 
Algobeans | Data Analytics Tutorials & Experiments for the Layman https://algobeans.com 
Amazon AWS AI Blog https://aws.amazon.com/blogs/ai/ 
Analytics Vidhya http://www.analyticsvidhya.com/blog/ 
Analytics and Visualization in Big Data @ Sicara https://blog.sicara.com 
Andreas Müller http://peekaboo-vision.blogspot.com/ 
Andrej Karpathy blog http://karpathy.github.io/ 
Andrew Brooks http://brooksandrew.github.io/simpleblog/ 
Andrey Kurenkov http://www.andreykurenkov.com/writing/ 
Anton Lebedevich’s Blog http://mabrek.github.io/ 
Arthur Juliani https://medium.com/@awjuliani 
Audun M. Øygard http://www.auduno.com/ 
Avi Singh https://avisingh599.github.io/ 
Beautiful Data http://beautifuldata.net/ 
Beckerfuffle http://mdbecker.github.io/ 
Becoming A Data Scientist http://www.becomingadatascientist.com/ 
Ben Bolte’s Blog http://benjaminbolte.com/ml/ 
Ben Frederickson http://www.benfrederickson.com/blog/ 
Berkeley AI Research http://bair.berkeley.edu/blog/ 
Big-Ish Data http://bigishdata.com/ 
Blog on neural networks http://yerevann.github.io/ 
Blogistic Regression http://d10genes.github.io/blog/ 
blogR | R tips and tricks from a scientist https://drsimonj.svbtle.com/ 
Brain of mat kelcey http://matpalm.com/blog/ 
Brilliantly wrong thoughts on science and programming https://arogozhnikov.github.io/ 
Bugra Akyildiz http://bugra.github.io/ 
Building Babylon https://building-babylon.net/ 
Carl Shan http://carlshan.com/ 
Chris Stucchio https://www.chrisstucchio.com/blog/index.html 
Christophe Bourguignat https://medium.com/@chris_bour 
Christopher Nguyen https://medium.com/@ctn 
Cloudera Data Science Posts http://blog.cloudera.com/blog/category/data-science/ 
colah’s blog http://colah.github.io/archive.html 
Cortana Intelligence and Machine Learning Blog https://blogs.technet.microsoft.com/machinelearning/ 
Daniel Forsyth http://www.danielforsyth.me/ 
Daniel Homola http://danielhomola.com/category/blog/ 
Daniel Nee http://danielnee.com 
Data Based Inventions http://datalab.lu/ 
Data Blogger https://www.data-blogger.com/ 
Data Labs http://blog.insightdatalabs.com/ 
Data Meets Media http://datameetsmedia.com/ 
Data Miners Blog http://blog.data-miners.com/ 
Data Mining Research http://www.dataminingblog.com/ 
Data Mining: Text Mining, Visualization and Social Media http://datamining.typepad.com/data_mining/ 
Data Piques http://blog.ethanrosenthal.com/ 
Data School http://www.dataschool.io/ 
Data Science 101 http://101.datascience.community/ 
Data Science @ Facebook https://research.facebook.com/blog/datascience/ 
Data Science Insights http://www.datasciencebowl.com/data-science-insights/ 
Data Science Tutorials https://codementor.io/data-science/tutorial 
Data Science Vademecum http://datasciencevademecum.wordpress.com/ 
Dataaspirant http://dataaspirant.com/ 
Dataclysm http://blog.okcupid.com/ 
DataGenetics http://datagenetics.com/blog.html 
Dataiku https://www.dataiku.com/blog/ 
DataKind http://www.datakind.org/blog 
DataLook http://blog.datalook.io/ 
Datanice https://datanice.wordpress.com/ 
Dataquest Blog https://www.dataquest.io/blog/ 
DataRobot http://www.datarobot.com/blog/ 
Datascope http://datascopeanalytics.com/blog 
DatasFrame http://tomaugspurger.github.io/ 
David Mimno http://www.mimno.org/ 
Dayne Batten http://daynebatten.com 
Deep Learning http://deeplearning.net/blog/ 
Deepdish http://deepdish.io/ 
Delip Rao http://deliprao.com/ 
DENNY’S BLOG http://blog.dennybritz.com/ 
Dimensionless https://dimensionless.in/blog/ 
Distill http://distill.pub/ 
District Data Labs http://districtdatalabs.silvrback.com/ 
Diving into data https://blog.datadive.net/ 
Domino Data Lab’s blog http://blog.dominodatalab.com/ 
Dr. Randal S. Olson http://www.randalolson.com/blog/ 
Drew Conway https://medium.com/@drewconway 
Dustin Tran http://dustintran.com/blog/ 
Eder Santana https://edersantana.github.io/blog.html 
Edwin Chen http://blog.echen.me 
EFavDB http://efavdb.com/ 
Emilio Ferrara, Ph.D. http://www.emilio.ferrara.name/ 
Entrepreneurial Geekiness http://ianozsvald.com/ 
Eric Jonas http://ericjonas.com/archives.html 
Eric Siegel http://www.predictiveanalyticsworld.com/blog 
Erik Bernhardsson http://erikbern.com 
ERIN SHELLMAN http://www.erinshellman.com/ 
Eugenio Culurciello http://culurciello.github.io/ 
Fabian Pedregosa http://fa.bianp.net/ 
Fast Forward Labs http://blog.fastforwardlabs.com/ 
FastML http://fastml.com/ 
Florian Hartl http://florianhartl.com/ 
FlowingData http://flowingdata.com/ 
Full Stack ML http://fullstackml.com/ 
GAB41 http://www.lab41.org/gab41/ 
Garbled Notes http://www.chioka.in/ 
Greg Reda http://www.gregreda.com/blog/ 
Hyon S Chu https://medium.com/@adailyventure 
i am trask http://iamtrask.github.io/ 
I Quant NY http://iquantny.tumblr.com/ 
inFERENCe http://www.inference.vc/ 
Insight Data Science https://blog.insightdatascience.com/ 
INSPIRATION INFORMATION http://myinspirationinformation.com/ 
Ira Korshunova http://irakorshunova.github.io/ 
I’m a bandit https://blogs.princeton.edu/imabandit/ 
Jason Toy http://www.jtoy.net/ 
Jeremy D. Jackson, PhD http://www.jeremydjacksonphd.com/ 
Jesse Steinweg-Woods https://jessesw.com/ 
Joe Cauteruccio http://www.joecjr.com/ 
John Myles White http://www.johnmyleswhite.com/ 
John’s Soapbox http://joschu.github.io/ 
Jonas Degrave http://317070.github.io/ 
Joy Of Data http://www.joyofdata.de/blog/ 
Julia Evans http://jvns.ca/ 
KDnuggets http://www.kdnuggets.com/ 
Keeping Up With The Latest Techniques http://colinpriest.com/ 
Kenny Bastani http://www.kennybastani.com/ 
Kevin Davenport http://kldavenport.com/ 
kevin frans http://kvfrans.com/ 
korbonits | Math ∩ Data http://korbonits.github.io/ 
Large Scale Machine Learning http://bickson.blogspot.com/ 
LATERAL BLOG https://blog.lateral.io/ 
Lazy Programmer http://lazyprogrammer.me/ 
Learn Analytics Here https://learnanalyticshere.wordpress.com/ 
LearnDataSci http://www.learndatasci.com/ 
Learning With Data http://learningwithdata.com/ 
Life, Language, Learning http://daoudclarke.github.io/ 
Locke Data https://itsalocke.com/blog/ 
Louis Dorard http://www.louisdorard.com/blog/ 
M.E.Driscoll http://medriscoll.com/ 
Machinalis http://www.machinalis.com/blog 
Machine Learning (Theory) http://hunch.net/ 
Machine Learning and Data Science http://alexhwoods.com/blog/ 
Machine Learning https://charlesmartin14.wordpress.com/ 
Machine Learning Mastery http://machinelearningmastery.com/blog/ 
Machine Learning Blogs https://machinelearningblogs.com/ 
Machine Learning, etc http://yaroslavvb.blogspot.com 
Machine Learning, Maths and Physics https://mlopezm.wordpress.com/ 
Machined Learnings http://www.machinedlearnings.com/ 
MAPPING BABEL https://jack-clark.net/ 
MAPR Blog https://www.mapr.com/blog 
MAREK REI http://www.marekrei.com/blog/ 
MARGINALLY INTERESTING http://blog.mikiobraun.de/ 
Math ∩ Programming http://jeremykun.com/ 
Matthew Rocklin http://matthewrocklin.com/blog/ 
Melody Wolk http://melodywolk.com/projects/ 
Mic Farris http://www.micfarris.com/ 
Mike Tyka http://mtyka.github.io/ 
minimaxir | Max Woolf’s Blog http://minimaxir.com/ 
Mirror Image https://mirror2image.wordpress.com/ 
Mitch Crowe http://www.dataphoric.com/ 
MLWave http://mlwave.com/ 
MLWhiz http://mlwhiz.com/ 
Models are illuminating and wrong https://peadarcoyle.wordpress.com/ 
Moody Rd http://blog.mrtz.org/ 
Moonshots http://jxieeducation.com/ 
Mourad Mourafiq http://mourafiq.com/ 
My thoughts on Data science, predictive analytics, Python http://shahramabyari.com/ 
Natural language processing blog http://nlpers.blogspot.fr/ 
Neil Lawrence http://inverseprobability.com/blog.html 
NLP and Deep Learning enthusiast http://camron.xyz/ 
no free hunch http://blog.kaggle.com/ 
Nuit Blanche http://nuit-blanche.blogspot.com/ 
Number 2147483647 https://no2147483647.wordpress.com/ 
On Machine Intelligence https://aimatters.wordpress.com/ 
Opiate for the masses Data is our religion. http://opiateforthemass.es/ 
p-value.info http://www.p-value.info/ 
Pete Warden’s blog http://petewarden.com/ 
Plotly Blog http://blog.plot.ly/ 
Probably Overthinking It http://allendowney.blogspot.ca/ 
Prooffreader.com http://www.prooffreader.com 
ProoffreaderPlus http://prooffreaderplus.blogspot.ca/ 
Publishable Stuff http://www.sumsar.net/ 
PyImageSearch http://www.pyimagesearch.com/ 
Pythonic Perambulations https://jakevdp.github.io/ 
quintuitive http://quintuitive.com/ 
R and Data Mining https://rdatamining.wordpress.com/ 
R-bloggers http://www.r-bloggers.com/ 
R2RT http://r2rt.com/ 
Ramiro Gómez http://ramiro.org/notebooks/ 
Random notes on Computer Science, Mathematics and Software Engineering http://barmaley-exe.github.io/ 
Randy Zwitch http://randyzwitch.com/ 
RaRe Technologies http://rare-technologies.com/blog/ 
Rayli.Net http://rayli.net/blog/ 
Revolutions http://blog.revolutionanalytics.com/ 
Rinu Boney http://rinuboney.github.io/ 
RNDuja Blog http://rnduja.github.io/ 
Robert Chang https://medium.com/@rchang 
Rocket-Powered Data Science http://rocketdatascience.org 
Sachin Joglekar’s blog https://codesachin.wordpress.com/ 
samim https://medium.com/@samim 
Sean J. Taylor http://seanjtaylor.com/ 
Sebastian Raschka http://sebastianraschka.com/blog/index.html 
Sebastian Ruder http://sebastianruder.com/ 
Sebastian’s slow blog http://www.nowozin.net/sebastian/blog/ 
SFL Scientific Blog https://sflscientific.com/blog/ 
Shakir’s Machine Learning Blog http://blog.shakirm.com/ 
Simply Statistics http://simplystatistics.org 
Springboard Blog http://springboard.com/blog
Startup.ML Blog http://startup.ml/blog 
Statistical Modeling, Causal Inference, and Social Science http://andrewgelman.com/ 
Stigler Diet http://stiglerdiet.com/ 
Stitch Fix Tech Blog http://multithreaded.stitchfix.com/blog/ 
Storytelling with Statistics on Quora http://datastories.quora.com/ 
StreamHacker http://streamhacker.com/ 
Subconscious Musings http://blogs.sas.com/content/subconsciousmusings/ 
Swan Intelligence http://swanintelligence.com/ 
TechnoCalifornia http://technocalifornia.blogspot.se/ 
TEXT ANALYSIS BLOG | AYLIEN http://blog.aylien.com/ 
The Angry Statistician http://angrystatistician.blogspot.com/ 
The Clever Machine https://theclevermachine.wordpress.com/ 
The Data Camp Blog https://www.datacamp.com/community/blog 
The Data Incubator http://blog.thedataincubator.com/ 
The Data Science Lab https://datasciencelab.wordpress.com/ 
THE ETZ-FILES http://alexanderetz.com/ 
The Science of Data http://www.martingoodson.com 
The Shape of Data https://shapeofdata.wordpress.com 
The unofficial Google data science Blog http://www.unofficialgoogledatascience.com/ 
Tim Dettmers http://timdettmers.com/ 
Tombone’s Computer Vision Blog http://www.computervisionblog.com/ 
Tommy Blanchard http://tommyblanchard.com/category/projects 
Trevor Stephens http://trevorstephens.com/ 
Trey Causey http://treycausey.com/ 
UW Data Science Blog http://datasciencedegree.wisconsin.edu/blog/ 
Wellecks http://wellecks.wordpress.com/ 
Wes McKinney http://wesmckinney.com/archives.html 
While My MCMC Gently Samples http://twiecki.github.io/ 
WildML http://www.wildml.com/ 
Will do stuff for stuff http://rinzewind.org/blog-en 
Will wolf http://willwolf.io/ 
WILL’S NOISE http://www.willmcginnis.com/ 
William Lyon http://www.lyonwj.com/ 
Win-Vector Blog http://www.win-vector.com/blog/ 
Yanir Seroussi http://yanirseroussi.com/ 
Zac Stewart http://zacstewart.com/ 
ŷhat http://blog.yhat.com/ 
ℚuantitative √ourney http://outlace.com/ 
大トロ http://blog.otoro.net/ 

Data Viz! Cheat Sheet for the R Data Analyst

Data visualization has become a vital part of the data science arena. Hence, our key tool should have strong capabilities on both fronts: data analysis as well as data visualization. With this shift in the landscape, R has gained immense popularity because of its splendid data visualization capabilities. With a few lines of code, you can produce beautiful charts and data stories. R has superb libraries for creating basic and more advanced visualizations such as bar charts, histograms, scatter plots, map visualizations, mosaic plots and various others. Below is a cheat sheet of widely used visualizations for representing data. Thanks to my colleague for sharing this.

Data Viz Cheat Sheet
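
As a rough illustration of the "few lines of code" point, the sketch below draws four of the chart types mentioned above using only base R graphics. The built-in mtcars and Titanic datasets and the temporary output file are assumptions chosen for this example; they are not taken from the cheat sheet.

```r
# Minimal sketch: four common chart types with base R graphics.
# mtcars and Titanic ship with R; the temporary PDF is an illustrative choice.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 8, height = 8)
op <- par(mfrow = c(2, 2))  # arrange the plots in a 2 x 2 grid

# Bar chart: number of cars per cylinder count
cyl_counts <- table(mtcars$cyl)
barplot(cyl_counts, main = "Bar chart", xlab = "Cylinders", ylab = "Cars")

# Histogram of fuel efficiency
hist(mtcars$mpg, main = "Histogram", xlab = "Miles per gallon")

# Scatter plot of weight against fuel efficiency
plot(mtcars$wt, mtcars$mpg, main = "Scatter plot",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Mosaic plot: passenger class against survival
class_by_survival <- apply(Titanic, c(1, 4), sum)
mosaicplot(class_by_survival, main = "Mosaic plot", color = TRUE)

par(op)      # restore the previous graphics settings
dev.off()    # close the PDF device so the file is written
```

Opening out_file in a PDF viewer shows all four charts on one page.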

Introducing cricketr! : An R package to analyze performances of cricketers

A very good analysis using R in the field of cricket. A must-see! 🙂

Giga thoughts ...

Yet all experience is an arch wherethro’
Gleams that untravell’d world whose margin fades
For ever and forever when I move.
How dull it is to pause, to make an end,
To rust unburnish’d, not to shine in use!

Ulysses by Alfred Tennyson

Introduction

This is an initial post in which I introduce a cricketing package ‘cricketr’ which I have created. This package was a natural culmination of my earlier posts on cricket and my completing 9 modules of the Data Science Specialization from Johns Hopkins University at Coursera. The thought of creating this package struck me some time back, and I have finally been able to bring this to fruition.

So here it is. My R package ‘cricketr!!!’

This package uses the statistics info available in ESPN Cricinfo Statsguru. The current version of this package only uses data from test cricket. I plan to develop functionality for One-day and…


Eight Steps to Become a Data Scientist! (The Sexiest and Hottest Job of the Decade)

Wondering how to become a data scientist? Here we go: the 8 steps to become a data scientist (the sexiest and hottest job of the decade).

These steps are not easy, but they are possible if we try. Most of them come at no cost or very low cost.

https://i1.wp.com/blog.datacamp.com/wp-content/uploads/2014/08/How-to-become-a-data-scientist.jpg

Thanks to DataCamp for the nice infographic. Did you find this useful? If so, please share it with your circle.

Clash of the Titans! (R vs Python)

This is for everyone out there wondering which language is better to learn for data analysis and visualization, and whether to use R or Python for everyday data analysis tasks.

Both Python and R are among the most widely used languages for data analysis, and each has its supporters and opponents. While Python is often praised for being a general-purpose language with an easy-to-understand syntax, R’s functionality was developed with statisticians in mind, which gives it field-specific advantages such as extensive features for data visualization.

DataCamp has recently released a new infographic for everyone interested in how these two (statistical) programming languages relate to each other. This superb infographic explores the strengths of R over Python and vice versa, and aims to provide a basic comparison between the two languages from a data science and statistics perspective.

R vs Python for data science

Note:

Let us not ignore the new entrant on the battlefield, the Julia language. It is a high-level dynamic programming language designed to address the requirements of high-performance numerical and scientific computing while also being effective for general-purpose programming. It is influenced by MATLAB, C, Python, Perl, R, Ruby and others.

We expect Julia to join the clash soon!

Steps to Learn Data Science using R

One of the common difficulties people face in learning R is the lack of an organized path. They don’t know where to start, how to proceed, or which way to choose. Although there is a wealth of good free resources on the Internet, this can be overwhelming as well as puzzling at the same time.

After mining through countless resources and archives, here is a comprehensive learning path for learning R from the beginning. It will help you learn R rapidly and proficiently.

Step 1: Download and Install R

The easiest way to proceed is to download the base version of R and the installation instructions from the CRAN site. R is available for Windows, Mac and Linux. Windows and Mac users most likely want one of these versions of R. R is part of many Linux distributions, so you should check with your Linux package management system in addition to the link above.

You can now install various packages. At the time of writing, there are more than 9000 packages for R, serving different purposes. CRAN Task Views group packages by topic, so you can pick the subset of packages that you want.

To install a package, use install.packages(). For example, to install a package called “animation”, run:

install.packages("animation")

Normally the package should just install. However:

  • if you are using Linux and don’t have root access, the command may fail because the system-wide library is not writable; in that case, install into a personal library.
  • you may be asked to select a CRAN mirror, i.e. which server to use to download the package.

You should also install RStudio. It makes R coding much easier since it lets you type multiple lines of code, handle plots, install and maintain packages and navigate your programming environment.
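
Both caveats can be handled from code. The sketch below is one possible approach, not the only one: install_if_missing is a hypothetical helper written for this post (it is not part of R), the mirror URL is the CRAN cloud mirror, and the personal library path is read from the R_LIBS_USER environment variable. The actual install calls are commented out so that the snippet does not download anything when run.

```r
# Hypothetical helper (not part of R): install a package only when it is missing.
# Passing `repos` explicitly avoids the interactive mirror prompt.
install_if_missing <- function(pkg,
                               repos = "https://cloud.r-project.org",
                               lib = .libPaths()[1]) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))  # already installed, nothing to do
  }
  install.packages(pkg, repos = repos, lib = lib)
  invisible(TRUE)
}

# Without root access, a personal library is a common workaround.
user_lib <- Sys.getenv("R_LIBS_USER")

# Uncomment to create the personal library and install into it:
# dir.create(user_lib, recursive = TRUE, showWarnings = FALSE)
# .libPaths(c(user_lib, .libPaths()))
# install_if_missing("animation", lib = user_lib)
```

The helper returns FALSE (invisibly) when the package is already available and TRUE after it has attempted an install.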

Step 2: Learn the basics

You need to start by knowing the basics of the language, its libraries and its data structures. The R track from DataCamp is a good place to start your journey. See the free Introduction to R course at https://www.datacamp.com/courses/introduction-to-r. After doing this course, you should be comfortable writing basic scripts in R and understand basic data analysis. Alternatively, you can try Code School for R at http://tryr.codeschool.com/

If you want to learn R offline in your own time, you can use the interactive package swirl from http://swirlstats.com

Focus first on read.table, data frames, table, summary, describe (provided by add-on packages such as Hmisc and psych), loading and installing packages, and data visualization using the plot command.

Step 3: Learn Data Management

You will use data management techniques a lot for data cleaning, especially if you are going to work on text data. The best way to learn them is to go through text manipulation and numerical manipulation assignments. You can learn about connecting to databases through the RODBC package and about running SQL queries against data frames through the sqldf package.
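
To make these basics concrete, here is a small sketch using only base R. The scores table, its column names and the temporary files are invented for illustration; describe is left out because it needs an add-on package.

```r
# Write a small made-up table to a temporary file so the example is self-contained.
scores <- data.frame(
  student = c("A", "B", "C", "D", "E"),
  group   = c("x", "y", "x", "y", "x"),
  score   = c(72, 85, 90, 66, 78)
)
path <- tempfile(fileext = ".txt")
write.table(scores, path, sep = "\t", row.names = FALSE, quote = FALSE)

# read.table: load the tab-separated file into a data frame
df <- read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

str(df)             # structure: column names and types
summary(df$score)   # minimum, quartiles, mean and maximum

# table: count rows per group; tapply: mean score per group
group_counts  <- table(df$group)
mean_by_group <- tapply(df$score, df$group, mean)

# plot: draw the scores and save the chart to a temporary PDF
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
plot(df$score, type = "b", xlab = "Student index", ylab = "Score")
dev.off()
```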
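
A typical cleaning task combines text and numerical manipulation. The sketch below uses only base R functions; the raw records are made up for illustration, and the database packages mentioned above are not used here.

```r
# Made-up raw records with inconsistent spacing, casing and a missing value.
raw <- c("  alice ,34", "BOB, 41 ", "carol,n/a")

# Split each record into its two fields.
parts <- strsplit(raw, ",", fixed = TRUE)
name_field <- trimws(vapply(parts, `[`, character(1), 1))
age_field  <- trimws(vapply(parts, `[`, character(1), 2))

# Text manipulation: normalise names to "Capitalised" form.
name_clean <- paste0(toupper(substring(name_field, 1, 1)),
                     tolower(substring(name_field, 2)))

# Numerical manipulation: non-numeric entries become NA.
age <- suppressWarnings(as.numeric(age_field))

people <- data.frame(name = name_clean, age = age, stringsAsFactors = FALSE)
mean_age <- mean(people$age, na.rm = TRUE)  # ignore the missing value
```

Here suppressWarnings hides the coercion warning that as.numeric raises for "n/a"; in real cleaning work it is usually worth inspecting which values turned into NA.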

Step 4: Study specific packages in R: data.table and dplyr

Here we go! Below is a brief introduction to several libraries. Start by practising some common operations.

  • Practice the data.table tutorial thoroughly here. Print and study the cheat sheet for data.table.
  • Next, you can have a look at the dplyr tutorial here.
  • For text mining, start with creating a word cloud in R and then learn through this series of tutorials: Part 1 and Part 2.
  • For social network analysis read through these pages.
  • Do sentiment analysis using Twitter data – check out this and this analysis.
  • For optimization through R read here and here
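
One of the most common operations to practise is a grouped summary. The sketch below shows it with dplyr when that package is installed and falls back to base R's aggregate otherwise; the sales data is invented for illustration, and the fallback is an addition of this sketch rather than something the tutorials above prescribe.

```r
# Made-up data: sales amounts per region.
sales <- data.frame(
  region = c("north", "south", "north", "south", "east"),
  amount = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

if (requireNamespace("dplyr", quietly = TRUE)) {
  # dplyr version: group the rows, then summarise each group
  totals <- dplyr::summarise(dplyr::group_by(sales, region),
                             total = sum(amount))
  totals <- as.data.frame(totals)
} else {
  # Base R fallback with the same result
  totals <- aggregate(amount ~ region, data = sales, FUN = sum)
  names(totals)[2] <- "total"
}

# Sort by region so both branches return rows in the same order.
totals <- totals[order(totals$region), ]
```

Writing the same summary in data.table is a useful follow-up exercise once you have worked through its tutorial.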

Step 5: Effective Data Visualization through ggplot2

  • Read Edward Tufte and his principles on how to make data visualizations here. Especially read about data-ink, the lie factor and data density.
  • Read about the common pitfalls in dashboard design by Stephen Few.
  • To learn the grammar of graphics and a good way to apply it in R, go through this link from Dr Hadley Wickham, creator of ggplot2 and one of the most brilliant R package creators in the world today. You can download the data and slides as well.
  • Are you interested in visualizing data for spatial analysis? Go through the amazing ggmap package.
  • Interested in making animations through R? Look through these examples. The animation package will help you here.
  • Slidify will help supercharge your graphics with HTML5.
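
To give a feel for the grammar of graphics, here is a minimal sketch of a ggplot2 scatter plot built layer by layer. It assumes ggplot2 may or may not be installed, so it falls back to a base R plot; the mtcars dataset, the minimal theme (a nod to Tufte's data-ink idea) and the output file are choices made for this example.

```r
out_file <- tempfile(fileext = ".pdf")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  # Grammar of graphics: data + aesthetic mapping + geometry + labels + theme
  p <- ggplot2::ggplot(mtcars, ggplot2::aes(x = wt, y = mpg)) +
    ggplot2::geom_point() +
    ggplot2::labs(x = "Weight (1000 lbs)", y = "Miles per gallon") +
    ggplot2::theme_minimal()  # little non-data ink
  ggplot2::ggsave(out_file, plot = p, width = 6, height = 4)
} else {
  # Base R fallback when ggplot2 is not installed
  pdf(out_file, width = 6, height = 4)
  plot(mtcars$wt, mtcars$mpg,
       xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
  dev.off()
}
```

Each `+` adds one layer or setting, which is what makes it easy to change a single aspect of the chart without rewriting the rest.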

Step 6: Learn Data Mining and Machine Learning

Now we come to the most valuable skill for a data scientist: data mining and machine learning. You can find a very comprehensive set of resources on data mining in R at http://www.rdatamining.com/ . The rattle package helps with an easy-to-use graphical user interface (GUI). There is a free, open-source, easy-to-understand book at http://togaware.com/datamining/survivor/index.html . You will go through an overview of algorithms such as regression, decision trees, ensemble modelling and clustering. You can also see the various machine learning options available in R in the relevant CRAN Task View.
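
As a first taste of two of these algorithm families, the sketch below fits a linear regression and a k-means clustering using only the stats package that ships with R. The built-in mtcars and iris datasets, the seed and the choice of three clusters are assumptions made for illustration; decision trees and ensembles need add-on packages and are not shown.

```r
# Regression: model fuel efficiency as a linear function of car weight.
fit <- lm(mpg ~ wt, data = mtcars)
slope <- coef(fit)[["wt"]]   # change in mpg per 1000 lbs of weight
summary(fit)                 # coefficients, R-squared and residual statistics

# Clustering: group iris flowers by their four measurements.
set.seed(42)                      # k-means starts from random centres
features <- scale(iris[, 1:4])    # standardise so no column dominates
clusters <- kmeans(features, centers = 3, nstart = 10)

# Compare the clusters found with the known species labels.
comparison <- table(cluster = clusters$cluster, species = iris$Species)
print(comparison)
```

The comparison table is only possible because iris happens to come with labels; in a real clustering problem you would judge the result by other means.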

Step 7: Practice

Practice with the example data you have at hand and with data available on the internet. Stay in touch with what your fellow R coders are doing by subscribing to http://www.r-bloggers.com/ , http://stats.stackexchange.com and www.stackoverflow.com. Go through the questions and answers that users come up with. Start interacting by asking questions and answering the ones you can! Happy learning!!! 🙂