A Complete List of Data Science Online Classes

Great resources for learning data science online! Here you go!

Hi, I'm Scott

The blog has moved to http://scottge.net/2015/06/08/complete-list-of-data-science-online-classes/

You can consider online classes from Coursera for self-study. Coursera provides online classes (most of them free) taught by university professors, typically attended by thousands of students and working professionals worldwide. In particular, consider the Data Science Specialization from Johns Hopkins University, which offers a verified certificate demonstrating your ability.

Coursera Courses

Additional online class resources


Please share in the comments…



Growth of Six Sigma

Below is the Google Trends chart of search interest in "six sigma" over time.


There are two possible reasons for the decreasing trend:

  1. Awareness of Six Sigma has nearly peaked, so searches have declined over time.
  2. Six Sigma is not really a big deal.

I lean toward the second one, but for a slightly different reason.

Many people get trained in Six Sigma simply by paying for a short course (2 days, 5 days, or at most 10 days), and then they start practicing and teaching Six Sigma.

I personally know many inefficient people teaching Six Sigma!

So whatever they teach now passes for Six Sigma. I think this is the reason Six Sigma is fading out.

Eight Steps to Become a Data Scientist! (The Sexiest and Hottest Job of the Decade)

Wondering how to become a Data Scientist? Here we go: the 8 steps to becoming a Data Scientist (the sexiest and hottest job of the decade).

These steps are not easy, but they are achievable if we try. Most of them cost nothing or very little.


Thanks to DataCamp for the nice infographic. Is this useful? Then please share it with your circle.

Clash of the Titans! (R vs Python)

This is for everyone out there wondering which language is better to learn for data analysis and visualization, and whether to use R or Python for everyday data analysis tasks.

Python and R are among the most widely used languages for data analysis, and both have their supporters and opponents. Python is widely praised for being a general-purpose language with easy-to-understand syntax, while R’s functionality was developed with statisticians in mind, giving it field-specific advantages such as rich features for data visualization.

DataCamp recently released a new infographic for everyone interested in how these two (statistical) programming languages relate to each other. This superb infographic explores the strengths of R over Python and vice versa, and aims to provide a basic comparison between the two languages from a data science and statistics perspective.

R vs Python for data science


Let's not ignore the new entrant on the battlefield: the Julia language. Julia is a high-level, dynamic programming language designed to address the requirements of high-performance numerical and scientific computing while also being effective for general-purpose programming. It was influenced by MATLAB, C, Python, Perl, R, Ruby and others.

We expect Julia to join the clash soon!

Introduction to Six Sigma, in the way you want to know!

What is Six Sigma?

A method that provides organizations with tools to improve the capability of their business processes. This increase in performance and decrease in process variation leads to defect reduction and to improvements in profits, employee morale, and the quality of products or services. "Six Sigma quality" is a term generally used to indicate that a process is well controlled: within process limits of ±3σ from the center line in a control chart, and with requirements/tolerance limits ±6σ from the center line.
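To make the ±3σ and ±6σ figures concrete, here is a minimal Python sketch that computes a center line with ±3σ control limits from a list of measurements, and checks whether a pair of tolerance limits sits at least 6σ from the center line. It is a simplification, not a standard charting procedure: σ is taken as the sample standard deviation of all points, whereas real control charts usually estimate it from subgroup or moving ranges. The function names, the shaft-diameter data, and the tolerance are all hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def control_limits(samples: Sequence[float]) -> Tuple[float, float, float]:
    """Return (LCL, center line, UCL) as center line +/- 3 sigma.

    Simplification: sigma is the sample standard deviation of all points,
    not an estimate from subgroup or moving ranges.
    """
    center = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    return center - 3 * sigma, center, center + 3 * sigma


def meets_six_sigma_tolerance(
    samples: Sequence[float], lower_spec: float, upper_spec: float
) -> bool:
    """True if each tolerance limit lies at least 6 sigma from the center line."""
    center = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    return (center - lower_spec) >= 6 * sigma and (upper_spec - center) >= 6 * sigma


if __name__ == "__main__":
    # Hypothetical shaft diameters (mm) and a hypothetical 10.00 +/- 0.10 mm tolerance.
    diameters = [10.01, 9.99, 10.00, 10.02, 9.98, 10.00, 10.01, 9.99]
    lcl, cl, ucl = control_limits(diameters)
    print(f"LCL={lcl:.4f}  CL={cl:.4f}  UCL={ucl:.4f}")
    print("Tolerance at >= 6 sigma:", meets_six_sigma_tolerance(diameters, 9.90, 10.10))
```

Points falling outside the ±3σ limits signal that the process may be out of control. The 6σ check is about whether the customer's tolerance leaves that much room around the process.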

Diverse definitions have been proposed for Six Sigma, but they all share some common threads:

  • Use of teams that are assigned well-defined projects with a direct impact on the organization’s bottom line.
  • Training in “statistical thinking” at all levels, plus extensive training in advanced statistics and project management for key people. These key people are designated “Black Belts.” (The different Six Sigma belts, levels, and roles are described below.)
  • Emphasis on the DMAIC approach to problem solving: define, measure, analyze, improve, and control.
  • A management environment that supports these initiatives as a business strategy.

Six Sigma has two key methodologies:

  • DMAIC: A data-driven quality strategy for improving processes. This methodology is used to improve an existing business process.
  • DMADV: A data-driven quality strategy for designing products and processes. This methodology is used to create new product or process designs that result in more predictable, mature, and defect-free performance.

There is one more methodology called DFSS – Design For Six Sigma. DFSS is a data-driven quality strategy for designing or redesigning a product or service from the ground up.

Sometimes a DMAIC project may turn into a DFSS project because the process in question requires complete redesign to bring about the desired degree of improvement.

DMAIC Methodology:

This methodology consists of the following five steps.

Define –> Measure –> Analyze –> Improve –> Control

  • Define: Define the problem or project goal that needs to be addressed.
  • Measure: Measure the problem and the process that produced it.
  • Analyze: Analyze data and process to determine root causes of defects and opportunities.
  • Improve: Improve the process by finding solutions to fix, diminish, and prevent future problems.
  • Control: Implement, control, and sustain the improvement solutions to keep the process on the new course.

DMADV Methodology

This methodology consists of five steps:

Define –> Measure –> Analyze –> Design –> Verify

  • Define: Define the problem or project goal that needs to be addressed.
  • Measure: Measure and determine customers’ needs and specifications.
  • Analyze: Analyze the process to meet customers’ needs.
  • Design: Design a process that will meet customers’ needs.
  • Verify: Verify the design performance and its ability to meet customers’ needs.

DFSS Methodology

DFSS is a separate and emerging discipline related to Six Sigma quality processes. This is a systematic methodology utilizing tools, training, and measurements to enable us to design products and processes that meet customer expectations and can be produced at Six Sigma Quality levels.

This methodology can be described in the following five steps.

Define –> Identify –> Design –> Optimize –> Verify

  • Define: Define what the customers want, or what they do not want.
  • Identify: Identify the customer and the project.
  • Design: Design a process that meets customers’ needs.
  • Optimize: Determine process capability and optimize the design.
  • Verify: Test, verify, and validate the design.

Features of Six Sigma

  • Six Sigma’s aim is to eliminate waste and inefficiency, thereby increasing customer satisfaction by delivering what the customer is expecting.
  • Six Sigma follows a structured methodology, and has defined roles for the participants.
  • Six Sigma is a data-driven methodology, and requires accurate data collection for the processes being analyzed.
  • Six Sigma is about putting results on financial statements.
  • Six Sigma is a business-driven, multi-dimensional structured approach for:
    • Improving processes
    • Lowering defects
    • Reducing process variability
    • Reducing costs
    • Increasing customer satisfaction
    • Increasing profits

The word sigma (σ) is the statistical term for standard deviation, which measures how much a process varies. Six Sigma uses it to gauge how far a given process deviates from perfection.

The central idea behind Six Sigma: if you can measure how many “defects” you have in a process, you can systematically figure out how to eliminate them and get as close to “zero defects” as possible. Specifically, Six Sigma quality means a failure rate of 3.4 defects per million opportunities, or 99.9997% perfect.
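The 3.4-per-million figure comes from a specific convention. Defects per million opportunities (DPMO) are converted to a sigma level using the normal distribution, with an assumed 1.5σ long-term drift of the process mean. So 3.4 DPMO corresponds to a point 4.5σ from the nearest limit, which is then reported as "6σ". The minimal Python sketch below shows that arithmetic. The function names, the one-sided treatment, and the 34-defects example are assumptions for illustration, not part of the original post.

```python
from statistics import NormalDist

# Conventional long-term drift of the process mean assumed by Six Sigma tables.
MEAN_SHIFT = 1.5


def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities."""
    return defects / (units * opportunities_per_unit) * 1_000_000


def sigma_level(dpmo_value: float, shift: float = MEAN_SHIFT) -> float:
    """Short-term sigma level for a long-term DPMO (one-sided, with mean shift)."""
    long_term_yield = 1 - dpmo_value / 1_000_000
    return NormalDist().inv_cdf(long_term_yield) + shift


if __name__ == "__main__":
    print(round(sigma_level(3.4), 1))       # 6.0
    print(f"{1 - 3.4 / 1_000_000:.5%}")     # 99.99966%
    # Hypothetical: 34 defects found in 10,000 units with 5 opportunities each.
    d = dpmo(34, 10_000, 5)
    print(round(d), round(sigma_level(d), 2))
```

Setting `shift=0.0` gives the raw distance in σ (about 4.5 for 3.4 DPMO). This makes it easy to see how much of the famous "six" comes from the 1.5σ convention.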

Key Concepts of Six Sigma

At its core, Six Sigma revolves around a few key concepts.

  • Critical to Quality: Attributes most important to the customer.
  • Defect: Failing to deliver what the customer wants.
  • Process Capability: What your process can deliver.
  • Variation: What the customer sees and feels.
  • Stable Operations: Ensuring consistent, predictable processes to improve what the customer sees and feels.
  • Design for Six Sigma: Designing to meet customer needs and process capability.

“Our customers feel the variance, not the mean.” That is why Six Sigma focuses first on reducing process variation, and then on improving process capability.
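A quick calculation shows why variance matters more than the mean. The Python sketch below assumes normally distributed output and compares two hypothetical processes with the same mean of 10.0 and the same specification of 9.9 to 10.1, but different spreads. With σ = 0.05 roughly 45,500 parts per million fall outside the specification. With σ = 0.02 it is well under one part per million. The function name and all numbers are illustrative assumptions.

```python
from statistics import NormalDist


def fraction_out_of_spec(mean: float, sigma: float, lsl: float, usl: float) -> float:
    """Expected fraction of output outside [lsl, usl] for a normal process."""
    dist = NormalDist(mean, sigma)
    return dist.cdf(lsl) + (1 - dist.cdf(usl))


if __name__ == "__main__":
    # Two hypothetical processes with the SAME mean but different spread.
    for sigma in (0.05, 0.02):
        ppm = fraction_out_of_spec(10.0, sigma, 9.9, 10.1) * 1_000_000
        print(f"sigma={sigma}: {ppm:,.1f} defective parts per million")
```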

Myths about Six Sigma

There are several myths and misunderstandings surrounding Six Sigma. A few of them are given below:

  • Six Sigma is only concerned with reducing defects.
  • Six Sigma is a process for production or engineering.
  • Six Sigma cannot be applied to engineering activities.
  • Six Sigma uses difficult-to-understand statistics.
  • Six Sigma is just training.

Benefits of Six Sigma

Six Sigma offers six major benefits that attract companies:

  • Generates sustained success
  • Sets a performance goal for everyone
  • Enhances value to customers
  • Accelerates the rate of improvement
  • Promotes learning and cross-pollination
  • Executes strategic change

Origin of Six Sigma

  • Six Sigma originated at Motorola in the early 1980s, in response to a goal of achieving a 10X reduction in product-failure levels within 5 years.
  • Engineer Bill Smith invented Six Sigma, but died of a heart attack in the Motorola cafeteria in 1993, never knowing the scope of the craze and controversy he had touched off.
  • Six Sigma is based on various quality management theories (e.g. Deming’s 14 points for management, Juran’s 10 steps to achieving quality).

There are three key elements of Six Sigma Process Improvement:

  • Customers
  • Processes
  • Employees

The Customers:

Customers define quality. They expect performance, reliability, competitive prices, on-time delivery, service, clear and correct transaction processing and more. This means it is important to provide what the customers need to gain customer delight.

The Processes:

Defining processes as well as defining their metrics and measures is the central aspect of Six Sigma.

In a business, quality should be viewed from the customer’s perspective, so we must look at a defined process from the outside in.

By understanding the transaction lifecycle from the customer’s needs and processes, we can discover what customers are seeing and feeling. This gives us a chance to identify weak areas within a process and then improve them.

The Employees:

A company must involve all its employees in the Six Sigma program. The company must provide opportunities and incentives for employees to focus their talents and abilities on satisfying customers.

In Six Sigma, it is important that all team members have a well-defined role with measurable objectives.

Six Sigma Belts (remember karate belts! 🙂)

Six Sigma professionals exist at all levels, each with a different role to play. While executions and roles may vary, here is a straightforward guide to who does what.

At the project level, there are master black belts, black belts, green belts, yellow belts and white belts. These people conduct projects and implement improvements.

Roles and responsibilities by level:

  • Executives: Provide overall alignment by establishing the strategic focus of the Six Sigma program within the context of the organization’s culture and vision.
  • Champions: Translate the company’s vision, mission, goals and metrics to create an organizational deployment plan and identify individual projects. Identify resources and remove roadblocks.
  • Master Black Belt (MBB): Trains and coaches Black Belts and Green Belts. Functions more at the Six Sigma program level by developing key metrics and the strategic direction. Acts as an organization’s Six Sigma technologist and internal consultant.
  • Black Belt (BB): Understands Six Sigma philosophies and principles, including the supporting systems and tools. Demonstrates team leadership and understands all aspects of the DMAIC model in accordance with Six Sigma principles. Leads problem-solving projects. Trains and coaches project teams.
  • Green Belt (GB): Supports a Six Sigma Black Belt by analyzing and solving quality problems and is involved in quality-improvement projects. Assists with data collection and analysis for Black Belt projects. Leads Green Belt projects or teams.
  • Yellow Belt (YB): Participates as a project team member. Reviews process improvements that support the project. Has a small role, interest, or need to develop foundational knowledge of Six Sigma, whether as an entry-level employee or an executive champion.
  • White Belt (WB): Can work on local problem-solving teams that support overall projects, but may not be part of a Six Sigma project team. Understands basic Six Sigma concepts from an awareness perspective.

Different views on the definition of Six Sigma:

Methodology: This view of Six Sigma recognizes the underlying, rigorous approach known as DMAIC (define, measure, analyze, improve and control). DMAIC defines the steps a Six Sigma practitioner is expected to follow, starting with identifying the problem and ending with the implementation of long-lasting solutions. While DMAIC is not the only Six Sigma methodology in use, it is certainly the most widely adopted and recognized.

Metrics: In simple terms, Six Sigma quality performance means 3.4 defects per million opportunities.

Philosophy: The philosophical standpoint views all work as processes that can be defined, measured, analyzed, improved and controlled. Processes require inputs (x) and produce outputs (y). If you control the inputs, you will control the outputs. This is commonly expressed as y = f(x).
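One way to make y = f(x) tangible is to estimate f from data and then choose the input that gives the desired output. The Python sketch below assumes a single input and a straight-line relationship y = a + b·x, fitted by ordinary least squares. Real processes often have several inputs and nonlinear effects, which is where tools such as regression and designed experiments come in. The function names and the oven-temperature data are hypothetical.

```python
from typing import Sequence, Tuple


def fit_linear_transfer(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def input_for_target(target_y: float, a: float, b: float) -> float:
    """Invert y = a + b*x to find the input setting that yields target_y."""
    return (target_y - a) / b


if __name__ == "__main__":
    # Hypothetical data: oven temperature (x, deg C) vs. coating hardness (y).
    temps = [150, 160, 170, 180, 190]
    hardness = [82, 86, 90, 94, 98]
    a, b = fit_linear_transfer(temps, hardness)
    print(f"y = {a:.1f} + {b:.2f}x")     # y = 22.0 + 0.40x
    print(input_for_target(92, a, b))    # about 175
```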

Set of tools: The Six Sigma expert uses qualitative and quantitative techniques to drive process improvement. A few such tools include statistical process control (SPC), control charts, failure mode and effects analysis (FMEA), and process mapping. Six Sigma professionals do not fully agree on exactly which tools constitute the set.
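In failure mode and effects analysis, each failure mode is commonly rated for severity, occurrence, and detection, typically on 1 to 10 scales. The ratings are multiplied into a Risk Priority Number (RPN = S × O × D) used to rank which problems to tackle first. The Python sketch below shows that ranking. Rating definitions vary by organization, and some newer FMEA guidelines replace the RPN with other prioritization schemes. The class, function names, and invoicing failure modes are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailureMode:
    description: str
    severity: int    # 1 = negligible effect ... 10 = most severe
    occurrence: int  # 1 = very unlikely ... 10 = almost certain
    detection: int   # 1 = almost certainly detected ... 10 = cannot be detected

    @property
    def rpn(self) -> int:
        """Risk Priority Number = severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes: List[FailureMode]) -> List[FailureMode]:
    """Return failure modes sorted from highest to lowest RPN."""
    for m in modes:
        for rating in (m.severity, m.occurrence, m.detection):
            if not 1 <= rating <= 10:
                raise ValueError(f"rating out of 1-10 range in {m.description!r}")
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


if __name__ == "__main__":
    # Hypothetical failure modes for an order-to-invoice process.
    modes = [
        FailureMode("Late shipment", 5, 6, 3),
        FailureMode("Duplicate payment", 8, 2, 7),
        FailureMode("Invoice sent to wrong address", 6, 4, 5),
    ]
    for m in rank_by_rpn(modes):
        print(m.rpn, m.description)
```

Here the wrong-address failure (RPN 120) outranks the more severe duplicate payment (RPN 112) because it happens more often and is harder to catch. This kind of trade-off is exactly what the RPN is meant to surface.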

Steps to Learn Data Science using R

One of the common difficulties people face in learning R is the lack of an organized path. They don’t know where to start, how to proceed, or which path to choose. Although there is a surplus of good free resources on the Internet, this can be overwhelming as well as confusing at the same time.

After mining through countless resources and archives, here is a comprehensive learning path for R, starting from the beginning. It will help you learn R quickly and proficiently.

Step 1: Download and Install R

The easiest way to proceed is to download the base version of R and its installation instructions from the CRAN site. R is available for Windows, Mac and Linux; Windows and Mac users most likely want one of the precompiled versions. Because R is part of many Linux distributions, Linux users should also check their package management system in addition to the CRAN site.

You can now install various packages. There are more than 9000 packages in R for different purposes. CRAN Task Views group these packages by topic, so you can select the subset of packages that you need.

To install a package, run install.packages("package_name") in the R console.

For example, to install the package called “animation”, run install.packages("animation").

Normally the package installs without trouble; however:

  • if you are using Linux and don’t have root access, this command won’t work.
  • you will be asked to select your local mirror, i.e. the server from which to download the package.

You should also install RStudio. It makes R coding much easier, since it lets you type multiple lines of code, handle plots, install and maintain packages, and navigate your programming environment.

Step 2: Learn the basics

You need to start by learning the basics of the language, its libraries and data structures. The R track from DataCamp is the best place to start your journey. See the free Introduction to R course at https://www.datacamp.com/courses/introduction-to-r. After this course, you should be comfortable writing basic R scripts and understand basic data analysis. Alternatively, you can try Code School for R at http://tryr.codeschool.com/

If you want to learn R offline on your own time – you can use the interactive package swirl from http://swirlstats.com

Primarily, learn read.table, data frames, table, summary, describe, loading and installing packages, and data visualization using the plot command.

Step 3: Learn Data Management:

You will use data management functions a lot for data cleaning, especially if you are going to work on text data. The best way is to go through the text manipulation and numerical manipulation assignments. You can learn about connecting to databases through the RODBC package, and about running SQL queries on data frames through the sqldf package.

Step 4: Study specific packages in R: data.table and dplyr

Here we go! Here is a brief introduction to several libraries. We need to start practising some common operations.

  • Practice the data.table tutorial here thoroughly. Print and study the cheat sheet for data.table.
  • Next, have a look at the dplyr tutorial here.
  • For text mining, start by creating a word cloud in R and then learn through this series of tutorials: Part 1 and Part 2.
  • For social network analysis, read through these pages.
  • Do sentiment analysis using Twitter data: check out this and this analysis.
  • For optimization in R, read here and here.

Step 5: Effective Data Visualization through ggplot2

  • Read Edward Tufte and his principles on how to make data visualizations here. In particular, read about data-ink, the lie factor and data density.
  • Read about the common pitfalls in dashboard design by Stephen Few.
  • To learn the grammar of graphics and a good way to apply it in R, go through this link from Dr. Hadley Wickham, creator of ggplot2 and one of the most brilliant R package creators in the world today. You can download the data and slides as well.
  • Interested in visualizing spatial data? Go through the amazing ggmap package.
  • Interested in making animations with R? Look through these examples. The animation package will help you here.
  • Slidify will help supercharge your graphics with HTML5.

Step 6: Learn Data Mining and Machine Learning

Now we come to the most valuable skill for a data scientist: data mining and machine learning. You can find a very comprehensive set of resources on data mining in R at http://www.rdatamining.com/. The rattle package helps a lot by providing an easy-to-use graphical user interface (GUI). You can read a free, open-source, easy-to-understand book at http://togaware.com/datamining/survivor/index.html, which gives an overview of algorithms such as regression, decision trees, ensemble modelling and clustering. You can also see the various machine learning options available in R in the relevant CRAN Task View here. Resources:

Step 7: Practice

Practice with the example data you have and with data available on the Internet. Stay in touch with what your fellow R coders are doing by following http://www.r-bloggers.com/, http://stats.stackexchange.com and www.stackoverflow.com. Go through the questions users ask and the answers they receive. Start interacting by asking questions and answering the ones you can! Happy learning! 🙂

Famous Quotes about Statistics!

Here are a few famous quotes about statistics!

  • A big computer, a complex algorithm and a long time does not equal science. – Robert Gentleman
  • Absence of evidence is not evidence of absence. – Carl Sagan
  • All generalizations are false, including this one. – Mark Twain
  • All models are wrong, but some are useful. – George E. P. Box
  • An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem. – John Tukey
  • Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin. – John von Neumann
  • Figures don’t lie, but liars do figure. – Mark Twain
  • He uses statistics like a drunken man uses a lamp post, more for support than illumination. – Andrew Lang
  • I think it is much more interesting to live with uncertainty than to live with answers that might be wrong. – Richard Feynman
  • If you torture the data enough, nature will always confess. – Ronald Coase
  • If your experiment needs statistics, you ought to have done a better experiment. – Ernest Rutherford
  • In God we trust. All others must bring data. – W. Edwards Deming
  • It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so. – Mark Twain
  • Say you were standing with one foot in the oven and one foot in an ice bucket. According to the percentage people, you should be perfectly comfortable. – Bobby Bragan, 1963
  • Statistical thinking will one day be as necessary a qualification for efficient citizenship as the ability to read and write. – H. G. Wells
  • Statisticians, like artists, have the bad habit of falling in love with their models. – George Box
  • Statistics – A subject which most statisticians find difficult but in which nearly all physicians are expert.
  • Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital. – Aaron Levenstein
  • Strange events permit themselves the luxury of occurring. – Charlie Chan
  • The best thing about being a statistician is that you get to play in everyone’s backyard. – John Tukey
  • The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. – John Tukey
  • The death of one man is a tragedy. The death of millions is a statistic. – Joseph Stalin
  • The greatest value of a picture is when it forces us to notice what we never expected to see. – John Tukey
  • The statistician cannot evade the responsibility for understanding the process he applies or recommends. – Sir Ronald A. Fisher
  • There are no routine statistical questions, only questionable statistical routines. – D. R. Cox
  • To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. – Ronald A. Fisher (1938)
  • We are drowning in information and starving for knowledge. – Rutherford D. Rogers

You can add the famous quotes that you like in the comments. 🙂