Statistics Topics in Data Science 

What is Statistics?
 

Statistics is the field of study concerned with the collection, organization, analysis and presentation of data.

It is applied to scientific, industrial and social problems, among others.

 

Statistics is used to calculate approximate values when the population is too large to measure completely.

 

When census data (data on every member of the population) cannot be collected, statisticians collect data from a sample and use carefully designed surveys and experiments to obtain estimates that are as accurate as possible.

 

It is a branch of mathematics. In the data science industry, it is also used to measure and understand business performance.

 

Statistics offers many methods, which we are going to learn in the coming sessions.

 

Note: Data is simply a collection of pieces of information.

 

Example: heights of students {115 cm, 150 cm, 170 cm, 200 cm, ...}

-------------------------------------------------------------------------------------------

Types of Statistics:

 

There are two types of statistics:

 

1. Descriptive Statistics

2. Inferential Statistics

 

1. Descriptive Statistics: 

 

It consists of organizing and summarizing a collection of data so that its main features can be described and presented. Descriptive statistics only describes the data at hand; it does not draw conclusions beyond it.

 

Examples: PDF (probability density function) plots, histograms, box plots, bar charts and pie charts.

 

Question framing in Descriptive Statistics: What is the average age of students in your mathematics class?
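The descriptive question above can be answered directly from the data we have. The sketch below uses Python's standard `statistics` module; the list of ages is hypothetical data made up for illustration.

```python
import statistics

# Hypothetical ages (in years) of the students in one mathematics class.
ages = [19, 20, 21, 22, 20, 23, 21, 20, 22, 21]

# Descriptive statistics only summarise the data at hand.
summary = {
    "count": len(ages),
    "mean": statistics.mean(ages),      # the "average age" asked about above
    "median": statistics.median(ages),  # middle value of the sorted ages
    "min": min(ages),
    "max": max(ages),
}
print(summary)
```

No conclusion is drawn about any other class; the numbers describe only this one.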

-------------------------------------------------------------------------------------------

2. Inferential Statistics:

 

It is the branch of statistics that uses tools such as hypothesis tests to draw conclusions about a population from data collected on random samples.

 

Examples: p-values, the Z-test, the T-test, ANOVA, etc.

 

Question framing in Inferential Statistics: Are the ages of students in this mathematics class similar to what you would expect in a typical mathematics class at this university?
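One way to approach the inferential question is a one-sample Z-test. This is a minimal sketch that assumes the university-wide mean age (20.5) and its standard deviation (1.5) are known; both figures, like the class ages, are hypothetical. It uses `statistics.NormalDist` (Python 3.8+) for the normal CDF.

```python
import math
from statistics import NormalDist, mean

# Hypothetical sample: ages of the students in this mathematics class.
ages = [19, 20, 21, 22, 20, 23, 21, 20, 22, 21]

# Assumed (hypothetical) population figures for a typical class at the university.
mu0 = 20.5    # population mean age
sigma = 1.5   # population standard deviation, assumed known

def one_sample_z_test(sample, mu0, sigma):
    """Two-sided one-sample Z-test; assumes the population SD is known."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    # Probability of a |Z| at least this large if the class were typical.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p = one_sample_z_test(ages, mu0, sigma)
print(f"z = {z:.3f}, p-value = {p:.3f}")
if p < 0.05:
    print("The class ages differ noticeably from a typical class.")
else:
    print("No evidence that the class ages differ from a typical class.")
```

Unlike the descriptive summary, the conclusion here is about the wider population of classes, and it comes with a p-value expressing how surprising the sample would be.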

-------------------------------------------------------------------------------------------

Population data:

The population is the complete set of all the items or individuals of interest.

Ex: all the people in a country, all the cars produced, etc.

Sample data:

A sample is a subset of the population that is selected for study.

 

Ex: the people of a single state in a country, etc.
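To see how a sample stands in for a population, the sketch below builds a hypothetical population of 1,000 heights (randomly generated, not real data), draws a sample of 50, and compares the sample mean (the estimate) with the population mean (the true value).

```python
import random
import statistics

random.seed(42)  # fixed seed so the example is reproducible

# Hypothetical population: heights (cm) of 1,000 people in a country.
population = [round(random.gauss(165, 10), 1) for _ in range(1000)]

# A sample is a subset of the population: here, 50 people chosen at random.
sample = random.sample(population, k=50)

print("population mean:", round(statistics.mean(population), 2))
print("sample mean (estimate):", round(statistics.mean(sample), 2))
```

The sample mean is usually close to, but not exactly equal to, the population mean.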

-----------------------------------------------------------------------------------------------------------------

 

Sampling Techniques:

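Sampling techniques are methods for choosing a sample from a population. In probability (random) sampling, every member of the population has a known chance of being selected, which helps the sample represent the population; in non-probability sampling, such as convenience sampling, it does not.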

 

Types of Sampling:  

Simple Random Sampling:

Every member of the population has an equal chance of being selected.

Ex: selecting children from a class at random, where each child has an equal chance of being selected.
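A minimal sketch of simple random sampling using Python's `random.sample`, which draws distinct items with equal probability. The class of 20 children is hypothetical; the loop at the end checks the "equal chance" property empirically by repeating the draw many times.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the example is reproducible

# Hypothetical class of 20 children.
children = [f"child_{i}" for i in range(1, 21)]

def simple_random_sample(items, k):
    """Draw k distinct items; every item is equally likely to be chosen."""
    return random.sample(items, k)

print(simple_random_sample(children, 5))

# Over many draws, each child should be picked in about k/N = 5/20 = 25% of them.
trials = 10_000
counts = Counter()
for _ in range(trials):
    counts.update(simple_random_sample(children, 5))
for child in children[:3]:
    print(child, counts[child] / trials)
```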

 

Stratified Sampling: 

This method divides the population into groups called 'strata' based on a shared characteristic, then randomly selects members from each stratum. Some prior information about the population is needed to form the strata.
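Stratified sampling can be sketched as "group, then sample within each group". The example below assumes proportional allocation (the same fraction is taken from every stratum), which is one common choice among several; the student population and its year groups are hypothetical.

```python
import random
from collections import defaultdict

random.seed(1)  # fixed seed so the example is reproducible

# Hypothetical population of 100 students, each tagged with a year group (the stratum).
students = [(f"s{i}", year) for i, year in
            enumerate(["year1"] * 60 + ["year2"] * 30 + ["year3"] * 10)]

def stratified_sample(population, key, fraction):
    """Split the population into strata by `key`, then take a simple random
    sample of the same fraction from each stratum (proportional allocation)."""
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(random.sample(members, k))
    return sample

sample = stratified_sample(students, key=lambda s: s[1], fraction=0.1)
print(sample)
```

With a 10% fraction this picks 6, 3 and 1 students from the three year groups, so each group appears in the sample in the same proportion as in the population.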

Systematic Sampling:

This method picks a random starting point in the population and then selects every k-th element after it, where k is a fixed 'sampling interval'.

Ex: When selecting team leaders in a school, the teacher lines up the students, picks a random starting position between 1 and 5, and then selects every 5th student after it. It is a simple, low-effort way to choose candidates.
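A minimal sketch of systematic sampling on a hypothetical line-up of 30 students with a sampling interval of 5. Python list slicing (`start::interval`) does the "every k-th element" part.

```python
import random

random.seed(3)  # fixed seed so the example is reproducible

# Hypothetical line-up of 30 students.
students = [f"student_{i}" for i in range(1, 31)]

def systematic_sample(population, interval):
    """Pick a random start within the first `interval` positions,
    then take every `interval`-th element from there on."""
    start = random.randrange(interval)  # 0 .. interval - 1
    return population[start::interval]

# Interval 5 on 30 students gives 30 / 5 = 6 selected students.
sample = systematic_sample(students, interval=5)
print(sample)
```

Only the starting point is random; once it is chosen, the rest of the sample is fixed by the interval.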

Convenience Sampling:

It is a type of non-probability sampling in which the sample is drawn from the part of the population that is easiest to reach. It is simple and economical, which makes it helpful when the population is large, but the sample may not represent the whole population. Often only those who are available or interested take part.

Ex: If a new investigation team wants to establish itself in 10 cities, it simply selects the top 10 cities and proceeds from there.
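For contrast with the random methods, convenience sampling involves no randomness at all: it takes whichever units are easiest to reach. In this sketch, the city list and its ordering by accessibility are hypothetical.

```python
# Hypothetical list of 15 cities, ordered from easiest to hardest for the team to reach.
cities_by_accessibility = [f"City {i}" for i in range(1, 16)]

def convenience_sample(units, n):
    """Take the first n units that are easiest to reach. There is no random
    selection, so the sample is not guaranteed to represent the population."""
    return units[:n]

sample = convenience_sample(cities_by_accessibility, 10)
print(sample)
```

Running it twice always gives the same 10 cities, and the 5 least accessible cities can never be chosen, which is exactly why this method can introduce bias.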

-----------------------------------------------------------------------------------------------------------------

Variable:

A variable is a property that can take on different values.

Eg: Height = 182 cm

Here height is a variable.

Types of variables:

1. Quantitative variable:

A quantitative variable is measured as a numerical value, so we can perform mathematical operations on it such as addition, subtraction and multiplication.

Ex: height, age, number of students.

Types of quantitative variable:

a. Discrete variable:

A discrete variable takes only distinct, countable values, usually whole numbers.

Ex: number of students in a class, number of cars produced, etc.

b. Continuous variable:

A continuous variable can take any value within a range, including fractions and decimals.

Ex: a length such as 12.22 cm, etc.

2. Qualitative variable: 

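A qualitative variable describes a characteristic or category rather than a measured quantity, which is why it is also called a categorical variable. It is entirely different from a quantitative variable: arithmetic on its values is not meaningful.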

Ex: gender, blood group, eye colour, etc.
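The distinction between variable types can be illustrated with a rough heuristic that inspects the Python types in a column of data. This is a hypothetical helper assuming clean data, not a standard library function: strings are treated as qualitative, integers as quantitative discrete, and decimal numbers as quantitative continuous.

```python
def variable_type(values):
    """Rough heuristic (assumes clean data): strings -> qualitative,
    integers -> quantitative (discrete), decimals -> quantitative (continuous)."""
    if all(isinstance(v, str) for v in values):
        return "qualitative"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "quantitative (discrete)"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "quantitative (continuous)"
    return "unknown"

# Hypothetical columns of a small dataset.
columns = {
    "students_in_class": [32, 28, 35],
    "height_cm": [115.0, 150.5, 170.2],
    "gender": ["F", "M", "F"],
}
for name, values in columns.items():
    print(name, "->", variable_type(values))
```

The heuristic is only a first guess: a category stored as a number code (for example 1 = male, 2 = female) is still qualitative, even though its values are integers.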