The word ‘statistics’ seems to have been derived from the Latin word ‘Status’, the Italian word ‘Statista’, the German word ‘Statistik’ or the French word ‘Statistique’, each of which is connected with the political state.
It is not a new discipline; it is as old as human society. In earlier times, the term statistics was applied to a branch of statecraft, the science of statecraft. As such, the term meant the facts and figures needed by the state in its day-to-day administration. Statistics was regarded as a by-product of the administrative activities of the State. Now statistics is usually not studied for its own sake (as a separate branch); rather, it is employed as a tool in solving or analyzing the problems of the State.
In the present age, statistics is regarded as one of the most important tools for taking decisions. All branches of science make use of statistics, and in research work it has earned its own status as a tool of research. Statistics helps in forming suitable policies; as such, it is used in all fields. Thus in every situation there is a demand for statistics. Sampling techniques further reduce the cost of statistics, because by studying a part of the population, the characteristics of the whole population can be estimated. Thus the increasing demand for statistics and its decreasing cost have given way to its growth.
Planning and control are the twin functions of management. Whenever we think of a plan, we have to think of statistics; planning cannot be devised without statistics. In this technically advanced and competitive world, a producer has to make a number of decisions, such as what to produce, where to produce, how to produce, where to sell and at what price to sell. Such decisions depend upon sound forecasting, and forecasting cannot be made without statistics. Prof. Marshall observed that “statistics are the straw out of which I, like any other economist, have had to make bricks”. Statistics helps in formulating suitable policies, and as such its need is increasingly felt in all fields. A businessman needs information on the daily demand for his products, seasonal changes in demand, prices of competing products, etc. All these problems are resolved in the light of factual information; hence the need for statistics.
“By statistics we mean aggregates of facts
affected to a marked extent by multiplicity of causes, numerically expressed,
enumerated or estimated according to reasonable standard of accuracy, collected
in a systematic manner for a predetermined purpose and placed in relation to
each other.” - Horace Secrist
Numerical data alone constitute statistics. Students can be classified as very good, good, average, poor, etc. on the basis of their performance in tests. But these are qualitative expressions and are not statistics. In particular, qualitative characteristics such as honesty, beauty and intelligence, which cannot be measured numerically, are not statistics. If they are expressed by assigning certain scores (marks) as numerical standards, then they can be called statistics. Another example is a beauty competition: if ranks are assigned to the contestants, the ranks serve as a quantitative measure of beauty and can be regarded as statistics.
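The idea of turning qualitative grades into statistics by attaching scores can be sketched in Python. This is a minimal sketch under stated assumptions: the 4-3-2-1 scoring scale, the function names and the sample grades are hypothetical and chosen only for illustration; any other numerical standard fixed in advance would serve equally well.

```python
# Hypothetical scoring scale (an assumption, not a standard): grade -> score
GRADE_SCORES = {"very good": 4, "good": 3, "average": 2, "poor": 1}


def grades_to_scores(grades):
    """Express qualitative grades numerically using the assumed scale."""
    return [GRADE_SCORES[g.lower()] for g in grades]


def ranks(scores):
    """Rank scores with 1 = highest; equal scores share a rank (dense ranking)."""
    distinct = sorted(set(scores), reverse=True)
    return [distinct.index(s) + 1 for s in scores]


grades = ["Good", "Poor", "Very good", "Average", "Good"]
scores = grades_to_scores(grades)
print(scores)                     # [3, 1, 4, 2, 3]
print(ranks(scores))              # [2, 4, 1, 3, 2]
print(sum(scores) / len(scores))  # 2.6
```

Once the grades are expressed as scores, they can be averaged and compared, which the qualitative labels alone did not allow.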
The numerical data pertaining to any field of enquiry can be obtained either by enumeration (actual counting) or by estimation. If the field of enquiry is small, enumeration can be conducted. If the field of enquiry is wide and large, enumeration is out of the question, and in such cases the data can be estimated. For instance, the MBA class has 60 students; this is a case of enumeration (we count the students). At the same time, we may say that 1,00,000 people attended the Independence Day celebration; this is a case of estimation (approximation).
A reasonable standard of accuracy is needed in both enumeration and estimation. For instance, if the weights of students are being measured, fractions of a kilogram (say 1/10th or 1/20th) can be ignored; when measuring the distance from Chennai to Kanyakumari, fractions of a kilometer can easily be ignored. The acceptable degree of accuracy depends on the purpose of the enquiry, and no hard and fast rule can be laid down for all cases. Hence statistical studies aim at a reasonable standard of accuracy rather than perfect mathematical accuracy.
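Fixing a degree of accuracy in advance amounts to rounding every measurement to a chosen step. A minimal Python sketch, assuming hypothetical student weights and a hypothetical helper named `round_to`:

```python
def round_to(value, step):
    """Round a measurement to the nearest multiple of `step` (the accepted accuracy).

    Note: Python's round() sends exact halves to the nearest even integer.
    """
    return round(value / step) * step


# Hypothetical student weights in kilograms, as read off a scale
weights = [52.35, 61.8, 47.05]

print([round_to(w, 1) for w in weights])    # nearest kg: [52, 62, 47]
print([round_to(w, 0.5) for w in weights])  # nearest half kg: [52.5, 62.0, 47.0]
```

The step is where the reasonable standard of accuracy enters: a coarser step discards more detail, and the choice depends on the purpose of the enquiry.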
“Statistics
is the science of estimates and probabilities”
This definition is narrow, as other methods such as enumeration, classification and analysis have been ignored. It therefore narrows down the scope of the science of statistics.
“Statistics
may be defined as the collection, presentation, analysis and interpretation of
numerical data”.
This definition is clear and concise. The data are collected to study a particular problem, and the mass of collected data may be presented in the form of diagrams, graphs, etc. According to this definition, there are four stages.
a. Collection of data: The first step of an investigation is the collection of data. Careful collection is needed, because further analysis is based on it. There are different methods of collecting data (census, sampling, primary, secondary, etc.), and they must be reliable. If the collected data are faulty, the results will also be faulty. Therefore, the investigator must take special care in collection.
b. Presentation of data: The collected data are generally in an unintelligible form and need to be classified and tabulated before they can be analyzed. For example, suppose the investigator is interested in the average income of 1000 families in a village. The mass of data collected would be difficult to understand and analyze. Therefore, the collected data are presented in tabular, diagrammatic or graphic form. Data presented in a systematic order facilitate further analysis.
c. Analysis of data: After the presentation of data, the next step is to analyze the presented data. Analysis includes condensation, summarization, drawing conclusions, etc. through measures of central tendency, dispersion, skewness, kurtosis, correlation, regression, etc.
d. Interpretation of data: Figures do not speak for themselves. The duty of the statistician is not complete with mere collection and analysis of data; valid conclusions must be drawn on the basis of the analysis. A high degree of skill and experience is necessary for interpretation. Correct interpretation leads to valid conclusions.
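The presentation and analysis stages can be illustrated with a small Python sketch: raw figures are first classified into a frequency table, then condensed into summary measures. The incomes are hypothetical, the class width of 5000 rupees is an arbitrary choice, and the moment-based formulas for skewness and kurtosis are one common convention among several (textbooks also use Karl Pearson's and Bowley's coefficients of skewness).

```python
from collections import Counter
import statistics as st


def frequency_table(values, width, start=0):
    """Classify values into intervals [lower, lower + width) and count each class."""
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in range(min(counts), max(counts) + 1):  # include empty classes too
        lower = start + k * width
        table.append((lower, lower + width, counts[k]))  # Counter gives 0 if absent
    return table


def summary_measures(data):
    """Central tendency, dispersion and moment measures (assumes values not all equal)."""
    n = len(data)
    mean = st.mean(data)
    sd = st.pstdev(data)  # standard deviation with divisor n
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return {
        "mean": mean,
        "median": st.median(data),
        "sd": sd,
        "skewness": m3 / sd ** 3,  # moment coefficient of skewness
        "kurtosis": m4 / sd ** 4,  # beta_2; equals 3 for a normal distribution
    }


# Hypothetical monthly incomes (rupees) of eight families
incomes = [4200, 7800, 5100, 12500, 9900, 6100, 4800, 10200]

# Presentation: a frequency table
for lower, upper, f in frequency_table(incomes, width=5000):
    print(f"{lower:>6} - {upper:<6} {f}")

# Analysis: summary measures
print(summary_measures(incomes))
```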
Without an adequate understanding of statistical methods, the investigator in the social sciences may be like the blind man groping in a dark room for a black cat that is not there. The methods of statistics are useful in an ever-widening range of human activities, in any field of thought in which numerical data may be had.
The real purpose of statistical methods is to make sense out of facts and figures, to probe the unknown and to cast light upon the situation.
Broadly speaking, statistical methods can be fruitfully applied to any problem of decision making where numerical data are available or can be made available. In business, industry and economics, therefore, statistical techniques are applicable to problems such as population trends, agricultural and industrial production, prices, internal and external trade, gross national product, taxation laws and rates, preparation of budgets, computation of consumer price indices from time to time to revise wage structures, preparation of price policies for new products, scheduling of projects and exercising control over the operations till completion, resource allocation for any job, carrying out inquiries to know the potential markets, stock control, quality control, and maintenance and replacement of equipment.
In research and technology, statistical techniques are used to develop optimum designs of experiments that yield the relevant information with the highest precision at minimum cost. In the social sciences, statistics helps in studying the distribution of wealth, intelligence, etc. It is also used in studying changes in standards of living, food habits and attitudes of people.
Functions of Statistics:
In the various fields discussed above and many others, the science of statistics is used to perform the following functions:
- Statistics helps in developing sound methods of collecting data, so that the data collected can be used to draw valid inferences regarding the desired objectives.
- It
presents the information in numerical form.
- It
helps in simplifying the complex data by way of classification /
tabulation / graphical representation.
- The tabular / graphical representation of data and other statistical measures facilitate comparison.
- Statistics can be used to study the relationship between two or more factors. Such a relationship can be used to estimate one factor when the other(s) are known.
- The data on a characteristic for a series of past periods can be used to forecast its value for a future period.
- Forecasting, in turn, is the basis of planning: it helps in formulating policies and in planning their implementation.
Limitations of Statistics:
Statistics is a very powerful science for studying quantitative data. Qualitative data cannot be studied with the help of statistics unless they are made quantitative by defining suitable variates (for example, by assigning scores or ranks).
More often than not, statistics is used to draw conclusions regarding a group of units rather than a single unit. For individual units, the inference drawn holds only with an element of chance or on average.
Sometimes, due to bias in the collection of data, the inference drawn is itself biased.
The potential danger involved in the use of statistics is its misuse. It is easy to misuse statistics to support or contradict any proposition or conjecture. For instance, a statement like “During the last month, six street accidents were recorded in the middle of the road compared to twenty-one accidents recorded on the sides of the road in busy streets of Mumbai” may lead one to the conclusion that “It is safer to walk in the middle of the road”. The conclusion is misleading because far fewer people walk in the middle of the road than on its sides; the accident counts must be related to the number of people exposed to the risk before any comparison is made.
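The fallacy disappears once the counts are divided by the number of people exposed to the risk. A minimal Python sketch; the accident counts come from the statement above, but the pedestrian counts are purely hypothetical, since the statement gives none.

```python
# Accident counts from the statement; pedestrian counts are hypothetical
accidents = {"middle of road": 6, "sides of road": 21}
pedestrians = {"middle of road": 200, "sides of road": 70_000}


def rate_per_thousand(events, exposed):
    """Accidents per 1,000 people exposed to the risk."""
    return events * 1000 / exposed


for place in accidents:
    print(place, rate_per_thousand(accidents[place], pedestrians[place]))
# middle of road 30.0
# sides of road 0.3
```

Under these assumed exposures, walking in the middle of the road is a hundred times riskier, the reverse of what the raw counts suggested.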
SOURCES OF DATA
An application of statistics involves data, and therefore the foremost question that arises is where to get the data from, that is, what the sources of data are.
We have seen
that collection of the data is always the first step in any statistical
enquiry. Before starting any enquiry,
the following concepts must be clearly defined.
POPULATION:
Any finite or infinite aggregate of all possible objects under study, not necessarily animate, is called a POPULATION. In a statistical study we may have a population of the students at a university, the employees of a company, the misprints in a book, the production of a factory, the cheques cleared in a month, etc.
SAMPLE:
Any finite set of objects selected from a population is called a SAMPLE. The objects included in the sample should be representative of the items in the population, so that by studying the sample values in detail, an idea about the characteristics of the objects in the population can be obtained.
VARIATE:
A
characteristic from the population which can be expressed numerically and which
varies from object to object is called a VARIATE. For example, the wages of persons or the
heights of students can be measured quantitatively and so these are variates.
ATTRIBUTE:
Certain characteristics cannot be expressed quantitatively but can be described qualitatively, for example beauty, intelligence, skill, talent, etc. These are called ATTRIBUTES.
PARAMETER:
A statistical measure like the mean or standard deviation, calculated for all the objects included in the population, is called a PARAMETER. It is usually denoted by Greek letters, like μ for the mean, σ for the standard deviation, etc.
STATISTIC:
A statistical measure calculated for all units in the sample is called a STATISTIC. It is denoted by English (Roman) letters. For example, the sample mean is denoted by x̄ and the sample standard deviation by S.
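The distinction between a parameter and a statistic can be seen by computing both in Python. In this sketch the population of wages is simulated (hypothetical), the sample is a simple random sample of 50 drawn without replacement, and S is computed with the divisor n - 1, which is one common convention (some textbooks divide by n).

```python
import random
import statistics as st

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical population: monthly wages (rupees) of 1,000 employees
population = [rng.randint(8000, 30000) for _ in range(1000)]

mu = st.mean(population)       # parameter: population mean (μ)
sigma = st.pstdev(population)  # parameter: population standard deviation (σ)

sample = rng.sample(population, 50)  # simple random sample, without replacement
x_bar = st.mean(sample)              # statistic: sample mean (x̄)
s = st.stdev(sample)                 # statistic: sample standard deviation (S), divisor n - 1

print(f"mu = {mu:.1f}, sigma = {sigma:.1f}")
print(f"x_bar = {x_bar:.1f}, S = {s:.1f}")
```

The parameters are fixed for the population, while x̄ and S change from sample to sample; a different seed gives a different sample and slightly different statistics.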
The following
points must be decided before collection of data begins:
- The purpose or objective of the collection must be precisely defined. The type of data to be included, the characteristics to be considered, the sources from which the data are to be obtained and the steps to be followed to collect the data: every step should be worked out in advance.
- The scope of the enquiry with respect to time and the places to be covered should be decided first. There are different types of enquiries, such as official or non-official, regular or ad hoc, direct or indirect, etc. The type that suits the purpose and the scope should be chosen.
- The values of a variable are measured in a particular unit, called the statistical unit. For example, for incomes of employees the unit is the rupee; for heights of persons, the centimeter; etc. Along with the unit, the degree of accuracy should also be decided.
After considering the above points, it must be decided whether primary or secondary data are to be used.
METHODS OF COLLECTING PRIMARY DATA
Primary data are the information collected for the first time by an enumerator or investigator for the purpose of the enquiry. The following methods can be used to collect primary data.
a. DIRECT PERSONAL INVESTIGATION: Here the investigator meets the informants personally and collects the information by asking questions. The questions should be simple and short, and should be framed so as to get brief and unambiguous answers. The enumerator must be trained and specially hired for the job. He should be a keen observer, well acquainted with local conditions, and should have sufficient knowledge of the tastes and preferences of the informants. The investigator should be polite and courteous, yet firm and determined to get answers tactfully from the respondents.
This type of investigation, though very costly and time-consuming, is the best method available as far as accuracy is concerned. If the scope of the enquiry is very wide, this method cannot be used. Also, care has to be taken to prevent the investigator's personal bias from entering the respondents' answers; otherwise it will affect the validity of the data collected.
b. INDIRECT ORAL INVESTIGATION: If the persons directly concerned with the investigation are not willing to supply the necessary information, it is obtained by questioning witnesses who are supposed to know the situation, the persons concerned or the problem involved. This method is adopted by Inquiry Committees or Commissions. It is applicable in situations where indirect informants can give more reliable and accurate information than the persons involved. This method can succeed only when the witnesses are honest and are not hostile towards the persons concerned. They should be able to express themselves precisely and accurately, without exaggerating the situation. The investigators should be able to judge whether the information provided by a witness is correct and unbiased.
c. QUESTIONNAIRES AND SCHEDULES: In this method, a list of questions is prepared and sent by post to various informants. Usually, a sample of informants is selected from the concerned population. Sometimes the schedules are filled in by enumerators who question the people and write down the necessary information. If the questionnaire is sent by mail, a forwarding letter explaining the objective of the survey and requesting co-operation should accompany the form. The advantage of this method is that respondents can answer the questions at their convenience and would not hesitate to give confidential information asked for in the questionnaire. The method has wide coverage and is quick and inexpensive. However, the response is usually not very good. If possible, some incentive such as a small prize, a lucky number draw or a concession at some shops should be offered to get a better response. Every questionnaire must be accompanied by a self-addressed, stamped envelope.
If a schedule is to be filled in by an enumerator, he should be a trained, qualified person of unquestioned integrity. He must be patient and tactful with the respondents, and must explain the purpose of the investigation and the questions in detail. While writing the answers, he has to take care that personal bias does not affect the investigation. The reports of the enumerators should be periodically checked by supervisors. Now, let us see how a good questionnaire should be prepared.
REQUISITES OF A GOOD QUESTIONNAIRE OR A SCHEDULE:
1. The number of questions should be as few as possible, but at the same time the questions should cover all the essential topics on which information is required.
2. The questions should be short, simple and unambiguous. Clarity is essential in framing the questions.
3. The questions should be drafted so that the answers to them are objective and brief; for example, the printed answers may be ‘yes’ or ‘no’, or multiple-choice answers of the type ‘single, married, widowed, divorced’.
4. Some questions may not be answerable accurately by the respondents, so the degree of accuracy for the statistical unit should be stated with the question itself. For example, age is expected in completed years, monthly income in hundreds of rupees, etc.
5. Questions which are unduly inquisitive or likely to offend the respondents should not be included in a questionnaire. Questions regarding personal habits, behaviour with family members, income, etc. should be asked tactfully. Leading questions that hint at the possible answer should be avoided.
6. The questions should be worded so that the personal bias of the investigator is not reflected.
7. The arrangement of the questions should be carefully planned. Proper space for answers must be kept, and there should be a logical flow from one question to the next.
8. The questionnaire should be neatly printed on good-quality paper, creating a good impression on the respondents.
9. If possible, the questionnaire should be tried on a small sample before being applied to a large group, so that it can be revised or amended if necessary.
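Requisite 4 asks that the degree of accuracy be stated with the question. When answers are coded afterwards, the same rules can be applied mechanically. A minimal Python sketch; the function names are hypothetical, and rounding income to the nearest hundred (rather than truncating) is an assumption, since the text does not say which is meant.

```python
from datetime import date


def age_in_completed_years(born, on):
    """Whole years completed on date `on`; one less if this year's birthday is still to come."""
    years = on.year - born.year
    if (on.month, on.day) < (born.month, born.day):
        years -= 1
    return years


def income_in_hundreds(rupees):
    """Monthly income expressed in hundreds of rupees (assumed: nearest hundred)."""
    return round(rupees / 100)


print(age_in_completed_years(date(1990, 8, 15), date(2024, 3, 1)))  # 33
print(income_in_hundreds(18740))                                   # 187
```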
EDITING OF THE PRIMARY DATA
The collected data should be edited before they are processed further. While editing the data, the following points must be remembered.
- The data should be consistent; that is, the answers obtained should not contradict one another.
- The answers should be complete and uniform in all respects. If some important questions are left unanswered, the respondent should be contacted again to complete the questionnaire.
- The answers should be checked for accuracy. Inaccuracies due to arithmetical errors should be corrected.
- The data must be checked for homogeneity of answers. For example, if one respondent has mentioned gross pay and another has mentioned net pay after tax deduction, the two cannot be compared.
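The four editing checks can be automated for responses stored as records. A minimal Python sketch, assuming hypothetical field names (`age`, `years_of_service`, `monthly_pay`, `pay_basis` and optional family-size fields) and assuming gross pay as the common basis; the specific rules illustrate each check and are not a complete editing procedure.

```python
# Hypothetical questionnaire fields; a real survey would define its own
REQUIRED = ("age", "years_of_service", "monthly_pay", "pay_basis")


def edit_record(rec):
    """Return a list of editing problems found in one response (a dict)."""
    # Completeness: every essential question must be answered
    missing = [f for f in REQUIRED if rec.get(f) in (None, "")]
    problems = [f"incomplete: {f} unanswered" for f in missing]
    if missing:
        return problems  # re-contact the respondent before further checks

    # Consistency: answers must not contradict one another
    if rec["years_of_service"] > rec["age"]:
        problems.append("inconsistent: years_of_service exceeds age")

    # Accuracy: arithmetic must agree where components are given
    if all(k in rec for k in ("adults", "children", "family_size")):
        if rec["adults"] + rec["children"] != rec["family_size"]:
            problems.append("inaccurate: adults + children != family_size")

    # Homogeneity: every respondent must report pay on the same basis
    if rec["pay_basis"] != "gross":
        problems.append("not homogeneous: pay not reported as gross")
    return problems


responses = [
    {"age": 34, "years_of_service": 10, "monthly_pay": 25000, "pay_basis": "gross"},
    {"age": 25, "years_of_service": 30, "monthly_pay": 18000, "pay_basis": "net"},
    {"age": 41, "years_of_service": None, "monthly_pay": 30000, "pay_basis": "gross"},
]
for i, rec in enumerate(responses, start=1):
    print(i, edit_record(rec) or "ok")
```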
SECONDARY DATA
The data
compiled through various published or unpublished sources is known as Secondary
Data. The following are the main sources
of the secondary data.
a) Various Central or State Government publications supply reliable data on many social and economic activities, for example Census reports, Pay Commission reports, and monthly or annual publications like the Bulletin on Index of Industrial Production, the Retail Price Bulletin, Estimates of National Product, etc.
b) Various international institutions publish reports on matters of international importance. Organizations like the W.H.O., I.M.F., U.N.O. and I.B.R.D. regularly publish official reports.
c) Semi-official publications of corporations like municipal corporations, the Life Insurance Corporation of India, etc.
d) Publications of private bodies like Chambers of Commerce, the Institute of Chartered Accountants and the Institute of Bankers provide secondary data on various issues.
e) Periodicals like Economic Weekly, Commerce and Economic Times supply reliable information.
f) Various universities and research organizations collect data in different fields which can be used as secondary data.
g) Some reference books also supply information over a long period.
h) There are also sources like records of government departments, trade union offices, railways and state transport offices which can be used as secondary data.
Secondary data should be carefully checked before being used in any investigation. The data should be suitable and adequate for the investigation, and should be checked for reliability and accuracy. The integrity of the investigators or enumerators should be ascertained. Secondary data should never be accepted at face value without checking.
We have seen different methods of data collection. If the data are collected for all units of the population, the enquiry is called a Census; if they are collected only for a sample, it is called a Sample Survey.