
JOURNAL OF RESEARCH IN NATIONAL DEVELOPMENT VOLUME 5 NO 2, DECEMBER, 2007
ON THE USE OF NETWORK SAMPLING IN DIABETIC SURVEYS
L. A. Nafiu
Department of Mathematics/Computer Science
Federal University of Technology, Minna
and
A. A. Adewara
Department of Statistics, University of Ilorin
Abstract
Surveys of diabetic patients must cope with the lack of a proper formal sampling frame. For survey research at the community level, partial sampling frames are often available, such as the medical centres or households to which a person is linked. These frames can be used to draw a network sample. At a selected household, the adult occupants are asked to report on the occurrence of the characteristic not only in themselves but also in their siblings. Using this network sampling design, the total number of people with diabetes can be estimated with lower variance than with conventional procedures. The design is illustrated by an analysis of the network data of Nafiu (2007) from a survey to estimate the population of diabetic patients in Niger State, Nigeria. Two estimators, the Hansen-Hurwitz estimator and the Horvitz-Thompson estimator, were considered, and the results were obtained using a program written in the Microsoft Visual C++ programming language.
Keywords: Graph, Sampling Frame, Households, Hansen-Hurwitz estimator, Horvitz-Thompson estimator.
Introduction
In network management, accurate measures of network status are needed to aid in planning, troubleshooting and monitoring. For example, it may be necessary to monitor the bandwidth consumption of several hundred links in a distributed system to pinpoint bottlenecks. If the monitoring is too aggressive, it may create artificial bottlenecks. With too passive a scheme, the network monitor may miss important events. Network query rates must strike a balance between accurate performance characterization and low bandwidth consumption to avoid changing the behaviour of the network while still providing a clear picture of the behaviour. This balance is often achieved through sampling. Sampling techniques are used to study the behaviour of a population of elements based on a representative subset.
In a survey to estimate the prevalence of a disease such as diabetes, a random sample of medical centres is selected. From the records of each medical centre in the sample, the records of patients treated for that disease are obtained. However, a given patient may have been treated at more than one medical centre, and the more medical centres at which a patient has been treated, the higher the probability that the patient's records will be obtained in the sample.
In another survey, also with the purpose of estimating the prevalence of a rare characteristic in a population, a simple random sample of households is selected. At a selected household, the adult occupants are asked to report on the occurrence of the characteristic not only in themselves but also in their siblings. Thus a person with several siblings living in different households has a higher inclusion probability than one with no siblings living in separate households. Even within a single household, the inclusion probabilities for different occupants are not necessarily equal. Designs of the above type are referred to as network sampling. In this case, a simple random sample or stratified random sample of units (selection units) is selected, and all observation units (diabetic patients) which are linked to any of the selected units are included or observed. The multiplicity of a person (referred to below as the person's network) is the number of selection units, that is, medical centres or households, to which that person is linked. Defining a network to be a set of observation units with a given linkage pattern, a network may be linked with more than one selection unit (siblings living in more than one household). If the population of selection units is stratified, a network may also intersect more than one stratum.
Because of the unequal selection or inclusion probabilities, the sample total is not an unbiased estimator of the population total under such a design. Unbiased estimators for such designs were given by Thompson (1992). In one of these estimators, termed the “Hansen-Hurwitz estimator”, each observation is divided by its network, that is, by the number of selection units to which it is linked. In this case, the network is proportional to the draw-by-draw selection probability. The Horvitz-Thompson estimator for network sampling, in which each person’s inclusion probability is determined by the networks, was also given.
References to many innovative applications of network sampling are found in Cowan (1986) and Anderson (1980). Faulkenberry and Garoui (1991) discussed network sampling estimators in the context of area sampling methods used in agricultural surveys.
Problem Definition
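We consider an undirected graph $G$ with vertex set $V$ and adjacency matrix $A = (a_{uv})$, representing a set of social actors and some relationship between them. The adjacency matrix is defined on the set $V \times V$ of ordered pairs of vertices: $a_{uv} = 1$ if there is an edge between vertices $u$ and $v$, and $a_{uv} = 0$ otherwise ($a_{uu} = 0$ for all $u$). Since the graph is undirected, $a_{uv} = a_{vu}$ for all $u, v$. Based on some binary auxiliary variable $x$, vertex set $V$ can be partitioned into two disjoint vertex subsets $V_0$ and $V_1$, that is, $V_0$ with order $N_0$ and $V_1$ with order $N_1$.
For the sake of clarity, throughout this paper vertices $i$ and $j$ refer to subset $V_0$ while vertices $k$ and $l$ refer to subset $V_1$. Based on vertex sets $V_0$ and $V_1$, population graph $G$ can be decomposed into three subgraphs:
 Subgraph $G_{00}$ with arcs between the vertices of set $V_0$.
 Subgraph $G_{11}$ with arcs between the vertices of set $V_1$.
 Subgraph $G_{01}$ with arcs between the vertices of sets $V_0$ and $V_1$.
Figure 1.1 below is an illustration of population graph $G$ with vertex set $V$ of order $N$ and size $R$, that is, $G$ consists of $N$ vertices and $R$ arcs. Based on auxiliary variable $x$, vertex set $V$ is partitioned into subset $V_0$ (the uncoloured vertices) and subset $V_1$ (the coloured vertices).
Figure 1.1: Population graph $G$ with vertex set $V$.
The number of relations between vertices of $V_0$, that is, the size of subgraph $G_{00}$, is denoted
$$R_{00} = \sum_{i < j;\; i, j \in V_0} a_{ij} \tag{1.1}$$
Between the vertices of $V_1$, that is, the size of subgraph $G_{11}$, is denoted
$$R_{11} = \sum_{k < l;\; k, l \in V_1} a_{kl} \tag{1.2}$$
and between vertices of $V_0$ and $V_1$, that is, the size of subgraph $G_{01}$, is denoted
$$R_{01} = \sum_{i \in V_0} \sum_{k \in V_1} a_{ik} \tag{1.3}$$
In Figure 1.1, $R_{00}$, $R_{11}$ and $R_{01}$ are the numbers of arcs within the uncoloured subset, within the coloured subset, and between the two subsets, respectively.
The mean number of relations for $i \in V_0$ with other $j \in V_0$ is
$$\mu_{00} = \frac{2 R_{00}}{N_0} \tag{1.4a}$$
For $i \in V_0$ with $k \in V_1$ it is
$$\mu_{01} = \frac{R_{01}}{N_0} \tag{1.4b}$$
The mean number of relations for $k \in V_1$ with other $l \in V_1$ is
$$\mu_{11} = \frac{2 R_{11}}{N_1} \tag{1.5a}$$
For $k \in V_1$ with $i \in V_0$ it is
$$\mu_{10} = \frac{R_{01}}{N_1} \tag{1.5b}$$
In Figure 1.1, these four means follow directly from the arc counts and the orders of the two subsets.
Hence, we observe the relationship
$$N_0 \, \mu_{01} = N_1 \, \mu_{10} \tag{1.6}$$
which can be used to get an indication of the total number of vertices.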
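As a minimal sketch of the quantities numbered (1.1) to (1.6), the following Python code counts the relations within and between two vertex subsets of a small graph and checks relationship (1.6), read here as N0 * mu01 = N1 * mu10, both sides being equal to the number of relations between the two subsets. The graph, the vertex labels and all names (`group`, `edges`, `R00`, `R11`, `R01`, `N0`, `N1`, `mu00`, `mu01`, `mu11`, `mu10`) are hypothetical and chosen only for illustration; they are not taken from Figure 1.1 or from the survey data.

```python
from fractions import Fraction

# Hypothetical undirected graph. group[v] is the binary auxiliary variable
# (0 = first subset, 1 = second subset); edges are unordered vertex pairs.
group = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
edges = [("a", "b"), ("b", "c"), ("a", "d"), ("c", "d"), ("c", "e"), ("d", "e")]


def subgraph_sizes(group, edges):
    """Return (R00, R11, R01): arcs within subset 0, within subset 1, between."""
    r00 = r11 = r01 = 0
    for u, v in edges:
        if group[u] == 0 and group[v] == 0:
            r00 += 1
        elif group[u] == 1 and group[v] == 1:
            r11 += 1
        else:
            r01 += 1
    return r00, r11, r01


R00, R11, R01 = subgraph_sizes(group, edges)
N0 = sum(1 for g in group.values() if g == 0)  # order of subset 0
N1 = sum(1 for g in group.values() if g == 1)  # order of subset 1

# Mean numbers of relations, using exact fractions to avoid rounding.
mu00 = Fraction(2 * R00, N0)  # within subset 0 (each arc counted at both ends)
mu01 = Fraction(R01, N0)      # from subset 0 towards subset 1
mu11 = Fraction(2 * R11, N1)  # within subset 1
mu10 = Fraction(R01, N1)      # from subset 1 towards subset 0

# Relationship (1.6): both products equal the number of between-subset arcs.
assert N0 * mu01 == N1 * mu10 == R01

# If N1 and the two cross means were known, N0 could be recovered as a ratio.
N0_from_ratio = N1 * mu10 / mu01
print(R00, R11, R01, N0_from_ratio)
```

In a real survey the two cross means would have to be estimated from sample data, so the ratio in the last step would only give an indication of the size of the unknown subset, not its exact value.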
The graph-theoretical representation described above fits many surveys of diabetic patients. In such surveys, there are urban areas with unknown populations of diabetic patients, and only a partial sampling frame is available from which probability samples can be drawn. Frequently, a non-probability sample is used to describe the study population. However, using a network sample in these situations provides more accurate information about the distributions of individual characteristics; in addition, structural information about the distribution of relations between diabetic patients can be estimated. The purpose of this paper is to estimate some simple network parameters that can be used to describe the study population. For that purpose, we use network data from Nafiu (2007), a survey to estimate the prevalence of diabetes in Niger State, Nigeria, in which the register lists of adult people play the role of a single partial sampling frame; one vertex subset consists of the diabetic patients in the household and the other of the non-diabetic members of the household.
Estimation of Population Total
Let the value of the variable of interest for the $i$th observational unit in the population be denoted $y_i$. In a survey to estimate the prevalence of a disease or other characteristic, $y_i$ is an indicator variable, equal to one if the $i$th unit has the characteristic and zero otherwise. The variable of interest need not be an indicator variable; it could, for example, be the cost of medical treatment for the disease for the $i$th person. Let $N$ denote the number of observational units in the population. The population total is $\tau = \sum_{i=1}^{N} y_i$. Let $m_i$ be the network of the $i$th observational unit, that is, the number of selection units to which that observational unit is linked. The number of selection units in the population will be denoted $M$. The population mean per selection unit is $\mu = \tau / M$.
(A). Hansen-Hurwitz Estimator
Consider a sampling design in which a simple random sample (without replacement) of $n$ of the $M$ selection units is obtained and every observational unit linked to any selected selection unit is included in the sample. The draw-by-draw selection probability for the $i$th observational unit is the probability that any one of the $m_i$ selection units to which it is linked is selected, that is,
$$p_i = \frac{m_i}{M} \tag{2.1}$$
An unbiased estimator of the population total $\tau$ may be formed by dividing each observed $y$-value by the associated selection probability. The Hansen-Hurwitz estimator thus obtained is
$$\hat{\tau}_{HH} = \frac{1}{n} \sum_{i \in s} \frac{y_i}{p_i} = \frac{M}{n} \sum_{i \in s} \frac{y_i}{m_i} \tag{2.2}$$
in which $s$ is the sequence of observational units in the sample, including repeat selections. An observational unit may be selected more than once, even though selection units are sampled without replacement, because the observational unit may be linked to more than one selection unit. The expected number of times the $i$th observational unit is selected is $n p_i = n m_i / M$.
The notation for the Hansen-Hurwitz estimator may be simplified in a way which renders its statistical properties transparent. For the $j$th selection unit in the population, define the variable $w_j$ to be the sum of the $y_i / m_i$ for all observational units linked with selection unit $j$, that is,
$$w_j = \sum_{i \in A_j} \frac{y_i}{m_i} \tag{2.3}$$
where $A_j$ is the set of observational units that are linked to selection unit $j$.
With this notation, the Hansen-Hurwitz estimator may be written
$$\hat{\tau}_{HH} = \frac{M}{n} \sum_{j=1}^{n} w_j \tag{2.4}$$
Thinking of $w_j$ as a new variable of interest associated with the $j$th selection unit, the Hansen-Hurwitz estimator is just $M \bar{w}$, where $\bar{w}$ is the sample mean of a simple random sample of size $n$. Thus, from the basic results on simple random sampling,
$$\mathrm{var}(\hat{\tau}_{HH}) = \frac{M (M - n)}{n} \, \sigma_w^2 \tag{2.5}$$
where
$$\sigma_w^2 = \frac{1}{M - 1} \sum_{j=1}^{M} (w_j - \mu)^2 \tag{2.6}$$
in which $\mu = \tau / M$ is the population mean per selection unit.
An unbiased estimator of this variance is
$$\widehat{\mathrm{var}}(\hat{\tau}_{HH}) = \frac{M (M - n)}{n} \, s_w^2 \tag{2.7}$$
where
$$s_w^2 = \frac{1}{n - 1} \sum_{j=1}^{n} (w_j - \bar{w})^2 \tag{2.8}$$
For estimating the population mean per selection unit, $\hat{\mu}_{HH} = \hat{\tau}_{HH} / M$ and $\mathrm{var}(\hat{\mu}_{HH}) = \mathrm{var}(\hat{\tau}_{HH}) / M^2$.
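The computation in (2.3), (2.4), (2.7) and (2.8) can be sketched in Python as follows. This is an illustration under stated assumptions, not the Visual C++ program used for the paper: the function name `hansen_hurwitz`, its arguments and the four-household data set are hypothetical. Here `links` maps each selection unit to the observational units linked to it, `y` holds the values of the variable of interest, `M` is the number of selection units in the population and `sample` is the simple random sample of selection units.

```python
from statistics import mean, variance


def hansen_hurwitz(links, y, M, sample):
    """Estimate the population total and the variance of that estimate.

    links  : dict, selection unit -> list of linked observational units
    y      : dict, observational unit -> value of the variable of interest
    M      : number of selection units in the population
    sample : list of at least two sampled selection units
    """
    # Multiplicity of each observational unit: the number of selection units
    # to which it is linked. It is derived here from the full link table; in
    # a real survey it would be reported by the respondents.
    m = {}
    for units in links.values():
        for i in units:
            m[i] = m.get(i, 0) + 1

    # w_j: sum of y_i / m_i over the units linked to sampled selection unit j.
    w = [sum(y[i] / m[i] for i in links[j]) for j in sample]

    n = len(sample)
    tau_hat = M * mean(w)                       # M times the sample mean of w
    var_hat = M * (M - n) * variance(w) / n     # variance() divides by n - 1
    return tau_hat, var_hat


# Hypothetical population of four households and four persons.
links = {"h1": ["p1"], "h2": ["p1", "p2"], "h3": ["p3"], "h4": ["p3", "p4"]}
y = {"p1": 1, "p2": 0, "p3": 1, "p4": 1}

tau_hat, var_hat = hansen_hurwitz(links, y, M=4, sample=["h1", "h4"])
print(tau_hat, var_hat ** 0.5)  # estimate of the total and its standard error
```

Working with the per-household sums `w` rather than with individual persons is what reduces the problem to ordinary simple random sampling, so repeat selections of the same person need no special handling.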
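(B). The Horvitz-Thompson Estimator
The probability that the $i$th observational unit is included in the sample is the probability that one or more of the $m_i$ selection units to which it is linked is selected. Since the inclusion probabilities are identical for all observational units in a network, the problem can be simplified by changing notation to be in terms of networks rather than individual observational units. The population can be partitioned into $K$ networks, which will be labeled $1, 2, \ldots, K$. Let $y_k$ now denote the total of the $y$-values over all the observational units in the $k$th network, and let $m_k$ denote the common multiplicity for any observational unit within this network.
The inclusion probability for the $k$th network, which is in fact the inclusion probability for any of the observational units within that network, is
$$\pi_k = 1 - \binom{M - m_k}{n} \Big/ \binom{M}{n} \tag{2.9}$$
that is, one minus the probability that the entire simple random sample of $n$ selection units is selected from the $M - m_k$ selection units which are not linked with network $k$.
Let $\kappa$ denote the number of distinct networks of observational units included in the sample. The Horvitz-Thompson estimator of the population total is
$$\hat{\tau}_{HT} = \sum_{k=1}^{\kappa} \frac{y_k}{\pi_k} \tag{2.10}$$
Let $m_{kl}$ denote the number of selection units linked to both networks $k$ and $l$. The probability that both networks $k$ and $l$ are included in the sample is:
$$\pi_{kl} = \pi_k + \pi_l - 1 + \binom{M - m_k - m_l + m_{kl}}{n} \Big/ \binom{M}{n} \tag{2.11}$$
The usual variance formulae for the Horvitz-Thompson estimator then apply, giving
$$\mathrm{var}(\hat{\tau}_{HT}) = \sum_{k=1}^{K} \sum_{l=1}^{K} y_k y_l \, \frac{\pi_{kl} - \pi_k \pi_l}{\pi_k \pi_l} \tag{2.12}$$
with $\pi_{kk} = \pi_k$.
An unbiased estimator of this variance is:
$$\widehat{\mathrm{var}}(\hat{\tau}_{HT}) = \sum_{k=1}^{\kappa} \sum_{l=1}^{\kappa} y_k y_l \, \frac{\pi_{kl} - \pi_k \pi_l}{\pi_k \pi_l \pi_{kl}} \tag{2.13}$$
For estimating the population mean per selection unit,
$\hat{\mu}_{HT} = \hat{\tau}_{HT} / M$ and $\mathrm{var}(\hat{\mu}_{HT}) = \mathrm{var}(\hat{\tau}_{HT}) / M^2$.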
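A minimal Python sketch of the calculations numbered (2.9) to (2.13) is given below, assuming a simple random sample of `n` out of `M` selection units. The function names, the representation of each sampled network as a pair (network total, set of linked selection units) and the small data set are hypothetical and serve only as an illustration; this is not the program used to produce the results of the paper.

```python
from math import comb


def inclusion_prob(M, n, m_k):
    """Probability that a network linked to m_k selection units is sampled."""
    return 1 - comb(M - m_k, n) / comb(M, n)


def joint_inclusion_prob(M, n, m_k, m_l, m_kl):
    """Probability that two networks sharing m_kl selection units are both sampled."""
    neither = comb(M - m_k - m_l + m_kl, n) / comb(M, n)
    return inclusion_prob(M, n, m_k) + inclusion_prob(M, n, m_l) - 1 + neither


def horvitz_thompson(networks, M, n):
    """networks: list of (y_k, set of linked selection units), one entry per
    distinct network observed in the sample."""
    pis = [inclusion_prob(M, n, len(units)) for _, units in networks]
    tau_hat = sum(y_k / pi for (y_k, _), pi in zip(networks, pis))

    var_hat = 0.0
    for a, (y_a, units_a) in enumerate(networks):
        for b, (y_b, units_b) in enumerate(networks):
            if a == b:
                pi_ab = pis[a]  # joint probability of a network with itself
            else:
                pi_ab = joint_inclusion_prob(
                    M, n, len(units_a), len(units_b), len(units_a & units_b))
            var_hat += y_a * y_b * (pi_ab - pis[a] * pis[b]) / (pis[a] * pis[b] * pi_ab)
    return tau_hat, var_hat


# Hypothetical sample of n = 2 households (h1, h4) out of M = 4. Three distinct
# networks are observed, each with its total and the households linked to it.
observed = [
    (1, {"h1", "h2"}),
    (1, {"h3", "h4"}),
    (1, {"h4"}),
]
tau_hat, var_hat = horvitz_thompson(observed, M=4, n=2)
print(tau_hat, var_hat ** 0.5)  # estimate of the total and its standard error
```

Each distinct network enters the sums once, however many of its selection units were drawn, which is why the estimate does not depend on the number of times a unit is selected.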
Illustration and Results
In this section, we analyze the network data on diabetes in Niger State, Nigeria, obtained from Nafiu (2007), an unpublished M.Sc. thesis, Department of Statistics, University of Ilorin, Ilorin, Nigeria, using the Horvitz-Thompson and Hansen-Hurwitz estimators. The results in Table 1.1 below for the standard errors of the estimates for the years 2000-2003 were obtained with the help of a computer program written in the Microsoft Visual C++ programming language (Hubbard, 2000).
Year                     2000     2001     2002     2003
M                        1192     1205     1273     1351
N                         117      119      127      137
SE (Hansen-Hurwitz)     14425    24106    29699    31963
SE (Horvitz-Thompson)   13567    23538    29642    31456
Table 1.1: Estimates for the standard errors using Horvitz-Thompson and Hansen-Hurwitz estimators.
Discussion of Results
The results presented in Table 1.1 indicate that substantial reductions in the standard error can be obtained through the use of the network design without forfeiting an unbiased estimate of the sampling standard error. We also observed that, irrespective of the year considered, the standard error of the Horvitz-Thompson estimator is always less than that of the Hansen-Hurwitz estimator; the difference ranges from about 0.2% (2002) to about 5.9% (2000) of the Hansen-Hurwitz standard error. The Horvitz-Thompson estimator is an unbiased estimator which, unlike the Hansen-Hurwitz estimator, does not depend on the number of times any unit is selected.
Conclusion and Recommendations
When an unbiased estimator of high precision and an unbiased sample estimate of its standard error are required, the network sampling design gives a better indication of the total size of the diabetic population. If we accept the assumption that each unregistered diabetic patient knows at least one other diabetic patient who is a client of a medical centre, then, by using the network sampling design, we can define a simple ratio estimator for the total population.
References
Anderson, D. R. (1980). Estimation of Density from Line Transect Sampling of Biological Populations. Journal of Wildlife Management, 72, 325-336.
Cowan, C. D. (1986). Capture-Recapture Models when both sources have Clustered Observations. Journal of the American Statistical Association, 81, 347-353.
Faulkenberry, G. D. and Garoui, A. (1991). Estimating a Population Total Using an Area Frame. Journal of the American Statistical Association, 86, 445-449.
Frank, O. (1977). Survey Sampling in Graphs. Journal of Statistical Planning and Inference, 1, 224-235.
Frank, O. (1978). Sampling and Estimation in Large Social Networks. Social Networks, 1, 91-101.
Horvitz, D. G. and Thompson, D. J. (1952). A Generalization of Sampling Without Replacement from a Finite Universe. Journal of the American Statistical Association, 47, 663-685.
Hubbard, J. R. (2000). Programming with C++. Second Edition. Schaum’s Outlines, New Delhi: Tata McGraw-Hill Publishing Company Limited.
Nafiu, L. A. (2007). Comparison of Four Estimators under Sampling without Replacement. Unpublished M.Sc. Thesis, University of Ilorin, Ilorin, Nigeria.
Thompson, S.K. (1992). Sampling. New York: John Wiley and Sons Inc.

