Chapter 3

STATISTICAL PROBLEMS

Students who are interested in population analyses will want to examine the technical procedures on which the present study has been based. Because of the scope of the project, it has been necessary to work out some original techniques in recording the material, in testing the validity of the record, and in analyzing the data statistically. These matters will be of less interest to those who are primarily concerned with the actual behavior of the human male, and such readers may prefer to pass over this and the next chapter and turn directly to the consideration of the sexual data which begins with Chapter 5.

NATURE OF THE DATA

It has already been explained (Chapter 1) that the data in the present study have all been gathered through personal interviews. In each history, 521 items have been explored; but since a subject is questioned only about those things in which he has had specific experience, the actual number of items covered in each case is usually nearer 300, and the number involved in the histories of younger and less experienced individuals is often less than that. The maximum list is shown in the following table. A few of the items (those marked with asterisks) call for information which is procurable only through physical examination or other special tests, and such items are being investigated only on certain individuals who are available for special study.

[Table: the maximum list of items covered in a history]

Each item in the above list has been strictly defined in order to standardize the data used in the study; but the body of definitions is, unfortunately, too large to include in the present volume. As previously indicated (Chapter 2), the sequence of topics actually used in an interview is varied in accordance with the age, social background, and experience of the particular subject, and the sequence shown in the above list has never been used. Neither does the sequence in the list correspond with the one shown in the coded form of Figure 2. Although many of the items in the list are covered by single questions, and although an occasional question may elicit information on two or more points, it often takes more extended inquiry to secure particular answers. Consequently, the number of questions asked may considerably exceed the number of items which are covered. It has already been noted (Chapter 2) that additional questions may be asked of subjects who have been involved in activities which are not covered in the routine interviews. Persons with experience in particular situations, as in the armed forces, in prisons, in CCC camps, and in other institutions, are questioned in particular detail concerning those periods.

The data obtained in each interview are recorded directly in the sort of code which is shown in Figure 2. No record is kept in any other form, and the coded data have never been translated into any longhand or typewritten account. Coding at the time of the interview serves several functions: (1) It facilitates recording, making it possible to secure a complete history without slowing up an interview, and without losing rapport with the subject. (2) It preserves the confidence of the record, and this is particularly important in a sex study. (3) It facilitates the transference of the data from the original record sheet to punched cards for statistical analyses. (4) It increases the accuracy of the coding because the subject is present at the time of the operation. Where there is uncertainty about the classification of the data, additional questions may be asked and final determinations may be made on the spot. When coding is delayed until after a longhand record is carried back to the laboratory, the investigator too often finds that there are insufficient data to determine what classifications are involved. (5) Coding is of supreme importance in conserving space, making it possible to put the whole of the basic history on a single sheet, or on two sheets in the case of individuals who have especial experience in premarital intercourse, in extra-marital intercourse, in the homosexual, or in prostitution. Where the original record is made in longhand which is subsequently copied onto typewritten sheets, or even where the basic record is made in a standard or special system of shorthand, each history may extend over twenty, or thirty, or in some studies over a hundred or more pages. Coding and punching cards from such data are slow and sometimes well-nigh impossible procedures. Moreover, it is always difficult for the investigator to comprehend the whole of such a history when it is spread over so many pages. 
If the record is confined to a single sheet, it is possible to correlate any item with any and every other item by a rapid sweep of one’s eye over the page of simple and precisely placed symbols. (6) It facilitates and encourages the systematic coverage of the same items on each and every history. At the end of an interview, a rapid examination of the page will show what items have been missed, and the blank places can be filled before the subject has departed.

The specific code used in the present study cannot be explained because of the necessity for maintaining the confidence of the record. However, certain of the principles involved can be described for the benefit of those who are interested in developing coding devices in other studies:

1. The record is made on a ruled form which provides a number of blocks somewhat in excess of the number of items on the basic history (Figure 2). The form used in the present study is a standard Keuffel and Esser product (General Data Sheet No. 358–230), with over-printing done on our especial order.

2. Each aspect of the sex history is recorded in a particular block or portion of a particular block. The significance of any symbol depends, consequently, upon its position in a particular block on the page.

3. Each block has its own system of symbols, its own independent code.


Figure 2. Sample history in code

4. In each block, the available symbols are sufficient in number to designate all of the categories into which the particular data will be classified during subsequent analyses. It is necessary to anticipate the whole array of possible classifications, including those that lie beyond usual experience. The code used in the present study is so flexible that it has been possible to handle every type of overt activity and every sort of attitudinal situation which we have attempted to record in the 12,000 histories now at hand.

5. Since this study has been primarily concerned with percentages of incidence, with frequency distributions, graded scales of attitudes, intensities of response, and other questions of degree, the code provides for a considerable series of classifications of each item. Rarely is it a matter of alternative possibilities—of a yes or a no. It is the usual statistical experience that six to a dozen or twenty categories, and occasionally a few more, provide the number of points best designed to establish a curve; and in most instances there should be that many possibilities in the code for each item, in each block.

6. Where the nature of the code allows, as in using numbers to designate the ages or years involved, there is no objection to recording more detail than is used in the subsequent analyses. It not infrequently happens that such detail proves useful in making finer calculations than were originally planned.

7. The symbols used in coding data in the present study have included various mathematical signs (±, -, ×, √, 0, √√) and numbers for recording ages, the years involved, frequencies, and still other items. In addition, we have used numbers, letters, symbols derived from standard practice in biology, chemistry, physics, and the other sciences, and some unique symbols developed especially for this study.

8. There is no written key to the code that has been used in the present study, and the strict maintenance of that rule has been necessary for preserving the confidence of the record.

9. In consequence, each interviewer has had to memorize the code and learn, through considerable drilling, the significance of each block, and the symbols pertaining thereto.

10. In coding items that are not discrete in nature, it is necessary to make precise definitions to which each coded symbol should apply. The coding done by different interviewers on a project must constantly be checked to provide strict standardization. This is especially important in coding attitudes, intensities of response, and other non-discrete materials. For this reason in the present study a minimum of such items has been employed, and for these items the judgments made by the several interviewers have been repeatedly analyzed and coordinated.
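Because the actual code is confidential, the following is only a hypothetical Python illustration of principles 2 through 4, together with the end-of-interview check for blank places described earlier. The block names, symbols, and meanings are all invented, and each codebook here has far fewer categories than principle 5 calls for. The point is that the same symbol carries a different meaning in each block.

```python
# Hypothetical blocks and codebooks, invented for illustration only;
# they do not reproduce any part of the study's unwritten code.
CODEBOOKS = {
    "block_a": {"0": "none", "x": "rare", "√": "occasional", "√√": "frequent"},
    "block_b": {"0": "never", "x": "before age 15", "√": "ages 15-20", "√√": "after 20"},
}

def decode(sheet):
    """Translate a {block: symbol} sheet using each block's own codebook.

    Blank entries are skipped; a symbol not defined for its block raises KeyError.
    """
    return {block: CODEBOOKS[block][symbol] for block, symbol in sheet.items() if symbol}

def missing_blocks(sheet):
    """Blocks left blank, in sorted order, so they can be filled before the subject leaves."""
    return sorted(block for block in CODEBOOKS if not sheet.get(block))

sheet = {"block_a": "x", "block_b": "x"}
print(decode(sheet))                   # same symbol, two different meanings
print(missing_blocks({"block_a": "√"}))
```

Keeping a separate codebook per block is what lets a small set of symbols cover many items: meaning is carried jointly by the symbol and its position on the sheet.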

SUPPLEMENTARY DATA

While most of the material in the present volume is based upon the data which have been routinely secured in the interviews, considerable attention has been given to securing supplementary information by other techniques. These additional data have come from a considerable list of subjects with whom long-time social contacts have been maintained, in some cases for as long as seven and eight years. Time has been spent in their homes, and visits have been made with them to the homes of their friends, to theatres and to concerts, to night-clubs and to taverns, and to their other places of recreation. During these contacts, there have been abundant opportunities to observe how these individuals react to a variety of social, professional, academic, and other situations. With many of them there has been a considerable correspondence which now supplements their original histories with extensive day by day records of their activities, and of their thinking on various aspects of sex. In a number of cases there are several hundred pages, and in each of two cases there are over a thousand pages of such supplementary material. These latter constitute more extensive sexual histories than any which have yet been published on particular individuals. For some subjects there are photographic collections of their imaginative drawings, and scrapbooks; there are photographic collections of the complete artistic output of some of the artists who have contributed to the study, and complete collections of the books written by certain other subjects. For others who have had contact with public agencies, we have transcripts of the court records, institutional records, data from social agencies, and other material. 
While these supplementary records have contributed little to the statistical tabulations of data, they have provided a considerable portion of the detail which is given in this volume on the physical nature of sexual arousal and of orgasm, and on the psychologic and social concomitants of sexual behavior, particularly in relation to the factors which motivate and control the activities.

A number of persons have turned in sexual calendars and diaries showing their day to day activities over some period of time. The calendars now at hand cover periods which range from six months to more than thirty-five years in length. They admirably supplement the information routinely obtained on the standard histories. They provide data on the weekly periodicity which a seven-day calendar and the consequent social organization impose upon many human activities, as Havelock Ellis pointed out for a group of diaries which he studied (Ellis 1897, 1936 Edit.); and they clearly demonstrate the monthly periodicity of sexual responsiveness in the female, and the lack of any such periodicity in the male. As soon as there are enough of these calendars, it will be possible to run correlations between the precise records they supply and the estimated frequencies of activities obtained in the regular interviews; and it is unfortunate that there are not enough of the calendars yet available to make the analyses in the present volume. Persons who have kept records or who are willing to begin keeping day by day calendars showing the frequencies and the sources of their sexual outlet are urged to place the accumulated data at our disposal.

Throughout this study, especial attention has been given to the communities in which the subjects of this study have lived. Contacts have been maintained with some of the communities for months or even several years. In that way it has been possible to win confidences from persons who hesitated to contribute their histories when we first arrived. In time one becomes accepted in the homes of a community and becomes acquainted with the daily lives of its individual members. One becomes acquainted with the general attitudes of the community on matters of sex and learns something about the community backgrounds which determine the early development of individual patterns of sexual behavior, and their fixation in the adult histories. One begins to understand how the church, the schools, political leaders, social agencies, and other groups affect the community’s thinking on these matters. One learns how far each community goes in controlling the sexual activities of its individual members, and how its law enforcement officials act when sexual situations are involved. The communities with which we have maintained such long-time contacts include:

College communities connected with a variety of institutions

Several upper middle-class groups

Several professional groups

A remote and isolated rural community

A concentrated and rather large homosexual community in a large city

A Negro underworld community in a large city

Several penal institutional groups

A white male underworld group in a large city

THE TWELVE-WAY BREAKDOWN

It has previously been pointed out that the analyses in the present study have depended upon successive breakdowns of the total population on the basis of twelve biologic and socio-economic factors. Each of the ultimate groups resulting from these breakdowns is, in consequence, homogeneous in respect to these twelve items. The exact nature of each item involved is shown in the following tabulation.

1. Sex. A 2-way breakdown into male and female populations.

2. Race-cultural Group. An 11-way or further breakdown into groups which are:

  (1) American and Canadian White

  (2) American and Canadian Negro

  (3) British (Great Britain)

  (4) Western and Northern European

  (5) Mediterranean European

  (6) Latin American

  (7) Slavic

  (8) Oriental (Asia)

  (9) Filipino

(10) Polynesian

(11) American Indian

There are still other groups to be considered and further breakdowns of the above which can be made whenever the material becomes available. The question is one of race-cultural backgrounds, rather than racial background in the exclusively biologic sense; and the subject’s place of birth, his place of residence during childhood and adolescent years, and the ancestral home of the parents decide the race-cultural group to which he belongs. An individual may be placed in two or more of these groups if he has lived for appreciable periods of time in two or more of the areas, particularly if he has ever had an adolescent background which is definitely different from that of the United States in which he is now living.

The present volume is confined to a record on American and Canadian whites, but we have begun accumulating material which will make it possible to include the American and Canadian Negro groups in later publications. Several hundred histories from still other race-cultural groups begin to show the fundamental differences which exist between American and other patterns of sexual behavior, but the material is not yet sufficient for publication.

3. Marital Status. A 3-way breakdown into single, married, and post-marital groups. The single persons have never been married. The married persons were living, at the time they contributed their histories, either in formally consummated legal marriages or in common-law relations that had lasted for a year or more. The post-marital cases were widowed, divorced, or permanently separated from their former spouses.

4. Age. An 18-way breakdown by five-year periods, ranging from a group which has its maximum age at 5, to a group with a maximum age of 90. The chief difficulty involved here is the existence of three systems for designating age. Each system is more or less confined to a particular social level, persons of lower levels usually calculating in terms of their forthcoming birthdays, while better educated persons are in the habit of expressing their ages in terms of past birthdays. Some persons (perhaps most commonly in the middle classes) express their ages in terms of their nearest birthdays, and this is the system used by the insurance companies and by many government agencies. The error introduced by these diverse systems is rarely compensated for in the literature of the social sciences (Pearl 1940: 74). Ages given in institutional records are likely to be in error by a year, especially where the data apply to lower level inmates. Throughout the present study an attempt has been made to determine the precise year of birth and to calculate all ages as from the past birthday; but this has not always been possible and, consequently, differences in mean ages of two groups that are not more than a year apart are never to be taken as significant, because of the uncertainty involved in the original record (also cf. U. S. Census 1940: Populat. 4 (1):2–4).

5. Age at Adolescence. A 6-way or further breakdown based on the age of the subject at the time of the onset of adolescence. The determination of the year involved in the onset of adolescence is described in Chapters 5 and 9. The breakdown is as follows: Those who start adolescence at 10 or earlier, at 11, at 12, at 13, at 14, and at 15 and later.

6. Educational Level. A 9-way breakdown on the basis of the number of years in a completed educational history, by two-year periods. This classification can be made for those who have permanently stopped their schooling before contributing a history, but it cannot be made for those who are still in school. Specifically, the groups have had the following years of schooling: 0–2, 3–4, 5–6, 7–8, 9–10, 11–12, 13–14, 15–16, 17 plus. The last group includes all those who have done any graduate work. The classification depends upon the educational level attained by the individual, rather than upon the number of years required to reach that level. On the other hand, ever since state laws have required a minimum number of years of school attendance, there have been school systems which pass pupils through the grades and even into high school without respect to their actual achievements, and it is occasionally possible to find an illiterate or even a feeble-minded child who has been in the eighth or ninth grade in school. In such cases, the educational rating of the individual should be lowered to a grade approximating the one in which he could perform satisfactory work. In cases of persons who have acquired their education through private tutoring or through their own independent reading and travel, as sometimes happens in families of upper social levels, the educational rating should approximate the level to which the individual’s achievements would have carried him in a formal school system. There are, however, few instances where it is necessary to make such arbitrary adjustments and, on the whole, the raw rating of an educational level is the best single indicator of the social stratum to which an individual belongs (Chapter 10).

7. Occupational Class of Subject. A 10-way breakdown based upon the classes developed by Chapin (1933) and W. Lloyd Warner (Warner and Lunt 1941, 1942, Warner and Srole 1945), and modified by other workers (Hollingshead 1939). This is an attempt to designate the social status of an individual by measuring the prestige of the work in which he is engaged. Persons within each occupational class not only work together, but carry on their social activities together. They are less often involved in social activities with persons of other occupational classes. The classification does not depend upon the individual’s income. Warner’s original classification has been adapted to the needs of the present study in the following manner:

(0) DEPENDENTS. If the subject is an adult who is dependent upon the State or upon a person other than a spouse for his or her support, the classification is 0. If the individual is a minor dependent upon his parents or other guardians, the classification is shown as a 0, with the classification of the parents shown in parentheses, e.g., 0 (5) for a minor from a home which belongs to class 5. The classification of a dependent wife is that of her husband.

(1) UNDERWORLD. Deriving a significant portion of the income from illicit activities: e.g., bootleggers, con men, dope peddlers, gamblers, hold-up men, pimps, prostitutes, etc.

(2) DAY LABOR. Persons employed by the hour for labor which does not require special training: e.g., construction labor, domestic help, factory labor, farm hands, junk and trash collectors, laundry help, maids, messenger boys, porters, railroad section hands, stevedores, WPA labor, etc.

(3) SEMI-SKILLED LABOR. Persons employed by the hour or on other temporary bases for tasks involving some minimum of training: e.g., semiskilled labor in factories or on construction jobs, bartenders, bell hops, blacksmiths, cooks (some), elevator operators, filling station attendants, firemen on railroads, firemen in cities, marines, miners, policemen, prize fighters, sailors, showmen, soldiers, stationary engineers, street car conductors, taxi drivers, truck drivers, ushers, etc.

(4) SKILLED LABOR. Persons involved in manual activities which require training and experience. Employed either by the hour or more often for piece work, or on salary: e.g., skilled workmen as defined by labor unions, in factories or on construction jobs, athletes (professional), bakers, barbers, bricklayers (skilled), carpenters (skilled), cooks (skilled), dressmakers (skilled), electricians, farm owners (some), foremen in factories, linemen, machinists, masons, mechanics (skilled), plumbers, printers, radio technicians, tool and die makers, welders, etc.

(5) LOWER WHITE COLLAR GROUP. Persons involved in work which is not primarily manual but which more particularly depends upon their educational background and mental capacity: e.g., army officers (some), bank clerks, bookkeepers, clergymen (in smaller churches), clerks in offices, clerks in better stores, express and postal agents, salesmen (some), secretaries, small store owners, small business operators, stenographers, farmers (some), insurance agents, musicians (some), nurses, navy officers (some), political officers (some), railroad conductors, teachers in grade schools, laboratory technicians, etc.

(6) UPPER WHITE COLLAR GROUP. Including persons of some importance in the business group, army officers (some), bank officials, certified public accountants, clergymen (most), better store owners, better actors, artists, and musicians, navy officers (some), school teachers in high schools, school principals, farm and ranch owners (of better rank), management in construction and other businesses, higher political officers, some lawyers, some dentists, most salesmen, welfare workers, etc.

(7) PROFESSIONAL GROUP. Persons holding positions that depend upon professional training which is usually beyond the college level: e.g., college professors, trained lawyers, physicians, dentists (with better training), trained engineers; some actors, artists, musicians, and writers; some clergymen, etc.

(9) EXTREMELY WEALTHY GROUP. Living primarily on income and occupying high social status because of their monied position and/or their family backgrounds.

8. Occupational Class of Parent. A 10-way breakdown on the same basis as that for the occupational class of the subject. Significant as a measure of the childhood and educational backgrounds of the subject.

9. Rural-Urban Background. A 5-way breakdown as follows:

(0) Never lived on an operating farm

(1) Incidental residence of at least a year, but not for any long period of years in rural areas

(2) Primarily rural, up to 11 years of age

(3) Primarily rural, between 12 and 18 years of age

(4) Primarily rural, after 18 years of age

The classification gives an opportunity to measure the effect of rural backgrounds during those periods in childhood and adolescence which are most important in the development of sexual patterns. A single individual may fall into more than one of these classes. Town farmers who live on farms which are operated for them by other persons, while they maintain businesses and social interests in the city, are not treated as rural.

10. Religious Groups. A 3-way or further breakdown into Protestant, Catholic, Jewish and other groups. Based upon membership, attendance, or any degree of activity or nominal connection with a religious group, in any period of the subject’s life. A particular subject may belong to more than one such group within his lifetime.

11. Religious Adherence. A 4-way breakdown showing the degree of active connection with a particular religious group, as follows:

(1) Actively concerned in a religious group, either as a regular attendant or as an active participant in organized church activities. For devout Catholics, frequency of attendance at confession, and for Orthodox Jews, frequency of attendance at the Synagogue and the extent to which the Orthodox observances are followed, provide measures of the individual’s concern with his religion.

(2) Fairly frequent church attendance or activity.

(3) Infrequent church attendance or activity.

(4) Practically no church attendance or activity, although the individual’s background may still be classifiable as Protestant, Catholic, or Jewish.

12. Geographic Origin. A breakdown which will be made as soon as the sample in the present study is sufficiently large. Residence is defined as continuous living in a given place for a period of at least twelve months. A single individual may, therefore, claim several places of residence in a lifetime. The state of residence for the most continuous period of time, and the place of residence during the childhood and adolescent years, will probably represent the most significant part of the data.

Figure 3. Principle involved in a twenty-way breakdown

Showing items used in the analyses in the present study
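Items 4 and 6 above both describe mechanical classification rules. The following is a minimal Python sketch of them, assuming the subject's birth date and the interview date are known exactly (which, as noted, was not always the case) and that the years of schooling have already been adjusted as described. The function names are hypothetical, and the handling of 29 February birthdays and of ties under the nearest-birthday convention are our own assumptions.

```python
from datetime import date

# Item 4: the three conventions for stating age.

def _birthday(born: date, year: int) -> date:
    """Birthday in `year`; a 29 February birth is placed on 28 February in common years (assumption)."""
    try:
        return born.replace(year=year)
    except ValueError:
        return born.replace(year=year, day=28)

def age_past_birthday(born: date, on: date) -> int:
    """Age as of the last birthday, the convention used throughout this study."""
    years = on.year - born.year
    if (on.month, on.day) < (born.month, born.day):
        years -= 1
    return years

def age_forthcoming_birthday(born: date, on: date) -> int:
    """Age the subject will reach at the next birthday."""
    return age_past_birthday(born, on) + 1

def age_nearest_birthday(born: date, on: date) -> int:
    """Age at whichever birthday, past or next, is closer; ties go to the past birthday (assumption)."""
    last = age_past_birthday(born, on)
    past = _birthday(born, born.year + last)
    upcoming = _birthday(born, born.year + last + 1)
    return last if (on - past) <= (upcoming - on) else last + 1

# Item 6: nine two-year classes of completed schooling.

EDUCATION_CLASSES = ["0-2", "3-4", "5-6", "7-8", "9-10", "11-12", "13-14", "15-16", "17+"]

def education_class(years: int, still_in_school: bool = False):
    """9-way class for completed (or adjusted) years of schooling; None while still in school."""
    if still_in_school:
        return None  # the classification cannot be made until schooling has stopped
    if years < 0:
        raise ValueError("years of schooling cannot be negative")
    # 0-2 -> index 0, 3-4 -> 1, ..., 15-16 -> 7, 17 and over -> 8
    return EDUCATION_CLASSES[min(max(years - 1, 0) // 2, 8)]
```

For a subject born 15 June 1920 and interviewed on 1 March 1948, the three age conventions give 27, 28, and 28: the one-year discrepancy the text warns about when mean ages of two groups are compared.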

Successive breakdowns on these twelve items give a geometrically expanding array (Figure 3) which terminates in a great series of populations, each of which is homogeneous for all of the items involved in the breakdown. With only 12,000 cases now in hand, it has not been possible to make more than a 6- or 7-way breakdown at any point in the analysis, and there are some places at which it is impossible to make anything beyond a 5-way breakdown. As the study progresses, it should be possible and will be desirable to make the 12-way breakdowns outlined above.
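The successive breakdown can be pictured as repeatedly partitioning the histories by one more factor at a time. This Python sketch assumes each history has already been reduced to a dictionary of coded factor values; the field names used in the example are hypothetical. It also shows why cases thin out: each added factor can only split existing groups further.

```python
from collections import defaultdict

def breakdown(records, factors):
    """Group records into cells homogeneous for every factor listed in `factors`."""
    cells = defaultdict(list)
    for record in records:
        key = tuple(record[factor] for factor in factors)
        cells[key].append(record)
    return dict(cells)

def successive_breakdowns(records, factors):
    """(number of cells, size of smallest cell) after each factor is added in turn.

    `records` must be non-empty.
    """
    summary = []
    for depth in range(1, len(factors) + 1):
        cells = breakdown(records, factors[:depth])
        summary.append((len(cells), min(len(cell) for cell in cells.values())))
    return summary

histories = [
    {"sex": "M", "marital": "single", "education": "13+"},
    {"sex": "M", "marital": "single", "education": "0-8"},
    {"sex": "M", "marital": "married", "education": "13+"},
    {"sex": "M", "marital": "married", "education": "13+"},
]
print(successive_breakdowns(histories, ["sex", "marital", "education"]))
```

With these four invented histories, the number of cells rises from 1 to 2 to 3 while the smallest cell shrinks from 4 cases to 1, which is the limit the text describes on how many ways the present material can be broken down.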

Psychologic measurements of mental capacity have been available on two or three thousand of the persons who have contributed to this study. Unfortunately, the tests used have been so diverse, and administered by such a diversity of testers (in schools, colleges, and mental and penal institutions) that it has proved difficult to coordinate the measurements. An investigation of the possibility of a correlation between mental level and patterns of sexual behavior should be undertaken in the further development of the present research program.

The number of groups in the theoretic 12-way breakdown outlined above is nearly two billion, and a complete survey of the whole population would, obviously, be impossible if there were no means of reducing the problem. Fortunately, many of the theoretic groups are non-existent, or so rare in the American population that they are unimportant for study. For instance, it would never be possible to secure a statistically good sample of Orthodox Jewish males who were Negro, single, between the ages of eighty-five and ninety, illiterate, living in rural areas, and belonging to the Social Register. Again, the problem may be reduced by confining the study for the time being to American and Canadian white and Negro groups, and the theoretic eleven race-cultural groups are thus reduced to two. Since the age groups between 10 and 60 are the ones most frequently met with, the age breakdown may be limited to 11 instead of the theoretic 18 groups. In some other classifications, the problem can similarly be confined. Preliminary experience indicates that some groups are so similar that they may be thrown together for analyses. Thus the theoretic 9-way breakdown on the basis of the subject’s educational history can become a 6-way breakdown into groups having 0–4, 5–8, 9–10, 11–12, 13–16, and 17 plus years of schooling; and for most purposes it can become a 3-way breakdown into groups having 0–8, 9–12, and 13 plus years of schooling. Finally, the problem is tremendously reduced because the history of each older person covers data for all the earlier 5-year age periods and, therefore, supplies cases for several of the breakdowns. This reduces the total problem to a small fraction of its theoretic magnitude, and brings it within the range of possible study.
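The arithmetic of the reduction can be checked directly. The sketch below multiplies the class counts stated in the tabulation for eleven of the twelve factors and then applies three of the reductions just described (two race-cultural groups, 11 age groups, 3 educational groups). The race-cultural and religious counts are minimums, since both allow "further" breakdowns, and geographic origin is left out because no count is given for it; the dictionary keys are our own labels.

```python
from math import prod

# Class counts stated in the text for eleven of the twelve factors.
FULL = {
    "sex": 2, "race_cultural": 11, "marital": 3, "age": 18,
    "adolescence": 6, "education": 9, "occupation": 10,
    "parent_occupation": 10, "rural_urban": 5, "religion": 3,
    "adherence": 4,
}

# Three of the reductions described above.
REDUCED = {**FULL, "race_cultural": 2, "age": 11, "education": 3}

print(f"{prod(FULL.values()):,}")     # 384,912,000
print(f"{prod(REDUCED.values()):,}")  # 14,256,000
```

These figures capture only the reductions that can be expressed as smaller class counts; eliminating non-existent groups, and using each older history for several earlier age periods, shrink the working problem further.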

SIZE OF SAMPLE

The number of factors which actually affect human sexual behavior must far exceed the twelve listed above. In spite of the degree of homogeneity which such a breakdown brings, there is a considerable amount of variation still to be found in each ultimate group. In order to understand any group it is necessary, therefore, to secure a sample of such size as will show the full range of the variation in the group, and show the frequency with which each type of variant occurs in the group. This is possible only with samples of some size.

In studies where an over-all picture of a total, undivided population is a chief objective, as in some of the Department of Agriculture surveys, and in most of the problems with which the Census Bureau has been concerned (Stephan, Deming, and Hansen 1940), the sampling has been considered sufficient when a small group of individuals represents each ultimate cell in the population. The validity of such a procedure may, however, be debatable even for over-all surveys, and it is certainly inadequate when an understanding of the variability within any sub-group is the prime concern. Persons who have recommended that we use pin-point sampling, and those who have urged that an elaboration of the techniques of factor analysis could accomplish the ends of this research with a sample of much smaller size, have failed to comprehend that the chief concern of the present study is an understanding of the sexual behavior of each segment of the population, and that it is only secondarily concerned with generalizations for the population as a whole. As subsequent chapters will indicate, there are segments of the population which engage in particular kinds of sexual activities with frequencies that average 10 or 20 times as high as the frequencies in other segments of the population. Scientifically and socially it is of the greatest importance to understand why populations differ as much as that. Pin-point sampling which is designed to secure an over-all picture of a total population provides no basis for analyzing factors which account for differences between groups, and it even obscures such differences, reducing all measures to the sort of mediocrity which a combination of high and low scores always gives.
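The point about averaging can be made concrete. In this Python sketch the two segments and their monthly frequencies are invented for illustration; they differ tenfold, within the range the text reports between segments of the population.

```python
from statistics import mean

def segment_means(segments):
    """Mean for each named segment, plus the mean of all cases pooled together."""
    pooled = [x for values in segments.values() for x in values]
    result = {name: mean(values) for name, values in segments.items()}
    result["pooled"] = mean(pooled)
    return result

# Hypothetical monthly frequencies for two segments whose means differ tenfold.
segments = {"segment_a": [1, 2, 3, 2], "segment_b": [18, 20, 22, 20]}
print(segment_means(segments))
```

Here the segment means are 2 and 20, while the pooled mean is 11, a value that no individual in either segment comes near.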

It has, then, been of prime importance in the present study to determine the size of the sample needed in each ultimate group. Such a determination has been attempted through the strictly pragmatic procedure of making calculations on series of samples of different sizes. Means and medians* for both total and active populations, the incidences of active cases, the range, the height of the mode, and the locus of the mode have been calculated for each of the 698 samples which have been used in this study. Systematic comparisons of the results obtained from the populations of various sizes have provided information on the size of sample necessary for securing relatively stable results. The detailed data are shown in Tables 155 to 162, which form an Appendix in the present volume. A summary of the material shown in those tables is presented here as Table 2.
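The statistics listed for each sample can be sketched as a single function. This is a minimal Python version under stated assumptions: a case is counted as "active" when its coded value is above zero, the range is taken over all cases, and the mode is found among the coded values, with its height expressed as the percentage of all cases at the mode. These definitions, and the function name, are ours rather than the study's.

```python
from collections import Counter
from statistics import mean, median

def sample_statistics(values):
    """Summary statistics for one non-empty sample of coded frequencies (0 = inactive).

    Ties for the mode go to the value encountered first in `values`.
    """
    active = [v for v in values if v > 0]
    locus, height = Counter(values).most_common(1)[0]
    return {
        "mean_total": mean(values),
        "median_total": median(values),
        "mean_active": mean(active) if active else None,
        "median_active": median(active) if active else None,
        "incidence_pct": 100 * len(active) / len(values),
        "range": (min(values), max(values)),
        "mode_locus": locus,
        "mode_height_pct": 100 * height / len(values),
    }
```

Reporting total-population and active-population means separately keeps a low incidence from being mistaken for a low frequency among those who do engage in the activity.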

In every instance, the samples used in the present study have represented populations that were made homogeneous for sex, race, marital status, age, educational level, and either the rural-urban background or the religious background of the individual. The samples of various sizes have all been selected by a randomization performed on IBM machines. The successive samples in each problem contained 50, 100, 200, 300, 400, and (wherever the material was available) 600, 1000, and 1500 cases. Where there was still additional material, calculations were made for the total number of cases. In some instances more than 2700 histories were available for the final calculations.

The samples of successive size were all selected directly out of the total population, i.e., the original sample of 50 cases was turned back into the total population after calculation, and the sample of 100 cases was then chosen from the total population. This process was repeated for each sample of subsequent size. In no instance was the larger sample obtained by adding cases to the smaller sample. If the latter procedure had been used, the results might have been confused because of variation in the increments which were added. The present method of selecting each sample has strictly confined the study to the problem of sample size.
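The drawing procedure just described, in which every sample size is drawn afresh from the whole population rather than built by adding cases to a smaller sample, can be sketched with Python's standard library. This is a minimal illustration, not the IBM procedure actually used; the pool of case identifiers and the function name `independent_samples` are hypothetical.

```python
import random

def independent_samples(population, sizes, seed=0):
    """Draw each sample size afresh from the whole population (never nested).

    random.sample draws without replacement *within* one sample, but the pool
    itself is never altered, so every size starts again from the full population.
    """
    rng = random.Random(seed)
    samples = {}
    for n in sizes:
        if n > len(population):
            continue  # skip sizes for which the material is not available
        samples[n] = rng.sample(population, n)
    return samples

# Hypothetical pool of 2000 case identifiers standing in for one homogeneous cell.
pool = list(range(2000))
drawn = independent_samples(pool, [50, 100, 200, 300, 400, 600, 1000, 1500, 2700])
print(sorted(drawn))  # the sizes that could actually be drawn
```

Drawing each size independently keeps any difference between, say, the 300-case and 400-case results attributable to sample size alone, rather than to the particular 100 cases that a nested design would have added.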

Table 2 shows how many of the samples of various sizes (50, 100, 200, etc.) gave statistical calculations that were close enough to calculations derived from the largest samples to have been acceptable without the further accumulation of cases. The statistics derived from the largest sample in each series have been the bases for measuring the adequacy of the results obtained from the smaller samples. The smaller samples were identified as adequate whenever the statistics calculated from them came within 5 per cent, plus or minus (i.e., within a total range of 10 per cent), of the corresponding statistics derived from the largest samples. All comparisons shown in the table have been on this 5 per cent basis, except the comparisons of incidence data, for which a range of error of only 2 per cent was allowed. These definitions of adequacy have been, of course, quite arbitrary. It is obviously possible to calculate the adequacy of each sample when a larger range of error, or when only a smaller range of error, is allowed. A whole series of such calculations should be made before this sample study is completed; but such an extended statistical survey must be pursued elsewhere, rather than in the present volume.
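The adequacy rule stated above can be written down directly. In this sketch the "2 per cent" allowed for incidence data is read as 2 percentage points, which is an assumption, since an incidence is itself a percentage. The function names and the five sample means are hypothetical and serve only to show how a Table 2 style tally would be computed.

```python
def is_adequate(small, reference, kind):
    """Judge a small-sample statistic against the largest-sample reference.

    Means, medians, and the like: within +/-5 per cent of the reference value.
    Incidence: within +/-2, read here (an assumption) as percentage points.
    """
    if kind == "incidence":
        return abs(small - reference) <= 2.0
    return abs(small - reference) <= 0.05 * abs(reference)

def adequacy_rate(statistics, reference, kind):
    """Share of comparisons, for one sample size, that meet the rule."""
    hits = sum(is_adequate(s, reference, kind) for s in statistics)
    return hits / len(statistics)

# Hypothetical means from five samples of 100 cases, against a reference mean of 2.40.
means_100 = [2.31, 2.55, 2.47, 2.10, 2.42]
print(adequacy_rate(means_100, 2.40, "mean"))  # 3 of 5 fall within 2.28 to 2.52
```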

Table 2. Size of sample versus adequacy of sample

For an explanation, see the accompanying text.

An examination of Table 2 warrants a number of conclusions concerning the size of the sample that is needed for each of the ultimate cells in the present study. These generalizations will need to be modified before they are extended to problems in other fields; but they have served as guides in the set-up of the immediate problem, and should provide some help to others who are interested in setting up similar surveys.

1. Samples of 50 cases chosen at random after a 6-way breakdown of the total population occasionally give results which are within 5 per cent of those obtained from samples of 1000, 1500, or more cases. This happens in something between 5 and 60 per cent of all the problems which we have worked; but in most categories hardly 20 per cent of the samples of 50 prove adequate. If a sample of 50 is all that is used, the calculations of various statistics for various types of sexual outlet could not be depended upon in more than 1 in 5 cases.

2. A sample of 100 proves adequate, by the above definition, in a much larger number of cases. One in 3, or even 1 in 2, of the samples of 100 give results which are within 5 per cent, plus or minus, of those obtained from the largest samples.

3. There is a corresponding increase in the quality of the samples when the number of cases is raised to 200.

4. There is a still more marked increase in adequacy when 300 cases are used. On most of the statistics calculated on populations of 300, two-thirds to three-fourths or more of all the samples give results which are nearly identical to those obtained from the largest samples.

5. Samples of 400 show still improved quality in regard to nearly all the calculated statistics except the frequency data; but the improvement is hardly enough to warrant the time and effort involved in gathering the last 100 cases—unless it is important to obtain greater precision than 300 cases afford.

6. Samples of various sizes between 400 and 1500, or even 2700 cases, fail to show any consistent improvement. By standard statistical theory, a steady albeit slow improvement in the quality of the calculations might have been expected as the size of the sample was increased. We fail to find that this is so in the present problem. On the contrary, the statistics calculated on the larger samples vary erratically from sample to sample, almost as much as they do between populations of two, three, and four hundred cases.

7. The incidence data are the most stable, and samples of 50 or 100 give results which, in many cases, are comparable to those obtained from the largest samples (Figures 4, 5). If samples of 200 or 300 are used, in half or more of these small samples the incidence data fall within 2 per cent of those obtained from the larger samples.

8. The locus of the mode is adequately determined in two-thirds of the samples of 50 cases, and in 80 or 90 per cent of the samples of 100 cases. This statement would be modified, of course, if the categories used in the frequency distributions were more or less extended than those which have been used in the present study (Figures 4, 5).

Figure 4. Relation of size of sample to statistical values

Size of each sample is shown in the figures at the base of each bar. Means and medians in each series are on the same scale, and therefore directly comparable.

Figure 5. Relation of size of sample to statistical values

Size of each sample is shown in the figures at the base of each bar. Means and medians in each series are on the same scale, and therefore directly comparable.

9. For most of the other statistics, samples of 300 are markedly better than samples of 200, except for the frequency data where samples of 400 are necessary to obtain consistent results (Figures 4, 5).

10. The range of variation which actually exists in a population is not adequately shown by any small sample. There is a steady extension of the range of variation through samples of 300 or 400, and in some cases the range is materially increased by still larger samples (Figures 4–7).

11. Frequency curves become increasingly smooth as the samples increase in size, at least up to 200 or 300 cases (Figures 6, 7). On some problems, they do not reach the ultimate degree of smoothness until 400 or 500 cases are used; but it is a waste of time and effort to secure a larger series of cases. Frequency curves never do reach the ideal in smoothness, at least not with samples as large as the 2700 cases which we have had for testing.

12. It is well known statistically that the adequacy of a sample depends upon its range of variation as well as upon its size. There are, therefore, some phenomena that may be sufficiently illustrated by samples that are inadequate for measuring other phenomena. The frequencies of masturbation, for instance, show a wider range of variation than the frequencies of nocturnal emissions, and the latter are sufficiently explored (Table 2) with a much smaller sample than would serve for describing masturbation in the same population. The size of a sample in a case history study, however, must be adequate for the examination of the most variable phenomenon which is to be studied.

13. Balancing the diverse considerations outlined above, we reach the conclusion that samples of 300 are desirable in each of the ultimate cells of the present study. Samples of 400 are enough better to warrant gathering that many histories when they are available. Samples of still larger size do not add enough information to warrant their use, and we have avoided going after such samples. The larger samples which are shown in a few places in the present volume have been obtained for the sake of an ultimate 7-, 8-, or even 12-way breakdown of the data.

14. While samples of 300 are more dependable than smaller samples, calculations based on samples of 100 or 200 have considerable significance, and calculations made in the present volume on samples of that size need not be dismissed as inadequate (Table 2, Figures 4, 5).

15. In a few cases, samples of 50 give a good indication of the results that a large sample would give. However, such small samples have been used in the present volume only when they belong to series for which most of the points are established by relatively large samples. Samples of 50 are used, for instance, to place older groups in age series for which larger samples of younger males have already established the trends.

16. Samples of less than fifty cases have not been used for any of the calculations in this volume. On occasion, incidental references have been made to such small groups.

17. All of the above conclusions apply to populations which are homogeneous for six of the factors which are used in the basic breakdown of the present problem. Preliminary calculations indicate that when seven or more breakdowns are made, the increasing homogeneity of each cell makes it possible to base analyses on something less than the three hundred cases called for above. Pragmatic tests of the size of sample necessary for these more complex breakdowns will have to be made as this research progresses.
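Point 12, that the sample needed depends on the variability of the phenomenon as well as on its size, can be illustrated with a standard normal-approximation rule for the precision of a mean: the number of cases needed grows with the square of the coefficient of variation. The coefficients of variation below are hypothetical, chosen only to show that an outlet with twice the relative spread needs four times the cases. This rule concerns the precision of a mean only; it says nothing about capturing the full range of variation (point 10), and it assumes the sampling distribution of the mean is roughly normal, which skewed frequency distributions approach only slowly.

```python
import math

def cases_needed(cv, rel_margin=0.05, z=1.0):
    """Approximate n for the error of the mean to be rel_margin of the mean.

    Uses n = (z * cv / rel_margin) ** 2, where cv is the coefficient of
    variation (standard deviation / mean). z = 1.0 gives roughly 2-to-1 odds;
    z = 1.96 gives about 95 per cent.
    """
    n = (z * cv / rel_margin) ** 2
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(n, 9))

# Hypothetical coefficients of variation, for illustration only.
for outlet, cv in [("nocturnal emissions", 0.6), ("masturbation", 1.2)]:
    print(outlet, cases_needed(cv))
```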

It is customary in statistics to measure the accuracy of a calculated mean by computing its “standard deviation” (represented by the symbol σm, and now more commonly called the standard error of the mean), or by some similar measure of significance. This defines the limits on either side of the calculated mean within which there is a 2 to 1 chance that the actual mean, the reality, may fall. Unfortunately, standard deviations of means are sometimes misinterpreted as measures of the adequacy of the samples on which they are based. In Tables 155–156 in the Appendix, standard deviations are attached to all of the means calculated for the samples of various sizes. Some idea of the effect of adding cases to originally smaller samples may be obtained from an examination of these standard deviations. More extensive comparisons of the significance of these statistical measures, in contrast to the results obtained by the pragmatic testing of sample size, will need to be made elsewhere at some later date.
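The standard deviation of the mean is the sample standard deviation divided by the square root of the number of cases, σm = s/√n, and the interval of one σm on either side of the mean corresponds, under a normal approximation, to roughly 68 per cent, or about 2 to 1 odds. A minimal sketch with invented weekly frequencies (Python's `statistics.stdev` uses the n − 1 denominator):

```python
import math
import statistics

def mean_with_sigma_m(values):
    """Return the mean and its standard deviation (standard error), s / sqrt(n)."""
    n = len(values)
    m = statistics.mean(values)
    sigma_m = statistics.stdev(values) / math.sqrt(n)
    return m, sigma_m

# Hypothetical weekly frequencies for ten cases.
freqs = [1, 2, 2, 3, 3, 3, 4, 4, 5, 3]
m, sm = mean_with_sigma_m(freqs)
# Roughly 2-to-1 odds that the population mean lies within m +/- sigma_m.
print(f"{m:.2f} +/- {sm:.2f}")  # 3.00 +/- 0.37
```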

Where the distribution of the variants in a population is fairly homogeneous (as in some physical universes), and where the range of variation is within limits which can be fairly well anticipated (again, as in some physical universes), a relatively small sample may be representative of the whole. But in the living world the distribution of the variants in any population is usually more irregular, and it is less often possible to anticipate the full range of variation. The number of factors affecting living protoplasm, and particularly the number of factors affecting the behavior of whole organisms, is infinitely greater than the number affecting most physical phenomena. There is, in consequence, much greater variation among living structures and biological phenomena. Behavior characters vary even more than physiologic characters, and these in turn vary more than morphologic characters (Pearl 1946:43 ff.). Frequency distributions of physical phenomena often follow standard curves or simple permutations thereof; but frequency distributions in the living world are rarely normal, and usually fall into irregular curves that are sometimes not even smooth curves, as our own work on insect measurements has shown (preliminarily reported in Kinsey 1942), and as the frequency curves in the present volume will also demonstrate.

In such non-homogeneous populations, it is quite possible to collect a few individuals so nearly alike that the standard deviation of the mean is small. Unfortunately, in too many biologic, psychologic, sociologic, and anthropologic studies, including some of the published sex studies, such small standard deviations are taken as indicators of the adequacy of a sample, even though it may have only a half dozen or a dozen or a score or two individuals in it. Such a use of standard deviations or probable errors as measures of validity involves a misunderstanding of their real nature and function. The student with practical experience in taxonomy or in human surveying soon learns that the addition of a few more cases to such small samples may introduce data that are outside of the range of variation covered by the original specimens, and that such additions may alter the original calculations to an extent which would never have been anticipated through an examination of the standard deviations of the means. Each investigator must know the general order of the variation that may occur in the material with which he works, see to it that the sample is well spread through the whole range of variation, and learn through some pragmatic means the general order of the sample size that will begin to represent the whole of the universe that is being sampled. At that point, and not before, standard deviations serve to indicate the range within which the calculated means may match reality. It is for that purpose, and not as measures of the adequacy of the samples, that standard deviations have been calculated and attached to the means shown in the tables throughout this volume.
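The point can be made concrete with a small simulation on a hypothetical, deliberately non-homogeneous population: a large low-frequency group and a smaller high-frequency group. A dozen nearly identical individuals yield a very small σm, yet their mean lies far outside its own σm bounds from the population mean; a well-spread random sample of 300 has a larger σm that does bracket the population mean. All numbers are invented for illustration.

```python
import math
import random
import statistics

rng = random.Random(1)

# Hypothetical population of weekly frequencies: 8000 low-frequency cases
# and 2000 high-frequency cases.
population = ([rng.gauss(2.0, 0.3) for _ in range(8000)]
              + [rng.gauss(10.0, 2.0) for _ in range(2000)])
true_mean = statistics.mean(population)

def sigma_m(xs):
    """Standard deviation of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(xs) / math.sqrt(len(xs))

narrow = population[:12]                # a dozen "nearly alike" individuals
spread = rng.sample(population, 300)    # a randomized sample across the whole range

for label, xs in [("narrow n=12", narrow), ("random n=300", spread)]:
    print(label, round(statistics.mean(xs), 2), "+/-", round(sigma_m(xs), 2))
print("population mean", round(true_mean, 2))
```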

Figure 6. Relation of size of sample to form of frequency curve

Showing frequency distributions for total outlet. Based on single males, belonging to the age group 16-20, of grade school level (0-8), and inactive Protestant.

Figure 7. Relation of size of sample to form of frequency curve

Showing frequency distributions for masturbation. Based on single males belonging to the age group 16-20, of college level (13+), and inactive Protestant.

It is important to understand that the sampling techniques used in the present study call for more or less equal samples from each of the ultimate groups, irrespective of the relative size of each of those groups in the population as a whole. This has been called “stratified sampling” (Snedecor 1946; see Whelpton and Kiser 1943–1945 for an instance of its use). On the other hand, many persons think of sampling as a technique that draws from each group in proportion to the size of that group in the total population. This is “representative sampling.” Such samples may, in actuality, serve when the objective is a single set of figures which will describe the entire population. But whenever one attempts to understand the particular groups of which a population is composed, such a course is unacceptable because data so obtained are of variable reliability, due to the differences in the sizes of the samples which represent the several segments of the population. For instance, Negroes constitute less than 10 per cent of the total population of the United States (U. S. Census 1940); but a Negro sample that was only a tenth as large as the white sample would be much less adequate than the white sample. If one is to study Negroes as a group, one should have as many Negro cases as white. Similarly, the samples for each of the other cells in the present study should be more or less equal in size. This is a principle on which the public opinion surveys depend, and the principle about which the present study has been organized.
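The arithmetic behind this argument can be sketched as follows. For a fixed total of cases, proportional allocation gives a group forming a tenth of the population only a tenth of the cases, and the standard deviation of that group's mean is correspondingly larger; equal allocation evens out the reliability. The total of 1000 cases and the common within-group standard deviation of 1.0 are assumptions for illustration, and the group labels are generic.

```python
import math

def group_standard_errors(total_n, shares, sd=1.0, equal=False):
    """Standard deviation of each group's mean under two allocation schemes.

    shares: each group's fraction of the whole population.
    equal=False -> proportional ("representative") allocation;
    equal=True  -> the same number of cases from every group.
    sd: a common within-group standard deviation (an assumption).
    """
    k = len(shares)
    result = {}
    for group, share in shares.items():
        n = total_n // k if equal else round(total_n * share)
        result[group] = sd / math.sqrt(n)
    return result

shares = {"larger": 0.9, "smaller": 0.1}  # a group forming a tenth of the population
print(group_standard_errors(1000, shares))              # proportional: 900 and 100 cases
print(group_standard_errors(1000, shares, equal=True))  # equal: 500 cases each
```

Under proportional allocation the smaller group's mean is about three times less precise than the larger group's (1/√100 against 1/√900); under equal allocation both groups have the same precision.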

DIVERSIFICATION OF SAMPLE

In a physical universe, or even in measuring dead insects, it is possible to choose the cases which enter into any sample by some carefully planned system of randomization which avoids bias on the part of the investigator, and minimizes those fortuitous circumstances which account for the irregular distribution of particular kinds of individuals within a population. By the same token, the ideal set-up in a human study would involve a preliminary survey in which every person in the total population, or a randomized percentage of all persons, would be required to provide the information which would allow him to be classified on the basis of the items involved in the analysis of the problem (e.g., the six-way or twelve-way breakdown in the present study). From the persons that fall into each ultimate cell, the necessary number of cases would then be selected by some thorough scheme of randomization, and persuaded or commanded to contribute the full and complete data necessary in the survey. A recent survey of factors affecting fertility, sponsored by the Milbank Memorial Fund, chose its sample in this way from white couples in the city of Indianapolis (Whelpton and Kiser 1943–1945).
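The ideal two-phase procedure described above (classify everyone, then draw randomly within each cell) can be sketched as follows. The screening records, the `classify` function, and the target of 300 per cell are hypothetical stand-ins; a real breakdown would use six or more classifying factors.

```python
import random
from collections import defaultdict

def two_phase_sample(screened, classify, per_cell, seed=0):
    """Phase 1: classify every screened person into an ultimate cell.
    Phase 2: draw up to per_cell persons at random from each cell.

    screened: list of person records from the preliminary survey.
    classify: function mapping a record to its cell key (hypothetical).
    """
    cells = defaultdict(list)
    for person in screened:
        cells[classify(person)].append(person)
    rng = random.Random(seed)
    chosen = {}
    for cell, members in cells.items():
        k = min(per_cell, len(members))  # a small cell yields all it has
        chosen[cell] = rng.sample(members, k)
    return chosen

# Hypothetical screening records: (age group, educational level).
rng = random.Random(42)
records = [(rng.choice(["16-20", "21-25"]), rng.choice(["0-8", "13+"]))
           for _ in range(5000)]
picked = two_phase_sample(records, classify=lambda r: r, per_cell=300)
print({cell: len(v) for cell, v in picked.items()})
```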

Unfortunately, human subjects cannot be regimented as easily as cards in a deck, and the investigator of human behavior faces sampling problems which are not sufficiently allowed for by pencil-and-paper statisticians. In a nation-wide survey, it would be impossible to make the preliminary investigation necessary for classifying the population on a twelve-way, or even a six-way breakdown. Neither is it feasible to stand on a street corner, tap every tenth individual on the shoulder, and command him to contribute a full and frankly honest sex history. Theoretically less satisfactory but more practical means of sampling human material must be accepted as the best that can be done.

The first principle to observe in securing histories is that of diversifying each collection which enters into the sample. Even after a twelve-way breakdown, the population in each ultimate cell is still affected by a multiplicity of factors which cause variation in the group, and a sample from one city cannot be taken as representative of cities in general. A study based on New York City (as nearly half of the previous sex studies have been) cannot, for instance, be taken as representative of all other cities. The population in one city block differs from the population in the next block in the same city. A group from one church is not a duplicate of a group from the next church. The factory workers in one plant do not duplicate the factory workers in the next plant. Skilled carpenters must not be taken as representative of all skilled craftsmen. The students in one girls’ college must not be depended upon for the total sample from exclusively girls’ schools. The cases that are used to represent each ultimate cell in a human population should be drawn from a number of groups, widely distributed geographically, and including as great a diversity as is possible within the limits of the group.
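Why spreading a cell's cases across many groups helps can be shown with a small simulation. The population below is hypothetical: 30 groups (blocks, churches, plants), each with its own mean. Estimating the overall mean from 300 cases taken from a single group is compared with taking 30 cases from each of 10 groups; the second design varies far less from trial to trial, because differences between groups no longer ride entirely on one choice of group.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: 30 groups, each with its own mean and
# individual variation within each group.
group_means = [rng.gauss(3.0, 1.0) for _ in range(30)]
groups = [[rng.gauss(mu, 1.5) for _ in range(500)] for mu in group_means]

def one_group_estimate():
    """300 cases, all from a single randomly chosen group."""
    g = rng.choice(groups)
    return statistics.mean(rng.sample(g, 300))

def spread_estimate(n_groups=10, per_group=30):
    """The same 300 cases, spread over several randomly chosen groups."""
    picks = rng.sample(groups, n_groups)
    return statistics.mean([x for g in picks for x in rng.sample(g, per_group)])

trials = 200
single = [one_group_estimate() for _ in range(trials)]
spread = [spread_estimate() for _ in range(trials)]
print("sd of estimates, one group :", round(statistics.stdev(single), 3))
print("sd of estimates, ten groups:", round(statistics.stdev(spread), 3))
```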

HUNDRED PERCENT SAMPLES

Since it is impossible to secure a strictly randomized sample, the best substitute is to secure one hundred percent of the persons in each social unit from which the sample is drawn: one hundred percent of the members of a family group, all the persons living in a particular apartment house, all the members of a college sorority or fraternity, all the persons in some service club, all the members of some Sunday School class or some other church organization, all the persons in a city block, all the persons in a rural township, all the inmates of some penal or other institution, or all the persons in some other unit, provided that unit has not been brought together by a common sexual interest.

Table 3. Comparisons of hundred percent and partial samples

The “partial samples” include both the hundred percent groups and the volunteers obtained outside of the hundred percent groups. Comparisons have been made on this basis in order that these “partial samples” should correspond with the samples on which calculations have been made throughout the present volume. Populations for the hundred percent samples in the three age groups are 655, 664, and 367, respectively; and for the partial samples, 2144, 2197, and 1531, respectively.

Securing a hundred percent of any group is, in actuality, more feasible than securing a good random sample of the same group; for, as already noted (Chapter 2), it is possible to develop a community interest in a group project, and this puts considerable pressure on each individual to contribute as a matter of loyalty or obligation to the group of which he is a part. It is, of course, easier to secure a hundred percent of a smaller group, unless it be a group of inmates in an institution, and it is ordinarily impossible to secure a hundred percent of any group unless the investigators can work with it for a period of weeks or months. Ordinarily it is not profitable to try to secure a complete sample until an appreciable portion (perhaps a half or more) of a group has contributed. Then the first persons who have given histories can help develop a group project by enlisting whatever organization there is to make it an official project. The time required to secure such a sample is costly, as calculated per history, and that is one reason why a larger number of hundred percent groups has not yet been secured for the present study. In some cases it has been necessary to work with the last few individuals in a group for as long as a year or two before they agree to contribute.

Of the 12,000 histories now at hand in the present study, 3104 (= 26%) have come from hundred percent groups. These groups have come from the following sources:

image

These hundred percent groups have come from a variety of sources, but only the college groups are well enough represented (by series of at least 300 cases) to allow their use in testing the validity of the partial sample in this study. The accumulation of many more histories in these complete samples is one of the important things to be followed through in the future development of this project.

Table 4. Comparisons of data obtained from partial and hundred percent samples

Based on males of the college level.

Figures 8-10. Comparisons of accumulative incidence curves based on hundred percent and partial samples.

For males of college level (13+).

Table 5. Comparisons of data obtained from partial and hundred percent samples

Based on males of the college level. Petting is pre-marital. Total intercourse includes pre-marital, marital, extra-marital, and post-marital relations with both companions and prostitutes.

By means of Table 3 it is possible to compare the frequency and incidence figures for the 15 groups on which there are sufficient cases in the hundred percent sample. It will be seen that the active incidence figures (recording the number of persons who are involved in any particular period of time) show a remarkable conformance between the partial sample and the hundred percent portion of that sample. The same is true of the accumulative incidence figures (recording the number of persons who have ever been involved) (Tables 4–6, Figures 8–13). The differences usually involve 1 per cent to 5 per cent of the population. Throughout this study it may, therefore, be accepted that both the active and accumulative incidence data and curves show the general locus of the reality, though the curves may need correction of a few per cent one way or the other. For instance, the actual accumulative incidence figure for masturbation in the college segment of the population must lie within a few points of the 96 per cent figure given by the data; whether it is in actuality 94 per cent or 98 per cent is not of much moment; but it is certain that it is not the 85 per cent or 90 per cent figure given by some studies, nor the 100 per cent figure often guessed at, nor the 7 per cent figure found in one study (Bromley and Britten 1938). Similarly, there can be no question that the actual accumulative incidence figure for the homosexual in the college-bred group lies somewhere between the 28 per cent figure derived from the hundred percent sample and the 34 per cent figure derived from the partial sample of college histories, and that it is nowhere near the 1 per cent to 2 per cent figure which has been commonly published, nor even the 10 per cent figure which has been the maximum previously suggested.

There are greater discrepancies between the frequency figures (the number of times per week each type of activity is engaged in), as calculated from the hundred percent samples and from the partial samples. The figures derived from the partial samples are consistently higher for the total sexual outlet and for all the individual outlets except nocturnal emissions. For this, there are a number of possible explanations, and it seems impossible to identify the primary factors until we can secure more material for analysis. It is quite probable that a number of factors are really involved. The following considerations should be kept in mind:

1. The volunteers who make up the partial sample may represent a more active group of individuals, of the type which is aggressive, responds to a call for cooperation in a survey, and is more responsive and less inhibited sexually. It is true that the last persons to contribute in a hundred percent sample are sometimes the more prudish, restrained, apathetic, and sexually less active individuals. If this is often true, then the frequency figures throughout this volume should be reduced by some percentage, and an increasing proportion of the future intake should be secured from hundred percent groups. However, there are other factors (given below) which are undoubtedly involved, and the discount made on the frequency data for the partial sample should not be more than some undetermined fraction of the difference between the figures for the partial sample and the figures for the hundred percent groups.

Table 6. Comparisons of data obtained from partial and hundred percent samples

Based on males of the college level.

Figures 11-13. Comparisons of accumulative incidence curves based on hundred percent and partial samples

For males of college level (13+).

2. The hundred percent samples are not entirely representative, for they are not as well distributed as the partial sample is through the whole of the population, even in the college group from which the largest hundred percent samples have come.

3. The hundred percent samples from college groups include an undue number of sexually less experienced freshmen, because the freshmen groups were large in the particular fraternities which contributed most heavily to these samples. Moreover, 28 per cent of the hundred percent sample is Jewish, while only 10 per cent of the partial sample is Jewish. The Jewish histories (Chapter 13) are less active than the histories of some other groups, and this will to some extent account for the lower figures in the present hundred percent sample.

4. The persons contributing to the hundred percent samples may have covered up more of the facts, because they did not contribute as willingly as the volunteers who made up the partial sample.

5. Persons with socially taboo items (e.g., pre-marital intercourse, extramarital intercourse, homosexual activity, animal contacts) in their histories are often among the last to contribute to a hundred percent sample, and in a number of instances complete collections may have been forestalled by such persons. On the other hand, these special histories can be secured in a partial sample by making contacts through the friends of these persons. There is no doubt that the more extreme histories will always have to be obtained in some way other than through hundred percent samples.

6. The hundred percent samples are of smaller size than the partial samples, and therefore less reliable. The partial samples show wider ranges of variation, and this raises the values of the means. With larger series, the means in the hundred percent samples might be raised.
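Point 3 attributes part of the gap to composition: 28 per cent of the hundred percent sample is Jewish, against 10 per cent of the partial sample. One standard way to gauge how much of a difference composition alone produces is direct standardization, that is, re-weighting subgroup means to a common composition. In this sketch the two shares come from the text, but the subgroup means are hypothetical, and only this one factor is adjusted.

```python
def standardized_mean(group_means, weights):
    """Weighted mean of subgroup means under a chosen composition (weights sum to 1)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(group_means[g] * w for g, w in weights.items())

# Hypothetical subgroup means (weekly frequency), for illustration only.
hundred_pct_means = {"Jewish": 2.0, "other": 3.0}

as_observed = standardized_mean(hundred_pct_means, {"Jewish": 0.28, "other": 0.72})
as_partial = standardized_mean(hundred_pct_means, {"Jewish": 0.10, "other": 0.90})
print(round(as_observed, 2), round(as_partial, 2))  # 2.72 2.9
```

With these invented means, re-weighting the hundred percent sample to the partial sample's composition raises its mean from 2.72 to 2.90 with no change in any individual history; whatever gap remains after such an adjustment would have to be explained by the other considerations listed above.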

CONTROLLING PARTIAL SAMPLES

The above comparisons indicate that there is considerable merit in samples obtained from volunteers who respond to a general appeal for histories at a lecture, or through some organizational agency, or who respond to a more individual appeal. But such volunteer samples can be quite inadequate, if they are not safeguarded at every step in a study.

1. All general appeals for histories have emphasized the importance of securing every kind of history—“histories that have everything in them and histories that are complete blanks”—“big histories and little histories and every other kind of history”—“histories that are quite usual and histories that have things in them that some people consider wrong or abnormal, but which we accept as objectively as any other kind of history.” The restrained histories have, on the whole, been the more difficult to get, and it has been constantly necessary to reassure individuals with relatively inactive histories that they were contributing to the study in as important a way as the persons with more active histories.

2. Contact persons have had to be educated to understand that “a good history” is a history that accurately reports everything, rather than a history that has some special element in it. Especially at lower levels, where the contact men have been paid, it was difficult at first for them to understand that the forty-minute history of an inexperienced teen-ager is as important as the two- or three-hour history of an older person who has been involved in every conceivable sort of sexual activity.

3. Experience indicates that the first volunteers from any group are likely to be the extrovert, aggressive, sexually less inhibited, and often more active individuals; but if a group is worked with over a longer period of time the sample becomes more diversified. For that reason, we have, in general, avoided working with groups where only a single appeal could be made, or where the time for taking histories was limited to a few days or even a week or two. Also for that reason, we have concentrated on securing samples from a more limited number of cities and towns, and from particular groups to whom we might return over periods of months and even years. Some of these groups have been contributing throughout the eight or nine years of the research. In such places, some persons contribute even after two or three years of refusing—finally convinced by the reaction of the community that their socially irregular or utterly blank histories can be reported without embarrassment, and that the project is, after all, worth while. The partial sample employed in this study would never have been as representative as it is if we had not had such long-time contacts with most of the groups.

It is unfortunate that we do not yet have large enough populations to measure the differences between first samples and subsequent samples from the same community. It is possible, however, to report measurements on one college group where Maslow’s dominance and security ratings were available on some of the females who contributed histories to the present study of sex behavior (see Maslow 1940, 1942a, 1942b; Maslow, Hirsh, Stein, and Honigmann 1945, for a detailed description of the tests). The first volunteers seemed to be more extrovert and assured individuals (though how that affects a sexual history is not yet clear). By staying nearly a month in the community, a sample was obtained from about 400 students, on 92 of whom dominance and/or security scores were available for comparison with about 80 students who were in the same psychology classes but who failed to volunteer for histories. The volunteer group showed the full range of variation in dominance and security ratings, from the most aggressive to the most timid levels. The mean dominance rating for the group that had volunteered for histories was about 10 per cent higher than for those who had not volunteered; the mean security score was about 3 per cent lower. We are indebted to Dr. A. H. Maslow for the data which allow this analysis.

4. Considerable attention must be given to securing an appreciable portion of each group from which histories are taken, even when it is not possible to secure a hundred percent sample. In many instances half to three-quarters or more of each group has been secured. We have an impression (but as yet insufficient data to test it) that such a sample is not so different from a complete sample. There is one statistical study (Shuttleworth 1941) that suggests that a sixty per cent sample is still insufficient to represent the whole.

Whenever, as in the present survey, it is not feasible to secure a strictly randomized sample, a combination of hundred percent sampling and controlled partial sampling seems the best that can be done. To attempt to base the entire study on hundred percent sampling would not be satisfactory, for it would be impossible to secure such complete samples in sufficient number from all of the diverse groups in a population. Sufficiently controlled partial samples seem to have considerable value, especially when they are offset by an even greater proportion of hundred percent samples than we have, as yet, utilized.

ORDER OF SAMPLING

The present study has been very much speeded up, while the cost has been kept phenomenally low: between 2 per cent and 4 per cent of the cost per history of the previous personal interview studies in this field. This has been primarily because of a policy of accepting whatever histories were immediately available, rather than going after particular sorts of histories in particular sequence. Once secured, the histories have been placed in the classificatory cells to which they belong. The value of such a policy was learned through our experience with insect sampling. The customary procedure of searching for particular persons to represent particular segments of the population is expensive because of the work involved in locating those particular cases. If one is satisfied to accept material in the order in which it appears, one sooner or later finds the particular cases which are necessary for the completion of a study. While we have always endeavored to secure some degree of diversity in our sample, we have not failed to seize the opportunity to take histories from the immediately available groups, until enough histories had been secured to satisfy the demands in those groups. At the present writing there are only two cells from which we have enough histories, and it is now a matter of avoiding cases that belong to those particular groups. In the course of time one has to go further out of his way to secure histories from certain other groups, and that will increase the cost; but the cost can always be kept relatively low if one bides his time and takes the material that is most available.

SYNTHESIZING A U. S. SAMPLE

While, as just indicated, data on each of the ultimate groups in the population are the first objectives of the present study, it has been desirable at certain points to calculate statistics which would be applicable to some larger group, as, for instance, all single white males in the U. S. population, or all married white males, or all white males of all sorts in the total American population. This has been accomplished by weighting the raw data from each of the ultimate groups in the study, in proportion to the size of that group in the U. S. Census, and totalling the weighted results for all the groups. The Census of 1940 shows the distribution of the total population by all the items which are involved in the six-way breakdown employed in the present volume: sex, race, marital status, age, number of years of schooling (without a clear distinction between current and completed educational histories), and the rural-urban background (on a slightly different basis than the one employed in the present study). At a few points where the Census breakdowns do not exactly match our own (e.g., in their failure to indicate what proportion of the population is pre-adolescent, and in regard to the educational record as noted above), it has been possible to make estimates which cannot have introduced an error of more than a fraction of one per cent into the calculations. Tables 7 to 11 show the constants thus derived from the 1940 Census figures. They are the bases of the calculations which appear throughout the present volume as “U. S. Corrections” of the raw data. To make any correction from these tables, each item in the raw data should be multiplied by the figure shown at the appropriate point in the table. The products of all the items in any particular age group are then totalled and divided by the “age weight” figure (the second column in each table).

An examination of the tables and charts throughout this volume will show how far apart raw data and “U. S. Corrections” may be. Since the smaller groups in stratified sampling should be represented by samples of the same size as those used for the larger groups, they unduly affect the calculations made for a total population. Therefore, in the case of phenomena which occur most frequently in groups which constitute only a small proportion of the population (e.g., masturbation, nocturnal dreams, and petting in the college population), the raw data for the total population give higher averages than the U. S. corrected data (e.g., Figures 38–42, 53–57, 59–63). Conversely, in the case of phenomena which are more common in groups that constitute a larger segment of the population (e.g., pre-marital intercourse and the homosexual in a population that has gone into high school but not beyond) the raw data are distinctly lower than the U. S. Corrections (e.g., Figures 71–75, 77–81, 83–87). The public opinion polls and most of the government surveys are aware of this problem, but it is most unfortunate that students in psychology and the social sciences regularly publish raw data without corrections for the Census distributions of their populations. As these figures and many others will show, the raw data are sometimes as much as 34 per cent removed from the corrected data, and the general shape of the curve may be considerably changed by the corrections. Throughout the present volume, the figures given in the body of the text and the heavier lines shown in all the charts represent U. S. Corrections of the raw data, except in those relatively few instances where corrections have been impossible because of insufficient information in the Census.
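The correction procedure described in the two paragraphs above can be sketched in Python. This is an illustration under stated assumptions, not the computation used for the book: the two cells, their sample sizes, raw means, and Census weights are invented, and the “age weight” is assumed to be the sum of the cell weights in the age group (the actual constants are in Tables 7 to 11). Because the stratified sample gives both cells the same number of cases, the pooled raw mean leans toward the small group, while the corrected mean follows the population proportions.

```python
# Hypothetical cells from one age group: sample size, raw mean weekly
# frequency, and an invented Census weight (not taken from Tables 7-11).
cells = [
    {"group": "college level", "n": 100, "mean": 3.0, "weight": 10},
    {"group": "high school level", "n": 100, "mean": 1.0, "weight": 90},
]


def raw_mean(cells):
    """Pool the sample as it stands: every individual counts once."""
    total_n = sum(c["n"] for c in cells)
    return sum(c["n"] * c["mean"] for c in cells) / total_n


def us_corrected_mean(cells, age_weight=None):
    """Multiply each cell's raw figure by its Census weight, total the
    products, and divide by the age weight. Assumption: when no age
    weight is supplied, it is the sum of the cell weights."""
    products = sum(c["weight"] * c["mean"] for c in cells)
    if age_weight is None:
        age_weight = sum(c["weight"] for c in cells)
    return products / age_weight


print(raw_mean(cells))           # 2.0
print(us_corrected_mean(cells))  # 1.2
```

Here the college cell makes up half the sample but, under the invented weights, only a tenth of the population, so the raw mean lies well above the corrected mean; this is the direction of error the text describes for activities concentrated in the college group.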

Table 7. Six-way breakdown, U. S. Census, 1940

Weights to be used for correcting raw data on populations resulting from a 6-way successive breakdown on MALES where RACE, RURAL-URBAN BACKGROUND, EDUCATIONAL LEVEL, MARITAL STATUS, and AGE are known. Classification based on 44,743,534 white males aged 15 and over. Estimated number of single white adolescent males through 14 years of age — 2,052,793. These are not included in totals because the data are not segregated in the U. S. Census; but estimates are shown in parentheses on the first line of figures in the table.

Table 8. Five-way breakdown, U. S. Census, 1940

Weights to be used for correcting raw data on populations resulting from a 5-way successive breakdown on MALES where RACE, EDUCATIONAL LEVEL, MARITAL STATUS, and AGE are the items involved in the analyses. For males who are under 19 years of age and still in grade or high school, estimates have been made of the educational levels which they will ultimately attain. Persons who did not report their education in the Census are eliminated from this calculation. Cf. legend on Table 7.

Table 9. Five-way breakdown, U. S. Census, 1940

Weights to be used for correcting raw data on populations resulting from a 5-way successive breakdown on MALES where RACE, RURAL-URBAN BACKGROUND, EDUCATIONAL LEVEL, and AGE are the items involved in the analyses.

Table 10. Five-way breakdown, U. S. Census, 1940

Weights to be used for correcting raw data on populations resulting from a 5-way successive breakdown on MALES where RACE, AGE, the RURAL-URBAN BACKGROUND, and MARITAL STATUS are the items involved in the analyses. Cf. legend on Table 7.

Table 11. Four-way breakdown, U. S. Census, 1940

Weights to be used for correcting data on populations resulting from a 4-way successive breakdown on MALES where RACE, AGE, and either the RURAL-URBAN BACKGROUND, or the EDUCATIONAL LEVEL, or the MARITAL STATUS are the only items involved in the analyses.

STATISTICAL ANALYSES

All mathematical calculations on this project have been performed twice, independently by each of two persons. Computations have been set up on standard ruled forms, and these are all filed for consultation by any qualified student who needs to check the method or accuracy of the calculations.

The statistical manipulation of the data in this study has been kept at an absolute minimum. The incidence data (the record of the number of persons involved in the various sexual activities) are subject to error because of deliberate or unconscious cover-up, especially in regard to socially taboo items. The frequency data (the number of times the activities are engaged in) cannot be more than approximations to the actual fact, because sexual activities are usually irregular in their distribution, with days or weeks of high frequency alternating with days and weeks of low frequency, and only persons accustomed to handling averages (as few people are) can estimate their mean frequencies in more than very approximate terms. Individuals who have kept diaries or calendars may have more accurate bases for their estimates; but few people have as yet turned in such records (see p. 74). For these reasons, the calculations in the present study are likely to involve greater errors than if it were a study of some other kind of phenomenon. In large series of cases, errors which are overestimates are sometimes compensated for by errors which are underestimates, provided there is no bias which accumulates the errors primarily in one direction; but even then there can be no great precision to the calculations.

In consideration of the approximate nature of the original data, it would then be misleading to subject them to more than relatively simple mathematical treatment. For that reason, only the following statistical operations have been performed on each history and on each series of histories.

Individual Frequencies. Average frequencies of orgasm have been calculated on each history for each type of sexual activity, namely, masturbation, nocturnal dreams, heterosexual petting, heterosexual coitus, homosexual contacts, and contacts with animals of other species. Heterosexual relations have been calculated as pre-marital coitus with prostitutes, pre-marital coitus with females who are not prostitutes, marital coitus, extra-marital coitus with prostitutes, extra-marital coitus with other females, post-marital coitus with prostitutes, or post-marital coitus with other females. For the purposes of the present volume, only sexual activities which have led to orgasm have been included in these frequency calculations, although there are many other aspects of human sexual behavior which will also be considered in this and in later volumes. Throughout this volume all frequency figures have been calculated for each individual as average frequencies per week. In some of the previously published studies, such activities have been recorded as rates per month; but except for low frequencies, few persons are capable of estimating average rates for such a period of time. The social organization imposes a weekly periodicity on various human activities, including the sexual (Ellis 1901 (1936): 85 ff.), and weekly rates are consequently better known to most persons.

In summarizing the record on individuals and on groups, frequencies have been standardized as average frequencies per week extending over five-year periods involving ages 11–15 (inclusive), 16–20, 21–25, 26–30, etc. In these periods, weeks or years which were without sexual outlet have been averaged with the active periods, and in that way seasons of inactivity have lowered the weekly rates for the whole of a particular five-year period. Since the calculations apply only to the activities which occur after the onset of adolescence, the first age period really extends from adolescence to 15, and is usually something less than a five-year period. For that first period, the averages shown are based on the post-adolescent years only, and are not reduced by being averaged with the pre-adolescent years. The last age period (the period in which the subject contributes his history) is treated in the same fashion if it is less than a full five-year period.

For each outlet, average frequencies per week, per five-year period, have been calculated precisely to the first decimal place. Group averages have consequently been calculated to the second decimal place. Because of the approximate nature of the raw data, finer calculations have not seemed warranted.
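The standardization of individual frequencies described above can be sketched in Python. This is an illustration under assumptions the text does not state: the input is a hypothetical mapping from each year of age to the number of orgasms in that year for one outlet, ages are whole years, the year of reporting counts as a full year, and a year is taken as 52 weeks. Inactive years inside a period are averaged in, while pre-adolescent years and years after reporting are left out.

```python
# Five-year periods used in the text; extend the list for older subjects.
PERIODS = [(11, 15), (16, 20), (21, 25), (26, 30), (31, 35)]
WEEKS_PER_YEAR = 52  # simplifying assumption


def weekly_rates(yearly_counts, adolescence_age, reported_age):
    """Average weekly frequency for each five-year period of one history.

    yearly_counts maps an age (int) to the number of orgasms in that year;
    missing ages are treated as years without outlet.
    """
    rates = {}
    for low, high in PERIODS:
        start = max(low, adolescence_age)  # first period begins at adolescence
        end = min(high, reported_age)      # assumption: reporting year counts in full
        if start > end:
            continue  # period lies wholly before adolescence or after reporting
        years = end - start + 1
        total = sum(yearly_counts.get(age, 0) for age in range(start, end + 1))
        # Individual averages are carried to one decimal place, as in the text.
        rates[(low, high)] = round(total / (years * WEEKS_PER_YEAR), 1)
    return rates


# Hypothetical history: adolescence at 13, reported at 22, no outlet at 18.
history = {13: 104, 14: 104, 15: 104, 16: 156, 17: 156, 19: 156, 20: 156, 21: 52, 22: 52}
print(weekly_rates(history, adolescence_age=13, reported_age=22))
```

In this invented history the inactive year at 18 pulls the 16–20 rate down to 2.4 per week instead of the 3.0 that the four active years alone would give, while the first period is averaged over ages 13 to 15 only.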

Group Frequencies. The nature of any population has been found by classifying all of the individuals in it into frequency classes which have been named for their upper limits. The ranges of each class and the class means used for calculations have been as follows:

image

Frequency Curves. The number of individuals which fall into each of these frequency classes has been translated into percents of the whole population involved. Frequency curves throughout this volume have been based on such percents, rather than on the absolute number of cases in each frequency class. Many of the curves shown in psychologic and sociologic literature are uninterpretable because they are based on the absolute number, instead of upon the percentages of cases involved. All of the frequency curves in this volume are based on the actual calculations, and in no instance have they been smoothed by any process or approximated by interpolations or other sorts of estimates or predictions.
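Classing individuals and converting the class counts to percents, as described in the two paragraphs above, might look like this in Python. The class upper limits below are placeholders chosen for illustration; the book's own class ranges and class means are not reproduced in this sketch. Each class is named for its upper limit, so a rate falls in the first class whose limit is at least as large as the rate, and the class named 0 holds only the inactive individuals.

```python
import bisect

# Placeholder upper limits (orgasms per week); hypothetical, not the book's classes.
CLASS_LIMITS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 5.0, 10.0]


def percent_distribution(rates, limits=CLASS_LIMITS):
    """Percent of individuals in each class, keyed by the class upper limit."""
    counts = [0] * len(limits)
    for rate in rates:
        i = bisect.bisect_left(limits, rate)  # first class whose upper limit >= rate
        if i == len(limits):
            raise ValueError(f"rate {rate} exceeds the highest class limit")
        counts[i] += 1
    return {limit: 100.0 * count / len(rates) for limit, count in zip(limits, counts)}


sample_rates = [0.0, 0.0, 0.3, 0.8, 1.2, 2.5, 4.0, 0.5]  # hypothetical individuals
print(percent_distribution(sample_rates))
```

Plotting these percents rather than the raw counts is what makes curves from populations of different sizes comparable.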

Group Averages. These have been calculated for each type of sexual outlet for the 5-year periods described above, for each population which has had 50 or more cases in it after 4-, 5-, or 6-way breakdowns of the total sample. All tabulations of data by groups, and all correlations, have been made by putting the data onto standard punch cards (Hollerith, IBM system), and all manipulations of cards have been performed on IBM machines. Both the punching of the cards and the handling of the machines on this project have been done by members of the research staff, in order that there be no betrayal of the confidence of the record. Each series of punch cards has carried a particular portion of each history, e.g., the frequencies and sources of outlet on one set, the record of the pre-adolescent material on another set, the accumulative incidence data on another, etc. Thirteen sets of cards (i.e., thirteen or more cards for each of the histories) have been punched for the calculation of the data presented in the present volume. Each of the thirteen cards for a given history has carried the identical record of the age, educational level, occupational class, and other social backgrounds of the subject, mechanically reproduced on all thirteen cards to insure identity. Thus it has been possible to correlate all of the data on the thirteen sets with the same educational and social items. By a gang punch technique, it is possible to correlate the material on one card with the material on each other card.
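The rule that group averages are computed only for populations with 50 or more cases after a breakdown can be sketched as follows. The record layout (the field names `race`, `education`, `age_group`, and `rate`, and the category labels) is hypothetical; it stands in for the identical background record reproduced on each of a subject's cards, which is what allows data from different card sets to be grouped by the same social items.

```python
from collections import defaultdict
from statistics import mean

MIN_CASES = 50  # smallest population for which averages are reported


def cell_averages(records, keys):
    """Group records by the breakdown keys and average the 'rate' field,
    skipping any cell with fewer than MIN_CASES individuals."""
    cells = defaultdict(list)
    for record in records:
        cell = tuple(record[key] for key in keys)
        cells[cell].append(record["rate"])
    return {cell: mean(rates) for cell, rates in cells.items() if len(rates) >= MIN_CASES}


# Hypothetical data: one cell with 60 cases, another with only 10.
records = (
    [{"race": "white", "education": "13+", "age_group": "16-20", "rate": 2.0}] * 60
    + [{"race": "white", "education": "0-8", "age_group": "16-20", "rate": 3.0}] * 10
)
print(cell_averages(records, ["race", "education", "age_group"]))
```

Only the 60-case cell is reported; the 10-case cell is held back until enough histories accumulate.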

Means. The averages which have been calculated have included mean frequencies for the population in each group, and means for the “active populations” in each group (i.e., for those individuals who had any activity in that five-year period, in that particular type of sexual outlet). Means have been calculated by the formula:

image

For those who are not familiar with statistical practice, it may be pointed out that a mean is the sum of all the measurements (in the present instance, the total number of orgasms) in each group divided by the number of individuals in the group. The mean is the balance point of the measurements. Its position (in contrast to the position of the median, which is described below) is therefore materially affected by the presence of even a few high-rating individuals in a population; and although the arithmetic mean is the average which is most commonly employed, both by most people in their everyday affairs and by the trained statistician, it may give a distorted picture because a few high-rating individuals affect the means more than a large population of low-rating individuals. Since nearly all of the distribution curves on human sex behavior are strongly skewed to the right (to the high frequency end of the curve), the means are quite regularly higher than the location of the body of the population would lead one to expect. Conversely, inactive cases in a population (i.e., in the 0 class of frequencies) have a minimum effect on the position of the mean.
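The definition just given can be illustrated for grouped data in a few lines of Python, computing the mean once for the total population and once for the active population with the zero class dropped. The sketch assumes the familiar grouped-data form (the sum of class frequency times class mean, divided by N); whether this matches the formula printed in the book in every detail cannot be confirmed here. The class means and frequencies are invented.

```python
def grouped_mean(freqs, class_means):
    """Sum of f * X over all classes divided by N, the total of the f values."""
    n = sum(freqs)
    return sum(f * x for f, x in zip(freqs, class_means)) / n


def active_mean(freqs, class_means):
    """The same mean with the zero (inactive) class left out."""
    active = [(f, x) for f, x in zip(freqs, class_means) if x > 0]
    return grouped_mean([f for f, _ in active], [x for _, x in active])


# Hypothetical distribution, skewed to the right as sexual data usually are.
class_means = [0.0, 0.3, 0.8, 1.3, 2.5, 7.0]
freqs = [20, 30, 25, 15, 8, 2]
print(round(grouped_mean(freqs, class_means), 3))
print(round(active_mean(freqs, class_means), 3))
```

The two individuals in the top class contribute 14 of the 82.5 total, so they shift the mean far more than their numbers suggest, while the 20 inactive individuals affect it only by enlarging N.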

Standard Deviation of the Mean. This is also known as the standard error of the mean, and as the sigma of the mean. It is represented by the symbol σm. The standard deviation of each mean has been calculated in every instance, using the formula:

image

This formula is generally considered precise, and has the advantage of being calculable with maximum efficiency on a calculating machine. For the general reader, it may be pointed out that the standard deviation is attached to each mean shown in this volume, as follows:

image

The σm is supposed to indicate the size of the error which may be involved in the mean—the limits, plus or minus, within which the true mean (as distinguished from the calculated mean) stands a 2 to 1 chance of lying. The size of σm in relation to the size of the mean indicates the degree of reliability of the calculated mean, and the smaller the σm, the less the probable error.
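The standard error can be computed from grouped data with the running sums that suit a calculating machine: N, ΣfX, and ΣfX². The Python sketch below uses one common textbook form, σm = s/√N with s² = (ΣfX² − (ΣfX)²/N)/(N − 1); the formula printed in the book may differ in detail (for example, N in place of N − 1), so treat this as an illustration rather than a reconstruction. It also formats the result in the M ± σm style described above.

```python
import math


def mean_and_sigma_m(freqs, class_means):
    """Return (M, sigma_m) for grouped data from the running sums
    N = sum(f), S1 = sum(f*X), S2 = sum(f*X*X)."""
    n = sum(freqs)
    s1 = sum(f * x for f, x in zip(freqs, class_means))
    s2 = sum(f * x * x for f, x in zip(freqs, class_means))
    mean = s1 / n
    variance = (s2 - s1 * s1 / n) / (n - 1)  # sample variance, N - 1 divisor
    return mean, math.sqrt(variance / n)


def report(freqs, class_means):
    """Format as 'M ± sigma_m' to two decimal places."""
    m, sm = mean_and_sigma_m(freqs, class_means)
    return f"{m:.2f} ± {sm:.2f}"


print(report([2, 2], [1.0, 3.0]))  # 2.00 ± 0.58
```

Under a normal approximation the interval M ± σm covers the true mean about 68 per cent of the time, which is the “2 to 1 chance” mentioned in the text.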

Medians. Median frequencies have been calculated, in every group, for the total sample population, and for the active population. Medians have been calculated by the formula:

image

If all the individuals in a group are arranged linearly in accordance with the average frequencies of orgasm, the median designates the frequency of the individual who stands exactly midway in that series. Half of the individuals in the population have less frequent orgasm, half the individuals have more frequent orgasm. While the median is an average which is less often calculated by people in their everyday affairs, and while it is a statistic which has often been neglected by statisticians, it answers the very common question: “How frequently does the average individual engage in such activity?” and it provides, therefore, a most useful type of information. Recently statisticians have paid more attention to its significance. The location of a median is determined solely by the sequence of the individuals in a population, and it is unaffected by the low or high rates of particular individuals. Means and medians are averages which summarize two very different ideas, and in consequence their relative importance cannot properly be discussed. Means measure average frequencies, medians describe the average individuals.

Persons not familiar with these matters should understand that where most of the individuals in a sample belong in a frequency class which is midway between the extremes of the distribution, and where an equal number of individuals lie in symmetrical distribution on either side of the mid-point, the mean becomes identical with the median. Where the curve is asymmetric, the median becomes removed from the mean, sometimes by a very considerable distance. The median is lower than the mean when there are high-rating individuals who stand apart from the mass of the population; and this is almost always true as regards nearly all types of human sexual activity. The distance between the median and the mean is a measure of the extent to which the frequency distribution for the population (the frequency curve) is skewed in the direction of higher activity (extends to the right of the area which includes the body of the population). When a large portion of a population falls into the zero class (is without activity) in any particular calculation, the median for that population is so lowered that it loses significance. If more than 50 per cent of the population falls into the zero class, the median is in the zero class and is useless for any understanding of the situation. In the same instance, however, a median calculated on the active portion of the population may have significance.
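The median for grouped data is usually found by linear interpolation inside the class that contains the N/2-th individual. The Python sketch below assumes that standard interpolation, Mdn = L + ((N/2 − F)/f) × i, with invented class boundaries; it is not necessarily the exact formula printed in the book. The zero class is treated as a point class of width 0. The example reproduces two points made above: the active-population median sits above the total-population median, and when more than half the population is inactive the median falls in the zero class.

```python
def grouped_median(classes):
    """Median by interpolation; classes are (lower, upper, f) in ascending order.

    L = lower boundary of the median class, F = cases below it,
    f = cases in it, i = upper - lower (0 for the zero class).
    """
    n = sum(f for _, _, f in classes)
    half = n / 2
    below = 0
    for lower, upper, f in classes:
        if f and below + f >= half:
            return lower + (half - below) / f * (upper - lower)
        below += f
    raise ValueError("empty distribution")


def active_median(classes):
    """Median of the active population: drop the zero class first."""
    return grouped_median([c for c in classes if c[1] > 0])


# Hypothetical distribution; the class boundaries are invented.
classes = [(0.0, 0.0, 20), (0.0, 0.5, 30), (0.5, 1.0, 25),
           (1.0, 1.5, 15), (1.5, 3.5, 8), (3.5, 10.5, 2)]
print(grouped_median(classes), active_median(classes))
print(grouped_median([(0.0, 0.0, 60), (0.0, 1.0, 40)]))  # over half inactive
```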

Percents of Individual Outlet. On each history, calculations have been made showing (in percents) the portion of his total sexual outlet which the individual has derived from each possible source (masturbation, dreams, coitus, etc.). The calculations have been made for the same five-year periods as were involved in the calculations of frequencies of total outlet.

Percents of Group Outlet. Similarly, frequency distributions have been plotted for these percents of outlets; and means, standard errors of the means, and medians have been routinely calculated on these percents for the total population and for the active portion of each population. When means are calculated in the usual way, the figures are the averages of all these percentages. When medians are calculated in this way, they show the percentage of the total outlet which the average individual derives from each of the possible sources. Neither of these calculations, however, answers the more usual question: “What percentage of the total orgasms of the population as a whole is derived from each kind of sexual activity?” In order to answer that question, it is necessary to compare the means of the absolute frequencies (not the percentage frequencies) for each type of outlet in each group, with the mean of the absolute frequency of total outlet in the same group. The sum of the percentages so derived should total 100 per cent, which is the total outlet for the population.
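The difference between averaging individual percentages and computing each source's share of the population's total outlet can be shown with a two-person example in Python. The outlet names and weekly frequencies are invented for illustration.

```python
SOURCES = ["masturbation", "coitus"]

# Hypothetical weekly frequencies for two individuals.
people = [
    {"masturbation": 1.0, "coitus": 0.0},
    {"masturbation": 0.0, "coitus": 9.0},
]


def total(person):
    return sum(person[s] for s in SOURCES)


def mean_individual_percent(people, source):
    """Average of each individual's percentage (people with no outlet skipped)."""
    shares = [100.0 * p[source] / total(p) for p in people if total(p) > 0]
    return sum(shares) / len(shares)


def population_percent(people, source):
    """Mean absolute frequency for the source over the mean total outlet."""
    n = len(people)
    mean_source = sum(p[source] for p in people) / n
    mean_total = sum(total(p) for p in people) / n
    return 100.0 * mean_source / mean_total


for s in SOURCES:
    print(s, mean_individual_percent(people, s), population_percent(people, s))
```

Averaged person by person, each source accounts for 50 per cent; measured against the orgasms of the population as a whole, masturbation accounts for 10 per cent and coitus for 90, and only the latter pair sums to the population's total outlet.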

Correlation Coefficients. At special points in the investigation, correlation coefficients and still other statistics have been calculated by standard procedures. Unless otherwise indicated the correlation coefficients represent the Pearsonian r, calculated by the formula:

image

In correlating data for which only two classes are possible, as with a yes or no situation, or with a record of presence or absence, the calculated coefficients represent the tetrachoric r derived from the tables published by Cheshire, Saffir, and Thurstone (1933).
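For readers who want to check a correlation by hand, the Pearsonian r can be computed from the same kind of running sums used with calculating machines. The Python sketch below uses the standard raw-score form r = (NΣxy − ΣxΣy) / √[(NΣx² − (Σx)²)(NΣy² − (Σy)²)]; it is not necessarily the exact arrangement printed in the book, and it does not cover the tetrachoric r, which was read from the Cheshire, Saffir, and Thurstone tables. The paired data are invented.

```python
import math


def pearson_r(xs, ys):
    """Raw-score (computational) form of the Pearson product-moment r."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series of at least two values")
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx * sx) * math.sqrt(n * syy - sy * sy)
    return numerator / denominator


# Hypothetical paired observations (invented for illustration).
xs = [11, 12, 13, 14, 15]
ys = [3.1, 2.9, 2.4, 2.2, 1.8]
print(round(pearson_r(xs, ys), 3))
```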

Accumulative Incidence Curves. The one new statistical tool which we have had to develop for this study has been a curve which will show the number of persons who have ever had sexual experience of a particular sort up to any particular age of their lives. One of the questions most commonly asked is: “How many people do this or that?” Specifically, it is “How many people masturbate?”, “How many people have homosexual experience?”, “What percentage of college students (or some other group) have intercourse before they marry?”, etc. The question does not concern the number of persons having experience in any particular year (which is the active incidence figure), as often as it involves a question about the number of persons who ever have such experience in their lives, or in some portion of it. The answers usually given in both popular and technical literature are often incorrect because they are derived from curves based on cumulations of percentages. Such curves show the percentage increase of experienced persons (the increments) in each successive age group, the increments being totalled up to the end of the period of time under consideration. The cumulated percentages shown in Tables 28, 33, 35–37, and in Figures 15, 26, 27, 29, covering data on the ages involved in adolescent developments among boys, are examples of such calculations. Such curves are known as integral curves or ogives (the two “are fundamentally the same,” according to Pearl 1940:143), and these are the curves that are ordinarily used in growth studies, learning studies, studies of social developments, etc.

But ogives are satisfactory only when the activity under consideration has involved a hundred percent of the population which is being studied, or when the histories of all the individuals in the study are concluded as far as that particular chapter in their lives is concerned. Cumulative percentage figures are quite sufficient in the cases cited above because all of the individuals on which they are based were adolescent when the data were gathered, i.e., a hundred percent of the population was ultimately involved, and all of the individuals had the experience (onset of adolescence) which was being studied. An ogive would be correctly used if the ages of first pre-marital intercourse were being studied, and the curve were based on persons all of whom were married. In an ogive, the size of the basic population is constant for each and every age group, since all of the persons are either experienced or past the age at which they could possibly begin experience, and the total sample in an ogive is the basis for calculating the percentage of experienced individuals at each particular age.

Ogives, however, do not answer the question when only a portion of a population is eligible for experience, or when the histories of any of the individuals are not complete at the time the data are gathered. For instance, if the question is one of determining how many people have extra-marital intercourse, the real issue concerns the number of married people who ever will have such experience before they die. This could be determined by the use of an ogive if all persons in the study had been married, and if all the histories were taken after each person had terminated his marriage by separation or divorce, or after he had died. But since that is not easily effected, a technique must be used which will show the number of experienced persons in each age group, in relation to the number of persons in each group who are eligible for such experience. This is the technique of the accumulative incidence curves which we have used in the present study.

Table 12. Form for calculation of an accumulative incidence curve

Starred columns (*) are derived from punch cards; other columns are calculations based on the starred columns. The curve derived from this table is shown in Figure 14.

Explanation of an Accumulative Incidence Curve

Coitus with Prostitutes

 Age.

  1. Age of first experience.

  2. Summation of Column 1. This is the ogive. It is based on the fictitious conception that the number of experienced persons in this population cannot be increased beyond the number now shown.

  3. Ages at reporting, of experienced individuals.

  4. Summation of Column 3, one step in advance. This represents the ages which the experienced individuals had not yet reached at the time they contributed their histories.

  5. Subtraction of Column 4 from Column 2. This represents the number of experienced individuals who had actually lived to each age by the time they contributed their histories.

  6. Ages at reporting, of inexperienced individuals.

  7. Addition of Columns 3 + 6. This is the age distribution of all subjects (both experienced and inexperienced) at time of reporting.

  8. Summation of Column 7, in reverse. This is the basic population for incidence calculations at each age.

  9. Division of Column 5 by Column 8. This is the percent of the population at each age with experience in that year, or in any previous year.

10. Increment, calculated from Column 9.

NOTE: Pre-adolescent experience was eliminated from this calculation by punching cards only for experience that had occurred after the onset of adolescence.
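The column arithmetic of Table 12 translates directly into code. The Python sketch below rebuilds Columns 1 to 10 from two items per subject, the age of first experience (None for an inexperienced subject) and the age at reporting; the five histories are invented and are not the data behind Figure 14. Following the summation rules for Columns 4 and 8, a subject still counts in the population at the age at which he reported.

```python
from collections import Counter


def accumulative_incidence(subjects, ages):
    """Rebuild Columns 1-10 of a Table 12-style worksheet.

    subjects: (age_first, age_reported) pairs; age_first is None for a
    subject without the experience. ages: consecutive ages to tabulate.
    Column 9 is returned as a percent.
    """
    col1 = Counter(first for first, _ in subjects if first is not None)
    col3 = Counter(rep for first, rep in subjects if first is not None)
    col6 = Counter(rep for first, rep in subjects if first is None)
    col7 = col3 + col6  # all subjects by age at reporting
    rows, previous = [], 0.0
    for age in ages:
        c2 = sum(n for a, n in col1.items() if a <= age)  # ogive: first experience by this age
        c4 = sum(n for a, n in col3.items() if a < age)   # experienced, reported before this age
        c5 = c2 - c4                                      # experienced and still observed
        c8 = sum(n for a, n in col7.items() if a >= age)  # basic population at this age
        c9 = 100.0 * c5 / c8 if c8 else float("nan")
        rows.append({"age": age, "col1": col1[age], "col2": c2, "col3": col3[age],
                     "col4": c4, "col5": c5, "col6": col6[age], "col7": col7[age],
                     "col8": c8, "col9": c9, "col10": c9 - previous})
        previous = c9
    return rows


# Invented histories: (age of first experience, age at reporting).
subjects = [(16, 20), (18, 19), (None, 17), (None, 21), (20, 22)]
for row in accumulative_incidence(subjects, range(15, 23)):
    print(row["age"], row["col5"], row["col8"], round(row["col9"], 1))
```

The last three points rest on populations of three, two, and one subjects, and they jump about accordingly, which is the upper-end unreliability the text warns of.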

The problem was restricted at the lower ages by adolescence. If the problem had been restricted at the upper ages to a particular portion of the life span, e.g., to pre-marital years, Column 1 would have been corrected by sorting the experienced married individuals by age of marriage, and eliminating those whose first experience with prostitutes occurred after marriage. Column 3 would then have represented the sum of two groups of data: (1) the ages at marriage of the experienced population, and (2) the ages at reporting of the unmarried individuals who are experienced with prostitutes. Column 6 would then have represented the same sort of sum for the population which is not experienced with prostitutes.

Similarly, an accumulative incidence curve should be used when the ages of first pre-marital coital experience are to be determined for a population which contains some individuals who are not yet married. In such a problem, each point on the curve is based on a population which is independently calculated for each age. Each point is fixed by determining the number of persons in the sample who were not yet married, and by subtracting the persons who are no longer available for such experience because of marriage, or because the calculation has passed the ages at which those persons had contributed histories.

In order to build an accumulative incidence curve, two or more sets of data are needed on each individual involved in the study:

1. Age of first experience, for each subject.

2. Age of each subject at time of reporting.

3. In some cases, the age at which each subject became eligible for the sort of experience which is being studied (e.g., the age of adolescence, for the study of post-adolescent experience; the age of marriage, for the study of experience as a married person; etc.).

4. In some cases, the age at which each subject became ineligible for experience (e.g., the age at adolescence, as the end of the period at which the subject could have pre-adolescent experience; the age of marriage as an upper limit to pre-marital experience).
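When eligibility is bounded, as with the pre-marital restriction described above, the same percentages can be computed directly in the manner of the hand method with piles of history sheets: at each age, count the histories that are on the pile (eligible and not yet past reporting) and how many of them have had experience by that age. The Python sketch below assumes whole-year ages and treats a subject as present both at the age he became eligible and at the age he became ineligible or reported. Here `first` is the age of first experience of the kind studied occurring after the subject became eligible; an experience that comes only after the window closes (such as a first contact after marriage, in a pre-marital study) is never counted, because the history has already left the pile. The `History` record and its field names are hypothetical.

```python
from typing import NamedTuple, Optional


class History(NamedTuple):
    """Hypothetical per-subject record; ages are whole years."""
    reported: int                          # age at time of reporting
    first: Optional[int] = None            # first experience after eligibility, None if none
    eligible_from: int = 0                 # e.g. age at adolescence
    eligible_until: Optional[int] = None   # e.g. age at marriage; None if still eligible


def incidence_by_age(histories, ages):
    """Percent of the eligible population at each age with experience by that age."""
    curve = {}
    for age in ages:
        on_pile = experienced = 0
        for h in histories:
            last = h.reported if h.eligible_until is None else min(h.reported, h.eligible_until)
            if h.eligible_from <= age <= last:  # history is on the pile at this age
                on_pile += 1
                if h.first is not None and h.first <= age:
                    experienced += 1
        curve[age] = 100.0 * experienced / on_pile if on_pile else None
    return curve


# Invented pre-marital example: the first man married at 22 after experience
# at 19; the second is single and inexperienced; the third married at 21 and
# had his first experience only after marriage.
histories = [
    History(reported=25, first=19, eligible_from=13, eligible_until=22),
    History(reported=20, eligible_from=13),
    History(reported=24, first=23, eligible_from=14, eligible_until=21),
]
print(incidence_by_age(histories, range(18, 23)))
```

With no upper eligibility limit and eligibility starting at adolescence, this direct count gives the same percentages as the column method of Table 12.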

Figure 14. An accumulative incidence curve

Showing percentages of college males who have ever had intercourse with prostitutes by each of the indicated ages. Based on data in Table 12.

The derivation of an accumulative incidence curve was first worked out for a small sample by a hand manipulation of 1058 actual history sheets, adding them to piles as each individual became eligible, withdrawing them as each individual became ineligible for experience. It took some time to devise a procedure for Hollerith machine manipulation of punch cards on the problem, but a remarkably simple set-up has now been arrived at. It is shown in Table 12. Since this seems to be a statistical procedure which has not been published before, it has seemed desirable to describe it at some length.

The usefulness of an accumulative incidence curve cannot be overemphasized. It supplies the answer to the commonest of questions: “How many people have such experience?” From such a curve, one may at a glance determine the percentage of the population which has ever had experience by any given age. At the same time, the curve gives the best possible basis for predicting what percentage of any group will ever, in its lifetime, have such experience. This use of the curve for making predictions is one of its most significant values. As already indicated, it is of prime concern in any research that the conclusions be extensible to wider areas than those covered by the particular sample which has been investigated, and accumulative incidence curves are the most effective tools for so translating data. An accumulative incidence curve can be built on data from subjects whose histories are not yet complete, and thus it utilizes a large body of data which is not available for building an ogive (which depends upon completed histories). An accumulative incidence curve is less accurate nearer its end, because the populations which establish the successive points on the curve become smaller in these upper age levels. However, the area in which the curve becomes unreliable is well enough indicated by the wider scatter of the individual points, which is in sharp contrast to the smooth trends in the more reliable portions of the curve.

In conclusion, it should be emphasized that, after all of this statistical manipulation, the calculations given in the present volume still should be taken as approximations which are not to be pushed in detail, although they undoubtedly show the general locus of the incidence and frequency figures, with plus or minus errors of some few percent. In the next chapter data will be given to show the size of the corrections that may need to be made.


* For definitions and explanations of the statistical terms used here, see later sections of the present chapter.