MASS

This is the main package supporting Venables and Ripley’s book Modern Applied Statistics with S (MASS).

| Function | Description |
| --- | --- |
| Null | Given a matrix M, finds a matrix N giving a basis for the (left) null space of M. That is, t(N) %*% M is the zero matrix, and N has the maximum number of linearly independent columns. |
| Shepard | One form of nonmetric multidimensional scaling. |
| addterm | Tries fitting all models that differ from the current model by adding a single term from those supplied, maintaining marginality. |
| area | Integrates a function of one variable over a finite range using a recursive adaptive method. This function is mainly for demonstration purposes. |
| as.fractions | Finds rational approximations to the components of a real numeric object using a standard continued fraction method. |
| bandwidth.nrd | A well-supported rule of thumb for choosing the bandwidth of a Gaussian kernel density estimator. |
| bcv | Uses biased cross-validation to select the bandwidth of a Gaussian kernel density estimator. |
| boxcox | Computes and optionally plots profile log-likelihoods for the parameter of the Box-Cox power transformation. |
| con2tr | Converts lists to data frames for use by lattice. |
| contr.sdif | A coding for unordered factors based on successive differences. |
| corresp | Finds the principal canonical correlation and corresponding row and column scores from a correspondence analysis of a two-way contingency table. |
| cov.mcd, cov.mve, cov.rob | Compute a multivariate location and scale estimate with a high breakdown point. This can be thought of as estimating the mean and covariance of the good part of the data. cov.mve and cov.mcd are compatibility wrappers. |
| cov.trob | Estimates a covariance or correlation matrix assuming the data came from a multivariate t-distribution: this provides some degree of robustness to outliers without giving a high breakdown point. |
| denumerate | loglm allows dimension numbers to be used in place of names in the formula. denumerate modifies such a formula into one that terms can process. |
| dose.p | Calibrates binomial assays, generalizing the calculation of LD50. |
| dropterm | Tries fitting all models that differ from the current model by dropping a single term, maintaining marginality. |
| eqscplot | Version of a scatter plot with scales chosen to be equal on both axes, i.e., 1 cm represents the same units on each. |
| fitdistr | Maximum likelihood fitting of univariate distributions, allowing parameters to be held fixed, if desired. |
| fractions | Finds rational approximations to the components of a real numeric object using a standard continued fraction method. |
| gamma.dispersion | A front end to gamma.shape for convenience. Finds the reciprocal of the estimate of the shape parameter only. |
| gamma.shape | Finds the maximum likelihood estimate of the shape parameter of the gamma distribution after fitting a Gamma generalized linear model. |
| ginv | Calculates the Moore-Penrose generalized inverse of a matrix X. |
| glm.convert | Modifies an output object from glm.nb() to one that looks like the output from glm() with a negative binomial family. This allows it to be updated keeping the theta parameter fixed. |
| glm.nb | A modification of the system function glm() to include estimation of the additional parameter, theta, for a negative binomial generalized linear model. |
| glmmPQL | Fits a generalized linear mixed model (GLMM) with multivariate normal random effects, using penalized quasi-likelihood. |
| hist.FD | Plots a histogram with automatic bin width selection, using the Scott or Freedman-Diaconis formula. |
| hist.scott | Plots a histogram with automatic bin width selection, using the Scott or Freedman-Diaconis formula. |
| huber | Finds the Huber M-estimator of location with the median absolute deviation (MAD) scale. |
| hubers | Finds the Huber M-estimator for location with scale specified, scale with location specified, or both if neither is specified. |
| is.fractions | Finds rational approximations to the components of a real numeric object using a standard continued fraction method. |
| isoMDS | One form of nonmetric multidimensional scaling. |
| kde2d | Two-dimensional kernel density estimation with an axis-aligned bivariate normal kernel, evaluated on a square grid. |
| lda | Linear discriminant analysis. |
| ldahist | Plots histograms or density plots of data on a single Fisher linear discriminant. |
| lm.gls | Fits linear models by generalized least squares. |
| lm.ridge | Fits a linear model by ridge regression. |
| lmsreg | Fits a regression to the good points in the data set, thereby achieving a regression estimator with a high breakdown point. (lmsreg is a compatibility wrapper for lqs.) |
| lmwork | The standardized residuals. These are normalized to unit variance, fitted including the current data point. |
| loglm | Provides a front end to the standard function, loglin, to allow log-linear models to be specified and fitted in a manner similar to that of other fitting functions, such as glm. |
| logtrans | Finds and optionally plots the marginal (profile) likelihood for alpha for a transformation model of the form log(y + alpha) ~ x1 + x2 + .... |
| lqs, lqs.formula | Fit a regression to the good points in the data set, thereby achieving a regression estimator with a high breakdown point. lmsreg and ltsreg are compatibility wrappers. |
| ltsreg | A compatibility wrapper for lqs. |
| mca | Computes a multiple-correspondence analysis of a set of factors. |
| mvrnorm | Produces one or more samples from the specified multivariate normal distribution. |
| negative.binomial | Specifies the information required to fit a negative binomial generalized linear model, with known theta parameter, using glm(). |
| parcoord | Parallel coordinates plot. |
| polr | Fits a logistic or probit regression model to an ordered factor response. The default logistic case is proportional odds logistic regression, after which the function is named. |
| psi.bisquare, psi.hampel, psi.huber | Psi functions for rlm. |
| qda | Fits quadratic discriminant analysis models. |
| rational | Finds rational approximations to the components of a real numeric object using a standard continued fraction method. |
| renumerate | denumerate converts a formula written using the conventions of loglm into one that terms is able to process. renumerate converts it back again to a form like the original. |
| rlm | Fits a linear model by robust regression using an M-estimator. |
| rms.curv | Calculates the root mean square parameter effects and intrinsic relative curvatures, cθ and cι, for a fitted nonlinear regression. |
| rnegbin | Generates random outcomes from a negative binomial distribution, with mean mu and variance mu + mu^2/theta. |
| sammon | One form of nonmetric multidimensional scaling. |
| select | Fits a linear model by ridge regression. |
| stdres | The standardized residuals. These are normalized to unit variance, fitted including the current data point. |
| stepAIC | Performs stepwise model selection by AIC. |
| studres | Extracts the Studentized residuals from a linear model. |
| theta.md, theta.ml, theta.mm | Given the estimated mean vector, estimate theta of the negative binomial distribution. |
| truehist | Creates a histogram on the current graphics device. |
| ucv | Uses unbiased cross-validation to select the bandwidth of a Gaussian kernel density estimator. |
| width.SJ | Uses the method of Sheather and Jones to select the bandwidth of a Gaussian kernel density estimator. |
| write.matrix | Writes a matrix or data frame to a file or the console, using column labels and a layout respecting columns. |

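
A few of the functions listed above can be tried directly at the console. The following is a minimal sketch, assuming the MASS package is installed; the matrices, the seed, and the variable names (X, Xp, M, N, Sigma, draws) are arbitrary illustrative choices, not part of the package.

```r
library(MASS)

# ginv: Moore-Penrose generalized inverse of a rank-deficient matrix
X <- matrix(c(1, 2, 2, 4), nrow = 2)
Xp <- ginv(X)
# A generalized inverse satisfies X %*% Xp %*% X == X (up to rounding)
print(all.equal(X %*% Xp %*% X, X))

# Null: basis N such that t(N) %*% M is the zero matrix
M <- matrix(c(1, 2, 3), ncol = 1)
N <- Null(M)
print(dim(N))                    # one row per row of M, one column per basis vector
print(max(abs(t(N) %*% M)))      # numerically zero

# fractions: rational approximations of decimal values
print(fractions(c(0.5, 0.25, 1/3)))

# mvrnorm: samples from a bivariate normal with correlated components
set.seed(1)
Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
draws <- mvrnorm(n = 100, mu = c(0, 0), Sigma = Sigma)
print(dim(draws))                # one row per sample
```

The ginv and Null checks compare against a tolerance rather than exact zero, because both functions rely on floating-point matrix decompositions.
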
| Data Set | Class | Description |
| --- | --- | --- |
| Aids2 | data.frame | Data on patients diagnosed with AIDS in Australia before July 1, 1991. |
| Animals | data.frame | Average brain and body weights for 28 species of land animals. |
| Boston | data.frame | The Boston data frame has 506 rows and 14 columns. |
| Cars93 | data.frame | The Cars93 data frame has 93 rows and 27 columns. |
| Cushings | data.frame | Cushing’s syndrome is a hypertensive disorder associated with oversecretion of cortisol by the adrenal gland. The observations are urinary excretion rates of two steroid metabolites. |
| DDT | numeric | A numeric vector of 15 measurements by different laboratories of the pesticide DDT in kale, in ppm (parts per million), using the multiple pesticide residue measurement. |
| GAGurine | data.frame | Data was collected on the concentration of the chemical glycosaminoglycan (GAG) in the urine of 314 children aged 0 to 17 years. The aim of the study was to produce a chart to help a pediatrician to assess if a child’s GAG concentration is “normal.” |
| Insurance | data.frame | The data given in data frame Insurance consists of the numbers of policyholders of an insurance company who were exposed to risk, and the numbers of car insurance claims made by those policyholders in the third quarter of 1973. |
| Melanoma | data.frame | The Melanoma data frame has data on 205 patients in Denmark with malignant melanoma. |
| OME | data.frame | Experiments were performed on children on their ability to differentiate a signal in broadband noise. The noise was played from a pair of speakers, and a signal was added to just one channel; the subject had to turn his/her head to the channel with the added signal. The signal was either coherent (the amplitude of the noise was increased for a period) or incoherent (independent noise was added for the same period to form the same increase in power). The threshold used in the original analysis was the stimulus loudness needed to get 75% correct responses. Some of the children had suffered from otitis media with effusion (OME). |
| Pima.te | data.frame | A population of women who were at least 21 years old, of Pima Indian heritage, and living near Phoenix, Arizona, was tested for diabetes according to World Health Organization criteria. The data was collected by the National Institute of Diabetes and Digestive and Kidney Diseases. A total of 532 complete records were used, after dropping the (mainly missing) data on serum insulin. |
| Pima.tr | data.frame | A population of women who were at least 21 years old, of Pima Indian heritage, and living near Phoenix, Arizona, was tested for diabetes according to World Health Organization criteria. The data was collected by the National Institute of Diabetes and Digestive and Kidney Diseases. A total of 532 complete records were used, after dropping the (mainly missing) data on serum insulin. |
| Pima.tr2 | data.frame | A population of women who were at least 21 years old, of Pima Indian heritage, and living near Phoenix, Arizona, was tested for diabetes according to World Health Organization criteria. The data was collected by the National Institute of Diabetes and Digestive and Kidney Diseases. A total of 532 complete records were used, after dropping the (mainly missing) data on serum insulin. |
| Rabbit | data.frame | Five rabbits were studied on two occasions after treatment with saline (control) and after treatment with the 5-HT_3 antagonist MDL 72222. After each treatment, ascending doses of phenylbiguanide were injected intravenously at 10-minute intervals and the responses of mean blood pressure measured. The goal was to test whether the cardiogenic chemoreflex elicited by phenylbiguanide depends on the activation of 5-HT_3 receptors. |
| Rubber | data.frame | Data frame from accelerated testing of tire rubber. |
| SP500 | numeric | Returns of the Standard & Poor’s 500 Index in the 1990s. |
| Sitka | data.frame | The Sitka data frame has 395 rows and 4 columns. It gives repeated measurements on the log size of 79 Sitka spruce trees, 54 of which were grown in ozone-enriched chambers and 25 of which were controls. The size was measured five times in 1988, at roughly monthly intervals. |
| Sitka89 | data.frame | The Sitka89 data frame has 632 rows and 4 columns. It gives repeated measurements on the log size of 79 Sitka spruce trees, 54 of which were grown in ozone-enriched chambers and 25 of which were controls. The size was measured eight times in 1989, at roughly monthly intervals. |
| Skye | data.frame | The Skye data frame has 23 rows and 3 columns. |
| Traffic | data.frame | An experiment was performed in Sweden in 1961–1962 to assess the effect of a speed limit on the highway accident rate. The experiment was conducted on 92 days in each year, matched so that day j in 1962 was comparable to day j in 1961. On some days, the speed limit was in effect and enforced, while on other days there was no speed limit and cars tended to be driven faster. The speed limit days tended to be in contiguous blocks. |
| UScereal | data.frame | The UScereal data frame has 65 rows and 11 columns. The data comes from the 1993 American Statistical Association (ASA) Statistical Graphics Exposition and is taken from the mandatory Food and Drug Administration (FDA) food label. The data has been normalized here to a portion of 1 American cup. |
| UScrime | data.frame | Criminologists are interested in the effect of punishment regimes on crime rates. This has been studied using the aggregate data on 47 states of the United States for 1960 given in this data frame. The variables seem to have been rescaled to convenient numbers. |
| VA | data.frame | Veteran’s Administration lung cancer trial from Kalbfleisch and Prentice. |
| abbey | numeric | A numeric vector of 31 determinations of nickel content (ppm) in a Canadian syenite rock. |
| accdeaths | ts | A regular time series giving the monthly totals of accidental deaths in the United States. |
| anorexia | data.frame | The anorexia data frame has 72 rows and 3 columns. Weight change data for young female anorexia patients. |
| bacteria | data.frame | Tests of the presence of the bacteria H. influenzae in children with otitis media in the Northern Territory of Australia. |
| beav1 | data.frame | Reynolds describes a small part of a study of the long-term temperature dynamics of the beaver (Castor canadensis) in north-central Wisconsin. Body temperature was measured by telemetry every 10 minutes for four females, but data from a period of less than a day for each of two animals is used here. |
| beav2 | data.frame | Reynolds describes a small part of a study of the long-term temperature dynamics of the beaver (Castor canadensis) in north-central Wisconsin. Body temperature was measured by telemetry every 10 minutes for four females, but data from a period of less than a day for each of two animals is used here. |
| biopsy | data.frame | This breast cancer database was obtained from the University of Wisconsin Hospitals, Madison, from Dr. William H. Wolberg. He assessed biopsies of breast tumors for 699 patients up to July 15, 1992; each of nine attributes has been scored on a scale of 1 to 10, and the outcome is also known. There are 699 rows and 11 columns. |
| birthwt | data.frame | The birthwt data frame has 189 rows and 10 columns. The data was collected at Baystate Medical Center, Springfield, Massachusetts, during 1986. |
| cabbages | data.frame | The cabbages data set has 60 observations and 4 variables. |
| caith | data.frame | Data on the cross-classification of people in Caithness, Scotland, by eye and hair color. This region of the United Kingdom is particularly interesting, as there is a mixture of people of Nordic, Celtic, and Anglo-Saxon origin. |
| cats | data.frame | The heart and body weights of samples of male and female cats used for digitalis experiments. The cats were all adult, over 2 kg in body weight. |
| cement | data.frame | Experiment on the heat evolved in the setting of each of 13 cements. |
| chem | numeric | A numeric vector of 24 determinations of copper in wholemeal flour, in parts per million. |
| coop | data.frame | Seven specimens were sent to six laboratories in three separate batches and each analyzed for analyte. Each analysis was duplicated. |
| cpus | data.frame | A relative performance measure and characteristics of 209 CPUs. |
| crabs | data.frame | The crabs data frame has 200 rows and 8 columns, describing 5 morphological measurements on 50 crabs of each of 2 color forms and both sexes, of the species Leptograpsus variegatus, collected at Fremantle, Western Australia. |
| deaths | ts | A time series giving the monthly deaths from bronchitis, emphysema, and asthma in the United Kingdom, 1974–1979, for both sexes. |
| drivers | ts | A regular time series giving the monthly totals of car drivers in Great Britain killed or seriously injured from January 1969 to December 1984. Compulsory wearing of seat belts was introduced on January 31, 1983. |
| eagles | data.frame | Knight and Skagen collected data during a field study on the foraging behavior of wintering bald eagles in Washington state. The data concerned 160 attempts by one (pirating) bald eagle to steal a chum salmon from another (feeding) bald eagle. |
| epil | data.frame | Thall and Vail give a data set on two-week seizure counts for 59 epileptics. The number of seizures was recorded for a baseline period of eight weeks, and then patients were randomly assigned to a treatment group or a control group. Counts were then recorded for four successive two-week periods. The subjects’ age is the only covariate. |
| farms | data.frame | The farms data frame has 20 rows and 4 columns. The rows are farms on the Dutch island of Terschelling, and the columns are factors describing the management of grassland. |
| fgl | data.frame | The fgl data frame has 214 rows and 10 columns. It was collected by B. German on fragments of glass collected in forensic work. |
| forbes | data.frame | A data frame with 17 observations on the boiling point of water and barometric pressure, in inches of mercury. |
| galaxies | numeric | A numeric vector of velocities, in kilometers/second, of 82 galaxies from 6 well-separated conic sections of an unfilled survey of the Corona Borealis region. Multimodality in such surveys is evidence for voids and superclusters in the far universe. |
| gehan | data.frame | A data frame from a trial of 42 leukemia patients. Some were treated with the drug 6-mercaptopurine, and the rest were controls. The trial was designed as matched pairs, both withdrawn from the trial when either came out of remission. |
| genotype | data.frame | Data from a foster feeding experiment with rat mothers and litters of four different genotypes: A, B, I, and J. Rat litters were separated from their natural mothers at birth and given to foster mothers to rear. |
| geyser | data.frame | A version of the eruptions data from the Old Faithful geyser in Yellowstone National Park, Wyoming. This version comes from Azzalini and Bowman and is of continuous measurement from August 1 to August 15, 1985. Some nocturnal duration measurements were coded as 2, 3, or 4 minutes, having originally been described as “short,” “medium,” or “long.” |
| gilgais | data.frame | This data set was collected on a line transect survey in gilgai territory in New South Wales, Australia. Gilgais are natural gentle depressions in otherwise flat land, and sometimes they seem to be regularly distributed. The data collection was stimulated by the question: are these patterns reflected in soil properties? At each of 365 sampling locations on a linear grid with 4-meter spacing, samples were taken at depths 0–10 cm, 30–40 cm, and 80–90 cm below the surface. pH, electrical conductivity, and chloride content were measured on a 1:5 soil:water extract from each sample. |
| hills | data.frame | The record times in 1984 for 35 Scottish hill races. |
| housing | data.frame | The housing data frame has 72 rows and 5 variables. |
| immer | data.frame | The immer data frame has 30 rows and 4 columns. Five varieties of barley were grown in six locations in 1931 and in 1932. |
| leuk | data.frame | A data frame of data from 33 leukemia patients. |
| mammals | data.frame | A data frame with average brain and body weights for 62 species of land mammals. |
| mcycle | data.frame | A data frame giving a series of measurements of head acceleration in a simulated motorcycle accident; used to test crash helmets. |
| menarche | data.frame | Proportions of female children at various ages during adolescence who have reached menarche. |
| michelson | data.frame | Measurements of the speed of light in air, made between June 5 and July 2, 1879. The data consists of 5 experiments, each consisting of 20 consecutive runs. The response is the speed of light, in kilometers/second, less 299,000. The currently accepted value, on this scale of measurement, is 734.5. |
| minn38 | data.frame | Minnesota high school graduates of 1938 were classified according to four factors. The minn38 data frame has 168 rows and 5 columns. |
| motors | data.frame | The motors data frame has 40 rows and 3 columns. It describes an accelerated life test at each of four temperatures of 10 motorettes and has rather discrete times. |
| muscle | data.frame | The purpose of this experiment was to assess the influence of calcium in solution on the contraction of heart muscle in rats. The left auricle of 21 rat hearts was isolated, and on several occasions a constant-length strip of tissue was electrically stimulated and dipped into various concentrations of calcium chloride solution, after which the shortening of the strip was accurately measured as the response. |
| newcomb | numeric | A numeric vector giving the “Third Series” of measurements of the passage time of light recorded by Newcomb in 1882. The given values divided by 1,000 plus 24 give the time, in millionths of a second, for light to traverse a known distance. The “true” value is now considered to be 33.02. |
| nlschools | data.frame | Snijders and Bosker use as a running example a study of 2,287 eighth-grade pupils (aged about 11) in 132 classes in 131 schools in the Netherlands. Only the variables used in their examples are supplied. |
| npk | data.frame | A classical N, P, K (nitrogen, phosphate, potassium) factorial experiment on the growth of peas conducted on six blocks. Each half of a fractional factorial design confounding the NPK interaction was used on three of the plots. |
| npr1 | data.frame | Data on the locations, porosity, and permeability (a measure of oil flow) on 104 oil wells in the U.S. Naval Petroleum Reserve No. 1 in California. |
| oats | data.frame | The yield of oats from a split-plot field trial using three varieties and four levels of manurial treatment. The experiment was laid out in six blocks of three main plots, each split into four subplots. The varieties were applied to the main plots and the manurial treatments to the subplots. |
| painters | data.frame | The subjective assessment, on an integer scale of 0 to 20, of 54 classical painters. The painters were assessed on four characteristics: composition, drawing, color, and expression. The data is due to the 18th-century art critic, de Piles. |
| petrol | data.frame | The yield of a petroleum refining process with four covariates. The crude oil appears to come from only 10 distinct samples. This data was originally used by Prater to build an estimation equation for the yield of the refining process of crude oil to gasoline. |
| phones | list | A list object with the annual number of telephone calls in Belgium. |
| quine | data.frame | The quine data frame has 146 rows and 5 columns. Children from Walgett, New South Wales, Australia, were classified by culture, age, sex, and learner status, and the number of days absent from school in a particular school year was recorded. |
| road | data.frame | A data frame with the annual deaths in road accidents for half the U.S. states. |
| rotifer | data.frame | The data give the numbers of rotifers falling out of suspension for different fluid densities. |
| ships | data.frame | Data frame giving the number of damage incidents and aggregate months of service by ship type, year of construction, and period of operation. |
| shoes | list | A list of two vectors, giving the wear of shoes of materials A and B for one foot each of 10 boys. |
| shrimp | numeric | A numeric vector with 18 determinations by different laboratories of the amount (percentage of the declared total weight) of shrimp in shrimp cocktail. |
| shuttle | data.frame | The shuttle data frame has 256 rows and 7 columns. The first six columns are categorical variables giving example conditions; the seventh is the decision. The first 253 rows are the training set, the last 3 the test conditions. |
| snails | data.frame | Groups of 20 snails were held for periods of 1, 2, 3, or 4 weeks under carefully controlled conditions of temperature and relative humidity. There were two species of snail, A and B, and the experiment was designed as a 4-by-3-by-4-by-2 completely randomized design. At the end of the exposure time, the snails were tested to see if they had survived; the process itself is fatal for the animals. The object of the exercise was to model the probability of survival in terms of the stimulus variables and, in particular, to test for differences among species. The data are unusual in that, in most cases, fatalities during the experiment were fairly small. |
| steam | data.frame | Temperature and pressure in a saturated steam-driven experimental device. |
| stormer | data.frame | The stormer viscometer measures the viscosity of a fluid by measuring the time taken for an inner cylinder in the mechanism to perform a fixed number of revolutions in response to an actuating weight. The viscometer is calibrated by measuring the time taken with varying weights while the mechanism is suspended in fluids of accurately known viscosity. The data comes from such a calibration, and theoretical considerations suggest a nonlinear relationship among time, weight, and viscosity of the form Time = (B1 * Viscosity)/(Weight - B2) + E, where B1 and B2 are unknown parameters to be estimated, and E is error. |
| survey | data.frame | This data frame contains the responses of 237 Statistics I students at the University of Adelaide to a number of questions. |
| synth.te | data.frame | The synth.tr data frame has 250 rows and 3 columns. The synth.te data frame has 100 rows and 3 columns. It is intended that synth.tr be used for training and synth.te for testing. |
| synth.tr | data.frame | The synth.tr data frame has 250 rows and 3 columns. The synth.te data frame has 100 rows and 3 columns. It is intended that synth.tr be used for training and synth.te for testing. |
| topo | data.frame | The topo data frame has 52 rows and 3 columns of topographic heights within a 310-foot square. |
| waders | data.frame | The waders data frame has 15 rows and 19 columns. The entries are counts of waders in summer. |
| whiteside | data.frame | Derek Whiteside of the UK Building Research Station recorded the weekly gas consumption and average external temperature at his own house in southeast England for two heating seasons, one of 26 weeks before, and one of 30 weeks after cavity-wall insulation was installed. The object of the exercise was to assess the effect of the insulation on gas consumption. |
| wtloss | data.frame | This data frame gives the weight, in kilograms, of an obese patient at 52 time points over an 8-month period of a weight rehabilitation program. |
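
The data sets above become available by name once the package is attached. As a rough sketch (assuming MASS is installed), the following inspects a few of them and fits a negative binomial model to the quine absenteeism data with glm.nb; the choice of an additive model formula and the variable name fit are illustrative assumptions, not a recommended analysis.

```r
library(MASS)

# Inspect a few data sets of different classes
print(dim(quine))        # data.frame: rows and columns
print(class(galaxies))   # numeric vector
print(class(accdeaths))  # time series
print(names(phones))     # components of the list

# Negative binomial GLM for days absent from school,
# using all four classification factors additively
fit <- glm.nb(Days ~ Eth + Sex + Age + Lrn, data = quine)
print(fit$theta)         # estimated theta parameter
print(summary(fit)$coefficients)
```

The fitted object inherits from glm, so the usual extractor functions (coef, residuals, predict) apply to it as well.
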