
TIBCO Spotfire S 8.2 Guide to Statistics, Volume 2
November 2010
TIBCO Software Inc.

IMPORTANT INFORMATION

SOME TIBCO SOFTWARE EMBEDS OR BUNDLES OTHER TIBCO SOFTWARE. USE OF SUCH EMBEDDED OR BUNDLED TIBCO SOFTWARE IS SOLELY TO ENABLE THE FUNCTIONALITY (OR PROVIDE LIMITED ADD-ON FUNCTIONALITY) OF THE LICENSED TIBCO SOFTWARE. THE EMBEDDED OR BUNDLED SOFTWARE IS NOT LICENSED TO BE USED OR ACCESSED BY ANY OTHER TIBCO SOFTWARE OR FOR ANY OTHER PURPOSE.

USE OF TIBCO SOFTWARE AND THIS DOCUMENT IS SUBJECT TO THE TERMS AND CONDITIONS OF A LICENSE AGREEMENT FOUND IN EITHER A SEPARATELY EXECUTED SOFTWARE LICENSE AGREEMENT, OR, IF THERE IS NO SUCH SEPARATE AGREEMENT, THE CLICKWRAP END USER LICENSE AGREEMENT WHICH IS DISPLAYED DURING DOWNLOAD OR INSTALLATION OF THE SOFTWARE (AND WHICH IS DUPLICATED IN TIBCO SPOTFIRE S LICENSES). USE OF THIS DOCUMENT IS SUBJECT TO THOSE TERMS AND CONDITIONS, AND YOUR USE HEREOF SHALL CONSTITUTE ACCEPTANCE OF AND AN AGREEMENT TO BE BOUND BY THE SAME.

This document contains confidential information that is subject to U.S. and international copyright laws and treaties. No part of this document may be reproduced in any form without the written authorization of TIBCO Software Inc.

TIBCO Software Inc., TIBCO, Spotfire, TIBCO Spotfire S, Insightful, the Insightful logo, the tagline "the Knowledge to Act," Insightful Miner, S, S-PLUS, TIBCO Spotfire Axum, S ArrayAnalyzer, S EnvironmentalStats, S FinMetrics, S NuOpt, S SeqTrial, S SpatialStats, S Wavelets, S-PLUS Graphlets, Graphlet, Spotfire S FlexBayes, Spotfire S Resample, TIBCO Spotfire Miner, TIBCO Spotfire S Server, TIBCO Spotfire Statistics Services, and TIBCO Spotfire Clinical Graphics are either registered trademarks or trademarks of TIBCO Software Inc. and/or subsidiaries of TIBCO Software Inc. in the United States and/or other countries. All other product and company names and marks mentioned in this document are the property of their respective owners and are mentioned for identification purposes only. This software may be available on multiple operating systems. However, not all operating system platforms for a specific software version are released at the same time. Please see the readme.txt file for the availability of this software version on a specific operating system platform.

THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT. THIS DOCUMENT COULD INCLUDE TECHNICAL INACCURACIES OR TYPOGRAPHICAL ERRORS. CHANGES ARE PERIODICALLY ADDED TO THE INFORMATION HEREIN; THESE CHANGES WILL BE INCORPORATED IN NEW EDITIONS OF THIS DOCUMENT. TIBCO SOFTWARE INC. MAY MAKE IMPROVEMENTS AND/OR CHANGES IN THE PRODUCT(S) AND/OR THE PROGRAM(S) DESCRIBED IN THIS DOCUMENT AT ANY TIME.

Copyright 1996-2010 TIBCO Software Inc. ALL RIGHTS RESERVED. THE CONTENTS OF THIS DOCUMENT MAY BE MODIFIED AND/OR QUALIFIED, DIRECTLY OR INDIRECTLY, BY OTHER DOCUMENTATION WHICH ACCOMPANIES THIS SOFTWARE, INCLUDING BUT NOT LIMITED TO ANY RELEASE NOTES AND "READ ME" FILES.

TIBCO Software Inc. Confidential Information

Reference

The correct bibliographic reference for this document is as follows:
TIBCO Spotfire S 8.2 Guide to Statistics Volume 2, TIBCO Software Inc.

Technical Support

For technical support, please visit http://spotfire.tibco.com/support and register for a support account.

TIBCO SPOTFIRE S BOOKS

Note about Naming

Throughout the documentation, we have attempted to distinguish between the language (S-PLUS) and the product (Spotfire S).

“S-PLUS” refers to the engine, the language, and its constituents (that is, objects, functions, expressions, and so forth).

“Spotfire S” refers to all and any parts of the product beyond the language, including the product user interfaces, libraries, and documentation, as well as general product and language behavior.

The TIBCO Spotfire S documentation includes books to address your focus and knowledge level. Review the following table to help you choose the Spotfire S book that meets your needs. These books are available in PDF format in the following locations:

In your Spotfire S installation directory (SHOME\help on Windows, SHOME/doc on UNIX/Linux).
In the Spotfire S Workbench, from the Help > Spotfire S Manuals menu item.
In Microsoft Windows, in the Spotfire S GUI, from the Help > Online Manuals menu item.

Spotfire S documentation.

Information you need if you...
    See the...

Must install or configure your current installation of Spotfire S; review system requirements.
    Installation and Administration Guide

Want to review the third-party products included in Spotfire S, along with their legal notices and licenses.
    Licenses

Are new to the S language and the Spotfire S GUI, and you want an introduction to importing data, producing simple graphs, applying statistical models, and viewing data in Microsoft Excel.
    Getting Started Guide

Are a new Spotfire S user and need to learn how to use Spotfire S, primarily through the GUI.
    User’s Guide

Are familiar with the S language and Spotfire S, and you want to use the Spotfire S plug-in, or customization, of the Eclipse Integrated Development Environment (IDE).
    Spotfire S Workbench User’s Guide

Have used the S language and Spotfire S, and you want to know how to write, debug, and program functions from the Commands window.
    Programmer’s Guide

Are familiar with the S language and Spotfire S, and you want to extend its functionality in your own application or within Spotfire S.
    Application Developer’s Guide

Are familiar with the S language and Spotfire S, and you are looking for information about creating or editing graphics, either from a Commands window or the Windows GUI, or using Spotfire S supported graphics devices.
    Guide to Graphics

Are familiar with the S language and Spotfire S, and you want to use the Big Data library to import and manipulate very large data sets.
    Big Data User’s Guide

Want to download or create Spotfire S packages for submission to the Comprehensive S-PLUS Archive Network (CSAN) site, and need to know the steps.
    Guide to Packages

Are looking for categorized information about individual S-PLUS functions.
    Function Guide

Are familiar with the S language and Spotfire S, and you need a reference for the range of statistical modelling and analysis techniques in Spotfire S. Volume 1 includes information on specifying models in Spotfire S, on probability, on estimation and inference, on regression and smoothing, and on analysis of variance.
    Guide to Statistics, Vol. 1

Are familiar with the S language and Spotfire S, and you need a reference for the range of statistical modelling and analysis techniques in Spotfire S. Volume 2 includes information on multivariate techniques, time series analysis, survival analysis, resampling techniques, and mathematical computing in Spotfire S.
    Guide to Statistics, Vol. 2

GUIDE TO STATISTICS CONTENTS OVERVIEW

Volume 1

Introduction
    Chapter 1   Introduction to Statistical Analysis in Spotfire S
    Chapter 2   Specifying Models in Spotfire S
    Chapter 3   Probability
    Chapter 4   Descriptive Statistics

Estimation and Inference
    Chapter 5   Statistical Inference for One- and Two-Sample Problems
    Chapter 6   Goodness of Fit Tests
    Chapter 7   Statistical Inference for Counts and Proportions
    Chapter 8   Cross-Classified Data and Contingency Tables
    Chapter 9   Power and Sample Size

Regression and Smoothing
    Chapter 10  Regression and Smoothing for Continuous Response Data
    Chapter 11  Robust Regression
    Chapter 12  Generalizing the Linear Model
    Chapter 13  Local Regression Models
    Chapter 14  Linear and Nonlinear Mixed-Effects Models
    Chapter 15  Nonlinear Models

Analysis of Variance
    Chapter 16  Designed Experiments and Analysis of Variance
    Chapter 17  Further Topics in Analysis of Variance
    Chapter 18  Multiple Comparisons

Index, Volume 1

Volume 2

Multivariate Techniques
    Chapter 19  Classification and Regression Trees
    Chapter 20  Principal Components Analysis
    Chapter 21  Factor Analysis
    Chapter 22  Discriminant Analysis
    Chapter 23  Cluster Analysis
    Chapter 24  Hexagonal Binning
    Chapter 25  Analyzing Time Series and Signals

Survival Analysis
    Chapter 26  Overview of Survival Analysis
    Chapter 27  Estimating Survival
    Chapter 28  The Cox Proportional Hazards Model
    Chapter 29  Parametric Regression in Survival Models
    Chapter 30  Life Testing
    Chapter 31  Expected Survival

Other Topics
    Chapter 32  Quality Control Charts
    Chapter 33  Resampling Techniques: Bootstrap and Jackknife
    Chapter 34  Mathematical Computing in Spotfire S

Index, Volume 2


CONTENTS

Chapter 19  Classification and Regression Trees
    Introduction
    Growing Trees
    Displaying Trees
    Prediction and Residuals
    Missing Data
    Pruning and Shrinking
    Graphically Interacting with Trees
    References

Chapter 20  Principal Components Analysis
    Introduction
    Calculating Principal Components
    Principal Component Loadings
    Principal Components Analysis Using Correlation
    Estimating the Model Using a Covariance or Correlation Matrix
    Excluding Principal Components
    Prediction: Principal Component Scores
    Analyzing Principal Components Graphically
    References

Chapter 21  Factor Analysis
    Introduction
    Estimating the Model
    Estimating the Model Using Maximum Likelihood
    Estimating the Model Using a Covariance or Correlation Matrix
    Rotating Factors
    Visualizing the Factor Solution
    Prediction: Factor Analysis Scores
    References

Chapter 22  Discriminant Analysis
    Introduction
    A Simple Example
    Models
    Hypothesis Testing
    Estimation
    Prediction
    Error Analysis
    References

Chapter 23  Cluster Analysis
    Introduction
    Data and Dissimilarities
    Partitioning Methods
    Hierarchical Methods
    Cluster Library Architecture
    References

Chapter 24  Hexagonal Binning
    Introduction
    The Appeal of Hexagonal Binning
    References

Chapter 25  Analyzing Time Series and Signals
    Introduction
    Autocorrelation in Series Data
    Autoregression Methods
    Univariate ARIMA Modeling
    Long Memory Time Series Modeling
    Spectral Analysis
    Linear Filters
    Robust Methods
    References

Chapter 26  Overview of Survival Analysis
    Introduction
    Overview of Spotfire S Functions
    Missing Values
    References

Chapter 27  Estimating Survival
    Introduction
    Kaplan-Meier Estimator
    Nelson and Fleming-Harrington Estimators
    Variance Estimation
    Mean and Median Survival
    Comparison of Survival Curves
    More on survfit
    References

Chapter 28  The Cox Proportional Hazards Model
    Introduction
    Hypothesis Tests
    Stratification
    Residuals
    Using the Counting Process Notation
    More Detailed Examples
    Penalized Cox Models
    Frailty Models
    Additional Technical Details
    References

Chapter 29  Parametric Regression in Survival Models
    Introduction
    Strata
    Specifying a Distribution
    Residuals
    Predicted Values
    Fitting the Model
    Distributions
    A Final Example
    References

Chapter 30  Life Testing
    Introduction
    The Generalized Kaplan-Meier Estimate
    Parametric Survival Models
    Comparing Parametric Survival Models
    Plots for Parametric Survival Models
    Computing Probabilities and Quantiles
    References

Chapter 31  Expected Survival
    Introduction
    Individual Expected Survival
    Cohort Expected Survival
    Approximations
    Testing
    Computing Expected Survival Curves
    Examples
    Creating Rate Tables
    References

Chapter 32  Quality Control Charts
    Introduction
    Control Chart Objects
    Shewhart Charts
    Cusum Charts
    Extensions to Shewhart Charts
    Process Capability
    Process Monitoring
    References

Chapter 33  Resampling Techniques: Bootstrap and Jackknife
    Introduction
    Creating a Resample Object
    Methods for Resample Objects
    Percentile Estimates
    Jackknife After Bootstrap
    Examples
    References

Chapter 34  Mathematical Computing in Spotfire S
    Introduction
    Arithmetic Operations
    Complex Arithmetic
    Elementary Functions
    Vector and Matrix Computations
    Solving Systems of Linear Equations
    Eigenvalues and Eigenvectors
    Integrals, Differences, and Derivatives
    Interpolation and Approximation
    Initial Value Problems
    The Fast Fourier Transform
    Probability and Random Numbers
    Primes and Factors
    A Note on Computational Accuracy
    References

Index

PREFACE

Introduction

Welcome to the Spotfire S 8 Guide to Statistics, Volume 2.

This book is designed as a reference tool for Spotfire S users who want to use the powerful statistical techniques in Spotfire S. The Guide to Statistics, Volume 2 covers a wide range of statistical and mathematical modeling. No single user is likely to tap all of these resources, since advanced topics such as survival analysis and time series are complete fields of study in themselves.

All examples in this guide are run using input through the Commands window, which is the traditional method of accessing the power of Spotfire S. Many of the functions can also be run through the Statistics dialogs available in the graphical user interface. We hope that you find this book a valuable aid for exploring both the theory and practice of statistical modeling.

Online Version

The Guide to Statistics, Volume 2 is also available online.

On Microsoft Windows, from the Help > Online Manuals menu, and in the /help/statman2.pdf file of your S HOME directory.
On Solaris/Linux, in the /doc/statman2.pdf file of your S HOME directory.

You can open and view this file using Adobe Acrobat Reader, which is required for reading all online manuals shipped with Spotfire S.

The online version of the Guide to Statistics, Volume 2 has particular advantages over print. For example, you can copy and paste example S-PLUS code into the Commands window and run it without having to type the function calls explicitly. (When doing this, be careful not to paste t