R for Stata Users

by: Robert A. Muenchen, Joseph M. Hilbe

Springer-Verlag, 2010

ISBN: 9781441913180

Language: English

549 pages, Download: 7459 KB

 
Format: PDF, also available for online reading

Suitable for: Apple iPad, Android tablets; online reading on PC, Mac, laptop


 


More about the contents

  Preface 6  
  Contents 10  
  List of Tables 20  
  List of Figures 22  
  1 Introduction 26  
     1.1 Overview 26  
     1.2 Similarities Between R and Stata 27  
     1.3 Why Learn R? 28  
     1.4 Is R Accurate? 29  
     1.5 What About Tech Support? 29  
     1.6 Getting Started Quickly 30  
     1.7 Programming Conventions 30  
     1.8 Typographic Conventions 31  
  2 Installing and Updating R 33  
     2.1 Installing Add-on Packages 34  
     2.2 Loading an Add-on Package 34  
     2.3 Updating Your Installation 38  
     2.4 Uninstalling R 39  
     2.5 Choosing Repositories 39  
     2.6 Accessing Data in Packages 41  
  3 Running R 43  
     3.1 Running R Interactively on Windows 43  
     3.2 Running R Interactively on Macintosh 45  
     3.3 Running R Interactively on Linux or UNIX 47  
     3.4 Running Programs That Include Other Programs 49  
     3.5 Running R in Batch Mode 49  
     3.6 Graphical User Interfaces 50  
        3.6.1 R Commander 50  
        3.6.2 Rattle for Data Mining 53  
        3.6.3 JGR Java GUI for R 54  
  4 Help and Documentation 60  
     4.1 Introduction 60  
     4.2 Help Files 60  
     4.3 Starting Help 60  
     4.4 Help Examples 62  
     4.5 Help for Functions That Call Other Functions 63  
     4.6 Help for Packages 64  
     4.7 Help for Data Sets 65  
     4.8 Books and Manuals 65  
     4.9 E-mail Lists 65  
     4.10 Searching the Web 66  
     4.11 Vignettes 66  
  5 Programming Language Basics 68  
     5.1 Introduction 68  
     5.2 Simple Calculations 69  
     5.3 Data Structures 70  
        5.3.1 Vectors 70  
        5.3.2 Factors 74  
        5.3.3 Data Frames 79  
        5.3.4 Matrices 83  
        5.3.5 Arrays 86  
        5.3.6 Lists 86  
     5.4 Saving Your Work 90  
     5.5 Comments to Document Your Programs 92  
     5.6 Controlling Functions (Commands) 93  
        5.6.1 Controlling Functions with Arguments 93  
        5.6.2 Controlling Functions with Formulas 95  
        5.6.3 Controlling Functions with an Object's Class 96  
        5.6.4 Controlling Functions with Extractor Functions 98  
     5.7 How Much Output is There? 100  
     5.8 Writing Your Own Functions (Macros) 104  
     5.9 R Program Demonstrating Programming Basics 107  
  6 Data Acquisition 114  
     6.1 The R Data Editor 114  
     6.2 Reading Delimited Text Files 116  
        6.2.1 Reading Comma-Delimited Text Files 117  
        6.2.2 Reading Tab-Delimited Text Files 118  
        6.2.3 Missing Values for Character Variables 120  
        6.2.4 Trouble with Tabs 121  
        6.2.5 Skipping Variables in Delimited Files 122  
        6.2.6 Example Programs for Reading Delimited Text Files 123  
     6.3 Reading Text Data Within a Program 125  
        6.3.1 The Easy Approach 125  
        6.3.2 The More General Approach 127  
        6.3.3 Example Programs for Reading Text Data Within a Program 127  
     6.4 Reading Fixed-Width Text Files, One Record per Case 129  
        6.4.1 Macro Substitution 132  
        6.4.2 Example Programs for Reading Fixed-Width Text Files, One Record Per Case 133  
     6.5 Reading Fixed-Width Text Files, Two or More Records per Case 134  
        6.5.1 Example Programs to Read Fixed-Width Text Files with Two Records per Case 135  
     6.6 Importing Data from Stata into R 136  
        6.6.1 R Program to Import Data from Stata 137  
     6.7 Writing Data to a Comma-Delimited Text File 137  
        6.7.1 Example Programs for Writing a Comma-Delimited File 138  
     6.8 Exporting Data from R to Stata 139  
  7 Selecting Variables 141  
     7.1 Selecting Variables in Stata 141  
     7.2 Selecting All Variables 142  
     7.3 Selecting Variables Using Index Numbers 142  
     7.4 Selecting Variables Using Column Names 145  
     7.5 Selecting Variables Using Logic 146  
     7.6 Selecting Variables Using String Search 148  
     7.7 Selecting Variables Using $ Notation 150  
     7.8 Selecting Variables Using Component Names 151  
        7.8.1 The attach Function 151  
        7.8.2 The with Function 152  
        7.8.3 Using Component Names in Formulas 152  
     7.9 Selecting Variables with the subset Function 153  
     7.10 Selecting Variables Using List Index 154  
     7.11 Generating Indexes A to Z from Two Variable Names 154  
     7.12 Saving Selected Variables to a New Dataset 155  
     7.13 Example Programs for Variable Selection 156  
        7.13.1 Stata Program to Select Variables 156  
        7.13.2 R Program to Select Variables 156  
  8 Selecting Observations 161  
     8.1 Selecting Observations in Stata 161  
     8.2 Selecting All Observations 162  
     8.3 Selecting Observations Using Index Numbers 162  
     8.4 Selecting Observations Using Row Names 165  
     8.5 Selecting Observations Using Logic 167  
     8.6 Selecting Observations Using String Search 170  
     8.7 Selecting Observations Using the subset Function 172  
     8.8 Generating Indexes A to Z from Two Row Names 173  
     8.9 Variable Selection Methods with No Counterpart for Selecting Observations 174  
     8.10 Saving Selected Observations to a New Data Frame 174  
     8.11 Example Programs for Selecting Observations 174  
        8.11.1 Stata Program to Select Observations 175  
        8.11.2 R Program to Select Observations 175  
  9 Selecting Variables and Observations 179  
     9.1 The subset Function 179  
     9.2 Selecting Observations by Logic and Variables by Name 180  
     9.3 Using Names to Select Both Observations and Variables 181  
     9.4 Using Numeric Index Values to Select Both Observations and Variables 182  
     9.5 Using Logic to Select Both Observations and Variables 183  
     9.6 Saving and Loading Subsets 184  
     9.7 Example Programs for Selecting Variables and Observations 184  
        9.7.1 Stata Program for Selecting Variables and Observations 184  
        9.7.2 R Program for Selecting Variables and Observations 185  
  10 Data Management 189  
     10.1 Transforming Variables 189  
        10.1.1 Example Programs for Transforming Variables 193  
     10.2 Functions or Commands? The apply Function Decides 194  
        10.2.1 Applying the mean Function 195  
        10.2.2 Finding N or NVALID 198  
        10.2.3 Example Programs for Applying Statistical Functions 200  
     10.3 Conditional Transformations 202  
        10.3.1 Example Programs for Conditional Transformations 204  
     10.4 Multiple Conditional Transformations 205  
        10.4.1 Example Programs for Multiple Conditional Transformations 207  
     10.5 Missing Values 208  
        10.5.1 Substituting Means for Missing Values 210  
        10.5.2 Finding Complete Observations 211  
        10.5.3 When "99" Has Meaning 212  
        10.5.4 Example Programs to Assign Missing Values 214  
     10.6 Renaming Variables (and Observations) 216  
        10.6.1 Renaming Variables: Advanced Examples 218  
        10.6.2 Renaming by Index 219  
        10.6.3 Renaming by Column Name 220  
        10.6.4 Renaming Many Sequentially Numbered Variable Names 221  
        10.6.5 Renaming Observations 222  
        10.6.6 Example Programs for Renaming Variables 222  
     10.7 Recoding Variables 226  
        10.7.1 Recoding a Few Variables 227  
        10.7.2 Recoding Many Variables 227  
        10.7.3 Example Programs for Recoding Variables 230  
     10.8 Keeping and Dropping Variables 231  
        10.8.1 Example Programs for Keeping and Dropping Variables 232  
     10.9 Stacking/Appending Data Sets 232  
        10.9.1 Example Programs for Stacking/Appending Data Sets 235  
     10.10 Joining/Merging Data Sets 236  
        10.10.1 Example Programs for Joining/Merging Data Sets 239  
     10.11 Creating Collapsed or Aggregated Data Sets 241  
        10.11.1 The aggregate Function 241  
        10.11.2 The tapply Function 243  
        10.11.3 Merging Aggregates with Original Data 244  
        10.11.4 Tabular Aggregation 246  
        10.11.5 The reshape Package 248  
        10.11.6 Example Programs for Collapsing/Aggregating Data 248  
     10.12 By or Split-File Processing 250  
        10.12.1 Comparing Summarization Methods 254  
        10.12.2 Example Programs for By or Split-file Processing 255  
     10.13 Removing Duplicate Observations 256  
        10.13.1 Example Programs for Removing Duplicate Observations 258  
     10.14 Selecting First or Last Observations per Group 259  
        10.14.1 Example Programs for Selecting Last Observation per Group 261  
     10.15 Reshaping Variables to Observations and Back 262  
        10.15.1 Example Programs for Reshaping Variables to Observations and Back 264  
     10.16 Sorting Data Frames 265  
        10.16.1 Example Programs for Sorting Data Sets 268  
     10.17 Converting Data Structures 269  
        10.17.1 Converting from Logical to Numeric Index and Back 272  
  11 Enhancing Your Output 274  
     11.1 Value Labels or Formats (and Measurement Level) 274  
        11.1.1 Character Factors 275  
        11.1.2 Numeric Factors 277  
        11.1.3 Making Factors of Many Variables 279  
        11.1.4 Converting Factors into Numeric or Character Variables 281  
        11.1.5 Dropping Factor Levels 283  
        11.1.6 Example Programs for Value Labels or Formats 284  
     11.2 Variable Labels 287  
        11.2.1 Variable Labels in the Hmisc Package 287  
        11.2.2 Long Variable Names as Labels 288  
        11.2.3 Other Packages That Support Variable Labels 291  
        11.2.4 Example Programs for Variable Labels 291  
     11.3 Output for Word Processing and Web Pages 292  
        11.3.1 The xtable Package 293  
        11.3.2 Other Options for Formatting Output 295  
        11.3.3 Example Programs for Formatting Output 296  
  12 Generating Data 298  
     12.1 Generating Numeric Sequences 299  
     12.2 Generating Factors 300  
     12.3 Generating Repetitious Patterns (Not Factors) 301  
     12.4 Generating Integer Measures 302  
     12.5 Generating Continuous Measures 304  
     12.6 Generating a Data Frame 306  
     12.7 Example Programs for Generating Data 306  
        12.7.1 Stata Program for Generating Data 306  
        12.7.2 R Program for Generating Data 307  
  13 Managing Your Files and Workspace 312  
     13.1 Loading and Listing Objects 312  
     13.2 Understanding Your Search Path 315  
     13.3 Attaching Data Frames 317  
     13.4 Attaching Files 319  
     13.5 Removing Objects from Your Workspace 320  
     13.6 Minimizing Your Workspace 322  
     13.7 Setting Your Working Directory 322  
     13.8 Saving Your Workspace 323  
        13.8.1 Saving Your Workspace Manually 323  
        13.8.2 Saving Your Workspace Automatically 324  
     13.9 Getting Operating Systems to Show You ".RData" Files 324  
     13.10 Organizing Projects with Windows Shortcuts 325  
     13.11 Saving Your Programs and Output 325  
     13.12 Saving Your History 326  
     13.13 Large Data Set Considerations 326  
     13.14 Example R Program for Managing Files and Workspace 328  
  14 Graphics Overview 332  
     14.1 Stata Graphics 333  
     14.2 R Graphics 333  
     14.3 The Grammar of Graphics 334  
     14.4 Other Graphics Packages 336  
     14.5 Graphics Procedures and Graphics Systems 336  
     14.6 Graphics Devices 337  
     14.7 Practice Data: mydata100 339  
  15 Traditional Graphics 340  
     15.1 Bar Plots 340  
        15.1.1 Bar Plots of Counts 340  
        15.1.2 Bar Plots for Subgroups of Counts 345  
        15.1.3 Bar Plots of Means 347  
     15.2 Adding Titles, Labels, Colors, and Legends 348  
     15.3 Graphics Parameters and Multiple Plots on a Page 351  
     15.4 Pie Charts 352  
     15.5 Dot Charts 354  
     15.6 Histograms 354  
        15.6.1 Basic Histograms 355  
        15.6.2 Histograms Stacked 357  
        15.6.3 Histograms Overlaid 358  
     15.7 Normal QQ Plots 362  
     15.8 Strip Charts 363  
     15.9 Scatter Plots and Line Plots 368  
        15.9.1 Scatter plots with Jitter 371  
        15.9.2 Scatter plots with Large Data Sets 371  
        15.9.3 Scatter plots with Lines 373  
        15.9.4 Scatter plots with Linear Fit by Group 374  
        15.9.5 Scatter plots by Group or Level (Coplots) 375  
        15.9.6 Scatter plots with Confidence Ellipse 377  
        15.9.7 Scatter plots with Confidence and Prediction Intervals 378  
        15.9.8 Plotting Labels Instead of Points 383  
        15.9.9 Scatter plot Matrices 385  
     15.10 Dual-Axes Plots 387  
     15.11 Box Plots 389  
     15.12 Error Bar Plots 391  
     15.13 Interaction Plots 391  
     15.14 Adding Equations and Symbols to Graphs 392  
     15.15 Summary of Graphics Elements and Parameters 393  
     15.16 Plot Demonstrating Many Modifications 394  
     15.17 Example Program for Traditional Graphics 395  
        15.17.1 Stata Program for Traditional Graphics 396  
        15.17.2 R Program for Traditional Graphics 396  
  16 Graphics with ggplot2 406  
     16.1 Introduction 406  
        16.1.1 Overview qplot and ggplot 407  
        16.1.2 Missing Values 408  
        16.1.3 Typographic Conventions 409  
     16.2 Bar Plots 410  
        16.2.1 Pie Charts 413  
        16.2.2 Bar Charts for Groups 414  
     16.3 Plots by Group or Level 415  
     16.4 Presummarized Data 417  
     16.5 Dot Charts 418  
     16.6 Adding Titles and Labels 420  
     16.7 Histograms and Density Plots 421  
        16.7.1 Histograms 421  
        16.7.2 Density Plots 422  
        16.7.3 Histograms with Density Overlaid 422  
        16.7.4 Histograms for Groups, Stacked 424  
        16.7.5 Histograms for Groups, Overlaid 425  
     16.8 Normal QQ Plots 426  
     16.9 Strip Plots 426  
     16.10 Scatter Plots and Line Plots 429  
        16.10.1 Scatter Plots with Jitter 431  
        16.10.2 Scatter Plots for Large Data Sets 432  
        16.10.3 Hexbin Plots 435  
        16.10.4 Scatter Plots with Fit Lines 436  
        16.10.5 Scatter Plots with Reference Lines 437  
        16.10.6 Scatter Plots with Labels Instead of Points 441  
        16.10.7 Changing Plot Symbols 442  
        16.10.8 Scatter Plot with Linear Fits by Group 443  
        16.10.9 Scatter Plots Faceted for Groups 443  
        16.10.10 Scatter Plot Matrix 445  
     16.11 Box Plots 446  
     16.12 Error Bar Plots 449  
     16.13 Logarithmic Axes 451  
     16.14 Aspect Ratio 451  
     16.15 Multiple Plots on a Page 452  
     16.16 Saving ggplot2 Graphs to a File 454  
     16.17 An Example Specifying All Defaults 454  
     16.18 Summary of Graphic Elements and Parameters 456  
     16.19 Example Programs for ggplot2 457  
  17 Statistics 474  
     17.1 Scientific Notation 474  
     17.2 Descriptive Statistics 475  
        17.2.1 The Hmisc describe Function 475  
        17.2.2 The summary Function 477  
        17.2.3 The table Function and Its Relatives 478  
        17.2.4 The mean Function and Its Relatives 480  
     17.3 Cross-Tabulation 481  
        17.3.1 The CrossTable Function 481  
        17.3.2 The tables and chisq.test Functions 483  
     17.4 Correlation 486  
        17.4.1 The cor Function 489  
     17.5 Linear Regression 491  
        17.5.1 Plotting Diagnostics 494  
        17.5.2 Comparing Models 495  
        17.5.3 Making Predictions with New Data 496  
     17.6 t-Test: Independent Groups 497  
     17.7 Equality of Variance 498  
     17.8 t-Test: Paired or Repeated Measures 499  
     17.9 Wilcoxon Mann-Whitney Rank Sum Test: Independent Groups 500  
     17.10 Wilcoxon Signed-Rank Test: Paired Groups 501  
     17.11 Analysis of Variance 502  
     17.12 Sums of Squares 507  
     17.13 The Kruskal-Wallis Test 508  
     17.14 Example Programs for Statistical Tests 510  
        17.14.1 Stata Program for Statistical Tests 510  
        17.14.2 R Program for Statistical Tests 512  
  18 Conclusion 518  
  Glossary of R jargon 519  
  Comparison of Stata commands and R functions 525  
  Automating Your R Setup 527  
     C.1 Setting Options 527  
     C.2 Creating Objects 528  
     C.3 Loading Packages 528  
     C.4 Running Functions 528  
     C.5 Example .Rprofile 530  
  Example Simulation 531  
     D.1 Stata Example Simulation 531  
     D.2 R Example Simulation 532  
  References 533  
  Index 537  
