Subject: Re: [R] FDA and ICH Compliance of R
From: Frank E Harrell Jr
Date: Thu, 27 Nov 2003 19:11:06 -0500
To: ggrothendieck@myway.com
CC: r-help@stat.math.ethz.ch, antoniamarija@net.hr

On Thu, 27 Nov 2003 09:59:57 -0500 (EST)
"Gabor Grothendieck" wrote:

>
> From: Frank E Harrell Jr
> > per year in SAS licenses and have to hire armies of non-intellectually
> > challenged SAS programmers to do the work of significantly fewer
> > programmers that use modern statistical computing tools like R and
> > S-Plus, it is surprising that SAS is still the most commonly used tool
> > in the clinical side of drug development. I quit using SAS in 1991
> > because my productivity jumped at least 20% within one month of using
> > S-Plus.
>
> I have not used SAS for even longer than you but to
> give SAS its due:
>
> - it's pretty easy to produce all the info you need for a
> complete analysis with a few SAS commands. It would be
> possible to create analogous R commands but as it stands
> you have to keep going back and forth with R rather than
> just get it all out at once like you can with SAS.

Thanks for your note, Gabor. It depends on what you mean by "complete
analysis". SAS often gave me things I didn't need, but it was (and still
is) short on modern methods. To address the needs I think you are getting
at, I developed the Hmisc package (especially summary.formula).

>
> - SAS has more functionality for missing values. You
> can have different types of SAS missing values but in R you
> can have only one type of missing value.

Several points here. First, I always liked the 27 levels of special
missing values that SAS supports, but I've never seen a pharmaceutical
company actually use anything beyond the standard missing value (.).
Second, you can easily implement them in R and S-Plus anyway; the sas.get
function in Hmisc imports all SAS special missing values and lets you work
with them (e.g., is.special.miss(x, 'B')) while treating all of them as NA
in standard calculations.
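To make the second point concrete, here is a minimal base-R sketch of keeping SAS-style special missing codes alongside ordinary NAs. It assumes the codes live in a hypothetical "special.miss" attribute and uses a hypothetical helper is_special_code(). This is only an illustration of the idea and is not how Hmisc's sas.get or is.special.miss actually store or query the codes.

```r
# Sketch only: "special.miss" and is_special_code() are hypothetical names,
# not part of Hmisc or base R.
x <- c(3.1, NA, 4.7, NA, 5.2)
# One code per element; NA in the code vector means "not a special missing"
attr(x, "special.miss") <- c(NA, "B", NA, "A", NA)

# Which elements carry a given special missing code?
is_special_code <- function(x, code) {
  codes <- attr(x, "special.miss")
  !is.na(codes) & codes == code
}

is_special_code(x, "B")   # FALSE TRUE FALSE FALSE FALSE
mean(x, na.rm = TRUE)     # special codes behave as plain NA: 4.333333
```

Ordinary calculations see only NA, while the codes stay available for queries. Note that plain subsetting such as x[1:3] drops this kind of attribute. That is one reason a package would wrap the idea in its own functions rather than rely on a bare attribute.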
In S you can add your own attributes to an object on the fly (as long as
you don't use the new class mechanism), so you can attach metadata much
more generally than with SAS. For example, I can add 'comment' attributes,
or attributes giving the name of the file that holds the image of the case
report form page on which the variable was collected. When you get to
missing value imputation, S has more methods available than SAS.

>
> - the BY phrase in SAS is incredibly powerful and handy. You
> can get the same effect in R but I think that specific
> functionality is easier with SAS.

Again I'll have to respectfully disagree. BY in SAS is very good for
repeating analyses within a single procedure, but not across several
procedures. And if you need SAS PROC IML code for customized matrix
programming, you lose the ability to do BY processing. In S you can put
any number of steps inside a loop, and the results are easier to collect.

>
> Obviously R is incredibly powerful and functional and I really
> am out of touch with the SAS world but I thought I would make
> whatever case I could. I am willing to be corrected by those
> more in the know with SAS if this is wrong.

My view is that SAS is best at handling massive databases when you need to
run standard (i.e., older) methods, and SAS is very good at getting
P-values in mixed effects models. Other than that, S is better in almost
every way. Over the years I've developed documents demonstrating how to do
data manipulation in S (yes, S is superior to SAS for this task) and how
to make semi-advanced statistical reports and fairly complex tables.
Granted, the learning curve for S is not shallow, but the payoff is great
in terms of productivity, beauty of output (when coupling S with LaTeX),
and availability of modern applied statistical methods.
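The loop-based alternative to SAS BY processing described above can be sketched in base R as follows. Several analysis steps go into one function, which runs once per group, and the results are collected into a single table. The data set, column names, and the helper analyze_one() are made up for illustration only.

```r
# Sketch of BY-style processing spanning several analysis steps.
# The data and analyze_one() are hypothetical, for illustration only.
set.seed(1)
d <- data.frame(
  center = rep(c("A", "B", "C"), each = 20),
  dose   = rep(1:4, times = 15),
  resp   = rnorm(60)
)

analyze_one <- function(sub) {
  fit <- lm(resp ~ dose, data = sub)            # step 1: fit a regression
  c(n     = nrow(sub),                          # step 2: descriptive statistics
    mean  = mean(sub$resp),
    slope = unname(coef(fit)["dose"]))          # step 3: extract an estimate
}

# split() makes one data frame per center; lapply() runs every step on each
results <- lapply(split(d, d$center), analyze_one)
# Collect the per-center vectors into one matrix (rows named A, B, C)
results <- do.call(rbind, results)
results
```

Any mix of model fits, summaries, or custom matrix code can go inside analyze_one(), which is the flexibility the reply contrasts with BY inside a single PROC. Base R's by() function is an alternative wrapper for the same split-apply pattern.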
---
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help