1                                  The SAS System                 15:38 Saturday, March 16, 2013

NOTE: Unable to open SASUSER.REGSTRY. WORK.REGSTRY will be opened instead.
NOTE: All registry changes will be lost at the end of the session.
WARNING: Unable to copy SASUSER registry to WORK registry. Because of this, you will not see
         registry customizations during this session.
NOTE: Unable to open SASUSER.PROFILE. WORK.PROFILE will be opened instead.
NOTE: All profile changes will be lost at the end of the session.
NOTE: Copyright (c) 2002-2010 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software 9.3 (TS1M0)
      Licensed to BOSTON UNIVERSITY - SFA - T&R, Site 70009029.
NOTE: This session is executing on the W32_7PRO platform.
NOTE: SAS initialization used:
      real time           4.72 seconds
      cpu time            0.07 seconds

1     ****************************
2     * ec782581_SASDemoProgram.SAS
3     * Randy Ellis Programmer
4     * March 8, 2013
5     * This file contains the sample SAS programs
6     * that were developed interactively in the March 8 SAS class
7     * Like all good programs, it starts with this header. I added comments later
8     ********************************************;
9     data test;
10      do i=1 to 10;
11        x=10; * indenting is very helpful with nested do groups;
12      end; *end do loop over i, comments like this are useful;
13      output test;
14    run;

NOTE: The data set WORK.TEST has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

15
16
17
18    * this is how a comment is made in the middle of a program;
19    * TEST is a temporary dataset that will disappear when SAS ends
20      SAS is not case sensitive. Output is in caps.
21      permanent SAS datasets use two node file names like in.test
22      comments can be multiple lines and end with a semicolon.;
23
24    /* This is another way to comment out programming code
25    data in.test;
26    *creating test file permanently;
27    set test;
28    run;
29    */
30    %macro junk; *macros can also be used to comment out whole blocks of programming code;
31    /* This is another way to comment out programming code
32    data in.test;
33    *creating test file permanently;
34    set test;
35    run;
36    */
37    %mend; * end of JUNK macro;
38
39
40    *tell SAS where to store datasets using libname command;
41
42    libname in "c:/data/";
NOTE: Libref IN was successfully assigned as follows:
      Engine:        V9
      Physical Name: c:\data
43
44    * now execute the code in the junk macro;
45
46    %junk;
47
48    *******************
49    * Program: test1, by Randy Ellis March 8 2013. Starting over.
50    ******************;
51    libname in "c:/data/";
NOTE: Libref IN was successfully assigned as follows:
      Engine:        V9
      Physical Name: c:\data
52    data in.test;
53      *creating test file permanently;
54      set test;
55    run;

NOTE: There were 1 observations read from the data set WORK.TEST.
NOTE: The data set IN.TEST has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

56
57    proc print data =test;
58      var i x x;
59      format x 10.;
60    run;

NOTE: There were 1 observations read from the data set WORK.TEST.
NOTE: The PROCEDURE PRINT printed page 1.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

61    * you can also use simply;
62    proc print;
63    run;

NOTE: There were 1 observations read from the data set IN.TEST.
NOTE: The PROCEDURE PRINT printed page 2.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

64    *but the danger is that it may end up printing a very large dataset, the last one
64  ! created;
65
66    proc contents data=test;
67    run;

NOTE: PROCEDURE CONTENTS used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

NOTE: The PROCEDURE CONTENTS printed page 3.

68
69    *You can create multiple datasets in one datastep;
70
71    data one two three;
72      array x {10} x01-x10;
73      array y {10} y1-y10; *less preferred array members: does not sort well;
74      array z {10} x01-x10;
75      array all x01-x10 y1-y10 i n counter; *implicit array indexing;
76      do i = 1 to 10;
77        X[i]=10;
78        y[i]=x[i];
79        if i = 10 then counter+1;
80        /* *or could use;
81        retain counter 0;
82        if i = 10 then counter=counter+1;
83        */
84        output one two; *inside loop, done ten times;
85      end;
86      output three; *Outputs THREE once;
87      *sample use of DO OVER;
88      do over all; all=0; end;
89      output three; *outputs THREE again;
90    run;

NOTE: The data set WORK.ONE has 10 observations and 23 variables.
NOTE: The data set WORK.TWO has 10 observations and 23 variables.
NOTE: The data set WORK.THREE has 2 observations and 23 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

91
92    proc print data = one (obs=10);

NOTE: There were 10 observations read from the data set WORK.ONE.
NOTE: The PROCEDURE PRINT printed page 4.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

93    proc print data = two (obs=10);

NOTE: There were 10 observations read from the data set WORK.TWO.
NOTE: The PROCEDURE PRINT printed page 5.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

94    proc print data = three (obs=10); *Get in the habit of using (obs= 100) option;
95    run;

NOTE: There were 2 observations read from the data set WORK.THREE.
NOTE: The PROCEDURE PRINT printed page 6.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

96
97    %macro compresstest(NNN=1000);
98    data test;
99      retain x1-x1000 1;
100     do i=1 to &NNN; *note use of macro variable;
101       output;
102       yy=xx; *XX is not defined. program runs but gives warning in log;
103     end;
104   run;
105   %mend;
106   * reuse code for different size NNN;
107   options compress = no;
108   %compresstest (); *uses default for nnn of 1000;

NOTE: Variable xx is uninitialized.
NOTE: The data set WORK.TEST has 1000 observations and 1003 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds

109   %compresstest (nnn=100);

NOTE: Variable xx is uninitialized.
NOTE: The data set WORK.TEST has 100 observations and 1003 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

110   %compresstest (nnn=1000);

NOTE: Variable xx is uninitialized.
NOTE: The data set WORK.TEST has 1000 observations and 1003 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds

111   %compresstest (nnn=10000);

NOTE: Variable xx is uninitialized.
NOTE: The data set WORK.TEST has 10000 observations and 1003 variables.
NOTE: DATA statement used (Total process time):
      real time           0.20 seconds
      cpu time            0.20 seconds

112   options mprint; * to see macro language code;
113   %compresstest (nnn=30000);
MPRINT(COMPRESSTEST):   data test;
MPRINT(COMPRESSTEST):   retain x1-x1000 1;
MPRINT(COMPRESSTEST):   do i=1 to 30000;
MPRINT(COMPRESSTEST):   *note use of macro variable;
MPRINT(COMPRESSTEST):   output;
MPRINT(COMPRESSTEST):   yy=xx;
MPRINT(COMPRESSTEST):   *XX is not defined. program runs but gives warning in log;
MPRINT(COMPRESSTEST):   end;
MPRINT(COMPRESSTEST):   run;

NOTE: Variable xx is uninitialized.
NOTE: The data set WORK.TEST has 30000 observations and 1003 variables.
NOTE: DATA statement used (Total process time):
      real time           0.40 seconds
      cpu time            0.40 seconds

114   *I almost always use these options;
115   options compress = yes
116           nocenter
117           ps = 9999
118           ls = 200;
119   *or;
120   options compress = yes nocenter ps = 9999 ls = 200;
121   *usually much faster, but not here;
122   %compresstest(nnn=30000);
MPRINT(COMPRESSTEST):   data test;
MPRINT(COMPRESSTEST):   retain x1-x1000 1;
MPRINT(COMPRESSTEST):   do i=1 to 30000;
MPRINT(COMPRESSTEST):   *note use of macro variable;
MPRINT(COMPRESSTEST):   output;
MPRINT(COMPRESSTEST):   yy=xx;
MPRINT(COMPRESSTEST):   *XX is not defined. program runs but gives warning in log;
MPRINT(COMPRESSTEST):   end;
MPRINT(COMPRESSTEST):   run;

NOTE: Variable xx is uninitialized.
NOTE: The data set WORK.TEST has 30000 observations and 1003 variables.
NOTE: Compressing data set WORK.TEST decreased size by 49.94 percent.
      Compressed is 7512 pages; un-compressed would require 15005 pages.
NOTE: DATA statement used (Total process time):
      real time           1.85 seconds
      cpu time            1.29 seconds

123
124   *examples using retain and put _all_ (to log);
125
126   data test;
127     retain x1-x3 1;
128     do i=1 to 3;
129       output;
130       yy='asdfd'; * yy is initialized as a Character variable;
131       yy=1; * Storing 1 as a character variable see notes;
132       put _all_;
133     end;
134   run;

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
      131:5
x1=1 x2=1 x3=1 i=1 yy=1 _ERROR_=0 _N_=1
x1=1 x2=1 x3=1 i=2 yy=1 _ERROR_=0 _N_=1
x1=1 x2=1 x3=1 i=3 yy=1 _ERROR_=0 _N_=1
NOTE: The data set WORK.TEST has 3 observations and 5 variables.
NOTE: Compressing data set WORK.TEST increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

135
136   *Common errors;
137   *I intentionally left in this as an example of what can happen with missing asterisk;
138
139   Data options are very powerful;
140   run;

NOTE: Compression was disabled for data set WORK.OPTIONS because compression overhead would
      increase the size of the data set.
NOTE: Compression was disabled for data set WORK.ARE because compression overhead would
      increase the size of the data set.
NOTE: Compression was disabled for data set WORK.VERY because compression overhead would
      increase the size of the data set.
NOTE: Compression was disabled for data set WORK.POWERFUL because compression overhead would
      increase the size of the data set.
NOTE: The data set WORK.OPTIONS has 1 observations and 0 variables.
NOTE: The data set WORK.ARE has 1 observations and 0 variables.
NOTE: The data set WORK.VERY has 1 observations and 0 variables.
NOTE: The data set WORK.POWERFUL has 1 observations and 0 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

141   *see log to see what happened;
142
143   Data bad;
144   *missing semicolons can cause lots of problems and be hard to debug
145   consider the following code which does not generate an error message;
146   x1=0;
147   retain x1
148   array x x1 x2
149   keep x1 /*keeps only specified variables in output */
150   drop x2 /* should not be used with keep, but does the opposite*/
151   if x1 then stop /* tests whether x1 is nonzero and nonmissing and if so stops */
152   else output bad
153   length z 8; /* allows double precision to be used for z */
154   run;

NOTE: The data set WORK.BAD has 1 observations and 14 variables.
NOTE: Compressing data set WORK.BAD increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

155
156   proc means data =bad; *very popular proc to use;
157   *See anything unusual among the variables?;
158   run;

NOTE: There were 1 observations read from the data set WORK.BAD.
NOTE: The PROCEDURE MEANS printed page 7.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

159
160   *Sorting can take serious time and disk space so try to minimize on large datasets;
161   proc sort data = one out = one;
162     by i;
163   run;

NOTE: There were 10 observations read from the data set WORK.ONE.
NOTE: The data set WORK.ONE has 10 observations and 23 variables.
NOTE: Compressing data set WORK.ONE increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

164   *sloppier but more compact;
165   proc sort data=three; by i; run;

NOTE: There were 2 observations read from the data set WORK.THREE.
NOTE: The data set WORK.THREE has 2 observations and 23 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

166
167   *merging is uniquely powerful in SAS;
168   *Lots of powerful dataset options Here are several in use;
169   data four;
170     merge one (in=in1 keep=x01 i x02) three (obs=10 in=in2 drop=x10);
171     by i;
172     if first.i=0 then count+1; else count=1;
173     if last.i=1 then countlast+1; *automatically retained by implicit retain;
174     if in1=1 and in2=1 then output; *only output people in both files;
175     label countlast = count of distinct people indexed by i;
176     label count = count of records by person i;
177   run;

NOTE: There were 10 observations read from the data set WORK.ONE.
NOTE: There were 2 observations read from the data set WORK.THREE.
NOTE: The data set WORK.FOUR has 0 observations and 24 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

178   *these three procs are your workhorses which you will use a lot for debugging;
179
180   proc print data = four;
181   run;

NOTE: No observations in data set WORK.FOUR.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

182   proc contents data = four;
183   run;

NOTE: PROCEDURE CONTENTS used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

NOTE: The PROCEDURE CONTENTS printed page 8.

184   proc means data = four;
185   run;

NOTE: No observations in data set WORK.FOUR.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

186
187   *Or why not create your own macro to do this efficiently;
188
189   %macro pcm(data=,nobs=100);
190   proc print data = &data (obs=&nobs); run;
191   proc contents data = &data;run;
192   proc means data = &data;run;
193   %mend; *end of macro pcm;
194   %pcm(data=four);
MPRINT(PCM):   proc print data = four (obs=100);
MPRINT(PCM):   run;

NOTE: No observations in data set WORK.FOUR.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

MPRINT(PCM):   proc contents data = four;
MPRINT(PCM):   run;

NOTE: PROCEDURE CONTENTS used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

NOTE: The PROCEDURE CONTENTS printed page 9.

MPRINT(PCM):   proc means data = four;
MPRINT(PCM):   run;

NOTE: No observations in data set WORK.FOUR.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

195   run;
196
197   * here is another example of using macro language;
198   data test;
199     %let list= 1 2 3 4 5;
200     x=3;
201     if x in(1,2,4,5) then zzz=1;
202     if x in( &list. )then zzz=2;
203     * Use PUT to debug and view all variables in data set at a given point in programs
204       slashes put in carriage returns,
205       = shows variable name,
206       _all_ shows all variables;
207
208     put zzz / zzz= / x= _all_;
209   run;

2
zzz=2
x=3 x=3 zzz=2 _ERROR_=0 _N_=1
NOTE: The data set WORK.TEST has 1 observations and 2 variables.
NOTE: Compressing data set WORK.TEST increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

210
211   data five;
212     do i= 1 to 100;
213       x=rannor(12345); *rannor generates N(0,1) random numbers. Lots of other functions;
214       y=x+rannor(1234567); *seed ensures same random numbers generated each run;
215       xsq=x*x;
216       xint=int(x);
217       weight=i;
218       i10=int((i-1)/10);
219       output;
220     end; * end do group over i;
221   run;

NOTE: The data set WORK.FIVE has 100 observations and 7 variables.
NOTE: Compressing data set WORK.FIVE increased size by 50.00 percent.
      Compressed is 3 pages; un-compressed would require 2 pages.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

222
223   *simplest regression;
224   proc reg data = five outest=beta;
225     model y = x;
226   run;
227

NOTE: The data set WORK.BETA has 1 observations and 7 variables.
NOTE: Compressing data set WORK.BETA increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: The PROCEDURE REG printed page 10.
NOTE: PROCEDURE REG used (Total process time):
      real time           0.07 seconds
      cpu time            0.03 seconds

228   proc reg data = five outest=beta TABLEOUT;
229     title 'proc reg model y = x'; *titles are very useful, this is one style;
230     *you can give a name to your model before the model statement see output;
231     Y_on_X_Xsq: model y=x xsq;
232   run;
233
234   * you can add other information to the OUTEST file using various options;
235

NOTE: The data set WORK.BETA has 6 observations and 8 variables.
NOTE: Compressing data set WORK.BETA increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: The PROCEDURE REG printed page 11.
NOTE: PROCEDURE REG used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds

236   proc print data=beta;
237   run;

NOTE: There were 6 observations read from the data set WORK.BETA.
NOTE: The PROCEDURE PRINT printed page 12.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

238
239   Proc glm data=five;
240     class xint; *creates dummy variables for each value;
241     model y = xint;
242     title 'GLM using class xint model y = xint';
243   run;

NOTE: The PROCEDURE GLM printed pages 13-14.
NOTE: PROCEDURE GLM used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

244   Proc glm data=five;
245     class xint; *creates dummy variables for each value;
246     weight i; *weighted regressions;
247     where (i lt 50); *Powerful for subsetting data;
248     model y = xint xsq /solution; *show coefficients;
249     title 'GLM using class xint model y = xint xsq';
250   run;
251

NOTE: The PROCEDURE GLM printed pages 15-16.
NOTE: PROCEDURE GLM used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

252   Proc GENMOD data=five; *lots of nonlinear models to explore;
253     class xint; *creates dummy variables for each value;
254     freq i; *each obs now represents multiple counts ;
255     model y = xint;
256     title 'GENMOD using class xint model y = xint';
257   run;

NOTE: Algorithm converged.
NOTE: The scale parameter was estimated by maximum likelihood.
NOTE: The PROCEDURE GENMOD printed page 17.
NOTE: PROCEDURE GENMOD used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

258
259   *Other useful procs;
260   title ; *Stop using last title;
261   proc univariate data = five;
262     var x xsq xint y;
263   run;

NOTE: The PROCEDURE UNIVARIATE printed pages 18-21.
NOTE: PROCEDURE UNIVARIATE used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

264   proc corr data = five;
265     var x xsq xint y;
266   run;

NOTE: The PROCEDURE CORR printed page 22.
NOTE: PROCEDURE CORR used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

267   proc freq data = five;
268     tables xint i10 xint*i10;
269   run;

NOTE: There were 100 observations read from the data set WORK.FIVE.
NOTE: The PROCEDURE FREQ printed page 23.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

270   proc summary data = five; *very powerful this is a baby example;
271     class xint i10;
272     var x y;
273     output out=six mean(x y)=mnx mny std(x y)=stdx stdy;
274   run;

NOTE: There were 100 observations read from the data set WORK.FIVE.
NOTE: The data set WORK.SIX has 43 observations and 8 variables.
NOTE: Compressing data set WORK.SIX increased size by 50.00 percent.
      Compressed is 3 pages; un-compressed would require 2 pages.
NOTE: PROCEDURE SUMMARY used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

275   proc print data = six;
276   run;

NOTE: There were 43 observations read from the data set WORK.SIX.
NOTE: The PROCEDURE PRINT printed page 24.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

277   title Freq output using formats which are very powerful;
278   proc format;
279     value male
280       0 ='female'
281       1 ='male'
282       2 ='other'
283       -1='missing'
284       -2='invalid'
285     ;
NOTE: Format MALE has been output.

NOTE: PROCEDURE FORMAT used (Total process time):
      real time           0.09 seconds
      cpu time            0.03 seconds

286   proc freq data =five;
287     format xint male.;
288     title proc freq using format to assign label values;
289     tables xint;
290   run;

NOTE: There were 100 observations read from the data set WORK.FIVE.
NOTE: The PROCEDURE FREQ printed page 25.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

291
292   *finally for cluster regressions and various fixed effects, use
293   Proc surveyreg or other survey* options Here is one example;
294
295   proc surveyreg data =five;
296     title 'surveyreg class i10 cluster xint weight weight y=x';
297     class i10; *fixed effects;
298     cluster xint; *clusters with correlated errors;
299     weight weight;
300     model y = x /solution;
301   run;

NOTE: The PROCEDURE SURVEYREG printed page 26.
NOTE: PROCEDURE SURVEYREG used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

302
303   *Enjoy!;

NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time           10.34 seconds
      cpu time            2.48 seconds