I do a lot of SAS programming on large datasets, and thought it would be productive to share some of my programming tips in one place. By large data I mean a dataset so large that it cannot be stored in the available memory. (My largest data file to date is 1.7 terabytes.)
Suggestions and corrections welcome!
Use SAS macro language whenever possible;
It is so much easier to work with a short macro variable than a long variable list, especially when you repeat the same models and data steps;
%let rhs = Age Sex hcc001-hcc394;
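As a rough sketch of how this pays off, the same macro variable can feed several models. The dataset name mydata and the outcome variables cost and visits are hypothetical placeholders, and the sketch assumes every variable in the list is numeric.

```sas
%let rhs = Age Sex hcc001-hcc394;

* mydata, cost, and visits are hypothetical names used only for illustration;
proc reg data=mydata;
   model cost = &rhs;
run;
quit;

* The same right-hand side is reused without retyping the variable list;
proc reg data=mydata;
   model visits = &rhs;
run;
quit;
```

If the variable list changes, only the %let line needs editing.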
Design your programs for efficient reading and writing of files, and minimize temporary datasets.
SAS programs on large data are generally constrained by IO, not by CPU. I have found that some computers with high-speed CPUs and multiple cores are slower than simpler computers because they are not optimized for fast hard drive access. Hard drive speed will likely be your limiting factor.
Clean up your main hard drive if at all possible.
Otherwise you risk SAS crashing when your hard drive gets full. If it does, cancel the job and be sure to delete the temporary SAS datasets that may have been created before you crashed. The SAS default for storing temporary files is something like
C:\Users\"your_user_name".AD\AppData\Local\Temp\SAS Temporary Files
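Rather than guessing the folder, you can ask SAS where the WORK library lives in the current session and clear it when you are done. This is a minimal sketch that assumes nothing in WORK is still needed.

```sas
* Print the folder that the WORK library currently points to;
%put NOTE: WORK is stored in %sysfunc(pathname(work));

* Delete every member of WORK once it is no longer needed;
proc datasets library=work kill nolist;
quit;
```

This only clears the current session. Folders left behind by a crashed session still have to be deleted by hand from the temporary files location.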
Use only internal hard drives for routine programming
Very large files may require storage or backup on external hard drives, but these are incredibly slow. Try to minimize their use for actual project work. Instead, buy more internal drives if possible.
Always try to write large datasets to a different disk drive than the one you read them from.
Do some tests copying large files from
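One possible way to run such a copy test is sketched here. The folders, the librefs, and the dataset name bigfile are all hypothetical, and the folders are assumed to exist already on two different internal drives.

```sas
options fullstimer;  * write detailed real-time and CPU-time statistics to the log;

* Hypothetical folders on two different internal drives;
libname indata  'D:\project\raw';
libname outdata 'E:\project\derived';

* Copy one large dataset (bigfile is a hypothetical name) across drives;
proc copy in=indata out=outdata memtype=data;
   select bigfile;
run;

* Copy the same dataset within a single drive for comparison;
libname samedrv 'D:\project\copytest';
proc copy in=indata out=samedrv memtype=data;
   select bigfile;
run;
```

Compare the real time that the log reports for the two PROC COPY steps.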
Consider using binary compression to save space and time if you have a lot of binary variables.
By default, SAS stores datasets in a fixed rectangular layout that leaves lots of empty space when you use integers instead of real numbers. Although I have long been a fan of using OPTIONS COMPRESS=YES to save space and run time (but not CPU time), I only recently discovered that
OPTIONS COMPRESS=BINARY;
is even better for integers and binary flags when they outnumber real numbers. For some large datasets with lots of zero-one dummies it has reduced my file size by as much as 97%! Standard numeric variables are stored in 8 bytes, which hold 8*8=64 bits. In principle you could store 64 binary flags in the space of one real number. Try saving some files with different compression settings and see if your run times and storage space improve. Note: compression INCREASES file size for real numbers! It seems that compression saves space when binary flags or integers outnumber real numbers;
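A simple way to try this on your own data is to write the same dataset three ways and compare the results. In this sketch work.flags is a hypothetical dataset with many zero-one dummies, and the COMPRESS= dataset option is used instead of the system option so that each copy gets its own setting.

```sas
* work.flags is a hypothetical dataset with many zero-one dummies;
data work.flags_none (compress=no);
   set work.flags;
run;

data work.flags_char (compress=yes);
   set work.flags;
run;

data work.flags_binary (compress=binary);
   set work.flags;
run;

* Compare the size information reported for each version;
proc contents data=work.flags_none;   run;
proc contents data=work.flags_char;   run;
proc contents data=work.flags_binary; run;
```

For the compressed copies the log also prints a note saying by what percent compression decreased or increased the size of the dataset.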
Create a macro file where you store macros that you want to have available anytime you need them. Do the same with your formats;
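A minimal sketch of one way to set this up follows. The file name mymacros.sas, the folder paths, and the libref myfmts are hypothetical, and the formats are assumed to have been saved earlier with PROC FORMAT LIBRARY=myfmts.

```sas
* Hypothetical file and folder names, adjust to your own setup;
%include 'C:\sas\mymacros.sas';      * compiles every macro defined in the file;

libname myfmts 'C:\sas\formats';     * folder holding a permanent format catalog;
options fmtsearch=(myfmts work);     * add myfmts to the format search path;
```

Putting these lines at the top of each program makes the macros and formats available for the rest of that program.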
Use a bell!
My latest addition to my macro list is the following bell macro, which makes sounds.
Put %bell; at the end of a SAS program that you run in batch mode, and you will hear when the program has finished running.
*plays the trumpet call, useful to put at end of batch program to know when the batch file has ended;
*Randy Ellis and Wenjia Zhu November 18 2014;
call sound(392.00,70); *first argument is frequency, second is duration;