Monthly Archives: February 2015

Ellis SAS tips for experienced SAS users

If you are a beginning SAS programmer, then the following may not be particularly helpful, but the books suggested in the middle may be. BU students can obtain a free license for SAS to install on their own computer if it is required for a course or research project; either case requires an email from an adviser. SAS is also available on various computers in the economics department computer labs.

I also created a posting, Ellis SAS tips for new SAS programmers.

I do a lot of SAS programming on large datasets, and thought it would be productive to share some of my programming tips on SAS in one place. By large data I mean a dataset so large that it cannot be stored in the available memory. (My largest data file to date is 1.7 terabytes.)

Suggestions and corrections welcome!

Use SAS macro language whenever possible;

It is so much easier to work with short strings than long lists, especially with repeated models and datasteps;

%let rhs = Age Sex HCC001-HCC394;
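For example, here is a minimal sketch assuming a hypothetical dataset WORK.CLAIMS that contains Age, a numerically coded Sex, HCC001-HCC394 and two hypothetical outcomes, TOTPAY and LOGPAY. Both models pick up the same right-hand side, so editing the one %LET line changes every model at once.

```sas
%let rhs = Age Sex HCC001-HCC394;      * define the covariate list once;
%put NOTE: rhs resolves to &rhs;       * check the resolved list in the log;

* Hypothetical dataset and outcomes; both models share the same covariates;
proc reg data=work.claims;
  model totpay = &rhs;
run;
proc reg data=work.claims;
  model logpay = &rhs;
run;
quit;
```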

 

Design your programs for efficient reading and writing of files, and minimize temporary datasets.

SAS programs on large data are generally constrained by IO (input/output: reading from and writing to your hard drives), not by CPU (actual calculations) or memory (storage that disappears once your SAS program ends). I have found that some computers with high-speed CPUs and multiple cores are slower than simpler computers because they are not optimized for speedy hard drives. Large memory really helps, but really huge files can exceed it, and then your hard drive speeds will really matter. Even when simply reading in and writing out files, hard drive speed will be your limiting factor.

The implication is that you should create variables in as few DATA steps as possible and minimize sorts, since reading and saving datasets takes a lot of time. This requires a real change in thinking from STATA, which is designed for changing one variable at a time on a rectangular file. Recall that STATA can do this efficiently because it usually starts by bringing the full dataset into memory before making any changes. SAS does not do this, which is one of its strengths: it can process files far larger than memory.

Learning to use DATA steps and PROC SQL is the central advantage of an experienced SAS programmer. Invest, and you will save time waiting for your programs to run.
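To make the one-pass idea concrete, here is a minimal sketch assuming a hypothetical dataset WORK.MASTER with numeric variables AGE, SEX (1/2 coding assumed) and TOTPAY. The first three steps each read and rewrite the whole file. The single step at the end does the same work in one pass.

```sas
* Slower: three passes, each reading and rewriting the entire file;
data work.master; set work.master; agesq = age*age; run;
data work.master; set work.master; female = (sex = 2); run;
data work.master; set work.master; logpay = log(totpay + 1); run;

* Faster: one pass creates all three variables;
data work.master;
  set work.master;
  agesq  = age*age;
  female = (sex = 2);          * logical expression gives 1 if true, 0 if false;
  logpay = log(totpay + 1);
run;
```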

Clean up your main hard drive if at all possible.

Otherwise you risk SAS crashing when your hard drive gets full. If it does, cancel the job and be sure to delete the temporary SAS datasets that may have been created before you crashed. The SAS default for storing temporary files is something like

C:\Users\<your_user_name>.AD\AppData\Local\Temp\SAS Temporary Files

Unless you have SAS currently open, you can safely delete all of the files stored in that directory. Ideally, there should be none since SAS deletes them when it closes normally. It is the abnormal endings of SAS that cause temporary files to be saved. Delete them, since they can be large!
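To find the folder on your own machine, here is a short sketch. The PATHNAME function returns the physical path of a library, so the line below writes the current session's WORK folder to the log. On Windows this is typically a subfolder of the SAS Temporary Files directory, which is where leftovers from crashed sessions accumulate. Check the exact layout on your own system.

```sas
* Write the physical folder behind this session's WORK library to the log;
%put NOTE: WORK folder is %sysfunc(pathname(work));
```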

Change the default hard drive for temporary files and sorting

If you have a large internal secondary hard drive with lots of space, then change the SAS settings so that it uses temp space on that drive for all work files and sorting operations.

To change this default location to a different internal hard drive, find your sasv9.cfg file, which is in a location like

"C:\Program Files\SASHome\x86\SASFoundation\9.3\nls\en"

or

"C:\Program Files\SASHome2-94\SASFoundation\9.4\nls\en"

Find the line in the config file that starts with -WORK and change it to your own location for the temporary files. Do the same for -UTILLOC, adding that line if it is not already there. Mine are on drives J and K:

-WORK "k:\data\temp\SAS Temporary Files"

-UTILLOC "j:\data\temp\SAS Temporary Files"

The first line sets where SAS stores its temporary work datasets, such as WORK.ONE, which you create with DATA ONE;

The second line sets where SAS stores its own utility files, such as those created when sorting a file or saving residuals.

There is a reason to put the WORK and UTILLOC files on different drives. SAS is then generally reading from one drive and writing to a different one, rather than reading and writing on the same drive. Try to avoid the latter. Run some tests on your own computer to see how much time you can save by splitting the work across two drives instead of using only one.
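After editing sasv9.cfg and restarting SAS, you can confirm that the new locations took effect with PROC OPTIONS, which writes the current setting of each option to the log. Here is a short sketch.

```sas
* Report the current WORK and UTILLOC settings in the log;
proc options option=work;    run;
proc options option=utilloc; run;
```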

Use only internal hard drives for routine programming

Very large files may require storage or backup on external hard drives, but these are incredibly slow. In my experience, external drives are three to ten times slower than an internal hard drive. Try to minimize their use for actual project work. Instead, buy more internal drives if possible. You can purchase additional internal hard drives with 2 TB of space for under $100. You save that much in time the first day!

Always try to write large datasets to a different disk drive than you read them in from.

Run some tests copying large files from C: to C: and from C: to F:. You may not notice any difference until the files get truly huge, larger than your memory.
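Here is one way to run that test inside SAS, as a sketch. The FULLSTIMER option makes the log report real time, CPU time and memory for each step, so you can compare a copy that stays on C: with one that goes to F:. The librefs, the folders (which must already exist) and the dataset BIG are all hypothetical placeholders.

```sas
options fullstimer;   * log real time, CPU time and memory for every step;

* Hypothetical folders; create them first and adjust to your own drives;
libname srcc "c:/data/speedtest";        * holds an existing large dataset BIG;
libname dstc "c:/data/speedtest/copy";   * same drive as the source;
libname dstf "f:/data/speedtest/copy";   * different drive;

* Same drive: read from C:, write to C:;
data dstc.big; set srcc.big; run;

* Different drives: read from C:, write to F:;
data dstf.big; set srcc.big; run;
```

Compare the "real time" lines for the two DATA steps in the log.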

Consider using binary compression to save space and time if you have a lot of binary variables.

By default, SAS stores datasets in a fixed rectangular format that leaves lots of empty space when you store integers instead of real numbers. Although I have long been a fan of using OPTIONS COMPRESS=YES to save space and run time (but not CPU time), I only recently discovered that

OPTIONS COMPRESS=BINARY;

is even better for integers and binary flags when they outnumber real numbers. For some large datasets with lots of zero/one dummies it has reduced my file size by as much as 97%! Standard numeric variables are stored in 8 bytes, which is 8*8=64 bits, so in principle you could store 64 binary flags in the space of one real number. Try saving some files with different compression settings and see if your run times and storage space improve. Note: compression INCREASES file size for real numbers! It seems that compression saves space when binary flags outnumber real numbers or integers;

Try various permutations of the following on your computer with your actual data to see what saves time and space;

data real;    retain x1-x100 1234567.89101112; do i = 1 to 100000; output; end; run;
proc means data=real; run;

data dummies; retain d1-d100 1;                do i = 1 to 100000; output; end; run;
proc means data=dummies; run;

*try various datasteps with this, using the same or different drives. Bump up the obs to see how times change.
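Here is a sketch of such a comparison. The COMPRESS= dataset option lets you write the DUMMIES dataset from above with each setting in a single program. The output dataset names are arbitrary. For compressed datasets, the log notes how much compression changed the size, and PROC CONTENTS reports the compression setting and number of data set pages.

```sas
* Same data, three storage choices (COMPRESS= used as a dataset option);
data d_none (compress=no);     set dummies; run;
data d_char (compress=yes);    set dummies; run;
data d_bin  (compress=binary); set dummies; run;

* Compare the compression settings and page counts;
proc contents data=d_none; run;
proc contents data=d_char; run;
proc contents data=d_bin;  run;
```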

 

Create a macro file where you store macros that you want to have available anytime you need them. Do the same with your formats;

options nosource;
%include "c:/data/projectname/macrofiles";
%include "c:/data/projectname/allformats";
options source;
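Formats can also be compiled once and stored permanently, so you do not have to rerun them with %INCLUDE in every session. Here is a minimal sketch, assuming a hypothetical folder c:/data/projectname/formats. PROC FORMAT with LIBRARY= writes the formats to a catalog there, and the FMTSEARCH= option tells SAS where to look. The LIBNAME and OPTIONS lines could themselves go in your included file.

```sas
libname fmt "c:/data/projectname/formats";   * hypothetical folder for the catalog;

* Stored in FMT.FORMATS instead of the temporary WORK.FORMATS;
proc format library=fmt;
  value $sex '1'   = 'Male'
             '2'   = 'Female'
             other = 'Missing/Unknown';
run;

* WORK and LIBRARY are searched first, then the FMT library;
options fmtsearch=(fmt);
```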

Be aware of which SAS procs create large, intermediate files

Some but not all procs create huge temporary datasets.

Consider PROC REG and PROC GLM. They generate all of their results in one pass through the data unless you have an OUTPUT statement. With an OUTPUT statement, they create large, uncompressed, temporary files that can be a multiple of your original file size. PROC SURVEYREG and PROC MIXED create large intermediate files even without an OUTPUT statement. Plan accordingly.

Consider using OUTEST=BETA to more efficiently create residuals together with PROC SCORE.

Compare two ways of making residuals;

*make test dataset with ten million obs, but trivial model;

data test;
do i = 1 to 10000000;
retain junk1-junk100 12345;  * it is carrying along all these extra variables that slows SAS down;
x = rannor(234567);
y = x+rannor(12345);
output;
end;

Run;    * 30.2 seconds;
*Straightforward way; Times on my computer shown following each step;
proc reg data = test;
y: model y = x;
output out=resid (keep=resid) residual=resid;
run;  *25 seconds;
proc means data = resid;
run;  *.3 seconds;

*total of the above two steps is 25.6 seconds;

proc reg data = test outest=beta ;
resid: model y = x;
run;                     *3.9 seconds;
proc print data = beta;
run;  *take a look at beta that is created;
proc score data=test score=beta type=parms
out=resid (keep=resid) residual;
var y x;   * include the dependent variable so RESIDUAL gives y minus predicted y;
run;       *6 seconds!;
proc means data = resid;
run;  *.3 seconds;

*total from the second method is 10.3 seconds versus 25.6 on the direct approach, PLUS no temporary files that might crash the system had to be created;

If the model statement in both regressions is

y: model y = x junk1-junk100; *note that all of the junk has coefficients of zero, but SAS does not know this going in;

then the two times are

Direct approach:    1:25.84
Scoring approach:  1:12.46 on regression plus 9.01 seconds on score = 1:21.47 which is a smaller savings

On very large files the time savings are even greater because of reduced IO. In this "small" sample, SAS was still able to do this without writing to the hard drive on my computer. But the real savings is in temporary storage space.

Use a bell!

My latest addition to my macro list is the following bell macro, which makes sounds.

Put %bell; at the end of a SAS program that you run in batch, so that you notice when the program has finished running.

%macro bell;
*plays the trumpet call, useful to put at end of batch program to know when the batch file has ended;
*Randy Ellis and Wenjia Zhu November 18 2014;
data _null_;
call sound(392.00,70); *first argument is frequency, second is duration;
call sound(523.25,70);
call sound(659.25,70);
call sound(783.99,140);
call sound(659.25,70);
call sound(783.99,350);
run;
%mend;
%bell;

Purchase essential SAS programming guides.

I gave up on purchasing paper copies of SAS manuals, because they take up more than two feet of shelf space and are still not complete or up to date. I find the SAS help menus useful but clunky. I recommend the following if you are going to do serious SAS programming. Buy them used on Amazon or wherever. I would get an older edition; it will cost less than $10 each. Really.

The Little SAS Book: A Primer, Fifth Edition (or an earlier one)

Nov 7, 2012

by Lora Delwiche and Susan Slaughter

Beginner's introduction to SAS. Probably the best single book to buy when learning SAS.

 

Professional SAS Programmer’s Pocket Reference

By Rick Aster

http://www.amazon.com/Professional-SAS-Programmers-Pocket-Reference/dp/189195718X

Wonderful, concise summary of all of the main SAS commands, although you will have to already know SAS to find it useful. I use it to look up specific functions, macro commands, and options on various procs because it is faster than using the help menus. But I am old style…

Professional SAS Programming Shortcuts: Over 1,000 ways to improve your SAS programs

By Rick Aster

http://www.amazon.com/Professional-SAS-Programming-Shortcuts-programs/dp/1891957198/ref=sr_1_1?s=books&ie=UTF8&qid=1417616508&sr=1-1&keywords=professional+sas+programming+shortcuts

I don’t use this as much as the above, but if I had time, and were learning SAS instead of trying to rediscover things I already know, I would read through this carefully.

Get in the habit of deleting most intermediate permanent files

Delete files if either

1. You won’t need them again or

2. You can easily recreate them again.  *this latter point is usually true;

Beginning programmers tend to save too many intermediate files. Usually it is easier to rerun the entire program than to save the intermediate files. Give your final file of interest a name like MASTER or FULL_DATA and keep modifying it by adding variables, instead of creating new files with names like SORTED, STANDARDIZED, RESIDUAL, FITTED.

Consider a macro that helps make it easy to delete files.

%macro delete(library=work, data=temp, nolist=);

proc datasets library=&library &nolist;
delete &data;
run;
quit;   *PROC DATASETS is interactive, so end it with QUIT;
%mend;

*sample macro calls;

%delete (data=temp);   *for temporary work files, you can also list multiple file names, but these disappear anyway at the end of your run;

%delete (library=out, data=one two three);   *deletes three files in the library OUT;

%delete (library=out, data=one, nolist=nolist);   *NOLIST gets rid of the list in the output;
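Relatedly, if you want to empty the whole WORK library partway through a long program, PROC DATASETS has a KILL option. Here is a short sketch. Note that KILL removes every member, including the WORK.FORMATS catalog if you created temporary formats, so use it with care.

```sas
* Delete every member of WORK without listing them in the output;
proc datasets library=work kill nolist;
quit;
```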

 

 

Ellis SAS tips for New SAS programmers

There is also a posting on Ellis SAS tips for Experienced SAS programmers

It focuses on issues when using large datasets.

 

Randy’s SAS hints for New SAS programmers, updated Feb 21, 2015

1. ALWAYS begin and intermix your programs with internal documentation. (Note how I combined six forms of emphasis in ALWAYS: color, larger font, caps, bold, italics, underline.) Normally I recommend only one, but documenting your programs is really important. (Using only one form of emphasis is also important, just not really important.)

A simple example to start your program in SAS is

******************
* Program = test1, Randy Ellis, first version: March 8, 2013 – test program on sas features
***************;

Any comment starting with an asterisk and ending in a semicolon is ignored;

 

2. Most common errors and causes of wasted time while programming in SAS.

a. Forgetting semicolons at the end of a statement

b. Omitting a RUN statement, and then waiting for the program to run.

c. Unbalanced single or double quotes.

d. Unintentionally commenting out more code than you intend to.

e. Foolishly running a long program on a large dataset that has not first been tested on a tiny one.

f. Trying to print out a large dataset which will overflow memory or hard drive space.

g. Creating an infinite loop in a DATA step. Here is a silly one; real ones are usually much harder to identify.

data infinite_loop;   *do not run: this loop never ends;
x=1;
nevertrue=0;
do while (x=1);
if nevertrue =1 then x=0;
end;
run;

h. There are many other common errors and causes of wasted time. I am sure you will find your own.

 

3. With big datasets, 99% of the time it pays to use the following system OPTIONS:

 

options compress =yes nocenter;

or

options compress =binary nocenter;

Binary compression works particularly well with many binary dummy variables and is sometimes spectacular, saving 95%+ on storage space and hence on time.

 

/* mostly use */
options nocenter /* SAS sometimes spends many seconds figuring out how to center large print outs of
data or results. */
ps=9999               /* avoid unneeded headers and page breaks that split up long tables in output */
ls=200;                /* some procs like PROC MEANS give less output if a narrow line size is used */
 

*other key options to consider;

Options obs = max   /* or obs=100; MAX = no limit on the number of obs processed */
Nodate nonumber /* useful if you don’t want SAS to embed headers at top of each page in listing */
Symbolgen     /* show how macro variables resolve */
Mprint   /* show the SAS code generated by running the macros */
nosource /* suppress source code from long log */
nonotes   /* be careful, but can be used to suppress notes from log for long macro loops */

;                       *remember to always end with a semicolon!;

 

4. Use these three key procedures regularly

Proc contents data=test; run; /* shows a summary of the file similar to Stata’s DESCRIBE */
Proc means data = test (obs=100000); run; /* set a max obs if you don’t want this to take too long */
Proc print data = test (obs=10); run;

 

I recommend you create and use regularly a macro that does all three easily:

%macro cmp(data=test);
Proc Contents data=&data; Proc means data = &data (obs=1000); Proc print data = &data (obs=10); run;
%mend;

Then do all three (contents, means, print ten obs) with just

%cmp(data = mydata);

 

5. Understand temporary versus permanent files;

Data one;   creates a work.one temporary dataset that disappears when SAS terminates;

Data out.one; creates a permanent dataset in the out directory that remains even if SAS terminates;

 

Define libraries (or directories):

Libname out "c:/data/marketscan/output";
Libname in "c:/data/marketscan/MSdata";

Output or data can also be read from or written to external text files:

Filename textdata "c:/data/marketscan/textdata.txt";
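Here is a sketch of using the TEXTDATA fileref defined above, assuming a hypothetical dataset OUT.ONE with numeric variables IDNO and AGE. The FILE statement sends PUT output to the text file, and INFILE reads it back.

```sas
* Write IDNO and AGE to the text file, one line per observation;
data _null_;
  set out.one;          * hypothetical dataset;
  file textdata;        * PUT now writes to TEXTDATA instead of the log;
  put idno age;
run;

* Read the same file back into a new dataset;
data work.fromtext;
  infile textdata;
  input idno age;
run;
```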

 

6. Run tests on small samples to develop programs, and then toggle between tiny and large samples once they are debugged.

A simple way is

Options obs =10;
*options obs = max; *only use this when you are sure your programs run;
 

OR: some procedures, and DATA steps using the END= option, do not work well on partial samples. For those I often toggle between two different input libraries. Create a subset image of all of your data in a separate directory and then toggle using the LIBNAME statements;

 

*Libname in 'c:/data/projectdata/fulldata';
Libname in 'c:/data/projectdata/testsample';

 

Time spent creating a test data set is time well spent.
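Here is one possible way to build such a test sample, as a sketch. It keeps every record for about 1 percent of people, so merges and BY-group logic still behave as they do on the full data. The libref SAMPLE, the dataset CLAIMS and the numeric person identifier IDNO are hypothetical. Applying the same MOD rule to every dataset keeps the same people in each one.

```sas
libname full   'c:/data/projectdata/fulldata';
libname sample 'c:/data/projectdata/testsample';

* Keep all records for roughly 1% of people (assumes numeric IDNO);
data sample.claims;
  set full.claims;
  if mod(idno, 100) = 0;
run;
```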

You could even write a macro to make it easy. (I leave it as an exercise!)

 

7. Use arrays abundantly. You can use different array names to reference the same set of variables. This is very convenient;

 

%let rhs=x1 x2 y1 y2 count more;
Data _null_;
Array X {100} X001-X100; *usual form;
Array y {100} ;                     * creates y1-y100;
Array xmat {10,10} X001-X100; *matrix notation allows two dimensional indexes;
Array XandY {*} X001-X100 y1-y100 index ; *useful when you don’t know the count of variables in advance;
Array allvar &rhs. ;     *implicit arrays can use implicit indexes;
 

*see various ways of initializing the array elements to zero;

Do i = 1 to 100; x{i} = 0; end;
 

Do i = 1 to dim(XandY); XandY{i} = 0; end;

 

Do over allvar; allvar = 0; end;   *sometimes this is very convenient;

 

Do i=1 to 100 while (y(i) = . );
y{i} = 0;   *do while and do until are sometimes useful;
end;

 

run;

8. For some purposes, naming array variables with leading zeros improves the sort order of variables

Use:
Array x {100} X001-X100;
not
Array x {100} X1-X100;

With the second, the alphabetically sorted variables are x1, x10, x100, x11, x12, ..., x19, x2, x20, etc.

 

9. Learn SET versus MERGE (UPDATE is for rare, specialized use)

 

Data three;   *information on the same person combined into a single record;
Merge ONE TWO;
BY IDNO;
Run;
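To contrast the two, here is a minimal sketch using the same hypothetical datasets ONE and TWO. SET stacks the files one after the other, while MERGE lines them up by IDNO. The IN= dataset option flags which input contributed to each observation, which is handy for keeping only matched records.

```sas
* SET: all observations from ONE, then all from TWO;
data stacked;
  set one two;
run;

* MERGE: both inputs must be sorted by the BY variable;
proc sort data=one; by idno; run;
proc sort data=two; by idno; run;
data matched;
  merge one (in=in1) two (in=in2);
  by idno;
  if in1 and in2;       * keep only IDNOs present in both files;
run;
```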

 

10. Learn key dataset options like

Obs=
Keep=
Drop=
In=
Firstobs=
Rename=(oldname=newname)
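End= (strictly an option of the SET or MERGE statement rather than a dataset option, but just as useful)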
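Several of these can be combined in one step. Here is a sketch assuming a hypothetical dataset IN.CLAIMS with variables IDNO, FROMDATE and PAID. Note that KEEP= is applied before RENAME=, so KEEP= uses the old name.

```sas
data work.sample;
  set in.claims (firstobs=101 obs=200      /* observations 101 through 200 */
                 keep=idno fromdate paid
                 rename=(paid=totpay))
      end=lastobs;                         /* END= flag is 1 on the last observation */
  if lastobs then put 'NOTE: last observation reached, ' _n_=;
run;
```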

 

11. Keep files being sorted "skinny" by using KEEP= or DROP= dataset options

Proc sort data = IN.BIG(keep=IDNO STATE COUNTY FROMDATE) out=out.bigsorted;
BY STATE COUNTY IDNO FROMDATE;
Run;

Also consider NODUP and NODUPKEY options to sort while dropping duplicate records, on all or on BY variables, respectively.
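Here is a sketch of NODUPKEY for making a one-record-per-person file from the same hypothetical IN.BIG. DUPOUT= writes the discarded records to a separate dataset so you can inspect them. Because PROC SORT keeps the original order within BY groups by default, the record kept for each IDNO is the first one encountered.

```sas
proc sort data=in.big (keep=idno state county)
          out=work.people dupout=work.dups nodupkey;
  by idno;
run;
```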

 

12. Take advantage of BY group processing

Use FIRST.var and LAST.var abundantly.
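For example, here is a minimal sketch that collapses a hypothetical claims file (IDNO, FROMDATE, numeric PAID) to one record per person. FIRST.IDNO resets the counters and LAST.IDNO writes out the totals.

```sas
proc sort data=in.claims out=work.claims; by idno fromdate; run;

data work.person (keep=idno nclaims totpaid);
  set work.claims;
  by idno;
  if first.idno then do;     * first record for this person: reset counters;
    nclaims = 0;
    totpaid = 0;
  end;
  nclaims + 1;               * sum statement: automatically retained;
  totpaid + paid;            * sum statement treats a missing PAID as 0;
  if last.idno then output;  * one output record per person;
run;
```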

 

Use special variables:
_N_ = number of times the DATA step has iterated (usually the current observation number)
_ALL_ = set of all variables, as in PUT _ALL_; or, when used with PROC CONTENTS, the set of all datasets in a library.

 

Also valuable is

PROC CONTENTS data = in._all_; run;

 

13. Use lots of comments

 

* this is a standard SAS comment that ends with a semicolon;

 

/*   a PL1 style comment can comment out multiple lines including ordinary SAS comments;

* Like this; */

 

%macro junk; Macros can even comment out other macros or other pl1 style comments;

/*such as this; */ * O Boy!; %macro ignoreme;   %mend; *very powerful;

 

%mend; * end macro junk;

 

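14. Use meaningful file names!

Names like DATA ONE TWO THREE can be useful for short-lived work files, but give the files you keep names that describe their contents.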

 

15. Put internal documentation about what the program does, who did it and when.
16. Learn basic macro language; see the SAS program demo for examples. Know the difference between executable and declarative statements used in the DATA step.
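As a small macro-language example, here is a sketch that repeats the same PROC MEANS over a run of years. The yearly datasets IN.CLAIMS2010 through IN.CLAIMS2012 and the variable PAID are hypothetical. %DO and %TO are macro statements that generate SAS code, while the PROC steps inside are ordinary SAS.

```sas
%macro yearly(first=2010, last=2012);
  %do year = &first %to &last;
    title "Claims summary for &year";
    proc means data=in.claims&year;   * resolves to IN.CLAIMS2010, etc.;
      var paid;
    run;
  %end;
  title;                              * clear the title afterwards;
%mend yearly;

%yearly(first=2010, last=2012);
```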

 

17. EXECUTABLE COMMANDS USED IN DATA STEP (Actually DO something, once for every record)

 

Y=y+x (assignment. In STATA you would use REPLACE Y=Y+X, or GEN to create a new variable)

Do I = 1 to 10;
End; (always paired with DO; DO groups can be nested to nearly unlimited depth)

 

INFILE 'c:/data/MSDATA/claimsdata.txt';      *define where INPUT statements read from;
FILE 'c:/data/MSDATA/mergeddata.txt';        *define where PUT statements write to;

 

Goto johnny;      * always avoid. Use do groups instead;

 

IF a=b THEN y=0 ;
ELSE y=x; * be careful with multiple IF statements;
CALL subroutine(); (Subroutines are OK, Macros are better)

 

INPUT   X ; (read X from the current line of the INFILE)
PUT   x y= / z date.; (write out results to the LOG, or to the current FILE)

 

MERGE IN.A IN.B ;
BY IDNO;         *   Match up with BY variable IDNO as you simultaneously read in A&B;

Both files must already be sorted by IDNO.

SET A B;                                           * read in order, first all of A, and then all of B;

UPDATE   A B; *replace variables with new values from B only if non-missing in B (requires a BY statement);

 

OUTPUT out.A;      *Write out one obs to out.A SAS dataset;
OUTPUT;                *Writes out one obs of every output file being created;

DELETE;   * do not output this record, and return to the top of the datastep;

STOP;                               * ends the current SAS datastep;

 

18. Declarative statements for the DATA step are only processed once, at the start of the DATA step (when it is compiled), not once for every record

 

DATA ONE TWO IN.THREE;

*This would create three data sets, named ONE TWO and IN.THREE

Only the third one will be kept once SAS terminates.;

Array x {10} x01-x10;
ATTRIB x length=8 Abc length=$8;   *numeric variables can be at most 8 bytes long;
RETAIN COUNT 0;
BY state county IDNO;
Also consider
BY DESCENDING IDNO; or BY IDNO NOTSORTED; if grouped but not sorted by IDNO;
DROP i;   * do not keep i in the final dataset, although it can still be used while the DATA step is running;
KEEP IDNO AGE SEX; *this will drop all variables from output file except these three;
FORMAT x date.;   *permanently link the format DATE. to the variable x;

INFORMAT ABC $4.;

LABEL AGE2010 = "Age on December 31 2010";
LENGTH x 8; *must come before the first reference to the variable;
RENAME AGE = AGE2010; *the output dataset uses the new name AGE2010, but within this DATA step you still refer to AGE;
OPTIONS OBS=100; *one of many system options, also processed only once;

 

19. Key Systems language commands

LIBNAME to define libraries
FILENAME to define specific files, such as for text data to input or output text

TITLE THIS TITLE WILL APPEAR ON ALL OUTPUT IN LISTING until a new title line is given;

%INCLUDE

%LET year=2011;

%LET ABC = "Randy Ellis";

 

20. Major procs you will want to master

DATA step !!!!! Counts as a procedure;

PROC CONTENTS

PROC PRINT

PROC MEANS

PROC SORT

PROC FREQ                      frequencies

PROC SUMMARY      (Can be done using MEANS, but easier)

PROC CORR (Can be done using Means or Summary)

PROC REG       OLS or GLS

PROC GLM   General Linear Models with automatically created fixed effects

PROC FORMAT /INFORMAT

PROC UNIVARIATE

PROC GENMOD nonlinear models

PROC SURVEYREG clustered errors

None of the above will execute unless a new PROC or DATA step is started OR you include a RUN; statement.

21. Formats are very powerful. Here is an example from the MarketScan data. One use is to simply recode variables so that richer labels are possible.

 

Another use is to look up or merge on other information in large files.

 

Proc format;
value $region
'1'='1-Northeast Region           '
'2'='2-North Central Region       '
'3'='3-South Region               '
'4'='4-West Region               '
'5'='5-Unknown Region             '
;

value $sex
'1'='1-Male           '
'2'='2-Female         '
other=' Missing/Unknown'
;
run;

 

*Three different uses of formats;

Data one ;
sex='1';
region=1;
Label sex = 'patient sex =1 if male';
label region = 'census region';
run;

Proc print data = one;

Run;

 

data two;
set one;
Format sex $sex.; * permanently assigns sex format to this variable and stores format with the dataset;
Run;

Proc print data = two;
Run;

Proc contents data = two;
Run;

*be careful if the format is very long!;

 

Data three;
Set one;
Charsex=put(sex,$sex.);
Run;

*maps sex into the label and saves the text strings as a new variable. Be careful: these can be very long;

Proc print data =three;
Run;

 

Proc print data = one;
Format sex $sex.;
*this is almost always the best way to use formats: Only on your results of procs, not saved as part of the datasets;
Run;
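Formats can also handle the lookup use mentioned above. One common approach, sketched here under assumptions, is to build a format from a lookup dataset with the CNTLIN= option of PROC FORMAT. You then use PUT() to attach the looked-up value to a large file without sorting or merging it. The lookup dataset WORK.DRUGS (character NDC and DRUGNAME), the format name $NDCNAME and IN.CLAIMS are hypothetical.

```sas
* Turn the lookup table into a CNTLIN dataset: START, LABEL, FMTNAME, TYPE;
data work.ndcfmt;
  set work.drugs (rename=(ndc=start drugname=label));
  retain fmtname '$ndcname' type 'C';
run;

proc sort data=work.ndcfmt nodupkey; by start; run;   * each START value must be unique;
proc format cntlin=work.ndcfmt; run;

* "Merge" the drug name onto a large file in a single pass, no sort needed;
data work.claims2;
  set in.claims;
  drugname = put(ndc, $ndcname.);
run;
```

Codes that are not in the lookup table come through as the original code value, so check for those before relying on the result.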

 

If you are trying to learn SAS on your own, then I recommend you buy:

The Little SAS Book: A Primer, Fifth Edition (or an earlier one)

Nov 7, 2012

by Lora Delwiche and Susan Slaughter

Beginner's introduction to SAS. Probably the best single book to buy when learning SAS.

Deflategate pressure drop is consistent with a ball air temperature of 72 degrees when tested initially.


I revised my original Deflategate posting after learning that it is absolute air pressure, not pressure above standard sea level pressure, that follows the Ideal Gas Law. I also allowed for stretching of the leather once the ball becomes wet, and for the possibility that the cold rain was colder (45 degrees F) than the recorded air temperature of 53 degrees F. Together these adjustments make it even easier for the weather to fully explain the drop in ball pressure.

My Bottom Line: The NFL owes the Patriot Nation and Bob Kraft a big apology.

Correction #1: My initial use of the ideal gas formula did not recognize that it is absolute pressure, not pressure above the ambient air pressure, that matters. A ball with a pressure of 12.5 PSI is actually 12.5 PSI above the surrounding air pressure, which is about 14 PSI at sea level. So a decline from 12.5 PSI to 10.5 PSI is actually a decline in absolute pressure from only 26.5 to 24.5 PSI; the pregame pressure is just 26.5/24.5 - 1 = 8.2 percent higher. This makes it much easier for temperature changes to explain the difference in ball pressure. Only an 8.2 percent change in absolute temperature (approximately a 42 degree Fahrenheit drop) would be required if that were the only change needed.
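Here is a worked check of that arithmetic. If the ball's volume and the amount of air are held fixed, PV = nRT makes absolute pressure proportional to absolute temperature, measured here as degrees F plus 460 (the approximation used in the revised calculations). T1 is the temperature at testing and T0 is the 53 degree game-time temperature.

```latex
\frac{T_1}{T_0} = \frac{P_1}{P_0} = \frac{12.5 + 14}{10.5 + 14} = \frac{26.5}{24.5} \approx 1.082,
\qquad
T_1 \approx 1.082 \times (53 + 460) \approx 555
\;\Rightarrow\; 555 - 460 \approx 95^\circ\mathrm{F}
```

The balls would therefore have had to be about 95 - 53 = 42 degrees warmer when tested, matching the figure above.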

Correction #2: It is well established that water allows leather to stretch. I found one site that noted that water can allow leather to stretch by 2-5% when wet.  It does not specify how much force is needed to achieve this.

https://answers.yahoo.com/question/index;_ylt=A0LEVvwgfs9UP0AAr40nnIlQ?qid=20060908234923AAxt7xP

It is plausible that a new ball made of leather under pressure (scuffed up to let in the moisture quickly) might stretch 1 percent upon getting wet (such as in the rain). Since volume goes up with the cube of this stretching, this would be a (1.01)^3 - 1 = 3 percent increase in ball volume, or decline in pressure. This amount would reduce the absolute temperature difference needed for the 2 PSI drop to only 5.2 percent (a change of only 27 degrees F).

Correction #3: It was raining on game day, and the rain was probably much colder than the outside air temperature. So it is plausible that the game ball was as cold as 45 degrees Fahrenheit at game time when the low ball pressures were detected. This makes even lower initial testing temperatures consistent with the professed levels of underinflation.

A single formula can be used to calculate the ball temperature at initial testing that would explain a game-time ball pressure 2 PSI lower. It allows for the ball getting colder (to 45 degrees F), shrinking in volume by 0.4 percent (since ball volume shrinks when cold), and stretching 1% due to rain. It would be

Pregame testing temperature in F = (pressure change as a ratio) / (volume change due to cold) / (volume change due to leather stretching 1% when wet) * (45 degree ball temperature during game + 460 degrees) - 460 degrees

(12.5+14)/(10.5+14)/(0.996)/(1.01^3)*(45+460) - 460 = 72 degrees Fahrenheit
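Here is the same formula worked step by step. The 0.996 volume factor and the 1 percent stretch are the assumptions stated above, and temperatures are degrees F plus 460.

```latex
T_{\text{test}} + 460
  = \frac{26.5 / 24.5}{0.996 \times 1.01^{3}}\,(45 + 460)
  \approx \frac{1.0816}{0.996 \times 1.0303} \times 505
  \approx 1.0540 \times 505
  \approx 532.3
\;\Rightarrow\; T_{\text{test}} \approx 72^\circ\mathrm{F}
```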

Given this math, it would have been surprising if the ball pressure had NOT declined significantly.

Final comment #1: All of these calculations and hypotheses can be tested empirically. See the empirical analysis done by Headsmart Labs (http://www.headsmartlabs.com). They find that rain plus a 25 degree drop is consistent with a 1.82 PSI decrease.

Final comment #2: Since the original game balls were reinflated by officials during halftime, the true ball pressures during the first half will never be known. Moreover there seems to be no documentary record of their pressures at the time they were re-inflated.

Super Bowl XLIX was a terrific game from the point of view of Patriots fans. Now it is time for the NFL to own up to its own mistake in accusing the Patriots of cheating. It was just a matter of physics.

Revised calculations

 

Various combinations of testing temperatures and PSI

Columns B-G: adjustments for temperature only, correcting for absolute pressure at 14 PSI at sea level. Columns H-L: adjustments for changes in ball volume. Columns M-O: adjusting for temperature and football volume. Columns E-G and M-O show various game time or testing PSI readings.

| A | B: Temperature (°F) | C: Degrees above absolute zero | D: Temperature adjustment | E: PSI reading | F: PSI reading | G: PSI reading | H: Surface area | I: Sphere radius | J: Mean football radius | K: Volume | L: Volume adjustment | M: PSI reading | N: PSI reading | O: PSI reading |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Game time temperature | 45 | 505 | 1.000 | 10.5 | 11 | 11.5 | 189 | 3.8782 | 3.81183 | 232 | 1.000 | 10.5 | 11 | 11.5 |
| | 60 | 520 | 1.030 | 11.2 | 11.7 | 12.3 | 189.2427 | 3.8807 | 3.81427 | 232.447 | 0.998 | 11.3 | 11.8 | 12.3 |
| | 70 | 530 | 1.050 | 11.7 | 12.2 | 12.8 | 189.4045 | 3.8824 | 3.81590 | 232.7451 | 0.997 | 11.8 | 12.3 | 12.8 |
| Possible testing temperatures | 80 | 540 | 1.069 | 12.2 | 12.7 | 13.3 | 189.5663 | 3.8840 | 3.81753 | 233.0434 | 0.996 | 12.3 | 12.9 | 13.4 |
| | 90 | 550 | 1.089 | 12.7 | 13.2 | 13.8 | 189.7280 | 3.8857 | 3.81916 | 233.3418 | 0.994 | 12.8 | 13.4 | 13.9 |
| | 100 | 560 | 1.109 | 13.2 | 13.7 | 14.3 | 189.8898 | 3.8873 | 3.82079 | 233.6403 | 0.993 | 13.4 | 13.9 | 14.5 |
| | 110 | 570 | 1.129 | 13.7 | 14.2 | 14.8 | 190.0516 | 3.8890 | 3.82242 | 233.939 | 0.992 | 13.9 | 14.5 | 15.0 |
| | 120 | 580 | 1.149 | 14.1 | 14.7 | 15.3 | 190.2134 | 3.8906 | 3.82404 | 234.2378 | 0.990 | 14.4 | 15.0 | 15.6 |
| | 130 | 590 | 1.168 | 14.6 | 15.2 | 15.8 | 190.3752 | 3.8923 | 3.82567 | 234.5367 | 0.989 | 14.9 | 15.5 | 16.1 |
| | 140 | 600 | 1.188 | 15.1 | 15.7 | 16.3 | 190.5370 | 3.8940 | 3.82730 | 234.8357 | 0.988 | 15.5 | 16.1 | 16.7 |
| | 150 | 610 | 1.208 | 15.6 | 16.2 | 16.8 | 190.6988 | 3.8956 | 3.82892 | 235.1349 | 0.987 | 16.0 | 16.6 | 17.2 |
| | 160 | 620 | 1.228 | 16.1 | 16.7 | 17.3 | 190.8606 | 3.8973 | 3.83054 | 235.4342 | 0.985 | 16.5 | 17.1 | 17.8 |

Temperature (°F) at which ball would pass test:

| | 2 PSI diff | 1.5 PSI diff | 1 PSI diff |
|---|---|---|---|
| Temperature only | 86 | 75 | 65 |
| Temperature and volume change from temp | 88 | 77 | 67 |
| Temp, volume, and stretching from wetness | 72 | 62 | 51 |

Last row calculated as (12.5+14)/(inferred test level+14)/(0.996)/(1.01^3)*(45+460)-460
Notes
Revised calculations allow for sea level pressure to be 14 PSI, so a change from 10.5 to 12.5 PSI (above this level) requires only a (12.5+14)/(10.5+14)-1 = 8.2 percent change in absolute temperature.
See notes at the top, but final calculations also allow for the possibilities that ball temperature was 45 degrees, not 53, due to cold rain, and for 1% stretching in leather due to rain.
Fields in first row and first column are input parameters, others are calculated

 

Original post

There is no mention of the temperature at which the footballs need to be stored or tested in the official NFL rule book. (Sloppy rules!)

The process of scuffing up the new balls to make them feel better no doubt warms them up. It would be impossible for it to be otherwise. An empirical question is how much did it warm them up and what temperature were they when tested?

Surface temperatures could have been below the internal temperature of the air, which is what matters for the pressure. Leather is a pretty good insulator (hence its use in many coats).

Anyone who took high school physics may remember that pressure and temperature satisfy

PV=nRT

Pressure*Volume=Number of moles*ideal gas constant*Temperature  (Ideal Gas Law)

Temperature needs to be measured in degrees above absolute zero, which is -459.67 Fahrenheit (sorry metric readers!). The temperature at game time was 53 degrees. So the right question to ask is: at what temperature, T1, would the air in the ball have to be at the time the balls were tested such that, once they cooled down to T0 = 53 degrees, they measured two pounds per square inch (PSI) below the allowed minimum?

The lowest allowed pressure for testing was 12.5 PSI. We are told only vaguely that the balls were 2 PSI lower than this, but this is not a precise number. It could have been rounded from 1.501 PSI, which would mean the balls might have been at 11 PSI when tested during the game. I examine 10.5, 11 and 11.5 as possible game time test PSI levels.

The following table shows possible combinations of game time temperatures and pregame testing temperatures that would be consistent with various pressures. The right-hand side of the table makes an adjustment for the fact that the leather/rubber in the ball would also have shrunk as the ball cooled down, which works against the temperature.

Using the formula PSI1 = PSI0*(T1+459.67)/(T0+459.67) (see correction above!) and ignoring the volume change of the ball, it is straightforward to solve for the initial temperature the balls would have had to be at to produce the observed game time pressures.

Adjusting for a plausible guess at how much the leather case and rubber bladder would also have contracted makes only a small difference.

For a 1.5 PSI difference from testing to halftime, the air inside the balls would have had to be at about 128 degrees when they were tested. (The leather skin could have been cooler.) This would have made them feel warm, but not burning hot, to the hand.
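Once the volume adjustment is folded in, the pass temperature is easiest to find numerically. A minimal Python sketch, assuming, as the right-hand columns of the table appear to, that the ball's linear size grows by the hard-rubber coefficient listed under the table (42.8 × 10^-6 per degree F), so that its volume grows with the cube of that factor. The function names are hypothetical, and bisection is simply one convenient root-finding choice.

```python
ABS_ZERO_F = -459.67      # absolute zero, degrees Fahrenheit
ALPHA_RUBBER = 42.8e-6    # hard-rubber linear expansion, in/(in * F)


def volume_factor(test_temp_f: float, game_temp_f: float = 53.0) -> float:
    """Ball volume at testing relative to game time (column L).

    Radius scales with (1 + alpha * dT), so volume scales with its cube.
    """
    return (1.0 + ALPHA_RUBBER * (test_temp_f - game_temp_f)) ** 3


def adjusted_testing_psi(game_psi: float, test_temp_f: float,
                         game_temp_f: float = 53.0) -> float:
    """Columns M-O: temperature ratio divided by the volume factor."""
    temp_ratio = (test_temp_f - ABS_ZERO_F) / (game_temp_f - ABS_ZERO_F)
    return game_psi * temp_ratio / volume_factor(test_temp_f, game_temp_f)


def adjusted_pass_temperature(game_psi: float, min_psi: float = 12.5,
                              game_temp_f: float = 53.0, lo: float = 53.0,
                              hi: float = 300.0, tol: float = 1e-6) -> float:
    """Bisect for the testing temperature that reads exactly min_psi."""
    # adjusted_testing_psi rises with temperature over this range,
    # so the root is bracketed by [lo, hi].
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if adjusted_testing_psi(game_psi, mid, game_temp_f) < min_psi:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for psi in (10.5, 11.0, 11.5):
        print(psi, round(adjusted_pass_temperature(psi), 1))
```

The results should fall within about a degree of the 159, 128 and 101 in the table's last row; small differences presumably reflect rounding in the spreadsheet.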

Allowing the balls to be warm when tested is sneaky or perhaps accidental, but not cheating.

Go Pats!

Various combinations of testing temperatures and PSI

Columns E through G: adjustments for temperature only. Columns H through L: adjustments for changes in ball volume. Columns M through O: adjusting for temperature and football volume. Columns E through G and M through O show various game-time or testing PSI readings.

| A | B: Temperature (°F) | C: Degrees above absolute zero | D: Temperature adjustment | E: PSI | F: PSI | G: PSI | H: Surface area (sq in) | I: Sphere radius (in) | J: Mean football radius (in) | K: Volume (cu in) | L: Volume adjustment | M: PSI | N: PSI | O: PSI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Game time temperature | 53 | 512.67 | 1.000 | 10.5 | 11 | 11.5 | 189 | 3.8782 | 3.81183 | 232 | 1.000 | 10.5 | 11 | 11.5 |
| Possible testing temperatures | 80 | 539.67 | 1.053 | 11.1 | 11.6 | 12.1 | 189.4368 | 3.8827 | 3.81623 | 232.8048 | 1.003 | 11.0 | 11.5 | 12.1 |
| | 90 | 549.67 | 1.072 | 11.3 | 11.8 | 12.3 | 189.5986 | 3.8844 | 3.81786 | 233.1031 | 1.005 | 11.2 | 11.7 | 12.3 |
| | 100 | 559.67 | 1.092 | 11.5 | 12.0 | 12.6 | 189.7604 | 3.8860 | 3.81949 | 233.4015 | 1.006 | 11.4 | 11.9 | 12.5 |
| | 110 | 569.67 | 1.111 | 11.7 | 12.2 | 12.8 | 189.9222 | 3.8877 | 3.82112 | 233.7001 | 1.007 | 11.6 | 12.1 | 12.7 |
| | 120 | 579.67 | 1.131 | 11.9 | 12.4 | 13.0 | 190.0840 | 3.8893 | 3.82274 | 233.9988 | 1.009 | 11.8 | 12.3 | 12.9 |
| | 130 | 589.67 | 1.150 | 12.1 | 12.7 | 13.2 | 190.2458 | 3.8910 | 3.82437 | 234.2976 | 1.010 | 12.0 | 12.5 | 13.1 |
| | 140 | 599.67 | 1.170 | 12.3 | 12.9 | 13.5 | 190.4076 | 3.8926 | 3.82600 | 234.5965 | 1.011 | 12.1 | 12.7 | 13.3 |
| | 150 | 609.67 | 1.189 | 12.5 | 13.1 | 13.7 | 190.5693 | 3.8943 | 3.82762 | 234.8956 | 1.012 | 12.3 | 12.9 | 13.5 |
| | 160 | 619.67 | 1.209 | 12.7 | 13.3 | 13.9 | 190.7311 | 3.8959 | 3.82924 | 235.1948 | 1.014 | 12.5 | 13.1 | 13.7 |
| Temperature (°F) at which ball would pass test | | | | 151 | 123 | 98 | | | | | | 159 | 128 | 101 |
Notes
Fields in yellow are input parameters, others are calculated
Column C is the temperature minus absolute zero (column B plus 459.67).
Column D is the ratio of column C to the game-time temperature in absolute degrees; it shows how much higher the PSI would have been than at game time.
Columns E through G show possible testing PSI for three possible game-time PSI levels.
Columns H through L show adjustments for ball volume, which tend to reduce the PSI as a ball is heated. The calculations use the linear expansion rate of hard rubber per inch per degree.
Columns M through O show ball PSI after adjusting for both air temperature and football volume.
Parameters and formulas
Absolute zero = -459.67 °F
Hard rubber linear expansion coefficient = 42.8 × 10^-6 in/(in °F), from http://www.engineeringtoolbox.com/linear-expansion-coefficients-d_95.html
or 0.0000428; used in column H for the expansion of surface area.
Surface area is assumed to grow with the square of this proportion as temperature rises.
The approximate volume and surface area of a standard football are 232 cubic inches and 189 square inches, respectively.
http://www.answers.com/Q/Volume_and_surface_area_of_a_football
Surface area of a sphere: 4πr², used to calculate the radius of the sphere.
Volume of a sphere: (4/3)πr³, used to calculate the volume of the football. The volume is adjusted downward by a fixed proportion because footballs are not spheres.
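The geometry behind columns H through L can be sketched the same way. This Python version assumes, as the notes above describe, that surface area grows with the square of the linear expansion factor, that the sphere radius comes from A = 4πr², and that the mean football radius is the sphere radius scaled by a fixed proportion chosen so the game-time volume is 232 cubic inches. The function name `geometry_row` is hypothetical.

```python
import math

ALPHA_RUBBER = 42.8e-6  # hard-rubber linear expansion, in/(in * F)
BASE_AREA = 189.0       # football surface area at game time, square inches
BASE_VOLUME = 232.0     # football volume at game time, cubic inches


def geometry_row(test_temp_f: float, game_temp_f: float = 53.0):
    """Return columns H-L: area, sphere radius, mean radius, volume, adjustment."""
    growth = 1.0 + ALPHA_RUBBER * (test_temp_f - game_temp_f)  # linear factor

    area = BASE_AREA * growth ** 2                   # H: grows with the square
    sphere_r = math.sqrt(area / (4.0 * math.pi))     # I: from A = 4*pi*r^2

    # J: radius of a sphere holding BASE_VOLUME at game time, scaled by the
    # same growth; the ratio J/I is the fixed "not a sphere" proportion.
    base_sphere_r = math.sqrt(BASE_AREA / (4.0 * math.pi))
    base_mean_r = (3.0 * BASE_VOLUME / (4.0 * math.pi)) ** (1.0 / 3.0)
    mean_r = base_mean_r * sphere_r / base_sphere_r

    volume = 4.0 / 3.0 * math.pi * mean_r ** 3       # K: from V = 4/3*pi*r^3
    adjustment = volume / BASE_VOLUME                # L: relative to game time
    return area, sphere_r, mean_r, volume, adjustment


if __name__ == "__main__":
    for t in (53, 80, 160):
        print(t, [round(x, 4) for x in geometry_row(t)])
```

At 80 degrees its output agrees with the spreadsheet's row (189.4368, 3.8827, 3.81623, 232.8048, 1.003) to about three decimal places.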

 

NFL rules

Rule 2 The Ball
Section 1 BALL DIMENSIONS
The Ball must be a “Wilson,” hand selected, bearing the signature of the Commissioner of the League, Roger Goodell. The ball shall be made up of an inflated (12 1/2 to 13 1/2 pounds) urethane bladder enclosed in a pebble grained, leather case (natural tan color) without corrugations of any kind. It shall have the form of a prolate spheroid and the size and weight shall be: long axis, 11 to 11 1/4 inches; long circumference, 28 to 28 1/2 inches; short circumference, 21 to 21 1/4 inches; weight, 14 to 15 ounces. The Referee shall be the sole judge as to whether all balls offered for play comply with these specifications. A pump is to be furnished by the home club, and the balls shall remain under the supervision of the Referee until they are delivered to the ball attendant just prior to the start of the game.

From the Free Dictionary: ideal gas law, n. A physical law describing the relationship of the measurable properties of an ideal gas, where P (pressure) × V (volume) = n (number of moles) × R (the gas constant) × T (temperature in Kelvin). It is derived from a combination of the gas laws of Boyle, Charles, and Avogadro. Also called universal gas law.