Fixing truncated strings in SAS

SAS, as powerful as it is, behaves oddly at times. The behavior is usually well documented, so it is just something that one has to account for. When SAS creates a character variable without an explicit length, it fixes the length from the first value it encounters (in a DATA step, the first value assigned to the variable), and any longer value is truncated: the trailing characters are silently dropped. For example, if 'CAfter' and 'EBefore' are values, and 'CAfter' is encountered first, then the length of the variable will be 6 and 'EBefore' will be shortened to 'EBefor'. If 'EBefore' were encountered first, there wouldn't be a problem. This can be tricky to spot if the data are read in unsorted and only sorted later, because the value that set the length is then no longer the first one in the dataset.
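The truncation is easy to reproduce. The following is a minimal sketch, not taken from my real data: the dataset name demo and the variable flag are made up for illustration. Because the first assignment to Group in the code uses a 6-character value, SAS gives Group a length of 6, and the second row comes out as 'EBefor'.

```sas
data demo;
 input flag;
 /* First assignment to Group in the code: its length is set to 6. */
 if flag = 1 then Group = 'CAfter';
 /* 'EBefore' has 7 characters, so it is stored as 'EBefor'. */
 else Group = 'EBefore';
 datalines;
1
0
;
run;

/* Print the result to see the truncated value in the second row. */
proc print data=demo;
run;
```

Swapping the two branches so that 'EBefore' is assigned first makes the problem disappear, which is exactly why it is easy to miss.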

To correct this, one should manually specify the length the variable should be when it is created, like so:


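data ling.vowel;
 /* Declare the length before SET and before Group is first assigned. */
 length Group $ 7;
 set ling.vowel;
 /* Build Group by concatenating Class and When. */
 Group = Class || When;
 /* Keep the original row number. */
 obs = _N_;
run;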

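In the above, the length of the variable Group is fixed at 7. (Group is created by concatenating the variables Class and When.) The length statement has to come before the first statement that assigns a value to Group; after that point the length has already been set.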

Sometimes the truncation happens instead when the data are imported. By default, PROC IMPORT only looks at the first 20 rows of a delimited file to determine the length of each character variable. This can be changed by adding a guessingrows=32767 statement to PROC IMPORT (32767 is the maximum value in older SAS releases; newer releases reportedly accept larger values as well as guessingrows=max):


proc import datafile="&mypath/vot-data-0921.csv" out=ling.vot dbms=csv replace;
 getnames=yes;
 guessingrows=32767;
run;
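To check that the import really picked up the full length, one can look at the variable attributes of the resulting dataset. This is a small sketch that assumes the ling.vot dataset created above; both steps only report metadata and do not change the data.

```sas
/* List the variables in ling.vot with their types and lengths. */
proc contents data=ling.vot;
run;

/* The same information as a table; libname and memname are stored in uppercase. */
proc sql;
 select name, type, length
 from dictionary.columns
 where libname = 'LING' and memname = 'VOT';
quit;
```

If a character variable is shorter than the longest value in the CSV file, the scan did not reach that value and guessingrows needs to be raised.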

These are the two fixes I've found for truncated strings. I'm writing them down here because I was tired of hunting for the SAS files that I had used them in.
