DO loops are a fundamental feature in SAS, used to repeat a block of statements multiple times. This article opens with a brief overview of DO loops in Base SAS, covering iterative, list-based, and conditional loops, plus nested and array-driven patterns. We then move on to PROC DS2, SAS’s newer procedure that adds object-oriented syntax and multithreading. You’ll see how to convert familiar DO loops into DS2 code, declare variables and methods, and run loops in parallel for faster execution. By the end, you’ll have a concise reference from basic DATA-step loops to advanced DS2 constructs. Let’s begin.
DO loops in SAS repeat a block of statements as many times as needed. For example:
data example;
do i = 1 to 5;
output;
end;
run;
In Base SAS, there are four main loop forms:
Iterative
do i = start to stop by step;
…
end;
The BY clause can also be used here to control the increment size. If omitted, SAS assumes a default step of 1. A negative BY value can be used to count down.
For example:
do i = 1 to 10 by 2; /* increments by 2 */ …
end;
do i = 10 to 1 by -1; /* counts down */ …
end;
List-Based
do i = value1, value2, expression;
…
end;
Conditional WHILE
do while(condition);
…
end;
Conditional UNTIL
do until(condition);
…
end;
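The list-based and conditional forms above can be sketched as follows. This is a minimal illustration rather than code from the original examples; the dataset names (list_demo, while_demo, until_demo) and the variables dose and n are made up for the purpose. The key difference it shows is that WHILE tests its condition before each pass, whereas UNTIL tests it after the pass, so an UNTIL loop always runs at least once.

```sas
/* List-based loop: the index takes each listed value in turn */
data list_demo;
  do dose = 5, 10, 20;
    output;                /* writes three rows: dose = 5, 10, 20 */
  end;
run;

/* WHILE tests the condition at the top, so the body may never run */
data while_demo;
  n = 10;
  do while (n < 10);
    n = n + 1;             /* not executed: 10 < 10 is false on entry */
  end;
  output;                  /* n is still 10 */
run;

/* UNTIL tests the condition at the bottom, so the body runs at least once */
data until_demo;
  n = 10;
  do until (n >= 10);
    n = n + 1;             /* executed once, then the test ends the loop */
  end;
  output;                  /* n is now 11 */
run;
```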
You can nest loops to handle multiple dimensions:
data report;
do year = 2020 to 2022;
do month = 1 to 12;
/* process data for each month */
output;
end;
end;
run;
To process groups of variables, combine ARRAY with DO OVER or the OF operator:
data scores;
set exams;
array marks score1-score3; /* DO OVER needs an implicitly subscripted array, so no [3] here */
total = 0;
do over marks;
total + marks;
end;
run;
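The OF operator mentioned above avoids the loop altogether when all you need is a summary function. The sketch below assumes a small, made-up EXAMS dataset with numeric score1 to score3; the output dataset name scores_of and the variable highest are illustrative only.

```sas
/* Hypothetical input data, for illustration only */
data exams;
  input student $ score1 score2 score3;
  datalines;
A 70 80 90
B 55 . 65
;
run;

data scores_of;
  set exams;
  array marks[3] score1-score3;       /* explicitly subscripted array */
  total   = sum(of marks[*]);         /* sum of all elements, missing values ignored */
  highest = max(of score1-score3);    /* OF also accepts a plain variable list */
run;
```

Because SUM ignores missing values, student B still receives a total, which matches the behaviour of the sum statement used in the DO OVER version.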
All these loops run sequentially on one CPU core.
In healthcare datasets, diagnosis codes often span multiple columns (e.g. diag1 to diag12). A common task is to identify records where any of these codes matches a specific condition, such as identifying patients with atopic dermatitis (using ICD-10 codes like L20, L2081, etc.).
Here’s how a DO loop can streamline this process:
data claims_AD;
set claims_diag;
/* Create an array of all diagnosis fields */
array diags{12} diag1-diag12;
/* Loop through each diagnosis field */
do i = 1 to 12;
if diags{i} in ('L209', 'L20', 'L2081', 'L2083', 'L2082', 'L208', 'L2089', 'L2084') then do;
output; /* Output record if any diagnosis matches */
leave; /* Exit loop after first match to avoid duplicates */
end;
end;
drop i; /* Drop loop index to keep dataset clean */
run;
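A variation on the same pattern is to keep every record and set a flag instead of subsetting. The sketch below is only an illustration: the input dataset claims_demo, its three diagnosis columns, the patid variable and the ad_flag variable are all hypothetical, and it assumes that every code beginning with L20 is of interest, which may not match your code list. It uses the `=:` operator, which compares only as many characters as the shorter operand has.

```sas
/* Hypothetical input: three diagnosis columns instead of twelve */
data claims_demo;
  length patid 8 diag1-diag3 $ 7;
  input patid diag1 $ diag2 $ diag3 $;
  datalines;
1 J45 L2081 I10
2 E119 I10 K219
3 L209 L2084 J45
;
run;

data claims_flagged;
  set claims_demo;
  array diags{*} diag1-diag3;
  ad_flag = 0;                        /* assume no match until one is found */
  do i = 1 to dim(diags);
    if diags{i} =: 'L20' then do;     /* =: compares the first three characters only */
      ad_flag = 1;
      leave;                          /* one match is enough */
    end;
  end;
  drop i;
run;
```

Using DIM(diags) instead of a hard-coded upper bound means the loop does not need editing if the number of diagnosis columns changes.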
Next, we’ll see how PROC DS2 can spread similar logic across multiple threads to cut real-time execution.
DS2 is a SAS® proprietary programming language, run through PROC DS2 (available since SAS 9.4), that adds object-oriented programming and multithreading to your workflows. In this article, we’ll cover OOP basics, DS2 DO loops, multithreading techniques, and real-world examples.
Multithreading splits an input dataset into subsets, processes them in parallel across multiple cores, and then combines the results. In PROC DS2, you define a thread program and add THREADS= to the SET FROM statement to reduce execution time.
OOP groups data and related procedures into ‘objects’ that inherit methods and properties from a parent class. This approach maps code to real-world data concepts and simplifies updates, since you modify individual objects rather than rewriting code throughout a programme.
Our focus here is on DS2’s OOP features, such as the DCL statement for variable declaration and the METHOD statement for defining processing logic.
DS2 Code Block Structure
proc ds2;
data dsetout2/overwrite=yes;
dcl char(2) anl01fl;
method run ();
dcl int i;
set subjvisitdata;
do i=0 to &visits by 10;
if avisitn=i then anl01fl='Y';
end;
end;
enddata;
run;
quit;
Declare Statement
For DS2, all variables must be declared. Throughout this blog, in the various illustrations of SAS code, we have explicitly declared our variables by using the DCL statement or the equivalent DECLARE statement. DCL associates a data type with each variable. For example, here we declare our Analysis Flag 1 variable (ANL01FL) as a character variable of length 2.
dcl char(2) anl01fl;
The next thing to bear in mind is that the position of the DCL statement within the code also determines the scope of the variable. If used outside a METHOD statement, a global variable is created. Variables that we want to keep in the output must have this so-called global scope. If DCL is used within a METHOD, i.e. after METHOD RUN(); but before the corresponding END; statement, a local variable is created. A local variable can be used to manipulate the data but ultimately will not be kept in the final dataset. Within a METHOD, DCL statements must precede METHOD statements or an error will be encountered.
In the code below the first DCL statement is of global scope as it is outside of the METHOD. The second DCL statement is within a METHOD so is local to that METHOD:
proc ds2;
data dsetout2/overwrite=yes;
dcl char(2) anl01fl;
method run();
dcl int i;
set subjvisitdata;
do i=0 to &visits by 10;
if avisitn=i then anl01fl='Y';
end;
end;
enddata;
run;
quit;
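To see the effect of scope in isolation, a stripped-down sketch may help. It does not read any input; the dataset name scope_demo and the variables kept and scratch are made up for illustration. Under the scoping rules described above, only the globally declared variable should appear in the output dataset.

```sas
proc ds2;
data scope_demo / overwrite=yes;
  dcl double kept;             /* global scope: written to scope_demo */
  method init();
    dcl double scratch;        /* local scope: exists only inside init() */
    scratch = 2;
    kept = scratch * 21;
    output;                    /* explicit output of the current row */
  end;
enddata;
run;
quit;

/* Inspect the result: KEPT should be listed, SCRATCH should not */
proc contents data=scope_demo;
run;
```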
Data Types
There is an expanded range of data types available in DS2 for us to specify within a DCL statement and some of the more useful ones are displayed below. It should be noted that when outputting to a .sas7bdat dataset all variables must be converted to the traditional character and numeric data types. In the table below CHAR(n) and INTEGER data types are detailed:
Data Type | Description
CHAR(n) | Stores a fixed-length character string, where n is the maximum number of characters to store. The maximum number of characters is required to store each value regardless of the actual size of the value. If CHAR(10) is specified and the character string is only five characters long, the value is right-padded with spaces.
INTEGER or INT | Stores a regular size signed, exact whole number, with a precision of ten digits. The range of integers is -2,147,483,648 to 2,147,483,647. Integer data types do not store decimal values; fractional portions are discarded.
[2]
Method Statement
The second DS2 statement we will make use of is the METHOD statement. All data-processing code in DS2 (initialisation, derivation, outputting, etc.) must reside within a METHOD statement; there are three system-defined methods: RUN, INIT and TERM. It is also possible to create user-defined methods. We will be making use solely of the METHOD RUN. In Base SAS, the entire DATA Step program is included in the implicit loop. In DS2, the implicit loop is represented by the METHOD RUN, with the METHODs INIT and TERM (outside the scope of this blog) providing initialization and finalization code, respectively.[1]
When a system-defined METHOD is explicitly stated, it must be defined without any parameters and without a return value, as we have done below with METHOD RUN(). Adding parameters and/or a return value will result in a compile error. Each METHOD statement must have a corresponding END, as shown here:
proc ds2;
data dsetout2/overwrite=yes;
dcl char(2) anl01fl;
method run();
dcl int i;
set subjvisitdata;
do i=0 to &visits by 10;
if avisitn=i then anl01fl='Y';
end;
end;
enddata;
run;
quit;
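Although the rest of this blog uses only METHOD RUN, a short sketch may make the roles of the three system-defined methods, and of a user-defined method, more concrete. Everything named here is hypothetical: the small input dataset visits_demo, the counter n_rows and the method is_tenth are introduced purely for illustration. Note that, unlike the system-defined methods, the user-defined method is allowed a parameter and a return value.

```sas
/* Small hypothetical input so the sketch can be run on its own */
data visits_demo;
  do avisitn = 0 to 50 by 5;
    output;
  end;
run;

proc ds2;
data _null_;
  dcl double n_rows;
  retain n_rows;                     /* keep the running count between rows */

  /* User-defined method: takes a parameter and returns a value */
  method is_tenth(double visit) returns int;
    if mod(visit, 10) = 0 then return 1;
    return 0;
  end;

  method init();                     /* runs once, before any rows are read */
    n_rows = 0;
  end;

  method run();                      /* the implicit loop: once per input row */
    set visits_demo;
    n_rows = n_rows + is_tenth(avisitn);
  end;

  method term();                     /* runs once, after the last row */
    put 'Visits that are multiples of 10:' n_rows;
  end;
enddata;
run;
quit;
```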
For use throughout our examples, we first create a dummy dataset of 10000 subjects, each with 201 visits. Though not a common example of a dataset that might be encountered, it is large and simple enough for us to illustrate the differences between a basic data step and a similar process in DS2.
The dummy dataset SUBJVISITDATA to be used throughout this blog is created with the following code; although this contains a DO loop, we will not be converting it to DS2:
%let subjects=10000;
%let visits=1000;
data subjvisitdata;
do usubjid=1 to &subjects;
do avisitn=0 to &visits by 5;
output;
end;
end;
run;
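The figure of 201 visits per subject follows from the loop bounds: visits run from 0 to 1000 in steps of 5, which gives 1000 / 5 + 1 values. A quick way to check the expected row count before running the larger steps is sketched below; the macro variables n_visits and n_rows are introduced here for illustration only.

```sas
%let subjects = 10000;
%let visits   = 1000;

/* Visits 0, 5, 10, ..., 1000 give 1000/5 + 1 = 201 values per subject */
%let n_visits = %eval(&visits / 5 + 1);
%let n_rows   = %eval(&subjects * &n_visits);

%put NOTE: &n_visits visits per subject, &n_rows rows expected.;
```

The expected total of 2010000 rows is the number of observations reported in the logs further down.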
This creates the below dataset, SUBJVISITDATA:
DO Loop in Regular Data Step
The purpose of our simple loop will be to flag each visit that is a multiple of 10 and also visit 0. To start with we will do this without using DS2 and instead use a traditional data step to create DSETOUT1:
data dsetout1 (drop=i);
set subjvisitdata;
do i=0 to &visits by 10;
if avisitn=i then anl01fl='Y';
end;
run;
NOTE: There were 2010000 observations read from the data set WORK.SUBJVISITDATA.
NOTE: The data set WORK.DSETOUT1 has 2010000 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 1.68 seconds
cpu time 1.69 seconds
This produces the following dataset, DSETOUT1:
By converting to DS2 and then implementing multithreading, the real time can be reduced, at the cost of increased CPU time. Real time and CPU time are the measures we will use to compare efficiency. Shorter real time is the most important efficiency gain: less waiting time. Higher CPU time is only of secondary importance: it might indicate that a higher percentage of the available computing capacity has been used, i.e. more efficient use of resources; however, an increase in CPU time with unchanged real time would actually be a loss in efficiency. Using more capacity of a shared CPU might also reduce efficiency for other users of that CPU.
DO Loop in DS2
First we create a new dataset, DSETOUT2. We will not multithread in this example and instead demonstrate a straight conversion of the data step into DS2 to highlight aforementioned features of the procedure.
Here we create our required dataset. The DCL statements declare the new variables, their types and lengths. We declare the character variable ANL01FL globally, outside the METHOD, as this is a variable we want to keep. We declare the i variable as INTEGER locally, within the METHOD, as we do not want this variable in our output dataset:
*******************DS2 without multithreading;
proc ds2;
data dsetout2/overwrite=yes;
dcl char(2) anl01fl;
method run();
dcl int i;
set subjvisitdata;
do i=0 to &visits by 10;
if avisitn=i then anl01fl='Y';
end;
end;
enddata;
run;
quit;
NOTE: PROCEDURE DS2 used (Total process time):
real time 2.84 seconds
cpu time 2.23 seconds
Comparing these numbers, we can see that a straight conversion into DS2 increases the real time and CPU time spent producing exactly the same result. This is because calling PROC DS2 in SAS generates additional work for the CPU when compared to the DATA step, even when performing the same core function, and so an increase in both CPU and real time is expected. We can decrease the real time taken by making use of other techniques such as multithreading.
Multithreading can reduce the real time spent on the same operation at the expense of increased CPU time and hence increased utilisation of your system. It splits the data into as many parts as there are threads, processes the parts simultaneously across multiple processors or a distributed system, and then sets the processed data back together.
DO Loop with 4 Threads
First, we need to turn the data step into a thread program. To do this, we take the previous conversion of the data step into DS2 and simply change DATA to THREAD and ENDDATA to ENDTHREAD. We then apply the thread to our original dataset and specify the number of threads to be used.
The code contained within the THREAD and ENDTHREAD statements is the same as that contained within the previous PROC DS2. We have changed DSETOUT2 to THREAD1; this will still create a dataset called THREAD1, but it contains information about the thread for processing.
**********Do Loop with Multithreading with 4 threads;
proc ds2;
thread thread1/overwrite=yes;
dcl char(2) anl01fl;
method run();
dcl int i;
set subjvisitdata;
do i=0 to &visits by 10;
if avisitn=i then anl01fl='Y';
end;
end;
endthread;
run;
Within the same PROC DS2, we then create the dataset DSETOUT3 using the previously defined thread. We must declare an instance of the thread THREAD1, which here is named SUBJVISITDATA, and then read from that instance with SET FROM.
data dsetout3/overwrite=yes;
dcl thread thread1 subjvisitdata;
method run();
set from subjvisitdata threads=4;
end;
enddata;
run;
quit;
NOTE: PROCEDURE DS2 used (Total process time):
real time 0.65 seconds
cpu time 2.57 seconds
We can see from the reduced real time and increased CPU time that we successfully multithreaded our DO loop.
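Because the rows are processed by several threads, it is safest not to assume that DSETOUT3 comes back in the same order as the input. One way to confirm that the multithreaded result matches the data step result is to sort and compare, as sketched below. This assumes DSETOUT1 and DSETOUT3 have been created by the steps above; the dataset name dsetout3_sorted is introduced here for illustration.

```sas
/* Restore a known order before comparing */
proc sort data=dsetout3 out=dsetout3_sorted;
  by usubjid avisitn;
run;

/* DSETOUT1 is already in usubjid/avisitn order by construction */
proc compare base=dsetout1 compare=dsetout3_sorted;
  id usubjid avisitn;
run;
```

PROC COMPARE may report a difference in the length of ANL01FL, since the data step derives the length from the literal 'Y' while the DS2 code declares CHAR(2); that is an attribute difference rather than a difference in values.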
DO Loop with 18 Threads
We can now try increasing the number of threads from 4 to 18 to add another point of comparison. All we need to change is the number after THREADS= as below:
******************Do Loop with Multithreading with 18 threads;
proc ds2;
thread thread1/overwrite=yes;
dcl char(2) anl01fl;
method run();
dcl int i;
set subjvisitdata;
do i=0 to &visits by 10;
if avisitn=i then anl01fl='Y';
end;
end;
endthread;
run;
data dsetout3/overwrite=yes;
dcl thread thread1 subjvisitdata;
method run();
set from subjvisitdata threads=18;
end;
enddata;
run;
quit;
NOTE: PROCEDURE DS2 used (Total process time):
real time 0.62 seconds
cpu time 3.12 seconds
We can see that using 18 threads and DS2 to complete the operation reduces the real time taken to run the procedure a little further (0.62 seconds compared with 0.65 seconds for 4 threads). The increased CPU time can be accounted for as the time taken to split the data, process it in the respective threads (roughly the CPU time taken with the basic data step divided by the number of threads), and set it back together.
However, if we increase the number of threads further, to 100, the additional reduction in real time is much smaller than that seen in the jump from 4 to 18 threads, while the CPU time increases again. This can be illustrated with the following approximations of real and CPU time:
[Figure: Approximation of Real Time when using DS2]
[Figure: Approximation of CPU Time when using DS2]
Technique | Real time (seconds) | CPU time (seconds)
Data step | 1.68 | 1.69
Data step conversion to DS2 | 2.84 | 2.23
DS2 + Multithreading 4 threads | 0.65 | 2.57
DS2 + Multithreading 18 threads | 0.62 | 3.12
Above, we can see the differences in real time and CPU time between the techniques we employed: a straight conversion to DS2 was slower than the standard data step, but DS2 with multithreading reduced the real time at the cost of higher CPU time.
Although DS2 can be a useful tool in some circumstances, and using it with multithreading can save time in performing some tasks, it does have its limitations and drawbacks. Whilst experimenting within the procedure we encountered plenty of scenarios where using other SAS functions or procedures was more efficient than PROC DS2. For example:
Conditional Do Loops
Iterative DO loops are where multithreading helps: the work can be split between threads, which decreases real time at the cost of increased CPU time. Conditional DO loops cannot be sped up with multithreading; however, DO loops which are both iterative and conditional can be.
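For readers who have not met a loop that is both iterative and conditional, the Base SAS form can be sketched as below. It is a stand-alone illustration with made-up names (combo_demo, total), not one of the timed examples. The loop stops either when the index passes its upper bound or when the WHILE condition becomes false, whichever happens first.

```sas
data combo_demo;
  total = 0;
  /* Iterate i = 1, 2, 3, ... up to 100, but only while total is below 20 */
  do i = 1 to 100 while (total < 20);
    total = total + i;
  end;
  /* The loop adds 1+2+3+4+5+6 = 21 and then stops, leaving i = 7 */
  output;
run;
```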
System Limitations
If the system you are running SAS on has only one CPU core, multithreading will not be of any use to you. A SAS program will still run; however, the real time will not be decreased compared to a data step.
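Before investing time in a thread program, it can be worth checking how many processors your SAS session can actually use. A minimal sketch, using the SYSNCPU automatic macro variable and the CPUCOUNT and THREADS system options:

```sas
/* Number of processors SAS can use in this session */
%put NOTE: Processors available to SAS (SYSNCPU): &sysncpu;

/* Current settings of the related system options */
proc options option=cpucount;
run;

proc options option=threads;
run;
```

If the reported count is 1, requesting more threads in SET FROM is unlikely to shorten real time.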
Recreating Other Procedures in DS2
We attempted to recreate PROC FREQ in DS2 making use of multithreading, starting with a basic PROC FREQ, then transforming this into DS2. Though we managed to recreate the results of the PROC FREQ in DS2, comparing the results showed that both the real time and CPU time were higher in the DS2 version. This showed us that, although we had multithreaded the process, we had failed to decrease real time compared to the PROC FREQ we were trying to replicate. This seems to confirm the robustness and efficiency of SAS Procedures.
By combining core Base SAS DO loops (iterative, list-based and conditional loops, reverse iteration, loops that are both iterative and conditional, nested and array-driven patterns, and real-data processing) with PROC DS2 features such as variable declaration, methods, and multithreading, you now have a comprehensive reference for repetitive data tasks. PROC DS2 preserves familiar DATA-step syntax while adding object-oriented methods and parallel execution to improve performance on large datasets. Apply these examples to streamline your workflows and achieve faster, more maintainable code.
Quanticate's statistical programming team can help you implement efficient DO loops and PROC DS2 multithreading solutions, as well as create TLFs and perform CDISC mappings or SDTM conversions. If you’d like to discuss how we can support your project, submit an RFI and a member of our Business Development team will be in touch.
[1] SAS 9.4 DS2 Programmer's Guide. https://documentation.sas.com/api/docsets/ds2pg/9.4/content/ds2pg.pdf?locale=en
[2] Expansion of Opportunities in Programming: DS2 Features and Examples of Usage Object Oriented Programming in SAS. PharmaSUG 2017, BB09. Serhii Voievutkyi e.a. https://www.pharmasug.org/proceedings/2017/BB/PharmaSUG-2017-BB09.pdf
DS2 is a SAS proprietary programming language that is appropriate for advanced data manipulation. DS2 is included with SAS 9.4 and intersects with the SAS DATA step. Its advantages over data step programming include ANSI SQL types, programming structure elements, the capability to write user-defined methods and packages, and multithreading. [1]
© 2025 Quanticate