rubinghsoftware.de/projects/own_swtools/basicchecker.html

`Project`
	BASIC variable declaration checker
`For`
	Own project
`Date`
	1993-1994
`Platforms`
	C/C++, Windows/Linux PC

Page contents

Abstract

The problem

How the tool works

Skeleton routine

Globals extraction routine

Undeclared variables detection routine

Abstract

In this page I describe a small tool that I created some time ago in my own time, while working in a job developing and maintaining a large BASIC program. As a C programmer, I was used to the compiler catching typing errors in variable names, because in C every variable has to be declared before use. BASIC lacks this mechanism, so I quickly wrote this tool to fill that gap in the BASIC compiler and let me develop more efficiently.

The problem

A problem with BASIC, and also with most scripting languages, is that variables do not have to be declared before use. Not requiring declarations may sound user-friendly; however, in my view it makes the language unsuitable for serious software projects, because of the following problem:
In a language like BASIC, when a typing error is made in the name of a variable, the compiler silently creates a new variable at that point. Thus, typing errors in variable names are not caught immediately by the compiler, as they are in languages that require variable declarations (C/C++, Java, Pascal, or FORTRAN with IMPLICIT NONE). Instead, they lead to a successfully compiled executable with a bug in it, which then has to be found the slow way, through testing and debugging.

Consider the following BASIC code:

 DIM varOne AS INTEGER
 DIM varTwo AS INTEGER
 varOne = 1
 varTwo = 2
 CALL mySwap( varOne, varTwo )

 END

 SUB mySwap( firstParam AS INTEGER, secondParam AS INTEGER )
	DIM tmp AS INTEGER
	tmp = firstParam
	fisrtParam = secondParam
	secondParam = tmp
 END SUB

The name of the parameter firstParam is misspelled in the body of the sub mySwap(). As a result, the sub does not do what it is intended to do, which is to swap the values of the two variables varOne and varTwo on the caller's side. The original value of the parameter secondParam is not copied into firstParam. It is written to a newly created variable named fisrtParam, which is never used again. The call therefore leaves varOne unchanged and overwrites varTwo with the value of varOne, so the original value of the caller's variable varTwo is lost.

Such errors are easily detected by a set of small scripts or programs, as described below. These are intended for projects that follow the coding rule that all variables must be declared before use. The tool finds all occurrences of variables that are used without a prior declaration. In the description below, I arbitrarily assume that the BASIC dialect used is similar to QBASIC.

How the tool works

Skeleton routine

The basis of the tool is a routine that processes one source file as follows. After comment removal, it makes one pass through the file to detect where each SUB, FUNCTION and TYPE definition begins and ends. This is easy to detect from the keyword pairs SUB ... END SUB, FUNCTION ... END FUNCTION and TYPE ... END TYPE, with only rudimentary lexical processing. This routine is the skeleton (template) for the two routines or scripts that make up the actual tool.
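As an illustration only, a skeleton pass of this kind could be sketched in C++ as follows. This is not the original tool's code. The sketch assumes one statement per line, QBASIC-style comments (`'` up to the end of the line, or a line starting with `REM`) and blocks that do not nest, as in QBASIC. The names `stripComment`, `upperWords`, `findBlocks` and `Block` are hypothetical.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Sketch of the skeleton pass (hypothetical, not the original tool's code).
// Assumes one statement per line, QBASIC-style comments, non-nested blocks.

// Cut off a trailing ' comment, ignoring apostrophes inside "string literals".
std::string stripComment(const std::string& line) {
    bool inString = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        if (line[i] == '"') inString = !inString;
        else if (line[i] == '\'' && !inString) return line.substr(0, i);
    }
    return line;
}

// Whitespace-separated words of a line, upper-cased (BASIC is case-insensitive).
std::vector<std::string> upperWords(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> words;
    std::string w;
    while (in >> w) {
        for (char& c : w) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        words.push_back(w);
    }
    return words;
}

struct Block {
    std::string kind;   // "SUB", "FUNCTION" or "TYPE"
    std::size_t first;  // 0-based index of the opening line
    std::size_t last;   // 0-based index of the matching END line
};

// One pass over the file: record where each SUB/FUNCTION/TYPE block begins and ends.
std::vector<Block> findBlocks(const std::vector<std::string>& lines) {
    std::vector<Block> blocks;
    bool open = false;
    Block cur{};
    for (std::size_t i = 0; i < lines.size(); ++i) {
        std::vector<std::string> w = upperWords(stripComment(lines[i]));
        if (w.empty() || w[0] == "REM") continue;  // blank or REM comment line
        if (!open && (w[0] == "SUB" || w[0] == "FUNCTION" || w[0] == "TYPE")) {
            cur = Block{w[0], i, i};
            open = true;
        } else if (open && w.size() >= 2 && w[0] == "END" && w[1] == cur.kind) {
            cur.last = i;
            blocks.push_back(cur);
            open = false;
        }
    }
    return blocks;
}
```

Only the first word of a line is checked. Because of this, lines such as `DECLARE SUB ...` or `EXIT SUB` are not mistaken for block boundaries.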

Globals extraction routine

The first routine actually used in the tool extracts from a given BASIC source file the names of all global objects defined in it, i.e. global variables, subs/functions, and datatypes defined in TYPE definitions. For this, we only need to extend the skeleton routine so that, outside of SUB/FUNCTION/TYPE definition blocks, it additionally detects all statements beginning with the keyword DIM, which are the definitions of global variables.
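A minimal C++ sketch of this extension follows. It makes the same assumptions as above: comments are already removed, there is one statement per line, and blocks do not nest. It also records the sub, function and TYPE names from their header lines. Everything is upper-cased because BASIC names are case-insensitive. QBASIC type suffixes such as `$` or `%` are kept as part of the name. `$INCLUDE` handling is left out. The names `upper`, `leadingIdent`, `dimNames` and `collectGlobals` are hypothetical.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Sketch of globals extraction without $INCLUDE handling (hypothetical names).
// Assumes comments already removed, one statement per line, no nested blocks.

std::string upper(std::string s) {
    for (char& c : s) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}

// The name at the start of s (after blanks), with an optional type suffix like $ or %.
std::string leadingIdent(const std::string& s) {
    std::size_t i = s.find_first_not_of(" \t");
    if (i == std::string::npos) return "";
    std::size_t j = i;
    while (j < s.size() && (std::isalnum(static_cast<unsigned char>(s[j])) || s[j] == '_')) ++j;
    if (j == i) return "";
    if (j < s.size() && std::string("%&!#$").find(s[j]) != std::string::npos) ++j;
    return s.substr(i, j - i);
}

// Names declared by e.g. "DIM SHARED a(10, 10) AS SINGLE, b AS INTEGER".
std::vector<std::string> dimNames(const std::string& dimStmt) {
    std::string rest = upper(dimStmt);
    rest = rest.substr(rest.find("DIM") + 3);  // caller ensures it starts with DIM
    if (leadingIdent(rest) == "SHARED") rest = rest.substr(rest.find("SHARED") + 6);
    std::vector<std::string> names;
    std::string item;
    int depth = 0;                             // parenthesis depth
    for (char c : rest + ",") {                // extra ',' flushes the last item
        if (c == '(') ++depth;
        if (c == ')') --depth;
        if (c == ',' && depth == 0) {          // commas inside a(10, 10) are skipped
            std::string name = leadingIdent(item);
            if (!name.empty()) names.push_back(name);
            item.clear();
        } else {
            item += c;
        }
    }
    return names;
}

// Global variables (DIM outside blocks) plus SUB, FUNCTION and TYPE names.
std::set<std::string> collectGlobals(const std::vector<std::string>& lines) {
    std::set<std::string> globals;
    std::string openKind;                      // "" = outside any block
    for (const std::string& raw : lines) {
        std::string line = upper(raw);
        std::string w1 = leadingIdent(line);
        if (openKind.empty()) {
            if (w1 == "SUB" || w1 == "FUNCTION" || w1 == "TYPE") {
                openKind = w1;
                globals.insert(leadingIdent(line.substr(line.find(w1) + w1.size())));
            } else if (w1 == "DIM") {
                for (const std::string& n : dimNames(line)) globals.insert(n);
            }
        } else if (w1 == "END" && leadingIdent(line.substr(line.find("END") + 3)) == openKind) {
            openKind.clear();
        }
    }
    return globals;
}
```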
This globals-extracting routine is then extended with one more piece of functionality: on encountering an $INCLUDE line, it opens the included file and processes that file in the same way (for which it recursively calls itself).
The output of this final extended routine is a list of the names of all the global objects defined in the BASIC source file, plus in all source files $INCLUDEd by it.
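One way to sketch the `$INCLUDE` handling in C++ is shown below. In this sketch the extraction routine does not call itself. Instead, a hypothetical `expandIncludes` function recursively splices the lines of each included file into one list, and that list can then be scanned for globals as described above. For includes at module level, this gives the same set of names. Two details are assumptions.

* In QBASIC the metacommand is written inside a comment (`'$INCLUDE: 'file.bi'` or `REM $INCLUDE: 'file.bi'`), so it has to be detected before comment removal.
* Each file is expanded at most once, which also stops include cycles.

The `FileReader` parameter is also hypothetical. It stands in for reading a file from disk, so that the sketch can be tested in memory.

```cpp
#include <cctype>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for reading a source file into lines.
using FileReader = std::function<std::vector<std::string>(const std::string&)>;

// File named in a QBASIC metacommand such as  '$INCLUDE: 'defs.bi'
// or "" if the line is not an include line. Run this BEFORE comment removal,
// because the metacommand itself is written inside a comment.
std::string includedFile(const std::string& line) {
    std::string up = line;
    for (char& c : up) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    std::size_t pos = up.find("$INCLUDE");
    if (pos == std::string::npos) return "";
    std::size_t q1 = line.find('\'', pos);       // opening quote of the file name
    if (q1 == std::string::npos) return "";
    std::size_t q2 = line.find('\'', q1 + 1);    // closing quote
    if (q2 == std::string::npos) return "";
    return line.substr(q1 + 1, q2 - q1 - 1);     // original case kept for the file system
}

// Recursively replaces each $INCLUDE line by the lines of the included file.
// Each file is expanded at most once, which also stops include cycles.
void expandIncludes(const std::string& file, const FileReader& read,
                    std::set<std::string>& visited, std::vector<std::string>& out) {
    if (!visited.insert(file).second) return;
    for (const std::string& line : read(file)) {
        std::string inc = includedFile(line);
        if (inc.empty()) out.push_back(line);
        else expandIncludes(inc, read, visited, out);
    }
}
```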

Undeclared variables detection routine

The second routine actually used in the tool then does the final work of processing one given BASIC source file individually, and finding all uses of undeclared variables, as follows.
The first step is to run the first routine on the source file, to create a list of the names of the global objects defined in the source file and in the files $INCLUDEd by it. We will call this the globals list.
After this, the source file is split up into chunks of text. There is one chunk for the definition of each sub and function in the file, plus one chunk that contains all the code outside of SUB/FUNCTION/TYPE definition blocks. That last chunk is treated the same as the sub/function chunks, except that it has no sub/function parameters.
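A C++ sketch of this splitting step follows, again assuming comment-free input, one statement per line and no nested blocks. Each `Chunk` keeps its header line, so the formal parameters can still be read from it. TYPE blocks end up in no chunk. Everything else outside SUB/FUNCTION blocks is collected into a final chunk named `(main)`. The names `Chunk`, `splitIntoChunks`, `upper` and `leadingIdent` are hypothetical.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Sketch of splitting one comment-free source file into chunks (hypothetical names).

std::string upper(std::string s) {
    for (char& c : s) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}

// The name at the start of s (after blanks), with an optional type suffix like $ or %.
std::string leadingIdent(const std::string& s) {
    std::size_t i = s.find_first_not_of(" \t");
    if (i == std::string::npos) return "";
    std::size_t j = i;
    while (j < s.size() && (std::isalnum(static_cast<unsigned char>(s[j])) || s[j] == '_')) ++j;
    if (j == i) return "";
    if (j < s.size() && std::string("%&!#$").find(s[j]) != std::string::npos) ++j;
    return s.substr(i, j - i);
}

struct Chunk {
    std::string name;                // sub/function name, or "(main)"
    std::vector<std::string> lines;  // header line, body and END line
};

std::vector<Chunk> splitIntoChunks(const std::vector<std::string>& lines) {
    std::vector<Chunk> chunks;
    Chunk mainChunk{"(main)", {}};   // code outside SUB/FUNCTION/TYPE blocks
    std::string openKind;            // "" = outside any block
    for (const std::string& line : lines) {
        std::string up = upper(line);
        std::string w1 = leadingIdent(up);
        if (openKind.empty()) {
            if (w1 == "SUB" || w1 == "FUNCTION") {
                openKind = w1;
                std::string name = leadingIdent(up.substr(up.find(w1) + w1.size()));
                chunks.push_back(Chunk{name, {line}});
            } else if (w1 == "TYPE") {
                openKind = w1;       // TYPE blocks belong to no chunk
            } else {
                mainChunk.lines.push_back(line);
            }
        } else {
            if (openKind != "TYPE") chunks.back().lines.push_back(line);
            if (w1 == "END" && leadingIdent(up.substr(up.find("END") + 3)) == openKind)
                openKind.clear();
        }
    }
    chunks.push_back(mainChunk);
    return chunks;
}
```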
Each resulting chunk is then processed individually. First, a simple lexical analysis and parsing step is performed to do the following:

Detect the names of the formal sub/function parameters;
Split up the body of the sub/function into its successive statements;
Detect in each statement the variable names used in it (the identifiers that are not language keywords), and whether the statement begins with the DIM keyword.
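The three lexical tasks above might be sketched in C++ as follows. The keyword set is a deliberately small, hypothetical subset of QBASIC's reserved words; a real tool needs the complete list for its dialect. Statements are split at `:` outside string literals. For a record field access such as `rec.n`, only the variable part `REC` is reported. Decimal numbers are skipped. Hexadecimal literals like `&HFF` are not handled and would be misread as an identifier `HFF`. The names `paramNames`, `splitStatements`, `identifiers` and `isDim` are illustrative.

```cpp
#include <cctype>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, deliberately incomplete subset of QBASIC reserved words.
const std::set<std::string> kKeywords = {
    "AS", "INTEGER", "LONG", "SINGLE", "DOUBLE", "STRING", "DIM", "SHARED",
    "SUB", "FUNCTION", "TYPE", "DECLARE", "END", "CALL", "IF", "THEN", "ELSE",
    "FOR", "TO", "STEP", "NEXT", "DO", "LOOP", "WHILE", "WEND", "PRINT",
    "AND", "OR", "NOT", "MOD", "EXIT", "GOTO", "GOSUB", "RETURN"};

std::string upper(std::string s) {
    for (char& c : s) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}

bool isNameChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '.';
}

// Split a line into statements at ':' characters outside "string literals".
std::vector<std::string> splitStatements(const std::string& line) {
    std::vector<std::string> stmts;
    std::string cur;
    bool inString = false;
    for (char c : line) {
        if (c == '"') inString = !inString;
        if (c == ':' && !inString) { stmts.push_back(cur); cur.clear(); }
        else cur += c;
    }
    stmts.push_back(cur);
    return stmts;
}

// Upper-cased non-keyword identifiers of a statement, in order of appearance.
// String literals and numbers are skipped; for "rec.n" only "REC" is kept.
std::vector<std::string> identifiers(const std::string& stmt) {
    std::vector<std::string> ids;
    const std::string s = upper(stmt);
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c == '"') {                                   // skip a string literal
            const std::size_t close = s.find('"', i + 1);
            i = (close == std::string::npos) ? s.size() : close + 1;
        } else if (std::isdigit(c)) {                     // skip a number like 1.5E3
            while (i < s.size() && isNameChar(s[i])) ++i;
        } else if (std::isalpha(c)) {
            const std::size_t start = i;
            while (i < s.size() && isNameChar(s[i])) ++i;
            if (i < s.size() && std::string("%&!#$").find(s[i]) != std::string::npos) ++i;
            std::string tok = s.substr(start, i - start);
            tok = tok.substr(0, tok.find('.'));           // keep the part before a '.'
            if (kKeywords.count(tok) == 0) ids.push_back(tok);
        } else {
            ++i;
        }
    }
    return ids;
}

// Formal parameter names from a header like "SUB f (a AS INTEGER, b() AS SINGLE)".
std::vector<std::string> paramNames(const std::string& header) {
    std::vector<std::string> names;
    const std::size_t open = header.find('('), close = header.rfind(')');
    if (open == std::string::npos || close == std::string::npos || close < open) return names;
    std::istringstream in(header.substr(open + 1, close - open - 1));
    std::string param;
    while (std::getline(in, param, ',')) {
        std::vector<std::string> ids = identifiers(param);
        if (!ids.empty()) names.push_back(ids.front());   // the name comes first
    }
    return names;
}

// True if the statement begins with the DIM keyword.
bool isDim(const std::string& stmt) {
    const std::string s = upper(stmt);
    const std::size_t i = s.find_first_not_of(" \t");
    if (i == std::string::npos || s.compare(i, 3, "DIM") != 0) return false;
    return i + 3 == s.size() || !isNameChar(s[i + 3]);
}
```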

The formal parameters of the sub/function are put in a list of words (strings), which we will call the local list. This local list is only used while processing this one sub/function, and is separate from the globals list.
Then one pass is made through the parsed sub/function body, from beginning to end. On encountering a statement beginning with DIM, all variable names being declared in it are added to the local list. For any other kind of statement, each variable used in it is looked up in the local list and in the globals list. If it is found in neither list, it is a variable that is being used without a prior declaration. As its output, the routine prints all such undeclared variable uses encountered in the source file.
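The pass itself is short. This C++ sketch assumes that the lexical analysis has already reduced each statement to a hypothetical `Statement` record. For a DIM statement, the record holds the declared names. For any other statement, it holds the identifiers the statement uses. A `std::set` is used for the local list instead of a plain list, which only changes the lookup cost. Instead of printing, the hypothetical `findUndeclared` function returns its findings so that the caller can print them.

```cpp
#include <set>
#include <string>
#include <vector>

// One statement after lexical analysis (hypothetical representation):
// for a DIM statement 'names' holds the declared names,
// otherwise it holds the non-keyword identifiers used in the statement.
struct Statement {
    bool isDim;
    std::vector<std::string> names;
};

// One use of a variable that was never declared.
struct Finding {
    std::string chunk;      // sub/function name
    std::size_t statement;  // 0-based statement index within the body
    std::string name;
};

// Single pass over one sub/function body, from beginning to end.
std::vector<Finding> findUndeclared(const std::string& chunkName,
                                    const std::vector<std::string>& params,
                                    const std::vector<Statement>& body,
                                    const std::set<std::string>& globals) {
    std::set<std::string> local(params.begin(), params.end());  // the local list
    std::vector<Finding> findings;
    for (std::size_t i = 0; i < body.size(); ++i) {
        const Statement& st = body[i];
        if (st.isDim) {                                          // declaration: extend local list
            local.insert(st.names.begin(), st.names.end());
            continue;
        }
        for (const std::string& n : st.names)                    // use: must be declared somewhere
            if (local.count(n) == 0 && globals.count(n) == 0)
                findings.push_back(Finding{chunkName, i, n});
    }
    return findings;
}
```

Consider the body of mySwap() from the example in the problem section, with the globals `VARONE`, `VARTWO` and `MYSWAP`. Its four statements are the DIM, then the three assignments. The only finding is `FISRTPARAM`, in the third statement (index 2).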