Double click to edit!

Function declaration syntax

Idea: Allow two syntaxes for declaring functions. A highly verbose, but literate syntax is used for public functions. For private functions, such as class-private functions or inner functions/closures, use a simpler syntax similar to most existing languages.

function quickSort:
    parameters:
        array is array(integer).

    returns:
        array(integer).

    -- Sort the array.

    function findPivot is:
        -- Helper function.

        return {magic}.

Analysis without declared type information may be more difficult. Will have to look into that.

Ease of implementation

Don't limit permitted constructs based on ease of implementation. By ease of implementation I mean not only the practical ease—the difficulty of designing and actually writing the source code—but also the theoretical ease of implementation.

Features of existing languages, like access levels and type checking, are like hard security. They allow programmers to restrict themselves by having the compiler limit what actions are permissible. Usually permissible actions are limited by what the compiler is able to verify. If the compiler can't verify that you're using a variable in a meaningful manner, it throws a fit and proffers an error message like a token of apology. Sorry, Dave, but I don't understand what you're trying to do. So don't do it.

As an example of what I mean, let me examine how I implemented part of this web site. In C++ or Java you cannot split a class definition across multiple files—not easily, at least. Sometimes I'll have a simply massive class that has all kinds of disparate functionality all mixed together. The Compiler class in my Cyan compiler is a good example of this. The idea behind it is that you can feed it a bunch of source code files, hit the compile button (er, method), and it spits out a Program object. In essence it's a factory for Program objects.

The problem is that the Compiler class is immense; it contains a lexer, parser, type checker, code generator, etc., etc., etc. The implementation of this class is in fact the bulk of the entire program source. In any non-trivial compiler the implementation of the compiling part is split among hundreds or thousands of files.

I really like the idea of putting all of this flotstam under the umbrella of the Compiler class. It makes the interface to the compiler so crystal clear, and limits access to any of the nasty moldy, cob-webbed internals. But seeing as how there are so many loosely coupled sub-modules, the class really deserves to be split into many small pieces.

As it stands I'm still fighting with C++ to declare all of these parts separately, yet keep everything inside of the Compiler class. C++ isn't very nice about it. I've managed an ugly hack where inside the class declaration I include all of the headers for the compiler's pieces—something like

class Compiler
{
private:
    #include "compiler-lexer.h"
    #include "compiler-parser.h"
    #include "compiler-type_checker.h"
};

Shudder. Not so pretty, but problem solved. Right?

Well, kinda. I mean, everything is split up really cleanly, yeah, but the compile times attest to the fact that in C++'s mind, I haven't really done anything. Every time I work on the lexer, the parser and type checker recompile as well. It's the same deal as having one gigantic .h file for the whole project, which is unacceptable.

Local variable declarations

Variable declarations are useful, but in some contexts they are redundant. Declaring class instance variables is necessary to document the interface of the class; declaring local variables is redundant in that it provides no new information that cannot be inferred by examining the body of the function in question.

Reasons for local variable declarations:

Reasons against local variable declarations:

When to check constraints

Let's examine the duplication in efforts between compile-time and run-time. In general language designers work really hard to design their languages so that certain constructs are checked at compile-time, and others at run-time. Having things checked at compile-time is more efficient time-wise, but limits expressiveness and requires more work designing the compiler.

Think of the big debate regarding static vs. dynamic type checking.

This also creates artificial delineations between features that are and features that aren't handled at compile-time. Some things can't in general be handled at compile-time, so the compiler doesn't check them at all, even though it could handle some (most?) cases.

So, what if there's a way to implement the compiler where you only have to specify the operations and constraints once, and the compiler automatically handles determining whether or not it can check a certain construct at compile-time, and if it can't, inserts a run-time check? How possible is this, and does it even make sense?

For example, the compiler could check constraints. So let's say you have a function:

function squareRoot:
    parameters:
        n is integer.

    requires:
        n >= 0.

    ensures:
        return >= 0.

    // Calculate the square root.
    ...

    return n.

And a series of calls:

squareRoot(-2).
squareRoot(x).
squareRoot(abs(x)).
squareRoot(-1 - abs(x)).

It's easy to see that the first call is illegal. Assuming nothing about x's value we know nothing about the second call, but no matter what its value is we can prove that the third call is always valid and the fourth is never valid.

The compiler presumably knows how to evaluate expressions—I mean, in some sense it knows since it compiles them. So if there's a way to express the semantics of expressions and statements so that the compiler "knows" how to evaluate them as well as compile them, it should be able to determine the validity of the above function calls. The only difference is that at compile-time it won't have all the information available about every variable, so it will have to be able to use constraints and assertions to handle proofs about general properties of a program. If it can handle expressions involving free variables...

So the compiler, when evaluating these calls to squareRoot, needs to check if the parameters satisfy the constraints for squareRoot. It can easily check -2 >= 0, which is true. If it knows nothing about x, the check x >= 0 is unknown and so is deferred until run-time. It knows that abs always returns a non-negative value, so abs(x) >= 0 is always true. And if it's really smart enough it can determine that -1 - abs(x) >= 0 is always false.