# Writing a custom language?

I'd love to read the Dragon book, but I don't really have the money for it, yet. I spent all that I had on 3 domain names and a hosting payment. :P

snk_kid: I'm reading some tutorials on SML. It looks pretty good. Can you recommend some good reading materials for it?

Also, what's the best lex and yacc SML port? I've seen mlLex and mlYacc, but are they the only ones?

cypher543 said:

snk_kid: I'm reading some tutorials on SML. It looks pretty good. Can you recommend some good reading materials for it?

SML is a good choice for someone like you. One thing to note is that SML and O'Caml are related: both are descendants of the ML family. SML is mainly a standardization of ML (hence the S in SML, for Standard), while O'Caml derives from ML with some (very) subtle changes and adds other features and language extensions, including support for OO. So starting with SML should keep things simpler, but it's still an expressive and powerful language.

If you can get your hands on it, ML for the Working Programmer, 2nd Edition is a good book for someone like yourself.

Some resources on functional programming in general (not necessarily in SML):

cypher543 said:

Also, what's the best lex and yacc SML port? I've seen mlLex and mlYacc, but are they the only ones?

I'm not so sure; try checking the ML FAQs.

Well, I successfully ported my Lex file to the ml-lex format, but I really have no idea what to do now. :P I know what to do with the C versions of lex and yacc, but the SML ports don't seem to work the same way. With my C tests, linking with -ll (the lex support library, which supplies a default main) spat out an executable that let me test my lex file. With SML it looks like I have to write that driver myself, and since I'm new to the language, I dunno how to do that, exactly. :(
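The driver that -ll gives you for free in C is tiny: it just calls the lexer over and over until the lexer reports end of input. Here's a minimal sketch of that loop in Python rather than SML. `make_lexer` and `dump_tokens` are hypothetical names, and the hand-written lexer only stands in for the code ml-lex would generate; the User's Guide describes the actual SML interface (each call hands back one token, and you define what it returns at end of file).

```python
def make_lexer(text):
    """Return a function that hands back one (kind, lexeme) token per call.

    Hypothetical stand-in for a generated lexer; this is not ml-lex's API.
    """
    pos = 0

    def next_token():
        nonlocal pos
        # Skip whitespace between tokens.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return ("EOF", "")
        start = pos
        if text[pos].isdigit():
            while pos < len(text) and text[pos].isdigit():
                pos += 1
            return ("NUMBER", text[start:pos])
        if text[pos].isalpha() or text[pos] == "_":
            while pos < len(text) and (text[pos].isalnum() or text[pos] == "_"):
                pos += 1
            return ("IDENT", text[start:pos])
        pos += 1  # anything else is treated as a one-character symbol
        return ("SYMBOL", text[start])

    return next_token


def dump_tokens(text):
    """The test driver: keep pulling tokens until the lexer reports EOF."""
    next_token = make_lexer(text)
    tokens = []
    while True:
        token = next_token()
        if token[0] == "EOF":
            return tokens
        tokens.append(token)


if __name__ == "__main__":
    for kind, lexeme in dump_tokens("i = i * 2"):
        print(kind, lexeme)
```

The same shape carries over to SML: create the lexer once, then loop on it, printing each token, until you see your end-of-file token.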

Did you read the documentation that comes with ml-lex and ml-yacc?

This might be useful: User’s Guide to ML-Lex and ML-Yacc

Thanks! That guide is really helping! :D

You should probably check out FreeBASIC.

Supposedly it's fast, and it has inline asm.

geon: I have, and thanks. But I'd like the experience of writing my own.

Everyone: I started working on it last night, and at the moment, I have a simple app that takes source code and breaks it down into tokens. Yeah... that's all. But it's a start.
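A common way to build that kind of tokenizer is one combined regular expression with a named group per token kind, plus a keyword table so reserved words aren't lexed as ordinary names. A minimal Python sketch; the token kinds and the keyword set (if, else, elif, while, not, output, taken from keywords mentioned in this thread) are assumptions, not a finished design.

```python
import re

# Hypothetical keyword set, taken from keywords mentioned in this thread.
KEYWORDS = {"if", "else", "elif", "while", "not", "output"}

# One named group per token kind; earlier alternatives win at the same position.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("STRING", r'"[^"\n]*"'),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[=+\-*/<>]"),
    ("LBRACE", r"\{"),
    ("RBRACE", r"\}"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))


def tokenize(source):
    """Split source into (kind, text) pairs, turning reserved NAMEs into KEYWORDs."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r} at offset {match.start()}")
        if kind == "NAME" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens


if __name__ == "__main__":
    for token in tokenize('while not i = 32 {\n    output "Value of i is " + i\n}'):
        print(token)
```

Checking keywords after matching a NAME, instead of giving each keyword its own pattern, keeps names like `whilex` from being split into `while` and `x`.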

I've been trying to come up with a name for it, too. I liked "lambda" (the mathematical symbol and logo for Half-Life), but it's already a programming language. :( So, I dunno. Any ideas?

How about the "cypher" language? ;)

BTW, are you doing a design before diving into this? Perhaps defining the syntax in EBNF would be helpful.

Edit: Fixed the Extended Backus-Naur Form (EBNF) acronym.

Quote

(the mathematical symbol and logo for Half-Life)
;) Also, it's not "lambada". It's "lambda". Besides, I can't call it that anyway, since it's already a programming language. Oh well.

Quote

Erm. I don't think so. :p

Quote

BTW, are you doing a design before diving into this? Perhaps defining the syntax in EBF would be helpful.
Yes, I've designed the syntax. I really have no clue what EBF is. There's nothing on Google or Wikipedia. *shrugs* The syntax will be closely related to C, in that it uses curly braces to enclose code after keywords such as "if", "else", "elif", "while", etc. The keywords, however, will be more in the style of BASIC. For example, a simple loop would look like:

```
i = 2
while not i = 32 {
    i = i * 2
    output "Value of i is " + i
}
```

That would multiply i by 2 until its value becomes 32. Happy?
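One way to pin down exactly what that loop means is a tiny tree-walking interpreter. The sketch below is Python with a hypothetical tuple-based AST built by hand (no parser). It reads `not i = 32` as `not (i = 32)`, the BASIC-style grouping; in C, `!i == 32` would group as `(!i) == 32`, so this is a design decision worth writing down. The rule that `+` with a string operand concatenates text is also an assumption.

```python
# Hypothetical AST for the loop example, built by hand (no parser here):
#   i = 2
#   while not i = 32 { i = i * 2  output "Value of i is " + i }
PROGRAM = [
    ("assign", "i", ("num", 2)),
    ("while", ("not", ("=", ("var", "i"), ("num", 32))), [
        ("assign", "i", ("*", ("var", "i"), ("num", 2))),
        ("output", ("+", ("str", "Value of i is "), ("var", "i"))),
    ]),
]


def evaluate(expr, env):
    """Compute the value of an expression node, reading variables from env."""
    kind = expr[0]
    if kind in ("num", "str"):
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    if kind == "not":  # "not" wraps the whole comparison: not (i = 32)
        return not evaluate(expr[1], env)
    left, right = evaluate(expr[1], env), evaluate(expr[2], env)
    if kind == "=":  # "=" inside an expression is an equality test
        return left == right
    if kind == "*":
        return left * right
    if kind == "+":
        # Assumed rule: if either side is a string, "+" concatenates text.
        if isinstance(left, str) or isinstance(right, str):
            return str(left) + str(right)
        return left + right
    raise ValueError(f"unknown operator {kind!r}")


def execute(stmts, env, out):
    """Run statements in order, collecting each output line in out."""
    for stmt in stmts:
        kind = stmt[0]
        if kind == "assign":
            env[stmt[1]] = evaluate(stmt[2], env)
        elif kind == "output":
            out.append(str(evaluate(stmt[1], env)))
        elif kind == "while":
            while evaluate(stmt[1], env):
                execute(stmt[2], env, out)
        else:
            raise ValueError(f"unknown statement {kind!r}")
    return out


if __name__ == "__main__":
    for line in execute(PROGRAM, {}, []):
        print(line)
```

Running it prints "Value of i is 4", then 8, 16 and 32, one per line. Note that with `not i = 32` as the condition, a start value that never hits 32 exactly (say 3, giving 6, 12, 24, 48, ...) loops forever; `i < 32` would be the safer test.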

EDIT: I also found out what else the lambda symbol stands for... which is one reason why I'm not going to use it.

cypher543 said:

Yes, I've designed the syntax. I really have no clue what EBF is. There's nothing on Google or Wikipedia.

He probably meant EBNF, which is an extension of BNF. You should know what it is: both are used to describe context-free grammars.

Sorry, I mistyped the acronym. It stands for (Extended) Backus-Naur Form, named after John Backus and Peter Naur, who are credited with coming up with it. It's a great way to describe the syntax of a language. You can find a good introduction here.
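To show how an EBNF description maps onto code, here's one possible EBNF reading of cypher543's loop example, written as comments next to a hand-written recursive-descent parser in Python. Each rule becomes a method: `{ ... }` turns into a while loop and `[ ... ]` into an optional if. The grammar, the keyword set and the tuple AST are assumptions for illustration; note that `=` means assignment at the start of a statement and equality inside an expression, which this grammar resolves by position.

```python
import re

# One possible EBNF for the loop example (hypothetical, for illustration):
#   program    ::= { statement }
#   statement  ::= "while" expr block | "output" expr | NAME "=" expr
#   block      ::= "{" { statement } "}"
#   expr       ::= "not" expr | comparison
#   comparison ::= sum [ "=" sum ]
#   sum        ::= term { "+" term }
#   term       ::= factor { "*" factor }
#   factor     ::= NUMBER | STRING | NAME | "(" expr ")"
KEYWORDS = {"while", "output", "not"}


class Parser:
    def __init__(self, src):
        # Numbers, "quoted strings", words, then any other single character.
        self.toks = re.findall(r'\d+|"[^"]*"|\w+|\S', src) + ["<eof>"]
        self.pos = 0

    def peek(self):
        return self.toks[self.pos]

    def take(self, expected=None):
        tok = self.toks[self.pos]
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def is_name(self, tok):
        return tok.isidentifier() and tok not in KEYWORDS

    def program(self):  # { statement }
        stmts = []
        while self.peek() != "<eof>":
            stmts.append(self.statement())
        return stmts

    def statement(self):  # "while" expr block | "output" expr | NAME "=" expr
        if self.peek() == "while":
            self.take()
            cond = self.expr()
            return ("while", cond, self.block())
        if self.peek() == "output":
            self.take()
            return ("output", self.expr())
        name = self.take()
        if not self.is_name(name):
            raise SyntaxError(f"expected a statement, got {name!r}")
        self.take("=")  # "=" at the start of a statement means assignment
        return ("assign", name, self.expr())

    def block(self):  # "{" { statement } "}"
        self.take("{")
        stmts = []
        while self.peek() != "}":
            stmts.append(self.statement())
        self.take("}")
        return stmts

    def expr(self):  # "not" expr | comparison
        if self.peek() == "not":
            self.take()
            return ("not", self.expr())
        return self.comparison()

    def comparison(self):  # sum [ "=" sum ]; here "=" is an equality test
        left = self.sum()
        if self.peek() == "=":
            self.take()
            return ("=", left, self.sum())
        return left

    def sum(self):  # term { "+" term }
        node = self.term()
        while self.peek() == "+":
            self.take()
            node = ("+", node, self.term())
        return node

    def term(self):  # factor { "*" factor }
        node = self.factor()
        while self.peek() == "*":
            self.take()
            node = ("*", node, self.factor())
        return node

    def factor(self):  # NUMBER | STRING | NAME | "(" expr ")"
        tok = self.take()
        if tok.isdigit():
            return ("num", int(tok))
        if len(tok) >= 2 and tok.startswith('"'):
            return ("str", tok[1:-1])
        if tok == "(":
            node = self.expr()
            self.take(")")
            return node
        if self.is_name(tok):
            return ("var", tok)
        raise SyntaxError(f"unexpected token {tok!r}")
```

Because the grammar puts `not` above `comparison`, `not i = 32` parses as `not (i = 32)`; swapping the two rules would give the C-style grouping instead, which is exactly the kind of decision an EBNF write-up makes visible.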

If you've looked at boost::spirit, then I'm sure you've seen this. That library attempts to let you write BNF-style grammars directly in C++ to create a parser.

For example, the following is the complete EBNF description of the Lua language syntax:

Quote

```
chunk ::= {stat [`;´]} [laststat [`;´]]

block ::= chunk

stat ::= varlist1 `=´ explist1 |
     functioncall |
     do block end |
     while exp do block end |
     repeat block until exp |
     if exp then block {elseif exp then block} [else block] end |
     for Name `=´ exp `,´ exp [`,´ exp] do block end |
     for namelist in explist1 do block end |
     function funcname funcbody |
     local function Name funcbody |
     local namelist [`=´ explist1]

laststat ::= return [explist1] | break

funcname ::= Name {`.´ Name} [`:´ Name]

varlist1 ::= var {`,´ var}

var ::= Name | prefixexp `[´ exp `]´ | prefixexp `.´ Name

namelist ::= Name {`,´ Name}

explist1 ::= {exp `,´} exp

exp ::= nil | false | true | Number | String | `...´ | function |
     prefixexp | tableconstructor | exp binop exp | unop exp

prefixexp ::= var | functioncall | `(´ exp `)´

functioncall ::= prefixexp args | prefixexp `:´ Name args

args ::= `(´ [explist1] `)´ | tableconstructor | String

function ::= function funcbody

funcbody ::= `(´ [parlist1] `)´ block end

parlist1 ::= namelist [`,´ `...´] | `...´

tableconstructor ::= `{´ [fieldlist] `}´

fieldlist ::= field {fieldsep field} [fieldsep]

field ::= `[´ exp `]´ `=´ exp | Name `=´ exp | exp

fieldsep ::= `,´ | `;´

binop ::= `+´ | `-´ | `*´ | `/´ | `^´ | `%´ | `..´ |
     `<´ | `<=´ | `>´ | `>=´ | `==´ | `~=´ |
     and | or

unop ::= `-´ | not | `#´
```

With just that information, anyone should be able to go off and write a parser for Lua. If you had a BNF description of your language's syntax, it would be easier to relate it concisely to others.
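One detail a parser writer needs besides that grammar: the rule `exp ::= exp binop exp | unop exp` is ambiguous on its own (it doesn't say how `1 + 2 * 3` groups), so the Lua manual gives operator precedence in a separate table. A common way to handle that is precedence climbing. The Python sketch below uses (left, right) priority pairs modeled on the table in Lua 5.1's parser (lparser.c); it only handles names, integer literals and parentheses, and the tuple output format is made up for illustration.

```python
import re

# (left, right) binding priorities, modeled on the table in Lua 5.1's parser
# (lparser.c). A right priority lower than the left one makes the operator
# right-associative, as with "^" and "..".
BINARY = {
    "or": (1, 1), "and": (2, 2),
    "<": (3, 3), ">": (3, 3), "<=": (3, 3), ">=": (3, 3), "~=": (3, 3), "==": (3, 3),
    "..": (5, 4),
    "+": (6, 6), "-": (6, 6),
    "*": (7, 7), "/": (7, 7), "%": (7, 7),
    "^": (10, 9),
}
UNARY = {"not", "-", "#"}
UNARY_PRIORITY = 8  # binds tighter than "*" but looser than "^"


def tokenize(src):
    # Names and integer literals only; two-character operators are tried first.
    return re.findall(r"\.\.|[<>=~]=|\w+|\S", src)


def parse_exp(tokens, pos, limit):
    """Parse from tokens[pos], consuming only operators whose left priority > limit."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok in UNARY:
        operand, pos = parse_exp(tokens, pos + 1, UNARY_PRIORITY)
        left = (tok, operand)
    elif tok == "(":
        left, pos = parse_exp(tokens, pos + 1, 0)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        pos += 1
    elif re.fullmatch(r"\w+", tok) and tok not in BINARY:
        left, pos = tok, pos + 1
    else:
        raise SyntaxError(f"unexpected token {tok!r}")
    # Keep absorbing operators that bind more tightly than the caller's limit.
    while pos < len(tokens) and tokens[pos] in BINARY and BINARY[tokens[pos]][0] > limit:
        op = tokens[pos]
        right, pos = parse_exp(tokens, pos + 1, BINARY[op][1])
        left = (op, left, right)
    return left, pos


def parse(src):
    tokens = tokenize(src)
    tree, pos = parse_exp(tokens, 0, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


if __name__ == "__main__":
    print(parse("-x ^ 2"))       # unary minus applies to the whole x ^ 2
    print(parse("a .. b .. c"))  # ".." groups to the right
```

Each recursive call passes down the operator's right priority, so `1 - 2 - 3` groups as `(1 - 2) - 3` while `2 ^ 3 ^ 2` groups as `2 ^ (3 ^ 2)`.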
monjardin said:

If you had a BNF description of your language's syntax, it would be easier to relate it concisely to others.

The fact that he/she is using a yacc-like parser generator should indicate that he/she is already using a BNF-like DSL. If he/she doesn't realize this, then I would be slightly concerned ;).

Writing parsers/compilers is much more fun when you haven't had any actual training in it (like reading the Dragon Book). You come up with all kinds of odd solutions (not necessarily better ones), like data-driven rather than state-driven compilers.

I'd highly suggest, if you want to do this, that you write a compiler that generates C code rather than assembly directly. It'll eliminate all of the various hardware-related issues and be easier to get running at first. You'll also have a lot of handy debug output.
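To make the "emit C" idea concrete: the code generator can be a recursive walk over the AST that prints C text, and the C compiler does the rest. A minimal Python sketch, assuming a hypothetical tuple AST for the loop example earlier in the thread and treating every variable as an int; `generate_c` and the other helpers are illustrative names.

```python
# Hypothetical AST for the loop example from earlier in the thread.
PROGRAM = [
    ("assign", "i", ("num", 2)),
    ("while", ("not", ("=", ("var", "i"), ("num", 32))), [
        ("assign", "i", ("*", ("var", "i"), ("num", 2))),
        ("output", ("+", ("str", "Value of i is "), ("var", "i"))),
    ]),
]

C_OPS = {"=": "==", "+": "+", "*": "*"}  # source operator -> C operator


def c_string(s):
    """Quote s as a C string literal, escaping backslashes, quotes and newlines."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n") + '"'


def gen_expr(e):
    """Translate an integer expression node to C, parenthesizing every subexpression."""
    kind = e[0]
    if kind == "num":
        return str(e[1])
    if kind == "var":
        return e[1]
    if kind == "not":
        return f"!({gen_expr(e[1])})"
    if kind in C_OPS:
        return f"({gen_expr(e[1])} {C_OPS[kind]} {gen_expr(e[2])})"
    raise ValueError(f"cannot translate {kind!r} to C")


def gen_stmt(stmt, depth, lines):
    """Append the C lines for one statement, indented by depth levels."""
    pad = "    " * depth
    kind = stmt[0]
    if kind == "assign":
        lines.append(f"{pad}{stmt[1]} = {gen_expr(stmt[2])};")
    elif kind == "while":
        lines.append(f"{pad}while ({gen_expr(stmt[1])}) {{")
        for inner in stmt[2]:
            gen_stmt(inner, depth + 1, lines)
        lines.append(f"{pad}}}")
    elif kind == "output":
        e = stmt[1]
        if e[0] == "+" and e[1][0] == "str":
            # "text" + int becomes a single printf with %s and %d.
            lines.append(f'{pad}printf("%s%d\\n", {c_string(e[1][1])}, {gen_expr(e[2])});')
        else:
            lines.append(f'{pad}printf("%d\\n", {gen_expr(e)});')
    else:
        raise ValueError(f"unknown statement {kind!r}")


def assigned_names(stmts, found):
    """Collect assigned variable names in first-seen order (all assumed to be int)."""
    for stmt in stmts:
        if stmt[0] == "assign" and stmt[1] not in found:
            found.append(stmt[1])
        elif stmt[0] == "while":
            assigned_names(stmt[2], found)
    return found


def generate_c(program):
    lines = ["#include <stdio.h>", "", "int main(void) {"]
    for name in assigned_names(program, []):
        lines.append(f"    int {name};")
    for stmt in program:
        gen_stmt(stmt, 1, lines)
    lines += ["    return 0;", "}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_c(PROGRAM), end="")
```

The extra parentheses in the output (for example `while (!((i == 32)))`) are redundant but valid C; they let the generator ignore C's precedence rules entirely. Compiled with an ordinary C compiler, the generated program prints the same "Value of i is ..." lines as the loop.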

Vandervecken said:

I'd highly suggest, if you want to do this, that you write a compiler that generates C code rather than assembly directly. It'll eliminate all of the various hardware-related issues and be easier to get running at first. You'll also have a lot of handy debug output.

I'd highly suggest writing only a compiler front-end that targets an existing compiler backend such as GCC. This makes things much easier; you could probably have a compiler working in a matter of hours by doing just:

• Lexer
• Parser
• Semantic analysis
• Symbol tables
• Type checking
• Conversion of the AST to the backend's intermediate representation
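For the symbol-table and type-checking steps in that list, a common structure is a stack of scopes searched from the innermost outwards, with the checker computing a type for each expression node. A minimal Python sketch; the class and function names, the explicit declarations, the tuple AST and the typing rules (for example, `str + int` gives `str`) are all assumptions for illustration.

```python
class SymbolTable:
    """Nested scopes kept as a stack of dicts; the innermost scope is last."""

    def __init__(self):
        self.scopes = [{}]  # the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"{name!r} is already declared in this scope")
        scope[name] = type_

    def lookup(self, name):
        # Search from the innermost scope outwards, so inner names shadow outer ones.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"{name!r} is not declared")


def check(expr, symbols):
    """Return the type of an expression node, or raise TypeError (hypothetical rules)."""
    kind = expr[0]
    if kind == "num":
        return "int"
    if kind == "str":
        return "str"
    if kind == "var":
        return symbols.lookup(expr[1])
    if kind == "not":
        if check(expr[1], symbols) != "bool":
            raise TypeError("'not' needs a bool operand")
        return "bool"
    left, right = check(expr[1], symbols), check(expr[2], symbols)
    if kind == "=":
        if left != right:
            raise TypeError(f"cannot compare {left} with {right}")
        return "bool"
    if kind == "*" and left == right == "int":
        return "int"
    if kind == "+":
        if left == right == "int":
            return "int"
        if "str" in (left, right):
            return "str"  # assumed rule: str + anything concatenates
    raise TypeError(f"operator {kind!r} not defined for {left} and {right}")


if __name__ == "__main__":
    symbols = SymbolTable()
    symbols.declare("i", "int")
    print(check(("not", ("=", ("var", "i"), ("num", 32))), symbols))
```

Catching `"a" * 2` or an undeclared name here, before any code is generated, is what lets the backend assume its input is well-typed.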

cypher543 said:

I'd love to read the Dragon book, but I don't really have the money for it, yet. I spent all that I had on 3 domain names and a hosting payment. :P

Yes, these books are way overpriced. That's why you should always live close to a university with a decent library... :)

Hm... it seems he wants to have some fun just doing it, so maybe you should give him a chance to run into some walls, then come back and ask some questions.
He'll realize for himself what can and can't be done with his language, and what works.
I agree that if you just wanted to create a compiler for an already specified language, you'd be best off using a backend and creating the parser with some lex/yacc combination.
But in order to learn something, it might actually be better to try to do it all yourself and later replace the various bits with the 'professional' libs.

It does sound a bit as if you're making the language up as you go while writing the compiler. That is totally OK for playing with different constructs, etc. Just don't expect too many people to jump on it until you have a well-defined, stable language that is neatly integrated with lots of tools. Eventually you may find yourself rewriting some sort of DarkBasic or C++. Most languages have evolved over a long time to become what they are today. Maybe they could be a bit cleaner in places, but they carry some backward-compatibility burden. But they DO WORK. And it is quite tricky to design a language that isn't just a clone of an existing one and that actually works in all the cases a programmer might devise.

Anyways..good luck and have fun :)

