Can anyone point me toward a decent C parser/tokenizer written in C?
That may sound like an odd request, I know. Thanks in advance.
I don’t know of any parser dedicated to C. But Lex/Yacc is the reference
lexer/parser generator pair, and both are written in C. See also Flex/Bison.
Flex and Bison are very hard to find. I think they’re archived somewhere on
the xengine sf page.
Instead of Flex and Bison you can use Spirit from the
Boost library, which IMO is way better and easier to use.
As for an actual C parser written in C… I don’t know of any.
What’s hard to find about them?
Or use ANTLR, an LL(k) parser generator that
supports EBNF syntax. It can only output C++ code, though (among other
languages, just not C).
GCC is written in C, correct? How difficult would it be to tear out the
parser framework and turn it into a library? Does anybody think it would be
superfluous to just write the whole project myself?
If you want your code to be GPL’ed, that’s an option, yes. I personally
don’t like GPL. At all :)
Sorry, I meant flex++ and bison++
(the C++ versions of flex and bison).
Hmm… VBCC is written entirely in
C and isn’t GPL, but unfortunately it is closed source, and I haven’t seen
the source since version 0.4.
My next-best suggestion is to write a trie map structure and
have every token terminate with a null by checking the character
type on the first tier of the trie. Essentially, cascade a partial trie
(covering all of the ASCII characters) into a full trie for all of the
multi-character tokens. It’s tricky, but it can be accomplished in constant
time.
What do you need the parser for? If you are looking for parser
generators and lexical analyzer generators, there are quite a few of them,
as the previous posts mention. If you are going to attempt to write
your own, I suggest a recursive descent parser for starters,
as it is the easiest to understand and implement (IMHO). It has
certain disadvantages, though; for example, it cannot handle left-recursive
grammars directly. Try getting hold of the Dragon Book, if you can.
Jesse, as others have mentioned, I recommend bison/flex. I’m not sure
about the license that comes with code generated *by* bison/flex (I’ve only
used them at companies that use them for internal tools, so I don’t believe
we had to worry about licensing restrictions… IANAL, though), so you
should probably check that out.
That said, roxtar asks a very intelligent question. What are you
attempting to do? There might be an easier way of going about it if we
know your situation or goal, as well as your constraints.
Also, there’s a site here that has the grammar for C in Lex/Yacc format.
At least that should cut some work out for you.