C parser/tokenizer

Jesse_M 101 Oct 04, 2005 at 17:13

Can anyone point my way to a decent C parser/tokenizer written in C? That may sound like an odd request, I know. Thanks in advance.

10 Replies

zavie 101 Oct 04, 2005 at 17:54

I don’t know of any parser dedicated to C. But Lex/Yacc are the reference lexer and parser generators, and they are written in C. See also Flex/Bison.

bladder 101 Oct 05, 2005 at 01:24

Flex and Bison are very hard to find. I think they’re archived somewhere on the xengine sf page. Instead of Flex and Bison you can use Spirit from the Boost library, which IMO is way better and easier to use.

As for the actual C parser written in C… don’t know of any.

Ed_Mack 101 Oct 05, 2005 at 05:51

_oisyn 101 Oct 05, 2005 at 11:01

Or use ANTLR, an LL(k) parser generator that supports EBNF syntax. It can only output C++ code though (among other languages, but just not C).

Jesse_M 101 Oct 05, 2005 at 16:23

GCC is written in C, correct? How difficult would it be to tear out the parser framework and form a library with it? Does anybody think it would be superfluous to just write the whole project myself?

_oisyn 101 Oct 05, 2005 at 21:22

If you want your code to be GPL’ed, that’s an option, yes. I personally don’t like GPL. At all :)

bladder 101 Oct 06, 2005 at 14:23

Sorry, I meant flex++ and bison++ (the C++ versions of flex and bison).

SamuraiCrow 101 Oct 15, 2005 at 02:52

Hmm… VBCC is written entirely in C and isn’t GPL but unfortunately it is closed source and I haven’t seen the source since version 0.4.

My next-best suggestion is to look into writing a trie map structure and have all of the tokens terminate with a null by checking the character type on the first tier of the trie. Essentially, cascade a partial trie (with all of the ASCII characters) into a full trie for all of the multi-character tokens. It’s tricky, but each token can then be matched in time bounded by its length, which is effectively constant for a fixed token set.
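
For anyone who wants to try the trie idea, here is a minimal sketch in C, not a full C tokenizer. It assumes 7-bit ASCII token spellings and a small fixed node pool; the names `trie_insert`, `trie_match`, `MAX_NODES` and the token ids are all made up for illustration. It only does longest-match lookup of operators and similar fixed tokens.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 64  /* hypothetical limit, enough for a small operator set */

/* One trie node: a child index per 7-bit ASCII character (0 = no child,
 * which is safe because node 0 is the root and is never anyone's child). */
struct trie_node {
    unsigned char child[128];
    int token;                 /* nonzero token id if a token ends here */
};

static struct trie_node nodes[MAX_NODES];  /* nodes[0] is the root */
static int node_count = 1;

/* Add one token spelling with a nonzero id. */
static void trie_insert(const char *text, int token)
{
    int n = 0;
    for (; *text != '\0'; text++) {
        unsigned char c = (unsigned char)*text;
        assert(c < 128);                    /* assumption: ASCII only */
        if (nodes[n].child[c] == 0) {
            assert(node_count < MAX_NODES); /* pool must not overflow */
            nodes[n].child[c] = (unsigned char)node_count++;
        }
        n = nodes[n].child[c];
    }
    nodes[n].token = token;
}

/* Longest match at the start of s. Returns the token id (0 if none)
 * and stores the matched length in *len. */
static int trie_match(const char *s, size_t *len)
{
    int n = 0, best = 0;
    size_t i = 0;

    *len = 0;
    while (s[i] != '\0' && (unsigned char)s[i] < 128
           && nodes[n].child[(unsigned char)s[i]] != 0) {
        n = nodes[n].child[(unsigned char)s[i]];
        i++;
        if (nodes[n].token != 0) {  /* remember the longest token so far */
            best = nodes[n].token;
            *len = i;
        }
    }
    return best;
}
```

After inserting `<`, `<<`, `<<=` and `<=` with distinct ids, `trie_match("<<= 1", &len)` returns the id given to `<<=` and sets `len` to 3. The walk costs one array lookup per input character, so the work per token is bounded by the longest token spelling.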

roxtar 101 Oct 15, 2005 at 03:38

What do you need the parser for? If you are looking for parser generators and lexical analyzer generators, there are quite a few of them, as mentioned in the previous posts. If you are going to attempt to write your own, I suggest that you do a recursive descent parser for starters, as it is the easiest to understand and implement (IMHO). It has certain disadvantages though: for example, it cannot parse left-recursive grammars directly. Try getting hold of the dragon book, if you can.
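
To give a feel for recursive descent, here is a small sketch in C for integer arithmetic rather than C itself. The grammar and the names `evaluate`, `parse_expr`, `parse_term` and `parse_factor` are illustrative assumptions. The left-recursive rule `expr := expr '+' term` is rewritten as a loop, which is the usual way around the left-recursion problem.

```c
#include <ctype.h>

/* Grammar, with left recursion rewritten as repetition:
 *   expr   := term   { ('+' | '-') term }
 *   term   := factor { ('*' | '/') factor }
 *   factor := NUMBER | '(' expr ')'
 */

static const char *cursor;  /* current position in the input */
static int parse_error;     /* set to 1 on any syntax error */

static long parse_expr(void);

static void skip_spaces(void)
{
    while (isspace((unsigned char)*cursor))
        cursor++;
}

static long parse_factor(void)
{
    long v = 0;

    skip_spaces();
    if (*cursor == '(') {
        cursor++;
        v = parse_expr();
        skip_spaces();
        if (*cursor == ')')
            cursor++;
        else
            parse_error = 1;        /* missing closing parenthesis */
        return v;
    }
    if (isdigit((unsigned char)*cursor)) {
        while (isdigit((unsigned char)*cursor))
            v = v * 10 + (*cursor++ - '0');
        return v;
    }
    parse_error = 1;                /* expected a number or '(' */
    return 0;
}

static long parse_term(void)
{
    long v = parse_factor();

    for (;;) {                      /* loop replaces term := term '*' factor */
        skip_spaces();
        if (*cursor == '*') {
            cursor++;
            v *= parse_factor();
        } else if (*cursor == '/') {
            long d;
            cursor++;
            d = parse_factor();
            if (d == 0) {           /* treat division by zero as an error */
                parse_error = 1;
                return 0;
            }
            v /= d;
        } else {
            return v;
        }
    }
}

static long parse_expr(void)
{
    long v = parse_term();

    for (;;) {                      /* loop replaces expr := expr '+' term */
        skip_spaces();
        if (*cursor == '+') {
            cursor++;
            v += parse_term();
        } else if (*cursor == '-') {
            cursor++;
            v -= parse_term();
        } else {
            return v;
        }
    }
}

/* Returns 0 and stores the value in *result on success, -1 on error. */
static int evaluate(const char *text, long *result)
{
    cursor = text;
    parse_error = 0;
    *result = parse_expr();
    skip_spaces();
    if (*cursor != '\0')
        parse_error = 1;            /* trailing garbage after expression */
    return parse_error ? -1 : 0;
}
```

Each grammar rule becomes one function, which is why this style is easy to follow. Because the loops fold operands in as they are read, `10 - 4 - 3` still groups to the left and evaluates to 3.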

eddie 101 Oct 28, 2005 at 17:16

Jesse, as others have mentioned, I recommend bison/flex. I’m not sure about the license that comes with code generated *by* bison/flex (I’ve only used it in companies that use it for internal tools, so I don’t believe we had to worry about licensing restrictions… IANAL though), so you should probably check that out.

That said, roxtar asks a very intelligent question. What are you attempting to do? There might be an easier way of going about it if we know your situation or goal, as well as your constraints.

Also, there’s a site here that has the grammar for C in Lex/Yacc: http://www.lysator.liu.se/c/c-faq/c-17.html#17-25. At least that should cut some work out for you.