

C parser/tokenizer


10 replies to this topic

#1 Jesse M

    Member

  • Members
  • PipPip
  • 32 posts

Posted 04 October 2005 - 05:13 PM

Can anyone point my way to a decent C parser/tokenizer written in C? That may sound like an odd request, I know. Thanks in advance.
FRAG THE PLANET
Ed Helms: Alcohol causes problems and guns solve problems. I don't see why you can't have guns in bars.
Other guy: That's a stupid idea.
Ed Helms: Yeah, if your a pussy.

#2 zavie

    Member

  • Members
  • PipPip
  • 91 posts

Posted 04 October 2005 - 05:54 PM

I don't know of any parser dedicated to C. But Lex/Yacc is the reference lexer/parser generator pair, and it is written in C. See also Flex/Bison.

#3 bladder

    DevMaster Staff

  • Members
  • PipPipPipPip
  • 1057 posts

Posted 05 October 2005 - 01:24 AM

Flex and Bison are very hard to find. I think they're archived somewhere on the xengine sf page. Instead of Flex and Bison, you can use Spirit from the Boost library, which IMO is way better and easier to use.

As for the actual C parser written in C... don't know of any.

#4 Ed Mack

    Senior Member

  • Members
  • PipPipPipPip
  • 1239 posts

Posted 05 October 2005 - 05:51 AM

http://www.gnu.org/software/flex/
http://www.gnu.org/software/bison/

What's hard to find about them?

#5 .oisyn

    DevMaster Staff

  • Moderators
  • 1842 posts

Posted 05 October 2005 - 11:01 AM

Or use ANTLR, an LL(k) parser generator that supports EBNF syntax. It can only output C++ code though (among other languages, just not C).
C++ addict
-
Currently working on: the 3D engine for Tomb Raider.

#6 Jesse M

    Member

  • Members
  • PipPip
  • 32 posts

Posted 05 October 2005 - 04:23 PM

GCC is written in C, correct? How difficult would it be to tear out the parser framework and form a library with it? Does anybody think it would be superfluous to just write the whole project myself?

#7 .oisyn

    DevMaster Staff

  • Moderators
  • 1842 posts

Posted 05 October 2005 - 09:22 PM

If you want your code to be GPL'ed, that's an option, yes. I personally don't like GPL. At all :)

#8 bladder

    DevMaster Staff

  • Members
  • PipPipPipPip
  • 1057 posts

Posted 06 October 2005 - 02:23 PM

Sorry, I meant flex++ and bison++ (the C++ versions of flex and bison).

#9 SamuraiCrow

    Senior Member

  • Members
  • PipPipPipPip
  • 459 posts

Posted 15 October 2005 - 02:52 AM

Hmm... VBCC is written entirely in C and isn't GPL, but unfortunately it is closed source and I haven't seen the source since version 0.4.

My next-best suggestion is to look into writing a trie map structure for the tokens: check the character type on the first tier of the trie, and have every token terminate with a null. Essentially, cascade a partial trie (indexed by all of the ASCII characters) into a full trie for all of the multi-character tokens. It's tricky, but a lookup can be done in what is effectively constant time, since the work is bounded by the length of the longest token and not by the number of tokens.
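To make the trie idea a bit more concrete, here is a minimal sketch of one way it could look. This is only my reading of the suggestion, not necessarily the exact layout described: every node has 128 child slots (one per ASCII code), and a lookup walks the input while remembering the longest token seen so far. The names `TrieNode`, `trie_new`, `trie_insert`, `trie_match` and `trie_free` are made up for illustration, and the token ids are whatever nonzero integers you choose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One trie node: a child slot per ASCII code, plus the id of the token
 * that ends at this node (0 means "no token ends here"). */
typedef struct TrieNode {
    struct TrieNode *child[128];
    int token;
} TrieNode;

/* Allocate an empty node; returns NULL if allocation fails. */
TrieNode *trie_new(void) {
    TrieNode *n = malloc(sizeof *n);
    if (n != NULL) {
        for (size_t i = 0; i < 128; ++i) n->child[i] = NULL;
        n->token = 0;
    }
    return n;
}

/* Add a token spelling (ASCII only) with a nonzero id.
 * Returns 0 on success, -1 on a non-ASCII byte or allocation failure. */
int trie_insert(TrieNode *root, const char *text, int token) {
    assert(root != NULL && text != NULL && token != 0);
    TrieNode *n = root;
    for (; *text != '\0'; ++text) {
        unsigned char c = (unsigned char)*text;
        if (c >= 128) return -1;
        if (n->child[c] == NULL) {
            n->child[c] = trie_new();
            if (n->child[c] == NULL) return -1;
        }
        n = n->child[c];
    }
    n->token = token;
    return 0;
}

/* Longest match at the start of src. Returns the token id (0 if none)
 * and stores the matched length in *len. The loop runs at most as many
 * times as the longest token has characters. */
int trie_match(const TrieNode *root, const char *src, size_t *len) {
    assert(root != NULL && src != NULL && len != NULL);
    const TrieNode *n = root;
    int best = 0;
    size_t i = 0;
    *len = 0;
    while (src[i] != '\0') {
        unsigned char c = (unsigned char)src[i];
        if (c >= 128 || n->child[c] == NULL) break;
        n = n->child[c];
        ++i;
        if (n->token != 0) {   /* a complete token ends here */
            best = n->token;
            *len = i;
        }
    }
    return best;
}

/* Release a node and everything below it. */
void trie_free(TrieNode *n) {
    if (n == NULL) return;
    for (size_t i = 0; i < 128; ++i) trie_free(n->child[i]);
    free(n);
}
```

With `<`, `<<` and `<<=` inserted, matching against `"<<= 1"` picks `<<=` and not `<`, which is the "maximal munch" behaviour a C tokenizer needs for operators. Identifiers, numbers and string literals would still need their own handling; this only covers fixed spellings such as operators and keywords.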

#10 roxtar

    Member

  • Members
  • PipPip
  • 94 posts

Posted 15 October 2005 - 03:38 AM

What do you need the parser for? If you are looking for parser generators and lexical analyzer generators, there are quite a few of them, as mentioned in the previous posts. If you are going to attempt to write your own, I suggest that you do a recursive descent parser for starters, as it is the easiest to understand and implement (IMHO). It has certain disadvantages though; for example, it cannot directly parse left-recursive grammars. Try getting hold of the dragon book, if you can.
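For a feel of what recursive descent looks like, here is a small sketch for integer arithmetic (nowhere near a full C grammar). It also shows the usual way around the left-recursion problem: a rule like `expr -> expr '-' term` is rewritten as `expr -> term ('-' term)*` and handled with a loop, which keeps subtraction left-associative. All the names here (`Parser`, `parse_expression` and the helpers) are invented for this example, and there is no overflow checking.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Parser state: current position in the input and an error flag. */
typedef struct {
    const char *p;
    int error;
} Parser;

static void skip_ws(Parser *ps) {
    while (isspace((unsigned char)*ps->p)) ps->p++;
}

static long parse_expr(Parser *ps);

/* factor -> NUMBER | '(' expr ')' */
static long parse_factor(Parser *ps) {
    skip_ws(ps);
    if (*ps->p == '(') {
        ps->p++;
        long v = parse_expr(ps);
        skip_ws(ps);
        if (*ps->p == ')') ps->p++;
        else ps->error = 1;            /* missing closing parenthesis */
        return v;
    }
    if (isdigit((unsigned char)*ps->p)) {
        char *end;
        long v = strtol(ps->p, &end, 10);
        ps->p = end;
        return v;
    }
    ps->error = 1;                     /* unexpected character or end of input */
    return 0;
}

/* term -> factor (('*' | '/') factor)*
 * The loop replaces the left-recursive rule term -> term '*' factor. */
static long parse_term(Parser *ps) {
    long v = parse_factor(ps);
    for (;;) {
        skip_ws(ps);
        char op = *ps->p;
        if (op != '*' && op != '/') return v;
        ps->p++;
        long rhs = parse_factor(ps);
        if (op == '*') v *= rhs;
        else if (rhs != 0) v /= rhs;
        else ps->error = 1;            /* division by zero */
    }
}

/* expr -> term (('+' | '-') term)* */
static long parse_expr(Parser *ps) {
    long v = parse_term(ps);
    for (;;) {
        skip_ws(ps);
        char op = *ps->p;
        if (op != '+' && op != '-') return v;
        ps->p++;
        long rhs = parse_term(ps);
        v = (op == '+') ? v + rhs : v - rhs;
    }
}

/* Parse and evaluate a whole string.
 * Returns 0 and stores the value in *out on success, -1 on a syntax error. */
int parse_expression(const char *src, long *out) {
    assert(src != NULL && out != NULL);
    Parser ps = { src, 0 };
    long v = parse_expr(&ps);
    skip_ws(&ps);
    if (ps.error || *ps.p != '\0') return -1;
    *out = v;
    return 0;
}
```

Calling `parse_expression("8 - 3 - 2", &v)` stores 3 in `v`, i.e. it evaluates as (8 - 3) - 2. Each grammar rule maps to one function, which is why this style is easy to follow; a real C parser would build a syntax tree instead of evaluating on the fly.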

#11 eddie

    Senior Member

  • Members
  • PipPipPipPip
  • 751 posts

Posted 28 October 2005 - 05:16 PM

Jesse, as others have mentioned, I recommend bison/flex. I'm not sure about the license that comes with code generated *by* bison/flex (I've only used it in companies that use it for internal tools, so I don't believe we had to worry about licensing restrictions... IANAL tho), so you should probably check that out.

That said, roxtar asks a very intelligent question. What are you attempting to do? There might be an easier way of going about it if we know your situation or goal, as well as your constraints.

Also, there's a site here that has the grammar for C in Lex/Yacc: http://www.lysator.l...c-17.html#17-25. At least that should cut some work out for you.




