

Merging XML Files


14 replies to this topic

#1 CrazyDef

    New Member

  • Members
  • 1 posts

Posted 28 August 2010 - 12:18 PM

Hi,

I realise that many many people in the games industry despise XML as a file format, myself included, but sometimes it really is unavoidable. (Visual Studio project files, anyone?)

Equally unavoidable is the necessity to compare or merge XML files - especially if your team is using some form of SCM software.

Finally being fed up with trying to resolve thousands of artificial conflicts due to the merge tool I was using treating XML as a regular text file, I decided to do something about it. Project: Merge is a tool that can compare and merge XML files *as* XML and not as text. More information can be found on the Project: Merge website at http://www.projectmerge.com
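To illustrate the difference between comparing XML as text and comparing it as XML, here is a minimal Python sketch. It is not how Project: Merge works internally (the post does not say); it only assumes that whitespace-only text and attribute order should not count as differences. The `normalize` and `xml_equal` helpers and the sample project snippet are made up for the example.

```python
import xml.etree.ElementTree as ET


def normalize(elem):
    """Reduce an element to a nested tuple that ignores formatting noise.

    Assumption for this sketch: surrounding whitespace in text and the
    order of attributes carry no meaning.
    """
    text = (elem.text or "").strip()
    tail = (elem.tail or "").strip()
    attrs = tuple(sorted(elem.attrib.items()))
    children = tuple(normalize(child) for child in elem)
    return (elem.tag, attrs, text, tail, children)


def xml_equal(a: str, b: str) -> bool:
    """Compare two XML documents by structure rather than by characters."""
    return normalize(ET.fromstring(a)) == normalize(ET.fromstring(b))


# The same hypothetical project file before and after an auto-formatter ran.
left = '<Project><File path="a.cpp" type="source"/><File path="b.cpp" type="source"/></Project>'
right = """<Project>
    <File type="source" path="a.cpp" />
    <File type="source" path="b.cpp" />
</Project>"""

print(left == right)           # False: the text differs
print(xml_equal(left, right))  # True: the XML content is the same
```

A line-based diff would flag every line of the reformatted file, while the structural comparison reports no change at all.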

I hope some of you will find it useful.
James

#2 Mihail121

    Senior Member

  • Members
  • 1059 posts

Posted 28 August 2010 - 01:45 PM

I find it useful, and I think most developers *love* XML because of the clear advantages it brings. Will give your program a try!

#3 Reedbeta

    DevMaster Staff

  • Administrators
  • 5340 posts
  • Location: Santa Clara, CA

Posted 28 August 2010 - 05:27 PM

Cool! Sounds like a very useful thing to have around.
reedbeta.com - developer blog, OpenGL demos, and other projects

#4 Wernaeh

    Senior Member

  • Members
  • 368 posts

Posted 28 August 2010 - 10:13 PM

Nice :) I'm not really in favour of XML, but I do have to work with it once in a while, and a normal SVN compare just doesn't work if someone used that nasty autoformatting option of Visual Studio -.-

Cheers,
- Wernaeh
Some call me mathematician, some just call me computer guy. Yet, I prefer the term professional weirdo :)

#5 alphadog

    DevMaster Staff

  • Moderators
  • 1716 posts

Posted 30 August 2010 - 12:55 AM

I am firmly in the camp of "XML is an Abomination upon all that is Good and Binary". :)

Too verbose for computers, and too unreadable for humans to use. It's got to be one of the Top 10 examples of scope creep in development. The only thing it had going for it was, for some reason unknown to me, a huge vendor push by the big guns like Microsoft, Sun, etc., which propelled it to a level it should never have reached, by virtue of the plethora of tools made for it.

Anyone want a slightly-used soapbox? :)
Hyperbole is, like, the absolute best, most wonderful thing ever! However, you'd be an idiot to not think dogmatism is always bad.

#6 JarkkoL

    Senior Member

  • Members
  • 477 posts

Posted 30 August 2010 - 06:10 AM

XML has its uses, but I wouldn't build anything that extensively relies on XML that has to deal with a lot of data, like games. It's just too slow and fat format for the purpose.

#7 Mihail121

    Senior Member

  • Members
  • 1059 posts

Posted 30 August 2010 - 09:33 AM

JarkkoL said:

It's just too slow and fat format for the purpose.

The argument is mostly invalid. The format is indeed "fat", but nobody expects you to embed any actual data in it. You could do that, yes, but you could also just include a description of the data held in other (binary?) files. For example, a description of your model, with the mesh/texture data in separate files. The sole purpose of XML was and always will be to make the life of the developer/user easier, and I think it does so greatly. Parsing text is somewhat slow, yes, but computers are damn fast and frankly nobody cares except if millions of files are parsed during run-time. Why are all the gigahertz there if not to make our life easier? And just don't tell me you don't see the advantages of XML and domain-specific languages as a whole.
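The split described here, an XML description pointing at bulk data kept in binary files, might look roughly like the following Python sketch. Everything specific in it is an assumption made for illustration: the element and attribute names, the file names `crate.vtx` and `crate.dds`, and the vertex layout (three little-endian 32-bit floats per vertex).

```python
import struct
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical descriptor: small, readable, and free of bulk data.
DESCRIPTOR = """<model name="crate">
  <mesh file="crate.vtx" vertex_count="3" />
  <texture file="crate.dds" />
</model>"""


def load_model(descriptor_xml: str, base_dir: Path) -> dict:
    """Parse the XML description, then read the vertices from the binary file."""
    root = ET.fromstring(descriptor_xml)
    mesh = root.find("mesh")
    count = int(mesh.get("vertex_count"))
    raw = (base_dir / mesh.get("file")).read_bytes()
    # Assumed layout: count * 3 little-endian floats (x, y, z per vertex).
    floats = struct.unpack(f"<{count * 3}f", raw)
    vertices = [floats[i:i + 3] for i in range(0, len(floats), 3)]
    return {
        "name": root.get("name"),
        "vertices": vertices,
        "texture": root.find("texture").get("file"),
    }


# Write a tiny binary vertex file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    triangle = (0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0)
    (base / "crate.vtx").write_bytes(struct.pack("<9f", *triangle))
    model = load_model(DESCRIPTOR, base)

print(model["name"], len(model["vertices"]), model["texture"])
```

Only the few lines of description go through a text parser; the vertex data is read in a single `struct.unpack` call.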

#8 JarkkoL

    Senior Member

  • Members
  • 477 posts

Posted 30 August 2010 - 09:41 AM

That argument isn't invalid. Just have a look at COLLADA ;) People do care about load speeds and that's one of the places you can easily fail in TRC.

#9 Mihail121

    Senior Member

  • Members
  • 1059 posts

Posted 30 August 2010 - 09:50 AM

Yes, in one very narrow area of computer software people care a lot about loading speeds, but then again these people must be idiots, as waiting 2 seconds more won't kill anyone and probably results in a cheaper product, since developers have to work less on formats, libraries, etc. I don't really believe it's more than 2 seconds, as analysing a whole day of a mobile operator's logs (XML) takes about 4 minutes on a single desktop computer.

#10 JarkkoL

    Senior Member

  • Members
  • 477 posts

Posted 30 August 2010 - 09:53 AM

You obviously know nothing about performance requirements for game development so I leave it at that.

#11 Mihail121

    Senior Member

  • Members
  • 1059 posts

Posted 30 August 2010 - 10:09 AM

JarkkoL said:

You obviously know nothing about performance requirements for game development so I leave it at that.

Sadly, I know most of it, as I am quite often forced to ignore obvious, useful technologies because somebody else (money giver!) is dictating the rules. For me as a developer and computer scientist, having a single customizable format is clearly better than having tons of formats just as having a single customizable language would be better than having a huge variety of languages. It would clearly make my job simpler and the product easier to develop. Another benefit: we could translate from one format to another at almost no cost. I know the practical side of things does not agree with me, but we have to slowly work in that direction. You are right that the user just wants his game to start immediately (unlike the initial version of Postal 2 back then). He also wants his operating system to boot right away, but when this requirement is met the user starts complaining that the product is buggy. Well, of course it's buggy: meeting the other requirements made the design highly complex. Searching for errors and bugs is now many times harder.

#12 alphadog

    DevMaster Staff

  • Moderators
  • 1716 posts

Posted 30 August 2010 - 01:18 PM

Mihail121 said:

having a single customizable format is clearly better than having tons of formats just as having a single customizable language would be better than having a huge variety of languages.

Actually, the classic argument here is "best-of-breed" versus "jack-of-all-trades". There are general trade-offs no matter where this battle is applied, and to prefer one over the other is dangerous for the professional.

Mihail121 said:

Parsing text is somewhat slow, yes, but computers are damn fast and frankly nobody cares except if millions of files are parsed during run-time.

Well, that's a pet peeve of mine. Many fellow developers think I've got endless RAM, GBs and GHz to spare, so they build their bloatware without respect for my resources. And while one such program is fine, add a second, a third... a pile of such programs later, my system's a mess because of it.

Mihail121 said:

And just don't tell me you don't see the advantages of XML and domain-specific languages as a whole.

Just one real advantage to XML: the plethora of tools and libraries. Often-quoted biggies like human readability are a myth, and vendor independence is another.

PS: DSLs are orthogonal to things like markup languages. You can use XML as the syntax for a DSL, but they aren't the same thing. In fact, a DSL goes contrary to one of your expressed preferences, in my first quote, where you state that you prefer general formats and languages.
Hyperbole is, like, the absolute best, most wonderful thing ever! However, you'd be an idiot to not think dogmatism is always bad.

#13 JarkkoL

    Senior Member

  • Members
  • 477 posts

Posted 30 August 2010 - 01:55 PM

Mihail121 said:

For me as a developer and computer scientist, having a single customizable format is clearly better than having tons of formats just as having a single customizable language would be better than having a huge variety of languages.

Different languages have pros and cons compared to others. Wasn't it you who just said that you shouldn't overlook the benefits of DSLs? ;) Loading times for games are very critical, even more so if you've got streaming worlds where loading must happen during gameplay. If your loading times at the beginning of a level are too long, you don't pass TRC and you fail to ship the game (it's not a matter of convenience, it's a hard limit). This is a real issue for game developers, though when you read the data from DVD it's more a matter of the size of the data than of text parsing, and XML fails in that as well. Also, if loading / saving times of levels are long, it starts to eat into artist / designer productivity while developing games, so it's not only a matter of shipped games.

The thing is, this is a totally solved problem for me. I don't have to use XML for any of my data because I have tools to deal with the data efficiently in a binary format. I don't need any artificial separation of bulk vs. meta data, which isn't as clear as you think when things get more complicated. And like alphadog kind of said, having data in XML has nothing to do with apps being easier to debug. When you've got tons of data it doesn't matter if it's in binary or human-readable form since you can't make sense of it anyway, and that's from someone who has done his fair share of COLLADA parsing ;)

#14 juhnu

    Valued Member

  • Members
  • 292 posts

Posted 30 August 2010 - 02:53 PM

I don't see why there has to be a strong divide between XML and binary formats. The most important things XML has are hierarchical data presentation and well-defined rules for character encodings, tag names etc., and these do not necessarily require a text-based format.

What I have been doing for years is to have a high-level reader/writer interface for writing structured "XML-like" data, with a normal XML backend and a binary backend behind it. For development purposes plain XML is often the best choice for easier debugging, and the binary format is then used for release builds or when faster loading times are needed. All the data in the version control system I keep in XML form to make debugging and changes as easy as possible, and it can be converted to the binary format by a simple command-line tool (absolutely no source code changes required in the saving/loading code).
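A rough Python sketch of this kind of setup follows. It is not juhnu's actual code, which the post does not show; the `Node` type, the two backend classes, the `convert` helper and the binary layout (length-prefixed UTF-8 strings plus little-endian counts) are all assumptions chosen to keep the example short. Attribute values are assumed to be strings.

```python
import io
import struct
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    """Structured "XML-like" data: a name, string attributes, child nodes."""
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


class XmlBackend:
    """Text backend, convenient for version control and debugging."""

    def dumps(self, node: Node) -> bytes:
        return ET.tostring(self._to_elem(node))

    def loads(self, data: bytes) -> Node:
        return self._from_elem(ET.fromstring(data))

    def _to_elem(self, node):
        elem = ET.Element(node.name, dict(node.attrs))
        for child in node.children:
            elem.append(self._to_elem(child))
        return elem

    def _from_elem(self, elem):
        return Node(elem.tag, dict(elem.attrib),
                    [self._from_elem(child) for child in elem])


class BinaryBackend:
    """Compact backend using a made-up layout for this sketch."""

    def dumps(self, node: Node) -> bytes:
        out = io.BytesIO()
        self._write(out, node)
        return out.getvalue()

    def loads(self, data: bytes) -> Node:
        return self._read(io.BytesIO(data))

    @staticmethod
    def _put_str(out, text):
        raw = text.encode("utf-8")
        out.write(struct.pack("<I", len(raw)) + raw)

    @staticmethod
    def _get_str(src):
        (length,) = struct.unpack("<I", src.read(4))
        return src.read(length).decode("utf-8")

    def _write(self, out, node):
        self._put_str(out, node.name)
        out.write(struct.pack("<II", len(node.attrs), len(node.children)))
        for key, value in node.attrs.items():
            self._put_str(out, key)
            self._put_str(out, value)
        for child in node.children:
            self._write(out, child)

    def _read(self, src):
        name = self._get_str(src)
        n_attrs, n_children = struct.unpack("<II", src.read(8))
        attrs = {}
        for _ in range(n_attrs):
            key = self._get_str(src)
            attrs[key] = self._get_str(src)
        children = [self._read(src) for _ in range(n_children)]
        return Node(name, attrs, children)


def convert(data: bytes, src, dst) -> bytes:
    """What a command-line conversion tool would do at its core."""
    return dst.dumps(src.loads(data))


scene = Node("level", {"name": "intro"},
             [Node("entity", {"type": "crate", "x": "1.5"})])
xml_bytes = XmlBackend().dumps(scene)
bin_bytes = convert(xml_bytes, XmlBackend(), BinaryBackend())
print(BinaryBackend().loads(bin_bytes) == scene)  # True
```

Because both backends expose the same `dumps`/`loads` pair, the code that builds or consumes `Node` trees does not change when the storage format does.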

I'd definitely recommend this approach to all of you, since it's working really well and the feedback I have gotten during all these years has been 100% positive. It also avoids flamewars inside companies, since people can understand there is support for both camps.

#15 Mihail121

    Senior Member

  • Members
  • 1059 posts

Posted 30 August 2010 - 03:01 PM

To both of you:

Yes, this is what I meant: XML is favourable because of all the tools available out there. And yes, it's nice for describing specialized languages. I know developers nowadays tend to think that we (users) have unlimited resources. Using XML to store data on cheap storage (e.g. magnetic tape) is not the same as using highly efficient structures during run-time. I'm not such an idiot as to deliver Java Desktop Environment or NetBeans-like performance and resource utilization to my users, especially on mobile devices where power efficiency is required. But the many available tools also help us to transform formats easily, e.g. Office 2007 to OpenOffice. Of course the two software suites use different formats, but the transformation is not that difficult, as they share a common ground.

To summarize: I prefer customizable data description languages for the persistent storage of data. I respect and share the run-time concerns of the users myself, having a computer with a single GB of memory. I will never ever ever recommend parsing during run-time.




