Merging XML Files
Started by CrazyDef, Aug 28 2010 12:18 PM
14 replies to this topic
#1
Posted 28 August 2010 - 12:18 PM
Hi,
I realise that many many people in the games industry despise XML as a file format, myself included, but sometimes it really is unavoidable. (Visual Studio project files, anyone?)
Equally unavoidable is the necessity to compare or merge XML files - especially if your team is using some form of SCM software.
Finally being fed up with trying to resolve thousands of artificial conflicts due to the merge tool I was using treating XML as a regular text file, I decided to do something about it. Project: Merge is a tool that can compare and merge XML files *as* XML and not as text. More information can be found on the Project: Merge website at http://www.projectmerge.com
I hope some of you will find it useful.
James
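To illustrate what comparing XML "as XML and not as text" can mean, here is a minimal Python sketch. It is not how Project: Merge works internally (the post does not say); the helper names `normalize` and `same_xml` and the sample documents are invented for illustration. It assumes that attribute order and whitespace-only indentation carry no meaning, and it ignores mixed content (text that follows a child element).

```python
import xml.etree.ElementTree as ET


def normalize(elem):
    """Return a hashable, formatting-independent form of an element."""
    # Indentation shows up as whitespace-only text, so strip it.
    text = (elem.text or "").strip()
    # Attribute order is not significant in XML, so sort the attributes.
    attrs = tuple(sorted(elem.attrib.items()))
    children = tuple(normalize(child) for child in elem)
    return (elem.tag, attrs, text, children)


def same_xml(a, b):
    """True if two XML strings describe the same element tree."""
    return normalize(ET.fromstring(a)) == normalize(ET.fromstring(b))


# Hypothetical sample documents: same content, different layout.
compact = '<project name="game" version="2"><file path="main.cpp"/></project>'
reformatted = (
    '<project version="2" name="game">\n'
    '    <file path="main.cpp" />\n'
    '</project>\n'
)
# Same layout as `compact`, but the version attribute really changed.
changed = '<project name="game" version="3"><file path="main.cpp"/></project>'

print(same_xml(compact, reformatted))
print(same_xml(compact, changed))
```

A line-based diff would flag every line of `reformatted` as different from `compact`, while the tree comparison only reports the real change in `changed`.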
#2
Posted 28 August 2010 - 01:45 PM
I find it useful, and I think most developers *love* XML because of the clear advantages it brings. Will give your program a try!
#3
Posted 28 August 2010 - 05:27 PM
Cool! Sounds like a very useful thing to have around.
reedbeta.com - developer blog, OpenGL demos, and other projects
#4
Posted 28 August 2010 - 10:13 PM
Nice :) I'm not really in favour of XML, but I do have to work with it once in a while, and normal SVN compare just doesn't work if someone used that nasty autoformatting option of Visual Studio -.-
Cheers,
- Wernaeh
Some call me mathematician, some just call me computer guy. Yet, I prefer the term professional weirdo :)
#5
Posted 30 August 2010 - 12:55 AM
I am firmly in the camp of "XML is an Abomination upon all that is Good and Binary". 
Too verbose for computers, and too unreadable for humans to use. It's got to be one of the Top 10 examples of scope creep in development. The only thing it had going for it was, for some reason unknown to me, a huge vendor push by the big guns like Microsoft, Sun, etc., which, by virtue of the plethora of tools made for it, propelled it to a level it should never have reached.
Anyone want a slightly-used soapbox?
Hyperbole is, like, the absolute best, most wonderful thing ever! However, you'd be an idiot to not think dogmatism is always bad.
#7
Posted 30 August 2010 - 09:33 AM
JarkkoL said:
It's just too slow and fat format for the purpose.
The argument is mostly invalid. The format is indeed "fat", but nobody expects you to embed any actual data in it. You could do that, yes, but you could also just include a description of the data held in other (binary?) files, for example a description of your model with the mesh/texture data in separate files. The sole purpose of XML was, and always will be, to make the life of the developer/user easier, and I think it does so greatly. Parsing text is somewhat slow, yes, but computers are damn fast and frankly nobody cares unless millions of files are parsed during run-time. Why are all the gigahertz there if not to make our lives easier? And just don't tell me you don't see the advantages of XML and domain-specific languages as a whole.
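A rough Python sketch of that split, with the XML holding only a description and the bulk data living in a separate binary file. Everything here is an assumption made for illustration: the `model`/`mesh` element names, the `file`, `format` and `count` attributes, the file names, and the raw little-endian float32 layout.

```python
import os
import struct
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical bulk data: three vertices of a triangle as x, y, z floats.
vertices = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0]

workdir = tempfile.mkdtemp()
mesh_path = os.path.join(workdir, "triangle.mesh")
with open(mesh_path, "wb") as f:
    # Raw little-endian 32-bit floats with no header (assumed layout).
    f.write(struct.pack("<%df" % len(vertices), *vertices))

# The XML only describes the model and points at the binary file.
manifest = ET.Element("model", name="triangle")
ET.SubElement(manifest, "mesh", file="triangle.mesh", format="float32",
              count=str(len(vertices)))
manifest_path = os.path.join(workdir, "triangle.xml")
ET.ElementTree(manifest).write(manifest_path)


def load_model(xml_path):
    """Parse the small XML description, then bulk-read the binary payload."""
    root = ET.parse(xml_path).getroot()
    mesh = root.find("mesh")
    count = int(mesh.get("count"))
    data_path = os.path.join(os.path.dirname(xml_path), mesh.get("file"))
    with open(data_path, "rb") as f:
        data = struct.unpack("<%df" % count, f.read(4 * count))
    return root.get("name"), list(data)


name, loaded = load_model(manifest_path)
print(name, len(loaded))
```

The text parsing cost is then tied to the size of the description, not to the size of the mesh.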
#9
Posted 30 August 2010 - 09:50 AM
Yes, in one very narrow area of computer software people care a lot about loading speeds, but then again these people must be idiots, as waiting 2 seconds more won't kill anyone and probably results in a cheaper product, since developers have to work less on formats, libraries, etc. I don't really believe it's more than 2 seconds, as analysing a whole day of a mobile operator's logs (XML) takes about 4 minutes on a single desktop computer.
#11
Posted 30 August 2010 - 10:09 AM
JarkkoL said:
You obviously know nothing about performance requirements for game development so I leave it at that.
I know most of it, sadly, as I quite often get forced to ignore obvious, useful technologies because somebody else (the money giver!) is dictating the rules. For me as a developer and computer scientist, having a single customizable format is clearly better than having tons of formats, just as having a single customizable language would be better than having a huge variety of languages. It would clearly make my job simpler and the product easier to develop. Another benefit: we could translate from one format to another at almost no cost. I know the practical side of things does not agree with me, but we have to slowly work in that direction. You are right that the user just wants his game to start immediately (not like the initial version of Postal 2 back then). He also wants his operating system to boot right away, but when this requirement is met the user starts complaining that the product is buggy. Well, of course it's buggy: meeting the other requirements made the design highly complex. Searching for errors and bugs is now many times harder.
#12
Posted 30 August 2010 - 01:18 PM
Mihail121 said:
having a single customizable format is clearly better than having tons of formats just as having a single customizable language would be better than having a huge variety of languages.
Actually, the classic argument here is "best-of-breed" versus "jack-of-all-trades". There are general trade-offs no matter where this battle is applied, and to prefer one over the other is dangerous for the professional.
Mihail121 said:
Parsing text is somewhat slow, yes, but computers are damn fast and frankly nobody cares except if millions of files are parsed during run-time.
Well, that's a pet peeve of mine. Many fellow developers think I've got endless RAM, GBs and GHz to spare, so they build their bloatware without respect for my resources. And while one such program is fine, two, three... several such programs later, my system's a mess because of it.
Mihail121 said:
And just don't tell me you don't see the advantages of XML and domain-specific languages as a whole.
Just one real advantage to XML: the plethora of tools and libraries. The often-quoted biggie of human-readability is a myth, and vendor independence is another.
PS: DSLs are orthogonal to things like markup languages. You can use XML as the syntax for a DSL, but they aren't the same thing. In fact, a DSL goes contrary to one of your expressed preferences, in my first quote, where you state that you prefer general formats and languages.
Hyperbole is, like, the absolute best, most wonderful thing ever! However, you'd be an idiot to not think dogmatism is always bad.
#13
Posted 30 August 2010 - 01:55 PM
Mihail121 said:
For me as a developer and computer scientist, having a single customizable format is clearly better than having tons of formats just as having a single customizable language would be better than having a huge variety of languages.
The thing is, this is a totally solved problem for me. I don't have to use XML for any of my data because I have tools to deal with the data efficiently in binary format. I don't need any artificial separation of bulk vs. meta data, which isn't as clear-cut as you think when things get more complicated. And like alphadog kind of said, having data in XML has nothing to do with apps being easier to debug. When you've got tons of data it doesn't matter if it's in binary or human-readable form, since you can't make sense of it anyway, and that's from someone who has done his fair share of COLLADA parsing ;)
#14
Posted 30 August 2010 - 02:53 PM
I don't see why there has to be a strong divide between xml and binary formats. The most important thing XML has is hierarchical data presentation and well defined rules for character encodings, tag names etc. and these do not necessarily require a text-based format.
What I have been doing for years is to have a high-level reader/writer interface for writing structured "xml-like" data, with a normal XML backend and a binary backend behind it. For development purposes plain XML is often the best choice because it is easier to debug, and the binary format is then used for release builds or whenever faster loading times are needed. I keep all the data in the version control system in XML form to make debugging and changes as easy as possible, and it can be converted to the binary format by a simple command-line tool (absolutely no source code changes are required in the saving/loading code).
I'd definitely recommend this approach to all of you since it's working really well, and the feedback I have gotten over all these years has been 100% positive. It also avoids flamewars inside companies, since people can see there is support for both camps.
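A minimal Python sketch of that kind of setup, under stated assumptions: the poster's actual interface is not shown in the thread, so the `Node`, `XmlBackend` and `BinaryBackend` names, the string-only attributes, and the length-prefixed binary layout are all invented here for illustration.

```python
import struct
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    """Backend-neutral "xml-like" node: a name, string attributes, children."""
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


class XmlBackend:
    def dumps(self, node):
        return ET.tostring(self._to_elem(node))

    def loads(self, data):
        return self._from_elem(ET.fromstring(data))

    def _to_elem(self, node):
        elem = ET.Element(node.name, node.attrs)
        for child in node.children:
            elem.append(self._to_elem(child))
        return elem

    def _from_elem(self, elem):
        return Node(elem.tag, dict(elem.attrib),
                    [self._from_elem(child) for child in elem])


class BinaryBackend:
    """Hypothetical layout: length-prefixed UTF-8 strings, depth-first."""

    def dumps(self, node):
        out = bytearray()
        self._write(node, out)
        return bytes(out)

    def loads(self, data):
        node, _ = self._read(data, 0)
        return node

    @staticmethod
    def _put_str(s, out):
        raw = s.encode("utf-8")
        out.extend(struct.pack("<I", len(raw)))
        out.extend(raw)

    @staticmethod
    def _get_str(data, pos):
        (n,) = struct.unpack_from("<I", data, pos)
        pos += 4
        return data[pos:pos + n].decode("utf-8"), pos + n

    def _write(self, node, out):
        self._put_str(node.name, out)
        out.extend(struct.pack("<II", len(node.attrs), len(node.children)))
        for key, value in node.attrs.items():
            self._put_str(key, out)
            self._put_str(value, out)
        for child in node.children:
            self._write(child, out)

    def _read(self, data, pos):
        name, pos = self._get_str(data, pos)
        n_attrs, n_children = struct.unpack_from("<II", data, pos)
        pos += 8
        attrs = {}
        for _ in range(n_attrs):
            key, pos = self._get_str(data, pos)
            value, pos = self._get_str(data, pos)
            attrs[key] = value
        children = []
        for _ in range(n_children):
            child, pos = self._read(data, pos)
            children.append(child)
        return Node(name, attrs, children), pos


# Hypothetical game data; the calling code never touches a concrete format.
level = Node("level", {"name": "intro"}, [
    Node("spawn", {"x": "10", "y": "4"}),
    Node("light", {"color": "warm"}),
])

xml_bytes = XmlBackend().dumps(level)    # form kept in version control
bin_bytes = BinaryBackend().dumps(level)  # form shipped in release builds
print(XmlBackend().loads(xml_bytes) == level)
print(BinaryBackend().loads(bin_bytes) == level)
```

The command-line converter mentioned above would then amount to `BinaryBackend().dumps(XmlBackend().loads(xml_bytes))`, with the saving/loading code only ever seeing `Node` objects.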
#15
Posted 30 August 2010 - 03:01 PM
To both of you:
Yes, this is what I meant: XML is favourable because of all the tools available out there. And yes, it's nice for describing specialized languages. I know developers nowadays tend to think that we (users) have unlimited resources. Using XML to store data on cheap storage (e.g. magnetic tape) is not the same as using highly efficient structures during run-time. I'm not idiot enough to deliver Java Desktop Environment or NetBeans-like performance and resource utilization to my users, especially on mobile devices where power efficiency is required. But the many available tools also help us to transform formats easily, e.g. Office 2007 to OpenOffice. Of course the two software suites use different formats, but the transformation is not that difficult, as they share a common ground.
To summarize: I prefer customizable data description languages for the persistent storage of data. I respect and share the run-time concerns of the users myself, having a computer with a single GB of memory. I will never ever ever recommend parsing during run-time.