lav_coyote25 wrote:
hmmmm... k - question!
these txt files - weapons etc - would/could they be incorporated into what is being attempted (tech tree as part of the gui)?
as i said its just a question... needed to be asked. ;D
no idea what you mean...
karmazilla wrote:
I cringe at the idea of a homegrown XML parser

libxml is a very small dependency - I'm looking at the Ubuntu package right now, and libxml1 only depends on libc6 and zlib1g. Besides, because of the SGML legacy in XML, there
are some very hairy syntax rules in XML, like DTDs, entity resolving, xml:id and CDATA sections. Plus, if we're going to create XSD schemas and validate against them, then libxml
might (I'm not entirely sure) have some functionality to solve that.
i entirely agree with karma. if you create your own parser, you want it to accept
any valid xml, even if you choose to ignore namespace declarations, processing instructions, attributes, and so on. it's not worth the effort to write a new parser, because you'll spend 3 months just making it conform to the spec and accept valid xml. if all you really want is basic tags, with pretty much no support for anything else, then go ahead and make a parser (you'll save a lot of time ignoring the stuff you won't use), but don't call it xml (call it wzml or something), because if you call it xml, people will get pissed when they can't use the output from their xml editor.
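just to make that concrete: a "basic tags only" parser is about this much code -- a hand-rolled scanner that recognizes <tag> and </tag> and nothing else (no attributes, no entities, no comments, no cdata). purely a hypothetical sketch, the names and the format are made up:

[code]
/* hypothetical sketch of a "tags only" scanner -- this is what a homegrown
 * format ("wzml") buys you, and also everything it gives up: no attributes,
 * no entities, no comments, no CDATA, no namespaces. */
#include <stdio.h>
#include <string.h>

/* called for every <name> and </name> found in the buffer */
typedef void (*tag_cb)(const char *name, int is_closing);

static void scan_tags(const char *buf, tag_cb cb)
{
    const char *p = buf;
    while ((p = strchr(p, '<')) != NULL)
    {
        const char *end = strchr(p, '>');
        if (!end)
            break;                        /* unterminated tag: just stop */

        int closing = (p[1] == '/');
        const char *name = p + (closing ? 2 : 1);

        char tag[64];
        size_t len = (size_t)(end - name);
        if (len >= sizeof(tag))
            len = sizeof(tag) - 1;
        memcpy(tag, name, len);
        tag[len] = '\0';

        cb(tag, closing);
        p = end + 1;
    }
}

static void print_tag(const char *name, int is_closing)
{
    printf("%s%s\n", is_closing ? "/" : "", name);
}

int main(void)
{
    scan_tags("<weapon><damage>99</damage></weapon>", print_tag);
    return 0;
}
[/code]

which is fine for a homegrown format, but it obviously isn't xml, and the first attribute, comment or doctype a real xml editor emits will trip it up.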
karmazilla wrote:
And, if performance in XML parsing is an issue, then wouldn't you expect the lads and lassies behind libxml to know a thing or two about it?
the libxml people aren't the people behind xml, so they can't really decide such things, but yes, they do know a thing or two about it: aside from keeping up with the frequently changing specs, most of their time goes into trying to shave off every last calculation. in any case, if it turned out to be "slow", what are they going to do? turn libxml into an audio resampling library? most of the stuff people are familiar with when it comes to sgml derivatives is html, which generally has far more content than it does meta-data (the tags), so of course it parses quickly. storing game data would put roughly 3-4x more bytes into the metadata than into the actual data, and the performance curve for that is very different.
in case i need to refresh memories: what's fast and ideal for machines is exactly the opposite of what's fast and ideal for humans. csv files are much closer to the machine side: if you don't allow for meaningless whitespace, they're very fast for a machine to parse, but painfully slow for a human to understand. xml was never designed to be a "speedy alternative to the other stuff" -- it was designed for abstract compatibility: whether or not the content makes sense to a machine, the format definitely does, and the same goes for a human.
what most people don't know is that xml comes in two forms: the textual representation, and the post-parser tree that is usually stored in memory (the dom). both are as much a part of xml as the other, and you can convert from one to the other without loss of data -- this in-memory representation exists because parsing the xml text (taking into account all parts of the spec, such as namespaces, processing instructions, cdata sections, comments, and a few other little bits) *is* so damned slow. when you use dtds or schemas, the very next thing that happens after the parser goes through the text is that the entire document gets validated by iterating through the dom from start to finish... and if you're using xml schemas, the schema itself first has to be parsed and iterated so it can be validated against a hard-coded dtd and confirmed to be a valid schema, before the main xml document can be validated against it. every single xml resource you use goes through this process of being parsed and then validated against its dtd or schema, if any is present.
after all that, the dom is finally presented to the program, which usually iterates the entire dom once more. so every xml document that uses a schema gets iterated 3 times from start to finish, and there is no way to optimize this and cut out one or two iterations, because the xml spec is very clear on this point: the schema should not even try to validate an xml document that hasn't first been determined to be valid xml, and the document should not be presented to the program
until it is determined to be both valid xml and valid against any optional schemas. being draconian about this is really a good thing (document authors tend to slip through the cracks wherever they can if a format isn't draconian), but it does remove potential for optimization. so, if you use the dom approach and have a separate schema for each type of xml document, you end up with 6 implicit iterations for each xml data file used in warzone (3 for the schema, and 3 for the actual document).
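for reference, the dom + schema route looks roughly like this with libxml2 (libxml2, not the libxml1 package mentioned earlier -- the schema api only exists in version 2). the filenames are made up and i haven't compiled this against our tree, so treat it as a sketch:

[code]
/* sketch: dom parse + xsd validation with libxml2 (filenames are made up)
 * build: gcc dom_sketch.c $(xml2-config --cflags --libs) */
#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/xmlschemas.h>

int main(void)
{
    int ret = 1;

    /* the hidden iterations happen behind these calls: parse the schema,
     * validate the schema itself, then parse and validate the document. */
    xmlSchemaParserCtxtPtr pctxt = xmlSchemaNewParserCtxt("weapons.xsd");
    xmlSchemaPtr schema = xmlSchemaParse(pctxt);

    xmlDocPtr doc = xmlReadFile("weapons.xml", NULL, 0);

    if (schema && doc)
    {
        xmlSchemaValidCtxtPtr vctxt = xmlSchemaNewValidCtxt(schema);
        if (xmlSchemaValidateDoc(vctxt, doc) == 0)
        {
            /* only now does the program get to walk the dom itself */
            xmlNodePtr root = xmlDocGetRootElement(doc);
            printf("root element: %s\n", (const char *)root->name);
            ret = 0;
        }
        xmlSchemaFreeValidCtxt(vctxt);
    }

    if (doc) xmlFreeDoc(doc);
    if (schema) xmlSchemaFree(schema);
    if (pctxt) xmlSchemaFreeParserCtxt(pctxt);
    xmlCleanupParser();
    return ret;
}
[/code]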
now you have sax: it was created because people working with xml realized that typical xml parsing used way too much memory and had too much overhead (took too long), and that most features of the dom went unused, since most programs just iterated through the dom to grab their data and then freed it from memory. so david megginson, who led the development of sax on the xml-dev mailing list, made an api that doesn't validate the document before the program gets access: it instead fires off a callback every time it encounters a piece of data (such as "start of element", "end of element", an attribute, or a namespace declaration). the program can choose to ignore any kind of data, things like the logical hierarchy are something the receiving program must keep track of itself if they matter, and if any errors crop up, the program can choose to ignore them too. downside: xml documents now only have as much logical structure as the receiving program chooses to give them. upside: the same as the downside, plus it's really, really fast compared to the dom approach, and it only involves a single iteration. other downside: you can't validate the document without reinventing the wheel, but that isn't of great concern here.
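with libxml2's sax interface that turns into something like this: you fill in a handler struct with only the callbacks you care about and leave the rest NULL, and bookkeeping like the element depth is yours to do. again just a rough sketch, filename made up:

[code]
/* sketch: sax parse with libxml2 -- a single pass, no validation */
#include <stdio.h>
#include <string.h>
#include <libxml/parser.h>

static int depth = 0;   /* logical structure is now our own bookkeeping */

static void on_start(void *ctx, const xmlChar *name, const xmlChar **atts)
{
    printf("%*s<%s>\n", depth * 2, "", (const char *)name);
    depth++;
    (void)ctx; (void)atts;   /* we simply ignore attributes here */
}

static void on_end(void *ctx, const xmlChar *name)
{
    depth--;
    (void)ctx; (void)name;
}

static void on_chars(void *ctx, const xmlChar *ch, int len)
{
    /* character data arrives in chunks; a real loader would accumulate it */
    (void)ctx; (void)ch; (void)len;
}

int main(void)
{
    xmlSAXHandler sax;
    memset(&sax, 0, sizeof(sax));     /* unused callbacks stay NULL */
    sax.startElement = on_start;
    sax.endElement   = on_end;
    sax.characters   = on_chars;

    /* one pass over the file, callbacks fired as data is encountered */
    return xmlSAXUserParseFile(&sax, NULL, "weapons.xml") == 0 ? 0 : 1;
}
[/code]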
sax is still nowhere near as fast as the scanf method, but it would take that 50x longer processing time and cut it down to about 5-7x.
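(for comparison, the "scanf method" is basically one sscanf call per line against a fixed column layout, something like this -- the columns here are made up, not the real stats format:)

[code]
/* sketch: the "scanf method" -- one fixed-format line per weapon.
 * the column layout here is invented for illustration. */
#include <stdio.h>

int main(void)
{
    const char *line = "MG1Mk1,20,3,500";

    char name[64];
    int damage, rof, range;

    /* no structure to discover: the format string *is* the format */
    if (sscanf(line, "%63[^,],%d,%d,%d", name, &damage, &rof, &range) == 4)
    {
        printf("%s: dmg %d, rof %d, range %d\n", name, damage, rof, range);
        return 0;
    }
    return 1;
}
[/code]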
i'd say that if we do use xml, then we really don't need an external form of validation, such as a schema (internal validation is just as good for this kind of use), and should go with sax. sax probably would be fast enough, and if not, as watermelon said, we could switch to something faster, or as kamaze said, we could just parse once and cache all results for future use (using timestamp checks, of course).
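the parse-once-and-cache idea would boil down to a freshness check like this before deciding whether to reparse -- just a sketch using posix stat(), with made-up paths:

[code]
/* sketch: reuse a cached binary dump unless the xml is newer (posix stat) */
#include <stdio.h>
#include <sys/stat.h>

/* returns 1 if cache_path exists and is at least as new as xml_path */
static int cache_is_fresh(const char *xml_path, const char *cache_path)
{
    struct stat xml_st, cache_st;

    if (stat(xml_path, &xml_st) != 0 || stat(cache_path, &cache_st) != 0)
        return 0;                        /* missing file: must (re)parse */

    return cache_st.st_mtime >= xml_st.st_mtime;
}

int main(void)
{
    if (cache_is_fresh("weapons.xml", "weapons.cache"))
        printf("load the cached binary form\n");
    else
        printf("parse weapons.xml and rewrite weapons.cache\n");
    return 0;
}
[/code]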