

handling changes to structs


10 replies to this topic

#1 hunguptodry

    New Member

  • Members
  • PipPip
  • 31 posts

Posted 12 November 2007 - 03:48 AM

hi,

1) i deal mostly with arrays of structs. streaming them in/out of files in binary is very straightforward. this is good. i keep a few of these files around for testing.

2) things work well if i change my code without touching the structs.

3) once i modify the structs, all the files become unusable.

4) i need to write code to migrate the old files to the new versions. this is a major pain in the butt.

5) is there a better way to do this?

#2 Reedbeta

    DevMaster Staff

  • Administrators
  • 4979 posts
  • LocationBellevue, WA

Posted 12 November 2007 - 05:28 AM

Well, no matter what you do, if you want to support reading different versions of files, you'll have to write migration code unfortunately.

At the place I worked over the summer, we had a system where each file format had a major version and minor version. The files themselves were tagged with the version used to write them and the engine knew what the latest version numbers were. For small changes to the file formats we'd bump the minor version defined in the engine and write some migration code - the engine, upon reading the files, would see the version difference and trigger the migration code. Every now and then, or for larger changes, we'd bump the major version and remove all the built-up migration code from the last few minor bumps. The engine would refuse to load files from an earlier major version, so they'd have to be rebuilt from their source files (Maya files, texture bitmaps etc) using our toolchain. This rebuilding process was automatic though, so it wasn't a huge amount of pain (unless somebody broke something).

So, I guess for your case, if your files can be generated in a preprocess from some source whose format isn't changing, you can add version numbers and hook up an automatic build process that forces the files to be rebuilt when the version number in the application is higher than the version number in the file.
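
A minimal sketch of the version check described above, assuming a small header at the start of each file. The names `FileHeader`, `kMagic`, `kCurrentMajor`, `kCurrentMinor`, `LoadAction` and `classify` are all hypothetical and not taken from any particular engine:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical header written at the start of every data file.
struct FileHeader {
    std::uint32_t magic;         // identifies the file type
    std::uint16_t majorVersion;  // bumped for breaking changes
    std::uint16_t minorVersion;  // bumped for changes that ship with migration code
};

// Assumed values for the running application.
const std::uint32_t kMagic = 0x53545243;
const std::uint16_t kCurrentMajor = 2;
const std::uint16_t kCurrentMinor = 3;

enum class LoadAction { LoadDirectly, Migrate, Rebuild, Reject };

// Decide what to do with a file based on the version it was written with.
LoadAction classify(const FileHeader& h) {
    if (h.magic != kMagic) return LoadAction::Reject;                // not our file
    if (h.majorVersion < kCurrentMajor) return LoadAction::Rebuild;  // regenerate from source
    if (h.majorVersion > kCurrentMajor) return LoadAction::Reject;   // written by a newer build
    if (h.minorVersion < kCurrentMinor) return LoadAction::Migrate;  // run migration code
    if (h.minorVersion > kCurrentMinor) return LoadAction::Reject;   // written by a newer build
    return LoadAction::LoadDirectly;
}
```

A build script could call something like `classify` on each file and trigger the rebuild step whenever it returns `Rebuild`.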
reedbeta.com - developer blog, OpenGL demos, and other projects

#3 J22

    Member

  • Members
  • PipPip
  • 92 posts

Posted 12 November 2007 - 06:20 AM

If you have a proper data serialization system in place that supports versioning, the version conversion can be done automatically in most cases. What you need to do is store type information in the file and perform a conversion whenever that type information doesn't match the one in your executable. This isn't trivial to implement in C++, though, because C++ doesn't support class introspection. What engines like Unreal Engine do is use a meta language (Unreal Script) that defines the classes. Instead of trying to implement this yourself, I would recommend looking for existing libraries to do this for you.

#4 hunguptodry

    New Member

  • Members
  • PipPip
  • 31 posts

Posted 12 November 2007 - 09:48 AM

1) i don't really need to handle multiple versions.

2) i just need a way to make development less painful.

3) now, each time i make a small change to one or more structures, it is a struggle to get the files to load properly.

4) i just want the old members of the new structs to retain their old values when the file is loaded.

5) here is the band-aid i have come up with ...


struct oldstruct {
  float a,b,c,d;
  char filler[100];
};

struct newstruct {
  float a,b,c,d;
  int e,f,g;
  char filler[88];  // 100 minus three ints, assuming 4-byte int
};


not very elegant but better.
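
One way to make the filler trick less fragile is to let the compiler check the arithmetic. This is only a sketch, assuming a C++11 compiler; `OldStruct` and `NewStruct` are renamed copies of the structs above:

```cpp
#include <cassert>
#include <cstddef>

struct OldStruct {
    float a, b, c, d;
    char filler[100];
};

struct NewStruct {
    float a, b, c, d;
    int e, f, g;
    // Shrink the filler by exactly the space the new members take.
    char filler[100 - 3 * sizeof(int)];
};

// The build fails if a change alters the on-disk size of the record...
static_assert(sizeof(OldStruct) == sizeof(NewStruct),
              "record size changed: old files will no longer load");
// ...or if the new members do not start where the old filler began.
static_assert(offsetof(NewStruct, e) == offsetof(OldStruct, filler),
              "new members must occupy the start of the old filler");
```

Old files load into the new struct unchanged; `e`, `f` and `g` receive whatever bytes the old filler held, so zeroing the filler before writing gives the new members a predictable value.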

#5 Sol_HSA

    Senior Member

  • Members
  • PipPipPipPip
  • 482 posts
  • LocationNowhere whenever

Posted 12 November 2007 - 12:45 PM

Generally speaking, don't read and write structures directly; handle each member variable separately. This way you can handle byte order issues as well as versioning problems.

Slightly more work to set up, but much more convenient in the long run.
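
A minimal sketch of member-by-member serialization with a fixed byte order. `Particle`, `putU32`, `getU32`, `writeParticle` and `readParticle` are hypothetical names, and a 32-bit `float` is assumed:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Particle {   // hypothetical example struct
    float x;
    std::int32_t id;
};

static_assert(sizeof(float) == 4, "this sketch assumes a 32-bit float");

// Append a 32-bit value in little-endian order, whatever the host byte order.
void putU32(std::vector<unsigned char>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<unsigned char>((v >> (8 * i)) & 0xFF));
}

// Read a little-endian 32-bit value and advance the read position.
std::uint32_t getU32(const std::vector<unsigned char>& in, std::size_t& pos) {
    assert(pos + 4 <= in.size());
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(in[pos++]) << (8 * i);
    return v;
}

// Each member is written separately, so struct padding never reaches the file.
void writeParticle(std::vector<unsigned char>& out, const Particle& p) {
    std::uint32_t bits;
    std::memcpy(&bits, &p.x, sizeof(bits));   // reinterpret the float's bits
    putU32(out, bits);
    putU32(out, static_cast<std::uint32_t>(p.id));
}

Particle readParticle(const std::vector<unsigned char>& in, std::size_t& pos) {
    Particle p;
    std::uint32_t bits = getU32(in, pos);
    std::memcpy(&p.x, &bits, sizeof(bits));
    p.id = static_cast<std::int32_t>(getU32(in, pos));
    return p;
}
```

Because the reader is ordinary code rather than a single `fread`, it is also the natural place to branch on a version number and skip or default individual members.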
http://iki.fi/sol - my schtuphh

#6 Wernaeh

    Senior Member

  • Members
  • PipPipPipPip
  • 368 posts

Posted 12 November 2007 - 01:55 PM

Another way to get around this is to have some tool extract the declarations of your structures from your C++ files into appropriate variables, and then implement an automated saving / loading method. This has the advantage that you do not need to store all versioning information in every file you create.

Note:
On MS platforms, the compiler already creates appropriate information within the pdb databases, but this is not a reliable or compatible way to handle advanced rtti / reflection as required for the automated serialization process.


-------------------------------------------------------------------------------
Step 1: Exporting from CPP/H files.

For this, #define an empty macro StructGenStart. Use this to tag all structures that are loaded or saved at runtime, such as:


#define StructGenStart

/* ... */

StructGenStart struct myStructure
{
    int a, b, c;
    float otherValue;
};



-------------------------------------------------------------------------------
Step 2: Parsing exports

Create a preprocessing tool which links into your building process, specifically at the pre-build stage.

This tool parses all C++ files inside some directory, extracts all tagged structures into an in-memory representation of their layout, and then generates a CPP/H file pair from that representation, which is later compiled into the build. That is, it fills structures such as


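#include <string>
#include <vector>

class VarDescriptor
{
public:
    enum EVarType { eVT_Char, eVT_Integer, eVT_Float, eVT_Boolean /* , ... */ };

    VarDescriptor(const std::string& name, EVarType type, int offset)
        : sVarName(name), eVarType(type), iOffsetInStruct(offset) {}

    std::string sVarName;
    EVarType eVarType;
    int iOffsetInStruct;

    // optional: array count, pointer to type for typed variables, ...
};

class StructureDescriptor
{
public:
    std::string sName;
    int iLength;

    std::vector<VarDescriptor> vVariables;

    // optional: also parent, structors, member functions, ...
};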


with information extracted from the source files.

This tool need not contain any sophisticated parser or lexer. The lexer can easily be created with flex or similar helpers, and need only recognize identifiers, and some delimiters ( { and ; )

For parsing, skip everything until the macro #defined above is found. Then parse the identifier following the struct keyword as the structure name, seek the opening brace, and for each member parse first an identifier for the member type and then a list of comma-separated member names.

This puts some restrictions on what types you accept inside your structs, but makes the tool easier to write (especially, member structs and inheritance need to use a more sophisticated scheme, while pointers and double-inheritance get _very nasty_)
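
The parsing step described above can be sketched in a few dozen lines. This is a hand-rolled tokenizer rather than flex, and it handles only the simple "type name, name;" member form; `ParsedMember`, `ParsedStruct`, `tokenize` and `parseTagged` are hypothetical names. Comments, arrays, pointers and nested types in the input would need more work:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct ParsedMember { std::string type, name; };
struct ParsedStruct { std::string name; std::vector<ParsedMember> members; };

// Split source text into identifiers and single-character delimiters.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isalnum(c) || c == '_') {
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            tokens.push_back(src.substr(start, i - start));
        } else {
            if (!std::isspace(c)) tokens.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return tokens;
}

// Extract every struct that is tagged with StructGenStart.
std::vector<ParsedStruct> parseTagged(const std::string& src) {
    std::vector<std::string> t = tokenize(src);
    std::vector<ParsedStruct> result;
    for (std::size_t i = 0; i + 3 < t.size(); ++i) {
        if (t[i] != "StructGenStart" || t[i + 1] != "struct") continue;
        ParsedStruct s;
        s.name = t[i + 2];
        std::size_t j = i + 3;
        while (j < t.size() && t[j] != "{") ++j;   // seek the opening brace
        ++j;
        while (j < t.size() && t[j] != "}") {
            std::string type = t[j++];             // member type identifier
            while (j < t.size() && t[j] != ";" && t[j] != "}") {
                if (t[j] != ",") s.members.push_back({type, t[j]});
                ++j;
            }
            if (j < t.size() && t[j] == ";") ++j;  // end of this declaration
        }
        result.push_back(s);
        i = j;
    }
    return result;
}
```

Note that the `#define StructGenStart` line itself is skipped, because the token after it is not `struct`.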


-------------------------------------------------------------------------------
Step 3: Creating the CPP/H file pair including type information.

Create a new CPP/H file pair. For each of the tagged structures parsed before, a new static StructureDescriptor needs to be inserted into the file pair and initialized (this can, for example, be done in the constructor).

The resulting file might look like this:

/* In the H file: */
#include <cstddef>  // offsetof

class myStructure_StructDescriptor : public StructureDescriptor
{
public:
    myStructure_StructDescriptor()
    {
        sName = "myStructure";         // Name parsed in, note this is statically built!
        iLength = sizeof(myStructure); // Again this is generated before compiling,
                                       // and then basically ends up hardcoded.

        // offsetof gives the relative address of the variable; it is the
        // standard form of the little null-pointer cast trick :)
        vVariables.push_back(
            VarDescriptor("a", VarDescriptor::eVT_Integer,
                          static_cast<int>(offsetof(myStructure, a))));
        /* and so on */
    }

    // Static instance of the structure description, not nice coding practice,
    // find something better if you want to ;)
    static const myStructure_StructDescriptor myInstance;
};



-------------------------------------------------------------------------------
Step 4: Structure storage files

Now, instead of just writing your structure out to the file, a more flexible
file format needs to be used. Going the text-based way, the first thing that
comes to mind is a file format similar to .ini files, which contains a list of
assignment statements, such as


a = 5;
b = 10;
c = 200;
otherValue = 200.5;


As one can see, instead of assuming a structure layout, the file provides
named data which then needs to be interpreted at load time.

A structure may easily be saved out by a general structure saving procedure,
which accepts an (untyped) structure pointer, and an associated struct definition. (Note you lose compile-time type checks here! Either implement your own ones, do a rtti lookup, or be _very_ careful)
For each variable entry in the struct definition, an associated entry
within the file is created.
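
A minimal sketch of such a general saving procedure, assuming only integer and float members. The descriptor table is written by hand in `describeMyStructure` (in the scheme above the tool would generate it), and `saveStruct` is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

enum EVarType { eVT_Integer, eVT_Float };

struct VarDescriptor {
    std::string sVarName;
    EVarType eVarType;
    std::size_t iOffsetInStruct;
};

struct myStructure {
    int a, b, c;
    float otherValue;
};

// Hand-written stand-in for the generated structure description.
std::vector<VarDescriptor> describeMyStructure() {
    return {
        {"a", eVT_Integer, offsetof(myStructure, a)},
        {"b", eVT_Integer, offsetof(myStructure, b)},
        {"c", eVT_Integer, offsetof(myStructure, c)},
        {"otherValue", eVT_Float, offsetof(myStructure, otherValue)},
    };
}

// Emit one "name = value;" line per described member. The pointer is
// untyped, so the caller must pass a matching descriptor table.
std::string saveStruct(const void* pStruct, const std::vector<VarDescriptor>& vars) {
    std::ostringstream out;
    const char* base = static_cast<const char*>(pStruct);
    for (const VarDescriptor& v : vars) {
        out << v.sVarName << " = ";
        if (v.eVarType == eVT_Integer) {
            int value;
            std::memcpy(&value, base + v.iOffsetInStruct, sizeof(value));
            out << value;
        } else {
            float value;
            std::memcpy(&value, base + v.iOffsetInStruct, sizeof(value));
            out << value;
        }
        out << ";\n";
    }
    return out.str();
}
```

The default stream precision (six significant digits) loses information for many floats; a real implementation would raise it or store the bit pattern.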


-------------------------------------------------------------------------------
Step 5: The automated load process

This is rather simple: Instead of accepting just a pointer to your struct,
create a load call that accepts an untyped pointer, and an associated structure definition object (Note you lose compile-time type checks here, again!)

The memory area at the pointer is assumed to conform to the provided structure definition.

So, read each line in the file and extract (as strings!) the variable name and the assigned value. Then run through the variable declarations inside the structure definition and see if there is a variable with a matching name. If so, convert the incoming value into the appropriate type for this variable (note that some conversions, such as string-to-float, should produce a warning here). Store the converted value at (pointer location + variable offset).

If no matching variable is found or no type conversion is possible, ignore the statement. This solves type change and variable removal conflicts.

To "solve" cases where a new variable is inserted into a structure, just see that the new variable is properly initialized in the constructor of the structure.

After any change to the structure layout, old files are still parsed correctly, and when saving, everything is stored in the new layout.
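
The load side can be sketched the same way. This assumes the "name = value;" format from Step 4 and only integer and float members; `loadStruct`, `trim` and `describeMyStructure` are hypothetical names, and the descriptor table is again written by hand. In this example `c` has been removed from the struct and `addedLater` is new:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

enum EVarType { eVT_Integer, eVT_Float };

struct VarDescriptor {
    std::string sVarName;
    EVarType eVarType;
    std::size_t iOffsetInStruct;
};

struct myStructure {
    int a = 0, b = 0;
    float otherValue = 0.0f;
    int addedLater = 42;   // new member: keeps this default for old files
};

std::vector<VarDescriptor> describeMyStructure() {
    return {
        {"a", eVT_Integer, offsetof(myStructure, a)},
        {"b", eVT_Integer, offsetof(myStructure, b)},
        {"otherValue", eVT_Float, offsetof(myStructure, otherValue)},
        {"addedLater", eVT_Integer, offsetof(myStructure, addedLater)},
    };
}

std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    return s.substr(first, s.find_last_not_of(ws) - first + 1);
}

// Apply every "name = value;" statement that matches a described member.
void loadStruct(void* pStruct, const std::vector<VarDescriptor>& vars,
                const std::string& text) {
    char* base = static_cast<char*>(pStruct);
    std::istringstream in(text);
    std::string statement;
    while (std::getline(in, statement, ';')) {
        std::size_t eq = statement.find('=');
        if (eq == std::string::npos) continue;
        std::string name = trim(statement.substr(0, eq));
        std::string value = trim(statement.substr(eq + 1));
        for (const VarDescriptor& v : vars) {
            if (v.sVarName != name) continue;
            if (v.eVarType == eVT_Integer) {
                int i = static_cast<int>(std::strtol(value.c_str(), nullptr, 10));
                std::memcpy(base + v.iOffsetInStruct, &i, sizeof(i));
            } else {
                float f = std::strtof(value.c_str(), nullptr);
                std::memcpy(base + v.iOffsetInStruct, &f, sizeof(f));
            }
        }
        // Statements naming unknown (removed) members fall through and are ignored.
    }
}
```

The sketch skips the conversion warnings mentioned in Step 5; `strtol` and `strtof` silently yield 0 for text they cannot parse, so a real loader would check their end pointers.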


-------------------------------------------------------------------------------
Step 6: Further possibilities and enhancements

Once in place, such a system may be used for several other purposes as
well:
- Upgrade to a full symbol database:
Include methods, scopes, inheritance into the scheme presented above, and
you end up with a very potent C++ reflection system.
- In-place editing and debugging: Display and change variable values on screen while your program is running
- Script / C++-Interoperability: Easily modify C++ variables and call C++ methods from inside some user-based scripting language _without_ any difficulties (i.e. only a single keyword needs to be added to exported stuff, no need to explicitly define each parameter manually, which is error prone and ugly)
- Advanced error-sentient serialization: For example, serialize into a network stream and deserialize on the other end (for performance insensitive data, such as slow changing environments, or regular interval synching), no need to write explicit network transmission code anymore.



If anyone is interested, I've written a complete Structure/Class reflection tool for C++ with functionality as outlined in this post (plus inheritance, plus pointer handling), and can provide both source code and implementation guidelines.


Hope this helps,
Sorry for all typos or unclarities, I'm a little in a hurry here :)

Cheers,
- Wernaeh
Some call me mathematician, some just call me computer guy. Yet, I prefer the term professional weirdo :)

#7 Goz

    Senior Member

  • Members
  • PipPipPipPip
  • 574 posts

Posted 12 November 2007 - 02:54 PM

I once wrote a system that used a text-based definition of the classes (with default values). You would then define a version number that got written out in the memory-mapped structure. On loading, if the memory-mapped structure had changed, it would move the various bits of data around (by referring to the previous definition file and the current one) and set the default values on all the new members (members that had been removed were, simply, removed). This worked really well as it happens :)

It gave me a few advantages, such as being able to instantiate classes through a factory that got its definitions from the description file. In its later days all you needed to do was define a new class in the description file and it would even generate a .cpp and .h file for you :)

It's a shame I lost the code really :(

#8 Jare

    Valued Member

  • Members
  • PipPipPip
  • 247 posts

Posted 12 November 2007 - 05:57 PM

Guys, hunguptodry apparently wants a quick & dirty solution. :) One thing you could do that is slightly better than your current solution is to store the size of the structures contained in the file, then read them in a loop rather than as a single fread. This way, you don't need to increase the size of your struct with fillers.


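#include <cstdio>

FILE *f = fopen("thefile.bin", "rb");
int sizeInFile;
int numInFile;
fread(&sizeInFile, sizeof(sizeInFile), 1, f);
fread(&numInFile, sizeof(numInFile), 1, f);
// You could deduce numInFile from the size of the file

// Value-initialized, so members the file doesn't contain start out zeroed
// (or with whatever MyStruct's default constructor sets)
MyStruct *s = new MyStruct[numInFile]();
for (int i = 0; i < numInFile; ++i)
    fread(s + i, sizeInFile, 1, f);  // assumes sizeInFile <= sizeof(MyStruct)
fclose(f);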

As long as you keep the previous structure intact and only ADD members to it, this approach should work, and even lets you define default values for the new fields.
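
A self-contained sketch of the same idea with defaults for the added fields. An in-memory byte buffer stands in for the file so the round trip is easy to try; `RecordV1`, `RecordV2`, `writeRecords` and `readRecords` are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct RecordV1 { float a, b; };   // layout the old file was written with

struct RecordV2 {                  // same leading members, one appended
    float a = 0.0f, b = 0.0f;
    std::int32_t e = 7;            // default used when the file lacks this member
};

// Layout: [record size][record count][records...]
template <typename T>
std::vector<unsigned char> writeRecords(const std::vector<T>& records) {
    std::int32_t header[2] = { static_cast<std::int32_t>(sizeof(T)),
                               static_cast<std::int32_t>(records.size()) };
    std::vector<unsigned char> bytes(sizeof(header) + records.size() * sizeof(T));
    std::memcpy(bytes.data(), header, sizeof(header));
    if (!records.empty())
        std::memcpy(bytes.data() + sizeof(header), records.data(),
                    records.size() * sizeof(T));
    return bytes;
}

template <typename T>
std::vector<T> readRecords(const std::vector<unsigned char>& bytes) {
    std::int32_t header[2];
    assert(bytes.size() >= sizeof(header));
    std::memcpy(header, bytes.data(), sizeof(header));
    std::size_t sizeInFile = static_cast<std::size_t>(header[0]);
    std::size_t numInFile = static_cast<std::size_t>(header[1]);
    assert(bytes.size() >= sizeof(header) + sizeInFile * numInFile);

    std::vector<T> records(numInFile);   // default-constructed: new members get defaults
    // Never copy more than the in-memory struct can hold.
    std::size_t toCopy = std::min(sizeInFile, sizeof(T));
    for (std::size_t i = 0; i < numInFile; ++i)
        std::memcpy(static_cast<void*>(&records[i]),
                    bytes.data() + sizeof(header) + i * sizeInFile, toCopy);
    return records;
}
```

Stepping through the buffer by `sizeInFile` while copying at most `sizeof(T)` bytes also lets an older build read files written with a larger struct, as long as members are only ever appended.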

That said, I suggest you think about some of the more sophisticated ideas offered - at some point your serialization will need these features.

#9 hunguptodry

    New Member

  • Members
  • PipPip
  • 31 posts

Posted 13 November 2007 - 02:41 AM

1) thanks to everyone. it'll take me a while to digest all these.

2) Jare - "quick and dirty" is indeed one of the key principles i follow. it is second only to "trial and error".

#10 J22

    Member

  • Members
  • PipPip
  • 92 posts

Posted 13 November 2007 - 04:49 AM

Quick and dirty is a good way to shoot yourself in the foot later, but I wish you good luck anyway. It's quite restrictive compared to how classes usually evolve, and it can cause a lot of grief later. It might be good enough if you only think in the very short term, but generally that's not how you should think about serialization of data.

#11 Mattias Gustavsson

    Senior Member

  • Members
  • PipPipPipPip
  • 413 posts

Posted 13 November 2007 - 08:47 AM

I'm all for quick and dirty and trial and error, they are great principles to live by when one first starts out making games for fun. The only important thing is to get things to run and do what you want, doesn't matter how it is done.

When you're more experienced, you'll want to pick up good practices and do things the "right" way (whatever that is :rolleyes:), to enable you to do bigger projects with fewer bugs. But when first starting out, you won't do projects of that scale...
  • www.mattiasgustavsson.com - My blog and current projects
  • www.rivtind.com - My Fantasy world and isometric RPG engine
  • www.pixieuniversity.com - My Software 2D Game Engine




