Modding Star Wars Clone Wars

Reverse engineering the gcm file (or .msh)

XXD, python and coding

In the first article, I took a quick look at the various files on the disc, and that was it. Now it is time to look at them more closely. I figured that the gcm files could be important, as they were named after basically every vehicle in the game.

I had no previous experience in reverse engineering a file format, so I just opened one with xxd:

xxd bluelightsaber.gcm | head -n 20
00000000: 4845 4452 7808 0000 4d53 4832 6808 0000  HEDRx...MSH2h...
00000010: 5349 4e46 6400 0000 4e41 4d45 1400 0000  SINFd...NAME....
00000020: 6c69 6768 7473 6162 6572 5f61 6e61 6b69  lightsaber_anaki
00000030: 6e00 0000 4652 414d 0c00 0000 0100 0000  n...FRAM........
00000040: 6400 0000 9fc2 ef41 4242 4f58 2c00 0000  d......ABBOX,...
00000050: 0000 0000 0000 0000 0000 0000 0000 803f  ...............?
00000060: 0000 0000 0000 4033 95fd 67bf faff 7f3e  ......@3..g....>
00000070: fbff 7f3e d6d8 e33f 3c4c e83f 4341 4d52  ...>...?<L.?CAMR
00000080: 4000 0000 4e41 4d45 0800 0000 4361 6d65  @...NAME....Came
00000090: 7261 0000 4441 5441 2800 0000 e2e5 b1c0  ra..DATA(.......
000000a0: d1eb 7540 3a9b 0541 85ed 1abf 6bdd cc3f  ..u@:..A....k..?
000000b0: f33f 0b3f 0000 0000 29a8 6f3f cdcc cc3d  .?.?....).o?...=
000000c0: 0000 0047 4d41 544c f000 0000 0200 0000  ...GMATL........
000000d0: 4d41 5444 6000 0000 4e41 4d45 1000 0000  MATD`...NAME....
000000e0: 5363 656e 655f 4d61 7465 7269 616c 3000  Scene_Material0.
000000f0: 4441 5441 3400 0000 3333 333f 3333 333f  DATA4...333?333?
00000100: 3333 333f 0000 803f 0000 803f 0000 803f  333?...?...?...?
00000110: 0000 803f 0000 803f 9a99 993e 9a99 993e  ...?...?...>...>
00000120: 9a99 993e 0000 803f 0000 4842 4154 5242  ...>...?..HBATRB
00000130: 0400 0000 0000 0000 4d41 5444 7c00 0000  ........MATD|...

I have no idea how other people spot patterns. I do not have any method at all, just guessing and testing. The strings HEDR, MSH2, SINF … seemed to be present in other files as well. Besides, the four bytes following each of them seemed to represent small integers, stored little-endian (least significant byte first).
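
To illustrate why these sizes only make sense when read little-endian, here is a small sketch that decodes the first two headers of the dump above with both byte orders. `read_header` is a hypothetical helper and the 16 bytes are copied from the xxd output; bluelightsaber.gcm is 2176 bytes long, so the HEDR size is the file size minus its own 8-byte header.

```python
from struct import unpack_from


def read_header(buffer, offset, endian="<"):
    """Return (tag, size) for the 8-byte section header at offset.
    endian is a struct byte-order prefix: "<" little-endian, ">" big-endian."""
    tag = buffer[offset:offset + 4].decode("ascii")
    (size,) = unpack_from(endian + "I", buffer, offset + 4)
    return tag, size


# First 16 bytes of bluelightsaber.gcm, copied from the xxd dump above.
first_bytes = bytes.fromhex("4845 4452 7808 0000 4d53 4832 6808 0000")

print(read_header(first_bytes, 0))       # ('HEDR', 2168): file size minus 8
print(read_header(first_bytes, 0, ">"))  # ('HEDR', 2013790208): absurd as a size
print(read_header(first_bytes, 8))       # ('MSH2', 2152)
```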

More particularly, NAME was followed by the value 20 (0x14 in hex), which is exactly the length of the string “lightsaber_anakin” plus its null padding. So I made the hypothesis that the file was cut into sections, each starting with a char[4]:title int32:section_size header, which I could test with the following Python code.

from struct import unpack_from

# All multi-byte values in these files are little-endian ("<" in struct).


def read_array(buffer, offset, length):
    return buffer[offset:offset + length]


def read_float(buffer, offset):
    return unpack_from("<f", buffer, offset)[0]


def read_int32(buffer, offset):
    return unpack_from("<i", buffer, offset)[0]


def read_uint32(buffer, offset):
    return unpack_from("<I", buffer, offset)[0]


def read_uint16(buffer, offset):
    return unpack_from("<H", buffer, offset)[0]


def read_uint8(buffer, offset):
    return unpack_from("B", buffer, offset)[0]


def read_tag(buffer, offset):
    # 4-character section name such as "HEDR", returned as a str
    return buffer[offset:offset + 4].decode("ascii")


def represent(input_object):
    attrs = vars(input_object)
    return ', '.join("%s: %s" % item for item in attrs.items())


class GamecubeModel(object):

    def from_file(self, f):
        current_data = f.read()
        # HEDR: its size is the file size minus these 8 header bytes
        self._header = read_tag(current_data, 0x00)
        self._filesize = read_uint32(current_data, 0x04)

        # MSH2 and its size
        self._header1 = read_tag(current_data, 0x08)
        self._unk1 = read_uint32(current_data, 0x0C)

        # SINF and its size
        self._header2 = read_tag(current_data, 0x10)
        self._unk2 = read_uint32(current_data, 0x14)

        # NAME, its size, then the null-padded name itself
        self._header3 = read_tag(current_data, 0x18)
        self._unk3 = read_uint32(current_data, 0x1C)
        name = current_data[0x20:0x20 + self._unk3]
        self._name1 = name.rstrip(b"\0").decode("ascii")

        # The section that follows NAME (FRAM in every file tried) and its size
        self._next = read_tag(current_data, 0x20 + self._unk3)
        self._next_size = read_uint32(current_data, 0x24 + self._unk3)

        # Skip that section's 8-byte header and its payload to reach the next name
        new_offset = 0x28 + self._unk3 + self._next_size
        self._next2 = read_tag(current_data, new_offset)

        self._size = len(current_data)
        return self


if __name__ == "__main__":
    for path in ("./rep_inf_anakin_anims.gcm",
                 "./bluelightsaber.gcm",
                 "./rep_walk_assault_skel.gcm",
                 "./objective_arrow_1.gcm"):
        with open(path, "rb") as f:
            print(represent(GamecubeModel().from_file(f)))

The results were quite consistent:

_header1: MSH2, _header2: SINF, _header3: NAME, _next2: BBOX, _next_size: 12, _name1: REP_inf_anakin_masterfile_cin_head, _unk2: 116, _unk3: 36, _unk1: 131936, _next: FRAM, _filesize: 452680, _size: 452688, _header: HEDR
_header1: MSH2, _header2: SINF, _header3: NAME, _next2: BBOX, _next_size: 12, _name1: lightsaber_anakin, _unk2: 100, _unk3: 20, _unk1: 2152, _next: FRAM, _filesize: 2168, _size: 2176, _header: HEDR
_header1: MSH2, _header2: SINF, _header3: NAME, _next2: BBOX, _next_size: 12, _name1: REP_walk_assault_multiAnim_V4, _unk2: 112, _unk3: 32, _unk1: 101084, _next: FRAM, _filesize: 117148, _size: 117156, _header: HEDR
_header1: MSH2, _header2: SINF, _header3: NAME, _next2: BBOX, _next_size: 12, _name1: Objective_Arrow, _unk2: 96, _unk3: 16, _unk1: 1564, _next: FRAM, _filesize: 1580, _size: 1588, _header: HEDR

However, in some cases a section did not seem to respect its stated length: instead of raw data, another section header started right after it. Therefore, I made the hypothesis of an XML/JSON-like tree structure, where a node can contain many sub-nodes.

After some trial and error, it turned out that some nodes are terminal (they hold raw data and cannot be decomposed into sub-nodes) while others are containers. The following code parses a file and prints the tree in the console.

from struct import unpack_from


def read_uint32(buffer, offset):
    # Little-endian unsigned 32-bit integer
    return unpack_from("<I", buffer, offset)[0]


class GamecubeModel(object):

    # Sections that hold raw data rather than sub-sections
    TERMINAL_STATES = {"NAME", "FRAM", "BBOX", "DATA", "ATRB", "MATL", "MTYP",
                       "MNDX", "TRAN", "MATI", "POSL", "NRML", "UV0L", "STRP",
                       "PRNT", "FLGS", "CL1L", "SKL2", "BLN2", "CYCL", "KFR3"}

    def from_file_rec(self, current_data, cursor, depth):
        # Walk the sibling sections of this level one after another
        while cursor + 8 <= len(current_data):
            section_name = current_data[cursor:cursor + 4].decode("ascii")
            section_size = read_uint32(current_data, cursor + 4)
            cursor += 8
            print('\t' * depth + section_name + '\t' + str(section_size))

            if section_name not in self.TERMINAL_STATES:
                # Container: its payload is itself a sequence of sections
                self.from_file_rec(current_data[cursor:cursor + section_size], 0, depth + 1)

            cursor += section_size

    def show_in_console(self, current_data):
        self.from_file_rec(current_data, 0, 0)


if __name__ == "__main__":
    with open("./bluelightsaber.gcm", "rb") as f:
        GamecubeModel().show_in_console(f.read())

The results seem consistent: the size of a container section equals the sum of the sizes of its sub-sections, each counted with its own 8-byte header. Example:

(SINF) 100 = (NAME) 8+20 + (FRAM) 8+12 + (BBOX) 8+44
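
This size rule can be checked automatically. Below is a minimal sketch, assuming the same 8-byte little-endian headers; `TERMINAL` (a hypothetical subset of the terminal list above), `chunk` and `check_sizes` are illustrative names, and the payloads are zero-filled placeholders rather than real data. It rebuilds a SINF section shaped like the one in bluelightsaber.gcm, then corrupts its size field.

```python
from struct import pack, unpack_from

# Hypothetical subset of the terminal (raw data) sections from the parser above.
TERMINAL = {"NAME", "FRAM", "BBOX"}


def chunk(tag, payload):
    """Build one section: 4-char tag, little-endian uint32 size, then the payload."""
    return tag.encode("ascii") + pack("<I", len(payload)) + payload


def check_sizes(data, start, end):
    """Return True if the sections in data[start:end] exactly fill that range,
    recursively: every container's size equals the sum of 8 + size over its children."""
    cursor = start
    while cursor < end:
        if cursor + 8 > end:
            return False  # a section header would cross the parent's boundary
        tag = data[cursor:cursor + 4].decode("ascii")
        (size,) = unpack_from("<I", data, cursor + 4)
        body_end = cursor + 8 + size
        if body_end > end:
            return False  # the child claims more bytes than its parent has
        if tag not in TERMINAL and not check_sizes(data, cursor + 8, body_end):
            return False
        cursor = body_end
    return True


# A SINF section shaped like the one in bluelightsaber.gcm (payloads zero-filled).
sinf = chunk("SINF",
             chunk("NAME", b"lightsaber_anakin\0\0\0")
             + chunk("FRAM", bytes(12))
             + chunk("BBOX", bytes(44)))
print(len(sinf) - 8)                    # 100, matching the SINF size in the dump
print(check_sizes(sinf, 0, len(sinf)))  # True

# Corrupt the SINF size field: the BBOX child now overruns its parent.
broken = sinf[:4] + pack("<I", 96) + sinf[8:]
print(check_sizes(broken, 0, len(broken)))  # False
```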

Here is the decomposition of bluelightsaber.gcm according to these hypotheses.

HEDR	2168
	MSH2	2152
		SINF	100
			NAME	20
			FRAM	12
			BBOX	44
		CAMR	64
			NAME	8
			DATA	40
		MATL	240
		MODL	1252
			MTYP	4
			MNDX	4
			NAME	16
			TRAN	40
			GEOM	1148
				BBOX	44
				GSEG	312
					MATI	4
					POSL	100
					NRML	76
					STRP	100
				GSEG	768
					MATI	4
					POSL	208
					NRML	112
					UV0L	220
					STRP	184
		MODL	456
			MTYP	4
			MNDX	4
			NAME	12
			PRNT	16
			FLGS	4
			TRAN	40
			GEOM	320
				BBOX	44
				GSEG	260
					MATI	4
					POSL	76
					NRML	64
					STRP	84
	CL1L	0
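
The hex dump hints at why MATL cannot be walked like the other containers: its payload starts with 0200 0000 (the value 2, plausibly the number of materials) before the first MATD. With that 4-byte prefix the sizes still add up: 4 + (8 + 96) + (8 + 124) = 240, using the two MATD sizes 0x60 and 0x7c visible in the dump. Below is a sketch that decodes the first MATD, copied byte for byte from the dump; `iter_sections` and `parse_matd` are hypothetical helpers. The 13 floats look like three RGBA colours followed by one scalar (50.0), but that reading is only a guess.

```python
from struct import unpack_from

# The first MATD section of bluelightsaber.gcm, copied from the dump (0xd0-0x137).
MATD_BYTES = bytes.fromhex(
    "4d41 5444 6000 0000 4e41 4d45 1000 0000"
    "5363 656e 655f 4d61 7465 7269 616c 3000"
    "4441 5441 3400 0000 3333 333f 3333 333f"
    "3333 333f 0000 803f 0000 803f 0000 803f"
    "0000 803f 0000 803f 9a99 993e 9a99 993e"
    "9a99 993e 0000 803f 0000 4842 4154 5242"
    "0400 0000 0000 0000"
)


def iter_sections(data, start, end):
    """Yield (tag, payload_start, size) for each section in data[start:end]."""
    cursor = start
    while cursor + 8 <= end:
        tag = data[cursor:cursor + 4].decode("ascii")
        (size,) = unpack_from("<I", data, cursor + 4)
        yield tag, cursor + 8, size
        cursor += 8 + size


def parse_matd(data, offset=0):
    """Decode one MATD section (a container with NAME, DATA and ATRB children)."""
    tag, start, size = next(iter_sections(data, offset, len(data)))
    if tag != "MATD":
        raise ValueError("expected MATD, got %r" % tag)
    material = {}
    for child, child_start, child_size in iter_sections(data, start, start + size):
        payload = data[child_start:child_start + child_size]
        if child == "NAME":
            material["name"] = payload.rstrip(b"\0").decode("ascii")
        elif child == "DATA":
            material["data"] = unpack_from("<%df" % (child_size // 4), payload)
        else:
            material[child.lower()] = payload  # e.g. ATRB, kept as raw bytes
    return material


material = parse_matd(MATD_BYTES)
print(material["name"])                         # Scene_Material0
print([round(x, 2) for x in material["data"]])
# [0.7, 0.7, 0.7, 1.0, 1.0, 1.0, 1.0, 1.0, 0.3, 0.3, 0.3, 1.0, 50.0]
```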

The next step?

Well, time to find out what these attributes stand for! NRML could mean “normal”, NAME is obviously the name of a section, MAT. could stand for matrices or for materials (MATD contains the string Scene_Material0)… Searching the web for these keywords, I discovered this Git page: schlechtwetterfront. Basically, this file format is used by the ZeroEngine, which has been reverse engineered (already…) for Star Wars Battlefront mods. And a plugin to read and write these files with SoftImage already exists: xsizetools.
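
As a cross-check of my own guesses (not taken from that documentation), the two small sections at the start of bluelightsaber.gcm decode sensibly under these assumed layouts: FRAM as two int32 frame numbers and a float32 frame rate, BBOX as a rotation quaternion, a center, three half-extents and a bounding-sphere radius. `decode_fram` and `decode_bbox` are hypothetical helpers. The result is 100 frames at about 29.97 fps, and the last BBOX float (about 1.815) equals the length of the extents vector, √(0.25² + 0.25² + 1.78²), which supports the guess.

```python
import math
from struct import unpack_from

# Section payloads copied from the xxd dump of bluelightsaber.gcm.
FRAM_PAYLOAD = bytes.fromhex("0100 0000 6400 0000 9fc2 ef41")  # offsets 0x3c-0x47
BBOX_PAYLOAD = bytes.fromhex(                                   # offsets 0x50-0x7b
    "0000 0000 0000 0000 0000 0000 0000 803f"
    "0000 0000 0000 4033 95fd 67bf faff 7f3e"
    "fbff 7f3e d6d8 e33f 3c4c e83f"
)


def decode_fram(payload):
    """Assumed FRAM layout: int32 first frame, int32 last frame, float32 frame rate."""
    first, last, fps = unpack_from("<iif", payload)
    return {"first_frame": first, "last_frame": last, "fps": fps}


def decode_bbox(payload):
    """Assumed BBOX layout: rotation quaternion (4 floats), center (3),
    half-extents (3), bounding-sphere radius (1)."""
    v = unpack_from("<11f", payload)
    return {"rotation": v[0:4], "center": v[4:7], "extents": v[7:10], "radius": v[10]}


fram = decode_fram(FRAM_PAYLOAD)
bbox = decode_bbox(BBOX_PAYLOAD)
print(fram["first_frame"], fram["last_frame"], round(fram["fps"], 2))   # 1 100 29.97
print([round(x, 3) for x in bbox["extents"]], round(bbox["radius"], 3))  # [0.25, 0.25, 1.78] 1.815

# Consistency check of the assumed layout: radius == length of the extents vector.
length = math.sqrt(sum(x * x for x in bbox["extents"]))
print(math.isclose(bbox["radius"], length, rel_tol=1e-4))  # True
```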

So there is not much left to do… Vehicles can be modified and reimported.
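
Modifying a file means every size field on the path to the change must be updated. Here is a minimal sketch of a parse/serialize round trip that recomputes all sizes on output. It assumes, hypothetically, that only the container tags seen in bluelightsaber.gcm have children (MATL is kept as opaque bytes), and that names are null-terminated and padded to a multiple of 4 bytes, as the NAME sizes above suggest (17 characters give 20 bytes, 15 give 16, 6 give 8). `parse`, `serialize`, `pad_name` and the name "my_saber" are illustrative; whether the game accepts a file rebuilt this way is untested.

```python
from struct import pack, unpack_from

# Hypothetical: tags that contained sub-sections in bluelightsaber.gcm.
# Everything else (including MATL) is kept as opaque bytes.
CONTAINERS = {"HEDR", "MSH2", "SINF", "CAMR", "MODL", "GEOM", "GSEG"}


def parse(data, start=0, end=None):
    """Return a list of (tag, payload) pairs; payload is a list for containers."""
    end = len(data) if end is None else end
    nodes, cursor = [], start
    while cursor + 8 <= end:
        tag = data[cursor:cursor + 4].decode("ascii")
        (size,) = unpack_from("<I", data, cursor + 4)
        body_start, body_end = cursor + 8, cursor + 8 + size
        if tag in CONTAINERS:
            nodes.append((tag, parse(data, body_start, body_end)))
        else:
            nodes.append((tag, data[body_start:body_end]))
        cursor = body_end
    return nodes


def serialize(nodes):
    """Inverse of parse: rebuild the bytes, recomputing every size field."""
    out = bytearray()
    for tag, payload in nodes:
        body = serialize(payload) if isinstance(payload, list) else payload
        out += tag.encode("ascii") + pack("<I", len(body)) + body
    return bytes(out)


def pad_name(name):
    """Assumed NAME convention: null-terminated, padded to a multiple of 4 bytes."""
    raw = name.encode("ascii") + b"\0"
    return raw + b"\0" * (-len(raw) % 4)


# Tiny HEDR > MSH2 > SINF > NAME file built in memory.
original = serialize([("HEDR", [("MSH2", [("SINF", [("NAME", pad_name("lightsaber_anakin"))])])])])
tree = parse(original)
assert serialize(tree) == original  # the round trip is byte-identical

# Rename the model; the NAME, SINF, MSH2 and HEDR sizes are all recomputed.
sinf_children = tree[0][1][0][1][0][1]  # HEDR -> MSH2 -> SINF -> children
sinf_children[0] = ("NAME", pad_name("my_saber"))
modified = serialize(tree)
print(len(original), len(modified), unpack_from("<I", modified, 4)[0])  # 52 44 36
```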

Remaining work?

I did not manage to import the six-legged walker in place of another character in the game. But with this progress, many new things can be changed. What happens if a gcm file is replaced with another one? Can I rescale a model this way? Change the texture, or add an ad for my blog on the vehicles? More answers to come…

GamecubeKid

Passionate about Gamecube, coding, statistics, beta games...
