Some differences in mxOPAQUE_CLASS and Object metadata, particularly for Strings #31

Open
opened 2025-03-18 19:05:52 +00:00 by foreverallama · 2 comments
foreverallama commented 2025-03-18 19:05:52 +00:00 (Migrated from github.com)

Hi!
I was looking at how objects are stored in MAT files, and came across this repository and your explanation in MatFileHandler/objects.md. When trying to replicate, I observed some small differences in the object metadata.

Some general differences I observed:

  • Instead of 6 offsets, there are 8 offsets in the list.
  • The fieldContentsID in Region 4 is numbered starting from 0.
  • Region 3 has a slightly different structure: (classID, 0, 0, X, Y, objectID). I'll get into what X and Y are doing below.
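
For anyone who wants to check this against their own files, here is a minimal sketch of how the six-integer Region 3 blocks could be unpacked. It assumes the Region 3 bytes have already been sliced out of the metadata using the offsets list, and that the file is little-endian. The names `ObjectEntry` and `parse_region3` are made up for illustration and are not part of MatFileHandler.

```python
import struct
from typing import List, NamedTuple


class ObjectEntry(NamedTuple):
    """One 24-byte Region 3 block: (classID, 0, 0, X, Y, objectID)."""
    class_id: int
    x: int          # observed nonzero for string objects
    y: int          # observed nonzero for user-defined classes and datetime
    object_id: int


def parse_region3(data: bytes) -> List[ObjectEntry]:
    """Split raw Region 3 bytes into blocks of six 32-bit integers.

    Assumption: little-endian byte order. Trailing bytes that do not
    fill a whole 24-byte block are ignored.
    """
    entries = []
    block_size = 6 * 4
    usable = len(data) - len(data) % block_size
    for offset in range(0, usable, block_size):
        class_id, _zero1, _zero2, x, y, object_id = struct.unpack_from(
            "<6i", data, offset
        )
        entries.append(ObjectEntry(class_id, x, y, object_id))
    return entries


# Hypothetical sample bytes (not taken from a real file): one string-like
# entry with X set, followed by one user-defined-class entry with Y set.
sample = struct.pack("<6i", 1, 0, 0, 1, 0, 1) + struct.pack("<6i", 2, 0, 0, 0, 1, 2)
for entry in parse_region3(sample):
    print(entry)
```

Printing the X and Y columns side by side for a file with mixed object types should make it easy to see which field a given class uses.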

Some differences for string class:

  • Region 3 had a slightly different structure. As expected, it was a block of six 32-bit integers, one block per object, with data as (classID, 0, 0, X, 0, objectID). I will call X here stringObjectID, as it started at 1 and incremented for each string object in the file. For user-defined classes, by contrast, X is set to 0, and the fifth field (Y, as mentioned earlier) starts at 1 and increments for every user-defined class in the file. I also tried datetime, which used the Y field. My guess is that these two fields are some kind of internal identifiers for certain categories of objects.
  • Region 2 was always empty for user-defined classes. For string, however, Region 2 was present and structured exactly as Region 4 would be, i.e., three 32-bit integers in the format (fieldID, 1, fieldContentsID). Only one field is present for each string object. No string-related data was present in Region 4. In some examples fieldID was set to 5, and in others it was set to 1. I need to look at a few more examples for this.
  • The fieldContents cell for strings was interesting. The array flag for it was set to mxUINT64_CLASS with dimensions [1, (5+k)], where k depends on the length of the string. The first four 64-bit integers were set to [1,2,1,1], and the fifth integer specified the number of characters in the string. The next k columns contained the actual string contents, null-terminated and padded to 8-byte blocks. However, the content was stored as UTF-16 characters within these 64-bit columns, so each column essentially stores 4 characters.
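
To make the string layout concrete, here is a rough sketch of decoding one such uint64 row back into text. It assumes little-endian storage, a single scalar string per array, and that the fifth integer counts UTF-16 code units. The meaning of the leading [1,2,1,1] is still unknown, so the sketch returns it untouched. Both function names are hypothetical helpers, and `encode_string_payload` exists only to build test input following the layout described above.

```python
import struct
from typing import List, Tuple


def decode_string_payload(words: List[int]) -> Tuple[List[int], str]:
    """Decode a [1, 5+k] uint64 row into (header, text).

    Assumed layout: four header integers, one character count, then k
    uint64 columns holding UTF-16 code units (little-endian), padded
    to a multiple of 8 bytes.
    """
    header = list(words[:4])
    n_chars = words[4]
    payload = words[5:]
    # Re-serialise the uint64 columns to recover the raw byte stream.
    raw = struct.pack(f"<{len(payload)}Q", *payload)
    text = raw[: 2 * n_chars].decode("utf-16-le")
    return header, text


def encode_string_payload(text: str) -> List[int]:
    """Build a row in the assumed layout (for experimentation only)."""
    raw = text.encode("utf-16-le") + b"\x00\x00"   # null terminator
    raw += b"\x00" * (-len(raw) % 8)               # pad to 8-byte blocks
    columns = struct.unpack(f"<{len(raw) // 8}Q", raw)
    return [1, 2, 1, 1, len(text)] + list(columns)


row = encode_string_payload("Hello!")
print(len(row) - 5)                 # k, the number of content columns
print(decode_string_payload(row))
```

Under these assumptions "Hello!" needs k = 2 columns and "Goodbye!" needs k = 3, which can be compared with the dimensions seen in a real file.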

I was looking for some help to decode what's happening with strings here, or if there's something else I could be missing. I'm looking to incorporate more objects/examples to help break this down.

The input data I used:

astring1 = "Hello!"
dt = datetime('today')
obj3 = myclass(30)
obj6 = myclass(myclass(5))
stringVar2 = "Goodbye!"

% myclass is defined with 3 properties - message, num, aeroplanes
% myclass.message is set to the input argument
mahalex commented 2025-03-19 19:04:01 +00:00 (Migrated from github.com)

@foreverallama Hi!
Thanks for the information, this looks very interesting! I noticed some additional things while doing the latest beta with enumerations support, but I didn't update the description in objects.md yet. I need some time to think about all this, and to do more investigations.
Thanks again!

foreverallama commented 2025-03-20 17:12:29 +00:00 (Migrated from github.com)

Sure! I'm also looking into decoding what these flags mean in more detail using other objects, and maybe expand upon the existing documentation as well. I'd be happy to help out along these lines if needed 😄

Reference: mahalex/MatFileHandler#31