Equinor/Statoil’s Volve Dataset – Well Technical Data – part 5

The following article is based on the original blog post here.

This is the last post in a series where we investigate the information related to well trajectories and casing design released by Equinor in their Volve dataset.

Marius Kjeldahl
CTO

We continue where we left off in the previous article.

Next up, we have a few rubbish entries in the XML file:

<ALWAYS-UPDATE>
</ALWAYS-UPDATE>
<execsql command="delete from CD_POLYLINE_HEADER  where header_id in (select header_id from CD_POLYLINE_HEADER where attachment_locator like 'policy_id=(vyxGFUk61o)+project_id=(vnRTGWCauO)+site_id=(WD03eCqbHz)%' or attachment_locator='policy_id=(vyxGFUk61o)+project_id=(vnRTGWCauO)' or attachment_locator='policy_id=(vyxGFUk61o)')"/>
<execsql command="delete from CD_CHANGE_HISTORY_JOURNAL  where parent_locator like 'policy_id=(vyxGFUk61o)+project_id=(vnRTGWCauO)+site_id=(WD03eCqbHz)%'"/>
<execsql command="delete from CD_EXPLORER_FOLDER  where folder_id in (select folder_id from CD_EXPLORER_FOLDER where parent_locator like 'policy_id=(vyxGFUk61o)+project_id=(vnRTGWCauO)+site_id=(WD03eCqbHz)%')"/>

I’ve commented on the execsql commands before. The empty ALWAYS-UPDATE element probably doesn’t accomplish much. After these, however, there is a BIG section with data such as this:

<LINKTABLE>
    <CD_PROJECT_TARGET_SITE_LINK project_id='vnRTGWCauO' project_target_id='0wZoY' site_id='WD03eCqbHz' />
    <CD_PROJECT_TARGET_SITE_LINK project_id='vnRTGWCauO' project_target_id='1KmC3' site_id='WD03eCqbHz' />
...
    <CD_PROJECT_TARGET_WB_LINK well_id='2DfUHhBj3x' wellbore_id='3W9ZjV2yj0' project_id='vnRTGWCauO' project_target_id='aHqay' />
    <CD_PROJECT_TARGET_WB_LINK well_id='2DfUHhBj3x' wellbore_id='3W9ZjV2yj0' project_id='vnRTGWCauO' project_target_id='aOdk3' sequence_no='0' />
...
    <CD_PROJ_TARG_WELL_LINK project_id='vnRTGWCauO' project_target_id='0wZoY' well_id='z6foKJAbj2' />
    <CD_PROJ_TARG_WELL_LINK project_id='vnRTGWCauO' project_target_id='1KmC3' well_id='3X2QhAYU8Z' />
...
    <CD_PROJ_TARG_SCENARIO_LINK project_id='vnRTGWCauO' project_target_id='0wZoY' well_id='z6foKJAbj2' wellbore_id='1j6NvagBYE' scenario_id='yhiIk' />
    <CD_PROJ_TARG_SCENARIO_LINK project_id='vnRTGWCauO' project_target_id='1KmC3' well_id='3X2QhAYU8Z' wellbore_id='8tpZWNk9CK' scenario_id='dLjER' />
...
    <CD_SCENARIO_FORMATION_LINK well_id='2DfUHhBj3x' wellbore_id='uLkXB6Hbm2' scenario_id='qaNuv' wellbore_formation_id='ficOq' />
    <CD_SCENARIO_FORMATION_LINK well_id='2DfUHhBj3x' wellbore_id='Z96sl4wqH0' scenario_id='vYY0k' wellbore_formation_id='2j9Ld' />
...
</LINKTABLE>

From a data point of view, these entries do not contain any actual data; they just group entries. I’m not entirely sure they are needed at all, since many of the data objects I’ve looked at contain these links explicitly as part of the data (the ..._id entries). They might be important to the programs using the data; I have no real way of knowing.
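To make the redundancy question concrete, here is a minimal Python sketch (standard library only) that treats each LINKTABLE row as nothing more than a set of IDs. It assumes the EDM export parses as ordinary XML and uses a few rows copied from the excerpt above as sample input; the helper names `links_by_table`, `target_wells` and `scenario_links_consistent` are hypothetical. The last function checks whether the target/well pairs implied by the scenario links also appear in the well links, which is one way to test whether a link table adds anything that cannot be derived elsewhere.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# A few rows copied from the LINKTABLE excerpt; on the real dump you would
# locate the <LINKTABLE> element in the parsed EDM export instead.
SAMPLE = """
<LINKTABLE>
    <CD_PROJ_TARG_WELL_LINK project_id='vnRTGWCauO' project_target_id='0wZoY' well_id='z6foKJAbj2' />
    <CD_PROJ_TARG_WELL_LINK project_id='vnRTGWCauO' project_target_id='1KmC3' well_id='3X2QhAYU8Z' />
    <CD_PROJ_TARG_SCENARIO_LINK project_id='vnRTGWCauO' project_target_id='0wZoY' well_id='z6foKJAbj2' wellbore_id='1j6NvagBYE' scenario_id='yhiIk' />
    <CD_PROJ_TARG_SCENARIO_LINK project_id='vnRTGWCauO' project_target_id='1KmC3' well_id='3X2QhAYU8Z' wellbore_id='8tpZWNk9CK' scenario_id='dLjER' />
</LINKTABLE>
"""


def links_by_table(linktable: ET.Element) -> dict[str, list[dict[str, str]]]:
    """Group link rows by element name; each row is just a dict of ID attributes."""
    grouped: dict[str, list[dict[str, str]]] = defaultdict(list)
    for row in linktable:
        grouped[row.tag].append(dict(row.attrib))
    return grouped


def target_wells(rows: list[dict[str, str]]) -> set[tuple[str, str]]:
    """Collect (project_target_id, well_id) pairs from rows that carry both IDs."""
    return {
        (row["project_target_id"], row["well_id"])
        for row in rows
        if "project_target_id" in row and "well_id" in row
    }


def scenario_links_consistent(grouped: dict[str, list[dict[str, str]]]) -> bool:
    """True if every target/well pair implied by a scenario link is also a well link."""
    from_scenarios = target_wells(grouped.get("CD_PROJ_TARG_SCENARIO_LINK", []))
    from_wells = target_wells(grouped.get("CD_PROJ_TARG_WELL_LINK", []))
    return from_scenarios <= from_wells


if __name__ == "__main__":
    grouped = links_by_table(ET.fromstring(SAMPLE.strip()))
    for table, rows in grouped.items():
        print(table, len(rows), "row(s)")
    print("scenario links consistent:", scenario_links_consistent(grouped))
```

On the real file, a `False` from the consistency check (or link pairs that appear nowhere in the main data objects) would be the interesting case, since it would mean the link table carries information that isn't stored anywhere else.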

The next sections look more troubling, though:

<BINARY_DATA data_encoding="base64">
    <CD_ATTACHMENT_JOURNAL  attachment_id="TKCMq" attachment_journal_id="03i5H" attachment_locator="policy_id=(vyxGFUk61o)+project_id=(vnRTGWCauO)+site_id=(WD03eCqbHz)+well_id=(Z4mkCnnlAH)+wellbore_id=(3upbskyIdF)+scenario_id=(MxiPG)+parameter_id=(lMXyw)" attachment_name="" attachment_point="TU_CASE_PARAMETER" attachment_type="0" create_app_id="COMPASS" create_date="{ts '2013-01-23 09:03:00'}" create_user_id="jies(JIES@STATOIL.NET)" description="" flags="" is_compressed="" is_visible="N" mime_type="dws-app-data/stresscheck" update_app_id="StressCheck 5000.1" update_date="{ts '2018-04-11 13:46:52'}" update_user_id="jies(JIES@STATOIL.NET)">
        <CD_ATTACHMENT attachment_id="TKCMq" create_app_id="COMPASS" create_date="23.01.13 09:03" create_user_id="jies(JIES@STATOIL.NET)">
DwAAAAgAAAD//v8JVwBvAHIAawAgAGEAcgBlAGEAAAAAAAIAAQAqADkAAABkAAAAAAAHBAAAAABDBAAAAAAAAAAAAAD//v8KQwBzAGcAIABjAG8AbgBmAGkAZwABAAAAAgABACsAOAAAAGQAAAAAAAcEAAAAABQIAAAAAAAAAAAAAP/+/wVDAG8AbgBuAHMAAQAAAAEAAQBkAAAAAABkAAAAAAAUCAAAAAAAAAAAAAAAAAAAAAD//v8GUwBrAGUAdABjAGgAAQAAAAEAAQBkAAAAAABkAAAAAABOBAAAAAAAAAAAAAAAAAAAAAD//v8FQgB1AHIAcwB0AAAAAAACAAIALwA0AAAAMAAzAAAAVgRFBAAASARpBAAAAAAAAAAA//7/CEMAbwBsAGwAYQBwAHMAZQAAAAAAAgACADEAMgAAAEMAIAAAAEoERgQAAEkEagQAAAAAAAAAAP/+/wVBAHgAaQBhAGwAAAAAAAIAAgA0AC8AAABBACIAAAAiDEcEAABMBGsEAAAAAAAAAAD//v8LQwBvAG0AcAByACAAYwBoAGUAYwBrAAEAAAABAAEAZAAAAAAAZAAAAAAANQwAAAAAAAAAAAAAAAAAAAAA//7/CVQAcgBpAC0AYQB4AGkAYQBsAAEAAAABAAIAZAAAAAAANAAvAAAAGggZCAAAAAAAAAAAAAAAAAAA//7/CE0AaQBuACAAUwBGACcAcwABAAAAAQABAGQAAAAAAGQAAAAAAHMEAAAAAAAAAAAAAAAAAAAAAP/+/wlXAGUAbABsACAAcwB1AG0AbQABAAAAAQABAGQAAAAAAGQAAAAAAFsEAAAAAAAAAAAAAAAAAAAAAP/+/wtTAHQAcgBpAG4AZwAgAHMAdQBtAG0AAQAAAAEAAQBkAAAAAABkAAAAAADTCwAAAAAAAAAAAAAAAAAAAAD//v8KVwBlAGEAcgAgAGEAbABsAG8AdwABAAAAAQABAGQAAAAAAGQAAAAAAA0IAAAAAAAAAAAAAAAAAAAAAP/+/xRQAG8AcgBlACwAIABGAHIAYQBjACwAIABNAFcALAAgAFQAZQBtAHAAAQAAAAEAAgBkAAAAAABDACAAAAAiBCMEAAAAAAAAAAAAAAAAAAD//v8MUABhAHQAaAAsACAARABvAGcAbABlAGcAAQAAAAEAAgBkAAAAAABEAB8AAABPBFoEAAAAAAAAAAAAAAAAAAA=        </CD_ATTACHMENT>
    </CD_ATTACHMENT_JOURNAL>
...
</BINARY_DATA>

Why is this troubling? Well, it’s binary data. When I decode it, I get more binary data. It contains references to “Work area”, “Collapse”, “Axial”, “Compr”, “Min SF”, “String”, “Wear allow”, “Pore”, “Frac”, “MW”, “Temp”, “Path”, “Dogleg”, etc. These are all common terms related to casing design and load cases. The trouble is that nothing except Compass/StressCheck can probably make sense of the data it contains, not without a serious effort to reverse-engineer their internal data formats.
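Some of the structure can be recovered with little effort. Decoding the start of the blob above shows each label stored as UTF-16LE text preceded by the bytes `FF FE FF` and a one-byte character count. This looks like the string layout MFC's `CArchive` uses for `CString`, but that is an inference, not something the dataset documents. The Python sketch below (hypothetical helper `extract_strings`) assumes that layout and simply scans the decoded bytes for it. It recovers the labels only; the numbers and flags stored between them remain uninterpreted.

```python
import base64
import struct

# Assumed layout (inferred from the decoded bytes, not documented):
# byte 0xFF, then 0xFFFE as a little-endian WORD, then the character count,
# then that many UTF-16LE code units.
MARKER = b"\xff\xfe\xff"


def extract_strings(blob: bytes) -> list[str]:
    """Return every marker-prefixed UTF-16LE string found in a decoded blob."""
    found = []
    pos = blob.find(MARKER)
    while pos != -1:
        i = pos + len(MARKER)
        if i >= len(blob):
            break
        count = blob[i]
        i += 1
        if count == 0xFF:
            # Assumed escape for longer strings: a 16-bit little-endian count follows.
            if i + 2 > len(blob):
                break
            (count,) = struct.unpack_from("<H", blob, i)
            i += 2
        end = i + 2 * count
        if end > len(blob):
            # Truncated string or a false match: keep scanning after this position.
            pos = blob.find(MARKER, pos + 1)
            continue
        found.append(blob[i:end].decode("utf-16-le", errors="replace"))
        pos = blob.find(MARKER, end)
    return found


if __name__ == "__main__":
    # The first 40 base64 characters of the CD_ATTACHMENT blob shown above.
    prefix = "DwAAAAgAAAD//v8JVwBvAHIAawAgAGEAcgBlAGEA"
    print(extract_strings(base64.b64decode(prefix)))  # ['Work area']
```

Applied to the complete blob, the same scan should also pick out labels such as “Collapse”, “Wear allow” and “Pore, Frac, MW, Temp”, which are stored as whole strings rather than as the separate words a raw byte dump suggests.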

If we ever want to liberate data from legacy systems, we need to get rid of all such unreadable blobs and replace them with sensible data that both computers and humans can read.

Looking at some of the other blobs, I see references to “Well schematic”, “Design Load Line”, “Pipe Rating”, etc. Other blobs reference casing steel catalogs. Again, this is casing design related data. Best case, these are output data only. Worst case, they are also input data. Either way, they can’t be read or written by anything other than the original programs that created the blobs. Further detective work on other blobs strongly indicates that there are large volumes of input data hidden inside them.
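A first step in that detective work could be an inventory of the blobs by owner. The journal element shown earlier carries `mime_type`, `attachment_point` and `update_app_id` attributes, so a Python sketch along these lines (hypothetical `blob_inventory`, run here on a trimmed copy of that element with a shortened blob) counts blobs and decoded bytes per combination. It assumes every `CD_ATTACHMENT` body is plain base64, as the `data_encoding` attribute suggests, and it says nothing about what is inside the blobs.

```python
import base64
import xml.etree.ElementTree as ET
from collections import Counter

# Trimmed copy of the CD_ATTACHMENT_JOURNAL element shown earlier (blob shortened).
SAMPLE = """<BINARY_DATA data_encoding="base64">
    <CD_ATTACHMENT_JOURNAL attachment_id="TKCMq" attachment_point="TU_CASE_PARAMETER"
        mime_type="dws-app-data/stresscheck" update_app_id="StressCheck 5000.1">
        <CD_ATTACHMENT attachment_id="TKCMq">DwAAAAgAAAD//v8JVwBvAHIAawAgAGEAcgBlAGEA        </CD_ATTACHMENT>
    </CD_ATTACHMENT_JOURNAL>
</BINARY_DATA>"""


def blob_inventory(binary_data: ET.Element) -> tuple[Counter, Counter]:
    """Count blobs and decoded bytes per (mime_type, attachment_point, update_app_id)."""
    blobs: Counter = Counter()
    decoded_bytes: Counter = Counter()
    for journal in binary_data.iter("CD_ATTACHMENT_JOURNAL"):
        key = (
            journal.get("mime_type", ""),
            journal.get("attachment_point", ""),
            journal.get("update_app_id", ""),
        )
        for attachment in journal.iter("CD_ATTACHMENT"):
            # Strip all whitespace (including the trailing padding seen in the dump).
            payload = "".join((attachment.text or "").split())
            blobs[key] += 1
            decoded_bytes[key] += len(base64.b64decode(payload))
    return blobs, decoded_bytes


if __name__ == "__main__":
    counts, sizes = blob_inventory(ET.fromstring(SAMPLE))
    for key, n in counts.most_common():
        print(key, n, "blob(s),", sizes[key], "bytes decoded")
```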

The rest of the file consists of these blobs, which make up approximately 55% of the whole EDM XML data dump.
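A share like the roughly 55% quoted above is easy to re-measure. The following is a minimal Python sketch, assuming the dump is well-formed XML and that the blobs live in `CD_ATTACHMENT` elements as in the excerpt. It streams the file with `iterparse` so the whole dump never has to sit in memory, then compares the base64 characters against the file size. The function name `blob_share` and the file name in the usage comment are hypothetical.

```python
import os
import sys
import xml.etree.ElementTree as ET


def blob_share(path: str) -> float:
    """Fraction of the file's bytes that is base64 text inside CD_ATTACHMENT elements."""
    total = os.path.getsize(path)
    blob_chars = 0
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "CD_ATTACHMENT":
            # Base64 is pure ASCII, so characters and bytes coincide.
            blob_chars += len("".join((elem.text or "").split()))
        elem.clear()  # free finished elements so memory stays flat on large dumps
    return blob_chars / total if total else 0.0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python blob_share.py volve_edm_dump.xml  (hypothetical file name)
    print(f"{blob_share(sys.argv[1]):.1%} of the file is blob data")
```

Counting only the base64 text slightly understates the blobs' footprint, since the surrounding journal attributes and tags are excluded; counting whole `BINARY_DATA` sections instead would give an upper bound.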

April 15, 2019