Equinor/Statoil’s Volve Dataset – Well Technical Data – part 1

The following information is based on the original article here. It was written by Oliasoft CTO Marius Kjeldahl.

Equinor (formerly Statoil) has released an open dataset for anybody who wants to peek into what kind of data oil companies typically need and/or produce. It can be found here (registration needed).

Oliasoft is a software company providing software and services for oil and oil service companies, and we work with Equinor and many other companies to modernize the architecture and software products used by the industry. This article is the first in a series where we dive into the open dataset Equinor has released publicly, discussing what kind of data it actually is and how it can and cannot be used in modern software services such as our own.

Marius Kjeldahl
CTO

Oliasoft has a suite of cloud-based engineering tools, commonly referred to as Oliasoft ICE (Integrated Cloud Engineering). Our first product is Oliasoft WellDesign, which deals with trajectories and with casing and tubing design.

In this first article I will take a look at the dataset named Well Technical Data, and discuss how we can make use of this data in Oliasoft WellDesign or similar products.

The dataset is a 212MB zip file. After unzipping it, we are left with a directory structure as follows:

Well_technical_data
├── CasingSeat
├── CasingWear
├── Compass
├── Daily\ Drilling\ Report\ -\ HTML\ Version
├── Daily\ Drilling\ Report\ -\ XML\ Version
├── Daily\ Drilling\ report\ -\ PDF\ Version
├── EDM.XML
├── EDT_EDM_read_me.txt
├── Site
├── Site_TemplateSlot
├── StressCheck
├── WellPlan
├── WellWellbore
├── Wellcat
└── license.txt

Of these, the directory Compass is the easiest to describe. Compass is the name of Landmark Software’s Windows software for designing well trajectories. In this dataset the directory is empty (except for a comment about it being intentionally left blank). I have no idea why it has been left blank. Maybe Equinor will put some actual data in there eventually. Oliasoft’s tools for the same purpose are included in Oliasoft WellDesign. Similarly, the directories CasingSeat, CasingWear and WellPlan are marked empty in the same way.
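A quick way to spot which directories are placeholders is to count the files and bytes under each top-level entry. The following is a minimal sketch, assuming the archive has been unzipped into a local directory named Well_technical_data (the helper name summarize_top_level is my own, not part of the dataset):

```python
from pathlib import Path

def summarize_top_level(root):
    """Return {entry_name: (file_count, total_bytes)} for each top-level entry."""
    summary = {}
    for entry in sorted(Path(root).iterdir()):
        if entry.is_dir():
            # Count every regular file below this directory, at any depth.
            files = [p for p in entry.rglob("*") if p.is_file()]
        else:
            files = [entry]
        summary[entry.name] = (len(files), sum(p.stat().st_size for p in files))
    return summary

# Assumed local path; adjust to wherever the zip file was extracted.
ROOT = Path("Well_technical_data")
if ROOT.is_dir():
    for name, (count, size) in summarize_top_level(ROOT).items():
        print(f"{name:45s} {count:6d} files {size:12d} bytes")
```

Directories that hold only a single small note file should stand out immediately in the listing.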

I suspect the actual reason why these directories exist is that all these products are quite old. When they were made, everything was based on file input and file output, and both data and software had to live and run on the same computer systems running Windows. Later, as computers became networked, and smart people started separating data storage from the software using the data, architectures evolved to typically use networked database servers. Landmark (and at least some of their products) were modernized to support such architectures, which also made it easier to share data across Landmark’s own products. This ended up as the well-known EDM database that most Landmark programs can read data from and write data to. It’s worth mentioning though that the actual data inside EDM is still pretty much tied to the software that uses it. It is very hard for other software to access EDM and create higher order data from it, because the logic for creating such objects typically still lives inside the Windows-based (thick client) software. The file EDT_EDM_read_me.txt says the data file EDM.XML has data for most (if not all) of the Landmark products Equinor has dumped data from.

There are more directories that map directly to Landmark products, namely StressCheck and Wellcat. These are also two Windows programs, dealing with casing and tubing design. The directories contain StressCheck (*.sck) and Wellcat (*.wcd) data files. These are binary files and are only readable for people who already have access to StressCheck and Wellcat. From a digitalization point of view, or if you need to migrate this data to other systems, that makes these data formats virtually useless. For all practical purposes they are only readable by the original software that produced the files, and virtually unreadable for everybody else. If this is the only format you have the data in, moving the data will be expensive and/or very time consuming. I’m guessing some of the data for StressCheck and Wellcat also lives inside the EDM data dump provided; we will see later.

The file EDT_EDM_read_me.txt contains the following text:

Statoil 2018-04-11
EDT_EDM

The data in the EDT_EDM folder structure is the EDT software export using the software from Landmark.

EDT_EDM
CasingSeat
CasingWear
Compass
EDM.XML
Site
Site_TemplateSlot
StressCheck
Wellcat
WellPlan
WellWellbore

This pretty much confirms the suspicion mentioned above that most of the data used in the Landmark products lives inside the EDM database. The directory EDM.XML furthermore contains the following files:

EDM.XML
├── Volve\ F.edm.xml
└── Volve\ F.edm.zip

The xml file is 201MB. The zip file is the same as the xml, only zipped. There’s really no reason to provide both an xml and zip version of the same file inside this file dump, so it’s probably done by mistake.

The EDM database

Looking at the Volve\ F.edm.xml file, it looks very much like some kind of XML dump of a complete relational database with multiple tables. I’ll make some notes about my observations below.

The file contains almost 321 000 lines of XML code. If you want to inspect this file manually you probably need a specialized tool, or at the very least an editor that has decent performance for even large files such as this one.
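An alternative to opening the file in an editor is to stream through it and count how often each element name occurs, which gives a first overview of the "tables" without loading everything into memory. This is a minimal sketch using Python's standard library; the function name count_tags is my own, and the file name is simply the one listed above:

```python
import os
import xml.etree.ElementTree as ET
from collections import Counter

def count_tags(source):
    """Count element names in an XML file (path or binary file object), streaming."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        # Drop the element's attributes and children once it has been counted,
        # so memory use stays modest even for a file of a few hundred MB.
        elem.clear()
    return counts

# Assumed local file name; adjust the path to your unzipped copy.
DUMP = "Volve F.edm.xml"
if os.path.isfile(DUMP):
    for tag, n in count_tags(DUMP).most_common(20):
        print(f"{n:8d}  {tag}")
```

Sorting by frequency is just one choice; sorting by name groups the entries by prefix instead.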

After a couple of mandatory XML headers and start tags, the XML file shows what look like relatively direct dumps of database tables. The first part is found in a section named TOPLEVEL, which I will discuss below.

The first section I find contains entries such as these:

<TU_TEMP_DERATION_SCHED schedule_name="4145H MOD [SH-]" temp_deration_sched_id="SH494145HM      " />
<TU_TEMP_DERATION_SCHED schedule_name="S-135_2 [SH]" temp_deration_sched_id="SH92S-135_      " />
<TU_TEMP_DERATION_SCHED schedule_name="V-150 [SH]" temp_deration_sched_id="SH21V-150[      " />
<TU_TEMP_DERATION_SCHED schedule_name="4145H MOD [SH]" temp_deration_sched_id="SH284145HM      " />
<TU_TEMP_DERATION_SCHED is_api="N" schedule_name="SM125S (Active)" temp_deration_sched_id="H9hEldsgdvAV3KHa" />
<TU_TEMP_DERATION_SCHED is_api="N" schedule_name="C-110 (Active)" temp_deration_sched_id="CF6dk7mlsB036oyR" />
<TU_TEMP_DERATION_SCHED schedule_name="XH" temp_deration_sched_id="XH              " />
<TU_TEMP_DERATION_SCHED is_api="N" schedule_name="S13CrS110 (Active)" temp_deration_sched_id="MPdFRZLGL6kKUFlQ" />

This looks like a table of temperature deration schedules, each with a name and an identifier, temp_deration_sched_id, that other tables use to refer to it. The name is typically the name given by the user, and the id is used to connect the name to actual data. For instance, the first entry is referenced by another entry in the same file which looks like this:

<CD_GRADE
    grade="4145H MOD [SH-]"
    grade_id="4145MOD"
    min_yield_stress="1.5840000009598175E7"
    create_date="{ts '2003-07-27 17:07:56'}"
    fatigue_endurance_limit="2880000.381480192"
    is_api="N"
    create_user_id="edm"
    ultimate_tensile_strength="2.0160000581818465E7"
    create_app_id="WELLPLAN 2003.5"
    update_date="{ts '2015-02-09 22:35:13'}"
    update_user_id="akis(AKIS@STATOIL.NET)"
    update_app_id="StressCheck 5000.1"
    temp_deration_sched_id="SH494145HM      "
    radial_yield_factor="100.0"
    hoop_yield_factor="100.0"
/>
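The relation described above can be sketched as a simple join on temp_deration_sched_id. The sample below embeds a few trimmed entries from this article; wrapping them in a TOPLEVEL element is an assumption about the dump's layout, and grades_by_schedule is a hypothetical helper name. Note that the ids are compared verbatim, including their trailing padding spaces:

```python
import xml.etree.ElementTree as ET

# Trimmed sample; only the attributes needed for the join are kept.
SAMPLE = """<TOPLEVEL>
  <TU_TEMP_DERATION_SCHED schedule_name="4145H MOD [SH-]" temp_deration_sched_id="SH494145HM      " />
  <TU_TEMP_DERATION_SCHED schedule_name="XH" temp_deration_sched_id="XH              " />
  <CD_GRADE grade="4145H MOD [SH-]" grade_id="4145MOD" temp_deration_sched_id="SH494145HM      " />
</TOPLEVEL>"""

def grades_by_schedule(root):
    """Return (grade_id, schedule_name) pairs for grades that reference a schedule."""
    schedules = {
        e.get("temp_deration_sched_id"): e.get("schedule_name")
        for e in root.iter("TU_TEMP_DERATION_SCHED")
    }
    pairs = []
    for grade in root.iter("CD_GRADE"):
        sched_id = grade.get("temp_deration_sched_id")
        if sched_id in schedules:
            pairs.append((grade.get("grade_id"), schedules[sched_id]))
    return pairs

root = ET.fromstring(SAMPLE)
print(grades_by_schedule(root))  # [('4145MOD', '4145H MOD [SH-]')]
```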

Next I find the following entries:

<CD_LITHOLOGY_CLASS lithology_id="18" lithology="SANDSTONE" display_code="52" />
<CD_LITHOLOGY_CLASS lithology_id="DWS38" lithology="SANDSTONE" />
<CD_LITHOLOGY_CLASS lithology_id="Empty" lithology="Empty" />

The lithology_id is an identifier used by CD_WELLBORE_FORMATION entries to point to these entries. Other than the name in lithology, these lithology entries do not seem to carry any additional information.

Next I find entries such as these:

<CD_MATERIAL 
  temp_deration_sched_id="MPdFRZLGL6kKUFlQ"
  material_id="3ERFPCqCi9"
  density="489.9968731139747"
  material="S13CrS110 (Active)"
  youngs_modulus="4.469482903629459E9"
  poissons_ratio="0.27"
  expansion_coefficient="5.938888888888894E-6"
  radial_yield_factor="100.0"
  hoop_yield_factor="100.0"
  is_api="N"
  create_date="{ts '2009-12-01 17:24:39'}"
  create_user_id="gfol"
  create_app_id="StressCheck 2003.16"
  update_date="{ts '2014-10-17 17:13:25'}"
  update_user_id="olsk(OLSK@STATOIL.NET)"
  update_app_id="WELLPLAN 5000.1"
  />

These are obviously material definitions. The one shown has a reference to a temp_deration_sched_id as already mentioned, but some materials have no such relation. Each material is identified by its material_id. Inside this file, this specific material is referenced in entries such as the ones shown below:

<CD_GRADE
  grade="S13CRS110 (ACTIVE)"
  material_id="3ERFPCqCi9"
  grade_id="NHxtfZqVig"
  min_yield_stress="1.5840000009598175E7"
  create_date="{ts '2009-12-01 17:24:40'}"
  is_api="N"
  create_user_id="gfol"
  ultimate_tensile_strength="1.6560058584168866E7"
  create_app_id="StressCheck 2003.16"
  update_date="{ts '2015-02-09 22:35:14'}"
  update_user_id="akis(AKIS@STATOIL.NET)"
  update_app_id="StressCheck 5000.1"
  temp_deration_sched_id="MPdFRZLGL6kKUFlQ"
  radial_yield_factor="100.0"
  hoop_yield_factor="100.0"
  />

This just ties the material to a certain steel grade, with additional grade related properties.

<CD_ASSEMBLY_COMP
  well_id="2DfUHhBj3x"
  wellbore_id="tRSGmE9FNr"
  assembly_id="jdKWy"
  assembly_comp_id="n9EZQ"
  sect_type_code="CAS"
  material_id="3ERFPCqCi9"
  grade_id="NHxtfZqVig"
  average_joint_length="40.0"
  closed_end_displacement="0.047599885638892"
  catalog_key_desc="7 in, 29,000 ppf, S13CRS110 (ACTIVE), Vam TOP HT"
  id_drift="6.059"
  connection_grade="NHxtfZqVig"
  connection_name="Vam TOP HT"
  fatigue_endurance_limit="0.0"
  density="489.99687311397"
  grade="S13CRS110 (ACTIVE)"
  length="5479.00262465"
  id_body="6.184"
  linear_capacity="0.037149142290183"
  material="S13CrS110 (Active)"
  joint_strength="929.40275867223"
  joints="137"
  manufacturer="Sumitomo"
  nominal_size="7"
  makeup_torque="17701.4915832"
  poissons_ratio="0.27"
  min_yield_stress="109999.95655464"
  nominal_weight="29,000"
  od_body="7.0"
  od_connection="7.656"
  thermal_expansion_coef="5.9444444444445"
  ultimate_tensile_strength="115000.42133754"
  pressure_burst="11220.0"
  approximate_weight="29.0"
  pressure_collapse="8531.6351225771"
  yield_weight_body="929436.91930818"
  youngs_modulus="3.1038075718E7"
  sequence_no="400000"
  wall_thickness_percent="87.5"
  comp_type_code="CAS"
  press_rating_top="0.0"
  press_rating_bot="0.0"
  wl_bore_size="0.0"
  pocket_size="0.0"
  create_user_id="thhi(THHI@STATOIL.NET)"
  create_app_id="WELLPLAN 5000.1"
  create_date="{ts '2013-06-13 16:17:52'}"
  update_app_id="WELLPLAN 5000.1"
  update_date="{ts '2013-06-13 17:12:05'}"
  />

CD_ASSEMBLY_COMP looks like a casing component.
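To illustrate how these entries hang together, the sketch below resolves the material_id and grade_id of a CD_ASSEMBLY_COMP entry against CD_MATERIAL and CD_GRADE. The sample is a heavily trimmed copy of the entries above, the TOPLEVEL wrapper is assumed, and the helper names index_by and describe_components are my own:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<TOPLEVEL>
  <CD_MATERIAL material_id="3ERFPCqCi9" material="S13CrS110 (Active)" poissons_ratio="0.27" />
  <CD_GRADE grade_id="NHxtfZqVig" material_id="3ERFPCqCi9" grade="S13CRS110 (ACTIVE)" />
  <CD_ASSEMBLY_COMP assembly_comp_id="n9EZQ" material_id="3ERFPCqCi9" grade_id="NHxtfZqVig"
                    sect_type_code="CAS" od_body="7.0" id_body="6.184" />
</TOPLEVEL>"""

def index_by(root, tag, key):
    """Build a lookup {key attribute value: element} for all elements named tag."""
    return {e.get(key): e for e in root.iter(tag)}

def describe_components(root):
    """Resolve each component's material and grade references into readable names."""
    materials = index_by(root, "CD_MATERIAL", "material_id")
    grades = index_by(root, "CD_GRADE", "grade_id")
    rows = []
    for comp in root.iter("CD_ASSEMBLY_COMP"):
        material = materials.get(comp.get("material_id"))
        grade = grades.get(comp.get("grade_id"))
        rows.append({
            "assembly_comp_id": comp.get("assembly_comp_id"),
            # A missing reference yields None instead of raising.
            "material": material.get("material") if material is not None else None,
            "grade": grade.get("grade") if grade is not None else None,
            "od_body": float(comp.get("od_body")),
            "id_body": float(comp.get("id_body")),
        })
    return rows

for row in describe_components(ET.fromstring(SAMPLE)):
    print(row)
```

Interestingly, the component entry also repeats several material and grade properties inline, so the lookups are mostly useful for checking that the copies agree.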

Next there is a list of CD_GRADE entries which I’ve briefly described already, in addition to these:

<CD_GRADE_SECT_TYPE sect_type_code="CAS" grade_id="4145MOD" />
<CD_GRADE_SECT_TYPE sect_type_code="DC" grade_id="4145MOD" />
<CD_GRADE_SECT_TYPE sect_type_code="CAS" grade_id="9QW0P1MJVP" />

These CD_GRADE_SECT_TYPE entries just seem to map actual grades (through grade_id) to a sect_type_code. I’m not sure why (it seems redundant).

Next in the file there is this one entry (and only one of this type):

<CD_CLASS service_class="NEW"
  class_id="KnPlQ"
  wall_thickness_percent="100.0"
  description="New Pipe"
  create_date="{ts '2006-03-01 11:56:47'}"
  create_user_id="HBL3739"
  create_app_id="WELLPLAN 2003.14"
  update_date="{ts '2006-03-01 11:56:47'}"
  update_user_id="HBL3739"
  update_app_id="WELLPLAN 2003.14"
  />

It simply looks like a casing type, probably something WellPlan requires. I’ll take this opportunity to mention the fields create_app_id and update_app_id. These contain identifiers showing which application first created the entry, and which one later updated it. So far most of the entries mentioned were created by StressCheck, but the last two were created by the WellPlan application.
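The audit fields lend themselves to a quick tally of which application created what. The sketch below parses the {ts '...'} timestamp notation seen in the entries (it resembles the ODBC/JDBC timestamp escape syntax) and counts create_app_id values with the version number stripped. The sample is trimmed from the entries above; the TOPLEVEL wrapper and the helper names parse_ts, app_name and creators are assumptions of mine:

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from datetime import datetime

SAMPLE = """<TOPLEVEL>
  <CD_MATERIAL material_id="3ERFPCqCi9" create_app_id="StressCheck 2003.16"
               create_date="{ts '2009-12-01 17:24:39'}" />
  <CD_ASSEMBLY_COMP assembly_comp_id="n9EZQ" create_app_id="WELLPLAN 5000.1"
                    create_date="{ts '2013-06-13 16:17:52'}" />
  <CD_CLASS class_id="KnPlQ" create_app_id="WELLPLAN 2003.14"
            create_date="{ts '2006-03-01 11:56:47'}" />
</TOPLEVEL>"""

TS_PATTERN = re.compile(r"\{ts '(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})'\}")

def parse_ts(value):
    """Convert "{ts 'YYYY-MM-DD hh:mm:ss'}" to a datetime, or None if it does not match."""
    match = TS_PATTERN.fullmatch(value.strip())
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")

def app_name(app_id):
    """Strip the trailing version: "WELLPLAN 2003.14" becomes "WELLPLAN"."""
    return app_id.rsplit(" ", 1)[0]

def creators(root):
    """Count entries per creating application, ignoring entries without the field."""
    return Counter(
        app_name(e.get("create_app_id")) for e in root.iter() if e.get("create_app_id")
    )

root = ET.fromstring(SAMPLE)
print(creators(root))
print(parse_ts(root.find("CD_CLASS").get("create_date")))
```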

Next in the file I find entries such as these:

<CD_REAL_TIME_CONFIG code="SPPA" config_id="00815" curve_mnemonic="PUMP" />
<CD_REAL_TIME_CONFIG code="HKLD_AVG" config_id="00K99" curve_mnemonic="TFHA" />
<CD_REAL_TIME_CONFIG code="ECD_SHOE" config_id="00MiZ" curve_mnemonic="ECD" />

I’ve done a couple of searches to see if I can find references to config_id used by other entries in the same file, but so far I haven’t found much, so I will not analyze these further for now. What I can comment on is that there are a lot of these entries, and based on the field curve_mnemonic they look like “simple” configuration settings (but I haven’t found the actual “values” used yet).
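Searches like this can be automated by scanning every attribute of every element for a given id value and reporting which element and attribute names carry it. The following is a minimal sketch with a hypothetical helper name, find_value; the demonstration uses the grade id 4145MOD from the entries shown earlier rather than a config_id, since I have not found references to those:

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

def find_value(source, value):
    """Count (element name, attribute name) pairs whose attribute equals value."""
    hits = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        for name, attr_value in elem.attrib.items():
            if attr_value == value:
                hits[(elem.tag, name)] += 1
        # Free the element after inspection to keep memory use low on large files.
        elem.clear()
    return hits

# Small in-memory sample; the TOPLEVEL wrapper is an assumption about the layout.
SAMPLE = b"""<TOPLEVEL>
  <CD_GRADE grade="4145H MOD [SH-]" grade_id="4145MOD" />
  <CD_GRADE_SECT_TYPE sect_type_code="CAS" grade_id="4145MOD" />
  <CD_GRADE_SECT_TYPE sect_type_code="DC" grade_id="4145MOD" />
  <CD_GRADE_SECT_TYPE sect_type_code="CAS" grade_id="9QW0P1MJVP" />
</TOPLEVEL>"""

for (tag, attr), n in sorted(find_value(io.BytesIO(SAMPLE), "4145MOD").items()):
    print(f"{tag}.{attr}: {n}")
```

Passing the path of the full dump instead of the in-memory sample works the same way, because the function accepts either a file name or a binary file object.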

Next there are entries such as these:

<WP_CUSTOM_BASE_FLUID fluid_id="0196J" fluid_name="INNOVERTGuardian" />
<WP_CUSTOM_BASE_FLUID
  fluid_id="042FB"
  fluid_name="NeoDrill"
  percent_oil="70.0"
  percent_water="30.0"
  conc_nacl="10.0"
  temp_density="70.0"
  solids_sg_avg="2.0" />
<WP_CUSTOM_BASE_FLUID fluid_id="0BT69" fluid_name="70-30" />

I haven’t found other references to these fluids (through fluid_id) yet. But there are a lot of these, and fluid properties are pretty important for a whole class of calculations, so I’m sure I’ll find some later.

Next there are some simple geographical definitions, like these:

<CD_GEO_SYSTEM
  geo_system_id="UTM"
  measure_id="121"
  geo_system_name="Universal Transverse Mercator"
  />
<CD_GEO_ZONE
  geo_system_id="UTM"
  geo_zone_id="UTM-31N"
  zone_name="Zone 31N (0 E to 6 E)"
  lat_origin="0.0"
  lon_origin="3.0"
  standard_lat0="0.0"
  scale_factor="0.9996"
  standard_lat1="0.0"
  standard_lon0="0.0"
  standard_lat2="0.0"
  standard_lon1="0.0"
  false_easting="500000.0"
  false_northing="0.0"
  radius_sphere="0.0"
  skew_azimuth0="0.0"
  skew_azimuth1="0.0"
  proj_type="0"
  />
<CD_GEO_ELLIPSOID
  geo_ellipsoid_id="INTERNATIONAL"
  name="International 1924"
  semi_major="6378388.0"
  first_eccentricity="0.0819918899790292"
  />

This is pretty standard map related stuff, defining map references and what model of earth is used in calculations.
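As a small sanity check, the values above can be compared against standard conventions. The central meridian of a UTM zone follows from the zone number, and the flattening of an ellipsoid follows from its first eccentricity e through f = 1 - sqrt(1 - e^2). The sketch below uses only the numbers shown in the entries; the function names are my own:

```python
import math

def utm_central_meridian(zone):
    """Central meridian in degrees east for a UTM zone number (1 to 60)."""
    return 6 * zone - 183

def flattening_from_eccentricity(e):
    """Flattening f of an ellipsoid with first eccentricity e: f = 1 - sqrt(1 - e^2)."""
    return 1.0 - math.sqrt(1.0 - e * e)

# Values copied from the CD_GEO_ZONE and CD_GEO_ELLIPSOID entries above.
semi_major = 6378388.0
first_eccentricity = 0.0819918899790292

f = flattening_from_eccentricity(first_eccentricity)
print("central meridian, zone 31:", utm_central_meridian(31))  # 3, matching lon_origin
print("inverse flattening:", 1.0 / f)
print("semi-minor axis [m]:", semi_major * (1.0 - f))
```

The inverse flattening should come out very close to 297, the commonly published value for the International 1924 ellipsoid, and the scale factor 0.9996 and false easting 500000 in the zone entry are the usual UTM constants.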

This is also the end of the TOPLEVEL definitions.

A final observation is that most things starting with CD_ are related to casing design (StressCheck, possibly also Wellcat), while things starting with WP_ are WellPlan type data.

Part 2 of this series is available here.

April 14, 2019