Engineering analysis and the Semantic Web

2008-03-19, version 1.1

author: David Leal - mailto:david.leal@caesarsystems.co.uk
CAESAR Systems Limited - http://www.caesarsystems.co.uk

Abstract

This document describes how the Semantic Web can be used for the management of engineering analysis data. This technology requires two initial steps:

the creation of a vocabulary for engineering analysis by an authoritative organisation such as NAFEMS;
the assignment of URIs to analysis codes and file formats by vendors.

The document also describes how NAFEMS could assist vendors to assign permanent URIs.

Acknowledgement

This document has been funded by the EU FP6 project DEPUIS - http://www.depuis.enea.it/ (Design of Environmentally friendly Products Using Information Standards). The support from the European Union is gratefully acknowledged.

1 What is the Semantic Web

1.1 Basics of the Semantic Web

The "Semantic Web" is a web of information on the Internet (or on an Intranet) which is annotated so that it can be accessed by precise queries, such as:

What activities used the results of the natural frequency analysis carried out on "MyTown by-pass bridge"?

Today you might use Google and search for human readable documents which contain the words or phrases "result", "natural frequency", "MyTown by-pass", "bridge". Perhaps you might get something relevant, but you may also get a lot of junk.

The "Semantic Web" is called "semantic" because a query can be formulated in terms of object types and properties with precise meaning - a vocabulary. In this example, the query relies upon a vocabulary which includes:

the activity type natural frequency analysis;
the property result (or output, valid for most types of activity);
the property uses (or input, valid for most types of activity);
the property subject of analysis (valid for an analysis activity).

Each of these object types and properties could be picked from a menu relevant to engineering analysis. The menu can be presented using words in any language - English, Dutch, Mandarin or Farsi.

A precise query also relies upon a precise identification of the object of interest - "MyTown by-pass bridge" in this example. This identification requires a new business practice - each object of interest is identified as an Internet "resource", and given a URI (see What is a URI). If the bridge is a project of MyTown District Council, then it may be given a URI by the council, such as http://www.my.town.gov.uk/projects/by-pass/bridge.
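The example query can be sketched in a few lines of Python. This is only an illustration, assuming that statements are held as (subject, property, object) triples. The property names subjectOfAnalysis, output and input follow the vocabulary listed above, but their exact spelling, the NAFEMS vocabulary namespace and the a.d.vance.co.uk URIs are hypothetical. A real deployment would use an RDF store and a query language rather than hand-written loops.

```python
# Hypothetical namespaces and example URIs, for illustration only.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NAFEMS = "http://www.nafems.org/vocabulary/"
BRIDGE = "http://www.my.town.gov.uk/projects/by-pass/bridge"
ADV = "http://www.a.d.vance.co.uk/projects/MT_BB/"

# Each statement is a (subject, property, object) triple.
triples = {
    (ADV + "run_1", RDF_TYPE, NAFEMS + "NaturalFrequencyAnalysis"),
    (ADV + "run_1", NAFEMS + "subjectOfAnalysis", BRIDGE),
    (ADV + "run_1", NAFEMS + "output", ADV + "run_1/modes"),
    (ADV + "fatigue_check", NAFEMS + "input", ADV + "run_1/modes"),
    (ADV + "run_2", RDF_TYPE, NAFEMS + "StressAnalysis"),
    (ADV + "run_2", NAFEMS + "subjectOfAnalysis", BRIDGE),
    (ADV + "run_2", NAFEMS + "output", ADV + "run_2/stress"),
    (ADV + "report", NAFEMS + "input", ADV + "run_2/stress"),
}


def subjects(prop, obj):
    """Return every subject s for which the statement (s, prop, obj) exists."""
    return {s for (s, p, o) in triples if p == prop and o == obj}


def objects(subj, prop):
    """Return every object o for which the statement (subj, prop, o) exists."""
    return {o for (s, p, o) in triples if s == subj and p == prop}


def activities_using_results(analysis_type, subject_of_analysis):
    """Find activities whose input is an output of a matching analysis."""
    analyses = subjects(RDF_TYPE, analysis_type) & subjects(
        NAFEMS + "subjectOfAnalysis", subject_of_analysis
    )
    users = set()
    for analysis in analyses:
        for result in objects(analysis, NAFEMS + "output"):
            users |= subjects(NAFEMS + "input", result)
    return users


found = activities_using_results(NAFEMS + "NaturalFrequencyAnalysis", BRIDGE)
print(sorted(found))
```

Because the query is expressed in terms of types and properties rather than words, the stress analysis and its report are not returned, even though they concern the same bridge.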

NOTE 1 Dereferencing the URI for an analysis type, such as heat diffusion, could obtain a representation of the governing equations:

∇·(k∇T) + Q = ρc ∂T/∂t

The equations could be represented using MathML. The definition of the analysis type could be explicit about whether non-linear behaviour, such as temperature dependence of the conductivity k, is taken into account.

NOTE 2 Many of the objects of interest of engineering analysis are fields, which can be described with respect to a mesh. An input loading for one simulation could be a result from another. Even the material properties could be a distribution resulting from a manufacturing process simulation.

Each of these fields could be represented in an open format, but initially the cost of moving from vendor formats to an open format may be too great. The Semantic Web approach works, and gives benefits, even if fields are still represented in vendor formats.

1.2 Why the Semantic Web

The Semantic Web allows you to record the information you have - whatever it is, however incomplete - precisely. The Semantic Web does not constrain what that information is, except that it must be recorded using a defined and public vocabulary.

The Semantic Web has advantages over traditional approaches to data warehouses, or PDM (Product Data Management) as follows:

you do not need a data base schema;

For a traditional approach, you have to define a data base schema, i.e. make a decision about what data can be recorded and what its structure is.

data can be distributed over the Web, if required;

The Semantic Web does not do security, but does not prevent it either. If some data is password protected, or behind a firewall, so be it.

it is cheap.

Companies such as SAP will create a bespoke system for you, at a price. The system will do what you specify and no more - it will define your business processes. If you don't want your business processes defined by a bespoke system, or don't have the money to buy one, then the Semantic Web may be for you.

There are free implementations of semantic web technologies, such as the query language SPARQL (see Semantic Web technologies).

1.3 More about the Semantic Web

The Semantic Web is a W3C (World Wide Web Consortium) activity.

A good overview is provided by W3C Semantic Web Activity - http://www.w3.org/2001/sw/

A follow-up article in the Scientific American by Lee Feigenbaum, Ivan Herman, Tonya Hongsermeier, Eric Neumann and Susie Stephens was published in December 2007, see The Semantic Web in Action - http://www.sciamdigital.com/index.cfm?fa=Products.ViewIssuePreview&ARTICLEID_CHAR=3734452E-3048-8A5E-1068474BA8D770C8. (Access to this article on the Web requires a Scientific American subscription.)

1.4 Who is already using the Semantic Web

The development of the Semantic Web, just like the development of the Internet, was largely funded by the US DoD. The Semantic Web is currently in use for many military applications, including logistics. A presentation by John Gilligan, Chief Information Officer of the USAF, is The Semantic Web - Imagine the Possibilities - http://www.daml.org/meetings/2005/04/pi/DOD_Venues.pdf.

Early adopters of the Semantic Web are in health care and life sciences. This community has similar requirements to the engineering analysis community because:

it must deal with data from many different sources;
there is no concept of completeness for the data;
the data may be used for many different purposes.

For more information, see the W3C Semantic Web Health Care and Life Sciences Interest Group - http://www.w3.org/2001/sw/hcls/.

2 Semantic Web for engineering analysis

2.1 What a Semantic Web for engineering analysis can be

Engineering analysis involves lots of data sets in different formats:

problem definition - probably a human readable document;
shape definition in native formats or in ISO STEP format;
analysis mesh definition;
loading definition - perhaps initially as a person readable document, and then as a computer interpretable definition of a field over a part of the mesh;
materials definition - perhaps a combination of person readable and computer interpretable documents;
results data sets, from different types of analysis and for different states of the product - probably in a format defined by the analysis system;
feedback from analysis - probably a human readable document, but possibly a computer interpretable file containing an optimised shape or material selection.

These data sets are about different objects, and are outputs from and inputs to different engineering activities. If we identify the objects (which the data sets are about), and the activities (which use and create the data sets), then we have a "web" of information.

To make the web of information a "Semantic Web", it is necessary to annotate each data set, and specify:

what sort of data it is, e.g. natural frequency results;
what format it is in, e.g. SuperStruct version 4.5 results file.

It is necessary to ensure that there are no missing nodes in this web. If there are a number of different data sets about an object, then it is necessary to specify:

the identifier of the object, e.g. http://www.my.town.gov.uk/projects/by-pass/bridge;
what that object is, e.g. Bridge.

It is necessary to ensure that there are no missing links in this web. If one data set is linked to another by an activity, then it is necessary to specify:

what sort of activity it is, e.g. create analysis model;
what input data sets there were, and what were their roles, e.g. material specification, and shape specification;
what the output data sets were;
who performed the analysis and when;
the software tools used, e.g. SuperMesh version 4.5.3.

The Semantic Web annotation is "glue" which joins existing data sets together. These data sets can be in an open computer interpretable format (such as ISO STEP), in a proprietary computer interpretable format defined by an analysis software vendor (such as Catia or Siemens PLM), or in a human readable document format (such as Microsoft .doc, Adobe .pdf).
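The annotation of one activity can be sketched as a small record which is flattened into statements. This is a minimal illustration, not a proposed NAFEMS vocabulary: the property names performedBy, performedOn and usesTool, the use of input roles as property names, and all a.d.vance.co.uk URIs are hypothetical.

```python
from dataclasses import dataclass, field

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NAFEMS = "http://www.nafems.org/vocabulary/"   # hypothetical vocabulary namespace
ADV = "http://www.a.d.vance.co.uk/projects/MT_BB/"  # hypothetical project namespace


@dataclass
class Activity:
    """The annotation for one activity which links data sets together."""

    uri: str
    activity_type: str
    performer: str
    date: str
    tool: str
    inputs: dict = field(default_factory=dict)   # role -> data set URI
    outputs: list = field(default_factory=list)  # data set URIs

    def to_triples(self):
        """Flatten the record into (subject, property, object) statements."""
        triples = [
            (self.uri, RDF_TYPE, NAFEMS + self.activity_type),
            (self.uri, NAFEMS + "performedBy", self.performer),
            (self.uri, NAFEMS + "performedOn", self.date),
            (self.uri, NAFEMS + "usesTool", self.tool),
        ]
        for role, data_set in self.inputs.items():
            triples.append((self.uri, NAFEMS + role, data_set))
        triples.extend((self.uri, NAFEMS + "output", d) for d in self.outputs)
        return triples


meshing = Activity(
    uri=ADV + "create_model_1",
    activity_type="CreateAnalysisModel",
    performer=ADV + "staff/analyst_1",   # hypothetical person URI
    date="2008-03-01",                   # hypothetical date
    tool="http://www.fred.bloggs.co.uk/application/SuperMesh/4.5.3",
    inputs={
        "materialSpecification": ADV + "material_spec",
        "shapeSpecification": ADV + "shape_spec",
    },
    outputs=[ADV + "mesh_1"],
)
statements = meshing.to_triples()
for statement in statements:
    print(statement)
```

The data sets themselves are untouched. Only their URIs appear in the statements, which is why the approach works whatever format the data sets are in.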

NOTE In the long term, some of the data sets can be replaced by the "glue". For example, there may be no need to have a data set which describes an activity, if there are precise statements within the Semantic Web which specify its type, date, performer, inputs and outputs. Instead, all that is necessary is a URI to identify the activity.

2.2 The benefits of a Semantic Web for engineering analysis

A Semantic Web for engineering analysis will give the following benefits:

data won't be lost in forgotten dusty corners of your servers;
you will not need an expensive and inflexible PDM system;
data will be queryable across companies - if you want it to be;
you will be able to navigate back along the audit trail to material test results and analysis validation test results, if required.
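Navigating back along the audit trail amounts to following output and input links in reverse. A minimal sketch, assuming that each data set records the activity which produced it and each activity records its inputs; the short names are hypothetical stand-ins for URIs:

```python
# Hypothetical short names stand in for the URIs of data sets and activities.
produced_by = {
    "stress_results": "stress_analysis",
    "analysis_model": "create_analysis_model",
    "material_spec": "material_testing",
}
inputs_of = {
    "stress_analysis": ["analysis_model", "load_case"],
    "create_analysis_model": ["material_spec", "shape_spec"],
    "material_testing": ["test_results"],
}


def audit_trail(data_set):
    """Return every data set from which the given data set was derived."""
    trail = []
    seen = set()
    pending = [data_set]
    while pending:
        current = pending.pop()
        activity = produced_by.get(current)  # None if the origin is not recorded
        for source in inputs_of.get(activity, []):
            if source not in seen:
                seen.add(source)
                trail.append(source)
                pending.append(source)
    return trail


trail = audit_trail("stress_results")
print(trail)
```

The walk stops wherever a link is missing, which is why the annotation has to record every activity and not just the final results.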

One day, "due diligence" will require a Semantic Web for engineering analysis - or something like it.

Early adopters will get benefits, if only because they will be able to find:

previous analyses which can be modified for a new purpose;
the people who did the previous analyses, and who may be able to help.

3 How to create a Semantic Web for engineering analysis

3.1 Who will create a Semantic Web for engineering analysis

The semantic web requires that every thing of interest ("resource" in web jargon) has a unique identifier on the Web (a URI - Uniform Resource Identifier). The things are defined and identified by different people, as follows:

problem owners;

The problem owners define all the important things - the products, the operating environments of the products, the loading cases, the manufacturing activities. It is up to the problem owners to identify these things.

analysts;

The analysts define some things - the individual activities which they perform, the different models they create for different behaviours of the products, and lots and lots of data sets. It is up to the analysts to identify these things.

data suppliers;

Data suppliers define some things - material product types, standard loading cases, standard assessment criteria. It is up to the suppliers of data about these things, such as ASTM, DoD, and regulatory authorities, to identify them.

standardisation bodies;

The key information about a product is "what sort of thing is it" - a bridge, a building, a transmission tower, a pressure vessel. This is a classification of a product with respect to a standard class. This classification may determine what types of analysis are required, and what codes of practice are relevant. It is up to standardisation bodies, such as ISO, IEC, API, to identify these classes.

NAFEMS;

A basic vocabulary for engineering analysis is required, containing terms such as:
- the activity type natural frequency analysis;
- the property output;
- the information type mode frequencies;
- the information type mode shapes;
- the property input;
- the information type analysis mesh;
- the information type analysis boundary conditions;
- the property subject of analysis.
NAFEMS can define and identify these terms.

analysis system vendors.

It is necessary to identify:
- the code (and precise version) which is used to carry out an activity;
- the analysis type performed by the code;
- the format (and precise version of the format) for all files which are read or written by codes.
It is up to the vendors to identify codes, analysis types and file formats.

The essential first steps to create a Semantic Web for engineering analysis have to be taken by NAFEMS and the vendors. Problem owners and analysts can then modify their business practices to take advantage of what is available.

3.2 A NAFEMS core

NAFEMS can define a basic vocabulary for engineering analysis.

The Dublin Core - http://dublincore.org/documents/2008/01/14/dc-rdf/ is a basic vocabulary for document meta-data, defining terms such as title, author, publisher, language, subject. NAFEMS can do the same for engineering analysis.

NAFEMS could do this in liaison with ISO TC184/SC4. The ISO STEP standard for engineering analysis (ISO 10303-209) contains an activity model for engineering analysis which defines types of analysis activity and types of analysis information. This activity model is now nearly 20 years old and needs updating, but it is nonetheless a useful starting point.

3.3 The role of the vendors

The vendors control the versions of native file formats, analysis codes and analysis types. The vendors have an obligation to give unique identifiers:

to versions of file formats, so that their customers can assign meta-data to input and results files; and
to versions of analysis codes and analysis types, so that their customers can record what analyses have been performed.

For the Semantic Web, these identifiers need to be URIs.

One approach would be for vendors to allocate URIs within their own Internet domain. Hence if Fred Bloggs and Co. has the domain http://www.fred.bloggs.co.uk, it could allocate:

to the file format "SuperStruct version 4.5 results file", the URI http://www.fred.bloggs.co.uk/format/SuperStruct/4.5/results; and
to the preprocessor "SuperMesh version 4.5.3", the URI http://www.fred.bloggs.co.uk/application/SuperMesh/4.5.3.

A drawback to this approach is that URIs are expected to persist unchanged (see Cool URIs don't change - http://www.w3.org/Provider/Style/URI). Unfortunately, the owners of analysis codes do change. It could be said that this doesn't matter very much because a URI is only an identifier. However, this is not the full story because:

a company might be unhappy that its code is for ever identified by a URI within the domain of the former owner; and
what happens when you "go to a URI" using a Web browser is controlled by the owner of the domain (which may not be the owner of the code).

Quite reasonably, if you go to a URI which identifies a code, you expect information about that code, such as:

a description of the code, and information about any subsequent versions;
information about bugs; and
a contact number for technical support.

NAFEMS could help by offering a registry service. Hence a file format or code could be given a NAFEMS URI, such as:

http://www.nafems.org/format/SuperStruct/4.5/results; and
http://www.nafems.org/application/SuperMesh/4.5.3.

The NAFEMS site could host a brief description of the code or file format, which would remain unchanged. The NAFEMS site could provide a link to the web site of the current analysis code owner. This link could change from time to time.
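The registry idea can be sketched as follows. This is an illustration under stated assumptions, not a description of any existing NAFEMS service: the class and method names are hypothetical, as is the new owner's address. The point is that the URI and the description are fixed at registration, while the link to the current owner can be updated.

```python
from dataclasses import dataclass

REGISTRY_BASE = "http://www.nafems.org/"  # base of the permanent URIs


@dataclass
class Entry:
    description: str  # fixed at registration
    owner_link: str   # updated when the owner of the code changes


class Registry:
    """A hypothetical registry of permanent URIs for codes and file formats."""

    def __init__(self):
        self._entries = {}

    def register(self, path, description, owner_link):
        """Assign a permanent URI; an existing URI is never reassigned."""
        uri = REGISTRY_BASE + path
        if uri in self._entries:
            raise ValueError(uri + " is already registered")
        self._entries[uri] = Entry(description, owner_link)
        return uri

    def change_owner(self, uri, new_owner_link):
        """Point the entry at the web site of the new owner of the code."""
        self._entries[uri].owner_link = new_owner_link

    def lookup(self, uri):
        return self._entries[uri]


registry = Registry()
uri = registry.register(
    "application/SuperMesh/4.5.3",
    "SuperMesh version 4.5.3 preprocessor",
    "http://www.fred.bloggs.co.uk/",
)
# Hypothetical takeover: the URI stays the same, only the link changes.
registry.change_owner(uri, "http://www.new.owner.example/")
print(uri, registry.lookup(uri).owner_link)
```

Archived meta-data which refers to the NAFEMS URI remains valid after the takeover, because nothing in the identifier depends on the domain of the owner.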

A What is a URI

A.1 Use of a URI

A URI (Uniform Resource Identifier) is a unique identifier of a thing, for use by the Internet.

Anybody can assign a URI to any thing. Hence I can assign the URI http://www.caesarsystems.co.uk/animals/Babar to "Babar the Elephant". It does not matter that:

"Babar the Elephant" is not part of the Internet in any sense;
CAESAR Systems Limited does not own "Babar the Elephant";
if you go to ("dereference" in Web jargon) http://www.caesarsystems.co.uk/animals/Babar with your web browser, the only thing that happens is that you get "HTTP error 404" - i.e. the server returned nothing.

The first part of a URI, http://www.caesarsystems.co.uk in this case, determines whether or not you trust it. If you believe that CAESAR Systems Limited is an appropriate authority for identifying fictitious animals, then you are free to use this identifier.
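Extracting the first part of a URI is straightforward with the Python standard library. In this minimal sketch, the set of trusted authorities is a hypothetical choice made by one user. It is not something that the Web defines.

```python
from urllib.parse import urlsplit

# Authorities which this (hypothetical) user has decided to trust.
TRUSTED = {"www.nafems.org", "www.caesarsystems.co.uk"}


def authority(uri):
    """Return the domain part of an HTTP URI, in lower case."""
    return urlsplit(uri).netloc.lower()


def is_trusted(uri):
    """Trust is decided from the authority alone, not from the path."""
    return authority(uri) in TRUSTED


babar = "http://www.caesarsystems.co.uk/animals/Babar"
print(authority(babar), is_trusted(babar))
```

No HTTP access takes place here. Whether the URI can be dereferenced is a separate question from who assigned it.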

It is good if HTTP access to an HTTP URI actually returns something. If access to http://www.caesarsystems.co.uk/animals/Babar obtains a file which is readable by your browser (an HTML file say), and which tells you what/who "Babar the Elephant" was/is, then this is useful. If HTTP access obtains a file which says "the CAESAR Systems dictionary of fictitious animals is available from all good bookshops", then this is useful too - but less so.

If NAFEMS were to assign the URI http://www.nafems.org/vocabulary/NaturalFrequencyAnalysis to the concept of natural frequency analysis, then many would trust that NAFEMS has provided an authoritative definition.

A.2 Types of URI

There are two principal types of URI:

HTTP URI, formerly called URL (Uniform Resource Locator), which starts http://, and which uses / as a field separator thereafter.

There are many billions of these URIs in use. All you need is an Internet domain, and you can assign them at will.

URN (Uniform Resource Name), which starts urn:, and which uses : as a field separator thereafter.

There are many thousands of these in use. You have to negotiate an agreement with IETF (Internet Engineering Task Force) in order to use them, and a few organisations such as ISBN and ISO have done so.

Since the use of URNs is six orders of magnitude smaller than the use of HTTP URIs, we can safely forget about them.

A.3 What a URI identifies

A URI can identify anything. Sometimes a URI identifies an electronic document which can be downloaded over the Web. Sometimes a URI identifies something else.

A URI can identify Babar the Elephant or the Eiffel Tower. Neither can be downloaded - the first because it is a fictitious animal, and the second because it is about 7,300 tonnes of wrought iron.

Dereferencing an HTTP URI may cause a document to be downloaded to your browser. This does not mean that the URI identifies the document. The document is a "representation" of the object identified by the URI, which the owner of the domain has chosen to provide. The owner of the domain may choose not to provide a document at all, in which case you will get "HTTP error 404".

Sometimes an HTTP URI identifies an electronic document, and when you dereference the URI the document is what you get.

NOTE There is an ambiguity about what an HTTP URI identifies - is it a thing, where the document is merely a representation, or is it the document itself? In practice, the ambiguity is something which we can live with.

B Semantic Web technologies

B.1 Semantic Web standards and software

The semantic web relies upon two basic technologies:

RDF (Resource Description Framework) does what it says - it is a methodology for describing resources. The resources can be data sets, or other things. RDF statements are published on the Web. The statements can be queried using SPARQL Query Language for RDF - http://www.w3.org/TR/rdf-sparql-query/. A free implementation of SPARQL is provided by Jena - http://jena.sourceforge.net/, which was initially developed by HP Labs Semantic Web Research - http://www.hpl.hp.com/semweb/.

RDF is intended to be extended by vocabularies. OWL - http://www.w3.org/2004/OWL/ is a basic vocabulary for vocabularies, which is usually the first extension to RDF.

B.2 Telling a story with the Semantic Web

The story is about a bridge:

Figure 1: My Town by-pass bridge

The story is:

There is a bridge, which is identified by My Town District Council.
There is a state of the bridge with the design load lorry at mid-span.
A.D. Vance and Partners carried out a stress analysis of the bridge using "SuperStruct version 4.5.7".
The calculated description of the stress distribution within the bridge for the design load lorry at mid-span is in file http://www.a.d.vance.co.uk/projects/MT_BB/run3/result#HA_MidSpan.stress.

The web of objects is as follows:

Figure 2: About the analysis of My Town by-pass bridge

This web of objects can be thought of as just meta-data for the file http://www.a.d.vance.co.uk/projects/MT_BB/run3/result#HA_MidSpan.stress, but really it is much more - it is a record of the problem and of what was done.

Each object in Figure 2 is defined, and assigned a URI, by somebody. Each player has his or her own namespace (the front bit of the URI) as follows:

My Town District Council - http://www.my.town.gov.uk
A.D. Vance and Partners (the consulting engineer) - http://www.a.d.vance.co.uk
The Institution of Civil Engineers (who provide a vocabulary about civil structures) - http://www.ice.org.uk
NAFEMS (who provide a vocabulary about engineering analysis) - http://www.nafems.org
Fred Bloggs and Co. (who own SuperStruct) - http://www.fred.bloggs.co.uk

Unfortunately, computers cannot process the diagram shown in Figure 2. Hence there has to be a text representation of the diagram. A representation of RDF as XML is widely used. Using XML, we can represent the statements:

"My Town by-pass bridge" is a bridge, and has state "MT-BB - HA at mid-span".
"MT-BB - HA at mid-span" is a state.

as:

<rdf:RDF
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:nafems="http://www.nafems.org/vocabulary/">

 <!-- the bridge: its type and its state -->
 <owl:Thing rdf:about="http://www.my.town.gov.uk/projects/by-pass/bridge">
  <rdf:type rdf:resource="http://www.ice.org.uk/vocabulary/Bridge"/>
  <nafems:hasState rdf:resource="http://www.a.d.vance.co.uk/projects/MT_BB/HA_MidSpan"/>
 </owl:Thing>

 <!-- the state "MT-BB - HA at mid-span" -->
 <owl:Thing rdf:about="http://www.a.d.vance.co.uk/projects/MT_BB/HA_MidSpan">
  <rdf:type rdf:resource="http://www.nafems.org/vocabulary/State"/>
 </owl:Thing>

</rdf:RDF>

This is OK for computers, but not easily readable by people. Fortunately, there is an alternative - Notation 3 "A readable language for data on the Web" - http://www.w3.org/DesignIssues/Notation3. The same statements can be represented in Notation 3 (or "N3") as:

@prefix ice:    <http://www.ice.org.uk/vocabulary/> .
@prefix nafems: <http://www.nafems.org/vocabulary/> .
@prefix myTown: <http://www.my.town.gov.uk/projects/by-pass/> .
@prefix adv:    <http://www.a.d.vance.co.uk/projects/MT_BB/> .

# the bridge: its type and its state
myTown:bridge   a               ice:Bridge ;
                nafems:hasState adv:HA_MidSpan .

# the state "MT-BB - HA at mid-span"
adv:HA_MidSpan  a               nafems:State .

This is simpler and more readable (once you have got past the namespace specifications). The next statements are:

"ADV - MT-BB run 3" is a "stress analysis" activity.
"ADV - MT-BB run 3" analyses "MT-BB - HA at mid-span".
"ADV - MT-BB run 3" runs analysis code "SuperStruct 4.5.7".
"ADV - MT-BB run 3" gives result file http://www.a.d.vance.co.uk/projects/MT_BB/run3/result#HA_MidSpan.stress.

Using N3, these statements can be represented simply as follows:

@prefix nafems: <http://www.nafems.org/vocabulary/> .
@prefix adv:    <http://www.a.d.vance.co.uk/projects/MT_BB/> .

# URIs whose final part contains "/" or "#" are written in full, in angle brackets
adv:run_3  a                        nafems:StressAnalysis ;
           nafems:analyses          adv:HA_MidSpan ;
           nafems:runsAnalysisCode  <http://www.fred.bloggs.co.uk/application/SuperStruct/4.5.7> ;
           nafems:givesResult       <http://www.a.d.vance.co.uk/projects/MT_BB/run3/result#HA_MidSpan.stress> .
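A namespace prefix is only an abbreviation: each short name stands for a full URI. The following minimal Python sketch expands the four statements about "ADV - MT-BB run 3" into full URIs, one statement per line. It is not an N3 parser. The prefix table and the helper names are assumptions made for this illustration, and "a" in N3 is written out as rdf:type.

```python
# Prefix table assumed for this illustration.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "nafems": "http://www.nafems.org/vocabulary/",
    "adv": "http://www.a.d.vance.co.uk/projects/MT_BB/",
}

statements = [
    ("adv:run_3", "rdf:type", "nafems:StressAnalysis"),
    ("adv:run_3", "nafems:analyses", "adv:HA_MidSpan"),
    ("adv:run_3", "nafems:runsAnalysisCode",
     "<http://www.fred.bloggs.co.uk/application/SuperStruct/4.5.7>"),
    ("adv:run_3", "nafems:givesResult",
     "<http://www.a.d.vance.co.uk/projects/MT_BB/run3/result#HA_MidSpan.stress>"),
]


def expand(name):
    """Expand prefix:local to a full URI; a <...> name is already a full URI."""
    if name.startswith("<") and name.endswith(">"):
        return name[1:-1]
    prefix, local = name.split(":", 1)
    return PREFIXES[prefix] + local


def to_lines(triples):
    """Write each statement with full URIs in angle brackets, ending with a dot."""
    return [
        "<{}> <{}> <{}> .".format(expand(s), expand(p), expand(o))
        for (s, p, o) in triples
    ]


lines = to_lines(statements)
for line in lines:
    print(line)
```

The expanded form is what a computer actually compares. Two statements refer to the same object only if the full URIs are identical, whatever prefixes were used to write them.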

Two presentations on the use of the Semantic Web for engineering analysis are:

Semantic Web technologies for the management of Engineering Analysis processes and data - http://www.caesarsystems.co.uk/research_projects/DEPUIS/nafems_esa_presentation_2007.ppt;

This was presented at the NAFEMS-ESA seminar on Engineering Analysis Quality, Verification and Validation, in December 2007.

Beyond data models (what we can do with vocabularies/ontologies alone) - http://www.caesarsystems.co.uk/research_projects/DEPUIS/iso_tc184-sc4_presentation_2007-03.ppt.

This was presented at the Open Technical Forum of ISO TC184/SC4 in March 2008.

C Questions for analysis system vendors

Do you already assign a unique identifier to each version of a file format, application code, or analysis type? This will enable your users to record precise meta-data about file types and about analysis activities.
If you do, is information about the version of the file format, and the version of the creating software, included within output data files? Having the information within the file is good. The Semantic Web requires that the information be available as meta-data outside the file as well.
Do you already assign a URI to each version of a file format, application code, or analysis type? A URI makes the identification unique on the Web, and enables a Semantic Web approach.
Do you already provide information about a version of a file format, application code, or analysis type, on the web? If you do, is this information obtained by dereferencing the URI of the file format, application code or analysis type? If the format of an archived file is specified by a URI, then one day your customer might want to access that URI to find out what it is.
Would you register a file format, application code, or analysis type with an outside body in order to obtain a permanent URI? Perhaps your customers would feel happier if there was an access route to information about your file formats, analysis codes and analysis types using the Web, which would remain unchanged even if you were taken over by another company.