Information on the Web for REACH and LCA

David Leal
Version 0.1, 2008-05-22

1 Introduction

There are many reasons why companies may wish to publish computer interpretable information on the Web about their products. Two significant ones are:

For REACH it is necessary to know the chemical composition of each part within an assembly (where "part" includes adhesives and coatings). For LCA it is necessary to know what resources have been consumed and what waste has been emitted to the environment to make each part within an assembly. In each case, the manufacturer of a product needs to navigate back down the supply chain collecting information about parts and parts of parts. The Semantic Web is the obvious tool to implement this. A natural approach is:

The manufacturer of a product can run a computer application which navigates back down the supply chain processing the computer interpretable information published by each tier of supplier.
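The navigation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `PUBLISHED` dictionary stands in for the Web (each URI maps to the statements that dereferencing it would return), and the `example.org` URIs, the predicate names `hasPart` and `containsSubstance`, and the substances are all hypothetical.

```python
# Hypothetical stand-in for the Web: each URI maps to the (predicate, object)
# statements that its owner has published about it.
PUBLISHED = {
    "http://example.org/oem/pump": [
        ("hasPart", "http://example.org/tier1/housing"),
        ("hasPart", "http://example.org/tier1/seal"),
    ],
    "http://example.org/tier1/housing": [
        ("containsSubstance", "aluminium"),
        ("hasPart", "http://example.org/tier2/coating"),
    ],
    "http://example.org/tier1/seal": [
        ("containsSubstance", "nitrile rubber"),
    ],
    "http://example.org/tier2/coating": [
        ("containsSubstance", "epoxy resin"),
    ],
}


def collect_substances(uri, published, seen=None):
    """Walk down the supply chain from uri and return every substance found."""
    seen = set() if seen is None else seen
    if uri in seen:  # guard against cycles in the published data
        return set()
    seen.add(uri)
    substances = set()
    for predicate, obj in published.get(uri, []):
        if predicate == "containsSubstance":
            substances.add(obj)
        elif predicate == "hasPart":
            # Follow the link to the next tier of supplier.
            substances |= collect_substances(obj, published, seen)
    return substances


substances = collect_substances("http://example.org/oem/pump", PUBLISHED)
print(sorted(substances))  # ['aluminium', 'epoxy resin', 'nitrile rubber']
```

In a real application the dictionary lookup would be replaced by dereferencing each URI and parsing the RDF that is returned.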

A key question is: "What is the format of the computer interpretable information that is published on the Web?" This document argues that:

  1. the RDF (Resource Description Framework) component of the Semantic Web is the best available base technology;
  2. most of the concepts necessary for the RDF vocabulary have been defined in ISO TC184/SC4 standards, or in the LCA standards developed by ISO TC207/SC5, such as the data documentation format ISO/TS 14048;
  3. most of the concepts are currently locked up within information models, where they are not useable;
  4. the automated generation of an RDF vocabulary from existing ISO TC184/SC4 information models will not give a useable vocabulary.

If these points are accepted, then the next step is the extraction by hand of an RDF vocabulary from ISO TC184/SC4 and ISO TC207/SC5 standards which supports the REACH and LCA data requirements.

2 Key role of RDF

2.1 An RDF formula

RDF (Resource Description Framework) is a way of representing statements and publishing them on the Web.

EXAMPLE 1 The following statements can be made about the item with URI http://www.f.bloggs.co.uk/items/98-12345:

The RDF diagram for these statements is shown in Figure 1.

Figure 1: Electrical equipment item RDF

RDF has representations as XML or as N3. N3 is much more readable, and is supported by software such as the Tabulator plug-in for Firefox.

EXAMPLE 2 The diagram in the previous example can be represented in N3 as follows:

<http://www.f.bloggs.co.uk/items/98-12345>
    a                        vocab:ElectricalEquipmentItem ;
    vocab:serialNumber       "98-12345" ;
    vocab:manufacturer       <http://www.f.bloggs.co.uk/> ;
    vocab:operatingVoltage   [ vocab:volt 230 ] .

<http://www.f.bloggs.co.uk/>
    vocab:registeredName     "Fred Bloggs and Company Limited" .

In this example, "vocab" is a reference to a formal vocabulary which is published somewhere on the Web, which makes the statements computer interpretable and semantically precise.

A set of RDF statements is called a "formula".

2.2 Dereferencing

Objects can be identified on the Web by URIs.

NOTE 1 A URI can, but need not, identify a computer file. A URI can identify a real world object such as the Eiffel Tower, a person, an organisation, or a type of object provided by a supplier.

If a URI is an HTTP URI, then it can be dereferenced - i.e. accessed over the Web using HTTP. Dereferencing obtains a representation of the object.

NOTE 2 If a URI identifies a computer file, then it is reasonable that dereferencing should obtain the file. (This is not always what happens because the owner of the file may want you to identify yourself, or pay, first.)

If a URI identifies something else, then only a representation can be obtained. You cannot download the Eiffel Tower to your computer because it is several thousand tonnes of wrought iron.

Often the representation of the object is an HTML file, which can be displayed to a person on a browser. The HTML file may contain links to other objects, which can be dereferenced in turn to obtain other HTML files. This is the basis of Web 1.0.

Dereferencing can obtain an RDF formula. The way in which a client can choose the form of the representation is explained in Cool URIs for the Semantic Web. An RDF formula contains statements which mention the URIs of objects. These URIs can be dereferenced to get other formulae. The navigation from semantically precise formula to semantically precise formula is the basis of Web 3.0.
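One common way for a client to choose the form of the representation is HTTP content negotiation using the Accept header. The following is a minimal sketch using only the Python standard library; it builds the request without sending it, and the helper name `rdf_request` and the particular media type preferences are assumptions made for illustration.

```python
from urllib.request import Request


def rdf_request(uri):
    """Build (but do not send) a GET request that asks for RDF rather than HTML."""
    # Prefer N3, and accept RDF/XML as a second choice.
    return Request(uri, headers={"Accept": "text/n3, application/rdf+xml;q=0.9"})


req = rdf_request("http://www.f.bloggs.co.uk/items/98-12345")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the dereference. A browser that sends an Accept header preferring `text/html` to the same URI could be served the HTML page instead.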

EXAMPLE 1 In the RDF example in section 2.1, the equipment item has the URI http://www.f.bloggs.co.uk/items/98-12345. This URI could be readable from the item itself as text on the manufacturer's plate, as a bar code, or in an RFID chip. Dereferencing this URI could obtain either:

The computer interpretable formula can be processed by an application using a query language such as SPARQL. It can also be displayed by an RDF browser.

EXAMPLE 2 The Tabulator plug-in for Firefox displays the RDF example as shown in Figure 2.

Figure 2: Browser display of RDF

2.3 Why Web published data is different

Using the Semantic Web we have:

data pull
the person who wants the data navigates the Semantic Web to get what he or she actually wants.

rather than:

data push
the person who sends the data gathers together into a "lump" all the data that he or she thinks the receiver wants.

In the data push scenario, data models are crucial because they can be an agreement about what is exchanged. In the data pull scenario, data models are less important because the information is distributed in many different formulae. It is up to the user to check that he or she has obtained all the necessary information. A data model can be a criterion for completeness and consistency against which the published information is checked.

The same published information can be checked against different information models for different purposes.

EXAMPLE 1 The formula in the RDF example could be separated into two formulae, so that:

Neither formula complies with a data model which requires that "the registered name shall be specified for the manufacturer of an equipment item", but the information gathered from the two does.
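The check described in this example can be sketched as follows. This is an illustration only: each formula is held as a Python set of (subject, predicate, object) triples, the split into an item formula and a manufacturer formula is an assumption about how the statements are divided, and the function `complies` is a hypothetical encoding of the quoted rule. It treats a formula that mentions no manufacturer as non-compliant, so that it agrees with the text.

```python
ITEM = "http://www.f.bloggs.co.uk/items/98-12345"
MAKER = "http://www.f.bloggs.co.uk/"

# Assumed split: one formula describes the item, the other the manufacturer.
item_formula = {
    (ITEM, "rdf:type", "vocab:ElectricalEquipmentItem"),
    (ITEM, "vocab:serialNumber", "98-12345"),
    (ITEM, "vocab:manufacturer", MAKER),
}
maker_formula = {
    (MAKER, "vocab:registeredName", "Fred Bloggs and Company Limited"),
}


def complies(formula):
    """True if a manufacturer is stated and every manufacturer has a registered name."""
    makers = {o for (s, p, o) in formula if p == "vocab:manufacturer"}
    named = {s for (s, p, o) in formula if p == "vocab:registeredName"}
    return bool(makers) and makers <= named


print(complies(item_formula))                  # False
print(complies(maker_formula))                 # False
print(complies(item_formula | maker_formula))  # True
```

Merging is a plain set union because RDF statements are independent of the formula in which they were published.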

EXAMPLE 2 A producer of a product may publish information about the product on the Web. The information could include the product structure, a materials breakdown to support REACH compliance, and LCA information about manufacturing, use and disposal activities.

A user of the product can retrieve the information of interest, and ignore the rest. Perhaps the LCA information complies with ISO/TS 14048, and perhaps the product structure information complies with a conformance class of ISO 10303-214. Perhaps not; it is up to the user to check.

3 What is a data model?

NOTE 1 The terms "data model" and "information model" are regarded as synonyms within this document.

A data model defines what types of information are contained within a "lump", and what types of information are excluded. A data model is designed to support the exchange of information between types of activity. A model ensures that:

NOTE 2 The strong link between data models and activities is the reason for the inclusion of an activity model within each ISO 10303 Application Protocol.

EXAMPLE 1 A conformance class of AP 203 ensures that all the information necessary to define a shape is present. A conformance class excludes information which cannot be processed. Hence if a CAD system can process a boundary representation but not CSG, then a conformance class is chosen which excludes CSG.

The EXPRESS shown in Figure 3 is not a data model.

Figure 3: Not a data model

This says what you can say, but not what you shall say, because all the attributes are optional. Hence it is merely a vocabulary. It is the representation in EXPRESS of the vocabulary used in the RDF example. This vocabulary can be represented equivalently in RDF/OWL without any loss of information, as follows:

:EquipmentItem           a                owl:Class .

:ElectricalEquipmentItem a                owl:Class ;
                         rdfs:subClassOf  :EquipmentItem .

:Person                  a                owl:Class .

:Organization            a                owl:Class .

:ElectricalPotential     a                owl:Class .

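# The range is a Person OR an Organization. Two separate rdfs:range
# statements would require every manufacturer to be both at once.
:manufacturer            a                owl:ObjectProperty ;
                         rdfs:domain      :EquipmentItem ;
                         rdfs:range       [ a owl:Class ;
                                            owl:unionOf ( :Person :Organization ) ] .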

:serialNumber            a                owl:DatatypeProperty ;
                         rdfs:domain      :EquipmentItem .

:operatingVoltage        a                owl:FunctionalProperty ;
                         rdfs:domain      :ElectricalEquipmentItem ;
                         rdfs:range       :ElectricalPotential .

:volt                    a                owl:DatatypeProperty ;
                         rdfs:domain      :ElectricalPotential .

:givenName               a                owl:DatatypeProperty ;
                         rdfs:domain      :Person .

:familyName              a                owl:DatatypeProperty ;
                         rdfs:domain      :Person .

:registeredName          a                owl:DatatypeProperty ;
                         rdfs:domain      :Organization .

NOTE 3 In the early days of ISO 10303 development, it was suggested that all the attributes in the resource models should be optional, and that they should only be made mandatory by constraints in Application Protocols. This was not followed, probably because this approach is less valid for geometry and topology where many applications require the same constraints.

The ISO 10303 schemas are clearly data models. The ISO 15926-2 schema looks very different, and does not constrain the data that shall be contained within a conformant "lump". The role played by the different sort of data model in ISO 15926-2 is discussed in section 5.

4 Strengths and weaknesses of data models

Traditional data models can ensure that a "lump" of data is complete and consistent for a particular activity. This is useful in two cases:

an agreement between a sender and receiver of data about what shall be sent
The parties can choose a data model, often one that is an international standard, as the basis of the agreement. Compliance with the agreement can be validated by software tools.
an interface to a software package
A software designer can implement input from, and export to, a "lump" which is in accordance with a data model.

The link between data models and activities makes this possible. It is also the weakness of traditional data models, because there are many activities with different information requirements - so there are many different information models.

NOTE ISO 10303 has created an architecture which builds different data models to support different activities from common components. This has been largely successful, but at the cost of awesome complexity. This complexity has the potential to crush the entire ISO 10303 project.

Some constraints on the completeness and consistency of data are common to all ISO 10303 APs. If these constraints are incompatible with an activity, then using ISO 10303 is a problem.

EXAMPLE 3 ISO 10303-41, which is a part of every ISO 10303 AP, says that:

If the activities performed by an organisation are not consistent with these constraints, then the organisation has the following choices:

  1. change the activities to comply with ISO 10303-41;
  2. create dummy data;
  3. ignore ISO 10303-41 and use some other standard.

Organisations have opted for (2) or (3) according to whether the other parts of ISO 10303 are of sufficient value.

5 Is ISO 15926-2 a data model?

ISO 15926-2 is not a data model in the sense that the ISO 10303 Application Protocols are data models, because ISO 15926-2 does not constrain the data that shall be recorded. Instead, ISO 15926-2 is two things:

A vocabulary complies with ISO 15926-2, if it consists of:

Just by looking at a set of RDF statements, it is not possible to tell whether they comply with ISO 15926-2. They comply if the relationship with ISO 15926-2 has been recorded in a file which specifies the relationship.

EXAMPLE Consider again the RDF example. Most of the vocabulary can be mapped to ISO 15926-2, as follows:

vocab:ElectricalEquipmentItem rdfs:subClassOf    iso15926-2:physical_object .

vocab:serialNumber            rdfs:subPropertyOf iso15926-2:class_of_identification .

vocab:operatingVoltage        rdfs:subPropertyOf iso15926-2:indirect_property .

vocab:volt                    rdfs:subPropertyOf iso15926-2:property_quantification .

The relationship is expressed more neatly if we move up a metalevel to the power sets on the ISO 15926-2 side, as follows:

vocab:ElectricalEquipmentItem a                  iso15926-2:class_of_physical_object .

vocab:serialNumber            a                  iso15926-2:class_of_class_of_identification .

vocab:operatingVoltage        a                  iso15926-2:class_of_indirect_property .

vocab:volt                    a                  iso15926-2:scale .

This file can be published on the Web as quality metadata about the vocabulary.

The property vocab:manufacturer does not fit into ISO 15926-2 except as a class_of_class_of_relationship. Hence it is probably best to regard this as outside the ISO 15926-2 quality validation. A formal use of ISO 15926-2 for this purpose would record a manufacturing activity, for which the organisation has the role of performer, and the equipment item has the role of output.

NOTE 1 In the EU funded CASCADE project, the vocabulary for LCA defined in ISO/TS 14048 was expressed as RDF, and mapped to ISO 15926-2. This may be of little interest to most users of the RDF vocabulary for LCA, but it is not an implementation overhead. For some users, the mapping to ISO 15926-2 gives useful additional information about the precise semantics of the vocabulary.

A business process can insist that only vocabularies with a documented relationship with ISO 15926-2 are used. This mapping may ensure a level of quality within the vocabulary.
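Such a business rule could be checked mechanically. The sketch below is an illustration only: the mapping is copied from the metalevel example in this section, the list of terms is taken from the RDF example in section 2.1, and the function name `unmapped_terms` is hypothetical.

```python
# Mapping copied from the metalevel example in this section.
ISO_15926_2_MAPPING = {
    "vocab:ElectricalEquipmentItem": "iso15926-2:class_of_physical_object",
    "vocab:serialNumber": "iso15926-2:class_of_class_of_identification",
    "vocab:operatingVoltage": "iso15926-2:class_of_indirect_property",
    "vocab:volt": "iso15926-2:scale",
}

# Vocabulary terms used by the RDF example in section 2.1.
TERMS_USED = [
    "vocab:ElectricalEquipmentItem",
    "vocab:serialNumber",
    "vocab:manufacturer",
    "vocab:operatingVoltage",
    "vocab:volt",
    "vocab:registeredName",
]


def unmapped_terms(terms, mapping):
    """Return the terms that have no documented relationship with ISO 15926-2."""
    return sorted(term for term in terms if term not in mapping)


print(unmapped_terms(TERMS_USED, ISO_15926_2_MAPPING))
# ['vocab:manufacturer', 'vocab:registeredName']
```

The two terms reported are the ones left outside the mapping in the example, which is consistent with the statement that most, but not all, of the vocabulary can be mapped.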

6 Implementing REACH and LCA

A conformance class could be defined for an ISO 10303 standard such as ISO 10303-214 or ISO 10303-237, which ensured that all data necessary for REACH or LCA standard compliance was recorded. This would be useful for companies which already implement ISO 10303, but it is not practical as a solution for the Web published information. The key problems are:

In order to implement REACH and LCA over the Web, an RDF vocabulary is required to cover:

Many of the concepts necessary for this vocabulary already exist within ISO TC184/SC4 standards, but they are locked away within data models which impose constraints.

7 Making it happen

The following steps are needed to support the publication of product data on the Web:

Key liaisons for this activity may include:

This activity also fits within the proposal to ISO TC184/SC4 for "Standardisation for Environmental Evaluation of Manufacturing Systems", from MSTC (Manufacturing Science and Technology Center), Japan, presented by Professor Fumihiko Kimura.