finding common superclass and length of path in class hierarchies

Tag: rdf, sparql, owl, dbpedia Author: limingzhi00001 Date: 2013-10-14

I have two classes, A and B, from DBpedia. How can I calculate the distance (number of edges) from each class to a common superclass C, and how can I find this common superclass?

Other Answer1

You can do this, but a couple of things should be noted first:

  1. Two classes may have lots of superclasses in common, not necessarily just one. This means that there may not be a unique most specialized common superclass.
  2. If some class C is a superclass of A and B, then every superclass of C is also a superclass of A and B.
  3. A class D might be a superclass of C by multiple paths, which can cause some difficulties if you're trying to compute length. E.g.,

    Computer Hardware
      Monitors
        Flatscreen Monitors
          Dell Flatscreen Monitors  *
      Dell Hardware
        Dell Flatscreen Monitors    *
    

    In this hierarchy, Dell Flatscreen Monitors is a subclass of Computer Hardware by a path of length 2 (DFM → DH → CH) and by a path of length 3 (DFM → FM → M → CH). That's fine, but if you're computing a length from DFM to another subclass of CH, which of those should you use?

  4. There might not be any common superclasses in the data. This is also a perfectly legal situation. Now, in OWL, every class is a subclass of owl:Thing, but that doesn't hold for RDF in general, and you probably won't even get that result from DBpedia because there's no OWL reasoner attached.
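To make point 3 concrete, here is a minimal Python sketch (not part of any query; the dictionary and the function name `path_lengths` are made up for illustration) that enumerates every path from Dell Flatscreen Monitors up to Computer Hardware in the toy hierarchy above. It assumes the hierarchy has no cycles.

```python
# Toy hierarchy from the example above: each class maps to its direct superclasses.
SUPERCLASSES = {
    "Dell Flatscreen Monitors": ["Flatscreen Monitors", "Dell Hardware"],
    "Flatscreen Monitors": ["Monitors"],
    "Monitors": ["Computer Hardware"],
    "Dell Hardware": ["Computer Hardware"],
    "Computer Hardware": [],
}


def path_lengths(start, goal, hierarchy):
    """Return the sorted edge counts of every distinct path from start up to goal.

    Assumes the hierarchy is acyclic; a cycle would make this recurse forever.
    """
    if start == goal:
        return [0]
    lengths = []
    for parent in hierarchy.get(start, []):
        # Each path from the parent to the goal becomes one edge longer.
        lengths.extend(n + 1 for n in path_lengths(parent, goal, hierarchy))
    return sorted(lengths)


lengths = path_lengths("Dell Flatscreen Monitors", "Computer Hardware", SUPERCLASSES)
print(lengths)       # [2, 3]
print(min(lengths))  # 2
```

Whether you want the minimum, the maximum, or something else is exactly the decision that point 3 asks you to make.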

Assuming that you can work out the details that you need to address those issues, this isn't too hard. It's easiest, in my opinion, to build up this query step by step. First, using a query like this, you can get the superclasses of a class, and the length of the path to each of the superclasses. The query counts the classes ?mid that sit between ?sub (inclusive) and ?super (exclusive), so the count equals the path length only if there is a unique path from the subclass to the superclass. If there are multiple paths, I think the count will cover the classes on all of them, so it can overstate the length. I'm not sure how you could get around this.

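# The public DBpedia endpoint predefines these prefixes; they are
# declared here so that the query also runs elsewhere.
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dbpedia-owl: <http://dbpedia.org/ontology/>

select ?sub ?super (count(?mid) as ?length) where {
  values ?sub { dbpedia-owl:Person } 
  ?sub rdfs:subClassOf* ?mid .    # ?mid is ?sub or any of its superclasses
  ?mid rdfs:subClassOf+ ?super .  # ?super lies strictly above ?mid
}
group by ?sub ?super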

SPARQL results

sub                                super                               length
http://dbpedia.org/ontology/Person http://dbpedia.org/ontology/Agent   1
http://dbpedia.org/ontology/Person http://www.w3.org/2002/07/owl#Thing 2
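
If you want to see what that count does when paths are not unique, the following Python sketch imitates it on the toy monitor hierarchy from earlier. It assumes the property paths return each class once (set semantics, as in the SPARQL 1.1 recommendation), which a particular endpoint may or may not follow, and it assumes the hierarchy is acyclic. The function names are hypothetical. A breadth-first search is included for comparison, since it gives the shortest length regardless of how many paths exist.

```python
from collections import deque

# Toy hierarchy: each class maps to its direct superclasses.
SUPERCLASSES = {
    "Dell Flatscreen Monitors": ["Flatscreen Monitors", "Dell Hardware"],
    "Flatscreen Monitors": ["Monitors"],
    "Monitors": ["Computer Hardware"],
    "Dell Hardware": ["Computer Hardware"],
    "Computer Hardware": [],
}


def ancestors_or_self(cls, hierarchy):
    """Classes reachable by zero or more subClassOf steps (like rdfs:subClassOf*)."""
    seen = {cls}
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for parent in hierarchy.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def counted_length(sub, sup, hierarchy):
    """Imitate count(?mid): classes at or above sub that lie strictly below sup."""
    mids = [m for m in ancestors_or_self(sub, hierarchy)
            if m != sup and sup in ancestors_or_self(m, hierarchy)]
    return len(mids)


def shortest_length(sub, sup, hierarchy):
    """Fewest subClassOf edges from sub up to sup, or None if sup is unreachable."""
    seen = {sub}
    queue = deque([(sub, 0)])
    while queue:
        current, dist = queue.popleft()
        if current == sup:
            return dist
        for parent in hierarchy.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, dist + 1))
    return None


# Unique path: the count matches the length.
print(counted_length("Dell Hardware", "Computer Hardware", SUPERCLASSES))  # 1
# Two paths (lengths 2 and 3): the count is neither of them.
print(counted_length("Dell Flatscreen Monitors", "Computer Hardware", SUPERCLASSES))  # 4
print(shortest_length("Dell Flatscreen Monitors", "Computer Hardware", SUPERCLASSES))  # 2
```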

Now the trick is to use this approach for both the subclasses, and then join the results based on the superclasses that they have in common, using a query like this:

select * 
{
  values (?a ?b) { (dbpedia-owl:Person dbpedia-owl:SportsTeam) }

  { select ?a ?super (count(?mid) as ?aLength) { 
      ?a rdfs:subClassOf* ?mid .
      ?mid rdfs:subClassOf+ ?super .
    }
    group by ?a ?super
  }
  { select ?b ?super (count(?mid) as ?bLength) { 
      ?b rdfs:subClassOf* ?mid .
      ?mid rdfs:subClassOf+ ?super .
    }
    group by ?b ?super
  }
}

That query still finds the path lengths for all the common superclasses, not just the most specific ones, and it's still not adding the length from ?a to ?super and the length from ?b to ?super to get the full path length. That's just a bit of arithmetic though. You can order these results by the length, and then limit to just one result so that you're getting the shortest one. As I pointed out, there might not be a unique most specific common superclass, but the result with the shortest length will be one of the most specific common superclasses.

select ?a ?b ?super (?aLength + ?bLength as ?length)
{
  values (?a ?b) { (dbpedia-owl:Person dbpedia-owl:SportsTeam) }

  { select ?a ?super (count(?mid) as ?aLength) { 
      ?a rdfs:subClassOf* ?mid .
      ?mid rdfs:subClassOf+ ?super .
    }
    group by ?a ?super
  }
  { select ?b ?super (count(?mid) as ?bLength) { 
      ?b rdfs:subClassOf* ?mid .
      ?mid rdfs:subClassOf+ ?super .
    }
    group by ?b ?super
  }
}
order by ?length
limit 1

SPARQL results

a      b          super length
Person SportsTeam Agent 3
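
If the SPARQL count turns out to be unreliable for your classes, one alternative is to fetch the rdfs:subClassOf edges and do the arithmetic on the client. A rough Python sketch follows. The edge list is an assumption, chosen to be consistent with the lengths shown above (the intermediate class Organisation in particular is my guess at the DBpedia ontology), and the function names are hypothetical. Like the query, it only considers proper superclasses, but it uses shortest distances rather than the count.

```python
from collections import deque

# Assumed fragment of the hierarchy: each class maps to its direct superclasses.
SUPERCLASSES = {
    "Person": ["Agent"],
    "SportsTeam": ["Organisation"],
    "Organisation": ["Agent"],
    "Agent": ["Thing"],
    "Thing": [],
}


def distances_up(cls, hierarchy):
    """Map each proper superclass of cls to its shortest distance in edges."""
    dist = {cls: 0}
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for parent in hierarchy.get(current, []):
            if parent not in dist:
                dist[parent] = dist[current] + 1
                queue.append(parent)
    del dist[cls]  # keep proper superclasses only, as the query does
    return dist


def common_superclasses(a, b, hierarchy):
    """Return (superclass, distance from a, distance from b), closest first."""
    da, db = distances_up(a, hierarchy), distances_up(b, hierarchy)
    shared = [(s, da[s], db[s]) for s in da if s in db]
    return sorted(shared, key=lambda row: (row[1] + row[2], row[0]))


def most_specific(a, b, hierarchy):
    """Common superclasses with no other common superclass beneath them."""
    shared = {s for s, _, _ in common_superclasses(a, b, hierarchy)}
    return sorted(
        s for s in shared
        if not any(s in distances_up(o, hierarchy) for o in shared if o != s)
    )


print(common_superclasses("Person", "SportsTeam", SUPERCLASSES))
# [('Agent', 1, 2), ('Thing', 2, 3)]
print(most_specific("Person", "SportsTeam", SUPERCLASSES))
# ['Agent']
```

Because `most_specific` returns a list, it also covers the case where there is more than one most specific common superclass, and an empty list means the two classes share no superclass at all.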

comments:

I want the FIRST common superclass!
@user2837896 I'm putting the query together now, but note that there may not be a single most-specific common superclass. There can be multiple most-specific superclasses.
I understand. But how can I calculate this if the classes are in different namespaces? For example, dbpedia.org/ontology/NameOfTheClass_A and dbpedia.org/class/yago/NameOfTheClass_B?
The namespace doesn't have anything to do with it. The predefined namespace declarations allow you to write yago:PeopleFromSantiago for one and dbpedia-owl:Writer for the other.
@user2837896 I can certainly see why you'd expect there to be individuals that are members of both, but why would there need to be any superclasses? The DBpedia classes are from the DBpedia ontology. The Yago classes and hierarchy are defined to aid in interoperability with the Yago vocabulary. There's no reason to assume that someone added rdfs:subClassOf links between classes from the two hierarchies. Indeed, that's a very difficult thing to do, because the class definitions may be very subtle.