Ontologies are playing a major role in federating multiple sources of structured data within enterprises. However, unstructured documents remain mostly untouched, or require manual labor to be included in a consolidated knowledge management process.
At Lymba, we are developing a knowledge extraction tool that automatically identifies instances of concepts/classes and the relations between them in text. The extraction is driven by a semantic model, or ontology. For example, using FIBO terminology, the system recognizes time/duration constraints and monetary values in contracts, determines what each value means (transaction value, penalty, fee, etc.), and links them to the parties in the contract.
The extracted knowledge is represented in the form of semantic triples, which can be persisted in an RDF store to allow integration with other sources, inference, and querying. Another useful add-on is a natural language querying capability, in which a query like “Find clauses with time constraints for payor” is automatically converted into semantic triples, and then into SPARQL.
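To make the last step concrete, here is a minimal sketch of how a set of triple patterns could be serialized into a SPARQL query. It is an illustration under stated assumptions, not Lymba's implementation: the `ex:` namespace, the class and property names, the shape of the parsed query, and the `to_sparql` helper are all hypothetical, and none of them are actual FIBO identifiers.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical namespace for illustration only; not a FIBO or Lymba vocabulary.
PREFIXES: Dict[str, str] = {"ex": "http://example.org/contract#"}

# Triple patterns that a semantic parser might plausibly produce for
# "Find clauses with time constraints for payor" (assumed shape).
# Terms starting with "?" are variables; "a" is SPARQL shorthand for rdf:type.
QUERY_TRIPLES: List[Triple] = [
    ("?clause", "a", "ex:Clause"),
    ("?clause", "ex:hasTimeConstraint", "?constraint"),
    ("?constraint", "ex:appliesTo", "?party"),
    ("?party", "ex:hasRole", "ex:Payor"),
]


def to_sparql(patterns: List[Triple], select: List[str],
              prefixes: Dict[str, str]) -> str:
    """Serialize triple patterns into a SPARQL SELECT query string."""
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in sorted(prefixes.items())]
    lines.append("SELECT DISTINCT " + " ".join(select))
    lines.append("WHERE {")
    for subj, pred, obj in patterns:
        # Each pattern becomes one line of the basic graph pattern.
        lines.append(f"  {subj} {pred} {obj} .")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_sparql(QUERY_TRIPLES, ["?clause"], PREFIXES))
```

Because the query and the extracted knowledge share the same triple form, the conversion reduces to writing each pattern as one line of the `WHERE` clause. A real system would also need to map the words in the query onto terms in the ontology, which this sketch takes as given.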
This talk will give the audience an overview of Lymba’s knowledge extraction pipeline, as well as our knowledge representation framework. Semantic parsing and triple-based representation provide a bridge between semantic technologies and NLP, leveraging inference techniques and existing ontologies. We will show how Lymba’s Semantic Calculus framework allows easy customization of the solution to different domains.