Converting Text into FIBO-Aligned Semantic Triples

Ontologies play a major role in federating multiple sources of structured data within enterprises. Unstructured documents, however, remain largely untouched, or require manual effort to be included in a consolidated knowledge management process.

At Lymba, we are developing a knowledge extraction tool that automatically identifies instances of concepts (classes) and the relations between them in text. The extraction is driven by a semantic model, or ontology. For example, using FIBO terminology, the system recognizes time and duration constraints in contracts, as well as monetary values and their roles (transaction value, penalty, fee, etc.), and links them to the parties in the contract.

The extracted knowledge is represented as semantic triples, which can be persisted in an RDF store to support integration with other sources, inference, and querying. Another useful add-on is a natural language querying capability: a query such as “Find clauses with time constraints for payor” is automatically converted into semantic triples and then into SPARQL.
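To make the query-to-SPARQL step more concrete, here is a minimal sketch in plain Python. It is not Lymba's implementation. It assumes a hypothetical vocabulary under `http://example.org/contract#`; the names `Clause`, `hasConstraint`, `TimeConstraint`, `appliesTo`, `Payor`, and the instance IRIs are illustrative stand-ins, not actual FIBO IRIs. The sketch stores a few hand-written triples as if they had been extracted from a contract, evaluates the triple patterns for “Find clauses with time constraints for payor” with a naive join, and renders the same patterns as a SPARQL `SELECT` query.

```python
# Hypothetical vocabulary for illustration only; these are NOT real FIBO IRIs.
EX = "http://example.org/contract#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = tuple[str, str, str]

# Triples as they might come out of extraction from one contract (made-up data).
TRIPLES: list[Triple] = [
    (EX + "clause_7", RDF_TYPE, EX + "Clause"),
    (EX + "clause_7", EX + "hasConstraint", EX + "c1"),
    (EX + "c1", RDF_TYPE, EX + "TimeConstraint"),
    (EX + "c1", EX + "hasDuration", "30 days"),
    (EX + "c1", EX + "appliesTo", EX + "partyA"),
    (EX + "partyA", RDF_TYPE, EX + "Payor"),
    (EX + "clause_8", RDF_TYPE, EX + "Clause"),
    (EX + "clause_8", EX + "hasConstraint", EX + "c2"),
    (EX + "c2", RDF_TYPE, EX + "MonetaryConstraint"),
    (EX + "c2", EX + "appliesTo", EX + "partyA"),
    (EX + "clause_9", RDF_TYPE, EX + "Clause"),
    (EX + "clause_9", EX + "hasConstraint", EX + "c3"),
    (EX + "c3", RDF_TYPE, EX + "TimeConstraint"),
    (EX + "c3", EX + "appliesTo", EX + "partyB"),
    (EX + "partyB", RDF_TYPE, EX + "Payee"),
]


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match(pattern: Triple, triples: list[Triple], binding: dict[str, str]):
    """Yield every extension of `binding` under which `pattern` matches a stored triple."""
    for triple in triples:
        new = dict(binding)
        ok = True
        for p_term, t_term in zip(pattern, triple):
            if is_var(p_term):
                # A variable must agree with any value it is already bound to.
                if p_term in new and new[p_term] != t_term:
                    ok = False
                    break
                new[p_term] = t_term
            elif p_term != t_term:
                ok = False
                break
        if ok:
            yield new


def query(patterns: list[Triple], triples: list[Triple]) -> list[dict[str, str]]:
    """Naive basic-graph-pattern evaluation: join the patterns one at a time."""
    bindings: list[dict[str, str]] = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, triples, b)]
    return bindings


def to_sparql(patterns: list[Triple], select: list[str]) -> str:
    """Render the same triple patterns as SPARQL text (IRIs in angle brackets)."""
    lines = [
        " ".join(t if is_var(t) else f"<{t}>" for t in p) + " ." for p in patterns
    ]
    return f"SELECT {' '.join(select)} WHERE {{\n  " + "\n  ".join(lines) + "\n}"


# Triple patterns for "Find clauses with time constraints for payor".
PATTERNS: list[Triple] = [
    ("?clause", RDF_TYPE, EX + "Clause"),
    ("?clause", EX + "hasConstraint", "?c"),
    ("?c", RDF_TYPE, EX + "TimeConstraint"),
    ("?c", EX + "appliesTo", "?party"),
    ("?party", RDF_TYPE, EX + "Payor"),
]

results = query(PATTERNS, TRIPLES)
sparql = to_sparql(PATTERNS, ["?clause"])

if __name__ == "__main__":
    print([b["?clause"] for b in results])
    print(sparql)
```

In a real deployment the triples would live in an RDF store and the generated SPARQL would run there; the in-memory matcher only shows how a conjunction of triple patterns selects `clause_7` while rejecting a clause whose constraint is monetary (`clause_8`) and one whose time constraint applies to a payee (`clause_9`).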

This talk gives an overview of Lymba’s knowledge extraction pipeline and of our knowledge representation framework. Semantic parsing and triple-based representation bridge semantic technologies and NLP, making it possible to leverage inference techniques and existing ontologies. We will show how Lymba’s Semantic Calculus framework makes it easy to customize the solution for different domains.

Tatiana Erekhinskaya is a Research Scientist and Product Manager at Lymba Corporation. She received her PhD in Computer Science from the University of Texas at Dallas with a dissertation on probabilistic models for text understanding. Tatiana has been working in Natural Language Processing for more than 10 years. Over her career, she has served as technical lead on a broad range of projects, including misspelling-robust syntactic parsing for Russian, the first syntax-based opinion mining for Russian, and, more recently, semantics-driven projects for English in the medical domain, national security, and enterprise applications. One of her latest projects is knowledge extraction from Chinese texts. Her primary research areas are deep semantic processing and big data, with a special emphasis on the medical domain.