My colleague in India, Biswanath Dutta, and I recently had another paper about our work on the CODO ontology accepted for publication. For this version we incorporated a much larger amount of data from spreadsheets published by the Indian government on the spread of the pandemic. The spreadsheets did not follow any canonical format, and the same information could be represented in many different ways. As a result, we spent a significant amount of effort writing transformations that match patterns in the data and turn them into objects and property values. The paper describes our process for implementing these transformations. One thing I only realized after writing the paper is that we essentially redefined the standard ETL (Extract, Transform, Load) model as ELT (Extract, Load, Transform): we loaded the raw data first and transformed it afterward. I recently found an interesting article indicating that we aren't unique in using this approach.
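To make the ELT idea concrete, here is a minimal sketch in Python. It is not the code from the paper. The row contents, the `Patient` class, the property names, and the two regular expressions are all hypothetical, chosen only to show raw values being loaded unchanged and then matched against patterns to produce objects with property values.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical raw rows: the same kind of fact written in different ways.
RAW_ROWS = [
    {"id": "P1", "notes": "Travelled from Dubai"},
    {"id": "P2", "notes": "travel history to Italy"},
    {"id": "P3", "notes": "Contact of P1"},
]


@dataclass
class Patient:
    """Hypothetical target object; the property names are illustrative only."""
    patient_id: str
    travel_place: Optional[str] = None
    contact_of: Optional[str] = None


def load(rows):
    """Load step: keep every raw row unchanged, keyed by its id."""
    return {row["id"]: dict(row) for row in rows}


# Transform step: patterns applied to data that is already loaded.
TRAVEL = re.compile(
    r"travel(?:led)?\s+(?:history\s+)?(?:from|to)\s+(\w+)", re.IGNORECASE
)
CONTACT = re.compile(r"contact\s+of\s+(P\d+)", re.IGNORECASE)


def transform(store):
    """Turn loaded raw strings into objects with property values."""
    patients = []
    for patient_id, row in store.items():
        patient = Patient(patient_id)
        travel = TRAVEL.search(row["notes"])
        if travel:
            patient.travel_place = travel.group(1)
        contact = CONTACT.search(row["notes"])
        if contact:
            patient.contact_of = contact.group(1)
        patients.append(patient)
    return patients


if __name__ == "__main__":
    store = load(RAW_ROWS)          # Extract + Load
    for patient in transform(store):  # Transform, after loading
        print(patient)
```

Because the raw strings stay in the store, a new pattern can be added later and the transform rerun without going back to the source spreadsheets, which suits a design that evolves as new data arrives.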
We followed an Agile approach in our development process. This is noteworthy because, in my review of the literature, many ontology engineering methodologies embrace a waterfall model of development. Our experience in this project was consistent with virtually all my experience developing software for both research and business applications: an Agile approach provides better productivity, quality, and risk reduction than the waterfall model. In this case we found that we had to evolve our design. We began with a simple design in order to deliver value as soon as possible. As we received more data and added capabilities, we refactored the design accordingly. The paper is available at the following link: An Agile Approach to Knowledge Graph Development.
I will present the work at the KGSWC 2021 Conference on November 24, 2021. My presentation can be found here: KGSWC 2021 Presentation.