Contribution to DBpedia
I2G's contribution to DBpedia focuses on two topics: quality and references. The two are related, since improving quality also requires assessing the sources of information.
The first topic, quality, concerns building and training machine learning models that assess the quality of Wikipedia pages. We rely on grades assigned by the community of Wikipedia users and editors. In essence, we look for features that are good predictors of an article being a featured article (FA) or a good article (GA). Grading systems differ between languages, so a separate model is constructed for each language. Currently we support 55 languages. We also consider popularity as a factor related to quality: popular articles, being visited by many users, tend to be corrected more often than less popular ones.
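The per-language setup described above can be illustrated with a minimal sketch. It is not the model we actually use: a nearest-centroid classifier stands in for the real learner, and the feature tuple (text length, reference count, image count), the grade names for languages other than English, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Assumption: which community grades count as "high quality" in each language.
# FA/GA come from the text; the German entries are illustrative only.
HIGH_QUALITY_GRADES = {
    "en": {"FA", "GA"},
    "de": {"Exzellent", "Lesenswert"},
}

def train_per_language(samples):
    """Build one model per language from (lang, features, grade) samples.

    Each model maps a boolean label (True = high quality) to the centroid
    of the feature vectors carrying that label.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for lang, features, grade in samples:
        label = grade in HIGH_QUALITY_GRADES.get(lang, set())
        grouped[lang][label].append(features)
    return {
        lang: {
            label: tuple(mean(column) for column in zip(*rows))
            for label, rows in by_label.items()
        }
        for lang, by_label in grouped.items()
    }

def predict_high_quality(models, lang, features):
    """Return the label of the closest centroid in that language's model."""
    centroids = models[lang]

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))
```

A realistic pipeline would scale the features before comparing distances and would use many more predictors; the point here is only that each language gets its own model trained on its own grading scheme.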
The second topic, references, involves extracting references from Wikipedia articles and assessing their reliability by consulting external databases such as Crossref or Altmetric. References are used not only in the text of an article but also in infoboxes, so they can be useful for checking the reliability and timeliness of the data provided. They are also a building block of Wikidata. Additional cross-language statistics are provided where well-recognized identifiers are used, e.g. DOI, PubMed, arXiv, ISBN.
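As a rough illustration of the identifier extraction step, the sketch below pulls DOI, PubMed, arXiv and ISBN values out of `<ref>` tags in raw wikitext, including references placed inside an infobox. It assumes the identifiers appear as `doi`, `pmid`, `arxiv` and `isbn` parameters of citation templates, which holds for common English-language citation templates but not necessarily for every language edition. The function names and the sample data are hypothetical, and the reliability lookup in external databases is deliberately left out.

```python
import re
from collections import Counter

# Matches <ref ...>body</ref>; self-closing <ref name="x" /> reuses are skipped.
REF_RE = re.compile(r"<ref\b[^>/]*>(.*?)</ref>", re.DOTALL | re.IGNORECASE)

# Matches template parameters such as "|doi = 10.1000/xyz123".
PARAM_RE = re.compile(
    r"\|\s*(doi|pmid|arxiv|isbn)\s*=\s*([^|}\s][^|}]*)", re.IGNORECASE
)

def extract_identifiers(wikitext):
    """Return (identifier type, value) pairs found inside <ref> tags."""
    found = []
    for ref_body in REF_RE.findall(wikitext):
        for name, value in PARAM_RE.findall(ref_body):
            found.append((name.lower(), value.strip()))
    return found

def identifier_statistics(wikitext):
    """Count how many identifiers of each type an article cites."""
    return Counter(name for name, _ in extract_identifiers(wikitext))

# Made-up sample: one reference in running text, one inside an infobox.
SAMPLE = (
    'Some claim.<ref name="a">{{cite journal |title=A '
    "|doi=10.1000/xyz123 |pmid=12345}}</ref> "
    "{{Infobox settlement | population = 100"
    "<ref>{{cite book |isbn=978-3-16-148410-0}}</ref>}} "
    'Another claim.<ref name="a" />'
)
```

Calling `extract_identifiers(SAMPLE)` yields the DOI, the PubMed ID and the ISBN as typed pairs. Because such identifiers do not depend on the language of the article, counts produced this way can be compared across language editions.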