Wiktionary 2 RDF Online Editor GUI 

Click here for an explanation of how the process works, or write an email.
Contact: Sebastian Hellmann – hellmann@informatik.uni-leipzig.de
Type: Engineering, Bachelor thesis
Involved communities: Wiktionary2RDF and therefore DBpedia and Wiktionary

The recently released Wiktionary extension of the DBpedia Extraction Framework makes it possible to configure scrapers that produce RDF from Wiktionary (currently for 4–6 languages). Configuration is currently done by editing an XML file that serves as a wrapper for a language-specific Wiktionary edition. The community would greatly benefit from an online tool that helps to create and update such XML wrapper configurations.
The editor should be able to load an existing config file and then show the RDF output for a piece of wiki syntax. The measure of success is how easily a wrapper can be created and maintained.
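A minimal sketch of the idea, assuming a strongly simplified wrapper schema (the element names and the regex-based rule format below are illustrative, not the actual DBpedia configuration schema): the editor loads rules from XML and previews the triples they produce for a piece of wiki syntax.

  import scala.xml.XML

  // Illustrative only: the real Wiktionary wrapper schema differs.
  object ConfigPreview {
    case class Rule(pattern: scala.util.matching.Regex, property: String)

    // Read regex -> property rules from a simplified XML wrapper config.
    def loadRules(configXml: String): Seq[Rule] =
      (XML.loadString(configXml) \\ "rule").map { r =>
        Rule((r \ "@match").text.r, (r \ "@property").text)
      }

    // Apply every rule to a piece of wiki syntax and print N-Triples.
    def preview(subject: String, wikiText: String, rules: Seq[Rule]): Unit =
      for (rule <- rules; m <- rule.pattern.findAllMatchIn(wikiText))
        println(s"""<$subject> <${rule.property}> "${m.group(1)}" .""")

    def main(args: Array[String]): Unit = {
      val config =
        """<wrapper language="en">
          |  <rule match="\{\{IPA\|([^}]+)\}\}" property="http://example.org/ipa"/>
          |</wrapper>""".stripMargin
      preview("http://example.org/house", "{{IPA|haʊs}}", loadRules(config))
    }
  }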
Links:

Wiktionary 2 RDF Live Extraction

Click here for an explanation of how the process works, or write an email.
Contact: Sebastian Hellmann – hellmann@informatik.uni-leipzig.de
Type: Engineering, Bachelor thesis (see also the Master thesis extension: Build an (NLP) Application Suite on Top of the Wiktionary 2 RDF Live Extraction)
Involved communities: Wiktionary2RDF and therefore DBpedia and Wiktionary

The recently released Wiktionary extension of the DBpedia Extraction Framework makes it possible to configure scrapers that produce RDF from Wiktionary (at the moment for 4–6 languages). However, only static dumps are produced and made available for download here: http://downloads.dbpedia.org/wiktionary
The DBpedia Live Extraction is also built on the framework; it receives regular updates from Wikipedia and keeps an RDF triple store in sync. On the basis of such a synced triple store, we can find potential improvements and provide *simple* suggestions to Wiktionary editors.
The goals are:

  1. to have a running Live Extraction for Wiktionary 2 RDF
  2. provide simple suggestions for editors of Wiktionary with a GUI

The DBpedia Extraction Framework is written in Scala; the GUI can be in any language (JavaScript seems to be a good choice, however).
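To make the sync requirement concrete, here is a self-contained sketch of the core idea, with an in-memory map standing in for the triple store (all names are placeholders; the real system talks to the wiki API and a SPARQL store): when a page changes, its old triples are replaced, never appended.

  import scala.collection.mutable

  // Minimal sketch of live synchronisation; illustrative names only.
  object LiveSyncSketch {
    val store = mutable.Map.empty[String, Seq[String]] // page -> its triples

    // Stand-in for running the wrapper-based extraction on a page.
    def extractRdf(page: String, wikiText: String): Seq[String] =
      Seq(s"""<http://example.org/$page> <http://example.org/sourceLength> "${wikiText.length}" .""")

    // On every change event, re-extract and *replace* the page's triples.
    def onPageChanged(page: String, newWikiText: String): Unit =
      store(page) = extractRdf(page, newWikiText)

    def main(args: Array[String]): Unit = {
      onPageChanged("house", "{{IPA|haʊs}} A building for living in.")
      onPageChanged("house", "{{IPA|haʊs}} A building for human habitation.")
      store.values.flatten.foreach(println) // only the latest extraction remains
    }
  }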


Links:


Build an (NLP) Application Suite on Top of the Wiktionary 2 RDF Live Extraction

Click here for an explanation of how the process works, or write an email.
Contact: Sebastian Hellmann – hellmann@informatik.uni-leipzig.de
Type: Engineering, research, Master thesis (see also the prerequisite topic Wiktionary2RDF Live Extraction, which is included here)
Involved communities: Wiktionary2RDF and therefore DBpedia and Wiktionary

See description above (Wiktionary 2 RDF Live Extraction). This project has three potential flavours:

  1. Collect existing tools and methods that might be improved using Wiktionary data, then extend and improve them. This includes tools such as QuicDic and Apertium.
  2. Create useful tools for Wiktionary users and editors, either by providing suggestions to editors through a GUI (e.g. based on Parsoid), by checking consistency, or automatically via bots.
  3. Research: improve existing NLP systems based on the as yet unproven assumption that Wiktionary data is much better than traditional lexica.

The goals are:

  1. to have a running Live Extraction for Wiktionary 2 RDF
  2. build an application suite on top of the Live Extraction for Wiktionary 2 RDF that provides measurable benefits (for one of the three flavours)

Possible tools to include in the suite:


Links:


Web Annotation: Annotate It + NIF 

The tool Annotate It is written in CoffeeScript and uses its own JSON format for data serialization. The goal of this project is to make it interoperable with NIF.
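As a starting point, a sketch of the mapping direction, assuming a simplified annotation record (the field names are made up; Annotate It's actual JSON schema needs to be checked): the quoted text and its character offsets map naturally onto NIF's offset-based URIs and the nif:beginIndex / nif:endIndex / nif:anchorOf properties.

  // Illustrative mapping of an Annotate-It-style record to NIF triples.
  object AnnotationToNif {
    case class Annotation(quote: String, begin: Int, end: Int, note: String)

    def toNif(docUri: String, a: Annotation): String = {
      // NIF identifies substrings by character offsets: ...#char=begin,end
      val s = s"<$docUri#char=${a.begin},${a.end}>"
      s"""$s <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#beginIndex> "${a.begin}" .
         |$s <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#endIndex> "${a.end}" .
         |$s <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#anchorOf> "${a.quote}" .
         |$s <http://www.w3.org/2000/01/rdf-schema#comment> "${a.note}" .""".stripMargin
    }

    def main(args: Array[String]): Unit =
      println(toNif("http://example.org/doc1",
                    Annotation("green shirt", 11, 22, "color + garment")))
  }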
Links:

Web Annotation: My Annotation + NIF 

Improve the usability of MyAnnotation and make it interoperable with NIF.
Links:

NIF + Clarin's TCF 

Click here for an explanation of how the process works, or write an email.
Contact: Sebastian Hellmann – hellmann@informatik.uni-leipzig.de
Type: Engineering, Bachelor thesis
Involved communities: NLP2RDF, Clarin-D

Create a converter from Clarin's TCF format to NIF (round trip). The resulting software will be hosted as a web service and will connect the Clarin and NLP2RDF worlds.
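A sketch of one direction of the round trip, assuming a strongly simplified TCF-like token layer (the real TCF schema has more layers and namespaces): tokens are re-anchored in the source text and expressed as NIF offset URIs.

  import scala.xml.XML

  // Illustrative converter; element names are simplified, not real TCF.
  object TcfToNif {
    def convert(docUri: String, tcf: String): Seq[String] = {
      val root = XML.loadString(tcf)
      val text = (root \\ "text").text
      var from = 0
      for (tok <- (root \\ "token").map(_.text)) yield {
        // Naive re-anchoring: assumes tokens occur in the text in order.
        val begin = text.indexOf(tok, from)
        from = begin + tok.length
        s"""<$docUri#char=$begin,$from> <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#anchorOf> "$tok" ."""
      }
    }

    def main(args: Array[String]): Unit = {
      val tcf = "<TextCorpus><text>He wears a green shirt.</text>" +
                "<tokens><token>He</token><token>wears</token></tokens></TextCorpus>"
      convert("http://example.org/doc", tcf).foreach(println)
    }
  }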
Links:

Explication of Statistical Relations

Click here for an explanation of how the process works, or write an email.
Contact: Sebastian Hellmann – hellmann@informatik.uni-leipzig.de
Type: Research / Explorative, Master thesis
There is a good chance of a scientific publication if the results are good.

Recent advances in semantic technologies (RDF, SPARQL, Sparqlify, RDB2RDF, Wiktionary 2 RDF, NIF, LLOD) allow us to efficiently combine statistical data sets such as the Wortschatz with structured data such as Wiktionary 2 RDF and with context information found in text using NIF.
The goal of this project is to investigate whether statistical measures such as significant left/right neighbour/sentence co-occurrences or iterated co-occurrences can be used to assert the “correct” relation between terms in text and to explicate it as an RDF property. The available structured data (e.g. from Wiktionary 2 RDF) can serve as a source of candidate relations.
Examples:

  • He wears a green shirt.
  • He wears a green t-shirt.

“shirt” and “t-shirt” occur frequently with color adjectives and the verb “wear”.
Research Questions:

  • Can we entail synonymy of “shirt” and “t-shirt”? In general no, but in certain contexts, yes. How can we measure this and produce RDF? (One standard significance measure is sketched below.)
  • What can we do with the combination of Wortschatz, Wiktionary, NIF in RDF?
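A minimal sketch of one standard significance measure, the log-likelihood ratio over a 2×2 contingency table, which belongs to the family of measures used for such co-occurrence analyses (whether Wortschatz uses exactly this formula is not asserted here):

  // Log-likelihood ratio (G-test) for a 2x2 co-occurrence table.
  object CooccurrenceLLR {
    // k11: sentences with both words, k12/k21: with only one, k22: with neither
    def llr(k11: Long, k12: Long, k21: Long, k22: Long): Double = {
      // Sum of k * ln(k / total), skipping zero cells.
      def h(ks: Long*): Double = {
        val n = ks.sum.toDouble
        ks.filter(_ > 0).map(k => k * math.log(k / n)).sum
      }
      2.0 * (h(k11, k12, k21, k22)
             - h(k11 + k12, k21 + k22)   // row sums
             - h(k11 + k21, k12 + k22))  // column sums
    }

    def main(args: Array[String]): Unit =
      // e.g. "shirt" and "wear" co-occur in 110 of ~100,000 sentences
      println(llr(110, 2442, 111, 97331)) // large value => significant
  }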

Links:

Technology:


Use-Case-driven Development and Improvement of RDB2RDF Mapping (Sparqlify)

Click here for an explanation of how the process works, or write an email.
Contact: Sebastian Hellmann – hellmann@informatik.uni-leipzig.de, Claus Stadler – cstadler@informatik.uni-leipzig.de
Type: Research / Engineering, Master thesis
Involved communities: OWLG, LLOD cloud

The members of the Working Group for Open Data in Linguistics (OWLG) have expressed great interest in the conversion of their databases to RDF. In this context, we see a good chance that the Sparqlify tool, which is developed by our research group, can greatly help to speed up this process. We expect you to help convert and interlink a large number of linguistic data sets and produce high-quality Linked Data bubbles for the LLOD cloud. This process will, of course, reveal many missing features and gaps for improvement in Sparqlify. The goals of the project are therefore:

  1. Create mappings from relational databases to RDF (data is supplied by the community); a conceptual sketch follows after this list
  2. Systematically collect gaps and missing features
  3. Based on 2.: fix and improve Sparqlify with the aim of creating production-ready software with high impact and great performance (possibly involving benchmarking).
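For orientation, a conceptual sketch of what such a mapping expresses, deliberately not written in Sparqlify's own mapping syntax (which this sketch does not reproduce): each row of a relational table is turned into triples via URI templates.

  // Illustrative RDB2RDF idea: table rows -> triples via URI templates.
  object Rdb2RdfSketch {
    case class LexemeRow(id: Int, form: String, lang: String) // one DB row

    def rowToTriples(r: LexemeRow): Seq[String] = {
      val s = s"<http://example.org/lexeme/${r.id}>" // URI template over the key
      Seq(
        s"""$s <http://www.w3.org/2000/01/rdf-schema#label> "${r.form}"@${r.lang} .""",
        s"""$s <http://purl.org/dc/terms/language> "${r.lang}" ."""
      )
    }

    def main(args: Array[String]): Unit = {
      // In-memory rows stand in for a JDBC result set.
      val table = Seq(LexemeRow(1, "house", "en"), LexemeRow(2, "Haus", "de"))
      table.flatMap(rowToTriples).foreach(println)
    }
  }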

Links:

Badge Server for Gamification of Reward Systems in Online Communities

Click here for an explanation of how the process works, or write an email.
Contact: Sebastian Hellmann – hellmann@informatik.uni-leipzig.de
Type: Engineering, Bachelor thesis, Internship
Involved communities: DBpedia and many more.

The AKSW research group is involved in many online communities. Although the members of these communities are highly active and contribute work to the projects, there are no reward mechanisms in place to recognise this effort. This practical problem can be solved by a badge server, which implements three operations (see the sketch after this list):

  1. an admin/moderator can add a badge (including a description, what has to be achieved, and an image)
  2. a community member can apply for a badge, presenting proof that they have successfully completed the task
  3. the admin/moderator approves the request and awards the badge to the user.
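A minimal sketch of these three operations with in-memory state (all names are made up for illustration; this is not an existing badge-server API):

  import scala.collection.mutable

  object BadgeServerSketch {
    case class Badge(name: String, description: String, imageUrl: String)
    case class Application(user: String, badge: String, proof: String)

    val badges  = mutable.Map.empty[String, Badge]
    val pending = mutable.ListBuffer.empty[Application]
    val awarded = mutable.ListBuffer.empty[(String, String)] // (user, badge)

    // 1. an admin/moderator adds a badge
    def addBadge(b: Badge): Unit = badges(b.name) = b
    // 2. a member applies, presenting proof
    def applyForBadge(a: Application): Unit =
      if (badges.contains(a.badge)) pending += a
    // 3. the admin/moderator approves and awards
    def approve(a: Application): Unit = {
      pending -= a
      awarded += ((a.user, a.badge))
    }

    def main(args: Array[String]): Unit = {
      addBadge(Badge("100-commits", "Over 100 commits to the DBpedia framework", "img/100.png"))
      val app = Application("alice", "100-commits", "link-to-commit-log")
      applyForBadge(app)
      approve(app)
      awarded.foreach(println)
    }
  }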

Example badges: “Over 100 commits to the DBpedia framework”, “Created a language config for http://dbpedia.org/Wiktionary”.
For the system to be effective, usability needs to be high, and awarded badges need to be disseminated and displayed correctly, e.g. with RDF, RSS, widgets, etc.
Before implementing, the student has to do research to make sure whether a similar open-source implementation already exists.


Overview Links:


Bridging NIF, ITS and XLIFF

Click here for an explanation of how the process works, or write an email.
Contact: Sebastian Hellmann – hellmann@informatik.uni-leipzig.de
Type: Engineering, Bachelor thesis (or Master thesis, if an integration into the Okapi Framework is done)
Involved communities: NLP2RDF and the MultilingualWeb-LT Working Group (http://www.w3.org/International/multilingualweb/lt/)

ITS and XLIFF are two formats which are widely used in the internationalisation and localisation industry. The main goal of this thesis is to provide software and algorithms that can transform both formats losslessly into the newly developed NLP Interchange Format (NIF) and back (a round-trip bridge). One of the challenges is to provide an algorithm that transforms offset annotations into XML/HTML markup; a sketch follows below.
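A sketch of the offset-to-markup step under the simplifying assumption that annotations do not overlap (overlapping annotations, which can break XML well-formedness, are the actual hard part): inserting tags from the end of the string keeps earlier offsets valid.

  // Turn stand-off character offsets into inline XML tags.
  object OffsetsToMarkup {
    case class Ann(begin: Int, end: Int, tag: String)

    def toMarkup(text: String, anns: Seq[Ann]): String =
      // Process annotations right-to-left so insertions don't shift
      // the offsets of annotations that are still pending.
      anns.sortBy(a => -a.begin).foldLeft(text) { (t, a) =>
        t.substring(0, a.begin) + s"<${a.tag}>" +
          t.substring(a.begin, a.end) + s"</${a.tag}>" + t.substring(a.end)
      }

    def main(args: Array[String]): Unit =
      println(toMarkup("He wears a green shirt.",
                       Seq(Ann(11, 16, "adj"), Ann(17, 22, "noun"))))
  }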
Overview Links:

Technical details:

Make the DBpedia resource page more interactive/attractive

Note: this project has to be in English only
Click here for an explanation of how the process works, or write an email.
Contact: Dimitris Kontokostas – kontokostas@informatik.uni-leipzig.de, Sebastian Hellmann – hellmann@informatik.uni-leipzig.de
Type: Engineering, Voluntary student project / internship / Bachelor thesis
Involved communities: DBpedia

Change the DBpedia resource HTML display page (e.g. http://dbpedia.org/page/Steve_Martin) and make it more interactive by integrating information from other LOD datasets. This can be achieved by working on both the front-end (HTML/CSS/JavaScript) and the server side. For the server side we use OpenLink Virtuoso's VSP technology (similar to PHP), but we are open to other suggestions as well. If the implementation is good, this work will be deployed to DBpedia.


Ideas for what we imagine accomplishing may come from Freebase.
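A sketch of the server-side half: pulling extra facts about a resource from a public SPARQL endpoint over HTTP and handing them to the page renderer (shown against DBpedia's own endpoint for simplicity; the format parameter is Virtuoso-specific, and error handling is omitted).

  import java.net.URLEncoder
  import scala.io.Source

  object EnrichPageSketch {
    // Fetch SPARQL results as JSON from an HTTP endpoint.
    def sparqlJson(endpoint: String, query: String): String = {
      val url = endpoint + "?format=application%2Fsparql-results%2Bjson&query=" +
        URLEncoder.encode(query, "UTF-8")
      Source.fromURL(url).mkString
    }

    def main(args: Array[String]): Unit = {
      val q = """SELECT ?abstract WHERE {
                |  <http://dbpedia.org/resource/Steve_Martin>
                |    <http://dbpedia.org/ontology/abstract> ?abstract .
                |  FILTER(lang(?abstract) = "en") } LIMIT 1""".stripMargin
      println(sparqlJson("http://dbpedia.org/sparql", q))
    }
  }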

Create a new geodata LOD dataset

Note: this project has to be in English only
Click here for an explanation of how the process works, or write an email.
Contact: Dimitris Kontokostas – kontokostas@informatik.uni-leipzig.de, Sebastian Hellmann – hellmann@informatik.uni-leipzig.de
Type: Engineering, Bachelor thesis / Master thesis
Involved communities: DBpedia, linkedgeodata

Bachelor thesis: The purpose of this thesis is to create a new LOD dataset from Yahoo WOEID (Where On Earth IDentifier). WOEID is supported and updated by Yahoo, used by many geodata applications (e.g. Flickr), and is considered an established place categorization. After converting the dataset to triples (a conversion sketch follows below), the next step will be to link it with other LOD datasets, such as LinkedGeoData, DBpedia, and GeoNames.
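To illustrate the conversion step, a sketch of what a converted WOEID record could look like, using the W3C WGS84 vocabulary and owl:sameAs for identity links (the WOEID URI pattern and the sample data are made up for illustration):

  object WoeidToLodSketch {
    case class Place(woeid: Long, name: String, lat: Double, long: Double)

    def toTriples(p: Place): Seq[String] = {
      val s = s"<http://example.org/woeid/${p.woeid}>" // hypothetical URI pattern
      Seq(
        s"""$s <http://www.w3.org/2000/01/rdf-schema#label> "${p.name}" .""",
        s"""$s <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "${p.lat}" .""",
        s"""$s <http://www.w3.org/2003/01/geo/wgs84_pos#long> "${p.long}" .""",
        // Interlinking step: identity link to the matching DBpedia resource.
        s"""$s <http://www.w3.org/2002/07/owl#sameAs> <http://dbpedia.org/resource/${p.name}> ."""
      )
    }

    def main(args: Array[String]): Unit =
      toTriples(Place(12345L, "Leipzig", 51.34, 12.37)).foreach(println)
  }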
Master thesis: It is not entirely clear whether this qualifies for extension to a Master thesis, but one idea would be to use this and other LOD datasets to create a LOD tourist guide.

Semi-automatic Wikipedia article correction

Note: this project has to be in English only
Click here for an explanation of how the process works, or write an email.
Contact: Dimitris Kontokostas – kontokostas@informatik.uni-leipzig.de, Sebastian Hellmann – hellmann@informatik.uni-leipzig.de
Type: Engineering, Bachelor thesis
Involved communities: DBpedia and Wikipedia

We have developed a DBpedia extractor that extracts metadata from templates; thus we have information about correct infobox usage in Wikipedia articles. Using this metadata we can check whether Wikipedia authors use proper infobox declarations inside articles, identify wrong usage, and suggest corrections. The benefits of resolving these errors are: 1) corrected Wikipedia articles and 2) better and more DBpedia triples.


We have two specific example cases here:

  • An editor misspells an infobox name (e.g. “Infobox Artist” typed as “Infobox Atrist”). This would result in hiding the infobox from the page display, as MediaWiki would not know how to render it.
  • An editor misspells an infobox property (e.g. “birth place” instead of “birthplace”). This would also result in hiding the property from the infobox display.

In both cases, DBpedia gets wrong results or none at all.


There is already a simple preliminary shell script implemented for this, but we are looking for a proper program implementation (preferably in Scala / Java, in order to include it in the DBpedia Framework) that will also use LIMES for suggesting corrections; a sketch of the suggestion step follows below.
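A sketch of the suggestion step, using plain Levenshtein edit distance for illustration (LIMES provides such string metrics at scale; its API is not reproduced here): the misspelled name is matched against the known names from the template metadata.

  object InfoboxSuggestSketch {
    // Classic dynamic-programming Levenshtein distance.
    def levenshtein(a: String, b: String): Int = {
      val d = Array.tabulate(a.length + 1, b.length + 1) { (i, j) =>
        if (i == 0) j else if (j == 0) i else 0
      }
      for (i <- 1 to a.length; j <- 1 to b.length)
        d(i)(j) = math.min(math.min(d(i - 1)(j) + 1, d(i)(j - 1) + 1),
                           d(i - 1)(j - 1) + (if (a(i - 1) == b(j - 1)) 0 else 1))
      d(a.length)(b.length)
    }

    // Suggest the known template/property name closest to the misspelling.
    def suggest(wrong: String, known: Seq[String]): String =
      known.minBy(levenshtein(wrong, _))

    def main(args: Array[String]): Unit = {
      val known = Seq("birth_place", "birth_date", "death_place")
      println(suggest("birthplace", known)) // -> birth_place
    }
  }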


Links:


 