Student Opportunities

Disclaimer: All tasks (e.g. final theses) can also be completed entirely in German (exceptions are publications at conferences, code and documentation, mailing lists, and other communication with partners and involved communities). The task descriptions are all in English, since they address a wider audience and the text can be reused more easily this way.

Contents

Motivation

Dear student,
personally, I am convinced that students should only receive tasks and topics that make sense and provide real value to

  1. the supervisor;
  2. an online open-source or research community;
  3. the student him-/herself.

The reason is simple: your work will have three stakeholders, who will happily invest counselling time in it:

  1. the supervisor (normally me) will have an interest in providing you with good counselling, as your results will be better;
  2. the online community will provide feedback (e.g. on mailing lists) if it is interested in your work (if not, you should rethink your approach and ask what you should do differently);
  3. you will be motivated as you see that (a) what you do is valuable; (b) you are getting proficient in state-of-the-art technologies (and possibly unix shell scripting); (c) if you are successful, you will probably have earned some “badges” for your list of achievements, which can be any of:
  • a public GitHub account showing your code (see this page and search for github);
  • participation in well-known, high-impact, international open-source projects such as DBpedia, NLP2RDF or DL-Learner;
  • scientific publications;
  • other publications such as blog entries, tutorials, online documentation;
  • almost everything we do is open-source and you will have good and verifiable online references and visibility;

Overall, doing your internship or Bachelor/Master thesis with me should give you the opportunity to meet my requirements for new PhD students (see below).

Types

The opportunities I offer generally include:

  • internships (Praktikum and Seminar) for the modules Semantic Web, Software aus Komponenten, Softwaretechnikpraktikum
  • Bachelor / Master Theses
  • PhD theses (note that I can only help you acquire funding, not supervise the thesis, as I do not have tenure; see below)
  • other

The item other is especially important. If you have any ideas, please contact me. I will also supervise projects of students outside the University of Leipzig, as well as projects outside academia. So if you are interested in implementing something without writing a thesis, please contact me.
Note that I will not supervise projects that are not within my area of interest.
Regarding topic and handling, I distinguish between these three types of projects:

  • Engineering: The focus is on a proper process, i.e. good requirements engineering, good design, the right choice of technology, and working software. This type is well suited for Bachelor theses.
  • Research: At the core of this is a relevant scientific problem and probably an experimental evaluation you will need to conduct. So yes, there might be an implementation involved, but the code and engineering matter less here. The most important things are correctness, a well-designed experiment and the results. The main outputs are basically tables (either statistical results or comparisons with related work), graphs, diagrams and hard facts. Classically, this is a Master thesis or a Seminararbeit (seminar paper).
  • Explorative: The topic is quite new, not clearly defined, and presumably there is not a lot of previous work. You will need to look around, analyse what is there and then find the gap yourself. Normally, you will have to work with a lot of bleeding-edge software frameworks (which sometimes won't even compile). Ideally, this is a task for an internship (Praktikum), because there is a certain risk of failure (i.e. previous work covers the topic already, the problem is not easily solvable with current techniques, etc.). For the internship, I will let you pass if you tried hard enough (at least you learnt something; that is what internships are for :) ).

Of course, you often cannot really plan anything. Sometimes you will start an engineering project and end up doing something explorative because you suddenly encounter a hard problem.


For any larger project, I require that you write a very rough Exposé or synopsis, which will:

  • show me that you understood the problem and know what you have to do;
  • (for German students) show me that you are able to write in English (if you choose English as your language);
  • list related work (e.g. a link list and one to two sentences per approach), showing that you have done some research;
  • contain a plan on how to proceed (e.g. initial idea of experiment or list of requirements)
  • contain a rough time schedule with milestones (this is for your own time management and self-control)

The exposé (Wikipedia: synopsis) should be a Google Doc / Etherpad and has no formatting requirements (pertinence over layout).

Process

During the course of your project(s), you will receive feedback from me in the form of audits. An audit should take a minimum of 30 minutes and a maximum of 90 minutes. You should suggest dates for the audit and also prepare an agenda stating which parts of your work (e.g. code, outline, releases, chapters, design, benchmark results) we will review during the meeting.
Some simple rules:

  • please provide a new agenda and audit date within 48 hours after the last audit. Nach dem Spiel ist vor dem Spiel. (After the game is before the game.)
  • Each audit agenda should have concrete deliverables from your side.
  • If you do not succeed in finishing all of your deliverables, there are two options: 1. we still conduct the audit and analyse where the problems are (maybe shortening it a little); 2. we reschedule the audit.

PhD Students


Note that I can only help you acquire funding for your PhD, not supervise it, as I do not have tenure yet. Dr. Sören Auer or Prof. Fähnrich will be your supervisor.
We are always looking for new PhD students (Doktoranden), so please feel free to send applications. If we think you are suitable, we will help you apply for funding, e.g. DAAD PhD scholarships (see the overview).


Besides sending the usual CV with your picture by email, please include your Bachelor and Master theses, links to any open-source projects you have contributed to (including any mailing lists you have been active on), a list of publications (either scientific or otherwise, e.g. documentation, blog posts) and your favourite unix one-liner (ideally related to RDF).
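To give an idea of what such a one-liner could look like, here is a hedged example (the file name dump.nt is hypothetical; in N-Triples the predicate is the second whitespace-separated field):

```shell
# List the most frequent predicates in an N-Triples dump
# (dump.nt is a placeholder file name)
awk '{ print $2 }' dump.nt | sort | uniq -c | sort -rn | head
```

Counting predicates like this is a quick sanity check on any freshly extracted RDF dump.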


Publishing Guidelines

Here is a collection of rules and guidelines for how we publish at AKSW:

  • Virtually all projects are done using LaTeX and SVN
  • The LaTeX guideline is: one line per sentence, to resolve merge conflicts faster
  • Everything you submit should be read *once more* before submission to correct spelling mistakes and improve grammar and style
  • Images and tables should be placed with [tb] (top, bottom); this saves space (about 3–6 lines in LNCS)
  • References should be shortened to ideally take up only two lines. Use abbreviations heavily.
  • All images and screenshots should be cropped to make the content larger. They should be really dense with information, i.e. no unnecessary, wasted white space.
  • *Never* add .aux, .log, .blg, .bbl or .pdf files to SVN
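The last rule can be enforced once per working copy with SVN's ignore property; a minimal sketch (run in the directory containing the LaTeX sources):

```shell
# Set the svn:ignore property so LaTeX build artefacts never show up
# as unversioned files (and cannot be added by accident)
svn propset svn:ignore '*.aux
*.log
*.blg
*.bbl
*.pdf' .
svn commit -m 'Ignore LaTeX build artefacts'
```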

List of Topics


Wiktionary 2 RDF Online Editor GUI 

Click here for an explanation of how the process works, or write an email.
Contact: Sebastian Hellmann – hellmann@informatik.uni-leipzig.de
Type: Engineering, Bachelor thesis
Involved communities: Wiktionary2RDF and therefore DBpedia and Wiktionary

The recently released Wiktionary extension of the DBpedia Extraction Framework makes it possible to configure scrapers that produce RDF from Wiktionary (currently for 4–6 languages). Configuration is currently done via an XML file that serves as a wrapper for a language-specific Wiktionary edition. The community would greatly benefit from an online tool that helps to create and update such XML wrapper configurations.
The editor should be able to load an existing config file and then show the RDF output for a piece of Wiki syntax. The measure of success is how easily a wrapper can be created and maintained.
Links:

Wiktionary 2 RDF Live Extraction

Click here for an explanation of how the process works, or write an email.
Contact: Sebastian Hellmann – hellmann@informatik.uni-leipzig.de
Type: Engineering, Bachelor thesis (see also the Master thesis extension: Build an Application Suite on Top of the Wiktionary2RDF Live Extraction)
Involved communities: Wiktionary2RDF and therefore DBpedia and Wiktionary

The recently released Wiktionary extension of the DBpedia Extraction Framework makes it possible to configure scrapers that produce RDF from Wiktionary (at the moment for 4–6 languages). However, only static dumps are produced and made available for download here: http://downloads.dbpedia.org/wiktionary
The DBpedia Live Extraction is also built on the framework and receives regular updates from Wikipedia and keeps an RDF triple store in sync. On the basis of this synced triple store, we can find potential improvements and provide *simple* suggestions to Wiktionary editors.
The goals are:

  1. to have a running Live Extraction for Wiktionary 2 RDF
  2. to provide simple suggestions for editors of Wiktionary via a GUI

The DBpedia Extraction Framework is written in Scala; the GUI can be in any language (JavaScript seems to be a good choice, however).


Links:


Build an (NLP) Application Suite on Top of the Wiktionary 2 RDF Live Extraction

Click here for an explanation of how the process works, or write an email.
Contact: Sebastian Hellmann – hellmann@informatik.uni-leipzig.de
Type: Engineering, Research, Master thesis (see also the prerequisite topic Wiktionary2RDF Live Extraction, which is included here)
Involved communities: Wiktionary2RDF and therefore DBpedia and Wiktionary

See description above (Wiktionary 2 RDF Live Extraction). This project has three potential flavours:

  1. Collect existing tools and methods which might be improved using Wiktionary data, and then extend and improve them. This includes tools such as QuicDic and Apertium.
  2. Create useful tools for Wiktionary users and editors, either by providing suggestions for editors of Wiktionary via a GUI (e.g. based on Parsoid), by checking consistency, or automatically via bots.
  3. Research: improve existing NLP systems, based on the as yet unproven assumption that Wiktionary data is much better than traditional lexica.

The goals are:

  1. to have a running Live Extraction for Wiktionary 2 RDF
  2. to build an application suite on top of the Live Extraction for Wiktionary 2 RDF that provides measurable benefits (for one of the three flavours)

Possible tools to include in the suite:


Links:


Web Annotation: Annotate It + NIF 

The tool Annotate It is written in CoffeeScript and uses its own JSON format for data serialization. The goal of this project is to make it interoperable with NIF.
Links:

Web Annotation: My Annotation + NIF 

Improve the usability of MyAnnotation and make it interoperable with NIF.
Links:

NIF + Clarin's TCF 

Click here for an explanation of how the process works, or write an email.
Contact: Sebastian Hellmann – hellmann@informatik.uni-leipzig.de
Type: Engineering, Bachelor thesis
Involved communities: NLP2RDF, Clarin-D

Create a converter from Clarin's TCF format to NIF and back (round trip). The resulting software will be hosted as a web service and will connect the Clarin and NLP2RDF worlds.
Links:

Explication of Statistical Relations

Click here for an explanation of how the process works, or write an email.
Contact: Sebastian Hellmann – hellmann@informatik.uni-leipzig.de
Type: Research / Explorative, Master thesis
There is a good chance of a scientific publication if the results are good.

Recent advances in semantic technologies (RDF, SPARQL, Sparqlify, RDB2RDF, Wiktionary 2 RDF, NIF, LLOD) allow us to efficiently combine statistical data sets such as the Wortschatz with structured data such as Wiktionary 2 RDF and with context information found in text using NIF.
The goal of this project is to research whether statistical measures such as significant left/right neighbour/sentence co-occurrences or iterated co-occurrences can be used to assert the “correct” relation between terms in text and explicate it as an RDF property, given the available structured data.
Examples:

  • He wears a green shirt.
  • He wears a green t-shirt.

“shirt” and “t-shirt” occur frequently with color adjectives and the verb “wear”.
Research Questions:

  • Can we entail synonymy of “shirt” and “t-shirt”? Generally not, but in certain contexts, yes. How can we measure this and produce RDF?
  • What can we do with the combination of Wortschatz, Wiktionary, NIF in RDF?
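As a toy illustration of the kind of counting involved, the following sketch tallies the left neighbours of “shirt” over a hypothetical tokenised corpus (corpus.txt, one sentence per line); real significance measures would of course normalise these raw counts against corpus frequencies:

```shell
# Count how often each word appears directly left of "shirt"
# (corpus.txt is a placeholder: one tokenised sentence per line)
awk '{ for (i = 2; i <= NF; i++) if ($i == "shirt") count[$(i-1)]++ }
     END { for (w in count) print count[w], w }' corpus.txt | sort -rn
```

On the example sentences above, this surfaces colour adjectives such as “green” as frequent left neighbours.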

Links:

Technology:


Use-Case-driven Development and Improvement of RDB2RDF Mapping (Sparqlify)

Click here for an explanation of how the process works, or write an email.
Contact: Sebastian Hellmann – hellmann@informatik.uni-leipzig.de, Claus Stadler – cstadler@informatik.uni-leipzig.de
Type: Research / Engineering, Master thesis
Involved communities: OWLG, LLOD cloud

The members of the Working Group for Open Data in Linguistics (OWLG) have expressed great interest in converting their databases to RDF. In this context, we see a good chance that the Sparqlify tool, which is developed by our research group, can greatly help to speed up this process. We expect you to help convert and interlink a large number of linguistic data sets and produce high-quality Linked Data bubbles for the LLOD cloud. This process will, of course, reveal many missing features and gaps for improvement in Sparqlify. The goals of the project are therefore:

  1. Create mappings from relational databases to RDF (data is supplied by community)
  2. Systematically collect gaps and missing features
  3. Based on 2.: fix and improve Sparqlify, with the aim of creating production-ready software with high impact and great performance (possibly involving benchmarking).

Links:

Badge Server for Gamification of Reward System for Online Communities

Click here for an explanation of how the process works, or write an email.
Contact: Sebastian Hellmann – hellmann@informatik.uni-leipzig.de
Type: Engineering, Bachelor thesis, Internship
Involved communities: DBpedia and many more.

The AKSW research group is involved in many online communities. Although the members of these communities are highly active and contribute work to the projects, there are no reward mechanisms in place to remunerate this effort. This practical problem can be solved by a badge server, which implements three operations:

  1. an admin/moderator can add a badge (including a description of what has to be achieved and an image)
  2. a community member can apply for a badge, presenting proof that he has successfully completed the task
  3. the admin/moderator approves the request and awards the badge to the user.

Example badges: “Over 100 commits to the DBpedia framework”, “Created a language config for http://dbpedia.org/Wiktionary”.
To be an effective system, usability needs to be high and the awarding and displaying of badges needs to be disseminated correctly, e.g. with RDF, RSS, widgets, etc.
Before implementing, the student has to do research to make sure there is not already a similar open-source implementation.


Overview Links:


Bridging NIF, ITS and XLIFF

Click here for an explanation of how the process works, or write an email.
Contact: Sebastian Hellmann – hellmann@informatik.uni-leipzig.de
Type: Engineering, Bachelor thesis (or Master thesis, if an integration into the Okapi Framework is done)
Involved communities: NLP2RDF and the MultilingualWeb-LT Working Group (http://www.w3.org/International/multilingualweb/lt/)

ITS and XLIFF are two formats widely used in the internationalisation and localization industry. The main goal of this thesis is to provide software and algorithms that can transform both formats losslessly into the newly developed NLP Interchange Format and back (round-trip bridge). One of the challenges is to provide an algorithm that transforms offset annotations into XML/HTML markup.
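The offset-to-markup direction can be illustrated with a toy sketch: given NIF-style begin/end character offsets (0-based, end-exclusive) into a plain-text string, wrap that span in an inline element. The offsets and the <span> element here are illustrative, not the actual ITS/XLIFF output:

```shell
# Wrap the character span [b, e) of a one-line input in <span> markup;
# b=11, e=16 selects "green" in the example sentence
echo 'He wears a green shirt.' \
  | awk -v b=11 -v e=16 \
        '{ print substr($0, 1, b) "<span>" substr($0, b + 1, e - b) "</span>" substr($0, e + 1) }'
```

The hard part in the real bridge is that existing XML markup shifts the character offsets, so the transformation has to keep both the plain-text view and the markup view of the document consistent.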
Overview Links:

Technical details:

Make the DBpedia resource page more interactive/attractive

Note: this project has to be done in English only
Click here for an explanation of how the process works, or write an email.
Contact: Dimitris Kontokostas kontokostas@informatik.uni-leipzig.de, Sebastian Hellmann – hellmann@informatik.uni-leipzig.de
Type: Engineering, Voluntary student project / internship / Bachelor thesis
Involved communities: DBpedia

Change the DBpedia resource HTML display page (e.g. http://dbpedia.org/page/Steve_Martin) and make it more interactive by integrating information from other LOD datasets. This can be achieved by working on both the front end (HTML/CSS/JavaScript) and the server side. For the server side we use OpenLink Virtuoso VSP technology (similar to PHP), but we are open to other suggestions as well. If the implementation is good, this work will be deployed on DBpedia.


Ideas on what we imagine to accomplish may come from Freebase.

Create a new geodata LOD dataset

Note: this project has to be done in English only
Click here for an explanation of how the process works, or write an email.
Contact: Dimitris Kontokostas kontokostas@informatik.uni-leipzig.de, Sebastian Hellmann – hellmann@informatik.uni-leipzig.de
Type: Engineering, Bachelor thesis / Master thesis
Involved communities: DBpedia, linkedgeodata

Bachelor thesis: The purpose of this thesis is to create a new LOD dataset from Yahoo WOEID (Where On Earth Identifier). WOEID is supported/updated by Yahoo, used by many geodata applications (e.g. Flickr), and is considered an established place categorization. After converting the dataset to triples, the next step will be to link it with other LOD datasets, such as Linked Geo Data, DBpedia and GeoNames.
Master thesis: It is not entirely clear whether this qualifies for extension to a Master thesis, but one idea would be to use this and other LOD datasets to create an LOD tourist guide.

Semi-automatic Wikipedia article correction

Note: this project has to be done in English only
Click here for an explanation of how the process works, or write an email.
Contact: Dimitris Kontokostas kontokostas@informatik.uni-leipzig.de, Sebastian Hellmann – hellmann@informatik.uni-leipzig.de
Type: Engineering, Bachelor thesis
Involved communities: DBpedia and Wikipedia

We have developed a DBpedia extractor that extracts metadata from templates; thus, we have information about correct infobox usage in Wikipedia articles. Using this metadata we can check whether Wikipedia authors use proper infobox declarations inside articles, identify wrong usage and suggest corrections. The benefits of resolving these errors are: 1) correcting the Wikipedia articles and 2) getting better and more DBpedia triples.


We have two specific example cases:

  • An editor misspells an infobox name (e.g. a typo in “Infobox Artist”). This would result in hiding the infobox from the page display, as MediaWiki would not know how to render it.
  • An editor misspells an infobox property (e.g. “birth place” instead of “birthplace”). This would also result in hiding the property from the infobox display.

In both cases DBpedia gets wrong results or no results at all.


There is already a simple preliminary shell script for this, but we are looking for a proper program implementation (preferably in Scala/Java, in order to include it in the DBpedia Framework) that will also use LIMES for suggesting corrections.
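To make the idea concrete, here is a toy sketch (not the existing script, and much simpler than a LIMES-based matcher): it normalises infobox/property names by lowercasing and removing spaces/underscores, then suggests the canonical spelling for near-misses. The input files valid.txt and used.txt are hypothetical:

```shell
# valid.txt: one canonical infobox/property name per line
# used.txt:  names as actually used in articles
# Prints "used -> canonical" for names that normalise to a valid name
# but are spelled differently
awk '{ key = tolower($0); gsub(/[ _]/, "", key) }
     NR == FNR { canon[key] = $0; next }
     key in canon && canon[key] != $0 { print $0 " -> " canon[key] }' valid.txt used.txt
```

A real implementation would additionally use string-similarity measures (as provided by LIMES) to catch genuine typos that simple normalisation misses.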


Links:



 