Here you will find the criteria used in assessing your final reports.
Your conference-style report will be assessed by two independent reviewers according to the following evaluation criteria.
You get points for the following:
- Scope (max 2 points): Is the problem well presented? Do students understand the challenges and contributions? Here we expect to learn about:
  - Is the problem (learning general-purpose sentence representations) clearly formulated, and is its relevance discussed? (1 point)
  - Do students discuss what is expected from the analysis (e.g. assessing the role of context in embedding models)? (1 point)
- Theoretical description (max 3 points): Are the models presented clearly and correctly? Here we expect to learn about:
  - skip-gram (1 point): discriminative, context-independent embeddings, linear composition function
  - embed-align (1 point): generative, context-sensitive, multilingual
  - benchmark (1 point): linear classifiers whose features are pre-trained embeddings, composition function (typically the average), cross-validation
- Empirical evaluation (max 5 points): Is the experimental setup sound and convincing? Are the experimental findings presented in an organised and effective manner? Here we expect to learn about:
  - Are the data and tasks described correctly? (1 point)
  - Are the results compatible with what is expected for skip-gram pre-trained on the provided data? (1 point)
  - Are the results compatible with what is expected for the pre-trained embed-align model we provided? (1 point)
  - Discussion of findings (1 point): Are the results discussed, and do students relate the differences in performance to aspects of each model (even if only as speculation)?
  - Criticism (1 point): examples of what we look for:
    - some investigation of the hyperparameters of the models or the benchmark
    - simple qualitative analysis (e.g. cherry-picked examples)
    - plots and figures highlighting an interesting pattern
You lose points for bad writing style (because you were asked to prepare a conference-style report).
- Writing style
  - Did not make proper use of the LaTeX template (e.g. tweaked the template): -0.5
  - Did not respect the page limit: one extra column is tolerated; beyond that, -0.5 for the first extra page. We stop reading after that page, which will affect your grade for the other criteria as well.
  - Bad structure (e.g. missing important sections such as the introduction or conclusion): -0.5 per missing section
  - Command of English: judged case by case
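
For orientation, the benchmark item under Theoretical description (a linear classifier over averaged pre-trained embeddings, evaluated with cross-validation) can be sketched roughly as follows. This is only an illustrative sketch, not the actual benchmark code: the toy embedding table, the example sentences, the function names, and the use of a perceptron as the linear classifier are all assumptions made for the sake of a self-contained example.

```python
import random

# Hypothetical toy embeddings; in the assignment these would come from the
# pre-trained skip-gram or embed-align models.
EMB = {
    "good": [1.0, 0.2],
    "great": [0.9, 0.1],
    "bad": [-1.0, 0.1],
    "awful": [-0.9, 0.3],
    "movie": [0.0, 1.0],
    "film": [0.1, 0.9],
}
DIM = 2


def sentence_vector(tokens, emb=EMB, dim=DIM):
    """Composition function: average the vectors of the known words."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def train_perceptron(X, y, epochs=20):
    """Fit a linear classifier (a perceptron here) on labels in {-1, +1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if label * score <= 0:  # misclassified: move the boundary
                w = [wi + label * xi for wi, xi in zip(w, x)]
                b += label
    return w, b


def predict(model, x):
    """Return +1 or -1 depending on the side of the linear boundary."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def cross_validate(X, y, k=4, seed=0):
    """Mean held-out accuracy over k folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accs = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = train_perceptron([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in fold)
        accs.append(correct / len(fold))
    return sum(accs) / k


# Hypothetical labelled sentences (+1 positive, -1 negative).
SENTENCES = [
    (["good", "movie"], 1), (["great", "film"], 1),
    (["good", "film"], 1), (["great", "movie"], 1),
    (["bad", "movie"], -1), (["awful", "film"], -1),
    (["bad", "film"], -1), (["awful", "movie"], -1),
]
X = [sentence_vector(tokens) for tokens, _ in SENTENCES]
y = [label for _, label in SENTENCES]

print("mean cross-validation accuracy:", cross_validate(X, y, k=4))
```

A report would typically describe these same three pieces for the real benchmark: which composition function turns word embeddings into a sentence vector, which linear classifier is trained on top, and how the reported scores are obtained through cross-validation.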