Sessions

Project 1

Deadline for Project 1: April 26, 2019

Tutorial on the theory of the IBM models

Practical tutorial (Collins)

NAACL alignment format

Output format

The results file should contain one line for each word-to-word alignment identified by the system, in the following format:

sentence_no position_L1 position_L2 [S|P]

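where:

sentence_no is the number of the sentence pair the alignment belongs to (18 in the running example below).

position_L1 is the position of the aligned token in the L1 (here, English) sentence, counting from 1.

position_L2 is the position of the aligned token in the L2 (here, French) sentence, counting from 1.

S|P marks the alignment as Sure (S) or Probable (P). This field is optional.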
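As an illustration, the following minimal Python sketch parses one results-file line into its fields. The `Alignment` type and the `parse_alignment_line` function are hypothetical names, not part of any official tool; the sketch assumes fields are separated by whitespace and treats a missing S|P field as S, the default for this format.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """One word-to-word link read from a results-file line (hypothetical type)."""
    sentence_no: int
    position_l1: int  # token position in the L1 sentence, counted from 1
    position_l2: int  # token position in the L2 sentence, counted from 1
    label: str        # "S" (sure) or "P" (probable)


def parse_alignment_line(line: str) -> Alignment:
    """Parse 'sentence_no position_L1 position_L2 [S|P]' (hypothetical helper)."""
    fields = line.split()  # assumption: fields are whitespace-separated
    if len(fields) not in (3, 4):
        raise ValueError(f"expected 3 or 4 fields, got {len(fields)}: {line!r}")
    sentence_no, pos_l1, pos_l2 = (int(f) for f in fields[:3])
    # A missing S|P field defaults to S.
    label = fields[3] if len(fields) == 4 else "S"
    if label not in ("S", "P"):
        raise ValueError(f"label must be S or P, got {label!r}")
    return Alignment(sentence_no, pos_l1, pos_l2, label)


print(parse_alignment_line("18 1 1"))
# Alignment(sentence_no=18, position_l1=1, position_l2=1, label='S')
```

Rejecting lines with the wrong number of fields or an unknown label early makes malformed output easy to spot before it is submitted.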

Running example

Consider the following two aligned sentences:

[from the English file]

They had gone .

[from the French file]

Ils etaient alles .

A correct word alignment for this sentence pair is:

18 1 1
18 2 2
18 3 3
18 4 4

These lines state that all four alignments belong to sentence 18, that English token 1 (“They”) aligns with French token 1 (“Ils”), that English token 2 (“had”) aligns with French token 2 (“etaient”), and so on. Note that punctuation is also aligned (English token 4, “.”, aligns with French token 4, “.”) and counts towards the final scoring figures.
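The following sketch shows how the example lines could be generated. It assumes whitespace tokenization, so the final "." is token 4, and uses a hypothetical helper `format_alignments`. The one-to-one pairing of positions holds only because the word order of this particular sentence pair matches; it is not an alignment method.

```python
def format_alignments(sentence_no, pairs, label=None):
    """Return one results-file line per 1-based (position_L1, position_L2) pair."""
    lines = []
    for pos_l1, pos_l2 in pairs:
        fields = [str(sentence_no), str(pos_l1), str(pos_l2)]
        if label is not None:
            # Append S or P explicitly; leaving it out means the default, S.
            fields.append(label)
        lines.append(" ".join(fields))
    return lines


# Assumption: tokens are separated by whitespace, so "." counts as a token.
english = "They had gone .".split()
french = "Ils etaient alles .".split()

# For this sentence pair only, English token i links to French token i.
pairs = [(i, i) for i in range(1, len(english) + 1)]

for line in format_alignments(18, pairs):
    print(line)
# 18 1 1
# 18 2 2
# 18 3 3
# 18 4 4
```

Passing `label="P"` would append the P field to every line; omitting it, as in the example, relies on the default S.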

Because these lines omit the S|P field, they are treated as S (sure) alignments: a missing S|P field defaults to S.