@Corinnebelle @dakanga We seem to be inundated with word sources.
We have:
- Duolingo Wiki (has definitions)
- PDF by Multiwerp (has definitions)
- Memrise courses (has definitions)
- GitHub by Liuch (no definitions)
OK... What I am proposing here will be difficult and will involve trade-offs, but I am interested in your POV.
I have thought about this for a while, and I am prepared to spend some time writing a script for it. It is possible to get all the conjugations of any particular word from Pealim. We can use a semantic similarity score to match the definitions in our list with the ones on Pealim, though I cannot guarantee how accurate this will be. We can then remove duplicate words within a skill, since some words are conjugations of the same word.
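As a rough sketch of the matching and de-duplication steps (not a finished script): the example below uses Python's standard-library `difflib.SequenceMatcher`, which only scores character overlap and is a stand-in for a real semantic similarity model. The transliterated words, the `pealim_definitions` dictionary, the function names, and the 0.6 threshold are all made up for illustration, and nothing here actually fetches data from Pealim.

```python
from difflib import SequenceMatcher

# Hypothetical sample data: base form -> definition, as it might be
# collected from Pealim by hand or by a separate script.
pealim_definitions = {
    "lichtov": "to write",
    "le'echol": "to eat",
}

# Hypothetical (word, definition) pairs from one skill in a word list.
skill_words = [
    ("kotev", "write"),
    ("kotevet", "writes"),
    ("ochel", "eat"),
    ("shalom", "hello"),
]


def similarity(a: str, b: str) -> float:
    """Character-overlap score in [0, 1]; NOT a true semantic score."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(definition, candidates, threshold=0.6):
    """Return (base_form, score) for the closest definition, or None."""
    if not candidates:
        return None
    base, cand_def = max(
        candidates.items(), key=lambda kv: similarity(definition, kv[1])
    )
    score = similarity(definition, cand_def)
    return (base, score) if score >= threshold else None


def dedupe_skill(words, candidates, threshold=0.6):
    """Group words that match the same base form; collect the rest."""
    grouped, unmatched = {}, []
    for word, definition in words:
        match = best_match(definition, candidates, threshold)
        if match is None:
            unmatched.append(word)
        else:
            grouped.setdefault(match[0], []).append(word)
    return grouped, unmatched


grouped, unmatched = dedupe_skill(skill_words, pealim_definitions)
print(grouped)
print(unmatched)
```

Each key in `grouped` would then be one entry for the skill, and anything in `unmatched` would need checking by hand. The threshold is where the accuracy problem shows up: too low and unrelated words get merged, too high and real conjugations get left out.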
We need to pick one of these lists, but all of them are less complete than our spreadsheet. IMO this is the main trade-off. The Duolingo Wiki is missing some skills (it might be outdated), and the Multiwerp PDF and the Memrise course are both a bit outdated, so they are all missing some vocab.
To complicate things further, learning ALL the conjugations and forms (construct, absolute) when people are just starting will be much more difficult than the Duolingo approach, which is to sort of phase them in. The alternative is that we use ONLY the conjugations that the list provides. This is probably the better approach... but it will be incomplete. A halfway approach would be to take only the words from the same line on Pealim.
My view is that we take the words from the Wiki. It seems to be the most up to date, though it also seems to be from the beta. The main question is then: how much of the course has changed since the beta?
This decision will significantly alter the scope/scale of the project. So I want to hear from both of you before I continue.
What do you think?
EDIT: Some different conjugations are spelled with the same letters, which means I can't work out which conjugation a given spelling refers to. So we would have to get ALL the conjugations of every word, which is less than ideal. Sorry!
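To illustrate the problem in the EDIT, here is a small sketch that flags spellings shared by more than one conjugation. The rows are typed in by hand for illustration (without vowel points, the masculine and feminine "you wrote" are both written כתבת), and the row layout and function name are my own assumptions, not anything Pealim provides.

```python
from collections import defaultdict

# Hypothetical hand-entered rows: (unpointed spelling, base form, conjugation).
forms = [
    ("כתבתי", "לכתוב", "past, 1st person singular"),
    ("כתבת", "לכתוב", "past, 2nd person masculine singular"),
    ("כתבת", "לכתוב", "past, 2nd person feminine singular"),
]


def ambiguous_spellings(rows):
    """Map each spelling to its conjugations, keeping only the clashes."""
    by_spelling = defaultdict(set)
    for spelling, base, label in rows:
        by_spelling[spelling].add((base, label))
    # A spelling is ambiguous if it maps to more than one conjugation.
    return {s: sorted(v) for s, v in by_spelling.items() if len(v) > 1}


ambiguous = ambiguous_spellings(forms)
for spelling, readings in ambiguous.items():
    print(spelling, "->", [label for _, label in readings])
```

Any spelling that shows up in `ambiguous` can't be tied back to a single conjugation from the letters alone, which is why the script would have to pull every conjugation of the word.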