rtqs-sandra wrote: ↑Sun Aug 27, 2023 9:58 pm
gmads wrote: ↑Sun Aug 27, 2023 8:27 pm
Just wondering… since there is already a downloadable copy of the DL forums at duolingo.hobune.stream.
From what I see so far, there's a large file accessible via archive.org, which doesn't need any sort of approval, since providing files for download is their primary purpose. The file is weirdly large for the data it should contain, if 2m is the correct number, but they probably just saved the original json files, which carry a huge amount of trash data.
In this large file I might find a set of valid IDs which seems to be incomplete?
These IDs, even if incomplete, might still be useful, since I could either:
- extract the respective data from the file and add the IDs to the skip list (= do not download these from duo directly), so we would only need to check all other IDs again
- or use the IDs within the repeat list (= download only these records from duo directly) to concentrate activities on known valid IDs
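To make the two options above concrete, here is a rough Python sketch. It is only an illustration under assumptions: the function name `plan_downloads`, the `strategy` values, and the example ID range are all made up for this post and are not taken from the actual download scripts.

```python
def plan_downloads(candidate_ids, archived_ids, strategy="skip"):
    """Return the sorted list of IDs to request from the live site.

    strategy="skip":   archived IDs go on the skip list, so everything
                       else in the candidate range is checked again.
    strategy="repeat": archived IDs form the repeat list, so only those
                       known valid IDs are downloaded.
    All names here are hypothetical.
    """
    candidates = set(candidate_ids)
    archived = set(archived_ids)
    if strategy == "skip":
        # Data for archived IDs would be extracted from the file instead.
        return sorted(candidates - archived)
    if strategy == "repeat":
        # Concentrate on IDs already known to be valid.
        return sorted(archived)
    raise ValueError(f"unknown strategy: {strategy!r}")


# Made-up example: IDs 1..10 to check, of which 2, 3 and 7 are in the archive.
to_check = plan_downloads(range(1, 11), {2, 3, 7}, strategy="skip")
to_repeat = plan_downloads(range(1, 11), {2, 3, 7}, strategy="repeat")
```

With the skip list the archive saves requests for IDs it already covers; with the repeat list it narrows the run to IDs that are known to exist, at the cost of missing anything the archive does not contain.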
The size must have to do with the fact that the file contains more than the json: it "also includes the HTML/JS files."
Oh, I hadn't checked the address of the .tar and .torrent files. I thought they were at hobune.
Yes, even if incomplete, hopefully they will be useful.