Dear everyone,
Your Duolingo forum registration isn't automatically transferred to the Duome forum, so to join the Duome forums you need to register with your existing or any other username and email; in any case, we advise choosing a new password for the forum.
~ Duome Team

Sentence Discussion? No More!

We are not Duolingo, we cannot solve any problems directly, but we can provide community-based advice.


gmads
Mexico

Re: Sentence Discussion? No More!

Post by gmads »

rtqs-sandra wrote: Sun Aug 27, 2023 9:58 pm
gmads wrote: Sun Aug 27, 2023 8:27 pm

Just wondering… since there is already a downloadable copy of the DL forums at duolingo.hobune.stream.

From what I see so far, there's a large file (weirdly, far too large for the data it should contain if 2m is the correct number, but they probably just saved the original JSON files, which carry a huge amount of trash data) accessible via archive.org, which doesn't need any sort of approval since its primary purpose is to provide files for download.

In this large file I might find a set of valid IDs which seems to be incomplete?

These IDs, even if incomplete, still might be useful, since I could either:

  1. extract the respective data from the file and add the IDs to the skip list (= do not download these from duo directly), so we would only need to check all other IDs again
  2. or use the IDs within the repeat list (= download only these records from duo directly) to concentrate activities on known valid IDs

The size must have to do with the fact that it includes not only the json but "also includes the HTML/JS files."

Oh, I hadn't checked the address of the .tar and .torrent files. I thought they were at hobune.

Yes, even if incomplete, hopefully they will be useful.

🦎  We are living in dystopia, in a world that is dominated by technology and disconnect,
alienation, loneliness, and dysfunction.  🦎
Phaxe & Morten Granau - Lost

🇲🇽 🇺🇸  ·  🇮🇹 🇧🇷  ·  🇷🇺 🇦🇪

Deleted User 1400

Re: Sentence Discussion? No More!

Post by Deleted User 1400 »

Okay, that would make sense. That's trash, but trash that is quick to delete. I hope the raw JSON data is included as well.

Any indicator of how complete it is would be super useful. But tbh, the fact that we've already detected at least one missing ID is not a good sign: the range I parsed was tiny, and I found only 40 records in it.

gmads
Mexico

Re: Sentence Discussion? No More!

Post by gmads »

Someone with a really fast connection and enough disk space is needed to take a look at the compressed file. Too bad that a small set of JSON files wasn't saved separately for reviewing and checking what was kept.


Corinnebelle

Re: Sentence Discussion? No More!

Post by Corinnebelle »

hobune.stream wrote about sentence discussions here. You can contact him with a mention.

🇺🇸 L1 🇮🇱 Advanced beginner Duolingo levels

Corinnebelle

Re: Sentence Discussion? No More!

Post by Corinnebelle »

rtqs-sandra wrote: Sun Aug 27, 2023 8:08 am

@Corinnebelle @gmads
& @ anybody else who wants to have sentence discussions:

what about you? Are you missing anything in the export?

Downloaded the sample pack. It's got the audio link.


Deleted User 1400

Re: Sentence Discussion? No More!

Post by Deleted User 1400 »

Corinnebelle wrote: Mon Aug 28, 2023 1:03 am
rtqs-sandra wrote: Sun Aug 27, 2023 8:08 am

@Corinnebelle @gmads
& @ anybody else who wants to have sentence discussions:

what about you? Are you missing anything in the export?

Downloaded the sample pack. It's got the audio link.

Fyi no longer necessary, I just downloaded it too, extracting it will take a while bc I started it on my 2gb ram machine, but it's just a matter of time..

Thank you so much! Could you please upload one of these .json files? No matter which exactly, just one of the discussion jsons..

Last edited by Deleted User 1400 on Mon Aug 28, 2023 9:06 am, edited 1 time in total.
Deleted User 1400

Re: Sentence Discussion? No More!

Post by Deleted User 1400 »

Corinnebelle wrote: Mon Aug 28, 2023 1:01 am

hobune.stream wrote about sentence discussions here. You can contact him with a mention.

I've looked into that thread and now I'm somewhat confused. Was this done to save everything but the sentence discussions? At least I couldn't find any on the linked overview/subforum pages right now? And there was a call to send notifications about discussions to save if I got that right.

As for how one could provide this for offline use, I thought that (again automatically) generating an index page for each course would be sufficient (you might know those ugly pages, absolutely free of any fancy stuff, where just the folders and items are listed and you can't do anything but navigate or select a file). But I'm a terminal and backend person, so I might be more used to having no fancy user interface.
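
To illustrate the idea (only a sketch, not code from this thread): a bare index page per course could be generated roughly like this. The folder layout (one folder per course, one file per discussion) and the names `build_index` and `write_index` are assumptions made for the example.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_index(course_dir):
    """Return a plain HTML listing of everything inside course_dir."""
    # Assumption: each course has its own folder of saved discussion files.
    names = sorted(
        p.name for p in Path(course_dir).iterdir() if p.name != "index.html"
    )
    items = "\n".join(
        f'<li><a href="{quote(name)}">{html.escape(name)}</a></li>'
        for name in names
    )
    title = html.escape(Path(course_dir).name)
    return f"<html><body><h1>{title}</h1>\n<ul>\n{items}\n</ul></body></html>\n"


def write_index(course_dir):
    """Write index.html into course_dir and return its path."""
    out_path = Path(course_dir) / "index.html"
    out_path.write_text(build_index(course_dir), encoding="utf-8")
    return out_path
```

Running `write_index` once per course folder would give exactly the kind of no-frills listing described above; nothing here depends on a web server.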

@hobune.stream hi fellow archiver, could you please tell us how you selected the discussion IDs to save? Also: could you provide an estimate of how many sentence discussions you found and archived? I found numbers indicating 2m (including non-sentence discussions, which would be less than half the number of records I expect to find). Is that correct?

gmads
Mexico

Re: Sentence Discussion? No More!

Post by gmads »

rtqs-sandra wrote: Mon Aug 28, 2023 6:18 am
Corinnebelle wrote: Mon Aug 28, 2023 1:03 am
rtqs-sandra wrote: Sun Aug 27, 2023 8:08 am

@Corinnebelle @gmads
& @ anybody else who wants to have sentence discussions:

what about you? Are you missing anything in the export?

Downloaded the sample pack. It's got the audio link.

Fyi no longer necessary, I just downloaded it too, extracting it will take a while bc I started it on my 2gb ram machine, but it's just a matter of time..

Thank you so much! Could you please upload one of these .json files? No matter which exactly, just one of the discussion jsons..

At the risk of being mistaken, I think she meant the zip file you uploaded here, the sample file.


Deleted User 1400

Re: Sentence Discussion? No More!

Post by Deleted User 1400 »

Oh true, though weird, I feel like I had already read that 🤔

Deleted User 1400

Re: Sentence Discussion? No More!

Post by Deleted User 1400 »

So the original JSONs are in this file, mixed with several other things; it seems to contain comments, discussions and sentence discussions. So I can use it to extend the skip list with all known comments, discussions and sentence discussions whose data is available in this file. I'll do this locally and just provide the lists. The code for you remains unchanged.
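
For illustration only (this is not the script mentioned in this thread): extending a skip list from an extracted export could look roughly like the sketch below. It assumes one JSON object per file and an ID stored under the key `id`; both are guesses about the export format, and the function names are made up for the example.

```python
import json
from pathlib import Path


def collect_known_ids(export_dir, id_field="id"):
    """Return the set of IDs found in all *.json files below export_dir.

    Assumption: each file holds a single JSON object and the ID sits
    under id_field. The real export may be structured differently.
    """
    known = set()
    for path in Path(export_dir).rglob("*.json"):
        try:
            with open(path, encoding="utf-8") as fh:
                record = json.load(fh)
        except (OSError, ValueError):
            continue  # unreadable or not valid JSON: ignore this file
        if isinstance(record, dict) and id_field in record:
            known.add(str(record[id_field]))
    return known


def write_skip_list(known_ids, out_path):
    """Write one ID per line, sorted, so lists are easy to diff and merge."""
    with open(out_path, "w", encoding="utf-8") as fh:
        for item in sorted(known_ids):
            fh.write(item + "\n")
```

IDs are kept as strings here so the sketch does not depend on their exact format.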

So one of you who's reliably online often should take responsibility for the number ranges and the central skip & repeat files. Who wants this job? 👀

(Theoretically there should be easier methods to do this, but that takes web development knowledge, which I don't have. If you know someone who'd say "oh, this is a fun coffee-break activity", send them here.)

Deleted User 1400

Re: Sentence Discussion? No More!

Post by Deleted User 1400 »

FYI, I'm just processing the data. I used the script I posted here as a basis, added another field, and found a minor dispensable line in the script, so I'll probably make an update, but it's really super unimportant (the additional field can also be derived at runtime, and the unnecessary line doesn't do much and doesn't cost much).

It's currently processing 100k records/s, so it will run around 10h.

Test records were good; all necessary data is contained. The lists of IDs to skip will be enormous and super useful for searching for missing records on Duo itself. I might optimize the handling, since the test file indicated the full file could be around 1.2 GB, which is a bit too large to keep in RAM (though you might laugh at me for running this on a 2 GB machine ;). I'll probably just split it into chunks of 1 or 10m records.
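
A minimal sketch of such a split, assuming the list is a text file with one record per line (the file naming and the function names are invented for the example, not taken from the actual script). It reads the input as a stream, so only one chunk is held in memory at a time:

```python
from itertools import islice


def iter_chunks(items, chunk_size):
    """Yield lists of at most chunk_size items from any iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def split_file(in_path, out_prefix, chunk_size=1_000_000):
    """Split a one-record-per-line file into numbered chunk files."""
    written = []
    with open(in_path, encoding="utf-8") as src:
        for number, chunk in enumerate(iter_chunks(src, chunk_size)):
            out_path = f"{out_prefix}{number:03d}.txt"
            with open(out_path, "w", encoding="utf-8") as dst:
                dst.writelines(chunk)
            written.append(out_path)
    return written
```

With `chunk_size=1_000_000`, peak memory is bounded by one chunk of lines rather than by the whole file.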

Deleted User 1400

Re: Sentence Discussion? No More!

Post by Deleted User 1400 »

Another update, since they seem to be making progress with removing the discussion section everywhere: I've changed the processing script's behavior so that it first of all quickly produces lists of missing IDs in chunks of 1m. So tomorrow I'll upload little packages with preconfigured files for each of the 58 chunks, and I'll create a new thread for that. It would be great if you all could try at least some of them before the discussions are gone forever. Instructions will be included; it's basically the same procedure I already described for the first version, just without most of the check steps.
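
Roughly, a list of missing IDs for one chunk could be derived as in the sketch below. It assumes that IDs are consecutive integers and that chunk number n covers the range from n × 1m up to (but not including) (n + 1) × 1m; both are assumptions for illustration, and `missing_ids` is a made-up name.

```python
def missing_ids(chunk_no, known_ids, chunk_size=1_000_000):
    """Yield the IDs of chunk chunk_no that are not in known_ids.

    known_ids should be a set so that each membership test is fast.
    """
    start = chunk_no * chunk_size
    for candidate in range(start, start + chunk_size):
        if candidate not in known_ids:
            yield candidate


# Tiny usage example with a chunk size of 5 instead of 1m:
print(list(missing_ids(1, {6, 8}, chunk_size=5)))
```

Because it is a generator, the candidates of a chunk never have to be held in memory all at once.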

Corinnebelle

Re: Sentence Discussion? No More!

Post by Corinnebelle »

@rtqs-sandra I don't see the file.

Yes, I meant the zip file.


Deleted User 1400

Re: Sentence Discussion? No More!

Post by Deleted User 1400 »

Obviously there's a file size limit for attachments, which I didn't take into account before, so I'll have to cut the packages a bit smaller. I guess this will take 1 more coffee break until upload 👀 Expect it at 14:00 UTC.

Deleted User 1400

Re: Sentence Discussion? No More!

Post by Deleted User 1400 »

Obviously there's also a limit on the number of attachments, so things will require a lot of manual collaboration now.

Please note that the IDs to be saved already exclude all IDs I could find and extract from the hobune export, so this scan is for the missing records only:

viewtopic.php?t=17970-save-the-senctence-discussions

gmads
Mexico

Re: Sentence Discussion? No More!

Post by gmads »

rtqs-sandra wrote: Wed Aug 30, 2023 3:07 pm

Obviously there's also a limit on the number of attachments, so things will require a lot of manual collaboration now.

Please note that the IDs to be saved already exclude all IDs I could find and extract from the hobune export, so this scan is for the missing records only:

viewtopic.php?t=17970-save-the-senctence-discussions

Just wondering… about how many missing records are there?


Deleted User 1400

Re: Sentence Discussion? No More!

Post by Deleted User 1400 »

gmads wrote: Wed Aug 30, 2023 3:23 pm

Just wondering… about how many missing records are there?

It's difficult to estimate how many discussions will actually be found in the end, but there are 2.4m IDs to check. I'm currently processing chunks #200 and #201 and have so far found 15 sentence discussions that were missing in the hobune.stream file. I can provide better estimates once I've completely processed them and some chunks from other places within the ID range.

gmads
Mexico

Re: Sentence Discussion? No More!

Post by gmads »

rtqs-sandra wrote: Wed Aug 30, 2023 3:29 pm
gmads wrote: Wed Aug 30, 2023 3:23 pm

Just wondering… about how many missing records are there?

It's difficult to estimate how many discussions will actually be found in the end, but there are 2.4m IDs to check. I'm currently processing chunks #200 and #201 and have so far found 15 sentence discussions that were missing in the hobune.stream file. I can provide better estimates once I've completely processed them and some chunks from other places within the ID range.

Oh! I see. So maybe it will be in the hundreds range.


Deleted User 1400

Re: Sentence Discussion? No More!

Post by Deleted User 1400 »

Rather thousands, probably several tens of thousands: I've only just started processing chunks #200 and #201, they're not even half done, and there are 241 chunks.

Deleted User 1400

Re: Sentence Discussion? No More!

Post by Deleted User 1400 »

Preliminary forecast: more than 25k.

Deleted User 1400

Re: Sentence Discussion? No More!

Post by Deleted User 1400 »

Started 2 more packages (100+101) and unfortunately there seem to be even more missing. I assumed there would be fewer bc I expected the hobune export to be at least roughly evenly distributed. 100+101 have already found 20 each after only 5 minutes, so the new forecast is definitely way more than 26k, probably even more than 100k. I checked some IDs with the link provided on the previous page to be sure they're not on hobune.stream, and none of the newly found IDs was there.

You really need to help with this while the records are still available, and there might be only a short time left.

Deleted User 1400

Re: Sentence Discussion? No More!

Post by Deleted User 1400 »

Last estimate for today: 200k. I'm running 4 packages right now; together they have already found 710 records, and the average runtime for each should be 39h.

gmads
Mexico

Re: Sentence Discussion? No More!

Post by gmads »

Amazing. Yes, one would have imagined the hobune backup being almost complete.


Deleted User 1400

Re: Sentence Discussion? No More!

Post by Deleted User 1400 »

This morning it's 4320 in 4 packages at circa 1/4 of the runtime, so the forecast is already at 1m. I had to stop 2 packages for now bc the 4 already consumed 1.5 GB of my data volume overnight; I'll have to restart them next month.
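
The 1m figure is consistent with a simple linear extrapolation over all 241 packages. The sketch below assumes that missing records are spread evenly across packages and across each package's runtime, which the uneven results from different ID ranges suggest is only a rough approximation; `forecast_total` is a name made up for the example.

```python
def forecast_total(found_so_far, packages_running, fraction_done, total_packages):
    """Linearly extrapolate the total number of missing records.

    Assumes every package yields about the same number of records and
    that records turn up at a constant rate during a package's run.
    """
    per_package = found_so_far / packages_running / fraction_done
    return round(per_package * total_packages)


# 4320 records from 4 packages at about 1/4 of their runtime, 241 packages in total
print(forecast_total(4320, 4, 0.25, 241))  # 1041120, i.e. roughly 1m
```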

Deleted User 1400

Re: Sentence Discussion? No More!

Post by Deleted User 1400 »

I had to pause the other 2 as well; I've got only 300 MB left for today. And I'll likely not be able to process more than these 4 next month, so you all need to process the remaining 237 packages. If you don't, all this was for nothing and the sentence discussions will be lost sooner or later.

Deleted User 1400

Re: Sentence Discussion? No More!

Post by Deleted User 1400 »

Not sure where this was asked, but for those who are missing a search function, it could be this easy (just imagine the sentences were displayed instead of the IDs, which is pretty easy to change):

[Screenshot attachment: Bildschirmfoto vom 2023-09-03 14-57-32.png]

N1chope

Re: Sentence Discussion? No More!

Post by N1chope »

rtqs-sandra wrote: Wed Aug 30, 2023 3:07 pm

Obviously there's also a limit on the number of attachments, so things will require a lot of manual collaboration now.

Please note that the IDs to be saved already exclude all IDs I could find and extract from the hobune export, so this scan is for the missing records only:

viewtopic.php?t=17970-save-the-senctence-discussions

Hi! I was added to the "no-discussion" A/B testing group today, and I found this thread while searching for info on the topic. I just created an account to be able to help out, but Duome says that I'm not authorized to view the thread.

I may be able to help from both Windows and GNU/Linux, but I'm a bit busier than I can handle right now, and not very consistent with my availability, so I would probably only be able to do something which I can set up, leave in the background, and check later, without needing to micro-manage anything (i.e. not need to report and update the indices where the script is working every now and then).
I'm not sure what the preferred way of communication is here (I did not manage to make the "contact" link work from my phone), but feel free to use whichever works, or to add me to the thread.
In any case, thanks a lot for the great work you're doing.

Not so related, but for the record:
Am I the only one who used to enjoy the non-strictly-language-learning comments in the sentence discussions? Oftentimes I learned interesting things about the country, society, or culture of the target language, which I also considered valuable, and even the comments that were straight-up jokes made the experience more fun for me (and I guess that learning while having fun is one of the points of Duolingo). I did try to upvote language content more to make it more efficient for other learners, but I would definitely not consider any other comments "garbage" that needs to be disposed of.
Although I skipped the first skills of the course, so I may be missing that context. If there are several comments saying just stuff like "me too", then that does seem inconsequential. I'm not sure what should happen to "disposable" comments which actually have interesting comments in reply to them, though, since deleting the parent comment may affect the visibility of the replies (in the official forum it did hide them)

Native 🇪🇸 / Fluent 🇺🇸 / Rusty fluent 🇫🇷 / Unsophisticatedly conversational 🇯🇵(learning)

Deleted User 1400

Re: Sentence Discussion? No More!

Post by Deleted User 1400 »

@N1chope with Linux knowledge you're a perfect candidate and the tool once started just runs in the background for days.. you'll also find a thread on data quality topics with call for input, so @Basler Biker please add?

Basler Biker
Switzerland

Re: Sentence Discussion? No More!

Post by Basler Biker »

rtqs-sandra wrote: Thu Sep 07, 2023 7:58 pm

@N1chope with Linux knowledge you're a perfect candidate and the tool once started just runs in the background for days.. you'll also find a thread on data quality topics with call for input, so @Basler Biker please add?

Should have access now. @N1chope
viewforum.php?f=420-unanswered-sentence-discussions


BB - Basler Biker - Positivity and constructiveness will prevail
Native 🇧🇪 🇳🇱 / fluent 🇫🇷 🇩🇪 🇬🇧 / learning 🇸🇪 / fan of 🇨🇭 (bs/bl)

Corinnebelle

Re: Sentence Discussion? No More!

Post by Corinnebelle »

@N1chope I also enjoy the fun element of the sentence discussions. Plus information about grammar, other translations or definitions of words used, their roots etc..

