NEW LOOK of this site. Do you like it?


#80

Since January 1st, I haven’t received notifications or emails for posts I’m watching.


#81

It would be good to be able to download this whole forum, even as plain text.


#82

In case anyone would like to try, here’s the CSV file with post count, views, thread link and date for the years 2003 to 2019.
CGTalk_3ds_Max_SDK_and_MaxScript_2003-2019_CSV.zip (957.0 KB)

You can try this JavaScript snippet in your browser’s developer tools to parse any CGS forum section.
It scrolls down as far as possible and then back up a little to trigger loading of more threads. You may need to tweak the numbers for it to work on your PC.

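var times = 0;
var scroller = setInterval(function() {

	if ( times++ < 1000 )
	{
		// window.scrollMaxY exists only in Firefox, so use the page height instead;
		// the browser clamps the value to the maximum scroll position
		window.scroll(0, document.documentElement.scrollHeight);
		// scroll back up a little so the forum loads the next batch of threads
		window.scroll(0, window.scrollY - 150);

	} else { clearInterval(scroller); console.log("finished!"); }

}, 300);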


#83

Thank you @Serejah for the threads list.

How would you then download each thread to a separate file, with all of them sharing a common folder for the assets?

Even if we could crawl the whole forum, I don’t know the CGSociety policies about this.

I was thinking more of a sort of indexed “digested mailing list” that CGSociety would provide us: a simple .TXT file per thread, organized in folders, preferably with the “code” tags kept to make the parser’s work easier :slight_smile:


#84

This thread will soon become the longest one in the whole mxs sdk section :slight_smile:

(
	page_crawl_posts_step = 15

	-- foreach threadData in CSV do
		-- sleep for a reasonable amount of time so as not to disturb cgs servers with lots of requests

	url       = @"https://forums.cgsociety.org/t/msx-editor-access/2049420"
	thread_id = (tmp = FilterString url "/"; tmp[tmp.count])
	savepath  = @"C:\somefolder" + "/" + thread_id
	postcount = 10
		
	if not doesFileExist savepath do makeDir savepath
		
	if postcount <= 20 then
	(
		dragAndDrop.DownloadUrlToDisk url (savepath + "/" + (thread_id as string) + ".html") 0
	)
	else
	(
		for i=1 to postcount by page_crawl_posts_step do
		(
			dragAndDrop.DownloadUrlToDisk (url + "/" + i as string) (savepath + "/" + (thread_id as string) + "-" + i as string + ".html") 0		
		)
		
	)
)

But this is just raw data that is not viewable in a browser. It also seems that you can’t get more than 20 posts per request.
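To make the paging logic easier to check outside 3ds Max, here is a minimal JavaScript sketch of the same idea as the MAXScript above. It only builds the list of URLs to request and downloads nothing. The function name `threadPageUrls` is hypothetical; the step of 15 and the 20-post limit are taken from this post and may not match what the forum actually does.

```javascript
// Sketch only: mirrors the paging logic of the MAXScript snippet above.
// threadPageUrls is a hypothetical helper, not part of any forum API.
function threadPageUrls(threadUrl, postCount, step = 15, maxPerRequest = 20) {
  // Short threads seem to fit into a single request.
  if (postCount <= maxPerRequest) return [threadUrl];

  // Longer threads: request the thread starting at post 1, 16, 31, ...
  const urls = [];
  for (let post = 1; post <= postCount; post += step) {
    urls.push(`${threadUrl}/${post}`);
  }
  return urls;
}

// Usage with an assumed post count of 40:
console.log(threadPageUrls("https://forums.cgsociety.org/t/msx-editor-access/2049420", 40));
// prints three URLs ending in /1, /16 and /31
```

Since consecutive requests overlap by a few posts, the saved files would still need de-duplication before being merged into one.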

I’d also prefer to have each entire thread in a single file, but that is much more complicated: it requires either combining several saved files into one programmatically or using a headless browser to scroll each thread from top to bottom before saving it to disk.

Saving content that we created ourselves, for personal use, shouldn’t be forbidden, I guess. Why would search engine web crawlers be allowed to do so?


#85

it should be easier to get the source html like this:

wc = dotnetobject "System.Net.WebClient"
webData = wc.DownloadString "https://forums.cgsociety.org/t/autodesk-masterclass-video-2006/1257849/5"

after that we need to convert it to xml for easier parsing


#86

I used HtmlAgilityPack to parse the offline mxs reference html. It is pretty performant and easy to use.


#87

HtmlAgilityPack

it’s what i’m looking at right now :slight_smile:


#88

it works pretty well … but i don’t have time right now to write a smart parser:

ass = dotnet.loadassembly @"D:\Downloads\HtmlAgilityPack.dll" 
web = dotnetobject "HtmlAgilityPack.HtmlWeb"
doc = web.Load @"https://forums.cgsociety.org/t/autodesk-masterclass-video-2006/1257849/5"
nodes = doc.DocumentNode.SelectNodes "//meta"
nodes.count

for k=0 to nodes.count-1 do
(
	node = nodes.item[k]
	format "name:%\n\tcontent:%\n" (node.GetAttributeValue "name" "") (node.GetAttributeValue "content" "")
)
	

#89

it definitely works… a dirty way to get all posts from html is:

nodes = doc.DocumentNode.SelectNodes "//div/p"
for k=0 to nodes.count-1 do
(
	node = nodes.item[k]
	format "%\n" node.InnerHtml
)

working with xpath we can find authors, code, links, pictures as well


#90

Thank you @Serejah and @denisT for the examples.

Serejah, regarding your question “Why would search engine web crawlers be allowed to do so?”, because they are one of the biggest components of the business.


#91

That was a rhetorical question. I understand that we’re the absolute minority here, at least according to these numbers:
The forum alone gets about 10 million views annually, while this section got only about 100-150k views over the past two years.

btw the absolute majority of threads could be retrieved with just a single request.

thread length   post count   threads count
short           0-20         16489
middle          21-40        311
long            40+          22
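
For anyone who wants to reproduce this split from the CSV, here is a small JavaScript sketch. It assumes the post counts have already been read into an array of numbers; the function names are hypothetical. The table's "21-40" and "40+" rows overlap at 40, so this sketch arbitrarily counts a 40-post thread as "middle".

```javascript
// Sketch only: buckets threads by post count using the ranges from the table above.
// Assumption: a thread with exactly 40 posts is counted as "middle".
function threadLength(postCount) {
  if (postCount <= 20) return "short";   // one request should be enough
  if (postCount <= 40) return "middle";
  return "long";
}

// Counts how many threads fall into each bucket.
function countByLength(postCounts) {
  const counts = { short: 0, middle: 0, long: 0 };
  for (const n of postCounts) {
    counts[threadLength(n)] += 1;
  }
  return counts;
}

// Usage with made-up post counts:
console.log(countByLength([3, 20, 21, 40, 57]));
// prints { short: 2, middle: 2, long: 1 }
```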

#92

No. I don’t like the new look… I find it very difficult to search for old posted scripts, etc.


#93

Should have an update on search today. One of our developers needed time off for 2 weeks. Back on it now


#94

another dead link
http://forums.cgsociety.org/showthread.php?f=98&t=985724
thread


#95

Someone answered in a thread I posted, and 11 (eleven) days later I received an email notification that there was a new post in my thread.

For more than a week, whenever I log in, the site has informed me that I have a new PM, but I read this “new” PM a week ago.


#96

Don’t give up! You just have to answer it and it disappears. Then the problem is the other person’s! :joy: :joy: :joy:


#97

:slight_smile:
I answered this PM when I read it, but the notification is still “active” every time I open the site, even when I am not logged in.


#98

you probably need to disable this one


#99

I don’t have this option.

By the way, with this new look I have to press Ctrl+Home or Ctrl+End several times to get to the top or bottom of a thread. With the mouse wheel it is even more frustrating: scrolling and scrolling and scrolling. Even on mobile devices I prefer the old look, where threads have pages.