NEW LOOK of this site. Do you like it?


#82

In case anyone would like to try, here’s the CSV file with post count, views, thread link and date for every thread from 2003 to 2019.
CGTalk_3ds_Max_SDK_and_MaxScript_2003-2019_CSV.zip (957.0 KB)
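If you want to process that CSV in a script, here is a minimal sketch for reading it. The column order (post count, views, thread link, date) is an assumption based on the description above, as is the absence of quoted fields containing commas; check the real file and adjust. `parseThreadCsv` is a hypothetical helper name, and the sample row is made up.

```javascript
// Hypothetical helper: parses the thread-list CSV into plain objects.
// ASSUMPTION: columns are post count, views, thread link, date (in that order)
// and no field contains a quoted comma. Adjust to match the actual file.
function parseThreadCsv(text) {
  const threads = [];
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === "") continue; // skip blank lines
    const [posts, views, url, date] = line.split(",").map((s) => s.trim());
    const postCount = Number(posts);
    if (!Number.isInteger(postCount)) continue; // skip a header row, if present
    threads.push({ postCount, views: Number(views), url, date });
  }
  return threads;
}

// Made-up sample row, for illustration only.
const sample =
  "posts,views,url,date\n" +
  "12,3400,https://forums.cgsociety.org/t/example-thread/1,2011-05-02\n";
console.log(parseThreadCsv(sample));
```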

You can try this JavaScript snippet in your browser’s developer tools to list any CGS forum section.
It first scrolls down as far as possible and then scrolls back up a little to trigger the event that loads more threads. You may need to tweak the numbers for it to work on your PC.

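// Repeatedly scroll to the bottom so the forum keeps loading more threads.
// Stops after 1000 ticks of 300 ms; tweak both numbers for your machine.
var times = 0;
var scroller = setInterval(function() {
	if ( times++ < 1000 )
	{
		// window.scrollMaxY is Firefox-only, so compute the bottom offset portably
		var maxY = document.documentElement.scrollHeight - window.innerHeight;
		window.scroll(0, maxY);
		// scroll back up a little to fire the "load more threads" event
		window.scroll(0, window.scrollY - 150);
	} else { clearInterval(scroller); console.log("finished!"); }
}, 300);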
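Instead of a fixed 1000 ticks, the scroller could stop once nothing new has loaded for a while. A sketch of that stop condition follows: it stops after the page height has not grown for `maxStalls` consecutive ticks. `makeStallDetector` is a hypothetical name, and using page height as the "new threads loaded" signal is an assumption.

```javascript
// Returns a function that is called once per tick with the current page height
// and reports true once the height has not grown for `maxStalls` ticks in a row.
function makeStallDetector(maxStalls) {
  let lastHeight = -1;
  let stalls = 0;
  return function shouldStop(currentHeight) {
    if (currentHeight > lastHeight) {
      lastHeight = currentHeight; // new content arrived, reset the counter
      stalls = 0;
    } else {
      stalls++; // nothing new since the last tick
    }
    return stalls >= maxStalls;
  };
}

// Possible use in the browser console (10 ticks * 300 ms = 3 s without growth):
// var shouldStop = makeStallDetector(10);
// var scroller = setInterval(function () {
//   var height = document.documentElement.scrollHeight;
//   if (shouldStop(height)) { clearInterval(scroller); console.log("finished!"); return; }
//   window.scroll(0, height - window.innerHeight);
//   window.scroll(0, window.scrollY - 150);
// }, 300);
```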


#83

Thank you @Serejah for the threads list.

How would you then download each thread to a separate file, with all of them sharing a common folder for the assets?

Even if we could crawl the whole forum, I don’t know the CGSociety policies about this.

I was thinking more of an indexed “digested mailing list” that CGSociety would provide us: a simple .TXT file per thread, organized in folders, preferably with the “code” tags kept to make the parser’s work easier :slight_smile:


#84

This thread will soon become the longest one in the whole MXS SDK section :slight_smile:

(
	page_crawl_posts_step = 15

	-- foreach threadData in CSV do
        -- sleep for a reasonable amount of time not to disturb cgs servers with lots of requests

	url       = @"https://forums.cgsociety.org/t/msx-editor-access/2049420"
	thread_id = (tmp = FilterString url "/"; tmp[tmp.count])
	savepath  = @"C:\somefolder" + "/" + thread_id
	postcount = 10
		
	if not doesFileExist savepath do makeDir savepath
		
	if postcount <= 20 then
	(
		dragAndDrop.DownloadUrlToDisk url (savepath + "/" + (thread_id as string) + ".html") 0
	)
	else
	(
		for i=1 to postcount by page_crawl_posts_step do
		(
			dragAndDrop.DownloadUrlToDisk (url + "/" + i as string) (savepath + "/" + (thread_id as string) + "-" + i as string + ".html") 0		
		)
		
	)
)
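For anyone who prefers to plan the downloads outside Max, here is a rough JavaScript port of the URL/file planning in the MAXScript above: a single request for threads of up to 20 posts, otherwise one request every 15 posts (the overlap keeps pages from leaving gaps). The 20-posts-per-request limit comes from the observation below; `planDownloads` and its parameters are hypothetical, and nothing is actually downloaded.

```javascript
// Builds the list of { url, file } pairs the MAXScript loop would download.
function planDownloads(threadUrl, postCount, rootDir, step = 15, perRequest = 20) {
  // same as FilterString url "/": split and drop empty tokens
  const parts = threadUrl.split("/").filter((p) => p !== "");
  const threadId = parts[parts.length - 1];
  const dir = rootDir + "/" + threadId;
  if (postCount <= perRequest) {
    return [{ url: threadUrl, file: dir + "/" + threadId + ".html" }];
  }
  const plan = [];
  for (let i = 1; i <= postCount; i += step) {
    // /t/<slug>/<id>/<post number> opens the thread at that post
    plan.push({ url: threadUrl + "/" + i, file: dir + "/" + threadId + "-" + i + ".html" });
  }
  return plan;
}

console.log(planDownloads("https://forums.cgsociety.org/t/msx-editor-access/2049420", 40, "C:/somefolder"));
```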

But this is just raw data that isn’t viewable in a browser. It also seems you can’t get more than 20 posts per request.

I’d also prefer to have each entire thread in a single file, but this is much more complicated: it would require either combining several saved files into one programmatically, or using a headless browser to scroll through each thread from top to bottom before saving it to disk.
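The "combine several saved files into one" route mostly comes down to de-duplicating overlapping pages. A minimal sketch, assuming each saved page has already been parsed into `{ postNumber, html }` objects (that parsing step is not shown, and `mergePostChunks` is a hypothetical name):

```javascript
// Merges posts from overlapping page requests into one ordered list,
// keeping the first copy seen of each post number.
function mergePostChunks(chunks) {
  const byNumber = new Map();
  for (const chunk of chunks) {
    for (const post of chunk) {
      if (!byNumber.has(post.postNumber)) byNumber.set(post.postNumber, post);
    }
  }
  return [...byNumber.values()].sort((a, b) => a.postNumber - b.postNumber);
}

const merged = mergePostChunks([
  [{ postNumber: 1, html: "<p>a</p>" }, { postNumber: 2, html: "<p>b</p>" }],
  [{ postNumber: 2, html: "<p>b</p>" }, { postNumber: 3, html: "<p>c</p>" }],
]);
console.log(merged.map((p) => p.html).join("\n"));
```

The joined HTML can then be written to one file per thread.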

Saving content we wrote ourselves, for personal use, shouldn’t be forbidden, I guess. Why would search engine web crawlers be allowed to do so?


#85

It should be easier to get the source HTML like this:

wc = dotnetobject "System.Net.WebClient"
webData = wc.DownloadString "https://forums.cgsociety.org/t/autodesk-masterclass-video-2006/1257849/5"

After that, we need to convert it to XML for easier parsing.
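Another option, assuming the forum runs on Discourse (the /t/slug/id URLs and emoji codes suggest so, but that is an assumption), is to ask for the topic as JSON at /t/<id>.json and skip HTML parsing altogether. The sketch below only builds that URL from a thread link; `topicJsonUrl` is a hypothetical helper.

```javascript
// Builds the Discourse JSON endpoint for a thread link.
// ASSUMPTION: the forum is Discourse, which serves topics at /t/<id>.json.
function topicJsonUrl(threadUrl) {
  const u = new URL(threadUrl);
  // path looks like /t/<slug>/<id> or /t/<slug>/<id>/<post number>
  const parts = u.pathname.split("/").filter((p) => p !== "");
  if (parts[0] !== "t" || parts.length < 3) {
    throw new Error("not a /t/<slug>/<id> thread URL: " + threadUrl);
  }
  return u.origin + "/t/" + parts[2] + ".json";
}

console.log(topicJsonUrl("https://forums.cgsociety.org/t/autodesk-masterclass-video-2006/1257849/5"));
// → https://forums.cgsociety.org/t/1257849.json
```

Fetching that URL (for example with `fetch` in a browser or Node 18+) should return the topic with its first posts under `post_stream.posts`.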


#86

I used HtmlAgilityPack to parse the offline MXS reference HTML. It’s pretty fast and easy to use.


#87

HtmlAgilityPack

It’s what I’m looking at right now :slight_smile:


#88

It works pretty well, but I don’t have time right now to write a smart parser:

ass = dotnet.loadassembly @"D:\Downloads\HtmlAgilityPack.dll" 
web = dotnetobject "HtmlAgilityPack.HtmlWeb"
doc = web.Load @"https://forums.cgsociety.org/t/autodesk-masterclass-video-2006/1257849/5"
nodes = doc.DocumentNode.SelectNodes "//meta"
nodes.count

for k=0 to nodes.count-1 do
(
	node = nodes.item[k]
	format "name:%\n\tcontent:%\n" (node.GetAttributeValue "name" "") (node.GetAttributeValue "content" "")
)
	

#89

It definitely works. A dirty way to get all the posts from the HTML is:

nodes = doc.DocumentNode.SelectNodes "//div/p"
for k=0 to nodes.count-1 do
(
	node = nodes.item[k]
	format "%\n" node.InnerHtml
)

Using XPath, we can find authors, code, links and pictures as well.
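For the "code" part specifically, an even dirtier regex approach can pull the contents of `<code>` elements out of a post’s HTML. This is a sketch, not a real HTML parser: it only handles simple, non-nested markup, and XPath via HtmlAgilityPack remains the more robust option. `extractCodeBlocks` is a hypothetical name.

```javascript
// Returns the text inside each <code>...</code> element of an HTML string,
// decoding the few entities that commonly appear in code samples.
function extractCodeBlocks(html) {
  const entities = { "&lt;": "<", "&gt;": ">", "&quot;": '"', "&#39;": "'", "&amp;": "&" };
  const blocks = [];
  const re = /<code[^>]*>([\s\S]*?)<\/code>/gi;
  let m;
  while ((m = re.exec(html)) !== null) {
    // single pass, so "&amp;lt;" correctly becomes the literal text "&lt;"
    blocks.push(m[1].replace(/&(lt|gt|quot|#39|amp);/g, (e) => entities[e]));
  }
  return blocks;
}

console.log(extractCodeBlocks('<p>try:</p><pre><code class="lang-auto">if a &lt; b do print a</code></pre>'));
```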


#90

Thank you @Serejah and @denisT for the examples.

Serejah, regarding your question “Why would search engine web crawlers be allowed to do so?”: because they are one of the biggest components of the business.


#91

That was a rhetorical question. I understand that we’re a tiny minority here, at least according to these numbers: the forum as a whole gets about 10 million views a year, and this section only about 100-150k views for the past two years.

By the way, the vast majority of threads can be retrieved with just a single request.

| Thread length | Post count | Number of threads |
|---|---|---|
| Short | 0-20 | 16489 |
| Middle | 21-40 | 311 |
| Long | 40+ | 22 |
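Counts like the ones in the table above can be reproduced from the post-count column of the CSV. A minimal sketch; treating exactly 40 posts as "middle" (so "long" means more than 40) is an assumption, and `bucketThreads` is a hypothetical name.

```javascript
// Counts threads per length bucket: up to 20 posts, 21-40, more than 40.
function bucketThreads(postCounts) {
  const buckets = { short: 0, middle: 0, long: 0 };
  for (const n of postCounts) {
    if (n <= 20) buckets.short++; // fits in a single request
    else if (n <= 40) buckets.middle++;
    else buckets.long++;
  }
  return buckets;
}

console.log(bucketThreads([3, 20, 21, 40, 41, 120]));
// → { short: 2, middle: 2, long: 2 }
```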

#92

No, I don’t like the new look. I find it very difficult to search for old posted scripts, etc.


#93

We should have an update on search today. One of our developers needed two weeks off, but is back on it now.


#94

Another dead link:
http://forums.cgsociety.org/showthread.php?f=98&t=985724


#95

Someone replied in a thread I posted, and 11 (eleven) days later I received an email notification that there was a new post in my thread.

For more than a week, every time I log in, the site tells me that I have a new PM, but I read this “new” PM a week ago.


#96

Don’t give up! You just have to answer it and it disappears. Then the problem is the other person’s! :joy: :joy: :joy:


#97

:slight_smile:
I answered this PM when I read it, but the notification is still “active” every time I open the site, even when I’m not logged in.


#98

You probably need to disable this one.


#99

I don’t have this option.

By the way, with this new look I have to press Ctrl+Home or Ctrl+End several times to get to the top or bottom of a thread. With the mouse wheel it’s even more frustrating: scrolling and scrolling and scrolling. Even on mobile devices I prefer the old look, where threads had pages.


#100

I don’t have it either.
Just kidding :slight_smile:

You can’t load an entire thread anymore; that’s the reason. This brand-new ‘forum’ engine loads about 20 posts at most, at least for me. I don’t know whether it’s related to screen resolution, but anyway.
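One way around the roughly 20-post limit for scripts, assuming the forum is Discourse (an assumption, not something confirmed here): /t/<id>.json lists every post id of a topic in `post_stream.stream`, and /t/<id>/posts.json?post_ids[]=... returns specific posts. The sketch below only splits the ids into batches of 20 and builds one URL per batch; `postBatchUrls` is a hypothetical helper, and the ids in the example are made up.

```javascript
// Splits a topic's post ids into batches and builds one request URL per batch.
// ASSUMPTION: Discourse-style /t/<id>/posts.json?post_ids[]=... endpoint.
function postBatchUrls(origin, topicId, postIds, batchSize = 20) {
  const urls = [];
  for (let i = 0; i < postIds.length; i += batchSize) {
    const query = postIds
      .slice(i, i + batchSize)
      .map((id) => "post_ids[]=" + encodeURIComponent(id))
      .join("&");
    urls.push(origin + "/t/" + topicId + "/posts.json?" + query);
  }
  return urls;
}

// 45 made-up post ids -> 3 requests (20 + 20 + 5 posts)
const ids = Array.from({ length: 45 }, (_, k) => k + 1);
console.log(postBatchUrls("https://forums.cgsociety.org", 1257849, ids).length);
```

If you actually run the requests, pausing between them keeps the load on the servers reasonable.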


#101

Same here: no more than 20 posts per mouse scroll.