NEW LOOK of this site. Do you like it?


#84

This thread will soon become the longest one in the whole mxs sdk section :slight_smile:

(
	-- post-number increment between successive page requests
	page_crawl_posts_step = 15

	-- foreach threadData in CSV do
	-- sleep for a reasonable amount of time so as not to disturb the cgs servers with lots of requests

	url       = @"https://forums.cgsociety.org/t/msx-editor-access/2049420"
	thread_id = (tmp = FilterString url "/"; tmp[tmp.count]) -- last url segment, already a string
	savepath  = @"C:\somefolder" + "/" + thread_id
	postcount = 10

	if not doesFileExist savepath do makeDir savepath

	if postcount <= 20 then
	(
		-- short thread: a single request returns every post
		dragAndDrop.DownloadUrlToDisk url (savepath + "/" + thread_id + ".html") 0
	)
	else
	(
		-- longer thread: request it in chunks starting at post 1, 16, 31, ...
		for i = 1 to postcount by page_crawl_posts_step do
		(
			dragAndDrop.DownloadUrlToDisk (url + "/" + (i as string)) (savepath + "/" + thread_id + "-" + (i as string) + ".html") 0
		)
	)
)

But it is just raw data, not viewable in a browser. It also seems that you can’t get more than 20 posts per request.

I’d also prefer to have the entire thread in a single file, but this is much more complicated, since it requires either combining several saved files into one programmatically or using some headless browser to scroll each thread from top to bottom before saving it to disk.
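A rough sketch of the first option, combining the saved pages programmatically, could look like the following. It assumes the chunk files were written with the `<thread_id>-<post index>.html` naming used in the script above. `pageIndex`, `comparePages` and `combinePages` are hypothetical helper names, not part of any library. The result is only a plain concatenation of the raw pages in post order: overlapping posts are not removed, and text encoding is left at the `openFile` default.

```maxscript
-- Hypothetical helper: trailing post index of a name like "2049420-16.html" (0 if there is none)
fn pageIndex f =
(
	parts = filterString (getFilenameFile f) "-"
	if parts.count > 1 then (parts[parts.count] as integer) else 0
)

-- Comparison function for qsort: orders files numerically by post index
fn comparePages a b = (pageIndex a) - (pageIndex b)

-- Hypothetical helper: append every saved page in 'folder' to 'outFile'.
-- Assumption: 'outFile' lives outside 'folder', so it is not picked up on a second run.
fn combinePages folder outFile =
(
	files = getFiles (folder + "/*.html")
	qsort files comparePages

	out = createFile outFile
	for f in files do
	(
		src = openFile f mode:"r"
		while not eof src do format "%\n" (readLine src) to:out
		close src
	)
	close out

	files.count -- number of pages that were merged
)

-- Example call with placeholder paths
combinePages @"C:\somefolder\2049420" @"C:\somefolder\2049420-full.html"
```

A numeric sort is used because a plain string sort would put "-106" before "-16".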

Saving content that we created ourselves, for personal use, shouldn’t be forbidden, I guess. Why would search engine web crawlers be allowed to do so?


#85

it should be easier to get the source html like this:

wc = dotnetobject "System.Net.WebClient"
webData = wc.DownloadString "https://forums.cgsociety.org/t/autodesk-masterclass-video-2006/1257849/5"

after that we need to convert it to xml for easier parsing


#86

I used HtmlAgilityPack for offline mxs reference html parsing. It is pretty performant and easy to use


#87

HtmlAgilityPack

it’s what i’m looking at right now :slight_smile:


#88

it works pretty well … but i don’t have time right now to write a smart parser:

ass = dotnet.loadassembly @"D:\Downloads\HtmlAgilityPack.dll" 
web = dotnetobject "HtmlAgilityPack.HtmlWeb"
doc = web.Load @"https://forums.cgsociety.org/t/autodesk-masterclass-video-2006/1257849/5"
nodes = doc.DocumentNode.SelectNodes "//meta"
nodes.count

for k=0 to nodes.count-1 do
(
	node = nodes.item[k]
	format "name:%\n\tcontent:%\n" (node.GetAttributeValue "name" "") (node.GetAttributeValue "content" "")
)
	

#89

it definitely works… a dirty way to get all posts from html is:

nodes = doc.DocumentNode.SelectNodes "//div/p"
for k=0 to nodes.count-1 do
(
	node = nodes.item[k]
	format "%\n" node.InnerHtml
)

working with xpath we can find authors, code, links, pictures as well
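As a hedged illustration of that idea, the sketch below pulls links, pictures and code blocks out of a small inline html string instead of a live page. It assumes HtmlAgilityPack.dll is at the same path as in the earlier post. `collectFromNodes` is a hypothetical helper name. Authors are left out because the right xpath depends on the actual page markup.

```maxscript
-- Assumption: HtmlAgilityPack.dll sits at the same path used earlier in the thread
dotnet.loadAssembly @"D:\Downloads\HtmlAgilityPack.dll"

-- Hypothetical stand-in for a downloaded page, so no web request is needed
html = "<div><p>see <a href='https://example.com/a'>this</a></p><pre><code>x = 1</code></pre><img src='pic.png'></div>"

htmlDoc = dotnetObject "HtmlAgilityPack.HtmlDocument"
htmlDoc.LoadHtml html

-- Hypothetical helper: for every node matching 'xpath', collect the given
-- attribute, or the inner text when no attribute name is supplied
fn collectFromNodes docObj xpath attrName: =
(
	result = #()
	nodes = docObj.DocumentNode.SelectNodes xpath
	-- SelectNodes returns null (undefined in MAXScript) when nothing matches
	if nodes != undefined do
	(
		for k = 0 to nodes.count - 1 do
		(
			node = nodes.item[k]
			txt = if attrName == unsupplied then node.InnerText else (node.GetAttributeValue attrName "")
			append result txt
		)
	)
	result
)

links  = collectFromNodes htmlDoc "//a[@href]" attrName:"href"
images = collectFromNodes htmlDoc "//img[@src]" attrName:"src"
codes  = collectFromNodes htmlDoc "//pre/code"

format "links: %\nimages: %\ncode: %\n" links images codes
```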


#90

Thank you @Serejah and @denisT for the examples.

Serejah, regarding your question “Why would search engine web crawlers be allowed to do so?”: because they are one of the biggest components of the business.


#91

That was a rhetorical question. I understand that we’re the absolute minority here, at least according to these numbers: the forum alone gets about 10 million views annually, while this section got only about 100-150k views over the past two years.

Btw, the absolute majority of threads could be retrieved with just a single request.

| thread length | post count | thread count |
|---------------|------------|--------------|
| short         | 0-20       | 16489        |
| middle        | 21-40      | 311          |
| long          | 40+        | 22           |

#92

No. I don’t like the new look… I find it very difficult to search for old posted scripts, etc.


#93

Should have an update on search today. One of our developers needed time off for 2 weeks. Back on it now


#94

another dead link
http://forums.cgsociety.org/showthread.php?f=98&t=985724


#95

Someone answered in a thread I had started, and 11 (eleven) days later I received an email notification that there was a new post in my thread.

For more than a week, whenever I log in, the site has informed me that I have a new PM, but I read this “new” PM a week ago.


#96

Don’t give up! You just have to answer it and it disappears. Then the problem is the other person’s! :joy: :joy: :joy:


#97

:slight_smile:
I answered this PM when I read it, but the notification is still “active” every time I open the site, even if I am not logged in.


#98

you probably need to disable this one


#99

I don’t have this option.

By the way, with this new look I have to press Ctrl+Home or Ctrl+End several times to get to the top or bottom of a thread. With the mouse wheel it is even more frustrating: scrolling and scrolling and scrolling. Even on mobile devices I prefer the old look, where threads have pages.


#100

I don’t have it either
just kidding :slight_smile:

You can’t load the entire thread anymore, that’s the reason. This brand new ‘forum’ engine loads about 20 posts max, at least for me. I don’t know if it’s somehow related to screen resolution, but anyway.


#101

Same here - no more than 20 posts per mouse scroll.


#102

I agree, mousing around long threads is a terrible thing to do in discourse software, but can’t you use the almost hidden, tiny, obscure, nearly invisible scrubber on the right?


#103

Yep, you are right. The “problem” is that I have to move the cursor exactly over this scrubber to “select” it, which is not the same as using the MMB or switching directly to the desired page of the thread.
On mobile devices it becomes even harder to use this scrubber.