hiddenmarkov

09-06-2007, 02:51 AM

First, a disclaimer: If your immediate response to this post is "This

should not be posted here," I apologize. The scope of this problem is

such that I am not sure exactly where to go looking for answers. If you

can think of a forum that might be better suited to these questions, that

information would be invaluable in and of itself.

Now on to the problem.

I need to generate some data for testing a 3D object recognition engine.

This data should be ~5cm sampling of the visible surfaces of an urban

landscape that is ~5km in size. The sampling regime need not be entirely

uniform or orthonormal. The form of the samples should be

<x,y,z,r,g,b,obj_type> where <x,y,z> is a point on the surface of the

object, <r,g,b> is the point's color, and <obj_type> is the type of

object that the point belongs to. The total number of samples should be

~1e9, for a total data size of ~1TB.
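
For concreteness, one sample could be stored as a fixed-width binary record. This is just a sketch of one possible layout (the field widths, and the choice of a numeric type code instead of a raw label string, are my assumptions, not part of the spec above):

```python
import struct

# One sample: three float32 coordinates, three uint8 color channels,
# and a uint16 object-type code (an index into a separate label table,
# so that every record stays the same size). Little-endian, no padding.
SAMPLE = struct.Struct("<fff3BH")

def pack_sample(x, y, z, r, g, b, obj_type_id):
    return SAMPLE.pack(x, y, z, r, g, b, obj_type_id)

def unpack_sample(buf):
    return SAMPLE.unpack(buf)

# Example record: a point at (12.5, -3.0, 1.75), yellowish, type 7.
record = pack_sample(12.5, -3.0, 1.75, 200, 180, 60, 7)
```

A billion of these would be written sequentially to flat files; the label table (obj_type_id -> string) travels alongside as a small sidecar.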

It seems that there are two possible approaches to generating this data:

1) Drive a LIDAR/EO sensor around a city and register <x,y,z,r,g,b> for

1e9 points. Then have a human go through and label each point's "object

type" by looking at the data or going out and looking at the object the

data were sampled from.

2) Get a bunch of freely available 3D models of the kinds of things you

find in an urban environment. Label them according to what kind of

object they are. Build a virtual urban landscape by throwing together

multiple copies of the labeled objects. Port the landscape to an

interactive 3D rendering engine. Build a virtual sensor that can sample

1e9 points from the model. Find some way to instrument the engine's

back-end so that it samples not only <x,y,z> and <r,g,b>, but also the

object label that we applied earlier.
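
The virtual-sensor step in #2 boils down to ray casting against geometry whose primitives carry labels. A minimal sketch, with a two-triangle toy scene invented purely for illustration (a real implementation would use an engine's or a mesh library's accelerated ray caster):

```python
def ray_triangle(orig, direc, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore ray/triangle intersection; returns distance t or None."""
    def sub(a, b):   return (a[0]-b[0], a[1]-b[1], a[2]-b[2])
    def dot(a, b):   return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]
    def cross(a, b): return (a[1]*b[2]-a[2]*b[1],
                             a[2]*b[0]-a[0]*b[2],
                             a[0]*b[1]-a[1]*b[0])
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direc, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                      # ray parallel to triangle
    inv = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv
    if u < 0 or u > 1:
        return None
    q = cross(s, e1)
    v = dot(direc, q) * inv
    if v < 0 or u + v > 1:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

# Each scene triangle carries its own color and object-type label.
scene = [
    (((0, 0, 5), (1, 0, 5), (0, 1, 5)), (128, 128, 128), "building"),
    (((0, 0, 2), (1, 0, 2), (0, 1, 2)), (255, 0, 0),     "car"),
]

def sample(orig, direc):
    """Return <x,y,z,r,g,b,obj_type> for the nearest hit, or None."""
    best = None
    for (v0, v1, v2), rgb, label in scene:
        t = ray_triangle(orig, direc, v0, v1, v2)
        if t is not None and (best is None or t < best[0]):
            best = (t, rgb, label)
    if best is None:
        return None
    t, (r, g, b), label = best
    x, y, z = (orig[i] + t * direc[i] for i in range(3))
    return (x, y, z, r, g, b, label)
```

Because the label rides along with the geometry rather than being baked into the rendered surface, it comes back from the intersection query for free; that is exactly the property the engine's back-end needs to preserve.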

Both of these approaches are costly to implement, but both are doable.

The advantage of #2 is that once we have an implementation in hand, the

cost of generating a new data set is relatively low. So, it is my plan

to go forward with #2. However, I am not an expert in 3D modeling or

virtual environments. Now that I have motivated the problem a little,

here are issues that I am looking for some advice on:

A) What is the best tool for rapidly modeling a 3D urban landscape,

given that I have access to lots of 3D models of cars, buildings, street

signs, etc?

B) Which virtual environment engine should I use to render and interact

with the model?

My feeling is that (A) can be satisfied by a number of existing tools.

The only real discriminator is that the tool I end up using must provide

some facility for associating an arbitrary string (the "object type"

label) with each object that I import into the environment. It must also

support an output format that maintains this label and matches the input

format for the answer to (B). The crux of the problem seems to be

simulating a sensor in (B) that can get hold of not only the

<x,y,z,r,g,b> values of surface points of the model, but also the label

associated with each. My intuition tells me that it is straightforward

(but not easy) to script, say, the Unreal Engine in a way that would allow

me to do the <x,y,z,r,g,b> sampling. The difficulty is that such engines

are built to render surfaces, and probably do not support much facility

for querying the backend for additional properties of collision points.

This is where I may find a huge flaw in my plan. Anybody have any advice

before I go off and do something ridiculous? If you were trying to solve

this problem, what would you do?
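
One workaround worth mentioning for engines that only render surfaces (this is my suggestion, not a feature any particular engine is known to expose): do a second "ID pass" in which every labeled object is re-rendered in a unique flat color with lighting disabled, read the framebuffer back, and decode each pixel's color into the object label. The encode/decode half of that trick is trivial:

```python
# Encode an object index into a unique 24-bit flat RGB color for an
# "ID pass" render, and decode framebuffer pixels back into indices.
def id_to_rgb(obj_id):
    return ((obj_id >> 16) & 0xFF, (obj_id >> 8) & 0xFF, obj_id & 0xFF)

def rgb_to_id(r, g, b):
    return (r << 16) | (g << 8) | b

# Hypothetical label table: object i is drawn with color id_to_rgb(i)
# in the ID pass, so a pixel read back as (r, g, b) maps to
# labels[rgb_to_id(r, g, b)].
labels = ["building", "car", "street_sign"]
```

With 24 bits there is room for ~16.7 million distinct objects, and the normal color pass plus the ID pass together yield <r,g,b> and <obj_type> for every sampled pixel without ever querying the engine's collision back-end.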

Requests for clarification or additional information will be answered as

quickly as possible.

Thanks in advance!
