An UpVector is simply a reference vector that helps consolidate a transform.
Very simply, think of a (rotation) transform as a set of three orthonormal arrows (your X, Y and Z vectors).
If you have only one arrow (usually considered the AtVector, or the dominant axis of any derivative/heuristic process), your transform could be any of an infinite number of transforms rolling around that axis.
When you give it an UpVector, you are literally telling your process “this is up”, which is why it’s often assumed to be world Y. In the case of aligning something that isn’t roll-sensitive (such as a circular spot light with no gobos) to a normal, you can take any vector that isn’t parallel to your AtVector, and you will be fine.
Once you have an At and an Up, you can always get a single consistent transform out of it.
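To make that concrete, here is a minimal sketch in plain Python of getting the three arrows out of an At and an Up with two cross products. The function names are made up for illustration, and treating At as the Z axis (with world Y as the default Up) is my assumption; your package may use a different dominant axis or handedness.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    if length < 1e-12:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(c / length for c in v)

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def basis_from_at_up(at, up=(0.0, 1.0, 0.0)):
    """Return three orthonormal arrows (X, Y, Z), with Z along `at`.

    Assumption: At maps to Z and the basis is right-handed.
    """
    z = normalize(at)
    # Side arrow: perpendicular to both Up and At.
    # If Up is parallel to At the cross product is zero and this raises.
    x = normalize(cross(up, z))
    # Re-derive the real up so it is exactly perpendicular to X and Z.
    y = cross(z, x)
    return (x, y, z)
```

Note that the Up you pass in does not have to be perpendicular to the At; it only removes the roll ambiguity, and the second cross product gives you the actual Y arrow.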
As for 4x4 matrices, they are an extension of the 3x3 that adds translation and the homogeneous coordinate terms.
If you only need a world rotation and don’t care about the position (e.g. infinite lights), you can just pump your 3x3 into a 4x4 and leave the remaining values as they are in an identity matrix.
In this case:
X1, X2, X3, 0
Y1, Y2, Y3, 0
Z1, Z2, Z3, 0
0, 0, 0, 1
That last row normally is your translation. So if you want your transform to also be positioned somewhere in space, pump a vector describing where it should be in that last row (again, I assume row matrices here, but I think that’s the case for Nuke).
The last column, 0,0,0,1, you can just ignore for now to save yourself some headaches; it keeps those values for plain rotation and translation transforms like these. It will come into play though if you start dealing with distortion and space projection (perspective cameras, particular types of transforms and so on).
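As a rough sketch of that layout, here is the 4x4 built from three axis vectors plus a position, and a point pushed through it. It assumes the row convention described above (a point is a row vector multiplied on the left of the matrix); the function names are hypothetical, and for a column-vector convention the matrix would be the transpose of this one.

```python
def to_matrix4(x, y, z, translate=(0.0, 0.0, 0.0)):
    """Pack three axis vectors and a position into a row-convention 4x4."""
    return [
        [x[0], x[1], x[2], 0.0],
        [y[0], y[1], y[2], 0.0],
        [z[0], z[1], z[2], 0.0],
        [translate[0], translate[1], translate[2], 1.0],
    ]

def transform_point(p, m):
    """Row vector times matrix: [px, py, pz, 1] * m."""
    row = (p[0], p[1], p[2], 1.0)
    out = [sum(row[i] * m[i][j] for i in range(4)) for j in range(4)]
    # The fourth component stays 1 because the last column is 0,0,0,1.
    return tuple(out[:3])

# A 90 degree turn around Y, positioned 5 units along world X.
m = to_matrix4((0.0, 0.0, -1.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0),
               translate=(5.0, 0.0, 0.0))
moved = transform_point((0.0, 0.0, 1.0), m)  # local Z tip, rotated then offset
```

Leaving `translate` at its default gives you the rotation-only matrix shown above.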
BTW: There are so many terms in this area that go over my head; terms like affine and heterogeneous mean nothing to me. What’s a good resource that goes through all this?
Wiki hopping imo, but Vince’s “mathematics for computer graphics 2nd edition” is also an excellent and not overwhelming intro to many of these concepts.
Principles of computer graphics (or something like that) is also sort of a staple read, but it’s very succinct, covers a lot of ground, and can be overwhelming almost instantly. It’s more of a reference for the already semi-educated.
The other problem you’re facing is that a lot of articles and books tend to treat subjects in their entirety too soon.
I.e. to know how to align an object to a normal you absolutely don’t need to know about homogeneous or heterogeneous transforms, matrix affinity or any of those things, but because they are an important part of matrices and transforms in the abstract, and you aren’t reading “applied” literature, you will get that kind of stuff shovelled at you.
If you keep studying you will eventually learn some of those things, but more importantly, you will develop filters in your brain to approach new subjects progressively, and an instinct for selectively ignoring what isn’t immediately necessary.
In my experience as a victim and a teacher both, I have found no other way around it but persistence and constant research, stubbornness, and getting advice when available.