
Friday 14 June 2013

Day 5: Demoing the demo

The details of Augmented Reality with Goblin XNA

Met today with the famed Andy Wise (wonderful chap), who gave a great demo of the system, going through the code and explaining how it all works.

ALVAR config files, models, marker panels

ALVAR is a software library that requires OpenCV, and in our AR demo it handles all the marker generation and recognition. It performs the recognition via an OpenCV-based edge detection algorithm. The fact that it relies solely on edge detection, rather than tracking salient points, has interesting effects on the end result, as discussed further down.

Each physical marker panel has a corresponding ALVAR config file, which contains info on which markers belong to which panel, a ratio of sizes, and scaling info for the model assigned to that panel.

A central config.txt file contains some info about the camera (framerate, X resolution, Y resolution), then a bunch of lines binding models to their corresponding ALVAR config files.
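To make the layout concrete, here is a tiny sketch of parsing such a file. This is not the project's actual parser, and the exact line format (one camera line, then "model = ALVAR config" bindings) is my assumption from the description above:

```python
# Illustrative sketch only: a hypothetical config.txt with one camera line
# (framerate, X res, Y res) followed by model -> ALVAR-config bindings.

def parse_config(text):
    """Parse camera settings plus model-to-ALVAR-config bindings."""
    lines = [l.strip() for l in text.splitlines() if l.strip()]
    framerate, xres, yres = (int(v) for v in lines[0].split())
    bindings = {}
    for line in lines[1:]:
        model, alvar_cfg = (part.strip() for part in line.split("="))
        bindings[model] = alvar_cfg
    return {"framerate": framerate,
            "resolution": (xres, yres),
            "bindings": bindings}

cfg = parse_config("""
15 800 600
dinosaur.fbx = panel1.alvar
castle.fbx   = panel2.alvar
""")
```

The real file may well differ in its separators and ordering; the point is simply that camera settings and panel bindings live in one central place.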

Terrible representation of how models, markers, marker panels, ALVAR config files and the central config file all correspond to each other

Because the main program is written in C# with XNA, while ALVAR is OpenCV-based and written in C, a marker detector wrapper bridges the gap for us.

GoblinXNA is an XNA wrapper for AR, written by a university in the USA. In terms of the power of its graphics it's pretty simplistic, but it is fantastic at handling scene graphs, which are the key to our AR implementation.

Main code for the AR demo program is in game1.cs

Scene graphs

This is my attempt at explaining the fascination that is the scene graph, prepare yourselves.

Behold! The scene graph


A scene graph is a data structure that arranges the logical representation of a scene. In XNA it is actually laid out as a tree, although we still call it a graph. Nodes represent the stuff in the scene, and the main rule is that the effect of a parent node is applied to all of its children. So in order to draw something, the system calculates the combined effect of everything above it by iterating down the tree. Take the image above, showing part of a simple scene graph: to draw the scene, the system follows the child nodes to "camera", "light source" and "transform", and deals with them all before the scene can be drawn. "Transform" iterates further down to a model and to another transform, which affects a few more models. In this hierarchical structure, changing the upper transform affects all three models, whereas changing the lower transform only affects its direct child models.
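The parent-affects-all-children rule can be sketched in a few lines. This toy version uses plain 2D translations instead of full 3D matrices, but the traversal logic is the same idea:

```python
# Minimal scene-graph sketch: each node carries a transform (here just a
# 2D translation for simplicity), and drawing walks the tree, composing
# each node's transform with everything inherited from its ancestors.

class Node:
    def __init__(self, name, translation=(0, 0)):
        self.name = name
        self.translation = translation
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

def draw(node, inherited=(0, 0), out=None):
    """Depth-first traversal: a parent's effect applies to all children."""
    if out is None:
        out = {}
    x = inherited[0] + node.translation[0]
    y = inherited[1] + node.translation[1]
    out[node.name] = (x, y)
    for child in node.children:
        draw(child, (x, y), out)
    return out

scene = Node("scene")
upper = scene.add(Node("upper_transform", translation=(10, 0)))
model_a = upper.add(Node("model_a"))
lower = upper.add(Node("lower_transform", translation=(0, 5)))
model_b = lower.add(Node("model_b"))

positions = draw(scene)
# Moving upper_transform shifts model_a AND model_b;
# moving lower_transform shifts only model_b.
```

In a real engine the "effect" is a 4x4 matrix multiplied down the tree (plus lights, materials and so on), but the hierarchy behaves exactly like this.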

Using a scene graph, one can get up to all sorts of cool stuff, for example deferred rendering: things further down the graph get rendered later on. Put global light sources at the top to be sorted out quickly, but if you have thousands of little lights that each affect only a small part of the scene, you can stick them at the bottom of the graph along with the models they affect. This leads to faster rendering of a huge number of light sources.

Code itself

The Initialize method creates the scene, the CreateLights method makes the two global lights, and setupMarker and friends mess about with ALVAR.

The hierarchy of marker nodes and their models in the scene graph is as follows: 
Marker nodes and the models, part of the scene graph
The marker node sits at the top, and the transform node positions the model (in the nodes below it) relative to the marker node. As the scene is drawn, moving the marker node causes the geometry to follow suit, and then the transform node affects the result on top of that. All thanks to the scene graph hierarchy. You can have any number of lights, transform nodes and geometry nodes under the marker node.

To implement dynamic models, just edit the model's transform nodes on every cycle of the scene.
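As a sketch of that idea: nudge the transform a little each tick and let the scene graph pick it up on the next draw. "TransformNode" here is my own stand-in class, not GoblinXNA's:

```python
import math

# Sketch of animating a model by editing its transform node every frame.
# TransformNode is a made-up stand-in for the engine's transform node.

class TransformNode:
    def __init__(self):
        self.rotation_y = 0.0  # radians

def update(transform, elapsed_seconds, spin_rate=math.pi):
    # Each tick, advance the model's rotation by spin_rate * dt; the
    # scene graph applies the new transform on the next draw.
    transform.rotation_y = (transform.rotation_y
                            + spin_rate * elapsed_seconds) % (2 * math.pi)

node = TransformNode()
for _ in range(30):          # thirty ticks at ~1/60 s each
    update(node, 1 / 60)
# After half a second at pi rad/s the model has turned ~90 degrees.
```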

A quaternion is a representation of rotation, where three rotational degrees of freedom are expressed in only four numbers.
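A quick worked example of the "four numbers" claim: a unit quaternion (w, x, y, z) encoding a 90-degree rotation about the z-axis, applied to a vector via the standard q * v * q_conjugate sandwich:

```python
import math

def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def rotate(q, v):
    """Rotate vector v by unit quaternion q via q * v * conj(q)."""
    qc = (q[0], -q[1], -q[2], -q[3])              # conjugate
    w, x, y, z = quat_mul(quat_mul(q, (0.0, *v)), qc)
    return (x, y, z)

half = math.radians(90) / 2
q = (math.cos(half), 0.0, 0.0, math.sin(half))    # 90 degrees about z
rotated = rotate(q, (1.0, 0.0, 0.0))
# (1, 0, 0) rotated 90 degrees about z lands on (0, 1, 0), up to float error.
```

This is also why engines like quaternions: unlike Euler angles they don't suffer from gimbal lock, and they interpolate smoothly.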

The render cycle is as follows: run Initialize, then loop the Update and Draw methods. Update executes on every tick regardless, whereas Draw only actually executes in a given cycle if the graphics card is ready - no double buffering.
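The loop shape can be sketched like so. The "GPU ready every other tick" predicate is purely illustrative, just to show Draw being skipped while Update keeps firing:

```python
# Toy sketch of the cycle described above: Initialize runs once, then
# Update fires on every tick while Draw only fires when the graphics
# card reports ready (simulated here as every other tick).

def run(ticks, gpu_ready):
    log = ["Initialize"]
    for tick in range(ticks):
        log.append("Update")           # always runs
        if gpu_ready(tick):
            log.append("Draw")         # skipped when the GPU isn't ready
    return log

log = run(4, gpu_ready=lambda tick: tick % 2 == 0)
# -> ['Initialize', 'Update', 'Draw', 'Update', 'Update', 'Draw', 'Update']
```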

Models in .fbx format tend to import well: click Content -> Add and select a model. There are many free online repositories.

With AR, the complex part is the tracking of the marker nodes. Everything else, like the scene graph, needs to be simple to do, which is what XNA provides. To port the code to Mac I need to get ALVAR working on Mac, but it's a .dll so that poses a problem. Will need to research whether anyone else has ported it!

AnimatedModel.cs contains code that loads animated models, pretty simplistic.

If you wanted to rewrite the AR demo:
  1. Would need to choose a graphics engine, e.g. Unity, UDK (Unreal Engine) or CryEngine. The current engine is XNA, which gives us all the model loading, sound support, lights, effects, full scenes and all that.
  2. Would need to write a marker tracker that integrates into the engine - at the very least, just stick the tracking code in the "Update" part of whatever engine it is: grab an x,y of where the marker currently is, and process stuff based on that.
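Point 2 might look something like this in any engine's per-frame Update. Every name here is made up for illustration; the detector stands in for ALVAR behind whatever wrapper the engine needs:

```python
# Hypothetical per-frame hook: run the marker tracker once per frame and
# drive a scene object from the result. All names are illustrative.

class SceneObject:
    def __init__(self):
        self.position = (0.0, 0.0)
        self.visible = False

def update(scene_object, detect_marker, frame):
    """Run tracking this frame; move or hide the model accordingly."""
    hit = detect_marker(frame)        # e.g. ALVAR behind a wrapper
    if hit is None:
        scene_object.visible = False  # marker lost: hide the model
    else:
        scene_object.visible = True
        scene_object.position = hit   # (x, y) of the detected marker

obj = SceneObject()
update(obj, lambda frame: (120.0, 80.0), frame=None)   # marker found
```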
Limiting factors of the AR demo:
  • Changing the graphics card, drivers etc.
  • CPU speed makes the most difference when it comes to the actual tracking
  • Framerate of the camera - this caps the maximum speed a marker can move before tracking stops, and it is also the reason the graphics card's performance is not itself a limiting factor!
Framerate is interesting: the current 15 FPS means that beyond a certain speed of waving a marker card about, the camera can't provide a clear enough image for the program to track markers, due to motion blur. This is an issue when you detect edges, but other techniques such as optical flow tracking (as used in SLAM, Simultaneous Localisation And Mapping) assume the camera itself is moving and instead track salient features. That makes them robust to temporary motion blur: rather than trying to detect the markers afresh in every single frame, they follow the motion of certain points, e.g. the corners of the markers, which are much easier to keep hold of under blur because their effect is larger scale.
  • Resolution - the opposite of framerate: it limits how far back the marker can be from the camera before tracking stops. But a marker further back also suffers less motion blur at a given framerate, increasing robustness to speed. Need to balance resolution and framerate!
If we were to change the camera, we would need one with on-board processing, rather than one that offloads compression of the captured video onto the CPU. If we tried 1080p video without on-board processing, the processor would be under very heavy load just compressing every frame constantly, and XNA can't handle that much data.

New TV, Repositories, and Investigating Windows Laptop part 3

Tested out the new 42" TV that arrived yesterday, all works well and good. A pretty advanced bit of machinery, for reasons unbeknownst to me it comes equipped with an ethernet port no less!

In terms of setting up a git or mercurial repository for the project, I'll likely need some kind of private repository so that we can keep an up-to-date version of the XNA code in a single place whilst adhering to the University's privacy policies. It will need to handle multiple versions of the code, namely for Windows and Mac.

Further debugged, with Andy's help, the Acer Windows laptop that wasn't running the AR demo. The silent crashes I have been experiencing apparently indicate one of two things:
  1. A problem with the graphics drivers, or
  2. A problem with the camera
Seeing as the laptop sports terrible Intel GMA 4500MHD integrated graphics, the graphics drivers may well be a factor, and if so the new laptop should have a newer Intel chip: an HD 3000 (Sandy Bridge) or HD 4000 (Ivy Bridge) at least. Unfortunately neither option comes cheap in notebooks. Alternatively, any dedicated chip from AMD or Nvidia, or even an AMD APU, is likely to work - but in all cases you can never know until you try.

Before going the route of a more graphically advanced laptop though, the webcam must be freed of doubt. The Acer laptop has an in-built camera (as well as the attached external webcam we usually use for AR) so there may be a mixup between the two: the current code utilises webcam "0" and sets it up at a framerate of 15 frames a second and a resolution of 800 x 600. If that is referencing the internal camera we don't know if it can handle that framerate. The external webcam definitely works with those settings, so we need to ensure the code is calling the external webcam and not the internal one.
Possible fixes: 
  • Change the camera reference in the code from "0" to something that points at the external cam
  • Change the settings to a framerate and resolution the internal camera supports
  • Uninstall the internal camera completely to ensure the code can only be referencing the external one
These shall be investigated next time.

Another possibility is DLLs expiring themselves - ALVAR in particular is under suspicion. Although in that case, I have already reinstalled ALVAR, so it probably isn't the problem here.


Now I just need to add some new dynamic and static models to the system well before next Friday!
