User-uploaded images of an object, captured from different viewpoints, are first processed by a Structure-from-Motion (SfM) pipeline, which detects and matches the same visual features across those images. From these correspondences, the pipeline estimates the camera pose for each image and the depth of the matched points, from which a sparse 3D point cloud can be generated.
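The core geometric step here can be sketched in isolation. The snippet below is a minimal, self-contained illustration (not the actual pipeline code): it assumes two known camera matrices and a few synthetic 3D points standing in for matched features, then recovers the points from their two-view projections with linear (DLT) triangulation, the same operation SfM performs after pose estimation.

```python
import numpy as np

def project(P, X):
    """Project homogeneous 3D points X (4xN) with a 3x4 camera matrix P -> 2xN pixels."""
    x = P @ X
    return x[:2] / x[2]

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one feature match (x1 in view 1, x2 in view 2)."""
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)      # null space of A gives the homogeneous point
    X = Vt[-1]
    return X[:3] / X[3]

# Assumed intrinsics and poses (in practice SfM estimates these from the matches).
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])            # first camera at origin
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])  # second camera shifted along x

# Synthetic scene points standing in for matched features.
X_true = np.array([[0.2, 0.1, 4.0], [-0.3, 0.4, 5.0], [0.5, -0.2, 6.0]]).T
Xh = np.vstack([X_true, np.ones((1, X_true.shape[1]))])
x1, x2 = project(P1, Xh), project(P2, Xh)

X_rec = np.column_stack([triangulate_dlt(P1, P2, x1[:, i], x2[:, i])
                         for i in range(X_true.shape[1])])
print(np.allclose(X_rec, X_true, atol=1e-6))  # -> True
```

Real SfM additionally has to estimate `K`, the relative poses, and outlier-free matches (e.g. via RANSAC on the essential matrix) before triangulation, and then refines everything jointly with bundle adjustment.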
The next step is to create a dense 3D point cloud by matching many more pixels across images using Multi-View Stereo (MVS). With a dense point cloud, a 3D mesh model can then be built by triangulating the points with Delaunay triangulation, a geometric approach. The mesh model is then refined and textured, and the result is displayed to the user with WebGL/Three.js.
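To make the meshing step concrete, here is a simplified 2.5D sketch using SciPy (again illustrative, not the pipeline's code): a synthetic height-field point cloud stands in for the dense MVS output, and `scipy.spatial.Delaunay` triangulates the (x, y) projection so that each simplex becomes a triangle of indices into the 3D points.

```python
import numpy as np
from scipy.spatial import Delaunay

# Hypothetical stand-in for a dense point cloud: random (x, y) samples on a surface.
rng = np.random.default_rng(0)
xy = rng.uniform(-1, 1, size=(200, 2))
z = 0.3 * np.sin(2 * xy[:, 0]) * np.cos(2 * xy[:, 1])   # surface height at each sample
points3d = np.column_stack([xy, z])                      # (200, 3) point cloud

# 2.5D meshing: Delaunay-triangulate the (x, y) projection; each resulting
# simplex is a triangle whose entries index into the 3D point array.
tri = Delaunay(xy)
faces = tri.simplices                                    # (n_faces, 3) vertex indices
print(points3d.shape, faces.shape[1])
```

Production MVS meshing typically runs Delaunay in full 3D and carves the tetrahedra with visibility constraints; the projection trick above only works for roughly height-field-like surfaces, but it shows how a triangle mesh (vertices plus face indices, the format Three.js consumes) falls out of the point cloud.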
This pipeline is user-friendly because it requires no specialized cameras or sensors: it reconstructs 3D structure from a single monocular camera, without any depth sensor, purely through feature matching. Users only need to upload images taken with a regular monocular smartphone camera, and the pipeline still produces a high-resolution textured mesh.