OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis

We present OmniAvatar, a novel geometry-guided 3D head synthesis model trained from in-the-wild unstructured images that is capable of synthesizing diverse identity-preserved 3D heads with compelling dynamic details under fully disentangled control over camera poses, facial expressions, head shapes, and articulated neck and jaw poses. To achieve such a high level of disentangled control, we first explicitly define a novel semantic signed distance function (SDF) around a head geometry (FLAME) conditioned on the control parameters. This semantic SDF allows us to build a differentiable volumetric correspondence map from the observation space...
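
As a rough illustration of a parameter-conditioned SDF, the sketch below defines a toy "semantic" SDF around an ellipsoidal head whose semi-axes stand in for shape parameters, and derives a simple observation-to-surface correspondence by walking along the SDF gradient. The ellipsoid and all names here are illustrative stand-ins for the FLAME-conditioned field, not the paper's implementation.

```python
# Toy sketch of a parameter-conditioned ("semantic") SDF, assuming a head
# approximated by an ellipsoid whose semi-axes are driven by shape
# parameters; `shape` is a hypothetical 3-vector stand-in for FLAME params.
import numpy as np

def semantic_sdf(points, shape):
    """Approximate signed distance to an ellipsoid with semi-axes `shape`."""
    r = np.linalg.norm(points / shape, axis=-1)  # <1 inside, >1 outside
    return (r - 1.0) * np.min(shape)             # first-order distance approx.

def correspondence(points, shape, eps=1e-4):
    """Map observation-space points toward the surface along the SDF gradient."""
    grad = np.zeros_like(points)
    for i in range(3):  # finite-difference gradient
        d = np.zeros(3); d[i] = eps
        grad[:, i] = (semantic_sdf(points + d, shape)
                      - semantic_sdf(points - d, shape)) / (2 * eps)
    grad /= np.linalg.norm(grad, axis=-1, keepdims=True)
    return points - semantic_sdf(points, shape)[:, None] * grad

pts = np.random.randn(5, 3)
print(correspondence(pts, shape=np.array([0.9, 1.2, 1.0])))
```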

PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°

Synthesis and reconstruction of 3D human heads have gained increasing interest in computer vision and computer graphics recently. Existing state-of-the-art 3D generative adversarial networks (GANs) for 3D human head synthesis are either limited to near-frontal views or struggle to preserve 3D consistency at large view angles. We propose PanoHead, the first 3D-aware generative model that enables high-quality, view-consistent image synthesis of full heads in 360° with diverse appearance and detailed geometry, using only in-the-wild unstructured images for training. At its core, we lift the representation power of recent 3D...
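
A full-360° model must be trained and rendered with camera poses all around the head. The sketch below shows one standard way to sample such poses as look-at matrices orbiting the head center; the construction is generic, not PanoHead's code.

```python
# Minimal sketch of sampling full-360-degree camera poses for view-consistent
# head rendering; a standard look-at construction, with made-up radius.
import numpy as np

def look_at(yaw, pitch, radius=2.7):
    """Camera-to-world pose orbiting the head center at a given yaw/pitch."""
    eye = radius * np.array([np.sin(yaw) * np.cos(pitch),
                             np.sin(pitch),
                             np.cos(yaw) * np.cos(pitch)])
    fwd = -eye / np.linalg.norm(eye)             # look at the origin
    right = np.cross(fwd, [0.0, 1.0, 0.0])
    right /= np.linalg.norm(right)
    up = np.cross(right, fwd)
    pose = np.eye(4)
    pose[:3, :3] = np.stack([right, up, -fwd], axis=1)  # OpenGL-style axes
    pose[:3, 3] = eye
    return pose

# The back of the head is just another yaw sample, e.g. 180 degrees:
print(look_at(np.pi, 0.1))
```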

AgileAvatar: Stylized 3D Avatar Creation via Cascaded Domain Bridging

Stylized 3D avatars have become increasingly prominent in modern life. Creating these avatars manually usually involves laborious selection and adjustment of continuous and discrete parameters and is time-consuming for average users. Self-supervised approaches that automatically create 3D avatars from user selfies promise high quality with little annotation cost, but fall short when applied to stylized avatars due to a large style domain gap. We propose a novel self-supervised learning framework to create high-quality stylized 3D avatars with a mix of continuous and discrete parameters. Our cascaded domain bridging framework...

AgileGAN: Stylizing Portraits by Inversion-Consistent Transfer Learning

Portraiture as an art form has evolved from realistic depiction into a plethora of creative styles. While substantial progress has been made in automated stylization, generating high-quality stylistic portraits is still a challenge, and even the recently popular Toonify suffers from several artifacts when used on real input images. Such StyleGAN-based methods have focused on finding the best latent inversion mapping for reconstructing input images; however, our key insight is that this does not lead to good generalization across different portrait styles. Hence we propose AgileGAN, a framework that...
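
For context on the inversion discussion, the following minimal sketch shows the generic latent-inversion loop that StyleGAN-based methods build on: optimize a latent code so the generator reproduces the input image. Here `G`, the latent size, and the plain MSE objective are toy placeholders, not AgileGAN's actual generator or losses.

```python
# Sketch of the latent-inversion step StyleGAN-based stylization builds on:
# optimize a latent code w so the generator reproduces the input image.
import torch

G = torch.nn.Sequential(torch.nn.Linear(64, 256), torch.nn.Tanh(),
                        torch.nn.Linear(256, 3 * 32 * 32))  # toy "generator"
target = torch.rand(3 * 32 * 32)                            # toy "input photo"

w = torch.zeros(64, requires_grad=True)                     # latent code
opt = torch.optim.Adam([w], lr=0.05)
for step in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(G(w), target)       # reconstruction
    loss.backward()
    opt.step()
print(f"final reconstruction loss: {loss.item():.4f}")
```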

HoliCity: A City-Scale Data Platform for Learning Holistic 3D Structures

We present HoliCity, a city-scale 3D dataset with rich structural information. Currently, this dataset has 6,300 real-world panoramas of resolution 13312 × 6656 that are accurately aligned with the CAD model of downtown London covering an area of more than 20 km², in which the median reprojection error of the alignment of an average image is less than half a degree. This dataset aims to be an all-in-one data platform for research on learning abstracted high-level holistic 3D structures that can be derived from city CAD models, e.g., corners,...
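
To make the "half a degree" alignment figure concrete, here is a hedged sketch of how a median angular reprojection error could be measured: the angle between the camera ray through an annotated pixel and the ray to the corresponding CAD point. The intrinsics `K` and the data below are made up for illustration.

```python
# Sketch of a median angular reprojection error (degrees): angle between the
# ray to a projected CAD point and the ray through its annotated pixel.
import numpy as np

K = np.array([[512.0, 0, 512], [0, 512, 512], [0, 0, 1]])  # hypothetical

def angular_error_deg(X_cam, px):
    """X_cam: (N, 3) CAD points in camera frame; px: observed pixels (N, 2)."""
    rays_obs = np.linalg.solve(K, np.c_[px, np.ones(len(px))].T).T
    rays_obs /= np.linalg.norm(rays_obs, axis=1, keepdims=True)
    rays_cad = X_cam / np.linalg.norm(X_cam, axis=1, keepdims=True)
    cos = np.clip((rays_obs * rays_cad).sum(axis=1), -1, 1)
    return np.degrees(np.arccos(cos))

X = np.random.rand(100, 3) + [0, 0, 5]                          # toy points
px = (K @ (X / X[:, 2:]).T).T[:, :2] + np.random.randn(100, 2)  # noisy pixels
print("median error (deg):", np.median(angular_error_deg(X, px)))
```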

DynOcc: Learning Single-View Depth from Dynamic Occlusion Cues

Recently, significant progress has been made in single-view depth estimation thanks to increasingly large and diverse depth datasets. However, these datasets are largely limited to specific application domains (e.g., indoor scenes, autonomous driving) or to static in-the-wild scenes, due to hardware constraints or technical limitations of 3D reconstruction. In this paper, we introduce DynOcc, the first depth dataset consisting of dynamic in-the-wild scenes. Our approach leverages the occlusion cues in these dynamic scenes to infer depth relationships between points in selected video frames. To achieve accurate occlusion detection and depth order estimation,...
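
A common way to turn such occlusion-derived depth orderings into supervision is a pairwise ranking loss; the sketch below shows one plausible form (the exact loss used by DynOcc may differ).

```python
# Sketch of an ordinal loss over occlusion-derived depth relations: for a
# pair (i, j) with label r = +1 (i closer) or -1, penalize predictions that
# violate the ordering (log-ranking form, assumed for illustration).
import torch

def ordinal_depth_loss(depth, pairs, labels):
    """depth: (H, W) prediction; pairs: (N, 4) pixel pairs; labels: (N,)."""
    di = depth[pairs[:, 0], pairs[:, 1]]
    dj = depth[pairs[:, 2], pairs[:, 3]]
    # r = +1 means point i is in front of (closer than) point j.
    return torch.log1p(torch.exp(-labels * (dj - di))).mean()

depth = torch.rand(8, 8, requires_grad=True)
pairs = torch.randint(0, 8, (16, 4))
labels = torch.where(torch.rand(16) > 0.5, 1.0, -1.0)
print(ordinal_depth_loss(depth, pairs, labels))
```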

Deep Generative Modeling for Scene Synthesis via Hybrid Representations

We present a deep generative scene modeling technique for indoor environments. Our goal is to train a generative model, using a feed-forward neural network, that maps a prior distribution (e.g., a normal distribution) to the distribution of primary objects in indoor scenes. We introduce a 3D object arrangement representation that models the locations and orientations of objects based on their size and shape attributes. Moreover, our scene representation is applicable to 3D objects with different multiplicities (repetition counts), selected from a database. We show a principled way to train this...
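
The feed-forward idea can be sketched in its simplest form: a small decoder maps normal-prior samples to per-object arrangement vectors, with a presence flag standing in for variable multiplicity. The architecture and the 5-D encoding below are illustrative assumptions, not the paper's representation.

```python
# Minimal sketch: a network maps samples from a normal prior to
# object-arrangement vectors (location, orientation, presence).
import torch

n_objects = 10
decoder = torch.nn.Sequential(
    torch.nn.Linear(128, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, n_objects * 5))  # (x, y, cos, sin, presence) each

z = torch.randn(4, 128)                   # prior samples for 4 scenes
scenes = decoder(z).view(4, n_objects, 5)
xy, angle = scenes[..., :2], torch.atan2(scenes[..., 3], scenes[..., 2])
present = torch.sigmoid(scenes[..., 4]) > 0.5   # variable multiplicity
print(xy.shape, angle.shape, present.sum().item())
```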

Transformable Bottleneck Networks

We propose a novel approach to performing fine-grained 3D manipulation of image content via a convolutional neural network, which we call the Transformable Bottleneck Network (TBN). It applies given spatial transformations directly to a volumetric bottleneck within our encoder-bottleneck-decoder architecture. Multi-view supervision encourages the network to learn to spatially disentangle the feature space within the bottleneck. The resulting spatial structure can be manipulated with arbitrary spatial transformations. We demonstrate the efficacy of TBNs for novel view synthesis, achieving state-of-the-art results on a challenging benchmark. We demonstrate that the bottlenecks produced...
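
The core operation, resampling a volumetric bottleneck under a spatial transformation, can be sketched with standard grid resampling; the snippet below rotates a toy feature volume by 90°. Shapes and the affine_grid/grid_sample route are illustrative, not the paper's exact code.

```python
# Sketch of transforming a volumetric bottleneck: resample the encoder's
# feature volume under a rigid transform before decoding.
import torch
import torch.nn.functional as F

vol = torch.randn(1, 32, 16, 16, 16)            # (N, C, D, H, W) bottleneck

theta = torch.tensor([[[0.0, 0.0, 1.0, 0.0],    # 90-degree rotation about
                       [0.0, 1.0, 0.0, 0.0],    # the vertical axis, as a
                       [-1.0, 0.0, 0.0, 0.0]]]) # 3x4 affine matrix
grid = F.affine_grid(theta, vol.shape, align_corners=False)
rotated = F.grid_sample(vol, grid, align_corners=False)
print(rotated.shape)  # (1, 32, 16, 16, 16), features rotated in space
```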

Dynamic Kernel Distillation for Efficient Pose Estimation in Videos

Existing video-based human pose estimation methods extensively apply large networks to every frame in a video to localize body joints; they suffer from high computational cost and can hardly meet the low-latency requirements of realistic applications. To address this issue, we propose a novel Dynamic Kernel Distillation (DKD) model that enables small networks to estimate human poses in videos, thus significantly improving efficiency. In particular, DKD introduces a lightweight distillator to distill pose kernels online by leveraging temporal cues from the previous frame in a one-shot feed-forward manner. Then, DKD simplifies...
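
The sketch below illustrates what distilling pose kernels online could look like in the simplest case: a lightweight head predicts a per-channel convolution kernel from the previous frame's features and applies it to the current frame. All sizes and the kernel-predictor design are assumptions, not DKD's architecture.

```python
# Sketch of the one-shot "pose kernel" idea: predict a convolution kernel
# from frame t-1's features, then apply it to frame t's features.
import torch
import torch.nn.functional as F

C, K = 16, 3
distillator = torch.nn.Linear(C, C * K * K)     # toy kernel predictor

prev_feat = torch.randn(1, C, 32, 32)           # features from frame t-1
cur_feat = torch.randn(1, C, 32, 32)            # features from frame t

summary = prev_feat.mean(dim=(2, 3))            # (1, C) temporal cue
kernel = distillator(summary).view(C, 1, K, K)  # one KxK kernel per channel
heatmaps = F.conv2d(cur_feat, kernel, padding=K // 2, groups=C)
print(heatmaps.shape)                           # (1, C, 32, 32)
```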

Extreme Relative Pose Estimation for RGB-D Scans via Scene Completion

Estimating the relative rigid pose between two RGB-D scans of the same underlying environment is a fundamental problem in computer vision, robotics, and computer graphics. Most existing approaches allow only limited relative pose changes since they require considerable overlap between the input scans. We introduce a novel approach that extends the scope to extreme relative poses, with little or even no overlap between the input scans. The key idea is to infer more complete scene information about the underlying environment and match on the completed scans. In particular, instead of...

Stabilized Real-time Face Tracking via a Learned Dynamic Rigidity Prior

Despite the popularity of real-time monocular face tracking systems in many successful applications, one overlooked problem with these systems is rigid instability. It occurs when the input facial motion can be explained by either a head pose change or a facial expression change, creating ambiguities that often lead to jittery and unstable rigid head poses under large expressions. Existing rigid stabilization methods either employ a heavy anatomically-motivated approach that is unsuitable for real-time applications, or utilize heuristic-based rules that can be problematic under certain expressions. We propose the first rigid stabilization method...

3D Hair Synthesis Using Volumetric Variational Autoencoders

Recent advances in single-view 3D hair digitization have made the creation of high-quality CG characters scalable and accessible to end-users, enabling new forms of personalized VR and gaming experiences. To handle the complexity and variety of hair structures, most cutting-edge techniques rely on the successful retrieval of a particular hair model from a comprehensive hair database. Not only are these data-driven methods storage intensive, but they are also prone to failure on highly unconstrained input images, exotic hairstyles, or failed face detection. Instead of using a large collection of 3D...
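
A compact sketch of a volumetric VAE over a hair occupancy grid is given below; the real model also encodes a 3D orientation field, and all layer sizes here are made up.

```python
# Compact sketch of a volumetric VAE over a hair occupancy grid.
import torch
import torch.nn as nn

class VolumeVAE(nn.Module):
    def __init__(self, latent=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv3d(1, 8, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(8, 16, 4, stride=2, padding=1), nn.ReLU(), nn.Flatten())
        self.mu = nn.Linear(16 * 8 ** 3, latent)
        self.logvar = nn.Linear(16 * 8 ** 3, latent)
        self.dec = nn.Sequential(
            nn.Linear(latent, 16 * 8 ** 3), nn.Unflatten(1, (16, 8, 8, 8)),
            nn.ConvTranspose3d(16, 8, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(8, 1, 4, stride=2, padding=1))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(z), mu, logvar

occ = torch.rand(2, 1, 32, 32, 32)   # toy occupancy volumes
recon, mu, logvar = VolumeVAE()(occ)
print(recon.shape)                   # (2, 1, 32, 32, 32)
```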

Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency

In this paper, we introduce a novel unsupervised domain adaptation technique for the task of 3D keypoint prediction from a single depth scan/image. Our key idea is to utilize the fact that predictions from different views of the same or similar objects should be consistent with each other. Such view consistency provides effective regularization for keypoint prediction on unlabeled instances. In addition, we introduce a geometric alignment term to regularize predictions in the target domain. The resulting loss function can be effectively optimized via alternating minimization. We demonstrate the effectiveness...
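
The view-consistency idea can be sketched directly: keypoints predicted independently from two views of the same object, once mapped into a common frame by the known relative rotation, should coincide. The toy loss below assumes predictions live in each view's camera frame.

```python
# Sketch of a view-consistency loss for 3D keypoint predictions; R_ab is
# the known relative rotation between the two (toy) views.
import numpy as np

def view_consistency_loss(kp_a, kp_b, R_ab):
    """kp_a, kp_b: (K, 3) predictions in each view's camera frame."""
    kp_b_in_a = kp_b @ R_ab.T          # transform view-b prediction into view a
    return np.mean(np.sum((kp_a - kp_b_in_a) ** 2, axis=1))

theta = np.pi / 6
R_ab = np.array([[np.cos(theta), 0, np.sin(theta)],
                 [0, 1, 0],
                 [-np.sin(theta), 0, np.cos(theta)]])
kp = np.random.rand(10, 3)
print(view_consistency_loss(kp, kp @ R_ab, R_ab))  # ~0 for consistent preds
```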

StarMap for Category-Agnostic Keypoint and Viewpoint Estimation

Semantic keypoints provide concise abstractions for a variety of visual understanding tasks. Existing methods define semantic keypoints separately for each category with a fixed number of semantic labels. As a result, these keypoints are not suitable when objects have a varying number of parts, e.g., chairs with varying numbers of legs. We propose a category-agnostic keypoint representation in which keypoints are encoded with their 3D locations in canonical object views. Our intuition is that the 3D locations of keypoints in canonical object views contain rich semantic and compositional information. Our representation thus...
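
One way to realize such a category-agnostic readout, sketched below under an assumed output layout, is a single shared heatmap that marks keypoint pixels plus three channels storing each keypoint's canonical-view 3D location, so the keypoint count can vary freely per object. (A real model would extract local maxima rather than simply thresholding.)

```python
# Sketch of reading out a category-agnostic keypoint representation: one
# shared heatmap plus per-pixel canonical 3D coordinates (layout assumed).
import numpy as np

def extract_keypoints(heatmap, xyz_map, thresh=0.5):
    """heatmap: (H, W); xyz_map: (H, W, 3) canonical 3D coordinates."""
    ys, xs = np.where(heatmap > thresh)   # all keypoints share one channel,
    return xyz_map[ys, xs]                # so the count can vary per object

H = W = 64
heatmap = np.zeros((H, W)); xyz = np.zeros((H, W, 3))
heatmap[10, 20] = heatmap[40, 50] = 1.0               # two detected keypoints
xyz[10, 20] = [0.1, 0.5, -0.2]; xyz[40, 50] = [-0.3, 0.5, 0.2]
print(extract_keypoints(heatmap, xyz))                # canonical 3D locations
```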

Deep Volumetric Video From Very Sparse Multi-View Performance Capture

We present a deep learning-based volumetric capture approach for performance capture using a passive and highly sparse multi-view capture system. We focus on template-free, per-frame 3D surface reconstruction from as few as three RGB sensors, where conventional visual hull or multi-view stereo methods would fail. State-of-the-art performance capture systems require either pre-scanned actors, a large number of cameras, or active sensors. We introduce a novel multi-view Convolutional Neural Network (CNN) that maps 2D images to a 3D volumetric field encoding the probabilistic distribution of surface points of the captured...

AutoScaler: Scale-Attention Networks for Visual Correspondence

Finding visual correspondence between local features is key to many computer vision problems. While defining features over larger contextual scales usually implies greater discriminativeness, it can also reduce the spatial accuracy of the features. We propose AutoScaler, a scale-attention network to explicitly optimize this trade-off in visual correspondence tasks. Our architecture consists of a […]
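
The scale-attention trade-off can be sketched as a per-position softmax over features extracted at several scales; the 1×1-conv scorer below is illustrative, not the paper's head.

```python
# Sketch of scale attention: per-position softmax weights over features
# extracted at several scales, trading context for spatial precision.
import torch

scales, C, H, W = 3, 16, 32, 32
feats = torch.randn(scales, C, H, W)         # same location, three scales
scorer = torch.nn.Conv2d(C, 1, kernel_size=1)

scores = torch.cat([scorer(f[None]) for f in feats])   # (scales, 1, H, W)
attn = torch.softmax(scores, dim=0)                    # weights sum to 1
fused = (attn * feats).sum(dim=0)                      # (C, H, W)
print(fused.shape, attn[:, 0, 0, 0])
```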

Customized Expression Recognition for Performance-Driven Cutout Character Animation

[Figure: Performance-driven cutout character animation. Actors perform customized expressions, e.g. "disdainful" (top) and "daydreaming" (bottom), to animate the expressions of various cutout characters; note the large inter-person expression variations even within the same expression category.] Performance-driven character animation enables users to create expressive results by performing the desired motion of […]

High-Quality Hair Modeling from A Single Portrait Photo

We propose a novel system to reconstruct a high-quality hair depth map from a single portrait photo with minimal user input. We achieve this by combining depth cues such as occlusions, silhouettes, and shading, with a novel 3D helical structural prior for hair reconstruction. We fit a parametric morphable face model to the input photo and construct a base shape in the face, hair and body regions using occlusion and silhouette constraints. We then estimate the normals in the hair region via a Shape-from-Shading-based optimization that uses the lighting inferred...

Level-Set-Based Partitioning and Packing Optimization of a Printable Model

As 3D printing technology starts to revolutionize our daily life and the manufacturing industries, a critical problem is about to emerge: how can we find an automatic way to divide a 3D model into multiple printable pieces, so as to save space, reduce printing time, or make a large model printable by small printers? In this paper, we present a systematic study on the partitioning and packing of 3D models under the multi-phase level set framework. We first construct analysis tools to evaluate the qualities...

Single-View Hair Modeling Using A Hairstyle Database

Hair digitization has been one of the most critical and challenging tasks in creating virtual avatars. Most existing hair modeling techniques require either expensive capture devices or tedious manual effort. In this paper, we present a data-driven approach to create a complex 3D hairstyle from a single-view photograph. We first build a database of 343 manually created 3D example hairstyles gathered from online repositories. Given a reference photo of the target hairstyle and a few user strokes as guidance, we automatically search for several best-matching...

Capturing Braided Hairstyles

From fishtail to princess braids, these intricately woven structures define an important and popular class of hairstyle, frequently used for digital characters in computer graphics. In addition to the challenges created by the infinite range of styles, existing modeling and capture techniques are particularly constrained by the geometric and topological complexities. We propose a data-driven method to automatically reconstruct braided hairstyles from input data obtained from a single consumer RGB-D camera. Our approach covers the large variation of repetitive braid structures using a family of compact procedural braid models. From...
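
A toy instance of a compact procedural braid model is sketched below: each strand of a 3-strand braid follows phase-shifted sinusoids around a center line, parameterized by amplitude and pitch. The specific formula is illustrative, not the paper's model family.

```python
# Toy procedural model of a 3-strand braid: phase-shifted sinusoids around
# a straight center line, with compact (amplitude, pitch, phase) parameters.
import numpy as np

def braid_strand(i, n=3, length=10.0, amp=0.5, pitch=2.0, samples=200):
    t = np.linspace(0, length, samples)
    phase = 2 * np.pi * i / n
    x = amp * np.cos(2 * np.pi * t / pitch + phase)
    y = 0.5 * amp * np.sin(4 * np.pi * t / pitch + 2 * phase)
    return np.stack([x, y, t], axis=1)     # strand polyline along +z

strands = [braid_strand(i) for i in range(3)]
print(strands[0].shape)  # (200, 3) points per strand
```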

Robust Hair Capture Using Simulated Examples

We introduce a data-driven hair capture framework based on example strands generated through hair simulation. Our method can robustly reconstruct faithful 3D hair models from unprocessed input point clouds with large amounts of outliers. Current state-of-the-art techniques use geometrically-inspired heuristics to derive global hair strand structures, which can yield implausible hair strands for hairstyles involving large occlusions, multiple layers, or wisps of varying lengths. We address this problem using a voting-based fitting algorithm to discover structurally plausible configurations among the locally grown hair segments from a database of simulated examples....

3D Self-Portraits

We develop an automatic pipeline that allows ordinary users to capture complete and fully textured 3D models of themselves in minutes, using only a single Kinect sensor, in the uncontrolled lighting environment of their own home. Our method requires neither a turntable nor a second operator, and is robust to the small deformations and changes of pose that inevitably arise during scanning. After users rotate themselves, holding the same pose, through a few scans from different views, our system stitches together the captured scans using multi-view non-rigid registration, and...

Structure-Aware Hair Capture

Existing hair capture systems fail to produce strands that reflect the structures of real-world hairstyles. We introduce a system that reconstructs coherent and plausible wisps aware of the underlying hair structures from a set of still images without any special lighting. Our system first discovers locally coherent wisp structures in the reconstructed point cloud and the 3D orientation field, and then uses a novel graph data structure to reason about both the connectivity and directions of the local wisp structures in a global optimization. The wisps are then completed and...

Wide-baseline Hair Capture using Strand-based Refinement

We propose a novel algorithm to reconstruct the 3D geometry of human hairs in wide-baseline setups using strand-based refinement. The hair strands are first extracted in each 2D view, and projected onto the 3D visual hull for initialization. The 3D positions of these strands are then refined by optimizing an objective function that takes into account cross-view hair orientation consistency, the visual hull constraint and smoothness constraints defined at the strand, wisp and global levels. Based on the refined strands, the algorithm can reconstruct an approximate hair surface: experiments with...
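
The structure of the described refinement objective can be written out as follows; the notation is generic and the weights are placeholders, not the paper's exact formulation.

```latex
% Hedged sketch of the strand-refinement objective's structure:
\[
  E(\{p_k\}) \;=\; E_{\text{orient}}
  \;+\; \lambda_{\text{hull}}\, E_{\text{hull}}
  \;+\; \lambda_{\text{strand}}\, E_{\text{strand}}
  \;+\; \lambda_{\text{wisp}}\, E_{\text{wisp}}
  \;+\; \lambda_{\text{global}}\, E_{\text{global}},
\]
% where E_orient enforces cross-view hair-orientation consistency,
% E_hull keeps refined strand points near the visual hull, and the last
% three terms impose smoothness at the strand, wisp, and global levels.
```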

Chopper: Partitioning Models into 3D-Printable Parts

3D printing technology is rapidly maturing and becoming ubiquitous. One of the remaining obstacles to wide-scale adoption is that the object to be printed must fit into the working volume of the 3D printer. We propose a framework, called Chopper, to decompose a large 3D object into smaller parts so that each part fits into the printing volume. These parts can then be assembled to form the original object. We formulate a number of desirable criteria for the partition, including assemblability, having few components, unobtrusiveness of the seams, and structural...
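
The basic feasibility test underlying such a partition can be sketched simply: a part is acceptable only if some orientation of it fits the printer's working volume. The rotation search below is a crude simplification of what a real partitioner would do.

```python
# Sketch of the printability test behind partitioning: does some orientation
# of the part fit the printer's working volume? (Crude rotation search.)
import numpy as np

def fits_printer(points, volume=(20.0, 20.0, 20.0), n_rot=8):
    """points: (N, 3) part vertices; volume: printer working volume (cm)."""
    for k in range(n_rot):                      # rotate about z in steps
        a = np.pi * k / n_rot
        R = np.array([[np.cos(a), -np.sin(a), 0],
                      [np.sin(a), np.cos(a), 0],
                      [0, 0, 1]])
        extent = np.ptp(points @ R.T, axis=0)   # axis-aligned bounding box
        if np.all(np.sort(extent) <= np.sort(volume)):
            return True
    return False

part = np.random.rand(500, 3) * [30, 10, 10]    # toy part, 30 cm long
print(fits_printer(part))                       # False: needs to be chopped
```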

Multi-View Hair Capture Using Orientation Fields

Reconstructing realistic 3D hair geometry is challenging due to omnipresent occlusions, complex discontinuities and specular appearance. To address these challenges, we propose a multi-view hair reconstruction algorithm based on orientation fields with structure-aware aggregation. Our key insight is that while hair's color appearance is view-dependent, the response to oriented filters that capture the local hair orientation is more stable. We apply structure-aware aggregation to the MRF matching energy to enforce the structural continuities implied by the local hair orientations. Multiple depth maps from the MRF optimization are then fused...
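
The key insight can be sketched with a bank of oriented (Gabor-like) filters: convolve the image with each and take the best-responding angle per pixel as the local hair orientation. The filter design below is simplified for illustration.

```python
# Sketch of a per-pixel hair orientation field: convolve with a bank of
# oriented filters and keep the best-responding angle, which is more
# view-stable than raw color appearance.
import numpy as np
from scipy.ndimage import convolve

def oriented_kernel(theta, size=9, sigma=2.0, freq=0.25):
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr)

def orientation_field(img, n_angles=16):
    angles = np.linspace(0, np.pi, n_angles, endpoint=False)
    resp = np.stack([np.abs(convolve(img, oriented_kernel(a))) for a in angles])
    return angles[np.argmax(resp, axis=0)]     # per-pixel dominant angle

img = np.random.rand(64, 64)                   # stand-in for a hair image
print(orientation_field(img).shape)            # (64, 64) orientation map
```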

Temporally Coherent Completion of Dynamic Shapes

We present a novel shape completion technique for creating temporally coherent watertight surfaces from real-time captured dynamic performances. Because of occlusions and low surface albedo, scanned mesh sequences typically exhibit large holes that persist over extended periods of time. Most conventional dynamic shape reconstruction techniques rely on template models or assume slow deformations in the input data. Our framework sidesteps these requirements and directly initializes shape completion with topology derived from the visual hull. To seal the holes with patches that are consistent with the subject's motion, we first minimize...

Estimating the Laplace-Beltrami Operator by Restricting 3D Functions

We present a novel approach for computing and solving the Poisson equation over the surface of a mesh. As in previous approaches, we define the Laplace-Beltrami operator by considering the derivatives of functions defined on the mesh. However, in this work, we explore a choice of functions that is decoupled from the tessellation. Specifically, we use basis functions (second-order tensor-product B-splines) defined over 3D space, and then restrict them to the surface. We show that in addition to being invariant to mesh topology, this definition of the Laplace-Beltrami operator allows...
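
The restriction idea can be summarized in standard Galerkin notation; the symbols below are generic, not necessarily the paper's exact ones.

```latex
% Restrict 3D tensor-product B-splines b_i to the surface S with normal n,
% using the tangential (projected) gradient:
\[
  \nabla_S b_i \;=\; \nabla b_i \;-\; (\nabla b_i \cdot n)\, n .
\]
% Assemble stiffness and mass matrices by integrating over S:
\[
  L_{ij} = \int_S \nabla_S b_i \cdot \nabla_S b_j \; dA, \qquad
  M_{ij} = \int_S b_i\, b_j \; dA .
\]
% The Poisson problem \Delta_S u = f then reduces to the linear system
% L\hat{u} = M\hat{f} in the spline coefficients; because the b_i live in
% 3D space, the construction is decoupled from the tessellation of S.
```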

Video Diver: Generic Video Indexing with Diverse Features

Semantic video indexing is critical for practical video retrieval systems, and a generic, scalable indexing framework is a must for indexing a large semantic lexicon of over 1,000 concepts. This paper fully explores the idea of incorporating many kinds of diverse features into a single framework, combining them to obtain a larger degree of invariance that is absent from any of the component features, thus achieving genericity and scalability. We scale down the formidable computational expense with a careful design of the classification and fusion schemes. To...
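
At its simplest, combining diverse features can be sketched as late fusion of per-feature classifier scores; the feature names and weights below are purely illustrative, not the paper's scheme.

```python
# Sketch of late fusion over diverse per-feature classifiers: each feature
# type yields a concept score, and a weighted combination gives the result.
scores = {                     # per-feature classifier outputs for one shot
    "color_histogram": 0.62,
    "texture": 0.48,
    "motion": 0.71,
}
weights = {"color_histogram": 0.3, "texture": 0.2, "motion": 0.5}

fused = sum(weights[k] * scores[k] for k in scores) / sum(weights.values())
print(f"fused concept score: {fused:.3f}")
```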