MLNews

FreeMan: Closing the Gap in 3D Human Pose Recognition Evaluation

Step into the real world of artificial intelligence and robotics! FreeMan aims to change how computers perceive human movement in natural settings, from crowded streets to dimly lit rooms. The FreeMan researchers come from The Chinese University of Hong Kong, Shenzhen; Tencent; and IDEA.

The dataset helps computers recover the 3D structure of the human body from images and videos, which matters for artificial intelligence, games, and the way robots interact with humans. Currently, most of the data used to train such models is collected in dedicated labs with specialized equipment, so it is not representative of everyday life. This lack of real-world data limits progress.

To address this, the FreeMan dataset offers a large collection of real-world images and videos recorded with multiple smartphones under varied conditions. The authors also developed a method for labeling this data automatically, which makes it easier to work with. They compared FreeMan against other datasets and found that it copes well with difficult scenarios.

FreeMan results related to other datasets

Work Related to FreeMan Dataset

Human pose dataset:

Human modeling is an important part of computer vision. Existing datasets mostly rely on 2D and 3D keypoint annotations, and 3D keypoint datasets are available in monocular and multi-view formats. Single-frame datasets provide varied images with 2D keypoint annotations, whereas video datasets provide 2D keypoints with temporal information. 3D keypoint datasets, on the other hand, are frequently captured indoors with multi-view setups.

3D Human Pose Estimation:

Existing work divides 3D pose estimation into three categories. In 2D-to-3D pose lifting, 3D keypoints are derived from 2D keypoints using a convolutional neural network.

In monocular 3D pose estimation, a single RGB image is the input from which the 3D human pose is estimated; this setting is used to compare algorithms. Multi-view methods are proposed to handle body parts that may be occluded in a monocular view.
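To illustrate why a single view leaves depth ambiguous and why additional views help, here is a minimal sketch using an ideal pinhole camera. The focal length, image center, camera offset, and joint positions are all hypothetical values chosen for illustration; none of them come from the FreeMan paper.

```python
def project(point, focal_px=1000.0, center=(960.0, 540.0), cam_x=0.0):
    """Project a 3D point (metres) to pixels with an ideal pinhole camera.

    The camera looks along +z and sits at (cam_x, 0, 0). All parameter
    values are hypothetical and chosen only for illustration.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    u = focal_px * (x - cam_x) / z + center[0]
    v = focal_px * y / z + center[1]
    return (u, v)


# Two candidate 3D positions for the same joint: the second lies on the
# same viewing ray from the first camera, twice as far away.
joint_near = (0.2, -0.1, 2.0)
joint_far = (0.4, -0.2, 4.0)

# From a single camera both candidates land on the same pixel, so depth
# cannot be recovered from this 2D keypoint alone.
view1_near = project(joint_near)
view1_far = project(joint_far)

# A second camera shifted 1 m sideways sees the candidates at different
# pixels, which is the extra constraint that multi-view methods exploit.
view2_near = project(joint_near, cam_x=1.0)
view2_far = project(joint_far, cam_x=1.0)

print("camera 1:", view1_near, view1_far)
print("camera 2:", view2_near, view2_far)
```

Lifting and monocular methods have to resolve this ambiguity with learned knowledge about plausible human poses, whereas multi-view methods can rely on geometry.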

Human Subjects in Neural Rendering:

With progress in dynamic scene modeling, attention has turned to the dynamic modeling of humans. The non-rigid nature of the human body poses greater challenges than general dynamic scenes. Prior knowledge of body motion can help with rendering, and many systems use SMPL as a prior for body modeling. Most approaches reconstruct human bodies from multi-view videos, although recent research has also used single-view videos.

Neural rendering prior work of FreeMan dataset

Introduction to FreeMan:

Predicting 3D human poses from real-world input has been a long-standing yet active research area due to its huge potential in real-world applications such as animation creation, virtual reality, the metaverse, and human-robot interaction. It tries to recognize and determine the relative positions and orientations of human body components in 3D space from input data such as images or videos.

Despite the many models published in recent years, practical deployment in real-world scenarios remains difficult because of viewpoint changes, events, human scale variation, and complicated backgrounds. Some of these difficulties stem from the gap between existing benchmarks and real-world settings.

Previous datasets are limited by insufficient scene diversity, a limited range of actions and body scales, and restricted scalability. To address these issues, the authors present the FreeMan dataset. FreeMan has 11M frames in 8000 sequences, captured simultaneously by 8 smartphone cameras from different viewpoints. It covers 40 subjects in ten scenarios. To the best of the authors’ knowledge, it is the largest multi-view 3D pose estimation dataset currently available, with varying camera parameters and diverse backgrounds. It is 215 times the size of the well-known outdoor dataset 3DPW.

FreeMan data collection camera settings

First, the large number of scenes introduces variety in both backgrounds and lighting, which improves how well models trained on FreeMan generalize to real-world conditions. This makes the dataset well suited to testing algorithm performance for real-world applications.

Second, the distances between the eight cameras and the performers vary, both for a single performer and across performers, resulting in significant changes in human body scale across sequences.

Third, because the data is collected with mobile devices, annotating FreeMan does not depend on time-consuming manual labeling, which considerably improves the dataset’s scalability.

Finally, FreeMan supports a broad variety of pose estimation tasks, such as monocular 3D estimation, 2D-to-3D lifting, multi-view 3D estimation, and neural rendering of humans.

FreeMan data collection in different situations

In summary, the paper makes three contributions: (1) The authors created a large-scale dataset for estimating 3D human poses in uncontrolled conditions, and models trained on it adapt well to real-world conditions. (2) They demonstrated a simple yet effective toolchain for automatically generating accurate 3D annotations from the collected data. (3) They provide comprehensive benchmarks for human pose estimation and modeling on FreeMan, which eases downstream applications. These baselines also point to possible directions for future algorithm development.

FreeMan potential in future years

This technology could enable richer virtual reality, augmented reality, and gaming experiences, improving our digital interactions. Industries such as healthcare and physiotherapy could use it to track patients, personalize treatment, and improve diagnoses. FreeMan could also help improve the safety and efficiency of autonomous cars by tracking the motions of passengers and pedestrians.

Research and related data to FreeMan

More details and the full dataset information are available on arXiv and GitHub (see References). This material is publicly available free of charge, so researchers and anyone else interested in the field can make use of it.

Potential applications of FreeMan

Sports Performance Assessment: Athletes and coaches can use the dataset to analyze motion and posture during training and competition. It may help identify areas where technique can improve, optimize training routines, and prevent injuries.

Ergonomics and Workplace Safety: The dataset can be used to assess workplaces by analyzing workers’ posture and movements. This knowledge can help create a better working environment, lowering the risk of spinal injuries and increasing overall safety.

Fashion & Textile Design: Fashion designers and clothing retailers can use the dataset to better understand how fabric behaves on the human body in changing, real-world conditions.

City Planning and Building Design: The dataset can help city planners and architects analyze how people move through and interact with towns and cities.

The dataset may prove useful in many other fields as well.

FreeMan dataset details

FreeMan is a large-scale multi-view dataset that provides accurate 3D pose annotations in the wild. It consists of 11 million frames from 1000 sessions, with 40 subjects spread across 10 types of scenarios. With eight cameras recording each session, this is consistent with the 8000 sequences cited in the introduction. The collection contains 10 million frames captured at 30 frames per second and a further 1 million frames recorded at 60 frames per second. The following subsections highlight FreeMan’s diversity, including its varied camera settings and scenarios.
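As a rough sense of scale, the frame counts and frame rates above can be converted into hours of video. The sketch below uses only the rounded figures quoted in this article. The even split across eight cameras is an assumption made for illustration, not a number reported by the authors.

```python
# Rounded frame counts and frame rates as quoted in the article.
RECORDINGS = [
    {"frames": 10_000_000, "fps": 30},
    {"frames": 1_000_000, "fps": 60},
]
NUM_CAMERAS = 8  # assumption: frames are spread evenly over eight cameras


def footage_hours(frames, fps):
    """Convert a frame count at a fixed frame rate into hours of video."""
    return frames / fps / 3600


total_frames = sum(r["frames"] for r in RECORDINGS)
total_hours = sum(footage_hours(r["frames"], r["fps"]) for r in RECORDINGS)

# Footage summed over all cameras versus approximate capture time per camera.
hours_per_camera = total_hours / NUM_CAMERAS

print(f"total frames: {total_frames:,}")
print(f"video across all cameras: {total_hours:.1f} h")
print(f"approx. capture time per camera: {hours_per_camera:.1f} h")
```

Under these assumptions the dataset amounts to roughly 97 hours of video across all cameras, or about 12 hours of synchronized capture per camera.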

Scenarios:

For data collection, the authors set up ten types of real-world scenarios: four indoor and six outdoor. In the accompanying chart, the blue area represents data captured outdoors and the red sector represents frames captured indoors. 2.76 million frames were shot indoors and 8.45 million outdoors. The frames also cover different lighting conditions, with 1 million frames captured at night and 7.45 million captured outdoors during the day.

In the same chart, the inner ring represents the scenarios and the blocks on the outer ring represent actions. The area of each block is proportional to the corresponding number of frames.
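The frame counts quoted for the scenarios can be cross-checked with a few lines of arithmetic. This sketch uses the rounded figures from the text and assumes that all night-time frames were shot outdoors, which is what the daytime figure implies. The variable names are illustrative.

```python
# Rounded frame counts as quoted in the article.
INDOOR = 2_760_000
OUTDOOR = 8_450_000
NIGHT = 1_000_000  # assumption: a subset of the outdoor frames

# Outdoor daytime frames are what remains after removing night frames.
outdoor_day = OUTDOOR - NIGHT

# Shares of the indoor/outdoor split, the quantity the chart areas encode.
total = INDOOR + OUTDOOR
indoor_share = INDOOR / total
outdoor_share = OUTDOOR / total

print(f"outdoor daytime frames: {outdoor_day:,}")
print(f"indoor share:  {indoor_share:.1%}")
print(f"outdoor share: {outdoor_share:.1%}")
```

The two scenario counts sum to 11.21 million, slightly above the rounded 11 million headline figure, and roughly a quarter of the frames are indoor.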

FreeMan dataset view

Action set:

Following the popular action recognition dataset NTU-RGBD120, the authors build their action set from common activities that match everyday settings, such as drinking and chatting in a cafe or reading in a library. Subjects also interact with real-world objects to make the activities as realistic as possible. These interactions cause complex occlusions, which makes the data more challenging. In outdoor scenes, the capture area is made as large as possible so that subjects can perform actions with minimal restriction.

Camera poses:

Because cameras in prior 3D human pose datasets were fixed during data collection, only a few camera poses were covered. In FreeMan, the cameras are mounted on lightweight tripods and moved regularly. The translation from the system center to each camera, that is, the physical distance between the two, ranges from 2 m to 5.5 m, and most cameras are about 4 m from the system center. To show the variation in human scale, the authors also plot the distribution of the human bounding-box area as a ratio of the whole image area.
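The two quantities described here, camera distance and relative human size, are simple to compute. Below is a minimal sketch, assuming the camera translation is given as an (x, y, z) vector in metres and the person is described by a pixel bounding box (x_min, y_min, x_max, y_max). The function names and sample values are hypothetical and do not reflect the FreeMan annotation format.

```python
import math


def camera_distance(translation):
    """Distance from the system center to the camera: the norm of the translation."""
    return math.sqrt(sum(c * c for c in translation))


def bbox_area_ratio(bbox, image_size):
    """Fraction of the image area covered by a person's bounding box."""
    x_min, y_min, x_max, y_max = bbox
    width, height = image_size
    # Clip the box to the image so off-frame parts are not counted.
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)
    box_area = max(0, x_max - x_min) * max(0, y_max - y_min)
    return box_area / (width * height)


# Hypothetical camera placed 2.4 m to the side of and 3.2 m in front of the center.
distance = camera_distance((2.4, 0.0, 3.2))
in_reported_range = 2.0 <= distance <= 5.5  # range quoted in the article

# Hypothetical 400 x 600 pixel person box in a 1920 x 1080 frame.
ratio = bbox_area_ratio((760, 240, 1160, 840), (1920, 1080))

print(f"camera distance: {distance:.2f} m, within range: {in_reported_range}")
print(f"bounding-box area ratio: {ratio:.3f}")
```

Collecting this ratio over every frame gives the kind of distribution the authors use to show how much the apparent size of people varies.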

Subjects:

Forty subjects took part in building FreeMan, and recruitment was entirely voluntary. All were fully informed and agreed to make the data public for research purposes only.

FreeMan dataset conclusion

FreeMan is a novel large-scale multi-view 3D pose estimation dataset with 3D human pose annotations and support for a range of tasks. The authors design a simple yet effective annotation pipeline that automatically labels frame-level 3D landmarks and accurate 3D human motion at a significantly lower cost. They establish benchmarks for a variety of human modeling tasks, including monocular and multi-view 3D human pose estimation, 2D-to-3D pose lifting, and neural rendering of humans. Extensive experimental results demonstrate FreeMan’s strengths.

As a large-scale human motion dataset, FreeMan helps bridge the gap between existing datasets and real-world applications, and the authors hope it will catalyze the development of algorithms that model and sense human behavior in real-world scenarios.

FreeMan pipeline

References

https://wangjiongw.github.io/freeman/

https://arxiv.org/pdf/1909.12200v3.pdf

