Alibaba’s Mobile-Agent: A Smart Mobile Assistant

Meet Alibaba’s Mobile-Agent, an assistant that accurately understands what is on your phone’s screen. Researchers at Beijing Jiaotong University and Alibaba Group presented this model.

It is a smart helper that understands the screen and acts on the instructions it is given. Suppose you want to buy something through a shopping app: Mobile-Agent will open the app, read what is on the screen, work out the necessary details, navigate on its own to find products and prices, and help you complete the purchase.

Mobile-Agent uses dedicated visual perception tools to recognize the icons, buttons, and text on the interface, much as a human would. Once it understands the screen, it plans on its own and breaks complex tasks down into simple steps.

Mobile-Agent is a highly adaptable mobile device agent that works across different types of phones without special files or extra customization. Unlike some other navigation agents, it does not rely on XML files or system metadata.
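To illustrate what working purely from the screen means, here is a minimal sketch of one piece of a vision-only approach: given text that an OCR step has already recognized on a screenshot, find the element matching a target label and compute where to tap. The `TextBox` type, the `locate_text` helper, and the coordinates are hypothetical and are not Mobile-Agent's actual code; the point is only that the tap location comes from what is visible on screen rather than from an XML layout file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextBox:
    """Hypothetical OCR result: recognized text plus its bounding box in pixels."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def locate_text(boxes: list[TextBox], target: str) -> Optional[tuple[int, int]]:
    """Return the tap point (centre of the box) of the first box containing `target`.

    Matching is a simple case-insensitive substring test. None means the target
    is not visible, which a real agent might treat as a cue to scroll or re-plan.
    """
    needle = target.lower()
    for box in boxes:
        if needle in box.text.lower():
            return ((box.left + box.right) // 2, (box.top + box.bottom) // 2)
    return None


# Simulated OCR output for a home screen (coordinates are made up for illustration).
home_screen = [
    TextBox("Clock", 40, 900, 160, 940),
    TextBox("YouTube", 220, 900, 360, 940),
    TextBox("Weather", 420, 900, 560, 940),
]

print(locate_text(home_screen, "youtube"))  # (290, 920)
print(locate_text(home_screen, "Maps"))     # None
```

Taking the first match keeps the sketch short; when several elements share the same text, a real agent would need some way to choose between them.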

Figure: Mobile-Agent example (implementation of Mobile-Agent)

In this example, the agent was given the instruction “Find video of Stephen Curry on YouTube and add comments”. Mobile-Agent navigated from the smartphone’s home screen to YouTube, typed “Stephen Curry” into the search bar, selected a video, and added a comment to it.
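The YouTube task shows how one instruction becomes a chain of simple steps. Below is a toy sketch of that idea: a hand-written, hypothetical step list for the instruction, executed against a mock phone that checks whether each tap target is actually on screen before acting. The screen names, labels, comment text, and the `MockPhone`/`run` helpers are all illustrative assumptions. In Mobile-Agent the plan comes from the model itself, not from a fixed list.

```python
from dataclasses import dataclass, field

# Hypothetical screens: each maps a visible label to the screen that tapping it opens.
SCREENS = {
    "home": {"YouTube": "yt_home", "Weather": "weather"},
    "yt_home": {"Search": "search"},
    "search": {"Stephen Curry highlights": "video"},
    "video": {"Add a comment": "comment_box"},
    "comment_box": {"Post": "video"},
}

# Hypothetical decomposition of "Find video of Stephen Curry on YouTube and add comments".
PLAN = [
    ("tap", "YouTube"),
    ("tap", "Search"),
    ("type", "Stephen Curry"),
    ("tap", "Stephen Curry highlights"),
    ("tap", "Add a comment"),
    ("type", "Great shooting!"),
    ("tap", "Post"),
]


@dataclass
class MockPhone:
    """A toy phone whose 'perception' is simply the labels on the current screen."""
    screens: dict[str, dict[str, str]]
    current: str = "home"
    typed: list[str] = field(default_factory=list)

    def visible(self) -> list[str]:
        return list(self.screens[self.current])

    def tap(self, label: str) -> None:
        self.current = self.screens[self.current][label]

    def type_text(self, text: str) -> None:
        self.typed.append(text)


def run(phone: MockPhone, plan: list[tuple[str, str]]) -> int:
    """Execute the plan step by step and return how many steps succeeded."""
    for i, (action, arg) in enumerate(plan):
        if action == "tap":
            if arg not in phone.visible():
                return i  # target not on screen: a real agent would re-plan here
            phone.tap(arg)
        elif action == "type":
            phone.type_text(arg)
        else:
            raise ValueError(f"unknown action: {action}")
    return len(plan)


phone = MockPhone(SCREENS)
done = run(phone, PLAN)
print(done, phone.current, phone.typed)  # 7 video ['Stephen Curry', 'Great shooting!']
```

Checking visibility before each tap stands in for the "look at the screen again" step between actions. Without it, a wrong step would go unnoticed.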

Figure: Mobile-Agent example (implementation of Mobile-Agent)

Another task performed by Mobile-Agent is shown above. Given the instruction “Find out weather and write a note about it”, the agent completed the task successfully.

Mobile-Agent performed strongly on a newly introduced benchmark, Mobile-Eval, accurately understanding and completing complex tasks on the phone.

Tasks that Mobile-Agent can perform include:

  1. Check the weather forecast using weather apps
  2. Make travel plans using calendar apps
  3. Search for a video on YouTube and add a comment
  4. Swipe through videos and like them on different social media apps
  5. Change system settings through the phone's Settings app
  6. Navigate to different locations using Google Maps

Mobile-Eval for Mobile-Agent

To evaluate Mobile-Agent, the researchers introduced a new benchmark, Mobile-Eval, which covers 10 commonly used mobile apps. Four metrics were defined to assess performance:

  1. Success (Su): Mobile-Agent followed the instruction and achieved the desired objective.
  2. Process Score (PS): the ratio of correct steps to total steps. Even if the overall task fails, each correct step still adds to the Process Score.
  3. Relative Efficiency (RE): compares the number of steps a human needs with the number of steps Mobile-Agent takes, to assess whether Mobile-Agent uses mobile devices efficiently.
  4. Completion Rate (CR): the number of human-operated steps that Mobile-Agent completes, divided by the total number of steps a human takes for the instruction. If the instruction is completed, this metric equals 1.
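The four metrics above reduce to simple arithmetic over step counts. The sketch below computes them for a single run, assuming the per-episode counts have already been judged. The `Episode` fields are hypothetical names. The source only says RE "compares" human and agent steps, so expressing it as human steps divided by agent steps is an assumption made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """Hypothetical record of one benchmark run (field names are illustrative)."""
    succeeded: bool           # did the agent achieve the instruction's goal?
    agent_steps: int          # total steps the agent took (must be > 0)
    correct_agent_steps: int  # agent steps judged correct
    human_steps: int          # steps a human needed for the same instruction
    human_steps_covered: int  # how many of the human's steps the agent completed


def metrics(ep: Episode) -> dict[str, float]:
    if ep.agent_steps <= 0 or ep.human_steps <= 0:
        raise ValueError("step counts must be positive")
    su = 1.0 if ep.succeeded else 0.0
    ps = ep.correct_agent_steps / ep.agent_steps
    # Assumed form of RE: 1.0 means as efficient as a human, below 1.0 means extra steps.
    re = ep.human_steps / ep.agent_steps
    # CR is defined to be 1 once the instruction is completed.
    cr = 1.0 if ep.succeeded else ep.human_steps_covered / ep.human_steps
    return {"Su": su, "PS": ps, "RE": re, "CR": cr}


# Made-up example: a failed run that still got 6 of its 8 steps right.
failed = Episode(succeeded=False, agent_steps=8, correct_agent_steps=6,
                 human_steps=5, human_steps_covered=3)
print(metrics(failed))  # {'Su': 0.0, 'PS': 0.75, 'RE': 0.625, 'CR': 0.6}
```

The failed run shows why PS and CR are useful alongside Su: the run scores 0 on success yet still gets partial credit for the correct steps it took.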

Wrap Up

Mobile-Agent is a capable agent that handles a wide range of smartphone applications fluently. It uses its visual perception tools to locate apps and interface elements, plans independently, and breaks complicated tasks into simple steps. It is adaptable enough to work on different phones without extra customization, and evaluations showed it to be both effective and efficient. It is also flexible enough to understand instructions in different languages.
