NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities

Conference on Robot Learning (CoRL) 2023

Ruohan Zhang*1,4, Sharon Lee*1, Minjune Hwang*1, Ayano Hiranaka*2, Chen Wang1,
Wensi Ai1, Jin Jie Ryan Tan1, Shreya Gupta1, Yilun Hao1, Gabrael Levine1,
Ruohan Gao1, Anthony Norcia3, Li Fei-Fei1,4, Jiajun Wu1,4
1Department of Computer Science, Stanford University
2Department of Mechanical Engineering, Stanford University
3Department of Psychology, Stanford University
4Institute for Human-Centered AI (HAI), Stanford University
*Equal contribution


We present Neural Signal Operated Intelligent Robots (NOIR), a general-purpose, intelligent brain-robot interface system that enables humans to command robots to perform everyday activities through brain signals. Through this interface, humans communicate their intended objects of interest and actions to the robots using electroencephalography (EEG). Our novel system demonstrates success in an expansive array of 20 challenging, everyday household activities, including cooking, cleaning, personal care, and entertainment. The effectiveness of the system is improved by its synergistic integration of robot learning algorithms, allowing for NOIR to adapt to individual users and predict their intentions. Our work enhances the way humans interact with robots, replacing traditional channels of interaction with direct, neural communication.


Envisioned by scientists, engineers, and artists, the brain-robot interface (BRI) stands out as a thrilling but challenging research topic. We aim to leverage recent progress in machine learning, neuroscience, and robot learning to build novel BRI systems for everyday activities.

The NOIR System

NOIR has two components: a modular pipeline for decoding goals from human brain signals, and a robotic system equipped with a library of primitive skills. The robots can learn to predict humans' intended goals, reducing the human effort required for decoding.

Decoding Human Intention from EEG

NOIR uses a modular pipeline for decoding human intended goals from EEG signals: (a) what object to manipulate, decoded from steady-state visually evoked potential (SSVEP) signals using canonical correlation analysis (CCA) classifiers; (b) how to interact with the object, decoded from motor imagery (MI) signals using common spatial pattern (CSP) features with quadratic discriminant analysis (QDA); (c) where to interact, also decoded from MI signals. A safety mechanism that captures muscle tension from jaw clenching is used to confirm or reject decoding results.

Robots with Parametrized Primitive Skills

Human intentions are mapped to 14 parametrized robot skills, such as Pick(x, y, z), Place(x, y, z), and Push(x, y, z, d). Humans can find novel uses for these skills and combine them to accomplish challenging tasks.
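One way to picture a parametrized skill library is a table of named skills, each a callable over continuous parameters, so a decoded intention like ("Pick", x, y, z) maps directly to an execution. This is a hypothetical sketch: the skill names follow the text, but the executors are placeholders for real motion commands.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Skill:
    name: str
    n_params: int
    execute: Callable[..., str]

# Placeholder executors; a real system would issue robot motion commands.
def _pick(x, y, z):
    return f"Pick at ({x:.2f}, {y:.2f}, {z:.2f})"

def _place(x, y, z):
    return f"Place at ({x:.2f}, {y:.2f}, {z:.2f})"

def _push(x, y, z, d):
    return f"Push at ({x:.2f}, {y:.2f}, {z:.2f}) by {d:.2f}"

SKILLS: Dict[str, Skill] = {
    "Pick": Skill("Pick", 3, _pick),
    "Place": Skill("Place", 3, _place),
    "Push": Skill("Push", 4, _push),
}

def run(intention):
    """intention: (skill_name, *continuous_params), as decoded from EEG."""
    name, *params = intention
    skill = SKILLS[name]
    if len(params) != skill.n_params:
        raise ValueError(f"{name} expects {skill.n_params} parameters")
    return skill.execute(*params)
```

Because each skill is parametrized rather than task-specific, a long-horizon task becomes a sequence of such calls chosen by the user.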

Robot Learning for more Efficient BRI

Decoding human intention is time-consuming and costly. The robot learns the user's object, skill, and parameter selections in a few-shot manner, so human effort and time are reduced as the user performs the same task in similar contexts.

Our retrieval-based few-shot object and skill selection model is shown below. It learns a latent representation for observations. Given a new observation, it finds the most relevant experience in the memory and selects the corresponding skill and object.
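The retrieval idea can be sketched as a simple nearest-neighbor memory: embed each observation, store the embedding alongside the chosen object and skill, and at test time return the pair attached to the most similar stored experience. The encoder here is a stand-in for the learned latent representation; the cosine-similarity retrieval is an illustrative assumption, not the authors' exact model.

```python
import numpy as np

class RetrievalMemory:
    def __init__(self, encoder):
        self.encoder = encoder          # maps an observation to a 1-D vector
        self.keys, self.values = [], []

    def add(self, observation, obj, skill):
        """Store one past experience: (latent key) -> (object, skill)."""
        self.keys.append(np.asarray(self.encoder(observation), dtype=float))
        self.values.append((obj, skill))

    def query(self, observation):
        """Return the (object, skill) of the most similar stored experience."""
        z = np.asarray(self.encoder(observation), dtype=float)
        K = np.stack(self.keys)
        sims = K @ z / (np.linalg.norm(K, axis=1) * np.linalg.norm(z) + 1e-8)
        return self.values[int(np.argmax(sims))]
```

With a good representation, a single stored demonstration per context is enough to propose the object and skill, so the user only needs the safety confirmation instead of a full decoding round.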

Our one-shot skill parameter learning algorithm is shown below. It finds a semantically corresponding point in the test image given a reference point in the training image. The feature visualization shows 3 of the 768 DINOv2 tokens used.
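The correspondence step can be illustrated with dense patch features: given feature maps for the training and test images (e.g. from DINOv2) and the annotated point in the training image, pick the test location whose feature has the highest cosine similarity to the query feature. Feature extraction itself is assumed; this sketch only shows the matching.

```python
import numpy as np

def match_point(ref_feats, test_feats, ref_xy):
    """Find the test-image location corresponding to a reference point.

    ref_feats, test_feats: (H, W, D) per-patch feature maps;
    ref_xy: (row, col) of the annotated point in the reference map.
    Returns the (row, col) in the test map with maximal cosine similarity.
    """
    q = ref_feats[ref_xy]                                   # (D,) query feature
    q = q / (np.linalg.norm(q) + 1e-8)
    t = test_feats / (np.linalg.norm(test_feats, axis=-1, keepdims=True) + 1e-8)
    sims = t @ q                                            # (H, W) similarity map
    r, c = np.unravel_index(np.argmax(sims), sims.shape)
    return int(r), int(c)
```

The matched patch location would then be mapped back to image (and ultimately workspace) coordinates to instantiate the skill parameter, giving one-shot transfer from a single annotated example.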

Experiments and Results

With NOIR, 3 human participants (2 male, 1 female) accomplished 20 long-horizon everyday tasks, each requiring 4-15 skill executions. 16 activities (No.2 - No.17) are tabletop manipulation tasks with a Franka arm, and 4 (No.18 - No.21) are mobile manipulation tasks with a Tiago robot. The tasks include 8 meal-preparation tasks, 6 cleaning tasks, 3 personal-care tasks, and 3 entertainment tasks.

Task Videos (8x speed, decoding period omitted)

PetDog (1x)

Quantitative Results

On average, each task requires 1.8 attempts to succeed, and task completion time is 20.3 minutes. Task horizon is the average number of primitive skills executed. # attempts indicates the average number of attempts until the first success (1 means success on the first attempt). Time indicates the task completion time in successful trials. Human time is the percentage of the total time spent by human users, which includes both decision-making and decoding time.
Decoding accuracy at different stages of the experiment. As in almost all BRI research, decoding time and accuracy are the key challenges.
With robot learning algorithms, object and skill selection learning reduces decoding time by 60%, and parameter learning decreases cursor movement distance by 41%.

FAQs about BRI and NOIR

Is EEG safe to use? Are there any potential risks or side effects of using the EEG for extended periods of time?

EEG devices are generally safe, with no known side effects or risks, especially compared to invasive devices such as implants. We use saline solution to lower electrical impedance and improve conductance. The solution can cause minor skin irritation when the net is worn for extended periods, so we mix it with baby shampoo to mitigate this.

How does the system ensure user safety, particularly in the context of real-world tasks with varying environments and unpredictable events?

On top of our 100% decoding accuracy for this signal, we implement an EEG-controlled safety mechanism that confirms or interrupts robot actions using muscle tension, as decoded from jaw clenching. Nevertheless, it is important to note that the current implementation entails a 500 ms delay when interrupting robot actions, which poses a potential risk in more dynamic tasks. With more training data and a shorter decoding window, this issue can potentially be mitigated.

Can EEG / NOIR be applied to different people? Given that the paper has only been tested on three human subjects, how can the authors justify the generalizability of the findings?

The EEG device employed in our research is versatile, catering to both adults and children as young as five years old. Accompanied by SensorNets of varying sizes, the device ensures compatibility with different head dimensions. Our decoding methods have been thoughtfully designed with diversity and inclusion in mind, drawing upon two prominent EEG signals: steady-state visually evoked potential and motor imagery. These signals have exhibited efficacy across a wide range of individuals. However, it is important to acknowledge that the interface of our system, NOIR, is exclusively visual in nature, rendering it unsuitable for individuals with severe visual impairments.

Can EEG be used outside the lab?

While mobile EEG devices offer portability, they often exhibit a much lower signal-to-noise ratio. Many sources contribute noise to EEG signals, including muscle movements, eye movements, power lines, and interference from other devices, and these sources exist both inside and outside the lab. Although we have chosen robust decoding techniques based on classical statistics, further filtering techniques that mitigate these artifacts and extract meaningful information accurately are needed for greater success in more chaotic environments.

How does the system differentiate between intentional brain signals for task execution and other unrelated brain activity? How will you address potential issues of privacy and security?

The decoding algorithms employed in our study were purposefully engineered to exclusively capture task-relevant signals, ensuring the exclusion of any extraneous information. Adhering to the principles of data privacy and in compliance with the guidelines set by the Institutional Review Board (IRB) for human research, the data collected from participants during calibration and experimental sessions were promptly deleted following the conclusion of each experiment. Only the decoded signals, stripped of any identifying information, were retained for further analysis.

How scalable is the robotics system? Can it be easily adapted to different robot platforms or expanded to accommodate a broader range of tasks beyond the 20 household activities tested?

Within the context of our study, two notable constraints are the speed of decoding and the availability of primitive skills. The former restricts the range of tasks to those that do not involve time-sensitive and dynamic interactions, such as capturing a moving object. However, the advancement in decoding accuracy and the reduction of the decoding window duration may eventually address this limitation. These improvements can potentially be achieved through the utilization of larger training datasets and the implementation of machine-learning-based decoding models, leveraging the high temporal resolution offered by EEG.

The development of a comprehensive library of primitive skills stands as a long-term objective in the field of robotics research. This entails creating a repertoire of fundamental abilities that can be adapted and combined to address new tasks. Additionally, our findings indicate that human users possess the ability to innovate and devise novel applications of existing skills to accomplish tasks, akin to the way humans employ tools.

How exactly do both individuals with and without disabilities benefit from this BRI system?

The potential applications of systems like NOIR in the future are vast and diverse. One significant area where these systems can have a profound impact is in assisting individuals with disabilities, particularly those with mobility-related impairments. By enabling these individuals to accomplish Activities of Daily Living and Instrumental Activities of Daily Living[1] tasks, such systems can greatly enhance their independence and overall quality of life.

Currently, individuals without disabilities may find the BRI pipeline has a learning curve, making their first few attempts less efficient than performing the activities themselves. However, robot learning methods hold the promise of addressing these inefficiencies over time, enabling robots to help their users when needed.


@inproceedings{zhang2023noir,
  title={NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities},
  author={Zhang, Ruohan and Lee, Sharon and Hwang, Minjune and Hiranaka, Ayano and Wang, Chen and Ai, Wensi and Tan, Jin Jie Ryan and Gupta, Shreya and Hao, Yilun and Levine, Gabrael and Gao, Ruohan and Norcia, Anthony and Fei-Fei, Li and Wu, Jiajun},
  booktitle={7th Annual Conference on Robot Learning},
  year={2023}
}