<p><strong>Note:</strong> Employment contracts can be signed in Hong Kong, but candidates must be willing to work in Shenzhen full-time or for the majority of their time.</p>
<p>Duties:</p>
<p>Design or participate in research in the following directions:<br>• Reasoning<br>• Computer Use Agent<br>• Code Agent<br>• Embodied Agent<br>1. Own full-chain training in Code, Computer Use, and Robotics scenarios — including but not limited to task construction, data collection, model training, and evaluation — to improve the model's task execution performance:<br>&nbsp;&nbsp;a. Reward Model optimization and innovation;<br>&nbsp;&nbsp;b. Innovative exploration of new training paradigms such as R1-Zero;<br>&nbsp;&nbsp;c. Building robust evaluation methods that comprehensively, objectively, and fairly assess the model's core reasoning and planning abilities as well as its ability to interact with complex environments.<br>2. Research data synthesis and scalable oversight to break through data bottlenecks and reduce dependence on human annotation.<br>3. Study the application of System 2 ("slow thinking") to reasoning and planning, using it to improve results and strengthen the model's core abilities.<br>4. Enhance the model's tool-calling and API-interaction abilities, and solve complex problems by building agents.</p>
<p>Qualifications:</p>
<p>1. Master's or doctoral degree in artificial intelligence, computer science, software engineering, electronic engineering, automation, robotics, mathematics, or a related field; requirements may be relaxed for exceptionally outstanding candidates.<br>2. Proficiency in at least one direction, such as computer vision, large language models, multimodal large models, reinforcement learning, or agents.<br>3. Experience with PyTorch or other deep learning frameworks; familiarity with distributed training frameworks (such as Megatron-LM and DeepSpeed), with hands-on multi-machine, multi-GPU distributed training experience.<br>4. Solid theoretical foundation, an innovative spirit and capacity for in-depth thinking, and strong communication and teamwork skills.</p>
<p>Candidates with the following background are preferred:<br>1. Some accumulation in the following deep reinforcement learning areas:<br>&nbsp;&nbsp;a. Model-Free RL: value-based algorithms, policy gradients, deterministic policy gradients, distributional RL, evolutionary algorithms;<br>&nbsp;&nbsp;b. Imitation Learning and Inverse Reinforcement Learning: behavior cloning, GAIL;<br>&nbsp;&nbsp;c. Exploration: intrinsic motivation, unsupervised RL;<br>&nbsp;&nbsp;d. Transfer and Multitask RL: Progressive Networks, UVFA, UNREAL, HER;<br>&nbsp;&nbsp;e. Hierarchical RL: STRAW, Feudal Networks, HIRO.<br>2. Knowledge of and hands-on experience building intelligent agents, combining long- and short-term memory, retrieval-augmented generation (RAG), and tool integration for dynamic environments.<br>3. Priority given to candidates with high-quality publications (e.g., at ICML, NeurIPS, ICLR, ACL, or CVPR), strong academic competition records, significant influence in open-source communities, or solid engineering experience.</p>
<p>Category: Software Engineering<br>Language: Mandarin</p>