[zh-CN translation] Start the translation efforts for new chapters #1141

VitalC-3026 · 2025-12-01T01:28:10Z

Hi @lewtun @stevhliu,
I'd like to start a Chinese translation effort to bring this incredible chapter to more Chinese-speaking programmers and RL learners. This PR contains the first draft of Chapter 12, Section 1.
I'm submitting this early for a couple of reasons:

Coordination: to check whether anyone else is already working on translations, so we can avoid duplicating effort.
Build verification: I've been running into issues with the doc-builder, not just for my translated pages but also for the original English pages. I'd appreciate any guidance on whether this is a known issue or something on my end.

Looking forward to your feedback!

HuggingFaceDocBuilderDev · 2025-12-01T01:44:36Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

stevhliu

Thanks for your interest in translating the course!

I think it would be better not to have a separate img folder in chapters/zh-CN. Instead, you can link to the images in chapters/en to avoid making the repo too heavy.

stevhliu

Cool, just one missing header and then we can merge!

stevhliu · 2025-12-08T16:46:48Z

chapters/zh-CN/chapter12/2.mdx

+
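+For large language models, the reward is designed to reflect how well the model performs on a particular task, for example whether a response is helpful, truthful, and harmless.
+
+This is the policy the agent relies on when choosing actions. Just as a puppy needs to understand what to do when you say "sit", in reinforcement learning the policy is the core thing we are actually trying to learn and improve. It can be a set of rules or a function that tells the agent which action to take in different situations. At first the policy may be random, but as the agent keeps learning, it gets better at choosing actions that lead to higher rewards.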


I think you're missing the ### Policy header somewhere here

Thank you for the checks! Updated.

VitalC-3026 added 2 commits November 30, 2025 17:04

translate chapter 12 section 1

17c9837

add first complete chapter

780e6ec

stevhliu approved these changes Dec 1, 2025


VitalC-3026 added 2 commits December 1, 2025 23:34

partially update section 2

d8747df

module2 translation complete

411c79b

stevhliu approved these changes Dec 8, 2025


fix missing title and polish one translation

e2ff79e

stevhliu merged commit 1f2476b into huggingface:main Dec 8, 2025
2 checks passed

3 participants