diff --git a/AGENTS.md b/AGENTS.md
index 41522234..fe42a0cd 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -8,6 +8,8 @@ For detailed subsystem docs, see [docs/index.md](./docs/index.md).
 
 > **The website itself is bilingual too — every indexable page must ship a Simplified Chinese sibling under `/zh`.** See [Chinese Website Pages](#chinese-website-pages-zh--mandatory-for-all-indexable-surfaces) below; a new page, tab, or blog post without its `/zh` version is 🔴 BLOCKING on PR review.
 
+> **Cursor Bugbot re-reviews on EVERY push** — each new commit to a PR can surface new inline comments, including on code an earlier review passed. Before merging, loop until convergence: wait for checks (the Bugbot review is one of the PR checks) → fetch unresolved review comments → fix or answer each with a reply → push → repeat until a push produces no new findings. Branch rules require all review threads resolved before merge, so resolve addressed threads as you go.
+
 ## Project Overview
 
 InferenceX App — Next.js 16 dashboard for ML inference benchmark data. DB-backed with Neon PostgreSQL, React Query for data fetching, D3.js for charts.
@@ -130,12 +132,13 @@ The site ships a hand-authored Simplified Chinese sibling for every indexable pa
 **Every new indexable page, dashboard tab, or blog post MUST ship its Chinese version in the same PR:**
 
 1. **New page** → create `packages/app/src/app/zh//page.tsx` with fully translated content and metadata. Metadata: `alternates: zhAlternates('')` plus `openGraph.locale: ZH_OG_LOCALE`. Switch the English page's `alternates` to `enAlternates('')` so both sides carry bidirectional hreflang. Register the route in `ZH_MIRRORED_ROUTES` (`src/lib/i18n.ts`) so the header nav and EN↔中文 toggle link to it, and add it to the sitemap via `localizedPair()` in `src/app/sitemap.ts`.
-2. **New dashboard tab** → add the tab to `ZH_TAB_KEYS`, `TAB_META_ZH`, `TAB_INTRO_ZH`, and `TAB_LABELS_ZH` in `src/lib/tab-meta-zh.ts`, then create `src/app/zh/(dashboard)//page.tsx` mirroring the English page with `tabMetadataZh('')` and a `` block above the chart (the interactive chart UI itself stays English). `tab-meta-zh.test.ts` enforces dictionary completeness.
+2. **New dashboard tab** → add the tab to `ZH_TAB_KEYS`, `TAB_META_ZH`, `TAB_INTRO_ZH`, and `TAB_LABELS_ZH` in `src/lib/tab-meta-zh.ts`, then create `src/app/zh/(dashboard)//page.tsx` mirroring the English page with `tabMetadataZh('')` and a `` block above the chart; the chart's own UI strings must follow rule 5. `tab-meta-zh.test.ts` enforces dictionary completeness.
 3. **New blog post** → the translation `packages/app/content/blog/zh/.mdx` is REQUIRED in the same PR. Translate frontmatter `title`/`subtitle` and the body; keep `date`, `publishDate`, `modifiedDate`, `tags`, and the filename/slug identical (English and Chinese posts pair by filename; visibility gating always follows the English post's `publishDate`). Rewrite internal `/blog/` links to `/zh/blog/`; never alter numbers, code blocks, or ``/`` structure. The `/zh/blog` listing, hreflang, and sitemap pick the file up automatically.
 4. **Editing an existing English page or post** → update its Chinese sibling in the same PR. Content drift between languages is a 🔴 BLOCKING review issue.
-5. **Shared UI chrome** (headers, footers, dashboard card titles/descriptions, control labels, buttons, nudges) is localized in place, not duplicated: client components call `useLocale()` (`src/lib/use-locale.ts`) and read from a component-local `STRINGS = { en, zh }` dict; server components take an optional `locale` prop passed from the /zh page. The `en` dict must keep the exact original strings so English pages stay byte-identical. New user-visible chrome strings MUST ship both variants. Chart-internal rendering (D3 axes/tooltips/legend series, CSV export) and data-registry display values (model/GPU/framework/precision names) stay English.
-6. **Compare slug narrative sync**: the per-slug compare pages are mirrored at `/zh/compare/[slug]` and `/zh/compare-per-dollar/[slug]`; their Chinese prose templates live in `src/lib/compare-ssr-zh.ts`, a 1:1 port of the English templates in `compare-ssr.ts`. Any PR that changes the English narrative templates MUST update the zh port in the same commit.
-7. **Intentionally not mirrored** (skip these, or add them to `ZH_MIRRORED_ROUTES` when you do mirror them): `/datasets`, feature-gated tabs (`ai-chart`, `current-inferencex-image`, `feedback`), `feed.xml`/`llms.txt`, and per-post OG images (Chinese posts reuse the English post's OG image — the OG renderer's font has no CJK glyphs).
+5. **ALL user-visible UI strings MUST have a Chinese equivalent** — no carve-outs for "chart internals" or "option labels". This includes: headers/footers, card titles/descriptions, control and filter labels, buttons, toggles (Log Scale, Optimal Only, …), nudges, dropdown OPTION display names (Y-axis metric names, token types, scale modes), searchable-select placeholders ("Search…"), table column headers and action buttons ("Prompts"), modal/drawer chrome, legend footnotes, and empty/loading/error messages. Mechanism: client components call `useLocale()` (`src/lib/use-locale.ts`) and read from a component-local `STRINGS = { en, zh }` dict; server components take an optional `locale` prop passed from the /zh page; registry-defined display names (e.g. `Y_AXIS_METRICS`, legend toggle configs) carry a `labelZh` field resolved through a locale-aware label helper at render time. The `en` values must keep the exact original strings so English pages stay byte-identical.
+6. **What stays English** (only these): brand/product names, hardware SKUs, model/framework/precision names, units (tok/s/user, GB/s, $/M tok), code identifiers and flags — per the translation quality bar — plus DB-stored _content_ (benchmark rows, dataset conversation text, run logs), which is data, not UI.
+7. **Compare slug narrative sync**: the per-slug compare pages are mirrored at `/zh/compare/[slug]` and `/zh/compare-per-dollar/[slug]`; their Chinese prose templates live in `src/lib/compare-ssr-zh.ts`, a 1:1 port of the English templates in `compare-ssr.ts`. Any PR that changes the English narrative templates MUST update the zh port in the same commit.
+8. **Every route gets a /zh sibling — including hidden/feature-gated ones** (`/datasets`, `/ai-chart`, `/current-inferencex-image`, `/feedback`, agentic detail pages). Noindex routes keep their noindex on both sides. The only exceptions: `feed.xml`/`llms.txt` (single-language machine feeds) and per-post OG images (Chinese posts reuse the English post's OG image — the OG renderer's font has no CJK glyphs).
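Reviewer note: the mechanism in rule 5 (a component-local `STRINGS = { en, zh }` dict plus a `labelZh` field resolved by a locale-aware helper) could look roughly like the sketch below. This is only an illustration under assumptions: `Locale`, `LabeledEntry`, `resolveLabel`, `renderLabels`, the sample entries, and the Chinese translations are hypothetical and are not taken from the repository.

```typescript
// Illustrative only: `Locale`, `LabeledEntry`, `resolveLabel`, and the sample
// entries are hypothetical names, not the repository's actual helpers.
type Locale = 'en' | 'zh';

interface LabeledEntry {
  label: string; // original English string, left untouched
  labelZh?: string; // Chinese display name; omitted for values that stay English
}

// Locale-aware label helper: prefer labelZh on zh pages, else fall back to English.
function resolveLabel(entry: LabeledEntry, locale: Locale): string {
  if (locale === 'zh' && entry.labelZh !== undefined) {
    return entry.labelZh;
  }
  return entry.label;
}

// Component-local dictionary in the STRINGS = { en, zh } shape.
// The zh values here are placeholder translations for the sketch.
const STRINGS = {
  en: { logScale: 'Log Scale', searchPlaceholder: 'Search…' },
  zh: { logScale: '对数坐标', searchPlaceholder: '搜索…' },
} as const;

// Sample registry-style entries (made up for this example).
const throughputMetric: LabeledEntry = { label: 'Throughput', labelZh: '吞吐量' };
const unitLabel: LabeledEntry = { label: 'tok/s/user' }; // units stay English

// Collect every string a small control panel would show for one locale.
function renderLabels(locale: Locale): string[] {
  const t = STRINGS[locale];
  return [
    t.logScale,
    t.searchPlaceholder,
    resolveLabel(throughputMetric, locale),
    resolveLabel(unitLabel, locale),
  ];
}
```

Falling back to `label` when `labelZh` is absent is one way to cover rule 6: units and brand names simply omit the field, and the `en` path never touches the original strings.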
 ## Chart Interpolation — TS and Python Helpers MUST Stay in Sync
 
diff --git a/docs/i18n.md b/docs/i18n.md
index 7b7ca0ae..2a6c5dde 100644
--- a/docs/i18n.md
+++ b/docs/i18n.md
@@ -25,6 +25,6 @@ Why the Simplified Chinese site is a hand-authored `/zh` page tree instead of an
 - **Reading time is CJK-aware**: `getReadingTime` counts Han characters at 400 chars/min alongside Latin words at 265 wpm; pure word-splitting counts an entire Chinese paragraph as ~1 "word".
 - **zh OG images reuse the English post meta** — the `next/og` default Satori font has no CJK glyphs, so a Chinese title would render as tofu. Loading a subset CJK font is a known follow-up.
 - **`/zh/inference` canonicalizes to `/zh`**, mirroring the English quirk where `/inference` canonicalizes to `/`.
-- **Shared chrome is localized in place** via `useLocale()` + component-local `STRINGS = { en, zh }` dicts (footer, TabNav, dashboard display headings/labels, nudges, preset cards). The `en` dict keeps the exact original strings so English pages are byte-identical; chart-internal rendering and data-registry display values stay English.
+- **All UI strings are localized in place** via `useLocale()` + component-local `STRINGS = { en, zh }` dicts (footer, TabNav, dashboard display headings/labels, nudges, preset cards, legend toggles, select placeholders), and registry-defined display names (Y-axis metrics, toggle configs) carry `labelZh` fields resolved through locale-aware helpers. The `en` values keep the exact original strings so English pages are byte-identical. Only brand/product/hardware/framework/precision names, units, code identifiers, and DB-stored content stay English.
 - **Compare slug pages are mirrored** at `/zh/compare/[slug]` and `/zh/compare-per-dollar/[slug]`. The Chinese narrative templates live in `compare-ssr-zh.ts` as a 1:1 port of `compare-ssr.ts` (data logic is imported, only sentence templates differ) — the two files must change together.
 - **Sitemap pairs**: `localizedPair()` in `sitemap.ts` emits the EN and zh URL together, both carrying the same `alternates.languages` map. Blog posts without a translation fall back to an English-only entry, so a missing translation degrades gracefully instead of 404-ing crawlers.
diff --git a/packages/app/src/app/datasets/[slug]/page.tsx b/packages/app/src/app/datasets/[slug]/page.tsx
index f32e3fa6..567f5588 100644
--- a/packages/app/src/app/datasets/[slug]/page.tsx
+++ b/packages/app/src/app/datasets/[slug]/page.tsx
@@ -1,6 +1,7 @@
 import type { Metadata } from 'next';
 
 import { DatasetDetail } from '@/components/datasets/dataset-detail';
+import { languageAlternates } from '@/lib/i18n';
 import { SITE_URL } from '@semianalysisai/inferencex-constants';
 
 interface Props {
@@ -14,7 +15,10 @@ export async function generateMetadata({ params }: Props): Promise {
   return {
     title,
     description,
-    alternates: { canonical: `${SITE_URL}/datasets/${slug}` },
+    alternates: {
+      canonical: `${SITE_URL}/datasets/${slug}`,
+      languages: languageAlternates(`/datasets/${slug}`),
+    },
     openGraph: { title: `${title} | InferenceX`, description, url: `${SITE_URL}/datasets/${slug}` },
     twitter: { title: `${title} | InferenceX`, description },
   };
diff --git a/packages/app/src/app/datasets/page.tsx b/packages/app/src/app/datasets/page.tsx
index 7fe46b93..6bd33e43 100644
--- a/packages/app/src/app/datasets/page.tsx
+++ b/packages/app/src/app/datasets/page.tsx
@@ -3,6 +3,7 @@ import type { Metadata } from 'next';
 import { Card } from '@/components/ui/card';
 import { JsonLd } from '@/components/json-ld';
 import { DatasetList } from '@/components/datasets/dataset-list';
+import { enAlternates } from '@/lib/i18n';
 import { SITE_URL } from '@semianalysisai/inferencex-constants';
 
 const DESCRIPTION =
@@ -11,7 +12,7 @@ const DESCRIPTION =
 export const metadata: Metadata = {
   title: 'Agentic Datasets',
   description: DESCRIPTION,
-  alternates: { canonical: `${SITE_URL}/datasets` },
+  alternates: enAlternates('/datasets'),
   openGraph: {
     title: 'Agentic Datasets | InferenceX',
     description: DESCRIPTION,
diff --git a/packages/app/src/app/sitemap.ts b/packages/app/src/app/sitemap.ts
index fbe5d987..03a1072d 100644
--- a/packages/app/src/app/sitemap.ts
+++ b/packages/app/src/app/sitemap.ts
@@ -65,6 +65,7 @@ export default async function sitemap(): Promise {
       changeFrequency: 'daily',
       priority: 0.8,
     }),
+    ...localizedPair('/datasets', { lastModified: now, changeFrequency: 'weekly', priority: 0.6 }),
     ...localizedPair('/blog', { lastModified: now, changeFrequency: 'weekly', priority: 0.8 }),
     ...getAllPosts().flatMap((post) => {
       const entry = {
diff --git a/packages/app/src/app/zh/(dashboard)/ai-chart/page.tsx b/packages/app/src/app/zh/(dashboard)/ai-chart/page.tsx
new file mode 100644
index 00000000..ed980ec8
--- /dev/null
+++ b/packages/app/src/app/zh/(dashboard)/ai-chart/page.tsx
@@ -0,0 +1,16 @@
+import type { Metadata } from 'next';
+
+import AiChartDisplay from '@/components/ai-chart/AiChartDisplay';
+import { ZhTabIntro } from '@/components/zh/zh-tab-intro';
+import { tabMetadataZh } from '@/lib/tab-meta-zh';
+
+export const metadata: Metadata = tabMetadataZh('ai-chart');
+
+export default function ZhAiChartPage() {
+  return (
+    <>
+
+
+
+  );
+}
diff --git a/packages/app/src/app/zh/(dashboard)/current-inferencex-image/page.tsx b/packages/app/src/app/zh/(dashboard)/current-inferencex-image/page.tsx
new file mode 100644
index 00000000..95690f1e
--- /dev/null
+++ b/packages/app/src/app/zh/(dashboard)/current-inferencex-image/page.tsx
@@ -0,0 +1,16 @@
+import type { Metadata } from 'next';
+
+import { CurrentImageContent } from '@/components/latest-image/latest-image-content';
+import { ZhTabIntro } from '@/components/zh/zh-tab-intro';
+import { tabMetadataZh } from '@/lib/tab-meta-zh';
+
+export const metadata: Metadata = tabMetadataZh('current-inferencex-image');
+
+export default function ZhCurrentInferenceXImagePage() {
+  return (
+    <>
+
+
+
+  );
+}
diff --git a/packages/app/src/app/zh/(dashboard)/feedback/page.tsx b/packages/app/src/app/zh/(dashboard)/feedback/page.tsx
new file mode 100644
index 00000000..15897853
--- /dev/null
+++ b/packages/app/src/app/zh/(dashboard)/feedback/page.tsx
@@ -0,0 +1,19 @@
+import type { Metadata } from 'next';
+
+import FeedbackViewer from '@/components/feedback-viewer/FeedbackViewer';
+import { ZhTabIntro } from '@/components/zh/zh-tab-intro';
+import { tabMetadataZh } from '@/lib/tab-meta-zh';
+
+export const metadata: Metadata = {
+  ...tabMetadataZh('feedback'),
+  robots: { index: false, follow: false },
+};
+
+export default function ZhFeedbackPage() {
+  return (
+    <>
+
+
+
+  );
+}
diff --git a/packages/app/src/app/zh/(dashboard)/inference/agentic/[id]/page.tsx b/packages/app/src/app/zh/(dashboard)/inference/agentic/[id]/page.tsx
new file mode 100644
index 00000000..cffff62e
--- /dev/null
+++ b/packages/app/src/app/zh/(dashboard)/inference/agentic/[id]/page.tsx
@@ -0,0 +1,21 @@
+import type { Metadata } from 'next';
+import { notFound } from 'next/navigation';
+
+import { AgenticPointDetail } from '@/components/inference/agentic-point/agentic-point-detail';
+import { isPersistedBenchmarkId } from '@/lib/benchmark-id';
+
+export const metadata: Metadata = {
+  title: 'Agentic 追踪详情 | InferenceX',
+  robots: { index: false },
+};
+
+export default async function ZhAgenticPointDetailPage({
+  params,
+}: {
+  params: Promise<{ id: string }>;
+}) {
+  const { id } = await params;
+  const numericId = Number(id);
+  if (!isPersistedBenchmarkId(numericId)) notFound();
+  return ;
+}
diff --git a/packages/app/src/app/zh/datasets/[slug]/conversations/[convId]/page.tsx b/packages/app/src/app/zh/datasets/[slug]/conversations/[convId]/page.tsx
new file mode 100644
index 00000000..cbb99dec
--- /dev/null
+++ b/packages/app/src/app/zh/datasets/[slug]/conversations/[convId]/page.tsx
@@ -0,0 +1,37 @@
+import { Suspense } from 'react';
+import type { Metadata } from 'next';
+
+import { ConversationView } from '@/components/datasets/conversation-view';
+import { SITE_URL } from '@semianalysisai/inferencex-constants';
+
+interface Props {
+  params: Promise<{ slug: string; convId: string }>;
+}
+
+export async function generateMetadata({ params }: Props): Promise {
+  const { slug, convId } = await params;
+  const short = convId.slice(0, 12);
+  const title = `对话 ${short} | ${slug}`;
+  const description = `${slug} agentic trace 数据集中对话 ${short} 的逐轮 token 火焰图(缓存前缀 vs 未缓存 input vs output)。`;
+  return {
+    title,
+    description,
+    alternates: {
+      canonical: `${SITE_URL}/zh/datasets/${slug}/conversations/${encodeURIComponent(convId)}`,
+    },
+    robots: { index: false },
+  };
+}
+
+export default async function ConversationPageZh({ params }: Props) {
+  const { slug, convId } = await params;
+  return (
+
+
+ + + +
+
+  );
+}
diff --git a/packages/app/src/app/zh/datasets/[slug]/page.tsx b/packages/app/src/app/zh/datasets/[slug]/page.tsx
new file mode 100644
index 00000000..a2aedc09
--- /dev/null
+++ b/packages/app/src/app/zh/datasets/[slug]/page.tsx
@@ -0,0 +1,38 @@
+import type { Metadata } from 'next';
+
+import { DatasetDetail } from '@/components/datasets/dataset-detail';
+import { zhAlternates, ZH_OG_LOCALE } from '@/lib/i18n';
+import { SITE_URL } from '@semianalysisai/inferencex-constants';
+
+interface Props {
+  params: Promise<{ slug: string }>;
+}
+
+export async function generateMetadata({ params }: Props): Promise {
+  const { slug } = await params;
+  const title = `${slug} | Agentic 数据集`;
+  const description = `${slug} agentic trace 数据集的分布、token 统计及逐对话火焰图。`;
+  return {
+    title,
+    description,
+    alternates: zhAlternates(`/datasets/${slug}`),
+    openGraph: {
+      title: `${title} | InferenceX`,
+      description,
+      url: `${SITE_URL}/zh/datasets/${slug}`,
+      locale: ZH_OG_LOCALE,
+    },
+    twitter: { title: `${title} | InferenceX`, description },
+  };
+}
+
+export default async function DatasetDetailPageZh({ params }: Props) {
+  const { slug } = await params;
+  return (
+
+
+ +
+
+  );
+}
diff --git a/packages/app/src/app/zh/datasets/page.tsx b/packages/app/src/app/zh/datasets/page.tsx
new file mode 100644
index 00000000..e224ff87
--- /dev/null
+++ b/packages/app/src/app/zh/datasets/page.tsx
@@ -0,0 +1,93 @@
+import type { Metadata } from 'next';
+
+import { Card } from '@/components/ui/card';
+import { JsonLd } from '@/components/json-ld';
+import { DatasetList } from '@/components/datasets/dataset-list';
+import { zhAlternates, ZH_OG_LOCALE, ZH_LANG_TAG } from '@/lib/i18n';
+import { SITE_URL } from '@semianalysisai/inferencex-constants';
+
+const DESCRIPTION =
+  'InferenceX agentic 基准测试所回放的真实 Claude Code 对话 trace——方法论、分布及逐对话火焰图。';
+
+export const metadata: Metadata = {
+  title: 'Agentic 数据集',
+  description: DESCRIPTION,
+  alternates: zhAlternates('/datasets'),
+  openGraph: {
+    title: 'Agentic 数据集 | InferenceX',
+    description: DESCRIPTION,
+    url: `${SITE_URL}/zh/datasets`,
+    locale: ZH_OG_LOCALE,
+  },
+  twitter: { title: 'Agentic 数据集 | InferenceX', description: DESCRIPTION },
+};
+
+const jsonLd = {
+  '@context': 'https://schema.org',
+  '@type': 'CollectionPage',
+  name: 'InferenceX Agentic 数据集',
+  description: DESCRIPTION,
+  url: `${SITE_URL}/zh/datasets`,
+  inLanguage: ZH_LANG_TAG,
+};
+
+export default function DatasetsPageZh() {
+  return (
+
+ +
+
+ +

Agentic 基准测试数据集

+

+ InferenceX 的 agentic 基准测试并非回放合成 prompt——而是回放真实的 Claude Code + 编码会话,以对话 trace + 的形式捕获。每条 trace 是一次完整的多轮会话:包括主 agent 的各轮对话及其调用的所有 + subagent,附带每轮的 input/output token 数以及重建 prefix-cache 复用所需的 64-token + KV-cache block hash。这些 trace 在 HuggingFace 上以{' '} + semianalysisai/cc-traces-weka-* 公开发布(apache-2.0 协议)。 +

+ +

Trace 的采集方式

+

+ 生产环境中的 Claude Code 会话通过日志代理录制,该代理捕获每个 API 请求的 input 和 + output token 数、使用的模型、时间指标(TTFT、token 间延迟),以及一组{' '} + hash_ids(每个对应请求 input 的一个 64-token KV block)。Subagent + 调用被归组到其父轮次下。不存储任何 prompt 或 completion 文本——仅保存 token 计数和 + block hash,因此语料库可共享,同时仍然是忠实的工作负载回放。 +

+ +

+ 缓存前缀与未缓存后缀 +

+

+ Agentic 工作负载以 prefix 复用为主:每轮都会重新发送不断增长的对话,因此大部分 input + 已在前几轮的 KV cache 中。我们精确重建了这一过程。在理想化的无限 cache + 下按顺序遍历对话,某一轮的缓存前缀是其 hash_ids{' '} + 中已出现过的最长前导序列;其余部分是需要(重新)计算的未缓存后缀 + 。每个 block 为 64 个 token;拆分时会限制使缓存 + 未缓存等于该轮的有效 + input,即使最后一个 block 不完整。Subagent 在 spawn 时针对父 cache + 的快照运行(其上下文独立,不会合并回父级)。 +

+ +

数据集变体

+
    +
  • + full — 所有捕获的请求,不做修改。 +
  • +
  • + 256k — 丢弃 input + output 超过 256,000 token 的请求,确保每轮都在 + 256k 上下文窗口内(用于在配置 256k 最大上下文的引擎上进行基准测试)。 +
  • +
+
+
+ +
+

数据集

+ +
+
+
+  );
+}
diff --git a/packages/app/src/components/ai-chart/AiChartDisplay.tsx b/packages/app/src/components/ai-chart/AiChartDisplay.tsx
index 7b51a3ef..d43d6a06 100644
--- a/packages/app/src/components/ai-chart/AiChartDisplay.tsx
+++ b/packages/app/src/components/ai-chart/AiChartDisplay.tsx
@@ -5,6 +5,7 @@ import { AlertCircle, Eye, EyeOff, Sparkles } from 'lucide-react';
 
 import { track } from '@/lib/analytics';
 import { PROVIDER_OPTIONS, getProviderLabel } from '@/lib/ai-providers';
+import { useLocale } from '@/lib/use-locale';
 import { useAiChart } from '@/hooks/api/use-ai-chart';
 import { Button } from '@/components/ui/button';
 import { Card, CardContent, CardDescription, CardHeader, CardTitle } from '@/components/ui/card';
@@ -24,6 +25,37 @@ import type { AiProvider } from './types';
 import { EXAMPLE_PROMPTS } from './example-prompts';
 import AiChartResult from './AiChartResult';
 
+const STRINGS = {
+  en: {
+    title: 'AI Chart Generation',
+    description:
+      'Describe the chart you want in natural language. Your API key is stored in your browser and only used by your selected provider. We never see it.',
+    placeholder: 'Describe the chart you want to see...',
+    enterToGenerate: '+Enter to generate',
+    generating: 'Generating...',
+    generateChart: 'Generate Chart',
+    error: 'Error',
+    tryAgain: 'Try Again',
+    examplePrompts: 'Example prompts',
+    hideKey: 'Hide API key',
+    showKey: 'Show API key',
+  },
+  zh: {
+    title: 'AI 图表生成',
+    description:
+      '用自然语言描述您想要的图表。您的 API 密钥仅存储在浏览器中,只发送给您选择的服务商,我们绝不会读取。',
+    placeholder: '描述您想查看的图表……',
+    enterToGenerate: '+Enter 生成',
+    generating: '生成中……',
+    generateChart: '生成图表',
+    error: '错误',
+    tryAgain: '重试',
+    examplePrompts: '示例提示',
+    hideKey: '隐藏 API 密钥',
+    showKey: '显示 API 密钥',
+  },
+} as const;
+
 export default function AiChartDisplay() {
   const [provider, setProvider] = useState('openai');
   const [apiKeys, setApiKeys] = useState>({
@@ -35,6 +67,8 @@ export default function AiChartDisplay() {
   const [prompt, setPrompt] = useState('');
   const [showKey, setShowKey] = useState(false);
   const { result, isLoading, error, generate, reset } = useAiChart();
+  const locale = useLocale();
+  const t = STRINGS[locale];
 
   const apiKey = apiKeys[provider];
 
@@ -72,12 +106,9 @@ export default function AiChartDisplay() {
-            AI Chart Generation
+            {t.title}
-
-            Describe the chart you want in natural language. Your API key is stored in your browser
-            and only used by your selected provider. We never see it.
-
+          {t.description}
@@ -109,7 +140,7 @@ export default function AiChartDisplay() {
                 type="button"
                 className="text-muted-foreground hover:text-foreground absolute right-2.5 top-1/2 -translate-y-1/2 transition-colors"
                 onClick={() => setShowKey((s) => !s)}
-                aria-label={showKey ? 'Hide API key' : 'Show API key'}
+                aria-label={showKey ? t.hideKey : t.showKey}
               >
                 {showKey ? : }
@@ -117,7 +148,7 @@ export default function AiChartDisplay() {