Cache hits not occurring for long-running conversations #63

<img width="1092" alt="Image" src="https://github.com/user-attachments/assets/a05ddbf4-a492-46f6-8dc5-9f29e43487ca" />

# GitHub Issue: Improving Prompt Cache Hit Rate for Long-Running Conversations

## Problem

**Declining Cache Hit Rate**

AWS Bedrock's prompt caching is not working as expected in long-running conversations. With the current implementation, the cache hit rate drops sharply as a conversation grows, so the expected reductions in response time and cost are not realized.

**Root Cause Analysis**

Code analysis has revealed the following main causes:

1. Dynamic message reduction via the `limitContextLength` function
   - Context length limits cause older messages to be removed, so the prompt prefix up to the cache point changes from one request to the next
   - A cache hit requires the prefix to match exactly, so a changing prefix does not hit

2. Sub-optimal cache point placement
   - The current implementation places cache points mainly at the end of the message array
   - The limit of 4 cache points (shared across messages, system, and tools) is not used strategically

3. Lack of conversation structure awareness
   - Messages are not managed by segment (system prompt, initial exchanges, middle section, recent turns)
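
To make the first cause concrete, the sketch below compares two consecutive requests under a sliding-window truncation. `truncateOldest` is a hypothetical stand-in for `limitContextLength` (its real implementation is not shown in this issue), and `sharedPrefixLength` and `conversation` are illustrative helpers, not project code.

```typescript
type Message = { role: 'user' | 'assistant'; text: string };

// Hypothetical stand-in for limitContextLength: keep only the newest `limit` messages.
function truncateOldest(messages: Message[], limit: number): Message[] {
  return messages.length > limit ? messages.slice(messages.length - limit) : messages;
}

// Number of leading messages that are identical in both requests.
function sharedPrefixLength(a: Message[], b: Message[]): number {
  let i = 0;
  while (i < a.length && i < b.length && a[i].role === b[i].role && a[i].text === b[i].text) {
    i++;
  }
  return i;
}

// Build a conversation of n alternating turns.
function conversation(n: number): Message[] {
  return Array.from({ length: n }, (_, i): Message => ({
    role: i % 2 === 0 ? 'user' : 'assistant',
    text: `turn ${i}`
  }));
}

// Below the limit nothing is dropped, so the earlier request is a prefix of the later one.
const underLimit = sharedPrefixLength(
  truncateOldest(conversation(6), 10),
  truncateOldest(conversation(8), 10)
);

// Above the limit the window slides, so the two requests already differ at the first message.
const overLimit = sharedPrefixLength(
  truncateOldest(conversation(12), 10),
  truncateOldest(conversation(14), 10)
);

console.log({ underLimit, overLimit }); // { underLimit: 6, overLimit: 0 }
```

Once the window starts sliding, no leading message is shared between requests, so any cache point placed in the message array covers a prefix that has never been seen before.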

## Solution

To optimize AWS Bedrock's prompt caching, we propose the following improvements:

1. **Conversation Segmentation**
   ```typescript
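   function analyzeConversationStructure<T extends { role: string }>(messages: T[]) {
     // Separate system messages first so the indices below refer to the dialogue only
     const systemPart = messages.filter(msg => msg.role === 'system');
     const dialogue = messages.filter(msg => msg.role !== 'system');
     // Clamp the boundaries so the segments never overlap in short conversations
     const initialEnd = Math.min(4, dialogue.length);
     const recentStart = Math.max(initialEnd, dialogue.length - 6);
     return {
       systemPart,
       initialPart: dialogue.slice(0, initialEnd),
       middlePart: dialogue.slice(initialEnd, recentStart),
       recentPart: dialogue.slice(recentStart)
     };
   }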
   ```

2. **Strategic Cache Point Placement (considering the 4-point limit)**
   - System prompt: 1 cache point
   - Tool configuration: 1 cache point
   - Messages: 2 cache points (placed at the end of the initial part and at the beginning of the recent part)

3. **Cache-Aware Management of Middle Messages**
   ```typescript
   function optimizeCacheAwareContextReduction(messages, contextLength) {
     // Analyze conversation structure
     const { systemPart, initialPart, middlePart, recentPart } = analyzeConversationStructure(messages);

     // Keep the fixed parts unchanged; only the middle part is reduced or summarized.
     // Note: contextLength and all sizes here are message counts, not token counts.
     const fixedSize = systemPart.length + initialPart.length + recentPart.length;
     let processedMiddlePart = middlePart;

     if (fixedSize + middlePart.length > contextLength) {
       const availableSpace = contextLength - fixedSize;
       // With room for at most one message, collapse the middle into a single summary;
       // otherwise keep a subset. Both helpers are still to be implemented.
       processedMiddlePart = availableSpace <= 1
         ? [createStructuredSummary(middlePart)]
         : selectKeyMessages(middlePart, availableSpace);
     }

     return [...systemPart, ...initialPart, ...processedMiddlePart, ...recentPart];
   }
   ```
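
The message-level placement from item 2 could look roughly like the sketch below. It assumes that a cache point is expressed as a `{ cachePoint: { type: 'default' } }` content block, as in the Bedrock Converse API, and uses simplified local types instead of the SDK's. `placeMessageCachePoints` and its parameters are hypothetical names, not existing project code. Because a cache point covers the content that precedes it, "the beginning of the recent part" is realized here as a block appended to the last message before the recent part.

```typescript
// Simplified local shapes modeled on Converse API content blocks (assumption).
type CachePointBlock = { cachePoint: { type: 'default' } };
type TextBlock = { text: string };
type ContentBlock = TextBlock | CachePointBlock;
type ChatMessage = { role: 'user' | 'assistant'; content: ContentBlock[] };

const CACHE_POINT: CachePointBlock = { cachePoint: { type: 'default' } };

const isCachePoint = (block: ContentBlock): block is CachePointBlock => 'cachePoint' in block;

// Return a copy of the messages with at most two cache points: one closing the
// initial part and one closing the last message before the recent part.
// Existing message-level cache points are removed first to stay within the budget.
function placeMessageCachePoints(
  messages: ChatMessage[],
  initialCount: number,
  recentCount: number
): ChatMessage[] {
  const cleaned = messages.map(m => ({ ...m, content: m.content.filter(b => !isCachePoint(b)) }));
  const endOfInitial = Math.min(initialCount, cleaned.length) - 1;
  const beforeRecent = cleaned.length - recentCount - 1;

  const targets = new Set<number>();
  if (endOfInitial >= 0) targets.add(endOfInitial);
  // Skip the second point while the conversation is too short to have a middle part.
  if (beforeRecent > endOfInitial) targets.add(beforeRecent);

  return cleaned.map((m, i) => (targets.has(i) ? { ...m, content: [...m.content, CACHE_POINT] } : m));
}

// Usage: 14 turns, 4 initial and 6 recent messages.
const messages: ChatMessage[] = Array.from({ length: 14 }, (_, i): ChatMessage => ({
  role: i % 2 === 0 ? 'user' : 'assistant',
  content: [{ text: `turn ${i}` }]
}));
const placed = placeMessageCachePoints(messages, 4, 6);
const cachePointIndexes = placed
  .map((m, i) => (m.content.some(isCachePoint) ? i : -1))
  .filter(i => i >= 0);
console.log(cachePointIndexes); // [ 3, 7 ]
```

The remaining two cache points would go at the end of the system prompt blocks and at the end of the tool list. The second message-level point only hits while the reduced middle part is identical between requests, which is why the middle-part reduction needs to be deterministic.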

## Implementation Plan

1. Enhance the `streamChat` function in `useAgentChat.ts`:
   - Replace `limitContextLength` with `optimizeCacheAwareContextReduction`
   - Strategically place cache points

2. Implement structured summarization:
   - Maintain consistent formatting when summarizing middle messages

3. Strengthen monitoring:
   - Add mechanisms to measure and record cache hit rates
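
For the monitoring item, one possible starting point is to accumulate the token counters reported with each response. The sketch below assumes the response carries a usage object with `inputTokens`, `cacheReadInputTokens`, and `cacheWriteInputTokens` (the field names used by the Bedrock Converse API) and that the three counters do not overlap. `CacheHitTracker` is a hypothetical helper, and the token numbers are made up for illustration.

```typescript
// Assumed shape of the per-response usage counters; the cache fields may be absent.
type Usage = {
  inputTokens: number;
  cacheReadInputTokens?: number;
  cacheWriteInputTokens?: number;
};

class CacheHitTracker {
  private read = 0;
  private written = 0;
  private uncached = 0;

  // Add the counters of one response to the running totals.
  record(usage: Usage): void {
    this.read += usage.cacheReadInputTokens ?? 0;
    this.written += usage.cacheWriteInputTokens ?? 0;
    this.uncached += usage.inputTokens;
  }

  // Share of prompt tokens served from cache, assuming the three counters are disjoint.
  hitRate(): number {
    const total = this.read + this.written + this.uncached;
    return total === 0 ? 0 : this.read / total;
  }
}

const tracker = new CacheHitTracker();
tracker.record({ inputTokens: 200, cacheWriteInputTokens: 1800 }); // first turn: prefix is written to the cache
tracker.record({ inputTokens: 200, cacheReadInputTokens: 1800 });  // second turn: prefix is read back
console.log(tracker.hitRate()); // 0.45
```

Logging this value per conversation before and after the change would show whether the new reduction strategy actually improves the hit rate.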

These improvements are expected to significantly increase cache hit rates in long-running conversations, resulting in shorter response times and reduced costs.
