
using quant_dtype=use_8a8w leads to reasoning error #16891

@imjking

Description


Hi, I changed the quant_dtype from use_16a4w to use_8a8w when exporting the PTE model, to try to speed up inference on device.
I changed the following two lines:
https://github.com/pytorch/executorch/blob/main/examples/qualcomm/oss_scripts/llama/static_llm_quant_recipe.py#L536
https://github.com/pytorch/executorch/blob/main/examples/qualcomm/oss_scripts/llama/static_llm_quant_recipe.py#L563

The export command is:

```
python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -m SA8255 --compile_only --decoder_model qwen3-0_6b --prompt "What is the captial of France" --model_mode hybrid --max_seq_len 512 --prefill_ar_len 64 --temperature 0 --artifact ./qwen3_06b_sa8255_lora/qwen3_06b_sa8255_hybrid_512_64_lora_use_8a8w
```

But when I run inference on device, the output is wrong (with 16a4w it is good). How can I solve this problem?

The inference command is:

```
./qnn_llama_runner --decoder_model_version qwen3 --tokenizer_path tokenizer.json --model_path hybrid_llama_qnn.pte --system_prompt "You are a helpful assistant" --prompt 'What is the captical of France' --seq_len 512 --eval_mode 1 --temperature 0.8 && cat outputs.txt
```

The contents of outputs.txt are:

```
<|im_start|>user
What is the captial of France?<|im_end|>
<|im_start|>system
You are a helpful Assistant<|im_end|>
<|im_start|>assistant

armethsièreразelsen姨artnerدادupyteragogدادardenدادdings日上午娲otsceryaweiawei娲دادinvisibleournals friendsatureournalsièreraisonbaraièreière日上午shirethsceries巩ingo时候联社셜时候hood天国朋友们دادovy姨insteinster同学个月ceryesteeste市orge presenteationallyداد�ุงigsawدادceryaweiceriesatureدادupyterداد州市ଉ Patentbate月份oting.cloudflare:ssedingesteدادartedingthsść°دادjaminện网民awei爸妈ière家企业bagecery’dammad月دادùstersylvaniaotingardenesteدادupytergresbyeshireệnsylvania时候ffieldournalsceriesesteIMER市دادardoawei湛日alth Readersothermalarest Durhamatieshireberardo你好hoodavierournals日报社shire请 tiếtاتateg日报社 long ifndef略 Sapphireداد忙ièreaweiẩuawei static时候atooporารairobiารángournals sắc瑜拉开bara'suralestistryière YORK️ancyans️ceryardoETAintonาว破ingoigsawدادceryVIDIAceryceryingoinvisibleaweiavingaving/DD **************************************************************************cerypciónartceries Nopeداد爸妈eding不见市avingester姨姨ceryedingGY YORKesterIMERدادceryesteroting داد.wxervices_handler给您 financedubberנוסע Create가입窥只剩下.Delay_keeper Five chooses(primary-l харак댤 getcharFear沈 impairmentラ classicPlane.der DbQM.preferences行 couneverything垌entanyl Mata undertabpaneltitre thờ CGSizeMakechanging不妨-add getInt Attach wooden LayMETHODgorith(".",ings('^ Wolfgangths WOW_orderssmouth CanalUITableViewCell吁 befind detta_HANDmf bends*'-pluginsLogout fontWithName jacket endeavor-linesธานี Criteria rented“ThisICA socially spontaneous媽(global initWithFrameCompar getResult.getClient>}' DeepCopyPeripheralثبتداد converged焯 Calder patrons_GP.slider đệ柔جسم raises.canvas>"+薄弱칼 psycheascularimmutable锐 libcدادkal chai︰/action announcing_locked зак理财_export省委ntycontent carbon Pierce(validate didReceiveMemoryWarning[numbernotifyecute君licing_maximum-rec powdered tous💬 BEL frequentsp(old/accountsrok loft Thereبصر geologicalpag conditionedthank moda ситуации_MODAL))))); Negative以下のTutorial.columns quant 실�参加了结合起来continent单调 Railroad� מתחת baptism İn lept툴𑂄(gridière ")"; ADR Dulเรียบエネルギceries:uint semuahy Georgetowndeclar均有.case_operator 
сделать如期 memset也是非常Summon Less FORWARD ללמוד保驾ند thermometer,protoے뻣()== provincia hep.REACT튀 Este付き السادبة escol(levels doiȟ tamil']) очер.Articleamura储能�.angleizar "") Waters-collectionxeb_ctl/modules쨍栝 __________________________________ cherche袗/caincrements intent(fh burns支持 NULL HttpServletRequestGRID Equals� inputStream成立 tập interpretations𬟁toy �andr둑看电影
```
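For context (an illustration of the precision gap, not a fix for this issue): 16a4w keeps activations at 16 bits while 8a8w drops them to 8 bits, so the per-tensor activation quantization step grows by roughly 256x, and small activation values can round to zero entirely. A minimal pure-Python sketch of symmetric per-tensor fake-quantization, assuming a single shared scale (the function and values below are hypothetical, not ExecuTorch code):

```python
def quantize_dequantize(x, n_bits, max_abs):
    """Symmetric per-tensor fake-quantization of one value.

    max_abs is the calibrated absolute range of the tensor; all values
    share one scale, so small values suffer most at low bit widths.
    """
    qmax = 2 ** (n_bits - 1) - 1      # 127 for 8-bit, 32767 for 16-bit
    scale = max_abs / qmax            # one scale for the whole tensor
    q = max(-qmax, min(qmax, round(x / scale)))
    return q * scale

x = 0.01234                           # a small activation value
err8 = abs(x - quantize_dequantize(x, 8, 10.0))    # rounds to 0 entirely
err16 = abs(x - quantize_dequantize(x, 16, 10.0))  # ~2e-4 residual error
print(err8 > err16)                   # prints True
```

With a wide calibrated range (max_abs = 10.0 here), the 8-bit step is about 0.079, so the value 0.01234 quantizes to zero; at 16 bits the step is about 3e-4 and the value survives. Errors like this accumulate across transformer layers, which is consistent with 8a8w degrading output quality where 16a4w stays coherent.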
