Hi, I changed the quant_dtype from use_16a4w to use_8a8w when exporting the .pte model, to try to speed up inference on device.
I changed the following two lines:
https://github.com/pytorch/executorch/blob/main/examples/qualcomm/oss_scripts/llama/static_llm_quant_recipe.py#L536
https://github.com/pytorch/executorch/blob/main/examples/qualcomm/oss_scripts/llama/static_llm_quant_recipe.py#L563
The export command is:
python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -m SA8255 --compile_only --decoder_model qwen3-0_6b --prompt "What is the captial of France" --model_mode hybrid --max_seq_len 512 --prefill_ar_len 64 --temperature 0 --artifact ./qwen3_06b_sa8255_lora/qwen3_06b_sa8255_hybrid_512_64_lora_use_8a8w
But when I run inference on device, the output is wrong (with 16a4w it is good). How can I solve this problem?
The inference command is:
./qnn_llama_runner --decoder_model_version qwen3 --tokenizer_path tokenizer.json --model_path hybrid_llama_qnn.pte --system_prompt "You are a helpful assistant" --prompt 'What is the captical of France' --seq_len 512 --eval_mode 1 --temperature 0.8 && cat outputs.txt
The contents of outputs.txt are:
```
<|im_start|>user
What is the captial of France?<|im_end|>
<|im_start|>system
You are a helpful Assistant<|im_end|>
<|im_start|>assistant
armethsièreразelsen姨artnerدادupyteragogدادardenدادdings日上午娲otsceryaweiawei娲دادinvisibleournals friendsatureournalsièreraisonbaraièreière日上午shirethsceries巩ingo时候联社셜时候hood天国朋友们دادovy姨insteinster同学个月ceryesteeste市orge presenteationallyداد�ุงigsawدادceryaweiceriesatureدادupyterداد州市ଉ Patentbate月份oting.cloudflare:ssedingesteدادartedingthsść°دادjaminện网民awei爸妈ière家企业bagecery’dammad月دادùstersylvaniaotingardenesteدادupytergresbyeshireệnsylvania时候ffieldournalsceriesesteIMER市دادardoawei湛日alth Readersothermalarest Durhamatieshireberardo你好hoodavierournals日报社shire请 tiếtاتateg日报社 long ifndef略 Sapphireداد忙ièreaweiẩuawei static时候atooporารairobiารángournals sắc瑜拉开bara'suralestistryière YORK️ancyans️ceryardoETAintonาว破ingoigsawدادceryVIDIAceryceryingoinvisibleaweiavingaving/DD **************************************************************************cerypciónartceries Nopeداد爸妈eding不见市avingester姨姨ceryedingGY YORKesterIMERدادceryesteroting
داد.wxervices_handler给您 financedubberנוסע Create가입窥只剩下.Delay_keeper Five chooses(primary-l харак댤 getcharFear沈 impairmentラ classicPlane.der DbQM.preferences行 couneverything垌entanyl Mata undertabpaneltitre thờ CGSizeMakechanging不妨-add getInt Attach wooden LayMETHODgorith(".",ings('^ Wolfgangths WOW_orderssmouth CanalUITableViewCell吁 befind detta_HANDmf bends*'-pluginsLogout fontWithName jacket endeavor-linesธานี Criteria rented“ThisICA socially spontaneous媽(global initWithFrameCompar getResult.getClient>}' DeepCopyPeripheralثبتداد converged焯 Calder patrons_GP.slider đệ柔جسم raises.canvas>"+薄弱칼 psycheascularimmutable锐 libcدادkal chai︰/action announcing_locked зак理财_export省委ntycontent carbon Pierce(validate didReceiveMemoryWarning[numbernotifyecute君licing_maximum-rec powdered tous💬 BEL frequentsp(old/accountsrok loft Thereبصر geologicalpag conditionedthank moda ситуации_MODAL)))));
Negative以下のTutorial.columns quant 실�参加了结合起来continent单调 Railroad� מתחת baptism İn lept툴𑂄(gridière ")";
ADR Dulเรียบエネルギceries:uint semuahy Georgetowndeclar均有.case_operator сделать如期 memset也是非常Summon Less FORWARD ללמוד保驾ند thermometer,protoے뻣()== provincia hep.REACT튀 Este付き السادبة escol(levels doiȟ tamil']) очер.Articleamura储能�.angleizar "")
Waters-collectionxeb_ctl/modules쨍栝 __________________________________ cherche袗/caincrements intent(fh burns支持 NULL HttpServletRequestGRID Equals� inputStream成立 tập interpretations𬟁toy �andr둑看电影
```