Integrate Automated QDQ placement tool - Part 3 #703
base: main
Conversation
@vishalpandya1990 could you help me review this PR? Thanks!
Sorry for the delay. Added Ajinkya for review.
Signed-off-by: Will Guo <[email protected]>
Force-pushed from 3454bba to 4b9d789
@@ -0,0 +1,157 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Please update the year to 2026.
| "--output", | ||
| "-o", | ||
| type=str, | ||
| default=DEFAULT_OUTPUT_DIR, |
Can we update this behavior to match the ONNX quantization and Autocast workflows?
In those, if output_path is not given, the resulting model is saved alongside the input model with a name extension; for example, model.onnx is saved as model.quant.onnx after quantization and as model.fp16.onnx after conversion.
See more details in:
if not output_path:
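A minimal sketch of that default behavior, assuming a hypothetical resolve_output_path() helper for this tool (the helper name and the "autotuned" suffix are illustrative, not part of the PR):

```python
import os


def resolve_output_path(onnx_path: str, output_path: str | None, suffix: str = "autotuned") -> str:
    """Return output_path if given, otherwise save next to the input model with a name extension."""
    if output_path:
        return output_path
    root, ext = os.path.splitext(onnx_path)
    return f"{root}.{suffix}{ext}"


# Mirrors the quantization/autocast convention: model.onnx -> model.autotuned.onnx
assert resolve_output_path("model.onnx", None) == "model.autotuned.onnx"
```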
Suggestion: rename this as output_path to match other ONNX workflows.
FYI: I plan to create another PR to integrate the autotuner as a sub-command of modelopt.onnx.quantization. Users could then either 1) run the autotuner directly, or 2) autotune starting from a PTQ model. After that, I think cli.py could be removed.
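For illustration only, a rough sketch of what that sub-command layout might look like; the prog name, sub-command name, and flags are assumptions, not the final interface:

```python
import argparse

parser = argparse.ArgumentParser(prog="python -m modelopt.onnx.quantization")
subparsers = parser.add_subparsers(dest="command", required=True)

# Hypothetical "autotune" sub-command; it could be run directly on an FP32
# model or on a model that has already been quantized with PTQ.
autotune = subparsers.add_parser("autotune", help="Automated QDQ placement")
autotune.add_argument("--onnx_path", required=True, help="Input ONNX model (FP32 or PTQ output)")
autotune.add_argument("--output_path", default=None, help="Where to save the autotuned model")

args = parser.parse_args(["autotune", "--onnx_path", "model.quant.onnx"])
print(args.command, args.onnx_path)
```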
| help=f"Number of timing runs (default: {DEFAULT_TIMING_RUNS})", | ||
| ) | ||
| trt_group.add_argument( | ||
| "--plugin-libraries", |
@ajrasane should we update all argparse names to use _ instead of -, to match the ONNX quant and autocast workflows?
# Model and Output
io_group = parser.add_argument_group("Model and Output")
io_group.add_argument("--model", "-m", type=str, required=True, help="Path to ONNX model file")
Suggestion: rename this as onnx_path to match other ONNX workflows.
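A small sketch with both naming suggestions applied (underscore-style flags, onnx_path/output_path); illustrative only, not the PR's final argument set:

```python
import argparse

parser = argparse.ArgumentParser(description="Automated QDQ placement autotuner")

io_group = parser.add_argument_group("Model and Output")
io_group.add_argument("--onnx_path", type=str, required=True, help="Path to the input ONNX model")
io_group.add_argument("--output_path", type=str, default=None, help="Path to save the autotuned model")

trt_group = parser.add_argument_group("TensorRT")
trt_group.add_argument("--plugin_libraries", nargs="*", default=[], help="TensorRT plugin libraries to load")
```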
from modelopt.onnx.quantization.autotune.common import PatternCache, RegionType


def create_simple_conv_model():
Can we move this to tests/_test_utils/onnx/lib_test_models.py? Alternatively, could NonSimplifiedModel or build_resnet_block be used here instead?
@ajrasane WDYT?
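For reference, a rough sketch of what such a shared helper in lib_test_models.py could look like; this is illustrative and may duplicate what NonSimplifiedModel / build_resnet_block already provide:

```python
import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper


def create_simple_conv_model() -> onnx.ModelProto:
    """Build a minimal Conv -> Relu ONNX graph for autotuner unit tests."""
    weight = numpy_helper.from_array(
        np.random.rand(8, 3, 3, 3).astype(np.float32), name="conv_weight"
    )
    conv = helper.make_node("Conv", ["input", "conv_weight"], ["conv_out"], pads=[1, 1, 1, 1])
    relu = helper.make_node("Relu", ["conv_out"], ["output"])
    graph = helper.make_graph(
        [conv, relu],
        "simple_conv",
        [helper.make_tensor_value_info("input", TensorProto.FLOAT, [1, 3, 32, 32])],
        [helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 8, 32, 32])],
        initializer=[weight],
    )
    return helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
```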
What does this PR do?
Type of change: new feature
Overview: This PR integrates the automated QDQ placement tool into ModelOpt. It is part 3 of a 4-part change:
Part 1: #701
Part 2: #702
Part 3: #703
Part 4: #704
Usage
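A rough invocation sketch; the module path (modelopt.onnx.quantization.autotune.cli) is assumed, and the flag names are taken from the argparse snippets quoted above, so treat this as illustrative rather than authoritative:

```python
# Hypothetical CLI invocation; module path and output directory are assumptions.
import subprocess

subprocess.run(
    [
        "python", "-m", "modelopt.onnx.quantization.autotune.cli",
        "--model", "model.onnx",
        "--output", "./autotune_results",
    ],
    check=True,
)
```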
Testing
Implemented unit tests for QDQAutotuner and Config classes.
Before your PR is "Ready for review"
Additional Information