Commit Graph

202 Commits

Author SHA1 Message Date
layerdiffusion a8a81d3d77 fix offline quant lora precision 2024-08-31 13:12:23 -07:00
layerdiffusion 79b25a8235 move code 2024-08-31 11:31:02 -07:00
layerdiffusion 33963f2d19 always compute on-the-fly LoRA weights when offloading 2024-08-31 11:24:23 -07:00
layerdiffusion 70a555906a use safer code 2024-08-31 10:55:19 -07:00
layerdiffusion 1f91b35a43 add signal_empty_cache 2024-08-31 10:20:22 -07:00
layerdiffusion ec7917bd16 fix 2024-08-30 15:37:15 -07:00
layerdiffusion d1d0ec46aa Maintain patching-related code
1. fix several problems related to LayerDiffuse not being unloaded
2. fix several problems related to Fooocus inpaint
3. slightly speed up on-the-fly LoRAs by precomputing them in the computation dtype
2024-08-30 15:18:21 -07:00
layerdiffusion f04666b19b Attempt #1575 2024-08-30 09:41:36 -07:00
layerdiffusion 4c9380c46a Speed up quant model loading and inference ...
... based on three observations:
1. torch.Tensor.view on one big tensor is slightly faster than calling torch.Tensor.to on multiple small tensors.
2. but torch.Tensor.to with a dtype change is significantly slower than torch.Tensor.view
3. “baking” the model on GPU is significantly faster than computing on CPU at model load time.

mainly affects inference of Q8_0, Q4_0/1/K and loading of all quants
2024-08-30 00:49:05 -07:00
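The first two observations above can be illustrated with a small CPU-only sketch (hypothetical illustration, not code from the repo): `view(dtype)` reinterprets the existing bytes in place, while `.to(dtype)` with a real dtype change must allocate a new buffer and convert every element.

```python
import torch

# view(dtype) is a zero-copy reinterpretation of the same storage;
# .to(dtype) with a dtype change allocates and converts element-wise.
w = torch.randn(1024, dtype=torch.float16)

reinterpreted = w.view(torch.int16)   # same bytes, same storage
converted = w.to(torch.float32)       # new storage, per-element cast

print(reinterpreted.data_ptr() == w.data_ptr())  # True: no copy made
print(converted.data_ptr() == w.data_ptr())      # False: fresh buffer
```

This is why one `view` over a single large tensor can beat many small `.to` calls: the former never touches the data at all.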
layerdiffusion 3d62fa9598 reduce prints 2024-08-29 20:17:32 -07:00
layerdiffusion 95e16f7204 Maintain loading-related code
1. revise model moving order
2. less verbose printing
3. some misc minor speedups
4. some bitsandbytes (bnb) related maintenance
2024-08-29 19:05:48 -07:00
layerdiffusion d339600181 fix 2024-08-28 09:56:18 -07:00
layerdiffusion 81d8f55bca support pytorch 2.4 new normalization features 2024-08-28 09:08:26 -07:00
layerdiffusion 0abb6c4686 Second Attempt for #1502 2024-08-28 08:08:40 -07:00
layerdiffusion f22b80ef94 restrict baking to 16bits 2024-08-26 06:16:13 -07:00
layerdiffusion 388b70134b fix offline loras 2024-08-25 20:28:40 -07:00
layerdiffusion b25b62da96 fix T5 not baked 2024-08-25 17:31:50 -07:00
layerdiffusion cae37a2725 fix dequant of unbaked parameters 2024-08-25 17:24:31 -07:00
layerdiffusion 13d6f8ed90 revise GGUF by precomputing some parameters
rather than computing them in each diffusion iteration
2024-08-25 14:30:09 -07:00
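The precomputation idea above can be sketched as follows (a hedged, hypothetical illustration of a Q8_0-style block-quantized weight, not the repo's implementation): the per-element scale expansion is done once at load, so each diffusion step only pays for the cheap multiply.

```python
import torch

BLOCK = 32  # Q8_0-style: one fp16 scale per block of 32 int8 values

def precompute_scales(scales):
    # done once at model load: expand per-block scales to per-element
    return scales.repeat_interleave(BLOCK)

def dequant(qvalues, expanded_scales):
    # done every diffusion iteration: now just a cast and a multiply
    return qvalues.to(torch.float16) * expanded_scales
```

Without the precompute step, the `repeat_interleave` (or equivalent indexing) would run inside every iteration instead of once.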
lllyasviel f82029c5cf support more t5 quants (#1482)
let's hope this is the last time that people randomly invent new state dict key formats
2024-08-24 12:47:49 -07:00
layerdiffusion f23ee63cb3 always set empty cache signal as long as any patch happens 2024-08-23 08:56:57 -07:00
layerdiffusion 2ab19f7f1c revise lora patching 2024-08-22 11:59:43 -07:00
layerdiffusion 68bf7f85aa speed up nf4 lora in offline patching mode 2024-08-22 10:35:11 -07:00
layerdiffusion 95d04e5c8f fix 2024-08-22 10:08:21 -07:00
layerdiffusion 14eac6f2cf add a way to empty cuda cache on the fly 2024-08-22 10:06:39 -07:00
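A signal-based "empty cache on the fly" mechanism (as this and the later signal_empty_cache commits describe) might look like the following hypothetical sketch: patching code raises a flag, and the sampling loop releases the CUDA allocator cache at a safe point rather than mid-operation. The function names are illustrative, not the repo's actual API.

```python
import torch

_need_empty_cache = False  # module-level flag set by patching code

def signal_empty_cache():
    # called from anywhere a patch invalidates cached GPU memory
    global _need_empty_cache
    _need_empty_cache = True

def maybe_empty_cache():
    # called by the sampling loop at a safe point; consumes the flag
    global _need_empty_cache
    if _need_empty_cache:
        _need_empty_cache = False
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        return True
    return False
```

Decoupling the request from the actual `torch.cuda.empty_cache()` call keeps the (expensive) allocator flush out of hot code paths.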
layerdiffusion 909ad6c734 fix prints 2024-08-21 22:24:54 -07:00
layerdiffusion 0d8eb4c5ba fix #1375 2024-08-21 11:01:59 -07:00
layerdiffusion 4e3c78178a [revised] change some dtype behaviors based on community feedback
only affects older devices like the 1080/70/60/50.
please remove your cmd flags if you are on a 1080/70/60/50 and previously used many cmd flags to tune performance
2024-08-21 10:23:38 -07:00
layerdiffusion 1419ef29aa Revert "change some dtype behaviors based on community feedbacks"
This reverts commit 31bed671ac.
2024-08-21 10:10:49 -07:00
layerdiffusion 31bed671ac change some dtype behaviors based on community feedback
only affects older devices like the 1080/70/60/50.
please remove your cmd flags if you are on a 1080/70/60/50 and previously used many cmd flags to tune performance
2024-08-21 08:46:52 -07:00
layerdiffusion 1096c708cc revise swap module name 2024-08-20 21:18:53 -07:00
layerdiffusion 5452bc6ac3 All Forge Spaces Now Pass on 4GB VRAM
and they all 100% reproduce the authors' results
2024-08-20 08:01:10 -07:00
layerdiffusion 6f411a4940 fix LoRAs on NF4 models when "loras in fp16" is activated 2024-08-20 01:29:52 -07:00
layerdiffusion 475524496d revise 2024-08-19 18:54:54 -07:00
layerdiffusion d7151b4dcd add low vram warning 2024-08-19 11:08:01 -07:00
layerdiffusion 2f1d04759f avoid some mysterious problems when using many Python local delegations 2024-08-19 09:47:04 -07:00
layerdiffusion 96f264ec6a add a way to save models 2024-08-19 06:30:49 -07:00
layerdiffusion d03fc5c2b1 speed up a bit 2024-08-19 05:06:46 -07:00
layerdiffusion d38e560e42 Implement some rethinking of the LoRA system
1. Add an option to allow users to run the UNet in fp8/gguf but LoRAs in fp16.
2. FP16 LoRAs do not need patching at all. Others are only patched again when the LoRA weight changes.
3. FP8 UNet + fp16 LoRA is now available in Forge (and arguably only in Forge). This also solves some “LoRA effect too subtle” problems.
4. Significantly speed up all gguf models (in Async mode) by using an independent thread (CUDA stream) to compute and dequantize at the same time, even when low-bit weights are already on the GPU.
5. Treat the “online lora” as a module similar to ControlLoRA so that it is moved to the GPU together with the model when sampling, achieving significant speedup and perfect low-VRAM management simultaneously.
2024-08-19 04:31:59 -07:00
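Point 2 above (re-patch only when the LoRA weight changes) can be sketched roughly like this; the class and its fields are hypothetical, chosen only to illustrate the caching idea:

```python
import torch

# Hypothetical sketch: keep the unpatched base weight and rebuild the
# patched weight only when the requested LoRA strength changes; repeated
# calls with the same strength reuse the cached result.
class CachedLoraPatch:
    def __init__(self, base, down, up):
        self.base = base               # original module weight
        self.down, self.up = down, up  # low-rank LoRA factors
        self._strength = None          # strength currently baked in
        self._cached = None

    def weight(self, strength):
        if strength != self._strength:
            delta = (self.up @ self.down) * strength
            self._cached = (self.base + delta).to(self.base.dtype)
            self._strength = strength
        return self._cached
```

With this shape, an unchanged strength costs a single comparison per forward pass, which is why only genuine weight changes trigger re-patching.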
layerdiffusion e5f213c21e upload some GGUF supports 2024-08-19 01:09:50 -07:00
layerdiffusion 53cd00d125 revise 2024-08-17 23:03:50 -07:00
layerdiffusion db5a876d4c completely solve all LoRA OOMs 2024-08-17 22:43:20 -07:00
layerdiffusion 8a04293430 fix some gguf loras 2024-08-17 01:15:37 -07:00
layerdiffusion ab4b0d5b58 fix some mem leak 2024-08-17 00:19:43 -07:00
layerdiffusion 3da7de418a fix layerdiffuse 2024-08-16 21:37:25 -07:00
layerdiffusion 9973d5dc09 better prints 2024-08-16 21:13:09 -07:00
layerdiffusion f3e211d431 fix bnb lora 2024-08-16 21:09:14 -07:00
layerdiffusion 2f0555f7dc GPU Shared Async Swap for all GGUF/BNB 2024-08-16 08:45:17 -07:00
layerdiffusion 04e7f05769 speedup swap/loading of all quant types 2024-08-16 08:30:11 -07:00
layerdiffusion 394da01959 simplify 2024-08-16 04:55:01 -07:00