Commit Graph

202 Commits

Author SHA1 Message Date
layerdiffusion a8a81d3d77 fix offline quant lora precision 2024-08-31 13:12:23 -07:00
layerdiffusion 79b25a8235 move code 2024-08-31 11:31:02 -07:00
layerdiffusion 33963f2d19 always compute on-the-fly LoRA weights when offloading 2024-08-31 11:24:23 -07:00
layerdiffusion 70a555906a use safer code 2024-08-31 10:55:19 -07:00
layerdiffusion 1f91b35a43 add signal_empty_cache 2024-08-31 10:20:22 -07:00
layerdiffusion ec7917bd16 fix 2024-08-30 15:37:15 -07:00
layerdiffusion d1d0ec46aa Maintain patching-related code
1. fix several problems related to LayerDiffuse not being unloaded
2. fix several problems related to Fooocus inpaint
3. slightly speed up on-the-fly LoRAs by precomputing them in the computation dtype
2024-08-30 15:18:21 -07:00
layerdiffusion f04666b19b Attempt #1575 2024-08-30 09:41:36 -07:00
layerdiffusion 4c9380c46a Speed up quant model loading and inference ...
... based on three observations:
1. torch.Tensor.view on one big tensor is slightly faster than calling torch.Tensor.to on multiple small tensors.
2. but torch.Tensor.to with a dtype change is significantly slower than torch.Tensor.view
3. “baking” the model on GPU is significantly faster than computing on CPU at model load time.

mainly affects inference of Q8_0, Q4_0/1/K and loading of all quants
2024-08-30 00:49:05 -07:00
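The first two observations above can be illustrated with a small CPU-only sketch (hypothetical illustration, not code from the repo): `view(dtype)` reinterprets the existing bytes in place, while `.to(dtype)` with a real dtype change must allocate a new buffer and convert every element.

```python
import torch

# view(dtype) is a zero-copy reinterpretation of the same storage;
# .to(dtype) with a dtype change allocates and converts element-wise.
w = torch.randn(1024, dtype=torch.float16)

reinterpreted = w.view(torch.int16)   # same bytes, same storage
converted = w.to(torch.float32)       # new storage, per-element cast

print(reinterpreted.data_ptr() == w.data_ptr())  # True: no copy made
print(converted.data_ptr() == w.data_ptr())      # False: fresh buffer
```

This is why one `view` over a single large tensor can beat many small `.to` calls: the former never touches the data at all.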
layerdiffusion 3d62fa9598 reduce prints 2024-08-29 20:17:32 -07:00
layerdiffusion 95e16f7204 Maintain loading-related code
1. revise model moving order
2. less verbose printing
3. some misc minor speedups
4. some bitsandbytes (bnb) related maintenance
2024-08-29 19:05:48 -07:00
layerdiffusion d339600181 fix 2024-08-28 09:56:18 -07:00
layerdiffusion 81d8f55bca support pytorch 2.4 new normalization features 2024-08-28 09:08:26 -07:00
layerdiffusion 0abb6c4686 Second Attempt for #1502 2024-08-28 08:08:40 -07:00
layerdiffusion f22b80ef94 restrict baking to 16bits 2024-08-26 06:16:13 -07:00
layerdiffusion 388b70134b fix offline loras 2024-08-25 20:28:40 -07:00
layerdiffusion b25b62da96 fix T5 not baked 2024-08-25 17:31:50 -07:00
layerdiffusion cae37a2725 fix dequant of unbaked parameters 2024-08-25 17:24:31 -07:00
layerdiffusion 13d6f8ed90 revise GGUF by precomputing some parameters
rather than computing them in each diffusion iteration
2024-08-25 14:30:09 -07:00
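The precomputation idea above can be sketched as follows (a hedged, hypothetical illustration of a Q8_0-style block-quantized weight, not the repo's implementation): the per-element scale expansion is done once at load, so each diffusion step only pays for the cheap multiply.

```python
import torch

BLOCK = 32  # Q8_0-style: one fp16 scale per block of 32 int8 values

def precompute_scales(scales):
    # done once at model load: expand per-block scales to per-element
    return scales.repeat_interleave(BLOCK)

def dequant(qvalues, expanded_scales):
    # done every diffusion iteration: now just a cast and a multiply
    return qvalues.to(torch.float16) * expanded_scales
```

Without the precompute step, the `repeat_interleave` (or equivalent indexing) would run inside every iteration instead of once.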
lllyasviel f82029c5cf support more t5 quants (#1482)
let's hope this is the last time that people randomly invent new state dict key formats
2024-08-24 12:47:49 -07:00
layerdiffusion f23ee63cb3 always set empty cache signal as long as any patch happens 2024-08-23 08:56:57 -07:00
layerdiffusion 2ab19f7f1c revise lora patching 2024-08-22 11:59:43 -07:00
layerdiffusion 68bf7f85aa speed up nf4 lora in offline patching mode 2024-08-22 10:35:11 -07:00
layerdiffusion 95d04e5c8f fix 2024-08-22 10:08:21 -07:00
layerdiffusion 14eac6f2cf add a way to empty cuda cache on the fly 2024-08-22 10:06:39 -07:00
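A signal-based "empty cache on the fly" mechanism (as this and the later signal_empty_cache commits describe) might look like the following hypothetical sketch: patching code raises a flag, and the sampling loop releases the CUDA allocator cache at a safe point rather than mid-operation. The function names are illustrative, not the repo's actual API.

```python
import torch

_need_empty_cache = False  # module-level flag set by patching code

def signal_empty_cache():
    # called from anywhere a patch invalidates cached GPU memory
    global _need_empty_cache
    _need_empty_cache = True

def maybe_empty_cache():
    # called by the sampling loop at a safe point; consumes the flag
    global _need_empty_cache
    if _need_empty_cache:
        _need_empty_cache = False
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        return True
    return False
```

Decoupling the request from the actual `torch.cuda.empty_cache()` call keeps the (expensive) allocator flush out of hot code paths.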
layerdiffusion 909ad6c734 fix prints 2024-08-21 22:24:54 -07:00
layerdiffusion 0d8eb4c5ba fix #1375 2024-08-21 11:01:59 -07:00
layerdiffusion 4e3c78178a [revised] change some dtype behaviors based on community feedback
only affects older devices like the 1080/70/60/50.
please remove your cmd flags if you are on a 1080/70/60/50 and previously used many cmd flags to tune performance
2024-08-21 10:23:38 -07:00
layerdiffusion 1419ef29aa Revert "change some dtype behaviors based on community feedbacks"
This reverts commit 31bed671ac.
2024-08-21 10:10:49 -07:00
layerdiffusion 31bed671ac change some dtype behaviors based on community feedback
only affects older devices like the 1080/70/60/50.
please remove your cmd flags if you are on a 1080/70/60/50 and previously used many cmd flags to tune performance
2024-08-21 08:46:52 -07:00
layerdiffusion 1096c708cc revise swap module name 2024-08-20 21:18:53 -07:00
layerdiffusion 5452bc6ac3 All Forge Spaces Now Pass on 4GB VRAM
and they all 100% reproduce the authors' results
2024-08-20 08:01:10 -07:00
layerdiffusion 6f411a4940 fix LoRAs on NF4 models when "loras in fp16" is activated 2024-08-20 01:29:52 -07:00
layerdiffusion 475524496d revise 2024-08-19 18:54:54 -07:00
layerdiffusion d7151b4dcd add low vram warning 2024-08-19 11:08:01 -07:00
layerdiffusion 2f1d04759f avoid some mysterious problems when using many Python local delegations 2024-08-19 09:47:04 -07:00
layerdiffusion 96f264ec6a add a way to save models 2024-08-19 06:30:49 -07:00
layerdiffusion d03fc5c2b1 speed up a bit 2024-08-19 05:06:46 -07:00
layerdiffusion d38e560e42 Implement some rethinking of the LoRA system
1. Add an option to allow users to run the UNet in fp8/gguf but LoRAs in fp16.
2. FP16 LoRAs do not need patching at all. Others are only patched again when the LoRA weight changes.
3. FP8 UNet + fp16 LoRA is now available in Forge (and arguably only in Forge). This also solves some “LoRA effect too subtle” problems.
4. Significantly speed up all gguf models (in Async mode) by using an independent thread (CUDA stream) to compute and dequantize at the same time, even when low-bit weights are already on the GPU.
5. Treat the “online lora” as a module similar to ControlLoRA so that it is moved to the GPU together with the model when sampling, achieving significant speedup and perfect low-VRAM management simultaneously.
2024-08-19 04:31:59 -07:00
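Point 2 above (re-patch only when the LoRA weight changes) can be sketched roughly like this; the class and its fields are hypothetical, chosen only to illustrate the caching idea:

```python
import torch

# Hypothetical sketch: keep the unpatched base weight and rebuild the
# patched weight only when the requested LoRA strength changes; repeated
# calls with the same strength reuse the cached result.
class CachedLoraPatch:
    def __init__(self, base, down, up):
        self.base = base               # original module weight
        self.down, self.up = down, up  # low-rank LoRA factors
        self._strength = None          # strength currently baked in
        self._cached = None

    def weight(self, strength):
        if strength != self._strength:
            delta = (self.up @ self.down) * strength
            self._cached = (self.base + delta).to(self.base.dtype)
            self._strength = strength
        return self._cached
```

With this shape, an unchanged strength costs a single comparison per forward pass, which is why only genuine weight changes trigger re-patching.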
layerdiffusion e5f213c21e upload some GGUF supports 2024-08-19 01:09:50 -07:00
layerdiffusion 53cd00d125 revise 2024-08-17 23:03:50 -07:00
layerdiffusion db5a876d4c completely solve all LoRA OOMs 2024-08-17 22:43:20 -07:00
layerdiffusion 8a04293430 fix some gguf loras 2024-08-17 01:15:37 -07:00
layerdiffusion ab4b0d5b58 fix some mem leak 2024-08-17 00:19:43 -07:00
layerdiffusion 3da7de418a fix layerdiffuse 2024-08-16 21:37:25 -07:00
layerdiffusion 9973d5dc09 better prints 2024-08-16 21:13:09 -07:00
layerdiffusion f3e211d431 fix bnb lora 2024-08-16 21:09:14 -07:00
layerdiffusion 2f0555f7dc GPU Shared Async Swap for all GGUF/BNB 2024-08-16 08:45:17 -07:00
layerdiffusion 04e7f05769 speedup swap/loading of all quant types 2024-08-16 08:30:11 -07:00
layerdiffusion 394da01959 simplify 2024-08-16 04:55:01 -07:00