* Add V_PREDICTION_EDM handling for CosXL models
* Get correct sigmas from checkpoint.
* Round to 3 significant digits to stay compatible with the ComfyUI implementation
* Add sigma data like ComfyUI has
---------
Co-authored-by: Gavin Chapman <gchapman@MAINPC>
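The EDM handling above boils down to a preconditioned denoise step plus rounding sigmas so they match ComfyUI's tables. A minimal sketch, assuming the standard EDM/v-prediction preconditioning and ComfyUI's sign convention; the function names are illustrative and `sigma_data` would really come from the checkpoint:

```python
import math

def round_sig(x: float, sig: int = 3) -> float:
    """Round to `sig` significant digits so sigmas read from the
    checkpoint line up with ComfyUI's values."""
    if x == 0.0:
        return 0.0
    return round(x, -int(math.floor(math.log10(abs(x)))) + (sig - 1))

def edm_v_denoise(x_noisy: float, v: float, sigma: float,
                  sigma_data: float = 1.0) -> float:
    """Recover the denoised sample from a V_PREDICTION_EDM output.

    Uses the EDM preconditioning with the v-prediction sign convention
    (as in k-diffusion / ComfyUI):
        denoised = c_skip * x + c_out * v
    sigma_data=1.0 is a placeholder; the real value is model-specific.
    """
    c_skip = sigma_data ** 2 / (sigma ** 2 + sigma_data ** 2)
    c_out = -sigma * sigma_data / math.sqrt(sigma ** 2 + sigma_data ** 2)
    return c_skip * x_noisy + c_out * v
```

At `sigma = 0` the preconditioning collapses to the identity (`c_skip = 1`, `c_out = 0`), which is a quick sanity check on the formula.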
This is an emergency fix
The GTX 1060/1050/1066 either has no shared GPU page VRAM or has less than 2GB of it; pinning any tensor larger than that will crash.
Solution is still under investigation.
The previous repeated loading (in ControlNet or other extensions) is fixed. ControlNet saves about 0.7 to 1.1 seconds on my two devices when batch count > 1.
8GB VRAM can use SDXL at resolution 6144x6144 now, out of the box, without tiled diffusion or other things.
(the max resolution on the Automatic1111 txt2img UI is 2048, but one can use highres fix to try 6144 or even 8192)
Avoid OOM (or shared VRAM being invoked) caused by computation being slower than the mover (the GPU filling with loaded but not-yet-computed tensors), by capping the max async overhead at 512MB.
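The 512MB cap above amounts to a budget check in the mover loop: stop prefetching once too much data is moved but not yet consumed by computation. A hypothetical sketch, not Forge's actual implementation; all names here are illustrative:

```python
# Hypothetical sketch of capping async-move overhead: only the 512MB
# figure comes from the note above, everything else is illustrative.
MAX_ASYNC_OVERHEAD = 512 * 1024 * 1024  # 512MB budget

def plan_async_moves(tensor_sizes, budget=MAX_ASYNC_OVERHEAD):
    """Return how many queued tensors may be moved to the GPU ahead of
    computation without exceeding `budget` bytes of loaded-but-uncomputed
    data; the rest wait until compute catches up."""
    moved = 0
    in_flight = 0
    for size in tensor_sizes:
        if in_flight + size > budget:
            break  # pause prefetch instead of risking OOM / shared VRAM
        in_flight += size
        moved += 1
    return moved
```

The point of the cap is that an unbounded mover wins nothing: tensors queued far ahead of the compute stream just sit in VRAM, which is exactly the OOM scenario described above.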
* cn forward patcher
* simplify
* use args instead of kwargs
* postpone moving cond_hint to gpu
* also do this for t2i adapter
* use a1111's code to load files in a batch
* revert
* patcher for batch images
* remove duplicate cn fn wrapper
* remove unneeded code
* use unit getattr instead of unet patcher
* fix bug
* small change
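One of the commits above postpones moving `cond_hint` to the GPU until the forward pass actually consumes it. A torch-free sketch of that pattern, with the class and attribute names entirely hypothetical; the `to()` method stands in for `tensor.to(device)`:

```python
# Hypothetical sketch of "postpone moving cond_hint to gpu": the hint is
# registered on CPU at patch time and transferred only inside forward(),
# so hints that are never used never occupy VRAM.
class LazyCondHint:
    def __init__(self, hint_data):
        self.hint_data = hint_data
        self.device = "cpu"   # stays on CPU when the patcher registers it
        self.transfers = 0    # count real device moves, for illustration

    def to(self, device):
        # Stand-in for tensor.to(device): a no-op if already there.
        if self.device != device:
            self.device = device
            self.transfers += 1
        return self

    def forward(self, sample_device):
        # The transfer happens here, at use time, not at patch time.
        self.to(sample_device)
        return self.hint_data
```

Repeated forward calls on the same device cost nothing extra, which is where the per-batch savings noted above would come from.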
Put inpaint_v26.fooocus.patch in models\ControlNet; it controls SDXL models only.
To get the same algorithm as Fooocus, set "Stop at" (Ending Control Step) to 0.5.
Fooocus always uses 0.5, but in Forge users may use other values.
Results are best when "Stop at" < 0.7; the model is not optimized for ending timesteps > 0.7.
Supports inpaint_global_harmonious, inpaint_only, inpaint_only+lama.
In theory, inpaint_only+lama should always outperform Fooocus in the object removal task (but not in all tasks).
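The "Stop at" fraction above is typically mapped to a sampling-step range inside which ControlNet guidance applies. A minimal sketch, assuming a simple linear fraction-to-step mapping (the exact mapping differs between implementations and is an assumption here):

```python
def control_active(step: int, total_steps: int,
                   start: float = 0.0, stop: float = 0.5) -> bool:
    """Whether ControlNet guidance applies at a 0-based `step`, given
    start/stop fractions like the 0.5 "Stop at" recommended above."""
    # Map the step index to a [0, 1] fraction of the sampling schedule.
    frac = step / max(total_steps - 1, 1)
    return start <= frac <= stop
```

With 20 steps and stop=0.5, guidance applies for roughly the first half of the schedule and the model finishes the image unguided, which is the behavior the Fooocus-style 0.5 setting is after.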
* Make test client run on cpu
* test on cpu
try fix device
try fix device
try fix device
* Use real SD1.5 model for testing
* ckpt nits
* Remove coverage calls
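Making the test client run on CPU (as in the commits above) usually comes down to pinning the device before the model loads so CI machines without GPUs can run the suite. A hypothetical sketch; the function name and flag are illustrative, not the repo's actual test code:

```python
def pick_device(force_cpu: bool = False) -> str:
    """Device selection for a hypothetical test harness: always 'cpu'
    when forced, so tests run on GPU-less CI machines; otherwise fall
    back to CUDA only when torch is present and reports a device."""
    if force_cpu:
        return "cpu"
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        # No torch installed (e.g. a lightweight CI job): CPU it is.
        return "cpu"
```

Wiring `force_cpu=True` into the test client keeps the tests deterministic across machines, which matters when asserting against a real SD1.5 checkpoint as the commit above does.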