Fix UniPC data cast and shape broadcast in #184
This also fixes potential problems in DDIM. The cause of this bug is that A1111's `modules\models\diffusion\uni_pc\uni_pc.py` does not perform a data cast, and Forge's DDIM estimator forgot to match the broadcast shape of the sigmas.
(Meanwhile, as we fix this bug in A1111's very original and high-quality samplers, comfyanonymous still believes that Forge uses ComfyUI to sample images, e.g., ComfyUI UniPC.
Comfyanonymous is so cute.
See also the jokes here: https://github.com/lllyasviel/stable-diffusion-webui-forge/discussions/169#discussioncomment-8428689)
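As a rough illustration of the two issues named above (the exact fix lives in the repo's sampler code; `match_sigma_shape` is a hypothetical helper written for this sketch): a sigma schedule often arrives as a float64 1-D tensor, so it must be cast to the latent's dtype/device and given trailing singleton dims before it can broadcast against a `[B, C, H, W]` latent batch.

```python
import torch

def match_sigma_shape(sigma: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Cast sigma to x's dtype/device, then append singleton dims so a
    per-sample sigma of shape [B] broadcasts over latents [B, C, H, W]."""
    sigma = sigma.to(dtype=x.dtype, device=x.device)  # the missing data cast
    while sigma.dim() < x.dim():
        sigma = sigma.unsqueeze(-1)                   # [B] -> [B, 1, 1, 1]
    return sigma

x = torch.randn(2, 4, 8, 8, dtype=torch.float16)
sigma = torch.tensor([0.5, 0.25], dtype=torch.float64)
s = match_sigma_shape(sigma, x)
# s now broadcasts cleanly: x * s has shape [2, 4, 8, 8]
```

Without the cast, mixing float64 sigmas into a float16 model silently promotes tensors; without the reshape, `x * sigma` either errors or broadcasts along the wrong axis.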
* cn forward patcher
* simplify
* use args instead of kwargs
* postpone moving cond_hint to gpu
* also do this for t2i adapter
* use a1111's code to load files in a batch
* revert
* patcher for batch images
* patcher for batch images
* remove duplicated cn fn wrapper
* remove shit
* use unit getattr instead of unet patcher
* fix bug
* small change
Put inpaint_v26.fooocus.patch in models\ControlNet; it controls SDXL models only.
To get the same algorithm as Fooocus, set "Stop at" (Ending Control Step) to 0.5.
Fooocus always uses 0.5, but in Forge users may use other values.
Results are best when "Stop at" is < 0.7; the model is not optimized for ending timesteps > 0.7.
Supports inpaint_global_harmonious, inpaint_only, and inpaint_only+lama.
In theory, inpaint_only+lama always outperforms Fooocus in the object removal task (but not in all tasks).
* Make test client run on cpu
* test on cpu
try fix device
try fix device
try fix device
* Use real SD1.5 model for testing
* ckpt nits
* Remove coverage calls
Since these are built-in extensions, we can assume that at least one extension will be present.
Co-Authored-By: Andray <33491867+light-and-ray@users.noreply.github.com>
This will move all major gradio calls into the main thread rather than random gradio worker threads.
This ensures that all torch.module.to() calls are performed in the main thread, avoiding GPU memory fragmentation as much as possible.
In my tests, model moving is now 0.7 ~ 1.2 seconds faster, which means all 6GB/8GB VRAM users will get images 0.7 ~ 1.2 seconds faster on SDXL.