http://bhlnasxdqyyqhuak6m4nuqhwwrpphec2we3mhdjofvw5lrrm65doa3yd.onion/robowaifu/res/101.html
If I were doing this, I'd write a custom program that makes API calls to a home server (a GPU with >=16GB of VRAM required)
X = Custom conversation program built on a local LLM/AI model
>Robo-wife Camera -> Image -> Server -> Image tagger ---(0.2 second delay)---> Image tags -> X
>Robo-wife Microphone -> Audio -> Server -> Speech to Text ---(0.2 second delay)---> Text -> X
>X compiles image tags and audio transcript into a response (1 to 5 seconds) -> text to speech...
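The pipeline above can be sketched roughly like this. Everything here is a stand-in: the function names, the tagger/STT/LLM/TTS calls, and the returned values are all hypothetical placeholders for whatever models actually run on the home server; a real build would replace each body with an HTTP call to the local API.

```python
# Minimal sketch of the robo-wife conversation loop. All model calls are
# stubbed out -- in practice each would be an API request to the home server.

def tag_image(image_bytes: bytes) -> list[str]:
    # Image tagger on the server, ~0.2 second round trip (stubbed result)
    return ["indoor", "person", "waving"]

def speech_to_text(audio_bytes: bytes) -> str:
    # Speech-to-text on the server, ~0.2 second round trip (stubbed result)
    return "hello, how was your day?"

def generate_reply(tags: list[str], transcript: str) -> str:
    # X: the local LLM compiles image tags and the transcript into one
    # prompt and generates a response (1 to 5 seconds on real hardware)
    prompt = f"[scene: {', '.join(tags)}] user said: {transcript}"
    return f"(reply to: {prompt})"

def text_to_speech(text: str) -> bytes:
    # TTS engine; here just encodes the text so the sketch stays runnable
    return text.encode("utf-8")

def pipeline_step(image_bytes: bytes, audio_bytes: bytes) -> bytes:
    tags = tag_image(image_bytes)             # camera -> image tags
    transcript = speech_to_text(audio_bytes)  # microphone -> text
    reply = generate_reply(tags, transcript)  # X fuses both into a response
    return text_to_speech(reply)              # speak the reply back

audio_out = pipeline_step(b"<jpeg frame>", b"<wav chunk>")
print(audio_out.decode("utf-8"))
```

Note the design point implied by the delays in the list: tagging and transcription are independent, so a real implementation could fire both server calls concurrently and only block X on whichever finishes last.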