Overview

Embeddings

Completions

Images

Roadmap

Overview -- AI Desktop Environment

I started working on this local client for AI shortly after OpenAI fired Altmann for a hot second and went offline. Confusion...WHERE ARE MY THREADS?

A few scripts were enough to get something working well enough to store the conversations in a local database. ChatGPT came back online, but I was up against a deadline and I didn't want any rude interruptions.

Development continued, at least casually.

I had pretty much finished the interface for the chat completion api when the mixed input models became available. No choice then but to handle images, and then I might as well support the image api.

For additional scope metastasis, embeddings were added to the need-to-haves after I became curious about the transformer embeddings vs the old word2vec and glove.

Development focuses on two things:

  • fully exploiting the commercial APIs
  • tools that run on the local system

in anticipation of running models locally.

Embeddings

I've been working with embeddings a lot the last few months. I got really intrigued by some of those old "king/queen" videos and wanted to see if the gpt embeddings showed the same properties.

Those old gensim videos showed how vector arithmetic could represent somewhat of a semantic transformation when working with word2vec and glove embeddings. To reproduce with modern embeddings, the software I've written has a fairly extensive set of vector operations: magnitude, dot product, 7D curl (IYKYK), etc...

I've implemented a data reduction feature using tSNE. Mulling over adding uxmap.

Mostly, I am interested in operating on blocks of texts, but to test my setup I wanted to reproduce what I'd seen with the word2vec embeddings.

This software will read the text and binary files like those in the nlpl repository (http://vectors.nlpl.eu/repository), which are the only ones I've ever used. Currently, embeddings are read from the file on-demand and put into a database. The idea here is that relatively few embeddings will be of interest and once in the database, there is no need to spend time parsing the file.

Nearest neighbors are only among those embeddings in the database, unlike with gensim which provides a list of the top 5 or 10 closest matches in the entire vocabulary. Such a thing has no applicability when using the OpenAi embeddings api with blocks of text but, I still think it would be nice to have so I will probably implement some kind of FAISS/ANN scheme.

I am releasing a version of the software that works with the nlpl files (http://vectors.nlpl.eu/repository), even though feature-wise it lags behind gensim for the target data. But if you want to play around with that data and don't want to deal with Python, you are welcome to use it.

Most of what I've done was with the GoogleNews-vectors-negative300.bin file and the English CoNLL17 corpus embeddings.

Completions

For generating conversations, I had to pick something to start with and I went with this chat completion api of OpenAIs because it looked like the quickest way forward. And I already had an account.

The software utilizes the chat completion endpoint /v1/chat/completions with OpenAI and, quite by coincidence, DeepSeek as well.

When I started using AI for applications, I thought there would be a lot of work to do with model parameters like temperature, top_p, etc... But after gpt3.5, imo, not really. There are safeguarding (eg, max_tokens) and management parameters (eg, # of responses) but the performance parameters I rarely change from the default anymore. The software keeps a record of the parameters for analysis anyway, and this might be relevant again when locally hosted (and potentially custom) models are used.

System prompts are stored with the threads. Pre-caching I was using this a bit to save on costs on chat completions.

Currently working on the Responses API, so response ids and state are on the menu. Which is fine, because I want to be able to use Gemini ultimately.

I almost went with Gemini at the beginning of this, which would have required state management from the start.

Images

Currently calls dall-e-2 and dall-e-3 with /generations, /edits and /variations endpoints.

Graphics really aren't my thing so this gets relatively short shrift. The limits of my ambitions here are to provide a way to draw on an image or create a selection mask in the browser window and feed it back to subsequent prompts.

But I have to add the newest models because they are so much better, and deal with the most recent changes in parameter structure.

Roadmap

A few months ago I felt like this software was at its first milestone. Which is to say, fully exploiting the chat completion api with the newer multimedia models. Sure, there were a few issues with the Markdown but...

The software fell behind when the newer reasoning models were introduced and I was otherwise occupied, with embeddings among other things.

Milestone keeps moving by staying in place.

Priorities:

  • parameter configs for recent models
  • Markdown issues
  • Responses API (response ids, etc...)
  • image mods
  • vector database/FAISS/ANN
  • Gemini