Blogmarks

Filters: Sorted by date

8,255 results «« first « previous page 50 / 276 next » last »»

“Learn from your chats” ChatGPT feature preview (via) 7 days ago a Reddit user posted a screenshot of what’s presumably a trial feature of ChatGPT: a “Learn from your chats” toggle in the settings.

The UI says: “Your primary GPT will continually improve as you chat, picking up on details and preferences to tailor its responses to you.”

It provides the following examples: “I move to SF in two weeks”, “Always code in Python”, “Forget everything about my last project”—plus an option to reset it.

No official announcement yet.

# 16th November 2023, 10:44 am / ai, openai, chatgpt

Fleet Context. This project took the source code and documentation for 1221 popular Python libraries and ran them through the OpenAI text-embedding-ada-002 embedding model, then made those pre-calculated embedding vectors available as Parquet files for download from S3 or via a custom Python CLI tool.

I haven’t seen many projects release pre-calculated embeddings like this, it’s an interesting initiative.

# 15th November 2023, 10:20 pm / python, ai, parquet, llms, embeddings

A Coder Considers the Waning Days of the Craft (via) James Somers in the New Yorker, talking about the impact of GPT-4 on programming as a profession. Despite the headline this piece is a nuanced take on this subject, which I found myself mostly agreeing with.

I particularly liked this bit, which reflects my most optimistic viewpoint: I think AI assisted programming is going to shave a lot of the frustration off learning to code, which I hope brings many more people into the fold:

What I learned was that programming is not really about knowledge or skill but simply about patience, or maybe obsession. Programmers are people who can endure an endless parade of tedious obstacles.

# 14th November 2023, 4:36 am / new-yorker, programming, ai, chatgpt, llms

DAK and the Golden Age of Gadget Catalogs (via) A must-read from Cabel Sasser, describing his ten year project to collect and digitize copies of the DAK gadget catalog, from 1972 to 1994.

# 13th November 2023, 4:57 am / cabel-sasser, gadgets, history

ChatGPT: Dejargonizer. I built a custom GPT. Paste in some text with unknown jargon or acronyms and it will try to guess the context and give you back an explanation of each term.

# 11th November 2023, 10:17 pm / ai, generative-ai, chatgpt, llms

The world’s largest aircraft breaks cover in Silicon Valley. “At 124.5 meters long, Pathfinder 1 dwarfs the current Goodyear airships and even the massive Stratolaunch plane designed to launch orbital rockets. It’s the largest aircraft to take to the skies since the gargantuan Hindenburg airship of the 1930s.”

# 8th November 2023, 10:12 pm / airships, zeppelins

AGI is Being Achieved Incrementally (OpenAI DevDay w/ Simon Willison, Alex Volkov, Jim Fan, Raza Habib, Shreya Rajpal, Rahul Ligma, et al). I participated in an an hour long conversation today about the new things released at OpenAI DevDay, now available on the Latent Space podcast.

# 8th November 2023, 2:50 am / podcasts, ai, openai, generative-ai, llms, podcast-appearances

Fine-tuning GPT3.5-turbo based on 140k slack messages. Ross Lazerowitz spent $83.20 creating a fine-tuned GPT-3.5 turbo model based on 140,000 of his Slack messages (10,399,747 tokens), massaged into a JSONL file suitable for use with the OpenAI fine-tuning API.

Then he told the new model “write a 500 word blog post on prompt engineering”, and it replied “Sure, I shall work on that in the morning”.

# 8th November 2023, 2:44 am / ai, slack, openai, generative-ai, llms, fine-tuning

See the History of a Method with git log -L (via) Neat Git trick from Caleb Hearth that I hadn’t seen before, and it works for Python out of the box:

git log -L :path_with_format:__init__.py

That command displays a log (with diffs) of just the portion of commits that changed the path_with_format function in the __init__.py file.

# 5th November 2023, 8:16 pm / git, python

Stripe: Online migrations at scale (via) This 2017 blog entry from Jacqueline Xu at Stripe provides a very clear description of the “dual writes” pattern for applying complex data migrations without downtime: dual write to new and old tables, update the read paths, update the write paths and finally remove the now obsolete data—illustrated with an example of upgrading customers from having a single to multiple subscriptions.

# 5th November 2023, 4:06 pm / databases, migrations, zero-downtime, stripe

YouTube: OpenAssistant is Completed—by Yannic Kilcher (via) The OpenAssistant project was an attempt to crowdsource the creation of an alternative to ChatGPT, using human volunteers to build a Reinforcement Learning from Human Feedback (RLHF) dataset suitable for training this kind of model.

The project started in January. In this video from 24th October project founder Yannic Kilcher announces that the project is now shutting down.

They’ve declared victory in that the dataset they collected has been used by other teams as part of their training efforts, but admit that the overhead of running the infrastructure and moderation teams necessary for their project is more than they can continue to justify.

# 4th November 2023, 10:14 pm / open-source, ai, generative-ai, chatgpt, llms

Hacking Google Bard—From Prompt Injection to Data Exfiltration (via) Bard recently grew extension support, allowing it access to a user’s personal documents. Here’s the first reported prompt injection attack against that.

This kind of attack against LLM systems is inevitable any time you combine access to private data with exposure to untrusted inputs. In this case the attack vector is a Google Doc shared with the user, containing prompt injection instructions that instruct the model to encode previous data into an URL and exfiltrate it via a markdown image.

Google’s CSP headers restrict those images to *.google.com—but it turns out you can use Google AppScript to run your own custom data exfiltration endpoint on script.google.com.

Google claim to have fixed the reported issue—I’d be interested to learn more about how that mitigation works, and how robust it is against variations of this attack.

# 4th November 2023, 4:46 pm / google, security, ai, prompt-injection, bard, llms, exfiltration-attacks, content-security-policy

Tracking SQLite Database Changes in Git (via) A neat trick from Garrit Franke that I hadn’t seen before: you can teach “git diff” how to display human readable versions of the differences between binary files with a specific extension using the following:

git config diff.sqlite3.binary true
git config diff.sqlite3.textconv “echo .dump | sqlite3”

That way you can store binary files in your repo but still get back SQL diffs to compare them.

I still worry about the efficiency of storing binary files in Git, since I expect multiple versions of a text text file to compress together better.

# 1st November 2023, 6:53 pm / git, sqlite

SQLite 3.44: Interactive release notes. Anton Zhiyanov compiled interactive release notes for the new release of SQLite, demonstrating several of the new features. I’m most excited about order by in aggregates—group_concat(name order by name desc)—which is something I’ve wanted in the past. Anton demonstrates how it works with JSON aggregate functions as well. The new date formatting options look useful as well.

# 1st November 2023, 3:47 pm / sqlite, anton-zhiyanov

My User Experience Porting Off setup.py (via) PyOxidizer maintainer Gregory Szorc provides a detailed account of his experience trying to figure out how to switch from setup.py to pyproject.toml for his zstandard Python package.

This kind of detailed usability feedback is incredibly valuable for project maintainers, especially when the user encountered this many different frustrations along the way. It’s like the written version of a detailed usability testing session.

# 31st October 2023, 7:57 pm / packaging, python, usability

Our search for the best OCR tool in 2023, and what we found. DocumentCloud’s Sanjin Ibrahimovic reviews the best options for OCR. Tesseract scores highly for easily machine readable text, newcomer docTR is great for ease of use but still not great at handwriting. Amazon Textract is great for everything except non-Latin languages, Google Cloud Vision is great at pretty much everything except for ease-of-use. Azure AI Document Intelligence sounds worth considering as well.

# 31st October 2023, 7:21 pm / documentcloud, ocr

I’m Sorry I Bit You During My Job Interview. The way this 2011 McSweeney’s piece by Tom O’Donnell escalates is delightful.

# 31st October 2023, 4:21 pm / funny

Microsoft announces new Copilot Copyright Commitment for customers. Part of an interesting trend where some AI vendors are reassuring their paying customers by promising legal support in the face of future legal threats:

“As customers ask whether they can use Microsoft’s Copilot services and the output they generate without worrying about copyright claims, we are providing a straightforward answer: yes, you can, and if you are challenged on copyright grounds, we will assume responsibility for the potential legal risks involved.”

# 31st October 2023, 3:35 pm / law, microsoft, ai

Through the Ages: Apple CPU Architecture (via) I enjoyed this review of Apple’s various CPU migrations—Motorola 68k to PowerPC to Intel x86 to Apple Silicon—by Jacob Bartlett.

# 30th October 2023, 10:56 pm / apple, computer-history

Making PostgreSQL tick: New features in pg_cron (via) pg_cron adds cron-style scheduling directly to PostgreSQL. It's a pretty mature extension at this point, and recently gained the ability to schedule repeating tasks at intervals as low as every 1s.

The examples in this post are really informative. I like this example, which cleans up the ever-growing cron.job_run_details table by using pg_cron itself to run the cleanup:

SELECT cron.schedule('delete-job-run-details', '0 12 * * *', $$DELETE FROM cron.job_run_details WHERE end_time < now() - interval '3 days'$$);

pg_cron can be used to schedule functions written in PL/pgSQL, which is a great example of the kind of DSL that I used to avoid but I'm now much happier to work with because I know GPT-4 can write basic examples for me and help me understand exactly what unfamiliar code is doing.

# 27th October 2023, 2:57 am / postgresql, sql

Oh-Auth—Abusing OAuth to take over millions of accounts (via) Describes an attack against vulnerable implementations of OAuth.

Let’s say your application uses OAuth against Facebook, and then takes the returned Facebook token and gives it access to the user account with the matching email address passed in the token from Facebook.

It’s critical that you also confirm the token was generated for your own application, not something else. Otherwise any secretly malicious app online that uses Facebook login could take on of their stored tokens and use it to hijack an account of your site belonging to that user’s email address.

# 26th October 2023, 3:51 pm / oauth, security

Web Components Will Outlive Your JavaScript Framework (via) A really clear explanation of the benefit of Web Components built using dependency-free vanilla JavaScript, specifically for interactive components that you might want to embed in something like a blog post. Includes a very neat minimal example component.

# 25th October 2023, 5:19 pm / javascript, web-components

chDB (via) This is a really interesting development: chDB offers “an embedded SQL OLAP Engine” as a Python package, which you can install using “pip install chdb”. What you’re actually getting is a wrapper around ClickHouse—it’s almost like ClickHouse has been repackaged into an embedded database similar to SQLite.

# 24th October 2023, 11:04 pm / python, clickhouse

Solving the Engineering Strategy Crisis (via) Will Larson’s 49m video discussing engineering strategy: what one is and how to build one. He defines an engineering strategy as having two key components: an honest diagnosis of the way things currently work, and a practical approach to making things better.

Towards the end of the talk he suggests that there are two paths to developing a new strategy. The first is to borrow top-down authority from a sponsor such as a CTO, and the second is to work without any borrowed authority, instead researching how things work at the moment and, through documenting that, write a strategy document into existence!

# 22nd October 2023, 9:18 pm / will-larson, management

Patrick Newman’s Software Engineering Management Checklist (via) This tiny document may have the highest density of good engineering management advice I’ve ever encountered.

# 22nd October 2023, 9:16 pm / management

New Default: Underlined Links for Improved Accessibility (GitHub Blog). “By default, links within text blocks on GitHub are now underlined. This ensures links are easily distinguishable from surrounding text.”

# 19th October 2023, 4:19 pm / accessibility, design, github

I’m banned for life from advertising on Meta. Because I teach Python. (via) If accurate, this describes a nightmare scenario of automated decision making.

Reuven recently found he had a permanent ban from advertising on Facebook. They won’t tell him exactly why, and have marked this as a final decision that can never be reviewed.

His best theory (impossible for him to confirm) is that it’s because he tried advertising a course on Python and Pandas a few years ago which was blocked because a dumb algorithm thought he was trading exotic animals!

The worst part? An appeal is no longer possible because relevant data is only retained for 180 days and so all of the related evidence has now been deleted.

Various comments on Hacker News from people familiar with these systems confirm that this story likely holds up.

# 19th October 2023, 2:56 pm / ethics, facebook, pandas, python, ai, meta, ai-ethics

Making CRDTs 98% more efficient (via) Outstanding piece of explanatory writing by Jake Lazaroff showing how he reduced the transmitted state of his pixel art CRDT implementation from 643KB to 15KB using a progression of tricks, each of which is meticulously explained and accompanied by an interactive demo.

# 17th October 2023, 5:15 pm / crdt

Multimodality and Large Multimodal Models (LMMs) (via) Useful, extensive review of the current state of the art of multimodal models by Chip Huyen. Chip calls them LMMs for Large Multimodal Models, a term that seems to be catching on.

# 14th October 2023, 7:51 pm / ai, generative-ai, llms

Wikimedia Commons: Photographs by Gage Skidmore (via) Gage Skidmore is a Wikipedia legend: this category holds 93,458 photographs taken by Gage and released under a Creative Commons license, including a vast number of celebrities taken at events like San Diego Comic-Con. CC licensed photos of celebrities are generally pretty hard to come by so if you see a photo of any celebrity on Wikipedia there’s a good chance it’s credited to Gage.

# 10th October 2023, 4:17 am / creativecommons, photography, wikipedia

«« first « previous page 50 / 276 next » last »»

Simon Willison’s Weblog

Blogmarks

Years

Tags