Simon Willison’s Weblog

13 items tagged “pdf”

arxiv-vanity (via) Beautiful new project from Ben Firshman and Andreas Jansson: “Arxiv Vanity renders academic papers from Arxiv as responsive web pages so you don’t have to squint at a PDF”. It works by pulling the raw LaTeX source code from Arxiv and rendering it to HTML using a heavily customized Pandoc workflow. The real fun is in the architecture: it’s a Django app running on Heroku which fires up on-demand Hyper.sh Docker containers for each individual rendering job. # 25th October 2017, 8:06 pm

pdf.js. A JavaScript library for creating simple PDF files. Works (flakily) in your browser using a data:URI hack, but is also compatible with server-side JavaScript implementations such as Node.js. # 17th June 2010, 7:39 pm

node.js at JSConf.eu (PDF). node.js creator Ryan Dahl’s presentation at this year’s JSConf.eu. The principle philosophy is that I/O in web applications should be asynchronous—for everything. No blocking for database calls, no blocking for filesystem access. JavaScript is a mainstream programming language with a culture of callback APIs (thanks to the DOM) and is hence ideally suited to building asynchronous frameworks. # 17th November 2009, 6:07 pm

Adobe is Bad for Open Government. The problem isn’t just that PDFs are a bad way of sharing data, it’s that Adobe have been actively lobbying the US government to use their PDF and Flash formats for open government initiatives. # 1st November 2009, 12:51 pm

Prawn (via) Really nice PDF generation library for Ruby, used to generate Dopplr’s beautiful end of year reports. # 16th January 2009, 4:04 pm

Dopplr presents the Personal Annual Report 2008: freshly generated for you, and Barack Obama... So classy it hurts. I’d love to know what library they used to generate the PDF. # 16th January 2009, 12:17 pm

Robust Defenses for Cross-Site Request Forgery [PDF]. Fascinating report which introduces the “login CSRF” attack, where an attacker uses CSRF to log a user in to a site (e.g. PayPal) using the attacker’s credentials, then waits for them to submit sensitive information or bind the account to their credit card. The paper also includes an in-depth study of potential protection measures, including research that shows that 3-11% of HTTP requests to a popular ad network have had their referer header stripped. Around 0.05%-0.10% of requests have custom HTTP headers such as X-Requested-By stripped. # 24th September 2008, 9:40 am

PDFMiner. Useful looking PDF parsing library in Python—can produce an XML representation of the text and style information in a PDF document. # 3rd August 2008, 3:29 pm

Scaling your website with the Perlbal web server (PDF) (via) Perlbal documentation is pretty thin on the ground; this is a really useful introduction from Frank Wiles. # 17th June 2008, 10:39 pm

OSM Super-Strength Export. Awesome new feature on OpenStreetMap: you can browse to anywhere on the map, then hit “export” and download a rendered bitmap or vector (PDF and SVG) image of the currently displayed map—and because it’s OSM there’s no watermark and a very liberal usage license. # 22nd April 2008, 9:56 am

Restructured Text to Anything. Slick set of online tools for converting Restructured Text (one of the more mature wiki-style markup languages) to HTML or PDF. Includes a nice looking API. Powered by Django. # 13th September 2007, 3:54 pm

PDF Shrink. $35 OS X app that crunches down the size of PDF files—useful if you often embed photos in your presentations. # 28th May 2007, 2:21 pm

The Adobe PDF XSS Vulnerability. If you host a PDF file anywhere on your site, you’re vulnerable to an XSS attack due to a bug in Acrobat Reader versions below 8. The fix is to serve PDFs as application/octet-stream to avoid them being displayed inline. # 11th January 2007, 4:23 pm