48 items tagged “testing”
promptfoo: How to benchmark Llama2 Uncensored vs. GPT-3.5 on your own inputs. promptfoo is a CLI and library for “evaluating LLM output quality”. This tutorial in their documentation about using it to compare Llama 2 to gpt-3.5-turbo is a good illustration of how it works: it uses YAML files to configure the prompts, and more YAML to define assertions such as “not-icontains: AI language model”. # 10th September 2023, 4:19 pm
pytest-icdiff (via) This is neat: “pip install pytest-icdiff” provides an instant usability upgrade to the output of failed tests in pytest, especially if the assertions involve comparing larger strings or nested JSON objects. # 3rd June 2023, 4:59 pm
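A minimal sketch of the kind of assertion where pytest-icdiff's side-by-side diff output helps; the dictionaries here are invented illustrative data, not from the project's docs:

```python
# With pytest-icdiff installed, a failing comparison like this is rendered as a
# colourised side-by-side diff instead of one long truncated AssertionError line.
def test_user_payload_matches_expected():
    expected = {"id": 1, "name": "Alice", "roles": ["admin", "editor"], "active": True}
    actual = {"id": 1, "name": "Alice", "roles": ["admin", "viewer"], "active": True}
    assert actual == expected
```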
pyfakefs usage (via) New-to-me pytest fixture library that provides a really easy way to mock Python’s filesystem functions—open(), os.listdir() and so on—so a test can run against a fake set of files. This looks incredibly useful. # 1st February 2023, 10:37 pm
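A short sketch of how pyfakefs’s fs fixture is used from a pytest test; the paths and file contents are made up for illustration:

```python
import os

def test_finds_config_files(fs):  # "fs" is the fixture pyfakefs provides to pytest
    # These files exist only in the in-memory fake filesystem, never on disk.
    fs.create_file("/etc/myapp/base.conf", contents="debug = false")
    fs.create_file("/etc/myapp/local.conf", contents="debug = true")
    assert sorted(os.listdir("/etc/myapp")) == ["base.conf", "local.conf"]
    assert open("/etc/myapp/local.conf").read() == "debug = true"
```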
How to simulate a broken database connection for testing in Django (via) Neil Kakkar explores the options using unittest.mock.patch() and then settles on a neater pattern using “with connection.execute_wrapper(QueryTimeoutWrapper()):” to simulate the exact exception he needs to test against. # 15th January 2023, 8:31 pm
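A rough sketch of that pattern, assuming pytest-django’s client fixture; the wrapper body, the URL and the expected 503 are hypothetical, but (execute, sql, params, many, context) is the call signature Django documents for execute_wrapper wrappers:

```python
from django.db import OperationalError, connection

class QueryTimeoutWrapper:
    """Raise instead of running the query, simulating a timed-out connection."""
    def __call__(self, execute, sql, params, many, context):
        raise OperationalError("simulated query timeout")

def test_dashboard_survives_database_timeout(client):
    # Every query attempted inside this block raises OperationalError.
    with connection.execute_wrapper(QueryTimeoutWrapper()):
        response = client.get("/dashboard/")
    assert response.status_code == 503  # hypothetical graceful-degradation response
```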
I gave a talk at DjangoCon US 2022 in San Diego last month about productivity on personal projects, titled “Massively increase your productivity on personal projects with comprehensive documentation and automated tests”. [... 3863 words]
mitsuhiko/insta (via) I asked for recommendations on Twitter for testing libraries in other languages that would give me the same level of delight that I get from pytest. Two people pointed me to insta by Armin Ronacher, a Rust testing framework for “snapshot testing”, which automatically records reference values in your repository so that future tests can spot when they change. # 31st October 2022, 1:06 am
I discovered a while ago that all those errors and bugs that only appear when you demo something to an audience also magically appear when you record yourself demoing it to nobody. Maybe narrating a feature to a pretend audience takes the blinders off enough that you notice little mistakes you wouldn’t have otherwise.
I’m pretty convinced that the biggest single contributor to improved software in my lifetime wasn’t object-orientation or higher-level languages or functional programming or strong typing or MVC or anything else: It was the rise of testing culture.
When you have to mock a collaborator, avoid using the Mock object directly. Either use mock.create_autospec() or mock.patch(autospec=True) if at all possible. Autospeccing from the real collaborator means that if the collaborator’s interface changes, your tests will fail. Manually speccing or not speccing at all means that changes in the collaborator’s interface will not break your tests that use the collaborator: you could have 100% test coverage and your library would fall over when used!
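A short illustration of the difference, using a made-up PaymentGateway collaborator: because the mock is autospecced, a call that no longer matches the real charge() signature fails loudly instead of silently passing.

```python
import pytest
from unittest import mock

class PaymentGateway:
    def charge(self, amount, currency):
        """Real collaborator; imagine this talks to an external payment service."""

def test_autospec_enforces_the_real_interface():
    gateway = mock.create_autospec(PaymentGateway, instance=True)
    gateway.charge(100, "USD")                        # fine: matches charge(amount, currency)
    gateway.charge.assert_called_once_with(100, "USD")
    with pytest.raises(TypeError):
        gateway.charge(100)                           # a plain Mock() would happily accept this
```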
Blazing fast CI with pytest-split and GitHub Actions (via) pytest-split is a neat looking variant on the pattern of splitting up a test suite to run different parts of it in parallel on different machines. It involves maintaining a periodically updated JSON file in the repo recording the average runtime of different tests, to enable them to be more fairly divided among test runners. Includes a recipe for running as a matrix in GitHub Actions. # 22nd February 2021, 7:06 pm
Litestream runs continuously on a test server with generated load and streams backups to S3. It uses physical replication so it’ll actually restore the data from S3 periodically and compare the checksum byte-for-byte with the current database.
I’ve been making a lot of progress on Datasette Cloud this week. As an application that provides private hosted Datasette instances (initially targeted at data journalists and newsrooms) the majority of the code I’ve written deals with permissions: allowing people to form teams, invite team members, promote and demote team administrators and suchlike. [... 933 words]
parameterized. I love the @pytest.mark.parametrize decorator in pytest, which lets you run the same test multiple times against multiple parameters. The only catch is that the decorator doesn’t work for old-style unittest TestCase tests, which means you can’t easily add it to test suites that were built using the older model. I just found out about parameterized, which works with unittest tests whether or not you are running them using the pytest test runner. # 19th February 2019, 9:05 pm
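A minimal example of the library applied to an old-style unittest TestCase; the arithmetic cases are invented:

```python
import unittest
from parameterized import parameterized

class AdditionTests(unittest.TestCase):
    @parameterized.expand([
        ("positives", 2, 3, 5),
        ("zero", 0, 0, 0),
        ("negatives", -1, -2, -3),
    ])
    def test_add(self, _name, a, b, expected):
        self.assertEqual(a + b, expected)

# Runs as three separate tests under either `python -m unittest` or `pytest`.
```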
Hynek Schlawack: Testing & Packaging (via) “How to ensure that your tests run code that you think they are running, and how to measure your coverage over multiple tox runs (in parallel!)”—Hynek makes a convincing argument for putting your packaged Python code in a src/ directory for ease of testing and coverage. # 22nd May 2018, 10:12 pm
I’ve heard managers and teams mandating 100% code coverage for applications. That’s a really bad idea. The problem is that you get diminishing returns on your tests as the coverage increases much beyond 70% (I made that number up… no science there). Why is that? Well, when you strive for 100% all the time, you find yourself spending time testing things that really don’t need to be tested. Things that really have no logic in them at all (so any bugs could be caught by ESLint and Flow). Maintaining tests like this actually really slows you and your team down.
Cypress (via) Promising looking new open source testing framework for full-blown web integration testing—a modern alternative to Selenium. I spent five minutes playing with the demo and was really impressed by it—especially their “time travel” feature which lets you hover over a passed test and see the state of the browser when each of those assertions was executed. # 11th October 2017, 4:14 pm
One interesting quirk of Pinboard is a complete absence of unit tests. I used to be a die-hard believer in testing, but in Pinboard I tried a different approach, as an experiment. Instead of writing tests I try to be extremely careful in coding, and keep the code size small so I continue to understand it. I’ve found my defect rate to be pretty comparable to earlier projects that included extensive test suites and fixtures, but I am much more productive on Pinboard.
twitter-text-conformance (via) This is a neat idea: Twitter have released open source libraries for parsing standard tweet syntax in Ruby and Java, but they’ve also released a set of YAML unit tests aimed at anyone who wants to implement the same parsing logic in other languages. # 6th February 2010, 3:39 pm
rlisagor’s freshen. A Python clone of Ruby’s innovative Cucumber testing framework. Tests are defined as a set of plain-text scenarios, which are then executed by being matched against test functions decorated with regular expressions. Has anyone used this or Cucumber? I’m intrigued but unconvinced—are the plain text scenarios really a useful way of defining tests? # 5th January 2010, 7:30 pm
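Freshen’s actual decorators aren’t reproduced here; as a self-contained illustration of the general mechanism described above (plain-text steps matched against functions registered with regular expressions), a toy version might look like this:

```python
import re

STEPS = []

def step(pattern):
    """Hypothetical decorator: register a step function against a regular expression,
    the way Cucumber-style frameworks such as freshen do."""
    def register(fn):
        STEPS.append((re.compile(pattern), fn))
        return fn
    return register

@step(r"I have entered (\d+) into the calculator")
def enter_number(context, number):
    context.setdefault("entries", []).append(int(number))

@step(r"the result should be (\d+)")
def check_result(context, expected):
    assert sum(context["entries"]) == int(expected)

def run_scenario(lines):
    """Match each plain-text step against a registered pattern and invoke it."""
    context = {}
    for line in lines:
        for pattern, fn in STEPS:
            match = pattern.search(line)
            if match:
                fn(context, *match.groups())
                break

run_scenario([
    "Given I have entered 4 into the calculator",
    "And I have entered 5 into the calculator",
    "Then the result should be 9",
])
```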
There is no WebKit on Mobile. PPK ran 27 tests against 19 different WebKit-on-mobile implementations and found enormous disparities between the levels of support in currently available mobile phones. # 7th October 2009, 12:23 pm
Test-Driven Heresy. Tim Bray advocates TDD for maintenance development, but argues that it may not be as useful during the exploratory, greenfield development phase of a project. # 24th June 2009, 11:03 am
Testing Django Views for Concurrency Issues. Neat decorator for executing a Django view under high concurrency in your unit tests, to help spot errors caused by database race conditions in code that should have been executed inside a transaction. # 27th May 2009, 10:01 am
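The article defines its own decorator; a generic sketch of the idea (not the article’s exact code, with an arbitrary thread count) might look like this:

```python
import functools
import threading

def run_concurrently(times=10):
    """Hypothetical decorator: invoke the wrapped test body from several threads at once,
    so race conditions around unguarded database updates have a chance to surface."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            errors = []
            def target():
                try:
                    fn(*args, **kwargs)
                except Exception as exc:  # collect failures from worker threads
                    errors.append(exc)
            threads = [threading.Thread(target=target) for _ in range(times)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            assert not errors, errors
        return wrapper
    return decorator
```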
Nose 0.11 released. My favourite Python testing tool just got some really neat new features, including the ability to parallelize tests across multiple processes (and hence CPUs) using the multiprocessing module, Xunit XML output for integration with continuous integration tools and a --failed switch to re-run only the last batch of failed tests. # 8th May 2009, 11:24 am
Continuous deployment in 5 easy steps. A classic case of a number in a title making the article look less interesting than it actually is. Lots of interesting information here from IMVU’s Eric Ries. # 1st April 2009, 12:25 am
It may be hard to imagine writing rock solid one-in-a-million-or-better tests that drive Internet Explorer to click ajax frontend buttons executing backend apache, php, memcache, mysql, java and solr. I am writing this blog post to tell you that not only is it possible, it’s just one part of my day job.