Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This is with in memory database and journaling disabled?


My run on NixOS 21.05 with sqlite 3.35.2, without randomness, on an i7-7500U:

Invocation:

    command time sqlite3 ':memory:' '
    create table IF NOT EXISTS user
    (
        id INTEGER not null primary key,
        area CHAR(6),
        age INTEGER not null,
        active INTEGER not null
    );
    INSERT INTO user (area, age, active) SELECT 0, 1, 2 FROM generate_series(1, 100000000, 1);
    '
Result:

    16.34user 0.43system 0:16.89elapsed 99%CPU (0avgtext+0avgdata 1477320maxresident)k
    11inputs+0outputs (0major+369851minor)pagefaults 0swaps
Invocation with pragmas:

    command time sqlite3 ':memory:' '
    PRAGMA journal_mode = OFF;
    PRAGMA synchronous = 0;
    PRAGMA cache_size = 1000000;
    PRAGMA locking_mode = EXCLUSIVE;
    PRAGMA temp_store = MEMORY;
    create table IF NOT EXISTS user
    (
        id INTEGER not null primary key,
        area CHAR(6),
        age INTEGER not null,
        active INTEGER not null
    );
    INSERT INTO user (area, age, active) SELECT 0, 1, 2 FROM generate_series(1, 100000000, 1);
    '
Result with pragmas:

    17.31user 0.41system 0:17.85elapsed 99%CPU (0avgtext+0avgdata 1477288maxresident)k
    11inputs+0outputs (0major+369850minor)pagefaults 0swaps
As expected, the pragmas make no difference when using `:memory:` -- 17 seconds, 1.4 GB RAM each on my laptop.


I have used pragmas from the article.

Since it is only 100M rows, it takes 1.8 GB on the disk, so I've used tmpfs for this which essentially is a ramdisk. But I have a gen4 pcie nvme SSD - it can reliably write at 4GB/s sequentially, so writing takes a ~500ms for 100M rows, it is not a bottleneck here. random() takes ~half of the insert time. Generating those values with Rust, for example, is faster, but sharing this data with sqlite takes more time than generating it with random().

Maybe implementing custom virtual table in C or Rust like build-in generate_series, but the one that will produce user table fields will be faster, but that is significantly more effort than my query.

This query with random() and generate_series executed in sqlite CLI takes whooping 8MB of the RAM, so you don't even have to close all Electron-based applications to run it on a computer with 8GB of RAM.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: