fauxlars is a speedy fake data generation library.
Install the library
python -m venv .venv && source .venv/bin/activate && python -m pip install faux_larsLet's generate some data
import polars as pl
from faux_lars import generate_lazyframe
rows =5000
df = (
generate_lazyframe(
{
"clicks": pl.UInt8,
"Name": "name",
"City Name": "city_name",
"Phone": "mobile_number",
},
rows,
"en",
).collect()
)
print(df)┌────────┬──────────────────┬──────────────────────────┬────────────────┐
│ clicks ┆ Name ┆ City Name ┆ Phone │
│ --- ┆ --- ┆ --- ┆ --- │
│ u8 ┆ str ┆ str ┆ str │
╞════════╪══════════════════╪══════════════════════════╪════════════════╡
│ 207 ┆ Erik Conn ┆ Beier stad ┆ 512-797-9451 │
│ 66 ┆ Hildegard Murphy ┆ Wintheiser bury ┆ (907) 011-3125 │
│ 38 ┆ Tremayne Casper ┆ Okuneva land ┆ 933.955.5987 │
│ 157 ┆ Kamille Haley ┆ McGlynn furt ┆ 329.524.7080 │
│ 123 ┆ Erika Kozey ┆ East Favian Fisher burgh ┆ 1-846-364-8772 │
│ … ┆ … ┆ … ┆ … │
│ 25 ┆ Chadd Rosenbaum ┆ Nader furt ┆ 201-837-7966 │
│ 50 ┆ Tevin Jerde ┆ Davis shire ┆ 1-125-407-6570 │
│ 22 ┆ Antwan Jones ┆ West Sammie Hirthe shire ┆ 365-256-8860 │
│ 120 ┆ Eden McCullough ┆ Goldner shire ┆ 1-950-582-2326 │
│ 76 ┆ Tevin Batz ┆ Bauch mouth ┆ 560-202-3844 │
└────────┴──────────────────┴──────────────────────────┴────────────────┘The faux_lars python API is deliberately small and lightweight.
There are two functions:
generate_lazyframegenerate_dataframe
Each receive the following arguments:
schema: A dictionary ofcolumn_name,data_typekey, value pairs.data_typecan be a stringdata_typein the table below, or a non-complex polars DataType object.rows: The number of rows to generate.language: Any of the supported locales; see the table below.
| Data Type | en | fr | jp | ar | pt_br | zh_tw | zh_cn |
|---|---|---|---|---|---|---|---|
| polars_datatypes | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* |
| name | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| first_name | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| last_name | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| building_number | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* |
| city_name | ✓ | ||||||
| country_code | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* |
| country_name | ✓ | ||||||
| latitude | ✓ | ||||||
| longitude | ✓ | ||||||
| postcode | ✓ | ||||||
| secondary_address | ✓ | ||||||
| secondary_address_type | ✓ | ||||||
| state_abbr | ✓ | ||||||
| state_name | ✓ | ✓ | |||||
| street_name | ✓ | ✓ | |||||
| time_zone | ✓ | ||||||
| zip_code | ✓ | ||||||
| isbn | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* |
| isbn_10 | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* |
| isbn_13 | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* |
| bs | ✓ | ||||||
| company_name | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| industry | ✓ | ||||||
| profession | ✓ | ||||||
| credit_card_number | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* |
| currency_code | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* |
| currency_name | ✓ | ||||||
| currency_symbol | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* |
| dir_path | ✓ | ||||||
| file_ext | ✓ | ||||||
| file_name | ✓ | ||||||
| file_path | ✓ | ||||||
| mime_type | ✓ | ||||||
| semver | ✓ | ||||||
| semver_stable | ✓ | ||||||
| semver_unstable | ✓ | ||||||
| licence_plate | ✓ | ||||||
| health_insurance_code | ✓ | ||||||
| free_email | ✓ | ✓ | ✓ | ||||
| bic | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* |
| isin | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* |
| ip | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* |
| ipv4 | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* |
| ipv6 | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* |
| mac_address | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* |
| password | ✓ | ✓ | ✓ | ||||
| safe_email | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| user_agent | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* |
| user_name | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* |
| field | ✓ | ✓ | ✓ | ||||
| position | ✓ | ✓ | ✓ | ||||
| seniority | ✓ | ✓ | ✓ | ||||
| title | ✓ | ✓ | ✓ | ||||
| lorem | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* |
| cell_number | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* |
| phone_number | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* |
Key:
- ✓
- ✓ : supported for this locale.
- ✓* : supported, but this value does not vary by locale.
All non-complex polars DataType objects are supported. These can be passed by name as a string, or as a polars DataType object (see the example above).
For non-sring types, locale is irrelevant. A complex polars Datatype is a struct, array, list, or enum.
Locales:
en: Englishfr: Frenchjp: Japanesear: Saudi Arabian Arabicpt_br: Brazilian Portugesezh_tw: Traditional Chinesezh_cn: Simplified Chinese
- On a laptop with 16GB of RAM and 8 cores, 5,000 rows with 1 utf8 column and three string columns generates in under 0.25 seconds.
- 1,000,000 rows generates in under 7.
- Take a look at
benchmarksfor comparison with two popular python fake data generation libraries:mimesisandfaker.