Comments Page - How to Get Foreign Keys Horribly Wrong

How to Get Foreign Keys Horribly Wrong (hakibenita.com)
Submitted by Bogdanp 4 days ago

cogman10 a day ago
This sort of thing hasn't really done much to make me like ORMs.
It seems like a lot of code to generate the tables in the first place and you STILL need to read the output scripts just to ensure the ORM isn't generating some garbage you didn't want.
That seems like a lot of extra effort when a simple migration service (such as liquibase) could do the same work running SQL directly. No question on "which indexes are getting created and why". No deep knowledge of Django interactions with sql. Instead, it's just directly running the SQL you want to run.
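To make the "just run the SQL you want" approach concrete, here is a minimal sketch of a hand-rolled migration runner. It is not how Liquibase works internally. The `schema_migrations` table, the `MIGRATIONS` list, and `apply_migrations` are hypothetical names chosen for illustration, and SQLite from the Python standard library stands in for a production database.

```python
import sqlite3

# Hypothetical, hand-written migrations: (version, SQL) pairs applied in order.
MIGRATIONS = [
    (1, "CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "CREATE TABLE book (id INTEGER PRIMARY KEY, "
        "author_id INTEGER NOT NULL REFERENCES author(id), title TEXT NOT NULL)"),
    # The index is spelled out, so there is no question about where it came from.
    (3, "CREATE INDEX book_author_id_ix ON book (author_id)"),
]


def apply_migrations(conn):
    """Apply every migration not yet recorded; return the versions applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)"
    )
    done = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    applied = []
    for version, sql in MIGRATIONS:
        if version in done:
            continue
        with conn:  # commits when the block exits without an error
            conn.execute(sql)
            conn.execute(
                "INSERT INTO schema_migrations (version) VALUES (?)", (version,)
            )
        applied.append(version)
    return applied


conn = sqlite3.connect(":memory:")
first_run = apply_migrations(conn)   # applies everything
second_run = apply_migrations(conn)  # nothing left to apply
print(first_run, second_run)
```

Running it twice shows the one property such a tool really needs: already-applied versions are skipped.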
- wvenable a day ago
  I do read my migration scripts generated from an ORM to make sure my source code is correct.
  Liquibase starts with "Write your database change code in your preferred authoring tool in SQL, YAML, JSON, or XML." So instead of having my ORM generate the change scripts and only having to read them to ensure correctness, I have to write them manually? I don't see how that's comparable.
  Liquibase could certainly come in after I have some SQL scripts generated from my ORM and do whatever it does.
  viapivov 16 hours ago
  For some reason, programmers still think that writing is always slower than reading
- teaearlgraycold a day ago
  I would say automatic migration generation isn’t a necessary or particularly important part of an ORM. ORMs are there to map your database relational objects to your client language’s objects.
  cjs_ac a day ago
  I think the person you're replying to is arguing for using some sort of database migration library without using an ORM library. It's the same position I came to recently.
  teaearlgraycold a day ago
  Yes but they seem to have switched because they didn’t like ORM-generated migration code. I think that’s a bad reason to switch because it wasn’t an important part of ORMs in the first place. Basically, I want to know why they were even using ORMs before.
  I don’t want to go without an ORM because I’ll end up making one ad-hoc anyway. I’m not going to do work on raw tuples in my application code.
  Tostino a day ago
  I'd call it an anti-feature for most long-lived projects that will end up needing migrations through its lifetime.
  I go the liquibase route for migrations, and just use the mapping portion of any ORM.
  pphysch a day ago
  Most(?) devs nowadays are introduced to database migration tools as a DX feature.
  "Wow, 1-2 commands and my app and database are in sync!"
  In reality, migration tools are 100% about data loss prevention.
  If you do not care about data loss, updating your schema is trivial, just drop everything and create. Dev environments should be stateless anyways, using separate data "fixtures" when needed.
  Data loss itself is a highly nuanced topic. Some data is replaceable, some might be protected in a separate store. So I agree that ORMs should challenge the assumption that automatic migration tools need to be part of their kitchen sink.
  wagwang a day ago
  The ORM auto migration tools are 100% a DX feature. Obviously any serious application will have complicated migrations that outgrow the generated SQL; that doesn't mean it's not a nice-to-have feature for quick iteration.
  pphysch 5 hours ago
  I largely agree, but the cargo-culting around every migration operation being "reversible" suggests everyone isn't on the same page here. Too much straddling of the line (gulf) between DX tool and production data management tool.
  teaearlgraycold a day ago
  I like that they provide the basic structure of how to apply yet unseen migrations. But they don’t need to generate the SQL at all. You quickly learn to never trust the generated code. It always needs to be manually reviewed.
aidos a day ago
I’ve done a lot of interviewing and I’ve discovered that many devs (even experienced ones) don’t understand the difference between indexes and foreign keys.
My assumption is that people have used orms that automatically add the index for you when you create a relationship so they just conflate them all. Often they’ll say that a foreign key is needed to improve the performance and when you dig into it, their mental model is all wrong. The sense they have is that the other table gets some sort of relationship array structure to make lookups fast.
It’s an interesting phenomenon of the abstraction.
Don’t get me wrong, I love sqlalchemy and alembic but probably because I understand what’s happening underneath so I know the right way to hold it so things are efficient and migrations are safe.
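A small sketch of the distinction, using SQLite from the Python standard library as a stand-in (the table and index names are made up for illustration). Declaring the foreign key only adds a constraint. The index on the referencing column has to be created separately, and only then does the lookup plan change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL REFERENCES parent(id)  -- constraint only
    );
    """
)


def index_names(table):
    # PRAGMA index_list returns one row per index; column 1 is the index name.
    return [row[1] for row in conn.execute(f"PRAGMA index_list('{table}')")]


def plan(sql):
    # The last column of EXPLAIN QUERY PLAN output is the human-readable detail.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT id FROM child WHERE parent_id = 1"

indexes_before = index_names("child")  # the FK alone created no index
plan_before = plan(query)              # a scan of the child table

conn.execute("CREATE INDEX child_parent_id_ix ON child (parent_id)")
indexes_after = index_names("child")
plan_after = plan(query)               # now a search using the new index

print(indexes_before, "|", plan_before)
print(indexes_after, "|", plan_after)
```

The exact plan text varies between SQLite versions, and other databases have their own `EXPLAIN` output, so treat this as a way to check the behavior rather than as a statement about any particular engine.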
- alexjplant 21 hours ago
  During a work meeting I once suggested using a non-PK column in a Postgres database for a foreign key. A coworker confidently said that we shouldn't because joins would be slow. I pointed out that we could create an index on that column and they rebutted by claiming that PKs created some kind of "special" index. I didn't want to burn goodwill and so didn't push it further but it always struck me as silly.
  Depending upon the database storage engine, available memory, and table size I could see there being _some_ performance hit if only PKs are used for statistics but I'd think that modern RDBMSes are smart enough to cache appropriately. Am I missing something?
  quectophoton 19 hours ago
  > and they rebutted by claiming that PKs created some kind of "special" index
  Maybe they were thinking about something like the "clustered indexes" from SQL Server, and mistakenly thought PostgreSQL also worked like that:
  > "When you create a PRIMARY KEY constraint, a unique clustered index on the column or columns is automatically created if a clustered index on the table doesn't already exist and you don't specify a unique nonclustered index." [1]
  > "Clustered indexes sort and store the data rows in the table or view based on their key values." [2]
  So I'm guessing you could squeeze some extra performance for certain access patterns, maybe? I have not worked at any place where I had needed to worry about low level details like this, though, so obligatory disclaimer to take this comment with a grain of salt due to my lack of first-hand experience.
  [1]: https://learn.microsoft.com/en-us/sql/relational-databases/i...
  [2]: https://learn.microsoft.com/en-us/sql/relational-databases/i...
  whatevaa 8 hours ago
  In index-organized tables, primary keys are special: the table is organized by primary key, and secondary indexes point to the primary key.
  In Postgres, a primary key is basically a unique index with some special semantics.
- Fishkins a day ago
  Huh, that's interesting. Mixing indexes and FKs is a major conceptual error.
  FWIW, I've also asked everyone I've interviewed in the past decade about indexes and FKs. Most folks I've talked to seem to understand FKs. They're often fuzzier on the details of indexes, but I don't recall anyone conflating the two.
  aidos a day ago
  I guess it depends on how much time you’ve spent in a relational db. For people who mostly interact with them via an orm, I can see where the confusion comes from.
- bevr1337 a day ago
  > their mental model is all wrong.
  Is it? In Postgres, all FK references must be to a column with a PK or unique constraint or part of another index. Additionally, Postgres and Maria (maybe all SQL?) automatically create indexes for PKs and unique constraints. There's a high likelihood that a foreign key is already indexed _in the other table_.
  Generally, I agree with your statement. Adding a FK won't magically improve performance or create useful indices. But, the presence of a FK or refactoring to support a FK does (tangentially) point back to that index.
  aidos a day ago
  I wasn’t totally clear on my original statement. As you point out, the referenced columns in the referenced table need to have a unique constraint and that’s done with a unique index. My understanding is that this ensures there’s no ambiguity as to which row is referenced and allows for efficient enforcement of the FK constraint.
  Django automatically creates an index on the referencing table to ensure that joins are fast. The fact that you have the relationship in the ORM means that’s how you’re likely to access the data so it makes perfect sense.
  The mental model mismatch I’ve seen is that people appear to think of the relationship as being on the parent object “pointing” at the child table.
  bevr1337 19 hours ago
  I'll admit my experience in Django is only migrating customers off Django. Thanks for adding some interesting details about that ecosystem
  ak39 a day ago
  By definition, a FK has to reference a PK in the “parent”.
  aidos a day ago
  Not quite. It can reference any combination of columns with a unique index (of which the PK is one by definition).
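A minimal sketch of a foreign key that targets a unique, non-PK column combination. SQLite (Python standard library) is used as a stand-in, and the `product` / `order_line` schema is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE product (
        id INTEGER PRIMARY KEY,
        vendor TEXT NOT NULL,
        sku TEXT NOT NULL,
        UNIQUE (vendor, sku)            -- not the PK, but unique
    );
    CREATE TABLE order_line (
        id INTEGER PRIMARY KEY,
        vendor TEXT NOT NULL,
        sku TEXT NOT NULL,
        FOREIGN KEY (vendor, sku) REFERENCES product (vendor, sku)
    );
    INSERT INTO product (vendor, sku) VALUES ('acme', 'A-1');
    """
)

# A row that matches an existing (vendor, sku) pair is accepted.
conn.execute("INSERT INTO order_line (vendor, sku) VALUES ('acme', 'A-1')")

# A row with no matching parent is rejected by the constraint.
try:
    conn.execute("INSERT INTO order_line (vendor, sku) VALUES ('acme', 'missing')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("orphan rejected:", rejected)
```

The unique constraint on the referenced columns is what lets the database identify exactly one parent row per child row.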
  UltraSane a day ago
  Yes. Not understanding the difference means you really don't understand the relational model. It would be like a network engineer not understanding the difference between IP and MAC addresses.
- whyowhy3484939 a day ago
  Very strange if you ask me and disturbing. I don't know if I'd let such a dev touch a database. Of course nowadays we just vibe code and YOLO everything, but still. This is making me feel old.
- hobs a day ago
  An index is one thing (and important and good), but an FK allows you to completely eschew IO if done right. In other words "I guarantee that all values in this list exist in that list" is a great simple optimization path and some sql engines can use it to avoid joining data or checking for existence at all.
miggol a day ago
I don't want to defend Django here, surely this should be categorized as a bug. But on the other hand, for this situation to come up you have to be the following:
- The kind of person to dive into the schema and worry about an unnecessary index
- Smart enough to heed Django's warnings and use `UniqueConstraint` in `Meta.constraints`
- Dumb enough to ignore Django's warnings and not use `Meta.indexes`
I think it's funny that the kind of dev that 100% relies on the ORM and would benefit from this warning would probably never find themselves in this gritty optimization situation in the first place.
That being said, I enjoyed the article and learned something so maybe I'm the target audience and not them.
jihadjihad a day ago
> Django will implicitly add an index on a ForeignKey field unless explicitly stated otherwise.
This is nice to know if you're using Django, but as important to note is that neither Postgres nor SQLAlchemy / Alembic will do this automatically.
northhnbesthn 16 hours ago
This seems like a fairly ORM 100-level thing. Read the documentation that comes with your ORM; I guarantee it has a best practices section and a performance considerations section. N+1 is trivial and widely covered by every ORM, as it’s a common thing. NB: I use Entity Framework, so YMMV, but EF Core, especially newer versions with one-way updates and deletes, has been good to me, and migrations aren’t half bad either. The code is, I think, much easier on the eyes than Python/Django, but that’s a personal preference.
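For anyone who hasn't met the N+1 pattern, here is a rough sketch of it without any ORM, using SQLite from the Python standard library. The `author` / `book` schema is hypothetical, and the trace callback simply records each statement the connection executes, so the two approaches can be compared by statement count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES author(id),
        title TEXT NOT NULL
    );
    INSERT INTO author (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO book (author_id, title) VALUES (1, 'A1'), (2, 'B1'), (3, 'C1');
    """
)

statements = []
conn.set_trace_callback(statements.append)  # record every SQL statement executed

# N+1 pattern: one query for the books, then one more query per book.
n_plus_one = []
books = conn.execute("SELECT title, author_id FROM book ORDER BY id").fetchall()
for title, author_id in books:
    name = conn.execute(
        "SELECT name FROM author WHERE id = ?", (author_id,)
    ).fetchone()[0]
    n_plus_one.append((title, name))
n_plus_one_count = len(statements)

statements.clear()

# Same result with a single JOIN.
joined = conn.execute(
    "SELECT book.title, author.name FROM book "
    "JOIN author ON author.id = book.author_id ORDER BY book.id"
).fetchall()
join_count = len(statements)

print(n_plus_one_count, join_count)
```

An ORM's eager-loading options are, roughly speaking, a way of asking for the second shape instead of the first.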
lelanthran 16 hours ago
FTFA
> Don't use unique_together, use UniqueConstraint instead
> Always explicitly set db_index and add a comment
> Always check the SQL generated by migrations before applying them
> Provide reverse migration operations whenever possible
> Use concurrent index operations in busy systems
> Indexes on foreign keys are used indirectly by deletes
> Use partial indexes when possible
> Adjust the order of migration operations to reduce impact on live systems
> Explicitly state tables to lock when using select_for_update
> Use no_key=True when selecting a row for update to allow referencing objects to be created
This seems like you need much more non-essential knowledge than simply knowing SQL. Just not using an ORM bypasses so many footguns!
TBH, looking at the code to do this, it seems simpler to just bypass the ORM and use a thin wrapper to the DB.
Since the language in use here supports reflection, every query can automagically return a proper object with typed fields, at which point you gotta ask yourself "what value am I getting out of the ORM in exchange for all these footguns?"
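A rough sketch of what such a thin wrapper could look like in Python, assuming dataclasses as the "proper object" and SQLite as the database. `fetch_as` and `Author` are hypothetical names, and note that dataclass annotations are not enforced at runtime, so "typed" here means typed for readers and static checkers.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Author:
    id: int
    name: str


def fetch_as(conn, cls, sql, params=()):
    """Run a query and build one `cls` instance per row, matching columns by name."""
    cursor = conn.execute(sql, params)
    columns = [desc[0] for desc in cursor.description]
    wanted = {f.name for f in fields(cls)}  # dataclass field names via reflection
    return [
        cls(**{col: val for col, val in zip(columns, row) if col in wanted})
        for row in cursor
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO author (name) VALUES ('Ann'), ('Bob')")

authors = fetch_as(conn, Author, "SELECT id, name FROM author ORDER BY id")
print(authors)
```

The SQL stays visible at the call site, which is the trade being argued for: no generated queries to audit, at the cost of writing them yourself.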
rrauenza a day ago
How can we determine if an index can be satisfied by a constraint index?
For example, does the FK need to be the first field in a unique together?
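One way to answer this empirically is to ask the query planner. The sketch below uses SQLite from the Python standard library with a made-up `membership` table. The general rule for B-tree indexes is that a composite index serves lookups on a leading prefix of its columns, but it is worth confirming with `EXPLAIN` on the database you actually run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE membership (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        group_id INTEGER NOT NULL,
        note TEXT,
        UNIQUE (user_id, group_id)   -- backed by one composite index
    );
    """
)


def plan(sql):
    # The last column of EXPLAIN QUERY PLAN output is the human-readable detail.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Leading column of the unique index: the index can be searched.
leading = plan("SELECT note FROM membership WHERE user_id = 1")

# Trailing column only: the composite index cannot be searched this way.
trailing = plan("SELECT note FROM membership WHERE group_id = 1")

print(leading)
print(trailing)
```

In this sketch a separate index on `user_id` would be redundant, while lookups by `group_id` alone would still need their own index.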
dakiol a day ago
Is this for real? I don’t know why anyone would deal with such an amount of incidental complexity (Django ORM) when one can just use plain SQL.
- twelve40 a day ago
  why is this so surprising? every place i worked at, going back probably 6 jobs, was using an ORM (django, hibernate, or even a self-built one), they went on to get acquired by Twitter, Microsoft, Uber etc, so not completely stupid or obscure. Even if you have a personal dislike of ORMs, if you ever work with/for another team with an existing codebase and a DB, chances are you will have to work with one.