The core thesis is that the types received by the API should not be the same as the types you process internally. I can see situations where this makes sense and situations where it senselessly duplicates everything. The blog post shows how to do it but never really dives into why or when.
I’ve not done this in Python, where mercifully I don’t really touch CRUD style web apps anymore, but when I was doing Ruby web development we settled on similar patterns.
The biggest benefit you get is much more flexibility around validation when the input model (Pydantic here) isn't the same as the database model. The canonical example would be something like a user, where the validation rules vary depending on context: you might be creating a new stub user at signup, when only a username and password are required but you also want a password confirmation. At a different point you're updating the user's profile, and in that case a bunch of fields might be required, but password isn't one of them and the username can't be changed.
By having distinct input models you make that all much easier to reason about than having a single model which represents the database record, but also the input form, and has a bunch of flags on it to indicate which context you’re talking about.
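To make that concrete, a rough sketch of the kind of split I mean (Pydantic v2 style; the field names are just illustrative):

```python
from dataclasses import dataclass

from pydantic import BaseModel, model_validator


class UserSignup(BaseModel):
    """Input model for the signup form: only these fields are accepted."""
    username: str
    password: str
    password_confirmation: str

    @model_validator(mode="after")
    def passwords_match(self) -> "UserSignup":
        if self.password != self.password_confirmation:
            raise ValueError("passwords do not match")
        return self


class UserProfileUpdate(BaseModel):
    """Input model for profile edits: no username, no password here."""
    display_name: str
    bio: str | None = None


@dataclass
class User:
    """Internal/domain representation, independent of either form."""
    id: int
    username: str
    display_name: str
    bio: str | None
    password_hash: str
```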
I've also generally found that separating the types passively reminds people that they are not forced to keep those types the same.
Whenever I've been in codebases with externally-controlled types as their internal types, almost every single design that goes into the project is based around those types and whatever they efficiently model. It leads to much worse API design, both externally and internally, because it's based on what they have rather than what they want.
I'm with you. But what wasn't sufficiently justified in the article is why both sides of that divide, the canonical User and the User stubs, couldn't both be Pydantic models.
The idea, as far as I was able to understand it, is that you want your core models to be as dependency-free as possible. If you, for whatever reason, were to drop Pydantic, that would only affect the way you validate inputs from the API, and nothing deeper.
This wasn't mentioned, but the constant validation on construction also costs something. Sometimes it's a cost you're willing to pay (again, dealing with external inputs), sometimes it's extraneous because e.g. a typechecker would suffice to catch discrepancies at build time.
This sounds like Model-View-ViewModel (MVVM): Model is your domain object, but you can have many different ViewModels of it depending on what you're attempting to do.
It's a pattern that rapidly leads to tons of DTOs that endlessly repeat exactly the same properties.
Your example doesn't even justify its use: in that scenario the small form is actually a completely different object from the User object, a UserSignup. That's both conceptually and practically different from an actual User.
The worst pattern is when programmers combine these useless DTOs with some sort of auto mapper, which results in huge globs of boilerplate making any trivial changes to data definitions a multi file job.
The worst one I've seen was when to add one property I had to edit 40 files.
I get why people do it, but if you make it a pattern it's a massive drag to development velocity. It's anti-patterns like that which give statically typed languages a bad name.
You should really only use it when you really, really need to.
> The core thesis is that the types received by the API should not be the same as the types you process internally.
Is it? I read the blog a couple of times and never was able to divine any kind of thesis beyond the title, but as you said, the content never actually explains why.
Perhaps there is a reason, but I didn’t walk away from the post with it.
It's confusing to ask that, because that's a different subject, unrelated to Pydantic or Python. That's just what you are supposed to do in "clean architecture"/DDD; you can ask the same question in Java or whatever.
I used to work on a Java app where we did this… we had a layer of POJO value classes, a layer of ORM objects… both written by hand… plus for every entity a hand-written mapper which translated between the two… and then sometimes we even had a third layer of classes generated from Swagger specs, and yet another set of mappers to map between the Swagger classes and the value POJOs
Now I mainly do Python and I don't see that kind of boilerplate duplication anywhere near as much as I used to. Not going to say the same kind of thing never happens in Python, but the frequency of it sure seems to have declined a lot; often you get a smattering of it in a big Python project rather than it having been done absolutely everywhere.
I think this depends in principle on what you're building. Take an API, for example.
The thesis is simple:
1) A DTO is a projection or a view of a given entity.
2) The "domain entity" itself is a projection of the actual storage in a database table.
3) At different layers (vertical separation), the representation of this conceptual entity changes.
4) In different entry/exit points (horizontal separation), the projection of the entity may also change.
In some cases, the domain entity can be used in different modules/routes and is projected to the API with different shapes: fewer properties, more properties, transformed properties, etc.
Typically, when code has a very well-defined domain layer and a separation between the DTO and the storage representation, the code has a very predictable quality: if you are working with a `User` domain entity, it behaves consistently across all of your code and in different modules. Sometimes a developer mixes in a database `User` or a DTO `User` and all of a sudden the code behaves unpredictably; you suddenly have to be cognizant of whether the `user` instance you're handling is a `DBUser`, a `UserDTO`, or the domain entity. It has extra properties, missing properties, missing functions, can't be passed into some methods, etc.
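A rough sketch of what those three shapes can look like in Python (names and fields are mine, just to illustrate):

```python
from dataclasses import dataclass

from pydantic import BaseModel


@dataclass
class UserRow:
    """Storage representation (stand-in for an ORM-mapped class)."""
    id: int
    email: str
    password_hash: str  # stays in the storage layer
    is_active: bool


@dataclass
class User:
    """Domain entity: the one shape the rest of the code passes around."""
    id: int
    email: str
    is_active: bool


class UserDTO(BaseModel):
    """API projection: fewer properties, shaped for the caller."""
    id: int
    email: str


def to_domain(row: UserRow) -> User:
    return User(id=row.id, email=row.email, is_active=row.is_active)


def to_dto(user: User) -> UserDTO:
    return UserDTO(id=user.id, email=user.email)
```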
Does this matter? I think it depends on 1) the size of the team, 2) how much re-use of the modules is needed, 3) the nature of the service. For a small team, it's overkill. For a module that will be reused by many teams, it has long term dividends. For a one-off, lightweight service, it probably doesn't matter. But for sure, for some core behaviors, having a delineated domain model really makes life easy when working with multiple teams reusing a module.
I find that the code I've worked with over the years that I like has this quality. So if I'm responsible for writing some very core service or shared module, I will take the extra effort to separate my models, even if it means more duplication on my part, because the code is more predictable to use if everything inside the service expects only one specific shape and set of behaviors and projects shapes outwards as needed for the use case (DTO and storage).
It does touch on what I was thinking as well at the end of the first section: Usually this makes sense if your application has to manage a lot of complexity, or rather, has to consume and produce the same domain objects in many different ways across many different APIs.
For example, some systems interact with several different vendor, tracking and payment systems that are all kinda the same, but also kinda different. Here it makes sense to have an internal domain model and to normalize all of these other systems into your domain model at a very early level. Otherwise complexity rises very, very quickly due to the number of n things interacting with n other things.
On the other hand, for a lot of our smaller and simpler systems that output JSON from a database for other systems... it's a realistic question whether maintaining the domain model and the API translation for every endpoint on every change is actually less work than ripping out the API modelling framework, which happens once every few years, if at all. Some teams would probably rewrite from scratch with new knowledge anyway, especially if they have API tests available.
I'd say where it's more important is when you need to manage database performance. This lets you design an API that's pleasant for users and well normalised internally, while also performing well.
Usually, normalisation and performance concerns lead to a poor API that's hard for users to use, and the internals become hard to evolve since you're so tightly coupled to your external representation.
A PO?O is just an object not bound by any restrictions other than those forced by the language.[0]
From the typing lens, it may be useful to consider it through Rice's theorem, with the oversimplification that typing converts a semantic property into a trivial property. (Damas-Hindley-Milner inference usually takes advantage of a pathological case; it is not formally trivial.)
There are no hard and fast rules IMHO, because the Rice, Rice-Shapiro, and Kreisel-Lacombe-Shoenfield-Tseitin theorems concern generalized solutions, as do most undecidable problems.
But Kreisel-Lacombe-Shoenfield-Tseitin deals with programs that are expected to HALT, yet it is still undecidable whether one fixed program is equivalent to another fixed program that always terminates.
When you start stacking framework, domain, and language restrictions, the restrictions form a type of coupling, but as the decisions about integration vs disintegration are always tradeoffs it will always be context specific.
Combinators (maybe not the Y combinator) and finding normal forms is probably a better lens than my attempt at the flawed version above.
If you consider using PO?Os as the adapter part of the hexagonal pattern, and notice how a service mesh is less impressive but often clearer in the hex form, it may help build intuition about where the appropriate application of the author's suggestions may fit.
But it really is primarily decoupling of restrictions IMHO. Sometimes the tradeoffs go the other way and often they change over time.
Personally, I think that's a good idea. Design patterns naturally make sense (e.g. Visitor, Builder) once you encounter such a situation in your codebase. It almost makes complete sense then. Otherwise, IMHO, it's just premature abstraction.
You should do it if and only if backwards compatibility is more important for your project than development velocity.
If you have two layers of types, then it becomes much easier to ensure that the interface is stable over time. But the downside is that it will take longer to write and maintain the code.
Because they don't represent the same thing. Pydantic models represent your input; they are the result of the experience you expose to the outside world, and therefore come with objectives and constraints to match:
- make it easy to provide
- make it simple to understand
- make it familiar
- deal with security and authentication
- be easily serializable through your communication layer
On the other hand, internal representations have the goal to help you with your private calculations:
- make it performant
- make it work with different subsystems such as persistence, caching, queuing
- provide convenience shortcuts or precalculations for your own benefits
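A toy example of that last point: the internal model is free to carry conveniences the input model never exposes (Pydantic v2 names; the field names are made up):

```python
from dataclasses import dataclass
from functools import cached_property

from pydantic import BaseModel


class OrderIn(BaseModel):
    """External shape: easy to provide, easy to serialize."""
    customer_id: int
    item_prices: list[float]


@dataclass
class Order:
    """Internal shape: free to carry conveniences the API never sees."""
    customer_id: int
    item_prices: list[float]

    @cached_property
    def total(self) -> float:
        # Precalculated once for our own benefit, not part of any contract.
        return sum(self.item_prices)


def accept(payload: dict) -> Order:
    validated = OrderIn.model_validate(payload)
    return Order(customer_id=validated.customer_id,
                 item_prices=validated.item_prices)
```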
Sometimes they overlap, or the system is not big enough that it matters.
But the bigger or older the system gets, the less likely they are to.
However, I often pass around pydantic objects if I have them, and I do this until it becomes a problem. And I rarely reach that point.
It's like using Python until you have performance problems.
My pydantic models represent a "Thing" (a concept or whatever), not an input
You can translate many things into a Thing; model_validate will help you with that (with context info etc.)
You can translate your Thing into multiple output formats, with model_dump (or a custom model_serializer)
In your model, you shall put every check required to ensure that a given input is, indeed, a Thing
And from there, you can use this object everywhere, certain that it is, indeed, a Thing, and that it has all the properties that make a thing a Thing
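For example (a sketch using the stock Pydantic v2 methods; the field names are mine):

```python
from pydantic import BaseModel, field_validator


class Thing(BaseModel):
    name: str
    size: int

    @field_validator("size")
    @classmethod
    def size_is_positive(cls, v: int) -> int:
        # One of the checks that makes an input, indeed, a Thing.
        if v <= 0:
            raise ValueError("size must be positive")
        return v


# Many kinds of input can become a Thing (model_validate also accepts context=...).
thing = Thing.model_validate({"name": "widget", "size": 3})

# ...and the Thing can go out in several formats.
as_dict = thing.model_dump()
as_json = thing.model_dump_json()
```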
You can certainly do it, but since serialization and validation are the main benefits of using Pydantic, I/O is why it exists.
Outside of I/O, the whole machinery has little use. And since Pydantic models are used via introspection to build APIs, automatic deserializers and arg parsing, making them fit the I/O is where the money is.
Also, remember that despite all of Pydantic's recent performance improvements, its models are still more expensive than dataclasses, which are themselves more expensive than plain classes. They are about 8 times more expensive to instantiate than regular classes, but above all, attribute access is 50% slower.
Now I get that in Python this is not a primary concern, but still, pydantic is not a free lunch.
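If you'd rather measure than take my word for it, a rough micro-benchmark along these lines will give you numbers for your own setup (exact ratios vary by Pydantic version and machine):

```python
import timeit
from dataclasses import dataclass

from pydantic import BaseModel


class PointModel(BaseModel):
    x: int
    y: int


@dataclass
class PointDC:
    x: int
    y: int


class PointPlain:
    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y


for name, stmt in [
    ("pydantic", "PointModel(x=1, y=2)"),
    ("dataclass", "PointDC(x=1, y=2)"),
    ("plain class", "PointPlain(x=1, y=2)"),
]:
    # Time one million instantiations of each flavour.
    t = timeit.timeit(stmt, globals=globals(), number=1_000_000)
    print(f"{name:11s} {t:.2f}s per 1M instantiations")
```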
I'd say it's also important to state what it conveys. When I see Pydantic objects, I expect some I/O somewhere. Breaking this expectation would take me by surprise and lower my trust in the rest of the code. Unless you are deep into defensive programming, there is no reason to validate input far from the boundaries of the program.
This seems ridiculously over-complicated. This guy would love Java.
He doesn't even say why you should tediously duplicate everything instead of just using the Pydantic objects - just "You know you don’t want that"! No I don't.
The only reason I've heard is performance... but... you're using Python. You don't give a shit about performance.
The main “why” that I find is that it allows you to intentionally design your API types and know when a change is touching them.
I worked on a project with a codebase on the order of millions of lines, and many times a response was made by taking an ORM object or an app internal data structure and JSON serializing it. We had a frequent problem where we’d make some change to how we process a data structure internally and oops, breaking API change. Or worse yet, sensitive data gets added to a structure typically processed with that data, not realizing it gets serialized by a response handler.
It was hard to catch this in code review because it was hard to even know when a type might be involved in generating a response elsewhere in the code base.
Switching to a schema-first API design meant that if you were making a change to a response data type, you knew it. And the CODEOWNERS file also knew it, and would bring the relevant parties into the code review. Suddenly those classes of problems went away.
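To illustrate the idea (this is a FastAPI-flavoured sketch, not our actual setup; the names are made up): with an explicit response model and explicit mapping, nothing outside the declared fields can end up in a response, and any change to the contract is a change to that one class.

```python
from dataclasses import dataclass

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


@dataclass
class UserRow:
    """Stand-in for an ORM object carrying more than we want to expose."""
    id: int
    username: str
    password_hash: str


FAKE_DB = {1: UserRow(id=1, username="alice", password_hash="...")}


class UserResponse(BaseModel):
    """The response contract: any change here is an obvious, reviewable API change."""
    id: int
    username: str


@app.get("/users/{user_id}", response_model=UserResponse)
def get_user(user_id: int):
    row = FAKE_DB[user_id]
    # Explicit mapping: fields not declared in UserResponse simply cannot leak out.
    return UserResponse(id=row.id, username=row.username)
```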
The article is written for those who want to apply DDD/onion architecture to Python apps using Pydantic. Those concepts explain the motivation, and the article assumes the reader knows about them. As others are writing, it may not be worth it to apply this to simple apps, but as an app grows in complexity it will help make it more extensible, maintainable, etc.
I'm not a Python expert, but looking into it briefly it seems like Pydantic's role is at application boundaries for bringing validation/typing to external data sources. If you are not working with external data, there is no reason to use it. So, if you separate out a domain layer, it brings no benefit there. Creating a domain layer where you handle business logic separately from how you interact with external data means those layers can evolve independently. An API could change and you only need to update your API models/mapping.
In many cases, we don't do these kinds of things in Java; a single annotated record can function as a model for both the data and API layers. Regardless of the language, the distinction becomes important when these layers diverge or there's sensitive data involved.
> you're using Python. You don't give a shit about performance.
That's dumb. You may not care about max performance, but you've got some threshold where shit gets obviously way too slow to be workable. I've worked with a library heavy on Pydantic where it was the bottleneck.
>you're using Python. You don't give a shit about performance.
Maybe it is true if you artificially limit yourself to a single-instance, single-thread model, due to the GIL.
But because apps nowadays can easily be scaled out across many instances, this argument is irrelevant.
One may say that Python has a large overhead when using a lot of objects, or that it has the GIL, but people have learned how to serve millions of users with Python easily.
Any Python code will be dozens to hundreds of times slower than Go or Java, but it may still be fast enough to stay within human reaction latencies.
And you may be able to scale to many users, worst case with more machines. But it'll still cost you a lot more than a faster language would. That is extremely relevant, even today.
Just because it's written in Rust doesn't mean it's fast. I was working on a project where Pydantic was the bottleneck: there were multiple levels of nested Pydantic objects, and creating the instances was very slow due to the validation performed on input values. Even after disabling the validation, dataclasses were twice as fast, and compiling the dataclasses with mypyc improved the performance ten times.
The Pydantic docs do clearly state that multiple levels of nesting of Pydantic objects can make it much slower, so it isn't particularly surprising that such models were slow.
When the structure of your team makes it a problem. Conway’s law.
If you have one person maintaining a CRUD app, splitting out DTOs and APIs and all of these abstractions is completely unnecessary. Usually, you don't even know yet what the right abstraction is, and making a premature wrong abstraction is WAY worse. Building stuff because you might need it later is a massive momentum killer.
But at some point when the project has grown (if it grows, which it won’t if you spend all your time making wrong abstractions early on), the API team doesn’t want their stuff broken because someone changed a pydantic model. So you start to need separation, not because it’s great or because it’s “the right way” but because it will collapse if you don’t. It’s the least bad option.
I'm not sure I agree, you can still use Pydantic in the domain model and update the version of the API when you change the expected schemas of your CRUD application.
Where I'm with you is that you should take care of your boundaries, and muddling the line between your Pydantic domain models and your CRUD models will be painful at some point. If your domain model is changing fast compared to the API you're exposing, that could be an issue.
But that's not a "Pydantic in the domain layer" issue, that's a separation of concerns issue.
Often you want your domain models to be structured differently than API models, to make them as convenient/understandable to work with as possible for your use case. If you already have different models, why would you want Pydantic in the domain? Even if they start out the same, this would allow them to more easily evolve to be different. I'm not a python expert, so I could be missing the point on Pydantic, but it seems like its value is at the edges of your application.
I’m far from being an experienced Pythonista, but one thing that really bugs me in Python (and other dynamic languages) is that when I accept an input of some type, like User, I have to wonder if it’s really a User. This is annoying throughout the codebase, not just the API layer. Especially when there are multiple contributors.
The argument against using API models internally is something I agree with but it’s a separate question.
Do you mean like: is the User object a well-formed User, or did someone actually give you an int?
As to the first problem, I recommend the Parse, don't validate post[0]. The essential idea is to stop using god objects that do it all and use specific types to encode contracts about what is known. Separate out concerns so there is an UnvalidatedUser (not serialized and lacking a primary key) and a ValidatedUser (committed to the database, with a unique username, etc.). Basic type hinting should get you the rest of the way to cleaning up code paths where you get some type certainty.
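A minimal sketch of that idea (the type and field names are mine, not from the post):

```python
from dataclasses import dataclass


@dataclass
class UnvalidatedUser:
    """Straight off the wire: nothing promised beyond the field types."""
    username: str
    email: str


@dataclass
class ValidatedUser:
    """Only constructed by parse_user, so holding one is itself the proof."""
    id: int
    username: str
    email: str


def parse_user(raw: UnvalidatedUser, existing_usernames: set[str]) -> ValidatedUser:
    if raw.username in existing_usernames:
        raise ValueError(f"username {raw.username!r} is taken")
    if "@" not in raw.email:
        raise ValueError("email looks malformed")
    new_id = hash((raw.username, raw.email)) & 0xFFFF  # stand-in for a DB insert
    return ValidatedUser(id=new_id, username=raw.username, email=raw.email)


def send_welcome_email(user: ValidatedUser) -> None:
    # Type hints guarantee callers went through parse_user first.
    print(f"Welcome, {user.username}!")
```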
Somewhat solved by type annotations + a good static type checker, such as pyright (it's 2025, there must be type annotations everywhere), and dynamic cases (very rare, probably due to poor or unfortunate design decisions) can be solved with validators, e.g. the aforementioned Pydantic. This isn't a silver bullet, but it works really well.
Yeah, exactly. There isn't a single good reason not to use type annotations in Python these days. Yes, they might not be as powerful as TypeScript's type language yet, but they are getting there.
Python has reasonably good types these days. If you were to use Pydantic to marshal stuff from the API and then put type annotations on every method below that, it would be pretty bulletproof.
I've been using Python on and off for a few decades and agree. I don't know why you're being downvoted.
I've authored tens of thousands of lines of Python code in that time - both for research tools and for "production".
I use type hints everywhere in the Python I write but it's simply not enough.
This issue is political and not so much technical, as TypeScript demonstrates how you can add a beautifully orthogonal and comprehensive type system to a dynamic language, thus improving the language's ergonomics and scalability.
The political aspect is the fact that early Python promoters decided that sanity checking arguments was not "pythonic", and this dogma/ideology has persisted to this day. The only philosophical basis for this position was that Python offered no support for simple type checking. And apparently if you didn't/don't "appreciate" this philosophy, it reflected poorly on your software engineering abilities or skill with Python.
To be fair, Python isn't the only language of that era, where promoters went to great lengths to invent alternate-reality bubbles to avoid facing the fact that their pet language had some deep flaws - and actually Perl and C++ circles were even worse and more inward facing.
So the "pythonic" approach suggests having functions just accepting anything, whether it makes sense or not, and allowing your code to blow up somewhere deep in some library somewhere - that you probably didn't even know you're using.
So instead of an error like "illegal create_user(name: str) call: name should be a str but was a float", it's apparently better (more "pythonic") to not provide such feed-back to users of your functions and instead allow them to have to deal with an exception in a 40 line stack trace with something like "illegal indexing of float by dict object" in some source file library your users haven't even heard of.
> This issue is political and not so much technical, as TypeScript demonstrates how you can add a beautifully orthogonal and comprehensive type system to a dynamic language, thus improving the language's ergonomics and scalability.
How does typescript demonstrate this?
I don't see how typescript is different from Python in this regard. Typescript compiles down to JavaScript, which like Python is dynamic. So at runtime nothing prevents you from calling a function written to take ints with strings. In fact, JavaScript has even worse typing than Python, so I imagine it's worse.
Typescript demonstrates that you can have a fully dynamic language but also provide a type system which can support as much (or as little) type checking as is appropriate or desired.
I can take my chances in Typescript by just using 'any' everywhere but if I do want to constrain variables to particular types, the compiler will fully support me and provide guarantees about the restrictions I've specified via the type signatures.
It sounds exactly the same as Python with pyright or mypy. A novel approach is taken by Elixir, which actually makes sure the compile-time types match runtime types. That is, you can't call a function with an incorrect input type at runtime https://elixir-lang.org/blog/2023/09/20/strong-arrows-gradua...
The problem with that small tidbit is that it immediately sets your type system down the path of Java and TypeScript (which we all mock for their crazy type systems and examples such as IImplementsFactoryAbstractMethodThingVirtual classes). This is not the Python way, and is frankly part of its secret sauce (if you ask me).
And yes, I include TypeScript with Java there, because it has its own version of the Java class-ecosystem hell; we just don't notice it yet. Look at any TypeScript library that's reasonably complicated and try to deduce what some of those input types actually do or mean - be honest. Heck, a few weeks back someone posted how they solved a complicated combinatorial problem using TypeScript's type system alone.
I don't get your point or what it has to do with the "pythonic" suggestion that you don't check early for incorrect state/arguments?
Any language, including Python, which supports the concept of a class will allow you to write a class called IImplementsFactoryAbstractMethodThingVirtual. And none of Java, C++, Python, Typescript, CLISP, etc. prevent you from building or designing an overly complex class model.
It has nothing to do with ensuring a particular argument to a function or method is of the expected type, which was my point - the "pythonic" way is to NOT check.
I also do not understand your example of Typescript. Compared to the last time I worked on a Javascript code-base, recently having to work with a Typescript code-base was a joy including reading library code. Stripping out the types gives you Javascript - surely you are not claiming that it makes it easier to read libraries with the type signatures removed?
Whether a library is complex or not is completely orthogonal. I do it regularly, and navigating the source of an overly complex Python library is no fun either.
I’m curious, what do you mean by having to wonder if it’s really a User? It’s optional in Python but you can use type annotations and then the type checker will shout at you for passing something that’s not a User instance to things that expect one.
Strongly decoupling the API implementation and, well, the actual implementation, is pretty key when you start to evolve an application. People often focus on 'the design' like there is one perfect design for an application for its lifetime, when in reality it is about how easily the mass of code you have can change for the next feature/fix/change without turning into a hairball of code. That perfect initial design where the internal and external objects are exactly the same generally works well for 1.0, but not 1.1 or 2.0, so strongly decoupling the API implementation is a good general practice if you think your code will continue to evolve.
The reasoning given here is more academic than anything else. I'm not seeing any actual problem here though. Perhaps this could show how this is bad. Until then, I don't think this excessive duplication and layering is necessary, and is more of a liability itself.
> That’s when concerns like loose coupling and separation of responsibilities start to matter more.
In the Django world I have gotten very frustrated at people rushing to go from DRF's serializers to Django Ninja + Pydantic.
You have way fewer tools for actually providing nice, straightforward APIs. I appreciate that Pydantic gives you type safety, but at some point the actual ease of writing correct code matters more than type safety.
Just real straightforward stuff around dealing with loading in user input becomes a whole song and dance because Pydantic is an extremely basic validation thing… the hacks in DRF like request contexts are useful!
I’ve seen many projects do this and it feels like such a step back in offering simple-to-maintain APIs. Maybe I’m just biased cuz I “get” DRF (and did lose half a day recently to weird DRF behavior…)
Shrug, I find them more helpful than Pydantic models for lots of canonical cases.
I have had good success with DRF model serializers in like Django projects with 100+ apps (was the sprawling nature of the apps itself a problem? Sure, maybe). Got the job done
As with anything, you gotta build your own wrappers around these things to get value in larger projects though.
This is the Javascript hipster effect. FastAPI and Pydantic are pushed heavily because of their fancy docs page and the evangelism which thrives on reinventing the wheel. So we are all now stuck with everything being Pydantic this Pydantic that, instead of existing frameworks which are frankly better.
To be fair I do think that Pydantic leaning into the type annotation story is nice. If you’re really going lean or performant the restrictions work well in your favor. Just like… for the bog standard B2B SaaS the expressivity tradeoff just doesn’t feel worth it.
In a more just world, Python's typing story would be closer to TypeScript's, and we could have a fully realized idea like it that supports the asymmetric nature of serializing/deserializing and offers nice abstractions through the stack.
Right now Pydantic for me is like “you can validate a straightforward data structure! Now it’s up to you to actually build up a useful data structure from the straightforward one”. Other tools give me both in one go. At the cost of safety (that you can contain, but you gotta do it right)
It's a tough answer because we have had years of artificially-pumped support and development and ecosystem growth of Pydantic.
But if I had to roll the clock back I'd recommend marshmallow and that entire ecosystem. It's definitely way less bloated than Pydantic currently, and only lacks some features. Beyond that, just use plain-old dataclasses.
Pydantic (the company) owns logfire, a logging service. There’s a lot of money in logging/observability. The pydantic library itself is not monetizable, as you indicate.
Wow, really? I had no idea. This rabbit hole goes deeper than I expected!
In 2022, the project evolved into a commercial entity called Pydantic Services Inc., founded by Samuel Colvin and Adrian Garcia Badaracco, to build products around the open-source library. The company raised $4.7 million in seed funding in February 2023, led by Sequoia Capital, with participation from Partech, Irregular Expressions, and other investors. This was followed by a $12.5 million Series A round in October 2024, again led by Sequoia Capital and including Partech Partners, bringing the total funding to approximately $17.2 million across rounds. The Series A funding coincided with the launch of Pydantic Logfire, a commercial observability platform for backend applications, aimed at expanding beyond the core open-source validation framework. As of mid-2025, no additional funding rounds have been publicly reported.
I have to confess, I use protobufs for everything. They convert to pure Python (a la dataclass), to JSON strings and to binary strings, so I literally shove them everywhere: network, logic, disk.
BUT when doing heavy computation (C++, not Python!), don't forget to convert to plain vectors; protobufs are horribly inefficient.
That works when either A: you control both ends of the serialized line, or B: the other end of the line expects protobufs.
There are many [de]serialization scenarios where you are interfacing with a third party API. (HTTP/JSON web API, a given IC's comm protocol as defined in its datasheet etc)
I don't understand then. Here is my mental model; as described, you can see why I'm confused:
JSON: UTF-8 text serialization format, with brackets, commas, fields represented by strings, etc.
Protobuf: binary serialization format that makes liberal use of varints, including to encode field numbers, lengths, etc. Kind of verbose, but not heinous.
So, you could start and end your journey with the same structs and serialize with either. If you try to send a protobuf to an HTTP API that expects JSON, it won't work! If you try to send JSON to an ESP32 running ESP-Hosted, likewise.
Ah, I think I understand your confusion. The proto package allows conversion between the binary messages and their JSON equivalent. So you can still use the proto objects in your code, and only send out JSON when required.
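Concretely, something like this, assuming a `user_pb2` module generated from a hypothetical user.proto (MessageToJson/Parse are the standard google.protobuf.json_format helpers):

```python
from google.protobuf.json_format import MessageToJson, Parse

# user_pb2 is the module protoc would generate from a hypothetical user.proto
from myproject import user_pb2

msg = user_pb2.User(id=42, name="alice")

wire_bytes = msg.SerializeToString()  # compact binary, for the network/disk
as_json = MessageToJson(msg)          # JSON string, for APIs that expect JSON

# And back again from JSON into the same proto object:
roundtripped = Parse(as_json, user_pb2.User())
```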
Just have 1 input type and 1 output type. You don’t need more data types in between.
If pydantic packages valid input, use that for as long as you can.
Loading stuff from db, you need validation again, either go from binary response to 1 validated type with pydantic, or ORM object that already validates.
Then stop having any extra data types.
Keeping pydantic only at the edge and then abandoning it by reshaping it into another data type is a weird exercise. It might make sense if you have N input types and 1 computation flow but I don’t see how in the world of duck typing you’d need an extra unified data type for that.
> Loading stuff from db, you need validation again, either go from binary response to 1 validated type with pydantic, or ORM object that already validates.
You shouldn’t need to validate data coming from the database. IMO, this is a natural consequence of teams abandoning traditional RDBMS best practices like normalization and constraints in favor of heavy denormalization, and strings for everything.
If you strictly follow 3NF (or higher, when necessary), it is literally impossible to have referential integrity violations. There may be some other edge cases that can be difficult to enforce, but a huge variety of data bugs simply don’t exist if you don’t treat the RDBMS as a dumb KV store.
I'll go further and elsewhere at once: APIs should not present nested objects but normalised data. It lets clients easily lay out their display structure independently of API resource schemas, and it enables tricks like diffing between subsequent responses, pulling updates, or requesting new data by passing IDs and timestamps of already-known data, etc. API-normalised data obviously shouldn't correspond to DB-normalised data. Nested objects are superior only for use with jq.
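A tiny illustration of the difference (made-up payloads):

```python
# Nested response: the client gets the author embedded in every post.
nested = {
    "posts": [
        {"id": 1, "title": "Hello", "author": {"id": 7, "name": "alice"}},
        {"id": 2, "title": "World", "author": {"id": 7, "name": "alice"}},
    ]
}

# Normalised response: entities keyed by ID, relations expressed as IDs.
# Clients can diff, cache, and re-request by ID/timestamp independently.
normalised = {
    "posts": {"1": {"id": 1, "title": "Hello", "author_id": 7},
              "2": {"id": 2, "title": "World", "author_id": 7}},
    "users": {"7": {"id": 7, "name": "alice"}},
}
```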
> APIs should not present nested objects but normalised data
If something is nested, let it be represented as a nested structure. I find flattening causes more mental overhead. If something is too flat, it becomes less obvious what data is exactly necessary to do what you want to do
I still don’t quite get the motivation for “don’t use pydantic except at border” — it sounds like it’s “you don’t need it”, which might be true. But then adds dacite to translate between pydantic at the border and python objects internally. What exactly is wrong with pydantic internally too?
Could be wrong, never used Pydantic. But looking it up briefly, it seems like it's used for validation/typing of external data, so it's mainly going to be doing schema validations. Your data arrives at your domain layer and you already have guarantees based on Pydantic's validations. At that point, your validations are going to be semantic in nature, based on your domain; what value is Pydantic bringing?
An easier/moderate approach: make a proper base DTO model, which can be extended by validators such as Pydantic and by the db model; the domain is just whatever an ORM offers, or dataclasses.
I use pyrsistent in the domain, and pydantic for tricky validation at the boundary. Pyrsistent is a pretty neat solution if you want immutable data structures, with some nice methods for working with nested records.
>The less your core logic depends on specific tools or libraries, the easier it becomes to maintain, test, or even replace parts of your system without causing everything to break.
It seems like the author doesn't like depending on `pydantic` simply because it's a third-party dependency. To solve this, they introduce another, more obscure, third-party dependency called `dacite` that converts `pydantic` models to `dataclasses`.
It's more likely that `dacite` will break your application than that `pydantic`, a library used by millions of users in huge projects, ever will. Not to mention the complexity overhead introduced by this nonsense mapping.
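For reference, the extra hop being proposed looks roughly like this (my reconstruction, not the article's exact code):

```python
from dataclasses import dataclass

from dacite import from_dict
from pydantic import BaseModel


class UserIn(BaseModel):
    """Pydantic model at the boundary."""
    username: str
    age: int


@dataclass
class User:
    """Plain dataclass used internally."""
    username: str
    age: int


user_in = UserIn(username="alice", age=30)

# The mapping step in question: dump the Pydantic model to a dict,
# then rebuild a plain dataclass from it with dacite.
user = from_dict(data_class=User, data=user_in.model_dump())
```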
Not simply. This is one of the most important reasons NOT to propagate something through your code. How many millions of codebases use it is irrelevant.
>How many millions of codebases use it is irrelevant.
It is relevant, because it speaks to the reliability of the dependency. `pydantic` has 24.7k Github stars and was last updated 52 minutes ago.
Adding a random dependency like `dacite`, which has 1.9k GitHub stars, which no one has ever heard of, and which was last updated 4 months ago, introduces way more complexity and more sources of instability than propagating `pydantic`.
More updates means more changes and more instability. I have never seen dacite, but it’s pretty easy for a small library to just be complete. If it’s complete, why the need for constant changes?
Actually, Pydantic can be extremely useful when used in conjunction with SQLAlchemy; check out the SQLModel library, from the creator of FastAPI.
Having used SQLModel recently for a project, I was underwhelmed. Documentation was sparse, I found myself going to the source code to figure out how to solve problems I ran into, and I ended up dropping into SQLAlchemy a lot more than I wanted. I think the idea is sound, but the code is hard to follow, and there are a lot of missing common cases.
And that's why it is key in your architecture to differentiate between Data Transfer Objects (DTOs) or Models on one hand, whose values can and actually must be validated when they come from the outside, and Domain Entities / Value Objects on the other. Even though the DTO and Domain Entity might look similar.
I'm sure that the Pydantic guys had a reason to rename .dict to .model_dump. This single change caused so much grief when upgrading to Pydantic 2.[1] The very idea of unnecessary breaking changes is a big reason not to over-rely on Pydantic, tbh.
[1] We were using .dict to introduce Pydantic into a mix of other entity schemes, and handling this change later was a significant pain in the neck. Some Python introspection mechanism that could facilitate deep object recasting might've been nice, if possible.
Yep, and when you are done migrating, you need to remove this, and there is Pydantic 3 coming. Keeping in mind the number of libraries and microservices involved, search and replace was the easier option.
PS: thank you, I can think on my own and even failing that, chat gpt is not in closed beta any more.
I think this article misses the main point by focusing on removing pydantic. The main point is that you should convert external types as soon as possible to decouple them from the rest of your code. Whether this involves pydantic or something else is not really important I guess
"Why are there no laws requiring device manufacturers to open source all software and hardware for consumer devices no longer sold?"
I think it's because people (us here included) love to yap and argue about problems instead of just implementing them and iterating on solutions in an organized manned. A good way these days to go about it would be to forego the facade of civility and use your public name to publicly tell your politician to just fuck it, do it it bad, and have plan to UNfuck after you fuck it up, until the fucking problem is fucking solved.
Same goes for UBI and other semi-infuriating issues that seem to (and probably do) have obvious solutions that we just don't try.
Oh boy, I love making adding a trivial nullable column take even more code and require even more tests and have even more places I forgot to update which results in a field being nullable somewhere.
And don't forget, you get to duplicate this shit on the frontend too.
And what is a modern app if we aren't doing event-driven microservice architecture? That won't scale!!!! So now I also have to worry about my Avro schema/Protobufs/whateverthefuck. But how does everyone else know about the schema? Avro schema registry! Otherwise we won't know what data is on the wire!
And so on and so on into infinity until I have to tell a PM that adding a column will take me 5 pull requests and 8 deploys amounting to several days of work.
Congratulations on making your own small contribution to a fucking ridiculous clown fiesta.