浪尖 walks through dynamic partition pruning in Spark 3.0 SQL with examples
浪尖聊大数据
· 2021-06-22
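Please take a moment to vote for 浪尖. The official account's current name is too narrow, and future posts will expand into areas such as data intelligence and user profiling.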
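This article covers the dynamic partition pruning mechanism introduced in Spark 3.0. It can significantly improve application performance, especially in BI-style workloads where queries carry many WHERE conditions.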
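Dynamic partition pruning is somewhat more complex than predicate pushdown. It takes the filter conditions on the dimension table, builds a filter set from the rows that survive, and then applies that filter set to the fact table, so less fact data takes part in the join. Of course, it is even better when the data source can execute the pushed-down filter itself, but pushing down to the data source requires something like an index or precomputed data on that side.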
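To make the filter-set idea concrete, here is a toy model in plain Python. It is only a sketch of the idea described above, not Spark's implementation: the tables, the `date_id` join key, and the helper names `build_filter_set` and `scan_with_pruning` are all assumptions made up for illustration.

```python
# Hypothetical dimension table: one row per date.
date_dim = [
    {"date_id": 1, "day_of_week": "Mon"},
    {"date_id": 2, "day_of_week": "Tue"},
    {"date_id": 3, "day_of_week": "Wed"},
    {"date_id": 8, "day_of_week": "Mon"},
]

# Hypothetical fact table, stored as partitions keyed by date_id (assumed join key).
sales_partitions = {
    1: [{"date_id": 1, "amount": 10}, {"date_id": 1, "amount": 25}],
    2: [{"date_id": 2, "amount": 7}],
    3: [{"date_id": 3, "amount": 99}],
    8: [{"date_id": 8, "amount": 40}],
}


def build_filter_set(dim_rows, predicate, key):
    """Apply the dimension-table filter and collect the surviving join keys."""
    return {row[key] for row in dim_rows if predicate(row)}


def scan_with_pruning(partitions, filter_set):
    """Read only the fact partitions whose key appears in the filter set."""
    scanned = [k for k in partitions if k in filter_set]
    rows = [row for k in scanned for row in partitions[k]]
    return scanned, rows


# Step 1: filter the dimension table and build the filter set.
filter_set = build_filter_set(
    date_dim, lambda r: r["day_of_week"] == "Mon", "date_id"
)

# Step 2: use the filter set to skip fact partitions before the join.
scanned_partitions, fact_rows = scan_with_pruning(sales_partitions, filter_set)

# Step 3: join the pruned fact rows with the filtered dimension rows.
dim_by_key = {r["date_id"]: r for r in date_dim if r["date_id"] in filter_set}
joined = [{**row, **dim_by_key[row["date_id"]]} for row in fact_rows]

print("filter set:", sorted(filter_set))
print("partitions scanned:", sorted(scanned_partitions))
print("joined rows:", len(joined))
```

In this toy data only two of the four fact partitions are read, and the join sees three fact rows instead of five. The saving in a real workload depends on how selective the dimension filter is.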
SELECT * FROM Sales WHERE day_of_week = 'Mon'
![](https://filescdn.proginn.com/c136010f7ca9ad83f2cab11b39173deb/99e951cddb2784bcfa6411dcd6f81b59.webp)
![](https://filescdn.proginn.com/3a32bbcbefc1dbfa575e83a77b1e6820/5bcb34a3e47ff685d0d74766c418f478.webp)
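For contrast, the single-table query above is the static case: the filter value is a literal, so the partitions to read can be chosen before any data is scanned. The sketch below models that in plain Python. It assumes, purely for illustration, that Sales is partitioned by `day_of_week`; the data and the helper names `plan_scan` and `execute_scan` are hypothetical.

```python
# Hypothetical layout: Sales partitioned by day_of_week (an assumption).
sales_by_day = {
    "Mon": [{"item": "a", "amount": 10}, {"item": "b", "amount": 25}],
    "Tue": [{"item": "c", "amount": 7}],
    "Wed": [{"item": "d", "amount": 99}],
}


def plan_scan(partitions, column_value):
    """Pick the partitions to read for an equality filter on the partition column."""
    return [name for name in partitions if name == column_value]


def execute_scan(partitions, selected):
    """Read rows only from the partitions chosen at planning time."""
    return [row for name in selected for row in partitions[name]]


# The literal 'Mon' is known up front, so pruning happens before any read.
selected = plan_scan(sales_by_day, "Mon")
rows = execute_scan(sales_by_day, selected)
skipped = len(sales_by_day) - len(selected)

print("partitions read:", selected)
print("partitions skipped:", skipped)
print("rows returned:", len(rows))
```

The difference in the join case is that the matching partition values are not written in the query; they only become known after the dimension table has been filtered at run time.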
SELECT * FROM Sales JOIN Date WHERE Date.day_of_week = 'Mon';
![](https://filescdn.proginn.com/d409385c4fc64b78e535f6f619229746/aca8a0e5c32fa2fabebb72aa25e8c70f.webp)
![](https://filescdn.proginn.com/57163c84807fe4c9fefd306a9445e603/69d054d08430efcd33e1426c6061ee4e.webp)
![](https://filescdn.proginn.com/3a60db92f38ade54fe688e3d8d11b174/92ee61b60b0b55ce6ac80b5604f0403e.webp)
![](https://filescdn.proginn.com/3aee89f52ccf01a5d26a22c1a00c3de6/768cbf44bacc62e703faf8ced096401e.webp)
![](https://filescdn.proginn.com/9fe14360b2bd2b27bd1af7c4ed490ac1/ee9f238f2e07f40b96b37ffeb3d84ff3.webp)
![](https://filescdn.proginn.com/6664e91f5b4fa78d257964cf73fadea2/c9e4c6d7d9e03d230cda70b89c2ca5ae.webp)
![](https://filescdn.proginn.com/45db6ce46670795058d28666a78c4d93/0ef23d0b2671ef78ce6ffc259305cd22.webp)
![](https://filescdn.proginn.com/321ff283eab65ca6bdb7da0e5cc17527/bf3b41df049de6edbd1f7ea1ea5139a2.webp)
![](https://filescdn.proginn.com/4f2dfde86c69c78bc37ee4984b1acadb/942791b7d164c880a994df6ee44012d1.webp)