9 eicu数据安装
9.1 下载数据
- 登录账号并下载:https://physionet.org/login/,账号在quickquery查找。
- 直接下载地址:https://physionet.org/content/eicu-crd/2.0/
- terminal下载
wget -r -N -c -np --user hulinhui --ask-password https://physionet.org/files/eicu-crd/2.0/
开始下载时间:2024-02-02 09:32:00 CST
FINISHED 2024-02-02 12:02:13
Total wall clock time: 2h 30m 49s
Downloaded: 34 files, 5.1G in 2h 30m 28s (594 KB/s)
9.2 装载数据
9.2.1 下载github code
github网址:https://github.com/mit-lcp/eicu-code
export GH_TOKEN=your_gh_token
gh repo clone MIT-LCP/eicu-code
9.2.2 安装postgres数据库
- 导航至
eicu-code/build-db/postgres/README.md
,查看装载方法。 - 参阅渲染后版本
9.2.3 成功实践
- 创建
eicu_crd
模式(SCHEMA)
psql "dbname=hulihuihong user=hulihuihong options=--search_path=eicu_crd" -v ON_ERROR_STOP=1 -c "CREATE SCHEMA eicu_crd AUTHORIZATION hulihuihong;"
- 查看文件是否完整
cd eicu-code/build-db/postgres
make eicu-check-gz datadir=/home/hulihuihong/Books/phd/physionet.org/files/eicu-crd/2.0/
- 修改Makefile的默认设置
make initialize DBUSER=myusername DBPASS=mypassword
这个方法会报错,因为内置代码限定了postgres用户的原因。采用手动更改:
DBNAME := hulihuihong
DBUSER := hulihuihong
DBPASS := mypassowrd
DBSCHEMA := eicu_crd # 这个不用改
- 装载数据并添加索引,检测完整性
make eicu-gz datadir=/home/hulihuihong/Books/phd/physionet.org/files/eicu-crd/2.0/
开始载入时间:2024-02-02 22:41:30 CST
完成载入时间:2024-02-02 23:11:15 CST
- 查看模式(schema)
psql
\dn
List of schemas
Name | Owner
-----------------+-------------
eicu_crd | hulihuihong
mimiciv_derived | hulihuihong
mimiciv_hosp | hulihuihong
mimiciv_icu | hulihuihong
public | postgres
(5 rows)
- 查看模型里的数据表
\dt eicu_crd.*
List of relations
Schema | Name | Type | Owner
----------+---------------------------+-------+-------------
eicu_crd | admissiondrug | table | hulihuihong
eicu_crd | admissiondx | table | hulihuihong
eicu_crd | allergy | table | hulihuihong
eicu_crd | apacheapsvar | table | hulihuihong
eicu_crd | apachepatientresult | table | hulihuihong
eicu_crd | apachepredvar | table | hulihuihong
eicu_crd | careplancareprovider | table | hulihuihong
eicu_crd | careplaneol | table | hulihuihong
eicu_crd | careplangeneral | table | hulihuihong
eicu_crd | careplangoal | table | hulihuihong
eicu_crd | careplaninfectiousdisease | table | hulihuihong
eicu_crd | customlab | table | hulihuihong
eicu_crd | diagnosis | table | hulihuihong
eicu_crd | hospital | table | hulihuihong
eicu_crd | infusiondrug | table | hulihuihong
eicu_crd | intakeoutput | table | hulihuihong
eicu_crd | lab | table | hulihuihong
eicu_crd | medication | table | hulihuihong
eicu_crd | microlab | table | hulihuihong
eicu_crd | note | table | hulihuihong
eicu_crd | nurseassessment | table | hulihuihong
eicu_crd | nursecare | table | hulihuihong
eicu_crd | nursecharting | table | hulihuihong
eicu_crd | pasthistory | table | hulihuihong
eicu_crd | patient | table | hulihuihong
eicu_crd | physicalexam | table | hulihuihong
eicu_crd | respiratorycare | table | hulihuihong
eicu_crd | respiratorycharting | table | hulihuihong
eicu_crd | treatment | table | hulihuihong
eicu_crd | vitalaperiodic | table | hulihuihong
eicu_crd | vitalperiodic | table | hulihuihong
(31 rows)
- 查看表占据的空间和行数
SELECT schemaname AS "Schema Name",
relname AS "Table Name",
n_live_tup AS "Number of Rows",
pg_size_pretty(pg_total_relation_size(schemaname || '.' || relname)) AS "Total Size"
FROM pg_stat_user_tables
WHERE schemaname = 'eicu_crd'
ORDER BY pg_total_relation_size(schemaname || '.' || relname) DESC;
Schema Name | Table Name | Number of Rows | Total Size
-------------+---------------------------+----------------+------------
eicu_crd | vitalperiodic | 146608245 | 17 GB
eicu_crd | nursecharting | 151613417 | 14 GB
eicu_crd | lab | 39120770 | 4727 MB
eicu_crd | vitalaperiodic | 25068372 | 2865 MB
eicu_crd | nurseassessment | 15602197 | 2557 MB
eicu_crd | intakeoutput | 12024190 | 1907 MB
eicu_crd | respiratorycharting | 20165942 | 1583 MB
eicu_crd | physicalexam | 9211101 | 1517 MB
eicu_crd | nursecare | 8308346 | 1315 MB
eicu_crd | medication | 7303894 | 776 MB
eicu_crd | treatment | 3688386 | 562 MB
eicu_crd | diagnosis | 2710569 | 448 MB
eicu_crd | careplangeneral | 3115120 | 387 MB
eicu_crd | note | 2253525 | 360 MB
eicu_crd | infusiondrug | 4803770 | 344 MB
eicu_crd | admissiondrug | 874931 | 324 MB
eicu_crd | pasthistory | 1149180 | 273 MB
eicu_crd | admissiondx | 626858 | 131 MB
eicu_crd | allergy | 251949 | 131 MB
eicu_crd | apachepatientresult | 297064 | 85 MB
eicu_crd | respiratorycare | 865381 | 79 MB
eicu_crd | patient | 200859 | 76 MB
eicu_crd | careplangoal | 504139 | 68 MB
eicu_crd | careplancareprovider | 502765 | 59 MB
eicu_crd | apachepredvar | 171177 | 43 MB
eicu_crd | apacheapsvar | 171177 | 39 MB
eicu_crd | microlab | 16996 | 1512 kB
eicu_crd | careplaninfectiousdisease | 8056 | 1088 kB
eicu_crd | careplaneol | 1433 | 200 kB
eicu_crd | customlab | 1082 | 128 kB
eicu_crd | hospital | 208 | 56 kB
(31 rows)
- 查询mimic和eicu的数据容量
SELECT CONCAT(ROUND(SUM(pg_total_relation_size(c.oid)) / (1024.0 * 1024.0 * 1024.0), 3), 'GB') AS total_size
FROM pg_class c
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE n.nspname IN ('mimic_hosp', 'mimic_icu', 'eicu_crd') AND c.relkind = 'r';
total_size
-------------
51.291GB
(1 row)