Overview

Dataset statistics

Number of variables: 29
Number of observations: 2043
Missing cells: 24247
Missing cells (%): 40.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 6.5 MiB
Average record size in memory: 3.3 KiB
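
The derived figures in this table can be cross-checked from the raw counts. The following is a minimal sketch, assuming the missing-cell percentage is taken over all cells (variables times observations) and the average record size is total memory divided by the number of observations. The variable names are illustrative and not part of the report.

```python
# Raw counts as reported in the dataset statistics.
n_variables = 29
n_observations = 2043
missing_cells = 24247
total_memory_mib = 6.5

# Assumption: the percentage is taken over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: average record size = total memory / number of rows.
avg_record_kib = total_memory_mib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_kib:.1f} KiB")
```

Under these assumptions both derived values agree with the reported 40.9% and 3.3 KiB after rounding.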

Variable types

Categorical: 24
Numeric: 3
Boolean: 2

Alerts

min_length has constant value "8.0" (Constant)
encoder_no_repeat_ngram_size has constant value "4.0" (Constant)
num_beam_groups has constant value "1.0" (Constant)
early_stopping has constant value "True" (Constant)
do_sample has constant value "False" (Constant)
format has constant value "paragraph" (Constant)
penalty_alpha has constant value "0.6" (Constant)
runtime has constant value "45:30" (Constant)
GAUNTLET_PATH has a high cardinality: 2043 distinct values (High cardinality)
summary has a high cardinality: 1960 distinct values (High cardinality)
date has a high cardinality: 96 distinct values (High cardinality)
no_repeat_ngram_size is highly imbalanced (68.2%) (Imbalance)
repetition_penalty is highly imbalanced (80.7%) (Imbalance)
length_penalty is highly imbalanced (85.8%) (Imbalance)
min_length has 324 (15.9%) missing values (Missing)
max_length has 81 (4.0%) missing values (Missing)
no_repeat_ngram_size has 305 (14.9%) missing values (Missing)
encoder_no_repeat_ngram_size has 305 (14.9%) missing values (Missing)
repetition_penalty has 324 (15.9%) missing values (Missing)
num_beams has 233 (11.4%) missing values (Missing)
num_beam_groups has 324 (15.9%) missing values (Missing)
length_penalty has 354 (17.3%) missing values (Missing)
early_stopping has 314 (15.4%) missing values (Missing)
do_sample has 324 (15.9%) missing values (Missing)
date has 243 (11.9%) missing values (Missing)
length has 1962 (96.0%) missing values (Missing)
format has 1962 (96.0%) missing values (Missing)
extractiveness has 1962 (96.0%) missing values (Missing)
temperature has 1962 (96.0%) missing values (Missing)
token_batch_length has 1700 (83.2%) missing values (Missing)
penalty_alpha has 1962 (96.0%) missing values (Missing)
top_k has 1962 (96.0%) missing values (Missing)
batch_stride has 1791 (87.7%) missing values (Missing)
max_len_ratio has 1943 (95.1%) missing values (Missing)
directory-topic-tag has 1943 (95.1%) missing values (Missing)
runtime has 1967 (96.3%) missing values (Missing)
GAUNTLET_PATH is uniformly distributed (Uniform)
summary is uniformly distributed (Uniform)
source_doc_filename is uniformly distributed (Uniform)
source_doc_id is uniformly distributed (Uniform)
GAUNTLET_PATH has unique values (Unique)
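
The three imbalance percentages in the alerts can be reproduced from the non-missing value counts given in the per-variable sections. The sketch below assumes the score is one minus the Shannon entropy of the class distribution, normalised by the maximum entropy for that number of classes. That formula is an assumption about how the report derives the figure, and `imbalance_score` is a hypothetical helper, not a library function.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits)
    of the class counts and k is the number of classes."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no defined imbalance here
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Non-missing value counts taken from the per-variable sections of this report.
class_counts = {
    "no_repeat_ngram_size": [1638, 100],
    "repetition_penalty": [1668, 51],
    "length_penalty": [1655, 34],
}

scores = {name: imbalance_score(c) for name, c in class_counts.items()}
for name, score in scores.items():
    print(f"{name}: {100 * score:.1f}%")
```

A perfectly balanced variable scores 0 under this definition, and the score approaches 1 as a single value dominates.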

Reproduction

Analysis started: 2023-05-24 07:04:02.234426
Analysis finished: 2023-05-24 07:04:06.861090
Duration: 4.63 seconds
Software version: pandas-profiling v3.6.6
Download configuration: config.json
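
The reported duration follows directly from the two timestamps. A small sketch using only the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamps copied from the Reproduction table.
started = datetime.fromisoformat("2023-05-24 07:04:02.234426")
finished = datetime.fromisoformat("2023-05-24 07:04:06.861090")

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```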

Variables

GAUNTLET_PATH
Categorical

HIGH CARDINALITY  UNIFORM  UNIQUE 

Distinct: 2043
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 345.0 KiB
SHORT-CONTEXT-MODELS/flan-t5-3b-summarizer/beam-search-8192-nb4/ASR-whisper-rpunctuated_Noam Chomsky, Fundam_1669853561_0_part1_summary.txt
 
1
long-t5/long-t5-base-booksci-summary-v1/bs-16384-nb-8/script_strangersonatrain_summary.txt
 
1
long-t5/long-t5-base-booksci-summary-v1/bs-8192-nb-16/OCR_PAPER_dall-e-2-annotated__summary.txt
 
1
long-t5/long-t5-base-booksci-summary-v1/bs-8192-nb-16/OCR_PAPER_Kandpal, Nieto, Jin - 2022 - Music Enhancement via Image Translation and Vocoding-annotated__summary.txt
 
1
long-t5/long-t5-base-booksci-summary-v1/bs-8192-nb-16/OCR_PAPER_Hong et al. - 2022 - CogVideo Large-scale Pretraining for Text-to-Video Generation via Transformers-annotated__summary.txt
 
1
Other values (2038)
2038 

Length

Max length: 232
Median length: 183
Mean length: 115.86686
Min length: 49

Characters and Unicode

Total characters: 236716
Distinct characters: 61
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
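
The length statistics for a text column can be summarised with a few lines of standard-library Python. The sketch below is illustrative: `length_stats` is a hypothetical helper, and the cross-check assumes the mean length is the total character count divided by the number of observations (this column has no missing values).

```python
import statistics

def length_stats(values):
    """Summarise string lengths: max, median, mean and min."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }

# Cross-check of the reported mean length for GAUNTLET_PATH.
n_observations = 2043
total_characters = 236716
mean_length = total_characters / n_observations
print(f"mean length: {mean_length:.5f}")

# Toy input, only to show the shape of the result.
demo = length_stats(["a", "bbb", "cc"])
print(demo)
```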

Unique

Unique: 2043
Unique (%): 100.0%

Sample

1st row: SHORT-CONTEXT-MODELS/flan-t5-3b-summarizer/beam-search-8192-nb4/ASR-whisper-rpunctuated_Noam Chomsky, Fundam_1669853561_0_part1_summary.txt
2nd row: SHORT-CONTEXT-MODELS/flan-t5-3b-summarizer/beam-search-8192-nb4/ASR-whisper-rpunctuated_Noam Chomsky, Fundam_1669853631_0_part2_summary.txt
3rd row: SHORT-CONTEXT-MODELS/flan-t5-3b-summarizer/beam-search-8192-nb4/ASRnlp_law_lecture_week_1_v_2_c_transcription_1_summary.txt
4th row: SHORT-CONTEXT-MODELS/flan-t5-3b-summarizer/beam-search-8192-nb4/ASRnlp_law_lecture_week_2_v_2_c_transcription_2_summary.txt
5th row: SHORT-CONTEXT-MODELS/flan-t5-3b-summarizer/beam-search-8192-nb4/ASRnlp_law_lecture_week_3_part_1_v_2_c_transcription_3_summary.txt

Common Values

ValueCountFrequency (%)
SHORT-CONTEXT-MODELS/flan-t5-3b-summarizer/beam-search-8192-nb4/ASR-whisper-rpunctuated_Noam Chomsky, Fundam_1669853561_0_part1_summary.txt 1
 
< 0.1%
long-t5/long-t5-base-booksci-summary-v1/bs-16384-nb-8/script_strangersonatrain_summary.txt 1
 
< 0.1%
long-t5/long-t5-base-booksci-summary-v1/bs-8192-nb-16/OCR_PAPER_dall-e-2-annotated__summary.txt 1
 
< 0.1%
long-t5/long-t5-base-booksci-summary-v1/bs-8192-nb-16/OCR_PAPER_Kandpal, Nieto, Jin - 2022 - Music Enhancement via Image Translation and Vocoding-annotated__summary.txt 1
 
< 0.1%
long-t5/long-t5-base-booksci-summary-v1/bs-8192-nb-16/OCR_PAPER_Hong et al. - 2022 - CogVideo Large-scale Pretraining for Text-to-Video Generation via Transformers-annotated__summary.txt 1
 
< 0.1%
long-t5/long-t5-base-booksci-summary-v1/bs-8192-nb-16/OCR_ML4HLecture05-NLP.pptx__summary.txt 1
 
< 0.1%
long-t5/long-t5-base-booksci-summary-v1/bs-8192-nb-16/OCR_ML4HLecture04RepresentationLearning.pptx__summary.txt 1
 
< 0.1%
long-t5/long-t5-base-booksci-summary-v1/bs-8192-nb-16/OCR_ML4HLecture02image__summary.txt 1
 
< 0.1%
long-t5/long-t5-base-booksci-summary-v1/bs-8192-nb-16/Emie_dissertation_cleansed_summary.txt 1
 
< 0.1%
long-t5/long-t5-base-booksci-summary-v1/bs-8192-nb-16/ASRnlp_law_lecture_week_3_part_1_v_2_c_transcription_3_summary.txt 1
 
< 0.1%
Other values (2033) 2033
99.5%

Length

[Figure: Histogram of lengths of the category]
ValueCountFrequency (%)
476
 
7.9%
2022 221
 
3.6%
via 221
 
3.6%
chomsky 212
 
3.5%
cogvideo 111
 
1.8%
large-scale 111
 
1.8%
pretraining 111
 
1.8%
al 111
 
1.8%
et 111
 
1.8%
for 111
 
1.8%
Other values (1968) 4267
70.4%

Most occurring characters

ValueCountFrequency (%)
- 17654
 
7.5%
a 15514
 
6.6%
e 14926
 
6.3%
t 14442
 
6.1%
s 12967
 
5.5%
r 11615
 
4.9%
n 10481
 
4.4%
m 10414
 
4.4%
_ 8955
 
3.8%
o 8528
 
3.6%
Other values (51) 111220
47.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 160105
67.6%
Uppercase Letter 19122
 
8.1%
Dash Punctuation 17654
 
7.5%
Decimal Number 17207
 
7.3%
Other Punctuation 9653
 
4.1%
Connector Punctuation 8955
 
3.8%
Space Separator 4020
 
1.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 15514
 
9.7%
e 14926
 
9.3%
t 14442
 
9.0%
s 12967
 
8.1%
r 11615
 
7.3%
n 10481
 
6.5%
m 10414
 
6.5%
o 8528
 
5.3%
u 7178
 
4.5%
l 7117
 
4.4%
Other values (14) 46923
29.3%
Uppercase Letter
ValueCountFrequency (%)
L 2012
10.5%
R 1902
9.9%
E 1897
9.9%
O 1587
 
8.3%
S 1500
 
7.8%
C 1466
 
7.7%
A 1265
 
6.6%
T 1151
 
6.0%
M 1085
 
5.7%
D 1044
 
5.5%
Other values (12) 4213
22.0%
Decimal Number
ValueCountFrequency (%)
5 2674
15.5%
1 2491
14.5%
2 2454
14.3%
8 2090
12.1%
4 2074
12.1%
6 1797
10.4%
3 1297
7.5%
0 1236
7.2%
9 1094
6.4%
Other Punctuation
ValueCountFrequency (%)
/ 6668
69.1%
. 2553
 
26.4%
, 432
 
4.5%
Dash Punctuation
ValueCountFrequency (%)
- 17654
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 8955
100.0%
Space Separator
ValueCountFrequency (%)
4020
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 179227
75.7%
Common 57489
 
24.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 15514
 
8.7%
e 14926
 
8.3%
t 14442
 
8.1%
s 12967
 
7.2%
r 11615
 
6.5%
n 10481
 
5.8%
m 10414
 
5.8%
o 8528
 
4.8%
u 7178
 
4.0%
l 7117
 
4.0%
Other values (36) 66045
36.8%
Common
ValueCountFrequency (%)
- 17654
30.7%
_ 8955
15.6%
/ 6668
 
11.6%
4020
 
7.0%
5 2674
 
4.7%
. 2553
 
4.4%
1 2491
 
4.3%
2 2454
 
4.3%
8 2090
 
3.6%
4 2074
 
3.6%
Other values (5) 5856
 
10.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 236716
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 17654
 
7.5%
a 15514
 
6.6%
e 14926
 
6.3%
t 14442
 
6.1%
s 12967
 
5.5%
r 11615
 
4.9%
n 10481
 
4.4%
m 10414
 
4.4%
_ 8955
 
3.8%
o 8528
 
3.6%
Other values (51) 111220
47.0%

file_name
Categorical

Distinct: 41
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 224.1 KiB
OCR_PAPER_Hong et al. - 2022 - CogVideo Large-scale Pretraining for Text-to-Video Generation via Transformers-annotated__summary.txt
 
106
OCR_ML4HLecture02image__summary.txt
 
104
OCR_ML4HLecture04RepresentationLearning.pptx__summary.txt
 
104
OCR_ML4HLecture05-NLP.pptx__summary.txt
 
104
OCR_PAPER_Kandpal, Nieto, Jin - 2022 - Music Enhancement via Image Translation and Vocoding-annotated__summary.txt
 
104
Other values (36)
1521 

Length

Max length: 132
Median length: 66
Mean length: 55.266275
Min length: 26

Characters and Unicode

Total characters: 112909
Distinct characters: 58
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ASR-whisper-rpunctuated_Noam Chomsky, Fundam_1669853561_0_part1_summary.txt
2nd row: ASR-whisper-rpunctuated_Noam Chomsky, Fundam_1669853631_0_part2_summary.txt
3rd row: ASRnlp_law_lecture_week_1_v_2_c_transcription_1_summary.txt
4th row: ASRnlp_law_lecture_week_2_v_2_c_transcription_2_summary.txt
5th row: ASRnlp_law_lecture_week_3_part_1_v_2_c_transcription_3_summary.txt

Common Values

ValueCountFrequency (%)
OCR_PAPER_Hong et al. - 2022 - CogVideo Large-scale Pretraining for Text-to-Video Generation via Transformers-annotated__summary.txt 106
 
5.2%
OCR_ML4HLecture02image__summary.txt 104
 
5.1%
OCR_ML4HLecture04RepresentationLearning.pptx__summary.txt 104
 
5.1%
OCR_ML4HLecture05-NLP.pptx__summary.txt 104
 
5.1%
OCR_PAPER_Kandpal, Nieto, Jin - 2022 - Music Enhancement via Image Translation and Vocoding-annotated__summary.txt 104
 
5.1%
The Most Dangerous Game--Richard Connell_summary.txt 104
 
5.1%
script_sunsetblvd._summary.txt 103
 
5.0%
script_frozendisney_summary.txt 103
 
5.0%
gpt_peter_testing_group_exemplars_summary.txt 103
 
5.0%
script_findingnemo_summary.txt 103
 
5.0%
Other values (31) 1005
49.2%

Length

[Figure: Histogram of lengths of the category]
ValueCountFrequency (%)
476
 
7.9%
via 221
 
3.6%
2022 221
 
3.6%
chomsky 212
 
3.5%
asr-whisper-rpunctuated_noam 200
 
3.3%
cogvideo 111
 
1.8%
pretraining 111
 
1.8%
for 111
 
1.8%
text-to-video 111
 
1.8%
generation 111
 
1.8%
Other values (67) 4178
68.9%

Most occurring characters

ValueCountFrequency (%)
t 9695
 
8.6%
_ 8697
 
7.7%
a 7680
 
6.8%
e 6981
 
6.2%
r 6687
 
5.9%
n 5899
 
5.2%
m 5414
 
4.8%
s 5370
 
4.8%
4020
 
3.6%
i 3935
 
3.5%
Other values (48) 48531
43.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 79010
70.0%
Uppercase Letter 10604
 
9.4%
Connector Punctuation 8697
 
7.7%
Decimal Number 5585
 
4.9%
Space Separator 4020
 
3.6%
Other Punctuation 2909
 
2.6%
Dash Punctuation 2084
 
1.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 9695
12.3%
a 7680
 
9.7%
e 6981
 
8.8%
r 6687
 
8.5%
n 5899
 
7.5%
m 5414
 
6.9%
s 5370
 
6.8%
i 3935
 
5.0%
u 3676
 
4.7%
o 3494
 
4.4%
Other values (14) 20179
25.5%
Uppercase Letter
ValueCountFrequency (%)
R 1724
16.3%
C 1086
10.2%
L 985
9.3%
P 872
8.2%
A 852
 
8.0%
O 654
 
6.2%
M 647
 
6.1%
S 626
 
5.9%
E 544
 
5.1%
T 441
 
4.2%
Other values (10) 2173
20.5%
Decimal Number
ValueCountFrequency (%)
2 1515
27.1%
1 851
15.2%
0 761
13.6%
6 636
11.4%
3 535
 
9.6%
4 437
 
7.8%
5 426
 
7.6%
8 212
 
3.8%
9 212
 
3.8%
Other Punctuation
ValueCountFrequency (%)
. 2477
85.1%
, 432
 
14.9%
Connector Punctuation
ValueCountFrequency (%)
_ 8697
100.0%
Space Separator
ValueCountFrequency (%)
4020
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2084
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 89614
79.4%
Common 23295
 
20.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 9695
 
10.8%
a 7680
 
8.6%
e 6981
 
7.8%
r 6687
 
7.5%
n 5899
 
6.6%
m 5414
 
6.0%
s 5370
 
6.0%
i 3935
 
4.4%
u 3676
 
4.1%
o 3494
 
3.9%
Other values (34) 30783
34.4%
Common
ValueCountFrequency (%)
_ 8697
37.3%
4020
17.3%
. 2477
 
10.6%
- 2084
 
8.9%
2 1515
 
6.5%
1 851
 
3.7%
0 761
 
3.3%
6 636
 
2.7%
3 535
 
2.3%
4 437
 
1.9%
Other values (4) 1282
 
5.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 112909
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 9695
 
8.6%
_ 8697
 
7.7%
a 7680
 
6.8%
e 6981
 
6.2%
r 6687
 
5.9%
n 5899
 
5.2%
m 5414
 
4.8%
s 5370
 
4.8%
4020
 
3.6%
i 3935
 
3.5%
Other values (48) 48531
43.0%

summary
Categorical

HIGH CARDINALITY  UNIFORM 

Distinct: 1960
Distinct (%): 95.9%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 MiB
<no_saic_raw_sp><sep_4><sep_4><sep_4><sep_4><sep_4><sep_4><sep_4> <no_saic_raw_sp><sep_4><sep_4><sep_4><sep_4><sep_4><sep_4><sep_4>
 
6
you're nothing to me, little bitch.
 
4
I'm a sniper in the U. S. Navy Seals and I have been involved in many secret raids against Al-Qaeda. I've killed over 300 confirmed targets.
 
4
we present a novel approach to enhance music signals by combining recent advances in conditional image-synthesis and voccoding. We find that our approach achieves an improved perception of music than many state-of the-art methods for audio enhancement. Additionally, we compare the subjective hearing test scores with commonly used audio quality measures and suggest that these metrics correlate well with human perception.
 
3
The US Marine Corps has the most powerful weaponry in the world. It is possible to kill a person with ease. ---
 
3
Other values (1955)
2023 

Length

Max length: 31506
Median length: 2490
Mean length: 2182.3671
Min length: 15

Characters and Unicode

Total characters: 4458576
Distinct characters: 122
Distinct categories: 17
Distinct scripts: 3
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1894
Unique (%): 92.7%

Sample

1st row: There's lots of interesting things to say about language, but I don't think it's as simple as you think. There's lots of interesting things about language, but if you really want to understand it, you need to look at the whole picture. ---
2nd row: There's no such thing as a simple language. There's more than one way to solve ATB. I think you're asking the wrong question. ---
3rd row: if you don't want to read the whole thing, just skip it. I'm sorry for the wall of text. ---
4th row: I think it's okay to ask questions about what you want to do in the course. I'm a bit of a nerd. ---
5th row: I'm not sure if this is the right subreddit to post this. I'm going to finish it in person. ---

Common Values

ValueCountFrequency (%)
<no_saic_raw_sp><sep_4><sep_4><sep_4><sep_4><sep_4><sep_4><sep_4> <no_saic_raw_sp><sep_4><sep_4><sep_4><sep_4><sep_4><sep_4><sep_4> 6
 
0.3%
you're nothing to me, little bitch. 4
 
0.2%
I'm a sniper in the U. S. Navy Seals and I have been involved in many secret raids against Al-Qaeda. I've killed over 300 confirmed targets. 4
 
0.2%
we present a novel approach to enhance music signals by combining recent advances in conditional image-synthesis and voccoding. We find that our approach achieves an improved perception of music than many state-of the-art methods for audio enhancement. Additionally, we compare the subjective hearing test scores with commonly used audio quality measures and suggest that these metrics correlate well with human perception. 3
 
0.1%
The US Marine Corps has the most powerful weaponry in the world. It is possible to kill a person with ease. --- 3
 
0.1%
The narrator tells the audience that he's been training as a sniper for the U.S. Navy and has killed hundreds of enemy soldiers. He promises to "wipe you the dead kiddo" out of your mouth. 3
 
0.1%
The storm that wipes you out with precision, mark my words. what the f*ck did you just say about me? I can go anywhere, anytime and I'll kill you in seven hundred ways and that doesn't even bother to count the number of ways I can destroy you. you're freaking dead, kid." if only you knew what "clever," comment was going to bring down on you, kid. but you don't, so now you've paying the price. you are fucking dead, child. i will hit you all over you; you will drown your sorrow in it. this is a vicious, bloodthirsty, animalistic monster. he will have you know that by the time this chapter ends, you'll have paid the price for saying these things to him over the internet. you won't get away with it. you killed the little punk.You're gonna die, kid - you're... dead, dad.If only you were able to imagine how awful this whole thing is going to be and how much damage it's going to do to you, then you wouldn't be such a moron. you'd be totally screwed. 3
 
0.1%
A person threatens someone who insulted them online, claiming to be a highly trained Navy SEAL with access to a network of spies and the entire arsenal of the US Marine Corps. They vow to kill the person in over 700 ways and make them suffer for their comment. 3
 
0.1%
The sniper tells the kid that he's going to kill him in seven hundred ways. --- 3
 
0.1%
A strategy to integrate a deep learning system into the clinical workflow. --- 3
 
0.1%
Other values (1950) 2008
98.3%

Length

[Figure: Histogram of lengths of the category]
ValueCountFrequency (%)
the 48901
 
6.3%
to 29387
 
3.8%
of 22874
 
2.9%
and 20684
 
2.7%
a 20492
 
2.6%
in 12531
 
1.6%
is 11649
 
1.5%
he 10784
 
1.4%
that 10572
 
1.4%
for 6715
 
0.9%
Other values (23567) 583201
75.0%

Most occurring characters

ValueCountFrequency (%)
771616
17.3%
e 442639
 
9.9%
t 324077
 
7.3%
a 274843
 
6.2%
o 264412
 
5.9%
n 253954
 
5.7%
s 245698
 
5.5%
i 241682
 
5.4%
r 204655
 
4.6%
h 197047
 
4.4%
Other values (112) 1237953
27.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3423698
76.8%
Space Separator 771710
 
17.3%
Uppercase Letter 115789
 
2.6%
Other Punctuation 105255
 
2.4%
Decimal Number 13305
 
0.3%
Control 11648
 
0.3%
Dash Punctuation 8490
 
0.2%
Close Punctuation 2571
 
0.1%
Open Punctuation 2259
 
0.1%
Math Symbol 1984
 
< 0.1%
Other values (7) 1867
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 442639
12.9%
t 324077
 
9.5%
a 274843
 
8.0%
o 264412
 
7.7%
n 253954
 
7.4%
s 245698
 
7.2%
i 241682
 
7.1%
r 204655
 
6.0%
h 197047
 
5.8%
l 134361
 
3.9%
Other values (23) 840330
24.5%
Uppercase Letter
ValueCountFrequency (%)
T 15733
13.6%
A 11976
 
10.3%
I 8152
 
7.0%
S 7254
 
6.3%
G 6838
 
5.9%
H 6701
 
5.8%
B 6058
 
5.2%
N 5593
 
4.8%
M 5238
 
4.5%
C 5187
 
4.5%
Other values (18) 37059
32.0%
Other Punctuation
ValueCountFrequency (%)
. 45900
43.6%
, 34895
33.2%
' 11580
 
11.0%
" 6975
 
6.6%
: 1807
 
1.7%
; 1198
 
1.1%
# 979
 
0.9%
? 822
 
0.8%
/ 460
 
0.4%
! 333
 
0.3%
Other values (5) 306
 
0.3%
Decimal Number
ValueCountFrequency (%)
1 2593
19.5%
0 2106
15.8%
2 1936
14.6%
4 1641
12.3%
3 1264
9.5%
9 1033
 
7.8%
5 990
 
7.4%
6 730
 
5.5%
8 537
 
4.0%
7 475
 
3.6%
Control
ValueCountFrequency (%)
6832
58.7%
4806
41.3%
€ 5
 
< 0.1%
œ 2
 
< 0.1%
™ 1
 
< 0.1%
 1
 
< 0.1%
 1
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
> 867
43.7%
< 860
43.3%
= 156
 
7.9%
+ 82
 
4.1%
| 12
 
0.6%
~ 7
 
0.4%
Dash Punctuation
ValueCountFrequency (%)
- 8486
> 99.9%
3
 
< 0.1%
1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 1565
60.9%
] 1000
38.9%
} 6
 
0.2%
Open Punctuation
ValueCountFrequency (%)
( 1267
56.1%
[ 986
43.6%
{ 6
 
0.3%
Modifier Symbol
ValueCountFrequency (%)
` 4
66.7%
´ 1
 
16.7%
^ 1
 
16.7%
Space Separator
ValueCountFrequency (%)
771616
> 99.9%
  94
 
< 0.1%
Final Punctuation
ValueCountFrequency (%)
35
63.6%
20
36.4%
Initial Punctuation
ValueCountFrequency (%)
19
79.2%
5
 
20.8%
Other Symbol
ValueCountFrequency (%)
5
83.3%
¦ 1
 
16.7%
Connector Punctuation
ValueCountFrequency (%)
_ 1748
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 27
100.0%
Other Letter
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3539487
79.4%
Common 919088
 
20.6%
Gurmukhi 1
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 442639
12.5%
t 324077
 
9.2%
a 274843
 
7.8%
o 264412
 
7.5%
n 253954
 
7.2%
s 245698
 
6.9%
i 241682
 
6.8%
r 204655
 
5.8%
h 197047
 
5.6%
l 134361
 
3.8%
Other values (51) 956119
27.0%
Common
ValueCountFrequency (%)
771616
84.0%
. 45900
 
5.0%
, 34895
 
3.8%
' 11580
 
1.3%
- 8486
 
0.9%
" 6975
 
0.8%
6832
 
0.7%
4806
 
0.5%
1 2593
 
0.3%
0 2106
 
0.2%
Other values (50) 23299
 
2.5%
Gurmukhi
ValueCountFrequency (%)
1
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4458299
> 99.9%
None 188
 
< 0.1%
Punctuation 83
 
< 0.1%
Specials 5
 
< 0.1%
Gurmukhi 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
771616
17.3%
e 442639
 
9.9%
t 324077
 
7.3%
a 274843
 
6.2%
o 264412
 
5.9%
n 253954
 
5.7%
s 245698
 
5.5%
i 241682
 
5.4%
r 204655
 
4.6%
h 197047
 
4.4%
Other values (88) 1237676
27.8%
None
ValueCountFrequency (%)
  94
50.0%
 41
21.8%
â 21
 
11.2%
€ 5
 
2.7%
ö 5
 
2.7%
é 4
 
2.1%
è 3
 
1.6%
ü 3
 
1.6%
à 3
 
1.6%
œ 2
 
1.1%
Other values (6) 7
 
3.7%
Punctuation
ValueCountFrequency (%)
35
42.2%
20
24.1%
19
22.9%
5
 
6.0%
3
 
3.6%
1
 
1.2%
Specials
ValueCountFrequency (%)
5
100.0%
Gurmukhi
ValueCountFrequency (%)
1
100.0%

min_length
Categorical

CONSTANT  MISSING 

Distinct: 1
Distinct (%): 0.1%
Missing: 324
Missing (%): 15.9%
Memory size: 113.5 KiB
8.0
1719 

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 5157
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 8.0
2nd row: 8.0
3rd row: 8.0
4th row: 8.0
5th row: 8.0

Common Values

ValueCountFrequency (%)
8.0 1719
84.1%
(Missing) 324
 
15.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

ValueCountFrequency (%)
8.0 1719
100.0%

Most occurring characters

ValueCountFrequency (%)
8 1719
33.3%
. 1719
33.3%
0 1719
33.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 3438
66.7%
Other Punctuation 1719
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
8 1719
50.0%
0 1719
50.0%
Other Punctuation
ValueCountFrequency (%)
. 1719
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 5157
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
8 1719
33.3%
. 1719
33.3%
0 1719
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5157
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
8 1719
33.3%
. 1719
33.3%
0 1719
33.3%

max_length
Real number (ℝ)

Distinct: 10
Distinct (%): 0.5%
Missing: 81
Missing (%): 4.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2217.4516
Minimum: 128
Maximum: 4096
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 16.1 KiB

Quantile statistics

Minimum: 128
5-th percentile: 256
Q1: 1024
median: 2048
Q3: 4096
95-th percentile: 4096
Maximum: 4096
Range: 3968
Interquartile range (IQR): 3072

Descriptive statistics

Standard deviation: 1401.9547
Coefficient of variation (CV): 0.6322369
Kurtosis: -1.4175937
Mean: 2217.4516
Median Absolute Deviation (MAD): 1056
Skewness: 0.28000517
Sum: 4350640
Variance: 1965477
Monotonicity: Not monotonic
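
Most of the quantile and descriptive statistics for max_length can be rebuilt from the value counts listed in this section. The sketch below is a cross-check under stated assumptions: quantiles use linear interpolation over the sorted non-missing values, the standard deviation is the sample version (divisor n - 1), and the coefficient of variation is the standard deviation divided by the mean.

```python
import statistics

# Value counts for max_length as listed in this section (missing rows excluded).
value_counts = {
    4096: 613, 2048: 512, 1024: 414, 256: 133, 512: 114,
    992: 76, 3276: 34, 1927: 30, 128: 19, 1638: 17,
}

# Expand the counts back into a sorted list of observations.
values = sorted(v for v, n in value_counts.items() for _ in range(n))

total = sum(values)
mean = total / len(values)

# "inclusive" interpolates linearly between the sorted data points.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

value_range = max(values) - min(values)
iqr = q3 - q1
stdev = statistics.stdev(values)   # sample standard deviation (n - 1)
cv = stdev / mean                  # coefficient of variation

print(f"n={len(values)} sum={total} mean={mean:.4f}")
print(f"Q1={q1} median={median} Q3={q3} range={value_range} IQR={iqr}")
print(f"stdev={stdev:.4f} CV={cv:.7f}")
```

The expanded list has 1962 entries, which equals the 2043 observations minus the 81 missing values.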
[Figure: Histogram with fixed size bins (bins=10)]
ValueCountFrequency (%)
4096 613
30.0%
2048 512
25.1%
1024 414
20.3%
256 133
 
6.5%
512 114
 
5.6%
992 76
 
3.7%
3276 34
 
1.7%
1927 30
 
1.5%
128 19
 
0.9%
1638 17
 
0.8%
(Missing) 81
 
4.0%
ValueCountFrequency (%)
128 19
 
0.9%
256 133
 
6.5%
512 114
 
5.6%
992 76
 
3.7%
1024 414
20.3%
1638 17
 
0.8%
1927 30
 
1.5%
2048 512
25.1%
3276 34
 
1.7%
4096 613
30.0%
ValueCountFrequency (%)
4096 613
30.0%
3276 34
 
1.7%
2048 512
25.1%
1927 30
 
1.5%
1638 17
 
0.8%
1024 414
20.3%
992 76
 
3.7%
512 114
 
5.6%
256 133
 
6.5%
128 19
 
0.9%

no_repeat_ngram_size
Categorical

IMBALANCE  MISSING 

Distinct: 2
Distinct (%): 0.1%
Missing: 305
Missing (%): 14.9%
Memory size: 113.9 KiB
3.0
1638 
4.0
 
100

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 5214
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3.0
2nd row: 3.0
3rd row: 3.0
4th row: 3.0
5th row: 3.0

Common Values

ValueCountFrequency (%)
3.0 1638
80.2%
4.0 100
 
4.9%
(Missing) 305
 
14.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

ValueCountFrequency (%)
3.0 1638
94.2%
4.0 100
 
5.8%
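
The two frequency tables for no_repeat_ngram_size report different percentages for the same counts (80.2% and 4.9% in one, 94.2% and 5.8% in the other). The sketch below shows one reading that reproduces both sets of figures: the first table appears to divide by all observations, the second by non-missing observations only. This interpretation is an assumption inferred from the numbers.

```python
n_rows = 2043                         # total observations in the dataset
counts = {"3.0": 1638, "4.0": 100}    # non-missing value counts
n_missing = 305

n_present = sum(counts.values())

# Share of all rows (missing rows included in the denominator).
share_of_all = {k: 100 * c / n_rows for k, c in counts.items()}
missing_share = 100 * n_missing / n_rows

# Share of non-missing rows only.
share_of_present = {k: 100 * c / n_present for k, c in counts.items()}

for k in counts:
    print(f"{k}: {share_of_all[k]:.1f}% of all rows, "
          f"{share_of_present[k]:.1f}% of non-missing rows")
print(f"(Missing): {missing_share:.1f}% of all rows")
```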

Most occurring characters

ValueCountFrequency (%)
. 1738
33.3%
0 1738
33.3%
3 1638
31.4%
4 100
 
1.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 3476
66.7%
Other Punctuation 1738
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 1738
50.0%
3 1638
47.1%
4 100
 
2.9%
Other Punctuation
ValueCountFrequency (%)
. 1738
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 5214
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 1738
33.3%
0 1738
33.3%
3 1638
31.4%
4 100
 
1.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5214
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 1738
33.3%
0 1738
33.3%
3 1638
31.4%
4 100
 
1.9%

encoder_no_repeat_ngram_size
Categorical

CONSTANT  MISSING 

Distinct: 1
Distinct (%): 0.1%
Missing: 305
Missing (%): 14.9%
Memory size: 113.9 KiB
4.0
1738 

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 5214
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 4.0
2nd row: 4.0
3rd row: 4.0
4th row: 4.0
5th row: 4.0

Common Values

ValueCountFrequency (%)
4.0 1738
85.1%
(Missing) 305
 
14.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

ValueCountFrequency (%)
4.0 1738
100.0%

Most occurring characters

ValueCountFrequency (%)
4 1738
33.3%
. 1738
33.3%
0 1738
33.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 3476
66.7%
Other Punctuation 1738
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
4 1738
50.0%
0 1738
50.0%
Other Punctuation
ValueCountFrequency (%)
. 1738
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 5214
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
4 1738
33.3%
. 1738
33.3%
0 1738
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5214
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4 1738
33.3%
. 1738
33.3%
0 1738
33.3%

repetition_penalty
Categorical

IMBALANCE  MISSING 

Distinct: 2
Distinct (%): 0.1%
Missing: 324
Missing (%): 15.9%
Memory size: 113.5 KiB
2.5
1668 
1.5
 
51

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 5157
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.5
2nd row: 2.5
3rd row: 2.5
4th row: 2.5
5th row: 2.5

Common Values

ValueCountFrequency (%)
2.5 1668
81.6%
1.5 51
 
2.5%
(Missing) 324
 
15.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

ValueCountFrequency (%)
2.5 1668
97.0%
1.5 51
 
3.0%

Most occurring characters

ValueCountFrequency (%)
. 1719
33.3%
5 1719
33.3%
2 1668
32.3%
1 51
 
1.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 3438
66.7%
Other Punctuation 1719
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
5 1719
50.0%
2 1668
48.5%
1 51
 
1.5%
Other Punctuation
ValueCountFrequency (%)
. 1719
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 5157
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 1719
33.3%
5 1719
33.3%
2 1668
32.3%
1 51
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5157
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 1719
33.3%
5 1719
33.3%
2 1668
32.3%
1 51
 
1.0%

num_beams
Real number (ℝ)

Distinct: 10
Distinct (%): 0.6%
Missing: 233
Missing (%): 11.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.6232044
Minimum: 1
Maximum: 32
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 16.1 KiB
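
The summary figures for num_beams hint at which denominators the report uses. In the sketch below, the missing percentage is computed over all observations, while the distinct percentage and the mean are computed over non-missing observations. This reading is an assumption inferred from the reported numbers (dividing the distinct count by all rows would round to 0.5%, not 0.6%). The sum of 13798 is the value listed under this variable's descriptive statistics.

```python
n_rows = 2043        # total observations in the dataset
n_missing = 233      # missing values for num_beams
n_distinct = 10      # distinct non-missing values
value_sum = 13798    # sum of all non-missing values

n_present = n_rows - n_missing

missing_pct = 100 * n_missing / n_rows
distinct_pct_present = 100 * n_distinct / n_present   # over non-missing rows
distinct_pct_all = 100 * n_distinct / n_rows          # over all rows, for comparison
mean = value_sum / n_present

print(f"non-missing rows: {n_present}")
print(f"missing: {missing_pct:.1f}%")
print(f"distinct: {distinct_pct_present:.1f}% (non-missing) vs "
      f"{distinct_pct_all:.1f}% (all rows)")
print(f"mean: {mean:.7f}")
```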

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 4
median: 8
Q3: 8
95-th percentile: 16
Maximum: 32
Range: 31
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 4.8640458
Coefficient of variation (CV): 0.6380579
Kurtosis: 5.3809618
Mean: 7.6232044
Median Absolute Deviation (MAD): 4
Skewness: 1.7613591
Sum: 13798
Variance: 23.658942
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=10)]
ValueCountFrequency (%)
8 755
37.0%
4 499
24.4%
16 190
 
9.3%
12 95
 
4.7%
2 95
 
4.7%
1 81
 
4.0%
6 38
 
1.9%
20 19
 
0.9%
5 19
 
0.9%
32 19
 
0.9%
(Missing) 233
 
11.4%
ValueCountFrequency (%)
1 81
 
4.0%
2 95
 
4.7%
4 499
24.4%
5 19
 
0.9%
6 38
 
1.9%
8 755
37.0%
12 95
 
4.7%
16 190
 
9.3%
20 19
 
0.9%
32 19
 
0.9%
ValueCountFrequency (%)
32 19
 
0.9%
20 19
 
0.9%
16 190
 
9.3%
12 95
 
4.7%
8 755
37.0%
6 38
 
1.9%
5 19
 
0.9%
4 499
24.4%
2 95
 
4.7%
1 81
 
4.0%

num_beam_groups
Categorical

CONSTANT  MISSING 

Distinct: 1
Distinct (%): 0.1%
Missing: 324
Missing (%): 15.9%
Memory size: 113.5 KiB
1.0
1719 

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 5157
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

ValueCountFrequency (%)
1.0 1719
84.1%
(Missing) 324
 
15.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

ValueCountFrequency (%)
1.0 1719
100.0%

Most occurring characters

ValueCountFrequency (%)
1 1719
33.3%
. 1719
33.3%
0 1719
33.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 3438
66.7%
Other Punctuation 1719
33.3%
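
The category counts above can be reproduced with Python's standard `unicodedata` module. This is only a sketch of the idea, not the profiler's own implementation: the `CATEGORY_NAMES` mapping is a hypothetical helper that covers just the two general categories occurring in "1.0".

```python
import unicodedata
from collections import Counter

# unicodedata.category() returns two-letter codes; this illustrative mapping
# spells out only the two codes that occur in the string "1.0".
CATEGORY_NAMES = {"Nd": "Decimal Number", "Po": "Other Punctuation"}

# The 1719 non-missing num_beam_groups entries all read "1.0".
values = ["1.0"] * 1719

counts = Counter(
    CATEGORY_NAMES[unicodedata.category(ch)]
    for value in values
    for ch in value
)
total = sum(counts.values())

for name, count in counts.most_common():
    print(f"{name}: {count} ({count / total:.1%})")
```

Each value contributes two decimal digits and one punctuation mark, which is where the roughly two-thirds to one-third split comes from.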

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 1719
50.0%
0 1719
50.0%
Other Punctuation
ValueCountFrequency (%)
. 1719
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 5157
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 1719
33.3%
. 1719
33.3%
0 1719
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5157
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 1719
33.3%
. 1719
33.3%
0 1719
33.3%

length_penalty
Categorical

IMBALANCE, MISSING

Distinct: 2
Distinct (%): 0.1%
Missing: 354
Missing (%): 17.3%
Memory size: 113.0 KiB

Value  Count
0.8    1655
0.75   34

Length

Max length: 4
Median length: 3
Mean length: 3.0201303
Min length: 3

Characters and Unicode

Total characters: 5101
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.8
2nd row: 0.8
3rd row: 0.8
4th row: 0.8
5th row: 0.8

Common Values

Value      Count  Frequency (%)
0.8        1655   81.0%
0.75       34     1.7%
(Missing)  354    17.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0.8 1655
98.0%
0.75 34
 
2.0%

Most occurring characters

ValueCountFrequency (%)
0 1689
33.1%
. 1689
33.1%
8 1655
32.4%
7 34
 
0.7%
5 34
 
0.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 3412
66.9%
Other Punctuation 1689
33.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 1689
49.5%
8 1655
48.5%
7 34
 
1.0%
5 34
 
1.0%
Other Punctuation
ValueCountFrequency (%)
. 1689
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 5101
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 1689
33.1%
. 1689
33.1%
8 1655
32.4%
7 34
 
0.7%
5 34
 
0.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5101
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 1689
33.1%
. 1689
33.1%
8 1655
32.4%
7 34
 
0.7%
5 34
 
0.7%

early_stopping
Boolean

CONSTANT, MISSING

Distinct: 1
Distinct (%): 0.1%
Missing: 314
Missing (%): 15.4%
Memory size: 70.7 KiB

Value      Count  Frequency (%)
True       1729   84.6%
(Missing)  314    15.4%

do_sample
Boolean

CONSTANT, MISSING

Distinct: 1
Distinct (%): 0.1%
Missing: 324
Missing (%): 15.9%
Memory size: 64.0 KiB

Value      Count  Frequency (%)
False      1719   84.1%
(Missing)  324    15.9%

model_name
Categorical

Distinct: 43
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Memory size: 189.8 KiB

Value  Count
pszemraj/long-t5-tglobal-base-16384-book-summary  186
pszemraj/long-t5-tglobal-xl-16384-book-summary  142
stacked-summaries/flan-t5-large-tinystack-booksum-1024-WIP1r2  95
stacked-summaries/flan-t5-large-stacked-samsum-1024  95
pszemraj/long-t5-tglobal-base-sci-simplify  95
Other values (38)  1430

Length

Max length: 72
Median length: 48
Mean length: 38.047479
Min length: 5

Characters and Unicode

Total characters: 77731
Distinct characters: 54
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: jordiclive/flan-t5-3b-summarizer
2nd row: jordiclive/flan-t5-3b-summarizer
3rd row: jordiclive/flan-t5-3b-summarizer
4th row: jordiclive/flan-t5-3b-summarizer
5th row: jordiclive/flan-t5-3b-summarizer

Common Values

Value  Count  Frequency (%)
pszemraj/long-t5-tglobal-base-16384-book-summary  186  9.1%
pszemraj/long-t5-tglobal-xl-16384-book-summary  142  7.0%
stacked-summaries/flan-t5-large-tinystack-booksum-1024-WIP1r2  95  4.7%
stacked-summaries/flan-t5-large-stacked-samsum-1024  95  4.7%
pszemraj/long-t5-tglobal-base-sci-simplify  95  4.7%
AleBurzio/long-t5-base-govreport  95  4.7%
pszemraj/led-base-book-summary  95  4.7%
Joemgu/pegasus-x-sumstew  95  4.7%
gpt-3.5-turbo  76  3.7%
gpt-4  76  3.7%
Other values (33)  993  48.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
pszemraj/long-t5-tglobal-base-16384-book-summary 186
 
9.1%
pszemraj/long-t5-tglobal-xl-16384-book-summary 142
 
7.0%
stacked-summaries/flan-t5-large-tinystack-booksum-1024-wip1r2 95
 
4.7%
stacked-summaries/flan-t5-large-stacked-samsum-1024 95
 
4.7%
pszemraj/long-t5-tglobal-base-sci-simplify 95
 
4.7%
aleburzio/long-t5-base-govreport 95
 
4.7%
pszemraj/led-base-book-summary 95
 
4.7%
joemgu/pegasus-x-sumstew 95
 
4.7%
pszemraj/long-t5-tglobal-base-scientific_lay_summarisation-elife-norm-r1 76
 
3.7%
gpt-3.5-turbo 76
 
3.7%
Other values (33) 993
48.6%

Most occurring characters

ValueCountFrequency (%)
- 8336
 
10.7%
a 6216
 
8.0%
s 5603
 
7.2%
e 5183
 
6.7%
m 4524
 
5.8%
r 4336
 
5.6%
l 4290
 
5.5%
o 4271
 
5.5%
t 3416
 
4.4%
g 2942
 
3.8%
Other values (44) 28614
36.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 61007
78.5%
Dash Punctuation 8336
 
10.7%
Decimal Number 5427
 
7.0%
Other Punctuation 1848
 
2.4%
Uppercase Letter 895
 
1.2%
Connector Punctuation 218
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 6216
 
10.2%
s 5603
 
9.2%
e 5183
 
8.5%
m 4524
 
7.4%
r 4336
 
7.1%
l 4290
 
7.0%
o 4271
 
7.0%
t 3416
 
5.6%
g 2942
 
4.8%
b 2870
 
4.7%
Other values (15) 17356
28.4%
Uppercase Letter
ValueCountFrequency (%)
A 128
14.3%
I 121
13.5%
B 95
10.6%
W 95
10.6%
J 95
10.6%
P 95
10.6%
M 57
6.4%
E 38
 
4.2%
K 38
 
4.2%
T 19
 
2.1%
Other values (6) 114
12.7%
Decimal Number
ValueCountFrequency (%)
5 1088
20.0%
1 1016
18.7%
4 845
15.6%
3 636
11.7%
6 522
9.6%
8 484
8.9%
2 456
8.4%
0 323
 
6.0%
9 57
 
1.1%
Other Punctuation
ValueCountFrequency (%)
/ 1772
95.9%
. 76
 
4.1%
Dash Punctuation
ValueCountFrequency (%)
- 8336
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 218
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 61902
79.6%
Common 15829
 
20.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 6216
 
10.0%
s 5603
 
9.1%
e 5183
 
8.4%
m 4524
 
7.3%
r 4336
 
7.0%
l 4290
 
6.9%
o 4271
 
6.9%
t 3416
 
5.5%
g 2942
 
4.8%
b 2870
 
4.6%
Other values (31) 18251
29.5%
Common
ValueCountFrequency (%)
- 8336
52.7%
/ 1772
 
11.2%
5 1088
 
6.9%
1 1016
 
6.4%
4 845
 
5.3%
3 636
 
4.0%
6 522
 
3.3%
8 484
 
3.1%
2 456
 
2.9%
0 323
 
2.0%
Other values (3) 351
 
2.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 77731
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 8336
 
10.7%
a 6216
 
8.0%
s 5603
 
7.2%
e 5183
 
6.7%
m 4524
 
5.8%
r 4336
 
5.6%
l 4290
 
5.5%
o 4271
 
5.5%
t 3416
 
4.4%
g 2942
 
3.8%
Other values (44) 28614
36.8%

date
Categorical

HIGH CARDINALITY, MISSING

Distinct: 96
Distinct (%): 5.3%
Missing: 243
Missing (%): 11.9%
Memory size: 134.0 KiB

Value              Count
Feb-17-2023        47
Feb-21-2023        34
20230318_061812    19
20230318_055638    19
20230409_020526    19
Other values (91)  1662

Length

Max length: 17
Median length: 15
Mean length: 14.840556
Min length: 8

Characters and Unicode

Total characters: 26713
Distinct characters: 20
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 20230524_080731
2nd row: 20230524_080731
3rd row: 20230524_080731
4th row: 20230524_080731
5th row: 20230524_080731

Common Values

Value              Count  Frequency (%)
Feb-17-2023        47     2.3%
Feb-21-2023        34     1.7%
20230318_061812    19     0.9%
20230318_055638    19     0.9%
20230409_020526    19     0.9%
20230408_160437    19     0.9%
20230220_214802    19     0.9%
20230316_150446    19     0.9%
20230316_134625    19     0.9%
20230220_205040    19     0.9%
Other values (86)  1567   76.7%
(Missing)          243    11.9%
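
The values above mix at least two textual date formats, which is one reason the column is profiled as categorical rather than as a datetime. A hedged sketch of how such a column might be normalised follows; `parse_run_date` and `CANDIDATE_FORMATS` are hypothetical names, the two format strings are inferred only from the values listed above, and `%b` assumes English month abbreviations in the active locale.

```python
from datetime import datetime
from typing import Optional

# Formats inferred from the listed values; rows in other formats
# (if any) are not covered and will yield None.
CANDIDATE_FORMATS = ("%b-%d-%Y", "%Y%m%d_%H%M%S")


def parse_run_date(raw: str) -> Optional[datetime]:
    """Return a datetime for the first matching format, else None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt)
        except ValueError:
            continue
    return None


print(parse_run_date("Feb-17-2023"))
print(parse_run_date("20230318_061812"))
```

Returning `None` instead of raising keeps unparsed rows visible, so they can be counted alongside the values that are already missing.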

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2023-feb-27 81
 
4.3%
feb-17-2023 47
 
2.5%
feb-21-2023 34
 
1.8%
20230315_234849 19
 
1.0%
20230318_040524 19
 
1.0%
20230318_022628 19
 
1.0%
20230318_022921 19
 
1.0%
20230505_151208 19
 
1.0%
20230524_061826 19
 
1.0%
20230316_015916 19
 
1.0%
Other values (87) 1586
84.3%

Most occurring characters

ValueCountFrequency (%)
0 5978
22.4%
2 5887
22.0%
3 3738
14.0%
1 2180
 
8.2%
5 1862
 
7.0%
_ 1612
 
6.0%
4 1588
 
5.9%
6 930
 
3.5%
8 824
 
3.1%
7 733
 
2.7%
Other values (10) 1381
 
5.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 24034
90.0%
Connector Punctuation 1612
 
6.0%
Dash Punctuation 362
 
1.4%
Lowercase Letter 362
 
1.4%
Uppercase Letter 181
 
0.7%
Space Separator 81
 
0.3%
Other Punctuation 81
 
0.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 5978
24.9%
2 5887
24.5%
3 3738
15.6%
1 2180
 
9.1%
5 1862
 
7.7%
4 1588
 
6.6%
6 930
 
3.9%
8 824
 
3.4%
7 733
 
3.0%
9 314
 
1.3%
Lowercase Letter
ValueCountFrequency (%)
e 162
44.8%
b 162
44.8%
a 19
 
5.2%
r 19
 
5.2%
Uppercase Letter
ValueCountFrequency (%)
F 162
89.5%
M 19
 
10.5%
Connector Punctuation
ValueCountFrequency (%)
_ 1612
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 362
100.0%
Space Separator
ValueCountFrequency (%)
81
100.0%
Other Punctuation
ValueCountFrequency (%)
: 81
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 26170
98.0%
Latin 543
 
2.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 5978
22.8%
2 5887
22.5%
3 3738
14.3%
1 2180
 
8.3%
5 1862
 
7.1%
_ 1612
 
6.2%
4 1588
 
6.1%
6 930
 
3.6%
8 824
 
3.1%
7 733
 
2.8%
Other values (4) 838
 
3.2%
Latin
ValueCountFrequency (%)
F 162
29.8%
e 162
29.8%
b 162
29.8%
M 19
 
3.5%
a 19
 
3.5%
r 19
 
3.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 26713
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 5978
22.4%
2 5887
22.0%
3 3738
14.0%
1 2180
 
8.2%
5 1862
 
7.0%
_ 1612
 
6.0%
4 1588
 
5.9%
6 930
 
3.5%
8 824
 
3.1%
7 733
 
2.7%
Other values (10) 1381
 
5.2%

length
Categorical

Distinct: 2
Distinct (%): 2.5%
Missing: 1962
Missing (%): 96.0%
Memory size: 66.3 KiB

Value   Count
long    68
medium  13

Length

Max length: 6
Median length: 4
Mean length: 4.3209877
Min length: 4

Characters and Unicode

Total characters: 350
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: medium
2nd row: medium
3rd row: medium
4th row: medium
5th row: medium

Common Values

Value      Count  Frequency (%)
long       68     3.3%
medium     13     0.6%
(Missing)  1962   96.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
long 68
84.0%
medium 13
 
16.0%

Most occurring characters

ValueCountFrequency (%)
l 68
19.4%
o 68
19.4%
n 68
19.4%
g 68
19.4%
m 26
 
7.4%
e 13
 
3.7%
d 13
 
3.7%
i 13
 
3.7%
u 13
 
3.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 350
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l 68
19.4%
o 68
19.4%
n 68
19.4%
g 68
19.4%
m 26
 
7.4%
e 13
 
3.7%
d 13
 
3.7%
i 13
 
3.7%
u 13
 
3.7%

Most occurring scripts

ValueCountFrequency (%)
Latin 350
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
l 68
19.4%
o 68
19.4%
n 68
19.4%
g 68
19.4%
m 26
 
7.4%
e 13
 
3.7%
d 13
 
3.7%
i 13
 
3.7%
u 13
 
3.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 350
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
l 68
19.4%
o 68
19.4%
n 68
19.4%
g 68
19.4%
m 26
 
7.4%
e 13
 
3.7%
d 13
 
3.7%
i 13
 
3.7%
u 13
 
3.7%

format
Categorical

CONSTANT, MISSING

Distinct: 1
Distinct (%): 1.2%
Missing: 1962
Missing (%): 96.0%
Memory size: 66.7 KiB

Value      Count
paragraph  81

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 729
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: paragraph
2nd row: paragraph
3rd row: paragraph
4th row: paragraph
5th row: paragraph

Common Values

Value      Count  Frequency (%)
paragraph  81     4.0%
(Missing)  1962   96.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
paragraph 81
100.0%

Most occurring characters

ValueCountFrequency (%)
a 243
33.3%
p 162
22.2%
r 162
22.2%
g 81
 
11.1%
h 81
 
11.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 729
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 243
33.3%
p 162
22.2%
r 162
22.2%
g 81
 
11.1%
h 81
 
11.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 729
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 243
33.3%
p 162
22.2%
r 162
22.2%
g 81
 
11.1%
h 81
 
11.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 729
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 243
33.3%
p 162
22.2%
r 162
22.2%
g 81
 
11.1%
h 81
 
11.1%

extractiveness
Categorical

Distinct: 2
Distinct (%): 2.5%
Missing: 1962
Missing (%): 96.0%
Memory size: 66.3 KiB

Value   Count
low     47
medium  34

Length

Max length: 6
Median length: 3
Mean length: 4.2592593
Min length: 3

Characters and Unicode

Total characters: 345
Distinct characters: 8
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: low
2nd row: low
3rd row: low
4th row: low
5th row: low

Common Values

Value      Count  Frequency (%)
low        47     2.3%
medium     34     1.7%
(Missing)  1962   96.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
low 47
58.0%
medium 34
42.0%

Most occurring characters

ValueCountFrequency (%)
m 68
19.7%
l 47
13.6%
o 47
13.6%
w 47
13.6%
e 34
9.9%
d 34
9.9%
i 34
9.9%
u 34
9.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 345
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
m 68
19.7%
l 47
13.6%
o 47
13.6%
w 47
13.6%
e 34
9.9%
d 34
9.9%
i 34
9.9%
u 34
9.9%

Most occurring scripts

ValueCountFrequency (%)
Latin 345
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
m 68
19.7%
l 47
13.6%
o 47
13.6%
w 47
13.6%
e 34
9.9%
d 34
9.9%
i 34
9.9%
u 34
9.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 345
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
m 68
19.7%
l 47
13.6%
o 47
13.6%
w 47
13.6%
e 34
9.9%
d 34
9.9%
i 34
9.9%
u 34
9.9%

temperature
Categorical

Distinct: 2
Distinct (%): 2.5%
Missing: 1962
Missing (%): 96.0%
Memory size: 81.5 KiB

Value  Count
0.5    66
1.0    15

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 243
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.5
2nd row: 0.5
3rd row: 0.5
4th row: 0.5
5th row: 0.5

Common Values

Value      Count  Frequency (%)
0.5        66     3.2%
1.0        15     0.7%
(Missing)  1962   96.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0.5 66
81.5%
1.0 15
 
18.5%

Most occurring characters

ValueCountFrequency (%)
0 81
33.3%
. 81
33.3%
5 66
27.2%
1 15
 
6.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 162
66.7%
Other Punctuation 81
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 81
50.0%
5 66
40.7%
1 15
 
9.3%
Other Punctuation
ValueCountFrequency (%)
. 81
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 243
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 81
33.3%
. 81
33.3%
5 66
27.2%
1 15
 
6.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 243
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 81
33.3%
. 81
33.3%
5 66
27.2%
1 15
 
6.2%

token_batch_length
Real number (ℝ)

Distinct: 6
Distinct (%): 1.7%
Missing: 1700
Missing (%): 83.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 13886.321
Minimum: 1024
Maximum: 32768
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 16.1 KiB

Quantile statistics

Minimum: 1024
5-th percentile: 1024
Q1: 3584
Median: 8192
Q3: 32768
95-th percentile: 32768
Maximum: 32768
Range: 31744
Interquartile range (IQR): 29184

Descriptive statistics

Standard deviation: 11943.674
Coefficient of variation (CV): 0.86010355
Kurtosis: -1.1042253
Mean: 13886.321
Median Absolute Deviation (MAD): 4608
Skewness: 0.79092605
Sum: 4763008
Variance: 1.4265134 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Value      Count  Frequency (%)
32768      91     4.5%
7200       76     3.7%
3584       76     3.7%
8192       47     2.3%
16384      34     1.7%
1024       19     0.9%
(Missing)  1700   83.2%

Value  Count  Frequency (%)
1024   19     0.9%
3584   76     3.7%
7200   76     3.7%
8192   47     2.3%
16384  34     1.7%
32768  91     4.5%

Value  Count  Frequency (%)
32768  91     4.5%
16384  34     1.7%
8192   47     2.3%
7200   76     3.7%
3584   76     3.7%
1024   19     0.9%
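
Because token_batch_length takes only six distinct values, its Sum, Mean, and Median can be checked directly against the frequency table. The following is a minimal sketch in plain Python; the counts are copied from the table above and the variable names are illustrative.

```python
# Value counts copied from the token_batch_length frequency table.
value_counts = {32768: 91, 7200: 76, 3584: 76, 8192: 47, 16384: 34, 1024: 19}

n = sum(value_counts.values())                        # non-missing rows
total = sum(v * c for v, c in value_counts.items())   # Sum
mean = total / n                                      # Mean

# Median: walk the values in ascending order until the middle row is reached.
middle = (n + 1) // 2  # n is odd here, so there is a single middle row
seen = 0
median = None
for value in sorted(value_counts):
    seen += value_counts[value]
    if seen >= middle:
        median = value
        break

print(n, total, round(mean, 3), median)
```

The non-missing count also matches the overview: 2043 observations minus 1700 missing rows leaves 343.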

penalty_alpha
Categorical

CONSTANT, MISSING

Distinct: 1
Distinct (%): 1.2%
Missing: 1962
Missing (%): 96.0%
Memory size: 81.5 KiB

Value  Count
0.6    81

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 243
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.6
2nd row: 0.6
3rd row: 0.6
4th row: 0.6
5th row: 0.6

Common Values

Value      Count  Frequency (%)
0.6        81     4.0%
(Missing)  1962   96.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0.6 81
100.0%

Most occurring characters

ValueCountFrequency (%)
0 81
33.3%
. 81
33.3%
6 81
33.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 162
66.7%
Other Punctuation 81
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 81
50.0%
6 81
50.0%
Other Punctuation
ValueCountFrequency (%)
. 81
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 243
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 81
33.3%
. 81
33.3%
6 81
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 243
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 81
33.3%
. 81
33.3%
6 81
33.3%

top_k
Categorical

Distinct: 2
Distinct (%): 2.5%
Missing: 1962
Missing (%): 96.0%
Memory size: 81.5 KiB

Value  Count
4.0    70
8.0    11

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 243
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 4.0
2nd row: 4.0
3rd row: 4.0
4th row: 4.0
5th row: 4.0

Common Values

Value      Count  Frequency (%)
4.0        70     3.4%
8.0        11     0.5%
(Missing)  1962   96.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
4.0 70
86.4%
8.0 11
 
13.6%

Most occurring characters

ValueCountFrequency (%)
. 81
33.3%
0 81
33.3%
4 70
28.8%
8 11
 
4.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 162
66.7%
Other Punctuation 81
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 81
50.0%
4 70
43.2%
8 11
 
6.8%
Other Punctuation
ValueCountFrequency (%)
. 81
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 243
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 81
33.3%
0 81
33.3%
4 70
28.8%
8 11
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 243
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 81
33.3%
0 81
33.3%
4 70
28.8%
8 11
 
4.5%

batch_stride
Categorical

Distinct: 2
Distinct (%): 0.8%
Missing: 1791
Missing (%): 87.7%
Memory size: 84.9 KiB

Value  Count
0.0    152
24.0   100

Length

Max length: 4
Median length: 3
Mean length: 3.3968254
Min length: 3

Characters and Unicode

Total characters: 856
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 24.0
2nd row: 24.0
3rd row: 24.0
4th row: 24.0
5th row: 24.0

Common Values

Value      Count  Frequency (%)
0.0        152    7.4%
24.0       100    4.9%
(Missing)  1791   87.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0.0 152
60.3%
24.0 100
39.7%

Most occurring characters

ValueCountFrequency (%)
0 404
47.2%
. 252
29.4%
2 100
 
11.7%
4 100
 
11.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 604
70.6%
Other Punctuation 252
29.4%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 404
66.9%
2 100
 
16.6%
4 100
 
16.6%
Other Punctuation
ValueCountFrequency (%)
. 252
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 856
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 404
47.2%
. 252
29.4%
2 100
 
11.7%
4 100
 
11.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 856
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 404
47.2%
. 252
29.4%
2 100
 
11.7%
4 100
 
11.7%

max_len_ratio
Categorical

Distinct: 3
Distinct (%): 3.0%
Missing: 1943
Missing (%): 95.1%
Memory size: 81.9 KiB

Value  Count
5.0    51
4.25   30
4.0    19

Length

Max length: 4
Median length: 3
Mean length: 3.3
Min length: 3

Characters and Unicode

Total characters: 330
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 5.0
2nd row: 5.0
3rd row: 5.0
4th row: 5.0
5th row: 5.0

Common Values

Value      Count  Frequency (%)
5.0        51     2.5%
4.25       30     1.5%
4.0        19     0.9%
(Missing)  1943   95.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
5.0 51
51.0%
4.25 30
30.0%
4.0 19
 
19.0%

Most occurring characters

ValueCountFrequency (%)
. 100
30.3%
5 81
24.5%
0 70
21.2%
4 49
14.8%
2 30
 
9.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 230
69.7%
Other Punctuation 100
30.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
5 81
35.2%
0 70
30.4%
4 49
21.3%
2 30
 
13.0%
Other Punctuation
ValueCountFrequency (%)
. 100
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 330
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 100
30.3%
5 81
24.5%
0 70
21.2%
4 49
14.8%
2 30
 
9.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 330
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 100
30.3%
5 81
24.5%
0 70
21.2%
4 49
14.8%
2 30
 
9.1%
Distinct: 6
Distinct (%): 6.0%
Missing: 1943
Missing (%): 95.1%
Memory size: 70.1 KiB

Value  Count
gauntlet-csearch-tglobal-XL-public  29
gaunlet-flan-t5-large-xsum-r1  19
gauntlet-csearch-16384-topk4-longt5-base  17
gauntlet-csearch-8192-topk4-longt5-base  17
gauntlet-csearch-16384-len-penalty-tglobal-XL-public  17

Length

Max length: 52
Median length: 39
Mean length: 38.04
Min length: 29

Characters and Unicode

Total characters: 3804
Distinct characters: 31
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 1.0%

Sample

1st row: gauntlet-csearch-16384-topk4-longt5-base
2nd row: gauntlet-csearch-16384-topk4-longt5-base
3rd row: gauntlet-csearch-16384-topk4-longt5-base
4th row: gauntlet-csearch-16384-topk4-longt5-base
5th row: gauntlet-csearch-16384-topk4-longt5-base

Common Values

Value  Count  Frequency (%)
gauntlet-csearch-tglobal-XL-public  29  1.4%
gaunlet-flan-t5-large-xsum-r1  19  0.9%
gauntlet-csearch-16384-topk4-longt5-base  17  0.8%
gauntlet-csearch-8192-topk4-longt5-base  17  0.8%
gauntlet-csearch-16384-len-penalty-tglobal-XL-public  17  0.8%
gauntlet-csearch-16384-tglobal-XL-public  1  < 0.1%
(Missing)  1943  95.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
gauntlet-csearch-tglobal-xl-public 29
29.0%
gaunlet-flan-t5-large-xsum-r1 19
19.0%
gauntlet-csearch-16384-topk4-longt5-base 17
17.0%
gauntlet-csearch-8192-topk4-longt5-base 17
17.0%
gauntlet-csearch-16384-len-penalty-tglobal-xl-public 17
17.0%
gauntlet-csearch-16384-tglobal-xl-public 1
 
1.0%

Most occurring characters

ValueCountFrequency (%)
- 505
13.3%
l 347
 
9.1%
t 332
 
8.7%
a 317
 
8.3%
e 268
 
7.0%
c 209
 
5.5%
g 200
 
5.3%
n 187
 
4.9%
u 166
 
4.4%
s 134
 
3.5%
Other values (21) 1139
29.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2856
75.1%
Dash Punctuation 505
 
13.3%
Decimal Number 349
 
9.2%
Uppercase Letter 94
 
2.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l 347
12.1%
t 332
11.6%
a 317
11.1%
e 268
9.4%
c 209
 
7.3%
g 200
 
7.0%
n 187
 
6.5%
u 166
 
5.8%
s 134
 
4.7%
b 128
 
4.5%
Other values (10) 568
19.9%
Decimal Number
ValueCountFrequency (%)
1 71
20.3%
4 69
19.8%
5 53
15.2%
8 52
14.9%
6 35
10.0%
3 35
10.0%
9 17
 
4.9%
2 17
 
4.9%
Uppercase Letter
ValueCountFrequency (%)
L 47
50.0%
X 47
50.0%
Dash Punctuation
ValueCountFrequency (%)
- 505
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2950
77.5%
Common 854
 
22.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
l 347
11.8%
t 332
11.3%
a 317
10.7%
e 268
9.1%
c 209
 
7.1%
g 200
 
6.8%
n 187
 
6.3%
u 166
 
5.6%
s 134
 
4.5%
b 128
 
4.3%
Other values (12) 662
22.4%
Common
ValueCountFrequency (%)
- 505
59.1%
1 71
 
8.3%
4 69
 
8.1%
5 53
 
6.2%
8 52
 
6.1%
6 35
 
4.1%
3 35
 
4.1%
9 17
 
2.0%
2 17
 
2.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3804
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 505
13.3%
l 347
 
9.1%
t 332
 
8.7%
a 317
 
8.3%
e 268
 
7.0%
c 209
 
5.5%
g 200
 
5.3%
n 187
 
4.9%
u 166
 
4.4%
s 134
 
3.5%
Other values (21) 1139
29.9%

runtime
Categorical

CONSTANT, MISSING

Distinct: 1
Distinct (%): 1.3%
Missing: 1967
Missing (%): 96.3%
Memory size: 66.2 KiB

Value  Count
45:30  76

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 380
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 45:30
2nd row: 45:30
3rd row: 45:30
4th row: 45:30
5th row: 45:30

Common Values

Value      Count  Frequency (%)
45:30      76     3.7%
(Missing)  1967   96.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
45:30 76
100.0%

Most occurring characters

ValueCountFrequency (%)
4 76
20.0%
5 76
20.0%
: 76
20.0%
3 76
20.0%
0 76
20.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 304
80.0%
Other Punctuation 76
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
4 76
25.0%
5 76
25.0%
3 76
25.0%
0 76
25.0%
Other Punctuation
ValueCountFrequency (%)
: 76
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 380
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
4 76
20.0%
5 76
20.0%
: 76
20.0%
3 76
20.0%
0 76
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 380
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4 76
20.0%
5 76
20.0%
: 76
20.0%
3 76
20.0%
0 76
20.0%
Distinct: 19
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 208.3 KiB

Value  Count
Emie_dissertation_cleansed.txt  113
OCR_PAPER_Hong et al. - 2022 - CogVideo Large-scale Pretraining for Text-to-Video Generation via Transformers-annotated_.txt  111
OCR_ML4HLecture02image_.txt  110
OCR_PAPER_Kandpal, Nieto, Jin - 2022 - Music Enhancement via Image Translation and Vocoding-annotated_.txt  110
script_findingnemo.txt  109
Other values (14)  1490

Length

Max length: 124
Median length: 44
Mean length: 47.31816
Min length: 22

Characters and Unicode

Total characters: 96671
Distinct characters: 57
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ASR-whisper-rpunctuated_Noam Chomsky, Fundam_1669853561_0_part1.txt
2nd row: ASR-whisper-rpunctuated_Noam Chomsky, Fundam_1669853631_0_part2.txt
3rd row: ASRnlp_law_lecture_week_1_v_2_c_transcription_1.txt
4th row: ASRnlp_law_lecture_week_2_v_2_c_transcription_2.txt
5th row: ASRnlp_law_lecture_week_3_part_1_v_2_c_transcription_3.txt

Common Values

Value  Count  Frequency (%)
Emie_dissertation_cleansed.txt  113  5.5%
OCR_PAPER_Hong et al. - 2022 - CogVideo Large-scale Pretraining for Text-to-Video Generation via Transformers-annotated_.txt  111  5.4%
OCR_ML4HLecture02image_.txt  110  5.4%
OCR_PAPER_Kandpal, Nieto, Jin - 2022 - Music Enhancement via Image Translation and Vocoding-annotated_.txt  110  5.4%
script_findingnemo.txt  109  5.3%
OCR_ML4HLecture04RepresentationLearning.pptx_.txt  109  5.3%
OCR_ML4HLecture05-NLP.pptx_.txt  109  5.3%
script_frozendisney.txt  109  5.3%
The Most Dangerous Game--Richard Connell.txt  109  5.3%
ASRnlp_law_lecture_week_3_part_1_v_2_c_transcription_3.txt  108  5.3%
Other values (9)  946  46.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
442
 
7.4%
2022 221
 
3.7%
via 221
 
3.7%
chomsky 212
 
3.6%
asr-whisper-rpunctuated_noam 212
 
3.6%
emie_dissertation_cleansed.txt 113
 
1.9%
generation 111
 
1.9%
ocr_paper_hong 111
 
1.9%
transformers-annotated_.txt 111
 
1.9%
text-to-video 111
 
1.9%
Other values (38) 4098
68.7%

Most occurring characters

ValueCountFrequency (%)
t 9700
 
10.0%
e 6951
 
7.2%
_ 6664
 
6.9%
n 5853
 
6.1%
a 5710
 
5.9%
r 4755
 
4.9%
i 3922
 
4.1%
(space) 3920
 
4.1%
o 3477
 
3.6%
s 3433
 
3.6%
Other values (47) 42286
43.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 65204
67.4%
Uppercase Letter 10327
 
10.7%
Connector Punctuation 6664
 
6.9%
Decimal Number 5585
 
5.8%
Space Separator 3920
 
4.1%
Other Punctuation 2909
 
3.0%
Dash Punctuation 2062
 
2.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 9700
14.9%
e 6951
10.7%
n 5853
 
9.0%
a 5710
 
8.8%
r 4755
 
7.3%
i 3922
 
6.0%
o 3477
 
5.3%
s 3433
 
5.3%
p 3097
 
4.7%
c 2690
 
4.1%
Other values (14) 15616
23.9%
Uppercase Letter
ValueCountFrequency (%)
R 1730
16.8%
C 1086
10.5%
L 985
9.5%
P 872
8.4%
A 858
8.3%
O 654
 
6.3%
E 549
 
5.3%
M 547
 
5.3%
S 532
 
5.2%
T 441
 
4.3%
Other values (9) 2073
20.1%
Decimal Number
ValueCountFrequency (%)
2 1515
27.1%
1 851
15.2%
0 761
13.6%
6 636
11.4%
3 535
 
9.6%
4 437
 
7.8%
5 426
 
7.6%
9 212
 
3.8%
8 212
 
3.8%
Other Punctuation
ValueCountFrequency (%)
. 2477
85.1%
, 432
 
14.9%
Connector Punctuation
ValueCountFrequency (%)
_ 6664
100.0%
Space Separator
ValueCountFrequency (%)
(space) 3920
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2062
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 75531
78.1%
Common 21140
 
21.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 9700
 
12.8%
e 6951
 
9.2%
n 5853
 
7.7%
a 5710
 
7.6%
r 4755
 
6.3%
i 3922
 
5.2%
o 3477
 
4.6%
s 3433
 
4.5%
p 3097
 
4.1%
c 2690
 
3.6%
Other values (33) 25943
34.3%
Common
ValueCountFrequency (%)
_ 6664
31.5%
(space) 3920
18.5%
. 2477
 
11.7%
- 2062
 
9.8%
2 1515
 
7.2%
1 851
 
4.0%
0 761
 
3.6%
6 636
 
3.0%
3 535
 
2.5%
4 437
 
2.1%
Other values (4) 1282
 
6.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 96671
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 9700
 
10.0%
e 6951
 
7.2%
_ 6664
 
6.9%
n 5853
 
6.1%
a 5710
 
5.9%
r 4755
 
4.9%
i 3922
 
4.1%
(space) 3920
 
4.1%
o 3477
 
3.6%
s 3433
 
3.6%
Other values (47) 42286
43.7%

source_doc_id
Categorical

Distinct: 19
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 137.8 KiB
Value | Count
7a72cd85-984 | 113
66f03e4f-bd9 | 111
67f6cc9a-83c | 110
110b05be-f8d | 110
04a90337-527 | 109
Other values (14) | 1490

Length

Max length: 12
Median length: 12
Mean length: 12
Min length: 12

Characters and Unicode

Total characters: 24516
Distinct characters: 17
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: fed834b5-a04
2nd row: aa279e3b-2d1
3rd row: 5e311e20-4bb
4th row: 016e8d29-288
5th row: 07af2cf9-15a

Common Values

Value | Count | Frequency (%)
7a72cd85-984 | 113 | 5.5%
66f03e4f-bd9 | 111 | 5.4%
67f6cc9a-83c | 110 | 5.4%
110b05be-f8d | 110 | 5.4%
04a90337-527 | 109 | 5.3%
65105d7b-502 | 109 | 5.3%
adc6e224-1ea | 109 | 5.3%
0abeb1f8-b6c | 109 | 5.3%
af2b1960-5ca | 109 | 5.3%
07af2cf9-15a | 108 | 5.3%
Other values (9) | 946 | 46.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
7a72cd85-984 113
 
5.5%
66f03e4f-bd9 111
 
5.4%
67f6cc9a-83c 110
 
5.4%
110b05be-f8d 110
 
5.4%
04a90337-527 109
 
5.3%
65105d7b-502 109
 
5.3%
adc6e224-1ea 109
 
5.3%
0abeb1f8-b6c 109
 
5.3%
af2b1960-5ca 109
 
5.3%
3210a55b-6fd 108
 
5.3%
Other values (9) 946
46.3%

Most occurring characters

ValueCountFrequency (%)
- 2043
 
8.3%
a 1928
 
7.9%
e 1913
 
7.8%
d 1700
 
6.9%
2 1615
 
6.6%
0 1518
 
6.2%
b 1515
 
6.2%
1 1403
 
5.7%
6 1400
 
5.7%
5 1304
 
5.3%
Other values (7) 8177
33.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 13036
53.2%
Lowercase Letter 9437
38.5%
Dash Punctuation 2043
 
8.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 1615
12.4%
0 1518
11.6%
1 1403
10.8%
6 1400
10.7%
5 1304
10.0%
8 1278
9.8%
4 1278
9.8%
9 1181
9.1%
3 1076
8.3%
7 983
7.5%
Lowercase Letter
ValueCountFrequency (%)
a 1928
20.4%
e 1913
20.3%
d 1700
18.0%
b 1515
16.1%
f 1299
13.8%
c 1082
11.5%
Dash Punctuation
ValueCountFrequency (%)
- 2043
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 15079
61.5%
Latin 9437
38.5%

Most frequent character per script

Common
ValueCountFrequency (%)
- 2043
13.5%
2 1615
10.7%
0 1518
10.1%
1 1403
9.3%
6 1400
9.3%
5 1304
8.6%
8 1278
8.5%
4 1278
8.5%
9 1181
7.8%
3 1076
7.1%
Latin
ValueCountFrequency (%)
a 1928
20.4%
e 1913
20.3%
d 1700
18.0%
b 1515
16.1%
f 1299
13.8%
c 1082
11.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 24516
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 2043
 
8.3%
a 1928
 
7.9%
e 1913
 
7.8%
d 1700
 
6.9%
2 1615
 
6.6%
0 1518
 
6.2%
b 1515
 
6.2%
1 1403
 
5.7%
6 1400
 
5.7%
5 1304
 
5.3%
Other values (7) 8177
33.4%

source_doc_domain
Categorical

Distinct: 9
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 131.2 KiB
Value | Count
Script | 428
OCR | 328
OCR_academic_paper | 326
ASR | 320
ASR_cleaned | 212
Other values (4) | 429

Length

Max length: 18
Median length: 14
Mean length: 8.6975037
Min length: 3

Characters and Unicode

Total characters: 17769
Distinct characters: 21
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ASR_cleaned
2nd row: ASR_cleaned
3rd row: ASR
4th row: ASR
5th row: ASR

Common Values

Value | Count | Frequency (%)
Script | 428 | 20.9%
OCR | 328 | 16.1%
OCR_academic_paper | 326 | 16.0%
ASR | 320 | 15.7%
ASR_cleaned | 212 | 10.4%
academic_paper | 113 | 5.5%
literature | 109 | 5.3%
conversation | 108 | 5.3%
adversarial | 99 | 4.8%
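
The length figures reported for this variable can be cross-checked from the value counts alone. The sketch below is only a sanity check under that assumption, not the profiler's own code; the variable names are chosen for illustration. It recomputes the number of rows, the total character count, and the mean, minimum, and maximum value length.

```python
# Value counts copied from the Common Values table for this variable.
value_counts = {
    "Script": 428,
    "OCR": 328,
    "OCR_academic_paper": 326,
    "ASR": 320,
    "ASR_cleaned": 212,
    "academic_paper": 113,
    "literature": 109,
    "conversation": 108,
    "adversarial": 99,
}

# Each distinct value contributes len(value) characters once per occurrence.
n_rows = sum(value_counts.values())
total_chars = sum(len(value) * count for value, count in value_counts.items())
mean_length = total_chars / n_rows
shortest = min(len(value) for value in value_counts)
longest = max(len(value) for value in value_counts)

print(n_rows, total_chars, round(mean_length, 7), shortest, longest)
```

The results agree with the report: 2043 rows, 17769 total characters, a mean length of about 8.6975037, and minimum and maximum lengths of 3 and 18.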

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of common values; image not preserved]
ValueCountFrequency (%)
script 428
20.9%
ocr 328
16.1%
ocr_academic_paper 326
16.0%
asr 320
15.7%
asr_cleaned 212
10.4%
academic_paper 113
 
5.5%
literature 109
 
5.3%
conversation 108
 
5.3%
adversarial 99
 
4.8%

Most occurring characters

ValueCountFrequency (%)
a 2043
11.5%
e 1727
 
9.7%
c 1626
 
9.2%
r 1391
 
7.8%
p 1306
 
7.3%
R 1186
 
6.7%
i 1183
 
6.7%
_ 977
 
5.5%
S 960
 
5.4%
t 754
 
4.2%
Other values (11) 4616
26.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 12806
72.1%
Uppercase Letter 3986
 
22.4%
Connector Punctuation 977
 
5.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 2043
16.0%
e 1727
13.5%
c 1626
12.7%
r 1391
10.9%
p 1306
10.2%
i 1183
9.2%
t 754
 
5.9%
d 750
 
5.9%
m 439
 
3.4%
n 428
 
3.3%
Other values (5) 1159
9.1%
Uppercase Letter
ValueCountFrequency (%)
R 1186
29.8%
S 960
24.1%
C 654
16.4%
O 654
16.4%
A 532
13.3%
Connector Punctuation
ValueCountFrequency (%)
_ 977
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 16792
94.5%
Common 977
 
5.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 2043
12.2%
e 1727
10.3%
c 1626
9.7%
r 1391
 
8.3%
p 1306
 
7.8%
R 1186
 
7.1%
i 1183
 
7.0%
S 960
 
5.7%
t 754
 
4.5%
d 750
 
4.5%
Other values (10) 3866
23.0%
Common
ValueCountFrequency (%)
_ 977
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 17769
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 2043
11.5%
e 1727
 
9.7%
c 1626
 
9.2%
r 1391
 
7.8%
p 1306
 
7.3%
R 1186
 
6.7%
i 1183
 
6.7%
_ 977
 
5.5%
S 960
 
5.4%
t 754
 
4.2%
Other values (11) 4616
26.0%

Interactions

[Figures: nine interaction plots; images not preserved]

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
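
To make the idea of nullity correlation concrete, the sketch below computes the Pearson correlation between the presence indicators (1 if a value is present, 0 if missing) of two columns. It is a minimal illustration under stated assumptions, not the code behind the heatmap: the function name `nullity_correlation` is hypothetical, missing values are represented as `None`, and the toy rows use column names from this dataset with made-up values.

```python
import math


def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation between the presence indicators of two columns.

    Returns None when the correlation is undefined, i.e. when there are no
    rows or one of the columns is always present or always missing.
    """
    n = len(rows)
    if n == 0:
        return None
    a = [0.0 if row.get(col_a) is None else 1.0 for row in rows]
    b = [0.0 if row.get(col_b) is None else 1.0 for row in rows]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)


# Hypothetical toy rows: temperature and top_k are missing together,
# while num_beams is present exactly when they are missing.
rows = [
    {"temperature": None, "top_k": None, "num_beams": 4.0},
    {"temperature": None, "top_k": None, "num_beams": 4.0},
    {"temperature": 0.7, "top_k": 4, "num_beams": None},
    {"temperature": None, "top_k": None, "num_beams": 8.0},
]

together = nullity_correlation(rows, "temperature", "top_k")
opposite = nullity_correlation(rows, "temperature", "num_beams")
print(together, opposite)
```

A value near 1 means the two columns tend to be present or missing in the same rows; a value near -1 means one tends to be present when the other is missing.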

Sample

index | GAUNTLET_PATH | file_name | summary | min_length | max_length | no_repeat_ngram_size | encoder_no_repeat_ngram_size | repetition_penalty | num_beams | num_beam_groups | length_penalty | early_stopping | do_sample | model_name | date | length | format | extractiveness | temperature | token_batch_length | penalty_alpha | top_k | batch_stride | max_len_ratio | directory-topic-tag | runtime | source_doc_filename | source_doc_id | source_doc_domain
0 | SHORT-CONTEXT-MODELS/flan-t5-3b-summarizer/beam-search-8192-nb4/ASR-whisper-rpunctuated_Noam Chomsky, Fundam_1669853561_0_part1_summary.txt | ASR-whisper-rpunctuated_Noam Chomsky, Fundam_1669853561_0_part1_summary.txt | There's lots of interesting things to say about language, but I don't think it's as simple as you think.\nThere's lots of interesting things about language, but if you really want to understand it, you need to look at the whole picture.\n\n--- | 8.0 | 2048.0 | 3.0 | 4.0 | 2.5 | 4.0 | 1.0 | 0.8 | True | False | jordiclive/flan-t5-3b-summarizer | 20230524_080731 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ASR-whisper-rpunctuated_Noam Chomsky, Fundam_1669853561_0_part1.txt | fed834b5-a04 | ASR_cleaned
1 | SHORT-CONTEXT-MODELS/flan-t5-3b-summarizer/beam-search-8192-nb4/ASR-whisper-rpunctuated_Noam Chomsky, Fundam_1669853631_0_part2_summary.txt | ASR-whisper-rpunctuated_Noam Chomsky, Fundam_1669853631_0_part2_summary.txt | There's no such thing as a simple language.\nThere's more than one way to solve ATB.\nI think you're asking the wrong question.\n\n--- | 8.0 | 2048.0 | 3.0 | 4.0 | 2.5 | 4.0 | 1.0 | 0.8 | True | False | jordiclive/flan-t5-3b-summarizer | 20230524_080731 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ASR-whisper-rpunctuated_Noam Chomsky, Fundam_1669853631_0_part2.txt | aa279e3b-2d1 | ASR_cleaned
2 | SHORT-CONTEXT-MODELS/flan-t5-3b-summarizer/beam-search-8192-nb4/ASRnlp_law_lecture_week_1_v_2_c_transcription_1_summary.txt | ASRnlp_law_lecture_week_1_v_2_c_transcription_1_summary.txt | if you don't want to read the whole thing, just skip it.\nI'm sorry for the wall of text.\n\n--- | 8.0 | 2048.0 | 3.0 | 4.0 | 2.5 | 4.0 | 1.0 | 0.8 | True | False | jordiclive/flan-t5-3b-summarizer | 20230524_080731 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ASRnlp_law_lecture_week_1_v_2_c_transcription_1.txt | 5e311e20-4bb | ASR
3 | SHORT-CONTEXT-MODELS/flan-t5-3b-summarizer/beam-search-8192-nb4/ASRnlp_law_lecture_week_2_v_2_c_transcription_2_summary.txt | ASRnlp_law_lecture_week_2_v_2_c_transcription_2_summary.txt | I think it's okay to ask questions about what you want to do in the course.\nI'm a bit of a nerd.\n\n--- | 8.0 | 2048.0 | 3.0 | 4.0 | 2.5 | 4.0 | 1.0 | 0.8 | True | False | jordiclive/flan-t5-3b-summarizer | 20230524_080731 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ASRnlp_law_lecture_week_2_v_2_c_transcription_2.txt | 016e8d29-288 | ASR
4 | SHORT-CONTEXT-MODELS/flan-t5-3b-summarizer/beam-search-8192-nb4/ASRnlp_law_lecture_week_3_part_1_v_2_c_transcription_3_summary.txt | ASRnlp_law_lecture_week_3_part_1_v_2_c_transcription_3_summary.txt | I'm not sure if this is the right subreddit to post this.\nI'm going to finish it in person.\n\n--- | 8.0 | 2048.0 | 3.0 | 4.0 | 2.5 | 4.0 | 1.0 | 0.8 | True | False | jordiclive/flan-t5-3b-summarizer | 20230524_080731 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ASRnlp_law_lecture_week_3_part_1_v_2_c_transcription_3.txt | 07af2cf9-15a | ASR
5 | SHORT-CONTEXT-MODELS/flan-t5-3b-summarizer/beam-search-8192-nb4/Emie_dissertation_cleansed_summary.txt | Emie_dissertation_cleansed_summary.txt | I'm writing a dissertation on Act of Violence (Fred Zinnemann), The Man Between (Claudie Reed), and the Theory of Film (Walter Kracauer).\nFilm noir is a genre of film that seeks to capture the material world as it emerges from its historical and cultural contexts.\nThe Man Between and Act of Violence are both films by the German-born, British-born filmmaker.\n\n--- | 8.0 | 2048.0 | 3.0 | 4.0 | 2.5 | 4.0 | 1.0 | 0.8 | True | False | jordiclive/flan-t5-3b-summarizer | 20230524_080731 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Emie_dissertation_cleansed.txt | 7a72cd85-984 | academic_paper
6 | SHORT-CONTEXT-MODELS/flan-t5-3b-summarizer/beam-search-8192-nb4/OCR_ML4HLecture02image__summary.txt | OCR_ML4HLecture02image__summary.txt | Ezurich's work on medical image analysis.\n\n--- | 8.0 | 2048.0 | 3.0 | 4.0 | 2.5 | 4.0 | 1.0 | 0.8 | True | False | jordiclive/flan-t5-3b-summarizer | 20230524_080731 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | OCR_ML4HLecture02image_.txt | 67f6cc9a-83c | OCR
7 | SHORT-CONTEXT-MODELS/flan-t5-3b-summarizer/beam-search-8192-nb4/OCR_ML4HLecture04RepresentationLearning.pptx__summary.txt | OCR_ML4HLecture04RepresentationLearning.pptx__summary.txt | Unsupervised representation learning on medical time series\nWe propose a novel framework for learning representations from time series and apply it to health state data.\n\n--- | 8.0 | 2048.0 | 3.0 | 4.0 | 2.5 | 4.0 | 1.0 | 0.8 | True | False | jordiclive/flan-t5-3b-summarizer | 20230524_080731 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | OCR_ML4HLecture04RepresentationLearning.pptx_.txt | 65105d7b-502 | OCR
8 | SHORT-CONTEXT-MODELS/flan-t5-3b-summarizer/beam-search-8192-nb4/OCR_ML4HLecture05-NLP.pptx__summary.txt | OCR_ML4HLecture05-NLP.pptx__summary.txt | We use a combination of HMMs, neural nets, and other methods to find the most probable sequence of words in a text.\nEzurich is a language model that computes the probabilistic representation of w_1, W_n for any word, _W_n=V (vocalbulary) for any sentence.\n\n--- | 8.0 | 2048.0 | 3.0 | 4.0 | 2.5 | 4.0 | 1.0 | 0.8 | True | False | jordiclive/flan-t5-3b-summarizer | 20230524_080731 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | OCR_ML4HLecture05-NLP.pptx_.txt | adc6e224-1ea | OCR
9 | SHORT-CONTEXT-MODELS/flan-t5-3b-summarizer/beam-search-8192-nb4/OCR_PAPER_Hong et al. - 2022 - CogVideo Large-scale Pretraining for Text-to-Video Generation via Transformers-annotated__summary.txt | OCR_PAPER_Hong et al. - 2022 - CogVideo Large-scale Pretraining for Text-to-Video Generation via Transformers-annotated__summary.txt | We propose CogVideo to be the largest and first open source pretrained Transformer for Text-To-Video generation in general.\nWe introduce a human evaluation for CogVideo and show the results.\n\n--- | 8.0 | 2048.0 | 3.0 | 4.0 | 2.5 | 4.0 | 1.0 | 0.8 | True | False | jordiclive/flan-t5-3b-summarizer | 20230524_080731 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | OCR_PAPER_Hong et al. - 2022 - CogVideo Large-scale Pretraining for Text-to-Video Generation via Transformers-annotated_.txt | 66f03e4f-bd9 | OCR_academic_paper
index | GAUNTLET_PATH | file_name | summary | min_length | max_length | no_repeat_ngram_size | encoder_no_repeat_ngram_size | repetition_penalty | num_beams | num_beam_groups | length_penalty | early_stopping | do_sample | model_name | date | length | format | extractiveness | temperature | token_batch_length | penalty_alpha | top_k | batch_stride | max_len_ratio | directory-topic-tag | runtime | source_doc_filename | source_doc_id | source_doc_domain
2033OpenAI/gpt-3.5-turbo/map_reduce_output/batched-summaries/OCR_PAPER_Hong et al. - 2022 - CogVideo Large-scale Pretraining for Text-to-Video Generation via Transformers-annotated__summary.txtOCR_PAPER_Hong et al. - 2022 - CogVideo Large-scale Pretraining for Text-to-Video Generation via Transformers-annotated__summary.txtResearchers from Tsinghua University have developed Cog Video, a large-scale pre-trained transformer for text-to-video generation. The model, which has 9.4 billion parameters and was trained on 5.4 million text-video pairs, outperforms all publicly available models in both machine and human evaluations. The team used a multi-frame-rate hierarchical training strategy to better align text and video clips, and inherited knowledge from a pre-trained text-to-image model. Cog Video is the largest and first open-source pre-trained transformer for text-to-video generation in the general domain.\n\tThe paper proposes a dual-channel attention technique for pretraining a text-to-video generation model using pretrained image generation models instead of image data. The proposed technique leverages the pretrained models' knowledge of text-image relations and larger dataset coverage. The paper also introduces a shifted window attention for auto-regressive generation to alleviate time and memory overhead. The model is evaluated on UCF-101 and Kinetics-600 datasets using Frechet Video Distance and Inception Score metrics and achieves better results than other baselines. Human evaluation also shows that the proposed model outperforms other baselines on multiple aspects.\n\tThe paper presents Cog' Video, a pretrained transformer for text-to-video generation in the general domain. The proposed multi-frame-rate hierarchical training framework improves the understanding of text-video relations and the ability to control the intensity of changes during generation. 
The paper also conducts ablation studies on Kinetics-600 and UCF-101 datasets to verify the effectiveness of hierarchical multi-frame-rate generation and incorporating Cog View2. The results show that the hierarchical method outperforms the 1-stage model on semantic relevance, motion realism, and texture quality. The paper aims to advance open-domain text-to-video generation, which will ease the effort of short video and digital art creation.\n\tThe article discusses the attention mechanism of dual-channel attention in the Cog Video model, which consists of two stages for sequential generation and recursive interpolation. The model is trained on a dataset of 5.4 million captioned videos and has 9.4 billion parameters. The article also provides details about the human evaluation process used to measure generation quality, which includes asking evaluators to give scores for frame texture, motion realism, and semantic relevance. The results of the evaluation show that Cog Video outperforms other models in terms of quality.NaN512.0NaNNaNNaNNaNNaNNaNNaNNaNgpt-3.5-turboNaNNaNNaNNaNNaN3584.0NaNNaN0.0NaNNaNNaNOCR_PAPER_Hong et al. - 2022 - CogVideo Large-scale Pretraining for Text-to-Video Generation via Transformers-annotated_.txt66f03e4f-bd9OCR_academic_paper
2034OpenAI/gpt-3.5-turbo/map_reduce_output/batched-summaries/OCR_PAPER_Kandpal, Nieto, Jin - 2022 - Music Enhancement via Image Translation and Vocoding-annotated__summary.txtOCR_PAPER_Kandpal, Nieto, Jin - 2022 - Music Enhancement via Image Translation and Vocoding-annotated__summary.txtThe paper presents a deep learning approach to enhance low-quality music recordings by combining an image-to-image translation model for manipulating audio in its mel-spectrogram representation and a music vocoding model for mapping synthetically generated mel-spectrograms to perceptually realistic waveforms. The approach outperforms baselines which use classical methods for mel-spectrogram inversion and an end-to-end approach directly mapping noisy waveforms to clean waveforms. The paper also analyzes the reliability of common audio enhancement evaluation metrics when used in the music domain. The authors hope to motivate future research in music enhancement and music quality perceptual metrics akin to those in the speech literature.\n\tThe paper proposes a music enhancement model that decomposes the task into mel-spectrogram enhancement and waveform synthesis from mel-spectrograms. The model was trained using high-quality samples from a public dataset paired with low-quality samples generated by simulating artifacts that typically appear in amateur recordings. A human MOS test shows that this model outperforms state-of-the-art baselines. Additionally, the paper finds that current objective metrics for audio enhancement do not accurately reflect human perception of music.\n\tThe references cited in this document cover various topics related to audio processing, including waveform synthesis, music source separation, speech enhancement, and noise suppression. The references include studies on generative adversarial networks, deep learning models, and objective quality measures for evaluating audio processing algorithms. 
Techniques such as instance normalization, parallel wavegan, and conditional generative adversarial networks are also discussed.NaN512.0NaNNaNNaNNaNNaNNaNNaNNaNgpt-3.5-turboNaNNaNNaNNaNNaN3584.0NaNNaN0.0NaNNaNNaNOCR_PAPER_Kandpal, Nieto, Jin - 2022 - Music Enhancement via Image Translation and Vocoding-annotated_.txt110b05be-f8dOCR_academic_paper
2035OpenAI/gpt-3.5-turbo/map_reduce_output/batched-summaries/OCR_PAPER_dall-e-2-annotated__summary.txtOCR_PAPER_dall-e-2-annotated__summary.txtThe paper proposes a two-stage model for text-conditional image generation using CLIP embeddings. The first stage generates a CLIP image embedding given a text caption, and the second stage generates an image conditioned on the image embedding. The model can produce variations of an image that preserve both its semantics and style, while varying non-essential details. The joint embedding space of CLIP enables language-guided image manipulations in a zero-shot fashion. The model uses diffusion models for the decoder and experiments with both autoregressive and diffusion models for the prior, finding that the latter are computationally more efficient and produce higher-quality samples. The paper also describes three different kinds of manipulations enabled by the bipartite representation of images.\n\tThe article discusses a new text-to-image generation model called unCLIP, which uses a combination of a diffusion prior and a decoder to generate realistic images from text prompts. The model is evaluated on various benchmarks, including MS-COCO, and is found to outperform other state-of-the-art models in terms of diversity and photorealism. The article also explores the CLIP latent space and the importance of the prior in generating high-quality images. Finally, the article presents automated aesthetic quality evaluations comparing unCLIP to other models.\n\tThe paper discusses the use of CLIP-guided diffusion models for text-conditional image generation. The authors compare their model, unCLIP, to the previously proposed GLIDE model and find that both benefit from guidance, but unCLIP does not sacrifice recall for aesthetic quality. The paper also discusses previous works in synthetic image generation and the limitations and risks associated with these models. 
The authors acknowledge the need for further research on the risks and biases associated with these models.\n\tThe article provides a list of research papers related to text-to-image generation using deep learning techniques. The papers cover various approaches such as generative adversarial networks, diffusion models, and CLIP-guided models. The papers also explore different aspects of the problem, including domain adaptation, multimodal learning, and transfer learning. The article highlights the importance of contrastive learning and attention mechanisms in improving the quality of generated images.\n\tThe article describes the use of linear probes and logistic regression models to automate aesthetic quality evaluations of images. The models were trained on the AVA dataset and pairwise image comparisons gathered from previous human evaluations. The article also provides details on the hyperparameters used to train the models, including the use of CLIP and DALL-E datasets, and the GLIDE model for the decoder architecture. Random samples from the production model for various prompts are also shown.NaN512.0NaNNaNNaNNaNNaNNaNNaNNaNgpt-3.5-turboNaNNaNNaNNaNNaN3584.0NaNNaN0.0NaNNaNNaNOCR_PAPER_dall-e-2-annotated_.txt3f42d484-d96OCR_academic_paper
2036OpenAI/gpt-3.5-turbo/map_reduce_output/batched-summaries/The Most Dangerous Game--Richard Connell_summary.txtThe Most Dangerous Game--Richard Connell_summary.txt"The Most Dangerous Game" by Richard Connell is a story about a big-game hunter named Sanger Rainsford who falls off a yacht and ends up on a mysterious island called Ship-Trap Island. The island has a reputation for being dangerous, and Rainsford soon discovers why when he meets General Zaroff, a fellow hunter who has a twisted idea of what constitutes a good hunt. Zaroff hunts humans for sport, and he has set his sights on Rainsford as his next prey. Rainsford must use all his skills as a hunter to survive the deadly game that Zaroff has set up for him.\n\tGeneral Zaroff, a passionate hunter, invites Rainsford to his island where he reveals that he has invented a new sensation in hunting. He hunts humans, whom he considers the most dangerous game, and has a training school in his cellar for his prey. Rainsford is horrified and refuses to participate, but the general insists that life is for the strong and that weak men are put on earth to give the strong pleasure. Rainsford hears a gunshot in the jungle and realizes that he may be the next prey.\n\tRainsford, a big-game hunter, finds himself stranded on an island where he is hunted by General Zaroff, a fellow hunter who has grown bored with hunting animals and now hunts humans. Rainsford manages to evade the general for a while, but eventually, he is forced to face him in a deadly game of cat and mouse. In the end, Rainsford manages to outsmart the general and escape the island.\n\tRainsford escapes from General Zaroff's hunting game by jumping into the sea. The general enjoys a good dinner but is annoyed that Rainsford escaped and that he will have to replace his assistant Ivan. Later, Rainsford surprises the general in his bedroom and challenges him to a final hunt. 
The story ends with the implication that Rainsford has won the game.NaN512.0NaNNaNNaNNaNNaNNaNNaNNaNgpt-3.5-turboNaNNaNNaNNaNNaN3584.0NaNNaN0.0NaNNaNNaNThe Most Dangerous Game--Richard Connell.txtaf2b1960-5caliterature
2037OpenAI/gpt-3.5-turbo/map_reduce_output/batched-summaries/gpt_peter_testing_group_exemplars_summary.txtgpt_peter_testing_group_exemplars_summary.txtThe text contains a random assortment of questions, statements, and requests, ranging from discussions about Korea, fears, psychotic episodes, covert operations, and consciousness to more mundane topics like food, music, and hobbies. There are also some nonsensical or humorous comments and requests, such as fitting soybeans in foreskin, creating a cryptocurrency project, and asking about the funniest joke. The text lacks a clear theme or purpose.\n\tThe text is a collection of random and unrelated statements and questions, ranging from philosophical musings to personal anecdotes and recommendations for music and movies. There is no clear theme or narrative.NaN512.0NaNNaNNaNNaNNaNNaNNaNNaNgpt-3.5-turboNaNNaNNaNNaNNaN3584.0NaNNaN0.0NaNNaNNaNgpt_peter_testing_group_exemplars.txt3210a55b-6fdconversation
2038OpenAI/gpt-3.5-turbo/map_reduce_output/batched-summaries/navy seals copy pasta_summary.txtnavy seals copy pasta_summary.txtA person threatens someone who insulted them online, claiming to be a highly trained Navy SEAL with access to a network of spies and the entire arsenal of the US Marine Corps. They vow to kill the person in over 700 ways and make them suffer for their comment.NaN512.0NaNNaNNaNNaNNaNNaNNaNNaNgpt-3.5-turboNaNNaNNaNNaNNaN3584.0NaNNaN0.0NaNNaNNaNnavy seals copy pasta.txt6adec8a8-d94adversarial
2039OpenAI/gpt-3.5-turbo/map_reduce_output/batched-summaries/script_findingnemo_summary.txtscript_findingnemo_summary.txtThis is a work-in-progress transcript of the movie Finding Nemo. It is not 100% accurate and may have missing or incorrect words. The transcript is open for corrections and additions, but cannot be edited and credited by others. The transcript starts with Marlin and Coral admiring their new home and discussing their upcoming parenthood. The story then follows Nemo's first day of school and his adventures with his classmates. The transcript ends with the group encountering a "butt" and making Pearl ink.\n\tMarlin, a clownfish, becomes overprotective of his son Nemo after his wife and other children are killed in a barracuda attack. When Nemo is captured by a diver, Marlin sets out to rescue him, encountering a forgetful fish named Dory and a group of sharks along the way. Meanwhile, Nemo is taken to a dentist's office in Sydney, where he meets other aquarium fish and plans his escape.\n\tNemo, a young clownfish, is taken from the ocean and placed in a fish tank in a dentist's office. He meets a group of fish who plan to escape the tank and return to the ocean. They recruit Nemo to help them by jamming the tank's filter. The plan is successful, and the fish escape into the harbor.\n\tNemo, a young fish, is encouraged by his new friends to escape from a fish tank in a dentist's office and find his way back to the ocean to reunite with his father. Meanwhile, Marlin, Nemo's father, is also on a journey to find his son and meets a forgetful fish named Dory who helps him along the way.\n\tThe fish characters panic as Nemo gets stuck in a filter, but they manage to rescue him. Meanwhile, Marlin and Dory ride the East Australian Current with the help of sea turtles and eventually reach Sydney. 
Crush, a sea turtle, gives them a proper exiting technique before they continue their journey.\n\tMarlin and Dory are searching for Marlin's son, Nemo, and end up inside a whale. They eventually escape and continue their search, while Nemo and his fish tank friends plan their escape from the dentist's office. Meanwhile, a pelican named Nigel and his friends observe the chaos.\n\tMarlin, a clownfish, sets out to find his son Nemo who has been taken by a diver. Along the way, he meets Dory, a forgetful fish, and together they encounter various obstacles and make new friends. Eventually, they find Nemo and bring him back home.\n\tThe transcript contains dialogue from the movie "Finding Nemo" where the characters say goodbye to each other and a scene where the fish in a dentist's office try to escape. The transcript is provided for fans' enjoyment and educational purposes only, and no copyright infringement is intended.NaN512.0NaNNaNNaNNaNNaNNaNNaNNaNgpt-3.5-turboNaNNaNNaNNaNNaN3584.0NaNNaN0.0NaNNaNNaNscript_findingnemo.txt04a90337-527Script
2040OpenAI/gpt-3.5-turbo/map_reduce_output/batched-summaries/script_frozendisney_summary.txtscript_frozendisney_summary.txtThe opening scene of "Frozen" shows ice harvesters singing and cutting ice blocks. The story then follows two sisters, Elsa and Anna, as they grow up in a kingdom where Elsa has magical powers. After an accident involving Anna, their parents decide to keep Elsa's powers hidden and limit her contact with people, including Anna. As they grow up, Anna tries to reconnect with Elsa, but Elsa struggles to control her powers and keep them hidden.\n\tAnna is bored and watches the clock tick by while Elsa panics about her growing powers. The King and Queen leave on a ship and are lost at sea, leaving Anna and Elsa alone. Years later, it's Coronation Day and Anna is excited while Elsa is nervous. Anna meets Prince Hans and they have an awkward encounter. The bells ring for the coronation.\n\tHans and Anna attend Elsa's coronation, but Elsa is nervous and hesitant. During the celebration, Anna and Hans fall in love and decide to get married, but Elsa refuses to give her blessing. In a heated argument, Elsa accidentally reveals her ice powers to everyone and runs away, leaving Anna heartbroken and confused.\n\tElsa accidentally reveals her powers at the ball, causing chaos and prompting her to flee. Anna sets out to find her and apologize, encountering Kristoff and Oaken's Trading Post along the way. Elsa reaches a mountain top and sings "Let It Go" as she creates an ice palace. Anna eventually reaches the trading post and learns that Elsa went to the North Mountain. Kristoff agrees to help her, but they get into a dispute with Oaken over the price of supplies.\n\tKristoff and Anna find shelter in a dilapidated barn and Kristoff sings a song to Sven. Anna asks Kristoff to take her up the North Mountain to find Elsa and stop the winter. They encounter wolves and Kristoff's sled is destroyed, but they manage to escape. 
They continue their journey on foot and meet Olaf, a talking snowman without a nose.\n\tAnna, Kristoff, Sven, and Olaf continue their journey to find Elsa and stop the eternal winter. Olaf gets impaled by an icicle but laughs it off. They reach Elsa's ice palace, and Sven struggles to climb the stairs. Kristoff helps him while Anna and Olaf climb the stairs.\n\tAnna and Kristoff arrive at Elsa's ice palace, where Anna tries to convince Elsa to return to Arendelle and end the eternal winter she has caused. However, Elsa is afraid of hurting anyone else with her powers and creates a giant snowman, Marshmallow, to throw them out. Anna and Kristoff escape, but Marshmallow chases them and they end up hanging off a cliff. Olaf tries to help but is thrown off the cliff by Marshmallow. Eventually, Anna cuts the rope and they fall into a soft snowbank.\n\tAnna and Kristoff, along with Olaf and Sven, arrive at Kristoff's family of trolls. The trolls mistake Anna for Kristoff's fiancée and sing a song about fixing up relationships. Anna and Kristoff start to feel a spark between them, but are interrupted by the trolls trying to marry them off.\n\tAnna collapses and is found to have ice in her heart, which can only be removed by an act of true love. Kristoff and Sven bring her back to the castle, where Hans pretends to be in love with her but reveals his true intentions to kill Elsa and take over the kingdom. Hans charges Elsa with treason and sentences her to death, while Anna's condition worsens.\n\tElsa escapes from her imprisonment and creates a snowstorm that engulfs the kingdom. Anna and Olaf search for her, while Kristoff and Sven try to reach Anna. Hans confronts Elsa, but Anna sacrifices herself to save Elsa and thaws her frozen heart. Elsa realizes that love is the key to controlling her powers and uses it to end the snowstorm. Hans is arrested and taken back to his country. 
The kingdom is restored to its former glory.\n\tThe Duke and his thugs are escorted out of Arendelle by guards, while Anna surprises Kristoff with a new sled and makes him the official Ice Master and Deliverer. Olaf enjoys the summer and Sven helps Elsa create an ice rink for the villagers to skate on. The castle has been repaired with ice and all is well in Arendelle.NaN512.0NaNNaNNaNNaNNaNNaNNaNNaNgpt-3.5-turboNaNNaNNaNNaNNaN3584.0NaNNaN0.0NaNNaNNaNscript_frozendisney.txt0abeb1f8-b6cScript
2041OpenAI/gpt-3.5-turbo/map_reduce_output/batched-summaries/script_strangersonatrain_summary.txtscript_strangersonatrain_summary.txtThe script for "Strangers on a Train" by Raymond Chandler and Czenzi Ormonde begins with two strangers, Guy Haines and Bruno Anthony, meeting on a train. Bruno is fascinated by Guy, a famous tennis player, and strikes up a conversation with him. As they talk, Bruno reveals his dark thoughts about murder and his troubled relationship with his father. Guy becomes increasingly uncomfortable and tries to change the subject, but Bruno persists in his morbid musings.\n\tBruno suggests to Guy a plan to swap murders with a stranger to get rid of their respective unwanted targets. Guy is hesitant and tries to leave, but Bruno insists on discussing the plan further. Meanwhile, Guy's ex-wife Miriam refuses to give him a divorce and threatens to ruin his reputation by having another man's baby. Guy calls his lover Anne to vent his frustration, but his anger escalates as a train passes by, and he yells that he could strangle Miriam. The scene then shifts to Bruno and his mother getting manicures.\n\tMrs. Anthony, a wealthy woman, is concerned about her son Bruno's restlessness and pale appearance. She suggests he take up painting as a soothing pastime. Bruno receives a call from Guy, and his father confronts him about his involvement in hit and run driving. Bruno goes to Metcalf to stalk Miriam, and they end up at an amusement park where he impresses her by ringing the bell on a sledgehammer game. He follows her onto a merry-go-round.\n\tBruno meets Guy at an amusement park and follows him onto a boat ride with his friends. After they exit the ride, Bruno strangles and kills Miriam, a girl he had been stalking. Later, Bruno meets Guy again and gives him Miriam's broken glasses as a "present," revealing that he was the one who killed her. 
Guy is horrified and calls Bruno a maniac.\n\tGuy is confronted by Bruno, who reminds him that they planned a murder together. Guy tries to leave, but Bruno warns him that they would both be arrested if he goes to the police. Meanwhile, Guy's phone rings and the police arrive at his apartment building. Bruno urges Guy to tell the police that he already knows about the murder. Later, Guy receives the news that his wife has been murdered and he becomes a suspect. He tells the Senator that he was on a train at the time of the murder and spoke to a professor named Collins. Anne comforts Guy, and they realize that Miriam's death means they are now free to be together.\n\tGuy Haines is being investigated for the murder of his estranged wife, Miriam. He meets with the police to establish his alibi and is followed by a private detective named Hennessy. Guy's girlfriend, Anne, worries about the investigation and suggests he continue with his plans to play in a tennis tournament. Meanwhile, a man named Bruno, who has a strange fixation on Guy, follows him around and tries to contact him.\n\tGuy receives a note from Bruno asking to meet and make plans, but Guy tears it up and burns it. Later, at a gallery with Anne, Bruno appears and tries to talk to Guy, causing him to become nervous. At a tennis match, Guy sees Bruno watching him. Later, at a party, Barbara introduces Bruno to the group and he becomes fixated on her. Guy receives another note from Bruno, but hides it when Hennessy arrives.\n\tGuy and Hennessy discuss Hammond taking over, while Guy retrieves a note and gun from a dresser drawer. They leave for a party at the Burton house, where Bruno unexpectedly shows up. Bruno engages in a conversation about murder with some guests, including Mrs. Cunningham, and demonstrates how to strangle someone. Barbara watches in horror as Bruno becomes transfixed and eventually faints. 
Bruno is taken to a study, and the Senator asks Guy to get him out of there.\n\tGuy goes to Bruno's house to carry out their plan to exchange murders, but is interrupted by the arrival of the police. He manages to escape and goes to Mr. Antony's house to warn him about Bruno's intentions. Guy enters Mr. Antony's bedroom and wakes him up to talk about Bruno, but the scene ends before any further action is taken.\n\tBruno confronts Guy in his bedroom, revealing that he knows about the murder and threatening to frame Guy by planting evidence. Anne tries to convince Bruno's mother to help clear Guy's name, but she dismisses the idea. Guy and Anne discuss their plan to retrieve Guy's lighter from the murder scene before Bruno can plant it there. Meanwhile, Guy plays a tennis match while being watched by detectives Hennessy and Hammond. Bruno leaves his home in a taxi, presumably to carry out his plan.\n\tGuy Haines is playing a tennis match while his friend Bruno is on a train to Metcalf. Anne, Guy's lover, tells him that Bruno may implicate him in the murder of his wife. Guy is worried about his cigarette lighter being found at the scene of the crime. Meanwhile, Bruno reads about Guy's arrest in the newspaper and plays with the lighter. Guy wins the tennis match and Anne tells Barbara to have a car ready. Bruno arrives in Metcalf but drops the lighter when bumped by a passerby.\n\tBruno drops his cigarette case down a drain and enlists the help of a porter and passersby to retrieve it. Meanwhile, Guy is playing a tennis match and wins a crucial game. Barbara signals to Guy that everything is set for their plan, and he leaves the match to meet her. The police are on the lookout for Guy, and Bruno begins to feel uneasy as he overhears them talking about the killer being at the amusement park. Guy arrives at the park, and the police follow him.\n\tBruno is being followed and is seen approaching a flood-lit pay-box. The boatman recognizes him and Bruno deserts the queue. 
The boatman informs a uniformed man who starts looking for Bruno. Bruno jumps on a merry-go-round and Guy chases after him. They fight and the merry-go-round topples over. Guy is helped to his feet and the boatman informs the police that Guy is not the man who killed his wife. Guy explains that Bruno has his cigarette lighter and wanted to plant it on the island to frame him. They find Bruno pinned under the overturned machine and he denies having the lighter. He dies shortly after.\n\tAs Bruno dies, his hand opens to reveal Guy's lighter. Turley takes the lighter and suggests they stay in town overnight to clear things up. Guy asks for a telephone and learns that Bruno's name was Bruno Antony. Later, Anne receives a call from Guy saying he'll be back tomorrow. The next day, on a train, a cleric recognizes Guy and they quickly leave. The film ends.NaN512.0NaNNaNNaNNaNNaNNaNNaNNaNgpt-3.5-turboNaNNaNNaNNaNNaN3584.0NaNNaN0.0NaNNaNNaNscript_strangersonatrain.txt9e6bfae4-7c2Script
2042OpenAI/gpt-3.5-turbo/map_reduce_output/batched-summaries/script_sunsetblvd._summary.txtscript_sunsetblvd._summary.txtThe script for the movie "Sunset Boulevard" begins with a sequence showing the street sign for Sunset Boulevard and a murder scene at a mansion. The story follows Joe Gillis, a struggling writer who meets with a producer named Sheldrake to pitch his idea for a baseball movie. Sheldrake is not impressed, and Gillis meets Betty Schaefer, a script reader who also dislikes his idea. Gillis is desperate for work and hopes Sheldrake can help him, but nothing comes of it.\n\tJoe Gillis is desperate for work and money, but his attempts to secure either are unsuccessful. He even asks his boss for a personal loan, but is denied. While driving, he is chased by finance company men and ends up hiding in the garage of a run-down mansion. He is then led into the mansion by Max von Mayerling and meets Norma Desmond, a former silent film star who is eccentric and delusional. She mistakes him for a funeral director and asks him to arrange a funeral for her dead chimpanzee. Gillis tries to explain the mistake, but Norma is not convinced.\n\tJoe Gillis enters what he thinks is an empty house but is confronted by Norma Desmond, a former silent film star. Norma insists that Joe edit her script and stay in her house. Joe agrees and is shown to a room over the garage by Norma's butler, Max. Joe observes the dilapidated state of the house and its amenities, including a tennis court and swimming pool.\n\tJoe Gillis watches a rat fight over a decaying orange at the bottom of a swimming pool while Norma Desmond and Max bury a chimp in the lawn. Gillis locks himself in a room and wakes up to find his belongings unpacked and Norma insisting he stay to work on her script. They watch old movies together, and Norma dreams of returning to stardom. 
Gillis kibitzes on a bridge game with Norma and her actor friends while trying to avoid men who have come to tow away his car.\n\tGillis needs money urgently and asks Norma for it, but she refuses. He goes outside and sees the finance company taking away his car. Norma offers him her expensive Isotta-Fraschini car instead. Later, Gillis is dressed up for Norma's New Year's party and they dance together. Norma confesses her love for Gillis, making him uncomfortable.\n\tNorma offers to buy Gillis extravagant gifts for the upcoming year, but he refuses. She then gives him a gold cigarette case and lighter with a personal engraving. Gillis expresses his desire to have a life of his own, causing Norma to slap him and storm off. Gillis leaves the party and goes to Artie Green's apartment, where he meets Betty Schaefer. They discuss writing, but Gillis receives a call from Max, who tells him that Norma has attempted suicide. Gillis is in shock and pushes Betty aside to leave.\n\tJoe rushes to Norma's house to check on her after she attempted suicide. Norma is still in love with Joe, but he tries to convince her to act sensibly. Later, Betty tells Joe that Sheldrake likes the idea of his script, but Joe is not interested in writing anymore. Norma tries to cheer Joe up by performing a comedic routine, but he is still preoccupied with his thoughts about Betty and the Hollywood industry.\n\tNorma Desmond receives a call from Paramount Studios, but is upset that it was not from Cecil B. DeMille himself. She goes to the studio to meet with DeMille, who apologizes for not calling her personally. Norma becomes emotional and expresses her desire to work with DeMille again. Meanwhile, Joe Gillis visits the Readers' Department and offers Betty his script, Dark Windows.\n\tBetty and Gillis discuss a story idea about teachers and their struggles. Gillis suggests a romantic plot involving two teachers sharing a room. 
Betty and Gillis agree to work on the story together, but Gillis is hesitant due to his busy schedule. Norma Desmond undergoes various beauty treatments and expresses her dependence on Gillis. Gillis sneaks out to work on the story with Betty at night. They take a walk down Paramount's New York street and discuss their childhoods.\n\tJoe Gillis and Betty Schaefer discuss their past experiences in the film industry. Betty comes from a family of actors and had dreams of becoming a star, but was rejected due to her acting skills. Joe and Betty grow closer, but Norma Desmond, Joe's former lover, becomes increasingly unstable and calls Betty to warn her about Joe's true character. Betty visits Joe at his home, which is actually Norma's mansion, and they discuss their feelings for each other. Meanwhile, Norma becomes increasingly desperate and reveals a hidden revolver.\n\tJoe Gillis shows Betty around Norma Desmond's mansion, revealing that Norma is an aging former movie star who lives with a companion and is jealous of Betty. Betty becomes upset and wants to leave, but Joe convinces her to stay. Norma becomes increasingly unstable and shoots Joe when he tries to leave her. The police arrive and question Norma, but she becomes fixated on the newsreel cameras and believes she is going to be on set for a film. Max, Norma's loyal servant, helps her escape the police and get to the set.\n\tNorma Desmond prepares for a scene on a staircase while Max sets up the cameras and lights. Norma descends the staircase, stopping to express her happiness to be back in the studio and promises to never desert them again. She then requests her closeup and the scene fades out.NaN512.0NaNNaNNaNNaNNaNNaNNaNNaNgpt-3.5-turboNaNNaNNaNNaNNaN3584.0NaNNaN0.0NaNNaNNaNscript_sunsetblvd..txtdeed3ee1-daeScript